WorldWideScience

Sample records for liferaft data-driven batch

  1. Data-driven batch scheduling

    Energy Technology Data Exchange (ETDEWEB)

    Bent, John [Los Alamos National Laboratory]; Denehy, Tim [Google]; Arpaci-Dusseau, Remzi [University of Wisconsin]; Livny, Miron [University of Wisconsin]; Arpaci-Dusseau, Andrea C. [University of Wisconsin]

    2009-01-01

    In this paper, we develop data-driven strategies for batch computing schedulers. Current CPU-centric batch schedulers ignore the data needs within workloads and execute them by linking them transparently and directly to their needed data. When scheduled on remote computational resources, this elegant solution of direct data access can incur an order of magnitude performance penalty for data-intensive workloads. Adding data-awareness to batch schedulers allows a careful coordination of data and CPU allocation thereby reducing the cost of remote execution. We offer here new techniques by which batch schedulers can become data-driven. Such systems can use our analytical predictive models to select one of the four data-driven scheduling policies that we have created. Through simulation, we demonstrate the accuracy of our predictive models and show how they can reduce time to completion for some workloads by as much as 80%.
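
    The core trade-off behind such data-aware policies can be illustrated with a toy cost model (a hypothetical sketch; the bandwidths, overhead, and the two-way choice are illustrative and not the paper's four policies or its analytical models):

```python
def pick_policy(data_gb, runs, wan_bw_gbps, lan_bw_gbps, stage_overhead_s=5.0):
    """Choose between direct remote I/O on every run and staging the data once."""
    remote_cost = runs * data_gb / wan_bw_gbps                 # every run pays WAN I/O
    staged_cost = (data_gb / wan_bw_gbps + stage_overhead_s    # one-time transfer
                   + runs * data_gb / lan_bw_gbps)             # then fast local reads
    return "stage" if staged_cost < remote_cost else "remote"

print(pick_policy(data_gb=100, runs=50, wan_bw_gbps=0.1, lan_bw_gbps=10))  # stage
print(pick_policy(data_gb=1, runs=1, wan_bw_gbps=0.1, lan_bw_gbps=10))     # remote
```

    A workload that re-reads 100 GB across 50 runs is cheaper to stage near the CPUs, while a one-off small job is cheaper to run against remote data directly; the paper's contribution is predicting such break-even points and choosing among richer policies automatically.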

  2. Data Driven Modeling for Monitoring and Control of Industrial Fed-Batch Cultivations

    DEFF Research Database (Denmark)

    Bonné, Dennis; Alvarez, María Antonieta; Jørgensen, Sten Bay

    2014-01-01

    time within the batch and the batch number. The model set is parsimoniously parametrized as a set of local, interdependent models which are estimated from data for as few as half a dozen batches. On the basis of state space models transformed from the acquired input–output model set, the asymptotic...

  3. Data-driven storytelling

    CERN Document Server

    Henry Riche, Nathalie

    2017-01-01

    This book is an accessible introduction to data-driven storytelling, resulting from discussions between data visualization researchers and data journalists. This book will be the first to define the topic, present compelling examples and existing resources, as well as identify challenges and new opportunities for research.

  4. Data-driven computational mechanics

    CERN Document Server

    Kirchdoerfer, Trenton

    2015-01-01

    We develop a new computing paradigm, which we refer to as data-driven computing, according to which calculations are carried out directly from experimental material data and pertinent constraints and conservation laws, such as compatibility and equilibrium, thus bypassing the empirical material modeling step of conventional computing altogether. Data-driven solvers seek to assign to each material point the state from a prespecified data set that is closest to satisfying the conservation laws. Equivalently, data-driven solvers aim to find the state satisfying the conservation laws that is closest to the data set. The resulting data-driven problem thus consists of the minimization of a distance function to the data set in phase space subject to constraints introduced by the conservation laws. We motivate the data-driven paradigm and investigate the performance of data-driven solvers by means of two examples of application, namely, the static equilibrium of nonlinear three-dimensional trusses and linear elastici...

  5. Data-Driven Proficiency Profiling

    Science.gov (United States)

    Mostafavi, Behrooz; Liu, Zhongxiu; Barnes, Tiffany

    2015-01-01

    Deep Thought is a logic tutor where students practice constructing deductive logic proofs. Within Deep Thought is a data-driven mastery learning system (DDML), which calculates student proficiency based on rule scores weighted by expert-decided weights in order to assign problem sets of appropriate difficulty. In this study, we designed and tested…

  6. 46 CFR 160.051-5 - Design and performance of Coastal Service inflatable liferafts.

    Science.gov (United States)

    2010-10-01

    ... 46 Shipping 6 2010-10-01 2010-10-01 false Design and performance of Coastal Service inflatable... Liferafts for Domestic Service § 160.051-5 Design and performance of Coastal Service inflatable liferafts. To obtain Coast Guard approval, each Coastal Service inflatable liferaft must comply with subpart...

  7. Data driven marketing for dummies

    CERN Document Server

    Semmelroth, David

    2013-01-01

    Embrace data and use it to sell and market your products. Data is everywhere and it keeps growing and accumulating. Companies need to embrace big data and make it work harder to help them sell and market their products. Successful data analysis can help marketing professionals spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Data Driven Marketing For Dummies helps companies use all the data at their disposal to make current customers more satisfied, reach new customers, and sell to their most important customer segments more efficiently. Identifyi

  8. Data-driven architectural production and operation

    NARCIS (Netherlands)

    Bier, H.H.; Mostafavi, S.

    2014-01-01

    Data-driven architectural production and operation as explored within Hyperbody rely heavily on systems thinking, implying that all parts of a system are to be understood in relation to each other. These relations are increasingly established bi-directionally, so that data-driven architecture is not

  9. Data-driven regionalization of housing markets

    NARCIS (Netherlands)

    Helbich, M.; Brunauer, W.; Hagenauer, J.; Leitner, M.

    2013-01-01

    This article presents a data-driven framework for housing market segmentation. Local marginal house price surfaces are investigated by means of mixed geographically weighted regression and are reduced to a set of principal component maps, which in turn serve as input for spatial regionalization. The

  10. On the data-driven COS method

    NARCIS (Netherlands)

    A. Leitao Rodriguez (Álvaro); C.W. Oosterlee (Cornelis); L. Ortiz Gracia (Luis); S.M. Bohte (Sander)

    2018-01-01

    In this paper, we present the data-driven COS method, ddCOS. It is a Fourier-based financial option valuation method which assumes the availability of asset data samples: a characteristic function of the underlying asset probability density function is not required. As such, the

  11. Data-driven regionalization of housing markets

    NARCIS (Netherlands)

    Helbich, M.; Brunauer, W.; Hagenauer, J.; Leitner, M.

    2013-01-01

    This article presents a data-driven framework for housing market segmentation. Local marginal house price surfaces are investigated by means of mixed geographically weighted regression and are reduced to a set of principal component maps, which in turn serve as input for spatial regionalization. The

  12. Data Driven Constraints for the SVM

    DEFF Research Database (Denmark)

    Darkner, Sune; Clemmensen, Line Katrine Harder

    2012-01-01

    We propose a generalized data-driven constraint for support vector machines, exemplified by classification of paired observations in general and specifically on the human ear canal. This is particularly interesting in dynamic cases such as tissue movement or pathologies developing over time. … Assuming that two observations of the same subject in different states span a vector, we hypothesise that such structure of the data contains implicit information which can aid the classification, hence the name data-driven constraints. We derive a constraint based on the data which allows for the use of the ℓ1-norm on the constraint while still allowing for the application of kernels. We specialize the proposed constraint to orthogonality of the vectors between paired observations and the estimated hyperplane. We show that imposing the constraint of orthogonality on the paired data yields a more robust…

  13. Data Driven Tuning of Inventory Controllers

    DEFF Research Database (Denmark)

    Huusom, Jakob Kjøbsted; Santacoloma, Paloma Andrade; Poulsen, Niels Kjølstad

    2007-01-01

    A systematic method for criterion-based tuning of inventory controllers based on data-driven iterative feedback tuning is presented. This tuning method circumvents problems with modeling bias. The process model used for the design of the inventory control is utilized in the tuning as an approximation to reduce the time required for experiments. The method is illustrated in an application with a multivariable inventory control implementation on a four-tank system.

  14. Supervision of Fed-Batch Fermentations

    DEFF Research Database (Denmark)

    Gregersen, Lars; Jørgensen, Sten Bay

    1999-01-01

    Process faults may be detected on-line using existing measurements based upon modelling that is entirely data driven. A multivariate statistical model is developed and used for fault diagnosis of an industrial fed-batch fermentation process. Data from several (25) batches are used to develop a mo...

  15. Challenges of Data-driven Healthcare Management

    DEFF Research Database (Denmark)

    Bossen, Claus; Danholt, Peter; Ubbesen, Morten Bonde

    activity and financing and relies on extensive data entry, reporting and calculations. This has required the development of new skills, work and work roles. The second case concerns a New Governance project aimed at developing new performance indicators for healthcare delivery as an alternative to DRG. Here, a core challenge is to select indicators and to actually be able to acquire data on them. The two cases point out that data-driven healthcare requires more and new kinds of work, for which new skills, functions and work roles have to be developed.

  16. Combining engineering and data-driven approaches

    DEFF Research Database (Denmark)

    Fischer, Katharina; De Sanctis, Gianluca; Kohler, Jochen

    2015-01-01

    assumptions that may result in a biased risk assessment. In two related papers we show how engineering and data-driven modelling can be combined by developing generic risk models that are calibrated to statistical data on observed fire events. The focus of the present paper is on the calibration procedure. … A framework is developed that is able to deal with data collection in non-homogeneous portfolios of buildings. Incomplete data sets containing only little information on each fire event can also be used for model calibration. To illustrate the capabilities of the proposed framework, it is applied…

  17. Data-Driven Control of Refrigeration System

    DEFF Research Database (Denmark)

    Vinther, Kasper

    facilities without using a pressure sensor. A single-sensor solution is thus provided, which either reduces the variable costs or increases the robustness of the system by not relying on pressure measurements. MSS is an example of data-driven control and can be applied to a broad class of nonlinear control… Refrigeration is used in a wide range of applications, e.g., for storage of food at low temperatures to prolong shelf life and in air conditioning for occupancy comfort. The main focus of this thesis is control of supermarket refrigeration systems. This market is very competitive, and it is important to keep the variable costs at a minimum and, if possible, offer products which have higher robustness, performance, and functionality than similar products from competitors. However, the multitude of different system configurations, system complexity, component wear, and changing operating…

  18. Data-driven workflows for microservices

    DEFF Research Database (Denmark)

    Safina, Larisa; Mazzara, Manuel; Montesi, Fabrizio

    2016-01-01

    Microservices is an architectural style inspired by service-oriented computing that has recently started gaining popularity. Jolie is a programming language based on the microservices paradigm: the main building blocks of Jolie systems are services, in contrast to, e.g., functions or objects. … The primitives offered by the Jolie language elicit many of the recurring patterns found in microservices, like load balancers and structured processes. However, Jolie still lacks some useful constructs for dealing with message types and data manipulation that are present in service-oriented computing. … We show the impact of our implementation on some of the typical scenarios found in microservice systems. This shows how computation can move from a process-driven to a data-driven approach, and leads to the preliminary identification of recurring communication patterns that can be shaped as design…

  19. Data-Driven Security-Constrained OPF

    DEFF Research Database (Denmark)

    Thams, Florian; Halilbasic, Lejla; Pinson, Pierre

    2017-01-01

    In this paper we unify electricity market operations with power system security considerations. Using data-driven techniques, we address both small-signal stability and steady-state security, derive tractable decision rules in the form of line flow limits, and incorporate the resulting constraints. … Data are collected, both from measurements and simulations, in order to determine stable and unstable operating regions. With the help of decision trees, we transform this information into linear decision rules for line flow constraints. We propose conditional line transfer limits, which can accurately capture security considerations while being less conservative than current approaches. Our approach can be scalable for large systems, accounts explicitly for power system security, and enables the electricity market to identify a cost-efficient dispatch avoiding redispatching actions. We demonstrate the performance of our…

  20. Data driven innovations in structural health monitoring

    Science.gov (United States)

    Rosales, M. J.; Liyanapathirana, R.

    2017-05-01

    At present, substantial investments are being allocated to civil infrastructures, which are valuable assets at a national or global scale. Structural Health Monitoring (SHM) is an indispensable tool for ensuring the performance and safety of these structures based on measured response parameters. Research to date on damage assessment has tended to focus on the utilization of wireless sensor networks (WSNs), as they prove to be the best alternative to traditional visual inspections and tethered or wired counterparts. Over the last decade, the structural health and behaviour of innumerable infrastructures have been measured and evaluated owing to several successful ventures in implementing these sensor networks. Various monitoring systems have the capability to rapidly measure, transmit, and store large volumes of data. The amount of data collected from these networks has eventually become unmanageable, which paved the way to other relevant issues such as data quality, relevance, re-use, and decision support. There is an increasing need to integrate new technologies in order to automate the evaluation processes as well as to enhance the objectivity of data assessment routines. This paper aims to identify feasible methodologies for applying time-series analysis techniques to judiciously exploit the vast amount of readily available and upcoming data resources. It continues the momentum of a greater effort to collect and archive SHM approaches that will serve as data-driven innovations for the assessment of damage through efficient algorithms and data analytics.
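
    As a minimal example of the kind of time-series technique surveyed here, a rolling-baseline novelty detector can flag candidate damage events in a sensor record (the window length and threshold below are illustrative choices, not taken from the paper):

```python
import numpy as np

def novelty_flags(signal, window=50, k=4.0):
    """Flag samples deviating from a rolling baseline by more than k sigma."""
    signal = np.asarray(signal, dtype=float)
    flags = np.zeros(len(signal), dtype=bool)
    for i in range(window, len(signal)):
        ref = signal[i - window:i]              # trailing baseline window
        mu, sd = ref.mean(), ref.std()
        if sd > 0 and abs(signal[i] - mu) > k * sd:
            flags[i] = True
    return flags

# synthetic sensor record with one injected fault at sample 200
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 300)
x[200] += 10.0
print(np.flatnonzero(novelty_flags(x)))
```

    Real SHM pipelines replace this simple statistic with model-based residuals or modal features, but the structure, baseline, deviation, and threshold are the same.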

  1. 46 CFR 122.730 - Servicing of inflatable liferafts, inflatable buoyant apparatus, inflatable life jackets, and...

    Science.gov (United States)

    2010-10-01

    ... apparatus, inflatable life jackets, and inflated rescue boats. 122.730 Section 122.730 Shipping COAST GUARD..., inflatable life jackets, and inflated rescue boats. (a) An inflatable liferaft or inflatable buoyant... the vessel, provided that the delay does not exceed 5 months. (c) Each inflatable life jacket must be...

  2. 46 CFR 185.730 - Servicing of inflatable liferafts, inflatable buoyant apparatus, inflatable life jackets, and...

    Science.gov (United States)

    2010-10-01

    ... apparatus, inflatable life jackets, and inflated rescue boats. 185.730 Section 185.730 Shipping COAST GUARD... liferafts, inflatable buoyant apparatus, inflatable life jackets, and inflated rescue boats. (a) An... life jacket must be serviced in accordance with the servicing procedure under § 160.176 in subchapter Q...

  3. High dimensional data driven statistical mechanics.

    Science.gov (United States)

    Adachi, Yoshitaka; Sadamatsu, Sunao

    2014-11-01

    In "3D4D materials science" there are five categories: (a) image acquisition, (b) processing, (c) analysis, (d) modelling, and (e) data sharing. This presentation highlights the core of these categories [1]. Analysis and modelling: A three-dimensional (3D) microstructure image contains topological features, such as connectivity, in addition to metric features. Such additional microstructural information is expected to be useful for more precise property prediction. There are two routes to microstructure-based property prediction (Fig. 1A). One is modelling based on 3D image data, such as micromechanics or the crystal plasticity finite element method. The other is a machine learning approach driven by numerical microstructural features, such as an artificial neural network or Bayesian estimation. The key is to convert the 3D image data into numerical features so that the dataset can be applied to property prediction. As numerical features of microstructures, grain size, number density of particles, connectivity of particles, grain boundary connectivity, stacking degree, clustering, etc. should be taken into consideration. These microstructural features are the so-called "materials genome". Among these, we have to find the dominant factors that determine a given property. The dominant factors are defined as "descriptor(s)" in high-dimensional data-driven statistical mechanics. Fig. 1: (a) A concept of 3D4D materials science. (b) Fully automated serial-sectioning 3D microscope "Genus_3D". (c) Materials Genome Archive (JSPS). Image acquisition: It is important for researchers to choose a 3D microscope from the various available instruments depending on the length scale of the microstructure of interest. There has been a long-standing demand to acquire 3D microstructure images more conveniently; therefore a fully automated serial-sectioning 3D optical microscope, "Genus_3D" (Fig. 1B), has been developed and is now commercially available. A user can get a good

  4. Data-Driven Hint Generation from Peer Debugging Solutions

    Science.gov (United States)

    Liu, Zhongxiu

    2015-01-01

    Data-driven methods have been a successful approach to generating hints for programming problems. However, the majority of previous studies are focused on procedural hints that aim at moving students to the next closest state to the solution. In this paper, I propose a data-driven method to generate remedy hints for BOTS, a game that teaches…

  5. Construction, Analysis, and Data-Driven Augmentation of Supersaturated Designs

    Science.gov (United States)

    2013-09-01

    Construction, Analysis, and Data-Driven Augmentation of Supersaturated Designs. Dissertation by Alex J. Gutman, BS, MS (AFIT-ENC-DS-13-S-02), presented to the Faculty of the Graduate School of Engineering and Management, Air Force Institute of Technology, Department of the Air Force; approved September 2013.

  6. Dynamic Data-Driven UAV Network for Plume Characterization

    Science.gov (United States)

    2016-05-23

    management and response. Data-driven operation of a mobile sensor network enables asset allocation to the regions with the highest impact on mission success. We studied a dynamic data-driven approach and investigated a two-dimensional Gaussian puff evolving within a uniform background flow. The standard Kalman filter handles the data assimilation; an SPH

  7. The Structural Consequences of Big Data-Driven Education.

    Science.gov (United States)

    Zeide, Elana

    2017-06-01

    Educators and commenters who evaluate big data-driven learning environments focus on specific questions: whether automated education platforms improve learning outcomes, invade student privacy, and promote equality. This article puts aside separate unresolved (and perhaps unresolvable) issues regarding the concrete effects of specific technologies. It instead examines how big data-driven tools alter the structure of schools' pedagogical decision-making, and, in doing so, change fundamental aspects of America's education enterprise. Technological mediation and data-driven decision-making have a particularly significant impact in learning environments because the education process primarily consists of dynamic information exchange. In this overview, I highlight three significant structural shifts that accompany school reliance on data-driven instructional platforms that perform core school functions: teaching, assessment, and credentialing. First, virtual learning environments create information technology infrastructures featuring constant data collection, continuous algorithmic assessment, and possibly infinite record retention. This undermines the traditional intellectual privacy and safety of classrooms. Second, these systems displace pedagogical decision-making from educators serving public interests to private, often for-profit, technology providers. They constrain teachers' academic autonomy, obscure student evaluation, and reduce parents' and students' ability to participate or challenge education decision-making. Third, big data-driven tools define what "counts" as education by mapping the concepts, creating the content, determining the metrics, and setting desired learning outcomes of instruction. These shifts cede important decision-making to private entities without public scrutiny or pedagogical examination. In contrast to the public and heated debates that accompany textbook choices, schools often adopt education technologies ad hoc. Given education

  8. Temporal Data-Driven Sleep Scheduling and Spatial Data-Driven Anomaly Detection for Clustered Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Gang Li

    2016-09-01

    The spatial–temporal correlation is an important feature of sensor data in wireless sensor networks (WSNs). Most existing works based on the spatial–temporal correlation fall into two categories, redundancy reduction and anomaly detection, and pursue them separately. In this work, the combination of temporal data-driven sleep scheduling (TDSS) and spatial data-driven anomaly detection is proposed, where TDSS reduces data redundancy. The TDSS model is inspired by transmission control protocol (TCP) congestion control. Based on the long, linear cluster structure in the tunnel monitoring system, cooperative TDSS and spatial data-driven anomaly detection are then proposed. To realize synchronous acquisition in the same ring and analyze the situation of every ring, TDSS is implemented cooperatively within the cluster. To maintain the precision of sensor data, spatial data-driven anomaly detection based on the spatial correlation and the Kriging method is realized to generate an anomaly indicator. The experiment results show that cooperative TDSS can realize non-uniform sensing effectively to reduce energy consumption. In addition, spatial data-driven anomaly detection is significant for maintaining and improving the precision of sensor data.
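
    The spatial-residual idea behind such an anomaly indicator can be sketched as follows. Ordinary Kriging requires a fitted variogram, so plain inverse-distance weighting stands in for it here; this is a simplified analogue of the approach, not the authors' method, and the sensor layout and threshold are invented for illustration:

```python
import numpy as np

def spatial_anomaly(positions, readings, z_thresh=2.0, power=2.0):
    """Flag sensors whose reading deviates strongly from what neighbours predict."""
    positions = np.asarray(positions, float)
    readings = np.asarray(readings, float)
    residuals = np.empty(len(readings))
    for i in range(len(readings)):
        d = np.linalg.norm(positions - positions[i], axis=1)
        d[i] = np.inf                       # exclude the node itself
        w = 1.0 / d ** power                # inverse-distance weights
        est = w @ readings / w.sum()        # neighbours' estimate for node i
        residuals[i] = readings[i] - est    # spatial prediction residual
    return np.abs(residuals) > z_thresh * residuals.std()

# ten sensors on a line measuring a constant field, one faulty node
pos = np.arange(10.0).reshape(-1, 1)
vals = np.ones(10)
vals[4] += 5.0                              # stuck/drifting sensor
print(np.flatnonzero(spatial_anomaly(pos, vals)))  # -> [4]
```

    A Kriging-based indicator works the same way but weights neighbours by the fitted spatial covariance rather than raw distance, which also yields a prediction variance for normalizing the residual.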

  9. Dynamic Data Driven Methods for Self-aware Aerospace Vehicles

    Science.gov (United States)

    2015-04-08

    AFRL-OSR-VA-TR-2015-0127. Dynamic Data-Driven Methods for Self-aware Aerospace Vehicles. Final report for grant FA9550-11-1-0339; Karen E. Willcox, Massachusetts Institute of Technology.

  10. Data-Driven Learning: Reasonable Fears and Rational Reassurance

    Science.gov (United States)

    Boulton, Alex

    2009-01-01

    Computer corpora have many potential applications in teaching and learning languages, the most direct of which--when the learners explore a corpus themselves--has become known as data-driven learning (DDL). Despite considerable enthusiasm in the research community and interest in higher education, the approach has not made major inroads to…

  11. Data-driven importance distributions for articulated tracking

    DEFF Research Database (Denmark)

    Hauberg, Søren; Pedersen, Kim Steenstrup

    2011-01-01

    We present two data-driven importance distributions for particle filter-based articulated tracking; one based on background subtraction, another on depth information. In order to keep the algorithms efficient, we represent human poses in terms of spatial joint positions. To ensure constant bone…

  12. Social Capital in Data-Driven Community College Reform

    Science.gov (United States)

    Kerrigan, Monica Reid

    2015-01-01

    The current rhetoric around using data to improve community college student outcomes with only limited research on data-driven decision-making (DDDM) within postsecondary education compels a more comprehensive understanding of colleges' capacity for using data to inform decisions. Based on an analysis of faculty and administrators' perceptions and…

  13. A Statistical Quality Model for Data-Driven Speech Animation.

    Science.gov (United States)

    Ma, Xiaohan; Deng, Zhigang

    2012-11-01

    In recent years, data-driven speech animation approaches have achieved significant successes in terms of animation quality. However, how to automatically evaluate the realism of novel synthesized speech animations has been an important yet unsolved research problem. In this paper, we propose a novel statistical model (called SAQP) to automatically predict the quality of on-the-fly synthesized speech animations by various data-driven techniques. Its essential idea is to construct a phoneme-based, Speech Animation Trajectory Fitting (SATF) metric to describe speech animation synthesis errors and then build a statistical regression model to learn the association between the obtained SATF metric and the objective speech animation synthesis quality. Through delicately designed user studies, we evaluate the effectiveness and robustness of the proposed SAQP model. To the best of our knowledge, this work is the first-of-its-kind, quantitative quality model for data-driven speech animation. We believe it is the important first step to remove a critical technical barrier for applying data-driven speech animation techniques to numerous online or interactive talking avatar applications.

  14. Data Driven Decision Making in the Social Studies

    Science.gov (United States)

    Ediger, Marlow

    2010-01-01

    Data driven decision making emphasizes the importance of the teacher using objective sources of information in developing the social studies curriculum. Too frequently, decisions of teachers have been made based on routine and outdated methods of teaching. Valid and reliable tests used to secure results from pupil learning make for better…

  15. Data-driven services marketing in a connected world

    NARCIS (Netherlands)

    Kumar, V.; Chattaraman, Veena; Neghina, Carmen; Skiera, Bernd; Aksoy, Lerzan; Buoye, Alexander; Henseler, Joerg

    2013-01-01

    Purpose – The purpose of this paper is to provide insights into the benefits of data-driven services marketing and provide a conceptual framework for how to link traditional and new sources of customer data and their metrics. Linking data and metrics to strategic and tactical business insights and i

  16. Data mining, knowledge discovery and data-driven modelling

    NARCIS (Netherlands)

    Solomatine, D.P.; Velickov, S.; Bhattacharya, B.; Van der Wal, B.

    2003-01-01

    The project was aimed at exploring the possibilities of a new paradigm in modelling - data-driven modelling, often referred to as "data mining". Several application areas were considered: sedimentation problems in the Port of Rotterdam, automatic soil classification on the basis of cone penetration tes

  17. Social Capital in Data-Driven Community College Reform

    Science.gov (United States)

    Kerrigan, Monica Reid

    2015-01-01

    The current rhetoric around using data to improve community college student outcomes with only limited research on data-driven decision-making (DDDM) within postsecondary education compels a more comprehensive understanding of colleges' capacity for using data to inform decisions. Based on an analysis of faculty and administrators' perceptions and…

  18. Data-Driven Learning of Q-Matrix

    Science.gov (United States)

    Liu, Jingchen; Xu, Gongjun; Ying, Zhiliang

    2012-01-01

    The recent surge of interest in cognitive assessment has led to the development of novel statistical models for diagnostic classification. Central to many such models is the well-known "Q"-matrix, which specifies the item-attribute relationships. This article proposes a data-driven approach to identification of the "Q"-matrix and estimation of…

  19. Data-Driven Planning: Using Assessment in Strategic Planning

    Science.gov (United States)

    Bresciani, Marilee J.

    2010-01-01

    Data-driven planning or evidence-based decision making represents nothing new in its concept. For years, business leaders have claimed they have implemented planning informed by data that have been strategically and systematically gathered. Within higher education and student affairs, there may be less evidence of the actual practice of…

  20. Data-Driven Decision Making in Auction Markets

    NARCIS (Netherlands)

    Y. Lu (Yixin)

    2014-01-01

    This dissertation consists of three essays that examine the promises of data-driven decision making in the design and operationalization of complex auction markets. In the first essay, we derive a structural econometric model to understand the effect of auction design

  1. Data-Driven Model Order Reduction for Bayesian Inverse Problems

    KAUST Repository

    Cui, Tiangang

    2014-01-06

    One of the major challenges in using MCMC for the solution of inverse problems is the repeated evaluation of computationally expensive numerical models. We develop a data-driven projection-based model order reduction technique to reduce the computational cost of numerical PDE evaluations in this context.
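
    A generic projection-based reduction of the kind this work builds on can be sketched with proper orthogonal decomposition (POD) via the SVD. The paper's data-driven adaptation of the basis from posterior samples is not shown; the snapshot matrix below is synthetic:

```python
import numpy as np

def pod_basis(snapshots, energy=0.999):
    """snapshots: (n_dof, n_samples) matrix of full-model solutions."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    keep = int(np.searchsorted(frac, energy)) + 1   # smallest basis capturing `energy`
    return U[:, :keep]

def reduce_operator(A, V):
    """Galerkin projection of a linear operator: A_r = V^T A V."""
    return V.T @ A @ V

# toy full model: 200-dof solutions that lie in a 3-dimensional subspace
rng = np.random.default_rng(1)
modes = np.linalg.qr(rng.normal(size=(200, 3)))[0]
snaps = modes @ rng.normal(size=(3, 40))            # 40 snapshot solutions
V = pod_basis(snaps)
print(V.shape)                                      # three modes recovered
```

    Each MCMC proposal can then be evaluated with the small projected operator instead of the full PDE discretization, which is where the cost savings come from.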

  2. A Data-Driven Control Design Approach for Freeway Traffic Ramp Metering with Virtual Reference Feedback Tuning

    Directory of Open Access Journals (Sweden)

    Shangtai Jin

    2014-01-01

    ALINEA is a simple, efficient, and easily implemented ramp metering strategy. Virtual reference feedback tuning (VRFT) is well suited to many practical systems since it is a "one-shot" data-driven control design methodology. This paper presents an application of VRFT to a ramp metering problem in a freeway traffic system. When there is not enough prior knowledge of the controlled system to select a proper parameter for ALINEA, the VRFT approach is used to optimize ALINEA's parameter using only a batch of input and output data collected from the freeway traffic system. Extensive simulations are built on both the macroscopic MATLAB platform and the microscopic PARAMICS platform to show the effectiveness and applicability of the proposed data-driven controller tuning approach.
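
    The "one-shot" character of VRFT can be sketched for the simplest case: fitting a proportional gain from a single batch of input-output data. This is an illustrative toy with an integrator plant; ALINEA's integral structure and the traffic model are not reproduced:

```python
import numpy as np

def vrft_gain(u, y, m=0.6):
    """One-shot VRFT fit of a proportional gain from a single I/O batch."""
    # virtual reference r_v solves y = M(z) r_v for M(z) = (1 - m)/(z - m),
    # i.e. the desired closed loop places a pole at z = m
    r_v = (y[1:] - m * y[:-1]) / (1.0 - m)
    e_v = r_v - y[:-1]                      # virtual tracking error
    # least squares: find K minimizing sum_k (u(k) - K * e_v(k))^2
    return float(e_v @ u / (e_v @ e_v))

# synthetic experiment on an integrator plant y(k+1) = y(k) + b*u(k)
b = 0.5
rng = np.random.default_rng(2)
u = rng.normal(size=400)
y = np.zeros(401)
for k in range(400):
    y[k + 1] = y[k] + b * u[k]
print(round(vrft_gain(u, y), 6))            # recovers K = (1 - m)/b = 0.8
```

    Because the controller class here contains the ideal controller for this plant, the single least-squares fit recovers the pole-placement gain exactly; in the paper the same one-shot fit tunes ALINEA's regulator parameter from recorded traffic data.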

  3. Turbulence Model Discovery with Data-Driven Learning and Optimization

    Science.gov (United States)

    King, Ryan; Hamlington, Peter

    2016-11-01

    Data-driven techniques have emerged as a useful tool for model development in applications where first-principles approaches are intractable. In this talk, data-driven multi-task learning techniques are used to discover flow-specific optimal turbulence closure models. We use the recently introduced autonomic closure technique to pose an online supervised learning problem created by test filtering turbulent flows in the self-similar inertial range. The autonomic closure is modified to solve the learning problem for all stress components simultaneously with multi-task learning techniques. The closure is further augmented with a feature extraction step that learns a set of orthogonal modes that are optimal at predicting the turbulent stresses. We demonstrate that these modes can be severely truncated to enable drastic reductions in computational costs without compromising the model accuracy. Furthermore, we discuss the potential universality of the extracted features and implications for reduced order modeling of other turbulent flows.

  4. Data-driven approaches in the investigation of social perception.

    Science.gov (United States)

    Adolphs, Ralph; Nummenmaa, Lauri; Todorov, Alexander; Haxby, James V

    2016-05-05

    The complexity of social perception poses a challenge to traditional approaches to understand its psychological and neurobiological underpinnings. Data-driven methods are particularly well suited to tackling the often high-dimensional nature of stimulus spaces and of neural representations that characterize social perception. Such methods are more exploratory, capitalize on rich and large datasets, and attempt to discover patterns often without strict hypothesis testing. We present four case studies here: behavioural studies on face judgements, two neuroimaging studies of movies, and eyetracking studies in autism. We conclude with suggestions for particular topics that seem ripe for data-driven approaches, as well as caveats and limitations. © 2016 The Author(s).

  5. Undersampled MR Image Reconstruction with Data-Driven Tight Frame

    Directory of Open Access Journals (Sweden)

    Jianbo Liu

    2015-01-01

    Undersampled magnetic resonance image reconstruction employing sparsity regularization has fascinated many researchers in recent years under the support of compressed sensing theory. Nevertheless, most existing sparsity-regularized reconstruction methods either lack adaptability to capture the structure information or suffer from high computational load. With the aim of further improving image reconstruction accuracy without introducing too much computation, this paper proposes a data-driven tight frame magnetic image reconstruction (DDTF-MRI) method. By taking advantage of the efficiency and effectiveness of data-driven tight frame, DDTF-MRI trains an adaptive tight frame to sparsify the to-be-reconstructed MR image. Furthermore, a two-level Bregman iteration algorithm has been developed to solve the proposed model. The proposed method has been compared to two state-of-the-art methods on four datasets and encouraging performances have been achieved by DDTF-MRI.

  6. Undersampled MR Image Reconstruction with Data-Driven Tight Frame.

    Science.gov (United States)

    Liu, Jianbo; Wang, Shanshan; Peng, Xi; Liang, Dong

    2015-01-01

    Undersampled magnetic resonance image reconstruction employing sparsity regularization has fascinated many researchers in recent years under the support of compressed sensing theory. Nevertheless, most existing sparsity-regularized reconstruction methods either lack adaptability to capture the structure information or suffer from high computational load. With the aim of further improving image reconstruction accuracy without introducing too much computation, this paper proposes a data-driven tight frame magnetic image reconstruction (DDTF-MRI) method. By taking advantage of the efficiency and effectiveness of data-driven tight frame, DDTF-MRI trains an adaptive tight frame to sparsify the to-be-reconstructed MR image. Furthermore, a two-level Bregman iteration algorithm has been developed to solve the proposed model. The proposed method has been compared to two state-of-the-art methods on four datasets and encouraging performances have been achieved by DDTF-MRI.

  7. Data-Driven Controller Design The H2 Approach

    CERN Document Server

    Sanfelice Bazanella, Alexandre; Eckhard, Diego

    2012-01-01

    Data-driven methodologies have recently emerged as an important paradigm alternative to model-based controller design and several such methodologies are formulated as an H2 performance optimization. This book presents a comprehensive theoretical treatment of the H2 approach to data-driven control design. The fundamental properties implied by the H2 problem formulation are analyzed in detail, so that common features to all solutions are identified. Direct methods (VRFT) and iterative methods (IFT, DFT, CbT) are put under a common theoretical framework. The choice of the reference model, the experimental conditions, the optimization method to be used, and several other designer’s choices are crucial to the quality of the final outcome, and firm guidelines for all these choices are derived from the theoretical analysis presented. The practical application of the concepts in the book is illustrated with a large number of practical designs performed for different classes of processes: thermal, fluid processing a...

  8. A data-driven framework for investigating customer retention

    OpenAIRE

    Mgbemena, Chidozie Simon

    2016-01-01

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. This study presents a data-driven simulation framework in order to understand customer behaviour and therefore improve customer retention. The overarching system design methodology used for this study is aligned with the design science paradigm. The Social Media Domain Analysis (SoMeDoA) approach is adopted and evaluated to build a model on the determinants of customer satisfaction ...

  9. Large-Scale Mode Identification and Data-Driven Sciences

    OpenAIRE

    Mukhopadhyay, Subhadeep

    2015-01-01

    Bump-hunting or mode identification is a fundamental problem that arises in almost every scientific field of data-driven discovery. Surprisingly, very few data modeling tools are available for automatic (not requiring manual case-by-case investigation), objective (not subjective), and nonparametric (not based on restrictive parametric model assumptions) mode discovery, which can scale to large data sets. This article introduces LPMode--an algorithm based on a new theory for detecting multimod...

  10. Large-scale mode identification and data-driven sciences

    OpenAIRE

    Mukhopadhyay, Subhadeep

    2017-01-01

    Bump-hunting or mode identification is a fundamental problem that arises in almost every scientific field of data-driven discovery. Surprisingly, very few data modeling tools are available for automatic (not requiring manual case-by-case investigation), objective (not subjective), and nonparametric (not based on restrictive parametric model assumptions) mode discovery, which can scale to large data sets. This article introduces LPMode–an algorithm based on a new theory for detecting multimoda...

  11. Undersampled MR Image Reconstruction with Data-Driven Tight Frame

    OpenAIRE

    Jianbo Liu; Shanshan Wang; Xi Peng; Dong Liang

    2015-01-01

    Undersampled magnetic resonance image reconstruction employing sparsity regularization has fascinated many researchers in recent years under the support of compressed sensing theory. Nevertheless, most existing sparsity-regularized reconstruction methods either lack adaptability to capture the structure information or suffer from high computational load. With the aim of further improving image reconstruction accuracy without introducing too much computation, this paper proposes a data-driven ...

  12. Mobile Assessment in Schizophrenia: A Data-Driven Momentary Approach

    OpenAIRE

    Oorschot, Margreet; Lataster, Tineke; Thewissen, Viviane; Wichers, Marieke; Myin-Germeys, Inez

    2011-01-01

    In this article, a data-driven approach was adopted to demonstrate how real-life diary techniques [i.e., the experience sampling method (ESM)] could be deployed for assessment purposes in patients with psychotic disorder, delivering individualized and clinically relevant information. The dataset included patients in an acute phase of psychosis and the focus was on paranoia as one of the main psychotic symptoms (30 patients with high levels of paranoia and 34 with low levels of paranoia). Based ...

  13. Data-Driven Adaptive Observer for Fault Diagnosis

    OpenAIRE

    Shen Yin; Xuebo Yang; Hamid Reza Karimi

    2012-01-01

    This paper presents an approach for data-driven design of fault diagnosis system. The proposed fault diagnosis scheme consists of an adaptive residual generator and a bank of isolation observers, whose parameters are directly identified from the process data without identification of complete process model. To deal with normal variations in the process, the parameters of residual generator are online updated by standard adaptive technique to achieve reliable fault detection performance. After...

  14. Data-Driven H∞ Control for Nonlinear Distributed Parameter Systems.

    Science.gov (United States)

    Luo, Biao; Huang, Tingwen; Wu, Huai-Ning; Yang, Xiong

    2015-11-01

    The data-driven H∞ control problem of nonlinear distributed parameter systems is considered in this paper. An off-policy learning method is developed to learn the H∞ control policy from real system data rather than the mathematical model. First, Karhunen-Loève decomposition is used to compute the empirical eigenfunctions, which are then employed to derive a reduced-order model (ROM) of the slow subsystem based on the singular perturbation theory. The H∞ control problem is reformulated based on the ROM, which can, theoretically, be transformed into solving the Hamilton-Jacobi-Isaacs (HJI) equation. To learn the solution of the HJI equation from real system data, a data-driven off-policy learning approach is proposed based on the simultaneous policy update algorithm and its convergence is proved. For implementation purposes, a neural network (NN)-based action-critic structure is developed, where a critic NN and two action NNs are employed to approximate the value function, control, and disturbance policies, respectively. Subsequently, a least-squares NN weight-tuning rule is derived with the method of weighted residuals. Finally, the developed data-driven off-policy learning approach is applied to a nonlinear diffusion-reaction process, and the obtained results demonstrate its effectiveness.

  15. Data-Driven Guides: Supporting Expressive Design for Information Graphics.

    Science.gov (United States)

    Kim, Nam Wook; Schweickart, Eston; Liu, Zhicheng; Dontcheva, Mira; Li, Wilmot; Popovic, Jovan; Pfister, Hanspeter

    2017-01-01

    In recent years, there has been a growing need for communicating complex data in an accessible graphical form. Existing visualization creation tools support automatic visual encoding, but lack flexibility for creating custom designs; on the other hand, freeform illustration tools require manual visual encoding, making the design process time-consuming and error-prone. In this paper, we present Data-Driven Guides (DDG), a technique for designing expressive information graphics in a graphic design environment. Instead of being confined by predefined templates or marks, designers can generate guides from data and use the guides to draw, place and measure custom shapes. We provide guides to encode data using three fundamental visual encoding channels: length, area, and position. Users can combine more than one guide to construct complex visual structures and map these structures to data. When the underlying data change, we use a deformation technique to transform custom shapes using the guides as the backbone of the shapes. Our evaluation shows that data-driven guides allow users to create expressive and more accurate custom data-driven graphics.

  16. Data-Driven Predictive Direct Load Control of Refrigeration Systems

    DEFF Research Database (Denmark)

    Shafiei, Seyed Ehsan; Knudsen, Torben; Wisniewski, Rafal

    2015-01-01

    A predictive control using subspace identification is applied for the smart grid integration of refrigeration systems under a direct load control scheme. A realistic demand response scenario based on regulation of the electrical power consumption is considered. A receding horizon optimal control … is proposed to fulfil two important objectives: to secure high coefficient of performance and to participate in power consumption management. Moreover, a new method for design of input signals for system identification is put forward. The control method is fully data driven without an explicit use of model…

  17. Data-driven algorithm to estimate friction in automobile engine

    DEFF Research Database (Denmark)

    Stotsky, Alexander A.

    2010-01-01

    Algorithms based on the oscillations of the engine angular rotational speed under fuel cutoff and no-load were proposed for estimation of the engine friction torque. The recursive algorithm to restore the periodic signal is used to calculate the amplitude of the engine speed signal at fuel cutoff. … The values of the friction torque in the corresponding table entries are updated upon acquiring new measurements of the friction moment. A new, data-driven algorithm for table adaptation on the basis of stepwise regression was developed and verified using the six-cylinder Volvo engine…

  18. Data-driven parameterization of the generalized Langevin equation

    Energy Technology Data Exchange (ETDEWEB)

    Lei, Huan; Baker, Nathan A.; Li, Xiantao

    2016-11-29

    We present a data-driven approach to determine the memory kernel and random noise of the generalized Langevin equation. To facilitate practical implementations, we parameterize the kernel function in the Laplace domain by a rational function, with coefficients directly linked to the equilibrium statistics of the coarse-grain variables. Further, we show that such an approximation can be constructed to arbitrarily high order. Within these approximations, the generalized Langevin dynamics can be embedded in an extended stochastic model without memory. We demonstrate how to introduce the stochastic noise so that the fluctuation-dissipation theorem is exactly satisfied.
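
    In standard form, the model being parameterized is the generalized Langevin equation together with the fluctuation-dissipation constraint on the noise; the rational ansatz below sketches the Laplace-domain parameterization described in the abstract (symbols and coefficient ordering are illustrative):

```latex
m\,\dot{v}(t) = -\int_0^t K(t-s)\,v(s)\,\mathrm{d}s + R(t),
\qquad
\langle R(t)\,R(t')\rangle = k_B T\, K\!\left(|t-t'|\right),
\qquad
\hat{K}(s) \;\approx\; \frac{b_0 + b_1 s + \cdots + b_{n-1}s^{\,n-1}}
                            {a_0 + a_1 s + \cdots + a_{n-1}s^{\,n-1} + s^{n}} .
```

    Because the approximation is rational, the memory term can be realized by a finite set of auxiliary variables, which is what allows the dynamics to be embedded in an extended stochastic model without memory.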

  19. Building Energy Modeling: A Data-Driven Approach

    Science.gov (United States)

    Cui, Can

    Buildings consume nearly 50% of the total energy in the United States, which drives the need to develop high-fidelity models for building energy systems. Extensive methods and techniques have been developed, studied, and applied to building energy simulation and forecasting, while most work has focused on developing dedicated modeling approaches for generic buildings. In this study, an integrated computationally efficient and high-fidelity building energy modeling framework is proposed, concentrating on developing a generalized modeling approach for various types of buildings. First, a number of data-driven simulation models are reviewed and assessed on various types of computationally expensive simulation problems. Motivated by the conclusion that no model outperforms others if amortized over diverse problems, a meta-learning based recommendation system for data-driven simulation modeling is proposed. To test the feasibility of the proposed framework on the building energy system, an extended application of the recommendation system for short-term building energy forecasting is deployed on various buildings. Finally, a Kalman filter-based data fusion technique is incorporated into the building recommendation system for on-line energy forecasting. Data fusion enables model calibration to update the state estimation in real-time, which filters out the noise and renders more accurate energy forecasts. The framework is composed of two modules: an off-line model recommendation module and an on-line model calibration module. Specifically, the off-line model recommendation module includes 6 widely used data-driven simulation models, which are ranked by the meta-learning recommendation system for off-line energy modeling on a given building scenario. Only a selective set of building physical and operational characteristic features is needed to complete the recommendation task. The on-line calibration module effectively addresses system uncertainties, where data fusion on…
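
    The on-line calibration module rests on the standard Kalman update, which fuses a model forecast with a noisy measurement in proportion to their variances. A scalar sketch (all numbers are illustrative; a kWh scale is assumed):

```python
def kalman_update(x_pred, p_pred, z, r):
    """Fuse forecast x_pred (variance p_pred) with measurement z (variance r)."""
    k = p_pred / (p_pred + r)          # Kalman gain in [0, 1]
    x_new = x_pred + k * (z - x_pred)  # corrected state estimate
    p_new = (1.0 - k) * p_pred         # reduced uncertainty
    return x_new, p_new

# Forecast 100 kWh (var 4), meter reads 108 kWh (var 4): split the difference.
x, p = kalman_update(100.0, 4.0, 108.0, 4.0)
```

    With equal variances the gain is 0.5, so the corrected estimate lands midway between forecast and measurement, and the posterior variance halves.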

  20. Controller synthesis for negative imaginary systems: a data driven approach

    KAUST Repository

    Mabrok, Mohamed

    2016-02-17

    The negative imaginary (NI) property occurs in many important applications. For instance, flexible structure systems with collocated force actuators and position sensors can be modelled as negative imaginary systems. In this study, a data-driven controller synthesis methodology for NI systems is presented. In this approach, measured frequency response data of the plant is used to construct the controller frequency response at every frequency by minimising a cost function. Then, this controller response is used to identify the controller transfer function using system identification methods. © The Institution of Engineering and Technology 2016.

  1. Data driven information system for supervision of judicial open

    Directory of Open Access Journals (Sweden)

    Ming LI

    2016-08-01

    Aiming at the four outstanding problems of informationized supervision for judicial publicity, judicial public data are classified in a data-driven manner to extract the data of final value. Then, the functional structure, technical structure and business structure of the data processing system are put forward, including a data collection module, data reduction module, data analysis module, data application module, data security module, etc. Development of a data processing system based on these structures can effectively reduce the work intensity of judicial open information management, summarize the state of the work, identify problems, and raise the level of judicial publicity.

  2. The Under-represented in the Data-Driven Economy

    DEFF Research Database (Denmark)

    Bechmann, Anja

    2017-01-01

    …-jewellery). These data traces are increasingly used to inform product and processual decisions by companies that want to ‘listen’ to the user and optimize products and revenue accordingly or governments that want to ‘adjust’ behavior using large data streams and big data methods. What are the democratic implications … of this data driven economy, in which data-enriched decisions may have profound consequence for the equal representation of individuals in society? How does the democratic society make sure that data traces actually represent the user and that all users are part of the data processing on equal terms…

  3. Data-Driven Forecasting Schemes: Evaluation and Applications

    Directory of Open Access Journals (Sweden)

    J. V. Jr

    2007-01-01

    A reliable multi-step predictor is very useful to a wide array of applications for forecasting the behavior of dynamic systems. The objective of this paper is to develop a more robust data-driven predictor for time series forecasting. Based on simulation analysis, it is found that multi-step-ahead forecasting schemes based on step inputs perform better than those based on sequential inputs. It is also realized that the recurrent neural fuzzy predictor is superior to both recurrent neural networks and feedforward networks. In order to enhance the forecasting convergence, a hybrid training technique is proposed based on real-time recurrent training and a weighted least-squares estimate. The developed predictor is also implemented for real-time applications in material property testing. The investigation results show that the developed adaptive predictor is a reliable forecasting tool. It can capture the system’s dynamic behavior quickly and track the system’s characteristics accurately. Its performance is superior to other classical data-driven forecasting schemes.
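
    The distinction between the schemes compared above can be made concrete with the recursive (sequential-input) strategy, where each prediction is fed back as the next input, so errors can compound over the horizon. The AR(1) stand-in model below is purely illustrative:

```python
def recursive_forecast(one_step_model, last_value, steps):
    """Multi-step-ahead forecast by iterating a one-step predictor."""
    preds, y = [], last_value
    for _ in range(steps):
        y = one_step_model(y)   # prediction becomes the next input
        preds.append(y)
    return preds

ar1 = lambda y: 0.9 * y         # stand-in one-step predictor
preds = recursive_forecast(ar1, 1.0, 3)
```

    A step-input scheme instead trains a dedicated mapping from the current state to each horizon, avoiding this feedback of prediction errors, which is consistent with the finding that step-input schemes perform better.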

  4. A data driven nonlinear stochastic model for blood glucose dynamics.

    Science.gov (United States)

    Zhang, Yan; Holt, Tim A; Khovanova, Natalia

    2016-03-01

    The development of adequate mathematical models for blood glucose dynamics may improve early diagnosis and control of diabetes mellitus (DM). We have developed a stochastic nonlinear second order differential equation to describe the response of blood glucose concentration to food intake using continuous glucose monitoring (CGM) data. A variational Bayesian learning scheme was applied to define the number and values of the system's parameters by iterative optimisation of free energy. The model has the minimal order and number of parameters to successfully describe blood glucose dynamics in people with and without DM. The model accounts for the nonlinearity and stochasticity of the underlying glucose-insulin dynamic process. Being data-driven, it takes full advantage of available CGM data and, at the same time, reflects the intrinsic characteristics of the glucose-insulin system without detailed knowledge of the physiological mechanisms. We have shown that the dynamics of some postprandial blood glucose excursions can be described by a reduced (linear) model, previously seen in the literature. A comprehensive analysis demonstrates that deterministic system parameters belong to different ranges for diabetes and controls. Implications for clinical practice are discussed. This is the first study introducing a continuous data-driven nonlinear stochastic model capable of describing both DM and non-DM profiles.
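
    The model class described, a stochastic nonlinear second-order response of blood glucose to food intake, can be written schematically as follows; the symbols are illustrative, not the paper's exact notation:

```latex
\ddot{x}(t) + a\,\dot{x}(t) + b\,x(t) + f\!\left(x,\dot{x}\right) = c\,u(t) + \xi(t)
```

    Here \(x(t)\) is the deviation of blood glucose concentration from baseline, \(u(t)\) the food-intake input, \(f\) a nonlinear correction, and \(\xi(t)\) a stochastic term; the reduced linear model mentioned in the abstract corresponds to dropping \(f\).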

  5. Data-driven execution of fast multipole methods

    KAUST Repository

    Ltaief, Hatem

    2013-09-17

    Fast multipole methods (FMMs) have O(N) complexity, are compute bound, and require very little synchronization, which makes them a favorable algorithm on next-generation supercomputers. Their most common application is to accelerate N-body problems, but they can also be used to solve boundary integral equations. When the particle distribution is irregular and the tree structure is adaptive, load balancing becomes a non-trivial question. A common strategy for load balancing FMMs is to use the workload from the previous step as weights to statically repartition the next step. The authors discuss in the paper another approach based on data-driven execution to efficiently tackle this challenging load balancing problem. The core idea consists of breaking the most time-consuming stages of the FMMs into smaller tasks. The algorithm can then be represented as a directed acyclic graph where nodes represent tasks and edges represent dependencies among them. The execution of the algorithm is performed by asynchronously scheduling the tasks using the QUARK (QUeueing And Runtime for Kernels) runtime environment, in a way such that data dependencies are not violated for numerical correctness purposes. This asynchronous scheduling results in an out-of-order execution. The performance results of the data-driven FMM execution outperform the previous strategy and show linear speedup on a quad-socket quad-core Intel Xeon system. Copyright © 2013 John Wiley & Sons, Ltd.
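
    The core idea, tasks as nodes of a directed acyclic graph executed as soon as their dependencies clear, can be sketched with a toy scheduler. The task names echo common FMM kernel stages (P2M, M2M, M2L, L2L, P2P, L2P), but the graph and its granularity are illustrative, not the paper's actual decomposition:

```python
from collections import deque

deps = {                             # task -> tasks it depends on
    "P2M:0": [], "P2M:1": [],
    "M2M": ["P2M:0", "P2M:1"],
    "M2L": ["M2M"],
    "L2L": ["M2L"],
    "P2P": [],                       # independent near-field work
    "L2P": ["L2L", "P2P"],
}

def schedule(deps):
    """Dispatch tasks in any dependency-respecting (out-of-order) sequence."""
    indeg = {t: len(d) for t, d in deps.items()}
    children = {t: [] for t in deps}
    for t, d in deps.items():
        for parent in d:
            children[parent].append(t)
    ready = deque(t for t, n in indeg.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()          # a runtime would hand this to a free core
        order.append(t)
        for c in children[t]:
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return order

order = schedule(deps)
```

    A real runtime dispatches the ready set to worker threads concurrently; the point is that independent work such as the near-field P2P task can overlap with the tree traversal.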

  6. Data driven CAN node reliability assessment for manufacturing system

    Science.gov (United States)

    Zhang, Leiming; Yuan, Yong; Lei, Yong

    2017-01-01

    The reliability of the Controller Area Network (CAN) is critical to the performance and safety of the system. However, direct bus-off time assessment tools are lacking in practice due to inaccessibility of the node information and the complexity of the node interactions upon errors. In order to measure the mean time to bus-off (MTTB) of all the nodes, a novel data driven node bus-off time assessment method for CAN networks is proposed by directly using network error information. First, the corresponding network error event sequence for each node is constructed using multiple-layer network error information. Then, the generalized zero inflated Poisson process (GZIP) model is established for each node based on the error event sequence. Finally, the stochastic model is constructed to predict the MTTB of the node. The accelerated case studies with different error injection rates are conducted on a laboratory network to demonstrate the proposed method, where the network errors are generated by a computer-controlled error injection system. Experiment results show that the MTTB of nodes predicted by the proposed method agree well with observations in the case studies. The proposed data driven node time to bus-off assessment method for CAN networks can successfully predict the MTTB of nodes by directly using network error event data.
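
    The building block of the GZIP model is the zero-inflated Poisson distribution, which mixes a structural-zero probability with an ordinary Poisson count. A sketch of its probability mass function (the parameter values are illustrative, not fitted to network data):

```python
import math

def zip_pmf(k, lam, pi):
    """P(X = k) for a zero-inflated Poisson: with probability pi the error
    counter emits a structural zero, otherwise a Poisson(lam) count."""
    base = math.exp(-lam) * lam**k / math.factorial(k)
    return pi + (1.0 - pi) * base if k == 0 else (1.0 - pi) * base

total = sum(zip_pmf(k, lam=2.0, pi=0.3) for k in range(60))  # ~1 by normalization
p_zero = zip_pmf(0, lam=2.0, pi=0.3)
```

    Zero inflation captures intervals in which a node simply produces no errors at all, which a plain Poisson model would underweight.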

  7. Locative media and data-driven computing experiments

    Directory of Open Access Journals (Sweden)

    Sung-Yueh Perng

    2016-06-01

    Over the past two decades urban social life has undergone a rapid and pervasive geocoding, becoming mediated, augmented and anticipated by location-sensitive technologies and services that generate and utilise big, personal, locative data. The production of these data has prompted the development of exploratory data-driven computing experiments that seek to find ways to extract value and insight from them. These projects often start from the data, rather than from a question or theory, and try to imagine and identify their potential utility. In this paper, we explore the desires and mechanics of data-driven computing experiments. We demonstrate how both locative media data and computing experiments are ‘staged’ to create new values and computing techniques, which in turn are used to try and derive possible futures that are ridden with unintended consequences. We argue that using computing experiments to imagine potential urban futures produces effects that often have little to do with creating new urban practices. Instead, these experiments promote Big Data science and the prospect that data produced for one purpose can be recast for another and act as alternative mechanisms of envisioning urban futures.

  8. Data-Driven Adaptive Observer for Fault Diagnosis

    Directory of Open Access Journals (Sweden)

    Shen Yin

    2012-01-01

    This paper presents an approach for data-driven design of a fault diagnosis system. The proposed fault diagnosis scheme consists of an adaptive residual generator and a bank of isolation observers, whose parameters are directly identified from the process data without identification of a complete process model. To deal with normal variations in the process, the parameters of the residual generator are online updated by standard adaptive technique to achieve reliable fault detection performance. After a fault is successfully detected, the isolation scheme will be activated, in which each isolation observer serves as an indicator corresponding to occurrence of a particular type of fault in the process. The thresholds can be determined analytically or through estimating the probability density function of related variables. To illustrate the performance of the proposed fault diagnosis approach, a laboratory-scale three-tank system is utilized. It shows that the proposed data-driven scheme is efficient to deal with applications whose analytical process models are unavailable. Especially for large-scale plants, whose physical models are generally difficult to establish, the proposed approach may offer an effective alternative solution for process monitoring.
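
    The detection stage reduces to thresholding the residual between measured and predicted outputs. In the paper the prediction comes from the identified residual generator and the threshold is derived analytically or from an estimated density; in this sketch both the predictions and the fixed threshold are stand-ins, and the signals are synthetic:

```python
def detect(y_meas, y_pred, threshold=0.5):
    """Flag samples where |measured - predicted| exceeds the threshold."""
    residual = [m - p for m, p in zip(y_meas, y_pred)]
    return [abs(r) > threshold for r in residual]

y_pred = [1.0] * 6                            # residual generator's prediction
y_meas = [1.02, 0.97, 1.01, 1.9, 2.1, 2.0]    # fault enters at sample 3
alarms = detect(y_meas, y_pred)
```

    Once an alarm is raised, the bank of isolation observers would then be activated to attribute the fault to a particular type.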

  9. Product design pattern based on big data-driven scenario

    Directory of Open Access Journals (Sweden)

    Conggang Yu

    2016-07-01

    This article discusses new product design patterns in the big data era, offering designers a new way of rational thinking and a new way to understand product design. Based on the key criteria of the product design process (category, element, and product), data comprising both concrete and abstract data are used as input, enlarging the criteria of the product design process to establish a model of a big data-driven product design pattern. Moreover, an experiment and a product design case are conducted to verify the feasibility of the new pattern. Ultimately, we conclude that data-driven product design has two patterns: one in which concrete data support the product design, namely the “product–data–product” pattern, and a second based on the value of abstract data for product design, namely the “data–product–data” pattern. Through the data, users involve themselves in the design development process. Data and products form a huge network in which data play the role of connections or nodes, so the essence of design is to find a new connection based on element and a new node based on category.

  10. Data-driven Science in Geochemistry & Petrology: Vision & Reality

    Science.gov (United States)

    Lehnert, K. A.; Ghiorso, M. S.; Spear, F. S.

    2013-12-01

    Science in many fields is increasingly ‘data-driven’. Though referred to as a ‘new’ Fourth Paradigm (Hey, 2009), data-driven science is not new, and examples are cited in the Geochemical Society's data policy, including the compilation of Dziewonski & Anderson (1981) that led to PREM, and Zindler & Hart (1986), who compiled mantle isotope data to present for the first time a comprehensive view of the Earth's mantle. Today, rapidly growing data volumes, ubiquity of data access, and new computational and information management technologies enable data-driven science at a radically advanced scale of speed, extent, flexibility, and inclusiveness, with the ability to seamlessly synthesize observations, experiments, theory, and computation, and to statistically mine data across disciplines, leading to more comprehensive, well informed, and high impact scientific advances. Are geochemists, petrologists, and volcanologists ready to participate in this revolution of the scientific process? In the past year, researchers from the VGP community and related disciplines have come together at several cyberinfrastructure related workshops, in part prompted by the EarthCube initiative of the US NSF, to evaluate the status of cyberinfrastructure in their field, to put forth key scientific challenges, and identify primary data and software needs to address these. Science scenarios developed by workshop participants that range from non-equilibrium experiments focusing on mass transport, chemical reactions, and phase transformations (J. Hammer) to defining the abundance of elements and isotopes in every voxel in the Earth (W. McDonough), demonstrate the potential of cyberinfrastructure enabled science, and define the vision of how data access, visualization, analysis, computation, and cross-domain interoperability can and should support future research in VGP. The primary obstacle for data-driven science in VGP remains the dearth of accessible, integrated data from lab and sensor…

  11. Data-driven warehouse optimization : Deploying skills of order pickers

    NARCIS (Netherlands)

    M. Matusiak (Marek); M.B.M. de Koster (René); J. Saarinen (Jari)

    2015-01-01

    Batching orders and routing order pickers is a commonly studied problem in many picker-to-parts warehouses. The impact of individual differences in picking skills on performance has received little attention. In this paper, we show that taking into account differences in the skills of…

  12. Data-Driven Techniques for Regional Groundwater Level Forecasts

  13. Data-Driven Techniques for Regional Groundwater Level Forecasts

    Science.gov (United States)

    Chang, F. J.; Chang, L. C.; Tsai, F. H.; Shen, H. Y.

    2015-12-01

    Data-Driven Techniques for Regional Groundwater Level Forecasts. Fi-John Chang(a), Li-Chiu Chang(b), Fong He Tsai(a), Hung-Yu Shen(b). (a) Department of Bioenvironmental Systems Engineering, National Taiwan University, Taipei 10617, Taiwan, ROC. (b) Department of Water Resources and Environmental Engineering, Tamkang University, New Taipei City 25137, Taiwan, ROC. Correspondence to: Fi-John Chang (email: changfj@ntu.edu.tw). The alluvial fan of the Zhuoshui River in Taiwan is a good natural recharge area of groundwater. However, over-extraction of groundwater in the coastal area results in serious land subsidence. Groundwater systems are heterogeneous with diverse temporal-spatial patterns, and it is very difficult to quantify their complex processes. Data-driven methods can effectively capture the spatial-temporal characteristics of input-output patterns at different scales, accurately imitating dynamic complex systems with modest computational requirements. In this study, we implement various data-driven methods to predict regional groundwater level variations in order to support countermeasures against the land subsidence issue in the study area. We first establish the relationship between regional rainfall, streamflow and groundwater levels, and then construct intelligent groundwater level prediction models for the basin based on long-term (2000-2013) regional monthly data sets collected from the Zhuoshui River basin. We analyze the interaction between hydrological factors and groundwater level variations; apply the self-organizing map (SOM) to obtain clustering results of the spatial-temporal groundwater level variations; and then apply the recurrent configuration of the nonlinear autoregressive model with exogenous inputs (R-NARX) to predict the monthly groundwater levels. As a consequence, a regional intelligent groundwater level prediction model can be constructed based on the adaptive results of the SOM. Results demonstrate that the development

  14. Batch By Batch Longitudinal Emittance Blowup MD

    CERN Document Server

    Mastoridis, T; Butterworth, A; Jaussi, M; Molendijk, J

    2012-01-01

    The transverse bunch emittance increases significantly at 450 GeV from the time of injection until the ramp due to IBS. By selectively blowing up the longitudinal emittance of the incoming batch at each injection, it should be possible to reduce the transverse emittance growth rates due to IBS. An MD was conducted on April 22nd 2012 to test the feasibility and performance of the batch-by-batch longitudinal emittance blowup. There were three main goals during the MD: first, to test the developed hardware, firmware, and software for the batch-by-batch blowup; then, to measure the transverse emittance growth rates of blown-up and "witness" batches to quantify any improvement; and finally, to test the ALLInjectSequencer class, which deals with the complicated gymnastics of introducing or masking the new batch to the various RF loops.

  15. submitter Data-driven RBE parameterization for helium ion beams

    CERN Document Server

    Mairani, A; Dokic, I; Valle, S M; Tessonnier, T; Galm, R; Ciocca, M; Parodi, K; Ferrari, A; Jäkel, O; Haberer, T; Pedroni, P; Böhlen, T T

    2016-01-01

    Helium ion beams are expected to be available again in the near future for clinical use. A suitable formalism to obtain relative biological effectiveness (RBE) values for treatment planning (TP) studies is needed. In this work we developed a data-driven RBE parameterization based on published in vitro experimental values. The RBE parameterization has been developed within the framework of the linear-quadratic (LQ) model as a function of the helium linear energy transfer (LET), dose and the tissue-specific parameter $(\alpha/\beta)_{\text{ph}}$ of the LQ model for the reference radiation. Analytic expressions are provided, derived from the collected database, describing the $\text{RBE}_{\alpha}=\alpha_{\text{He}}/\alpha_{\text{ph}}$ and $R_{\beta}=\beta_{\text{He}}/\beta_{\text{ph}}$ ratios as a function of LET. Calculated RBE values at 2 Gy photon dose and at 10% survival ($\text{RBE}_{10}$) are compared with the experimental ones. Pearson's correlati...
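
    The LQ bookkeeping behind such a parameterization is compact enough to sketch. The snippet below computes an RBE at a fixed survival level as the ratio of photon dose to helium dose, given photon LQ parameters and the two ratios the paper fits; all numeric values here are illustrative placeholders, not the paper's fitted LET-dependent coefficients.

    ```python
    import math

    def lq_survival(dose, alpha, beta):
        # Linear-quadratic model: S(d) = exp(-alpha*d - beta*d^2)
        return math.exp(-alpha * dose - beta * dose * dose)

    def dose_at_survival(surv, alpha, beta):
        # Invert S(d) = surv for the positive dose root.
        effect = -math.log(surv)
        if beta == 0.0:
            return effect / alpha
        return (-alpha + math.sqrt(alpha * alpha + 4.0 * beta * effect)) / (2.0 * beta)

    def rbe_at_survival(surv, alpha_ph, beta_ph, rbe_alpha, r_beta):
        # RBE = photon dose / helium dose giving the same survival, with
        # alpha_He = RBE_alpha * alpha_ph and beta_He = R_beta * beta_ph.
        d_ph = dose_at_survival(surv, alpha_ph, beta_ph)
        d_he = dose_at_survival(surv, rbe_alpha * alpha_ph, r_beta * beta_ph)
        return d_ph / d_he

    # Illustrative placeholder numbers (an alpha/beta = 2 Gy tissue); the
    # paper instead derives RBE_alpha and R_beta from LET-dependent fits.
    rbe10 = rbe_at_survival(0.10, alpha_ph=0.2, beta_ph=0.1, rbe_alpha=1.5, r_beta=1.2)
    ```

    With these placeholder inputs the helium beam is more effective per unit dose, so the computed RBE at 10% survival comes out above one, as expected for a densely ionizing beam.
    
    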

  16. Using Shape Memory Alloys: A Dynamic Data Driven Approach

    KAUST Repository

    Douglas, Craig C.

    2013-06-01

    Shape Memory Alloys (SMAs) are capable of changing their crystallographic structure due to changes of either stress or temperature. SMAs are used in a number of aerospace devices and are required in some devices in exotic environments. We are developing dynamic data driven application system (DDDAS) tools to monitor and change SMAs in real time for delivering payloads by aerospace vehicles. We must be able to turn on and off the sensors and heating units, change the stress on the SMA, monitor on-line data streams, change scales based on incoming data, and control what type of data is generated. The application must have the capability to be run and steered remotely as an unmanned feedback control loop.

  17. Data-driven system to predict academic grades and dropout.

    Science.gov (United States)

    Rovira, Sergi; Puertas, Eloi; Igual, Laura

    2017-01-01

    Nowadays, the role of a tutor is more important than ever to prevent student dropout and improve academic performance. This work proposes a data-driven system to extract relevant information hidden in student academic data and, thus, help tutors offer their pupils more proactive personal guidance. In particular, our system, based on machine learning techniques, makes predictions of students' dropout intention and course grades, as well as personalized course recommendations. Moreover, we present different visualizations which help in the interpretation of the results. In the experimental validation, we show that the system obtains promising results with data from the degree studies in Law, Computer Science and Mathematics of the Universitat de Barcelona.

  18. Integrative systems biology for data-driven knowledge discovery.

    Science.gov (United States)

    Greene, Casey S; Troyanskaya, Olga G

    2010-09-01

    Integrative systems biology is an approach that brings together diverse high-throughput experiments and databases to gain new insights into biological processes or systems at molecular through physiological levels. These approaches rely on diverse high-throughput experimental techniques that generate heterogeneous data by assaying varying aspects of complex biological processes. Computational approaches are necessary to provide an integrative view of these experimental results and enable data-driven knowledge discovery. Hypotheses generated from these approaches can direct definitive molecular experiments in a cost-effective manner. By using integrative systems biology approaches, we can leverage existing biological knowledge and large-scale data to improve our understanding of as yet unknown components of a system of interest and how its malfunction leads to disease.

  19. Data-driven parameterization of the generalized Langevin equation.

  20. Data-driven parameterization of the generalized Langevin equation.

    Science.gov (United States)

    Lei, Huan; Baker, Nathan A; Li, Xiantao

    2016-12-13

    We present a data-driven approach to determine the memory kernel and random noise in generalized Langevin equations. To facilitate practical implementations, we parameterize the kernel function in the Laplace domain by a rational function, with coefficients directly linked to the equilibrium statistics of the coarse-grain variables. We show that such an approximation can be constructed to arbitrarily high order and the resulting generalized Langevin dynamics can be embedded in an extended stochastic model without explicit memory. We demonstrate how to introduce the stochastic noise so that the second fluctuation-dissipation theorem is exactly satisfied. Results from several numerical tests are presented to demonstrate the effectiveness of the proposed method.
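
    The lowest-order case of the rational Laplace-domain parameterization is a single exponential in time, since $K(t)=c\,e^{-\lambda t}$ corresponds to $\tilde K(s)=c/(s+\lambda)$. As a minimal sketch (not the paper's higher-order construction), the snippet below recovers $(c,\lambda)$ from samples of a memory kernel by log-linear regression; the sample kernel is synthetic.

    ```python
    import math

    def fit_single_exponential(ts, ks):
        # Fit K(t) ~ c * exp(-lam * t) by linear regression on log K(t).
        # This single exponential is the lowest-order rational
        # Laplace-domain approximation K~(s) = c / (s + lam).
        n = len(ts)
        ys = [math.log(k) for k in ks]
        mean_t = sum(ts) / n
        mean_y = sum(ys) / n
        cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
        var = sum((t - mean_t) ** 2 for t in ts)
        slope = cov / var
        intercept = mean_y - slope * mean_t
        return math.exp(intercept), -slope  # (c, lam)

    # Synthetic memory-kernel samples: exact values of 2*exp(-0.5*t).
    ts = [0.1 * i for i in range(1, 50)]
    ks = [2.0 * math.exp(-0.5 * t) for t in ts]
    c, lam = fit_single_exponential(ts, ks)
    ```

    On noise-free exponential data the regression recovers the amplitude and decay rate essentially exactly; in the paper's setting the coefficients are instead tied to equilibrium statistics of the coarse-grained variables and the rational order can be raised as needed.
    
    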

  1. Data-driven parameterization of the generalized Langevin equation

    CERN Document Server

    Lei, Huan; Li, Xiantao

    2016-01-01

    We present a data-driven approach to determine the memory kernel and random noise in generalized Langevin equations. To facilitate practical implementations, we parameterize the kernel function in the Laplace domain by a rational function, with coefficients directly linked to the equilibrium statistics of the coarse-grain variables. We show that such an approximation can be constructed to arbitrarily high order and the resulting generalized Langevin dynamics can be embedded in an extended stochastic model without explicit memory. We demonstrate how to introduce the stochastic noise so that the second fluctuation-dissipation theorem is exactly satisfied. Results from several numerical tests are presented to demonstrate the effectiveness of the proposed method.

  2. Data-driven forward model inference for EEG brain imaging

    DEFF Research Database (Denmark)

    Hansen, Sofie Therese; Hauberg, Søren; Hansen, Lars Kai

    2016-01-01

    Electroencephalography (EEG) is a flexible and accessible tool with excellent temporal resolution but with a spatial resolution hampered by volume conduction. Reconstruction of the cortical sources of measured EEG activity partly alleviates this problem and effectively turns EEG into a brain......-of-concept study, we show that, even when anatomical knowledge is unavailable, a suitable forward model can be estimated directly from the EEG. We propose a data-driven approach that provides a low-dimensional parametrization of head geometry and compartment conductivities, built using a corpus of forward models....... Combined with only a recorded EEG signal, we are able to estimate both the brain sources and a person-specific forward model by optimizing this parametrization. We thus not only solve an inverse problem, but also optimize over its specification. Our work demonstrates that personalized EEG brain imaging...

  3. Data-driven simulation methodology using DES 4-layer architecture

  4. Data-driven simulation methodology using DES 4-layer architecture

    Directory of Open Access Journals (Sweden)

    Aida Saez

    2016-05-01

    In this study, we present a methodology for building data-driven simulation models of manufacturing plants. We go further than other research proposals and suggest organizing simulation model development under a 4-layer architecture (network, logic, database and visual reality). The Network layer includes the system infrastructure. The Logic layer covers the operations planning and control system and the material handling equipment system. The Database holds all the information needed to perform the simulation, the results used for analysis, and the values that the Logic layer uses to manage the plant. Finally, the Visual Reality layer displays an augmented reality system including not only the machinery and its movement but also blackboards and other Andon elements. This architecture provides numerous advantages, as it helps to build a simulation model that consistently considers internal logistics in a very flexible way.

  5. A purely data driven method for European option valuation

    Institute of Scientific and Technical Information of China (English)

    HUANG Guang-hui; WAN Jian-ping

    2006-01-01

    An alternative option pricing method is proposed based on a random walk market model. The minimal entropy martingale measure, which admits no arbitrage opportunities in the market, is deduced for this market model and is used as the pricing measure to evaluate European call options by Monte Carlo simulation. The proposed method is a purely data-driven valuation method without any distributional assumption about the price process of the underlying asset. The performance of the proposed method is compared with the canonical valuation method and the historical volatility-based Black-Scholes method in an artificial Black-Scholes world. The simulation results show that the proposed method has merits and is valuable to financial engineering.
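
    The distribution-free idea can be sketched with a bootstrap Monte Carlo pricer. The snippet below resamples historical daily log-returns and recentres them to the risk-free drift; that moment-matching recentring is a crude stand-in for the minimal entropy martingale measure the paper derives, and the return series shown is invented for illustration.

    ```python
    import math
    import random

    def mc_call_price(s0, strike, rate, horizon_days, returns, n_paths=4000, seed=7):
        # Bootstrap Monte Carlo price of a European call from historical
        # daily log-returns. Recentring the returns so their mean equals
        # the daily risk-free rate is a crude moment-matching stand-in
        # for the minimal entropy martingale measure used in the paper.
        rng = random.Random(seed)
        daily_rf = rate / 252.0
        mean_r = sum(returns) / len(returns)
        adjusted = [r - mean_r + daily_rf for r in returns]
        discount = math.exp(-rate * horizon_days / 252.0)
        payoff_sum = 0.0
        for _ in range(n_paths):
            log_s = math.log(s0)
            for _ in range(horizon_days):
                log_s += rng.choice(adjusted)  # resample a historical return
            payoff_sum += max(math.exp(log_s) - strike, 0.0)
        return discount * payoff_sum / n_paths

    # Hypothetical historical daily log-returns (illustrative values only).
    returns = [0.012, -0.008, 0.005, -0.011, 0.003, 0.009, -0.004, 0.002, -0.006, 0.007]
    price = mc_call_price(100.0, 100.0, 0.01, 21, returns)
    ```

    No lognormal (or any other) distribution is assumed: the simulated paths inherit whatever shape the empirical return sample has, which is the essence of the data-driven valuation the abstract describes.
    
    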

  6. Data-driven identification of potential Zika virus vectors

    Science.gov (United States)

    Evans, Michelle V; Dallas, Tad A; Han, Barbara A; Murdock, Courtney C; Drake, John M

    2017-01-01

    Zika is an emerging virus whose rapid spread is of great public health concern. Knowledge about transmission remains incomplete, especially concerning potential transmission in geographic areas in which it has not yet been introduced. To identify unknown vectors of Zika, we developed a data-driven model linking vector species and the Zika virus via vector-virus trait combinations that confer a propensity toward associations in an ecological network connecting flaviviruses and their mosquito vectors. Our model predicts that thirty-five species may be able to transmit the virus, seven of which are found in the continental United States, including Culex quinquefasciatus and Cx. pipiens. We suggest that empirical studies prioritize these species to confirm predictions of vector competence, enabling the correct identification of populations at risk for transmission within the United States. DOI: http://dx.doi.org/10.7554/eLife.22053.001 PMID:28244371

  7. Facilitating Data Driven Business Model Innovation - A Case study

    DEFF Research Database (Denmark)

    Bjerrum, Torben Cæsar Bisgaard; Andersen, Troels Christian; Aagaard, Annabeth

    2016-01-01

    , that gathers knowledge is of great importance. The SMEs have little, if no experience, within data handling, data analytics, and working with structured Business Model Innovation (BMI), that relates to both new and conventional products, processes and services. This new frontier of data and BMI will have......This paper aims to understand the barriers that businesses meet in understanding their current business models (BM) and in their attempt at innovating new data driven business models (DDBM) using data. The interdisciplinary challenge of knowledge exchange occurring outside and/or inside businesses...... ability to adapt these new DDBM depends on the ability to pick up, share and develop knowledge between customers, partners and the network. This knowledge can be embedded into core BMs and constitutes a strategic opportunity enabling businesses to extract value from data into BMI, resulting in DDBMs...

  8. Econophysics and Data Driven Modelling of Market Dynamics

    CERN Document Server

    Aoyama, Hideaki; Chakrabarti, Bikas; Chakraborti, Anirban; Ghosh, Asim; Econophysics and Data Driven Modelling of Market Dynamics

    2015-01-01

    This book presents the works and research findings of physicists, economists, mathematicians, statisticians, and financial engineers who have undertaken data-driven modelling of market dynamics and other empirical studies in the field of Econophysics. During recent decades, the financial market landscape has changed dramatically with the deregulation of markets and the growing complexity of products. The ever-increasing speed and decreasing costs of computational power and networks have led to the emergence of huge databases. The availability of these data should permit the development of models that are better founded empirically, and econophysicists have accordingly been advocating that one should rely primarily on the empirical observations in order to construct models and validate them. The recent turmoil in financial markets and the 2008 crash appear to offer a strong rationale for new models and approaches. The Econophysics community accordingly has an important future role to play in market modelling....

  9. Data-Driven Information Extraction from Chinese Electronic Medical Records.

    Directory of Open Access Journals (Sweden)

    Dong Xu

    This study aims to propose a data-driven framework that takes unstructured free-text narratives in Chinese Electronic Medical Records (EMRs) as input and converts them into structured time-event-description triples, where the description is either an elaboration or an outcome of the medical event. Our framework uses a hybrid approach. It consists of constructing cross-domain core medical lexica; an unsupervised, iterative algorithm to accrue more accurate terms into the lexica; rules to address Chinese writing conventions and temporal descriptors; and a Support Vector Machine (SVM) algorithm that innovatively utilizes Normalized Google Distance (NGD) to estimate the correlation between medical events and their descriptions. The effectiveness of the framework was demonstrated with a dataset of 24,817 de-identified Chinese EMRs. The cross-domain medical lexica were capable of recognizing terms with an F1-score of 0.896. 98.5% of recorded medical events were linked to temporal descriptors. The NGD SVM description-event matching achieved an F1-score of 0.874. The end-to-end time-event-description extraction of our framework achieved an F1-score of 0.846. In terms of named entity recognition, the proposed framework outperforms state-of-the-art supervised learning algorithms (F1-score: 0.896 vs. 0.886). In event-description association, the NGD SVM is superior to an SVM using only local context and semantic features (F1-score: 0.874 vs. 0.838). The framework is data-driven, weakly supervised, and robust against the variations and noise that tend to occur in a large corpus. It addresses Chinese medical writing conventions and variations in writing styles through patterns used for discovering new terms and rules for updating the lexica.
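
    The NGD feature at the heart of the event-description matching is a standard co-occurrence distance (Cilibrasi & Vitányi). The sketch below computes it from raw occurrence counts; the counts are made up for illustration, and in the paper's setting they would come from corpus statistics over the EMR collection.

    ```python
    import math

    def ngd(fx, fy, fxy, n_total):
        # Normalized Google Distance from occurrence counts:
        # fx, fy  -- documents containing term x / term y
        # fxy     -- documents containing both terms
        # n_total -- total documents indexed
        lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
        return (max(lx, ly) - lxy) / (math.log(n_total) - min(lx, ly))

    # Terms that always co-occur are at distance 0.
    d_same = ngd(100, 100, 100, 10**6)
    # Rare co-occurrence gives a larger distance (weaker association).
    d_far = ngd(1000, 1000, 10, 10**6)
    ```

    A small NGD between an event term and a candidate description term signals strong association, which is exactly the correlation estimate the paper feeds into its SVM matcher.
    
    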

  10. Data-driven non-Markovian closure models

    Science.gov (United States)

    Kondrashov, Dmitri; Chekroun, Mickaël D.; Ghil, Michael

    2015-03-01

    This paper has two interrelated foci: (i) obtaining stable and efficient data-driven closure models by using a multivariate time series of partial observations from a large-dimensional system; and (ii) comparing these closure models with the optimal closures predicted by the Mori-Zwanzig (MZ) formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a generalization and a time-continuous limit of existing multilevel, regression-based approaches to closure in a data-driven setting; these approaches include empirical model reduction (EMR), as well as more recent multi-layer modeling. It is shown that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the MZ formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are derived on the structure of the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a broad class of MSM applications, a class that includes non-polynomial predictors and nonlinearities that do not necessarily preserve quadratic energy invariants. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. It is shown that the resulting closure model with energy-conserving nonlinearities efficiently captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The challenges here include the rarity of strange attractors in the model's parameter

  11. Data-driven discovery of partial differential equations

    Science.gov (United States)

    Rudy, Samuel; Brunton, Steven; Proctor, Joshua; Kutz, J. Nathan

    2016-11-01

    Fluid dynamics is inherently governed by spatial-temporal interactions which can be characterized by partial differential equations (PDEs). Emerging sensor and measurement technologies allowing for rich, time-series data collection motivate new data-driven methods for discovering governing equations. We present a novel computational technique for discovering governing PDEs from time series measurements. A library of candidate terms for the PDE, including nonlinearities and partial derivatives, is computed, and sparse regression is then used to identify a subset which accurately reflects the measured dynamics. Measurements may be taken either in an Eulerian framework to discover field equations or in a Lagrangian framework to study a single stochastic trajectory. The method is shown to be robust, efficient, and to work on a variety of canonical equations. Data collected from a simulation of a flow field around a cylinder is used to accurately identify the Navier-Stokes vorticity equation and the Reynolds number to within 1%. A single trace of Brownian motion is also used to identify the diffusion equation. Our method provides a novel approach towards data-enabled science, where spatial-temporal information bolsters classical machine learning techniques to identify physical laws.
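
    The library-plus-sparse-regression recipe can be illustrated end to end on synthetic diffusion data. The sketch below builds a small candidate library [u, u_x, u_xx] from an exact two-mode solution of the heat equation u_t = D u_xx and recovers D by sequential thresholded least squares. It is a minimal stand-in for the authors' method, not their implementation: derivatives here are analytic rather than estimated from noisy measurements, and the library is tiny.

    ```python
    import math

    def solve_linear(a, b):
        # Gaussian elimination with partial pivoting for small dense systems.
        n = len(a)
        m = [row[:] + [b[i]] for i, row in enumerate(a)]
        for col in range(n):
            piv = max(range(col, n), key=lambda r: abs(m[r][col]))
            m[col], m[piv] = m[piv], m[col]
            for r in range(col + 1, n):
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
        x = [0.0] * n
        for r in range(n - 1, -1, -1):
            x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
        return x

    def least_squares(theta, ut):
        # Solve the normal equations Theta^T Theta c = Theta^T u_t.
        n = len(theta[0])
        ata = [[sum(row[i] * row[j] for row in theta) for j in range(n)] for i in range(n)]
        atb = [sum(row[i] * u for row, u in zip(theta, ut)) for i in range(n)]
        return solve_linear(ata, atb)

    def stls(theta, ut, threshold=0.05, iters=5):
        # Sequential thresholded least squares: refit, zeroing small terms.
        n = len(theta[0])
        active = list(range(n))
        coef = [0.0] * n
        for _ in range(iters):
            sub = [[row[i] for i in active] for row in theta]
            c = least_squares(sub, ut)
            coef = [0.0] * n
            for i, ci in zip(active, c):
                coef[i] = ci
            active = [i for i in active if abs(coef[i]) > threshold]
        return coef

    # Synthetic diffusion data u_t = D*u_xx with D = 0.4, two Fourier modes.
    D = 0.4
    rows, uts = [], []
    for ti in range(1, 6):
        t = 0.1 * ti
        for xi in range(1, 40):
            x = 0.15 * xi
            a, b = math.exp(-D * t), math.exp(-4 * D * t)
            u = a * math.sin(x) + b * math.sin(2 * x)
            ux = a * math.cos(x) + 2 * b * math.cos(2 * x)
            uxx = -a * math.sin(x) - 4 * b * math.sin(2 * x)
            rows.append([u, ux, uxx])   # candidate library [u, u_x, u_xx]
            uts.append(D * uxx)         # exact u_t
    coef = stls(rows, uts)
    ```

    The regression zeroes the u and u_x columns and keeps only u_xx with coefficient D, i.e. it "discovers" the diffusion equation from data, mirroring the abstract's Brownian-motion example on a smaller scale.
    
    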

  12. Data-driven encoding for quantitative genetic trait prediction.

    Science.gov (United States)

    He, Dan; Wang, Zhanyong; Parida, Laxmi

    2015-01-01

    Given a set of biallelic molecular markers, such as SNPs, with genotype values on a collection of plant, animal or human samples, the goal of quantitative genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Quantitative genetic trait prediction is usually represented as linear regression models which require quantitative encodings for the genotypes: the three distinct genotype values, corresponding to one heterozygous and two homozygous alleles, are usually coded as integers and manipulated algebraically in the model. Further, epistasis between multiple markers is modeled as multiplication between the markers, and it is unclear whether the regression model remains effective under this representation. In this work we investigate the effects of encodings on the quantitative genetic trait prediction problem. We first show that different encodings lead to different prediction accuracies in many test cases. We then propose a data-driven encoding strategy, where we encode the genotypes according to their distribution in the phenotypes and allow each marker to have a different encoding. We show in our experiments that this encoding strategy is able to improve the performance of the genetic trait prediction method and is more helpful for oligogenic traits, whose values rely on a relatively small set of markers. To the best of our knowledge, this is the first paper that discusses the effects of encodings on the genetic trait prediction problem.
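
    One plausible reading of "encode the genotypes according to their distribution in the phenotypes" is to replace the fixed 0/1/2 integers at a marker with the mean phenotype of the samples carrying each genotype. The sketch below does exactly that (per marker, so each marker can get its own encoding); the toy genotype/phenotype values are invented for illustration.

    ```python
    def data_driven_encoding(genotypes, phenotypes):
        # Map each genotype value (0/1/2) to the mean phenotype of the
        # samples carrying it, instead of the conventional fixed integers.
        sums, counts = {}, {}
        for g, p in zip(genotypes, phenotypes):
            sums[g] = sums.get(g, 0.0) + p
            counts[g] = counts.get(g, 0) + 1
        return {g: sums[g] / counts[g] for g in sums}

    # Toy data for one marker: note the heterozygote (1) has the highest
    # mean phenotype, so a monotone 0/1/2 coding would misorder the effects.
    genos = [0, 0, 1, 1, 2, 2, 2]
    phenos = [1.0, 2.0, 5.0, 7.0, 4.0, 4.0, 4.0]
    enc = data_driven_encoding(genos, phenos)
    encoded = [enc[g] for g in genos]
    ```

    The toy example shows why this can help: when genotype effects are non-additive, no relabeling of the fixed integers 0/1/2 orders them correctly, while the phenotype-derived encoding does so automatically.
    
    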

  13. A data-driven approach to quality risk management

    Directory of Open Access Journals (Sweden)

    Demissie Alemayehu

    2013-01-01

    Aim: An effective clinical trial strategy to ensure patient safety as well as trial quality and efficiency involves an integrated approach, including prospective identification of risk factors, mitigation of the risks through proper study design and execution, and assessment of quality metrics in real-time. Such an integrated quality management plan may also be enhanced by using data-driven techniques to identify risk factors that are most relevant in predicting quality issues associated with a trial. In this paper, we illustrate such an approach using data collected from actual clinical trials. Materials and Methods: Several statistical methods were employed, including the Wilcoxon rank-sum test and logistic regression, to identify the presence of association between risk factors and the occurrence of quality issues, applied to data on the quality of clinical trials sponsored by Pfizer. Results: Only a subset of the risk factors had a significant association with quality issues, including: whether the study used placebo, whether an agent was a biologic, unusual packaging label, complex dosing, and over 25 planned procedures. Conclusion: Proper implementation of the strategy can help to optimize resource utilization without compromising trial integrity and patient safety.
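
    The Wilcoxon rank-sum test mentioned in the methods is simple enough to sketch from scratch. The version below returns a z statistic under the normal approximation, with average ranks for ties but no tie correction; a real analysis would use a library implementation (e.g. scipy.stats) with exact p-values, and the data here are generic placeholders, not trial data.

    ```python
    import math

    def rank_sum_test(xs, ys):
        # Wilcoxon rank-sum z statistic (normal approximation, average
        # ranks for ties) for whether xs tends to exceed ys.
        pooled = sorted((v, i) for i, v in enumerate(xs + ys))
        ranks = [0.0] * len(pooled)
        i = 0
        while i < len(pooled):
            j = i
            while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
                j += 1
            avg_rank = (i + j) / 2.0 + 1.0  # average of ranks i+1 .. j+1
            for k in range(i, j + 1):
                ranks[pooled[k][1]] = avg_rank
            i = j + 1
        n1, n2 = len(xs), len(ys)
        w = sum(ranks[:n1])                     # rank sum of the first group
        mu = n1 * (n1 + n2 + 1) / 2.0           # mean of w under H0
        sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
        return (w - mu) / sigma
    ```

    In the paper's setting, xs and ys would be a quality metric split by the presence or absence of a candidate risk factor; a large |z| flags the factor as associated with quality issues.
    
    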

  14. Data driven fault detection and isolation: a wind turbine scenario

    Directory of Open Access Journals (Sweden)

    Rubén Francisco Manrique Piramanrique

    2015-04-01

    One of the greatest drawbacks in wind energy generation is the high maintenance cost associated with mechanical faults. This problem becomes more evident in utility-scale wind turbines, where the increased size and nominal capacity come with additional problems associated with structural vibrations and aeroelastic effects in the blades. Due to the increased operation capability, it is imperative to detect system degradation and faults in an efficient manner, maintaining system integrity and reliability and reducing operation costs. This paper presents a comprehensive comparison of four different Fault Detection and Isolation (FDI) filters based on "Data Driven" (DD) techniques. In order to enhance FDI performance, a multi-level strategy is used where the first level detects the occurrence of any given fault (detection), while the second identifies the source of the fault (isolation). Four different DD classification techniques (namely Support Vector Machines, Artificial Neural Networks, K-Nearest Neighbors and Gaussian Mixture Models) were studied and compared for each of the proposed classification levels. The best strategy at each level could be selected to build the final data-driven FDI system. The performance of the proposed scheme is evaluated on a benchmark model of a commercial wind turbine.
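
    As a concrete, toy illustration of the two-level detect-then-isolate strategy, the sketch below uses a plain 1-nearest-neighbour rule as a stand-in for the four classifiers the paper compares; the two-dimensional residual vectors and the fault labels ('pitch-sensor', 'generator') are invented for illustration.

    ```python
    def nearest_label(samples, point):
        # 1-nearest-neighbour classification by squared Euclidean distance.
        best = min(samples, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], point)))
        return best[1]

    def fdi(train, point):
        # Level 1 (detection): is any fault present at all?
        detect_set = [(x, 'fault' if label != 'ok' else 'ok') for x, label in train]
        if nearest_label(detect_set, point) == 'ok':
            return 'ok'
        # Level 2 (isolation): classify among the fault classes only.
        isolate_set = [(x, label) for x, label in train if label != 'ok']
        return nearest_label(isolate_set, point)

    # Invented two-feature residual vectors with their condition labels.
    train = [((0.0, 0.0), 'ok'), ((0.1, 0.1), 'ok'),
             ((5.0, 0.0), 'pitch-sensor'), ((0.0, 5.0), 'generator')]
    ```

    Splitting the task this way lets each level use whichever of the compared classifiers performs best there, which is exactly the selection the paper makes when assembling its final FDI system.
    
    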

  15. Data-driven forward model inference for EEG brain imaging.

    Science.gov (United States)

    Hansen, Sofie Therese; Hauberg, Søren; Hansen, Lars Kai

    2016-06-13

    Electroencephalography (EEG) is a flexible and accessible tool with excellent temporal resolution but with a spatial resolution hampered by volume conduction. Reconstruction of the cortical sources of measured EEG activity partly alleviates this problem and effectively turns EEG into a brain imaging device. The quality of the source reconstruction depends on the forward model which details head geometry and conductivities of different head compartments. These person-specific factors are complex to determine, requiring detailed knowledge of the subject's anatomy and physiology. In this proof-of-concept study, we show that, even when anatomical knowledge is unavailable, a suitable forward model can be estimated directly from the EEG. We propose a data-driven approach that provides a low-dimensional parametrization of head geometry and compartment conductivities, built using a corpus of forward models. Combined with only a recorded EEG signal, we are able to estimate both the brain sources and a person-specific forward model by optimizing this parametrization. We thus not only solve an inverse problem, but also optimize over its specification. Our work demonstrates that personalized EEG brain imaging is possible, even when the head geometry and conductivities are unknown.

  16. Realistic Data-Driven Traffic Flow Animation Using Texture Synthesis.

    Science.gov (United States)

    Chao, Qianwen; Deng, Zhigang; Ren, Jiaping; Ye, Qianqian; Jin, Xiaogang

    2017-01-11

    We present a novel data-driven approach to populate virtual road networks with realistic traffic flows. Specifically, given a limited set of vehicle trajectories as the input samples, our approach first synthesizes a large set of vehicle trajectories. By taking the spatio-temporal information of traffic flows as a 2D texture, the generation of new traffic flows can be formulated as a texture synthesis process, which is solved by minimizing a newly developed traffic texture energy. The synthesized output captures the spatio-temporal dynamics of the input traffic flows, and the vehicle interactions in it strictly follow traffic rules. After that, we position the synthesized vehicle trajectory data to virtual road networks using a cage-based registration scheme, where a few traffic-specific constraints are enforced to maintain each vehicle's original spatial location and synchronize its motion in concert with its neighboring vehicles. Our approach is intuitive to control and scalable to the complexity of virtual road networks. We validated our approach through many experiments and paired comparison user studies.

  17. Data driven uncertainty evaluation for complex engineered system design

    Science.gov (United States)

    Liu, Boyuan; Huang, Shuangxi; Fan, Wenhui; Xiao, Tianyuan; Humann, James; Lai, Yuyang; Jin, Yan

    2016-09-01

    Complex engineered systems are often difficult to analyze and design due to the tangled interdependencies among their subsystems and components. Conventional design methods often need exact modeling or accurate structure decomposition, which limits their practical application. The rapid expansion of data makes utilizing data to guide and improve system design indispensable in practical engineering. In this paper, a data-driven uncertainty evaluation approach is proposed to support the design of complex engineered systems. The core of the approach is a data-mining-based uncertainty evaluation method that predicts the uncertainty level of a specific system design by analyzing association relations along different system attributes and synthesizing the information entropy of the covered attribute areas; a quantitative measure of system uncertainty can be obtained accordingly. Monte Carlo simulation is introduced to get the uncertainty extrema, and the possible data distributions under different situations are discussed in detail. The uncertainty values can be normalized using the simulation results, and the values can be used to evaluate different system designs. A prototype system is established, and two case studies have been carried out. The case of an inverted pendulum system validates the effectiveness of the proposed method, and the case of an oil sump design shows the practicability when two or more design plans need to be compared. This research can be used to evaluate the uncertainty of complex engineered systems relying completely on data, and is ideally suited for plan selection and performance analysis in system design.
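
    The entropy-synthesis step can be sketched compactly. The snippet below scores a design by the average normalized Shannon entropy of its attribute columns; the paper normalizes with Monte Carlo extrema, whereas this sketch uses the analytic uniform-distribution bound log2(k) to keep things short, and the attribute data are invented.

    ```python
    import math

    def normalized_entropy(values):
        # Shannon entropy of the empirical distribution of a discrete
        # attribute, normalized to [0, 1] by the uniform maximum log2(k).
        counts = {}
        for v in values:
            counts[v] = counts.get(v, 0) + 1
        n = len(values)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        k = len(counts)
        return h / math.log2(k) if k > 1 else 0.0

    def design_uncertainty(attribute_columns):
        # Average normalized entropy across the attributes describing a
        # candidate design: higher means a more uncertain design region.
        scores = [normalized_entropy(col) for col in attribute_columns]
        return sum(scores) / len(scores)

    uniform = normalized_entropy(['a', 'b', 'c', 'd'])   # maximally uncertain
    skewed = normalized_entropy(['a'] * 9 + ['b'])       # nearly certain
    ```

    When two or more design plans are compared, as in the paper's oil sump case, the plan whose attribute data sit in lower-entropy (better-characterized) regions would be favored.
    
    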

  18. Mobile assessment in schizophrenia: a data-driven momentary approach.

    Science.gov (United States)

    Oorschot, Margreet; Lataster, Tineke; Thewissen, Viviane; Wichers, Marieke; Myin-Germeys, Inez

    2012-05-01

    In this article, a data-driven approach was adopted to demonstrate how real-life diary techniques [i.e., the experience sampling method (ESM)] could be deployed for assessment purposes in patients with psychotic disorder, delivering individualized and clinically relevant information. The dataset included patients in an acute phase of psychosis and the focus was on paranoia as one of the main psychotic symptoms (30 patients with high levels of paranoia and 34 with low levels of paranoia). Based on individual cases, it was demonstrated how (1) symptom and mood patterns, (2) patterns of social interactions or activities, (3) contextual risk profiles (e.g., is being among strangers, as opposed to family, associated with higher paranoia severity?), and (4) temporal dynamics between mood states and paranoia (e.g., does anxiety precipitate or follow the onset of increased paranoia severity?) substantially differ within individual patients and across the high vs low paranoia patient groups. Most strikingly, it was shown that individual findings differ from what is found at the overall group level. Some people stay anxious after a paranoid thought comes to mind. For others, paranoia is followed by a state of relaxation. It is discussed how ESM, surfacing the patient's implicit knowledge about symptom patterns, may provide an excellent starting point for person-tailored psychoeducation and for choosing the most applicable therapeutic intervention.

  20. Data-Driven Modeling and Prediction of Arctic Sea Ice

    Science.gov (United States)

    Kondrashov, Dmitri; Chekroun, Mickael; Ghil, Michael

    2016-04-01

    We present results of data-driven predictive analyses of sea ice over the main Arctic regions. Our approach relies on the Multilayer Stochastic Modeling (MSM) framework of Kondrashov, Chekroun and Ghil [Physica D, 2015] and it leads to probabilistic prognostic models of sea ice concentration (SIC) anomalies on seasonal time scales. This approach is applied to monthly time series of state-of-the-art data-adaptive decompositions of SIC and selected climate variables over the Arctic. We evaluate the predictive skill of MSM models by performing retrospective forecasts with "no-look ahead" for up to 6-months ahead. It will be shown in particular that the memory effects included intrinsically in the formulation of our non-Markovian MSM models allow for improvements of the prediction skill of large-amplitude SIC anomalies in certain Arctic regions on the one hand, and of September Sea Ice Extent, on the other. Further improvements allowed by the MSM framework will adopt a nonlinear formulation and explore next-generation data-adaptive decompositions, namely modification of Principal Oscillation Patterns (POPs) and rotated Multichannel Singular Spectrum Analysis (M-SSA).
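The MSM framework itself is beyond the scope of this abstract; as a hedged illustration of its basic ingredient, memory (lagged) terms, the sketch below fits a simple AR(2) model by least squares and recovers the lag coefficients from a synthetic series (all names and data are illustrative, not the authors' method):

```python
import random

def fit_ar(series, order):
    """Least-squares fit of an AR(order) model y_t = sum_k a_k * y_{t-k}."""
    n = len(series)
    # Accumulate the normal equations X^T X a = X^T y.
    xtx = [[0.0] * order for _ in range(order)]
    xty = [0.0] * order
    for t in range(order, n):
        x = [series[t - 1 - k] for k in range(order)]
        for i in range(order):
            xty[i] += x[i] * series[t]
            for j in range(order):
                xtx[i][j] += x[i] * x[j]
    # Solve the small dense system by Gauss-Jordan elimination.
    a = [row[:] + [xty[i]] for i, row in enumerate(xtx)]
    for col in range(order):
        piv = max(range(col, order), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(order):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][order] / a[i][i] for i in range(order)]

# Synthetic series with genuine memory: y_t = 0.6 y_{t-1} + 0.3 y_{t-2} + noise.
rng = random.Random(1)
y = [0.0, 0.0]
for _ in range(5000):
    y.append(0.6 * y[-1] + 0.3 * y[-2] + rng.gauss(0.0, 0.1))
coeffs = fit_ar(y, 2)   # expect roughly [0.6, 0.3]
```

Dropping the second lag (an AR(1) fit) would discard exactly the kind of memory effect the abstract credits for the improved prediction skill.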

  1. Human body segmentation via data-driven graph cut.

    Science.gov (United States)

    Li, Shifeng; Lu, Huchuan; Shao, Xingqing

    2014-11-01

    Human body segmentation is a challenging and important problem in computer vision. Existing methods usually entail a time-consuming training phase for prior knowledge learning, with complex shape matching for body segmentation. In this paper, we propose a data-driven method that integrates top-down body pose information and bottom-up low-level visual cues for segmenting humans in static images within the graph cut framework. The key idea of our approach is first to exploit human kinematics to search for body part candidates via dynamic programming, yielding high-level evidence. Then, body part classifiers are used to obtain bottom-up cues of the human body distribution as low-level evidence. All the evidence collected from the top-down and bottom-up procedures is integrated in a graph cut framework for human body segmentation. Qualitative and quantitative experimental results demonstrate the merits of the proposed method in segmenting human bodies with arbitrary poses from cluttered backgrounds.

  2. Data-driven coronal evolutionary model of active region 11944.

    Science.gov (United States)

    Kazachenko, M.

    2014-12-01

    Recent availability of systematic measurements of vector magnetic fields and Doppler velocities has allowed us to utilize a data-driven approach for modeling observed active regions (AR), a crucial step for understanding the nature of solar flare initiation. We use a sequence of vector magnetograms and Dopplergrams from the Helioseismic and Magnetic Imager (HMI) aboard the SDO to drive a magnetofrictional (MF) model of the coronal magnetic field in the vicinity of AR 11944, where an X1.2 flare occurred on January 7, 2014. To drive the coronal field we impose a time-dependent boundary condition based on temporal sequences of magnetic and electric fields at the bottom of the computational domain, i.e. the photosphere. To derive the electric fields we use a recently improved poloidal-toroidal decomposition (PTD), which we call the ``PTD-Doppler-FLCT-Ideal'' or PDFI technique. We investigate the results of the simulated coronal evolution, compare them with EUV observations from the Atmospheric Imaging Assembly (AIA), and discuss what we can learn from them. This work is a collaborative effort of the UC Berkeley Space Sciences Laboratory (SSL), Stanford University, and Lockheed-Martin, and is part of the Coronal Global Evolutionary (CGEM) Model, funded jointly by NASA and NSF.

  3. A Data-Driven Approach to Realistic Shape Morphing

    KAUST Repository

    Gao, Lin

    2013-05-01

    Morphing between 3D objects is a fundamental technique in computer graphics. Traditional methods of shape morphing focus on establishing meaningful correspondences and finding smooth interpolation between shapes. Such methods, however, only take geometric information as input and thus cannot in general avoid producing unnatural interpolation, in particular for large-scale deformations. This paper proposes a novel data-driven approach for shape morphing. Given a database with various models belonging to the same category, we treat them as data samples in the plausible deformation space. These models are then clustered to form local shape spaces of plausible deformations. We use a simple metric to reasonably represent the closeness between pairs of models. Given source and target models, the morphing problem is cast as a global optimization problem of finding a minimal distance path within the local shape spaces connecting these models. Under the guidance of intermediate models in the path, an extended as-rigid-as-possible interpolation is used to produce the final morphing. By exploiting the knowledge of plausible models, our approach produces realistic morphing for challenging cases as demonstrated by various examples in the paper. © 2013 The Eurographics Association and Blackwell Publishing Ltd.
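A hedged sketch of the path-finding step described above: given a hypothetical pairwise dissimilarity between database models, a shortest path from source to target yields the intermediate waypoints for interpolation (here via Dijkstra's algorithm; the paper's metric and local shape spaces are richer):

```python
import heapq

def morph_path(dissimilarity, source, target):
    """Shortest path through a graph of models, with edge weights given by
    a pairwise dissimilarity dict {(i, j): d}. Returns the model sequence
    to use as morphing waypoints."""
    # Build an undirected adjacency list.
    adj = {}
    for (i, j), d in dissimilarity.items():
        adj.setdefault(i, []).append((j, d))
        adj.setdefault(j, []).append((i, d))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [target], target
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Toy shape space: the direct source->target jump is implausible (costly),
# while hopping through intermediate database models is cheap.
edges = {("src", "tgt"): 10.0, ("src", "m1"): 2.0,
         ("m1", "m2"): 2.0, ("m2", "tgt"): 2.0}
waypoints = morph_path(edges, "src", "tgt")
```

The returned waypoints would then guide the pairwise as-rigid-as-possible interpolation between consecutive models.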

  4. Boosted learned kernels for data-driven vesselness measure

    Science.gov (United States)

    Grisan, E.

    2017-03-01

    Common vessel centerline extraction methods rely on the computation of a measure providing the likeness of the local appearance of the data to a curvilinear tube-like structure. The most popular techniques rely on empirically designed (hand-crafted) measurements such as the widely used Hessian vesselness, the recent oriented flux tubeness, or filters (e.g. the Gaussian matched filter) that are developed to respond to local features, without exploiting context information or the rich structural information embedded in the data. At variance with the previously proposed methods, we propose a completely data-driven approach for learning a vesselness measure from an expert-annotated dataset. For each data point (voxel or pixel), we extract the intensity values in a neighborhood region, and estimate the discriminative convolutional kernel yielding a positive response for vessel data and negative response for non-vessel data. The process is iterated within a boosting framework, providing a set of linear filters, whose combined response is the learned vesselness measure. We show the results of the proposed general-purpose method on the DRIVE retinal image dataset, comparing its performance against the Hessian-based vesselness, oriented flux antisymmetry tubeness, and vesselness learned with a probabilistic boosting tree or with a regression tree. We demonstrate the superiority of our approach that yields a vessel detection accuracy of 0.95, with respect to 0.92 (Hessian), 0.90 (oriented flux) and 0.85 (boosting tree).

  5. Data-driven optimization of dynamic reconfigurable systems of systems.

    Energy Technology Data Exchange (ETDEWEB)

    Tucker, Conrad S.; Eddy, John P.

    2010-11-01

    This report documents the results of a Strategic Partnership (aka University Collaboration) LDRD program between Sandia National Laboratories and the University of Illinois at Urbana-Champaign. The project is titled 'Data-Driven Optimization of Dynamic Reconfigurable Systems of Systems' and was conducted during FY 2009 and FY 2010. The purpose of this study was to determine and implement ways to incorporate real-time data mining and information discovery into existing Systems of Systems (SoS) modeling capabilities. Current SoS modeling is typically conducted in an iterative manner in which replications are carried out in order to quantify variation in the simulation results. The expense of many replications for large simulations, especially when considering the need for optimization, sensitivity analysis, and uncertainty quantification, can be prohibitive. In addition, extracting useful information from the resulting large datasets is a challenging task. This work demonstrates methods of identifying trends and other forms of information in datasets that can be used on a wide range of applications such as quantifying the strength of various inputs on outputs, identifying the sources of variation in the simulation, and potentially steering an optimization process for improved efficiency.

  6. Pro Spring Batch

    CERN Document Server

    Minella, Michael T

    2011-01-01

    Since its release, Spring Framework has transformed virtually every aspect of Java development including web applications, security, aspect-oriented programming, persistence, and messaging. Spring Batch, one of its newer additions, now brings the same familiar Spring idioms to batch processing. Spring Batch addresses the needs of any batch process, from the complex calculations performed in the biggest financial institutions to simple data migrations that occur with many software development projects. Pro Spring Batch is intended to answer three questions: *What? What is batch processing? What

  7. Dynamic Data-Driven Event Reconstruction for Atmospheric Releases

    Energy Technology Data Exchange (ETDEWEB)

    Kosovic, B; Belles, R; Chow, F K; Monache, L D; Dyer, K; Glascoe, L; Hanley, W; Johannesson, G; Larsen, S; Loosmore, G; Lundquist, J K; Mirin, A; Neuman, S; Nitao, J; Serban, R; Sugiyama, G; Aines, R

    2007-02-22

    Accidental or terrorist releases of hazardous materials into the atmosphere can impact large populations and cause significant loss of life or property damage. Plume predictions have been shown to be extremely valuable in guiding an effective and timely response. The two greatest sources of uncertainty in the prediction of the consequences of hazardous atmospheric releases result from poorly characterized source terms and lack of knowledge about the state of the atmosphere as reflected in the available meteorological data. In this report, we discuss the development of a new event reconstruction methodology that provides probabilistic source term estimates from field measurement data for both accidental and clandestine releases. Accurate plume dispersion prediction requires the following questions to be answered: What was released? When was it released? How much material was released? Where was it released? We have developed a dynamic data-driven event reconstruction capability which couples data and predictive models through Bayesian inference to obtain a solution to this inverse problem. The solution consists of a probability distribution of unknown source term parameters. For consequence assessment, we then use this probability distribution to construct a "composite" forward plume prediction which accounts for the uncertainties in the source term. Since in most cases of practical significance it is impossible to find a closed form solution, Bayesian inference is accomplished by utilizing stochastic sampling methods. This approach takes into consideration both measurement and forward model errors and thus incorporates all the sources of uncertainty in the solution to the inverse problem. Stochastic sampling methods have the additional advantage of being suitable for problems characterized by a non-Gaussian distribution of source term parameters and for cases in which the underlying dynamical system is non-linear. We initially
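As an illustrative toy version of the Bayesian inversion described above (not the report's actual system), the sketch below uses random-walk Metropolis sampling to recover two source-term parameters, strength and location, from synthetic sensor readings; a Gaussian footprint stands in for a real dispersion model:

```python
import math, random

def forward(q, x0, xs, sigma=1.0):
    """Toy forward model: a Gaussian footprint standing in for a real
    atmospheric dispersion code; q is source strength, x0 its location."""
    return [q * math.exp(-(x - x0) ** 2 / (2 * sigma ** 2)) for x in xs]

def log_likelihood(params, xs, obs, noise=0.05):
    q, x0 = params
    return -sum((p - o) ** 2
                for p, o in zip(forward(q, x0, xs), obs)) / (2 * noise ** 2)

def metropolis(xs, obs, n_iter=20000, seed=0):
    """Random-walk Metropolis sampling of the source-term posterior."""
    rng = random.Random(seed)
    cur = (1.0, 0.0)                       # initial guess for (q, x0)
    cur_ll = log_likelihood(cur, xs, obs)
    samples = []
    for _ in range(n_iter):
        prop = (cur[0] + rng.gauss(0, 0.1), cur[1] + rng.gauss(0, 0.1))
        if prop[0] > 0:                    # flat prior, strength must be > 0
            ll = log_likelihood(prop, xs, obs)
            if rng.random() < math.exp(min(0.0, ll - cur_ll)):
                cur, cur_ll = prop, ll
        samples.append(cur)
    return samples[n_iter // 2:]           # discard burn-in

# Synthetic sensor readings from a "true" source q=2.0 at x0=1.5.
xs = [i * 0.5 for i in range(-6, 10)]
obs = forward(2.0, 1.5, xs)
post = metropolis(xs, obs)
q_mean = sum(s[0] for s in post) / len(post)
x0_mean = sum(s[1] for s in post) / len(post)
```

The retained samples approximate the posterior over source parameters; averaging forward predictions over them gives the "composite" plume the abstract describes.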

  8. SIDEKICK: Genomic data driven analysis and decision-making framework

    Directory of Open Access Journals (Sweden)

    Yoon Kihoon

    2010-12-01

    Full Text Available Abstract Background Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results Sidekick, a genomic data driven analysis and decision making framework, is a web-based tool that provides a user-friendly intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X" or "Does the set of genes associated with disease X also influence other diseases." Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to

  9. Mapping landslide susceptibility using data-driven methods.

    Science.gov (United States)

    Zêzere, J L; Pereira, S; Melo, R; Oliveira, S C; Garcia, R A C

    2017-07-01

    Most epistemic uncertainty within data-driven landslide susceptibility assessment results from errors in landslide inventories, difficulty in identifying and mapping landslide causes, and decisions related to the modelling procedure. In this work we evaluate and discuss differences observed in landslide susceptibility maps resulting from: (i) the selection of the statistical method; (ii) the selection of the terrain mapping unit; and (iii) the selection of the feature type to represent landslides in the model (polygon versus point). The work is performed in a single study area (Silveira Basin - 18.2 km² - Lisbon Region, Portugal) using a unique database of geo-environmental landslide predisposing factors and an inventory of 82 shallow translational slides. The logistic regression, the discriminant analysis and two versions of the information value were used, and we conclude that multivariate statistical methods perform better when computed over heterogeneous terrain units and should be selected to assess landslide susceptibility based on slope terrain units, geo-hydrological terrain units or census terrain units. However, evidence was found that the chosen terrain mapping unit can produce greater differences in final susceptibility results than those resulting from the chosen statistical method for modelling. The landslide susceptibility should be assessed over grid cell terrain units whenever the spatial accuracy of the landslide inventory is good. In addition, a single point per landslide proved to be efficient to generate accurate landslide susceptibility maps, provided the landslides are of small size, thus minimizing the possible existence of heterogeneities of predisposing factors within the landslide boundary. 
Although in recent years ROC curves have been the preferred means of evaluating susceptibility model performance, evidence was found that the model with the highest ROC AUC is not necessarily the best landslide susceptibility model, namely when terrain
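The information value method mentioned above has a compact form: for each class of a predisposing factor, IV = ln(class landslide density / overall landslide density). A minimal sketch with hypothetical slope-class counts (the study's actual factors and units differ):

```python
import math

def information_value(cells_per_class, slides_per_class):
    """Information value of each predisposing-factor class:
    IV = ln( class landslide density / overall landslide density ).
    Positive IV marks classes more landslide-prone than the area average."""
    total_cells = sum(cells_per_class)
    total_slides = sum(slides_per_class)
    prior = total_slides / total_cells
    ivs = []
    for cells, slides in zip(cells_per_class, slides_per_class):
        dens = slides / cells
        ivs.append(math.log(dens / prior) if dens > 0 else float("-inf"))
    return ivs

# Hypothetical slope classes: gentle / moderate / steep.
cells = [5000, 3000, 2000]      # terrain units per class
slides = [5, 27, 50]            # landslide units per class
ivs = information_value(cells, slides)
```

A terrain unit's susceptibility score is then the sum of the IVs of the factor classes it falls in, which is what gets mapped and ranked.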

  10. Evidence-based and data-driven road safety management

    Directory of Open Access Journals (Sweden)

    Fred Wegman

    2015-07-01

    Full Text Available Over the past decades, road safety in highly-motorised countries has made significant progress. Although we have a fair understanding of the reasons for this progress, we don't have conclusive evidence for this. A new generation of road safety management approaches has emerged, starting when countries decided to guide themselves by setting quantitative targets (e.g. 50% fewer casualties in ten years' time). Setting realistic targets, designing strategies and action plans to achieve these targets and monitoring progress have resulted in more scientific research to support decision-making on these topics. Three subjects are key in this new approach of evidence-based and data-driven road safety management: ex-post and ex-ante evaluation of both individual interventions and intervention packages in road safety strategies, and transferability (external validity) of the research results. In this article, we explore these subjects based on recent experiences in four jurisdictions (Western Australia, the Netherlands, Sweden and Switzerland). All four apply similar approaches and tools; differences are considered marginal. It is concluded that policy-making and political decisions were influenced to a great extent by the results of analysis and research. Nevertheless, to compensate for a relatively weak theoretical basis and to improve the power of this new approach, a number of issues will need further research. This includes ex-post and ex-ante evaluation, a better understanding of extrapolation of historical trends and the transferability of research results. This new approach cannot be realized without high-quality road safety data. Good data and knowledge are indispensable for this new and very promising approach.

  11. Improving Cybersecurity Governance Through Data-Driven Decision-Making and Execution (Briefing Charts)

    Science.gov (United States)

    2014-10-01

    Briefing charts (Carnegie Mellon University, October 2014; presenter Doug Gray). Only report-form fragments of the text survive; recoverable topics include machine learning and text analysis of unstructured data, trend analysis, correlation, and sources of constraints in support of data-driven cybersecurity governance.

  12. Spring batch essentials

    CERN Document Server

    Rao, P Raja Malleswara

    2015-01-01

    If you are a Java developer with basic knowledge of Spring and some experience in the development of enterprise applications, and want to learn about batch application development in detail, then this book is ideal for you. This book will be perfect as your next step towards building simple yet powerful batch applications on a Java-based platform.

  13. Designing Data-Driven Battery Prognostic Approaches for Variable Loading Profiles: Some Lessons Learned

    Data.gov (United States)

    National Aeronautics and Space Administration — Among various approaches for implementing prognostic algorithms data-driven algorithms are popular in the industry due to their intuitive nature and relatively fast...

  14. The Potential of Knowing More: A Review of Data-Driven Urban Water Management.

    Science.gov (United States)

    Eggimann, Sven; Mutzner, Lena; Wani, Omar; Schneider, Mariane Yvonne; Spuhler, Dorothee; Moy de Vitry, Matthew; Beutler, Philipp; Maurer, Max

    2017-02-14

    The promise of collecting and utilizing large amounts of data has never been greater in the history of urban water management (UWM). This paper reviews several data-driven approaches which play a key role in bringing forward a sea change. It critically investigates whether data-driven UWM offers a promising foundation for addressing current challenges and supporting fundamental changes in UWM. We discuss the examples of better rain-data management, urban pluvial flood-risk management and forecasting, drinking water and sewer network operation and management, integrated design and management, increasing water productivity, wastewater-based epidemiology and on-site water and wastewater treatment. The accumulated evidence from literature points toward a future UWM that offers significant potential benefits thanks to increased collection and utilization of data. The findings show that data-driven UWM allows us to develop and apply novel methods, to optimize the efficiency of the current network-based approach, and to extend functionality of today's systems. However, generic challenges related to data-driven approaches (e.g., data processing, data availability, data quality, data costs) and the specific challenges of data-driven UWM need to be addressed, namely data access and ownership, current engineering practices and the difficulty of assessing the cost benefits of data-driven UWM.

  15. The influence of data-driven versus conceptually-driven processing on the development of PTSD-like symptoms

    NARCIS (Netherlands)

    M. Kindt; M. van den Hout; A. Arntz; J. Drost

    2008-01-01

    Ehlers and Clark [(2000). A cognitive model of posttraumatic stress disorder. Behaviour Research and Therapy, 38, 319-345] propose that a predominance of data-driven processing during the trauma predicts subsequent PTSD. We wondered whether, apart from data-driven encoding, sustained data-driven pro

  16. Data-driven design of fault diagnosis and fault-tolerant control systems

    CERN Document Server

    Ding, Steven X

    2014-01-01

    Data-driven Design of Fault Diagnosis and Fault-tolerant Control Systems presents basic statistical process monitoring, fault diagnosis, and control methods, and introduces advanced data-driven schemes for the design of fault diagnosis and fault-tolerant control systems catering to the needs of dynamic industrial processes. With ever increasing demands for reliability, availability and safety in technical processes and assets, process monitoring and fault-tolerance have become important issues surrounding the design of automatic control systems. This text shows the reader how, thanks to the rapid development of information technology, key techniques of data-driven and statistical process monitoring and control can now become widely used in industrial practice to address these issues. To allow for self-contained study and facilitate implementation in real applications, important mathematical and control theoretical knowledge and tools are included in this book. Major schemes are presented in algorithm form and...

  17. Design of video quality metrics with multi-way data analysis a data driven approach

    CERN Document Server

    Keimel, Christian

    2016-01-01

    This book proposes a data-driven methodology using multi-way data analysis for the design of video-quality metrics. It also enables video-quality metrics to be created using arbitrary features. This data-driven design approach not only requires no detailed knowledge of the human visual system, but also allows a proper consideration of the temporal nature of video using a three-way prediction model, corresponding to the three-way structure of video. Using two simple example metrics, the author demonstrates not only that this purely data-driven approach outperforms state-of-the-art video-quality metrics, which are often optimized for specific properties of the human visual system, but also that multi-way data analysis methods outperform the combination of two-way data analysis methods and temporal pooling.

  18. Service and Data Driven Multi Business Model Platform in a World of Persuasive Technologies

    DEFF Research Database (Denmark)

    Andersen, Troels Christian; Bjerrum, Torben Cæsar Bisgaard

    2016-01-01

    This article provides a new contribution to the concept of business models, with a focus on the emerging gap between the usage of data, service and business models, by suggesting a framework that functions as a service and data driven business model platform. The purpose is to support manufacturing companies in establishing a service organization that delivers, creates and captures value through service and data driven business models by utilizing their network, resources and customers and/or users. Furthermore, based on literature and collaboration with the case company, the suggested new framework provides the necessary construction of how manufacturing companies can evolve their current business to provide multi service and data driven business models, using the same resources, networks and customers.

  19. Data-driven remaining useful life prognosis techniques stochastic models, methods and applications

    CERN Document Server

    Si, Xiao-Sheng; Hu, Chang-Hua

    2017-01-01

    This book introduces data-driven remaining useful life prognosis techniques, and shows how to utilize the condition monitoring data to predict the remaining useful life of stochastic degrading systems and to schedule maintenance and logistics plans. It is also the first book that describes the basic data-driven remaining useful life prognosis theory systematically and in detail. The emphasis of the book is on the stochastic models, methods and applications employed in remaining useful life prognosis. It includes a wealth of degradation monitoring experiment data, practical prognosis methods for remaining useful life in various cases, and a series of applications incorporated into prognostic information in decision-making, such as maintenance-related decisions and ordering spare parts. It also highlights the latest advances in data-driven remaining useful life prognosis techniques, especially in the contexts of adaptive prognosis for linear stochastic degrading systems, nonlinear degradation modeling based pro...

  20. Safety analysis of proposed data-driven physiologic alarm parameters for hospitalized children.

    Science.gov (United States)

    Goel, Veena V; Poole, Sarah F; Longhurst, Christopher A; Platchek, Terry S; Pageler, Natalie M; Sharek, Paul J; Palma, Jonathan P

    2016-12-01

    Modification of alarm limits is one approach to mitigating alarm fatigue. We aimed to create and validate heart rate (HR) and respiratory rate (RR) percentiles for hospitalized children, and analyze the safety of replacing current vital sign reference ranges with proposed data-driven, age-stratified 5th and 95th percentile values. In this retrospective cross-sectional study, nurse-charted HR and RR data from a training set of 7202 hospitalized children were used to develop percentile tables. We compared 5th and 95th percentile values with currently accepted reference ranges in a validation set of 2287 patients. We analyzed 148 rapid response team (RRT) and cardiorespiratory arrest (CRA) events over a 12-month period, using HR and RR values in the 12 hours prior to the event, to determine the proportion of patients with out-of-range vitals based upon reference versus data-driven limits. There were 24,045 (55.6%) fewer out-of-range measurements using data-driven vital sign limits. Overall, 144/148 RRT and CRA patients had out-of-range HR or RR values preceding the event using current limits, and 138/148 were abnormal using data-driven limits. Chart review of RRT and CRA patients with abnormal HR and RR per current limits considered normal by data-driven limits revealed that clinical status change was identified by other vital sign abnormalities or clinical context. A large proportion of vital signs in hospitalized children are outside presently used norms. Safety evaluation of data-driven limits suggests they are as safe as those currently used. Implementation of these parameters in physiologic monitors may mitigate alarm fatigue. Journal of Hospital Medicine 2015;11:817-823. © 2015 Society of Hospital Medicine. © 2016 Society of Hospital Medicine.
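A minimal sketch of deriving data-driven limits as the 5th/95th percentiles of charted values per age stratum, as described above (hypothetical data; linear-interpolation percentile as in common statistical software, not necessarily the study's exact estimator):

```python
def percentile(values, pct):
    """Linear-interpolation percentile of a list of vital-sign values."""
    xs = sorted(values)
    rank = (len(xs) - 1) * pct / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)

def alarm_limits(measurements_by_age):
    """Data-driven limits: 5th/95th percentiles of charted values
    per age stratum."""
    return {age: (percentile(v, 5), percentile(v, 95))
            for age, v in measurements_by_age.items()}

# Hypothetical charted heart rates (bpm) for two age strata.
hr = {"1-2y": [88, 95, 100, 104, 108, 112, 118, 124, 130, 142],
      "5-8y": [70, 75, 80, 84, 88, 92, 96, 100, 106, 118]}
limits = alarm_limits(hr)
```

Because the limits are fitted to what hospitalized children in each age stratum actually exhibit, far fewer measurements fall out of range, which is the mechanism behind the reported 55.6% reduction.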

  1. Hybrid Batch Bayesian Optimization

    CERN Document Server

    Azimi, Javad; Fern, Xiaoli

    2012-01-01

    Bayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose to either sequentially evaluate the function, one input at a time and wait for the output of the function before making the next selection, or evaluate the function at a batch of multiple inputs at once. These two different settings are commonly referred to as the sequential and batch settings of Bayesian Optimization. In general, the sequential setting leads to better optimization performance as each function evaluation is selected with more information, whereas the batch setting has an advantage in terms of the total experimental time (the number of iterations). In this work, our goal is to combine the strength of both settings. Specifically, we systematically analyze Bayesian optimization using Gaussian process as the posterior estimator and provide a hybrid algorithm t...

  2. Heuristics for batching and sequencing in batch processing machines

    Directory of Open Access Journals (Sweden)

    Chuda Basnet

    2016-12-01

    Full Text Available In this paper, we discuss the “batch processing” problem, where there are multiple jobs to be processed in flow shops. These jobs can, however, be formed into batches, and the number of jobs in a batch is limited by the capacity of the processing machines to accommodate the jobs. The processing time required by a batch in a machine is determined by the greatest processing time of the jobs included in the batch. Thus, the batch processing problem is a mix of batching and sequencing: the jobs need to be grouped into distinct batches, and the batches then need to be sequenced through the flow shop. We apply newly developed heuristics to the problem and present computational results. The contributions of this paper are the derivation of a lower bound and the heuristics developed and tested here.
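    The batching rule described above (a batch's time equals its longest job, and batch size is limited by machine capacity) can be illustrated with a simple full-batch longest-processing-time heuristic on a single machine. This is a generic textbook heuristic, not necessarily one of the paper's; names are illustrative.

```python
def fblpt_batches(times, capacity):
    """Full-Batch Longest-Processing-Time heuristic:
    sort jobs by processing time (descending) and fill batches up to
    machine capacity; a batch's processing time is its longest job."""
    jobs = sorted(times, reverse=True)
    batches = [jobs[i:i + capacity] for i in range(0, len(jobs), capacity)]
    makespan = sum(b[0] for b in batches)   # first job in a batch = batch max
    return batches, makespan
```

    For jobs [5, 3, 8, 2, 7] and capacity 2 this forms batches [8, 7], [5, 3], [2], giving a single-machine makespan of 8 + 5 + 2 = 15.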

  3. Data-Driven Learning of Speech Acts Based on Corpora of DVD Subtitles

    Science.gov (United States)

    Kitao, S. Kathleen; Kitao, Kenji

    2013-01-01

    Data-driven learning (DDL) is an inductive approach to language learning in which students study examples of authentic language and use them to find patterns of language use. This inductive approach to learning has the advantages of being learner-centered, encouraging hypothesis testing and learner autonomy, and helping develop learning skills.…

  4. Federal Policy to Local Level Decision-Making: Data Driven Education Planning in Nigeria

    Science.gov (United States)

    Iyengar, Radhika; Mahal, Angelique R.; Felicia, Ukaegbu-Nnamchi Ifeyinwa; Aliyu, Balaraba; Karim, Alia

    2015-01-01

    This article discusses the implementation of local level education data-driven planning as implemented by the Office of the Senior Special Assistant to the President of Nigeria on the Millennium Development Goals (OSSAP-MDGs) in partnership with The Earth Institute, Columbia University. It focuses on the design and implementation of the…

  5. Data-Driven Decision Making: Teachers' Use of Data in the Classroom

    Science.gov (United States)

    Moriarty, Tammy Wu

    2013-01-01

    Data-driven decision making has become an important educational issue in the United States, primarily because of federal and state emphasis on school accountability and achievement. Data use has been highlighted as a key factor in monitoring student progress and informing decision making at various levels of the education system. Federal and state…

  6. Data-driven medicinal chemistry in the era of big data

    NARCIS (Netherlands)

    Lusher, S.J.; McGuire, R.; Schaik, R.C. Van; Nicholson, C.D.; Vlieg, J. de

    2014-01-01

    Science, and the way we undertake research, is changing. The increasing rate of data generation across all scientific disciplines is providing incredible opportunities for data-driven research, with the potential to transform our current practices. The exploitation of so-called 'big data' will enabl

  7. Data-Driven Hint Generation in Vast Solution Spaces: A Self-Improving Python Programming Tutor

    Science.gov (United States)

    Rivers, Kelly; Koedinger, Kenneth R.

    2017-01-01

    To provide personalized help to students who are working on code-writing problems, we introduce a data-driven tutoring system, ITAP (Intelligent Teaching Assistant for Programming). ITAP uses state abstraction, path construction, and state reification to automatically generate personalized hints for students, even when given states that have not…

  8. Data-Driven Visualization and Group Analysis of Multichannel EEG Coherence with Functional Units

    NARCIS (Netherlands)

    Caat, Michael ten; Maurits, Natasha M.; Roerdink, Jos B.T.M.

    2008-01-01

    A typical data-driven visualization of electroencephalography (EEG) coherence is a graph layout, with vertices representing electrodes and edges representing significant coherences between electrode signals. A drawback of this layout is its visual clutter for multichannel EEG. To reduce clutter, w

  9. Teacher Perceptions and Use of Data-Driven Instruction: A Qualitative Study

    Science.gov (United States)

    Melucci, Laura

    2013-01-01

    The purpose of this study was to determine how teacher perceptions of data and use of data-driven instruction affect student performance in English language arts (ELA). This study investigated teachers' perceptions of using data in the classroom and what supports they need to do so. The goal of the research was to increase the level of knowledge…

  10. Data-driven diagnostics of terrestrial carbon dynamics over North America

    Science.gov (United States)

    Jingfeng Xiao; Scott V. Ollinger; Steve Frolking; George C. Hurtt; David Y. Hollinger; Kenneth J. Davis; Yude Pan; Xiaoyang Zhang; Feng Deng; Jiquan Chen; Dennis D. Baldocchi; Bevery E. Law; M. Altaf Arain; Ankur R. Desai; Andrew D. Richardson; Ge Sun; Brian Amiro; Hank Margolis; Lianhong Gu; Russell L. Scott; Peter D. Blanken; Andrew E. Suyker

    2014-01-01

    The exchange of carbon dioxide is a key measure of ecosystem metabolism and a critical intersection between the terrestrial biosphere and the Earth's climate. Despite the general agreement that the terrestrial ecosystems in North America provide a sizeable carbon sink, the size and distribution of the sink remain uncertain. We use a data-driven approach to upscale...

  11. Exploring Techniques of Developing Writing Skill in IELTS Preparatory Courses: A Data-Driven Study

    Science.gov (United States)

    Ostovar-Namaghi, Seyyed Ali; Safaee, Seyyed Esmail

    2017-01-01

    Being driven by the hypothetico-deductive mode of inquiry, previous studies have tested the effectiveness of theory-driven interventions under controlled experimental conditions to come up with universally applicable generalizations. To make a case in the opposite direction, this data-driven study aims at uncovering techniques and strategies…

  12. Physical Strength as a Cue to Dominance : A Data-Driven Approach

    NARCIS (Netherlands)

    Toscano, Hugo; Schubert, Thomas W; Dotsch, Ron; Falvello, Virginia; Todorov, Alexander

    2016-01-01

    We investigate both similarities and differences between dominance and strength judgments using a data-driven approach. First, we created statistical face shape models of judgments of both dominance and physical strength. The resulting faces representing dominance and strength were highly similar,

  13. A Case for Reevaluating Teacher's Role in Data-Driven Learning (DDL) of English Articles

    Institute of Scientific and Technical Information of China (English)

    ZHAO Juan

    2013-01-01

    A case study has been made to explore whether the teacher’s role in data-driven learning (DDL) can be minimized. The outcome shows that the teacher’s role in offering an explicit instruction may be indispensable and even central to the acquisi-tion of English articles.

  14. Hybrid models for hydrological forecasting: integration of data-driven and conceptual modelling techniques

    NARCIS (Netherlands)

    Corzo Perez, G.A.

    2009-01-01

    This book presents the investigation of different architectures of integrating hydrological knowledge and models with data-driven models for the purpose of hydrological flow forecasting. The models resulting from such integration are referred to as hybrid models. The book addresses the following top

  15. Hybrid models for hydrological forecasting: Integration of data-driven and conceptual modelling techniques

    NARCIS (Netherlands)

    Corzo Perez, G.A.

    2009-01-01

    This book presents the investigation of different architectures of integrating hydrological knowledge and models with data-driven models for the purpose of hydrological flow forecasting. The models resulting from such integration are referred to as hybrid models. The book addresses the following top

  16. Data-driven modeling of nano-nose gas sensor arrays

    DEFF Research Database (Denmark)

    Alstrøm, Tommy Sonne; Larsen, Jan; Nielsen, Claus Højgård

    2010-01-01

    We present a data-driven approach to classification of Quartz Crystal Microbalance (QCM) sensor data. The sensor is a nano-nose gas sensor that detects concentrations of analytes down to ppm levels using plasma polymerized coatings. Each sensor experiment takes approximately one hour hence the nu......-of-the-art machine learning methods and the Bayesian learning paradigm....

  17. Data-Driven Learning: Changing the Teaching of Grammar in EFL Classes

    Science.gov (United States)

    Lin, Ming Huei; Lee, Jia-Ying

    2015-01-01

    This study aims to investigate the experience of six early-career teachers who team-taught grammar to EFL college students using data-driven learning (DDL) for the first time. The results show that the teachers found DDL an innovative and interesting approach to teaching grammar, approved of DDL's capacity to provide more incentives for students…

  18. Using data-driven model-brain mappings to constrain formal models of cognition

    NARCIS (Netherlands)

    Borst, Jelmer P; Nijboer, Menno; Taatgen, Niels A; van Rijn, Hedderik; Anderson, John R

    2015-01-01

    In this paper we propose a method to create data-driven mappings from components of cognitive models to brain regions. Cognitive models are notoriously hard to evaluate, especially based on behavioral measures alone. Neuroimaging data can provide additional constraints, but this requires a mapping f

  19. Ability Grouping and Differentiated Instruction in an Era of Data-Driven Decision Making

    Science.gov (United States)

    Park, Vicki; Datnow, Amanda

    2017-01-01

    Despite data-driven decision making being a ubiquitous part of policy and school reform efforts, little is known about how teachers use data for instructional decision making. Drawing on data from a qualitative case study of four elementary schools, we examine the logic and patterns of teacher decision making about differentiation and ability…

  20. A framework for the automated data-driven constitutive characterization of composites

    Science.gov (United States)

    J.G. Michopoulos; John Hermanson; T. Furukawa; A. Iliopoulos

    2010-01-01

    We present advances on the development of a mechatronically and algorithmically automated framework for the data-driven identification of constitutive material models based on energy density considerations. These models can capture both the linear and nonlinear constitutive response of multiaxially loaded composite materials in a manner that accounts for progressive...

  1. Data-Driven Hint Generation in Vast Solution Spaces: A Self-Improving Python Programming Tutor

    Science.gov (United States)

    Rivers, Kelly; Koedinger, Kenneth R.

    2017-01-01

    To provide personalized help to students who are working on code-writing problems, we introduce a data-driven tutoring system, ITAP (Intelligent Teaching Assistant for Programming). ITAP uses state abstraction, path construction, and state reification to automatically generate personalized hints for students, even when given states that have not…

  2. Data-driven modeling of nano-nose gas sensor arrays

    DEFF Research Database (Denmark)

    Alstrøm, Tommy Sonne; Larsen, Jan; Nielsen, Claus Højgård

    2010-01-01

    We present a data-driven approach to classification of Quartz Crystal Microbalance (QCM) sensor data. The sensor is a nano-nose gas sensor that detects concentrations of analytes down to ppm levels using plasma polymerized coatings. Each sensor experiment takes approximately one hour hence...

  3. Hybrid models for hydrological forecasting: integration of data-driven and conceptual modelling techniques

    NARCIS (Netherlands)

    Corzo Perez, G.A.

    2009-01-01

    This book presents the investigation of different architectures of integrating hydrological knowledge and models with data-driven models for the purpose of hydrological flow forecasting. The models resulting from such integration are referred to as hybrid models. The book addresses the following

  4. Hybrid models for hydrological forecasting: Integration of data-driven and conceptual modelling techniques

    NARCIS (Netherlands)

    Corzo Perez, G.A.

    2009-01-01

    This book presents the investigation of different architectures of integrating hydrological knowledge and models with data-driven models for the purpose of hydrological flow forecasting. The models resulting from such integration are referred to as hybrid models. The book addresses the following

  5. Big-Data-Driven Stem Cell Science and Tissue Engineering: Vision and Unique Opportunities.

    Science.gov (United States)

    Del Sol, Antonio; Thiesen, Hans J; Imitola, Jaime; Carazo Salas, Rafael

    2017-02-02

    Achieving the promises of stem cell science to generate precise disease models and designer cell samples for personalized therapeutics will require harnessing pheno-genotypic cell-level data quantitatively and predictively in the lab and clinic. Those requirements could be met by developing a Big-Data-driven stem cell science strategy and community.

  6. Development of a Scale to Measure Learners' Perceived Preferences and Benefits of Data-Driven Learning

    Science.gov (United States)

    Mizumoto, Atsushi; Chujo, Kiyomi; Yokota, Kenji

    2016-01-01

    In spite of researchers' and practitioners' increasing attention to data-driven learning (DDL) and increasing numbers of DDL studies, a multi-item scale to measure learners' attitude toward DDL has not been developed thus far. In the present study, we developed and validated a psychometric scale to measure learners' perceived preferences and…

  7. Using Flexible Data-Driven Frameworks to Enhance School Psychology Training and Practice

    Science.gov (United States)

    Coleman, Stephanie L.; Hendricker, Elise

    2016-01-01

    While a great number of scientific advances have been made in school psychology, the research to practice gap continues to exist, which has significant implications for training future school psychologists. Training in flexible, data-driven models may help school psychology trainees develop important competencies that will benefit them throughout…

  8. Robust Data-Driven Inference for Density-Weighted Average Derivatives

    DEFF Research Database (Denmark)

    Cattaneo, Matias D.; Crump, Richard K.; Jansson, Michael

    This paper presents a new data-driven bandwidth selector compatible with the small bandwidth asymptotics developed in Cattaneo, Crump, and Jansson (2009) for density-weighted average derivatives. The new bandwidth selector is of the plug-in variety, and is obtained based on a mean squared error...

  9. Data-Driven Learning: Changing the Teaching of Grammar in EFL Classes

    Science.gov (United States)

    Lin, Ming Huei; Lee, Jia-Ying

    2015-01-01

    This study aims to investigate the experience of six early-career teachers who team-taught grammar to EFL college students using data-driven learning (DDL) for the first time. The results show that the teachers found DDL an innovative and interesting approach to teaching grammar, approved of DDL's capacity to provide more incentives for students…

  10. A Hybrid Approach to Combine Physically Based and Data-Driven Models in Simulating Sediment Transportation

    NARCIS (Netherlands)

    Sewagudde, S.

    2008-01-01

    The objective of this study is to develop a methodology for hybrid modelling of sedimentation in a coastal basin or large shallow lake where physically based and data driven approaches are combined. This research was broken down into three blocks. The first block explores the possibility of approxim

  11. PHYCAA: Data-driven measurement and removal of physiological noise in BOLD fMRI

    DEFF Research Database (Denmark)

    Churchill, Nathan W.; Yourganov, Grigori; Spring, Robyn

    2012-01-01

    challenge for identifying and removing such artifact. This paper presents a multivariate, data-driven method for the characterization and removal of physiological noise in fMRI data, termed PHYCAA (PHYsiological correction using Canonical Autocorrelation Analysis). The method identifies high frequency...

  12. Data-driven region-of-interest selection without inflating Type I error rate.

    Science.gov (United States)

    Brooks, Joseph L; Zoumpoulaki, Alexia; Bowman, Howard

    2017-01-01

    In ERP and other large multidimensional neuroscience data sets, researchers often select regions of interest (ROIs) for analysis. The method of ROI selection can critically affect the conclusions of a study by causing the researcher to miss effects in the data or to detect spurious effects. In practice, to avoid inflating the Type I error rate (i.e., false positives), ROIs are often based on a priori hypotheses or independent information. However, this can be insensitive to experiment-specific variations in effect location (e.g., latency shifts), reducing power to detect effects. Data-driven ROI selection, in contrast, is nonindependent and uses the data under analysis to determine ROI positions. Therefore, it has the potential to select ROIs based on experiment-specific information and increase power for detecting effects. However, data-driven methods have been criticized because they can substantially inflate the Type I error rate. Here, we demonstrate, using simulations of simple ERP experiments, that data-driven ROI selection can indeed be more powerful than a priori hypotheses or independent information. Furthermore, we show that data-driven ROI selection using the aggregate grand average from trials (AGAT), despite being based on the data at hand, can be safely used for ROI selection under many circumstances. However, when there is a noise difference between conditions, using the AGAT can inflate Type I error and should be avoided. We identify critical assumptions for use of the AGAT and provide a basis for researchers to use, and reviewers to assess, data-driven methods of ROI localization in ERP and other studies.
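    A minimal sketch of AGAT-based ROI selection as described: average every trial from every condition together, then center the ROI window on the grand-average peak. Function names and the window size are illustrative assumptions.

```python
import numpy as np

def agat_roi(trials, window=5):
    """Pick an ROI from the aggregate grand average (AGAT):
    average ALL trials from ALL conditions together, locate the peak
    sample, and return a window of +/- `window` samples around it.
    Condition comparisons are then restricted to this ROI."""
    agat = np.mean(trials, axis=0)         # average over every trial
    peak = int(np.argmax(np.abs(agat)))    # peak of the grand average
    lo = max(0, peak - window)
    hi = min(len(agat), peak + window + 1)
    return lo, hi

def roi_mean(trials, lo, hi):
    """Per-trial mean amplitude inside the selected ROI."""
    return np.mean(np.asarray(trials)[:, lo:hi], axis=1)
```

    Because the AGAT pools conditions, it is blind to the condition labels being compared, which is the intuition behind its safety under the assumptions the paper identifies.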

  13. 数据驱动系统方法概述%Notes on Data-driven System Approaches

    Institute of Scientific and Technical Information of China (English)

    许建新; 侯忠生

    2009-01-01

    In this paper, we present several considerations centered around the data-driven system approaches. We briefly explore three main issues: the evolving relationship between off-line and on-line data processing methods, the complementary relationship between the data-driven and model-based methods, and the perspectives of data-driven system approaches. Instead of offering solutions to data-driven system problems, which is impossible at the present level of knowledge and research, in this article we aim at categorizing and classifying open problems, exploring possible directions that may offer alternatives or potentials for the four key fields of interests: control, decision making, scheduling, and fault diagnosis.

  14. Dynamic Batch Bayesian Optimization

    CERN Document Server

    Azimi, Javad; Fern, Xiaoli

    2011-01-01

    Bayesian optimization (BO) algorithms try to optimize an unknown function that is expensive to evaluate using a minimum number of evaluations/experiments. Most of the proposed algorithms in BO are sequential, where only one experiment is selected at each iteration. This method can be time-inefficient when each experiment takes a long time and more than one experiment can be run concurrently. On the other hand, requesting a fixed-size batch of experiments at each iteration causes performance inefficiency in BO compared to the sequential policies. In this paper, we present an algorithm that asks for a batch of experiments at each time step t, where the batch size p_t is dynamically determined in each step. Our algorithm is based on the observation that the sequence of experiments selected by the sequential policy can sometimes be almost independent of each other. Our algorithm identifies such scenarios and requests those experiments at the same time without degrading the performance. We evaluate our proposed method us...

  15. Designing a Dynamic Data Driven Application System for Estimating Real-Time Load of DOC in a River

    Science.gov (United States)

    Ouyang, Y.; None

    2011-12-01

    Understanding the dynamics of naturally occurring dissolved organic carbon (DOC) in a river is central to estimating surface water quality, aquatic carbon cycling, and climate change. Currently, determination of DOC in surface water is primarily accomplished by manually collecting samples for laboratory analysis, which requires at least 24 hours. In other words, no effort has been devoted to monitoring real-time variations of DOC in a river due to the lack of suitable and/or cost-effective wireless sensors. However, when considering human health, carbon footprints, and the effects of urbanization, industry, and agriculture on water resource supply, timely DOC information may be critical. We have developed here a new paradigm, a dynamic data driven application system (DDDAS), for estimating the real-time load of DOC into a river. This DDDAS consisted of the following four components: (1) a Visual Basic (VB) program for downloading US Geological Survey real-time chlorophyll and discharge data; (2) a STELLA model for evaluating real-time DOC load based on the relationship between chlorophyll a, DOC, and river discharge; (3) a batch file for linking the VB program and STELLA model; and (4) a Microsoft Windows Scheduled Tasks wizard for executing the model and displaying output on a computer screen at selected times. Results show that the real-time load of DOC into the St. Johns River basin near Satsuma, Putnam County, Florida, USA varied over a range from -13,143 to 29,248 kg/h at the selected site. The negative loads occurred because of the back flow in the estuarine reach of the river. The cumulative load of DOC in the river for the selected site at the end of the simulation (178 hours) was about 1.2 tons. Our results support the utility of the DDDAS developed in this study for estimating the real-time variations of DOC in river ecosystems.
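    The load computation in component (2) presumably reduces to concentration times discharge. A sketch, with hypothetical regression coefficients linking chlorophyll a to DOC; the study's fitted relationship is not given here, so `a` and `b` are purely illustrative.

```python
def doc_load_kg_per_h(chlorophyll_ug_L, discharge_m3_s, a=0.12, b=1.8):
    """Estimate hourly DOC load from real-time gauge data.

    DOC (mg/L) is taken from a hypothetical linear regression on
    chlorophyll a: doc = a * chl + b (illustrative coefficients).
    Load = concentration * discharge, converted to kg/h; a negative
    discharge (tidal back flow) yields a negative load, as reported
    for the estuarine reach.
    """
    doc_mg_L = a * chlorophyll_ug_L + b
    # mg/L == g/m^3, so m^3/s * g/m^3 = g/s; g/s -> kg/h is * 3600 / 1000
    return doc_mg_L * discharge_m3_s * 3.6
```

    Scheduling this function against freshly downloaded gauge data is what turns the static rating into the "dynamic data driven" loop the paper describes.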

  16. Modelling of Batch Process Operations

    DEFF Research Database (Denmark)

    2011-01-01

    Here a batch cooling crystalliser is modelled and simulated as is a batch distillation system. In the batch crystalliser four operational modes of the crystalliser are considered, namely: initial cooling, nucleation, crystal growth and product removal. A model generation procedure is shown that s...

  17. An Open Framework for Dynamic Big-data-driven Application Systems (DBDDAS) Development

    KAUST Repository

    Douglas, Craig C.

    2014-06-06

    In this paper, we outline key features that dynamic data-driven application systems (DDDAS) have. A DDDAS is an application that has data assimilation that can change the models and/or scales of the computation, and in which the application controls the data collection based on the computational results. The term Big Data (BD) has come into being in recent years and is highly applicable to most DDDAS, since most applications use networks of sensors that generate an overwhelming amount of data over the lifespan of the application runs. We describe what a dynamic big-data-driven application system (DBDDAS) toolkit must have in order to provide all of the essential building blocks that are necessary to easily create new DDDAS without re-inventing the building blocks.

  18. Data-Driven Temporal Filtering on Teager Energy Time Trajectory for Robust Speech Recognition

    Institute of Scientific and Technical Information of China (English)

    ZHAO Jun-hui; XIE Xiang; KUANG Jing-ming

    2006-01-01

    Data-driven temporal filtering technique is integrated into the time trajectory of Teager energy operation (TEO) based feature parameters for improving the robustness of speech recognition systems against noise. Three kinds of data-driven temporal filters are investigated with the motivation of alleviating the harmful effects that environmental factors have on the speech. The filters include: principal component analysis (PCA) based filters, linear discriminant analysis (LDA) based filters, and minimum classification error (MCE) based filters. A detailed comparative analysis of these temporal filtering approaches applied in the Teager energy domain is presented. It is shown that while all of them can improve the recognition performance of the original TEO-based feature parameter in adverse environments, MCE-based temporal filtering provides the lowest error rate of all the algorithms as SNR decreases.
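    A data-driven temporal filter of the PCA variety can be sketched as follows: pool sliding context windows of feature time trajectories and take the leading principal component as FIR filter taps. This illustrates only the PCA case (the LDA and MCE filters additionally require class labels); all names are illustrative.

```python
import numpy as np

def pca_temporal_filter(trajectories, context=5):
    """Derive data-driven FIR taps (PCA variant): stack sliding
    context windows from feature time trajectories and take the
    leading principal component of their covariance as the filter."""
    windows = []
    for traj in trajectories:
        for t in range(len(traj) - context + 1):
            windows.append(traj[t:t + context])
    W = np.array(windows)
    W = W - W.mean(axis=0)
    cov = W.T @ W / len(W)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    return eigvecs[:, -1]                   # leading eigenvector = taps

def apply_filter(traj, taps):
    """Filter one feature trajectory with the learned taps."""
    return np.convolve(traj, taps[::-1], mode="valid")
```

    In a recognizer, each cepstral or Teager-energy trajectory would be passed through `apply_filter` before decoding.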

  19. Domain-oriented data-driven data mining:a new understanding for data mining

    Institute of Scientific and Technical Information of China (English)

    WANG Guo-yin; WANG Yan

    2008-01-01

    Recent advances in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies make it possible to gather and store incredible volumes of data. This creates unprecedented opportunities for large-scale knowledge discovery from databases. Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for processing large volumes of data, such as data analysis, decision making, etc. There are many researchers working on designing efficient data mining techniques, methods, and algorithms. Unfortunately, most data mining researchers pay much attention to technical problems of developing data mining models and methods, while paying little to basic issues of data mining. In this paper, we propose a new understanding of data mining, that is, the domain-oriented data-driven data mining (3DM) model. Some data-driven data mining algorithms developed in our lab are also presented to show its validity.

  20. A data-driven approach for quality assessment of radiologic interpretations.

    Science.gov (United States)

    Hsu, William; Han, Simon X; Arnold, Corey W; Bui, Alex At; Enzmann, Dieter R

    2016-04-01

    Given the increasing emphasis on delivering high-quality, cost-efficient healthcare, improved methodologies are needed to measure the accuracy and utility of ordered diagnostic examinations in achieving the appropriate diagnosis. Here, we present a data-driven approach for performing automated quality assessment of radiologic interpretations using other clinical information (e.g., pathology) as a reference standard for individual radiologists, subspecialty sections, imaging modalities, and entire departments. Downstream diagnostic conclusions from the electronic medical record are utilized as "truth" to which upstream diagnoses generated by radiology are compared. The described system automatically extracts and compares patient medical data to characterize concordance between clinical sources. Initial results are presented in the context of breast imaging, matching 18,101 radiologic interpretations with 301 pathology diagnoses and achieving a precision and recall of 84% and 92%, respectively. The presented data-driven method highlights the challenges of integrating multiple data sources and the application of information extraction tools to facilitate healthcare quality improvement.
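    The reported precision and recall reduce to standard counts over matched radiology/pathology pairs, with pathology as the reference standard. A generic sketch (not the system's extraction pipeline); the data layout is an assumption.

```python
def precision_recall(matches):
    """Concordance metrics for matched radiology/pathology pairs.

    `matches` is a list of (radiology_positive, pathology_positive)
    booleans; pathology serves as the reference standard ("truth").
    """
    tp = sum(1 for r, p in matches if r and p)        # concordant positives
    fp = sum(1 for r, p in matches if r and not p)    # radiology false alarms
    fn = sum(1 for r, p in matches if not r and p)    # missed by radiology
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```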

  1. Specification and Verification of Multi-user Data-Driven Web Applications

    Science.gov (United States)

    Marcus, Monica

    We propose a model for multi-user data-driven communicating Web applications. An arbitrary number of users may access the application concurrently through Web sites and Web services. A Web service may have an arbitrary number of instances. The interaction between users and Web application is data-driven. Synchronous communication is done by shared access to the database and global application state. Private information may be stored in a local state. Asynchronous communication is done by message passing. A version of first-order linear time temporal logic (LTL-FO) is proposed to express behavioral properties of Web applications. The model is used to formally specify a significant fragment of an e-business application. Some of its desirable properties are expressed as LTL-FO formulas. We study a decision problem, namely whether the model satisfies an LTL-FO formula. We show the undecidability of the unrestricted verification problem and discuss some restrictions that ensure decidability.

  2. Data Driven Modelling of the Dynamic Wake Between Two Wind Turbines

    DEFF Research Database (Denmark)

    Knudsen, Torben; Bak, Thomas

    2012-01-01

    This paper establishes flow models relating the wind speeds at turbines in a farm. So far, research in this area has been mainly based on first-principles static models, and the data-driven modelling done has not included the loading of the upwind turbine and its impact on the wind speed at the downwind turbine. This paper is the first where modern commercial megawatt turbines are used for data-driven modelling including the upwind turbine loading by changing power reference. Obtaining the necessary data is difficult and data is therefore limited. A simple dynamic extension to the Jensen wake model is tested without much success. The best model turns out to be non-linear, with upwind turbine loading and wind speed as inputs. Using a transformation of these inputs it is possible to obtain a linear model and use well-proven system identification methods. Finally it is shown that including the upwind wind...
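    The static Jensen wake model that the paper extends can be sketched as a top-hat velocity deficit that decays with downstream distance. The default thrust coefficient and decay constant below are illustrative assumptions, not values from the paper.

```python
def jensen_wake_speed(u0, x, rotor_radius, ct=0.8, k=0.05):
    """Static Jensen (top-hat) wake model: wind speed at distance x
    directly downstream of a turbine with thrust coefficient ct.
    The wake radius expands linearly with decay constant k."""
    a = 0.5 * (1 - (1 - ct) ** 0.5)              # axial induction factor
    deficit = 2 * a / (1 + k * x / rotor_radius) ** 2
    return u0 * (1 - deficit)
```

    The deficit is largest immediately behind the rotor and recovers toward the free-stream speed u0 as x grows, which is the behavior the dynamic extension tries to capture over time.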

  3. Properties and Data-driven Design of Perceptual Reasoning Method Based Linguistic Dynamic Systems

    Institute of Scientific and Technical Information of China (English)

    LI Cheng-Dong; ZHANG Gui-Qing; WANG Hui-Dong; REN Wei-Na

    2014-01-01

    The linguistic dynamic systems (LDSs) based on type-1 fuzzy sets can provide a powerful tool for modeling, analysis, evaluation and control of complex systems. However, as pointed out in earlier studies, it is much more reasonable to take type-2 fuzzy sets to model the existing uncertainties of linguistic words. In this paper, the LDS based on type-2 fuzzy sets is studied, and its reasoning process is realized through the perceptual reasoning method. The properties of the perceptual reasoning method based LDS (PR-LDS) are explored. These properties demonstrate that the output of the PR-LDS is intuitive and that the computational complexity can be reduced when the consequent type-2 fuzzy numbers in the rule base satisfy some conditions. Further, a data-driven method for the design of the PR-LDS is provided. Finally, the effectiveness and rationality of the proposed data-driven method are verified by an example.

  4. CT Image Reconstruction by Spatial-Radon Domain Data-Driven Tight Frame Regularization

    CERN Document Server

    Zhan, Ruohan

    2016-01-01

    This paper proposes a spatial-Radon domain CT image reconstruction model based on data-driven tight frames (SRD-DDTF). The proposed SRD-DDTF model combines the idea of the joint image and Radon domain inpainting model of Dong et al. (2013) with that of the data-driven tight frames for image denoising of Cai et al. (2014). It differs from existing models in that both the CT image and its corresponding high-quality projection image are reconstructed simultaneously, using sparsity priors by tight frames that are adaptively learned from the data to provide optimal sparse approximations. An alternating minimization algorithm is designed to solve the proposed model, which is nonsmooth and nonconvex. Convergence analysis of the algorithm is provided. Numerical experiments showed that the SRD-DDTF model is superior to the model of Dong et al. (2013), especially in recovering some subtle structures in the images.

  5. Demo Abstract: Toward Data-driven Demand-Response Optimization in a Campus Microgrid

    Energy Technology Data Exchange (ETDEWEB)

    Aman, Saima; Natarajan, Sreedhar; Yin, Wei; Zhou, Qunzhi; Simmhan, Yogesh; Prasanna, Viktor

    2011-11-01

    We describe and demonstrate a prototype software architecture to support data-driven demand-response (DR) optimization in the USC campus microgrid, as part of the Los Angeles Smart Grid Demonstration Project. The architecture includes a semantic information repository that integrates diverse data sources to support DR, demand forecasting using scalable machine-learned models, and detection of load curtailment opportunities by matching complex event patterns.

  6. Using Two Different Approaches to Assess Dietary Patterns: Hypothesis-Driven and Data-Driven Analysis

    Directory of Open Access Journals (Sweden)

    Ágatha Nogueira Previdelli

    2016-09-01

    Full Text Available The use of dietary patterns to assess dietary intake has become increasingly common in nutritional epidemiology studies due to the complexity and multidimensionality of the diet. Currently, two main approaches have been widely used to assess dietary patterns: data-driven and hypothesis-driven analysis. Since the methods explore different angles of dietary intake, using both approaches simultaneously might yield complementary and useful information; thus, we aimed to use both approaches to gain knowledge of adolescents’ dietary patterns. Food intake from a cross-sectional survey with 295 adolescents was assessed by 24 h dietary recall (24HR). In hypothesis-driven analysis, based on the American National Cancer Institute method, the usual intake of Brazilian Healthy Eating Index Revised components was estimated. In the data-driven approach, the usual intake of foods/food groups was estimated by the Multiple Source Method. In the results, hypothesis-driven analysis showed low scores for Whole grains, Total vegetables, Total fruit and Whole fruits, while, in data-driven analysis, fruits and whole grains were not present in any pattern. High intakes of sodium, fats and sugars were observed in hypothesis-driven analysis, with low total scores for the Sodium, Saturated fat and SoFAA (calories from solid fat, alcohol and added sugar) components in agreement, while the data-driven approach showed the intake of several foods/food groups rich in these nutrients, such as butter/margarine, cookies, chocolate powder, whole milk, cheese, processed meat/cold cuts and candies. In this study, using both approaches at the same time provided consistent and complementary information for assessing overall dietary habits, which will be important for driving public health programs and improving their efficiency in monitoring and evaluating the dietary patterns of populations.

  7. Image Resolution Enhancement via Data-Driven Parametric Models in the Wavelet Space

    OpenAIRE

    2007-01-01

    We present a data-driven, project-based algorithm which enhances image resolution by extrapolating high-band wavelet coefficients. High-resolution images are reconstructed by alternating the projections onto two constraint sets: the observation constraint defined by the given low-resolution image and the prior constraint derived from the training data at the high resolution (HR). Two types of prior constraints are considered: spatially homogeneous constraint suitable for texture images and p...

  8. Using Two Different Approaches to Assess Dietary Patterns: Hypothesis-Driven and Data-Driven Analysis

    Science.gov (United States)

    Previdelli, Ágatha Nogueira; de Andrade, Samantha Caesar; Fisberg, Regina Mara; Marchioni, Dirce Maria

    2016-01-01

    The use of dietary patterns to assess dietary intake has become increasingly common in nutritional epidemiology studies due to the complexity and multidimensionality of the diet. Currently, two main approaches have been widely used to assess dietary patterns: data-driven and hypothesis-driven analysis. Since the methods explore different angles of dietary intake, using both approaches simultaneously might yield complementary and useful information; thus, we aimed to use both approaches to gain knowledge of adolescents’ dietary patterns. Food intake from a cross-sectional survey with 295 adolescents was assessed by 24 h dietary recall (24HR). In hypothesis-driven analysis, based on the American National Cancer Institute method, the usual intake of Brazilian Healthy Eating Index Revised components was estimated. In the data-driven approach, the usual intake of foods/food groups was estimated by the Multiple Source Method. In the results, hypothesis-driven analysis showed low scores for Whole grains, Total vegetables, Total fruit and Whole fruits, while, in data-driven analysis, fruits and whole grains were not present in any pattern. High intakes of sodium, fats and sugars were observed in hypothesis-driven analysis, with low total scores for the Sodium, Saturated fat and SoFAA (calories from solid fat, alcohol and added sugar) components in agreement, while the data-driven approach showed the intake of several foods/food groups rich in these nutrients, such as butter/margarine, cookies, chocolate powder, whole milk, cheese, processed meat/cold cuts and candies. In this study, using both approaches at the same time provided consistent and complementary information for assessing overall dietary habits, which will be important for driving public health programs and improving their efficiency in monitoring and evaluating the dietary patterns of populations. PMID:27669289

  9. Data Driven Marketing in Apple and Back to School Campaign 2011

    OpenAIRE

    Bernátek, Martin

    2011-01-01

    From the campaign analysis, the most important contribution is that data-driven marketing makes sense only once it is already part of the marketing plan. The team preparing the marketing plan defines the goals and sets the proper measurement matrix according to those goals. This makes it possible to adjust the marketing plan to extract more value, monitor the execution, make adjustments if necessary, and evaluate the campaign at its end.

  10. Data-Driven Lead-Acid Battery Prognostics Using Random Survival Forests

    Science.gov (United States)

    2014-10-02

    Data-Driven Lead-Acid Battery Prognostics Using Random Survival Forests. Erik Frisk, Mattias Krysander, and Emil Larsson, Department of … A data-driven approach using random survival forests is proposed where the prognostic algorithm has access to fleet management data including 291 variables … but it is also used to, for example, power auxiliary units such as heating and kitchen …

  12. Dynamic Data Driven Applications Systems (DDDAS) modeling for automatic target recognition

    Science.gov (United States)

    Blasch, Erik; Seetharaman, Guna; Darema, Frederica

    2013-05-01

    The Dynamic Data Driven Applications System (DDDAS) concept uses applications modeling, mathematical algorithms, and measurement systems to work with dynamic systems. A dynamic system such as Automatic Target Recognition (ATR) is subject to sensor, target, and environment variations over space and time. We use the DDDAS concept to develop an ATR methodology for multiscale-multimodal analysis that seeks to integrate sensing, processing, and exploitation. In the analysis, we use computer vision techniques to explore the capabilities and analogies that DDDAS has with information fusion. The key attribute of coordination is the use of sensor management as a data-driven technique to improve performance. In addition, DDDAS supports the need for modeling, from which uncertainty and variations are used within the dynamic models for advanced performance. As an example, we use a Wide-Area Motion Imagery (WAMI) application to draw parallels and contrasts between ATR and DDDAS systems that warrant an integrated perspective. This elementary work is aimed at triggering a sequence of deeper, insightful research toward exploiting sparsely sampled piecewise dense WAMI measurements - an application where the challenges of big data with regard to mathematical fusion relationships and high-performance computations remain significant and will persist. Dynamic data-driven adaptive computations are required to effectively handle the challenges of exponentially increasing data volume for advanced information fusion system solutions such as simultaneous target tracking and ATR.

  13. Data-driven non-linear elasticity: constitutive manifold construction and problem discretization

    Science.gov (United States)

    Ibañez, Ruben; Borzacchiello, Domenico; Aguado, Jose Vicente; Abisset-Chavanne, Emmanuelle; Cueto, Elias; Ladeveze, Pierre; Chinesta, Francisco

    2017-07-01

    The use of constitutive equations calibrated from data has been implemented into standard numerical solvers to successfully address a variety of problems encountered in simulation-based engineering sciences (SBES). However, complexity continues to increase due to the need for increasingly detailed models as well as the use of engineered materials. Data-driven simulation constitutes a potential change of paradigm in SBES. Standard simulation in computational mechanics is based on the use of two very different types of equations. The first, of axiomatic character, is related to balance laws (momentum, mass, energy, ...), whereas the second consists of models that scientists have extracted from collected data, either natural or synthetic. Data-driven (or data-intensive) simulation consists of directly linking experimental data to computers in order to perform numerical simulations. These simulations employ laws universally recognized as epistemic, while minimizing the need for explicit, often phenomenological, models. The main drawback of such an approach is the large amount of required data, some of which is inaccessible with today's testing facilities. This difficulty can be circumvented in many cases, and in any case alleviated, by considering complex tests, collecting as many data as possible, and then using a data-driven inverse approach to generate the whole constitutive manifold from a few complex experimental tests, as discussed in the present work.

  14. Data-Driven User Feedback: An Improved Neurofeedback Strategy considering the Interindividual Variability of EEG Features

    Science.gov (United States)

    Lim, Jeong-Hwan; Lee, Jun-Hak; Kim, Kangsan

    2016-01-01

    It has frequently been reported that some users of conventional neurofeedback systems can experience only a small portion of the total feedback range due to the large interindividual variability of EEG features. In this study, we proposed a data-driven neurofeedback strategy considering the individual variability of electroencephalography (EEG) features to permit users of the neurofeedback system to experience a wider range of auditory or visual feedback without a customization process. The main idea of the proposed strategy is to adjust the ranges of each feedback level using the density in the offline EEG database acquired from a group of individuals. Twenty-two healthy subjects participated in offline experiments to construct an EEG database, and five subjects participated in online experiments to validate the performance of the proposed data-driven user feedback strategy. Using the optimized bin sizes, the number of feedback levels that each individual experienced was significantly increased to 139% and 144% of the original results with uniform bin sizes in the offline and online experiments, respectively. Our results demonstrated that the use of our data-driven neurofeedback strategy could effectively increase the overall range of feedback levels that each individual experienced during neurofeedback training. PMID:27631005
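
    The core of the strategy above can be sketched as density-based binning: instead of dividing the feature range into uniform intervals, the feedback-level boundaries are placed at quantiles of the offline EEG feature database, so every level covers an equal share of observed values. A minimal sketch (the function names and the use of simple quantiles are illustrative assumptions, not the paper's exact procedure):

```python
import statistics

def uniform_bins(lo, hi, n_levels):
    """Uniform bin edges over the feature range (the conventional scheme)."""
    step = (hi - lo) / n_levels
    return [lo + i * step for i in range(n_levels + 1)]

def density_bins(offline_features, n_levels):
    """Equal-density bin edges: quantiles of the offline feature database,
    so every feedback level covers the same share of observed values."""
    inner = statistics.quantiles(offline_features, n=n_levels)
    return [min(offline_features)] + inner + [max(offline_features)]

def feedback_level(x, edges):
    """Map a feature value to a feedback level 0..n_levels-1."""
    for i in range(len(edges) - 1):
        if x <= edges[i + 1]:
            return i
    return len(edges) - 2
```

    With a skewed feature distribution, uniform bins leave most levels rarely visited, while the quantile edges spread the experienced values evenly across levels.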

  15. Data-driven matched field processing for Lamb wave structural health monitoring.

    Science.gov (United States)

    Harley, Joel B; Moura, José M F

    2014-03-01

    Matched field processing is a model-based framework for localizing targets in complex propagation environments. In underwater acoustics, it has been extensively studied for improving localization performance in multimodal and multipath media. For guided wave structural health monitoring problems, matched field processing has not been widely applied but is an attractive option for damage localization due to equally complex propagation environments. Although effective, matched field processing is often challenging to implement because it requires accurate models of the propagation environment, and the optimization methods used to generate these models are often unreliable and computationally expensive. To address these obstacles, this paper introduces data-driven matched field processing, a framework to build models of multimodal propagation environments directly from measured data, and then use these models for localization. This paper presents the data-driven framework, analyzes its behavior under unmodeled multipath interference, and demonstrates its localization performance by distinguishing two nearby scatterers from experimental measurements of an aluminum plate. Compared with delay-based models that are commonly used in structural health monitoring, the data-driven matched field processing framework is shown to successfully localize two nearby scatterers with significantly smaller localization errors and finer resolutions.
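
    The essence of matched field processing is to compare a measured array snapshot against replica fields for a grid of candidate source locations and take the best match; in the data-driven variant, the replicas come from measured training data rather than a physics-based propagation model. A toy real-valued sketch of a Bartlett-style matcher (the replica dictionary and real-valued signals are simplifying assumptions; actual implementations use complex pressure fields across frequencies):

```python
import math

def bartlett_power(measured, replica):
    """Normalized Bartlett correlation: <d, r>^2 / (|d|^2 |r|^2)."""
    dot = sum(m * r for m, r in zip(measured, replica))
    nd = math.sqrt(sum(m * m for m in measured))
    nr = math.sqrt(sum(r * r for r in replica))
    return (dot / (nd * nr)) ** 2

def localize(measured, replicas):
    """Pick the candidate location whose (data-derived) replica best matches
    the measured snapshot -- the peak of the ambiguity surface."""
    return max(replicas, key=lambda loc: bartlett_power(measured, replicas[loc]))
```

    Here `replicas` maps each candidate location to a replica vector extracted from training measurements, which is the step the data-driven framework supplies.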

  16. Using data-driven model-brain mappings to constrain formal models of cognition.

    Directory of Open Access Journals (Sweden)

    Jelmer P Borst

    Full Text Available In this paper we propose a method to create data-driven mappings from components of cognitive models to brain regions. Cognitive models are notoriously hard to evaluate, especially based on behavioral measures alone. Neuroimaging data can provide additional constraints, but this requires a mapping from model components to brain regions. Although such mappings can be based on the experience of the modeler or on a reading of the literature, a formal method is preferred to prevent researcher-based biases. In this paper we used model-based fMRI analysis to create a data-driven model-brain mapping for five modules of the ACT-R cognitive architecture. We then validated this mapping by applying it to two new datasets with associated models. The new mapping was at least as powerful as an existing mapping that was based on the literature, and indicated where the models were supported by the data and where they have to be improved. We conclude that data-driven model-brain mappings can provide strong constraints on cognitive models, and that model-based fMRI is a suitable way to create such mappings.

  17. KNMI DataLab experiences in serving data-driven innovations

    Science.gov (United States)

    Noteboom, Jan Willem; Sluiter, Raymond

    2016-04-01

    Climate change research and innovations in weather forecasting rely more and more on (Big) data. Besides increasing data from traditional sources (such as observation networks, radars and satellites), the use of open data, crowd-sourced data and the Internet of Things (IoT) is emerging. To deploy these data sources optimally in our services and products, KNMI has established a DataLab to serve data-driven innovations in collaboration with public and private sector partners. Big data management, data integration, data analytics including machine learning, and data visualization techniques play an important role in the DataLab. Cross-domain data-driven innovations that arise from public-private collaborative projects and research programmes can be explored, tested and/or piloted by the KNMI DataLab. Furthermore, advice can be requested on (Big) data techniques and data sources. In support of collaborative (Big) data science activities, scalable environments are offered with facilities for data integration, data analysis and visualization. In addition, data science expertise is provided directly or from a pool of internal and external experts. At the EGU conference, we present experiences gained and best practices in operating the KNMI DataLab to optimally serve data-driven innovations for weather and climate applications.

  18. Data Use: Data-Driven Decision Making Takes a Big-Picture View of the Needs of Teachers and Students

    Science.gov (United States)

    Bernhardt, Victoria L.

    2009-01-01

    Data-driven decision making is the process of using data to inform decisions to improve teaching and learning. Schools typically engage in two kinds of data-driven decision making--at the school level and at the classroom level. The first leads to the second. In this article, the author describes how Marylin Avenue Elementary School successfully…

  19. Data-driven methods to improve baseflow prediction of a regional groundwater model

    Science.gov (United States)

    Xu, Tianfang; Valocchi, Albert J.

    2015-12-01

    Physically-based models of groundwater flow are powerful tools for water resources assessment under varying hydrologic, climate and human development conditions. One of the most important topics of investigation is how these conditions will affect the discharge of groundwater to rivers and streams (i.e. baseflow). Groundwater flow models are based upon discretized solution of mass balance equations, and contain important hydrogeological parameters that vary in space and cannot be measured. Common practice is to use least squares regression to estimate parameters and to infer prediction and associated uncertainty. Nevertheless, the unavoidable uncertainty associated with physically-based groundwater models often results in both aleatoric and epistemic model calibration errors, thus violating a key assumption for regression-based parameter estimation and uncertainty quantification. We present a complementary data-driven modeling and uncertainty quantification (DDM-UQ) framework to improve predictive accuracy of physically-based groundwater models and to provide more robust prediction intervals. First, we develop data-driven models (DDMs) based on statistical learning techniques to correct the bias of the calibrated groundwater model. Second, we characterize the aleatoric component of groundwater model residual using both parametric and non-parametric distribution estimation methods. We test the complementary data-driven framework on a real-world case study of the Republican River Basin, where a regional groundwater flow model was developed to assess the impact of groundwater pumping for irrigation. Compared to using only the flow model, DDM-UQ provides more accurate monthly baseflow predictions. In addition, DDM-UQ yields prediction intervals with coverage probability consistent with validation data. The DDM-UQ framework is computationally efficient and is expected to be applicable to many geoscience models for which model structural error is not negligible.

  20. Jet Substructure Templates: Data-driven QCD Backgrounds for Fat Jet Searches

    CERN Document Server

    Cohen, Timothy; Lisanti, Mariangela; Lou, Hou Keong; Wacker, Jay G

    2014-01-01

    QCD is often the dominant background to new physics searches for which jet substructure provides a useful handle. Due to the challenges associated with modeling this background, data-driven approaches are necessary. This paper presents a novel method for determining QCD predictions using templates -- probability distribution functions for jet substructure properties as a function of kinematic inputs. Templates can be extracted from a control region and then used to compute background distributions in the signal region. Using Monte Carlo, we illustrate the procedure with two case studies and show that the template approach effectively models the relevant QCD background. This work strongly motivates the application of these techniques to LHC data.

  1. Jet substructure templates: data-driven QCD backgrounds for fat jet searches

    Energy Technology Data Exchange (ETDEWEB)

    Cohen, Timothy [Theory Group, SLAC National Accelerator Laboratory, Menlo Park, CA 94025 (United States); Jankowiak, Martin [Institut für Theoretische Physik, Universität Heidelberg, 69120 Heidelberg (Germany); Lisanti, Mariangela; Lou, Hou Keong [Physics Department, Princeton University, Princeton, NJ 08544 (United States); Wacker, Jay G. [Theory Group, SLAC National Accelerator Laboratory, Menlo Park, CA 94025 (United States)

    2014-05-05

    QCD is often the dominant background to new physics searches for which jet substructure provides a useful handle. Due to the challenges associated with modeling this background, data-driven approaches are necessary. This paper presents a novel method for determining QCD predictions using templates — probability distribution functions for jet substructure properties as a function of kinematic inputs. Templates can be extracted from a control region and then used to compute background distributions in the signal region. Using Monte Carlo, we illustrate the procedure with two case studies and show that the template approach effectively models the relevant QCD background. This work strongly motivates the application of these techniques to LHC data.
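
    The template idea can be made concrete: bin the control region in a kinematic variable (e.g. jet pT), record a normalized histogram of the substructure observable (e.g. jet mass) in each kinematic bin, and sum the pT-matched templates over signal-region jets to predict the background shape. A minimal sketch (variable names and the simple one-dimensional binning are illustrative assumptions, not the paper's full procedure):

```python
from collections import defaultdict

def bin_index(x, edges):
    """Index of the histogram bin containing x, or None if out of range."""
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            return i
    return None

def build_templates(control, pt_edges, mass_edges):
    """One normalized jet-mass histogram (a template PDF) per pT bin,
    extracted from (pt, mass) pairs in the control region."""
    counts = defaultdict(lambda: [0] * (len(mass_edges) - 1))
    for pt, mass in control:
        ip, im = bin_index(pt, pt_edges), bin_index(mass, mass_edges)
        if ip is not None and im is not None:
            counts[ip][im] += 1
    return {ip: [c / sum(h) for c in h] for ip, h in counts.items()}

def predict_background(signal_pts, templates, pt_edges, n_mass_bins):
    """Predicted QCD mass distribution in the signal region: the sum of each
    jet's pT-matched template."""
    pred = [0.0] * n_mass_bins
    for pt in signal_pts:
        ip = bin_index(pt, pt_edges)
        if ip in templates:
            pred = [p + t for p, t in zip(pred, templates[ip])]
    return pred
```

    The key property is that the templates are extracted entirely from a control region, so no QCD Monte Carlo modeling enters the signal-region prediction.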

  2. Formal techniques for a data-driven certification of advanced railway signalling systems

    DEFF Research Database (Denmark)

    Fantechi, Alessandro

    2016-01-01

    The technological evolution of railway signalling equipment promises significant increases in transport capacity, operation regularity, and the quality and safety of the service offered. This evolution is based on the massive use of computer control units on board trains and on the ground, that aims … to advocate the adoption of a novel, data-driven safety certification approach, based on formal verification techniques, focusing on the desired attributes of the exchanged information. A discussion of this issue is presented, based on some initial observations of the needed concepts.

  3. Introducing the new GRASS module g.infer for data-driven rule-based applications

    Directory of Open Access Journals (Sweden)

    Peter Löwe

    2012-10-01

    Full Text Available This paper introduces the new GRASS GIS add-on module g.infer. The module enables rule-based analysis and workflow management in GRASS GIS, via data-driven inference processes based on the expert system shell CLIPS. The paper discusses the theoretical and developmental background that will help prepare the reader to use the module for Knowledge Engineering applications. In addition, potential application scenarios are sketched out, ranging from the rule-driven formulation of nontrivial GIS-classification tasks and GIS workflows to ontology management and intelligent software agents.

  4. A data-driven model for maximization of methane production in a wastewater treatment plant.

    Science.gov (United States)

    Kusiak, Andrew; Wei, Xiupeng

    2012-01-01

    A data-driven approach for maximization of methane production in a wastewater treatment plant is presented. Industrial data collected on a daily basis was used to build the model. Temperature, total solids, volatile solids, detention time and pH value were selected as parameters for the model construction. First, a prediction model of methane production was built by a multi-layer perceptron neural network. Then a particle swarm optimization algorithm was used to maximize methane production based on the model developed in this research. The model resulted in a 5.5% increase in methane production.
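
    The optimization step described above can be sketched end to end: a surrogate model predicts methane production from operating parameters, and particle swarm optimization searches the bounded parameter space for the settings that maximize the prediction. Here a hand-made quadratic surrogate stands in for the trained multi-layer perceptron, and the parameter names, bounds and optimum are purely hypothetical:

```python
import random

def pso_maximize(f, bounds, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=1):
    """Minimal particle swarm: each particle tracks its personal best; the swarm
    shares a global best that all particles are pulled toward."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = f(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Hypothetical surrogate standing in for the trained neural network: methane
# yield as a function of (temperature, pH), peaking at 35 C and pH 7.
def surrogate(x):
    t, ph = x
    return 100 - 0.5 * (t - 35.0) ** 2 - 4.0 * (ph - 7.0) ** 2

best, val = pso_maximize(surrogate, bounds=[(20.0, 60.0), (5.0, 9.0)])
```

    Swapping in a real trained predictor only requires replacing `surrogate`.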

  5. Data-driven Discovery: A New Era of Exploiting the Literature and Data

    Directory of Open Access Journals (Sweden)

    Ying Ding

    2016-11-01

    Full Text Available In the current data-intensive era, the traditional hands-on method of conducting scientific research by exploring related publications to generate a testable hypothesis is well on its way to becoming obsolete within just a year or two. Analyzing the literature and data to automatically generate a hypothesis might become the de facto approach to inform the core research efforts of those trying to master the exponentially rapid expansion of publications and datasets. Here, viewpoints are provided and discussed to aid understanding of the challenges of data-driven discovery.

  6. FaultBuster: data driven fault detection and diagnosis for industrial systems

    DEFF Research Database (Denmark)

    Bergantino, Nicola; Caponetti, Fabio; Longhi, Sauro

    2009-01-01

    Efficient and reliable monitoring systems are mandatory to assure the required security standards in industrial complexes. This paper describes the recent developments of FaultBuster, a purely data-driven diagnostic system. It is designed to be easily scalable to different monitoring tasks … Multivariate statistical models based on principal components are used to detect abnormal situations. Tailored to alarms, a probabilistic inference engine processes the fault evidence to output the most probable diagnosis. Results from the DX 09 Diagnostic Challenge showed strong detection properties, while …

  7. A research about data-driven simulation approach for Rolls-Royce 150 seater engine supply chain

    OpenAIRE

    Shi, Shuaijie

    2007-01-01

    This dissertation is about a data-driven simulation approach applied to a supply chain improvement project for Rolls-Royce. Rolls-Royce is now planning to design an engine for the 150-seat Boeing 737. One of Boeing's requirements is a lead time of less than 65 days. Compared with the current 2-year lead time, this is a big challenge for Rolls-Royce's supply chain. A data-driven simulation method is applied in this article to solve this problem. The model of the 150-seater engine supply chain is built by data-driven simulat...

  8. A Data-Driven Approach to Reverse Engineering Customer Engagement Models: Towards Functional Constructs

    Science.gov (United States)

    de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo

    2014-01-01

    Online consumer behavior in general, and online customer engagement with brands in particular, has become a major focus of research activity, fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven, and much debate about the concept of Customer Engagement and its related constructs remains in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behavior using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The ‘communities’ of questionnaire items that emerge from our community detection method form possible ‘functional constructs’ inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such ‘functional constructs’, suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling. PMID:25036766

  9. Data-driven MHD simulation of a solar eruption observed in NOAA Active Region 12158

    Science.gov (United States)

    Lee, Hwanhee; Magara, Tetsuya; Kang, Jihye

    2017-08-01

    We present a data-driven magnetohydrodynamic (MHD) simulation of a solar eruption in which the dynamics of a background solar wind is incorporated. The background solar wind exists in the real solar atmosphere and continuously transports magnetized plasma toward interplanetary space, suggesting that it may play a role in producing a solar eruption. We perform a simulation for NOAA AR 12158, which was accompanied by an X1.6-class flare and CME on 2014 September 10. We construct a magnetohydrostatic state used as the initial state of the data-driven simulation, composed of a nonlinear force-free field (NLFFF) derived from observational data of the photospheric vector magnetic field and a hydrostatic atmosphere with prescribed distributions of temperature and gravity. We then reduce the gas pressure well above the solar surface to drive a solar wind. As a result, the magnetic field gradually evolves during an early phase, and eventually an eruption is observed. To figure out what causes the transition from gradual evolution to eruption, we analyze the temporal development of the force distribution and the geometrical shape of magnetic field lines. The result suggests that the curvature and the scale height of a coronal magnetic field play an important role in determining its dynamic state.

  10. Data driven approaches vs. qualitative approaches in climate change impact and vulnerability assessment.

    Science.gov (United States)

    Zebisch, Marc; Schneiderbauer, Stefan; Petitta, Marcello

    2015-04-01

    In the last decade the scope of climate change science has broadened significantly. Fifteen years ago the focus was mainly on understanding climate change, providing climate change scenarios and giving ideas about potential climate change impacts. Today, adaptation to climate change has become an increasingly important field of politics, and one role of science is to inform and advise this process. Therefore, climate change science no longer focuses on data-driven approaches only (such as climate or climate impact models) but is progressively applying and relying on qualitative approaches, including opinion and expertise acquired through interactive processes with local stakeholders and decision makers. Furthermore, climate change science is facing the challenge of normative questions, such as 'how important is a decrease of yield in a developed country where agriculture represents only 3% of GDP and the supply of agricultural products is strongly linked to global markets and less dependent on local production?'. In this talk we will present examples from various applied research and consultancy projects on climate change vulnerabilities, ranging from data-driven methods (e.g. remote sensing and modelling) to semi-quantitative and qualitative assessment approaches. Furthermore, we will discuss bottlenecks, pitfalls and opportunities in transferring climate change science to policy- and decision-maker-oriented climate services.

  11. An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models

    Science.gov (United States)

    Taormina, R.; Galelli, S.; Karakaya, G.; Ahipasaoglu, S. D.

    2016-11-01

    This work investigates the uncertainty associated with the presence of multiple subsets of predictors yielding data-driven models with the same, or similar, predictive accuracy. To handle this uncertainty effectively, we introduce a novel input variable selection algorithm, called Wrapper for Quasi-Equally Informative Subset Selection (W-QEISS), specifically conceived to identify all alternate subsets of predictors in a given dataset. The search process is based on a four-objective optimization problem that minimizes the number of selected predictors, maximizes the predictive accuracy of a data-driven model and optimizes two information-theoretic metrics of relevance and redundancy, which guarantee that the selected subsets are highly informative and have little intra-subset similarity. The algorithm is first tested on two synthetic test problems and then demonstrated on a real-world streamflow prediction problem in the Yampa River catchment (US). Results show that complex hydro-meteorological datasets are characterized by a large number of alternate subsets of predictors, which provides useful insights into the underlying physical processes. Furthermore, the presence of multiple subsets of predictors (and associated models) helps find a better trade-off between the different measures of predictive accuracy commonly adopted for hydrological modelling problems.
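    The relevance and redundancy objectives named above can be illustrated with a simple histogram-based mutual-information estimate. This is a sketch under assumed definitions (mean predictor-target MI for relevance, mean pairwise predictor MI for redundancy); W-QEISS itself embeds such metrics in a four-objective wrapper search.

    ```python
    import numpy as np

    def mutual_info(x, y, bins=8):
        """Histogram-based mutual information estimate (in nats)."""
        pxy, _, _ = np.histogram2d(x, y, bins=bins)
        pxy /= pxy.sum()
        px = pxy.sum(axis=1, keepdims=True)   # marginal of x
        py = pxy.sum(axis=0, keepdims=True)   # marginal of y
        nz = pxy > 0
        return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

    def score_subset(X, y, subset):
        """Relevance: mean MI(predictor, target). Redundancy: mean pairwise MI
        among the selected predictors (0 for singleton subsets)."""
        rel = np.mean([mutual_info(X[:, j], y) for j in subset])
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        red = np.mean([mutual_info(X[:, a], X[:, b]) for a, b in pairs]) if pairs else 0.0
        return rel, red
    ```

    A subset containing two near-duplicate predictors scores high on redundancy, while a subset mixing an informative and an irrelevant predictor scores lower on both metrics.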

  12. Data-driven HR how to use analytics and metrics to drive performance

    CERN Document Server

    Marr, Bernard

    2018-01-01

    Traditionally seen as a purely people function unconcerned with numbers, HR is now uniquely placed to use company data to drive performance, both of the people in the organization and the organization as a whole. Data-driven HR is a practical guide which enables HR practitioners to leverage the value of the vast amount of data available at their fingertips. Covering how to identify the most useful sources of data, how to collect information in a transparent way that is in line with data protection requirements and how to turn this data into tangible insights, this book marks a turning point for the HR profession. Covering all the key elements of HR including recruitment, employee engagement, performance management, wellbeing and training, Data-driven HR examines the ways data can contribute to organizational success by, among other things, optimizing processes, driving performance and improving HR decision making. Packed with case studies and real-life examples, this is essential reading for all HR profession...

  13. Testing the Utility of a Data-Driven Approach for Assessing BMI from Face Images.

    Directory of Open Access Journals (Sweden)

    Karin Wolffhechel

    Several lines of evidence suggest that facial cues of adiposity may be important for human social interaction. However, tests for quantifiable cues of body mass index (BMI) in the face have examined only a small number of facial proportions and these proportions were found to have relatively low predictive power. Here we employed a data-driven approach in which statistical models were built using principal components (PCs) derived from objectively defined shape and color characteristics in face images. The predictive power of these models was then compared with models based on previously studied facial proportions (perimeter-to-area ratio, width-to-height ratio, and cheek-to-jaw width). Models based on 2D shape-only PCs, color-only PCs, and 2D shape and color PCs combined each performed significantly and substantially better than models based on one or more of the previously studied facial proportions. A non-linear PC model considering both 2D shape and color PCs was the best predictor of BMI. These results highlight the utility of a "bottom-up", data-driven approach for assessing BMI from face images.

  14. Testing the Utility of a Data-Driven Approach for Assessing BMI from Face Images.

    Science.gov (United States)

    Wolffhechel, Karin; Hahn, Amanda C; Jarmer, Hanne; Fisher, Claire I; Jones, Benedict C; DeBruine, Lisa M

    2015-01-01

    Several lines of evidence suggest that facial cues of adiposity may be important for human social interaction. However, tests for quantifiable cues of body mass index (BMI) in the face have examined only a small number of facial proportions and these proportions were found to have relatively low predictive power. Here we employed a data-driven approach in which statistical models were built using principal components (PCs) derived from objectively defined shape and color characteristics in face images. The predictive power of these models was then compared with models based on previously studied facial proportions (perimeter-to-area ratio, width-to-height ratio, and cheek-to-jaw width). Models based on 2D shape-only PCs, color-only PCs, and 2D shape and color PCs combined each performed significantly and substantially better than models based on one or more of the previously studied facial proportions. A non-linear PC model considering both 2D shape and color PCs was the best predictor of BMI. These results highlight the utility of a "bottom-up", data-driven approach for assessing BMI from face images.

  15. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization.

    Directory of Open Access Journals (Sweden)

    Huan-Kai Peng

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications.

  16. Data-driven risk identification in phase III clinical trials using central statistical monitoring.

    Science.gov (United States)

    Timmermans, Catherine; Venet, David; Burzykowski, Tomasz

    2016-02-01

    Our interest lies in quality control for clinical trials, in the context of risk-based monitoring (RBM). We specifically study the use of central statistical monitoring (CSM) to support RBM. Under an RBM paradigm, we claim that CSM has a key role to play in identifying the "risks to the most critical data elements and processes" that will drive targeted oversight. In order to support this claim, we first see how to characterize the risks that may affect clinical trials. We then discuss how CSM can be understood as a tool for providing a set of data-driven key risk indicators (KRIs), which help to organize adaptive targeted monitoring. Several case studies are provided where issues in a clinical trial have been identified thanks to targeted investigation after the identification of a risk using CSM. Using CSM to build data-driven KRIs helps to identify different kinds of issues in clinical trials. This ability is directly linked with the exhaustiveness of the CSM approach and its flexibility in the definition of the risks that are searched for when identifying the KRIs. In practice, a CSM assessment of the clinical database seems essential to ensure data quality. The atypical data patterns found in some centers and variables are seen as KRIs under an RBM approach. Targeted monitoring or data management queries can be used to confirm whether the KRIs point to an actual issue or not.
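    The abstract's notion of "atypical data patterns found in some centers" can be caricatured by comparing each center's summary statistic with the distribution across centers. This is a toy sketch only (the statistic, the z-score threshold, and the per-center mean are all assumptions; the actual CSM methodology is far more exhaustive):

    ```python
    import numpy as np

    def center_atypicality(values, centers):
        """Z-score of each center's mean against the spread of center means:
        a toy key risk indicator in the spirit of central statistical monitoring."""
        ids = sorted(set(centers))
        means = np.array([np.mean([v for v, c in zip(values, centers) if c == i])
                          for i in ids])
        z = (means - means.mean()) / (means.std() + 1e-12)
        return dict(zip(ids, z))
    ```

    Centers with extreme |z| would then be candidates for targeted monitoring or data management queries.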

  17. General Purpose Data-Driven Online System Health Monitoring with Applications to Space Operations

    Science.gov (United States)

    Iverson, David L.; Spirkovska, Lilly; Schwabacher, Mark

    2010-01-01

    Modern space transportation and ground support system designs are becoming increasingly sophisticated and complex. Determining the health state of these systems using traditional parameter limit checking, or model-based or rule-based methods is becoming more difficult as the number of sensors and component interactions grows. Data-driven monitoring techniques have been developed to address these issues by analyzing system operations data to automatically characterize normal system behavior. System health can be monitored by comparing real-time operating data with these nominal characterizations, providing detection of anomalous data signatures indicative of system faults, failures, or precursors of significant failures. The Inductive Monitoring System (IMS) is a general purpose, data-driven system health monitoring software tool that has been successfully applied to several aerospace applications and is under evaluation for anomaly detection in vehicle and ground equipment for next generation launch systems. After an introduction to IMS application development, we discuss these NASA online monitoring applications, including the integration of IMS with complementary model-based and rule-based methods. Although the examples presented in this paper are from space operations applications, IMS is a general-purpose health-monitoring tool that is also applicable to power generation and transmission system monitoring.
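    IMS itself is NASA software, but its core idea (characterize nominal behavior from operations data, then score new observations by distance to that characterization) can be sketched with a minimal k-means stand-in. All names and parameters below are illustrative assumptions, not the IMS implementation:

    ```python
    import numpy as np

    def nominal_clusters(X, n_clusters=5, iters=20, seed=0):
        """Characterize nominal system behavior as k-means cluster centers
        learned from archived operations data (toy stand-in for IMS training)."""
        rng = np.random.default_rng(seed)
        C = X[rng.choice(len(X), n_clusters, replace=False)]
        for _ in range(iters):
            lab = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
            C = np.array([X[lab == k].mean(0) if np.any(lab == k) else C[k]
                          for k in range(n_clusters)])
        return C

    def anomaly_score(x, C):
        """Distance from an observation to the nearest nominal cluster center;
        large values flag anomalous data signatures."""
        return float(np.min(np.linalg.norm(C - x, axis=1)))
    ```

    Real-time monitoring then reduces to scoring each incoming sensor vector and alerting when the score exceeds a threshold calibrated on nominal data.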

  18. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    Directory of Open Access Journals (Sweden)

    Natalie Jane de Vries

    Online consumer behavior in general, and online customer engagement with brands in particular, has become a major focus of research activity, fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains existent in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs', suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  19. Data-driven integration of genome-scale regulatory and metabolic network models

    Directory of Open Access Journals (Sweden)

    Saheed Imam

    2015-05-01

    Microbes are diverse and extremely versatile organisms that play vital roles in all ecological niches. Understanding and harnessing microbial systems will be key to the sustainability of our planet. One approach to improving our knowledge of microbial processes is through data-driven and mechanism-informed computational modeling. Individual models of biological networks (such as metabolism, transcription and signaling) have played pivotal roles in driving microbial research through the years. These networks, however, are highly interconnected and function in concert – a fact that has led to the development of a variety of approaches aimed at simulating the integrated functions of two or more network types. Though the task of integrating these different models is fraught with new challenges, the large amounts of high-throughput data sets being generated, and algorithms being developed, mean that the time is at hand for concerted efforts to build integrated regulatory-metabolic networks in a data-driven fashion. In this perspective, we review current approaches for constructing integrated regulatory-metabolic models and outline new strategies for future development of these network models for any microbial system.

  20. The effects of data-driven learning activities on EFL learners' writing development.

    Science.gov (United States)

    Luo, Qinqin

    2016-01-01

    Data-driven learning has proven to be an effective approach in helping learners solve various writing problems, such as correcting lexical or grammatical errors, improving the use of collocations and generating ideas in writing. This article reports on an empirical study in which data-driven learning was accomplished with the assistance of the user-friendly BNCweb, and presents an evaluation of the outcome by comparing the effectiveness of BNCweb with that of the search engine Baidu, which is most commonly used as a reference resource by Chinese learners of English as a foreign language. The quantitative results from 48 Chinese college students revealed that the experimental group, which used BNCweb, performed significantly better in the post-test in terms of writing fluency and accuracy, as compared with the control group, which used the search engine Baidu. However, no significant difference was found between the two groups in terms of writing complexity. The qualitative results from the interviews revealed that learners generally showed a positive attitude toward the use of BNCweb, but there were still some problems with using corpora in the writing process; the combined use of corpora and other types of reference resource was therefore suggested as a possible way to counter the potential barriers for Chinese learners of English.

  1. Data-Driven Engineering of Social Dynamics: Pattern Matching and Profit Maximization.

    Science.gov (United States)

    Peng, Huan-Kai; Lee, Hao-Chih; Pan, Jia-Yu; Marculescu, Radu

    2016-01-01

    In this paper, we define a new problem related to social media, namely, the data-driven engineering of social dynamics. More precisely, given a set of observations from the past, we aim at finding the best short-term intervention that can lead to predefined long-term outcomes. Toward this end, we propose a general formulation that covers two useful engineering tasks as special cases, namely, pattern matching and profit maximization. By incorporating a deep learning model, we derive a solution using convex relaxation and quadratic-programming transformation. Moreover, we propose a data-driven evaluation method in place of the expensive field experiments. Using a Twitter dataset, we demonstrate the effectiveness of our dynamics engineering approach for both pattern matching and profit maximization, and study the multifaceted interplay among several important factors of dynamics engineering, such as solution validity, pattern-matching accuracy, and intervention cost. Finally, the method we propose is general enough to work with multi-dimensional time series, so it can potentially be used in many other applications.

  2. Data-driven modeling reveals cell behaviors controlling self-organization during Myxococcus xanthus development.

    Science.gov (United States)

    Cotter, Christopher R; Schüttler, Heinz-Bernd; Igoshin, Oleg A; Shimkets, Lawrence J

    2017-06-06

    Collective cell movement is critical to the emergent properties of many multicellular systems, including microbial self-organization in biofilms, embryogenesis, wound healing, and cancer metastasis. However, even the best-studied systems lack a complete picture of how diverse physical and chemical cues act upon individual cells to ensure coordinated multicellular behavior. Known for its social developmental cycle, the bacterium Myxococcus xanthus uses coordinated movement to generate three-dimensional aggregates called fruiting bodies. Despite extensive progress in identifying genes controlling fruiting body development, cell behaviors and cell-cell communication mechanisms that mediate aggregation are largely unknown. We developed an approach to examine emergent behaviors that couples fluorescent cell tracking with data-driven models. A unique feature of this approach is the ability to identify cell behaviors affecting the observed aggregation dynamics without full knowledge of the underlying biological mechanisms. The fluorescent cell tracking revealed large deviations in the behavior of individual cells. Our modeling method indicated that decreased cell motility inside the aggregates, a biased walk toward aggregate centroids, and alignment among neighboring cells in a radial direction to the nearest aggregate are behaviors that enhance aggregation dynamics. Our modeling method also revealed that aggregation is generally robust to perturbations in these behaviors and identified possible compensatory mechanisms. The resulting approach of directly combining behavior quantification with data-driven simulations can be applied to more complex systems of collective cell movement without prior knowledge of the cellular machinery and behavioral cues.

  3. A data-driven approach to reverse engineering customer engagement models: towards functional constructs.

    Science.gov (United States)

    de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo

    2014-01-01

    Online consumer behavior in general, and online customer engagement with brands in particular, has become a major focus of research activity, fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven and much debate about the concept of Customer Engagement and its related constructs remains existent in the literature. In this paper, we aim to propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the fields of consumer behaviors using questionnaire data or studies investigating other types of human behaviors. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs', suggesting the method proposed here could be adopted as a new data-driven way of human behavior modeling.

  4. Data-driven normalization strategies for high-throughput quantitative RT-PCR

    Directory of Open Access Journals (Sweden)

    Suzuki Harukazu

    2009-04-01

    Background: High-throughput real-time quantitative reverse transcriptase polymerase chain reaction (qPCR) is a widely used technique in experiments where expression patterns of genes are to be profiled. Current-stage technology allows the acquisition of profiles for a moderate number of genes (50 to a few thousand), and this number continues to grow. The use of appropriate normalization algorithms for qPCR-based data is therefore a highly important aspect of the data preprocessing pipeline. Results: We present and evaluate two data-driven normalization methods that directly correct for technical variation and represent robust alternatives to standard housekeeping-gene-based approaches. We evaluated the performance of these methods against a single housekeeping-gene method, and our results suggest that quantile normalization performs best. These methods are implemented in freely available software as an R package, qpcrNorm, distributed through the Bioconductor project. Conclusion: The utility of the approaches we describe can be demonstrated most clearly in situations where standard housekeeping genes are regulated by some experimental condition. For large qPCR-based data sets, our approaches represent robust, data-driven strategies for normalization.
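    Quantile normalization, the method the abstract singles out as best, forces every sample to share the same empirical distribution: each sample's sorted values are replaced by the mean of the sorted values across samples. A minimal sketch (the paper's qpcrNorm package is in R; this Python version only illustrates the transform):

    ```python
    import numpy as np

    def quantile_normalize(X):
        """Quantile-normalize the columns (samples) of X, rows being genes.
        Rank each value within its sample, then substitute the across-sample
        mean of the values holding that rank."""
        ranks = np.argsort(np.argsort(X, axis=0), axis=0)   # rank within each column
        mean_sorted = np.sort(X, axis=0).mean(axis=1)       # reference distribution
        return mean_sorted[ranks]
    ```

    After the transform, all samples have identical sorted values, so purely technical shifts in a sample's overall distribution are removed.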

  5. Data-driven detrending of nonstationary fractal time series with echo state networks

    CERN Document Server

    Maiorino, Enrico; Livi, Lorenzo; Rizzi, Antonello; Sadeghian, Alireza

    2015-01-01

    In this paper, we propose a data-driven approach to the problem of detrending fractal and multifractal time series. We consider a time series as the measurements elaborated from a dynamical process over time. We assume that such a dynamical process is predictable to a certain degree by means of a class of recurrent networks called echo state networks. Such networks have been shown to be able to predict the outcome of a number of dynamical processes. Here we propose to perform a data-driven detrending of nonstationary, fractal and multifractal time series by using an echo state network operating as a filter. Notably, we predict the trend component of a given input time series, which is superimposed on the (multi)fractal component of interest. Such an (estimated) trend is then removed from the original time series and the residual signal is analyzed with the Multifractal Detrended Fluctuation Analysis for a quantitative verification of the correctness of the proposed detrending procedure. In order to demonstrat...
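    The filter idea above can be sketched as a minimal echo state network whose one-step-ahead prediction is taken as the trend estimate, leaving the fluctuations in the residual. This is a hedged toy (reservoir size, spectral radius, ridge penalty and washout are all assumed values that need tuning in practice):

    ```python
    import numpy as np

    def esn_detrend(signal, n_res=100, rho=0.9, ridge=1e-6, washout=50, seed=0):
        """Train a ridge-regression readout on echo-state-network states to
        predict the next sample; the prediction serves as the (smooth) trend
        and the residual carries the remaining fluctuations."""
        rng = np.random.default_rng(seed)
        Win = rng.uniform(-0.5, 0.5, size=n_res)
        W = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # fix spectral radius
        x, states = np.zeros(n_res), []
        for u in signal[:-1]:                             # drive the reservoir
            x = np.tanh(Win * u + W @ x)
            states.append(x.copy())
        S, y = np.array(states), signal[1:]
        Sw, yw = S[washout:], y[washout:]                 # discard initial transient
        Wout = np.linalg.solve(Sw.T @ Sw + ridge * np.eye(n_res), Sw.T @ yw)
        trend = S @ Wout
        return y - trend, trend                           # residual, estimated trend
    ```

    In the paper's pipeline the residual would then be passed to Multifractal Detrended Fluctuation Analysis to verify the detrending.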

  6. A data-driven model for spectra: Finding double redshifts in the Sloan Digital Sky Survey

    CERN Document Server

    Tsalmantza, P

    2012-01-01

    We present a data-driven method - heteroscedastic matrix factorization, a kind of probabilistic factor analysis - for modeling or performing dimensionality reduction on observed spectra or other high-dimensional data with known but non-uniform observational uncertainties. The method uses an iterative inverse-variance-weighted least-squares minimization procedure to generate a best set of basis functions. The method is similar to principal components analysis, but with the substantial advantage that it uses measurement uncertainties in a responsible way and accounts naturally for poorly measured and missing data; it models the variance in the noise-deconvolved data space. A regularization can be applied, in the form of a smoothness prior (inspired by Gaussian processes) or a non-negative constraint, without making the method prohibitively slow. Because the method optimizes a justified scalar (related to the likelihood), the basis provides a better fit to the data in a probabilistic sense than any PCA basis. We...

  7. Data-driven modeling, control and tools for cyber-physical energy systems

    Science.gov (United States)

    Behl, Madhur

    Energy systems are experiencing a gradual but substantial change in moving away from being non-interactive and manually-controlled systems to utilizing tight integration of both cyber (computation, communications, and control) and physical representations guided by first principles based models, at all scales and levels. Furthermore, peak power reduction programs like demand response (DR) are becoming increasingly important as the volatility on the grid continues to increase due to regulation, integration of renewables and extreme weather conditions. In order to shield themselves from the risk of price volatility, end-user electricity consumers must monitor electricity prices and be flexible in the ways they choose to use electricity. This requires the use of control-oriented predictive models of an energy system's dynamics and energy consumption. Such models are needed for understanding and improving the overall energy efficiency and operating costs. However, learning dynamical models using grey/white box approaches is very cost and time prohibitive since it often requires significant financial investments in retrofitting the system with several sensors and hiring domain experts for building the model. We present the use of data-driven methods for making model capture easy and efficient for cyber-physical energy systems. We develop Model-IQ, a methodology for analysis of uncertainty propagation for building inverse modeling and controls. Given a grey-box model structure and real input data from a temporary set of sensors, Model-IQ evaluates the effect of the uncertainty propagation from sensor data to model accuracy and to closed-loop control performance. We also developed a statistical method to quantify the bias in the sensor measurement and to determine near optimal sensor placement and density for accurate data collection for model training and control. Using a real building test-bed, we show how performing an uncertainty analysis can reveal trends about

  8. A Data Driven Framework for Real Time Power System Event Detection and Visualization

    CERN Document Server

    McCamish, Ben; Landford, Jordan; Bass, Robert; Cotilla-Sanchez, Eduardo; Chiu, David

    2015-01-01

    Increased adoption and deployment of phasor measurement units (PMU) has provided valuable fine-grained data over the grid. Analysis over these data can provide real-time insight into the health of the grid, thereby improving control over operations. Realizing this data-driven control, however, requires validating, processing and storing massive amounts of PMU data. This paper describes a PMU data management system that supports input from multiple PMU data streams, features an event-detection algorithm, and provides an efficient method for retrieving archival data. The event-detection algorithm rapidly correlates multiple PMU data streams, providing details on events occurring within the power system in real-time. The event-detection algorithm feeds into a visualization component, allowing operators to recognize events as they occur. The indexing and data retrieval mechanism facilitates fast access to archived PMU data. Using this method, we achieved over 30x speedup for queries with high selectivity. With th...
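    The event-detection step, which "rapidly correlates multiple PMU data streams", can be caricatured as a rolling-correlation monitor over two synchronized streams. The window length, threshold, and correlation statistic below are illustrative assumptions, not the paper's algorithm:

    ```python
    import numpy as np

    def detect_events(a, b, win=50, thresh=0.5):
        """Flag sample indices where the rolling Pearson correlation between
        two synchronized measurement streams collapses: a toy proxy for an
        event disturbing one part of the grid."""
        flagged = []
        for t in range(win, len(a) + 1):
            r = np.corrcoef(a[t - win:t], b[t - win:t])[0, 1]
            if r < thresh:
                flagged.append(t - 1)   # index of the window's last sample
        return flagged
    ```

    Flagged indices would then feed the visualization component so operators can recognize events as they occur.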

  9. Quarkonium production at the LHC: A data-driven analysis of remarkably simple experimental patterns

    Science.gov (United States)

    Faccioli, Pietro; Lourenço, Carlos; Araújo, Mariana; Knünz, Valentin; Krätschmer, Ilse; Seixas, João

    2017-10-01

    The LHC quarkonium production data reveal a startling observation: the J/ψ, ψ(2S), χc1, χc2 and ϒ(nS) pT-differential cross sections in the central rapidity region are compatible with one universal momentum scaling pattern. Considering also the absence of strong polarizations of directly and indirectly produced S-wave mesons, we conclude that there is currently no evidence of a dependence of the partonic production mechanisms on the quantum numbers and mass of the final state. The experimental observations supporting this universal production scenario are remarkably significant, as shown by a new analysis approach, unbiased by specific theoretical calculations of partonic cross sections, which are only considered a posteriori, in comparisons with the data-driven results.

  10. Combining engineering and data-driven approaches: Development of a generic fire risk model facilitating calibration

    DEFF Research Database (Denmark)

    De Sanctis, G.; Fischer, K.; Kohler, J.

    2014-01-01

    Fire risk models support decision making for engineering problems under the consistent consideration of the associated uncertainties. Empirical approaches can be used for cost-benefit studies when enough data about the decision problem are available. But often the empirical approaches...... a generic risk model that is calibrated to observed fire loss data. Generic risk models assess the risk of buildings based on specific risk indicators and support risk assessment at a portfolio level. After an introduction to the principles of generic risk assessment, the focus of the present paper...... are not detailed enough. Engineering risk models, on the other hand, may be detailed but typically involve assumptions that may result in a biased risk assessment and make a cost-benefit study problematic. In two related papers it is shown how engineering and data-driven modeling can be combined by developing...

  11. Data-driven pile-up correction for track-based analyses

    Energy Technology Data Exchange (ETDEWEB)

    Schulz, Holger; Lacker, Heiko; Leyton, Michael [HU Berlin (Germany); Brandt, Gerhard [University of Oxford (United Kingdom)

    2013-07-01

    The impact of pile-up can have considerable effects on observables measured at the LHC, especially those sensitive to the effects of the underlying event. We present a data-driven method that is based on the HBOM ("Hit Backspace Once More") approach, to correct track-based distributions for tracks coming from pile-up interactions. We demonstrate successful application to a track-based measurement of event-shapes that are sensitive to the Underlying Event with the ATLAS detector. Tests of the method on Monte-Carlo simulation show closure within O(1-2 %) for the majority of bins of most observables studied.

  12. Adaptive data-driven parallelization of multi-view video coding on multi-core processor

    Institute of Scientific and Technical Information of China (English)

    PANG Yi; HU WeiDong; SUN LiFeng; YANG ShiQiang

    2009-01-01

    Multi-view video coding (MVC) comprises rich 3D information and is widely used in new visual media, such as 3DTV and free viewpoint TV (FTV). However, even with mainstream computer manufacturers migrating to multi-core processors, the huge computational requirement of MVC currently prohibits its wide use in consumer markets. In this paper, we demonstrate the design and implementation of the first parallel MVC system on the Cell Broadband Engine™ processor, a state-of-the-art multi-core processor. We propose a task-dispatching algorithm for MVC that is adaptive and data-driven at the frame level, and implement a parallel multi-view video decoder with a modified H.264/AVC codec on a real machine. This approach provides scalable speedup (up to 16 times on sixteen cores) through proper local store management, utilization of code locality and SIMD improvement. Decoding speed, speedup and utilization rate of cores are expressed in experimental results.

  13. Automatic translation of MPI source into a latency-tolerant, data-driven form

    Energy Technology Data Exchange (ETDEWEB)

    Nguyen, Tan; Cicotti, Pietro; Bylaska, Eric; Quinlan, Dan; Baden, Scott

    2017-08-01

    Hiding communication behind useful computation is an important performance programming technique but remains an inscrutable programming exercise even for the expert. We present Bamboo, a code transformation framework that can realize communication overlap in applications written in MPI without the need to intrusively modify the source code. Bamboo reformulates MPI source into the form of a task dependency graph that expresses a partial ordering among tasks, enabling the program to execute in a data-driven fashion under the control of an external runtime system. Experimental results demonstrate that Bamboo significantly reduces communication delays while requiring only modest amounts of programmer annotation for a variety of applications and platforms, including those employing co-processors and accelerators. Moreover, Bamboo's performance meets or exceeds that of labor-intensive hand coding. The translator is more than a means of hiding communication costs automatically; it demonstrates the utility of semantic-level optimization against a well-known library.

  14. Toward data-driven methods in geophysics: the Analog Data Assimilation

    Science.gov (United States)

    Lguensat, Redouane; Tandeo, Pierre; Ailliot, Pierre; Pulido, Manuel; Fablet, Ronan

    2017-04-01

    The Analog Data Assimilation (AnDA) is a recently introduced data-driven method for data assimilation in which the dynamical model is learned from data, contrary to classical data assimilation, where a physical model of the dynamics is needed. AnDA replaces the physical dynamical model with a statistical emulator of the dynamics built using analog forecasting methods. The analog dynamical model is then incorporated into ensemble-based data assimilation algorithms (Ensemble Kalman Filter and Smoother, or Particle Filter). The relevance of AnDA is demonstrated for the Lorenz-63 and Lorenz-96 chaotic dynamics. Applications in meteorology and oceanography, as well as potential perspectives worthy of investigation, are further discussed. We expect that the directions of research we suggest will help bring more interest in applied machine learning to the geophysical sciences.
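    The analog forecasting step at the heart of AnDA can be sketched in a few lines: the successor of a new state is predicted as a kernel-weighted average of the successors of its nearest analogs in a catalog of historical transitions. The sketch below is a generic illustration, not the authors' implementation; the Gaussian-kernel weighting and the toy rotation dynamics are assumptions.

```python
import numpy as np

def analog_forecast(x, catalog_states, catalog_successors, k=5):
    """Forecast the successor of x as a kernel-weighted average of the
    successors of its k nearest analogs in a historical catalog."""
    d = np.linalg.norm(catalog_states - x, axis=1)
    idx = np.argsort(d)[:k]                        # k nearest analogs
    w = np.exp(-d[idx] / (d[idx].mean() + 1e-12))  # kernel weights
    w /= w.sum()
    return w @ catalog_successors[idx]

# Toy catalog: the (unknown) dynamics is a small rotation of the plane
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
states = np.random.RandomState(0).randn(500, 2)
successors = states @ R.T
forecast = analog_forecast(np.array([1.0, 0.0]), states, successors)
```

    In AnDA this statistical emulator stands in for the physical model inside an Ensemble Kalman Filter or Particle Filter update.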

  15. Calibrating the pixel-level Kepler imaging data with a causal data-driven model

    CERN Document Server

    Wang, Dun; Hogg, David W; Schölkopf, Bernhard

    2015-01-01

    Astronomical observations are affected by several kinds of noise, each with its own causal source; there is photon noise, stochastic source variability, and residuals coming from imperfect calibration of the detector or telescope. The precision of NASA Kepler photometry for exoplanet science---the most precise photometric measurements of stars ever made---appears to be limited by unknown or untracked variations in spacecraft pointing and temperature, and unmodeled stellar variability. Here we present the Causal Pixel Model (CPM) for Kepler data, a data-driven model intended to capture variability but preserve transit signals. The CPM works at the pixel level so that it can capture very fine-grained information about the variation of the spacecraft. The CPM predicts each target pixel value from a large number of pixels of other stars sharing the instrument variabilities while not containing any information on possible transits in the target star. In addition, we use the target star's future and past (auto-regr...

  16. USACM Thematic Workshop On Uncertainty Quantification And Data-Driven Modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Stewart, James R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-05-01

    The USACM Thematic Workshop on Uncertainty Quantification and Data-Driven Modeling was held on March 23-24, 2017, in Austin, TX. The organizers of the technical program were James R. Stewart of Sandia National Laboratories and Krishna Garikipati of University of Michigan. The administrative organizer was Ruth Hengst, who serves as Program Coordinator for the USACM. The organization of this workshop was coordinated through the USACM Technical Thrust Area on Uncertainty Quantification and Probabilistic Analysis. The workshop website (http://uqpm2017.usacm.org) includes the presentation agenda as well as links to several of the presentation slides (permission to access the presentations was granted by each of those speakers, respectively). Herein, this final report contains the complete workshop program that includes the presentation agenda, the presentation abstracts, and the list of posters.

  17. Data-driven design of fault diagnosis systems nonlinear multimode processes

    CERN Document Server

    Haghani Abandan Sari, Adel

    2014-01-01

    In many industrial applications, early detection and diagnosis of abnormal plant behavior is of great importance. During the last decades, the complexity of process plants has drastically increased, which imposes great challenges on the development of model-based monitoring approaches, sometimes making them unrealistic for modern large-scale processes. The main objective of Adel Haghani Abandan Sari is to study efficient fault diagnosis techniques for complex industrial systems using process historical data and considering the nonlinear behavior of the process. To this end, different methods are presented to solve the fault diagnosis problem based on the overall behavior of the process and its dynamics. Moreover, a novel technique is proposed for fault isolation and determination of the root cause of faults in the system, based on the fault impacts on the process measurements. Contents Process monitoring Fault diagnosis and fault-tolerant control Data-driven approaches and decision making Target...

  18. Testing the Utility of a Data-Driven Approach for Assessing BMI from Face Images

    DEFF Research Database (Denmark)

    Wolffhechel, Karin Marie Brandt; Hahn, Amanda C.; Jarmer, Hanne Østergaard

    2015-01-01

    Several lines of evidence suggest that facial cues of adiposity may be important for human social interaction. However, tests for quantifiable cues of body mass index (BMI) in the face have examined only a small number of facial proportions and these proportions were found to have relatively low predictive power. Here we employed a data-driven approach in which statistical models were built using principal components (PCs) derived from objectively defined shape and color characteristics in face images. The predictive power of these models was then compared with models based on previously studied facial proportions (perimeter-to-area ratio, width-to-height ratio, and cheek-to-jaw width). Models based on 2D shape-only PCs, color-only PCs, and 2D shape and color PCs combined each performed significantly and substantially better than models based on one or more of the previously studied facial...

  19. A new meta-data driven data-sharing storage model for SaaS

    Directory of Open Access Journals (Sweden)

    Li Heng

    2012-11-01

    Full Text Available A multi-tenant database is the primary characteristic of SaaS: it allows SaaS vendors to run a single application instance supporting multiple tenants on the same hardware and software infrastructure. Such an application should be highly customizable to meet tenants' expectations and business requirements. This paper examines current solutions for multi-tenancy and proposes a new meta-data driven data-sharing storage model for multi-tenant applications. Our design enables tenants to extend their own database schema at run time to satisfy their business needs. Experimental results show that our model strikes a good balance between efficiency and customizability.
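    One common realization of such a meta-data driven, shared-schema model is a generic key-value data table paired with a metadata table describing each tenant's declared fields. The sqlite3 sketch below is a hypothetical illustration of the idea, not the paper's implementation; table names and helper functions are invented.

```python
import sqlite3

# All tenants share two physical tables: one holding field metadata,
# one holding every tenant's data as generic (tenant, row, field, value) rows.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE field_meta  (tenant TEXT, field TEXT, type TEXT);
CREATE TABLE shared_data (tenant TEXT, row_id INTEGER, field TEXT, value TEXT);
""")

def define_field(tenant, field, ftype):
    # Tenants extend their schema at run time by inserting metadata only;
    # no ALTER TABLE is ever needed (schema validation omitted in this sketch)
    con.execute("INSERT INTO field_meta VALUES (?, ?, ?)", (tenant, field, ftype))

def insert_row(tenant, row_id, record):
    for field, value in record.items():
        con.execute("INSERT INTO shared_data VALUES (?, ?, ?, ?)",
                    (tenant, row_id, field, str(value)))

def fetch_row(tenant, row_id):
    cur = con.execute(
        "SELECT field, value FROM shared_data WHERE tenant=? AND row_id=?",
        (tenant, row_id))
    return dict(cur.fetchall())

define_field("acme", "credit_limit", "INTEGER")
insert_row("acme", 1, {"name": "Widget Co", "credit_limit": 5000})
```

    The trade-off this design illustrates is the one the paper evaluates: maximal flexibility per tenant at the cost of typed storage and query efficiency.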

  20. Automatic sleep classification using a data-driven topic model reveals latent sleep states

    DEFF Research Database (Denmark)

    Koch, Henriette; Christensen, Julie Anja Engelhard; Frandsen, Rune

    2014-01-01

    Background: The gold standard for sleep classification uses manual scoring of polysomnography despite points of criticism such as oversimplification, low inter-rater reliability and the standard being designed on young and healthy subjects. New method: To meet the criticism and reveal the latent sleep states, this study developed a general and automatic sleep classifier using a data-driven approach. Spectral EEG and EOG measures and eye correlation in 1 s windows were calculated and each sleep epoch was expressed as a mixture of probabilities of latent sleep states by using the topic model Latent Dirichlet Allocation. Model application was tested on control subjects and patients with periodic leg movements (PLM) representing a non-neurodegenerative group, and patients with idiopathic REM sleep behavior disorder (iRBD) and Parkinson's Disease (PD) representing a neurodegenerative group...

  1. DOE High Performance Computing Operational Review (HPCOR): Enabling Data-Driven Scientific Discovery at HPC Facilities

    Energy Technology Data Exchange (ETDEWEB)

    Gerber, Richard; Allcock, William; Beggio, Chris; Campbell, Stuart; Cherry, Andrew; Cholia, Shreyas; Dart, Eli; England, Clay; Fahey, Tim; Foertter, Fernanda; Goldstone, Robin; Hick, Jason; Karelitz, David; Kelly, Kaki; Monroe, Laura; Prabhat,; Skinner, David; White, Julia

    2014-10-17

    U.S. Department of Energy (DOE) High Performance Computing (HPC) facilities are on the verge of a paradigm shift in the way they deliver systems and services to science and engineering teams. Research projects are producing a wide variety of data at unprecedented scale and level of complexity, with community-specific services that are part of the data collection and analysis workflow. On June 18-19, 2014, representatives from six DOE HPC centers met in Oakland, CA at the DOE High Performance Computing Operational Review (HPCOR) to discuss how they can best provide facilities and services to enable large-scale data-driven scientific discovery at the DOE national laboratories. This report contains findings from that review.

  2. A Data-Driven Diagnostic Framework for Wind Turbine Structures: A Holistic Approach.

    Science.gov (United States)

    Bogoevska, Simona; Spiridonakos, Minas; Chatzi, Eleni; Dumova-Jovanoska, Elena; Höffer, Rudiger

    2017-03-30

    The complex dynamics of operational wind turbine (WT) structures challenges the applicability of existing structural health monitoring (SHM) strategies for condition assessment. At the center of Europe's renewable energy strategic planning, WT systems call for implementation of strategies that may describe the WT behavior in its complete operational spectrum. The framework proposed in this paper relies on the symbiotic treatment of acting environmental/operational variables and the monitored vibration response of the structure. The approach aims at accurate simulation of the temporal variability characterizing the WT dynamics, and subsequently at the tracking of the evolution of this variability in a longer-term horizon. The bi-component analysis tool is applied on long-term data, collected as part of continuous monitoring campaigns on two actual operating WT structures located in different sites in Germany. The obtained data-driven structural models verify the potential of the proposed strategy for development of an automated SHM diagnostic tool.

  3. Domain-Oriented Data-Driven Data Mining Based on Rough Sets

    Institute of Scientific and Technical Information of China (English)

    Guoyin Wang

    2006-01-01

    understanding of data mining, we proposed a data-driven knowledge acquisition method based on rough sets. It also improved the performance of classical knowledge acquisition methods. In fact, we also find that domain-driven data mining and user-driven data mining do not conflict with our data-driven data mining; they can be integrated into domain-oriented data-driven data mining. This is analogous to views of a database: users with different views see different portions of the database, and thus users with different tasks or objectives may wish to, or could, discover different (partial) knowledge from the same database. However, all such partial knowledge must already exist in the database. A domain-oriented data-driven data mining method therefore helps us extract knowledge that really exists in a database and is genuinely interesting and actionable in the real world.
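    The rough-set machinery underlying such data-driven knowledge acquisition rests on lower and upper approximations of a target concept under an indiscernibility partition. A minimal sketch follows; the partition and concept are invented for illustration.

```python
def rough_approximations(equiv_classes, target):
    """Lower/upper approximations of a target concept under an
    indiscernibility partition: the core construction of rough set theory."""
    lower, upper = set(), set()
    for cls in equiv_classes:
        if cls <= target:       # class certainly belongs to the concept
            lower |= cls
        if cls & target:        # class possibly belongs to the concept
            upper |= cls
    return lower, upper

# Hypothetical objects 1..6 grouped by indiscernible attribute values
partition = [{1, 2}, {3, 4}, {5, 6}]
concept = {1, 2, 3}
lower, upper = rough_approximations(partition, concept)
```

    Objects in the boundary region (upper minus lower) are exactly those the data cannot classify with certainty, which is what makes the extracted knowledge "really existed in the data" rather than imposed on it.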

  4. Developing credible AI - linguistic behaviour, simulations, data-driven input, and Turing's legacy

    CERN Document Server

    Paradowski, Michal B

    2011-01-01

    Barring swarm robotics, a substantial share of current machine-human and machine-machine learning and interaction mechanisms are developed and fed by results of agent-based computer simulations, game-theoretic models, or robotic experiments based on a dyadic interaction pattern. Yet, in real life, humans no less frequently communicate in groups, and take decisions based on information cumulatively gleaned from more than one source. These properties should be taken into consideration in the design of autonomous artificial cognitive systems intended to interact with, and learn from, more than one contact or 'neighbour'. To this end, significant practical import can be gleaned from research applying strict scientific methodology to humanistic and social phenomena, e.g. to the discovery of realistic creativity potential spans, or the 'exposure thresholds' after which new information can be accepted by a cognitive system. Such rigorous data-driven research offers the chance of not only approximating to descrip...

  5. Data-driven modeling of systemic delay propagation under severe meteorological conditions

    CERN Document Server

    Fleurquin, Pablo; Eguiluz, Victor M

    2013-01-01

    The upsetting consequences of weather conditions are well known to any person involved in air transportation. Still, the quantification of how these disturbances affect delay propagation, and of the effectiveness of managers' and pilots' interventions to prevent possible large-scale system failures, needs further attention. In this work, we employ an agent-based data-driven model developed using real flight performance registers for the entire US airport network and focus on the events occurring on October 27, 2010 in the United States, when a major storm complex later called the 2010 Superstorm took place. Our model correctly reproduces the evolution of the delay-spreading dynamics. By considering different intervention measures, we can even improve the model predictions, getting closer to the real delay data. Our model can thus help managers assess different intervention measures in order to diminish the impact of disruptive conditions on the air transport system.

  6. Data-driven fault detection for industrial processes canonical correlation analysis and projection based methods

    CERN Document Server

    Chen, Zhiwen

    2017-01-01

    Zhiwen Chen aims to develop advanced fault detection (FD) methods for the monitoring of industrial processes. With the ever-increasing demands on reliability and safety in industrial processes, fault detection has become an important issue. Although model-based fault detection theory has been well studied in the past decades, its application to large-scale industrial processes is limited because it is difficult to build accurate models. Motivated by the limitations of existing data-driven FD methods, novel canonical correlation analysis (CCA) and projection-based methods are proposed from the perspectives of process input and output data, less engineering effort, and wide application scope. For performance evaluation of FD methods, a new index is also developed. Contents A New Index for Performance Evaluation of FD Methods CCA-based FD Method for the Monitoring of Stationary Processes Projection-based FD Method for the Monitoring of Dynamic Processes Benchmark Study and Real-Time Implementat...

  7. Data-driven Model-independent Searches for Long-lived Particles at the LHC

    CERN Document Server

    Coccaro, Andrea; Lubatti, H J; Russell, Heather; Shelton, Jessie

    2016-01-01

    Neutral long-lived particles (LLPs) are highly motivated by many BSM scenarios, such as theories of supersymmetry, baryogenesis, and neutral naturalness, and present both tremendous discovery opportunities and experimental challenges for the LHC. A major bottleneck for current LLP searches is the prediction of SM backgrounds, which are often impossible to simulate accurately. In this paper, we propose a general strategy for obtaining differential, data-driven background estimates in LLP searches, thereby notably extending the range of LLP masses and lifetimes that can be discovered at the LHC. We focus on LLPs decaying in the ATLAS Muon System, where triggers providing both signal and control samples are available at the LHC Run-2. While many existing searches require two displaced decays, a detailed knowledge of backgrounds will allow for very inclusive searches that require just one detected LLP decay. As we demonstrate for the $h \\to X X$ signal model of LLP pair production in exotic Higgs decays, this res...

  8. Topological obstructions in the way of data-driven collective variables.

    Science.gov (United States)

    Hashemian, Behrooz; Arroyo, Marino

    2015-01-28

    Nonlinear dimensionality reduction (NLDR) techniques are increasingly used to visualize molecular trajectories and to create data-driven collective variables for enhanced sampling simulations. The success of these methods relies on their ability to identify the essential degrees of freedom characterizing conformational changes. Here, we show that NLDR methods face serious obstacles when the underlying collective variables present periodicities, e.g., arising from proper dihedral angles. As a result, NLDR methods collapse very distant configurations, thus leading to misinterpretations and inefficiencies in enhanced sampling. Here, we identify this largely overlooked problem and discuss possible approaches to overcome it. We also characterize the geometry and topology of conformational changes of alanine dipeptide, a benchmark system for testing new methods to identify collective variables.
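    The obstacle can be seen with a single dihedral angle: treated as a plain scalar, two nearly identical conformations on opposite sides of the periodic boundary appear maximally distant. A common workaround, sketched below as a generic illustration rather than a method from the paper, is to embed each angle on the unit circle before applying an NLDR technique.

```python
import numpy as np

def naive_dist(a, b):
    # Treats angles (in radians) as plain scalars: wrong across the boundary
    return abs(a - b)

def circular_embed(a):
    # Map an angle onto the unit circle; chordal distance then respects
    # the periodic topology of the variable
    return np.array([np.cos(a), np.sin(a)])

a, b = np.deg2rad(359.0), np.deg2rad(1.0)
naive = naive_dist(a, b)                                         # looks far apart
chordal = np.linalg.norm(circular_embed(a) - circular_embed(b))  # nearly zero
```

    For conformations described by many dihedral angles, each angle gets its own (cos, sin) pair, at the cost of doubling the ambient dimension handed to the NLDR method.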

  9. A new data-driven controllability measure with application in intelligent buildings

    DEFF Research Database (Denmark)

    Shaker, Hamid Reza; Lazarova-Molnar, Sanja

    2017-01-01

    Buildings account for ca. 40% of the total energy consumption and ca. 20% of the total CO2 emissions. More effective and advanced control integrated into Building Management Systems (BMS) represents an opportunity to improve energy efficiency. The ready availability of sensor technology and instrumentation within today's intelligent buildings enables collecting high-quality data which can be used directly in data-based analysis and control methods. The area of data-based systems analysis and control concentrates on developing analysis and control methods that rely on data collected from meters and sensors, and on information obtained by data processing. This differs from the traditional model-based approaches that are based on mathematical models of systems. We propose and describe a data-driven controllability measure for discrete-time linear systems. The concept is developed within a data-based...

  10. Data-Driven Multiagent Systems Consensus Tracking Using Model Free Adaptive Control.

    Science.gov (United States)

    Bu, Xuhui; Hou, Zhongsheng; Zhang, Hongwei

    2017-03-14

    This paper investigates the data-driven consensus tracking problem for multiagent systems with both fixed and switching communication topologies by utilizing a distributed model-free adaptive control (MFAC) method. Here, each agent's dynamics is described by an unknown nonlinear system and only a subset of followers can access the desired trajectory. The dynamical linearization technique is applied to each agent based on the pseudo partial derivative, and then a distributed MFAC algorithm is proposed to ensure that all agents can track the desired trajectory. It is shown that the consensus error can be reduced for both time-invariant and time-varying desired trajectories. The main feature of this design is that consensus tracking can be achieved using only the input-output data of each agent. The effectiveness of the proposed design is verified by simulation examples.
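    A single-agent sketch conveys the core of MFAC: the pseudo partial derivative (PPD) is estimated online from input-output increments and plugged into a one-step control update, with no plant model. All gains, and the toy plant itself, are illustrative assumptions, not the paper's distributed multiagent algorithm.

```python
def mfac_track(plant, y_ref, steps=200, eta=0.5, mu=1.0, rho=0.6, lam=1.0):
    """Single-agent model-free adaptive control: estimate the pseudo partial
    derivative (PPD) phi from input-output data only, then apply a one-step
    control update. The plant is treated as a black box."""
    y = y_prev = 0.0
    u = u_prev = 0.0
    phi = 1.0
    traj = []
    for _ in range(steps):
        du = u - u_prev
        if abs(du) > 1e-8:
            # PPD estimation: nudge phi toward the observed ratio dy/du
            phi += eta * du / (mu + du * du) * ((y - y_prev) - phi * du)
        u_prev, y_prev = u, y
        # MFAC control law: integral-like update scaled by the PPD estimate
        u = u + rho * phi / (lam + phi * phi) * (y_ref - y)
        y = plant(y, u)
        traj.append(y)
    return traj

# Unknown nonlinear plant, visible to the controller only through its output
plant = lambda y, u: 0.6 * y / (1.0 + y * y) + u
traj = mfac_track(plant, y_ref=2.0)
```

    In the distributed setting of the paper, each agent runs such an update with the reference replaced by information from its neighbors in the communication topology.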

  11. Manifold Learning With Contracting Observers for Data-Driven Time-Series Analysis

    Science.gov (United States)

    Shnitzer, Tal; Talmon, Ronen; Slotine, Jean-Jacques

    2017-02-01

    Analyzing signals arising from dynamical systems typically requires many modeling assumptions and parameter estimation. In high dimensions, this modeling is particularly difficult due to the "curse of dimensionality". In this paper, we propose a method for building an intrinsic representation of such signals in a purely data-driven manner. First, we apply a manifold learning technique, diffusion maps, to learn the intrinsic model of the latent variables of the dynamical system, solely from the measurements. Second, we use concepts and tools from control theory and build a linear contracting observer to estimate the latent variables in a sequential manner from new incoming measurements. The effectiveness of the presented framework is demonstrated by applying it to a toy problem and to a music analysis application. In these examples we show that our method reveals the intrinsic variables of the analyzed dynamical systems.
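    The first step, diffusion maps, admits a compact sketch: build Gaussian affinities, row-normalize them into a Markov matrix, and take the leading nontrivial eigenvectors as intrinsic coordinates. This is a generic textbook version, not the authors' code; the kernel scale and toy data are assumptions.

```python
import numpy as np

def diffusion_maps(X, eps, dim=2):
    """Minimal diffusion maps: Gaussian affinities -> row-stochastic Markov
    matrix -> leading nontrivial eigenvectors as intrinsic coordinates."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq / eps)                 # affinity kernel
    P = W / W.sum(axis=1, keepdims=True)  # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    sel = order[1:dim + 1]                # skip the trivial eigenvalue 1
    return vecs.real[:, sel] * vals.real[sel]

# Two well-separated clusters: the leading diffusion coordinate separates them
rng = np.random.RandomState(1)
A = 0.1 * rng.randn(20, 2)
B = 0.1 * rng.randn(20, 2) + np.array([5.0, 0.0])
emb = diffusion_maps(np.vstack([A, B]), eps=10.0, dim=1)
```

    The contracting observer of the paper then filters such embedding coordinates sequentially as new measurements arrive, rather than recomputing the eigendecomposition.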

  12. Design and experiment of data-driven modeling and flutter control of a prototype wing

    Science.gov (United States)

    Lum, Kai-Yew; Xu, Cai-Lin; Lu, Zhenbo; Lai, Kwok-Leung; Cui, Yongdong

    2017-06-01

    This paper presents an approach for data-driven modeling of aeroelasticity and its application to flutter control design of a wind-tunnel wing model. Modeling is centered on system identification of unsteady aerodynamic loads using computational fluid dynamics data, and adopts a nonlinear multivariable extension of the Hammerstein-Wiener system. The formulation is in modal coordinates of the elastic structure, and yields a reduced-order model of the aeroelastic feedback loop that is parametrized by airspeed. Flutter suppression is thus cast as a robust stabilization problem over uncertain airspeed, for which a low-order H∞ controller is computed. The paper discusses in detail parameter sensitivity and observability of the model, the former to justify the chosen model structure, and the latter to provide a criterion for physical sensor placement. Wind tunnel experiments confirm the validity of the modeling approach and the effectiveness of the control design.

  13. Physical Strength as a Cue to Dominance: A Data-Driven Approach.

    Science.gov (United States)

    Toscano, Hugo; Schubert, Thomas W; Dotsch, Ron; Falvello, Virginia; Todorov, Alexander

    2016-12-01

    We investigate both similarities and differences between dominance and strength judgments using a data-driven approach. First, we created statistical face shape models of judgments of both dominance and physical strength. The resulting faces representing dominance and strength were highly similar, and participants were at chance in discriminating faces generated by the two models. Second, although the models are highly correlated, it is possible to create a model that captures their differences. This model generates faces that vary from dominant-yet-physically weak to nondominant-yet-physically strong. Participants were able to identify the difference in strength between the physically strong-yet-nondominant faces and the physically weak-yet-dominant faces. However, this was not the case for identifying dominance. These results suggest that representations of social dominance and physical strength are highly similar, and that strength is used as a cue for dominance more than dominance is used as a cue for strength.

  14. Image Resolution Enhancement via Data-Driven Parametric Models in the Wavelet Space

    Directory of Open Access Journals (Sweden)

    Xin Li

    2007-02-01

    Full Text Available We present a data-driven, projection-based algorithm which enhances image resolution by extrapolating high-band wavelet coefficients. High-resolution images are reconstructed by alternating the projections onto two constraint sets: the observation constraint defined by the given low-resolution image and the prior constraint derived from the training data at high resolution (HR). Two types of prior constraints are considered: a spatially homogeneous constraint suitable for texture images and a patch-based inhomogeneous one for generic images. A probabilistic fusion strategy is developed for combining reconstructed HR patches when overlapping (redundancy) is present. It is argued that an objective fidelity measure is important for evaluating the performance of resolution-enhancement techniques and that the role of the antialiasing filter should be properly addressed. Experimental results show that our projection-based approach achieves both good subjective and objective performance, especially for the class of texture images.
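    The alternating-projection (POCS) scheme the abstract describes can be illustrated on a 1-D toy problem: project onto the observation constraint (the downsampled signal must match the low-resolution input), then onto a prior constraint, and repeat. The 2x box-filter observation model and nonnegativity prior below are simplifying assumptions, not the paper's wavelet-space constraints.

```python
import numpy as np

def project_observation(x, y_obs):
    # Observation constraint: pairwise means of the HR signal must equal the
    # LR observation (a 2x box-filter downsampling model, assumed here)
    x = x.copy()
    corr = y_obs - (x[0::2] + x[1::2]) / 2.0
    x[0::2] += corr
    x[1::2] += corr
    return x

def project_prior(x):
    # Toy prior constraint: the high-resolution signal is nonnegative
    return np.maximum(x, 0.0)

def pocs_upscale(y_obs, iters=50):
    x = np.zeros(2 * len(y_obs))  # initial high-resolution guess
    for _ in range(iters):
        x = project_prior(project_observation(x, y_obs))
    return x

y = np.array([1.0, 0.5, 2.0])   # low-resolution observation
hr = pocs_upscale(y)
```

    In the paper the prior projection is learned from training data in the wavelet domain; the convergence behavior of the alternation is the same.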

  16. Sequencing quality assessment tools to enable data-driven informatics for high throughput genomics

    Directory of Open Access Journals (Sweden)

    Richard Mark Leggett

    2013-12-01

    Full Text Available The processes of quality assessment and control are an active area of research at The Genome Analysis Centre (TGAC. Unlike other sequencing centres that often concentrate on a certain species or technology, TGAC applies expertise in genomics and bioinformatics to a wide range of projects, often requiring bespoke wet lab and in silico workflows. TGAC is fortunate to have access to a diverse range of sequencing and analysis platforms, and we are at the forefront of investigations into library quality and sequence data assessment. We have developed and implemented a number of algorithms, tools, pipelines and packages to ascertain, store, and expose quality metrics across a number of next-generation sequencing platforms, allowing rapid and in-depth cross-platform QC bioinformatics. In this review, we describe these tools as a vehicle for data-driven informatics, offering the potential to provide richer context for downstream analysis and to inform experimental design.

  17. Secondary Use of Clinical Data to Enable Data-Driven Translational Science with Trustworthy Access Management.

    Science.gov (United States)

    Mosa, Abu Saleh Mohammad; Yoo, Illhoi; Apathy, Nate C; Ko, Kelly J; Parker, Jerry C

    2015-01-01

    University of Missouri (MU) Health Care produces a large amount of digitized clinical data that can be used in clinical and translational research for cohort identification, retrospective data analysis, feasibility studies, and hypothesis generation. In this article, the implementation of an integrated clinical research data repository is discussed. We developed a trustworthy access-management protocol for providing access to both clinically relevant data and protected health information. As of September 2014, the database contains approximately 400,000 patients and 82 million observations, and is growing daily. The system will facilitate the secondary use of electronic health record (EHR) data at MU to promote data-driven clinical and translational research, in turn enabling better healthcare through research.

  18. Data-driven optimization and knowledge discovery for an enterprise information system

    CERN Document Server

    Duan, Qing; Zeng, Jun

    2015-01-01

    This book provides a comprehensive set of optimization and prediction techniques for an enterprise information system. Readers with a background in operations research, system engineering, statistics, or data analytics can use this book as a reference to derive insight from data and use this knowledge as guidance for production management. The authors identify the key challenges in enterprise information management and present results that have emerged from leading-edge research in this domain. Coverage includes topics ranging from task scheduling and resource allocation, to workflow optimization, process time and status prediction, order admission policies optimization, and enterprise service-level performance analysis and prediction. With its emphasis on the above topics, this book provides an in-depth look at enterprise information management solutions that are needed for greater automation and reconfigurability-based fault tolerance, as well as to obtain data-driven recommendations for effective decision-...

  19. Estimation of DSGE Models under Diffuse Priors and Data-Driven Identification Constraints

    DEFF Research Database (Denmark)

    Lanne, Markku; Luoto, Jani

    We propose a sequential Monte Carlo (SMC) method augmented with an importance sampling step for estimation of DSGE models. In addition to being theoretically well motivated, the new method facilitates the assessment of estimation accuracy. Furthermore, in order to alleviate the problem of multimodal posterior distributions due to poor identification of DSGE models when uninformative prior distributions are assumed, we recommend imposing data-driven identification constraints and devise a procedure for finding them. An empirical application to the Smets-Wouters (2007) model demonstrates the properties of the estimation method, and shows how the problem of multimodal posterior distributions caused by parameter redundancy is eliminated by identification constraints. Out-of-sample forecast comparisons as well as Bayes factors lend support to the constrained model.

  20. First-principles data-driven discovery of transition metal oxides for artificial photosynthesis

    Science.gov (United States)

    Yan, Qimin

    We develop a first-principles data-driven approach for rapid identification of transition metal oxide (TMO) light absorbers and photocatalysts for artificial photosynthesis using the Materials Project. Initially focusing on Cr-, V-, and Mn-based ternary TMOs in the database, we design a broadly applicable multiple-layer screening workflow automating density functional theory (DFT) and hybrid functional calculations of bulk and surface electronic and magnetic structures. We further assess the electrochemical stability of TMOs in aqueous environments from computed Pourbaix diagrams. Several promising earth-abundant low-band-gap TMO compounds with desirable band-edge energies and electrochemical stability are identified by our computational efforts and then synergistically evaluated using high-throughput synthesis and photoelectrochemical screening techniques by our experimental collaborators at Caltech. Our joint theory-experiment effort has successfully identified new earth-abundant copper and manganese vanadate complex oxides that meet highly demanding requirements for photoanodes, substantially expanding the known space of such materials. By integrating theory and experiment, we validate our approach and develop important new insights into structure-property relationships for TMOs as oxygen evolution photocatalysts, paving the way for use of first-principles data-driven techniques in future applications. This work is supported by the Materials Project Predictive Modeling Center and the Joint Center for Artificial Photosynthesis through the U.S. Department of Energy, Office of Basic Energy Sciences, Materials Sciences and Engineering Division, under Contract No. DE-AC02-05CH11231. Computational resources were also provided by the Department of Energy through the National Energy Research Scientific Computing Center.

  1. A data-driven prediction method for fast-slow systems

    Science.gov (United States)

    Groth, Andreas; Chekroun, Mickael; Kondrashov, Dmitri; Ghil, Michael

    2016-04-01

    In this work, we present a prediction method for processes that exhibit a mixture of variability on slow and fast scales. The method relies on combining empirical model reduction (EMR) with singular spectrum analysis (SSA). EMR is a data-driven methodology for constructing stochastic low-dimensional models that account for nonlinearity and serial correlation in the estimated noise, while SSA provides a decomposition of the complex dynamics into low-order components that capture spatio-temporal behavior on different time scales. Our study focuses on the data-driven modeling of partial observations from dynamical systems that exhibit power spectra with broad peaks. The main result in this talk is that the combination of SSA pre-filtering with EMR modeling improves, under certain circumstances, the modeling and prediction skill of such a system, as compared to a standard EMR prediction based on raw data. Specifically, it is the separation into "fast" and "slow" temporal scales by the SSA pre-filtering that achieves the improvement. We show, in particular, that the resulting EMR-SSA emulators help predict intermittent behavior such as rapid transitions between specific regions of the system's phase space. This capability of the EMR-SSA prediction will be demonstrated on two low-dimensional models: the Rössler system and a Lotka-Volterra model for interspecies competition. In both cases, the chaotic dynamics is produced through a Shilnikov-type mechanism, and we argue that the latter seems to be an important ingredient for the good prediction skills of EMR-SSA emulators. Shilnikov-type behavior has been shown to arise in various complex geophysical fluid models, such as baroclinic quasi-geostrophic flows in the mid-latitude atmosphere and wind-driven double-gyre ocean circulation models. This pervasiveness of the Shilnikov mechanism of fast-slow transition opens interesting perspectives for the extension of the proposed EMR-SSA approach to more realistic situations.
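The SSA pre-filtering step described above can be sketched as a trajectory-matrix SVD followed by anti-diagonal averaging. This is a generic illustration of the decomposition, not the authors' EMR-SSA code; the window length and the input series are arbitrary choices made here for the sketch.

```python
import numpy as np

def ssa_components(x, window):
    """Decompose series x into elementary SSA components via a
    trajectory-matrix SVD; summing low-order (slow) and high-order
    (fast) components gives the scale separation used for filtering."""
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window]
    X = np.column_stack([x[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for j in range(len(s)):
        Xj = s[j] * np.outer(U[:, j], Vt[j])  # rank-1 piece of X
        # Hankelize: average each anti-diagonal back into a series value
        comp = np.array([np.mean(Xj[::-1].diagonal(i - window + 1))
                         for i in range(n)])
        comps.append(comp)
    return np.array(comps)
```

By construction the components sum back to the original series, so a "slow" reconstruction is simply the sum of the leading components.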

  2. Testing the Accuracy of Data-driven MHD Simulations of Active Region Evolution

    Science.gov (United States)

    Leake, James E.; Linton, Mark G.; Schuck, Peter W.

    2017-04-01

    Models for the evolution of the solar coronal magnetic field are vital for understanding solar activity, yet the best measurements of the magnetic field lie at the photosphere, necessitating the development of coronal models which are “data-driven” at the photosphere. We present an investigation to determine the feasibility and accuracy of such methods. Our validation framework uses a simulation of active region (AR) formation, modeling the emergence of magnetic flux from the convection zone to the corona, as a ground-truth data set, to supply both the photospheric information and to perform the validation of the data-driven method. We focus our investigation on how the accuracy of the data-driven model depends on the temporal frequency of the driving data. The Helioseismic and Magnetic Imager on NASA’s Solar Dynamics Observatory produces full-disk vector magnetic field measurements at a 12-minute cadence. Using our framework we show that ARs that emerge over 25 hr can be modeled by the data-driving method with only ∼1% error in the free magnetic energy, assuming the photospheric information is specified every 12 minutes. However, for rapidly evolving features, under-sampling of the dynamics at this cadence leads to a strobe effect, generating large electric currents and incorrect coronal morphology and energies. We derive a sampling condition for the driving cadence based on the evolution of these small-scale features, and show that higher-cadence driving can lead to acceptable errors. Future work will investigate the source of errors associated with deriving plasma variables from the photospheric magnetograms as well as other sources of errors, such as reduced resolution, instrument bias, and noise.

  3. Data-Driven Design of Intelligent Wireless Networks: An Overview and Tutorial.

    Science.gov (United States)

    Kulin, Merima; Fortuna, Carolina; De Poorter, Eli; Deschrijver, Dirk; Moerman, Ingrid

    2016-06-01

    Data science or "data-driven research" is a research approach that uses real-life data to gain insight about the behavior of systems. It enables the analysis of small, simple as well as large and more complex systems in order to assess whether they function according to the intended design and as seen in simulation. Data science approaches have been successfully applied to analyze networked interactions in several research areas such as large-scale social networks, advanced business and healthcare processes. Wireless networks can exhibit unpredictable interactions between algorithms from multiple protocol layers, interactions between multiple devices, and hardware specific influences. These interactions can lead to a difference between real-world functioning and design time functioning. Data science methods can help to detect the actual behavior and possibly help to correct it. Data science is increasingly used in wireless research. To support data-driven research in wireless networks, this paper illustrates the step-by-step methodology that has to be applied to extract knowledge from raw data traces. To this end, the paper (i) clarifies when, why and how to use data science in wireless network research; (ii) provides a generic framework for applying data science in wireless networks; (iii) gives an overview of existing research papers that utilized data science approaches in wireless networks; (iv) illustrates the overall knowledge discovery process through an extensive example in which device types are identified based on their traffic patterns; (v) provides the reader the necessary datasets and scripts to go through the tutorial steps themselves.
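As a toy illustration of the tutorial's final knowledge-discovery step (identifying device types from their traffic patterns), a nearest-centroid rule over traffic-feature vectors might look like the sketch below. The feature vectors and labels are invented for the example and are not taken from the paper's datasets or scripts.

```python
import numpy as np

def centroid_classifier(features, labels):
    """Fit a nearest-centroid classifier: each device type is represented
    by the mean of its training feature vectors (e.g. packet rate,
    mean packet size); prediction picks the closest centroid."""
    classes = sorted(set(labels))
    cents = {c: np.mean([f for f, l in zip(features, labels) if l == c],
                        axis=0)
             for c in classes}
    def predict(f):
        return min(classes,
                   key=lambda c: np.linalg.norm(np.asarray(f) - cents[c]))
    return predict
```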

  4. A data-driven method to enhance vibration signal decomposition for rolling bearing fault analysis

    Science.gov (United States)

    Grasso, M.; Chatterton, S.; Pennacchi, P.; Colosimo, B. M.

    2016-12-01

    Health condition analysis and diagnostics of rotating machinery require the capability of properly characterizing the information content of sensor signals in order to detect and identify possible fault features. Time-frequency analysis plays a fundamental role, as it allows determining both the existence and the causes of a fault. The separation of components belonging to different time-frequency scales, associated either with healthy or with faulty conditions, represents a challenge that motivates the development of effective methodologies for multi-scale signal decomposition. In this framework, the Empirical Mode Decomposition (EMD) is a flexible tool, thanks to its data-driven and adaptive nature. However, the EMD usually yields an over-decomposition of the original signals into a large number of intrinsic mode functions (IMFs). The selection of the most relevant IMFs is a challenging task, and the reference literature lacks automated methods to achieve a synthetic decomposition into few physically meaningful modes while avoiding the generation of spurious or meaningless modes. The paper proposes a novel automated approach aimed at generating a decomposition into a minimal number of relevant modes, called Combined Mode Functions (CMFs), each consisting of a sum of adjacent IMFs that share similar properties. The final number of CMFs is selected in a fully data-driven way, leading to an enhanced characterization of the signal content without any information loss. A novel criterion to assess the dissimilarity between adjacent CMFs is proposed, based on probability density functions of frequency spectra. The method is suitable for analyzing vibration signals that may be periodically acquired within the operating life of rotating machinery. A rolling element bearing fault analysis based on experimental data is presented to demonstrate the performance of the method and the benefits it provides.
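The idea of summing adjacent IMFs into Combined Mode Functions when their spectral "probability density functions" are similar can be sketched as follows. The greedy single pass and the total-variation distance on normalized magnitude spectra are simplifications assumed here for illustration; the paper's actual dissimilarity criterion is more elaborate.

```python
import numpy as np

def spectral_pdf(sig):
    """Normalize the magnitude spectrum so it sums to one (a spectral 'PDF')."""
    mag = np.abs(np.fft.rfft(sig))
    return mag / mag.sum()

def merge_imfs(imfs, threshold=0.5):
    """Greedily sum adjacent IMFs into CMFs whenever their spectral
    PDFs are close (total-variation distance below threshold)."""
    cmfs = [imfs[0].copy()]
    for imf in imfs[1:]:
        d = 0.5 * np.abs(spectral_pdf(cmfs[-1]) - spectral_pdf(imf)).sum()
        if d < threshold:
            cmfs[-1] = cmfs[-1] + imf  # similar spectra: combine
        else:
            cmfs.append(imf.copy())    # dissimilar: start a new CMF
    return cmfs
```

Because merging only sums components, the CMFs reconstruct the signal exactly, matching the "no information loss" property claimed in the abstract.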

  5. A Novel Data-Driven Learning Method for Radar Target Detection in Nonstationary Environments

    Science.gov (United States)

    2016-05-01

    supported by the U.S. Missile Defense Agency (MDA) at the Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy... clutter characteristics, our proposed machine learning based radar first checks for any change in the clutter distribution by employing the CUSUM test, and... testing for the confidence intervals for the sample mean and sample variance. Specifically, we assume the availability of a batch of Nt training
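The snippet above mentions a CUSUM test for detecting a change in the clutter distribution. A minimal one-sided Gaussian CUSUM detector, assuming known pre- and post-change means and unit variance (assumptions made here for illustration, not taken from the report), is:

```python
def cusum_alarm(samples, mu0, mu1, threshold):
    """One-sided CUSUM: return the first index at which the cumulative
    log-likelihood ratio for a mean shift mu0 -> mu1 (unit-variance
    Gaussian samples assumed) exceeds threshold, or -1 if it never does."""
    g = 0.0
    for k, x in enumerate(samples):
        # Gaussian log-likelihood ratio increment for this sample
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0)
        g = max(0.0, g + llr)  # reset at zero: ignore pre-change evidence
        if g > threshold:
            return k
    return -1
```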

  6. On the distribution of batch shelf lives.

    Science.gov (United States)

    Quinlan, Michelle; Stroup, Walter; Christopher, David; Schwenke, James

    2013-01-01

    Implicit in ICH Q1E (International Conference on Harmonization [ICH], 2003b) are definitions of batch shelf life (the time the batch mean crosses the acceptance limit) and product shelf life (the minimum batch shelf life). The distribution of batch means over time projects to a distribution of batch shelf lives on the x-axis. Assuming multivariate normality, shelf life is the ratio of correlated Gaussian variables. Using Hinkley (1969), we describe the relationship between quantiles of the distributions of batch shelf lives and batch means. Exploiting this relationship, a linear mixed model is used to estimate a target quantile of batch shelf lives to address the ICH objective.
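The definitions above (batch shelf life as the time the batch mean crosses the acceptance limit; product shelf life as a low quantile of that distribution) can be illustrated with a small Monte Carlo sketch. The linear stability model, the acceptance limit of 90% label claim, and all parameter values below are hypothetical, chosen only to show how a ratio of Gaussians yields a distribution of shelf lives.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stability model: batch i potency over time t (months) is
# a_i + b_i * t, with an acceptance limit of 90 (% label claim).
a = rng.normal(102.0, 1.0, 5000)   # batch intercepts (initial potency)
b = rng.normal(-0.4, 0.05, 5000)   # batch slopes (potency loss per month)

# Each batch's shelf life is the crossing time of its mean with the limit,
# i.e. a ratio of (correlated, here independent for simplicity) Gaussians.
shelf_life = (90.0 - a) / b

# A target quantile of the batch shelf-life distribution stands in for
# the product shelf life (the ICH "minimum batch" idea, made probabilistic).
product_shelf_life = np.quantile(shelf_life, 0.05)
```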

  7. Fork-join and data-driven execution models on multi-core architectures: Case study of the FMM

    KAUST Repository

    Amer, Abdelhalim

    2013-01-01

    Extracting maximum performance of multi-core architectures is a difficult task primarily due to bandwidth limitations of the memory subsystem and its complex hierarchy. In this work, we study the implications of fork-join and data-driven execution models on this type of architecture at the level of task parallelism. For this purpose, we use a highly optimized fork-join based implementation of the FMM and extend it to a data-driven implementation using a distributed task scheduling approach. This study exposes some limitations of the conventional fork-join implementation in terms of synchronization overheads. We find that these are not negligible and their elimination by the data-driven method, with a careful data locality strategy, was beneficial. Experimental evaluation of both methods on state-of-the-art multi-socket multi-core architectures showed up to 22% speed-ups of the data-driven approach compared to the original method. We demonstrate that a data-driven execution of FMM not only improves performance by avoiding global synchronization overheads but also reduces the memory-bandwidth pressure caused by memory-intensive computations. © 2013 Springer-Verlag.

  8. Tracking Invasive Alien Species (TrIAS): Building a data-driven framework to inform policy

    Directory of Open Access Journals (Sweden)

    Sonia Vanderhoeven

    2017-05-01

    Full Text Available Imagine a future where dynamically, from year to year, we can track the progression of alien species (AS), identify emerging problem species, assess their current and future risk and timely inform policy in a seamless data-driven workflow. One that is built on open science and open data infrastructures. By using international biodiversity standards and facilities, we would ensure interoperability, repeatability and sustainability. This would make the process adaptable to future requirements in an evolving AS policy landscape both locally and internationally. In recent years, Belgium has developed decision support tools to inform invasive alien species (IAS) policy, including information systems, early warning initiatives and risk assessment protocols. However, the current workflows from biodiversity observations to IAS science and policy are slow, not easily repeatable, and their scope is often taxonomically, spatially and temporally limited. This is mainly caused by the diversity of actors involved and the closed, fragmented nature of the sources of these biodiversity data, which leads to considerable knowledge gaps for IAS research and policy. We will leverage expertise and knowledge from nine former and current BELSPO projects and initiatives: Alien Alert, Invaxen, Diars, INPLANBEL, Alien Impact, Ensis, CORDEX.be, Speedy and the Belgian Biodiversity Platform. The project will be built on two components: (1) the establishment of a data mobilization framework for AS data from diverse data sources and (2) the development of data-driven procedures for risk evaluation based on risk modelling, risk mapping and risk assessment. We will use facilities from the Global Biodiversity Information Facility (GBIF), standards from the Biodiversity Information Standards organization (TDWG) and expertise from LifeWatch to create and facilitate a systematic workflow. Alien species data will be gathered from a large set of regional, national and international

  9. Integration of data-driven and physically-based methods to assess shallow landslides susceptibility

    Science.gov (United States)

    Lajas, Sara; Oliveira, Sérgio C.; Zêzere, José Luis

    2016-04-01

    Approaches used to assess shallow landslide susceptibility at the basin scale are conceptually different depending on the use of statistical or deterministic methods. The data-driven methods rest on the assumption that the same causes are likely to produce the same effects, and for that reason a present/past landslide inventory and a dataset of factors assumed as predisposing factors are crucial for the landslide susceptibility assessment. The physically-based methods are based on a system controlled by physical laws and soil mechanics, where the forces which tend to promote movement are compared with the forces that tend to resist movement. In this case, the evaluation of susceptibility is supported by the calculation of the Factor of Safety (FoS), and depends on the availability of detailed data related to the slope geometry and the hydrological and geotechnical properties of the soils and rocks. Within this framework, this work aims to test two hypotheses: (i) although conceptually distinct and based on contrasting procedures, statistical and deterministic methods generate similar shallow landslide susceptibility results regarding predictive capacity and spatial agreement; and (ii) the integration of the shallow landslide susceptibility maps obtained with data-driven and physically-based methods, for the same study area, generates a more reliable susceptibility model for shallow landslide occurrence. To evaluate these two hypotheses, we select the Information Value data-driven method and the physically-based Infinite Slope model to evaluate shallow landslides in the study area of the Monfalim and Louriceira basins (13.9 km²), located in the north of the Lisbon region (Portugal). The landslide inventory comprises 111 shallow landslides and was divided into two independent groups based on temporal criteria (age ≤ 1983 and age > 1983): (i) the modelling group (51 cases) was used to define the weights for each predisposing factor
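The physically-based side of the comparison rests on the Factor of Safety of the Infinite Slope model. A common textbook form of that calculation is sketched below; the parameter values and defaults are illustrative, not the study's geotechnical data.

```python
import math

def infinite_slope_fos(c, phi_deg, beta_deg, z,
                       gamma=18.0, gamma_w=9.81, m=0.5):
    """Factor of Safety for the infinite slope model (standard form).
    c: effective cohesion (kPa), phi_deg: friction angle (deg),
    beta_deg: slope angle (deg), z: failure-surface depth (m),
    gamma: soil unit weight (kN/m3), gamma_w: water unit weight,
    m: fraction of the soil column that is saturated."""
    beta = math.radians(beta_deg)
    phi = math.radians(phi_deg)
    # Shear strength resisting movement (cohesion + effective friction)
    resisting = c + (gamma - m * gamma_w) * z * math.cos(beta) ** 2 * math.tan(phi)
    # Downslope shear stress driving movement
    driving = gamma * z * math.sin(beta) * math.cos(beta)
    return resisting / driving
```

FoS < 1 flags a slope cell as unstable; the model behaves as expected in that steeper slopes and higher saturation both lower the FoS.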

  10. Data-Driven Extraction of a Nested Model of Human Brain Function.

    Science.gov (United States)

    Bolt, Taylor; Nomi, Jason S; Yeo, B T Thomas; Uddin, Lucina Q

    2017-07-26

    Decades of cognitive neuroscience research have revealed two basic facts regarding task-driven brain activation patterns. First, distinct patterns of activation occur in response to different task demands. Second, a superordinate, dichotomous pattern of activation/deactivation is common across a variety of task demands. We explore the possibility that a hierarchical model incorporates these two observed brain activation phenomena into a unifying framework. We apply a latent variable approach, exploratory bifactor analysis, to a large set of human (both sexes) brain activation maps (n = 108) encompassing cognition, perception, action, and emotion behavioral domains, to determine the potential existence of a nested structure of factors that underlie a variety of commonly observed activation patterns. We find that a general factor, associated with a superordinate brain activation/deactivation pattern, explained the majority of the variance (52.37%) in brain activation patterns. The bifactor analysis also revealed several subfactors that explained an additional 31.02% of variance in brain activation patterns, associated with different manifestations of the superordinate brain activation/deactivation pattern, each emphasizing different contexts in which the task demands occurred. Importantly, this nested factor structure provided better overall fit to the data compared with a non-nested factor structure model. These results point to a domain-general psychological process, representing a "focused awareness" process or "attentional episode" that is variously manifested according to the sensory modality of the stimulus and degree of cognitive processing. This novel model provides the basis for constructing a biologically informed, data-driven taxonomy of psychological processes. SIGNIFICANCE STATEMENT: A crucial step in identifying how the brain supports various psychological processes is a well-defined categorization or taxonomy of psychological processes and their

  11. Data-driven and hybrid coastal morphological prediction methods for mesoscale forecasting

    Science.gov (United States)

    Reeve, Dominic E.; Karunarathna, Harshinie; Pan, Shunqi; Horrillo-Caraballo, Jose M.; Różyński, Grzegorz; Ranasinghe, Roshanka

    2016-03-01

    It is now common for coastal planning to anticipate changes anywhere from 70 to 100 years into the future. The process models developed and used for scheme design or for large-scale oceanography are currently inadequate for this task. This has prompted the development of a plethora of alternative methods. Some, such as reduced-complexity or hybrid models, simplify the governing equations, retaining processes that are considered to govern observed morphological behaviour. The computational cost of these models is low and they have proven effective in exploring morphodynamic trends and improving our understanding of mesoscale behaviour. One drawback is that there is no generally agreed set of principles on which to make the simplifying assumptions, and predictions can vary considerably between models. An alternative approach is data-driven techniques that are based entirely on analysis and extrapolation of observations. Here, we discuss the application of some of the better known and emerging methods in this category to argue that, with the increasing availability of observations from coastal monitoring programmes and the development of more sophisticated statistical analysis techniques, data-driven models provide a valuable addition to the armoury of methods available for mesoscale prediction. The continuation of established monitoring programmes is paramount, and those that provide contemporaneous records of the driving forces and the shoreline response are the most valuable in this regard. In the second part of the paper we discuss some recent research that combines some of the hybrid techniques with data analysis methods in order to synthesise a more consistent means of predicting mesoscale coastal morphological evolution. While encouraging in certain applications, a universally applicable approach has yet to be found. The route to linking different model types is highlighted as a major challenge and requires further research to establish its viability. We argue that

  12. Microenvironment temperature prediction between body and seat interface using autoregressive data-driven model.

    Science.gov (United States)

    Liu, Zhuofu; Wang, Lin; Luo, Zhongming; Heusch, Andrew I; Cascioli, Vincenzo; McCarthy, Peter W

    2015-11-01

    There is a need to develop a greater understanding of temperature at the skin-seat interface during prolonged seating from the perspectives of both industrial design (comfort/discomfort) and medical care (skin ulcer formation). Here we test the concept of predicting temperature at the seat surface and skin interface during prolonged sitting (such as required of wheelchair users). As caregivers are usually busy, such a method would give them warning ahead of a problem. This paper describes a data-driven model capable of predicting thermal changes and thus having the potential to provide an early warning (15- to 25-min ahead prediction) of an impending temperature rise that may increase the risk of skin damage for those subject to enforced sitting and who have little or no sensory feedback from this area. Initially, the oscillations of the original signal are suppressed using the reconstruction strategy of empirical mode decomposition (EMD). Consequently, the autoregressive data-driven model can be used to predict future thermal trends based on a shorter period of acquisition, which reduces the possibility of introducing human errors and artefacts associated with longer-duration "enforced" sitting by volunteers. In this study, the method had a maximum predictive error of <0.4 °C when used to predict the temperature at the seat and skin interface 15 min ahead, but required 45 min of data prior to give this accuracy. Although the 45 min front-loading of data appears large (in proportion to the 15 min prediction), a relative strength derives from the fact that the same algorithm could be used on the other 4 sitting datasets created by the same individual, suggesting that the period of 45 min required to train the algorithm is transferable to other data from the same individual. This approach might be developed (along with incorporation of other measures such as movement and humidity) into a system that can give caregivers prior warning to help avoid
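A bare-bones version of the autoregressive prediction step (a least-squares AR fit plus iterated multi-step forecasting, omitting the EMD reconstruction stage the paper uses first) might look like this; the model order and forecast horizon are placeholders, not the study's settings.

```python
import numpy as np

def fit_ar(x, order):
    """Least-squares fit of AR(order) coefficients to series x:
    x[k] ~ c[0]*x[k-order] + ... + c[order-1]*x[k-1]."""
    rows = [x[i:i + order] for i in range(len(x) - order)]
    X, y = np.array(rows), x[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast(x, coef, steps):
    """Iterated multi-step-ahead forecast from the tail of x,
    feeding each prediction back in as the newest sample."""
    hist = list(x[-len(coef):])
    out = []
    for _ in range(steps):
        nxt = float(np.dot(coef, hist))
        out.append(nxt)
        hist = hist[1:] + [nxt]
    return out
```

In the paper's setting the "steps" would correspond to the 15- to 25-minute warning horizon, with the 45 minutes of prior data supplying the training window.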

  13. Voting-based cancer module identification by combining topological and data-driven properties.

    Science.gov (United States)

    Azad, A K M; Lee, Hyunju

    2013-01-01

    Recently, computational approaches integrating copy number aberrations (CNAs) and gene expression (GE) have been extensively studied to identify cancer-related genes and pathways. In this work, we integrate these two data sets with protein-protein interaction (PPI) information to find cancer-related functional modules. To integrate CNA and GE data, we first built a gene-gene relationship network from a set of seed genes by enumerating all types of pairwise correlations, e.g. GE-GE, CNA-GE, and CNA-CNA, over multiple patients. Next, we propose a voting-based cancer module identification algorithm combining topological and data-driven properties (the VToD algorithm), using the gene-gene relationship network as a source of data-driven information and the PPI data as topological information. We applied the VToD algorithm to 266 glioblastoma multiforme (GBM) and 96 ovarian carcinoma (OVC) samples that have both expression and copy number measurements, and identified 22 GBM modules and 23 OVC modules. Among the 22 GBM modules, 15, 12, and 20 modules were significantly enriched with cancer-related KEGG pathways, BioCarta pathways, and GO terms, respectively. Among the 23 OVC modules, 19, 18, and 23 modules were significantly enriched with cancer-related KEGG pathways, BioCarta pathways, and GO terms, respectively. Similarly, we also observed that 9 and 2 GBM modules and 15 and 18 OVC modules were enriched with cancer gene census (CGC) genes and specific cancer driver genes, respectively. Our proposed module-detection algorithm significantly outperformed other existing methods in terms of both functional and cancer gene set enrichments. Most of the cancer-related pathways from both cancer data sets found by our algorithm contained more than two types of gene-gene relationships, showing strong positive correlations between the number of different types of relationship and CGC enrichment p-values (0.64 for GBM and 0.49 for OVC). This study suggests that identified modules containing

  14. WIFIRE: A Scalable Data-Driven Monitoring, Dynamic Prediction and Resilience Cyberinfrastructure for Wildfires

    Science.gov (United States)

    Altintas, I.; Block, J.; Braun, H.; de Callafon, R. A.; Gollner, M. J.; Smarr, L.; Trouve, A.

    2013-12-01

    Recent studies confirm that climate change will cause wildfires to increase in frequency and severity in the coming decades, especially for California and much of the North American West. The most critical sustainability issue in the midst of these ever-changing dynamics is how to achieve a new social-ecological equilibrium of this fire ecology. Wildfire wind speeds and directions change in an instant, and first responders can only be effective when they take action as quickly as the conditions change. To deliver the information needed for sustainable policy and management in this dynamically changing fire regime, we must capture these details to understand the environmental processes. We are building an end-to-end cyberinfrastructure (CI), called WIFIRE, for real-time and data-driven simulation, prediction and visualization of wildfire behavior. The WIFIRE integrated CI system supports social-ecological resilience to the changing fire ecology regime in the face of urban dynamics and climate change. Networked observations, e.g., heterogeneous satellite data and real-time remote sensor data, are integrated with computational techniques in signal processing, visualization, modeling and data assimilation to provide a scalable, technological, and educational solution to monitor weather patterns to predict a wildfire's Rate of Spread. Our collaborative WIFIRE team of scientists, engineers, technologists, government policy managers, private industry, and firefighters architects and implements CI pathways that enable joint innovation for wildfire management. Scientific workflows are used as an integrative distributed programming model and simplify the implementation of engineering modules for data-driven simulation, prediction and visualization while allowing integration with large-scale computing facilities. WIFIRE will be scalable to users with different skill levels via specialized web interfaces and user-specified alerts for environmental events broadcasted to receivers before

  15. BatchJobs and BatchExperiments: Abstraction Mechanisms for Using R in Batch Environments

    Directory of Open Access Journals (Sweden)

    Bernd Bischl

    2015-03-01

    Full Text Available Empirical analysis of statistical algorithms often demands time-consuming experiments. We present two R packages which greatly simplify working in batch computing environments. The package BatchJobs implements the basic objects and procedures to control any batch cluster from within R. It is structured around cluster versions of the well-known higher-order functions Map, Reduce and Filter from functional programming. Computations are performed asynchronously and all job states are persistently stored in a database, which can be queried at any point in time. The second package, BatchExperiments, is tailored for the still very general scenario of analyzing arbitrary algorithms on problem instances. It extends package BatchJobs by letting the user define an array of jobs of the kind "apply algorithm A to problem instance P and store the results". It is possible to associate statistical designs with parameters of problems and algorithms and therefore to systematically study their influence on the results. The packages' main features are: (a) convenient usage: all relevant batch system operations are either handled internally or mapped to simple R functions; (b) portability: both packages use a clear and well-defined interface to the batch system, which makes them applicable in most high-performance computing environments; (c) reproducibility: every computational part has an associated seed to ensure reproducibility even when the underlying batch system changes; (d) abstraction and good software design: the code layers for algorithms, experiment definitions and execution are cleanly separated and enable the writing of readable and maintainable code.

  16. Review of the Remaining Useful Life Prognostics of Vehicle Lithium-Ion Batteries Using Data-Driven Methodologies

    Directory of Open Access Journals (Sweden)

    Lifeng Wu

    2016-05-01

    Full Text Available Lithium-ion batteries are the primary power source in electric vehicles, and the prognosis of their remaining useful life is vital for ensuring the safety, stability, and long lifetime of electric vehicles. Accurately establishing a mechanism model of a vehicle lithium-ion battery involves a complex electrochemical process. Remaining useful life (RUL prognostics based on data-driven methods has become a focus of research. Current research on data-driven methodologies is summarized in this paper. By analyzing the problems of vehicle lithium-ion batteries in practical applications, the problems that need to be solved in the future are identified.
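As a minimal illustration of data-driven RUL estimation for a battery (far simpler than the filtering and machine-learning methods the review surveys), one can fit a capacity-fade trend to cycling data and extrapolate to an end-of-life threshold. The linear fade model and the 80%-of-initial-capacity threshold below are common conventions assumed here for the sketch, not results from the paper.

```python
import numpy as np

def rul_from_capacity(cycles, capacity, eol_frac=0.8):
    """Data-driven RUL sketch: fit a linear fade model to observed
    capacity vs. cycle number, then extrapolate to the cycle at which
    capacity reaches eol_frac of its initial value."""
    slope, intercept = np.polyfit(cycles, capacity, 1)
    eol = eol_frac * capacity[0]          # end-of-life capacity threshold
    eol_cycle = (eol - intercept) / slope # cycle where the fit crosses it
    return max(0.0, eol_cycle - cycles[-1])
```

Real fade curves are nonlinear and noisy, which is precisely why the review focuses on more sophisticated data-driven prognostic methods.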

  17. A data driven approach for detection and isolation of anomalies in a group of UAVs

    Institute of Scientific and Technical Information of China (English)

    Wang Yin; Wang Daobo; Wang Jianhong

    2015-01-01

    The use of groups of unmanned aerial vehicles (UAVs) has greatly expanded UAVs' capabilities in a variety of applications, such as surveillance, searching and mapping. As the UAVs are operated as a team, it is important to detect and isolate the occurrence of anomalous aircraft in order to avoid collisions and other risks that would affect the safety of the team. In this paper, we present a data-driven approach to detect and isolate abnormal aircraft within a team of formation-flying aerial vehicles, which removes the requirement for prior knowledge of the underlying dynamic model in conventional model-based fault detection algorithms. Based on the assumption that normally behaving UAVs should share similar (dynamic) model parameters, we propose to first identify the model parameters for each aircraft of the team based on a sequence of input and output data pairs, and this is achieved by a novel sparse optimization technique. The fault states of the UAVs are then detected and isolated in a second step by identifying the change in model parameters. Simulation results have demonstrated the efficiency and flexibility of the proposed approach.
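The two-step scheme described (identify per-aircraft model parameters from input-output data, then isolate the aircraft whose parameters deviate from the rest) can be sketched with ordinary least squares standing in for the paper's sparse optimization technique. The FIR model structure and the median-deviation isolation rule are assumptions made here for illustration.

```python
import numpy as np

def identify_params(u, y, order=2):
    """Least-squares fit of a simple FIR model y[k] ~ theta . u[k-order:k]
    from an input sequence u and output sequence y."""
    rows = [u[k - order:k] for k in range(order, len(u))]
    theta, *_ = np.linalg.lstsq(np.array(rows), y[order:], rcond=None)
    return theta

def isolate_anomaly(thetas):
    """Flag the UAV whose identified parameters deviate most from the
    fleet's element-wise median parameters."""
    T = np.array(thetas)
    dev = np.abs(T - np.median(T, axis=0)).sum(axis=1)
    return int(np.argmax(dev))
```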

  18. New data-driven method from 3D confocal microscopy for calculating phytoplankton cell biovolume.

    Science.gov (United States)

    Roselli, L; Paparella, F; Stanca, E; Basset, A

    2015-06-01

    Confocal laser scanner microscopy coupled with an image analysis system was used to directly determine the shape and calculate the biovolume of phytoplankton organisms by constructing 3D models of cells. The study was performed on Biceratium furca (Ehrenberg) Vanhoeffen, which is one of the most complex-shaped phytoplankton. Traditionally, biovolume is obtained from a standardized set of geometric models based on linear dimensions measured by light microscopy. However, especially in the case of complex-shaped cells, biovolume is affected by very large errors associated with the numerous manual measurements that this entails. We evaluate the accuracy of these traditional methods by comparing the results obtained using geometric models with direct biovolume measurement by image analysis. Our results show cell biovolume measurement based on decomposition into simple geometrical shapes can be highly inaccurate. Although we assume that the most accurate cell shape is obtained by 3D direct biovolume measurement, which is based on voxel counting, the intrinsic uncertainty of this method is explored and assessed. Finally, we implement a data-driven formula-based approach to the calculation of biovolume of this complex-shaped organism. On one hand, the model is obtained from 3D direct calculation. On the other hand, it is based on just two linear dimensions which can easily be measured by hand. This approach has already been used for investigating the complexities of morphology and for determining the 3D structure of cells. It could also represent a novel way to generalize scaling laws for biovolume calculation.
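Direct biovolume measurement by voxel counting reduces to multiplying the number of occupied voxels in the segmented 3D mask by the volume of one voxel. A sketch, with a synthetic sphere standing in for a segmented cell so the result can be checked against the geometric formula, is:

```python
import numpy as np

def biovolume(mask, voxel_size):
    """Direct biovolume: occupied-voxel count times voxel volume.
    mask: 3D boolean array (e.g. a thresholded confocal z-stack);
    voxel_size: (dz, dy, dx) edge lengths, e.g. in micrometres."""
    return mask.sum() * float(np.prod(voxel_size))
```

For a synthetic sphere the voxel count converges to 4/3·π·r³ as resolution increases, which is the sanity check one would apply before trusting the method on complex shapes like Biceratium furca.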

  19. Data-driven asthma endotypes defined from blood biomarker and gene expression data.

    Directory of Open Access Journals (Sweden)

    Barbara Jane George

    Full Text Available The diagnosis and treatment of childhood asthma is complicated by its mechanistically distinct subtypes (endotypes), driven by genetic susceptibility and modulating environmental factors. Clinical biomarkers and blood gene expression were collected from a stratified, cross-sectional study of asthmatic and non-asthmatic children from Detroit, MI. This study describes four distinct asthma endotypes identified via a purely data-driven method. Our method was specifically designed to integrate blood gene expression and clinical biomarkers in a way that provides new mechanistic insights regarding the different asthma endotypes. For example, we describe metabolic syndrome-induced systemic inflammation as an associated factor in three of the four asthma endotypes. Context provided by the clinical biomarker data was essential in interpreting gene expression patterns and identifying putative endotypes, which emphasizes the importance of integrated approaches when studying complex disease etiologies. The synthesized patterns of gene expression and clinical markers from our research may lead to the development of novel serum-based biomarker panels.

  20. Data driven models of the performance and repeatability of NIF high foot implosions

    Science.gov (United States)

    Gaffney, Jim; Casey, Dan; Callahan, Debbie; Hartouni, Ed; Ma, Tammy; Spears, Brian

    2015-11-01

    Recent high foot (HF) inertial confinement fusion (ICF) experiments performed at the National Ignition Facility (NIF) have consisted of enough laser shots that a data-driven analysis of capsule performance is feasible. In this work we use 20-30 individual implosions of similar design, spanning laser drive energies from 1.2 to 1.8 MJ, to quantify our current understanding of the behavior of HF ICF implosions. We develop a probabilistic model for the projected performance of a given implosion and use it to quantify uncertainties in predicted performance, including shot-to-shot variations and observation uncertainties. We investigate the statistical significance of the observed performance differences between different laser pulse shapes, ablator materials, and capsule designs. Finally, using a cross-validation technique, we demonstrate that 5-10 repeated shots of a similar design are required before real trends in the data can be distinguished from shot-to-shot variations. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-674957.

  1. A Dynamic Remote Sensing Data-Driven Approach for Oil Spill Simulation in the Sea

    Directory of Open Access Journals (Sweden)

    Jining Yan

    2015-05-01

    Full Text Available In view of the fact that oil spill remote sensing can only generate oil slick information at a specific time, and that traditional oil spill simulation models were not designed to deal with dynamic conditions, a dynamic data-driven application system (DDDAS) was introduced. The DDDAS entails both the ability to incorporate additional data into an executing application and, in reverse, the ability of applications to dynamically steer the measurement process. Based on the DDDAS, combining a remote sensing system that detects oil spills with a numerical simulation, an integrated data processing, analysis, forecasting and emergency response system was established. Once an oil spill accident occurs, the DDDAS-based oil spill model receives information about the oil slick extracted from the dynamic remote sensing data in the simulation. Through comparison, information fusion and feedback updates, continuous and more precise oil spill simulation results can be obtained. Then, the simulation results can provide help for disaster control and clean-up. The Penglai, Xingang and Suizhong oil spill results showed that our simulation model can increase prediction accuracy and reduce the error caused by empirical parameters in existing simulation systems. Therefore, the DDDAS-based detection and simulation system can effectively improve oil spill simulation and diffusion forecasting, as well as provide decision-making information and technical support for emergency responses to oil spills.

  2. A data-driven approach for evaluating multi-modal therapy in traumatic brain injury

    Science.gov (United States)

    Haefeli, Jenny; Ferguson, Adam R.; Bingham, Deborah; Orr, Adrienne; Won, Seok Joon; Lam, Tina I.; Shi, Jian; Hawley, Sarah; Liu, Jialing; Swanson, Raymond A.; Massa, Stephen M.

    2017-01-01

    Combination therapies targeting multiple recovery mechanisms have the potential for additive or synergistic effects, but experimental design and analyses of multimodal therapeutic trials are challenging. To address this problem, we developed a data-driven approach to integrate and analyze raw source data from separate pre-clinical studies and evaluated interactions between four treatments following traumatic brain injury. Histologic and behavioral outcomes were measured in 202 rats treated with combinations of an anti-inflammatory agent (minocycline), a neurotrophic agent (LM11A-31), and physical therapy consisting of assisted exercise with or without botulinum toxin-induced limb constraint. Data was curated and analyzed in a linked workflow involving non-linear principal component analysis followed by hypothesis testing with a linear mixed model. Results revealed significant benefits of the neurotrophic agent LM11A-31 on learning and memory outcomes after traumatic brain injury. In addition, modulations of LM11A-31 effects by co-administration of minocycline and by the type of physical therapy applied reached statistical significance. These results suggest a combinatorial effect of drug and physical therapy interventions that was not evident by univariate analysis. The study designs and analytic techniques applied here form a structured, unbiased, internally validated workflow that may be applied to other combinatorial studies, both in animals and humans. PMID:28205533

  3. A Data-Driven Reliability Estimation Approach for Phased-Mission Systems

    Directory of Open Access Journals (Sweden)

    Hua-Feng He

    2014-01-01

    Full Text Available We address the issues associated with reliability estimation for phased-mission systems (PMS) and present a novel data-driven approach that uses condition monitoring information and degradation data of such systems under dynamic operating scenarios. In this sense, this paper differs from existing methods, which consider only the static scenario without using real-time information and aim to estimate the reliability of a population rather than an individual. In the presented approach, to establish a linkage between the historical data and real-time information of an individual PMS, we adopt a stochastic filtering model for the phase duration and obtain an updated estimate of the mission time via Bayes' law at each phase. Meanwhile, the lifetime of the PMS is estimated from degradation data, which are modeled by an adaptive Brownian motion. As such, the mission reliability can be obtained in real time from the estimated distribution of the mission time in conjunction with the estimated lifetime distribution. We demonstrate the usefulness of the developed approach via a numerical example.
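
The degradation side of this approach can be caricatured with a drift-only Brownian motion model: the drift is estimated from the observed degradation history, and the mean remaining life is the remaining distance to the failure threshold divided by the drift. The readings and threshold below are invented, and the paper works with full distributions (inverse Gaussian first-passage times) rather than just means.

```python
def estimate_drift(degradation, dt=1.0):
    """Mean increment per unit time of the observed degradation path."""
    increments = [b - a for a, b in zip(degradation, degradation[1:])]
    return sum(increments) / (len(increments) * dt)

def mean_remaining_life(degradation, threshold, dt=1.0):
    """Mean first-passage time of a drift-only model to the failure threshold."""
    mu = estimate_drift(degradation, dt)
    return (threshold - degradation[-1]) / mu

obs = [0.0, 0.11, 0.19, 0.32, 0.40]  # invented degradation readings
print(round(mean_remaining_life(obs, threshold=1.0), 2))  # → 6.0
```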

  4. Dynamic model reduction using data-driven Loewner-framework applied to thermally morphing structures

    Science.gov (United States)

    Phoenix, Austin A.; Tarazaga, Pablo A.

    2017-05-01

    The work herein proposes the use of the data-driven Loewner framework for reduced-order modeling as applied to dynamic Finite Element Models (FEM) of thermally morphing structures. The Loewner-based modeling approach is computationally efficient and accurately constructs reduced models using analytical output data from a FEM. This paper details the two-step process proposed in the Loewner approach. First, a random-vibration FEM simulation is used as the input for the development of a Single Input Single Output (SISO) data-based dynamic Loewner state space model. Second, an SVD-based truncation is applied to the Loewner state space model, such that a minimal, dynamically representative state space model is achieved. For this second part, varying levels of reduction are generated and compared. The work herein can be extended to model generation using experimental measurements by replacing the FEM output data in the first step and following the same procedure. The method is demonstrated on two thermally morphing structures: a rigidly fixed hexapod in multiple geometric configurations and a low-mass anisotropic morphing boom. The paper details the method and identifies the benefits of the reduced-order modeling methodology.

  5. A priori data-driven multi-clustered reservoir generation algorithm for echo state network.

    Directory of Open Access Journals (Sweden)

    Xiumin Li

    Full Text Available Echo state networks (ESNs) with multi-clustered reservoir topology perform better in reservoir computing and robustness than those with random reservoir topology. However, these ESNs have a complex reservoir topology, which leads to difficulties in reservoir generation. This study focuses on the reservoir generation problem when an ESN is used in environments with sufficient a priori data available. Accordingly, an a priori data-driven multi-cluster reservoir generation algorithm is proposed. The a priori data in the proposed algorithm are used to evaluate reservoirs by calculating the precision and standard deviation of ESNs. The reservoirs are produced using the clustering method; only a reservoir with a better evaluation performance takes the place of a previous one. The final reservoir is obtained when its evaluation score reaches the preset requirement. The prediction experiment results obtained using the Mackey-Glass chaotic time series show that the proposed reservoir generation algorithm provides ESNs with extra prediction precision and increases the structural complexity of the network. Further experiments also reveal the appropriate values of the number of clusters and time window size to obtain optimal performance. The information entropy of the reservoir reaches its maximum when the ESN gains the greatest precision.

  6. A data-driven method to characterize turbulence-caused uncertainty in wind power generation

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Jie; Jain, Rishabh; Hodge, Bri-Mathias

    2016-10-01

    A data-driven methodology is developed to analyze how ambient and wake turbulence affect the power generation of wind turbine(s). Using supervisory control and data acquisition (SCADA) data from a wind plant, we select two sets of wind velocity and power data for turbines on the edge of the plant that resemble (i) an out-of-wake scenario and (ii) an in-wake scenario. For each set of data, two surrogate models are developed to represent the turbine(s) power generation as a function of (i) the wind speed and (ii) the wind speed and turbulence intensity. Three types of uncertainties in turbine(s) power generation are investigated: (i) the uncertainty in power generation with respect to the reported power curve; (ii) the uncertainty in power generation with respect to the estimated power response that accounts for only mean wind speed; and (iii) the uncertainty in power generation with respect to the estimated power response that accounts for both mean wind speed and turbulence intensity. Results show that (i) the turbine(s) generally produce more power under the in-wake scenario than under the out-of-wake scenario with the same wind speed; and (ii) there is relatively more uncertainty in the power generation under the in-wake scenario than under the out-of-wake scenario.
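
One simple, hypothetical way to build the kind of surrogate described above is to bin historical SCADA samples by mean wind speed and turbulence intensity and predict with the bin mean; the paper's actual surrogate models are more sophisticated, and the samples below are fabricated.

```python
from collections import defaultdict

def fit_surrogate(samples, ws_bin=1.0, ti_bin=0.05):
    """Bin (wind speed, turbulence intensity, power) samples; store bin means."""
    sums = defaultdict(lambda: [0.0, 0])
    for ws, ti, power in samples:
        key = (int(ws / ws_bin), int(ti / ti_bin))
        sums[key][0] += power
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

def predict(model, ws, ti, ws_bin=1.0, ti_bin=0.05):
    """Look up the mean power of the matching bin (None if the bin is unseen)."""
    return model.get((int(ws / ws_bin), int(ti / ti_bin)))

# fabricated SCADA-like samples: (wind speed m/s, turbulence intensity, power kW)
data = [(8.2, 0.08, 900.0), (8.7, 0.09, 940.0), (8.5, 0.16, 870.0)]
model = fit_surrogate(data)
print(predict(model, 8.4, 0.07))  # low-TI bin mean → 920.0
```

Comparing a wind-speed-only surrogate against a (wind speed, TI) surrogate on the same data is one way to expose the turbulence-caused component of the power uncertainty.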

  7. Cloudweaver: Adaptive and Data-Driven Workload Manager for Generic Clouds

    Science.gov (United States)

    Li, Rui; Chen, Lei; Li, Wen-Syan

    Cloud computing denotes the latest trend in application development for parallel computing on massive data volumes. It relies on clouds of servers to handle tasks that used to be managed by an individual server. With cloud computing, software vendors can provide business intelligence and data analytic services for internet-scale data sets. Many open source projects, such as Hadoop, offer various software components that are essential for building a cloud infrastructure. Current Hadoop (and many other frameworks) requires users to configure cloud infrastructures via programs and APIs, and such configuration is fixed during runtime. In this chapter, we propose a workload manager (WLM), called CloudWeaver, which provides automated configuration of a cloud infrastructure for runtime execution. The workload management is data-driven and can adapt to the dynamic nature of operator throughput during different execution phases. CloudWeaver works for a single job and for a workload consisting of multiple jobs running concurrently, aiming at maximum throughput using a minimum set of processors.

  8. Data-Driven Simulation-Enhanced Optimization of People-Based Print Production Service

    Science.gov (United States)

    Rai, Sudhendu

    This paper describes a systematic six-step data-driven simulation-based methodology for optimizing people-based service systems on a large distributed scale that exhibit high variety and variability. The methodology is exemplified through its application within the printing services industry, where it has been successfully deployed by Xerox Corporation across small, mid-sized and large print shops, generating over 250 million in profits across the customer value chain. Each step of the methodology is described in detail: co-development and testing of innovative concepts in partnership with customers; development of software and hardware tools to implement those concepts; establishment of work processes and practices for customer engagement and service implementation; creation of training and infrastructure for large-scale deployment; integration of the innovative offering within the framework of existing corporate offerings; and, lastly, monitoring and deployment of financial and operational metrics for estimating the return on investment and continually renewing the offering.

  9. A novel data-driven approach to model error estimation in Data Assimilation

    Science.gov (United States)

    Pathiraja, Sahani; Moradkhani, Hamid; Marshall, Lucy; Sharma, Ashish

    2016-04-01

    Error characterisation is a fundamental component of Data Assimilation (DA) studies. Effectively describing model error statistics has been a challenging area, with many traditional methods requiring some level of subjectivity (for instance in defining the error covariance structure). Recent advances have focused on removing the need for tuning of error parameters, although there are still some outstanding issues. Many methods focus only on the first and second moments, and rely on assuming multivariate Gaussian statistics. We propose a non-parametric, data-driven framework to estimate the full distributional form of model error, i.e. the transition density p(x_t | x_t-1). All sources of uncertainty associated with the model simulations are considered, without needing to assign error characteristics/devise stochastic perturbations for individual components of model uncertainty (e.g. input, parameter and structural). A training period is used to derive the error distribution of observed variables, conditioned on (potentially hidden) states. Errors in hidden states are estimated from the conditional distribution of observed variables using non-linear optimization. The framework is discussed in detail, and an application to a hydrologic case study with hidden states for one-day-ahead streamflow prediction is presented. Results demonstrate improved predictions and more realistic uncertainty bounds compared to a standard tuning approach.

  10. The Cannon 2: A data-driven model of stellar spectra for detailed chemical abundance analyses

    CERN Document Server

    Casey, Andrew R; Ness, Melissa; Rix, Hans-Walter; Ho, Anna Q Y; Gilmore, Gerry

    2016-01-01

    We have shown that data-driven models are effective for inferring physical attributes of stars (labels; Teff, logg, [M/H]) from spectra, even when the signal-to-noise ratio is low. Here we explore whether this is possible when the dimensionality of the label space is large (Teff, logg, and 15 abundances: C, N, O, Na, Mg, Al, Si, S, K, Ca, Ti, V, Mn, Fe, Ni) and the model is non-linear in its response to abundance and parameter changes. We adopt ideas from compressed sensing to limit overall model complexity while retaining model freedom. The model is trained with a set of 12,681 red-giant stars with high signal-to-noise spectroscopic observations and stellar parameters and abundances taken from the APOGEE Survey. We find that we can successfully train and use a model with 17 stellar labels. Validation shows that the model does a good job of inferring all 17 labels (typical abundance precision is 0.04 dex), even when we degrade the signal-to-noise by discarding ~50% of the observing time. The model dependencie...

  11. The Cannon: A data-driven approach to stellar label determination

    CERN Document Server

    Ness, Melissa; Rix, Hans-Walter; Ho, Anna; Zasowski, Gail

    2015-01-01

    New spectroscopic surveys offer the promise of consistent stellar parameters and abundances ('stellar labels') for hundreds of thousands of stars in the Milky Way: this poses a formidable spectral modeling challenge. In many cases, there is a sub-set of reference objects for which the stellar labels are known with high(er) fidelity. We take advantage of this with The Cannon, a new data-driven approach for determining stellar labels from spectroscopic data. The Cannon learns from the 'known' labels of reference stars how the continuum-normalized spectra depend on these labels by fitting a flexible model at each wavelength; then, The Cannon uses this model to derive labels for the remaining survey stars. We illustrate The Cannon by training the model on only 543 stars in 19 clusters as reference objects, with Teff, log g and [Fe/H] as the labels, and then applying it to the spectra of 56,000 stars from APOGEE DR10. The Cannon is very accurate. Its stellar labels compare well to the stars for which APOGEE pipeli...
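
The Cannon's train-then-infer structure can be sketched with a single label and a linear model per pixel; the real model is quadratic in several labels and weights pixels by their uncertainties, and the spectra below are synthetic.

```python
def train(spectra, labels):
    """Fit flux = a + b*label at each pixel across the reference stars."""
    coeffs = []
    for pix in zip(*spectra):
        n = len(labels)
        lbar = sum(labels) / n
        fbar = sum(pix) / n
        num = sum((l - lbar) * (f - fbar) for l, f in zip(labels, pix))
        den = sum((l - lbar) ** 2 for l in labels)
        b = num / den
        coeffs.append((fbar - b * lbar, b))
    return coeffs

def infer(coeffs, spectrum):
    """Closed-form least-squares inversion of the trained model for one star."""
    num = sum(b * (f - a) for (a, b), f in zip(coeffs, spectrum))
    den = sum(b * b for _, b in coeffs)
    return num / den

refs = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # two-pixel spectra of 3 reference stars
labels = [1.0, 2.0, 3.0]                     # their known stellar label
model = train(refs, labels)
print(infer(model, [2.5, 5.0]))  # → 2.5
```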

  12. A Data-Driven Air Transportation Delay Propagation Model Using Epidemic Process Models

    Directory of Open Access Journals (Sweden)

    B. Baspinar

    2016-01-01

    Full Text Available In air transport network management, in addition to defining the performance behavior of the system’s components, identification of their interaction dynamics is a delicate issue in both the strategic and tactical decision-making process, so as to decide which elements of the system are “controlled” and how. This paper introduces a novel delay propagation model utilizing an epidemic spreading process, which enables the definition of novel performance indicators and interaction rates of the elements of the air transportation network. In order to understand the behavior of delay propagation over the network at different levels, we have constructed two different data-driven epidemic models approximating the dynamics of the system: (a) a flight-based epidemic model and (b) an airport-based epidemic model. The flight-based epidemic model, utilizing the SIS epidemic model, focuses on individual flights, where each flight can be in a susceptible or infected state. The airport-centric epidemic model, in addition to the flight-to-flight interactions, allows us to define the collective behavior of the airports, which are modeled as metapopulations. In the network model construction, we have utilized historical flight-track data of Europe and performed analysis for certain days involving certain disturbances. Through this effort, we have validated the proposed delay propagation models under disruptive events.
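
A toy version of the flight-based SIS idea: each flight is either susceptible (on time) or infected (delayed); delay propagates along connections with rate beta and is absorbed with rate gamma. The network, rates and flight names are invented.

```python
import random

def sis_step(delayed, neighbors, beta, gamma, rng):
    """One discrete SIS update over the flight connection graph."""
    nxt = set()
    for f in delayed:
        if rng.random() > gamma:        # delay not yet absorbed
            nxt.add(f)
        for g in neighbors.get(f, []):  # propagate to connected flights
            if rng.random() < beta:
                nxt.add(g)
    return nxt

network = {"F1": ["F2", "F3"], "F2": ["F4"], "F3": ["F4"]}
rng = random.Random(0)
state = {"F1"}
for _ in range(3):
    state = sis_step(state, network, beta=0.6, gamma=0.3, rng=rng)
print(sorted(state))
```

Averaging many such stochastic runs would give per-flight delay probabilities, the kind of quantity the paper's performance indicators are built from.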

  13. Data-Driven Baseline Estimation of Residential Buildings for Demand Response

    Directory of Open Access Journals (Sweden)

    Saehong Park

    2015-09-01

    Full Text Available The advent of advanced metering infrastructure (AMI) generates a large volume of data related to energy service. This paper exploits a data mining approach for customer baseline load (CBL) estimation in demand response (DR) management. CBL plays a significant role in the measurement and verification process, which quantifies the amount of demand reduction and authenticates the performance. The proposed data-driven baseline modeling is based on an unsupervised learning technique. Specifically, we leverage both the self-organizing map (SOM) and K-means clustering for accurate estimation. This two-level approach efficiently reduces the large data set into representative weight vectors in the SOM, and then these weight vectors are clustered by K-means to find the load pattern that would be similar to the potential load pattern of the DR event day. To verify the proposed method, we conduct nationwide-scale experiments in which three major cities’ residential consumption is monitored by smart meters. Our evaluation compares the proposed solution with various types of day-matching techniques, showing that our approach outperforms the existing methods by up to a 68.5% lower error rate.
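
The clustering stage can be illustrated with a bare-bones k-means over daily load profiles, using the centroid closest to the event-day morning load as the baseline; the SOM reduction step is omitted, and the profiles are synthetic.

```python
def dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, iters=10):
    """Tiny Lloyd's algorithm; empty clusters keep their previous centroid."""
    centroids = profiles[:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in profiles:
            groups[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def baseline(profiles, k, event_morning):
    """Pick the centroid whose morning segment best matches the event day."""
    cents = kmeans(profiles, k)
    return min(cents, key=lambda c: dist(c[:len(event_morning)], event_morning))

# synthetic daily load profiles (two low-load days, two high-load days)
days = [[1, 1, 2], [1, 1, 2], [5, 5, 6], [5, 5, 6]]
print(baseline(days, 2, [1.1, 0.9]))  # → [1.0, 1.0, 2.0]
```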

  14. Data-driven inline optimization of the manufacturing process of car body parts

    Science.gov (United States)

    Purr, S.; Wendt, A.; Meinhardt, J.; Moelzl, K.; Werner, A.; Hagenah, H.; Merklein, M.

    2016-11-01

    The manufacturing process of car body parts needs to be adaptable during production because of fluctuating variables; finding the most suitable settings is often expensive. The cause-effect relation between variables and process results is currently unknown; thus, any measure taken to adjust the process is necessarily subjective and dependent on operator experience. To investigate the correlations involved, a data mining system that can detect influences and determine the quality of resulting parts is integrated into the series process. The collected data is used to analyze causes, predict defects, and optimize the overall process. In this paper, a data-driven method is proposed for the inline optimization of the manufacturing process of car body parts. The calculation of suitable settings to produce good parts is based on measurements of influencing variables, such as the characteristics of blanks. First, the available data are presented, and in the event of quality issues, current procedures are investigated. Thereafter, data mining techniques are applied to identify models that link occurring fluctuations and appropriate measures to adapt the process so that it addresses such fluctuations. Consequently, a method is derived for providing objective information on appropriate process parameters.

  15. Quantitative Data-driven Utilization of Hematologic Labs Following Lumbar Fusion.

    Science.gov (United States)

    Yew, Andrew Y; Hoffman, Haydn; Li, Charles; McBride, Duncan Q; Holly, Langston T; Lu, Daniel C

    2015-05-01

    Retrospective case series. Large national inpatient databases estimate that approximately 200,000 lumbar fusions are performed annually in the United States alone. It is common for surgeons to routinely order postoperative hematologic studies to rule out postoperative anemia despite a paucity of data to support routine laboratory utilization. To describe quantitative criteria to guide postoperative utilization of hematologic laboratory assessments. A retrospective analysis of 490 consecutive lumbar fusion procedures performed at a single institution by 3 spine surgeons was conducted. Inclusion criteria included instrumented and noninstrumented lumbar fusions performed for any etiology. Data were acquired on preoperative and postoperative hematocrit, platelets, and international normalized ratio as well as age, sex, number of levels undergoing operation, indication for surgery, and intraoperative blood loss. Multivariate logistic regression was performed to determine correlation to postoperative transfusion requirement. A total of 490 patients undergoing lumbar fusion were identified. Twenty-five patients (5.1%) required postoperative transfusion. No patients required readmission for anemia or transfusion. Multivariate logistic regression analysis demonstrated that reduced preoperative hematocrit and increased intraoperative blood loss were independent predictors of postoperative transfusion requirement. Intraoperative blood loss >1000 mL had an odds ratio of 8.9 (P=0.013). These findings provide quantitative preoperative and intraoperative criteria to guide data-driven utilization of postoperative hematologic studies following lumbar fusion.

  16. Data-driven spatially-adaptive metric adjustment for visual tracking.

    Science.gov (United States)

    Jiang, Nan; Liu, Wenyu

    2014-04-01

    Matching visual appearances of the target over consecutive video frames is a fundamental yet challenging task in visual tracking. Its performance largely depends on the distance metric that determines the quality of visual matching. Rather than using fixed and predefined metric, recent attempts of integrating metric learning-based trackers have shown more robust and promising results, as the learned metric can be more discriminative. In general, these global metric adjustment methods are computationally demanding in real-time visual tracking tasks, and they tend to underfit the data when the target exhibits dynamic appearance variation. This paper presents a nonparametric data-driven local metric adjustment method. The proposed method finds a spatially adaptive metric that exhibits different properties at different locations in the feature space, due to the differences of the data distribution in a local neighborhood. It minimizes the deviation of the empirical misclassification probability to obtain the optimal metric such that the asymptotic error as if using an infinite set of training samples can be approximated. Moreover, by taking the data local distribution into consideration, it is spatially adaptive. Integrating this new local metric learning method into target tracking leads to efficient and robust tracking performance. Extensive experiments have demonstrated the superiority and effectiveness of the proposed tracking method in various tracking scenarios.

  17. Data-driven modeling based on volterra series for multidimensional blast furnace system.

    Science.gov (United States)

    Gao, Chuanhou; Jian, Ling; Liu, Xueyi; Chen, Jiming; Sun, Youxian

    2011-12-01

    The multidimensional blast furnace system is one of the most complex industrial systems and, as such, there are still many unsolved theoretical and experimental difficulties, such as silicon prediction and blast furnace automation. For this reason, this paper is concerned with developing data-driven models based on the Volterra series for this complex system. Three kinds of low-order Volterra filters are designed to predict the hot metal silicon content collected from a pint-sized blast furnace, in which a sliding-window technique is used to update the filter kernels in a timely manner. The predictive results indicate that the linear Volterra predictor can describe the evolution of the studied silicon sequence effectively, with a high percentage of hitting the target, very low root mean square error and a satisfactory confidence level for the reliability of future predictions. These advantages and the low computational complexity reveal that the sliding-window linear Volterra filter is highly promising for the multidimensional blast furnace system. Finally, the limitations of the constructed Volterra models are analyzed and possible directions for future investigation are pointed out.
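
A linear (first-order) Volterra filter is simply an FIR predictor. The sketch below re-estimates two kernels with a closed-form least-squares solve, and the sliding-window update re-fits on only the latest samples; the test series is synthetic rather than silicon-content data.

```python
def fit_order2(window):
    """Least-squares kernels for x[t] = h1*x[t-1] + h2*x[t-2] (2x2 solve)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for i in range(2, len(window)):
        x1, x2, y = window[i - 1], window[i - 2], window[i]
        a11 += x1 * x1; a12 += x1 * x2; a22 += x2 * x2
        b1 += x1 * y; b2 += x2 * y
    det = a11 * a22 - a12 * a12
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

def predict_next(series, window=20):
    """Sliding-window update: kernels are re-fit on the latest samples only."""
    h1, h2 = fit_order2(series[-window:])
    return h1 * series[-1] + h2 * series[-2]

# synthetic series following x[t] = 1.2*x[t-1] - 1.0*x[t-2]
xs = [1.0, 0.8]
for _ in range(30):
    xs.append(1.2 * xs[-1] - 1.0 * xs[-2])

h1, h2 = fit_order2(xs)
print(round(h1, 3), round(h2, 3))  # → 1.2 -1.0
```

Second- and third-order Volterra filters add product terms such as x[t-1]*x[t-2] as extra regressors; the normal-equation structure is the same, just larger.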

  18. Data-driven research: open data opportunities for growing knowledge, and ethical issues that arise

    Directory of Open Access Journals (Sweden)

    Aleksandra K Krotoski

    2012-03-01

    Full Text Available The Open Data Initiative in the UK offers incredible opportunities for researchers who seek to gain insight from the wealth of public and institutional data that is increasingly available from government sources – like NHS prescription and GP referral information – or the information we freely offer online. Coupled with digital technologies that can help teams generate connections and collaborations, these data sets can support large-scale innovation and insight. However, by looking at a comparable explosion in data-driven journalism, this article hopes to highlight some of the ethical questions that may arise from big data. The popularity of the social networking service Twitter to share information during the riots in London in August 2011 produced a real-time record of sense-making of enormous interest to academics, reporters and to Twitter users themselves; however, when analysed and published, academic and journalistic interpretations of aggregate content was transformed and individualized, with potential implications for a user-base that was unaware it was being observed. Similar issues arise in academic research with human subjects. Here, the questions of reflexivity in data design and research ethics are considered through a popular media frame.

  19. Data-driven dissection of emission-line regions in Seyfert galaxies

    CERN Document Server

    Villarroel, Beatriz

    2016-01-01

    Indirectly resolving the line-emitting gas regions in distant Active Galactic Nuclei (AGN) requires both high-resolution photometry and spectroscopy (i.e. through reverberation mapping). Emission in AGN originates on widely different scales; the broad-line region (BLR) has a typical radius of less than a few parsec, while the narrow-line region (NLR) extends out to hundreds of parsecs. But emission also appears on large scales from heated nebulae in the host galaxies (tenths of a kpc). We propose a novel, data-driven method based on correlations between emission-line fluxes to identify which of the emission lines are produced in the same kind of emission-line regions. We test the method on Seyfert galaxies from the Sloan Digital Sky Survey (SDSS) Data Release 7 (DR7) and the Galaxy Zoo project. We demonstrate the usefulness of the method on Seyfert-1 and Seyfert-2 objects, showing similar narrow-line regions (NLRs). Preliminary results from comparing Seyfert-2s in spiral and elliptical galaxy hosts suggest that the presenc...

  20. Deriving Flood-Mediated Connectivity between River Channels and Floodplains: Data-Driven Approaches

    Science.gov (United States)

    Zhao, Tongtiegang; Shao, Quanxi; Zhang, Yongyong

    2017-03-01

    The flood-mediated connectivity between river channels and floodplains plays a fundamental role in flood hazard mapping and exerts profound ecological effects. The classic nearest neighbor search (NNS) fails to derive this connectivity because of spatial heterogeneity and continuity. We develop two novel data-driven connectivity-deriving approaches, namely, progressive nearest neighbor search (PNNS) and progressive iterative nearest neighbor search (PiNNS). These approaches are illustrated through a case study in Northern Australia. First, PNNS and PiNNS are employed to identify flood pathways on floodplains through forward tracking. That is, a progressive search is performed to associate newly inundated cells in each time step with previously inundated cells. In particular, iterations in PiNNS ensure that the connectivity is continuous: the connection between any two cells along a pathway is built through intermediate inundated cells. Second, inundated floodplain cells are collectively connected to river channel cells through backward tracing. Certain river channel sections are identified as connecting to a large number of inundated floodplain cells; that is, the floodwater from these sections causes widespread floodplain inundation. Our proposed approaches take advantage of spatial-temporal data. They can be applied to derive connectivity from hydrodynamic and remote sensing data and assist in river basin planning and management.
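
The forward-tracking idea can be sketched on a toy grid: each newly inundated cell is linked to an already-inundated neighbor, so the resulting pathways stay spatially continuous. The real PNNS/PiNNS handle heterogeneity and iterate to guarantee continuity; this sketch uses plain 4-connectivity and an invented inundation sequence.

```python
def link_new_cells(inundated, new_cells):
    """Link each newly inundated cell to an adjacent, already-inundated cell."""
    links = {}
    for (r, c) in new_cells:
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if nb in inundated:
                links[(r, c)] = nb  # the cell the floodwater arrived from
                break
    return links

# toy inundation sequence: sets of cells newly flooded at each time step
steps = [{(0, 0)}, {(0, 1), (1, 0)}, {(1, 1)}]
inundated, parents = set(steps[0]), {}
for new in steps[1:]:
    parents.update(link_new_cells(inundated, new))
    inundated |= new
print(parents[(1, 1)])  # → (0, 1)
```

Following `parents` links backwards from any floodplain cell traces its pathway to the originating channel cell, which is the backward-tracing step the abstract describes.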

  1. Data-driven analysis of functional brain interactions during free listening to music and speech.

    Science.gov (United States)

    Fang, Jun; Hu, Xintao; Han, Junwei; Jiang, Xi; Zhu, Dajiang; Guo, Lei; Liu, Tianming

    2015-06-01

    Natural stimulus functional magnetic resonance imaging (N-fMRI), such as fMRI acquired while participants watch video streams or listen to audio streams, has been increasingly used to investigate functional mechanisms of the human brain in recent years. One of the fundamental challenges in functional brain mapping based on N-fMRI is to model the brain's functional responses to continuous, naturalistic and dynamic natural stimuli. To address this challenge, in this paper we present a data-driven approach to exploring functional interactions in the human brain during free listening to music and speech streams. Specifically, we model the brain responses using N-fMRI by measuring the functional interactions on large-scale brain networks with intrinsically established structural correspondence, and perform music and speech classification tasks to guide the systematic identification of consistent and discriminative functional interactions when multiple subjects were listening to music and speech in multiple categories. The underlying premise is that the functional interactions derived from N-fMRI data of multiple subjects should exhibit both consistency and discriminability. Our experimental results show that a variety of brain systems, including attention, memory, auditory/language, emotion, and action networks, are among the most relevant brain systems involved in differentiating classical music, pop music and speech. Our study provides an alternative approach to investigating the human brain's mechanisms in the comprehension of complex natural music and speech.

  2. Simulating Flying Insects Using Dynamics and Data-Driven Noise Modeling to Generate Diverse Collective Behaviors.

    Science.gov (United States)

    Ren, Jiaping; Wang, Xinjie; Jin, Xiaogang; Manocha, Dinesh

    2016-01-01

    We present a biologically plausible dynamics model to simulate swarms of flying insects. Our formulation, which is based on biological conclusions and experimental observations, is designed to simulate large insect swarms of varying densities. We use a force-based model that captures different interactions between the insects and the environment and computes collision-free trajectories for each individual insect. Furthermore, we model noise as a constructive force at the collective level and present a technique to generate noise-induced insect movements in a large swarm that are similar to those observed in real-world trajectories. We use a data-driven formulation that is based on pre-recorded insect trajectories. We also present a novel evaluation metric and a statistical validation approach that takes into account various characteristics of insect motions. In practice, the combination of a curl-noise function with our dynamics model is used to generate realistic swarm simulations and emergent behaviors. We highlight its performance for simulating large flying swarms of midges, fruit flies, locusts and moths and demonstrate many collective behaviors, including aggregation, migration, phase transition, and escape responses.

  3. A Data-Driven Diagnostic Framework for Wind Turbine Structures: A Holistic Approach

    Directory of Open Access Journals (Sweden)

    Simona Bogoevska

    2017-03-01

    The complex dynamics of operational wind turbine (WT) structures challenges the applicability of existing structural health monitoring (SHM) strategies for condition assessment. At the center of Europe's renewable energy strategic planning, WT systems call for the implementation of strategies that can describe WT behavior across its complete operational spectrum. The framework proposed in this paper relies on the symbiotic treatment of acting environmental/operational variables and the monitored vibration response of the structure. The approach aims at accurate simulation of the temporal variability characterizing the WT dynamics, and subsequently at tracking the evolution of this variability over a longer-term horizon. The bi-component analysis tool is applied to long-term data collected as part of continuous monitoring campaigns on two actual operating WT structures located at different sites in Germany. The obtained data-driven structural models verify the potential of the proposed strategy for the development of an automated SHM diagnostic tool.

  4. Using drug exposure for predicting drug resistance - A data-driven genotypic interpretation tool.

    Science.gov (United States)

    Pironti, Alejandro; Pfeifer, Nico; Walter, Hauke; Jensen, Björn-Erik O; Zazzi, Maurizio; Gomes, Perpétua; Kaiser, Rolf; Lengauer, Thomas

    2017-01-01

    Antiretroviral treatment history and past HIV-1 genotypes have been shown to be useful predictors for the success of antiretroviral therapy. However, this information may be unavailable or inaccurate, particularly for patients with multiple treatment lines often attending different clinics. We trained statistical models for predicting drug exposure from the current HIV-1 genotype. These models were trained on 63,742 HIV-1 nucleotide sequences derived from patients with known therapeutic history, and on 6,836 genotype-phenotype pairs (GPPs). The mean performance regarding prediction of drug exposure on two test sets was 0.78 and 0.76 (ROC-AUC), respectively. The mean correlation to phenotypic resistance in GPPs was 0.51 (PhenoSense) and 0.46 (Antivirogram). Performance on prediction of therapy success on two test sets based on genetic susceptibility scores was 0.71 and 0.63 (ROC-AUC), respectively. Compared to geno2pheno[resistance], our novel models display similar or superior performance. Our models are freely available on the internet at www.geno2pheno.org. They can be used for inferring which drug compounds have previously been used by an HIV-1-infected patient, for predicting drug resistance, and for selecting an optimal antiretroviral therapy. Our data-driven models can be periodically retrained without expert intervention as clinical HIV-1 databases are updated, and therefore reduce our dependency on hard-to-obtain GPPs.

  5. Combining knowledge- and data-driven methods for de-identification of clinical narratives.

    Science.gov (United States)

    Dehghan, Azad; Kovacevic, Aleksandar; Karystianis, George; Keane, John A; Nenadic, Goran

    2015-12-01

    The recent promise of large-scale access to unstructured clinical data from electronic health records has revitalized interest in the automated de-identification of clinical notes, which includes the identification of mentions of Protected Health Information (PHI). We describe the methods developed and evaluated as part of the i2b2/UTHealth 2014 challenge to identify PHI defined by 25 entity types in longitudinal clinical narratives. Our approach combines knowledge-driven (dictionaries and rules) and data-driven (machine learning) methods with a large range of features to address de-identification of specific named entities. In addition, we have devised a two-pass recognition approach that creates a patient-specific run-time dictionary from the PHI entities identified in the first pass with high confidence, which is then used in the second pass to identify mentions that lack specific clues. The proposed method achieved overall micro F1-measures of 91% on strict and 95% on token-level evaluation on the test dataset (514 narratives). While most PHI entities can be reliably identified, mentions of Organizations and Professions proved particularly challenging. Still, the overall results suggest that automated text mining methods can be used to reliably process clinical notes to identify personal information, providing a crucial step in large-scale de-identification of unstructured data for further clinical and epidemiological studies.
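The two-pass idea can be sketched in a few lines, with simple regular expressions standing in for the paper's dictionary and machine-learning first pass. The patterns, note texts, and the `two_pass_deid` helper are illustrative assumptions, not the authors' system:

```python
import re

def two_pass_deid(notes, first_pass_patterns):
    """Two-pass PHI recognition sketch.

    Pass 1 finds high-confidence PHI spans (here: regexes standing in for
    the paper's rule + ML ensemble) and collects the matched strings into
    a patient-specific run-time dictionary.  Pass 2 re-scans the notes
    for those exact strings, catching mentions that lack specific clues.
    """
    runtime_dict = set()
    for note in notes:
        for pat in first_pass_patterns:
            runtime_dict.update(m.group() for m in re.finditer(pat, note))
    hits = []
    for i, note in enumerate(notes):
        for phi_string in runtime_dict:
            for m in re.finditer(re.escape(phi_string), note):
                hits.append((i, m.start(), m.end(), phi_string))
    return sorted(set(hits))

notes = ["Seen by Dr. Alice Smith today.", "Alice Smith will follow up."]
# The title "Dr." acts as the high-confidence clue in pass 1; pass 2 then
# also flags the clueless mention in the second note.
phi = two_pass_deid(notes, [r"(?<=Dr\. )[A-Z][a-z]+ [A-Z][a-z]+"])
```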

  6. Data-driven model-independent searches for long-lived particles at the LHC

    Science.gov (United States)

    Coccaro, Andrea; Curtin, David; Lubatti, H. J.; Russell, Heather; Shelton, Jessie

    2016-12-01

    Neutral long-lived particles (LLPs) are highly motivated by many beyond the Standard Model scenarios, such as theories of supersymmetry, baryogenesis, and neutral naturalness, and present both tremendous discovery opportunities and experimental challenges for the LHC. A major bottleneck for current LLP searches is the prediction of Standard Model backgrounds, which are often impossible to simulate accurately. In this paper, we propose a general strategy for obtaining differential, data-driven background estimates in LLP searches, thereby notably extending the range of LLP masses and lifetimes that can be discovered at the LHC. We focus on LLPs decaying in the ATLAS muon system, where triggers providing both signal and control samples are available at LHC run 2. While many existing searches require two displaced decays, a detailed knowledge of backgrounds will allow for very inclusive searches that require just one detected LLP decay. As we demonstrate for the h → XX signal model of LLP pair production in exotic Higgs decays, this results in dramatic sensitivity improvements for proper lifetimes ≳ 10 m. In theories of neutral naturalness, this extends reach to glueball masses far below the bb̄ threshold. Our strategy readily generalizes to other signal models and other detector subsystems. This framework therefore lends itself to the development of a systematic, model-independent LLP search program, in analogy to the highly successful simplified-model framework of prompt searches.

  7. Parcellation of fMRI Datasets with ICA and PLS - A Data-Driven Approach

    CERN Document Server

    Ji, Yongnan; Aickelin, Uwe; Pitiot, Alain

    2010-01-01

    Inter-subject parcellation of functional Magnetic Resonance Imaging (fMRI) data based on a standard General Linear Model (GLM) and spectral clustering was recently proposed as a means to alleviate the issues associated with spatial normalization in fMRI. However, for all its appeal, a GLM-based parcellation approach introduces its own biases, in the form of a priori knowledge about the shape of the Hemodynamic Response Function (HRF) and task-related signal changes, or about the subject's behaviour during the task. In this paper, we introduce a data-driven version of the spectral clustering parcellation, based on Independent Component Analysis (ICA) and Partial Least Squares (PLS) instead of the GLM. First, a number of independent components are automatically selected. Seed voxels are then obtained from the associated ICA maps, and we compute the PLS latent variables between the fMRI signal of the seed voxels (which covers regional variations of the HRF) and the principal components of the signal across all voxels. F...

  8. Assessment of cardiovascular risk based on a data-driven knowledge discovery approach.

    Science.gov (United States)

    Mendes, D; Paredes, S; Rocha, T; Carvalho, P; Henriques, J; Cabiddu, R; Morais, J

    2015-01-01

    The cardioRisk project addresses the development of personalized risk assessment tools for patients admitted to hospital with acute myocardial infarction. Although models are available that assess the short-term risk of death/new events for such patients, these models were established in circumstances that do not take into account present clinical interventions and, in some cases, the risk factors they use are not easily available in clinical practice. The integration of the existing risk tools (applied in clinicians' daily practice) with data-driven knowledge discovery mechanisms based on data routinely collected during hospitalizations will be a breakthrough in overcoming some of these difficulties. In this context, the development of simple and interpretable models (based on recent datasets) will unquestionably facilitate and introduce confidence in this integration process. In this work, a simple and interpretable model based on a real dataset is proposed. It consists of a decision tree structure that uses a reduced set of six binary risk factors. The validation is performed using a recent dataset provided by the Portuguese Society of Cardiology (11,113 patients), which originally comprised 77 risk factors. A sensitivity, specificity and accuracy of, respectively, 80.42%, 77.25% and 78.80% were achieved, showing the effectiveness of the approach.
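The reported figures follow from the standard confusion-matrix definitions; as a reminder, a small helper (not from the paper) that computes all three from binary labels:

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary classification,
    the three figures reported for the cardioRisk decision-tree model."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)      # true-positive rate
    specificity = tn / (tn + fp)      # true-negative rate
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy
```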

  9. Data-driven quantification of the robustness and sensitivity of cell signaling networks

    Science.gov (United States)

    Mukherjee, Sayak; Seok, Sang-Cheol; Vieland, Veronica J.; Das, Jayajit

    2013-12-01

    The robustness and sensitivity of responses generated by cell signaling networks have been associated with the survival and evolvability of organisms. However, existing methods for analyzing the robustness and sensitivity of signaling networks either ignore the experimentally observed cell-to-cell variations of protein abundances and cell functions or contain ad hoc assumptions. We propose and apply a data-driven, maximum entropy based method to quantify the robustness and sensitivity of the Escherichia coli (E. coli) chemotaxis signaling network. Our analysis correctly rank-orders different models of E. coli chemotaxis based on their robustness and suggests that parameters regulating cell signaling are evolutionarily selected to vary in individual cells according to their abilities to perturb cell functions. Furthermore, predictions from our approach regarding the distribution of protein abundances and the properties of chemotactic responses in individual cells, based on cell-population-averaged data, are in excellent agreement with their experimental counterparts. Our approach is general and can be used to evaluate robustness as well as to generate predictions of single-cell properties based on population-averaged experimental data in a wide range of cell signaling systems.

  10. Legitimising neural network river forecasting models: a new data-driven mechanistic modelling framework

    Science.gov (United States)

    Mount, N. J.; Dawson, C. W.; Abrahart, R. J.

    2013-01-01

    In this paper we address the difficult problem of gaining an internal, mechanistic understanding of a neural network river forecasting (NNRF) model. Neural network models in hydrology have long been criticised for their black-box character, which prohibits adequate understanding of their modelling mechanisms and has limited their broad acceptance by hydrologists. In response, we here present a new, data-driven mechanistic modelling (DDMM) framework that incorporates an evaluation of the legitimacy of a neural network's internal modelling mechanism as a core element of the model development process. The framework is exemplified for two NNRF modelling scenarios, and uses a novel adaptation of first-order, partial-derivative, relative sensitivity analysis as the means by which each model's mechanistic legitimacy is explored. The results demonstrate the limitations of the standard goodness-of-fit validation procedures applied by NNRF modellers, by highlighting how the internal mechanisms of complex models that produce the best fit scores can have much lower legitimacy than simpler counterparts whose scores are only slightly inferior. The study emphasises the urgent need for better mechanistic understanding of neural network-based hydrological models and the further development of methods for elucidating their mechanisms.
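A first-order relative sensitivity of this kind can be estimated numerically as S_i = (dy/dx_i)(x_i/y), using central differences on the trained model. The helper below is an illustrative stand-in for the paper's adaptation, not the authors' implementation; `model` is any callable mapping an input vector to a scalar forecast:

```python
def relative_sensitivity(model, x, i, eps=1e-6):
    """First-order relative sensitivity of the model output to input i,
    S_i = (dy/dx_i) * (x_i / y), estimated by central differences.
    Comparing S_i across inputs indicates which drivers dominate the
    network's internal response, independent of goodness-of-fit."""
    y = model(x)
    hi = list(x); hi[i] += eps
    lo = list(x); lo[i] -= eps
    dy_dx = (model(hi) - model(lo)) / (2 * eps)
    return dy_dx * x[i] / y
```

For `y = x0**2 * x1` the exact relative sensitivities are 2 and 1, which the finite-difference estimate reproduces closely.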

  11. Application of a Data-Driven Fuzzy Control Design to a Wind Turbine Benchmark Model

    Directory of Open Access Journals (Sweden)

    Silvio Simani

    2012-01-01

    In general, the modelling of wind turbines is a challenging task, since they are complex dynamic systems whose aerodynamics are nonlinear and unsteady. Accurate models must contain many degrees of freedom, and control algorithm design must account for these complexities. However, control algorithms must capture the most important turbine dynamics without being too complex and unwieldy, particularly when they have to be implemented in real-time applications. The first contribution of this work is an application example of the design, and testing through simulations, of a data-driven fuzzy wind turbine controller. In particular, the strategy is based on fuzzy modelling and identification approaches to model-based control design. Fuzzy modelling and identification can represent an alternative for developing experimental models of complex systems, derived directly from measured input-output data without detailed system assumptions. Regarding the controller design, this paper proposes a fuzzy control approach for the adjustment of both the wind turbine blade pitch angle and the generator torque. The effectiveness of the proposed strategies is assessed on data sequences acquired from the considered wind turbine benchmark. Several experiments provide evidence of the advantages of the proposed regulator with respect to different control methods.

  12. Parallel workflows for data-driven structural equation modeling in functional neuroimaging

    Directory of Open Access Journals (Sweden)

    Sarah Kenny

    2009-10-01

    We present a computational framework suitable for a data-driven approach to structural equation modeling (SEM) and describe several workflows for modeling functional magnetic resonance imaging (fMRI) data within this framework. The Computational Neuroscience Applications Research Infrastructure (CNARI) employs a high-level scripting language called Swift, which is capable of spawning hundreds of thousands of simultaneous R processes (R Core Development Team, 2008), each consisting of a self-contained structural equation model, on a high-performance computing (HPC) system. These self-contained R processing jobs are data objects generated by OpenMx, a plug-in for R, which can generate a single model object containing the matrices and algebraic information necessary to estimate the parameters of the model. With such an infrastructure in place, a structural modeler may begin to investigate exhaustive searches of the model space. Specific applications of the infrastructure, statistics related to model fit, and limitations are discussed in relation to exhaustive SEM. In particular, we discuss how workflow management techniques can help to solve large computational problems in neuroimaging.

  13. Data-Driven Contextual Valence Shifter Quantification for Multi-Theme Sentiment Analysis

    Science.gov (United States)

    Yu, Hongkun; Shang, Jingbo; Hsu, Meichun; Castellanos, Malú; Han, Jiawei

    2017-01-01

    Users often write reviews on different themes involving linguistic structures with complex sentiments. The sentiment polarity of a word can be different across themes. Moreover, contextual valence shifters may change sentiment polarity depending on the contexts in which they appear. Neither challenge can be modeled effectively and explicitly in traditional sentiment analysis. Studying both phenomena requires multi-theme sentiment analysis at the word level, which is very interesting but significantly more challenging than overall polarity classification. To simultaneously resolve the multi-theme and sentiment-shifting problems, we propose a data-driven framework that enables both capabilities: (1) polarity predictions of the same word in reviews of different themes, and (2) discovery and quantification of contextual valence shifters. The framework formulates multi-theme sentiment by factorizing the review sentiments with theme/word embeddings and then derives the shifter-effect learning problem as a logistic regression. The improvement in sentiment polarity classification accuracy demonstrates not only the importance of multi-theme and sentiment shifting, but also the effectiveness of our framework. Human evaluations and case studies further show the success of multi-theme word sentiment predictions and automatic effect quantification of contextual valence shifters. PMID:28232874

  14. Modern data-driven decision support systems: the role of computing with words and computational linguistics

    Science.gov (United States)

    Kacprzyk, Janusz; Zadrożny, Sławomir

    2010-05-01

    We present how the conceptually and numerically simple concept of a fuzzy linguistic database summary can be a very powerful tool for gaining much insight into the very essence of data. The use of linguistic summaries provides tools for the verbalisation of data analysis (mining) results which, in addition to the more commonly used visualisation, e.g. via a graphical user interface, can contribute to increased human consistency and ease of use, notably for supporting decision makers via the data-driven decision support system paradigm. Two new relevant aspects of the analysis, both first initiated by the authors, are also outlined. First, following Kacprzyk and Zadrożny, it is further considered how linguistic data summarisation is closely related to some types of solutions used in natural language generation (NLG). This can make it possible to use increasingly effective and efficient tools and techniques developed in NLG. Second, similar remarks are given on relations to systemic functional linguistics. Moreover, following Kacprzyk and Zadrożny, comments are given on an extremely relevant aspect of the scalability of linguistic summarisation of data, using a new concept of conceptual scalability.
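A linguistic summary of this kind can be computed directly in the classic Yager formulation: the truth of "Q records are S" is the quantifier membership applied to the mean predicate membership over the data. The membership functions below ("young", "most") and the age data are illustrative choices, not the authors':

```python
def linguistic_summary_truth(values, membership, quantifier):
    """Degree of truth of the summary 'Q objects are S' in Yager's
    sense: T = mu_Q(mean of mu_S over the data)."""
    r = sum(membership(v) for v in values) / len(values)
    return quantifier(r)

def mu_young(age):
    """Fuzzy predicate 'young': fully true below 25, false above 45."""
    return max(0.0, min(1.0, (45 - age) / 20))

def mu_most(r):
    """Fuzzy quantifier 'most': false below 30%, fully true above 70%."""
    return max(0.0, min(1.0, (r - 0.3) / 0.4))

ages = [25, 28, 31, 40, 55]
truth = linguistic_summary_truth(ages, mu_young, mu_most)
# truth is the degree to which "most records are young" holds here
```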

  15. NERI PROJECT 99-119. TASK 2. DATA-DRIVEN PREDICTION OF PROCESS VARIABLES. FINAL REPORT

    Energy Technology Data Exchange (ETDEWEB)

    Upadhyaya, B.R.

    2003-04-10

    This report describes the detailed results for Task 2 of DOE-NERI project number 99-119, entitled "Automatic Development of Highly Reliable Control Architecture for Future Nuclear Power Plants". This project is a collaborative effort between Oak Ridge National Laboratory (ORNL), the University of Tennessee, Knoxville (UTK) and North Carolina State University (NCSU). UTK is the lead organization for Task 2 under contract number DE-FG03-99SF21906. Under Task 2 we completed the development of data-driven models for the characterization of sub-system dynamics for predicting state variables, control functions, and expected control actions. We also developed the Principal Component Analysis (PCA) approach for mapping system measurements, and a nonlinear system modeling approach called the Group Method of Data Handling (GMDH) with rational functions, which includes temporal data information for transient characterization. The majority of the results are presented in detailed reports for Phases 1 through 3 of our research, which are attached to this report.

  16. Outcomes from the GLEON fellowship program. Training graduate students in data driven network science.

    Science.gov (United States)

    Dugan, H.; Hanson, P. C.; Weathers, K. C.

    2016-12-01

    In the water sciences there is a massive need for graduate students who possess the analytical and technical skills to deal with large datasets and to function in the new paradigm of open, collaborative science. The Global Lake Ecological Observatory Network (GLEON) graduate fellowship program (GFP) was developed as an interdisciplinary training program to supplement the intensive disciplinary training of traditional graduate education. The primary goal of the GFP was to train a diverse cohort of graduate students in network science, open-web technologies, collaboration, and data analytics, and, importantly, to provide the opportunity to use these skills to conduct collaborative research resulting in publishable scientific products. The GFP is run as a series of three week-long workshops over two years that bring together a cohort of twelve students. In addition, fellows are expected to attend and contribute to at least one international GLEON all-hands meeting. Here, we provide examples of training modules in the GFP (model building, data QA/QC, information management, Bayesian modeling, open coding/version control, national data programs), the scientific outputs (manuscripts, software products, and new global datasets) produced by the fellows, and the process by which this team science was catalyzed. Data-driven education that lets students apply learned skills to real research projects reinforces concepts, provides motivation, and can benefit their publication record. This program design is extendable to other institutions and networks.

  17. Data-Driven Methods for the Detection of Causal Structures in Process Technology

    Directory of Open Access Journals (Sweden)

    Christian Kühnert

    2014-11-01

    In modern industrial plants, process units are strongly cross-linked with each other, and disturbances occurring in one unit can potentially become plant-wide. This can lead to a flood of alarms at the supervisory control and data acquisition system, hiding the original fault causing the disturbance. Hence, one major aim in fault diagnosis is to backtrack the propagation path of the disturbance and to localize the root cause of the fault. Since detecting correlation in the data is not sufficient to describe the direction of the propagation path, cause-effect dependencies among process variables need to be detected. Process variables that show a strong causal impact on other variables in the process come into consideration as being the root cause. In this paper, different data-driven methods are proposed, compared and combined that can detect causal relationships while relying solely on process data. The information on causal dependencies is used for localization of the root cause of a fault. All proposed methods consist of a statistical part, which determines whether the disturbance traveling from one process variable to a second is significant, and a quantitative part, which calculates the causal information the first process variable has about the second. The methods are tested on simulated data from a chemical stirred-tank reactor and on a laboratory plant.
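One common data-driven cause-effect test of this kind is a Granger-style lagged regression. The sketch below is a generic stand-in, not the paper's specific methods: it scores a directed link x → y by how much the lagged history of x reduces the residual variance of an autoregression of y, combining a quantitative measure with an implicit significance check (a score near zero means no causal information):

```python
import numpy as np

def granger_score(x, y):
    """Granger-style score of x -> y with lag 1: relative reduction in
    residual variance when x[t-1] is added to the regression of y[t]
    on y[t-1].  Returns a value in [0, 1]; larger = stronger link."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    yt, y1, x1 = y[1:], y[:-1], x[:-1]
    ones = np.ones_like(yt)

    def rss(cols):
        # Residual sum of squares of a least-squares fit of yt on cols.
        A = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(A, yt, rcond=None)
        r = yt - A @ beta
        return float(r @ r)

    rss_self = rss([y1, ones])        # y explained by its own past
    rss_full = rss([y1, x1, ones])    # ... plus the past of x
    return 1.0 - rss_full / rss_self
```

Scoring all ordered variable pairs of a plant dataset this way yields a directed causality matrix from which a propagation path can be backtracked to a root-cause candidate.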

  18. An automated data-driven DSP development approach for glycoproteins from yeast.

    Science.gov (United States)

    Rajamanickam, Vignesh; Krippl, Maximillian; Herwig, Christoph; Spadiut, Oliver

    2017-08-04

    Downstream process development for recombinant glycoproteins from yeast is cumbersome due to hyperglycosylation of target proteins. In a previous study, we purified three recombinant glycoproteins from Pichia pastoris with a simple two-step flow-through approach using monolithic columns. In this study, we investigated a novel automated data science approach for identifying purification conditions for such glycoproteins using monolithic columns. We performed three sets of design of experiments at analytical scale to determine the separation efficiency of monolithic columns for three different recombinant horseradish peroxidase (HRP) isoenzymes. For ease of calculation, we introduced an arbitrary term, the relative impurity removal (IR), which is representative of the amount of impurities cleared. Both the experimental part and the data analysis were automated and took less than 40 min for each HRP isoenzyme. We tested the identified purification conditions at laboratory scale and performed the respective offline analyses to verify the results from analytical scale. We found a clear correlation between the IR estimated online through our novel data-driven approach and the IR determined offline. Summarizing, we present a novel methodology, exploiting the advantages of analytical scale, which can be used for fast and efficient DSP development for recombinant glycoproteins from yeast without offline analyses. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. GeneWeaver: data driven alignment of cross-species genomics in biology and disease.

    Science.gov (United States)

    Baker, Erich; Bubier, Jason A; Reynolds, Timothy; Langston, Michael A; Chesler, Elissa J

    2016-01-04

    The GeneWeaver data and analytics website (www.geneweaver.org) is a publicly available resource for storing, curating and analyzing sets of genes from heterogeneous data sources. The system enables discovery of relationships among genes, variants, traits, drugs, environments, anatomical structures and diseases implicitly found through gene set intersections. Since the previous review in the 2012 Nucleic Acids Research Database issue, GeneWeaver's underlying analytics platform has been enhanced, its number and variety of publicly available gene set data sources has been increased, and its advanced search mechanisms have been expanded. In addition, its interface has been redesigned to take advantage of flexible web services, programmatic data access, and a refined data model for handling gene network data in addition to its original emphasis on gene set data. By enumerating the common and distinct biological molecules associated with all subsets of curated or user-submitted groups of gene sets and gene networks, GeneWeaver empowers users with the ability to construct data-driven descriptions of shared and unique biological processes, diseases and traits within and across species.
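The core enumeration of common and distinct genes across all subsets of gene sets can be illustrated with plain set algebra. The function name, set names, and gene lists below are made-up examples, not GeneWeaver data or its API:

```python
from itertools import combinations

def geneset_partitions(gene_sets):
    """For every combination of named gene sets, list the genes present
    in all sets of that combination and absent from the remaining sets,
    i.e. the exclusive intersection structure that gene-set analyses of
    this kind enumerate.  Returns {tuple of names: sorted gene list}."""
    names = sorted(gene_sets)
    out = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(gene_sets[n] for n in combo))
            others = [gene_sets[n] for n in names if n not in combo]
            genes = inside - set().union(*others) if others else inside
            if genes:
                out[combo] = sorted(genes)
    return out

sets = {"disease": {"TP53", "BRCA1", "EGFR"},
        "drug":    {"EGFR", "TP53", "KRAS"},
        "model":   {"TP53", "KRAS", "MYC"}}
parts = geneset_partitions(sets)
# e.g. ("disease", "drug", "model") -> ["TP53"]: shared by all three sets
```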

  20. First-principle and data-driven model-based approach in rotating machinery failure mode detection

    Directory of Open Access Journals (Sweden)

    G. Wszołek

    2010-12-01

    Purpose: A major concern of modern diagnostics is the use of vibration or acoustic signals generated by a machine to reveal its operating condition. This paper presents a method which allows estimates of model eigenvalues, represented by complex numbers, to be obtained periodically. The method is intended to diagnose rotating machinery under transient conditions. Design/methodology/approach: The method uses a parametric data-driven model whose parameters are estimated from operational data. Findings: Experimental results were obtained with a laboratory single-disc rotor system equipped with both sliding and hydrodynamic bearings. The test rig allows data to be collected under normal (reference) and malfunctioning operation, including oil instabilities, rub, looseness and unbalance. Research limitations/implications: Numerical and experimental studies performed to validate the method are presented in the paper. Moreover, literature and industrial case studies are analyzed to better understand the vibration modes of the rotor under abnormal operating conditions. Practical implications: A model of the test rig has been developed to verify the proposed method and to understand the results of the experiments. A hardware realization of the method was implemented as a standalone module using the Texas Instruments TMS3200LF2407 Starter Kit. Originality/value: A parametric approach is proposed instead of a nonparametric one for diagnosing rotating machinery.
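One simple way to obtain model eigenvalues from operational data, in the spirit of the parametric approach described (though not necessarily the authors' exact model class), is to fit an autoregressive model by least squares and take the eigenvalues of its companion matrix:

```python
import numpy as np

def ar_eigenvalues(signal, order=2):
    """Fit an autoregressive (AR) model to a signal by least squares and
    return the eigenvalues of its companion matrix.  Periodically
    re-estimating these complex eigenvalues and watching them drift is
    one way to flag a changing machine condition."""
    s = np.asarray(signal, float)
    y = s[order:]
    # Column k holds the signal delayed by k+1 samples.
    X = np.column_stack([s[order - 1 - k: len(s) - 1 - k]
                         for k in range(order)])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    C = np.diag(np.ones(order - 1), -1)   # companion matrix
    C[0, :] = coeffs
    return np.linalg.eigvals(C)
```

For a damped oscillation obeying s[t] = 2r·cos(θ)·s[t-1] - r²·s[t-2], the estimated eigenvalues recover the pair r·e^(±iθ), i.e. the damping (magnitude) and frequency (angle) of the mode.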

  1. Data-Driven CFD Modeling of Turbulent Flows Through Complex Structures

    CERN Document Server

    Wang, Jian-Xun

    2016-01-01

    The growth of computational resources in the past decades has expanded the application of Computational Fluid Dynamics (CFD) from the traditional fields of aerodynamics and hydrodynamics to a number of new areas. Examples range from heat and fluid flows in nuclear reactor vessels and in data centers to turbulent flows through wind turbine farms and coastal vegetation. However, in these new applications complex structures often exist (e.g., rod bundles in reactor vessels and turbines in wind farms), which makes fully resolved, first-principle-based CFD modeling prohibitively expensive. This obstacle seriously impairs the predictive capability of CFD models in these applications. On the other hand, a limited amount of measurement data is often available in the systems in the above-mentioned applications. In this work we propose a data-driven, physics-based approach to perform full-field inversion of the effects of the complex structures on the flow. This is achieved by assimilating observati...

  2. VLAM-G: Interactive Data Driven Workflow Engine for Grid-Enabled Resources

    Directory of Open Access Journals (Sweden)

    Vladimir Korkhov

    2007-01-01

    Full Text Available Grid brings the power of many computers to scientists. However, the development of Grid-enabled applications requires knowledge about the Grid infrastructure and low-level APIs to Grid services. In turn, workflow management systems provide a high-level environment for rapid prototyping of experimental computing systems. Coupling the Grid and workflow paradigms is important for the scientific community: it makes the power of the Grid easily available to the end user. The paradigm of data-driven workflow execution is one way to enable distributed workflow on the Grid. The work presented in this paper is carried out in the context of the Virtual Laboratory for e-Science project. We present the VLAM-G workflow management system and its core component: the Run-Time System (RTS). The RTS is a dataflow-driven workflow engine which utilizes Grid resources, hiding the complexity of the Grid from the scientist. Special attention is paid to the concept of dataflow and direct data streaming between distributed workflow components. We present the architecture and components of the RTS, describe the features of VLAM-G workflow execution, and evaluate the system by performance measurements and a real-life use case.

  3. Critical clusters in interdependent economic sectors. A data-driven spectral clustering analysis

    Science.gov (United States)

    Oliva, Gabriele; Setola, Roberto; Panzieri, Stefano

    2016-10-01

    In this paper we develop a data-driven hierarchical clustering methodology to group the economic sectors of a country in order to highlight strongly coupled groups that are weakly coupled with other groups. Specifically, we consider an input-output representation of the coupling among the sectors and we interpret the relation among sectors as a directed graph; then we recursively apply the spectral clustering methodology over the graph, without a priori information on the number of groups to be obtained. In order to do this, we resort to the eigengap criterion, where a suitable number of groups is selected automatically based on the intensity and structure of the coupling among the sectors. We validate the proposed methodology considering a case study for Italy, inspecting how the coupling among clusters and sectors changed from the year 1995 to 2011, showing that over the years the Italian structure underwent deep changes, becoming more and more interdependent, i.e., a large part of the economy has become tightly coupled.
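    The eigengap criterion named above can be sketched as follows: build the normalized graph Laplacian of a (symmetrized) affinity matrix, sort its eigenvalues, and choose the number of groups where the gap between consecutive eigenvalues is largest. The toy block-structured affinity below is an assumption standing in for the input-output coupling data.

```python
import numpy as np

def eigengap_k(W, k_max=8):
    """Pick the number of clusters via the eigengap heuristic on the
    normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals = np.sort(np.linalg.eigvalsh(L))[:k_max]
    gaps = np.diff(vals)                 # gap between consecutive eigenvalues
    return int(np.argmax(gaps)) + 1

# Toy affinity: two blocks of strongly coupled "sectors" that are only
# weakly coupled with each other.
W = np.full((6, 6), 0.05)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
print(eigengap_k(W))
```

    Applied recursively to each discovered group, this gives the hierarchical decomposition the abstract describes.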

  4. Validation of data-driven computational models of social perception of faces.

    Science.gov (United States)

    Todorov, Alexander; Dotsch, Ron; Porter, Jenny M; Oosterhof, Nikolaas N; Falvello, Virginia B

    2013-08-01

    People rapidly form impressions from facial appearance, and these impressions affect social decisions. We argue that data-driven, computational models are the best available tools for identifying the source of such impressions. Here we validate seven computational models of social judgments of faces: attractiveness, competence, dominance, extroversion, likability, threat, and trustworthiness. The models manipulate both face shape and reflectance (i.e., cues such as pigmentation and skin smoothness). We show that human judgments track the models' predictions (Experiment 1) and that the models differentiate between different judgments, though this differentiation is constrained by the similarity of the models (Experiment 2). We also make the validated stimuli available for academic research: seven databases containing 25 identities manipulated in the respective model to take on seven different dimension values, ranging from -3 SD to +3 SD (175 stimuli in each database). Finally, we show how the computational models can be used to control for shared variance of the models. For example, even for highly correlated dimensions (e.g., dominance and threat), we can identify cues specific to each dimension and, consequently, generate faces that vary only on these cues.

  5. A data driven approach for detection and isolation of anomalies in a group of UAVs

    Directory of Open Access Journals (Sweden)

    Wang Yin

    2015-02-01

    Full Text Available The use of groups of unmanned aerial vehicles (UAVs) has greatly expanded UAVs' capabilities in a variety of applications, such as surveillance, searching and mapping. As the UAVs are operated as a team, it is important to detect and isolate anomalous aircraft in order to avoid collisions and other risks that would affect the safety of the team. In this paper, we present a data-driven approach to detect and isolate abnormal aircraft within a team of aerial vehicles flying in formation, which removes the requirement for prior knowledge of the underlying dynamic model in conventional model-based fault detection algorithms. Based on the assumption that normally behaving UAVs should share similar (dynamic) model parameters, we propose to first identify the model parameters for each aircraft of the team based on a sequence of input-output data pairs, which is achieved by a novel sparse optimization technique. The fault states of the UAVs are detected and isolated in the second step by identifying changes in the model parameters. Simulation results have demonstrated the efficiency and flexibility of the proposed approach.
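    The two-step logic above, identify per-vehicle model parameters from input-output data, then isolate the vehicle whose parameters deviate from the team, can be sketched with plain least squares and a median-deviation test. This replaces the paper's sparse optimization with an ordinary least-squares fit, and all dynamics and fault values below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def identify_params(U, y):
    """Least-squares estimate of a linear input-output model y ≈ U @ theta."""
    theta, *_ = np.linalg.lstsq(U, y, rcond=None)
    return theta

# Hypothetical team of 5 UAVs sharing nominal parameters [0.8, 0.3];
# UAV 3 is faulty (its parameters have drifted).
theta_true = np.array([0.8, 0.3])
thetas = []
for i in range(5):
    U = rng.normal(size=(100, 2))                 # input sequence
    th = theta_true if i != 3 else np.array([0.4, 0.9])
    y = U @ th + 0.01 * rng.normal(size=100)      # noisy output sequence
    thetas.append(identify_params(U, y))
thetas = np.array(thetas)

# Isolate the anomaly: largest deviation from the team's median parameters.
dev = np.linalg.norm(thetas - np.median(thetas, axis=0), axis=1)
print(int(np.argmax(dev)))
```

    The median is used instead of the mean so that the faulty vehicle does not pull the reference toward itself.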

  6. Big Data-Driven Based Real-Time Traffic Flow State Identification and Prediction

    Directory of Open Access Journals (Sweden)

    Hua-pu Lu

    2015-01-01

    Full Text Available With the rapid development of urban informatization, the era of big data is coming. To satisfy the demand for traffic congestion early warning, this paper studies a method of real-time traffic flow state identification and prediction based on big data-driven theory. Traffic big data holds several characteristics, such as temporal correlation, spatial correlation, historical correlation, and multistate behavior. Traffic flow state quantification, the basis of traffic flow state identification, is achieved by a SAGA-FCM (simulated annealing genetic algorithm based fuzzy c-means) traffic clustering model. Considering computational simplicity and predictive accuracy, a bilevel optimization model for regional traffic flow correlation analysis is established to predict traffic flow parameters based on temporal-spatial-historical correlation. A two-stage model for correction coefficient optimization is put forward to simplify the bilevel optimization model. The first-stage model calculates the number of temporal-spatial-historical correlation variables. The second-stage model calculates the basic model formulation of the regional traffic flow correlation. A case study based on a real-world road network in Beijing, China, is implemented to test the efficiency and applicability of the proposed modeling and computing methods.

  7. A Data-Driven Point Cloud Simplification Framework for City-Scale Image-Based Localization.

    Science.gov (United States)

    Cheng, Wentao; Lin, Weisi; Zhang, Xinfeng; Goesele, Michael; Sun, Ming-Ting

    2017-01-01

    City-scale 3D point clouds reconstructed via structure-from-motion from a large collection of Internet images are widely used in the image-based localization task to estimate a 6-DOF camera pose of a query image. Due to the prohibitive memory footprint of city-scale point clouds, image-based localization is difficult to implement on devices with limited memory resources. Point cloud simplification aims to select a subset of points that achieves localization performance comparable to that of the original point cloud. In this paper, we propose a data-driven point cloud simplification framework by casting it as a weighted K-Cover problem, which mainly includes two complementary parts. First, a utility-based parameter determination method is proposed to select a reasonable parameter K for K-Cover-based approaches by evaluating the potential of a point cloud for establishing sufficient 2D-3D feature correspondences. Second, we formulate the 3D point cloud simplification problem as a weighted K-Cover problem, and propose an adaptive exponential weight function based on the visibility probability of 3D points. The experimental results on three popular datasets demonstrate that the proposed point cloud simplification framework outperforms the state-of-the-art methods for the image-based localization application with a well-predicted parameter in the K-Cover problem.
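    A weighted K-Cover selection like the one described above is typically approximated greedily: repeatedly pick the point whose weighted, still-unmet coverage gain is largest until a point budget is spent. The sketch below uses toy visibility sets and fixed weights; the paper's adaptive exponential weight function and parameter-K determination are not reproduced.

```python
def greedy_weighted_k_cover(visibility, weights, K, budget):
    """Greedy weighted K-Cover: visibility[p] is the set of images that
    observe point p; each image should be covered by at least K chosen
    points, and at most `budget` points may be selected."""
    need = {}                                   # image -> remaining coverage
    for imgs in visibility.values():
        for im in imgs:
            need[im] = K
    chosen, remaining = [], set(visibility)
    while remaining and len(chosen) < budget:
        best = max(remaining,
                   key=lambda p: weights[p] * sum(1 for im in visibility[p]
                                                  if need[im] > 0))
        chosen.append(best)
        remaining.discard(best)
        for im in visibility[best]:
            if need[im] > 0:
                need[im] -= 1
    return chosen

# Hypothetical example: 4 points, 3 images, keep 2 points, single coverage.
vis = {0: {"a", "b"}, 1: {"a"}, 2: {"b", "c"}, 3: {"c"}}
w = {0: 1.0, 1: 0.5, 2: 0.9, 3: 0.5}
print(greedy_weighted_k_cover(vis, w, K=1, budget=2))
```

    Greedy set cover carries the classical (1 - 1/e) approximation guarantee, which is why it is a common choice for this family of simplification problems.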

  8. Time-dependent ambulance allocation considering data-driven empirically required coverage.

    Science.gov (United States)

    Degel, Dirk; Wiesche, Lara; Rachuba, Sebastian; Werners, Brigitte

    2015-12-01

    Empirical studies considering the location and relocation of emergency medical service (EMS) vehicles in an urban region provide important insight into dynamic changes during the day. Within a 24-hour cycle, the demand, travel time, speed of ambulances and areas of coverage change. Nevertheless, most existing approaches in the literature ignore these variations and require a (temporally and spatially) fixed (double) coverage of the planning area. Neglecting these variations and fixing the coverage could lead to an inaccurate estimation of the time-dependent fleet size and individual positioning of ambulances. Through extensive data collection, it is now possible to determine precisely the required coverage of demand areas. Based on data-driven optimization, a new approach is presented, maximizing the flexible, empirically determined required coverage, which has been adjusted for variations due to time of day and site. This prevents the unavailability of ambulances due to parallel operations and ensures improved coverage of the planning area, closer to realistic demand. An integer linear programming model is formulated in order to locate and relocate ambulances. The use of such a programming model is supported by a comprehensive case study, which strongly suggests that these objectives can be achieved, leading to greater cost-effectiveness and quality of emergency care.
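    The optimization above can be illustrated at toy scale: choose ambulance sites so that the weighted demand meeting its (area-specific, possibly double) required coverage is maximized. A brute-force stand-in for the paper's integer linear program is sketched below; the sites, demand weights and coverage sets are all invented.

```python
from itertools import combinations

def best_ambulance_placement(coverage, demand, required, n_amb):
    """Enumerate all site subsets of size n_amb and return the one that
    maximizes the demand covered at least `required[a]` times.
    coverage[site] = set of demand areas reachable within the standard."""
    def covered_demand(sites):
        total = 0
        for a, w in demand.items():
            hits = sum(1 for s in sites if a in coverage[s])
            if hits >= required[a]:
                total += w
        return total
    return max(combinations(coverage, n_amb), key=covered_demand)

# Hypothetical daytime scenario: area A needs double coverage, B and C single.
coverage = {"s1": {"A", "B"}, "s2": {"B", "C"}, "s3": {"A", "C"}}
demand = {"A": 5, "B": 3, "C": 2}
required = {"A": 2, "B": 1, "C": 1}
print(best_ambulance_placement(coverage, demand, required, 2))
```

    Solving the same model per time slice, with time-varying `coverage` and `required`, yields the time-dependent (re)location plan the abstract describes; at realistic scale an ILP solver replaces the enumeration.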

  9. Data-driven modelling of structured populations a practical guide to the integral projection model

    CERN Document Server

    Ellner, Stephen P; Rees, Mark

    2016-01-01

    This book is a “How To” guide for modeling population dynamics using Integral Projection Models (IPM) starting from observational data. It is written by a leading research team in this area and includes code in the R language (in the text and online) to carry out all computations. The intended audience are ecologists, evolutionary biologists, and mathematical biologists interested in developing data-driven models for animal and plant populations. IPMs may seem hard as they involve integrals. The aim of this book is to demystify IPMs, so they become the model of choice for populations structured by size or other continuously varying traits. The book uses real examples of increasing complexity to show how the life-cycle of the study organism naturally leads to the appropriate statistical analysis, which leads directly to the IPM itself. A wide range of model types and analyses are presented, including model construction, computational methods, and the underlying theory, with the more technical material in B...

  10. Data Driven - Android based displays on data acquisition and system status

    CERN Document Server

    Canilho, Paulo

    2014-01-01

    For years, both hardware and software engineers have struggled to acquire device information in a flexible and fast way; numerous devices cannot have their status quickly tested because of the time lost travelling to a computer terminal. For instance, in order to test a scintillator's status, one has to inject beam into the device and quickly return to a terminal to see the results; this is not only time-demanding but extremely inconvenient for the person responsible, consuming time that would be better spent on more pressing matters. With this in mind, the proposal was made to create an interface that brings a stable, flexible, user-friendly and data-driven solution to this problem. Being the most common operating system for mobile displays, Android proved to be the most cost-efficient choice, since it is based on open-source software, and the easiest to implement, since its backend development resides in Java calls and XML for visual representation...

  11. Global warming projection in the 21st century based on an observational data-driven model

    Science.gov (United States)

    Zeng, Xubin; Geil, Kerrie

    2016-10-01

    Global warming has been projected primarily by Earth system models (ESMs). Complementary to this approach, here we provide the decadal and long-term global warming projections based on an observational data-driven model. This model combines natural multidecadal variability with anthropogenic warming that depends on the history of annual emissions. It shows good skill in decadal hindcasts with the recent warming slowdown well captured. While our ensemble mean temperature projections at the end of 21st century are consistent with those from ESMs, our decadal warming projection of 0.35 (0.30-0.43) K from 1986-2005 to 2016-2035 is within their projection range and only two-thirds of the ensemble mean from ESMs. Our predicted warming rate in the next few years is slower than in the 1980s and 1990s, followed by a greater warming rate. Our projection uncertainty range is just one-third of that from ESMs, and its implication is also discussed.

  12. Articulatory distinctiveness of vowels and consonants: a data-driven approach.

    Science.gov (United States)

    Wang, Jun; Green, Jordan R; Samal, Ashok; Yunusova, Yana

    2013-10-01

    To quantify the articulatory distinctiveness of 8 major English vowels and 11 English consonants based on tongue and lip movement time series data using a data-driven approach. Tongue and lip movements of 8 vowels and 11 consonants from 10 healthy talkers were collected. First, classification accuracies were obtained using 2 complementary approaches: (a) Procrustes analysis and (b) a support vector machine. Procrustes distance was then used to measure the articulatory distinctiveness among vowels and consonants. Finally, the distance (distinctiveness) matrices of different vowel pairs and consonant pairs were used to derive articulatory vowel and consonant spaces using multidimensional scaling. Vowel classification accuracies of 91.67% and 89.05% and consonant classification accuracies of 91.37% and 88.94% were obtained using Procrustes analysis and a support vector machine, respectively. Articulatory vowel and consonant spaces were derived based on the pairwise Procrustes distances. The articulatory vowel space derived in this study resembled the long-standing descriptive articulatory vowel space defined by tongue height and advancement. The articulatory consonant space was consistent with feature-based classification of English consonants. The derived articulatory vowel and consonant spaces may have clinical implications, including serving as an objective measure of the severity of articulatory impairment.
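    The Procrustes distance used above as the articulatory distinctiveness measure can be sketched directly: center and scale two landmark matrices, find the optimal orthogonal alignment from an SVD, and report the residual disparity. The square-shaped "articulatory trace" below is an invented example, not speech data.

```python
import numpy as np

def procrustes_distance(X, Y):
    """Procrustes distance between landmark matrices X, Y (n x d) after
    removing translation, scale and rotation/reflection. Uses the standard
    result that, for centered unit-Frobenius-norm shapes, the minimal
    squared residual is 1 - (sum of singular values of X^T Y)^2."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X /= np.linalg.norm(X)          # unit Frobenius norm
    Y /= np.linalg.norm(Y)
    _, s, _ = np.linalg.svd(X.T @ Y)
    return np.sqrt(max(0.0, 1.0 - s.sum() ** 2))

# A square and a rotated, scaled, translated copy are the "same shape".
sq = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
th = 0.7
R = np.array([[np.cos(th), -np.sin(th)], [np.sin(th), np.cos(th)]])
d0 = procrustes_distance(sq, 3.0 * sq @ R.T + 2.0)
print(d0)
```

    Pairwise distances computed this way form the distinctiveness matrix that the study feeds into multidimensional scaling to obtain the articulatory vowel and consonant spaces.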

  13. Data-driven automatic parking constrained control for four-wheeled mobile vehicles

    Directory of Open Access Journals (Sweden)

    Wenxu Yan

    2016-11-01

    Full Text Available In this article, a novel data-driven constrained control scheme is proposed for automatic parking systems. The design of the proposed scheme depends only on the steering angle and the orientation angle of the car, and it does not involve any model information of the car. Therefore, the proposed scheme-based automatic parking system is applicable to different kinds of cars. In order to further reduce the desired-trajectory coordinate tracking errors, a coordinates compensation algorithm is also proposed. In the design procedure of the controller, a novel dynamic anti-windup compensator is used to deal with the magnitude and rate saturations of the automatic parking control input. It is theoretically proven that all the signals in the closed-loop system are uniformly ultimately bounded, based on the Lyapunov stability analysis method. Finally, a simulation comparison between the proposed scheme with coordinates compensation and a Proportion Integration Differentiation (PID) control algorithm is given. It is shown that the proposed scheme with coordinates compensation has smaller tracking errors and more rapid responses than the PID scheme.
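    The interplay of magnitude saturation, rate saturation and anti-windup compensation mentioned above can be illustrated with a PI loop using classical back-calculation anti-windup. This is an illustrative stand-in for the article's dynamic anti-windup compensator; the plant, gains and limits are all assumptions.

```python
def clamp(v, lo, hi):
    return min(max(v, lo), hi)

def simulate(kp=2.0, ki=1.0, kaw=1.0, dt=0.01, steps=2000):
    """PI control of a first-order plant x' = -x + u with magnitude and
    rate saturation of u, and back-calculation anti-windup: the integrator
    is fed the saturation error kaw*(u - u_raw) so it does not wind up."""
    x, integ, u_prev, ref = 0.0, 0.0, 0.0, 1.0
    for _ in range(steps):
        e = ref - x
        u_raw = kp * e + ki * integ
        u = clamp(u_raw, -0.5, 0.5)                          # magnitude limit
        u = clamp(u, u_prev - 2.0 * dt, u_prev + 2.0 * dt)   # rate limit
        integ += dt * (e + kaw * (u - u_raw))                # anti-windup
        x += dt * (-x + u)
        u_prev = u
    return x

print(simulate())
```

    With the actuator capped at 0.5, the plant can only reach x = 0.5; the anti-windup term keeps the integrator bounded at that limit instead of accumulating the unreachable error.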

  14. BatchJS: Implementing Batches in JavaScript

    NARCIS (Netherlands)

    Kasemier, D.

    2014-01-01

    None of our popular programming languages know how to handle distribution well. Yet our programs interact more and more with each other and our data resorts in databases and web services. Batches are a new addition to languages that can finally bring native support for distribution to our favourite

  15. Analysis of Adiabatic Batch Reactor

    Directory of Open Access Journals (Sweden)

    Erald Gjonaj

    2016-05-01

    Full Text Available Acetic anhydride is reacted with excess water in an adiabatic batch reactor, producing an exothermic reaction. The concentration of acetic anhydride and the temperature inside the reactor are calculated for three scenarios: an initial temperature of 20°C, an initial temperature of 30°C, and a cooling jacket holding the temperature constant at 20°C. The graphs of the three scenarios show that higher temperatures cause the reaction to proceed faster.
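    The comparison above can be reproduced qualitatively by integrating the coupled mole and energy balances with forward Euler. All rate and thermodynamic parameter values below are illustrative assumptions, not the article's data; only the structure of the balances is the point.

```python
import math

# First-order hydrolysis of acetic anhydride in excess water:
#   dCa/dt = -k(T)*Ca,              k(T) = k0*exp(-Ea/(R*T))
#   dT/dt  = (-dHr)*k(T)*Ca/(rho*cp)    (zero in the jacketed case)
k0, Ea, R = 1.0e5, 45_000.0, 8.314    # 1/s, J/mol, J/(mol K)  (assumed)
dHr = -56_000.0                        # J/mol, exothermic       (assumed)
rho_cp = 4.2e6                         # J/(m^3 K)               (assumed)
Ca0, T0, dt = 500.0, 293.15, 0.1       # mol/m^3, K, s

def simulate(Ca, T, t_end=600.0, isothermal=False):
    """Forward-Euler integration of the concentration and energy balances."""
    for _ in range(int(t_end / dt)):
        rate = k0 * math.exp(-Ea / (R * T)) * Ca
        Ca = max(Ca - dt * rate, 0.0)
        if not isothermal:
            T += dt * (-dHr) * rate / rho_cp
    return Ca, T

Ca_ad, T_ad = simulate(Ca0, T0)                      # adiabatic, 20 °C start
Ca_iso, T_iso = simulate(Ca0, T0, isothermal=True)   # jacket holds 20 °C
print(Ca_ad, T_ad - T0, Ca_iso)
```

    The adiabatic run self-accelerates: the heat released raises T, which raises k(T), so less anhydride remains than in the jacketed run over the same time, matching the article's conclusion that higher temperatures make the reaction occur faster.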

  16. Rural School Principals' Perceived Use of Data in Data-Driven Decision-Making and the Impact on Student Achievement

    Science.gov (United States)

    Rogers, K. Kaye

    2011-01-01

    This study examined the impact of principals' data-driven decision-making practices on student achievement using the theoretical frame of Dervin's sense-making theory. This study is a quantitative cross-sectional research design where principals' perceptions about data were quantitatively captured at a single point in time. The participants for…

  17. Educational Psychology's Instructional Challenge: Pre-Service Teacher Concerns Regarding Classroom-Level Data-Driven Decision-Making

    Science.gov (United States)

    Dunn, Karee E.

    2016-01-01

    Data-driven decision-making (DDDM) is a difficult topic to cover, but typically required, in the applied educational psychology course or other courses required for teacher licensure in the United States. While a growing body of literature indicates in-service teachers are resistant to DDDM and underprepared to engage in it, little has been done…

  19. Input variable selection for data-driven models of Coriolis flowmeters for two-phase flow measurement

    Science.gov (United States)

    Wang, Lijuan; Yan, Yong; Wang, Xue; Wang, Tao

    2017-03-01

    Input variable selection is an essential step in the development of data-driven models for environmental, biological and industrial applications. By eliminating irrelevant or redundant variables, input variable selection identifies a suitable subset of variables as the input of a model; it also simplifies the model structure and improves computational efficiency. This paper describes the procedures of input variable selection for data-driven models for the measurement of liquid mass flowrate and gas volume fraction under two-phase flow conditions using Coriolis flowmeters. Three advanced input variable selection methods, including partial mutual information (PMI), genetic algorithm-artificial neural network (GA-ANN) and tree-based iterative input selection (IIS), are applied in this study. Typical data-driven models incorporating support vector machines (SVM) are established individually based on the input candidates resulting from the selection methods. The validity of the selection outcomes is assessed through an output performance comparison of the SVM-based data-driven models and sensitivity analysis. The validation and analysis results suggest that the input variables selected by the PMI algorithm provide more effective information for the models to measure liquid mass flowrate, while the IIS algorithm provides fewer but more effective variables for the models to predict gas volume fraction.
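    The mutual-information side of the selection procedure above can be sketched with a histogram estimator: score each candidate input by its mutual information with the target and discard the lowest-scoring ones. This is plain MI, not the partial mutual information of the PMI algorithm (which additionally accounts for redundancy among already-selected inputs); the candidate signals are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)

def mutual_information(x, y, bins=16):
    """Histogram estimate of the mutual information I(X;Y) in nats."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Hypothetical candidates: u0 drives the target, u1 is largely redundant
# with u0, u2 is irrelevant noise.
n = 5000
u0 = rng.normal(size=n)
u1 = u0 + 0.1 * rng.normal(size=n)
u2 = rng.normal(size=n)
target = np.sin(u0) + 0.05 * rng.normal(size=n)
scores = [mutual_information(u, target) for u in (u0, u1, u2)]
print(["%.3f" % s for s in scores])
```

    Note that plain MI ranks both u0 and u1 highly even though they are redundant; distinguishing them is exactly what the partial-MI refinement is for.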

  20. Examining Data Driven Decision Making via Formative Assessment: A Confluence of Technology, Data Interpretation Heuristics and Curricular Policy

    Science.gov (United States)

    Swan, Gerry; Mazur, Joan

    2011-01-01

    Although the term data-driven decision making (DDDM) is relatively new (Moss, 2007), the underlying concept of DDDM is not. For example, the practices of formative assessment and computer-managed instruction have historically involved the use of student performance data to guide what happens next in the instructional sequence (Morrison, Kemp, &…

  1. Enhancing learning at work. How to combine theoretical and data-driven approaches, and multiple levels of data?

    NARCIS (Netherlands)

    Kalakoski, V.; Ratilainen, H.; Drupsteen, L.

    2015-01-01

    This research plan focuses on learning at work. Our aim is to gather empirical data on multiple factors that can affect learning for work, and to apply computational methods in order to understand the preconditions of effective learning. The design will systematically combine theory- and data-driven

  2. How Instructional Coaches Support Data-Driven Decision Making: Policy Implementation and Effects in Florida Middle Schools

    Science.gov (United States)

    Marsh, Julie A.; McCombs, Jennifer Sloan; Martorell, Francisco

    2010-01-01

    This article examines the convergence of two popular school improvement policies: instructional coaching and data-driven decision making (DDDM). Drawing on a mixed methods study of a statewide reading coach program in Florida middle schools, the article examines how coaches support DDDM and how this support relates to student and teacher outcomes.…

  3. Corpus of High School Academic Texts (COHAT): Data-Driven, Computer Assisted Discovery in Learning Academic English

    Science.gov (United States)

    Bohát, Róbert; Rödlingová, Beata; Horáková, Nina

    2015-01-01

    Corpus of High School Academic Texts (COHAT), currently of 150,000+ words, aims to make academic language instruction a more data-driven and student-centered discovery learning as a special type of Computer-Assisted Language Learning (CALL), emphasizing students' critical thinking and metacognition. Since 2013, high school English as an additional…

  4. Loss Modeling with a Data-Driven Approach in Event-Based Rainfall-Runoff Analysis

    Science.gov (United States)

    Chua, L. H. C.

    2012-04-01

    Mathematical models require the estimation of rainfall abstractions for accurate predictions of runoff. Although loss models such as the constant loss and exponential loss models are commonly used, these methods are based on simplified assumptions about the physical process. A new approach based on the data-driven paradigm to estimate rainfall abstractions is proposed in this paper. The proposed data-driven model, based on the artificial neural network (ANN), does not make any assumptions about the loss behavior. The estimated discharge from a physically-based model, obtained from the kinematic wave (KW) model assuming zero losses, was used as the only input to the ANN. The output is the measured discharge. Thus, the ANN functions as a black-box loss model. Two sets of data were analyzed for this study. The first dataset consists of rainfall and runoff data, measured from an artificial catchment (area = 25 m2) comprising two overland planes (slope = 11%), 25 m long, transversely inclined towards a rectangular channel (slope = 2%) which conveyed the flow, recorded using calibrated weigh tanks, to the outlet. Two rain gauges, each placed 6.25 m from either end of the channel, were used to record rainfall. Data for six storm events over the period between October 2002 and December 2002 were analyzed. The second dataset was obtained from the Upper Bukit Timah catchment (area = 6.4 km2) instrumented with two rain gauges and a flow measuring station. A total of six events recorded between November 1987 and July 1988 were selected for this study. The runoff predicted by the ANN was compared with the measured runoff. In addition, results from KW models developed for both the catchments were used as a benchmark. The KW models were calibrated assuming the loss rate for an average event for each of the datasets. The results from both the ANN and KW models agreed well with the runoff measured from the artificial catchment. The KW model is expected to perform well since the catchment
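    The black-box loss model above, an ANN mapping the zero-loss KW discharge to the measured discharge, can be sketched with a tiny one-hidden-layer network trained by full-batch gradient descent. The synthetic "observed" series below (an initial-loss/proportional-loss curve) is an assumption standing in for the measured data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in data: zero-loss KW discharge q_kw in, observed
# discharge q_obs out, with an assumed initial-loss/proportional-loss law.
q_kw = np.linspace(0.0, 1.0, 200)[:, None]
q_obs = np.clip(0.7 * q_kw - 0.1, 0.0, None)

# One-hidden-layer tanh network trained by full-batch gradient descent.
W1 = rng.normal(scale=0.5, size=(1, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.2
for _ in range(6000):
    h = np.tanh(q_kw @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - q_obs
    # Backpropagation of the mean-squared-error gradients
    gW2 = h.T @ err / len(q_kw); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)
    gW1 = q_kw.T @ dh / len(q_kw); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1

rmse = np.sqrt(np.mean((np.tanh(q_kw @ W1 + b1) @ W2 + b2 - q_obs) ** 2))
print(rmse)
```

    Because the network sees only the zero-loss discharge and the measured discharge, whatever loss behavior exists is absorbed into the learned mapping, which is exactly the "black-box loss model" role described in the abstract.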

  5. Data-driven aerosol development in the GEOS-5 modeling and data assimilation system

    Science.gov (United States)

    Darmenov, A.; da Silva, A.; Liu, X.; Colarco, P. R.

    2013-12-01

    Atmospheric aerosols are important radiatively active agents that also affect clouds, atmospheric chemistry, the water cycle, land and ocean biogeochemistry. Furthermore, exposure to anthropogenic and/or natural fine particulates can have negative health effects. No single instrument or model is capable of quantifying the diverse and dynamic nature of aerosols at the range of spatial and temporal scales at which they interact with the other constituents and components of the Earth system. However, applying model-data integration techniques can minimize the limitations of individual data products and remedy model deficiencies. The Goddard Earth Observing System Model, Version 5 (GEOS-5) is the latest version of the NASA Global Modeling and Assimilation Office (GMAO) Earth system model. GEOS-5 is a modeling and data assimilation framework well suited for aerosol research. It is being used to perform aerosol reanalysis and near-real-time aerosol forecasts on a global scale at resolutions comparable to those of aerosol products from modern spaceborne instruments. The aerosol processes in GEOS-5 derive from the Goddard Chemistry Aerosol Radiation and Transport (GOCART) model but are implemented on-line, within the climate model. GEOS-5 aerosol modeling capabilities have recently been enhanced by inclusion of the Modal Aerosol Microphysics module (MAM-7) originally developed in the Community Earth System Model (CESM). This work will present examples of data-driven model development, including refined parameterization of sea-salt emissions, tuning of biomass burning emissions from vegetation fires, and the effect of the updated emissions on the modeled direct aerosol forcing. We will also present results from GEOS-5/MAM-7 model evaluation against AOD and particulate pollution datasets, and outline future directions for aerosol data assimilation in the GEOS-5 system.

  6. Data Science and its Relationship to Big Data and Data-Driven Decision Making.

    Science.gov (United States)

    Provost, Foster; Fawcett, Tom

    2013-03-01

    Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot, even "sexy," career choice. However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz. In this article, we argue that there are good reasons why it has been hard to pin down exactly what data science is. One reason is that data science is intricately intertwined with other important concepts also of growing importance, such as big data and data-driven decision making. Another reason is the natural tendency to associate what a practitioner does with the definition of the practitioner's field; this can result in overlooking the fundamentals of the field. We believe that trying to define the boundaries of data science precisely is not of the utmost importance. We can debate the boundaries of the field in an academic setting, but in order for data science to serve business effectively, it is important (i) to understand its relationships to other important related concepts, and (ii) to begin to identify the fundamental principles underlying data science. Once we embrace (ii), we can much better understand and explain exactly what data science has to offer. Furthermore, only once we embrace (ii) should we be comfortable calling it data science. In this article, we present a perspective that addresses all these concepts. We close by offering, as examples, a partial list of fundamental principles underlying data science.

  7. A hybrid evolutionary data driven model for river water quality early warning.

    Science.gov (United States)

    Burchard-Levine, Alejandra; Liu, Shuming; Vince, Francois; Li, Mingming; Ostfeld, Avi

    2014-10-01

    China's fast-paced industrialization and growing population have led to several accidental surface water pollution events in the last decades. The government of China, after the 2005 Songhua River incident, has pushed for the development of early warning systems (EWS) for drinking water source protection. However, there are still many weaknesses in EWS in China, such as the lack of pollution monitoring and advanced water quality prediction models. The application of Data Driven Models (DDM) such as Artificial Neural Networks (ANN) has acquired recent attention as an alternative to physical models. For a case study in a southern industrial city in China, a DDM based on a genetic algorithm (GA) and ANN was tested to increase the warning lead time of the city's EWS. The GA-ANN model was used to predict the NH3-N, CODmn and TOC variables at station B 2 h ahead of time while identifying the most sensitive input variables available at station A, 12 km upstream. For NH3-N, the most sensitive input variables were TOC, CODmn, TP, NH3-N and Turbidity, with model performance giving a mean square error (MSE) of 0.0033, mean percent error (MPE) of 6% and regression (R) of 92%. For COD, the most sensitive input variables were Turbidity and CODmn, with model performance giving a MSE of 0.201, MPE of 5% and R of 0.87. For TOC, the most sensitive input variables were Turbidity and CODmn, with model performance giving a MSE of 0.101, MPE of 2% and R of 0.94. In addition, the GA-ANN model performed better for predictions 8 h ahead of time. For future studies, the GA-ANN modelling technique could be very useful for water quality prediction at Chinese monitoring stations, which already measure and make immediately available the necessary water quality data.

  8. Full field reservoir modeling of shale assets using advanced data-driven analytics

    Directory of Open Access Journals (Sweden)

    Soodabeh Esmaili

    2016-01-01

    Full Text Available Hydrocarbon production from shale has attracted much attention in recent years. When applied to these prolific and hydrocarbon-rich resource plays, our understanding of the complexities of the flow mechanism (the sorption process and flow behavior in complex fracture systems, induced or natural) leaves much to be desired. In this paper, we present and discuss a novel approach to the modeling and history matching of hydrocarbon production from a Marcellus shale asset in southwestern Pennsylvania using advanced data mining, pattern recognition and machine learning technologies. In this new approach, instead of imposing our understanding of the flow mechanism, the impact of multi-stage hydraulic fractures, and the production process on the reservoir model, we allow the production history, well log, completion and hydraulic fracturing data to guide our model and determine its behavior. The uniqueness of this technology is that it incorporates the so-called "hard data" directly into the reservoir model, so that the model can be used to optimize the hydraulic fracture process. The "hard data" refers to field measurements during the hydraulic fracturing process, such as fluid and proppant type and amount, injection pressure and rate, as well as proppant concentration. This novel approach contrasts with the current industry focus on the use of "soft data" (non-measured, interpretive data such as fracture length, width, height and conductivity) in reservoir models. The study focuses on a Marcellus shale asset that includes 135 wells with multiple pads, different landing targets, well lengths and reservoir properties. The full-field history matching process was successfully completed using this data-driven approach, thus capturing the production behavior with acceptable accuracy for individual wells and for the entire asset.

  9. Application of a data-driven simulation method to the reconstruction of the coronal magnetic field

    Institute of Scientific and Technical Information of China (English)

    Yu-Liang Fan; Hua-Ning Wang; Han He; Xiao-Shuai Zhu

    2012-01-01

Ever since the magnetohydrodynamic (MHD) method for extrapolation of the solar coronal magnetic field was first developed to study the dynamic evolution of twisted magnetic flux tubes, it has proven to be efficient in the reconstruction of the solar coronal magnetic field. A recent example is the so-called data-driven simulation method (DDSM), which has been demonstrated to be valid by an application to model analytic solutions such as a force-free equilibrium given by Low and Lou. We use DDSM for the observed magnetograms to reconstruct the magnetic field above an active region. To avoid an unnecessary sensitivity to boundary conditions, we use a classical total variation diminishing Lax-Friedrichs formulation to iteratively compute the full MHD equations. In order to incorporate a magnetogram consistently and stably, the bottom boundary conditions are derived from the characteristic method. In our simulation, we change the tangential fields continually from an initial potential field to the vector magnetogram. In the relaxation, the initial potential field is changed to a nonlinear magnetic field until the MHD equilibrium state is reached. Such a stable equilibrium is expected to be able to represent the solar atmosphere at a specified time. By inputting the magnetograms before and after the X3.4 flare that occurred on 2006 December 13, we find a topological change after comparing the magnetic field before and after the flare. Some discussions are given regarding the change of magnetic configuration and current distribution. Furthermore, we compare the reconstructed field line configuration with the coronal loop observations by XRT onboard Hinode. The comparison shows a relatively good correlation.

  10. A 1985-2015 data-driven global reconstruction of GRACE total water storage

    Science.gov (United States)

    Humphrey, Vincent; Gudmundsson, Lukas; Isabelle Seneviratne, Sonia

    2016-04-01

After thirteen years of measurements, the Gravity Recovery and Climate Experiment (GRACE) mission has enabled an unprecedented view of total water storage (TWS) variability. However, the relatively short record length, irregular time steps and multiple data gaps since 2011 still represent important limitations to a wider use of this dataset within the hydrological and climatological community, especially for applications such as model evaluation or assimilation of GRACE in land surface models. To address this issue, we make use of the available GRACE record (2002-2015) to infer local statistical relationships between detrended monthly TWS anomalies and the main controlling atmospheric drivers (e.g. daily precipitation and temperature) at 1 degree resolution (Humphrey et al., in revision). Long-term and homogeneous monthly time series of detrended anomalies in total water storage are then reconstructed for the period 1985-2015. The quality of this reconstruction is evaluated in two different ways. First, we perform a cross-validation experiment to assess the performance and robustness of the statistical model. Second, we compare with independent basin-scale estimates of TWS anomalies derived by means of a combined atmospheric and terrestrial water balance using atmospheric water vapor flux convergence and change in atmospheric water vapor content (Mueller et al. 2011). The reconstructed time series are shown to provide robust data-driven estimates of global variations in water storage over large regions of the world. Example applications are provided for illustration, including an analysis of selected major drought events which occurred before the GRACE era.
References: Humphrey V, Gudmundsson L, Seneviratne SI (in revision) Assessing global water storage variability from GRACE: trends, seasonal cycle, sub-seasonal anomalies and extremes. Surv Geophys. Mueller B, Hirschi M, Seneviratne SI (2011) New diagnostic estimates of variations in terrestrial water storage

  11. Discovering Outliers of Potential Drug Toxicities Using a Large-scale Data-driven Approach.

    Science.gov (United States)

    Luo, Jake; Cisler, Ron A

    2016-01-01

    We systematically compared the adverse effects of cancer drugs to detect event outliers across different clinical trials using a data-driven approach. Because many cancer drugs are toxic to patients, better understanding of adverse events of cancer drugs is critical for developing therapies that could minimize the toxic effects. However, due to the large variabilities of adverse events across different cancer drugs, methods to efficiently compare adverse effects across different cancer drugs are lacking. To address this challenge, we present an exploration study that integrates multiple adverse event reports from clinical trials in order to systematically compare adverse events across different cancer drugs. To demonstrate our methods, we first collected data on 186,339 clinical trials from ClinicalTrials.gov and selected 30 common cancer drugs. We identified 1602 cancer trials that studied the selected cancer drugs. Our methods effectively extracted 12,922 distinct adverse events from the clinical trial reports. Using the extracted data, we ranked all 12,922 adverse events based on their prevalence in the clinical trials, such as nausea 82%, fatigue 77%, and vomiting 75.97%. To detect the significant drug outliers that could have a statistically high possibility of causing an event, we used the boxplot method to visualize adverse event outliers across different drugs and applied Grubbs' test to evaluate the significance. Analyses showed that by systematically integrating cross-trial data from multiple clinical trial reports, adverse event outliers associated with cancer drugs can be detected. The method was demonstrated by detecting the following four statistically significant adverse event cases: the association of the drug axitinib with hypertension (Grubbs' test, P < 0.001), the association of the drug imatinib with muscle spasm (P < 0.001), the association of the drug vorinostat with deep vein thrombosis (P < 0.001), and the association of the drug afatinib
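The outlier-detection step described above (boxplot visualization followed by Grubbs' test) can be sketched in a few lines, using SciPy only for the Student-t quantile. The event rates below are illustrative, not drawn from the paper's trial data.

```python
import math
from scipy import stats

def grubbs_test(values, alpha=0.05):
    """Two-sided Grubbs' test for one outlier: returns (G, G_crit, is_outlier)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    g = max(abs(v - mean) for v in values) / sd          # most extreme deviation
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)          # critical t-quantile
    g_crit = (n - 1) / math.sqrt(n) * math.sqrt(t ** 2 / (n - 2 + t ** 2))
    return g, g_crit, g > g_crit

# Hypothetical hypertension rates (%) for one adverse event across several drugs:
rates = [4.0, 5.0, 3.5, 4.5, 40.0]
g, g_crit, outlier = grubbs_test(rates, alpha=0.05)
```

With these toy numbers the last drug's rate exceeds the critical value and would be flagged, mirroring the paper's drug-event outlier cases.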

  12. WaveSeq: a novel data-driven method of detecting histone modification enrichments using wavelets.

    Directory of Open Access Journals (Sweden)

    Apratim Mitra

Full Text Available BACKGROUND: Chromatin immunoprecipitation followed by next-generation sequencing is a genome-wide analysis technique that can be used to detect various epigenetic phenomena such as transcription factor binding sites and histone modifications. Histone modification profiles can be either punctate or diffuse, which makes it difficult to distinguish regions of enrichment from background noise. With the discovery of histone marks having a wide variety of enrichment patterns, there is an urgent need for analysis methods that are robust to various data characteristics and capable of detecting a broad range of enrichment patterns. RESULTS: To address these challenges we propose WaveSeq, a novel data-driven method of detecting regions of significant enrichment in ChIP-Seq data. Our approach utilizes the wavelet transform, is free of distributional assumptions and is robust to diverse data characteristics such as low signal-to-noise ratios and broad enrichment patterns. Using publicly available datasets we showed that WaveSeq compares favorably with other published methods, exhibiting high sensitivity and precision for both punctate and diffuse enrichment regions even in the absence of a control data set. The application of our algorithm to a complex histone modification data set helped make novel functional discoveries which further underlined its utility in such an experimental setup. CONCLUSIONS: WaveSeq is a highly sensitive method capable of accurate identification of enriched regions in a broad range of data sets. WaveSeq can detect both narrow and broad peaks with a high degree of accuracy even in low signal-to-noise ratio data sets. WaveSeq is also suited for application in complex experimental scenarios, helping make biologically relevant functional discoveries.
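WaveSeq's exact algorithm is not reproduced here, but the core idea of scanning a coverage signal with wavelets at several scales can be sketched with a Ricker (Mexican-hat) wavelet; the kernel sizing and the toy coverage profile below are illustrative assumptions, not the published implementation.

```python
import numpy as np

def ricker(points, width):
    """Ricker (Mexican-hat) wavelet sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    return (1 - (t / width) ** 2) * np.exp(-0.5 * (t / width) ** 2)

def cwt(signal, widths):
    """Simplified continuous wavelet transform: one coefficient row per width."""
    return np.array([
        np.convolve(signal, ricker(min(10 * w, len(signal)), w), mode="same")
        for w in widths
    ])

# Toy coverage profile: flat background with one broad enrichment region.
x = np.arange(200)
coverage = 1.0 + 5.0 * np.exp(-((x - 100) / 15.0) ** 2)
coeffs = cwt(coverage, widths=[5, 15, 45])
peak = int(np.argmax(coeffs[1]))  # strongest response at the matching scale
```

Because the Ricker wavelet integrates to zero, the flat background is suppressed and the broad enrichment region stands out at the matching scale; this is the property that lets wavelet methods handle both punctate and diffuse marks.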

  13. A data-driven approach for processing heterogeneous categorical sensor signals

    Science.gov (United States)

    Calderon, Christopher P.; Jones, Austin; Lundberg, Scott; Paffenroth, Randy

    2011-09-01

    False alarms generated by sensors pose a substantial problem to a variety of fusion applications. We focus on situations where the frequency of a genuine alarm is "rare" but the false alarm rate is high. The goal is to mitigate the false alarms while retaining power to detect true events. We propose to utilize data streams contaminated by false alarms (generated in the field) to compute statistics on a single sensor's misclassification rate. The nominal misclassification rate of a deployed sensor is often suspect because it is unlikely that these rates were tuned to the specific environmental conditions in which the sensor was deployed. Recent categorical measurement error methods will be applied to the collection of data streams to "train" the sensors and provide point estimates along with confidence intervals for the parameters characterizing sensor performance. By pooling a relatively small collection of random variables arising from a single sensor and using data-driven misclassification rate estimates along with estimated confidence bands, we show how one can transform the stream of categorical random variables into a test statistic with a limiting standard normal distribution. The procedure shows promise for normalizing sequences of misclassified random variables coming from different sensors (with a priori unknown population parameters) to comparable test statistics; this facilitates fusion through various downstream processing mechanisms. We have explored some possible downstream processing mechanisms that rely on false discovery rate (FDR) methods. The FDR methods exploit the test statistics we have computed in a chemical sensor fusion context where reducing false alarms and maintaining substantial power is important. FDR methods also provide a framework to fuse signals coming from non-chem/bio sensors in order to improve performance. Simulation results illustrating these ideas are presented. Extensions, future work and open problems are also briefly
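The false discovery rate control mentioned above is commonly implemented with the Benjamini-Hochberg step-up procedure; a minimal sketch follows, with illustrative per-sensor p-values rather than the paper's simulation output.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: indices of hypotheses rejected at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k = rank
    return sorted(order[:k])

# P-values from per-sensor test statistics (illustrative numbers):
pvals = [0.001, 0.013, 0.021, 0.04, 0.3, 0.6]
rejected = benjamini_hochberg(pvals, q=0.05)
```

Here the first three sensors clear their rank-scaled thresholds and are declared detections, while the rest are treated as false alarms; this is the kind of downstream fusion step the abstract describes.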

  14. Probing the dynamics of identified neurons with a data-driven modeling approach.

    Directory of Open Access Journals (Sweden)

    Thomas Nowotny

    Full Text Available In controlling animal behavior the nervous system has to perform within the operational limits set by the requirements of each specific behavior. The implications for the corresponding range of suitable network, single neuron, and ion channel properties have remained elusive. In this article we approach the question of how well-constrained properties of neuronal systems may be on the neuronal level. We used large data sets of the activity of isolated invertebrate identified cells and built an accurate conductance-based model for this cell type using customized automated parameter estimation techniques. By direct inspection of the data we found that the variability of the neurons is larger when they are isolated from the circuit than when in the intact system. Furthermore, the responses of the neurons to perturbations appear to be more consistent than their autonomous behavior under stationary conditions. In the developed model, the constraints on different parameters that enforce appropriate model dynamics vary widely from some very tightly controlled parameters to others that are almost arbitrary. The model also allows predictions for the effect of blocking selected ionic currents and to prove that the origin of irregular dynamics in the neuron model is proper chaoticity and that this chaoticity is typical in an appropriate sense. Our results indicate that data driven models are useful tools for the in-depth analysis of neuronal dynamics. The better consistency of responses to perturbations, in the real neurons as well as in the model, suggests a paradigm shift away from measuring autonomous dynamics alone towards protocols of controlled perturbations. Our predictions for the impact of channel blockers on the neuronal dynamics and the proof of chaoticity underscore the wide scope of our approach.

  15. Data Driven Trigger Design and Analysis for the NOvA Experiment

    Energy Technology Data Exchange (ETDEWEB)

    Kurbanov, Serdar [Univ. of Virginia, Charlottesville, VA (United States)

    2016-01-01

This thesis primarily describes an analysis of the Moon shadow in cosmic rays, an analysis using upward-going muon trigger data, and other work done as part of MSc thesis research conducted at Fermi National Accelerator Laboratory (Fermilab). While at Fermilab I made hardware and software contributions to two experiments, NOvA and Mu2e. NOvA is a neutrino experiment with the primary goal of measuring parameters related to neutrino oscillation. This is a running experiment, so it is possible to provide analysis of real beam and cosmic data. Most of this work was related to the Data-Driven Trigger (DDT) system of NOvA. The results of the upward-going muon analysis were presented at ICHEP in August 2016. The analysis demonstrates a proof of principle for a low-mass dark matter search. Mu2e is an experiment currently being built at Fermilab. Its primary goal is to detect the hypothetical neutrinoless conversion of a muon into an electron. I contributed to the production and tests of Cathode Strip Chambers (CSCs), which are required for testing the Cosmic Ray Veto (CRV) system for the experiment. This contribution is described in the last chapter, along with a short description of the technical work provided for the DDT system of the NOvA experiment. All of the work described in this thesis will be extended by the next generation of UVA graduate students and postdocs as new data is collected by the experiments. I hope my efforts have helped lay the foundation for many years of beautiful results from Mu2e and NOvA.

  16. Towards a purely data-driven view on the global carbon cycle and its spatiotemporal variability

    Science.gov (United States)

    Zscheischler, Jakob; Mahecha, Miguel; Reichstein, Markus; Avitabile, Valerio; Carvalhais, Nuno; Ciais, Philippe; Gans, Fabian; Gruber, Nicolas; Hartmann, Jens; Herold, Martin; Jung, Martin; Landschützer, Peter; Laruelle, Goulven; Lauerwald, Ronny; Papale, Dario; Peylin, Philippe; Regnier, Pierre; Rödenbeck, Christian; Cuesta, Rosa Maria Roman; Valentini, Ricardo

    2015-04-01

Constraining carbon (C) fluxes between the Earth's surface and the atmosphere at regional scale via observations is essential for understanding the Earth's carbon budget and predicting future atmospheric C concentrations. Carbon budgets have often been derived by merging observations, statistical models and process-based models, for example in the Global Carbon Project (GCP). However, it would be helpful to derive global C budgets and fluxes as independently of model assumptions as possible, to obtain an independent reference. Long-term in-situ measurements of land and ocean C stocks and fluxes have enabled the derivation of a new generation of data-driven upscaled data products. Here, we combine a wide range of in-situ derived estimates of terrestrial and aquatic C fluxes for one decade. The data were produced and/or collected during the FP7 project GEOCARBON and include surface-atmosphere C fluxes from the terrestrial biosphere, fossil fuels, fires, land use change, rivers, lakes, estuaries and the open ocean. By including spatially explicit uncertainties in each dataset we are able to identify regions that are well constrained by observations and areas where more measurements are required. Although the budget cannot be closed at the global scale, we provide, for the first time, global time-varying maps of the most important C fluxes, all directly derived from observations. The resulting spatiotemporal patterns of C fluxes and their uncertainties inform us about the need to intensify global C observation activities. Likewise, we provide priors for inversion exercises and a means to identify regions of high (and low) uncertainty in integrated C fluxes. We discuss the reasons for regions of high observational uncertainty, and for biases in the budget. Our data synthesis may also be used as an empirical reference for other local and global C budgeting exercises.

  17. Data-driven classification of hydrogeological conditions and its application in optimization problems in groundwater management

    Science.gov (United States)

    Fatkhutdinov, Aybulat

    2017-04-01

Decision support in many research fields, including surface water and groundwater management, often relies on optimization algorithms. However, automated model optimization may require significant computational resources and be very time consuming. On the other hand, each scenario simulation produces a large amount of data which can potentially be used to train a data-driven model that helps solve similar optimization problems more efficiently, e.g. by providing a preliminary likelihood distribution of the optimized variables. The main problem in applying any machine learning technique to the characterization of hydrogeological situations is the high variability of conditions, including aquifer hydraulic properties and geometries, interaction with surface water bodies, and artificial disturbance. The aim of this study is to find parameters that can be used as a training set for model learning, to apply them to various learning algorithms, and to test how strongly the performance of the subsequent optimization algorithm can be improved by supplementing it with a trained model. For the purposes of the experiment, synthetically generated groundwater models with varying parameters are used. The generated models simulate a common situation in which the optimum position and parameters of a designed well site have to be found. The set of model predictors includes the types, relative positions and properties of boundary conditions, and the aquifer properties and configuration. Target variables are the relative positions of wells and the ranges of their pumping/injection rates. Tested learning algorithms include neural networks, support vector machines and classification trees, supplemented by posterior likelihood estimation. A variation of an evolutionary algorithm is used for optimization.

  18. Kinetic modelling of [¹¹C]flumazenil using data-driven methods

    Energy Technology Data Exchange (ETDEWEB)

    Miederer, Isabelle; Ziegler, Sibylle I.; Liedtke, Christoph; Miederer, Matthias; Drzezga, Alexander [Technische Universitaet Muenchen, Department of Nuclear Medicine, Klinikum rechts der Isar, Munich (Germany); Spilker, Mary E. [GE Global Research, Computational Biology and Biostatistics Laboratory, Niscayuna, NY (United States); Sprenger, Till [Technische Universitaet Muenchen, Department of Neurology, Klinikum rechts der Isar, Munich (Germany); Wagner, Klaus J. [Technische Universitaet Muenchen, Department of Anaesthesiology, Klinikum rechts der Isar, Munich (Germany); Boecker, Henning [Universitaet Bonn, Department of Radiology, Bonn (Germany)

    2009-04-15

[¹¹C]Flumazenil (FMZ) is a benzodiazepine receptor antagonist that binds reversibly to central-type gamma-aminobutyric acid (GABA-A) sites. A validated approach for the analysis of [¹¹C]FMZ is the invasive one-tissue (1T) compartmental model. However, it would be advantageous to analyse FMZ binding with whole-brain pixel-based methods that do not require a priori hypotheses regarding preselected regions. Therefore, in this study we compared invasive and noninvasive data-driven methods (Logan graphical analysis, LGA; multilinear reference tissue model, MRTM2; spectral analysis, SA; basis pursuit denoising, BPD) with the 1T model. We focused on two aspects: (1) replacing the arterial input function analyses with a reference tissue method using the pons as the reference tissue, and (2) shortening the scan protocol from 90 min to 60 min. Dynamic PET scans were conducted in seven healthy volunteers with arterial blood sampling. Distribution volume ratios (DVRs) were selected as the common outcome measure. The SA, LGA with and without arterial input, and MRTM2 agreed best with the 1T model DVR values. The invasive and noninvasive BPD were slightly less well correlated. The full protocol of 90 min of emission data performed better than the 60-min protocol, but the 60-min protocol still delivered useful data, as assessed by the coefficient of variation, and the correlation and bias analyses. This study showed that the SA, LGA and MRTM2 are valid methods for the quantification of benzodiazepine receptor binding with [¹¹C]FMZ using an invasive or noninvasive protocol, and therefore have the potential to reduce the invasiveness of the procedure. (orig.)

  19. Identification of preseizure states in epilepsy: A data-driven approach for multichannel EEG recordings

    Directory of Open Access Journals (Sweden)

    Hinnerk eFeldwisch-Drentrup

    2011-07-01

Full Text Available The retrospective identification of preseizure states is usually based on a time-resolved characterization of dynamical aspects of multichannel neurophysiologic recordings that can be assessed with measures from linear or nonlinear time series analysis. This approach renders time profiles of a characterizing measure – so-called measure profiles – for different recording sites or combinations thereof. Various downstream evaluation techniques have been proposed to single out measure profiles that carry potential information about preseizure states. These techniques, however, rely on assumptions about seizure precursor dynamics that might not be generally valid, or face the statistical problem of multiple testing. Addressing these issues, we have developed a method to preselect measure profiles that carry potential information about preseizure states, and to identify brain regions associated with seizure precursor dynamics. Our data-driven method is based on the ratio S of the global to the local temporal variance of measure profiles. We evaluated its suitability by retrospectively analyzing long-lasting multichannel intracranial EEG recordings from 18 patients that included 133 focal onset seizures, using a bivariate measure for the strength of interactions. In 17/18 patients, we observed S to be significantly correlated with the predictive performance of measure profiles assessed retrospectively by means of receiver-operating-characteristic statistics. Predictive performance was higher for measure profiles preselected with S than for a manual selection using information about the onset and spread of seizures. Across patients, the highest predictive performance was not restricted to recordings from focal areas, thus supporting the notion of an extended epileptic network in which even distant brain regions contribute to seizure generation. We expect our method to provide further insight into the complex spatial and temporal aspects of the seizure generating
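The preselection statistic, a ratio S of the global to the local temporal variance of a measure profile, can be sketched as follows; the non-overlapping windowing used here is one plausible reading of such an estimator, not necessarily the authors' exact definition.

```python
import statistics

def variance_ratio(profile, window):
    """Ratio S of the profile's global variance to its mean within-window variance."""
    global_var = statistics.pvariance(profile)
    local_vars = [
        statistics.pvariance(profile[i:i + window])
        for i in range(0, len(profile) - window + 1, window)
    ]
    return global_var / statistics.mean(local_vars)

# A slowly drifting measure profile (candidate precursor dynamics) scores far
# above 1, while a stationary profile scores near 1.
drifting = [0.01 * t for t in range(100)]
s_drift = variance_ratio(drifting, window=10)
```

Intuitively, slow drifts inflate the global variance without inflating the short-window variance, so profiles with large S are the ones worth inspecting for precursor dynamics.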

  20. The Right to be Forgotten in the Media: A Data-Driven Study

    Directory of Open Access Journals (Sweden)

    Xue Minhui

    2016-10-01

Full Text Available Due to the recent “Right to be Forgotten” (RTBF) ruling, for queries about an individual, Google and other search engines now delist links to web pages that contain “inadequate, irrelevant or no longer relevant, or excessive” information about that individual. In this paper we take a data-driven approach to study the RTBF in the traditional media outlets, its consequences, and its susceptibility to inference attacks. First, we do a content analysis on 283 known delisted UK media pages, using both manual investigation and Latent Dirichlet Allocation (LDA). We find that the strongest topic themes are violent crime, road accidents, drugs, murder, prostitution, financial misconduct, and sexual assault. Informed by this content analysis, we then show how a third party can discover delisted URLs along with the requesters’ names, thereby putting the efficacy of the RTBF for delisted media links in question. As a proof of concept, we perform an experiment that discovers two previously-unknown delisted URLs and their corresponding requesters. We also determine 80 requesters for the 283 known delisted media pages, and examine whether they suffer from the “Streisand effect,” a phenomenon whereby an attempt to hide a piece of information has the unintended consequence of publicizing the information more widely. To measure the presence (or absence) of a Streisand effect, we develop novel metrics and methodology based on Google Trends and Twitter data. Finally, we carry out a demographic analysis of the 80 known requesters. We hope the results and observations in this paper can inform lawmakers as they refine RTBF laws in the future.

  1. Comparison of ACL strain estimated via a data-driven model with in vitro measurements.

    Science.gov (United States)

    Weinhandl, Joshua T; Hoch, Matthew C; Bawab, Sebastian Y; Ringleb, Stacie I

    2016-11-01

Computer modeling and simulation techniques have been increasingly used to investigate anterior cruciate ligament (ACL) loading during dynamic activities in an attempt to improve our understanding of injury mechanisms and the development of injury prevention programs. However, the accuracy of many of these models remains unknown, and thus the purpose of this study was to compare estimates of ACL strain from a previously developed three-dimensional, data-driven model with those obtained via in vitro measurements. ACL strain was measured as the knee was cycled from approximately 10° to 120° of flexion at 20 deg s⁻¹ with static loads of 100, 50, and 50 N applied to the quadriceps, biceps femoris and medial hamstrings (semimembranosus and semitendinosus) tendons, respectively. A two-segment, five-degree-of-freedom musculoskeletal knee model was then scaled to match the cadaver's anthropometry, and in silico ACL strains were determined based on the knee joint kinematics and moments of force. Maximum and minimum ACL strains estimated in silico were within 0.2% and 0.42%, respectively, of those measured in vitro. Additionally, the model estimated ACL strain with a bias (mean difference) of -0.03% and dynamic accuracy (rms error) of 0.36% across the flexion-extension cycle. These preliminary results suggest that the proposed model is capable of estimating ACL strains during a simple flexion-extension cycle. Future studies should validate the model under more dynamic conditions with variable muscle loading. This model could then be used to estimate ACL strains during dynamic sporting activities where ACL injuries are more common.
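The bias (mean difference) and rms-error figures quoted above are straightforward to compute from paired strain samples; the values below are illustrative, not the study's data.

```python
import math

def bias_and_rmse(estimated, measured):
    """Bias (mean difference) and RMS error between paired strain samples (%)."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse

# Hypothetical in silico vs. in vitro ACL strain samples (%):
in_silico = [2.1, 3.0, 4.2, 3.6]
in_vitro = [2.0, 3.2, 4.1, 3.7]
bias, rmse = bias_and_rmse(in_silico, in_vitro)
```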

  2. Data-Driven Synthesis for Investigating Food Systems Resilience to Climate Change

    Science.gov (United States)

    Magliocca, N. R.; Hart, D.; Hondula, K. L.; Munoz, I.; Shelley, M.; Smorul, M.

    2014-12-01

    The production, supply, and distribution of our food involves a complex set of interactions between farmers, rural communities, governments, and global commodity markets that link important issues such as environmental quality, agricultural science and technology, health and nutrition, rural livelihoods, and social institutions and equality - all of which will be affected by climate change. The production of actionable science is thus urgently needed to inform and prepare the public for the consequences of climate change for local and global food systems. Access to data that spans multiple sectors/domains and spatial and temporal scales is key to beginning to tackle such complex issues. As part of the White House's Climate Data Initiative, the USDA and the National Socio-Environmental Synthesis Center (SESYNC) are launching a new collaboration to catalyze data-driven research to enhance food systems resilience to climate change. To support this collaboration, SESYNC is developing a new "Data to Motivate Synthesis" program designed to engage early career scholars in a highly interactive and dynamic process of real-time data discovery, analysis, and visualization to catalyze new research questions and analyses that would not have otherwise been possible and/or apparent. This program will be supported by an integrated, spatially-enabled cyberinfrastructure that enables the management, intersection, and analysis of large heterogeneous datasets relevant to food systems resilience to climate change. Our approach is to create a series of geospatial abstraction data structures and visualization services that can be used to accelerate analysis and visualization across various socio-economic and environmental datasets (e.g., reconcile census data with remote sensing raster datasets). We describe the application of this approach with a pilot workshop of socio-environmental scholars that will lay the groundwork for the larger SESYNC-USDA collaboration. We discuss the

  3. Disruption of functional networks in dyslexia: A whole-brain, data-driven analysis of connectivity

    Science.gov (United States)

    Finn, Emily S.; Shen, Xilin; Holahan, John M.; Scheinost, Dustin; Lacadie, Cheryl; Papademetris, Xenophon; Shaywitz, Sally E.; Shaywitz, Bennett A.; Constable, R. Todd

    2013-01-01

    Background Functional connectivity analyses of fMRI data are a powerful tool for characterizing brain networks and how they are disrupted in neural disorders. However, many such analyses examine only one or a small number of a priori seed regions. Studies that consider the whole brain frequently rely on anatomic atlases to define network nodes, which may result in mixing distinct activation timecourses within a single node. Here, we improve upon previous methods by using a data-driven brain parcellation to compare connectivity profiles of dyslexic (DYS) versus non-impaired (NI) readers in the first whole-brain functional connectivity analysis of dyslexia. Methods Whole-brain connectivity was assessed in children (n = 75; 43 NI, 32 DYS) and adult (n = 104; 64 NI, 40 DYS) readers. Results Compared to NI readers, DYS readers showed divergent connectivity within the visual pathway and between visual association areas and prefrontal attention areas; increased right-hemisphere connectivity; reduced connectivity in the visual word-form area (part of the left fusiform gyrus specialized for printed words); and persistent connectivity to anterior language regions around the inferior frontal gyrus. Conclusions Together, findings suggest that NI readers are better able to integrate visual information and modulate their attention to visual stimuli, allowing them to recognize words based on their visual properties, while DYS readers recruit altered reading circuits and rely on laborious phonology-based “sounding out” strategies into adulthood. These results deepen our understanding of the neural basis of dyslexia and highlight the importance of synchrony between diverse brain regions for successful reading. PMID:24124929

  4. Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

    Science.gov (United States)

    Leistedt, Boris; Hogg, David W.

    2017-03-01

    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux–redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.

  5. Characterizing the (Perceived) Newsworthiness of Health Science Articles: A Data-Driven Approach

    Science.gov (United States)

    Willis, Erin; Paul, Michael J; Elhadad, Noémie; Wallace, Byron C

    2016-01-01

    Background Health science findings are primarily disseminated through manuscript publications. Information subsidies are used to communicate newsworthy findings to journalists in an effort to earn mass media coverage and further disseminate health science research to mass audiences. Journal editors and news journalists then select which news stories receive coverage and thus public attention. Objective This study aims to identify attributes of published health science articles that correlate with (1) journal editor issuance of press releases and (2) mainstream media coverage. Methods We constructed four novel datasets to identify factors that correlate with press release issuance and media coverage. These corpora include thousands of published articles, subsets of which received press release or mainstream media coverage. We used statistical machine learning methods to identify correlations between words in the science abstracts and press release issuance and media coverage. Further, we used a topic modeling-based machine learning approach to uncover latent topics predictive of the perceived newsworthiness of science articles. Results Both press release issuance for, and media coverage of, health science articles are predictable from corresponding journal article content. For the former task, we achieved average areas under the curve (AUCs) of 0.666 (SD 0.019) and 0.882 (SD 0.018) on two separate datasets, comprising 3024 and 10,760 articles, respectively. For the latter task, models realized mean AUCs of 0.591 (SD 0.044) and 0.783 (SD 0.022) on two datasets—in this case containing 422 and 28,910 pairs, respectively. We report the most predictive words and topics for press release issuance and news coverage. Conclusions We have presented a novel data-driven characterization of content that renders health science “newsworthy.” The analysis provides new insights into the news coverage selection process. For example, it appears epidemiological papers concerning common
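The prediction task described here (words in an abstract predicting press-release issuance) is essentially regularized text classification scored by AUC. The following is a minimal illustration, not the authors' pipeline: a tiny bag-of-words model, a logistic regression trained by gradient descent in numpy, and a rank-based AUC, all on invented toy abstracts.

```python
import numpy as np

def vectorize(docs, vocab):
    # Bag-of-words counts over a fixed vocabulary.
    return np.array([[doc.lower().split().count(w) for w in vocab] for doc in docs], float)

def train_logreg(X, y, lr=0.5, steps=500):
    # Plain gradient descent on the logistic loss.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

def auc(scores, y):
    # Rank-based AUC: probability a random positive outscores a random negative.
    pos, neg = scores[y == 1], scores[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

# Invented "abstracts": two newsworthy (label 1), two not (label 0).
docs = ["coffee lowers mortality risk in large cohort",
        "daily exercise cuts heart disease mortality",
        "novel enzyme assay protocol for lab use",
        "in vitro assay of enzyme kinetics"]
y = np.array([1, 1, 0, 0])
vocab = ["mortality", "coffee", "enzyme", "assay"]

X = vectorize(docs, vocab)
w, b = train_logreg(X, y)
scores = X @ w + b
```

On this separable toy data the learned weights make "mortality"-type words positive and "assay"-type words negative, so the ranking is perfect; real abstracts, as the AUCs above show, are far noisier.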

  6. A Causal, Data-driven Approach to Modeling the Kepler Data

    Science.gov (United States)

    Wang, Dun; Hogg, David W.; Foreman-Mackey, Daniel; Schölkopf, Bernhard

    2016-09-01

    Astronomical observations are affected by several kinds of noise, each with its own causal source; there is photon noise, stochastic source variability, and residuals coming from imperfect calibration of the detector or telescope. The precision of NASA Kepler photometry for exoplanet science—the most precise photometric measurements of stars ever made—appears to be limited by unknown or untracked variations in spacecraft pointing and temperature, and unmodeled stellar variability. Here, we present the causal pixel model (CPM) for Kepler data, a data-driven model intended to capture variability but preserve transit signals. The CPM works at the pixel level so that it can capture very fine-grained information about the variation of the spacecraft. The CPM models the systematic effects in the time series of a pixel using the pixels of many other stars and the assumption that any shared signal in these causally disconnected light curves is caused by instrumental effects. In addition, we use the target star’s future and past (autoregression). By appropriately separating, for each data point, the data into training and test sets, we ensure that information about any transit will be perfectly isolated from the model. The method has four tuning parameters—the number of predictor stars or pixels, the autoregressive window size, and two L2-regularization amplitudes for model components, which we set by cross-validation. We determine values for tuning parameters that work well for most of the stars and apply the method to a corresponding set of target stars. We find that CPM can consistently produce low-noise light curves. In this paper, we demonstrate that pixel-level de-trending is possible while retaining transit signals, and we think that methods like CPM are generally applicable and might be useful for K2, TESS, etc., where the data are not clean postage stamps like Kepler.
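The heart of CPM is linear prediction of a target pixel from many causally disconnected pixels with L2 regularization, trained only on time points outside the window being de-trended. The toy numpy sketch below (with invented signals, not Kepler data, and a simplified single train/test split) shows how a shared systematic is removed while an injected transit survives in the residual:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
t = np.arange(T)
systematic = 0.5 * np.sin(t / 15.0) + 0.3 * np.sin(t / 41.0)   # shared instrumental trend

# Predictor pixels from "causally disconnected" stars: same systematic
# trend at different scales, plus independent photon noise.
X = np.column_stack([a * systematic + 0.02 * rng.normal(size=T)
                     for a in rng.uniform(0.5, 2.0, 12)])

transit = np.zeros(T)
transit[90:100] = -0.2                                          # injected transit dip
y = 1.3 * systematic + transit + 0.02 * rng.normal(size=T)      # target light curve

# Train/test separation in time: the transit window is held out so the
# fit cannot absorb the transit itself.
test = np.zeros(T, bool)
test[85:105] = True
Xtr, ytr = X[~test], y[~test]

lam = 1e-2                                                      # L2 regularization amplitude
w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
residual = y - X @ w                                            # de-trended light curve
```

The regression soaks up the shared trend because it is present in every predictor, while the transit, being absent from the training data and from the predictors, remains in the residual.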

  7. Data-Driven Neural Network Model for Robust Reconstruction of Automobile Casting

    Science.gov (United States)

    Lin, Jinhua; Wang, Yanjie; Li, Xin; Wang, Lu

    2017-09-01

    In computer vision systems, robustly reconstructing the complex 3D geometries of automobile castings is a challenging task: 3D scanning data are usually corrupted by noise and the scanning resolution is low, which typically leads to incomplete matching and drift. To solve these problems, a data-driven local geometric learning model is proposed to achieve robust reconstruction of automobile castings. To reduce the interference of sensor noise and remain compatible with incomplete scanning data, a 3D convolutional neural network is established to match the local geometric features of automobile castings. The proposed network combines a geometric feature representation with a correlation metric function to robustly match local correspondences. We use the truncated distance field (TDF) around each key point to represent the 3D surface of the casting geometry, so that the model can be directly embedded into 3D space to learn the geometric feature representation. Finally, training labels are generated automatically for deep learning from an existing RGB-D reconstruction algorithm, which yields the same global key-matching descriptor. The experimental results show that the matching accuracy of our network is 92.2% for automobile castings, and the closed-loop rate is about 74.0% when the matching tolerance threshold τ is 0.2. The matching descriptors performed well, retaining 81.6% matching accuracy at a 95% closed-loop rate. For sparse casting geometries where initial matching fails, the 3D object can be reconstructed robustly by training the key descriptors. Our method performs robust 3D reconstruction of complex automobile castings.
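A truncated distance field around a key point is straightforward to compute from a point cloud. The sketch below is a hedged illustration (brute-force nearest-point distances on a toy spherical "surface", with assumed grid size and truncation values), not the paper's implementation:

```python
import numpy as np

def local_tdf(points, center, voxel_res=8, half_extent=1.0, trunc=0.5):
    """Truncated distance field on a voxel grid around a key point.
    points: (N, 3) surface samples; center: (3,) key-point location."""
    lin = np.linspace(-half_extent, half_extent, voxel_res)
    gx, gy, gz = np.meshgrid(lin, lin, lin, indexing="ij")
    grid = np.stack([gx, gy, gz], -1).reshape(-1, 3) + center
    # Brute-force distance from every voxel centre to the nearest surface point.
    d = np.sqrt(((grid[:, None, :] - points[None, :, :]) ** 2).sum(-1)).min(1)
    # Truncation caps the field so only near-surface geometry matters.
    return np.minimum(d, trunc).reshape(voxel_res, voxel_res, voxel_res)

# Toy "casting surface": points sampled on a unit sphere.
rng = np.random.default_rng(2)
v = rng.normal(size=(2000, 3))
sphere = v / np.linalg.norm(v, axis=1, keepdims=True)

tdf = local_tdf(sphere, center=np.array([1.0, 0.0, 0.0]))
```

The resulting voxel block (small values near the surface, truncated to a constant far from it) is exactly the kind of volumetric patch a 3D convolutional network can consume directly.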

  8. Data-driven behavioural modelling of residential water consumption to inform water demand management strategies

    Science.gov (United States)

    Giuliani, Matteo; Cominola, Andrea; Alshaf, Ahmad; Castelletti, Andrea; Anda, Martin

    2016-04-01

    The continuous expansion of urban areas worldwide is expected to substantially increase residential water demand over the next few years, ultimately challenging the distribution and supply of drinking water. Several studies have recently demonstrated that actions focused only on the water supply side of the problem (e.g., augmenting existing water supply infrastructure) will likely fail to meet future demands, thus calling for the concurrent deployment of effective water demand management strategies (WDMS) to pursue water savings and conservation. However, to be effective, WDMS require a substantial understanding of water consumers' behaviors and consumption patterns at different spatial and temporal resolutions. Retrieving information on users' behaviors, as well as their explanatory and/or causal factors, is key to identifying potential areas for targeting water-saving efforts and to designing user-tailored WDMS, such as education campaigns and personalized recommendations. In this work, we contribute a data-driven approach to identify household water users' consumption behavioural profiles and model their water use habits. State-of-the-art clustering methods are coupled with big-data machine learning techniques with the aim of extracting dominant behaviors from a set of water consumption data collected at the household scale. This allows identifying heterogeneous groups of consumers in the studied sample and characterizing them with respect to several consumption features. Our approach is validated on a real-world household water consumption dataset, associated with a variety of demographic and psychographic user data and household attributes, collected in nine towns of the Pilbara and Kimberley Regions of Western Australia. Results show the effectiveness of the proposed method in capturing the influence of candidate determinants on residential water consumption profiles and in attaining sufficiently accurate predictions of users' consumption behaviors, ultimately providing
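The clustering step described above can be sketched with a plain Lloyd's k-means in numpy. This is an illustration with invented household profiles (fractions of daily use in morning/midday/evening windows), not the study's data or its specific clustering method:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Lloyd's algorithm: alternate assignment and centroid update.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), 1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

# Toy household profiles: share of daily use in (morning, midday, evening).
rng = np.random.default_rng(3)
morning_peak = rng.normal([0.6, 0.2, 0.2], 0.05, (30, 3))
evening_peak = rng.normal([0.2, 0.2, 0.6], 0.05, (30, 3))
X = np.vstack([morning_peak, evening_peak])

labels, centers = kmeans(X, 2)
```

With well-separated behavioural groups the two recovered clusters correspond to the morning-peak and evening-peak households, which is the kind of segmentation that would feed user-tailored WDMS.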

  9. Data-driven haemodynamic response function extraction using Fourier-wavelet regularised deconvolution

    Directory of Open Access Journals (Sweden)

    Roerdink Jos BTM

    2008-04-01

    Full Text Available Abstract Background We present a simple, data-driven method to extract haemodynamic response functions (HRFs) from functional magnetic resonance imaging (fMRI) time series, based on the Fourier-wavelet regularised deconvolution (ForWaRD) technique. HRF data are required for many fMRI applications, such as defining region-specific HRFs, efficiently representing a general HRF, or comparing subject-specific HRFs. Results ForWaRD is applied to fMRI time signals, after removing low-frequency trends by a wavelet-based method, and the output of ForWaRD is a time series of volumes, containing the HRF in each voxel. Compared to more complex methods, this extraction algorithm requires few assumptions (separability of signal and noise in the frequency and wavelet domains, and the general linear model) and it is fast (HRF extraction from a single fMRI data set takes about the same time as spatial resampling). The extraction method is tested on simulated event-related activation signals, contaminated with noise from a time series of real MRI images. An application for HRF data is demonstrated in a simple event-related experiment: data are extracted from a region with significant effects of interest in a first time series. A continuous-time HRF is obtained by fitting a nonlinear function to the discrete HRF coefficients, and is then used to analyse a later time series. Conclusion With the parameters used in this paper, the extraction method presented here is very robust to changes in signal properties. Comparison of analyses with fitted HRFs and with a canonical HRF shows that a subject-specific, regional HRF significantly improves detection power. Sensitivity and specificity increase not only in the region from which the HRFs are extracted, but also in other regions of interest.
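As a hedged sketch of the Fourier half of ForWaRD (the wavelet shrinkage stage is omitted here), one can deconvolve a noisy event-related time series by the stimulus train with Tikhonov regularization in the frequency domain. The HRF shape, event times, and regularization constant below are invented for illustration:

```python
import numpy as np

def fourier_deconvolve(y, s, lam=0.05):
    # Tikhonov-regularised deconvolution in the Fourier domain:
    # H = Y S* / (|S|^2 + lam). This is only the Fourier stage of
    # ForWaRD; the wavelet-domain shrinkage stage is omitted.
    Y = np.fft.rfft(y)
    S = np.fft.rfft(s, len(y))
    H = Y * np.conj(S) / (np.abs(S) ** 2 + lam)
    return np.fft.irfft(H, len(y))

# Invented HRF-like kernel (gamma bump minus a small undershoot), 30 s at 1 Hz.
t = np.arange(30.0)
hrf = t ** 5 * np.exp(-t) / 120.0 - 0.1 * t ** 8 * np.exp(-t) / 40320.0
hrf /= hrf.max()

# Irregular (jittered) event onsets keep the deconvolution well posed.
n = 256
onsets = [3, 21, 40, 66, 85, 109, 131, 160, 178, 201, 215]
stim = np.zeros(n)
stim[onsets] = 1.0

rng = np.random.default_rng(4)
y = np.convolve(stim, hrf)[:n] + 0.02 * rng.normal(size=n)

h_est = fourier_deconvolve(y, stim)[:30]   # voxel-wise HRF estimate
```

The regularization term damps frequencies where the stimulus spectrum is weak; in the full method the remaining coloured noise would then be removed by wavelet-domain thresholding.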

  10. A data-driven processing scheme for the GPR signal analysis and noise patterns removal

    Science.gov (United States)

    Jeng, Yih; Chen, Chih-Sung

    2015-04-01

    GPR signal events are inevitably contaminated by a variety of noises. Noise waves degrade the quality of subsurface reflections, mask the reflections from targets, and may appear like true reflections. Some investigators have proposed ways to minimize the interference of specific noise events; however, a generalized noise removal methodology remains an open issue. In this study, we demonstrate an effective methodology for analyzing GPR data and suppressing noise events. The processing scheme is framed by the modified multidimensional ensemble empirical mode decomposition (MDEEMD), a multidimensional extension of the EMD algorithm. The MDEEMD is a data-driven time-frequency approach that has the advantage of handling nonlinear and non-stationary multichannel signals, and it outperforms univariate EMD algorithms with better uniformity, closer scale alignment, and more reliable intrinsic mode functions (IMFs). The procedure is implemented by performing the EEMD (ensemble empirical mode decomposition) in both directions of the B-scan GPR data set consecutively to obtain a 2D image matrix whose elements are images representing fragmentary features of the B-scan GPR data. The final 2D EEMD filter bank is achieved by applying the comparable minimal scale combination technique to the 2D image matrix. With velocity analysis and pattern recognition, the noise components can be distinguished from the signal components in the 2D EEMD filter bank. By subtracting the noise components from the filter bank and combining the remaining components, or by directly picking the signal components for final image reconstruction, the noise events in the B-scan are suppressed effectively while most of the true reflections remain. The developed approach provides an efficient alternative for GPR signal enhancement and can be applied to extract information from other noisy multidimensional geophysical data with limited modifications.

  11. Telling Anthropocene Tales: Localizing the impacts of global change using data-driven story maps

    Science.gov (United States)

    Mychajliw, A.; Hadly, E. A.

    2016-12-01

    Navigating the Anthropocene requires innovative approaches for generating scientific knowledge and for its communication outside academia. The global, synergistic nature of the environmental challenges we face - climate change, human population growth, biodiversity loss, pollution, invasive species and diseases - highlights the need for public outreach strategies that incorporate multiple scales and perspectives in an easily understandable and rapidly accessible format. Data-driven story-telling maps are optimal in that they can display variable geographic scales and their intersections with the environmental challenges relevant to both scientists and non-scientists. Maps are a powerful way to present complex data to all stakeholders. We present an overview of best practices in community-engaged scientific story-telling and data translation for policy-makers by reviewing three Story Map projects that map the geographic impacts of global change across multiple spatial and policy scales: the entire United States, the state of California, and the town of Pescadero, California. We document a chain of translation from a primary scientific manuscript to a policy document (Scientific Consensus Statement on Maintaining Humanity's Life Support Systems in the 21st Century) to a set of interactive ArcGIS Story Maps. We discuss the widening breadth of participants (students, community members) and audiences (White House, Governor's Office of California, California Congressional Offices, general public) involved. We highlight how scientists, through careful curation of popular news media articles and stakeholder interviews, can co-produce these communication modules with community partners such as non-governmental organizations and government agencies. The placement of scientific and citizens' everyday knowledge of global change into an appropriate geographic context allows for effective dissemination by political units such as congressional districts and agency management units

  12. Full field reservoir modeling of shale assets using advanced data-driven analytics

    Institute of Scientific and Technical Information of China (English)

    Soodabeh Esmaili; Shahab D. Mohaghegh

    2016-01-01

    Hydrocarbon production from shale has attracted much attention in recent years. Yet for these prolific and hydrocarbon-rich resource plays, our understanding of the complexities of the flow mechanism (the sorption process and flow behavior in complex fracture systems, induced or natural) leaves much to be desired. In this paper, we present and discuss a novel approach to modeling and history matching of hydrocarbon production from a Marcellus shale asset in southwestern Pennsylvania using advanced data mining, pattern recognition and machine learning technologies. In this new approach, instead of imposing our understanding of the flow mechanism, the impact of multi-stage hydraulic fractures, and the production process on the reservoir model, we allow the production history, well log, completion and hydraulic fracturing data to guide our model and determine its behavior. The uniqueness of this technology is that it incorporates the so-called “hard data” directly into the reservoir model, so that the model can be used to optimize the hydraulic fracture process. The “hard data” refers to field measurements during the hydraulic fracturing process, such as fluid and proppant type and amount, injection pressure and rate, and proppant concentration. This novel approach contrasts with the current industry focus on the use of “soft data” (non-measured, interpretive data such as frac length, width, height and conductivity) in reservoir models. The study focuses on a Marcellus shale asset that includes 135 wells with multiple pads, different landing targets, well lengths and reservoir properties. The full-field history matching process was successfully completed using this data-driven approach, capturing the production behavior with acceptable accuracy for individual wells and for the entire asset.

  13. Data-driven event-by-event respiratory motion correction using TOF PET list-mode centroid of distribution

    Science.gov (United States)

    Ren, Silin; Jin, Xiao; Chan, Chung; Jian, Yiqiang; Mulnix, Tim; Liu, Chi; E Carson, Richard

    2017-06-01

    Data-driven respiratory gating techniques were developed to correct for respiratory motion in PET studies, without the help of external motion tracking systems. Due to the greatly increased image noise in gated reconstructions, it is desirable to develop a data-driven event-by-event respiratory motion correction method. In this study, using the Centroid-of-distribution (COD) algorithm, we established a data-driven event-by-event respiratory motion correction technique using TOF PET list-mode data, and investigated its performance by comparing with an external system-based correction method. Ten human scans with the pancreatic β-cell tracer 18F-FP-(+)-DTBZ were employed. Data-driven respiratory motions in superior-inferior (SI) and anterior-posterior (AP) directions were first determined by computing the centroid of all radioactive events during each short time frame with further processing. The Anzai belt system was employed to record respiratory motion in all studies. COD traces in both SI and AP directions were first compared with Anzai traces by computing the Pearson correlation coefficients. Then, respiratory gated reconstructions based on either COD or Anzai traces were performed to evaluate their relative performance in capturing respiratory motion. Finally, based on correlations of displacements of organ locations in all directions and COD information, continuous 3D internal organ motion in SI and AP directions was calculated based on COD traces to guide event-by-event respiratory motion correction in the MOLAR reconstruction framework. Continuous respiratory correction results based on COD were compared with that based on Anzai, and without motion correction. Data-driven COD traces showed a good correlation with Anzai in both SI and AP directions for the majority of studies, with correlation coefficients ranging from 63% to 89%. Based on the determined respiratory displacements of pancreas between end-expiration and end-inspiration from gated
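The COD idea itself reduces to a per-frame centroid of event coordinates. The toy numpy sketch below (simulated 1D event positions with invented count rates and noise levels, not real list-mode data) shows why the centroid trace tracks a sinusoidal "breathing" displacement well enough to correlate strongly with a reference trace:

```python
import numpy as np

rng = np.random.default_rng(5)
fps = 10                     # short frames per second
n_frames = 600               # one minute of data
events_per_frame = 500       # toy list-mode count rate

t = np.arange(n_frames) / fps
true_motion = 8.0 * np.sin(2 * np.pi * 0.25 * t)   # SI displacement (mm), 0.25 Hz breathing

# COD trace: centroid of the (widely scattered) axial event positions
# recorded in each short frame.
cod = np.array([
    (pos + 30.0 * rng.normal(size=events_per_frame)).mean()
    for pos in true_motion
])

r = np.corrcoef(cod, true_motion)[0, 1]            # agreement with reference trace
```

Although each individual event position is dominated by the spatial spread of the activity, averaging hundreds of events per frame shrinks the centroid noise by roughly the square root of the count, so the respiratory waveform emerges clearly, mirroring the high COD-versus-Anzai correlations reported above.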

  14. Data-driven event-by-event respiratory motion correction using TOF PET list-mode centroid of distribution.

    Science.gov (United States)

    Ren, Silin; Jin, Xiao; Chan, Chung; Jian, Yiqiang; Mulnix, Tim; Liu, Chi; Carson, Richard E

    2017-06-21

    Data-driven respiratory gating techniques were developed to correct for respiratory motion in PET studies, without the help of external motion tracking systems. Due to the greatly increased image noise in gated reconstructions, it is desirable to develop a data-driven event-by-event respiratory motion correction method. In this study, using the Centroid-of-distribution (COD) algorithm, we established a data-driven event-by-event respiratory motion correction technique using TOF PET list-mode data, and investigated its performance by comparing with an external system-based correction method. Ten human scans with the pancreatic β-cell tracer (18)F-FP-(+)-DTBZ were employed. Data-driven respiratory motions in superior-inferior (SI) and anterior-posterior (AP) directions were first determined by computing the centroid of all radioactive events during each short time frame with further processing. The Anzai belt system was employed to record respiratory motion in all studies. COD traces in both SI and AP directions were first compared with Anzai traces by computing the Pearson correlation coefficients. Then, respiratory gated reconstructions based on either COD or Anzai traces were performed to evaluate their relative performance in capturing respiratory motion. Finally, based on correlations of displacements of organ locations in all directions and COD information, continuous 3D internal organ motion in SI and AP directions was calculated based on COD traces to guide event-by-event respiratory motion correction in the MOLAR reconstruction framework. Continuous respiratory correction results based on COD were compared with that based on Anzai, and without motion correction. Data-driven COD traces showed a good correlation with Anzai in both SI and AP directions for the majority of studies, with correlation coefficients ranging from 63% to 89%. Based on the determined respiratory displacements of pancreas between end-expiration and end-inspiration from gated

  15. Physicochemical Characteristics of Transferon™ Batches

    Directory of Open Access Journals (Sweden)

    Emilio Medina-Rivero

    2016-01-01

    Full Text Available Transferon, a biotherapeutic agent that has been used for the past 2 decades for diseases with an inflammatory component, has been approved by regulatory authorities in Mexico (COFEPRIS) for the treatment of patients with herpes infection. The active pharmaceutical ingredient (API) of Transferon is based on polydispersion of peptides that have been extracted from lysed human leukocytes by a dialysis process and a subsequent ultrafiltration step to select molecules below 10 kDa. To physicochemically characterize the drug product, we developed chromatographic methods and an SDS-PAGE approach to analyze the composition and the overall variability of Transferon. Reversed-phase chromatographic profiles of peptide populations demonstrated batch-to-batch consistency from 10 representative batches that harbored 4 primary peaks with a relative standard deviation (RSD) of less than 7%. Aminogram profiles exhibited 17 proteinogenic amino acids and showed that glycine was the most abundant amino acid, with a relative content of approximately 18%. Further, based on their electrophoretic migration, the peptide populations exhibited a molecular mass of about 10 kDa. Finally, we determined the Transferon fingerprint using a mass spectrometry tool. Because each batch was produced from independent pooled buffy coat samples from healthy donors, supplied by a local blood bank, our results support the consistency of the production of Transferon and reveal its peptide identity with regard to its physicochemical attributes.

  16. Physicochemical Characteristics of Transferon™ Batches

    Science.gov (United States)

    Pérez-Sánchez, Gilberto; Favari, Liliana; Estrada-Parra, Sergio

    2016-01-01

    Transferon, a biotherapeutic agent that has been used for the past 2 decades for diseases with an inflammatory component, has been approved by regulatory authorities in Mexico (COFEPRIS) for the treatment of patients with herpes infection. The active pharmaceutical ingredient (API) of Transferon is based on polydispersion of peptides that have been extracted from lysed human leukocytes by a dialysis process and a subsequent ultrafiltration step to select molecules below 10 kDa. To physicochemically characterize the drug product, we developed chromatographic methods and an SDS-PAGE approach to analyze the composition and the overall variability of Transferon. Reversed-phase chromatographic profiles of peptide populations demonstrated batch-to-batch consistency from 10 representative batches that harbored 4 primary peaks with a relative standard deviation (RSD) of less than 7%. Aminogram profiles exhibited 17 proteinogenic amino acids and showed that glycine was the most abundant amino acid, with a relative content of approximately 18%. Further, based on their electrophoretic migration, the peptide populations exhibited a molecular mass of about 10 kDa. Finally, we determined the Transferon fingerprint using a mass spectrometry tool. Because each batch was produced from independent pooled buffy coat samples from healthy donors, supplied by a local blood bank, our results support the consistency of the production of Transferon and reveal its peptide identity with regard to its physicochemical attributes. PMID:27525277

  17. Physicochemical Characteristics of Transferon™ Batches.

    Science.gov (United States)

    Medina-Rivero, Emilio; Vallejo-Castillo, Luis; Vázquez-Leyva, Said; Pérez-Sánchez, Gilberto; Favari, Liliana; Velasco-Velázquez, Marco; Estrada-Parra, Sergio; Pavón, Lenin; Pérez-Tapia, Sonia Mayra

    2016-01-01

    Transferon, a biotherapeutic agent that has been used for the past 2 decades for diseases with an inflammatory component, has been approved by regulatory authorities in Mexico (COFEPRIS) for the treatment of patients with herpes infection. The active pharmaceutical ingredient (API) of Transferon is based on polydispersion of peptides that have been extracted from lysed human leukocytes by a dialysis process and a subsequent ultrafiltration step to select molecules below 10 kDa. To physicochemically characterize the drug product, we developed chromatographic methods and an SDS-PAGE approach to analyze the composition and the overall variability of Transferon. Reversed-phase chromatographic profiles of peptide populations demonstrated batch-to-batch consistency from 10 representative batches that harbored 4 primary peaks with a relative standard deviation (RSD) of less than 7%. Aminogram profiles exhibited 17 proteinogenic amino acids and showed that glycine was the most abundant amino acid, with a relative content of approximately 18%. Further, based on their electrophoretic migration, the peptide populations exhibited a molecular mass of about 10 kDa. Finally, we determined the Transferon fingerprint using a mass spectrometry tool. Because each batch was produced from independent pooled buffy coat samples from healthy donors, supplied by a local blood bank, our results support the consistency of the production of Transferon and reveal its peptide identity with regard to its physicochemical attributes.

  18. Keeping Quality of Strawberry Batches

    NARCIS (Netherlands)

    Schouten, R.E.; Kooten, van O.

    2001-01-01

    Post-harvest life of strawberries is largely limited by Botrytis cinerea infection. It is assumed that there are two factors influencing the batch keeping quality: the botrytis pressure and the resistance of the strawberry against infection. The latter factor will be discussed here. A model is

  19. Batching System for Superior Service

    Science.gov (United States)

    2001-01-01

    Veridian's Portable Batch System (PBS) was the recipient of the 1997 NASA Space Act Award for outstanding software. A batch system is a set of processes for managing queues and jobs. Without a batch system, it is difficult to manage the workload of a computer system. By bundling the enterprise's computing resources, the PBS technology offers users a single coherent interface, resulting in efficient management of the batch services. Users choose which information to package into "containers" for system-wide use. PBS also provides detailed system usage data, a procedure not easily executed without this software. PBS operates on networked, multi-platform UNIX environments. Veridian's new version, PBS Pro™, has additional features and enhancements, including support for additional operating systems. Veridian distributes the original version of PBS as Open Source software via the PBS website. Customers can register and download the software at no cost. PBS Pro is also available via the web and offers additional features such as increased stability, reliability, and fault tolerance. A company using PBS can expect a significant increase in the effective management of its computing resources. Tangible benefits include increased utilization of costly resources and enhanced understanding of computational requirements and user needs.

  20. NDA BATCH 2002-02

    Energy Technology Data Exchange (ETDEWEB)

    Lawrence Livermore National Laboratory

    2009-12-09

    QC sample results (daily background checks, 20-gram and 100-gram SGS drum checks) were within acceptable criteria established by WIPP's Quality Assurance Objectives for TRU Waste Characterization. Replicate runs were performed on 5 drums with IDs LL85101099TRU, LL85801147TRU, LL85801109TRU, LL85300999TRU and LL85500979TRU. All replicate measurement results are identical at the 95% confidence level as established by WIPP criteria. Note that the batch covered 5 weeks of SGS measurements from 23-Jan-2002 through 22-Feb-2002. Data packet for SGS Batch 2002-02 generated using gamma spectroscopy with the Pu Facility SGS unit is technically reasonable. All QC samples are in compliance with established control limits. The batch data packet has been reviewed for correctness, completeness, consistency and compliance with WIPP's Quality Assurance Objectives and determined to be acceptable. An Expert Review was performed on the data packet between 28-Feb-02 and 09-Jul-02 to check for potential U-235, Np-237 and Am-241 interferences and address drum cases where specific scan segments showed Se gamma ray transmissions for the 136-keV gamma to be below 0.1 %. Two drums in the batch showed Pu-238 at a relative mass ratio more than 2% of all the Pu isotopes.

  1. Unravelling abiotic and biotic controls on the seasonal water balance using data-driven dimensionless diagnostics

    Science.gov (United States)

    Seibert, Simon Paul; Jackisch, Conrad; Ehret, Uwe; Pfister, Laurent; Zehe, Erwin

    2017-06-01

    The baffling diversity of runoff generation processes, alongside our sketchy understanding of how physiographic characteristics control the fundamental hydrological functions of water collection, storage, and release, continues to pose major research challenges in catchment hydrology. Here, we propose innovative data-driven diagnostic signatures for overcoming the prevailing status quo in catchment inter-comparison. More specifically, we present dimensionless double mass curves (dDMC), which allow inference of information on runoff generation and the water balance at the seasonal and annual timescales. By separating the vegetation and winter periods, dDMC furthermore provide information on the role of biotic and abiotic controls in seasonal runoff formation. A key aspect we address in this paper is the derivation of dimensionless expressions of fluxes which ensure the comparability of the signatures in space and time. We achieve this by using the limiting factors of a hydrological process as a scaling reference. We show that different references result in different diagnostics. As such, we define two kinds of dDMC which allow us to derive seasonal runoff coefficients and to characterize dimensionless streamflow release as a function of the potential renewal rate of the soil storage. We expect these signatures for storage-controlled seasonal runoff formation to remain invariant as long as the ratios of release over supply and supply over storage capacity develop similarly in different catchments. We test the proposed methods by applying them to an operational data set comprising 22 catchments (12–166 km²) from different environments in southern Germany and hydrometeorological data from 4 hydrological years. The diagnostics are used to compare the sites and to reveal the dominant controls on runoff formation. The key findings are that dDMC are meaningful signatures for catchment runoff formation at the seasonal to annual scale and that the type of scaling strongly
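One plausible reading of a dimensionless double mass curve is cumulative streamflow plotted against cumulative precipitation, both scaled by total precipitation, so that the final ordinate equals the seasonal runoff coefficient. The sketch below encodes that reading with invented uniform daily data; the paper's actual scaling references (limiting factors such as the storage renewal rate) are richer than this:

```python
import numpy as np

def dimensionless_double_mass(p, q):
    # Cumulative streamflow against cumulative precipitation, both scaled
    # by total precipitation, so curves from different catchments or years
    # become directly comparable; the final ordinate is the runoff
    # coefficient for the period.
    cp = np.cumsum(p) / p.sum()
    cq = np.cumsum(q) / p.sum()
    return cp, cq

# Invented season: uniform rainfall, catchment releasing 40% of the supply.
p = np.full(365, 2.0)    # precipitation, mm/day
q = np.full(365, 0.8)    # streamflow, mm/day

cp, cq = dimensionless_double_mass(p, q)
runoff_coefficient = cq[-1]
```

Breaks in the slope of such a curve flag changes in how the catchment partitions supply into release and storage, which is what makes the signature useful for separating vegetation-period from winter-period behaviour.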

  2. Data-Driven Microbial Modeling for Soil Carbon Decomposition and Stabilization

    Science.gov (United States)

    Luo, Yiqi; Chen, Ji; Chen, Yizhao; Feng, Wenting

    2017-04-01

    Microorganisms have long been known to catalyze almost all the soil organic carbon (SOC) transformation processes (e.g., decomposition, stabilization, and mineralization). Representing microbial processes in Earth system models (ESMs) has the potential to improve projections of SOC dynamics. We have recently examined (1) relationships of microbial functions with environmental factors and (2) microbial regulations of decomposition and other key soil processes. Based on three lines of evidence, we have developed a data-driven enzyme (DENZY) model to simulate soil microbial decomposition and stabilization. First, our meta-analysis of 64 published field studies showed that field experimental warming significantly increased soil microbial community abundance, an effect negatively correlated with mean annual temperature. The negative correlation indicates that warming had stronger effects in colder than in warmer regions. Second, we found that SOC decomposition, especially the transfer between labile SOC and protected SOC, is nonlinearly regulated by soil texture parameters, such as sand and silt contents. Third, we conducted a global analysis of C-degrading enzyme activities, soil respiration, and SOC content under N addition. Our results show that N addition has contrasting effects on cellulase (hydrolytic C-degrading enzymes) and ligninase (oxidative C-degrading enzymes) activities. N-enhanced cellulase activity contributes to the minor stimulation of soil respiration, whereas N-induced repression of ligninase activity drives soil C sequestration. Our analysis links microbial extracellular C-degrading enzymes to SOC dynamics at ecosystem scales across scores of experimental sites around the world. It offers direct evidence that N-induced changes in microbial community and physiology play fundamental roles in controlling the soil C cycle. Built upon those three lines of empirical evidence, the DENZY model includes two enzyme pools and explicitly

  3. A novel data-driven prognostic model for staging of colorectal cancer.

    Science.gov (United States)

    Manilich, Elena A; Kiran, Ravi P; Radivoyevitch, Tomas; Lavery, Ian; Fazio, Victor W; Remzi, Feza H

    2011-11-01

    The aim of this study was to develop a novel prognostic model that captures complex interplay among clinical and histologic factors to predict survival of patients with colorectal cancer after a radical potentially curative resection. Survival data of 2,505 colon cancer and 2,430 rectal cancer patients undergoing radical colorectal resection between 1969 and 2007 were analyzed by random forest technology. The effect of TNM and non-TNM factors such as histologic grade, lymph node ratio (number positive/number resected), type of operation, neoadjuvant and adjuvant treatment, American Society of Anesthesiologists (ASA) class, and age in staging and prognosis were evaluated. A forest of 1,000 random survival trees was grown using log-rank splitting. Competing risk-adjusted random survival forest methods were used to maximize survival prediction and produce importance measures of the predictor variables. Competing risk-adjusted 5-year survival after resection of colon and rectal cancer was dominated by pT stage (ie, tumor infiltration depth) and lymph node ratio. Increased lymph node ratio was associated with worse survival within the same pT stage for both colon and rectal cancer patients. Whereas survival for colon cancer was affected by ASA grade, the type of resection and neoadjuvant therapy had a strong effect on rectal cancer survival. A similar pattern in predicted survival rates was observed for patients with fewer than 12 lymph nodes examined. Our model suggests that lymph node ratio remains a significant predictor of survival in this group. A novel data-driven methodology predicts the survival times of patients with colorectal cancer and identifies patterns of cancer characteristics. The methods lead to stage groupings that could redefine the composition of TNM in a simple and orderly way. 
The higher predictive power of lymph node ratio as compared with traditional pN lymph node stage has specific implications and may address the important question of accuracy

  4. Data-driven spatio-temporal RGBD feature encoding for action recognition in operating rooms.

    Science.gov (United States)

    Twinanda, Andru P; Alkan, Emre O; Gangi, Afshin; de Mathelin, Michel; Padoy, Nicolas

    2015-06-01

    Context-aware systems for the operating room (OR) provide the possibility to significantly improve surgical workflow through various applications such as efficient OR scheduling, context-sensitive user interfaces, and automatic transcription of medical procedures. Being an essential element of such a system, surgical action recognition is thus an important research area. In this paper, we tackle the problem of classifying surgical actions from video clips that capture the activities taking place in the OR. We acquire recordings using a multi-view RGBD camera system mounted on the ceiling of a hybrid OR dedicated to X-ray-based procedures and annotate clips of the recordings with the corresponding actions. To recognize the surgical actions from the video clips, we use a classification pipeline based on the bag-of-words (BoW) approach. We propose a novel feature encoding method that extends the classical BoW approach. Instead of using the typical rigid grid layout to divide the space of the feature locations, we propose to learn the layout from the actual 4D spatio-temporal locations of the visual features. This results in a data-driven and non-rigid layout which retains more spatio-temporal information compared to the rigid counterpart. We classify multi-view video clips from a new dataset generated from 11-day recordings of real operations. This dataset is composed of 1734 video clips of 15 actions. These include generic actions (e.g., moving patient to the OR bed) and actions specific to the vertebroplasty procedure (e.g., hammering). The experiments show that the proposed non-rigid feature encoding method performs better than the rigid encoding one. The classifier's accuracy is increased by over 4 %, from 81.08 to 85.53 %. The combination of both intensity and depth information from the RGBD data provides more discriminative power in carrying out the surgical action recognition task as compared to using either one of them alone. Furthermore, the proposed non

  5. Current Trends in the Detection of Sociocultural Signatures: Data-Driven Models

    Energy Technology Data Exchange (ETDEWEB)

    Sanfilippo, Antonio P.; Bell, Eric B.; Corley, Courtney D.

    2014-09-15

    available that are shaping social computing as a strongly data-driven experimental discipline with an increasingly stronger impact on the decision-making process of groups and individuals alike. In this chapter, we review current advances and trends in the detection of sociocultural signatures. Specific embodiments of the issues discussed are provided with respect to the assessment of violent intent and sociopolitical contention. We begin by reviewing current approaches to the detection of sociocultural signatures in these domains. Next, we turn to the review of novel data harvesting methods for social media content. Finally, we discuss the application of sociocultural models to social media content, and conclude by commenting on current challenges and future developments.

  6. A data-driven model of biomarker changes in sporadic Alzheimer's disease.

    Science.gov (United States)

    Young, Alexandra L; Oxtoby, Neil P; Daga, Pankaj; Cash, David M; Fox, Nick C; Ourselin, Sebastien; Schott, Jonathan M; Alexander, Daniel C

    2014-09-01

    conversion from both mild cognitive impairment to Alzheimer's disease (P = 2.06 × 10(-7)) and cognitively normal to mild cognitive impairment (P = 0.033). The data-driven model we describe supports hypothetical models of biomarker ordering in amyloid-positive and APOE-positive subjects, but suggests that biomarker ordering in the wider population may diverge from this sequence. The model provides useful disease staging information across the full spectrum of disease progression, from cognitively normal to mild cognitive impairment to Alzheimer's disease. This approach has broad application across neurodegenerative disease, providing insights into disease biology, as well as staging and prognostication.

  7. Sign determination methods for the respiratory signal in data-driven PET gating

    Science.gov (United States)

    Bertolli, Ottavia; Arridge, Simon; Wollenweber, Scott D.; Stearns, Charles W.; Hutton, Brian F.; Thielemans, Kris

    2017-04-01

    Patient respiratory motion during PET image acquisition leads to blurring in the reconstructed images and may cause significant artifacts, resulting in decreased lesion detectability, inaccurate standard uptake value calculation and incorrect treatment planning in radiation therapy. To reduce these effects data can be regrouped into (nearly) ‘motion-free’ gates prior to reconstruction by selecting the events with respect to the breathing phase. This gating procedure therefore needs a respiratory signal: on current scanners it is obtained from an external device, whereas with data driven (DD) methods it can be directly obtained from the raw PET data. DD methods thus eliminate the use of external equipment, which is often expensive, needs prior setup and can cause patient discomfort, and they could also potentially provide increased fidelity to the internal movement. DD methods have been recently applied on PET data showing promising results. However, many methods provide signals whose direction with respect to the physical motion is uncertain (i.e. their sign is arbitrary), therefore a maximum in the signal could refer either to the end-inspiration or end-expiration phase, possibly causing inaccurate motion correction. In this work we propose two novel methods, CorrWeights and CorrSino, to detect the correct direction of the motion represented by the DD signal, that is obtained by applying principal component analysis (PCA) on the acquired data. They only require the PET raw data, and they rely on the assumption that one of the major causes of change in the acquired data related to the chest is respiratory motion in the axial direction, that generates a cranio-caudal motion of the internal organs. We also implemented two versions of a published registration-based method, that require image reconstruction. The methods were first applied on XCAT simulations, and later evaluated on cancer patient datasets monitored by the Varian Real-time Position ManagementTM (RPM
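    The sign-determination idea can be illustrated compactly. This is a generic sketch, not the authors' CorrWeights or CorrSino algorithms: a PCA-derived respiratory signal has an arbitrary sign, so it is correlated with a surrogate assumed to increase with cranio-caudal (axial) displacement and flipped if the correlation is negative; the surrogate and signals below are synthetic.

```python
# Hypothetical sign correction for a data-driven (DD) respiratory signal.
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def fix_sign(dd_signal, axial_surrogate):
    """Flip the DD signal so it co-varies with the axial-motion surrogate."""
    r = pearson(dd_signal, axial_surrogate)
    return dd_signal if r >= 0 else [-x for x in dd_signal]

# toy example: a PCA-derived signal that came out inverted
t = [i * 0.1 for i in range(100)]
axial = [math.sin(2 * math.pi * 0.25 * x) for x in t]   # breathing-like trace
dd = [-a + 0.01 for a in axial]                         # inverted, offset copy
corrected = fix_sign(dd, axial)
print(pearson(corrected, axial) > 0)                    # True after correction
```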

  8. Hybrid Modelling Approach to Prairie hydrology: Fusing Data-driven and Process-based Hydrological Models

    Science.gov (United States)

    Mekonnen, B.; Nazemi, A.; Elshorbagy, A.; Mazurek, K.; Putz, G.

    2012-04-01

    of process-based and data driven models can provide an alternative modelling approach to prairie hydrology. The approach is capable of representing the highly non-linear nature of the hydrological processes and in particular, the challenging response originating from the hydrologically non-contributing areas.

  9. NGBAuth - Next Generation Batch Authentication for long running batch jobs.

    CERN Document Server

    Juto, Zakarias

    2015-01-01

    This document describes the prototyping of a new solution for the CERN batch authentication of long running jobs. While the job submission requires valid user credentials, these have to be renewed due to long queuing and execution times. Described within is a new system which will guarantee a similar level of security as the old LSFAuth while simplifying the implementation and the overall architecture. The new system is being built on solid, streamlined and tested components (notably OpenSSL), and a priority has been to make it more generic in order to facilitate the evolution of the current system, such as the expected migration from LSF to Condor as the backend batch system.

  10. Knowledge Based Cloud FE simulation - data-driven material characterization guidelines for the hot stamping of aluminium alloys

    Science.gov (United States)

    Wang, Ailing; Zheng, Yang; Liu, Jun; El Fakir, Omer; Masen, Marc; Wang, Liliang

    2016-08-01

    The Knowledge Based Cloud FEA (KBC-FEA) simulation technique allows multiobjective FE simulations to be conducted on a cloud-computing environment, which effectively reduces computation time and expands the capability of FE simulation software. In this paper, a novel functional module was developed for the data mining of experimentally verified FE simulation results for metal forming processes obtained from KBC-FE. Through this functional module, the thermo-mechanical characteristics of a metal forming process were deduced, enabling a systematic and data-driven guideline for mechanical property characterization to be developed, which will directly guide the material tests for a metal forming process towards the most efficient and effective scheme. Successful application of this data-driven guideline would reduce the efforts for material characterization, leading to the development of more accurate material models, which in turn enhance the accuracy of FE simulations.

  11. Data-driven technology for engineering systems health management design approach, feature construction, fault diagnosis, prognosis, fusion and decisions

    CERN Document Server

    Niu, Gang

    2017-01-01

    This book introduces condition-based maintenance (CBM)/data-driven prognostics and health management (PHM) in detail, first explaining the PHM design approach from a systems engineering perspective, then summarizing and elaborating on the data-driven methodology for feature construction, as well as feature-based fault diagnosis and prognosis. The book includes a wealth of illustrations and tables to help explain the algorithms, as well as practical examples showing how to use this tool to solve situations for which analytic solutions are poorly suited. It equips readers to apply the concepts discussed in order to analyze and solve a variety of problems in PHM system design, feature construction, fault diagnosis and prognosis.

  12. Data-Driven Sampling Matrix Boolean Optimization for Energy-Efficient Biomedical Signal Acquisition by Compressive Sensing.

    Science.gov (United States)

    Wang, Yuhao; Li, Xin; Xu, Kai; Ren, Fengbo; Yu, Hao

    2017-04-01

    Compressive sensing is widely used in biomedical applications, and the sampling matrix plays a critical role on both quality and power consumption of signal acquisition. It projects a high-dimensional vector of data into a low-dimensional subspace by matrix-vector multiplication. An optimal sampling matrix can ensure accurate data reconstruction and/or high compression ratio. Most existing optimization methods can only produce real-valued embedding matrices that result in large energy consumption during data acquisition. In this paper, we propose an efficient method that finds an optimal Boolean sampling matrix in order to reduce the energy consumption. Compared to random Boolean embedding, our data-driven Boolean sampling matrix can improve the image recovery quality by 9 dB. Moreover, in terms of sampling hardware complexity, it reduces the energy consumption by 4.6× and the silicon area by 1.9× over the data-driven real-valued embedding.
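    The projection step the abstract describes can be sketched as follows. This is only the random Boolean baseline the paper improves upon, not its data-driven optimization; dimensions and the signal are invented.

```python
# Minimal sketch of Boolean compressive sampling: y = B x with a 0/1 sampling
# matrix, which hardware can realize with additions only (no multipliers).
import random

def boolean_matrix(m, n, density=0.5, seed=0):
    """Random m x n Boolean sampling matrix (the baseline, not optimized)."""
    rng = random.Random(seed)
    return [[1 if rng.random() < density else 0 for _ in range(n)]
            for _ in range(m)]

def sample(B, x):
    """Project an n-dim signal to m dims: each measurement sums the signal
    entries selected by the Boolean row."""
    return [sum(xj for bij, xj in zip(row, x) if bij) for row in B]

n, m = 16, 4                      # 4x compression ratio
x = [float(i % 5) for i in range(n)]
B = boolean_matrix(m, n)
y = sample(B, x)
print(len(y))                     # 4 measurements from 16 samples
```

    A data-driven optimizer would then search over the 0/1 entries of B to improve reconstruction quality, which is where the reported 9 dB gain comes from.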

  13. A Data-Driven Stochastic Reactive Power Optimization Considering Uncertainties in Active Distribution Networks and Decomposition Method

    DEFF Research Database (Denmark)

    Ding, Tao; Yang, Qingrun; Yang, Yongheng

    2017-01-01

    To address the uncertain output of distributed generators (DGs) for reactive power optimization in active distribution networks, the stochastic programming model is widely used. The model is employed to find an optimal control strategy with minimum expected network loss while satisfying all......, in this paper, a data-driven modeling approach is introduced to assume that the probability distribution from the historical data is uncertain within a confidence set. Furthermore, a data-driven stochastic programming model is formulated as a two-stage problem, where the first-stage variables find the optimal...... control for discrete reactive power compensation equipment under the worst probability distribution of the second stage recourse. The second-stage variables are adjusted to uncertain probability distribution. In particular, this two-stage problem has a special structure so that the second-stage problem...

  14. Research on giant magnetostrictive actuator online nonlinear modeling based on data driven principle with grating sensing technique

    Science.gov (United States)

    Han, Ping

    2017-01-01

    A novel Giant Magnetostrictive Actuator (GMA) experimental system with the Fiber Bragg Grating (FBG) sensing technique and its modeling method based on the data-driven principle are proposed. FBG sensors are adopted to gather the multi-physics field status data of the GMA, considering the strong nonlinearity of the Giant Magnetostrictive Material and the GMA micro-actuated structure. Feedback features are obtained from the raw dynamic status data, which are preprocessed by data-filling and abnormal-value detection algorithms. Correspondingly, the Least Squares Support Vector Machine method is utilized to realize online nonlinear modeling of the GMA on the data-driven principle. The model performance and its related algorithms are experimentally evaluated. The model runs reliably in the frequency range from 10 to 1000 Hz and the temperature range from 20 to 100 °C, with the prediction error stable in the range from -1.2% to 1.1%.
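    A generic least-squares SVM regressor, the model family named above, can be sketched with the standard library alone. This is a textbook LS-SVM with an RBF kernel, not the authors' trained GMA model; the kernel width, regularization constant, and data are assumptions.

```python
# Hedged sketch of LS-SVM regression: solve the KKT system
# [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y], then predict with the kernel.
import math

def rbf(a, b, sigma=1.0):
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * sigma ** 2))

def solve(A, v):
    """Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def lssvm_fit(X, y, gamma=100.0, sigma=1.0):
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma            # ridge term on the diagonal
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[0], sol[1:]                   # bias b, support values alpha

def lssvm_predict(X, b, alpha, x, sigma=1.0):
    return b + sum(a * rbf(xi, x, sigma) for a, xi in zip(alpha, X))

# toy nonlinear input-output data standing in for an actuator response
X = [[i / 10.0] for i in range(20)]
y = [math.sin(x[0]) for x in X]
b, alpha = lssvm_fit(X, y)
pred = lssvm_predict(X, b, alpha, [0.55])
print(abs(pred - math.sin(0.55)) < 0.05)     # close fit inside training range
```

    An online variant, as in the paper, would re-solve or incrementally update this system as new FBG measurements arrive.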

  15. Batch-oriented software appliances

    CERN Document Server

    Murri, Riccardo

    2012-01-01

    This paper presents AppPot, a system for creating Linux software appliances. AppPot can be run as a regular batch or grid job and executed in user space, and requires no special virtualization support in the infrastructure. The main design goal of AppPot is to bring the benefits of a virtualization-based IaaS cloud to existing batch-oriented computing infrastructures. In particular, AppPot addresses application deployment and configuration on large heterogeneous computing infrastructures: users are enabled to prepare their own customized virtual appliance providing a safe execution environment for their applications. These appliances can then be executed on virtually any computing infrastructure, be it a private or public cloud, or any batch-controlled computing cluster the user may have access to. We give an overview of AppPot and its features, the technology that makes it possible, and report on experiences running it in production use within the Swiss National Grid infrastructure SMSCG.

  16. NDA Batch 2002-13

    Energy Technology Data Exchange (ETDEWEB)

    Hollister, R

    2009-09-17

    QC sample results (daily background check drum and 100-gram SGS check drum) were within acceptance criteria established by WIPP's Quality Assurance Objectives for TRU Waste Characterization. Replicate runs were performed on drum LL85501243TRU. Replicate measurement results are identical at the 95% confidence level as established by WIPP criteria. HWM NCAR No. 02-1000168 was issued on 17-Oct-2002 regarding a partially dislodged Cd sheet filter on the HPGe coaxial detector. This physical geometry occurred on 01-Oct-2002 and was not corrected until 10-Oct-2002, a period that includes the present batch run of drums. Per discussions among the Independent Technical Reviewer, Expert Reviewer and the Technical QA Supervisor, as well as in consultation with John Fleissner, Technical Point of Contact from Canberra, the analytical results are technically reliable. All QC standard runs during this period were in control. The data packet for SGS Batch 2002-13, generated using passive gamma-ray spectroscopy with the Pu Facility SGS unit, is technically reasonable. All QC samples are in compliance with established control limits. The batch data packet has been reviewed for correctness, completeness, consistency and compliance with WIPP's Quality Assurance Objectives and determined to be acceptable.

  17. Data driven analysis of rain events: feature extraction, clustering, microphysical /macro physical relationship

    Science.gov (United States)

    Djallel Dilmi, Mohamed; Mallet, Cécile; Barthes, Laurent; Chazottes, Aymeric

    2017-04-01

    that a rain time series can be considered as an alternation of independent rain events and no-rain periods. The five selected features are used to perform a hierarchical clustering of the events. The well-known division between stratiform and convective events appears clearly. This classification into two classes is then refined into 5 fairly homogeneous subclasses. The data-driven analysis, performed on whole rain events instead of fixed-length samples, allows identifying strong relationships between macrophysical (based on rain rate) and microphysical (based on raindrops) features. We show that some of the 5 identified subclasses have specific microphysical characteristics. Obtaining information on the microphysical characteristics of rainfall events from rain gauge measurements has many implications for the development of quantitative precipitation estimation (QPE) and for the improvement of rain-rate retrieval algorithms in a remote sensing context.
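    The clustering step can be illustrated with a minimal agglomerative procedure. This is a generic single-linkage sketch on two invented features (mean rain rate and duration), not the paper's five features or its exact linkage.

```python
# Toy hierarchical clustering of rain events into convective-like and
# stratiform-like groups.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

# toy events: (mean rain rate, duration) -- short/intense vs long/weak
events = [(20.0, 0.5), (25.0, 0.4), (22.0, 0.6),
          (2.0, 6.0), (1.5, 7.0), (2.5, 5.5)]
groups = single_linkage(events, 2)
print(sorted(sorted(g) for g in groups))   # [[0, 1, 2], [3, 4, 5]]
```

    The stratiform/convective split falls out immediately; refining to 5 subclasses would simply stop the merging later.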

  18. Batch-to-batch Optimization of Batch Crystallization Processes

    Institute of Scientific and Technical Information of China (English)

    Woranee Paengjuntuek; Paisan Kittisupakorn; Amornchai Arpornwichanop

    2008-01-01

    In practice, several process parameters are either unknown or uncertain. Therefore, an optimal control profile calculated with developed process models with respect to such process parameters may not give an optimal performance when implemented on real processes. This study proposes a batch-to-batch optimization strategy for the estimation of uncertain kinetic parameters in a batch crystallization process for potassium sulfate production. Knowledge of the crystal size distribution of the product at the end of batch operation is used in the proposed methodology. The updated kinetic parameters are applied to determine an optimal operating temperature policy for the next batch run.
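    The batch-to-batch loop can be sketched conceptually: after each batch, the uncertain parameter is re-estimated from the end-of-batch measurement and used to plan the next run. The growth model and numbers below are invented placeholders, not the potassium sulfate kinetics of the paper.

```python
# Conceptual batch-to-batch parameter update under a toy first-order growth law.

def simulate_batch(k_true, t_end=1.0):
    """'Plant': final mean crystal size under first-order growth (invented)."""
    return 1.0 * (1.0 + k_true * t_end)

def update_parameter(k_est, measured, t_end=1.0, step=0.5):
    """Nudge the model parameter toward the value explaining the measurement."""
    k_implied = (measured / 1.0 - 1.0) / t_end
    return k_est + step * (k_implied - k_est)

k_true, k_est = 2.0, 0.5          # plant kinetics vs. initial model guess
history = []
for batch in range(10):
    measured = simulate_batch(k_true)      # end-of-batch CSD information
    k_est = update_parameter(k_est, measured)
    history.append(k_est)
print(round(history[-1], 3))      # converges toward the true value 2.0
```

    In the real scheme, the re-estimated kinetics would feed an optimal-control solver that recomputes the temperature trajectory before the next batch.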

  19. The application of data mining and cloud computing techniques in data-driven models for structural health monitoring

    Science.gov (United States)

    Khazaeli, S.; Ravandi, A. G.; Banerji, S.; Bagchi, A.

    2016-04-01

    Recently, data-driven models for Structural Health Monitoring (SHM) have been of great interest among many researchers. In data-driven models, the sensed data are processed to determine the structural performance and evaluate the damages of an instrumented structure without necessitating the mathematical modeling of the structure. A framework of data-driven models for online assessment of the condition of a structure has been developed here. The developed framework is intended for automated evaluation of the monitoring data and structural performance by the Internet technology and resources. The main challenges in developing such framework include: (a) utilizing the sensor measurements to estimate and localize the induced damage in a structure by means of signal processing and data mining techniques, and (b) optimizing the computing and storage resources with the aid of cloud services. The main focus in this paper is to demonstrate the efficiency of the proposed framework for real-time damage detection of a multi-story shear-building structure in two damage scenarios (change in mass and stiffness) in various locations. Several features are extracted from the sensed data by signal processing techniques and statistical methods. Machine learning algorithms are deployed to select damage-sensitive features as well as classifying the data to trace the anomaly in the response of the structure. Here, the cloud computing resources from Amazon Web Services (AWS) have been used to implement the proposed framework.
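    The feature-extraction-plus-anomaly-detection idea can be sketched simply. This is a deliberately crude stand-in: the paper uses richer signal-processing features, trained classifiers, and AWS cloud resources, none of which are reproduced here; signals and thresholds are invented.

```python
# Toy damage-sensitive features and a threshold-based anomaly flag for SHM.
import math
import random
import statistics

def features(signal):
    return {
        "rms": math.sqrt(sum(x * x for x in signal) / len(signal)),
        "peak": max(abs(x) for x in signal),
        "std": statistics.pstdev(signal),
    }

def is_anomalous(feat, baseline, tol=1.5):
    """Flag if any feature exceeds its baseline by more than tol times."""
    return any(feat[k] > tol * baseline[k] for k in baseline)

rng = random.Random(1)
healthy = [math.sin(0.1 * i) + rng.gauss(0, 0.05) for i in range(500)]
# stiffness change modeled crudely as a larger-amplitude response
damaged = [2.5 * math.sin(0.1 * i) + rng.gauss(0, 0.05) for i in range(500)]

baseline = features(healthy)
print(is_anomalous(features(healthy), baseline))   # False
print(is_anomalous(features(damaged), baseline))   # True
```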

  20. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology - Part 1: Concepts and methodology

    Directory of Open Access Journals (Sweden)

    A. Elshorbagy

    2010-10-01

    A comprehensive data-driven modeling experiment is presented in a two-part paper. In this first part, an extensive data-driven modeling experiment is proposed. The most important concerns regarding the way data-driven modeling (DDM) techniques and data were handled, compared, and evaluated, and the basis on which findings and conclusions were drawn, are discussed. A concise review of key articles that presented comparisons among various DDM techniques is presented. Six DDM techniques, namely, neural networks, genetic programming, evolutionary polynomial regression, support vector machines, M5 model trees, and K-nearest neighbors, are proposed and explained. Multiple linear regression and naïve models are also suggested as a baseline for comparison with the various techniques. Five datasets from Canada and Europe representing evapotranspiration, upper and lower layer soil moisture content, and the rainfall-runoff process are described and proposed, in the second paper, for the modeling experiment. Twelve different realizations (groups) from each dataset are created by a procedure involving random sampling. Each group contains three subsets: training, cross-validation, and testing. Each modeling technique is proposed to be applied to each of the 12 groups of each dataset. This way, both the prediction accuracy and uncertainty of the modeling techniques can be evaluated. The description of the datasets, the implementation of the modeling techniques, results and analysis, and the findings of the modeling experiment are deferred to the second part of this paper.
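    The experimental design described above (12 random realizations per dataset, each split into training, cross-validation, and testing subsets) can be sketched directly. The split fractions below are assumptions for illustration; the paper does not specify them here.

```python
# Build 12 random realizations ("groups") of a dataset, each with
# train / cross-validation / test subsets.
import random

def make_realizations(records, n_groups=12, fracs=(0.5, 0.25, 0.25), seed=42):
    rng = random.Random(seed)
    n = len(records)
    n_train = int(fracs[0] * n)
    n_cv = int(fracs[1] * n)
    groups = []
    for _ in range(n_groups):
        shuffled = records[:]
        rng.shuffle(shuffled)             # random sampling per realization
        groups.append({
            "train": shuffled[:n_train],
            "cv": shuffled[n_train:n_train + n_cv],
            "test": shuffled[n_train + n_cv:],
        })
    return groups

data = list(range(100))                   # stand-in for one dataset
groups = make_realizations(data)
print(len(groups))                        # 12 realizations
print(len(groups[0]["train"]), len(groups[0]["cv"]), len(groups[0]["test"]))
```

    Evaluating each DDM technique on all 12 groups yields a distribution of scores per dataset, which is what lets the authors report uncertainty alongside accuracy.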

  1. Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method.

    Science.gov (United States)

    Zhang, Huaguang; Cui, Lili; Zhang, Xin; Luo, Yanhong

    2011-12-01

    In this paper, a novel data-driven robust approximate optimal tracking control scheme is proposed for unknown general nonlinear systems by using the adaptive dynamic programming (ADP) method. In the design of the controller, only available input-output data is required instead of known system dynamics. A data-driven model is established by a recurrent neural network (NN) to reconstruct the unknown system dynamics using available input-output data. By adding a novel adjustable term related to the modeling error, the resultant modeling error is first guaranteed to converge to zero. Then, based on the obtained data-driven model, the ADP method is utilized to design the approximate optimal tracking controller, which consists of the steady-state controller and the optimal feedback controller. Further, a robustifying term is developed to compensate for the NN approximation errors introduced by implementing the ADP method. Based on Lyapunov approach, stability analysis of the closed-loop system is performed to show that the proposed controller guarantees the system state asymptotically tracking the desired trajectory. Additionally, the obtained control input is proven to be close to the optimal control input within a small bound. Finally, two numerical examples are used to demonstrate the effectiveness of the proposed control scheme.

  2. A data-driven approach for modeling post-fire debris-flow volumes and their uncertainty

    Science.gov (United States)

    Friedel, M.J.

    2011-01-01

    This study demonstrates the novel application of genetic programming to evolve nonlinear post-fire debris-flow volume equations from variables associated with a data-driven conceptual model of the western United States. The search space is constrained using a multi-component objective function that simultaneously minimizes root-mean squared and unit errors for the evolution of fittest equations. An optimization technique is then used to estimate the limits of nonlinear prediction uncertainty associated with the debris-flow equations. In contrast to a published multiple linear regression three-variable equation, linking basin area with slopes greater or equal to 30 percent, burn severity characterized as area burned moderate plus high, and total storm rainfall, the data-driven approach discovers many nonlinear and several dimensionally consistent equations that are unbiased and have less prediction uncertainty. Of the nonlinear equations, the best performance (lowest prediction uncertainty) is achieved when using three variables: average basin slope, total burned area, and total storm rainfall. Further reduction in uncertainty is possible for the nonlinear equations when dimensional consistency is not a priority and by subsequently applying a gradient solver to the fittest solutions. The data-driven modeling approach can be applied to nonlinear multivariate problems in all fields of study. © 2011.
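    The model-selection core of the approach can be illustrated in miniature: candidate volume equations are scored by root-mean-squared error and the fittest kept. Real genetic programming evolves the equation forms themselves and adds a unit-error term; the candidates, variables, and data here are invented.

```python
# Toy fitness evaluation over a set of candidate debris-flow-volume equations.
import math

def rmse(f, data):
    errs = [(f(slope, area, rain) - vol) ** 2 for slope, area, rain, vol in data]
    return math.sqrt(sum(errs) / len(errs))

# synthetic observations generated from vol = 2*area*rain + 0.1*slope
data = [(s, a, r, 2.0 * a * r + 0.1 * s)
        for s in (10.0, 20.0) for a in (1.0, 3.0) for r in (5.0, 15.0)]

candidates = {
    "linear in rain": lambda s, a, r: 10.0 * r,
    "area x rain": lambda s, a, r: 2.0 * a * r + 0.1 * s,
    "slope only": lambda s, a, r: 3.0 * s,
}
scores = {name: rmse(f, data) for name, f in candidates.items()}
best = min(scores, key=scores.get)
print(best)                      # the generating equation wins with zero error
```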

  3. Photospheric Current Spikes and Their Possible Association with Flares - Results from an HMI Data Driven Model

    Science.gov (United States)

    Goodman, Michael; Kwan, Chiman; Ayhan, Bulent; Shang, Eric L.

    2016-01-01

    A data driven, near photospheric magnetohydrodynamic model predicts spikes in the horizontal current density, and associated resistive heating rate per unit volume Q. The spikes appear as increases by orders of magnitude above background values in neutral line regions (NLRs) of active regions (ARs). The largest spikes typically occur a few hours to a few days prior to M or X flares. The spikes correspond to large vertical derivatives of the horizontal magnetic field. The model takes as input the photospheric magnetic field observed by the Helioseismic & Magnetic Imager (HMI) on the Solar Dynamics Observatory (SDO) satellite. This 2.5 D field is used to determine an analytic expression for a 3 D magnetic field, from which the current density, vector potential, and electric field are computed in every AR pixel for 14 ARs. The field is not assumed to be force-free. The spurious 6, 12, and 24 hour Doppler periods due to SDO orbital motion are filtered out of the time series of the HMI magnetic field for each pixel using a band pass filter. The subset of spikes analyzed at the pixel level are found to occur on HMI and granulation scales of 1 arcsec and 12 minutes. Spikes are found in ARs with and without M or X flares, and outside as well as inside NLRs, but the largest spikes are localized in the NLRs of ARs with M or X flares. The energy to drive the heating associated with the largest current spikes comes from bulk flow kinetic energy, not the electromagnetic field, and the current density is highly non-force free. The results suggest that, in combination with the model, HMI is revealing strong, convection driven, non-force free heating events on granulation scales, and that it is plausible these events are correlated with subsequent M or X flares. More and longer time series need to be analyzed to determine if such a correlation exists. 
Above an AR dependent threshold value of Q, the number of events N(Q) with heating rates greater than or equal to Q obeys a scale

  4. BATCH SETTLING IN VERTICAL SETTLERS

    OpenAIRE

    Lama Ramirez, R.; Universidad Nacional Mayor De San Marcos Facultad de Química e Ingeniería Química Departamento de Operaciones Unitarias Av. Venezuela cdra. 34 sin, Lima - Perú; Condorhuamán Ccorimanya, C.; Universidad Nacional Mayor De San Marcos Facultad de Química e Ingeniería Química Departamento de Operaciones Unitarias Av. Venezuela cdra. 34 sin, Lima - Perú

    2014-01-01

    The batch sedimentation of aqueous suspensions of precipitated calcium carbonate, barium sulphate and lead oxide has been studied in vertical thickeners of rectangular and circular cross-sectional area. Suspensions vary in concentration between 19.4 and 617.9 g/l, and the rates of sedimentation obtained lie between 0.008 and 7.70 cm/min. The effect of the specific gravity of the solid on the rate of sedimentation is the same for all the suspensions, that is, the greater the value of the specif...

  5. Performance Monitoring of the Data-driven Subspace Predictive Control Systems Based on Historical Objective Function Benchmark

    Institute of Scientific and Technical Information of China (English)

    WANG Lu; LI Ning; LI Shao-Yuan

    2013-01-01

    In this paper, a historical objective function benchmark is proposed to monitor the performance of data-driven subspace predictive control systems. A new criterion for selection of the historical data set can be used to monitor the controller's performance, instead of using traditional methods based on prior knowledge. Under this monitoring framework, users can define their own index based on different demands and can also obtain the historical benchmark with a better sensitivity. Finally, a distillation column simulation example is used to illustrate the validity of the proposed algorithms.
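    The benchmark idea can be reduced to a simple ratio: average the objective-function values logged over a well-performing historical window and compare the current cost against that benchmark. A minimal sketch (the index and the cost numbers below are hypothetical, not the paper's actual criterion):

    ```python
    import statistics

    def performance_index(historical_costs, current_cost):
        """Ratio of the historical benchmark cost to the current achieved cost.
        Values near 1 mean the controller still matches its benchmark period;
        values well below 1 flag performance degradation."""
        return statistics.mean(historical_costs) / current_cost

    # Hypothetical quadratic control costs logged during a well-tuned period.
    history = [2.1, 1.9, 2.0, 2.2, 1.8]
    print(round(performance_index(history, 2.0), 2))   # close to 1: healthy
    print(round(performance_index(history, 8.0), 2))   # well below 1: degraded
    ```

    A user-defined index would replace the plain mean with whatever statistic suits the application, which is the flexibility the paper's framework emphasizes.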

  6. CEREF: A hybrid data-driven model for forecasting annual streamflow from a socio-hydrological system

    Science.gov (United States)

    Zhang, Hongbo; Singh, Vijay P.; Wang, Bin; Yu, Yinghao

    2016-09-01

    Hydrological forecasting is complicated by flow regime alterations in a coupled socio-hydrologic system, encountering increasingly non-stationary, nonlinear and irregular changes, which make decision support difficult for future water resources management. Currently, many hybrid data-driven models, based on the decomposition-prediction-reconstruction principle, have been developed to improve the ability to make predictions of annual streamflow. However, many problems require further investigation, chief among which is that the direction of trend components decomposed from an annual streamflow series is always difficult to ascertain. In this paper, a hybrid data-driven model was proposed to address this issue, which combined empirical mode decomposition (EMD), radial basis function neural networks (RBFNN), and an external forces (EF) variable, also called the CEREF model. The hybrid model employed EMD for decomposition and RBFNN for intrinsic mode function (IMF) forecasting, and determined future trend component directions by regression with EF as basin water demand, representing the social component in the socio-hydrologic system. The Wuding River basin was considered for the case study, and two standard statistical measures, root mean squared error (RMSE) and mean absolute error (MAE), were used to evaluate the performance of the CEREF model and compare it with other models: the autoregressive (AR), RBFNN and EMD-RBFNN. Results indicated that the CEREF model had lower RMSE and MAE statistics, 42.8% and 7.6%, respectively, than did other models, and provided a superior alternative for forecasting annual runoff in the Wuding River basin. Moreover, the CEREF model can enlarge the effective intervals of streamflow forecasting compared to the EMD-RBFNN model by introducing the water demand planned by the government department to improve long-term prediction accuracy. In addition, we considered the high-frequency component, a frequent subject of concern in EMD
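    The decomposition-prediction-reconstruction principle means each extracted component is forecast separately and the forecasts are summed back into a streamflow prediction. A minimal sketch of the decompose/reconstruct bookkeeping, using a centered moving average as a crude stand-in for EMD (the paper's actual decomposition):

    ```python
    def moving_average(x, w):
        """Centered moving average: a hypothetical stand-in for the slowly
        varying trend (residue) that EMD would extract."""
        half = w // 2
        out = []
        for i in range(len(x)):
            window = x[max(0, i - half):i + half + 1]
            out.append(sum(window) / len(window))
        return out

    def decompose(series, w=5):
        """Split a series into trend + detail so that trend + detail == series."""
        trend = moving_average(series, w)
        detail = [s - t for s, t in zip(series, trend)]
        return trend, detail

    # Reconstruction: per-component forecasts are summed back into one forecast.
    series = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]
    trend, detail = decompose(series)
    assert all(abs(t + d - s) < 1e-9 for t, d, s in zip(trend, detail, series))
    ```

    In CEREF the detail components are forecast by RBF networks and the trend direction is regressed against the external-forces (water demand) variable; this sketch only shows that the split is lossless, so reconstruction recovers the series exactly.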

  7. The Distribution and Abundance of Bird Species: Towards a Satellite, Data Driven Avian Energetics and Species Richness Model

    Science.gov (United States)

    Smith, James A.

    2003-01-01

    This paper addresses the fundamental question of why birds occur where and when they do, i.e., what are the causative factors that determine the spatio-temporal distributions, abundance, or richness of bird species? In this paper we outline the first steps toward building a satellite, data-driven model of avian energetics and species richness based on individual bird physiology, morphology, and interaction with the spatio-temporal habitat. To evaluate our model, we will use the North American Breeding Bird Survey and Christmas Bird Count data for species richness, wintering and breeding range. Long term and current satellite data series include AVHRR, Landsat, and MODIS.

  8. Integrating a calibrated groundwater flow model with error-correcting data-driven models to improve predictions

    Science.gov (United States)

    Demissie, Yonas K.; Valocchi, Albert J.; Minsker, Barbara S.; Bailey, Barbara A.

    2009-01-01

    Physically-based groundwater models (PBMs), such as MODFLOW, contain numerous parameters which are usually estimated using statistically-based methods, which assume that the underlying error is white noise. However, because of the practical difficulties of representing all the natural subsurface complexity, numerical simulations are often prone to large uncertainties that can result in both random and systematic model error. The systematic errors can be attributed to conceptual, parameter, and measurement uncertainty, and most often it can be difficult to determine their physical cause. In this paper, we have developed a framework to handle systematic error in physically-based groundwater flow model applications that uses error-correcting data-driven models (DDMs) in a complementary fashion. The data-driven models are separately developed to predict the MODFLOW head prediction errors, which were subsequently used to update the head predictions at existing and proposed observation wells. The framework is evaluated using a hypothetical case study developed based on a phytoremediation site at the Argonne National Laboratory. This case study includes structural, parameter, and measurement uncertainties. In terms of bias and prediction uncertainty range, the complementary modeling framework has shown substantial improvements (up to 64% reduction in RMSE and prediction error ranges) over the original MODFLOW model, in both the calibration and the verification periods. Moreover, the spatial and temporal correlations of the prediction errors are significantly reduced, thus resulting in reduced local biases and structures in the model prediction errors.
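    The complementary idea is: keep the physically-based model as-is, fit a separate data-driven model to its residuals, and add the predicted error back onto the head prediction. A minimal sketch with a linear error model in place of the paper's data-driven models (all numbers hypothetical):

    ```python
    def fit_bias_correction(pbm_preds, observations):
        """Least-squares fit of a linear error model e ~ a*h + b to the
        residuals of a physically-based model, returning a function that
        produces corrected predictions (a stand-in for the paper's DDMs)."""
        n = len(pbm_preds)
        errs = [o - p for p, o in zip(pbm_preds, observations)]
        mx = sum(pbm_preds) / n
        me = sum(errs) / n
        var = sum((p - mx) ** 2 for p in pbm_preds)
        a = sum((p - mx) * (e - me) for p, e in zip(pbm_preds, errs)) / var if var else 0.0
        b = me - a * mx
        return lambda h: h + a * h + b   # corrected head = PBM head + predicted error

    # Hypothetical MODFLOW heads whose systematic error grows with head value.
    heads = [10.0, 12.0, 14.0, 16.0]
    obs = [h + 0.5 + 0.1 * h for h in heads]
    correct = fit_bias_correction(heads, obs)
    assert max(abs(correct(h) - o) for h, o in zip(heads, obs)) < 1e-9
    ```

    Because the toy bias is exactly linear, the correction recovers the observations; in practice the DDM only reduces, rather than eliminates, the systematic error.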

  9. Data-driven modeling and predictive control for boiler-turbine unit using fuzzy clustering and subspace methods.

    Science.gov (United States)

    Wu, Xiao; Shen, Jiong; Li, Yiguo; Lee, Kwang Y

    2014-05-01

    This paper develops a novel data-driven fuzzy modeling strategy and predictive controller for a boiler-turbine unit using fuzzy clustering and subspace identification (SID) methods. To deal with the nonlinear behavior of the boiler-turbine unit, fuzzy clustering is used to provide an appropriate division of the operation region and develop the structure of the fuzzy model. Then, by combining the input data with the corresponding fuzzy membership functions, the SID method is extended to extract the local state-space model parameters. Owing to the advantages of both methods, the resulting fuzzy model can represent the boiler-turbine unit very closely, and a fuzzy model predictive controller is designed based on this model. As an alternative approach, a direct data-driven fuzzy predictive control is also developed following the same clustering and subspace methods, where intermediate subspace matrices developed during the identification procedure are utilized directly as the predictor. Simulation results show the advantages and effectiveness of the proposed approach.
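    The core mechanism is that local models identified in each operating region are blended by fuzzy membership weights, so the global model interpolates smoothly between them. A minimal sketch with scalar local models in place of the paper's local state-space models (centers and gains are hypothetical):

    ```python
    def memberships(x, centers, m=2.0):
        """Fuzzy c-means style membership of operating point x in each cluster
        (m is the usual fuzziness exponent; small epsilon avoids zero division)."""
        d = [abs(x - c) + 1e-12 for c in centers]
        return [1.0 / sum((di / dj) ** (2 / (m - 1)) for dj in d) for di in d]

    def fuzzy_model(x, centers, local_models):
        """Blend local models with membership weights: the weighted sum is the
        global model output at operating point x."""
        u = memberships(x, centers)
        return sum(ui * f(x) for ui, f in zip(u, local_models))

    # Two hypothetical operating regions with different local gains.
    centers = [0.0, 10.0]
    local = [lambda x: 2.0 * x, lambda x: 5.0 + 1.5 * x]
    print(round(fuzzy_model(0.0, centers, local), 3))    # dominated by model 1
    print(round(fuzzy_model(10.0, centers, local), 3))   # dominated by model 2
    ```

    At a cluster center the corresponding local model dominates; halfway between centers the two are weighted equally, which is what gives the fuzzy model its smooth nonlinear behavior.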

  10. A data-driven adaptive Reynolds-averaged Navier-Stokes k-ω model for turbulent flow

    Science.gov (United States)

    Li, Zhiyong; Zhang, Huaibao; Bailey, Sean C. C.; Hoagg, Jesse B.; Martin, Alexandre

    2017-09-01

    This paper presents a new data-driven adaptive computational model for simulating turbulent flow, where partial but incomplete measurement data is available. The model automatically adjusts the closure coefficients of the Reynolds-averaged Navier-Stokes (RANS) k-ω turbulence equations to improve agreement between the simulated flow and the measurements. This data-driven adaptive RANS k-ω (D-DARK) model is validated with 3 canonical flow geometries: pipe flow, backward-facing step, and flow around an airfoil. For all test cases, the D-DARK model improves agreement with experimental data in comparison to the results from a non-adaptive RANS k-ω model that uses standard values of the closure coefficients. For the pipe flow, adaptation is driven by mean stream-wise velocity data from 42 measurement locations along the pipe radius, and the D-DARK model reduces the average error from 5.2% to 1.1%. For the 2-dimensional backward-facing step, adaptation is driven by mean stream-wise velocity data from 100 measurement locations at 4 cross-sections of the flow. In this case, D-DARK reduces the average error from 40% to 12%. For the NACA 0012 airfoil, adaptation is driven by surface-pressure data at 25 measurement locations. The D-DARK model reduces the average error in surface-pressure coefficients from 45% to 12%.
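    The adaptation loop can be illustrated independently of any flow solver: treat the solver as a black box parameterized by a closure coefficient and search for the value that minimizes disagreement with the measurements. A minimal sketch with a greedy 1-D search and a toy "solver" (the real D-DARK update rule is not reproduced here):

    ```python
    def rms_error(pred, meas):
        return (sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(meas)) ** 0.5

    def adapt_coefficient(simulate, meas, c0, step=0.05, iters=50):
        """Greedy 1-D search over one closure coefficient: keep whichever
        neighbour value lowers the RMS disagreement with measurements,
        shrinking the step when no neighbour improves."""
        c = c0
        for _ in range(iters):
            best = min((c - step, c, c + step),
                       key=lambda cc: rms_error(simulate(cc), meas))
            if best == c:
                step /= 2
            c = best
        return c

    # Toy 'RANS solver': a velocity profile scaled by the closure coefficient.
    xs = [0.1 * i for i in range(10)]
    simulate = lambda c: [c * x for x in xs]
    meas = [0.9 * x for x in xs]          # the data implies c close to 0.9
    print(round(adapt_coefficient(simulate, meas, c0=0.5), 3))
    ```

    In the paper the "measurements" are the 42, 100, or 25 experimental sampling locations, and several k-ω closure coefficients are adapted rather than one.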

  11. Classification of iRBD and Parkinson's patients using a general data-driven sleep staging model built on EEG.

    Science.gov (United States)

    Koch, Henriette; Christensen, Julie A E; Frandsen, Rune; Arvastson, Lars; Christensen, Soren R; Sorensen, Helge B D; Jennum, Poul

    2013-01-01

    Sleep analysis is an important diagnostic tool for sleep disorders. However, the current manual sleep scoring is time-consuming, as it is a crude discretization in time and stages. This study changes Esbroeck and Westover's [1] latent sleep staging model into a global model. The proposed data-driven method trained a topic mixture model on 10 control subjects and was applied on 10 other control subjects, 10 iRBD patients and 10 Parkinson's patients. In that way 30 topic mixture diagrams were obtained, from which features reflecting distinct sleep architectures between control subjects and patients were extracted. Two features calculated on the basis of two latent sleep states classified subjects as "control" or "patient" by a simple clustering algorithm. The mean sleep staging accuracy compared to classical AASM scoring was 72.4% for control subjects, and a clustering of the derived features resulted in a sensitivity of 95% and a specificity of 80%. This study demonstrates that frequency analysis of sleep EEG can be used for data-driven global sleep classification and that topic features separate iRBD and Parkinson's patients from control subjects.
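    The final classification step, a "simple clustering algorithm" on two derived features, can be illustrated with a 1-D two-means clustering. A minimal sketch with hypothetical feature values (the paper's actual features come from the topic mixture diagrams):

    ```python
    def two_means(values, iters=20):
        """Minimal 1-D 2-means clustering: alternately assign each value to the
        nearer of two centers and recompute the centers. Assumes both groups
        stay non-empty, which holds for well-separated data."""
        lo, hi = min(values), max(values)
        for _ in range(iters):
            near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
            near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
            lo, hi = sum(near_lo) / len(near_lo), sum(near_hi) / len(near_hi)
        return lo, hi

    # Hypothetical feature values: controls cluster low, patients high.
    feats = [0.1, 0.15, 0.12, 0.7, 0.8, 0.75]
    print(two_means(feats))
    ```

    Subjects are then labeled "control" or "patient" by which center their feature value falls nearer to; the paper reports 95% sensitivity and 80% specificity for that split.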

  12. Data-driven mono-component feature identification via modified nonlocal means and MEWT for mechanical drivetrain fault diagnosis

    Science.gov (United States)

    Pan, Jun; Chen, Jinglong; Zi, Yanyang; Yuan, Jing; Chen, Binqiang; He, Zhengjia

    2016-12-01

    It is significant to perform condition monitoring and fault diagnosis on rolling mills in steel-making plants to ensure economic benefit. However, timely fault identification of key parts in a complicated industrial system under operating conditions is still a challenging task, since acquired condition signals are usually multi-modulated and inevitably mixed with strong noise. Therefore, a new data-driven mono-component identification method is proposed in this paper for diagnostic purposes. First, a modified nonlocal means algorithm (NLmeans) is proposed to reduce noise in vibration signals without destroying their original Fourier spectrum structure. In the modified NLmeans, two modifications are investigated and performed to improve the denoising effect. Then, the modified empirical wavelet transform (MEWT) is applied to the de-noised signal to adaptively extract empirical mono-component modes. Finally, the modes are analyzed for mechanical fault identification based on the Hilbert transform. The results show that the proposed data-driven method offers superior performance during system operation compared with the MEWT method.
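    For orientation, the baseline that the paper modifies is plain nonlocal means: each sample is replaced by an average of all samples, weighted by how similar their surrounding patches are. A minimal 1-D sketch (circular boundary handling and the parameter values are simplifications; the paper's two modifications are not reproduced):

    ```python
    import math

    def nl_means_1d(signal, half_patch=2, h=0.5):
        """Plain 1-D nonlocal means: weight each candidate sample j by the
        squared distance between the patches around i and j, then average."""
        n = len(signal)
        out = []
        for i in range(n):
            num = den = 0.0
            for j in range(n):
                d = sum((signal[(i + k) % n] - signal[(j + k) % n]) ** 2
                        for k in range(-half_patch, half_patch + 1))
                w = math.exp(-d / (h * h))
                num += w * signal[j]
                den += w
            out.append(num / den)
        return out

    # A constant signal is reproduced exactly: all patches match, all weights
    # are equal, and the average returns the same value.
    print(nl_means_1d([1.0] * 8))
    ```

    Because the weights depend on patch similarity rather than position, repeated waveform patterns reinforce each other while uncorrelated noise averages out, which is why NLmeans can denoise without smearing the Fourier spectrum structure.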

  13. Proactive monitoring of an onshore wind farm through lidar measurements, SCADA data and a data-driven RANS solver

    Science.gov (United States)

    Iungo, Giacomo Valerio; Camarri, Simone; Ciri, Umberto; El-Asha, Said; Leonardi, Stefano; Rotea, Mario A.; Santhanagopalan, Vignesh; Viola, Francesco; Zhan, Lu

    2016-11-01

    Site conditions, such as topography and local climate, as well as wind farm layout, strongly affect the performance of a wind power plant. Therefore, predictions of wake interactions and their effects on power production still remain a great challenge in wind energy. For this study, an onshore wind turbine array was monitored through lidar measurements, SCADA and met-tower data. Power losses due to wake interactions were estimated to be approximately 4% and 2% of the total power production under stable and convective conditions, respectively. This dataset was then leveraged for the calibration of a data-driven RANS (DDRANS) solver, which is a compelling tool for prediction of wind turbine wakes and power production. DDRANS is characterized by a computational cost as low as that of engineering wake models, with adequate accuracy achieved through data-driven tuning of the turbulence closure model. DDRANS is based on a parabolic formulation and axisymmetry and boundary-layer approximations, which allow low computational costs to be achieved. The turbulence closure consists of a mixing-length model, which is optimally calibrated against the experimental dataset. Assessment of DDRANS is then performed through lidar and SCADA data for different atmospheric conditions. This material is based upon work supported by the National Science Foundation under the I/UCRC WindSTAR, NSF Award IIP 1362033.

  14. Development of a data-driven forecasting tool for hydraulically fractured, horizontal wells in tight-gas sands

    Science.gov (United States)

    Kulga, B.; Artun, E.; Ertekin, T.

    2017-06-01

    Tight-gas sand reservoirs are considered to be one of the major unconventional resources. Due to the strong heterogeneity and very low permeability of the formation, and the complexity of well trajectories with multiple hydraulic fractures, there are challenges associated with performance forecasting and optimum exploitation of these resources using conventional modeling approaches. This study aims to develop a data-driven forecasting tool for tight-gas sands, based on artificial neural networks, that can complement the physics-driven modeling approach, namely numerical-simulation models. The tool is designed to predict the horizontal-well performance as a proxy to the numerical model, once the initial conditions, operational parameters, and reservoir/hydraulic-fracture characteristics are provided. The data-driven model that the forecasting tool is based on is validated with blind cases by estimating the cumulative gas production after 10 years with an average error of 3.2%. A graphical-user-interface application is developed that allows the practicing engineer to use the developed tool in a practical manner by visualizing estimated performance for a given reservoir within a fraction of a second. Practicality of the tool is demonstrated with a case study for the Williams Fork Formation by assessing the performance of various well designs and by incorporating known uncertainties through Monte Carlo simulation. P10, P50 and P90 estimates of the horizontal-well performance are quickly obtained within acceptable accuracy levels.
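    The Monte Carlo step is straightforward once a fast proxy exists: sample the uncertain inputs, push each sample through the proxy, and read percentiles off the resulting distribution. A minimal sketch (the proxy function, input names, and ranges below are hypothetical; the real tool would call the trained neural network):

    ```python
    import random

    def percentile(samples, p):
        """Nearest-rank percentile: a simple definition chosen for illustration."""
        s = sorted(samples)
        k = round(p / 100 * (len(s) - 1))
        return s[max(0, min(len(s) - 1, k))]

    # Hypothetical stand-in proxy: cumulative gas as a product of two factors.
    proxy = lambda porosity_factor, frac_factor: 100.0 * porosity_factor * frac_factor

    random.seed(0)
    samples = [proxy(random.uniform(0.8, 1.2), random.uniform(0.9, 1.1))
               for _ in range(2000)]
    p10, p50, p90 = (percentile(samples, p) for p in (10, 50, 90))
    print(p10 < p50 < p90)
    ```

    Because each proxy evaluation takes a fraction of a second, thousands of samples (and hence stable percentile estimates) are cheap, which is exactly what makes the proxy-plus-Monte-Carlo workflow practical.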

  15. Protein engineering of Bacillus acidopullulyticus pullulanase for enhanced thermostability using in silico data driven rational design methods.

    Science.gov (United States)

    Chen, Ana; Li, Yamei; Nie, Jianqi; McNeil, Brian; Jeffrey, Laura; Yang, Yankun; Bai, Zhonghu

    2015-10-01

    Thermostability has been considered a requirement in the starch processing industry to maintain high catalytic activity of pullulanase under high temperatures. Four data-driven rational design methods (B-FITTER, proline theory, PoPMuSiC-2.1, and the sequence consensus approach) were adopted to identify key residues potentially linked with thermostability, and 39 residues of Bacillus acidopullulyticus pullulanase were chosen as mutagenesis targets. Single mutagenesis followed by combined mutagenesis resulted in the best mutant, E518I-S662R-Q706P, which exhibited an 11-fold half-life improvement at 60 °C and a 9.5 °C increase in Tm. The optimum temperature of the mutant increased from 60 to 65 °C. Fluorescence spectroscopy results demonstrated that the tertiary structure of the mutant enzyme was more compact than that of the wild-type (WT) enzyme. Structural change analysis revealed that the increase in thermostability was most probably caused by a combination of the lower stability free-energy and higher hydrophobicity of E518I, the additional hydrogen bonds of S662R, and the higher rigidity of Q706P compared with the WT. The findings demonstrate the effectiveness of combined data-driven rational design approaches in engineering an industrial enzyme to improve thermostability.

  16. Data-Driven Modeling for UGI Gasification Processes via an Enhanced Genetic BP Neural Network With Link Switches.

    Science.gov (United States)

    Liu, Shida; Hou, Zhongsheng; Yin, Chenkun

    2016-12-01

    In this brief, an enhanced genetic back-propagation neural network with link switches (EGA-BPNN-LS) is proposed to address a data-driven modeling problem for gasification processes inside United Gas Improvement (UGI) gasifiers. The online-measured temperature of crude gas produced during the gasification processes plays a dominant role in the syngas industry; however, it is difficult to model temperature dynamics via first principles due to the practical complexity of the gasification process, especially as reflected by severe changes in the gas temperature resulting from infrequent manipulations of the gasifier in practice. The proposed data-driven modeling approach, EGA-BPNN-LS, incorporates an NN-LS, an EGA, and the Levenberg-Marquardt (LM) algorithm. The approach can not only learn the relationships between the control input and the system output from historical data using an optimized network structure through a combination of EGA and NN-LS but also make use of the network's gradient information via the LM algorithm. EGA-BPNN-LS is applied to a set of data collected from the field to model the UGI gasification processes, and the effectiveness of EGA-BPNN-LS is verified.

  17. Cognitive Profiles in Parkinson’s Disease and Their Relation to Dementia: A Data-Driven Approach

    Directory of Open Access Journals (Sweden)

    Inga Liepelt-Scarfone

    2012-01-01

    Full Text Available Parkinson’s disease is characterized by a substantial cognitive heterogeneity, which is apparent in different profiles and levels of severity. To date, a distinct clinical profile for patients with a potential risk of developing dementia still has to be identified. We introduce a data-driven approach to detect different cognitive profiles and stages. Comprehensive neuropsychological data sets from a cohort of 121 Parkinson’s disease patients with and without dementia were explored by a factor analysis to characterize different cognitive domains. Based on the factor scores that represent individual performance in each domain, hierarchical cluster analyses determined whether subgroups of Parkinson’s disease patients show varying cognitive profiles. A six-factor solution accounting for 65.2% of total variance fitted best to our data and revealed high internal consistencies (Cronbach’s alpha coefficients >0.6. The cluster analyses suggested two independent patient clusters with different cognitive profiles. They differed only in severity of cognitive impairment and self-reported limitation of activities of daily living function but not in motor performance, disease duration, or dopaminergic medication. Based on a data-driven approach, diverse cognitive profiles were identified, which separated early and more advanced stages of cognitive impairment in Parkinson’s disease without dementia. Importantly, these profiles were independent of motor progression.

  18. Performance Analysis of Data-Driven and Model-Based Control Strategies Applied to a Thermal Unit Model

    Directory of Open Access Journals (Sweden)

    Cihan Turhan

    2017-01-01

    Full Text Available The paper presents the design and the implementation of different advanced control strategies that are applied to a nonlinear model of a thermal unit. A data-driven grey-box identification approach provided the physically meaningful nonlinear continuous-time model, which represents the benchmark exploited in this work. The control problem of this thermal unit is important, since it constitutes the key element of passive air conditioning systems. The advanced control schemes analysed in this paper are used to regulate the outflow air temperature of the thermal unit by exploiting the inflow air speed, whilst the inflow air temperature is considered as an external disturbance. The reliability and robustness issues of the suggested control methodologies are verified with a Monte Carlo (MC analysis for simulating modelling uncertainty, disturbance and measurement errors. The achieved results serve to demonstrate the effectiveness and the viable application of the suggested control solutions to air conditioning systems. The benchmark model represents one of the key issues of this study, which is exploited for benchmarking different model-based and data-driven advanced control methodologies through extensive simulations. Moreover, this work highlights the main features of the proposed control schemes, while providing practitioners and heating, ventilating and air conditioning engineers with tools to design robust control strategies for air conditioning systems.

  19. Following an Optimal Batch Bioreactor Operations Model

    DEFF Research Database (Denmark)

    Ibarra-Junquera, V.; Jørgensen, Sten Bay; Virgen-Ortíz, J.J.;

    2012-01-01

    The problem of following an optimal batch operation model for a bioreactor in the presence of uncertainties is studied. The optimal batch bioreactor operation model (OBBOM) refers to the bioreactor trajectory for nominal cultivation to be optimal. A multiple-variable dynamic optimization of fed-b...

  20. Family based dispatching with batch availability

    NARCIS (Netherlands)

    van der Zee, D.J.

    2013-01-01

    Family based dispatching rules seek to lower set-up frequencies by grouping (batching) similar types of jobs for joint processing. Hence shop flow times may be improved, as less time is spent on set-ups. Motivated by an industrial project we study the control of machines with batch availability, i.e

  1. Automatic Endpoint Determination for Batch Tea Dryers

    NARCIS (Netherlands)

    Temple, S.J.; Boxtel, van A.J.B.

    2000-01-01

    A laboratory batch fluid-bed dryer was developed for handling small samples of tea for experimental batch manufacture, and this dryer required a means of stopping drying when the p

  2. Automatic endpoint determination for batch tea dryers

    NARCIS (Netherlands)

    Temple, S.J.; Boxtel, van A.J.B.

    2001-01-01

    A laboratory batch fluid-bed dryer was developed for handling small samples of tea for experimental batch manufacture, and this dryer required a means of stopping drying when the process was complete. A control system was devised which requires only the initial weight of the sample to be entered

  3. Norton's theorem for batch routing queueing networks

    NARCIS (Netherlands)

    Bause, Falko; Boucherie, Richard J.; Buchholz, Peter

    2001-01-01

    This paper shows that the aggregation and decomposition result known as Norton’s theorem for queueing networks can be extended to a general class of batch routing queueing networks with product-form solution that allows for multiple components to simultaneously release and receive (batches of) custo

  4. Operation of a Batch Stripping Distillation Column

    Institute of Scientific and Technical Information of China (English)

    2001-01-01

    A stripping batch distillation column is preferred when the amount of the light component in the feed is small and the products are to be recovered at high purity. The operation modes of a batch stripper are believed to be the same as those of a rectifier. However, the control system of a stripper is different. In this paper, we explore three different control methods with Hysys (Hyprotech Ltd. 1997) for a batch stripper. The main difference is the control scheme for reboiler liquid level: (a) controlled by reflux flow; (b) controlled by reboiler heat duty; (c) controlled by bottom product flow. The main characteristics of operating a batch stripper with each control scheme are presented in this paper. Guidelines are provided for the startup of a batch stripper, and the effects of some control tuning parameters on the column performance are discussed.
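    Scheme (a), holding reboiler level by manipulating reflux flow, can be illustrated with a PI loop closed around a toy integrating level model. A minimal sketch (all gains, the level model, and the numbers are hypothetical, not from the Hysys study):

    ```python
    def make_pi(kp, ki, setpoint, dt=1.0):
        """Minimal discrete PI controller returned as a closure."""
        state = {"i": 0.0}
        def control(measured):
            err = setpoint - measured
            state["i"] += err * dt
            return kp * err + ki * state["i"]
        return control

    # Toy integrating level dynamics: d(level)/dt ~ reflux in - boil-up out.
    level, boilup = 0.40, 1.0
    pi = make_pi(kp=2.0, ki=0.1, setpoint=0.50)
    for _ in range(500):
        reflux = boilup + pi(level)     # feedforward on boil-up + PI trim
        level += 0.01 * (reflux - boilup)
    print(round(level, 3))              # settles at the 0.5 setpoint
    ```

    Schemes (b) and (c) would keep the same PI structure but send its output to the reboiler heat duty or the bottom product flow instead, which changes the loop dynamics and hence the tuning, the point the abstract makes about the three schemes differing.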

  5. Consequence Identification for Maloperation in Batch Process

    Institute of Scientific and Technical Information of China (English)

    张玉良; 张贝克; 马昕; 曹柳林; 吴重光

    2013-01-01

    Batch processes are important in the chemical industry, in which operators usually play a major role and hazards may arise from their inadvertent acts. In this paper, based on hazard and operability study and the concept of qualitative simulation, an automatic method for adverse consequence identification for potential maloperation is proposed. The qualitative model for the production process is expressed by a novel directed graph. Possible operation deviations from the normal operating procedure are identified systematically by using a group of guidewords. The proposed algorithm is used for qualitative simulation of batch processes to identify the effects of maloperations. The method is illustrated with a simple batch process and a batch reaction process. The results show that batch processes can be simulated qualitatively and hazards can be identified for operating procedures including maloperations. After analysis of possible plant maloperations, some measures can be taken to avoid maloperations or reduce the losses resulting from them.

  6. Batch Scheduling: A Fresh Approach

    Science.gov (United States)

    Cardo, Nicholas P.; Woodrow, Thomas (Technical Monitor)

    1994-01-01

    The Network Queueing System (NQS) was designed to schedule jobs based on limits within queues. As systems obtained more memory, the number of queues increased to take advantage of the added memory resource. The problem now becomes too many queues. Having a large number of queues provides users with the capability to gain an unfair advantage over other users by tailoring their job to fit in an empty queue. Additionally, the large number of queues becomes confusing to the user community. The High Speed Processors group at the Numerical Aerodynamics Simulation (NAS) Facility at NASA Ames Research Center developed a new approach to batch job scheduling. This new method reduces the number of queues required by eliminating the need for queues based on resource limits. The scheduler examines each request for necessary resources before initiating the job. Additional user limits at the complex level were also added to provide fairness to all users. Additional tools, which include user job reordering, are under development to work with the new scheduler. This paper discusses the objectives, design and implementation results of this new scheduler.
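    The key change, examining each job's resource request against current availability instead of routing jobs through per-limit queues, can be sketched in a few lines. A minimal illustration (job fields and numbers are hypothetical, not the NAS scheduler's actual data model):

    ```python
    from collections import namedtuple

    Job = namedtuple("Job", "name cpus memory_gb")

    def schedule(jobs, free_cpus, free_mem_gb):
        """Start each job only if its declared resources currently fit;
        otherwise leave it queued. The request is examined before initiation,
        so no per-resource-limit queues are needed."""
        started, queued = [], []
        for job in jobs:
            if job.cpus <= free_cpus and job.memory_gb <= free_mem_gb:
                free_cpus -= job.cpus
                free_mem_gb -= job.memory_gb
                started.append(job.name)
            else:
                queued.append(job.name)
        return started, queued

    jobs = [Job("a", 4, 8), Job("b", 16, 64), Job("c", 2, 4)]
    print(schedule(jobs, free_cpus=8, free_mem_gb=16))
    # -> (['a', 'c'], ['b'])
    ```

    A fairness layer, like the complex-level user limits the abstract mentions, would add per-user checks before the resource test; job reordering would change the iteration order over the queued list.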

  7. Big data-driven business how to use big data to win customers, beat competitors, and boost profits

    CERN Document Server

    Glass, Russell

    2014-01-01

    Get the expert perspective and practical advice on big data The Big Data-Driven Business: How to Use Big Data to Win Customers, Beat Competitors, and Boost Profits makes the case that big data is for real, and more than just big hype. The book uses real-life examples-from Nate Silver to Copernicus, and Apple to Blackberry-to demonstrate how the winners of the future will use big data to seek the truth. Written by a marketing journalist and the CEO of a multi-million-dollar B2B marketing platform that reaches more than 90% of the U.S. business population, this book is a comprehens

  8. Classification of iRBD and Parkinson's patients using a general data-driven sleep staging model built on EEG

    DEFF Research Database (Denmark)

    Koch, Henriette; Christensen, Julie Anja Engelhard; Frandsen, Rune

    2013-01-01

    Sleep analysis is an important diagnostic tool for sleep disorders. However, the current manual sleep scoring is time-consuming as it is a crude discretization in time and stages. This study changes Esbroeck and Westover's [1] latent sleep staging model into a global model. The proposed data...... were extracted. Two features calculated on basis of two latent sleep states classified subjects as “control” or “patient” by a simple clustering algorithm. The mean sleep staging accuracy compared to classical AASM scoring was 72.4% for control subjects and a clustering of the derived features resulted...... in a sensitivity of 95% and a specificity of 80%. This study demonstrates that frequency analysis of sleep EEG can be used for data-driven global sleep classification and that topic features separates iRBD and Parkinson's patients from control subjects....

  9. The Tyranny of Data? The Bright and Dark Sides of Data-Driven Decision-Making for Social Good

    CERN Document Server

    Lepri, Bruno; Sangokoya, David; Letouze, Emmanuel; Oliver, Nuria

    2016-01-01

    The unprecedented availability of large-scale human behavioral data is profoundly changing the world we live in. Researchers, companies, governments, financial institutions, non-governmental organizations and also citizen groups are actively experimenting, innovating and adapting algorithmic decision-making tools to understand global patterns of human behavior and provide decision support to tackle problems of societal importance. In this chapter, we focus our attention on social good decision-making algorithms, that is algorithms strongly influencing decision-making and resource optimization of public goods, such as public health, safety, access to finance and fair employment. Through an analysis of specific use cases and approaches, we highlight both the positive opportunities that are created through data-driven algorithmic decision-making, and the potential negative consequences that practitioners should be aware of and address in order to truly realize the potential of this emergent field. We elaborate o...

  10. Prognostic and health management for engineering systems: a review of the data-driven approach and algorithms

    Directory of Open Access Journals (Sweden)

    Thamo Sutharssan

    2015-07-01

    Full Text Available Prognostics and health management (PHM) has become an important component of many engineering systems and products, where algorithms are used to detect anomalies, diagnose faults and predict remaining useful lifetime (RUL). PHM can provide many advantages to users and maintainers. Although the primary goals are to ensure safety, provide the state of health and estimate the RUL of components and systems, there are also financial benefits such as operational and maintenance cost reductions and extended lifetime. This study aims at reviewing the current status of algorithms and methods used to underpin different existing PHM approaches. The focus is on providing a structured and comprehensive classification of the existing state-of-the-art PHM approaches, data-driven approaches and algorithms.

  11. A predictive estimation method for carbon dioxide transport by data-driven modeling with a physically-based data model.

    Science.gov (United States)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young; Jun, Seong-Chun; Choung, Sungwook; Yun, Seong-Taek; Oh, Junho; Kim, Hyun-Jun

    2017-09-27

    In this study, a data-driven method for predicting CO2 leaks and associated concentrations from geological CO2 sequestration is developed. Several candidate models are compared based on their reproducibility and predictive capability for CO2 concentration measurements from the Environment Impact Evaluation Test (EIT) site in Korea. Based on the data mining results, a one-dimensional solution of the advective-dispersive equation for steady flow (i.e., Ogata-Banks solution) is found to be most representative for the test data, and this model is adopted as the data model for the developed method. In the validation step, the method is applied to estimate future CO2 concentrations with the reference estimation by the Ogata-Banks solution, where a part of earlier data is used as the training dataset. From the analysis, it is found that the ensemble mean of multiple estimations based on the developed method shows high prediction accuracy relative to the reference estimation. In addition, the majority of the data to be predicted are included in the proposed quantile interval, which suggests adequate representation of the uncertainty by the developed method. Therefore, the incorporation of a reasonable physically-based data model enhances the prediction capability of the data-driven model. The proposed method is not confined to estimations of CO2 concentration and may be applied to various real-time monitoring data from subsurface sites to develop automated control, management or decision-making systems. Copyright © 2017. Published by Elsevier B.V.
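
The Ogata-Banks solution adopted above as the data model has a standard closed form, C/C0 = 0.5 * [erfc((x - v t)/(2 sqrt(D t))) + exp(v x / D) * erfc((x + v t)/(2 sqrt(D t)))]. A minimal sketch evaluating a breakthrough curve with SciPy follows; the velocity, dispersion coefficient and monitoring distance are illustrative values, not parameters from the EIT site.

```python
import numpy as np
from scipy.special import erfc

def ogata_banks(x, t, v, D, c0=1.0):
    """1-D advective-dispersive solution for steady flow (Ogata & Banks, 1961).

    x : distance from the source [m]
    t : time [s]
    v : mean pore velocity [m/s]
    D : longitudinal dispersion coefficient [m^2/s]
    c0: source concentration
    """
    x, t = np.asarray(x, float), np.asarray(t, float)
    a = erfc((x - v * t) / (2.0 * np.sqrt(D * t)))
    # The exponential term can overflow for large v*x/D; clip the exponent.
    b = np.exp(np.minimum(v * x / D, 700.0)) * erfc((x + v * t) / (2.0 * np.sqrt(D * t)))
    return 0.5 * c0 * (a + b)

# Breakthrough curve at a hypothetical monitoring point 10 m downstream.
times = np.linspace(1.0, 3.0e6, 5)
conc = ogata_banks(x=10.0, t=times, v=1e-5, D=1e-4)
```

In the paper's scheme, curves of this kind provide the physically-based backbone against which the data-driven estimator is trained.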

  12. Inflammation Following Traumatic Brain Injury in Humans: Insights from Data-Driven and Mechanistic Models into Survival and Death

    Directory of Open Access Journals (Sweden)

    Andrew Abboud

    2016-09-01

    Full Text Available Inflammation induced by traumatic brain injury (TBI) is a complex mediator of morbidity and mortality. We have previously demonstrated the utility of both data-driven and mechanistic models in settings of traumatic injury. We hypothesized that differential dynamic inflammation programs characterize TBI survivors vs. non-survivors, and sought to leverage computational modeling to derive novel insights into this life/death bifurcation. Thirteen inflammatory cytokines and chemokines were determined using Luminex™ in serial cerebrospinal fluid (CSF) samples from 31 TBI patients over 5 days. In this cohort, 5 were non-survivors (Glasgow Outcome Scale [GOS] score = 1) and 26 were survivors (GOS > 1). A Pearson correlation analysis of initial injury (Glasgow Coma Scale [GCS]) vs. GOS suggested that survivors and non-survivors had distinct clinical response trajectories to injury. Statistically significant differences in interleukin (IL)-4, IL-5, IL-6, IL-8, IL-13, and tumor necrosis factor-α (TNF-α) were observed between TBI survivors vs. non-survivors over 5 days. Principal Component Analysis and Dynamic Bayesian Network inference suggested differential roles of chemokines, TNF-α, IL-6, and IL-10, based upon which an ordinary differential equation model of TBI was generated. This model was calibrated separately to the time course data of TBI survivors vs. non-survivors as a function of initial GCS. Analysis of parameter values in ensembles of simulations from these models suggested differences in microglial and damage responses in TBI survivors vs. non-survivors. These studies suggest the utility of combined data-driven and mechanistic models in the context of human TBI.
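
As a rough illustration of the data-driven step in such an analysis (synthetic numbers, not the study's CSF measurements), a principal component projection of a patients-by-analytes matrix can be sketched as follows; the 31 × 13 shape simply mirrors the cohort and panel sizes in the abstract.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical panel: 31 patients x 13 analytes (stand-in for per-patient
# summaries of serial cytokine/chemokine measurements); values are synthetic.
X = rng.lognormal(mean=2.0, sigma=0.8, size=(31, 13))

# Standardize each analyte, then project onto the leading components.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
pca = PCA(n_components=3)
scores = pca.fit_transform(Xz)          # one 3-D score vector per patient
explained = pca.explained_variance_ratio_.sum()
```

Group differences in such score vectors (e.g., survivors vs. non-survivors) are what motivate the subsequent mechanistic model.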

  13. Data-driven approach to Type Ia supernovae: variable selection on the peak luminosity and clustering in visual analytics

    Science.gov (United States)

    Uemura, Makoto; Kawabata, Koji S.; Ikeda, Shiro; Maeda, Keiichi; Wu, Hsiang-Yun; Watanabe, Kazuho; Takahashi, Shigeo; Fujishiro, Issei

    2016-03-01

    Type Ia supernovae (SNIa) have an almost uniform peak luminosity, so that they are used as a “standard candle” to estimate distances to galaxies in cosmology. In this article, we introduce our two recent works on SNIa based on a data-driven approach. The diversity in the peak luminosity of SNIa can be reduced by corrections in several variables. The color and decay rate have been used as the explanatory variables of the peak luminosity in past studies. However, it has been proposed that spectral data could give a better model of the peak luminosity. We use cross-validation in order to control the generalization error and a LASSO-type estimator in order to choose the set of variables. Using 78 samples and 276 candidate variables, we confirm that the peak luminosity depends on the color and decay rate. Our analysis does not support adding any other variables in order to obtain a better generalization error. On the other hand, this analysis is based on the assumption that SNIa originate in a single population, which is not trivial. Indeed, several sub-types possibly having different natures have been proposed. We used a visual analytics tool for the asymmetric biclustering method to find both a good set of variables and samples at the same time. Using 14 variables and 132 samples, we found that SNIa can be divided into two categories by the expansion velocity of ejecta. These examples demonstrate that the data-driven approach is useful for the high-dimensional, large-volume data that is becoming common in modern astronomy.
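
The cross-validated LASSO selection described above can be sketched with scikit-learn. The data below are synthetic stand-ins for the 78-sample, many-candidate setting, and the two informative columns (playing the role of "color" and "decay rate") are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)

# Synthetic design: 78 samples, 60 candidate variables, only two of which
# actually drive the response (mimicking color and decay rate).
n, p = 78, 60
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:2] = [1.5, -1.0]
y = X @ beta + 0.1 * rng.normal(size=n)

# Cross-validation picks the regularization strength; LASSO zeroes out
# coefficients of variables that do not improve generalization error.
model = LassoCV(cv=5, random_state=0).fit(X, y)
selected = np.flatnonzero(model.coef_)
```

With a strong enough signal, the truly informative columns survive the shrinkage while most spurious candidates are dropped, which is the behavior the abstract reports for color and decay rate.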

  14. Inflammation Following Traumatic Brain Injury in Humans: Insights from Data-Driven and Mechanistic Models into Survival and Death

    Science.gov (United States)

    Abboud, Andrew; Mi, Qi; Puccio, Ava; Okonkwo, David; Buliga, Marius; Constantine, Gregory; Vodovotz, Yoram

    2016-01-01

    Inflammation induced by traumatic brain injury (TBI) is a complex mediator of morbidity and mortality. We have previously demonstrated the utility of both data-driven and mechanistic models in settings of traumatic injury. We hypothesized that differential dynamic inflammation programs characterize TBI survivors vs. non-survivors, and sought to leverage computational modeling to derive novel insights into this life/death bifurcation. Thirteen inflammatory cytokines and chemokines were determined using Luminex™ in serial cerebrospinal fluid (CSF) samples from 31 TBI patients over 5 days. In this cohort, 5 were non-survivors (Glasgow Outcome Scale [GOS] score = 1) and 26 were survivors (GOS > 1). A Pearson correlation analysis of initial injury (Glasgow Coma Scale [GCS]) vs. GOS suggested that survivors and non-survivors had distinct clinical response trajectories to injury. Statistically significant differences in interleukin (IL)-4, IL-5, IL-6, IL-8, IL-13, and tumor necrosis factor-α (TNF-α) were observed between TBI survivors vs. non-survivors over 5 days. Principal Component Analysis and Dynamic Bayesian Network inference suggested differential roles of chemokines, TNF-α, IL-6, and IL-10, based upon which an ordinary differential equation model of TBI was generated. This model was calibrated separately to the time course data of TBI survivors vs. non-survivors as a function of initial GCS. Analysis of parameter values in ensembles of simulations from these models suggested differences in microglial and damage responses in TBI survivors vs. non-survivors. These studies suggest the utility of combined data-driven and mechanistic models in the context of human TBI. PMID:27729864

  15. Reproducibility of data-driven dietary patterns in two groups of adult Spanish women from different studies.

    Science.gov (United States)

    Castelló, Adela; Lope, Virginia; Vioque, Jesús; Santamariña, Carmen; Pedraz-Pingarrón, Carmen; Abad, Soledad; Ederra, Maria; Salas-Trejo, Dolores; Vidal, Carmen; Sánchez-Contador, Carmen; Aragonés, Nuria; Pérez-Gómez, Beatriz; Pollán, Marina

    2016-08-01

    The objective of the present study was to assess the reproducibility of data-driven dietary patterns in different samples extracted from similar populations. Dietary patterns were extracted by applying principal component analyses to the dietary information collected from a sample of 3550 women recruited from seven screening centres belonging to the Spanish breast cancer (BC) screening network (Determinants of Mammographic Density in Spain (DDM-Spain) study). The resulting patterns were compared with three dietary patterns obtained from a previous Spanish case-control study on female BC (Epidemiological study of the Spanish group for breast cancer research (GEICAM: grupo Español de investigación en cáncer de mama)) using the dietary intake data of 973 healthy participants. The level of agreement between patterns was determined using both the congruence coefficient (CC) between the pattern loadings (considering patterns with a CC≥0·85 as fairly similar) and the linear correlation between patterns scores (considering as fairly similar those patterns with a statistically significant correlation). The conclusions reached with both methods were compared. This is the first study exploring the reproducibility of data-driven patterns from two studies and the first using the CC to determine pattern similarity. We were able to reproduce the EpiGEICAM Western pattern in the DDM-Spain sample (CC=0·90). However, the reproducibility of the Prudent (CC=0·76) and Mediterranean (CC=0·77) patterns was not as good. The linear correlation between pattern scores was statistically significant in all cases, highlighting its arbitrariness for determining pattern similarity. We conclude that the reproducibility of widely prevalent dietary patterns is better than the reproducibility of more population-specific patterns. More methodological studies are needed to establish an objective measurement and threshold to determine pattern similarity.
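
The congruence coefficient used above to compare pattern loadings is simple to compute: CC = sum(x*y) / sqrt(sum(x^2) * sum(y^2)). A minimal sketch with made-up loading vectors:

```python
import numpy as np

def congruence_coefficient(x, y):
    """Tucker's congruence coefficient between two loading vectors.

    |CC| = 1 means identical patterns up to scaling; the study above reads
    CC >= 0.85 as fair similarity between dietary patterns."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y)))

# Hypothetical loadings of the same dietary pattern in two samples.
a = np.array([0.60, 0.50, -0.20, 0.10])
b = np.array([0.55, 0.48, -0.25, 0.05])
cc = congruence_coefficient(a, b)
```

Unlike a correlation of pattern scores, the CC is computed on the loadings themselves, which is why the study finds it less arbitrary for judging pattern similarity.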

  16. Data driven testing development based on QTP

    Institute of Scientific and Technical Information of China (English)

    王敏; 高霞; 王智超

    2014-01-01

    This paper proposes a test case design format for automated testing, using external data sources to realize more complex data-driven testing through the programming features of QTP. The method uses Excel as the data source, establishes a design format for the test data in Excel, and builds a dedicated function library to operate on the Excel data, thereby realizing data-driven automated testing. This approach effectively separates the test data from the test scripts, makes the test data easier to design and read, and improves both testing efficiency and the maintainability of the test data.

  17. Data-Driven Nonlinear Subspace Modeling for Prediction and Control of Molten Iron Quality Indices in Blast Furnace Ironmaking

    Energy Technology Data Exchange (ETDEWEB)

    Zhou, Ping; Song, Heda; Wang, Hong; Chai, Tianyou

    2017-09-01

    Blast furnace (BF) ironmaking is a nonlinear dynamic process with complicated physical-chemical reactions, where multi-phase and multi-field coupling and large time delays occur during operation. In BF operation, the molten iron temperature (MIT) as well as the Si, P and S contents of molten iron are the most essential molten iron quality (MIQ) indices, whose measurement, modeling and control have always been important issues in the metallurgical engineering and automation fields. This paper develops a novel data-driven nonlinear state space modeling approach for the prediction and control of multivariate MIQ indices by integrating hybrid modeling and control techniques. First, to improve modeling efficiency, a data-driven hybrid method combining canonical correlation analysis and correlation analysis is proposed to identify, from the multitudinous factors that affect the MIQ indices, the most influential controllable variables as the modeling inputs. Then, a Hammerstein model for the prediction of MIQ indices is established using the LS-SVM based nonlinear subspace identification method. This model is further simplified by using the piecewise cubic Hermite interpolating polynomial method to fit the complex nonlinear kernel function. Compared to the original Hammerstein model, the simplified model not only significantly reduces the computational complexity, but also retains almost the same reliability and accuracy for a stable prediction of MIQ indices. Finally, in order to verify the practicability of the developed model, it is applied in designing a genetic algorithm based nonlinear predictive controller for multivariate MIQ indices by directly taking the established model as a predictor. Industrial experiments show the advantages and effectiveness of the proposed approach.
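
The piecewise cubic Hermite simplification mentioned above can be illustrated with SciPy's PchipInterpolator. The Gaussian function here is only a stand-in for the paper's LS-SVM kernel; the point is that a cheap shape-preserving interpolant built from a few knot evaluations can replace an expensive nonlinearity inside a prediction loop.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Stand-in for an expensive nonlinear kernel (the real one would come from
# LS-SVM subspace identification, per the abstract).
def kernel(r):
    return np.exp(-r**2 / 2.0)

# Evaluate the kernel at a small number of knots and fit a PCHIP interpolant.
knots = np.linspace(0.0, 4.0, 15)
approx = PchipInterpolator(knots, kernel(knots))

# The cheap interpolant is then used in place of the kernel.
r = np.linspace(0.0, 4.0, 200)
max_err = float(np.max(np.abs(approx(r) - kernel(r))))
```

PCHIP preserves the monotone shape of the sampled data, so the surrogate cannot introduce spurious oscillations, which matters for the stability of a predictive controller built on top of it.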

  18. Batch and Fed-Batch Fermentation System on Ethanol Production from Whey using Kluyveromyces marxianus

    Directory of Open Access Journals (Sweden)

    H Hadiyanto

    2013-10-01

    Full Text Available Nowadays, reserves of fossil fuels have gradually depleted. This condition forces many researchers to find energy alternatives that are renewable and sustainable in the future. Ethanol derived from cheese industry waste (whey) using a fermentation process offers a new perspective for securing both energy and the environment. The aim of this study was to compare the operation modes (batch and fed-batch) of a fermentation system for ethanol production from whey using Kluyveromyces marxianus. Growth rate and ethanol yield (YP/S) of fed-batch fermentation were 0.122/h and 0.21 gP/gS, respectively; growth rate and ethanol yield (YP/S) of batch fermentation were 0.107/h and 0.12 g ethanol/g substrate, respectively. Based on the biomass and ethanol concentration data, the fermentation process for ethanol production by the fed-batch system was higher on several parameters than by the batch system. The periodic substrate addition performed in the fed-batch system keeps the yeast growing at low substrate concentrations, consequently increasing its activity and ethanol productivity. Keywords: batch; ethanol; fed-batch; fermentation; Kluyveromyces marxianus; whey

  19. Uneven batch data alignment with application to the control of batch end-product quality.

    Science.gov (United States)

    Wan, Jian; Marjanovic, Ognjen; Lennox, Barry

    2014-03-01

    Batch processes are commonly characterized by uneven trajectories due to the existence of batch-to-batch variations. The batch end-product quality is usually measured at the end of these uneven trajectories. It is necessary to align the time differences for both the measured trajectories and the batch end-product quality in order to implement statistical process monitoring and control schemes. Apart from synchronizing trajectories with variable lengths using an indicator variable or dynamic time warping, this paper proposes a novel approach to align uneven batch data by identifying short-window PCA&PLS models at first and then applying these identified models to extend shorter trajectories and predict future batch end-product quality. Furthermore, uneven batch data can also be aligned to be a specified batch length using moving window estimation. The proposed approach and its application to the control of batch end-product quality are demonstrated with a simulated example of fed-batch fermentation for penicillin production. Copyright © 2013 ISA. Published by Elsevier Ltd. All rights reserved.
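
Dynamic time warping, one of the alignment baselines the abstract contrasts with its short-window PCA&PLS approach, can be sketched in a few lines. This is a textbook O(nm) implementation with no windowing, not the paper's method; the two trajectories stand in for uneven batch records of different lengths.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two 1-D trajectories of
    possibly different lengths (minimal sketch, no band constraint)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Two uneven batch trajectories of the same underlying profile:
# the second batch runs slower, so its record is longer.
t1 = np.sin(np.linspace(0, np.pi, 50))
t2 = np.sin(np.linspace(0, np.pi, 65))
d = dtw_distance(t1, t2)
```

A small warped distance despite the length mismatch is exactly the property that makes DTW usable for synchronizing batch trajectories before monitoring or quality prediction.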

  20. Batch process. Changes and problems of a batch process; Bacchi prosesu no hensen to kadai

    Energy Technology Data Exchange (ETDEWEB)

    Niwa, T. [Asahi Engineering (Japan)

    1997-09-05

    One of the characteristics of the manufacture of fine chemical products is multikind production. The life cycles of chemical industrial products have become shorter, and the difference between these life cycles and those of the manufacturing facility has become larger. The use of an FMS (Flexible Manufacturing System) has been demanded as the measure for solving the problems, and the advantages of a batch process have begun to be reconsidered. This paper describes the history of the development of a batch process, and then explains the problems of a batch process. The paper mentions the process control techniques, production information control systems, production support systems, training systems and process simulation systems as the main techniques supporting the development of a batch process. The paper mentions the modeling and standardizing of a batch process, systematic batch process designing methods and the modeling of a production control information system as the problems of a batch process. 8 refs., 5 figs., 2 tabs.

  1. LSF usage for batch at CERN

    CERN Multimedia

    Schwickerath, Ulrich

    2007-01-01

    Contributed poster to the CHEP07. Original abstract: LSF 7, the latest version of Platform's batch workload management system, addresses many issues which limited the ability of LSF 6.1 to support large scale batch farms, such as the lxbatch service at CERN. In this paper we will present the status of the evaluation and deployment of LSF 7 at CERN, including issues concerning the integration of LSF 7 with the gLite grid middleware suite and, in particular, the steps taken to ensure efficient reporting of the local batch system status and usage to the Grid Information System.

  2. Data-Driven Shakespeare

    Science.gov (United States)

    Bambrick-Santoyo, Paul

    2016-01-01

    Write first, talk second--it's a simple strategy, but one that's underused in literature classes, writes Paul Bambrick-Santoyo. The author describes a lesson on Shakespeare's Sonnet 65 conducted by a middle school English teacher, who incorporates writing as an important precursor to classroom discussion. By having students write about the poem…

  3. Data Driven Dark Ages

    Science.gov (United States)

    Thurner, Stefan

    If physics is the experimental science of matter that interacts through the four basic interactions, the science of complex systems is its natural extension, where the concepts of matter and interactions are generalized. Matter can be anything that is capable of interacting, interactions can be anything that is able to change states of the constituents of a system. Complex systems are made from many constituents (parts) that interact through interaction networks. These parts are characterized by states that change over time. At the same time the interaction networks may change over time. What makes a system complex is that the states of the parts change as a function F of the interaction network (and the states), and, simultaneously, the interaction networks change as another function G of the states of the nodes (and the networks). Physics is about the predictive understanding of the dynamics and changes of states once the interactions and initial and boundary conditions are specified. In complex systems interactions also change over time, and to make things really complicated, these changes are coupled to the dynamics of the state-changes. States co-evolve with the interaction networks. In this sense complex systems often are chicken-egg problems. They are evolutionary, show emergent behavior, can be self-organized critical, show power laws, etc...

  4. Data-Driven Shakespeare

    Science.gov (United States)

    Bambrick-Santoyo, Paul

    2016-01-01

    Write first, talk second--it's a simple strategy, but one that's underused in literature classes, writes Paul Bambrick-Santoyo. The author describes a lesson on Shakespeare's Sonnet 65 conducted by a middle school English teacher, who incorporates writing as an important precursor to classroom discussion. By having students write about the poem…

  5. Master-Batch Sector Develops Rapidly

    Institute of Scientific and Technical Information of China (English)

    Wu Lifeng

    2007-01-01

    The plastic industry promotes the development of the master-batch sector. The plastic processing industry in China has developed rapidly. The output is increasing rapidly and the quality is improving constantly.

  6. Batch process. Application of CAE technique to a batch process; Bacchi purosesu eno CAE gijutsu no tenkai

    Energy Technology Data Exchange (ETDEWEB)

    Wang, Y.; Nakai, K.; Oba, S. [Aspentic Japan Co. Ltd. (Japan)

    1997-09-05

    This paper introduces recent topics in the application of the CAE technique to a batch process. A batch distillation modeling tool (BATCHFRAC) is aimed at modeling a distillation tower and a batch reactor for a batch process for fine chemical products, and is provided as an expanded additional function of ASPEN PLUS for batch distillation. A batch process designing system (BATCH PLUS) is a comprehensive batch process simulator for efficiently carrying out the design, development and analysis of complicated recipe-based batch processes concerning medicine, biotechnology and agriculture. A batch process information control system (Batch/21) is provided as a system with expanded and additional functions for a batch process of InfoPlus/21, an information control system which enables the observation, management and control of a process. 4 figs.

  7. Systematic Methodology for Reproducible Optimizing Batch Operation

    DEFF Research Database (Denmark)

    Bonné, Dennis; Jørgensen, Sten Bay

    2006-01-01

    This contribution presents a systematic methodology for rapid acquirement of discrete-time state space model representations of batch processes based on their historical operation data. These state space models are parsimoniously parameterized as a set of local, interdependent models. The present....... This controller may also be used for Optimizing control. The modeling and control performance is demonstrated on a fed-batch protein cultivation example. The presented methodologies lend themselves directly for application as Process Analytical Technologies (PAT)....

  8. Batch Extractive Distillation with Light Entrainer

    OpenAIRE

    Varga, Viktoria; Rev, Endre; Gerbaud, Vincent; Fonyo, Zsolt; Joulia, Xavier

    2006-01-01

    Use of a light entrainer in batch extractive distillation is justified when the mixture boils at a high temperature, or when an appropriate heavy or intermediate entrainer cannot be found. Feasibility of batch extractive distillation with light entrainer for separating minimum and maximum boiling azeotropes and close boiling mixtures is studied in this article. Our test mixtures are: ethanol/water (minimum boiling azeotrope) with methanol, water/ethylene diamine (maximum boiling azeotro...

  9. Batch extractive distillation with light entrainer

    OpenAIRE

    Varga, Viktoria; Rev, Endre; Gerbaud, Vincent; Lelkes, Zoltan; Fonyo, Zsolt; Joulia, Xavier

    2006-01-01

    Use of a light entrainer in batch extractive distillation is justified when the mixture boils at a high temperature, or when an appropriate heavy or intermediate entrainer cannot be found. Feasibility of batch extractive distillation with light entrainer for separating minimum and maximum boiling azeotropes and close boiling mixtures is studied in this article. Our test mixtures are: ethanol / water (minimum boiling azeotrope) with methanol, water / ethylene diamine (maximum boiling azeotrope...

  10. Development of a data-driven semi-distributed hydrological model for regional scale catchments prone to Mediterranean flash floods

    Science.gov (United States)

    Adamovic, M.; Branger, F.; Braud, I.; Kralisch, S.

    2016-10-01

    Flash floods represent one of the most destructive natural hazards in the Mediterranean region. These floods result from very intense and spatially heterogeneous rainfall events. Distributed hydrological models are valuable tools to study these phenomena and increase our knowledge of the main processes governing the generation and propagation of floods over large spatial scales. They are generally built using a bottom-up approach that generalizes small-scale physics representations of processes. However, top-down, data-driven approaches are increasingly shown to also provide valuable knowledge. A simplified semi-distributed continuous hydrological model, named SIMPLEFLOOD, was developed, based on the simple dynamical system approach (SDSA) proposed by Kirchner (WRR, 2009, 45, W02429), and applied to the Ardèche catchment in France (2388 km2). This data-driven method assumes that discharge at the outlet of a given catchment can be expressed as a function of catchment storage only. It leads to a 3-parameter nonlinear model calibrated against rainfall and runoff observations. This model was distributed over sub-catchments and coupled with a kinematic wave based flow propagation module. The parameters were estimated by discharge recession analyses at several gauged stations. Parameter regionalization was conducted using a Factorial Analysis of Mixed Data (FAMD) and Hierarchical Classification on Principal Components (HCPC) in order to find relationships between the SDSA approach and catchment characteristics. Geology was found to be the main predictor of hydrological response variability, and model parameters were regionalized according to the dominant geology. The SIMPLEFLOOD model was applied for a 12-year continuous simulation over the Ardèche catchment. Four flash flood events were also selected for further analysis. The simulated hydrographs were compared with the observations at 11 gauging stations with catchment sizes ranging from 17 to 2300 km2. The results show a good
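
Kirchner's simple dynamical system approach cited above reduces the catchment to a single equation, dQ/dt = g(Q)(P - E - Q), where the discharge sensitivity g(Q) is fitted from recession analysis, commonly with ln g quadratic in ln Q (the three parameters). A minimal sketch with hypothetical forcing and parameter values, not those of the Ardèche study:

```python
import numpy as np

def g(Q, c1, c2, c3):
    """Discharge sensitivity g(Q) = dQ/dS with ln g quadratic in ln Q,
    the 3-parameter form fitted from recession analysis in Kirchner (2009)."""
    lnQ = np.log(Q)
    return np.exp(c1 + c2 * lnQ + c3 * lnQ**2)

def simulate(P, E, Q0, params, dt=1.0):
    """Explicit-Euler integration of dQ/dt = g(Q) * (P - E - Q).
    A real implementation would use an adaptive ODE solver."""
    Q = np.empty(len(P))
    q = Q0
    for i, (p, e) in enumerate(zip(P, E)):
        q = max(q + dt * g(q, *params) * (p - e - q), 1e-6)  # keep Q > 0
        Q[i] = q
    return Q

# Hypothetical hourly forcing: a short rain pulse, then recession.
P = np.zeros(200); P[10:20] = 2.0        # precipitation [mm/h]
E = np.full(200, 0.05)                   # evapotranspiration [mm/h]
Q = simulate(P, E, Q0=0.1, params=(-2.0, 0.5, 0.0))
```

The hydrograph rises during the rain pulse and then recesses, which is the single-storage behavior SIMPLEFLOOD distributes over sub-catchments and routes with the kinematic wave module.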

  11. FireMap: A Web Tool for Dynamic Data-Driven Predictive Wildfire Modeling Powered by the WIFIRE Cyberinfrastructure

    Science.gov (United States)

    Block, J.; Crawl, D.; Artes, T.; Cowart, C.; de Callafon, R.; DeFanti, T.; Graham, J.; Smarr, L.; Srivas, T.; Altintas, I.

    2016-12-01

    The NSF-funded WIFIRE project has designed a web-based wildfire modeling simulation and visualization tool called FireMap. The tool executes FARSITE to model fire propagation using dynamic weather and fire data, configuration settings provided by the user, and static topography and fuel datasets already built-in. Using GIS capabilities combined with scalable big data integration and processing, FireMap enables simple execution of the model with options for running ensembles by taking the information uncertainty into account. The results are easily viewable, sharable, repeatable, and can be animated as a time series. From these capabilities, users can model real-time fire behavior, analyze what-if scenarios, and keep a history of model runs over time for sharing with collaborators. FireMap runs FARSITE with national and local sensor networks for real-time weather data ingestion and High-Resolution Rapid Refresh (HRRR) weather for forecasted weather. The HRRR is a NOAA/NCEP operational weather prediction system comprised of a numerical forecast model and an analysis/assimilation system to initialize the model. It is run with a horizontal resolution of 3 km, has 50 vertical levels, and has a temporal resolution of 15 minutes. The HRRR requires an Environmental Data Exchange (EDEX) server to receive the feed and generate secondary products out of it for the modeling. UCSD's EDEX server, funded by NSF, makes high-resolution weather data available to researchers worldwide and enables visualization of weather systems and weather events lasting months or even years. The high-speed server aggregates weather data from the University Consortium for Atmospheric Research by way of a subscription service from the Consortium called the Internet Data Distribution system. These features are part of WIFIRE's long term goals to build an end-to-end cyberinfrastructure for real-time and data-driven simulation, prediction and visualization of wildfire behavior. Although FireMap is a

  12. First steps in incorporating data-driven modelling to flood early warning in Norway's Flood Forecasting Service

    Science.gov (United States)

    Borsányi, Péter; Hamududu, Byman; Wong Kwok, Wai; Magnusson, Jan; Shi, Min

    2016-04-01

    The national Flood Early Warning Services (FEWS) in Norway use time-series of precipitation and temperature data as input to conceptual, physically based rainfall-runoff models for forecasts. Runoff is forecasted in selected catchments and the warnings are based on regionalization of these. This concept has proved useful in many catchments; however, there are some exceptions where forecasts are of worse quality. To improve this, data-driven modelling (DDM) techniques are being explored. The first objective of the study is to identify those DDM methods which are feasible for application and can easily fit into the present, well-developed procedures of the operational FEWS. Therefore an experiment was conducted, where about thirty years of daily accumulated precipitation and daily mean temperature as input and observed runoff as output data were used. This was repeated for five regionally and physically different catchments. In each case different DDMs were developed and their simulation results compared to those generated by the operational (conceptual) models and to the observations. The methods of Artificial Neural Networks, Genetic Programming, Evolutionary Polynomial Regression and Support Vector Machines were used in the experiment. Various combinations of the last, the last two and the last three timesteps (in this case: days) of the data were tested as possible inputs. Forecast quality was described by Absolute Accumulated Error, Root Mean Square Error, Nash-Sutcliffe Efficiency, the Ideal Point Error (a combination of the previous) as well as by Taylor diagrams. The first comparisons show promising results, which need to be further examined. The follow-up study will first focus on standardizing and automating the tests on forecast quality to be able to perform the studies on a larger number of datasets, as well as for other forecast periods. We expect the DDM to perform better in cases where conceptual models don't perform well. In these cases the quality
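
Two of the forecast-quality metrics named above, root mean square error and Nash-Sutcliffe efficiency, have simple closed forms; a minimal sketch with made-up discharge series:

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observed and simulated series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 means the model
    is no better than always predicting the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

# Hypothetical observed and simulated runoff values.
obs = np.array([1.0, 2.0, 4.0, 3.0, 2.5])
sim = np.array([1.1, 1.8, 3.9, 3.2, 2.4])
```

Because NSE normalizes by the variance of the observations, it allows comparison across the five catchments in the experiment, which plain RMSE does not.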

  13. PEPSI-Dock: a detailed data-driven protein-protein interaction potential accelerated by polar Fourier correlation.

    Science.gov (United States)

    Neveu, Emilie; Ritchie, David W; Popov, Petr; Grudinin, Sergei

    2016-09-01

    Docking prediction algorithms aim to find the native conformation of a complex of proteins from knowledge of their unbound structures. They rely on a combination of sampling and scoring methods, adapted to different scales. Polynomial Expansion of Protein Structures and Interactions for Docking (PEPSI-Dock) improves the accuracy of the first stage of the docking pipeline, which will sharpen up the final predictions. Indeed, PEPSI-Dock benefits from the precision of a very detailed data-driven model of the binding free energy used with a global and exhaustive rigid-body search space. As well as being accurate, our computations are among the fastest by virtue of the sparse representation of the pre-computed potentials and FFT-accelerated sampling techniques. Overall, this is the first demonstration of a FFT-accelerated docking method coupled with an arbitrary-shaped distance-dependent interaction potential. First, we present a novel learning process to compute data-driven distant-dependent pairwise potentials, adapted from our previous method used for rescoring of putative protein-protein binding poses. The potential coefficients are learned by combining machine-learning techniques with physically interpretable descriptors. Then, we describe the integration of the deduced potentials into a FFT-accelerated spherical sampling provided by the Hex library. Overall, on a training set of 163 heterodimers, PEPSI-Dock achieves a success rate of 91% mid-quality predictions in the top-10 solutions. On a subset of the protein docking benchmark v5, it achieves 44.4% mid-quality predictions in the top-10 solutions when starting from bound structures and 20.5% when starting from unbound structures. The method runs in 5-15 min on a modern laptop and can easily be extended to other types of interactions. https://team.inria.fr/nano-d/software/PEPSI-Dock sergei.grudinin@inria.fr. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e

  14. Performance Evaluation and Online Realization of Data-driven Normalization Methods Used in LC/MS based Untargeted Metabolomics Analysis.

    Science.gov (United States)

    Li, Bo; Tang, Jing; Yang, Qingxia; Cui, Xuejiao; Li, Shuang; Chen, Sijie; Cao, Quanxing; Xue, Weiwei; Chen, Na; Zhu, Feng

    2016-12-13

    In untargeted metabolomics analysis, several factors (e.g., unwanted experimental and biological variations and technical errors) may hamper the identification of differential metabolic features, which requires data-driven normalization approaches before feature selection. So far, ≥16 normalization methods have been widely applied for processing LC/MS based metabolomics data. However, the performance and the sample-size dependence of those methods have not yet been exhaustively compared, and no online tool for comparatively and comprehensively evaluating the performance of all 16 normalization methods has been provided. In this study, a comprehensive comparison of these methods was conducted. As a result, the 16 methods were categorized into three groups based on their normalization performance across various sample sizes. The VSN, the Log Transformation and the PQN were identified as the methods with the best normalization performance, while the Contrast method consistently underperformed across all sub-datasets of different benchmark data. Moreover, an interactive web tool comprehensively evaluating the performance of the 16 methods specifically for normalizing LC/MS based metabolomics data was constructed and hosted at http://server.idrb.cqu.edu.cn/MetaPre/. In summary, this study can serve as useful guidance for the selection of suitable normalization methods in analyzing LC/MS based metabolomics data.
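
    Of the top-performing methods named above, the PQN (probabilistic quotient normalization) is simple enough to sketch directly. The snippet below is a minimal illustration of the standard PQN recipe (the function name and the integral-normalization first pass are common textbook choices, not code from the MetaPre tool):

    ```python
    import statistics

    def pqn_normalize(samples):
        """Probabilistic quotient normalization (PQN) sketch.
        samples: list of spectra (lists of non-negative intensities)."""
        # 1. Integral (total-intensity) normalization as a first pass.
        scaled = [[v / sum(s) for v in s] for s in samples]
        # 2. Reference spectrum: feature-wise median across samples.
        n_feat = len(scaled[0])
        reference = [statistics.median(s[i] for s in scaled) for i in range(n_feat)]
        # 3. The most probable dilution factor of each sample is the
        #    median of its feature-wise quotients against the reference.
        normalized = []
        for s in scaled:
            quotients = [v / r for v, r in zip(s, reference) if r > 0]
            factor = statistics.median(quotients)
            normalized.append([v / factor for v in s])
        return normalized
    ```

    Two spectra that differ only by dilution map to the same normalized spectrum, which is the property PQN is designed to provide.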

  15. Data-driven Radiative Hydrodynamic Modeling of the 2014 March 29 X1.0 Solar Flare

    CERN Document Server

    da Costa, Fatima Rubio; Petrosian, Vahé; Liu, Wei; Allred, Joel C

    2016-01-01

    Spectroscopic observations of solar flares provide critical diagnostics of the physical conditions in the flaring atmosphere. Some key features in observed spectra have not yet been accounted for in existing flare models. Here we report a data-driven simulation of the well-observed X1.0 flare on 2014 March 29 that can reconcile some well-known spectral discrepancies. We analyzed spectra of the flaring region from the Interface Region Imaging Spectrograph (IRIS) in MgII h&k, the Interferometric BIdimensional Spectropolarimeter at the Dunn Solar Telescope (DST/IBIS) in H$\alpha$ 6563 {\AA} and CaII 8542 {\AA}, and the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI) in hard X-rays. We constructed a multi-threaded flare loop model and used the electron flux inferred from RHESSI data as the input to the radiative hydrodynamic code RADYN to simulate the atmospheric response. We then synthesized various chromospheric emission lines and compared them with the IRIS and IBIS observations. In general, t...

  16. Data-Driven Techniques for Detecting Dynamical State Changes in Noisily Measured 3D Single-Molecule Trajectories

    Directory of Open Access Journals (Sweden)

    Christopher P. Calderon

    2014-11-01

    Full Text Available Optical microscopes and nanoscale probes (AFM, optical tweezers, etc.) afford researchers tools capable of quantitatively exploring how molecules interact with one another in live cells. The analysis of in vivo single-molecule experimental data faces numerous challenges due to the complex, crowded, and time-changing environments associated with live cells. Fluctuations and spatially varying systematic forces experienced by molecules change over time; these changes are obscured by “measurement noise” introduced by the experimental probe monitoring the system. In this article, we demonstrate how the Hierarchical Dirichlet Process Switching Linear Dynamical System (HDP-SLDS) of Fox et al. [IEEE Transactions on Signal Processing 59] can be used to detect both subtle and abrupt state changes in time series containing “thermal” and “measurement” noise. The approach accounts for temporal dependencies induced by random and “systematic overdamped” forces. The technique does not require one to subjectively select the number of “hidden states” underlying a trajectory in an a priori fashion. The number of hidden states is simultaneously inferred along with change points and parameters characterizing molecular motion in a data-driven fashion. We use large scale simulations to study and compare the new approach to state-of-the-art Hidden Markov Modeling techniques. Simulations mimicking single particle tracking (SPT) experiments are the focus of this study.

  17. Using data-driven discrete-time models and the unscented Kalman filter to estimate unobserved variables of nonlinear systems

    Science.gov (United States)

    Aguirre, Luis Antonio; Teixeira, Bruno Otávio S.; Tôrres, Leonardo Antônio B.

    2005-08-01

    This paper addresses the problem of state estimation for nonlinear systems by means of the unscented Kalman filter (UKF). Compared to the traditional extended Kalman filter, the UKF does not require the local linearization of the system equations used in the propagation stage. Important results using the UKF have been reported recently but in every case the system equations used by the filter were considered known. Not only that, such models are usually considered to be differential equations, which requires that numerical integration be performed during the propagation phase of the filter. In this paper the dynamical equations of the system are taken to be difference equations—thus avoiding numerical integration—and are built from data without prior knowledge. The identified models are subsequently implemented in the filter in order to accomplish state estimation. The paper discusses the impact of not knowing the exact equations and using data-driven models in the context of state and joint state-and-parameter estimation. The procedure is illustrated by means of examples that use simulated and measured data.
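    As a sketch of the idea, here is a scalar unscented Kalman filter whose propagation step calls an identified difference-equation model directly, so no numerical integration is needed. The system, noise variances and sigma-point parameters below are illustrative assumptions, not values from the paper:

    ```python
    import math

    def ukf_step(x, P, y, f, h, Q, R, alpha=1.0, beta=0.0, kappa=2.0):
        """One predict/update cycle of a scalar unscented Kalman filter.
        f: identified difference-equation model, x_{k+1} = f(x_k)
           (a difference equation, so no integration is required);
        h: measurement model y_k = h(x_k); Q, R: noise variances."""
        n = 1
        lam = alpha ** 2 * (n + kappa) - n
        c = n + lam
        wm = [lam / c, 0.5 / c, 0.5 / c]
        wc = [lam / c + (1 - alpha ** 2 + beta), 0.5 / c, 0.5 / c]
        # Predict: push sigma points through the identified model f.
        s = math.sqrt(c * P)
        sig = [f(p) for p in (x, x + s, x - s)]
        x_pred = sum(w * p for w, p in zip(wm, sig))
        P_pred = Q + sum(w * (p - x_pred) ** 2 for w, p in zip(wc, sig))
        # Update: redraw sigma points and push them through h.
        s = math.sqrt(c * P_pred)
        sx = [x_pred, x_pred + s, x_pred - s]
        sy = [h(p) for p in sx]
        y_pred = sum(w * p for w, p in zip(wm, sy))
        Pyy = R + sum(w * (p - y_pred) ** 2 for w, p in zip(wc, sy))
        Pxy = sum(w * (p - x_pred) * (q - y_pred) for w, p, q in zip(wc, sx, sy))
        K = Pxy / Pyy
        return x_pred + K * (y - y_pred), P_pred - K * K * Pyy
    ```

    In practice `f` would be the data-driven model identified from measurements; here any callable works, which is exactly the point made in the abstract.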

  18. Sensitivity of a data-driven soil water balance model to estimate summer evapotranspiration along a forest chronosequence

    Directory of Open Access Journals (Sweden)

    J. A. Breña Naranjo

    2011-11-01

    Full Text Available The hydrology of ecosystem succession gives rise to new challenges for the analysis and modelling of water balance components. Recent large-scale alterations of forest cover across the globe suggest that a significant portion of new biophysical environments will influence the long-term dynamics and limits of water fluxes compared to pre-succession conditions. This study assesses the estimation of summer actual evapotranspiration (AET) along three FLUXNET sites at Campbell River, British Columbia, Canada using a data-driven soil water balance model validated by Eddy Covariance measurements. It explores the sensitivity of the model to different forest succession states, a wide range of computational time steps, rooting depths, and canopy interception capacity values. Uncertainty in the measured EC fluxes resulting in an energy imbalance was consistent with previous studies and does not affect the validation of the model. The agreement between observations and model estimates shows that the usefulness of the method for predicting summer AET over mid- and long-term periods is independent of stand age. However, an optimal combination of the parameters rooting depth, time step and interception capacity threshold is needed to avoid an underestimation of AET as seen in past studies. The study suggests that summer AET could be estimated and monitored in many more places than those equipped with Eddy Covariance or sap-flow measurements to advance the understanding of water balance changes in different successional ecosystems.

  19. A New Application of Dynamic Data Driven System in the Talbot-Ogden Model for Groundwater Infiltration

    KAUST Repository

    Yu, Han

    2012-06-02

    The Talbot-Ogden model is a mass conservative method to simulate flow of a wetting liquid in variably-saturated porous media. The principal feature of this model is the discretization of the moisture content domain into bins. This paper gives an analysis of the relationship between the number of bins and the computed flux. Under the circumstances of discrete bins and discontinuous wetting fronts, we show that fluxes increase with the number of bins. We then apply this analysis to the continuous case and get an upper bound of the difference of infiltration rates when the number of bins tends to infinity. We also extend this model by creating a two dimensional moisture content domain so that there exists a probability distribution of the moisture content for different soil systems. With these theoretical and experimental results and using a Dynamic Data Driven Application System (DDDAS), sensors can be put in soils to detect the infiltration fluxes, which are important to compute the proper number of bins for a specific soil system and predict fluxes. Using this feedback control loop, the extended Talbot-Ogden model can be made more efficient for estimating infiltration into soils.

  20. Customized maximal-overlap multiwavelet denoising with data-driven group threshold for condition monitoring of rolling mill drivetrain

    Science.gov (United States)

    Chen, Jinglong; Wan, Zhiguo; Pan, Jun; Zi, Yanyang; Wang, Yu; Chen, Binqiang; Sun, Hailiang; Yuan, Jing; He, Zhengjia

    2016-02-01

    Timely fault identification for rolling mill drivetrains is significant for guaranteeing product quality and realizing long-term safe operation. Therefore, a condition monitoring system for the rolling mill drivetrain was designed and developed. However, because compound-fault and weak-fault feature information is usually submerged in heavy background noise, this task still faces challenges. This paper provides a possibility for fault identification of rolling mill drivetrains by proposing a customized maximal-overlap multiwavelet denoising method. The effectiveness of a wavelet denoising method mainly relies on the appropriate selection of the wavelet base, transform strategy and threshold rule. First, in order to realize exact matching and accurate detection of fault features, a customized multiwavelet basis function is constructed via a symmetric lifting scheme, and the vibration signal is then processed by the maximal-overlap multiwavelet transform. Next, based on the spatial dependency of multiwavelet transform coefficients, a spatial neighboring coefficient data-driven group threshold shrinkage strategy is developed for the denoising process by choosing the optimal group length and threshold via the minimum of Stein's Unbiased Risk Estimate. The effectiveness of the proposed method is first demonstrated through compound fault identification of a reduction gearbox on a rolling mill. It is then applied to weak fault identification of a dedusting fan bearing on a rolling mill, and the results support its feasibility.
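
    The threshold rule at the heart of such methods minimises Stein's Unbiased Risk Estimate (SURE). A simplified scalar soft-threshold version is sketched below; the paper applies an analogous criterion to groups of neighbouring multiwavelet coefficients with a data-chosen group length, which this sketch does not attempt to reproduce:

    ```python
    def sure_soft_threshold(coeffs, sigma=1.0):
        """Soft-threshold denoising with the threshold chosen by
        minimizing SURE over the observed coefficient magnitudes.
        Scalar (per-coefficient) version; the paper's rule shrinks
        groups of neighbouring multiwavelet coefficients instead."""
        x = [c / sigma for c in coeffs]          # unit-variance scale
        n = len(x)
        best_t, best_risk = 0.0, float("inf")
        for t in sorted(abs(v) for v in x):      # candidate thresholds
            # SURE for soft thresholding with unit noise variance
            risk = (n - 2 * sum(1 for v in x if abs(v) <= t)
                    + sum(min(abs(v), t) ** 2 for v in x))
            if risk < best_risk:
                best_t, best_risk = t, risk
        t = best_t * sigma
        return [max(abs(c) - t, 0.0) * (1 if c >= 0 else -1) for c in coeffs]
    ```

    On a vector with a few large (signal) and many small (noise) coefficients, the SURE-selected threshold zeroes the noise-level entries while only slightly shrinking the large ones.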

  1. An ISA-TAB-Nano based data collection framework to support data-driven modelling of nanotoxicology

    Directory of Open Access Journals (Sweden)

    Richard L. Marchese Robinson

    2015-10-01

    Full Text Available Analysis of trends in nanotoxicology data and the development of data-driven models for nanotoxicity are facilitated by the reporting of data using a standardised electronic format. ISA-TAB-Nano has been proposed as such a format. However, in order to build useful datasets according to this format, a variety of issues have to be addressed. These issues include questions regarding exactly which (meta)data to report and how to report them. The current article discusses some of the challenges associated with the use of ISA-TAB-Nano and presents a set of resources designed to facilitate the manual creation of ISA-TAB-Nano datasets from the nanotoxicology literature. These resources were developed within the context of the NanoPUZZLES EU project and include data collection templates, corresponding business rules that extend the generic ISA-TAB-Nano specification, as well as Python code to facilitate parsing and integration of these datasets within other nanoinformatics resources. The use of these resources is illustrated by a “Toy Dataset” presented in the Supporting Information. The strengths and weaknesses of the resources are discussed along with possible future developments.

  2. An optimal baseline selection methodology for data-driven damage detection and temperature compensation in acousto-ultrasonics

    Science.gov (United States)

    Torres-Arredondo, M.-A.; Sierra-Pérez, Julián; Cabanes, Guénaël

    2016-05-01

    The process of measuring and analysing the data from a distributed sensor network all over a structural system in order to quantify its condition is known as structural health monitoring (SHM). For the design of a trustworthy health monitoring system, a vast amount of information regarding the inherent physical characteristics of the sources and their propagation and interaction across the structure is crucial. Moreover, any SHM system which is expected to transition to field operation must take into account the influence of environmental and operational changes which cause modifications in the stiffness and damping of the structure and consequently modify its dynamic behaviour. On that account, special attention is paid in this paper to the development of an efficient SHM methodology where robust signal processing and pattern recognition techniques are integrated for the correct interpretation of complex ultrasonic waves within the context of damage detection and identification. The methodology is based on an acousto-ultrasonics technique where the discrete wavelet transform is evaluated for feature extraction and selection, linear principal component analysis for data-driven modelling and self-organising maps for a two-level clustering under the principle of local density. Finally, the methodology is experimentally demonstrated, and the results show that all damage cases were detectable and identifiable.

  3. Data-Driven Tracking Control With Adaptive Dynamic Programming for a Class of Continuous-Time Nonlinear Systems.

    Science.gov (United States)

    Mu, Chaoxu; Ni, Zhen; Sun, Changyin; He, Haibo

    2016-04-22

    A data-driven adaptive tracking control approach is proposed for a class of continuous-time nonlinear systems using a recently developed goal representation heuristic dynamic programming (GrHDP) architecture. The major focus of this paper is on designing a multivariable tracking scheme, including the filter-based action network (FAN) architecture, and on the stability analysis in a continuous-time fashion. In this design, the FAN is used to observe the system function and then generate the corresponding control action together with the reference signals. The goal network provides an internal reward signal adaptively based on the current system states and the control action. This internal reward signal serves as the input for the critic network, which approximates the cost function over time. We demonstrate the improved tracking performance in comparison with the existing heuristic dynamic programming (HDP) approach under the same parameter and environment settings. Simulation results of the multivariable tracking control on two examples are presented to show that the proposed scheme can achieve better control in terms of learning speed and overall performance.

  4. GIS-Based and Data-Driven Bivariate Landslide-Susceptibility Mapping in the Three Gorges Area, China

    Institute of Scientific and Technical Information of China (English)

    BAI Shi-Biao; WANG Jian; LÜ Guo-Nian; ZHOU Ping-Gen; HOU Sheng-Shan; XU Su-Ning

    2009-01-01

    A detailed landslide-susceptibility map was produced using a data-driven objective bivariate analysis method with datasets developed for a geographic information system (GIS). Known as one of the most landslide-prone areas in China, the Zhongxian-Shizhu Segment in the Three Gorges Reservoir region of China was selected as a suitable case because of the frequency and distribution of landslides. The site covered an area of 260.93 km2 with a landslide area of 5.32 km2. Four data domains were used in this study, including remote sensing products, thematic maps, geological maps, and topographical maps, all with 25 m × 25 m pixels. Statistical relationships for landslide susceptibility were developed using landslide and landslide-causative-factor databases. All continuous variables were converted to categorical variables according to the percentile divisions of seed cells, and the corresponding class weight values were calculated and summed to create the susceptibility map. According to the map, 3.6% of the study area was identified as high-susceptibility. Extremely low-, very low-, low-, and medium-susceptibility zones covered 19.66%, 31.69%, 27.95%, and 17.1% of the area, respectively. The high- and medium-hazard zones lie along both sides of the Yangtze River, in agreement with the actual distribution of landslides.
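
    A common data-driven bivariate weighting consistent with the description above is the information-value method: each class of a causative-factor layer receives the log-ratio of its landslide density to the overall density, and summing the class weights across layers at each pixel yields the susceptibility score. A generic sketch (not necessarily the exact weighting used in the paper):

    ```python
    import math

    def information_value(class_pixels, slide_pixels):
        """Information-value weights for the classes of one factor layer.
           W_i = ln( (slide_i / slide_total) / (pixels_i / pixels_total) )
        Positive W_i: the class hosts more landslides than average;
        classes with no landslides get -inf (handled as 'very stable')."""
        pixels_total = sum(class_pixels)
        slide_total = sum(slide_pixels)
        return [math.log((s / slide_total) / (p / pixels_total))
                if s > 0 else float("-inf")
                for p, s in zip(class_pixels, slide_pixels)]
    ```

    For example, a class covering half the area but hosting 80% of the landslide pixels gets weight ln(0.8/0.5) > 0, flagging it as susceptibility-increasing.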

  5. Data-driven inference of network connectivity for modeling the dynamics of neural codes in the insect antennal lobe

    Directory of Open Access Journals (Sweden)

    Eli Shlizerman

    2014-08-01

    Full Text Available The antennal lobe (AL), the olfactory processing center in insects, is able to process stimuli into distinct neural activity patterns, called olfactory neural codes. To model their dynamics we perform multichannel recordings from the projection neurons in the AL driven by different odorants. We then derive a dynamic neuronal network from the electrophysiological data. The network consists of lateral-inhibitory neurons and excitatory neurons (modeled as firing-rate units), and is capable of producing unique olfactory neural codes for the tested odorants. To construct the network, we (i) design a projection, an odor space, for the neural recordings from the AL that discriminates between distinct odorant trajectories; (ii) characterize scent recognition, i.e., decision-making based on olfactory signals; and (iii) infer the wiring of the neural circuit, the connectome of the AL. We show that the constructed model is consistent with biological observations, such as contrast enhancement and robustness to noise. The study suggests a data-driven approach to answer a key biological question in identifying how lateral inhibitory neurons can be wired to excitatory neurons to permit robust activity patterns.
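
    Firing-rate networks of the kind described above are typically of the form tau*dr/dt = -r + f(W r + I). A minimal forward-Euler sketch with a made-up two-unit excitatory/inhibitory connectivity (the real W would be inferred from the multichannel recordings):

    ```python
    import math

    def simulate_rates(W, I, steps=200, dt=0.1, tau=1.0):
        """Integrate tau*dr/dt = -r + f(W r + I) with forward Euler,
        where f is a sigmoid. W and I here are illustrative only; in the
        paper the connectivity is inferred from electrophysiological data."""
        n = len(W)
        r = [0.0] * n
        f = lambda x: 1.0 / (1.0 + math.exp(-x))
        for _ in range(steps):
            drive = [sum(W[i][j] * r[j] for j in range(n)) + I[i]
                     for i in range(n)]
            r = [ri + dt / tau * (-ri + f(d)) for ri, d in zip(r, drive)]
        return r
    ```

    With one excitatory and one inhibitory unit the rates settle to a fixed point r = f(W r + I), the stationary "code" for the given input I.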

  6. Data-driven honeybee antennal lobe model suggests how stimulus-onset asynchrony can aid odour segregation.

    Science.gov (United States)

    Nowotny, Thomas; Stierle, Jacob S; Galizia, C Giovanni; Szyszka, Paul

    2013-11-01

    Insects have a remarkable ability to identify and track odour sources in multi-odour backgrounds. Recent behavioural experiments show that this ability relies on detecting millisecond stimulus asynchronies between odourants that originate from different sources. Honeybees, Apis mellifera, are able to distinguish mixtures where both odourants arrive at the same time (synchronous mixtures) from those where odourant onsets are staggered (asynchronous mixtures) down to an onset delay of only 6 ms. In this paper we explore this surprising ability in a model of the insects' primary olfactory brain area, the antennal lobe. We hypothesize that a winner-take-all inhibitory network of local neurons in the antennal lobe has a symmetry-breaking effect, such that the response pattern in projection neurons to an asynchronous mixture is different from the response pattern to the corresponding synchronous mixture for an extended period of time beyond the initial odourant onset where the two mixture conditions actually differ. The prolonged difference between response patterns to synchronous and asynchronous mixtures could facilitate odour segregation in downstream circuits of the olfactory pathway. We present a detailed data-driven model of the bee antennal lobe that reproduces a large data set of experimentally observed physiological odour responses, successfully implements the hypothesised symmetry-breaking mechanism and so demonstrates that this mechanism is consistent with our current knowledge of the olfactory circuits in the bee brain. This article is part of a Special Issue entitled Neural Coding 2012.

  7. A data-driven multi-model methodology with deep feature selection for short-term wind forecasting

    Energy Technology Data Exchange (ETDEWEB)

    Feng, Cong; Cui, Mingjian; Hodge, Bri-Mathias; Zhang, Jie

    2017-03-01

    With the growing wind penetration into the power system worldwide, improving wind power forecasting accuracy is becoming increasingly important to ensure continued economic and reliable power system operations. In this paper, a data-driven multi-model wind forecasting methodology is developed with a two-layer ensemble machine learning technique. The first layer is composed of multiple machine learning models that generate individual forecasts. A deep feature selection framework is developed to determine the most suitable inputs to the first-layer machine learning models. Then, a blending algorithm is applied in the second layer to create an ensemble of the forecasts produced by the first-layer models and generate both deterministic and probabilistic forecasts. This two-layer model seeks to utilize the statistically different characteristics of each machine learning algorithm. A number of machine learning algorithms are selected and compared in both layers. The developed multi-model wind forecasting methodology is compared to several benchmarks. The effectiveness of the proposed methodology is evaluated on 1-hour-ahead wind speed forecasting at seven locations of the Surface Radiation Network. Numerical results show that, compared to single-algorithm models, the developed multi-model framework with the deep feature selection procedure improves the forecasting accuracy by up to 30%.
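
    A toy version of the two-layer idea: several simple first-layer forecasters whose outputs are blended in a second layer. Here the blend weights are just inverse mean-squared errors on past data, a deliberately simple stand-in for the trained blending algorithm of the paper (all model choices below are illustrative):

    ```python
    def persistence(history):            # layer 1, model A: last value
        return history[-1]

    def moving_average(history, k=3):    # layer 1, model B: short mean
        return sum(history[-k:]) / min(k, len(history))

    def ar1(history):                    # layer 1, model C: LS AR(1) fit
        num = sum(x * y for x, y in zip(history[:-1], history[1:]))
        den = sum(x * x for x in history[:-1])
        return (num / den) * history[-1] if den else history[-1]

    def blended_forecast(series, models=(persistence, moving_average, ar1),
                         warmup=5):
        """Layer 2: blend layer-1 forecasts, weighting each model by the
        inverse of its accumulated squared error on the past data."""
        errors = [1e-9] * len(models)
        for t in range(warmup, len(series)):
            hist = series[:t]
            for i, m in enumerate(models):
                errors[i] += (m(hist) - series[t]) ** 2
        weights = [1.0 / e for e in errors]
        preds = [m(series) for m in models]
        return sum(w * p for w, p in zip(weights, preds)) / sum(weights)
    ```

    On a series that one model explains well, its historical error shrinks and the blend leans almost entirely on it, which is the behaviour the ensemble is designed to exploit.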

  8. Data-driven methods towards learning the highly nonlinear inverse kinematics of tendon-driven surgical manipulators.

    Science.gov (United States)

    Xu, Wenjun; Chen, Jie; Lau, Henry Y K; Ren, Hongliang

    2017-09-01

    Accurate motion control of flexible surgical manipulators is crucial in tissue manipulation tasks. The tendon-driven serpentine manipulator (TSM) is one of the most widely adopted flexible mechanisms in minimally invasive surgery because of its enhanced maneuverability in tortuous environments. The TSM, however, exhibits high nonlinearities, and the conventional analytical kinematics model is insufficient to achieve high accuracy. To account for the system nonlinearities, we applied a data-driven approach to encode the system inverse kinematics. Three regression methods, extreme learning machine (ELM), Gaussian mixture regression (GMR) and K-nearest neighbors regression (KNNR), were implemented to learn a nonlinear mapping from the robot 3D position states to the control inputs. The performance of the three algorithms was evaluated in both simulation and physical trajectory tracking experiments. KNNR performed the best in the tracking experiments, with the lowest RMSE of 2.1275 mm. The proposed inverse kinematics learning methods provide an alternative and efficient way to accurately model the tendon-driven flexible manipulator. Copyright © 2016 John Wiley & Sons, Ltd.
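
    KNNR, the best performer here, is also the simplest to sketch: predict the control input for a queried tip position by averaging the control inputs of the nearest training poses. A uniform-weight Euclidean version (the paper does not specify its exact KNNR variant, so the details below are assumptions):

    ```python
    import math

    def knn_regress(train_X, train_y, query, k=3):
        """K-nearest-neighbours regression sketch: the predicted control
        input for a queried 3D position is the average of the control
        inputs of the k closest training poses (Euclidean distance,
        uniform weights)."""
        pairs = sorted(zip(train_X, train_y),
                       key=lambda p: math.dist(p[0], query))
        nearest = [y for _, y in pairs[:k]]
        return [sum(vals) / k for vals in zip(*nearest)]
    ```

    Because the mapping is stored rather than parametrised, KNNR handles the manipulator's nonlinearity without assuming any kinematic model, at the cost of needing dense training data near the workspace of interest.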

  9. Combining density functional theory calculations, supercomputing, and data-driven methods to design new materials (Conference Presentation)

    Science.gov (United States)

    Jain, Anubhav

    2017-04-01

    Density functional theory (DFT) simulations solve for the electronic structure of materials starting from the Schrödinger equation. Many case studies have now demonstrated that researchers can often use DFT to design new compounds in the computer (e.g., for batteries, catalysts, and hydrogen storage) before synthesis and characterization in the lab. In this talk, I will focus on how DFT calculations can be executed on large supercomputing resources in order to generate very large data sets on new materials for functional applications. First, I will briefly describe the Materials Project, an effort at LBNL that has virtually characterized over 60,000 materials using DFT and has shared the results with over 17,000 registered users. Next, I will talk about how such data can help discover new materials, describing how preliminary computational screening led to the identification and confirmation of a new family of bulk AMX2 thermoelectric compounds with measured zT reaching 0.8. I will outline future plans for how such data-driven methods can be used to better understand the factors that control thermoelectric behavior, e.g., for the rational design of electronic band structures, in ways that are different from conventional approaches.

  10. A new data-driven model for post-transplant antibody dynamics in high risk kidney transplantation.

    Science.gov (United States)

    Zhang, Yan; Briggs, David; Lowe, David; Mitchell, Daniel; Daga, Sunil; Krishnan, Nithya; Higgins, Robert; Khovanova, Natasha

    2017-02-01

    The dynamics of donor-specific human leukocyte antigen antibodies during the early stage after kidney transplantation are of great clinical interest, as these antibodies are considered to be associated with short- and long-term clinical outcomes. The limited number of antibody time series and their diverse patterns have made the task of modelling difficult. Focusing on one typical post-transplant dynamic pattern with rapid falls and stable settling levels, a novel data-driven model has been developed for the first time. A variational Bayesian inference method has been applied to select the best model and learn its parameters for 39 time series from two groups of graft recipients, i.e. patients with and without acute antibody-mediated rejection (AMR) episodes. Linear and nonlinear dynamic models of different orders were fitted to the time series, and the third-order linear model provided the best description of the common features in both groups. Both deterministic and stochastic parameters are found to be significantly different in the AMR and no-AMR groups, showing that the time series in the AMR group have a significantly higher frequency of oscillations and faster dissipation rates. This research may potentially lead to a better understanding of the immunological mechanisms involved in kidney transplantation.
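
    To illustrate what a third-order linear model with a rapid fall and a stable settling level looks like, the sketch below simulates such a difference equation with made-up coefficients (one fast real pole plus a damped oscillatory pair; nothing here is fitted to clinical data):

    ```python
    def simulate_third_order(y_init, a, c, n):
        """Simulate a third-order linear difference model
           y_k = a[0]*y_{k-1} + a[1]*y_{k-2} + a[2]*y_{k-3} + c.
        y_init: the three most recent starting values."""
        y = list(y_init)
        for _ in range(n):
            y.append(a[0] * y[-1] + a[1] * y[-2] + a[2] * y[-3] + c)
        return y

    # Illustrative coefficients: poles at 0.5 and 0.6*exp(+/- i*pi/4),
    # i.e. a fast real decay plus a damped oscillation inside the unit circle.
    a = [1.348528, -0.784264, 0.18]
    trajectory = simulate_third_order([100.0, 95.0, 90.0], a, c=5.0, n=300)
    ```

    The trajectory falls rapidly from its initial level and settles at the steady state c / (1 - a[0] - a[1] - a[2]), the "stable settling level" pattern the paper models.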

  11. Nonlinear data-driven identification of polymer electrolyte membrane fuel cells for diagnostic purposes: A Volterra series approach

    Science.gov (United States)

    Ritzberger, D.; Jakubek, S.

    2017-09-01

    In this work, a data-driven identification method, based on polynomial nonlinear autoregressive models with exogenous inputs (NARX) and the Volterra series, is proposed to describe the dynamic and nonlinear voltage and current characteristics of polymer electrolyte membrane fuel cells (PEMFCs). The structure selection and parameter estimation of the NARX model are performed on broad-band voltage/current data. By transforming the time-domain NARX model into a Volterra series representation using the harmonic probing algorithm, a frequency-domain description of the linear and nonlinear dynamics is obtained. With the Volterra kernels corresponding to different operating conditions, information from existing diagnostic tools in the frequency domain, such as electrochemical impedance spectroscopy (EIS) and total harmonic distortion analysis (THDA), is effectively combined. Additionally, the time-domain NARX model can be utilized for fault detection by evaluating the difference between measured and simulated output. To increase the fault detectability, an optimization problem is introduced which maximizes this output residual to obtain proper excitation frequencies. As a possible extension, it is shown that, by optimizing the periodic signal shape itself, the fault detectability can be further increased.
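
    A toy polynomial NARX fit can be written as ordinary least squares over a hand-picked regressor set. The model structure below is invented for illustration; the paper performs structure selection on broad-band voltage/current data rather than fixing the terms in advance:

    ```python
    def solve(A, b):
        """Gaussian elimination with partial pivoting (small systems)."""
        n = len(A)
        M = [row[:] + [bi] for row, bi in zip(A, b)]
        for i in range(n):
            p = max(range(i, n), key=lambda r: abs(M[r][i]))
            M[i], M[p] = M[p], M[i]
            for r in range(i + 1, n):
                f = M[r][i] / M[i][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = (M[i][n] - sum(M[i][j] * x[j]
                                  for j in range(i + 1, n))) / M[i][i]
        return x

    def fit_narx(u, y):
        """Least-squares fit of a toy polynomial NARX model
           y_k = a*y_{k-1} + b*u_{k-1} + c*y_{k-1}**2
        (regressor set chosen for illustration only)."""
        phi = [[y[k - 1], u[k - 1], y[k - 1] ** 2] for k in range(1, len(y))]
        target = y[1:]
        # Normal equations: (Phi^T Phi) theta = Phi^T y
        G = [[sum(r[i] * r[j] for r in phi) for j in range(3)] for i in range(3)]
        h = [sum(r[i] * t for r, t in zip(phi, target)) for i in range(3)]
        return solve(G, h)
    ```

    On noiseless data generated by the same structure, the least-squares fit recovers the true coefficients exactly, which is the sanity check usually run before attempting structure selection on real data.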

  12. PubChemQC Project: A Large-Scale First-Principles Electronic Structure Database for Data-Driven Chemistry.

    Science.gov (United States)

    Nakata, Maho; Shimazaki, Tomomi

    2017-06-26

    Large-scale molecular databases play an essential role in the investigation of various subjects such as the development of organic materials, in silico drug design, and data-driven studies with machine learning. We have developed a large-scale quantum chemistry database based on first-principles methods. Our database currently contains the ground-state electronic structures of 3 million molecules based on density functional theory (DFT) at the B3LYP/6-31G* level, and we successively calculated 10 low-lying excited states of over 2 million molecules via time-dependent DFT with the B3LYP functional and the 6-31+G* basis set. To select the molecules calculated in our project, we referred to the PubChem Project, which was used as the source of the molecular structures, given as short strings in the InChI and SMILES representations. Accordingly, we have named our quantum chemistry database project "PubChemQC" ( http://pubchemqc.riken.jp/ ) and placed it in the public domain. In this paper, we show the fundamental features of the PubChemQC database and discuss the techniques used to construct the data set for large-scale quantum chemistry calculations. We also present a machine learning approach to predict the electronic structure of molecules as an example to demonstrate the suitability of the large-scale quantum chemistry database.

  13. Flood probability quantification for road infrastructure: Data-driven spatial-statistical approach and case study applications.

    Science.gov (United States)

    Kalantari, Zahra; Cavalli, Marco; Cantone, Carolina; Crema, Stefano; Destouni, Georgia

    2017-03-01

    Climate-driven increase in the frequency of extreme hydrological events is expected to impose greater strain on the built environment and major transport infrastructure, such as roads and railways. This study develops a data-driven spatial-statistical approach to quantifying and mapping the probability of flooding at critical road-stream intersection locations, where water flow and sediment transport may accumulate and cause serious road damage. The approach is based on novel integration of key watershed and road characteristics, including also measures of sediment connectivity. The approach is concretely applied to and quantified for two specific study case examples in southwest Sweden, with documented road flooding effects of recorded extreme rainfall. The novel contributions of this study in combining a sediment connectivity account with that of soil type, land use, spatial precipitation-runoff variability and road drainage in catchments, and in extending the connectivity measure use for different types of catchments, improve the accuracy of model results for road flood probability. Copyright © 2016 Elsevier B.V. All rights reserved.

  14. Environmental Data-Driven Inquiry and Exploration (EDDIE)- Water Focused Modules for interacting with Big Hydrologic Data

    Science.gov (United States)

    Meixner, T.; Gougis, R.; O'Reilly, C.; Klug, J.; Richardson, D.; Castendyk, D.; Carey, C.; Bader, N.; Stomberg, J.; Soule, D. C.

    2016-12-01

    High-frequency sensor data are driving a shift in the Earth and environmental sciences. The availability of high-frequency data creates an engagement opportunity for undergraduate students in primary research by bringing large, long-term, sensor-based data directly into the scientific curriculum. Project EDDIE (Environmental Data-Driven Inquiry & Exploration) has developed flexible classroom activity modules designed to meet a series of pedagogical goals, including (1) developing the skills required to manipulate large datasets at different scales to conduct inquiry-based investigations; (2) developing students' reasoning about statistical variation; and (3) fostering accurate student conceptions about the nature of environmental science. The modules cover a wide range of topics, including lake physics and metabolism, stream discharge, water quality, soil respiration, seismology, and climate change. In this presentation we will focus on a sequence of modules of particular interest to hydrologists: stream discharge, water quality and nutrient loading. Assessment results show that our modules are effective at making students more comfortable analyzing data, at improving their understanding of statistical concepts, and at strengthening their data analysis capability. This project is funded by an NSF TUES grant (NSF DEB 1245707).

  15. Conceptualizing neuropsychiatric diseases with multimodal data-driven meta-analyses – The case of behavioral variant frontotemporal dementia

    Science.gov (United States)

    Schroeter, Matthias L.; Laird, Angela R.; Chwiesko, Caroline; Deuschl, Christine; Schneider, Else; Bzdok, Danilo; Eickhoff, Simon B.; Neumann, Jane

    2014-01-01

    Introduction Uniform coordinate systems in neuroimaging research have enabled comprehensive systematic and quantitative meta-analyses. Such approaches are particularly relevant for neuropsychiatric diseases and the understanding of their symptoms, prediction and treatment. Behavioral variant frontotemporal dementia (bvFTD), a common neurodegenerative syndrome, is characterized by deep alterations in behavior and personality. Investigating this ‘nexopathy’ elucidates the healthy social and emotional brain. Methods Here, we combine three multimodal meta-analysis approaches – anatomical and activation likelihood estimates and behavioral domain profiles – to identify neural correlates of bvFTD in 417 patients and 406 control subjects and to extract mental functions associated with this disease by meta-analyzing functional activation studies in the comprehensive probabilistic functional brain atlas of the BrainMap database. Results The analyses identify the frontomedian cortex, basal ganglia, anterior insulae and thalamus as the most relevant hubs, with a regional dissociation between atrophy and hypometabolism. Neural networks affected by bvFTD were associated with emotion and reward processing, empathy and executive functions (mainly inhibition), suggesting these functions as core domains affected by the disease and finally leading to its clinical symptoms. In contrast, changes in theory of mind or mentalizing abilities seem to be secondary phenomena of executive dysfunctions. Conclusions The study creates a novel conceptual framework to understand neuropsychiatric diseases by powerful data-driven meta-analytic approaches that shall be extended to the whole neuropsychiatric spectrum in the future. PMID:24763126

  16. On the data-driven inference of modulatory networks in climate science: an application to West African rainfall

    Science.gov (United States)

    González, D. L., II; Angus, M. P.; Tetteh, I. K.; Bello, G. A.; Padmanabhan, K.; Pendse, S. V.; Srinivas, S.; Yu, J.; Semazzi, F.; Kumar, V.; Samatova, N. F.

    2015-01-01

    Decades of hypothesis-driven and/or first-principles research have been applied towards the discovery and explanation of the mechanisms that drive climate phenomena, such as western African Sahel summer rainfall variability. Although connections between various climate factors have been theorized, not all of the key relationships are fully understood. We propose a data-driven approach to identify candidate players in this climate system, which can help explain underlying mechanisms and even suggest new relationships, to facilitate building a more comprehensive and predictive model of the modulatory relationships influencing a climate phenomenon of interest. We applied coupled heterogeneous association rule mining (CHARM), Lasso multivariate regression, and dynamic Bayesian networks to find relationships within a complex system, and explored means with which to obtain a consensus result from the application of such varied methodologies. Using this fusion of approaches, we identified relationships among climate factors that modulate Sahel rainfall. These relationships fall into two categories: well-known associations from prior climate knowledge, such as the relationship with the El Niño-Southern Oscillation (ENSO), and putative links, such as the North Atlantic Oscillation, that invite further research.
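    The abstract describes fusing the outputs of three different inference methods (CHARM, Lasso regression, dynamic Bayesian networks) into a consensus, but does not specify the fusion rule. The sketch below illustrates one plausible majority-vote scheme over candidate network edges; the function name, edge representation, and voting threshold are all assumptions for illustration, not the paper's actual procedure.

    ```python
    from collections import Counter

    def consensus_edges(method_results, min_votes=2):
        """Keep a candidate edge (driver, target) only when at least
        min_votes of the applied methods reported it."""
        votes = Counter(edge for edges in method_results for edge in set(edges))
        return {edge for edge, v in votes.items() if v >= min_votes}

    # Toy example: edge sets proposed by three hypothetical methods
    charm = {("ENSO", "Sahel rainfall")}
    lasso = {("ENSO", "Sahel rainfall"), ("NAO", "Sahel rainfall")}
    dbn = {("NAO", "Sahel rainfall")}
    consensus = consensus_edges([charm, lasso, dbn])  # both edges get 2 votes
    ```

    A vote threshold of 2-of-3 keeps any relationship corroborated by at least two methodologies while discarding single-method artifacts.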

  17. Homologous Helical Jets: Observations by IRIS, SDO and Hinode and Magnetic Modeling with Data-Driven Simulations

    CERN Document Server

    Cheung, Mark C M; Tarbell, T D; Fu, Y; Tian, H; Testa, P; Reeves, K K; Martinez-Sykora, J; Boerner, P; Wuelser, J P; Lemen, J; Title, A M; Hurlburt, N; Kleint, L; Kankelborg, C; Jaeggli, S; Golub, L; McKillop, S; Saar, S; Carlsson, M; Hansteen, V

    2015-01-01

    We report on observations of recurrent jets by instruments onboard the Interface Region Imaging Spectrograph (IRIS), Solar Dynamics Observatory (SDO) and Hinode spacecraft. Over a 4-hour period on July 21st 2013, recurrent coronal jets were observed to emanate from NOAA Active Region 11793. FUV spectra probing plasma at transition region temperatures show evidence of oppositely directed flows with components reaching Doppler velocities of +/- 100 km/s. Raster Doppler maps using a Si IV transition region line show all four jets to have helical motion of the same sense. Simultaneous observations of the region by SDO and Hinode show that the jets emanate from a source region comprising a pore embedded in the interior of a supergranule. The parasitic pore has opposite polarity flux compared to the surrounding network field. This leads to a spine-fan magnetic topology in the coronal field that is amenable to jet formation. Time-dependent data-driven simulations are used to investigate the underlying drivers for t...

  18. A Data-Driven Noise Reduction Method and Its Application for the Enhancement of Stress Wave Signals

    Directory of Open Access Journals (Sweden)

    Hai-Lin Feng

    2012-01-01

    Full Text Available Ensemble empirical mode decomposition (EEMD) has recently been used to recover a signal from observed noisy data, typically by partial reconstruction or a thresholding operation. In this paper we describe an efficient noise reduction method. EEMD is used to decompose a signal into several intrinsic mode functions (IMFs). The time intervals between two adjacent zero-crossings within an IMF, called the instantaneous half period (IHP), are used as a criterion to detect and classify the noise oscillations: undesirable waveforms with a larger IHP are set to zero. Furthermore, the optimum threshold in this approach can be derived from the signal itself using the consecutive mean square error (CMSE). The method is fully data driven and requires no prior knowledge of the target signals. It was verified with simulations in Matlab, and the denoising results are satisfactory. In comparison with other EEMD-based methods, we conclude that the approach adopted in this paper is suitable for preprocessing stress wave signals in wood nondestructive testing.
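    The IHP thresholding step can be sketched in a few lines. This is a minimal pure-Python illustration only: the EEMD decomposition into IMFs is assumed to have been done upstream (e.g. with an EMD library), the threshold is passed in by hand rather than derived via the CMSE criterion as in the paper, and the function names are made up for this sketch.

    ```python
    def zero_crossings(x):
        """Indices i where the signal changes sign between samples i-1 and i."""
        return [i for i in range(1, len(x)) if x[i - 1] * x[i] < 0]

    def suppress_long_half_periods(imf, max_ihp):
        """Zero out oscillations whose instantaneous half period (IHP),
        the gap between adjacent zero-crossings, exceeds max_ihp samples."""
        out = list(imf)
        zc = zero_crossings(imf)
        for a, b in zip(zc, zc[1:]):
            if b - a > max_ihp:  # this half-wave is classified as undesirable
                for i in range(a, b):
                    out[i] = 0.0
        return out
    ```

    In practice the per-IMF threshold would be selected from the data itself and the cleaned IMFs summed to reconstruct the denoised signal.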

  19. Performance of a data-driven technique to changes in wave height and its effect on beach response

    Directory of Open Access Journals (Sweden)

    Jose M. Horrillo-Caraballo

    2016-01-01

    Full Text Available In this study the medium-term response of beach profiles was investigated at two sites: a gently sloping sandy beach and a steeper mixed sand and gravel beach. The former is the Duck site in North Carolina, on the east coast of the USA, which is exposed to Atlantic Ocean swells and storm waves, and the latter is the Milford-on-Sea site at Christchurch Bay, on the south coast of England, which is partially sheltered from Atlantic swells but has a directionally bimodal wave exposure. The data sets comprise detailed bathymetric surveys of beach profiles covering a period of more than 25 years for the Duck site and over 18 years for the Milford-on-Sea site. The structure of the data sets and the data-driven methods are described. Canonical correlation analysis (CCA) was used to find linkages between the wave characteristics and beach profiles. The sensitivity of the linkages was investigated by deploying a wave height threshold to filter out the smaller waves incrementally. The results of the analysis indicate that, for the gently sloping sandy beach, waves of all heights are important to the morphological response. For the mixed sand and gravel beach, filtering the smaller waves improves the statistical fit and it suggests that low-height waves do not play a primary role in the medium-term morphological response, which is primarily driven by the intermittent larger storm waves.

  20. Examining the Relationship Between Past Orientation and US Suicide Rates: An Analysis Using Big Data-Driven Google Search Queries

    Science.gov (United States)

    Lee, Donghyun; Lee, Hojun

    2016-01-01

    Background Internet search query data reflect the attitudes of their users, from which we can measure the past orientation associated with suicide. Examinations of past orientation often highlight certain predispositions of attitude, many of which can be suicide risk factors. Objective To investigate the relationship between past orientation and suicide rate by examining Google search queries. Methods We measured past orientation using Google search query data by comparing the search volumes of the past year with those of the future year, across the 50 US states and the District of Columbia during the period from 2004 to 2012. We constructed a panel dataset with control variables as independent variables; we then undertook an analysis using multiple ordinary least squares regression and methods that leverage the Akaike information criterion and the Bayesian information criterion. Results It was found that past orientation had a positive relationship with the suicide rate (P≤.001) and that it improves the goodness-of-fit of the model regarding the suicide rate. Unemployment rate (P≤.001 in Models 3 and 4), Gini coefficient (P≤.001), and population growth rate (P≤.001) had a positive relationship with the suicide rate, whereas the gross state product (P≤.001) showed a negative relationship with the suicide rate. Conclusions We empirically identified the positive relationship between the suicide rate and past orientation, which was measured by big data-driven Google search query. PMID:26868917
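    The quantities involved can be sketched as follows, assuming past orientation is operationalized as the share of search volume referring to the past year versus the future year. The helper names are illustrative, and the single-predictor slope is a stand-in for the paper's multiple OLS regression, not its actual model.

    ```python
    def past_orientation(past_volume, future_volume):
        """Share of search volume referring to the past year; values above
        0.5 indicate a past-leaning population of queries."""
        return past_volume / (past_volume + future_volume)

    def ols_slope(x, y):
        """Closed-form slope of a one-predictor OLS fit, a simplified
        stand-in for the paper's multiple-regression analysis."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        return sxy / sxx
    ```

    A positive slope of suicide rate regressed on the past-orientation measure would correspond to the relationship the study reports.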

  1. 21 CFR 211.188 - Batch production and control records.

    Science.gov (United States)

    2010-04-01

    ... 21 Food and Drugs 4 2010-04-01 2010-04-01 false Batch production and control records. 211.188... Reports § 211.188 Batch production and control records. Batch production and control records shall be... production and control of each batch. These records shall include: (a) An accurate reproduction of...

  2. 27 CFR 19.748 - Dump/batch records.

    Science.gov (United States)

    2010-04-01

    ... 27 Alcohol, Tobacco Products and Firearms 1 2010-04-01 2010-04-01 false Dump/batch records. 19.748... OF THE TREASURY LIQUORS DISTILLED SPIRITS PLANTS Records and Reports Processing Account § 19.748 Dump/batch records. (a) Format of dump/batch records. Proprietor's dump/batch records shall contain,...

  3. A Batch Feeder for Inhomogeneous Bulk Materials

    Science.gov (United States)

    Vislov, I. S.; Kladiev, S. N.; Slobodyan, S. M.; Bogdan, A. M.

    2016-04-01

    This work analyzes mechanical feeders and batchers that find application in various technological processes and industrial fields. Feeders are usually classified by design into two groups: conveyor-type feeders and non-conveyor feeders. Batchers are used to batch solid bulk materials and, less frequently, liquids. In terms of batching method, they are divided into volumetric and weighing batchers. Weighing batchers alone do not provide sufficient batching accuracy; automatic weighing batchers include a mass-monitoring sensor and systems for automatic material feed and automatic mass discharge control. In terms of operating principle, batchers are divided into gravitational batchers and batchers with forced feed of material using conveyors and pumps. Improved consumption of raw materials, decreased loss of materials, and ease of use in the automatic control systems of industrial facilities increase the quality of technological processes and improve labor conditions. The batch feeder suggested by the authors is a volumetric batcher that has no comparable counterparts among conveyor-type feeders; it solves the problem of targeted feeding of bulk material batches while increasing the reliability and hermeticity of the device.

  4. Energy efficiency of batch and semi-batch (CCRO) reverse osmosis desalination.

    Science.gov (United States)

    Warsinger, David M; Tow, Emily W; Nayar, Kishor G; Maswadeh, Laith A; Lienhard V, John H

    2016-12-01

    As reverse osmosis (RO) desalination capacity increases worldwide, the need to reduce its specific energy consumption becomes more urgent. In addition to the incremental changes attainable with improved components such as membranes and pumps, more significant reduction of energy consumption can be achieved through time-varying RO processes including semi-batch processes such as closed-circuit reverse osmosis (CCRO) and fully-batch processes that have not yet been commercialized or modelled in detail. In this study, numerical models of the energy consumption of batch RO (BRO), CCRO, and the standard continuous RO process are detailed. Two new energy-efficient configurations of batch RO are analyzed. Batch systems use significantly less energy than continuous RO over a wide range of recovery ratios and source water salinities. Relative to continuous RO, models predict that CCRO and batch RO demonstrate up to 37% and 64% energy savings, respectively, for brackish water desalination at high water recovery. For batch RO and CCRO, the primary reductions in energy use stem from atmospheric pressure brine discharge and reduced streamwise variation in driving pressure. Fully-batch systems further reduce energy consumption by not mixing streams of different concentrations, which CCRO does. These results demonstrate that time-varying processes can significantly raise RO energy efficiency. Copyright © 2016 Elsevier Ltd. All rights reserved.
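    The energy gap between continuous and batch operation can be illustrated with idealized, lossless textbook expressions: a single-stage continuous system must pressurize the entire feed to the final brine osmotic pressure, while an ideal batch system tracks the rising osmotic pressure and approaches the thermodynamic least work per unit permeate. This is a sketch under those stated idealizations, not the paper's detailed numerical models (which include real losses and the CCRO case).

    ```python
    import math

    def sec_continuous(pi_feed, r):
        """Ideal single-stage continuous RO at recovery r: the whole feed
        is pressurized to the final brine osmotic pressure pi_feed/(1-r)."""
        return pi_feed / (1.0 - r)

    def sec_batch_ideal(pi_feed, r):
        """Ideal batch RO: applied pressure tracks the instantaneous
        osmotic pressure, giving the thermodynamic least work per unit
        permeate, (pi_feed/r) * ln(1/(1-r))."""
        return pi_feed * math.log(1.0 / (1.0 - r)) / r

    # At 80% recovery the idealized comparison already shows roughly
    # 60% savings for batch operation (normalized units, pi_feed = 1).
    savings = 1.0 - sec_batch_ideal(1.0, 0.8) / sec_continuous(1.0, 0.8)
    ```

    The savings grow with recovery ratio, consistent with the abstract's finding that the largest benefits appear at high water recovery.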

  5. Teachers' Experiences with the Data-Driven Decision Making Process in Increasing Students' Reading Achievement in a Title I Elementary Public School

    Science.gov (United States)

    Atkinson, Linton

    2015-01-01

    This paper is a research dissertation based on a qualitative case study conducted on Teachers' Experiences within a Data-Driven Decision Making (DDDM) process. The study site was a Title I elementary school in a large school district in Central Florida. Background information is given in relation to the need for research that was conducted on the…

  6. Guide to a Student-Family-School-Community Partnership: Using a Student & Data Driven Process to Improve School Environments & Promote Student Success

    Science.gov (United States)

    Burgoa, Carol; Izu, Jo Ann

    2010-01-01

    This guide presents a data-driven, research-based process--referred to as the "school-community forum process"--for increasing youth voice, promoting resilience, strengthening adult-youth connections, and ultimately, for improving schools. It uses a "student listening circle"--a special type of focus group involving eight to…

  7. Consuming America : A Data-Driven Analysis of the United States as a Reference Culture in Dutch Public Discourse on Consumer Goods, 1890-1990

    NARCIS (Netherlands)

    Wevers, M.J.H.F.

    2017-01-01

    Consuming America offers a data-driven, longitudinal analysis of the historical dynamics that have underpinned a long-term, layered cultural-historical process: the emergence of the United States as a dominant reference culture in Dutch public discourse on consumer goods between 1890 and 1990. The

  8. Data-driven Radiative Hydrodynamic Modeling of the 2014 March 29 X1.0 Solar Flare

    Science.gov (United States)

    Rubio da Costa, Fatima; Kleint, Lucia; Petrosian, Vahé; Liu, Wei; Allred, Joel C.

    2016-08-01

    Spectroscopic observations of solar flares provide critical diagnostics of the physical conditions in the flaring atmosphere. Some key features in observed spectra have not yet been accounted for in existing flare models. Here we report a data-driven simulation of the well-observed X1.0 flare on 2014 March 29 that can reconcile some well-known spectral discrepancies. We analyzed spectra of the flaring region from the Interface Region Imaging Spectrograph (IRIS) in Mg ii h&k, the Interferometric BIdimensional Spectropolarimeter at the Dunn Solar Telescope (DST/IBIS) in Hα 6563 Å and Ca ii 8542 Å, and the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI) in hard X-rays. We constructed a multithreaded flare loop model and used the electron flux inferred from RHESSI data as the input to the radiative hydrodynamic code RADYN to simulate the atmospheric response. We then synthesized various chromospheric emission lines and compared them with the IRIS and IBIS observations. In general, the synthetic intensities agree with the observed ones, especially near the northern footpoint of the flare. The simulated Mg ii line profile has narrower wings than the observed one. This discrepancy can be reduced by using a higher microturbulent velocity (27 km/s) in a narrow chromospheric layer. In addition, we found that an increase of electron density in the upper chromosphere within a narrow height range of ≈800 km below the transition region can turn the simulated Mg ii line core into emission and thus reproduce the single peaked profile, which is a common feature in all IRIS flares.

  9. Data-Driven Derivation of an "Informer Compound Set" for Improved Selection of Active Compounds in High-Throughput Screening.

    Science.gov (United States)

    Paricharak, Shardul; IJzerman, Adriaan P; Jenkins, Jeremy L; Bender, Andreas; Nigsch, Florian

    2016-09-26

    Despite the usefulness of high-throughput screening (HTS) in drug discovery, for some systems, low assay throughput or high screening cost can prohibit the screening of large numbers of compounds. In such cases, iterative cycles of screening involving active learning (AL) are employed, creating the need for smaller "informer sets" that can be routinely screened to build predictive models for selecting compounds from the screening collection for follow-up screens. Here, we present a data-driven derivation of an informer compound set with improved predictivity of active compounds in HTS, and we validate its benefit over randomly selected training sets on 46 PubChem assays comprising at least 300,000 compounds and covering a wide range of assay biology. The informer compound set showed improvement in BEDROC(α = 100), PRAUC, and ROCAUC values averaged over all assays of 0.024, 0.014, and 0.016, respectively, compared to randomly selected training sets, all with paired t-test p-values < 10^(-15). A per-assay assessment showed that the BEDROC(α = 100), which is of particular relevance for early retrieval of actives, improved for 38 out of 46 assays, increasing the success rate of smaller follow-up screens. Overall, we showed that an informer set derived from historical HTS activity data can be employed for routine small-scale exploratory screening in an assay-agnostic fashion. This approach led to a consistent improvement in hit rates in follow-up screens without compromising scaffold retrieval. The informer set is adjustable in size depending on the number of compounds one intends to screen, as performance gains are realized for sets with more than 3,000 compounds, and this set is therefore applicable to a variety of situations. Finally, our results indicate that random sampling may not adequately cover descriptor space, drawing attention to the importance of the composition of the training set for predicting actives.

  10. Intensity inhomogeneity correction of structural MR images: a data-driven approach to define input algorithm parameters

    Directory of Open Access Journals (Sweden)

    Marco eGanzetti

    2016-03-01

    Full Text Available Intensity non-uniformity (INU) in magnetic resonance (MR) imaging is a major issue when conducting analyses of brain structural properties. An inaccurate INU correction may result in qualitative and quantitative misinterpretations. Several INU correction methods exist, whose performance largely depends on the specific parameter settings that need to be chosen by the user. Here we addressed the question of how to select the best input parameters for a specific INU correction algorithm. Our investigation was based on the INU correction algorithm implemented in SPM, but this can in principle be extended to any other algorithm requiring the selection of input parameters. We conducted a comprehensive comparison of indirect metrics for the assessment of INU correction performance, namely the coefficient of variation of white matter (CV_WM), the coefficient of variation of gray matter (CV_GM), and the coefficient of joint variation between white matter and gray matter (CJV). Using simulated MR data, we observed the CJV to be more accurate than CV_WM and CV_GM, provided that the noise level in the INU-corrected image was controlled by means of spatial smoothing. Based on the CJV, we developed a data-driven approach for selecting INU correction parameters, which could effectively work on actual MR images. To this end, we implemented an enhanced procedure for the definition of white and gray matter masks, based on which the CJV was calculated. Our approach was validated using actual T1-weighted images collected with 1.5 T, 3 T and 7 T MR scanners. We found that our procedure can reliably assist the selection of valid INU correction algorithm parameters, thereby contributing to an enhanced inhomogeneity correction in MR images.

  11. Intensity Inhomogeneity Correction of Structural MR Images: A Data-Driven Approach to Define Input Algorithm Parameters.

    Science.gov (United States)

    Ganzetti, Marco; Wenderoth, Nicole; Mantini, Dante

    2016-01-01

    Intensity non-uniformity (INU) in magnetic resonance (MR) imaging is a major issue when conducting analyses of brain structural properties. An inaccurate INU correction may result in qualitative and quantitative misinterpretations. Several INU correction methods exist, whose performance largely depends on the specific parameter settings that need to be chosen by the user. Here we addressed the question of how to select the best input parameters for a specific INU correction algorithm. Our investigation was based on the INU correction algorithm implemented in SPM, but this can in principle be extended to any other algorithm requiring the selection of input parameters. We conducted a comprehensive comparison of indirect metrics for the assessment of INU correction performance, namely the coefficient of variation of white matter (CVWM), the coefficient of variation of gray matter (CVGM), and the coefficient of joint variation between white matter and gray matter (CJV). Using simulated MR data, we observed the CJV to be more accurate than CVWM and CVGM, provided that the noise level in the INU-corrected image was controlled by means of spatial smoothing. Based on the CJV, we developed a data-driven approach for selecting INU correction parameters, which could effectively work on actual MR images. To this end, we implemented an enhanced procedure for the definition of white and gray matter masks, based on which the CJV was calculated. Our approach was validated using actual T1-weighted images collected with 1.5 T, 3 T, and 7 T MR scanners. We found that our procedure can reliably assist the selection of valid INU correction algorithm parameters, thereby contributing to an enhanced inhomogeneity correction in MR images.
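    The indirect quality metrics named in these two records have standard definitions; a minimal sketch follows, using the common formulation CJV = (σ_WM + σ_GM) / |μ_WM − μ_GM|, where lower values indicate a better-separated (better-corrected) image. The inputs are assumed to be flat lists of voxel intensities already masked to each tissue class.

    ```python
    def _mean(xs):
        return sum(xs) / len(xs)

    def _std(xs):
        m = _mean(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

    def cv(tissue_intensities):
        """Coefficient of variation within one tissue class (CV_WM, CV_GM)."""
        return _std(tissue_intensities) / _mean(tissue_intensities)

    def cjv(wm, gm):
        """Coefficient of joint variation between white and gray matter:
        (sigma_WM + sigma_GM) / |mu_WM - mu_GM|. Lower is better."""
        return (_std(wm) + _std(gm)) / abs(_mean(wm) - _mean(gm))
    ```

    Sweeping an algorithm's input parameters and keeping the setting that minimizes CJV is the essence of the data-driven selection the paper proposes.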

  12. Experimental investigation of the predictive capabilities of data driven modeling techniques in hydrology – Part 2: Application

    Directory of Open Access Journals (Sweden)

    D. P. Solomatine

    2009-11-01

    Full Text Available In this second part of the two-part paper, the data driven modeling (DDM) experiment, presented and explained in the first part, is implemented. Inputs for the five case studies (half-hourly actual evapotranspiration, daily peat soil moisture, daily till soil moisture, and two daily rainfall-runoff datasets) are identified, either based on previous studies or using the mutual information content. Twelve groups (realizations) were generated from each dataset by randomly sampling without replacement from the original dataset. Neural networks (ANNs), genetic programming (GP), evolutionary polynomial regression (EPR), support vector machines (SVM), M5 model trees (M5), K nearest neighbors (K-nn), and multiple linear regression (MLR) techniques are implemented and applied to each of the 12 realizations of each case study. The predictive accuracy and uncertainties of the various techniques are assessed using multiple average overall error measures, scatter plots, frequency distributions of model residuals, and the deterioration rate of prediction performance during the testing phase. The Gamma test is used as a guide to assist in selecting the appropriate modeling technique. Unlike in the two nonlinear soil moisture case studies, the results of the experiment conducted in this research study show that ANNs were a sub-optimal choice for the actual evapotranspiration and the two rainfall-runoff case studies. GP is the most successful technique due to its ability to adapt the model complexity to the modeled data. EPR performance could be close to GP with datasets that are more linear than nonlinear. SVM is sensitive to the kernel choice and, if appropriately selected, its performance can improve. M5 performs very well with linear and semi-linear data, which cover a wide range of hydrological situations. In highly nonlinear case studies, ANNs, K-nn, and GP could be more successful than other modeling techniques. K-nn is also successful in linear situations.
  13. Real-Time Monitoring of TP Load in a Mississippi Delta Stream Using a Dynamic Data Driven Application System

    Science.gov (United States)

    Ouyang, Y.; Leininger, T.; Hatten, J. A.

    2012-12-01

    Elevated phosphorus (P) in surface waters can cause eutrophication of aquatic ecosystems and can impair water for drinking, industry, agriculture, and recreation. Currently, little effort has been devoted to monitoring real-time variation and load of total P (TP) in surface waters due to the lack of suitable and/or cost-effective wireless sensors. However, when considering human health, drinking water supply, and rapidly developing events such as algal blooms, the availability of timely P information is very critical. In this study, we developed a new approach in the form of a dynamic data driven application system (DDDAS) for monitoring the real-time variation and load of TP in surface water. This DDDAS consisted of the following three major components: (1) a User Control that interacts with Schedule Run to implement the DDDAS with starting and ending times; (2) a Schedule Run that activates the Hydstra model; and (3) a Hydstra model that downloads the real-time data from a US Geological Survey (USGS) website that is updated every 15 minutes with data from USGS monitoring stations, predicts real-time variation and load of TP, graphs the variables in real-time on a computer screen, and sends email alerts when the TP exceeds a certain value. The DDDAS was applied to monitor real-time variation and load of TP for 30 days in Deer Creek, a stream located east of Leland, Mississippi, USA. Results showed that the TP contents in the stream ranged from 0.24 to 0.48 mg/L with an average of 0.30 mg/L for the 30-day monitoring period, whereas the cumulative load of TP from the stream was about 2.8 kg for the same monitoring period. Our study suggests that the DDDAS developed in this study was useful for estimating the real-time variation and load of TP in surface water ecosystems.
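    The cumulative load computation underlying such a monitoring system reduces to summing concentration times discharge over each recorded interval. A minimal sketch follows (the function name and unit choices are illustrative; the paper's Hydstra-based system handles data retrieval, prediction, plotting, and alerting on top of this):

    ```python
    def cumulative_load_kg(conc_mg_per_L, flow_L_per_s, dt_s=900):
        """Cumulative constituent load from paired records sampled every
        dt_s seconds (900 s matches the USGS 15-minute update interval):
        sum of concentration [mg/L] x discharge [L/s] x interval [s],
        converted from mg to kg."""
        total_mg = sum(c * q * dt_s for c, q in zip(conc_mg_per_L, flow_L_per_s))
        return total_mg / 1e6
    ```

    An alert rule like the paper's would then simply compare each incoming concentration record against a fixed TP threshold.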

  14. A data-driven investigation of gray matter–function correlations in schizophrenia during a working memory task

    Directory of Open Access Journals (Sweden)

    Andrew eMichael

    2011-08-01

    Full Text Available The brain is a vastly interconnected organ and methods are needed to investigate its long-range structure (S)–function (F) associations to better understand disorders such as schizophrenia that are hypothesized to be due to distributed disconnected brain regions. In previous work we introduced a methodology to reduce whole-brain S–F correlations to a histogram, and here we reduce the correlations to brain clusters. The application of our approach to sMRI (gray matter concentration maps) and fMRI data (GLM activation maps) during Encode and Probe epochs of a working memory task from patients with schizophrenia (SZ, n=100) and healthy controls (HC, n=100) presented the following results. In HC the whole-brain correlation histograms for gray matter (GM)–Encode and GM–Probe overlap for Low and Medium loads, and at High load the histograms separate; in SZ the histograms do not overlap for any of the load levels, and Medium load shows the maximum difference. We computed GM–F differential correlation clusters using activation for Probe Medium, and they included regions in the left and right superior temporal gyri, anterior cingulate, cuneus, middle temporal gyrus, and the cerebellum. Inter-cluster GM–Probe correlations for Medium load were positive in HC but negative in SZ. Within-group inter-cluster GM–Encode and GM–Probe correlation comparisons show no differences in HC, but in SZ differences are evident in the same clusters where HC versus SZ differences occurred for Probe Medium, indicating that S–F integrity during Probe is aberrant in SZ. Through a data-driven whole-brain analysis approach we find novel brain clusters and show how the S–F differential correlation changes during Probe and Encode at three memory load levels. Structural and functional anomalies have been extensively reported in schizophrenia, and here we provide evidence to suggest that evaluating S–F associations can provide important additional information.

  15. Data-Driven Approaches for Computation in Intelligent Biomedical Devices: A Case Study of EEG Monitoring for Chronic Seizure Detection

    Directory of Open Access Journals (Sweden)

    Naveen Verma

    2011-04-01

    Full Text Available Intelligent biomedical devices are systems able to detect specific physiological processes in patients so that particular responses can be generated. This closed-loop capability can have enormous clinical value when we consider the unprecedented modalities that are beginning to emerge for sensing and stimulating patient physiology. Both delivering therapy (e.g., deep-brain stimulation, vagus nerve stimulation, etc.) and treating impairments (e.g., neural prostheses) require computational devices that can make clinically relevant inferences, especially using minimally-intrusive patient signals. The key to such devices is algorithms based on data-driven signal modeling as well as hardware structures specialized to these. This paper discusses the primary application-domain challenges that must be overcome and analyzes the most promising methods for this that are emerging. We then look at how these methods are being incorporated into ultra-low-energy computational platforms and systems. The case study for this is a seizure-detection SoC that includes instrumentation and computation blocks in support of a system that exploits patient-specific modeling to achieve accurate performance for chronic detection. The SoC samples each EEG channel at a rate of 600 Hz and performs processing to derive signal features on every two-second epoch, consuming 9 μJ/epoch/channel. Signal feature extraction reduces the data rate by a factor of over 40×, permitting wireless communication from the patient's head while reducing the total power on the head by 14×.
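    The epoch-wise feature extraction that achieves the reported data-rate reduction can be sketched as follows. The three features shown are generic illustrations chosen for this sketch; the actual SoC derives spectral features (a filter bank over each epoch) rather than these particular summaries.

    ```python
    def epoch_features(samples, fs=600, epoch_s=2):
        """Reduce each epoch of fs * epoch_s raw samples (1200 at 600 Hz
        with 2 s epochs) to three summary features, illustrating how
        on-node feature extraction collapses the data rate before radio
        transmission."""
        n = fs * epoch_s
        feats = []
        for start in range(0, len(samples) - n + 1, n):
            ep = samples[start:start + n]
            energy = sum(x * x for x in ep)                        # epoch energy
            mean_abs = sum(abs(x) for x in ep) / n                 # mean amplitude
            zcr = sum(1 for a, b in zip(ep, ep[1:]) if a * b < 0)  # zero crossings
            feats.append((energy, mean_abs, zcr))
        return feats
    ```

    Sending a handful of features per two-second epoch instead of 1200 raw samples is what makes low-power wireless operation from the patient's head feasible.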

  16. A data-driven modeling approach to identify disease-specific multi-organ networks driving physiological dysregulation.

    Directory of Open Access Journals (Sweden)

    Warren D Anderson

    2017-07-01

    Full Text Available Multiple physiological systems interact throughout the development of a complex disease. Knowledge of the dynamics and connectivity of interactions across physiological systems could facilitate the prevention or mitigation of organ damage underlying complex diseases, many of which are currently refractory to available therapeutics (e.g., hypertension). We studied the regulatory interactions operating within and across organs throughout disease development by integrating in vivo analysis of gene expression dynamics with a reverse engineering approach to infer data-driven dynamic network models of multi-organ gene regulatory influences. We obtained experimental data on the expression of 22 genes across five organs, over a time span that encompassed the development of autonomic nervous system dysfunction and hypertension. We pursued a unique approach for identification of continuous-time models that jointly described the dynamics and structure of multi-organ networks by estimating a sparse subset of ∼12,000 possible gene regulatory interactions. Our analyses revealed that an autonomic dysfunction-specific multi-organ sequence of gene expression activation patterns was associated with a distinct gene regulatory network. We analyzed the model structures for adaptation motifs, and identified disease-specific network motifs involving genes that exhibited aberrant temporal dynamics. Bioinformatic analyses identified disease-specific single nucleotide variants within or near transcription factor binding sites upstream of key genes implicated in maintaining physiological homeostasis. Our approach illustrates a novel framework for investigating pathogenesis through model-based analysis of multi-organ system dynamics and network properties. Our results yielded novel candidate molecular targets driving the development of cardiovascular disease, metabolic syndrome, and immune dysfunction.

  17. Parallel data-driven decomposition algorithm for large-scale datasets: with application to transitional boundary layers

    Science.gov (United States)

    Sayadi, Taraneh; Schmid, Peter J.

    2016-10-01

    Many fluid flows of engineering interest, though very complex in appearance, can be approximated by low-order models governed by a few modes, able to capture the dominant behavior (dynamics) of the system. This feature has fueled the development of various methodologies aimed at extracting dominant coherent structures from the flow. Some of the more general techniques are based on data-driven decompositions, most of which rely on performing a singular value decomposition (SVD) on a formulated snapshot (data) matrix. The amount of experimentally or numerically generated data expands as more detailed experimental measurements and increased computational resources become readily available. Consequently, the data matrix to be processed will consist of far more rows than columns, resulting in a so-called tall-and-skinny (TS) matrix. Ultimately, the SVD of such a TS data matrix can no longer be performed on a single processor, and parallel algorithms are necessary. The present study employs the parallel TSQR algorithm of Demmel et al. (SIAM J Sci Comput 34(1):206-239, 2012), which is further used as a basis of the underlying parallel SVD. This algorithm is shown to scale well on machines with a large number of processors and, therefore, allows the decomposition of very large datasets. In addition, the simplicity of its implementation and the minimum required communication make it suitable for integration in existing numerical solvers and data decomposition techniques. Examples that demonstrate the capabilities of highly parallel data decomposition algorithms include transitional processes in compressible boundary layers without and with induced flow separation.
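    The core TSQR idea can be sketched serially: QR-factor row blocks of the tall-and-skinny matrix, stack the small R factors, and QR them again; the SVD of the final R then yields the singular values of the full matrix without ever factoring it in one piece. This is a single-process illustration of the idea, not the parallel implementation of the paper:

    ```python
    import numpy as np

    def tsqr_svd(A, n_blocks=4):
        """TSQR-style SVD sketch for a tall-and-skinny matrix A.
        Each block QR is independent (in the parallel algorithm these run
        on separate processors); only the small R factors are combined."""
        Rs = [np.linalg.qr(blk)[1] for blk in np.array_split(A, n_blocks)]
        R = np.linalg.qr(np.vstack(Rs))[1]     # reduction step
        _, s, Vt = np.linalg.svd(R)            # SVD of a tiny k-by-k matrix
        return s, Vt

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4000, 10))        # far more rows than columns
    s_tsqr, _ = tsqr_svd(A)
    s_ref = np.linalg.svd(A, compute_uv=False)
    print(np.allclose(s_tsqr, s_ref))          # True
    ```

    Because only the k-by-k R factors move between blocks, communication volume is independent of the (very large) row count, which is what makes the approach scale.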

  18. Mapping of Agricultural Crops from Single High-Resolution Multispectral Images—Data-Driven Smoothing vs. Parcel-Based Smoothing

    Directory of Open Access Journals (Sweden)

    Asli Ozdarici-Ok

    2015-05-01

    Full Text Available Mapping agricultural crops is an important application of remote sensing. However, in many cases it is based either on hyperspectral imagery or on multitemporal coverage, both of which are difficult to scale up to large-scale deployment at high spatial resolution. In the present paper, we evaluate the possibility of crop classification based on single images from very high-resolution (VHR) satellite sensors. The main objective of this work is to expose the performance difference between state-of-the-art parcel-based smoothing and purely data-driven conditional random field (CRF) smoothing, which has so far been unknown. To fulfill this objective, we perform extensive tests with four different classification methods (Support Vector Machines, Random Forest, Gaussian Mixtures, and Maximum Likelihood) to compute the pixel-wise data term; and we also test two different definitions of the pairwise smoothness term. We have performed a detailed evaluation on different multispectral VHR images (Ikonos, QuickBird, Kompsat-2). The main finding of this study is that pairwise CRF smoothing comes close to the state-of-the-art parcel-based method that requires parcel boundaries (average difference ≈ 2.5%). Our results indicate that a single multispectral (R, G, B, NIR) image is enough to reach satisfactory classification accuracy for six crop classes (corn, pasture, rice, sugar beet, wheat, and tomato) in a Mediterranean climate. Overall, it appears that crop mapping using only one-shot VHR imagery taken at the right time may be a viable alternative, especially since high-resolution multitemporal or hyperspectral coverage as well as parcel boundaries are in practice often not available.
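    The pairwise smoothing idea can be illustrated with a toy Potts-style model optimized by iterated conditional modes (ICM). This is a simplified stand-in for the paper's CRF, with hypothetical costs; it only shows how a pairwise term suppresses isolated pixel-level noise:

    ```python
    import numpy as np

    def icm_smooth(unary, beta=1.0, n_iter=5):
        """Pairwise smoothing on a 4-connected grid via ICM.
        unary[i, j, k] is the data cost of class k at pixel (i, j);
        beta penalizes disagreement with each of the four neighbours."""
        H, W, K = unary.shape
        labels = unary.argmin(axis=2)                  # data-term-only init
        for _ in range(n_iter):
            for i in range(H):
                for j in range(W):
                    cost = unary[i, j].copy()
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < H and 0 <= nj < W:
                            cost += beta * (np.arange(K) != labels[ni, nj])
                    labels[i, j] = cost.argmin()       # greedy local update
        return labels

    # a lone "salt" pixel preferring class 1 gets smoothed back to class 0
    unary = np.zeros((5, 5, 2)); unary[:, :, 1] = 1.0
    unary[2, 2] = [1.0, 0.4]
    print(icm_smooth(unary, beta=0.5)[2, 2])           # 0
    ```

    Real CRF inference in the paper's setting would use stronger optimizers (e.g., graph cuts) and image-dependent pairwise weights; the mechanism, however, is the same trade-off between data term and neighbour agreement.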

  19. RWater - A Novel Cyber-enabled Data-driven Educational Tool for Interpreting and Modeling Hydrologic Processes

    Science.gov (United States)

    Rajib, M. A.; Merwade, V.; Zhao, L.; Song, C.

    2014-12-01

    Explaining the complex cause-and-effect relationships in the hydrologic cycle can often be challenging in a classroom with the use of traditional teaching approaches. With the availability of observed rainfall, streamflow and other hydrology data on the internet, it is possible to provide the necessary tools to students to explore these relationships and enhance their learning experience. From this perspective, a new online educational tool, called RWater, is developed using Purdue University's HUBzero technology. RWater's unique features include: (i) its accessibility, including the R software, from any Java-supported web browser; (ii) no installation of any software on the user's computer; (iii) all the work and resulting data are stored in the user's working directory on the RWater server; and (iv) no prior programming experience with the R software is necessary. In its current version, RWater can dynamically extract streamflow data from any USGS gaging station without any need for post-processing for use in the educational modules. By following data-driven modules, students can write small scripts in R and thereby create visualizations to identify the effect of rainfall distribution and watershed characteristics on runoff generation, investigate the impacts of land use and climate change on streamflow, and explore the changes in extreme hydrologic events in actual locations. Each module contains relevant definitions, instructions on data extraction and coding, as well as conceptual questions based on the possible analyses which the students would perform. In order to assess its suitability for classroom implementation, and to evaluate users' perception of its utility, the current version of RWater has been tested with three different groups: (i) high school students, (ii) middle and high school teachers; and (iii) upper undergraduate/graduate students. The survey results from these trials suggest that the RWater has potential to improve students' understanding of various

  20. Model Penjadwalan Batch Multi Item dengan Dependent Processing Time

    Directory of Open Access Journals (Sweden)

    Sukoyo Sukoyo

    2010-01-01

    Full Text Available This paper investigates the development of single-machine batch scheduling for multiple items with dependent processing times. The batch scheduling problem is to determine simultaneously the number of batches (N), the item and its size allocated to each batch, and the processing sequence of the resulting batches. We use total actual flow time as the measure of schedule performance. The multi-item batch scheduling problem can be formulated as a binary-integer nonlinear programming model, because the number of batches must take an integer value, the allocation of items to the resulting batches requires binary values, and there are nonlinearities in the objective function and constraints due to the dependent processing times. By relaxing the decision variable for the number of batches (N) into a parameter, a heuristic procedure can be applied to find a solution of the single-machine batch scheduling problem for multiple items.
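    The flavour of the problem can be shown with a much-simplified flow-time objective (every job in a batch completes when its batch does, and each batch incurs a fixed setup). This toy version is not the paper's actual-flow-time model with dependent processing times; it only illustrates why batch sequencing matters:

    ```python
    from itertools import permutations

    def total_flow_time(batches, setup=1.0):
        """Sum of job completion times when jobs are delivered batch-wise.
        `batches` is a list of lists of per-job processing times."""
        t, total = 0.0, 0.0
        for b in batches:
            t += setup + sum(b)       # batch finishes after setup + work
            total += t * len(b)       # every job in it completes at time t
        return total

    def best_sequence(batches, setup=1.0):
        """Brute-force the batch sequence (fine for a handful of batches)."""
        return min(permutations(batches),
                   key=lambda p: total_flow_time(p, setup))

    batches = [[4.0, 4.0], [1.0], [2.0, 2.0, 2.0]]
    best = best_sequence(batches)
    print([len(b) for b in best])     # [1, 3, 2]
    ```

    The optimal order here follows a weighted-shortest-processing-time rule (batch duration divided by batch size); the full problem in the paper is harder because batch composition and N are decided jointly with the sequence.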

  1. Dynamic Fractional Resource Scheduling vs. Batch Scheduling

    CERN Document Server

    Casanova, Henri; Vivien, Frédéric

    2011-01-01

    We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of virtual machine technology to share fractional node resources in a precise and controlled manner. Other VM-based scheduling approaches have focused primarily on technical issues or on extensions to existing batch scheduling systems, while we take a more aggressive approach and seek to find heuristics that maximize an objective metric correlated with job performance. We derive absolute performance bounds and develop algorithms for the online, non-clairvoyant version of our scheduling problem. We further evaluate these algorithms in simulation against both synthetic and real-world HPC workloads and compare our algorithms to standard batch scheduling approaches. We find that our approach improves over batch scheduling by orders of magnitude in terms of job stretch, while leading to comparable or better resource utilization. Our results demonstrate that virtualization technology coupled with light...

  2. Data-driven deselection for monographs: a rules-based approach to weeding, storage, and shared print decisions

    Directory of Open Access Journals (Sweden)

    Rick Lugg

    2012-07-01

    Full Text Available The value of local print book collections is changing. Even as stacks fill and library traffic grows, circulation continues to decline. Across the ‘collective collection’, millions of unused books occupy prime central campus space. Meanwhile, users want more collaborative study space and online resources. Libraries want room for information commons, teaching and learning centers and cafes. Done properly, removing unused books can free space for these and other purposes, with little impact on users. Many low-use titles are securely archived, accessible digitally, and widely held in print. Surplus copies can be removed without endangering the scholarly record. But identifying candidates for deselection is time-consuming. Batch-oriented tools that incorporate both archival and service values are needed. This article describes the characteristics of a decision-support system that assembles deselection metadata and enables library-defined rules to generate lists of titles eligible for withdrawal, storage, or inclusion in shared print programs.

  3. Exploring the Transition From Batch to Online

    DEFF Research Database (Denmark)

    Jørgensen, Anker Helms

    2010-01-01

    of the truly interactive use of computers known today. The transition invoked changes in a number of areas: technological, such as hybrid forms between batch and online; organisational such as decentralization; and personal as users and developers alike had to adopt new technology, shape new organizational...... structures, and acquire new skills. This work-in-progress paper extends an earlier study of the transition from batch to online, based on oral history interviews with (ex)-employees in two large Danish Service Bureaus. The paper takes the next step by analyzing a particular genre: the commercial computer...

  4. Enabling data-driven provenance in NetCDF, via OGC WPS operations. Climate Analysis services use case.

    Science.gov (United States)

    Mihajlovski, A.; Spinuso, A.; Plieger, M.; Som de Cerff, W.

    2016-12-01

    data products, a standardized provenance, metadata and processing infrastructure is researched for CLIPC. These efforts will lead towards the provision of tools for further web service processing development and optimisation, opening up possibilities to scale and administer abstract user- and data-driven workflows.

  5. A data-driven model for constraint of present-day glacial isostatic adjustment in North America

    Science.gov (United States)

    Simon, K. M.; Riva, R. E. M.; Kleinherenbrink, M.; Tangdamrongsub, N.

    2017-09-01

    Geodetic measurements of vertical land motion and gravity change are incorporated into an a priori model of present-day glacial isostatic adjustment (GIA) in North America via least-squares adjustment. The result is an updated GIA model wherein the final predicted signal is informed by both observational data, and prior knowledge (or intuition) of GIA inferred from models. The data-driven method allows calculation of the uncertainties of predicted GIA fields, and thus offers a significant advantage over predictions from purely forward GIA models. In order to assess the influence each dataset has on the final GIA prediction, the vertical land motion and GRACE-measured gravity data are incorporated into the model first independently (i.e., one dataset only), then simultaneously. The relative weighting of the datasets and the prior input is iteratively determined by variance component estimation in order to achieve the most statistically appropriate fit to the data. The best-fit model is obtained when both datasets are inverted and gives respective RMS misfits to the GPS and GRACE data of 1.3 mm/yr and 0.8 mm/yr equivalent water layer change. Non-GIA signals (e.g., hydrology) are removed from the datasets prior to inversion. The post-fit residuals between the model predictions and the vertical motion and gravity datasets, however, suggest particular regions where significant non-GIA signals may still be present in the data, including unmodeled hydrological changes in the central Prairies west of Lake Winnipeg. Outside of these regions of misfit, the posterior uncertainty of the predicted model provides a measure of the formal uncertainty associated with the GIA process; results indicate that this quantity is sensitive to the uncertainty and spatial distribution of the input data as well as that of the prior model information. In the study area, the predicted uncertainty of the present-day GIA signal ranges from ∼0.2-1.2 mm/yr for rates of vertical land motion, and
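    The least-squares fusion of a prior model with observational data can be reduced to its essence per grid cell: the posterior estimate is the inverse-variance-weighted mean, and its variance quantifies how the data tighten the prior. This scalar sketch (with hypothetical numbers) stands in for the paper's full-field adjustment with variance component estimation:

    ```python
    def combine(prior, datasets, sigmas, sigma_prior):
        """Inverse-variance-weighted fusion of a prior GIA rate with
        observational estimates; returns (estimate, posterior variance)."""
        w = [1.0 / sigma_prior ** 2] + [1.0 / s ** 2 for s in sigmas]
        x = [prior] + list(datasets)
        est = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)
        var = 1.0 / sum(w)                     # always tighter than the prior
        return est, var

    # hypothetical cell: prior uplift 10 mm/yr (sigma 2),
    # GPS-derived 8 mm/yr (sigma 1), GRACE-derived 9 mm/yr (sigma 1.5)
    est, var = combine(10.0, [8.0, 9.0], [1.0, 1.5], 2.0)
    print(round(est, 3), round(var, 3))
    ```

    The posterior variance (here well below the prior's 4.0) is the kind of formal uncertainty the data-driven method offers over purely forward GIA models; the actual study additionally estimates the relative dataset weights iteratively rather than fixing the sigmas.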

  6. An architecture for a continuous, user-driven, and data-driven application of clinical guidelines and its evaluation.

    Science.gov (United States)

    Shalom, Erez; Shahar, Yuval; Lunenfeld, Eitan

    2016-02-01

    Design, implement, and evaluate a new architecture for realistic continuous guideline (GL)-based decision support, based on a series of requirements that we have identified, such as support for continuous care, for multiple task types, and for data-driven and user-driven modes. We designed and implemented a new continuous GL-based support architecture, PICARD, which accesses a temporal reasoning engine, and provides several different types of application interfaces. We present the new architecture in detail in the current paper. To evaluate the architecture, we first performed a technical evaluation of the PICARD architecture, using 19 simulated scenarios in the preeclampsia/toxemia domain. We then performed a functional evaluation with the help of two domain experts, by generating patient records that simulate 60 decision points from six clinical guideline-based scenarios, lasting from two days to four weeks. Finally, 36 clinicians made manual decisions in half of the scenarios, and had access to the automated GL-based support in the other half. The measures used in all three experiments were correctness and completeness of the decisions relative to the GL. Mean correctness and completeness in the technical evaluation were 1±0.0 and 0.96±0.03 respectively. The functional evaluation produced only a few minor comments from the two experts, mostly regarding the output's style; otherwise the system's recommendations were validated. In the clinically oriented evaluation, the 36 clinicians applied manually approximately 41% of the GL's recommended actions. Completeness increased to approximately 93% when using PICARD. Manual correctness was approximately 94.5%, and remained similar when using PICARD; but while 68% of the manual decisions included correct but redundant actions, only 3% of the actions included in decisions made when using PICARD were redundant. The PICARD architecture is technically feasible and is functionally valid, and addresses the realistic

  7. A Data-driven Approach to Integrate Crop Rotation Agronomic Practices in a Global Gridded Land-use Forcing Dataset

    Science.gov (United States)

    Sahajpal, R.; Hurtt, G. C.; Chini, L. P.; Frolking, S. E.; Izaurralde, R. C.

    2016-12-01

    Agro-ecosystems are the dominant land-use type on Earth, covering more than a third of ice-free land surface. Agricultural practices have influenced the Earth's climate system by significantly altering the biogeophysical and biogeochemical properties from hyper-local to global scales. While past work has focused largely on characterizing the effects of net land cover changes, the magnitude and nature of gross transitions and agricultural management practices on climate remains highly uncertain. To address this issue, a new set of global gridded land-use forcing datasets (LUH2) have been developed in a standard format required by climate models for CMIP6. For the first time, this dataset includes information on key agricultural management practices including crop rotations. Crop rotations describe the practice of growing crops on the same land in sequential seasons and are essential to agronomic management as they influence key ecosystem services such as crop yields, water quality, carbon and nutrient cycling, pest and disease control. Here, we present a data-driven approach to infer crop rotations based on crop specific land cover data, derived from moderate resolution satellite imagery and created at an annual time-step for the continental United States. Our approach compresses the more than 100,000 unique crop rotations prevalent in the United States from 2013-2015 to about 200 representative crop rotations that account for nearly 80% of the spatio-temporal variability. Further simplification is achieved by mapping individual crops to crop functional types, which identify crops based on their photosynthetic pathways (C3/C4), life strategy (annual/perennial) and whether they are N-fixing or not. The resulting matrix of annual transitions between crop functional types averages 41,000 km2/yr for rotations between C3 and C4 annual crops, and 140,000 km2/yr between C3 N-fixing and C4 annual crops. The crop rotation matrix is combined with information on other land

  8. Predictability of Keeping Quality for Strawberry Batches

    NARCIS (Netherlands)

    Schouten, R.E.; Kessler, D.; Orcaray, L.; Kooten, van O.

    2002-01-01

    Postharvest life of strawberries is largely limited by Botrytis cinerea infection. It is assumed that there are two factors influencing the batch keeping quality: the Botrytis pressure and the resistance of the strawberry to infection. The latter factor will be discussed in this article. A colour

  9. Monte Carlo simulation on kinetics of batch and semi-batch free radical polymerization

    KAUST Repository

    Shao, Jing

    2015-10-27

    Based on Monte Carlo simulation technology, we proposed a hybrid routine that combines reaction mechanisms with coarse-grained molecular simulation to study the kinetics of free radical polymerization. By comparing with previous experimental and simulation studies, we showed the capability of our Monte Carlo scheme to represent polymerization kinetics in batch and semi-batch processes. Various kinetic quantities, such as instantaneous monomer conversion, molecular weight, and polydispersity, are readily calculated from the Monte Carlo simulation. Kinetic constants such as the polymerization rate kp are determined in the simulation without the "steady-state" hypothesis. We explored the mechanisms behind the variations in polymerization kinetics observed in previous studies, as well as polymerization-induced phase separation. Our Monte Carlo simulation scheme is versatile for studying polymerization kinetics in batch and semi-batch processes.
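    A minimal kinetic Monte Carlo (Gillespie-type) sketch of batch free-radical polymerization conveys the idea: reactions fire stochastically with propensities proportional to molecule counts, and conversion emerges without any steady-state assumption. The three-reaction mechanism and rate constants below are illustrative simplifications, not the scheme or parameters of the paper:

    ```python
    import random

    def gillespie_frp(M=5000, I=50, kd=0.1, kp=1.0, kt=0.5,
                      t_end=50.0, seed=7):
        """Kinetic MC of a toy batch free-radical polymerization:
        initiator decay (kd), propagation (kp), termination (kt).
        Counts are molecule numbers; returns final monomer conversion."""
        rng = random.Random(seed)
        R, t, M0 = 0, 0.0, M          # R = live radical count
        while t < t_end:
            a = [kd * I, kp * R * M, kt * R * (R - 1) / 2]
            a0 = sum(a)
            if a0 == 0:
                break                  # nothing left to react
            t += rng.expovariate(a0)   # time to next reaction event
            u = rng.random() * a0
            if u < a[0]:
                I -= 1; R += 1         # initiation creates a radical
            elif u < a[0] + a[1]:
                M -= 1                 # propagation consumes one monomer
            else:
                R -= 2                 # termination by combination
        return 1.0 - M / M0

    conv = gillespie_frp()
    print(0.0 <= conv <= 1.0)          # True
    ```

    Tracking individual chain lengths alongside these events would also give the molecular weight distribution and polydispersity directly, which is the advantage of the Monte Carlo route over deterministic rate equations.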

  10. On transcending the impasse of respiratory motion correction applications in routine clinical imaging - a consideration of a fully automated data driven motion control framework.

    Science.gov (United States)

    Kesner, Adam L; Schleyer, Paul J; Büther, Florian; Walter, Martin A; Schäfers, Klaus P; Koo, Phillip J

    2014-12-01

    Positron emission tomography (PET) is increasingly used for the detection, characterization, and follow-up of tumors located in the thorax. However, patient respiratory motion presents a unique limitation that hinders the application of high-resolution PET technology for this type of imaging. Efforts to transcend this limitation have been underway for more than a decade, yet for practical purposes PET remains a modality vulnerable to motion-induced image degradation. Respiratory motion control is not employed in routine clinical operations. In this article, we take the opportunity to highlight some of the recent advancements in data-driven motion control strategies and how they may form an underpinning for what we are presenting as a fully automated data-driven motion control framework. This framework represents an alternative direction for future endeavors in motion control and can conceptually connect individual focused studies with a strategy for addressing big-picture challenges and goals.

  11. Sojourn time distributions in a Markovian G-queue with batch arrival and batch removal

    OpenAIRE

    1999-01-01

    We consider a single server Markovian queue with two types of customers; positive and negative, where positive customers arrive in batches and arrivals of negative customers remove positive customers in batches. Only positive customers form a queue and negative customers just reduce the system congestion by removing positive ones upon their arrivals. We derive the LSTs of sojourn time distributions for a single server Markovian queue with positive customers and negative custom...
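    The queue described above is easy to simulate event by event, which also makes the analytical results checkable. This sketch uses fixed batch sizes for the positive arrivals and the negative removals (a simplifying assumption; the paper allows general batch-size distributions) and returns the time-average number of positive customers in the system:

    ```python
    import random

    def simulate_g_queue(lam=0.5, mu=2.0, nu=0.3, arr_batch=2, rem_batch=2,
                         horizon=10000.0, seed=42):
        """Markovian G-queue: positive customers arrive in Poisson batches
        (rate lam, size arr_batch), service is exponential (rate mu), and
        negative arrivals (rate nu) remove up to rem_batch positive
        customers. Returns the time-average queue length."""
        rng = random.Random(seed)
        t, n, area = 0.0, 0, 0.0
        while t < horizon:
            rates = [lam, mu if n > 0 else 0.0, nu]
            total = sum(rates)
            dt = rng.expovariate(total)
            area += n * min(dt, horizon - t)   # accumulate queue-length area
            t += dt
            u = rng.random() * total
            if u < rates[0]:
                n += arr_batch                 # batch arrival of positives
            elif u < rates[0] + rates[1]:
                n -= 1                         # service completion
            else:
                n = max(0, n - rem_batch)      # negative arrival removes a batch
        return area / horizon

    L = simulate_g_queue()
    print(L > 0)                               # True
    ```

    The parameters are chosen stable (effective input rate 1.0 versus a drain rate of up to 2.6); with Little's law, the simulated mean queue length also yields mean sojourn times for comparison with the LST-based results of the paper.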

  12. Data-Driven Control for Interlinked AC/DC Microgrids via Model-Free Adaptive Control and Dual-Droop Control

    OpenAIRE

    Zhang, Huaguang; Zhou, Jianguo; Sun, Qiuye; Guerrero, Josep M.; Ma, Dazhong

    2016-01-01

    This paper investigates the coordinated power sharing issues of interlinked ac/dc microgrids. An appropriate control strategy is developed to control the interlinking converter (IC) to realize proportional power sharing between ac and dc microgrids. The proposed strategy mainly includes two parts: the primary outer-loop dual-droop control method along with secondary control; the inner-loop data-driven model-free adaptive voltage control. Using the proposed scheme, the interlinking converter, ...

  13. The Peace Game: A Data-Driven Evaluation of a Software-Based Model of the Effects of Modern Conflict on Populations

    Science.gov (United States)

    2015-09-01

    Naval Postgraduate School, Monterey, California. Master's thesis, September 2015. Approved for public release; distribution is unlimited. …capabilities of a plan or concept of operation. While most of these games focus on "war," the Peace Game focuses on helping planners gain insight as to

  14. Batch and fed-batch production of butyric acid by Clostridium butyricum ZJUCB

    Institute of Scientific and Technical Information of China (English)

    HE Guo-qing; KONG Qing; CHEN Qi-he; RUAN Hui

    2005-01-01

    The production of butyric acid by Clostridium butyricum ZJUCB at various pH values was investigated. In order to study the effect of pH on cell growth, butyric acid biosynthesis and reducing sugar consumption, different cultivation pH values ranging from 6.0 to 7.5 were evaluated in a 5-L bioreactor. In controlled-pH batch fermentation, the optimum pH for cell growth and butyric acid production was 6.5, with a cell yield of 3.65 g/L and a butyric acid yield of 12.25 g/L. Based on these results, this study then compared batch and fed-batch fermentation of butyric acid production at pH 6.5. A maximum butyric acid concentration of 16.74 g/L was obtained in fed-batch fermentation, compared to 12.25 g/L in batch fermentation. It was concluded that cultivation under the fed-batch fermentation mode could enhance butyric acid production significantly (P<0.01) by C. butyricum ZJUCB.

  15. Batch and fed-batch fermentation of Bacillus thuringiensis using starch industry wastewater as fermentation substrate.

    Science.gov (United States)

    Vu, Khanh Dang; Tyagi, Rajeshwar Dayal; Valéro, José R; Surampalli, Rao Y

    2010-08-01

    Bacillus thuringiensis var. kurstaki biopesticide was produced in batch and fed-batch fermentation modes using starch industry wastewater as sole substrate. Fed-batch fermentation with two intermittent feeds (at 10 and 20 h) during the fermentation of 72 h gave the maximum delta-endotoxin concentration (1,672.6 mg/L) and entomotoxicity (Tx) (18.5 x 10(6) SBU/mL) in fermented broth which were significantly higher than maximum delta-endotoxin concentration (511.0 mg/L) and Tx (15.8 x 10(6) SBU/mL) obtained in batch process. However, fed-batch fermentation with three intermittent feeds (at 10, 20 and 34 h) of the fermentation resulted in the formation of asporogenous variant (Spo-) from 36 h to the end of fermentation (72 h) which resulted in a significant decrease in spore and delta-endotoxin concentration and finally the Tx value. Tx of suspended pellets (27.4 x 10(6) SBU/mL) obtained in fed-batch fermentation with two feeds was the highest value as compared to other cases.

  16. Batch process. Optimum designing and operation of a batch process; Bacchi purosesu

    Energy Technology Data Exchange (ETDEWEB)

    Hasebe, S. [Kyoto Univ. (Japan). Faculty of Engineering

    1997-09-05

    Since the control of a batch process is dynamic, the process must be handled differently from a continuous process in its design, operation, and control. This paper describes the characteristics of a batch process and the problems to be solved from these three points of view. A major problem of a batch process is the difficulty of its design. In a batch process, the amount of product that can be manufactured per unit time by each apparatus differs from that of the whole plant formed by combining the apparatuses, so time and apparatus capacity are wasted in some cases. The actual design of a batch process involves various factors that are not captured in the formulation of mathematical programming problems, such as the seasonal fluctuation of demand for products, the possibility of expanding the apparatuses in the future, the ease of controlling the process, and the shipment of products during consecutive holidays and periodic maintenance. Regarding the optimum operation of a batch process and its control, the formation of a dynamic optimum operation pattern and the verification of the sequence control system are described. 9 refs., 4 figs.

  17. Developing a Metadata Infrastructure to facilitate data driven science gateway and to provide Inspire/GEMINI compliance for CLIPC

    Science.gov (United States)

    Mihajlovski, Andrej; Plieger, Maarten; Som de Cerff, Wim; Page, Christian

    2016-04-01

    indicators Key is the availability of standardized metadata describing indicator data and services. This will enable standardization and interoperability between the different distributed services of CLIPC. To disseminate CLIPC indicator data, transformed data products enabling impact assessments, and climate change impact indicators, a standardized metadata infrastructure is provided. The challenge is that compliance of existing metadata with INSPIRE ISO standards and GEMINI standards needs to be extended to further allow the web portal to be generated from the available metadata blueprint. The information provided in the headers of netCDF files available through multiple catalogues allows us to generate ISO-compliant metadata, which is in turn used to generate web-based interface content, as well as OGC-compliant web services such as WCS and WMS for the front end and WPS interactions for scientific users to combine and generate new datasets. The goal of the metadata infrastructure is to provide a blueprint for creating a data-driven science portal, generated from the underlying GIS data, web services, and processing infrastructure. In the presentation we will present the results and lessons learned.

  18. Reformulated Neural Network (ReNN): a New Alternative for Data-driven Modelling in Hydrology and Water Resources Engineering

    Science.gov (United States)

    Razavi, S.; Tolson, B.; Burn, D.; Seglenieks, F.

    2012-04-01

    Reformulated Neural Network (ReNN) has been recently developed as an efficient and more effective alternative to feedforward multi-layer perceptron (MLP) neural networks [Razavi, S., and Tolson, B. A. (2011). "A new formulation for feedforward neural networks." IEEE Transactions on Neural Networks, 22(10), 1588-1598, DOI: 10.1109/TNN.2011.2163169]. This presentation initially aims to introduce the ReNN to the water resources community and then demonstrates ReNN applications to water resources related problems. ReNN is essentially equivalent to a single-hidden-layer MLP neural network but defined on a new set of network variables which is more effective than the traditional set of network weights and biases. The main features of the new network variables are that they are geometrically interpretable and each variable has a distinct role in forming the network response. ReNN is more efficiently trained as it has a less complex error response surface. In addition to the ReNN training efficiency, the interpretability of the ReNN variables enables the users to monitor and understand the internal behaviour of the network while training. Regularization in the ReNN response can also be directly measured and controlled. This feature improves the generalization ability of the network. The appeal of the ReNN is demonstrated with two ReNN applications to water resources engineering problems. In the first application, the ReNN is used to model the rainfall-runoff relationships in multiple watersheds in the Great Lakes basin located in northeastern North America. Modelling inflows to the Great Lakes is of great importance to the management of the Great Lakes system. Due to the lack of some detailed physical data about existing control structures in many subwatersheds of this huge basin, data-driven modelling approaches such as the ReNN are required to replace predictions from a physically-based rainfall-runoff model. Unlike traditional MLPs, the ReNN does not necessarily

  19. Using Forensics to Untangle Batch Effects in TCGA Data - TCGA

    Science.gov (United States)

    Rehan Akbani, Ph.D., and colleagues at the University of Texas MD Anderson Cancer Center developed a tool called MBatch to detect, diagnose, and correct batch effects in TCGA data. Read more about batch effects in this Case Study.

  20. Optimal operation of batch membrane processes

    CERN Document Server

    Paulen, Radoslav

    2016-01-01

    This study concentrates on a general optimization of a particular class of membrane separation processes: those involving batch diafiltration. Existing practices are explained and operational improvements based on optimal control theory are suggested. The first part of the book introduces the theory of membrane processes, optimal control and dynamic optimization. Separation problems are defined and mathematical models of batch membrane processes derived. The control theory focuses on problems of dynamic optimization from a chemical-engineering point of view. Analytical and numerical methods that can be exploited to treat problems of optimal control for membrane processes are described. The second part of the text builds on this theoretical basis to establish solutions for membrane models of increasing complexity. Each chapter starts with a derivation of optimal operation and continues with case studies exemplifying various aspects of the control problems under consideration. The authors work their way from th...