WorldWideScience

Sample records for integrating statistical predictions

  1. Statistical timing for parametric yield prediction of digital integrated circuits

    Jess, J.A.G.; Kalafala, K.; Naidu, S.R.; Otten, R.H.J.M.; Visweswariah, C.

    2006-01-01

    Uncertainty in circuit performance due to manufacturing and environmental variations is increasing with each new generation of technology. It is therefore important to predict the performance of a chip as a probabilistic quantity. This paper proposes three novel path-based algorithms for statistical

  2. A statistical framework to predict functional non-coding regions in the human genome through integrated analysis of annotation data.

    Lu, Qiongshi; Hu, Yiming; Sun, Jiehuan; Cheng, Yuwei; Cheung, Kei-Hoi; Zhao, Hongyu

    2015-05-27

Identifying functional regions in the human genome is a major goal in human genetics. Great efforts have been made to functionally annotate the human genome either through computational predictions, such as genomic conservation, or high-throughput experiments, such as the ENCODE project. These efforts have resulted in a rich collection of functional annotation data of diverse types that need to be jointly analyzed for integrated interpretation and annotation. Here we present GenoCanyon, a whole-genome annotation method that performs unsupervised statistical learning using 22 computational and experimental annotations, thereby inferring the functional potential of each position in the human genome. With GenoCanyon, we are able to predict many of the known functional regions. Its ability to predict functional regions, together with its generalizable statistical framework, makes GenoCanyon a unique and powerful tool for whole-genome annotation. The GenoCanyon web server is available at http://genocanyon.med.yale.edu.
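As a toy illustration of the unsupervised-learning idea (not GenoCanyon's actual model, which jointly analyzes 22 annotations), one can fit a two-component Gaussian mixture by EM to a single synthetic annotation score and read the per-site posterior as a "functional potential":

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1D "annotation score": 80% non-functional sites (mean 0) and
# 20% functional sites (mean 3); all values here are assumptions.
x = np.concatenate([rng.normal(0.0, 1.0, 800), rng.normal(3.0, 1.0, 200)])

# EM for a two-component Gaussian mixture (unsupervised).
pi = 0.5                               # prior weight of the "functional" component
mu = np.array([x.min(), x.max()])      # spread-out initial means
sd = np.array([1.0, 1.0])
for _ in range(200):
    # E-step: posterior probability that each site is functional (component 1)
    p0 = (1.0 - pi) * np.exp(-0.5 * ((x - mu[0]) / sd[0]) ** 2) / sd[0]
    p1 = pi * np.exp(-0.5 * ((x - mu[1]) / sd[1]) ** 2) / sd[1]
    g = p1 / (p0 + p1)
    # M-step: re-estimate weight, means, and standard deviations
    pi = g.mean()
    mu = np.array([np.sum((1 - g) * x) / np.sum(1 - g), np.sum(g * x) / np.sum(g)])
    sd = np.sqrt(np.array([np.sum((1 - g) * (x - mu[0]) ** 2) / np.sum(1 - g),
                           np.sum(g * (x - mu[1]) ** 2) / np.sum(g)]))

functional_potential = g   # per-site posterior, analogous to a functional-potential score
```

The fitted mixture weight recovers the (assumed) 20% functional fraction without any labels, which is the essence of the unsupervised approach.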

  3. Personalizing oncology treatments by predicting drug efficacy, side-effects, and improved therapy: mathematics, statistics, and their integration.

    Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri

    2014-01-01

Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationships between biomarkers and the patient's response to drugs, obscuring the true weight of the biomarkers in the patient's overall response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions, for predicting clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet, the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology, for generating personal mathematical models. Upon a more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the mainstream of clinical oncology. © 2014 Wiley Periodicals, Inc.

  4. Statistical Methods in Integrative Genomics

    Richardson, Sylvia; Tseng, George C.; Sun, Wei

    2016-01-01

    Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531

  5. Exclusion statistics and integrable models

    Mashkevich, S.

    1998-01-01

The definition of exclusion statistics, as given by Haldane, allows for a statistical interaction between distinguishable particles (multi-species statistics). The thermodynamic quantities for such statistics can be evaluated exactly. The explicit expressions for the cluster coefficients are presented. Furthermore, single-species exclusion statistics is realized in one-dimensional integrable models. The interesting questions of generalizing this correspondence to the higher-dimensional and the multi-species cases remain essentially open

  6. Statistical inference an integrated approach

    Migon, Helio S; Louzada, Francisco

    2014-01-01

Introduction Information The concept of probability Assessing subjective probabilities An example Linear algebra and probability Notation Outline of the book Elements of Inference Common statistical models Likelihood-based functions Bayes theorem Exchangeability Sufficiency and exponential family Parameter elimination Prior Distribution Entirely subjective specification Specification through functional forms Conjugacy with the exponential family Non-informative priors Hierarchical priors Estimation Introduction to decision theory Bayesian point estimation Classical point estimation Empirical Bayes estimation Comparison of estimators Interval estimation Estimation in the Normal model Approximating Methods The general problem of inference Optimization techniques Asymptotic theory Other analytical approximations Numerical integration methods Simulation methods Hypothesis Testing Introduction Classical hypothesis testing Bayesian hypothesis testing Hypothesis testing and confidence intervals Asymptotic tests Prediction...

  7. Multimodal integration in statistical learning

    Mitchell, Aaron; Christiansen, Morten Hyllekvist; Weiss, Dan

    2014-01-01

Recent advances in the field of statistical learning have established that learners are able to track regularities of multimodal stimuli, yet it is unknown whether the statistical computations are performed on integrated representations or on separate, unimodal representations. In the present study, we investigated the ability of adults to integrate audio and visual input during statistical learning. We presented learners with a speech stream synchronized with a video of a speaker’s face. In the critical condition, the visual (e.g., /gi/) and auditory (e.g., /mi/) signals were occasionally ... facilitated participants’ ability to segment the speech stream. Our results therefore demonstrate that participants can integrate audio and visual input to perceive the McGurk illusion during statistical learning. We interpret our findings as support for modality-interactive accounts of statistical learning.

  8. Exclusion statistics and integrable models

    Mashkevich, S.

    1998-01-01

    The definition of exclusion statistics that was given by Haldane admits a 'statistical interaction' between distinguishable particles (multispecies statistics). For such statistics, thermodynamic quantities can be evaluated exactly; explicit expressions are presented here for cluster coefficients. Furthermore, single-species exclusion statistics is realized in one-dimensional integrable models of the Calogero-Sutherland type. The interesting questions of generalizing this correspondence to the higher-dimensional and the multispecies cases remain essentially open; however, our results provide some hints as to searches for the models in question

  9. Functional region prediction with a set of appropriate homologous sequences-an index for sequence selection by integrating structure and sequence information with spatial statistics

    2012-01-01

Background: The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions. Results: We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence
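The spatial-statistics idea of scoring how strongly conserved residues cluster on a structure can be illustrated with Moran's I, a standard spatial autocorrelation statistic (the paper's actual index may differ; the coordinates and conservation scores below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical C-alpha coordinates (angstroms) and per-residue conservation scores.
n = 60
coords = rng.uniform(0.0, 30.0, (n, 3))
cons = rng.random(n)
# Make residues near one corner of the box highly conserved (a clustered pattern).
cons[np.linalg.norm(coords - 5.0, axis=1) < 12.0] += 1.0

# Inverse-distance spatial weights (zero on the diagonal).
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
w = np.where(d > 0.0, 1.0 / np.maximum(d, 1e-9), 0.0)

# Moran's I: positive values mean high-conservation residues cluster in space.
z = cons - cons.mean()
moran_I = (n / w.sum()) * (z @ w @ z) / (z @ z)
```

Selecting the sequence set that maximizes such a clustering score is the strategy the abstract describes.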

  10. Statistical Analysis of CO2 Exposed Wells to Predict Long Term Leakage through the Development of an Integrated Neural-Genetic Algorithm

    Guo, Boyun [Univ. of Louisiana, Lafayette, LA (United States); Duguid, Andrew [Battelle, Columbus, OH (United States); Nygaard, Ronar [Missouri Univ. of Science and Technology, Rolla, MO (United States)

    2017-08-05

The objective of this project is to develop a computerized statistical model with the Integrated Neural-Genetic Algorithm (INGA) for predicting the probability of long-term leak of wells in CO2 sequestration operations. This objective has been accomplished by conducting research in three phases: 1) data mining of CO2-exposed wells, 2) INGA computer model development, and 3) evaluation of the predictive performance of the computer model with data from field tests. Data mining was conducted for 510 wells in two CO2 sequestration projects in the Texas Gulf Coast region: the Hasting West field and the Oyster Bayou field in southern Texas. Missing wellbore integrity data were estimated using an analytical and Finite Element Method (FEM) model. The INGA was first tested for convergence and computing efficiency with the obtained high-dimensional data set. It was concluded that the INGA can handle the gathered data set with good accuracy and reasonable computing time after a reduction of dimension with a grouping mechanism. A computerized statistical model with the INGA was then developed based on data pre-processing and grouping. Comprehensive training and testing of the model were carried out to ensure that the model is accurate and efficient enough for predicting the probability of long-term leak of wells in CO2 sequestration operations. The Cranfield site in southern Mississippi was selected as the test site. Observation wells CFU31F2 and CFU31F3 were used for pressure-testing, formation-logging, and cement-sampling. Tools run in the wells include Isolation Scanner, Slim Cement Mapping Tool (SCMT), Cased Hole Formation Dynamics Tester (CHDT), and Mechanical Sidewall Coring Tool (MSCT). Analyses of the obtained data indicate no leak of CO2 across the cap zone, while it is evident that the well cement sheath was invaded by CO2 from the storage zone. This observation is consistent

  11. Functional integral approach to classical statistical dynamics

    Jensen, R.V.

    1980-04-01

    A functional integral method is developed for the statistical solution of nonlinear stochastic differential equations which arise in classical dynamics. The functional integral approach provides a very natural and elegant derivation of the statistical dynamical equations that have been derived using the operator formalism of Martin, Siggia, and Rose

  12. Technical Topic 3.2.2.d Bayesian and Non-Parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling

    2016-05-31

Final Report: Technical Topic 3.2.2.d Bayesian and Non-Parametric Statistics: Integration of Neural Networks with Bayesian Networks for Data Fusion and Predictive Modeling. Final report dated 31-05-2016, covering the period 15-Apr-2014 to 14-Jan-2015; distribution unlimited.

  13. Permutation statistical methods an integrated approach

    Berry, Kenneth J; Johnston, Janis E

    2016-01-01

    This research monograph provides a synthesis of a number of statistical tests and measures, which, at first consideration, appear disjoint and unrelated. Numerous comparisons of permutation and classical statistical methods are presented, and the two methods are compared via probability values and, where appropriate, measures of effect size. Permutation statistical methods, compared to classical statistical methods, do not rely on theoretical distributions, avoid the usual assumptions of normality and homogeneity of variance, and depend only on the data at hand. This text takes a unique approach to explaining statistics by integrating a large variety of statistical methods, and establishing the rigor of a topic that to many may seem to be a nascent field in statistics. This topic is new in that it took modern computing power to make permutation methods available to people working in the mainstream of research. This research monograph addresses a statistically-informed audience, and can also easily serve as a ...

  14. Learning Predictive Statistics: Strategies and Brain Mechanisms.

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-08-30

    When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to
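The two decision strategies can be contrasted with a quick simulation: for a binary event stream in which one outcome occurs with probability p, maximizing is correct with probability p, while probability matching is correct with probability p² + (1 − p)² (a sketch with an assumed p = 0.8):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.8                          # probability of the more frequent outcome (assumed)
seq = rng.random(100_000) < p    # binary event stream

# Maximizing: always predict the most probable outcome.
acc_max = seq.mean()

# Probability matching: predict each outcome at its long-run frequency.
guesses = rng.random(seq.size) < p
acc_match = np.mean(seq == guesses)
```

With p = 0.8 this gives roughly 80% accuracy for maximizing versus roughly 68% for matching, which is why the two strategies are behaviorally distinguishable.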

  15. Probability and statistics with integrated software routines

    Deep, Ronald

    2005-01-01

Probability & Statistics with Integrated Software Routines is a calculus-based treatment of probability, taught concurrently with and integrated with statistics through interactive, tailored software applications designed to illustrate the phenomena of probability and statistics. The software programs make the book unique. The book comes with a CD containing the interactive software leading to the Statistical Genie. The student can issue commands repeatedly while making parameter changes to observe the effects. Computer programming is an excellent skill for problem solvers, involving design, prototyping, data gathering, testing, redesign, validating, etc., all wrapped up in the scientific method. See also: CD to accompany Probability and Stats with Integrated Software Routines (0123694698). * Incorporates more than 1,000 engaging problems with answers * Includes more than 300 solved examples * Uses varied problem solving methods

  16. Statistical prediction of Late Miocene climate

    Fernandes, A.A; Gupta, S.M.

by making certain simplifying assumptions; for example, in modelling ocean currents, the geostrophic approximation is made. In the case of statistical prediction, no such a priori assumption need be made. Statistical prediction comprises using observed data ... the number of equations. In this case the equations are overdetermined, and therefore one has to look for a solution that best fits the sample data in a least squares sense. To this end we express the sample data as follows: ...

  17. Nonparametric predictive inference in statistical process control

    Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.

    2000-01-01

New methods for statistical process control are presented, where the inferences have a nonparametric predictive nature. We consider several problems in process control in terms of uncertainties about future observable random quantities, and we develop inferences for these random quantities based on

  18. Nonparametric predictive inference in statistical process control

    Arts, G.R.J.; Coolen, F.P.A.; Laan, van der P.

    2004-01-01

    Statistical process control (SPC) is used to decide when to stop a process as confidence in the quality of the next item(s) is low. Information to specify a parametric model is not always available, and as SPC is of a predictive nature, we present a control chart developed using nonparametric

  19. Statistics of spatially integrated speckle intensity difference

    Hanson, Steen Grüner; Yura, Harold

    2009-01-01

We consider the statistics of the spatially integrated speckle intensity difference obtained from two separated finite collecting apertures. For fully developed speckle, closed-form analytic solutions for both the probability density function and the cumulative distribution function are derived here for both arbitrary values of the mean number of speckles contained within an aperture and the degree of coherence of the optical field. Additionally, closed-form expressions are obtained for the corresponding nth statistical moments.

  20. Statistical inference an integrated Bayesian/likelihood approach

    Aitkin, Murray

    2010-01-01

    Filling a gap in current Bayesian theory, Statistical Inference: An Integrated Bayesian/Likelihood Approach presents a unified Bayesian treatment of parameter inference and model comparisons that can be used with simple diffuse prior specifications. This novel approach provides new solutions to difficult model comparison problems and offers direct Bayesian counterparts of frequentist t-tests and other standard statistical methods for hypothesis testing.After an overview of the competing theories of statistical inference, the book introduces the Bayes/likelihood approach used throughout. It pre

  1. Statistical short-term earthquake prediction.

    Kagan, Y Y; Knopoff, L

    1987-06-19

    A statistical procedure, derived from a theoretical model of fracture growth, is used to identify a foreshock sequence while it is in progress. As a predictor, the procedure reduces the average uncertainty in the rate of occurrence for a future strong earthquake by a factor of more than 1000 when compared with the Poisson rate of occurrence. About one-third of all main shocks with local magnitude greater than or equal to 4.0 in central California can be predicted in this way, starting from a 7-year database that has a lower magnitude cut off of 1.5. The time scale of such predictions is of the order of a few hours to a few days for foreshocks in the magnitude range from 2.0 to 5.0.
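The reported factor-of-1000 reduction in rate uncertainty can be put in concrete terms with a small Poisson calculation; the baseline rate and alert window below are assumed for illustration, not taken from the paper:

```python
import math

# Illustrative numbers (assumed): a baseline Poisson rate of one M>=4.0
# main shock per 10 years, a foreshock-based probability gain of 1000,
# and a 24-hour alert window.
rate_per_hour = 1.0 / (10 * 365.25 * 24)
gain = 1000.0
window_h = 24.0

# Probability of at least one event in the window: 1 - exp(-rate * T)
p_background = 1 - math.exp(-rate_per_hour * window_h)
p_alert = 1 - math.exp(-gain * rate_per_hour * window_h)
```

Under these assumed numbers, a day-long window goes from a fraction of a percent of background probability to a substantial alert probability, which is what makes hours-to-days prediction horizons meaningful.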

  2. Statistical predictions from anarchic field theory landscapes

    Balasubramanian, Vijay; Boer, Jan de; Naqvi, Asad

    2010-01-01

Consistent coupling of effective field theories with a quantum theory of gravity appears to require bounds on the rank of the gauge group and the amount of matter. We consider landscapes of field theories subject to such boundedness constraints. We argue that appropriately 'coarse-grained' aspects of the randomly chosen field theory in such landscapes, such as the fraction of gauge groups with ranks in a given range, can be statistically predictable. To illustrate our point we show how the uniform measures on simple classes of N=1 quiver gauge theories localize in the vicinity of theories with certain typical structures. Generically, this approach would predict a high energy theory with very many gauge factors, with the high rank factors largely decoupled from the low rank factors if we require asymptotic freedom for the latter.

  3. A statistical model for predicting muscle performance

    Byerly, Diane Leslie De Caix

    The objective of these studies was to develop a capability for predicting muscle performance and fatigue to be utilized for both space- and ground-based applications. To develop this predictive model, healthy test subjects performed a defined, repetitive dynamic exercise to failure using a Lordex spinal machine. Throughout the exercise, surface electromyography (SEMG) data were collected from the erector spinae using a Mega Electronics ME3000 muscle tester and surface electrodes placed on both sides of the back muscle. These data were analyzed using a 5th order Autoregressive (AR) model and statistical regression analysis. It was determined that an AR derived parameter, the mean average magnitude of AR poles, significantly correlated with the maximum number of repetitions (designated Rmax) that a test subject was able to perform. Using the mean average magnitude of AR poles, a test subject's performance to failure could be predicted as early as the sixth repetition of the exercise. This predictive model has the potential to provide a basis for improving post-space flight recovery, monitoring muscle atrophy in astronauts and assessing the effectiveness of countermeasures, monitoring astronaut performance and fatigue during Extravehicular Activity (EVA) operations, providing pre-flight assessment of the ability of an EVA crewmember to perform a given task, improving the design of training protocols and simulations for strenuous International Space Station assembly EVA, and enabling EVA work task sequences to be planned enhancing astronaut performance and safety. Potential ground-based, medical applications of the predictive model include monitoring muscle deterioration and performance resulting from illness, establishing safety guidelines in the industry for repetitive tasks, monitoring the stages of rehabilitation for muscle-related injuries sustained in sports and accidents, and enhancing athletic performance through improved training protocols while reducing
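The AR-pole feature described above can be sketched as follows: fit an AR model by the Yule-Walker equations and average the magnitudes of the characteristic roots. The order-5 choice follows the abstract, but the SEMG data is replaced here by a synthetic AR process:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for an SEMG-derived signal: a stationary AR(2) process.
n = 5000
x = np.zeros(n)
e = rng.standard_normal(n)
for t in range(2, n):
    x[t] = 1.5 * x[t - 1] - 0.7 * x[t - 2] + e[t]

def ar_pole_mean_magnitude(x, order=5):
    """Fit an AR(order) model via the Yule-Walker equations and return
    the mean magnitude of the AR poles."""
    x = np.asarray(x, float)
    x = x - x.mean()
    m = len(x)
    # Biased autocovariance estimates r[0..order]
    r = np.array([x[:m - k] @ x[k:] / m for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])                   # AR coefficients a1..a_order
    poles = np.roots(np.concatenate(([1.0], -a)))   # roots of z^p - a1 z^(p-1) - ... - a_p
    return np.abs(poles).mean()

mean_mag = ar_pole_mean_magnitude(x)   # the feature correlated with Rmax in the study
```

For a stationary signal all poles lie inside the unit circle, so the feature is bounded; the study's finding is that its trend over successive repetitions predicts the repetition count at failure.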

  4. Predicting radiotherapy outcomes using statistical learning techniques

    El Naqa, Issam; Bradley, Jeffrey D; Deasy, Joseph O; Lindsay, Patricia E; Hope, Andrew J

    2009-01-01

Radiotherapy outcomes are determined by complex interactions between treatment, anatomical and patient-related variables. A common obstacle to building maximally predictive outcome models for clinical practice is the failure to capture the potential complexity of heterogeneous variable interactions and applicability beyond institutional data. We describe a statistical learning methodology that can automatically screen for nonlinear relations among prognostic variables and generalize to unseen data. In this work, several types of linear and nonlinear kernels for generating interaction terms and approximating the treatment-response function are evaluated. Examples of institutional datasets of esophagitis, pneumonitis and xerostomia endpoints were used. Furthermore, an independent RTOG dataset was used for generalizability validation. We formulated the discrimination between risk groups as a supervised learning problem. The distribution of patient groups was initially analyzed using principal component analysis (PCA) to uncover potential nonlinear behavior. The performance of the different methods was evaluated using bivariate correlations and actuarial analysis. Over-fitting was controlled via cross-validation resampling. Our results suggest that a modified support vector machine (SVM) kernel method provided superior performance on leave-one-out testing compared to logistic regression and neural networks in cases where the data exhibited nonlinear behavior on PCA. For instance, in prediction of esophagitis and pneumonitis endpoints, which exhibited nonlinear behavior on PCA, the method provided 21% and 60% improvements, respectively. Furthermore, evaluation on the independent pneumonitis RTOG dataset demonstrated good generalizability beyond institutional data in contrast with other models. This indicates that the prediction of treatment response can be improved by utilizing nonlinear kernel methods for discovering important nonlinear interactions among model

  5. GAPIT: genome association and prediction integrated tool.

    Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

    2012-09-15

    Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.

  6. Statistical Basis for Predicting Technological Progress

    Nagy, Béla; Farmer, J. Doyne; Bui, Quan M.; Trancik, Jessika E.

    2013-01-01

    Forecasting technological progress is of great interest to engineers, policy makers, and private investors. Several models have been proposed for predicting technological improvement, but how well do these models perform? An early hypothesis made by Theodore Wright in 1936 is that cost decreases as a power law of cumulative production. An alternative hypothesis is Moore's law, which can be generalized to say that technologies improve exponentially with time. Other alternatives were proposed by Goddard, Sinclair et al., and Nordhaus. These hypotheses have not previously been rigorously tested. Using a new database on the cost and production of 62 different technologies, which is the most expansive of its kind, we test the ability of six different postulated laws to predict future costs. Our approach involves hindcasting and developing a statistical model to rank the performance of the postulated laws. Wright's law produces the best forecasts, but Moore's law is not far behind. We discover a previously unobserved regularity that production tends to increase exponentially. A combination of an exponential decrease in cost and an exponential increase in production would make Moore's law and Wright's law indistinguishable, as originally pointed out by Sahal. We show for the first time that these regularities are observed in data to such a degree that the performance of these two laws is nearly the same. Our results show that technological progress is forecastable, with the square root of the logarithmic error growing linearly with the forecasting horizon at a typical rate of 2.5% per year. These results have implications for theories of technological change, and assessments of candidate technologies and policies for climate change mitigation. PMID:23468837
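The two leading hypotheses can be compared by hindcasting on a synthetic cost series; here the data is generated under Wright's law (cost as a power law of cumulative production) with exponentially growing production, mirroring the regularity noted above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic technology history (assumed): production grows exponentially and
# unit cost follows Wright's law, a power law in cumulative production.
t = np.arange(30.0)
production = np.exp(0.1 * t)
cum = np.cumsum(production)
cost = 100.0 * cum ** -0.3 * np.exp(rng.normal(0.0, 0.02, t.size))

# Hindcast: fit each law on the first 20 "years", predict the last 10.
train, test = slice(0, 20), slice(20, 30)
bw, aw = np.polyfit(np.log(cum[train]), np.log(cost[train]), 1)   # Wright: log cost vs log cum. production
bm, am = np.polyfit(t[train], np.log(cost[train]), 1)             # Moore: log cost vs time

err_w = np.abs(aw + bw * np.log(cum[test]) - np.log(cost[test])).mean()  # log-scale hindcast error
err_m = np.abs(am + bm * t[test] - np.log(cost[test])).mean()
```

On data generated this way Wright's law fits best, but because cumulative production itself grows roughly exponentially, Moore's law is not far behind, exactly the near-indistinguishability the paper attributes to Sahal.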

  8. Thermodynamics and statistical mechanics an integrated approach

    Shell, M Scott

    2015-01-01

    Learn classical thermodynamics alongside statistical mechanics with this fresh approach to the subjects. Molecular and macroscopic principles are explained in an integrated, side-by-side manner to give students a deep, intuitive understanding of thermodynamics and equip them to tackle future research topics that focus on the nanoscale. Entropy is introduced from the get-go, providing a clear explanation of how the classical laws connect to the molecular principles, and closing the gap between the atomic world and thermodynamics. Notation is streamlined throughout, with a focus on general concepts and simple models, for building basic physical intuition and gaining confidence in problem analysis and model development. Well over 400 guided end-of-chapter problems are included, addressing conceptual, fundamental, and applied skill sets. Numerous worked examples are also provided together with handy shaded boxes to emphasize key concepts, making this the complete teaching package for students in chemical engineer...

  9. Predicting Statistical Distributions of Footbridge Vibrations

    Pedersen, Lars; Frier, Christian

    2009-01-01

    The paper considers vibration response of footbridges to pedestrian loading. Employing Newmark and Monte Carlo simulation methods, a statistical distribution of bridge vibration levels is calculated modelling walking parameters such as step frequency and stride length as random variables...

  10. Statistical prediction of parametric roll using FORM

    Jensen, Jørgen Juncher; Choi, Ju-hyuck; Nielsen, Ulrik Dam

    2017-01-01

    Previous research has shown that the First Order Reliability Method (FORM) can be an efficient method for estimation of outcrossing rates and extreme value statistics for stationary stochastic processes. This also holds for bifurcation-type processes such as the parametric roll of ships. The present...

  11. Time series prediction: statistical and neural techniques

    Zahirniak, Daniel R.; DeSimio, Martin P.

    1996-03-01

    In this paper we compare the performance of nonlinear neural network techniques to those of linear filtering techniques in the prediction of time series. Specifically, we compare the results of using the nonlinear systems, known as multilayer perceptron and radial basis function neural networks, with the results obtained using the conventional linear Wiener filter, Kalman filter and Widrow-Hoff adaptive filter in predicting future values of stationary and non-stationary time series. Our results indicate the performance of each type of system is heavily dependent upon the form of the time series being predicted and the size of the system used. In particular, the linear filters perform adequately for linear or near-linear processes while the nonlinear systems perform better for nonlinear processes. Since the linear systems take much less time to be developed, they should be tried prior to using the nonlinear systems when the linearity properties of the time series process are unknown.
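Of the linear filters compared in this record, the Widrow-Hoff (LMS) adaptive filter is the simplest to sketch: it predicts the next sample from a window of past samples and nudges its weights in proportion to the prediction error. The signal and step size below are illustrative choices, not the paper's experimental setup:

```python
import numpy as np

def lms_predict(x, order=4, mu=0.05):
    """One-step-ahead prediction with a Widrow-Hoff (LMS) adaptive filter."""
    w = np.zeros(order)
    preds = np.empty(len(x) - order)
    for n in range(order, len(x)):
        u = x[n - order:n][::-1]          # most recent samples first
        preds[n - order] = w @ u          # predict x[n]
        e = x[n] - preds[n - order]       # prediction error
        w += 2 * mu * e * u               # Widrow-Hoff (LMS) weight update
    return preds

# The filter learns to track a noiseless sinusoid almost exactly
x = np.sin(0.2 * np.arange(2000))
preds = lms_predict(x)
final_err = np.abs(preds - x[4:])[-100:].mean()   # small after convergence
```

For a stationary near-linear signal like this one, the errors shrink rapidly; the paper's point is that such a filter is far cheaper to develop than a trained neural network.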

  12. Statistical Approaches for Spatiotemporal Prediction of Low Flows

    Fangmann, A.; Haberlandt, U.

    2017-12-01

    An adequate assessment of regional climate change impacts on streamflow requires the integration of various sources of information and modeling approaches. This study proposes simple statistical tools for inclusion into model ensembles, which are fast and straightforward in their application, yet able to yield accurate streamflow predictions in time and space. Target variables for all approaches are annual low flow indices derived from a data set of 51 records of average daily discharge for northwestern Germany. The models require input of climatic data in the form of meteorological drought indices, derived from observed daily climatic variables, averaged over the streamflow gauges' catchment areas. Four different modeling approaches are analyzed. All of them build on multiple linear regression models that estimate low flows as a function of a set of meteorological indices and/or physiographic and climatic catchment descriptors. For the first method, individual regression models are fitted at each station, predicting annual low flow values from a set of annual meteorological indices, which are subsequently regionalized using a set of catchment characteristics. The second method combines temporal and spatial prediction within a single panel data regression model, allowing estimation of annual low flow values from input of both annual meteorological indices and catchment descriptors. The third and fourth methods represent non-stationary low flow frequency analyses and require fitting of regional distribution functions. Method three is subject to a spatiotemporal prediction of an index value, method four to estimation of L-moments that adapt the regional frequency distribution to the at-site conditions. The results show that method two outperforms successive prediction in time and space. Method three also shows a high performance in the near future period, but since it relies on a stationary distribution, its application for prediction of far future changes may be

  13. Thermodynamics and statistical mechanics an integrated approach

    Hardy, Robert J

    2014-01-01

    This textbook brings together the fundamentals of the macroscopic and microscopic aspects of thermal physics by presenting thermodynamics and statistical mechanics as complementary theories based on small numbers of postulates. The book is designed to give the instructor flexibility in structuring courses for advanced undergraduates and/or beginning graduate students and is written on the principle that a good text should also be a good reference. The presentation of thermodynamics follows the logic of Clausius and Kelvin while relating the concepts involved to familiar phenomena and the mod

  14. Using machine learning, neural networks and statistics to predict bankruptcy

    Pompe, P.P.M.; Feelders, A.J.; Feelders, A.J.

    1997-01-01

    Recent literature strongly suggests that machine learning approaches to classification outperform "classical" statistical methods. We make a comparison between the performance of linear discriminant analysis, classification trees, and neural networks in predicting corporate bankruptcy. Linear

  15. Wind speed prediction using statistical regression and neural network

    Prediction of wind speed in the atmospheric boundary layer is important for wind energy assessment, satellite launching and aviation, etc. There are a few techniques available for wind speed prediction, which require a minimum number of input parameters. Four different statistical techniques, viz., curve fitting, Auto Regressive ...

  16. Fatigue crack initiation and growth life prediction with statistical consideration

    Kwon, J.D.; Choi, S.H.; Kwak, S.G.; Chun, K.O.

    1991-01-01

    Life prediction, or residual life prediction, of structures and machines is one of the most widely needed capabilities worldwide, a requirement that becomes pressing in the stage of slow economic development that follows a period of rapid growth. For the purpose of statistical life prediction, fatigue tests were conducted at 3 stress levels, with 20 specimens used at each stress level. From the experimental results, the statistical properties of the crack growth parameters m and C in the fatigue crack growth law da/dN = C(ΔK)^m, the relationship between m and C, and the statistical distribution patterns of the fatigue crack initiation, growth and fracture lives were obtained.
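The crack growth law in this record, da/dN = C(ΔK)^m, is linear in log-log coordinates, so m and C for a single specimen can be estimated by least squares; repeating the fit over the 20 specimens per stress level yields the statistical distributions of m and C the authors study. A sketch with invented data for one specimen:

```python
import numpy as np

def fit_paris_law(delta_K, dadN):
    """Estimate C and m in da/dN = C * (dK)**m by log-log least squares."""
    m, logC = np.polyfit(np.log(delta_K), np.log(dadN), 1)
    return np.exp(logC), m

# Synthetic crack-growth measurements for one specimen (illustrative values)
dK = np.array([10.0, 15.0, 20.0, 30.0, 40.0])   # stress intensity range
dadN = 1e-11 * dK ** 3.0                        # crack growth rate per cycle
C, m = fit_paris_law(dK, dadN)
```

Fitting each specimen separately and then examining the scatter of (m, C) pairs is one simple way to recover the m-C correlation the abstract mentions.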

  17. Statistical Methodologies to Integrate Experimental and Computational Research

    Parker, P. A.; Johnson, R. T.; Montgomery, D. C.

    2008-01-01

    Development of advanced algorithms for simulating engine flow paths requires the integration of fundamental experiments with the validation of enhanced mathematical models. In this paper, we provide an overview of statistical methods to strategically and efficiently conduct experiments and computational model refinement. Moreover, the integration of experimental and computational research efforts is emphasized. With a statistical engineering perspective, scientific and engineering expertise is combined with statistical sciences to gain deeper insights into experimental phenomena and code development performance, supporting the overall research objectives. The particular statistical methods discussed are design of experiments, response surface methodology, and uncertainty analysis and planning. Their application is illustrated with a coaxial free jet experiment and a turbulence model refinement investigation. Our goal is to provide an overview, focusing on concepts rather than practice, to demonstrate the benefits of using statistical methods in research and development, thereby encouraging their broader and more systematic application.

  18. Analysis and Evaluation of Statistical Models for Integrated Circuits Design

    Sáenz-Noval J.J.

    2011-10-01

    Statistical models for integrated circuits (IC) allow us to estimate the percentage of acceptable devices in a batch before fabrication. At present, Pelgrom's model is the statistical model most widely accepted in industry; however, it was derived from a micrometer-scale technology, which does not guarantee reliability for nanometric manufacturing processes. This work considers three of the most relevant statistical models in the industry and evaluates their limitations and advantages in analog design, so that the designer has a better criterion for making a choice. Moreover, it shows how several statistical models can be used for each of the design stages and purposes.

  19. Learning predictive statistics from temporal sequences: Dynamics and strategies.

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-10-01

    Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics; that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.

  20. A neighborhood statistics model for predicting stream pathogen indicator levels.

    Pandey, Pramod K; Pasternack, Gregory B; Majumder, Mahbubul; Soupir, Michelle L; Kaiser, Mark S

    2015-03-01

    Because elevated levels of water-borne Escherichia coli in streams are a leading cause of water quality impairments in the U.S., water-quality managers need tools for predicting aqueous E. coli levels. Presently, E. coli levels may be predicted using complex mechanistic models that have a high degree of unchecked uncertainty or simpler statistical models. To assess spatio-temporal patterns of instream E. coli levels, herein we measured E. coli, a pathogen indicator, at 16 sites (at four different times) within the Squaw Creek watershed, Iowa, and subsequently, the Markov Random Field model was exploited to develop a neighborhood statistics model for predicting instream E. coli levels. Two observed covariates, local water temperature (degrees Celsius) and mean cross-sectional depth (meters), were used as inputs to the model. Predictions of E. coli levels in the water column were compared with independent observational data collected from 16 in-stream locations. The results revealed that spatio-temporal averages of predicted and observed E. coli levels were extremely close. Approximately 66 % of individual predicted E. coli concentrations were within a factor of 2 of the observed values. In only one event, the difference between prediction and observation was beyond one order of magnitude. The mean of all predicted values at 16 locations was approximately 1 % higher than the mean of the observed values. The approach presented here will be useful while assessing instream contaminations such as pathogen/pathogen indicator levels at the watershed scale.

  1. Risk prediction model: Statistical and artificial neural network approach

    Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim

    2017-04-01

    Prediction models are increasingly gaining popularity and have been used in numerous areas of study to complement and support clinical reasoning and decision making. The adoption of such models assists physicians' decision making and individuals' behavior, and consequently improves individual outcomes and the cost-effectiveness of care. The objective of this paper is to review articles related to risk prediction models in order to understand the suitable approaches to, and the development and validation processes of, such models. A qualitative review was done of the aims, methods and significant main outcomes of nineteen published articles that developed risk prediction models in numerous fields. This paper also reviews how researchers develop and validate risk prediction models based on statistical and artificial neural network approaches. From the review, some methodological recommendations for developing and validating prediction models are highlighted. According to the studies reviewed, artificial neural network approaches to developing prediction models were more accurate than statistical approaches. However, only limited published literature currently discusses which approach is more accurate for risk prediction model development.

  2. Advanced data analysis in neuroscience integrating statistical and computational models

    Durstewitz, Daniel

    2017-01-01

    This book is intended for use in advanced graduate courses in statistics / machine learning, as well as for all experimental neuroscientists seeking to understand statistical methods at a deeper level, and theoretical neuroscientists with a limited background in statistics. It reviews almost all areas of applied statistics, from basic statistical estimation and test theory, linear and nonlinear approaches for regression and classification, to model selection and methods for dimensionality reduction, density estimation and unsupervised clustering. Its focus, however, is linear and nonlinear time series analysis from a dynamical systems perspective, based on which it aims to convey an understanding also of the dynamical mechanisms that could have generated observed time series. Further, it integrates computational modeling of behavioral and neural dynamics with statistical estimation and hypothesis testing. This way computational models in neuroscience are not only explanatory frameworks, but become powerfu...

  3. Nonparametric Bayesian predictive distributions for future order statistics

    Richard A. Johnson; James W. Evans; David W. Green

    1999-01-01

    We derive the predictive distribution for a specified order statistic, determined from a future random sample, under a Dirichlet process prior. Two variants of the approach are treated and some limiting cases studied. A practical application to monitoring the strength of lumber is discussed including choices of prior expectation and comparisons made to a Bayesian...

  4. Model output statistics applied to wind power prediction

    Joensen, A; Giebel, G; Landberg, L [Risoe National Lab., Roskilde (Denmark); Madsen, H; Nielsen, H A [The Technical Univ. of Denmark, Dept. of Mathematical Modelling, Lyngby (Denmark)

    1999-03-01

    Being able to predict the output of a wind farm online for a day or two in advance has significant advantages for utilities, such as a better ability to schedule fossil fuelled power plants and a better position on electricity spot markets. In this paper prediction methods based on Numerical Weather Prediction (NWP) models are considered. The spatial resolution used in NWP models implies that these predictions are not valid locally at a specific wind farm. Furthermore, due to the non-stationary nature and complexity of the processes in the atmosphere, and occasional changes of NWP models, the deviation between the predicted and the measured wind will be time dependent. If observational data is available, and if the deviation between the predictions and the observations exhibits systematic behavior, this should be corrected for; if statistical methods are used, this approach is usually referred to as MOS (Model Output Statistics). The influence of atmospheric turbulence intensity, topography, prediction horizon length and auto-correlation of wind speed and power is considered, and to take the time-variations into account, adaptive estimation methods are applied. Three estimation techniques are considered and compared: Extended Kalman Filtering, recursive least squares and a new modified recursive least squares algorithm. (au) EU-JOULE-3. 11 refs.
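Of the three adaptive estimators compared in this record, recursive least squares with a forgetting factor is the most compact to sketch. The model below (a time-varying linear MOS correction, power = a + b * forecast, with invented data) is an illustration of the technique, not the paper's actual configuration:

```python
import numpy as np

def rls_update(theta, P, x, y, lam=0.99):
    """One step of recursive least squares with forgetting factor lam."""
    Px = P @ x
    k = Px / (lam + x @ Px)              # gain vector
    theta = theta + k * (y - x @ theta)  # correct parameters with new error
    P = (P - np.outer(k, Px)) / lam      # update inverse-covariance matrix
    return theta, P

# Illustrative MOS-style correction tracked online from noisy observations
rng = np.random.default_rng(0)
theta, P = np.zeros(2), 1e3 * np.eye(2)
true = np.array([5.0, 2.0])              # hidden intercept and slope
for _ in range(500):
    f = rng.uniform(0.0, 10.0)           # NWP forecast value
    x = np.array([1.0, f])
    y = true @ x + rng.normal(0.0, 0.1)  # observed power with noise
    theta, P = rls_update(theta, P, x, y)
```

The forgetting factor lam < 1 discounts old data geometrically, which is what lets the correction follow the time-dependent forecast bias the abstract describes.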

  5. IBM Watson Analytics: Automating Visualization, Descriptive, and Predictive Statistics.

    Hoyt, Robert Eugene; Snider, Dallas; Thompson, Carla; Mantravadi, Sarita

    2016-10-11

    We live in an era of explosive data generation that will continue to grow and involve all industries. One of the results of this explosion is the need for newer and more efficient data analytics procedures. Traditionally, data analytics required a substantial background in statistics and computer science. In 2015, International Business Machines Corporation (IBM) released the IBM Watson Analytics (IBMWA) software that delivered advanced statistical procedures based on the Statistical Package for the Social Sciences (SPSS). The latest entry of Watson Analytics into the field of analytical software products provides users with enhanced functions that are not available in many existing programs. For example, Watson Analytics automatically analyzes datasets, examines data quality, and determines the optimal statistical approach. Users can request exploratory, predictive, and visual analytics. Using natural language processing (NLP), users are able to submit additional questions for analyses in a quick response format. This analytical package is available free to academic institutions (faculty and students) that plan to use the tools for noncommercial purposes. To report the features of IBMWA and discuss how this software subjectively and objectively compares to other data mining programs. The salient features of the IBMWA program were examined and compared with other common analytical platforms, using validated health datasets. Using a validated dataset, IBMWA delivered similar predictions compared with several commercial and open source data mining software applications. The visual analytics generated by IBMWA were similar to results from programs such as Microsoft Excel and Tableau Software. In addition, assistance with data preprocessing and data exploration was an inherent component of the IBMWA application. Sensitivity and specificity were not included in the IBMWA predictive analytics results, nor were odds ratios, confidence intervals, or a confusion matrix

  6. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure statistical power of identifying these variants with small effects. However, it is often the case that a research group can only get approval for the access to individual-level genotype data with a limited sample size (e.g. a few hundreds or thousands). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available. The sample sizes associated with the summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increasing statistical power of identifying risk variants and improving accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over the methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform integrative analysis of Crohn's Disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240 000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS . zbxu@xjtu.edu.cn or xwan@comp.hkbu.edu.hk or eeyang@hkbu.edu.hk. Supplementary data are available at Bioinformatics online.

  7. Spatial statistics for predicting flow through a rock fracture

    Coakley, K.J.

    1989-03-01

    Fluid flow through a single rock fracture depends on the shape of the space between the upper and lower pieces of rock which define the fracture. In this thesis, the normalized flow through a fracture, i.e. the equivalent permeability of a fracture, is predicted in terms of spatial statistics computed from the arrangement of voids, i.e. open spaces, and contact areas within the fracture. Patterns of voids and contact areas, with complexity typical of experimental data, are simulated by clipping a correlated Gaussian process defined on an N by N pixel square region. The voids have constant aperture; the distance between the upper and lower surfaces which define the fracture is either zero or a constant. Local flow is assumed to be proportional to local aperture cubed times local pressure gradient. The flow through a pattern of voids and contact areas is solved using a finite-difference method. After solving for the flow through simulated 10 by 10 by 30 pixel patterns of voids and contact areas, a model to predict equivalent permeability is developed. The first model is for patterns with 80% voids where all voids have the same aperture. The equivalent permeability of a pattern is predicted in terms of spatial statistics computed from the arrangement of voids and contact areas within the pattern. Four spatial statistics are examined. The change point statistic measures how often adjacent pixels alternate from void to contact area (or vice versa) in the rows of the patterns which are parallel to the overall flow direction. 37 refs., 66 figs., 41 tabs
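The simulation step described here, clipping a correlated Gaussian process to produce a void/contact pattern with a prescribed void fraction, can be sketched as follows. The FFT-based smoothing kernel, grid size, and correlation length are assumptions for illustration, not the thesis's exact procedure:

```python
import numpy as np

def clipped_gaussian_pattern(n, corr_len, void_frac, seed=0):
    """Simulate a void/contact-area pattern by clipping a correlated Gaussian field.

    White noise is smoothed with a Gaussian transfer function in Fourier space,
    then thresholded so that a fraction `void_frac` of pixels are voids (True).
    """
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((n, n))
    kx = np.fft.fftfreq(n)[:, None]
    ky = np.fft.fftfreq(n)[None, :]
    kernel = np.exp(-2 * (np.pi * corr_len) ** 2 * (kx ** 2 + ky ** 2))
    field = np.fft.ifft2(np.fft.fft2(white) * kernel).real
    threshold = np.quantile(field, 1.0 - void_frac)
    return field > threshold           # True = void, False = contact area

pattern = clipped_gaussian_pattern(64, corr_len=4.0, void_frac=0.8)
```

Choosing the threshold as the (1 - void_frac) quantile of the field is what fixes the void fraction at the target 80% used in the thesis's first model.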

  8. Review of Statistical Learning Methods in Integrated Omics Studies (An Integrated Information Science).

    Zeng, Irene Sui Lan; Lumley, Thomas

    2018-01-01

    Integrated omics is becoming a new channel for investigating the complex molecular system in modern biological science and sets a foundation for systematic learning for precision medicine. The statistical/machine learning methods that have emerged in the past decade for integrated omics are not only innovative but also multidisciplinary with integrated knowledge in biology, medicine, statistics, machine learning, and artificial intelligence. Here, we review the nontrivial classes of learning methods from the statistical aspects and streamline these learning methods within the statistical learning framework. The intriguing findings from the review are that the methods used are generalizable to other disciplines with complex systematic structure, and the integrated omics is part of an integrated information science which has collated and integrated different types of information for inferences and decision making. We review the statistical learning methods of exploratory and supervised learning from 42 publications. We also discuss the strengths and limitations of the extended principal component analysis, cluster analysis, network analysis, and regression methods. Statistical techniques such as penalization for sparsity induction when there are fewer observations than the number of features and using a Bayesian approach when there is prior knowledge to be integrated are also included in the commentary. For the completeness of the review, a table of currently available software and packages from 23 publications for omics is summarized in the appendix.

  9. Statistical models for expert judgement and wear prediction

    Pulkkinen, U.

    1994-01-01

    This thesis studies the statistical analysis of expert judgements and prediction of wear. The point of view adopted is the one of information theory and Bayesian statistics. A general Bayesian framework for analyzing both the expert judgements and wear prediction is presented. Information theoretic interpretations are given for some averaging techniques used in the determination of consensus distributions. Further, information theoretic models are compared with a Bayesian model. The general Bayesian framework is then applied in analyzing expert judgements based on ordinal comparisons. In this context, the value of information lost in the ordinal comparison process is analyzed by applying decision theoretic concepts. As a generalization of the Bayesian framework, stochastic filtering models for wear prediction are formulated. These models utilize the information from condition monitoring measurements in updating the residual life distribution of mechanical components. Finally, the application of stochastic control models in optimizing operational strategies for inspected components is studied. Monte-Carlo simulation methods, such as the Gibbs sampler and the stochastic quasi-gradient method, are applied in the determination of posterior distributions and in the solution of stochastic optimization problems. (orig.) (57 refs., 7 figs., 1 tab.)

  10. Predicting statistical properties of open reading frames in bacterial genomes.

    Katharina Mir

    An analytical model was developed based on the statistical properties of the Open Reading Frames (ORFs) of eubacterial genomes, such as codon composition and the sequence length of all reading frames. This new model predicts the average length, the maximum length, and the length distribution of the ORFs of 70 species with GC contents varying between 21% and 74%. Furthermore, the number of annotated genes is predicted in close accordance with the actual annotation. However, the ORF length distribution in the five alternative reading frames shows interesting deviations from the predicted distribution. In particular, long ORFs appear more often than expected statistically. The unexpected depletion of stop codons in these alternative open reading frames cannot be completely explained by a biased codon usage in the +1 frame. While it is unknown whether the stop codon depletion has a biological function, it could be due to a protein-coding capacity of alternative ORFs exerting a selection pressure which prevents the fixation of stop codon mutations. The comparison of the analytical model with bacterial genomes therefore leads to a hypothesis suggesting novel gene candidates which can now be investigated in subsequent wet lab experiments.
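The core quantity behind such an analytical model can be illustrated with a toy calculation: if each codon is independently a stop codon with probability p, ORF lengths are geometrically distributed with mean 1/p codons. The uniform codon usage assumed below (3 stop codons out of 64) is a simplification, not the codon-composition model fitted in the paper:

```python
def orf_length_stats(p_stop):
    """Mean ORF length (codons) and survival P(length >= k) when each codon
    is independently a stop with probability p_stop (geometric model)."""
    mean_len = 1.0 / p_stop
    survival = lambda k: (1.0 - p_stop) ** k
    return mean_len, survival

# Uniform codon usage: 3 stop codons among the 64 possible codons
mean_len, surv = orf_length_stats(3 / 64)
print(round(mean_len, 1))          # → 21.3
p100 = surv(100)                   # probability of an ORF of >= 100 codons
```

GC-content bias shifts p_stop away from 3/64 (all three stop codons are AT-rich), which is how the full model captures the 21%-74% GC range across the 70 species.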

  11. Optimizing Groundwater Monitoring Networks Using Integrated Statistical and Geostatistical Approaches

    Jay Krishna Thakur

    2015-08-01

    The aim of this work is to investigate new approaches using methods based on statistics and geo-statistics for the spatio-temporal optimization of groundwater monitoring networks. The formulated and integrated methods were tested with the groundwater quality data set of Bitterfeld/Wolfen, Germany. Spatially, the monitoring network was optimized using geo-statistical methods. Temporal optimization of the monitoring network was carried out using Sen's method (1968). For geostatistical network optimization, a geostatistical spatio-temporal algorithm was used to identify redundant wells in 2- and 2.5-D Quaternary and Tertiary aquifers. Influences of interpolation block width, dimension, contaminant association, groundwater flow direction and aquifer homogeneity on statistical and geostatistical methods for monitoring network optimization were analysed. The integrated approach shows 37% and 28% redundancy in the monitoring networks of the Quaternary aquifer and the Tertiary aquifer, respectively. The geostatistical method also recommends 41 and 22 new monitoring wells in the Quaternary and Tertiary aquifers, respectively. In temporal optimization, an overall optimized sampling interval was recommended in terms of the lower quartile (238 days), the median (317 days) and the upper quartile (401 days) in the research area of Bitterfeld/Wolfen. The demonstrated methods for improving a groundwater monitoring network can be used in real monitoring network optimization, with due consideration given to the influencing factors.
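Sen's (1968) method used here for the temporal optimization estimates a trend as the median of all pairwise slopes, which makes it robust to outliers. A minimal stand-alone sketch (the data are illustrative, not from the Bitterfeld/Wolfen set):

```python
from itertools import combinations
import statistics

def sen_slope(t, y):
    """Sen's (1968) slope estimator: the median of all pairwise slopes."""
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i, j in combinations(range(len(t)), 2)]
    return statistics.median(slopes)

t = [0, 1, 2, 3, 4, 5]
y = [0.1, 2.0, 4.1, 5.9, 8.0, 30.0]   # linear trend with one outlier
print(sen_slope(t, y))                 # → 2.0, unaffected by the outlier
```

An ordinary least-squares fit of the same data would be pulled far above 2 by the final point; the median of pairwise slopes is not.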

  12. Probably not future prediction using probability and statistical inference

    Dworsky, Lawrence N

    2008-01-01

    An engaging, entertaining, and informative introduction to probability and prediction in our everyday lives Although Probably Not deals with probability and statistics, it is not heavily mathematical and is not filled with complex derivations, proofs, and theoretical problem sets. This book unveils the world of statistics through questions such as what is known based upon the information at hand and what can be expected to happen. While learning essential concepts including "the confidence factor" and "random walks," readers will be entertained and intrigued as they move from chapter to chapter. Moreover, the author provides a foundation of basic principles to guide decision making in almost all facets of life including playing games, developing winning business strategies, and managing personal finances. Much of the book is organized around easy-to-follow examples that address common, everyday issues such as: How travel time is affected by congestion, driving speed, and traffic lights Why different gambling ...

  13. Prediction of slant path rain attenuation statistics at various locations

    Goldhirsh, J.

    1977-01-01

    The paper describes a method for predicting slant path attenuation statistics at arbitrary locations for variable frequencies and path elevation angles. The method involves the use of median reflectivity factor-height profiles measured with radar as well as the use of long-term point rain rate data and assumed or measured drop size distributions. The attenuation coefficient due to cloud liquid water in the presence of rain is also considered. Absolute probability fade distributions are compared for eight cases: Maryland (15 GHz), Texas (30 GHz), Slough, England (19 and 37 GHz), Fayetteville, North Carolina (13 and 18 GHz), and Cambridge, Massachusetts (13 and 18 GHz).

  14. Direct Breakthrough Curve Prediction From Statistics of Heterogeneous Conductivity Fields

    Hansen, Scott K.; Haslauer, Claus P.; Cirpka, Olaf A.; Vesselinov, Velimir V.

    2018-01-01

    This paper presents a methodology to predict the shape of solute breakthrough curves in heterogeneous aquifers at early times and/or under high degrees of heterogeneity, both cases in which the classical macrodispersion theory may not be applicable. The methodology relies on the observation that breakthrough curves in heterogeneous media are generally well described by lognormal distributions, and mean breakthrough times can be predicted analytically. The log-variance of solute arrival is thus sufficient to completely specify the breakthrough curves, and this is calibrated as a function of aquifer heterogeneity and dimensionless distance from a source plane by means of Monte Carlo analysis and statistical regression. Using the ensemble of simulated groundwater flow and solute transport realizations employed to calibrate the predictive regression, reliability estimates for the prediction are also developed. Additional theoretical contributions include heuristics for the time until an effective macrodispersion coefficient becomes applicable, and also an expression for its magnitude that applies in highly heterogeneous systems. It is seen that the results here represent a way to derive continuous time random walk transition distributions from physical considerations rather than from empirical field calibration.
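The paper's key observation, that a lognormal breakthrough curve is fully specified by the mean breakthrough time together with the log-variance of solute arrival, follows from the lognormal parameterization: with log-variance s2, the log-mean is m = ln(mean) - s2/2. A sketch of the resulting density (the parameter values are illustrative, not regression outputs from the paper):

```python
import math

def lognormal_btc(t, mean_arrival, log_variance):
    """Lognormal breakthrough-curve density at time t, parameterized by the
    mean arrival time and the log-variance of solute arrival."""
    s2 = log_variance
    m = math.log(mean_arrival) - s2 / 2.0    # log-mean recovering the mean
    return (math.exp(-(math.log(t) - m) ** 2 / (2 * s2))
            / (t * math.sqrt(2 * math.pi * s2)))

# Crude trapezoid check: the density integrates to ~1 with the right mean
dt = 0.01
ts = [dt * i for i in range(1, 20000)]
mass = sum(lognormal_btc(t, 10.0, 0.25) * dt for t in ts)
mean = sum(t * lognormal_btc(t, 10.0, 0.25) * dt for t in ts)
```

In the paper's workflow the mean arrival comes from an analytical prediction and the log-variance from the calibrated regression against heterogeneity and dimensionless distance.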

  15. Uncertainty propagation for statistical impact prediction of space debris

    Hoogendoorn, R.; Mooij, E.; Geul, J.

    2018-01-01

    Predictions of the impact time and location of space debris in a decaying trajectory are highly influenced by uncertainties. The traditional Monte Carlo (MC) method can be used to perform accurate statistical impact predictions, but requires a large computational effort. A method is investigated that directly propagates a Probability Density Function (PDF) in time, which has the potential to obtain more accurate results with less computational effort. The decaying trajectory of Delta-K rocket stages was used to test the methods using a six degrees-of-freedom state model. The PDF of the state of the body was propagated in time to obtain impact-time distributions. This Direct PDF Propagation (DPP) method results in a multi-dimensional scattered dataset of the PDF of the state, which is highly challenging to process. No accurate results could be obtained, because of the structure of the DPP data and the high dimensionality. Therefore, the DPP method is less suitable for practical uncontrolled entry problems and the traditional MC method remains superior. Additionally, the MC method was used with two improved uncertainty models to obtain impact-time distributions, which were validated using observations of true impacts. For one of the two uncertainty models, statistically more valid impact-time distributions were obtained than in previous research.
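    The Monte Carlo approach the abstract favors can be sketched with a deliberately toy decay model: a single uncertain ballistic coefficient standing in for the paper's six degrees-of-freedom propagation, with all numbers hypothetical:

```python
import random
import statistics

def impact_time(ballistic_coeff):
    # Toy decay law (hypothetical): impact time inversely proportional to the
    # ballistic coefficient; stands in for full 6-DoF trajectory propagation.
    return 7200.0 / ballistic_coeff

random.seed(42)
# Draw the uncertain parameter, propagate each draw, collect impact times
samples = sorted(impact_time(random.gauss(1.0, 0.1)) for _ in range(10000))
median_t = statistics.median(samples)
p05, p95 = samples[500], samples[9500]  # empirical 5% / 95% impact-time bounds
```

    The empirical quantiles of the sampled impact times are exactly the kind of impact-time distribution the MC method delivers, at the cost of many propagations.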

  16. Statistical characterization of pitting corrosion process and life prediction

    Sheikh, A.K.; Younas, M.

    1995-01-01

    In order to prevent corrosion failures of machines and structures, it is desirable to know in advance when corrosion damage will take place, so that appropriate measures can be taken to mitigate it. Corrosion predictions are needed both at the development and at the operational stage of machines and structures. There are several forms of corrosion through which varying degrees of damage can occur. Under certain conditions these corrosion processes act alone, while under other conditions several of them may occur simultaneously. Certain machine elements and structures, such as gears, bearings, tubes, pipelines, containers and storage tanks, are particularly prone to pitting corrosion, which is an insidious form of corrosion. Corrosion predictions are usually based on experimental results obtained from test coupons and/or field experience with similar machines or parts of a structure. Considerable scatter is observed in corrosion processes. The probabilistic nature and kinetics of the pitting process make it necessary to use statistical methods to forecast the residual life of machines or structures. The focus of this paper is to characterize pitting as a time-dependent random process and, using this characterization, to predict the life remaining until a critical level of pitting damage is reached. Using several data sets from the literature on pitting corrosion, the extreme value modelling of the pitting process, the evolution of the extreme value distribution in time, and their relationship to the reliability of machines and structures are explained. (author)
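    The extreme value characterization can be illustrated with a method-of-moments Gumbel fit to maximum pit depths; the `max_depths` values and the 1.5 mm critical depth below are invented for illustration:

```python
import math
import statistics

# Hypothetical maximum pit depths (mm) from periodic coupon inspections
max_depths = [0.81, 0.95, 1.02, 0.88, 1.10, 0.99, 1.25, 0.92, 1.05, 0.97]

# Method-of-moments fit of a Gumbel (Type I extreme value) distribution:
# mean = loc + 0.5772 * scale,  variance = (pi^2 / 6) * scale^2
m = statistics.mean(max_depths)
s = statistics.stdev(max_depths)
scale = s * math.sqrt(6.0) / math.pi
loc = m - 0.5772 * scale

def prob_exceed(depth):
    """P(maximum pit depth > depth) under the fitted Gumbel model."""
    return 1.0 - math.exp(-math.exp(-(depth - loc) / scale))

p_crit = prob_exceed(1.5)  # chance the deepest pit exceeds a 1.5 mm critical depth
```

    Tracking how the fitted location and scale grow with exposure time is what links the extreme value model to residual-life prediction.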

  17. Integrating Expert Knowledge with Statistical Analysis for Landslide Susceptibility Assessment at Regional Scale

    Christos Chalkias

    2016-03-01

    In this paper, an integrated landslide susceptibility model combining expert-based and bivariate statistical analysis (Landslide Susceptibility Index, LSI) approaches is presented. Factors related to the occurrence of landslides, such as elevation, slope angle, slope aspect, lithology, land cover, Mean Annual Precipitation (MAP) and Peak Ground Acceleration (PGA), were analyzed within a GIS environment. This integrated model produced a landslide susceptibility map which categorized the study area according to the probability level of landslide occurrence. The accuracy of the final map was evaluated by Receiver Operating Characteristic (ROC) analysis based on an independent (validation) dataset of landslide events. The prediction ability was found to be 76%, revealing that the integration of statistical analysis with human expertise can provide an acceptable landslide susceptibility assessment at regional scale.
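    The ROC validation step reduces to ranking susceptibility scores at known landslide and stable sites; the scores below are hypothetical, and AUC is computed as the normalized Mann-Whitney U statistic:

```python
def roc_auc(scores_pos, scores_neg):
    """AUC = P(score at a landslide site > score at a stable site),
    i.e. the normalized Mann-Whitney U statistic (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical LSI scores at validated landslide (pos) and stable (neg) sites
pos = [0.82, 0.74, 0.66, 0.91, 0.58]
neg = [0.35, 0.41, 0.62, 0.28, 0.50, 0.71]
auc = roc_auc(pos, neg)
# auc -> 0.9
```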

  18. Statistical prediction of nanoparticle delivery: from culture media to cell

    Rowan Brown, M.; Hondow, Nicole; Brydson, Rik; Rees, Paul; Brown, Andrew P.; Summers, Huw D.

    2015-04-01

    The application of nanoparticles (NPs) within medicine is of great interest; their innate physicochemical characteristics provide the potential to enhance current technology, diagnostics and therapeutics. Recently a number of NP-based diagnostic and therapeutic agents have been developed for the treatment of various diseases, where judicious surface functionalization is exploited to increase the efficacy of the administered therapeutic dose. However, quantifying the heterogeneity associated with the absolute dose of a nanotherapeutic (NP number), and how this dose is trafficked across biological barriers, has proven difficult to achieve. The main issue is the quantitative assessment of NP number at the spatial scale of the individual NP, data which are essential for the continued growth and development of the next generation of nanotherapeutics. Recent advances in sample preparation and the imaging fidelity of transmission electron microscopy (TEM) platforms provide information at the required spatial scale, where individual NPs can be identified. High spatial resolution, however, reduces the sample frequency, and as a result dynamic biological features or processes become opaque. The combination of TEM data with appropriate probabilistic models provides a means to extract biophysical information that imaging alone cannot. Previously, we demonstrated that limited cell sampling via TEM can be statistically coupled to large-population flow cytometry measurements to quantify exact NP dose. Here we extend this concept to link TEM measurements of NP agglomerates in cell culture media to those encapsulated within vesicles in human osteosarcoma cells. By construction and validation of a data-driven transfer function, we are able to investigate the dynamic properties of NP agglomeration through endocytosis. In particular, we statistically predict how NP agglomerates may traverse a biological barrier, detailing inter-agglomerate merging events providing the basis for

  20. Estimating Predictive Variance for Statistical Gas Distribution Modelling

    Lilienthal, Achim J.; Asadi, Sahar; Reggente, Matteo

    2009-01-01

    Recent publications in statistical gas distribution modelling have proposed algorithms that model both the mean and the variance of a distribution. This paper argues that estimating the predictive concentration variance is not merely a gradual improvement but a significant step forward for the field. First, such models fit much better the particular structure of gas distributions, which exhibit strong fluctuations with considerable spatial variations as a result of the intermittent character of gas dispersal. Second, estimating the predictive variance allows the model quality to be evaluated in terms of the data likelihood. This offers a solution to the problem of ground-truth evaluation, which has always been a critical issue for gas distribution modelling. It also enables solid comparisons of different modelling approaches, and provides the means to learn meta-parameters of the model, to determine when the model should be updated or re-initialised, and to suggest new measurement locations based on the current model. We also point out directions of related ongoing or potential future research work.
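    Evaluating model quality "in terms of the data likelihood" can be sketched as the average Gaussian negative log-likelihood of held-out measurements under each model's predicted mean and variance; all readings and predictions below are hypothetical:

```python
import math

def gaussian_nll(y_true, mean_pred, var_pred):
    """Average negative log-likelihood of held-out concentration readings
    under a Gaussian predictive distribution (mean and variance per point);
    lower is better."""
    total = 0.0
    for y, m, v in zip(y_true, mean_pred, var_pred):
        total += 0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
    return total / len(y_true)

# Hypothetical held-out readings and two competing models' predictions
y = [0.2, 0.5, 1.3, 0.4]
model_a = gaussian_nll(y, [0.25, 0.45, 1.10, 0.50], [0.05, 0.05, 0.20, 0.05])
model_b = gaussian_nll(y, [0.50, 0.50, 0.50, 0.50], [0.50, 0.50, 0.50, 0.50])
```

    A model that predicts only a mean cannot be scored this way at all, which is the paper's point: the variance term is what makes likelihood-based comparison possible.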

  1. Predicting Smoking Status Using Machine Learning Algorithms and Statistical Analysis

    Charles Frank

    2018-03-01

    Smoking has been proven to negatively affect health in a multitude of ways. As of 2009, smoking has been considered the leading cause of preventable morbidity and mortality in the United States, continuing to plague the country's overall health. This study aims to investigate the viability and effectiveness of some machine learning algorithms for predicting the smoking status of patients based on their blood tests and vital readings. The analysis of this study is divided into two parts. In part 1, we use one-way ANOVA analysis with the SAS tool to show the statistically significant difference in blood test readings between smokers and non-smokers. The results show that the difference in INR, which measures the effectiveness of anticoagulants, was significant in favor of non-smokers, which further confirms the health risks associated with smoking. In part 2, we use five machine learning algorithms: Naïve Bayes, MLP, Logistic regression classifier, J48 and Decision Table to predict the smoking status of patients. To compare the effectiveness of these algorithms we use Precision, Recall, F-measure and Accuracy. The results show that the Logistic algorithm outperformed the other four algorithms with Precision, Recall, F-Measure, and Accuracy of 83%, 83.4%, 83.2%, and 83.44%, respectively.
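    The four comparison measures reduce to simple confusion-matrix arithmetic; the counts below are hypothetical, chosen only so the scores land near the reported range:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure and accuracy from a 2x2 confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy

# Hypothetical confusion matrix for a smoker / non-smoker classifier
p, r, f, a = classification_metrics(tp=83, fp=17, fn=17, tn=83)
```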

  2. Statistics and predictions of population, energy and environment problems

    Sobajima, Makoto

    1999-03-01

    With the world's population, especially in developing countries, growing rapidly, humankind faces global problems: people cannot live securely for centuries to come unless they find places to live, obtain food, and peacefully secure the energy necessary for living. To this end, humankind must consider what behavior to adopt within a finite environment, and must discuss, agree and act. Although energy has long been regarded as a symbol of improved living, and has been demanded and used accordingly, its use has come to be limited because it places a growing burden on the global environment. Key questions follow: whether there is a sufficient energy source that does not impose a cost on the environment; whether nuclear energy, regarded as such a source, can sustain its resource base for the long term and remain competitive in the market; what the prospects are for compensating new energy sources in the case that the use of nuclear energy is restricted by a society fearful of radioactivity; and whether promising options exist for the future. Anyone engaged in the study of energy cannot proceed without knowing these issues. The statistical materials compiled here are intended to be useful for that purpose, and are collected mainly from sources offering future predictions based on past practice. Since prediction studies are so important for planning future measures, these databases are expected to be improved for better accuracy. (author)

  3. Prediction of dimethyl disulfide levels from biosolids using statistical modeling.

    Gabriel, Steven A; Vilalai, Sirapong; Arispe, Susanna; Kim, Hyunook; McConnell, Laura L; Torrents, Alba; Peot, Christopher; Ramirez, Mark

    2005-01-01

    Two statistical models were used to predict the concentration of dimethyl disulfide (DMDS) released from biosolids produced by an advanced wastewater treatment plant (WWTP) located in Washington, DC, USA. The plant concentrates sludge from primary sedimentation basins in gravity thickeners (GT) and sludge from secondary sedimentation basins in dissolved air flotation (DAF) thickeners. The thickened sludge is pumped into blending tanks and then fed into centrifuges for dewatering. The dewatered sludge is then conditioned with lime before being trucked out of the plant. DMDS, along with other volatile sulfur- and nitrogen-containing chemicals, is known to contribute to biosolids odors. These models identified the oxidation/reduction potential (ORP) values of a GT and DAF, the amount of sludge dewatered by centrifuges, and the blend ratio between GT-thickened sludge and DAF-thickened sludge in blending tanks as control variables. The accuracy of the developed regression models was evaluated by checking the adjusted R2 of the regression as well as the signs of the coefficients associated with each variable. In general, both models explained observed DMDS levels in sludge headspace samples. The adjusted R2 values of regression models 1 and 2 were 0.79 and 0.77, respectively. The coefficients of each regression model also had the correct sign. Using the developed models, plant operators can adjust the controllable variables to proactively decrease this odorant. These models are therefore a useful tool in biosolids management at WWTPs.
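    The adjusted R2 used to evaluate the regressions is straightforward to compute; the observed and fitted values below are hypothetical, not the plant's data:

```python
def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    penalizing R^2 for the number of predictors k."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Hypothetical observed vs. fitted DMDS headspace levels, 2 predictors
obs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fit = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
adj = adjusted_r2(obs, fit, n_predictors=2)
```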

  4. Fast Quantum Algorithm for Predicting Descriptive Statistics of Stochastic Processes

    Williams Colin P.

    1999-01-01

    Stochastic processes are used as a modeling tool in several sub-fields of physics, biology, and finance. Analytic understanding of the long-term behavior of such processes is only tractable for very simple types of stochastic processes, such as Markovian processes. However, in real-world applications more complex stochastic processes often arise. In physics, the complicating factor might be nonlinearities; in biology it might be memory effects; and in finance it might be the non-random, intentional behavior of participants in a market. In the absence of analytic insight, one is forced to understand these more complex stochastic processes via numerical simulation techniques. In this paper we present a quantum algorithm for performing such simulations. In particular, we show how a quantum algorithm can predict arbitrary descriptive statistics (moments) of N-step stochastic processes in just O(√N) time. That is, the quantum complexity is the square root of the classical complexity for performing such simulations. This is a significant speedup in comparison to the current state of the art.

  5. Monthly to seasonal low flow prediction: statistical versus dynamical models

    Ionita-Scholz, Monica; Klein, Bastian; Meissner, Dennis; Rademacher, Silke

    2016-04-01

    the Alfred Wegener Institute a purely statistical scheme to generate streamflow forecasts for several months ahead. Instead of directly using teleconnection indices (e.g. NAO, AO), the idea is to identify regions with stable teleconnections between different global climate information (e.g. sea surface temperature, geopotential height, etc.) and streamflow at different gauges relevant for inland waterway transport. So-called stability (correlation) maps are generated, showing regions where streamflow and a climate variable from previous months are significantly correlated within a 21 (31)-year moving window. Finally, the optimal forecast model is established based on a multiple regression analysis of the stable predictors. We will present current results of the aforementioned approaches with focus on the River Rhine (one of the world's most frequented waterways and the backbone of the European inland waterway network) and the Elbe River. Overall, our analysis reveals the existence of valuable predictability of low flows at monthly and seasonal time scales, a result that may be useful for water resources management. Given that all predictors used in the models are available at the end of each month, the forecast scheme can be used operationally to predict extreme events and to provide early warnings of upcoming low flows.
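    The stability-map idea, correlating a climate variable with streamflow in a sliding multi-year window, can be sketched as follows; the series are hypothetical, and a real analysis would test statistical significance rather than apply a fixed threshold:

```python
import math

def moving_window_corr(x, y, window=21):
    """Pearson correlation of two series in sliding windows; a grid cell is a
    'stable' predictor if every window clears a chosen threshold."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
        va = sum((ai - ma) ** 2 for ai in a)
        vb = sum((bi - mb) ** 2 for bi in b)
        return cov / math.sqrt(va * vb)
    return [corr(x[i:i + window], y[i:i + window])
            for i in range(len(x) - window + 1)]

# Hypothetical 40-year series; a perfect linear relation gives r = 1 everywhere
climate = list(range(40))
flow = [2 * v + 5 for v in climate]
rs = moving_window_corr(climate, flow)
stable = all(r > 0.7 for r in rs)  # e.g. a fixed stability threshold
```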

  6. An integrated model of statistical process control and maintenance based on the delayed monitoring

    Yin, Hui; Zhang, Guojun; Zhu, Haiping; Deng, Yuhao; He, Fei

    2015-01-01

    This paper develops an integrated model of statistical process control and maintenance decisions. The proposed delayed monitoring policy postpones the sampling process until a scheduled time and gives rise to ten scenarios of the production process, in which equipment failure may occur in addition to a quality shift. An equipment failure triggers corrective maintenance, while a control chart alert triggers predictive maintenance. The occurrence probability, cycle time and cycle cost of each scenario are obtained by integral calculation; a mathematical model is then established to minimize the expected cost using a genetic algorithm. A Monte Carlo simulation experiment is conducted and compared with the integral calculation in order to verify the analysis of the ten-scenario model. An ordinary integrated model without delayed monitoring is also established for comparison. The results of a numerical example indicate satisfactory economic performance of the proposed model. Finally, a sensitivity analysis is performed to investigate the effect of the model parameters. - Highlights: • We develop an integrated model of statistical process control and maintenance. • We propose a delayed monitoring policy and derive an economic model with ten scenarios. • We consider two deterioration mechanisms: quality shift and equipment failure. • The delayed monitoring policy helps reduce the expected cost
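    The cycle-based economics has a renewal-reward skeleton: expected cost per unit time is expected cycle cost over expected cycle length. A sketch with three hypothetical scenarios standing in for the paper's ten (the genetic-algorithm optimization is omitted):

```python
def expected_cost_rate(scenarios):
    """Renewal-reward expected cost per unit time over exhaustive scenarios,
    each given as (probability, cycle_cost, cycle_time)."""
    e_cost = sum(p * c for p, c, t in scenarios)
    e_time = sum(p * t for p, c, t in scenarios)
    return e_cost / e_time

# Three hypothetical scenarios (probabilities sum to 1): in-control cycle,
# chart alert -> predictive maintenance, equipment failure -> corrective
scenarios = [(0.5, 50.0, 12.0), (0.3, 100.0, 10.0), (0.2, 300.0, 8.0)]
rate = expected_cost_rate(scenarios)
```

    In the paper this rate is a function of the sampling delay and chart limits, and the optimizer searches those parameters to minimize it.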

  7. Integrating Statistical Machine Learning in a Semantic Sensor Web for Proactive Monitoring and Control.

    Adeleke, Jude Adekunle; Moodley, Deshendran; Rens, Gavin; Adewumi, Aderemi Oluyinka

    2017-04-09

    Proactive monitoring and control of our natural and built environments is important in various application scenarios. Semantic Sensor Web technologies have been well researched and used for environmental monitoring applications to expose sensor data for analysis in order to provide responsive actions in situations of interest. While these applications provide quick response to situations, to minimize their unwanted effects, research efforts are still necessary to provide techniques that can anticipate the future to support proactive control, such that unwanted situations can be averted altogether. This study integrates a statistical machine learning based predictive model in a Semantic Sensor Web using stream reasoning. The approach is evaluated in an indoor air quality monitoring case study. A sliding window approach that employs the Multilayer Perceptron model to predict short-term PM2.5 pollution situations is integrated into the proactive monitoring and control framework. Results show that the proposed approach can effectively predict short-term PM2.5 pollution situations: precision of up to 0.86 and sensitivity of up to 0.85 is achieved over half-hour prediction horizons, making it possible for the system to warn occupants or even to autonomously avert the predicted pollution situations within the context of the Semantic Sensor Web.
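    The sliding-window formulation feeding the Multilayer Perceptron can be sketched as plain window/target extraction; the readings below are hypothetical:

```python
def sliding_windows(series, width, horizon):
    """(features, target) pairs: `width` past readings predict the reading
    `horizon` steps ahead -- the input shape fed to an MLP model."""
    return [(series[i:i + width], series[i + width + horizon - 1])
            for i in range(len(series) - width - horizon + 1)]

# Hypothetical half-hourly PM2.5 readings (ug/m3)
pm25 = [12.0, 14.5, 18.2, 25.1, 31.0, 28.4, 22.7, 19.9]
pairs = sliding_windows(pm25, width=3, horizon=1)
# pairs[0] -> ([12.0, 14.5, 18.2], 25.1)
```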

  9. Kalman-predictive-proportional-integral-derivative (KPPID)

    Fluerasu, A.; Sutton, M.

    2004-01-01

    With third generation synchrotron X-ray sources, it is possible to acquire detailed structural information about the system under study with time resolution orders of magnitude faster than was possible a few years ago. These advances have generated many new challenges for changing and controlling the state of the system on very short time scales, in a uniform and controlled manner. For our particular X-ray experiments on crystallization or order-disorder phase transitions in metallic alloys, we need to change the sample temperature by hundreds of degrees as fast as possible while avoiding over- or undershooting. To achieve this, we designed and implemented a computer-controlled temperature tracking system which combines standard Proportional-Integral-Derivative (PID) feedback, thermal modeling and finite difference thermal calculations (feedforward), and Kalman filtering of the temperature readings to reduce the noise. The resulting Kalman-Predictive-Proportional-Integral-Derivative (KPPID) algorithm allows us to obtain accurate control, to minimize the response time, and to avoid over- or undershooting, even in systems with inherently noisy temperature readings and time delays. The KPPID temperature controller was successfully implemented at the Advanced Photon Source at Argonne National Laboratory and was used to perform coherent and time-resolved X-ray diffraction experiments.
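    A toy sketch of PID feedback acting on a Kalman-filtered noisy reading. This is not the published KPPID controller: the feedforward thermal model is omitted, and the gains and noise variances are invented:

```python
class KalmanPID:
    """PID on a scalar-Kalman-smoothed measurement (illustrative sketch)."""

    def __init__(self, kp, ki, kd, q=1e-3, r=0.5):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.q, self.r = q, r            # process / measurement noise variances
        self.x, self.p = 0.0, 1.0        # filtered estimate and its variance
        self.integral, self.prev_err = 0.0, None

    def filter(self, z):
        self.p += self.q                 # predict step
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # update with measurement z
        self.p *= 1.0 - k
        return self.x

    def control(self, setpoint, measurement, dt=1.0):
        err = setpoint - self.filter(measurement)
        self.integral += err * dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

    Filtering before differentiating keeps the derivative term from amplifying read noise, which is the motivation the abstract gives for combining the two.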

  10. Statistical and Machine Learning Models to Predict Programming Performance

    Bergin, Susan

    2006-01-01

    This thesis details a longitudinal study on factors that influence introductory programming success and on the development of machine learning models to predict incoming student performance. Although numerous studies have developed models to predict programming success, the models struggled to achieve high accuracy in predicting the likely performance of incoming students. Our approach overcomes this by providing a machine learning technique, using a set of three significant...

  11. Statistical model based gender prediction for targeted NGS clinical panels

    Palani Kannan Kandavel

    2017-12-01

    The reference test dataset is used to test the model. Sensitivity in predicting gender is increased relative to the current approach based on genotype composition in ChrX. In addition, the prediction score given by the model can be used to evaluate the quality of a clinical dataset: a higher prediction score towards the respective gender indicates a higher quality of the sequenced data.

  12. Disturbance metrics predict a wetland Vegetation Index of Biotic Integrity

    Stapanian, Martin A.; Mack, John; Adams, Jean V.; Gara, Brian; Micacchion, Mick

    2013-01-01

    Indices of biological integrity of wetlands based on vascular plants (VIBIs) have been developed in many areas in the USA. Knowledge of the best predictors of VIBIs would enable management agencies to make better decisions regarding mitigation site selection and performance monitoring criteria. We use a novel statistical technique to develop predictive models for an established index of wetland vegetation integrity (Ohio VIBI), using as independent variables 20 indices and metrics of habitat quality, wetland disturbance, and buffer area land use from 149 wetlands in Ohio, USA. For emergent and forest wetlands, predictive models explained 61% and 54% of the variability, respectively, in Ohio VIBI scores. In both cases the most important predictor of Ohio VIBI score was a metric that assessed habitat alteration and development in the wetland. Of secondary importance as a predictor was a metric that assessed microtopography, interspersion, and quality of vegetation communities in the wetland. Metrics and indices assessing disturbance and land use of the buffer area were generally poor predictors of Ohio VIBI scores. Our results suggest that vegetation integrity of emergent and forest wetlands could be most directly enhanced by minimizing substrate and habitat disturbance within the wetland. Such efforts could include reducing or eliminating any practices that disturb the soil profile, such as nutrient enrichment from adjacent farm land, mowing, grazing, or cutting or removing woody plants.

  13. Statistical models to predict flows at monthly level in Salvajina

    Gonzalez, Harold O

    1994-01-01

    Linear regression models are proposed and evaluated at the monthly level for predicting flows at Salvajina, based on predictor variables such as the pressure difference between Darwin and Tahiti, precipitation at Piendamó (Cauca), temperature at Puerto Chicama (Peru), and pressure at Tahiti.
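    For a single predictor, such a regression reduces to ordinary least squares; the pressure-difference and flow values below are hypothetical:

```python
def ols_fit(x, y):
    """Ordinary least squares for one predictor: y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

# Hypothetical monthly pairs: Darwin-Tahiti pressure difference vs. flow (m3/s)
dp = [-1.2, -0.4, 0.3, 1.1, 1.8]
flow = [420.0, 385.0, 350.0, 310.0, 280.0]
a, b = ols_fit(dp, flow)
pred = a + b * 0.5  # predicted flow for a pressure difference of 0.5
```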

  14. Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) Status Data

    Office of Personnel Management — The Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) is a statistically cleansed sub-set of the data contained in the EHRI data warehouse. It...

  15. Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) Dynamics Data

    Office of Personnel Management — The Enterprise Human Resources Integration-Statistical Data Mart (EHRI-SDM) is a statistically cleansed sub-set of the data contained in the EHRI data warehouse. It...

  16. Statistical distribution of components of energy eigenfunctions: from nearly-integrable to chaotic

    Wang, Jiaozi; Wang, Wen-ge

    2016-01-01

    We study the statistical distribution of components in the non-perturbative parts of energy eigenfunctions (EFs), in which the main bodies of the EFs lie. Our numerical simulations in five models show that the deviation of this distribution from the prediction of random matrix theory (RMT) is useful in characterizing the process from nearly integrable to chaotic, in a way somewhat similar to the nearest-level-spacing distribution. But the statistics of EFs reveal some further properties, as described below. (i) In the process of approaching quantum chaos, the distribution of components shows a delay compared with the nearest-level-spacing distribution in most of the models studied. (ii) In the quantum chaotic regime, the distribution of components always shows a small but notable deviation from the prediction of RMT in models possessing classical counterparts, while the deviation can be almost negligible in models not possessing classical counterparts. (iii) In models whose Hamiltonian matrices possess a clear band structure, the tails of EFs show statistical behaviors clearly different from those in the main bodies, while the difference is smaller for Hamiltonian matrices without a clear band structure.

  17. PGT: A Statistical Approach to Prediction and Mechanism Design

    Wolpert, David H.; Bono, James W.

    One of the biggest challenges facing behavioral economics is the lack of a single theoretical framework that is capable of directly utilizing all types of behavioral data. One of the biggest challenges of game theory is the lack of a framework for making predictions and designing markets in a manner that is consistent with the axioms of decision theory. An approach in which solution concepts are distribution-valued rather than set-valued (i.e. equilibrium theory) has both capabilities. We call this approach Predictive Game Theory (or PGT). This paper outlines a general Bayesian approach to PGT. It also presents one simple example to illustrate the way in which this approach differs from equilibrium approaches in both prediction and mechanism design settings.

  18. Knowledge-Intensive Gathering and Integration of Statistical Information on European Fisheries

    Klinkert, M.; Treur, J.; Verwaart, T.; Loganantharaj, R.; Palm, G.; Ali, M.

    2000-01-01

    Gathering, maintenance, integration and presentation of statistics are major activities of the Dutch Agricultural Economics Research Institute LEI. In this paper we explore how knowledge and agent technology can be exploited to support the information gathering and integration process. In

  19. Multivariate statistical models for disruption prediction at ASDEX Upgrade

    Aledda, R.; Cannas, B.; Fanni, A.; Sias, G.; Pautasso, G.

    2013-01-01

    In this paper, a disruption prediction system for ASDEX Upgrade is proposed that does not require disruption-terminated experiments for its implementation. The system consists of a data-based model, which is built using only a few input signals coming from successfully terminated pulses. A fault detection and isolation approach has been used, where the prediction is based on the analysis of the residuals of an auto-regressive exogenous input (ARX) model. The prediction performance of the proposed system is encouraging when it is applied to the same set of campaigns used to implement the model. However, the false alarms increase significantly when the system is tested on discharges coming from experimental campaigns temporally far from those used to train the model. This is due to the well-known aging effect inherent in data-based models. The main advantage of the proposed method with respect to other data-based approaches in the literature is that it does not need data on experiments terminated with a disruption, as it uses a model of normal operating conditions. This is a big advantage in the perspective of a prediction system for ITER, where only a limited number of disruptions can be allowed
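    The residual-based detection can be sketched with a first-order ARX model; the coefficients, signals and alarm threshold below are all hypothetical:

```python
def arx_residuals(y, u, a1, b1):
    """One-step residuals of a first-order ARX model
    y[t] = a1*y[t-1] + b1*u[t-1] + e[t]; large residuals flag a fault."""
    return [y[t] - (a1 * y[t - 1] + b1 * u[t - 1]) for t in range(1, len(y))]

# Signals that follow the model exactly, except an anomaly injected at t = 5
a1, b1 = 0.8, 0.2
u = [1.0] * 8
y = [1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.8, 1.64]
res = arx_residuals(y, u, a1, b1)
alarms = [t for t, r in enumerate(res, start=1) if abs(r) > 0.5]
# alarms -> [5]: only the injected deviation trips the detector
```

    Because the model is fitted on normal pulses only, no disruption-terminated data are needed, which is the method's selling point for ITER.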

  20. Statistical tests for equal predictive ability across multiple forecasting methods

    Borup, Daniel; Thyrsgaard, Martin

    We develop a multivariate generalization of the Giacomini-White tests for equal conditional predictive ability. The tests are applicable to a mixture of nested and non-nested models, incorporate estimation uncertainty explicitly, and allow for misspecification of the forecasting model as well as ...

  1. A systems approach to integrative biology: an overview of statistical methods to elucidate association and architecture.

    Ciaccio, Mark F; Finkle, Justin D; Xue, Albert Y; Bagheri, Neda

    2014-07-01

    An organism's ability to maintain a desired physiological response relies extensively on how cellular and molecular signaling networks interpret and react to environmental cues. The capacity to quantitatively predict how networks respond to a changing environment by modifying signaling regulation and phenotypic responses will help inform and predict the impact of a changing global environment on organisms and ecosystems. Many computational strategies have been developed to resolve cue-signal-response networks. However, selecting a strategy that answers a specific biological question requires knowledge both of the type of data being collected and of the strengths and weaknesses of different computational regimes. We broadly explore several computational approaches, and we evaluate their accuracy in predicting a given response. Specifically, we describe how statistical algorithms can be used in the context of integrative and comparative biology to elucidate the genomic, proteomic, and/or cellular networks responsible for robust physiological response. As a case study, we apply this strategy to a dataset of quantitative levels of protein abundance from the mussel, Mytilus galloprovincialis, to uncover the temperature-dependent signaling network. © The Author 2014. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.

  2. Foundations of Complex Systems Nonlinear Dynamics, Statistical Physics, and Prediction

    Nicolis, Gregoire

    2007-01-01

    Complexity is emerging as a post-Newtonian paradigm for approaching a large body of phenomena of concern at the crossroads of physical, engineering, environmental, life and human sciences from a unifying point of view. This book outlines the foundations of modern complexity research as it arose from the cross-fertilization of ideas and tools from nonlinear science, statistical physics and numerical simulation. It is shown how these developments lead to an understanding, both qualitative and quantitative, of the complex systems encountered in nature and in everyday experience and, conversely, h

  3. Prediction of lacking control power in power plants using statistical models

    Odgaard, Peter Fogh; Mataji, B.; Stoustrup, Jakob

    2007-01-01

    Prediction of the performance of plants such as power plants is of interest, since the plant operator can use these predictions to optimize plant production. In this paper the focus is on a special case where a combination of high coal moisture content and a high load limits the possible plant load, meaning that the requested plant load cannot be met. The available models are in this case uncertain. Instead, statistical methods are used to predict upper and lower uncertainty bounds on the prediction. Two different methods are used. The first relies on statistics of recent prediction errors; the second uses operating-point-dependent statistics of prediction errors. Applying these methods to the case mentioned above, it can be concluded that the second method can be used to predict the power plant performance, while the first method has problems predicting the uncertain performance.
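The first of the two methods (bounds from the statistics of recent prediction errors) can be sketched as empirical quantiles of a rolling error sample added to the point prediction. The coverage level, window of 200 errors, and megawatt figures below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def empirical_bounds(prediction, recent_errors, coverage=0.9):
    """Upper/lower uncertainty bounds from empirical quantiles of recent
    prediction errors (a free sketch of the paper's first method)."""
    lo, hi = np.quantile(recent_errors, [(1 - coverage) / 2, (1 + coverage) / 2])
    return prediction + lo, prediction + hi

rng = np.random.default_rng(1)
errors = rng.normal(0.0, 2.0, size=200)        # last 200 prediction errors [MW]
lower, upper = empirical_bounds(310.0, errors)  # bounds around a 310 MW prediction
print(lower, upper)
```

The second method would instead keep a separate error sample per operating point (e.g., per load range and moisture level) and look up the quantiles for the current operating point, which is why it copes better with load-dependent uncertainty.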

  4. A statistical approach for predicting thermal diffusivity profiles in fusion plasmas as a transport model

    Yokoyama, Masayuki

    2014-01-01

    A statistical approach is proposed to predict thermal diffusivity profiles as a transport “model” in fusion plasmas. It can provide regression expressions for the ion and electron heat diffusivities (χi and χe), separately, to construct their radial profiles. The proposed approach goes beyond the conventional scaling laws for the global confinement time (τE), since it also deals with profiles (temperature, density, heating depositions, etc.). This approach has become possible with the analysis database accumulated through the extensive application of the integrated transport analysis suite to experimental data. In this letter, the TASK3D-a analysis database for high-ion-temperature (high-Ti) plasmas in the LHD (Large Helical Device) is used as an example to describe the approach. (author)

  5. A statistical model for aggregating judgments by incorporating peer predictions

    McCoy, John; Prelec, Drazen

    2017-01-01

    We propose a probabilistic model to aggregate the answers of respondents answering multiple-choice questions. The model does not assume that everyone has access to the same information, and so does not assume that the consensus answer is correct. Instead, it infers the most probable world state, even if only a minority vote for it. Each respondent is modeled as receiving a signal contingent on the actual world state, and as using this signal to both determine their own answer and predict the ...

  6. A statistically rigorous sampling design to integrate avian monitoring and management within Bird Conservation Regions.

    Pavlacky, David C; Lukacs, Paul M; Blakesley, Jennifer A; Skorkowsky, Robert C; Klute, David S; Hahn, Beth A; Dreitz, Victoria J; George, T Luke; Hanni, David J

    2017-01-01

    Monitoring is an essential component of wildlife management and conservation. However, the usefulness of monitoring data is often undermined by the lack of 1) coordination across organizations and regions, 2) meaningful management and conservation objectives, and 3) rigorous sampling designs. Although many improvements to avian monitoring have been discussed, the recommendations have been slow to emerge in large-scale programs. We introduce the Integrated Monitoring in Bird Conservation Regions (IMBCR) program designed to overcome the above limitations. Our objectives are to outline the development of a statistically defensible sampling design to increase the value of large-scale monitoring data and provide example applications to demonstrate the ability of the design to meet multiple conservation and management objectives. We outline the sampling process for the IMBCR program with a focus on the Badlands and Prairies Bird Conservation Region (BCR 17). We provide two examples for the Brewer's sparrow (Spizella breweri) in BCR 17 demonstrating the ability of the design to 1) determine hierarchical population responses to landscape change and 2) estimate hierarchical habitat relationships to predict the response of the Brewer's sparrow to conservation efforts at multiple spatial scales. The collaboration across organizations and regions provided economy of scale by leveraging a common data platform over large spatial scales to promote the efficient use of monitoring resources. We designed the IMBCR program to address the information needs and core conservation and management objectives of the participating partner organizations. Although it has been argued that probabilistic sampling designs are not practical for large-scale monitoring, the IMBCR program provides a precedent for implementing a statistically defensible sampling design from local to bioregional scales. We demonstrate that integrating conservation and management objectives with rigorous statistical

  8. Integrating Statistical Visualization Research into the Political Science Classroom

    Draper, Geoffrey M.; Liu, Baodong; Riesenfeld, Richard F.

    2011-01-01

    The use of computer software to facilitate learning in political science courses is well established. However, the statistical software packages used in many political science courses can be difficult to use and counter-intuitive. We describe the results of a preliminary user study suggesting that visually-oriented analysis software can help…

  9. Manufacturing Squares: An Integrative Statistical Process Control Exercise

    Coy, Steven P.

    2016-01-01

    In the exercise, students in a junior-level operations management class are asked to manufacture a simple product. Given product specifications, they must design a production process, create roles and design jobs for each team member, and develop a statistical process control plan that efficiently and effectively controls quality during…

  10. Statistical Models for Predicting Threat Detection From Human Behavior

    Kelley, Timothy; Amon, Mary J.; Bertenthal, Bennett I.

    2018-01-01

    Users must regularly distinguish between secure and insecure cyber platforms in order to preserve their privacy and safety. Mouse tracking is an accessible, high-resolution measure that can be leveraged to understand the dynamics of perception, categorization, and decision-making in threat detection. Researchers have begun to utilize measures like mouse tracking in cyber security research, including in the study of risky online behavior. However, it remains an empirical question to what extent real-time information about user behavior is predictive of user outcomes and demonstrates added value compared to traditional self-report questionnaires. Participants navigated through six simulated websites, which resembled either secure “non-spoof” or insecure “spoof” versions of popular websites. Websites also varied in terms of authentication level (i.e., extended validation, standard validation, or partial encryption). Spoof websites had modified Uniform Resource Locator (URL) and authentication level. Participants chose to “login” to or “back” out of each website based on perceived website security. Mouse tracking information was recorded throughout the task, along with task performance. After completing the website identification task, participants completed a questionnaire assessing their security knowledge and degree of familiarity with the websites simulated during the experiment. Despite being primed to the possibility of website phishing attacks, participants generally showed a bias for logging in to websites versus backing out of potentially dangerous sites. Along these lines, participant ability to identify spoof websites was around the level of chance. Hierarchical Bayesian logistic models were used to compare the accuracy of two-factor (i.e., website security and encryption level), survey-based (i.e., security knowledge and website familiarity), and real-time measures (i.e., mouse tracking) in predicting risky online behavior during phishing

  13. Prediction of Frost Occurrences Using Statistical Modeling Approaches

    Hyojin Lee

    2016-01-01

    We developed frost prediction models for spring in Korea using logistic regression and decision tree techniques. Hit Rate (HR), Probability of Detection (POD), and False Alarm Rate (FAR) for both models were calculated and compared. Threshold values for the logistic regression models were selected to maximize HR and POD and minimize FAR for each station, and the split for the decision tree models was stopped when the change in entropy was relatively small. Average HR values were 0.92 and 0.91 for the logistic regression and decision tree techniques, respectively; average POD values were 0.78 and 0.80, respectively; and average FAR values were 0.22 and 0.28, respectively. The average numbers of selected explanatory variables were 5.7 and 2.3 for the logistic regression and decision tree techniques, respectively. Fewer explanatory variables can be more appropriate for operational activities that provide a timely warning for the prevention of frost damage to agricultural crops. We concluded that the decision tree model can be more useful for a timely warning system. It is recommended that the models be improved to reflect local topographic features.
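The three scores can be computed from a 2x2 contingency table of frost forecasts versus observed frost events. The definitions below are the standard forecast-verification ones (with FAR as the false-alarm ratio); the counts are invented so that POD and FAR match the logistic-regression averages reported above.

```python
def verification_scores(hits, misses, false_alarms, correct_negatives):
    """Standard forecast-verification scores; the paper's exact
    definitions may differ slightly."""
    total = hits + misses + false_alarms + correct_negatives
    hr = (hits + correct_negatives) / total     # Hit Rate (overall accuracy)
    pod = hits / (hits + misses)                # Probability of Detection
    far = false_alarms / (hits + false_alarms)  # False Alarm Ratio
    return hr, pod, far

# Hypothetical season at one station: 50 frost events, 150 frost-free days.
hr, pod, far = verification_scores(hits=39, misses=11,
                                   false_alarms=11, correct_negatives=139)
print(round(hr, 2), round(pod, 2), round(far, 2))
```

With these counts POD = 39/50 = 0.78 and FAR = 11/50 = 0.22, illustrating how a threshold choice trades detections against false alarms.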

  14. A weighted generalized score statistic for comparison of predictive values of diagnostic tests.

    Kosinski, Andrzej S

    2013-03-15

    Positive and negative predictive values are important measures of medical diagnostic test performance. We consider testing the equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen from the new re-formulation, the generalized score statistic does not always reduce to the commonly used score statistic in the independent-samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates the empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent-samples situation, and preserves type I error better than the other statistics, as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing the equality of two predictive values and for the corresponding sample size computations. The new formulas for the Wald statistics may be useful for easy computation of confidence intervals for differences of predictive values. The introduced concepts have the potential to lead to the development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.

  15. Statistics

    Hayslett, H T

    1991-01-01

    Statistics covers the basic principles of Statistics. The book starts by tackling the importance and the two kinds of statistics; the presentation of sample data; the definition, illustration and explanation of several measures of location; and the measures of variation. The text then discusses elementary probability, the normal distribution and the normal approximation to the binomial. Testing of statistical hypotheses and tests of hypotheses about the theoretical proportion of successes in a binomial population and about the theoretical mean of a normal population are explained. The text the

  16. Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models

    Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A.; van t Veld, Aart A.

    2012-01-01

    PURPOSE: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. METHODS AND MATERIALS: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator

  17. Statistical and Biophysical Models for Predicting Total and Outdoor Water Use in Los Angeles

    Mini, C.; Hogue, T. S.; Pincetl, S.

    2012-04-01

    Modeling water demand is a complex exercise in the choice of the functional form, techniques, and variables to integrate in the model. The goal of the current research is to identify the determinants that control total and outdoor residential water use in semi-arid cities and to use that information to develop statistical and biophysical models that can forecast spatial and temporal urban water use. The City of Los Angeles is unique in its highly diverse socio-demographic, economic, and cultural characteristics across neighborhoods, which introduces significant challenges in modeling water use. Increasing climate variability also contributes to uncertainties in water-use predictions in urban areas. Monthly individual water-use records for the 2000 to 2010 period were acquired from the Los Angeles Department of Water and Power (LADWP). Study predictors of residential water use include socio-demographic, economic, climate, and landscaping variables at the zip-code level collected from the US Census database. Climate variables are estimated from ground-based observations and calculated at the centroid of each zip code by the inverse-distance weighting method. Remotely sensed products of vegetation biomass and landscape land cover are also utilized. Two linear regression models were developed from the panel data and variables described: a pooled-OLS regression model and a linear mixed-effects model. Both models show income per capita and the percentage of landscaped area in each zip code to be statistically significant predictors. The pooled-OLS model tends to overestimate higher water-use zip codes, and both models provide similar RMSE values. Outdoor water use was estimated at the census-tract level as the residual between total water use and indoor use. This residual is being compared with the output of a biophysical model that includes tree and grass cover areas, climate variables, and estimates of evapotranspiration at very high spatial resolution.
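A pooled-OLS fit of the kind described, with income per capita and landscaped fraction as the two significant predictors, can be sketched on synthetic panel data. All coefficients, units, and sample sizes below are invented for illustration; they are not estimates from the study.

```python
import numpy as np

# Hypothetical panel: monthly water use vs. income and landscaped fraction.
rng = np.random.default_rng(2)
n = 400                                 # zip-code/month observations
income = rng.uniform(20, 120, n)        # income per capita [$1000s]
landscape = rng.uniform(0.1, 0.7, n)    # landscaped fraction of parcel
use = 5.0 + 0.08 * income + 40.0 * landscape + rng.normal(0, 2.0, n)

# Pooled OLS: stack all observations and fit one regression, ignoring
# zip-code-level grouping (a mixed-effects model would add group effects).
X = np.column_stack([np.ones(n), income, landscape])
beta, *_ = np.linalg.lstsq(X, use, rcond=None)
rmse = np.sqrt(np.mean((use - X @ beta) ** 2))
print(beta, rmse)
```

Comparing this RMSE against a mixed-effects fit of the same panel is what lets the authors judge whether zip-code-level heterogeneity adds predictive value.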

  18. Statistics

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  19. Scalable Integrated Region-Based Image Retrieval Using IRM and Statistical Clustering.

    Wang, James Z.; Du, Yanping

    Statistical clustering is critical in designing scalable image retrieval systems. This paper presents a scalable algorithm for indexing and retrieving images based on region segmentation. The method uses statistical clustering on region features and IRM (Integrated Region Matching), a measure developed to evaluate overall similarity between images…

  20. Program integration of predictive maintenance with reliability centered maintenance

    Strong, D.K. Jr; Wray, D.M.

    1990-01-01

    This paper addresses improving the safety and reliability of power plants in a cost-effective manner by integrating the recently developed reliability centered maintenance techniques with the traditional predictive maintenance techniques of nuclear power plants. The topics of the paper include a description of reliability centered maintenance (RCM), enhancing RCM with predictive maintenance, predictive maintenance programs, condition monitoring techniques, performance test techniques, the mid-Atlantic Reliability Centered Maintenance Users Group, test guides and the benefits of shared guide development

  1. The extraction and integration framework: a two-process account of statistical learning.

    Thiessen, Erik D; Kronstein, Alexandra T; Hufnagle, Daniel G

    2013-07-01

    The term statistical learning in infancy research originally referred to sensitivity to transitional probabilities. Subsequent research has demonstrated that statistical learning contributes to infant development in a wide array of domains. The range of statistical learning phenomena necessitates a broader view of the processes underlying statistical learning. Learners are sensitive to a much wider range of statistical information than the conditional relations indexed by transitional probabilities, including distributional and cue-based statistics. We propose a novel framework that unifies learning about all of these kinds of statistical structure. From our perspective, learning about conditional relations outputs discrete representations (such as words). Integration across these discrete representations yields sensitivity to cues and distributional information. To achieve sensitivity to all of these kinds of statistical structure, our framework combines processes that extract segments of the input with processes that compare across these extracted items. In this framework, the items extracted from the input serve as exemplars in long-term memory. The similarity structure of those exemplars in long-term memory leads to the discovery of cues and categorical structure, which guides subsequent extraction. The extraction and integration framework provides a way to explain sensitivity to both conditional statistical structure (such as transitional probabilities) and distributional statistical structure (such as item frequency and variability), and also a framework for thinking about how these different aspects of statistical learning influence each other. 2013 APA, all rights reserved
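The conditional relations discussed above are typically indexed by transitional probabilities, TP(x→y) = P(y | x), estimated from bigram counts. A minimal sketch on an invented syllable stream (in the spirit of classic word-segmentation studies; the "words" here are made up) shows within-word transitions scoring higher than across-word ones:

```python
from collections import Counter

def transitional_probabilities(syllables):
    """TP(x -> y) = count(xy) / count(x), estimated from bigram counts."""
    bigrams = Counter(zip(syllables, syllables[1:]))
    unigrams = Counter(syllables[:-1])
    return {(x, y): c / unigrams[x] for (x, y), c in bigrams.items()}

# Stream built from the invented "words" bi-da and ku-pa:
stream = "bi da ku pa bi da bi da ku pa ku pa bi da ku pa".split()
tp = transitional_probabilities(stream)
print(tp[("bi", "da")], tp[("da", "ku")])   # within-word vs. across-word
```

In the extraction-and-integration framework, high-TP sequences such as "bi da" would be extracted as candidate units, and integration across the stored exemplars would then yield the distributional and cue-based sensitivity the authors describe.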

  2. Integrating functional data to prioritize causal variants in statistical fine-mapping studies.

    Gleb Kichaev

    2014-10-01

    Standard statistical approaches for the prioritization of variants for functional testing in fine-mapping studies either use marginal association statistics or estimate posterior probabilities for variants to be causal under simplifying assumptions. Here, we present a probabilistic framework that integrates association strength with functional genomic annotation data to improve accuracy in selecting plausible causal variants for functional validation. A key feature of our approach is that it empirically estimates the contribution of each functional annotation to the trait of interest directly from summary association statistics, while allowing for multiple causal variants at any risk locus. We devise efficient algorithms that estimate the parameters of our model across all risk loci to further increase performance. Using simulations starting from the 1000 Genomes data, we find that our framework consistently outperforms the current state-of-the-art fine-mapping methods, reducing the number of variants that need to be selected to capture 90% of the causal variants from an average of 13.3 to 10.4 SNPs per locus (as compared to the next-best performing strategy). Furthermore, we introduce a cost-to-benefit optimization framework for determining the number of variants to be followed up in functional assays and assess its performance using real and simulated data. We validate our findings using a large-scale meta-analysis of four blood lipid traits and find that the relative probability of causality is increased for variants in exons and transcription start sites and decreased in repressed genomic regions at the risk loci of these traits. Using these highly predictive, trait-specific functional annotations, we estimate causality probabilities across all traits and variants, reducing the size of the 90% confidence set from an average of 17.5 to 13.5 variants per locus in these data.

  3. Statistics

    2005-01-01

    For the years 2004 and 2005 the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics published in the Energy Review are presented in more detail in a publication called Energy Statistics that comes out yearly. Energy Statistics also includes historical time series over a longer period of time (see e.g. Energy Statistics, Statistics Finland, Helsinki 2004). The applied energy units and conversion coefficients are shown on the back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption; Carbon dioxide emissions from fossil fuel use; Coal consumption; Consumption of natural gas; Peat consumption; Domestic oil deliveries; Import prices of oil; Consumer prices of principal oil products; Fuel prices in heat production; Fuel prices in electricity production; Price of electricity by type of consumer; Average monthly spot prices at the Nord Pool power exchange; Total energy consumption by source and CO2 emissions; Supplies and total consumption of electricity (GWh); Energy imports by country of origin in January-June 2003; Energy exports by recipient country in January-June 2003; Consumer prices of liquid fuels; Consumer prices of hard coal, natural gas and indigenous fuels; Price of natural gas by type of consumer; Price of electricity by type of consumer; Price of district heating by type of consumer; Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources; and Energy taxes, precautionary stock fees and oil pollution fees.

  4. Statistics

    2001-01-01

    For the year 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside of the Review's back cover shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption; Changes in the volume of GNP and electricity; Coal consumption; Natural gas consumption; Peat consumption; Domestic oil deliveries; Import prices of oil; Consumer prices of principal oil products; Fuel prices for heat production; Fuel prices for electricity production; Carbon dioxide emissions from the use of fossil fuels; Total energy consumption by source and CO2 emissions; Electricity supply; Energy imports by country of origin in 2000; Energy exports by recipient country in 2000; Consumer prices of liquid fuels; Consumer prices of hard coal, natural gas and indigenous fuels; Average electricity price by type of consumer; Price of district heating by type of consumer; Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources; and Energy taxes and precautionary stock fees on oil products.

  5. Statistics

    2000-01-01

    For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside back cover of the Review shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity consumption, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-March 2000, Energy exports by recipient country in January-March 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  6. Statistics

    1999-01-01

    For the years 1998 and 1999, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review appear in more detail in the publication Energiatilastot - Energy Statistics, issued annually, which also includes historical time series over a longer period (see e.g. Energiatilastot 1998, Statistics Finland, Helsinki 1999, ISSN 0785-3165). The inside back cover of the Review shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity consumption, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-June 1999, Energy exports by recipient country in January-June 1999, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  7. Predictive integrated modelling for ITER scenarios

    Artaud, J.F.; Imbeaux, F.; Aniel, T.; Basiuk, V.; Eriksson, L.G.; Giruzzi, G.; Hoang, G.T.; Huysmans, G.; Joffrin, E.; Peysson, Y.; Schneider, M.; Thomas, P.

    2005-01-01

    The uncertainty in the prediction of ITER scenarios is evaluated. Two transport models that have been extensively validated against the multi-machine database are used for the computation of the transport coefficients. The first model is GLF23; the second, called Kiauto, is a model in which the profile of the dilution coefficient is a gyro-Bohm-like analytical function, renormalized in order to obtain profiles consistent with a given global energy confinement scaling. The CRONOS package of codes is used; it gives access to the dynamics of the discharge and allows the study of the interplay between heat transport, current diffusion and sources. The main motivation of this work is to study the influence of parameters such as plasma current, heat, density, impurities and toroidal momentum transport. We can draw the following conclusions: 1) the target Q = 10 can be obtained in the ITER hybrid scenario at I p = 13 MA, using either the DS03 two-term scaling or the GLF23 model based on the same pedestal; 2) at I p = 11.3 MA, Q = 10 can be reached only by assuming a very peaked pressure profile and a low pedestal; 3) at fixed Greenwald fraction, Q increases with density peaking; 4) achieving a stationary q-profile with q > 1 requires a large non-inductive current fraction (80%) that could be provided by 20 to 40 MW of LHCD; and 5) owing to the high temperature, the q-profile penetration is delayed and q = 1 is reached at about 600 s in the ITER hybrid scenario at I p = 13 MA, in the absence of active q-profile control. (A.C.)

  8. On Extrapolating Past the Range of Observed Data When Making Statistical Predictions in Ecology.

    Paul B Conn

    Full Text Available Ecologists are increasingly using statistical models to predict animal abundance and occurrence in unsampled locations. The reliability of such predictions depends on a number of factors, including sample size, how far prediction locations are from the observed data, and similarity of predictive covariates in locations where data are gathered to locations where predictions are desired. In this paper, we propose extending Cook's notion of an independent variable hull (IVH, developed originally for application with linear regression models, to generalized regression models as a way to help assess the potential reliability of predictions in unsampled areas. Predictions occurring inside the generalized independent variable hull (gIVH can be regarded as interpolations, while predictions occurring outside the gIVH can be regarded as extrapolations worthy of additional investigation or skepticism. We conduct a simulation study to demonstrate the usefulness of this metric for limiting the scope of spatial inference when conducting model-based abundance estimation from survey counts. In this case, limiting inference to the gIVH substantially reduces bias, especially when survey designs are spatially imbalanced. We also demonstrate the utility of the gIVH in diagnosing problematic extrapolations when estimating the relative abundance of ribbon seals in the Bering Sea as a function of predictive covariates. We suggest that ecologists routinely use diagnostics such as the gIVH to help gauge the reliability of predictions from statistical models (such as generalized linear, generalized additive, and spatio-temporal regression models).
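The hull idea described above can be sketched numerically. For an ordinary linear model, Cook's IVH admits a prediction point when its leverage does not exceed the maximum leverage of the observed design matrix; the sketch below uses that linear-model version as a simplified stand-in for the paper's generalized variant, with invented data:

```python
import numpy as np

def inside_ivh(X_obs, X_pred):
    """Flag prediction points whose leverage x (X'X)^-1 x' stays within
    the maximum leverage of the observed design matrix."""
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    leverage = lambda X: np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    return leverage(X_pred) <= leverage(X_obs).max()

# Observed covariate range 0..1: predicting at x = 0.5 interpolates,
# while predicting at x = 10 extrapolates far outside the hull.
X_obs = np.column_stack([np.ones(5), np.linspace(0.0, 1.0, 5)])
X_pred = np.array([[1.0, 0.5], [1.0, 10.0]])
print(inside_ivh(X_obs, X_pred))  # → [ True False]
```

Points flagged False would be the candidates for the "additional investigation or skepticism" the authors recommend.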

  9. Statistics

    2003-01-01

    For the year 2002, part of the figures shown in the tables of the Energy Review are preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 2001, Statistics Finland, Helsinki 2002). The applied energy units and conversion coefficients are shown on the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supply and total consumption of electricity (GWh), Energy imports by country of origin in January-June 2003, Energy exports by recipient country in January-June 2003, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees and oil pollution fees on energy products

  10. Statistics

    2004-01-01

    For the years 2003 and 2004, the figures shown in the tables of the Energy Review are partly preliminary. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot, Statistics Finland, Helsinki 2003, ISSN 0785-3165). The applied energy units and conversion coefficients are shown on the inside back cover of the Review. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in GDP, energy consumption and electricity consumption, Carbon dioxide emissions from fossil fuel use, Coal consumption, Consumption of natural gas, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices in heat production, Fuel prices in electricity production, Price of electricity by type of consumer, Average monthly spot prices at the Nord Pool power exchange, Total energy consumption by source and CO2 emissions, Supplies and total consumption of electricity (GWh), Energy imports by country of origin in January-March 2004, Energy exports by recipient country in January-March 2004, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Price of natural gas by type of consumer, Price of electricity by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Excise taxes, precautionary stock fees and oil pollution fees

  11. Statistics

    2000-01-01

    For the years 1999 and 2000, part of the figures shown in the tables of the Energy Review are preliminary or estimated. The annual statistics of the Energy Review also include historical time series over a longer period (see e.g. Energiatilastot 1999, Statistics Finland, Helsinki 2000, ISSN 0785-3165). The inside back cover of the Review shows the energy units and the conversion coefficients used for them. Explanatory notes to the statistical tables can be found after the tables and figures. The figures present: Changes in the volume of GNP and energy consumption, Changes in the volume of GNP and electricity consumption, Coal consumption, Natural gas consumption, Peat consumption, Domestic oil deliveries, Import prices of oil, Consumer prices of principal oil products, Fuel prices for heat production, Fuel prices for electricity production, Carbon dioxide emissions, Total energy consumption by source and CO2 emissions, Electricity supply, Energy imports by country of origin in January-June 2000, Energy exports by recipient country in January-June 2000, Consumer prices of liquid fuels, Consumer prices of hard coal, natural gas and indigenous fuels, Average electricity price by type of consumer, Price of district heating by type of consumer, Excise taxes, value added taxes and fiscal charges and fees included in consumer prices of some energy sources and Energy taxes and precautionary stock fees on oil products

  12. Application of statistical classification methods for predicting the acceptability of well-water quality

    Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.

    2018-01-01

    The application of statistical classification methods is investigated—in comparison also to spatial interpolation methods—for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, in particular, the aquifer is locally affected by saline water and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict if the chloride concentration in a water well will exceed the allowable concentration so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performances and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems concerning hydrogeological systems that are too difficult to describe analytically or to simulate effectively.

  13. Statistical model for prediction of hearing loss in patients receiving cisplatin chemotherapy.

    Johnson, Andrew; Tarima, Sergey; Wong, Stuart; Friedland, David R; Runge, Christina L

    2013-03-01

    This statistical model might be used to predict cisplatin-induced hearing loss, particularly in patients undergoing concomitant radiotherapy. To create a statistical model based on pretreatment hearing thresholds to provide an individual probability for hearing loss from cisplatin therapy and, secondarily, to investigate the use of hearing classification schemes as predictive tools for hearing loss. Retrospective case-control study. Tertiary care medical center. A total of 112 subjects receiving chemotherapy and audiometric evaluation were evaluated for the study. Of these subjects, 31 met inclusion criteria for analysis. The primary outcome measurement was a statistical model providing the probability of hearing loss following the use of cisplatin chemotherapy. Fifteen of the 31 subjects had significant hearing loss following cisplatin chemotherapy. American Academy of Otolaryngology-Head and Neck Society and Gardner-Robertson hearing classification schemes revealed little change in hearing grades between pretreatment and posttreatment evaluations for subjects with or without hearing loss. The Chang hearing classification scheme could effectively be used as a predictive tool in determining hearing loss with a sensitivity of 73.33%. Pretreatment hearing thresholds were used to generate a statistical model, based on quadratic approximation, to predict hearing loss (C statistic = 0.842, cross-validated = 0.835). The validity of the model improved when only subjects who received concurrent head and neck irradiation were included in the analysis (C statistic = 0.91). A calculated cutoff of 0.45 for predicted probability has a cross-validated sensitivity and specificity of 80%. Pretreatment hearing thresholds can be used as a predictive tool for cisplatin-induced hearing loss, particularly with concomitant radiotherapy.
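A minimal sketch of how such a model would be applied in practice: a logistic curve that is quadratic in the pretreatment threshold (matching the "quadratic approximation" the abstract mentions), mapped through the reported 0.45 probability cutoff. The coefficients below are illustrative placeholders, not the fitted values from the study:

```python
import math

def p_hearing_loss(threshold_db, b0=-3.0, b1=0.05, b2=0.001):
    # Logistic model, quadratic in the pretreatment threshold.
    # Coefficients b0, b1, b2 are hypothetical, for illustration only.
    z = b0 + b1 * threshold_db + b2 * threshold_db ** 2
    return 1.0 / (1.0 + math.exp(-z))

def flag_at_risk(threshold_db, cutoff=0.45):
    # Apply the 0.45 predicted-probability cutoff reported in the abstract.
    return p_hearing_loss(threshold_db) >= cutoff

print(flag_at_risk(20.0), flag_at_risk(60.0))  # → False True
```

With these placeholder coefficients, higher pretreatment thresholds yield higher predicted probabilities, so only the 60 dB case crosses the cutoff.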

  14. Statistical learning and probabilistic prediction in music cognition: mechanisms of stylistic enculturation.

    Pearce, Marcus T

    2018-05-11

    Music perception depends on internal psychological models derived through exposure to a musical culture. It is hypothesized that this musical enculturation depends on two cognitive processes: (1) statistical learning, in which listeners acquire internal cognitive models of statistical regularities present in the music to which they are exposed; and (2) probabilistic prediction based on these learned models that enables listeners to organize and process their mental representations of music. To corroborate these hypotheses, I review research that uses a computational model of probabilistic prediction based on statistical learning (the information dynamics of music (IDyOM) model) to simulate data from empirical studies of human listeners. The results show that a broad range of psychological processes involved in music perception-expectation, emotion, memory, similarity, segmentation, and meter-can be understood in terms of a single, underlying process of probabilistic prediction using learned statistical models. Furthermore, IDyOM simulations of listeners from different musical cultures demonstrate that statistical learning can plausibly predict causal effects of differential cultural exposure to musical styles, providing a quantitative model of cultural distance. Understanding the neural basis of musical enculturation will benefit from close coordination between empirical neuroimaging and computational modeling of underlying mechanisms, as outlined here. © 2018 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals, Inc. on behalf of New York Academy of Sciences.
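The core mechanism described above, probabilistic prediction from learned statistical regularities, can be illustrated with a toy bigram model over note symbols. IDyOM itself uses much richer multiple-viewpoint n-gram models; this is only a sketch with an invented melody:

```python
from collections import Counter, defaultdict
import math

def train_bigrams(sequence):
    # Count how often each note follows each context note (statistical learning).
    model = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        model[prev][nxt] += 1
    return model

def information_content(model, context, note):
    # Surprisal -log2 p(note | context): high for unexpected continuations.
    counts = model[context]
    total = sum(counts.values())
    p = counts[note] / total if total else 0.0
    return -math.log2(p) if p > 0 else float("inf")

melody = ["C", "D", "C", "D", "C", "E"]
model = train_bigrams(melody)
# After exposure, D after C is expected; E after C is more surprising.
print(information_content(model, "C", "D") < information_content(model, "C", "E"))  # → True
```

Differential exposure to different "styles" (training sequences) yields different surprisal profiles, which is the intuition behind the cultural-distance results.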

  15. UK Environmental Prediction - integration and evaluation at the convective scale

    Fallmann, Joachim; Lewis, Huw; Castillo, Juan Manuel; Pearson, David; Harris, Chris; Saulter, Andy; Bricheno, Lucy; Blyth, Eleanor

    2016-04-01

    Traditionally, the simulation of regional ocean, wave and atmosphere components of the Earth System have been considered separately, with some information on other components provided by means of boundary or forcing conditions. More recently, the potential value of a more integrated approach, as required for global climate and Earth System prediction, for regional short-term applications has begun to gain increasing research effort. In the UK, this activity is motivated by an understanding that accurate prediction and warning of the impacts of severe weather requires an integrated approach to forecasting. The substantial impacts on individuals, businesses and infrastructure of such events indicate a pressing need to understand better the value that might be delivered through more integrated environmental prediction. To address this need, the Met Office, NERC Centre for Ecology & Hydrology and NERC National Oceanography Centre have begun to develop the foundations of a coupled high resolution probabilistic forecast system for the UK at km-scale. This links together existing model components of the atmosphere, coastal ocean, land surface and hydrology. Our initial focus has been on a 2-year Prototype project to demonstrate the UK coupled prediction concept in research mode. This presentation will provide an update on UK environmental prediction activities. We will present the results from the initial implementation of an atmosphere-land-ocean coupled system, including a new eddy-permitting resolution ocean component, and discuss progress and initial results from further development to integrate wave interactions in this relatively high resolution system. We will discuss future directions and opportunities for collaboration in environmental prediction, and the challenges to realise the potential of integrated regional coupled forecasting for improving predictions and applications.

  16. Addressing issues associated with evaluating prediction models for survival endpoints based on the concordance statistic.

    Wang, Ming; Long, Qi

    2016-09-01

    Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on c-statistic with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models in consideration is sensitive to NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both settings of low-dimensional and high-dimensional data under CAR and NCAR through simulations. © 2016, The International Biometric Society.
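For reference, the standard Harrell-type c-statistic counts concordant comparable pairs; the IPCW estimators discussed in the paper reweight such pairs by the inverse probability of remaining uncensored. A bare-bones unweighted version, with made-up data, looks like this:

```python
def c_statistic(time, event, risk):
    """Harrell-type concordance: among comparable pairs (the earlier time
    is an observed event), count pairs where the higher risk score
    belongs to the earlier failure. Ties in risk count as 1/2."""
    concordant = tied = comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    tied += 1
    return (concordant + 0.5 * tied) / comparable

# Perfectly ranked risk scores (earlier failure = higher score) give c = 1.0,
# even with one censored subject (event = 0).
print(c_statistic([1, 2, 3, 4], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1]))  # → 1.0
```

The bias the paper addresses arises because this simple estimator conditions on the observed censoring pattern; the IPCW weights correct for that under CAR.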

  17. Statistical modeling of an integrated boiler for coal fired thermal power plant

    Sreepradha Chandrasekharan

    2017-06-01

    Full Text Available Coal-fired thermal power plants play a major role in power production worldwide, as coal is available in abundance. Many of the existing power plants are based on subcritical technology, which can produce power with an efficiency of around 33%. Newer plants are built on either supercritical or ultra-supercritical technology, whose efficiency can be up to 50%. The main objective of the work is to enhance the efficiency of existing subcritical power plants to compensate for increasing demand. To achieve this objective, statistical models of the boiler units, namely the economizer, drum and superheater, are first developed. The effectiveness of the developed models is tested using analysis methods such as R2 analysis and ANOVA (Analysis of Variance). The dependence of the process variable (temperature) on different manipulated variables is analyzed in the paper. Validations of the models are provided with their error analysis. Response surface methodology (RSM) supported by DOE (design of experiments) is implemented to optimize the operating parameters. Individual models along with the integrated model are used to study and design predictive control of the coal-fired thermal power plant. Keywords: Chemical engineering, Applied mathematics

  18. Statistical modeling of an integrated boiler for coal fired thermal power plant.

    Chandrasekharan, Sreepradha; Panda, Rames Chandra; Swaminathan, Bhuvaneswari Natrajan

    2017-06-01

    Coal-fired thermal power plants play a major role in power production worldwide, as coal is available in abundance. Many of the existing power plants are based on subcritical technology, which can produce power with an efficiency of around 33%. Newer plants are built on either supercritical or ultra-supercritical technology, whose efficiency can be up to 50%. The main objective of the work is to enhance the efficiency of existing subcritical power plants to compensate for increasing demand. To achieve this objective, statistical models of the boiler units, namely the economizer, drum and superheater, are first developed. The effectiveness of the developed models is tested using analysis methods such as R2 analysis and ANOVA (Analysis of Variance). The dependence of the process variable (temperature) on different manipulated variables is analyzed in the paper. Validations of the models are provided with their error analysis. Response surface methodology (RSM) supported by DOE (design of experiments) is implemented to optimize the operating parameters. Individual models along with the integrated model are used to study and design predictive control of the coal-fired thermal power plant.

  19. Time integration and statistical regulation applied to mobile objects detection in a sequence of images

    Letang, Jean-Michel

    1993-01-01

    This PhD thesis deals with the detection of moving objects in monocular image sequences. The first section presents the problems inherent in motion analysis in real applications. We propose a method robust to perturbations frequently encountered during acquisition of outdoor scenes. Three main directions for investigation emerge, all of them pointing out the importance of the temporal axis, which is a specific dimension for motion analysis. In the first part, the image sequence is considered as a set of temporal signals. A temporal multi-scale decomposition enables the characterization of the various dynamical behaviors of the objects present in the scene at a given instant. A second module integrates motion information. This elementary trajectography of moving objects provides a temporal prediction map, giving a confidence level for the presence of motion. Interactions between both sets of data are expressed within a statistical regularization framework. Markov random field models supply a formal framework to convey a priori knowledge of the primitives to be evaluated. A calibration method with qualitative boxes is presented to estimate the model parameters. Our approach requires only simple computations and leads to a rather fast algorithm, which we evaluate in the last section on various typical sequences. (author) [fr]

  20. An integrated artificial neural networks approach for predicting global radiation

    Azadeh, A.; Maghsoudi, A.; Sohrabkhani, S.

    2009-01-01

    This article presents an integrated artificial neural network (ANN) approach for predicting solar global radiation from climatological variables. The integrated ANN trains and tests data with a multilayer perceptron (MLP) approach, which has the lowest mean absolute percentage error (MAPE). The proposed approach is particularly useful for locations where no measurement equipment is available. It also considers all related climatological and meteorological parameters as input variables. To show the applicability and superiority of the integrated ANN approach, monthly data were collected over 6 years (1995-2000) in six cities in Iran. A separate model for each city is considered and the quantity of solar global radiation in each city is calculated. Furthermore, an integrated ANN model is introduced for the prediction of solar global radiation. The results of the integrated model show a high accuracy of about 94%. The results of the integrated model have been compared with the traditional Angström model to demonstrate its considerable accuracy. Therefore, the proposed approach can be used as an efficient tool for the prediction of solar radiation at remote and rural locations with no direct measurement equipment.
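The model-selection criterion cited above, mean absolute percentage error, is straightforward to compute. A sketch with hypothetical monthly radiation values (the numbers are invented for illustration):

```python
def mape(actual, predicted):
    # Mean absolute percentage error, expressed in percent.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical monthly global radiation (kWh/m^2) vs. ANN estimates.
observed = [150.0, 200.0, 180.0]
estimated = [135.0, 210.0, 180.0]
print(round(mape(observed, estimated), 2))  # → 5.0
```

Comparing MAPE across candidate architectures is how a configuration like the MLP above would be selected.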

  1. A Statistical Approach for Gain Bandwidth Prediction of Phoenix-Cell Based Reflectarrays

    Hassan Salti

    2018-01-01

    Full Text Available A new statistical approach to predicting the gain bandwidth of Phoenix-cell based reflectarrays is proposed. It combines the effects of the two main factors that limit the bandwidth of reflectarrays: spatial phase delays and the intrinsic bandwidth of the radiating cells. As an illustration, the proposed approach is successfully applied to two reflectarrays based on new Phoenix cells.

  2. Statistical model predictions for p+p and Pb+Pb collisions at LHC

    Kraus, I.; Cleymans, J.; Oeschler, H.; Redlich, K.; Wheaton, S.

    2009-01-01

    Particle production in p+p and central Pb+Pb collisions at LHC is discussed in the context of the statistical thermal model. For heavy-ion collisions, predictions of various particle ratios are presented. The sensitivity of several ratios to the temperature and the baryon chemical potential is studied in

  3. Statistical prediction of biomethane potentials based on the composition of lignocellulosic biomass

    Thomsen, Sune Tjalfe; Spliid, Henrik; Østergård, Hanne

    2014-01-01

    Mixture models are introduced as a new and stronger methodology for statistical prediction of biomethane potentials (BPM) from lignocellulosic biomass compared to the linear regression models previously used. A large dataset from literature combined with our own data were analysed using canonical...

  4. Statistical Analysis of a Method to Predict Drug-Polymer Miscibility

    Knopp, Matthias Manne; Olesen, Niels Erik; Huang, Yanbin

    2016-01-01

    In this study, a method proposed to predict drug-polymer miscibility from differential scanning calorimetry measurements was subjected to statistical analysis. The method is relatively fast and inexpensive and has gained popularity as a result of the increasing interest in the formulation of drug...... as provided in this study. © 2015 Wiley Periodicals, Inc. and the American Pharmacists Association J Pharm Sci....

  5. Statistical Analysis of CFD Solutions from the Fourth AIAA Drag Prediction Workshop

    Morrison, Joseph H.

    2010-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from the U.S., Europe, Asia, and Russia using a variety of grid systems and turbulence models for the June 2009 4th Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was a new subsonic transport model, the Common Research Model, designed using a modern approach for the wing and included a horizontal tail. The fourth workshop focused on the prediction of both absolute and incremental drag levels for wing-body and wing-body-horizontal tail configurations. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with earlier workshops using the statistical framework.

  6. Comparison of four statistical and machine learning methods for crash severity prediction.

    Iranitalab, Amirfarrokh; Khattak, Aemal

    2017-11-01

    Crash severity prediction models enable different agencies to predict the severity of a reported crash with unknown severity or the severity of crashes that may be expected to occur sometime in the future. This paper had three main objectives: comparison of the performance of four statistical and machine learning methods, including Multinomial Logit (MNL), Nearest Neighbor Classification (NNC), Support Vector Machines (SVM) and Random Forests (RF), in predicting traffic crash severity; developing a crash-costs-based approach for comparison of crash severity prediction methods; and investigating the effects of data clustering methods, comprising K-means Clustering (KC) and Latent Class Clustering (LCC), on the performance of crash severity prediction models. The 2012-2015 reported crash data from Nebraska, United States was obtained and two-vehicle crashes were extracted as the analysis data. The dataset was split into training/estimation (2012-2014) and validation (2015) subsets. The four prediction methods were trained/estimated using the training/estimation dataset, and the correct prediction rates for each crash severity level, the overall correct prediction rate and a proposed crash-costs-based accuracy measure were obtained for the validation dataset. The correct prediction rates and the proposed approach showed that NNC had the best prediction performance overall and for more severe crashes. RF and SVM had the next best performances, and MNL was the weakest method. Data clustering did not affect the prediction results of SVM, but KC improved the prediction performance of MNL, NNC and RF, while LCC caused improvement in MNL and RF but weakened the performance of NNC. The overall correct prediction rate gave almost exactly the opposite results compared to the proposed approach, showing that neglecting crash costs can lead to misjudgment in choosing the right prediction method. Copyright © 2017 Elsevier Ltd. All rights reserved.
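The abstract does not specify the crash-costs-based accuracy measure exactly; one plausible form weights correct predictions by the societal cost of the true severity level, so that missing a fatal crash hurts more than missing a property-damage-only (PDO) crash. The severity labels and cost figures below are arbitrary illustrations:

```python
def cost_weighted_accuracy(y_true, y_pred, cost):
    # Share of total crash cost attached to correctly classified crashes.
    total = sum(cost[t] for t in y_true)
    recovered = sum(cost[t] for t, p in zip(y_true, y_pred) if t == p)
    return recovered / total

costs = {"fatal": 100.0, "injury": 10.0, "pdo": 1.0}  # hypothetical cost units
y_true = ["fatal", "injury", "pdo", "pdo"]
y_pred = ["fatal", "pdo", "pdo", "pdo"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain, cost_weighted_accuracy(y_true, y_pred, costs))  # → 0.75 0.9107142857142857
```

The two numbers diverge whenever errors concentrate on cheap or expensive severity levels, which is precisely why the paper finds the plain rate and the cost-based measure can rank methods oppositely.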

  7. The Integrated Medical Model: Statistical Forecasting of Risks to Crew Health and Mission Success

    Fitts, M. A.; Kerstman, E.; Butler, D. J.; Walton, M. E.; Minard, C. G.; Saile, L. G.; Toy, S.; Myers, J.

    2008-01-01

    The Integrated Medical Model (IMM) helps capture and use organizational knowledge across the space medicine, training, operations, engineering, and research domains. The IMM uses this domain knowledge in the context of a mission and crew profile to forecast crew health and mission success risks. The IMM is most helpful in comparing the risk of two or more mission profiles, not as a tool for predicting absolute risk. The process of building the IMM adheres to Probability Risk Assessment (PRA) techniques described in NASA Procedural Requirement (NPR) 8705.5, and uses current evidence-based information to establish a defensible position for making decisions that help ensure crew health and mission success. The IMM quantitatively describes the following input parameters: 1) medical conditions and likelihood, 2) mission duration, 3) vehicle environment, 4) crew attributes (e.g. age, sex), 5) crew activities (e.g. EVA's, Lunar excursions), 6) diagnosis and treatment protocols (e.g. medical equipment, consumables pharmaceuticals), and 7) Crew Medical Officer (CMO) training effectiveness. It is worth reiterating that the IMM uses the data sets above as inputs. Many other risk management efforts stop at determining only likelihood. The IMM is unique in that it models not only likelihood, but risk mitigations, as well as subsequent clinical outcomes based on those mitigations. Once the mathematical relationships among the above parameters are established, the IMM uses a Monte Carlo simulation technique (a random sampling of the inputs as described by their statistical distribution) to determine the probable outcomes. Because the IMM is a stochastic model (i.e. the input parameters are represented by various statistical distributions depending on the data type), when the mission is simulated 10-50,000 times with a given set of medical capabilities (risk mitigations), a prediction of the most probable outcomes can be generated. 
For each mission, the IMM tracks which conditions
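
    The Monte Carlo step described above can be sketched in a few lines. This is a hypothetical toy, not the IMM itself: the condition set, incidence rates, and treatment-success probabilities below are invented for illustration.

```python
import random

def simulate_missions(n_missions, conditions, seed=0):
    """Crude Monte Carlo sketch: each condition has an incidence
    probability per mission and a probability that onboard treatment
    resolves it; an unresolved condition ends the mission early."""
    rng = random.Random(seed)
    early_endings = 0
    for _ in range(n_missions):
        for p_incidence, p_treated in conditions:
            if rng.random() < p_incidence and rng.random() >= p_treated:
                early_endings += 1
                break  # mission already compromised
    return early_endings / n_missions

# Hypothetical condition set: (incidence per mission, treatment success)
conditions = [(0.10, 0.90), (0.02, 0.50)]
risk = simulate_missions(50_000, conditions)
```

    Rerunning with a different capability set (different treatment-success probabilities) and comparing the two risk estimates mirrors the IMM's intended use: comparing mission profiles rather than predicting absolute risk.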

  8. Development of an Integrated Moisture Index for predicting species composition

    Louis R. Iverson; Charles T. Scott; Martin E. Dale; Anantha Prasad

    1996-01-01

    A geographic information system (GIS) approach was used to develop an Integrated Moisture Index (IMI), which was used to predict species composition for Ohio forests. Several landscape features (a slope-aspect shading index, cumulative flow of water downslope, curvature of the landscape, and the water-holding capacity of the soil) were derived from elevation and soils...
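
    The slope-aspect shading feature of an IMI-style index can be illustrated with the standard hillshade formula. The sun altitude and azimuth defaults below are illustrative assumptions, not values from the study.

```python
import math

def shading_index(slope_deg, aspect_deg, sun_alt_deg=45.0, sun_az_deg=225.0):
    """Slope-aspect shading (hillshade) in [0, 1]: one of several
    landscape features an IMI-style index combines; 1 = fully lit."""
    zenith = math.radians(90.0 - sun_alt_deg)
    slope = math.radians(slope_deg)
    az_diff = math.radians(sun_az_deg - aspect_deg)
    s = (math.cos(zenith) * math.cos(slope)
         + math.sin(zenith) * math.sin(slope) * math.cos(az_diff))
    return max(0.0, s)  # clamp self-shadowed cells to 0

# A flat cell is lit by cos(zenith) regardless of aspect:
flat = shading_index(0.0, 0.0)
```

    With the afternoon sun in the southwest, a southwest-facing slope scores higher (drier) than a northeast-facing one, which is the moisture contrast the IMI exploits.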

  9. Output from Statistical Predictive Models as Input to eLearning Dashboards

    Marlene A. Smith

    2015-06-01

    We describe how statistical predictive models might play an expanded role in educational analytics by giving students automated, real-time information about what their current performance means for eventual success in eLearning environments. We discuss how an online messaging system might tailor information to individual students using predictive analytics. The proposed system would be data-driven and quantitative; e.g., a message might furnish the probability that a student will successfully complete the certificate requirements of a massive open online course. Repeated messages would prod underperforming students and alert instructors to those in need of intervention. Administrators responsible for accreditation or outcomes assessment would have ready documentation of learning outcomes and actions taken to address unsatisfactory student performance. The article’s brief introduction to statistical predictive models sets the stage for a description of the messaging system. Resources and methods needed to develop and implement the system are discussed.
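
    A minimal sketch of the kind of message such a system could emit, assuming a hypothetical fitted logistic regression; the intercept, coefficients, features, and thresholds below are invented for illustration:

```python
import math

def completion_probability(intercept, coefs, features):
    """Logistic-regression score -> probability, the kind of quantity
    the proposed dashboard would push to a student."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

def dashboard_message(p):
    """Turn a probability into a tailored student-facing message."""
    if p < 0.5:
        return f"At-risk: estimated {p:.0%} chance of completing the certificate."
    return f"On track: estimated {p:.0%} chance of completing the certificate."

# Hypothetical fitted coefficients: quiz average and logins per week
p = completion_probability(-3.0, [4.0, 0.5], [0.9, 2.0])
msg = dashboard_message(p)
```

    The same probability could simultaneously drive an instructor alert and feed an outcomes-assessment log, as the abstract proposes.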

  10. THE INTEGRATED SHORT-TERM STATISTICAL SURVEYS: EXPERIENCE OF NBS IN MOLDOVA

    Oleg CARA

    2012-07-01

    The users’ rising need for relevant, reliable, coherent and timely data for the early diagnosis of economic vulnerability and of turning points in business cycles, especially during a financial and economic crisis, demands a prompt, coordinated answer from statistical institutions. High quality short-term statistics are of special interest for emerging market economies, such as Moldova’s, which are extremely vulnerable when facing economic recession. Responding to the challenge of producing a coherent and adequate image of economic activity, using the system of indicators and definitions efficiently applied at the level of the European Union, the National Bureau of Statistics (NBS) of the Republic of Moldova has launched the development of an integrated system of short-term statistics (STS) based on advanced international experience. Thus, in 2011, NBS implemented the integrated statistical survey on STS based on consistent concepts harmonized with EU standards. The integration of the production processes, which were previously separated, is based on a common technical infrastructure and on standardized procedures and techniques for data production. The achievement of this complex survey with a holistic approach has allowed the consolidation of statistical data quality, comparable at the European level, and a significant reduction of the information burden on business units, especially those of small size. The reform of STS based on the integrated survey has been possible thanks to the consistent methodological and practical support given to NBS by the National Institute of Statistics (INS) of Romania, for which we thank our Romanian colleagues.

  11. Comparison of statistical and clinical predictions of functional outcome after ischemic stroke.

    Douglas D Thompson

    To determine whether predictions of functional outcome after ischemic stroke made at the bedside using a doctor's clinical experience were more or less accurate than predictions made by clinical prediction models (CPMs), we conducted a prospective cohort study of 931 ischemic stroke patients recruited consecutively at the outpatient, inpatient and emergency departments of the Western General Hospital, Edinburgh between 2002 and 2005. Doctors made informal predictions of six-month functional outcome on the Oxford Handicap Scale (OHS). Patients were followed up at six months with a validated postal questionnaire. For each patient we calculated the absolute predicted risk of death or dependence (OHS≥3) using five previously described CPMs. The specificity of a doctor's informal predictions of OHS≥3 at six months was good (0.96, 95% CI: 0.94 to 0.97) and similar to that of the CPMs (range 0.94 to 0.96); however, the sensitivity of both informal clinical predictions (0.44, 95% CI: 0.39 to 0.49) and clinical prediction models (range 0.38 to 0.45) was poor. The prediction of the level of disability after stroke was similar for informal clinical predictions (ordinal c-statistic 0.74, 95% CI 0.72 to 0.76) and CPMs (range 0.69 to 0.75). No patient or clinician characteristic affected the accuracy of informal predictions, though predictions were more accurate in outpatients. CPMs are at least as good as informal clinical predictions in discriminating between good and bad functional outcome after ischemic stroke. The place of these models in clinical practice has yet to be determined.
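
    The sensitivity and specificity figures quoted above come from simple confusion-matrix counts, which can be sketched as follows (toy labels, not the study's data):

```python
def sens_spec(y_true, y_pred):
    """Sensitivity and specificity for binary outcomes
    (1 = poor outcome, e.g. OHS >= 3; 0 = good outcome)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: 2 of 4 poor outcomes detected, all good outcomes correct
sens, spec = sens_spec([1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0])
```

    The study's pattern (specificity ~0.96, sensitivity ~0.44) means predictions of a poor outcome were rarely wrong, but many poor outcomes were missed.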

  12. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc, at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8, and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, respectively, when estimated with a normalized product measure of the empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40%, and five were predicted by M8-MSc in 13%, of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res. 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier
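
    The "beyond 99%" significance claim can be reproduced from the figures in the abstract under a simple binomial null model, a common reading of such alarm-based tests; this is a sketch, not necessarily the authors' exact procedure:

```python
from math import comb

def confidence(n_events, n_hits, alarm_fraction):
    """Confidence that the predictor beats random guessing, under a
    binomial null where each event independently falls inside the
    alarm volume with probability equal to the alarm fraction."""
    p_chance = sum(comb(n_events, k)
                   * alarm_fraction**k * (1 - alarm_fraction)**(n_events - k)
                   for k in range(n_hits, n_events + 1))
    return 1.0 - p_chance

# Figures from the record: M8 predicted 5/5 events in 36% of the
# space-time volume; MSc predicted 4/5 in 18%.
c_m8 = confidence(5, 5, 0.36)
c_msc = confidence(5, 4, 0.18)
```

    Both confidences exceed 0.99, matching the abstract's statement for the magnitude-8 test.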

  13. Comparison of classical statistical methods and artificial neural network in traffic noise prediction

    Nedic, Vladimir; Despotovic, Danijela; Cvetanovic, Slobodan; Despotovic, Milan; Babic, Sasa

    2014-01-01

    Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. The structure of the traffic flow and the average speed of the traffic flow are chosen as input variables of the neural network. The output variable of the network is the equivalent noise level in the given time period, Leq. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise, using an originally developed user-friendly software package. It is shown that artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate the equivalent noise level by means of classical methods, and a comparative analysis is given. The results clearly show that the ANN approach is superior to any other statistical method in traffic noise level prediction. - Highlights: • We proposed an ANN model for prediction of traffic noise. • We developed an originally designed user-friendly software package. • The results are compared with classical statistical methods. • The ANN model shows much better predictive capabilities.
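
    The target quantity Leq is an energy average of sound levels, not an arithmetic one. A minimal sketch of its computation for equally spaced samples:

```python
import math

def leq(levels_db):
    """Equivalent continuous sound level Leq over equally spaced
    samples: energy-average the levels, then convert back to dB."""
    mean_energy = sum(10 ** (L / 10.0) for L in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)

# A constant level is its own Leq; a mix is dominated by loud samples
steady = leq([70.0, 70.0, 70.0])
noisy = leq([60.0, 60.0, 80.0])
```

    The second example illustrates why regression on raw dB values struggles: one loud interval pulls Leq far above the arithmetic mean of the samples.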

  14. BetaTPred: prediction of beta-TURNS in a protein using statistical algorithms.

    Kaur, Harpreet; Raghava, G P S

    2002-03-01

    beta-turns play an important role from a structural and functional point of view. They are the most common type of non-repetitive structure in proteins and comprise, on average, 25% of the residues. In the past, numerous methods have been developed to predict beta-turns in a protein, most of them based on statistical approaches. In order to utilize the full potential of these methods, there is a need to develop a web server. This paper describes a web server called BetaTPred, developed for predicting beta-turns in a protein from its amino acid sequence. BetaTPred allows the user to predict turns in a protein using existing statistical algorithms. It also allows the user to predict different types of beta-turns, e.g. type I, I', II, II', VI, VIII and non-specific. This server assists users in predicting the consensus beta-turns in a protein. The server is accessible from http://imtech.res.in/raghava/betatpred/

  15. Comparison of classical statistical methods and artificial neural network in traffic noise prediction

    Nedic, Vladimir, E-mail: vnedic@kg.ac.rs [Faculty of Philology and Arts, University of Kragujevac, Jovana Cvijića bb, 34000 Kragujevac (Serbia); Despotovic, Danijela, E-mail: ddespotovic@kg.ac.rs [Faculty of Economics, University of Kragujevac, Djure Pucara Starog 3, 34000 Kragujevac (Serbia); Cvetanovic, Slobodan, E-mail: slobodan.cvetanovic@eknfak.ni.ac.rs [Faculty of Economics, University of Niš, Trg kralja Aleksandra Ujedinitelja, 18000 Niš (Serbia); Despotovic, Milan, E-mail: mdespotovic@kg.ac.rs [Faculty of Engineering, University of Kragujevac, Sestre Janjic 6, 34000 Kragujevac (Serbia); Babic, Sasa, E-mail: babicsf@yahoo.com [College of Applied Mechanical Engineering, Trstenik (Serbia)

    2014-11-15

    Traffic is the main source of noise in urban environments and significantly affects human mental and physical health and labor productivity. Therefore it is very important to model the noise produced by various vehicles. Techniques for traffic noise prediction are mainly based on regression analysis, which generally is not good enough to describe the trends of noise. In this paper the application of artificial neural networks (ANNs) for the prediction of traffic noise is presented. The structure of the traffic flow and the average speed of the traffic flow are chosen as input variables of the neural network. The output variable of the network is the equivalent noise level in the given time period, Leq. Based on these parameters, the network is modeled, trained and tested through a comparative analysis of the calculated values and measured levels of traffic noise, using an originally developed user-friendly software package. It is shown that artificial neural networks can be a useful tool for the prediction of noise with sufficient accuracy. In addition, the measured values were also used to calculate the equivalent noise level by means of classical methods, and a comparative analysis is given. The results clearly show that the ANN approach is superior to any other statistical method in traffic noise level prediction. - Highlights: • We proposed an ANN model for prediction of traffic noise. • We developed an originally designed user-friendly software package. • The results are compared with classical statistical methods. • The ANN model shows much better predictive capabilities.

  16. Symbol recognition via statistical integration of pixel-level constraint histograms: a new descriptor.

    Yang, Su

    2005-02-01

    A new descriptor for symbol recognition is proposed. 1) A histogram is constructed for every pixel to capture the distribution of the constraints among the other pixels. 2) All the histograms are statistically integrated to form a feature vector of fixed dimension. Robustness and invariance were experimentally confirmed.

  17. Integrating source-language context into phrase-based statistical machine translation

    Haque, R.; Kumar Naskar, S.; Bosch, A.P.J. van den; Way, A.

    2011-01-01

    The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling

  18. Feasibility of real-time calculation of correlation integral derived statistics applied to EEG time series

    van den Broek, PLC; van Egmond, J; van Rijn, CM; Takens, F; Coenen, AML; Booij, LHDJ

    2005-01-01

    Background: This study assessed the feasibility of online calculation of the correlation integral (C(r)) aiming to apply C(r)-derived statistics. For real-time application it is important to reduce calculation time. It is shown how our method works for EEG time series. Methods: To achieve online

  19. Feasibility of real-time calculation of correlation integral derived statistics applied to EEG time series

    Broek, P.L.C. van den; Egmond, J. van; Rijn, C.M. van; Takens, F.; Coenen, A.M.L.; Booij, L.H.D.J.

    2005-01-01

    This study assessed the feasibility of online calculation of the correlation integral (C(r)) aiming to apply C(r)-derived statistics. For real-time application it is important to reduce calculation time. It is shown how our method works for EEG time series. Methods: To achieve online calculation of

  20. Demonstrating the Effectiveness of an Integrated and Intensive Research Methods and Statistics Course Sequence

    Pliske, Rebecca M.; Caldwell, Tracy L.; Calin-Jageman, Robert J.; Taylor-Ritzler, Tina

    2015-01-01

    We developed a two-semester series of intensive (six-contact hours per week) behavioral research methods courses with an integrated statistics curriculum. Our approach includes the use of team-based learning, authentic projects, and Excel and SPSS. We assessed the effectiveness of our approach by examining our students' content area scores on the…

  1. Using the Student Research Project to Integrate Macroeconomics and Statistics in an Advanced Cost Accounting Course

    Hassan, Mahamood M.; Schwartz, Bill N.

    2014-01-01

    This paper discusses a student research project that is part of an advanced cost accounting class. The project emphasizes active learning, integrates cost accounting with macroeconomics and statistics by "learning by doing" using real world data. Students analyze sales data for a publicly listed company by focusing on the company's…

  2. Translating visual information into action predictions: Statistical learning in action and nonaction contexts.

    Monroy, Claire D; Gerson, Sarah A; Hunnius, Sabine

    2018-05-01

    Humans are sensitive to the statistical regularities in action sequences carried out by others. In the present eyetracking study, we investigated whether this sensitivity can support the prediction of upcoming actions when observing unfamiliar action sequences. In two between-subjects conditions, we examined whether observers would be more sensitive to statistical regularities in sequences performed by a human agent versus self-propelled 'ghost' events. Secondly, we investigated whether regularities are learned better when they are associated with contingent effects. Both implicit and explicit measures of learning were compared between agent and ghost conditions. Implicit learning was measured via predictive eye movements to upcoming actions or events, and explicit learning was measured via both uninstructed reproduction of the action sequences and verbal reports of the regularities. The findings revealed that participants, regardless of condition, readily learned the regularities and made correct predictive eye movements to upcoming events during online observation. However, different patterns of explicit-learning outcomes emerged following observation: Participants were most likely to re-create the sequence regularities and to verbally report them when they had observed an actor create a contingent effect. These results suggest that the shift from implicit predictions to explicit knowledge of what has been learned is facilitated when observers perceive another agent's actions and when these actions cause effects. These findings are discussed with respect to the potential role of the motor system in modulating how statistical regularities are learned and used to modify behavior.

  3. Using Patient Demographics and Statistical Modeling to Predict Knee Tibia Component Sizing in Total Knee Arthroplasty.

    Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James

    2018-06-01

    Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size to a very close range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected the data of 1992 primary Persona Knee System TKA procedures. Of them, 199 procedures were randomly selected as testing data and the rest of the data were randomly partitioned between model training data and model evaluation data with a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size within 1 size difference 96% of the time on the validation data, 94% of the time on the testing data, and 92% on a prospective cadaver data set. The study results indicated the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows statistical modeling may be used with radiographs to dramatically enhance the templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
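
    The headline "within 1 size" figures are a tolerance-based accuracy over ordinal size codes. A minimal sketch of that metric (toy size codes, not study data):

```python
def within_k_accuracy(actual_sizes, predicted_sizes, k=1):
    """Fraction of cases where the predicted component size is within
    k catalog sizes of the implanted size (the record's headline metric)."""
    hits = sum(1 for a, p in zip(actual_sizes, predicted_sizes)
               if abs(a - p) <= k)
    return hits / len(actual_sizes)

# Toy data: tibia sizes coded as ordinal integers
acc = within_k_accuracy([3, 4, 5, 6, 7], [3, 5, 5, 4, 7], k=1)
```

    Reporting accuracy "within 1 size" rather than exact-match accuracy reflects surgical practice, where adjacent sizes are often both viable during templating.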

  4. Statistical approach to predict compressive strength of high workability slag-cement mortars

    Memon, N.A.; Memon, N.A.; Sumadi, S.R.

    2009-01-01

    This paper reports an attempt made to develop empirical expressions to estimate/predict the compressive strength of high workability slag-cement mortars. Experimental data from 54 mortar mixes were used. The mortars were prepared with slag as cement replacement of the order of 0, 50 and 60%. The flow (workability) was maintained at 136±3%. The numerical and statistical analysis was performed using the database computer software Microsoft Office Excel 2003. Three empirical mathematical models were developed to estimate/predict the 28-day compressive strength of high workability slag-cement mortars with 0, 50 and 60% slag, which predict the values with an accuracy between 97 and 98%. Finally, a generalized empirical mathematical model was proposed which can predict the 28-day compressive strength of high workability mortars with an accuracy of up to 95%. (author)

  5. Comparison and validation of statistical methods for predicting power outage durations in the event of hurricanes.

    Nateghi, Roshanak; Guikema, Seth D; Quiring, Steven M

    2011-12-01

    This article compares statistical methods for modeling power outage durations during hurricanes and examines the predictive accuracy of these methods. Being able to make accurate predictions of power outage durations is valuable because the information can be used by utility companies to plan their restoration efforts more efficiently. This information can also help inform customers and public agencies of the expected outage times, enabling better collective response planning, and coordination of restoration efforts for other critical infrastructures that depend on electricity. In the long run, outage duration estimates for future storm scenarios may help utilities and public agencies better allocate risk management resources to balance the disruption from hurricanes with the cost of hardening power systems. We compare the out-of-sample predictive accuracy of five distinct statistical models for estimating power outage duration times caused by Hurricane Ivan in 2004. The methods compared include both regression models (accelerated failure time (AFT) and Cox proportional hazard models (Cox PH)) and data mining techniques (regression trees, Bayesian additive regression trees (BART), and multivariate additive regression splines). We then validate our models against two other hurricanes. Our results indicate that BART yields the best prediction accuracy and that it is possible to predict outage durations with reasonable accuracy. © 2011 Society for Risk Analysis.
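
    The out-of-sample comparison amounts to scoring each fitted model on a held-out storm and ranking by error. A sketch with invented stand-in models; the paper's actual models (AFT, Cox PH, BART, etc.) and data are not reproduced here:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rank_models(models, X_holdout, y_holdout):
    """Rank fitted models (name -> predict function) by out-of-sample
    mean absolute error on a held-out storm, lowest error first."""
    scores = {name: mean_absolute_error(y_holdout, [f(x) for x in X_holdout])
              for name, f in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Toy stand-ins for fitted duration models (hours vs. wind speed)
models = {"baseline": lambda x: 24.0, "linear": lambda x: 0.5 * x}
ranking = rank_models(models, [40, 60, 80], [20.0, 30.0, 40.0])
```

    Validating against storms not used in fitting, as the authors do with two additional hurricanes, guards against rankings that merely reflect overfitting to one event.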

  6. Comparison of accuracy in predicting emotional instability from MMPI data: fisherian versus contingent probability statistics

    Berghausen, P.E. Jr.; Mathews, T.W.

    1987-01-01

    The security plans of nuclear power plants generally require that all personnel who are to have access to protected areas or vital islands be screened for emotional stability. In virtually all instances, the screening involves the administration of one or more psychological tests, usually including the Minnesota Multiphasic Personality Inventory (MMPI). At some plants, all employees receive a structured clinical interview after they have taken the MMPI and results have been obtained. At other plants, only those employees with "dirty" MMPIs are interviewed. This latter protocol is referred to as interviews by exception. Behaviordyne Psychological Corp. has succeeded in removing some of the uncertainty associated with interview-by-exception protocols by developing an empirically based, predictive equation. This equation permits utility companies to make informed choices regarding the risks they are assuming. A conceptual problem exists with the predictive equation, however. Like most predictive equations currently in use, it is based on Fisherian statistics, involving least-squares analyses. Consequently, Behaviordyne Psychological Corp., in conjunction with T.W. Mathews and Associates, has developed a second predictive equation, one based on contingent probability statistics. The particular technique used is the multi-contingent analysis of probability systems (MAPS) approach. The present paper presents a comparison of the predictive accuracy of the two equations: the one derived using Fisherian techniques versus the one using contingent probability techniques.

  7. Comparison of accuracy in predicting emotional instability from MMPI data: fisherian versus contingent probability statistics

    Berghausen, P.E. Jr.; Mathews, T.W.

    1987-01-01

    The security plans of nuclear power plants generally require that all personnel who are to have access to protected areas or vital islands be screened for emotional stability. In virtually all instances, the screening involves the administration of one or more psychological tests, usually including the Minnesota Multiphasic Personality Inventory (MMPI). At some plants, all employees receive a structured clinical interview after they have taken the MMPI and results have been obtained. At other plants, only those employees with "dirty" MMPIs are interviewed. This latter protocol is referred to as interviews by exception. Behaviordyne Psychological Corp. has succeeded in removing some of the uncertainty associated with interview-by-exception protocols by developing an empirically based, predictive equation. This equation permits utility companies to make informed choices regarding the risks they are assuming. A conceptual problem exists with the predictive equation, however. Like most predictive equations currently in use, it is based on Fisherian statistics, involving least-squares analyses. Consequently, Behaviordyne Psychological Corp., in conjunction with T.W. Mathews and Associates, has developed a second predictive equation, one based on contingent probability statistics. The particular technique used is the multi-contingent analysis of probability systems (MAPS) approach. The present paper presents a comparison of the predictive accuracy of the two equations: the one derived using Fisherian techniques versus the one using contingent probability techniques.

  8. Prediction of transmission loss through an aircraft sidewall using statistical energy analysis

    Ming, Ruisen; Sun, Jincai

    1989-06-01

    The transmission loss of randomly incident sound through an aircraft sidewall is investigated using statistical energy analysis. Formulas are also obtained for the simple calculation of sound transmission loss through single- and double-leaf panels. Both resonant and nonresonant sound transmissions can be easily calculated using the formulas. The formulas are used to predict sound transmission losses through a Y-7 propeller airplane panel. The panel measures 2.56 m x 1.38 m and has two windows. The agreement between predicted and measured values through most of the frequency ranges tested is quite good.
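
    For the nonresonant path through a single panel, SEA-style formulas reduce to the textbook field-incidence mass law. The sketch below shows that baseline only, under the usual assumptions (surface density in kg/m², frequency in Hz, well below coincidence), not the paper's full formulas:

```python
import math

def mass_law_tl(surface_density_kg_m2, freq_hz):
    """Field-incidence mass law for a single panel: the textbook
    nonresonant baseline that SEA refines with resonant paths."""
    return 20.0 * math.log10(surface_density_kg_m2 * freq_hz) - 47.0

# Doubling either mass or frequency adds ~6 dB of transmission loss
tl_1k = mass_law_tl(5.0, 1000.0)
tl_2k = mass_law_tl(5.0, 2000.0)
```

    The 6 dB/octave slope is the signature of mass-controlled transmission; deviations from it in measured data are what the resonant SEA terms account for.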

  9. Supervisory Model Predictive Control of the Heat Integrated Distillation Column

    Meyer, Kristian; Bisgaard, Thomas; Huusom, Jakob Kjøbsted

    2017-01-01

    This paper benchmarks a centralized control system based on model predictive control for the operation of the heat integrated distillation column (HIDiC) against a fully decentralized control system, using the most complete column model currently available in the literature. The centralized control system outperforms the decentralized system because it handles the interactions in the HIDiC process better. The integral absolute error (IAE) is reduced by a factor of 2 for control of the top composition and by a factor of 4 for the bottoms composition.

  10. Integrating geophysics and hydrology for reducing the uncertainty of groundwater model predictions and improved prediction performance

    Christensen, Nikolaj Kruse; Christensen, Steen; Ferre, Ty

    A major purpose of groundwater modeling is to help decision-makers in efforts to manage the natural environment. Increasingly, it is recognized that both the predictions of interest and their associated uncertainties should be quantified to support robust decision making. In particular, decision… The integration of geophysical data in the construction of a groundwater model increases the prediction performance. We suggest that modelers should perform a hydrogeophysical “test-bench” analysis of the likely value of geophysics data for improving groundwater model prediction performance before actually… and the resulting predictions can be compared with predictions from the ‘true’ model. By performing this analysis we expect to give the modeler insight into how the uncertainty of model-based prediction can be reduced.

  11. Statistical Analysis of CFD Solutions From the Fifth AIAA Drag Prediction Workshop

    Morrison, Joseph H.

    2013-01-01

    A graphical framework is used for statistical analysis of the results from an extensive N-version test of a collection of Reynolds-averaged Navier-Stokes computational fluid dynamics codes. The solutions were obtained by code developers and users from North America, Europe, Asia, and South America using a common grid sequence and multiple turbulence models for the June 2012 fifth Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration for this workshop was the Common Research Model subsonic transport wing-body previously used for the 4th Drag Prediction Workshop. This work continues the statistical analysis begun in the earlier workshops and compares the results from the grid convergence study of the most recent workshop with previous workshops.

  12. Statistical Model Predictions for p+p and Pb+Pb Collisions at LHC

    Kraus, I; Oeschler, H; Redlich, K; Wheaton, S

    2009-01-01

    Particle production in p+p and central Pb+Pb collisions at LHC is discussed in the context of the statistical thermal model. For heavy-ion collisions, predictions of various particle ratios are presented. The sensitivity of several ratios on the temperature and the baryon chemical potential is studied in detail, and some of them, which are particularly appropriate to determine the chemical freeze-out point experimentally, are indicated. Considering elementary interactions on the other hand, we focus on strangeness production and its possible suppression. Extrapolating the thermal parameters to LHC energy, we present predictions of the statistical model for particle yields in p+p collisions. We quantify the strangeness suppression by the correlation volume parameter and discuss its influence on particle production. We propose observables that can provide deeper insight into the mechanism of strangeness production and suppression at LHC.

  13. Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

    Qi, D.; Majda, A.

    2017-12-01

    A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent system and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. Besides, the reduced-order models are also used to capture crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with

  14. Integration of Predictive Display and Aircraft Flight Control System

    Efremov A.V.

    2017-01-01

    The synthesis of a predictive display and a direct lift control system is considered for path-control tracking tasks (in particular, the landing task). Both solutions are based on pilot-vehicle system analysis and on requirements to provide the highest accuracy and lowest pilot workload. The investigation was carried out for cases with and without time delay in the aircraft dynamics. The efficiency of both approaches to flying-qualities improvement, and of their integration, is tested by ground-based simulation.

  15. Integrating statistical machine learning in a semantic sensor web for proactive monitoring and control

    Adeleke, Jude Adekunle

    2017-04-01

    … in an indoor air quality monitoring case study. A sliding window approach that employs a Multilayer Perceptron model to predict short-term PM2.5 pollution situations is integrated into the proactive monitoring and control framework. Results show …
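    The sliding-window setup described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the window length and the sample readings are assumed, and the downstream MLP is omitted so only the dataset construction is shown.

    ```python
    # Build a sliding-window dataset for one-step-ahead prediction, as used
    # for short-term PM2.5 forecasting (window width and readings are
    # hypothetical examples, not values from the paper).
    def sliding_windows(series, width):
        """Return (inputs, targets): each input is `width` consecutive
        readings and the target is the reading that follows them."""
        X, y = [], []
        for i in range(len(series) - width):
            X.append(series[i:i + width])
            y.append(series[i + width])
        return X, y

    readings = [12.0, 15.0, 14.0, 18.0, 21.0, 19.0]  # e.g. hourly PM2.5
    X, y = sliding_windows(readings, width=3)
    ```

    Each (input, target) pair could then be fed to a regressor such as a multilayer perceptron.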

  16. Predicting Protein Function via Semantic Integration of Multiple Networks.

    Yu, Guoxian; Fu, Guangyuan; Wang, Jun; Zhu, Hailong

    2016-01-01

    Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapidly accumulating large volumes of proteomic and genomic data have driven the development of computational models for automatically predicting protein function at large scale. Recent approaches focus on integrating multiple heterogeneous data sources, and they often achieve better results than methods that use a single data source alone. In this paper, we investigate how to integrate multiple biological data sources with biological knowledge, i.e., the Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogeneous data sources. SimNet first utilizes GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on this similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of individual networks, and aligns the composite network with the kernel to obtain the weights assigned to individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experimental results on heterogeneous proteomic data sources for Yeast, Human, Mouse, and Fly show that SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab codes of SimNet are available at https://sites.google.com/site/guoxian85/simnet.
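    The core idea of weighting individual association networks by their agreement with a semantic kernel, then summing them into a composite network, can be sketched as below. This is a simplified stand-in, not SimNet itself: SimNet solves for the weights by aligning the composite network with the kernel, whereas here each network is weighted independently by a cosine-style alignment score.

    ```python
    # Combine several association networks (adjacency matrices) into one
    # composite network, weighting each by its Frobenius alignment with a
    # semantic kernel K. Matrices are plain nested lists.
    import math

    def frobenius_inner(A, B):
        return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

    def composite_network(networks, kernel):
        weights = []
        for A in networks:
            align = frobenius_inner(A, kernel) / (
                math.sqrt(frobenius_inner(A, A)) *
                math.sqrt(frobenius_inner(kernel, kernel)))
            weights.append(max(align, 0.0))   # drop anti-aligned networks
        total = sum(weights)
        weights = [w / total for w in weights]
        n = len(kernel)
        comp = [[sum(w * A[i][j] for w, A in zip(weights, networks))
                 for j in range(n)] for i in range(n)]
        return comp, weights

    # Toy example: two proteins; network A1 matches the kernel better.
    K = [[1.0, 0.9], [0.9, 1.0]]
    A1 = [[1.0, 1.0], [1.0, 1.0]]
    A2 = [[1.0, 0.0], [0.0, 1.0]]
    comp, weights = composite_network([A1, A2], K)
    ```

    A network-based classifier would then be run on `comp` rather than on any single source.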

  17. An integrated logistic formula for prediction of complications from radiosurgery

    Flickinger, J.C.

    1989-01-01

    An integrated logistic model for predicting the probability of complications when small volumes of tissue receive an inhomogeneous radiation dose is described. This model can be used with either an exponential or a linear-quadratic correction for dose per fraction and time. Both the exponential and linear-quadratic versions of this integrated logistic formula provide reasonable estimates of the tolerance of the brain to radiosurgical dose distributions in which small volumes of brain receive high radiation doses and larger volumes receive lower doses. This makes it possible to predict the probability of complications from stereotactic radiosurgery, as well as from combinations of fractionated large-volume irradiation with a radiosurgical boost. Complication probabilities predicted for single-fraction radiosurgery with the Leksell Gamma Unit using 4, 8, 14, and 18 mm diameter collimators, as well as for whole-brain irradiation combined with a radiosurgical boost, are presented. The exponential and linear-quadratic versions of the integrated logistic formula provide useful methods of calculating the probability of complications from radiosurgical treatment.
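    A hedged sketch of the general shape of an integrated logistic complication model is given below: each subvolume receiving a given dose contributes a logistic complication probability, and subvolume contributions are combined assuming independent failure. The functional form and the parameter values (d50, slope k) are illustrative assumptions, not Flickinger's fitted model or parameters.

    ```python
    # Integrated logistic complication probability over a dose-volume
    # histogram: P = 1 - prod_i (1 - p(d_i))^v_i, where v_i is the volume
    # fraction receiving dose d_i. Parameters below are illustrative only.
    import math

    def logistic_ncp(dose, d50, k):
        """Complication probability for uniform whole-volume irradiation."""
        return 1.0 / (1.0 + math.exp(k * (d50 - dose)))

    def integrated_ncp(dose_volume_pairs, d50, k):
        """Combine (dose, volume_fraction) bins into one probability."""
        survival = 1.0
        for dose, vfrac in dose_volume_pairs:
            survival *= (1.0 - logistic_ncp(dose, d50, k)) ** vfrac
        return 1.0 - survival

    # Uniform irradiation at d50 should give a 50% complication probability.
    p_uniform = integrated_ncp([(25.0, 1.0)], d50=25.0, k=0.4)
    ```

    For a radiosurgical boost, the dose-volume pairs would mix a small high-dose volume with a larger low-dose volume.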

  18. Integrated Computational Solution for Predicting Skin Sensitization Potential of Molecules.

    Konda Leela Sarath Kumar

    Skin sensitization forms a major toxicological endpoint for dermatology and cosmetic products. The recent ban on animal testing for cosmetics demands alternative methods. We developed an integrated computational solution (SkinSense) that offers a robust solution and addresses the limitations of existing computational tools, i.e., high false-positive rates and/or limited coverage. The key components of our solution include: QSAR models selected from a combinatorial set, similarity information and literature-derived sub-structure patterns of known skin protein reactive groups. Its prediction performance on a challenge set of molecules showed accuracy = 75.32%, CCR = 74.36%, sensitivity = 70.00% and specificity = 78.72%, which is better than several existing tools including VEGA (accuracy = 45.00% and CCR = 54.17% with 'High' reliability scoring), DEREK (accuracy = 72.73% and CCR = 71.44%) and TOPKAT (accuracy = 60.00% and CCR = 61.67%). Although TIMES-SS showed higher predictive power (accuracy = 90.00% and CCR = 92.86%), the coverage was very low (only 10 out of 77 molecules were predicted reliably). Owing to improved prediction performance and coverage, our solution can serve as a useful expert system towards Integrated Approaches to Testing and Assessment for skin sensitization. It would be invaluable to the cosmetic/dermatology industry for pre-screening their molecules, and reducing time, cost and animal testing.

  19. The statistical prediction of offshore winds from land-based data for wind-energy applications

    Walmsley, J.L.; Barthelmie, R.J.; Burrows, W.R.

    2001-01-01

    Land-based meteorological measurements at two locations on the Danish coast are used to predict offshore wind speeds. Offshore wind-speed data are used only for developing the statistical prediction algorithms and for verification. As a first step, the two datasets were separated into nine percentile-based bins, with a minimum of 30 data records in each bin. Next, the records were randomly selected with approximately 70% of the data in each bin being used as a training set for development of the prediction algorithms, and the remaining 30% being reserved as a test set for evaluation purposes. The binning procedure ensured that both training and test sets fairly represented the overall data distribution. To base the conclusions on firmer ground, five permutations of these training and test sets were created. Thus, all calculations were based on five cases, each one representing a different random …
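    The percentile-bin stratified split described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the bin count and training fraction follow the abstract, while the seed and the example records are assumptions.

    ```python
    # Stratified 70/30 split: records are binned by percentile, then each
    # bin is split so both training and test sets represent the full
    # distribution of the data.
    import random

    def stratified_split(records, n_bins=9, train_frac=0.7, seed=1):
        data = sorted(records)
        size = len(data)
        # equal-count percentile bins
        bins = [data[i * size // n_bins:(i + 1) * size // n_bins]
                for i in range(n_bins)]
        rng = random.Random(seed)
        train, test = [], []
        for b in bins:
            b = b[:]
            rng.shuffle(b)
            cut = int(round(train_frac * len(b)))
            train.extend(b[:cut])
            test.extend(b[cut:])
        return train, test

    train_set, test_set = stratified_split(list(range(90)))
    ```

    Repeating the split with different seeds yields the five permutations used to firm up the conclusions.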

  20. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

  1. Prediction of rotor blade-vortex interaction using Volterra integrals

    Wong, A.; Nitzsche, F. [Carleton Univ., Dept. of Mechanical and Aerospace Engineering, Ottawa, Ontario (Canada)]. E-mail: Fred_Nitzsche@carleton.ca; Khalid, M. [National Research Council Canada, Inst. for Aerospace Research, Ottawa, Ontario (Canada)

    2004-07-01

    The theory of Volterra integral equations for nonlinear systems is applied to the prediction of the nonlinear aerodynamic response of an NACA 0012 airfoil experiencing blade-vortex interaction. The phenomenon is first modeled in two dimensions using an Euler/Navier-Stokes code, and the resulting unsteady aerodynamic flow field sequences are appropriately combined to form a training dataset. The Volterra kernels are identified from the time-domain characteristics of the selected data, which are in turn used to predict the nonlinear aerodynamic response of the airfoil. The Volterra kernel based data are then compared against a standard airfoil response. The predicted lift time histories of the airfoil are shown to be in good agreement with the aerodynamic data. (author)
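    The discrete-time Volterra series underlying this kind of prediction can be sketched as below. The kernels here are specified by hand for illustration; in the paper they are identified from CFD training data rather than assumed.

    ```python
    # Truncated discrete Volterra series (first and second order):
    #   y[n] = sum_k h1[k] x[n-k]
    #        + sum_{k1,k2} h2[k1][k2] x[n-k1] x[n-k2]
    def volterra_response(x, h1, h2):
        M = len(h1)  # kernel memory length
        y = []
        for n in range(len(x)):
            past = [x[n - k] if n - k >= 0 else 0.0 for k in range(M)]
            lin = sum(h1[k] * past[k] for k in range(M))
            quad = sum(h2[k1][k2] * past[k1] * past[k2]
                       for k1 in range(M) for k2 in range(M))
            y.append(lin + quad)
        return y

    # Impulse input through hand-picked (illustrative) kernels.
    y = volterra_response([1.0, 0.0, 0.0],
                          h1=[0.5, 0.25],
                          h2=[[0.1, 0.0], [0.0, 0.0]])
    ```

    Identifying `h1` and `h2` from input/output training sequences is the kernel-identification step the abstract refers to.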


  3. Information trimming: Sufficient statistics, mutual information, and predictability from effective channel states

    James, Ryan G.; Mahoney, John R.; Crutchfield, James P.

    2017-06-01

    One of the most basic characterizations of the relationship between two random variables, X and Y , is the value of their mutual information. Unfortunately, calculating it analytically and estimating it empirically are often stymied by the extremely large dimension of the variables. One might hope to replace such a high-dimensional variable by a smaller one that preserves its relationship with the other. It is well known that either X (or Y ) can be replaced by its minimal sufficient statistic about Y (or X ) while preserving the mutual information. While intuitively reasonable, it is not obvious or straightforward that both variables can be replaced simultaneously. We demonstrate that this is in fact possible: the information X 's minimal sufficient statistic preserves about Y is exactly the information that Y 's minimal sufficient statistic preserves about X . We call this procedure information trimming. As an important corollary, we consider the case where one variable is a stochastic process' past and the other its future. In this case, the mutual information is the channel transmission rate between the channel's effective states. That is, the past-future mutual information (the excess entropy) is the amount of information about the future that can be predicted using the past. Translating our result about minimal sufficient statistics, this is equivalent to the mutual information between the forward- and reverse-time causal states of computational mechanics. We close by discussing multivariate extensions to this use of minimal sufficient statistics.
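    The sufficient-statistic property that the abstract builds on, that replacing X by a sufficient statistic T(X) about Y preserves I(X;Y), can be checked numerically in a toy discrete setting. The distribution below is an illustrative construction, not from the paper.

    ```python
    # Verify I(X;Y) = I(T(X);Y) for a sufficient statistic T in a toy case.
    import math
    from collections import Counter

    def mutual_information(pairs):
        """I(X;Y) in bits from a list of equally likely (x, y) samples."""
        n = len(pairs)
        pxy = Counter(pairs)
        px = Counter(x for x, _ in pairs)
        py = Counter(y for _, y in pairs)
        return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
                   for (x, y), c in pxy.items())

    # X uniform on {0,1,2,3}; Y depends on X only through its parity,
    # so T(X) = X mod 2 is a sufficient statistic for Y.
    samples = [(x, x % 2) for x in range(4)]
    i_xy = mutual_information(samples)
    i_ty = mutual_information([(x % 2, y) for x, y in samples])
    ```

    Here both quantities equal 1 bit, illustrating that the lower-dimensional statistic loses no predictive information.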

  4. Predicting axillary lymph node metastasis from kinetic statistics of DCE-MRI breast images

    Ashraf, Ahmed B.; Lin, Lilie; Gavenonis, Sara C.; Mies, Carolyn; Xanthopoulos, Eric; Kontos, Despina

    2012-03-01

    The presence of axillary lymph node metastases is the most important prognostic factor in breast cancer and can influence the selection of adjuvant therapy, both chemotherapy and radiotherapy. In this work we present a set of kinetic statistics derived from DCE-MRI for predicting axillary node status. Breast DCE-MRI images from 69 women with known nodal status were analyzed retrospectively under HIPAA and IRB approval. Axillary lymph nodes were positive in 12 patients while 57 patients had no axillary lymph node involvement. Kinetic curves for each pixel were computed and a pixel-wise map of time-to-peak (TTP) was obtained. Pixels were first partitioned according to the similarity of their kinetic behavior, based on TTP values. For every kinetic curve, the following pixel-wise features were computed: peak enhancement (PE), wash-in-slope (WIS), wash-out-slope (WOS). Partition-wise statistics for every feature map were calculated, resulting in a total of 21 kinetic statistic features. ANOVA analysis was done to select features that differ significantly between node positive and node negative women. Using the computed kinetic statistic features a leave-one-out SVM classifier was learned that performs with AUC=0.77 under the ROC curve, outperforming the conventional kinetic measures, including maximum peak enhancement (MPE) and signal enhancement ratio (SER), (AUCs of 0.61 and 0.57 respectively). These findings suggest that our DCE-MRI kinetic statistic features can be used to improve the prediction of axillary node status in breast cancer patients. Such features could ultimately be used as imaging biomarkers to guide personalized treatment choices for women diagnosed with breast cancer.
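    The pixel-wise kinetic features named in the abstract can be sketched as below. The simple slope definitions and the sample curve are illustrative assumptions; the paper computes such features per pixel and then aggregates partition-wise statistics for classification.

    ```python
    # Kinetic features from a single DCE-MRI enhancement curve:
    # peak enhancement (PE), time to peak (TTP), wash-in slope (WIS)
    # and wash-out slope (WOS).
    def kinetic_features(curve, times):
        peak_idx = max(range(len(curve)), key=lambda i: curve[i])
        pe = curve[peak_idx]           # peak enhancement
        ttp = times[peak_idx]          # time to peak
        wis = ((curve[peak_idx] - curve[0]) / (times[peak_idx] - times[0])
               if peak_idx > 0 else 0.0)                 # wash-in slope
        wos = ((curve[-1] - curve[peak_idx]) / (times[-1] - times[peak_idx])
               if peak_idx < len(curve) - 1 else 0.0)    # wash-out slope
        return {"PE": pe, "TTP": ttp, "WIS": wis, "WOS": wos}

    # Hypothetical enhancement values at five acquisition timepoints.
    feats = kinetic_features([0.0, 40.0, 90.0, 70.0, 60.0],
                             [0.0, 1.0, 2.0, 3.0, 4.0])
    ```

    Partitioning pixels by TTP and taking per-partition statistics of these maps yields the 21 features fed to the SVM.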

  5. The Meta-Analysis of Clinical Judgment Project: Fifty-Six Years of Accumulated Research on Clinical Versus Statistical Prediction

    Aegisdottir, Stefania; White, Michael J.; Spengler, Paul M.; Maugherman, Alan S.; Anderson, Linda A.; Cook, Robert S.; Nichols, Cassandra N.; Lampropoulos, Georgios K.; Walker, Blain S.; Cohen, Genna; Rush, Jeffrey D.

    2006-01-01

    Clinical predictions made by mental health practitioners are compared with those using statistical approaches. Sixty-seven studies were identified from a comprehensive search of 56 years of research; 92 effect sizes were derived from these studies. The overall effect of clinical versus statistical prediction showed a somewhat greater accuracy for…

  6. Prediction of monthly average global solar radiation based on statistical distribution of clearness index

    Ayodele, T.R.; Ogunjuyigbe, A.S.O.

    2015-01-01

    In this paper, a probability distribution of the clearness index is proposed for the prediction of global solar radiation. First, the clearness index is obtained from past data of global solar radiation; then, the parameters of the appropriate distribution that best fits the clearness index are determined. The global solar radiation is thereafter predicted from the clearness index using inverse transformation of the cumulative distribution function. To validate the proposed method, eight years of global solar radiation data (2000–2007) for Ibadan, Nigeria are used to determine the parameters of the appropriate probability distribution for the clearness index. The calculated parameters are then used to predict the future monthly average global solar radiation for the following year (2008). The predicted values are compared with the measured values using four statistical tests: the Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE) and the coefficient of determination (R²). The proposed method is also compared to existing regression models. The results show that the logistic distribution provides the best fit for the clearness index of Ibadan and that the proposed method is effective in predicting the monthly average global solar radiation, with an overall RMSE of 0.383 MJ/m²/day, MAE of 0.295 MJ/m²/day, MAPE of 2% and R² of 0.967. - Highlights: • Distribution of clearness index is proposed for prediction of global solar radiation. • The clearness index is obtained from the past data of global solar radiation. • The parameters of distribution that best fit the clearness index are determined. • Solar radiation is predicted from the clearness index using inverse transformation. • The method is effective in predicting the monthly average global solar radiation.
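    The inverse-transformation step can be sketched as follows for the logistic distribution the paper found to fit best. The location/scale parameters and the extraterrestrial radiation value below are illustrative assumptions, not the fitted values for Ibadan.

    ```python
    # Inverse-transform sampling of the clearness index K_t from a fitted
    # logistic distribution, then H = K_t * H0 to predict global radiation.
    import math

    def logistic_inverse_cdf(u, mu, s):
        """Quantile function of the logistic distribution, 0 < u < 1."""
        return mu + s * math.log(u / (1.0 - u))

    def predict_radiation(u, mu, s, h_extraterrestrial):
        kt = logistic_inverse_cdf(u, mu, s)   # sampled clearness index
        return kt * h_extraterrestrial        # H = K_t * H0

    # At u = 0.5 the quantile function returns the location parameter.
    kt_median = logistic_inverse_cdf(0.5, mu=0.55, s=0.05)
    ```

    Feeding uniform random draws u through the quantile function reproduces the fitted clearness-index distribution.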

  7. Integrative Analysis of Gene Expression Data Including an Assessment of Pathway Enrichment for Predicting Prostate Cancer

    Pingzhao Hu

    2006-01-01

    … biological pathways. In particular, we observed that by integrating information from the insulin signalling pathway into our prediction model, we achieved better prediction of prostate cancer. Conclusions: Our data integration methodology provides an efficient way to identify biologically sound and statistically significant pathways from gene expression data. The significant gene expression phenotypes identified in our study have the potential to characterize complex genetic alterations in prostate cancer.

  8. Integrated statistical learning of metabolic ion mobility spectrometry profiles for pulmonary disease identification

    Hauschild, A.C.; Baumbach, Jan; Baumbach, J.

    2012-01-01

    … sophisticated statistical learning techniques for VOC-based feature selection and supervised classification into patient groups. We analyzed breath data from 84 volunteers, each of them either suffering from chronic obstructive pulmonary disease (COPD), or both COPD and bronchial carcinoma (COPD + BC), as well as from 35 healthy volunteers, comprising a control group (CG). We standardized and integrated several statistical learning methods to provide a broad overview of their potential for distinguishing the patient groups. We found that there is strong potential for separating MCC/IMS chromatograms of healthy … patients from healthy controls. We conclude that these statistical learning methods have a generally high accuracy when applied to well-structured, medical MCC/IMS data.

  9. Gaussian orthogonal ensemble statistics in graphene billiards with the shape of classically integrable billiards.

    Yu, Pei; Li, Zi-Yuan; Xu, Hong-Ya; Huang, Liang; Dietz, Barbara; Grebogi, Celso; Lai, Ying-Cheng

    2016-12-01

    A crucial result in quantum chaos, which has been established for a long time, is that the spectral properties of classically integrable systems generically are described by Poisson statistics, whereas those of time-reversal symmetric, classically chaotic systems coincide with those of random matrices from the Gaussian orthogonal ensemble (GOE). Does this result hold for two-dimensional Dirac material systems? To address this fundamental question, we investigate the spectral properties in a representative class of graphene billiards with shapes of classically integrable circular-sector billiards. Naively one may expect to observe Poisson statistics, which is indeed true for energies close to the band edges where the quasiparticle obeys the Schrödinger equation. However, for energies near the Dirac point, where the quasiparticles behave like massless Dirac fermions, Poisson statistics is extremely rare in the sense that it emerges only under quite strict symmetry constraints on the straight boundary parts of the sector. An arbitrarily small amount of imperfection of the boundary results in GOE statistics. This implies that, for circular-sector confinements with arbitrary angle, the spectral properties will generically be GOE. These results are corroborated by extensive numerical computation. Furthermore, we provide a physical understanding for our results.


  11. Departure Queue Prediction for Strategic and Tactical Surface Scheduler Integration

    Zelinski, Shannon; Windhorst, Robert

    2016-01-01

    A departure metering concept to be demonstrated at Charlotte Douglas International Airport (CLT) will integrate strategic and tactical surface scheduling components to enable the respective collaborative decision making and improved efficiency benefits these two methods of scheduling provide. This study analyzes the effect of tactical scheduling on strategic scheduler predictability. Strategic queue predictions and target gate pushback times to achieve a desired queue length are compared between fast time simulations of CLT surface operations with and without tactical scheduling. The use of variable departure rates as a strategic scheduler input was shown to substantially improve queue predictions over static departure rates. With target queue length calibration, the strategic scheduler can be tuned to produce average delays within one minute of the tactical scheduler. However, root mean square differences between strategic and tactical delays were between 12 and 15 minutes due to the different methods the strategic and tactical schedulers use to predict takeoff times and generate gate pushback clearances. This demonstrates how difficult it is for the strategic scheduler to predict tactical scheduler assigned gate delays on an individual flight basis as the tactical scheduler adjusts departure sequence to accommodate arrival interactions. Strategic/tactical scheduler compatibility may be improved by providing more arrival information to the strategic scheduler and stabilizing tactical scheduler changes to runway sequence in response to arrivals.

  12. Controlling cyclic combustion timing variations using a symbol-statistics predictive approach in an HCCI engine

    Ghazimirsaied, Ahmad; Koch, Charles Robert

    2012-01-01

    Highlights: ► Misfire reduction in a combustion engine based on chaotic theory methods. ► Chaotic theory analysis of cyclic variation of an HCCI engine near misfire. ► Symbol sequence approach is used to predict ignition timing one cycle ahead. ► Prediction is combined with feedback control to lower HCCI combustion variation. ► Feedback control extends the HCCI operating range into the misfire region. -- Abstract: Cyclic variation of a Homogeneous Charge Compression Ignition (HCCI) engine near misfire is analyzed using chaotic theory methods, and feedback control is used to stabilize high cyclic variations. Variation of consecutive cycles of θPmax (the crank angle of maximum cylinder pressure over an engine cycle) for a Primary Reference Fuel engine is analyzed near misfire operation for five test points with similar conditions but different octane numbers. The return map of the time series of θPmax at each combustion cycle reveals the deterministic and random portions of the dynamics near misfire for this HCCI engine. A symbol-statistics approach is used to predict θPmax one cycle ahead. Predicted θPmax has similar dynamical behavior to the experimental measurements. Based on this cycle-ahead prediction, and using fuel octane as the input, feedback control is used to stabilize the instability of θPmax variations at this engine condition near misfire.
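    A minimal symbol-statistics predictor in the spirit described above can be sketched as follows: the series is partitioned into equiprobable symbols, a histogram of short symbol sequences is built, and the most frequent continuation of the current context is the one-cycle-ahead prediction. The partition size, sequence length, and sample data are illustrative choices, not the paper's settings.

    ```python
    # One-step-ahead prediction of a time series via symbol statistics.
    from collections import Counter

    def symbolize(series, n_symbols=2):
        """Map each value to an equiprobable-partition symbol."""
        ranked = sorted(series)
        def symbol(v):
            rank = sum(1 for r in ranked if r <= v) - 1
            return min(n_symbols - 1, rank * n_symbols // len(series))
        return [symbol(v) for v in series]

    def predict_next(symbols, seq_len=2):
        """Most frequent continuation of the trailing symbol sequence."""
        hist = Counter(tuple(symbols[i:i + seq_len + 1])
                       for i in range(len(symbols) - seq_len))
        context = tuple(symbols[-seq_len:])
        candidates = {seq[-1]: c for seq, c in hist.items()
                      if seq[:-1] == context}
        return max(candidates, key=candidates.get) if candidates else None

    # A perfectly alternating toy series is predicted exactly.
    syms = symbolize([1.0, 9.0] * 4)
    nxt = predict_next(syms)
    ```

    In the engine application the predicted symbol for θPmax would drive the fuel-octane feedback controller.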

  13. Integration of HIV in the Human Genome: Which Sites Are Preferential? A Genetic and Statistical Assessment

    Gonçalves, Juliana; Moreira, Elsa; Sequeira, Inês J.; Rodrigues, António S.; Rueff, José; Brás, Aldina

    2016-01-01

    Chromosomal fragile sites (FSs) are loci where gaps and breaks may occur and are preferential integration targets for some viruses, for example, Hepatitis B, Epstein-Barr virus, HPV16, HPV18, and MLV vectors. However, the integration of the human immunodeficiency virus (HIV) in Giemsa bands and in FSs is not yet completely clear. This study aimed to assess the integration preferences of HIV in FSs and in Giemsa bands using an in silico study. HIV integration positions from Jurkat cells were used and two nonparametric tests were applied to compare HIV integration in dark versus light bands and in FS versus non-FS (NFSs). The results show that light bands are preferential targets for integration of HIV-1 in Jurkat cells and also that it integrates with equal intensity in FSs and in NFSs. The data indicates that HIV displays different preferences for FSs compared to other viruses. The aim was to develop and apply an approach to predict the conditions and constraints of HIV insertion in the human genome which seems to adequately complement empirical data. PMID:27294106

  14. RADSS: an integration of GIS, spatial statistics, and network service for regional data mining

    Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing

    2005-10-01

    Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or associations between regions, has wide applications nowadays in the social sciences, such as sociology, economics, epidemiology, and criminology. Many applications in the regional or other social sciences are more concerned with spatial relationships than with precise geographical location. Based on the spatial continuity rule derived from Tobler's first law of geography (observations at two sites tend to be more similar to each other if the sites are close together than if far apart), spatial statistics, as an important means for spatial data mining, allow users to extract interesting and useful information, such as spatial pattern, spatial structure, spatial association, spatial outliers and spatial interaction, from vast amounts of spatial or non-spatial data. Therefore, by integrating spatial statistical methods, geographical information systems become more powerful in gaining further insights into the nature of the spatial structure of regional systems, and help researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and the development of new methods and models (e.g., spatio-temporal models). Herein, we make an attempt to develop such integrated software and apply it to complex system analysis for the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network services in regional data mining, as well as their implementation. After discussing the spatial statistical methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, integrating GIS, spatial statistics and network services. RADSS includes the functions of spatial data visualization, exploratory spatial data analysis, and …

  15. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances

    Abut F

    2015-08-01

    Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume per minute in a state of intense exercise. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, many studies have been conducted in recent years to predict the VO2max of various target audiences, ranging from soccer athletes, non-expert swimmers and cross-country skiers to healthy fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machines, multilayer perceptrons, general regression neural networks, and multiple linear regression. The purpose of this study is to give a detailed overview of the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in the related literature in terms of two well-known metrics, namely, the multiple correlation coefficient (R) and the standard error of estimate. The survey results reveal that, with respect to the regression methods used to develop prediction models, support vector machines generally show better performance than other methods, whereas multiple linear regression exhibits the worst performance.

  16. A statistical approach to the prediction of pressure tube fracture toughness

    Pandey, M.D.; Radford, D.D.

    2008-01-01

    The fracture toughness of the zirconium alloy Zr-2.5Nb is an important parameter in determining the flaw tolerance for operation of pressure tubes in a nuclear reactor. Fracture toughness data have been generated by performing rising-pressure burst tests on sections of pressure tubes removed from operating reactors. The test data were used to generate a lower-bound fracture toughness curve, which is used in defining the operational limits of pressure tubes. The paper presents a comprehensive statistical analysis of the burst test data and develops a multivariate statistical model relating toughness to material chemistry, mechanical properties, and operational history. The proposed model can be useful in predicting the fracture toughness of specific in-service pressure tubes, thereby minimizing the conservatism associated with a generic lower-bound approach.

  17. Flow prediction models using macroclimatic variables and multivariate statistical techniques in the Cauca River Valley

    Carvajal Escobar Yesid; Munoz, Flor Matilde

    2007-01-01

    The project centred on reviewing the state of the art of the ocean-atmospheric phenomena that affect Colombian hydrology, especially the ENSO (El Niño-Southern Oscillation) phenomenon, which causes a socioeconomic impact of the first order in the country and has not been sufficiently studied; it is therefore important to address this topic, including the macroclimatic variables associated with ENSO in water-planning analyses. The analyses include a review of statistical techniques for checking the consistency of hydrological data, with the objective of building a reliable and homogeneous database of monthly flows of the Cauca River. Statistical methods (multivariate data analysis), specifically principal component analysis, are used in the development of models for predicting monthly mean flows in the Cauca River, involving both linear approaches, such as the autoregressive AR, ARX and ARMAX models, and a nonlinear approach, artificial neural networks.

  18. Statistical Diagnosis Method of Conductor Motions in Superconducting Magnets to Predict their Quench Performance

    Khomenko, B A; Rijllart, A; Sanfilippo, S; Siemko, A

    2001-01-01

    Premature training quenches are usually caused by the transient energy released within the magnet coil as it is energised. Two distinct varieties of disturbance exist, thought to be electrical and mechanical in origin. The first comes from non-uniform current distribution in superconducting cables, whereas the second usually originates from conductor motions or micro-fractures of insulating materials under the action of Lorentz forces. These mechanical events generally produce a rapid variation of the voltages in the so-called quench antennas and across the magnet coil, called spikes. A statistical method to treat the spatial localisation and time of occurrence of spikes is presented. It allows identification of the mechanical weak points in a magnet without the need to increase the current to provoke a quench. Prediction of the quench level from detailed analysis of the spike statistics can also be expected.

  19. Hydrogen-bond coordination in organic crystal structures: statistics, predictions and applications.

    Galek, Peter T A; Chisholm, James A; Pidcock, Elna; Wood, Peter A

    2014-02-01

    Statistical models to predict the number of hydrogen bonds that might be formed by any donor or acceptor atom in a crystal structure have been derived using organic structures in the Cambridge Structural Database. This hydrogen-bond coordination behaviour has been uniquely defined for more than 70 unique atom types, and has led to the development of a methodology to construct hypothetical hydrogen-bond arrangements. Comparing the constructed hydrogen-bond arrangements with known crystal structures shows promise in the assessment of structural stability, and some initial examples of industrially relevant polymorphs, co-crystals and hydrates are described.

  20. Prediction of noise in ships by the application of “statistical energy analysis.”

    Jensen, John Ødegaard

    1979-01-01

    If it is to be possible to reduce the noise level in the accommodation on board ships effectively, by introducing appropriate noise abatement measures already at an early design stage, it is quite essential that sufficiently accurate prediction methods are available to the naval architects...... or for a special noise abatement measure, e.g., increased structural damping. The paper discusses whether it might be possible to derive an alternative calculation model based on the “statistical energy analysis” (SEA) approach. By considering the hull of a ship to be constructed from plate elements connected...

  1. Statistical significance of theoretical predictions: A new dimension in nuclear structure theories (I)

    DUDEK, J; SZPAK, B; FORNAL, B; PORQUET, M-G

    2011-01-01

    In this and the follow-up article we briefly discuss what we believe represents one of the most serious problems in contemporary nuclear structure: the question of the statistical significance of parametrizations of nuclear microscopic Hamiltonians and the implied predictive power of the underlying theories. In the present Part I, we introduce the main lines of reasoning of the so-called Inverse Problem Theory, an important sub-field of contemporary applied mathematics, illustrated here with the example of the Nuclear Mean-Field Approach.

  2. Predictive Solar-Integrated Commercial Building Load Control

    Glasgow, Nathan [EdgePower Inc., Aspen, CO (United States)

    2017-01-31

    This report is the final technical report for the Department of Energy SunShot award number EE0007180 to EdgePower Inc., for the project entitled “Predictive Solar-Integrated Commercial Building Load Control.” The goal of this project was to successfully prove that the integration of solar forecasting and building load control can reduce demand charge costs for commercial building owners with solar PV. This proof of concept Tier 0 project demonstrated its value through a pilot project at a commercial building. This final report contains a summary of the work completed through the duration of the project. Clean Power Research was a sub-recipient on the award.

  3. Impact of Statistical Learning Methods on the Predictive Power of Multivariate Normal Tissue Complication Probability Models

    Xu Chengjian, E-mail: c.j.xu@umcg.nl [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands); Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van 't [Department of Radiation Oncology, University of Groningen, University Medical Center Groningen, Groningen (Netherlands)

    2012-03-15

    Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.
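
    As a hedged illustration of the LASSO approach the study recommends, the sketch below fits an L1-penalized logistic regression to synthetic binary-outcome data and scores it by cross-validated AUC; the features, sample size, and regularization strength are illustrative assumptions, not the study's head-and-neck data:

    ```python
    # L1-penalized (LASSO-style) logistic regression with cross-validated AUC.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, p = 200, 10
    X = rng.normal(size=(n, p))               # stand-ins for dose-volume predictors
    logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]    # only two features truly matter
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    lasso.fit(X, y)

    auc = float(cross_val_score(lasso, X, y, cv=5, scoring="roc_auc").mean())
    n_selected = int(np.sum(np.abs(lasso.coef_) > 1e-8))  # nonzero coefficients
    ```

    The L1 penalty drives uninformative coefficients to exactly zero, which is what makes the resulting model as interpretable as a stepwise-selected one.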

  4. Impact of statistical learning methods on the predictive power of multivariate normal tissue complication probability models.

    Xu, Cheng-Jian; van der Schaaf, Arjen; Schilstra, Cornelis; Langendijk, Johannes A; van't Veld, Aart A

    2012-03-15

    To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. Prediction and reconstruction of future and missing unobservable modified Weibull lifetime based on generalized order statistics

    Amany E. Aly

    2016-04-01

    When a system consists of independent components of the same type, appropriate actions may be taken as soon as a portion of them has failed. It is, therefore, important to be able to predict later failure times from earlier ones. One of the well-known failure distributions commonly used to model component life is the modified Weibull distribution (MWD). In this paper, two pivotal quantities are proposed to construct prediction intervals for future unobservable lifetimes based on generalized order statistics (gos) from the MWD. Moreover, a pivotal quantity is developed to reconstruct missing observations at the beginning of an experiment. Furthermore, Monte Carlo simulation studies are conducted and numerical computations are carried out to investigate the efficiency of the presented results. Finally, two illustrative examples for real data sets are analyzed.

  6. Impact of Statistical Learning Methods on the Predictive Power of Multivariate Normal Tissue Complication Probability Models

    Xu Chengjian; Schaaf, Arjen van der; Schilstra, Cornelis; Langendijk, Johannes A.; Veld, Aart A. van’t

    2012-01-01

    Purpose: To study the impact of different statistical learning methods on the prediction performance of multivariate normal tissue complication probability (NTCP) models. Methods and Materials: In this study, three learning methods, stepwise selection, least absolute shrinkage and selection operator (LASSO), and Bayesian model averaging (BMA), were used to build NTCP models of xerostomia following radiotherapy treatment for head and neck cancer. Performance of each learning method was evaluated by a repeated cross-validation scheme in order to obtain a fair comparison among methods. Results: It was found that the LASSO and BMA methods produced models with significantly better predictive power than that of the stepwise selection method. Furthermore, the LASSO method yields an easily interpretable model as the stepwise method does, in contrast to the less intuitive BMA method. Conclusions: The commonly used stepwise selection method, which is simple to execute, may be insufficient for NTCP modeling. The LASSO method is recommended.

  7. Predicting energy performance of a net-zero energy building: A statistical approach

    Kneifel, Joshua; Webb, David

    2016-01-01

    Highlights: • A regression model is applied to actual energy data from a net-zero energy building. • The model is validated through a rigorous statistical analysis. • Comparisons are made between model predictions and those of a physics-based model. • The model is a viable baseline for evaluating future models from the energy data. - Abstract: Performance-based building requirements have become more prevalent because it gives freedom in building design while still maintaining or exceeding the energy performance required by prescriptive-based requirements. In order to determine if building designs reach target energy efficiency improvements, it is necessary to estimate the energy performance of a building using predictive models and different weather conditions. Physics-based whole building energy simulation modeling is the most common approach. However, these physics-based models include underlying assumptions and require significant amounts of information in order to specify the input parameter values. An alternative approach to test the performance of a building is to develop a statistically derived predictive regression model using post-occupancy data that can accurately predict energy consumption and production based on a few common weather-based factors, thus requiring less information than simulation models. A regression model based on measured data should be able to predict energy performance of a building for a given day as long as the weather conditions are similar to those during the data collection time frame. This article uses data from the National Institute of Standards and Technology (NIST) Net-Zero Energy Residential Test Facility (NZERTF) to develop and validate a regression model to predict the energy performance of the NZERTF using two weather variables aggregated to the daily level, applies the model to estimate the energy performance of hypothetical NZERTFs located in different cities in the Mixed-Humid Climate Zone, and compares these
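
    The statistical approach described above, a regression of daily energy use on a few weather-derived predictors, can be sketched as follows; the two predictors, the 18 °C balance point, and all data are invented for illustration and are not NZERTF measurements:

    ```python
    # Daily energy use regressed on two weather-derived predictors.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    doy = np.arange(365)
    temp = 15.0 + 10.0 * np.sin(2 * np.pi * doy / 365) + rng.normal(0, 2, 365)
    solar = np.clip(5.0 + 2.0 * np.sin(2 * np.pi * doy / 365) + rng.normal(0, 1, 365),
                    0, None)

    # Synthetic "net energy": load grows away from an 18 C balance point, PV offsets it
    energy = 12.0 + 0.9 * np.abs(temp - 18.0) - 1.5 * solar + rng.normal(0, 1, 365)

    X = np.column_stack([np.abs(temp - 18.0), solar])
    model = LinearRegression().fit(X, energy)
    r2 = float(model.score(X, energy))
    ```

    A model of this form can be re-applied with another city's weather series to estimate performance in a different location within the same climate zone, which is the transfer step the article describes.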

  8. Evaluating Computer-Based Simulations, Multimedia and Animations that Help Integrate Blended Learning with Lectures in First Year Statistics

    Neumann, David L.; Neumann, Michelle M.; Hood, Michelle

    2011-01-01

    The discipline of statistics seems well suited to the integration of technology in a lecture as a means to enhance student learning and engagement. Technology can be used to simulate statistical concepts, create interactive learning exercises, and illustrate real world applications of statistics. The present study aimed to better understand the…

  9. Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network.

    Yu, Ying; Wang, Yirui; Gao, Shangce; Tang, Zheng

    2017-01-01

    With the impact of global internationalization, the tourism economy has also developed rapidly. The increasing interest aroused by more advanced forecasting methods leads us to innovate forecasting methods. In this paper, the seasonal trend autoregressive integrated moving averages with dendritic neural network model (SA-D model) is proposed to perform tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving averages model (SARIMA model) to exclude the long-term linear trend and then train the residual data with the dendritic neural network model to make a short-term prediction. As the results in this paper show, the SA-D model can achieve considerably better predictive performance. To demonstrate the effectiveness of the SA-D model, we also use the data that other authors used with other models and compare the results. This also proved that the SA-D model achieved good predictive performance in terms of the normalized mean square error, absolute percentage of error, and correlation coefficient.

  10. Statistical Modeling and Prediction for Tourism Economy Using Dendritic Neural Network

    Ying Yu

    2017-01-01

    With the impact of global internationalization, the tourism economy has also developed rapidly. The increasing interest aroused by more advanced forecasting methods leads us to innovate forecasting methods. In this paper, the seasonal trend autoregressive integrated moving averages with dendritic neural network model (SA-D model) is proposed to perform tourism demand forecasting. First, we use the seasonal trend autoregressive integrated moving averages model (SARIMA model) to exclude the long-term linear trend and then train the residual data with the dendritic neural network model to make a short-term prediction. As the results in this paper show, the SA-D model can achieve considerably better predictive performance. To demonstrate the effectiveness of the SA-D model, we also use the data that other authors used with other models and compare the results. This also proved that the SA-D model achieved good predictive performance in terms of the normalized mean square error, absolute percentage of error, and correlation coefficient.
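
    The two-stage idea behind such hybrid models can be sketched as follows. For brevity, a seasonal-harmonic least-squares fit stands in for the SARIMA stage and a generic multilayer perceptron stands in for the dendritic neural network; both substitutions are simplifying assumptions, not the published method:

    ```python
    # Two-stage hybrid forecast: seasonal linear fit, then a neural net on residuals.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(2)
    t = np.arange(240)  # 20 years of synthetic monthly observations
    series = (100.0 + 0.5 * t + 20.0 * np.sin(2 * np.pi * t / 12)
              + 5.0 * np.tanh(np.sin(2 * np.pi * t / 6)) + rng.normal(0, 2, 240))

    # Stage 1: linear trend + annual harmonic, fit by least squares (SARIMA stand-in)
    F = np.column_stack([np.ones_like(t, dtype=float), t,
                         np.sin(2 * np.pi * t / 12), np.cos(2 * np.pi * t / 12)])
    coef, *_ = np.linalg.lstsq(F, series, rcond=None)
    linear_part = F @ coef
    resid = series - linear_part

    # Stage 2: neural network trained on lagged residuals
    lags = 12
    Xr = np.column_stack([resid[i:len(resid) - lags + i] for i in range(lags)])
    yr = resid[lags:]
    nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(Xr, yr)

    fitted = linear_part[lags:] + nn.predict(Xr)
    rmse = float(np.sqrt(np.mean((series[lags:] - fitted) ** 2)))
    ```

    The division of labour is the point: the linear stage removes trend and seasonality, so the nonlinear stage only has to model what the linear stage cannot.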

  11. Integrated predictive modelling simulations of burning plasma experiment designs

    Bateman, Glenn; Onjun, Thawatchai; Kritz, Arnold H

    2003-01-01

    Models for the height of the pedestal at the edge of H-mode plasmas (Onjun T et al 2002 Phys. Plasmas 9 5018) are used together with the Multi-Mode core transport model (Bateman G et al 1998 Phys. Plasmas 5 1793) in the BALDUR integrated predictive modelling code to predict the performance of the ITER (Aymar A et al 2002 Plasma Phys. Control. Fusion 44 519), FIRE (Meade D M et al 2001 Fusion Technol. 39 336), and IGNITOR (Coppi B et al 2001 Nucl. Fusion 41 1253) fusion reactor designs. The simulation protocol used in this paper is tested by comparing predicted temperature and density profiles against experimental data from 33 H-mode discharges in the JET (Rebut P H et al 1985 Nucl. Fusion 25 1011) and DIII-D (Luxon J L et al 1985 Fusion Technol. 8 441) tokamaks. The sensitivities of the predictions are evaluated for the burning plasma experimental designs by using variations of the pedestal temperature model that are one standard deviation above and below the standard model. Simulations of the fusion reactor designs are carried out for scans in which the plasma density and auxiliary heating power are varied

  12. Wind gust estimation by combining numerical weather prediction model and statistical post-processing

    Patlakas, Platon; Drakaki, Eleni; Galanis, George; Spyrou, Christos; Kallos, George

    2017-04-01

    The continuous rise of offshore and near-shore activities, as well as the development of structures such as wind farms and various offshore platforms, requires the employment of state-of-the-art risk assessment techniques. Such analysis is used to set safety standards and can be characterized as a climatologically oriented approach. Nevertheless, reliable operational support is also needed in order to minimize cost drawbacks and danger to personnel during the construction and operational stages as well as during maintenance activities. One of the most important parameters for this kind of analysis is wind speed intensity and variability. A critical measure associated with this variability is the presence and magnitude of wind gusts as estimated at the reference level of 10 m. The latter can be attributed to different processes that vary among boundary-layer turbulence, convective activity, mountain waves, and wake phenomena. The purpose of this work is the development of a wind gust forecasting methodology combining a numerical weather prediction model and a dynamical statistical tool based on Kalman filtering. To this end, the Wind Gust Estimate parameterization was implemented within the framework of the atmospheric model SKIRON/Dust. The new modeling tool combines the atmospheric model with a statistical local adaptation methodology based on Kalman filters, and has been tested over the offshore west coastline of the United States. The main purpose is to provide a useful tool for wind analysis and prediction and for applications related to offshore wind energy (power prediction, operation, and maintenance). The results have been evaluated using observational data from NOAA's buoy network. The predicted output shows good behavior, which is further improved after the local adjustment post-processing.
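
    The Kalman-filter post-processing step can be sketched with a scalar filter that tracks a slowly drifting forecast bias; all numbers below are synthetic illustrations, not SKIRON/Dust output, and the noise variances Q and R are tuning choices:

    ```python
    # Scalar Kalman filter tracking the (drifting) bias of raw model forecasts.
    import numpy as np

    rng = np.random.default_rng(3)
    n = 300
    truth = 8.0 + 2.0 * np.sin(np.arange(n) / 20.0)
    bias = 1.5 + 0.3 * np.sin(np.arange(n) / 50.0)    # slowly drifting model bias
    raw_fc = truth + bias + rng.normal(0, 0.5, n)     # raw NWP-like forecasts
    obs = truth + rng.normal(0, 0.3, n)               # buoy-like observations

    x, P = 0.0, 1.0      # bias estimate and its variance
    Q, R = 0.01, 0.5     # process and observation noise variances (tuning choices)
    corrected = np.empty(n)
    for k in range(n):
        corrected[k] = raw_fc[k] - x           # correct with the current bias estimate
        innovation = (raw_fc[k] - obs[k]) - x  # newly observed bias minus the estimate
        P += Q
        K = P / (P + R)
        x += K * innovation
        P *= 1.0 - K

    raw_mae = float(np.mean(np.abs(raw_fc - obs)))
    corrected_mae = float(np.mean(np.abs(corrected - obs)))
    ```

    Because the filter adapts online, it keeps up with a bias that drifts over the season, which is exactly the property that makes it useful for local adjustment of model output.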

  13. Multiple Kernel Learning with Random Effects for Predicting Longitudinal Outcomes and Data Integration

    Chen, Tianle; Zeng, Donglin

    2015-01-01

    Summary Predicting disease risk and progression is one of the main goals in many clinical research studies. Cohort studies on the natural history and etiology of chronic diseases span years and data are collected at multiple visits. Although kernel-based statistical learning methods are proven to be powerful for a wide range of disease prediction problems, these methods are only well studied for independent data but not for longitudinal data. It is thus important to develop time-sensitive prediction rules that make use of the longitudinal nature of the data. In this paper, we develop a novel statistical learning method for longitudinal data by introducing subject-specific short-term and long-term latent effects through a designed kernel to account for within-subject correlation of longitudinal measurements. Since the presence of multiple sources of data is increasingly common, we embed our method in a multiple kernel learning framework and propose a regularized multiple kernel statistical learning with random effects to construct effective nonparametric prediction rules. Our method allows easy integration of various heterogeneous data sources and takes advantage of correlation among longitudinal measures to increase prediction power. We use different kernels for each data source taking advantage of the distinctive feature of each data modality, and then optimally combine data across modalities. We apply the developed methods to two large epidemiological studies, one on Huntington's disease and the other on Alzheimer's Disease (Alzheimer's Disease Neuroimaging Initiative, ADNI) where we explore a unique opportunity to combine imaging and genetic data to study prediction of mild cognitive impairment, and show a substantial gain in performance while accounting for the longitudinal aspect of the data. PMID:26177419
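
    A toy sketch of the simplest form of multiple kernel learning described above: a weighted sum of per-modality RBF kernels inside kernel ridge regression, with the weight chosen by grid search. The data, kernel widths, and ridge penalty are illustrative assumptions, and the published method additionally models subject-specific random effects for longitudinal data, which this sketch omits:

    ```python
    # Weighted-sum-of-kernels regression: the simplest multiple kernel learning setup.
    import numpy as np

    rng = np.random.default_rng(5)
    n = 150
    X1 = rng.normal(size=(n, 5))   # stand-in for one modality (e.g. imaging features)
    X2 = rng.normal(size=(n, 3))   # stand-in for another modality (e.g. genetic features)
    y = X1[:, 0] + np.sin(X2[:, 0]) + rng.normal(0, 0.2, n)

    def rbf_kernel(A, B, gamma):
        # Gaussian (RBF) kernel matrix between rows of A and B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def kernel_ridge_fit(K, y, lam=0.1):
        # In-sample kernel ridge regression predictions
        alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
        return K @ alpha

    K1, K2 = rbf_kernel(X1, X1, 0.1), rbf_kernel(X2, X2, 0.1)
    best_w, best_mse = None, np.inf
    for w in np.linspace(0.0, 1.0, 11):   # grid search over the kernel weight
        pred = kernel_ridge_fit(w * K1 + (1 - w) * K2, y)
        mse = float(np.mean((pred - y) ** 2))
        if mse < best_mse:
            best_w, best_mse = w, mse
    ```

    Using one kernel per data source lets each modality contribute through its own similarity structure, and the learned weight expresses how much each source matters.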

  14. The use of machine learning and nonlinear statistical tools for ADME prediction.

    Sakiyama, Yojiro

    2009-02-01

    Absorption, distribution, metabolism and excretion (ADME)-related failure of drug candidates is a major issue for the pharmaceutical industry today. Prediction of ADME by in silico tools has now become an inevitable paradigm to reduce cost and enhance efficiency in pharmaceutical research. Recently, machine learning as well as nonlinear statistical tools has been widely applied to predict routine ADME end points. To achieve accurate and reliable predictions, it would be a prerequisite to understand the concepts, mechanisms and limitations of these tools. Here, we have devised a small synthetic nonlinear data set to help understand the mechanism of machine learning by 2D-visualisation. We applied six new machine learning methods to four different data sets. The methods include Naive Bayes classifier, classification and regression tree, random forest, Gaussian process, support vector machine and k nearest neighbour. The results demonstrated that ensemble learning and kernel machine displayed greater accuracy of prediction than classical methods irrespective of the data set size. The importance of interaction with the engineering field is also addressed. The results described here provide insights into the mechanism of machine learning, which will enable appropriate usage in the future.
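
    In the spirit of the small synthetic nonlinear data set used in the review, the sketch below compares a linear classifier with two of the reviewed learners on a 2-D circular decision boundary; the data and settings are illustrative, not the review's benchmark:

    ```python
    # A linear model vs. two nonlinear learners on a circular decision boundary.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(6)
    X = rng.uniform(-2.0, 2.0, size=(400, 2))
    y = ((X[:, 0] ** 2 + X[:, 1] ** 2) < 2.0).astype(int)  # nonlinear (circular) class

    scores = {
        name: float(cross_val_score(clf, X, y, cv=5).mean())
        for name, clf in [
            ("logistic", LogisticRegression()),
            ("random_forest", RandomForestClassifier(random_state=0)),
            ("svm_rbf", SVC()),
        ]
    }
    ```

    On such data the kernel machine and the ensemble learner recover the curved boundary, while the linear model cannot, mirroring the review's conclusion about ensemble and kernel methods.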

  15. Predicting The Exit Time Of Employees In An Organization Using Statistical Model

    Ahmed Al Kuwaiti

    2015-08-01

    Employees are considered an asset to any organization, and each organization provides a better and more flexible working environment to retain its best and most resourceful workforce. As such, continuous efforts are being made to avoid or delay the exit/withdrawal of employees from the organization. Human resource managers face a challenge in predicting the exit time of employees, and no precise model exists at present in the literature. This study was conducted to predict the probability of exit of an employee in an organization using an appropriate statistical model. Accordingly, the authors designed a model using the Additive Weibull distribution to predict the expected exit time of an employee in an organization. In addition, a shock model approach was also executed to check how well the Additive Weibull distribution fits in an organization. The analytical results showed that when the inter-arrival time increases, the expected time for employees to exit also increases. This study concluded that the Additive Weibull distribution can be considered as an alternative to the shock model approach to predict the exit time of an employee in an organization.
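
    A hedged numerical sketch of the expected exit time under an Additive Weibull lifetime model, whose survival function is S(t) = exp(-(a·t^θ + b·t^γ)); the parameter values below are illustrative, not estimates from the study:

    ```python
    # Survival S(t) = exp(-(a*t**theta + b*t**gamma)); E[T] = integral of S over [0, inf)
    import numpy as np

    def expected_exit_time(a, theta, b, gamma, t_max=200.0, n=200_000):
        """Numerically integrate the Additive Weibull survival function."""
        t = np.linspace(0.0, t_max, n)
        surv = np.exp(-(a * t**theta + b * t**gamma))
        dt = t[1] - t[0]
        return float((surv.sum() - 0.5 * (surv[0] + surv[-1])) * dt)  # trapezoid rule

    # Illustrative parameters: a mild constant hazard plus a wear-out term
    e_additive = expected_exit_time(a=0.05, theta=1.0, b=0.001, gamma=2.0)
    e_exponential = expected_exit_time(a=0.05, theta=1.0, b=0.0, gamma=2.0)
    ```

    Since the second hazard term only adds risk, the additive model's expected exit time is always below that of the single-term (exponential/Weibull) baseline, which the two computed values confirm.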

  16. Integration of modern statistical tools for the analysis of climate extremes into the web-GIS “CLIMATE”

    Ryazanova, A. A.; Okladnikov, I. G.; Gordov, E. P.

    2017-11-01

    The frequency of occurrence and magnitude of extreme precipitation and temperature events show positive trends in several geographical regions. These events must be analyzed and studied in order to better understand their impact on the environment, predict their occurrence, and mitigate their effects. For this purpose, we augmented the web-GIS “CLIMATE” with a dedicated statistical package developed in the R language. The web-GIS “CLIMATE” is a software platform for cloud storage, processing, and visualization of distributed archives of spatial datasets. It is based on a combined use of web and GIS technologies with reliable procedures for searching, extracting, processing, and visualizing spatial data archives. The system provides a set of thematic online tools for the complex analysis of current and future climate changes and their effects on the environment. The package includes new, powerful methods for the time-dependent statistics of extremes, quantile regression, and the copula approach for the detailed analysis of various climate extreme events. Specifically, the very promising copula approach allows one to obtain the structural connections between the extremes and various environmental characteristics. The new statistical methods integrated into the web-GIS “CLIMATE” can significantly facilitate and accelerate the complex analysis of climate extremes using only a desktop PC connected to the Internet.
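
    The copula idea highlighted above can be sketched in a few lines: dependence between two extremes is modeled separately from their margins with a Gaussian copula. The chosen margins (Gumbel for temperature, gamma for precipitation) and the correlation value are illustrative assumptions, not outputs of the "CLIMATE" system:

    ```python
    # Gaussian copula: simulate dependent extremes with arbitrary margins,
    # then recover the dependence (Kendall's tau) nonparametrically.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n, rho = 5000, 0.7
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = stats.norm.cdf(z)                              # uniform margins = the copula

    temp_extreme = stats.gumbel_r.ppf(u[:, 0], loc=30.0, scale=3.0)  # illustrative margin
    precip_extreme = stats.gamma.ppf(u[:, 1], a=2.0, scale=10.0)     # illustrative margin

    tau, _ = stats.kendalltau(temp_extreme, precip_extreme)
    tau_theory = 2.0 / np.pi * np.arcsin(rho)   # exact tau for a Gaussian copula
    ```

    The key design point is the separation of concerns: the margins can be swapped for any fitted extreme-value distributions without touching the dependence structure.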

  17. Protein thermostability prediction within homologous families using temperature-dependent statistical potentials.

    Fabrizio Pucci

    The ability to rationally modify targeted physical and biological features of a protein of interest holds promise in numerous academic and industrial applications and paves the way towards de novo protein design. In particular, bioprocesses that utilize the remarkable properties of enzymes would often benefit from mutants that remain active at temperatures that are either higher or lower than the physiological temperature, while maintaining the biological activity. Many in silico methods have been developed in recent years for predicting the thermodynamic stability of mutant proteins, but very few have focused on thermostability. To bridge this gap, we developed an algorithm for predicting the best descriptor of thermostability, namely the melting temperature Tm, from the protein's sequence and structure. Our method is applicable when the Tm of proteins homologous to the target protein are known. It is based on the design of several temperature-dependent statistical potentials, derived from datasets consisting of either mesostable or thermostable proteins. Linear combinations of these potentials have been shown to yield an estimation of the protein folding free energies at low and high temperatures, and the difference of these energies, a prediction of the melting temperature. This particular construction, that distinguishes between the interactions that contribute more than others to the stability at high temperatures and those that are more stabilizing at low T, gives better performances compared to the standard approach based on T-independent potentials which predict the thermal resistance from the thermodynamic stability. Our method has been tested on 45 proteins of known Tm that belong to 11 homologous families. The standard deviation between experimental and predicted Tm's is equal to 13.6°C in cross validation, and decreases to 8.3°C if the 6 worst predicted proteins are excluded. Possible extensions of our approach are discussed.

  18. Machine learning and statistical methods for the prediction of maximal oxygen uptake: recent advances.

    Abut, Fatih; Akay, Mehmet Fatih

    2015-01-01

    Maximal oxygen uptake (VO2max) indicates how many milliliters of oxygen the body can consume in a state of intense exercise per minute. VO2max plays an important role in both sport and medical sciences for different purposes, such as indicating the endurance capacity of athletes or serving as a metric in estimating the disease risk of a person. In general, the direct measurement of VO2max provides the most accurate assessment of aerobic power. However, despite a high level of accuracy, practical limitations associated with the direct measurement of VO2max, such as the requirement of expensive and sophisticated laboratory equipment or trained staff, have led to the development of various regression models for predicting VO2max. Consequently, a lot of studies have been conducted in the last years to predict VO2max of various target audiences, ranging from soccer athletes, nonexpert swimmers, cross-country skiers to healthy-fit adults, teenagers, and children. Numerous prediction models have been developed using different sets of predictor variables and a variety of machine learning and statistical methods, including support vector machine, multilayer perceptron, general regression neural network, and multiple linear regression. The purpose of this study is to give a detailed overview about the data-driven modeling studies for the prediction of VO2max conducted in recent years and to compare the performance of various VO2max prediction models reported in related literature in terms of two well-known metrics, namely, multiple correlation coefficient (R) and standard error of estimate. The survey results reveal that with respect to regression methods used to develop prediction models, support vector machine, in general, shows better performance than other methods, whereas multiple linear regression exhibits the worst performance.
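
    A hedged sketch comparing two of the surveyed method families, support vector regression and multiple linear regression, evaluated by the correlation metric R used in the survey. The predictors, the ground-truth formula, and the noise level are invented for illustration and are not taken from any of the reviewed studies:

    ```python
    # Support vector regression vs. multiple linear regression, scored by R.
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.default_rng(4)
    n = 300
    age = rng.uniform(18, 60, n)
    bmi = rng.uniform(18, 35, n)
    hr_rest = rng.uniform(50, 90, n)
    # Invented nonlinear ground truth for a VO2max-like response (ml/kg/min)
    vo2 = (60.0 - 0.3 * age - 0.08 * (bmi - 22.0) ** 2 - 0.15 * hr_rest
           + rng.normal(0, 2, n))
    X = np.column_stack([age, bmi, hr_rest])

    svr = make_pipeline(StandardScaler(), SVR(C=10.0))
    r_svr = float(np.corrcoef(cross_val_predict(svr, X, vo2, cv=5), vo2)[0, 1])
    r_lin = float(np.corrcoef(cross_val_predict(LinearRegression(), X, vo2, cv=5), vo2)[0, 1])
    ```

    Cross-validated predictions are used for both models so that the reported R values are out-of-sample, matching how the surveyed studies are compared.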

  19. A reductionist perspective on quantum statistical mechanics: Coarse-graining of path integrals.

    Sinitskiy, Anton V; Voth, Gregory A

    2015-09-07

    Computational modeling of the condensed phase based on classical statistical mechanics has been rapidly developing over the last few decades and has yielded important information on various systems containing up to millions of atoms. However, if a system of interest contains important quantum effects, well-developed classical techniques cannot be used. One way of treating finite temperature quantum systems at equilibrium has been based on Feynman's imaginary time path integral approach and the ensuing quantum-classical isomorphism. This isomorphism is exact only in the limit of infinitely many classical quasiparticles representing each physical quantum particle. In this work, we present a reductionist perspective on this problem based on the emerging methodology of coarse-graining. This perspective allows for the representations of one quantum particle with only two classical-like quasiparticles and their conjugate momenta. One of these coupled quasiparticles is the centroid particle of the quantum path integral quasiparticle distribution. Only this quasiparticle feels the potential energy function. The other quasiparticle directly provides the observable averages of quantum mechanical operators. The theory offers a simplified perspective on quantum statistical mechanics, revealing its most reductionist connection to classical statistical physics. By doing so, it can facilitate a simpler representation of certain quantum effects in complex molecular environments.

  20. A reductionist perspective on quantum statistical mechanics: Coarse-graining of path integrals

    Sinitskiy, Anton V.; Voth, Gregory A.

    2015-01-01

    Computational modeling of the condensed phase based on classical statistical mechanics has been rapidly developing over the last few decades and has yielded important information on various systems containing up to millions of atoms. However, if a system of interest contains important quantum effects, well-developed classical techniques cannot be used. One way of treating finite temperature quantum systems at equilibrium has been based on Feynman’s imaginary time path integral approach and the ensuing quantum-classical isomorphism. This isomorphism is exact only in the limit of infinitely many classical quasiparticles representing each physical quantum particle. In this work, we present a reductionist perspective on this problem based on the emerging methodology of coarse-graining. This perspective allows for the representations of one quantum particle with only two classical-like quasiparticles and their conjugate momenta. One of these coupled quasiparticles is the centroid particle of the quantum path integral quasiparticle distribution. Only this quasiparticle feels the potential energy function. The other quasiparticle directly provides the observable averages of quantum mechanical operators. The theory offers a simplified perspective on quantum statistical mechanics, revealing its most reductionist connection to classical statistical physics. By doing so, it can facilitate a simpler representation of certain quantum effects in complex molecular environments.

  1. The duration of uncertain times: audiovisual information about intervals is integrated in a statistically optimal fashion.

    Jess Hartcher-O'Brien

    Full Text Available Often multisensory information is integrated in a statistically optimal fashion where each sensory source is weighted according to its precision. This integration scheme is statistically optimal because it theoretically results in unbiased perceptual estimates with the highest precision possible. There is a current lack of consensus about how the nervous system processes multiple sensory cues to elapsed time. In order to shed light upon this, we adopt a computational approach to pinpoint the integration strategy underlying duration estimation of audio/visual stimuli. One of the assumptions of our computational approach is that the multisensory signals redundantly specify the same stimulus property. Our results clearly show that despite claims to the contrary, perceived duration is the result of an optimal weighting process, similar to that adopted for estimates of space. That is, participants weight the audio and visual information to arrive at the most precise, single duration estimate possible. The work also disentangles how different integration strategies - i.e. considering the time of onset/offset of signals - might alter the final estimate. As such we provide the first concrete evidence of an optimal integration strategy in human duration estimates.
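The precision-weighted integration rule described in this abstract has a compact closed form: each cue is weighted by its relative reliability (inverse variance). A minimal sketch of that rule, with made-up duration estimates and variances (a generic illustration, not the authors' analysis code):

```python
def fuse_estimates(x_a, var_a, x_v, var_v):
    """Statistically optimal (precision-weighted) fusion of two cues.

    Weights are proportional to inverse variance, so the fused estimate
    is unbiased and has the lowest variance of any linear combination.
    """
    w_a = (1.0 / var_a) / (1.0 / var_a + 1.0 / var_v)
    w_v = 1.0 - w_a
    fused = w_a * x_a + w_v * x_v
    fused_var = 1.0 / (1.0 / var_a + 1.0 / var_v)
    return fused, fused_var

# Hypothetical: auditory duration 500 ms (var 100), visual 540 ms (var 400)
est, est_var = fuse_estimates(500.0, 100.0, 540.0, 400.0)
```

Note that the fused variance (80 here) is smaller than either single-cue variance; that reduction is exactly the signature of optimal integration the study tests for.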

  2. A Unified Statistical Rain-Attenuation Model for Communication Link Fade Predictions and Optimal Stochastic Fade Control Design Using a Location-Dependent Rain-Statistic Database

    Manning, Robert M.

    1990-01-01

    A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics database is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0.5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.
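As a sketch of how such long-term fade statistics are used operationally, attenuation is often summarised by a lognormal distribution, and the link margin is chosen so that the exceedance probability meets the reliability target. The parameters below are hypothetical, not from the paper:

```python
import math

def exceedance_prob(a_db, mu, sigma):
    """P(attenuation > a_db) when ln(attenuation in dB) ~ N(mu, sigma^2)."""
    return 0.5 * math.erfc((math.log(a_db) - mu) / (sigma * math.sqrt(2.0)))

# Hypothetical location-dependent lognormal parameters
mu, sigma = 1.0, 0.5
p_5db = exceedance_prob(5.0, mu, sigma)    # outage probability with a 5 dB margin
p_10db = exceedance_prob(10.0, mu, sigma)  # larger margin, rarer outage
```

A fade-mitigation controller would compare these probabilities against the required link availability to size the margin.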

  3. Predicting losing and gaining river reaches in lowland New Zealand based on a statistical methodology

    Yang, Jing; Zammit, Christian; Dudley, Bruce

    2017-04-01

    The phenomenon of losing and gaining in rivers normally takes place in lowland areas where there are often various, sometimes conflicting uses for water resources, e.g., agriculture, industry, recreation, and maintenance of ecosystem function. To better support water allocation decisions, it is crucial to understand the location and seasonal dynamics of these losses and gains. We present a statistical methodology to predict losing and gaining river reaches in New Zealand based on 1) surveys of surface water and groundwater experts from regional government, 2) a collection of river/watershed characteristics, including climate, soil and hydrogeologic information, and 3) the random forests technique. The surveys on losing and gaining reaches were conducted face-to-face at 16 New Zealand regional government authorities, and climate, soil, river geometry, and hydrogeologic data from various sources were collected and compiled to represent river/watershed characteristics. The random forests technique was used to build a statistical relationship between river reach status (gain or loss) and river/watershed characteristics, and then to predict the status of river reaches at Strahler order one for which no prior losing/gaining information exists. Results show that the model has a classification error of around 10% for "gain" and "loss". The results will assist further research and water allocation decisions in lowland New Zealand.
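In practice this kind of model would use an off-the-shelf implementation such as scikit-learn's RandomForestClassifier. As a dependency-free sketch of the underlying recipe (bootstrap resampling, random feature selection, majority vote), using made-up reach characteristics rather than the study's data:

```python
import random

def fit_stump(rows, j):
    """Best single-threshold split on feature j; rows are (features, label)."""
    def majority(labels):
        return max(set(labels), key=labels.count) if labels else 0
    best = None
    for t in sorted({f[j] for f, _ in rows}):
        left = [y for f, y in rows if f[j] <= t]
        right = [y for f, y in rows if f[j] > t]
        lm, rm = majority(left), majority(right)
        acc = sum((lm if f[j] <= t else rm) == y for f, y in rows) / len(rows)
        if best is None or acc > best[0]:
            best = (acc, t, lm, rm)
    return (j,) + best[1:]

def fit_forest(rows, n_trees=51, seed=0):
    """Bagging + random feature choice: the core random-forests recipe."""
    rng = random.Random(seed)
    n_feat = len(rows[0][0])
    return [fit_stump([rng.choice(rows) for _ in rows], rng.randrange(n_feat))
            for _ in range(n_trees)]

def predict(forest, x):
    votes = [lm if x[j] <= t else rm for j, t, lm, rm in forest]
    return max(set(votes), key=votes.count)

# Hypothetical reach characteristics: (baseflow index, depth to water table),
# with label 1 = losing reach, 0 = gaining reach.
rows = [((0.2, 5.0), 1), ((0.3, 6.0), 1), ((0.25, 7.0), 1),
        ((0.8, 1.0), 0), ((0.7, 0.5), 0), ((0.9, 1.5), 0)]
forest = fit_forest(rows)
```

Each stump plays the role of a (very shallow) tree; a real random forest grows full trees over many more predictors.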

  4. Predicting future protection of respirator users: Statistical approaches and practical implications.

    Hu, Chengcheng; Harber, Philip; Su, Jing

    2016-01-01

    The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.
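Under a random-intercept mixed model, the prediction the article describes reduces to a conditional normal distribution: the future measurement is shrunk toward the population mean by the intraclass correlation. A sketch with hypothetical variance components on a log fit-factor scale (not the study's estimates):

```python
import math

def predict_future_fit(y1, mu, tau2, sigma2):
    """Conditional mean/variance of a future log fit factor given one past
    measurement, under a random-intercept model:
    y = mu + subject effect ~ N(0, tau2) + test error ~ N(0, sigma2)."""
    total = tau2 + sigma2
    icc = tau2 / total                  # within-subject correlation
    mean = mu + icc * (y1 - mu)         # shrink toward the population mean
    var = total - tau2 ** 2 / total     # narrower than the marginal variance
    return mean, var

def prob_above(threshold, mean, var):
    """P(future measurement > threshold) for that conditional normal."""
    z = (threshold - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical parameters on a log10 fit-factor scale
mean, var = predict_future_fit(y1=2.3, mu=2.0, tau2=0.09, sigma2=0.03)
p_pass = prob_above(2.0, mean, var)   # chance the user is still protected later
```

Setting the initial pass criterion amounts to choosing y1 so that p_pass meets the desired likelihood of future protection.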

  5. Predicting tube repair at French nuclear steam generators using statistical modeling

    Mathon, C., E-mail: cedric.mathon@edf.fr [EDF Generation, Basic Design Department (SEPTEN), 69628 Villeurbanne (France); Chaudhary, A. [EDF Generation, Basic Design Department (SEPTEN), 69628 Villeurbanne (France); Gay, N.; Pitner, P. [EDF Generation, Nuclear Operation Division (UNIE), Saint-Denis (France)

    2014-04-01

    Electricité de France (EDF) currently operates a total of 58 Nuclear Pressurized Water Reactors (PWR) which are composed of 34 units of 900 MWe, 20 units of 1300 MWe and 4 units of 1450 MWe. This report provides an overall status of SG tube bundles on the 1300 MWe units. These units are 4 loop reactors using the AREVA 68/19 type SG model which are equipped either with Alloy 600 thermally treated (TT) tubes or Alloy 690 TT tubes. As of 2011, the effective full power years of operation (EFPY) ranges from 13 to 20 and during this time, the main degradation mechanisms observed on SG tubes are primary water stress corrosion cracking (PWSCC) and wear at anti-vibration bars (AVB) level. Statistical models have been developed for each type of degradation in order to predict the growth rate and number of affected tubes. Additional plugging is also performed to prevent other degradations such as tube wear due to foreign objects or high-cycle flow-induced fatigue. The contribution of these degradation mechanisms on the rate of tube plugging is described. The results from the statistical models are then used in predicting the long-term life of the steam generators and therefore providing a useful tool toward their effective life management and possible replacement.

  6. Sparse Power-Law Network Model for Reliable Statistical Predictions Based on Sampled Data

    Alexander P. Kartun-Giles

    2018-04-01

    Full Text Available A projective network model is a model that enables predictions to be made based on a subsample of the network data, with the predictions remaining unchanged if a larger sample is taken into consideration. An exchangeable model is a model that does not depend on the order in which nodes are sampled. Despite a large variety of non-equilibrium (growing) and equilibrium (static) sparse complex network models that are widely used in network science, how to reconcile sparseness (constant average degree) with the desired statistical properties of projectivity and exchangeability is currently an outstanding scientific problem. Here we propose a network process with hidden variables which is projective and can generate sparse power-law networks. Despite the model not being exchangeable, it can be closely related to exchangeable uncorrelated networks as indicated by its information theory characterization and its network entropy. The use of the proposed network process as a null model is here tested on real data, indicating that the model offers a promising avenue for statistical network modelling.

  7. Neighborhood Integration and Connectivity Predict Cognitive Performance and Decline

    Amber Watts PhD

    2015-08-01

    Full Text Available Objective: Neighborhood characteristics may be important for promoting walking, but little research has focused on older adults, especially those with cognitive impairment. We evaluated the role of neighborhood characteristics on cognitive function and decline over a 2-year period adjusting for measures of walking. Method: In a study of 64 older adults with and without mild Alzheimer’s disease (AD), we evaluated neighborhood integration and connectivity using geographical information systems data and space syntax analysis. In multiple regression analyses, we used these characteristics to predict 2-year declines in factor analytically derived cognitive scores (attention, verbal memory, mental status), adjusting for age, sex, education, and self-reported walking. Results: Neighborhood integration and connectivity predicted cognitive performance at baseline, and changes in cognitive performance over 2 years. The relationships between neighborhood characteristics and cognitive performance were not fully explained by self-reported walking. Discussion: Clearer definitions of specific neighborhood characteristics associated with walkability are needed to better understand the mechanisms by which neighborhoods may impact cognitive outcomes. These results have implications for measuring neighborhood characteristics, design and maintenance of living spaces, and interventions to increase walking among older adults. We offer suggestions for future research measuring neighborhood characteristics and cognitive function.

  8. A generic statistical methodology to predict the maximum pit depth of a localized corrosion process

    Jarrah, A.; Bigerelle, M.; Guillemot, G.; Najjar, D.; Iost, A.; Nianga, J.-M.

    2011-01-01

    Highlights: → We propose a methodology to predict the maximum pit depth in a corrosion process. → Generalized Lambda Distribution and the Computer Based Bootstrap Method are combined. → The GLD fits a large variety of distributions both in their central and tail regions. → The minimum thickness preventing perforation can be estimated with a safety margin. → Considering its applications, this new approach can help to size industrial pieces. - Abstract: This paper outlines a new methodology to predict accurately the maximum pit depth related to a localized corrosion process. It combines two statistical methods: the Generalized Lambda Distribution (GLD), to determine a model of distribution fitting with the experimental frequency distribution of depths, and the Computer Based Bootstrap Method (CBBM), to generate simulated distributions equivalent to the experimental one. In comparison with conventionally established statistical methods that are restricted to the use of inferred distributions constrained by specific mathematical assumptions, the major advantage of the methodology presented in this paper is that both the GLD and the CBBM enable a statistical treatment of the experimental data without making any preconceived choice, either about the unknown theoretical parent distribution underlying the pit depths, which characterizes the global corrosion phenomenon, or about the unknown associated theoretical extreme value distribution, which characterizes the deepest pits. Considering an experimental distribution of depths of pits produced on an aluminium sample, estimations of maximum pit depth using a GLD model are compared to similar estimations based on the usual Gumbel and Generalized Extreme Value (GEV) methods proposed in the corrosion engineering literature. The GLD approach is shown to have smaller bias and dispersion in the estimation of the maximum pit depth than the Gumbel approach, both for its realization and mean. This leads to comparing the GLD approach to the GEV one
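The CBBM half of the methodology can be sketched in a few lines: resample the empirical depth distribution to simulate a larger inspected area and read off the distribution of simulated maxima (fitting the GLD itself requires a specialised library and is omitted; the depth data below are synthetic):

```python
import random

def bootstrap_max_depth(depths, area_factor, n_boot=2000, seed=1):
    """Computer-based bootstrap for the deepest pit expected when the sampled
    area is scaled up by area_factor: resample the empirical depth
    distribution and record the maximum of each simulated surface."""
    rng = random.Random(seed)
    maxima = sorted(
        max(rng.choice(depths) for _ in range(len(depths) * area_factor))
        for _ in range(n_boot))
    mean_max = sum(maxima) / n_boot
    upper95 = maxima[int(0.95 * n_boot)]   # safety-margin estimate
    return mean_max, upper95

# Synthetic pit depths in micrometres
depths = list(range(1, 101))
mean_max, upper95 = bootstrap_max_depth(depths, area_factor=5)
```

The 95th-percentile maximum is what would feed a minimum-thickness (perforation-prevention) calculation with a safety margin.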

  9. Relative effects of statistical preprocessing and postprocessing on a regional hydrological ensemble prediction system

    Sharma, Sanjib; Siddique, Ridwan; Reed, Seann; Ahnert, Peter; Mendoza, Pablo; Mejia, Alfonso

    2018-03-01

    The relative roles of statistical weather preprocessing and streamflow postprocessing in hydrological ensemble forecasting at short- to medium-range forecast lead times (day 1-7) are investigated. For this purpose, a regional hydrologic ensemble prediction system (RHEPS) is developed and implemented. The RHEPS is comprised of the following components: (i) hydrometeorological observations (multisensor precipitation estimates, gridded surface temperature, and gauged streamflow); (ii) weather ensemble forecasts (precipitation and near-surface temperature) from the National Centers for Environmental Prediction 11-member Global Ensemble Forecast System Reforecast version 2 (GEFSRv2); (iii) NOAA's Hydrology Laboratory-Research Distributed Hydrologic Model (HL-RDHM); (iv) heteroscedastic censored logistic regression (HCLR) as the statistical preprocessor; (v) two statistical postprocessors, an autoregressive model with a single exogenous variable (ARX(1,1)) and quantile regression (QR); and (vi) a comprehensive verification strategy. To implement the RHEPS, 1 to 7 days weather forecasts from the GEFSRv2 are used to force HL-RDHM and generate raw ensemble streamflow forecasts. Forecasting experiments are conducted in four nested basins in the US Middle Atlantic region, ranging in size from 381 to 12 362 km2. Results show that the HCLR preprocessed ensemble precipitation forecasts have greater skill than the raw forecasts. These improvements are more noticeable in the warm season at the longer lead times (> 3 days). Both postprocessors, ARX(1,1) and QR, show gains in skill relative to the raw ensemble streamflow forecasts, particularly in the cool season, but QR outperforms ARX(1,1). The scenarios that implement preprocessing and postprocessing separately tend to perform similarly, although the postprocessing-alone scenario is often more effective. The scenario involving both preprocessing and postprocessing consistently outperforms the other scenarios. In some cases

  10. Path integral molecular dynamics for exact quantum statistics of multi-electronic-state systems.

    Liu, Xinzijian; Liu, Jian

    2018-03-14

    An exact approach to compute physical properties for general multi-electronic-state (MES) systems in thermal equilibrium is presented. The approach extends our recent progress on path integral molecular dynamics (PIMD), Liu et al. [J. Chem. Phys. 145, 024103 (2016)] and Zhang et al. [J. Chem. Phys. 147, 034109 (2017)], for quantum statistical mechanics when a single potential energy surface is involved. We first define an effective potential function that is numerically favorable for MES-PIMD and then derive the corresponding estimators for evaluating various physical properties. Application to several representative one-dimensional and multi-dimensional models demonstrates that MES-PIMD in principle offers a practical tool, in either the diabatic or the adiabatic representation, for studying exact quantum statistics of complex/large MES systems when the Born-Oppenheimer, Condon, and harmonic bath approximations break down.

  11. Early prediction of lung cancer recurrence after stereotactic radiotherapy using second order texture statistics

    Mattonen, Sarah A.; Palma, David A.; Haasbeek, Cornelis J. A.; Senan, Suresh; Ward, Aaron D.

    2014-03-01

    Benign radiation-induced lung injury is a common finding following stereotactic ablative radiotherapy (SABR) for lung cancer, and is often difficult to differentiate from a recurring tumour due to the ablative doses and highly conformal treatment with SABR. Current approaches to treatment response assessment have shown limited ability to predict recurrence within 6 months of treatment. The purpose of our study was to evaluate the accuracy of second order texture statistics for prediction of eventual recurrence based on computed tomography (CT) images acquired within 6 months of treatment, and compare with the performance of first order appearance and lesion size measures. Consolidative and ground-glass opacity (GGO) regions were manually delineated on post-SABR CT images. Automatic consolidation expansion was also investigated to act as a surrogate for GGO position. The top features for prediction of recurrence were all texture features within the GGO and included energy, entropy, correlation, inertia, and first order texture (standard deviation of density). These predicted recurrence with 2-fold cross validation (CV) accuracies of 70-77% at 2-5 months post-SABR, with energy, entropy, and first order texture having leave-one-out CV accuracies greater than 80%. Our results also suggest that automatic expansion of the consolidation region could eliminate the need for manual delineation, and produced reproducible results when compared to manually delineated GGO. If validated on a larger data set, this could lead to a clinically useful computer-aided diagnosis system for prediction of recurrence within 6 months of SABR and allow for early salvage therapy for patients with recurrence.
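The second-order statistics named above (energy, entropy, correlation, inertia) are all derived from a grey-level co-occurrence matrix (GLCM). A minimal NumPy sketch for horizontally adjacent pixels (a real analysis would quantise the CT densities and average several offsets and directions, e.g. via scikit-image):

```python
import numpy as np

def glcm_features(q, levels):
    """Second-order (GLCM) texture statistics for horizontally adjacent
    pixels. q: 2-D integer array of grey levels in [0, levels)."""
    a, b = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (a, b), 1.0)     # count co-occurrences
    glcm = glcm + glcm.T             # symmetric: count both directions
    p = glcm / glcm.sum()            # normalise to joint probabilities
    i, j = np.indices(p.shape)
    mu = (i * p).sum()               # marginal mean (same for i and j)
    sig2 = ((i - mu) ** 2 * p).sum()
    nz = p[p > 0]
    return {
        "energy": (p ** 2).sum(),
        "entropy": -(nz * np.log2(nz)).sum(),
        "inertia": ((i - j) ** 2 * p).sum(),   # a.k.a. contrast
        "correlation": ((i - mu) * (j - mu) * p).sum() / sig2 if sig2 > 0 else 1.0,
    }

# Checkerboard test pattern: extreme second-order contrast
checker = np.indices((4, 4)).sum(axis=0) % 2
feats = glcm_features(checker, levels=2)
```

For the checkerboard, energy is 0.5, entropy 1 bit, inertia 1 and correlation -1, the maximal-contrast limit; a uniform region gives energy 1 and entropy 0.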

  12. REMAINING LIFE TIME PREDICTION OF BEARINGS USING K-STAR ALGORITHM – A STATISTICAL APPROACH

    R. SATISHKUMAR

    2017-01-01

    Full Text Available The role of bearings is significant in reducing the downtime of all rotating machineries. The increasing trend of bearing failures in recent times has highlighted the need for, and importance of, condition monitoring. Multiple factors are associated with a bearing failure while it is in operation; hence, a predictive strategy is required to evaluate the current state of bearings in operation. In the past, predictive models based on regression techniques were widely used for bearing lifetime estimation. The objective of this paper is to estimate the remaining useful life of bearings through a machine learning approach, with the ultimate aim of strengthening predictive maintenance. The present study used a classification approach following the concepts of machine learning, and a predictive model was built to calculate the residual lifetime of bearings in operation. Vibration signals were acquired on a continuous basis from an experiment in which new bearings were run at pre-defined load and speed conditions until they failed naturally. Statistical features were extracted, feature selection was carried out using a J48 decision tree, and the selected features were used to develop the prognostic model. The K-Star classification algorithm, a supervised machine learning technique, was used to build a predictive model that estimates the lifetime of bearings. The performance of the classifier was cross-validated with distinct data. The results show that the K-Star classification model gives 98.56% classification accuracy with the selected features.
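K-Star is an entropy-based instance classifier available in Weka rather than in standard Python libraries; a minimal stand-in from the same instance-based family is k-nearest neighbours. The vibration features below (RMS, kurtosis) and labels are made up for illustration:

```python
import math

def knn_predict(train, x, k=3):
    """Classify bearing state by majority vote of the k nearest training
    instances (Euclidean distance), an instance-based stand-in for K-Star."""
    neighbours = sorted((math.dist(f, x), y) for f, y in train)[:k]
    votes = [y for _, y in neighbours]
    return max(set(votes), key=votes.count)

# Hypothetical (RMS, kurtosis) vibration features labelled by condition
train = [((0.20, 3.0), "healthy"), ((0.25, 3.1), "healthy"),
         ((0.30, 3.2), "healthy"), ((1.40, 7.5), "near-failure"),
         ((1.50, 8.0), "near-failure"), ((1.60, 9.0), "near-failure")]
```

In the paper's setting, the class labels would be life-stage bins derived from run-to-failure data, and features would first pass through J48-based selection.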

  13. Generalized Hamiltonians, functional integration and statistics of continuous fluids and plasmas

    Tasso, H.

    1985-05-01

    Generalized Hamiltonian formalism including generalized Poisson brackets and Lie-Poisson brackets is presented in Section II. Gyroviscous magnetohydrodynamics is treated as a relevant example in Euler and Clebsch variables. Section III is devoted to a short review of functional integration containing the definition and a discussion of ambiguities and methods of evaluation. The main part of the contribution is given in Section IV, where some of the content of the previous sections is applied to Gibbs statistics of continuous fluids and plasmas. In particular, exact fluctuation spectra are calculated for relevant equations in fluids and plasmas. (orig.)

  14. DYNAMIC STABILITY OF THE SOLAR SYSTEM: STATISTICALLY INCONCLUSIVE RESULTS FROM ENSEMBLE INTEGRATIONS

    Zeebe, Richard E., E-mail: zeebe@soest.hawaii.edu [School of Ocean and Earth Science and Technology, University of Hawaii at Manoa, 1000 Pope Road, MSB 629, Honolulu, HI 96822 (United States)

    2015-01-01

    Due to the chaotic nature of the solar system, the question of its long-term stability can only be answered in a statistical sense, for instance, based on numerical ensemble integrations of nearby orbits. Destabilization of the inner planets, leading to close encounters and/or collisions, can be initiated through a large increase in Mercury's eccentricity, with a currently assumed likelihood of ∼1%. However, little is known at present about the robustness of this number. Here I report ensemble integrations of the full equations of motion of the eight planets and Pluto over 5 Gyr, including contributions from general relativity. The results show that different numerical algorithms lead to statistically different results for the evolution of Mercury's eccentricity (e_M). For instance, starting at present initial conditions (e_M ≃ 0.21), Mercury's maximum eccentricity achieved over 5 Gyr is, on average, significantly higher in symplectic ensemble integrations using heliocentric rather than Jacobi coordinates and stricter error control. In contrast, starting at a possible future configuration (e_M ≃ 0.53), Mercury's maximum eccentricity achieved over the subsequent 500 Myr is, on average, significantly lower using heliocentric rather than Jacobi coordinates. For example, the probability for e_M to increase beyond 0.53 over 500 Myr is >90% (Jacobi) versus only 40%-55% (heliocentric). This poses a dilemma because the physical evolution of the real system, and its probabilistic behavior, cannot depend on the coordinate system or the numerical algorithm chosen to describe it. Some tests of the numerical algorithms suggest that symplectic integrators using heliocentric coordinates underestimate the odds for destabilization of Mercury's orbit at high initial e_M.
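A practical side point of such ensemble studies is how uncertain a probability estimated from a finite ensemble is. A sketch using the Wilson score interval, with hypothetical ensemble sizes (not the paper's actual counts):

```python
import math

def wilson_interval(k, n, z=1.96):
    """95% Wilson score interval for a probability estimated as k
    'destabilised' outcomes out of n ensemble members."""
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# E.g. 1 destabilisation in a hypothetical 100-member ensemble (~1% estimate)
lo, hi = wilson_interval(1, 100)
```

With only ~100 members, a 1% point estimate carries a 95% interval of roughly 0.2%-5%, which illustrates why the abstract stresses that little is known about the robustness of the ∼1% figure.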

  15. Predicting the accumulated number of plugged tubes in a steam generator using statistical methodologies

    Ferng, Y.-M.; Fan, C.N.; Pei, B.S.; Li, H.-N.

    2008-01-01

    A steam generator (SG) plays a significant role not only with respect to the primary-to-secondary heat transfer but also as a fission product barrier to prevent the release of radionuclides. Tube plugging is an efficient way to avoid releasing radionuclides when SG tubes are severely degraded. However, this remedial action may decrease the SG heat transfer capability, especially in transient or accident conditions. It is therefore crucial for the plant staff to understand the trend in plugged tubes for SG operation and maintenance. Statistical methodologies are proposed in this paper to predict this trend. The accumulated numbers of SG plugged tubes versus operation time are predicted using the Weibull and log-normal distributions, which correspond well with plant-measured data from a selected pressurized water reactor (PWR). With the help of these predictions, the accumulated number of SG plugged tubes can be reasonably extrapolated to the 40-year operation lifetime (or even beyond 40 years) of a PWR. This information can assist plant policymakers in determining whether or when a SG must be replaced.
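The Weibull extrapolation step can be sketched directly from the distribution's closed form; the shape, scale and tube count below are hypothetical stand-ins for parameters that would be fitted to plant data:

```python
import math

def expected_plugged(t, shape, scale, n_tubes):
    """Expected accumulated number of plugged tubes after t effective full
    power years, with time-to-plugging ~ Weibull(shape, scale)."""
    return n_tubes * (1.0 - math.exp(-(t / scale) ** shape))

# Hypothetical parameters: 5000 tubes, shape 3 (wear-out), scale 60 EFPY
at_20 = expected_plugged(20.0, 3.0, 60.0, 5000)
at_40 = expected_plugged(40.0, 3.0, 60.0, 5000)   # 40-year extrapolation
```

Comparing the extrapolated count at end of life against the plant's plugging limit is what informs the repair-versus-replace decision described above.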

  16. Response statistics of rotating shaft with non-linear elastic restoring forces by path integration

    Gaidai, Oleg; Naess, Arvid; Dimentberg, Michael

    2017-07-01

    Extreme statistics of random vibrations is studied for a Jeffcott rotor under uniaxial white noise excitation. The restoring force is modelled as elastic and non-linear; a comparison is made with a linearized restoring force to see the effect of force non-linearity on the response statistics. While analytical solutions and stability conditions are available for the linear model, this is not generally the case for the non-linear system, except in some special cases. The statistics of the non-linear case are studied by applying the path integration (PI) method, which is based on the Markov property of the coupled dynamic system. The Jeffcott rotor response statistics can be obtained by solving the Fokker-Planck (FP) equation of the 4D dynamic system. An efficient implementation of the PI algorithm is applied; namely, the fast Fourier transform (FFT) is used to simulate the additive noise of the dynamic system. The latter significantly reduces computational time compared to the classical PI. Excitation is modelled as Gaussian white noise; however, white noise with any distribution can be implemented with the same PI technique. Multidirectional Markov noise can also be modelled with PI in the same way as unidirectional noise. PI is accelerated by using a Monte Carlo (MC) estimated joint probability density function (PDF) as initial input. The symmetry of the dynamic system was utilized to afford higher mesh resolution. Both internal (rotating) and external damping are included in the mechanical model of the rotor. The main advantage of using PI rather than MC is that PI offers high accuracy in the tail of the probability distribution. The latter is of critical importance for, e.g., extreme value statistics, system reliability, and first passage probability.

  17. Proposal for future diagnosis and management of vascular tumors by using automatic software for image processing and statistic prediction.

    Popescu, M D; Draghici, L; Secheli, I; Secheli, M; Codrescu, M; Draghici, I

    2015-01-01

    Infantile Hemangiomas (IH) are the most frequent tumors of vascular origin, and the differential diagnosis from vascular malformations is difficult to establish. Specific types of IH, due to their location, dimensions and fast evolution, can cause important functional and esthetic sequelae. To avoid these unfortunate consequences it is necessary to establish the exact appropriate moment to begin treatment and to decide which therapeutic procedure is most adequate. Based on clinical data collected through serial clinical observations, correlated with imaging data and processed by a computer-aided diagnosis (CAD) system, the study aimed to develop a treatment algorithm to accurately predict the best final result, from the esthetic and functional point of view, for a certain type of lesion. The preliminary database was composed of 75 patients divided into 4 groups according to the treatment management they received: medical therapy, sclerotherapy, surgical excision and no treatment. Serial clinical observations were performed each month and all the data were processed using CAD. The project goal was to create software that incorporates advanced methods to accurately measure the specific IH lesions, integrates medical information, and uses statistical and computational methods to correlate this information with that obtained from image processing. Based on these correlations, a mechanism for predicting the evolution of the hemangioma was established, which helps determine the best method of therapeutic intervention to minimize further complications.

  18. Integrated Detection and Prediction of Influenza Activity for Real-Time Surveillance: Algorithm Design.

    Spreco, Armin; Eriksson, Olle; Dahlström, Örjan; Cowling, Benjamin John; Timpka, Toomas

    2017-06-15

    Influenza is a viral respiratory disease capable of causing epidemics that represent a threat to communities worldwide. The rapidly growing availability of electronic "big data" from diagnostic and prediagnostic sources in health care and public health settings permits advance of a new generation of methods for local detection and prediction of winter influenza seasons and influenza pandemics. The aim of this study was to present a method for integrated detection and prediction of influenza virus activity in local settings using electronically available surveillance data and to evaluate its performance by retrospective application on authentic data from a Swedish county. An integrated detection and prediction method was formally defined based on a design rationale for influenza detection and prediction methods adapted for local surveillance. The novel method was retrospectively applied on data from the winter influenza season 2008-09 in a Swedish county (population 445,000). Outcome data represented individuals who met a clinical case definition for influenza (based on International Classification of Diseases version 10 [ICD-10] codes) from an electronic health data repository. Information from calls to a telenursing service in the county was used as a syndromic data source. The novel integrated detection and prediction method is based on nonmechanistic statistical models and is designed for integration in local health information systems. The method is divided into separate modules for detection and prediction of local influenza virus activity. The function of the detection module is to alert for an upcoming period of increased load of influenza cases on local health care (using influenza-diagnosis data), whereas the function of the prediction module is to predict the timing of the activity peak (using syndromic data) and its intensity (using influenza-diagnosis data). For detection modeling, exponential regression was used based on the assumption that the beginning
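The detection module's exponential-regression idea can be sketched as a log-linear fit over a trailing window of daily counts, alerting when the fitted growth rate exceeds a threshold (the window length, threshold, and counts below are made up, not the study's calibrated values):

```python
import math

def detect_onset(counts, window=7, growth_threshold=0.05):
    """Fit exponential growth (linear regression on log counts) over the
    trailing window; alert when the fitted daily growth rate exceeds
    the threshold."""
    xs = range(window)
    ys = [math.log(c + 1.0) for c in counts[-window:]]  # +1 guards zeros
    xm = (window - 1) / 2.0
    ym = sum(ys) / window
    slope = (sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
             / sum((x - xm) ** 2 for x in xs))
    return slope > growth_threshold, slope

flat = [10, 11, 9, 10, 10, 11, 10]          # pre-season noise: no alert
rising = [10, 13, 17, 22, 29, 38, 50]       # ~30% daily growth: alert
```

In the full method this alert would then trigger the prediction module, which estimates peak timing from the syndromic (telenursing) series.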

  19. Overview of statistical methods, models and analysis for predicting equipment end of life

    NONE

    2009-07-01

    Utility equipment can be operated and maintained for many years following installation. However, as the equipment ages, utility operators must decide whether to extend the service life or replace the equipment. Condition assessment modelling is used by many utilities to determine the condition of equipment and to prioritize the maintenance or repair. Several factors are weighted and combined in assessment modelling, which gives a single index number to rate the equipment. There is speculation that this index alone may not be adequate for a business case to rework or replace an asset because it only ranks an asset into a particular category. For that reason, a new methodology was developed to determine the economic end of life of an asset. This paper described the different statistical methods available and their use in determining the remaining service life of electrical equipment. A newly developed Excel-based demonstration computer tool is also an integral part of the deliverables of this project.

  20. Integration of statistical modeling and high-content microscopy to systematically investigate cell-substrate interactions.

    Chen, Wen Li Kelly; Likhitpanichkul, Morakot; Ho, Anthony; Simmons, Craig A

    2010-03-01

    Cell-substrate interactions are multifaceted, involving the integration of various physical and biochemical signals. The interactions among these microenvironmental factors cannot be facilely elucidated and quantified by conventional experimentation, and necessitate multifactorial strategies. Here we describe an approach that integrates statistical design and analysis of experiments with automated microscopy to systematically investigate the combinatorial effects of substrate-derived stimuli (substrate stiffness and matrix protein concentration) on mesenchymal stem cell (MSC) spreading, proliferation and osteogenic differentiation. C3H10T1/2 cells were grown on type I collagen- or fibronectin-coated polyacrylamide hydrogels with tunable mechanical properties. Experimental conditions, which were defined according to central composite design, consisted of specific permutations of substrate stiffness (3-144 kPa) and adhesion protein concentration (7-520 µg/mL). Spreading area, BrdU incorporation and Runx2 nuclear translocation were quantified using high-content microscopy and modeled as mathematical functions of substrate stiffness and protein concentration. The resulting response surfaces revealed distinct patterns of protein-specific, substrate stiffness-dependent modulation of MSC proliferation and differentiation, demonstrating the advantage of statistical modeling in the detection and description of higher-order cellular responses. In a broader context, this approach can be adapted to study other types of cell-material interactions and can facilitate the efficient screening and optimization of substrate properties for applications involving cell-material interfaces. Copyright 2009 Elsevier Ltd. All rights reserved.
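The statistical-design half of this approach typically ends in a second-order response-surface fit over the two factors. A NumPy sketch on a synthetic coded design (illustrative coefficients, not the study's data):

```python
import numpy as np

def fit_response_surface(x, y, z):
    """Least-squares fit of a full second-order response surface
    z ~ b0 + b1*x + b2*y + b3*x^2 + b4*y^2 + b5*x*y, the standard model
    paired with a central composite design."""
    X = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x*y])
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

# Synthetic design: coded stiffness/protein levels on a 3x3 grid
gx, gy = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
x, y = gx.ravel(), gy.ravel()
true_beta = np.array([1.0, 2.0, 0.5, -0.1, 0.2, 0.3])
z = (true_beta[0] + true_beta[1] * x + true_beta[2] * y
     + true_beta[3] * x**2 + true_beta[4] * y**2 + true_beta[5] * x * y)
beta = fit_response_surface(x, y, z)
```

The fitted cross-term (b5) is what captures the protein-by-stiffness interaction highlighted in the abstract.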

  1. Measuring the data universe: data integration using statistical data and metadata exchange

    Stahl, Reinhold

    2018-01-01

    This richly illustrated book provides an easy-to-read introduction to the challenges of organizing and integrating modern data worlds, explaining the contribution of public statistics and the ISO standard SDMX (Statistical Data and Metadata Exchange). As such, it is a must for data experts as well as those aspiring to become one. Today, exponentially growing data worlds are increasingly determining our professional and private lives. The rapid increase in the amount of globally available data, fueled by search engines and social networks but also by new technical possibilities such as Big Data, offers great opportunities. But whatever the undertaking – driving the blockchain revolution or making smartphones even smarter – success will be determined by how well it is possible to integrate, i.e. to collect, link and evaluate, the required data. One crucial factor in this is the introduction of a cross-domain order system in combination with a standardization of the data structure. Using everyday examples, th...

  2. Five year prediction of Sea Surface Temperature in the Tropical Atlantic: a comparison of simple statistical methods

    Laepple, Thomas; Jewson, Stephen; Meagher, Jonathan; O'Shay, Adam; Penzer, Jeremy

    2007-01-01

    We are developing schemes that predict future hurricane numbers by first predicting future sea surface temperatures (SSTs), and then applying the observed statistical relationship between SST and hurricane numbers. As part of this overall goal, in this study we compare the historical performance of three simple statistical methods for making five-year SST forecasts. We also present SST forecasts for 2006-2010 using these methods and compare them to forecasts made from two structural time series ...

  3. Archaeology Through Computational Linguistics: Inscription Statistics Predict Excavation Sites of Indus Valley Artifacts.

    Recchia, Gabriel L; Louwerse, Max M

    2016-11-01

    Computational techniques comparing co-occurrences of city names in texts allow the relative longitudes and latitudes of cities to be estimated algorithmically. However, these techniques have not been applied to estimate the provenance of artifacts with unknown origins. Here, we estimate the geographic origin of artifacts from the Indus Valley Civilization, applying methods commonly used in cognitive science to the Indus script. We show that these methods can accurately predict the relative locations of archeological sites on the basis of artifacts of known provenance, and we further apply these techniques to determine the most probable excavation sites of four sealings of unknown provenance. These findings suggest that inscription statistics reflect historical interactions among locations in the Indus Valley region, and they illustrate how computational methods can help localize inscribed archeological artifacts of unknown origin. The success of this method offers opportunities for the cognitive sciences in general and for computational anthropology specifically. Copyright © 2015 Cognitive Science Society, Inc.

  4. Multiple-point statistical prediction on fracture networks at Yucca Mountain

    Liu, X.Y; Zhang, C.Y.; Liu, Q.S.; Birkholzer, J.T.

    2009-01-01

    In many underground nuclear waste repository systems, such as at Yucca Mountain, the water flow rate and the amount of water seepage into the waste emplacement drifts are mainly determined by the hydrological properties of the fracture network in the surrounding rock mass. A natural fracture network system is not easy to describe, especially with respect to its connectivity, which is critically important for simulating the water flow field. In this paper, we introduce a new method for fracture network description and prediction, termed multiple-point statistics (MPS). The MPS method records multiple-point statistics concerning the connectivity patterns of a fracture network from a known fracture map, and reproduces multiple-scale training fracture patterns in a stochastic manner, implicitly and directly. It is applied to fracture data to study flow-field behavior at the Yucca Mountain waste repository system. First, the MPS method is used to create a fracture network with an original fracture training image from the Yucca Mountain dataset. After we adopt a harmonic and arithmetic average method to upscale the permeability to a coarse grid, a THM simulation is carried out to study near-field water flow around the waste emplacement drifts. Our study shows that the connectivity or patterns of fracture networks can be grasped and reconstructed by MPS methods. In theory, this will lead to better prediction of fracture system characteristics and flow behavior. Meanwhile, we can obtain the variance of the flow field, which gives us a way to quantify model uncertainty even in complicated coupled THM simulations. This indicates that MPS can potentially characterize and reconstruct natural fracture networks in a fractured rock mass, with the advantage of quantifying the connectivity of the fracture system and its simulation uncertainty simultaneously.

  5. Shape-correlated deformation statistics for respiratory motion prediction in 4D lung

    Liu, Xiaoxiao; Oguz, Ipek; Pizer, Stephen M.; Mageras, Gig S.

    2010-02-01

    4D image-guided radiation therapy (IGRT) for free-breathing lungs is challenging due to the complicated respiratory dynamics. Effective modeling of respiratory motion is crucial to account for the effects of motion on the dose to tumors. We propose a shape-correlated statistical model on dense image deformations for patient-specific respiratory motion estimation in 4D lung IGRT. Using the shape deformations of the high-contrast lungs as the surrogate, the statistical model trained from the planning CTs can be used to predict the image deformation at delivery verification time, with the assumption that the respiratory motion at both times is similar for the same patient. Dense image deformation fields obtained by diffeomorphic image registrations characterize the respiratory motion within one breathing cycle. A point-based particle optimization algorithm is used to obtain the shape models of lungs with group-wise surface correspondences. Canonical correlation analysis (CCA) is adopted in training to maximize the linear correlation between the shape variations of the lungs and the corresponding dense image deformations. Both intra- and inter-session CT studies are carried out on a small group of lung cancer patients and evaluated in terms of tumor location accuracy. The results suggest potential applications using the proposed method.

  6. Microgravity Disturbance Predictions in the Combustion Integrated Rack

    Just, M.; Grodsinsky, Carlos M.

    2002-01-01

    This paper will focus on the approach used to characterize microgravity disturbances in the Combustion Integrated Rack (CIR), currently scheduled for launch to the International Space Station (ISS) in 2005. Microgravity experiments contained within the CIR are extremely sensitive to vibratory and transient disturbances originating on-board and off-board the rack. Therefore, several techniques are implemented to isolate the critical science locations from external vibration. A combined testing and analysis approach is utilized to predict the resulting microgravity levels at the critical science location. The major topics addressed are: 1) CIR Vibration Isolation Approaches, 2) Disturbance Sources and Characterization, 3) Microgravity Predictive Modeling, 4) Science Microgravity Requirements, 5) Microgravity Control, and 6) On-Orbit Disturbance Measurement. The CIR uses the Passive Rack Isolation System (PaRIS) to isolate the rack from off-board disturbances. Through this system, the CIR is connected to the U.S. Lab module structure by either 13 or 14 umbilical lines and 8 spring/damper isolators. Some on-board CIR disturbers are locally isolated by grommets or wire ropes. The CIR's environmental and science on-board support equipment, such as air circulation fans, pumps, water flow, air flow, solenoid valves, and computer hard drives, causes disturbances within the rack. These disturbers, along with the rack structure, must be characterized to predict whether the on-orbit vibration levels during experimentation exceed the specified science microgravity vibration level requirements. Both vibratory and transient disturbance conditions are addressed. Disturbance levels/analytical inputs are obtained for each individual disturber in a "free floating" condition in the Glenn Research Center (GRC) Microgravity Emissions Lab (MEL). Flight spare hardware is tested on an Orbital Replacement Unit (ORU) basis. Based on test and analysis, maximum disturbance level

  7. Model Predictive Control of Integrated Gasification Combined Cycle Power Plants

    B. Wayne Bequette; Priyadarshi Mahapatra

    2010-08-31

    The primary project objectives were to understand how the process design of an integrated gasification combined cycle (IGCC) power plant affects the dynamic operability and controllability of the process. Steady-state and dynamic simulation models were developed to predict the process behavior during typical transients that occur in plant operation. Advanced control strategies were developed to improve the ability of the process to follow changes in the power load demand, and to improve performance during transitions between power levels. Another objective of the proposed work was to educate graduate and undergraduate students in the application of process systems and control to coal technology. Educational materials were developed for use in engineering courses to further broaden this exposure to many students. ASPENTECH software was used to perform steady-state and dynamic simulations of an IGCC power plant. Linear systems analysis techniques were used to assess the steady-state and dynamic operability of the power plant under various plant operating conditions. Model predictive control (MPC) strategies were developed to improve the dynamic operation of the power plants. MATLAB and SIMULINK software were used for systems analysis and control system design, and the SIMULINK functionality in ASPEN DYNAMICS was used to test the control strategies on the simulated process. Project funds were used to support a Ph.D. student to receive education and training in coal technology and the application of modeling and simulation techniques.

  8. Estimating cross-validatory predictive p-values with integrated importance sampling for disease mapping models.

    Li, Longhai; Feng, Cindy X; Qiu, Shi

    2017-06-30

    An important statistical task in disease mapping problems is to identify divergent regions with unusually high or low risk of disease. Leave-one-out cross-validatory (LOOCV) model assessment is the gold standard for estimating predictive p-values that can flag such divergent regions. However, actual LOOCV is time-consuming, because one needs to rerun a Markov chain Monte Carlo analysis for each posterior distribution in which an observation is held out as a test case. This paper introduces a new method, called integrated importance sampling (iIS), for estimating LOOCV predictive p-values using only Markov chain samples drawn from the posterior based on the full data set. The key step in iIS is that we integrate away the latent variables associated with the test observation with respect to their conditional distribution, without reference to the actual observation. Following the general theory of importance sampling, the formula used by iIS can be proved to be equivalent to the LOOCV predictive p-value. We compare iIS with three other existing methods from the literature on two disease mapping datasets. Our empirical results show that the predictive p-values estimated with iIS are almost identical to the predictive p-values estimated with actual LOOCV, and outperform those given by the three existing methods, namely posterior predictive checking, ordinary importance sampling, and the ghosting method of Marshall and Spiegelhalter (2003). Copyright © 2017 John Wiley & Sons, Ltd.
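
    The ordinary importance-sampling estimator that iIS improves on can be sketched for a toy Poisson disease-count model. Everything below (the Gamma-shaped posterior draws, the observed count) is invented for illustration and is not from the paper:

```python
import math
import random

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_tail(y, lam):
    # P(Y >= y) for Y ~ Poisson(lam)
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(y))

def loocv_pvalue_is(y_obs, theta_samples):
    """Ordinary importance-sampling estimate of the LOOCV predictive
    p-value P(y_rep >= y_i | y_{-i}), using only full-data posterior draws:
    each draw is weighted by 1 / p(y_i | theta) to 'remove' the held-out point."""
    num = den = 0.0
    for th in theta_samples:
        w = 1.0 / poisson_pmf(y_obs, th)
        num += w * poisson_tail(y_obs, th)
        den += w
    return num / den

random.seed(1)
# hypothetical posterior draws of one region's Poisson rate
draws = [random.gammavariate(20, 0.5) for _ in range(2000)]
p = loocv_pvalue_is(25, draws)
print(round(p, 3))
```

    The iIS refinement replaces the raw likelihood in the weight with the likelihood integrated over the region's latent variables, which stabilizes exactly the weights that make this ordinary estimator unreliable.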

  9. Integrating statistical and process-based models to produce probabilistic landslide hazard at regional scale

    Strauch, R. L.; Istanbulluoglu, E.

    2017-12-01

    We develop a landslide hazard modeling approach that integrates a data-driven statistical model and a probabilistic process-based shallow landslide model for mapping the probability of landslide initiation, transport, and deposition at regional scales. The empirical model integrates the influence of seven site attribute (SA) classes: elevation, slope, curvature, aspect, land use-land cover, lithology, and topographic wetness index, on over 1,600 observed landslides using a frequency ratio (FR) approach. A susceptibility index is calculated by adding the FRs for each SA on a grid-cell basis. Using landslide observations, we relate the susceptibility index to an empirically derived probability of landslide impact. This probability is combined with results from a physically based model to produce an integrated probabilistic map. Slope was key in landslide initiation, while deposition was linked to lithology and elevation. Vegetation transition from forest to alpine vegetation and barren land cover with lower root cohesion leads to a higher frequency of initiation. Aspect effects are likely linked to differences in root cohesion and moisture controlled by solar insolation and snow. We demonstrate the model in the North Cascades of Washington, USA, and identify locations of high and low probability of landslide impacts that can be used by land managers in their design, planning, and maintenance.
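
    The frequency-ratio bookkeeping can be sketched with hypothetical tallies; the attribute classes and counts below are invented, and only the additive susceptibility index follows the description above:

```python
# Frequency ratio (FR) for one site-attribute (SA) class:
# (share of landslides in the class) / (share of grid cells in the class).
# A cell's susceptibility index is the sum of the FRs of its classes.
def frequency_ratio(slides_in_class, total_slides, cells_in_class, total_cells):
    return (slides_in_class / total_slides) / (cells_in_class / total_cells)

# hypothetical tallies for two attributes (slope class, lithology class)
fr_slope_steep = frequency_ratio(900, 1600, 2_000, 10_000)  # steep-slope class
fr_litho_weak  = frequency_ratio(700, 1600, 1_500, 10_000)  # weak-lithology class

# susceptibility index for a grid cell that is steep with weak lithology
susceptibility = fr_slope_steep + fr_litho_weak
print(round(fr_slope_steep, 3), round(fr_litho_weak, 3), round(susceptibility, 3))
```

    An FR above 1 means landslides are over-represented in that class, so summing FRs rewards cells that stack several landslide-prone attributes.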

  10. Prediction of thrombophilia in patients with unexplained recurrent pregnancy loss using a statistical model.

    Wang, Tongfei; Kang, Xiaomin; He, Liying; Liu, Zhilan; Xu, Haijing; Zhao, Aimin

    2017-09-01

    To establish a statistical model to predict thrombophilia in patients with unexplained recurrent pregnancy loss (URPL), a retrospective case-control study was conducted at Ren Ji Hospital, Shanghai, China, from March 2014 to October 2016. The levels of D-dimer (DD), fibrinogen degradation products (FDP), activated partial thromboplastin time (APTT), prothrombin time (PT), thrombin time (TT), fibrinogen (Fg), and platelet aggregation in response to arachidonic acid (AA) and adenosine diphosphate (ADP) were collected. Receiver operating characteristic curve analysis was used to analyze data from 158 URPL patients (≥3 previous first-trimester pregnancy losses with unexplained etiology) and 131 non-RPL patients (no history of recurrent pregnancy loss). A logistic regression model (LRM) was built and externally validated in another group of patients. The LRM included AA, DD, FDP, TT, APTT, and PT. The overall accuracy of the LRM was 80.9%, with a sensitivity of 78.5% and a specificity of 78.3% at a diagnostic probability threshold of 0.6492. Subsequently, the LRM was validated with an overall accuracy of 83.6%. The LRM is a valuable model for the prediction of thrombophilia in URPL patients. © 2017 International Federation of Gynecology and Obstetrics.
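
    A minimal sketch of how such an LRM scores a patient. The coefficients, intercept, and patient values below are invented for illustration (the abstract does not report the fitted coefficients); only the six predictors and the 0.6492 threshold are taken from the source:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical (illustrative) coefficients for the six predictors the paper
# reports (AA, DD, FDP, TT, APTT, PT) -- not the authors' fitted values.
coef = {"AA": 0.04, "DD": 1.8, "FDP": 0.15, "TT": -0.2, "APTT": -0.1, "PT": -0.3}
intercept = 2.0
THRESHOLD = 0.6492  # diagnostic probability cut-off reported in the abstract

def predict_thrombophilia(features):
    """Return (predicted probability, positive-at-threshold flag)."""
    z = intercept + sum(coef[k] * v for k, v in features.items())
    prob = logistic(z)
    return prob, prob >= THRESHOLD

# hypothetical lab panel for one patient
patient = {"AA": 62.0, "DD": 0.8, "FDP": 4.2, "TT": 16.5, "APTT": 28.0, "PT": 11.0}
prob, positive = predict_thrombophilia(patient)
print(round(prob, 3), positive)
```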

  11. Development of the statistical ARIMA model: an application for predicting the upcoming of MJO index

    Hermawan, Eddy; Nurani Ruchjana, Budi; Setiawan Abdullah, Atje; Gede Nyoman Mindra Jaya, I.; Berliana Sipayung, Sinta; Rustiana, Shailla

    2017-10-01

    This study concerns one of the most important equatorial atmospheric phenomena, the Madden-Julian Oscillation (MJO), which has strong impacts on extreme rainfall anomalies over the Indonesian Maritime Continent (IMC). We focus on the big floods over Jakarta and the surrounding area suspected to be caused by the MJO. We concentrate on modelling the MJO index, represented by the RMM (Real-time Multivariate MJO) indices RMM1 and RMM2, using the Box-Jenkins (ARIMA) statistical approach. Developing the model involves several steps, from data identification through model estimation and selection, before the model is finally applied to investigate the big floods that occurred in Jakarta in 1996, 2002, and 2007, respectively. We found that the best estimated model for predicting RMM1 and RMM2 is ARIMA(2,1,2). Detailed steps showing how the model is built and applied to predict the rainfall anomalies over Jakarta 3 to 6 months ahead are discussed in this paper.
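
    The Box-Jenkins fit can be illustrated with an ARI(2,1) skeleton, i.e. the differencing and AR(2) parts of ARIMA(2,1,2) with the MA terms omitted for brevity. The series below is synthetic, a stand-in for an RMM index, not real MJO data:

```python
import random

def difference(x):
    return [x[t] - x[t - 1] for t in range(1, len(x))]

def fit_ar2(w):
    # least-squares AR(2) on the differenced series: w_t ~ a1*w_{t-1} + a2*w_{t-2}
    s11 = s22 = s12 = b1 = b2 = 0.0
    for t in range(2, len(w)):
        x1, x2, y = w[t - 1], w[t - 2], w[t]
        s11 += x1 * x1; s22 += x2 * x2; s12 += x1 * x2
        b1 += x1 * y;   b2 += x2 * y
    det = s11 * s22 - s12 * s12
    return (b1 * s22 - b2 * s12) / det, (s11 * b2 - s12 * b1) / det

def forecast(x, a1, a2, steps):
    w = difference(x)
    level = x[-1]
    out = []
    for _ in range(steps):
        w_next = a1 * w[-1] + a2 * w[-2]
        w.append(w_next)
        level += w_next  # undo the differencing to recover the level
        out.append(level)
    return out

random.seed(7)
x = [0.0]
for _ in range(400):  # synthetic, persistent index series
    x.append(0.98 * x[-1] + random.gauss(0.0, 0.3))
a1, a2 = fit_ar2(difference(x))
preds = forecast(x, a1, a2, 6)
print(len(preds))
```

    In practice the MA(2) terms and proper diagnostic checking (e.g. via a statistics package) would complete the ARIMA(2,1,2) fit.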

  12. Statistics of return intervals between long heartbeat intervals and their usability for online prediction of disorders

    Bogachev, Mikhail I; Bunde, Armin; Kireenkov, Igor S; Nifontov, Eugene M

    2009-01-01

    We study the statistics of return intervals between large heartbeat intervals (above a certain threshold Q) in 24 h records obtained from healthy subjects. We find that both the linear and the nonlinear long-term memory inherent in the heartbeat intervals lead to power-laws in the probability density function P_Q(r) of the return intervals. As a consequence, the probability W_Q(t; Δt) that at least one large heartbeat interval will occur within the next Δt heartbeat intervals, with an increasing elapsed number of intervals t after the last large heartbeat interval, follows a power-law. Based on these results, we suggest a method of obtaining a priori information about the occurrence of the next large heartbeat interval, and thus to predict it. We show explicitly that the proposed method, which exploits long-term memory, is superior to the conventional precursory pattern recognition technique, which focuses solely on short-term memory. We believe that our results can be straightforwardly extended to obtain more reliable predictions in other physiological signals like blood pressure, as well as in other complex records exhibiting multifractal behaviour, e.g. turbulent flow, precipitation, river flows and network traffic.
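
    Extracting return intervals above a threshold Q, the basic quantity whose density P_Q(r) is studied above, can be sketched as follows; the toy RR series is invented:

```python
def return_intervals(series, q):
    """Gaps (in number of samples) between successive exceedances of
    threshold q -- the return intervals whose density is P_Q(r)."""
    hits = [i for i, v in enumerate(series) if v > q]
    return [b - a for a, b in zip(hits, hits[1:])]

# toy heartbeat (RR) interval series in seconds; q marks "large" intervals
rr = [0.8, 0.9, 1.3, 0.85, 0.9, 1.4, 0.8, 0.82, 0.95, 1.25, 0.9]
print(return_intervals(rr, 1.2))  # -> [3, 4]
```

    A histogram of these gaps over a 24 h record estimates P_Q(r), and counting how often a gap exceeds t + Δt given it already exceeds t estimates W_Q(t; Δt).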

  13. Saccadic gain adaptation is predicted by the statistics of natural fluctuations in oculomotor function

    Mark V Albert

    2012-12-01

    Due to multiple factors such as fatigue, muscle strengthening, and neural plasticity, the responsiveness of the motor apparatus to neural commands changes over time. To enable precise movements, the nervous system must adapt to compensate for these changes. Recent models of motor adaptation derive from assumptions about the way the motor apparatus changes. Characterizing these changes is difficult because motor adaptation happens at the same time, masking most of the effects of ongoing changes. Here, we analyze eye movements of monkeys with lesions to the posterior cerebellar vermis that impair adaptation. Their fluctuations better reveal the underlying changes of the motor system over time. When these measured, unadapted changes are used to derive optimal motor adaptation rules, the prediction precision improves significantly. Among three models that fit single-day adaptation results similarly well, the model that also matches the temporal correlations of the non-adapting saccades most accurately predicts multiple-day adaptation. Saccadic gain adaptation is well matched to the natural statistics of fluctuations of the oculomotor plant.

  14. A RANS knock model to predict the statistical occurrence of engine knock

    D'Adamo, Alessandro; Breda, Sebastiano; Fontanesi, Stefano; Irimescu, Adrian; Merola, Simona Silvia; Tornatore, Cinzia

    2017-01-01

    Highlights:
    • Development of a new RANS model for SI engine knock probability.
    • Turbulence-derived transport equations for variances of mixture fraction and enthalpy.
    • Gasoline autoignition delay times calculated from detailed chemical kinetics.
    • Knock probability validated against experiments on an optically accessible GDI unit.
    • PDF-based knock model accounting for the random nature of SI engine knock in RANS simulations.

    Abstract: In the recent past, engine knock emerged as one of the main limiting factors in the pursuit of higher efficiency targets in modern spark-ignition (SI) engines. To attain these targets, engine operating points must be moved as close as possible to the onset of abnormal combustion, although the turbulent nature of the flow field and of SI combustion leads to possibly large fluctuations between consecutive engine cycles. This forces engine designers to distance the target condition from its theoretical optimum in order to prevent abnormal combustion, which can potentially damage engine components because of a few individual heavy-knocking cycles. A statistically based RANS knock model is presented in this study, whose aim is the prediction not only of the ensemble-average knock occurrence, poorly meaningful for such a stochastic event, but also of a knock probability. The model is based on look-up tables of autoignition times from detailed chemistry, coupled with transport equations for the variances of mixture fraction and enthalpy. The transported perturbations around the ensemble-average value are based on variable gradients and on a local turbulent time scale. A multi-variate cell-based Gaussian-PDF model is proposed for the unburnt mixture, resulting in a statistical distribution for the in-cell reaction rate. An average knock precursor and its variance are independently calculated and transported; this results in the prediction of an earliest knock probability preceding the ensemble-average knock onset, as confirmed by
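
    If the knock precursor is Gaussian with a transported mean and variance, the probability that it has already reached its autoignition value reduces to a Gaussian tail integral. A minimal sketch of that idea (precursor normalized so autoignition occurs at 1; the variance value is illustrative):

```python
import math

def knock_probability(c_mean, c_var):
    """P(precursor c > 1) for a Gaussian-distributed knock precursor with
    transported mean and variance (autoignition when c reaches 1)."""
    sigma = math.sqrt(c_var)
    return 0.5 * math.erfc((1.0 - c_mean) / (sigma * math.sqrt(2.0)))

# a nonzero knock probability appears well before the ensemble-average
# onset (where the mean precursor itself reaches 1)
for c_mean in (0.7, 0.9, 1.0):
    print(c_mean, round(knock_probability(c_mean, 0.01), 4))
```

    This is why the model predicts an earliest-knock probability ahead of the ensemble-average knock onset: the tail of the distribution crosses the autoignition value before the mean does.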

  15. Performance prediction for silicon photonics integrated circuits with layout-dependent correlated manufacturing variability.

    Lu, Zeqin; Jhoja, Jaspreet; Klein, Jackson; Wang, Xu; Liu, Amy; Flueckiger, Jonas; Pond, James; Chrostowski, Lukas

    2017-05-01

    This work develops an enhanced Monte Carlo (MC) simulation methodology to predict the impacts of layout-dependent correlated manufacturing variations on the performance of photonics integrated circuits (PICs). First, to enable such performance prediction, we demonstrate a simple method with sub-nanometer accuracy to characterize photonics manufacturing variations, where the width and height for a fabricated waveguide can be extracted from the spectral response of a racetrack resonator. By measuring the spectral responses for a large number of identical resonators spread over a wafer, statistical results for the variations of waveguide width and height can be obtained. Second, we develop models for the layout-dependent enhanced MC simulation. Our models use netlist extraction to transfer physical layouts into circuit simulators. Spatially correlated physical variations across the PICs are simulated on a discrete grid and are mapped to each circuit component, so that the performance for each component can be updated according to its obtained variations, and therefore, circuit simulations take the correlated variations between components into account. The simulation flow and theoretical models for our layout-dependent enhanced MC simulation are detailed in this paper. As examples, several ring-resonator filter circuits are studied using the developed enhanced MC simulation, and statistical results from the simulations can predict both common-mode and differential-mode variations of the circuit performance.
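
    One simple way to mimic the enhanced MC step of mapping spatially correlated variations onto circuit components: smooth iid noise over a correlation length and perturb each component's response. The smoothing scheme, the sensitivity value, and the resonator count below are all invented stand-ins, not the paper's models:

```python
import random

def correlated_variations(n, sigma, corr_len, seed=0):
    """Spatially correlated variation samples on a 1-D grid of components:
    iid Gaussian noise averaged over ~corr_len neighbouring cells (a crude
    stand-in for a proper correlated-variation model)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n + corr_len)]
    out = []
    for i in range(n):
        window = noise[i:i + corr_len]
        # divide by sqrt(len) so the smoothed samples keep variance sigma^2
        out.append(sigma * sum(window) / len(window) ** 0.5)
    return out

# hypothetical sensitivity: 1 nm of waveguide-width error shifts a ring
# resonance by ~1 nm (illustrative number, not a measured value)
DLAMBDA_PER_DWIDTH = 1.0
nominal_lambda = 1550.0  # nm

dwidths = correlated_variations(n=50, sigma=2.0, corr_len=8)
resonances = [nominal_lambda + DLAMBDA_PER_DWIDTH * dw for dw in dwidths]
mean = sum(resonances) / len(resonances)
print(round(mean, 2), round(max(resonances) - min(resonances), 2))
```

    Because neighbouring components share most of their noise window, nearby resonators drift together (common mode) while distant ones diverge (differential mode), which is the distinction the enhanced MC simulation quantifies.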

  16. Frontoparietal white matter integrity predicts haptic performance in chronic stroke

    Alexandra L. Borstad

    2016-01-01

    Age strongly correlated with the shared variance across tracts in the control, but not in the poststroke participants. A moderate to good relationship was found between ipsilesional T–M1 MD and affected hand HASTe score (r = −0.62, p = 0.006) and less affected hand HASTe score (r = −0.53, p = 0.022). Regression analysis revealed approximately 90% of the variance in affected hand HASTe score was predicted by the white matter integrity in the frontoparietal network (as indexed by MD) in poststroke participants while 87% of the variance in HASTe score was predicted in control participants. This study demonstrates the importance of frontoparietal white matter in mediating haptic performance and specifically identifies that T–M1 and precuneus interhemispheric tracts may be appropriate targets for piloting rehabilitation interventions, such as noninvasive brain stimulation, when the goal is to improve poststroke haptic performance.

  17. Frontoparietal white matter integrity predicts haptic performance in chronic stroke.

    Borstad, Alexandra L; Choi, Seongjin; Schmalbrock, Petra; Nichols-Larsen, Deborah S

    2016-01-01

    Age strongly correlated with the shared variance across tracts in the control, but not in the poststroke participants. A moderate to good relationship was found between ipsilesional T-M1 MD and affected hand HASTe score (r = -0.62, p = 0.006) and less affected hand HASTe score (r = -0.53, p = 0.022). Regression analysis revealed approximately 90% of the variance in affected hand HASTe score was predicted by the white matter integrity in the frontoparietal network (as indexed by MD) in poststroke participants while 87% of the variance in HASTe score was predicted in control participants. This study demonstrates the importance of frontoparietal white matter in mediating haptic performance and specifically identifies that T-M1 and precuneus interhemispheric tracts may be appropriate targets for piloting rehabilitation interventions, such as noninvasive brain stimulation, when the goal is to improve poststroke haptic performance.

  18. Solvability of a class of systems of infinite-dimensional integral equations and their application in statistical mechanics

    Gonchar, N.S.

    1986-01-01

    This paper presents a mathematical method developed for investigating a class of systems of infinite-dimensional integral equations which have application in statistical mechanics. Necessary and sufficient conditions are obtained for the uniqueness and bifurcation of the solution of this class of systems of equations. Problems of equilibrium statistical mechanics are considered on the basis of this method

  19. Predicting the Diagnostic and Statistical Manual of Mental Disorders (Fifth Edition): The Mystery of How to Constrain Unchecked Growth.

    Blashfield, Roger K; Fuller, A Kenneth

    2016-06-01

    Twenty years ago, slightly after the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition was published, we predicted the characteristics of the future Diagnostic and Statistical Manual of Mental Disorders (fifth edition). Included in our predictions were how many diagnoses it would contain, the physical size of the Diagnostic and Statistical Manual of Mental Disorders (fifth edition), who its leader would be, how many professionals would be involved in creating it, the revenue generated, and the color of its cover. This article reports on the accuracy of our predictions. Our largest prediction error concerned financial revenue. The earnings growth of the DSMs has been remarkable. Drug company investments, insurance benefits, the financial need of the American Psychiatric Association, and the research grant process are factors that have stimulated the growth of the DSMs. Restoring order and simplicity to the classification of mental disorders will not be a trivial task.

  20. Predictability of the recent slowdown and subsequent recovery of large-scale surface warming using statistical methods

    Mann, Michael E.; Steinman, Byron A.; Miller, Sonya K.; Frankcombe, Leela M.; England, Matthew H.; Cheung, Anson H.

    2016-04-01

    The temporary slowdown in large-scale surface warming during the early 2000s has been attributed to both external and internal sources of climate variability. Using semiempirical estimates of the internal low-frequency variability component in Northern Hemisphere, Atlantic, and Pacific surface temperatures in concert with statistical hindcast experiments, we investigate whether the slowdown and its recent recovery were predictable. We conclude that the internal variability of the North Pacific, which played a critical role in the slowdown, does not appear to have been predictable using statistical forecast methods. An additional minor contribution from the North Atlantic, by contrast, appears to exhibit some predictability. While our analyses focus on combining semiempirical estimates of internal climatic variability with statistical hindcast experiments, possible implications for initialized model predictions are also discussed.

  1. Statistical and Machine-Learning Data Mining Techniques for Better Predictive Modeling and Analysis of Big Data

    Ratner, Bruce

    2011-01-01

    The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has

  2. Statistical Indicators of Ethno-Cultural Community Integration in Canadian Society = Indicateurs statistiques de l'intégration des communautés ethnoculturelles dans la société canadienne.

    de Vries, John

    This paper addresses the issue of measuring the integration of various ethnocultural communities into Canadian society by means of statistical or social indicators. The overall philosophy of the study is based on the following principles: (1) indicators should have a clear meaning with respect to the underlying concept of integration; (2)…

  3. Toward integration of genomic selection with crop modelling: the development of an integrated approach to predicting rice heading dates.

    Onogi, Akio; Watanabe, Maya; Mochizuki, Toshihiro; Hayashi, Takeshi; Nakagawa, Hiroshi; Hasegawa, Toshihiro; Iwata, Hiroyoshi

    2016-04-01

    It is suggested that accuracy in predicting plant phenotypes can be improved by integrating genomic prediction with crop modelling in a single hierarchical model. Accurate prediction of phenotypes is important for plant breeding and management. Although genomic prediction/selection aims to predict phenotypes on the basis of whole-genome marker information, it is often difficult to predict phenotypes of complex traits in diverse environments, because plant phenotypes are often influenced by genotype-environment interaction. A possible remedy is to integrate genomic prediction with crop/ecophysiological modelling, which enables us to predict plant phenotypes using environmental and management information. To this end, in the present study, we developed a novel method for integrating genomic prediction with phenological modelling of Asian rice (Oryza sativa, L.), allowing the heading date of untested genotypes in untested environments to be predicted. The method simultaneously infers the phenological model parameters and whole-genome marker effects on the parameters in a Bayesian framework. By cultivating backcross inbred lines of Koshihikari × Kasalath in nine environments, we evaluated the potential of the proposed method in comparison with conventional genomic prediction, phenological modelling, and two-step methods that applied genomic prediction to phenological model parameters inferred from Nelder-Mead or Markov chain Monte Carlo algorithms. In predicting heading dates of untested lines in untested environments, the proposed and two-step methods tended to provide more accurate predictions than the conventional genomic prediction methods, particularly in environments where phenotypes from environments similar to the target environment were unavailable for training genomic prediction. The proposed method showed greater accuracy in prediction than the two-step methods in all cross-validation schemes tested, suggesting the potential of the integrated approach in

  4. Integrated Data Collection Analysis (IDCA) Program - Statistical Analysis of RDX Standard Data Sets

    Sandstrom, Mary M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Brown, Geoffrey W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Preston, Daniel N. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Pollard, Colin J. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Warner, Kirstin F. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Sorensen, Daniel N. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Remmers, Daniel L. [Naval Surface Warfare Center (NSWC), Indian Head, MD (United States). Indian Head Division; Phillips, Jason J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Shelley, Timothy J. [Air Force Research Lab. (AFRL), Tyndall AFB, FL (United States); Reyes, Jose A. [Applied Research Associates, Tyndall AFB, FL (United States); Hsu, Peter C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Reynolds, John G. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-10-30

    The Integrated Data Collection Analysis (IDCA) program is conducting a Proficiency Test for Small-Scale Safety and Thermal (SSST) testing of homemade explosives (HMEs). Described here are statistical analyses of the results for impact, friction, electrostatic discharge, and differential scanning calorimetry analysis of the RDX Type II Class 5 standard. The material was tested as a well-characterized standard several times during the proficiency study to assess differences among participants and the range of results that may arise for well-behaved explosive materials. The analyses show that there are detectable differences among the results from IDCA participants. While these differences are statistically significant, most of them can be disregarded for comparison purposes to assess potential variability when laboratories attempt to measure identical samples using methods assumed to be nominally the same. The results presented in this report include the average sensitivity results for the IDCA participants and the ranges of values obtained. The ranges represent variation about the mean values of the tests of between 26% and 42%. The magnitude of this variation is attributed to differences in operator, method, and environment as well as the use of different instruments that are also of varying age. The results appear to be a good representation of the broader safety testing community based on the range of methods, instruments, and environments included in the IDCA Proficiency Test.

  5. Integrated model for predicting rice yield with climate change

    Park, Jin-Ki; Das, Amrita; Park, Jong-Hwa

    2018-04-01

    Rice is the chief agricultural product and one of the primary food sources. For this reason, it is of pivotal importance for the worldwide economy and development. Forecasting yield is therefore vital to decision-support systems both for farmers and for the planning and management of the country's economy. However, crop yield, which depends on the soil-bio-atmospheric system, is difficult to represent in statistical language. This paper describes a novel approach for predicting rice yield using artificial neural networks, spatial interpolation, remote sensing and GIS methods. Herein, the variation in yield is attributed to climatic parameters and crop health, and the normalized difference vegetation index from MODIS is used as an indicator of plant health and growth. Due importance was given to scaling up the input parameters using spatial interpolation and GIS and to minimising the sources of error in every step of the modelling. The low percentage error (2.91) and high correlation (0.76) signify the robust performance of the proposed model. This simple but effective approach is then used to estimate the influence of climate change on South Korean rice production. Under the RCP8.5 scenario, an upswing in temperature may increase rice yield throughout South Korea.

  6. Statistical prediction of seasonal discharge in the Naryn basin for water resources planning in Central Asia

    Apel, Heiko; Gafurov, Abror; Gerlitz, Lars; Unger-Shayesteh, Katy; Vorogushyn, Sergiy; Merkushkin, Aleksandr; Merz, Bruno

    2016-04-01

    The semi-arid regions of Central Asia crucially depend on the water resources supplied by the mountainous areas of the Tien-Shan and Pamirs. During the summer months, the snow and glacier melt water of the rivers originating in the mountains provides the only water resource available for agricultural production, but also for water collection in reservoirs for energy production in the winter months. A reliable seasonal forecast of the water resources is therefore crucial for sustainable management and planning. In fact, seasonal forecasts are mandatory tasks of the national hydro-meteorological services in the region. This study thus aims at a statistical forecast of the seasonal water availability, with the focus on freely available data in order to facilitate operational use without data-access limitations. The study takes the Naryn basin as a test case; at its outlet, the Toktogul reservoir stores the discharge of the Naryn River. As most of the water originates from snow and glacier melt, a statistical forecast model should use data sets that can serve as proxy data for the snow masses and snow water equivalent in late spring, which essentially determine the bulk of the seasonal discharge. CRU climate data describing the precipitation and temperature in the basin during winter and spring were used as base information, complemented by MODIS snow cover data processed through the ModSnow tool, discharge during the spring, and GRACE gravimetry anomalies. For the construction of linear forecast models, monthly as well as multi-monthly means over the period January to April were used to predict the seasonal mean discharge of May-September at the station Uchterek. An automatic model selection was performed in multiple steps, and the best models were selected according to several performance measures and their robustness in a leave-one-out cross validation.
It could be shown that the seasonal discharge can be predicted with
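
The model-building recipe this record describes (fit simple linear predictors, then rank candidate models by leave-one-out cross validation) can be sketched in a few lines. The predictor and discharge series below are invented placeholders, not the study's CRU, MODIS, or GRACE data:

```python
import statistics

# Invented stand-ins: a winter/spring predictor (e.g. a snow proxy) and
# seasonal mean discharge for eight years (not the Naryn/Uchterek data).
x = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4, 1.0, 1.3]
y = [410, 350, 480, 400, 365, 460, 380, 445]

def fit_ols(xs, ys):
    """Least-squares slope and intercept for a one-predictor linear model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope, my - slope * mx

# Leave-one-out cross validation: refit without each year, predict that year.
errors = []
for i in range(len(x)):
    slope, intercept = fit_ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    errors.append(y[i] - (slope * x[i] + intercept))

rmse_loo = (sum(e * e for e in errors) / len(errors)) ** 0.5
```

In the study's setting, a score like `rmse_loo` (or similar robustness measures) would be computed for every candidate predictor set, and the best-scoring models retained.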

  7. An integrated user-friendly ArcMAP tool for bivariate statistical modeling in geoscience applications

    Jebur, M. N.; Pradhan, B.; Shafri, H. Z. M.; Yusof, Z.; Tehrany, M. S.

    2014-10-01

    Modeling and classification difficulties are fundamental issues in natural hazard assessment. A geographic information system (GIS) is a domain that requires users to use various tools to perform different types of spatial modeling. Bivariate statistical analysis (BSA) assists in hazard modeling. To perform this analysis, several calculations are required and the user has to transfer data from one format to another. Most researchers perform these calculations manually by using Microsoft Excel or other programs. This process is time consuming and carries a degree of uncertainty. The lack of proper tools to implement BSA in a GIS environment prompted this study. In this paper, a user-friendly tool, BSM (bivariate statistical modeler), for the BSA technique is proposed. Three popular BSA techniques, namely the frequency ratio, weights-of-evidence, and evidential belief function models, are applied in the newly proposed ArcMAP tool. This tool is programmed in Python and has a simple graphical user interface, which facilitates the improvement of model performance. The proposed tool implements BSA automatically, thus allowing numerous variables to be examined. To validate the capability and accuracy of this program, a pilot test area in Malaysia is selected and all three models are tested by using the proposed program. The area under the curve is used to measure the success rate and prediction rate. Results demonstrate that the proposed program executes BSA with reasonable accuracy. The proposed BSA tool can be used in numerous applications, such as natural hazard, mineral potential, hydrological, and other engineering and environmental applications.
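
Of the three techniques named above, the frequency ratio is the easiest to illustrate: for each class of a conditioning factor, it is the share of hazard occurrences in that class divided by the share of total area. A minimal sketch with invented counts (not the paper's Malaysian test data):

```python
# Toy hazard inventory over one categorical factor (e.g. slope class).
# All counts are illustrative placeholders.
total_pixels = {"gentle": 5000, "moderate": 3000, "steep": 2000}
hazard_pixels = {"gentle": 50, "moderate": 120, "steep": 230}

n_total = sum(total_pixels.values())
n_hazard = sum(hazard_pixels.values())

# Frequency ratio per class: relative hazard density vs. relative area.
fr = {c: (hazard_pixels[c] / n_hazard) / (total_pixels[c] / n_total)
      for c in total_pixels}
```

Classes with a ratio above 1 are over-represented among hazard locations; susceptibility maps are then typically built by summing, at each pixel, the ratios of the classes it falls into.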

  8. BICEPP: an example-based statistical text mining method for predicting the binary characteristics of drugs

    Tsafnat Guy

    2011-04-01

    Background: The identification of drug characteristics is a clinically important task, but it requires much expert knowledge and consumes substantial resources. We have developed a statistical text-mining approach (BInary Characteristics Extractor and biomedical Properties Predictor: BICEPP) to help experts screen drugs that may have important clinical characteristics of interest. Results: BICEPP first retrieves MEDLINE abstracts containing drug names, then selects tokens that best predict the list of drugs which represents the characteristic of interest. Machine learning is then used to classify drugs using a document frequency-based measure. Evaluation experiments were performed to validate BICEPP's performance on 484 characteristics of 857 drugs, identified from the Australian Medicines Handbook (AMH) and the PharmacoKinetic Interaction Screening (PKIS) database. Stratified cross-validations revealed that BICEPP was able to classify drugs into all 20 major therapeutic classes (100%) and 157 (of 197) minor drug classes (80%) with areas under the receiver operating characteristic curve (AUC) > 0.80. Similarly, AUC > 0.80 could be obtained in the classification of 173 (of 238) adverse events (73%), up to 12 (of 15) groups of clinically significant cytochrome P450 enzyme (CYP) inducers or inhibitors (80%), and up to 11 (of 14) groups of narrow therapeutic index drugs (79%). Interestingly, it was observed that the keywords used to describe a drug characteristic were not necessarily the most predictive ones for the classification task. Conclusions: BICEPP has sufficient classification power to automatically distinguish a wide range of clinical properties of drugs. This may be used in pharmacovigilance applications to assist with rapid screening of large drug databases to identify important characteristics for further evaluation.

  9. Predictive statistical modelling of cadmium content in durum wheat grain based on soil parameters.

    Viala, Yoann; Laurette, Julien; Denaix, Laurence; Gourdain, Emmanuelle; Méléard, Benoit; Nguyen, Christophe; Schneider, André; Sappin-Didier, Valérie

    2017-09-01

    Regulatory limits on cadmium (Cd) content in food products are tending to become stricter, especially in cereals, which are a major contributor to dietary intake of Cd by humans. This is of particular importance for durum wheat, which accumulates more Cd than bread wheat. The contamination of durum wheat grain by Cd depends not only on the genotype but also to a large extent on soil Cd availability. Assessing the phytoavailability of Cd for durum wheat is thus crucial, and appropriate methods are required. For this purpose, we propose a statistical model to predict Cd accumulation in durum wheat grain based on soil geochemical properties related to Cd availability in French agricultural soils with low Cd contents and neutral to alkaline pH (soils commonly used to grow durum wheat). The best model is based on the concentration of total Cd in the soil solution, the pH of a soil CaCl₂ extract, the cation exchange capacity (CEC), and the content of manganese oxides (Tamm's extraction) in the soil. The model variables suggest a major influence of the cadmium buffering power of the soil and of Cd speciation in solution. The model successfully explains 88% of Cd variability in grains with, generally, a prediction error below 0.02 mg Cd kg⁻¹ in wheat grain. Monte Carlo cross-validation indicated that model accuracy will suffice for the European Community project to reduce the regulatory limit from 0.2 to 0.15 mg Cd kg⁻¹ grain, but not for the intermediate step at 0.175 mg Cd kg⁻¹. The model will help farmers assess the risk that the Cd content of their durum wheat grain will exceed regulatory limits, and help food safety authorities test different regulatory thresholds to find a trade-off between food safety and the negative impact an overly strict regulation could have on farmers.

  10. Reproducing tailing in breakthrough curves: Are statistical models equally representative and predictive?

    Pedretti, Daniele; Bianchi, Marco

    2018-03-01

    Breakthrough curves (BTCs) observed during tracer tests in highly heterogeneous aquifers display strong tailing. Power laws are popular models both for the empirical fitting of these curves and for the prediction of transport using upscaling models based on best-fitted estimated parameters (e.g. the power law slope or exponent). The predictive capacity of power law based upscaling models can, however, be questioned due to the difficulty of linking model parameters with the aquifers' physical properties. This work analyzes two aspects that can limit the use of power laws as effective predictive tools: (a) the implication of statistical subsampling, which often renders power laws indistinguishable from other heavily tailed distributions, such as the logarithmic (LOG); (b) the difficulty of reconciling fitting parameters obtained from models with different formulations, such as the presence of a late-time cutoff in the power law model. Two rigorous and systematic stochastic analyses, one based on benchmark distributions and the other on BTCs obtained from transport simulations, are considered. It is found that a power law model without cutoff (PL) results in best-fitted exponents (αPL) falling in the range of typical experimental values reported in the literature (αPL > 1.5), increasing as the tailing becomes heavier. Strong fluctuations occur when the number of samples is limited, due to the effects of subsampling. On the other hand, when the power law model embeds a cutoff (PLCO), the best-fitted exponent (αCO) is insensitive to the degree of tailing and to the effects of subsampling and tends to a constant αCO ≈ 1. In the PLCO model, the cutoff rate (λ) is the parameter that fully reproduces the persistence of the tailing and is shown to be inversely correlated to the LOG scale parameter (i.e. with the skewness of the distribution). The theoretical results are consistent with the fitting analysis of a tracer test performed during the MADE-5 experiment. It is shown that a simple

  11. An Integrated Model to Predict Corporate Failure of Listed Companies in Sri Lanka

    Nisansala Wijekoon

    2015-07-01

    The primary objective of this study is to develop an integrated model to predict corporate failure of listed companies in Sri Lanka. Logistic regression analysis was employed on a data set of 70 matched pairs of failed and non-failed companies listed on the Colombo Stock Exchange (CSE) in Sri Lanka over the period 2002 to 2010. A total of fifteen financial ratios and eight corporate governance variables were used as predictor variables of corporate failure. The statistical results indicated that the model combining corporate governance variables and financial ratios improved the prediction accuracy to 88.57 per cent one year prior to failure. Furthermore, the predictive accuracy of this model in all three years prior to failure is above 80 per cent; hence the model is robust in obtaining accurate results for up to three years prior to failure. It was further found that two financial ratios, working capital to total assets and cash flow from operating activities to total assets, and two corporate governance variables, outside director ratio and company audit committee, have the most explanatory power to predict corporate failure. Therefore, the model developed in this study can assist investors, managers, shareholders, financial institutions, auditors and regulatory agents in Sri Lanka to forecast corporate failure of listed companies.
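
The core computation, a logistic regression on matched-pair predictors, can be sketched from scratch. The two ratios below are invented stand-ins (e.g. for working capital/total assets and operating cash flow/total assets), not the CSE sample:

```python
import math
import random

random.seed(0)

# Hypothetical matched-pair data: failed firms (label 1) tend to have
# negative ratios, non-failed firms (label 0) positive ones.
failed = [(-0.20 + random.gauss(0, 0.05), -0.10 + random.gauss(0, 0.05))
          for _ in range(30)]
non_failed = [(0.15 + random.gauss(0, 0.05), 0.12 + random.gauss(0, 0.05))
              for _ in range(30)]
data = [(x, 1) for x in failed] + [(x, 0) for x in non_failed]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Plain batch gradient descent on the logistic log-loss.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(2000):
    gw, gb = [0.0, 0.0], 0.0
    for (x1, x2), label in data:
        err = sigmoid(w[0] * x1 + w[1] * x2 + b) - label
        gw[0] += err * x1
        gw[1] += err * x2
        gb += err
    w[0] -= lr * gw[0] / len(data)
    w[1] -= lr * gw[1] / len(data)
    b -= lr * gb / len(data)

accuracy = sum(
    (sigmoid(w[0] * x1 + w[1] * x2 + b) > 0.5) == (label == 1)
    for (x1, x2), label in data
) / len(data)
```

The study's model adds many more predictors and evaluates accuracy one to three years before failure; the sketch only shows the fitting mechanics.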

  12. Foreign exchange market data analysis reveals statistical features that predict price movement acceleration.

    Nacher, Jose C; Ochiai, Tomoshiro

    2012-05-01

    Increasingly accessible financial data allow researchers to infer market-dynamics-based laws and to propose models that are able to reproduce them. In recent years, several stylized facts have been uncovered. Here we perform an extensive analysis of foreign exchange data that leads to the unveiling of a statistical financial law. First, our findings show that, on average, volatility increases more when the price exceeds the highest (or lowest) value, i.e., breaks the resistance line. We call this the breaking-acceleration effect. Second, our results show that the probability P(T) to break the resistance line in the past time T follows a power law in both real data and theoretically simulated data. However, the probability calculated using real data is rather lower than the one obtained using a traditional Black-Scholes (BS) model. Taken together, the present analysis characterizes a different stylized fact of financial markets and shows that the market exceeds a past (historical) extreme price fewer times than expected under the BS model (the resistance effect). However, when the market does, we predict that the average volatility at that time point will be much higher. These findings indicate that no Markovian model faithfully captures the market dynamics.
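
The quantity P(T), the probability of breaking the extreme of the past T time steps, is easy to estimate empirically. The toy estimator below uses a plain Gaussian random walk (not the paper's FX data, nor a Black-Scholes model) purely to show the mechanics:

```python
import random

random.seed(1)

def break_probability(T, trials=4000):
    """Estimate the probability that a random-walk 'price' exceeds the
    maximum of its previous T steps on the very next step."""
    hits = 0
    for _ in range(trials):
        price, past_max = 0.0, float("-inf")
        for _ in range(T):
            price += random.gauss(0, 1)
            past_max = max(past_max, price)
        price += random.gauss(0, 1)  # the step being tested
        if price > past_max:
            hits += 1
    return hits / trials

p_short, p_long = break_probability(10), break_probability(100)
```

Breaking a longer history's extreme is rarer (`p_long < p_short`); the paper's point is that the decay of P(T) measured in real FX data sits below the Markovian (BS) benchmark.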

  13. Application of Statistical Thermodynamics To Predict the Adsorption Properties of Polypeptides in Reversed-Phase HPLC.

    Tarasova, Irina A; Goloborodko, Anton A; Perlova, Tatyana Y; Pridatchenko, Marina L; Gorshkov, Alexander V; Evreinov, Victor V; Ivanov, Alexander R; Gorshkov, Mikhail V

    2015-07-07

    The theory of critical chromatography for biomacromolecules (BioLCCC) describes polypeptide retention in reversed-phase HPLC using the basic principles of statistical thermodynamics. However, whether this theory correctly depicts a variety of empirical observations and laws introduced for peptide chromatography over the last decades remains to be determined. In this study, by comparing theoretical results with experimental data, we demonstrate that the BioLCCC: (1) fits the empirical dependence of polypeptide retention on amino acid sequence length with R² > 0.99 and allows in silico determination of the linear regression coefficients of the log-length correction in the additive model for arbitrary sequences and lengths, and (2) predicts the distribution coefficients of polypeptides with an accuracy of R² from 0.98 to 0.99. The latter enables direct calculation of the retention factors for given solvent compositions and modeling of the migration dynamics of polypeptides separated under isocratic or gradient conditions. The obtained results demonstrate that the suggested theory correctly relates the main aspects of polypeptide separation in reversed-phase HPLC.

  14. Using HIV&AIDS statistics in pre-service Mathematics Education to integrate HIV&AIDS education.

    van Laren, Linda

    2012-12-01

    In South Africa, the HIV&AIDS education policy documents indicate opportunities for integration across disciplines/subjects. There are different interpretations of integration/inclusion and mainstreaming HIV&AIDS education, and numerous levels of integration. Integration ensures that learners experience the disciplines/subjects as being linked and related, and integration is required to support and expand the learners' opportunities to attain skills, acquire knowledge and develop attitudes and values across the curriculum. This study makes use of self-study methodology where I, a teacher educator, aim to improve my practice through including HIV&AIDS statistics in Mathematics Education. This article focuses on how I used HIV&AIDS statistics to facilitate pre-service teacher reflection and introduce them to integration of HIV&AIDS education across the curriculum. After pre-service teachers were provided with HIV statistics, they drew a pie chart which graphically illustrated the situation and reflected on issues relating to HIV&AIDS. Three themes emerged from the analysis of their reflections. The themes relate to the need for further HIV&AIDS education, the changing pastoral role of teachers and the changing context of teaching. This information indicates that the use of statistics is an appropriate means of initiating the integration of HIV&AIDS education into the academic curriculum.

  15. Statistical prediction of AVB wear growth and initiation in model F steam generator tubes using Monte Carlo method

    Lee, Jae Bong; Park, Jae Hak; Kim, Hong Deok; Chung, Han Sub; Kim, Tae Ryong

    2005-01-01

    The growth of AVB wear in Model F steam generator tubes is predicted using the Monte Carlo method and statistical approaches. The statistical parameters that represent the characteristics of wear growth and wear initiation are derived from In-Service Inspection (ISI) Non-Destructive Evaluation (NDE) data. Based on the statistical approaches, a wear growth model is proposed and applied to predict the wear distribution at the End Of Cycle (EOC). Probabilistic distributions of the number of wear flaws and of the maximum wear depth at EOC are obtained from the analysis. By comparing the predicted EOC wear flaw data with the known EOC data, the usefulness of the proposed method is examined and satisfactory results are obtained.
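
The Monte Carlo step this record describes can be sketched as follows: sample begin-of-cycle flaw depths and per-cycle growth from assumed distributions, then build the empirical distribution of the maximum EOC depth. All parameters below are invented for illustration; the study derives the corresponding statistics from ISI/NDE inspection data:

```python
import random

random.seed(42)

N_FLAWS = 500        # wear flaws present at the beginning of the cycle
MEAN_DEPTH = 10.0    # mean begin-of-cycle wear depth, % through-wall
GROWTH_MEAN = 3.0    # mean wear-depth growth per cycle, % through-wall
GROWTH_SD = 1.5
N_TRIALS = 500       # Monte Carlo trials

max_depths = []
for _ in range(N_TRIALS):
    eoc_depths = [
        random.expovariate(1.0 / MEAN_DEPTH)              # initial depth
        + max(0.0, random.gauss(GROWTH_MEAN, GROWTH_SD))  # cycle growth
        for _ in range(N_FLAWS)
    ]
    max_depths.append(max(eoc_depths))

# Empirical distribution of the maximum EOC wear depth across trials.
max_depths.sort()
p95_max_depth = max_depths[int(0.95 * N_TRIALS)]
```

A percentile of the simulated maximum depth is the kind of quantity that would then be compared against plugging criteria or the known EOC inspection data.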

  16. Statistical prediction of AVB wear growth and initiation in model F steam generator tubes using Monte Carlo method

    Lee, Jae Bong; Park, Jae Hak [Chungbuk National Univ., Cheongju (Korea, Republic of); Kim, Hong Deok; Chung, Han Sub; Kim, Tae Ryong [Korea Electtric Power Research Institute, Daejeon (Korea, Republic of)

    2005-07-01

    The growth of AVB wear in Model F steam generator tubes is predicted using the Monte Carlo method and statistical approaches. The statistical parameters that represent the characteristics of wear growth and wear initiation are derived from In-Service Inspection (ISI) Non-Destructive Evaluation (NDE) data. Based on the statistical approaches, a wear growth model is proposed and applied to predict the wear distribution at the End Of Cycle (EOC). Probabilistic distributions of the number of wear flaws and of the maximum wear depth at EOC are obtained from the analysis. By comparing the predicted EOC wear flaw data with the known EOC data, the usefulness of the proposed method is examined and satisfactory results are obtained.

  17. The importance of vegetation change in the prediction of future tropical cyclone flood statistics

    Irish, J. L.; Resio, D.; Bilskie, M. V.; Hagen, S. C.; Weiss, R.

    2015-12-01

    Global sea level rise is a near certainty over the next century (e.g., Stocker et al. 2013 [IPCC] and references therein). With sea level rise, coastal topography and land cover (hereafter "landscape") is expected to change and tropical cyclone flood hazard is expected to accelerate (e.g., Irish et al. 2010 [Ocean Eng], Woodruff et al. 2013 [Nature], Bilskie et al. 2014 [Geophys Res Lett], Ferreira et al. 2014 [Coast Eng], Passeri et al. 2015 [Nat Hazards]). Yet, the relative importance of sea-level rise induced landscape change on future tropical cyclone flood hazard assessment is not known. In this paper, idealized scenarios are used to evaluate the relative impact of one class of landscape change on future tropical cyclone extreme-value statistics in back-barrier regions: sea level rise induced vegetation migration and loss. The joint probability method with optimal sampling (JPM-OS) (Resio et al. 2009 [Nat Hazards]) with idealized surge response functions (e.g., Irish et al. 2009 [Nat Hazards]) is used to quantify the present-day and future flood hazard under various sea level rise scenarios. Results are evaluated in terms of their impact on the flood statistics (a) when projected flood elevations are included directly in the JPM analysis (Figure 1) and (b) when represented as additional uncertainty within the JPM integral (Resio et al. 2013 [Nat Hazards]), i.e., as random error. Findings are expected to aid in determining the level of effort required to reasonably account for future landscape change in hazard assessments, namely in determining when such processes are sufficiently captured by added uncertainty and when sea level rise induced vegetation changes must be considered dynamically, via detailed modeling initiatives. Acknowledgements: This material is based upon work supported by the National Science Foundation under Grant No. CMMI-1206271 and by the National Sea Grant College Program of the U.S. Department of Commerce's National Oceanic and

  18. Machine learning and statistical techniques : an application to the prediction of insolvency in Spanish non-life insurance companies

    Díaz, Zuleyka; Segovia, María Jesús; Fernández, José

    2005-01-01

    Prediction of insurance companies' insolvency has arisen as an important problem in the field of financial research. Most methods applied in the past to tackle this issue are traditional statistical techniques which use financial ratios as explanatory variables. However, these variables often do not satisfy statistical assumptions, which complicates the application of the mentioned methods. In this paper, a comparative study of the performance of two non-parametric machine learning techniques ...

  19. Integration of association statistics over genomic regions using Bayesian adaptive regression splines

    Zhang Xiaohua

    2003-11-01

    In the search for genetic determinants of complex disease, two approaches to association analysis are most often employed: testing single loci, or testing a small group of loci jointly via haplotypes for their relationship to disease status. It is still debatable which of these approaches is more favourable, and under what conditions. The former has the advantage of simplicity but suffers severely when alleles at the tested loci are not in linkage disequilibrium (LD) with liability alleles; the latter should capture more of the signal encoded in LD, but is far from simple. The complexity of haplotype analysis could be especially troublesome for association scans over large genomic regions, which, in fact, is becoming the standard design. For these reasons, the authors have been evaluating statistical methods that bridge the gap between single-locus and haplotype-based tests. In this article, they present one such method, which uses non-parametric regression techniques embodied by Bayesian adaptive regression splines (BARS). For a set of markers falling within a common genomic region and a corresponding set of single-locus association statistics, the BARS procedure integrates these results into a single test by examining the class of smooth curves consistent with the data. The non-parametric BARS procedure generally finds no signal when no liability allele exists in the tested region (i.e. it achieves the specified size of the test) and is sensitive enough to pick up signals when a liability allele is present. The BARS procedure provides a robust and potentially powerful alternative to classical tests of association, diminishes the multiple testing problem inherent in those tests and can be applied to a wide range of data types, including genotype frequencies estimated from pooled samples.

  20. Statistical Energy Analysis (SEA) and Energy Finite Element Analysis (EFEA) Predictions for a Floor-Equipped Composite Cylinder

    Grosveld, Ferdinand W.; Schiller, Noah H.; Cabell, Randolph H.

    2011-01-01

    Comet Enflow is a commercially available, high-frequency vibroacoustic analysis software package founded on Energy Finite Element Analysis (EFEA) and Energy Boundary Element Analysis (EBEA). EFEA was validated on a floor-equipped composite cylinder by comparing EFEA vibroacoustic response predictions with Statistical Energy Analysis (SEA) predictions and experimental results. The SEA predictions were made using the commercial software program VA One 2009 from ESI Group. The frequency region of interest for this study covers the one-third octave bands with center frequencies from 100 Hz to 4000 Hz.

  1. An integrated user-friendly ArcMAP tool for bivariate statistical modelling in geoscience applications

    Jebur, M. N.; Pradhan, B.; Shafri, H. Z. M.; Yusoff, Z. M.; Tehrany, M. S.

    2015-03-01

    Modelling and classification difficulties are fundamental issues in natural hazard assessment. A geographic information system (GIS) is a domain that requires users to use various tools to perform different types of spatial modelling. Bivariate statistical analysis (BSA) assists in hazard modelling. To perform this analysis, several calculations are required and the user has to transfer data from one format to another. Most researchers perform these calculations manually by using Microsoft Excel or other programs. This process is time-consuming and carries a degree of uncertainty. The lack of proper tools to implement BSA in a GIS environment prompted this study. In this paper, a user-friendly tool, bivariate statistical modeler (BSM), for the BSA technique is proposed. Three popular BSA techniques, namely the frequency ratio, weight-of-evidence (WoE), and evidential belief function (EBF) models, are applied in the newly proposed ArcMAP tool. This tool is programmed in Python and has a simple graphical user interface (GUI), which facilitates the improvement of model performance. The proposed tool implements BSA automatically, thus allowing numerous variables to be examined. To validate the capability and accuracy of this program, a pilot test area in Malaysia is selected and all three models are tested by using the proposed program. The area under the curve (AUC) is used to measure the success rate and prediction rate. Results demonstrate that the proposed program executes BSA with reasonable accuracy. The proposed BSA tool can be used in numerous applications, such as natural hazard, mineral potential, hydrological, and other engineering and environmental applications.

  2. Integration of Multi-Modal Biomedical Data to Predict Cancer Grade and Patient Survival.

    Phan, John H; Hoffman, Ryan; Kothari, Sonal; Wu, Po-Yen; Wang, May D

    2016-02-01

    The Big Data era in Biomedical research has resulted in large-cohort data repositories such as The Cancer Genome Atlas (TCGA). These repositories routinely contain hundreds of matched patient samples for genomic, proteomic, imaging, and clinical data modalities, enabling holistic and multi-modal integrative analysis of human disease. Using TCGA renal and ovarian cancer data, we conducted a novel investigation of multi-modal data integration by combining histopathological image and RNA-seq data. We compared the performances of two integrative prediction methods: majority vote and stacked generalization. Results indicate that integration of multiple data modalities improves prediction of cancer grade and outcome. Specifically, stacked generalization, a method that integrates multiple data modalities to produce a single prediction result, outperforms both single-data-modality prediction and majority vote. Moreover, stacked generalization reveals the contribution of each data modality (and specific features within each data modality) to the final prediction result and may provide biological insights to explain prediction performance.
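
The contrast between majority vote and stacked generalization can be illustrated with a toy stand-in: two "modality" scores per sample, one informative and one noisier, and a deliberately simplified one-parameter stacker (a convex weight chosen on a training split). None of this is the TCGA pipeline; it only shows the two combination rules side by side:

```python
import random

random.seed(7)

def make_sample():
    """Invented sample: two modality probability scores plus a true label."""
    label = random.random() < 0.5
    p_a = min(1.0, max(0.0, (0.7 if label else 0.3) + random.gauss(0, 0.15)))
    p_b = min(1.0, max(0.0, (0.6 if label else 0.4) + random.gauss(0, 0.25)))
    return p_a, p_b, label

train = [make_sample() for _ in range(400)]
test = [make_sample() for _ in range(400)]

def majority_vote(p_a, p_b):
    votes = (p_a > 0.5) + (p_b > 0.5)
    if votes != 1:
        return votes == 2
    return (p_a + p_b) / 2 > 0.5  # break ties with the mean score

# Simplified "stacking": learn one convex weight on the training split.
best_w, best_acc = 0.0, -1.0
for w in [i / 20 for i in range(21)]:
    acc = sum(((w * a + (1 - w) * b) > 0.5) == lab
              for a, b, lab in train) / len(train)
    if acc > best_acc:
        best_w, best_acc = w, acc

def accuracy(rule, data):
    return sum(rule(a, b) == lab for a, b, lab in data) / len(data)

acc_vote = accuracy(majority_vote, test)
acc_stack = accuracy(lambda a, b: (best_w * a + (1 - best_w) * b) > 0.5, test)
```

Real stacked generalization trains a meta-learner (with cross-validated base predictions) rather than a single weight, but the learned weight already mirrors the paper's point that stacking exposes each modality's contribution.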

  3. A statistical kinematic source inversion approach based on the QUESO library for uncertainty quantification and prediction

    Zielke, Olaf; McDougall, Damon; Mai, Martin; Babuska, Ivo

    2014-05-01

    Seismic data, often augmented with geodetic data, are frequently used to invert for the spatio-temporal evolution of slip along a rupture plane. The resulting images of the slip evolution for a single event, inferred by different research teams, often vary distinctly, depending on the adopted inversion approach and rupture model parameterization. This observation raises the question of which of the provided kinematic source inversion solutions is most reliable and most robust, and, more generally, how accurate fault parameterization and solution predictions are. These issues are not addressed in "standard" source inversion approaches. Here, we present a statistical inversion approach to constrain kinematic rupture parameters from teleseismic body waves. The approach is based (a) on a forward-modeling scheme that computes synthetic (body-)waves for a given kinematic rupture model, and (b) on the QUESO (Quantification of Uncertainty for Estimation, Simulation, and Optimization) library, which uses MCMC algorithms and Bayes' theorem for sample selection. We present Bayesian inversions for rupture parameters in synthetic earthquakes (i.e. for which the exact rupture history is known) in an attempt to identify the cross-over at which further model discretization (spatial and temporal resolution of the parameter space) no longer yields a decreasing misfit. Identification of this cross-over is of importance as it reveals the resolution power of the studied data set (i.e. teleseismic body waves), enabling one to constrain kinematic earthquake rupture histories of real earthquakes at a resolution that is supported by the data. In addition, the Bayesian approach allows for mapping complete posterior probability density functions of the desired kinematic source parameters, thus enabling us to rigorously assess the uncertainties in earthquake source inversions.
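
The Bayesian machinery behind such an inversion (forward model, likelihood, MCMC sampling of the posterior) can be shown on a deliberately tiny toy problem. The sketch below recovers one parameter of a linear forward model with a plain Metropolis sampler; it is a stand-in for the QUESO library and a real seismological forward model, with all values invented:

```python
import math
import random

random.seed(3)

TRUE_THETA, NOISE_SD = 2.0, 0.5

def forward(theta, t):
    return theta * t  # placeholder forward model, not a wave simulator

times = [0.5 * i for i in range(1, 11)]
obs = [forward(TRUE_THETA, t) + random.gauss(0, NOISE_SD) for t in times]

def log_posterior(theta):
    """Flat prior on [0, 10] with a Gaussian data likelihood."""
    if not 0.0 <= theta <= 10.0:
        return -math.inf
    return -sum((o - forward(theta, t)) ** 2
                for o, t in zip(obs, times)) / (2 * NOISE_SD ** 2)

theta = 5.0  # deliberately poor starting value
lp = log_posterior(theta)
samples = []
for _ in range(20000):
    proposal = theta + random.gauss(0, 0.2)
    lp_prop = log_posterior(proposal)
    if math.log(random.random()) < lp_prop - lp:  # Metropolis accept step
        theta, lp = proposal, lp_prop
    samples.append(theta)

burn_in = samples[5000:]  # discard the transient
posterior_mean = sum(burn_in) / len(burn_in)
```

The retained samples approximate the full posterior, which is what allows the uncertainty assessment described above; a real inversion samples many parameters and uses far more expensive forward simulations.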

  4. SU-E-T-205: MLC Predictive Maintenance Using Statistical Process Control Analysis.

    Able, C; Hampton, C; Baydush, A; Bright, M

    2012-06-01

    MLC failure increases accelerator downtime and negatively affects the clinic treatment delivery schedule. This study investigates the use of Statistical Process Control (SPC), a modern quality control methodology, to retrospectively evaluate MLC performance data, thereby predicting the impending failure of individual MLC leaves. SPC, a methodology which detects exceptional variability in a process, was used to analyze MLC leaf velocity data. An MLC velocity test is performed weekly on all leaves during morning QA. The leaves sweep 15 cm across the radiation field with the gantry pointing down. The leaf speed is analyzed from the generated dynalog file using quality assurance software. MLC leaf speeds in which a known motor failure occurred (8) and those in which no motor replacement was performed (11) were retrospectively evaluated over a 71-week period. SPC individual and moving range (I/MR) charts were used in the analysis. The I/MR chart limits were calculated using the first twenty weeks of data and set at 3 standard deviations from the mean. The MLCs in which a motor failure occurred followed two general trends: (a) no data indicating a change in leaf speed prior to failure (5 of 8) and (b) a series of data points exceeding the limit prior to motor failure (3 of 8). I/MR charts for a high percentage (8 of 11) of the non-replaced MLC motors indicated that only a single point exceeded the limit. These single-point excesses were deemed false positives. SPC analysis using MLC performance data may be helpful in detecting a significant percentage of impending failures of MLC motors. The ability to detect MLC failure may depend on the method of failure (i.e. gradual or catastrophic). Further study is needed to determine if increasing the sampling frequency could increase reliability. This project was supported by a grant from Varian Medical Systems, Inc. © 2012 American Association of Physicists in Medicine.
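
    The individuals half of the I/MR pair described above can be sketched in a few lines: control limits are set at three standard deviations around the mean of the first twenty weeks, and later weeks are flagged when they fall outside. The leaf-speed numbers are invented; the study also pairs this with a moving-range chart, which is omitted here.

```python
import statistics

# Weekly MLC leaf-speed measurements (cm/s); values are illustrative only.
speeds = [2.50, 2.48, 2.52, 2.49, 2.51, 2.50, 2.47, 2.53, 2.50, 2.49,
          2.51, 2.52, 2.48, 2.50, 2.49, 2.51, 2.50, 2.52, 2.48, 2.50,  # 20-week baseline
          2.49, 2.51, 2.50, 2.35, 2.30]                                # later weeks, drifting

baseline = speeds[:20]
mean = statistics.mean(baseline)
sd = statistics.pstdev(baseline)
ucl, lcl = mean + 3 * sd, mean - 3 * sd   # individuals-chart limits from the baseline

# Flag any week whose leaf speed falls outside the 3-sigma limits.
out_of_control = [week for week, v in enumerate(speeds) if not lcl <= v <= ucl]
```

    On this toy series the two drifting final weeks are flagged, mimicking trend (b) above, while single isolated excursions would correspond to the false positives the authors describe.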

  5. Accurate Holdup Calculations with Predictive Modeling & Data Integration

    Azmy, Yousry [North Carolina State Univ., Raleigh, NC (United States). Dept. of Nuclear Engineering; Cacuci, Dan [Univ. of South Carolina, Columbia, SC (United States). Dept. of Mechanical Engineering

    2017-04-03

    To apply Bayes’ Theorem, one must have a model y(x) that maps the state variables x (the solution in this case) to the measurements y. In this case, the unknown state variables are the configuration and composition of the heldup SNM. The measurements are the detector readings. Thus, the natural model is neutral-particle radiation transport, where a wealth of computational tools exists for performing these simulations accurately and efficiently. The combination of a predictive model and Bayesian inference forms the Data Integration with Modeled Predictions (DIMP) method that serves as the foundation for this project. The cost functional describing the model-to-data misfit is computed via a norm created by the inverse of the covariance matrix of the model parameters and responses. Since the model y(x) for the holdup problem is nonlinear, a nonlinear optimization of the cost functional Q is conducted via Newton-type iterative methods to find the optimal values of the model parameters x. This project comprised a collaboration between NC State University (NCSU), the University of South Carolina (USC), and Oak Ridge National Laboratory (ORNL). The project was originally proposed in seven main tasks with an eighth contingency task to be performed if time and funding permitted; in fact time did not permit commencement of the contingency task and it was not performed. The remaining tasks involved holdup analysis with gamma detection strategies and separately with neutrons based on coincidence counting. Early in the project, and upon consultation with experts in coincidence counting, it became evident that this approach is not viable for holdup applications, and this task was replaced with an alternative, but valuable, investigation that was carried out by the USC partner. Nevertheless, the experimental measurements at ORNL of both gamma and neutron sources for the purpose of constructing Detector Response Functions (DRFs) with the associated uncertainties were indeed completed.
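
    A minimal sketch of the Newton-type iteration applied to the misfit functional Q: for a single state variable and a single detector reading, the Gauss-Newton step reduces to a scalar update (the covariance weighting cancels for one datum). The model y(x) = x² and the reading below are invented stand-ins for the radiation-transport model and detector data.

```python
# Toy scalar analogue of the DIMP misfit minimisation: one unknown x, one reading.
y_obs = 4.0

def model(x):          # stand-in for the radiation-transport model y(x)
    return x * x

def dmodel(x):         # its analytic derivative (the 1x1 Jacobian)
    return 2.0 * x

x = 3.0                # initial guess for the state variable
for _ in range(25):
    residual = y_obs - model(x)
    jac = dmodel(x)
    x += (jac * residual) / (jac * jac)   # Gauss-Newton step on Q = residual**2

optimum = x            # converges to sqrt(4.0) = 2.0
```

    In the full problem the scalar division becomes a solve against the Jacobian-weighted covariance norm, but the iteration structure is the same.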

  6. Revisiting EOR Projects in Indonesia through Integrated Study: EOR Screening, Predictive Model, and Optimisation

    Hartono, A. D.; Hakiki, Farizal; Syihab, Z.; Ambia, F.; Yasutra, A.; Sutopo, S.; Efendi, M.; Sitompul, V.; Primasari, I.; Apriandi, R.

    2017-01-01

    EOR preliminary analysis is pivotal at an early stage of assessment in order to elucidate EOR feasibility. This study proposes an in-depth analysis toolkit for EOR preliminary evaluation. The toolkit incorporates EOR screening, predictive, economic, risk-analysis and optimisation modules. The screening module introduces algorithms which assimilate both statistical and engineering considerations. The United States Department of Energy (U.S. DOE) predictive models were implemented in the predictive module. The economic module is available to assess project attractiveness, while Monte Carlo Simulation is applied to quantify the risk and uncertainty of the evaluated project. Optimisation scenarios of EOR practice can be evaluated using the optimisation module, in which the stochastic methods of Genetic Algorithms (GA), Particle Swarm Optimization (PSO) and Evolutionary Strategy (ES) were applied. The modules were combined into an integrated package for EOR preliminary assessment. Finally, we utilised the toolkit to evaluate several Indonesian oil fields for EOR evaluation (past projects) and feasibility (future projects). The attempt updated previous assessments of EOR attractiveness and opened new opportunities for EOR implementation in Indonesia.
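
    The Monte Carlo risk module described above can be sketched by propagating uncertain inputs through a simple cash-flow model; the price, recovery, and cost figures below are hypothetical and not from the study.

```python
import random
import statistics

random.seed(7)

# Illustrative, hypothetical EOR project inputs (not from the paper).
def sample_npv():
    oil_price = random.gauss(60.0, 10.0)      # USD/bbl, uncertain
    recovery = random.uniform(0.5e6, 1.5e6)   # incremental barrels, uncertain
    capex = 30e6                              # fixed project cost, USD
    return oil_price * recovery - capex

npvs = [sample_npv() for _ in range(50000)]
mean_npv = statistics.mean(npvs)
p_loss = sum(npv < 0 for npv in npvs) / len(npvs)   # probability of a losing project
```

    The resulting NPV distribution, rather than a single deterministic figure, is what the risk module would feed into the project-attractiveness assessment.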

  7. Revisiting EOR Projects in Indonesia through Integrated Study: EOR Screening, Predictive Model, and Optimisation

    Hartono, A. D.

    2017-10-17

    EOR preliminary analysis is pivotal at an early stage of assessment in order to elucidate EOR feasibility. This study proposes an in-depth analysis toolkit for EOR preliminary evaluation. The toolkit incorporates EOR screening, predictive, economic, risk-analysis and optimisation modules. The screening module introduces algorithms which assimilate both statistical and engineering considerations. The United States Department of Energy (U.S. DOE) predictive models were implemented in the predictive module. The economic module is available to assess project attractiveness, while Monte Carlo Simulation is applied to quantify the risk and uncertainty of the evaluated project. Optimisation scenarios of EOR practice can be evaluated using the optimisation module, in which the stochastic methods of Genetic Algorithms (GA), Particle Swarm Optimization (PSO) and Evolutionary Strategy (ES) were applied. The modules were combined into an integrated package for EOR preliminary assessment. Finally, we utilised the toolkit to evaluate several Indonesian oil fields for EOR evaluation (past projects) and feasibility (future projects). The attempt updated previous assessments of EOR attractiveness and opened new opportunities for EOR implementation in Indonesia.

  8. Fractional statistics and quantum scaling properties of the integrable Penson-Kolb-Hubbard chain

    Vitoriano, Carlindo; Coutinho-Filho, M. D.

    2010-09-01

    We investigate the ground-state and low-temperature properties of the integrable version of the Penson-Kolb-Hubbard chain. The model obeys fractional statistical properties, which give rise to fractional elementary excitations and manifest differently in the four regions of the phase diagram U/t versus n, where U is the Coulomb coupling, t is the correlated hopping amplitude, and n is the particle density. In fact, we can find local pair formation, fractionalization of the average occupation number per orbital k, or U- and n-dependent average electric charge per orbital k. We also study the scaling behavior near the U-driven quantum phase transitions and characterize their universality classes. Finally, it is shown that in the regime of parameters where local pair formation is energetically more favorable, the ground state exhibits power-law superconductivity; we also stress that above half filling the pair-hopping term stabilizes local Cooper pairs in the repulsive-U regime for U

  9. On the estimation of multiple random integrals and U-statistics

    Major, Péter

    2013-01-01

    This work starts with the study of those limit theorems in probability theory for which classical methods do not work. In many cases some form of linearization can help to solve the problem, because the linearized version is simpler. But in order to apply such a method we have to show that the linearization causes a negligible error. The estimation of this error leads to some important large deviation type problems, and the main subject of this work is their investigation. We provide sharp estimates of the tail distribution of multiple integrals with respect to a normalized empirical measure and so-called degenerate U-statistics and also of the supremum of appropriate classes of such quantities. The proofs apply a number of useful techniques of modern probability that enable us to investigate the non-linear functionals of independent random variables. This lecture note yields insights into these methods, and may also be useful for those who only want some new tools to help them prove limit theorems when stand...

  10. Abnormal white matter integrity in the corpus callosum among smokers: tract-based spatial statistics.

    Wakako Umene-Nakano

    Full Text Available In the present study, we aimed to investigate the difference in white matter between smokers and nonsmokers. In addition, we examined relationships between white matter integrity and nicotine dependence parameters in smoking subjects. Nineteen male smokers were enrolled in this study. Eighteen age-matched non-smokers with no current or past psychiatric history were included as controls. Diffusion tensor imaging scans were performed, and the analysis was conducted using a tract-based spatial statistics approach. Compared with nonsmokers, smokers exhibited a significant decrease in fractional anisotropy (FA) throughout the whole corpus callosum. There were no significant differences in radial diffusivity or axial diffusivity between the two groups. There was a significant negative correlation between FA in the whole corpus callosum and the amount of tobacco use (cigarettes/day; R = -0.580, p = 0.023. These results suggest that the corpus callosum may be one of the key areas influenced by chronic smoking.

  11. Integrating observation and statistical forecasts over sub-Saharan Africa to support Famine Early Warning

    Funk, Chris; Verdin, James P.; Husak, Gregory

    2007-01-01

    Famine early warning in Africa presents unique challenges and rewards. Hydrologic extremes must be tracked and anticipated over complex and changing climate regimes. The successful anticipation and interpretation of hydrologic shocks can initiate effective government response, saving lives and softening the impacts of droughts and floods. While both monitoring and forecast technologies continue to advance, discontinuities between monitoring and forecast systems inhibit effective decision making. Monitoring systems typically rely on high-resolution satellite remote-sensed normalized difference vegetation index (NDVI) and rainfall imagery. Forecast systems provide information on a variety of scales and in a variety of formats. Non-meteorologists are often unable or unwilling to connect the dots between these disparate sources of information. To mitigate these problems, researchers at UCSB's Climate Hazard Group, NASA GIMMS and USGS/EROS are implementing a NASA-funded integrated decision support system that combines the monitoring of precipitation and NDVI with statistical one-to-three month forecasts. We present the monitoring/forecast system, assess its accuracy, and demonstrate its application in food-insecure sub-Saharan Africa.

  12. A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics.

    Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo

    2018-03-15

    RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId's core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible to the proteomics community by developing a graphical user interface (GUI) is our main goal here. We have constructed a graphical user interface to facilitate the use of RAId on users' local machines. Written in Java, RAId_GUI not only makes it easy to run RAId but also provides tools for data/spectra visualization, MS-product analysis, molecular isotopic distribution analysis, and graphing the retrieval versus the proportion of false discoveries. The results viewer displays the analysis results and allows the users to download them. Both the knowledge-integrated organismal databases and the code package (containing source code, the graphical user interface, and a user manual) are available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads/raid.html.

  13. Predicting Flowering Behavior and Exploring Its Genetic Determinism in an Apple Multi-family Population Based on Statistical Indices and Simplified Phenotyping

    Jean-Baptiste Durand

    2017-06-01

    Full Text Available Irregular flowering over years is commonly observed in fruit trees. The early prediction of tree behavior is highly desirable in breeding programmes. This study aims at performing such predictions, combining simplified phenotyping and statistical methods. Sequences of vegetative vs. floral annual shoots (AS were observed along axes in trees belonging to five related apple full-sib families. Sequences were analyzed using Markovian and linear mixed models including year and site effects. Indices of flowering irregularity, periodicity and synchronicity were estimated, at tree and axis scales. They were used to predict tree behavior and detect QTL with a Bayesian pedigree-based analysis, using an integrated genetic map containing 6,849 SNPs. The combination of a Biennial Bearing Index (BBI with an autoregressive coefficient (γg efficiently predicted and classified the genotype behaviors, despite few misclassifications. Four QTLs common to BBIs and γg and one for synchronicity were highlighted and revealed the complex genetic architecture of the traits. Irregularity resulted from high AS synchronism, whereas regularity resulted from either asynchronous locally alternating or continual regular AS flowering. A relevant and time-saving method, based on a posteriori sampling of axes and statistical indices, is proposed; it is efficient for evaluating tree breeding values for flowering regularity and could be transferred to other species.

  14. Predicting Flowering Behavior and Exploring Its Genetic Determinism in an Apple Multi-family Population Based on Statistical Indices and Simplified Phenotyping.

    Durand, Jean-Baptiste; Allard, Alix; Guitton, Baptiste; van de Weg, Eric; Bink, Marco C A M; Costes, Evelyne

    2017-01-01

    Irregular flowering over years is commonly observed in fruit trees. The early prediction of tree behavior is highly desirable in breeding programmes. This study aims at performing such predictions, combining simplified phenotyping and statistical methods. Sequences of vegetative vs. floral annual shoots (AS) were observed along axes in trees belonging to five related apple full-sib families. Sequences were analyzed using Markovian and linear mixed models including year and site effects. Indices of flowering irregularity, periodicity and synchronicity were estimated, at tree and axis scales. They were used to predict tree behavior and detect QTL with a Bayesian pedigree-based analysis, using an integrated genetic map containing 6,849 SNPs. The combination of a Biennial Bearing Index (BBI) with an autoregressive coefficient (γg) efficiently predicted and classified the genotype behaviors, despite few misclassifications. Four QTLs common to BBIs and γg and one for synchronicity were highlighted and revealed the complex genetic architecture of the traits. Irregularity resulted from high AS synchronism, whereas regularity resulted from either asynchronous locally alternating or continual regular AS flowering. A relevant and time-saving method, based on a posteriori sampling of axes and statistical indices, is proposed; it is efficient for evaluating tree breeding values for flowering regularity and could be transferred to other species.
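
    Under the common definition of the Biennial Bearing Index (mean relative year-to-year change in flowering intensity), and with a plain lag-1 autocorrelation standing in for the autoregressive coefficient γg that the authors estimate in a mixed model, the classification idea can be sketched as follows; the yield series is invented.

```python
# Yearly flowering intensities (e.g. floral shoot counts) for one hypothetical tree.
yields = [120, 10, 115, 8, 130, 12, 125]   # strongly alternating: a biennial bearer

# Classic Biennial Bearing Index: mean relative year-to-year change.
def bbi(y):
    terms = [abs(y[i + 1] - y[i]) / (y[i + 1] + y[i]) for i in range(len(y) - 1)]
    return sum(terms) / len(terms)

# Lag-1 autocorrelation as a rough stand-in for the autoregressive coefficient.
def lag1_autocorr(y):
    n = len(y)
    m = sum(y) / n
    num = sum((y[i] - m) * (y[i + 1] - m) for i in range(n - 1))
    den = sum((v - m) ** 2 for v in y)
    return num / den

bbi_value = bbi(yields)          # near 1 for a strict biennial bearer
gamma = lag1_autocorr(yields)    # strongly negative under alternation
```

    A high BBI combined with a strongly negative autoregressive coefficient is the signature of alternate bearing that the study uses to classify genotype behavior.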

  15. A Hierarchical Multivariate Bayesian Approach to Ensemble Model Output Statistics in Atmospheric Prediction

    2017-09-01

    application of statistical inference. Even when human forecasters leverage their professional experience, which is often gained through long periods of... application throughout statistics and Bayesian data analysis. The multivariate form (e.g., Figure 12) is similarly analytically... data (i.e., no systematic manipulations with analytical functions), it is common in the statistical literature to apply mathematical transformations

  16. Two sample Bayesian prediction intervals for order statistics based on the inverse exponential-type distributions using right censored sample

    M.M. Mohie El-Din

    2011-10-01

    Full Text Available In this paper, two-sample Bayesian prediction intervals for order statistics (OS) are obtained. This prediction is based on a certain class of the inverse exponential-type distributions using a right-censored sample. A general class of prior density functions is used, and the predictive cumulative function is obtained in the two-sample case. The class of the inverse exponential-type distributions includes several important distributions such as the inverse Weibull distribution, the inverse Burr distribution, the loglogistic distribution, the inverse Pareto distribution and the inverse paralogistic distribution. Special cases of the inverse Weibull model such as the inverse exponential model and the inverse Rayleigh model are considered.
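
    The inverse Weibull member of this class has the closed-form CDF F(x) = exp(-(scale/x)^shape), which inverts directly. The sketch below uses that quantile to simulate a Monte Carlo analogue of a two-sample prediction interval for a future first order statistic; the paper derives such intervals analytically under Bayesian priors, and the parameters here are arbitrary.

```python
import math
import random

random.seed(1)

# Inverse Weibull: F(x) = exp(-(scale / x) ** shape) for x > 0.
def inv_weibull_cdf(x, shape, scale):
    return math.exp(-((scale / x) ** shape))

def inv_weibull_quantile(p, shape, scale):
    return scale / (-math.log(p)) ** (1.0 / shape)

shape, scale = 2.0, 1.0

# Monte Carlo 95% interval for the minimum of a future sample of size 10,
# a simulation analogue of a two-sample prediction interval for the
# first order statistic.
mins = sorted(
    min(inv_weibull_quantile(random.random(), shape, scale) for _ in range(10))
    for _ in range(20000)
)
low, high = mins[int(0.025 * len(mins))], mins[int(0.975 * len(mins))]
```

    Inverse-transform sampling works for every member of the inverse exponential-type class, since each has an explicitly invertible CDF.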

  17. Predicting the distribution of four species of raptors (Aves: Accipitridae) in southern Spain: statistical models work better than existing maps

    Bustamante, Javier; Seoane, Javier

    2004-01-01

    Aim To test the effectiveness of statistical models based on explanatory environmental variables vs. existing distribution information (maps and breeding atlas), for predicting the distribution of four species of raptors (family Accipitridae): common buzzard Buteo buteo (Linnaeus, 1758), short-toed eagle Circaetus gallicus (Gmelin, 1788), booted eagle Hieraaetus pennatus (Gmelin, 1788) and black kite Milvus migrans (Boddaert, 1783). Location Andalusia, southe...

  18. Predicting freshwater habitat integrity using land-use surrogates

    2007-04-02

    Apr 2, 2007 ... Quantification of potential surrogates of freshwater habitat integrity. We chose a series of land-use variables that might be suitable predictors for assessing freshwater habitat integrity from the land cover map (CSIR 2005) and added separate GIS surfaces for human population density and the distribution of ...

  19. Prediction of interior noise due to random acoustic or turbulent boundary layer excitation using statistical energy analysis

    Grosveld, Ferdinand W.

    1990-01-01

    The feasibility of predicting interior noise due to random acoustic or turbulent boundary layer excitation was investigated in experiments in which a statistical energy analysis model (VAPEPS) was used to analyze measurements of the acceleration response and sound transmission of flat aluminum, lucite, and graphite/epoxy plates exposed to random acoustic or turbulent boundary layer excitation. The noise reduction of the plate, when backed by a shallow cavity and excited by a turbulent boundary layer, was predicted using a simplified theory based on the assumption of adiabatic compression of the fluid in the cavity. The predicted plate acceleration response was used as input in the noise reduction prediction. Reasonable agreement was found between the predictions and the measured noise reduction in the frequency range 315-1000 Hz.

  20. Context mining and integration into predictive web analytics

    Kiseleva, Y.

    2013-01-01

    Predictive Web Analytics is aimed at understanding behavioural patterns of users of various web-based applications: e-commerce, ubiquitous and mobile computing, and computational advertising. Within these applications business decisions often rely on two types of predictions: an overall or

  1. Integrating Statistical and Expert Knowledge to Develop Phenoregions for the Continental United States

    Hoffman, F. M.; Kumar, J.; Hargrove, W. W.

    2013-12-01

    Vegetated ecosystems typically exhibit unique phenological behavior over the course of a year, suggesting that remotely sensed land surface phenology may be useful for characterizing land cover and ecoregions. However, phenology is also strongly influenced by temperature and water stress; insect, fire, and storm disturbances; and climate change over seasonal, interannual, decadal and longer time scales. Normalized difference vegetation index (NDVI), a remotely sensed measure of greenness, provides a useful proxy for land surface phenology. We used NDVI for the conterminous United States (CONUS) derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) at 250 m resolution to develop phenological signatures of emergent ecological regimes called phenoregions. By applying an unsupervised, quantitative data mining technique to NDVI measurements for every eight days over the entire MODIS record, annual maps of phenoregions were developed. This technique produces a prescribed number of prototypical phenological states to which every location belongs in any year. To reduce the impact of short-term disturbances, we derived a single map of the mode of annual phenological states for the CONUS, assigning each map cell to the state with the largest integrated NDVI in cases where multiple states tie for the highest frequency. Since the data mining technique is unsupervised, individual phenoregions are not associated with an ecologically understandable label. To add automated supervision to the process, we applied the method of Mapcurves, developed by Hargrove and Hoffman, to associate individual phenoregions with labeled polygons in expert-derived maps of biomes, land cover, and ecoregions. Utilizing spatial overlays with multiple expert-derived maps, this "label-stealing" technique exploits the knowledge contained in a collection of maps to identify biome characteristics of our statistically derived phenoregions. Generalized land cover maps were produced by combining
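
    The Mapcurves "label-stealing" step can be sketched as follows: for each phenoregion, its overlap with every expert-map category is scored as (overlap area / phenoregion area) x (overlap area / category area), and the phenoregion inherits the best-scoring label. The two tiny label maps below are invented.

```python
from collections import Counter

# Two small label maps on the same grid (flattened): statistically derived
# phenoregions vs. expert-labelled biomes. Values are illustrative.
pheno = [1, 1, 1, 2, 2, 2, 3, 3, 3]
biome = ['forest', 'forest', 'grass', 'grass', 'grass', 'grass',
         'desert', 'desert', 'forest']

def mapcurves_gof(src, ref, label):
    """Mapcurves goodness of fit of one src label against all ref labels:
    sum over ref labels of (overlap/|src label|) * (overlap/|ref label|)."""
    src_cells = [i for i, v in enumerate(src) if v == label]
    ref_sizes = Counter(ref)
    overlaps = Counter(ref[i] for i in src_cells)
    return sum((c / len(src_cells)) * (c / ref_sizes[r])
               for r, c in overlaps.items())

def steal_label(src, ref, label):
    """Give the src label the ref label with the highest overlap score."""
    src_cells = [i for i, v in enumerate(src) if v == label]
    ref_sizes = Counter(ref)
    overlaps = Counter(ref[i] for i in src_cells)
    return max(overlaps, key=lambda r: (overlaps[r] / len(src_cells))
                                       * (overlaps[r] / ref_sizes[r]))

best = steal_label(pheno, biome, 2)   # phenoregion 2 overlaps 'grass' most
```

    Repeating this over several expert maps, as the abstract describes, lets each unlabeled phenoregion accumulate ecologically meaningful labels.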

  2. The IntFOLD server: an integrated web resource for protein fold recognition, 3D model quality assessment, intrinsic disorder prediction, domain prediction and ligand binding site prediction.

    Roche, Daniel B; Buenavista, Maria T; Tetchner, Stuart J; McGuffin, Liam J

    2011-07-01

    The IntFOLD server is a novel independent server that integrates several cutting edge methods for the prediction of structure and function from sequence. Our guiding principles behind the server development were as follows: (i) to provide a simple unified resource that makes our prediction software accessible to all and (ii) to produce integrated output for predictions that can be easily interpreted. The output for predictions is presented as a simple table that summarizes all results graphically via plots and annotated 3D models. The raw machine readable data files for each set of predictions are also provided for developers, which comply with the Critical Assessment of Methods for Protein Structure Prediction (CASP) data standards. The server comprises an integrated suite of five novel methods: nFOLD4, for tertiary structure prediction; ModFOLD 3.0, for model quality assessment; DISOclust 2.0, for disorder prediction; DomFOLD 2.0 for domain prediction; and FunFOLD 1.0, for ligand binding site prediction. Predictions from the IntFOLD server were found to be competitive in several categories in the recent CASP9 experiment. The IntFOLD server is available at the following web site: http://www.reading.ac.uk/bioinf/IntFOLD/.

  3. Evaluating and Predicting Patient Safety for Medical Devices With Integral Information Technology

    2005-01-01

    Evaluating and Predicting Patient Safety for Medical Devices with Integral Information Technology. Jiajie Zhang, Vimla L. Patel, Todd R... errors are due to inappropriate designs for user interactions, rather than mechanical failures. Evaluating and predicting patient safety in medical... the users on the identified trouble spots in the devices. We developed two methods for evaluating and predicting patient safety in medical devices

  4. Predicting freshwater habitat integrity using land-use surrogates

    Amis, MA

    2007-04-01

    Full Text Available Freshwater biodiversity is globally threatened due to human disturbances, but freshwater ecosystems have been accorded less protection than their terrestrial and marine counterparts. Few criteria exist for assessing the habitat integrity of rivers...

  5. Study of 1-min rain rate integration statistic in South Korea

    Shrestha, Sujan; Choi, Dong-You

    2017-03-01

    The design of millimeter-wave communication links and the study of propagation impairments at higher frequencies due to hydrometeors, particularly rain, require 1-min. rainfall rate data. Signal attenuation in space communication results from absorption and scattering of radio-wave energy. Predicting radio-wave attenuation due to rain relies on rain rates at a 1-min. integration time. However, in practice, securing these data over a wide range of areas is difficult: long-term precipitation data are readily available, but rain attenuation prediction models need 1-min. rainfall rates for a better estimation of the attenuation. In this paper, we classify and survey the prominent 1-min. rain rate models. Regression analysis was performed on cumulative rainfall data measured experimentally over a decade in nine different regions of South Korea, covering 93 locations, using the experimental 1-min. rainfall accumulation. To visualize the 1-min. rainfall rate applicable for the whole region for 0.01% of the time, we considered the variation in the rain rate for 40 stations across South Korea. The Kriging interpolation method was used for spatial interpolation of the rain rate values for 0.01% of the time onto a regular grid to obtain a highly consistent and predictable rainfall variation. The measured 1-min. rain rates were compared with rainfall rates estimated using the International Telecommunication Union Radiocommunication Sector model (ITU-R P.837-6) along with empirical methods such as Segal, Burgueno et al., Chebil and Rahman, logarithmic, exponential and global coefficients, second- and third-order polynomial fits, and Model 1 for the Icheon region under the regional and average coefficient sets. ITU-R P.837-6 exhibits lower relative error percentages of 3.32% and 12.59% in the 5- and 10-min. to 1-min. conversions, whereas the
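
    Conversions from longer integration times to 1-min. rain rates at equal exceedance probability, such as the Segal method cited above, are commonly fitted as a power law R1(p) = a * R_tau(p)^b. The coefficients below are illustrative placeholders, not the fitted South Korean values.

```python
# Hypothetical power-law conversion factors (a, b) per integration time tau.
# Segal-style conversion at equal exceedance probability p:
#   R1(p) = a * R_tau(p) ** b
COEFFS = {5: (0.99, 1.05), 10: (0.95, 1.11)}   # illustrative values only

def to_one_minute(rain_rate_mm_h, tau_minutes):
    a, b = COEFFS[tau_minutes]
    return a * rain_rate_mm_h ** b

# A 10-min rate of 40 mm/h maps to a higher equivalent 1-min rate, because
# short intense bursts are smoothed out by longer integration times.
r1 = to_one_minute(40.0, 10)
```

    Fitting (a, b) per site, or using a single average set, corresponds to the "regional and average coefficient sets" the abstract compares.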

  6. Drivers and seasonal predictability of extreme wind speeds in the ECMWF System 4 and a statistical model

    Walz, M. A.; Donat, M.; Leckebusch, G. C.

    2017-12-01

    As extreme wind speeds are responsible for large socio-economic losses in Europe, a skillful prediction would be of great benefit for disaster prevention as well as for the actuarial community. Here we evaluate patterns of large-scale atmospheric variability and the seasonal predictability of extreme wind speeds (e.g. >95th percentile) in the European domain in the dynamical seasonal forecast system ECMWF System 4, and compare to the predictability based on a statistical prediction model. The dominant patterns of atmospheric variability show distinct differences between reanalysis and ECMWF System 4, with most patterns in System 4 extended downstream in comparison to ERA-Interim. The dissimilar manifestations of the patterns within the two models lead to substantially different drivers associated with the occurrence of extreme winds in the respective model. While the ECMWF System 4 is shown to provide some predictive power over Scandinavia and the eastern Atlantic, only very few grid cells in the European domain have significant correlations for extreme wind speeds in System 4 compared to ERA-Interim. In contrast, a statistical model predicts extreme wind speeds during boreal winter in better agreement with the observations. Our results suggest that System 4 does not seem to capture the potential predictability of extreme winds that exists in the real world, and therefore fails to provide reliable seasonal predictions for lead months 2-4. This is likely related to the unrealistic representation of large-scale patterns of atmospheric variability. Hence our study points to potential improvements of dynamical prediction skill by improving the simulation of large-scale atmospheric dynamics.

  7. MirZ: an integrated microRNA expression atlas and target prediction resource.

    Hausser, Jean; Berninger, Philipp; Rodak, Christoph; Jantscher, Yvonne; Wirth, Stefan; Zavolan, Mihaela

    2009-07-01

    MicroRNAs (miRNAs) are short RNAs that act as guides for the degradation and translational repression of protein-coding mRNAs. A large body of work showed that miRNAs are involved in the regulation of a broad range of biological functions, from development to cardiac and immune system function, to metabolism, to cancer. For most of the over 500 miRNAs that are encoded in the human genome the functions still remain to be uncovered. Identifying miRNAs whose expression changes between cell types or between normal and pathological conditions is an important step towards characterizing their function as is the prediction of mRNAs that could be targeted by these miRNAs. To provide the community the possibility of exploring interactively miRNA expression patterns and the candidate targets of miRNAs in an integrated environment, we developed the MirZ web server, which is accessible at www.mirz.unibas.ch. The server provides experimental and computational biologists with statistical analysis and data mining tools operating on up-to-date databases of sequencing-based miRNA expression profiles and of predicted miRNA target sites in species ranging from Caenorhabditis elegans to Homo sapiens.

  8. Accuracy statistics in predicting Independent Activities of Daily Living (IADL) capacity with comprehensive and brief neuropsychological test batteries.

    Karzmark, Peter; Deutsch, Gayle K

    2018-01-01

    This investigation was designed to determine the predictive accuracy of a comprehensive neuropsychological test battery and a brief neuropsychological test battery with regard to the capacity to perform instrumental activities of daily living (IADLs). Accuracy statistics that included measures of sensitivity, specificity, positive and negative predictive power and positive likelihood ratio were calculated for both types of batteries. The sample was drawn from a general neurological group of adults (n = 117) that included a number of older participants (age >55; n = 38). Standardized neuropsychological assessments were administered to all participants and comprised the Halstead-Reitan Battery and portions of the Wechsler Adult Intelligence Scale-III. A comprehensive test battery yielded a moderate increase over base rate in predictive accuracy that generalized to older individuals. There was only limited support for using a brief battery, for although sensitivity was high, specificity was low. We found that a comprehensive neuropsychological test battery provided good classification accuracy for predicting IADL capacity.
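
    The accuracy statistics listed above follow directly from a 2x2 confusion table; the counts below are illustrative, not the study's data.

```python
# Hypothetical 2x2 confusion table: battery prediction vs. observed IADL capacity.
tp, fp, fn, tn = 40, 10, 8, 59

sensitivity = tp / (tp + fn)              # P(test positive | impaired)
specificity = tn / (tn + fp)              # P(test negative | intact)
ppv = tp / (tp + fp)                      # positive predictive power
npv = tn / (tn + fn)                      # negative predictive power
lr_pos = sensitivity / (1 - specificity)  # positive likelihood ratio
```

    The brief-battery pattern the authors report, high sensitivity with low specificity, would show up here as a large `sensitivity` but a `lr_pos` close to 1, since the likelihood ratio penalizes false positives.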

  9. Chemical agnostic hazard prediction: Statistical inference of toxicity pathways - data for Figure 2

    U.S. Environmental Protection Agency — This dataset comprises one SigmaPlot 13 file containing measured survival data and survival data predicted from the model coefficients selected by the LASSO...

  10. Prediction of thermo-mechanical integrity of wafer backend processes

    Gonda, V.; Toonder, den J.M.J.; Beijer, J.G.J.; Zhang, G.Q.; Hoofman, R.J.O.M.; Ernst, L.J.; Ernst, L.J.

    2003-01-01

    More than 65% of IC failures are related to thermal and mechanical problems. For wafer backend processes, thermo-mechanical failure is one of the major bottlenecks. The ongoing technological trends like miniaturization, introduction of new materials, and function/product integration will increase

  11. Preliminary background prediction for the INTEGRAL x-ray monitor

    Feroci, M.; Costa, E.; Budtz-Joergensen, C.

    1996-01-01

    The JEM-X (Joint European X-ray Monitor) experiment will be flown onboard ESA's INTEGRAL satellite. The instrumental background level of the two JEM-X twin detectors will depend on several parameters, among which the satellite orbit and mass distribution, and the detector materials play...

  12. Integrating models to predict regional haze from wildland fire.

    D. McKenzie; S.M. O' Neill; N. Larkin; R.A. Norheim

    2006-01-01

    Visibility impairment from regional haze is a significant problem throughout the continental United States. A substantial portion of regional haze is produced by smoke from prescribed and wildland fires. Here we describe the integration of four simulation models, an array of GIS raster layers, and a set of algorithms for fire-danger calculations into a modeling...

  13. Tide Gauge and Satellite Altimetry Integration for Storm Surge Prediction

    Andersen, Ole Baltazar; Cheng, Yongcun; Deng, X.

    2013-01-01

    of the Northeast Australia, we have investigated several large cyclones causing much destruction when they hit the coast. One of these, Cyclone Larry, hit the Queensland coast in March 2006 and caused both loss of life and huge devastation. Here we demonstrate the importance of integrating...

  14. Basic Mathematics Test Predicts Statistics Achievement and Overall First Year Academic Success

    Fonteyne, Lot; De Fruyt, Filip; Dewulf, Nele; Duyck, Wouter; Erauw, Kris; Goeminne, Katy; Lammertyn, Jan; Marchant, Thierry; Moerkerke, Beatrijs; Oosterlinck, Tom; Rosseel, Yves

    2015-01-01

    In the psychology and educational science programs at Ghent University, only 36.1% of the new incoming students in 2011 and 2012 passed all exams. Despite the availability of information, many students underestimate the scientific character of social science programs. Statistics courses are a major obstacle in this matter. Not all enrolling students…

  15. The Rhetoric of Investment Theory : The Story of Statistics and Predictability

    T. Pistorius (Thomas)

    2016-01-01

    Uncertainty is a feeling of anxiety and has been a part of culture since the dawn of civilization. Civilizations have invented numerous ways to cope with uncertainty; statistics is one of those technologies. The rhetoric as the discourse of investment theory uncovers that the theory of

  16. Predicting phenology by integrating ecology, evolution and climate science

    Pau, Stephanie; Wolkovich, Elizabeth M.; Cook, Benjamin I.; Davies, T. Jonathan; Kraft, Nathan J.B.; Bolmgren, Kjell; Betancourt, Julio L.; Cleland, Elsa E.

    2011-01-01

    Forecasting how species and ecosystems will respond to climate change has been a major aim of ecology in recent years. Much of this research has focused on phenology — the timing of life-history events. Phenology has well-demonstrated links to climate, from genetic to landscape scales; yet our ability to explain and predict variation in phenology across species, habitats and time remains poor. Here, we outline how merging approaches from ecology, climate science and evolutionary biology can advance research on phenological responses to climate variability. Using insight into seasonal and interannual climate variability combined with niche theory and community phylogenetics, we develop a predictive approach for species' responses to changing climate. Our approach predicts that species occupying higher latitudes or the early growing season should be most sensitive to climate and have the most phylogenetically conserved phenologies. We further predict that temperate species will respond to climate change by shifting in time, while tropical species will respond by shifting in space, or by evolving. Although we focus here on plant phenology, our approach is broadly applicable to ecological research of plant responses to climate variability.

  17. Predicting uncertainty in future marine ice sheet volume using Bayesian statistical methods

    Davis, A. D.

    2015-12-01

    The marine ice instability can trigger rapid retreat of marine ice streams. Recent observations suggest that marine ice systems in West Antarctica have begun retreating. However, unknown ice dynamics, computationally intensive mathematical models, and uncertain parameters in these models make predicting retreat rate and ice volume difficult. In this work, we fuse current observational data with ice stream/shelf models to develop probabilistic predictions of future grounded ice sheet volume. Given observational data (e.g., thickness, surface elevation, and velocity) and a forward model that relates uncertain parameters (e.g., basal friction and basal topography) to these observations, we use a Bayesian framework to define a posterior distribution over the parameters. A stochastic predictive model then propagates uncertainties in these parameters to uncertainty in a particular quantity of interest (QoI)---here, the volume of grounded ice at a specified future time. While the Bayesian approach can in principle characterize the posterior predictive distribution of the QoI, the computational cost of both the forward and predictive models makes this effort prohibitively expensive. To tackle this challenge, we introduce a new Markov chain Monte Carlo method that constructs convergent approximations of the QoI target density in an online fashion, yielding accurate characterizations of future ice sheet volume at significantly reduced computational cost.Our second goal is to attribute uncertainty in these Bayesian predictions to uncertainties in particular parameters. Doing so can help target data collection, for the purpose of constraining the parameters that contribute most strongly to uncertainty in the future volume of grounded ice. For instance, smaller uncertainties in parameters to which the QoI is highly sensitive may account for more variability in the prediction than larger uncertainties in parameters to which the QoI is less sensitive. We use global sensitivity
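
The Bayesian workflow described above (posterior inference over uncertain parameters, then propagation to a quantity of interest) can be illustrated with a toy Metropolis-Hastings sampler. Everything here, from the one-parameter model to the QoI function, is invented for illustration; the paper's ice-sheet models are far more elaborate:

```python
# Toy Bayesian inference + uncertainty propagation sketch:
# infer a parameter from noisy observations with Metropolis-Hastings,
# then push posterior samples through a toy "predictive model" to get a
# distribution over a quantity of interest (QoI).

import math
import random

random.seed(42)

# Invented data: observations of theta = 2.0 with unit Gaussian noise.
data = [2.1, 1.8, 2.3, 2.0, 1.9]

def log_post(theta):
    # Flat prior; Gaussian likelihood with sigma = 1.
    return -0.5 * sum((y - theta) ** 2 for y in data)

def metropolis(n_steps, theta0=0.0, step=0.5):
    samples, theta, lp = [], theta0, log_post(theta0)
    for _ in range(n_steps):
        prop = theta + random.gauss(0.0, step)
        lp_prop = log_post(prop)
        if math.log(random.random()) < lp_prop - lp:  # accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples

posterior = metropolis(5000)
# Propagate parameter uncertainty to a QoI (an arbitrary toy function).
qoi = [10.0 - 0.8 * th for th in posterior[1000:]]  # discard burn-in
print(f"posterior mean QoI: {sum(qoi) / len(qoi):.2f}")
```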

  18. IPF-LASSO: Integrative L1-Penalized Regression with Penalty Factors for Prediction Based on Multi-Omics Data

    Anne-Laure Boulesteix

    2017-01-01

    As modern biotechnologies advance, it has become increasingly frequent that different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, are collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied in the last fifteen years, little has been done in the statistical literature on the integration of multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method to address this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (Integrative LASSO with Penalty Factors) and implemented in the R package ipflasso, with the standard LASSO and the sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and codes are available on the companion website to ensure reproducibility.
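
A per-modality penalty factor is equivalent to rescaling each modality's columns before an ordinary LASSO fit and rescaling the coefficients back. The sketch below demonstrates that idea with a generic coordinate-descent LASSO on invented data; it is not the ipflasso implementation:

```python
# Dividing modality m's columns by its penalty factor pf_m before fitting,
# then dividing the fitted coefficients by pf_m, penalizes modality m's
# coefficients pf_m times more strongly than the others.

import random

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent LASSO (no intercept, unstandardized)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            z = sum(X[i][j] ** 2 for i in range(n))
            # soft-thresholding update
            if rho > lam:
                beta[j] = (rho - lam) / z
            elif rho < -lam:
                beta[j] = (rho + lam) / z
            else:
                beta[j] = 0.0
    return beta

def ipf_lasso(X, y, lam, pf):
    """LASSO with per-feature penalty factors pf, via column rescaling."""
    Xs = [[x / f for x, f in zip(row, pf)] for row in X]
    beta_s = lasso_cd(Xs, y, lam)
    return [b / f for b, f in zip(beta_s, pf)]

random.seed(0)
# Two "modalities": features 0-1 (e.g. clinical) and 2-3 (e.g. omics).
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 * row[0] + 1.0 * row[2] + random.gauss(0, 0.1) for row in X]
# Penalize the second modality 5x more heavily than the first.
beta = ipf_lasso(X, y, lam=2.0, pf=[1.0, 1.0, 5.0, 5.0])
print([round(b, 2) for b in beta])
```

Note how the heavier penalty shrinks the third coefficient below its true value of 1.0 while the lightly penalized first coefficient stays near 2.0.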

  19. Developing a comprehensive training curriculum for integrated predictive maintenance

    Wurzbach, Richard N.

    2002-03-01

    On-line equipment condition monitoring is a critical component of the world-class production and safety histories of many successful nuclear plant operators. From addressing availability and operability concerns of nuclear safety-related equipment to increasing profitability through support system reliability and reduced maintenance costs, Predictive Maintenance programs have increasingly become a vital contribution to the maintenance and operation decisions of nuclear facilities. In recent years, significant advancements have been made in the quality and portability of many of the instruments being used, and software improvements have been made as well. However, the single most influential component of the success of these programs is the impact of a trained and experienced team of personnel putting this technology to work. Changes in the nature of the power generation industry brought on by competition, mergers, and acquisitions have taken the historically stable personnel environment of power generation and created a very dynamic situation. As a result, many facilities have seen a significant turnover in personnel in key positions, including predictive maintenance personnel. It has become the challenge for many nuclear operators to maintain the consistent contribution of quality data and information from predictive maintenance that has become important in the overall equipment decision process. These challenges can be met through the implementation of quality training for predictive maintenance personnel and regular updating and re-certification of key technology holders. The use of data management tools and services aids in the sharing of information across sites within an operating company, and with experts who can contribute value-added data management and analysis. The overall effectiveness of predictive maintenance programs can be improved through the incorporation of newly developed comprehensive technology training courses. These courses address the use of

  20. A Weibull statistics-based lignocellulose saccharification model and a built-in parameter accurately predict lignocellulose hydrolysis performance.

    Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu

    2015-09-01

    Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to the Weibull distribution, we discovered that Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and of experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when the λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior, and the λ parameter can be used to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment are discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
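
A Weibull-type saccharification curve of the kind described corresponds to a fractional-conversion model y(t) = A(1 - exp(-(t/λ)^n)), which can be fitted by linearizing with a double logarithm. A sketch on noiseless synthetic data; the parameter values are invented, not taken from the paper:

```python
# Fit y(t) = A * (1 - exp(-(t / lam)**n)) by the linearization
# ln(-ln(1 - y/A)) = n*ln(t) - n*ln(lam), using ordinary least squares.
# lam (the characteristic time) summarizes overall performance.

import math

A, lam_true, n_true = 0.9, 24.0, 0.8   # invented "true" parameters

times = [2, 4, 8, 16, 24, 48, 72]      # hours
yields = [A * (1 - math.exp(-(t / lam_true) ** n_true)) for t in times]

# Linear least squares in the transformed coordinates.
xs = [math.log(t) for t in times]
zs = [math.log(-math.log(1 - y / A)) for y in yields]
mx, mz = sum(xs) / len(xs), sum(zs) / len(zs)
slope = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / \
        sum((x - mx) ** 2 for x in xs)
n_hat = slope                           # shape parameter n
lam_hat = math.exp(mx - mz / slope)     # characteristic time lambda
print(f"n = {n_hat:.3f}, lambda = {lam_hat:.1f} h")
```

On noiseless data the transformation recovers the generating parameters exactly; with real saccharification data a nonlinear least-squares fit would typically be preferred.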

  1. Predicting the lung compliance of mechanically ventilated patients via statistical modeling

    Ganzert, Steven; Kramer, Stefan; Guttmann, Josef

    2012-01-01

    To avoid ventilator-associated lung injury (VALI) during mechanical ventilation, the ventilator is adjusted with reference to the volume distensibility or ‘compliance’ of the lung. For lung-protective ventilation, the lung should be inflated at its maximum compliance, i.e. when during inspiration a maximal intrapulmonary volume change is achieved by a minimal change of pressure. To accomplish this, one of the main parameters is the adjusted positive end-expiratory pressure (PEEP). As changing the ventilator settings usually produces an effect on the patient's lung mechanics with a considerable time delay, predicting the compliance change associated with a planned change of PEEP could assist the physician at the bedside. This study introduces a machine learning approach to predict the nonlinear lung compliance of the individual patient by Gaussian processes, a probabilistic modeling technique. Experiments are based on time series data obtained from patients suffering from acute respiratory distress syndrome (ARDS). With a high hit ratio of up to 93%, the learned models could predict whether an increase/decrease of PEEP would lead to an increase/decrease of the compliance. However, the prediction of the complete pressure–volume relation for an individual patient has to be improved. We conclude that the approach is well suited to the given problem domain but that an individualized feature selection should be applied for a precise prediction of individual pressure–volume curves. (paper)
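
A minimal Gaussian-process regression sketch of the modeling idea: learn a nonlinear PEEP-to-compliance relation from a few observations and predict at a new setting. The dataset and the squared-exponential kernel are invented assumptions; the study's actual features, kernels and patient data are not reproduced here:

```python
# GP regression posterior mean with an RBF kernel, solved via a small
# hand-rolled Cholesky factorization (no external dependencies).

import math

def rbf(a, b, length=4.0):
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))

def cholesky(M):
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(M[i][i] - s) if i == j \
                else (M[i][j] - s) / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k]
                           for k in range(i + 1, n))) / L[i][i]
    return x

# Invented training data: PEEP (cmH2O) -> compliance (mL/cmH2O).
peep = [5.0, 8.0, 11.0, 14.0, 17.0]
compliance = [38.0, 45.0, 51.0, 49.0, 42.0]

noise = 1e-6   # jitter for numerical stability
K = [[rbf(a, b) + (noise if i == j else 0.0)
      for j, b in enumerate(peep)] for i, a in enumerate(peep)]
alpha = solve_chol(cholesky(K), compliance)

def predict(x_new):
    """GP posterior mean at a new PEEP value."""
    return sum(rbf(x_new, xi) * ai for xi, ai in zip(peep, alpha))

print(f"predicted compliance at PEEP 12: {predict(12.0):.1f}")
```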

  2. Edaphic history over seedling characters predicts integration and plasticity of integration across geologically variable populations of Arabidopsis thaliana.

    Cousins, Elsa A; Murren, Courtney J

    2017-12-01

    Studies on phenotypic plasticity and plasticity of integration have uncovered functionally linked modules of aboveground traits and seedlings of Arabidopsis thaliana, but we lack details about belowground variation in adult plants. Functional modules can comprise additional suites of traits that respond to environmental variation. We assessed whether shoot and root responses to nutrient environments in adult A. thaliana were predictable from seedling traits or population-specific geologic soil characteristics at the site of origin. We compared 17 natural accessions from across the native range of A. thaliana using 14-day-old seedlings grown on agar or sand and plants grown to maturity across nutrient treatments in sand. We measured aboveground size, reproduction, timing traits, root length, and root diameter. Edaphic characteristics were obtained from a global-scale dataset and related to field data. We detected significant among-population variation in root traits of seedlings and adults and in plasticity in aboveground and belowground traits of adult plants. Phenotypic integration of roots and shoots varied by population and environment. Relative integration was greater in roots than in shoots, and integration was predicted by edaphic soil history, particularly organic carbon content, whereas seedling traits did not predict later ontogenetic stages. Soil environment of origin has significant effects on phenotypic plasticity in response to nutrients, and on phenotypic integration of root modules and shoot modules. Root traits varied among populations in reproductively mature individuals, indicating potential for adaptive and integrated functional responses of root systems in annuals. © 2017 Botanical Society of America.

  3. The Prediction of Exchange Rates with the Use of Auto-Regressive Integrated Moving-Average Models

    Daniela Spiesová

    2014-10-01

    The currency market is currently the largest market in the world, and over its existence many theories have been proposed for predicting the development of exchange rates based on macroeconomic, microeconomic, statistical and other models. The aim of this paper is to identify an adequate model for the prediction of non-stationary time series of exchange rates and then use this model to predict the trend of development of European currencies against the Euro. The uniqueness of this paper lies in the fact that while there are many expert studies dealing with the prediction of exchange rates between the American dollar and other currencies, only a limited number of scientific studies are concerned with the long-term prediction of European currencies using integrated ARMA models, even though the development of exchange rates has a crucial impact on all levels of the economy and its prediction is an important indicator for individual countries, banks, companies and businessmen as well as for investors. The results of this study confirm that to predict the conditional variance and then to estimate the future values of exchange rates, it is adequate to use the ARIMA(1,1,1) model without constant, or the ARIMA[(1,7),1,(1,7)] model, where in the long term the square root of the conditional variance inclines towards a stable value.
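
A simplified sketch of the approach: difference the series once (the "I" in ARIMA) and fit an AR(1) to the differences by least squares, i.e. an ARI(1,1,0) rather than the paper's full ARIMA(1,1,1), which adds an MA term and maximum-likelihood estimation (available, for example, via statsmodels.tsa.arima.model.ARIMA). The rate series below is invented:

```python
# ARI(1,1,0) sketch: AR(1) on first differences, forecasts integrated
# back to the level of the series.

def forecast_ari_1_1(series, steps):
    d = [b - a for a, b in zip(series, series[1:])]   # first differences
    x, y = d[:-1], d[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    phi = sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
          sum((a - mx) ** 2 for a in x)               # AR(1) coefficient
    c = my - phi * mx                                 # intercept
    level, diff, out = series[-1], d[-1], []
    for _ in range(steps):
        diff = c + phi * diff     # forecast the next difference
        level += diff             # integrate back to the level
        out.append(level)
    return out

# Invented exchange-rate series (domestic currency per EUR).
rates = [27.4, 27.1, 26.9, 27.0, 26.7, 26.5, 26.6, 26.3, 26.2, 26.0]
print(forecast_ari_1_1(rates, steps=3))
```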

  4. Improving the Document Development Process: Integrating Relational Data and Statistical Process Control.

    Miller, John

    1994-01-01

    Presents an approach to document numbering, document titling, and process measurement which, when used with fundamental techniques of statistical process control, reveals meaningful process-element variation as well as nominal productivity models. (SR)
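
One way to realize the combination of process measurement and statistical process control sketched in this abstract is an individuals (XmR) control chart: flag measurements outside mean ± 3σ, with σ estimated from the average moving range. The measurements below are invented for illustration:

```python
# Individuals (XmR) control chart limits; sigma is estimated from the
# average moving range divided by the d2 constant (1.128 for n = 2).

def xmr_limits(xs):
    m = sum(xs) / len(xs)
    mr = [abs(b - a) for a, b in zip(xs, xs[1:])]   # moving ranges
    sigma = (sum(mr) / len(mr)) / 1.128
    return m - 3 * sigma, m + 3 * sigma

# Invented measurements: days to complete each document revision cycle.
cycle_days = [5, 6, 5, 7, 6, 5, 6, 25, 6, 5]
lo, hi = xmr_limits(cycle_days)
out_of_control = [x for x in cycle_days if not lo <= x <= hi]
print(out_of_control)
```

The moving-range estimate of σ is the standard choice for individuals charts because a naive sample standard deviation is inflated by the very special-cause points the chart is meant to detect.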

  5. EUROPEAN INTEGRATION: A MULTILEVEL PROCESS THAT REQUIRES A MULTILEVEL STATISTICAL ANALYSIS

    Roxana-Otilia-Sonia HRITCU

    2015-11-01

    Involving market regulation, a system of multi-level governance, and several supranational, national and subnational levels of decision making, European integration is a multilevel phenomenon. The individual characteristics of citizens, as well as the environment where the integration process takes place, are important. To understand European integration and its consequences it is important to develop and test multi-level theories that consider individual-level characteristics, as well as the overall context where individuals act and express their characteristics. A central argument of this paper is that support for European integration is influenced by factors operating at different levels. We review and present theories and related research on the use of multilevel analysis in the European area. This paper draws insights from various aspects and consequences of European integration to take stock of what we know about how and why to use multilevel modeling.

  6. Differential and integral characteristics of prompt fission neutrons in the statistical theory

    Gerasimenko, B.F.; Rubchenya, V.A.

    1989-01-01

    The Hauser-Feshbach statistical theory is the most consistent approach to the calculation of both the spectra and other characteristics of prompt fission neutrons. On the basis of this approach, a statistical model for the calculation of differential prompt fission neutron characteristics of low-energy fission has been proposed and improved in order to take into account the anisotropy effects arising in prompt fission neutron emission from fragments. 37 refs, 6 figs

  7. Improving protein function prediction methods with integrated literature data

    Gabow Aaron P

    2008-04-01

    Background: Determining the function of uncharacterized proteins is a major challenge in the post-genomic era due to the problem's complexity and scale. Identifying a protein's function contributes to an understanding of its role in the involved pathways, its suitability as a drug target, and its potential for protein modifications. Several graph-theoretic approaches predict unidentified functions of proteins by using the functional annotations of better-characterized proteins in protein-protein interaction networks. We systematically consider the use of literature co-occurrence data, introduce a new method for quantifying the reliability of co-occurrence and test how performance differs across species. We also quantify changes in performance as the prediction algorithms annotate with increased specificity. Results: We find that including information on the co-occurrence of proteins within an abstract greatly boosts performance in the Functional Flow graph-theoretic function prediction algorithm in yeast, fly and worm. This increase in performance is not simply due to the presence of additional edges, since supplementing protein-protein interactions with co-occurrence data outperforms supplementing with a comparably-sized genetic interaction dataset. Through the combination of protein-protein interactions and co-occurrence data, the neighborhood around unknown proteins is quickly connected to well-characterized nodes which global prediction algorithms can exploit. Our method for quantifying co-occurrence reliability shows superior performance to the other methods, particularly at threshold values around 10% which yield the best trade-off between coverage and accuracy. In contrast, the traditional way of asserting co-occurrence when at least one abstract mentions both proteins proves to be the worst method for generating co-occurrence data, introducing too many false positives. Annotating the functions with greater specificity is harder

  8. Explained variation and predictive accuracy in general parametric statistical models: the role of model misspecification

    Rosthøj, Susanne; Keiding, Niels

    2004-01-01

    When studying a regression model, measures of explained variation are used to assess the degree to which the covariates determine the outcome of interest. Measures of predictive accuracy are used to assess the accuracy of the predictions based on the covariates and the regression model. We give a detailed and general introduction to the two measures and the estimation procedures. The framework we set up allows for a study of the effect of misspecification on the quantities estimated. We also introduce a generalization to survival analysis.

  9. An analysis of a large dataset on immigrant integration in Spain. The Statistical Mechanics perspective on Social Action

    Barra, Adriano; Contucci, Pierluigi; Sandell, Rickard; Vernia, Cecilia

    2014-02-01

    How does immigrant integration in a country change with immigration density? Guided by a statistical mechanics perspective, we propose a novel approach to this problem. The analysis focuses on classical integration quantifiers such as the percentage of jobs (temporary and permanent) given to immigrants, mixed marriages, and newborns with parents of mixed origin. We find that the average values of different quantifiers may exhibit either linear or non-linear growth with immigrant density, and we suggest that social action, a concept identified by Max Weber, causes the observed non-linearity. Using the statistical mechanics notion of interaction to quantitatively emulate social action, a unified mathematical model for integration is proposed and shown to explain both growth behaviors observed. The linear theory, by contrast, ignoring the possibility of interaction effects, would underestimate the quantifiers by up to 30% when immigrant densities are low, and overestimate them by as much when densities are high. The capacity to quantitatively isolate different types of integration mechanisms makes our framework a suitable tool in the quest for more efficient integration policies.
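
The distinction between linear growth and interaction-driven non-linear growth can be caricatured with a toy mean-field model, in which a quantifier solves a self-consistency equation instead of responding linearly to density. The functional form and constants below are invented for illustration and are not the paper's model:

```python
# Toy mean-field sketch: with interaction strength J > 0, the quantifier
# m solves m = (1 + tanh(J*m + h)) / 2 with external field h = log(density);
# with J = 0 the response depends on density alone (no social feedback).

import math

def quantifier(density, J, n_iter=200):
    """Fixed-point iteration for the self-consistency equation."""
    h = math.log(density)
    m = 0.5
    for _ in range(n_iter):
        m = 0.5 * (1.0 + math.tanh(J * m + h))
    return m

densities = [0.01, 0.05, 0.1, 0.2]
no_interaction = [quantifier(d, J=0.0) for d in densities]
with_interaction = [quantifier(d, J=2.0) for d in densities]
print([round(v, 3) for v in no_interaction])
print([round(v, 3) for v in with_interaction])
```

Because the interaction feeds the quantifier back into itself, the two curves diverge as density grows, which is the qualitative signature the paper attributes to social action.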

  10. Prediction of the Electromagnetic Field Distribution in a Typical Aircraft Using the Statistical Energy Analysis

    Kovalevsky, Louis; Langley, Robin S.; Caro, Stephane

    2016-05-01

    Due to the high cost of experimental EMI measurements, significant attention has been focused on numerical simulation. Classical methods such as the Method of Moments or Finite-Difference Time-Domain are not well suited for this type of problem, as they require a fine discretisation of space and fail to take uncertainties into account. In this paper, the authors show that Statistical Energy Analysis is well suited for this type of application. SEA is a statistical approach employed to solve high-frequency problems of electromagnetically reverberant cavities at a reduced computational cost. The key aspects of this approach are (i) to consider an ensemble of systems that share the same gross parameters, and (ii) to avoid solving Maxwell's equations inside the cavity, using the power balance principle. The output is an estimate of the field magnitude distribution in each cavity. The method is applied to a typical aircraft structure.

  11. A statistical framework for evaluating neural networks to predict recurrent events in breast cancer

    Gorunescu, Florin; Gorunescu, Marina; El-Darzi, Elia; Gorunescu, Smaranda

    2010-07-01

    Breast cancer is the second leading cause of cancer deaths in women today. Sometimes, breast cancer can return after primary treatment. A medical diagnosis of recurrent cancer is often a more challenging task than the initial one. In this paper, we investigate the potential contribution of neural networks (NNs) to support health professionals in diagnosing such events. The NN algorithms are tested and applied to two different datasets. An extensive statistical analysis has been performed to verify our experiments. The results show that a simple network structure for both the multi-layer perceptron and the radial basis function can produce equally good results, that not all attributes are needed to train these algorithms, and that the classification performances of all algorithms are statistically robust. Moreover, we have shown that the best-performing algorithm strongly depends on the features of the dataset, and hence, there is not necessarily a single best classifier.

  12. Integration of Fast Predictive Model and SLM Process Development Chamber, Phase I

    National Aeronautics and Space Administration — This STTR project seeks to develop a fast predictive model for selective laser melting (SLM) processes and then integrate that model with an SLM chamber that allows...

  13. Integrating prediction, provenance, and optimization into high energy workflows

    Schram, M.; Bansal, V.; Friese, R. D.; Tallent, N. R.; Yin, J.; Barker, K. J.; Stephan, E.; Halappanavar, M.; Kerbyson, D. J.

    2017-10-01

    We propose a novel approach for efficient execution of workflows on distributed resources. The key components of this framework include: performance modeling to quantitatively predict workflow component behavior; optimization-based scheduling such as choosing an optimal subset of resources to meet demand and assignment of tasks to resources; distributed I/O optimizations such as prefetching; and provenance methods for collecting performance data. In preliminary results, these techniques improve throughput on a small Belle II workflow by 20%.

  14. The value of model averaging and dynamical climate model predictions for improving statistical seasonal streamflow forecasts over Australia

    Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.

    2013-10-01

    Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provide little additional benefit.
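
The first strategy above, combining candidate models through Bayesian model averaging, reduces to weighting each model's forecast by its normalized (pseudo-)evidence. A sketch with invented forecasts and evidences, not the study's models or data:

```python
# Bayesian model averaging: each candidate model (e.g. using a different
# lagged climate index) contributes its forecast weighted by
# exp(log evidence), normalized across models.

import math

def bma_combine(forecasts, log_evidences):
    m = max(log_evidences)
    w = [math.exp(le - m) for le in log_evidences]   # avoid underflow
    total = sum(w)
    weights = [x / total for x in w]
    combined = sum(wi * f for wi, f in zip(weights, forecasts))
    return combined, weights

# Three hypothetical candidate models, each giving a seasonal
# streamflow forecast (GL) and a log model evidence.
forecasts = [120.0, 150.0, 135.0]
log_evidences = [-10.2, -11.0, -10.5]
combined, weights = bma_combine(forecasts, log_evidences)
print(round(combined, 1), [round(w, 2) for w in weights])
```

The combined forecast always lies within the range of the individual forecasts, which is one reason averaging tends to moderate the worst errors of any single model.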

  15. Predicting survey responses: how and why semantics shape survey statistics on organizational behaviour.

    Jan Ketil Arnulf

    Some disciplines in the social sciences rely heavily on collecting survey responses to detect empirical relationships among variables. We explored whether these relationships were a priori predictable from the semantic properties of the survey items, using language processing algorithms which are now available as new research methods. Language processing algorithms were used to calculate the semantic similarity among all items in state-of-the-art surveys from Organisational Behaviour research. These surveys covered areas such as transformational leadership, work motivation and work outcomes. This information was used to explain and predict the response patterns from real subjects. Semantic algorithms explained 60-86% of the variance in the response patterns and allowed remarkably precise prediction of survey responses from humans, except in a personality test. Even the relationships between independent and their purported dependent variables were accurately predicted. This raises concern about the empirical nature of data collected through some surveys if results are already given a priori through the way subjects are being asked. Survey response patterns seem heavily determined by semantics. Language algorithms may suggest these prior to administering a survey. This study suggests that semantic algorithms are becoming new tools for the social sciences, opening perspectives on survey responses that prevalent psychometric theory cannot explain.
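
The core mechanism, scoring semantic similarity between survey items, can be caricatured with bag-of-words cosine similarity; the study itself used much richer language models. The items below are invented examples:

```python
# Cosine similarity between crude bag-of-words vectors of survey items:
# semantically similar items are expected to draw correlated responses.

import math
from collections import Counter

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

item1 = "my leader inspires me to do my best work"
item2 = "my leader motivates me to do excellent work"
item3 = "i am satisfied with my salary"

print(round(cosine(item1, item2), 2))  # semantically close pair
print(round(cosine(item1, item3), 2))  # unrelated pair
```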

  16. On the functional integral approach in quantum statistics. 1. Some approximations

    Dai Xianxi.

    1990-08-01

    In this paper the susceptibility of a Kondo system in a fairly wide temperature region is calculated in the first harmonic approximation in a functional integral approach. The comparison with that of the renormalization group theory shows that in this region the two results agree quite well. The expansion of the partition function with infinite independent harmonics for the Anderson model is studied. Some symmetry relations are generalized. It is a challenging problem to develop a functional integral approach including diagram analysis, mixed mode effects and some exact relations in the Anderson system proved in the functional integral approach. These topics will be discussed in the next paper. (author). 22 refs, 1 fig

  17. Path integrals in quantum mechanics, statistics, polymer physics, and financial markets

    Kleinert, Hagen

    2009-01-01

    This is the fifth, expanded edition of the comprehensive textbook published in 1990 on the theory and applications of path integrals. It is the first book to explicitly solve path integrals of a wide variety of nontrivial quantum-mechanical systems, in particular the hydrogen atom. The solutions have been made possible by two major advances. The first is a new euclidean path integral formula which increases the restricted range of applicability of Feynman's time-sliced formula to include singular attractive 1/r- and 1/r2-potentials. The second is a new nonholonomic mapping principle carrying p

  18. Post-fire debris flow prediction in Western United States: Advancements based on a nonparametric statistical technique

    Nikolopoulos, E. I.; Destro, E.; Bhuiyan, M. A. E.; Borga, M., Sr.; Anagnostou, E. N.

    2017-12-01

    Fire disasters affect modern societies at the global scale, inducing significant economic losses and human casualties. In addition to their direct impacts, they have various adverse effects on the hydrologic and geomorphologic processes of a region due to the tremendous alteration of landscape characteristics (vegetation, soil properties, etc.). As a consequence, wildfires often initiate a cascade of hazards such as flash floods and debris flows that usually follow the occurrence of a wildfire, thus magnifying the overall impact on a region. Post-fire debris flows (PFDF) are one such hazard, frequently occurring in the Western United States where wildfires are a common natural disaster. Prediction of PFDF is therefore of high importance in this region, and over recent years a number of efforts by the United States Geological Survey (USGS) and the National Weather Service (NWS) have focused on the development of early warning systems to help mitigate PFDF risk. This work proposes a prediction framework based on a nonparametric statistical technique (random forests) that allows predicting the occurrence of PFDF at regional scale with a higher degree of accuracy than the commonly used approaches based on power-law thresholds and logistic regression procedures. The work presented is based on a recently released USGS database that reports a total of 1500 storms that did or did not trigger PFDF in a number of fire-affected catchments in the Western United States. The database includes information on storm characteristics (duration, accumulation, max intensity, etc.) and other auxiliary information on land surface properties (soil erodibility index, local slope, etc.). Results show that the proposed model achieves a satisfactory prediction accuracy (threat score > 0.6), superior to previously published prediction frameworks, highlighting the potential of nonparametric statistical techniques for the development of PFDF prediction systems.
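
    The threat score the abstract reports is straightforward to reproduce. Below is a toy sketch, not the USGS model: it evaluates a simple power-law intensity-duration threshold (the kind of baseline the random-forest framework is compared against) with the threat score. The threshold coefficients and storm records are invented for illustration.

```python
# Toy illustration (not the USGS model): score a power-law
# rainfall intensity-duration threshold for post-fire debris-flow
# (PFDF) triggering with the threat score. All data are synthetic.

def predict_pfdf(duration_h, max_intensity_mm_h, a=15.0, b=-0.4):
    """Power-law ID threshold: predict triggering if I > a * D**b."""
    return max_intensity_mm_h > a * duration_h ** b

def threat_score(pred, obs):
    # hits / (hits + misses + false alarms), a.k.a. critical success index
    hits = sum(p and o for p, o in zip(pred, obs))
    misses = sum((not p) and o for p, o in zip(pred, obs))
    false_alarms = sum(p and (not o) for p, o in zip(pred, obs))
    return hits / (hits + misses + false_alarms)

# (duration [h], peak intensity [mm/h], observed debris flow?)
storms = [(0.5, 40, True), (1.0, 30, True), (2.0, 8, False),
          (0.25, 60, True), (3.0, 5, False), (1.5, 20, True),
          (2.5, 10, True), (0.5, 25, False)]

pred = [predict_pfdf(d, i) for d, i, _ in storms]
obs = [o for _, _, o in storms]
print(f"threat score = {threat_score(pred, obs):.2f}")
```

    On this synthetic set the threshold produces four hits, one miss and one false alarm, giving a threat score of about 0.67; the abstract's point is that a random forest trained on the same storm and land-surface attributes can push this score higher.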

  19. Improving Permafrost Hydrology Prediction Through Data-Model Integration

    Wilson, C. J.; Andresen, C. G.; Atchley, A. L.; Bolton, W. R.; Busey, R.; Coon, E.; Charsley-Groffman, L.

    2017-12-01

    The CMIP5 Earth System Models were unable to adequately predict the fate of the 16 GT of permafrost carbon in a warming climate due to poor representation of Arctic ecosystem processes. The DOE Office of Science Next Generation Ecosystem Experiment (NGEE-Arctic) project aims to reduce uncertainty in the Arctic carbon cycle and its impact on the Earth's climate system by improved representation of the coupled physical, chemical and biological processes that drive how much buried carbon will be converted to CO2 and CH4, how fast this will happen, which form will dominate, and the degree to which increased plant productivity will offset increased soil carbon emissions. These processes fundamentally depend on permafrost thaw rate and its influence on surface and subsurface hydrology through thermal erosion, land subsidence and changes to groundwater flow pathways as soil, bedrock and alluvial pore ice and massive ground ice melts. LANL and its NGEE colleagues are co-developing data and models to better understand controls on permafrost degradation and improve prediction of the evolution of permafrost and its impact on Arctic hydrology. The LANL Advanced Terrestrial Simulator (ATS) was built using a state-of-the-art HPC software framework to enable the first fully coupled 3-dimensional surface-subsurface thermal-hydrology and land surface deformation simulations to simulate the evolution of the physical Arctic environment. Here we show how field data, including hydrology, snow, vegetation, geochemistry and soil properties, are informing the development and application of the ATS to improve understanding of controls on permafrost stability and permafrost hydrology. The ATS is being used to inform parameterizations of complex coupled physical, ecological and biogeochemical processes for implementation in the DOE ACME land model, to better predict the role of changing Arctic hydrology in the global climate system. LA-UR-17-26566.

  20. Predictions of integrated circuit serviceability in space radiation fields

    Khamidullina, N.M.; Kuznetsov, N.V.; Pichkhadze, K.M.; Popov, V.D.

    1999-10-01

    The present paper suggests an approach to estimating and predicting the serviceability of on-board electronic equipment. It is based on the postulates of reliability theory and accounts for total-dose and single-event radiation effects as well as other external destabilizing factors. Methods for determining failure and upset rates for CMOS devices are considered. The probability of failure-free operation of two CMOS RAMs is calculated along the whole trajectory of the 'Solar Probe' spacecraft.

  1. Dynamics and spike trains statistics in conductance-based integrate-and-fire neural networks with chemical and electric synapses

    Cofré, Rodrigo; Cessac, Bruno

    2013-01-01

    We investigate the effect of electric synapses (gap junctions) on collective neuronal dynamics and spike statistics in a conductance-based integrate-and-fire neural network, driven by Brownian noise, where conductances depend upon spike history. We compute explicitly the time evolution operator and show that, given the spike history of the network and the membrane potentials at a given time, the further dynamical evolution can be written in closed form. We show that spike train statistics are described by a Gibbs distribution whose potential can be approximated with an explicit formula when the noise is weak. This potential form encompasses existing models for spike train statistics analysis such as maximum entropy models or generalized linear models (GLM). We also discuss the different types of correlations: those induced by a shared stimulus and those induced by interactions between neurons.
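
    A heavily simplified sketch of the class of model described, not the paper's network: two leaky integrate-and-fire neurons coupled by one conductance-based chemical synapse and one gap junction, driven by discretized Brownian noise. All parameters are invented, and the paper's full spike-history dependence is reduced to a single decaying synaptic conductance.

```python
import math, random

# Two integrate-and-fire neurons: gap junction (symmetric electric
# coupling) plus a chemical synapse from neuron 0 onto neuron 1,
# with Brownian (Gaussian-increment) noise. Illustrative only.
random.seed(0)
dt, T = 0.1, 2000                 # step [ms], number of steps (200 ms)
v = [0.0, 0.0]                    # membrane potentials (dimensionless)
g_syn = 0.0                       # chemical synaptic conductance on neuron 1
spikes = [[], []]

tau, v_th, v_reset = 20.0, 1.0, 0.0
g_gap = 0.05                      # gap-junction conductance
tau_syn, w_syn, e_syn = 5.0, 0.3, 2.0
sigma, drive = 0.15, 0.06         # noise amplitude, constant input

for t in range(T):
    noise = [sigma * math.sqrt(dt) * random.gauss(0, 1) for _ in range(2)]
    gap = g_gap * (v[1] - v[0])   # electric-synapse current into neuron 0
    dv0 = (-v[0] / tau + drive + gap) * dt + noise[0]
    dv1 = (-v[1] / tau + drive - gap + g_syn * (e_syn - v[1])) * dt + noise[1]
    g_syn += -g_syn / tau_syn * dt           # conductance decays...
    v[0] += dv0
    v[1] += dv1
    for i in range(2):
        if v[i] >= v_th:
            spikes[i].append(t * dt)
            v[i] = v_reset
            if i == 0:
                g_syn += w_syn               # ...and jumps on presynaptic spikes

rates = [len(s) / (T * dt / 1000.0) for s in spikes]   # Hz
print("firing rates (Hz):", [round(r, 1) for r in rates])
```

    From runs like this one can accumulate spike trains and fit, e.g., maximum-entropy or GLM approximations to their statistics, which is the comparison the paper formalizes via the Gibbs-distribution potential.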

  2. Quantification of integrated HIV DNA by repetitive-sampling Alu-HIV PCR on the basis of Poisson statistics.

    De Spiegelaere, Ward; Malatinkova, Eva; Lynch, Lindsay; Van Nieuwerburgh, Filip; Messiaen, Peter; O'Doherty, Una; Vandekerckhove, Linos

    2014-06-01

    Quantification of integrated proviral HIV DNA by repetitive-sampling Alu-HIV PCR is a candidate virological tool to monitor the HIV reservoir in patients. However, the experimental procedures and data analysis of the assay are complex and hinder its widespread use. Here, we provide an improved and simplified data analysis method by adopting binomial and Poisson statistics. A modified analysis method on the basis of Poisson statistics was used to analyze the binomial data of positive and negative reactions from a 42-replicate Alu-HIV PCR by use of dilutions of an integration standard and on samples of 57 HIV-infected patients. Results were compared with the quantitative output of the previously described Alu-HIV PCR method. Poisson-based quantification of the Alu-HIV PCR was linearly correlated with the standard dilution series, indicating that absolute quantification with the Poisson method is a valid alternative for data analysis of repetitive-sampling Alu-HIV PCR data. Quantitative outputs of patient samples assessed by the Poisson method correlated with the previously described Alu-HIV PCR analysis, indicating that this method is a valid alternative for quantifying integrated HIV DNA. Poisson-based analysis of the Alu-HIV PCR data enables absolute quantification without the need of a standard dilution curve. Implementation of confidence interval (CI) estimation permits improved qualitative analysis of the data and provides a statistical basis for the required minimal number of technical replicates. © 2014 The American Association for Clinical Chemistry.
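
    The core of the Poisson-based readout is the classic single-hit relation: if a fraction of n replicate reactions is negative, the mean number of detectable copies per reaction is -ln(negative fraction). The sketch below is our own simplified reconstruction, not the authors' code, and the Wald-interval CI on the negative fraction is our own illustrative choice.

```python
import math

# Single-hit Poisson quantification from binomial replicate data:
# with n replicate Alu-HIV PCRs of which k score negative,
# lambda = -ln(k/n) copies per reaction. Simplified sketch.

def poisson_estimate(n_replicates, n_negative):
    return -math.log(n_negative / n_replicates)

def poisson_ci(n_replicates, n_negative, z=1.96):
    # Wald CI on the negative fraction, propagated through -ln(p).
    p = n_negative / n_replicates
    se = math.sqrt(p * (1 - p) / n_replicates)
    p_hi = min(p + z * se, 1.0)       # higher negative fraction -> fewer copies
    p_lo = max(p - z * se, 1e-12)
    return -math.log(p_hi), -math.log(p_lo)

# Example: 42 replicates (as in the assay), 18 of them negative
lam = poisson_estimate(42, 18)
lo, hi = poisson_ci(42, 18)
print(f"copies/reaction ≈ {lam:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

    The width of this interval as a function of n is exactly what motivates the abstract's remark that CI estimation gives a statistical basis for the minimal number of technical replicates.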

  3. Development of an integrated method for long-term water quality prediction using seasonal climate forecast

    J. Cho

    2016-10-01

    Full Text Available The APEC Climate Center (APCC) produces climate prediction information utilizing a multi-climate model ensemble (MME) technique. In this study, four different downscaling methods, in accordance with the degree of utilizing the seasonal climate prediction information, were developed in order to improve predictability and to refine the spatial scale. These methods include: (1) the Simple Bias Correction (SBC) method, which directly uses APCC's dynamic prediction data with a 3 to 6 month lead time; (2) the Moving Window Regression (MWR) method, which indirectly utilizes dynamic prediction data; (3) the Climate Index Regression (CIR) method, which predominantly uses observation-based climate indices; and (4) the Integrated Time Regression (ITR) method, which uses predictors selected from both CIR and MWR. Then, a sampling-based temporal downscaling was conducted using the Mahalanobis distance method in order to create daily weather inputs to the Soil and Water Assessment Tool (SWAT) model. Long-term predictability of water quality within the Wecheon watershed of the Nakdong River Basin was evaluated. According to the Korean Ministry of Environment's Provisions of Water Quality Prediction and Response Measures, modeling-based predictability was evaluated by using 3-month lead prediction data issued in February, May, August, and November as model input to SWAT. Finally, an integrated approach, which takes into account various climate information and downscaling methods for water quality prediction, was presented. This integrated approach can be used to prevent potential problems caused by extreme climate in advance.
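
    The first of the four methods is the easiest to make concrete. The sketch below is our own minimal illustration of the simple-bias-correction idea, not APCC code: shift the raw forecast by the mean bias between model hindcasts and observations for the same season. All numbers are invented.

```python
# Simple bias correction (SBC) sketch: remove the mean hindcast
# bias from a raw seasonal forecast. Synthetic data, degrees C.

def simple_bias_correction(hindcasts, observations, forecast):
    bias = sum(h - o for h, o in zip(hindcasts, observations)) / len(hindcasts)
    return [f - bias for f in forecast]

hindcasts = [21.0, 22.5, 20.8, 23.1]      # model output, past seasons
observations = [19.5, 21.0, 19.6, 21.3]   # observed, same seasons
forecast = [22.0, 22.4, 21.7]             # raw 3-month forecast

print(simple_bias_correction(hindcasts, observations, forecast))
```

    Here the model runs 1.5 degrees warm on average, so each forecast value is shifted down by 1.5. The regression-based methods (MWR, CIR, ITR) replace this constant shift with fitted predictor relationships.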

  4. Statistical prediction of far-field wind-turbine noise, with probabilistic characterization of atmospheric stability

    Kelly, Mark C.; Barlas, Emre; Sogachev, Andrey

    2018-01-01

    Here we provide statistical low-order characterization of noise propagation from a single wind turbine, as affected by mutually interacting turbine wake and environmental conditions. This is accomplished via a probabilistic model, applied to an ensemble of atmospheric conditions based upon......; the latter solves Reynolds-Averaged Navier-Stokes equations of momentum and temperature, including the effects of stability and the ABL depth, along with the drag due to the wind turbine. Sound levels are found to be highest downwind for modestly stable conditions not atypical of mid-latitude climates...

  5. Statistics-Based Prediction Analysis for Head and Neck Cancer Tumor Deformation

    Maryam Azimi

    2012-01-01

    Full Text Available Most of the current radiation therapy planning systems, which are based on pre-treatment Computed Tomography (CT) images, assume that the tumor geometry does not change during the course of treatment. However, tumor geometry is shown to change over time. We propose a methodology to monitor and predict daily size changes of head and neck cancer tumors during the entire radiation therapy period. Using collected patients' CT scan data, MATLAB routines are developed to quantify the progressive geometric changes occurring in patients during radiation therapy. Regression analysis is implemented to develop predictive models for tumor size changes through the entire treatment period. The generated models are validated using leave-one-out cross validation. The proposed method will increase the accuracy of therapy and improve patient safety and quality of life by reducing the number of harmful unnecessary CT scans.
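
    The workflow of the abstract, regression on daily tumor size validated by leave-one-out cross validation, can be sketched in a few lines. This is a stand-in with a plain least-squares line and synthetic day/volume data, not the authors' MATLAB routines.

```python
# Fit tumor volume vs. treatment day by least squares and score
# the model with leave-one-out cross validation (LOOCV). Synthetic data.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope          # intercept, slope

def loocv_rmse(xs, ys):
    errs = []
    for i in range(len(xs)):               # hold out one observation at a time
        tx, ty = xs[:i] + xs[i+1:], ys[:i] + ys[i+1:]
        a, b = fit_line(tx, ty)
        errs.append((a + b * xs[i] - ys[i]) ** 2)
    return (sum(errs) / len(errs)) ** 0.5

days = [0, 5, 10, 15, 20, 25, 30]
volume_cc = [48.0, 45.1, 41.9, 39.2, 36.0, 33.2, 30.1]   # shrinking tumor

a, b = fit_line(days, volume_cc)
print(f"daily change ≈ {b:.2f} cc/day, LOOCV RMSE = {loocv_rmse(days, volume_cc):.2f} cc")
```

    A low LOOCV error relative to the day-to-day change is what justifies replacing some daily CT scans with model predictions.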

  6. Integrated Simulation for HVAC Performance Prediction: State-of-the-Art Illustration

    Hensen, J.L.M.; Clarke, J.A.

    2000-01-01

    This paper aims to outline the current state-of-the-art in integrated building simulation for performance prediction of heating, ventilating and air-conditioning (HVAC) systems. The ESP-r system is used as an example where integrated simulation is a core philosophy behind the development. The

  7. Statistical downscaling of historical monthly mean winds over a coastal region of complex terrain. II. Predicting wind components

    Kamp, Derek van der [University of Victoria, Pacific Climate Impacts Consortium, Victoria, BC (Canada); University of Victoria, School of Earth and Ocean Sciences, Victoria, BC (Canada); Curry, Charles L. [Environment Canada University of Victoria, Canadian Centre for Climate Modelling and Analysis, Victoria, BC (Canada); University of Victoria, School of Earth and Ocean Sciences, Victoria, BC (Canada); Monahan, Adam H. [University of Victoria, School of Earth and Ocean Sciences, Victoria, BC (Canada)

    2012-04-15

    A regression-based downscaling technique was applied to monthly mean surface wind observations from stations throughout western Canada as well as from buoys in the Northeast Pacific Ocean over the period 1979-2006. A predictor set was developed from principal component analysis of the three wind components at 500 hPa and mean sea-level pressure taken from the NCEP Reanalysis II. Building on the results of a companion paper, Curry et al. (Clim Dyn 2011), the downscaling was applied to both wind speed and wind components, in an effort to evaluate the utility of each type of predictand. Cross-validated prediction skill varied strongly with season, with autumn and summer displaying the highest and lowest skill, respectively. In most cases wind components were predicted with better skill than wind speeds. The predictive ability of wind components was found to be strongly related to their orientation. Wind components with the best predictions were often oriented along topographically significant features such as constricted valleys, mountain ranges or ocean channels. This influence of directionality on predictive ability is most prominent during autumn and winter at inland sites with complex topography. Stations in regions with relatively flat terrain (where topographic steering is minimal) exhibit inter-station consistencies including region-wide seasonal shifts in the direction of the best predicted wind component. The conclusion that wind components can be skillfully predicted only over a limited range of directions at most stations limits the scope of statistically downscaled wind speed predictions. It seems likely that such limitations apply to other regions of complex terrain as well. (orig.)

  8. A statistical intercomparison of temperature and precipitation predicted by four general circulation models with historical data

    Grotch, S.L.

    1991-01-01

    This study is a detailed intercomparison of the results produced by four general circulation models (GCMs) that have been used to estimate the climatic consequences of a doubling of the CO2 concentration. Two variables, surface air temperature and precipitation, annually and seasonally averaged, are compared for both the current climate and for the predicted equilibrium changes after a doubling of the atmospheric CO2 concentration. The major question considered here is: how well do the predictions from different GCMs agree with each other and with historical climatology over different areal extents, from the global scale down to the range of only several gridpoints? Although the models often agree well when estimating averages over large areas, substantial disagreements become apparent as the spatial scale is reduced. At scales below continental, the correlations observed between different model predictions are often very poor. The implications of this work for investigation of climatic impacts on a regional scale are profound. For these two important variables, at least, the poor agreement between model simulations of the current climate on the regional scale calls into question the ability of these models to quantitatively estimate future climatic change on anything approaching the scale of a few (< 10) gridpoints, which is essential if these results are to be used in meaningful resource-assessment studies. A stronger cooperative effort among the different modeling groups will be necessary to assure that we are getting better agreement for the right reasons, a prerequisite for improving confidence in model projections. 11 refs.; 10 figs

  9. A statistical intercomparison of temperature and precipitation predicted by four general circulation models with historical data

    Grotch, S.L.

    1990-01-01

    This study is a detailed intercomparison of the results produced by four general circulation models (GCMs) that have been used to estimate the climatic consequences of a doubling of the CO2 concentration. Two variables, surface air temperature and precipitation, annually and seasonally averaged, are compared for both the current climate and for the predicted equilibrium changes after a doubling of the atmospheric CO2 concentration. The major question considered here is: how well do the predictions from different GCMs agree with each other and with historical climatology over different areal extents, from the global scale down to the range of only several gridpoints? Although the models often agree well when estimating averages over large areas, substantial disagreements become apparent as the spatial scale is reduced. At scales below continental, the correlations observed between different model predictions are often very poor. The implications of this work for investigation of climatic impacts on a regional scale are profound. For these two important variables, at least, the poor agreement between model simulations of the current climate on the regional scale calls into question the ability of these models to quantitatively estimate future climatic change on anything approaching the scale of a few (< 10) gridpoints, which is essential if these results are to be used in meaningful resource-assessment studies. A stronger cooperative effort among the different modeling groups will be necessary to assure that we are getting better agreement for the right reasons, a prerequisite for improving confidence in model projections

  10. A Systems Approach to Integrative Biology: An Overview of Statistical Methods to Elucidate Association and Architecture

    Ciaccio, Mark F.; Finkle, Justin D.; Xue, Albert Y.; Bagheri, Neda

    2014-01-01

    An organism’s ability to maintain a desired physiological response relies extensively on how cellular and molecular signaling networks interpret and react to environmental cues. The capacity to quantitatively predict how networks respond to a changing environment by modifying signaling regulation and phenotypic responses will help inform and predict the impact of a changing global enivronment on organisms and ecosystems. Many computational strategies have been developed to resolve cue–signal–...

  11. Long-Term Evolution of Email Networks: Statistical Regularities, Predictability and Stability of Social Behaviors.

    Godoy-Lorite, Antonia; Guimerà, Roger; Sales-Pardo, Marta

    2016-01-01

    In social networks, individuals constantly drop ties and replace them by new ones in a highly unpredictable fashion. This highly dynamical nature of social ties has important implications for processes such as the spread of information or of epidemics. Several studies have demonstrated the influence of a number of factors on the intricate microscopic process of tie replacement, but the macroscopic long-term effects of such changes remain largely unexplored. Here we investigate whether, despite the inherent randomness at the microscopic level, there are macroscopic statistical regularities in the long-term evolution of social networks. In particular, we analyze the email network of a large organization with over 1,000 individuals throughout four consecutive years. We find that, although the evolution of individual ties is highly unpredictable, the macro-evolution of social communication networks follows well-defined statistical patterns, characterized by exponentially decaying log-variations of the weight of social ties and of individuals' social strength. At the same time, we find that individuals have social signatures and communication strategies that are remarkably stable over the scale of several years.

  12. A new statistical scission-point model fed with microscopic ingredients to predict fission fragments distributions

    Heinrich, S.

    2006-01-01

    The nuclear fission process is a very complex phenomenon and, even nowadays, no realistic model describing the overall process is available. The work presented here deals with a theoretical description of fission fragment distributions in mass, charge, energy and deformation. We have reconsidered and updated the B.D. Wilkins scission-point model. Our purpose was to test whether this statistical model, applied at the scission point and fed with the results of modern microscopic calculations, allows a quantitative description of the fission fragment distributions. We calculate the surface energy available at the scission point as a function of the fragment deformations. This surface is obtained from a Hartree-Fock-Bogoliubov microscopic calculation, which guarantees a realistic description of the dependence of the potential on the deformation of each fragment. The statistical balance is described by the level densities of the fragments. We have tried to avoid as much as possible the input of empirical parameters in the model. Our only parameter, the distance between the fragments at the scission point, is discussed by comparison with scission configurations obtained from fully dynamical microscopic calculations. The comparison between our results and experimental data is very satisfying and allows us to discuss the successes and limitations of our approach. We finally propose ideas to improve the model, in particular by applying dynamical corrections. (author)

  13. Statistical analysis of solid lipid nanoparticles produced by high-pressure homogenization: a practical prediction approach

    Durán-Lobato, Matilde, E-mail: mduran@us.es [Universidad de Sevilla, Dpto. Farmacia y Tecnología Farmacéutica, Facultad de Farmacia (Spain)]; Enguix-González, Alicia [Universidad de Sevilla, Dpto. Estadística e Investigación Operativa, Facultad de Matemáticas (Spain)]; Fernández-Arévalo, Mercedes; Martín-Banderas, Lucía [Universidad de Sevilla, Dpto. Farmacia y Tecnología Farmacéutica, Facultad de Farmacia (Spain)]

    2013-02-15

    Lipid nanoparticles (LNPs) are a promising carrier for all administration routes due to their safety, small size, and high loading of lipophilic compounds. Among the LNP production techniques, the easy scale-up, lack of organic solvents, and short production times of the high-pressure homogenization technique (HPH) make this method stand out. In this study, a statistical analysis was applied to the production of LNP by HPH. Spherical LNPs with mean size ranging from 65 nm to 11.623 μm, negative zeta potential under -30 mV, and smooth surface were produced. Manageable equations based on commonly used parameters in the pharmaceutical field were obtained. The lipid to emulsifier ratio (R_L/S) was proved to statistically explain the influence of oil phase and surfactant concentration on final nanoparticle size. Besides, the homogenization pressure was found to ultimately determine LNP size for a given R_L/S, while the number of passes applied mainly determined polydispersion. α-Tocopherol was used as a model drug to illustrate release properties of LNP as a function of particle size, which was optimized by the regression models. This study is intended as a first step to optimize production conditions prior to LNP production at both laboratory and industrial scale from an eminently practical approach, based on parameters extensively used in formulation.

  14. Geographic prediction of tuberculosis clusters in Fukuoka, Japan, using the space-time scan statistic

    Daisuke Onozuka; Akihito Hagihara [Fukuoka Institute of Health and Environmental Sciences, Fukuoka (Japan). Department of Information Science

    2007-07-01

    Tuberculosis (TB) has reemerged as a global public health epidemic in recent years. Although evaluating local disease clusters leads to effective prevention and control of TB, there are few, if any, spatiotemporal comparisons for epidemic diseases. TB cases among residents in Fukuoka Prefecture between 1999 and 2004 (n = 9,119) were geocoded at the census tract level (n = 109) based on residence at the time of diagnosis. The spatial and space-time scan statistics were then used to identify clusters of census tracts with elevated proportions of TB cases. In the purely spatial analyses, the most likely clusters were in the Chikuho coal mining area (in 1999, 2002, 2003, 2004), the Kita-Kyushu industrial area (in 2000), and the Fukuoka urban area (in 2001). In the space-time analysis, the most likely cluster was the Kita-Kyushu industrial area (in 2000). The north part of Fukuoka Prefecture was the most likely to have a cluster with a significantly high occurrence of TB. The spatial and space-time scan statistics are effective ways of describing circular disease clusters. Since, in reality, infectious diseases might form other cluster types, the effectiveness of the method may be limited under actual practice. The sophistication of the analytical methodology, however, is a topic for future study. 48 refs., 3 figs., 3 tabs.

  15. Geographic prediction of tuberculosis clusters in Fukuoka, Japan, using the space-time scan statistic

    Onozuka Daisuke

    2007-04-01

    Full Text Available Abstract Background Tuberculosis (TB) has reemerged as a global public health epidemic in recent years. Although evaluating local disease clusters leads to effective prevention and control of TB, there are few, if any, spatiotemporal comparisons for epidemic diseases. Methods TB cases among residents in Fukuoka Prefecture between 1999 and 2004 (n = 9,119) were geocoded at the census tract level (n = 109) based on residence at the time of diagnosis. The spatial and space-time scan statistics were then used to identify clusters of census tracts with elevated proportions of TB cases. Results In the purely spatial analyses, the most likely clusters were in the Chikuho coal mining area (in 1999, 2002, 2003, and 2004), the Kita-Kyushu industrial area (in 2000), and the Fukuoka urban area (in 2001). In the space-time analysis, the most likely cluster was the Kita-Kyushu industrial area (in 2000). The north part of Fukuoka Prefecture was the most likely to have a cluster with a significantly high occurrence of TB. Conclusion The spatial and space-time scan statistics are effective ways of describing circular disease clusters. Since, in reality, infectious diseases might form other cluster types, the effectiveness of the method may be limited under actual practice. The sophistication of the analytical methodology, however, is a topic for future study.
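
    The purely spatial scan can be sketched compactly. Below is a toy version in the spirit of Kulldorff's Poisson scan statistic, not SaTScan: circular windows are grown around each tract centroid and scored with the Poisson log-likelihood ratio, keeping the highest-scoring window. Coordinates and counts are synthetic, and significance testing (Monte Carlo replication) is omitted.

```python
import math

# Toy purely spatial scan statistic over census tracts,
# tracts given as (x, y, cases, population). Illustrative only.

def poisson_llr(c, e, C):
    # c: cases inside window, e: expected inside, C: total cases.
    # Score only windows with more cases than expected.
    if c <= e:
        return 0.0
    return c * math.log(c / e) + (C - c) * math.log((C - c) / (C - e))

def spatial_scan(tracts):
    C = sum(t[2] for t in tracts)
    P = sum(t[3] for t in tracts)
    best = (0.0, None)
    for cx, cy, _, _ in tracts:                       # each tract as a center
        by_dist = sorted(tracts, key=lambda t: (t[0]-cx)**2 + (t[1]-cy)**2)
        c = e = 0.0
        for t in by_dist[:len(tracts) // 2]:          # windows up to half the region
            c += t[2]
            e += C * t[3] / P                         # expected under uniform risk
            llr = poisson_llr(c, e, C)
            if llr > best[0]:
                best = (llr, (cx, cy))
    return best

# Synthetic tracts with an elevated-rate pocket near (0, 0)
tracts = [(0, 0, 30, 1000), (1, 0, 25, 1000), (0, 1, 28, 1000),
          (5, 5, 10, 1000), (6, 5, 9, 1000), (5, 6, 11, 1000),
          (9, 1, 10, 1000), (1, 9, 12, 1000)]
llr, center = spatial_scan(tracts)
print(f"most likely cluster centered at {center}, LLR = {llr:.2f}")
```

    The space-time variant used in the abstract adds a time dimension to the window (cylinders instead of circles) but scores candidates with the same likelihood-ratio idea.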

  16. Prediction of hydrate formation temperature by both statistical models and artificial neural network approaches

    Zahedi, Gholamreza; Karami, Zohre; Yaghoobi, Hamed

    2009-01-01

    In this study, various estimation methods for hydrate formation temperature (HFT) have been reviewed and two procedures are presented. In the first method, two general correlations have been proposed for HFT: one with 11 parameters and a second with 18 parameters. In order to obtain the constants in the proposed equations, 203 experimental data points were collected from the literature. The Engineering Equation Solver (EES) and Statistical Package for the Social Sciences (SPSS) software packages were employed for statistical analysis of the data. The accuracy of the obtained correlations is also demonstrated by comparison with experimental data and some recent commonly used correlations. In the second method, HFT is estimated by an artificial neural network (ANN) approach. In this case, various architectures were checked using 70% of the experimental data for training the ANN. Among the various architectures, a multi-layer perceptron (MLP) network with the trainlm training algorithm was found to be the best. Comparing the obtained ANN model results with the 30% of unseen data confirms the ANN's excellent estimation performance. It was found that the ANN is more accurate than traditional methods and even the two proposed correlations for HFT estimation.
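
    The first (correlation-fitting) procedure amounts to least-squares regression of an assumed functional form on experimental points. The sketch below fits a hypothetical three-parameter form, HFT = a + b·ln(P) + c·γ, via the normal equations; this form and the data are our own illustration and are not either of the paper's 11- or 18-parameter correlations.

```python
import math

# Fit an assumed-form hydrate-formation-temperature correlation
#   HFT = a + b*ln(P) + c*gamma
# to synthetic data by least squares (normal equations, 3x3 solve).

def solve3(A, rhs):
    # Gauss-Jordan elimination with partial pivoting for a 3x3 system.
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

# (pressure [kPa], gas specific gravity, hydrate temperature [K]) - synthetic
data = [(2000, 0.6, 275.0), (4000, 0.6, 280.2), (8000, 0.6, 285.1),
        (2000, 0.7, 277.1), (4000, 0.7, 282.3), (8000, 0.7, 287.0)]

X = [(1.0, math.log(p), g) for p, g, _ in data]     # design matrix rows
y = [t for _, _, t in data]
A = [[sum(xi[r] * xi[c] for xi in X) for c in range(3)] for r in range(3)]
b = [sum(xi[r] * yi for xi, yi in zip(X, y)) for r in range(3)]
a_, b_, c_ = solve3(A, b)
print(f"HFT ≈ {a_:.1f} + {b_:.2f} ln P + {c_:.1f} gamma")
```

    The paper's second procedure replaces this fixed functional form with an MLP trained on 70% of the data, which is what lets it outperform hand-chosen correlations.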

  17. Breast cancer statistics and prediction methodology: a systematic review and analysis.

    Dubey, Ashutosh Kumar; Gupta, Umesh; Jain, Sonal

    2015-01-01

    Breast cancer is a menacing cancer, primarily affecting women. Continuous research is going on for detecting breast cancer at an early stage, when the possibility of cure is greatest. This study has two main objectives: first, to establish statistics for breast cancer, and second, to identify methodologies from previous studies that can be helpful in early-stage detection of breast cancer. The breast cancer statistics for incidence and mortality of the UK, US, India and Egypt were considered for this study. The findings of this study show that the overall mortality rates of the UK and US have improved because of awareness, improved medical technology and screening, but in the case of India and Egypt the situation is less positive because of a lack of awareness. The methodological findings of this study suggest a combined framework based on data mining and evolutionary algorithms. It provides a strong bridge toward improving the classification and detection accuracy of breast cancer data.

  18. Statistical analysis of solid lipid nanoparticles produced by high-pressure homogenization: a practical prediction approach

    Durán-Lobato, Matilde; Enguix-González, Alicia; Fernández-Arévalo, Mercedes; Martín-Banderas, Lucía

    2013-01-01

    Lipid nanoparticles (LNPs) are a promising carrier for all administration routes due to their safety, small size, and high loading of lipophilic compounds. Among the LNP production techniques, the easy scale-up, lack of organic solvents, and short production times of the high-pressure homogenization technique (HPH) make this method stand out. In this study, a statistical analysis was applied to the production of LNP by HPH. Spherical LNPs with mean size ranging from 65 nm to 11.623 μm, negative zeta potential under –30 mV, and smooth surface were produced. Manageable equations based on commonly used parameters in the pharmaceutical field were obtained. The lipid to emulsifier ratio (R_L/S) was proved to statistically explain the influence of oil phase and surfactant concentration on final nanoparticle size. Besides, the homogenization pressure was found to ultimately determine LNP size for a given R_L/S, while the number of passes applied mainly determined polydispersion. α-Tocopherol was used as a model drug to illustrate release properties of LNP as a function of particle size, which was optimized by the regression models. This study is intended as a first step to optimize production conditions prior to LNP production at both laboratory and industrial scale from an eminently practical approach, based on parameters extensively used in formulation.

  19. Optimal day-ahead wind-thermal unit commitment considering statistical and predicted features of wind speeds

    Sun, Yanan; Dong, Jizhe; Ding, Lijuan

    2017-01-01

    Highlights: • A day-ahead wind-thermal unit commitment model is presented. • A wind speed transfer matrix is formed to depict sequential wind features. • A spinning reserve setting considering wind power accuracy and variation is proposed. • A verification study is performed to check the correctness of the program. - Abstract: The increasing penetration of intermittent wind power affects the secure operation of power systems and leads to a requirement for robust and economic generation scheduling. This paper presents an optimal day-ahead wind-thermal generation scheduling method that considers the statistical and predicted features of wind speeds. In this method, a statistical analysis of historical wind data, which represents the local wind regime, is first implemented. Then, according to the statistical results and the predicted wind power, the spinning reserve requirements for the scheduling period are calculated. Based on the calculated spinning reserve requirements, the wind-thermal generation scheduling is finally conducted. To validate the program, a verification study is performed on a test system, followed by numerical studies demonstrating the effectiveness of the proposed method.
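
    The reserve-setting step can be illustrated with a toy rule, which is our own sketch rather than the paper's formulation: size each hour's spinning reserve as a load contingency plus an empirical quantile of historical wind-power forecast errors, capped at the forecast wind output. All numbers are invented.

```python
# Toy spinning-reserve sizing from wind forecast-error statistics.

def empirical_quantile(xs, q):
    s = sorted(xs)
    return s[min(int(q * len(s)), len(s) - 1)]

def spinning_reserve(load_mw, wind_forecast_mw, hist_errors_mw,
                     load_frac=0.03, q=0.95):
    # Cover a fraction of load plus the q-quantile of historical
    # over-forecast errors; never reserve more wind backup than
    # the wind plant is forecast to produce.
    wind_margin = min(wind_forecast_mw,
                      max(0.0, empirical_quantile(hist_errors_mw, q)))
    return load_frac * load_mw + wind_margin

# historical (forecast - actual) wind power errors, MW
hist_errors = [-12, -8, -3, 0, 2, 4, 6, 9, 11, 15, 18, 22]
r = spinning_reserve(1000.0, 120.0, hist_errors)
print(f"reserve requirement = {r:.1f} MW")
```

    Feeding a reserve profile like this into the unit-commitment optimization is what couples the statistical wind analysis to the day-ahead schedule.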

  20. Using Advanced Data Mining And Integration In Environmental Prediction Scenarios

    Habala Ondrej

    2012-01-01

    Full Text Available We present one of the meteorological and hydrological experiments performed in the FP7 project ADMIRE. It serves as an experimental platform for hydrologists, and we have also used it as a testing platform for a suite of advanced data integration and data mining (DMI) tools developed within ADMIRE. The aim of ADMIRE is to develop an advanced DMI platform accessible even to users who are not familiar with data mining techniques. To this end, we have designed a novel DMI architecture, supported by a set of software tools, managed by DMI process descriptions written in a specialized high-level DMI language called DISPEL, and controlled via several different user interfaces, each performing a different set of tasks and targeting a different user group.

  1. Predicted performance of an integrated modular engine system

    Binder, Michael; Felder, James L.

    1993-01-01

    Space vehicle propulsion systems are traditionally comprised of a cluster of discrete engines, each with its own set of turbopumps, valves, and a thrust chamber. The Integrated Modular Engine (IME) concept proposes a vehicle propulsion system comprised of multiple turbopumps, valves, and thrust chambers which are all interconnected. The IME concept has potential advantages in fault tolerance, weight, and operational efficiency compared with the traditional clustered engine configuration. The purpose of this study is to examine the steady-state performance of an IME system with various components removed to simulate fault conditions. An IME configuration for a hydrogen/oxygen expander cycle propulsion system with four sets of turbopumps and eight thrust chambers has been modeled using the Rocket Engine Transient Simulator (ROCETS) program. The nominal steady-state performance is simulated, as well as turbopump, thrust chamber, and duct failures. The impact of component failures on system performance is discussed in the context of the system's fault-tolerant capabilities.

  2. Development and application of a statistical methodology to evaluate the predictive accuracy of building energy baseline models

    Granderson, Jessica [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.; Price, Phillip N. [Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States). Energy Technologies Area Div.

    2014-03-01

    This paper documents the development and application of a general statistical methodology to assess the accuracy of baseline energy models, focusing on its application to Measurement and Verification (M&V) of whole-building energy savings. The methodology complements the principles addressed in resources such as ASHRAE Guideline 14 and the International Performance Measurement and Verification Protocol. It requires fitting a baseline model to data from a "training period" and using the model to predict total electricity consumption during a subsequent "prediction period." We illustrate the methodology by evaluating five baseline models using data from 29 buildings. The training period and prediction period were varied, and model predictions of daily, weekly, and monthly energy consumption were compared to meter data to determine model accuracy. Several metrics were used to characterize the accuracy of the predictions, and in some cases the best-performing model as judged by one metric was not the best performer when judged by another metric.
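The accuracy assessment described above can be sketched with two metrics common in M&V practice, CV(RMSE) and NMBE (both appear in ASHRAE Guideline 14); the functions and the meter/model values below are illustrative, not taken from the paper.

```python
def cv_rmse(actual, predicted):
    """Coefficient of variation of the RMSE: RMSE normalized by mean load."""
    n = len(actual)
    mean = sum(actual) / n
    rmse = (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5
    return rmse / mean

def nmbe(actual, predicted):
    """Normalized mean bias error: does the model over- or under-predict?"""
    mean = sum(actual) / len(actual)
    return sum(a - p for a, p in zip(actual, predicted)) / (len(actual) * mean)

# Hypothetical daily kWh: meter data vs. a baseline model's predictions
meter = [120.0, 130.0, 125.0, 140.0, 135.0]
model = [118.0, 133.0, 127.0, 138.0, 131.0]
print(cv_rmse(meter, model), nmbe(meter, model))
```

As the abstract notes, a model can rank best on one metric and not the other, which is why both are typically reported.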

  3. Integrating chemical footprinting data into RNA secondary structure prediction.

    Kourosh Zarringhalam

    Full Text Available Chemical and enzymatic footprinting experiments, such as SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension), yield important information about RNA secondary structure. Indeed, since the 2'-hydroxyl is reactive at flexible (loop) regions but unreactive at base-paired regions, SHAPE yields quantitative data about which RNA nucleotides are base-paired. Recently, low error rates in secondary structure prediction have been reported for three RNAs of moderate size, by including base-stacking pseudo-energy terms derived from SHAPE data into the computation of minimum free energy secondary structure. Here, we describe a novel method, RNAsc (RNA soft constraints), which includes pseudo-energy terms for each nucleotide position, rather than only for base-stacking positions. We prove that RNAsc is self-consistent, in the sense that the nucleotide-specific probabilities of being unpaired in the low-energy Boltzmann ensemble always become more closely correlated with the input SHAPE data after application of RNAsc. From this mathematical perspective, the secondary structure predicted by RNAsc should be 'correct' inasmuch as the SHAPE data are 'correct'. We benchmark RNAsc against the previously mentioned method for eight RNAs for which both SHAPE data and native structures are known, finding the same accuracy in 7 out of 8 cases and an improvement of 25% in one case. Furthermore, we present what appears to be the first direct comparison of SHAPE data and in-line probing data, by comparing yeast asp-tRNA SHAPE data from the literature with data from in-line probing experiments we have recently performed. With respect to several criteria, we find that SHAPE data appear to be more robust than in-line probing data, at least in the case of asp-tRNA.

  4. Integrating remotely sensed fires for predicting deforestation for REDD.

    Armenteras, Dolors; Gibbes, Cerian; Anaya, Jesús A; Dávalos, Liliana M

    2017-06-01

    Fire is an important tool in tropical forest management, as it alters forest composition, structure, and the carbon budget. The United Nations program on Reducing Emissions from Deforestation and Forest Degradation (REDD+) aims to sustainably manage forests, as well as to conserve and enhance their carbon stocks. Despite the crucial role of fire management, decision-making on REDD+ interventions fails to systematically include fires. Here, we address this critical knowledge gap in two ways. First, we review REDD+ projects and programs to assess the inclusion of fires in monitoring, reporting, and verification (MRV) systems. Second, we model the relationship between fire and forest for a pilot site in Colombia using near-real-time (NRT) fire monitoring data derived from the Moderate Resolution Imaging Spectroradiometer (MODIS). The literature review revealed that fire has yet to be incorporated as a key component of MRV systems. Spatially explicit modeling of land use change showed the probability of deforestation declined sharply with increasing distance to the nearest fire in the preceding year (multi-year model area under the curve [AUC] 0.82). Deforestation predictions based on the model performed better than the official REDD early-warning system: the model AUC for 2013 and 2014 was 0.81, compared to 0.52 for the early-warning system in 2013 and 0.68 in 2014. This demonstrates that NRT fire monitoring is a powerful tool for predicting sites of deforestation. Applying new, publicly available, open-access NRT fire data should be an essential element of early-warning systems to detect and prevent deforestation. Our results provide tools for improving both the current MRV systems and the deforestation early-warning system in Colombia. © 2017 by the Ecological Society of America.
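The reported AUC comparison can be illustrated with a minimal rank-based AUC computation. The pixel labels and fire distances below are hypothetical; following the abstract's finding, the negative distance to the nearest fire serves as the deforestation score.

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity:
    the probability that a random positive outranks a random negative.
    Ties get half credit."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical pixels: deforested (1) or not (0), with distance (km) to the
# nearest fire in the preceding year. Score = negative distance, since
# deforestation probability falls with distance to fire.
deforested = [1, 1, 0, 1, 0, 0, 0, 1]
dist_km = [0.5, 1.2, 8.0, 0.9, 5.5, 12.0, 3.0, 4.0]
score = [-d for d in dist_km]
print(auc(deforested, score))  # prints 0.9375
```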

  5. A statistical model to predict total column ozone in Peninsular Malaysia

    Tan, K. C.; Lim, H. S.; Mat Jafri, M. Z.

    2016-03-01

    This study aims to predict monthly columnar ozone in Peninsular Malaysia based on the concentrations of several atmospheric gases. Data pertaining to five atmospheric gases (CO2, O3, CH4, NO2, and H2O vapor) were retrieved from 2003 to 2008 by the satellite-borne SCanning Imaging Absorption spectroMeter for Atmospheric CHartographY (SCIAMACHY) and used to develop a model to predict columnar ozone in Peninsular Malaysia. Analyses of the northeast monsoon (NEM) and southwest monsoon (SWM) seasons were conducted separately. Based on the Pearson correlation matrices, columnar ozone was negatively correlated with H2O vapor but positively correlated with CO2 and NO2 during both the NEM and SWM seasons from 2003 to 2008. This result was expected because NO2 is a precursor of ozone; an increase in columnar ozone concentration is therefore associated with an increase in NO2 but a decrease in H2O vapor. In the NEM season, columnar ozone was strongly negatively correlated with H2O (−0.847), positively correlated with NO2 (0.754) and CO2 (0.477), and weakly negatively correlated with CH4 (−0.035). In the SWM season, columnar ozone was positively correlated with NO2 (0.855), CO2 (0.572), and CH4 (0.321), and strongly negatively correlated with H2O (−0.832). Both multiple regression and principal component analyses were used to predict columnar ozone values in Peninsular Malaysia. We obtained the best-fitting regression equations for the columnar ozone data using four independent variables. Our results show approximately the same R value (≈ 0.83) for both the NEM and SWM seasons.
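The first analysis step, building a Pearson correlation matrix between columnar ozone and each gas, can be sketched as follows; the monthly ozone and NO2 values are invented for illustration, not the study's data.

```python
def pearson(x, y):
    """Sample Pearson correlation coefficient between two series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical monthly means (arbitrary units): columnar ozone vs. NO2
ozone = [262, 258, 270, 265, 255, 268]
no2 = [1.8, 1.6, 2.3, 2.0, 1.4, 2.2]
print(pearson(ozone, no2))
```

Running this pairwise over all five gases reproduces the kind of correlation matrix the study uses to select regression predictors.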

  6. Statistical Viewer: a tool to upload and integrate linkage and association data as plots displayed within the Ensembl genome browser

    Hauser Elizabeth R

    2005-04-01

    Full Text Available Abstract Background To facilitate efficient selection and prioritization of candidate complex disease susceptibility genes for association analysis, increasingly comprehensive annotation tools are essential to integrate, visualize and analyze vast quantities of disparate data generated by genomic screens, public human genome sequence annotation and ancillary biological databases. We have developed a plug-in package for Ensembl called "Statistical Viewer" that facilitates the analysis of genomic features and annotation in regions of interest defined by linkage analysis. Results Statistical Viewer is an add-on package to the open-source Ensembl Genome Browser and Annotation System that displays disease study-specific linkage and/or association data as two-dimensional plots in new panels in the context of Ensembl's Contig View and Cyto View pages. An enhanced upload server facilitates the upload of statistical data, as well as additional feature annotation to be displayed in DAS tracks, in the form of Excel files. The Statistical View panel, drawn directly under the ideogram, plots LOD scores for markers from a study of interest against their position in base pairs. A module called "Get Map" easily converts the genetic locations of markers to genomic coordinates. The graph, placed under the corresponding ideogram, features a synchronized vertical sliding selection box, seamlessly integrated into Ensembl's Contig- and Cyto-View pages, for choosing the region to be displayed in Ensembl's "Overview" and "Detailed View" panels. To resolve association and fine-mapping data plots, a "Detailed Statistic View" plot corresponding to the "Detailed View" may be displayed underneath.
Conclusion Features mapping to regions of linkage are accentuated when Statistical Viewer is used in conjunction with the Distributed Annotation System (DAS) to display supplemental laboratory information such as differentially expressed disease

  7. Is Pan-Asian Economic Integration Moving Forward?: Evidence from Pan-Asian Trade Statistics

    Sapkota, Jeet Bahadur; Shuto, Motoko

    2016-01-01

    Asia is growing faster economically than any other region in the world, a trend that has shifted the center of gravity of the global economy from the West to the East. However, it is not clear whether the Asian economy is integrating regionally or globally. In the context of growing efforts toward regional or sub-regional pan-Asian integration, it is worthwhile to explore pan-Asian trade flows both regionally and globally. Thus, this paper examines the trend and determinants of economic...

  8. Statistical model of natural stimuli predicts edge-like pooling of spatial frequency channels in V2

    Gutmann Michael

    2005-02-01

    Full Text Available Abstract Background It has been shown that the classical receptive fields of simple and complex cells in the primary visual cortex emerge from the statistical properties of natural images by forcing the cell responses to be maximally sparse or independent. We investigate how to learn features beyond the primary visual cortex from the statistical properties of modelled complex-cell outputs. In previous work, we showed that a new model, non-negative sparse coding, led to the emergence of features which code for contours of a given spatial frequency band. Results We applied ordinary independent component analysis to modelled outputs of complex cells that span different frequency bands. The analysis led to the emergence of features which pool spatially coherent across-frequency activity in the modelled primary visual cortex. Thus, the statistically optimal way of processing complex-cell outputs abandons separate frequency channels, while preserving and even enhancing orientation tuning and spatial localization. As a technical aside, we found that the non-negativity constraint is not necessary: ordinary independent component analysis produces essentially the same results as our previous work. Conclusion We propose that the pooling that emerges allows the features to code for realistic low-level image features related to step edges. Further, the results prove the viability of statistical modelling of natural images as a framework that produces quantitative predictions of visual processing.

  9. The Integrated Medical Model: A Probabilistic Simulation Model Predicting In-Flight Medical Risks

    Keenan, Alexandra; Young, Millennia; Saile, Lynn; Boley, Lynn; Walton, Marlei; Kerstman, Eric; Shah, Ronak; Goodenow, Debra A.; Myers, Jerry G., Jr.

    2015-01-01

    The Integrated Medical Model (IMM) is a probabilistic model that uses simulation to predict mission medical risk. Given a specific mission and crew scenario, medical events are simulated using Monte Carlo methodology to provide estimates of resource utilization, probability of evacuation, probability of loss of crew, and the amount of mission time lost due to illness. Mission and crew scenarios are defined by mission length, extravehicular activity (EVA) schedule, and crew characteristics including: sex, coronary artery calcium score, contacts, dental crowns, history of abdominal surgery, and EVA eligibility. The Integrated Medical Evidence Database (iMED) houses the model inputs for one hundred medical conditions using in-flight, analog, and terrestrial medical data. Inputs include incidence, event durations, resource utilization, and crew functional impairment. Severity of conditions is addressed by defining statistical distributions on the dichotomized best and worst-case scenarios for each condition. The outcome distributions for conditions are bounded by the treatment extremes of the fully treated scenario in which all required resources are available and the untreated scenario in which no required resources are available. Upon occurrence of a simulated medical event, treatment availability is assessed, and outcomes are generated depending on the status of the affected crewmember at the time of onset, including any pre-existing functional impairments or ongoing treatment of concurrent conditions. The main IMM outcomes, including probability of evacuation and loss of crew life, time lost due to medical events, and resource utilization, are useful in informing mission planning decisions. 
To date, the IMM has been used to assess mission-specific risks with and without certain crewmember characteristics, to determine the impact of eliminating certain resources from the mission medical kit, and to design medical kits that maximally benefit crew health while meeting
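The Monte Carlo logic described above can be sketched in miniature: draw medical events from per-condition incidence probabilities and tally evacuation outcomes across many simulated missions. The condition names, incidence rates, and evacuation probabilities below are placeholders, not IMM or iMED inputs.

```python
import random

def simulate_mission(n_trials, mission_days, conditions, seed=0):
    """Toy Monte Carlo in the spirit of the IMM: each condition has a daily
    incidence probability and an evacuation probability if it occurs.
    Returns the estimated probability of medical evacuation."""
    rng = random.Random(seed)
    evac = 0
    for _ in range(n_trials):
        for name, daily_p, evac_p in conditions:
            # Probability the condition occurs at least once during the mission
            if rng.random() < 1 - (1 - daily_p) ** mission_days:
                if rng.random() < evac_p:
                    evac += 1
                    break  # mission ends on evacuation
    return evac / n_trials

# Hypothetical conditions: (name, daily incidence, P(evacuation | occurrence))
conditions = [
    ("renal stone", 0.0002, 0.30),
    ("dental abscess", 0.0001, 0.10),
]
print(simulate_mission(20000, 180, conditions))
```

The full IMM additionally models resource consumption, crew impairment, and treated vs. untreated outcome distributions, which this sketch omits.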

  11. A Statistical Approach To Prediction Of The CMM Drift Behaviour Using A Calibrated Mechanical Artefact

    Cuesta Eduardo

    2015-09-01

    Full Text Available This paper presents a multivariate regression model for predicting drift in Coordinate Measuring Machine (CMM) behaviour. Evaluation tests on a CMM with a multi-step gauge were carried out following an extended version of an ISO evaluation procedure, at least once a week over more than five months. The test procedure consists of measuring the gauge over several range volumes, spatial locations, distances and repetitions. The procedure, environmental conditions and even the gauge were kept invariant, so a massive measurement dataset was collected over time under high-repeatability conditions. A multivariate regression analysis revealed the main parameters that could affect CMM behaviour and detected a trend in CMM performance drift. A performance model that considers both the size of the measured dimension and the elapsed time since the last CMM calibration has been developed. This model can predict CMM performance and measurement reliability over time, and can also estimate an optimized period between calibrations for a specific measurement length or accuracy level.
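A regression of the kind described, measurement error as a function of measured length and elapsed time since calibration, can be sketched with ordinary least squares; the model form, gauge data, and units below are hypothetical, not the paper's fitted model.

```python
def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved by Gaussian elimination with partial pivoting. Adequate for a
    handful of predictors."""
    n, k = len(X), len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    for col in range(k):  # forward elimination
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, k))) / A[r][r]
    return x

# Hypothetical data: measurement error (um) vs. nominal length (mm) and
# weeks since calibration; model: error = a + b*length + c*weeks
lengths = [100, 300, 500, 100, 300, 500]
weeks = [1, 1, 1, 20, 20, 20]
errors = [1.1, 1.9, 2.8, 1.6, 2.5, 3.3]
X = [[1.0, L, t] for L, t in zip(lengths, weeks)]
a, slope_len, slope_time = fit_least_squares(X, errors)
print(a, slope_len, slope_time)
```

A positive time coefficient here would indicate drift since calibration, which is the effect the paper's model uses to set recalibration intervals.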

  12. A multimetric approach for predicting the ecological integrity of New Zealand streams

    Clapcott J.E.

    2014-01-01

    Full Text Available Integrating multiple measures of stream health into a combined metric can provide a holistic assessment of the ecological integrity of a stream. The aim of this study was to develop a multimetric index (MMI) of stream integrity based on predictive modelling of national data sets of water quality, macroinvertebrate, fish and ecosystem process metrics. We used a boosted regression tree approach to calculate an observed/expected score for each metric prior to combining metrics in an MMI based on data availability and the strength of the predictive models. The resulting MMI provides a geographically meaningful prediction of the ecological integrity of rivers in New Zealand, while also identifying limitations in the data and approach, providing focus for ongoing research.
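The metric-combination step can be sketched as follows. This is a simplified stand-in: per-metric observed/expected ratios are capped at 1 and averaged, whereas the study weights metrics by data availability and model strength; all metric names and values are invented.

```python
def mmi(observed, expected):
    """Combine per-metric observed/expected ratios into a single multimetric
    index by averaging, capping each ratio at 1 so that exceeding the
    reference expectation on one metric cannot mask degradation elsewhere."""
    scores = [min(o / e, 1.0) for o, e in zip(observed, expected)]
    return sum(scores) / len(scores)

# Hypothetical site: metrics for water quality, invertebrates, fish, and
# ecosystem processes; expected values come from a predictive model
observed = [6.2, 110, 0.8, 4.5]
expected = [8.0, 100, 1.0, 5.0]
print(mmi(observed, expected))
```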

  13. Statistical prediction of immunity to placental malaria based on multi-assay antibody data for malarial antigens

    Siriwardhana, Chathura; Fang, Rui; Salanti, Ali

    2017-01-01

    Background Plasmodium falciparum infections are especially severe in pregnant women because infected erythrocytes (IE) express VAR2CSA, a ligand that binds to placental trophoblasts, causing IE to accumulate in the placenta. Resulting inflammation and pathology increase a woman's risk of anemia ... to 28 malarial antigens and used the data to develop statistical models for predicting if a woman has sufficient immunity to prevent PM. Methods Archival plasma samples from 1377 women were screened in a bead-based multiplex assay for Ab to 17 VAR2CSA-associated antigens (full length VAR2CSA (FV2), DBL ... in the following seven statistical approaches: logistic regression full model, logistic regression reduced model, recursive partitioning, random forests, linear discriminant analysis, quadratic discriminant analysis, and support vector machine. Results The best and simplest model proved to be the logistic...

  14. Statistical mechanics of neocortical interactions: Path-integral evolution of short-term memory

    Ingber, Lester

    1994-05-01

    Previous papers in this series on the statistical mechanics of neocortical interactions (SMNI) have detailed a development from the relatively microscopic scales of neurons up to the macroscopic scales recorded by electroencephalography (EEG), requiring an intermediate mesocolumnar scale to be developed at the scale of minicolumns (≈10^2 neurons) and macrocolumns (≈10^5 neurons). Opportunity was taken to view SMNI as sets of statistical constraints, not necessarily describing specific synaptic or neuronal mechanisms, on neuronal interactions and on some aspects of short-term memory (STM), e.g., its capacity, stability, and duration. A recently developed C-language code, PATHINT, provides a non-Monte Carlo technique for calculating the dynamic evolution of arbitrary-dimension (subject to computer resources) nonlinear Lagrangians, such as those derived for the two-variable SMNI problem. Here, PATHINT is used to explicitly detail the evolution of the SMNI constraints on STM.

  15. Integrating community-based verbal autopsy into civil registration and vital statistics (CRVS): system-level considerations

    de Savigny, Don; Riley, Ian; Chandramohan, Daniel; Odhiambo, Frank; Nichols, Erin; Notzon, Sam; AbouZahr, Carla; Mitra, Raj; Cobos Muñoz, Daniel; Firth, Sonja; Maire, Nicolas; Sankoh, Osman; Bronson, Gay; Setel, Philip; Byass, Peter; Jakob, Robert; Boerma, Ties; Lopez, Alan D.

    2017-01-01

    ABSTRACT Background: Reliable and representative cause of death (COD) statistics are essential to inform public health policy, respond to emerging health needs, and document progress towards Sustainable Development Goals. However, less than one-third of deaths worldwide are assigned a cause. Civil registration and vital statistics (CRVS) systems in low- and lower-middle-income countries are failing to provide timely, complete and accurate vital statistics, and it will still be some time before they can provide physician-certified COD for every death. Proposals: Verbal autopsy (VA) is a method to ascertain the probable COD and, although imperfect, it is the best alternative in the absence of medical certification. There is extensive experience with VA in research settings but only a few examples of its use on a large scale. Data collection using electronic questionnaires on mobile devices and computer algorithms to analyse responses and estimate probable COD have increased the potential for VA to be routinely applied in CRVS systems. However, a number of CRVS and health system integration issues should be considered in planning, piloting and implementing a system-wide intervention such as VA. These include addressing the multiplicity of stakeholders and sub-systems involved, integration with existing CRVS work processes and information flows, linking VA results to civil registration records, information technology requirements and data quality assurance. Conclusions: Integrating VA within CRVS systems is not simply a technical undertaking. It will have profound system-wide effects that should be carefully considered when planning for an effective implementation. This paper identifies and discusses the major system-level issues and emerging practices, provides a planning checklist of system-level considerations and proposes an overview for how VA can be integrated into routine CRVS systems. PMID:28137194

  16. Integrated Solid Oxide Fuel Cell Power System Characteristics Prediction

    Marian GAICEANU

    2009-07-01

    Full Text Available The main objective of this paper is to deduce the specific characteristics of the CHP 100 kWe Solid Oxide Fuel Cell (SOFC) power system from steady-state experimental data. From these data, the authors developed and validated a steady-state mathematical model. Steady-state experimental data on the SOFC power conditioning are available from the control room, and using the developed model the authors obtained the characteristic curves of the system built by Siemens-Westinghouse Power Corporation. As a methodology, backward and forward power flow analysis was employed. The backward power flow makes it possible to obtain the SOFC power system operating point at different load levels, yielding the load characteristic. Knowing the fuel cell output characteristic, forward power flow analysis is used to predict the power system efficiency at different operating points and to choose the control decision that achieves high-efficiency operation of the SOFC power system at different load levels. The CHP 100 kWe power system is located at Gas Turbine Technologies Company (a Siemens subsidiary, TurboCare brand) in Turin, Italy. The work was carried out through the Energia da Ossidi Solidi (EOS) project.

  17. A graphical user interface for RAId, a knowledge integrated proteomics analysis suite with accurate statistics

    Joyce, Brendan; Lee, Danny; Rubio, Alex; Ogurtsov, Aleksey; Alves, Gelio; Yu, Yi-Kuo

    2018-01-01

    Abstract Objective RAId is a software package that has been actively developed for the past 10 years for computationally and visually analyzing MS/MS data. Founded on rigorous statistical methods, RAId’s core program computes accurate E-values for peptides and proteins identified during database searches. Making this robust tool readily accessible for the proteomics community by developing a graphical user interface (GUI) is our main goa...

  18. EFFECT OF MEASUREMENT ERRORS ON PREDICTED COSMOLOGICAL CONSTRAINTS FROM SHEAR PEAK STATISTICS WITH LARGE SYNOPTIC SURVEY TELESCOPE

    Bard, D.; Chang, C.; Kahn, S. M.; Gilmore, K.; Marshall, S. [KIPAC, Stanford University, 452 Lomita Mall, Stanford, CA 94309 (United States); Kratochvil, J. M.; Huffenberger, K. M. [Department of Physics, University of Miami, Coral Gables, FL 33124 (United States); May, M. [Physics Department, Brookhaven National Laboratory, Upton, NY 11973 (United States); AlSayyad, Y.; Connolly, A.; Gibson, R. R.; Jones, L.; Krughoff, S. [Department of Astronomy, University of Washington, Seattle, WA 98195 (United States); Ahmad, Z.; Bankert, J.; Grace, E.; Hannel, M.; Lorenz, S. [Department of Physics, Purdue University, West Lafayette, IN 47907 (United States); Haiman, Z.; Jernigan, J. G., E-mail: djbard@slac.stanford.edu [Department of Astronomy and Astrophysics, Columbia University, New York, NY 10027 (United States); and others

    2013-09-01

    We study the effect of galaxy shape measurement errors on predicted cosmological constraints from the statistics of shear peak counts with the Large Synoptic Survey Telescope (LSST). We use the LSST Image Simulator in combination with cosmological N-body simulations to model realistic shear maps for different cosmological models. We include both galaxy shape noise and, for the first time, measurement errors on galaxy shapes. We find that the measurement errors considered have relatively little impact on the constraining power of shear peak counts for LSST.

  19. Statistical properties of thermodynamically predicted RNA secondary structures in viral genomes

    Spanò, M.; Lillo, F.; Miccichè, S.; Mantegna, R. N.

    2008-10-01

    By performing a comprehensive study on 1832 segments of 1212 complete genomes of viruses, we show that in viral genomes the hairpin structures of thermodynamically predicted RNA secondary structures are more abundant than expected under a simple random null hypothesis. The detected hairpin structures of RNA secondary structures are present both in coding and in noncoding regions for the four groups of viruses categorized as dsDNA, dsRNA, ssDNA and ssRNA. For all groups, hairpin structures of RNA secondary structures are detected more frequently than expected for a random null hypothesis in noncoding rather than in coding regions. However, potential RNA secondary structures are also present in coding regions of dsDNA group. In fact, we detect evolutionary conserved RNA secondary structures in conserved coding and noncoding regions of a large set of complete genomes of dsDNA herpesviruses.

  20. Predicting of biomass in Brazilian tropical dry forest: a statistical evaluation of generic equations

    ROBSON B. DE LIMA

    2017-08-01

    Full Text Available ABSTRACT Dry tropical forests are a key component of the global carbon cycle, and their biomass estimates depend almost exclusively on equations fitted to multi-species or individual-species data. A systematic evaluation of statistical models through validation of aboveground biomass stock estimates is therefore justified. In this study, we analyzed the capacity of generic and specific equations obtained from different locations in Mexico and Brazil to estimate aboveground biomass at the multi-species level and for four different species. Generic equations developed in Mexico and Brazil performed better in estimating tree biomass for multi-species data. For Poincianella bracteosa and Mimosa ophthalmocentra, only the Sampaio and Silva (2005) generic equation was recommended. These equations show less tendency and lower bias, and their biomass estimates are similar. For the species Mimosa tenuiflora and Aspidosperma pyrifolium, and for the genus Croton, the specific regional equations are more suitable, although the generic equation of Sampaio and Silva (2005) is not discarded for biomass estimates. Models considering genus, family, successional group, climatic variables and wood specific gravity should be adjusted and tested, and the resulting equations should be validated at both local and regional levels, as well as across the tropics where dry forest dominates.
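The validation exercise described above amounts to comparing an allometric equation's predictions against destructively sampled biomass. A minimal sketch follows, with hypothetical coefficients and trees (not the Sampaio and Silva 2005 equation or the study's data):

```python
import math

def validate_equation(dbh_cm, observed_kg, a, b):
    """Evaluate an allometric equation biomass = a * DBH^b against observed
    data, returning mean relative bias and RMSE."""
    pred = [a * d ** b for d in dbh_cm]
    n = len(observed_kg)
    bias = sum((p - o) / o for p, o in zip(pred, observed_kg)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, observed_kg)) / n)
    return bias, rmse

dbh = [5.0, 8.0, 12.0, 20.0]       # diameter at breast height, cm
obs = [8.8, 31.0, 75.0, 280.0]     # destructively sampled biomass, kg
bias, rmse = validate_equation(dbh, obs, a=0.2, b=2.4)
print(bias, rmse)
```

A bias near zero with low RMSE is what makes a generic equation "recommended" in the sense used by the abstract.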
  2. Hippocampal Structure Predicts Statistical Learning and Associative Inference Abilities during Development.

    Schlichting, Margaret L; Guarino, Katharine F; Schapiro, Anna C; Turk-Browne, Nicholas B; Preston, Alison R

    2017-01-01

    Despite the importance of learning and remembering across the lifespan, little is known about how the episodic memory system develops to support the extraction of associative structure from the environment. Here, we relate individual differences in volumes along the hippocampal long axis to performance on statistical learning and associative inference tasks (both of which require encoding associations that span multiple episodes) in a developmental sample ranging from ages 6 to 30 years. Relating age to volume, we found dissociable patterns across the hippocampal long axis, with opposite nonlinear volume changes in the head and body. These structural differences were paralleled by performance gains across the age range on both tasks, suggesting improvements in the cross-episode binding ability from childhood to adulthood. Controlling for age, we also found that smaller hippocampal heads were associated with superior behavioral performance on both tasks, consistent with this region's hypothesized role in forming generalized codes spanning events. Collectively, these results highlight the importance of examining hippocampal development as a function of position along the hippocampal axis and suggest that the hippocampal head is particularly important in encoding associative structure across development.

  3. Statistical surrogate models for prediction of high-consequence climate change.

    Constantine, Paul; Field, Richard V., Jr.; Boslough, Mark Bruce Elrick

    2011-09-01

    In safety engineering, performance metrics are defined using probabilistic risk assessments focused on the low-probability, high-consequence tail of the distribution of possible events, as opposed to best estimates based on central tendencies. We frame the climate change problem and its associated risks in a similar manner. To properly explore the tails of the distribution requires extensive sampling, which is not possible with existing coupled atmospheric models due to the high computational cost of each simulation. We therefore propose the use of specialized statistical surrogate models (SSMs) for the purpose of exploring the probability law of various climate variables of interest. An SSM differs from a deterministic surrogate model in that it represents each climate variable of interest as a space/time random field. The SSM can be calibrated to available spatial and temporal data from existing climate databases, e.g., the Program for Climate Model Diagnosis and Intercomparison (PCMDI), or to a collection of outputs from a General Circulation Model (GCM), e.g., the Community Earth System Model (CESM) and its predecessors. Because of its reduced size and complexity, the realization of a large number of independent model outputs from an SSM becomes computationally straightforward, so that quantifying the risk associated with low-probability, high-consequence climate events becomes feasible. A Bayesian framework is developed to provide quantitative measures of confidence, via Bayesian credible intervals, in the use of the proposed approach to assess these risks.
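
    The credible intervals mentioned above can be read directly off posterior samples. A minimal sketch (hypothetical draws from an invented distribution, not the calibrated SSM of the record):

```python
import numpy as np

# Illustrative sketch: a Bayesian credible interval is read directly off
# posterior samples. Here `posterior` stands in for draws of a climate
# variable from a calibrated surrogate; the distribution is invented.
rng = np.random.default_rng(0)
posterior = rng.normal(loc=2.0, scale=0.5, size=10_000)  # hypothetical draws

# Central 95% credible interval: 2.5th and 97.5th percentiles.
lo, hi = np.percentile(posterior, [2.5, 97.5])

# Tail risk: posterior probability of a high-consequence event,
# here defined (arbitrarily) as the variable exceeding 3.0.
p_exceed = np.mean(posterior > 3.0)
print(lo, hi, p_exceed)
```

    Unlike a symmetric confidence interval, the same sample array also yields tail exceedance probabilities at no extra cost, which is the quantity of interest for low-probability, high-consequence events.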

  4. Ultimate compression after impact load prediction in graphite/epoxy coupons using neural network and multivariate statistical analyses

    Gregoire, Alexandre David

    2011-07-01

    The goal of this research was to accurately predict the ultimate compressive load of impact-damaged graphite/epoxy coupons using a Kohonen self-organizing map (SOM) neural network and multivariate statistical regression analysis (MSRA). An optimized use of these data treatment tools allowed the generation of a simple, physically understandable equation that predicts the ultimate failure load of an impact-damaged coupon based uniquely on the acoustic emissions it emits at low proof loads. Acoustic emission (AE) data were collected using two 150 kHz resonant transducers which detected and recorded the AE activity given off during compression to failure of thirty-four impacted 24-ply bidirectional woven cloth laminate graphite/epoxy coupons. The AE quantification parameters duration, energy and amplitude for each AE hit were input to the Kohonen self-organizing map (SOM) neural network to accurately classify the material failure mechanisms present in the low proof load data. The numbers of failure mechanisms from the first 30% of the loading for twenty-four coupons were used to generate a linear prediction equation which yielded a worst-case ultimate load prediction error of 16.17%, just outside of the +/-15% B-basis allowables, which was the goal for this research. Particular emphasis was placed upon the noise removal process, which was largely responsible for the accuracy of the results.
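
    The regression step described above can be sketched with ordinary least squares. All numbers below are synthetic stand-ins, not the study's AE data; the mechanism names and weights are invented for illustration:

```python
import numpy as np

# Hedged sketch of the regression step: predict ultimate compressive load
# from counts of SOM-classified failure mechanisms observed in the first
# 30% of loading. Counts, weights, and noise level are all synthetic.
rng = np.random.default_rng(1)
n = 24
# columns: hypothetical counts of matrix cracking, fibre break, delamination
X = rng.integers(10, 200, size=(n, 3)).astype(float)
true_w = np.array([-0.02, -0.10, -0.05])           # invented weights
loads = 60.0 + X @ true_w + rng.normal(0, 0.5, n)  # kN, synthetic

# Ordinary least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, loads, rcond=None)
pred = A @ coef

worst_err = np.max(np.abs(pred - loads) / loads) * 100  # worst-case % error
print(coef.round(3), worst_err)
```

    The worst-case percentage error computed at the end mirrors the +/-15% B-basis criterion the study used as its accuracy target.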

  5. Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

    Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

    2009-12-01

    The aim of this study was to improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures through minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPVs (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction rates (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
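
    The logistic-regression variant of this idea can be sketched end to end. The cohort below is synthetic (invented effect sizes on two of the paper's predictors), not the 1132-patient series:

```python
import numpy as np

# Illustrative sketch only: fit a logistic model of SN positivity by
# gradient descent, spare SNB for the quarter of patients with the lowest
# predicted risk, and check the negative predictive value (NPV).
rng = np.random.default_rng(42)
n = 1000
thickness = rng.lognormal(0.5, 0.6, n)              # Breslow thickness, mm
ulcer = rng.binomial(1, 0.2, n).astype(float)       # ulceration present?
true_logit = -3.0 + 0.8 * thickness + 1.0 * ulcer   # hypothetical truth
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))  # SN positive = 1

X = np.column_stack([np.ones(n), thickness, ulcer])
w = np.zeros(3)
for _ in range(5000):                               # plain gradient descent
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / n

p = 1 / (1 + np.exp(-X @ w))
spared = np.argsort(p)[: n // 4]                    # lowest-risk quartile
npv = np.mean(y[spared] == 0)
print(npv)
```

    Sparing the lowest-risk quartile mirrors the paper's reported trade-off: roughly 25% fewer procedures at an NPV in the low-to-mid 90% range.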

  6. Statistical characteristics of surface integrity by fiber laser cutting of Nitinol vascular stents

    Fu, C.H.; Liu, J.F.; Guo, Andrew

    2015-01-01

    Highlights: • Precision kerf with tight tolerance of Nitinol stents can be cut by fiber laser. • No HAZ in the subsurface was detected due to large grain size. • Recast layer has lower hardness than the bulk. • Laser cutting speed has a higher influence on surface integrity than laser power. Abstract: Nitinol alloys have been widely used in manufacturing of vascular stents due to the outstanding properties such as superelasticity, shape memory, and superior biocompatibility. Laser cutting is the dominant process for manufacturing Nitinol stents. Conventional laser cutting usually produces unsatisfactory surface integrity which has a significant detrimental impact on stent performance. Emerging as a competitive process, fiber laser with high beam quality is expected to produce much less thermal damage such as striation, dross, heat affected zone (HAZ), and recast layer. To understand the process capability of fiber laser cutting of Nitinol alloy, a design-of-experiment based laser cutting experiment was performed. The kerf geometry, roughness, topography, microstructure, and hardness were studied to better understand the nature of the HAZ and recast layer in fiber laser cutting. Moreover, effect size analysis was conducted to investigate the relationship between surface integrity and process parameters.

  7. Statistical characteristics of surface integrity by fiber laser cutting of Nitinol vascular stents

    Fu, C.H., E-mail: cfu5@crimson.ua.edu [Dept of Mechanical Engineering, The University of Alabama, Tuscaloosa, AL 35487 (United States); Liu, J.F. [Dept of Mechanical Engineering, The University of Alabama, Tuscaloosa, AL 35487 (United States); Guo, Andrew [Dept of Mechanical Engineering, The University of Alabama, Tuscaloosa, AL 35487 (United States); College of Arts and Science, Vanderbilt University, Nashville, TN 37235 (United States)

    2015-10-30

    Highlights: • Precision kerf with tight tolerance of Nitinol stents can be cut by fiber laser. • No HAZ in the subsurface was detected due to large grain size. • Recast layer has lower hardness than the bulk. • Laser cutting speed has a higher influence on surface integrity than laser power. Abstract: Nitinol alloys have been widely used in manufacturing of vascular stents due to the outstanding properties such as superelasticity, shape memory, and superior biocompatibility. Laser cutting is the dominant process for manufacturing Nitinol stents. Conventional laser cutting usually produces unsatisfactory surface integrity which has a significant detrimental impact on stent performance. Emerging as a competitive process, fiber laser with high beam quality is expected to produce much less thermal damage such as striation, dross, heat affected zone (HAZ), and recast layer. To understand the process capability of fiber laser cutting of Nitinol alloy, a design-of-experiment based laser cutting experiment was performed. The kerf geometry, roughness, topography, microstructure, and hardness were studied to better understand the nature of the HAZ and recast layer in fiber laser cutting. Moreover, effect size analysis was conducted to investigate the relationship between surface integrity and process parameters.

  8. Statistical Analysis of Wave Climate Data Using Mixed Distributions and Extreme Wave Prediction

    Wei Li

    2016-05-01

    The investigation of various aspects of the wave climate at a wave energy test site is essential for the development of reliable and efficient wave energy conversion technology. This paper presents studies of the wave climate based on nine years of wave observations from the 2005–2013 period measured with a wave measurement buoy at the Lysekil wave energy test site located off the west coast of Sweden. A detailed analysis of the wave statistics is presented to reveal the characteristics of the wave climate at this specific test site. The long-term extreme waves are estimated by applying the Peak over Threshold (POT) method to the measured wave data. The significant wave height and the maximum wave height at the test site for different return periods are also compared. In this study, a new approach using a mixed-distribution model is proposed to describe the long-term behavior of the significant wave height, and it shows an impressive goodness of fit to wave data from the test site. The mixed-distribution model is also applied to measured wave data from four other sites, providing an illustration of the general applicability of the proposed model. The methodologies used in this paper can be applied to general wave climate analysis of wave energy test sites to estimate extreme waves for the survivability assessment of wave energy converters, and to characterize the long-term wave climate to forecast the wave energy resource of the test sites and the energy production of the wave energy converters.
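
    The POT method named above can be sketched in a few lines: keep exceedances over a high threshold, fit a generalized Pareto distribution (here by the method of moments, a simpler estimator than the likelihood fits usually used), and extrapolate a return level. The wave record is synthetic, not the Lysekil buoy data:

```python
import numpy as np

# Illustrative Peak-over-Threshold sketch on a synthetic Hs record.
rng = np.random.default_rng(7)
hs = rng.weibull(1.5, size=9 * 2922) * 1.2      # ~9 yr of 3-hourly Hs, m
u = np.quantile(hs, 0.98)                       # threshold: 98th percentile
exc = hs[hs > u] - u                            # threshold exceedances

# Generalized Pareto fit by the method of moments.
m, v = exc.mean(), exc.var(ddof=1)
xi = 0.5 * (1 - m**2 / v)                       # shape estimate
sigma = 0.5 * m * (m**2 / v + 1)                # scale estimate

rate = exc.size / hs.size                       # exceedance probability
n_per_year = 2922                               # 3-hourly observations
T = 50                                          # return period, years
n_obs = T * n_per_year
h_T = u + sigma / xi * ((n_obs * rate) ** xi - 1)  # T-year return level
print(u, xi, sigma, h_T)
```

    The same exceedance machinery underlies the return-period comparisons in the record; real analyses typically also check threshold stability and use maximum-likelihood fits.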

  9. Uncertainty analysis of depth predictions from seismic reflection data using Bayesian statistics

    Michelioudakis, Dimitrios G.; Hobbs, Richard W.; Caiado, Camila C. S.

    2018-03-01

    Estimating the depths of target horizons from seismic reflection data is an important task in exploration geophysics. To constrain these depths we need a reliable and accurate velocity model. Here, we build an optimum 2D seismic reflection data processing flow focused on pre-stack deghosting filters and velocity model building, and apply Bayesian methods, including Gaussian process emulation and Bayesian History Matching (BHM), to estimate the uncertainties of the depths of key horizons near the borehole DSDP-258 located in the Mentelle Basin, southwest of Australia, and compare the results with the drilled core from that well. Following this strategy, the tie between the modelled and observed depths from the DSDP-258 core was in accordance with the ±2σ posterior credibility intervals, and predictions for depths to key horizons were made for the two new drill sites adjacent to the existing borehole of the area. The probabilistic analysis allowed us to generate multiple realizations of pre-stack depth migrated images; these can be directly used to better constrain interpretation and identify potential risk at drill sites. The method will be applied to constrain the drilling targets for the upcoming International Ocean Discovery Program (IODP), leg 369.

  10. Uncertainty analysis of depth predictions from seismic reflection data using Bayesian statistics

    Michelioudakis, Dimitrios G.; Hobbs, Richard W.; Caiado, Camila C. S.

    2018-06-01

    Estimating the depths of target horizons from seismic reflection data is an important task in exploration geophysics. To constrain these depths we need a reliable and accurate velocity model. Here, we build an optimum 2-D seismic reflection data processing flow focused on pre-stack deghosting filters and velocity model building and apply Bayesian methods, including Gaussian process emulation and Bayesian History Matching, to estimate the uncertainties of the depths of key horizons near the Deep Sea Drilling Project (DSDP) borehole 258 (DSDP-258) located in the Mentelle Basin, southwest of Australia, and compare the results with the drilled core from that well. Following this strategy, the tie between the modelled and observed depths from the DSDP-258 core was in accordance with the ±2σ posterior credibility intervals, and predictions for depths to key horizons were made for the two new drill sites, adjacent to the existing borehole of the area. The probabilistic analysis allowed us to generate multiple realizations of pre-stack depth migrated images; these can be directly used to better constrain interpretation and identify potential risk at drill sites. The method will be applied to constrain the drilling targets for the upcoming International Ocean Discovery Program, leg 369.

  11. Multivariate Statistics and Supervised Learning for Predictive Detection of Unintentional Islanding in Grid-Tied Solar PV Systems

    Shashank Vyas

    2016-01-01

    Integration of solar photovoltaic (PV) generation with power distribution networks leads to many operational challenges and complexities. Unintentional islanding is one of them, and is of rising concern given the steady increase in grid-connected PV power. This paper builds on an exploratory study of unintentional islanding on a modeled radial feeder having large PV penetration. Dynamic simulations, also run in real time, resulted in exploration of unique potential causes of creation of accidental islands. The resulting voltage and current data underwent dimensionality reduction using principal component analysis (PCA), which formed the basis for the application of Q statistic control charts for detecting the anomalous currents that could island the system. For reducing the false alarm rate of anomaly detection, Kullback-Leibler (K-L) divergence was applied to the principal component projections, which led to the conclusion that the Q statistic based approach alone is not reliable for detection of the symptoms liable to cause unintentional islanding. The obtained data were labeled and a K-nearest neighbor (K-NN) binomial classifier was then trained for identification and classification of potential islanding precursors from other power system transients. The three-phase short-circuit fault case was successfully identified as statistically different from islanding symptoms.
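
    The PCA/Q-statistic control chart at the core of this scheme can be sketched on toy data (random correlated measurements, not the feeder simulation):

```python
import numpy as np

# Toy sketch: project measurements onto leading principal components; the
# Q statistic (squared prediction error in the residual subspace) is
# charted against a control limit estimated from normal operation.
rng = np.random.default_rng(3)
normal = rng.normal(size=(500, 6)) @ rng.normal(size=(6, 6))  # correlated
mu, sd = normal.mean(0), normal.std(0)

U, S, Vt = np.linalg.svd((normal - mu) / sd, full_matrices=False)
k = 3                                   # retained principal components
P = Vt[:k].T                            # loading matrix (6 x k)

def q_stat(x):
    z = (x - mu) / sd
    resid = z - z @ P @ P.T             # part not explained by the k PCs
    return np.sum(resid**2, axis=-1)

limit = np.percentile(q_stat(normal), 99)   # empirical 99% control limit

# Inject a disturbance along a direction outside the retained subspace.
anomaly = normal[0] + 12 * sd * Vt[-1]
print(q_stat(anomaly), limit)
```

    As in the record, a sample whose residual energy exceeds the limit is flagged; the paper's point is that this alarm alone is too noisy, hence the follow-on K-L divergence screening and K-NN classification.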

  12. Prediction of failure enthalpy and reliability of irradiated fuel rod under reactivity-initiated accidents by means of statistical approach

    Nam, Cheol; Choi, Byeong Kwon; Jeong, Yong Hwan; Jung, Youn Ho

    2001-01-01

    During the last decade, the failure behavior of high-burnup fuel rods under RIA has been an extensive concern since observations of fuel rod failures at low enthalpy. Great importance is placed on failure prediction of fuel rods from the standpoint of licensing criteria and safety in extending burnup achievement. To address the issue, a statistics-based methodology is introduced to predict the failure probability of irradiated fuel rods. Based on RIA simulation results in the literature, a failure enthalpy correlation for irradiated fuel rods is constructed as a function of oxide thickness, fuel burnup, and pulse width. From the failure enthalpy correlation, a single damage parameter, equivalent enthalpy, is defined to reflect the effects of the three primary factors as well as peak fuel enthalpy. Moreover, the failure distribution function with equivalent enthalpy is derived by applying a two-parameter Weibull statistical model. Using these equations, a sensitivity analysis is carried out to estimate the effects of burnup, corrosion, peak fuel enthalpy, pulse width and the cladding materials used.
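
    The two-parameter Weibull failure distribution named above has a closed form. A minimal sketch, with shape and scale values that are hypothetical placeholders rather than the paper's fitted parameters:

```python
import numpy as np

# Two-parameter Weibull failure model: probability of rod failure as a
# function of equivalent enthalpy H_eq. beta and eta are invented here.
beta = 4.0     # shape (hypothetical)
eta = 150.0    # scale, cal/g (hypothetical)

def failure_probability(h_eq):
    """Cumulative Weibull failure probability at equivalent enthalpy h_eq."""
    return 1.0 - np.exp(-((np.asarray(h_eq, dtype=float) / eta) ** beta))

# Sensitivity-style sweep: failure probability rises steeply near eta.
for h in (50.0, 100.0, 150.0, 200.0):
    print(h, failure_probability(h))
```

    By construction the failure probability at the scale parameter is 1 - exp(-1), about 63%, and the shape parameter controls how sharply the curve rises, which is what a sensitivity analysis over burnup or corrosion would perturb.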

  13. Vitamin D and ferritin correlation with chronic neck pain using standard statistics and a novel artificial neural network prediction model.

    Eloqayli, Haytham; Al-Yousef, Ali; Jaradat, Raid

    2018-02-15

    Despite the high prevalence of chronic neck pain, there is limited consensus about the primary etiology, risk factors, diagnostic criteria and therapeutic outcome. Here, we aimed to determine whether ferritin and vitamin D are modifiable risk factors for chronic neck pain, using standard statistics and an artificial neural network (ANN). Fifty-four patients with chronic neck pain treated between February 2016 and August 2016 in King Abdullah University Hospital and 54 age-matched controls undergoing outpatient or minor procedures were enrolled. Patient and control demographic parameters, height, weight and a single measurement of serum vitamin D, vitamin B12, ferritin, calcium, phosphorus and zinc were obtained. An ANN prediction model was developed. The statistical analysis reveals that patients with chronic neck pain have significantly lower serum vitamin D and ferritin, and that an artificial neural network can be of future benefit in classification and prediction models for chronic neck pain. We hope this initial work will encourage a future larger cohort study addressing vitamin D and iron correction as modifiable factors and the application of artificial intelligence models in clinical practice.

  14. Direct integration of intensity-level data from Affymetrix and Illumina microarrays improves statistical power for robust reanalysis

    Turnbull Arran K

    2012-08-01

    Background: Affymetrix GeneChips and Illumina BeadArrays are the most widely used commercial single-channel gene expression microarrays. Public data repositories are an extremely valuable resource, providing array-derived gene expression measurements from many thousands of experiments. Unfortunately many of these studies are underpowered and it is desirable to improve power by combining data from more than one study; we sought to determine whether platform-specific bias precludes direct integration of probe intensity signals for combined reanalysis. Results: Using Affymetrix and Illumina data from the microarray quality control project, from our own clinical samples, and from additional publicly available datasets, we evaluated several approaches to directly integrate intensity-level expression data from the two platforms. After mapping probe sequences to Ensembl genes, we demonstrate that ComBat and cross-platform normalisation (XPN) significantly outperform mean-centering and distance-weighted discrimination (DWD) in terms of minimising inter-platform variance. In particular we observed that DWD, a popular method used in a number of previous studies, removed systematic bias at the expense of genuine biological variability, potentially reducing legitimate biological differences in integrated datasets. Conclusion: Normalised and batch-corrected intensity-level data from Affymetrix and Illumina microarrays can be directly combined to generate biologically meaningful results with improved statistical power for robust, integrated reanalysis.
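
    The simplest of the compared baselines, per-platform mean-centering, is easy to sketch on synthetic values (ComBat additionally models per-gene batch variances with empirical-Bayes shrinkage, which is omitted here):

```python
import numpy as np

# Toy illustration of per-platform mean-centering before pooling.
# Synthetic log-intensities, not MAQC data.
rng = np.random.default_rng(5)
genes, n_a, n_b = 200, 30, 30
signal = rng.normal(0, 1, size=(genes, 1))               # shared biology
affy = signal + 2.0 + rng.normal(0, 0.3, (genes, n_a))   # platform offset +2
illu = signal - 1.0 + rng.normal(0, 0.3, (genes, n_b))   # platform offset -1

def center(x):
    return x - x.mean(axis=1, keepdims=True)   # per-gene platform mean

before = np.mean((affy.mean(1) - illu.mean(1)) ** 2)   # inter-platform gap
combined = np.hstack([center(affy), center(illu)])     # pooled matrix
after = np.mean((center(affy).mean(1) - center(illu).mean(1)) ** 2)
print(before, after)
```

    Centering removes the additive platform offset per gene, which is exactly why the paper measures residual inter-platform variance to compare it against ComBat, XPN and DWD.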

  15. Predictive analysis of beer quality by correlating sensory evaluation with higher alcohol and ester production using multivariate statistics methods.

    Dong, Jian-Jun; Li, Qing-Liang; Yin, Hua; Zhong, Cheng; Hao, Jun-Guang; Yang, Pan-Fei; Tian, Yu-Hong; Jia, Shi-Ru

    2014-10-15

    Sensory evaluation is regarded as a necessary procedure to ensure a reproducible quality of beer. Meanwhile, high-throughput analytical methods provide a powerful tool to analyse various flavour compounds, such as higher alcohols and esters. In this study, the relationship between flavour compounds and sensory evaluation was established by non-linear models such as partial least squares (PLS), genetic algorithm back-propagation neural network (GA-BP) and support vector machine (SVM). It was shown that SVM with a radial basis function (RBF) kernel had better prediction accuracy for both the calibration set (94.3%) and the validation set (96.2%) than the other models. Relatively lower prediction abilities were observed for GA-BP (52.1%) and PLS (31.7%). In addition, the kernel function of SVM played an essential role in model training, as the prediction accuracy of SVM with a polynomial kernel function was 32.9%. As a powerful multivariate statistics method, SVM holds great potential to assess beer quality. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Statistical modelling coupled with LC-MS analysis to predict human upper intestinal absorption of phytochemical mixtures.

    Selby-Pham, Sophie N B; Howell, Kate S; Dunshea, Frank R; Ludbey, Joel; Lutz, Adrian; Bennett, Louise

    2018-04-15

    A diet rich in phytochemicals confers benefits for health by reducing the risk of chronic diseases via regulation of oxidative stress and inflammation (OSI). For optimal protective bio-efficacy, the time required for phytochemicals and their metabolites to reach maximal plasma concentrations (Tmax) should be synchronised with the time of increased OSI. A statistical model has been reported to predict the Tmax of individual phytochemicals based on molecular mass and lipophilicity. We report the application of the model for predicting the absorption profile of an uncharacterised phytochemical mixture, herein referred to as the 'functional fingerprint'. First, chemical profiles of phytochemical extracts were acquired using liquid chromatography mass spectrometry (LC-MS), then the molecular features of the respective components were used to predict their plasma absorption maxima, based on molecular mass and lipophilicity. This method of 'functional fingerprinting' of plant extracts represents a novel tool for understanding and optimising the health efficacy of plant extracts. Copyright © 2017 Elsevier Ltd. All rights reserved.
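
    The prediction step has the shape of a simple two-predictor model: each LC-MS feature contributes a Tmax estimated from its mass and lipophilicity. The coefficients and feature values below are invented placeholders, not the published fit:

```python
import numpy as np

# Hedged sketch: predict plasma T_max for each LC-MS feature from
# molecular mass and logP, then collect the per-compound predictions
# into a "functional fingerprint". Coefficients are hypothetical.
a, b, c = 30.0, 0.05, 10.0          # invented: minutes, min/Da, min/logP

def predict_tmax(mass_da, logp):
    return a + b * mass_da + c * logp

# Hypothetical LC-MS features: (molecular mass in Da, logP)
features = [(180.2, 1.2), (354.3, 0.4), (610.5, -0.9)]
tmax = [predict_tmax(m, lp) for m, lp in features]
print(tmax)
```

    In the reported workflow these per-feature Tmax values are overlaid to describe when the mixture as a whole is likely to peak in plasma.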

  17. Using integrated multivariate statistics to assess the hydrochemistry of surface water quality, Lake Taihu basin, China

    Xiangyu Mu

    2014-09-01

    Natural factors and anthropogenic activities both contribute dissolved chemical loads to lakes and streams. Mineral solubility, geomorphology of the drainage basin, source strengths and climate all contribute to concentrations and their variability. Urbanization and agricultural waste-water particularly lead to aquatic environmental degradation. Major contaminant sources and controls on water quality can be assessed by analyzing the variability in proportions of major and minor solutes in water, coupled with multivariate statistical methods. The demand for freshwater needed for increasing crop production, population and industrialization occurs almost everywhere in China, and these conflicting needs have led to widespread water contamination. Because of heavy nutrient loadings from all of these sources, Lake Taihu (eastern China) notably suffers periodic hyper-eutrophication and drinking water deterioration, which has led to shortages of freshwater for the City of Wuxi and other nearby cities. This lake, the third largest freshwater body in China, has historically been considered a cultural treasure of China, and has supported long-term fisheries. There is increasing pressure to remediate the present contamination, which compromises both aquaculture and the prior economic base centered on tourism. However, remediation cannot be effectively done without first characterizing the broad nature of the non-point source pollution. To this end, we investigated the hydrochemical setting of Lake Taihu to determine how different land use types influence the variability of surface water chemistry in different water sources to the lake. We found that waters broadly show wide variability, ranging from a calcium-magnesium-bicarbonate hydrochemical facies type to a mixed sodium-sulfate-chloride type. Principal components analysis produced three principal components that explained 78% of the variance in the water quality and reflect three major types of water

  18. Application of Different Statistical Techniques in Integrated Logistics Support of the International Space Station Alpha

    Sepehry-Fard, F.; Coulthard, Maurice H.

    1995-01-01

    The process used to predict the values of maintenance time-dependent variable parameters, such as mean time between failures (MTBF), over time must be one that will not in turn introduce uncontrolled deviation into the results of the ILS analysis, such as the life cycle cost spares calculation. A minor deviation in the values of maintenance time-dependent variable parameters such as MTBF over time will have a significant impact on the logistics resources demands, International Space Station availability, and maintenance support costs. It is the objective of this report to identify the magnitude of the expected enhancement in the accuracy of the results for the International Space Station reliability and maintainability data packages by providing examples. These examples partially portray the necessary information by evaluating the impact of the said enhancements on the life cycle cost and the availability of the International Space Station.

  19. The integration of weighted human gene association networks based on link prediction.

    Yang, Jian; Yang, Tinghong; Wu, Duzhi; Lin, Limei; Yang, Fan; Zhao, Jing

    2017-01-31

    Physical and functional interplays between genes or proteins have important biological meaning for cellular functions. Some efforts have been made to construct weighted gene association meta-networks by integrating multiple biological resources, where the weight indicates the confidence of the interaction. However, these existing human gene association networks share only quite limited overlapping interactions, suggesting their incompleteness and noise. Here we proposed a workflow to construct a weighted human gene association network using information from six existing networks, including two weighted specific PPI networks and four gene association meta-networks. We applied a link prediction algorithm to predict possible missing links of the networks, a cross-validation approach to refine each network, and finally integrated the refined networks to get the final integrated network. The common information among the refined networks increases notably, suggesting their higher reliability. Our final integrated network contains many more links than most of the original networks, while its links still keep high functional relevance. Used as the background network in a case study of disease gene prediction, the final integrated network presents good performance, implying its reliability and application significance. Our workflow could be insightful for integrating and refining existing gene association data.
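
    A common way to score candidate missing links, sketched here on a toy graph (the record does not specify which index its algorithm uses; the resource-allocation index below is one standard choice):

```python
# Link-prediction sketch: score a candidate edge (u, v) by the
# resource-allocation index, the sum over shared neighbours z of
# 1/degree(z). High-scoring absent pairs become predicted links.
from collections import defaultdict

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E")]
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def ra_index(u, v):
    """Resource-allocation score for the candidate edge (u, v)."""
    return sum(1.0 / len(adj[z]) for z in adj[u] & adj[v])

score_ad = ra_index("A", "D")   # shares neighbours B and C
score_ae = ra_index("A", "E")   # shares no neighbours
print(score_ad, score_ae)
```

    Pairs like (A, D), reachable through several low-degree intermediaries, outscore pairs with no shared neighbours, which is the signal used to fill in likely missing edges before the networks are integrated.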

  20. Protein logic: a statistical mechanical study of signal integration at the single-molecule level.

    de Ronde, Wiet; Rein ten Wolde, Pieter; Mugler, Andrew

    2012-09-05

    Information processing and decision-making is based upon logic operations, which in cellular networks has been well characterized at the level of transcription. In recent years, however, both experimentalists and theorists have begun to appreciate that cellular decision-making can also be performed at the level of a single protein, giving rise to the notion of protein logic. Here we systematically explore protein logic using a well-known statistical mechanical model. As an example system, we focus on receptors that bind either one or two ligands, and their associated dimers. Notably, we find that a single heterodimer can realize any of the 16 possible logic gates, including the XOR gate, by variation of biochemical parameters. We then introduce what to our knowledge is a novel idea: that a set of receptors with fixed parameters can encode functionally unique logic gates simply by forming different dimeric combinations. An exhaustive search reveals that the simplest set of receptors (two single-ligand receptors and one double-ligand receptor) can realize several different groups of three unique gates, a result for which the parametric analysis of single receptors and dimers provides a clear interpretation. Both results underscore the surprising functional freedom readily available to cells at the single-protein level. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
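
    The statistical mechanical picture can be made concrete with Boltzmann weights over receptor occupancy states. The parameters below are invented for illustration, not the paper's fitted values:

```python
import numpy as np

# Toy heterodimer binding ligands 1 and 2 with association constants
# K1, K2 and cooperativity w. Each occupancy state carries an activity;
# tuning (K1, K2, w, activities) selects the logic gate computed on the
# pair of ligand concentrations. All parameter values are hypothetical.
def response(c1, c2, K1, K2, w, act):
    # Boltzmann weights of states: empty, 1-bound, 2-bound, both-bound
    weights = np.array([1.0, K1 * c1, K2 * c2, w * K1 * c1 * K2 * c2])
    return float(weights @ act / weights.sum())

# AND-like gate: only the doubly bound state is active, with strong
# positive cooperativity so the both-bound state dominates when c1, c2 > 0.
act_and = np.array([0.0, 0.0, 0.0, 1.0])
table = [response(c1, c2, K1=10, K2=10, w=100, act=act_and)
         for c1 in (0.0, 1.0) for c2 in (0.0, 1.0)]
print([round(x, 3) for x in table])
```

    Reassigning the activity vector (e.g. making only the singly bound states active) converts the same dimer into a different gate, which is the parametric freedom the study enumerates across all 16 two-input gates.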

  1. Integrating Symbolic and Statistical Methods for Testing Intelligent Systems Applications to Machine Learning and Computer Vision

    Jha, Sumit Kumar [University of Central Florida, Orlando; Pullum, Laura L [ORNL; Ramanathan, Arvind [ORNL

    2016-01-01

    Embedded intelligent systems, ranging from tiny implantable biomedical devices to large swarms of autonomous unmanned aerial systems, are becoming pervasive in our daily lives. While we depend on the flawless functioning of such intelligent systems, and often take their behavioral correctness and safety for granted, it is notoriously difficult to generate test cases that expose subtle errors in the implementations of machine learning algorithms. Hence, the validation of intelligent systems is usually achieved by studying their behavior on representative data sets, using methods such as cross-validation and bootstrapping. In this paper, we present a new testing methodology for studying the correctness of intelligent systems. Our approach uses symbolic decision procedures coupled with statistical hypothesis testing. We also use our algorithm to analyze the robustness of a human detection algorithm built using the OpenCV open-source computer vision library. We show that the human detection implementation can fail to detect humans in perturbed video frames even when the perturbations are so small that the corresponding frames look identical to the naked eye.

  2. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-01-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0–0.10 m, or 0–0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis.
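
    Of the two spatial interpolators the review identifies as most widely used, inverse distance weighting is simple enough to sketch directly. The sample coordinates and concentrations below are hypothetical; ordinary kriging would additionally require a fitted variogram.

```python
import numpy as np

def idw_interpolate(xy, values, query, power=2.0, eps=1e-12):
    """Inverse distance weighted interpolation at a single query point.

    xy     : (n, 2) array of sample coordinates
    values : (n,) array of measured concentrations
    query  : (2,) coordinates of the unsampled location
    power  : distance-decay exponent (2 is the common choice)
    """
    d = np.linalg.norm(xy - query, axis=1)
    if np.any(d < eps):                  # query coincides with a sample point
        return float(values[np.argmin(d)])
    w = 1.0 / d**power
    return float(np.sum(w * values) / np.sum(w))

# Hypothetical heavy-metal concentrations (mg/kg) at four sample sites
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
conc  = np.array([10.0, 20.0, 30.0, 40.0])
print(idw_interpolate(sites, conc, np.array([0.5, 0.5])))  # → 25.0 (by symmetry)
```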

  3. Analysis Code - Data Analysis in 'Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications' (LMSMIPNFA) v. 1.0

    2018-03-19

    R code that performs the analysis of a data set presented in the paper ‘Leveraging Multiple Statistical Methods for Inverse Prediction in Nuclear Forensics Applications’ by Lewis, J., Zhang, A., Anderson-Cook, C. It provides functions for doing inverse predictions in this setting using several different statistical methods. The data set is a publicly available data set from a historical Plutonium production experiment.

  4. C-terminal motif prediction in eukaryotic proteomes using comparative genomics and statistical over-representation across protein families

    Cutler Sean R

    2007-06-01

    Abstract Background The carboxy termini of proteins are a frequent site of activity for a variety of biologically important functions, ranging from post-translational modification to protein targeting. Several short peptide motifs involved in protein sorting roles and dependent upon their proximity to the C-terminus for proper function have already been characterized. As a limited number of such motifs have been identified, the potential exists for genome-wide statistical analysis and comparative genomics to reveal novel peptide signatures functioning in a C-terminal dependent manner. We have applied a novel methodology to the prediction of C-terminal-anchored peptide motifs involving a simple z-statistic and several techniques for improving the signal-to-noise ratio. Results We examined the statistical over-representation of position-specific C-terminal tripeptides in 7 eukaryotic proteomes. Sequence randomization models and simple-sequence masking were applied to the successful reduction of background noise. Similarly, as C-terminal homology among members of large protein families may artificially inflate tripeptide counts in an irrelevant and obfuscating manner, gene-family clustering was performed prior to the analysis in order to assess tripeptide over-representation across protein families as opposed to across all proteins. Finally, comparative genomics was used to identify tripeptides significantly occurring in multiple species. This approach has been able to predict, to our knowledge, all C-terminally anchored targeting motifs present in the literature. These include the PTS1 peroxisomal targeting signal (SKL*), the ER-retention signal (K/HDEL*), the ER-retrieval signal for membrane bound proteins (KKxx*), the prenylation signal (CC*) and the CaaX box prenylation motif. In addition to a high statistical over-representation of these known motifs, a collection of significant tripeptides with a high propensity for biological function exists.
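
    The z-statistic at the core of the method can be sketched as follows, using a simple residue-independence background model in place of the paper's randomization and masking steps; the toy sequences and the `cterm_tripeptide_z` helper are illustrative.

```python
import math
from collections import Counter

def cterm_tripeptide_z(seqs):
    """Z-statistic for over-representation of C-terminal tripeptides.

    Background probability of each tripeptide is estimated from overall
    residue frequencies (an independence assumption; the published method
    uses more elaborate randomization models and masking).
    """
    n = len(seqs)
    aa_counts = Counter(aa for s in seqs for aa in s)
    total_aa = sum(aa_counts.values())
    freq = {aa: c / total_aa for aa, c in aa_counts.items()}

    obs = Counter(s[-3:] for s in seqs if len(s) >= 3)
    z = {}
    for tri, k in obs.items():
        p = freq[tri[0]] * freq[tri[1]] * freq[tri[2]]
        mu, sd = n * p, math.sqrt(n * p * (1 - p))  # binomial mean / sd
        z[tri] = (k - mu) / sd if sd > 0 else 0.0
    return z

# Toy proteome in which the peroxisomal-style 'SKL' ending is enriched
toy = ["MAGSKL", "MKTSKL", "MLVSKL", "MQWERT", "MNPLKD"]
scores = cterm_tripeptide_z(toy)
print(round(scores["SKL"], 1))   # z-score for the enriched SKL ending
```

    On real proteomes, single-occurrence tripeptides with tiny background probabilities also score high under this naive model, which is exactly why the paper adds randomization, masking, gene-family clustering, and cross-species filtering.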

  5. A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data

    Olson James M

    2006-04-01

    Abstract Background Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip® that uses multiple oligonucleotide probes (i.e., a probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip® was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip® gene expression array data. Results We developed a two-step approach to predict alternative splicing from GeneChip® data. First, we clustered the probes from a probe set into pseudo-exons based on similarity of probe intensities and physical adjacency. A pseudo-exon is defined as a sequence in the gene within which multiple probes have comparable probe intensity values. Second, for each pseudo-exon, we assessed the statistical significance of the difference in probe intensity between two groups of samples. Differentially expressed pseudo-exons are predicted to be alternatively spliced. We applied our method to empirical data generated from GeneChip® Hu6800 arrays, which include 7129 probe sets and twenty probes per probe set. The dataset consists of sixty-nine medulloblastoma samples (27 metastatic and 42 non-metastatic) and four cerebellum samples as normal controls. We predicted that 577 genes would be alternatively spliced when we compared normal cerebellum samples to medulloblastomas, and predicted that thirteen genes would be alternatively spliced when we compared metastatic
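
    The two-step approach can be sketched as follows; the clustering rule, the `gap` threshold, and the simulated log-intensities are simplified assumptions standing in for the paper's actual criteria.

```python
import numpy as np
from scipy import stats

def pseudo_exons(probe_means, gap=1.0):
    """Step 1: group physically adjacent probes whose mean intensities are
    within `gap` of the previous probe into pseudo-exons."""
    groups, current = [], [0]
    for i in range(1, len(probe_means)):
        if abs(probe_means[i] - probe_means[i - 1]) <= gap:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups

def splice_variant_pvalues(group_a, group_b, gap=1.0):
    """Step 2: per pseudo-exon, t-test the mean probe intensity between the
    two sample groups (rows are samples, columns are probes along the gene)."""
    exons = pseudo_exons(np.vstack([group_a, group_b]).mean(axis=0), gap)
    pvals = [stats.ttest_ind(group_a[:, idx].mean(axis=1),
                             group_b[:, idx].mean(axis=1)).pvalue
             for idx in exons]
    return exons, pvals

# Simulated log-intensities: probes 0-4 shared, probes 5-9 spliced out in B
rng = np.random.default_rng(0)
a = np.hstack([rng.normal(10, 0.2, (6, 5)), rng.normal(8, 0.2, (6, 5))])
b = np.hstack([rng.normal(10, 0.2, (6, 5)), rng.normal(4, 0.2, (6, 5))])
exons, pvals = splice_variant_pvalues(a, b)
print(exons)            # two pseudo-exons: probes 0-4 and probes 5-9
print(pvals[1] < 0.01)  # the second pseudo-exon differs between groups
```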

  6. Integrating Statistical and Expert Knowledge to Develop Phenoregions for the Continental United States

    Betancourt, J. L.; Biondi, F.; Bradford, J. B.; Foster, J. R.; Henebry, G. M.; Post, E.; Koenig, W.; de Beurs, K.; Hoffman, F. M.; Kumar, J.; Hargrove, W. W.; Norman, S. P.; Brooks, B. G.

    2016-12-01

    Vegetated ecosystems exhibit unique phenological behavior over the course of a year, suggesting that remotely sensed land surface phenology may be useful for characterizing land cover and ecoregions. However, phenology is also strongly influenced by temperature and water stress; insect, fire, and weather disturbances; and climate change over seasonal, interannual, decadal and longer time scales. Normalized difference vegetation index (NDVI), a remotely sensed measure of greenness, provides a useful proxy for land surface phenology. We used NDVI for the conterminous United States (CONUS) derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) every eight days at 250 m resolution for the period 2000-2015 to develop phenological signatures of emergent ecological regimes called phenoregions. We employed a "Big Data" classification approach on a supercomputer, specifically applying an unsupervised data mining technique, to this large collection of NDVI measurements to develop annual maps of phenoregions. This technique produces a prescribed number of prototypical phenological states to which every location belongs in any year. To reduce the impact of short-term disturbances, we derived a single map of the mode of annual phenological states for the CONUS, assigning each map cell to the state with the largest integrated NDVI in cases where multiple states tie for the highest frequency of occurrence. Since the data mining technique is unsupervised, individual phenoregions are not associated with an ecologically understandable label. To add automated supervision to the process, we applied the method of Mapcurves, developed by Hargrove and Hoffman, to associate individual phenoregions with labeled polygons in expert-derived maps of biomes, land cover, and ecoregions. We will present the phenoregions methodology and resulting maps for the CONUS, describe the "label-stealing" technique for ascribing biome characteristics to phenoregions, and introduce a new polar

  7. Computational Performance Optimisation for Statistical Analysis of the Effect of Nano-CMOS Variability on Integrated Circuits

    Zheng Xie

    2013-01-01

    The intrinsic variability of nanoscale VLSI technology must be taken into account when analyzing circuit designs to predict likely yield. Monte Carlo (MC)- and quasi-MC (QMC)-based statistical techniques do this by analysing many randomised or quasirandomised copies of circuits. The randomisation must model forms of variability that occur in nano-CMOS technology, including “atomistic” effects without intradie correlation and effects with intradie correlation between neighbouring devices. A major problem is the computational cost of carrying out sufficient analyses to produce statistically reliable results. The use of principal components analysis, behavioural modeling, and an implementation of “Statistical Blockade” (SB) is shown to be capable of achieving significant reduction in the computational costs. A computation time reduction of 98.7% was achieved for a commonly used asynchronous circuit element. Replacing MC by QMC analysis can achieve further computation reduction, and this is illustrated for more complex circuits, with the results being compared with those of transistor-level simulations. The “yield prediction” analysis of SRAM arrays is taken as a case study, where the arrays contain up to 1536 transistors modelled using parameters appropriate to 35 nm technology. It is reported that savings of up to 99.85% in computation time were obtained.
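
    The baseline that Statistical Blockade accelerates is plain MC yield estimation, which can be sketched as follows; the delay model and variation parameters are hypothetical stand-ins for a transistor-level simulation. SB would add a cheap classifier that discards samples predicted to lie safely within spec, so the expensive simulator runs only on candidate tail samples.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_delay(vth):
    """Stand-in for an expensive transistor-level simulation: gate delay as
    a hypothetical nonlinear function of threshold-voltage variation."""
    return 1.0 + 2.5 * vth**2 + 0.3 * vth

def mc_yield(n, spec=1.2):
    """Fraction of randomised circuit copies meeting the delay spec."""
    vth = rng.normal(0.0, 0.2, n)   # "atomistic" Vth variation, uncorrelated
    return float(np.mean(simulate_delay(vth) <= spec))

print(f"estimated yield: {mc_yield(100_000):.3f}")
```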

  8. A statistical method for predicting sound absorbing property of porous metal materials by using quartet structure generation set

    Guan, Dong; Wu, Jiu Hui; Jing, Li

    2015-01-01

    Highlights: • A random internal morphology and structure generation-growth method, termed the quartet structure generation set (QSGS), has been utilized, based on stochastic cluster growth theory, for numerically generating the various microstructures of porous metal materials. • Effects of different parameters such as thickness and porosity on the sound absorption performance of the generated structures are studied by the present method, and the obtained results are validated by an empirical model as well. • This method could be utilized to guide the design and fabrication of sound-absorbing porous metal materials. - Abstract: In this paper, a statistical method for predicting the sound absorption properties of porous metal materials is presented. To reflect the stochastic distribution characteristics of porous metal materials, a random internal morphology and structure generation-growth method, termed the quartet structure generation set (QSGS), has been utilized based on stochastic cluster growth theory for numerically generating the various microstructures of porous metal materials. Then, by using the transfer-function approach along with the QSGS tool, we investigate the sound absorbing performance of porous metal materials with complex stochastic geometries. The statistical method has been validated by the good agreement among the numerical results for metal rubber from this method, a previous empirical model, and the corresponding experimental data. Furthermore, the effects of different parameters such as thickness and porosity on sound absorption performance of the generated structures are studied by the present method, and the obtained results are validated by an empirical model as well. Therefore, the present method is a reliable and robust method for predicting the sound absorption performance of porous metal materials, and could be utilized to guide the design and fabrication of sound-absorbing porous metal materials.

  9. The Abdominal Aortic Aneurysm Statistically Corrected Operative Risk Evaluation (AAA SCORE) for predicting mortality after open and endovascular interventions.

    Ambler, Graeme K; Gohel, Manjit S; Mitchell, David C; Loftus, Ian M; Boyle, Jonathan R

    2015-01-01

    Accurate adjustment of surgical outcome data for risk is vital in an era of surgeon-level reporting. Current risk prediction models for abdominal aortic aneurysm (AAA) repair are suboptimal. We aimed to develop a reliable risk model for in-hospital mortality after intervention for AAA, using rigorous contemporary statistical techniques to handle missing data. Using data collected during a 15-month period in the United Kingdom National Vascular Database, we applied multiple imputation methodology together with stepwise model selection to generate preoperative and perioperative models of in-hospital mortality after AAA repair, using two thirds of the available data. Model performance was then assessed on the remaining third of the data by receiver operating characteristic curve analysis and compared with existing risk prediction models. Model calibration was assessed by Hosmer-Lemeshow analysis. A total of 8088 AAA repair operations were recorded in the National Vascular Database during the study period, of which 5870 (72.6%) were elective procedures. Both preoperative and perioperative models showed excellent discrimination, with areas under the receiver operating characteristic curve of .89 and .92, respectively. This was significantly better than any of the existing models (areas under the receiver operating characteristic curve for the best comparator model, .84 and .88). We have developed reliable models for predicting in-hospital mortality after AAA repair. These models were carefully developed with rigorous statistical methodology and significantly outperform existing methods for both elective cases and overall AAA mortality. These models will be invaluable for both preoperative patient counseling and accurate risk adjustment of published outcome data. Copyright © 2015 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.

  10. Predicting adverse drug reaction profiles by integrating protein interaction networks with drug structures.

    Huang, Liang-Chin; Wu, Xiaogang; Chen, Jake Y

    2013-01-01

    The prediction of adverse drug reactions (ADRs) has become increasingly important, due to the rising concern on serious ADRs that can cause drugs to fail to reach or stay in the market. We proposed a framework for predicting ADR profiles by integrating protein-protein interaction (PPI) networks with drug structures. We compared ADR prediction performances over 18 ADR categories through four feature groups-only drug targets, drug targets with PPI networks, drug structures, and drug targets with PPI networks plus drug structures. The results showed that the integration of PPI networks and drug structures can significantly improve the ADR prediction performance. The median AUC values for the four groups were 0.59, 0.61, 0.65, and 0.70. We used the protein features in the best two models, "Cardiac disorders" (median-AUC: 0.82) and "Psychiatric disorders" (median-AUC: 0.76), to build ADR-specific PPI networks with literature supports. For validation, we examined 30 drugs withdrawn from the U.S. market to see if our approach can predict their ADR profiles and explain why they were withdrawn. Except for three drugs having ADRs in the categories we did not predict, 25 out of 27 withdrawn drugs (92.6%) having severe ADRs were successfully predicted by our approach. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Predicting cycle time distributions for integrated processing workstations : an aggregate modeling approach

    Veeger, C.P.L.; Etman, L.F.P.; Lefeber, A.A.J.; Adan, I.J.B.F.; Herk, van J.; Rooda, J.E.

    2011-01-01

    To predict cycle time distributions of integrated processing workstations, detailed simulation models are almost exclusively used; these models require considerable development and maintenance effort. As an alternative, we propose an aggregate model that is a lumped-parameter representation of the

  12. Predicting Examination Performance Using an Expanded Integrated Hierarchical Model of Test Emotions and Achievement Goals

    Putwain, Dave; Deveney, Carolyn

    2009-01-01

    The aim of this study was to examine an expanded integrative hierarchical model of test emotions and achievement goal orientations in predicting the examination performance of undergraduate students. Achievement goals were theorised as mediating the relationship between test emotions and performance. 120 undergraduate students completed…

  13. COMPUTING THERAPY FOR PRECISION MEDICINE: COLLABORATIVE FILTERING INTEGRATES AND PREDICTS MULTI-ENTITY INTERACTIONS.

    Regenbogen, Sam; Wilkins, Angela D; Lichtarge, Olivier

    2016-01-01

    Biomedicine produces copious information it cannot fully exploit. Specifically, there is considerable need to integrate knowledge from disparate studies to discover connections across domains. Here, we used a Collaborative Filtering approach, inspired by online recommendation algorithms, in which non-negative matrix factorization (NMF) predicts interactions among chemicals, genes, and diseases only from pairwise information about their interactions. Our approach, applied to matrices derived from the Comparative Toxicogenomics Database, successfully recovered Chemical-Disease, Chemical-Gene, and Disease-Gene networks in 10-fold cross-validation experiments. Additionally, we could predict each of these interaction matrices from the other two. Integrating all three CTD interaction matrices with NMF led to good predictions of STRING, an independent, external network of protein-protein interactions. Finally, this approach could integrate the CTD and STRING interaction data to improve Chemical-Gene cross-validation performance significantly, and, in a time-stamped study, it predicted information added to CTD after a given date, using only data prior to that date. We conclude that collaborative filtering can integrate information across multiple types of biological entities, and that as a first step towards precision medicine it can compute drug repurposing hypotheses.
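
    The core NMF step can be sketched with the classic multiplicative updates on a toy interaction matrix; the matrix, rank, and held-out entry below are illustrative, not CTD data.

```python
import numpy as np

def nmf(X, rank, iters=500, seed=0):
    """Non-negative matrix factorization X ≈ W @ H via Lee-Seung
    multiplicative updates; all factors stay non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + 0.1
    H = rng.random((rank, X.shape[1])) + 0.1
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy chemical-gene interaction matrix; entry (3, 3) is treated as unobserved
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
W, H = nmf(X, rank=2)
pred = W @ H
print(round(float(pred[3, 3]), 2))  # held-out cell pulled well above zero
```

    Because row 3 resembles row 2, the low-rank reconstruction predicts a positive score for the unobserved pair, which is exactly the recommendation-style inference the abstract describes.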

  14. Factors Influencing College Women's Contraceptive Behavior: An Application of the Integrative Model of Behavioral Prediction

    Sutton, Jazmyne A.; Walsh-Buhi, Eric R.

    2017-01-01

    Objective: This study investigated variables within the Integrative Model of Behavioral Prediction (IMBP) as well as differences across socioeconomic status (SES) levels within the context of inconsistent contraceptive use among college women. Participants: A nonprobability sample of 515 female college students completed an Internet-based survey…

  15. Testing Predictive Models of Technology Integration in Mexico and the United States

    Velazquez, Cesareo Morales

    2008-01-01

    Data from Mexico City, Mexico (N = 978) and from Texas, USA (N = 932) were used to test the predictive validity of the teacher professional development component of the Will, Skill, Tool Model of Technology Integration in a cross-cultural context. Structural equation modeling (SEM) was used to test the model. Analyses of these data yielded…

  16. Predicting Elementary Education Candidates' Technology Integration during Their Field Placement Instruction.

    Negishi, Meiko; Elder, Anastasia D.; Hamil, J. Burnette; Mzoughi, Taha

    A growing concern in teacher education programs is technology training. Research confirms that training positively affects perservice teachers' attitudes and technology proficiency. However, little is known about the kinds of factors that may predict preservice teachers' integration of technology into their own instruction. The goal of this study…

  17. Predictive statistical mechanics

    Jaynes, E.T.

    1986-01-01

    This paper tries to explain the original motivation in quantum theory, the formalism that evolved from it, and some recent applications. The difficulty in finding a rational interpretation of quantum theory is examined. The paper presents and tries to solve the technical problem of how to use probability theory to help one do plausible reasoning in situations where, because of incomplete information, one cannot use deductive reasoning. Bayesian inference and the mass of Saturn are discussed and a generalized inverse problem is presented. The author examines the maximum entropy principle. Applications are discussed and include: spectrum analysis, image reconstruction, noisy data, and medical tomography
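
    The maximum entropy principle the paper examines has a classic worked example, Jaynes' loaded-die problem: given only the observed mean of a die, the least-biased distribution is exponential in the face value, with the multiplier fixed by the constraint. A sketch:

```python
import numpy as np
from scipy.optimize import brentq

def maxent_die(mean):
    """Maximum-entropy distribution over die faces 1..6 given only the mean:
    p_k ∝ exp(-lam * k), with lam solved to satisfy the mean constraint."""
    k = np.arange(1, 7)

    def mean_of(lam):
        w = np.exp(-lam * k)
        return float(np.sum(k * w) / np.sum(w))

    # mean_of is monotone decreasing in lam, so a root bracket suffices
    lam = brentq(lambda l: mean_of(l) - mean, -50.0, 50.0)
    w = np.exp(-lam * k)
    return w / w.sum()

p = maxent_die(4.5)   # Jaynes' example: observed mean 4.5 instead of fair 3.5
print(np.round(p, 3))
```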

  18. Integrating Crop Growth Models with Whole Genome Prediction through Approximate Bayesian Computation.

    Frank Technow

    Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (G×E) continues to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the impact of functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of G×E and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with synthetic data sets. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising and novel approach to improving prediction accuracy for some of the most challenging scenarios in plant breeding and applied genetics.
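
    The ABC step can be sketched with simple rejection sampling on a toy "crop growth model"; the model form, prior, environments, and tolerance below are illustrative assumptions, far simpler than a real CGM.

```python
import numpy as np

rng = np.random.default_rng(1)

def crop_model(effect, env):
    """Toy stand-in for a crop growth model: yield responds nonlinearly to a
    genetic effect and an environment covariate (both hypothetical)."""
    return 100.0 + 10.0 * effect * env - 2.0 * effect**2

def abc_rejection(observed, env, n_draws=200_000, eps=1.0):
    """ABC rejection: draw effects from the prior, push them through the
    mechanistic model, keep draws whose simulated yields match the data."""
    prior = rng.normal(0.0, 2.0, n_draws)                  # prior on effect
    sims = np.array([crop_model(prior, e) for e in env])   # (n_env, n_draws)
    dist = np.abs(sims - observed[:, None]).mean(axis=0)
    return prior[dist < eps]

env = np.array([0.5, 1.0, 1.5])
obs = crop_model(1.5, env)        # synthetic "data" from a true effect of 1.5
posterior = abc_rejection(obs, env)
print(round(float(posterior.mean()), 2))  # concentrates near the true effect
```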

  19. Structural integrity of frontostriatal connections predicts longitudinal changes in self-esteem.

    Chavez, Robert S; Heatherton, Todd F

    2017-06-01

    Diverse neurological and psychiatric conditions are marked by a diminished sense of positive self-regard, and reductions in self-esteem are associated with risk for these disorders. Recent evidence has shown that the connectivity of frontostriatal circuitry reflects individual differences in self-esteem. However, it remains an open question as to whether the integrity of these connections can predict self-esteem changes over larger timescales. Using diffusion magnetic resonance imaging and probabilistic tractography, we demonstrate that the integrity of white matter pathways linking the medial prefrontal cortex to the ventral striatum predicts changes in self-esteem 8 months after initial scanning in a sample of 30 young adults. Individuals with greater integrity of this pathway during the scanning session at Time 1 showed increased levels of self-esteem at follow-up, whereas individuals with lower integrity showed stifled or decreased levels of self-esteem. These results provide evidence that frontostriatal white matter integrity predicts the trajectory of self-esteem development in early adulthood, which may contribute to blunted levels of positive self-regard seen in multiple psychiatric conditions, including depression and anxiety.

  20. Integrative approaches to the prediction of protein functions based on the feature selection

    Lee Hyunju

    2009-12-01

    Abstract Background Protein function prediction has been one of the most important issues in functional genomics. With the current availability of various genomic data sets, many researchers have attempted to develop integration models that combine all available genomic data for protein function prediction. These efforts have resulted in the improvement of prediction quality and the extension of prediction coverage. However, it has also been observed that integrating more data sources does not always increase the prediction quality. Therefore, selecting data sources that highly contribute to the protein function prediction has become an important issue. Results We present systematic feature selection methods that assess the contribution of genome-wide data sets to predict protein functions and then investigate the relationship between genomic data sources and protein functions. In this study, we use ten different genomic data sources in Mus musculus, including protein domains, protein-protein interactions, gene expressions, phenotype ontology, phylogenetic profiles and disease data sources to predict protein functions that are labelled with Gene Ontology (GO) terms. We then apply two approaches to feature selection: exhaustive search feature selection using a kernel based logistic regression (KLR), and a kernel based L1-norm regularized logistic regression (KL1LR). In the first approach, we exhaustively measure the contribution of each data set for each function based on its prediction quality. In the second approach, we use the estimated coefficients of features as measures of contribution of data sources. Our results show that the proposed methods improve the prediction quality compared to the full integration of all data sources and other filter-based feature selection methods. We also show that contributing data sources can differ depending on the protein function. Furthermore, we observe that highly contributing data sets can be similar among

  1. Case study on prediction of remaining methane potential of landfilled municipal solid waste by statistical analysis of waste composition data.

    Sel, İlker; Çakmakcı, Mehmet; Özkaya, Bestamin; Suphi Altan, H

    2016-10-01

    The main objective of this study was to develop a statistical model for easier and faster Biochemical Methane Potential (BMP) prediction of landfilled municipal solid waste by analyzing the waste composition of excavated samples from 12 sampling points and three waste depths, representing different landfilling ages of the closed and active sections of a sanitary landfill site located in İstanbul, Turkey. Results of Principal Component Analysis (PCA) were used as a decision support tool to evaluate and describe the waste composition variables. Four principal components were extracted, describing 76% of the data set variance. The most effective components were determined as PCB, PO, T, D, W, FM, moisture and BMP for the data set. Multiple Linear Regression (MLR) models were built on the original compositional data and on transformed data to determine differences. It was observed that even though the residual plots were better for the transformed data, the R² and adjusted R² values were not improved significantly. The best preliminary BMP prediction models consisted of the D, W, T and FM waste fractions for both versions of the regressions. Adjusted R² values of the raw and transformed models were determined as 0.69 and 0.57, respectively. Copyright © 2016 Elsevier Ltd. All rights reserved.
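
    The PCA-then-MLR workflow can be sketched as follows; the composition matrix, response, retained variables, and variance threshold are synthetic stand-ins for the study's field data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Synthetic waste-composition fractions (8 variables, 36 excavated samples)
X = rng.random((36, 8))
bmp = 40.0 * X[:, 0] + 25.0 * X[:, 1] + rng.normal(0.0, 2.0, 36)  # toy BMP

# PCA as a decision-support step: components needed for ~76% of variance
pca = PCA().fit(X)
k = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), 0.76)) + 1

# MLR on the influential composition variables, scored by adjusted R²
model = LinearRegression().fit(X[:, :2], bmp)
r2 = model.score(X[:, :2], bmp)
n, p = X.shape[0], 2
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
print(k, round(adj_r2, 2))
```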

  2. Subtask 2.4 - Integration and Synthesis in Climate Change Predictive Modeling

    Jaroslav Solc

    2009-06-01

    The Energy & Environmental Research Center (EERC) completed a brief evaluation of the existing status of predictive modeling to assess options for integration of our previous paleohydrologic reconstructions and their synthesis with current global climate scenarios. Results of our research indicate that short-term data series available from modern instrumental records are not sufficient to reconstruct past hydrologic events or predict future ones. On the contrary, reconstruction of paleoclimate phenomena provided credible information on past climate cycles and confirmed their integration in the context of regional climate history is possible. Similarly to ice cores and other paleo proxies, acquired data represent an objective, credible tool for model calibration and validation of currently observed trends. It remains a subject of future research whether further refinement of our results and synthesis with regional and global climate observations could contribute to improvement and credibility of climate predictions on a regional and global scale.

  3. At the Nexus of History, Ecology, and Hydrobiogeochemistry: Improved Predictions across Scales through Integration.

    Stegen, James C

    2018-01-01

    To improve predictions of ecosystem function in future environments, we need to integrate the ecological and environmental histories experienced by microbial communities with hydrobiogeochemistry across scales. A key issue is whether we can derive generalizable scaling relationships that describe this multiscale integration. There is a strong foundation for addressing these challenges. We have the ability to infer ecological history with null models and reveal impacts of environmental history through laboratory and field experimentation. Recent developments also provide opportunities to inform ecosystem models with targeted omics data. A major next step is coupling knowledge derived from such studies with multiscale modeling frameworks that are predictive under non-steady-state conditions. This is particularly true for systems spanning dynamic interfaces, which are often hot spots of hydrobiogeochemical function. We can advance predictive capabilities through a holistic perspective focused on the nexus of history, ecology, and hydrobiogeochemistry.

  4. Improving Allergen Prediction in Main Crops Using a Weighted Integrative Method.

    Li, Jing; Wang, Jing; Li, Jing

    2017-12-01

    As a public health problem, food allergy is frequently caused by food allergy proteins, which trigger a type-I hypersensitivity reaction in the immune system of atopic individuals. The food allergens in our daily lives mainly come from crops including rice, wheat, soybean and maize. However, allergens in these main crops are far from fully uncovered. Although some bioinformatics tools or methods for predicting the potential allergenicity of proteins have been proposed, each method has its limitations. In this paper, we built a novel algorithm, PREALW, which integrates PREAL, the FAO/WHO criteria and a motif-based method by a weighted average score, to combine the advantages of the different methods. Our results illustrate that PREALW performs significantly better in allergen prediction for these crops. This integrative allergen prediction algorithm could be useful for critical food safety matters. PREALW can be accessed at http://lilab.life.sjtu.edu.cn:8080/prealw.
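
    The weighted-average integration scheme is straightforward to sketch; the scores and weights below are hypothetical, as the fitted weights of the published method are not given here.

```python
def weighted_allergen_score(scores, weights):
    """Weighted-average integration of per-method allergenicity scores
    (each assumed scaled to [0, 1]); dicts are keyed by method name."""
    total = sum(weights[m] for m in scores)
    return sum(weights[m] * scores[m] for m in scores) / total

# Hypothetical component outputs and weights for one candidate protein
scores  = {"PREAL": 0.8, "FAO/WHO": 1.0, "motif": 0.6}
weights = {"PREAL": 0.5, "FAO/WHO": 0.3, "motif": 0.2}
print(round(weighted_allergen_score(scores, weights), 2))  # → 0.82
```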

  5. Digital immunohistochemistry platform for the staining variation monitoring based on integration of image and statistical analyses with laboratory information system.

    Laurinaviciene, Aida; Plancoulaine, Benoit; Baltrusaityte, Indra; Meskauskas, Raimundas; Besusparis, Justinas; Lesciute-Krilaviciene, Daiva; Raudeliunas, Darius; Iqbal, Yasir; Herlin, Paulette; Laurinavicius, Arvydas

    2014-01-01

    Digital immunohistochemistry (IHC) is one of the most promising applications brought by new-generation image analysis (IA). While conventional IHC staining quality is monitored by semi-quantitative visual evaluation of tissue controls, IA may require more sensitive measurement. We designed an automated system to digitally monitor IHC multi-tissue controls, based on SQL-level integration of the laboratory information system (LIS) with image and statistical analysis tools. Consecutive sections of a tissue microarray (TMA) containing 10 cores of breast cancer tissue were used as tissue controls in routine Ki67 IHC testing. The Ventana slide label barcode ID was sent to the LIS to register the serial section sequence. The slides were stained and scanned (Aperio ScanScope XT), and IA was performed with the Aperio/Leica Colocalization and Genie Classifier/Nuclear algorithms. SQL-based integration enabled automated statistical analysis of the IA data by the SAS Enterprise Guide project. Factor analysis and plot visualizations were performed to explore slide-to-slide variation of the Ki67 IHC staining results in the control tissue. Slide-to-slide intra-core IHC staining analysis revealed considerable variation of the variables reflecting the sample size, while Brown and Blue Intensity were relatively stable. To further investigate this variation, the IA results from the 10 cores were aggregated to minimize tissue-related variance. Factor analysis revealed an association between the variables reflecting the sample size detected by IA and Blue Intensity. Since the main feature to be extracted from the tissue controls was staining intensity, we further explored the variation of the intensity variables in the individual cores. MeanBrownBlue Intensity ((Brown+Blue)/2) and DiffBrownBlue Intensity (Brown-Blue) were introduced to better contrast the absolute intensity and the colour balance variation in each core; relevant factor scores were extracted. Finally, tissue-related factors of IHC staining variance were
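The two derived variables named above are simple arithmetic contrasts over the per-core stain intensities; a minimal sketch (the core IDs and intensity values are hypothetical, not data from the study):

```python
def intensity_contrast(brown, blue):
    """Split two stain-intensity readings into absolute intensity
    (MeanBrownBlue) and colour balance (DiffBrownBlue)."""
    mean_bb = (brown + blue) / 2.0   # absolute intensity
    diff_bb = brown - blue           # colour balance
    return mean_bb, diff_bb

# Hypothetical per-core mean intensities from image analysis.
cores = {"core_01": (140.0, 120.0), "core_02": (150.0, 118.0)}
derived = {cid: intensity_contrast(br, bl) for cid, (br, bl) in cores.items()}
```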

  6. More Gamma More Predictions: Gamma-Synchronization as a Key Mechanism for Efficient Integration of Classical Receptive Field Inputs with Surround Predictions

    Vinck, Martin; Bosman, Conrado A.

    2016-01-01

    During visual stimulation, neurons in visual cortex often exhibit rhythmic and synchronous firing in the gamma-frequency (30–90 Hz) band. Whether this phenomenon plays a functional role during visual processing is not fully clear and remains heavily debated. In this article, we explore the function of gamma-synchronization in the context of predictive and efficient coding theories. These theories hold that sensory neurons utilize the statistical regularities in the natural world in order to improve the efficiency of the neural code, and to optimize the inference of the stimulus causes of the sensory data. In visual cortex, this relies on the integration of classical receptive field (CRF) data with predictions from the surround. Here we outline two main hypotheses about gamma-synchronization in visual cortex. First, we hypothesize that the precision of gamma-synchronization reflects the extent to which CRF data can be accurately predicted by the surround. Second, we hypothesize that different cortical columns synchronize to the extent that they accurately predict each other’s CRF visual input. We argue that these two hypotheses can account for a large number of empirical observations made on the stimulus dependencies of gamma-synchronization. Furthermore, we show that they are consistent with the known laminar dependencies of gamma-synchronization and the spatial profile of intercolumnar gamma-synchronization, as well as the dependence of gamma-synchronization on experience and development. Based on our two main hypotheses, we outline two additional hypotheses. First, we hypothesize that the precision of gamma-synchronization shows, in general, a negative dependence on RF size. In support, we review evidence showing that gamma-synchronization decreases in strength along the visual hierarchy, and tends to be more prominent in species with small V1 RFs. Second, we hypothesize that gamma-synchronized network dynamics facilitate the emergence of spiking output that

  7. Predicting the Pullout Capacity of Small Ground Anchors Using Nonlinear Integrated Computing Techniques

    Mosbeh R. Kaloop

    2017-01-01

    This study investigates prediction of the pullout capacity of small ground anchors using nonlinear computing techniques. Input-output prediction models based on the nonlinear Hammerstein-Wiener (NHW) approach and on a delayed-input adaptive neurofuzzy inference system (DANFIS) are developed and used to predict the pullout capacity. The results of the developed models are compared with previous studies that used artificial neural networks and least-squares support vector machine techniques on the same case study. In situ data and statistical performance measures are used to evaluate model performance. Results show that the developed models improve the precision of pullout capacity prediction compared with previous studies, and that the DANFIS model performs better than the other models considered.
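The statistical performance comparison described above typically reduces to an error metric over the in situ measurements; a minimal RMSE-based ranking sketch (all capacity values below are hypothetical, not the study's data):

```python
import math

def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted series."""
    if len(measured) != len(predicted):
        raise ValueError("series must have equal length")
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

# Hypothetical anchor pullout capacities (kN): field data vs. two models.
measured    = [12.1, 15.4, 9.8, 11.3]
pred_nhw    = [11.8, 15.9, 10.1, 11.0]
pred_danfis = [12.0, 15.5, 9.9, 11.4]
ranking = sorted([("NHW", rmse(measured, pred_nhw)),
                  ("DANFIS", rmse(measured, pred_danfis))],
                 key=lambda t: t[1])   # lowest RMSE (best model) first
```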

  8. [Integrity].

    Gómez Rodríguez, Rafael Ángel

    2014-01-01

    To say that someone possesses integrity is to claim that that person is largely predictable in responding to specific situations, and can judge prudently and act correctly. There is a close interrelationship between integrity and autonomy, and autonomy rests on the deeper moral claim of all humans to integrity of the person. Integrity has two senses of significance for medical ethics: the first refers to the integrity of the person in its bodily, psychosocial and intellectual elements; in the second sense, integrity is a virtue. Another facet of integrity of the person is the integrity of the values we cherish and espouse. The physician must be a person of integrity if the integrity of the patient is to be safeguarded. Autonomy has reduced violations in the past, but the character and virtues of the physician are the ultimate safeguard of the patient's autonomy. A very important field in medicine is scientific research. It is the character of the investigator that determines the moral quality of research. Problems arise when legitimate self-interest is replaced by selfishness, particularly when human subjects are involved. The final safeguard of the moral quality of research is the character and conscience of the investigator. Teaching must be relevant in the scientific field, but the most effective way to teach virtue ethics is through the example of a respected scientist.

  9. Integrated petrophysical and reservoir characterization workflow to enhance permeability and water saturation prediction

    Al-Amri, Meshal; Mahmoud, Mohamed; Elkatatny, Salaheldin; Al-Yousef, Hasan; Al-Ghamdi, Tariq

    2017-07-01

    Accurate estimation of permeability is essential in reservoir characterization and in determining fluid flow in porous media, and it greatly assists in optimizing the production of a field. Permeability prediction techniques such as porosity-permeability transforms and, more recently, artificial intelligence and neural networks are encouraging but still show only a moderate to good match to core data. This may be because they are limited to homogeneous media, while knowledge of geology and heterogeneity is only indirectly incorporated or absent. Geological information from core descriptions, such as lithofacies, which include diagenetic information, shows a link to permeability when rocks are categorized into types exposed to similar depositional environments. The objective of this paper is to develop a robust combined workflow integrating geology, petrophysics and wireline logs in an extremely heterogeneous carbonate reservoir to accurately predict permeability. Permeability prediction is carried out using a pattern-recognition algorithm called multi-resolution graph-based clustering (MRGC). We benchmark the prediction results against hard data from core and well test analysis. As a result, we show how much the permeability prediction improves when geology is integrated into the analysis. Finally, we use the predicted permeability as an input parameter in the J-function and correct for uncertainties in the saturation calculated from wireline logs using the classical Archie equation. Ultimately, a high level of confidence in estimated hydrocarbon volumes is reached when robust permeability and saturation-height functions are estimated in the presence of important geological details that are petrophysically meaningful.
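The J-function step can be illustrated with the standard Leverett form, J = 0.2166 (Pc/(σ cos θ)) √(k/φ) in field units, followed by a power-law saturation-height fit; the numerical inputs and fit coefficients below are hypothetical:

```python
import math

def leverett_j(pc_psi, sigma_dyn_cm, cos_theta, k_md, phi):
    """Leverett J-function in field units (Pc in psi, k in mD,
    interfacial tension in dyn/cm, porosity phi as a fraction)."""
    return 0.2166 * pc_psi / (sigma_dyn_cm * cos_theta) * math.sqrt(k_md / phi)

def sw_from_j(j, a, b):
    """Power-law saturation-height model Sw = a * J**b; the
    coefficients a and b are fitted to core data."""
    return a * j ** b

# Hypothetical reservoir point: Pc = 10 psi, sigma = 26 dyn/cm,
# cos(theta) = 1, k = 100 mD, phi = 0.20.
j = leverett_j(10.0, 26.0, 1.0, 100.0, 0.20)
```

Normalizing capillary pressure by √(k/φ) is what lets one saturation-height function serve rock of varying quality, which is why an improved permeability estimate feeds directly into a better saturation model.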

  10. Predicting co-complexed protein pairs using genomic and proteomic data integration

    King Oliver D

    2004-04-01

    Background: Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) screening and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. It is therefore desirable to integrate multiple datasets and exploit their differing predictive value for more accurate prediction of co-complexed relationships. Results: Using a supervised machine learning approach, a probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCPs) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction were found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may represent unknown CCPs. Conclusions: We demonstrated that the probabilistic decision tree approach can successfully be used to predict co-complexed protein pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.
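A probabilistic decision tree assigns a class probability at each leaf rather than a hard label; a toy sketch of how such a tree might combine the evidence sources named above (the split structure and leaf probabilities are invented for illustration, not the trained tree from the paper):

```python
def ccp_probability(apms_hit, y2h_hit, coexpr):
    """Toy probabilistic decision tree for co-complexed pairs: internal
    nodes test evidence features; leaves hold probability estimates."""
    if apms_hit:                      # co-purification is strong evidence
        return 0.80 if coexpr > 0.5 else 0.55
    if y2h_hit:                       # Y2H alone: higher false-positive rate
        return 0.35 if coexpr > 0.5 else 0.15
    return 0.10 if coexpr > 0.5 else 0.02

# Rank hypothetical candidate pairs by predicted CCP probability.
pairs = {"A-B": (True, True, 0.9),
         "C-D": (False, True, 0.3),
         "E-F": (False, False, 0.8)}
ranked = sorted(pairs, key=lambda p: ccp_probability(*pairs[p]), reverse=True)
```

In a real pipeline the splits and leaf frequencies would be learned from a labeled reference set such as the MIPS complex catalogue.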

  11. Assessing the hydrogeochemical processes affecting groundwater pollution in arid areas using an integration of geochemical equilibrium and multivariate statistical techniques

    El Alfy, Mohamed; Lashin, Aref; Abdalla, Fathy; Al-Bassam, Abdulaziz

    2017-01-01

    Rapid economic expansion poses serious problems for groundwater resources in arid areas, which typically have high rates of groundwater depletion. In this study, an integration of hydrochemical investigations, involving chemical and statistical analyses, is conducted to assess the factors controlling hydrochemistry and potential pollution in an arid region. Fifty-four groundwater samples were collected from the Dhurma aquifer in Saudi Arabia, and twenty-one physicochemical variables were examined for each sample. Spatial patterns of salinity and nitrate were mapped using fitted variograms. The nitrate spatial distribution shows that nitrate pollution is a persistent problem affecting a wide area of the aquifer. The hydrochemical investigations and cluster analysis reveal four significant clusters of groundwater zones. Five main factors were extracted, which explain >77% of the total data variance. These factors indicate that the chemical characteristics of the groundwater were influenced by rock–water interactions and anthropogenic factors. The identified clusters and factors were validated with hydrochemical investigations. The geogenic factors include the dissolution of various minerals (calcite, aragonite, gypsum, anhydrite, halite and fluorite) and ion exchange processes. The anthropogenic factors include the impact of irrigation return flows and the application of potassium, nitrate, and phosphate fertilizers. Over time, these anthropogenic factors will most likely contribute to further declines in groundwater quality. - Highlights: • Hydrochemical investigations were carried out in the Dhurma aquifer in Saudi Arabia. • The factors controlling potential groundwater pollution in an arid region were studied. • Chemical and statistical analyses are integrated to assess these factors. • Five main factors were extracted, which explain >77% of the total data variance. • The chemical characteristics of the groundwater were influenced by rock–water interactions
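The ">77% of total variance from five factors" statement is standard factor-analysis bookkeeping over the eigenvalues of the correlation matrix; a minimal sketch (the eigenvalues are hypothetical, chosen so that 21 standardized variables give a trace of 21):

```python
def explained_variance(eigenvalues, n_factors):
    """Fraction of total variance captured by the n largest factors
    (eigenvalues of the correlation matrix)."""
    top = sum(sorted(eigenvalues, reverse=True)[:n_factors])
    return top / sum(eigenvalues)

# Hypothetical eigenvalues: five dominant factors plus 16 minor ones,
# summing to 21 (one unit of variance per standardized variable).
eigs = [6.2, 3.9, 2.8, 2.1, 1.4] + [4.6 / 16] * 16
share = explained_variance(eigs, 5)   # ~0.78, i.e. >77%
```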

  12. Predictive Coding and Multisensory Integration: An Attentional Account of the Multisensory Mind

    Durk eTalsma

    2015-03-01

    Multisensory integration involves a host of different cognitive processes, occurring at different stages of sensory processing. Here I argue that, despite recent insights suggesting that multisensory interactions can occur at very early latencies, the actual integration of individual sensory traces into an internally consistent mental representation is dependent on both top-down and bottom-up processes. Moreover, I argue that this integration is not limited to just sensory inputs, but that internal cognitive processes also shape the resulting mental representation. Studies showing that memory recall is affected by the initial multisensory context in which the stimuli were presented will be discussed, as well as several studies showing that mental imagery can affect multisensory illusions. This empirical evidence will be discussed from a predictive coding perspective, in which a top-down attentional process is proposed to play a central role in coordinating the integration of all these inputs into a coherent mental representation.

  13. Integrated GIS and multivariate statistical analysis for regional scale assessment of heavy metal soil contamination: A critical review.

    Hou, Deyi; O'Connor, David; Nathanail, Paul; Tian, Li; Ma, Yan

    2017-12-01

    Heavy metal soil contamination is associated with potential toxicity to humans or ecotoxicity. Scholars have increasingly used a combination of geographical information science (GIS) with geostatistical and multivariate statistical analysis techniques to examine the spatial distribution of heavy metals in soils at a regional scale. A review of such studies showed that most soil sampling programs were based on grid patterns and composite sampling methodologies. Many programs intended to characterize various soil types and land use types. The most often used sampling depth intervals were 0-0.10 m, or 0-0.20 m, below surface; and the sampling densities used ranged from 0.0004 to 6.1 samples per km², with a median of 0.4 samples per km². The most widely used spatial interpolators were inverse distance weighted interpolation and ordinary kriging; and the most often used multivariate statistical analysis techniques were principal component analysis and cluster analysis. The review also identified several determining and correlating factors in heavy metal distribution in soils, including soil type, soil pH, soil organic matter, land use type, Fe, Al, and heavy metal concentrations. The major natural and anthropogenic sources of heavy metals were found to derive from lithogenic origin, roadway and transportation, atmospheric deposition, wastewater and runoff from industrial and mining facilities, fertilizer application, livestock manure, and sewage sludge. This review argues that the full potential of integrated GIS and multivariate statistical analysis for assessing heavy metal distribution in soils on a regional scale has not yet been fully realized. It is proposed that future research be conducted to map multivariate results in GIS to pinpoint specific anthropogenic sources, to analyze temporal trends in addition to spatial patterns, to optimize modeling parameters, and to expand the use of different multivariate analysis tools beyond principal component analysis
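Of the two interpolators named above, inverse distance weighting is simple enough to sketch directly (the coordinates and Pb concentrations below are hypothetical):

```python
def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from samples
    given as (xi, yi, value) triples."""
    num = den = 0.0
    for xi, yi, v in samples:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return v                     # exact hit at a sample location
        w = d2 ** (-power / 2.0)         # weight = 1 / distance**power
        num += w * v
        den += w
    return num / den

# Hypothetical Pb concentrations (mg/kg) at three sampling points.
obs = [(0.0, 0.0, 40.0), (2.0, 0.0, 60.0), (0.0, 2.0, 50.0)]
estimate = idw(obs, 1.0, 0.0)
```

Unlike ordinary kriging, IDW uses a fixed distance-decay weight rather than weights derived from a fitted variogram, which is why kriging is usually preferred when spatial autocorrelation can be modeled reliably.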

  14. Integrating Models of Diffusion and Behavior to Predict Innovation Adoption, Maintenance, and Social Diffusion.

    Smith, Rachel A; Kim, Youllee; Zhu, Xun; Doudou, Dimi Théodore; Sternberg, Eleanore D; Thomas, Matthew B

    2018-01-01

    This study documents an investigation into the adoption and diffusion of eave tubes, a novel mosquito vector control, during a large-scale scientific field trial in West Africa. The diffusion of innovations (DOI) and the integrated model of behavior (IMB) were integrated (i.e., innovation attributes with attitudes and social pressures with norms) to predict participants' (N = 329) diffusion intentions. The findings showed that positive attitudes about the innovation's attributes were a consistent positive predictor of diffusion intentions: adopting it, maintaining it, and talking with others about it. As expected by the DOI and the IMB, the social pressure created by a descriptive norm positively predicted intentions to adopt and maintain the innovation. Drawing upon sharing research, we argued that the descriptive norm may dampen future talk about the innovation, because it may no longer be seen as a novel, useful topic to discuss. As predicted, the results showed that as the descriptive norm increased, the intention to talk about the innovation decreased. These results provide broad support for integrating the DOI and the IMB to predict diffusion and for efforts to draw on other research to understand motivations for social diffusion.

  15. Fast and General Method To Predict the Physicochemical Properties of Druglike Molecules Using the Integral Equation Theory of Molecular Liquids.

    Palmer, David S; Mišin, Maksim; Fedorov, Maxim V; Llinas, Antonio

    2015-09-08

    We report a method to predict physicochemical properties of druglike molecules using a classical statistical-mechanics-based solvent model combined with machine learning. The RISM-MOL-INF method introduced here provides an accurate technique to characterize solvation and desolvation processes based on solute-solvent correlation functions computed by the 1D reference interaction site model of the integral equation theory of molecular liquids. These functions can be obtained in a matter of minutes for most small organic and druglike molecules using existing software (RISM-MOL) (Sergiievskyi, V. P.; Hackbusch, W.; Fedorov, M. V. J. Comput. Chem. 2011, 32, 1982-1992). Predictions of Caco-2 cell permeability and hydration free energy obtained using the RISM-MOL-INF method are shown to be more accurate than the state-of-the-art tools for benchmark data sets. Due to the importance of solvation and desolvation effects in biological systems, it is anticipated that the RISM-MOL-INF approach will find many applications in biophysical and biomedical property prediction.

  16. Intermittent explosive disorder: development of integrated research criteria for Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition.

    Coccaro, Emil F

    2011-01-01

    This study was designed to develop a revised diagnostic criteria set for intermittent explosive disorder (IED) for consideration for inclusion in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V). This revised criteria set was developed by integrating previous research criteria with elements from the current DSM-IV set of diagnostic criteria. Evidence supporting the reliability and validity of IED-IR ("IED Integrated Criteria") in a new and well-characterized group of subjects with personality disorder is presented. Clinical, phenomenologic, and diagnostic data from 201 individuals with personality disorder were reviewed. All IED diagnoses were assigned using a best-estimate process (e.g., kappa for IED-IR >0.85). In addition, subjects meeting IED-IR criteria had higher scores on dimensional measures of aggression and lower global functioning scores than non-IED-IR subjects, even when related variables were controlled. The IED-IR criteria were more sensitive than the DSM-IV criteria in identifying subjects with significant impulsive-aggressive behavior, by a factor of 16. We conclude that the IED-IR criteria can be reliably applied and have sufficient validity to warrant consideration as DSM-V criteria for IED. Copyright © 2011 Elsevier Inc. All rights reserved.

  17. Improving thermal model prediction through statistical analysis of irradiation and post-irradiation data from AGR experiments

    Pham, Binh T.; Hawkes, Grant L.; Einerson, Jeffrey J.

    2014-01-01

    As part of the High Temperature Reactors (HTR) R and D program, a series of irradiation tests, designated as Advanced Gas-cooled Reactor (AGR), have been defined to support development and qualification of fuel design, fabrication process, and fuel performance under normal operation and accident conditions. The AGR tests employ fuel compacts placed in a graphite cylinder shrouded by a steel capsule and instrumented with thermocouples (TC) embedded in graphite blocks enabling temperature control. While not possible to obtain by direct measurements in the tests, crucial fuel conditions (e.g., temperature, neutron fast fluence, and burnup) are calculated using core physics and thermal modeling codes. This paper is focused on AGR test fuel temperature predicted by the ABAQUS code's finite element-based thermal models. The work follows up on a previous study, in which several statistical analysis methods were adapted, implemented in the NGNP Data Management and Analysis System (NDMAS), and applied for qualification of AGR-1 thermocouple data. Abnormal trends in measured data revealed by the statistical analysis are traced to either measuring instrument deterioration or physical mechanisms in capsules that may have shifted the system thermal response. The main thrust of this work is to exploit the variety of data obtained in irradiation and post-irradiation examination (PIE) for assessment of modeling assumptions. As an example, the uneven reduction of the control gas gap in Capsule 5 found in the capsule metrology measurements in PIE helps identify mechanisms other than TC drift causing the decrease in TC readings. This suggests a more physics-based modification of the thermal model that leads to a better fit with experimental data, thus reducing model uncertainty and increasing confidence in the calculated fuel temperatures of the AGR-1 test

  19. An electrically actuated imperfect microbeam: Dynamical integrity for interpreting and predicting the device response

    Ruzziconi, Laura

    2013-02-20

    In this study we deal with a microelectromechanical system (MEMS) and develop a dynamical integrity analysis to interpret and predict the experimental response. The device consists of a clamped-clamped polysilicon microbeam, which is electrostatically and electrodynamically actuated. It has non-negligible imperfections, which are a typical consequence of the microfabrication process. A single-mode reduced-order model is derived and extensive numerical simulations are performed in a neighborhood of the first symmetric natural frequency, via frequency response diagrams and behavior charts. The typical softening behavior is observed and the overall scenario is explored when both the frequency and the electrodynamic voltage are varied. We show that simulations based on direct numerical integration of the equation of motion in time yield satisfactory agreement with the experimental data. Nevertheless, these theoretical predictions are not completely fulfilled in some aspects. In particular, the range of existence of each attractor is smaller in practice than in the simulations. This is because the theoretical curves represent the ideal limit case where disturbances are absent, which never occurs under realistic conditions. A reliable prediction of the actual (and not only theoretical) range of existence of each attractor is essential in applications. To overcome this discrepancy and extend the results to the practical case where disturbances exist, a dynamical integrity analysis is developed. After introducing dynamical integrity concepts, integrity profiles and integrity charts are drawn. They describe whether each attractor is robust enough to tolerate the disturbances, and they detect the parameter range where each branch can be reliably observed in practice and where, instead, it becomes vulnerable; that is, they provide valuable information for operating the device safely, according to the desired outcome and the expected disturbances.

  20. Predictive regulatory models in Drosophila melanogaster by integrative inference of transcriptional networks

    Marbach, Daniel; Roy, Sushmita; Ay, Ferhat; Meyer, Patrick E.; Candeias, Rogerio; Kahveci, Tamer; Bristow, Christopher A.; Kellis, Manolis

    2012-01-01

    Gaining insights on gene regulation from large-scale functional data sets is a grand challenge in systems biology. In this article, we develop and apply methods for transcriptional regulatory network inference from diverse functional genomics data sets and demonstrate their value for gene function and gene expression prediction. We formulate the network inference problem in a machine-learning framework and use both supervised and unsupervised methods to predict regulatory edges by integrating transcription factor (TF) binding, evolutionarily conserved sequence motifs, gene expression, and chromatin modification data sets as input features. Applying these methods to Drosophila melanogaster, we predict ∼300,000 regulatory edges in a network of ∼600 TFs and 12,000 target genes. We validate our predictions using known regulatory interactions, gene functional annotations, tissue-specific expression, protein–protein interactions, and three-dimensional maps of chromosome conformation. We use the inferred network to identify putative functions for hundreds of previously uncharacterized genes, including many in nervous system development, which are independently confirmed based on their tissue-specific expression patterns. Last, we use the regulatory network to predict target gene expression levels as a function of TF expression, and find significantly higher predictive power for integrative networks than for motif or ChIP-based networks. Our work reveals the complementarity between physical evidence of regulatory interactions (TF binding, motif conservation) and functional evidence (coordinated expression or chromatin patterns) and demonstrates the power of data integration for network inference and studies of gene regulation at the systems level. PMID:22456606

  2. Evaluation of multivariate statistical analyses for monitoring and prediction of processes in a seawater reverse osmosis desalination plant

    Kolluri, Srinivas Sahan; Esfahani, Iman Janghorban; Garikiparthy, Prithvi Sai Nadh; Yoo, Chang Kyoo

    2015-01-01

    Our aim was to analyze, monitor, and predict the outcomes of processes in a full-scale seawater reverse osmosis (SWRO) desalination plant using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was used to investigate the performance and efficiencies of two SWRO processes, namely, pore controllable fiber filter-reverse osmosis (PCF-SWRO) and sand filtration-ultra filtration-reverse osmosis (SF-UF-SWRO). Principal component analysis (PCA) was applied to monitor the two SWRO processes. PCA monitoring revealed that the SF-UF-SWRO process could be analyzed reliably with a low number of outliers and disturbances. Partial least squares (PLS) analysis was then conducted to predict which of the seven input parameters of feed flow rate, PCF/SF-UF filtrate flow rate, temperature of feed water, feed turbidity, pH, reverse osmosis (RO) flow rate, and pressure had a significant effect on the outcome variables of permeate flow rate and concentration. Root mean squared errors (RMSEs) of the PLS models for permeate flow rates were 31.5 and 28.6 for the PCF-SWRO process and SF-UF-SWRO process, respectively, while RMSEs of permeate concentrations were 350.44 and 289.4, respectively. These results indicate that the SF-UF-SWRO process can be modeled more accurately than the PCF-SWRO process, because the RMSE values of permeate flow rate and concentration obtained using a PLS regression model of the SF-UF-SWRO process were lower than those obtained for the PCF-SWRO process.
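
    The RMSE criterion used above to compare the two PLS models is straightforward to compute; the observed and predicted values below are made up purely for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between observations and model predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical permeate flow-rate observations vs. PLS-model predictions
observed  = [10.0, 12.0, 11.0, 13.0]
predicted = [11.0, 11.0, 12.0, 12.0]
print(rmse(observed, predicted))  # → 1.0
```

    A lower RMSE on held-out data is what justifies the paper's conclusion that the SF-UF-SWRO process is the easier of the two to model.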

  3. Evaluation of multivariate statistical analyses for monitoring and prediction of processes in a seawater reverse osmosis desalination plant

    Kolluri, Srinivas Sahan; Esfahani, Iman Janghorban; Garikiparthy, Prithvi Sai Nadh; Yoo, Chang Kyoo [Kyung Hee University, Yongin (Korea, Republic of)]

    2015-08-15

    Our aim was to analyze, monitor, and predict the outcomes of processes in a full-scale seawater reverse osmosis (SWRO) desalination plant using multivariate statistical techniques. Multivariate analysis of variance (MANOVA) was used to investigate the performance and efficiencies of two SWRO processes, namely, pore controllable fiber filter-reverse osmosis (PCF-SWRO) and sand filtration-ultra filtration-reverse osmosis (SF-UF-SWRO). Principal component analysis (PCA) was applied to monitor the two SWRO processes. PCA monitoring revealed that the SF-UF-SWRO process could be analyzed reliably with a low number of outliers and disturbances. Partial least squares (PLS) analysis was then conducted to predict which of the seven input parameters of feed flow rate, PCF/SF-UF filtrate flow rate, temperature of feed water, feed turbidity, pH, reverse osmosis (RO) flow rate, and pressure had a significant effect on the outcome variables of permeate flow rate and concentration. Root mean squared errors (RMSEs) of the PLS models for permeate flow rates were 31.5 and 28.6 for the PCF-SWRO process and SF-UF-SWRO process, respectively, while RMSEs of permeate concentrations were 350.44 and 289.4, respectively. These results indicate that the SF-UF-SWRO process can be modeled more accurately than the PCF-SWRO process, because the RMSE values of permeate flow rate and concentration obtained using a PLS regression model of the SF-UF-SWRO process were lower than those obtained for the PCF-SWRO process.

  4. Analysis of predicted and measured performance of an integrated compound parabolic concentrator (ICPC)

    Winston, R.; O'Gallagher, J.J.; Muschaweck, J.; Mahoney, A.R.; Dudley, V.

    1999-07-01

    A variety of configurations of evacuated Integrated Compound Parabolic Concentrator (ICPC) tubes have been under development for many years. A particularly favorable optical design corresponds to the unit concentration limit for a fin CPC solution which is then coupled to a practical, thin, wedge-shaped absorber. Prototype collector modules using tubes with two different fin orientations (horizontal and vertical) have been fabricated and tested. Comprehensive measurements of the optical characteristics of the reflector and absorber have been used together with a detailed ray trace analysis to predict the optical performance characteristics of these designs. The observed performance agrees well with the predicted performance.

  5. Energy-Efficient Integration of Continuous Context Sensing and Prediction into Smartwatches

    Reza Rawassizadeh

    2015-09-01

    As the availability and use of wearables increases, they are becoming a promising platform for context sensing and context analysis. Smartwatches are a particularly interesting platform for this purpose, as they offer salient advantages, such as their proximity to the human body. However, they also have limitations associated with their small form factor, such as processing power and battery life, which makes it difficult to simply transfer smartphone-based context sensing and prediction models to smartwatches. In this paper, we introduce an energy-efficient, generic, integrated framework for continuous context sensing and prediction on smartwatches. Our work extends previous approaches for context sensing and prediction on wrist-mounted wearables that perform predictive analytics outside the device. We offer a generic sensing module and a novel energy-efficient, on-device prediction module that is based on a semantic abstraction approach to convert sensor data into meaningful information objects, similar to human perception of a behavior. Through six evaluations, we analyze the energy efficiency of our framework modules, identify the optimal file structure for data access and demonstrate an increase in accuracy of prediction through our semantic abstraction method. The proposed framework is hardware independent and can serve as a reference model for implementing context sensing and prediction on small wearable devices beyond smartwatches, such as body-mounted cameras.

  6. Energy-Efficient Integration of Continuous Context Sensing and Prediction into Smartwatches.

    Rawassizadeh, Reza; Tomitsch, Martin; Nourizadeh, Manouchehr; Momeni, Elaheh; Peery, Aaron; Ulanova, Liudmila; Pazzani, Michael

    2015-09-08

    As the availability and use of wearables increases, they are becoming a promising platform for context sensing and context analysis. Smartwatches are a particularly interesting platform for this purpose, as they offer salient advantages, such as their proximity to the human body. However, they also have limitations associated with their small form factor, such as processing power and battery life, which makes it difficult to simply transfer smartphone-based context sensing and prediction models to smartwatches. In this paper, we introduce an energy-efficient, generic, integrated framework for continuous context sensing and prediction on smartwatches. Our work extends previous approaches for context sensing and prediction on wrist-mounted wearables that perform predictive analytics outside the device. We offer a generic sensing module and a novel energy-efficient, on-device prediction module that is based on a semantic abstraction approach to convert sensor data into meaningful information objects, similar to human perception of a behavior. Through six evaluations, we analyze the energy efficiency of our framework modules, identify the optimal file structure for data access and demonstrate an increase in accuracy of prediction through our semantic abstraction method. The proposed framework is hardware independent and can serve as a reference model for implementing context sensing and prediction on small wearable devices beyond smartwatches, such as body-mounted cameras.
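
    The semantic-abstraction idea, converting raw sensor readings into higher-level information objects before storage and prediction, can be sketched as follows. The thresholds and activity labels are illustrative assumptions, not the paper's actual mapping.

```python
def semantic_label(acc_magnitude):
    """Map a raw accelerometer magnitude (in g) to a semantic activity label.
    Thresholds are hypothetical, chosen only to make the example run."""
    if acc_magnitude < 1.1:
        return "idle"
    if acc_magnitude < 1.8:
        return "walking"
    return "running"

def abstract_stream(samples):
    """Collapse consecutive identical labels into (label, count) objects,
    so only meaningful state changes need to be stored or predicted."""
    events = []
    for s in samples:
        label = semantic_label(s)
        if events and events[-1][0] == label:
            events[-1] = (label, events[-1][1] + 1)
        else:
            events.append((label, 1))
    return events

stream = [1.0, 1.02, 1.5, 1.6, 2.3, 2.1, 1.0]
print(abstract_stream(stream))
# → [('idle', 2), ('walking', 2), ('running', 2), ('idle', 1)]
```

    Working on a short event stream instead of raw samples is what makes on-device prediction cheap enough for a smartwatch battery budget.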

  7. Integration of Attributes from Non-Linear Characterization of Cardiovascular Time-Series for Prediction of Defibrillation Outcomes.

    Sharad Shandilya

    The timing of defibrillation is mostly at arbitrary intervals during cardio-pulmonary resuscitation (CPR), rather than during intervals when the out-of-hospital cardiac arrest (OOH-CA) patient is physiologically primed for successful countershock. Interruptions to CPR may negatively impact defibrillation success. Multiple defibrillations can be associated with decreased post-resuscitation myocardial function. We hypothesize that a more complete picture of the cardiovascular system can be gained through non-linear dynamics and integration of multiple physiologic measures from biomedical signals. Retrospective analysis of 153 anonymized OOH-CA patients who received at least one defibrillation for ventricular fibrillation (VF) was undertaken. A machine learning model, termed the Multiple Domain Integrative (MDI) model, was developed to predict defibrillation success. We explore the rationale for non-linear dynamics and statistically validate heuristics involved in feature extraction for model development. Performance of MDI is then compared to the amplitude spectrum area (AMSA) technique. A total of 358 defibrillations were evaluated (218 unsuccessful and 140 successful). Non-linear properties (Lyapunov exponent > 0) of the ECG signals indicate a chaotic nature and validate the use of novel non-linear dynamic methods for feature extraction. Classification using MDI yielded an ROC-AUC of 83.2% and accuracy of 78.8% for the model built with ECG data only. Utilizing 10-fold cross-validation, at the 80% specificity level, MDI (74% sensitivity) outperformed AMSA (53.6% sensitivity). At the 90% specificity level, MDI had 68.4% sensitivity while AMSA had 43.3% sensitivity. Integrating available end-tidal carbon dioxide features into MDI, for the available 48 defibrillations, boosted ROC-AUC to 93.8% and accuracy to 83.3% at 80% sensitivity. At clinically relevant sensitivity thresholds, the MDI provides improved performance compared to AMSA, yielding fewer unsuccessful defibrillations.
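
    The MDI-vs-AMSA comparison above is stated as sensitivity achieved at a fixed specificity level; that operating-point calculation can be reproduced on toy data with a small helper. The scores and labels below are hypothetical, not the study's data.

```python
def sensitivity_at_specificity(scores, labels, min_specificity=0.8):
    """Given classifier scores (higher = predict shock success) and binary
    labels, return the highest sensitivity achievable while keeping
    specificity at or above the requested level."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    best_sens = 0.0
    for t in sorted(set(scores)):
        spec = sum(s < t for s in neg) / len(neg)    # true negatives below threshold
        sens = sum(s >= t for s in pos) / len(pos)   # true positives at/above it
        if spec >= min_specificity and sens > best_sens:
            best_sens = sens
    return best_sens

# Hypothetical per-shock model scores; 1 = successful defibrillation
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(sensitivity_at_specificity(scores, labels))  # → 1.0
```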

  8. Tool Sequence Trends in Minimally Invasive Surgery: Statistical Analysis and Implications for Predictive Control of Multifunction Instruments

    Carl A. Nelson

    2012-01-01

    This paper presents an analysis of 67 minimally invasive surgical procedures covering 11 different procedure types to determine patterns of tool use. A new graph-theoretic approach was taken to organize and analyze the data. By grouping surgeries by type, trends of common tool changes were identified. Using the concept of signal/noise ratio, these trends were found to be statistically strong. The tool-use trends were used to generate tool placement patterns for modular (multi-tool, cartridge-type) surgical tool systems, and the same 67 surgeries were numerically simulated to determine the optimality of these tool arrangements. The results indicate that aggregated tool-use data (by procedure type) can be employed to predict tool-use sequences with good accuracy, and also indicate the potential for artificial intelligence as a means of preoperative and/or intraoperative planning. Furthermore, this suggests that the use of multifunction surgical tools can be optimized to streamline surgical workflow.
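
    A minimal sketch of the graph-theoretic idea: tool sequences are aggregated into a transition-count graph, whose most frequent outgoing edges then predict the next tool. The tool names and sequences below are invented for illustration.

```python
from collections import defaultdict

def transition_counts(procedures):
    """Count tool-to-tool changes across a set of surgical procedures,
    building a weighted directed graph as a dict of dicts."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in procedures:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def predict_next_tool(counts, current):
    """Predict the tool that most frequently follows the current one."""
    nxt = counts.get(current)
    return max(nxt, key=nxt.get) if nxt else None

procedures = [
    ["grasper", "scissors", "grasper", "stapler"],
    ["grasper", "scissors", "clip-applier"],
    ["grasper", "scissors", "grasper"],
]
counts = transition_counts(procedures)
print(predict_next_tool(counts, "grasper"))  # → scissors
```

    In a cartridge-type multifunction instrument, such predicted transitions could decide which tools share a cartridge so that the most frequent changes are the fastest.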

  9. Prediction of modified Mercalli intensity from PGA, PGV, moment magnitude, and epicentral distance using several nonlinear statistical algorithms

    Alvarez, Diego A.; Hurtado, Jorge E.; Bedoya-Ruíz, Daniel Alveiro

    2012-07-01

    Despite technological advances in seismic instrumentation, the assessment of the intensity of an earthquake using an observational scale as given, for example, by the modified Mercalli intensity scale is highly useful for practical purposes. In order to link the quantitative values extracted from the acceleration record of an earthquake and other instrumental data such as peak ground velocity, epicentral distance, and moment magnitude on the one hand and the modified Mercalli intensity scale on the other, simple statistical regression has been generally employed. In this paper, we will employ three methods of nonlinear regression, namely support vector regression, multilayer perceptrons, and genetic programming in order to find a functional dependence between the instrumental records and the modified Mercalli intensity scale. The proposed methods predict the intensity of an earthquake while dealing with nonlinearity and the noise inherent to the data. The nonlinear regressions with good estimation results have been performed using the "Did You Feel It?" database of the US Geological Survey and the database of the Center for Engineering Strong Motion Data for the California region.

  10. A Statistical Test for Identifying the Number of Creep Regimes When Using the Wilshire Equations for Creep Property Predictions

    Evans, Mark

    2016-12-01

    A new parametric approach, termed the Wilshire equations, offers the realistic potential of accurately predicting the long-term creep behavior of materials operating at in-service conditions from accelerated test results lasting no more than 5000 hours. The success of this approach can be attributed to a well-defined linear relationship that appears to exist between various creep properties and a log transformation of the normalized stress. However, these linear trends are subject to discontinuities, the number of which appears to differ from material to material. These discontinuities have until now been (1) treated as abrupt in nature and (2) identified by eye from an inspection of simple graphical plots of the data. This article puts forward a statistical test for determining the correct number of discontinuities present within a creep data set and a method for allowing these discontinuities to occur more gradually, so that the methodology is more in line with the accepted view as to how creep mechanisms evolve with changing test conditions. These two developments are fully illustrated using creep data sets on two steel alloys. When these new procedures are applied to these steel alloys, not only do they produce more accurate and realistic-looking long-term predictions of the minimum creep rate, but they also lead to different conclusions about the mechanisms determining the rates of creep from those originally put forward by Wilshire.
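
    One simple way to locate a regime change, sketched here under the abrupt-discontinuity simplification that the article sets out to relax, is to compare residual errors of piecewise-linear fits: a candidate break is placed wherever the two-segment sum of squared errors (SSE) is smallest, and an extra regime is justified only if it reduces the error substantially relative to a single line. The data below are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def best_single_break(xs, ys, min_pts=3):
    """Return (k, sse): first index of the second segment and total SSE
    of the best two-segment piecewise-linear fit."""
    best = None
    for k in range(min_pts, len(xs) - min_pts + 1):
        sse = fit_line(xs[:k], ys[:k])[2] + fit_line(xs[k:], ys[k:])[2]
        if best is None or sse < best[1]:
            best = (k, sse)
    return best

# Synthetic "creep property vs. log-stress" data with an abrupt regime change
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 2, 3, 4, 5, 10, 12, 14]
k, sse_two = best_single_break(xs, ys)
sse_one = fit_line(xs, ys)[2]
```

    Here the two-segment fit recovers the break exactly (k = 5, SSE ≈ 0) while the single line leaves a large residual; the article's contribution is a formal test for how many such breaks the data actually support, plus a smooth transition between regimes.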

  11. An integrated prediction and optimization model of biogas production system at a wastewater treatment facility.

    Akbaş, Halil; Bilgen, Bilge; Turhan, Aykut Melih

    2015-11-01

    This study proposes an integrated prediction and optimization model using multi-layer perceptron neural network and particle swarm optimization techniques. Three different objective functions are formulated. The first is the maximization of methane percentage with a single output. The second is the maximization of biogas production with a single output. The last is the maximization of biogas quality and biogas production with two outputs. Methane percentage, carbon dioxide percentage, and the percentage of other contents are used as the biogas quality criteria. Based on the formulated models and data from a wastewater treatment facility, optimal values of the input variables and their corresponding maximum output values are found for each model. It is expected that the application of the integrated prediction and optimization models increases the biogas production and biogas quality, and contributes to the quantity of electricity production at the wastewater treatment facility. Copyright © 2015 Elsevier Ltd. All rights reserved.
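
    A minimal particle swarm optimizer of the kind used in such prediction-plus-optimization pipelines can be sketched in a few lines. The surrogate objective below, peaking at an assumed 37 °C and pH 7.2, stands in for the trained neural-network model; those values and all hyperparameters are purely illustrative.

```python
import random

def pso_maximize(f, bounds, n_particles=20, iters=100, seed=1):
    """Minimal particle swarm optimizer (a sketch, not the paper's exact setup)."""
    random.seed(seed)
    dim = len(bounds)
    pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                # keep particles inside the feasible operating range
                pos[i][d] = min(max(pos[i][d] + vel[i][d], bounds[d][0]), bounds[d][1])
            val = f(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def surrogate(x):
    """Hypothetical stand-in for the trained biogas-yield model (temperature, pH)."""
    return -((x[0] - 37.0) ** 2) - 10.0 * (x[1] - 7.2) ** 2

best, val = pso_maximize(surrogate, [(20.0, 60.0), (5.0, 9.0)])
```

    In the study, the objective evaluated by the swarm is the trained multi-layer perceptron rather than a closed-form function; the search loop is otherwise the same.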

  12. The Prediction-Focused Approach: An opportunity for hydrogeophysical data integration and interpretation

    Hermans, Thomas; Nguyen, Frédéric; Klepikova, Maria; Dassargues, Alain; Caers, Jef

    2017-04-01

    Hydrogeophysics is an interdisciplinary field of sciences aiming at a better understanding of subsurface hydrological processes. While geophysical surveys have been successfully used to qualitatively characterize the subsurface, two important challenges remain for a better quantification of hydrological processes: (1) the inversion of geophysical data and (2) their integration into hydrological subsurface models. The classical inversion approach using regularization suffers from spatially and temporally varying resolution and yields geologically unrealistic solutions without uncertainty quantification, making their utilization for hydrogeological calibration less consistent. More advanced techniques such as coupled inversion allow for a direct use of geophysical data for conditioning groundwater and solute transport model calibration. However, the technique is difficult to apply in complex cases and remains computationally demanding to estimate uncertainty. In a recent study, we investigate a prediction-focused approach (PFA) to directly estimate subsurface physical properties from geophysical data, circumventing the need for classic inversions. In PFA, we seek a direct relationship between the data and the subsurface variables we want to predict (the forecast). This relationship is obtained through a prior set of subsurface models for which both data and forecast are computed. A direct relationship can often be derived through dimension reduction techniques. PFA offers a framework for both hydrogeophysical "inversion" and hydrogeophysical data integration. For hydrogeophysical "inversion", the considered forecast variable is the subsurface variable, such as the salinity. An ensemble of possible solutions is generated, allowing uncertainty quantification. For hydrogeophysical data integration, the forecast variable becomes the prediction we want to make with our subsurface models, such as the concentration of contaminant in a drinking water production well.
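
    The core of PFA, learning a direct statistical relationship between simulated data and forecast over a prior ensemble and then applying it to a field measurement without any inversion step, can be sketched in one dimension. The linear law linking datum to forecast and the noise level below are illustrative assumptions standing in for the forward simulations.

```python
import random

def fit_direct(d, h):
    """Least-squares line h ≈ a*d + b linking the data variable to the forecast."""
    n = len(d)
    md, mh = sum(d) / n, sum(h) / n
    a = sum((x - md) * (y - mh) for x, y in zip(d, h)) / \
        sum((x - md) ** 2 for x in d)
    return a, mh - a * md

random.seed(0)
# Prior ensemble: for each subsurface model, simulate both the geophysical
# datum d and the forecast h (here tied by a hypothetical linear law + noise).
d = [random.uniform(0.0, 1.0) for _ in range(200)]
h = [2.0 * x + 0.5 + random.gauss(0.0, 0.05) for x in d]
a, b = fit_direct(d, h)

# A new field measurement maps straight to a forecast, with no inversion:
print(round(a * 0.4 + b, 2))
```

    In the actual method the relationship is built in a reduced-dimension space (e.g. after PCA of the data and forecast ensembles) and yields an ensemble of forecasts for uncertainty quantification; the sketch keeps only the prior-ensemble regression idea.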

  13. Integrative EEG biomarkers predict progression to Alzheimer's disease at the MCI stage

    Simon-Shlomo Poil

    2013-10-01

    Alzheimer's disease (AD) is a devastating disorder of increasing prevalence in modern society. Mild cognitive impairment (MCI) is considered a transitional stage between normal aging and AD; however, not all subjects with MCI progress to AD. Prediction of conversion to AD at an early stage would enable an earlier, and potentially more effective, treatment of AD. Electroencephalography (EEG) biomarkers would provide a non-invasive and relatively cheap screening tool to predict conversion to AD; however, traditional EEG biomarkers have not been considered accurate enough to be useful in clinical practice. Here, we aim to combine the information from multiple EEG biomarkers into a diagnostic classification index in order to improve the accuracy of predicting conversion from MCI to AD within a two-year period. We followed 86 patients initially diagnosed with MCI for two years, during which 25 patients converted to AD. We show that multiple EEG biomarkers, mainly related to activity in the beta-frequency range (13–30 Hz), can predict conversion from MCI to AD. Importantly, by integrating six EEG biomarkers into a diagnostic index using logistic regression, the prediction improved compared with classification using the individual biomarkers, with a sensitivity of 88% and specificity of 82%, compared with a sensitivity of 64% and specificity of 62% for the best individual biomarker in this index. In order to identify this diagnostic index we developed a data mining approach implemented in the Neurophysiological Biomarker Toolbox (http://www.nbtwiki.net/). We suggest that this approach can be used to identify optimal combinations of biomarkers (integrative biomarkers) in other modalities as well. Potentially, these integrative biomarkers could be more sensitive to disease progression and response to therapeutic intervention.
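
    A diagnostic index of this kind is a logistic combination of standardized biomarkers. The weights, intercept, and patient values below are invented for illustration; the study estimated the weights by logistic regression on the MCI cohort.

```python
import math

def diagnostic_index(markers, weights, intercept):
    """Combine several (standardized) EEG biomarkers into one score in (0, 1);
    higher values indicate higher predicted risk of conversion to AD."""
    z = intercept + sum(w * m for w, m in zip(weights, markers))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical beta-band biomarker values for two patients:
weights, intercept = [1.2, -0.8, 0.9], -0.3
converter     = diagnostic_index([1.5, -0.5, 1.0], weights, intercept)
non_converter = diagnostic_index([-1.0, 0.8, -0.7], weights, intercept)
print(converter > 0.5, non_converter > 0.5)  # → True False
```

    Combining biomarkers this way is what lifted sensitivity/specificity from 64%/62% for the best single biomarker to 88%/82% for the index in the study.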

  14. PROSPER: an integrated feature-based tool for predicting protease substrate cleavage sites.

    Jiangning Song

    The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be utilized in an effective manner in order to efficiently identify protein substrates by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER combines established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites using different, but complementary, sequence and structure characteristics. Features used by PROSPER include the local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for use in identifying protein substrates for the enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features included in the tool strongly improve cleavage site prediction, as evidenced by their contribution to identifying known cleavage sites in substrates for these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate.
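
    One basic ingredient of such predictors, scoring candidate cleavage sites with a position-specific scoring matrix slid along the substrate, can be sketched as follows. The matrix values, window size, threshold, and penalty for unlisted residues are illustrative assumptions, not MEROPS-derived specificities.

```python
# Toy position-specific scoring matrix over a 4-residue window (P2-P1-P1'-P2');
# entries are hypothetical log-odds scores for each amino acid.
PSSM = [
    {"A": 0.1, "G": 0.2, "K": 1.5, "L": -0.5},
    {"A": 0.0, "G": -0.3, "K": 2.0, "L": -1.0},
    {"A": 0.8, "G": 1.0, "K": -0.5, "L": -0.2},
    {"A": 0.3, "G": 0.6, "K": -1.0, "L": 0.1},
]

def scan_cleavage_sites(seq, threshold=2.5):
    """Slide the PSSM along the substrate and report high-scoring windows.
    Cleavage is taken to occur between window positions 2 and 3 (P1/P1')."""
    hits = []
    for i in range(len(seq) - len(PSSM) + 1):
        score = sum(PSSM[j].get(seq[i + j], -2.0) for j in range(len(PSSM)))
        if score >= threshold:
            hits.append((i + 2, round(score, 2)))  # (cleavage position, score)
    return hits

print(scan_cleavage_sites("GAKKGALA"))  # → [(4, 4.8)]
```

    PROSPER goes well beyond a single PSSM by feeding such sequence profiles, together with predicted secondary structure, solvent accessibility and disorder, into a trained machine-learning model.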

  15. Trends and Opportunities of BIM-GIS Integration in the Architecture, Engineering and Construction Industry: A Review from a Spatio-Temporal Statistical Perspective

    Yongze Song

    2017-12-01

    The integration of building information modelling (BIM) and geographic information system (GIS) in construction management is a new and fast-developing trend in recent years, from research to industrial practice. BIM has advantages in rich geometric and semantic information through the building life cycle, while GIS is a broad field covering geovisualization-based decision making and geospatial modelling. However, most current studies of BIM-GIS integration focus on integration techniques but lack theories and methods for further data analysis and mathematical modelling. This paper reviews the applications and discusses future trends of BIM-GIS integration in the architecture, engineering and construction (AEC) industry based on the studies of 96 high-quality research articles from a spatio-temporal statistical perspective. The analysis of these applications helps reveal the evolution of BIM-GIS integration. Results show that the utilization of BIM-GIS integration in the AEC industry requires systematic theories beyond integration technologies and deep applications of mathematical modelling methods, including spatio-temporal statistical modelling in GIS and 4D/nD BIM simulation and management. Opportunities for BIM-GIS integration are outlined as three hypotheses in the AEC industry for future research on the in-depth integration of BIM and GIS. The BIM-GIS integration hypotheses enable more comprehensive applications through the life cycle of AEC projects.

  16. Applied immuno-epidemiological research: an approach for integrating existing knowledge into the statistical analysis of multiple immune markers.

    Genser, Bernd; Fischer, Joachim E; Figueiredo, Camila A; Alcântara-Neves, Neuza; Barreto, Mauricio L; Cooper, Philip J; Amorim, Leila D; Saemann, Marcus D; Weichhart, Thomas; Rodrigues, Laura C

    2016-05-20

    Immunologists often measure several correlated immunological markers, such as concentrations of different cytokines produced by different immune cells and/or measured under different conditions, to draw insights from complex immunological mechanisms. Although there have been recent methodological efforts to improve the statistical analysis of immunological data, a framework is still needed for the simultaneous analysis of multiple, often correlated, immune markers. This framework would allow the immunologists' hypotheses about the underlying biological mechanisms to be integrated. We present an analytical approach for statistical analysis of correlated immune markers, such as those commonly collected in modern immuno-epidemiological studies. We demonstrate i) how to deal with interdependencies among multiple measurements of the same immune marker, ii) how to analyse association patterns among different markers, iii) how to aggregate different measures and/or markers to immunological summary scores, iv) how to model the inter-relationships among these scores, and v) how to use these scores in epidemiological association analyses. We illustrate the application of our approach to multiple cytokine measurements from 818 children enrolled in a large immuno-epidemiological study (SCAALA Salvador), which aimed to quantify the major immunological mechanisms underlying atopic diseases or asthma. We demonstrate how to aggregate systematically the information captured in multiple cytokine measurements to immunological summary scores aimed at reflecting the presumed underlying immunological mechanisms (Th1/Th2 balance and immune regulatory network). We show how these aggregated immune scores can be used as predictors in regression models with outcomes of immunological studies (e.g. specific IgE) and compare the results to those obtained by a traditional multivariate regression approach. The proposed analytical approach may be especially useful to quantify complex immune
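
    Step (iii) above, aggregating multiple measures into an immunological summary score, can be sketched as an average of z-scored markers. The cytokine names and values below are illustrative; the study's scores were built to reflect presumed mechanisms such as the Th1/Th2 balance.

```python
import statistics

def zscores(values):
    """Standardize one marker across subjects (population standard deviation)."""
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def summary_score(marker_columns):
    """Average the z-scored markers presumed to reflect one mechanism,
    e.g. a 'Th2 score' from several Th2 cytokine concentrations."""
    z = [zscores(col) for col in marker_columns]
    return [sum(col[i] for col in z) / len(z) for i in range(len(z[0]))]

# Three subjects, two hypothetical Th2 cytokines on different scales:
il4  = [1.0, 2.0, 3.0]
il13 = [10.0, 20.0, 30.0]
th2 = summary_score([il4, il13])
```

    The resulting per-subject score can then enter a regression model as a single predictor in place of many correlated cytokine measurements.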

  17. Integrating eQTL data with GWAS summary statistics in pathway-based analysis with application to schizophrenia.

    Wu, Chong; Pan, Wei

    2018-04-01

    Many genetic variants affect complex traits through gene expression, which can be exploited to boost statistical power and enhance interpretation in genome-wide association studies (GWASs), as demonstrated by the transcriptome-wide association study (TWAS) approach. Furthermore, due to polygenic inheritance, a complex trait is often affected by multiple genes with similar functions as annotated in gene pathways. Here, we extend TWAS from gene-based analysis to pathway-based analysis: we integrate public pathway collections, expression quantitative trait locus (eQTL) data and GWAS summary association statistics (or GWAS individual-level data) to identify gene pathways associated with complex traits. The basic idea is to weight the SNPs of the genes in a pathway based on their estimated cis-effects on gene expression, then adaptively test for association of the pathway with a GWAS trait by effectively aggregating possibly weak association signals across the genes in the pathway. The P values can be calculated analytically and are thus fast to compute. We applied our proposed test with the KEGG and GO pathways to two schizophrenia (SCZ) GWAS summary association data sets, denoted by SCZ1 and SCZ2, with about 20,000 and 150,000 subjects, respectively. Most of the significant pathways identified by analyzing the SCZ1 data were reproduced by the SCZ2 data. Importantly, we identified 15 novel pathways associated with SCZ, such as GABA receptor complex (GO:1902710), which could not be uncovered by the standard single SNP-based analysis or gene-based TWAS. The newly identified pathways may help us gain insights into the biological mechanism underlying SCZ. Our results showcase the power of incorporating gene expression information and gene functional annotations into pathway-based association testing for GWAS. © 2018 WILEY PERIODICALS, INC.
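
    The weighting-and-aggregation step can be sketched as follows: SNP z-scores are combined with eQTL-derived weights into a gene-level statistic, and gene statistics are pooled over the pathway. This burden-style combination, which assumes independent SNPs, is a deliberate simplification of the adaptive test the paper actually uses; all numbers are hypothetical.

```python
import math

def gene_z(snp_z, eqtl_weights):
    """Gene-level statistic: eQTL-weighted sum of GWAS SNP z-scores,
    standardized under the simplifying assumption of independent SNPs."""
    num = sum(w * z for w, z in zip(eqtl_weights, snp_z))
    return num / math.sqrt(sum(w * w for w in eqtl_weights))

def pathway_stat(gene_zs):
    """Pool gene scores across the pathway (sum of squared z-scores)."""
    return sum(z * z for z in gene_zs)

# Hypothetical SNP z-scores and cis-eQTL weights for one gene:
g1 = gene_z([2.0, 1.0], [0.6, 0.8])
stat = pathway_stat([g1, 0.0, 1.0])
```

    In the paper, the aggregation across genes is adaptive (so that a few strongly associated genes or many weakly associated ones can both be detected) and P values account for SNP correlation; the sketch only shows the flow from SNP weights to a pathway-level statistic.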

  18. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    The RegPredict web server provides tools for the reconstruction and analysis of microbial regulons using a comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  19. Integrating environmental and genetic effects to predict responses of tree populations to climate.

    Wang, Tongli; O'Neill, Gregory A; Aitken, Sally N

    2010-01-01

    Climate is a major environmental factor affecting the phenotype of trees and is also a critical agent of natural selection that has molded among-population genetic variation. Population response functions describe the environmental effect of planting site climates on the performance of a single population, whereas transfer functions describe among-population genetic variation molded by natural selection for climate. Although these approaches are widely used to predict the responses of trees to climate change, both have limitations. We present a novel approach that integrates both genetic and environmental effects into a single "universal response function" (URF) to better predict the influence of climate on phenotypes. Using a large lodgepole pine (Pinus contorta Dougl. ex Loud.) field transplant experiment composed of 140 populations planted on 62 sites to demonstrate the methodology, we show that the URF makes full use of data from provenance trials to: (1) improve predictions of climate change impacts on phenotypes; (2) reduce the size and cost of future provenance trials without compromising predictive power; (3) more fully exploit existing, less comprehensive provenance tests; (4) quantify and compare environmental and genetic effects of climate on population performance; and (5) predict the performance of any population growing in any climate. Finally, we discuss how the last attribute allows the URF to be used as a mechanistic model to predict population and species ranges for the future and to guide assisted migration of seed for reforestation, restoration, or afforestation and genetic conservation in a changing climate.

  20. Brain systems for probabilistic and dynamic prediction: computational specificity and integration.

    Jill X O'Reilly

    2013-09-01

    A computational approach to functional specialization suggests that brain systems can be characterized in terms of the types of computations they perform, rather than their sensory or behavioral domains. We contrasted the neural systems associated with two computationally distinct forms of predictive model: a reinforcement-learning model of the environment obtained through experience with discrete events, and continuous dynamic forward modeling. By manipulating the precision with which each type of prediction could be used, we caused participants to shift computational strategies within a single spatial prediction task. Hence (using fMRI) we showed that activity in two brain systems (typically associated with reward learning and motor control) could be dissociated in terms of the forms of computations that were performed there, even when both systems were used to make parallel predictions of the same event. A region in parietal cortex, which was sensitive to the divergence between the predictions of the models and anatomically connected to both computational networks, is proposed to mediate integration of the two predictive modes to produce a single behavioral output.

  1. Online Sentence Comprehension in PPA: Verb-Based Integration and Prediction

    Jennifer E Mack

    2015-05-01

Introduction. Impaired language comprehension is frequently observed in primary progressive aphasia (PPA). Word comprehension deficits are characteristic of the semantic variant (PPA-S), whereas sentence comprehension deficits are more prevalent in the agrammatic (PPA-G) and logopenic (PPA-L) variants (Amici et al., 2007; Gorno-Tempini et al., 2011; Thompson et al., 2013). Word and sentence comprehension deficits have also been shown to have distinct neural substrates in PPA (Mesulam, Thompson, Weintraub, & Rogalski, in press). However, little is known about the relationship between word and sentence comprehension processes in PPA, specifically how words are accessed, combined, and used to predict upcoming elements within a sentence. A previous study demonstrated that listeners with stroke-induced agrammatic aphasia rapidly access verb meanings and use them to semantically integrate verb-arguments; however, they show deficits in using verb meanings predictively (Mack, Ji, & Thompson, 2013). The present study tested whether listeners with PPA are able to access verb meanings and to use this information to integrate and predict verb-arguments. Methods. Fifteen adults with PPA (8 with PPA-G, 3 with PPA-L, and 4 with PPA-S) and ten age-matched controls participated in two eyetracking experiments. In both experiments, participants heard sentences with restrictive verbs (e.g., eat) that were semantically compatible with only one object in a four-picture visual array (e.g., when the array included a cake and three non-edible objects) and unrestrictive verbs (e.g., move) that were compatible with all four objects. The verb-based integration experiment tested access to verb meaning and its effects on integration of the direct object (e.g., Susan will eat/move the cake); the verb-based prediction experiment examined prediction of the direct object (e.g., Susan will eat/move the …). The dependent variable was the rate of fixations on the target picture (e.g., the cake) in the

  2. Understanding Eating Behaviors through Parental Communication and the Integrative Model of Behavioral Prediction.

    Scheinfeld, Emily; Shim, Minsun

    2017-05-01

Emerging adulthood (EA) is an important yet overlooked period for developing long-term health behaviors. During these years, emerging adults adopt health behaviors that persist throughout life. This study applies the Integrative Model of Behavioral Prediction (IMBP) to examine the role of childhood parental communication in predicting engagement in healthful eating during EA. Participants included 239 college students, ages 18 to 25, from a large university in the southern United States. Participants were recruited and data were collected in spring 2012. Participants responded to measures assessing perceived parental communication, eating behaviors, attitudes, subjective norms, and behavioral control over healthful eating. SEM and mediation analyses were used to address the posited hypotheses. The data demonstrated that perceived parent-child communication - specifically, its quality and target-specific content - significantly predicted emerging adults' eating behaviors, mediated through subjective norm and perceived behavioral control. This study sets the stage for further exploration of the ways parental communication influences emerging adults' enactment of healthy behaviors.

  3. Statistical analysis and definition of blockages-prediction formulae for the wastewater network of Oslo by evolutionary computing.

    Ugarelli, Rita; Kristensen, Stig Morten; Røstum, Jon; Saegrov, Sveinung; Di Federico, Vittorio

    2009-01-01

Oslo Vann og Avløpsetaten (Oslo VAV), the water/wastewater utility in the Norwegian capital city of Oslo, is assessing future strategies for selecting the most reliable materials for wastewater networks, taking into account not only technical performance of the materials but also their performance under the operational conditions of the system. The research project undertaken by the SINTEF Group (the largest research organisation in Scandinavia), NTNU (Norges Teknisk-Naturvitenskapelige Universitet) and Oslo VAV adopts several approaches to understand the reasons for failures that may impact flow capacity, by analysing historical data for blockages in Oslo. The aim of the study was to understand whether there is a relationship between the performance of a pipeline and a number of specific attributes such as age, material and diameter, to name a few. This paper presents the characteristics of the available data set and discusses the results obtained with two different approaches: a traditional statistical analysis, segregating the pipes into classes that share the same explanatory variables, and an Evolutionary Polynomial Regression (EPR) model, developed by the Technical University of Bari and the University of Exeter, to identify the possible influence of pipe attributes on the total number of blockages predicted over a period of time. Starting from a detailed analysis of the available data on blockage events, the most important variables are identified and a classification scheme is adopted. The statistical analysis indicates that age, size and function do seem to have a marked influence on the proneness of a pipeline to blockages, but, given the reduced sample available, it is difficult to say which variable is the most influential. Judged by the total number of blockages, the oldest class seems the most prone to blockages; judged by blockage rates (number of blockages per km per year), the youngest class shows the highest blockage rate.
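The distinction the study draws between total blockage counts and blockage rates (blockages per km per year) can be made concrete with a small calculation; all numbers below are invented for illustration:

```python
# Hypothetical blockage records grouped into age classes. The same data can
# rank the oldest class first by total count and the youngest first by rate,
# the pattern reported for the Oslo network.
classes = {
    # age class: (total blockages observed, network length in km, years observed)
    "laid before 1940": (420, 600.0, 10),
    "1940-1970":        (310, 450.0, 10),
    "after 1970":       (150, 120.0, 10),
}

for name, (blockages, length_km, years) in classes.items():
    rate = blockages / (length_km * years)   # blockages per km per year
    print(f"{name}: total={blockages}, rate={rate:.3f}/km/yr")
```

Here the oldest class has the most blockages in total, but, because it also has by far the longest network, the youngest class shows the highest blockage rate.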

  4. An integrative approach to ortholog prediction for disease-focused and other functional studies

    Perrimon Norbert

    2011-08-01

Background: Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results, and identification of predicted orthologs is not always straightforward. Results: We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to the highest-confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). Conclusions: DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.

  5. An integrative approach to ortholog prediction for disease-focused and other functional studies.

    Hu, Yanhui; Flockhart, Ian; Vinayagam, Arunachalam; Bergwitz, Clemens; Berger, Bonnie; Perrimon, Norbert; Mohr, Stephanie E

    2011-08-31

    Mapping of orthologous genes among species serves an important role in functional genomics by allowing researchers to develop hypotheses about gene function in one species based on what is known about the functions of orthologs in other species. Several tools for predicting orthologous gene relationships are available. However, these tools can give different results and identification of predicted orthologs is not always straightforward. We report a simple but effective tool, the Drosophila RNAi Screening Center Integrative Ortholog Prediction Tool (DIOPT; http://www.flyrnai.org/diopt), for rapid identification of orthologs. DIOPT integrates existing approaches, facilitating rapid identification of orthologs among human, mouse, zebrafish, C. elegans, Drosophila, and S. cerevisiae. As compared to individual tools, DIOPT shows increased sensitivity with only a modest decrease in specificity. Moreover, the flexibility built into the DIOPT graphical user interface allows researchers with different goals to appropriately 'cast a wide net' or limit results to highest confidence predictions. DIOPT also displays protein and domain alignments, including percent amino acid identity, for predicted ortholog pairs. This helps users identify the most appropriate matches among multiple possible orthologs. To facilitate using model organisms for functional analysis of human disease-associated genes, we used DIOPT to predict high-confidence orthologs of disease genes in Online Mendelian Inheritance in Man (OMIM) and genes in genome-wide association study (GWAS) data sets. The results are accessible through the DIOPT diseases and traits query tool (DIOPT-DIST; http://www.flyrnai.org/diopt-dist). DIOPT and DIOPT-DIST are useful resources for researchers working with model organisms, especially those who are interested in exploiting model organisms such as Drosophila to study the functions of human disease genes.
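The core integration step DIOPT performs, scoring each ortholog pair by how many underlying tools support it, can be sketched as a simple vote count; the tool names and gene pairs below are invented and do not come from DIOPT itself:

```python
from collections import Counter

# Each hypothetical tool nominates (human gene, fly gene) ortholog pairs;
# the integrated score for a pair is the number of tools that agree.
predictions = {
    "toolA": {("FGF8", "bnl"), ("TP53", "p53")},
    "toolB": {("FGF8", "bnl"), ("TP53", "p53"), ("FGF8", "ths")},
    "toolC": {("TP53", "p53")},
}

score = Counter(pair for pairs in predictions.values() for pair in pairs)

# rank candidate pairs by supporting-tool count
best = max(score, key=score.get)
print(best, score[best])
print(sorted(score.items(), key=lambda kv: -kv[1]))
```

A high count plays the role of a "highest-confidence" prediction, while keeping every nominated pair corresponds to casting a wide net.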

  6. Predicting the risk of cucurbit downy mildew in the eastern United States using an integrated aerobiological model

    Neufeld, K. N.; Keinath, A. P.; Gugino, B. K.; McGrath, M. T.; Sikora, E. J.; Miller, S. A.; Ivey, M. L.; Langston, D. B.; Dutta, B.; Keever, T.; Sims, A.; Ojiambo, P. S.

    2017-11-01

Cucurbit downy mildew caused by the obligate oomycete, Pseudoperonospora cubensis, is considered one of the most economically important diseases of cucurbits worldwide. In the continental United States, the pathogen overwinters in southern Florida and along the coast of the Gulf of Mexico. Outbreaks of the disease in northern states occur annually via long-distance aerial transport of sporangia from infected source fields. An integrated aerobiological modeling system has been developed to predict the risk of disease occurrence and to facilitate timely use of fungicides for disease management. The forecasting system, which combines information on known inoculum sources with long-distance atmospheric spore transport and spore deposition modules, was tested to determine its accuracy in predicting risk of disease outbreak. Rainwater samples at disease monitoring sites in Alabama, Georgia, Louisiana, New York, North Carolina, Ohio, Pennsylvania and South Carolina were collected weekly from planting to the first appearance of symptoms at the field sites during the 2013, 2014, and 2015 growing seasons. A conventional PCR assay with primers specific to P. cubensis was used to detect the presence of sporangia in the rainwater samples. Disease forecasts were monitored and recorded for each site after each rain event until initial disease symptoms appeared. The pathogen was detected in 38 of the 187 rainwater samples collected during the study period. The forecasting system correctly predicted the risk of disease outbreak based on the presence of sporangia or appearance of initial disease symptoms with overall accuracy rates of 66% and 75%, respectively. In addition, the probability that the forecasting system correctly classified the presence or absence of disease was ≥ 73%. The true skill statistic calculated based on the appearance of disease symptoms in cucurbit field plantings ranged from 0.42 to 0.58, indicating that the disease forecasting system had an acceptable to good
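The verification scores cited above (overall accuracy and the true skill statistic) follow directly from a 2x2 forecast contingency table. A sketch with invented counts; the paper reports TSS between 0.42 and 0.58:

```python
# 2x2 contingency table of forecasts vs observed disease outbreaks
# (counts are hypothetical, not the study's data).
hits, misses, false_alarms, correct_negatives = 30, 10, 12, 48

total = hits + misses + false_alarms + correct_negatives
accuracy = (hits + correct_negatives) / total
sensitivity = hits / (hits + misses)                         # true positive rate
specificity = correct_negatives / (correct_negatives + false_alarms)
tss = sensitivity + specificity - 1   # true skill statistic, -1..1; 0 = no skill

print(f"accuracy={accuracy:.2f}, TSS={tss:.2f}")
```

Unlike raw accuracy, TSS is insensitive to how common outbreaks are, which is why it is the preferred skill measure for rare-event forecasts like disease outbreaks.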

  7. Assessing the hydrogeochemical processes affecting groundwater pollution in arid areas using an integration of geochemical equilibrium and multivariate statistical techniques.

    El Alfy, Mohamed; Lashin, Aref; Abdalla, Fathy; Al-Bassam, Abdulaziz

    2017-10-01

Rapid economic expansion poses serious problems for groundwater resources in arid areas, which typically have high rates of groundwater depletion. In this study, an integration of hydrochemical investigations involving chemical and statistical analyses is conducted to assess the factors controlling hydrochemistry and potential pollution in an arid region. Fifty-four groundwater samples were collected from the Dhurma aquifer in Saudi Arabia, and twenty-one physicochemical variables were examined for each sample. Spatial patterns of salinity and nitrate were mapped using fitted variograms. The nitrate spatial distribution shows that nitrate pollution is a persistent problem affecting a wide area of the aquifer. The hydrochemical investigations and cluster analysis reveal four significant clusters of groundwater zones. Five main factors were extracted, which explain >77% of the total data variance. These factors indicated that the chemical characteristics of the groundwater were influenced by rock-water interactions and anthropogenic factors. The identified clusters and factors were validated with hydrochemical investigations. The geogenic factors include the dissolution of various minerals (calcite, aragonite, gypsum, anhydrite, halite and fluorite) and ion exchange processes. The anthropogenic factors include the impact of irrigation return flows and the application of potassium, nitrate, and phosphate fertilizers. Over time, these anthropogenic factors will most likely contribute to further declines in groundwater quality. Copyright © 2017 Elsevier Ltd. All rights reserved.
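The factor-extraction step, keeping enough principal factors of the correlation matrix to explain a target share of variance (here >77%), can be sketched as follows; the hydrochemical data are synthetic stand-ins, not the Dhurma samples:

```python
import numpy as np

# Synthetic "hydrochemical" matrix: 54 samples, 8 variables driven by two
# underlying processes plus noise (invented for illustration).
rng = np.random.default_rng(1)
n, p = 54, 8
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
X = latent @ loadings + 0.3 * rng.normal(size=(n, p))

Z = (X - X.mean(0)) / X.std(0)              # standardize each variable
R = np.corrcoef(Z, rowvar=False)            # correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
explained = np.cumsum(eigvals) / p          # cumulative explained variance

k = int(np.searchsorted(explained, 0.77) + 1)   # factors needed for >77%
print(k, explained[:k].round(2))
```

In practice the retained factors would then be rotated and interpreted (geogenic vs anthropogenic), a step omitted in this sketch.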

  8. Downscaling of Open Coarse Precipitation Data through Spatial and Statistical Analysis, Integrating NDVI, NDWI, Elevation, and Distance from Sea

    Hicham Ezzine

    2017-01-01

This study aims to improve the statistical spatial downscaling of coarse precipitation (the TRMM 3B43 product) and to explore its limitations in the Mediterranean area. It was carried out in Morocco and was based on an open dataset including four predictors (NDVI, NDWI, DEM, and distance from sea) that explain the TRMM 3B43 product. For this purpose, four groups of models were established from different combinations of the four predictors, in order to compare NDVI-based with NDWI-based models on the one hand, and stepwise with multiple regression on the other. The models that gave the best approximations and best fits were used to downscale the TRMM 3B43 product. The resulting downscaled and calibrated precipitation estimates were validated against independent RGS. Aside from that, the limitations of the proposed approach were assessed in five bioclimatic stages, and the influence of the sea was analyzed in five classes of distance. The findings showed that the models built using NDVI and NDWI have a high correlation and can therefore be used to downscale precipitation. The integration of elevation and distance improved the correlation of the models. According to R2, RMSE, bias, and MAE, the study revealed a strong agreement between the downscaled precipitation and RGS measurements. In addition, the analysis showed that the contribution of the variable (distance from sea) is evident around the coastal area and decreases progressively inland. Likewise, the study demonstrated that the approach performs well in humid and arid bioclimatic stages compared to others.
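The downscaling idea, regressing coarse precipitation on the four predictors and then applying the fitted model to the same predictors at fine resolution, can be sketched as follows; all values and coefficients are synthetic, not TRMM 3B43 data:

```python
import numpy as np

# Synthetic coarse-cell training data for the four predictors used in the
# study: NDVI, NDWI, elevation (DEM) and distance from sea.
rng = np.random.default_rng(2)
n = 200
ndvi = rng.uniform(0, 1, n)
ndwi = rng.uniform(-0.5, 0.5, n)
dem  = rng.uniform(0, 3000, n)              # elevation (m)
dist = rng.uniform(0, 400, n)               # distance from sea (km)
precip = 300 + 400*ndvi + 150*ndwi + 0.05*dem - 0.4*dist + rng.normal(0, 20, n)

# multiple linear regression at coarse scale
X = np.column_stack([np.ones(n), ndvi, ndwi, dem, dist])
beta, *_ = np.linalg.lstsq(X, precip, rcond=None)

# apply the fitted model at a "fine-resolution" pixel where the predictors
# are known at, e.g., 1 km resolution
fine = np.array([1.0, 0.6, 0.1, 1200.0, 80.0])
print(round(float(fine @ beta), 1))         # downscaled precipitation estimate
```

A residual-correction step (interpolating the coarse-scale residuals and adding them back) would typically follow before validation against gauge stations.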

  9. Analysis of gene expression profiles of soft tissue sarcoma using a combination of knowledge-based filtering with integration of multiple statistics.

    Anna Takahashi

The diagnosis and treatment of soft tissue sarcomas (STS) have been difficult. Of the diverse histological subtypes, undifferentiated pleomorphic sarcoma (UPS) is particularly difficult to diagnose accurately, and its classification per se is still controversial. Recent advances in genomic technologies provide an excellent way to address such problems. However, it is often difficult, if not impossible, to identify definitive disease-associated genes using genome-wide analysis alone, primarily because of multiple testing problems. In the present study, we analyzed microarray data from 88 STS patients using a combination method that used knowledge-based filtering and a simulation based on the integration of multiple statistics to reduce multiple testing problems. We identified 25 genes, including hypoxia-related genes (e.g., MIF, SCD1, P4HA1, ENO1, and STAT1) and cell cycle- and DNA repair-related genes (e.g., TACC3, PRDX1, PRKDC, and H2AFY). These genes showed significant differential expression among histological subtypes, including UPS, and showed associations with overall survival. STAT1 showed a strong association with overall survival in UPS patients (log-rank p = 1.84 × 10⁻⁶; adjusted p-value 2.99 × 10⁻³ after the permutation test). According to the literature, the 25 genes selected are useful not only as markers of differential diagnosis but also as prognostic/predictive markers and/or therapeutic targets for STS. Our combination method can identify genes that are potential prognostic/predictive factors and/or therapeutic targets in STS and possibly in other cancers. These disease-associated genes deserve further preclinical and clinical validation.

  10. ESB-Based Sensor Web Integration for the Prediction of Electric Power Supply System Vulnerability

    Stoimenov, Leonid; Bogdanovic, Milos; Bogdanovic-Dinic, Sanja

    2013-01-01

Electric power supply companies increasingly rely on enterprise IT systems to provide them with a comprehensive view of the state of the distribution network. Within a utility-wide network, enterprise IT systems collect data from various metering devices. Such data can be effectively used for the prediction of power supply network vulnerability. The purpose of this paper is to present the Enterprise Service Bus (ESB)-based Sensor Web integration solution that we have developed to enable prediction of power supply network vulnerability, in terms of the defect probability of a particular network element. We give an example of its usage and demonstrate our vulnerability prediction model on data collected from two different power supply companies. The proposed solution is an extension of the GinisSense Sensor Web-based architecture for collecting, processing, analyzing, decision making and alerting based on the data received from heterogeneous data sources. In this case, GinisSense has been upgraded to operate in an ESB environment and to combine Sensor Web and GIS technologies for the prediction of electric power supply system vulnerability. Aside from electrical values, the proposed solution gathers ambient values from additional sensors installed in the existing power supply network infrastructure. GinisSense aggregates the gathered data according to an adapted Omnibus data fusion model and applies decision-making logic to the aggregated data. Detected vulnerabilities are visualized to end-users by means of a specialized Web GIS application. PMID:23955435

  11. Improved prediction of drug-target interactions using regularized least squares integrating with kernel fusion technique

    Hao, Ming; Wang, Yanli, E-mail: ywang@ncbi.nlm.nih.gov; Bryant, Stephen H., E-mail: bryant@ncbi.nlm.nih.gov

    2016-02-25

Identification of drug-target interactions (DTI) is a central task in drug discovery processes. In this work, a simple but effective regularized least squares algorithm integrating nonlinear kernel fusion (RLS-KF) is proposed to perform DTI predictions. Using benchmark DTI datasets, our proposed algorithm achieves state-of-the-art results, with areas under the precision–recall curve (AUPR) of 0.915, 0.925, 0.853 and 0.909 for enzymes, ion channels (IC), G protein-coupled receptors (GPCR) and nuclear receptors (NR), based on 10-fold cross-validation. The performance can be further improved by using a recalculated kernel matrix, especially for the small set of nuclear receptors, with an AUPR of 0.945. Importantly, most of the top-ranked interaction predictions can be validated by experimental data reported in the literature, bioassay results in the PubChem BioAssay database, and other previous studies. Our analysis suggests that the proposed RLS-KF is helpful for studying DTI, drug repositioning and polypharmacology, and may help to accelerate drug discovery by identifying novel drug targets. - Graphical abstract: Flowchart of the proposed RLS-KF algorithm for drug-target interaction predictions. - Highlights: • A nonlinear kernel fusion algorithm is proposed to perform drug-target interaction predictions. • Performance can be further improved by using the recalculated kernel. • Top predictions can be validated by experimental data.
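A minimal sketch of regularized least squares on a fused kernel, in the spirit of RLS-KF; the paper's nonlinear fusion is more elaborate, whereas here the fusion is a simple convex combination of two invented kernels:

```python
import numpy as np

# Two positive semi-definite kernels over 30 hypothetical drugs, e.g. one
# from chemical structure and one from target-profile similarity.
rng = np.random.default_rng(3)
n = 30
A = rng.normal(size=(n, 5))
K1 = A @ A.T                                    # "structure" kernel
B = rng.normal(size=(n, 5))
K2 = B @ B.T                                    # "target-profile" kernel

w = 0.5
K = w * K1 + (1 - w) * K2                       # fused kernel (linear stand-in)

# sparse binary drug-target interaction matrix (4 hypothetical targets)
Y = (rng.uniform(size=(n, 4)) < 0.1).astype(float)

# regularized least squares prediction scores: F = K (K + sigma I)^-1 Y
sigma = 1.0
F = K @ np.linalg.solve(K + sigma * np.eye(n), Y)

print(F.shape)   # one score per drug-target pair
```

Ranking the entries of F high-to-low yields candidate interactions, the analogue of the paper's top-ranked predictions that were checked against PubChem BioAssay data.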

  12. ESB-Based Sensor Web Integration for the Prediction of Electric Power Supply System Vulnerability

    Milos Bogdanovic

    2013-08-01

Electric power supply companies increasingly rely on enterprise IT systems to provide them with a comprehensive view of the state of the distribution network. Within a utility-wide network, enterprise IT systems collect data from various metering devices. Such data can be effectively used for the prediction of power supply network vulnerability. The purpose of this paper is to present the Enterprise Service Bus (ESB)-based Sensor Web integration solution that we have developed to enable prediction of power supply network vulnerability, in terms of the defect probability of a particular network element. We give an example of its usage and demonstrate our vulnerability prediction model on data collected from two different power supply companies. The proposed solution is an extension of the GinisSense Sensor Web-based architecture for collecting, processing, analyzing, decision making and alerting based on the data received from heterogeneous data sources. In this case, GinisSense has been upgraded to operate in an ESB environment and to combine Sensor Web and GIS technologies for the prediction of electric power supply system vulnerability. Aside from electrical values, the proposed solution gathers ambient values from additional sensors installed in the existing power supply network infrastructure. GinisSense aggregates the gathered data according to an adapted Omnibus data fusion model and applies decision-making logic to the aggregated data. Detected vulnerabilities are visualized to end-users by means of a specialized Web GIS application.

  13. ESB-based Sensor Web integration for the prediction of electric power supply system vulnerability.

    Stoimenov, Leonid; Bogdanovic, Milos; Bogdanovic-Dinic, Sanja

    2013-08-15

Electric power supply companies increasingly rely on enterprise IT systems to provide them with a comprehensive view of the state of the distribution network. Within a utility-wide network, enterprise IT systems collect data from various metering devices. Such data can be effectively used for the prediction of power supply network vulnerability. The purpose of this paper is to present the Enterprise Service Bus (ESB)-based Sensor Web integration solution that we have developed to enable prediction of power supply network vulnerability, in terms of the defect probability of a particular network element. We give an example of its usage and demonstrate our vulnerability prediction model on data collected from two different power supply companies. The proposed solution is an extension of the GinisSense Sensor Web-based architecture for collecting, processing, analyzing, decision making and alerting based on the data received from heterogeneous data sources. In this case, GinisSense has been upgraded to operate in an ESB environment and to combine Sensor Web and GIS technologies for the prediction of electric power supply system vulnerability. Aside from electrical values, the proposed solution gathers ambient values from additional sensors installed in the existing power supply network infrastructure. GinisSense aggregates the gathered data according to an adapted Omnibus data fusion model and applies decision-making logic to the aggregated data. Detected vulnerabilities are visualized to end-users by means of a specialized Web GIS application.

  14. Improved prediction of drug-target interactions using regularized least squares integrating with kernel fusion technique

    Hao, Ming; Wang, Yanli; Bryant, Stephen H.

    2016-01-01

Identification of drug-target interactions (DTI) is a central task in drug discovery processes. In this work, a simple but effective regularized least squares algorithm integrating nonlinear kernel fusion (RLS-KF) is proposed to perform DTI predictions. Using benchmark DTI datasets, our proposed algorithm achieves state-of-the-art results, with areas under the precision–recall curve (AUPR) of 0.915, 0.925, 0.853 and 0.909 for enzymes, ion channels (IC), G protein-coupled receptors (GPCR) and nuclear receptors (NR), based on 10-fold cross-validation. The performance can be further improved by using a recalculated kernel matrix, especially for the small set of nuclear receptors, with an AUPR of 0.945. Importantly, most of the top-ranked interaction predictions can be validated by experimental data reported in the literature, bioassay results in the PubChem BioAssay database, and other previous studies. Our analysis suggests that the proposed RLS-KF is helpful for studying DTI, drug repositioning and polypharmacology, and may help to accelerate drug discovery by identifying novel drug targets. - Graphical abstract: Flowchart of the proposed RLS-KF algorithm for drug-target interaction predictions. - Highlights: • A nonlinear kernel fusion algorithm is proposed to perform drug-target interaction predictions. • Performance can be further improved by using the recalculated kernel. • Top predictions can be validated by experimental data.

  15. Fast integration-based prediction bands for ordinary differential equation models.

    Hass, Helge; Kreutz, Clemens; Timmer, Jens; Kaschek, Daniel

    2016-04-15

To gain a deeper understanding of biological processes and their relevance in disease, mathematical models are built upon experimental data. Uncertainty in the data leads to uncertainties in the model's parameters and, in turn, in its predictions. Mechanistic dynamic models of biochemical networks are frequently based on nonlinear differential equation systems and feature a large number of parameters, sparse observations of the model components and a lack of information in the available data. Due to the curse of dimensionality, classical and sampling approaches that propagate parameter uncertainties to predictions are hardly feasible and insufficient. However, for experimental design and to discriminate between competing models, prediction and confidence bands are essential. To circumvent the hurdles of the former methods, an approach to calculate a profile likelihood on arbitrary observations for a specific time point has been introduced, which provides accurate confidence and prediction intervals for nonlinear models and is computationally feasible for high-dimensional models. In this article, reliable and smooth point-wise prediction and confidence bands to assess the model's uncertainty over the whole time-course are achieved via explicit integration with elaborate correction mechanisms. The corresponding system of ordinary differential equations is derived and tested on three established models for cellular signalling. An efficiency analysis illustrates the computational benefit compared with repeated profile likelihood calculations at multiple time points. Availability: The integration framework and the examples used in this article are provided with the software package Data2Dynamics, which is based on MATLAB and freely available at http://www.data2dynamics.org. Contact: helge.hass@fdm.uni-freiburg.de. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
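The classical sampling baseline that the authors contrast with, drawing parameters from their uncertainty, integrating the ODE for each draw and taking pointwise quantiles, can be sketched on a stand-in logistic model (Gaussian parameter uncertainty and Euler integration are illustrative assumptions, not the paper's method):

```python
import random

random.seed(4)

def simulate(r, K, x0=0.1, dt=0.05, steps=200):
    """Euler integration of logistic growth dx/dt = r x (1 - x/K)."""
    x, traj = x0, []
    for _ in range(steps):
        x += dt * r * x * (1 - x / K)
        traj.append(x)
    return traj

# propagate parameter uncertainty: one trajectory per parameter draw
draws = [simulate(random.gauss(1.0, 0.1), random.gauss(1.0, 0.05))
         for _ in range(200)]

# pointwise ~95% prediction band at one time index
t = 100
values = sorted(traj[t] for traj in draws)
lo, hi = values[5], values[-6]
print(round(lo, 3), round(hi, 3))
```

For a high-dimensional signalling model, repeating this for enough draws to get smooth bands becomes prohibitively expensive, which is the bottleneck the integration-based approach addresses.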

  16. Predicting Athletes’ Pre-Exercise Fluid Intake: A Theoretical Integration Approach

    Chunxiao Li

    2018-05-01

Pre-exercise fluid intake is an important health behavior for maintaining athletes' sports performance and health. However, athletes' behavioral adherence to fluid intake and its underlying psychological mechanisms have not been investigated. This prospective study used a health psychology model that integrates the self-determination theory and the theory of planned behavior to understand pre-exercise fluid intake among athletes. Participants (n = 179) were athletes from college sport teams who completed surveys at two time points. The baseline (Time 1) assessment comprised the psychological variables of the integrated model (i.e., autonomous and controlled motivation, attitude, subjective norm, perceived behavioral control, and intention), and fluid intake (i.e., behavior) was measured prospectively at one month (Time 2). Path analysis showed that the positive association between autonomous motivation and intention was mediated by subjective norm and perceived behavioral control. Controlled motivation positively predicted subjective norm. Intention positively predicted pre-exercise fluid intake behavior. Overall, the pattern of results was generally consistent with the integrated model, suggesting that athletes' pre-exercise fluid intake behaviors are associated with the motivational and social-cognitive factors of the model. These findings could inform coaches and sport scientists seeking to promote athletes' pre-exercise fluid intake.
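The mediation logic of the path analysis, an indirect effect estimated as the product of the predictor-to-mediator and mediator-to-outcome path coefficients, can be sketched with synthetic data; two OLS fits stand in for the full path model, and all coefficients are invented:

```python
import numpy as np

# Synthetic data mimicking the study's structure: motivation -> subjective
# norm (mediator) -> intention, with true paths a = 0.5 and b = 0.6.
rng = np.random.default_rng(5)
n = 179
motivation = rng.normal(size=n)
norm = 0.5 * motivation + rng.normal(scale=0.8, size=n)
intention = 0.6 * norm + 0.1 * motivation + rng.normal(scale=0.8, size=n)

# path a: mediator regressed on predictor
a = np.polyfit(motivation, norm, 1)[0]

# path b: outcome regressed on mediator, controlling for the predictor
Xb = np.column_stack([np.ones(n), norm, motivation])
b = np.linalg.lstsq(Xb, intention, rcond=None)[0][1]

indirect = a * b          # mediated (indirect) effect of motivation
print(round(float(indirect), 2))
```

In the actual study this product-of-coefficients idea is embedded in SEM, with bootstrap or delta-method standard errors for the indirect effect.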

  17. Stock return predictability and market integration: The role of global and local information

    David G. McMillan

    2016-12-01

This paper examines the predictability of a range of international stock markets, allowing for the presence of both local and global predictive factors. Recent research has argued that US returns have predictive power for international stock returns. We expand this line of research, following work on market integration, to include a more general definition of the global factor based on principal components analysis. Results identify three global expected-returns factors: one related to the major stock markets of the US, UK and Asia, one related to the other markets analysed, and a third related to dividend growth. A single dominant realised-returns factor is also noted. A forecasting exercise comparing the principal-components-based factors with a US return factor, local-market-only factors and the historical mean benchmark finds supportive evidence for the former approach. It is hoped that the results of this paper will be informative on three counts: first, to academics interested in understanding the dynamics of asset price movements; second, to market participants who aim to time the market and engage in portfolio and risk management; and third, to those (policy makers and others) who are interested in linkages across international markets and the nature and degree of integration.
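Extracting a global factor from a panel of market returns by principal components, and then using it in a one-step predictive regression, can be sketched as follows; the returns are simulated, not the paper's index data:

```python
import numpy as np

# Simulated panel of monthly returns for 8 markets sharing one global factor.
rng = np.random.default_rng(6)
T, m = 240, 8
global_f = rng.normal(size=T)
returns = (np.outer(global_f, rng.uniform(0.5, 1.5, m))
           + rng.normal(scale=1.0, size=(T, m)))

# first principal component of the demeaned panel = estimated global factor
Z = returns - returns.mean(0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
factor = U[:, 0] * s[0]

share = s[0]**2 / (s**2).sum()       # variance explained by the global factor
print(round(float(share), 2))

# predictive regression: next-month return of market 0 on today's factor
y, x = returns[1:, 0], factor[:-1]
slope = np.polyfit(x, y, 1)[0]
print(round(float(slope), 3))
```

With iid simulated returns the predictive slope is near zero; in the paper the analogous regressions are the forecasting exercise compared against the historical mean benchmark.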

  18. The effects of clinical and statistical heterogeneity on the predictive values of results from meta-analyses

    Melsen, W G; Rovers, M M; Bonten, M J M; Bootsma, M C J|info:eu-repo/dai/nl/304830305

    Variance between studies in a meta-analysis will exist. This heterogeneity may be of clinical, methodological or statistical origin. The last of these is quantified by the I(2)-statistic. We investigated, using simulated studies, the accuracy of I(2) in the assessment of heterogeneity and the
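
    For reference, the I(2) statistic is derived from Cochran's Q under inverse-variance weighting; a minimal sketch with synthetic effect sizes (not the paper's simulation setup):

    ```python
    import numpy as np

    def i_squared(effects, variances):
        """Cochran's Q and the I(2) heterogeneity statistic for a
        fixed-effect meta-analysis (inverse-variance weighting)."""
        e = np.asarray(effects, dtype=float)
        w = 1.0 / np.asarray(variances, dtype=float)   # inverse-variance weights
        theta = np.sum(w * e) / np.sum(w)              # pooled estimate
        q = np.sum(w * (e - theta) ** 2)               # Cochran's Q
        df = len(e) - 1
        i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
        return q, i2
    ```

    Identical study effects give Q = 0 and I(2) = 0%, while strongly conflicting precise studies push I(2) toward 100%.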

  19. Predicting Acquisition of Learning Outcomes: A Comparison of Traditional and Activity-Based Instruction in an Introductory Statistics Course.

    Geske, Jenenne A.; Mickelson, William T.; Bandalos, Deborah L.; Jonson, Jessica; Smith, Russell W.

    The bulk of experimental research related to reforms in the teaching of statistics concentrates on the effects of alternative teaching methods on statistics achievement. This study expands on that research by including an examination of the effects of instructor and the interaction between instructor and method on achievement as well as attitudes,…

  20. Integration of RNA-Seq and RPPA data for survival time prediction in cancer patients.

    Isik, Zerrin; Ercan, Muserref Ece

    2017-10-01

    Integration of several types of patient data in a computational framework can accelerate the identification of more reliable biomarkers, especially for prognostic purposes. This study aims to identify biomarkers that can successfully predict the potential survival time of a cancer patient by integrating the transcriptomic (RNA-Seq), proteomic (RPPA), and protein-protein interaction (PPI) data. The proposed method -RPBioNet- employs a random walk-based algorithm that works on a PPI network to identify a limited number of protein biomarkers. Later, the method uses gene expression measurements of the selected biomarkers to train a classifier for the survival time prediction of patients. RPBioNet was applied to classify kidney renal clear cell carcinoma (KIRC), glioblastoma multiforme (GBM), and lung squamous cell carcinoma (LUSC) patients based on their survival time classes (long- or short-term). The RPBioNet method correctly identified the survival time classes of patients with between 66% and 78% average accuracy for three data sets. RPBioNet operates with only 20 to 50 biomarkers and can achieve on average 6% higher accuracy compared to the closest alternative method, which uses only RNA-Seq data in the biomarker selection. Further analysis of the most predictive biomarkers highlighted genes that are common for both cancer types, as they may be driver proteins responsible for cancer progression. The novelty of this study is the integration of a PPI network with mRNA and protein expression data to identify more accurate prognostic biomarkers that can be used for clinical purposes in the future. Copyright © 2017 Elsevier Ltd. All rights reserved.
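
    The network propagation step of a method like RPBioNet can be illustrated with a generic random-walk-with-restart scorer on a PPI adjacency matrix; the algorithm below is a standard sketch, not the actual RPBioNet implementation or its parameters:

    ```python
    import numpy as np

    def random_walk_with_restart(A, seeds, restart=0.3, tol=1e-8, max_iter=1000):
        """Score network nodes by a random walk with restart from seed nodes.
        A is a symmetric adjacency matrix without isolated nodes; seeds might
        encode, e.g., differentially expressed proteins. Top-scoring nodes
        become candidate biomarkers."""
        W = A / A.sum(axis=0, keepdims=True)   # column-stochastic transitions
        p0 = np.zeros(A.shape[0])
        p0[seeds] = 1.0 / len(seeds)
        p = p0.copy()
        for _ in range(max_iter):
            p_next = (1 - restart) * (W @ p) + restart * p0
            if np.abs(p_next - p).sum() < tol:
                return p_next
            p = p_next
        return p
    ```

    The restart mass keeps scores concentrated near the seeds, which is what limits the selection to a small biomarker panel (20 to 50 proteins in the study).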

  1. Integration of Tuyere, Raceway and Shaft Models for Predicting Blast Furnace Process

    Fu, Dong; Tang, Guangwu; Zhao, Yongfu; D'Alessio, John; Zhou, Chenn Q.

    2018-06-01

    A novel modeling strategy is presented for simulating the blast furnace iron-making process. The physical and chemical phenomena involved take place across a wide range of length and time scales, so three models are developed to simulate different regions of the blast furnace, i.e., the tuyere model, the raceway model and the shaft model. This paper focuses on the integration of the three models to predict the entire blast furnace process. Mapping outputs and inputs between models and an iterative scheme are developed to establish communication between the models. The effects of tuyere operation and burden distribution on blast furnace fuel efficiency are investigated numerically. The integration of different models provides a way to realistically simulate the blast furnace by improving the modeling resolution of local phenomena and minimizing the model assumptions.
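
    The output-to-input mapping with an iterative scheme can be sketched as a generic fixed-point coupling loop; the two toy sub-models below are illustrative stand-ins, not the actual tuyere, raceway and shaft models:

    ```python
    def couple_models(models, state, tol=1e-6, max_iter=100):
        """Iteratively run sub-models in sequence (e.g. tuyere -> raceway ->
        shaft), mapping each model's outputs onto the shared interface state,
        until that state reaches a fixed point, i.e. the models agree at
        their boundaries."""
        for _ in range(max_iter):
            prev = dict(state)
            for model in models:
                state.update(model(state))   # outputs become the next inputs
            if all(abs(state[k] - prev[k]) < tol for k in prev):
                return state
        raise RuntimeError("model coupling did not converge")
    ```

    With contractive sub-models the loop converges to the self-consistent interface state; in practice each `model(state)` call would wrap a full CFD simulation of its region.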

  2. Predicting 2D target velocity cannot help 2D motion integration for smooth pursuit initiation.

    Montagnini, Anna; Spering, Miriam; Masson, Guillaume S

    2006-12-01

    Smooth pursuit eye movements reflect the temporal dynamics of bidimensional (2D) visual motion integration. When tracking a single, tilted line, initial pursuit direction is biased toward unidimensional (1D) edge motion signals, which are orthogonal to the line orientation. Over 200 ms, tracking direction is slowly corrected to finally match the 2D object motion during steady-state pursuit. We now show that repetition of line orientation and/or motion direction neither eliminates the transient tracking direction error nor changes the time course of pursuit correction. Nonetheless, multiple successive presentations of a single orientation/direction condition elicit robust anticipatory pursuit eye movements that always go in the 2D object motion direction, not the 1D edge motion direction. These results demonstrate that predictive signals about target motion cannot be used for an efficient integration of ambiguous velocity signals at pursuit initiation.

  3. Sensory processing patterns predict the integration of information held in visual working memory.

    Lowe, Matthew X; Stevenson, Ryan A; Wilson, Kristin E; Ouslis, Natasha E; Barense, Morgan D; Cant, Jonathan S; Ferber, Susanne

    2016-02-01

    Given the limited resources of visual working memory, multiple items may be remembered as an averaged group or ensemble. As a result, local information may be ill-defined, but these ensemble representations provide accurate diagnostics of the natural world by combining gist information with item-level information held in visual working memory. Some neurodevelopmental disorders are characterized by sensory processing profiles that predispose individuals to avoid or seek out sensory stimulation, fundamentally altering their perceptual experience. Here, we report that such processing styles affect the computation of ensemble statistics in the general population. We identified stable adult sensory processing patterns to demonstrate that individuals with low sensory thresholds, who show a greater proclivity to engage in active response strategies to prevent sensory overstimulation, are less likely to integrate mean size information across a set of similar items and are therefore more likely to be biased away from the mean size representation of an ensemble display. We therefore propose that the study of ensemble processing should extend beyond the statistics of the display and should also consider the statistics of the observer. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  4. A Distributed Model Predictive Control approach for the integration of flexible loads, storage and renewables

    Ferrarini, Luca; Mantovani, Giancarlo; Costanzo, Giuseppe Tommaso

    2014-01-01

    This paper presents an innovative solution based on distributed model predictive controllers to integrate the control and management of energy consumption, energy storage, PV and wind generation at customer side. The overall goal is to enable an advanced prosumer to autoproduce part of the energy...... he needs with renewable sources and, at the same time, to optimally exploit the thermal and electrical storages, to trade off its comfort requirements with different pricing schemes (including real-time pricing), and apply optimal control techniques rather than sub-optimal heuristics....

  5. Model Predictive Control of Grid Connected Modular Multilevel Converter for Integration of Photovoltaic Power Systems

    Hajizadeh, Amin; Shahirinia, Amir

    2017-01-01

    An advanced control structure for the integration of photovoltaic power systems through a Grid Connected-Modular Multilevel Converter (GC-MMC) is proposed in this paper. To achieve this goal, a non-linear model of the MMC considering negative and positive sequence components has...... been presented. Then, due to the existence of unbalanced voltage faults in the distribution grid and non-linearities and uncertainties in the model, a model predictive controller is developed for the GC-MMC. It is implemented based upon positive and negative components of voltage and current to mitigate the power...

  6. An integrated numerical model for the prediction of Gaussian and billet shapes

    Hattel, Jesper; Pryds, Nini; Pedersen, Trine Bjerre

    2004-01-01

    Separate models for the atomisation and the deposition stages were recently integrated by the authors to form a unified model describing the entire spray-forming process. In the present paper, the focus is on describing the shape of the deposited material during the spray-forming process, obtained...... by this model. After a short review of the models and their coupling, the important factors which influence the resulting shape, i.e. Gaussian or billet, are addressed. The key parameters, which are utilized to predict the geometry and dimension of the deposited material, are the sticking efficiency...

  7. On predicting quantal cross sections by interpolation: Surprisal analysis of j/sub z/CCS and statistical j/sub z/ results

    Goldflam, R.; Kouri, D.J.

    1976-01-01

    New methods for predicting the full matrix of integral cross sections are developed by combining the surprisal analysis of Bernstein and Levine with the j/sub z/-conserving coupled states method (j/sub z/CCS) of McGuire, Kouri, and Pack and with the statistical j/sub z/ approximation (Sj/sub z/) of Kouri, Shimoni, and Heil. A variety of approaches is possible and only three are studied in the present work. These are (a) a surprisal fit of the j=0→j' column of the j/sub z/CCS cross section matrix (thereby requiring only a solution of the lambda=0 set of j/sub z/CCS equations), (b) a surprisal fit of the lambda-bar=0 Sj/sub z/ cross section matrix (again requiring solution of the lambda=0 set of j/sub z/CCS equations only), and (c) a surprisal fit of a lambda-bar not equal to 0 Sj/sub z/ submatrix (involving input cross sections for j,j'> or =lambda-bar transitions only). The last approach requires the solution of the lambda=lambda-bar set of j/sub z/CCS equations only, which requires less computation effort than the effective potential method. We explore three different choices for the prior and two-parameter (i.e., linear) and three-parameter (i.e., parabolic) fits as applied to Ar--N 2 collisions. The results are in general very encouraging and for one choice of prior give results which are within 20% of the exact j/sub z/CCS results
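
    The surprisal-fit idea (fit -ln(sigma/prior) with a two-parameter linear or three-parameter parabolic form, then invert the fit to predict unmeasured cross sections) can be sketched generically; the gap variable, prior choice and synthetic data below are purely illustrative, not the Ar-N2 results of the paper:

    ```python
    import numpy as np

    def surprisal_fit_predict(dj, sigma, prior, dj_new, prior_new, degree=1):
        """Fit the surprisal -ln(sigma/prior) as a low-order polynomial in
        the transition gap (degree=1: linear two-parameter fit; degree=2:
        parabolic three-parameter fit), then invert the fit to predict
        cross sections for unmeasured transitions."""
        surprisal = -np.log(np.asarray(sigma) / np.asarray(prior))
        coef = np.polyfit(dj, surprisal, degree)
        return np.asarray(prior_new) * np.exp(-np.polyval(coef, dj_new))
    ```

    Only the fitted column of the cross-section matrix needs to be computed exactly; the rest of the matrix is filled in from the inverted surprisal fit.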

  8. Predicting sugar consumption: Application of an integrated dual-process, dual-phase model.

    Hagger, Martin S; Trost, Nadine; Keech, Jacob J; Chan, Derwin K C; Hamilton, Kyra

    2017-09-01

    Excess consumption of added dietary sugars is related to multiple metabolic problems and adverse health conditions. Identifying the modifiable social cognitive and motivational constructs that predict sugar consumption is important to inform behavioral interventions aimed at reducing sugar intake. We tested the efficacy of an integrated dual-process, dual-phase model derived from multiple theories to predict sugar consumption. Using a prospective design, university students (N = 90) completed initial measures of the reflective (autonomous and controlled motivation, intentions, attitudes, subjective norm, perceived behavioral control), impulsive (implicit attitudes), volitional (action and coping planning), and behavioral (past sugar consumption) components of the proposed model. Self-reported sugar consumption was measured two weeks later. A structural equation model revealed that intentions, implicit attitudes, and, indirectly, autonomous motivation to reduce sugar consumption had small, significant effects on sugar consumption. Attitudes, subjective norm, and, indirectly, autonomous motivation to reduce sugar consumption predicted intentions. There were no effects of the planning constructs. Model effects were independent of the effects of past sugar consumption. The model identified the relative contribution of reflective and impulsive components in predicting sugar consumption. Given the prominent role of the impulsive component, interventions that assist individuals in managing cues-to-action and behavioral monitoring are likely to be effective in regulating sugar consumption. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. [Prediction of regional soil quality based on mutual information theory integrated with decision tree algorithm].

    Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu

    2012-02-01

    In this paper, to precisely obtain the spatial distribution characteristics of regional soil quality, the main factors that affect soil quality, such as soil type, land use pattern, lithology type, topography, road, and industry type, were considered: mutual information theory was adopted to select the main environmental factors, and the See5.0 decision tree algorithm was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was obviously higher than that of the model with all variables; for the former model, whether based on decision trees or decision rules, the prediction accuracy was higher than 80%. For both continuous and categorical data, mutual information theory integrated with a decision tree could not only reduce the number of input parameters for the decision tree algorithm, but also predict and assess regional soil quality effectively.
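
    The mutual information ranking step can be sketched with an empirical estimator on binned data (a generic numpy illustration; the See5.0 classifier and the actual GIS layers are not reproduced):

    ```python
    import numpy as np

    def mutual_information(x, y, bins=8):
        """Empirical mutual information (nats) between a binned continuous
        environmental factor x and a categorical soil-quality grade y."""
        xb = np.digitize(x, np.histogram_bin_edges(x, bins))
        y = np.asarray(y)
        joint = np.zeros((xb.max() + 1, y.max() + 1))
        for a, b in zip(xb, y):
            joint[a, b] += 1
        p = joint / joint.sum()
        px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
        mask = p > 0
        return float((p[mask] * np.log(p[mask] / (px @ py)[mask])).sum())
    ```

    Factors scoring highest against the soil-quality grade would be retained as decision tree inputs, while near-zero scorers are dropped, which is what reduces the number of input parameters.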

  10. Statistical prediction of seasonal discharge in Central Asia for water resources management: development of a generic (pre-)operational modeling tool

    Apel, Heiko; Baimaganbetov, Azamat; Kalashnikova, Olga; Gavrilenko, Nadejda; Abdykerimova, Zharkinay; Agalhanova, Marina; Gerlitz, Lars; Unger-Shayesteh, Katy; Vorogushyn, Sergiy; Gafurov, Abror

    2017-04-01

    for robustness by a leave-one-out cross validation. Based on the cross validation the predictive uncertainty was quantified for every prediction model. According to the official procedures of the hydromet services forecasts of the mean seasonal discharge of the period April to September are derived every month starting from January until June. The application of the model for several catchments in Central Asia - ranging from small to the largest rivers - for the period 2000-2015 provided skillful forecasts for most catchments already in January. The skill of the prediction increased every month, with R2 values often in the range 0.8 - 0.9 in April just before the prediction period. The forecasts further improve in the following months, most likely due to the integration of spring precipitation, which is not included in the predictors before May, or spring discharge, which contains indicative information for the overall seasonal discharge. In summary, the proposed generic automatic forecast model development tool provides robust predictions for seasonal water availability in Central Asia, which will be tested against the official forecasts in the upcoming years, with the vision of eventual operational implementation.
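
    The leave-one-out cross validation used to test robustness and quantify predictive uncertainty can be sketched for a least-squares forecast model; the predictors here are placeholders, not the actual precipitation or discharge series:

    ```python
    import numpy as np

    def loocv_r2(X, y):
        """Leave-one-out cross-validation of a least-squares forecast:
        each year is predicted from a model fitted on all other years,
        and skill is summarised as R^2 of the held-out predictions."""
        n = len(y)
        A = np.column_stack([np.ones(n), X])   # intercept + predictor(s)
        preds = np.empty(n)
        for i in range(n):
            mask = np.arange(n) != i
            coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
            preds[i] = A[i] @ coef
        ss_res = np.sum((y - preds) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return 1.0 - ss_res / ss_tot
    ```

    Because every prediction is made on a year the model never saw, the resulting R^2 is an honest estimate of out-of-sample skill, matching the 0.8-0.9 values reported for the April forecasts.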

  11. Integrated prediction based on GIS for sandstone-type uranium deposits in the northwest of Ordos Basin

    Han Shaoyang; Ke Dan; Hu Shuiqing; Guo Qingyin; Hou Huiqun

    2005-01-01

    The integrated prediction model of sandstone-type uranium deposits and its integrated evaluation methods as well as flow of the work based on GIS are studied. A software for extracting metallogenic information is also developed. A multi-source exploring information database is established in the northwest of Ordos Basin, and an integrated digital mineral deposit prospecting model of sandstone-type uranium deposits is designed based on GIS. The authors have completed metallogenic information extraction and integrated evaluation of sandstone-type uranium deposits based on GIS in the study area. Research results prove that the integrated prediction of sandstone-type uranium deposits based on GIS may further delineate prospective target areas rapidly and improve the predictive precision. (authors)

  12. Modelling short- and long-term statistical learning of music as a process of predictive entropy reduction

    Hansen, Niels Christian; Loui, Psyche; Vuust, Peter

    Statistical learning underlies the generation of expectations with different degrees of uncertainty. In music, uncertainty applies to expectations for pitches in a melody. This uncertainty can be quantified by Shannon entropy from distributions of expectedness ratings for multiple continuations o...
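
    The entropy quantification can be sketched directly from a distribution of expectedness ratings for candidate continuations:

    ```python
    import numpy as np

    def shannon_entropy(ratings):
        """Shannon entropy (bits) of a normalised distribution of
        expectedness ratings for candidate melodic continuations;
        higher entropy means a more uncertain expectation."""
        p = np.asarray(ratings, dtype=float)
        p = p / p.sum()
        p = p[p > 0]                      # 0 * log(0) contributes nothing
        return float(-(p * np.log2(p)).sum())
    ```

    Uniform ratings over eight continuations give the maximal 3 bits, while a single certain continuation gives 0 bits.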

  13. Thought insertion as a self-disturbance: An integration of predictive coding and phenomenological approaches

    Philipp Sterzer

    2016-10-01

    Full Text Available Current theories in the framework of hierarchical predictive coding propose that positive symptoms of schizophrenia, such as delusions and hallucinations, arise from an alteration in Bayesian inference, the term inference referring to a process by which learned predictions are used to infer probable causes of sensory data. However, for one particularly striking and frequent symptom of schizophrenia, thought insertion, no plausible account has been proposed in terms of the predictive-coding framework. Here we propose that thought insertion is due to an altered experience of thoughts as coming from nowhere, as is already indicated by the early 20th century phenomenological accounts by the early Heidelberg School of psychiatry. These accounts identified thought insertion as one of the self-disturbances (from German: Ichstörungen) of schizophrenia and used mescaline as a model-psychosis in healthy individuals to explore the possible mechanisms. The early Heidelberg School (Gruhle, Mayer-Gross, Beringer) first named and defined the self-disturbances, and proposed that thought insertion involves a disruption of the inner connectedness of thoughts and experiences, and a becoming sensory of those thoughts experienced as inserted. This account offers a novel way to integrate the phenomenology of thought insertion with the predictive coding framework. We argue that the altered experience of thoughts may be caused by a reduced precision of context-dependent predictions, relative to sensory precision. According to the principles of Bayesian inference, this reduced precision leads to increased prediction-error signals evoked by the neural activity that encodes thoughts. Thus, in analogy with the prediction-error related aberrant salience of external events that has been proposed previously, internal events such as thoughts (including volitions, emotions and memories) can also be associated with increased prediction-error signaling and are thus imbued with

  14. Integrating sequence stratigraphy and rock-physics to interpret seismic amplitudes and predict reservoir quality

    Dutta, Tanima

    This dissertation focuses on the link between seismic amplitudes and reservoir properties. Prediction of reservoir properties, such as sorting, sand/shale ratio, and cement-volume from seismic amplitudes improves by integrating knowledge from multiple disciplines. The key contribution of this dissertation is to improve the prediction of reservoir properties by integrating sequence stratigraphy and rock physics. Sequence stratigraphy has been successfully used for qualitative interpretation of seismic amplitudes to predict reservoir properties. Rock physics modeling allows quantitative interpretation of seismic amplitudes. However, often there is uncertainty about selecting geologically appropriate rock physics model and its input parameters, away from the wells. In the present dissertation, we exploit the predictive power of sequence stratigraphy to extract the spatial trends of sedimentological parameters that control seismic amplitudes. These spatial trends of sedimentological parameters can serve as valuable constraints in rock physics modeling, especially away from the wells. Consequently, rock physics modeling, integrated with the trends from sequence stratigraphy, become useful for interpreting observed seismic amplitudes away from the wells in terms of underlying sedimentological parameters. We illustrate this methodology using a comprehensive dataset from channelized turbidite systems, deposited in minibasin settings in the offshore Equatorial Guinea, West Africa. First, we present a practical recipe for using closed-form expressions of effective medium models to predict seismic velocities in unconsolidated sandstones. We use an effective medium model that combines perfectly rough and smooth grains (the extended Walton model), and use that model to derive coordination number, porosity, and pressure relations for P and S wave velocities from experimental data. Our recipe provides reasonable fits to other experimental and borehole data, and specifically

  15. Global identification predicts gay-male identity integration and well-being among Turkish gay men.

    Koc, Yasin; Vignoles, Vivian L

    2016-12-01

    In most parts of the world, hegemonic masculinity requires men to endorse traditional masculine ideals, one of which is rejection of homosexuality. Wherever hegemonic masculinity favours heterosexuality over homosexuality, gay males may feel under pressure to negotiate their conflicting male gender and gay sexual identities to maintain positive self-perceptions. However, globalization, as a source of intercultural interaction, might provide a beneficial context for people wishing to create alternative masculinities in the face of hegemonic masculinity. Hence, we tested if global identification would predict higher levels of gay-male identity integration, and indirectly subjective well-being, via alternative masculinity representations for gay and male identities. A community sample of 219 gay and bisexual men from Turkey completed the study. Structural equation modelling revealed that global identification positively predicted gay-male identity integration, and indirectly subjective well-being; however, alternative masculinity representations did not mediate this relationship. Our findings illustrate how identity categories in different domains can intersect and affect each other in complex ways. Moreover, we discuss mental health and well-being implications for gay men living in cultures where they experience high levels of prejudice and stigma. © 2016 The British Psychological Society.

  16. Guidelineness of the parameters using integrated test in down syndrome risk prediction

    Lee, Jin Won [Graduate School of Catholic University of Pusan, Busan (Korea, Republic of); Go, Sung Jin; Kang, Se Sik; Kim, Chang Soo [Dept. Radiological Science, College of Health Sciences, Catholic University of Pusan, Busan (Korea, Republic of)

    2016-12-15

    This study evaluated the significance of each parameter of the screening (integrated) test in predicting the risk of Down syndrome in pregnant women. We retrospectively analysed the correlation of the risk of Down's syndrome with Nuchal Translucency (NT) images measured by ultrasound, and with Pregnancy Associated Plasma Protein A (PAPP-A), alpha-fetoprotein (AFP), unconjugated estriol (uE3), human chorionic gonadotrophin (hCG) and Inhibin A measured in maternal serum. As a result, a significant correlation of NT, uE3, hCG and Inhibin A with Down's syndrome risk was revealed (P<.001). In ROC analysis, the AUC of Inhibin A was the largest among the predictors of Down's syndrome (0.859), and the cut-off criterion was Inhibin A 1.4 MoM (sensitivity 81.8%, specificity 75.9%). In conclusion, Inhibin A was the most useful parameter for predicting Down's syndrome in the integrated test. If its weaknesses are addressed based on the cut-off values of the parameters, it could be used as an independent indicator in Down's syndrome risk screening.
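
    The ROC analysis behind the reported AUC can be illustrated with the rank-based (Mann-Whitney) AUC for a single continuous marker such as an Inhibin A MoM value; this sketch ignores ties, which is adequate for continuous markers:

    ```python
    import numpy as np

    def roc_auc(scores, labels):
        """Rank-based AUC (Mann-Whitney U formulation, ignoring ties) for a
        single continuous marker against a binary case/control outcome."""
        s = np.asarray(scores, dtype=float)
        lab = np.asarray(labels)
        ranks = np.empty(len(s))
        ranks[np.argsort(s)] = np.arange(1, len(s) + 1)
        pos = lab == 1
        n1, n0 = pos.sum(), (~pos).sum()
        return float((ranks[pos].sum() - n1 * (n1 + 1) / 2) / (n1 * n0))
    ```

    An AUC of 1.0 means the marker separates cases from controls perfectly, 0.5 means no discrimination; a cut-off (here 1.4 MoM) is then chosen to trade off sensitivity against specificity along the curve.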

  17. Predictive networks: a flexible, open source, web application for integration and analysis of human gene networks.

    Haibe-Kains, Benjamin; Olsen, Catharina; Djebbari, Amira; Bontempi, Gianluca; Correll, Mick; Bouton, Christopher; Quackenbush, John

    2012-01-01

    Genomics provided us with an unprecedented quantity of data on the genes that are activated or repressed in a wide range of phenotypes. We have increasingly come to recognize that defining the networks and pathways underlying these phenotypes requires both the integration of multiple data types and the development of advanced computational methods to infer relationships between the genes and to estimate the predictive power of the networks through which they interact. To address these issues we have developed Predictive Networks (PN), a flexible, open-source, web-based application and data services framework that enables the integration, navigation, visualization and analysis of gene interaction networks. The primary goal of PN is to allow biomedical researchers to evaluate experimentally derived gene lists in the context of large-scale gene interaction networks. The PN analytical pipeline involves two key steps. The first is the collection of a comprehensive set of known gene interactions derived from a variety of publicly available sources. The second is to use these 'known' interactions together with gene expression data to infer robust gene networks. The PN web application is accessible from http://predictivenetworks.org. The PN code base is freely available at https://sourceforge.net/projects/predictivenets/.

  18. Guidelineness of the parameters using integrated test in down syndrome risk prediction

    Lee, Jin Won; Go, Sung Jin; Kang, Se Sik; Kim, Chang Soo

    2016-01-01

    This study evaluated the significance of each parameter of the screening (integrated) test in predicting the risk of Down syndrome in pregnant women. We retrospectively analysed the correlation of the risk of Down's syndrome with Nuchal Translucency (NT) images measured by ultrasound, and with Pregnancy Associated Plasma Protein A (PAPP-A), alpha-fetoprotein (AFP), unconjugated estriol (uE3), human chorionic gonadotrophin (hCG) and Inhibin A measured in maternal serum. As a result, a significant correlation of NT, uE3, hCG and Inhibin A with Down's syndrome risk was revealed (P<.001). In ROC analysis, the AUC of Inhibin A was the largest among the predictors of Down's syndrome (0.859), and the cut-off criterion was Inhibin A 1.4 MoM (sensitivity 81.8%, specificity 75.9%). In conclusion, Inhibin A was the most useful parameter for predicting Down's syndrome in the integrated test. If its weaknesses are addressed based on the cut-off values of the parameters, it could be used as an independent indicator in Down's syndrome risk screening.

  19. An integrated Modelling framework to monitor and predict trends of agricultural management (iMSoil)

    Keller, Armin; Della Peruta, Raneiro; Schaepman, Michael; Gomez, Marta; Mann, Stefan; Schulin, Rainer

    2014-05-01

    Agricultural systems lie at the interface between natural ecosystems and the anthroposphere. Various drivers induce pressures on the agricultural systems, leading to changes in farming practice. The limitation of available land and the socio-economic drivers are likely to result in further intensification of agricultural land management, with implications for fertilization practices, soil and pest management, as well as crop and livestock production. In order to steer the development into desired directions, tools are required by which the effects of these pressures on agricultural management and resulting impacts on soil functioning can be detected as early as possible, future scenarios predicted and suitable management options and policies defined. In this context, the use of integrated models can play a major role in providing long-term predictions of soil quality and assessing the sustainability of agricultural soil management. Significant progress has been made in this field over the last decades. Some of these integrated modelling frameworks include biophysical parameters, but often the inherent characteristics and detailed processes of the soil system have been very simplified. The development of such tools has been hampered in the past by a lack of spatially explicit soil and land management information at regional scale. The iMSoil project, funded by the Swiss National Science Foundation in the national research programme NRP68 "soil as a resource" (www.nrp68.ch) aims at developing and implementing an integrated modeling framework (IMF) which can overcome the limitations mentioned above, by combining socio-economic, agricultural land management, and biophysical models, in order to predict the long-term impacts of different socio-economic scenarios on the soil quality. In our presentation we briefly outline the approach that is based on an interdisciplinary modular framework that builds on already existing monitoring tools and model components that are

  20. A statistical forecast model using the time-scale decomposition technique to predict rainfall during flood period over the middle and lower reaches of the Yangtze River Valley

    Hu, Yijia; Zhong, Zhong; Zhu, Yimin; Ha, Yao

    2018-04-01

    In this paper, a statistical forecast model using the time-scale decomposition method is established for seasonal prediction of the rainfall during the flood period (FPR) over the middle and lower reaches of the Yangtze River Valley (MLYRV). This method decomposes the rainfall over the MLYRV into three time-scale components, namely, the interannual component with a period of less than 8 years, the interdecadal component with a period of 8 to 30 years, and the interdecadal component with a period larger than 30 years. Then, predictors are selected for the three time-scale components of FPR through correlation analysis. Finally, a statistical forecast model is established using the multiple linear regression technique to predict the three time-scale components of the FPR, respectively. The results show that this forecast model can capture the interannual and interdecadal variation of FPR. The hindcast of FPR over the 14 years from 2001 to 2014 shows that the FPR can be predicted successfully in 11 out of the 14 years. This forecast model performs better than a model using the traditional scheme without time-scale decomposition. Therefore, the statistical forecast model using the time-scale decomposition technique has good skill and application value in the operational prediction of FPR over the MLYRV.
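
    The time-scale decomposition can be sketched with simple running means as a stand-in for the paper's filtering scheme; each component would then be regressed on its own predictors and the three component forecasts summed to give the FPR prediction:

    ```python
    import numpy as np

    def decompose(series, short=8, long=30):
        """Split an annual rainfall series into interannual (< 8 yr),
        interdecadal (8-30 yr) and long-term (> 30 yr) components using
        running means (an illustrative stand-in for the paper's filters).
        The three components sum back to the original series."""
        def running_mean(x, w):
            pad = np.pad(x, (w // 2, w - 1 - w // 2), mode="edge")
            return np.convolve(pad, np.ones(w) / w, mode="valid")
        trend_long = running_mean(series, long)
        trend_short = running_mean(series, short)
        interannual = series - trend_short
        interdecadal = trend_short - trend_long
        return interannual, interdecadal, trend_long
    ```

    Because the decomposition is additive, fitting a separate multiple linear regression to each component and adding the three predictions reconstructs a full FPR forecast.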

  1. A vision for an ultra-high resolution integrated water cycle observation and prediction system

    Houser, P. R.

    2013-05-01

    Society's welfare, progress, and sustainable economic growth—and life itself—depend on the abundance and vigorous cycling and replenishing of water throughout the global environment. The water cycle operates on a continuum of time and space scales and exchanges large amounts of energy as water undergoes phase changes and is moved from one part of the Earth system to another. We must move toward an integrated observation and prediction paradigm that addresses broad local-to-global science and application issues by realizing synergies associated with multiple, coordinated observations and prediction systems. A central challenge of a future water and energy cycle observation strategy is to progress from single variable water-cycle instruments to multivariable integrated instruments in electromagnetic-band families. The microwave range in the electromagnetic spectrum is ideally suited for sensing the state and abundance of water because of water's dielectric properties. Eventually, a dedicated high-resolution water-cycle microwave-based satellite mission may be possible based on large-aperture antenna technology that can harvest the synergy that would be afforded by simultaneous multichannel active and passive microwave measurements. A partial demonstration of these ideas can even be realized with existing microwave satellite observations to support advanced multivariate retrieval methods that can exploit the totality of the microwave spectral information. The simultaneous multichannel active and passive microwave retrieval would allow improved-accuracy retrievals that are not possible with isolated measurements. Furthermore, the simultaneous monitoring of several of the land, atmospheric, oceanic, and cryospheric states brings synergies that will substantially enhance understanding of the global water and energy cycle as a system. The multichannel approach also affords advantages to some constituent retrievals—for instance, simultaneous retrieval of vegetation

  2. Accurate diffraction data integration by the EVAL15 profile prediction method : Application in chemical and biological crystallography

    Xian, X.

    2009-01-01

    Accurate integration of reflection intensities plays an essential role in structure determination of the crystallized compound. A new diffraction data integration method, EVAL15, is presented in this thesis. This method uses the principle of general impacts to predict ab initio three-dimensional

  3. Observing others stay or switch - How social prediction errors are integrated into reward reversal learning.

    Ihssen, Niklas; Mussweiler, Thomas; Linden, David E J

    2016-08-01

    Reward properties of stimuli can undergo sudden changes, and the detection of these 'reversals' is often made difficult by the probabilistic nature of rewards/punishments. Here we tested whether and how humans use social information (someone else's choices) to overcome uncertainty during reversal learning. We show a substantial social influence during reversal learning, which was modulated by the type of observed behavior. Participants frequently followed observed conservative choices (no switches after punishment) made by the (fictitious) other player but ignored impulsive choices (switches), even though the experiment was set up so that both types of response behavior would be similarly beneficial/detrimental (Study 1). Computational modeling showed that participants integrated the observed choices as a 'social prediction error' instead of ignoring or blindly following the other player. Modeling also confirmed higher learning rates for 'conservative' versus 'impulsive' social prediction errors. Importantly, this 'conservative bias' was boosted by interpersonal similarity, which in conjunction with the lack of effects observed in a non-social control experiment (Study 2) confirmed its social nature. A third study suggested that relative weighting of observed impulsive responses increased with increased volatility (frequency of reversals). Finally, simulations showed that in the present paradigm integrating social and reward information was not necessarily more adaptive to maximize earnings than learning from reward alone. Moreover, integrating social information increased accuracy only when conservative and impulsive choices were weighted similarly during learning. These findings suggest that to guide decisions in choice contexts that involve reward reversals humans utilize social cues conforming with their preconceptions more strongly than cues conflicting with them, especially when the other is similar. Copyright © 2016 The Authors. Published by Elsevier B
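    The modeling result above (higher learning rates for 'conservative' than for 'impulsive' social prediction errors) can be illustrated with a minimal Rescorla-Wagner-style sketch. This is not the authors' model; the function names and parameter values below are assumptions for illustration only.

```python
# Sketch (not the paper's code): one value update that combines a reward
# prediction error with a "social prediction error", using a higher learning
# rate when the observed choice is conservative (no switch) than when it is
# impulsive. All parameter values are invented.

def update_value(v, reward, observed_choice_value,
                 alpha_reward=0.3,
                 alpha_social_conservative=0.4,
                 alpha_social_impulsive=0.1,
                 conservative=True):
    """One trial's update for the value v of the chosen option."""
    reward_pe = reward - v                 # reward prediction error
    social_pe = observed_choice_value - v  # social prediction error
    alpha_social = (alpha_social_conservative if conservative
                    else alpha_social_impulsive)
    return v + alpha_reward * reward_pe + alpha_social * social_pe

v = 0.5
# Observing a conservative choice consistent with reward moves v more...
v_cons = update_value(v, reward=1.0, observed_choice_value=1.0, conservative=True)
# ...than observing an impulsive choice carrying the same evidence.
v_imp = update_value(v, reward=1.0, observed_choice_value=1.0, conservative=False)
```

    With these invented rates, the conservative observation produces the larger update, mirroring the reported 'conservative bias'.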

  4. Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models

    LeDonne Norman C

    2011-02-01

    Full Text Available Abstract Background Standard approaches to address the performance of predictive models that used common statistical measurements for the entire data set provide an overview of the average performance of the models across the entire predictive space, but give little insight into applicability of the model across the prediction space. Guha and Van Drie recently proposed the use of structure-activity landscape index (SALI) curves via the SALI curve integral (SCI) as a means to map the predictive power of computational models within the predictive space. This approach evaluates model performance by assessing the accuracy of pairwise predictions, comparing compound pairs in a manner similar to that done by medicinal chemists. Results The SALI approach was used to evaluate the performance of continuous prediction models for MDR1-MDCK in vitro efflux potential. Efflux models were built with ADMET Predictor neural net, support vector machine, kernel partial least squares, and multiple linear regression engines, as well as SIMCA-P+ partial least squares, and random forest from Pipeline Pilot as implemented by AstraZeneca, using molecular descriptors from SimulationsPlus and AstraZeneca. Conclusion The results indicate that the choice of training sets used to build the prediction models is of great importance in the resulting model quality and that the SCI values calculated for these models were very similar to their Kendall τ values, leading to our suggestion of an approach to use this SALI/SCI paradigm to evaluate predictive model performance that will allow more informed decisions regarding model utility. The use of SALI graphs and curves provides an additional level of quality assessment for predictive models.

  5. Use of structure-activity landscape index curves and curve integrals to evaluate the performance of multiple machine learning prediction models.

    Ledonne, Norman C; Rissolo, Kevin; Bulgarelli, James; Tini, Leonard

    2011-02-07

    Standard approaches to address the performance of predictive models that used common statistical measurements for the entire data set provide an overview of the average performance of the models across the entire predictive space, but give little insight into applicability of the model across the prediction space. Guha and Van Drie recently proposed the use of structure-activity landscape index (SALI) curves via the SALI curve integral (SCI) as a means to map the predictive power of computational models within the predictive space. This approach evaluates model performance by assessing the accuracy of pairwise predictions, comparing compound pairs in a manner similar to that done by medicinal chemists. The SALI approach was used to evaluate the performance of continuous prediction models for MDR1-MDCK in vitro efflux potential. Efflux models were built with ADMET Predictor neural net, support vector machine, kernel partial least squares, and multiple linear regression engines, as well as SIMCA-P+ partial least squares, and random forest from Pipeline Pilot as implemented by AstraZeneca, using molecular descriptors from SimulationsPlus and AstraZeneca. The results indicate that the choice of training sets used to build the prediction models is of great importance in the resulting model quality and that the SCI values calculated for these models were very similar to their Kendall τ values, leading to our suggestion of an approach to use this SALI/SCI paradigm to evaluate predictive model performance that will allow more informed decisions regarding model utility. The use of SALI graphs and curves provides an additional level of quality assessment for predictive models.
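    The pairwise evaluation idea behind the SALI/SCI approach described above can be sketched as a concordance count over compound pairs, the same quantity that underlies Kendall's τ. The function and the data values below are illustrative assumptions, not taken from the paper.

```python
# Sketch of pairwise-ordering evaluation: for every pair of compounds, count
# whether the model ranks them in the same order as the experiment. The
# example values are invented.
from itertools import combinations

def pairwise_concordance(observed, predicted):
    """Fraction of compound pairs whose predicted ordering matches observation."""
    pairs = list(combinations(range(len(observed)), 2))
    correct = sum(
        1 for i, j in pairs
        if (observed[i] - observed[j]) * (predicted[i] - predicted[j]) > 0
    )
    return correct / len(pairs)

obs  = [0.1, 0.4, 0.9, 1.5]   # e.g. measured efflux ratios
pred = [0.2, 0.3, 1.1, 1.0]   # model predictions; one pair (indices 2, 3) inverted
score = pairwise_concordance(obs, pred)   # 5 of 6 pairs concordant
```

    With 5 concordant and 1 discordant pair, Kendall's τ would be (5 - 1)/6 ≈ 0.67, which is consistent with the reported similarity between SCI values and Kendall τ.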

  6. An Integrated and Interdisciplinary Model for Predicting the Risk of Injury and Death in Future Earthquakes.

    Shapira, Stav; Novack, Lena; Bar-Dayan, Yaron; Aharonson-Daniel, Limor

    2016-01-01

    A comprehensive technique for earthquake-related casualty estimation remains an unmet challenge. This study aims to integrate risk factors related to characteristics of the exposed population and to the built environment in order to improve communities' preparedness and response capabilities and to mitigate future consequences. An innovative model was formulated based on a widely used loss estimation model (HAZUS) by integrating four human-related risk factors (age, gender, physical disability and socioeconomic status) that were identified through a systematic review and meta-analysis of epidemiological data. The common effect measures of these factors were calculated and entered into the existing model's algorithm using logistic regression equations. Sensitivity analysis was performed by conducting a casualty estimation simulation in a high-vulnerability risk area in Israel. The integrated model outcomes indicated an increase in the total number of casualties compared with the prediction of the traditional model; with regard to specific injury levels, an increase was demonstrated in the number of expected fatalities and in the severely and moderately injured, and a decrease was noted in the lightly injured. Urban areas with higher rates of at-risk populations were found to be more vulnerable in this regard. The proposed model offers a novel approach that allows quantification of the combined impact of human-related and structural factors on the results of earthquake casualty modelling. Investing efforts in reducing human vulnerability and increasing resilience prior to an occurrence of an earthquake could lead to a possible decrease in the expected number of casualties.
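    The integration step described above (entering common effect measures into the loss model via logistic regression equations) amounts to adjusting a baseline probability on the odds scale. A minimal sketch follows; the odds-ratio values are invented for illustration, not the study's estimates.

```python
# Sketch: adjust a baseline (structural) casualty probability by the pooled
# odds ratios of human-related risk factors on the log-odds (logistic) scale.
# The odds ratios below are hypothetical.

def adjusted_probability(p_baseline, odds_ratios):
    """Apply multiplicative odds ratios to a baseline probability."""
    odds = p_baseline / (1.0 - p_baseline)
    for oratio in odds_ratios:
        odds *= oratio
    return odds / (1.0 + odds)

# e.g. elderly (OR 1.8) and low socioeconomic status (OR 1.4), both hypothetical:
p = adjusted_probability(0.05, [1.8, 1.4])
```

    With no risk factors the function returns the baseline unchanged, since converting to odds and back is the identity.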

  7. iSPUW: integrated sensing and prediction of urban water for sustainable cities

    Noh, S. J.; Nazari, B.; Habibi, H.; Norouzi, A.; Nabatian, M.; Seo, D. J.; Bartos, M. D.; Kerkez, B.; Lakshman, L.; Zink, M.; Lee, J.

    2016-12-01

    Many cities face tremendous water-related challenges in this Century of the City. Urban areas are particularly susceptible not only to excesses and shortages of water but also to impaired water quality. To address these challenges, we synergistically integrate advances in computing and cyber-infrastructure, environmental modeling, geoscience, and information science to develop integrative solutions for urban water challenges. In this presentation, we describe the various efforts that are currently ongoing in the Dallas-Fort Worth Metroplex (DFW) area for iSPUW: real-time high-resolution flash flood forecasting, inundation mapping for large urban areas, crowdsourcing of water observations in urban areas, real-time assimilation of crowdsourced observations for street and river flooding, integrated control of lawn irrigation and rainwater harvesting for water conservation and stormwater management, feature mining with causal discovery for flood prediction, and development of the Arlington Urban Hydroinformatics Testbed. We analyze initial data from the sensor network for water-level and lawn monitoring and from cellphone applications for crowdsourcing flood reports. New data assimilation approaches to deal with categorical and continuous observations are also evaluated via synthetic experiments.

  8. Integration of research infrastructures and ecosystem models toward development of predictive ecology

    Luo, Y.; Huang, Y.; Jiang, J.; MA, S.; Saruta, V.; Liang, G.; Hanson, P. J.; Ricciuto, D. M.; Milcu, A.; Roy, J.

    2017-12-01

    The past two decades have witnessed rapid development in sensor technology. Built upon the sensor development, large research infrastructure facilities, such as National Ecological Observatory Network (NEON) and FLUXNET, have been established. Through networking different kinds of sensors and other data collections at many locations all over the world, those facilities generate large volumes of ecological data every day. The big data from those facilities offer an unprecedented opportunity for advancing our understanding of ecological processes, educating teachers and students, supporting decision-making, and testing ecological theory. The big data from the major research infrastructure facilities also provide a foundation for developing predictive ecology. Indeed, the capability to predict future changes in our living environment and natural resources is critical to decision making in a world where the past is no longer a clear guide to the future. We are living in a period marked by rapid climate change, profound alteration of biogeochemical cycles, unsustainable depletion of natural resources, and deterioration of air and water quality. Projecting changes in future ecosystem services to society becomes essential not only for science but also for policy making. We will use this panel format to outline major opportunities and challenges in integrating research infrastructure and ecosystem models toward developing predictive ecology. Meanwhile, we will also show results from an interactive model-experiment system - Ecological Platform for Assimilating Data into models (EcoPAD) - that has been implemented at the Spruce and Peatland Responses Under Climatic and Environmental change (SPRUCE) experiment in Northern Minnesota and Montpellier Ecotron, France. EcoPAD is developed by integrating web technology, eco-informatics, data assimilation techniques, and ecosystem modeling. EcoPAD is designed to streamline data transfer seamlessly from research infrastructure
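    The data-assimilation idea behind systems like EcoPAD can be illustrated with a scalar Kalman-style update that blends a model forecast with a streaming observation, weighting each by its uncertainty. This is a generic textbook sketch with invented numbers, not EcoPAD's actual algorithm.

```python
# Minimal scalar data-assimilation sketch: combine a model forecast and an
# observation in proportion to their (co)variances. Values are illustrative.

def assimilate(forecast, forecast_var, obs, obs_var):
    """Return the analysis state and its variance after one update."""
    gain = forecast_var / (forecast_var + obs_var)   # Kalman gain
    analysis = forecast + gain * (obs - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

# Model predicts a flux of 5.0 (variance 4.0); a sensor reports 6.0 (variance 1.0).
state, var = assimilate(5.0, 4.0, 6.0, 1.0)   # analysis pulled toward the data
```

    Because the observation is more certain than the forecast here, the analysis lands closer to the observation, and its variance shrinks below both inputs.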

  9. Abnormal white matter integrity in chronic users of codeine-containing cough syrups: a tract-based spatial statistics study.

    Qiu, Y-W; Su, H-H; Lv, X-F; Jiang, G-H

    2015-01-01

    Codeine-containing cough syrups have become one of the most popular drugs of abuse in young people in the world. Chronic codeine-containing cough syrup abuse is related to impairments in a broad range of cognitive functions. However, the potential brain white matter (WM) impairment caused by chronic codeine-containing cough syrup abuse has not been reported previously. Our aim was to investigate abnormalities in the microstructure of brain white matter in chronic users of codeine-containing syrups and to determine whether these WM abnormalities are related to the duration of use of these syrups and to clinical impulsivity. Thirty chronic codeine-containing syrup users and 30 matched controls were evaluated. Diffusion tensor imaging was performed by using a single-shot spin-echo-planar sequence. Whole-brain voxelwise analysis of fractional anisotropy was performed by using tract-based spatial statistics to localize abnormal WM regions. The Barratt Impulsiveness Scale 11 was administered to assess participants' impulsivity. Volume-of-interest analysis was used to detect changes of diffusivity indices in regions with fractional anisotropy abnormalities. Abnormal fractional anisotropy was extracted and correlated with clinical impulsivity and the duration of codeine-containing syrup use. Chronic codeine-containing syrup users had significantly lower fractional anisotropy in the inferior fronto-occipital fasciculus of the bilateral temporo-occipital regions, right frontal region, and the right corona radiata WM than controls. There were significant negative correlations among fractional anisotropy values of the right frontal region of the inferior fronto-occipital fasciculus and the right superior corona radiata WM and Barratt Impulsiveness Scale total scores, and between the right frontal region of the inferior fronto-occipital fasciculus and nonplan impulsivity scores in chronic codeine-containing syrup users. 
There was also a significant negative correlation between fractional

  10. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 1: Theoretical development and application to yearly predictions for selected cities in the United States

    Manning, Robert M.

    1986-01-01

    A rain attenuation prediction model is described for use in calculating satellite communication link availability for any specific location in the world that is characterized by an extended record of rainfall. Such a formalism is necessary for the accurate assessment of such availability predictions in the case of the small user-terminal concept of the Advanced Communication Technology Satellite (ACTS) Project. The model employs the theory of extreme value statistics to generate the necessary statistical rainrate parameters from rain data in the form compiled by the National Weather Service. These location dependent rain statistics are then applied to a rain attenuation model to obtain a yearly prediction of the occurrence of attenuation on any satellite link at that location. The predictions of this model are compared to those of the Crane Two-Component Rain Model and some empirical data and found to be very good. The model is then used to calculate rain attenuation statistics at 59 locations in the United States (including Alaska and Hawaii) for the 20 GHz downlinks and 30 GHz uplinks of the proposed ACTS system. The flexibility of this modeling formalism is such that it allows a complete and unified treatment of the temporal aspects of rain attenuation that leads to the design of an optimum stochastic power control algorithm, the purpose of which is to efficiently counter such rain fades on a satellite link.
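    The extreme-value step described above can be sketched by fitting a Gumbel distribution to annual-maximum rain rates with the method of moments and reading off an exceedance probability. The sample values are invented, not National Weather Service data, and the model's actual formalism is more elaborate.

```python
# Sketch: method-of-moments Gumbel fit to annual-maximum rain rates, then the
# probability that a given rate is exceeded in a year. Data are illustrative.
import math

def gumbel_fit(annual_maxima):
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    var = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    beta = math.sqrt(6 * var) / math.pi           # scale parameter
    mu = mean - 0.5772156649 * beta               # location (Euler-Mascheroni)
    return mu, beta

def exceedance_probability(x, mu, beta):
    """P(annual-maximum rain rate > x) under the fitted Gumbel distribution."""
    return 1.0 - math.exp(-math.exp(-(x - mu) / beta))

maxima_mm_per_h = [42.0, 55.0, 38.0, 61.0, 47.0, 52.0, 44.0, 58.0]
mu, beta = gumbel_fit(maxima_mm_per_h)
p_exceed_70 = exceedance_probability(70.0, mu, beta)   # a rare event here
```

    The exceedance probability then feeds a rain attenuation model to yield the yearly link-availability prediction.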

  11. Prediction of leisure-time walking: an integration of social cognitive, perceived environmental, and personality factors

    Blanchard Chris M

    2007-10-01

    Full Text Available Abstract Background Walking is the primary focus of population-based physical activity initiatives but a theoretical understanding of this behaviour is still elusive. The purpose of this study was to integrate personality, the perceived environment, and planning into a theory of planned behaviour (TPB) framework to predict leisure-time walking. Methods Participants were a random sample (N = 358) of Canadian adults who completed measures of the TPB, planning, perceived neighbourhood environment, and personality at Time 1 and self-reported walking behaviour two months later. Results Analyses using structural equation modelling provided evidence that leisure-time walking is largely predicted by intention (standardized effect = .42) with an additional independent contribution from proximity to neighbourhood retail shops (standardized effect = .18). Intention, in turn, was predicted by attitudes toward walking and perceived behavioural control. Effects of perceived neighbourhood aesthetics and walking infrastructure on walking were mediated through attitudes and intention. Moderated regression analysis showed that the intention-walking relationship was moderated by conscientiousness and proximity to neighbourhood recreation facilities but not planning. Conclusion Overall, walking behaviour is theoretically complex but may best be addressed at a population level by facilitating strong intentions in a receptive environment even though individual differences may persist.

  12. An integrated computational validation approach for potential novel miRNA prediction

    Pooja Viswam

    2017-12-01

    Full Text Available MicroRNAs (miRNAs) are short, non-coding RNAs of 17-24 bp in length that regulate gene expression by targeting mRNA molecules. The regulatory functions of miRNAs are known to be strongly associated with disease phenotypes such as cancer and with cell signaling, cell division, growth and other metabolic processes. Novel miRNAs are defined as sequences that have no similarity to existing known sequences and lack experimental evidence. In recent decades, the advent of next-generation sequencing has allowed us to capture small RNA molecules from cells and to develop methods to estimate their expression levels. Several computational algorithms are available to predict novel miRNAs from deep sequencing data. In this work, we integrated three novel miRNA prediction programs, miRDeep, miRanalyzer and miRPRo, to compare and validate their prediction efficiency. The Dicer cleavage sites, alignment density, seed conservation, minimum free energy, AU-GC percentage, secondary loop scores, false discovery rates and confidence scores will be considered for comparison and evaluation. The efficiency of identifying isomiRs and base-pair mismatches in a strand-specific manner will also be considered for the computational validation. Further, the criteria and parameters for the identification of the best possible novel miRNA with minimal false-positive rates were deduced.
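    Two of the simpler sequence-level features mentioned above, GC percentage and the seed region, are easy to compute directly. The sketch below is illustrative only; it is not code from miRDeep, miRanalyzer or miRPRo, and any thresholds would be tool-specific.

```python
# Illustrative sketch of two sequence-level checks for a candidate miRNA:
# GC percentage and seed extraction. The example uses the well-known
# hsa-let-7a-5p mature sequence.

def gc_percent(seq):
    """GC content of a candidate miRNA sequence, in percent."""
    seq = seq.upper()
    return 100.0 * sum(1 for b in seq if b in "GC") / len(seq)

def seed(seq):
    """Canonical seed region: nucleotides 2-8 of the mature miRNA."""
    return seq.upper()[1:8]

candidate = "UGAGGUAGUAGGUUGUAUAGUU"   # hsa-let-7a-5p
assert 17 <= len(candidate) <= 24      # typical mature miRNA length range
```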

  13. Integration of relational and hierarchical network information for protein function prediction

    Jiang Xiaoyu

    2008-08-01

    Full Text Available Abstract Background In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules that protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is through the use of transitive closure to predictions. Results We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing. Conclusion A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. 
Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased
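    The 'true-path' rule discussed above means a protein annotated to a GO term must also be annotated to all of that term's ancestors, so a term's score should never exceed its parent's. The sketch below imposes that constraint post hoc on arbitrary scores for illustration; the paper's classifier instead builds the constraint into the model itself.

```python
# Sketch of enforcing 'true-path' consistency on per-term scores: walk from
# the root toward the leaves, capping each term's score by its parent's.
# The toy hierarchy and scores are invented.

def true_path_consistent(scores, parent):
    """Return scores where no term exceeds any of its ancestors."""
    consistent = {}

    def resolve(term):
        if term in consistent:
            return consistent[term]
        p = parent.get(term)                    # None for the root
        cap = 1.0 if p is None else resolve(p)
        consistent[term] = min(scores[term], cap)
        return consistent[term]

    for term in scores:
        resolve(term)
    return consistent

# Toy hierarchy root -> a -> b, with an inconsistent raw score on b:
parent = {"a": "root", "b": "a"}
raw = {"root": 0.9, "a": 0.4, "b": 0.7}
fixed = true_path_consistent(raw, parent)   # b is capped at its parent's 0.4
```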

  14. Predictive capacity of a non-radioisotopic local lymph node assay using flow cytometry, LLNA:BrdU-FCM: Comparison of a cutoff approach and inferential statistics.

    Kim, Da-Eun; Yang, Hyeri; Jang, Won-Hee; Jung, Kyoung-Mi; Park, Miyoung; Choi, Jin Kyu; Jung, Mi-Sook; Jeon, Eun-Young; Heo, Yong; Yeo, Kyung-Wook; Jo, Ji-Hoon; Park, Jung Eun; Sohn, Soo Jung; Kim, Tae Sung; Ahn, Il Young; Jeong, Tae-Cheon; Lim, Kyung-Min; Bae, SeungJin

    2016-01-01

    In order for a novel test method to be applied for regulatory purposes, its reliability and relevance, i.e., reproducibility and predictive capacity, must be demonstrated. Here, we examine the predictive capacity of a novel non-radioisotopic local lymph node assay, LLNA:BrdU-FCM (5-bromo-2'-deoxyuridine-flow cytometry), with a cutoff approach and inferential statistics as a prediction model. Twenty-two reference substances in OECD TG 429 were tested with a concurrent positive control, hexylcinnamaldehyde 25% (PC), and the stimulation index (SI) representing the fold increase in lymph node cells over the vehicle control was obtained. The optimal cutoff SI (2.7 ≤ cutoff < 3.5), with respect to predictive capacity, was obtained by a receiver operating characteristic curve, which produced 90.9% accuracy for the 22 substances. To address the inter-test variability in responsiveness, SI values standardized with PC were employed to obtain the optimal percentage cutoff (42.6 ≤ cutoff < 57.3% of PC), which produced 86.4% accuracy. A test substance may be diagnosed as a sensitizer if a statistically significant increase in SI is elicited. The parametric one-sided t-test and non-parametric Wilcoxon rank-sum test produced 77.3% accuracy. Similarly, a test substance could be defined as a sensitizer if the SI means of the vehicle control, and of the low, middle, and high concentrations were statistically significantly different, which was tested using ANOVA or Kruskal-Wallis, with post hoc analysis, Dunnett, or DSCF (Dwass-Steel-Critchlow-Fligner), respectively, depending on the equal variance test, producing 81.8% accuracy. The absolute SI-based cutoff approach produced the best predictive capacity; however, the discordant decisions between prediction models need to be examined further. Copyright © 2015 Elsevier Inc. All rights reserved.
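    The cutoff prediction model above reduces to a simple rule: call a substance a sensitizer when its stimulation index meets the cutoff, then score agreement with the reference classifications. The SI values and labels below are invented for illustration, not the study's 22 substances.

```python
# Sketch of the cutoff approach: binary classification by SI threshold and
# the resulting accuracy against reference classifications. Data invented.

def classify(si, cutoff=3.0):
    """Cutoff approach: sensitizer if SI >= cutoff."""
    return si >= cutoff

def accuracy(si_values, labels, cutoff=3.0):
    hits = sum(1 for si, lab in zip(si_values, labels)
               if classify(si, cutoff) == lab)
    return hits / len(labels)

si    = [1.2, 2.9, 3.4, 8.1, 2.1, 5.0]                # stimulation indices
truth = [False, False, True, True, False, True]       # reference labels
acc = accuracy(si, truth)   # all six concordant at this cutoff
```

    Sliding the cutoff trades sensitivity against specificity, which is exactly what the receiver operating characteristic curve optimizes over.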

  15. Prediction of Al2O3 leaching recovery in the Bayer process using statistical multilinear regression analysis

    Đurić I.

    2010-01-01

    Full Text Available This paper presents the results of defining a mathematical model that describes the dependence of the leaching degree of Al2O3 in bauxite on the most influential input parameters under industrial conditions of the leaching process in the Bayer technology of alumina production. The mathematical model is defined using the stepwise MLRA method, with R2 = 0.764 and significant statistical reliability (VIF < 2 and p < 0.05), on a one-year statistical sample. Validation of the model was performed using data from the following year, collected from the process conducted under industrial conditions, yielding the same statistical reliability, with R2 = 0.759.
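    The fit-then-validate pattern above (estimate a multiple linear regression on one year's data, then compute R2 on the following year's) can be sketched in a few lines. The synthetic process data and coefficients below are assumptions for illustration; the study's predictors are plant process parameters.

```python
# Sketch of multilinear regression validation on out-of-period data:
# fit ordinary least squares on "year 1", score R^2 on "year 2".
# All data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)

def fit_ols(X, y):
    """Least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r_squared(X, y, coef):
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ coef
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Year 1: fit; Year 2: validate on fresh data from the same process.
X1 = rng.normal(size=(60, 3)); y1 = X1 @ [1.5, -0.8, 0.3] + 70 + rng.normal(0, 0.5, 60)
X2 = rng.normal(size=(60, 3)); y2 = X2 @ [1.5, -0.8, 0.3] + 70 + rng.normal(0, 0.5, 60)
coef = fit_ols(X1, y1)
r2_validation = r_squared(X2, y2, coef)   # high, since the process is unchanged
```

    A stable R2 on the validation year, as the paper reports, is evidence the regression captures the process rather than one year's noise.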

  16. From data to the decision: A software architecture to integrate predictive modelling in clinical settings.

    Martinez-Millana, A; Fernandez-Llatas, C; Sacchi, L; Segagni, D; Guillen, S; Bellazzi, R; Traver, V

    2015-08-01

    The application of statistics and mathematics over large amounts of data is providing healthcare systems with new tools for screening and managing multiple diseases. Nonetheless, these tools have many technical and clinical limitations as they are based on datasets with concrete characteristics. This proposition paper describes a novel architecture focused on providing a validation framework for discrimination and prediction models in the screening of Type 2 diabetes. For that, the architecture has been designed to gather different data sources under a common data structure and, furthermore, to be controlled by a centralized component (Orchestrator) in charge of directing the interaction flows among data sources, models and graphical user interfaces. This innovative approach aims to overcome the data-dependency of the models by providing a validation framework for the models as they are used within clinical settings.
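    The Orchestrator role described above (directing interaction flows among data sources, models and interfaces over a common data structure) can be sketched as a small mediator object. This toy is not the paper's implementation; all names and the screening rule are invented.

```python
# Toy sketch of a central Orchestrator that routes requests between
# registered data sources and prediction models. Everything is hypothetical.

class Orchestrator:
    def __init__(self):
        self.sources = {}   # name -> callable returning a patient record
        self.models = {}    # name -> callable scoring a record

    def register_source(self, name, fetch):
        self.sources[name] = fetch

    def register_model(self, name, predict):
        self.models[name] = predict

    def screen(self, source, model, patient_id):
        """Direct the flow: fetch from a source, then score with a model."""
        record = self.sources[source](patient_id)
        return self.models[model](record)

orc = Orchestrator()
orc.register_source("ehr", lambda pid: {"id": pid, "glucose": 7.1, "bmi": 31})
orc.register_model("t2d_risk",
                   lambda r: 1.0 if r["glucose"] > 6.9 and r["bmi"] > 30 else 0.0)
risk = orc.screen("ehr", "t2d_risk", patient_id=42)
```

    Keeping routing in one component is what lets new models or data sources be validated against the same flows without touching the user interfaces.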

  17. Alexander Technique Training Coupled With an Integrative Model of Behavioral Prediction in Teachers With Low Back Pain.

    Kamalikhah, Tahereh; Morowatisharifabad, Mohammad Ali; Rezaei-Moghaddam, Farid; Ghasemi, Mohammad; Gholami-Fesharaki, Mohammad; Goklani, Salma

    2016-09-01

    Individuals suffering from chronic low back pain (CLBP) experience major physical, social, and occupational disruptions. Strong evidence confirms the effectiveness of Alexander technique (AT) training for CLBP. The present study applied an integrative model (IM) of behavioral prediction for improvement of AT training. This was a quasi-experimental study of female teachers with nonspecific LBP in southern Tehran in 2014. Group A contained 42 subjects and group B had 35 subjects. In group A, AT lessons were designed based on IM constructs, while in group B, AT lessons only were taught. The validity and reliability of the AT questionnaire were confirmed using content validity (CVR 0.91, CVI 0.96) and Cronbach's α (0.80). The IM constructs of both groups were measured after the completion of training. Statistical analysis used independent and paired samples t-tests and the univariate generalized linear model (GLM). Significant differences were recorded before and after intervention (P < 0.001) for the model constructs of intention, perceived risk, direct attitude, behavioral beliefs, and knowledge in both groups. Direct attitude and behavioral beliefs in group A were higher than in group B after the intervention (P < 0.03). The educational framework provided by IM for AT training improved attitude and behavioral beliefs that can facilitate the adoption of AT behavior and decreased CLBP.

  18. Integrated genetic and epigenetic prediction of coronary heart disease in the Framingham Heart Study.

    Meeshanthini V Dogan

    Full Text Available An improved method for detecting coronary heart disease (CHD) could have substantial clinical impact. Building on the idea that systemic effects of CHD risk factors are a conglomeration of genetic and environmental factors, we use machine learning techniques and integrate genetic, epigenetic and phenotype data from the Framingham Heart Study to build and test a Random Forest classification model for symptomatic CHD. Our classifier was trained on n = 1,545 individuals and consisted of four DNA methylation sites, two SNPs, age and gender. The methylation sites and SNPs were selected during the training phase. The final trained model was then tested on n = 142 individuals. The test data comprised individuals who were held out based on relatedness to those in the training dataset. This integrated classifier was capable of classifying symptomatic CHD status of those in the test set with an accuracy, sensitivity and specificity of 78%, 0.75 and 0.80, respectively. In contrast, a model using only conventional CHD risk factors as predictors had an accuracy and sensitivity of only 65% and 0.42, respectively, but with a specificity of 0.89 in the test set. Regression analyses of the methylation signatures illustrate our ability to map these signatures to known risk factors in CHD pathogenesis. These results demonstrate the capability of an integrated approach to effectively model symptomatic CHD status. These results also suggest that future studies of biomaterial collected from longitudinally informative cohorts that are specifically characterized for cardiac disease at follow-up could lead to the introduction of sensitive, readily employable integrated genetic-epigenetic algorithms for predicting onset of future symptomatic CHD.
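    The reported test-set metrics (accuracy, sensitivity, specificity) all derive from a confusion matrix. The counts below are invented, chosen only so the arithmetic reproduces numbers of the same shape as those reported; they are not the study's actual confusion matrix.

```python
# Sketch of confusion-matrix metrics for a binary CHD classifier.
# tp/fp/tn/fn counts are hypothetical.

def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate: cases caught
    specificity = tn / (tn + fp)   # true-negative rate: controls cleared
    return accuracy, sensitivity, specificity

# e.g. 60 CHD cases and 80 controls in a hypothetical held-out set:
acc, sens, spec = metrics(tp=45, fp=16, tn=64, fn=15)
```

    Note how a model can trade sensitivity for specificity: the conventional-risk-factor model above has higher specificity (0.89) but much lower sensitivity (0.42), so it clears controls well while missing most cases.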