WorldWideScience

Sample records for parallel process model

  1. The Extended Parallel Process Model: Illuminating the Gaps in Research

    Science.gov (United States)

    Popova, Lucy

    2012-01-01

    This article examines constructs, propositions, and assumptions of the extended parallel process model (EPPM). Review of the EPPM literature reveals that its theoretical concepts are thoroughly developed, but the theory lacks consistency in operational definitions of some of its constructs. Out of the 12 propositions of the EPPM, a few have not…

  2. "Let's Move" campaign: applying the extended parallel process model.

    Science.gov (United States)

    Batchelder, Alicia; Matusitz, Jonathan

    2014-01-01

    This article examines Michelle Obama's health campaign, "Let's Move," through the lens of the extended parallel process model (EPPM). "Let's Move" aims to reduce the childhood obesity epidemic in the United States. Developed by Kim Witte, the EPPM rests on the premise that people's attitudes can be changed when fear is exploited as a factor of persuasion. Fear appeals work best (a) when a person feels concern about the issue or situation, and (b) when he or she believes that he or she is capable of dealing with that issue or situation. Overall, the analysis found that "Let's Move" is based on past health campaigns that have been successful. An important element of the campaign is the use of fear appeals (as postulated by the EPPM). For example, part of the campaign's strategy is to explain the severity of the diseases associated with obesity. By looking at the steps of the EPPM, readers can also understand the strengths and weaknesses of "Let's Move."

  3. A model for dealing with parallel processes in supervision

    Directory of Open Access Journals (Sweden)

    Lilja Cajvert

    2011-03-01

    Supervision in social work is essential for successful outcomes when working with clients. In social work, unconscious difficulties may arise and similar difficulties may occur in supervision as parallel processes. In this article, the development of a practice-based model of supervision to deal with parallel processes in supervision is described. The model has six phases. In the first phase, the focus is on the supervisor’s inner world, his/her own reflections and observations. In the second phase, the supervision situation is “frozen”, and the supervisees are invited to join the supervisor in taking a meta-perspective on the current situation of supervision. The focus in the third phase is on the inner world of all the group members as well as the visualization and identification of reflections and feelings that arose during the supervision process. Phase four focuses on the supervisee who presented a case, and in phase five the focus shifts to the common understanding and theorization of the supervision process as well as the definition and identification of possible parallel processes. In the final phase, the supervisee, with the assistance of the supervisor and other members of the group, develops a solution and determines how to proceed with the client in treatment. This article uses phenomenological concepts to provide a theoretical framework for the supervision model. Phenomenological reduction is an important approach to examine and to externalize and visualize the inner worlds of the supervisor and supervisees.

  4. Investigation of Mediational Processes Using Parallel Process Latent Growth Curve Modeling

    Science.gov (United States)

    Cheong, JeeWon; MacKinnon, David P.; Khoo, Siek Toon

    2010-01-01

    This study investigated a method to evaluate mediational processes using latent growth curve modeling. The mediator and the outcome measured across multiple time points were viewed as 2 separate parallel processes. The mediational process was defined as the independent variable influencing the growth of the mediator, which, in turn, affected the growth of the outcome. To illustrate modeling procedures, empirical data from a longitudinal drug prevention program, Adolescents Training and Learning to Avoid Steroids, were used. The program effects on the growth of the mediator and the growth of the outcome were examined first in a 2-group structural equation model. The mediational process was then modeled and tested in a parallel process latent growth curve model by relating the prevention program condition, the growth rate factor of the mediator, and the growth rate factor of the outcome. PMID:20157639
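
    The mediational structure described in this abstract can be written out as a small set of equations. The block below is a minimal sketch, assuming linear growth for both processes; the symbols (time scores \lambda_t, intercept and slope factors \eta_0 and \eta_1, path coefficients a, b, c') are conventional notation chosen here for illustration and are not taken from the article, whose full model may include additional paths.

```latex
% Assumed notation: X_i is the program condition, M_{it} the mediator and
% Y_{it} the outcome for person i at time t, with fixed time scores \lambda_t.
\begin{align}
M_{it} &= \eta^{M}_{0i} + \lambda_t\,\eta^{M}_{1i} + \varepsilon^{M}_{it}, \\
Y_{it} &= \eta^{Y}_{0i} + \lambda_t\,\eta^{Y}_{1i} + \varepsilon^{Y}_{it}, \\
\eta^{M}_{1i} &= \alpha_{M} + a\,X_i + \zeta^{M}_{i}, \\
\eta^{Y}_{1i} &= \alpha_{Y} + b\,\eta^{M}_{1i} + c'\,X_i + \zeta^{Y}_{i}.
\end{align}
% Under these assumptions the mediated effect of the program on the
% growth of the outcome is the product a b; c' is the direct effect.
```

    The first two lines are the two parallel growth processes; the last two relate the program condition to the growth rate factor of the mediator and, in turn, to the growth rate factor of the outcome.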

  5. Parallel direct solver for finite element modeling of manufacturing processes

    DEFF Research Database (Denmark)

    Nielsen, Chris Valentin; Martins, P.A.F.

    2017-01-01

    The central processing unit (CPU) time is of paramount importance in finite element modeling of manufacturing processes. Because the most significant part of the CPU time is consumed in solving the main system of equations resulting from finite element assemblies, different approaches have been...

  6. When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias

    OpenAIRE

    Trippas, Dries; Thompson, Valerie A.; Handley, Simon J.

    2016-01-01

    Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was ...

  7. Fear Control and Danger Control: A Test of the Extended Parallel Process Model (EPPM).

    Science.gov (United States)

    Witte, Kim

    1994-01-01

    Explores cognitive and emotional mechanisms underlying success and failure of fear appeals in context of AIDS prevention. Offers general support for Extended Parallel Process Model. Suggests that cognitions lead to fear appeal success (attitude, intention, or behavior changes) via danger control processes, whereas the emotion fear leads to fear…

  8. Toward a model framework of generalized parallel componential processing of multi-symbol numbers.

    Science.gov (United States)

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-05-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining and investigating a sign-decade compatibility effect for the comparison of positive and negative numbers, which extends the unit-decade compatibility effect in 2-digit number processing. Then, we evaluated whether the model is capable of accounting for previous findings in negative number processing. In a magnitude comparison task, in which participants had to single out the larger of 2 integers, we observed a reliable sign-decade compatibility effect with prolonged reaction times for incompatible (e.g., -97 vs. +53; in which the number with the larger decade digit has the smaller, i.e., negative polarity sign) as compared with sign-decade compatible number pairs (e.g., -53 vs. +97). Moreover, an analysis of participants' eye fixation behavior corroborated our model of parallel componential processing of multi-symbol numbers. These results are discussed in light of concurrent theoretical notions about negative number processing. On the basis of the present results, we propose a generalized integrated model framework of parallel componential multi-symbol processing.

  9. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing.

  10. Cocaine Use and Delinquent Behavior among High-Risk Youths: A Growth Model of Parallel Processes

    Science.gov (United States)

    Dembo, Richard; Sullivan, Christopher

    2009-01-01

    We report the results of a parallel-process, latent growth model analysis examining the relationships between cocaine use and delinquent behavior among youths. The study examined a sample of 278 justice-involved juveniles completing at least one of three follow-up interviews as part of a National Institute on Drug Abuse-funded study. The results…

  11. An Inconvenient Truth: An Application of the Extended Parallel Process Model

    Science.gov (United States)

    Goodall, Catherine E.; Roberto, Anthony J.

    2008-01-01

    "An Inconvenient Truth" is an Academy Award-winning documentary about global warming presented by Al Gore. This documentary is appropriate for a lesson on fear appeals and the extended parallel process model (EPPM). The EPPM is concerned with the effects of perceived threat and efficacy on behavior change. Perceived threat is composed of an…

  12. Parallel processing and non-uniform grids in global air quality modeling

    NARCIS (Netherlands)

    Berkvens, P.J.F.; Bochev, Mikhail A.

    2002-01-01

    A large-scale global air quality model, running efficiently on a single vector processor, is enhanced to make more realistic and more long-term simulations feasible. Two strategies are combined: non-uniform grids and parallel processing. The communication through the hierarchy of non-uniform grids

  13. Sustainability Attitudes and Behavioral Motivations of College Students: Testing the Extended Parallel Process Model

    Science.gov (United States)

    Perrault, Evan K.; Clark, Scott K.

    2018-01-01

    Purpose: A planet that can no longer sustain life is a frightening thought--and one that is often present in mass media messages. Therefore, this study aims to test the components of a classic fear appeal theory, the extended parallel process model (EPPM) and to determine how well its constructs predict sustainability behavioral intentions. This…

  14. Lamb wave propagation modelling and simulation using parallel processing architecture and graphical cards

    International Nuclear Information System (INIS)

    Paćko, P; Bielak, T; Staszewski, W J; Uhl, T; Spencer, A B; Worden, K

    2012-01-01

    This paper demonstrates new parallel computation technology and an implementation for Lamb wave propagation modelling in complex structures. A graphical processing unit (GPU) and compute unified device architecture (CUDA), available in low-cost graphical cards in standard PCs, are used for Lamb wave propagation numerical simulations. The local interaction simulation approach (LISA) wave propagation algorithm has been implemented as an example. Other algorithms suitable for parallel discretization can also be used in practice. The method is illustrated using examples related to damage detection. The results demonstrate good accuracy and effective computational performance of very large models. The wave propagation modelling presented in the paper can be used in many practical applications of science and engineering.

  15. When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias.

    Science.gov (United States)

    Trippas, Dries; Thompson, Valerie A; Handley, Simon J

    2017-05-01

    Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was valid on half the trials or to decide whether the conclusion was believable on the other half. When belief and logic conflict, the default-interventionist view predicts that it should take less time to respond on the basis of belief than logic, and that the believability of a conclusion should interfere with judgments of validity, but not the reverse. The parallel-processing view predicts that beliefs should interfere with logic judgments only if the processing required to evaluate the logical structure exceeds that required to evaluate the knowledge necessary to make a belief-based judgment, and vice versa otherwise. Consistent with this latter view, for the simplest reasoning problems (modus ponens), judgments of belief resulted in lower accuracy than judgments of validity, and believability interfered more with judgments of validity than the converse. For problems of moderate complexity (modus tollens and single-model syllogisms), the interference was symmetrical, in that validity interfered with belief judgments to the same degree that believability interfered with validity judgments. For the most complex (three-term multiple-model syllogisms), conclusion believability interfered more with judgments of validity than vice versa, in spite of the significant interference from conclusion validity on judgments of belief.

  16. LMFAO! Humor as a Response to Fear: Decomposing Fear Control within the Extended Parallel Process Model

    Science.gov (United States)

    Abril, Eulàlia P.; Szczypka, Glen; Emery, Sherry L.

    2017-01-01

    This study seeks to analyze fear control responses to the 2012 Tips from Former Smokers campaign using the Extended Parallel Process Model (EPPM). The goal is to examine the occurrence of ancillary fear control responses, like humor. In order to explore individuals’ responses in an organic setting, we use Twitter data—tweets—collected via the Firehose. Content analysis of relevant fear control tweets (N = 14,281) validated the existence of boomerang responses within the EPPM: denial, defensive avoidance, and reactance. More importantly, results showed that humor tweets were not only a significant occurrence but constituted the majority of fear control responses. PMID:29527092

  17. A parallel process growth model of avoidant personality disorder symptoms and personality traits.

    Science.gov (United States)

    Wright, Aidan G C; Pincus, Aaron L; Lenzenweger, Mark F

    2013-07-01

    Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, M. F., 2006, The longitudinal study of personality disorders: History, design considerations, and initial findings. Journal of Personality Disorders, 20, 645-670. doi:10.1521/pedi.2006.20.6.645), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general.

  18. A Parallel Process Growth Model of Avoidant Personality Disorder Symptoms and Personality Traits

    Science.gov (United States)

    Wright, Aidan G. C.; Pincus, Aaron L.; Lenzenweger, Mark F.

    2012-01-01

    Background: Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Methods: Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, 2006), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. Results: AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. Conclusions: These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general. PMID:22506627

  19. Parallel Framework for Cooperative Processes

    Directory of Open Access Journals (Sweden)

    Mitică Craus

    2005-01-01

    This paper describes an object-oriented framework designed for parallelizing a set of related algorithms. The idea behind the system is to provide a reusable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they must run in cycles, and it must be possible to split their work among several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework show good performance: approximately linear speedup and low communication cost.
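
    The organization this abstract describes (a master that splits cyclic work among processing units and exchanges messages with them) can be illustrated with a small sketch. The code below is not the framework from the paper; the names `worker` and `run_cycles`, the strided splitting rule, and the use of Python threads with queues as the message channel are all assumptions made for illustration. Threads are used only to keep the example self-contained, so no real speedup should be expected from it.

```python
import queue
import threading


def worker(tasks, results, step):
    """Processing unit: receive a message, process the chunk, reply."""
    while True:
        msg = tasks.get()
        if msg is None:  # sentinel message: the master asks us to stop
            break
        chunk_id, chunk = msg
        results.put((chunk_id, [step(x) for x in chunk]))


def run_cycles(data, step, n_workers=2, n_cycles=3):
    """Master: in every cycle, split the data, send it out, gather replies."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, step))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()

    for _ in range(n_cycles):
        # Strided split: chunk i holds elements i, i + n_workers, ...
        for chunk_id in range(n_workers):
            tasks.put((chunk_id, data[chunk_id::n_workers]))
        # Gather one reply per chunk and put it back in its original place.
        merged = [None] * len(data)
        for _ in range(n_workers):
            chunk_id, out = results.get()
            merged[chunk_id::n_workers] = out
        data = merged  # the next cycle starts from the merged state

    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return data
```

    For example, `run_cycles([1, 2, 3, 4, 5], lambda x: x * 2)` doubles every element once per cycle. The master waits for all replies before starting the next cycle, which matches the requirement that the algorithms "run in cycles".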

  20. Metastable states in the hierarchical Dyson model drive parallel processing in the hierarchical Hopfield network

    International Nuclear Information System (INIS)

    Agliari, Elena; Barra, Adriano; Guerra, Francesco; Galluzzi, Andrea; Tantari, Daniele; Tavani, Flavia

    2015-01-01

    In this paper, we introduce and investigate the statistical mechanics of hierarchical neural networks. First, we approach these systems à la Mattis, by thinking of the Dyson model as a single-pattern hierarchical neural network. We also discuss the stability of different retrievable states as predicted by the related self-consistencies obtained both from a mean-field bound and from a bound that bypasses the mean-field limitation. The latter is worked out by properly reabsorbing the magnetization fluctuations related to higher levels of the hierarchy into effective fields for the lower levels. Remarkably, mixing Amit's ansatz technique for selecting candidate-retrievable states with the interpolation procedure for solving for the free energy of these states, we prove that, due to gauge symmetry, the Dyson model accomplishes both serial and parallel processing. We extend this scenario to multiple stored patterns by implementing the Hebb prescription for learning within the couplings. This results in Hopfield-like networks constrained on a hierarchical topology, for which, by restricting to the low-storage regime where the number of patterns grows at most logarithmically with the number of neurons, we prove the existence of the thermodynamic limit for the free energy, and we give an explicit expression of its mean-field bound and of its related improved bound. The resulting self-consistencies for the Mattis magnetizations, which act as order parameters, are studied and the stability of solutions is analyzed to get a picture of the overall retrieval capabilities of the system according to both mean-field and non-mean-field scenarios. Our main finding is that embedding the Hebbian rule on a hierarchical topology allows the network to accomplish both serial and parallel processing. By tuning the level of fast noise affecting it or triggering the decay of the interactions with the distance among neurons, the system may switch from sequential retrieval to

  1. Implementation science: a role for parallel dual processing models of reasoning?

    Directory of Open Access Journals (Sweden)

    Phillips, Paddy A

    2006-05-01

    Background: A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Discussion: Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues therefore, that individual differences between doctors in terms of

  2. Implementation science: a role for parallel dual processing models of reasoning?

    Science.gov (United States)

    Sladek, Ruth M; Phillips, Paddy A; Bond, Malcolm J

    2006-05-25

    A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues therefore, that individual differences between doctors in terms of reasoning are important considerations in any

  3. Belief–logic conflict resolution in syllogistic reasoning: Inspection-time evidence for a parallel process model

    OpenAIRE

    Stupple, Edward J.N; Ball, Linden

    2008-01-01

    An experiment is reported examining dual-process models of belief bias in syllogistic reasoning using a problem complexity manipulation and an inspection-time method to monitor processing latencies for premises and conclusions. Endorsement rates indicated increased belief bias on complex problems, a finding that runs counter to the “belief-first” selective scrutiny model, but which is consistent with other theories, including “reasoning-first” and “parallel-process” models. Inspection-time da...

  4. Prediction of Adequate Prenatal Care Utilization Based on the Extended Parallel Process Model.

    Science.gov (United States)

    Hajian, Sepideh; Imani, Fatemeh; Riazi, Hedyeh; Salmani, Fatemeh

    2017-10-01

    Pregnancy complications are a major public health concern. One of the main causes of preventable complications is absent or inadequate prenatal care. The present study was conducted to investigate whether the constructs of the Extended Parallel Process Model can predict the utilization of prenatal care services. This longitudinal prospective study was conducted on 192 pregnant women selected through multi-stage sampling of health facilities in Qeshm, Hormozgan province, from April to June 2015. Participants were followed up from the first half of pregnancy until childbirth to assess adequate or inadequate/non-utilization of prenatal care services. Data were collected using the structured Risk Behavior Diagnosis Scale. The data were analyzed in SPSS-22 using one-way ANOVA, linear regression and logistic regression analysis. The level of significance was set at 0.05. In total, 178 pregnant women with a mean age of 25.31±5.42 completed the study. Perceived self-efficacy (OR=25.23; P […] prenatal care. Husband's occupation in the labor market (OR=0.43; P=0.02), unwanted pregnancy (OR=0.352; P […] care for the minors or elderly at home (OR=0.35; P=0.045) were associated with lower odds of receiving prenatal care. The model showed that when the perceived efficacy of the prenatal care services overcame the perceived threat, the likelihood of prenatal care usage increased. This study identified some modifiable factors associated with prenatal care usage by women, providing key targets for appropriate clinical interventions.
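
    The comparison of perceived efficacy against perceived threat mentioned in this abstract is commonly operationalized, in work using the Risk Behavior Diagnosis Scale, as a discriminating value: the efficacy score minus the threat score. The sketch below illustrates that arithmetic only. The function names, the example scores, and the handling of a zero difference are assumptions made here, and the study's own scoring procedure is not given in the abstract.

```python
def discriminating_value(efficacy_score, threat_score):
    """Efficacy minus threat; both scores must be on the same scale."""
    return efficacy_score - threat_score


def predicted_process(efficacy_score, threat_score):
    """Map the sign of the discriminating value to an EPPM process.

    Positive: efficacy outweighs threat, read as danger control
    (the person is expected to act on the recommendation).
    Negative: threat outweighs efficacy, read as fear control
    (the person is expected to manage the fear instead).
    Zero is labelled "indeterminate" here, which is an assumption.
    """
    value = discriminating_value(efficacy_score, threat_score)
    if value > 0:
        return "danger control"
    if value < 0:
        return "fear control"
    return "indeterminate"
```

    With hypothetical scores, `predicted_process(40, 30)` falls under danger control and `predicted_process(30, 40)` under fear control.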

  5. Cellular automata a parallel model

    CERN Document Server

    Mazoyer, J

    1999-01-01

    Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.
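
    To make the idea of a cellular automaton as a parallel computational model concrete, the following sketch performs one synchronous update of a one-dimensional, two-state automaton. It is an illustration only and is not taken from the book: the function name `step`, the periodic boundary, and the default choice of rule 90 are assumptions. The key point is that every new cell value is computed from the previous configuration, so all cells could be updated simultaneously.

```python
def step(cells, rule=90):
    """Apply one synchronous update of an elementary cellular automaton.

    `cells` is a list of 0/1 states on a ring (periodic boundary, assumed).
    `rule` is the Wolfram rule number: bit k of `rule` gives the new state
    for the neighbourhood whose (left, center, right) bits encode k.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left = cells[i - 1]            # index -1 wraps to the last cell
        center = cells[i]
        right = cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right
        new_cells.append((rule >> index) & 1)
    return new_cells
```

    Under rule 90 each cell becomes the exclusive-or of its two neighbours, so a single live cell in `[0, 0, 1, 0, 0]` spreads to `[0, 1, 0, 1, 0]` after one step.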

  6. Comparison of Efficacy and Threat Perception Processes in Predicting Smoking among University Students Based on Extended Parallel Process Model

    Directory of Open Access Journals (Sweden)

    S. Bashirian

    2014-04-01

    Introduction & Objective: The study of smoking as the most toxic, common and cheapest addiction, and of its psychological and demographic variables, especially among the youth who are efficient and constructive individuals of the society, is of great importance. This study was performed to compare efficacy and threat perception in predicting cigarette smoking among university students based on the Extended Parallel Process Model (EPPM). Material & Methods: This cross-sectional descriptive study was carried out on 700 college students of Hamadan recruited with a stratified sampling method. The participants completed a self-administered questionnaire including demographic characteristics, smoking status and the EPPM. Data analysis was done with the SPSS software (version 16), using t-test, one-way ANOVA, Pearson correlation and logistic regression methods. Results: The average scores of threat and efficacy perception were 39.7 and 38.6, respectively. The prevalence of cigarette smoking among participants was 27.1 percent. Also, the average efficacy perception score differed significantly by age, gender, history of drug abuse and dwelling of students (P<0.05). Efficacy and threat perception both predicted student cigarette smoking. Conclusions: The cognitive mediating process of threat perception was a more powerful predictor of cigarette smoking as an unsafe behavior. Therefore, increasing the self-efficacy and response efficacy of university students, aimed at facilitating the acceptance of safe behavior, could be noteworthy as a principle in education. (Sci J Hamadan Univ Med Sci 2014; 21(1): 58-65)

  7. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Science.gov (United States)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  8. Flexible parallel implicit modelling of coupled thermal-hydraulic-mechanical processes in fractured rocks

    Science.gov (United States)

    Cacace, Mauro; Jacquey, Antoine B.

    2017-09-01

    Theory and numerical implementation describing groundwater flow and the transport of heat and solute mass in fully saturated fractured rocks with elasto-plastic mechanical feedbacks are developed. In our formulation, fractures are considered as being of lower dimension than the hosting deformable porous rock and we consider their hydraulic and mechanical apertures as scaling parameters to ensure continuous exchange of fluid mass and energy within the fracture-solid matrix system. The coupled system of equations is implemented in a new simulator code that makes use of a Galerkin finite-element technique. The code builds on a flexible, object-oriented numerical framework (MOOSE, Multiphysics Object Oriented Simulation Environment) which provides an extensive scalable parallel and implicit coupling to solve for the multiphysics problem. The governing equations of groundwater flow, heat and mass transport, and rock deformation are solved in a weak sense (either by classical Newton-Raphson or by Jacobian-free inexact Newton-Krylov schemes) on an underlying unstructured mesh. Nonlinear feedbacks among the active processes are enforced by considering evolving fluid and rock properties depending on the thermo-hydro-mechanical state of the system and the local structure, i.e. degree of connectivity, of the fracture system. A suite of applications is presented to illustrate the flexibility and capability of the new simulator to address problems of increasing complexity and occurring at different spatial (from centimetres to tens of kilometres) and temporal scales (from minutes to hundreds of years).
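
    For readers unfamiliar with the two nonlinear solution strategies named in this abstract, the standard textbook forms can be written as below. The symbols are assumptions introduced here for illustration and do not appear in the abstract: F is the discrete nonlinear residual, u_k the current iterate, J the Jacobian of F, v an arbitrary Krylov vector and epsilon a small perturbation.

```latex
% Newton-Raphson: solve a linear system with the Jacobian at each iteration.
J(u_k)\,\delta u_k = -F(u_k), \qquad u_{k+1} = u_k + \delta u_k .

% Jacobian-free Newton-Krylov: the Krylov solver only needs products J v,
% which are approximated by a finite difference of the residual, so J is
% never assembled explicitly.
J(u_k)\,v \;\approx\; \frac{F(u_k + \varepsilon v) - F(u_k)}{\varepsilon} .
```

    In the inexact variant, the linear system is solved only to a loose tolerance at each nonlinear step; how the simulator chooses that tolerance is not stated in the abstract.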

  9. Flexible parallel implicit modelling of coupled thermal–hydraulic–mechanical processes in fractured rocks

    Directory of Open Access Journals (Sweden)

    M. Cacace

    2017-09-01

    Full Text Available Theory and numerical implementation describing groundwater flow and the transport of heat and solute mass in fully saturated fractured rocks with elasto-plastic mechanical feedbacks are developed. In our formulation, fractures are considered as being of lower dimension than the hosting deformable porous rock and we consider their hydraulic and mechanical apertures as scaling parameters to ensure continuous exchange of fluid mass and energy within the fracture–solid matrix system. The coupled system of equations is implemented in a new simulator code that makes use of a Galerkin finite-element technique. The code builds on a flexible, object-oriented numerical framework (MOOSE, Multiphysics Object Oriented Simulation Environment) which provides an extensive scalable parallel and implicit coupling to solve for the multiphysics problem. The governing equations of groundwater flow, heat and mass transport, and rock deformation are solved in a weak sense (either by classical Newton–Raphson or by Jacobian-free inexact Newton–Krylov schemes) on an underlying unstructured mesh. Nonlinear feedbacks among the active processes are enforced by considering evolving fluid and rock properties depending on the thermo-hydro-mechanical state of the system and the local structure, i.e. degree of connectivity, of the fracture system. A suite of applications is presented to illustrate the flexibility and capability of the new simulator to address problems of increasing complexity and occurring at different spatial (from centimetres to tens of kilometres) and temporal scales (from minutes to hundreds of years).

  10. Linear parallel processing machines I

    Energy Technology Data Exchange (ETDEWEB)

    Von Kunze, M

    1984-01-01

    As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of context-sensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form $aB \rightarrow A_1 \ldots A_n\, b_1 \ldots b_m$. These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines) which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic context-free languages, and they define an interesting class of context-sensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.

  11. Using the extended parallel process model to prevent noise-induced hearing loss among coal miners in Appalachia

    Energy Technology Data Exchange (ETDEWEB)

    Murray-Johnson, L.; Witte, K.; Patel, D.; Orrego, V.; Zuckerman, C.; Maxfield, A.M.; Thimons, E.D. [Ohio State University, Columbus, OH (US)]

    2004-12-15

    Occupational noise-induced hearing loss is the second most self-reported occupational illness or injury in the United States. Among coal miners, more than 90% of the population reports a hearing deficit by age 55. In this formative evaluation, focus groups were conducted with coal miners in Appalachia to ascertain whether miners perceive hearing loss as a major health risk and if so, what would motivate the consistent wearing of hearing protection devices (HPDs). The theoretical framework of the Extended Parallel Process Model was used to identify the miners' knowledge, attitudes, beliefs, and current behaviors regarding hearing protection. Focus group participants had strong perceived severity and varying levels of perceived susceptibility to hearing loss. Various barriers significantly reduced the self-efficacy and the response efficacy of using hearing protection.

  12. Parallel processing for artificial intelligence 1

    CERN Document Server

    Kanal, LN; Kumar, V; Suttner, CB

    1994-01-01

    Parallel processing for AI problems is of great current interest because of its potential for alleviating the computational demands of AI procedures. The articles in this book consider parallel processing for problems in several areas of artificial intelligence: image processing, knowledge representation in semantic networks, production rules, mechanization of logic, constraint satisfaction, parsing of natural language, data filtering and data mining. The publication is divided into six sections. The first addresses parallel computing for processing and understanding images. The second discus

  13. Biological neural networks as model systems for designing future parallel processing computers

    Science.gov (United States)

    Ross, Muriel D.

    1991-01-01

    One of the more interesting debates of the present day centers on whether human intelligence can be simulated by computer. The author works under the premise that neurons individually are not smart at all. Rather, they are physical units which are impinged upon continuously by other matter that influences the direction of voltage shifts across the units' membranes. It is only through the action of a great many neurons, billions in the case of the human nervous system, that intelligent behavior emerges. What is required to understand even the simplest neural system is painstaking analysis, bit by bit, of the architecture and the physiological functioning of its various parts. The biological neural networks studied, the vestibular utricular and saccular maculas of the inner ear, are among the simplest of the mammalian neural networks to understand and model. While there is still a long way to go to understand even this simplest neural network in sufficient detail for extrapolation to computers and robots, a start was made. Moreover, the insights obtained and the technologies developed help advance the understanding of the more complex neural networks that underlie human intelligence.

  14. Combining self-affirmation with the extended parallel process model: the consequences for motivation to eat more fruit and vegetables.

    Science.gov (United States)

    Napper, Lucy E; Harris, Peter R; Klein, William M P

    2014-01-01

    There is potential for fruitful integration of research using the Extended Parallel Process Model (EPPM) with research using Self-affirmation Theory. However, to date no studies have attempted to do this. This article reports an experiment that tests whether (a) the effects of a self-affirmation manipulation add to those of EPPM variables in predicting intentions to improve a health behavior and (b) self-affirmation moderates the relationship between EPPM variables and intentions. Participants (N = 80) were randomized to either a self-affirmation or control condition prior to receiving personally relevant health information about the risks of not eating at least five portions of fruit and vegetables per day. A hierarchical regression model revealed that efficacy, threat × efficacy, self-affirmation, and self-affirmation × efficacy all uniquely contributed to the prediction of intentions to eat at least five portions per day. Self-affirmed participants and those with higher efficacy reported greater motivation to change. Threat predicted intentions at low levels of efficacy, but not at high levels. Efficacy had a stronger relationship with intentions in the nonaffirmed condition than in the self-affirmed condition. The findings indicate that self-affirmation processes can moderate the impact of variables in the EPPM and also add to the variance explained. We argue that there is potential for integration of the two traditions of research, to the benefit of both.

  15. Parallel processing for fluid dynamics applications

    International Nuclear Information System (INIS)

    Johnson, G.M.

    1989-01-01

    The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices.

  16. Parallel processing of genomics data

    Science.gov (United States)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, so the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to handle high-dimensional data with good response times. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
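
    The general idea of scoring many markers independently and in parallel can be sketched as follows. This is not the authors' algorithm: the statistic (a plain difference of class means), the function names `mean_difference` and `parallel_scores`, and the toy data are all assumptions made for illustration. A thread pool is used only to keep the sketch self-contained; CPU-bound work in CPython would normally be spread over processes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def mean_difference(row, labels):
    """Difference of class means for one marker (hypothetical statistic)."""
    class0 = [x for x, lab in zip(row, labels) if lab == 0]
    class1 = [x for x, lab in zip(row, labels) if lab == 1]
    return mean(class0) - mean(class1)

def parallel_scores(rows, labels, workers=4):
    """Score each marker independently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: mean_difference(row, labels), rows))

# Toy data: two markers measured on four patients in two response classes.
labels = [0, 0, 1, 1]
rows = [
    [1.0, 3.0, 5.0, 7.0],  # marker that separates the classes
    [2.0, 2.0, 2.0, 2.0],  # marker that does not
]
scores = parallel_scores(rows, labels)
```

    Because no marker's score depends on another's, the rows can be split across workers freely, which is the property that makes this kind of analysis scale.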

  17. The evolution of concepts of vestibular peripheral information processing: toward the dynamic, adaptive, parallel processing macular model

    Science.gov (United States)

    Ross, Muriel D.

    2003-01-01

    In a letter to Robert Hooke, written on 5 February, 1675, Isaac Newton wrote "If I have seen further than certain other men it is by standing upon the shoulders of giants." In his context, Newton was referring to the work of Galileo and Kepler, who preceded him. However, every field has its own giants, those men and women who went before us and, often with few tools at their disposal, uncovered the facts that enabled later researchers to advance knowledge in a particular area. This review traces the history of the evolution of views from early giants in the field of vestibular research to modern concepts of vestibular organ organization and function. Emphasis will be placed on the mammalian maculae as peripheral processors of linear accelerations acting on the head. This review shows that early, correct findings were sometimes unfortunately disregarded, impeding later investigations into the structure and function of the vestibular organs. The central themes are that the macular organs are highly complex, dynamic, adaptive, distributed parallel processors of information, and that historical references can help us to understand our own place in advancing knowledge about their complicated structure and functions.

  18. Assessment of Substance Abuse Behaviors in Adolescents: Integration of Self-Control into Extended Parallel Process Model

    Directory of Open Access Journals (Sweden)

    K Witte

    2005-04-01

    Full Text Available Introduction: An effective preventive health education program on drug abuse can be delivered by applying behavior change theories in a complementary fashion. Methods: The aim of this study was to assess the effectiveness of integrating self-control into the Extended Parallel Process Model (EPPM) in substance abuse behaviors. A sample of 189 governmental high school students participated in this survey. Information was collected individually through a researcher-designed questionnaire and a urinary rapid immuno-chromatography test for opium and marijuana. Results: The results of the study show that 6.9% of students used drugs (especially opium and marijuana) and that peer pressure was a determinant factor for using drugs. Moreover, the EPPM theoretical variables of perceived severity and perceived self-efficacy, together with self-control, are predictive factors of behavioral intention against substance abuse. Self-control had a significant effect on protective motivation and perceived efficacy. Low self-control was a predictive factor of drug abuse, and students with low self-control had drug abuse experience. Conclusion: The results of this study suggest that an integration of self-control into EPPM can be effective in expressing and designing primary preventive programs against drug abuse, and in assessing abuse and deviant behaviors among the adolescent population, especially risk seekers.

  19. An extension of the extended parallel process model (EPPM) in television health news: the influence of health consciousness on individual message processing and acceptance.

    Science.gov (United States)

    Hong, Hyehyun

    2011-06-01

    The purpose of this study is to examine the role of health consciousness in processing TV news that contains potential health threats and preventive recommendations. Based on the extended parallel process model (Witte, 1992), relationships among health consciousness, perceived severity, perceived susceptibility, perceived response efficacy, perceived self-efficacy, and message acceptance/rejection were hypothesized. Responses collected from 175 participants after viewing four TV health news stories were analyzed using the bootstrapping analysis (Preacher & Hayes, 2008). Results confirmed three mediators (i.e., perceived severity, response efficacy, self-efficacy) in the influence of health consciousness on message acceptance. A negative association found between health consciousness and perceived susceptibility is discussed in relation to characteristics of health conscious individuals and optimistic bias of health risks.

  20. Co-development of Problem Gambling and Depression Symptoms in Emerging Adults: A Parallel-Process Latent Class Growth Model.

    Science.gov (United States)

    Edgerton, Jason D; Keough, Matthew T; Roberts, Lance W

    2018-02-21

    This study examines whether there are multiple joint trajectories of depression and problem gambling co-development in a sample of emerging adults. Data were from the Manitoba Longitudinal Study of Young Adults (n = 679), which was collected in 4 waves across 5 years (age 18-20 at baseline). Parallel process latent class growth modeling was used to identify 5 joint trajectory classes: low decreasing gambling, low increasing depression (81%); low stable gambling, moderate decreasing depression (9%); low stable gambling, high decreasing depression (5%); low stable gambling, moderate stable depression (3%); moderate stable problem gambling, no depression (2%). There was no evidence of reciprocal growth in problem gambling and depression in any of the joint classes. Multinomial logistic regression analyses of baseline risk and protective factors found that only neuroticism, escape-avoidance coping, and perceived level of family social support were significant predictors of joint trajectory class membership. Consistent with the pathways model framework, we observed that individuals in the problem gambling only class were more likely using gambling as a stable way to cope with negative emotions. Similarly, high levels of neuroticism and low levels of family support were associated with increased odds of being in a class with moderate to high levels of depressive symptoms (but low gambling problems). The results suggest that interventions for problem gambling and/or depression need to focus on promoting more adaptive coping skills among more "at-risk" young adults, and such interventions should be tailored in relation to specific subtypes of comorbid mental illness.

  1. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers.

  2. Structured building model reduction toward parallel simulation

    Energy Technology Data Exchange (ETDEWEB)

    Dobbs, Justin R. [Cornell University; Hencey, Brondon M. [Cornell University

    2013-08-26

    Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.

  3. Development of mathematical model and optimal control system of internal temperatures of hot-blast stove process in staggered parallel operation; Netsufuro sushiki model to parallel sofu ni okeru ronai ondo saiteki seigyo system no kaihatsu

    Energy Technology Data Exchange (ETDEWEB)

    Matoba, Y. [Sumitomo Metal Industries, Ltd., Osaka (Japan)]; Otsuka, K.

    1998-07-01

    A mathematical model and an optimal control system of hot-blast stove process are described. A precise mathematical simulation model of the hot-blast stove was developed and the accuracy of the model has been confirmed. An optimal control system of the thermal conditions of the hot-blast stoves in staggered parallel operation was also developed. By the use of the multivariable optimal regulator and the feedforward compensations for the change of the aimed blast temperature and blast volume, the system is able to control the hot blast temperature and the brick temperature efficiently. The system has been applied to Kashima works. The variations of the blast temperature and the silica brick temperature have been decreased. The ultimate low heat level operations have been realized and the thermal efficiency furthermore has been raised by about 1%. 8 refs., 14 figs., 1 tab.

  4. Does the extended parallel process model fear appeal theory explain fears and barriers to prenatal physical activity?

    Science.gov (United States)

    Redmond, Michelle L; Dong, Fanglong; Frazier, Linda M

    2015-01-01

    Few studies have looked at the impact of fear on exercise behavior during pregnancy using a fear appeal theory. It is beneficial to understand how women receive the message of safe exercise during pregnancy and whether established guidelines have any influence on their decision to exercise. Using the extended parallel process model (EPPM), we explored women's fears about prenatal physical activity. We conducted a prospective, cross-sectional study on the fears and barriers to prenatal exercise among a racially/ethnically diverse population of pregnant women. Participants were recruited from local prenatal clinics. Ninety females with a singleton pregnancy between 16 and 30 weeks gestation were enrolled in the study. The primary outcome measure was classification of risk behavior based on the EPPM theory. Women who scored high on self-efficacy for exercising safely were more likely to exercise during pregnancy (adjusted odds ratio, 5.95; 95% CI, 1.39-25.39; P=.016) for at least 90 minutes per week. Participants who exercised at least 90 minutes per week during pregnancy scored higher on their perceived ability to control danger to the baby, as well as less susceptibility of harm and threat to baby of moderate exercise from prenatal exercise. More education and counseling on specific guidelines for safely exercising during pregnancy are needed. The EPPM framework has the potential to help improve health communications about exercise safety and guidelines between patients and health care professionals during pregnancy. Copyright © 2015 Jacobs Institute of Women's Health. Published by Elsevier Inc. All rights reserved.

  5. Parallel processing of structural integrity analysis codes

    International Nuclear Information System (INIS)

    Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

    1996-01-01

    Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high-speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising high-speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the parallel processing system capabilities. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. Codes in the first category, such as those used for harmonic analysis and mechanistic fuel performance codes, do not require parallelisation of their individual modules. Codes in the second category, such as conventional FEM codes, require parallelisation of individual modules. In this category, parallelisation of the equation solution module poses major difficulties. Different solution schemes such as the domain decomposition method (DDM), parallel active column solver and substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, one belonging to each of these categories, have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab

  6. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  7. Intelligent spatial ecosystem modeling using parallel processors

    International Nuclear Information System (INIS)

    Maxwell, T.; Costanza, R.

    1993-01-01

    Spatial modeling of ecosystems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future ecosystem behavior. Development of these models has been limited in the past by the large amount of input data required and the difficulty of even large mainframe serial computers in dealing with large spatial arrays. These two limitations have begun to erode with the increasing availability of remote sensing data and GIS systems to manipulate it, and the development of parallel computer systems which allow computation of large, complex, spatial arrays. Although many forms of dynamic spatial modeling are highly amenable to parallel processing, the primary focus in this project is on process-based landscape models. These models simulate spatial structure by first compartmentalizing the landscape into some geometric design and then describing flows within compartments and spatial processes between compartments according to location-specific algorithms. The authors are currently building and running parallel spatial models at the regional scale for the Patuxent River region in Maryland, the Everglades in Florida, and Barataria Basin in Louisiana. The authors are also planning a project to construct a series of spatially explicit linked ecological and economic simulation models aimed at assessing the long-term potential impacts of global climate change.
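
    The structure described in this abstract, a landscape split into compartments with processes inside each compartment and flows between neighbouring ones, can be sketched with a toy grid model. Everything specific here is an assumption for illustration and is not taken from the authors' models: the square grid, the single stock per cell, the uniform `decay` and `exchange` rates, and the function name `landscape_step`.

```python
def landscape_step(grid, decay=0.1, exchange=0.2):
    """One time step of a toy compartment model on a rectangular grid.

    Within each cell the stock decays by `decay`; a fraction `exchange`
    of what remains is then shared equally among the 4-connected
    neighbours. All cells are updated from the same old state.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            local = grid[r][c] * (1.0 - decay)  # within-compartment process
            nbrs = [(r + dr, c + dc)
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            # Nothing flows out of a cell with no neighbours (1x1 grid).
            outflow = local * exchange if nbrs else 0.0
            new[r][c] += local - outflow
            for nr, nc in nbrs:  # between-compartment flow
                new[nr][nc] += outflow / len(nbrs)
    return new

# A unit of stock in the centre of a 3x3 landscape.
start = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
after = landscape_step(start, decay=0.0, exchange=0.2)
```

    Since each cell reads only the previous state of itself and its neighbours, the grid can be partitioned into blocks that are updated on separate processors, with only block boundaries exchanged between steps.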

  8. Applications of Parallel Processing in Mobile Banking

    Directory of Open Access Journals (Sweden)

    2007-01-01

    Full Text Available The future of mobile banking will be represented by such applications that support mobile, Internet banking and EFT (Electronic Funds Transfer) transactions in a single user interface. In such a way, the mobile banking will be able to cover all the types of applications demanded at the market level. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Excluding some limitations, the grid processing offers huge opportunities to exploit the parallelism. For this reason, a lot of applications of waiting queues in grid processing were developed in the last years. Grid networks represent a distinctive and very modern field of the parallel and distributed processing.

  9. A parallel process model of the development of positive smoking expectancies and smoking behavior during early adolescence in Caucasian and African American girls

    OpenAIRE

    Chung, Tammy; White, Helene R.; Hipwell, Alison E.; Stepp, Stephanie D.; Loeber, Rolf

    2010-01-01

    This study examined the development of positive smoking expectancies and smoking behavior in an urban cohort of girls followed annually over ages 11-14. Longitudinal data from the oldest cohort of the Pittsburgh Girls Study (N=566, 56% African American, 44% Caucasian) were used to estimate a parallel process growth model of positive smoking expectancies and smoking behavior. Average level of positive smoking expectancies was relatively stable over ages 11-14, although there was significant va...

  10. Parallel computing in enterprise modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent-based simulation. What simulations of this class share, and what distinguishes them from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class, where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  11. Evidence of Parallel Processing During Translation

    DEFF Research Database (Denmark)

    Balling, Laura Winther; Hvelplund, Kristian Tangsgaard; Sjørup, Annette Camilla

    2014-01-01

    conclude that translation is a parallel process and that literal translation is likely to be a universal initial default strategy in translation. This conclusion is strengthened by the fact that all three experiments were relatively naturalistic, due to the combination of remote eye tracking and mixed...

  12. Researching the Parallel Process in Supervision and Psychotherapy

    DEFF Research Database (Denmark)

    Jacobsen, Claus Haugaard

    Reflects upon how to do process research in supervision and in the parallel process. A single case study is presented illustrating how a study on parallel process can be carried out.

  13. Parallel processing for artificial intelligence 2

    CERN Document Server

    Kumar, V; Suttner, CB

    1994-01-01

    With the increasing availability of parallel machines and rising interest in large-scale, real-world applications, research on parallel processing for Artificial Intelligence (AI) is gaining greater importance in the computer science environment. Many applications have been implemented and delivered, but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their

  14. Aspects of parallel processing and control engineering

    OpenAIRE

    McKittrick, Brendan J

    1991-01-01

    The concept of parallel processing is not a new one, but its application to control engineering tasks is a relatively recent development, made possible by contemporary hardware and software innovation. It has long been accepted that, if properly orchestrated, several processors/CPUs can, when combined, form a powerful processing entity. What prevented this from being implemented in commercial systems was the adequacy of the microprocessor for most tasks and hence the expense of a multi-pro...

  15. A multitransputer parallel processing system (MTPPS)

    International Nuclear Information System (INIS)

    Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

    1993-01-01

    This report describes the design and implementation of a 16-node Multi Transputer Parallel Processing System (MTPPS), which is a platform for parallel program development. It is a MIMD machine based on the message-passing paradigm. The basic compute engine is an Inmos transputer IMS T800-20. A transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high-speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software-controlled network configuration. The system is modularly expandable, and more NODES can be added to achieve the required processing power. The system acts as a back end to an IBM PC, which has been integrated into the system to provide the user I/O interface. PC resources are available to the programmer. The interface hardware between the PC and the network of transputers is INMOS compatible. Therefore, all the commercially available development software compatible with INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD architectures, transputer architecture and parallel processing software development issues. LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problems are discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes

  16. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    Energy Technology Data Exchange (ETDEWEB)

    Lichtner, Peter C. [OFM Research, Redmond, WA (United States); Hammond, Glenn E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lu, Chuan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Karra, Satish [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bisht, Gautam [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Andre, Benjamin [National Center for Atmospheric Research, Boulder, CO (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Mills, Richard [Intel Corporation, Portland, OR (United States); Univ. of Tennessee, Knoxville, TN (United States); Kumar, Jitendra [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2015-01-20

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 2^18 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2^32 = 4,294,967,296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.
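
    As a quick check of the 32-bit limit described above, the following sketch computes how many distinct values a 32-bit integer can hold and compares that with the 2 billion degrees of freedom quoted in the abstract. It is illustrative arithmetic only, not PFLOTRAN or HDF5 code; the constant and function names are hypothetical, and treating the index as unsigned is an assumption (a signed index would halve the range).

```python
# Illustrative arithmetic only; none of these names come from PFLOTRAN or HDF5.
UNSIGNED_32_BIT_VALUES = 2 ** 32          # distinct values of an unsigned 32-bit integer
SIGNED_32_BIT_MAX = 2 ** 31 - 1           # largest value of a signed 32-bit integer
LARGEST_REPORTED_PROBLEM = 2_000_000_000  # "2 billion degrees of freedom" from the abstract


def fits_in_unsigned_32_bit(count: int) -> bool:
    """Return True if `count` can be stored as an unsigned 32-bit integer."""
    return 0 <= count < UNSIGNED_32_BIT_VALUES


if __name__ == "__main__":
    print(f"2^32 = {UNSIGNED_32_BIT_VALUES:,}")
    print("Reported problem fits:", fits_in_unsigned_32_bit(LARGEST_REPORTED_PROBLEM))
```

    Under these assumptions the reported problem size is within roughly a factor of two of the ceiling, which is consistent with the abstract describing the limit as a current constraint.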

  17. Parallel asynchronous systems and image processing algorithms

    Science.gov (United States)

    Coon, D. D.; Perera, A. G. U.

    1989-01-01

    A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.

  18. Oxytocin: parallel processing in the social brain?

    Science.gov (United States)

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.

  19. Towards a streaming model for nested data parallelism

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner; Filinski, Andrzej

    2013-01-01

    The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening... -processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level. The language definition and implementation are still very much work...

  20. PDDP, A Data Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Karen H. Warren

    1996-01-01

    PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

  1. Fast image processing on parallel hardware

    International Nuclear Information System (INIS)

    Bittner, U.

    1988-01-01

    Current digital imaging modalities in the medical field incorporate parallel hardware, which is heavily used in the image formation stage, as in CT/MR image reconstruction or DSA real-time subtraction. To make image post-processing as efficient as image acquisition, new software approaches have to be found that take full advantage of the parallel hardware architecture. This paper describes the implementation of a two-dimensional median filter, which can serve as an example for the development of such an algorithm. The algorithm is analyzed by viewing it as a complete parallel sort of the k pixel values in the chosen window, which leads to a generalization to rank order operators and other closely related filters reported in the literature. A section about the theoretical basis of the algorithm gives hints on how to characterize operations suitable for implementation on pipeline processors and how to find the appropriate algorithms. Finally, some results on the computation time and usefulness of median filtering in radiographic imaging are given.
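
    The view of the median filter as a sort of the k pixel values in a window, and its generalization to rank order operators, can be sketched as follows. This is a plain serial Python illustration, not the paper's parallel or pipelined implementation; the function names, the square window, and the edge-replication border handling are all assumptions made for the example.

```python
from typing import List, Sequence


def rank_order_filter(image: Sequence[Sequence[int]], size: int, rank: int) -> List[List[int]]:
    """Sort the k = size*size values in each window and keep the one at index `rank`.

    Border pixels are handled by replicating the nearest edge value (an assumption).
    """
    k = size * size
    if size < 1 or size % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    if not 0 <= rank < k:
        raise ValueError("rank must lie in [0, size*size)")
    height, width = len(image), len(image[0])
    radius = size // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = []
            for dy in range(-radius, radius + 1):
                yy = min(max(y + dy, 0), height - 1)   # clamp row index to the image
                for dx in range(-radius, radius + 1):
                    xx = min(max(x + dx, 0), width - 1)  # clamp column index to the image
                    window.append(image[yy][xx])
            window.sort()                # the "complete sort" of the k window values
            result[y][x] = window[rank]  # rank 0 = minimum, k // 2 = median, k - 1 = maximum
    return result


def median_filter(image: Sequence[Sequence[int]], size: int = 3) -> List[List[int]]:
    """Median filtering is the rank order operator with the middle rank."""
    return rank_order_filter(image, size, (size * size) // 2)


if __name__ == "__main__":
    noisy = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
    print(median_filter(noisy))  # the isolated outlier in the centre is suppressed
```

    Choosing a different `rank` turns the same routine into a minimum (erosion-like) or maximum (dilation-like) filter, which is the sense in which the sort-based formulation generalizes.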

  2. New Parallel Algorithms for Landscape Evolution Model

    Science.gov (United States)

    Jin, Y.; Zhang, H.; Shi, Y.

    2017-12-01

    Most landscape evolution models (LEMs) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize because the drainage area must be computed for each node, which requires a huge amount of communication when run in parallel. To overcome this difficulty, we developed two parallel algorithms for LEMs with a stream net. One algorithm partitions the grid with traditional methods and applies an efficient global reduction algorithm to compute drainage areas and transport rates for the stream net; the other is based on a new partition algorithm, which first partitions the nodes in catchments between processes and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large-scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.

  3. Partitioning sparse rectangular matrices for parallel processing

    Energy Technology Data Exchange (ETDEWEB)

    Kolda, T.G.

    1998-05-01

    The authors are interested in partitioning sparse rectangular matrices for parallel processing. The partitioning problem has been well-studied in the square symmetric case, but the rectangular problem has received very little attention. They will formalize the rectangular matrix partitioning problem and discuss several methods for solving it. They will extend the spectral partitioning method for symmetric matrices to the rectangular case and compare this method to three new methods -- the alternating partitioning method and two hybrid methods. The hybrid methods will be shown to be best.

  4. Parallel Boltzmann machines : a mathematical model

    NARCIS (Netherlands)

    Zwietering, P.J.; Aarts, E.H.L.

    1991-01-01

    A mathematical model is presented for the description of parallel Boltzmann machines. The framework is based on the theory of Markov chains and combines a number of previously known results into one generic model. It is argued that parallel Boltzmann machines maximize a function consisting of a

  5. Parallel models of associative memory

    CERN Document Server

    Hinton, Geoffrey E

    2014-01-01

    This update of the 1981 classic on neural networks includes new commentaries by the authors that show how the original ideas are related to subsequent developments. As researchers continue to uncover ways of applying the complex information processing abilities of neural networks, they give these models an exciting future which may well involve revolutionary developments in understanding the brain and the mind -- developments that may allow researchers to build adaptive intelligent machines. The original chapters show where the ideas came from and the new commentaries show where they are going

  6. Toward a parallel and cascading model of the writing system: A review of research on writing processes coordination

    OpenAIRE

    Thierry Olive

    2014-01-01

    Efficient coordination of the different writing processes is central to producing good-quality texts, and is a fundamental component of writing skill. In this article, I propose a general theoretical framework for considering how writing processes are coordinated, in which writing processes are concurrently activated with more or less overlap between processes depending on their working memory demands, and with the flow of information cascading from central to peripheral levels of processing....

  7. A qualitative single case study of parallel processes

    DEFF Research Database (Denmark)

    Jacobsen, Claus Haugaard

    2007-01-01

    Parallel process in psychotherapy and supervision is a phenomenon manifest in relationships and interactions that originates in one setting and is reflected in another. This article presents an explorative single case study of parallel processes based on qualitative analyses of two successive randomly chosen psychotherapy sessions with a schizophrenic patient and the supervision session given in between. The author's analysis is verified by an independent examiner's analysis. Parallel processes are identified and described. Reflections on the dynamics of parallel processes and supervisory...

  8. An educational tool for interactive parallel and distributed processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2012-01-01

    In this article we try to describe how the modular interactive tiles system (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing a hands-on educational tool that allows a change in the representation of abstract problems related to designing interactive parallel and distributed systems. Indeed, the MITS seems to bring a series of goals into education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, and user and multi-user interaction which can rarely be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples we show how to implement...

  9. Surface topography of parallel grinding process for nonaxisymmetric aspheric lens

    International Nuclear Information System (INIS)

    Zhang Ningning; Wang Zhenzhong; Pan Ri; Wang Chunjin; Guo Yinbiao

    2012-01-01

    Workpiece surface profile, texture and roughness can be predicted by modeling the topography of the wheel surface and the kinematics of the grinding process, which together form an important part of precision grinding process theory. Parallel grinding technology is an important method for nonaxisymmetric aspheric lens machining, but there are few reports on relevant simulation. In this paper, a simulation method based on parallel grinding for precision machining of aspheric lenses is proposed. The method combines modeling the random surface of the wheel with modeling the single grain track based on arc wheel contact points. Then, a mathematical algorithm for surface topography is proposed and applied under different machining parameters. The consistency between the results of simulation and test proves that the algorithm is correct and efficient. (authors)

  10. Parallel Distributed Processing theory in the age of deep networks

    OpenAIRE

    Bowers, Jeffrey

    2017-01-01

    Parallel Distributed Processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely, that all knowledge is coded in a distributed format, and cognition is mediated by non-symbolic computations. These claims have long been debated within cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks le...

  11. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  12. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  13. Parallel processing from applications to systems

    CERN Document Server

    Moldovan, Dan I

    1993-01-01

    This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla

  14. Parallel Algorithms for Model Checking

    NARCIS (Netherlands)

    van de Pol, Jaco; Mousavi, Mohammad Reza; Sgall, Jiri

    2017-01-01

    Model checking is an automated verification procedure, which checks that a model of a system satisfies certain properties. These properties are typically expressed in some temporal logic, like LTL and CTL. Algorithms for LTL model checking (linear time logic) are based on automata theory and graph

  15. The effects of fear appeal message repetition on perceived threat, perceived efficacy, and behavioral intention in the extended parallel process model.

    Science.gov (United States)

    Shi, Jingyuan Jolie; Smith, Sandi W

    2016-01-01

    This study examined the effect of moderately repeated exposure (three times) to a fear appeal message on the Extended Parallel Processing Model (EPPM) variables of threat, efficacy, and behavioral intentions for the recommended behaviors in the message, as well as the proportions of systematic and message-related thoughts generated after each message exposure. The results showed that after repeated exposure to a fear appeal message about preventing melanoma, perceived threat in terms of susceptibility and perceived efficacy in terms of response efficacy significantly increased. The behavioral intentions of all recommended behaviors did not change after repeated exposure to the message. However, after the second exposure the proportions of both systematic and all message-related thoughts (relative to total thoughts) significantly decreased while the proportion of heuristic thoughts significantly increased, and this pattern held after the third exposure. The findings demonstrated that the predictions in the EPPM are likely to be operative after three exposures to a persuasive message.

  16. Using the Extended Parallel Process Model to create and evaluate the effectiveness of brochures to reduce the risk for noise-induced hearing loss in college students

    Directory of Open Access Journals (Sweden)

    Michael R Kotowski

    2011-01-01

    Brochures containing messages developed according to the Extended Parallel Process Model were deployed to increase college students' intentions to use hearing protection. These brochures were presented to one-half of a college student sample, after which a questionnaire was administered to assess perceptions of threat, efficacy, and behavioral intentions. The other half of the sample completed the questionnaire and then received brochures. Results indicated that people receiving the brochure before the questionnaire reported greater perceptions of hearing loss threat and efficacy to use ear plugs when in loud environments; however, intentions to use ear plugs were unchanged. Distribution of the brochure also resulted in greater perceptions of hearing loss threat and efficacy to use over-the-ear headphones when using devices such as MP3 players. In this case, however, intentions to use over-the-ear headphones increased. Results are discussed in terms of future research and practical applications.

  17. The study of image processing of parallel digital signal processor

    International Nuclear Information System (INIS)

    Liu Jie

    2000-01-01

    The author analyzes the basic characteristics of the parallel DSP (digital signal processor) TMS320C80 and proposes a related optimized image algorithm and a parallel processing method based on the parallel DSP. Real-time performance for many image processing tasks can be achieved in this way.

  18. Removal of antibiotics in a parallel-plate thin-film-photocatalytic reactor: Process modeling and evolution of transformation by-products and toxicity.

    Science.gov (United States)

    Özkal, Can Burak; Frontistis, Zacharias; Antonopoulou, Maria; Konstantinou, Ioannis; Mantzavinos, Dionissios; Meriç, Süreyya

    2017-10-01

    Photocatalytic degradation of the antibiotic sulfamethoxazole (SMX) was studied under recycling batch and homogeneous flow conditions in a thin-film coated immobilized system, namely a parallel-plate (PPL) reactor. The experiments were designed and statistically evaluated with a factorial design (FD) approach, with the intent of providing a mathematical model that takes into account the parameters influencing process performance. Initial antibiotic concentration, UV energy level, irradiated surface area, water matrix (ultrapure and secondary treated wastewater) and time were defined as model parameters. The full 2^5 experimental design consisted of 32 random experiments. PPL reactor test experiments were carried out in order to set boundary levels for hydraulic, volumetric and defined process parameters. TTIP-based thin films with polyethylene glycol + TiO2 additives were fabricated according to a pre-described methodology. Antibiotic degradation was monitored by High Performance Liquid Chromatography analysis, while the degradation products were identified by LC-TOF-MS analysis. Acute toxicity of untreated and treated SMX solutions was tested by the standard Daphnia magna method. Based on the obtained mathematical model, the response of the immobilized PC system is described with a polynomial equation. The statistically significant positive effects are initial SMX concentration, process time and the combined effect of both, while the combined effect of water matrix and irradiated surface area has an adverse effect on the rate of antibiotic degradation by photocatalytic oxidation. Process efficiency and the validity of the acquired mathematical model were also verified for the antibiotics levofloxacin and cefaclor. Immobilized PC degradation in the PPL reactor configuration was found capable of reducing effluent toxicity by simultaneous degradation of the SMX parent compound and TBPs. Copyright © 2017. Published by Elsevier B.V.
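
    The run count in the abstract follows from five parameters at two levels each, giving 2^5 = 32 combinations. The sketch below enumerates such a design. The factor identifiers are paraphrased from the abstract, while the coded levels -1 and +1 and the function name are assumptions made for illustration; the actual level values and run order used in the study are not given here.

```python
from itertools import product
from typing import Dict, List, Sequence

# Factor names paraphrased from the abstract; the identifiers themselves are hypothetical.
FACTORS = [
    "initial_concentration",
    "uv_energy_level",
    "irradiated_surface_area",
    "water_matrix",
    "time",
]
CODED_LEVELS = (-1, +1)  # assumed low/high coding, a common convention in factorial designs


def full_factorial(factors: Sequence[str], levels: Sequence[int] = CODED_LEVELS) -> List[Dict[str, int]]:
    """Enumerate every combination of levels across all factors."""
    return [dict(zip(factors, combo)) for combo in product(levels, repeat=len(factors))]


if __name__ == "__main__":
    runs = full_factorial(FACTORS)
    print(len(runs))  # number of runs in the two-level, five-factor design
    print(runs[0])    # first run: every factor at its low level
```

    In practice the run order of such a design is usually randomized before execution; that step is omitted here to keep the enumeration deterministic.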

  19. GPGPU Parallel SPIN Model Checker

    Data.gov (United States)

    National Aeronautics and Space Administration — Model Checking is a powerful technique used to verify that a system does not violate its intended behavior. While this is very useful in proving the robustness of a...

  20. An Educational Tool for Interactive Parallel and Distributed Processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2011-01-01

    In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation of the abstract problems related to designing interactive parallel and distributed systems. Indeed, MITS seems to bring a series of goals into education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, user and multiuser interaction, which can hardly be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples show how to implement interactive...

  1. An intelligent allocation algorithm for parallel processing

    Science.gov (United States)

    Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.

    1988-01-01

    The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of the interprocessor communication network, influence the allocation. To achieve realistic estimations of the execution durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication duration varying from zero up to values comparable to the execution durations of individual nodes. The effect on allocation of communication network structures is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network.
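
    The critical-path step named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes the program graph is a directed acyclic graph given as plain dictionaries, and it models token communication as one uniform delay per edge. The names critical_path, durations, edges and comm_delay are hypothetical.

```python
from functools import lru_cache


def critical_path(durations, edges, comm_delay=0.0):
    """Return (length, path) of the longest path through a DAG.

    durations:  dict mapping node -> execution time of that node
    edges:      dict mapping node -> list of successor nodes
    comm_delay: assumed uniform token-communication time per edge
    """

    @lru_cache(maxsize=None)
    def longest_from(node):
        # Longest remaining path that starts at `node`.
        best_len, best_path = 0.0, ()
        for succ in edges.get(node, ()):
            length, path = longest_from(succ)
            length += comm_delay  # cost of sending tokens node -> succ
            if length > best_len:
                best_len, best_path = length, path
        return durations[node] + best_len, (node,) + best_path

    # The critical path is the longest path starting from any node.
    return max((longest_from(n) for n in durations), key=lambda r: r[0])
```

    Raising comm_delay from zero toward the node execution times mirrors the range of token-communication durations the abstract describes, and shows how the critical path length grows with communication cost.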

  2. Work stressors, depressive symptoms and sleep quality among US Navy members: a parallel process latent growth modelling approach across deployment.

    Science.gov (United States)

    Bravo, Adrian J; Kelley, Michelle L; Swinkels, Cindy M; Ulmer, Christi S

    2017-11-03

    The present study examined whether work stressors contribute to sleep problems and depressive symptoms over the course of deployment (i.e. pre-deployment, post-deployment and 6-month reintegration) among US Navy members. Specifically, we examined whether depressive symptoms or sleep quality mediate the relationships between work stressors and these outcomes. Participants were 101 US Navy members who experienced an 8-month deployment after Operation Enduring Freedom/Operation Iraqi Freedom. Using piecewise latent growth models, we found that increased work stressors were linked to increased depressive symptoms and decreased sleep quality across all three deployment stages. Further, increases in work stressors from pre- to post-deployment contributed to poorer sleep quality post-deployment via increasing depressive symptoms. Moreover, sleep quality mediated the association between increases in work stressors and increases in depressive symptoms from pre- to post-deployment. These effects were maintained from post-deployment through the 6-month reintegration. Although preliminary, our results suggest that changes in work stressors may have small but significant implications for both depressive symptoms and quality of sleep over time, and a bi-directional relationship persists between sleep quality and depression across deployment. Strategies that target both stress and sleep could address both precipitating and perpetuating factors that affect sleep and depressive symptoms. © 2017 European Sleep Research Society.

  3. Parallel Computing for Terrestrial Ecosystem Carbon Modeling

    International Nuclear Information System (INIS)

    Wang, Dali; Post, Wilfred M.; Ricciuto, Daniel M.; Berry, Michael

    2011-01-01

    Terrestrial ecosystems are a primary component of research on global environmental change. Observational and modeling research on terrestrial ecosystems at the global scale, however, has lagged behind its counterparts for oceanic and atmospheric systems, largely because of the unique challenges associated with the tremendous diversity and complexity of terrestrial ecosystems. There are 8 major types of terrestrial ecosystem: tropical rain forest, savannas, deserts, temperate grassland, deciduous forest, coniferous forest, tundra, and chaparral. The carbon cycle is an important mechanism in the coupling of terrestrial ecosystems with climate through biological fluxes of CO2. The influence of terrestrial ecosystems on atmospheric CO2 can be modeled via several means at different timescales. Important processes include plant dynamics, change in land use, as well as ecosystem biogeography. Over the past several decades, many terrestrial ecosystem models (see the 'Model developments' section) have been developed to understand the interactions between terrestrial carbon storage and CO2 concentration in the atmosphere, as well as the consequences of these interactions. Early TECMs generally adapted simple box-flow exchange models, in which photosynthetic CO2 uptake and respiratory CO2 release are simulated in an empirical manner with a small number of vegetation and soil carbon pools. Demands on the kinds and amount of information required from global TECMs have grown. Recently, along with the rapid development of parallel computing, spatially explicit TECMs with detailed process-based representations of carbon dynamics have become attractive, because those models can readily incorporate a variety of additional ecosystem processes (such as dispersal, establishment, growth, mortality etc.) and environmental factors (such as landscape position, pest populations, disturbances, resource manipulations, etc.), and provide information to frame policy options for climate change

  4. Density functional theory and parallel processing

    International Nuclear Information System (INIS)

    Ward, R.C.; Geist, G.A.; Butler, W.H.

    1987-01-01

    The authors demonstrate a method for obtaining the ground state energies and charge densities of a system of atoms described within density functional theory using simulated annealing on a parallel computer

  5. Implementation of a parallel version of a regional climate model

    Energy Technology Data Exchange (ETDEWEB)

    Gerstengarbe, F.W. [ed.; Kuecken, M. [Potsdam-Institut fuer Klimafolgenforschung (PIK), Potsdam (Germany); Schaettler, U. [Deutscher Wetterdienst, Offenbach am Main (Germany). Geschaeftsbereich Forschung und Entwicklung

    1997-10-01

    A regional climate model developed by the Max Planck Institute for Meteorology and the German Climate Computing Centre in Hamburg, based on the `Europa` and `Deutschland` models of the German Weather Service, has been parallelized and implemented on the IBM RS/6000 SP computer system of the Potsdam Institute for Climate Impact Research, including parallel input/output processing, the explicit Eulerian time-step, the semi-implicit corrections, the normal-mode initialization and the physical parameterizations of the German Weather Service. The implementation utilizes Fortran 90 and the Message Passing Interface. The parallelization strategy used is a 2D domain decomposition. This report describes the parallelization strategy, the parallel I/O organization, the influence of different domain decomposition approaches for static and dynamic load imbalances and first numerical results. (orig.)
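
    As an illustration of what a 2D domain decomposition involves, the following Python sketch splits an nx-by-ny grid into a px-by-py arrangement of rectangular blocks, one per process. It is a generic block partition written for this listing, not the report's Fortran 90 code; the function names and the choice to give leftover rows or columns to the lowest-numbered blocks are assumptions.

```python
def block_range(n, parts, index):
    """Half-open index range [start, stop) of block `index` when n points
    are split into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(n, parts)
    # The first `extra` blocks each receive one additional point.
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def decompose_2d(nx, ny, px, py):
    """Map each process coordinate (i, j) to its (x_range, y_range)."""
    return {
        (i, j): (block_range(nx, px, i), block_range(ny, py, j))
        for i in range(px)
        for j in range(py)
    }
```

    Because block sizes differ by at most one point per direction, this partition addresses only static load balance in grid points; dynamic imbalance from the physical parameterizations, which the report also studies, would need a different scheme.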

  6. A further extension of the Extended Parallel Process Model (E-EPPM): implications of cognitive appraisal theory of emotion and dispositional coping style.

    Science.gov (United States)

    So, Jiyeon

    2013-01-01

    For two decades, the extended parallel process model (EPPM; Witte, 1992) has been one of the most widely used theoretical frameworks in health risk communication. The model has gained much popularity because it recognizes that, ironically, preceding fear appeal models do not incorporate the concept of fear as a legitimate and central part of them. As a remedy to this situation, the EPPM aims at "putting the fear back into fear appeals" (Witte, 1992, p. 330). Despite this attempt, however, this article argues that the EPPM still does not fully capture the essence of fear as an emotion. Specifically, drawing upon Lazarus's (1991) cognitive appraisal theory of emotion and the concept of dispositional coping style (Miller, 1995), this article seeks to further extend the EPPM. The revised EPPM incorporates a more comprehensive perspective on risk perceptions as a construct involving both cognitive and affective aspects (i.e., fear and anxiety) and integrates the concept of monitoring and blunting coping style as a moderator of further information seeking regarding a given risk topic.

  7. Encouraging early preventive dental visits for preschool-aged children enrolled in Medicaid: using the extended parallel process model to conduct formative research.

    Science.gov (United States)

    Askelson, Natoshia M; Chi, Donald L; Momany, Elizabeth; Kuthy, Raymond; Ortiz, Cristina; Hanson, Jessica D; Damiano, Peter

    2014-01-01

    Preventive dental visits for preschool-aged children can result in better oral health outcomes, especially for children from lower income families. Many children, however, still do not see a dentist for preventive visits. This qualitative study examined the potential for the Extended Parallel Process Model (EPPM) to be used to uncover potential antecedents to parents' decisions about seeking preventive dental care. Seventeen focus groups including 41 parents were conducted. The focus group protocol centered on constructs (perceived severity, perceived susceptibility, perceived self-efficacy, and perceived response efficacy) of the EPPM. Transcripts were analyzed by three coders who employed closed coding strategies. Parents' perceptions of severity of dental issues were high, particularly regarding negative health and appearance outcomes. Parents perceived susceptibility of their children to dental problems as low, primarily because most children in this study received preventive care, which parents viewed as highly efficacious. Parents' self-efficacy to obtain preventive care for their children was high. However, they were concerned about barriers including lack of dentists, especially dentists who are good with young children. Findings were consistent with EPPM, which suggests this model is a potential tool for understanding parents' decisions about seeking preventive dental care for their young children. Future research should utilize quantitative methods to test this model. © 2012 American Association of Public Health Dentistry.

  8. Iteration schemes for parallelizing models of superconductivity

    Energy Technology Data Exchange (ETDEWEB)

    Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)

    1996-12-31

    The time-dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-Tc superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.

  9. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    Science.gov (United States)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  10. The Moderated Mediating Effect of Self-Efficacy on Exercise Among Older Adults in an Online Bone Health Intervention Study: A Parallel Process Latent Growth Curve Model.

    Science.gov (United States)

    Zhu, Shijun; Nahm, Eun-Shim; Resnick, Barbara; Friedmann, Erika; Brown, Clayton; Park, Jumin; Cheon, Jooyoung; Park, DoHwan

    2017-07-01

    This secondary data analysis of a longitudinal study assessed whether self-efficacy for exercise (SEE) mediated online intervention effects on exercise among older adults and whether age (50-64 vs. ≥65 years) moderated the mediation. Data were from an online bone health intervention study. Eight hundred sixty-six older adults (≥50 years) were randomized to three arms: Bone Power (n = 301), Bone Power Plus (n = 302), or Control (n = 263). Parallel process latent growth curve modeling (LGCM) was used to jointly model growths in SEE and in exercise and to assess the mediating effect of SEE on the effect of intervention on exercise. SEE was a significant mediator in 50- to 64-year-old adults (0.061, 95% BCI: 0.011, 0.163) but not in the ≥65 age group (-0.004, 95% BCI: -0.047, 0.025). Promotion of SEE is critical to improve exercise among 50- to 64-year-olds.

  11. Bessel functions: parallel display and processing.

    Science.gov (United States)

    Lohmann, A W; Ojeda-Castañeda, J; Serrano-Heredia, A

    1994-01-01

    We present an optical setup that converts planar binary curves into two-dimensional amplitude distributions, which are proportional, along one axis, to the Bessel function of order n, whereas along the other axis the order n increases. This Bessel displayer can be used for parallel Bessel transformation of a signal. Experimental verifications are included.

  12. Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

    Science.gov (United States)

    Dematté, Lorenzo

    2012-01-01

    Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space has become more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation can be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology to be able to understand a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely used algorithm for stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular and bimolecular reactions, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel on each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output

  13. A Model for Speedup of Parallel Programs

    Science.gov (United States)

    1997-01-01

    Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static

  14. A fast and efficient adaptive parallel ray tracing based model for thermally coupled surface radiation in casting and heat treatment processes

    International Nuclear Information System (INIS)

    Fainberg, J; Schaefer, W

    2015-01-01

    A new algorithm for heat exchange between thermally coupled diffusely radiating interfaces is presented, which can be applied to closed and half-open transparent radiating cavities. Interfaces between opaque and transparent materials are automatically detected and subdivided into elementary radiation surfaces named tiles. Contrary to the classical view factor method, the fixed unit sphere area subdivision oriented along the normal tile direction is projected onto the surrounding radiation mesh and not vice versa. Then, the total incident radiating flux of the receiver is approximated as a direct sum of radiation intensities of representative “senders” with the same weight factor. A hierarchical scheme for the space angle subdivision is selected in order to minimize the total memory and the computational demands during thermal calculations. Direct visibility is tested by means of a voxel-based ray tracing method accelerated by means of the anisotropic Chebyshev distance method, which reuses the computational grid as a Chebyshev one. The ray tracing algorithm is fully parallelized using MPI and takes advantage of the balanced distribution of all available tiles among all CPUs. This approach allows tracing of each particular ray without any communication. The algorithm has been implemented in a commercial casting process simulation software. The accuracy and computational performance of the new radiation model for heat treatment, investment and ingot casting applications are illustrated using industrial examples. (paper)

  15. Self-critical perfectionism, dependency, and symptomatic distress in patients with personality disorder during hospitalization-based psychodynamic treatment: A parallel process growth modeling approach.

    Science.gov (United States)

    Lowyck, Benedicte; Luyten, Patrick; Vermote, Rudi; Verhaest, Yannic; Vansteelandt, Kristof

    2017-07-01

    There is growing evidence for the efficacy and effectiveness of psychotherapy in patients with personality disorder (PD), but very little is known about the factors underlying these effects. Two-polarities models of personality development provide an empirically supported approach to studying therapeutic change. Briefly, these models argue that personality pathology is characterized by an imbalance between development of the capacity for self-definition and for relatedness, with an exaggerated emphasis on issues regarding self-definition and relatedness being expressed in high levels of self-critical perfectionism (SCP) and dependency, respectively. This study used data from a study of 111 patients with PD who received long-term hospitalization-based psychodynamic treatment to investigate whether (a) treatment was related to changes in SCP, dependency, and symptomatic distress; (b) these changes could be explained by pretreatment levels of SCP, dependency, and/or symptomatic distress; and (c) changes in these personality dimensions over time were associated with symptomatic improvement. SCP, dependency, and symptomatic distress were assessed at admission (baseline), at 12 and 24 weeks into treatment, and at discharge. Parallel process multilevel growth modeling showed that (a) treatment was associated with a significant decrease in levels of SCP, dependency, and symptomatic distress, whereas (b) pretreatment levels of each of these three factors did not predict the decreases observed, and (c) changes in SCP, but not dependency, were associated with the rate of decrease in symptomatic distress over time. Implications of these findings for our understanding of therapeutic change in the treatment of PD are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  17. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    Science.gov (United States)

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
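
    The abstract does not spell out how peaks are detected "through a parallel compact operation", so the following is only a sequential Python sketch of the generic stream-compaction pattern (flag, prefix sum, scatter) that GPU code commonly uses for this kind of task. The function name, the threshold parameter and the local-maximum rule are assumptions, and this is not the NPE toolbox API.

```python
from itertools import accumulate


def detect_peaks_compact(signal, threshold):
    """Return indices of local maxima above `threshold`, using the
    flag / prefix-sum / scatter structure of a parallel compaction."""
    n = len(signal)

    # Step 1 (map): every sample is tested independently of the others,
    # so on a GPU each test could run in its own thread.
    flags = [
        1
        if 0 < i < n - 1
        and signal[i] > threshold
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
        else 0
        for i in range(n)
    ]

    # Step 2 (scan): an exclusive prefix sum assigns each flagged sample
    # a unique, densely packed output slot.
    inclusive = list(accumulate(flags))
    slots = [total - flag for total, flag in zip(inclusive, flags)]

    # Step 3 (scatter): flagged samples write their index to their slot;
    # no two samples share a slot, so the writes do not conflict.
    peaks = [None] * (inclusive[-1] if inclusive else 0)
    for i in range(n):
        if flags[i]:
            peaks[slots[i]] = i
    return peaks
```

    The point of the pattern is that all three steps have well-known parallel implementations, so the variable-length list of peaks can be produced without a sequential loop over the recording.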

  18. Parallel processing at the SSC: The fact and the fiction

    International Nuclear Information System (INIS)

    Bourianoff, G.; Cole, B.

    1991-10-01

    Accurately modelling the behavior of particles circulating in accelerators is a computationally demanding task. The particle tracking code currently in use at SSC is based upon a "thin element" analysis (TEAPOT). In this model each magnet in the lattice is described by a thin element at which the particle experiences an impulsive kick. Each kick requires approximately 200 floating point operations ("FLOP"). For the SSC collider lattice consisting of 10^4 elements, performing a tracking study for a set of 100 particles for 10^7 turns would require 2 x 10^15 FLOP. Even on a machine capable of 100 MFLOP/sec (MFLOPS), this would require 2 x 10^7 seconds, and many such runs are necessary. It should be noted that the accuracy with which the kicks are to be calculated is important: the large number of iterations involved will magnify the effects of small errors. The inability of current computational resources to effectively perform the full calculation motivates the migration of this calculation to the most powerful computers available. A survey of the current research into new technologies for supercomputing reveals that the supercomputers of the future will be parallel in nature. Further, numerous such machines exist today, and are being used to solve other difficult problems. Thus it seems clear that it is not too early to begin developing the capability to develop tracking codes for parallel architectures. This report discusses implementing parallel processing on the SSC
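
    The cost estimate in this abstract can be reproduced with a few lines of Python. The input figures are the ones quoted in the abstract; the variable names and the conversion to days are added here for illustration.

```python
# Figures quoted in the abstract.
FLOP_PER_KICK = 200            # floating point operations per thin-element kick
ELEMENTS = 10**4               # thin elements in the collider lattice
PARTICLES = 100                # particles tracked in one study
TURNS = 10**7                  # turns around the ring
MACHINE_FLOP_PER_SEC = 100 * 10**6   # a 100 MFLOPS machine

# One kick per particle, per element, per turn.
total_flop = FLOP_PER_KICK * ELEMENTS * PARTICLES * TURNS
seconds = total_flop / MACHINE_FLOP_PER_SEC
days = seconds / 86_400

print(f"total work : {total_flop:.1e} FLOP")
print(f"run time   : {seconds:.1e} s (about {days:.0f} days)")
```

    The product is 2 x 10^15 operations and 2 x 10^7 seconds, roughly 231 days for a single run, which is the motivation the abstract gives for moving the tracking code to parallel machines.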

  19. Parallel-Processing Test Bed For Simulation Software

    Science.gov (United States)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  20. A Topological Model for Parallel Algorithm Design

    Science.gov (United States)

    1991-09-01

    effort should be directed to planning, requirements analysis, specification and design, with 20% invested into the actual coding, and then the final 40...be one more language to learn. And by investing the effort into improving the utility of an existing language instead of creating a new one, this...193) it abandons the notion of a process as a fundamental concept of parallel program design and that it facilitates program derivation by rigorously

  1. Parallel processing of two-dimensional Sn transport calculations

    International Nuclear Information System (INIS)

    Uematsu, M.

    1997-01-01

    A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation
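
    The two speedups reported in this abstract can be turned into parallel efficiencies, using the standard definition of efficiency as speedup divided by the number of processing elements. The short Python sketch below does this; the function name is hypothetical and the efficiencies are derived here, not stated in the abstract.

```python
def parallel_efficiency(speedup, processing_elements):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processing_elements


# Speedups quoted in the abstract, both measured on 16 processing elements.
angular = parallel_efficiency(9.0, 16)    # angular domain decomposition
spatial = parallel_efficiency(5.98, 16)   # space decomposition vs. original DOT3.5

print(f"angular decomposition: {angular:.1%}")
print(f"space decomposition  : {spatial:.1%}")
```

    On these figures the angular case reaches about 56% of ideal speedup and the space-decomposition case about 37%. The two baselines differ (one processing element versus the original DOT3.5 run), so the two numbers are not directly comparable.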

  2. Electromagnetic Physics Models for Parallel Computing Architectures

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)

  3. Electromagnetic Physics Models for Parallel Computing Architectures

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  4. Longitudinal associations between sleep and anxiety during pregnancy, and the moderating effect of resilience, using parallel process latent growth curve models.

    Science.gov (United States)

    van der Zwan, Judith Esi; de Vente, Wieke; Tolvanen, Mimmi; Karlsson, Hasse; Buil, J Marieke; Koot, Hans M; Paavonen, E Juulia; Polo-Kantola, Päivi; Huizink, Anja C; Karlsson, Linnea

    2017-12-01

    For many women, pregnancy-related sleep disturbances and pregnancy-related anxiety change as pregnancy progresses and both are associated with lower maternal quality of life and less favorable birth outcomes. Thus, the interplay between these two problems across pregnancy is of interest. In addition, psychological resilience may explain individual differences in this association, as it may promote coping with both sleep disturbances and anxiety, and thereby reduce their mutual effects. Therefore, the aim of the current study was to examine whether sleep quality and sleep duration, and changes in sleep are associated with the level of and changes in anxiety during pregnancy. Furthermore, the study tested the moderating effect of resilience on these associations. At gestational weeks 14, 24, and 34, 532 pregnant women from the FinnBrain Birth Cohort Study in Finland filled out questionnaires on general sleep quality, sleep duration and pregnancy-related anxiety; resilience was assessed in week 14. Parallel process latent growth curve models showed that shorter initial sleep duration predicted a higher initial level of anxiety, and a higher initial anxiety level predicted a faster shortening of sleep duration. Changes in sleep duration and changes in anxiety over the course of pregnancy were not related. The predicted moderating effect of resilience was not found. The results suggested that pregnant women reporting anxiety problems should also be screened for sleeping problems, and vice versa, because women who experienced one of these pregnancy-related problems were also at risk of experiencing or developing the other problem. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Parallelization of a hydrological model using the message passing interface

    Science.gov (United States)

    Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

    2013-01-01

    With increasing knowledge of natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex, with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further hinder rapid modeling and analysis. Using the widely applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programming technology, the Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a workstation. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time becomes lower with an increasing number of processes (from two to five), the gain diminishes because of the accompanying increase in demand for message passing between the master and all slave processes. Our case study demonstrates that the P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of project size). Overall, the P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
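
    The relation between the reported runtime reduction and the reported speedup can be checked with a short calculation. The sketch below assumes the usual definition, speedup = serial time / parallel time; the function names are illustrative and do not come from the paper.

```python
def speedup_from_reduction(reduction):
    """Convert a fractional runtime reduction r (0 <= r < 1) into a speedup.

    Assumes T_parallel = (1 - r) * T_serial and speedup = T_serial / T_parallel.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def reduction_from_speedup(speedup):
    """Inverse relation: fractional runtime reduction for a given speedup."""
    if speedup < 1.0:
        raise ValueError("speedup must be at least 1")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Speedups quoted in the abstract, converted back to runtime reductions.
    for s in (1.74, 3.36):
        print(f"speedup {s:.2f} -> reduction {reduction_from_speedup(s):.1%}")
```

    Under this definition the quoted speedups of 1.74 and 3.36 correspond to reductions of roughly 42.5% and 70.2%, in line with the 42%–70% range given in the abstract.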

  6. Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

    DEFF Research Database (Denmark)

    Madsen, Kasper Grud Skat; Zhou, Yongluan; Cao, Jianneng

    2017-01-01

    Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost respectively. In previous work, these issues are typically tackled...... solution called ALBIC, which support general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches....

  7. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Barnett, D.A.

    1999-01-01

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead

  8. Multitasking TORT under UNICOS: Parallel performance models and measurements

    International Nuclear Information System (INIS)

    Barnett, A.; Azmy, Y.Y.

    1999-01-01

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead

  9. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)

  10. Parallel and Distributed Data Processing Using Autonomous ...

    African Journals Online (AJOL)

    Given the distributed nature of these networks, data is processed by remote login or Remote Procedure Calls (RPC), which congests network bandwidth. This paper proposes a framework in which software agents are assigned to process the distributed data concurrently and assemble the ...

  11. A new parallelization algorithm of ocean model with explicit scheme

    Science.gov (United States)

    Fu, X. D.

    2017-08-01

    This paper focuses on the parallelization of ocean models with an explicit scheme, one of the most commonly used schemes for discretizing the governing equations of an ocean model. The explicit scheme is simple to compute: the value at a given grid point depends only on grid-point values at the previous time step, so no sparse linear system has to be solved when integrating the governing equations. Exploiting these characteristics, the paper designs a parallel algorithm named halo cells update, which requires only minor modification of the original ocean model and little change to its space step and time step, and which parallelizes the model by adding a transmission module between sub-domains. GRGO (Global Reduced Gravity Ocean model) is taken as an example and parallelized with the halo update. The results demonstrate that higher speedup can be achieved at different problem sizes.
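
    The idea of a halo update for an explicit scheme can be sketched on a toy problem. The example below is not the GRGO code: it assumes a 1-D diffusion-like update with fixed boundary values, and the function names and the `alpha` coefficient are illustrative. Each sub-domain receives one halo cell from each neighbour and can then be updated independently.

```python
def explicit_step(cells, alpha):
    """Update the interior of a padded list; the first and last entries
    act as halo (or boundary) cells and are not returned."""
    return [cells[i] + alpha * (cells[i + 1] - 2.0 * cells[i] + cells[i - 1])
            for i in range(1, len(cells) - 1)]


def step_serial(u, alpha):
    """One explicit time step on the whole domain, boundaries held fixed."""
    return [u[0]] + explicit_step(u, alpha) + [u[-1]]


def step_with_halo(u, alpha, n_sub):
    """Same time step, computed sub-domain by sub-domain after a halo exchange."""
    if len(u) % n_sub != 0:
        raise ValueError("this sketch assumes equally sized sub-domains")
    size = len(u) // n_sub
    pieces = [u[k * size:(k + 1) * size] for k in range(n_sub)]
    new = []
    for k, piece in enumerate(pieces):
        # Halo exchange: copy the neighbouring edge cells (dummy copies at the ends).
        left = pieces[k - 1][-1] if k > 0 else piece[0]
        right = pieces[k + 1][0] if k < n_sub - 1 else piece[-1]
        # With its halo in place, each sub-domain update is independent of the others.
        new.extend(explicit_step([left] + piece + [right], alpha))
    new[0], new[-1] = u[0], u[-1]  # restore the fixed physical boundaries
    return new


if __name__ == "__main__":
    u = [float(i % 5) for i in range(12)]
    print(step_serial(u, 0.1) == step_with_halo(u, 0.1, 3))
```

    In a real distributed run the loop over sub-domains would be replaced by one process per sub-domain, and the two halo copies by messages between neighbouring processes.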

  12. Advanced optical signal processing of broadband parallel data signals

    DEFF Research Database (Denmark)

    Oxenløwe, Leif Katsuo; Hu, Hao; Kjøller, Niels-Kristian

    2016-01-01

    Optical signal processing may aid in reducing the number of active components in communication systems with many parallel channels, by e.g. using telescopic time lens arrangements to perform format conversion and allow for WDM regeneration.

  13. Parallelization of the Coupled Earthquake Model

    Science.gov (United States)

    Block, Gary; Li, P. Peggy; Song, Yuhe T.

    2007-01-01

    This Web-based tsunami simulation system allows users to remotely run a model on JPL's supercomputers for a given undersea earthquake. At the time of this reporting, tsunami prediction over the Internet had not been done before. This new code directly couples the earthquake model and the ocean model on parallel computers and improves simulation speed. Seismometers can only detect information from earthquakes; they cannot detect whether or not a tsunami may occur as a result of the earthquake. When earthquake-tsunami models are coupled with the improved computational speed of modern, high-performance computers and constrained by remotely sensed data, they are able to provide early warnings for those coastal regions at risk. The software is capable of testing NASA's satellite observations of tsunamis. It has been successfully tested for several historical tsunamis, has passed all alpha and beta testing, and is well documented for users.

  14. Parallel finite elements with domain decomposition and its pre-processing

    International Nuclear Information System (INIS)

    Yoshida, A.; Yagawa, G.; Hamada, S.

    1993-01-01

    This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of the models to be simulated tends to be extremely large. At the same time, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered a promising technique. To achieve high efficiency in such parallel computation environments, large task granularity and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed the domain decomposition FEM with an automatic and dynamic task-allocation mechanism and an automatic mesh generation/domain subdivision system for it. (author)

  15. Parallel Distributed Processing Theory in the Age of Deep Networks.

    Science.gov (United States)

    Bowers, Jeffrey S

    2017-12-01

    Parallel distributed processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely that all knowledge is coded in a distributed format and cognition is mediated by non-symbolic computations. These claims have long been debated in cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks learn units that respond selectively to meaningful categories, and researchers are finding that deep networks need to be supplemented with symbolic systems to perform some tasks. Given the close links between PDP and deep networks, it is surprising that research with deep networks is challenging PDP theory. Copyright © 2017. Published by Elsevier Ltd.

  16. Evolution of a minimal parallel programming model

    International Nuclear Information System (INIS)

    Lusk, Ewing; Butler, Ralph; Pieper, Steven C.

    2017-01-01

    Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
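
    Self-scheduled task parallelism can be illustrated with a shared work pool from which workers take the next available task. The sketch below is a generic thread-based illustration and does not use or reproduce the ADLB interface; `run_self_scheduled`, `process` and `split` are hypothetical names.

```python
import queue
import threading


def run_self_scheduled(initial_tasks, process, n_workers=4):
    """Workers repeatedly take the next task from a shared pool.

    process(task) must return (result, new_tasks); new tasks go back into
    the pool, so the amount of work need not be known in advance.
    This sketch assumes process() does not raise.
    """
    pool = queue.Queue()
    for task in initial_tasks:
        pool.put(task)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            task = pool.get()
            if task is None:              # sentinel: no more work
                pool.task_done()
                return
            result, new_tasks = process(task)
            for new_task in new_tasks:    # enqueue children before finishing the parent
                pool.put(new_task)
            with lock:
                results.append(result)
            pool.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    pool.join()                           # returns once every task, spawned or not, is done
    for _ in threads:
        pool.put(None)
    for thread in threads:
        thread.join()
    return results


def split(n):
    """Toy irregular task: split n into halves until single units remain."""
    if n > 1:
        return 0, [n // 2, n - n // 2]
    return 1, []


if __name__ == "__main__":
    print(sum(run_self_scheduled([10], split)))
```

    No worker is assigned tasks ahead of time, so load balancing follows from the order in which workers become idle.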

  17. Climate models on massively parallel computers

    International Nuclear Information System (INIS)

    Vitart, F.; Rouvillois, P.

    1993-01-01

    Initial results obtained on massively parallel computers (Multiple Instruction Multiple Data and Single Instruction Multiple Data) make it possible to consider building coupled models with high resolution. This would allow simulation of the thermohaline circulation and other interaction phenomena between atmosphere and ocean. Increasing computer power, and the resulting improvement in resolution, will lead us to revise our approximations. The hydrostatic approximation (in ocean circulation) will no longer be valid when the grid mesh is smaller than a few kilometers, so other models will have to be found. The expertise in numerical analysis acquired at the Center of Limeil-Valenton (CEL-V) will be used again to design global models that take into account atmosphere, ocean, ice floe and biosphere, allowing climate simulation down to a regional scale.

  18. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
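
    The wait-and-awaken behaviour described above can be sketched with a condition variable. This is a generic illustration, not the PAMI interface: the `Context` class and its `post` and `advance` methods are hypothetical names chosen for the example.

```python
import threading
from collections import deque


class Context:
    """Toy communication context holding a queue of pending events."""

    def __init__(self):
        self._events = deque()
        self._cond = threading.Condition()
        self.handled = []

    def post(self, event):
        """Record a new event and wake a thread waiting in advance()."""
        with self._cond:
            self._events.append(event)
            self._cond.notify()

    def advance(self):
        """Process one event; sleep instead of polling when none is pending."""
        with self._cond:
            while not self._events:       # nothing actionable: enter the wait state
                self._cond.wait()
            event = self._events.popleft()
        self.handled.append(event)        # "process" the event outside the lock
        return event


if __name__ == "__main__":
    ctx = Context()
    worker = threading.Thread(target=ctx.advance)
    worker.start()                        # blocks in advance() until an event arrives
    ctx.post("message-arrived")
    worker.join()
    print(ctx.handled)
```

    The waiting thread consumes no processor time while blocked, which is the point of placing it in a wait state rather than having it poll for events.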

  19. Parallel processing for pitch splitting decomposition

    Science.gov (United States)

    Barnes, Levi; Li, Yong; Wadkins, David; Biederman, Steve; Miloslavsky, Alex; Cork, Chris

    2009-10-01

    Decomposition of an input pattern in preparation for a double patterning process is an inherently global problem in which the influence of a local decomposition decision can be felt across an entire pattern. In spite of this, a large portion of the work can be massively distributed. Here, we discuss the advantages of geometric distribution for polygon operations with limited range of influence. Further, we have found that even the naturally global "coloring" step can, in large part, be handled in a geometrically local manner. In some practical cases, up to 70% of the work can be distributed geometrically. We also describe the methods for partitioning the problem into local pieces and present scaling data up to 100 CPUs. These techniques reduce DPT decomposition runtime by orders of magnitude.

  20. Parallel processing of neutron transport in fuel assembly calculation

    International Nuclear Information System (INIS)

    Song, Jae Seung

    1992-02-01

    Group constants, which are used for reactor analyses by the nodal method, are generated by fuel assembly calculations based on neutron transport theory, since one or a quarter of the fuel assembly corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used in a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes to the serial program were enough to parallelize the depletion calculation, which has inherent parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not simple because of the many coupling parameters in the calculation and data communications among CPUs. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on a Quadputer with 4 CPUs operating in the NURAD Lab. at KAIST. Efficiencies of 54.3 % and 78.0 % were obtained in the fuel assembly calculation and depletion calculation, respectively, which lead to an overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on a parallel computer with small-size CPUs.
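
    The quoted efficiencies can be turned into per-phase speedups with the standard relation speedup = efficiency × number of processors. The sketch below applies it to the figures in the abstract; the `fraction_first` argument (the share of serial runtime spent in the first phase) is not given in the abstract, so the value used in the example is purely hypothetical.

```python
def speedup_from_efficiency(efficiency, n_processors):
    """Standard relation: parallel efficiency = speedup / number of processors."""
    return efficiency * n_processors


def combined_speedup(fraction_first, speedup_first, speedup_second):
    """Overall speedup of two consecutive phases.

    fraction_first is the share of the serial runtime spent in the first phase.
    """
    return 1.0 / (fraction_first / speedup_first
                  + (1.0 - fraction_first) / speedup_second)


if __name__ == "__main__":
    assembly = speedup_from_efficiency(0.543, 4)    # fuel assembly calculation
    depletion = speedup_from_efficiency(0.780, 4)   # depletion calculation
    print(f"assembly {assembly:.2f}, depletion {depletion:.2f}")
    # Hypothetical 50/50 split of serial runtime between the two phases.
    print(f"combined {combined_speedup(0.5, assembly, depletion):.2f}")
```

    The per-phase speedups come out at about 2.17 and 3.12, so any weighting of the two phases gives an overall value between them, consistent with the reported overall speedup of about 2.5.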

  1. Parallel and distributed processing: applications to power systems

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Felix; Murphy, Liam [California Univ., Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

    1994-12-31

    Applications of parallel and distributed processing to power systems problems are still in the early stages. Rapid progress in computing and communications promises a revolutionary increase in the capacity of distributed processing systems. In this paper, the state-of-the art in distributed processing technology and applications is reviewed and future trends are discussed. (author) 14 refs.,1 tab.

  2. The Acoustic and Peceptual Effects of Series and Parallel Processing

    Directory of Open Access Journals (Sweden)

    Melinda C. Anderson

    2009-01-01

    Temporal envelope (TE) cues provide a great deal of speech information. This paper explores how spectral subtraction and dynamic-range compression gain modifications affect TE fluctuations for parallel and series configurations. In parallel processing, algorithms compute gains based on the same input signal, and the gains in dB are summed. In series processing, output from the first algorithm forms the input to the second algorithm. Acoustic measurements show that the parallel arrangement produces more gain fluctuations, introducing more changes to the TE than the series configurations. Intelligibility tests for normal-hearing (NH) and hearing-impaired (HI) listeners show (1) parallel processing gives significantly poorer speech understanding than an unprocessed (UNP) signal and the series arrangement and (2) series processing and UNP yield similar results. Speech quality tests show that UNP is preferred to both parallel and series arrangements, although spectral subtraction is the most preferred. No significant differences exist in sound quality between the series and parallel arrangements, or between the NH group and the HI group. These results indicate that gain modifications affect intelligibility and sound quality differently. Listeners appear to have a higher tolerance for gain modifications with regard to intelligibility, while judgments for sound quality appear to be more affected by smaller amounts of gain modification.
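
    The difference between the two configurations can be shown with toy gain rules acting on a single signal level in dB. The two gain functions below are simplified stand-ins, not the algorithms used in the study, and all thresholds, ratios and the choice of noise suppression as the first stage in the series chain are assumptions made for illustration.

```python
def compression_gain_db(level_db, threshold_db=50.0, ratio=2.0):
    """Toy compressor: above threshold, reduce the excess level by (1 - 1/ratio)."""
    excess = max(0.0, level_db - threshold_db)
    return -excess * (1.0 - 1.0 / ratio)


def noise_suppression_gain_db(level_db, noise_floor_db=40.0, max_atten_db=10.0):
    """Toy stand-in for spectral subtraction: attenuate levels near an assumed
    noise floor, fading to no attenuation 20 dB above it."""
    if level_db <= noise_floor_db:
        return -max_atten_db
    return -max_atten_db * max(0.0, 1.0 - (level_db - noise_floor_db) / 20.0)


def parallel_output_db(level_db):
    """Both gains are computed from the same input and summed in dB."""
    return (level_db
            + noise_suppression_gain_db(level_db)
            + compression_gain_db(level_db))


def series_output_db(level_db):
    """The second algorithm computes its gain from the output of the first."""
    after_first = level_db + noise_suppression_gain_db(level_db)
    return after_first + compression_gain_db(after_first)


if __name__ == "__main__":
    for level in (45.0, 55.0, 65.0):
        print(level, parallel_output_db(level), series_output_db(level))
```

    With these toy rules the parallel arrangement applies more total gain change at a 55 dB input than the series arrangement, because in series the compressor sees an already attenuated signal.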

  3. Efficient multitasking: parallel versus serial processing of multiple tasks.

    Science.gov (United States)

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.

  4. Fast robot kinematics modeling by using a parallel simulator (PSIM)

    International Nuclear Information System (INIS)

    El-Gazzar, H.M.; Ayad, N.M.A.

    2002-01-01

    High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for mathematically complex models such as the kinematics modeling of a robot manipulator. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (Stanford Arm). Execution time comparisons between Von Neumann (uniprocessor) and parallel processor architectures using the parallel simulator package (PSIM) are presented. The results favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the optimum number of processors.

  5. Fast robot kinematics modeling by using a parallel simulator (PSIM)

    Energy Technology Data Exchange (ETDEWEB)

    El-Gazzar, H M; Ayad, N M.A. [Atomic Energy Authority, Reactor Dept., Computer and Control Lab., P.O. Box no 13759 (Egypt)

    2002-09-15

    High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for mathematically complex models such as the kinematics modeling of a robot manipulator. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (Stanford Arm). Execution time comparisons between Von Neumann (uniprocessor) and parallel processor architectures using the parallel simulator package (PSIM) are presented. The results favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the optimum number of processors.

  6. A Novel Least Significant Bit First Processing Parallel CRC Circuit

    Directory of Open Access Journals (Sweden)

    Xiujie Qu

    2013-01-01

    In the HDLC serial communication protocol, CRC calculation can process either the most or the least significant bit of the data first. Nowadays most CRC calculation is based on most significant bit (MSB) first processing. An algorithm for least significant bit (LSB) first processing parallel CRC is proposed in this paper. Based on the general expression of the LSB-first serial CRC, and using the state equation method of linear systems, we derive a recursive formula by mathematical deduction. The recursive formula is applicable to any number of bits processed in parallel and any generator polynomial. According to the formula, we present the parallel circuit of CRC calculation and implement it with VHDL on FPGA. The results verify the accuracy and effectiveness of this method.
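
    The relation between serial and parallel LSB-first CRC can be illustrated in software. The sketch below is not the paper's recursive formula or its VHDL circuit: it swaps in the familiar table-driven method, which advances the CRC register by eight bits per step, and compares it with a bit-serial reference. The parameters (reflected polynomial 0x8408 for x^16 + x^12 + x^5 + 1, initial value and final XOR of 0xFFFF) are assumptions chosen to match the 16-bit frame check commonly associated with HDLC-style protocols.

```python
def crc16_lsb_serial(data, poly=0x8408, init=0xFFFF, xorout=0xFFFF):
    """Bit-serial reference: one shift of the CRC register per input bit."""
    crc = init
    for byte in data:
        for i in range(8):                    # bits enter least significant first
            feedback = (crc ^ (byte >> i)) & 1
            crc >>= 1
            if feedback:
                crc ^= poly
    return crc ^ xorout


def make_table(poly=0x8408):
    """Precompute the effect of eight serial shifts for every byte value."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc16_lsb_parallel8(data, init=0xFFFF, xorout=0xFFFF):
    """Eight bits per step: the register update is a single table lookup."""
    crc = init
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc ^ xorout


if __name__ == "__main__":
    message = b"123456789"
    print(hex(crc16_lsb_serial(message)), hex(crc16_lsb_parallel8(message)))
```

    Both functions compute the same linear update of the CRC register; the table plays the role that the derived combinational logic plays in a hardware implementation.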

  7. Distributed parallel computing in stochastic modeling of groundwater systems.

    Science.gov (United States)

    Dong, Yanhui; Li, Guomin; Xu, Haizhen

    2013-03-01

    Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.
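
    Batch processing of independent stochastic realizations maps naturally onto a pool of workers. The sketch below uses Python's standard thread pool rather than the Java Parallel Processing Framework named in the abstract, and `run_realization` is a hypothetical stand-in for one groundwater model run.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_realization(seed):
    """Hypothetical stand-in for one stochastic model run.

    Each realization has its own seeded generator, so results do not depend
    on which worker runs it or in what order.
    """
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) for _ in range(1000)) / 1000


def run_batch(seeds, n_workers=4):
    """Distribute independent realizations over a pool of workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_realization, seeds))  # results keep seed order


if __name__ == "__main__":
    results = run_batch(range(20))
    print(len(results))
```

    Threads are used here only to keep the example self-contained; for CPU-bound model runs a process pool or a cluster framework would be the closer analogue. Under the usual definition of speedup, a runtime reduced to 3% of the serial time corresponds to a speedup of about 33 on the 50 threads reported.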

  8. Utilization of Mental Health Services and Mental Health Status Among Children Placed in Out-of-Home Care: A Parallel Process Latent Growth Modeling Approach.

    Science.gov (United States)

    Yampolskaya, Svetlana; Sharrock, Patty J; Clark, Colleen; Hanson, Ardis

    2017-10-01

    This longitudinal study examined the parallel trajectories of mental health service use and mental health status among children placed in Florida out-of-home care. The results of growth curve modeling suggested that children with greater mental health problems initially received more mental health services. Initial child mental health status, however, had no effect on subsequent service provision when all outpatient mental health services were included. When specific types of mental health services, such as basic outpatient, targeted case management, and intensive mental health services were examined, results suggested that children with compromised functioning during the baseline period received more intensive mental health services over time. However, this increased provision of intensive mental health services did not improve mental health status, rather it was significantly associated with progressively worse mental health functioning. These findings underscore the need for regular comprehensive mental health assessments focusing on specific needs of the child.

  9. Method of parallel processing in SANPO real time system

    International Nuclear Information System (INIS)

    Ostrovnoj, A.I.; Salamatin, I.M.

    1981-01-01

    A method of parallel processing in the SANPO real-time system is described. Algorithms of data accumulation and preliminary processing in this system are described as parallel processes using a specialized high-level programming language. A hierarchy of elementary processes is also described. It provides the synchronization of concurrent processes without semaphores. The developed means are applied to systems of experiment automation using SM-3 minicomputers [ru]

  10. Parallel and distributed processing in power system simulation and control

    Energy Technology Data Exchange (ETDEWEB)

    Falcao, Djalma M [Universidade Federal, Rio de Janeiro, RJ (Brazil). Coordenacao dos Programas de Pos-graduacao de Engenharia

    1994-12-31

    Recent advances in computer technology will certainly have a great impact on the methodologies used in power system expansion and operational planning as well as in real-time control. Parallel and distributed processing are among the new technologies that present great potential for application in these areas. Parallel computers use multiple functional or processing units to speed up computation, while distributed processing computer systems are collections of computers joined together by high-speed communication networks, with many objectives and advantages. The paper presents some ideas for the use of parallel and distributed processing in power system simulation and control. It also comments on some of the current research work on these topics and presents a summary of the work presently being developed at COPPE. (author) 53 refs., 2 figs.

  11. Spatially parallel processing of within-dimension conjunctions.

    Science.gov (United States)

    Linnell, K J; Humphreys, G W

    2001-01-01

    Within-dimension conjunction search for red-green targets amongst red-blue, and blue-green, nontargets is extremely inefficient (Wolfe et al, 1990 Journal of Experimental Psychology: Human Perception and Performance 16 879-892). We tested whether pairs of red-green conjunction targets can nevertheless be processed spatially in parallel. Participants made speeded detection responses whenever a red-green target was present. Across trials where a second identical target was present, the distribution of detection times was compatible with the assumption that targets were processed in parallel (Miller, 1982 Cognitive Psychology 14 247-279). We show that this was not an artifact of response-competition or feature-based processing. We suggest that within-dimension conjunctions can be processed spatially in parallel. Visual search for such items may be inefficient owing to within-dimension grouping between items.

  12. Decomposition based parallel processing technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2000-01-01

    In practical design studies, most designers solve multidisciplinary problems with complex design structure. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total cost and time. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems that raises design efficiency by using a genetic algorithm, and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology.

  13. Parallel transaction processing in functional languages, towards practical functional databases

    NARCIS (Netherlands)

    Wevers, L.; Huisman, Marieke; de Keijzer, Ander

    2013-01-01

    This paper shows how functional languages can be adapted for transaction processing, and discusses the implementation of a parallel runtime system for such functional transaction processing languages. We extend functional languages with current state variables and result state variables to allow the

  14. Parallel programming practical aspects, models and current limitations

    CERN Document Server

    Tarkov, Mikhail S

    2014-01-01

    Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: 1. Processing large data arrays (including processing images and signals in real time); 2. Simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. Particles-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...

  15. Test generation for digital circuits using parallel processing

    Science.gov (United States)

    Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.

    1990-12-01

    The problem of test generation for digital logic circuits is an NP-Hard problem. Recently, the availability of low cost, high performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.
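
    The merge step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the report's algorithm: the signal names (n1, n2) and value labels are hypothetical, and the 15-valued encoding is not given in the abstract. The sketch only assumes that each worker reports, per signal line, the set of logic values still consistent with its partial result, so that merging reduces to set intersection.

```python
from functools import reduce

def merge_results(partials):
    """Intersect the per-signal candidate-value sets reported by each worker."""
    signals = set().union(*(p.keys() for p in partials))
    merged = {}
    for sig in signals:
        # A worker that reports nothing for a signal imposes no constraint on it.
        sets = [p[sig] for p in partials if sig in p]
        merged[sig] = reduce(set.intersection, sets)
    return merged

# Hypothetical partial results from two workers.
worker_a = {"n1": {"0", "1", "D"}, "n2": {"1", "D"}}
worker_b = {"n1": {"1", "D"}, "n2": {"D", "0"}}
merged = merge_results([worker_a, worker_b])
```

    An empty intersection for any signal would indicate that the partial results conflict.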

  16. Optimisation of a parallel ocean general circulation model

    OpenAIRE

    M. I. Beare; D. P. Stevens

    1997-01-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...

  17. Mechatronic Model Based Computed Torque Control of a Parallel Manipulator

    Directory of Open Access Journals (Sweden)

    Zhiyong Yang

    2008-11-01

    With their high speed and accuracy, parallel manipulators have wide application in industry, but many difficulties remain in the actual control process because of time-varying and coupling effects. Unfortunately, present-day commercial controllers cannot provide satisfactory performance because they offer only single-axis linear control. Therefore, for a novel 2-DOF (Degree of Freedom) parallel manipulator called Diamond 600, a control scheme based on a motor-mechanism coupling dynamic model and employing the computed torque control algorithm is presented in this paper. First, the integrated dynamic coupling model is derived from the equivalent torques between the mechanical structure and the PM (Permanent Magnetism) servomotor. Second, the computed torque controller is described in detail for the proposed model. Finally, a series of numerical simulations and experiments is carried out to test the effectiveness of the system, and the results verify its favourable tracking ability and robustness.

  18. Mechatronic Model Based Computed Torque Control of a Parallel Manipulator

    Directory of Open Access Journals (Sweden)

    Zhiyong Yang

    2008-03-01

    With their high speed and accuracy, parallel manipulators have wide application in industry, but many difficulties remain in the actual control process because of time-varying and coupling effects. Unfortunately, present-day commercial controllers cannot provide satisfactory performance because they offer only single-axis linear control. Therefore, for a novel 2-DOF (Degree of Freedom) parallel manipulator called Diamond 600, a control scheme based on a motor-mechanism coupling dynamic model and employing the computed torque control algorithm is presented in this paper. First, the integrated dynamic coupling model is derived from the equivalent torques between the mechanical structure and the PM (Permanent Magnetism) servomotor. Second, the computed torque controller is described in detail for the proposed model. Finally, a series of numerical simulations and experiments is carried out to test the effectiveness of the system, and the results verify its favourable tracking ability and robustness.

  19. Adaptive Dynamic Process Scheduling on Distributed Memory Parallel Computers

    Directory of Open Access Journals (Sweden)

    Wei Shu

    1994-01-01

    One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly important for computations with unpredictable dynamic behaviors or irregular structures. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. The adaptive contracting within neighborhood (ACWN) is a dynamic, distributed, load-dependent, and scalable scheme. It deals with dynamic and unpredictable creation of processes and adapts to different systems. The scheme is described and contrasted with two other schemes that have been proposed in this context, namely the randomized allocation and the gradient model. The performance of the three schemes on an Intel iPSC/2 hypercube is presented and analyzed. The experimental results show that even though the ACWN algorithm incurs somewhat larger overhead than the randomized allocation, it achieves better performance in most cases due to its adaptiveness. Its feature of quickly spreading the work helps it outperform the gradient model in performance and scalability.

  20. High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

    Science.gov (United States)

    von Davier, Matthias

    2016-01-01

    This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…

  1. Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing

    International Nuclear Information System (INIS)

    Park, Min Jae; Lee, Jae Sung; Kim, Soo Mee; Kang, Ji Yeon; Lee, Dong Soo; Park, Kwang Suk

    2009-01-01

    Conventional image reconstruction uses simplified physical models of projection. However, reconstruction with realistic physics, for example in 3D, takes too long to process all the data in the clinic and cannot be run on a common reconstruction machine because complex physical models require a large amount of memory. We suggest a realistic distributed-memory model of fast reconstruction using parallel processing on personal computers to enable large-scale technologies. Preliminary feasibility tests on virtual machines and various performance tests on the commercial supercomputer Tachyon were performed. The expectation maximization algorithm was tested with common 2D projection and with realistic 3D lines of response. Since the processing time became slower (by up to 6 times) after a certain iteration, compiler optimization was performed to maximize the efficiency of parallelization. Parallel processing of a program on multiple computers was available on Linux with MPICH and NFS. We verified that the differences between the parallel-processed image and the single-processed image at the same iterations were below the significant digits of a floating-point number, about 6 bit. Two processors showed good parallel computing efficiency (1.96 times). The delay phenomenon was solved by vectorization using SSE. Through this study, a realistic parallel computing system for the clinic was established that has enough memory to reconstruct with realistic physical models that could not be simplified.

  2. Leveraging Parallel Data Processing Frameworks with Verified Lifting

    Directory of Open Access Journals (Sweden)

    Maaz Bin Safeer Ahmad

    2016-11-01

    Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting, tedious and error-prone, also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and also scale better to larger datasets.
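
    To make the kind of rewrite discussed in this abstract concrete, the following Python sketch places a sequential accumulation loop next to an equivalent map/shuffle/reduce formulation. It is an illustration only: it is neither Casper's output nor Hadoop code, and the function names and sample records are hypothetical.

```python
from collections import defaultdict
from functools import reduce

def sequential_sum_by_key(records):
    """Sequential form: one loop that updates shared state."""
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    return totals

def mapreduce_sum_by_key(records):
    """Same computation expressed as map, shuffle, and reduce phases."""
    # Map: emit one (key, value) pair per record; pairs are independent.
    mapped = map(lambda rec: (rec[0], rec[1]), records)
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce: combine each group with an associative operator.
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}

# Hypothetical input records.
records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
```

    Because the map outputs are independent and the reducer is associative, a framework is free to distribute both phases across machines, which is what makes this form attractive as a compilation target.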

  3. Shared Variable Oriented Parallel Precompiler for SPMD Model

    Institute of Scientific and Technical Information of China (English)

    1995-01-01

    For the moment, commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compilers, which are just traditional sequential FORTRAN or C compilers extended with communication statements. Programmers suffer from writing parallel programs with communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model and greatly eases parallel programming while achieving high communication efficiency. The core function of the parallel C precompiler has been successfully verified on a transputer-based parallel computer. Its prominent performance shows that SVOPP is probably a breakthrough in parallel programming technique.

  4. Mobile Devices and GPU Parallelism in Ionospheric Data Processing

    Science.gov (United States)

    Mascharka, D.; Pankratius, V.

    2015-12-01

    Scientific data acquisition in the field is often constrained by data transfer backchannels to analysis environments. Geoscientists are therefore facing practical bottlenecks with increasing sensor density and variety. Mobile devices, such as smartphones and tablets, offer promising solutions to key problems in scientific data acquisition, pre-processing, and validation by providing advanced capabilities in the field. This is due to affordable network connectivity options and the increasing mobile computational power. This contribution exemplifies a scenario faced by scientists in the field and presents the "Mahali TEC Processing App" developed in the context of the NSF-funded Mahali project. Aimed at atmospheric science and the study of ionospheric Total Electron Content (TEC), this app is able to gather data from various dual-frequency GPS receivers. It demonstrates parsing of full-day RINEX files on mobile devices and on-the-fly computation of vertical TEC values based on satellite ephemeris models that are obtained from NASA. Our experiments show how parallel computing on the mobile device GPU enables fast processing and visualization of up to 2 million datapoints in real-time using OpenGL. GPS receiver bias is estimated through minimum TEC approximations that can be interactively adjusted by scientists in the graphical user interface. Scientists can also perform approximate computations for "quickviews" to reduce CPU processing time and memory consumption. In the final stage of our mobile processing pipeline, scientists can upload data to the cloud for further processing. Acknowledgements: The Mahali project (http://mahali.mit.edu) is funded by the NSF INSPIRE grant no. AGS-1343967 (PI: V. Pankratius). We would like to acknowledge our collaborators at Boston College, Virginia Tech, Johns Hopkins University, Colorado State University, as well as the support of UNAVCO for loans of dual-frequency GPS receivers for use in this project, and Intel for loans of

  5. A Parallel Computational Model for Multichannel Phase Unwrapping Problem

    Science.gov (United States)

    Imperatore, Pasquale; Pepe, Antonio; Lanari, Riccardo

    2015-05-01

    In this paper, a parallel model for the solution of the computationally intensive multichannel phase unwrapping (MCh-PhU) problem is proposed. Firstly, the Extended Minimum Cost Flow (EMCF) algorithm for solving the MCh-PhU problem is revised within the rigorous mathematical framework of the discrete calculus; this makes it possible to capture its topological structure in terms of meaningful discrete differential operators. Secondly, emphasis is placed on those methodological and practical aspects which lead to a parallel reformulation of the EMCF algorithm. Thus, a novel dual-level parallel computational model, in which the parallelism is hierarchically implemented at two different (i.e., process and thread) levels, is presented. The validity of our approach has been demonstrated through a series of experiments that have revealed a significant speedup. Therefore, the attained high-performance prototype is suitable for the solution of large-scale phase unwrapping problems in reasonable time frames, with a significant impact on the systematic exploitation of the existing, and rapidly growing, large archives of SAR data.

  6. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils.

    Science.gov (United States)

    Devos, Olivier; Downey, Gerard; Duponchel, Ludovic

    2014-04-01

    Classification is an important task in chemometrics. For several years now, support vector machines (SVMs) have proven to be powerful for infrared spectral data classification. However such methods require optimisation of parameters in order to control the risk of overfitting and the complexity of the boundary. Furthermore, it is established that the prediction ability of classification models can be improved using pre-processing in order to remove unwanted variance in the spectra. In this paper we propose a new methodology based on genetic algorithm (GA) for the simultaneous optimisation of SVM parameters and pre-processing (GENOPT-SVM). The method has been tested for the discrimination of the geographical origin of Italian olive oil (Ligurian and non-Ligurian) on the basis of near infrared (NIR) or mid infrared (FTIR) spectra. Different classification models (PLS-DA, SVM with mean centre data, GENOPT-SVM) have been tested and statistically compared using McNemar's statistical test. For the two datasets, SVM with optimised pre-processing give models with higher accuracy than the one obtained with PLS-DA on pre-processed data. In the case of the NIR dataset, most of this accuracy improvement (86.3% compared with 82.8% for PLS-DA) occurred using only a single pre-processing step. For the FTIR dataset, three optimised pre-processing steps are required to obtain SVM model with significant accuracy improvement (82.2%) compared to the one obtained with PLS-DA (78.6%). Furthermore, this study demonstrates that even SVM models have to be developed on the basis of well-corrected spectral data in order to obtain higher classification rates. Copyright © 2013 Elsevier Ltd. All rights reserved.

  7. Using Motivational Interviewing Techniques to Address Parallel Process in Supervision

    Science.gov (United States)

    Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

    2013-01-01

    Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…

  8. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.

  9. Parallel algorithms for interactive manipulation of digital terrain models

    Science.gov (United States)

    Davis, E. W.; Mcallister, D. F.; Nagaraj, V.

    1988-01-01

    Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid connected architectures provide a natural mapping for grid based terrain models. Presented here are algorithms for data movement on the massive parallel processor (MPP) in support of pan and zoom functions over large data grids. It is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for arithmetic aspects of graphics functions. Performance figures are given for routines written in MPP Pascal.

  10. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf

    2010-01-01

    The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.
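
    Sparse matrix-vector multiplication, the kernel named in this abstract, can be sketched in plain Python as follows. The storage format is an assumption: the abstract does not say which layout the paper uses, so the common compressed sparse row (CSR) layout is shown, and the function name and sample matrix are hypothetical.

```python
def csr_matvec(indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    indptr[r]..indptr[r + 1] delimits the nonzeros of row r,
    indices holds their column numbers and data their values.
    """
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Rows are independent, so each could be handled by its own thread.
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y

# Hypothetical 3x3 tridiagonal example: [[4, 1, 0], [1, 4, 1], [0, 1, 4]].
indptr = [0, 2, 5, 7]
indices = [0, 1, 0, 1, 2, 1, 2]
data = [4.0, 1.0, 1.0, 4.0, 1.0, 1.0, 4.0]
y = csr_matvec(indptr, indices, data, [1.0, 2.0, 3.0])
```

    The outer loop carries no dependency between rows, which is the property a many-core implementation exploits.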

  11. Connectionism, parallel constraint satisfaction processes, and gestalt principles: (re) introducing cognitive dynamics to social psychology.

    Science.gov (United States)

    Read, S J; Vanman, E J; Miller, L C

    1997-01-01

    We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.

  12. Construction of a digital elevation model: methods and parallelization

    International Nuclear Information System (INIS)

    Mazzoni, Christophe

    1995-01-01

    The aim of this work is to reduce the computation time needed to produce Digital Elevation Models (DEM) by using a parallel machine. It was carried out in collaboration between the French 'Institut Geographique National' (IGN) and the Laboratoire d'Electronique de Technologie et d'Instrumentation (LETI) of the French Atomic Energy Commission (CEA). The IGN has developed a system that provides the DEMs used to produce topographic maps. The kernel of this system is the correlator, software that automatically matches pairs of homologous points in a stereo pair of photographs. However, the correlator is expensive in computing time. In order to reduce computation time while producing DEMs with the same accuracy as the current system, we have parallelized the IGN's correlator on the OPENVISION system. This hardware solution uses the SIMD (Single Instruction Multiple Data) parallel machine SYMPATI-2, developed by the LETI, which is involved in parallel architectures and image processing. Our analysis of the implementation has demonstrated the difficulty of efficient coupling between scalar and parallel structures, so we propose solutions to reinforce this coupling. In order to accelerate the processing further, we evaluate SYMPHONIE, a SIMD calculator and the successor of SYMPATI-2. On the other hand, we developed a multi-agent approach for which a MIMD (Multiple Instruction, Multiple Data) architecture is available. Finally, we describe a Multi-SIMD architecture that reconciles our two approaches. This architecture can efficiently handle multi-level image processing. It is flexible thanks to its modularity, and its communication network provides the reliability required by sensitive systems. (author)

  13. HPC parallel programming model for gyrokinetic MHD simulation

    International Nuclear Information System (INIS)

    Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

    2011-01-01

    The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over a large number of logical cores, the total simulation domain in cylindrical geometry was broken up into $N_{DD-r} \times N_{DD-z}$ (number of radial decompositions times number of axial decompositions) small domains, each including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. $N_{RP}$ replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by auto-parallelization, and the $N_{DD-r} \times N_{DD-z} \times N_{RP}$ processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for a medium-size system with an $N_r \times N_\theta \times N_z = 1025 \times 128 \times 128$ mesh and 4.196 or 8.192 billion particles. The highest speed for a fixed number of logical cores was obtained for two threads, the maximum number of $N_{DD-z}$, and the optimum combination of $N_{DD-r}$ and $N_{RP}$. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)
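
    The process count described in this abstract is a simple product, which the following Python sketch spells out. The decomposition values are hypothetical and were chosen only so that the product fills 8,192 logical cores with two threads per process; the abstract does not report the optimum combination. The sketch also assumes one thread per logical core.

```python
def process_layout(n_dd_r, n_dd_z, n_rp, threads_per_process):
    """Return (MPI process count, logical cores used) for a hybrid layout.

    n_dd_r, n_dd_z: radial and axial domain decomposition counts.
    n_rp: number of replicas of each decomposed domain.
    """
    processes = n_dd_r * n_dd_z * n_rp
    return processes, processes * threads_per_process

# Hypothetical values: 4 radial x 128 axial domains, 8 replicas, 2 threads.
processes, cores = process_layout(n_dd_r=4, n_dd_z=128, n_rp=8,
                                  threads_per_process=2)
```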

  14. Parallel processing approach to transform-based image coding

    Science.gov (United States)

    Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

    1991-06-01

    This paper describes a flexible parallel processing architecture designed for use in real time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, and results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification are described.

  15. Parallel processing based decomposition technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2001-01-01

    In practical design studies, most designers solve multidisciplinary problems with large and complex design systems. These multidisciplinary problems involve hundreds of analyses and thousands of variables. The sequence in which the processes are solved affects the speed of the total design cycle. Thus it is very important for the designer to reorder the original design processes to minimize total computational cost. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems that raises design efficiency by using a genetic algorithm, and shows the relationship between decomposition and the Multidisciplinary Design Optimization (MDO) methodology.

  16. Application of parallel processing for automatic inspection of printed circuits

    International Nuclear Information System (INIS)

    Lougheed, R.M.

    1986-01-01

    Automated visual inspection of printed electronic circuits is a challenging application for image processing systems. Detailed inspection requires high speed analysis of gray scale imagery along with high quality optics, lighting, and sensing equipment. A prototype system has been developed and demonstrated at the Environmental Research Institute of Michigan (ERIM) for inspection of multilayer thick-film circuits. The central problem of real-time image processing is solved by a special-purpose parallel processor which includes a new high-speed Cytocomputer. In this chapter the inspection process and the algorithms used are summarized, along with the functional requirements of the machine vision system. Next, the parallel processor is described in detail and then performance on this application is given

  17. War and peace: morphemes and full forms in a noninteractive activation parallel dual-route model.

    Science.gov (United States)

    Baayen, H; Schreuder, R

    This article introduces a computational tool for modeling the process of morphological segmentation in visual and auditory word recognition in the framework of a parallel dual-route model. Copyright 1999 Academic Press.

  18. Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

    Science.gov (United States)

    Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

    2015-01-01

    Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever-increasing amount of raw data being generated. Arrays with hundreds up to a few thousand electrodes are slowly seeing widespread use, and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings, there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance-critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.

  19. Digital intermediate frequency QAM modulator using parallel processing

    Science.gov (United States)

    Pao, Hsueh-Yuan [Livermore, CA]; Tran, Binh-Nien [San Ramon, CA]

    2008-05-27

    The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple and low cost method to implement a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing the pre-computed modulated cosine and sine carriers in ROM look-up-tables (LUTs). The high-speed input data stream is parallel processed using the corresponding LUTs, which reduces the main processing speed, allowing the use of low cost FPGAs.
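
    The look-up-table idea in this abstract can be sketched in Python as follows. This is a software illustration under stated assumptions, not the patented FPGA design: the table size, the two amplitude levels, and the function names are hypothetical, and the usual quadrature form s[k] = I·cos(2πk/n) − Q·sin(2πk/n) is assumed. The point shown is that once the amplitude-scaled carriers are stored, modulation needs only table reads and subtractions.

```python
import math

def build_luts(levels, n):
    """Precompute amplitude-scaled carrier samples for every symbol level."""
    cos_lut = {a: [a * math.cos(2 * math.pi * k / n) for k in range(n)]
               for a in levels}
    sin_lut = {a: [a * math.sin(2 * math.pi * k / n) for k in range(n)]
               for a in levels}
    return cos_lut, sin_lut

def modulate(symbols, cos_lut, sin_lut):
    """Emit IF samples using only table look-ups and subtraction."""
    out = []
    for i_level, q_level in symbols:
        i_wave = cos_lut[i_level]   # I * cos(...), read from the table
        q_wave = sin_lut[q_level]   # Q * sin(...), read from the table
        out.extend(c - s for c, s in zip(i_wave, q_wave))
    return out

# Hypothetical 4-QAM setup: levels -1 and +1, 8 carrier samples per symbol.
cos_lut, sin_lut = build_luts(levels=(-1, 1), n=8)
samples = modulate([(1, -1), (-1, 1)], cos_lut, sin_lut)
```

    Since each symbol's samples depend only on that symbol, several symbols can be looked up at the same time, which is consistent with the parallel processing the abstract describes.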

  20. A dataflow analysis tool for parallel processing of algorithms

    Science.gov (United States)

    Jones, Robert L., III

    1993-01-01

    A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on a set of identical parallel processors. Typical applications include signal processing and control law problems. Graph analysis techniques are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool is shown to facilitate the application of the design process to a given problem.
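
    One standard performance bound for a dataflow graph is the critical path length, the longest chain of dependent tasks. The Python sketch below computes it, together with a simple lower bound on the number of processors needed to finish within that time. It illustrates the general technique only: the abstract does not describe the tool's actual analyses, and the task names and durations are hypothetical. It uses `graphlib`, available in the standard library from Python 3.9.

```python
import math
from graphlib import TopologicalSorter

def critical_path_length(durations, deps):
    """Longest dependent chain; deps maps each node to its predecessors."""
    finish = {}
    for node in TopologicalSorter(deps).static_order():
        # A node starts once all of its predecessors have finished.
        start = max((finish[p] for p in deps.get(node, ())), default=0)
        finish[node] = start + durations[node]
    return max(finish.values())

def min_processors(durations, deps):
    """Lower bound on processors needed to meet the critical-path time."""
    total_work = sum(durations.values())
    return math.ceil(total_work / critical_path_length(durations, deps))

# Hypothetical graph: a feeds b and c, which both feed d.
durations = {"a": 2, "b": 3, "c": 1, "d": 4}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
cp = critical_path_length(durations, deps)
procs = min_processors(durations, deps)
```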

  1. Image processing with massively parallel computer Quadrics Q1

    International Nuclear Information System (INIS)

    Della Rocca, A.B.; La Porta, L.; Ferriani, S.

    1995-05-01

    This report describes a convolution algorithm implemented to evaluate the image processing capabilities of the massively parallel computer Quadrics Q1. First, the mathematical definition of discrete convolution is recalled, together with the main Q1 hardware and software features. Then the different coded forms of the algorithm are described, and the performance of the Q1 is compared with that obtained on other computers. Finally, the conclusions report the main results and suggestions.

  2. Multi states electromechanical switch for energy efficient parallel data processing

    KAUST Repository

    Kloub, Hussam

    2011-04-01

    We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.

  3. Multi states electromechanical switch for energy efficient parallel data processing

    KAUST Repository

    Kloub, Hussam; Smith, Casey; Hussain, Muhammad Mustafa

    2011-01-01

    We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.

  4. Morphological evidence for parallel processing of information in rat macula

    Science.gov (United States)

    Ross, M. D.

    1988-01-01

    Study of montages, tracings and reconstructions prepared from a series of 570 consecutive ultrathin sections shows that rat maculas are morphologically organized for parallel processing of linear acceleratory information. Type II cells of one terminal field distribute information to neighboring terminals as well. The findings are examined in light of physiological data which indicate that macular receptor fields have a preferred directional vector, and are interpreted by analogy to a computer technology known as an information network.

  5. Methods to model-check parallel systems software

    International Nuclear Information System (INIS)

    Matlin, O. S.; McCune, W.; Lusk, E.

    2003-01-01

    We report on an effort to develop methodologies for formal verification of parts of the Multi-Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of communicating processes. While the individual components of the collection execute simple algorithms, their interaction leads to unexpected errors that are difficult to uncover by conventional means. Two verification approaches are discussed here: the standard model checking approach using the software model checker SPIN, and the nonstandard use of a general-purpose first-order resolution-style theorem prover, OTTER, to conduct the traditional state space exploration. We compare modeling methodology and analyze performance and scalability of the two methods with respect to verification of MPD.

  6. The parallel processing of EGS4 code on distributed memory scalar parallel computer:Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of the electromagnetic cascade Monte Carlo simulation code EGS4 on a distributed-memory scalar parallel computer, the Intel Paragon XP/S15-256, is described. In EGS4, the calculation time varies widely from one incident particle to another because secondary particles are generated dynamically and each particle behaves differently. The granularity of parallel processing, the parallel programming model and the algorithm for parallel random number generation are discussed, and two methods, which allocate particles either dynamically or statically, are used to achieve high-speed parallel processing of the code. Of the four problems chosen for performance evaluation, three attained speedup factors of nearly 100 with 128 processors. When both the calculation time per incident particle and its dispersion are large, the dynamic particle allocation method is preferable because it evens out the load across processors. When they are small, the static particle allocation method is preferable because it reduces communication overhead. Moreover, it is pointed out that double-precision variables must be used in the EGS4 code to obtain accurate results. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed on the basis of the experience gained with EGS4. (author).
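
    The trade-off between static and dynamic particle allocation can be illustrated with a small simulation. This is a sketch under stated assumptions, not the scheme used in the report: the per-particle costs, the round-robin static assignment and the per-dispatch `overhead` parameter are all hypothetical.

```python
import heapq

def static_makespan(costs, workers):
    """Pre-assign particles round-robin; the slowest worker sets the run time."""
    loads = [0.0] * workers
    for index, cost in enumerate(costs):
        loads[index % workers] += cost
    return max(loads)

def dynamic_makespan(costs, workers, overhead=0.0):
    """Give each particle to whichever worker becomes idle first.

    `overhead` is a hypothetical communication cost paid on every dispatch.
    """
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for cost in costs:
        idle_at = heapq.heappop(finish_times)      # earliest idle worker
        heapq.heappush(finish_times, idle_at + cost + overhead)
    return max(finish_times)

if __name__ == "__main__":
    uneven = [10.0, 1.0] * 4        # large, widely dispersed costs
    uniform = [1.0] * 8             # small, identical costs
    print(static_makespan(uneven, 2), dynamic_makespan(uneven, 2))
    print(static_makespan(uniform, 2), dynamic_makespan(uniform, 2, overhead=0.5))
```

    With the uneven costs, dynamic allocation finishes sooner because no worker is left holding all the expensive particles. With uniform costs and a nonzero dispatch overhead, the static assignment wins, which matches the qualitative conclusion of the abstract.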

  7. Tolerating correlated failures in Massively Parallel Stream Processing Engines

    DEFF Research Database (Denmark)

    Su, L.; Zhou, Y.

    2016-01-01

    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...

  8. Psychodrama: A Creative Approach for Addressing Parallel Process in Group Supervision

    Science.gov (United States)

    Hinkle, Michelle Gimenez

    2008-01-01

    This article provides a model for using psychodrama to address issues of parallel process during group supervision. Information on how to utilize the specific concepts and techniques of psychodrama in relation to group supervision is discussed. A case vignette of the model is provided.

  9. One Factor or Two Parallel Processes? Comorbidity and Development of Adolescent Anxiety and Depressive Disorder Symptoms

    Science.gov (United States)

    Hale, William W., III; Raaijmakers, Quinten A. W.; Muris, Peter; van Hoof, Anne; Meeus, Wim H. J.

    2009-01-01

    Background: This study investigates whether anxiety and depressive disorder symptoms of adolescents from the general community are best described by a model that assumes they are indicative of one general factor or by a model that assumes they are two distinct disorders with parallel growth processes. Additional analyses were conducted to explore…

  10. Computer model of a reverberant and parallel circuit coupling

    Science.gov (United States)

    Kalil, Camila de Andrade; de Castro, Maria Clícia Stelling; Cortez, Célia Martins

    2017-11-01

    The objective of the present study was to deepen the knowledge about the functioning of neural circuits by implementing a signal transmission model, based on graph theory, in a small network of neurons composed of an interconnected reverberant circuit and parallel circuit, in order to investigate the processing of the signals in each of them and the effects on the output of the network. For this, a program was developed in C and simulations were run using neurophysiological data from the literature.

  11. Optimisation of a parallel ocean general circulation model

    Science.gov (United States)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  12. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  13. Graphics Processing Unit Enhanced Parallel Document Flocking Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; ST Charles, Jesse Lee [ORNL

    2010-01-01

    Analyzing and clustering documents is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of document clustering is its complexity, O(n^2). As the number of documents grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. In this paper, we have conducted research to exploit this architecture and apply its strengths to the flocking-based document clustering problem. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GeForce GPU. Performance gains ranged from thirty-six to nearly sixty times improvement of the GPU over the CPU implementation.

  14. Efficient Parallel Statistical Model Checking of Biochemical Networks

    Directory of Open Access Journals (Sweden)

    Paolo Ballarini

    2009-12-01

    We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds of a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples; however, there are three key aspects that improve the efficiency. First, the sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P to hold is based on an efficient variant of the Wilson method which ensures a faster convergence. Third, the whole methodology is designed in a parallel fashion, and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
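
    As background for the confidence interval step, the following sketch computes the standard Wilson score interval for a probability estimated from sampled runs. It is the textbook formula, not the efficient variant the authors describe, and `sample_run` is a hypothetical stand-in for "simulate one execution and check whether P holds".

```python
import math
import random

def wilson_interval(successes, n, z=1.96):
    """Standard Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("need at least one sample")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width

def estimate(sample_run, n, z=1.96):
    """Run n independent samples and return (point estimate, interval)."""
    hits = sum(1 for _ in range(n) if sample_run())
    return hits / n, wilson_interval(hits, n, z)

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical model: the property holds in about 30% of executions.
    print(estimate(lambda: rng.random() < 0.3, 1000))
```

    Since the samples are independent, the calls to `sample_run` could be distributed over many workers and only the hit counts combined, which is what makes this kind of estimation a natural fit for parallel execution.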

  15. Plagiarism Detection for Indonesian Language using Winnowing with Parallel Processing

    Science.gov (United States)

    Arifin, Y.; Isa, S. M.; Wulandhari, L. A.; Abdurachman, E.

    2018-03-01

    Plagiarism takes many forms: not only copy and paste, but also changing passive voice to active voice or paraphrasing without appropriate acknowledgment. It occurs in all languages, including Indonesian. Many previous studies have addressed plagiarism detection in Indonesian using different methods, but there is still room for improvement. This research proposes a solution that improves plagiarism detection so that it can detect not only the copy-and-paste form but also more advanced forms. The proposed solution uses Winnowing with some additional steps in the pre-processing stage: stemming for the Indonesian language, and fingerprint generation in parallel, which saves processing time and produces the plagiarism result for the suspected document.
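
    For readers unfamiliar with Winnowing, a minimal sketch of the basic fingerprinting step follows. It is not the authors' pipeline: Indonesian stemming is omitted, and the k-gram length, window size, MD5-based hash and thread pool are assumptions made for illustration. Every k-character substring is hashed, and from each window of w consecutive hashes the smallest one (the rightmost on ties) is kept as a fingerprint.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def kgram_hashes(text, k=5):
    """Hash every k-character substring of the normalized text."""
    s = "".join(ch.lower() for ch in text if ch.isalnum())
    return [int(hashlib.md5(s[i:i + k].encode("utf-8")).hexdigest()[:8], 16)
            for i in range(len(s) - k + 1)]

def winnow(hashes, w=4):
    """Keep the minimum hash of each window of w hashes (rightmost on ties)."""
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = w - 1 - window[::-1].index(smallest)   # rightmost minimum
        selected.add((smallest, start + offset))
    return selected

def fingerprint(text, k=5, w=4):
    """Set of selected hash values for one document (positions dropped)."""
    return {h for h, _ in winnow(kgram_hashes(text, k), w)}

def fingerprint_all(texts, k=5, w=4):
    """Fingerprint several documents concurrently.

    A thread pool keeps the sketch simple; for CPU-bound work in CPython a
    process pool would usually be the better choice.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: fingerprint(t, k, w), texts))

def jaccard(a, b):
    """Share of fingerprints the two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

if __name__ == "__main__":
    source, suspect = fingerprint_all(["Saya makan nasi goreng di rumah",
                                       "Saya makan nasi goreng di kantor"])
    print(jaccard(source, suspect))
```

    The documents are fingerprinted independently of one another, so this step parallelizes without any coordination between workers; only the final comparison needs both fingerprint sets.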

  16. Parallel factor analysis PARAFAC of process affected water

    Energy Technology Data Exchange (ETDEWEB)

    Ewanchuk, A.M.; Ulrich, A.C.; Sego, D. [Alberta Univ., Edmonton, AB (Canada). Dept. of Civil and Environmental Engineering; Alostaz, M. [Thurber Engineering Ltd., Calgary, AB (Canada)

    2010-07-01

    A parallel factor analysis (PARAFAC) of oil sands process-affected water was presented. Naphthenic acids (NA) are traditionally described as monobasic carboxylic acids. Research has indicated that oil sands NA do not fit classical definitions of NA. Oil sands organic acids have toxic and corrosive properties. When analyzed by fluorescence technology, oil sands process-affected water displays a characteristic peak at 290 nm excitation and approximately 346 nm emission. In this study, PARAFAC was used to decompose process-affected water multi-way data into components representing analytes, chemical compounds, and groups of compounds. Water samples from various oil sands operations were analyzed in order to obtain EEMs. The EEMs were then arranged into a large matrix in decreasing process-affected water content for PARAFAC. Data were divided into 5 components. A comparison with commercially prepared NA samples suggested that oil sands NA is fundamentally different. Further research is needed to determine what each of the 5 components represents. tabs., figs.

  17. Parallel Processing of Images in Mobile Devices using BOINC

    Science.gov (United States)

    Curiel, Mariela; Calle, David F.; Santamaría, Alfredo S.; Suarez, David F.; Flórez, Leonardo

    2018-04-01

    Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computer grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required modifying the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices as well as its merging required additional code in some BOINC components. This article presents answers to these four challenges.

  18. Parallel Processing of Images in Mobile Devices using BOINC

    Directory of Open Access Journals (Sweden)

    Curiel Mariela

    2018-04-01

    Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computer grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required modifying the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices as well as its merging required additional code in some BOINC components. This article presents answers to these four challenges.

  19. Performance Tuning and Evaluation of a Parallel Community Climate Model

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Worley, P.H.; Hammond, S.

    1999-11-13

    The Parallel Community Climate Model (PCCM) is a message-passing parallelization of version 2.1 of the Community Climate Model (CCM) developed by researchers at Argonne and Oak Ridge National Laboratories and at the National Center for Atmospheric Research in the early to mid 1990s. In preparation for use in the Department of Energy's Parallel Climate Model (PCM), PCCM has recently been updated with new physics routines from version 3.2 of the CCM, improvements to the parallel implementation, and ports to the SGI/Cray Research T3E and Origin 2000. We describe our experience in porting and tuning PCCM on these new platforms, evaluating the performance of different parallel algorithm options and comparing performance between the T3E and Origin 2000.

  20. Parallel processing of Monte Carlo code MCNP for particle transport problem

    Energy Technology Data Exchange (ETDEWEB)

    Higuchi, Kenji; Kawasaki, Takuji

    1996-06-01

    Monte Carlo (MC) codes for photon and neutron transport problems can be vectorized or parallelized by exploiting the independence of the calculation for each particle. The applicability of existing MC codes to parallel processing is discussed. Both a vector-parallel processor and a scalar-parallel processor were used in the performance evaluation. We carried out (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on the Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on two types of parallel or distributed memory computers. In addition, we discuss the evaluation of the parallel programming environments for the parallel computers used in the present work, as a part of the work developing STA (Seamless Thinking Aid) Basic Software. (author)

  1. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. The CPU is good for sequential processing, while the GPU is good at accelerating software with heavily parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently, two key technologies have been highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layer neural networks. With CPUs alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with the P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, is introduced. An overview of key GPU-demanding applications like the ones described above is also given.

  2. Z-buffer image assembly processing in high parallel visualization processing

    International Nuclear Information System (INIS)

    Kaneko, Isamu; Muramatsu, Kazuhiro

    2000-03-01

    On parallel computers with many processors, the domain decomposition method is a popular means of parallel processing. Now that simulations are becoming much larger and more time-consuming, visualization performed simultaneously with the actual computation is increasingly needed, and for real-time visualization in particular the domain decomposition technique is indispensable. In parallel rendering, the rendered results must be gathered on one processor in the last stage to compose the integrated picture. This integration is usually performed using Z-buffer values. This process, however, causes serious problems of much slower processing and local memory shortage when more than several tens of processors are used. In this report, two new solutions are proposed. One is the adoption of a special operator (Reduce operator) in the parallelization process, and the other is buffer compression by deleting the background information. This report includes performance results for these new techniques, obtained on the parallel computer Paragon, to investigate their effect. (author)
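
    The two ideas named in this abstract can be sketched in a few lines. This is an illustration under assumptions, not the report's implementation: the sparse dictionary layout, the infinite-depth background marker and all function names are hypothetical. Each processor keeps only its non-background pixels, and partial images are combined with a pairwise merge that keeps the nearer fragment at every pixel, so the whole composition can be expressed as a reduction.

```python
from functools import reduce

BACKGROUND = float("inf")    # assumed depth value marking "nothing rendered here"

def compress(depths, colors):
    """Drop background pixels; keep a sparse map pixel index -> (depth, color)."""
    return {i: (d, c)
            for i, (d, c) in enumerate(zip(depths, colors))
            if d != BACKGROUND}

def merge(a, b):
    """Pairwise Z-buffer merge: at each pixel keep the fragment nearest the viewer."""
    out = dict(a)
    for pixel, fragment in b.items():
        if pixel not in out or fragment[0] < out[pixel][0]:
            out[pixel] = fragment
    return out

def composite(partials):
    """Combine all partial images with a reduction over `merge`."""
    return reduce(merge, partials, {})

if __name__ == "__main__":
    inf = BACKGROUND
    p0 = compress([1.0, inf, 5.0, inf], ["red", None, "red", None])
    p1 = compress([2.0, 3.0, 4.0, inf], ["green", "green", "green", None])
    print(composite([p0, p1]))
```

    Apart from the handling of exact depth ties, `merge` gives the same result regardless of the order in which partial images are combined, so it can be applied pairwise in a tree across processors instead of gathering every full buffer on a single node.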

  3. Parallel Distributed Processing at 25: Further Explorations in the Microstructure of Cognition

    Science.gov (United States)

    Rogers, Timothy T.; McClelland, James L.

    2014-01-01

    This paper introduces a special issue of "Cognitive Science" initiated on the 25th anniversary of the publication of "Parallel Distributed Processing" (PDP), a two-volume work that introduced the use of neural network models as vehicles for understanding cognition. The collection surveys the core commitments of the PDP…

  4. Parallel community climate model: Description and user's guide

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  5. Parallel processing is good for your scientific codes...But massively parallel processing is so much better

    International Nuclear Information System (INIS)

    Thomas, B.; Domain, Ch.; Souffez, Y.; Eon-Duval, P.

    1998-01-01

    Harnessing the power of many computers to solve difficult scientific problems concurrently is one of the most innovative trends in High Performance Computing. At EDF, we have invested in parallel computing and have achieved significant results. First we improved the processing speed of strategic codes, in order to extend their scope. Then we turned to numerical simulations at the atomic scale. These computations, which we had never dreamt of before, provided us with a better understanding of metallurgical phenomena. More precisely, we were able to trace defects in alloys that are used in nuclear power plants. (author)

  6. Climate Ocean Modeling on Parallel Computers

    Science.gov (United States)

    Wang, P.; Cheng, B. N.; Chao, Y.

    1998-01-01

    Ocean modeling plays an important role in both understanding the current climatic conditions and predicting future climate change. However, modeling the ocean circulation at various spatial and temporal scales is a very challenging computational task.

  7. Models of parallel computation: a survey and classification

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yunquan; CHEN Guoliang; SUN Guangzhong; MIAO Qiankun

    2007-01-01

    In this paper, the state-of-the-art parallel computational model research is reviewed. We will introduce various models that were developed during the past decades. According to their targeting architecture features, especially memory organization, we classify these parallel computational models into three generations. These models and their characteristics are discussed based on the three-generation classification. We believe that with the ever-increasing speed gap between the CPU and memory systems, incorporating non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated. Describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the model analysis complexity, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.

  8. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    Energy Technology Data Exchange (ETDEWEB)

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU's application-specific architecture, harnessing the GPU's computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  9. Parallel Task Processing on a Multicore Platform in a PC-based Control System for Parallel Kinematics

    Directory of Open Access Journals (Sweden)

    Harald Michalik

    2009-02-01

    Multicore platforms have one physical processor chip with multiple cores interconnected via a chip-level bus. Because they deliver greater computing power through concurrency and offer greater system density, multicore platforms are well qualified to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU load. Heavy-load control tasks are generated by new control approaches that include features like singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The scheduling is specially designed for processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.

  10. Parallel-Batch Scheduling with Two Models of Deterioration to Minimize the Makespan

    Directory of Open Access Journals (Sweden)

    Cuixia Miao

    2014-01-01

    We consider bounded parallel-batch scheduling with two models of deterioration, in which the processing time of job j is p_j = a_j + αt in the first model and p_j = a + α_j t in the second. The objective is to minimize the makespan. We present O(n log n) time algorithms for the single-machine problems. We also propose fully polynomial time approximation schemes to solve the identical-parallel-machine problem and the uniform-parallel-machine problem, respectively.

  11. A parallelized three-dimensional cellular automaton model for grain growth during additive manufacturing

    Science.gov (United States)

    Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.

    2018-01-01

    In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
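
    The efficiency figures quoted above can be turned into speedups with the standard definitions (speedup S = T_1 / T_p, efficiency E = S / p). The sketch below is only arithmetic on the numbers in the abstract; the function names are illustrative and nothing here is taken from the authors' code.

```python
def efficiency(t_serial, t_parallel, processors):
    """Parallel efficiency E = (T_1 / T_p) / p."""
    return (t_serial / t_parallel) / processors


def speedup_from_efficiency(eff, processors):
    """Speedup implied by a given efficiency, S = E * p."""
    return eff * processors


# Figures from the abstract: more than 80% on 8 processors, about 50% on 64.
s8 = speedup_from_efficiency(0.80, 8)    # lower bound on the 8-processor speedup
s64 = speedup_from_efficiency(0.50, 64)  # approximate 64-processor speedup
```

    Read this way, the reported drop from 80% to 50% efficiency still corresponds to a larger absolute speedup on 64 processors than on 8.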

  12. A parallelized three-dimensional cellular automaton model for grain growth during additive manufacturing

    Science.gov (United States)

    Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.

    2018-05-01

    In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.

  13. Fraud Detection in Credit Card Transactions; Using Parallel Processing of Anomalies in Big Data

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Taghva

    2016-10-01

    Full Text Available In parallel with the increasing use of electronic cards, especially in the banking industry, the volume of transactions using these cards has grown rapidly. Moreover, the financial nature of these cards has made them an attractive target for fraud. The present study applied the Kohonen neural network model, with a MapReduce approach and parallel processing, to detect abnormalities in bank card transactions. For this purpose, it was first proposed to classify all transactions into fraudulent and legal, which showed better performance compared with other methods. In the next step, we transformed the Kohonen model into a parallel task, which demonstrated appropriate performance in terms of time and is therefore expected to work well on transactions under Big Data assumptions.

  14. Parallel asynchronous hardware implementation of image processing algorithms

    Science.gov (United States)

    Coon, Darryl D.; Perera, A. G. U.

    1990-01-01

    Research is being carried out on hardware for a new approach to focal plane processing. The hardware involves silicon injection mode devices. These devices provide a natural basis for parallel asynchronous focal plane image preprocessing. The simplicity and novel properties of the devices would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture built from arrays of the devices would form a two-dimensional (2-D) array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuron-like asynchronous pulse-coded form through the laminar processor. No multiplexing, digitization, or serial processing would occur in the preprocessing state. High performance is expected, based on pulse coding of input currents down to one picoampere with noise referred to input of about 10 femtoamperes. Linear pulse coding has been observed for input currents ranging up to seven orders of magnitude. Low power requirements suggest utility in space and in conjunction with very large arrays. Very low dark current and multispectral capability are possible because of hardware compatibility with the cryogenic environment of high performance detector arrays. The aforementioned hardware development effort is aimed at systems which would integrate image acquisition and image processing.

  15. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    1997-10-01

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  16. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  17. Modeling and Control of Primary Parallel Isolated Boost Converter

    DEFF Research Database (Denmark)

    Mira Albert, Maria del Carmen; Hernandez Botella, Juan Carlos; Sen, Gökhan

    2012-01-01

    In this paper state space modeling and closed loop controlled operation have been presented for primary parallel isolated boost converter (PPIBC) topology as a battery charging unit. Parasitic resistances have been included to have an accurate dynamic model. The accuracy of the model has been...

  18. Petascale Hierarchical Modeling VIA Parallel Execution

    Energy Technology Data Exchange (ETDEWEB)

    Gelman, Andrew [Principal Investigator]

    2014-04-14

    The research allows more effective model building. By allowing researchers to fit complex models to large datasets in a scalable manner, our algorithms and software enable more effective scientific research. In the new area of “big data,” it is often necessary to fit “big models” to adjust for systematic differences between sample and population. For this task, scalable and efficient model-fitting tools are needed, and these have been achieved with our new Hamiltonian Monte Carlo algorithm, the no-U-turn sampler, and our new C++ program, Stan. In layman’s terms, our research enables researchers to create improved mathematical models for large and complex systems.

  19. Work stressors, sleep quality, and alcohol-related problems across deployment: A parallel process latent growth modeling approach among Navy members.

    Science.gov (United States)

    Bravo, Adrian J; Kelley, Michelle L; Hollis, Brittany F

    2017-10-01

    This study examined how work stressors were associated with sleep quality and alcohol-related problems among U.S. Navy members over the course of deployment. Participants were 101 U.S. Navy members assigned to an Arleigh Burke-class destroyer who experienced an 8-month deployment after Operation Enduring Freedom/Operation Iraqi Freedom. Approximately 6 weeks prior to deployment, 6 weeks after deployment, and 6 months reintegration, participants completed measures that assessed work stressors, sleep quality, and alcohol-related problems. A piecewise latent growth model was conducted in which the structural paths assessed if work stressors influenced sleep quality or its growth over time, and in turn if sleep quality influenced alcohol-related problems intercepts or growth over time. A significant indirect effect was found such that increases in work stressors from pre- to postdeployment predicted decreases in sleep quality, which in turn were associated with increases in alcohol-related problems from pre- to postdeployment. These effects were maintained from postdeployment through the 6-month reintegration. Findings suggest that work stressors may have important implications for sleep quality and alcohol-related problems. Positive methods of addressing stress and techniques to improve sleep quality are needed as both may be associated with alcohol-related problems among current Navy members. Copyright © 2016 John Wiley & Sons, Ltd.

  20. Sequential and Parallel Attack Tree Modelling

    NARCIS (Netherlands)

    Arnold, Florian; Guck, Dennis; Kumar, Rajesh; Stoelinga, Mariëlle Ida Antoinette; Koornneef, Floor; van Gulijk, Coen

    The intricacy of socio-technical systems requires a careful planning and utilisation of security resources to ensure uninterrupted, secure and reliable services. Even though many studies have been conducted to understand and model the behaviour of a potential attacker, the detection of crucial

  1. Category specific spatial dissociations of parallel processes underlying visual naming.

    Science.gov (United States)

    Conner, Christopher R; Chen, Gang; Pieters, Thomas A; Tandon, Nitin

    2014-10-01

    The constituent elements and dynamics of the networks responsible for word production are a central issue to understanding human language. Of particular interest is their dependency on lexical category, particularly the possible segregation of nouns and verbs into separate processing streams. We applied a novel mixed-effects, multilevel analysis to electrocorticographic data collected from 19 patients (1942 electrodes) to examine the activity of broadly disseminated cortical networks during the retrieval of distinct lexical categories. This approach was designed to overcome the issues of sparse sampling and individual variability inherent to invasive electrophysiology. Both noun and verb generation evoked overlapping, yet distinct nonhierarchical processes favoring ventral and dorsal visual streams, respectively. Notable differences in activity patterns were noted in Broca's area and superior lateral temporo-occipital regions (verb > noun) and in parahippocampal and fusiform cortices (noun > verb). Comparisons with functional magnetic resonance imaging (fMRI) results yielded a strong correlation of blood oxygen level-dependent signal and gamma power and an independent estimate of group size needed for fMRI studies of cognition. Our findings imply parallel, lexical category-specific processes and reconcile discrepancies between lesional and functional imaging studies. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Parallel Hyperspectral Image Processing on Distributed Multi-Cluster Systems

    NARCIS (Netherlands)

    Liu, F.; Seinstra, F.J.; Plaza, A.J.

    2011-01-01

    Computationally efficient processing of hyperspectral image cubes can be greatly beneficial in many application domains, including environmental modeling, risk/hazard prevention and response, and defense/security. As individual cluster computers often cannot satisfy the computational demands of

  3. Regional-scale calculation of the LS factor using parallel processing

    Science.gov (United States)

    Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

    2015-05-01

    With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on the message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method for maintaining the integrity of the results, an optimized workflow for reducing the time taken to export unnecessary intermediate data, and a buffer-communication-computation strategy for improving communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
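
    One common way to decompose a raster for the local algorithms mentioned above (for example slope, which only needs a 3x3 neighbourhood) is to give each process a strip of rows plus a buffer row on each side. The sketch below computes such strip bounds. It is an assumption about what a buffered decomposition could look like, not the decomposition used in the paper; the function name and the one-row buffer are hypothetical, and global algorithms such as flow accumulation would need more than this.

```python
def strip_bounds(n_rows, n_parts, halo=1):
    """Split n_rows into n_parts contiguous row strips with buffer rows.

    Returns a list of (core_start, core_stop, read_start, read_stop).
    Each process writes results only for rows [core_start, core_stop),
    but reads rows [read_start, read_stop) so that a neighbourhood
    operator sees the same data it would see in a serial run.
    """
    base, extra = divmod(n_rows, n_parts)
    bounds = []
    start = 0
    for k in range(n_parts):
        # The first `extra` strips get one additional row each.
        stop = start + base + (1 if k < extra else 0)
        read_start = max(0, start - halo)
        read_stop = min(n_rows, stop + halo)
        bounds.append((start, stop, read_start, read_stop))
        start = stop
    return bounds


# Ten raster rows shared among three processes, one buffer row per side.
layout = strip_bounds(10, 3)
```

    The core ranges tile the raster without overlap, so results can be written back without conflicts, while the buffer rows are the only data that neighbouring processes would have to exchange.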

  4. Parallelization Experience with Four Canonical Econometric Models Using ParMitISEM

    Directory of Open Access Journals (Sweden)

    Nalan Baştürk

    2016-03-01

    Full Text Available This paper presents the parallel computing implementation of the MitISEM algorithm, labeled Parallel MitISEM. The basic MitISEM algorithm provides an automatic and flexible method to approximate a non-elliptical target density using adaptive mixtures of Student-t densities, where only a kernel of the target density is required. The approximation can be used as a candidate density in Importance Sampling or Metropolis Hastings methods for Bayesian inference on model parameters and probabilities. We present and discuss four canonical econometric models using a Graphics Processing Unit and a multi-core Central Processing Unit version of the MitISEM algorithm. The results show that the parallelization of the MitISEM algorithm on Graphics Processing Units and multi-core Central Processing Units is straightforward and fast to program using MATLAB. Moreover the speed performance of the Graphics Processing Unit version is much higher than the Central Processing Unit one.

  5. Performance of Air Pollution Models on Massively Parallel Computers

    DEFF Research Database (Denmark)

    Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

    1996-01-01

    To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...

  6. Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

    Science.gov (United States)

    Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

    2016-10-01

    Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
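
    The idea of embarrassingly parallel simulation can be sketched as follows. The tutorial itself uses MATLAB and R; this sketch swaps in Python, and the load/capacity model, its parameters, and the function names are hypothetical. Each replicate gets its own seeded random generator, so replicates do not depend on each other and can be evaluated by any mapping function, serial or parallel.

```python
import random


def simulate_once(seed):
    """One independent replicate of a toy risk model (hypothetical numbers).

    Returns True when the sampled load exceeds the sampled capacity.
    """
    rng = random.Random(seed)          # private generator per replicate
    load = rng.gauss(100.0, 15.0)
    capacity = rng.gauss(130.0, 10.0)
    return load > capacity


def failure_probability(seeds, mapper=map):
    """Estimate the failure probability over independent replicates.

    `mapper` is any function with the signature of the built-in `map`.
    Because replicates share no state, the estimate does not depend on
    the order or the place in which they are evaluated.
    """
    outcomes = list(mapper(simulate_once, seeds))
    return sum(outcomes) / len(outcomes)


serial_estimate = failure_probability(range(2000))
```

    To run the replicates in worker processes, the `map` method of a `concurrent.futures.ProcessPoolExecutor` can be passed as `mapper`, inside an `if __name__ == "__main__":` guard; the estimate should be identical to the serial one because the seeds are the same.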

  7. Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

    Science.gov (United States)

    Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

    2016-04-01

    Advances in graphics processing units' technology towards encompassing parallel architectures [1], comprised of thousands of cores and multiples of parallel threads, provide the foundation in terms of hardware for the rapid processing of various parallel applications regarding seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are being established and decades of data are being compiled together [2]. Yet, many processes regarding seismic data analysis are performed on each seismic event independently or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, being independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel comparatively, are: fuzzy k-means clustering with expert knowledge [7] in assigning overall clusters' number; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufmann Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and

  8. Parallel processing using an optical delay-based reservoir computer

    Science.gov (United States)

    Van der Sande, Guy; Nguimdo, Romain Modeste; Verschaffelt, Guy

    2016-04-01

    Delay systems subject to delayed optical feedback have recently shown great potential in solving computationally hard tasks. By implementing a neuro-inspired computational scheme relying on the transient response to optical data injection, high processing speeds have been demonstrated. However, reservoir computing systems based on delay dynamics discussed in the literature are designed by coupling many different stand-alone components, which leads to bulky, non-monolithic systems that lack long-term stability. Here we numerically investigate the possibility of implementing reservoir computing schemes based on semiconductor ring lasers (SRLs). Semiconductor ring lasers are semiconductor lasers where the laser cavity consists of a ring-shaped waveguide. SRLs are highly integrable and scalable, making them ideal candidates for key components in photonic integrated circuits. SRLs can generate light in two counterpropagating directions between which bistability has been demonstrated. We demonstrate that two independent machine learning tasks, even with inputs of a different nature and different input data signals, can be simultaneously computed using a single photonic nonlinear node relying on the parallelism offered by photonics. We illustrate the performance on simultaneous chaotic time series prediction and a classification of the Nonlinear Channel Equalization. We take advantage of different directional modes to process individual tasks. Each directional mode processes one individual task to mitigate possible crosstalk between the tasks. Our results indicate that prediction/classification with errors comparable to the state-of-the-art performance can be obtained even with noise despite the two tasks being computed simultaneously. We also find that a good performance is obtained for both tasks for a broad range of the parameters. The results are discussed in detail in [Nguimdo et al., IEEE Trans. Neural Netw. Learn. Syst. 26, pp. 3301-3307, 2015

  9. Parallel coupling of symmetric and asymmetric exclusion processes

    International Nuclear Information System (INIS)

    Tsekouras, K; Kolomeisky, A B

    2008-01-01

    A system consisting of two parallel coupled channels where particles in one of them follow the rules of totally asymmetric exclusion processes (TASEP) and in another one move as in symmetric simple exclusion processes (SSEP) is investigated theoretically. Particles interact with each other via hard-core exclusion potential, and in the asymmetric channel they can only hop in one direction, while on the symmetric lattice particles jump in both directions with equal probabilities. Inter-channel transitions are also allowed at every site of both lattices. Stationary state properties of the system are solved exactly in the limit of strong couplings between the channels. It is shown that strong symmetric couplings between totally asymmetric and symmetric channels lead to an effective partially asymmetric simple exclusion process (PASEP) and properties of both channels become almost identical. However, strong asymmetric couplings between symmetric and asymmetric channels yield an effective TASEP with nonzero particle flux in the asymmetric channel and zero flux on the symmetric lattice. For intermediate strength of couplings between the lattices a vertical-cluster mean-field method is developed. This approximate approach treats exactly particle dynamics during the vertical transitions between the channels and it neglects the correlations along the channels. Our calculations show that in all cases there are three stationary phases defined by particle dynamics at entrances, at exits or in the bulk of the system, while phase boundaries depend on the strength and symmetry of couplings between the channels. Extensive Monte Carlo computer simulations strongly support our theoretical predictions. Theoretical calculations and computer simulations predict that inter-channel couplings have a strong effect on stationary properties. It is also argued that our results might be relevant for understanding multi-particle dynamics of motor proteins

  10. Parallel Algorithm of Geometrical Hashing Based on NumPy Package and Processes Pool

    Directory of Open Access Journals (Sweden)

    Klyachin Vladimir Aleksandrovich

    2015-10-01

    Full Text Available The article considers the problem of multi-dimensional geometric hashing. The paper describes a mathematical model of geometric hashing and considers an example of its use in point localization problems. A method of constructing the corresponding hash matrix with a parallel algorithm is considered. An algorithm of parallel geometric hashing using the «process pool» design pattern is proposed. The algorithm is implemented in the Python programming language with the NumPy package for manipulating multidimensional data. To implement the process pool, it is proposed to use the class ProcessPoolExecutor from the module concurrent.futures, which has been included in the Python distribution since version 3.2. All the solutions are presented in the paper by corresponding UML class diagrams. The designed GeomNash package includes the classes Data, Result, GeomHash, Job. The results of the developed program are presented in corresponding graphs. The article also presents a theoretical justification for using a process pool to implement parallel algorithms. The condition t2 > (p/(p-1))*t1 is obtained for a process pool to be worthwhile. Here t1 is the time to transmit a unit of data between processes, and t2 is the time for one processor to process a unit of data.
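
    A minimal sketch of the process-pool idea follows. It is not the package described in the article: it uses plain Python instead of NumPy, a simple grid hash (each point is mapped to the integer cell that contains it), and hypothetical function names. The symbol p in the quoted condition is assumed here to be the number of worker processes, which the abstract does not define. `ProcessPoolExecutor` and its `map` method are the real `concurrent.futures` API.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial
from math import floor


def pool_pays_off(p, t1, t2):
    """Condition quoted in the abstract: t2 > (p / (p - 1)) * t1."""
    if p <= 1:
        return False  # assumption: a pool of one process brings no gain
    return t2 > (p / (p - 1)) * t1


def hash_chunk(indexed_points, cell):
    """Map each (index, point) pair to its grid cell; return cell -> indices."""
    table = {}
    for idx, point in indexed_points:
        key = tuple(floor(x / cell) for x in point)
        table.setdefault(key, []).append(idx)
    return table


def merge(tables):
    """Combine partial hash tables produced by different workers."""
    merged = {}
    for table in tables:
        for key, ids in table.items():
            merged.setdefault(key, []).extend(ids)
    return merged


def build_hash_parallel(points, cell, workers=2):
    """Hash chunks of points in a process pool and merge the partial tables."""
    indexed = list(enumerate(points))
    chunks = [indexed[k::workers] for k in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        tables = list(pool.map(partial(hash_chunk, cell=cell), chunks))
    return merge(tables)
```

    `build_hash_parallel` should be called from inside an `if __name__ == "__main__":` guard so that worker processes can import the module safely. For small inputs the cost of sending chunks to the workers dominates, which is the situation the quoted condition on t1 and t2 is meant to rule out.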

  11. Modelling and parallel calculation of a kinetic boundary layer

    International Nuclear Information System (INIS)

    Perlat, Jean Philippe

    1998-01-01

    This research thesis aims at addressing reliability and cost issues in the calculation by numeric simulation of flows in transition regime. The first step has been to reduce calculation cost and memory space for the Monte Carlo method, which is known to provide performance and reliability for rarefied regimes. Vector and parallel computers allow this objective to be reached. Here, a MIMD (multiple instruction, multiple data) machine has been used which implements parallel calculation at different levels of parallelization. Parallelization procedures have been adapted, and results showed that parallelization by calculation domain decomposition was far more efficient. Due to reliability issues related to the statistical nature of Monte Carlo methods, a new deterministic model was necessary to simulate gas molecules in transition regime. New models and hyperbolic systems have therefore been studied. One is chosen which allows thermodynamic values (density, average velocity, temperature, deformation tensor, heat flow) present in Navier-Stokes equations to be determined, and the equations of evolution of thermodynamic values are described for the mono-atomic case. Numerical resolution of is reported. A kinetic scheme is developed which complies with the structure of all systems, and which naturally expresses boundary conditions. The validation of the obtained 14 moment-based model is performed on shock problems and on Couette flows [fr

  12. A hybrid parallel framework for the cellular Potts model simulations

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Yi [Los Alamos National Laboratory; He, Kejing [SOUTH CHINA UNIV; Dong, Shoubin [SOUTH CHINA UNIV

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, and cannot be used for large-scale complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on a shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).

  13. Parallelization of elliptic solver for solving 1D Boussinesq model

    Science.gov (United States)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered grid scheme to the continuity, momentum, and elliptic equations of the Boussinesq model. The tridiagonal system emerging from the numerical scheme of the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two test cases, i.e. propagation of solitary and standing waves, are proposed to evaluate the parallel program. The numerical results are verified against analytical solutions of solitary and standing waves. The best speedup for the solitary and standing wave test cases is about 2.07 with 2^14 grid points and 1.86 with 2^13 grid points, respectively, using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
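
    Cyclic reduction, the tridiagonal solver named in this abstract, can be sketched in a few lines. The version below is a serial, recursive Python illustration and not the authors' OpenMP code; the function name and the sample system are hypothetical. Each odd-indexed equation is combined with its two neighbours to eliminate the even-indexed unknowns; these eliminations are independent of each other, which is what a shared-memory implementation would distribute over threads.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    a, b, c are the sub-, main and super-diagonal (a[0] = c[-1] = 0).
    Assumes no pivot b[i] becomes zero (e.g. diagonally dominant systems).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: build a half-size tridiagonal system in the odd unknowns.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):           # independent iterations
        alpha = a[i] / b[i - 1]
        na = -alpha * a[i - 1]
        nb = b[i] - alpha * c[i - 1]
        nc = 0.0
        nd = d[i] - alpha * d[i - 1]
        if i + 1 < n:
            gamma = c[i] / b[i + 1]
            nb -= gamma * a[i + 1]
            nc = -gamma * c[i + 1]
            nd -= gamma * d[i + 1]
        ra.append(na); rb.append(nb); rc.append(nc); rd.append(nd)

    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: each even unknown follows from its odd neighbours.
    for i in range(0, n, 2):           # independent iterations
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


# Small diagonally dominant system whose exact solution is [1, 2, 3, 4].
solution = cyclic_reduction([0.0, 1.0, 1.0, 1.0],
                            [4.0, 4.0, 4.0, 4.0],
                            [1.0, 1.0, 1.0, 0.0],
                            [6.0, 12.0, 18.0, 19.0])
```

    The system halves in size at every level, so for 2^k unknowns there are about k reduction levels, each of which exposes less parallel work than the one before; this is one plausible reason why speedups on such solvers stay well below the thread count.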

  14. Badlands: A parallel basin and landscape dynamics model

    Directory of Open Access Journals (Sweden)

    T. Salles

    2016-01-01

    Full Text Available Over more than three decades, a number of numerical landscape evolution models (LEMs) have been developed to study the combined effects of climate, sea-level, tectonics and sediments on Earth surface dynamics. Most of them are written in efficient programming languages, but often cannot be used on parallel architectures. Here, I present a LEM which ports a common core of accepted physical principles governing landscape evolution into a distributed memory parallel environment. Badlands (acronym for BAsin anD LANdscape DynamicS) is an open-source, flexible, TIN-based landscape evolution model, built to simulate topography development at various space and time scales.

  15. A tomograph VMEbus parallel processing data acquisition system

    International Nuclear Information System (INIS)

    Atkins, M.S.; Wilkinson, N.A.; Rogers, J.G.

    1988-11-01

    This paper describes a VME based data acquisition system suitable for the development of Positron Volume Imaging tomographs which use 3-D data for improved image resolution over slice-oriented tomographs. The data acquisition must be flexible enough to accommodate several 3-D reconstruction algorithms; hence, a software-based system is most suitable. Furthermore, because of the increased dimensions and resolution of volume imaging tomographs, the raw data event rate is greater than that of slice-oriented machines. These dual requirements are met by our data acquisition systems. Flexibility is achieved through an array of processors connected over a VMEbus, operating asynchronously and in parallel. High raw data throughput is achieved using a dedicated high speed data transfer device available for the VMEbus. The device can attain a raw data rate of 2.5 million coincidence events per second for raw events which are 64 bits wide. Real-time data acquisition and pre-processing requirements can be met by about forty 20 MHz Motorola 68020/68881 processors

  16. Performance modeling of parallel algorithms for solving neutron diffusion problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1995-01-01

    Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers

  17. The Temporal Dynamics of Visual Search: Evidence for Parallel Processing in Feature and Conjunction Searches

    Science.gov (United States)

    McElree, Brian; Carrasco, Marisa

    2012-01-01

    Feature and conjunction searches have been argued to delineate parallel and serial operations in visual processing. The authors evaluated this claim by examining the temporal dynamics of the detection of features and conjunctions. The 1st experiment used a reaction time (RT) task to replicate standard mean RT patterns and to examine the shapes of the RT distributions. The 2nd experiment used the response-signal speed–accuracy trade-off (SAT) procedure to measure discrimination (asymptotic detection accuracy) and detection speed (processing dynamics). Set size affected discrimination in both feature and conjunction searches but affected detection speed only in the latter. Fits of models to the SAT data that included a serial component overpredicted the magnitude of the observed dynamics differences. The authors concluded that both features and conjunctions are detected in parallel. Implications for the role of attention in visual processing are discussed. PMID:10641310

  18. Parallel and distributed processing in two SGBDS: A case study

    OpenAIRE

    Francisco Javier Moreno; Nataly Castrillón Charari; Camilo Taborda Zuluaga

    2017-01-01

    Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS), such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. ...

  19. Hybrid parallel execution model for logic-based specification languages

    CERN Document Server

    Tsai, Jeffrey J P

    2001-01-01

    Parallel processing is a very important technique for improving the performance of various software development and maintenance activities. The purpose of this book is to introduce important techniques for parallel execution of high-level specifications of software systems. These techniques are very useful for the construction, analysis, and transformation of reliable large-scale and complex software systems. Contents: Current Approaches; Overview of the New Approach; FRORL Requirements Specification Language and Its Decomposition; Rewriting and Data Dependency, Control Flow Analysis of a Lo

  20. Boltzmann machines as a model for parallel annealing

    NARCIS (Netherlands)

    Aarts, E.H.L.; Korst, J.H.M.

    1991-01-01

    The potential of Boltzmann machines to cope with difficult combinatorial optimization problems is investigated. A discussion of various (parallel) models of Boltzmann machines is given based on the theory of Markov chains. A general strategy is presented for solving (approximately) combinatorial

  1. Methods and models for the construction of weakly parallel tests

    NARCIS (Netherlands)

    Adema, J.J.; Adema, Jos J.

    1992-01-01

    Several methods are proposed for the construction of weakly parallel tests [i.e., tests with the same test information function (TIF)]. A mathematical programming model that constructs tests containing a prespecified TIF and a heuristic that assigns items to tests with information functions that are

  2. Two Phase Flow Split Model for Parallel Channels | Iloeje | Nigerian ...

    African Journals Online (AJOL)

    The model and code are capable of handling single and two phase flows, steady states and transients, up to ten parallel flow paths, simple and complicated geometries, including the boilers of fossil steam generators and nuclear power plants. A test calculation has been made with a simplified three-channel system ...

  3. Methods and models for the construction of weakly parallel tests

    NARCIS (Netherlands)

    Adema, J.J.; Adema, Jos J.

    1990-01-01

    Methods are proposed for the construction of weakly parallel tests, that is, tests with the same test information function. A mathematical programming model for constructing tests with a prespecified test information function and a heuristic for assigning items to tests such that their information

  4. Application of parallel computing to seismic damage process simulation of an arch dam

    International Nuclear Information System (INIS)

    Zhong Hong; Lin Gao; Li Jianbo

    2010-01-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation demands substantial computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that considers the heterogeneity of concrete. Developed in the Fortran programming language, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication and solvers from the AZTEC library for solution of large-scale equations. A speedup test showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of the shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential in simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.

  5. Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sarje, Abhinav [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jacobsen, Douglas W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ringler, Todd [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2016-05-01

    The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.

  6. Modeling and optimization of parallel and distributed embedded systems

    CERN Document Server

    Munir, Arslan; Ranka, Sanjay

    2016-01-01

    This book introduces the state-of-the-art in research in parallel and distributed embedded systems, which have been enabled by developments in silicon technology, micro-electro-mechanical systems (MEMS), wireless communications, computer networking, and digital electronics. These systems have diverse applications in domains including military and defense, medical, automotive, and unmanned autonomous vehicles. The emphasis of the book is on the modeling and optimization of emerging parallel and distributed embedded systems in relation to the three key design metrics of performance, power and dependability.

  7. Hardware system of parallel processing for fast CT image reconstruction based on circular shifting float memory architecture

    International Nuclear Information System (INIS)

    Wang Shi; Kang Kejun; Wang Jingjin

    1995-01-01

    Computerized Tomography (CT) is expected to become an indispensable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture and hardware design of PIRS-4 (4-processor Parallel Image Reconstruction System), which is a parallel processing system for fast 3D-CT image reconstruction based on a circular shifting float memory architecture. It covers the structure and components of the system, the design of the crossbar switch, and details of the control model. The test results are described

  8. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  9. Tuning of tool dynamics for increased stability of parallel (simultaneous) turning processes

    Science.gov (United States)

    Ozturk, E.; Comak, A.; Budak, E.

    2016-01-01

    Parallel (simultaneous) turning operations make use of more than one cutting tool acting on a common workpiece, offering potential for higher productivity. However, dynamic interaction between the tools and workpiece and the resulting chatter vibrations may create quality problems on machined surfaces. In order to determine chatter-free cutting process parameters, stability models can be employed. In this paper, the stability of parallel turning processes is formulated in the frequency and time domains for two different parallel turning cases. Predictions of the frequency and time domain methods demonstrated reasonable agreement with each other. In addition, the predicted stability limits are verified experimentally. Simulation and experimental results show multi-regional stability diagrams which can be used to select the most favorable set of process parameters for higher stable material removal rates. In addition to parameter selection, the developed models can be used to determine the best natural frequency ratio of the tools, resulting in the highest stable depth of cut. It is concluded that the most stable operations are obtained when the natural frequencies of the tools are slightly off each other, and the worst stability occurs when the natural frequencies of the tools are exactly the same.

  10. Parallel and distributed processing in two SGBDS: A case study

    Directory of Open Access Journals (Sweden)

    Francisco Javier Moreno

    2017-04-01

    Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS), such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. Results: The tests of the SQL query evidenced the performance differences between the two DBMS analyzed. However, more thorough testing and a wider variety of queries are needed. Conclusions: The differences in performance between the two DBMSs analyzed show that when evaluating this aspect, it is necessary to consider the particularities of each DBMS and the degree of parallelism of the queries.

  11. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    Science.gov (United States)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low latency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in an H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color subsampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions.
This work describes a scalable parallel architecture for an H.264 compliant deblocking

  12. Error Modeling and Design Optimization of Parallel Manipulators

    DEFF Research Database (Denmark)

    Wu, Guanglei

    /backlash, manufacturing and assembly errors and joint clearances. From the error prediction model, the distributions of the pose errors due to joint clearances are mapped within its constant-orientation workspace and the correctness of the developed model is validated experimentally. Additionally, using the screw......, dynamic modeling etc. Next, the first-order differential equation of the kinematic closure equation of planar parallel manipulator is obtained to develop its error model both in Polar and Cartesian coordinate systems. The established error model contains the error sources of actuation error...

  13. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  14. Model-driven product line engineering for mapping parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir

    2016-01-01

    Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the

  15. A Programming Model for Massive Data Parallelism with Data Dependencies

    International Nuclear Information System (INIS)

    Cui, Xiaohui; Mueller, Frank; Potok, Thomas E.; Zhang, Yongpeng

    2009-01-01

    Accelerating processors can often be more cost and energy effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processor units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach. We run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains

  16. Study and simulation of a parallel numerical processing machine

    International Nuclear Information System (INIS)

    Bel Hadj, Slaheddine

    1981-12-01

    This study has been carried out with a view to implementing the NEPTUNIX package (software for the resolution of very large algebraic-differential equation systems) on a minicomputer. Aiming at increasing the system performance, previous research has shown the necessity of reducing the execution time of certain frequently used numerical computation tasks. It has also demonstrated the feasibility of handling these tasks with efficient parallel algorithms. The present work deals with the study and simulation of a parallel architecture processor adapted to the fast execution of these algorithms. A minicomputer fitted with a connection to such a parallel processor has greatly extended computing power. The architecture of a parallel numerical processor, based on the use of VLSI microprocessors and co-processors, is then described. Its design aims at the best cost/performance ratio. The last part deals with the simulation of the processor with the 'CHAMBOR' program. Results show a speedup factor of 30 in comparison with execution on a MITRA 15 minicomputer. Moreover, the importance of conflicts, mainly at the level of access to a shared resource, is evaluated. Although this implementation has been designed with a dedicated application in mind, other uses could be envisaged, particularly for the simulation of nuclear reactors: operator guiding system, behavioural study under accidental circumstances, etc. (author) [fr

  17. Leveraging Non-Uniform Resources for Parallel Query Processing

    DEFF Research Database (Denmark)

    Mayr, Tobias; Bonnet, Philippe; Gehrke, Johannes

    2003-01-01

    Modular clusters are now composed of non-uniform nodes with different CPUs, disks or network cards so that customers can adapt the cluster configuration to the changing technologies and to their changing needs. This challenges dataflow parallelism as the primary load balancing technique of exist...

  18. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf; Liebmann, Manfred; Douglas, Craig C.; Plank, Gernot

    2010-01-01

    -vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen node Infiniband cluster

  19. Development of Parallel Code for the Alaska Tsunami Forecast Model

    Science.gov (United States)

    Bahng, B.; Knight, W. R.; Whitmore, P.

    2014-12-01

    The Alaska Tsunami Forecast Model (ATFM) is a numerical model used to forecast propagation and inundation of tsunamis generated by earthquakes and other means in both the Pacific and Atlantic Oceans. At the U.S. National Tsunami Warning Center (NTWC), the model is mainly used in a pre-computed fashion. That is, results for hundreds of hypothetical events are computed before alerts, and are accessed and calibrated with observations during tsunamis to immediately produce forecasts. ATFM uses the non-linear, depth-averaged, shallow-water equations of motion with multiply nested grids in two-way communications between domains of each parent-child pair as waves get closer to coastal waters. Even with the pre-computation the task becomes non-trivial as sub-grid resolution gets finer. Currently, the finest resolution Digital Elevation Models (DEM) used by ATFM are 1/3 arc-seconds. With a serial code, large or multiple areas of very high resolution can produce run-times that are unrealistic even in a pre-computed approach. One way to increase the model performance is code parallelization used in conjunction with a multi-processor computing environment. NTWC developers have undertaken an ATFM code-parallelization effort to streamline the creation of the pre-computed database of results with the long term aim of tsunami forecasts from source to high resolution shoreline grids in real time. Parallelization will also permit timely regeneration of the forecast model database with new DEMs; and, will make possible future inclusion of new physics such as the non-hydrostatic treatment of tsunami propagation. The purpose of our presentation is to elaborate on the parallelization approach and to show the compute speed increase on various multi-processor systems.

  20. A Parallel Workload Model and its Implications for Processor Allocation

    Science.gov (United States)

    1996-11-01

    with SEV or AVG, both of which can tolerate c = 0.4–0.6 before their performance deteriorates significantly. On the other hand, Setia [10] has...Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89–99, 1995. [11] Sanjeev K. Setia and Satish K. Tripathi. An analysis of several processor

  1. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.

    2017-03-13

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, the Restricted Boltzmann Machine (RBM) is a deep learning algorithm used in this project, whose time performance is improved through an efficient parallel implementation with the OpenACC tool, applying the best possible optimizations to RBM to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing in which directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of RBM are available using different models such as OpenMP and CUDA, but this project is the first attempt to apply the OpenACC model to RBM.

  2. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.; Khan, Ayaz H.

    2017-01-01

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, the Restricted Boltzmann Machine (RBM) is a deep learning algorithm used in this project, whose time performance is improved through an efficient parallel implementation with the OpenACC tool, applying the best possible optimizations to RBM to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing in which directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of RBM are available using different models such as OpenMP and CUDA, but this project is the first attempt to apply the OpenACC model to RBM.

  3. Parallel assembling and equation solving via graph algorithms with an application to the FE simulation of metal extrusion processes

    CERN Document Server

    Unterkircher, A

    2005-01-01

    We propose methods for parallel assembling and iterative equation solving based on graph algorithms. The assembling technique is independent of dimension, element type and model shape. As a parallel solving technique we construct a multiplicative symmetric Schwarz preconditioner for the conjugate gradient method. Both methods have been incorporated into a non-linear FE code to simulate 3D metal extrusion processes. We illustrate the efficiency of these methods on shared memory computers by realistic examples.

  4. Final Report: Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    Mellor-Crummey, John [William Marsh Rice University

    2011-09-13

    As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.

  5. A Parallel Processing Algorithm for Remote Sensing Classification

    Science.gov (United States)

    Gualtieri, J. Anthony

    2005-01-01

    A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers using the Linux operating system. For example, on the Medusa cluster at NASA/GSFC, this provides supercomputing performance, 130 Gflops (Linpack Benchmark), at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
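    The decomposition into independent tasks mentioned in this abstract can be sketched as a parallel map. This is only an illustration under stated assumptions: classify_tile is a hypothetical stand-in for one independent task (it is a simple threshold, not an SVM), and a thread pool from the Python standard library stands in for the cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_tile(tile):
    """Hypothetical stand-in for one independent task: label each pixel
    of a tile with a simple threshold (not an SVM)."""
    return [1 if pixel > 0.5 else 0 for pixel in tile]


def run_serial(tiles):
    """Process the tiles one after another, as a serial computer would."""
    return [classify_tile(tile) for tile in tiles]


def run_parallel(tiles, workers=4):
    """Process the tiles concurrently; map() preserves the input order,
    so the results line up with the serial version."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(classify_tile, tiles))


if __name__ == "__main__":
    tiles = [[0.1, 0.9], [0.6, 0.4], [0.5, 0.51]]
    print(run_parallel(tiles))
```

    Because the tasks share no state, the parallel result matches the serial one. In CPython, threads do not speed up CPU-bound work, so a real speedup would need separate processes or a message-passing layer across cluster nodes.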

  6. Vector-Parallel processing of the successive overrelaxation method

    International Nuclear Information System (INIS)

    Yokokawa, Mitsuo

    1988-02-01

    The successive overrelaxation (SOR) method is one of the iterative methods for solving linear systems of equations, and it has been computed serially with a natural ordering in many nuclear codes. After the appearance of vector processors, this natural SOR method was replaced by parallel algorithms such as the hyperplane or red-black method, in which the calculation order is modified. These methods are suitable for vector processors, and they run faster than the natural SOR method on vector processors. In this report, a new scheme named the 4-colors SOR method is proposed. We find that the 4-colors SOR method can be executed on vector-parallel processors and that it gives the fastest calculation among all SOR methods, according to results of vector-parallel execution on the Alliant FX/8 multiprocessor system. It is also shown that the theoretical optimal acceleration parameters are equal among the five different ordering SOR methods, and the differences between the convergence rates of these SOR methods are examined. (author)
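    The red-black ordering mentioned in this abstract can be sketched for the 2D Laplace equation as follows. This is a minimal illustration of the standard two-colour scheme, not the 4-colors method proposed in the report. The grid layout, the relaxation factor omega = 1.5 and the fixed sweep count are assumptions chosen for the example.

```python
def red_black_sor(u, omega=1.5, sweeps=200):
    """Red-black SOR for the 2D Laplace equation on a rectangular grid.

    u is a list of lists; boundary values stay fixed and interior values
    are updated in place. Points with (i + j) even are "red", the rest
    "black". Every point of one colour depends only on points of the
    other colour, so each half-sweep could be vectorised or parallelised.
    """
    rows, cols = len(u), len(u[0])
    for _ in range(sweeps):
        for colour in (0, 1):
            for i in range(1, rows - 1):
                for j in range(1, cols - 1):
                    if (i + j) % 2 != colour:
                        continue
                    # Gauss-Seidel value from the 5-point stencil
                    gs = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1])
                    # over-relax towards the Gauss-Seidel value
                    u[i][j] += omega * (gs - u[i][j])
    return u


if __name__ == "__main__":
    n = 6
    # boundary held at 1.0, interior starts at 0.0
    grid = [[1.0 if i in (0, n - 1) or j in (0, n - 1) else 0.0
             for j in range(n)] for i in range(n)]
    red_black_sor(grid)
    print(grid[2][3])
```

    The loops here run serially. The point of the ordering is that all updates within one colour are independent of each other, which is what makes the scheme suitable for vector hardware.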

  7. Connectionist Models and Parallelism in High Level Vision.

    Science.gov (United States)

    1985-01-01

    GRANT NUMBER(s) Jerome A. Feldman N00014-82-K-0193 9. PERFORMING ORGANIZATION NAME AND ADDRESS 10. PROGRAM ELEMENT, PROJECT, TASK Computer Science...Connectionist Models 2.1 Background and Overview Computer science is just beginning to look seriously at parallel computation: it may turn out that...the chair. The program includes intermediate level networks that compute more complex joints and ones that compute parallelograms in the image. These

  8. Parallel processing data network of master and slave transputers controlled by a serial control network

    Science.gov (United States)

    Crosetto, Dario B.

    1996-01-01

    The present device provides for a dynamically configurable communication network having a multi-processor parallel processing system having a serial communication network and a high speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high density data among nodes and to monitor each slave processor's status. The high speed parallel processing network is used to effect the transmission of high density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high density data arrives or leaves the node.

  9. Parallel family trees for transfer matrices in the Potts model

    Science.gov (United States)

    Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo

    2015-02-01

    The computational cost of transfer matrix methods for the Potts model is related to the question: in how many ways can two layers of a lattice be connected? Answering the question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q, v) transfer matrix methods for strips is of the order of the Catalan numbers, which grow asymptotically as O(4^m), where m is the width of the strip. Other transfer matrix methods with a smaller configuration space indeed exist, but they make assumptions on the temperature or the number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3^m) to build the generic (q, v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while remaining highly parallel. The resulting matrix is stored in a compressed form using O(3^m × 4^m) of space, making numerical evaluation and decompression faster than evaluating the matrix in its O(4^m × 4^m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7× when running on an 8-core shared memory machine and 28× for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster

  10. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU

    Directory of Open Access Journals (Sweden)

    Yong Xia

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations.
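
    The decoupling described in this abstract can be sketched as an operator-splitting loop. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses a 1-D cable instead of the 3-D atria, a FitzHugh-Nagumo-type cell model as a stand-in for the sheep atrial cell model, explicit Euler steps, and hypothetical names and parameter values throughout. In both steps each cell or grid point is updated independently from old values, which is the property that would let one GPU thread handle one cell.

```python
def reaction_step(v, w, dt, a=0.1, eps=0.01, gamma=0.5):
    """Single-cell ODEs (FitzHugh-Nagumo stand-in), one explicit Euler step per cell."""
    new_v, new_w = [], []
    for vi, wi in zip(v, w):
        dv = vi * (1.0 - vi) * (vi - a) - wi
        dw = eps * (vi - gamma * wi)
        new_v.append(vi + dt * dv)
        new_w.append(wi + dt * dw)
    return new_v, new_w


def diffusion_step(v, dt, dx, d=1.0):
    """Diffusion term of the monodomain equation with no-flux boundaries."""
    n = len(v)
    c = d * dt / (dx * dx)  # must stay below 0.5 for this explicit scheme
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else v[i]
        right = v[i + 1] if i < n - 1 else v[i]
        out.append(v[i] + c * (left - 2.0 * v[i] + right))
    return out


def simulate(n_cells=50, steps=100, dt=0.05, dx=1.0):
    """Alternate the reaction (ODE) and diffusion (PDE) steps."""
    v = [1.0 if i < 5 else 0.0 for i in range(n_cells)]  # stimulate the left end
    w = [0.0] * n_cells
    for _ in range(steps):
        v, w = reaction_step(v, w, dt)
        v = diffusion_step(v, dt, dx)
    return v, w
```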

  11. CSDFa: a model for exploiting the trade-off between data and pipeline parallelism

    NARCIS (Netherlands)

    Koek, Peter; Geuns, S.J.; Hausmans, J.P.H.M.; Corporaal, Henk; Bekooij, Marco Jan Gerrit

    2016-01-01

    Real-time stream processing applications, such as SDR applications, are often executed concurrently on multiprocessor systems. A unified data flow model and analysis method have been proposed that can be used to simultaneously determine the amount of pipeline and coarse-grained data parallelism

  12. Efficient parallel implementation of active appearance model fitting algorithm on GPU.

    Science.gov (United States)

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods and has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grained parallelism, in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experimental results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  13. Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

    Directory of Open Access Journals (Sweden)

    Jinwei Wang

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods and has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grained parallelism, in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on Nvidia’s GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experimental results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  14. Examination of Speed Contribution of Parallelization for Several Fingerprint Pre-Processing Algorithms

    Directory of Open Access Journals (Sweden)

    GORGUNOGLU, S.

    2014-05-01

    In the analysis of minutiae-based fingerprint systems, fingerprints need to be pre-processed. The pre-processing is carried out to enhance the quality of the fingerprint and to obtain more accurate minutiae points. Reducing the pre-processing time is important for identification and verification in real-time systems, especially for databases holding large amounts of fingerprint information. Parallel processing and parallel CPU computing can be considered as the distribution of processes over a multi-core processor. This is done by using parallel programming techniques. Reducing the execution time is the main objective of parallel processing. In this study, the pre-processing of a minutiae-based fingerprint system is implemented by parallel processing on multi-core computers using OpenMP and on a graphics processor using CUDA to improve execution time. The execution times and speedup ratios are compared with those of a single-core processor. The results show that parallel processing substantially improves execution time. The improvement ratios obtained for the different pre-processing algorithms allowed us to make suggestions on the more suitable approaches for parallelization.

  15. A model of breakdown in parallel-plate detectors

    International Nuclear Information System (INIS)

    Fonte, P.

    1996-01-01

    Parallel-plate avalanche chambers (PPACs) have many desirable properties as fast, large-area particle detectors. However, the maximum gain is limited by a form of violent breakdown that limits the usefulness of this detector, despite its other evident qualities. The exact nature of this phenomenon is not yet sufficiently clear to guide possible improvements. In the present work, a previous experimental study is complemented by a quantitative model of the breakdown phenomenon in PPACs, based on the streamer theory. The model reproduces well the peculiar behavior of the external current observed in PPACs and resistive-plate chambers. Other breakdown properties measured in PPACs are also well reproduced.

  16. cellGPU: Massively parallel simulations of dynamic vertex models

    Science.gov (United States)

    Sussman, Daniel M.

    2017-10-01

    Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation

  17. A new model for reliability optimization of series-parallel systems with non-homogeneous components

    International Nuclear Information System (INIS)

    Feizabadi, Mohammad; Jahromi, Abdolhamid Eshraghniaye

    2017-01-01

    In discussions of reliability optimization using redundancy allocation, the series-parallel structure is one that has attracted the attention of many researchers. In models previously presented for reliability optimization of series-parallel systems, there is a restricting assumption that all components of a subsystem must be homogeneous. This constraint limits system designers in selecting components and prevents achieving higher levels of reliability. In this paper, a new model is proposed for reliability optimization of series-parallel systems, which makes possible the use of non-homogeneous components in each subsystem. As a result of this flexibility, the process of supplying system components will be easier. To solve the proposed model, since the redundancy allocation problem (RAP) belongs to the NP-hard class of optimization problems, a genetic algorithm (GA) is developed. The computational results of the designed GA indicate the high performance of the proposed model in increasing system reliability and decreasing costs. - Highlights: • In this paper, a new model is proposed for reliability optimization of series-parallel systems. • In the previous models, there is a restricting assumption based on which all components of a subsystem must be homogeneous. • The presented model provides a possibility for the subsystems’ components to be non-homogeneous in the required conditions. • The computational results demonstrate the high performance of the proposed model in improving reliability and reducing costs.
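
    As a small illustration of the quantity being optimized, the following sketch evaluates the reliability of a series-parallel system whose subsystems may mix different components. It assumes active redundancy and statistically independent failures, so a subsystem fails only if all of its components fail and the system works only if every subsystem works. The abstract does not state the paper's exact formulation, so these assumptions and the function name `system_reliability` are illustrative.

```python
from math import prod


def system_reliability(subsystems):
    """Reliability of subsystems in series, each a parallel group of components.

    `subsystems` is a list of lists; each inner list holds the reliabilities
    of the (possibly non-homogeneous) components of one subsystem.
    """
    return prod(
        1.0 - prod(1.0 - r for r in components)  # subsystem works unless all fail
        for components in subsystems
    )
```

    For example, a first subsystem with two different components (0.9 and 0.8) in series with a single 0.95 component gives (1 - 0.1 × 0.2) × 0.95 = 0.931.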

  18. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Moreland, Kenneth [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Geveci, Berk [Kitware, Inc., Clifton Park, NY (United States)

    2014-11-01

    The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive number of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive number of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.

  19. The island dynamics model on parallel quadtree grids

    Science.gov (United States)

    Mistani, Pouria; Guittet, Arthur; Bochkov, Daniil; Schneider, Joshua; Margetis, Dionisios; Ratsch, Christian; Gibou, Frederic

    2018-05-01

    We introduce an approach for simulating epitaxial growth by use of an island dynamics model on a forest of quadtree grids, and in a parallel environment. To this end, we use a parallel framework introduced in the context of the level-set method. This framework utilizes: discretizations that achieve a second-order accurate level-set method on non-graded adaptive Cartesian grids for solving the associated free boundary value problem for surface diffusion; and an established library for the partitioning of the grid. We consider the cases with: irreversible aggregation, which amounts to applying Dirichlet boundary conditions at the island boundary; and an asymmetric (Ehrlich-Schwoebel) energy barrier for attachment/detachment of atoms at the island boundary, which entails the use of a Robin boundary condition. We provide the scaling analyses performed on the Stampede supercomputer and numerical examples that illustrate the capability of our methodology to efficiently simulate different aspects of epitaxial growth. The combination of adaptivity and parallelism in our approach enables simulations that are several orders of magnitude faster than those reported in the recent literature and, thus, provides a viable framework for the systematic study of mound formation on crystal surfaces.

  20. Parallel processing of dose calculation for external photon beam therapy

    International Nuclear Information System (INIS)

    Kunieda, Etsuo; Ando, Yutaka; Tsukamoto, Nobuhiro; Ito, Hisao; Kubo, Atsushi

    1994-01-01

    We implemented external photon beam dose calculation programs on a parallel processor system consisting of Transputers, 32-bit processors especially suitable for multi-processor configurations. Two network conformations, binary-tree and pipeline, were evaluated for rectangular and irregular field dose calculation algorithms. Although computation speed increased in proportion to the number of CPUs, substantial overhead caused by inter-processor communication occurred when a smaller computation load was delivered to each processor. On the other hand, for irregular field calculation, which requires more computation capability for each calculation point, the communication overhead remained small even when more than 50 processors were involved. Real-time responses could be expected for more complex algorithms by increasing the number of processors. (author)

  1. A tomograph VMEbus parallel processing data acquisition system

    International Nuclear Information System (INIS)

    Wilkinson, N.A.; Rogers, J.G.; Atkins, M.S.

    1989-01-01

    This paper describes a VME-based data acquisition system suitable for the development of Positron Volume Imaging tomographs, which use 3-D data for improved image resolution over slice-oriented tomographs. The data acquisition must be flexible enough to accommodate several 3-D reconstruction algorithms; hence, a software-based system is most suitable. Furthermore, because of the increased dimensions and resolution of volume imaging tomographs, the raw data event rate is greater than that of slice-oriented machines. These dual requirements are met by our data acquisition system. Flexibility is achieved through an array of processors connected over a VMEbus, operating asynchronously and in parallel. High raw data throughput is achieved using a dedicated high speed data transfer device available for the VMEbus. The device can attain a raw data rate of 2.5 million coincidence events per second for raw events which are 64 bits wide.

  2. PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records.

    Science.gov (United States)

    Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R; Stewart, Walter F; Malin, Bradley; Sun, Jimeng

    2014-04-01

    Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 h in parallel compared to 9 days if running sequentially. This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines
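
    The three steps named in this abstract (build a dependency graph, schedule in topological order, run independent tasks in parallel) can be sketched with the Python standard library. This is an illustration under stated assumptions, not PARAMO itself: a thread pool replaces Map-Reduce and a cluster, and the function name `run_pipeline`, the task names, and the callables are hypothetical. `graphlib.TopologicalSorter` requires Python 3.9 or later.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_pipeline(dependencies, actions, max_workers=4):
    """Run tasks in dependency order, executing independent tasks concurrently.

    dependencies: maps a task name to the list of tasks it depends on.
    actions: maps a task name to a callable that receives the results of
             its dependencies, in the listed order, as positional arguments.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all finished.
            for task in sorter.get_ready():
                inputs = [results[d] for d in dependencies.get(task, ())]
                running[pool.submit(actions[task], *inputs)] = task
            finished, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in finished:
                task = running.pop(future)
                results[task] = future.result()
                sorter.done(task)  # unlocks the tasks that depend on this one
    return results
```

    In such a graph, two cohort-construction tasks with no common dependency would be submitted together, while a task that compares their outputs would start only after both have finished.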

  3. Improved modelling of a parallel plate active magnetic regenerator

    International Nuclear Information System (INIS)

    Engelbrecht, K; Nielsen, K K; Bahl, C R H; Tušek, J; Kitanovski, A; Poredoš, A

    2013-01-01

    Much of the active magnetic regenerator (AMR) modelling presented in the literature considers only the solid and fluid domains of the regenerator and ignores other physical effects that have been shown to be important, such as demagnetizing fields in the regenerator, parasitic heat losses and fluid flow maldistribution in the regenerator. This paper studies the effects of these loss mechanisms and compares theoretical results with experimental results obtained on an experimental AMR device. Three parallel plate regenerators were tested, each having different demagnetizing field characteristics and fluid flow maldistributions. It was shown that when these loss mechanisms are ignored, the model significantly overpredicts experimental results. Including the loss mechanisms can significantly change the model predictions, depending on the operating conditions and construction of the regenerator. The model is compared with experimental results for a range of fluid flow rates and cooling loads. (paper)

  4. The parallel processing impact in the optimization of the reactors neutronic by genetic algorithms

    International Nuclear Information System (INIS)

    Pereira, Claudio M.N.A.; Universidade Federal, Rio de Janeiro, RJ; Lapa, Celso M.F.; Mol, Antonio C.A.

    2002-01-01

    Nowadays, many optimization problems found in nuclear engineering have been solved through genetic algorithms (GA). The robustness of such methods is strongly related to the nature of the search process, which is based on populations of solution candidates, and this implies a high computational cost in the optimization process. The use of GA becomes more critical when the evaluation of a solution candidate is highly time consuming. Problems of this nature are common in nuclear engineering; an example is reactor design optimization, where neutronic codes, which consume high CPU time, must be run. To investigate the impact of parallel computation on the GA-based solution of a reactor design optimization problem, a parallel genetic algorithm (PGA) using the Island Model was developed. Exhaustive experiments, involving 1500 processing hours on 550 MHz personal computers, were carried out in order to compare the conventional GA with the PGA. These experiments demonstrated the superiority of the PGA not only in terms of execution time but also in the optimization results. (author)
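
    A minimal sketch of an island-model genetic algorithm of the kind named in this abstract. Everything concrete here is an assumption for illustration: the toy objective (count of one-bits) stands in for an expensive neutronic code, the islands are stepped one after another in a single process rather than on separate processors, and the ring migration policy, names, and parameters are hypothetical.

```python
import random


def fitness(bits):
    """Toy objective standing in for a costly neutronic evaluation."""
    return sum(bits)


def evolve(population, rng, mutation_rate=0.05):
    """One generation: keep the best individual, refill by tournament and mutation."""
    children = [max(population, key=fitness)[:]]  # elitism
    while len(children) < len(population):
        a, b = rng.sample(population, 2)
        parent = a if fitness(a) >= fitness(b) else b
        children.append([bit ^ (rng.random() < mutation_rate) for bit in parent])
    return children


def island_ga(n_islands=4, pop_size=10, n_bits=20, generations=30,
              migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    history = []
    for gen in range(1, generations + 1):
        # Islands evolve independently; this is the step that would be
        # distributed over processors in a parallel implementation.
        islands = [evolve(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            bests = [max(pop, key=fitness) for pop in islands]
            for k, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
                pop[worst] = bests[k - 1][:]  # ring: receive from previous island
        history.append(max(fitness(ind) for pop in islands for ind in pop))
    best = max((ind for pop in islands for ind in pop), key=fitness)
    return best, history
```

    Because islands exchange individuals only every `migrate_every` generations, communication between processors would be infrequent, which is one reason this layout is commonly chosen when each fitness evaluation is expensive.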

  5. Parallel workflow tools to facilitate human brain MRI post-processing

    Directory of Open Access Journals (Sweden)

    Zaixu Cui

    2015-05-01

    Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues.

  6. Co-simulation of dynamic systems in parallel and serial model configurations

    International Nuclear Information System (INIS)

    Sweafford, Trevor; Yoon, Hwan Sik

    2013-01-01

    Recent advances in simulation software and computation hardware make it feasible to simulate complex dynamic systems composed of multiple submodels developed in different modeling languages. This so-called co-simulation enables one to study various aspects of a complex dynamic system with heterogeneous submodels in a cost-effective manner. Among several different model configurations for co-simulation, the synchronized parallel configuration is regarded as expediting the simulation process by simulating multiple submodels concurrently on a multi-core processor. In this paper, computational accuracy and computation time are studied for three different co-simulation frameworks: integrated, serial, and parallel. For this purpose, analytical evaluations of the three methods are made using the explicit Euler method, and the methods are then applied to two-DOF mass-spring systems. The results show that while the parallel simulation configuration produces the same accurate results as the integrated configuration, the results of the serial configuration show a slight deviation. It is also shown that the computation time can be reduced by running the simulation in the parallel configuration. Therefore, it can be concluded that the synchronized parallel simulation methodology is the best for both simulation accuracy and time efficiency.
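
    The comparison in this abstract can be reproduced in miniature with explicit Euler steps on a two-DOF mass-spring chain. The sketch below is one plausible reading, assumed for illustration and not taken from the paper: in the parallel configuration both submodels read the coupling values from the start of the step, while in the serial configuration the second submodel reads the first submodel's already-updated position. Masses, spring constants, initial state, and all names are illustrative.

```python
M1, M2, K1, K2 = 1.0, 1.0, 1.0, 1.0  # illustrative masses and spring constants


def step_mass1(x1, v1, x2_in, dt):
    """Submodel 1: mass 1, tied to a wall by K1 and to mass 2 by K2."""
    a1 = (-K1 * x1 + K2 * (x2_in - x1)) / M1
    return x1 + dt * v1, v1 + dt * a1


def step_mass2(x2, v2, x1_in, dt):
    """Submodel 2: mass 2, tied to mass 1 by K2."""
    a2 = -K2 * (x2 - x1_in) / M2
    return x2 + dt * v2, v2 + dt * a2


def simulate(mode, steps=1000, dt=1e-3):
    x1, v1, x2, v2 = 1.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        if mode == "integrated":
            # One monolithic model: all derivatives from the same state.
            a1 = (-K1 * x1 + K2 * (x2 - x1)) / M1
            a2 = -K2 * (x2 - x1) / M2
            x1, v1, x2, v2 = x1 + dt * v1, v1 + dt * a1, x2 + dt * v2, v2 + dt * a2
        elif mode == "parallel":
            # Both submodels use coupling values from the start of the step,
            # so they could run concurrently.
            new1 = step_mass1(x1, v1, x2, dt)
            new2 = step_mass2(x2, v2, x1, dt)
            (x1, v1), (x2, v2) = new1, new2
        elif mode == "serial":
            # Submodel 2 waits for submodel 1 and uses its updated position.
            x1_new, v1_new = step_mass1(x1, v1, x2, dt)
            x2, v2 = step_mass2(x2, v2, x1_new, dt)
            x1, v1 = x1_new, v1_new
        else:
            raise ValueError(mode)
    return x1, x2
```

    Under these assumptions the parallel and integrated runs perform the same arithmetic and agree exactly, while the serial run drifts by an amount that shrinks with the step size, which is consistent with the slight deviation the abstract reports.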

  7. The Processing of Somatosensory Information Shifts from an Early Parallel into a Serial Processing Mode: A Combined fMRI/MEG Study.

    Science.gov (United States)

    Klingner, Carsten M; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W

    2016-01-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.

  8. The Processing of Somatosensory Information shifts from an early parallel into a serial processing mode: a combined fMRI/MEG study.

    Directory of Open Access Journals (Sweden)

    Carsten Michael Klingner

    2016-12-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.

  9. Parallel Block Structured Adaptive Mesh Refinement on Graphics Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Beckingsale, D. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Gaudin, W. P. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Hornung, R. D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gunney, B. T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Gamblin, T. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Herdman, J. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom); Jarvis, S. A. [Atomic Weapons Establishment (AWE), Aldermaston (United Kingdom)

    2014-11-17

    Block-structured adaptive mesh refinement is a technique that can be used when solving partial differential equations to reduce the number of zones necessary to achieve the required accuracy in areas of interest. These areas (shock fronts, material interfaces, etc.) are recursively covered with finer mesh patches that are grouped into a hierarchy of refinement levels. Despite the potential for large savings in computational requirements and memory usage without a corresponding reduction in accuracy, AMR adds overhead in managing the mesh hierarchy, adding complex communication and data movement requirements to a simulation. In this paper, we describe the design and implementation of a native GPU-based AMR library, including: the classes used to manage data on a mesh patch, the routines used for transferring data between GPUs on different nodes, and the data-parallel operators developed to coarsen and refine mesh data. We validate the performance and accuracy of our implementation using three test problems and two architectures: an eight-node cluster, and over four thousand nodes of Oak Ridge National Laboratory’s Titan supercomputer. Our GPU-based AMR hydrodynamics code performs up to 4.87× faster than the CPU-based implementation, and has been scaled to over four thousand GPUs using a combination of MPI and CUDA.

  10. Developing a Massively Parallel Forward Projection Radiography Model for Large-Scale Industrial Applications

    Energy Technology Data Exchange (ETDEWEB)

    Bauerle, Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2014-08-01

    This project utilizes Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely utilized. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this end, the data are divided into blocks that can each fit into GPU memory. The forward projected image is also divided into segments to allow for future parallelization and to avoid needless computations.
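
    The blocking step described above can be illustrated with a small sketch. It assumes, purely for illustration, that the volume is cut into slabs along one axis and that a fixed memory budget per block is known; the abstract does not say how the blocks are actually shaped, and `z_slabs` and its parameters are hypothetical names.

```python
def z_slabs(nx, ny, nz, bytes_per_voxel, budget_bytes):
    """Yield (z_start, z_stop) ranges so that each slab of nx * ny * depth
    voxels fits within budget_bytes.

    Assumption: slabs along z; a real code might block in all three axes.
    """
    bytes_per_slice = nx * ny * bytes_per_voxel
    depth = budget_bytes // bytes_per_slice
    if depth < 1:
        raise ValueError("a single z-slice does not fit in the memory budget")
    for z_start in range(0, nz, depth):
        yield z_start, min(z_start + depth, nz)
```

    For a 10000 x 10000 x 10000 volume (a trillion voxels) of 4-byte values and an assumed 6 GB budget, `list(z_slabs(10**4, 10**4, 10**4, 4, 6 * 10**9))` gives slabs 15 slices deep, each of which could be projected independently and accumulated into the output image.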

  11. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2015-01-01

    The aim of this study is to present an approach to introducing pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. Modern computer science and engineering education requires a comprehensive curriculum, so an introduction to pipeline and parallel computing is an essential topic to include. At the same time, the topic is among the most motivating tasks due to its comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform for a constructivist learning process, thus enabling learners’ experimentation with the provided programming models, developing learners’ competences in modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models and the message passing interface (MPI) and OpenMP parallelization tools have been chosen for implementation.

  12. Information-Limited Parallel Processing in Difficult Heterogeneous Covert Visual Search

    Science.gov (United States)

    Dosher, Barbara Anne; Han, Songmei; Lu, Zhong-Lin

    2010-01-01

    Difficult visual search is often attributed to time-limited serial attention operations, although neural computations in the early visual system are parallel. Using probabilistic search models (Dosher, Han, & Lu, 2004) and a full time-course analysis of the dynamics of covert visual search, we distinguish unlimited capacity parallel versus serial…

  13. ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

    Science.gov (United States)

    Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

    2018-04-27

    A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.

  14. The parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture

    International Nuclear Information System (INIS)

    Wang Shi; Kang Kejun; Wang Jingjin

    1996-01-01

    Computerized Tomography (CT) is expected to become an inevitable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture, hardware and software design of PIRS-4 (4-processor Parallel Image Reconstruction System), which is a parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture. It includes the structure and components of the system, the design of the crossbar switch and details of the control model, the description of RPBP image reconstruction, the choice of OS (Operating System) and language, the principle of imitating EMS, direct memory R/W of float and programming in the protected mode. Finally, the test results are given.

  15. Modelling distribution of evaporating CO2 in parallel minichannels

    DEFF Research Database (Denmark)

    Brix, Wiebke; Kærn, Martin Ryhl; Elmegaard, Brian

    2010-01-01

    The effects of airflow non-uniformity and uneven inlet qualities on the performance of a minichannel evaporator with parallel channels, using CO2 as refrigerant, are investigated numerically. For this purpose a one-dimensional discretised steady-state model was developed, applying well-known empi... to maldistribution of the refrigerant and considerable capacity reduction of the evaporator. Uneven inlet qualities to the different channels show only minor effects on the refrigerant distribution and evaporator capacity as long as the channels are vertically oriented with CO2 flowing upwards. For horizontal... channels capacity reductions are found for both non-uniform airflow and uneven inlet qualities. For horizontal minichannels the results are very similar to those obtained using R134a as refrigerant.

  16. cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Adelino R. Ferreira da Silva

    2011-10-01

    Graphic processing units (GPUs) are rapidly gaining maturity as powerful general parallel computing devices. A key feature in the development of modern GPUs has been the advancement of the programming model and programming tools. Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on Nvidia many-core GPUs. In functional magnetic resonance imaging (fMRI), the volume of the data to be processed and the type of statistical analysis to perform call for high-performance computing strategies. In this work, we present the main features of the R-CUDA package cudaBayesreg, which implements in CUDA the core of a Bayesian multilevel model for the analysis of brain fMRI data. The statistical model implements a Gibbs sampler for multilevel/hierarchical linear models with a normal prior. The main contribution to the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel. The R-CUDA implementation of the Bayesian model proposed here has been able to reduce significantly the run-time processing of Markov chain Monte Carlo (MCMC) simulations used in Bayesian fMRI data analyses. Presently, cudaBayesreg is only configured for Linux systems with Nvidia CUDA support.
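
    The per-voxel parallelism described above can be sketched on a CPU. This is not the cudaBayesreg method: it swaps the Gibbs sampler for an ordinary least-squares fit with a single regressor and uses a Python thread pool in place of CUDA threads, purely to show that each voxel's fit is independent of the others. The names `fit_voxel` and `fit_all_voxels` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_voxel(design, series):
    """Ordinary least-squares fit of series = intercept + slope * design.

    Stand-in for the per-voxel regression; the package itself uses a
    Bayesian multilevel model, which is not reproduced here.
    """
    n = len(design)
    mean_x = sum(design) / n
    mean_y = sum(series) / n
    sxx = sum((x - mean_x) ** 2 for x in design)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(design, series))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_all_voxels(design, voxel_series, workers=4):
    """Fit every voxel's time series independently, one task per voxel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda series: fit_voxel(design, series), voxel_series))
```

    Because no voxel's fit reads another voxel's result, the same decomposition maps directly onto one GPU thread per voxel.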

  17. Parallel eigenanalysis of finite element models in a completely connected architecture

    Science.gov (United States)

    Akl, F. A.; Morel, M. R.

    1989-01-01

    A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in a parallel environment is analyzed.
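
    The abstract does not define the two performance measures it uses. Assuming the conventional definitions, with $T_1$ the run time on one processor and $T_p$ the run time on $p$ processors (symbols introduced here, not taken from the source), they would read:

\[
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}.
\]

    Under these definitions, a run that takes 100 s on one processor and 30 s on four gives $S_4 \approx 3.33$ and $E_4 \approx 0.83$.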

  18. Next Generation Parallelization Systems for Processing and Control of PDS Image Node Assets

    Science.gov (United States)

    Verma, R.

    2017-06-01

    We present next-generation parallelization tools to help Planetary Data System (PDS) Imaging Node (IMG) better monitor, process, and control changes to nearly 650 million file assets and over a dozen machines on which they are referenced or stored.

  19. Parallel Application Development Using Architecture View Driven Model Transformations

    NARCIS (Netherlands)

    Arkin, E.; Tekinerdogan, B.

    2015-01-01

    To realize the increased need for computing performance the current trend is towards applying parallel computing in which the tasks are run in parallel on multiple nodes. In turn, we can observe the rapid increase of the scale of parallel computing platforms. This situation has led to a complexity

  20. Partial Overhaul and Initial Parallel Optimization of KINETICS, a Coupled Dynamics and Chemistry Atmosphere Model

    Science.gov (United States)

    Nguyen, Howard; Willacy, Karen; Allen, Mark

    2012-01-01

    KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.

  1. Design and simulation of parallel and distributed architectures for images processing

    International Nuclear Information System (INIS)

    Pirson, Alain

    1990-01-01

    The exploitation of visual information requires special computers. The diversity of operations and the computing power involved bring about structures founded on the concepts of concurrency and distributed processing. This work identifies a vision computer with an association of dedicated intelligent entities, exchanging messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, tasks being allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision-dedicated co-processors and powerful communication devices. (author)

  2. Unified Singularity Modeling and Reconfiguration of 3rTPS Metamorphic Parallel Mechanisms with Parallel Constraint Screws

    Directory of Open Access Journals (Sweden)

    Yufeng Zhuang

    2015-01-01

    This paper presents a unified singularity modeling and reconfiguration analysis of variable topologies of a class of metamorphic parallel mechanisms with parallel constraint screws. The new parallel mechanisms consist of three reconfigurable rTPS limbs that have two working phases stemming from the reconfigurable Hooke (rT) joint. While one phase has full mobility, the other supplies a constraint force to the platform. Based on these, the platform constraint screw systems show that the new metamorphic parallel mechanisms have four topologies by altering the limb phases with mobility change among 1R2T (one rotation with two translations), 2R2T, and 3R2T and mobility 6. Geometric conditions of the mechanism design are investigated with some special topologies illustrated considering the limb arrangement. Following this and the actuation scheme analysis, a unified Jacobian matrix is formed using screw theory to include the change between geometric constraints and actuation constraints in the topology reconfiguration. Various singular configurations are identified by analyzing screw dependency in the Jacobian matrix. The work in this paper provides a basis for singularity-free workspace analysis and optimal design of the class of metamorphic parallel mechanisms with parallel constraint screws, which shows simple geometric constraints with potentially simple kinematics and dynamics properties.

  3. Dynamic CT perfusion image data compression for efficient parallel processing.

    Science.gov (United States)

    Barros, Renan Sales; Olabarriaga, Silvia Delgado; Borst, Jordi; van Walderveen, Marianne A A; Posthuma, Jorrit S; Streekstra, Geert J; van Herk, Marcel; Majoie, Charles B L M; Marquering, Henk A

    2016-03-01

    The increasing size of medical imaging data, in particular time series such as CT perfusion (CTP), requires new and fast approaches to deliver timely results for acute care. Cloud architectures based on graphics processing units (GPUs) can provide the processing capacity required for delivering fast results. However, the size of CTP datasets makes transfers to cloud infrastructures time-consuming and therefore not suitable in acute situations. To reduce this transfer time, this work proposes a fast and lossless compression algorithm for CTP data. The algorithm exploits redundancies in the temporal dimension and keeps random read-only access to the image elements directly from the compressed data on the GPU. To the best of our knowledge, this is the first work to present a GPU-ready method for medical image compression with random access to the image elements from the compressed data.

  4. Category Specific Spatial Dissociations of Parallel Processes Underlying Visual Naming

    OpenAIRE

    Conner, Christopher R.; Chen, Gang; Pieters, Thomas A.; Tandon, Nitin

    2013-01-01

    The constituent elements and dynamics of the networks responsible for word production are a central issue to understanding human language. Of particular interest is their dependency on lexical category, particularly the possible segregation of nouns and verbs into separate processing streams. We applied a novel mixed-effects, multilevel analysis to electrocorticographic data collected from 19 patients (1942 electrodes) to examine the activity of broadly disseminated cortical networks during t...

  5. Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

    Science.gov (United States)

    Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

    2008-04-01

    This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
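
    The work farm pattern described above (a parallel set of workers fed by one input stream and drained into one output stream) can be sketched in software. The version below uses Python threads and queues as stand-ins for the MPPA's objects and self-synchronizing channels, which is an assumption made for illustration; `work_farm` is a hypothetical name and nothing here reflects the platform's actual programming interface.

```python
import queue
import threading


def work_farm(items, worker_fn, n_workers=4):
    """Distribute items from one input stream to n_workers identical workers
    and collect the results, in input order, from one output stream."""
    inbox, outbox = queue.Queue(), queue.Queue()

    def worker():
        while True:
            job = inbox.get()
            if job is None:          # sentinel: no more work for this worker
                break
            index, item = job
            outbox.put((index, worker_fn(item)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()

    count = 0
    for index, item in enumerate(items):
        inbox.put((index, item))
        count += 1
    for _ in threads:                # one sentinel per worker
        inbox.put(None)
    for thread in threads:
        thread.join()

    results = [None] * count
    while not outbox.empty():
        index, value = outbox.get()
        results[index] = value       # restore the original stream order
    return results
```

    A homogeneous farm corresponds to passing one `worker_fn`; a heterogeneous farm would need different functions per worker, which this sketch does not cover.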

  6. Fast parallel algorithm for three-dimensional distance-driven model in iterative computed tomography reconstruction

    International Nuclear Information System (INIS)

    Chen Jian-Lin; Li Lei; Wang Lin-Yuan; Cai Ai-Long; Xi Xiao-Qi; Zhang Han-Ming; Li Jian-Xin; Yan Bin

    2015-01-01

    The projection matrix model is used to describe the physical relationship between reconstructed object and projection. Such a model has a strong influence on projection and backprojection, two vital operations in iterative computed tomographic reconstruction. The distance-driven model (DDM) is a state-of-the-art technology that simulates forward and back projections. This model has a low computational complexity and a relatively high spatial resolution; however, it includes only a few methods in a parallel operation with a matched model scheme. This study introduces a fast and parallelizable algorithm to improve the traditional DDM for computing the parallel projection and backprojection operations. Our proposed model has been implemented on a GPU (graphic processing unit) platform and has achieved satisfactory computational efficiency with no approximation. The runtime for the projection and backprojection operations with our model is approximately 4.5 s and 10.5 s per loop, respectively, with an image size of 256×256×256 and 360 projections with a size of 512×512. We compare several general algorithms that have been proposed for maximizing GPU efficiency by using the unmatched projection/backprojection models in a parallel computation. The imaging resolution is not sacrificed and remains accurate during computed tomographic reconstruction. (paper)

  7. Nambu-Jona-Lasinio model in a parallel electromagnetic field

    Science.gov (United States)

    Wang, Lingxiao; Cao, Gaoqing; Huang, Xu-Guang; Zhuang, Pengfei

    2018-05-01

    We explore the features of the $U_A(1)$ and chiral symmetry breaking of the Nambu-Jona-Lasinio model without the Kobayashi-Maskawa-'t Hooft determinant term in the presence of a parallel electromagnetic field. We show that the electromagnetic chiral anomaly can induce both finite neutral pion condensate and isospin-singlet pseudo-scalar η condensate and thus modifies the chiral symmetry breaking pattern. In order to characterize the strength of the $U_A(1)$ symmetry breaking, we evaluate the susceptibility associated with the $U_A(1)$ charge. The result shows that the susceptibility contributed from the chiral anomaly is consistent with the behavior of the corresponding η condensate. The spectra of the mesonic excitations are also studied.

  8. Parallel imaging enhanced MR colonography using a phantom model.

    LENUS (Irish Health Repository)

    Morrin, Martina M

    2008-09-01

    To compare various Array Spatial and Sensitivity Encoding Technique (ASSET)-enhanced T2W SSFSE (single shot fast spin echo) and T1-weighted (T1W) 3D SPGR (spoiled gradient recalled echo) sequences for polyp detection and image quality at MR colonography (MRC) in a phantom model. Limitations of MRC using standard 3D SPGR T1W imaging include the long breath-hold required to cover the entire colon within one acquisition and the relatively low spatial resolution due to the long acquisition time. Parallel imaging using ASSET-enhanced T2W SSFSE and 3D T1W SPGR imaging results in much shorter imaging times, which allows for increased spatial resolution.

  9. Parallel Attack and the Enemy’s Decision Making Process

    Science.gov (United States)

    1998-04-01

    Theory Coursebook , Vol 2, Sept 1997, p 365 8 Joint Publication 3-0, Doctrine for Joint Operations, 1 Feb 1995, p III-11 9 Gorrell, LtCol Edgar S., “The...Behavioral Strategies,” Journal of the American Statistical Association, Vol 90, Issue 432, December, 1995, p1137 3 Joint Publication 5-0, Doctrine for...Strategies,” Journal of the American Statistical Association, Vol 90, Issue 432, December, 1995, p1137 Allison, Graham T., “Conceptual Models and the Cuban

  10. Ordering schemes for parallel processing of certain mesh problems

    International Nuclear Information System (INIS)

    O'Leary, D.

    1984-01-01

    In this work, some ordering schemes for mesh points are presented which enable algorithms such as the Gauss-Seidel or SOR iteration to be performed efficiently for the nine-point operator finite difference method on computers consisting of a two-dimensional grid of processors. Convergence results are presented for the discretization of $u_{xx} + u_{yy}$ on a uniform mesh over a square, showing that the spectral radius of the iteration for these orderings is no worse than that for the standard row by row ordering of mesh points. Further applications of these mesh point orderings to network problems, more general finite difference operators, and picture processing problems are noted.
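
    The abstract does not spell out the orderings it proposes. As an illustration of the general idea only, the sketch below builds the classic four-colour grouping for a nine-point stencil, in which points are grouped by the parity of their row and column indices so that no two points in a group are stencil neighbours and a Gauss-Seidel sweep can update a whole group at once. This is a standard textbook scheme, not necessarily one of the paper's orderings, and the function names are hypothetical.

```python
def four_color_groups(n_rows, n_cols):
    """Group mesh points (i, j) by (i % 2, j % 2).

    Points in the same group differ by an even offset in each direction,
    so none of them are nine-point-stencil neighbours of each other.
    """
    groups = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for i in range(n_rows):
        for j in range(n_cols):
            groups[(i % 2, j % 2)].append((i, j))
    return [groups[key] for key in sorted(groups)]


def is_independent(points):
    """Return True if no two points are nine-point-stencil neighbours."""
    point_set = set(points)
    for i, j in point_set:
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if (di, dj) != (0, 0) and (i + di, j + dj) in point_set:
                    return False
    return True
```

    Sweeping the four groups in turn, with all points of a group updated simultaneously, gives one Gauss-Seidel iteration in four parallel steps regardless of mesh size.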

  11. A Hybrid Parallel Execution Model for Logic Based Requirement Specifications (Invited Paper)

    Directory of Open Access Journals (Sweden)

    Jeffrey J. P. Tsai

    1999-05-01

    It is well known that undiscovered errors in a requirements specification are extremely expensive to fix when discovered in the software maintenance phase. Errors in the requirements phase can be reduced through the validation and verification of the requirements specification. Many logic-based requirements specification languages have been developed to achieve these goals. However, the execution and reasoning of a logic-based requirements specification can be very slow. An effective way to improve their performance is to execute and reason about the logic-based requirements specification in parallel. In this paper, we present a hybrid model to facilitate the parallel execution of a logic-based requirements specification language. A logic-based specification is first analyzed by a data dependency analysis technique which can find all the mode combinations that exist within a specification clause. This mode information is used to support a novel hybrid parallel execution model, which combines both top-down and bottom-up evaluation strategies. This new execution model can find the failure in the deepest node of the search tree at an early stage of the evaluation, and can thus reduce the total number of nodes searched in the tree, the total number of processes that need to be generated, and the total number of communication channels needed in the search process. A simulator has been implemented to analyze the execution behavior of the new model. Experiments show significant improvement based on several criteria.

  12. Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

    OpenAIRE

    Givelberg, Edward

    2014-01-01

    We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...

  13. Initial Assessment of Parallelization of Monte Carlo Calculation using Graphics Processing Units

    International Nuclear Information System (INIS)

    Choi, Sung Hoon; Joo, Han Gyu

    2009-01-01

    Monte Carlo (MC) simulation is an effective tool for calculating neutron transport in complex geometry. However, because Monte Carlo simulates each neutron's behavior one by one, it takes a very long computing time if enough neutrons are used for high precision of calculation. Accordingly, methods that reduce the computing time are required. In a Monte Carlo code, parallel calculation is well-suited since the behavior of each neutron is simulated independently. The parallelization of Monte Carlo codes, however, has so far been done using multiple CPUs. Driven by the global demand for high quality 3D graphics, the Graphics Processing Unit (GPU) has developed into a highly parallel, multi-core processor. This parallel processing capability of GPUs can be made available to engineering computing once a suitable interface is provided. Recently, NVIDIA introduced CUDA™, a general purpose parallel computing architecture. CUDA is a software environment that allows developers to manage the GPU using C/C++ or other languages. In this work, a GPU-based Monte Carlo code is developed and an initial assessment of its parallel performance is carried out.
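
    The independence of particle histories noted above is what makes the method easy to split across workers. The sketch below is a toy example under strong assumptions: a purely absorbing slab, one exponential free-path sample per history, and a Python thread pool standing in for GPU threads. It is not the code described in the record, and `run_batch` and `transmission_probability` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_batch(seed, n_histories, sigma_t, thickness):
    """Simulate n_histories independent neutrons with a private RNG stream.

    A neutron is counted as transmitted if its sampled free path exceeds
    the slab thickness (assumption: pure absorber, no scattering).
    """
    rng = random.Random(seed)
    transmitted = 0
    for _ in range(n_histories):
        if rng.expovariate(sigma_t) > thickness:
            transmitted += 1
    return transmitted


def transmission_probability(n_batches=4, n_histories=20000,
                             sigma_t=1.0, thickness=1.0):
    """Run the batches concurrently and combine their tallies."""
    with ThreadPoolExecutor(max_workers=n_batches) as pool:
        counts = list(pool.map(
            lambda seed: run_batch(seed, n_histories, sigma_t, thickness),
            range(n_batches),
        ))
    return sum(counts) / (n_batches * n_histories)
```

    With the default arguments the estimate should land close to the analytic value exp(-1) for this toy problem. Threads in CPython will not speed up this loop; the point is only the decomposition into independent batches with separate random streams and a final reduction of the tallies.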

  14. The role of parallelism in the real-time processing of anaphora.

    Science.gov (United States)

    Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P

    2012-06-01

    Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution.

  15. Teaching ethics to engineers: ethical decision making parallels the engineering design process.

    Science.gov (United States)

    Bero, Bridget; Kuhlman, Alana

    2011-09-01

    In order to fulfill ABET requirements, Northern Arizona University's Civil and Environmental engineering programs incorporate professional ethics in several of their engineering courses. This paper discusses an ethics module in a 3rd year engineering design course that focuses on the design process and technical writing. Engineering students early in their student careers generally possess good black/white critical thinking skills on technical issues. Engineering design is the first time students are exposed to "grey" or multiple possible solution technical problems. To identify and solve these problems, the engineering design process is used. Ethical problems are also "grey" problems and present similar challenges to students. Students need a practical tool for solving these ethical problems. The step-wise engineering design process was used as a model to demonstrate a similar process for ethical situations. The ethical decision making process of Martin and Schinzinger was adapted for parallelism to the design process and presented to students as a step-wise technique for identification of the pertinent ethical issues, relevant moral theories, possible outcomes and a final decision. Students had greatest difficulty identifying the broader, global issues presented in an ethical situation, but by the end of the module, were better able to not only identify the broader issues, but also to more comprehensively assess specific issues, generate solutions and a desired response to the issue.

  16. Adapting high-level language programs for parallel processing using data flow

    Science.gov (United States)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  17. Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

    Science.gov (United States)

    Doerner, Edgardo

    2016-05-01

    In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport of photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the development tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out on a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory, both in terms of scalability and parallelization efficiency.

  18. Parallel processes: using motivational interviewing as an implementation coaching strategy.

    Science.gov (United States)

    Hettema, Jennifer E; Ernst, Denise; Williams, Jessica Roberts; Miller, Kristin J

    2014-07-01

    In addition to its clinical efficacy as a communication style for strengthening motivation and commitment to change, motivational interviewing (MI) has been hypothesized to be a potential tool for facilitating evidence-based practice adoption decisions. This paper reports on the rationale and content of MI-based implementation coaching Webinars that, as part of a larger active dissemination strategy, were found to be more effective than passive dissemination strategies at promoting adoption decisions among behavioral health and health providers and administrators. The Motivational Interviewing Treatment Integrity scale (MITI 3.1.1) was used to rate coaching Webinars from 17 community behavioral health organizations and 17 community health centers. The MITI coding system was found to be applicable to the coaching Webinars, and raters achieved high levels of agreement on global and behavior count measurements of fidelity to MI. Results revealed that implementation coaches maintained fidelity to the MI model, exceeding competency benchmarks for almost all measures. Findings suggest that it is feasible to implement MI as a coaching tool.

  19. Algorithm comparison and benchmarking using a parallel spectral transform shallow water model

    Energy Technology Data Exchange (ETDEWEB)

    Worley, P.H. [Oak Ridge National Lab., TN (United States)]; Foster, I.T.; Toonen, B. [Argonne National Lab., IL (United States)]

    1995-04-01

    In recent years, a number of computer vendors have produced supercomputers based on a massively parallel processing (MPP) architecture. These computers have been shown to be competitive in performance with conventional vector supercomputers for some applications. As spectral weather and climate models are heavy users of vector supercomputers, it is interesting to determine how these models perform on MPPs, and which MPPs are best suited to the execution of spectral models. The benchmarking of MPPs is complicated by the fact that different algorithms may be more efficient on different architectures. Hence, a comprehensive benchmarking effort must answer two related questions: which algorithm is most efficient on each computer, and how do the most efficient algorithms compare on different computers? In general, these are difficult questions to answer because of the high cost associated with implementing and evaluating a range of different parallel algorithms on each MPP platform.

  20. A Parallel and Distributed Surrogate Model Implementation for Computational Steering

    KAUST Repository

    Butnaru, Daniel; Buse, Gerrit; Pfluger, Dirk

    2012-01-01

    of the input parameters. Such an exploration process is however not possible if the simulation is computationally too expensive. For these cases we present in this paper a scalable computational steering approach utilizing a fast surrogate model as substitute

  1. Parallel-hierarchical processing and classification of laser beam profile images based on the GPU-oriented architecture

    Science.gov (United States)

    Yarovyi, Andrii A.; Timchenko, Leonid I.; Kozhemiako, Volodymyr P.; Kokriatskaia, Nataliya I.; Hamdi, Rami R.; Savchuk, Tamara O.; Kulyk, Oleksandr O.; Surtel, Wojciech; Amirgaliyev, Yedilkhan; Kashaganova, Gulzhan

    2017-08-01

    The paper deals with the problem of insufficient productivity of existing computing tools for large image processing, which do not meet the modern requirements posed by the resource-intensive computing tasks of laser beam profiling. The research concentrated on one of the profiling problems, namely, real-time processing of spot images of the laser beam profile. Development of a theory of parallel-hierarchical transformation made it possible to produce models for high-performance parallel-hierarchical processes, as well as algorithms and software for their implementation based on the GPU-oriented architecture using GPGPU technologies. The analyzed performance of the suggested computerized tools for processing and classification of laser beam profile images allows real-time processing of dynamic images of various sizes.

  2. Decreasing Data Analytics Time: Hybrid Architecture MapReduce-Massive Parallel Processing for a Smart Grid

    Directory of Open Access Journals (Sweden)

    Abdeslam Mehenni

    2017-03-01

    As our populations grow in a world of limited resources, enterprises seek ways to lighten our load on the planet. The idea of modifying consumer behavior appears as a foundation for smart grids. Enterprises demonstrate the value available from deep analysis of electricity consumption histories, consumers' messages, outage alerts, etc. Enterprises mine massive structured and unstructured data. In a nutshell, smart grids result in a flood of data that needs to be analyzed to better adjust to demand and give customers more ability to delve into their power consumption. Simply put, smart grids will increasingly have a flexible data warehouse attached to them. The key driver for the adoption of data management strategies is clearly the need to handle and analyze the large amounts of information utilities are now faced with. New approaches to data integration are gaining momentum; Hadoop is in fact now being used by utilities to help manage the huge growth in data whilst maintaining coherence of the data warehouse. In this paper we define a new Meter Data Management System (MDMS) architecture repository that differs from three leading MDMSs, in that we use the MapReduce programming model for ETL and a parallel DBMS for query statements (Massive Parallel Processing, MPP).

  3. Three-dimensional parallel edge-based finite element modeling of electromagnetic data with field redatuming

    DEFF Research Database (Denmark)

    Cai, Hongzhu; Čuma, Martin; Zhdanov, Michael

    2015-01-01

    This paper presents a parallelized version of the edge-based finite element method with a novel post-processing approach for numerical modeling of an electromagnetic field in complex media. The method uses an unstructured tetrahedral mesh which can reduce the number of degrees of freedom significantly. The linear system of finite element equations is solved using parallel direct solvers which are robust for ill-conditioned systems and efficient for multiple source electromagnetic (EM) modeling. We also introduce a novel approach to compute the scalar components of the electric field from the tangential components along each edge based on field redatuming. The method can produce a more accurate result as compared to the conventional approach. We have applied the developed algorithm to compute the EM response for a typical 3D anisotropic geoelectrical model of the off-shore HC reservoir with complex...

  4. Research on Multi-Person Parallel Modeling Method Based on Integrated Model Persistent Storage

    Science.gov (United States)

    Qu, MingCheng; Wu, XiangHu; Tao, YongChao; Liu, Ying

    2018-03-01

    This paper mainly studies a multi-person parallel modeling method based on integrated model persistent storage. The integrated model refers to a set of MDDT modeling graphics systems, which can carry out multi-angle, multi-level and multi-stage description of general aerospace embedded software. Persistent storage refers to converting the data model in memory into a storage model and converting the storage model back into a data model in memory, where the data model is the object model and the storage model is a binary stream. Multi-person parallel modeling refers to the need for multi-person collaboration, separation of roles, and even real-time remote synchronized modeling.
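
    The round trip between an in-memory object model and a binary stream described above can be sketched as follows. This is only an illustration, not the paper's MDDT format: the `ModelElement` class, the function names, and the choice of UTF-8 encoded JSON as the byte layout are all assumptions made for the example.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ModelElement:
    """Hypothetical in-memory model node (not taken from the paper)."""
    name: str
    children: list = field(default_factory=list)


def to_storage(model: ModelElement) -> bytes:
    """Convert the in-memory object model into a byte stream."""
    return json.dumps(asdict(model)).encode("utf-8")


def from_storage(stream: bytes) -> ModelElement:
    """Rebuild the in-memory object model from the byte stream."""
    def build(node: dict) -> ModelElement:
        return ModelElement(node["name"], [build(c) for c in node["children"]])
    return build(json.loads(stream.decode("utf-8")))


# Round trip: object model -> storage model -> object model.
root = ModelElement("system", [ModelElement("task_a"), ModelElement("task_b")])
stream = to_storage(root)
restored = from_storage(stream)
```

    In a multi-person setting, the stored stream would be the unit that collaborators exchange or synchronize; how conflicts are merged is outside the scope of this sketch.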

  5. Parallelization of simulation code for liquid-gas model of lattice-gas fluid

    International Nuclear Information System (INIS)

    Kawai, Wataru; Ebihara, Kenichi; Kume, Etsuo; Watanabe, Tadashi

    2000-03-01

    A simulation code for hydrodynamical phenomena which is based on the liquid-gas model of lattice-gas fluid is parallelized using the MPI (Message Passing Interface) library. The parallelized code can be applied to larger simulations than the non-parallelized code. The calculation times of the parallelized code on VPP500 (Vector-Parallel super computer with dispersed memory units), AP3000 (Scalar-parallel server with dispersed memory units), and a workstation cluster decreased in inverse proportion to the number of processors. (author)

  6. Parallel Development of Products and New Business Models

    DEFF Research Database (Denmark)

    Lund, Morten; Hansen, Poul H. Kyvsgård

    2014-01-01

    The perception of product development and the practical execution of product development in professional organizations have undergone dramatic changes in recent years. Many of these changes relate to the introduction of broader and more cross-disciplinary views that involve new organizational functions and new concepts. These changes can be captured in various generations of practice. This paper will discuss the recent development of 3rd generation product development process models and the emergence of a 4th generation. While the 3rd generation models included the concept of innovation and innovation management, the 4th generation models are increasingly including the concept of business models and business model innovation.

  7. Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing. Ph.D. Thesis

    Science.gov (United States)

    Liu, Kuojuey Ray

    1990-01-01

    Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.
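
    For readers unfamiliar with the QR algorithm that the spectral decomposition work builds on, a minimal serial sketch is given below. It is the plain unshifted QR iteration with classical Gram-Schmidt factorization, written for small dense matrices; it is not the dissertation's systolic or multi-phase design, and the function names and the 2x2 test matrix are assumptions chosen for illustration.

```python
import math


def qr_decompose(a):
    """Classical Gram-Schmidt QR of a nonsingular square matrix (list of rows)."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]  # columns of A
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j, v in enumerate(cols):
        w = v[:]
        for k, q in enumerate(q_cols):
            # Projection of column j onto the k-th orthonormal vector.
            r[k][j] = sum(q[i] * v[i] for i in range(n))
            w = [w[i] - r[k][j] * q[i] for i in range(n)]
        r[j][j] = math.sqrt(sum(x * x for x in w))
        q_cols.append([x / r[j][j] for x in w])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def matmul(x, y):
    """Product of two square matrices stored as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def qr_eigenvalues(a, iterations=100):
    """Unshifted QR iteration: factor A = QR, then replace A by RQ."""
    for _ in range(iterations):
        q, r = qr_decompose(a)
        a = matmul(r, q)
    return [a[i][i] for i in range(len(a))]


# Symmetric example whose exact eigenvalues are 3 and 1.
eigs = qr_eigenvalues([[2.0, 1.0], [1.0, 2.0]])
```

    Each iteration is a similarity transform, so the eigenvalues are preserved while the off-diagonal entries of this symmetric example decay. Practical implementations add shifts and use Householder or Givens rotations instead of Gram-Schmidt.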

  8. Visual analysis of inter-process communication for large-scale parallel computing.

    Science.gov (United States)

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.

  9. Implementation of parallel processing in the basf2 framework for Belle II

    International Nuclear Information System (INIS)

    Itoh, Ryosuke; Lee, Soohyung; Katayama, N; Mineo, S; Moll, A; Kuhr, T; Heck, M

    2012-01-01

    Recent PC servers are equipped with multi-core CPUs, and it is desirable to utilize their full processing power for data analysis in large-scale HEP experiments. A software framework, basf2, is being developed for use in the Belle II experiment, a new generation B-factory experiment at KEK, and parallel event processing that utilizes multi-core CPUs is part of its design for use in massive data production. The details of the implementation of event parallel processing in the basf2 framework are discussed, along with a report of a preliminary performance study under realistic use on a 32-core PC server.

  10. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Ma, Kwan-Liu [Univ. of California, Davis, CA (United States)

    2017-02-01

    Most of today’s visualization libraries and applications are based on what is known as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as “filtering” components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits the pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composed in much the same way as filters in the visualization pipeline. However, the design of functors allows them to run concurrently on massive numbers of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for

  11. A Hybrid FPGA/Coarse Parallel Processing Architecture for Multi-modal Visual Feature Descriptors

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Alonso, Javier Díaz

    2008-01-01

    This paper describes the hybrid architecture developed for speeding up the processing of so-called multi-modal visual primitives which are sparse image descriptors extracted along contours. In the system, the first stages of visual processing are implemented on FPGAs due to their highly parallel...

  12. Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

    DEFF Research Database (Denmark)

    Srba, Jirí

    2002-01-01

    We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong...

  13. Parallels between a Collaborative Research Process and the Middle Level Philosophy

    Science.gov (United States)

    Dever, Robin; Ross, Diane; Miller, Jennifer; White, Paula; Jones, Karen

    2014-01-01

    The characteristics of the middle level philosophy as described in This We Believe closely parallel the collaborative research process. The journey of one research team is described in relationship to these characteristics. The collaborative process includes strengths such as professional relationships, professional development, courageous…

  14. Solution-processed parallel tandem polymer solar cells using silver nanowires as intermediate electrode.

    Science.gov (United States)

    Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J

    2014-12-23

    Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the big challenge in establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as intermediate charge collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.

  15. Semantic Business Process Modeling

    OpenAIRE

    Markovic, Ivan

    2010-01-01

    This book presents a process-oriented business modeling framework based on semantic technologies. The framework consists of modeling languages, methods, and tools that allow for semantic modeling of business motivation, business policies and rules, and business processes. Quality of the proposed modeling framework is evaluated based on the modeling content of SAP Solution Composer and several real-world business scenarios.

  16. Managing internode data communications for an uninitialized process in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-05-20

    A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.

  17. Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

    International Nuclear Information System (INIS)

    Hicks, D.L.

    1983-11-01

    In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references

  18. Processing communications events in parallel active messaging interface by awakening thread from wait state

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-22

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
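
    The wait-and-awaken behavior of the advance function described above can be approximated with a condition variable. The sketch below is a loose analogy and does not use or reproduce the PAMI API: the `Context` class, its `post` and `advance` methods, and the event string are hypothetical names introduced for the example.

```python
import threading
from collections import deque


class Context:
    """Hypothetical stand-in for a communications context (not the PAMI API)."""

    def __init__(self):
        self.events = deque()
        self.cond = threading.Condition()
        self.processed = []

    def post(self, event):
        """Record a new communications event and wake a waiting thread."""
        with self.cond:
            self.events.append(event)
            self.cond.notify()

    def advance(self):
        """Process one event, sleeping in a wait state while none is pending."""
        with self.cond:
            while not self.events:      # no actionable events pending
                self.cond.wait()        # thread enters the wait state
            event = self.events.popleft()
        self.processed.append(event)    # handle the event outside the lock
        return event


ctx = Context()
worker = threading.Thread(target=ctx.advance)
worker.start()
ctx.post("message-arrived")             # awakens the waiting advance call
worker.join(timeout=5)
```

    The `while` loop around `wait()` makes the sketch correct whether the event is posted before or after the worker starts waiting.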

  19. A queueing network model to analyze the impact of parallelization of care on patient cycle time.

    Science.gov (United States)

    Jiang, Lixiang; Giachetti, Ronald E

    2008-09-01

    The total time a patient spends in an outpatient facility, called the patient cycle time, is a major contributor to overall patient satisfaction. A frequently recommended strategy to reduce the total time is to perform some activities in parallel, thereby shortening patient cycle time. To analyze patient cycle time, this paper extends and improves upon the existing multi-class open queueing network (MOQN) model so that the patient flow in an urgent care center can be modeled. Results of the model are analyzed using data from an urgent care center contemplating greater parallelization of patient care activities. The results indicate that parallelization can reduce the cycle time for those patient classes which require more than one diagnostic and/or treatment intervention. However, for many patient classes there would be little if any improvement, indicating the importance of tools to analyze business process reengineering rules. The paper makes contributions by implementing an approximation for fork/join queues in the network and by improving the approximation for multiple server queues in both low traffic and high traffic conditions. We demonstrate the accuracy of the MOQN results through comparisons to simulation results.
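
    Why only patient classes with more than one intervention benefit from parallelization can be seen from a deliberately simplified calculation. The sketch below ignores all queueing delays, so it is not the MOQN model or its fork/join approximation; the function name and the minute values are made-up assumptions.

```python
def cycle_time(fixed_minutes, interventions, parallel):
    """Fixed steps plus interventions: summed if serial, longest one if parallel.

    Queueing delays are ignored on purpose; this is not the MOQN model.
    """
    if not interventions:
        return fixed_minutes
    variable = max(interventions) if parallel else sum(interventions)
    return fixed_minutes + variable


# Hypothetical patient classes: 30 minutes of fixed steps (registration, exam).
one_test = [20.0]            # a single diagnostic intervention
two_tests = [20.0, 15.0]     # two interventions that could overlap

saving_one = cycle_time(30.0, one_test, False) - cycle_time(30.0, one_test, True)
saving_two = cycle_time(30.0, two_tests, False) - cycle_time(30.0, two_tests, True)
```

    Under these assumptions the single-intervention class saves nothing, while the two-intervention class saves the duration of the shorter intervention. A real analysis must also account for waiting at shared resources, which is what the queueing network adds.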

  20. Modeling multiphase materials processes

    CERN Document Server

    Iguchi, Manabu

    2010-01-01

    "Modeling Multiphase Materials Processes: Gas-Liquid Systems" describes the methodology and application of physical and mathematical modeling to multi-phase flow phenomena in materials processing. The book focuses on systems involving gas-liquid interaction, the most prevalent in current metallurgical processes. The performance characteristics of these processes are largely dependent on transport phenomena. This volume covers the inherent characteristics that complicate the modeling of transport phenomena in such systems, including complex multiphase structure, intense turbulence, opacity of

  1. Dynamics modeling for parallel haptic interfaces with force sensing and control.

    Science.gov (United States)

    Bernstein, Nicholas; Lawrence, Dale; Pao, Lucy

    2013-01-01

    Closed-loop force control can be used on haptic interfaces (HIs) to mitigate the effects of mechanism dynamics. A single multidimensional force-torque sensor is often employed to measure the interaction force between the haptic device and the user's hand. The parallel haptic interface at the University of Colorado (CU) instead employs smaller 1D force sensors oriented along each of the five actuating rods to build up a 5D force vector. This paper shows that a particular manipulandum/hand partition in the system dynamics is induced by the placement and type of force sensing, and discusses the implications on force and impedance control for parallel haptic interfaces. The details of a "squaring down" process are also discussed, showing how to obtain reduced degree-of-freedom models from the general six degree-of-freedom dynamics formulation.

  2. The design of multi-core DSP parallel model based on message passing and multi-level pipeline

    Science.gov (United States)

    Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

    2017-10-01

    Currently, the design of embedded signal processing systems is often based on a specific application, but this approach is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on a multi-core DSP platform is designed, and it is mainly suitable for complex algorithms which are composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and draws on the advantages of the mainstream multi-core DSP models (the Master-Slave model and the Data Flow model), so that it has better performance. This paper uses a three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing it with the Master-Slave and Data Flow models.
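
    The combination of pipeline parallelism and message passing can be sketched with threads standing in for DSP cores and FIFO queues standing in for inter-core messages. This is a generic illustration under those assumptions, not the paper's architecture; the stage functions and the sentinel convention are invented for the example.

```python
import queue
import threading

SENTINEL = None  # assumed end-of-stream marker passed down the pipeline


def stage(func, inbox, outbox):
    """Run one pipeline stage: receive a message, process it, send the result."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)   # forward shutdown to the next stage
            return
        outbox.put(func(item))


q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(lambda x: x * 2, q0, q1)),  # module 1
    threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2)),  # module 2
]
for t in threads:
    t.start()

# Feed five data blocks; stages overlap because each runs in its own thread.
for block in range(5):
    q0.put(block)
q0.put(SENTINEL)

results = []
while True:
    item = q2.get()
    if item is SENTINEL:
        break
    results.append(item)

for t in threads:
    t.join()
```

    Because each stage is a single consumer of a FIFO queue, output order matches input order. On real multi-core DSPs the queues would be replaced by the platform's own inter-core messaging mechanism.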

  3. Recent development for the ITS code system: Parallel processing and visualization

    International Nuclear Information System (INIS)

    Fan, W.C.; Turner, C.D.; Halbleib, J.A. Sr.; Kensek, R.P.

    1996-01-01

    A brief overview is given for two software developments related to the ITS code system. These developments provide parallel processing and visualization capabilities and thus allow users to perform ITS calculations more efficiently. Timing results and a graphical example are presented to demonstrate these capabilities

  4. Real-time SHVC software decoding with multi-threaded parallel processing

    Science.gov (United States)

    Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

    2014-09-01

    This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two-layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high-level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipelined with multiple threads based on groups of coding tree units (CTUs). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7-2600 processor running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads is compared in terms of decoding speed and resource usage, including processor and memory.

  5. High Performance Parallel Processing Project: Industrial computing initiative. Progress reports for fiscal year 1995

    Energy Technology Data Exchange (ETDEWEB)

    Koniges, A.

    1996-02-09

    This project is a package of 11 individual CRADAs plus hardware. This innovative project established a three-year multi-party collaboration that is significantly accelerating the availability of commercial massively parallel processing computing software technology to U.S. government, academic, and industrial end-users. This report contains individual presentations from nine principal investigators along with overall program information.

  6. Exact stationary state for an asymmetric exclusion process with fully parallel dynamics

    NARCIS (Netherlands)

    Gier, J.C.; Nienhuis, B.

    The exact stationary state of an asymmetric exclusion process with fully parallel dynamics is obtained using the matrix product ansatz. We give a simple derivation for the deterministic case by a physical interpretation of the dimension of the matrices. We prove the stationarity via a cancellation

  7. Facilitating arrhythmia simulation: the method of quantitative cellular automata modeling and parallel running

    Directory of Open Access Journals (Sweden)

    Mondry Adrian

    2004-08-01

    Background: Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods: We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results: We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions: Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differential equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods

  8. Process modeling style

    CERN Document Server

    Long, John

    2014-01-01

    Process Modeling Style focuses on other aspects of process modeling beyond notation that are very important to practitioners. Many people who model processes focus on the specific notation used to create their drawings. While that is important, there are many other aspects to modeling, such as naming, creating identifiers, descriptions, interfaces, patterns, and creating useful process documentation. Experienced author John Long focuses on those non-notational aspects of modeling, which practitioners will find invaluable. Gives solid advice for creating roles, work produ

  9. Availability modeling and optimization of dynamic multi-state series–parallel systems with random reconfiguration

    International Nuclear Information System (INIS)

    Li, Y.F.; Peng, R.

    2014-01-01

    Most studies on multi-state series–parallel systems focus on static system architectures. However, a static architecture is insufficient to model many complex industrial systems that have several operation phases, each of which requires a subset of the subsystems to be combined to perform certain tasks. To bridge this gap, this study takes this type of dynamic behavior into account in the multi-state series–parallel system and proposes an analytical approach to calculate the system availability and the operation cost. In this approach, a Markov process is used to model the dynamics of system phase changes and component state changes, a Markov reward model is used to calculate the operation cost associated with the dynamics, and the universal generating function (UGF) is used to build the system availability function from the system phase model and the component models. Based upon these models, an optimization problem is formulated to minimize the total system cost under the constraint that system availability is greater than a desired level. A genetic algorithm is then applied to solve the optimization problem. The proposed modeling and solution procedures are illustrated on a system design problem modified from a real-world maritime oil transportation system.
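    The UGF step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a capacity-type system (parallel components add their performance, series components are limited by the weakest one) and uses hypothetical component data; it covers only a static availability calculation, not the Markov phase model, the reward model, or the genetic algorithm of the paper.

```python
from collections import defaultdict
from itertools import product


def compose(u1, u2, op):
    """Combine two UGFs (dicts mapping performance -> probability) with operator op."""
    out = defaultdict(float)
    for (g1, p1), (g2, p2) in product(u1.items(), u2.items()):
        out[op(g1, g2)] += p1 * p2  # independent components: probabilities multiply
    return dict(out)


def parallel(u1, u2):
    # Assumption: capacities of parallel components add up.
    return compose(u1, u2, lambda a, b: a + b)


def series(u1, u2):
    # Assumption: a series connection is limited by its weakest element.
    return compose(u1, u2, min)


def availability(u, demand):
    """Probability that system performance meets or exceeds the demand."""
    return sum(p for g, p in u.items() if g >= demand)


# Hypothetical data: two identical pumps in parallel, followed by one pipeline.
pump = {0: 0.1, 50: 0.9}
pipe = {0: 0.05, 80: 0.95}
system = series(parallel(pump, pump), pipe)

print(availability(system, 50))
print(availability(system, 80))
```

    In a phased system of the kind the abstract describes, one such availability function would be built per operation phase and weighted by the phase probabilities; that weighting is not shown here.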

  10. A Model of Parallel Kinematics for Machine Calibration

    DEFF Research Database (Denmark)

    Pedersen, David Bue; Bæk Nielsen, Morten; Kløve Christensen, Simon

    2016-01-01

    Parallel kinematics have been adopted by more than 25 manufacturers of high-end desktop 3D printers [Wohlers Report (2015), p.118] as well as by research projects such as the WASP project [WASP (2015)], a 12 meter tall linear delta robot for Additive Manufacture of large-scale components for cons...

  11. Parallel Development of Products and New Business Models

    OpenAIRE

    Lund, Morten; Hansen, Poul H. Kyvsgård

    2014-01-01

    The perception of product development and the practical execution of product development in professional organizations have undergone dramatic changes in recent years. Many of these changes relate to the introduction of broader and more cross-disciplinary views that involve new organizational functions and new concepts. These changes can be captured in various generations of practice. This paper will discuss the recent development of 3rd generation product development process models and the emer...

  12. Design of parallel intersector weld/cut robot for machining processes in ITER vacuum vessel

    International Nuclear Information System (INIS)

    Wu Huapeng; Handroos, Heikki; Kovanen, Janne; Rouvinen, Asko; Hannukainen, Petri; Saira, Tanja; Jones, Lawrence

    2003-01-01

    This paper presents a new parallel robot, Penta-WH, which has five degrees of freedom driven by hydraulic cylinders. The manipulator has a large, singularity-free workspace and high stiffness, and it acts as a transport device for welding, machining and inspection end-effectors inside the ITER vacuum vessel. The presented kinematic structure of a parallel robot is particularly suitable for the ITER environment. An analysis of the machining process for ITER, including the machining methods and forces, is given, and kinematic analyses, such as workspace and force capacity, are discussed.

  13. Product and Process Modelling

    DEFF Research Database (Denmark)

    Cameron, Ian T.; Gani, Rafiqul

    This book covers the area of product and process modelling via a case study approach. It addresses a wide range of modelling applications with emphasis on modelling methodology and the subsequent in-depth analysis of mathematical models to gain insight via structural aspects of the models. These approaches are put into the context of life cycle modelling, where multiscale and multiform modelling is increasingly prevalent in the 21st century. The book commences with a discussion of modern product and process modelling theory and practice followed by a series of case studies drawn from a variety... to biotechnology applications, food, polymer and human health application areas. The book highlights the important nature of modern product and process modelling in the decision making processes across the life cycle. As such it provides an important resource for students, researchers and industrial practitioners.

  14. Implementation of an Agent-Based Parallel Tissue Modelling Framework for the Intel MIC Architecture

    Directory of Open Access Journals (Sweden)

    Maciej Cytowski

    2017-01-01

    Timothy is a novel large-scale modelling framework that allows simulation of biological processes involving different cellular colonies growing and interacting with a variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC) systems. The high parallel scalability of the implementation allows for simulations of up to 10^9 individual cells (i.e., simulations at tissue spatial scales of up to 1 cm^3 in size). With the recent advancements of the Timothy model, it has become critical to ensure appropriate performance level on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increased the computational complexity of the model. In this paper, we describe the process of modernization of the application in order to achieve high computational performance on HPC hybrid systems based on modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.

  15. Distributed system for parallel data processing of ECT signals for electromagnetic flaw detection in materials

    International Nuclear Information System (INIS)

    Guliashki, Vassil; Marinova, Galia

    2002-01-01

    The paper proposes a distributed system for parallel data processing of ECT signals for flaw detection in materials. The measured data are stored in files on a host computer, where a Java server is located. The host computer is connected through the Internet to a set of geographically distributed client computers. The Java server distributes the data from the host computer to the client computers according to their requests. The software necessary for the data processing is installed on each client computer in advance. Organizing the data processing on many computers working simultaneously in parallel greatly reduces the processing time, especially when a huge amount of data must be processed in a very short time. (Author)

  16. Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

    Science.gov (United States)

    Mack, Harold; Reddi, S. S.

    1980-04-01

    Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real time image processing requirements of auto-cueing target data processing. The prototype bread-board currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV formats FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations on full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors, operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.

  17. Performance of MPI parallel processing implemented by MCNP5/ MCNPX for criticality benchmark problems

    International Nuclear Information System (INIS)

    Mark Dennis Usang; Mohd Hairie Rabir; Mohd Amin Sharifuldin Salleh; Mohamad Puad Abu

    2012-01-01

    MPI parallelism is implemented on a SUN Workstation for running MCNPX and on the High Performance Computing Facility (HPC) for running MCNP5. Twenty-three input files obtained from the MCNP Criticality Validation Suite are used to evaluate the speedup achievable with the parallel capabilities of MPI. More importantly, we study the economics of using more processors and the types of problem where the performance gains are obvious. This is important to enable better resource-sharing practices, especially for the HPC facility's processing time. Future endeavours in this direction might even reveal clues for best MCNP5/MCNPX coding practices for optimum performance of MPI parallelism. (author)

  18. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    Science.gov (United States)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU

  19. Standard Model processes

    CERN Document Server

    Mangano, M.L.; Aguilar-Saavedra, Juan Antonio; Alekhin, S.; Badger, S.; Bauer, C.W.; Becher, T.; Bertone, V.; Bonvini, M.; Boselli, S.; Bothmann, E.; Boughezal, R.; Cacciari, M.; Carloni Calame, C.M.; Caola, F.; Campbell, J.M.; Carrazza, S.; Chiesa, M.; Cieri, L.; Cimaglia, F.; Febres Cordero, F.; Ferrarese, P.; D'Enterria, D.; Ferrera, G.; Garcia i Tormo, X.; Garzelli, M.V.; Germann, E.; Hirschi, V.; Han, T.; Ita, H.; Jäger, B.; Kallweit, S.; Karlberg, A.; Kuttimalai, S.; Krauss, F.; Larkoski, A.J.; Lindert, J.; Luisoni, G.; Maierhöfer, P.; Mattelaer, O.; Martinez, H.; Moch, S.; Montagna, G.; Moretti, M.; Nason, P.; Nicrosini, O.; Oleari, C.; Pagani, D.; Papaefstathiou, A.; Petriello, F.; Piccinini, F.; Pierini, M.; Pierog, T.; Pozzorini, S.; Re, E.; Robens, T.; Rojo, J.; Ruiz, R.; Sakurai, K.; Salam, G.P.; Salfelder, L.; Schönherr, M.; Schulze, M.; Schumann, S.; Selvaggi, M.; Shivaji, A.; Siodmok, A.; Skands, P.; Torrielli, P.; Tramontano, F.; Tsinikos, I.; Tweedie, B.; Vicini, A.; Westhoff, S.; Zaro, M.; Zeppenfeld, D.; CERN. Geneva. ATS Department

    2017-06-22

    This report summarises the properties of Standard Model processes at the 100 TeV pp collider. We document the production rates and typical distributions for a number of benchmark Standard Model processes, and discuss new dynamical phenomena arising at the highest energies available at this collider. We discuss the intrinsic physics interest in the measurement of these Standard Model processes, as well as their role as backgrounds for New Physics searches.

  20. Load balancing in highly parallel processing of Monte Carlo code for particle transport

    International Nuclear Information System (INIS)

    Higuchi, Kenji; Takemiya, Hiroshi; Kawasaki, Takuji

    1998-01-01

    In parallel processing of Monte Carlo (MC) codes for neutron, photon and electron transport problems, particle histories are assigned to processors, making use of the independence of the calculation for each particle. Although the main part of an MC code can easily be parallelized by this method, optimizing the code's load balancing, which is necessary to attain a high speedup ratio in highly parallel processing, is difficult in practice. In fact, the speedup ratio with 128 processors remains at nearly one hundred when using the test bed for the performance evaluation. Through the parallel processing of the MCNP code, which is widely used in the nuclear field, it is shown that it is difficult to attain high performance by static load balancing, especially in neutron transport problems, and that a load balancing method which dynamically changes the number of assigned particles, minimizing the sum of the computational and communication costs, overcomes the difficulty, resulting in a reduction of nearly fifteen percent in execution time. (author)
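    The speedup figure quoted above can be turned into a parallel efficiency, and the effect of load imbalance can be illustrated with a toy comparison. The sketch below is only an illustration under stated assumptions: the history costs are hypothetical, and the greedy "give the next history to the least-loaded worker" rule is a generic stand-in, not the cost-minimizing method of the paper.

```python
import heapq


def parallel_efficiency(speedup, n_procs):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / n_procs


def static_makespan(costs, n_workers):
    """Wall time when histories are split into equal-sized contiguous chunks."""
    chunk = -(-len(costs) // n_workers)  # ceiling division
    return max(sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk))


def dynamic_makespan(costs, n_workers):
    """Wall time when each history goes to the currently least-loaded worker."""
    loads = [0.0] * n_workers
    heapq.heapify(loads)
    for cost in costs:
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + cost)
    return max(loads)


# Figures from the abstract: speedup of about 100 on 128 processors.
print(parallel_efficiency(100, 128))

# Hypothetical per-history costs: expensive histories happen to come first.
costs = [9, 9, 9, 9, 1, 1, 1, 1]
print(static_makespan(costs, 2), dynamic_makespan(costs, 2))
```

    With these made-up costs the static split leaves one worker with almost all of the work, while the dynamic rule balances the two workers; real transport problems add communication cost, which this toy ignores.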

  1. Using parallel computing in modeling and optimization of mineral ...

    African Journals Online (AJOL)

    To solve the ultimate pit limit problem, one must then find a subgraph of a graph whose sum of weights is maximal. One possible solution to this problem is to use genetic algorithms. We use a ... Details of the implementation of a parallel genetic algorithm for searching open pit limits are provided. Comparison with ...

  2. Calibrationless Parallel Magnetic Resonance Imaging: A Joint Sparsity Model

    Directory of Open Access Journals (Sweden)

    Angshul Majumdar

    2013-12-01

    State-of-the-art parallel MRI techniques either explicitly or implicitly require certain parameters to be estimated, e.g., the sensitivity map for SENSE and SMASH, and interpolation weights for GRAPPA and SPIRiT. Thus all these techniques are sensitive to the calibration (parameter estimation) stage. In this work, we propose a parallel MRI technique that does not require any calibration but yields reconstruction results that are on par with (or even better than) state-of-the-art methods in parallel MRI. Our proposed method requires solving non-convex analysis and synthesis prior joint-sparsity problems. This work also derives the algorithms for solving them. Experimental validation was carried out on two datasets: eight-channel brain and eight-channel Shepp-Logan phantom. Two sampling methods were used: Variable Density Random sampling and non-Cartesian Radial sampling. For the brain data, an acceleration factor of 4 was used, and for the phantom an acceleration factor of 6 was used. The reconstruction results were quantitatively evaluated based on the Normalised Mean Squared Error between the reconstructed image and the originals. The qualitative evaluation was based on the actual reconstructed images. We compared our work with four state-of-the-art parallel imaging techniques: two calibrated methods (CS SENSE and l1SPIRiT) and two calibration-free techniques (Distributed CS and SAKE). Our method yields better reconstruction results than all of them.

  3. Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

    DEFF Research Database (Denmark)

    Srba, Jirí

    2002-01-01

    We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong regularity (finiteness) checking of BPP. This implies that strong regularity of BPP is also PSPACE-hard.

  4. Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

    Science.gov (United States)

    Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

    2017-08-01

    We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GNU GPLv3. No. of lines in distributed program, including test data, etc.: 913552. No. of bytes in distributed program, including test data, etc.: 270876249. Distribution format: tar.gz. Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on user's parameters, typically between several gigabytes and several tens of gigabytes. Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy. Solution method: Utilization of GPU for massively parallel data processing. Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data). Running time: 1.8 s for one B-scan (150 × faster in comparison to the CPU

  5. A web-based, collaborative modeling, simulation, and parallel computing environment for electromechanical systems

    Directory of Open Access Journals (Sweden)

    Xiaoliang Yin

    2015-03-01

    A complex electromechanical system is usually composed of multiple components from different domains, including mechanical, electronic, hydraulic, control, and so on. Modeling and simulation of electromechanical systems on a unified platform is currently one of the research hotspots in system engineering. It is also the development trend in the design of complex electromechanical systems. The unified modeling techniques and tools based on the Modelica language provide a satisfactory solution. To meet the requirements of collaborative modeling, simulation, and parallel computing for complex electromechanical systems based on Modelica, a general web-based modeling and simulation prototype environment, namely WebMWorks, is designed and implemented. Based on rich Internet application technologies, an interactive graphical user interface for modeling and post-processing in the web browser was implemented; with the collaborative design module, the environment supports top-down, concurrent modeling and team cooperation; additionally, a service-oriented architecture was applied to supply compiling and solving services which run on cloud-like servers, so the environment can manage and dispatch large-scale simulation tasks in parallel on multiple computing servers simultaneously. An engineering application involving a pure electric vehicle is tested on WebMWorks. The results of simulation and parametric experiment demonstrate that the tested web-based environment can effectively shorten the design cycle of the complex electromechanical system.

  6. Parallel Algorithm for Solving TOV Equations for Sequence of Cold and Dense Nuclear Matter Models

    Science.gov (United States)

    Ayriyan, Alexander; Buša, Ján; Grigorian, Hovik; Poghosyan, Gevorg

    2018-04-01

    We introduce a parallel algorithm for simulating neutron star configurations for a set of equation of state (EoS) models. The performance of the parallel algorithm has been investigated for a test set of EoS models on two computational systems. The algorithm scales when used with MPI on modern CPUs, and this investigation also allowed us to compare two different types of computational nodes.

  7. Real-time data acquisition and parallel data processing solution for TJ-II Bolometer arrays diagnostic

    Energy Technology Data Exchange (ETDEWEB)

    Barrera, E. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain)]. E-mail: eduardo.barrera@upm.es; Ruiz, M. [Grupo de Investigacion en Instrumentacion y Acustica Aplicada, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Lopez, S. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Machon, D. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Vega, J. [Asociacion EURATOM/CIEMAT para Fusion, 28040 Madrid (Spain); Ochando, M. [Asociacion EURATOM/CIEMAT para Fusion, 28040 Madrid (Spain)

    2006-07-15

    Maps of local plasma emissivity of TJ-II plasmas are determined using three-array cameras of silicon photodiodes (AXUV type from IRD). They are assigned to the top and side ports of the same sector of the vacuum vessel. Each array consists of 20 unfiltered detectors. The signals from each of these detectors are the inputs to an iterative algorithm of tomographic reconstruction. Currently, these signals are acquired by a PXI standard system at approximately 50 kS/s, with 12 bits of resolution, and are stored for off-line processing. A 0.5 s discharge generates 3 Mbytes of raw data. The algorithm's load exceeds the CPU capacity of the PXI system's controller in continuous mode, making it unfeasible to process the samples in parallel with their acquisition in a PXI standard system. A new architecture model has been developed, making it possible to add one or several processing cards to a standard PXI system. With this model, it is possible to define how to distribute, in real time, the data from all acquired signals in the system among the processing cards and the PXI controller. In this way, by distributing the data processing among the system controller and two processing cards, the data processing can be done in parallel with the acquisition. Hence, this system configuration would be able to measure even in long pulse devices.
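    The raw data volume quoted above can be checked with a short calculation. The sketch assumes three arrays of 20 detectors each (60 channels in total) and that each 12-bit sample is stored in a 16-bit (2-byte) word; neither the channel total nor the storage width is stated explicitly in the abstract.

```python
def raw_bytes(n_channels, sample_rate_hz, duration_s, bytes_per_sample):
    """Total raw data volume for one discharge, in bytes."""
    return int(n_channels * sample_rate_hz * duration_s * bytes_per_sample)


N_CHANNELS = 3 * 20        # assumption: three arrays of 20 detectors each
SAMPLE_RATE_HZ = 50_000    # approximately 50 kS/s per channel (from the abstract)
BYTES_PER_SAMPLE = 2       # assumption: 12-bit samples stored as 16-bit words

per_discharge = raw_bytes(N_CHANNELS, SAMPLE_RATE_HZ, 0.5, BYTES_PER_SAMPLE)
per_second = raw_bytes(N_CHANNELS, SAMPLE_RATE_HZ, 1.0, BYTES_PER_SAMPLE)

print(per_discharge)  # bytes generated by a 0.5 s discharge
print(per_second)     # sustained rate the processing chain must absorb, bytes/s
```

    Under these assumptions a 0.5 s discharge yields 3,000,000 bytes, which agrees with the 3 Mbytes quoted, and continuous operation corresponds to about 6 Mbytes per second of input to the reconstruction algorithm.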

  8. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    Science.gov (United States)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  9. Parallel computing of a climate model on the dawn 1000 by domain decomposition method

    Science.gov (United States)

    Bi, Xunqiang

    1997-12-01

    This paper introduces the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massively parallel computer made by the National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. Potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputing are also discussed.
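    A two-dimensional domain decomposition of the kind mentioned above can be sketched as follows. The grid size and processor layout are hypothetical (the abstract gives neither), and halo exchange between neighboring subdomains is left out; the sketch only shows how a latitude-longitude grid might be split into near-equal blocks, one per process.

```python
def block_bounds(n, parts, idx):
    """Half-open index range [start, stop) of block idx when n points are split into parts blocks."""
    base, rem = divmod(n, parts)
    start = idx * base + min(idx, rem)          # the first rem blocks get one extra point
    stop = start + base + (1 if idx < rem else 0)
    return start, stop


def decompose_2d(n_lat, n_lon, p_lat, p_lon):
    """Map each rank to its (latitude range, longitude range) on a p_lat x p_lon process grid."""
    layout = {}
    for rank in range(p_lat * p_lon):
        i, j = divmod(rank, p_lon)              # row-major placement of ranks
        layout[rank] = (block_bounds(n_lat, p_lat, i), block_bounds(n_lon, p_lon, j))
    return layout


# Hypothetical example: a 45 x 72 grid on a 2 x 4 process grid.
layout = decompose_2d(45, 72, 2, 4)
for rank, (lat_range, lon_range) in layout.items():
    print(rank, lat_range, lon_range)
```

    In a real model each process would also exchange boundary rows and columns with its neighbors at every time step, and that communication is what limits the speed-up ratio as the blocks shrink.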

  10. Implementing parallel spreadsheet models for health policy decisions: The impact of unintentional errors on model projections.

    Science.gov (United States)

    Bailey, Stephanie L; Bono, Rose S; Nash, Denis; Kimmel, April D

    2018-01-01

    Spreadsheet software is increasingly used to implement systems science models informing health policy decisions, both in academia and in practice where technical capacity may be limited. However, spreadsheet models are prone to unintentional errors that may not always be identified using standard error-checking techniques. Our objective was to illustrate, through a methodologic case study analysis, the impact of unintentional errors on model projections by implementing parallel model versions. We leveraged a real-world need to revise an existing spreadsheet model designed to inform HIV policy. We developed three parallel versions of a previously validated spreadsheet-based model; versions differed by the spreadsheet cell-referencing approach (named single cells; column/row references; named matrices). For each version, we implemented three model revisions (re-entry into care; guideline-concordant treatment initiation; immediate treatment initiation). After standard error-checking, we identified unintentional errors by comparing model output across the three versions. Concordant model output across all versions was considered error-free. We calculated the impact of unintentional errors as the percentage difference in model projections between model versions with and without unintentional errors, using +/-5% difference to define a material error. We identified 58 original and 4,331 propagated unintentional errors across all model versions and revisions. Over 40% (24/58) of original unintentional errors occurred in the column/row reference model version; most (23/24) were due to incorrect cell references. Overall, >20% of model spreadsheet cells had material unintentional errors. When examining error impact along the HIV care continuum, the percentage difference between versions with and without unintentional errors ranged from +3% to +16% (named single cells), +26% to +76% (column/row reference), and 0% (named matrices). Standard error-checking techniques may not
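    The error-impact measure described above (the percentage difference between projections with and without unintentional errors, with a +/-5% threshold for a material error) can be written down as a small sketch. The function names and the example projections are hypothetical, and treating exactly 5% as not material is an assumption.

```python
def pct_difference(with_errors, error_free):
    """Percentage difference of an erroneous projection relative to the error-free one."""
    return 100.0 * (with_errors - error_free) / error_free


def is_material(with_errors, error_free, threshold_pct=5.0):
    """True if the absolute percentage difference exceeds the threshold (assumed exclusive)."""
    return abs(pct_difference(with_errors, error_free)) > threshold_pct


# Hypothetical projections (e.g., number of people at one stage of the care continuum).
print(pct_difference(116, 100), is_material(116, 100))
print(pct_difference(103, 100), is_material(103, 100))
```

    Comparing the same output across independently built model versions, as the study does, supplies the error-free reference that this calculation needs.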

  11. Cache-aware data structure model for parallelism and dynamic load balancing

    International Nuclear Information System (INIS)

    Sridi, Marwa

    2016-01-01

    This PhD thesis is dedicated to the implementation of innovative parallel methods in the framework of fast transient fluid-structure dynamics. It improves existing methods within the EUROPLEXUS software in order to optimize the shared memory parallel strategy, which complements the original distributed memory approach; the two are brought together into a global hybrid strategy for clusters of multi-core nodes. Starting from a sound analysis of the state of the art concerning data structuring techniques correlated to the hierarchical memory organization of current multi-processor architectures, the proposed work introduces an approach suitable for explicit time integration (i.e. with no linear system to solve at each step). A data structure of type 'Structure of arrays' is retained for the global data storage, providing flexibility and efficiency for common operations on kinematic fields (displacement, velocity and acceleration). By contrast, in the particular case of elementary operations (generic internal force computations, as well as flux computations between cell faces for fluid models), which are particularly time consuming but localized in the program, a temporary data structure of type 'Array of structures' is used instead, to force an efficient filling of the cache memory and increase the performance of the resolution, for both serial and shared memory parallel processing. Switching from the global structure to the temporary one is based on a cell grouping strategy that follows classical cache-blocking principles but, specifically for this work, also handles the neighboring data necessary for the efficient treatment of ALE fluxes for cells on the group boundaries. The proposed approach is extensively tested, from the points of view of both the computation time and the access failures in cache memory, weighing the gains obtained within the elementary operations against the potential overhead generated by the data structure switch. Obtained results are very satisfactory, especially

  12. A new decomposition method for parallel processing multi-level optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Min Soo; Choi, Dong Hoon

    2002-01-01

    In practical designs, most multidisciplinary problems involve large and complicated design systems. Since multidisciplinary problems have hundreds of analyses and thousands of variables, the grouping of analyses and the order of the analyses in each group affect the speed of the total design cycle. Therefore, it is very important to reorder and regroup the original design processes in order to minimize the total computational cost by decomposing large multidisciplinary problems into several MultiDisciplinary Analysis SubSystems (MDASS) and by processing them in parallel. In this study, a new decomposition method is proposed for parallel processing of multidisciplinary design optimization methods, such as Collaborative Optimization (CO) and the Individual Discipline Feasible (IDF) method. Numerical results for two example problems are presented to show the feasibility of the proposed method.

  13. Resistance to awareness of the supervisor's transferences with special reference to the parallel process.

    Science.gov (United States)

    Stimmel, B

    1995-06-01

    Supervision is an essential part of psychoanalytic education. Although not taken for granted, it is not studied with the same critical eye as is the analytic process. This paper examines the supervision specifically with a focus on the supervisor's transference towards the supervisee. The point is made, in the context of clinical examples, that one of the ways these transference reactions may be rationalised is within the setting of the parallel process so often encountered in supervision. Parallel process, a very familiar term, is used frequently and easily when discussing supervision. It may be used also as a resistance to awareness of transference phenomena within the supervisor in relation to the supervisee, particularly because of its clinical presentation. It is an enactment between supervisor and supervisee, thus ripe with possibilities for disguise, displacement and gratification. While transference reactions of the supervisee are often discussed, those of the supervisor are notably missing in our literature.

  14. A model for optimizing file access patterns using spatio-temporal parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Boonthanome, Nouanesengsy [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Patchett, John [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Geveci, Berk [Kitware Inc., Clifton Park, NY (United States); Ahrens, James [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bauer, Andy [Kitware Inc., Clifton Park, NY (United States); Chaudhary, Aashish [Kitware Inc., Clifton Park, NY (United States); Miller, Ross G. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Shipman, Galen M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Williams, Dean N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2013-01-01

    For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible file access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.
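
    How a spatio-temporal decomposition determines the file access pattern can be sketched as below. This is an illustration under stated assumptions, not the read-time model of the paper, which the abstract does not specify: the function access_pattern and its parameters are hypothetical, each time step is assumed to live in one file of file_size bytes, and each spatial rank is assumed to read one contiguous slice.

```python
from typing import Dict, List, Tuple


def access_pattern(num_procs: int, procs_per_step: int, num_steps: int,
                   file_size: int) -> Dict[int, List[Tuple[int, int, int]]]:
    """Return, per process rank, a list of (time_step, offset, length) reads.

    procs_per_step == num_procs gives purely spatial parallelism;
    procs_per_step == 1 gives purely temporal parallelism.
    """
    if num_procs % procs_per_step != 0:
        raise ValueError("procs_per_step must divide num_procs")
    num_groups = num_procs // procs_per_step
    plan = {}
    for rank in range(num_procs):
        group, local = divmod(rank, procs_per_step)
        # Spatial split: contiguous slice of every file this group reads.
        lo = local * file_size // procs_per_step
        hi = (local + 1) * file_size // procs_per_step
        # Temporal split: time steps are dealt round-robin to the groups.
        steps = range(group, num_steps, num_groups)
        plan[rank] = [(step, lo, hi - lo) for step in steps]
    return plan


plan = access_pattern(num_procs=4, procs_per_step=2, num_steps=4, file_size=100)
for rank, reads in plan.items():
    print(rank, reads)
```

    Varying procs_per_step moves between few large reads per process and many small ones, which is the kind of configuration choice a read-time model would be used to evaluate.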

  15. A Parallel and Distributed Surrogate Model Implementation for Computational Steering

    KAUST Repository

    Butnaru, Daniel

    2012-06-01

    Understanding the influence of multiple parameters in a complex simulation setting is a difficult task. In the ideal case, the scientist can freely steer such a simulation and is immediately presented with the results for a certain configuration of the input parameters. Such an exploration process is however not possible if the simulation is computationally too expensive. For these cases we present in this paper a scalable computational steering approach utilizing a fast surrogate model as substitute for the time-consuming simulation. The surrogate model we propose is based on the sparse grid technique, and we identify the main computational tasks associated with its evaluation and its extension. We further show how distributed data management combined with the specific use of accelerators allows us to approximate and deliver simulation results to a high-resolution visualization system in real-time. This significantly enhances the steering workflow and facilitates the interactive exploration of large datasets. © 2012 IEEE.

  16. A New Tool for Intelligent Parallel Processing of Radar/SAR Remotely Sensed Imagery

    Directory of Open Access Journals (Sweden)

    A. Castillo Atoche

    2013-01-01

    A novel parallel tool for large-scale image enhancement/reconstruction and postprocessing for radar/SAR sensor systems is presented. The proposed parallel tool performs the following intelligent processing steps: image formation, which applies the system-level image degradation effects of a particular remote sensing (RS) system and simulates random noise effects; enhancement/reconstruction, which employs nonparametric robust high-resolution techniques; and image postprocessing using the fuzzy anisotropic diffusion technique, which incorporates a better edge-preserving noise removal effect and a faster diffusion process. This innovative tool allows the processing of high-resolution images provided by different radar/SAR sensor systems, as required by RS end users for environmental monitoring, risk prevention, and resource management. To verify the performance of the proposed parallel framework, the processing steps are developed and specifically tested on graphics processing units (GPUs), achieving considerable speedups compared to the serial version of the same techniques implemented in the C language.

  17. Design of a family of integrated parallel co-processors for images processing

    International Nuclear Information System (INIS)

    Court, Thierry

    1991-01-01

    The design of parallel image processing systems that combine, in a single architecture, sophisticated microprocessors and specialised operators is a difficult task because of the various problems to be taken into account. The current study identifies a way of realizing such dedicated operators and interfacing them to a microprocessor-type central unit. The two guidelines of this work are the search for versatile, specialized and reconfigurable operators and their connection to a system bus rather than to specialized video buses. This research work proposes an architecture of circuits dedicated to image processing and two proposals for realizing them. One of them was realized in this study using silicon compiler tools. This work belongs to a larger project, whose aim is the development of a high-performance, modular industrial image processing system based on the parallelization, in MIMD structures, of an elementary, autonomous image processing unit integrating a microprocessor equipped with a parallel coprocessor suited to image processing. (author)

  18. Practical enhancement factor model based on GM for multiple parallel reactions: Piperazine (PZ) CO2 capture

    DEFF Research Database (Denmark)

    Gaspar, Jozsef; Fosbøl, Philip Loldrup

    2017-01-01

    Reactive absorption is a key process for gas separation and purification and it is the main technology for CO2 capture. Thus, reliable and simple mathematical models for mass transfer rate calculation are essential. Models which apply to parallel interacting and non-interacting reactions, for all......, desorption and pinch conditions. In this work, we apply the GM model to multiple parallel reactions. We deduce the model for piperazine (PZ) CO2 capture and we validate it against wetted-wall column measurements using 2, 5 and 8 molal PZ for temperatures between 40 °C and 100 °C and CO2 loadings between 0.......23 and 0.41 mol CO2/2 mol PZ. We show that overall second order kinetics describes well the reaction between CO2 and PZ accounting for the carbamate and bicarbamate reactions. Here we prove the GM model for piperazine and MEA but we expect that this practical approach is applicable for various amines...

  19. Parallel processing for a 1-D time-dependent solution to impurity rate equations for fusion plasma simulations

    International Nuclear Information System (INIS)

    Veerasingam, R.

    1990-01-01

    In fusion plasmas, impurities such as carbon, oxygen or nickel can contaminate the plasma and degrade the performance of a fusion device through radiation. However, impurities can also be used as diagnostics to obtain information about a plasma through spectroscopic experiments, which can then be used in plasma modeling and simulations. In the past, serial algorithms have been described for either the time-dependent or the steady-state problem. In this paper, we describe a parallel procedure adopted to solve the time-dependent problem. It can be shown that for the steady-state problem a parallel procedure would not be a useful application of parallelization, because a few seconds of Central Processing Unit time on a CRAY-XMP or IBM 3090/600S would suffice to obtain the solution, while this is not the case for the time-dependent problem. In order to study the effects of low-Z and high-Z impurities on the final state of a plasma, time-dependent solutions are necessary. For purposes of diagnostics and comparisons with experiments, a fast turnaround time for the simulations would be advantageous. We have implemented a parallel algorithm on an IBM 3090/600S and tested its performance for a typical set of fusion plasma parameters. 4 refs., 1 tab

  20. The Potsdam Parallel Ice Sheet Model (PISM-PIK) – Part 1: Model description

    Directory of Open Access Journals (Sweden)

    R. Winkelmann

    2011-09-01

    We present the Potsdam Parallel Ice Sheet Model (PISM-PIK), developed at the Potsdam Institute for Climate Impact Research to be used for simulations of large-scale ice sheet-shelf systems. It is derived from the Parallel Ice Sheet Model (Bueler and Brown, 2009). Velocities are calculated by superposition of two shallow stress balance approximations within the entire ice covered region: the shallow ice approximation (SIA) is dominant in grounded regions and accounts for shear deformation parallel to the geoid. The plug-flow type shallow shelf approximation (SSA) dominates the velocity field in ice shelf regions and serves as a basal sliding velocity in grounded regions. Ice streams can be identified diagnostically as regions with a significant contribution of membrane stresses to the local momentum balance. All lateral boundaries in PISM-PIK are free to evolve, including the grounding line and ice fronts. Ice shelf margins in particular are modeled using Neumann boundary conditions for the SSA equations, reflecting a hydrostatic stress imbalance along the vertical calving face. The ice front position is modeled using a subgrid-scale representation of calving front motion (Albrecht et al., 2011) and a physically-motivated calving law based on horizontal spreading rates. The model is tested in experiments from the Marine Ice Sheet Model Intercomparison Project (MISMIP). A dynamic equilibrium simulation of Antarctica under present-day conditions is presented in Martin et al. (2011).

  1. The Potsdam Parallel Ice Sheet Model (PISM-PIK) - Part 1: Model description

    Science.gov (United States)

    Winkelmann, R.; Martin, M. A.; Haseloff, M.; Albrecht, T.; Bueler, E.; Khroulev, C.; Levermann, A.

    2011-09-01

    We present the Potsdam Parallel Ice Sheet Model (PISM-PIK), developed at the Potsdam Institute for Climate Impact Research to be used for simulations of large-scale ice sheet-shelf systems. It is derived from the Parallel Ice Sheet Model (Bueler and Brown, 2009). Velocities are calculated by superposition of two shallow stress balance approximations within the entire ice covered region: the shallow ice approximation (SIA) is dominant in grounded regions and accounts for shear deformation parallel to the geoid. The plug-flow type shallow shelf approximation (SSA) dominates the velocity field in ice shelf regions and serves as a basal sliding velocity in grounded regions. Ice streams can be identified diagnostically as regions with a significant contribution of membrane stresses to the local momentum balance. All lateral boundaries in PISM-PIK are free to evolve, including the grounding line and ice fronts. Ice shelf margins in particular are modeled using Neumann boundary conditions for the SSA equations, reflecting a hydrostatic stress imbalance along the vertical calving face. The ice front position is modeled using a subgrid-scale representation of calving front motion (Albrecht et al., 2011) and a physically-motivated calving law based on horizontal spreading rates. The model is tested in experiments from the Marine Ice Sheet Model Intercomparison Project (MISMIP). A dynamic equilibrium simulation of Antarctica under present-day conditions is presented in Martin et al. (2011).

  2. A learnable parallel processing architecture towards unity of memory and computing.

    Science.gov (United States)

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  3. A learnable parallel processing architecture towards unity of memory and computing

    Science.gov (United States)

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-08-01

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  4. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    Science.gov (United States)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power in the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the

  5. Unified dataflow model for the analysis of data and pipeline parallelism, and buffer sizing

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.; Geuns, S.J.; Wiggers, M.H.; Bekooij, Marco Jan Gerrit

    2014-01-01

    Real-time stream processing applications such as software defined radios are usually executed concurrently on multiprocessor systems. Exploiting coarse-grained data parallelism by duplicating tasks is often required, besides pipeline parallelism, to meet the temporal constraints of the applications.

  6. WWTP Process Tank Modelling

    DEFF Research Database (Denmark)

    Laursen, Jesper

    The present thesis considers numerical modeling of activated sludge tanks on municipal wastewater treatment plants. Focus is aimed at integrated modeling where the detailed microbiological model the Activated Sludge Model 3 (ASM3) is combined with a detailed hydrodynamic model based on a numerical...... solution of the Navier-Stokes equations in a multiphase scheme. After a general introduction to the activated sludge tank as a system, the activated sludge tank model is gradually setup in separate stages. The individual sub-processes that are often occurring in activated sludge tanks are initially...... hydrofoil shaped propellers. These two sub-processes deliver the main part of the supplied energy to the activated sludge tank, and for this reason they are important for the mixing conditions in the tank. For other important processes occurring in the activated sludge tank, existing models and measurements...

  7. Research on Gear Shifting Process without Disengaging Clutch for a Parallel Hybrid Electric Vehicle Equipped with AMT

    Directory of Open Access Journals (Sweden)

    Hui-Long Yu

    2014-01-01

    Dynamic models of a single-shaft parallel hybrid electric vehicle (HEV) equipped with an automated mechanical transmission (AMT) are described for the different working stages of a gear shifting process without disengaging the clutch. Parameters affecting the gear shifting time, component life, and gear shifting jerk in the different transient states of the gear shifting process are analyzed in depth. Mathematical models that account for the detailed synchronizer working process, and that can explain gear shifting failure, prolonged gear shifting, and frequent synchronizer failure in HEVs, are derived. A dynamic coordinated control strategy for the engine, motor, and actuators in the different transient states, which accounts for the detailed working stages of the synchronizer during gear shifting in an HEV, is proposed for the first time relative to the state-of-the-art references. Bench test and real road test results show that the proposed control strategy significantly improves gear shifting quality across all of its evaluation indexes.

  8. Improvements in fast-response flood modeling: desktop parallel computing and domain tracking

    Energy Technology Data Exchange (ETDEWEB)

    Judi, David R [Los Alamos National Laboratory; Mcpherson, Timothy N [Los Alamos National Laboratory; Burian, Steven J [UNIV. OF UTAH

    2009-01-01

    It is becoming increasingly important to have the ability to accurately forecast flooding, as flooding accounts for the most losses due to natural disasters in the world and the United States. Flood inundation modeling has been dominated by one-dimensional approaches. These models are computationally efficient and are considered by many engineers to produce reasonably accurate water surface profiles. However, because the profiles estimated in these models must be superimposed on digital elevation data to create a two-dimensional map, the result may be sensitive to the ability of the elevation data to capture relevant features (e.g. dikes/levees, roads, walls, etc.). Moreover, because one-dimensional models do not explicitly represent the complex flow processes present in floodplains and urban environments, and because two-dimensional models based on the shallow water equations have significantly greater ability to determine flow velocity and direction, the National Research Council (NRC) has recommended that two-dimensional models be used over one-dimensional models for flood inundation studies. This paper has shown that two-dimensional flood modeling computational time can be greatly reduced through the use of Java multithreading on multi-core computers, which effectively provides a means for parallel computing on a desktop computer. In addition, this paper has shown that when desktop parallel computing is coupled with a domain tracking algorithm, significant computation time can be eliminated when computations are completed only on inundated cells. The drastic reduction in computational time shown here enhances the ability of two-dimensional flood inundation models to be used as a near-real-time flood forecasting tool, engineering design tool, or planning tool. Perhaps of even greater significance, the reduction in computation time makes the incorporation of risk and uncertainty/ensemble forecasting more feasible for flood inundation modeling (NRC 2000; Sayers et al
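
    The idea of domain tracking, computing only on inundated cells, can be sketched as follows. This is a toy illustration and not the authors' algorithm: the update rule is a simple depth-equalizing exchange on flat terrain rather than the shallow water equations, it is serial Python rather than multithreaded Java, and the names neighbors and step as well as the rate and eps parameters are hypothetical.

```python
from typing import Dict, Iterator, Tuple

Cell = Tuple[int, int]


def neighbors(cell: Cell, rows: int, cols: int) -> Iterator[Cell]:
    """Yield the 4-connected neighbors of a cell that lie inside the grid."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def step(depth: Dict[Cell, float], rows: int, cols: int,
         rate: float = 0.1, eps: float = 1e-9) -> Dict[Cell, float]:
    """Advance one toy time step, visiting only cells that hold water.

    `depth` is sparse: dry cells are simply absent, so the loop cost scales
    with the inundated area instead of the full rows * cols domain.
    """
    delta: Dict[Cell, float] = {}
    for cell, h in depth.items():
        for nb in neighbors(cell, rows, cols):
            # Water moves from the deeper cell to the shallower neighbor.
            flow = rate * (h - depth.get(nb, 0.0))
            if flow > 0.0:
                delta[cell] = delta.get(cell, 0.0) - flow
                delta[nb] = delta.get(nb, 0.0) + flow
    updated = dict(depth)
    for cell, change in delta.items():
        updated[cell] = updated.get(cell, 0.0) + change
    # Newly wetted cells join the tracked domain; dried cells leave it.
    return {cell: h for cell, h in updated.items() if h > eps}


state = {(2, 2): 1.0}          # a single wet cell in a 5 x 5 grid
state = step(state, rows=5, cols=5)
print(len(state), sum(state.values()))
```

    After one step only the source cell and its four neighbors are tracked, so the remaining 20 cells of the grid are never touched.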

  9. Improved Path Loss Simulation Incorporating Three-Dimensional Terrain Model Using Parallel Coprocessors

    Directory of Open Access Journals (Sweden)

    Zhang Bin Loo

    2017-01-01

    Current network simulators abstract out wireless propagation models because of the high computational requirements of realistic modeling. As a result, there is still a large gap between the results obtained from simulators and real-world scenarios. In this paper, we present a framework for improved path loss simulation built on top of existing network simulation software, NS-3. Unlike the conventional disk model, the proposed simulation also considers the diffraction loss computed using Epstein and Peterson’s model, using actual terrain elevation data to give an accurate estimate of the path loss between a transmitter and a receiver. The high computational cost is mitigated by offloading the computationally intensive components onto an inexpensive off-the-shelf parallel coprocessor, an NVIDIA GPU. Experiments are performed using actual terrain elevation data provided by the United States Geological Survey. Compared with the conventional CPU architecture, the experimental results show that a speedup of 20x to 42x is achieved by exploiting the parallel processing of the GPU to compute the path loss between two nodes using terrain elevation data. The results show that the path loss between two nodes is greatly affected by the terrain profile between them. The results also suggest that the common strategy of placing the transmitter in the highest position may not always work.
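
    A serial sketch of an Epstein-Peterson style diffraction calculation is given below to make the per-link work concrete. Several parts are assumptions not taken from the abstract: the single knife-edge loss uses the closed-form approximation from ITU-R P.526, the terrain is reduced to a hand-picked list of (distance, height) edges, Earth curvature is ignored, and all function names are hypothetical. It is not the NS-3 or GPU implementation of the paper.

```python
import math
from typing import List, Tuple


def knife_edge_loss_db(v: float) -> float:
    """Single knife-edge loss in dB (ITU-R P.526 approximation, assumed here)."""
    if v <= -0.78:
        return 0.0
    return 6.9 + 20.0 * math.log10(math.sqrt((v - 0.1) ** 2 + 1.0) + v - 0.1)


def fresnel_parameter(h: float, d1: float, d2: float, wavelength: float) -> float:
    """Fresnel-Kirchhoff parameter for an edge h metres above the direct line."""
    return h * math.sqrt(2.0 * (d1 + d2) / (wavelength * d1 * d2))


def epstein_peterson_loss_db(points: List[Tuple[float, float]],
                             wavelength: float) -> float:
    """Sum single-edge losses, each edge seen from its two adjacent points.

    `points` is [(distance_m, height_m), ...]: transmitter, edges, receiver.
    """
    total = 0.0
    for i in range(1, len(points) - 1):
        (x0, z0), (x1, z1), (x2, z2) = points[i - 1], points[i], points[i + 1]
        d1, d2 = x1 - x0, x2 - x1
        # Height of the edge above the line joining its two neighbors.
        h = z1 - (z0 + (z2 - z0) * d1 / (d1 + d2))
        total += knife_edge_loss_db(fresnel_parameter(h, d1, d2, wavelength))
    return total


def free_space_loss_db(distance: float, wavelength: float) -> float:
    """Free-space path loss in dB."""
    return 20.0 * math.log10(4.0 * math.pi * distance / wavelength)


wavelength = 0.125  # metres, roughly 2.4 GHz
profile = [(0.0, 10.0), (400.0, 25.0), (700.0, 30.0), (1000.0, 10.0)]
total = free_space_loss_db(1000.0, wavelength) + epstein_peterson_loss_db(profile, wavelength)
print(round(total, 1))
```

    Each transmitter-receiver pair is independent of every other pair, which is what makes this kind of computation a natural fit for a parallel coprocessor.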

  10. Parallel, multi-stage processing of colors, faces and shapes in macaque inferior temporal cortex

    Science.gov (United States)

    Lafer-Sousa, Rosa; Conway, Bevil R.

    2014-01-01

    Visual-object processing culminates in inferior temporal (IT) cortex. To assess the organization of IT, we measured fMRI responses in alert monkey to achromatic images (faces, fruit, bodies, places) and colored gratings. IT contained multiple color-biased regions, which were typically ventral to face patches and, remarkably, yoked to them, spaced regularly at four locations predicted by known anatomy. Color and face selectivity increased for more anterior regions, indicative of a broad hierarchical arrangement. Responses to non-face shapes were found across IT, but were stronger outside color-biased regions and face patches, consistent with multiple parallel streams. IT also contained multiple coarse eccentricity maps: face patches overlapped central representations; color-biased regions spanned mid-peripheral representations; and place-biased regions overlapped peripheral representations. These results suggest that IT comprises parallel, multi-stage processing networks subject to one organizing principle. PMID:24141314

  11. The Masterson Approach with play therapy: a parallel process between mother and child.

    Science.gov (United States)

    Mulherin, M A

    2001-01-01

    This paper discusses a case in which the Masterson Approach was used with play therapy to treat a child with a developing personality disorder. It describes the parallel progression of the child and mother in adjunct therapy throughout a six-year period. The unique value of the Masterson Approach is that it provides the therapist with a framework and tool to diagnose and treat a child during the dynamic process of play. The case describes the mother-child dyad throughout therapy. It traces their parallel processes that involve separation, individuation, rapprochement, and the recovery of real self-capacities. Each stage of treatment is described, including verbal interventions. The child's internal affective state and intrapsychic structure during the various stages of treatment are illustrated by representative pictures.

  12. Data structures and languages in support of parallel image processing for astronomy

    International Nuclear Information System (INIS)

    Tanimoto, S.L.

    1985-01-01

    This paper discusses data structures and aspects of programming languages and systems that are relevant to image processing of astronomy data. Emphasis is on image processing computations, because this kind of data processing is obviously ripe for parallelism and is important in astronomy. However, some discussion of general possibilities is also presented. The role of algorithms is examined since they are not dependent on a particular language. As an implementation of an algorithm, a program is equally tied to data structure, operations, architecture and language, and therefore the issue of programming resides in the center of the tetrahedron.

  13. Efficient Out of Core Sorting Algorithms for the Parallel Disks Model.

    Science.gov (United States)

    Kundeti, Vamsi; Rajasekaran, Sanguthevar

    2011-11-01

    In this paper we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However many of these merge based algorithms have large underlying constants in the time bounds, because they suffer from the lack of read parallelism on PDM. The irregular consumption of the runs during the merge affects the read parallelism and contributes to the increased sorting time. In this paper we first introduce a novel idea called the dirty sequence accumulation that improves the read parallelism. Secondly, we show analytically that this idea can reduce the number of parallel I/O's required to sort the input close to the lower bound of [Formula: see text]. We experimentally verify our dirty sequence idea with the standard R-Way merge and show that our idea can reduce the number of parallel I/Os to sort on PDM significantly.
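
    The irregular consumption of runs during a merge can be observed with a plain R-way merge. The sketch below is the standard heap-based merge with a trace of which block of which run is needed next; it does not implement the dirty sequence accumulation idea, which the abstract does not describe, and the function name r_way_merge and the block_size parameter are hypothetical.

```python
import heapq
from typing import List, Tuple


def r_way_merge(runs: List[List[int]],
                block_size: int) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Merge sorted runs; also return the order in which blocks are first needed.

    Each entry of the trace is (run_index, block_index). On a parallel disk
    system, a trace that stays on one run for a long time means that only the
    disk holding that run is busy.
    """
    fetches: List[Tuple[int, int]] = []

    def touch(run_index: int, pos: int) -> None:
        # The first element of a block triggers a (simulated) block read.
        if pos % block_size == 0:
            fetches.append((run_index, pos // block_size))

    heap = []
    for i, run in enumerate(runs):
        if run:
            touch(i, 0)
            heap.append((run[0], i, 0))
    heapq.heapify(heap)

    merged: List[int] = []
    while heap:
        value, i, pos = heapq.heappop(heap)
        merged.append(value)
        nxt = pos + 1
        if nxt < len(runs[i]):
            touch(i, nxt)
            heapq.heappush(heap, (runs[i][nxt], i, nxt))
    return merged, fetches


merged, fetches = r_way_merge([[1, 2, 3, 4], [5, 6, 7, 8]], block_size=2)
print(merged)
print(fetches)
```

    In this example the first run is consumed completely before the second run's next block is needed, which is the kind of skew that limits read parallelism.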

  14. The concept of parallel input/output processing for an electron linac

    International Nuclear Information System (INIS)

    Emoto, Takashi

    1993-01-01

    The instrumentation of and the control system for the PNC 10 MeV CW electron linac are described. A new concept of parallel input/output processing for the linac has been introduced. It is based on a substantial number of input/output processors (IOP) used for beam control and diagnostics. The flexibility and simplicity of the hardware/software are significant advantages of this scheme. (author)

  15. A Review of Parallel Processing Approaches to Robot Kinematics and Jacobian

    OpenAIRE

    Henrich, Dominik; Karl, Joachim; Wörn, Heinz

    1997-01-01

    Due to continuously increasing demands in the area of advanced robot control, it became necessary to speed up the computation. One way to reduce the computation time is to distribute the computation onto several processing units. In this survey we present different approaches to parallel computation of robot kinematics and Jacobian. Thereby, we discuss both the forward and the reverse problem. We introduce a classification scheme and class...

  16. A simple and efficient parallel FFT algorithm using the BSP model

    NARCIS (Netherlands)

    Bisseling, R.H.; Inda, M.A.

    2000-01-01

    In this paper we present a new parallel radix FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three in the case
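
    The group-cyclic distribution family can be illustrated with a small owner function. The definition used here is an assumption, since the abstract does not state it: for a vector of length n on p processors with cycle c (c dividing p), the vector is split into blocks of ceil(c*n/p) elements and each block is dealt cyclically over its own group of c processors. The function name group_cyclic_owner is hypothetical.

```python
from typing import List


def group_cyclic_owner(j: int, n: int, p: int, c: int) -> int:
    """Processor owning element j under the (assumed) group-cyclic distribution.

    c == 1 reduces to the block distribution and c == p to the cyclic one.
    """
    if c < 1 or p % c != 0:
        raise ValueError("cycle c must be a positive divisor of p")
    block = (c * n + p - 1) // p          # ceil(c * n / p)
    return c * (j // block) + (j % block) % c


def owners(n: int, p: int, c: int) -> List[int]:
    """Owner of every element of a length-n vector."""
    return [group_cyclic_owner(j, n, p, c) for j in range(n)]


for cycle in (1, 2, 4):
    print(cycle, owners(8, 4, cycle))
```

    Intermediate values of c interpolate between the block and cyclic layouts, which is why a single parameterized family is convenient for an FFT that needs different layouts at different stages.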

  17. Understanding decimal proportions: discrete representations, parallel access, and privileged processing of zero.

    Science.gov (United States)

    Varma, Sashank; Karl, Stacy R

    2013-05-01

    Much of the research on mathematical cognition has focused on the numbers 1, 2, 3, 4, 5, 6, 7, 8, and 9, with considerably less attention paid to more abstract number classes. The current research investigated how people understand decimal proportions--rational numbers between 0 and 1 expressed in the place-value symbol system. The results demonstrate that proportions are represented as discrete structures and processed in parallel. There was a semantic interference effect: When understanding a proportion expression (e.g., "0.29"), both the correct proportion referent (e.g., 0.29) and the incorrect natural number referent (e.g., 29) corresponding to the visually similar natural number expression (e.g., "29") are accessed in parallel, and when these referents lead to conflicting judgments, performance slows. There was also a syntactic interference effect, generalizing the unit-decade compatibility effect for natural numbers: When comparing two proportions, their tenths and hundredths components are processed in parallel, and when the different components lead to conflicting judgments, performance slows. The results also reveal that zero decimals--proportions ending in zero--serve multiple cognitive functions, including eliminating semantic interference and speeding processing. The current research also extends the distance, semantic congruence, and SNARC effects from natural numbers to decimal proportions. These findings inform how people understand the place-value symbol system, and the mental implementation of mathematical symbol systems more generally. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. Fast phase processing in off-axis holography by CUDA including parallel phase unwrapping.

    Science.gov (United States)

    Backoach, Ohad; Kariv, Saar; Girshovitz, Pinhas; Shaked, Natan T

    2016-02-22

    We present a parallel processing implementation for rapid extraction of quantitative phase maps from off-axis holograms on the graphics processing unit (GPU) of the computer using compute unified device architecture (CUDA) programming. To obtain an efficient implementation, we parallelized both the wrapped phase map extraction algorithm and the two-dimensional phase unwrapping algorithm. In contrast to previous implementations, we utilized an unweighted least squares phase unwrapping algorithm that better suits parallelism. We compared the proposed algorithm run times on the CPU and the GPU of the computer for various sizes of off-axis holograms. Using the GPU implementation, we extracted the unwrapped phase maps from the recorded off-axis holograms at 35 frames per second (fps) for 4-megapixel holograms, and at 129 fps for 1-megapixel holograms, which, to the best of our knowledge, are the fastest processing frame rates obtained so far. We then used common-path off-axis interferometric imaging to quantitatively capture the phase maps of a micro-organism with rapid flagellum movements.

  19. The Medial Temporal Lobe – Conduit of Parallel Connectivity: A model for Attention, Memory, and Perception.

    Directory of Open Access Journals (Sweden)

    Brian B. Mozaffari

    2014-11-01

    Based on the notion that the brain is equipped with a hierarchical organization, which embodies environmental contingencies across many time scales, this paper suggests that the medial temporal lobe (MTL), located deep in the hierarchy, serves as a bridge connecting supra-MTL to infra-MTL levels. Bridging the upper and lower regions of the hierarchy provides a parallel architecture that optimizes information flow between upper and lower regions to aid attention, encoding, and processing of quick, complex visual phenomena. Bypassing intermediate hierarchy levels, information conveyed through the MTL ‘bridge’ allows upper levels to make educated predictions about the prevailing context and accordingly select lower representations to increase the efficiency of predictive coding throughout the hierarchy. This selection or activation/deactivation is associated with endogenous attention. In the event that these ‘bridge’ predictions are inaccurate, this architecture enables the rapid encoding of novel contingencies. A review of hierarchical models in relation to memory is provided along with a new theory, Medial-temporal-lobe Conduit for Parallel Connectivity (MCPC). In this scheme, consolidation is considered as a secondary process, occurring after an MTL-bridged connection, which eventually allows upper and lower levels to access each other directly. With repeated reactivations, as contingencies become consolidated, less MTL activity is predicted. Finally, MTL bridging may aid processing transient but structured perceptual events, by allowing communication between upper and lower levels without calling on intermediate levels of representation.

  20. Distributed and cloud computing from parallel processing to the Internet of Things

    CERN Document Server

    Hwang, Kai; Fox, Geoffrey C

    2012-01-01

    Distributed and Cloud Computing, named a 2012 Outstanding Academic Title by the American Library Association's Choice publication, explains how to create high-performance, scalable, reliable systems, exposing the design principles, architecture, and innovative applications of parallel, distributed, and cloud computing systems. Starting with an overview of modern distributed models, the book provides comprehensive coverage of distributed and cloud computing, including: facilitating management, debugging, migration, and disaster recovery through virtualization; clustered systems for resear

  1. Parallel processing method for high-speed real time digital pulse processing for gamma-ray spectroscopy

    International Nuclear Information System (INIS)

    Fernandes, A.M.; Pereira, R.C.; Sousa, J.; Neto, A.; Carvalho, P.; Batista, A.J.N.; Carvalho, B.B.; Varandas, C.A.F.; Tardocchi, M.; Gorini, G.

    2010-01-01

    A new data acquisition (DAQ) system was developed to fulfil the requirements of the gamma-ray spectrometer (GRS) JET-EP2 (Joint European Torus enhancement project 2), providing high-resolution spectroscopy at very high count rates (up to a few MHz). The system is based on the Advanced Telecommunications Computing Architecture™ (ATCA™) and includes a transient record (TR) module with 8 channels of 14-bit resolution at a 400 MSamples/s (MSPS) sampling rate, 4 GB of local memory, and 2 field programmable gate arrays (FPGAs) able to run real-time algorithms for data reduction and digital pulse processing. Although at 400 MSPS only fast programmable devices such as FPGAs can be used for both data processing and data transfer, FPGA resources also present speed limitations for some specific tasks, leading to unavoidable data loss when demanding algorithms are applied. To overcome this problem, and foreseeing an increase in algorithm complexity, a new digital parallel filter was developed, aiming to perform real-time pulse processing in the FPGAs of the TR module at the stated sampling rate. The filter is based on the conventional digital time-invariant trapezoidal shaper operating on parallelized data while performing pulse height analysis (PHA) and pile-up rejection (PUR). The incoming sampled data is successively parallelized and fed into the processing algorithm block at one fourth of the sampling rate. The subsequent data processing and data transfer are also performed at one fourth of the sampling rate. The algorithm based on the data parallelization technique was implemented and tested at JET facilities, where a spectrum was obtained. In light of the observed results, the PHA algorithm will be improved by implementing pulse pile-up discrimination.

  2. Model Process Control Language

    Data.gov (United States)

    National Aeronautics and Space Administration — The MPC (Model Process Control) language enables the capture, communication and preservation of a simulation instance, with sufficient detail that it can be...

  3. Effects of visual information regarding allocentric processing in haptic parallelity matching.

    Science.gov (United States)

    Van Mier, Hanneke I

    2013-10-01

    Research has revealed that haptic perception of parallelity deviates from physical reality. Large and systematic deviations have been found in haptic parallelity matching, most likely due to the influence of the hand-centered egocentric reference frame. Providing information that increases the influence of allocentric processing has been shown to improve performance on haptic matching. In this study allocentric processing was stimulated by providing informative vision in haptic matching tasks that were performed using hand- and arm-centered reference frames. Twenty blindfolded participants (ten men, ten women) explored the orientation of a reference bar with the non-dominant hand and subsequently matched (task HP) or mirrored (task HM) its orientation on a test bar with the dominant hand. Visual information was provided by means of informative vision with participants having full view of the test bar, while the reference bar was blocked from their view (task VHP). To decrease the egocentric bias of the hands, participants also performed a visual haptic parallelity drawing task (task VHPD) using an arm-centered reference frame, by drawing the orientation of the reference bar. In all tasks, the distance between and orientation of the bars were manipulated. A significant effect of task was found; performance improved from task HP, to VHP to VHPD, and HM. Significant effects of distance were found in the first three tasks, whereas orientation and gender effects were only significant in tasks HP and VHP. The results showed that stimulating allocentric processing by means of informative vision and reducing the egocentric bias by using an arm-centered reference frame led to the most accurate performance on parallelity matching. © 2013 Elsevier B.V. All rights reserved.

  4. Passive and partially active fault tolerance for massively parallel stream processing engines

    DEFF Research Database (Denmark)

    Su, Li; Zhou, Yongluan

    2018-01-01

    . On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...... also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness...

  5. Biosphere Process Model Report

    Energy Technology Data Exchange (ETDEWEB)

    J. Schmitt

    2000-05-25

    To evaluate the postclosure performance of a potential monitored geologic repository at Yucca Mountain, a Total System Performance Assessment (TSPA) will be conducted. Nine Process Model Reports (PMRs), including this document, are being developed to summarize the technical basis for each of the process models supporting the TSPA model. These reports cover the following areas: (1) Integrated Site Model; (2) Unsaturated Zone Flow and Transport; (3) Near Field Environment; (4) Engineered Barrier System Degradation, Flow, and Transport; (5) Waste Package Degradation; (6) Waste Form Degradation; (7) Saturated Zone Flow and Transport; (8) Biosphere; and (9) Disruptive Events. Analysis/Model Reports (AMRs) contain the more detailed technical information used to support TSPA and the PMRs. The AMRs consist of data, analyses, models, software, and supporting documentation that will be used to defend the applicability of each process model for evaluating the postclosure performance of the potential Yucca Mountain repository system. This documentation will ensure the traceability of information from its source through its ultimate use in the TSPA-Site Recommendation (SR) and in the National Environmental Policy Act (NEPA) analysis processes. The objective of the Biosphere PMR is to summarize (1) the development of the biosphere model, and (2) the Biosphere Dose Conversion Factors (BDCFs) developed for use in TSPA. The Biosphere PMR does not present or summarize estimates of potential radiation doses to human receptors. Dose calculations are performed as part of TSPA and will be presented in the TSPA documentation. The biosphere model is a component of the process to evaluate postclosure repository performance and regulatory compliance for a potential monitored geologic repository at Yucca Mountain, Nevada. The biosphere model describes those exposure pathways in the biosphere by which radionuclides released from a potential repository could reach a human receptor

  6. Biosphere Process Model Report

    International Nuclear Information System (INIS)

    Schmitt, J.

    2000-01-01

    To evaluate the postclosure performance of a potential monitored geologic repository at Yucca Mountain, a Total System Performance Assessment (TSPA) will be conducted. Nine Process Model Reports (PMRs), including this document, are being developed to summarize the technical basis for each of the process models supporting the TSPA model. These reports cover the following areas: (1) Integrated Site Model; (2) Unsaturated Zone Flow and Transport; (3) Near Field Environment; (4) Engineered Barrier System Degradation, Flow, and Transport; (5) Waste Package Degradation; (6) Waste Form Degradation; (7) Saturated Zone Flow and Transport; (8) Biosphere; and (9) Disruptive Events. Analysis/Model Reports (AMRs) contain the more detailed technical information used to support TSPA and the PMRs. The AMRs consist of data, analyses, models, software, and supporting documentation that will be used to defend the applicability of each process model for evaluating the postclosure performance of the potential Yucca Mountain repository system. This documentation will ensure the traceability of information from its source through its ultimate use in the TSPA-Site Recommendation (SR) and in the National Environmental Policy Act (NEPA) analysis processes. The objective of the Biosphere PMR is to summarize (1) the development of the biosphere model, and (2) the Biosphere Dose Conversion Factors (BDCFs) developed for use in TSPA. The Biosphere PMR does not present or summarize estimates of potential radiation doses to human receptors. Dose calculations are performed as part of TSPA and will be presented in the TSPA documentation. The biosphere model is a component of the process to evaluate postclosure repository performance and regulatory compliance for a potential monitored geologic repository at Yucca Mountain, Nevada. The biosphere model describes those exposure pathways in the biosphere by which radionuclides released from a potential repository could reach a human receptor

  7. Parallel Processing and Applied Mathematics. 10th International Conference, PPAM 2013. Revised Selected Papers

    DEFF Research Database (Denmark)

    The following topics are dealt with: parallel scientific computing; numerical algorithms; parallel nonnumerical algorithms; cloud computing; evolutionary computing; metaheuristics; applied mathematics; GPU computing; multicore systems; hybrid architectures; hierarchical parallelism; HPC systems......; power monitoring; energy monitoring; and distributed computing....

  8. A diffusion model for two parallel queues with processor sharing: transient behavior and asymptotics

    Directory of Open Access Journals (Sweden)

    Charles Knessl

    1999-01-01

    We consider two identical, parallel M/M/1 queues. Both queues are fed by a Poisson arrival stream of rate λ and have service rates equal to μ. When both queues are non-empty, the two systems behave independently of each other. However, when one of the queues becomes empty, the corresponding server helps in the other queue. This is called head-of-the-line processor sharing. We study this model in the heavy traffic limit, where ρ=λ/μ→1. We formulate the heavy traffic diffusion approximation and explicitly compute the time-dependent probability of the diffusion approximation to the joint queue length process. We then evaluate the solution asymptotically for large values of space and/or time. This leads to simple expressions that show how the process achieves its steady state and other transient aspects.
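
    The queueing model described above can be explored numerically with a small event-driven simulation. The sketch below is only an illustration, not the diffusion analysis of the paper. It assumes that each queue has its own Poisson arrival stream of rate λ and that a helping server doubles the service rate of the non-empty queue to 2μ; the abstract does not state either detail explicitly. The function name `mean_total_queue_length` and its parameters are hypothetical.

```python
import random


def mean_total_queue_length(lam, mu, horizon, helping, seed=0):
    """Time-averaged total number of customers in two parallel queues.

    helping=True : an idle server works on the other queue (assumed rate 2*mu).
    helping=False: two independent M/M/1 queues, for comparison.
    """
    rng = random.Random(seed)
    n = [0, 0]            # current queue lengths
    t = 0.0
    area = 0.0            # integral of total queue length over time
    while True:
        rates = []
        for i in (0, 1):
            if n[i] == 0:
                rates.append(0.0)            # nothing to serve
            elif helping and n[1 - i] == 0:
                rates.append(2.0 * mu)       # both servers on this queue
            else:
                rates.append(mu)
        total = 2.0 * lam + rates[0] + rates[1]
        dt = rng.expovariate(total)          # time to the next event
        if t + dt >= horizon:
            area += (n[0] + n[1]) * (horizon - t)
            break
        area += (n[0] + n[1]) * dt
        t += dt
        u = rng.random() * total             # pick which event fired
        if u < lam:
            n[0] += 1
        elif u < 2.0 * lam:
            n[1] += 1
        elif u < 2.0 * lam + rates[0]:
            n[0] -= 1
        elif n[1] > 0:
            n[1] -= 1
    return area / horizon
```

    Under the 2μ assumption the total number of customers behaves like a single M/M/1 queue with rates 2λ and 2μ, whose stationary mean is ρ/(1-ρ), compared with 2ρ/(1-ρ) for two independent queues. A long run with λ = 0.8 and μ = 1 should therefore give averages near 4 and 8 respectively, and both grow without bound as ρ approaches 1.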

  9. High-Performance Parallel and Stream Processing of X-ray Microdiffraction Data on Multicores

    International Nuclear Information System (INIS)

    Bauer, Michael A; McIntyre, Stewart; Xie Yuzhen; Biem, Alain; Tamura, Nobumichi

    2012-01-01

    We present the design and implementation of a high-performance system for processing synchrotron X-ray microdiffraction (XRD) data in IBM InfoSphere Streams on multicore processors. We report on the parallel and stream processing techniques that we use to harness the power of clusters of multicores to analyze hundreds of gigabytes of synchrotron XRD data in order to reveal the microtexture of polycrystalline materials. Processing one XRD image using one pipeline is about ten times faster than the best C program at present. With the support of the InfoSphere Streams platform, our software can be scaled up to operate on clusters of multicores and process multiple images concurrently. This system provides a high-performance processing kernel to achieve near real-time analysis of image data from synchrotron experiments.

  10. A review of advanced small-scale parallel bioreactor technology for accelerated process development: current state and future need.

    Science.gov (United States)

    Bareither, Rachel; Pollard, David

    2011-01-01

    The pharmaceutical and biotech industries face continued pressure to reduce development costs and accelerate process development. This challenge occurs alongside the need for increased upstream experimentation to support quality by design initiatives and the pursuit of predictive models from systems biology. A small-scale system enabling multiple reactions in parallel (n ≥ 20), with automated sampling and integrated with purification, would provide a significant improvement (four- to fivefold) to development timelines. State-of-the-art attempts to pursue high-throughput process development include shake flasks, microfluidic reactors, microtiter plates and small-scale stirred reactors. The limitations of these systems are compared to desired criteria to mimic large-scale commercial processes. The comparison shows that significant technological improvement is still required to provide automated solutions that can speed upstream process development. Copyright © 2010 American Institute of Chemical Engineers (AIChE).

  11. Parallel processing streams for motor output and sensory prediction during action preparation.

    Science.gov (United States)

    Stenner, Max-Philipp; Bauer, Markus; Heinze, Hans-Jochen; Haggard, Patrick; Dolan, Raymond J

    2015-03-15

    Sensory consequences of one's own actions are perceived as less intense than identical, externally generated stimuli. This is generally taken as evidence for sensory prediction of action consequences. Accordingly, recent theoretical models explain this attenuation by an anticipatory modulation of sensory processing prior to stimulus onset (Roussel et al. 2013) or even action execution (Brown et al. 2013). Experimentally, prestimulus changes that occur in anticipation of self-generated sensations are difficult to disentangle from more general effects of stimulus expectation, attention and task load (performing an action). Here, we show that an established manipulation of subjective agency over a stimulus leads to a predictive modulation in sensory cortex that is independent of these factors. We recorded magnetoencephalography while subjects performed a simple action with either hand and judged the loudness of a tone caused by the action. Effector selection was manipulated by subliminal motor priming. Compatible priming is known to enhance a subjective experience of agency over a consequent stimulus (Chambon and Haggard 2012). In line with this effect on subjective agency, we found stronger sensory attenuation when the action that caused the tone was compatibly primed. This perceptual effect was reflected in a transient phase-locked signal in auditory cortex before stimulus onset and motor execution. Interestingly, this sensory signal emerged at a time when the hemispheric lateralization of motor signals in M1 indicated ongoing effector selection. Our findings confirm theoretical predictions of a sensory modulation prior to self-generated sensations and support the idea that a sensory prediction is generated in parallel to motor output (Walsh and Haggard 2010), before an efference copy becomes available. Copyright © 2015 the American Physiological Society.

  12. Mathematical model of thyristor inverter including a series-parallel resonant circuit

    OpenAIRE

    Luft, M.; Szychta, E.

    2008-01-01

    The article presents a mathematical model of a thyristor inverter including a series-parallel resonant circuit, built with the aid of the state variable method. Maple procedures are used to compute current and voltage waveforms in the inverter.

  13. Mathematical Model of Thyristor Inverter Including a Series-parallel Resonant Circuit

    OpenAIRE

    Miroslaw Luft; Elzbieta Szychta

    2008-01-01

    The article presents a mathematical model of a thyristor inverter including a series-parallel resonant circuit, built with the aid of the state variable method. Maple procedures are used to compute current and voltage waveforms in the inverter.

  14. Mathematical Model of Thyristor Inverter Including a Series-parallel Resonant Circuit

    Directory of Open Access Journals (Sweden)

    Miroslaw Luft

    2008-01-01

    The article presents a mathematical model of a thyristor inverter including a series-parallel resonant circuit, built with the aid of the state variable method. Maple procedures are used to compute current and voltage waveforms in the inverter.

  15. Experimental and modelling results of a parallel-plate based active magnetic regenerator

    DEFF Research Database (Denmark)

    Tura, A.; Nielsen, Kaspar Kirstein; Rowe, A.

    2012-01-01

    The performance of a permanent magnet magnetic refrigerator (PMMR) using gadolinium parallel plates is described. The configuration and operating parameters are described in detail. Experimental results are compared to simulations using an established two-dimensional model of an active magnetic...

  16. Parallel Execution of Functional Mock-up Units in Buildings Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Ozmen, Ozgur [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Nutaro, James J. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); New, Joshua Ryan [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2016-06-30

    A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.
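
    The structure of a solver that advances several units in parallel with explicit Euler steps can be sketched as follows. This is a simplified illustration and not the report's C++/OpenMP wrapper: the `Unit` class is a hypothetical stand-in for an FMU, the units are uncoupled, and Python threads are used only to show the step-then-synchronize pattern (CPython threads will not give the speedups that OpenMP can).

```python
from concurrent.futures import ThreadPoolExecutor


class Unit:
    """Hypothetical stand-in for one FMU: a state vector and its derivative."""

    def __init__(self, state, derivative):
        self.state = list(state)
        self.derivative = derivative      # callable: state -> list of dx/dt

    def euler_step(self, h):
        # Explicit Euler: x(t + h) = x(t) + h * f(x(t))
        dx = self.derivative(self.state)
        self.state = [x + h * d for x, d in zip(self.state, dx)]


def simulate(units, h, steps, workers=4):
    """Advance all units by `steps` Euler steps of size h, one task per unit."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(steps):
            # list() waits for every unit, acting as a barrier between steps
            list(pool.map(lambda u: u.euler_step(h), units))
    return [u.state for u in units]
```

    Because every step ends with a barrier, the time per step is set by the slowest unit. This is one way to see why the report singles out load balancing: a unit with many more states than the others leaves the remaining workers idle.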

  17. Ocean Modeling and Visualization on Massively Parallel Computer

    Science.gov (United States)

    Chao, Yi; Li, P. Peggy; Wang, Ping; Katz, Daniel S.; Cheng, Benny N.

    1997-01-01

    Climate modeling is one of the grand challenges of computational science, and ocean modeling plays an important role in both understanding the current climatic conditions and predicting future climate change.

  18. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    Energy Technology Data Exchange (ETDEWEB)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah [KEPCO International Nuclear Graduate School, Ulsan (Korea, Republic of)

    2014-05-15

    Within the past decade, research on GPU massive parallelization in core simulation has produced impressive results, but unfortunately there has been little commercial application in the nuclear field, especially in reactor core simulation. The purpose of this paper is to introduce the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain on each computing device. The heterogeneous platform used in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to calculations of potentially high arithmetic intensity. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.
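
    Monte Carlo particle histories are independent of one another, which is what makes them map well onto many parallel workers. The sketch below illustrates that idea on the CPU with a toy two-dimensional model. It is not the paper's simulation or a GPU implementation: the square geometry, the absorption probability, the unit mean free path and all names (`run_batch`, `leakage_fraction`) are assumptions made up for illustration.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def run_batch(job):
    """Follow one batch of independent histories; return how many leaked."""
    seed, histories, absorb_prob, half_width = job
    rng = random.Random(seed)              # private generator per batch
    leaked = 0
    for _ in range(histories):
        x = y = 0.0                        # each particle starts at the centre
        while True:
            if rng.random() < absorb_prob:
                break                      # absorbed at this collision
            angle = rng.uniform(0.0, 2.0 * math.pi)
            step = rng.expovariate(1.0)    # free path with mean 1 (toy units)
            x += step * math.cos(angle)
            y += step * math.sin(angle)
            if abs(x) > half_width or abs(y) > half_width:
                leaked += 1                # left the square region
                break
    return leaked


def leakage_fraction(histories, batches=8, workers=4,
                     absorb_prob=0.3, half_width=2.0, base_seed=1234):
    """Fraction of particles that escape, computed batch by batch."""
    if histories <= 0 or histories % batches != 0:
        raise ValueError("histories must be a positive multiple of batches")
    per_batch = histories // batches
    jobs = [(base_seed + b, per_batch, absorb_prob, half_width)
            for b in range(batches)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leaked = sum(pool.map(run_batch, jobs))
    return leaked / histories
```

    Seeding each batch separately makes the result independent of how many workers run the batches, so a serial run and a parallel run can be compared directly. On a GPU the same decomposition would typically assign one history, or a few, to each thread.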

  19. Massive Parallelism of Monte-Carlo Simulation on Low-End Hardware using Graphic Processing Units

    International Nuclear Information System (INIS)

    Mburu, Joe Mwangi; Hah, Chang Joo Hah

    2014-01-01

    Within the past decade, research on GPU massive parallelization in core simulation has produced impressive results, but unfortunately there has been little commercial application in the nuclear field, especially in reactor core simulation. The purpose of this paper is to introduce the topic and illustrate the potential of exploiting the massively parallel nature of GPU computing on a simple Monte Carlo simulation with very minimal hardware specifications. For a comparative analysis, a simple two-dimensional Monte Carlo simulation is implemented for both the CPU and the GPU in order to evaluate the performance gain on each computing device. The heterogeneous platform used in this analysis is a slow notebook with only a 1 GHz processor. The end results are quite surprising: the speedups obtained are almost a factor of 10. In this work, we have utilized heterogeneous computing in a GPU-based approach to calculations of potentially high arithmetic intensity. By running a complex Monte Carlo simulation on the GPU platform, we have sped up the computational process by almost a factor of 10 based on one million neutrons. This shows how easy, cheap and efficient it is to use GPUs to accelerate scientific computing, and the results should encourage further exploration of this avenue, especially in nuclear reactor physics simulation, where deterministic and stochastic calculations are well suited to parallelization.

  20. Optimization Solutions for Improving the Performance of the Parallel Reduction Algorithm Using Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Ion LUNGU

    2012-01-01

    In this paper, we research, analyze and develop optimization solutions for the parallel reduction function using graphics processing units (GPUs) that implement the Compute Unified Device Architecture (CUDA), a modern and novel approach for improving the software performance of data processing applications and algorithms. Many of these applications and algorithms make use of the reduction function in their computational steps. After having designed the function and its algorithmic steps in CUDA, we have progressively developed and implemented optimization solutions for the reduction function. In order to confirm, test and evaluate the solutions' efficiency, we have developed a custom-tailored benchmark suite. We have analyzed the obtained experimental results regarding: the comparison of the execution time and bandwidth when using graphics processing units covering the main CUDA architectures (Tesla GT200, Fermi GF100, Kepler GK104) and a central processing unit; the data type influence; the binary operator's influence.
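
    The access pattern behind a tree-based parallel reduction can be shown with a short CPU sketch. This is not the authors' CUDA code or any of their optimizations. It assumes the common sequential-addressing scheme, in which the stride halves at every step, and the function name `tree_reduce` is illustrative.

```python
def tree_reduce(values, op, identity):
    """Reduce `values` with the binary operator `op` in log2(n) sweeps.

    Each iteration of the inner loop touches a disjoint pair of slots, so on
    a GPU every index i could be handled by its own thread, with a
    synchronization between sweeps.
    """
    data = list(values)
    size = 1
    while size < len(data):                # round up to a power of two
        size *= 2
    data += [identity] * (size - len(data))  # padding does not change the result

    stride = size // 2
    while stride > 0:
        for i in range(stride):            # independent "threads"
            data[i] = op(data[i], data[i + stride])
        stride //= 2                       # half as many active slots next sweep
    return data[0]
```

    The operator must be associative and, because this scheme pairs elements that are far apart, also commutative for the result to match a left-to-right reduction. Addition, maximum and minimum all qualify, which is consistent with the abstract's interest in how the choice of binary operator affects performance.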

  1. Aerospace Materials Process Modelling

    Science.gov (United States)

    1988-08-01

    Cooling Transformation diagram (CCT diagram) When an IT diagram is used in the heat process modelling, we suppose that a sudden cooling (instantaneous...processes. CE, chooses instead to study thermo-mechanical properties referring to a CCT diagram. This is thought to be more reliable to give a true... This determination is however based on the following approximations: i) A CCT diagram is valid only for the

  2. Parallel Motion Simulation of Large-Scale Real-Time Crowd in a Hierarchical Environmental Model

    Directory of Open Access Journals (Sweden)

    Xin Wang

    2012-01-01

    This paper presents a parallel real-time crowd simulation method based on a hierarchical environmental model. A dynamical model of the complex environment should be constructed to simulate the state transition and propagation of individual motions. By modeling a virtual environment where virtual crowds reside, we employ different parallel methods on a topological layer, a path layer and a perceptual layer. We propose a parallel motion path matching method based on the path layer and a parallel crowd simulation method based on the perceptual layer. Large-scale real-time crowd simulation becomes possible with these methods. Numerical experiments are carried out to demonstrate the methods and results.

  3. Parallel distributed computing in modeling of the nanomaterials production technologies

    NARCIS (Netherlands)

    Krzhizhanovskaya, V.V.; Korkhov, V.V.; Zatevakhin, M.A.; Gorbachev, Y.E.

    2008-01-01

    Simulation of physical and chemical processes occurring in the nanomaterial production technologies is a computationally challenging problem, due to the great number of coupled processes, time and length scales to be taken into account. To solve such complex problems with a good level of detail in a

  4. The vector and parallel processing of MORSE code on Monte Carlo Machine

    International Nuclear Information System (INIS)

    Hasegawa, Yukihiro; Higuchi, Kenji.

    1995-11-01

    MORSE, a multi-group Monte Carlo code for particle transport, is modified for high performance computing on the Monte Carlo machine Monte-4. The method and the results are described. Monte-4 was specially developed to realize high performance computing of Monte Carlo codes for particle transport, for which high performance has been difficult to obtain in vector processing on conventional vector processors. Monte-4 has four vector processor units with special hardware called Monte Carlo pipelines. The vectorization and parallelization of the MORSE code and the performance evaluation on Monte-4 are described. (author)

  5. Leveraging human oversight and intervention in large-scale parallel processing of open-source data

    Science.gov (United States)

    Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

    2015-05-01

    The popularity of cloud computing, along with the increased availability of cheap storage, has made it necessary to process and transform large volumes of open-source data in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is carried out along the processing chain. Second, it is often necessary to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. To address these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.

  6. Processing optimization with parallel computing for the J-PET scanner

    Directory of Open Access Journals (Sweden)

    Krzemień Wojciech

    2015-12-01

    The Jagiellonian Positron Emission Tomograph (J-PET) collaboration is developing a prototype time of flight (TOF)-positron emission tomograph (PET) detector based on long polymer scintillators. This novel approach exploits the excellent time properties of the plastic scintillators, which permit very precise time measurements. The very fast field programmable gate array (FPGA)-based front-end electronics and the data acquisition system, as well as low- and high-level reconstruction algorithms were specially developed to be used with the J-PET scanner. The TOF-PET data processing and reconstruction are time and resource demanding operations, especially in the case of a large acceptance detector that works in triggerless data acquisition mode. In this article, we discuss the parallel computing methods applied to optimize the data processing for the J-PET detector. We begin with general concepts of parallel computing and then we discuss several applications of those techniques in the J-PET data processing.

  7. Is orthographic information from multiple parafoveal words processed in parallel: An eye-tracking study.

    Science.gov (United States)

    Cutter, Michael G; Drieghe, Denis; Liversedge, Simon P

    2017-08-01

    In the current study we investigated whether orthographic information available from 1 upcoming parafoveal word influences the processing of another parafoveal word. Across 2 experiments we used the boundary paradigm (Rayner, 1975) to present participants with an identity preview of the 2 words after the boundary (e.g., hot pan ), a preview in which 2 letters were transposed between these words (e.g., hop tan ), or a preview in which the same 2 letters were substituted (e.g., hob fan ). We hypothesized that if these 2 words were processed in parallel in the parafovea then we may observe significant preview benefits for the condition in which the letters were transposed between words relative to the condition in which the letters were substituted. However, no such effect was observed, with participants fixating the words for the same amount of time in both conditions. This was the case both when the transposition was made between the final and first letter of the 2 words (e.g., hop tan as a preview of hot pan ; Experiment 1) and when the transposition maintained within word letter position (e.g., pit hop as a preview of hit pop ; Experiment 2). The implications of these findings are considered in relation to serial and parallel lexical processing during reading. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  8. Using a Multivariate Multilevel Polytomous Item Response Theory Model to Study Parallel Processes of Change: The Dynamic Association between Adolescents' Social Isolation and Engagement with Delinquent Peers in the National Youth Survey

    Science.gov (United States)

    Hsieh, Chueh-An; von Eye, Alexander A.; Maier, Kimberly S.

    2010-01-01

    The application of multidimensional item response theory models to repeated observations has demonstrated great promise in developmental research. It allows researchers to take into consideration both the characteristics of item response and measurement error in longitudinal trajectory analysis, which improves the reliability and validity of the…

  9. Application of Parallel Algorithms in an Air Pollution Model

    DEFF Research Database (Denmark)

    Georgiev, K.; Zlatev, Z.

    1999-01-01

    Proceedings of the NATO Advanced Research Workshop on Large Scale Computations in Air Pollution Modelling, Sofia, Bulgaria, 6-10 July 1998...

  10. Parallel photonic information processing at gigabyte per second data rates using transient states

    Science.gov (United States)

    Brunner, Daniel; Soriano, Miguel C.; Mirasso, Claudio R.; Fischer, Ingo

    2013-01-01

    The increasing demands on information processing require novel computational concepts and true parallelism. Nevertheless, hardware realizations of unconventional computing approaches never exceeded a marginal existence. While the application of optics in super-computing receives reawakened interest, new concepts, partly neuro-inspired, are being considered and developed. Here we experimentally demonstrate the potential of a simple photonic architecture to process information at unprecedented data rates, implementing a learning-based approach. A semiconductor laser subject to delayed self-feedback and optical data injection is employed to solve computationally hard tasks. We demonstrate simultaneous spoken digit and speaker recognition and chaotic time-series prediction at data rates beyond 1Gbyte/s. We identify all digits with very low classification errors and perform chaotic time-series prediction with 10% error. Our approach bridges the areas of photonic information processing, cognitive and information science.

  11. Business Model Process Configurations

    DEFF Research Database (Denmark)

    Taran, Yariv; Nielsen, Christian; Thomsen, Peter

    2015-01-01

    , by developing (inductively) an ontological classification framework, in view of the BM process configurations typology developed. Design/methodology/approach – Given the inconsistencies found in the business model studies (e.g. definitions, configurations, classifications) we adopted the analytical induction...

  12. Parallel Solution of Robust Nonlinear Model Predictive Control Problems in Batch Crystallization

    Directory of Open Access Journals (Sweden)

    Yankai Cao

    2016-06-01

    Representing the uncertainties with a set of scenarios, the optimization problem resulting from a robust nonlinear model predictive control (NMPC) strategy at each sampling instance can be viewed as a large-scale stochastic program. This paper solves these optimization problems using the parallel Schur complement method developed to solve stochastic programs on distributed and shared memory machines. The control strategy is illustrated with a case study of a multidimensional unseeded batch crystallization process. For this application, a robust NMPC based on min–max optimization guarantees satisfaction of all state and input constraints for a set of uncertainty realizations, and also provides better robust performance compared with open-loop optimal control, nominal NMPC, and robust NMPC minimizing the expected performance at each sampling instance. The performance of robust NMPC can be improved by generating optimization scenarios using Bayesian inference. With the efficient parallel solver, the solution time of one optimization problem is reduced from 6.7 min to 0.5 min, allowing for real-time application.

  13. Parallel performance of TORT on the CRAY J90: Model and measurement

    International Nuclear Information System (INIS)

    Barnett, A.; Azmy, Y.Y.

    1997-10-01

    A limitation on the parallel performance of TORT on the CRAY J90 is the amount of extra work introduced by the multitasking algorithm itself. The extra work beyond that of the serial version of the code, called overhead, arises from the synchronization of the parallel tasks and the accumulation of results by the master task. The goal of recent updates to TORT was to reduce the time consumed by these activities. To help understand which components of the multitasking algorithm contribute significantly to the overhead, a parallel performance model was constructed and compared to measurements of actual timings of the code

  14. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    Science.gov (United States)

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  15. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    Science.gov (United States)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called F-Nets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  16. Parallelized Genetic Identification of the Thermal-Electrochemical Model for Lithium-Ion Battery

    Directory of Open Access Journals (Sweden)

    Liqiang Zhang

    2013-01-01

    The parameters of a well-predicting model can be used as health characteristics for a lithium-ion battery. This article reports a parallelized parameter identification of the thermal-electrochemical model, which significantly reduces the time consumption of parameter identification. Since the P2D model has the best predictive capability, it is chosen for further research and expanded to the thermal-electrochemical model by coupling thermal effects and temperature-dependent parameters. A genetic algorithm (GA) is then used for parameter identification, but it takes too much time because each model simulation is long. For this reason, a computer cluster is built from surplus computing resources in our laboratory, based on the Parallel Computing Toolbox and Distributed Computing Server in MATLAB. The performance of two parallelized methods, namely Single Program Multiple Data (SPMD) and parallel FOR loop (PARFOR), is investigated, and then the parallelized GA identification is proposed. With this method, model simulations run in parallel, the parameter identification is sped up more than a dozen times, and the identification result is better than that from serial GA. This conclusion is validated by model parameter identification of a real LiFePO4 battery.

  17. Depth-Averaged Non-Hydrostatic Hydrodynamic Model Using a New Multithreading Parallel Computing Method

    Directory of Open Access Journals (Sweden)

    Ling Kang

    2017-03-01

    Compared to the hydrostatic hydrodynamic model, the non-hydrostatic hydrodynamic model can accurately simulate flows that feature vertical accelerations. However, the model’s low computational efficiency severely restricts its wider application. This paper proposes a non-hydrostatic hydrodynamic model based on a multithreading parallel computing method. The horizontal momentum equation is obtained by integrating the Navier–Stokes equations from the bottom to the free surface. The vertical momentum equation is approximated by the Keller-box scheme. A two-step method is used to solve the model equations. A parallel strategy based on block decomposition computation is utilized. The original computational domain is subdivided into two subdomains that are physically connected via a virtual boundary technique. Two sub-threads are created and tasked with the computation of the two subdomains. The producer–consumer model and the thread lock technique are used to achieve synchronous communication between sub-threads. The validity of the model was verified by solitary wave propagation experiments over a flat bottom and slope, followed by two sinusoidal wave propagation experiments over a submerged breakwater. The parallel computing method proposed here was found to effectively enhance computational efficiency and save 20%–40% computation time compared to serial computing. The parallel speedup and parallel efficiency are approximately 1.45 and 72%, respectively. The parallel computing method makes a contribution to the popularization of non-hydrostatic models.
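
The figures quoted in this abstract can be checked against the usual definitions of speedup and parallel efficiency. The sketch below is only an illustration: it reads the reported value of 1.45 as a dimensionless speedup on two threads, and the timings are hypothetical numbers chosen to reproduce that ratio, not measurements from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of serial to parallel wall-clock time."""
    return t_serial / t_parallel


def efficiency(s: float, n_threads: int) -> float:
    """Speedup divided by the number of threads used."""
    return s / n_threads


def time_saved(s: float) -> float:
    """Fraction of the serial run time saved at a given speedup."""
    return 1.0 - 1.0 / s


# Hypothetical timings (seconds) chosen to give a speedup of 1.45.
s = speedup(t_serial=145.0, t_parallel=100.0)
print(f"speedup    = {s:.2f}")
print(f"efficiency = {efficiency(s, n_threads=2):.1%}")
print(f"time saved = {time_saved(s):.1%}")
```

Under these assumptions a speedup of 1.45 on two threads corresponds to an efficiency near 72% and a time saving inside the reported 20%–40% range, so the three quoted numbers are mutually consistent.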

  18. A Parallel, Multi-Scale Watershed-Hydrologic-Inundation Model with Adaptively Switching Mesh for Capturing Flooding and Lake Dynamics

    Science.gov (United States)

    Ji, X.; Shen, C.

    2017-12-01

    Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. This model applies a semi-implicit, semi-Lagrangian (SISL) scheme in solving the dynamic wave equations, and with the assistance of the multi-mesh method, it also adaptively chooses the dynamic wave equation only in the area of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.
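
The CFL restriction mentioned in this abstract can be made concrete with a small calculation. The sketch below assumes an explicit shallow-water scheme with the common bound dt <= C * dx / (|u| + sqrt(g*h)); the function name, the Courant number and the mesh sizes are illustrative assumptions, and the SISL scheme used by the authors is designed precisely to relax this kind of limit.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def cfl_time_step(dx: float, velocity: float, depth: float,
                  courant: float = 1.0) -> float:
    """Largest time step satisfying dt <= C * dx / (|u| + sqrt(g * h))."""
    wave_speed = abs(velocity) + math.sqrt(G * depth)
    return courant * dx / wave_speed


# Hypothetical cells: a coarse watershed cell and a fine floodplain cell
# with faster local flow.
coarse = cfl_time_step(dx=100.0, velocity=1.0, depth=2.0)
fine = cfl_time_step(dx=10.0, velocity=3.0, depth=2.0)
print(f"coarse mesh: {coarse:.2f} s, fine mesh: {fine:.2f} s")
```

Both a smaller cell size and a larger local velocity shrink the admissible step, which is why enabling the fine mesh only where the flood wave is present can save computation.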

  19. Modeling the Fracture of Ice Sheets on Parallel Computers

    Energy Technology Data Exchange (ETDEWEB)

    Waisman, Haim [Columbia Univ., New York, NY (United States); Tuminaro, Ray [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2013-10-10

    The objective of this project was to investigate the complex fracture of ice and understand its role within larger ice sheet simulations and global climate change. This objective was achieved by developing novel physics-based models for ice, novel numerical tools to enable the modeling of the physics, and by collaboration with experts in the ice community. At the present time, ice fracture is not explicitly considered within ice sheet models, due in part to the large computational costs associated with the accurate modeling of this complex phenomenon. However, fracture not only plays an extremely important role in regional behavior but also influences ice dynamics over much larger zones in ways that are currently not well understood. To this end, our research findings through this project offer significant advancement to the field and close a large gap of knowledge in understanding and modeling the fracture of ice sheets in the polar regions. Thus, we believe that our objective has been achieved and our research accomplishments are significant. This is corroborated through a set of published papers, posters and presentations at technical conferences in the field. In particular, significant progress has been made in the mechanics of ice, fracture of ice sheets and ice shelves in polar regions, and sophisticated numerical methods that enable the solution of the physics in an efficient way.

  20. Dynamic modeling and experiment of a new type of parallel servo press considering gravity counterbalance

    Science.gov (United States)

    He, Jun; Gao, Feng; Bai, Yongjun; Wu, Shengfu

    2013-11-01

    The large-capacity servo press is traditionally realized by means of redundant actuation; however, this introduces over-constraint and interference among actuators, which increase the control difficulty and the product cost. A new type of press mechanism with parallel topology is presented to develop a mechanical servo press with high stamping capacity. The dynamic model considering gravity counterbalance is proposed based on the virtual work principle, and then the effect of the counterbalance cylinder on the dynamic performance of the servo press is studied. It is found that the motor torque required to operate the press is much lower than in other cases when the ratio of the counterbalance force to the gravity of the ram is in the vicinity of 1.0. The stamping force of the real press prototype can reach up to 25 MN at a position 13 mm away from the bottom dead center. The typical deep-drawing process with 1200 mm stroke at 8 strokes per minute is planned by means of a fifth-order polynomial. Under this process condition, the driving torques are calculated based on the above dynamic model, and a torque measuring test is also carried out on the prototype. It is shown that the trend of the calculated torque curve is consistent with the measured result and that the average error is less than 15%. The parallel mechanism is introduced into the development of large-capacity servo presses to avoid the over-constraint and interference of traditional redundant actuation, and its dynamic characteristics with gravity counterbalance are presented.

  1. Application of the parallel processing computer to a nuclear disaster prevention support system

    Energy Technology Data Exchange (ETDEWEB)

    Shigehiro, Nukatsuka; Osami, Watanabe [Mitsubishi Heavy Industries, LTD (Japan)

    2003-07-01

    At the time of a nuclear emergency, it is important to identify the type and the cause of the accident. In addition, it is important to provide adequate information for the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity on the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed, which provides not only the current state of the nuclear power plant and the influence of the radioactivity on the environment, but also a prediction of the future development of the accident. In order to provide accurate results of these analyses quickly, MEASURES utilizes various techniques, such as a multiple nesting method which narrows down the calculation area gradually, and a parallel processing computer for three-dimensional analyses, such as air current distribution analysis. In this paper, the outline and the features of MEASURES are presented, with a focus on the use of the parallel processing computer for the three-dimensional air current distribution analysis. (authors)

  2. Online measurement for geometrical parameters of wheel set based on structure light and CUDA parallel processing

    Science.gov (United States)

    Wu, Kaihua; Shao, Zhencheng; Chen, Nian; Wang, Wenjie

    2018-01-01

    The degree of wear of the wheel set tread is one of the main factors that influence the safety and stability of a running train. The geometrical parameters mainly include flange thickness and flange height. Line-structured laser light was projected on the wheel tread surface. The geometrical parameters can be deduced from the profile image. An online image acquisition system was designed based on asynchronous reset of the CCD and a CUDA parallel processing unit. The image acquisition was performed in hardware interrupt mode. A high-efficiency parallel segmentation algorithm based on CUDA was proposed. The algorithm first divides the image into smaller squares, and then extracts the squares of the target by a fusion of the k-means and STING clustering image segmentation algorithms. Segmentation time is less than 0.97 ms. A considerable acceleration ratio compared with the CPU serial calculation was obtained, which greatly improved the real-time image processing capacity. When the wheel set is running within a limited speed, the system placed along the railway line can measure the geometrical parameters automatically. The maximum measuring speed is 120 km/h.

  3. Application of the parallel processing computer to a nuclear disaster prevention support system

    International Nuclear Information System (INIS)

    Shigehiro, Nukatsuka; Osami, Watanabe

    2003-01-01

    At the time of a nuclear emergency, it is important to identify the type and the cause of the accident. In addition, it is important to provide adequate information for the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity on the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed, which provides not only the current state of the nuclear power plant and the influence of the radioactivity on the environment, but also a prediction of the future development of the accident. In order to provide accurate results of these analyses quickly, MEASURES utilizes various techniques, such as a multiple nesting method which narrows down the calculation area gradually, and a parallel processing computer for three-dimensional analyses, such as air current distribution analysis. In this paper, the outline and the features of MEASURES are presented, with a focus on the use of the parallel processing computer for the three-dimensional air current distribution analysis. (authors)

  4. Single product lot-sizing on unrelated parallel machines with non-decreasing processing times

    Science.gov (United States)

    Eremeev, A.; Kovalyov, M.; Kuznetsov, P.

    2018-01-01

    We consider a problem in which at least a given quantity of a single product has to be partitioned into lots, and lots have to be assigned to unrelated parallel machines for processing. In one version of the problem, the maximum machine completion time should be minimized, in another version of the problem, the sum of machine completion times is to be minimized. Machine-dependent lower and upper bounds on the lot size are given. The product is either assumed to be continuously divisible or discrete. The processing time of each machine is defined by an increasing function of the lot volume, given as an oracle. Setup times and costs are assumed to be negligibly small, and therefore, they are not considered. We derive optimal polynomial time algorithms for several special cases of the problem. An NP-hard case is shown to admit a fully polynomial time approximation scheme. An application of the problem in energy efficient processors scheduling is considered.

  5. Parallelized preconditioned model building algorithm for matrix factorization

    OpenAIRE

    Kaya, Kamer; Birbil, İlker; Öztürk, Mehmet Kaan; Gohari, Amir

    2017-01-01

    Matrix factorization is a common task underlying several machine learning applications such as recommender systems, topic modeling, or compressed sensing. Given a large and possibly sparse matrix A, we seek two smaller matrices W and H such that their product is as close to A as possible. The objective is minimizing the sum of square errors in the approximation. Typically such problems involve hundreds of thousands of unknowns, so an optimizer must be exceptionally efficient. In this study, a...
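
The objective described in this abstract, minimizing the sum of squared errors between A and the product WH, can be illustrated with a deliberately simple solver. The sketch below uses plain gradient descent in pure Python; it is not the preconditioned model building algorithm of the paper, and the function names, step size, iteration count and test matrix are all illustrative assumptions.

```python
import random


def matmul(X, Y):
    """Product of two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def squared_error(A, W, H):
    """Sum of squared entries of the residual A - W H."""
    P = matmul(W, H)
    return sum((a - p) ** 2 for ra, rp in zip(A, P) for a, p in zip(ra, rp))


def factorize(A, rank, steps=2000, lr=0.01, seed=0):
    """Approximate A by W H using plain gradient descent (illustrative only)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() for _ in range(n)] for _ in range(rank)]
    for _ in range(steps):
        P = matmul(W, H)
        R = [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(A, P)]
        # Descent directions: R H^T for W and W^T R for H, both computed
        # from the current iterate before either factor is updated.
        step_W = matmul(R, transpose(H))
        step_H = matmul(transpose(W), R)
        W = [[w + lr * g for w, g in zip(rw, rg)] for rw, rg in zip(W, step_W)]
        H = [[h + lr * g for h, g in zip(rh, rg)] for rh, rg in zip(H, step_H)]
    return W, H


# Small rank-1 matrix, so a rank-1 factorization can fit it closely.
A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
W, H = factorize(A, rank=1)
print(f"squared error after fitting: {squared_error(A, W, H):.3e}")
```

For the problem sizes mentioned in the abstract (hundreds of thousands of unknowns) a dense pure-Python loop like this would be far too slow, which is the motivation for specialized and parallel optimizers.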

  6. Business process model repositories : efficient process retrieval

    NARCIS (Netherlands)

    Yan, Z.

    2012-01-01

    As organizations increasingly work in process-oriented manner, the number of business process models that they develop and have to maintain increases. As a consequence, it has become common for organizations to have collections of hundreds or even thousands of business process models. When a

  7. A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 3

    International Nuclear Information System (INIS)

    Wiest, R.; Demuynck, J.; Benard, M.; Rohmer, M.M.; Ernenwein, R.

    1991-01-01

    This series of three papers presents a program system for ab initio molecular orbital calculations on vector and parallel computers. Part III is devoted to the four-index transformation on a molecular orbital basis of size NMO of the file of two-electron integrals (pq||rs) generated by a contracted Gaussian set of size NATO (number of atomic orbitals). A fast Yoshimine algorithm first sorts the (pq||rs) integrals with respect to index pq only. This file of half-sorted integrals labelled by their rs-index can be processed without further modification to generate either the transformed integrals or the supermatrix elements. The large memory available on the CRAY-2 has made it possible to implement the transformation algorithm proposed by Bender in 1972, which requires a core-storage allocation varying as (NATO)^3. Two versions of Bender's algorithm are included in the present program. The first version is an in-core version, where the complete file of accumulated contributions to transformed integrals is stored and updated in central memory. This version has been parallelized by distributing over a limited number of logical tasks the NATO steps corresponding to the scanning of the outermost loop. The second version is an out-of-core version, in which twin files are alternately used as input and output for the accumulated contributions to transformed integrals. This version is not parallel. The choice of one or the other version and (for version 1) the determination of the number of tasks depend upon the balance between the available and the requested amounts of storage. The storage management and the choice of the proper version are carried out automatically using dynamic storage allocation. Both versions are vectorized and take advantage of the molecular symmetry. (orig.)

  8. Minimizing makespan in a two-stage flow shop with parallel batch-processing machines and re-entrant jobs

    Science.gov (United States)

    Huang, J. D.; Liu, J. J.; Chen, Q. X.; Mao, N.

    2017-06-01

    Against a background of heat-treatment operations in mould manufacturing, a two-stage flow-shop scheduling problem is described for minimizing makespan with parallel batch-processing machines and re-entrant jobs. The weights and release dates of jobs are non-identical, but job processing times are equal. A mixed-integer linear programming model is developed and tested with small-scale scenarios. Given that the problem is NP hard, three heuristic construction methods with polynomial complexity are proposed. The worst case of the new constructive heuristic is analysed in detail. A method for computing lower bounds is proposed to test heuristic performance. Heuristic efficiency is tested with sets of scenarios. Compared with the two improved heuristics, the performance of the new constructive heuristic is superior.

  9. Jet formation and equatorial superrotation in Jupiter's atmosphere: Numerical modelling using a new efficient parallel code

    Science.gov (United States)

    Rivier, Leonard Gilles

    Using an efficient parallel code solving the primitive equations of atmospheric dynamics, the jet structure of a Jupiter-like atmosphere is modeled. In the first part of this thesis, a parallel spectral code solving both the shallow water equations and the multi-level primitive equations of atmospheric dynamics is built. The implementation of this code, called BOB, is done so that it runs effectively on an inexpensive cluster of workstations. A one-dimensional decomposition and transposition method ensuring load balancing among processes is used. The Legendre transform is cache-blocked. Computing the Legendre polynomials used in the spectral method "on the fly" produces a lower memory footprint and enables high-resolution runs on relatively small memory machines. Performance studies are done using a cluster of workstations located at the National Center for Atmospheric Research (NCAR). BOB's performance is compared to the parallel benchmark code PSTSWM and the dynamical core of NCAR's CCM3.6.6. In both cases, the comparison favors BOB. In the second part of this thesis, the primitive equation version of the code described in part I is used to study the formation of organized zonal jets and equatorial superrotation in a planetary atmosphere where the parameters are chosen to best model the upper atmosphere of Jupiter. Two levels are used in the vertical and only large-scale forcing is present. The model is forced towards a baroclinically unstable flow, so that eddies are generated by baroclinic instability. We consider several types of forcing, acting on either the temperature or the momentum field. We show that only under very specific parametric conditions do zonally elongated structures form and persist, resembling the jet structure observed near the cloud top level (1 bar) on Jupiter. We also study the effect of an equatorial heat source, meant to be a crude representation of the effect of the deep convective planetary interior onto the outer atmospheric layer. We

  10. Image reconstruction method for electrical capacitance tomography based on the combined series and parallel normalization model

    International Nuclear Information System (INIS)

    Dong, Xiangyuan; Guo, Shuqing

    2008-01-01

    In this paper, a novel image reconstruction method for electrical capacitance tomography (ECT) based on the combined series and parallel model is presented. A regularization technique is used to obtain a stabilized solution of the inverse problem. Also, the adaptive coefficient of the combined model is deduced by numerical optimization. Simulation results indicate that it can produce higher quality images when compared to the algorithm based on the parallel or series models for the cases tested in this paper. It provides a new algorithm for ECT application

  11. Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

    International Nuclear Information System (INIS)

    Ohshima, Hiroyuki

    2001-10-01

    A whole core thermal-hydraulic analysis program ACT is being developed for the purpose of evaluating detailed in-core thermal hydraulic phenomena of fast reactors, including the effect of the flow between wrapper-tube walls (inter-wrapper flow) under various reactor operation conditions. As appropriate boundary conditions in addition to a detailed modeling of the core are essential for accurate simulations of in-core thermal hydraulics, ACT consists of not only fuel assembly and inter-wrapper flow analysis modules but also a heat transport system analysis module that gives the response of the plant dynamics to the core model. This report describes the incorporation of a simplified model into the fuel assembly analysis module and program parallelization by a message passing method toward large-scale simulations. ACT has a fuel assembly analysis module that can simulate a whole fuel pin bundle in each fuel assembly of the core; however, it may take considerable CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save simulation time and resources. This simplified model is applied to those fuel assemblies in a core where detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed, and the parallelization was optimized together with improvements to the numerical simulation algorithm of ACT. The Message Passing Interface (MPI) is applied to data communication between processes and synchronization in parallel calculations. Parallelized ACT was verified through a comparison simulation with the original one. In addition to the above work, input manuals of the core analysis module and the heat transport system analysis module have been prepared. (author)

  12. A Tool for Performance Modeling of Parallel Programs

    Directory of Open Access Journals (Sweden)

    J.A. González

    2003-01-01

    Current analytical models for performance prediction try to characterize the performance behavior of actual machines through a small set of parameters. In practice, substantial deviations are observed. These differences are due to factors such as memory hierarchies or network latency. A natural approach is to associate a different proportionality constant with each basic block, and analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, using this approach implies that the parameters must be evaluated for each algorithm. This is a heavy task, implying experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms. Software support is required. We present a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from the execution of the resulting code are analyzed with an interactive interpreter, giving us, among other information, the values of those parameters.

  13. Process model repositories and PNML

    NARCIS (Netherlands)

    Hee, van K.M.; Post, R.D.J.; Somers, L.J.A.M.; Werf, van der J.M.E.M.; Kindler, E.

    2004-01-01

    Bringing system and process models together in repositories facilitates the interchange of model information between modelling tools, and allows the combination and interlinking of complementary models. Petriweb is a web application for managing such repositories. It supports hierarchical process

  14. 3D Body Scanning Measurement System Associated with RF Imaging, Zero-padding and Parallel Processing

    Directory of Open Access Journals (Sweden)

    Kim Hyung Tae

    2016-04-01

    This work presents a novel signal processing method for high-speed 3D body measurements using millimeter waves with a general processing unit (GPU) and zero-padding fast Fourier transform (ZPFFT). The proposed measurement system consists of a radio-frequency (RF) antenna array for a penetrable measurement, a high-speed analog-to-digital converter (ADC) for significant data acquisition, and a general processing unit for fast signal processing. The RF waves of the transmitter and the receiver are converted to real and imaginary signals that are sampled by a high-speed ADC and synchronized with the kinematic positions of the scanner. Because the distance between the surface and the antenna is related to the peak frequency of the conjugate signals, a fast Fourier transform (FFT) is applied to the signal processing after the sampling. The sampling time is finite owing to a short scanning time, and the physical resolution needs to be increased; further, zero-padding is applied to interpolate the spectra of the sampled signals to consider a 1/m floating point frequency. The GPU and parallel algorithm are applied to accelerate the speed of the ZPFFT because of the large number of additional mathematical operations of the ZPFFT. 3D body images are finally obtained by spectrograms that are the arrangement of the ZPFFT in a 3D space.
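The zero-padding step described in this abstract can be illustrated with a small sketch. It is not the authors' GPU implementation: it uses a naive CPU-side discrete Fourier transform from the Python standard library, and the test tone, sample rate, and function names (`dft_magnitudes`, `peak_frequency`) are hypothetical choices made only to show how padding refines the estimated peak frequency.

```python
import cmath
import math


def dft_magnitudes(samples, n_fft):
    """Magnitude spectrum of `samples` zero-padded to length n_fft (naive O(n^2) DFT)."""
    padded = list(samples) + [0.0] * (n_fft - len(samples))
    mags = []
    for k in range(n_fft // 2):  # positive-frequency bins only
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n, x in enumerate(padded))
        mags.append(abs(acc))
    return mags


def peak_frequency(samples, sample_rate, n_fft):
    """Frequency (Hz) of the largest spectral bin after zero-padding to n_fft."""
    mags = dft_magnitudes(samples, n_fft)
    k_peak = max(range(len(mags)), key=mags.__getitem__)
    return k_peak * sample_rate / n_fft


# Hypothetical test signal: a 10.3 Hz tone sampled at 64 Hz for 0.5 s (32 samples).
SAMPLE_RATE = 64.0
TRUE_FREQ = 10.3
signal = [math.cos(2 * math.pi * TRUE_FREQ * n / SAMPLE_RATE) for n in range(32)]

coarse = peak_frequency(signal, SAMPLE_RATE, n_fft=32)   # bin spacing 2 Hz
fine = peak_frequency(signal, SAMPLE_RATE, n_fft=512)    # bin spacing 0.125 Hz
```

Zero-padding adds no new information; it only samples the same underlying spectrum on a finer grid, which is why the padded estimate lands closer to the true tone. The extra bins are also the source of the additional arithmetic that the abstract says motivates GPU acceleration.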

  15. Algorithms for a parallel implementation of Hidden Markov Models with a small state space

    DEFF Research Database (Denmark)

    Nielsen, Jesper; Sand, Andreas

    2011-01-01

    Two of the most important algorithms for Hidden Markov Models are the forward and the Viterbi algorithms. We show how formulating these using linear algebra naturally lends itself to parallelization. Although the obtained algorithms are slow for Hidden Markov Models with large state spaces...
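As a rough illustration of the linear-algebra formulation mentioned here, the forward recursion can be written as a sequence of matrix-vector products followed by elementwise scaling. The sketch below is an assumption-laden reading of that idea, not the authors' code: the two-state model and the names `forward_likelihood` and `brute_force_likelihood` are hypothetical, and the products are evaluated sequentially rather than in parallel.

```python
from itertools import product


def forward_likelihood(pi, A, B, obs):
    """P(obs) via the forward recursion written as matrix-vector products.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of emitting symbol o in state i
    """
    n = len(pi)
    # alpha_1 = pi (elementwise *) B[:, obs_1]
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # alpha_t = (A^T alpha_{t-1}) (elementwise *) B[:, o]
        # Each output entry j is an independent dot product, which is the
        # part that could be computed in parallel.
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def brute_force_likelihood(pi, A, B, obs):
    """Reference value: sum over every possible hidden state path."""
    n = len(pi)
    total = 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total


# Hypothetical two-state, two-symbol model used only for illustration.
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
OBS = [0, 1, 1, 0]
```

Calling `forward_likelihood(PI, A, B, OBS)` and `brute_force_likelihood(PI, A, B, OBS)` should give the same value up to rounding; the brute-force version is included only as a check on the recursion.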

  16. Parallel shooting methods for finding steady state solutions to engine simulation models

    DEFF Research Database (Denmark)

    Andersen, Stig Kildegård; Thomsen, Per Grove; Carlsen, Henrik

    2007-01-01

    Parallel single- and multiple shooting methods were tested for finding periodic steady state solutions to a Stirling engine model. The model was used to illustrate features of the methods and possibilities for optimisations. Performance was measured using simulation of an experimental data set...

  17. Error modelling and experimental validation of a planar 3-PPR parallel manipulator with joint clearances

    DEFF Research Database (Denmark)

    Wu, Guanglei; Bai, Shaoping; Kepler, Jørgen Asbøl

    2012-01-01

    This paper deals with the error modelling and analysis of a 3-PPR planar parallel manipulator with joint clearances. The kinematics and the Cartesian workspace of the manipulator are analyzed. An error model is established with considerations of both configuration errors and joint clearances. Using...

  18. A one-dimensional heat transfer model for parallel-plate thermoacoustic heat exchangers

    NARCIS (Netherlands)

    de Jong, Anne; Wijnant, Ysbrand H.; de Boer, Andries

    2014-01-01

    A one-dimensional (1D) laminar oscillating flow heat transfer model is derived and applied to parallel-plate thermoacoustic heat exchangers. The model can be used to estimate the heat transfer from the solid wall to the acoustic medium, which is required for the heat input/output of thermoacoustic

  19. Modelling and simulation of multiple single - phase induction motor in parallel connection

    Directory of Open Access Journals (Sweden)

    Sujitjorn, S.

    2006-11-01

    A mathematical model for parallel connected n-multiple single-phase induction motors in generalized state-space form is proposed in this paper. The motor group draws electric power from one inverter. The model is developed by the dq-frame theory and was tested against four loading scenarios in which satisfactory results were obtained.

  20. Massively Parallel Signal Processing using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction.

    Science.gov (United States)

    Wilson, J Adam; Williams, Justin C

    2009-01-01

    The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
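The first stage of the signal chain described above, a spatial filter expressed as a matrix-matrix multiplication, can be sketched in a few lines. This is a plain CPU illustration, not the CUDA implementation from the study; the common-average-reference filter and the toy data are assumptions chosen for the example, and `spatial_filter` is a hypothetical helper name. The last line reproduces the speedup arithmetic from the quoted timings.

```python
def spatial_filter(weights, data):
    """Apply a spatial filter as a matrix-matrix product: weights @ data.

    weights: (n_out x n_channels), data: (n_channels x n_samples).
    """
    n_samples = len(data[0])
    return [[sum(w * data[c][s] for c, w in enumerate(row))
             for s in range(n_samples)]
            for row in weights]


def common_average_reference(n_channels):
    """Hypothetical filter choice: subtract the mean of all channels from each."""
    return [[(1.0 if i == j else 0.0) - 1.0 / n_channels
             for j in range(n_channels)]
            for i in range(n_channels)]


# Toy data: 3 channels x 2 samples (made-up values).
data = [[1.0, 2.0],
        [3.0, 4.0],
        [5.0, 6.0]]
filtered = spatial_filter(common_average_reference(3), data)

# Speedup implied by the timings quoted in the abstract (933 ms vs 27 ms).
speedup = 933 / 27
```

Every output element is an independent dot product, which is what makes this stage a natural fit for a GPU. The ratio 933/27 is about 34.6, consistent with the "nearly 35 times" figure in the abstract.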

  1. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  2. Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

    Science.gov (United States)

    Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

    2017-12-01

    We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea ice forecasting and geophysical process studies over geographical domains of several million square kilometers, such as the Arctic region.

  3. New physics beyond the standard model of particle physics and parallel universes

    Energy Technology Data Exchange (ETDEWEB)

    Plaga, R. [Franzstr. 40, 53111 Bonn (Germany)]. E-mail: rainer.plaga@gmx.de

    2006-03-09

    It is shown that if, and only if, 'parallel universes' exist, an electroweak vacuum that is expected to have decayed since the big bang with a high probability might exist. It would neither necessarily render our existence unlikely nor could it be observed. In this special case the observation of certain combinations of Higgs-boson and top-quark masses, for which the standard model predicts such a decay, cannot be interpreted as evidence for new physics at low energy scales. The question of whether parallel universes exist is of interest to our understanding of the standard model of particle physics.

  4. Interaction Admittance Based Modeling of Multi-Paralleled Grid-Connected Inverter with LCL-Filter

    DEFF Research Database (Denmark)

    Lu, Minghui; Blaabjerg, Frede; Wang, Xiongfei

    2016-01-01

    This paper investigates the mutual interaction and stability issues of multi-parallel LCL-filtered inverters. The stability and power quality of multiple grid-tied inverters are gaining more and more research attention as the penetration of renewables increases. In this paper, interactions and coupling effects among the multi-paralleled inverters and power grid are explicitly revealed. An Interaction Admittance concept is introduced to express and model the interaction through the physical admittances of the network. Compared to the existing modeling methods, the proposed analysis provides...

  5. Modeling styles in business process modeling

    NARCIS (Netherlands)

    Pinggera, J.; Soffer, P.; Zugal, S.; Weber, B.; Weidlich, M.; Fahland, D.; Reijers, H.A.; Mendling, J.; Bider, I.; Halpin, T.; Krogstie, J.; Nurcan, S.; Proper, E.; Schmidt, R.; Soffer, P.; Wrycza, S.

    2012-01-01

    Research on quality issues of business process models has recently begun to explore the process of creating process models. As a consequence, the question arises whether different ways of creating process models exist. In this vein, we observed 115 students engaged in the act of modeling, recording

  6. A New Track Reconstruction Algorithm suitable for Parallel Processing based on Hit Triplets and Broken Lines

    Directory of Open Access Journals (Sweden)

    Schöning André

    2016-01-01

    Track reconstruction in high track multiplicity environments at current and future high rate particle physics experiments is a big challenge and very time consuming. The search for track seeds and the fitting of track candidates are usually the most time consuming steps in the track reconstruction. Here, a new and fast track reconstruction method based on hit triplets is proposed which exploits a three-dimensional fit model including multiple scattering and hit uncertainties from the very start, including the search for track seeds. The hit triplet based reconstruction method assumes a homogeneous magnetic field, which allows an analytical solution for the triplet fit result. This method is highly parallelizable, needs fewer operations than other standard track reconstruction methods and is therefore ideal for implementation on parallel computing architectures. The proposed track reconstruction algorithm has been studied in the context of the Mu3e-experiment and a typical LHC experiment.

  7. Global restructuring of the CPM-2 transport algorithm for vector and parallel processing

    International Nuclear Information System (INIS)

    Vujic, J.L.; Martin, W.R.

    1989-01-01

    The CPM-2 code is an assembly transport code based on the collision probability (CP) method. It can in principle be applied to global reactor problems, but its excessive computational demands prevent this application. Therefore, a new transport algorithm for CPM-2 has been developed for vector-parallel architectures, which has resulted in an overall factor of 20 speedup (wall clock) on the IBM 3090-600E. This paper presents the detailed results of this effort as well as a brief description of the ongoing effort to remove some of the modeling limitations in CPM-2 that inhibit its use for global applications, such as the use of the pure CP treatment and the assumption of isotropic scattering.

  8. Parallel Processing and Bio-inspired Computing for Biomedical Image Registration

    Directory of Open Access Journals (Sweden)

    Silviu Ioan Bejinariu

    2014-07-01

    Image Registration (IR) is an optimization problem computing optimal parameters of a geometric transform used to overlay one or more source images to a given model by maximizing a similarity measure. In this paper the use of bio-inspired optimization algorithms in image registration is analyzed. Results obtained by means of three different algorithms are compared: Bacterial Foraging Optimization Algorithm (BFOA), Genetic Algorithm (GA) and Clonal Selection Algorithm (CSA). Depending on the image type, the registration may be area based, which is slow but more precise, or feature based, which is faster. In this paper a feature based approach based on the Scale Invariant Feature Transform (SIFT) is proposed. Finally, results obtained using sequential and parallel implementations on multi-core systems for area based and feature based image registration are compared.

  9. A novel conceptual design of parallel nitrogen expansion liquefaction process for small-scale LNG (liquefied natural gas) plant in skid-mount packages

    International Nuclear Information System (INIS)

    He, Tianbiao; Ju, Yonglin

    2014-01-01

    The utilization of unconventional natural gas is still a great challenge for China due to its distribution locations and small reserves. Thus, liquefying the unconventional natural gas by using small-scale LNG plant in skid-mount packages is a good choice with great economic benefits. A novel conceptual design of parallel nitrogen expansion liquefaction process for small-scale plant in skid-mount packages has been proposed. First, a process configuration is designed. Then, thermodynamic analysis of the process is conducted. Next, an optimization model with genetic algorithm method is developed to optimize the process. Finally, the flexibilities of the process are tested by two different feed gases. In conclusion, the proposed parallel nitrogen expansion liquefaction process can be used in small-scale LNG plant in skid-mount packages with high exergy efficiency and great economic benefits. Highlights: • A novel design of parallel nitrogen expansion liquefaction process is proposed. • Genetic algorithm is applied to optimize the novel process. • The unit energy consumption of optimized process is 0.5163 kWh/Nm³. • The exergy efficiency of the optimized case is 0.3683. • The novel process has a good flexibility for different feed gas conditions.

  10. Stage-by-Stage and Parallel Flow Path Compressor Modeling for a Variable Cycle Engine

    Science.gov (United States)

    Kopasakis, George; Connolly, Joseph W.; Cheng, Larry

    2015-01-01

    This paper covers the development of stage-by-stage and parallel flow path compressor modeling approaches for a Variable Cycle Engine. The stage-by-stage compressor modeling approach is an extension of a technique for lumped volume dynamics and performance characteristic modeling. It was developed to improve the accuracy of axial compressor dynamics over lumped volume dynamics modeling. The stage-by-stage compressor model presented here is formulated into a parallel flow path model that includes both axial and rotational dynamics. This is done to enable the study of compressor and propulsion system dynamic performance under flow distortion conditions. The approaches utilized here are generic and should be applicable for the modeling of any axial flow compressor design.

  11. Massively parallel signal processing using the graphics processing unit for real-time brain-computer interface feature extraction

    Directory of Open Access Journals (Sweden)

    J. Adam Wilson

    2009-07-01

    The clock speeds of modern computer processors have nearly plateaued in the past five years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card (GPU) was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally-intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a CPU-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.

  12. Analysis and Modeling of Circulating Current in Two Parallel-Connected Inverters

    DEFF Research Database (Denmark)

    Maheshwari, Ram Krishan; Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand

    2015-01-01

    Parallel-connected inverters are gaining attention for high power applications because of the limited power handling capability of the power modules. Moreover, the parallel-connected inverters may have low total harmonic distortion of the ac current if they are operated with the interleaved pulse-width modulation (PWM). However, the interleaved PWM causes a circulating current between the inverters, which in turn causes additional losses. A model describing the dynamics of the circulating current is presented in this study which shows that the circulating current depends on the common-mode voltage. Using this model, the circulating current between two parallel-connected inverters is analysed in this study. The peak and root mean square (rms) values of the normalised circulating current are calculated for different PWM methods, which makes this analysis a valuable tool to design a filter for the circulating...

  13. PARALLEL ADAPTIVE MULTILEVEL SAMPLING ALGORITHMS FOR THE BAYESIAN ANALYSIS OF MATHEMATICAL MODELS

    KAUST Repository

    Prudencio, Ernesto; Cheung, Sai Hung

    2012-01-01

    In recent years, Bayesian model updating techniques based on measured data have been applied to many engineering and applied science problems. At the same time, parallel computational platforms are becoming increasingly more powerful and are being used more frequently by the engineering and scientific communities. Bayesian techniques usually require the evaluation of multi-dimensional integrals related to the posterior probability density function (PDF) of uncertain model parameters. The fact that such integrals cannot be computed analytically motivates the research of stochastic simulation methods for sampling posterior PDFs. One such algorithm is the adaptive multilevel stochastic simulation algorithm (AMSSA). In this paper we discuss the parallelization of AMSSA, formulating the necessary load balancing step as a binary integer programming problem. We present a variety of results showing the effectiveness of load balancing on the overall performance of AMSSA in a parallel computational environment.

  14. Optimal parallel algorithms for problems modeled by a family of intervals

    Science.gov (United States)

    Olariu, Stephan; Schwing, James L.; Zhang, Jingyuan

    1992-01-01

    A family of intervals on the real line provides a natural model for a vast number of scheduling and VLSI problems. Recently, a number of parallel algorithms to solve a variety of practical problems on such a family of intervals have been proposed in the literature. Computational tools are developed, and it is shown how they can be used for the purpose of devising cost-optimal parallel algorithms for a number of interval-related problems including finding a largest subset of pairwise nonoverlapping intervals, a minimum dominating subset of intervals, along with algorithms to compute the shortest path between a pair of intervals and, based on the shortest path, a parallel algorithm to find the center of the family of intervals. More precisely, with an arbitrary family of n intervals as input, all algorithms run in O(log n) time using O(n) processors in the EREW-PRAM model of computation.
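One of the problems listed above, finding a largest subset of pairwise nonoverlapping intervals, has a well-known sequential greedy solution that helps make the problem statement concrete. The sketch below is that sequential baseline only; it is not the O(log n) EREW-PRAM algorithm the abstract refers to, and the function name and sample intervals are hypothetical.

```python
def max_nonoverlapping(intervals):
    """Largest subset of pairwise non-overlapping closed intervals (greedy).

    Sorting by right endpoint and always keeping the interval that ends
    first leaves the most room for the remaining ones.
    """
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start > last_end:  # closed intervals: a shared endpoint counts as overlap
            chosen.append((start, end))
            last_end = end
    return chosen


# Made-up family of intervals on the real line.
family = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
          (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
selected = max_nonoverlapping(family)
```

The sequential version costs O(n log n) because of the sort; a cost-optimal parallel algorithm in the sense of the abstract would do the same total work spread across O(n) processors.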

  15. Comparison of microbial community shifts in two parallel multi-step drinking water treatment processes.

    Science.gov (United States)

    Xu, Jiajiong; Tang, Wei; Ma, Jun; Wang, Hong

    2017-07-01

    Drinking water treatment processes remove undesirable chemicals and microorganisms from source water, which is vital to public health protection. The purpose of this study was to investigate the effects of treatment processes and configuration on the microbiome by comparing microbial community shifts in two series of different treatment processes operated in parallel within a full-scale drinking water treatment plant (DWTP) in Southeast China. Illumina sequencing of 16S rRNA genes of water samples demonstrated little effect of coagulation/sedimentation and pre-oxidation steps on bacterial communities, in contrast to dramatic and concurrent microbial community shifts during ozonation, granular activated carbon treatment, sand filtration, and disinfection for both series. A large number of unique operational taxonomic units (OTUs) at these four treatment steps further illustrated their strong shaping power towards the drinking water microbial communities. Interestingly, multidimensional scaling analysis revealed tight clustering of biofilm samples collected from different treatment steps, with Nitrospira, the nitrite-oxidizing bacteria, noted at higher relative abundances in biofilm compared to water samples. Overall, this study provides a snapshot of step-to-step microbial evolution in multi-step drinking water treatment systems, and the results provide insight into control and manipulation of the drinking water microbiome via optimization of DWTP design and operation.

  16. Analysis of clinical complication data for radiation hepatitis using a parallel architecture model

    International Nuclear Information System (INIS)

    Jackson, A.; Haken, R.K. ten; Robertson, J.M.; Kessler, M.L.; Kutcher, G.J.; Lawrence, T.S.

    1995-01-01

    Purpose: The detailed knowledge of dose volume distributions available from the three-dimensional (3D) conformal radiation treatment of tumors in the liver (reported elsewhere) offers new opportunities to quantify the effect of volume on the probability of producing radiation hepatitis. We aim to test a new parallel architecture model of normal tissue complication probability (NTCP) with these data. Methods and Materials: Complication data and dose volume histograms from a total of 93 patients with normal liver function, treated on a prospective protocol with 3D conformal radiation therapy and intraarterial hepatic fluorodeoxyuridine, were analyzed with a new parallel architecture model. Patient treatment fell into six categories differing in doses delivered and volumes irradiated. By modeling the radiosensitivity of liver subunits, we are able to use dose volume histograms to calculate the fraction of the liver damaged in each patient. A complication results if this fraction exceeds the patient's functional reserve. To determine the patient distribution of functional reserves and the subunit radiosensitivity, the maximum likelihood method was used to fit the observed complication data. Results: The parallel model fit the complication data well, although uncertainties on the functional reserve distribution and subunit radiosensitivity are highly correlated. Conclusion: The observed radiation hepatitis complications show a threshold effect that can be described well with a parallel architecture model. However, additional independent studies are required to better determine the parameters defining the functional reserve distribution and subunit radiosensitivity.
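The calculation outlined in this abstract (damaged fraction from a dose volume histogram, then a complication when that fraction exceeds the functional reserve) can be sketched as follows. The abstract does not give the functional forms, so the logistic subunit dose response, the normally distributed functional reserve, and every parameter value below are assumptions made purely for illustration; none of the numbers are fitted results from the study.

```python
import math


def subunit_damage_probability(dose, d_half, k):
    """Assumed logistic dose response: probability that one subunit is damaged."""
    if dose <= 0.0:
        return 0.0
    return 1.0 / (1.0 + (d_half / dose) ** k)


def fraction_damaged(dvh, d_half, k):
    """Fraction of the organ damaged, from a differential dose volume histogram.

    dvh: list of (fractional_volume, dose) bins whose volumes sum to 1.
    """
    return sum(v * subunit_damage_probability(d, d_half, k) for v, d in dvh)


def ntcp(dvh, d_half, k, reserve_mean, reserve_sd):
    """Probability that the damaged fraction exceeds the functional reserve.

    The reserve is assumed to be normally distributed over patients.
    """
    f = fraction_damaged(dvh, d_half, k)
    z = (f - reserve_mean) / reserve_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical parameters and histograms (illustration only, not fitted values).
PARAMS = dict(d_half=40.0, k=2.0, reserve_mean=0.5, reserve_sd=0.1)
whole_organ = [(1.0, 40.0)]                # entire organ receives 40 units of dose
partial_organ = [(0.3, 40.0), (0.7, 0.0)]  # only 30% of the organ is irradiated

ntcp_whole = ntcp(whole_organ, **PARAMS)
ntcp_partial = ntcp(partial_organ, **PARAMS)
```

With these made-up numbers, irradiating the whole organ damages half of the subunits and sits exactly at the mean reserve, while irradiating 30% of the organ to the same dose stays far below it. That contrast is the kind of volume threshold effect the abstract describes.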

  17. A Predator-Prey Model Based on the Fully Parallel Cellular Automata

    Science.gov (United States)

    He, Mingfeng; Ruan, Hongbo; Yu, Changliang

    We presented a predator-prey lattice model containing moveable wolves and sheep, which are characterized by Penna double bit strings. Sexual reproduction and child-care strategies are considered. To implement this model in an efficient way, we build a fully parallel Cellular Automata based on a new definition of the neighborhood. We show the roles played by the initial densities of the populations, the mutation rate and the linear size of the lattice in the evolution of this model.

  18. Precise Modeling Based on Dynamic Phasors for Droop-Controlled Parallel-Connected Inverters

    DEFF Research Database (Denmark)

    Wang, L.; Guo, X.Q.; Gu, H.R.

    2012-01-01

    This paper deals with the precise modeling of droop-controlled parallel inverters. This is of interest because it is a common structure found in stand-alone droop-controlled MicroGrids. The conventional small-signal dynamic is not able to predict instabilities of the system, so...

  19. Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

    Energy Technology Data Exchange (ETDEWEB)

    Amadio, G.; et al.

    2017-11-22

    An intensive R&D and programming effort is required to meet the new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting the latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel in complex geometries, exploiting instruction level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.

  20. PVeStA: A Parallel Statistical Model Checking and Quantitative Analysis Tool

    KAUST Repository

    AlTurki, Musab

    2011-01-01

    Statistical model checking is an attractive formal analysis method for probabilistic systems such as, for example, cyber-physical systems which are often probabilistic in nature. This paper is about drastically increasing the scalability of statistical model checking, and making such scalability of analysis available to tools like Maude, where probabilistic systems can be specified at a high level as probabilistic rewrite theories. It presents PVeStA, an extension and parallelization of the VeStA statistical model checking tool [10]. PVeStA supports statistical model checking of probabilistic real-time systems specified as either: (i) discrete or continuous Markov Chains; or (ii) probabilistic rewrite theories in Maude. Furthermore, the properties that it can model check can be expressed in either: (i) PCTL/CSL, or (ii) the QuaTEx quantitative temporal logic. As our experiments show, the performance gains obtained from parallelization can be very high. © 2011 Springer-Verlag.

  1. Individual differences in speech-in-noise perception parallel neural speech processing and attention in preschoolers

    Science.gov (United States)

    Thompson, Elaine C.; Carr, Kali Woodruff; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina

    2016-01-01

    From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3–5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ~12 months), we followed a cohort of 59 preschoolers, ages 3.0–4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known for playing a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children. PMID:27864051

  2. Early and parallel processing of pragmatic and semantic information in speech acts: neurophysiological evidence

    Directory of Open Access Journals (Sweden)

    Natalia Egorova

    2013-03-01

    Although language is a tool for communication, most research in the neuroscience of language has focused on studying words and sentences, while little is known about the brain mechanisms of speech acts, or communicative functions, for which words and sentences are used as tools. Here the neural processing of two types of speech acts, Naming and Requesting, was addressed using the time-resolved event-related potential (ERP) technique. The brain responses for Naming and Request diverged as early as ~120 ms after the onset of the critical words, at the same time as, or even before, the earliest brain manifestations of semantic word properties could be detected. Request-evoked potentials were generally larger in amplitude than those for Naming. The use of identical words in closely matched settings for both speech acts rules out explanation of the difference in terms of phonological, lexical, semantic properties or word expectancy. The cortical sources underlying the ERP enhancement for Requests were found in the fronto-central cortex, consistent with the activation of action knowledge, as well as in right temporo-parietal junction, possibly reflecting additional implications of speech acts for social interaction and theory of mind. These results provide the first evidence for surprisingly early access to pragmatic and social interactive knowledge, which possibly occurs in parallel with other types of linguistic processing, and thus supports the near-simultaneous access to different subtypes of psycholinguistic information.

  3. Parallel processing in the brain’s visual form system: An fMRI study

    Directory of Open Access Journals (Sweden)

    Yoshihito Shigihara

    2014-07-01

    We here extend and complement our earlier time-based, magneto-encephalographic (MEG) study of the processing of forms by the visual brain (Shigihara and Zeki, 2013) with a functional magnetic resonance imaging (fMRI) study, in order to better localize the activity produced in early visual areas when subjects view simple geometric stimuli of increasing perceptual complexity (lines, angles, rhomboids) constituted from the same elements (lines). Our results show that all three categories of form activate all three visual areas with which we were principally concerned (V1, V2, V3), with angles producing the strongest and rhomboids the weakest activity in all three. The difference between the activity produced by angles and rhomboids was significant, that between lines and rhomboids was trend significant while that between lines and angles was not. Taken together with our earlier MEG results, the present ones suggest that a parallel strategy is used in processing forms, in addition to the well-documented hierarchical strategy.

  4. Parallel and interactive learning processes within the basal ganglia: relevance for the understanding of addiction.

    Science.gov (United States)

    Belin, David; Jonkman, Sietse; Dickinson, Anthony; Robbins, Trevor W; Everitt, Barry J

    2009-04-12

    In this review we discuss the evidence that drug addiction, defined as a maladaptive compulsive habit, results from the progressive subversion by addictive drugs of striatum-dependent operant and Pavlovian learning mechanisms that are usually involved in the control over behaviour by stimuli associated with natural reinforcement. Although mainly organized through segregated parallel cortico-striato-pallido-thalamo-cortical loops involved in motor or emotional functions, the basal ganglia, and especially the striatum, are key mediators of the modulation of behavioural responses, under the control of both action-outcome and stimulus-response mechanisms, by incentive motivational processes and Pavlovian associations. Here we suggest that protracted exposure to addictive drugs recruits serial and dopamine-dependent, striato-nigro-striatal ascending spirals from the nucleus accumbens to more dorsal regions of the striatum that underlie a shift from action-outcome to stimulus-response mechanisms in the control over drug seeking. When this progressive ventral to dorsal striatum shift is combined with drug-associated Pavlovian influences from limbic structures such as the amygdala and the orbitofrontal cortex, drug seeking behaviour becomes established as an incentive habit. This instantiation of implicit sub-cortical processing of drug-associated stimuli and instrumental responding might be a key mechanism underlying the development of compulsive drug seeking and the high vulnerability to relapse which are hallmarks of drug addiction.

  5. Parallelization of a Quantum-Classic Hybrid Model For Nanoscale Semiconductor Devices

    Directory of Open Access Journals (Sweden)

    Oscar Salas

    2011-07-01

    The expensive reengineering of sequential software and the difficulty of parallel programming are two of the many technical and economic obstacles to the wide use of HPC. We investigate how to rapidly improve the performance of a serial numerical code for simulating the transport of charged carriers in a Double-Gate MOSFET. We introduce the Drift-Diffusion-Schrödinger-Poisson (DDSP) model and study a rapid parallelization strategy for the numerical procedure on shared-memory architectures.

  6. Modeling of Electromagnetic Fields in Parallel-Plane Structures: A Unified Contour-Integral Approach

    Directory of Open Access Journals (Sweden)

    M. Stumpf

    2017-04-01

    A unified reciprocity-based modeling approach for analyzing electromagnetic fields in dispersive parallel-plane structures of arbitrary shape is described. It is shown that the use of the reciprocity theorem of the time-convolution type leads to a global contour-integral interaction quantity from which novel time- and frequency-domain numerical schemes can be derived. Applications of the numerical method concerning the time-domain radiated interference and susceptibility of parallel-plane structures are discussed and illustrated with numerical examples.

  7. Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

    NARCIS (Netherlands)

    A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

    2016-01-01

    Existing work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: Published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This

  8. The Development of Reading and Spelling in Arabic Orthography: Two Parallel Processes?

    Science.gov (United States)

    Taha, Haitham

    2016-01-01

    The parallels between reading and spelling skills in Arabic were tested. One-hundred forty-three native Arab students, with typical reading development, from second, fourth, and sixth grades were tested with reading, spelling and orthographic decision tasks. The results indicated a full parallel between the reading and spelling performances within…

  9. Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code

    Energy Technology Data Exchange (ETDEWEB)

    Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian

    2017-02-01

    The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functional characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.

  10. Automatic analysis (aa): efficient neuroimaging workflows and parallel processing using Matlab and XML

    Directory of Open Access Journals (Sweden)

    Rhodri Cusack

    2015-01-01

    Recent years have seen neuroimaging data becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complex to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.

  11. Automatic analysis (aa): efficient neuroimaging workflows and parallel processing using Matlab and XML.

    Science.gov (United States)

    Cusack, Rhodri; Vicente-Grabovetsky, Alejandro; Mitchell, Daniel J; Wild, Conor J; Auer, Tibor; Linke, Annika C; Peelle, Jonathan E

    2014-01-01

    Recent years have seen neuroimaging data sets becoming richer, with larger cohorts of participants, a greater variety of acquisition techniques, and increasingly complex analyses. These advances have made data analysis pipelines complicated to set up and run (increasing the risk of human error) and time consuming to execute (restricting what analyses are attempted). Here we present an open-source framework, automatic analysis (aa), to address these concerns. Human efficiency is increased by making code modular and reusable, and managing its execution with a processing engine that tracks what has been completed and what needs to be (re)done. Analysis is accelerated by optional parallel processing of independent tasks on cluster or cloud computing resources. A pipeline comprises a series of modules that each perform a specific task. The processing engine keeps track of the data, calculating a map of upstream and downstream dependencies for each module. Existing modules are available for many analysis tasks, such as SPM-based fMRI preprocessing, individual and group level statistics, voxel-based morphometry, tractography, and multi-voxel pattern analyses (MVPA). However, aa also allows for full customization, and encourages efficient management of code: new modules may be written with only a small code overhead. aa has been used by more than 50 researchers in hundreds of neuroimaging studies comprising thousands of subjects. It has been found to be robust, fast, and efficient, for simple single-subject studies up to multimodal pipelines on hundreds of subjects. It is attractive to both novice and experienced users. aa can reduce the amount of time neuroimaging laboratories spend performing analyses and reduce errors, expanding the range of scientific questions it is practical to address.

  12. Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy.

    Science.gov (United States)

    Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R

    2017-01-21

    The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reduction of computation times with respect to several previous state-of-the-art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.

  13. Parallel, but Dissociable, Processing in Discrete Corticostriatal Inputs Encodes Skill Learning.

    Science.gov (United States)

    Kupferschmidt, David A; Juczewski, Konrad; Cui, Guohong; Johnson, Kari A; Lovinger, David M

    2017-10-11

    Changes in cortical and striatal function underlie the transition from novel actions to refined motor skills. How discrete, anatomically defined corticostriatal projections function in vivo to encode skill learning remains unclear. Using novel fiber photometry approaches to assess real-time activity of associative inputs from medial prefrontal cortex to dorsomedial striatum and sensorimotor inputs from motor cortex to dorsolateral striatum, we show that associative and sensorimotor inputs co-engage early in action learning and disengage in a dissociable manner as actions are refined. Disengagement of associative, but not sensorimotor, inputs predicts individual differences in subsequent skill learning. Divergent somatic and presynaptic engagement in both projections during early action learning suggests potential learning-related in vivo modulation of presynaptic corticostriatal function. These findings reveal parallel processing within associative and sensorimotor circuits that challenges and refines existing views of corticostriatal function and expose neuronal projection- and compartment-specific activity dynamics that encode and predict action learning. Published by Elsevier Inc.

  14. Parallel processing and learning in simple systems. Final report, 10 January 1986-14 January 1989

    Energy Technology Data Exchange (ETDEWEB)

    Mpitsos, G.J.

    1989-03-15

    Work over the three-year tenure of this grant has dealt with interrelated studies of (1) neuropharmacology, (2) behavior, and (3) distributed/parallel processing in the generation of variable motor patterns in the buccal-oral system of the sea slug Pleurobranchaea californica. (4) Computer simulations of simple neural networks have been undertaken to examine neurointegrative principles that could not be examined in biological preparations. The simulation work has set the basis for further simulations dealing with networks having characteristics relating to real neurons. All of the work has had the goal of developing interdisciplinary tools for understanding the scale-independent problem of how individuals, each possessing only local knowledge of group activity, act within a group to produce different and variable adaptive outputs, and, in turn, of how the group influences the activity of the individual. The pharmacologic studies have had the goal of developing biochemical tools with which to identify groups of neurons that perform specific tasks during the production of a given behavior but are multifunctional by being critically involved in generating several different behaviors.

  15. Numerical modelling of series-parallel cooling systems in power plant

    Directory of Open Access Journals (Sweden)

    Regucki Paweł

    2017-01-01

    The paper presents a mathematical model for studying series-parallel hydraulic systems such as the cooling system of a power boiler's auxiliary devices or a closed cooling system including condensers and cooling towers. The analytical approach is based on a set of non-linear algebraic equations solved using numerical techniques. As a result of the iterative process, a set of volumetric flow rates of water through all the branches of the investigated hydraulic system is obtained. The calculations indicate the influence of changes in the pipeline's geometrical parameters on the total cooling water flow rate in the analysed installation. Such an approach makes it possible to analyse different variants of the modernization of the studied systems and to identify their critical elements. Based on these results, an investor can choose the optimal variant of the reconstruction of the installation from the economic point of view. As examples of such a calculation, two hydraulic installations are described. One is a boiler auxiliary cooling installation including two screw ash coolers. The other is a closed cooling system consisting of cooling towers and condensers.

  16. Generating process model collections

    NARCIS (Netherlands)

    Yan, Z.; Dijkman, R.M.; Grefen, P.W.P.J.

    2017-01-01

    Business process management plays an important role in the management of organizations. More and more organizations describe their operations as business processes. It is common for organizations to have collections of thousands of business processes, but for reasons of confidentiality these

  17. Vlasov modelling of parallel transport in a tokamak scrape-off layer

    International Nuclear Information System (INIS)

    Manfredi, G; Hirstoaga, S; Devaux, S

    2011-01-01

    A one-dimensional Vlasov-Poisson model is used to describe the parallel transport in a tokamak scrape-off layer. Thanks to a recently developed 'asymptotic-preserving' numerical scheme, it is possible to lift numerical constraints on the time step and grid spacing, which are no longer limited by, respectively, the electron plasma period and Debye length. The Vlasov approach provides a good velocity-space resolution even in regions of low density. The model is applied to the study of parallel transport during edge-localized modes, with particular emphasis on the particles and energy fluxes on the divertor plates. The numerical results are compared with analytical estimates based on a free-streaming model, with good general agreement. An interesting feature is the observation of an early electron energy flux, due to suprathermal electrons escaping the ions' attraction. In contrast, the long-time evolution is essentially quasi-neutral and dominated by the ion dynamics.

  18. The Parallel System for Integrating Impact Models and Sectors (pSIMS)

    Science.gov (United States)

    Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian

    2014-01-01

    We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.

  19. What makes process models understandable?

    NARCIS (Netherlands)

    Mendling, J.; Reijers, H.A.; Cardoso, J.; Alonso, G.; Dadam, P.; Rosemann, M.

    2007-01-01

    Although formal and informal quality aspects are of significant importance to business process modeling, little empirical work has been reported on process model quality and its impact factors. In this paper we investigate understandability as a proxy for quality of process models and focus

  20. Acceleration and sensitivity analysis of lattice kinetic Monte Carlo simulations using parallel processing and rate constant rescaling.

    Science.gov (United States)

    Núñez, M; Robie, T; Vlachos, D G

    2017-10-28

    Kinetic Monte Carlo (KMC) simulation provides insights into catalytic reactions unobtainable with either experiments or mean-field microkinetic models. Sensitivity analysis of KMC models assesses the robustness of the predictions to parametric perturbations and identifies rate determining steps in a chemical reaction network. Stiffness in the chemical reaction network, a ubiquitous feature, demands lengthy run times for KMC models and renders efficient sensitivity analysis based on the likelihood ratio method unusable. We address the challenge of efficiently conducting KMC simulations and performing accurate sensitivity analysis in systems with unknown time scales by employing two acceleration techniques: rate constant rescaling and parallel processing. We develop statistical criteria that ensure sufficient sampling of non-equilibrium steady state conditions. Our approach provides the twofold benefit of accelerating the simulation itself and enabling likelihood ratio sensitivity analysis, which provides further speedup relative to finite difference sensitivity analysis. As a result, the likelihood ratio method can be applied to real chemistry. We apply our methodology to the water-gas shift reaction on Pt(111).

  1. A one-dimensional heat transfer model for parallel-plate thermoacoustic heat exchangers.

    Science.gov (United States)

    de Jong, J A; Wijnant, Y H; de Boer, A

    2014-03-01

    A one-dimensional (1D) laminar oscillating flow heat transfer model is derived and applied to parallel-plate thermoacoustic heat exchangers. The model can be used to estimate the heat transfer from the solid wall to the acoustic medium, which is required for the heat input/output of thermoacoustic systems. The model is implementable in existing (quasi-)1D thermoacoustic codes, such as DeltaEC. Examples of generated results show good agreement with literature results. The model allows for arbitrary wave phasing; however, it is shown that the wave phasing does not significantly influence the heat transfer.

  2. Modeling and Control of the Redundant Parallel Adjustment Mechanism on a Deployable Antenna Panel

    Directory of Open Access Journals (Sweden)

    Lili Tian

    2016-10-01

    With the aim of developing multiple input and multiple output (MIMO) coupling systems with a redundant parallel adjustment mechanism on the deployable antenna panel, a structural control integrated design methodology is proposed in this paper. Firstly, the modal information from the finite element model of the structure of the antenna panel is extracted, and then the mathematical model is established with the Hamilton principle. Secondly, the discrete Linear Quadratic Regulator (LQR) controller is added to the model in order to control the actuators and adjust the shape of the panel. Finally, the engineering practicality of the modeling and control method based on finite element analysis simulation is verified.

  3. Dynamic modelling of a 3-CPU parallel robot via screw theory

    Directory of Open Access Journals (Sweden)

    L. Carbonari

    2013-04-01

    The article describes the dynamic modelling of I.Ca.Ro., a novel Cartesian parallel robot recently designed and prototyped by the robotics research group of the Polytechnic University of Marche. By means of screw theory and the virtual work principle, a computationally efficient model has been built, with the final aim of realising advanced model-based controllers. A dynamic analysis has then been performed in order to point out possible model simplifications that could lead to a more efficient run-time implementation.

  4. Reconstruction for Time-Domain In Vivo EPR 3D Multigradient Oximetric Imaging—A Parallel Processing Perspective

    Directory of Open Access Journals (Sweden)

    Christopher D. Dharmaraj

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23×23×23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

  5. Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

    Science.gov (United States)

    Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

  6. Massively Parallel Geostatistical Inversion of Coupled Processes in Heterogeneous Porous Media

    Science.gov (United States)

    Ngo, A.; Schwede, R. L.; Li, W.; Bastian, P.; Ippisch, O.; Cirpka, O. A.

    2012-04-01

    The quasi-linear geostatistical approach is an inversion scheme that can be used to estimate the spatial distribution of a heterogeneous hydraulic conductivity field. The estimated parameter field is considered to be a random variable that varies continuously in space, meets the measurements of dependent quantities (such as the hydraulic head, the concentration of a transported solute or its arrival time) and shows the required spatial correlation (described by certain variogram models). This is a method of conditioning a parameter field to observations. Upon discretization, this results in as many parameters as elements of the computational grid. For a full three-dimensional representation of the heterogeneous subsurface it is hardly sufficient to work with resolutions (up to one million parameters) of the model domain that can be achieved on a serial computer. The forward problems to be solved within the inversion procedure consist of the elliptic steady-state groundwater flow equation and the formally elliptic but nearly hyperbolic steady-state advection-dominated solute transport equation in a heterogeneous porous medium. Both equations are discretized by Finite Element Methods (FEM) using fully scalable domain decomposition techniques. Whereas standard conforming FEM is sufficient for the flow equation, for the advection-dominated transport equation, which gives rise to well-known numerical difficulties at sharp fronts or boundary layers, we use the streamline diffusion approach. The arising linear systems are solved using efficient iterative solvers with an AMG (algebraic multigrid) pre-conditioner. During each iteration step of the inversion scheme one needs to solve a multitude of forward and adjoint problems in order to calculate the sensitivities of each measurement and the related cross-covariance matrix of the unknown parameters and the observations. In order to reduce interprocess communications and to improve the scalability of the code on larger clusters

  7. Study on Parallel Processing for Efficient Flexible Multibody Analysis based on Subsystem Synthesis Method

    Energy Technology Data Exchange (ETDEWEB)

    Han, Jong-Boo; Song, Hajun; Kim, Sung-Soo [Chungnam Nat’l Univ., Daejeon (Korea, Republic of)]

    2017-06-15

    Flexible multibody simulations are widely used in industry to design mechanical systems. In flexible multibody dynamics, deformation coordinates are described either relative to a floating body reference frame or in the inertial reference frame. Moreover, these deformation coordinates are generated based on the discretization of the body according to the finite element approach. Therefore, the formulation of a flexible multibody system always deals with a huge number of degrees of freedom, and the numerical solution methods require a substantial amount of computational time. Parallel computational methods are a solution for efficient computation. However, most parallel computational methods focus on the efficient solution of large systems of linear equations. For multibody analysis, we need to develop an efficient formulation that is suitable for parallel computation. In this paper, we developed a subsystem synthesis method for a flexible multibody system and proposed efficient parallel computational schemes based on the OpenMP API in order to achieve efficient computation. Simulations of a rotating blade system, which consists of three identical blades, were carried out with two different parallel computational schemes. Actual CPU times were measured to investigate the efficiency of the proposed parallel schemes.

  8. PARALLEL PROCESSING OF BIG POINT CLOUDS USING Z-ORDER-BASED PARTITIONING

    Directory of Open Access Journals (Sweden)

    C. Alis

    2016-06-01

    As laser scanning technology improves and costs are coming down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not limited to point cloud data. Voluminous amounts of high-dimensionality and quickly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to efficiently handle the processing of Big Data, many of which utilize a compute cluster composed of several commodity grade machines to process chunks of data in parallel. A central concept in many of these frameworks is data locality. By its nature, Big Data is large enough that the entire dataset would not fit on the memory and hard drives of a single node, hence replicating the entire dataset to each worker node is impractical. The data must then be partitioned across worker nodes in a manner that minimises data transfer across the network. This is a challenge for point cloud data because there exist different ways to partition data and they may require data transfer. We propose a partitioning based on Z-order which is a form of locality-sensitive hashing. The Z-order or Morton code is computed by dividing each dimension to form a grid then interleaving the binary representation of each dimension. For example, the Z-order code for the grid square with coordinates (x = 1 = 01₂, y = 3 = 11₂) is 1011₂ = 11. The number of points in each partition is controlled by the number of bits per dimension: the more bits, the fewer the points. The number of bits per dimension also controls the level of detail with more bits yielding finer partitioning. We present this partitioning method by implementing it on Apache Spark and investigating how different parameters affect the accuracy and running time of the k nearest

  9. Parallel Processing of Big Point Clouds Using Z-Order Partitioning

    Science.gov (United States)

    Alis, C.; Boehm, J.; Liu, K.

    2016-06-01

    As laser scanning technology improves and costs are coming down, the amount of point cloud data being generated can be prohibitively difficult and expensive to process on a single machine. This data explosion is not limited to point cloud data. Voluminous amounts of high-dimensionality and quickly accumulating data, collectively known as Big Data, such as those generated by social media, Internet of Things devices and commercial transactions, are becoming more prevalent as well. New computing paradigms and frameworks are being developed to efficiently handle the processing of Big Data, many of which utilize a compute cluster composed of several commodity grade machines to process chunks of data in parallel. A central concept in many of these frameworks is data locality. By its nature, Big Data is large enough that the entire dataset would not fit on the memory and hard drives of a single node, hence replicating the entire dataset to each worker node is impractical. The data must then be partitioned across worker nodes in a manner that minimises data transfer across the network. This is a challenge for point cloud data because there exist different ways to partition data and they may require data transfer. We propose a partitioning based on Z-order which is a form of locality-sensitive hashing. The Z-order or Morton code is computed by dividing each dimension to form a grid then interleaving the binary representation of each dimension. For example, the Z-order code for the grid square with coordinates (x = 1 = 01₂, y = 3 = 11₂) is 1011₂ = 11. The number of points in each partition is controlled by the number of bits per dimension: the more bits, the fewer the points. The number of bits per dimension also controls the level of detail with more bits yielding finer partitioning. We present this partitioning method by implementing it on Apache Spark and investigating how different parameters affect the accuracy and running time of the k nearest neighbour algorithm
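
    The bit interleaving described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Spark implementation: the function names `morton_code_2d` and `partition_key`, the bounding-box argument, and the convention that the y bit sits above the x bit at each level are assumptions chosen so that the abstract's example (x = 1, y = 3 gives 11) is reproduced.

```python
def morton_code_2d(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of two grid coordinates.

    Assumed convention: at each level the y bit is placed above the x bit,
    which reproduces the abstract's example (x=1, y=3 -> 0b1011 = 11).
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("grid coordinates must fit in the given number of bits")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bit goes to the even position
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bit goes to the odd position
    return code


def partition_key(px: float, py: float, bounds, bits: int) -> int:
    """Map a point to the Morton code of the grid cell that contains it.

    `bounds` is a hypothetical (min_x, min_y, max_x, max_y) bounding box;
    each dimension is divided into 2**bits equal cells.
    """
    min_x, min_y, max_x, max_y = bounds
    cells = 1 << bits
    # Clamp so that points lying exactly on the upper boundary fall in the last cell.
    cx = min(int((px - min_x) / (max_x - min_x) * cells), cells - 1)
    cy = min(int((py - min_y) / (max_y - min_y) * cells), cells - 1)
    return morton_code_2d(cx, cy, bits)
```

    With two bits per dimension, `morton_code_2d(1, 3, 2)` returns 11, matching the worked example. Increasing `bits` yields more and smaller cells, and therefore fewer points per partition, as the abstract states.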

  10. Highly accelerated cardiac cine parallel MRI using low-rank matrix completion and partial separability model

    Science.gov (United States)

    Lyu, Jingyuan; Nakarmi, Ukash; Zhang, Chaoyi; Ying, Leslie

    2016-05-01

    This paper presents a new approach to highly accelerated dynamic parallel MRI using low-rank matrix completion and a partial separability (PS) model. In data acquisition, k-space data is moderately randomly undersampled at the center k-space navigator locations, but highly undersampled at the outer k-space for each temporal frame. In reconstruction, the navigator data is reconstructed from undersampled data using structured low-rank matrix completion. After all the unacquired navigator data is estimated, the partial separable model is used to obtain partial k-t data. Then the parallel imaging method is used to acquire the entire dynamic image series from highly undersampled data. The proposed method has been shown to achieve high quality reconstructions with reduction factors up to 31, and temporal resolution of 29 ms, when the conventional PS method fails.

  11. SBML-PET-MPI: a parallel parameter estimation tool for Systems Biology Markup Language based models.

    Science.gov (United States)

    Zi, Zhike

    2011-04-01

    Parameter estimation is crucial for the modeling and dynamic analysis of biological systems. However, implementing parameter estimation is time consuming and computationally demanding. Here, we introduced a parallel parameter estimation tool for Systems Biology Markup Language (SBML)-based models (SBML-PET-MPI). SBML-PET-MPI allows the user to perform parameter estimation and parameter uncertainty analysis by collectively fitting multiple experimental datasets. The tool is developed and parallelized using the message passing interface (MPI) protocol, which provides good scalability with the number of processors. SBML-PET-MPI is freely available for non-commercial use at http://www.bioss.uni-freiburg.de/cms/sbml-pet-mpi.html or http://sites.google.com/site/sbmlpetmpi/.

  12. Parallel processing of information about location in the amygdala, entorhinal cortex and hippocampus.

    Science.gov (United States)

    Gaskin, Stephane; White, Norman M

    2013-11-01

    The conditioned cue preference paradigm was used to study how rats use extra-maze cues to discriminate between 2 adjacent arms on an 8-arm radial maze, a situation in which most of the same cues can be seen from both arms but only one arm contains food. Since the food-restricted rats eat while passively confined on the food-paired arm, no responses are reinforced, so the discrimination is due to Pavlovian stimulus-reward (or outcome) learning. Consistent with other evidence that rats must move around in an environment to acquire a spatial map, we found that learning the adjacent arms CCP (ACCP) required a minimum amount of active exploration of the maze with no reinforcers present prior to passive pairing of the extra-maze cues with the food reinforcer, an instance of latent learning. Temporary inactivation of the hippocampus during the pre-exposure sessions had no effect on ACCP learning, confirming other evidence that the hippocampus is not involved in latent learning. A series of experiments identified a circuit involving fimbria-fornix and dorsal entorhinal cortex as the neural basis of latent learning in this situation. In contrast, temporary inactivation of the entorhinal cortex or hippocampus during passive training or during testing blocked ACCP learning and expression, respectively, suggesting that these two structures co-operate in using spatial information to learn the location of food on the maze during passive pairing and to express this combined information during testing. In parallel with these processes we found that the amygdala processes information leading to an equal tendency to enter both adjacent arms (even though only one was paired with food), suggesting that the stimulus information available to this structure is not sufficiently precise to discriminate between the ambiguous cues visible from the adjacent arms. Expression of the ACCP in normal rats depends on hippocampus-based learning to avoid the unpaired arm which competes with the

  13. «Concurrency» in M-L-Parallel Semi-Markov Process

    Directory of Open Access Journals (Sweden)

    Larkin Eugene

    2017-01-01

    This article investigates the functioning of a swarm of robots, each of which receives instructions from the external human operator and autonomously executes them. An abstract model of functioning of a robot, a group of robots and multiple groups of robots was obtained using the notion of semi-Markov process. The concepts of aggregated initial and aggregated absorbing states were introduced. Correspondences for calculation of time parameters of concurrency were obtained.

  14. Neuroscientific Model of Motivational Process

    OpenAIRE

    Kim, Sung-il

    2013-01-01

    Considering the neuroscientific findings on reward, learning, value, decision-making, and cognitive control, motivation can be parsed into three sub processes, a process of generating motivation, a process of maintaining motivation, and a process of regulating motivation. I propose a tentative neuroscientific model of motivational processes which consists of three distinct but continuous sub processes, namely reward-driven approach, value-based decision-making, and goal-directed control. Rewa...

  15. Modeling of fatigue crack induced nonlinear ultrasonics using a highly parallelized explicit local interaction simulation approach

    Science.gov (United States)

    Shen, Yanfeng; Cesnik, Carlos E. S.

    2016-04-01

    This paper presents a parallelized modeling technique for the efficient simulation of nonlinear ultrasonics introduced by the wave interaction with fatigue cracks. The elastodynamic wave equations with contact effects are formulated using an explicit Local Interaction Simulation Approach (LISA). The LISA formulation is extended to capture the contact-impact phenomena during the wave damage interaction based on the penalty method. A Coulomb friction model is integrated into the computation procedure to capture the stick-slip contact shear motion. The LISA procedure is coded using the Compute Unified Device Architecture (CUDA), which enables highly parallelized supercomputing on powerful graphics cards. Both the explicit contact formulation and the parallel feature facilitate LISA's superb computational efficiency over the conventional finite element method (FEM). The theoretical formulations based on the penalty method are introduced and a guideline for the proper choice of the contact stiffness is given. The convergence behavior of the solution under various contact stiffness values is examined. A numerical benchmark problem is used to investigate the new LISA formulation and results are compared with a conventional contact finite element solution. Various nonlinear ultrasonic phenomena are successfully captured using this contact LISA formulation, including the generation of nonlinear higher harmonic responses. Nonlinear mode conversion of guided waves at fatigue cracks is also studied.
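
    The two contact ingredients named in the abstract, a penalty method for the normal direction and a Coulomb model for stick-slip shear, can be sketched generically in Python. This is a textbook illustration for a single contact point, not the LISA formulation or its CUDA code; the function names, the sign convention (negative gap means penetration) and the idea of a "trial" tangential force are assumptions made for the example.

```python
import math


def penalty_normal_force(gap: float, contact_stiffness: float) -> float:
    """Penalty method: resist penetration with a force proportional to its depth.

    Assumed sign convention: gap < 0 means the crack faces interpenetrate,
    gap >= 0 means the crack is open and no contact force acts.
    """
    if gap >= 0.0:
        return 0.0
    return contact_stiffness * (-gap)


def coulomb_tangential_force(trial_force: float, normal_force: float, mu: float):
    """Coulomb friction: stick while |trial| <= mu * N, otherwise slip at the limit.

    `trial_force` is the tangential force that would be needed to keep the
    two faces stuck together (a hypothetical quantity supplied by the caller).
    Returns the admissible tangential force and the contact state.
    """
    limit = mu * normal_force
    if abs(trial_force) <= limit:
        return trial_force, "stick"
    # Slipping: the force saturates at the friction limit and keeps its direction.
    return math.copysign(limit, trial_force), "slip"
```

    A larger `contact_stiffness` reduces the residual penetration but makes an explicit scheme stiffer, which is presumably why the abstract mentions a guideline for choosing the contact stiffness.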

  16. Kinetics of transformations nucleated on random parallel planes: analytical modelling and computer simulation

    International Nuclear Information System (INIS)

    Rios, Paulo R; Assis, Weslley L S; Ribeiro, Tatiana C S; Villa, Elena

    2012-01-01

    In a classical paper, Cahn derived expressions for the kinetics of transformations nucleated on random planes and lines. He used those as a model for nucleation on the boundaries, edges and vertices of a polycrystal consisting of equiaxed grains. In this paper it is demonstrated that Cahn's expression for random planes may be used in situations beyond the scope envisaged in Cahn's original paper. For instance, we derived an expression for the kinetics of transformations nucleated on random parallel planes that is identical to that formerly obtained by Cahn considering random planes. Computer simulation of transformations nucleated on random parallel planes is carried out. It is shown that there is excellent agreement between simulated results and analytical solutions. Such an agreement is to be expected if both the simulation and the analytical solution are correct. (paper)

  17. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    Science.gov (United States)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    A capability was developed for rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets by implementing computer graphics modeling techniques on the Massively Parallel Processor (MPP), employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.

  18. A self-calibrating robot based upon a virtual machine model of parallel kinematics

    DEFF Research Database (Denmark)

    Pedersen, David Bue; Eiríksson, Eyþór Rúnar; Hansen, Hans Nørgaard

    2016-01-01

    A delta-type parallel kinematics system for Additive Manufacturing has been created, which through a probing system can recognise its geometrical deviations from nominal and compensate for these in the driving inverse kinematic model of the machine. The novelty is that this model is derived from...... a virtual machine of the kinematics system, built on principles from geometrical metrology. Relevant mathematically non-trivial deviations from the ideal machine are identified and decomposed into elemental deviations. From these deviations, a routine is added to a physical machine tool, which allows...

  19. Parallel LC circuit model for multi-band absorption and preliminary design of radiative cooling.

    Science.gov (United States)

    Feng, Rui; Qiu, Jun; Liu, Linhua; Ding, Weiqiang; Chen, Lixue

    2014-12-15

    We perform a comprehensive analysis of multi-band absorption by exciting magnetic polaritons in the infrared region. According to the independent properties of the magnetic polaritons, we propose a parallel inductance and capacitance (PLC) circuit model to explain and predict the multi-band resonant absorption peaks, which is fully validated by using the multi-sized structure with an identical dielectric spacing layer and the multilayer structure with the same strip width. More importantly, we present the application of the PLC circuit model to preliminarily design a radiative cooling structure realized by merging several close peaks together. This omnidirectional and polarization-insensitive structure is a good candidate for radiative cooling applications.
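
    As a rough illustration of how a circuit model can predict several absorption peaks, the Python sketch below treats each magnetic polariton as an independent ideal LC resonator and applies the textbook resonance formula f = 1 / (2π√(LC)). The abstract does not give the paper's expressions for the inductance and capacitance of each resonator, so the (L, C) inputs and the function names here are purely hypothetical placeholders.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def lc_resonance_frequency(inductance: float, capacitance: float) -> float:
    """Resonance frequency in Hz of an ideal LC circuit (L in henries, C in farads)."""
    if inductance <= 0.0 or capacitance <= 0.0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


def predicted_peak_wavelengths(resonators):
    """Return one resonance wavelength (in metres) per (L, C) pair.

    Assumption: the resonators do not interact, so each branch contributes
    its own peak independently of the others.
    """
    return [SPEED_OF_LIGHT / lc_resonance_frequency(l, c) for l, c in resonators]
```

    Under these assumptions, quadrupling L (or C) of one resonator doubles its resonance wavelength while leaving the other peaks unchanged, which is the kind of independent tuning that would allow several close peaks to be merged into a broad band.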

  20. Analysis of parameters for technological equipment of parallel kinematics based on rods of variable length for processing accuracy assurance

    Science.gov (United States)

    Koltsov, A. G.; Shamutdinov, A. H.; Blokhin, D. A.; Krivonos, E. V.

    2018-01-01

    A new classification of parallel kinematics mechanisms by symmetry coefficient is proposed; this coefficient is proportional to the mechanism stiffness and to the accuracy of the product processed using the technological equipment under study. A new version of the Stewart platform with a high symmetry coefficient is presented for analysis. The workspace of the mechanism under study is described, this space being a complex solid figure. The workspace end points are reached by the center of the mobile platform, which moves parallel to the base plate. Parameters affecting the processing accuracy, namely the static and dynamic stiffness and the natural vibration frequencies, are determined. The capability of the mechanism to operate under various loads, taking into account resonance phenomena at different points of the workspace, was assessed. The study showed that the stiffness, and therefore the processing accuracy, of the above-mentioned mechanisms is comparable with the stiffness and accuracy of medium-sized series-produced machines.