WorldWideScience

Sample records for parallel process model

  1. Investigation of Mediational Processes Using Parallel Process Latent Growth Curve Modeling

    Science.gov (United States)

    Cheong, JeeWon; MacKinnon, David P.; Khoo, Siek Toon

    2010-01-01

    This study investigated a method to evaluate mediational processes using latent growth curve modeling. The mediator and the outcome measured across multiple time points were viewed as 2 separate parallel processes. The mediational process was defined as the independent variable influencing the growth of the mediator, which, in turn, affected the growth of the outcome. To illustrate modeling procedures, empirical data from a longitudinal drug prevention program, Athletes Training and Learning to Avoid Steroids, were used. The program effects on the growth of the mediator and the growth of the outcome were examined first in a 2-group structural equation model. The mediational process was then modeled and tested in a parallel process latent growth curve model by relating the prevention program condition, the growth rate factor of the mediator, and the growth rate factor of the outcome. PMID:20157639
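
    As a rough sketch of the structure described above (not the authors' code), the following simulation generates two parallel growth processes in which a treatment condition affects the growth rate of a mediator, which in turn affects the growth rate of an outcome. The growth factors are generated directly for brevity, whereas the study estimates them as latent variables in a structural equation model; all variable names and effect sizes are invented.

        # Hypothetical sketch of two parallel growth processes with a mediated effect
        # on the slopes: treatment -> mediator slope (a path) -> outcome slope (b path).
        import numpy as np

        rng = np.random.default_rng(0)
        n, waves = 500, 4
        time = np.arange(waves)                      # measurement occasions 0..3
        tx = rng.integers(0, 2, n)                   # prevention-program condition

        med_slope = 0.3 * tx + rng.normal(0, 0.2, n)         # a path (assumed effect size)
        out_slope = 0.5 * med_slope + rng.normal(0, 0.2, n)  # b path (assumed effect size)
        med_icept = rng.normal(0, 0.5, n)
        out_icept = rng.normal(0, 0.5, n)

        # Observed repeated measures for the mediator and outcome processes.
        mediator = med_icept[:, None] + med_slope[:, None] * time + rng.normal(0, 0.3, (n, waves))
        outcome = out_icept[:, None] + out_slope[:, None] * time + rng.normal(0, 0.3, (n, waves))

        # Per-person growth rates recovered from the repeated measures (simple OLS slopes).
        est_med_slope = np.polyfit(time, mediator.T, 1)[0]
        est_out_slope = np.polyfit(time, outcome.T, 1)[0]

        # Crude mediated-effect estimate: product of the a and b path coefficients.
        a = np.polyfit(tx, est_med_slope, 1)[0]
        b = np.polyfit(est_med_slope, est_out_slope, 1)[0]
        print("estimated mediated effect (a*b):", a * b)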

  2. Toward a model framework of generalized parallel componential processing of multi-symbol numbers.

    Science.gov (United States)

    Huber, Stefan; Cornelsen, Sonja; Moeller, Korbinian; Nuerk, Hans-Christoph

    2015-05-01

    In this article, we propose and evaluate a new model framework of parallel componential multi-symbol number processing, generalizing the idea of parallel componential processing of multi-digit numbers to the case of negative numbers by considering the polarity signs similar to single digits. In a first step, we evaluated this account by defining and investigating a sign-decade compatibility effect for the comparison of positive and negative numbers, which extends the unit-decade compatibility effect in 2-digit number processing. Then, we evaluated whether the model is capable of accounting for previous findings in negative number processing. In a magnitude comparison task, in which participants had to single out the larger of 2 integers, we observed a reliable sign-decade compatibility effect with prolonged reaction times for incompatible (e.g., -97 vs. +53; in which the number with the larger decade digit has the smaller, i.e., negative polarity sign) as compared with sign-decade compatible number pairs (e.g., -53 vs. +97). Moreover, an analysis of participants' eye fixation behavior corroborated our model of parallel componential processing of multi-symbol numbers. These results are discussed in light of concurrent theoretical notions about negative number processing. On the basis of the present results, we propose a generalized integrated model framework of parallel componential multi-symbol processing. (c) 2015 APA, all rights reserved.

  3. The Extended Parallel Process Model: Illuminating the Gaps in Research

    Science.gov (United States)

    Popova, Lucy

    2012-01-01

    This article examines constructs, propositions, and assumptions of the extended parallel process model (EPPM). Review of the EPPM literature reveals that its theoretical concepts are thoroughly developed, but the theory lacks consistency in operational definitions of some of its constructs. Out of the 12 propositions of the EPPM, a few have not…

  4. A model for dealing with parallel processes in supervision

    Directory of Open Access Journals (Sweden)

    Lilja Cajvert

    2011-03-01

    Supervision in social work is essential for successful outcomes when working with clients. In social work, unconscious difficulties may arise and similar difficulties may occur in supervision as parallel processes. In this article, the development of a practice-based model of supervision to deal with parallel processes in supervision is described. The model has six phases. In the first phase, the focus is on the supervisor’s inner world, his/her own reflections and observations. In the second phase, the supervision situation is “frozen”, and the supervisees are invited to join the supervisor in taking a meta-perspective on the current situation of supervision. The focus in the third phase is on the inner world of all the group members as well as the visualization and identification of reflections and feelings that arose during the supervision process. Phase four focuses on the supervisee who presented a case, and in phase five the focus shifts to the common understanding and theorization of the supervision process as well as the definition and identification of possible parallel processes. In the final phase, the supervisee, with the assistance of the supervisor and other members of the group, develops a solution and determines how to proceed with the client in treatment. This article uses phenomenological concepts to provide a theoretical framework for the supervision model. Phenomenological reduction is an important approach for examining, externalizing and visualizing the inner worlds of the supervisor and supervisees.

  5. Parallel processing and non-uniform grids in global air quality modeling

    NARCIS (Netherlands)

    Berkvens, P.J.F.; Bochev, Mikhail A.

    2002-01-01

    A large-scale global air quality model, running efficiently on a single vector processor, is enhanced to make more realistic and more long-term simulations feasible. Two strategies are combined: non-uniform grids and parallel processing. The communication through the hierarchy of non-uniform grids

  6. An Inconvenient Truth: An Application of the Extended Parallel Process Model

    Science.gov (United States)

    Goodall, Catherine E.; Roberto, Anthony J.

    2008-01-01

    "An Inconvenient Truth" is an Academy Award-winning documentary about global warming presented by Al Gore. This documentary is appropriate for a lesson on fear appeals and the extended parallel process model (EPPM). The EPPM is concerned with the effects of perceived threat and efficacy on behavior change. Perceived threat is composed of an…

  7. Lamb wave propagation modelling and simulation using parallel processing architecture and graphical cards

    International Nuclear Information System (INIS)

    Paćko, P; Bielak, T; Staszewski, W J; Uhl, T; Spencer, A B; Worden, K

    2012-01-01

    This paper demonstrates new parallel computation technology and an implementation for Lamb wave propagation modelling in complex structures. A graphical processing unit (GPU) and computer unified device architecture (CUDA), available in low-cost graphical cards in standard PCs, are used for Lamb wave propagation numerical simulations. The local interaction simulation approach (LISA) wave propagation algorithm has been implemented as an example. Other algorithms suitable for parallel discretization can also be used in practice. The method is illustrated using examples related to damage detection. The results demonstrate good accuracy and effective computational performance of very large models. The wave propagation modelling presented in the paper can be used in many practical applications of science and engineering. (paper)
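
    The LISA scheme itself is not reproduced here; the sketch below, a minimal stand-in under assumed grid and material values, shows the kind of explicit nearest-neighbour update that such wave-propagation algorithms rely on, which is why every cell can be handled by an independent GPU thread. A plain 2-D scalar wave stencil in NumPy is used instead of CUDA.

        # Simplified 2-D scalar wave update: each cell depends only on its neighbours,
        # so one GPU thread per cell can apply the same rule (data-parallel stencil).
        import numpy as np

        nx, ny, steps = 200, 200, 300
        c, dx, dt = 1.0, 1.0, 0.5                    # c*dt/dx = 0.5 keeps the scheme stable
        u_prev = np.zeros((nx, ny))
        u = np.zeros((nx, ny))
        u[nx // 2, ny // 2] = 1.0                    # crude point excitation (wave source)

        for _ in range(steps):
            lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
                   np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u) / dx**2
            u_next = 2.0 * u - u_prev + (c * dt) ** 2 * lap   # independent per-cell update
            u_prev, u = u, u_next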

  8. Fear Control and Danger Control: A Test of the Extended Parallel Process Model (EPPM).

    Science.gov (United States)

    Witte, Kim

    1994-01-01

    Explores cognitive and emotional mechanisms underlying success and failure of fear appeals in context of AIDS prevention. Offers general support for Extended Parallel Process Model. Suggests that cognitions lead to fear appeal success (attitude, intention, or behavior changes) via danger control processes, whereas the emotion fear leads to fear…

  9. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)

  10. Belief–logic conflict resolution in syllogistic reasoning: Inspection-time evidence for a parallel process model

    OpenAIRE

    Stupple, Edward J.N; Ball, Linden

    2008-01-01

    An experiment is reported examining dual-process models of belief bias in syllogistic reasoning using a problem complexity manipulation and an inspection-time method to monitor processing latencies for premises and conclusions. Endorsement rates indicated increased belief bias on complex problems, a finding that runs counter to the “belief-first” selective scrutiny model, but which is consistent with other theories, including “reasoning-first” and “parallel-process” models. Inspection-time da...

  11. Cellular automata a parallel model

    CERN Document Server

    Mazoyer, J

    1999-01-01

    Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.

  12. When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias

    OpenAIRE

    Trippas, Dries; Thompson, Valerie A.; Handley, Simon J.

    2016-01-01

    Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was ...

  13. Parallel processing for fluid dynamics applications

    International Nuclear Information System (INIS)

    Johnson, G.M.

    1989-01-01

    The impact of parallel processing on computational science and, in particular, on computational fluid dynamics is growing rapidly. In this paper, particular emphasis is given to developments which have occurred within the past two years. Parallel processing is defined and the reasons for its importance in high-performance computing are reviewed. Parallel computer architectures are classified according to the number and power of their processing units, their memory, and the nature of their connection scheme. Architectures which show promise for fluid dynamics applications are emphasized. Fluid dynamics problems are examined for parallelism inherent at the physical level. CFD algorithms and their mappings onto parallel architectures are discussed. Several examples are presented to document the performance of fluid dynamics applications on present-generation parallel processing devices.

  14. Towards a streaming model for nested data parallelism

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner; Filinski, Andrzej

    2013-01-01

    The language-integrated cost semantics for nested data parallelism pioneered by NESL provides an intuitive, high-level model for predicting performance and scalability of parallel algorithms with reasonable accuracy. However, this predictability, obtained through a uniform, parallelism-flattening… …-processable in a streaming fashion. This semantics is directly compatible with previously proposed piecewise execution models for nested data parallelism, but allows the expected space usage to be reasoned about directly at the source-language level. The language definition and implementation are still very much work…

  15. Parallel finite elements with domain decomposition and its pre-processing

    International Nuclear Information System (INIS)

    Yoshida, A.; Yagawa, G.; Hamada, S.

    1993-01-01

    This paper describes a parallel finite element analysis using a domain decomposition method, and the pre-processing for the parallel calculation. Computer simulations are about to replace experiments in various fields, and the scale of the model to be simulated tends to be extremely large. On the other hand, the computational environment has changed drastically in recent years. In particular, parallel processing on massively parallel computers or computer networks is considered a promising technique. In order to achieve high efficiency in such a parallel computation environment, large granularity of tasks and a well-balanced workload distribution are key issues. It is also important to reduce the cost of pre-processing in such parallel FEM. From this point of view, the authors developed the domain decomposition FEM with an automatic and dynamic task-allocation mechanism and an automatic mesh generation/domain subdivision system for it. (author)

  16. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    Science.gov (United States)

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
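
    As a rough illustration of the peak-detection idea (not the toolbox's CUDA kernels), the sketch below compares every sample with its neighbours and a threshold in one vectorised pass, so the per-sample tests are independent and map naturally onto GPU threads. The signal, threshold and function name are invented for the example.

        # Data-parallel peak detection: all samples are tested against their
        # neighbours at once; no sequential scan over the trace is needed.
        import numpy as np

        def detect_peaks(signal, threshold):
            """Return indices of local maxima that exceed the threshold."""
            s = np.asarray(signal)
            is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] >= s[2:]) & (s[1:-1] > threshold)
            return np.nonzero(is_peak)[0] + 1        # shift back to original indexing

        rng = np.random.default_rng(1)
        trace = rng.normal(0, 1, 30000)
        trace[[5000, 12000, 21000]] += 8             # injected spike-like events
        print(detect_peaks(trace, threshold=5.0))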

  17. Parallel Framework for Cooperative Processes

    Directory of Open Access Journals (Sweden)

    Mitică Craus

    2005-01-01

    This paper describes the work of an object-oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and it must be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework prove to have good performances: approximately linear speedup and low communication cost.
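
    A minimal sketch of the master-slave cycle described above, using Python's multiprocessing module in place of the framework's message-passing layer; work_unit is a hypothetical stand-in for one slave's share of an ACO or SNF cycle.

        # Master-slave cycle: the master scatters the work, the slaves process their
        # shares in parallel, and the master gathers the results before the next cycle.
        import numpy as np
        from multiprocessing import Pool

        def work_unit(chunk):
            # Placeholder for one "processing unit" (e.g. a block of ACO ants or an image stripe).
            return chunk * 2.0

        def run_cycles(data, n_slaves=4, n_cycles=10):
            with Pool(n_slaves) as pool:
                for _ in range(n_cycles):
                    chunks = np.array_split(data, n_slaves)              # master scatters
                    data = np.concatenate(pool.map(work_unit, chunks))   # slaves compute, master gathers
            return data

        if __name__ == "__main__":
            print(run_cycles(np.ones(16))[:4])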

  18. Sustainability Attitudes and Behavioral Motivations of College Students: Testing the Extended Parallel Process Model

    Science.gov (United States)

    Perrault, Evan K.; Clark, Scott K.

    2018-01-01

    Purpose: A planet that can no longer sustain life is a frightening thought--and one that is often present in mass media messages. Therefore, this study aims to test the components of a classic fear appeal theory, the extended parallel process model (EPPM) and to determine how well its constructs predict sustainability behavioral intentions. This…

  19. Parallel processing for artificial intelligence 1

    CERN Document Server

    Kanal, LN; Kumar, V; Suttner, CB

    1994-01-01

    Parallel processing for AI problems is of great current interest because of its potential for alleviating the computational demands of AI procedures. The articles in this book consider parallel processing for problems in several areas of artificial intelligence: image processing, knowledge representation in semantic networks, production rules, mechanization of logic, constraint satisfaction, parsing of natural language, data filtering and data mining. The publication is divided into six sections. The first addresses parallel computing for processing and understanding images. The second discusses…

  20. Intelligent spatial ecosystem modeling using parallel processors

    International Nuclear Information System (INIS)

    Maxwell, T.; Costanza, R.

    1993-01-01

    Spatial modeling of ecosystems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future ecosystem behavior. Development of these models has been limited in the past by the large amount of input data required and the difficulty of even large mainframe serial computers in dealing with large spatial arrays. These two limitations have begun to erode with the increasing availability of remote sensing data and GIS systems to manipulate it, and the development of parallel computer systems which allow computation of large, complex, spatial arrays. Although many forms of dynamic spatial modeling are highly amenable to parallel processing, the primary focus in this project is on process-based landscape models. These models simulate spatial structure by first compartmentalizing the landscape into some geometric design and then describing flows within compartments and spatial processes between compartments according to location-specific algorithms. The authors are currently building and running parallel spatial models at the regional scale for the Patuxent River region in Maryland, the Everglades in Florida, and Barataria Basin in Louisiana. The authors are also planning a project to construct a series of spatially explicit linked ecological and economic simulation models aimed at assessing the long-term potential impacts of global climate change

  1. An Educational Tool for Interactive Parallel and Distributed Processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2011-01-01

    In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation of the abstract problems related to designing interactive parallel and distributed systems. Indeed, MITS seems to bring a series of goals into the education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, user and multiuser interaction, which can hardly be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples show how to implement interactive…

  2. Z-buffer image assembly processing in high parallel visualization processing

    International Nuclear Information System (INIS)

    Kaneko, Isamu; Muramatsu, Kazuhiro

    2000-03-01

    On parallel computers with many processors, the domain decomposition method is a popular means of parallel processing. As simulation scales grow and run times lengthen, visualization performed simultaneously with the actual computation becomes increasingly necessary, and for real-time visualization the domain decomposition technique is indispensable. In parallel rendering, the rendered results must be gathered to one processor to compose the integrated picture in the last stage. This integration is usually carried out using Z-buffer values. This process, however, causes serious problems of much slower processing and local memory shortage when the parallel computation exceeds several tens of processors. In this report, two new solutions are proposed: the adoption of a special operator (a Reduce operator) in the parallelization process, and buffer compression by deleting the background information. The report includes performance results obtained on the parallel computer Paragon to investigate the effect of these new techniques. (author)
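
    The sketch below illustrates the Z-buffer assembly step on two partial renderings: for each pixel, the colour with the smaller depth value is kept. Because this merge is associative, it can be combined across many processors by a Reduce-style operator, which is the idea behind the first solution; the array names and sizes are illustrative.

        # Pairwise Z-buffer merge: keep, per pixel, the colour whose depth is smaller.
        import numpy as np

        def zbuffer_merge(img_a, z_a, img_b, z_b):
            closer_a = z_a < z_b                             # per-pixel depth test
            img = np.where(closer_a[..., None], img_a, img_b)
            z = np.where(closer_a, z_a, z_b)
            return img, z

        # Two partial renderings (RGB image plus depth buffer) from two processors.
        h, w = 4, 4
        rng = np.random.default_rng(2)
        img1, z1 = rng.random((h, w, 3)), rng.random((h, w))
        img2, z2 = rng.random((h, w, 3)), rng.random((h, w))
        merged_img, merged_z = zbuffer_merge(img1, z1, img2, z2)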

  3. Researching the Parallel Process in Supervision and Psychotherapy

    DEFF Research Database (Denmark)

    Jacobsen, Claus Haugaard

    Reflects upon how to do process research in supervision and in the parallel process. A single case study is presented illustrating how a study on parallel process can be carried out.

  4. Regional-scale calculation of the LS factor using parallel processing

    Science.gov (United States)

    Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

    2015-05-01

    With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the algorithm characteristics, including a decomposition method for maintaining the integrity of the results, an optimized workflow for reducing the time spent exporting unnecessary intermediate data, and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
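
    A simplified sketch, assuming mpi4py is available, of the row-block decomposition and one-row halo exchange that a local algorithm such as slope needs; the paper's buffer-communication-computation strategy and the full LS-factor workflow are not reproduced, and the file name in the comment is hypothetical.

        # Row-block domain decomposition with a one-row halo exchange between neighbours.
        # Run with, e.g.: mpiexec -n 4 python halo_demo.py
        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        dem_block = np.random.rand(100, 400)             # this process's strip of the DEM
        up, down = rank - 1, rank + 1

        # Exchange one boundary row with each neighbour so edge cells can be computed locally.
        halo_top = comm.sendrecv(dem_block[0], dest=up, source=up) if up >= 0 else dem_block[0]
        halo_bot = comm.sendrecv(dem_block[-1], dest=down, source=down) if down < size else dem_block[-1]

        padded = np.vstack([halo_top, dem_block, halo_bot])
        slope = np.hypot(*np.gradient(padded))           # purely local computation on the padded strip
        print(f"rank {rank}: mean slope {slope[1:-1].mean():.4f}")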

  5. When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias.

    Science.gov (United States)

    Trippas, Dries; Thompson, Valerie A; Handley, Simon J

    2017-05-01

    Two experiments pitted the default-interventionist account of belief bias against a parallel-processing model. According to the former, belief bias occurs because a fast, belief-based evaluation of the conclusion pre-empts a working-memory demanding logical analysis. In contrast, according to the latter both belief-based and logic-based responding occur in parallel. Participants were given deductive reasoning problems of variable complexity and instructed to decide whether the conclusion was valid on half the trials or to decide whether the conclusion was believable on the other half. When belief and logic conflict, the default-interventionist view predicts that it should take less time to respond on the basis of belief than logic, and that the believability of a conclusion should interfere with judgments of validity, but not the reverse. The parallel-processing view predicts that beliefs should interfere with logic judgments only if the processing required to evaluate the logical structure exceeds that required to evaluate the knowledge necessary to make a belief-based judgment, and vice versa otherwise. Consistent with this latter view, for the simplest reasoning problems (modus ponens), judgments of belief resulted in lower accuracy than judgments of validity, and believability interfered more with judgments of validity than the converse. For problems of moderate complexity (modus tollens and single-model syllogisms), the interference was symmetrical, in that validity interfered with belief judgments to the same degree that believability interfered with validity judgments. For the most complex (three-term multiple-model syllogisms), conclusion believability interfered more with judgments of validity than vice versa, in spite of the significant interference from conclusion validity on judgments of belief.

  6. "Let's Move" campaign: applying the extended parallel process model.

    Science.gov (United States)

    Batchelder, Alicia; Matusitz, Jonathan

    2014-01-01

    This article examines Michelle Obama's health campaign, "Let's Move," through the lens of the extended parallel process model (EPPM). "Let's Move" aims to reduce the childhood obesity epidemic in the United States. Developed by Kim Witte, EPPM rests on the premise that people's attitudes can be changed when fear is exploited as a factor of persuasion. Fear appeals work best (a) when a person feels a concern about the issue or situation, and (b) when he or she believes to have the capability of dealing with that issue or situation. Overall, the analysis found that "Let's Move" is based on past health campaigns that have been successful. An important element of the campaign is the use of fear appeals (as it is postulated by EPPM). For example, part of the campaign's strategies is to explain the severity of the diseases associated with obesity. By looking at the steps of EPPM, readers can also understand the strengths and weaknesses of "Let's Move."

  7. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers

  8. Cocaine Use and Delinquent Behavior among High-Risk Youths: A Growth Model of Parallel Processes

    Science.gov (United States)

    Dembo, Richard; Sullivan, Christopher

    2009-01-01

    We report the results of a parallel-process, latent growth model analysis examining the relationships between cocaine use and delinquent behavior among youths. The study examined a sample of 278 justice-involved juveniles completing at least one of three follow-up interviews as part of a National Institute on Drug Abuse-funded study. The results…

  9. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately…

  10. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately…

  11. A Parallel Process Growth Model of Avoidant Personality Disorder Symptoms and Personality Traits

    Science.gov (United States)

    Wright, Aidan G. C.; Pincus, Aaron L.; Lenzenweger, Mark F.

    2012-01-01

    Background: Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Methods: Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, 2006), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. Results: AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. Conclusions: These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general. PMID:22506627

  12. Surface topography of parallel grinding process for nonaxisymmetric aspheric lens

    International Nuclear Information System (INIS)

    Zhang Ningning; Wang Zhenzhong; Pan Ri; Wang Chunjin; Guo Yinbiao

    2012-01-01

    Workpiece surface profile, texture and roughness can be predicted by modeling the topography of the wheel surface and the kinematics of the grinding process, which together form an important part of precision grinding process theory. Parallel grinding technology is an important method for machining nonaxisymmetric aspheric lenses, but there are few reports on relevant simulation. In this paper, a simulation method based on parallel grinding for precision machining of aspheric lenses is proposed. The method combines modeling of the random surface of the wheel with modeling of the single-grain track based on arc wheel contact points. A mathematical algorithm for the surface topography is then proposed and applied under different machining parameters. The consistency between the simulation and test results shows that the algorithm is correct and efficient. (authors)

  13. Distributed parallel computing in stochastic modeling of groundwater systems.

    Science.gov (United States)

    Dong, Yanhui; Li, Guomin; Xu, Haizhen

    2013-03-01

    Stochastic modeling is a rapidly evolving, popular approach to the study of the uncertainty and heterogeneity of groundwater systems. However, the use of Monte Carlo-type simulations to solve practical groundwater problems often encounters computational bottlenecks that hinder the acquisition of meaningful results. To improve the computational efficiency, a system that combines stochastic model generation with MODFLOW-related programs and distributed parallel processing is investigated. The distributed computing framework, called the Java Parallel Processing Framework, is integrated into the system to allow the batch processing of stochastic models in distributed and parallel systems. As an example, the system is applied to the stochastic delineation of well capture zones in the Pinggu Basin in Beijing. Through the use of 50 processing threads on a cluster with 10 multicore nodes, the execution times of 500 realizations are reduced to 3% compared with those of a serial execution. Through this application, the system demonstrates its potential in solving difficult computational problems in practical stochastic modeling. © 2012, The Author(s). Groundwater © 2012, National Ground Water Association.
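
    A minimal sketch of the batch pattern described above, with Python's multiprocessing standing in for the Java Parallel Processing Framework; run_realization is a hypothetical placeholder for generating one stochastic conductivity field and running a MODFLOW simulation on it.

        # Farm independent stochastic realizations out to worker processes and collect the results.
        import numpy as np
        from multiprocessing import Pool

        def run_realization(seed):
            rng = np.random.default_rng(seed)
            conductivity = rng.lognormal(mean=0.0, sigma=1.0, size=(50, 50))
            # ... write MODFLOW input for this field, run the model, read the capture zone ...
            return conductivity.mean()               # placeholder summary statistic

        if __name__ == "__main__":
            with Pool(processes=10) as pool:
                results = pool.map(run_realization, range(500))   # 500 independent realizations
            print("ensemble mean:", np.mean(results))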

  14. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  15. A qualitative single case study of parallel processes

    DEFF Research Database (Denmark)

    Jacobsen, Claus Haugaard

    2007-01-01

    Parallel process in psychotherapy and supervision is a phenomenon manifest in relationships and interactions that originates in one setting and is reflected in another. This article presents an explorative single case study of parallel processes based on qualitative analyses of two successive randomly chosen psychotherapy sessions with a schizophrenic patient and the supervision session given in between. The author's analysis is verified by an independent examiner's analysis. Parallel processes are identified and described. Reflections on the dynamics of parallel processes and supervisory…

  16. Parallelization of a hydrological model using the message passing interface

    Science.gov (United States)

    Wu, Yiping; Li, Tiejian; Sun, Liqun; Chen, Ji

    2013-01-01

    With increasing knowledge about natural processes, hydrological models such as the Soil and Water Assessment Tool (SWAT) are becoming larger and more complex, with increasing computation time. Additionally, other procedures such as model calibration, which may require thousands of model iterations, can increase running time and thus further hinder rapid modeling and analysis. Using the widely applied SWAT as an example, this study demonstrates how to parallelize a serial hydrological model in a Windows® environment using a parallel programming technology, the Message Passing Interface (MPI). With a case study, we derived the optimal values for the two parameters (the number of processes and the corresponding percentage of work to be distributed to the master process) of the parallel SWAT (P-SWAT) on an ordinary personal computer and a workstation. Our study indicates that model execution time can be reduced by 42%–70% (or a speedup of 1.74–3.36) using multiple processes (two to five) with a proper task-distribution scheme (between the master and slave processes). Although the computation time cost becomes lower with an increasing number of processes (from two to five), this enhancement diminishes owing to the accompanying increase in message passing between the master and all slave processes. Our case study demonstrates that P-SWAT with a five-process run may reach the maximum speedup, and the performance can be quite stable (fairly independent of project size). Overall, P-SWAT can help reduce the computation time substantially for an individual model run, manual and automatic calibration procedures, and optimization of best management practices. In particular, the parallelization method we used and the scheme for deriving the optimal parameters in this study can be valuable and easily applied to other hydrological or environmental models.
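
    An illustrative sketch of the two tuning parameters discussed above, the number of processes and the share of work kept by the master; the subbasin counts and timings are invented, not values from the paper.

        # Split subbasins between the master and the slaves, and compute the resulting speedup.
        def distribute_subbasins(n_subbasins, n_processes, master_share):
            """Return (subbasins for the master, subbasins per slave)."""
            to_master = int(round(n_subbasins * master_share))
            remaining = n_subbasins - to_master
            n_slaves = n_processes - 1
            per_slave = [remaining // n_slaves + (1 if i < remaining % n_slaves else 0)
                         for i in range(n_slaves)]
            return to_master, per_slave

        def speedup(serial_time, parallel_time):
            return serial_time / parallel_time

        print(distribute_subbasins(n_subbasins=100, n_processes=5, master_share=0.1))
        print("speedup:", speedup(serial_time=3600.0, parallel_time=1250.0))  # illustrative timings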

  17. An educational tool for interactive parallel and distributed processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2012-01-01

    In this article we try to describe how the modular interactive tiles system (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing a hands-on educational tool that allows a change in the representation of abstract problems related to designing interactive parallel and distributed systems. Indeed, the MITS seems to bring a series of goals into education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, and user and multi-user interaction which can rarely be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples we show how to implement…

  18. Connectionism, parallel constraint satisfaction processes, and gestalt principles: (re) introducing cognitive dynamics to social psychology.

    Science.gov (United States)

    Read, S J; Vanman, E J; Miller, L C

    1997-01-01

    We argue that recent work in connectionist modeling, in particular the parallel constraint satisfaction processes that are central to many of these models, has great importance for understanding issues of both historical and current concern for social psychologists. We first provide a brief description of connectionist modeling, with particular emphasis on parallel constraint satisfaction processes. Second, we examine the tremendous similarities between parallel constraint satisfaction processes and the Gestalt principles that were the foundation for much of modern social psychology. We propose that parallel constraint satisfaction processes provide a computational implementation of the principles of Gestalt psychology that were central to the work of such seminal social psychologists as Asch, Festinger, Heider, and Lewin. Third, we then describe how parallel constraint satisfaction processes have been applied to three areas that were key to the beginnings of modern social psychology and remain central today: impression formation and causal reasoning, cognitive consistency (balance and cognitive dissonance), and goal-directed behavior. We conclude by discussing implications of parallel constraint satisfaction principles for a number of broader issues in social psychology, such as the dynamics of social thought and the integration of social information within the narrow time frame of social interaction.

  19. Preliminary Study on the Enhancement of Reconstruction Speed for Emission Computed Tomography Using Parallel Processing

    International Nuclear Information System (INIS)

    Park, Min Jae; Lee, Jae Sung; Kim, Soo Mee; Kang, Ji Yeon; Lee, Dong Soo; Park, Kwang Suk

    2009-01-01

    Conventional image reconstruction uses simplified physical models of projection. However, more realistic physics, for example 3D reconstruction, takes too long to process all the data in the clinic and cannot run on a common reconstruction machine because of the large memory required for complex physical models. We suggest a realistic distributed-memory model of fast reconstruction using parallel processing on personal computers to enable such large-scale techniques. Preliminary feasibility tests on virtual machines and various performance tests on a commercial supercomputer, Tachyon, were performed. An expectation maximization algorithm with a common 2D projection and a realistic 3D line of response was tested. Since processing slowed down (by up to 6 times) after a certain number of iterations, compiler optimization was performed to maximize the efficiency of parallelization. Parallel processing of a program on multiple computers was available on Linux with MPICH and NFS. We verified that the differences between the parallel-processed image and the single-processed image at the same iterations were below the significant digits of floating-point numbers, about 6 bits. Two processors showed good parallel-computing efficiency (1.96 times). The delay phenomenon was solved by a vectorization method using SSE. Through this study, a realistic parallel computing system for clinical use was established, able to reconstruct with ample memory using the realistic physical models that are impossible to simplify.

  20. Structured building model reduction toward parallel simulation

    Energy Technology Data Exchange (ETDEWEB)

    Dobbs, Justin R. [Cornell University]; Hencey, Brondon M. [Cornell University]

    2013-08-26

    Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.

  1. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  2. Parallel processing of two-dimensional Sn transport calculations

    International Nuclear Information System (INIS)

    Uematsu, M.

    1997-01-01

    A parallel processing method for the two-dimensional Sn transport code DOT3.5 has been developed to achieve a drastic reduction in computation time. In the proposed method, parallelization is achieved with angular domain decomposition and/or space domain decomposition. The calculational speed of parallel processing by angular domain decomposition is largely influenced by frequent communications between processing elements. To assess parallelization efficiency, sample problems with up to 32 x 32 spatial meshes were solved with a Sun workstation using the PVM message-passing library. As a result, parallel calculation using 16 processing elements, for example, was found to be nine times as fast as that with one processing element. As for parallel processing by geometry segmentation, the influence of processing element communications on computation time is small; however, discontinuity at the segment boundary degrades convergence speed. To accelerate the convergence, an alternate sweep of angular flux in conjunction with space domain decomposition and a two-step rescaling method consisting of segmentwise rescaling and ordinary pointwise rescaling have been developed. By applying the developed method, the number of iterations needed to obtain a converged flux solution was reduced by a factor of 2. As a result, parallel calculation using 16 processing elements was found to be 5.98 times as fast as the original DOT3.5 calculation
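
    A toy sketch of the angular domain decomposition idea: the discrete-ordinate directions are split into blocks, each block is swept separately, and the partial scalar fluxes are summed in a reduction. A serial loop stands in for the processing elements, and the sweep itself is only a placeholder.

        # Angular domain decomposition: partial fluxes from each angle block are reduced by summation.
        import numpy as np

        n_angles, mesh = 16, (32, 32)
        weights = np.full(n_angles, 1.0 / n_angles)          # quadrature weights (placeholder)

        def sweep(angle_indices):
            # Placeholder for a transport sweep over the spatial mesh for one block of angles.
            rng = np.random.default_rng(int(angle_indices[0]))
            return sum(weights[m] * rng.random(mesh) for m in angle_indices)

        n_pe = 4                                             # processing elements
        blocks = np.array_split(np.arange(n_angles), n_pe)   # angular domain decomposition
        partial_fluxes = [sweep(block) for block in blocks]  # done concurrently on real hardware
        scalar_flux = np.sum(partial_fluxes, axis=0)         # reduction across processing elements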

  3. A parallel process growth model of avoidant personality disorder symptoms and personality traits.

    Science.gov (United States)

    Wright, Aidan G C; Pincus, Aaron L; Lenzenweger, Mark F

    2013-07-01

    Avoidant personality disorder (AVPD), like other personality disorders, has historically been construed as a highly stable disorder. However, results from a number of longitudinal studies have found that the symptoms of AVPD demonstrate marked change over time. Little is known about which other psychological systems are related to this change. Although cross-sectional research suggests a strong relationship between AVPD and personality traits, no work has examined the relationship of their change trajectories. The current study sought to establish the longitudinal relationship between AVPD and basic personality traits using parallel process growth curve modeling. Parallel process growth curve modeling was applied to the trajectories of AVPD and basic personality traits from the Longitudinal Study of Personality Disorders (Lenzenweger, M. F., 2006, The longitudinal study of personality disorders: History, design considerations, and initial findings. Journal of Personality Disorders, 20, 645-670. doi:10.1521/pedi.2006.20.6.645), a naturalistic, prospective, multiwave, longitudinal study of personality disorder, temperament, and normal personality. The focus of these analyses is on the relationship between the rates of change in both AVPD symptoms and basic personality traits. AVPD symptom trajectories demonstrated significant negative relationships with the trajectories of interpersonal dominance and affiliation, and a significant positive relationship to rates of change in neuroticism. These results provide some of the first compelling evidence that trajectories of change in PD symptoms and personality traits are linked. These results have important implications for the ways in which temporal stability is conceptualized in AVPD specifically, and PD in general.

  4. Parallel processing of structural integrity analysis codes

    International Nuclear Information System (INIS)

    Swami Prasad, P.; Dutta, B.K.; Kushwaha, H.S.

    1996-01-01

    Structural integrity analysis plays an important role in assessing and demonstrating the safety of nuclear reactor components. This analysis is performed using analytical tools such as the Finite Element Method (FEM) with the help of digital computers. The complexity of the problems involved in nuclear engineering demands high-speed computation facilities to obtain solutions in a reasonable amount of time. Parallel processing systems such as ANUPAM provide an efficient platform for realising such high-speed computation. The development and implementation of software on parallel processing systems is an interesting and challenging task. The data and algorithm structure of the codes plays an important role in exploiting the capabilities of a parallel processing system. Structural analysis codes based on FEM can be divided into two categories with respect to their implementation on parallel processing systems. The first category, codes such as those used for harmonic analysis and mechanistic fuel performance codes, does not require parallelisation of individual modules. The second category, such as conventional FEM codes, requires parallelisation of individual modules; in this category, parallelisation of the equation solution module poses major difficulties. Different solution schemes such as the domain decomposition method (DDM), parallel active column solver and substructuring method are currently used on parallel processing systems. Two codes, FAIR and TABS, belonging to each of these categories, have been implemented on ANUPAM. The implementation details of these codes and the performance of different equation solvers are highlighted. (author). 5 refs., 12 figs., 1 tab

  5. Applications of Parallel Processing in Mobile Banking

    Directory of Open Access Journals (Sweden)

    2007-01-01

    The future of mobile banking will be represented by applications that support mobile banking, Internet banking and EFT (Electronic Funds Transfer) transactions in a single user interface. In this way, mobile banking will be able to cover all the types of applications demanded at the market level. The parallel processing of credit card bank transactions could be performed with the help of a grid network. Despite some limitations, grid processing offers huge opportunities to exploit parallelism. For this reason, many applications of waiting queues in grid processing have been developed in recent years. Grid networks represent a distinctive and very modern field of parallel and distributed processing.

  6. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)

  7. The Processing of Somatosensory Information shifts from an early parallel into a serial processing mode: a combined fMRI/MEG study.

    Directory of Open Access Journals (Sweden)

    Carsten Michael Klingner

    2016-12-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.

  8. The Processing of Somatosensory Information Shifts from an Early Parallel into a Serial Processing Mode: A Combined fMRI/MEG Study.

    Science.gov (United States)

    Klingner, Carsten M; Brodoehl, Stefan; Huonker, Ralph; Witte, Otto W

    2016-01-01

    The question regarding whether somatosensory inputs are processed in parallel or in series has not been clearly answered. Several studies that have applied dynamic causal modeling (DCM) to fMRI data have arrived at seemingly divergent conclusions. However, these divergent results could be explained by the hypothesis that the processing route of somatosensory information changes with time. Specifically, we suggest that somatosensory stimuli are processed in parallel only during the early stage, whereas the processing is later dominated by serial processing. This hypothesis was revisited in the present study based on fMRI analyses of tactile stimuli and the application of DCM to magnetoencephalographic (MEG) data collected during sustained (260 ms) tactile stimulation. Bayesian model comparisons were used to infer the processing stream. We demonstrated that the favored processing stream changes over time. We found that the neural activity elicited in the first 100 ms following somatosensory stimuli is best explained by models that support a parallel processing route, whereas a serial processing route is subsequently favored. These results suggest that the secondary somatosensory area (SII) receives information regarding a new stimulus in parallel with the primary somatosensory area (SI), whereas later processing in the SII is dominated by the preprocessed input from the SI.

  9. Implementation science: a role for parallel dual processing models of reasoning?

    Science.gov (United States)

    Sladek, Ruth M; Phillips, Paddy A; Bond, Malcolm J

    2006-05-25

    A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues therefore, that individual differences between doctors in terms of reasoning are important considerations in any

  10. Spatially parallel processing of within-dimension conjunctions.

    Science.gov (United States)

    Linnell, K J; Humphreys, G W

    2001-01-01

    Within-dimension conjunction search for red-green targets amongst red-blue, and blue-green, nontargets is extremely inefficient (Wolfe et al., 1990, Journal of Experimental Psychology: Human Perception and Performance, 16, 879-892). We tested whether pairs of red-green conjunction targets can nevertheless be processed spatially in parallel. Participants made speeded detection responses whenever a red-green target was present. Across trials where a second identical target was present, the distribution of detection times was compatible with the assumption that targets were processed in parallel (Miller, 1982, Cognitive Psychology, 14, 247-279). We show that this was not an artifact of response-competition or feature-based processing. We suggest that within-dimension conjunctions can be processed spatially in parallel. Visual search for such items may be inefficient owing to within-dimension grouping between items.

  11. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    Science.gov (United States)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  12. Parallelization Experience with Four Canonical Econometric Models Using ParMitISEM

    Directory of Open Access Journals (Sweden)

    Nalan Baştürk

    2016-03-01

    Full Text Available This paper presents the parallel computing implementation of the MitISEM algorithm, labeled Parallel MitISEM. The basic MitISEM algorithm provides an automatic and flexible method to approximate a non-elliptical target density using adaptive mixtures of Student-t densities, where only a kernel of the target density is required. The approximation can be used as a candidate density in Importance Sampling or Metropolis-Hastings methods for Bayesian inference on model parameters and probabilities. We present and discuss four canonical econometric models using a Graphics Processing Unit and a multi-core Central Processing Unit version of the MitISEM algorithm. The results show that the parallelization of the MitISEM algorithm on Graphics Processing Units and multi-core Central Processing Units is straightforward and fast to program using MATLAB. Moreover, the speed performance of the Graphics Processing Unit version is much higher than that of the Central Processing Unit version.

  13. Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

    Science.gov (United States)

    Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

    2016-04-01

    Advances in graphics processing unit technology towards parallel architectures [1], comprising thousands of cores and multitudes of parallel threads, provide the hardware foundation for the rapid processing of various parallel applications in seismic big data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are established and decades of data are compiled together [2]. Yet many processes in seismic data analysis are performed on each seismic event independently or on distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, being independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] for assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and

  14. Efficient multitasking: parallel versus serial processing of multiple tasks.

    Science.gov (United States)

    Fischer, Rico; Plessow, Franziska

    2015-01-01

    In the context of performance optimizations in multitasking, a central debate has unfolded in multitasking research around whether cognitive processes related to different tasks proceed only sequentially (one at a time), or can operate in parallel (simultaneously). This review features a discussion of theoretical considerations and empirical evidence regarding parallel versus serial task processing in multitasking. In addition, we highlight how methodological differences and theoretical conceptions determine the extent to which parallel processing in multitasking can be detected, to guide their employment in future research. Parallel and serial processing of multiple tasks are not mutually exclusive. Therefore, questions focusing exclusively on either task-processing mode are too simplified. We review empirical evidence and demonstrate that shifting between more parallel and more serial task processing critically depends on the conditions under which multiple tasks are performed. We conclude that efficient multitasking is reflected by the ability of individuals to adjust multitasking performance to environmental demands by flexibly shifting between different processing strategies of multiple task-component scheduling.

  15. The parallel processing of EGS4 code on distributed memory scalar parallel computer:Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of the Electro-Magnetic Cascade Monte Carlo Simulation Code EGS4 on the distributed-memory scalar parallel computer Intel Paragon XP/S15-256 is described. A feature of EGS4 is that the calculation time differs considerably from one incident particle to another, because secondary particles are generated dynamically and each particle behaves differently. The granularity for parallel processing, the parallel programming model and the algorithm for parallel random number generation are discussed, and two methods, which allocate particles either dynamically or statically, are used to realize high-speed parallel processing of this code. Among the four problems chosen for performance evaluation, speedup factors of nearly 100 were attained for three problems with 128 processors. It was found that when both the calculation time per incident particle and its dispersion are large, the dynamic particle allocation method, which averages the load over the processors, is preferable; when they are small, the static particle allocation method, which reduces the communication overhead, is preferable. Moreover, it is pointed out that double-precision variables must be used in the EGS4 code to obtain accurate results. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed in the light of the experience gained from the EGS4 parallelization. (author).
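    The dynamic-versus-static allocation trade-off described above can be sketched, purely for illustration, with a process pool in Python; the original work used message passing on the Paragon, and the toy per-history cost below is an assumption, not EGS4 physics.

```python
# Toy sketch of dynamic vs. static allocation of Monte Carlo particle
# histories over worker processes; the per-history cost is deliberately
# uneven to mimic electromagnetic cascades of very different sizes.
import random
from multiprocessing import Pool

def simulate_history(seed):
    """Stand-in for one incident-particle history with variable cost."""
    rng = random.Random(seed)
    n_secondaries = rng.randint(1, 2000)        # uneven work per particle
    return sum(rng.random() for _ in range(n_secondaries))

def run_chunk(chunk_seeds):
    return sum(simulate_history(s) for s in chunk_seeds)

def run_static(seeds, workers=4):
    # Static allocation: fixed chunks decided up front; low communication
    # overhead, but the load can be imbalanced when chunk costs differ.
    chunks = [seeds[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        return sum(pool.map(run_chunk, chunks))

def run_dynamic(seeds, workers=4):
    # Dynamic allocation: small batches handed out as workers become free,
    # which averages the load at the price of more communication.
    with Pool(workers) as pool:
        return sum(pool.imap_unordered(simulate_history, seeds, chunksize=16))

if __name__ == "__main__":
    seeds = list(range(20_000))
    print(run_static(seeds), run_dynamic(seeds))
```

    Whether the balanced load of the dynamic scheme outweighs its extra hand-offs is exactly the trade-off the record reports between the two allocation methods.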

  16. Implementation of a parallel version of a regional climate model

    Energy Technology Data Exchange (ETDEWEB)

    Gerstengarbe, F.W. [ed.; Kuecken, M. [Potsdam-Institut fuer Klimafolgenforschung (PIK), Potsdam (Germany); Schaettler, U. [Deutscher Wetterdienst, Offenbach am Main (Germany). Geschaeftsbereich Forschung und Entwicklung

    1997-10-01

    A regional climate model developed by the Max Planck Institute for Meteorology and the German Climate Computing Centre in Hamburg, based on the 'Europa' and 'Deutschland' models of the German Weather Service, has been parallelized and implemented on the IBM RS/6000 SP computer system of the Potsdam Institute for Climate Impact Research. The parallelization covers parallel input/output processing, the explicit Eulerian time-step, the semi-implicit corrections, the normal-mode initialization and the physical parameterizations of the German Weather Service. The implementation utilizes Fortran 90 and the Message Passing Interface. The parallelization strategy used is a 2D domain decomposition. This report describes the parallelization strategy, the parallel I/O organization, the influence of different domain decomposition approaches on static and dynamic load imbalances, and first numerical results. (orig.)

  17. A new parallelization algorithm of ocean model with explicit scheme

    Science.gov (United States)

    Fu, X. D.

    2017-08-01

    This paper focuses on the parallelization of an ocean model with an explicit scheme, one of the most commonly used schemes in the discretization of the governing equations of ocean models. The characteristic of the explicit scheme is that the calculation is simple and that the value at a given grid point depends only on values at the previous time step, which means that no sparse linear systems need to be solved in the process of solving the governing equations of the ocean model. Exploiting these characteristics, this paper designs a parallel algorithm named halo cells update, which requires only tiny modifications of the original ocean model and little change to its space step and time step; the ocean model is parallelized by designing a transmission module between sub-domains. The Global Reduced Gravity Ocean model (GRGO) is taken as an example to implement the parallelization with halo update. The results demonstrate that higher speedups can be achieved at different problem sizes.
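    As an illustration only (this is not the GRGO code; the 1-D decomposition, the diffusion-like update and the one-cell halo width are assumptions), the halo-cells-update idea can be sketched in Python. In the actual parallel model each sub-domain would live on its own process and the halo refresh would be the transmission step; the loop below is serial purely to show the data dependence.

```python
import numpy as np

def split_with_halo(field, nsub):
    """Split a 1-D field into sub-domains padded with one halo cell per side."""
    return [np.concatenate(([0.0], p, [0.0])) for p in np.array_split(field, nsub)]

def exchange_halos(subs):
    """Refresh each sub-domain's halo cells from its neighbours' interior cells."""
    for i in range(len(subs) - 1):
        subs[i][-1] = subs[i + 1][1]   # right halo <- neighbour's first interior cell
        subs[i + 1][0] = subs[i][-2]   # neighbour's left halo <- last interior cell
    return subs

def explicit_step(sub, c=0.25):
    """Explicit update of interior cells; only the halos couple the sub-domains."""
    sub[1:-1] = sub[1:-1] + c * (sub[2:] - 2.0 * sub[1:-1] + sub[:-2])
    return sub

field = np.sin(np.linspace(0.0, np.pi, 64))
subs = split_with_halo(field, nsub=4)        # global edges stay at zero in this sketch
for _ in range(100):
    subs = exchange_halos(subs)              # communication between sub-domains
    subs = [explicit_step(s) for s in subs]  # independent work, one sub-domain each
result = np.concatenate([s[1:-1] for s in subs])
```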

  18. Psychodrama: A Creative Approach for Addressing Parallel Process in Group Supervision

    Science.gov (United States)

    Hinkle, Michelle Gimenez

    2008-01-01

    This article provides a model for using psychodrama to address issues of parallel process during group supervision. Information on how to utilize the specific concepts and techniques of psychodrama in relation to group supervision is discussed. A case vignette of the model is provided.

  19. Parallel-Batch Scheduling with Two Models of Deterioration to Minimize the Makespan

    Directory of Open Access Journals (Sweden)

    Cuixia Miao

    2014-01-01

    Full Text Available We consider bounded parallel-batch scheduling with two models of deterioration, in which the processing time of a job in the first model is p_j = a_j + αt and in the second model is p_j = a + α_j t, where t is the job's starting time. The objective is to minimize the makespan. We present O(n log n)-time algorithms for the single-machine problems. We also propose fully polynomial-time approximation schemes to solve the identical-parallel-machine problem and the uniform-parallel-machine problem, respectively.

  20. Efficient parallel implementation of active appearance model fitting algorithm on GPU.

    Science.gov (United States)

    Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

    2014-01-01

    The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  1. Parallel processing of Monte Carlo code MCNP for particle transport problem

    Energy Technology Data Exchange (ETDEWEB)

    Higuchi, Kenji; Kawasaki, Takuji

    1996-06-01

    It is possible to vectorize or parallelize Monte Carlo (MC) codes for photon and neutron transport problems by making use of the independence of the calculation for each particle. The applicability of an existing MC code to parallel processing is discussed. For the parallel computers, both a vector-parallel processor and a scalar-parallel processor were used in the performance evaluation: (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on the Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on these two types of parallel or distributed-memory computers. In addition, we mention the evaluation of parallel programming environments for the parallel computers used in the present work, as a part of the work developing the STA (Seamless Thinking Aid) Basic Software. (author)
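    As a toy illustration of the per-particle independence the record exploits (this is not MCNP; the "transport" below is a placeholder), independent histories can be farmed out to worker processes, each with its own independent random-number stream.

```python
import numpy as np
from multiprocessing import Pool

def batch_of_histories(args):
    seed_seq, n_particles = args
    rng = np.random.default_rng(seed_seq)    # independent stream per worker
    # Placeholder "transport": exponential path lengths in mean free paths;
    # tally particles travelling more than 5 mfp without a collision.
    path = rng.exponential(scale=1.0, size=n_particles)
    return int(np.count_nonzero(path > 5.0))

if __name__ == "__main__":
    n_workers, n_per_worker = 4, 250_000
    streams = np.random.SeedSequence(2024).spawn(n_workers)
    with Pool(n_workers) as pool:
        tallies = pool.map(batch_of_histories, [(s, n_per_worker) for s in streams])
    print(sum(tallies) / (n_workers * n_per_worker))   # close to exp(-5)
```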

  2. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Science.gov (United States)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  3. Implementation science: a role for parallel dual processing models of reasoning?

    Directory of Open Access Journals (Sweden)

    Phillips Paddy A

    2006-05-01

    Full Text Available Abstract Background A better theoretical base for understanding professional behaviour change is needed to support evidence-based changes in medical practice. Traditionally strategies to encourage changes in clinical practices have been guided empirically, without explicit consideration of underlying theoretical rationales for such strategies. This paper considers a theoretical framework for reasoning from within psychology for identifying individual differences in cognitive processing between doctors that could moderate the decision to incorporate new evidence into their clinical decision-making. Discussion Parallel dual processing models of reasoning posit two cognitive modes of information processing that are in constant operation as humans reason. One mode has been described as experiential, fast and heuristic; the other as rational, conscious and rule based. Within such models, the uptake of new research evidence can be represented by the latter mode; it is reflective, explicit and intentional. On the other hand, well practiced clinical judgments can be positioned in the experiential mode, being automatic, reflexive and swift. Research suggests that individual differences between people in both cognitive capacity (e.g., intelligence) and cognitive processing (e.g., thinking styles) influence how both reasoning modes interact. This being so, it is proposed that these same differences between doctors may moderate the uptake of new research evidence. Such dispositional characteristics have largely been ignored in research investigating effective strategies in implementing research evidence. Whilst medical decision-making occurs in a complex social environment with multiple influences and decision makers, it remains true that an individual doctor's judgment still retains a key position in terms of diagnostic and treatment decisions for individual patients. This paper argues therefore, that individual differences between doctors in terms of

  4. Parallel programming practical aspects, models and current limitations

    CERN Document Server

    Tarkov, Mikhail S

    2014-01-01

    Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: (1) processing large data arrays (including processing images and signals in real time), and (2) simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...

  5. Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

    Directory of Open Access Journals (Sweden)

    Jinwei Wang

    2014-01-01

    Full Text Available The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine-grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

  6. A Parallel Computational Model for Multichannel Phase Unwrapping Problem

    Science.gov (United States)

    Imperatore, Pasquale; Pepe, Antonio; Lanari, Riccardo

    2015-05-01

    In this paper, a parallel model for the solution of the computationally intensive multichannel phase unwrapping (MCh-PhU) problem is proposed. Firstly, the Extended Minimum Cost Flow (EMCF) algorithm for solving the MCh-PhU problem is revised within the rigorous mathematical framework of discrete calculus, thus permitting its topological structure to be captured in terms of meaningful discrete differential operators. Secondly, emphasis is placed on those methodological and practical aspects which lead to a parallel reformulation of the EMCF algorithm. Thus, a novel dual-level parallel computational model, in which the parallelism is hierarchically implemented at two different (i.e., process and thread) levels, is presented. The validity of our approach has been demonstrated through a series of experiments that have revealed a significant speedup. Therefore, the attained high-performance prototype is suitable for the solution of large-scale phase unwrapping problems in reasonable time frames, with a significant impact on the systematic exploitation of the existing, and rapidly growing, large archives of SAR data.

  7. Parallel-Processing Test Bed For Simulation Software

    Science.gov (United States)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  8. The study of image processing of parallel digital signal processor

    International Nuclear Information System (INIS)

    Liu Jie

    2000-01-01

    The author analyzes the basic characteristics of the parallel DSP (digital signal processor) TMS320C80 and proposes related optimized image algorithms and a parallel processing method based on the parallel DSP. Real-time performance for many image processing tasks can be achieved in this way.

  9. Tuning of tool dynamics for increased stability of parallel (simultaneous) turning processes

    Science.gov (United States)

    Ozturk, E.; Comak, A.; Budak, E.

    2016-01-01

    Parallel (simultaneous) turning operations make use of more than one cutting tool acting on a common workpiece, offering potential for higher productivity. However, dynamic interaction between the tools and workpiece and the resulting chatter vibrations may create quality problems on machined surfaces. In order to determine chatter-free cutting process parameters, stability models can be employed. In this paper, the stability of parallel turning processes is formulated in the frequency and time domains for two different parallel turning cases. Predictions of the frequency and time domain methods demonstrated reasonable agreement with each other. In addition, the predicted stability limits are also verified experimentally. Simulation and experimental results show multi-regional stability diagrams which can be used to select the most favorable set of process parameters for higher stable material removal rates. In addition to parameter selection, the developed models can be used to determine the best natural frequency ratio of the tools, resulting in the highest stable depth of cut. It is concluded that the most stable operations are obtained when the natural frequencies of the tools are slightly off each other, and the worst stability occurs when the natural frequencies of the tools are exactly the same.

  10. Parallel algorithms for interactive manipulation of digital terrain models

    Science.gov (United States)

    Davis, E. W.; Mcallister, D. F.; Nagaraj, V.

    1988-01-01

    Interactive three-dimensional graphics applications, such as terrain data representation and manipulation, require extensive arithmetic processing. Massively parallel machines are attractive for this application since they offer high computational rates, and grid-connected architectures provide a natural mapping for grid-based terrain models. Presented here are algorithms for data movement on the Massively Parallel Processor (MPP) in support of pan and zoom functions over large data grids. It is an extension of earlier work that demonstrated real-time performance of graphics functions on grids that were equal in size to the physical dimensions of the MPP. When the dimensions of a data grid exceed the processing array size, data is packed in the array memory. Windows of the total data grid are interactively selected for processing. Movement of packed data is needed to distribute items across the array for efficient parallel processing. Execution time for data movement was found to exceed that for the arithmetic aspects of the graphics functions. Performance figures are given for routines written in MPP Pascal.

  11. A multitransputer parallel processing system (MTPPS)

    International Nuclear Information System (INIS)

    Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

    1993-01-01

    This report describes the design and implementation of a 16-node Multi Transputer Parallel Processing System (MTPPS), which is a platform for parallel program development. It is a MIMD machine based on the message passing paradigm. The basic compute engine is an Inmos transputer IMS T800-20. A transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high-speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software-controlled network configuration. The system is modularly expandable, and more NODES can be added to achieve the required processing power. The system is a backend to an IBM PC, which has been integrated into the system to provide the user I/O interface. PC resources are available to the programmer. The interface hardware between the PC and the network of transputers is INMOS compatible; therefore, all commercially available development software compatible with INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD architectures, the transputer architecture and parallel processing software development issues. A LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problems are discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes

  12. Parallel workflow tools to facilitate human brain MRI post-processing

    Directory of Open Access Journals (Sweden)

    Zaixu eCui

    2015-05-01

    Full Text Available Multi-modal magnetic resonance imaging (MRI) techniques are widely applied in human brain studies. To obtain specific brain measures of interest from MRI datasets, a number of complex image post-processing steps are typically required. Parallel workflow tools have recently been developed, concatenating individual processing steps and enabling fully automated processing of raw MRI data to obtain the final results. These workflow tools are also designed to make optimal use of available computational resources and to support the parallel processing of different subjects or of independent processing steps for a single subject. Automated, parallel MRI post-processing tools can greatly facilitate relevant brain investigations and are being increasingly applied. In this review, we briefly summarize these parallel workflow tools and discuss relevant issues.
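    The pattern these workflow tools implement, independent subjects processed in parallel while the steps for one subject stay in order, can be sketched as follows; the step names are placeholders, not any particular neuroimaging tool.

```python
from concurrent.futures import ProcessPoolExecutor

def skull_strip(subject):            # placeholder steps standing in for real tools
    return subject + ":stripped"

def normalise(intermediate):
    return intermediate + ":normalised"

def segment(intermediate):
    return intermediate + ":segmented"

def process_subject(subject):
    # The steps for one subject form a serial pipeline ...
    return segment(normalise(skull_strip(subject)))

if __name__ == "__main__":
    subjects = ["sub-%02d" % i for i in range(1, 21)]
    # ... but different subjects are independent, so they run in parallel.
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(process_subject, subjects))
    print(results[:2])
```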

  13. Methods to model-check parallel systems software

    International Nuclear Information System (INIS)

    Matlin, O. S.; McCune, W.; Lusk, E.

    2003-01-01

    We report on an effort to develop methodologies for formal verification of parts of the Multi-Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of communicating processes. While the individual components of the collection execute simple algorithms, their interaction leads to unexpected errors that are difficult to uncover by conventional means. Two verification approaches are discussed here: the standard model checking approach using the software model checker SPIN and the nonstandard use of a general-purpose first-order resolution-style theorem prover OTTER to conduct the traditional state space exploration. We compare modeling methodology and analyze performance and scalability of the two methods with respect to verification of MPD

  14. New Parallel Algorithms for Landscape Evolution Model

    Science.gov (United States)

    Jin, Y.; Zhang, H.; Shi, Y.

    2017-12-01

    Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transport of surface sediments. This numerical approach is difficult to parallelize because of the computation of the drainage area for each node, which needs a huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of the grid with traditional methods and applies an efficient global reduction algorithm to compute the drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which first partitions the nodes in catchments between processes and then partitions the cells according to the partition of the nodes. Both methods focus on decreasing communication between processes and take advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large-scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.

  15. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  16. Parallel Boltzmann machines : a mathematical model

    NARCIS (Netherlands)

    Zwietering, P.J.; Aarts, E.H.L.

    1991-01-01

    A mathematical model is presented for the description of parallel Boltzmann machines. The framework is based on the theory of Markov chains and combines a number of previously known results into one generic model. It is argued that parallel Boltzmann machines maximize a function consisting of a

  17. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Barnett, D.A.

    1999-01-01

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
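    The record does not reproduce the form of its overhead model; as a generic illustration of what such a parallel performance model looks like (an Amdahl-type form, not necessarily the model derived for TORT), the wall-clock time on p processors can be written as serial work, divided work, and an overhead term:

```latex
T_p = f\,T_1 + \frac{(1-f)\,T_1}{p} + T_{\mathrm{ovh}}(p),
\qquad
S(p) = \frac{T_1}{T_p}
```

    Here T_1 is the single-processor time, f the serial fraction, and T_ovh(p) collects synchronization and communication costs; comparing the predicted speedup S(p) with measured speedups is the kind of model-versus-measurement comparison described above.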

  18. Multitasking TORT under UNICOS: Parallel performance models and measurements

    International Nuclear Information System (INIS)

    Barnett, A.; Azmy, Y.Y.

    1999-01-01

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  19. Parallel Distributed Processing theory in the age of deep networks

    OpenAIRE

    Bowers, Jeffrey

    2017-01-01

    Parallel Distributed Processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely, that all knowledge is coded in a distributed format, and cognition is mediated by non-symbolic computations. These claims have long been debated within cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks le...

  20. Examination of Speed Contribution of Parallelization for Several Fingerprint Pre-Processing Algorithms

    Directory of Open Access Journals (Sweden)

    GORGUNOGLU, S.

    2014-05-01

    Full Text Available In the analysis of minutiae-based fingerprint systems, fingerprints need to be pre-processed. The pre-processing is carried out to enhance the quality of the fingerprint and to obtain more accurate minutiae points. Reducing the pre-processing time is important for identification and verification in real-time systems and especially for databases holding large amounts of fingerprint information. Parallel processing and parallel CPU computing can be considered as the distribution of processes over a multi-core processor. This is done by using parallel programming techniques. Reducing the execution time is the main objective in parallel processing. In this study, the pre-processing of a minutiae-based fingerprint system is implemented by parallel processing on multi-core computers using OpenMP and on a graphics processor using CUDA to improve the execution time. The execution times and speedup ratios are compared with those of a single-core processor. The results show that by using parallel processing, the execution time is substantially improved. The improvement ratios obtained for different pre-processing algorithms allowed us to make suggestions on the more suitable approaches for parallelization.

  1. HPC parallel programming model for gyrokinetic MHD simulation

    International Nuclear Information System (INIS)

    Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

    2011-01-01

    The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over a large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N_DD-r × N_DD-z (the number of radial decompositions times the number of axial decompositions) small domains, each including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N_RP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by auto-parallelization, and N_DD-r × N_DD-z × N_RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for a medium-size system of N_r × N_θ × N_z = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for a fixed number of logical cores was obtained for two threads, the maximum number of N_DD-z, and the optimum combination of N_DD-r and N_RP. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)

  2. Parallel Task Processing on a Multicore Platform in a PC-based Control System for Parallel Kinematics

    Directory of Open Access Journals (Sweden)

    Harald Michalik

    2009-02-01

    Full Text Available Multicore platforms are those that have one physical processor chip with multiple cores interconnected via a chip-level bus. Because they deliver greater computing power through concurrency and offer greater system density, multicore platforms provide the best qualifications to address the performance bottleneck encountered in PC-based control systems for parallel kinematic robots with heavy CPU load. Heavy-load control tasks are generated by new control approaches that include features like singularity prediction, structure control algorithms, vision data integration and similar tasks. In this paper we introduce the parallel task scheduling extension of a communication architecture specially tailored for the development of PC-based control of parallel kinematics. The scheduling is specially designed for processing on a multicore platform. It breaks down the serial task processing of the robot control cycle and extends it with parallel task processing paths in order to enhance the overall control performance.

  3. Parallel processing of neutron transport in fuel assembly calculation

    International Nuclear Information System (INIS)

    Song, Jae Seung

    1992-02-01

    Group constants, which are used for reactor analyses by the nodal method, are generated by fuel assembly calculations based on neutron transport theory, since one fuel assembly, or a quarter of one, corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used on a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes to the serial program were enough to parallelize the depletion calculation, which has inherent parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not simple and easy because of the many coupling parameters in the calculation and the data communications among CPUs. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on a Quadputer with 4 CPUs operating in the NURAD Lab. at KAIST. Efficiencies of 54.3 % and 78.0 % were obtained in the fuel assembly calculation and the depletion calculation, respectively, which leads to an overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on a parallel computer with small-size CPUs.

  4. Parallel processing for artificial intelligence 2

    CERN Document Server

    Kumar, V; Suttner, CB

    1994-01-01

    With the increasing availability of parallel machines and the rising interest in large-scale and real-world applications, research on parallel processing for Artificial Intelligence (AI) is gaining greater importance in the computer science environment. Many applications have been implemented and delivered but the field is still considered to be in its infancy. This book assembles diverse aspects of research in the area, providing an overview of the current state of technology. It also aims to promote further growth across the discipline. Contributions have been grouped according to their

  5. High-Performance Psychometrics: The Parallel-E Parallel-M Algorithm for Generalized Latent Variable Models. Research Report. ETS RR-16-34

    Science.gov (United States)

    von Davier, Matthias

    2016-01-01

    This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
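    A minimal sketch of the "parallel-E" idea, assuming a two-component Gaussian mixture with unit variances rather than the report's latent variable models: the E step is computed over data chunks in parallel and the sufficient statistics are reduced before a (serial, here) M step.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def e_step_chunk(args):
    """E step on one data chunk: responsibilities and sufficient statistics."""
    x, mu, pi = args
    d0 = (1.0 - pi) * np.exp(-0.5 * (x - mu[0]) ** 2)
    d1 = pi * np.exp(-0.5 * (x - mu[1]) ** 2)
    r = d1 / (d0 + d1)                       # responsibility of component 1
    return r.sum(), (r * x).sum(), ((1.0 - r) * x).sum(), x.size

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 1.0, 5000), rng.normal(3.0, 1.0, 5000)])
    chunks = np.array_split(x, 4)
    mu, pi = np.array([-1.0, 1.0]), 0.5
    with ProcessPoolExecutor(max_workers=4) as executor:
        for _ in range(30):
            # Parallel E step over chunks ...
            stats = list(executor.map(e_step_chunk, [(c, mu, pi) for c in chunks]))
            n1 = sum(s[0] for s in stats)
            sx1 = sum(s[1] for s in stats)
            sx0 = sum(s[2] for s in stats)
            n = sum(s[3] for s in stats)
            # ... then the M step on the reduced statistics.
            mu = np.array([sx0 / (n - n1), sx1 / n1])
            pi = n1 / n
    print(mu, pi)   # roughly [-2, 3] and 0.5
```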

  6. PDDP, A Data Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Karen H. Warren

    1996-01-01

    Full Text Available PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

  7. Models of parallel computation :a survey and classification

    Institute of Scientific and Technical Information of China (English)

    ZHANG Yunquan; CHEN Guoliang; SUN Guangzhong; MIAO Qiankun

    2007-01-01

    In this paper, the state-of-the-art parallel computational model research is reviewed. We will introduce various models that were developed during the past decades. According to their targeting architecture features, especially memory organization, we classify these parallel computational models into three generations. These models and their characteristics are discussed based on this three-generation classification. We believe that with the ever increasing speed gap between the CPU and memory systems, incorporating non-uniform memory hierarchy into computational models will become unavoidable. With the emergence of multi-core CPUs, the parallelism hierarchy of current computing platforms becomes more and more complicated. Describing this complicated parallelism hierarchy in future computational models becomes more and more important. A semi-automatic toolkit that can extract model parameters and their values on real computers can reduce the model analysis complexity, thus allowing more complicated models with more parameters to be adopted. Hierarchical memory and hierarchical parallelism will be two very important features that should be considered in future model design and research.

  8. LMFAO! Humor as a Response to Fear: Decomposing Fear Control within the Extended Parallel Process Model

    Science.gov (United States)

    Abril, Eulàlia P.; Szczypka, Glen; Emery, Sherry L.

    2017-01-01

    This study seeks to analyze fear control responses to the 2012 Tips from Former Smokers campaign using the Extended Parallel Process Model (EPPM). The goal is to examine the occurrence of ancillary fear control responses, like humor. In order to explore individuals’ responses in an organic setting, we use Twitter data—tweets—collected via the Firehose. Content analysis of relevant fear control tweets (N = 14,281) validated the existence of boomerang responses within the EPPM: denial, defensive avoidance, and reactance. More importantly, results showed that humor tweets were not only a significant occurrence but constituted the majority of fear control responses. PMID:29527092

  9. Co-simulation of dynamic systems in parallel and serial model configurations

    International Nuclear Information System (INIS)

    Sweafford, Trevor; Yoon, Hwan Sik

    2013-01-01

    Recent advancements in simulation software and computation hardware make it feasible to simulate complex dynamic systems comprised of multiple submodels developed in different modeling languages. The so-called co-simulation enables one to study various aspects of a complex dynamic system with heterogeneous submodels in a cost-effective manner. Among several different model configurations for co-simulation, the synchronized parallel configuration is regarded to expedite the simulation process by simulating multiple submodels concurrently on a multi-core processor. In this paper, computational accuracy as well as computation time are studied for three different co-simulation frameworks: integrated, serial, and parallel. For this purpose, analytical evaluations of the three different methods are made using the explicit Euler method, and then they are applied to two-DOF mass-spring systems. The results show that while the parallel simulation configuration produces the same accurate results as the integrated configuration, the results of the serial configuration show a slight deviation. It is also shown that the computation time can be reduced by running the simulation in the parallel configuration. Therefore, it can be concluded that the synchronized parallel simulation methodology is the best for both simulation accuracy and time efficiency.
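    The three configurations can be sketched for a two-DOF mass-spring chain advanced with explicit Euler; the parameter values and the once-per-step coupling exchange below are assumptions for illustration, not the paper's setup.

```python
# Integrated: both sub-systems advanced in one monolithic explicit Euler loop.
# Co-simulation: each sub-model owns its state; the coupling displacement is
# exchanged once per step, either from the previous step (parallel / Jacobi)
# or freshly updated (serial / Gauss-Seidel).
m1 = m2 = 1.0
k1 = k2 = 10.0
dt, steps = 1.0e-3, 5000

def integrated():
    x1, v1, x2, v2 = 1.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        a1 = (-k1 * x1 - k2 * (x1 - x2)) / m1
        a2 = (-k2 * (x2 - x1)) / m2
        x1, v1 = x1 + dt * v1, v1 + dt * a1
        x2, v2 = x2 + dt * v2, v2 + dt * a2
    return x1, x2

def cosim(parallel=True):
    x1, v1, x2, v2 = 1.0, 0.0, 0.0, 0.0
    for _ in range(steps):
        x1_prev = x1
        a1 = (-k1 * x1 - k2 * (x1 - x2)) / m1     # sub-model 1 sees the old x2
        x1, v1 = x1 + dt * v1, v1 + dt * a1
        x1_seen = x1_prev if parallel else x1     # what sub-model 2 receives
        a2 = (-k2 * (x2 - x1_seen)) / m2
        x2, v2 = x2 + dt * v2, v2 + dt * a2
    return x1, x2

print(integrated())            # reference
print(cosim(parallel=True))    # matches the integrated result in this sketch
print(cosim(parallel=False))   # slight deviation from exchanging the fresh x1
```

    With this simple exchange the parallel (Jacobi) configuration coincides with the monolithic update, while the serial (Gauss-Seidel) configuration feeds sub-model 2 a newer displacement, which is consistent with the small deviation of the serial results reported above.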

  10. Iteration schemes for parallelizing models of superconductivity

    Energy Technology Data Exchange (ETDEWEB)

    Gray, P.A. [Michigan State Univ., East Lansing, MI (United States)

    1996-12-31

    The time dependent Lawrence-Doniach model, valid for high fields and high values of the Ginzburg-Landau parameter, is often used for studying vortex dynamics in layered high-T_c superconductors. When solving these equations numerically, the added degrees of complexity due to the coupling and nonlinearity of the model often warrant the use of high-performance computers for their solution. However, the interdependence between the layers can be manipulated so as to allow parallelization of the computations at an individual layer level. The reduced parallel tasks may then be solved independently using a heterogeneous cluster of networked workstations connected together with Parallel Virtual Machine (PVM) software. Here, this parallelization of the model is discussed and several computational implementations of varying degrees of parallelism are presented. Computational results are also given which contrast properties of convergence speed, stability, and consistency of these implementations. Included in these results are models involving the motion of vortices due to an applied current and pinning effects due to various material properties.

  11. The Acoustic and Perceptual Effects of Series and Parallel Processing

    Directory of Open Access Journals (Sweden)

    Melinda C. Anderson

    2009-01-01

    Full Text Available Temporal envelope (TE) cues provide a great deal of speech information. This paper explores how spectral subtraction and dynamic-range compression gain modifications affect TE fluctuations for parallel and series configurations. In parallel processing, algorithms compute gains based on the same input signal, and the gains in dB are summed. In series processing, output from the first algorithm forms the input to the second algorithm. Acoustic measurements show that the parallel arrangement produces more gain fluctuations, introducing more changes to the TE than the series configurations. Intelligibility tests for normal-hearing (NH) and hearing-impaired (HI) listeners show (1) parallel processing gives significantly poorer speech understanding than an unprocessed (UNP) signal and the series arrangement and (2) series processing and UNP yield similar results. Speech quality tests show that UNP is preferred to both parallel and series arrangements, although spectral subtraction is the most preferred. No significant differences exist in sound quality between the series and parallel arrangements, or between the NH group and the HI group. These results indicate that gain modifications affect intelligibility and sound quality differently. Listeners appear to have a higher tolerance for gain modifications with regard to intelligibility, while judgments for sound quality appear to be more affected by smaller amounts of gain modification.
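    The structural difference between the two arrangements can be sketched with toy gain rules; the level-dependent rules and parameters below are invented for illustration and are not the study's hearing-aid algorithms.

```python
import numpy as np

def noise_reduction_gain_db(level_db):
    # Toy spectral-subtraction-like rule: attenuate levels below 65 dB.
    return -np.clip(65.0 - level_db, 0.0, 15.0)

def compression_gain_db(level_db, threshold_db=50.0, ratio=3.0):
    # Toy dynamic-range compression above a 50 dB threshold.
    return -np.maximum(level_db - threshold_db, 0.0) * (1.0 - 1.0 / ratio)

level_db = 40.0 + 40.0 * np.abs(np.sin(np.linspace(0.0, 2.0 * np.pi, 200)))

# Parallel: both algorithms compute gains from the SAME input envelope,
# and the gains in dB are summed.
g_parallel = noise_reduction_gain_db(level_db) + compression_gain_db(level_db)

# Series: the compressor acts on the already noise-reduced level.
g_nr = noise_reduction_gain_db(level_db)
g_series = g_nr + compression_gain_db(level_db + g_nr)

print(np.max(np.abs(g_parallel - g_series)))   # the two gain tracks differ
```

    The record's acoustic measurements concern how much these gain tracks fluctuate over time; the sketch only shows how the two signal paths are wired.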

  12. Smoldyn on graphics processing units: massively parallel Brownian dynamics simulations.

    Science.gov (United States)

    Dematté, Lorenzo

    2012-01-01

    Space is a very important aspect in the simulation of biochemical systems; recently, the need for simulation algorithms able to cope with space is becoming more and more compelling. Complex and detailed models of biochemical systems need to deal with the movement of single molecules and particles, taking into consideration localized fluctuations, transportation phenomena, and diffusion. A common drawback of spatial models lies in their complexity: models can become very large, and their simulation could be time consuming, especially if we want to capture the system's behavior in a reliable way using stochastic methods in conjunction with a high spatial resolution. In order to deliver on the promise made by systems biology to understand a system as a whole, we need to scale up the size of the models we are able to simulate, moving from sequential to parallel simulation algorithms. In this paper, we analyze Smoldyn, a widely diffused algorithm for stochastic simulation of chemical reactions with spatial resolution and single-molecule detail, and we propose an alternative, innovative implementation that exploits the parallelism of Graphics Processing Units (GPUs). The implementation executes the most computationally demanding steps (computation of diffusion, unimolecular, and bimolecular reactions, as well as the most common cases of molecule-surface interaction) on the GPU, computing them in parallel for each molecule of the system. The implementation offers good speed-ups and real-time, high-quality graphics output.
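    The per-molecule work that the GPU version distributes across threads can be sketched in data-parallel array form rather than CUDA; the molecule count, rate constant and the single decay reaction are placeholders, not Smoldyn's model set.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dt, D, k_decay = 100_000, 1.0e-3, 1.0, 0.5   # molecules, step, diffusion, 1/s

pos = rng.uniform(0.0, 10.0, size=(n, 3))        # one row per molecule
alive = np.ones(n, dtype=bool)

for _ in range(100):
    # Diffusion: an independent Gaussian displacement per molecule; on a GPU
    # this is the per-thread work, here it is one vectorized array operation.
    pos[alive] += rng.normal(0.0, np.sqrt(2.0 * D * dt), size=(int(alive.sum()), 3))
    # Unimolecular decay: each molecule survives the step with probability 1 - k*dt.
    alive &= rng.random(n) >= k_decay * dt

print(alive.mean())   # survival fraction, close to exp(-0.5 * 0.1)
```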

  13. Optimisation of a parallel ocean general circulation model

    OpenAIRE

    M. I. Beare; D. P. Stevens

    1997-01-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...

  14. Test generation for digital circuits using parallel processing

    Science.gov (United States)

    Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.

    1990-12-01

    The problem of test generation for digital logic circuits is NP-hard. Recently, the availability of low-cost, high-performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.

  15. Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

    DEFF Research Database (Denmark)

    Madsen, Kasper Grud Skat; Zhou, Yongluan; Cao, Jianneng

    2017-01-01

    Load balancing, operator instance collocation and horizontal scaling are critical issues in Parallel Stream Processing Engines for achieving low data processing latency, optimized cluster utilization and minimized communication cost, respectively. In previous work, these issues are typically tackled...... solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches....

  16. Parallel Algorithm of Geometrical Hashing Based on NumPy Package and Processes Pool

    Directory of Open Access Journals (Sweden)

    Klyachin Vladimir Aleksandrovich

    2015-10-01

    Full Text Available The article considers the problem of multi-dimensional geometric hashing. The paper describes a mathematical model of geometric hashing and considers an example of its use in point localization problems. A method of constructing the corresponding hash matrix by a parallel algorithm is considered. In this paper, an algorithm for parallel geometric hashing using the 'process pool' development pattern is proposed. The implementation of the algorithm uses the Python programming language and the NumPy package for manipulating multidimensional data. To implement the process pool, it is proposed to use the ProcessPoolExecutor class imported from the concurrent.futures module, which has been included in the Python distribution since version 3.2. All the solutions are presented in the paper by corresponding UML class diagrams. The designed GeomNash package includes the classes Data, Result, GeomHash, and Job. The results of the developed program are presented in the corresponding graphs. The article also presents the theoretical justification for applying a process pool to the implementation of parallel algorithms. The condition t2 > (p/(p-1))*t1 for the appropriateness of the process pool is obtained, where t1 is the time to transmit a unit of data between processes, t2 is the time to process a unit of data on one processor, and p is the number of processes.
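    A minimal sketch of the pattern the article describes, a hash matrix built in parallel with ProcessPoolExecutor and then queried for point localization; the grid-cell hash, the chunk count and the class-free layout are assumptions, not the GeomNash code.

```python
import numpy as np
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor

CELL = 0.1   # hash-cell size

def hash_chunk(chunk):
    """Hash one chunk of (x, y, index) rows into {grid cell: [point indices]}."""
    buckets = defaultdict(list)
    cells = np.floor(chunk[:, :2] / CELL).astype(int)
    for (cx, cy), idx in zip(cells, chunk[:, 2].astype(int)):
        buckets[(cx, cy)].append(int(idx))
    return dict(buckets)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((200_000, 2))
    pts = np.column_stack([pts, np.arange(len(pts))])     # carry the global index
    with ProcessPoolExecutor(max_workers=4) as executor:
        partials = executor.map(hash_chunk, np.array_split(pts, 8))
    table = defaultdict(list)
    for part in partials:                                 # merge partial hash tables
        for cell, idxs in part.items():
            table[cell].extend(idxs)
    query = np.array([0.37, 0.62])                        # point localization lookup
    print(len(table[tuple(np.floor(query / CELL).astype(int))]))
```

    In line with the t2 > (p/(p-1))*t1 condition above, farming out chunks pays off only when hashing a chunk costs clearly more than shipping its data to a worker process.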

  17. A parallelized three-dimensional cellular automaton model for grain growth during additive manufacturing

    Science.gov (United States)

    Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.

    2018-05-01

    In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.

  18. A parallelized three-dimensional cellular automaton model for grain growth during additive manufacturing

    Science.gov (United States)

    Lian, Yanping; Lin, Stephen; Yan, Wentao; Liu, Wing Kam; Wagner, Gregory J.

    2018-01-01

    In this paper, a parallelized 3D cellular automaton computational model is developed to predict grain morphology for solidification of metal during the additive manufacturing process. Solidification phenomena are characterized by highly localized events, such as the nucleation and growth of multiple grains. As a result, parallelization requires careful treatment of load balancing between processors as well as interprocess communication in order to maintain a high parallel efficiency. We give a detailed summary of the formulation of the model, as well as a description of the communication strategies implemented to ensure parallel efficiency. Scaling tests on a representative problem with about half a billion cells demonstrate parallel efficiency of more than 80% on 8 processors and around 50% on 64; loss of efficiency is attributable to load imbalance due to near-surface grain nucleation in this test problem. The model is further demonstrated through an additive manufacturing simulation with resulting grain structures showing reasonable agreement with those observed in experiments.
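
    The scaling figures quoted above follow from the standard definitions of speedup and parallel efficiency; a minimal check of those definitions, using made-up placeholder timings rather than measurements from the cited paper:

      # Speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p, applied to
      # hypothetical wall-clock timings in seconds (placeholders only).
      timings = {1: 1000.0, 8: 150.0, 64: 31.0}
      t1 = timings[1]
      for p, tp in sorted(timings.items()):
          speedup = t1 / tp
          efficiency = speedup / p
          print(f"p={p:3d}  speedup={speedup:6.1f}  efficiency={efficiency:5.1%}")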

  19. Fast robot kinematics modeling by using a parallel simulator (PSIM)

    International Nuclear Information System (INIS)

    El-Gazzar, H.M.; Ayad, N.M.A.

    2002-01-01

    High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and the safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure such as the kinematics modeling of robot manipulators. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of six-DOF (degree of freedom) robot arms (Stanford Arm). Execution time comparisons between Von Neumann (uniprocessor) and parallel processor architectures, obtained using the parallel simulator package (PSIM), are presented. The results obtained favour the parallel techniques, showing improvements of at least fifty percent. Further studies are needed to determine the convenient and optimum number of processors.

  20. Fast robot kinematics modeling by using a parallel simulator (PSIM)

    Energy Technology Data Exchange (ETDEWEB)

    El-Gazzar, H M; Ayad, N M.A. [Atomic Energy Authority, Reactor Dept., Computer and Control Lab., P.O. Box no 13759 (Egypt)

    2002-09-15

    High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and the safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure such as the kinematics modeling of robot manipulators. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods to solve the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of six-DOF (degree of freedom) robot arms (Stanford Arm). Execution time comparisons between Von Neumann (uniprocessor) and parallel processor architectures, obtained using the parallel simulator package (PSIM), are presented. The results obtained favour the parallel techniques, showing improvements of at least fifty percent. Further studies are needed to determine the convenient and optimum number of processors.

  1. Fraud Detection in Credit Card Transactions; Using Parallel Processing of Anomalies in Big Data

    Directory of Open Access Journals (Sweden)

    Mohammad Reza Taghva

    2016-10-01

    Full Text Available In parallel with the increasing use of electronic cards, especially in the banking industry, the volume of transactions using these cards has grown rapidly. Moreover, the financial nature of these cards has made them an attractive target for fraud. The present study, using a MapReduce approach and parallel processing, applied the Kohonen neural network model to detect anomalies in bank card transactions. For this purpose, it was first proposed to classify all transactions as fraudulent or legitimate, which showed better performance compared with other methods. In the next step, we transformed the Kohonen model into a parallel task, which demonstrated appropriate performance in terms of time and, as expected, can be implemented well on transactions under Big Data assumptions.

  2. A Novel Least Significant Bit First Processing Parallel CRC Circuit

    Directory of Open Access Journals (Sweden)

    Xiujie Qu

    2013-01-01

    Full Text Available In the HDLC serial communication protocol, CRC calculation can process either the most or the least significant bit of the data first. Nowadays most CRC calculations are based on most significant bit (MSB) first processing. An algorithm for least significant bit (LSB) first processing of parallel CRC is proposed in this paper. Based on the general expression of the LSB-first serial CRC, and using the state-equation method of linear systems, we derive a recursive formula by mathematical deduction. The recursive formula is applicable to any number of bits processed in parallel and any generator polynomial. According to the formula, we present the parallel circuit for CRC calculation and implement it in VHDL on an FPGA. The results verify the accuracy and effectiveness of this method.
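
    A small software illustration, not the paper's VHDL circuit: an LSB-first (reflected) CRC-16 computed bit-serially and then eight bits at a time via a lookup table, showing that processing several bits per step gives the same result. The polynomial 0x8408 is the reflected form of the 0x1021 polynomial used by HDLC framing.

      POLY = 0x8408  # reflected 0x1021

      def crc16_bitwise(data, crc=0xFFFF):
          # Classic LSB-first serial computation: one bit of the register per step.
          for byte in data:
              crc ^= byte
              for _ in range(8):
                  crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
          return crc ^ 0xFFFF

      # Precomputed table: the effect of shifting one whole byte through the register.
      TABLE = []
      for value in range(256):
          reg = value
          for _ in range(8):
              reg = (reg >> 1) ^ POLY if reg & 1 else reg >> 1
          TABLE.append(reg)

      def crc16_bytewise(data, crc=0xFFFF):
          # Eight bits handled per step via the table, i.e. "in parallel".
          for byte in data:
              crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
          return crc ^ 0xFFFF

      msg = b"123456789"
      assert crc16_bitwise(msg) == crc16_bytewise(msg)
      print(hex(crc16_bytewise(msg)))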

  3. Evidence of Parallel Processing During Translation

    DEFF Research Database (Denmark)

    Balling, Laura Winther; Hvelplund, Kristian Tangsgaard; Sjørup, Annette Camilla

    2014-01-01

    conclude that translation is a parallel process and that literal translation is likely to be a universal initial default strategy in translation. This conclusion is strengthened by the fact that all three experiments were relatively naturalistic, due to the combination of remote eye tracking and mixed...

  4. Parallel Distributed Processing at 25: Further Explorations in the Microstructure of Cognition

    Science.gov (United States)

    Rogers, Timothy T.; McClelland, James L.

    2014-01-01

    This paper introduces a special issue of "Cognitive Science" initiated on the 25th anniversary of the publication of "Parallel Distributed Processing" (PDP), a two-volume work that introduced the use of neural network models as vehicles for understanding cognition. The collection surveys the core commitments of the PDP…

  5. Advanced optical signal processing of broadband parallel data signals

    DEFF Research Database (Denmark)

    Oxenløwe, Leif Katsuo; Hu, Hao; Kjøller, Niels-Kristian

    2016-01-01

    Optical signal processing may aid in reducing the number of active components in communication systems with many parallel channels, by e.g. using telescopic time lens arrangements to perform format conversion and allow for WDM regeneration.

  6. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2015-01-01

    Full Text Available The aim of this study is to present an approach to introducing pipeline and parallel computing, using a model of a multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. Modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is an essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to its comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform for a constructivist learning process, thus enabling learners' experimentation with the provided programming models, the acquisition of competences in modern scientific research and computational thinking, and the capture of the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The C programming language for developing programming models and the message passing interface (MPI) and OpenMP parallelization tools have been chosen for implementation.
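
    The cited course material uses C with MPI/OpenMP; the sketch below only illustrates the pipeline idea itself, using Python's multiprocessing so it stays self-contained. Two stages run as separate processes connected by queues, so stage 1 can work on item k+1 while stage 2 is still processing item k.

      from multiprocessing import Process, Queue

      SENTINEL = None

      def stage1(inp, out):
          for x in iter(inp.get, SENTINEL):
              out.put(x * x)          # first phase of the "multiphase" service
          out.put(SENTINEL)

      def stage2(inp, out):
          for x in iter(inp.get, SENTINEL):
              out.put(x + 1)          # second phase
          out.put(SENTINEL)

      if __name__ == "__main__":
          q0, q1, q2 = Queue(), Queue(), Queue()
          workers = [Process(target=stage1, args=(q0, q1)),
                     Process(target=stage2, args=(q1, q2))]
          for w in workers:
              w.start()
          for item in range(5):
              q0.put(item)
          q0.put(SENTINEL)
          print([y for y in iter(q2.get, SENTINEL)])   # [1, 2, 5, 10, 17]
          for w in workers:
              w.join()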

  7. Processing data communications events by awakening threads in parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2016-03-15

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
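
    A rough analogue of the pattern described above, written with ordinary Python threading rather than PAMI: a worker that finds no pending events for its context puts its thread into a wait state and is awakened when a new event is posted.

      import threading, queue, time

      events = queue.Queue()
      cond = threading.Condition()

      def advance(context):
          while True:
              with cond:
                  while events.empty():          # no actionable events pending
                      cond.wait()                # thread enters a wait state
              item = events.get()
              if item is None:
                  break
              print(f"context {context}: processing event {item}")

      def post(event):
          events.put(event)
          with cond:
              cond.notify()                      # awaken a waiting advance thread

      t = threading.Thread(target=advance, args=(0,))
      t.start()
      for e in ["put", "get", None]:
          time.sleep(0.1)
          post(e)
      t.join()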

  8. Parallel and distributed processing in power system simulation and control

    Energy Technology Data Exchange (ETDEWEB)

    Falcao, Djalma M [Universidade Federal, Rio de Janeiro, RJ (Brazil). Coordenacao dos Programas de Pos-graduacao de Engenharia

    1994-12-31

    Recent advances in computer technology will certainly have a great impact on the methodologies used in power system expansion and operational planning as well as in real-time control. Parallel and distributed processing are among the new technologies that present great potential for application in these areas. Parallel computers use multiple functional or processing units to speed up computation, while distributed processing computer systems are collections of computers joined together by high-speed communication networks, having many objectives and advantages. The paper presents some ideas for the use of parallel and distributed processing in power system simulation and control. It also comments on some of the current research work in these topics and presents a summary of the work presently being developed at COPPE. (author) 53 refs., 2 figs.

  9. Visual analysis of inter-process communication for large-scale parallel computing.

    Science.gov (United States)

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also the communication between the different processes, which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.

  10. cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

    Directory of Open Access Journals (Sweden)

    Adelino R. Ferreira da Silva

    2011-10-01

    Full Text Available Graphics processing units (GPUs) are rapidly gaining maturity as powerful general parallel computing devices. A key feature in the development of modern GPUs has been the advancement of the programming model and programming tools. Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on Nvidia many-core GPUs. In functional magnetic resonance imaging (fMRI), the volume of the data to be processed, and the type of statistical analysis to perform, call for high-performance computing strategies. In this work, we present the main features of the R-CUDA package cudaBayesreg which implements in CUDA the core of a Bayesian multilevel model for the analysis of brain fMRI data. The statistical model implements a Gibbs sampler for multilevel/hierarchical linear models with a normal prior. The main contribution to the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel. The R-CUDA implementation of the Bayesian model proposed here has been able to reduce significantly the run-time processing of the Markov chain Monte Carlo (MCMC) simulations used in Bayesian fMRI data analyses. Presently, cudaBayesreg is only configured for Linux systems with Nvidia CUDA support.
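
    The cited package runs a Gibbs sampler per voxel on the GPU; the sketch below only illustrates the embarrassingly parallel per-voxel structure, using an ordinary least-squares fit solved for all voxels at once with NumPy. The sizes and data are made up.

      import numpy as np

      rng = np.random.default_rng(1)
      T, V, K = 120, 5000, 3            # time points, voxels, regressors
      X = rng.standard_normal((T, K))   # common design matrix
      beta_true = rng.standard_normal((K, V))
      Y = X @ beta_true + 0.1 * rng.standard_normal((T, V))

      # Normal equations solved simultaneously for all V voxels: every column of
      # Y (one voxel's time series) gets its own regression coefficients.
      beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
      print("max abs error:", np.abs(beta_hat - beta_true).max())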

  11. Fast image processing on parallel hardware

    International Nuclear Information System (INIS)

    Bittner, U.

    1988-01-01

    Current digital imaging modalities in the medical field incorporate parallel hardware which is heavily used in the stage of image formation, like CT/MR image reconstruction or real-time subtraction in DSA. In order to make image post-processing as efficient as image acquisition, new software approaches have to be found which take full advantage of the parallel hardware architecture. This paper describes the implementation of a two-dimensional median filter which can serve as an example for the development of such an algorithm. The algorithm is analyzed by viewing it as a complete parallel sort of the k pixel values in the chosen window, which leads to a generalization to rank order operators and other closely related filters reported in the literature. A section about the theoretical basis of the algorithm gives hints on how to characterize operations suitable for implementation on pipeline processors and how to find the appropriate algorithms. Finally, some results concerning computation time and the usefulness of median filtering in radiographic imaging are given
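
    A compact 2-D median filter, shown only to make the "rank order operator" idea concrete: every output pixel is the middle value of the k pixels in its window, i.e. one particular rank of the sorted window contents. This is a plain NumPy sketch, not the pipeline-processor implementation the abstract describes.

      import numpy as np
      from numpy.lib.stride_tricks import sliding_window_view

      def median_filter(image, size=3):
          pad = size // 2
          padded = np.pad(image, pad, mode="edge")
          windows = sliding_window_view(padded, (size, size))
          return np.median(windows, axis=(-2, -1))

      img = np.array([[1, 2, 3],
                      [4, 100, 6],     # the outlier is removed by the filter
                      [7, 8, 9]], dtype=float)
      print(median_filter(img))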

  12. Construction of a digital elevation model: methods and parallelization

    International Nuclear Information System (INIS)

    Mazzoni, Christophe

    1995-01-01

    The aim of this work is to reduce the computation time needed to produce Digital Elevation Models (DEM) by using a parallel machine. It was carried out in collaboration between the French 'Institut Geographique National' (IGN) and the Laboratoire d'Electronique de Technologie et d'Instrumentation (LETI) of the French Atomic Energy Commission (CEA). The IGN has developed a system which provides DEMs that are used to produce topographic maps. The kernel of this system is the correlator, a software component which automatically matches pairs of homologous points in a stereo-pair of photographs. Nevertheless, the correlator is expensive in computing time. In order to reduce computation time and to produce DEMs with the same accuracy as the existing system, we have parallelized the IGN's correlator on the OPENVISION system. This hardware solution uses SYMPATI-2, a SIMD (Single Instruction Multiple Data) parallel machine developed by the LETI, which is involved in parallel architectures and image processing. Our analysis of the implementation has demonstrated the difficulty of efficient coupling between scalar and parallel structures, so we propose solutions to reinforce this coupling. In order to accelerate the processing further, we evaluate SYMPHONIE, a SIMD computer and successor of SYMPATI-2. On the other hand, we developed a multi-agent approach for which a MIMD (Multiple Instruction, Multiple Data) architecture is available. Finally, we describe a Multi-SIMD architecture that reconciles our two approaches. This architecture offers the capacity to handle multi-level image processing efficiently. It is flexible thanks to its modularity, and its communication network provides the reliability required by sensitive systems. (author) [fr

  13. Leveraging Parallel Data Processing Frameworks with Verified Lifting

    Directory of Open Access Journals (Sweden)

    Maaz Bin Safeer Ahmad

    2016-11-01

    Full Text Available Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting–tedious and error-prone–also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and scale better, as well, to larger datasets.

  14. Efficient Parallel Statistical Model Checking of Biochemical Networks

    Directory of Open Access Journals (Sweden)

    Paolo Ballarini

    2009-12-01

    Full Text Available We consider the problem of verifying stochastic models of biochemical networks against behavioral properties expressed in temporal logic terms. Exact probabilistic verification approaches such as, for example, CSL/PCTL model checking, are undermined by a huge computational demand which rules them out for most real case studies. Less demanding approaches, such as statistical model checking, estimate the likelihood that a property is satisfied by sampling executions out of the stochastic model. We propose a methodology for efficiently estimating the likelihood that an LTL property P holds for a stochastic model of a biochemical network. As with other statistical verification techniques, the methodology we propose uses a stochastic simulation algorithm for generating execution samples; however, there are three key aspects that improve the efficiency: first, the sample generation is driven by on-the-fly verification of P, which results in optimal overall simulation time. Second, the confidence interval estimation for the probability of P to hold is based on an efficient variant of the Wilson method which ensures a faster convergence. Third, the whole methodology is designed according to a parallel fashion and a prototype software tool has been implemented that performs the sampling/verification process in parallel over an HPC architecture.
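
    The Wilson score interval referred to above, for the probability that a property holds, estimated from k satisfying runs out of n sampled executions; the numbers in the usage line are placeholders:

      from math import sqrt

      def wilson_interval(k, n, z=1.96):
          """95% (z=1.96) Wilson confidence interval for a binomial proportion."""
          p_hat = k / n
          denom = 1 + z * z / n
          centre = (p_hat + z * z / (2 * n)) / denom
          half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
          return centre - half, centre + half

      print(wilson_interval(870, 1000))   # approximately (0.848, 0.889)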

  15. Shared Variable Oriented Parallel Precompiler for SPMD Model

    Institute of Scientific and Technical Information of China (English)

    1995-01-01

    At present, commercial parallel computer systems with distributed memory architectures are usually provided with parallel FORTRAN or parallel C compilers, which are just traditional sequential FORTRAN or C compilers extended with communication statements. Programmers suffer from having to write parallel programs with explicit communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model, greatly easing parallel programming while maintaining high communication efficiency. The core functions of the parallel C precompiler have been successfully verified on a transputer-based parallel computer. Its prominent performance shows that SVOPP is probably a breakthrough in parallel programming technique.

  16. Parallel processing approach to transform-based image coding

    Science.gov (United States)

    Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

    1991-06-01

    This paper describes a flexible parallel processing architecture designed for use in real-time video processing. The system consists of floating point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple bus architecture in combination with a dual-ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform-based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, and results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification are described.

  17. Decomposition based parallel processing technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2000-01-01

    In practical design studies, most designers solve multidisciplinary problems with a complex design structure. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for designers to reorder the original design processes to minimize total cost and time. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology

  18. The design of multi-core DSP parallel model based on message passing and multi-level pipeline

    Science.gov (United States)

    Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

    2017-10-01

    Currently, the design of embedded signal processing systems is often based on a specific application, but this approach is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model architecture based on a multi-core DSP platform is designed; it is mainly suitable for complex algorithms that are composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing, and draws on the advantages of the mainstream multi-core DSP models (the Master-Slave model and the Data Flow model), so that it achieves better performance. This paper uses a three-dimensional image generation algorithm to validate the efficiency of the proposed model by comparing it with the Master-Slave and the Data Flow models.

  19. Parallel-hierarchical processing and classification of laser beam profile images based on the GPU-oriented architecture

    Science.gov (United States)

    Yarovyi, Andrii A.; Timchenko, Leonid I.; Kozhemiako, Volodymyr P.; Kokriatskaia, Nataliya I.; Hamdi, Rami R.; Savchuk, Tamara O.; Kulyk, Oleksandr O.; Surtel, Wojciech; Amirgaliyev, Yedilkhan; Kashaganova, Gulzhan

    2017-08-01

    The paper deals with the problem of the insufficient performance of existing computing resources for large image processing, which does not meet the modern requirements posed by the resource-intensive computing tasks of laser beam profiling. The research concentrated on one of the profiling problems, namely, real-time processing of spot images of the laser beam profile. The development of a theory of parallel-hierarchical transformation made it possible to produce models for high-performance parallel-hierarchical processes, as well as algorithms and software for their implementation based on a GPU-oriented architecture using GPGPU technologies. The analysis of the performance of the suggested computerized tools for processing and classification of laser beam profile images shows that they allow real-time processing of dynamic images of various sizes.

  20. Optimisation of a parallel ocean general circulation model

    Science.gov (United States)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
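
    High-level message-passing routines of this kind typically wrap halo (boundary) exchanges between neighbouring subdomains. A minimal sketch of that idea for a 1-D, periodic domain decomposition, assuming mpi4py is available; the routines and layout below are illustrative and not taken from the cited model:

      # Run with e.g.: mpiexec -n 4 python halo.py
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      # Local subdomain with one ghost cell at each end; periodic neighbours.
      local = np.full(10 + 2, float(rank))
      left = (rank - 1) % size
      right = (rank + 1) % size

      # Send my last interior cell to the right neighbour, receive my left ghost cell.
      local[0] = comm.sendrecv(local[-2], dest=right, source=left)
      # Send my first interior cell to the left neighbour, receive my right ghost cell.
      local[-1] = comm.sendrecv(local[1], dest=left, source=right)
      print(f"rank {rank}: ghosts = ({local[0]}, {local[-1]})")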

  1. Parallel and distributed processing: applications to power systems

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Felix; Murphy, Liam [California Univ., Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

    1994-12-31

    Applications of parallel and distributed processing to power systems problems are still in the early stages. Rapid progress in computing and communications promises a revolutionary increase in the capacity of distributed processing systems. In this paper, the state of the art in distributed processing technology and applications is reviewed and future trends are discussed. (author) 14 refs., 1 tab.

  2. The parallel processing system for fast 3D-CT image reconstruction by circular shifting float memory architecture

    International Nuclear Information System (INIS)

    Wang Shi; Kang Kejun; Wang Jingjin

    1996-01-01

    Computerized Tomography (CT) is expected to become an indispensable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture, hardware and software design of PIRS-4 (4-processor Parallel Image Reconstruction System), which is a parallel processing system for fast 3D-CT image reconstruction based on a circular shifting float memory architecture. It includes the structure and components of the system, the design of the crossbar switch and details of the control model, the description of RPBP image reconstruction, the choice of OS (Operating System) and language, the principle of imitating EMS, direct memory read/write of floats and programming in protected mode. Finally, the test results are given

  3. Hardware system of parallel processing for fast CT image reconstruction based on circular shifting float memory architecture

    International Nuclear Information System (INIS)

    Wang Shi; Kang Kejun; Wang Jingjin

    1995-01-01

    Computerized Tomography (CT) is expected to become an indispensable diagnostic technique in the future. However, the long time required to reconstruct an image has been one of the major drawbacks associated with this technique. Parallel processing is one of the best ways to solve this problem. This paper gives the architecture and hardware design of PIRS-4 (4-processor Parallel Image Reconstruction System), which is a parallel processing system for fast 3D-CT image reconstruction based on a circular shifting float memory architecture. It includes the structure and components of the system, the design of the crossbar switch and details of the control model. The test results are described

  4. Parallel factor analysis PARAFAC of process affected water

    Energy Technology Data Exchange (ETDEWEB)

    Ewanchuk, A.M.; Ulrich, A.C.; Sego, D. [Alberta Univ., Edmonton, AB (Canada). Dept. of Civil and Environmental Engineering; Alostaz, M. [Thurber Engineering Ltd., Calgary, AB (Canada)

    2010-07-01

    A parallel factor analysis (PARAFAC) of oil sands process-affected water was presented. Naphthenic acids (NA) are traditionally described as monobasic carboxylic acids. Research has indicated that oil sands NA do not fit classical definitions of NA. Oil sands organic acids have toxic and corrosive properties. When analyzed by fluorescence technology, oil sands process-affected water displays a characteristic peak at 290 nm excitation and approximately 346 nm emission. In this study, a parallel factor analysis (PARAFAC) was used to decompose process-affected water multi-way data into components representing analytes, chemical compounds, and groups of compounds. Water samples from various oil sands operations were analyzed in order to obtain excitation-emission matrices (EEMs). The EEMs were then arranged into a large matrix in order of decreasing process-affected water content for PARAFAC. The data were divided into 5 components. A comparison with commercially prepared NA samples suggested that oil sands NA are fundamentally different. Further research is needed to determine what each of the 5 components represents. tabs., figs.

  5. Mechatronic Model Based Computed Torque Control of a Parallel Manipulator

    Directory of Open Access Journals (Sweden)

    Zhiyong Yang

    2008-11-01

    Full Text Available With their high speed and accuracy, parallel manipulators have wide application in industry, but there still exist many difficulties in the actual control process because of time-varying and coupled dynamics. Unfortunately, present-day commercial controllers cannot provide satisfying performance because they offer single-axis linear control only. Therefore, for a novel 2-DOF (Degree of Freedom) parallel manipulator called Diamond 600, a motor-mechanism coupling dynamic model based control scheme employing the computed torque control algorithm is presented in this paper. First, the integrated dynamic coupling model is deduced according to the equivalent torques between the mechanical structure and the PM (Permanent Magnet) servomotor. Second, the computed torque controller is described in detail for the proposed model. At last, a series of numerical simulations and experiments are carried out to test the effectiveness of the system, and the results verify the favourable tracking ability and robustness.

  6. Mechatronic Model Based Computed Torque Control of a Parallel Manipulator

    Directory of Open Access Journals (Sweden)

    Zhiyong Yang

    2008-03-01

    Full Text Available With their high speed and accuracy, parallel manipulators have wide application in industry, but there still exist many difficulties in the actual control process because of time-varying and coupled dynamics. Unfortunately, present-day commercial controllers cannot provide satisfying performance because they offer single-axis linear control only. Therefore, for a novel 2-DOF (Degree of Freedom) parallel manipulator called Diamond 600, a motor-mechanism coupling dynamic model based control scheme employing the computed torque control algorithm is presented in this paper. First, the integrated dynamic coupling model is deduced according to the equivalent torques between the mechanical structure and the PM (Permanent Magnet) servomotor. Second, the computed torque controller is described in detail for the proposed model. At last, a series of numerical simulations and experiments are carried out to test the effectiveness of the system, and the results verify the favourable tracking ability and robustness.
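
    Computed torque control cancels the modelled dynamics and imposes linear error dynamics. A generic sketch of the control law tau = M(q)(qdd_d + Kd*ev + Kp*ep) + C(q, qd) + g(q) follows; the dynamic terms and gains are simple placeholders, not the Diamond 600 coupling model:

      import numpy as np

      Kp = np.diag([100.0, 100.0])
      Kd = np.diag([20.0, 20.0])

      def M(q):                      # inertia matrix (placeholder)
          return np.diag([1.0, 1.0 + 0.1 * np.cos(q[1])])

      def C(q, qd):                  # Coriolis/centrifugal torques (placeholder)
          return np.array([0.05 * qd[1] ** 2, 0.0])

      def g(q):                      # gravity torques (placeholder)
          return np.array([0.0, 9.81 * 0.1 * np.sin(q[1])])

      def computed_torque(q, qd, q_des, qd_des, qdd_des):
          ep, ev = q_des - q, qd_des - qd
          return M(q) @ (qdd_des + Kd @ ev + Kp @ ep) + C(q, qd) + g(q)

      tau = computed_torque(np.zeros(2), np.zeros(2),
                            np.array([0.1, 0.2]), np.zeros(2), np.zeros(2))
      print(tau)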

  7. Parallel processing at the SSC: The fact and the fiction

    International Nuclear Information System (INIS)

    Bourianoff, G.; Cole, B.

    1991-10-01

    Accurately modelling the behavior of particles circulating in accelerators is a computationally demanding task. The particle tracking code currently in use at the SSC is based upon a ''thin element'' analysis (TEAPOT). In this model each magnet in the lattice is described by a thin element at which the particle experiences an impulsive kick. Each kick requires approximately 200 floating point operations (''FLOP''). For the SSC collider lattice, consisting of 10^4 elements, a tracking study for a set of 100 particles over 10^7 turns would require 2 x 10^15 FLOP. Even on a machine capable of 100 MFLOP/sec (MFLOPS), this would require 2 x 10^7 seconds, and many such runs are necessary. It should be noted that the accuracy with which the kicks are calculated is important: the large number of iterations involved will magnify the effects of small errors. The inability of current computational resources to perform the full calculation effectively motivates migrating this calculation to the most powerful computers available. A survey of current research into new supercomputing technologies reveals that the supercomputers of the future will be parallel in nature. Further, numerous such machines exist today, and are being used to solve other difficult problems. Thus it seems clear that it is not too early to begin developing tracking codes for parallel architectures. This report discusses implementing parallel processing on the SSC
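
    A toy thin-element tracking loop in the spirit of the description above, with each element applying an impulsive kick to (x, x') and particles propagated turn by turn; element counts, turn counts and kick strengths are made-up placeholders kept tiny enough to run quickly:

      import numpy as np

      def track(x, xp, n_elements=100, n_turns=1000, drift=0.01, k=0.001):
          for _ in range(n_turns):
              for _ in range(n_elements):
                  x = x + drift * xp          # drift between thin elements
                  xp = xp - k * x             # impulsive (linear) kick
          return x, xp

      # 100 particles with slightly different initial conditions; each particle is
      # independent, which is what makes the problem attractive for parallelism.
      x0 = np.linspace(-1e-3, 1e-3, 100)
      xp0 = np.zeros_like(x0)
      xf, xpf = track(x0, xp0)
      print(xf[:3])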

  8. Parallel processing based decomposition technique for efficient collaborative optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Sung Chan; Kim, Min Soo; Choi, Dong Hoon

    2001-01-01

    In practical design studies, most designers solve multidisciplinary problems with large and complex design systems. These multidisciplinary problems have hundreds of analyses and thousands of variables. The sequence of processes used to solve these problems affects the speed of the total design cycle. Thus it is very important for designers to reorder the original design processes to minimize total computational cost. This is accomplished by decomposing a large multidisciplinary problem into several MultiDisciplinary Analysis SubSystems (MDASS) and processing them in parallel. This paper proposes a new strategy for parallel decomposition of multidisciplinary problems to raise design efficiency by using a genetic algorithm and shows the relationship between decomposition and Multidisciplinary Design Optimization (MDO) methodology

  9. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    1997-10-01

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  10. Optimisation of a parallel ocean general circulation model

    Directory of Open Access Journals (Sweden)

    M. I. Beare

    Full Text Available This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  11. Parallel asynchronous systems and image processing algorithms

    Science.gov (United States)

    Coon, D. D.; Perera, A. G. U.

    1989-01-01

    A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse-coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.

  12. The role of parallelism in the real-time processing of anaphora.

    Science.gov (United States)

    Poirier, Josée; Walenski, Matthew; Shapiro, Lewis P

    2012-06-01

    Parallelism effects refer to the facilitated processing of a target structure when it follows a similar, parallel structure. In coordination, a parallelism-related conjunction triggers the expectation that a second conjunct with the same structure as the first conjunct should occur. It has been proposed that parallelism effects reflect the use of the first structure as a template that guides the processing of the second. In this study, we examined the role of parallelism in real-time anaphora resolution by charting activation patterns in coordinated constructions containing anaphora, Verb-Phrase Ellipsis (VPE) and Noun-Phrase Traces (NP-traces). Specifically, we hypothesised that an expectation of parallelism would incite the parser to assume a structure similar to the first conjunct in the second, anaphora-containing conjunct. The speculation of a similar structure would result in early postulation of covert anaphora. Experiment 1 confirms that following a parallelism-related conjunction, first-conjunct material is activated in the second conjunct. Experiment 2 reveals that an NP-trace in the second conjunct is posited immediately where licensed, which is earlier than previously reported in the literature. In light of our findings, we propose an intricate relation between structural expectations and anaphor resolution.

  13. Parallel eigenanalysis of finite element models in a completely connected architecture

    Science.gov (United States)

    Akl, F. A.; Morel, M. R.

    1989-01-01

    A parallel algorithm is presented for the solution of the generalized eigenproblem in linear elastic finite element analysis, (K)(phi) = (M)(phi)(omega), where (K) and (M) are of order N, and (omega) is of order q. The concurrent solution of the eigenproblem is based on the multifrontal/modified subspace method and is achieved in a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm was successfully implemented on a tightly coupled multiple-instruction multiple-data parallel processing machine, the Cray X-MP. A finite element model is divided into m domains, each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macrotasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effects of the number of domains, the number of degrees of freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. A parallel finite element dynamic analysis program, p-feda, is documented and the performance of its subroutines in a parallel environment is analyzed.
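
    The generalized eigenproblem (K)(phi) = (M)(phi)(omega) from the abstract, solved directly for a small random example, assuming SciPy is available; the cited work instead distributes a multifrontal/modified subspace solution over processors:

      import numpy as np
      from scipy.linalg import eigh

      rng = np.random.default_rng(0)
      N, q = 50, 5
      A = rng.standard_normal((N, N))
      K = A @ A.T + N * np.eye(N)        # symmetric positive definite "stiffness"
      B = rng.standard_normal((N, N))
      M = B @ B.T + N * np.eye(N)        # symmetric positive definite "mass"

      # All eigenpairs of K phi = omega M phi; keep only the lowest q, as a
      # subspace method would.
      omega, phi = eigh(K, M)
      omega_q, phi_q = omega[:q], phi[:, :q]
      print(omega_q)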

  14. ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

    Science.gov (United States)

    Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

    2018-04-27

    A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.

  15. A Hybrid Parallel Execution Model for Logic Based Requirement Specifications (Invited Paper

    Directory of Open Access Journals (Sweden)

    Jeffrey J. P. Tsai

    1999-05-01

    Full Text Available It is well known that undiscovered errors in a requirements specification are extremely expensive to fix when discovered in the software maintenance phase. Errors in the requirements phase can be reduced through the validation and verification of the requirements specification. Many logic-based requirements specification languages have been developed to achieve these goals. However, the execution and reasoning of a logic-based requirements specification can be very slow. An effective way to improve their performance is to execute and reason about the logic-based requirements specification in parallel. In this paper, we present a hybrid model to facilitate the parallel execution of a logic-based requirements specification language. A logic-based specification is first processed by a data dependency analysis technique which can find all the mode combinations that exist within a specification clause. This mode information is used to support a novel hybrid parallel execution model, which combines both top-down and bottom-up evaluation strategies. This new execution model can find the failure in the deepest node of the search tree at an early stage of the evaluation; thus it can reduce the total number of nodes searched in the tree, the total number of processes that need to be generated, and the total number of communication channels needed in the search process. A simulator has been implemented to analyze the execution behavior of the new model. Experiments show significant improvement based on several criteria.

  16. An extension of the extended parallel process model (EPPM) in television health news: the influence of health consciousness on individual message processing and acceptance.

    Science.gov (United States)

    Hong, Hyehyun

    2011-06-01

    The purpose of this study is to examine the role of health consciousness in processing TV news that contains potential health threats and preventive recommendations. Based on the extended parallel process model (Witte, 1992), relationships among health consciousness, perceived severity, perceived susceptibility, perceived response efficacy, perceived self-efficacy, and message acceptance/rejection were hypothesized. Responses collected from 175 participants after viewing four TV health news stories were analyzed using the bootstrapping analysis (Preacher & Hayes, 2008). Results confirmed three mediators (i.e., perceived severity, response efficacy, self-efficacy) in the influence of health consciousness on message acceptance. A negative association found between health consciousness and perceived susceptibility is discussed in relation to characteristics of health conscious individuals and optimistic bias of health risks.

  17. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Full Text Available Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware displays a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.

  18. Metastable states in the hierarchical Dyson model drive parallel processing in the hierarchical Hopfield network

    International Nuclear Information System (INIS)

    Agliari, Elena; Barra, Adriano; Guerra, Francesco; Galluzzi, Andrea; Tantari, Daniele; Tavani, Flavia

    2015-01-01

    In this paper, we introduce and investigate the statistical mechanics of hierarchical neural networks. First, we approach these systems à la Mattis, by thinking of the Dyson model as a single-pattern hierarchical neural network. We also discuss the stability of different retrievable states as predicted by the related self-consistencies obtained both from a mean-field bound and from a bound that bypasses the mean-field limitation. The latter is worked out by properly reabsorbing the magnetization fluctuations related to higher levels of the hierarchy into effective fields for the lower levels. Remarkably, mixing Amit's ansatz technique for selecting candidate-retrievable states with the interpolation procedure for solving for the free energy of these states, we prove that, due to gauge symmetry, the Dyson model accomplishes both serial and parallel processing. We extend this scenario to multiple stored patterns by implementing the Hebb prescription for learning within the couplings. This results in Hopfield-like networks constrained on a hierarchical topology, for which, by restricting to the low-storage regime where the number of patterns grows at most logarithmically with the number of neurons, we prove the existence of the thermodynamic limit for the free energy, and we give an explicit expression of its mean-field bound and of its related improved bound. The resulting self-consistencies for the Mattis magnetizations, which act as order parameters, are studied and the stability of solutions is analyzed to get a picture of the overall retrieval capabilities of the system according to both mean-field and non-mean-field scenarios. Our main finding is that embedding the Hebbian rule on a hierarchical topology allows the network to accomplish both serial and parallel processing. By tuning the level of fast noise affecting it or triggering the decay of the interactions with the distance among neurons, the system may switch from sequential retrieval to

  19. Three-dimensional parallel edge-based finite element modeling of electromagnetic data with field redatuming

    DEFF Research Database (Denmark)

    Cai, Hongzhu; Čuma, Martin; Zhdanov, Michael

    2015-01-01

    This paper presents a parallelized version of the edge-based finite element method with a novel post-processing approach for numerical modeling of an electromagnetic field in complex media. The method uses an unstructured tetrahedral mesh which can reduce the number of degrees of freedom significantly. The linear system of finite element equations is solved using parallel direct solvers which are robust for ill-conditioned systems and efficient for multiple-source electromagnetic (EM) modeling. We also introduce a novel approach to compute the scalar components of the electric field from the tangential components along each edge based on field redatuming. The method can produce a more accurate result as compared to the conventional approach. We have applied the developed algorithm to compute the EM response for a typical 3D anisotropic geoelectrical model of an off-shore HC reservoir with complex...

  20. Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

    Science.gov (United States)

    Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

    2015-01-01

    Technological advances of Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings, lead to an ever increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.

  1. War and peace: morphemes and full forms in a noninteractive activation parallel dual-route model.

    Science.gov (United States)

    Baayen, H; Schreuder, R

    This article introduces a computational tool for modeling the process of morphological segmentation in visual and auditory word recognition in the framework of a parallel dual-route model. Copyright 1999 Academic Press.

  2. PARAMO: a PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records.

    Science.gov (United States)

    Ng, Kenney; Ghoting, Amol; Steinhubl, Steven R; Stewart, Walter F; Malin, Bradley; Sun, Jimeng

    2014-04-01

    Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data. To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported. We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000 patient data set in 3 hours in parallel, compared to 9 days if running sequentially. This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed-up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines
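
    A small sketch of the pipeline idea described above: build a dependency graph of tasks, order it topologically, and run tasks whose dependencies are satisfied in parallel. Task names and dependencies are illustrative, not PARAMO's, and the sketch assumes Python 3.9+ for the graphlib module:

      from concurrent.futures import ThreadPoolExecutor
      from graphlib import TopologicalSorter
      import time

      def run(task):
          time.sleep(0.1)              # stand-in for real work
          return f"{task} done"

      # task -> set of tasks it depends on
      graph = {
          "cohort":    set(),
          "features":  {"cohort"},
          "cv_split":  {"features"},
          "selection": {"cv_split"},
          "classify":  {"selection", "cv_split"},
      }

      ts = TopologicalSorter(graph)
      ts.prepare()
      with ThreadPoolExecutor(max_workers=4) as pool:
          while ts.is_active():
              ready = list(ts.get_ready())         # tasks whose deps are finished
              for task, result in zip(ready, pool.map(run, ready)):
                  print(result)
                  ts.done(task)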

  3. Application of parallel computing to seismic damage process simulation of an arch dam

    International Nuclear Information System (INIS)

    Zhong Hong; Lin Gao; Li Jianbo

    2010-01-01

    The simulation of the damage process of a high arch dam subjected to strong earthquake shocks is significant to the evaluation of its performance and seismic safety, considering the catastrophic effect of dam failure. However, such numerical simulation requires rigorous computational capacity. Conventional serial computing falls short of that, and parallel computing is a fairly promising solution to this problem. The parallel finite element code PDPAD was developed for the damage prediction of arch dams, utilizing a damage model that takes the heterogeneity of concrete into account. Developed with the programming language Fortran, the code uses a master/slave mode for programming, the domain decomposition method for allocation of tasks, MPI (Message Passing Interface) for communication, and solvers from the AZTEC library for the solution of large-scale equations. Speedup tests showed that the performance of PDPAD was quite satisfactory. The code was employed to study the damage process of an arch dam under construction on a 4-node PC cluster, with more than one million degrees of freedom considered. The obtained damage mode was quite similar to that of a shaking table test, indicating that the proposed procedure and the parallel code PDPAD have good potential for simulating the seismic damage mode of arch dams. With the rapidly growing need for massive computation emerging from engineering problems, parallel computing will find more and more applications in pertinent areas.
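
    The master/slave, domain-decomposition organisation described above maps naturally onto MPI. The fragment below is a toy mpi4py sketch of that pattern — a master rank splitting a mesh into subdomains and workers returning a per-subdomain result. The PDPAD code itself is Fortran, so every name here and the stand-in "damage" computation are illustrative assumptions.

```python
# Run with at least two ranks, e.g.: mpiexec -n 5 python this_script.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    # Master: split the element indices into one subdomain per worker and send them out.
    subdomains = np.array_split(np.arange(1_000_000), size - 1)
    for worker, part in enumerate(subdomains, start=1):
        comm.send(part, dest=worker, tag=1)
    results = [comm.recv(source=w, tag=2) for w in range(1, size)]
    print("max subdomain result:", max(results))
else:
    part = comm.recv(source=0, tag=1)
    # Worker: placeholder for the element-level damage computation on its subdomain.
    comm.send(float(np.sin(part).mean()), dest=0, tag=2)
```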

  4. Aspects of parallel processing and control engineering

    OpenAIRE

    McKittrick, Brendan J

    1991-01-01

    The concept of parallel processing is not a new one, but the application of it to control engineering tasks is a relatively recent development, made possible by contemporary hardware and software innovation. It has long been accepted that, if properly orchestrated, several processors/CPUs can be combined to form a powerful processing entity. What prevented this from being implemented in commercial systems was the adequacy of the microprocessor for most tasks and hence the expense of a multi-pro...

  5. Endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R; Ratterman, Joseph D; Smith, Brian E

    2014-11-11

    Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.

  6. Initial Assessment of Parallelization of Monte Carlo Calculation using Graphics Processing Units

    International Nuclear Information System (INIS)

    Choi, Sung Hoon; Joo, Han Gyu

    2009-01-01

    Monte Carlo (MC) simulation is an effective tool for calculating neutron transport in complex geometries. However, because Monte Carlo simulates each neutron behavior one by one, it takes a very long computing time if enough neutrons are used for high precision of calculation. Accordingly, methods that reduce the computing time are required. A Monte Carlo code is well suited to parallel calculation, since it simulates the behavior of each neutron independently, so parallel computation is natural. The parallelization of Monte Carlo codes, however, was previously done using multiple CPUs. Driven by the global demand for high-quality 3D graphics, the Graphics Processing Unit (GPU) has developed into a highly parallel, multi-core processor. This parallel processing capability of GPUs can be made available to engineering computing once a suitable interface is provided. Recently, NVIDIA introduced CUDA™, a general-purpose parallel computing architecture. CUDA is a software environment that allows developers to manage the GPU using C/C++ or other languages. In this work, a GPU-based Monte Carlo code is developed and an initial assessment of its parallel performance is investigated
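
    The reason Monte Carlo transport parallelizes so well is that each particle history is statistically independent, so histories can be mapped one-to-one onto threads. The toy example below makes that point with a vectorized batch of histories for a purely absorbing slab — plain NumPy rather than CUDA, with an arbitrary slab thickness and cross-section assumed purely for illustration.

```python
import numpy as np

def transmit_fraction(n_histories, slab_thickness=5.0, sigma_t=1.0, seed=0):
    """Toy analogue Monte Carlo: fraction of neutrons crossing a purely absorbing slab.
    Every history is independent, so the whole batch is one vectorized operation,
    mirroring the way a GPU would assign one history per thread."""
    rng = np.random.default_rng(seed)
    free_paths = rng.exponential(1.0 / sigma_t, size=n_histories)
    return np.mean(free_paths > slab_thickness)

print(transmit_fraction(10_000_000))   # analytic answer is exp(-5) ≈ 0.0067
```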

  7. Adaptive Dynamic Process Scheduling on Distributed Memory Parallel Computers

    Directory of Open Access Journals (Sweden)

    Wei Shu

    1994-01-01

    Full Text Available One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly important for computations with unpredictable dynamic behaviors or irregular structures. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. The adaptive contracting within neighborhood (ACWN is a dynamic, distributed, load-dependent, and scalable scheme. It deals with dynamic and unpredictable creation of processes and adapts to different systems. The scheme is described and contrasted with two other schemes that have been proposed in this context, namely the randomized allocation and the gradient model. The performance of the three schemes on an Intel iPSC/2 hypercube is presented and analyzed. The experimental results show that even though the ACWN algorithm incurs somewhat larger overhead than the randomized allocation, it achieves better performance in most cases due to its adaptiveness. Its feature of quickly spreading the work helps it outperform the gradient model in performance and scalability.

  8. Performance Tuning and Evaluation of a Parallel Community Climate Model

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Worley, P.H.; Hammond, S.

    1999-11-13

    The Parallel Community Climate Model (PCCM) is a message-passing parallelization of version 2.1 of the Community Climate Model (CCM) developed by researchers at Argonne and Oak Ridge National Laboratories and at the National Center for Atmospheric Research in the early to mid 1990s. In preparation for use in the Department of Energy's Parallel Climate Model (PCM), PCCM has recently been updated with new physics routines from version 3.2 of the CCM, improvements to the parallel implementation, and ports to the SGI/Cray Research T3E and Origin 2000. We describe our experience in porting and tuning PCCM on these new platforms, evaluating the performance of different parallel algorithm options and comparing performance between the T3E and Origin 2000.

  9. Method of parallel processing in SANPO real time system

    International Nuclear Information System (INIS)

    Ostrovnoj, A.I.; Salamatin, I.M.

    1981-01-01

    A method of parallel processing in the SANPO real-time system is described. Algorithms for data accumulation and preliminary processing, implemented in this system as parallel processes using a specialized high-level programming language, are described, together with the hierarchy of elementary processes. The approach provides synchronization of concurrent processes without semaphores. The developed means are applied to systems for experiment automation using SM-3 minicomputers [ru]

  10. Real-time SHVC software decoding with multi-threaded parallel processing

    Science.gov (United States)

    Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu

    2014-09-01

    This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipeline designed with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads are compared in terms of decoding speed and resource usage, including processor and memory.

  11. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  12. Parallel Distributed Processing Theory in the Age of Deep Networks.

    Science.gov (United States)

    Bowers, Jeffrey S

    2017-12-01

    Parallel distributed processing (PDP) models in psychology are the precursors of deep networks used in computer science. However, only PDP models are associated with two core psychological claims, namely that all knowledge is coded in a distributed format and cognition is mediated by non-symbolic computations. These claims have long been debated in cognitive science, and recent work with deep networks speaks to this debate. Specifically, single-unit recordings show that deep networks learn units that respond selectively to meaningful categories, and researchers are finding that deep networks need to be supplemented with symbolic systems to perform some tasks. Given the close links between PDP and deep networks, it is surprising that research with deep networks is challenging PDP theory. Copyright © 2017. Published by Elsevier Ltd.

  13. Parallel processing architecture for H.264 deblocking filter on multi-core platforms

    Science.gov (United States)

    Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

    2012-03-01

    Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve low latency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in the H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for a lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color subsampling patterns like YUV 4:2:2 or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programming model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking

  14. One Factor or Two Parallel Processes? Comorbidity and Development of Adolescent Anxiety and Depressive Disorder Symptoms

    Science.gov (United States)

    Hale, William W., III; Raaijmakers, Quinten A. W.; Muris, Peter; van Hoof, Anne; Meeus, Wim H. J.

    2009-01-01

    Background: This study investigates whether anxiety and depressive disorder symptoms of adolescents from the general community are best described by a model that assumes they are indicative of one general factor or by a model that assumes they are two distinct disorders with parallel growth processes. Additional analyses were conducted to explore…

  15. Synchronization Of Parallel Discrete Event Simulations

    Science.gov (United States)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages of each. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.

  16. Application of parallel processing for automatic inspection of printed circuits

    International Nuclear Information System (INIS)

    Lougheed, R.M.

    1986-01-01

    Automated visual inspection of printed electronic circuits is a challenging application for image processing systems. Detailed inspection requires high speed analysis of gray scale imagery along with high quality optics, lighting, and sensing equipment. A prototype system has been developed and demonstrated at the Environmental Research Institute of Michigan (ERIM) for inspection of multilayer thick-film circuits. The central problem of real-time image processing is solved by a special-purpose parallel processor which includes a new high-speed Cytocomputer. In this chapter the inspection process and the algorithms used are summarized, along with the functional requirements of the machine vision system. Next, the parallel processor is described in detail and then performance on this application is given

  17. Information-Limited Parallel Processing in Difficult Heterogeneous Covert Visual Search

    Science.gov (United States)

    Dosher, Barbara Anne; Han, Songmei; Lu, Zhong-Lin

    2010-01-01

    Difficult visual search is often attributed to time-limited serial attention operations, although neural computations in the early visual system are parallel. Using probabilistic search models (Dosher, Han, & Lu, 2004) and a full time-course analysis of the dynamics of covert visual search, we distinguish unlimited capacity parallel versus serial…

  18. Development of mathematical model and optimal control system of internal temperatures of hot-blast stove process in staggered parallel operation; Netsufuro sushiki model to parallel sofu ni okeru ronai ondo saiteki seigyo system no kaihatsu

    Energy Technology Data Exchange (ETDEWEB)

    Matoba, Y. [Sumitomo Metal Industries, Ltd., Osaka (Japan); Otsuka, K.

    1998-07-01

    A mathematical model and an optimal control system for the hot-blast stove process are described. A precise mathematical simulation model of the hot-blast stove was developed, and the accuracy of the model has been confirmed. An optimal control system for the thermal conditions of the hot-blast stoves in staggered parallel operation was also developed. Through the use of a multivariable optimal regulator and feedforward compensations for changes in the aimed blast temperature and blast volume, the system is able to control the hot blast temperature and the brick temperature efficiently. The system has been applied to the Kashima works. The variations of the blast temperature and the silica brick temperature have been decreased. Ultimate low heat level operations have been realized, and the thermal efficiency has furthermore been raised by about 1%. 8 refs., 14 figs., 1 tab.

  19. Linear parallel processing machines I

    Energy Technology Data Exchange (ETDEWEB)

    Von Kunze, M

    1984-01-01

    As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of context-sensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form aB → A_1 ... A_n b_1 ... b_m. These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines), which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic context-free languages, and they define an interesting class of context-sensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.

  20. Prediction of Adequate Prenatal Care Utilization Based on the Extended Parallel Process Model.

    Science.gov (United States)

    Hajian, Sepideh; Imani, Fatemeh; Riazi, Hedyeh; Salmani, Fatemeh

    2017-10-01

    Pregnancy complications are one of the major public health concerns. One of the main causes of preventable complications is the absence of or inadequate provision of prenatal care. The present study was conducted to investigate whether the Extended Parallel Process Model's constructs can predict the utilization of prenatal care services. The present longitudinal prospective study was conducted on 192 pregnant women selected through the multi-stage sampling of health facilities in Qeshm, Hormozgan province, from April to June 2015. Participants were followed up from the first half of pregnancy until their childbirth to assess adequate or inadequate/non-utilization of prenatal care services. Data were collected using the structured Risk Behavior Diagnosis Scale. The analysis of the data was carried out in SPSS-22 using one-way ANOVA, linear regression and logistic regression analysis. The level of significance was set at 0.05. Totally, 178 pregnant women with a mean age of 25.31±5.42 completed the study. Perceived self-efficacy (OR=25.23; P<0.05) was associated with higher odds of receiving adequate prenatal care. Husband's occupation in the labor market (OR=0.43; P=0.02), unwanted pregnancy (OR=0.352; P<0.05) and care for minors or the elderly at home (OR=0.35; P=0.045) were associated with lower odds of receiving prenatal care. The model showed that when the perceived efficacy of the prenatal care services overcame the perceived threat, the likelihood of prenatal care usage would increase. This study identified some modifiable factors associated with prenatal care usage by women, providing key targets for appropriate clinical interventions.

  1. Digital intermediate frequency QAM modulator using parallel processing

    Science.gov (United States)

    Pao, Hsueh-Yuan [Livermore, CA; Tran, Binh-Nien [San Ramon, CA

    2008-05-27

    The digital Intermediate Frequency (IF) modulator applies to various modulation types and offers a simple, low-cost method to implement a high-speed digital IF modulator using field programmable gate arrays (FPGAs). The architecture eliminates multipliers and sequential processing by storing the pre-computed modulated cosine and sine carriers in ROM look-up tables (LUTs). The high-speed input data stream is processed in parallel using the corresponding LUTs, which reduces the required main processing speed, allowing the use of low-cost FPGAs.
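
    The table-lookup idea can be illustrated with a few lines of NumPy: pre-compute one carrier period, map symbol bits to I/Q levels, and form each symbol's IF samples in one vectorized (multiplier-free, in the FPGA sense) step. The 16-QAM mapping, four samples per symbol, and a carrier at a quarter of the sample rate below are assumptions for illustration, not parameters of the patented design.

```python
import numpy as np

SAMPLES_PER_SYMBOL = 4
LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])          # 16-QAM amplitude levels

# Pre-computed carrier tables, analogous to the ROM LUTs in the FPGA design.
n = np.arange(SAMPLES_PER_SYMBOL)
COS_LUT = np.cos(2 * np.pi * n / SAMPLES_PER_SYMBOL)
SIN_LUT = np.sin(2 * np.pi * n / SAMPLES_PER_SYMBOL)

def qam_modulate(symbols):
    """symbols: integers 0..15; returns one block of IF samples per symbol."""
    i = LEVELS[symbols & 0x3]          # 2 LSBs select the in-phase level
    q = LEVELS[(symbols >> 2) & 0x3]   # 2 MSBs select the quadrature level
    # One broadcasted combination replaces per-sample multipliers.
    return i[:, None] * COS_LUT - q[:, None] * SIN_LUT

waveform = qam_modulate(np.random.randint(0, 16, size=1024)).ravel()
print(waveform[:8])
```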

  2. Parallel processing is good for your scientific codes...But massively parallel processing is so much better

    International Nuclear Information System (INIS)

    Thomas, B.; Domain, Ch.; Souffez, Y.; Eon-Duval, P.

    1998-01-01

    Harnessing the power of many computers to solve difficult scientific problems concurrently is one of the most innovative trends in High Performance Computing. At EDF, we have invested in parallel computing and have achieved significant results. First we improved the processing speed of strategic codes, in order to extend their scope. Then we turned to numerical simulations at the atomic scale. These computations, which we never dreamt of before, provided us with a better understanding of metallurgical phenomena. More precisely, we were able to trace defects in alloys that are used in nuclear power plants. (author)

  3. Using the extended parallel process model to prevent noise-induced hearing loss among coal miners in Appalachia

    Energy Technology Data Exchange (ETDEWEB)

    Murray-Johnson, L.; Witte, K.; Patel, D.; Orrego, V.; Zuckerman, C.; Maxfield, A.M.; Thimons, E.D. [Ohio State University, Columbus, OH (US)

    2004-12-15

    Occupational noise-induced hearing loss is the second most commonly self-reported occupational illness or injury in the United States. Among coal miners, more than 90% of the population reports a hearing deficit by age 55. In this formative evaluation, focus groups were conducted with coal miners in Appalachia to ascertain whether miners perceive hearing loss as a major health risk and, if so, what would motivate the consistent wearing of hearing protection devices (HPDs). The theoretical framework of the Extended Parallel Process Model was used to identify the miners' knowledge, attitudes, beliefs, and current behaviors regarding hearing protection. Focus group participants had strong perceived severity and varying levels of perceived susceptibility to hearing loss. Various barriers significantly reduced the self-efficacy and the response efficacy of using hearing protection.

  4. Real-time data acquisition and parallel data processing solution for TJ-II Bolometer arrays diagnostic

    Energy Technology Data Exchange (ETDEWEB)

    Barrera, E. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain)]. E-mail: eduardo.barrera@upm.es; Ruiz, M. [Grupo de Investigacion en Instrumentacion y Acustica Aplicada, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Lopez, S. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Machon, D. [Departamento de Sistemas Electronicos y de Control, Universidad Politecnica de Madrid, Crta. Valencia Km. 7, 28031 Madrid (Spain); Vega, J. [Asociacion EURATOM/CIEMAT para Fusion, 28040 Madrid (Spain); Ochando, M. [Asociacion EURATOM/CIEMAT para Fusion, 28040 Madrid (Spain)

    2006-07-15

    Maps of local plasma emissivity of TJ-II plasmas are determined using three-array cameras of silicon photodiodes (AXUV type from IRD). They are assigned to the top and side ports of the same sector of the vacuum vessel. Each array consists of 20 unfiltered detectors. The signals from each of these detectors are the inputs to an iterative algorithm of tomographic reconstruction. Currently, these signals are acquired by a PXI standard system at approximately 50 kS/s, with 12 bits of resolution, and are stored for off-line processing. A 0.5 s discharge generates 3 Mbytes of raw data. The algorithm's load exceeds the CPU capacity of the PXI system's controller in continuous mode, making it unfeasible to process the samples in parallel with their acquisition in a standard PXI system. A new architecture model has been developed, making it possible to add one or several processing cards to a standard PXI system. With this model, it is possible to define how to distribute, in real time, the data from all acquired signals in the system among the processing cards and the PXI controller. This way, by distributing the data processing among the system controller and two processing cards, the data processing can be done in parallel with the acquisition. Hence, this system configuration would be able to measure even in long-pulse devices.

  5. The parallel processing impact in the optimization of the reactors neutronic by genetic algorithms

    International Nuclear Information System (INIS)

    Pereira, Claudio M.N.A.; Universidade Federal, Rio de Janeiro, RJ; Lapa, Celso M.F.; Mol, Antonio C.A.

    2002-01-01

    Nowadays, many optimization problems found in nuclear engineering have been solved through genetic algorithms (GA). The robustness of such methods is strongly related to the nature of the search process, which is based on populations of candidate solutions, and this fact implies a high computational cost in the optimization process. The use of GA becomes more critical when the evaluation of a candidate solution is highly time consuming. Problems of this nature are common in nuclear engineering; an example is reactor design optimization, where neutronic codes, which consume considerable CPU time, must be run. Aiming to investigate the impact of the use of parallel computation in the solution, through GA, of a reactor design optimization problem, a parallel genetic algorithm (PGA) using the Island Model was developed. Exhaustive experiments, amounting to more than 1500 processing hours on 550 MHz personal computers, have been done in order to compare the conventional GA with the PGA. Such experiments have demonstrated the superiority of the PGA not only in terms of execution time but also in the optimization results. (author)
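
    In the Island Model, several sub-populations evolve independently (typically one per processor) and periodically exchange their best individuals, which is how the wall-clock cost of the expensive fitness evaluations is spread across machines. The sketch below is a deliberately simplified ring-migration island GA in Python; the population sizes, mutation scheme, and migration policy are assumptions, and the costly neutronic evaluation of the abstract is replaced by a cheap analytic stand-in.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def fitness(x):
    return -np.sum((x - 0.5) ** 2)        # stand-in for an expensive neutronic evaluation

def evolve_island(args):
    seed, migrants, generations = args
    rng = np.random.default_rng(seed)
    pop = rng.random((40, 8))
    if migrants is not None:
        pop[: len(migrants)] = migrants    # incoming migrants join the island
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[-20:]]                        # truncation selection
        pop = np.vstack([parents, parents + rng.normal(0, 0.05, parents.shape)])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argsort(scores)[::-1][:5]]                           # island elite -> migrants

def island_model(n_islands=4, epochs=3, generations=50):
    migrants = [None] * n_islands
    with ProcessPoolExecutor(max_workers=n_islands) as pool:
        for epoch in range(epochs):
            args = [(epoch * n_islands + i, migrants[i], generations) for i in range(n_islands)]
            elites = list(pool.map(evolve_island, args))               # islands run in parallel
            migrants = [elites[(i + 1) % n_islands] for i in range(n_islands)]  # ring migration
    best = max((ind for elite in elites for ind in elite), key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    individual, score = island_model()
    print("best fitness:", score)
```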

  6. CSDFa: a model for exploiting the trade-off between data and pipeline parallelism

    NARCIS (Netherlands)

    Koek, Peter; Geuns, S.J.; Hausmans, J.P.H.M.; Corporaal, Henk; Bekooij, Marco Jan Gerrit

    2016-01-01

    Real-time stream processing applications, such as SDR applications, are often executed concurrently on multiprocessor systems. A unified data flow model and analysis method have been proposed that can be used to simultaneously determine the amount of pipeline and coarse-grained data parallelism

  7. Adapting high-level language programs for parallel processing using data flow

    Science.gov (United States)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  8. Design of a family of integrated parallel co-processors for images processing

    International Nuclear Information System (INIS)

    Court, Thierry

    1991-01-01

    The design of parallel image processing systems joining, in the same architecture, sophisticated microprocessors and specialised operators is a difficult task because of the various problems to be taken into account. The current study identifies a way of realizing such dedicated operators and interfacing them to a microprocessor-type central unit. The two guidelines of this work are the search for polyvalent, specialized, reconfigurable operators, and their connection to a system bus rather than to specialized video buses. This research work proposes an architecture of circuits dedicated to image processing and two realization proposals for them, one of which was realized in this study using silicon compiler tools. This work belongs to a larger project whose aim is the development of a high-performance, modular industrial image processing system based on the parallelization, in MIMD structures, of an elementary, autonomous image processing unit integrating a microprocessor equipped with a parallel coprocessor suited to image processing. (author) [fr]

  9. Implementation of an Agent-Based Parallel Tissue Modelling Framework for the Intel MIC Architecture

    Directory of Open Access Journals (Sweden)

    Maciej Cytowski

    2017-01-01

    Full Text Available Timothy is a novel large-scale modelling framework that allows the simulation of biological processes involving different cellular colonies growing and interacting with a variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC) systems. The high parallel scalability of the implementation allows for simulations of up to 10⁹ individual cells (i.e., simulations at tissue spatial scales of up to 1 cm³ in size). With the recent advancements of the Timothy model, it has become critical to ensure an appropriate performance level on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increased the computational complexity of the model. In this paper, we describe the process of modernization of the application in order to achieve high computational performance on HPC hybrid systems based on the modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.

  10. Parallel Computing for Terrestrial Ecosystem Carbon Modeling

    International Nuclear Information System (INIS)

    Wang, Dali; Post, Wilfred M.; Ricciuto, Daniel M.; Berry, Michael

    2011-01-01

    Terrestrial ecosystems are a primary component of research on global environmental change. Observational and modeling research on terrestrial ecosystems at the global scale, however, has lagged behind their counterparts for oceanic and atmospheric systems, largely because of the unique challenges associated with the tremendous diversity and complexity of terrestrial ecosystems. There are 8 major types of terrestrial ecosystem: tropical rain forest, savannas, deserts, temperate grassland, deciduous forest, coniferous forest, tundra, and chaparral. The carbon cycle is an important mechanism in the coupling of terrestrial ecosystems with climate through biological fluxes of CO₂. The influence of terrestrial ecosystems on atmospheric CO₂ can be modeled via several means at different timescales. Important processes include plant dynamics, change in land use, as well as ecosystem biogeography. Over the past several decades, many terrestrial ecosystem carbon models (TECMs; see the 'Model developments' section) have been developed to understand the interactions between terrestrial carbon storage and CO₂ concentration in the atmosphere, as well as the consequences of these interactions. Early TECMs generally adopted simple box-flow exchange models, in which photosynthetic CO₂ uptake and respiratory CO₂ release are simulated in an empirical manner with a small number of vegetation and soil carbon pools. Demands on the kinds and amount of information required from global TECMs have grown. Recently, along with the rapid development of parallel computing, spatially explicit TECMs with detailed process-based representations of carbon dynamics have become attractive, because those models can readily incorporate a variety of additional ecosystem processes (such as dispersal, establishment, growth, mortality etc.) and environmental factors (such as landscape position, pest populations, disturbances, resource manipulations, etc.), and provide information to frame policy options for climate change

  11. A new model for reliability optimization of series-parallel systems with non-homogeneous components

    International Nuclear Information System (INIS)

    Feizabadi, Mohammad; Jahromi, Abdolhamid Eshraghniaye

    2017-01-01

    In discussions related to reliability optimization using redundancy allocation, one of the structures that has attracted the attention of many researchers is the series-parallel structure. In models previously presented for reliability optimization of series-parallel systems, there is a restrictive assumption that all components of a subsystem must be homogeneous. This constraint limits system designers in selecting components and prevents achieving higher levels of reliability. In this paper, a new model is proposed for reliability optimization of series-parallel systems, which makes possible the use of non-homogeneous components in each subsystem. As a result of this flexibility, the process of supplying system components will be easier. To solve the proposed model, since the redundancy allocation problem (RAP) belongs to the NP-hard class of optimization problems, a genetic algorithm (GA) is developed. The computational results of the designed GA are indicative of the high performance of the proposed model in increasing system reliability and decreasing costs. - Highlights: • In this paper, a new model is proposed for reliability optimization of series-parallel systems. • In the previous models, there is a restricting assumption based on which all components of a subsystem must be homogeneous. • The presented model provides the possibility for a subsystem's components to be non-homogeneous where required. • The computational results demonstrate the high performance of the proposed model in improving reliability and reducing costs.
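
    The reliability arithmetic underlying such models is compact: components within a subsystem are redundant (parallel), the subsystems are connected in series, and with non-homogeneous components each subsystem may mix different reliabilities. A minimal sketch of that evaluation, with made-up component reliabilities, is shown below; the optimization layer (the GA choosing which components to allocate) is deliberately left out.

```python
import numpy as np

def system_reliability(subsystems):
    """subsystems: list of lists of component reliabilities.
    Components within a subsystem work in parallel (redundancy) and the subsystems
    are in series; mixing reliabilities within one subsystem is exactly what the
    non-homogeneous model above permits."""
    r = 1.0
    for comps in subsystems:
        r *= 1.0 - np.prod([1.0 - c for c in comps])   # subsystem fails only if all fail
    return r

# Subsystem 1 mixes two component types; subsystem 2 uses three identical components.
print(system_reliability([[0.90, 0.85], [0.95, 0.95, 0.95]]))   # ≈ 0.985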

  12. Parallel Processing of Images in Mobile Devices using BOINC

    Science.gov (United States)

    Curiel, Mariela; Calle, David F.; Santamaría, Alfredo S.; Suarez, David F.; Flórez, Leonardo

    2018-04-01

    Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computers grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: a) the execution of programs in mobile devices required to modify the code to insert calls to the BOINC API, and b) the division of the image among the mobile devices as well as its merging required additional code in some BOINC components. This article presents answers to these four challenges.

  13. Parallel Processing of Images in Mobile Devices using BOINC

    Directory of Open Access Journals (Sweden)

    Curiel Mariela

    2018-04-01

    Full Text Available Medical image processing helps health professionals make decisions for the diagnosis and treatment of patients. Since some algorithms for processing images require substantial amounts of resources, one could take advantage of distributed or parallel computing. A mobile grid can be an adequate computing infrastructure for this problem. A mobile grid is a grid that includes mobile devices as resource providers. In a previous step of this research, we selected BOINC as the infrastructure to build our mobile grid. However, parallel processing of images in mobile devices poses at least two important challenges: the execution of standard libraries for processing images and obtaining adequate performance when compared to desktop computers grids. By the time we started our research, the use of BOINC in mobile devices also involved two issues: (a) the execution of programs in mobile devices required to modify the code to insert calls to the BOINC API, and (b) the division of the image among the mobile devices as well as its merging required additional code in some BOINC components. This article presents answers to these four challenges.

  14. Parallel direct solver for finite element modeling of manufacturing processes

    DEFF Research Database (Denmark)

    Nielsen, Chris Valentin; Martins, P.A.F.

    2017-01-01

    The central processing unit (CPU) time is of paramount importance in finite element modeling of manufacturing processes. Because the most significant part of the CPU time is consumed in solving the main system of equations resulting from finite element assemblies, different approaches have been...

  15. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    Science.gov (United States)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Calculation of matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up the calculation process, which usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize the calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques, which can only handle square, symmetric matrices. Hypergraph partitioning techniques overcome this shortcoming of graph partitioning. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and is implemented on the GPU (graphics processing unit).
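
    The essential step is splitting the matrix into blocks whose products with the vector can be computed independently and then recombined. The sketch below uses a plain 1D row-block split with CPU worker processes rather than hypergraph partitioning or CUDA, purely to illustrate the decompose-compute-concatenate pattern; the matrix sizes and the number of partitions are arbitrary assumptions.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def block_matvec(args):
    block, x = args
    return block @ x

def parallel_matvec(A, x, n_parts=4):
    """1D row-block partitioning of y = A @ x; each block is an independent task.
    (Hypergraph partitioning, as in the paper, chooses the partition to minimise
    communication; here the split is a plain equal-row split for illustration.)"""
    blocks = np.array_split(A, n_parts, axis=0)
    with ProcessPoolExecutor(max_workers=n_parts) as pool:
        parts = pool.map(block_matvec, [(b, x) for b in blocks])
    return np.concatenate(list(parts))

if __name__ == "__main__":
    A = np.random.rand(2000, 1500)        # arbitrary, non-square size
    x = np.random.rand(1500)
    assert np.allclose(parallel_matvec(A, x), A @ x)
```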

  16. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  17. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  18. The Temporal Dynamics of Visual Search: Evidence for Parallel Processing in Feature and Conjunction Searches

    Science.gov (United States)

    McElree, Brian; Carrasco, Marisa

    2012-01-01

    Feature and conjunction searches have been argued to delineate parallel and serial operations in visual processing. The authors evaluated this claim by examining the temporal dynamics of the detection of features and conjunctions. The 1st experiment used a reaction time (RT) task to replicate standard mean RT patterns and to examine the shapes of the RT distributions. The 2nd experiment used the response-signal speed–accuracy trade-off (SAT) procedure to measure discrimination (asymptotic detection accuracy) and detection speed (processing dynamics). Set size affected discrimination in both feature and conjunction searches but affected detection speed only in the latter. Fits of models to the SAT data that included a serial component overpredicted the magnitude of the observed dynamics differences. The authors concluded that both features and conjunctions are detected in parallel. Implications for the role of attention in visual processing are discussed. PMID:10641310

  19. Computer model of a reverberant and parallel circuit coupling

    Science.gov (United States)

    Kalil, Camila de Andrade; de Castro, Maria Clícia Stelling; Cortez, Célia Martins

    2017-11-01

    The objective of the present study was to deepen knowledge about the functioning of neural circuits by implementing a signal transmission model, using graph theory, in a small network of neurons composed of an interconnected reverberant and parallel circuit, in order to investigate the processing of the signals in each of them and the effects on the output of the network. For this, a program was developed in the C language and simulations were done using neurophysiological data obtained from the literature.

  20. Parallel community climate model: Description and user's guide

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  1. Algorithm comparison and benchmarking using a parallel spectra transform shallow water model

    Energy Technology Data Exchange (ETDEWEB)

    Worley, P.H. [Oak Ridge National Lab., TN (United States); Foster, I.T.; Toonen, B. [Argonne National Lab., IL (United States)

    1995-04-01

    In recent years, a number of computer vendors have produced supercomputers based on a massively parallel processing (MPP) architecture. These computers have been shown to be competitive in performance with conventional vector supercomputers for some applications. As spectral weather and climate models are heavy users of vector supercomputers, it is interesting to determine how these models perform on MPPs, and which MPPs are best suited to the execution of spectral models. The benchmarking of MPPs is complicated by the fact that different algorithms may be more efficient on different architectures. Hence, a comprehensive benchmarking effort must answer two related questions: which algorithm is most efficient on each computer, and how do the most efficient algorithms compare across computers. In general, these are difficult questions to answer because of the high cost associated with implementing and evaluating a range of different parallel algorithms on each MPP platform.

  2. Developing a Massively Parallel Forward Projection Radiography Model for Large-Scale Industrial Applications

    Energy Technology Data Exchange (ETDEWEB)

    Bauerle, Matthew [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2014-08-01

    This project utilizes Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely utilized. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this end, the data are divided into blocks that can each fit into GPU memory. The forward projected image is also divided into segments to allow for future parallelization and to avoid needless computations.

  3. A new decomposition method for parallel processing multi-level optimization

    International Nuclear Information System (INIS)

    Park, Hyung Wook; Kim, Min Soo; Choi, Dong Hoon

    2002-01-01

    In practical designs, most multidisciplinary problems have a large and complicated design system. Since multidisciplinary problems involve hundreds of analyses and thousands of variables, the grouping of the analyses and their order within a group affect the speed of the total design cycle. Therefore, it is very important to reorder and regroup the original design processes in order to minimize the total computational cost, by decomposing large multidisciplinary problems into several MultiDisciplinary Analysis SubSystems (MDASS) and by processing them in parallel. In this study, a new decomposition method is proposed for parallel processing of multidisciplinary design optimization, such as Collaborative Optimization (CO) and the Individual Discipline Feasible (IDF) method. Numerical results for two example problems are presented to show the feasibility of the proposed method

  4. Parallel transaction processing in functional languages, towards practical functional databases

    NARCIS (Netherlands)

    Wevers, L.; Huisman, Marieke; de Keijzer, Ander

    2013-01-01

    This paper shows how functional languages can be adapted for transaction processing, and discusses the implementation of a parallel runtime system for such functional transaction processing languages. We extend functional languages with current state variables and result state variables to allow the

  5. Parallel processing data network of master and slave transputers controlled by a serial control network

    Science.gov (United States)

    Crosetto, Dario B.

    1996-01-01

    The present device provides a dynamically configurable communication network comprising a multi-processor parallel processing system with a serial communication network and a high-speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect communication protocol, to control transmission of high-density data among nodes and to monitor each slave processor's status. The high-speed parallel processing network is used to effect the transmission of high-density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high-density data arrives or leaves the node.

  6. Oxytocin: parallel processing in the social brain?

    Science.gov (United States)

    Dölen, Gül

    2015-06-01

    Early studies attempting to disentangle the network complexity of the brain exploited the accessibility of sensory receptive fields to reveal circuits made up of synapses connected both in series and in parallel. More recently, extension of this organisational principle beyond the sensory systems has been made possible by the advent of modern molecular, viral and optogenetic approaches. Here, evidence supporting parallel processing of social behaviours mediated by oxytocin is reviewed. Understanding oxytocinergic signalling from this perspective has significant implications for the design of oxytocin-based therapeutic interventions aimed at disorders such as autism, where disrupted social function is a core clinical feature. Moreover, identification of opportunities for novel technology development will require a better appreciation of the complexity of the circuit-level organisation of the social brain. © 2015 The Authors. Journal of Neuroendocrinology published by John Wiley & Sons Ltd on behalf of British Society for Neuroendocrinology.

  7. Parallel processing method for high-speed real time digital pulse processing for gamma-ray spectroscopy

    International Nuclear Information System (INIS)

    Fernandes, A.M.; Pereira, R.C.; Sousa, J.; Neto, A.; Carvalho, P.; Batista, A.J.N.; Carvalho, B.B.; Varandas, C.A.F.; Tardocchi, M.; Gorini, G.

    2010-01-01

    A new data acquisition (DAQ) system was developed to fulfil the requirements of the gamma-ray spectrometer (GRS) JET-EP2 (Joint European Torus enhancement project 2), providing high-resolution spectroscopy at very high count rates (up to a few MHz). The system is based on the Advanced Telecommunications Computing Architecture™ (ATCA™) and includes a transient record (TR) module with 8 channels of 14-bit resolution at a 400 MSamples/s (MSPS) sampling rate, 4 GB of local memory, and 2 field programmable gate arrays (FPGAs) able to perform real-time algorithms for data reduction and digital pulse processing. Although at 400 MSPS only fast programmable devices such as FPGAs can be used for both data processing and data transfer, FPGA resources also present speed limitations for some specific tasks, leading to unavoidable data loss when demanding algorithms are applied. To overcome this problem, and foreseeing an increase in algorithm complexity, a new digital parallel filter was developed, aiming to perform real-time pulse processing in the FPGAs of the TR module at the presented sampling rate. The filter is based on the conventional digital time-invariant trapezoidal shaper operating on parallelized data while performing pulse height analysis (PHA) and pile-up rejection (PUR). The incoming sampled data is successively parallelized and fed into the processing algorithm block at one fourth of the sampling rate. The subsequent data processing and data transfer are also performed at one fourth of the sampling rate. The algorithm based on this data parallelization technique was implemented and tested at the JET facilities, where a spectrum was obtained. Based on the observed results, the PHA algorithm will be improved by implementing pulse pile-up discrimination.
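
    A time-invariant trapezoidal shaper can be written as the difference of two delayed moving sums, which is the arithmetic the FPGA replicates over four parallel sample streams to keep up with the 400 MSPS input. The NumPy sketch below shows only the serial, floating-point version of that shaper applied to an ideal step pulse; the rise and flat-top lengths are arbitrary, and the pole-zero correction needed for real exponentially decaying detector pulses is omitted.

```python
import numpy as np

def trapezoidal_shaper(x, rise=32, flat=16):
    """Time-invariant trapezoidal shaper as a difference of two delayed moving sums.
    (The FPGA version described above feeds four parallel sample streams through the
    same arithmetic so each stream runs at a quarter of the sampling rate.)"""
    c = np.cumsum(np.insert(x.astype(float), 0, 0.0))      # prefix sums, c[j] = sum(x[:j])
    n = np.arange(len(x))

    def moving_sum(delay):
        lo = np.clip(n - delay - rise + 1, 0, None)
        hi = np.clip(n - delay + 1, 0, None)
        return c[hi] - c[lo]

    return moving_sum(0) - moving_sum(rise + flat)

# A unit step produces a flat-topped trapezoid whose plateau height equals `rise`.
pulse = np.concatenate([np.zeros(100), np.ones(400)])
print(trapezoidal_shaper(pulse).max())    # ≈ 32
```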

  8. Modelling and parallel calculation of a kinetic boundary layer

    International Nuclear Information System (INIS)

    Perlat, Jean Philippe

    1998-01-01

    This research thesis aims at addressing reliability and cost issues in the calculation, by numerical simulation, of flows in the transition regime. The first step has been to reduce the calculation cost and memory space of the Monte Carlo method, which is known to provide performance and reliability for rarefied regimes. Vector and parallel computers allow this objective to be reached. Here, a MIMD (multiple instructions, multiple data) machine has been used which implements parallel calculation at different levels of parallelization. Parallelization procedures have been adapted, and results showed that parallelization by calculation domain decomposition was far more efficient. Due to reliability issues related to the statistical nature of Monte Carlo methods, a new deterministic model was necessary to simulate gas molecules in the transition regime. New models and hyperbolic systems have therefore been studied. One is chosen which allows the thermodynamic values (density, average velocity, temperature, deformation tensor, heat flow) present in the Navier-Stokes equations to be determined, and the evolution equations of the thermodynamic values are described for the mono-atomic case. Their numerical resolution is reported. A kinetic scheme is developed which complies with the structure of all systems and which naturally expresses boundary conditions. The validation of the obtained 14-moment model is performed on shock problems and on Couette flows [fr

  9. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Moreland, Kenneth [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Geveci, Berk [Kitware, Inc., Clifton Park, NY (United States)

    2014-11-01

    The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends indicate that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It follows that efficient concurrency on exascale machines requires a massive number of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today's distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive number of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
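
    A rough analogy for the worklet idea, in plain Python rather than the framework itself: the algorithm is expressed as a stateless per-element function that a scheduler can map over the data with as many workers as the hardware offers. The function name and the gradient-magnitude example are illustrative assumptions.

        import math
        from concurrent.futures import ProcessPoolExecutor

        def gradient_magnitude_worklet(g):
            """A 'worklet': a stateless operation on one localized piece of data.
            Here g is a (gx, gy, gz) tuple; a real framework would hand over
            one mesh point or cell at a time."""
            gx, gy, gz = g
            return math.sqrt(gx * gx + gy * gy + gz * gz)

        if __name__ == "__main__":
            field = [(1.0 * i, 0.5 * i, 0.25 * i) for i in range(100_000)]
            # the 'scheduler': map the worklet over the whole field in parallel
            with ProcessPoolExecutor() as pool:
                magnitudes = list(pool.map(gradient_magnitude_worklet, field, chunksize=5_000))
            print(magnitudes[:3])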

  10. Fast parallel algorithm for three-dimensional distance-driven model in iterative computed tomography reconstruction

    International Nuclear Information System (INIS)

    Chen Jian-Lin; Li Lei; Wang Lin-Yuan; Cai Ai-Long; Xi Xiao-Qi; Zhang Han-Ming; Li Jian-Xin; Yan Bin

    2015-01-01

    The projection matrix model is used to describe the physical relationship between reconstructed object and projection. Such a model has a strong influence on projection and backprojection, two vital operations in iterative computed tomographic reconstruction. The distance-driven model (DDM) is a state-of-the-art technology that simulates forward and back projections. This model has a low computational complexity and a relatively high spatial resolution; however, it includes only a few methods in a parallel operation with a matched model scheme. This study introduces a fast and parallelizable algorithm to improve the traditional DDM for computing the parallel projection and backprojection operations. Our proposed model has been implemented on a GPU (graphic processing unit) platform and has achieved satisfactory computational efficiency with no approximation. The runtime for the projection and backprojection operations with our model is approximately 4.5 s and 10.5 s per loop, respectively, with an image size of 256×256×256 and 360 projections with a size of 512×512. We compare several general algorithms that have been proposed for maximizing GPU efficiency by using the unmatched projection/backprojection models in a parallel computation. The imaging resolution is not sacrificed and remains accurate during computed tomographic reconstruction. (paper)

  11. A learnable parallel processing architecture towards unity of memory and computing.

    Science.gov (United States)

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.

  13. Electromagnetic Physics Models for Parallel Computing Architectures

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  14. Parallelization of elliptic solver for solving 1D Boussinesq model

    Science.gov (United States)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system emerging from the numerical scheme of the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2⁸ to 2¹⁴. Two numerical test cases, i.e. propagation of a solitary wave and of a standing wave, are proposed to evaluate the parallel program. The numerical results are verified against the analytical solutions of the solitary and standing waves. The best speedups for the solitary and standing wave test cases are about 2.07 with 2¹⁴ grid points and 1.86 with 2¹³ grid points, respectively, obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
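
    For readers unfamiliar with cyclic reduction, the sketch below shows the serial structure of the algorithm for a tridiagonal system with 2^k - 1 unknowns; at each level the inner loop over i is fully independent, which is the kind of loop the paper parallelizes with OpenMP. This is an illustrative Python/NumPy sketch, not the authors' code.

        import numpy as np

        def cyclic_reduction(a, b, c, d):
            """Solve a tridiagonal system by cyclic reduction.
            a: sub-diagonal (a[0] unused), b: diagonal, c: super-diagonal
            (c[-1] unused), d: right-hand side; requires n = 2**k - 1 unknowns."""
            a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
            n = len(b)
            x = np.zeros(n)
            levels = int(np.log2(n + 1))
            stride = 1
            for _ in range(levels - 1):          # forward elimination
                # every equation updated in this pass is independent of the others,
                # so this loop is the natural OpenMP "parallel for"
                for i in range(2 * stride - 1, n, 2 * stride):
                    lo, hi = i - stride, i + stride
                    alpha = -a[i] / b[lo]
                    beta = -c[i] / b[hi]
                    b[i] += alpha * c[lo] + beta * a[hi]
                    d[i] += alpha * d[lo] + beta * d[hi]
                    a[i] = alpha * a[lo]
                    c[i] = beta * c[hi]
                stride *= 2
            mid = (n - 1) // 2                   # single decoupled equation left
            x[mid] = d[mid] / b[mid]
            for _ in range(levels - 1):          # back substitution
                stride //= 2
                for i in range(stride - 1, n, 2 * stride):
                    xlo = x[i - stride] if i - stride >= 0 else 0.0
                    xhi = x[i + stride] if i + stride < n else 0.0
                    x[i] = (d[i] - a[i] * xlo - c[i] * xhi) / b[i]
            return x

        # quick check against a dense solve for n = 2**4 - 1
        n = 15
        rng = np.random.default_rng(0)
        b0 = 4 + rng.random(n)
        a0, c0 = -np.ones(n), -np.ones(n)
        a0[0] = c0[-1] = 0.0
        d0 = rng.random(n)
        A = np.diag(b0) + np.diag(a0[1:], -1) + np.diag(c0[:-1], 1)
        print(np.allclose(cyclic_reduction(a0, b0, c0, d0), np.linalg.solve(A, d0)))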

  15. A hybrid parallel framework for the cellular Potts model simulations

    Energy Technology Data Exchange (ETDEWEB)

    Jiang, Yi [Los Alamos National Laboratory; He, Kejing [SOUTH CHINA UNIV; Dong, Shoubin [SOUTH CHINA UNIV

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximate, and cannot be used for large-scale, complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10⁸ sites) of the complex collective behavior of numerous cells (~10⁶).

  16. Research on Multi-Person Parallel Modeling Method Based on Integrated Model Persistent Storage

    Science.gov (United States)

    Qu, MingCheng; Wu, XiangHu; Tao, YongChao; Liu, Ying

    2018-03-01

    This paper mainly studies a multi-person parallel modeling method based on integrated model persistent storage. The integrated model refers to the MDDT modeling graphics system, which can describe aerospace general embedded software from multiple angles, at multiple levels and in multiple stages. Persistent storage refers to converting the data model in memory into a storage model and converting the storage model back into a data model in memory, where the data model is the object model and the storage model is a binary stream. Multi-person parallel modeling refers to the need for multi-person collaboration, separation of roles, and even real-time remote synchronized modeling.

  17. Load balancing in highly parallel processing of Monte Carlo code for particle transport

    International Nuclear Information System (INIS)

    Higuchi, Kenji; Takemiya, Hiroshi; Kawasaki, Takuji

    1998-01-01

    In the parallel processing of Monte Carlo (MC) codes for neutron, photon and electron transport problems, particle histories are assigned to processors, making use of the independence of the calculation for each particle. Although the main part of an MC code can easily be parallelized this way, it is necessary, yet practically difficult, to optimize the code with respect to load balancing in order to attain a high speedup ratio in highly parallel processing. In fact, the speedup ratio in the case of 128 processors remains at only about one hundred on the test bed used for the performance evaluation. Through the parallel processing of the MCNP code, which is widely used in the nuclear field, it is shown that it is difficult to attain high performance with static load balancing, especially in neutron transport problems, and that a load balancing method which dynamically changes the number of assigned particles, minimizing the sum of the computational and communication costs, overcomes the difficulty, resulting in a nearly fifteen percent reduction in execution time. (author)

  18. Comparison of Efficacy and Threat Perception Processes in Predicting Smoking among University Students Based on Extended Parallel Process Model

    Directory of Open Access Journals (Sweden)

    S. Bashirian

    2014-04-01

    Introduction & Objective: The study of smoking, as the most toxic, common and cheapest addiction, and of its psychological and demographic variables, especially among the youth who are efficient and constructive members of society, is of great importance. This study was performed to compare efficacy and threat perception in predicting cigarette smoking among university students based on the Extended Parallel Process Model (EPPM). Material & Methods: This cross-sectional descriptive study was carried out on 700 college students of Hamadan recruited with a stratified sampling method. The participants completed a self-administered questionnaire including demographic characteristics, smoking status and the EPPM. Data analysis was done with the SPSS software (version 16), using t-test, one-way ANOVA, Pearson correlation and logistic regression methods. Results: The average scores of threat and efficacy perception were 39.7 and 38.6, respectively. The prevalence of cigarette smoking among participants was 27.1 percent. Also, there were significant differences between the average score of efficacy perception and age, gender, history of drug abuse and dwelling of students (P<0.05). Efficacy and threat perception both predicted student cigarette smoking. Conclusions: The cognitive mediating process of threat perception was a more powerful predictor of cigarette smoking as an unsafe behavior. Therefore, increasing the self-efficacy and response efficacy of university students, aimed at facilitating the acceptance of safe behavior, could be noteworthy as a principle in education. (Sci J Hamadan Univ Med Sci 2014; 21(1): 58-65)

  19. Parallel processing from applications to systems

    CERN Document Server

    Moldovan, Dan I

    1993-01-01

    This text provides one of the broadest presentations of parallel processing available, including the structure of parallel processors and parallel algorithms. The emphasis is on mapping algorithms to highly parallel computers, with extensive coverage of array and multiprocessor architectures. Early chapters provide insightful coverage on the analysis of parallel algorithms and program transformations, effectively integrating a variety of material previously scattered throughout the literature. Theory and practice are well balanced across diverse topics in this concise presentation. For exceptional cla…

  20. A parallel process model of the development of positive smoking expectancies and smoking behavior during early adolescence in Caucasian and African American girls

    OpenAIRE

    Chung, Tammy; White, Helene R.; Hipwell, Alison E.; Stepp, Stephanie D.; Loeber, Rolf

    2010-01-01

    This study examined the development of positive smoking expectancies and smoking behavior in an urban cohort of girls followed annually over ages 11-14. Longitudinal data from the oldest cohort of the Pittsburgh Girls Study (N=566, 56% African American, 44% Caucasian) were used to estimate a parallel process growth model of positive smoking expectancies and smoking behavior. Average level of positive smoking expectancies was relatively stable over ages 11-14, although there was significant va...

  1. Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

    Science.gov (United States)

    Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

    2008-04-01

    This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.

  2. Electromagnetic Physics Models for Parallel Computing Architectures

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)

  3. A New Tool for Intelligent Parallel Processing of Radar/SAR Remotely Sensed Imagery

    Directory of Open Access Journals (Sweden)

    A. Castillo Atoche

    2013-01-01

    A novel parallel tool for large-scale image enhancement/reconstruction and postprocessing of radar/SAR sensor systems is addressed. The proposed parallel tool performs the following intelligent processing steps: image formation, with the application of the different system-level effects of image degradation of a particular remote sensing (RS) system and the simulation of random noise effects; enhancement/reconstruction employing nonparametric robust high-resolution techniques; and image postprocessing using the fuzzy anisotropic diffusion technique, which incorporates a better edge-preserving noise removal effect and a faster diffusion process. This innovative tool allows the processing of high-resolution images provided by different radar/SAR sensor systems, as required by RS end-users for environmental monitoring, risk prevention, and resource management. To verify the performance of the proposed parallel framework, the processing steps are developed and specifically tested on graphics processing units (GPUs), achieving considerable speedups compared to the serial version of the same techniques implemented in the C language.

  4. A model for optimizing file access patterns using spatio-temporal parallelism

    Energy Technology Data Exchange (ETDEWEB)

    Boonthanome, Nouanesengsy [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Patchett, John [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Geveci, Berk [Kitware Inc., Clifton Park, NY (United States); Ahrens, James [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bauer, Andy [Kitware Inc., Clifton Park, NY (United States); Chaudhary, Aashish [Kitware Inc., Clifton Park, NY (United States); Miller, Ross G. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Shipman, Galen M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Williams, Dean N. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2013-01-01

    For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible file access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.

  5. A Parallel Algebraic Multigrid Solver on Graphics Processing Units

    KAUST Repository

    Haase, Gundolf

    2010-01-01

    The paper presents a multi-GPU implementation of the preconditioned conjugate gradient algorithm with an algebraic multigrid preconditioner (PCG-AMG) for an elliptic model problem on a 3D unstructured grid. An efficient parallel sparse matrix-vector multiplication scheme underlying the PCG-AMG algorithm is presented for the many-core GPU architecture. A performance comparison of the parallel solver shows that a single Nvidia Tesla C1060 GPU board delivers the performance of a sixteen-node Infiniband cluster and a multi-GPU configuration with eight GPUs is about 100 times faster than a typical server CPU core. © 2010 Springer-Verlag.

  6. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  7. Parallel assembling and equation solving via graph algorithms with an application to the FE simulation of metal extrusion processes

    CERN Document Server

    Unterkircher, A

    2005-01-01

    We propose methods for parallel assembling and iterative equation solving based on graph algorithms. The assembling technique is independent of dimension, element type and model shape. As a parallel solving technique we construct a multiplicative symmetric Schwarz preconditioner for the conjugate gradient method. Both methods have been incorporated into a non-linear FE code to simulate 3D metal extrusion processes. We illustrate the efficiency of these methods on shared memory computers by realistic examples.

  8. Processing optimization with parallel computing for the J-PET scanner

    Directory of Open Access Journals (Sweden)

    Krzemień Wojciech

    2015-12-01

    The Jagiellonian Positron Emission Tomograph (J-PET) collaboration is developing a prototype time-of-flight (TOF) positron emission tomograph (PET) detector based on long polymer scintillators. This novel approach exploits the excellent time properties of the plastic scintillators, which permit very precise time measurements. The very fast field programmable gate array (FPGA)-based front-end electronics and the data acquisition system, as well as low- and high-level reconstruction algorithms, were specially developed to be used with the J-PET scanner. The TOF-PET data processing and reconstruction are time- and resource-demanding operations, especially in the case of a large acceptance detector that works in triggerless data acquisition mode. In this article, we discuss the parallel computing methods applied to optimize the data processing for the J-PET detector. We begin with general concepts of parallel computing and then we discuss several applications of those techniques in the J-PET data processing.

  9. Implementation of parallel processing in the basf2 framework for Belle II

    International Nuclear Information System (INIS)

    Itoh, Ryosuke; Lee, Soohyung; Katayama, N; Mineo, S; Moll, A; Kuhr, T; Heck, M

    2012-01-01

    Recent PC servers are equipped with multi-core CPUs and it is desirable to utilize their full processing power for data analysis in large-scale HEP experiments. The software framework basf2 is being developed for use in the Belle II experiment, a new-generation B-factory experiment at KEK, and parallel event processing to utilize multi-core CPUs is part of its design for use in massive data production. The details of the implementation of parallel event processing in the basf2 framework are discussed, together with a report of a preliminary performance study in realistic use on a 32-core PC server.

  10. Parallelized Genetic Identification of the Thermal-Electrochemical Model for Lithium-Ion Battery

    Directory of Open Access Journals (Sweden)

    Liqiang Zhang

    2013-01-01

    The parameters of a well-predicting model can be used as health characteristics for a Lithium-ion battery. This article reports a parallelized parameter identification of the thermal-electrochemical model, which significantly reduces the time consumption of parameter identification. Since the P2D model has the most predictability, it is chosen for further research and expanded to the thermal-electrochemical model by coupling thermal effects and temperature-dependent parameters. A Genetic Algorithm is then used for parameter identification, but it takes too much time because of the long simulation time of the model. For this reason, a computer cluster is built from surplus computing resources in our laboratory based on the Parallel Computing Toolbox and Distributed Computing Server in MATLAB. The performance of two parallelization methods, namely Single Program Multiple Data (SPMD) and the parallel FOR loop (PARFOR), is investigated, and the parallelized GA identification is then proposed. With this method, model simulations run in parallel and the parameter identification can be sped up more than a dozen times, and the identification result is better than that from the serial GA. This conclusion is validated by model parameter identification of a real LiFePO4 battery.
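
    The expensive step in such an identification is that every candidate parameter vector in a GA generation requires a full model simulation, and those simulations are mutually independent. The sketch below shows that pattern with Python's multiprocessing pool instead of MATLAB's SPMD/PARFOR; the objective function is a toy stand-in, not the battery model.

        import numpy as np
        from multiprocessing import Pool

        def simulate_and_score(params):
            """Stand-in for one expensive thermal-electrochemical simulation.
            Returns the fitting error of a candidate parameter vector (a toy
            quadratic here; the real objective compares simulated and measured
            voltage/temperature curves)."""
            target = np.array([1.0, 2.0, 3.0])
            return float(np.sum((np.asarray(params) - target) ** 2))

        def evaluate_generation(population, workers=8):
            """Score all candidates of one GA generation in parallel
            (the PARFOR-style step that dominates the run time)."""
            with Pool(processes=workers) as pool:
                return pool.map(simulate_and_score, population)

        if __name__ == "__main__":
            rng = np.random.default_rng(42)
            population = rng.uniform(0.0, 5.0, size=(24, 3)).tolist()
            fitness = evaluate_generation(population)
            print("best candidate:", population[int(np.argmin(fitness))])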

  11. Badlands: A parallel basin and landscape dynamics model

    Directory of Open Access Journals (Sweden)

    T. Salles

    2016-01-01

    Over more than three decades, a number of numerical landscape evolution models (LEMs) have been developed to study the combined effects of climate, sea-level, tectonics and sediments on Earth surface dynamics. Most of them are written in efficient programming languages, but often cannot be used on parallel architectures. Here, I present a LEM which ports a common core of accepted physical principles governing landscape evolution into a distributed memory parallel environment. Badlands (an acronym for BAsin anD LANdscape DynamicS) is an open-source, flexible, TIN-based landscape evolution model, built to simulate topography development at various space and time scales.

  12. Tutorial: Parallel Computing of Simulation Models for Risk Analysis.

    Science.gov (United States)

    Reilly, Allison C; Staid, Andrea; Gao, Michael; Guikema, Seth D

    2016-10-01

    Simulation models are widely used in risk analysis to study the effects of uncertainties on outcomes of interest in complex problems. Often, these models are computationally complex and time consuming to run. This latter point may be at odds with time-sensitive evaluations or may limit the number of parameters that are considered. In this article, we give an introductory tutorial focused on parallelizing simulation code to better leverage modern computing hardware, enabling risk analysts to better utilize simulation-based methods for quantifying uncertainty in practice. This article is aimed primarily at risk analysts who use simulation methods but do not yet utilize parallelization to decrease the computational burden of these models. The discussion is focused on conceptual aspects of embarrassingly parallel computer code and software considerations. Two complementary examples are shown using the languages MATLAB and R. A brief discussion of hardware considerations is located in the Appendix. © 2016 Society for Risk Analysis.
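
    The article's examples are in MATLAB and R; an analogous sketch of an embarrassingly parallel simulation in Python is shown below. The per-replication loss model is a made-up placeholder; the point is only the seed-per-replication and pool-map structure.

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        def one_replication(seed):
            """One independent simulation replication: a toy loss model with a
            random number of failures and lognormal per-failure costs."""
            rng = np.random.default_rng(seed)
            n_failures = rng.poisson(3.0)
            return float(rng.lognormal(mean=10.0, sigma=1.0, size=n_failures).sum())

        if __name__ == "__main__":
            seeds = range(10_000)            # one reproducible stream per replication
            with ProcessPoolExecutor() as pool:
                losses = list(pool.map(one_replication, seeds, chunksize=500))
            print("mean:", np.mean(losses), "95th percentile:", np.percentile(losses, 95))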

  13. A Model for Speedup of Parallel Programs

    Science.gov (United States)

    1997-01-01

    Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static…

  14. Parallel Execution of Functional Mock-up Units in Buildings Modeling

    Energy Technology Data Exchange (ETDEWEB)

    Ozmen, Ozgur [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Nutaro, James J. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); New, Joshua Ryan [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2016-06-30

    A Functional Mock-up Interface (FMI) defines a standardized interface to be used in computer simulations to develop complex cyber-physical systems. FMI implementation by a software modeling tool enables the creation of a simulation model that can be interconnected, or the creation of a software library called a Functional Mock-up Unit (FMU). This report describes an FMU wrapper implementation that imports FMUs into a C++ environment and uses an Euler solver that executes FMUs in parallel using Open Multi-Processing (OpenMP). The purpose of this report is to elucidate the runtime performance of the solver when a multi-component system is imported as a single FMU (for the whole system) or as multiple FMUs (for different groups of components as sub-systems). This performance comparison is conducted using two test cases: (1) a simple, multi-tank problem; and (2) a more realistic use case based on the Modelica Buildings Library. In both test cases, the performance gains are promising when each FMU consists of a large number of states and state events that are wrapped in a single FMU. Load balancing is demonstrated to be a critical factor in speeding up parallel execution of multiple FMUs.
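
    As a rough illustration of the execution model in the report, the sketch below steps several independent FMU-like components with a fixed-step Euler update and farms the per-component work out to a thread pool. The TankFMU class is a hypothetical stand-in for an imported FMU, and Python threads only mimic the structure of the OpenMP loop, not its speedup.

        from concurrent.futures import ThreadPoolExecutor

        class TankFMU:
            """Toy stand-in for an imported FMU: one state (level), one input (inflow)."""
            def __init__(self, level, outflow_coeff=0.1):
                self.level = level
                self.k = outflow_coeff

            def euler_step(self, inflow, dt):
                self.level += dt * (inflow - self.k * self.level)
                return self.level

        def step_all(pool, fmus, inflows, dt):
            """Advance every component one step; each call is independent,
            which is the loop the report parallelizes with OpenMP."""
            return list(pool.map(lambda fu: fu[0].euler_step(fu[1], dt), zip(fmus, inflows)))

        if __name__ == "__main__":
            fmus = [TankFMU(level=float(i)) for i in range(1, 5)]
            with ThreadPoolExecutor(max_workers=4) as pool:
                for _ in range(100):
                    levels = step_all(pool, fmus, inflows=[0.5] * len(fmus), dt=0.01)
            print(levels)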

  15. Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

    Science.gov (United States)

    Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

    2011-04-01

    To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single process, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are outputted. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
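
    A minimal mpi4py sketch of steps (1)-(4) above is given below: the master broadcasts the shared inputs, every process interpolates a round-robin subset of output rows, and the master gathers and assembles the grid. Data sizes and the IDW details are illustrative assumptions, not the values used in the paper.

        # illustrative sketch; run e.g. with: mpiexec -n 4 python idw_rows.py
        import numpy as np
        from mpi4py import MPI

        def idw_row(y, xs, pts, vals, power=2.0):
            """Inverse-distance-weighted interpolation of one output row at height y."""
            row = np.empty(len(xs))
            for j, x in enumerate(xs):
                d = np.hypot(pts[:, 0] - x, pts[:, 1] - y)
                if d.min() < 1e-12:                      # exactly on a sample point
                    row[j] = vals[int(d.argmin())]
                else:
                    w = 1.0 / d ** power
                    row[j] = float(np.dot(w, vals) / w.sum())
            return row

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        # (1) master packages the shared information and broadcasts it
        if rank == 0:
            rng = np.random.default_rng(0)
            shared = {"pts": rng.uniform(0, 100, (500, 2)), "vals": rng.random(500),
                      "xs": np.linspace(0, 100, 256), "ys": np.linspace(0, 100, 256)}
        else:
            shared = None
        shared = comm.bcast(shared, root=0)

        # (2) each process computes the rows assigned to it (round robin over rows)
        my_rows = {i: idw_row(shared["ys"][i], shared["xs"], shared["pts"], shared["vals"])
                   for i in range(rank, len(shared["ys"]), size)}

        # (3)-(4) master gathers the partial results and assembles the output grid
        pieces = comm.gather(my_rows, root=0)
        if rank == 0:
            grid = np.empty((len(shared["ys"]), len(shared["xs"])))
            for part in pieces:
                for i, row in part.items():
                    grid[i] = row
            print("assembled grid:", grid.shape)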

  16. A dataflow analysis tool for parallel processing of algorithms

    Science.gov (United States)

    Jones, Robert L., III

    1993-01-01

    A graph-theoretic design process and software tool is presented for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described using a dataflow graph and are intended to be executed repetitively on a set of identical parallel processors. Typical applications include signal processing and control law problems. Graph analysis techniques are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool is shown to facilitate the application of the design process to a given problem.

  17. Partitioning sparse rectangular matrices for parallel processing

    Energy Technology Data Exchange (ETDEWEB)

    Kolda, T.G.

    1998-05-01

    The authors are interested in partitioning sparse rectangular matrices for parallel processing. The partitioning problem has been well-studied in the square symmetric case, but the rectangular problem has received very little attention. They will formalize the rectangular matrix partitioning problem and discuss several methods for solving it. They will extend the spectral partitioning method for symmetric matrices to the rectangular case and compare this method to three new methods -- the alternating partitioning method and two hybrid methods. The hybrid methods will be shown to be best.

  18. Managing internode data communications for an uninitialized process in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Miller, Douglas R; Parker, Jeffrey J; Ratterman, Joseph D; Smith, Brian E

    2014-05-20

    A parallel computer includes nodes, each having main memory and a messaging unit (MU). Each MU includes computer memory, which in turn includes, MU message buffers. Each MU message buffer is associated with an uninitialized process on the compute node. In the parallel computer, managing internode data communications for an uninitialized process includes: receiving, by an MU of a compute node, one or more data communications messages in an MU message buffer associated with an uninitialized process on the compute node; determining, by an application agent, that the MU message buffer associated with the uninitialized process is full prior to initialization of the uninitialized process; establishing, by the application agent, a temporary message buffer for the uninitialized process in main computer memory; and moving, by the application agent, data communications messages from the MU message buffer associated with the uninitialized process to the temporary message buffer in main computer memory.

  19. Parallel processing of genomics data

    Science.gov (United States)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, so the analysis of this enormous flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data, able to handle high-dimensional data with good response times. The proposed system is able to find statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

  20. Performance modeling of parallel algorithms for solving neutron diffusion problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1995-01-01

    Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers
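
    A generic form of such a performance model, written here only to illustrate the kind of expression being fitted (the symbols and the logarithmic latency term are assumptions for illustration, not the paper's actual model), is

        \begin{align}
          T_p &= \frac{T_1}{p}
                 + \underbrace{\alpha \log_2 p + \beta\, m(p)}_{\text{communication / synchronization}}, \\
          S(p) &= \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p},
        \end{align}

    where T_1 is the single-processor time per iteration, alpha a per-message latency, beta the per-word transfer cost, and m(p) the data volume exchanged per iteration. Efficiency stays high only while the communication terms remain small relative to T_1/p, which is how models of this kind can predict rapidly deteriorating efficiency on one architecture and sustained efficiency on another.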

  1. Parallels between a Collaborative Research Process and the Middle Level Philosophy

    Science.gov (United States)

    Dever, Robin; Ross, Diane; Miller, Jennifer; White, Paula; Jones, Karen

    2014-01-01

    The characteristics of the middle level philosophy as described in This We Believe closely parallel the collaborative research process. The journey of one research team is described in relationship to these characteristics. The collaborative process includes strengths such as professional relationships, professional development, courageous…

  2. Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

    Science.gov (United States)

    Qin, Cheng-Zhi; Zhan, Lijun

    2012-06-01

    As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU

  3. The Potsdam Parallel Ice Sheet Model (PISM-PIK) - Part 1: Model description

    Science.gov (United States)

    Winkelmann, R.; Martin, M. A.; Haseloff, M.; Albrecht, T.; Bueler, E.; Khroulev, C.; Levermann, A.

    2011-09-01

    We present the Potsdam Parallel Ice Sheet Model (PISM-PIK), developed at the Potsdam Institute for Climate Impact Research to be used for simulations of large-scale ice sheet-shelf systems. It is derived from the Parallel Ice Sheet Model (Bueler and Brown, 2009). Velocities are calculated by superposition of two shallow stress balance approximations within the entire ice covered region: the shallow ice approximation (SIA) is dominant in grounded regions and accounts for shear deformation parallel to the geoid. The plug-flow type shallow shelf approximation (SSA) dominates the velocity field in ice shelf regions and serves as a basal sliding velocity in grounded regions. Ice streams can be identified diagnostically as regions with a significant contribution of membrane stresses to the local momentum balance. All lateral boundaries in PISM-PIK are free to evolve, including the grounding line and ice fronts. Ice shelf margins in particular are modeled using Neumann boundary conditions for the SSA equations, reflecting a hydrostatic stress imbalance along the vertical calving face. The ice front position is modeled using a subgrid-scale representation of calving front motion (Albrecht et al., 2011) and a physically-motivated calving law based on horizontal spreading rates. The model is tested in experiments from the Marine Ice Sheet Model Intercomparison Project (MISMIP). A dynamic equilibrium simulation of Antarctica under present-day conditions is presented in Martin et al. (2011).

  4. Dynamics modeling for parallel haptic interfaces with force sensing and control.

    Science.gov (United States)

    Bernstein, Nicholas; Lawrence, Dale; Pao, Lucy

    2013-01-01

    Closed-loop force control can be used on haptic interfaces (HIs) to mitigate the effects of mechanism dynamics. A single multidimensional force-torque sensor is often employed to measure the interaction force between the haptic device and the user's hand. The parallel haptic interface at the University of Colorado (CU) instead employs smaller 1D force sensors oriented along each of the five actuating rods to build up a 5D force vector. This paper shows that a particular manipulandum/hand partition in the system dynamics is induced by the placement and type of force sensing, and discusses the implications on force and impedance control for parallel haptic interfaces. The details of a "squaring down" process are also discussed, showing how to obtain reduced degree-of-freedom models from the general six degree-of-freedom dynamics formulation.

  5. Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

    DEFF Research Database (Denmark)

    Srba, Jirí

    2002-01-01

    We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong...

  6. Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

    OpenAIRE

    Givelberg, Edward

    2014-01-01

    We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...

  7. Graphics Processing Unit Enhanced Parallel Document Flocking Clustering

    Energy Technology Data Exchange (ETDEWEB)

    Cui, Xiaohui [ORNL; Potok, Thomas E [ORNL; ST Charles, Jesse Lee [ORNL

    2010-01-01

    Analyzing and clustering documents is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of document clustering is its complexity O(n²). As the number of documents grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last few years, the graphics processing unit (GPU) has received attention for its ability to solve highly-parallel and semi-parallel problems much faster than the traditional sequential processor. In this paper, we have conducted research to exploit this architecture and apply its strengths to the flocking based document clustering problem. Using the CUDA platform from NVIDIA, we developed a document flocking implementation to be run on the NVIDIA GEFORCE GPU. Performance gains ranged from thirty-six to nearly sixty times improvement of the GPU over the CPU implementation.

  8. Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

    Science.gov (United States)

    Mack, Harold; Reddi, S. S.

    1980-04-01

    Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real-time image processing requirements of auto-cueing target data processing. The prototype breadboard currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV format FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations on full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.

  9. Unified Singularity Modeling and Reconfiguration of 3rTPS Metamorphic Parallel Mechanisms with Parallel Constraint Screws

    Directory of Open Access Journals (Sweden)

    Yufeng Zhuang

    2015-01-01

    This paper presents a unified singularity modeling and reconfiguration analysis of variable topologies of a class of metamorphic parallel mechanisms with parallel constraint screws. The new parallel mechanisms consist of three reconfigurable rTPS limbs that have two working phases stemming from the reconfigurable Hooke (rT) joint. While one phase has full mobility, the other supplies a constraint force to the platform. Based on these, the platform constraint screw systems show that the new metamorphic parallel mechanisms have four topologies by altering the limb phases, with mobility change among 1R2T (one rotation with two translations), 2R2T, and 3R2T, and mobility 6. Geometric conditions of the mechanism design are investigated, with some special topologies illustrated considering the limb arrangement. Following this and the actuation scheme analysis, a unified Jacobian matrix is formed using screw theory to include the change between geometric constraints and actuation constraints in the topology reconfiguration. Various singular configurations are identified by analyzing screw dependency in the Jacobian matrix. The work in this paper provides a basis for singularity-free workspace analysis and optimal design of the class of metamorphic parallel mechanisms with parallel constraint screws, which shows simple geometric constraints with potentially simple kinematics and dynamics properties.

  10. Parallel implementation of a Lagrangian-based model on an adaptive mesh in C++: Application to sea-ice

    Science.gov (United States)

    Samaké, Abdoulaye; Rampal, Pierre; Bouillon, Sylvain; Ólason, Einar

    2017-12-01

    We present a parallel implementation framework for a new dynamic/thermodynamic sea-ice model, called neXtSIM, based on the Elasto-Brittle rheology and using an adaptive mesh. The spatial discretisation of the model is done using the finite-element method. The temporal discretisation is semi-implicit and the advection is achieved using either a pure Lagrangian scheme or an Arbitrary Lagrangian Eulerian scheme (ALE). The parallel implementation presented here focuses on the distributed-memory approach using the message-passing library MPI. The efficiency and the scalability of the parallel algorithms are illustrated by the numerical experiments performed using up to 500 processor cores of a cluster computing system. The performance obtained by the proposed parallel implementation of the neXtSIM code is shown to be sufficient to perform simulations for state-of-the-art sea-ice forecasting and geophysical process studies over geographical domains of several million square kilometres, such as the Arctic region.

  11. Effects of visual information regarding allocentric processing in haptic parallelity matching.

    Science.gov (United States)

    Van Mier, Hanneke I

    2013-10-01

    Research has revealed that haptic perception of parallelity deviates from physical reality. Large and systematic deviations have been found in haptic parallelity matching most likely due to the influence of the hand-centered egocentric reference frame. Providing information that increases the influence of allocentric processing has been shown to improve performance on haptic matching. In this study allocentric processing was stimulated by providing informative vision in haptic matching tasks that were performed using hand- and arm-centered reference frames. Twenty blindfolded participants (ten men, ten women) explored the orientation of a reference bar with the non-dominant hand and subsequently matched (task HP) or mirrored (task HM) its orientation on a test bar with the dominant hand. Visual information was provided by means of informative vision with participants having full view of the test bar, while the reference bar was blocked from their view (task VHP). To decrease the egocentric bias of the hands, participants also performed a visual haptic parallelity drawing task (task VHPD) using an arm-centered reference frame, by drawing the orientation of the reference bar. In all tasks, the distance between and orientation of the bars were manipulated. A significant effect of task was found; performance improved from task HP, to VHP to VHPD, and HM. Significant effects of distance were found in the first three tasks, whereas orientation and gender effects were only significant in tasks HP and VHP. The results showed that stimulating allocentric processing by means of informative vision and reducing the egocentric bias by using an arm-centered reference frame led to most accurate performance on parallelity matching. © 2013 Elsevier B.V. All rights reserved.

  12. Parallelization of simulation code for liquid-gas model of lattice-gas fluid

    International Nuclear Information System (INIS)

    Kawai, Wataru; Ebihara, Kenichi; Kume, Etsuo; Watanabe, Tadashi

    2000-03-01

    A simulation code for hydrodynamical phenomena which is based on the liquid-gas model of lattice-gas fluid is parallelized by using the MPI (Message Passing Interface) library. The parallelized code can be applied to larger simulations than the non-parallelized code. The calculation times of the parallelized code on the VPP500 (a vector-parallel supercomputer with dispersed memory units), the AP3000 (a scalar-parallel server with dispersed memory units), and a workstation cluster decreased in inverse proportion to the number of processors. (author)

  13. Fast phase processing in off-axis holography by CUDA including parallel phase unwrapping.

    Science.gov (United States)

    Backoach, Ohad; Kariv, Saar; Girshovitz, Pinhas; Shaked, Natan T

    2016-02-22

    We present a parallel processing implementation for rapid extraction of the quantitative phase maps from off-axis holograms on the Graphics Processing Unit (GPU) of the computer using compute unified device architecture (CUDA) programming. To obtain an efficient implementation, we parallelized both the wrapped phase map extraction algorithm and the two-dimensional phase unwrapping algorithm. In contrast to previous implementations, we utilized an unweighted least squares phase unwrapping algorithm that better suits parallelism. We compared the proposed algorithm run times on the CPU and the GPU of the computer for various sizes of off-axis holograms. Using the GPU implementation, we extracted the unwrapped phase maps from the recorded off-axis holograms at 35 frames per second (fps) for 4 megapixel holograms, and at 129 fps for 1 megapixel holograms, which presents the fastest processing framerates obtained so far, to the best of our knowledge. We then used common-path off-axis interferometric imaging to quantitatively capture the phase maps of a micro-organism with rapid flagellum movements.

  14. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach, in the form of a tree-structured computational model, for parallel application programs. An attempt is made to provide a standard user interface to execute programs on BPPS (BARC Parallel Processing System), a scalable distributed memory multiprocessor. The interface package, called PSHED, provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  15. An application of analyzing the trajectories of two disorders: A parallel piecewise growth model of substance use and attention-deficit/hyperactivity disorder.

    Science.gov (United States)

    Mamey, Mary Rose; Barbosa-Leiker, Celestina; McPherson, Sterling; Burns, G Leonard; Parks, Craig; Roll, John

    2015-12-01

    Researchers often want to examine 2 comorbid conditions simultaneously. One strategy to do so is through the use of parallel latent growth curve modeling (LGCM). This statistical technique allows for the simultaneous evaluation of 2 disorders to determine the explanations and predictors of change over time. Additionally, a piecewise model can help identify whether there are more than 2 growth processes within each disorder (e.g., during a clinical trial). A parallel piecewise LGCM was applied to self-reported attention-deficit/hyperactivity disorder (ADHD) and self-reported substance use symptoms in 303 adolescents enrolled in cognitive-behavioral therapy treatment for a substance use disorder and receiving either oral-methylphenidate or placebo for ADHD across 16 weeks. Assessing these 2 disorders concurrently allowed us to determine whether elevated levels of 1 disorder predicted elevated levels or increased risk of the other disorder. First, a piecewise growth model measured ADHD and substance use separately. Next, a parallel piecewise LGCM was used to estimate the regressions across disorders to determine whether higher scores at baseline of the disorders (i.e., ADHD or substance use disorder) predicted rates of change in the related disorder. Finally, treatment was added to the model to predict change. While the analyses revealed no significant relationships across disorders, this study explains and applies a parallel piecewise growth model to examine the developmental processes of comorbid conditions over the course of a clinical trial. Strengths of piecewise and parallel LGCMs for other addictions researchers interested in examining dual processes over time are discussed. (PsycINFO Database Record (c) 2015 APA, all rights reserved).
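
    In generic LGCM notation (not the authors' exact specification), a parallel process model of this kind ties the two growth processes together through regressions among their growth factors; for ADHD symptoms A and substance-use symptoms S of adolescent i at wave t, one common form is

        \begin{align}
          A_{ti} &= \eta^{(A)}_{0i} + \lambda_t\, \eta^{(A)}_{1i} + \varepsilon^{(A)}_{ti}, &
          S_{ti} &= \eta^{(S)}_{0i} + \lambda_t\, \eta^{(S)}_{1i} + \varepsilon^{(S)}_{ti}, \\
          \eta^{(S)}_{1i} &= \gamma_0 + \gamma_1\, \eta^{(A)}_{0i} + \gamma_2\, \mathrm{Tx}_i + \zeta_i, &
          \eta^{(A)}_{1i} &= \delta_0 + \delta_1\, \eta^{(S)}_{0i} + \delta_2\, \mathrm{Tx}_i + \xi_i,
        \end{align}

    where the eta terms are person-specific intercept and slope factors, lambda_t is the growth basis, and Tx is the treatment indicator. A piecewise version simply replaces the single slope factor of each process with two slope factors, each loading on its own basis vector covering one phase of the trial.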

  16. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

    Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to ordinary Fortran and is implemented as a two-level macro preprocessor to support portability across shared-memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that have been ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples illustrating some parallel programming approaches using the Force are also presented.

  17. Removal of antibiotics in a parallel-plate thin-film-photocatalytic reactor: Process modeling and evolution of transformation by-products and toxicity.

    Science.gov (United States)

    Özkal, Can Burak; Frontistis, Zacharias; Antonopoulou, Maria; Konstantinou, Ioannis; Mantzavinos, Dionissios; Meriç, Süreyya

    2017-10-01

    Photocatalytic degradation of the sulfamethoxazole (SMX) antibiotic has been studied under recycling batch and homogeneous flow conditions in a thin-film-coated immobilized system, namely a parallel-plate (PPL) reactor. The process was experimentally designed and statistically evaluated with a factorial design (FD) approach, with the intent of providing a mathematical model that takes into account the parameters influencing process performance. Initial antibiotic concentration, UV energy level, irradiated surface area, water matrix (ultrapure and secondary treated wastewater) and time were defined as model parameters. A full 2⁵ experimental design consisted of 32 random experiments. PPL reactor test experiments were carried out in order to set boundary levels for the hydraulic, volumetric and defined process parameters. TTIP-based thin films with polyethylene glycol + TiO2 additives were fabricated according to the previously described methodology. Antibiotic degradation was monitored by high-performance liquid chromatography analysis, while the degradation products were identified by LC-TOF-MS analysis. Acute toxicity of untreated and treated SMX solutions was tested by the standard Daphnia magna method. Based on the obtained mathematical model, the response of the immobilized PC system is described with a polynomial equation. The statistically significant positive effects are the initial SMX concentration, process time and the combined effect of both, while the combined effect of water matrix and irradiated surface area has an adverse effect on the rate of antibiotic degradation by photocatalytic oxidation. The process efficiency and the validity of the acquired mathematical model were also verified for the levofloxacin and cefaclor antibiotics. Immobilized PC degradation in the PPL reactor configuration was found capable of providing reduced effluent toxicity by simultaneous degradation of the SMX parent compound and TBPs. Copyright © 2017. Published by Elsevier B.V.

  18. Synchronization Techniques in Parallel Discrete Event Simulation

    OpenAIRE

    Lindén, Jonatan

    2018-01-01

    Discrete event simulation is an important tool for evaluating system models in many fields of science and engineering. To improve the performance of large-scale discrete event simulations, several techniques to parallelize discrete event simulation have been developed. In parallel discrete event simulation, the work of a single discrete event simulation is distributed over multiple processing elements. A key challenge in parallel discrete event simulation is to ensure that causally dependent ...

  19. A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

    Energy Technology Data Exchange (ETDEWEB)

    Ma, Kwan-Liu [Univ. of California, Davis, CA (United States)

    2017-02-01

    Most of today’s visualization libraries and applications are based on what is known today as the visualization pipeline. In the visualization pipeline model, algorithms are encapsulated as “filtering” components with inputs and outputs. These components can be combined by connecting the outputs of one filter to the inputs of another filter. The visualization pipeline model is popular because it provides a convenient abstraction that allows users to combine algorithms in powerful ways. Unfortunately, the visualization pipeline cannot run effectively on exascale computers. Experts agree that the exascale machine will comprise processors that contain many cores. Furthermore, physical limitations will prevent data movement in and out of the chip (that is, between main memory and the processing cores) from keeping pace with improvements in overall compute performance. To use these processors to their fullest capability, it is essential to carefully consider memory access. This is where the visualization pipeline fails. Each filtering component in the visualization library is expected to take a data set in its entirety, perform some computation across all of the elements, and output the complete results. The process of iterating over all elements must be repeated in each filter, which is one of the worst possible ways to traverse memory when trying to maximize the number of executions per memory access. This project investigates a new type of visualization framework that exhibits the pervasive parallelism necessary to run on exascale machines. Our framework achieves this by defining algorithms in terms of functors, which are localized, stateless operations. Functors can be composited in much the same way as filters in the visualization pipeline, but their design allows them to run concurrently on massive numbers of lightweight threads. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for
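
    A toy illustration of the memory-traversal argument (not the project's actual framework): a pipeline of filters streams the whole array through memory once per stage, whereas composing stateless per-element functors lets all work happen in a single pass while each chunk is in cache.

    ```python
    import numpy as np

    data = np.random.rand(1_000_000)

    # Pipeline style: each "filter" traverses the whole array, so memory is
    # streamed through the cache once per stage.
    def pipeline(values):
        a = np.sqrt(values)      # pass 1
        b = a * 2.0 + 1.0        # pass 2
        return np.sin(b)         # pass 3

    # Functor style: stateless per-element work composed into one operation,
    # applied chunk by chunk so each element is touched in a single pass.
    def fused_functor(x):
        return np.sin(np.sqrt(x) * 2.0 + 1.0)

    def functor_map(values, functor, chunk=1 << 16):
        out = np.empty_like(values)
        for i in range(0, values.size, chunk):   # single streaming pass
            out[i:i + chunk] = functor(values[i:i + chunk])
        return out

    result = functor_map(data, fused_functor)
    ```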

  20. Modeling and Control of Primary Parallel Isolated Boost Converter

    DEFF Research Database (Denmark)

    Mira Albert, Maria del Carmen; Hernandez Botella, Juan Carlos; Sen, Gökhan

    2012-01-01

    In this paper state space modeling and closed loop controlled operation have been presented for primary parallel isolated boost converter (PPIBC) topology as a battery charging unit. Parasitic resistances have been included to have an accurate dynamic model. The accuracy of the model has been...

  1. Tolerating correlated failures in Massively Parallel Stream Processing Engines

    DEFF Research Database (Denmark)

    Su, L.; Zhou, Y.

    2016-01-01

    Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task's runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE...

  2. Using Motivational Interviewing Techniques to Address Parallel Process in Supervision

    Science.gov (United States)

    Giordano, Amanda; Clarke, Philip; Borders, L. DiAnne

    2013-01-01

    Supervision offers a distinct opportunity to experience the interconnection of counselor-client and counselor-supervisor interactions. One product of this network of interactions is parallel process, a phenomenon by which counselors unconsciously identify with their clients and subsequently present to their supervisors in a similar fashion…

  3. A novel conceptual design of parallel nitrogen expansion liquefaction process for small-scale LNG (liquefied natural gas) plant in skid-mount packages

    International Nuclear Information System (INIS)

    He, Tianbiao; Ju, Yonglin

    2014-01-01

    The utilization of unconventional natural gas is still a great challenge for China due to its distribution locations and small reserves. Thus, liquefying unconventional natural gas using a small-scale LNG plant in skid-mount packages is a good choice with great economic benefits. A novel conceptual design of a parallel nitrogen expansion liquefaction process for a small-scale plant in skid-mount packages has been proposed. First, a process configuration is designed. Then, a thermodynamic analysis of the process is conducted. Next, an optimization model with a genetic algorithm method is developed to optimize the process. Finally, the flexibility of the process is tested with two different feed gases. In conclusion, the proposed parallel nitrogen expansion liquefaction process can be used in small-scale LNG plants in skid-mount packages with high exergy efficiency and great economic benefits. - Highlights: • A novel design of a parallel nitrogen expansion liquefaction process is proposed. • A genetic algorithm is applied to optimize the novel process. • The unit energy consumption of the optimized process is 0.5163 kWh/Nm3. • The exergy efficiency of the optimized case is 0.3683. • The novel process has good flexibility for different feed gas conditions
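
    As a rough illustration of the optimization step only (not the paper's process model), a simple genetic algorithm minimizing a placeholder objective, standing in for unit energy consumption as a function of process variables such as expander pressures and nitrogen flow, could look like this:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def energy_objective(x):
        # Placeholder for unit energy consumption (kWh/Nm3) versus normalized
        # process variables; purely hypothetical smooth function.
        return float(np.sum((x - 0.3) ** 2) + 0.5)

    def genetic_minimize(obj, bounds, pop_size=40, generations=100, mut_sigma=0.05):
        dim = len(bounds)
        lo, hi = np.array(bounds, dtype=float).T
        pop = rng.uniform(lo, hi, size=(pop_size, dim))
        best_x, best_f = None, np.inf
        for _ in range(generations):
            fitness = np.array([obj(ind) for ind in pop])
            if fitness.min() < best_f:
                best_f = float(fitness.min())
                best_x = pop[fitness.argmin()].copy()
            # Tournament selection: each slot gets the fitter of two random parents.
            idx = rng.integers(0, pop_size, size=(pop_size, 2))
            winners = np.where(fitness[idx[:, 0]] < fitness[idx[:, 1]], idx[:, 0], idx[:, 1])
            parents = pop[winners]
            # Arithmetic crossover with a shuffled mate, then Gaussian mutation.
            mates = parents[rng.permutation(pop_size)]
            alpha = rng.random((pop_size, 1))
            pop = np.clip(alpha * parents + (1 - alpha) * mates
                          + rng.normal(0.0, mut_sigma, (pop_size, dim)), lo, hi)
            pop[0] = best_x   # elitism: keep the best solution found so far
        return best_x, best_f

    best_x, best_f = genetic_minimize(energy_objective, bounds=[(0.1, 1.0)] * 4)
    ```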

  4. Combining self-affirmation with the extended parallel process model: the consequences for motivation to eat more fruit and vegetables.

    Science.gov (United States)

    Napper, Lucy E; Harris, Peter R; Klein, William M P

    2014-01-01

    There is potential for fruitful integration of research using the Extended Parallel Process Model (EPPM) with research using Self-affirmation Theory. However, to date no studies have attempted to do this. This article reports an experiment that tests whether (a) the effects of a self-affirmation manipulation add to those of EPPM variables in predicting intentions to improve a health behavior and (b) self-affirmation moderates the relationship between EPPM variables and intentions. Participants (N = 80) were randomized to either a self-affirmation or control condition prior to receiving personally relevant health information about the risks of not eating at least five portions of fruit and vegetables per day. A hierarchical regression model revealed that efficacy, threat × efficacy, self-affirmation, and self-affirmation × efficacy all uniquely contributed to the prediction of intentions to eat at least five portions per day. Self-affirmed participants and those with higher efficacy reported greater motivation to change. Threat predicted intentions at low levels of efficacy, but not at high levels. Efficacy had a stronger relationship with intentions in the nonaffirmed condition than in the self-affirmed condition. The findings indicate that self-affirmation processes can moderate the impact of variables in the EPPM and also add to the variance explained. We argue that there is potential for integration of the two traditions of research, to the benefit of both.

  5. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.

    2017-03-13

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, the RBM is a deep learning algorithm used in this project, whose time performance is improved through an efficient parallel implementation with the OpenACC tool and the best possible optimizations on the RBM to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing in which directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. The RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of the RBM are available using different models such as OpenMP and CUDA, but this project is the first attempt to apply the OpenACC model to the RBM.
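
    The record above describes RBM training with Contrastive Divergence and its parallelization. As a reference point (not the project's OpenACC code), the following is a minimal NumPy sketch of a single CD-1 update for a binary RBM; array shapes and the learning rate are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def cd1_update(W, b_vis, b_hid, v0, lr=0.01):
        """One Contrastive Divergence (CD-1) step for a binary RBM.

        W: (n_visible, n_hidden) weights; v0: (batch, n_visible) binary data.
        """
        # Positive phase: hidden probabilities and samples given the data.
        p_h0 = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: reconstruct visibles, then recompute hidden probabilities.
        p_v1 = sigmoid(h0 @ W.T + b_vis)
        v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
        p_h1 = sigmoid(v1 @ W + b_hid)
        # Gradient approximation: <v h>_data - <v h>_model.
        batch = v0.shape[0]
        W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / batch
        b_vis += lr * (v0 - v1).mean(axis=0)
        b_hid += lr * (p_h0 - p_h1).mean(axis=0)
        return W, b_vis, b_hid
    ```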

  6. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.; Khan, Ayaz H.

    2017-01-01

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, the RBM is a deep learning algorithm used in this project, whose time performance is improved through an efficient parallel implementation with the OpenACC tool and the best possible optimizations on the RBM to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing in which directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. The RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of the RBM are available using different models such as OpenMP and CUDA, but this project is the first attempt to apply the OpenACC model to the RBM.

  7. Availability modeling and optimization of dynamic multi-state series–parallel systems with random reconfiguration

    International Nuclear Information System (INIS)

    Li, Y.F.; Peng, R.

    2014-01-01

    Most studies on multi-state series–parallel systems focus on a static type of system architecture. However, this is insufficient to model many complex industrial systems that have several operation phases, each of which requires a subset of the subsystems combined together to perform certain tasks. To bridge this gap, this study takes into account this type of dynamic behavior in the multi-state series–parallel system and proposes an analytical approach to calculate the system availability and the operation cost. In this approach, a Markov process is used to model the dynamics of system phase changing and component state changing, a Markov reward model is used to calculate the operation cost associated with the dynamics, and the universal generating function (UGF) is used to build the system availability function from the system phase model and the component models. Based upon these models, an optimization problem is formulated to minimize the total system cost with the constraint that system availability is greater than a desired level. A genetic algorithm is then applied to solve the optimization problem. The proposed modeling and solution procedures are illustrated on a system design problem modified from a real-world maritime oil transportation system.
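
    The abstract combines a Markov model of phase and state dynamics with a Markov reward model for operation cost. A minimal sketch of that building block (hypothetical generator matrix and cost rates, not the paper's maritime case study) might look like this:

    ```python
    import numpy as np

    # Generator matrix Q for a 3-state component (hypothetical rates):
    # state 0 = full capacity, 1 = degraded, 2 = failed.
    Q = np.array([[-0.02,  0.015, 0.005],
                  [ 0.30, -0.31,  0.01 ],
                  [ 0.50,  0.00, -0.50 ]])

    reward = np.array([1.0, 2.5, 4.0])   # operating cost rate per state (hypothetical)

    # Steady-state distribution pi solves pi Q = 0 with sum(pi) = 1.
    A = np.vstack([Q.T, np.ones(len(Q))])
    b = np.append(np.zeros(len(Q)), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)

    availability = pi[0] + pi[1]          # probability of the states with non-zero capacity
    expected_cost_rate = pi @ reward      # Markov reward model: mean cost per unit time
    ```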

  8. Hybrid parallel execution model for logic-based specification languages

    CERN Document Server

    Tsai, Jeffrey J P

    2001-01-01

    Parallel processing is a very important technique for improving the performance of various software development and maintenance activities. The purpose of this book is to introduce important techniques for parallel execution of high-level specifications of software systems. These techniques are very useful for the construction, analysis, and transformation of reliable large-scale and complex software systems. Contents: Current Approaches; Overview of the New Approach; FRORL Requirements Specification Language and Its Decomposition; Rewriting and Data Dependency, Control Flow Analysis of a Lo

  9. Facilitating arrhythmia simulation: the method of quantitative cellular automata modeling and parallel running

    Directory of Open Access Journals (Sweden)

    Mondry Adrian

    2004-08-01

    Full Text Available Abstract Background Many arrhythmias are triggered by abnormal electrical activity at the ionic channel and cell level, and then evolve spatio-temporally within the heart. To understand arrhythmias better and to diagnose them more precisely by their ECG waveforms, a whole-heart model is required to explore the association between the massively parallel activities at the channel/cell level and the integrative electrophysiological phenomena at organ level. Methods We have developed a method to build large-scale electrophysiological models by using extended cellular automata, and to run such models on a cluster of shared memory machines. We describe here the method, including the extension of a language-based cellular automaton to implement quantitative computing, the building of a whole-heart model with Visible Human Project data, the parallelization of the model on a cluster of shared memory computers with OpenMP and MPI hybrid programming, and a simulation algorithm that links cellular activity with the ECG. Results We demonstrate that electrical activities at channel, cell, and organ levels can be traced and captured conveniently in our extended cellular automaton system. Examples of some ECG waveforms simulated with a 2-D slice are given to support the ECG simulation algorithm. A performance evaluation of the 3-D model on a four-node cluster is also given. Conclusions Quantitative multicellular modeling with extended cellular automata is a highly efficient and widely applicable method to weave experimental data at different levels into computational models. This process can be used to investigate complex and collective biological activities that can be described neither by their governing differential equations nor by discrete parallel computation. Transparent cluster computing is a convenient and effective method to make time-consuming simulation feasible. Arrhythmias, as a typical case, can be effectively simulated with the methods
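
    The whole-heart model itself relies on Visible Human data and a hybrid OpenMP/MPI cluster implementation, which is beyond a short example. The sketch below only illustrates the underlying idea of an excitable-medium cellular automaton: a vectorized Greenberg-Hastings-style update in which simple local rules produce propagating excitation waves (all parameters are hypothetical).

    ```python
    import numpy as np

    def step(state, n_states=8, threshold=1):
        """One update of a Greenberg-Hastings excitable-medium CA.

        state == 0: resting, state == 1: excited, 2..n_states-1: refractory.
        """
        excited = (state == 1)
        # Count excited von Neumann neighbours (non-periodic boundary via zero padding).
        padded = np.pad(excited, 1).astype(int)
        neighbours = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                      padded[1:-1, :-2] + padded[1:-1, 2:])
        new = state.copy()
        new[(state == 0) & (neighbours >= threshold)] = 1        # resting -> excited
        refractory = (state >= 1)
        new[refractory] = (state[refractory] + 1) % n_states     # excited/refractory -> recover
        return new

    grid = np.zeros((200, 200), dtype=int)
    grid[100, 100] = 1                      # initial stimulus
    for _ in range(50):
        grid = step(grid)
    ```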

  10. Partial Overhaul and Initial Parallel Optimization of KINETICS, a Coupled Dynamics and Chemistry Atmosphere Model

    Science.gov (United States)

    Nguyen, Howard; Willacy, Karen; Allen, Mark

    2012-01-01

    KINETICS is a coupled dynamics and chemistry atmosphere model that is data intensive and computationally demanding. The potential performance gain from using a supercomputer motivates the adaptation from a serial version to a parallelized one. Although the initial parallelization had been done, bottlenecks caused by an abundance of communication calls between processors led to an unfavorable drop in performance. Before starting on the parallel optimization process, a partial overhaul was required because a large emphasis was placed on streamlining the code for user convenience and revising the program to accommodate the new supercomputers at Caltech and JPL. After the first round of optimizations, the partial runtime was reduced by a factor of 23; however, performance gains are dependent on the size of the data, the number of processors requested, and the computer used.

  11. Design and simulation of parallel and distributed architectures for images processing

    International Nuclear Information System (INIS)

    Pirson, Alain

    1990-01-01

    The exploitation of visual information requires special computers. The diversity of operations and the computing power involved bring about structures founded on the concepts of concurrency and distributed processing. This work identifies a vision computer with an association of dedicated intelligent entities exchanging messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, tasks being allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision-dedicated co-processors and powerful communication devices. (author) [fr]

  12. The Parallel System for Integrating Impact Models and Sectors (pSIMS)

    Science.gov (United States)

    Elliott, Joshua; Kelly, David; Chryssanthacopoulos, James; Glotter, Michael; Jhunjhnuwala, Kanika; Best, Neil; Wilde, Michael; Foster, Ian

    2014-01-01

    We present a framework for massively parallel climate impact simulations: the parallel System for Integrating Impact Models and Sectors (pSIMS). This framework comprises a) tools for ingesting and converting large amounts of data to a versatile datatype based on a common geospatial grid; b) tools for translating this datatype into custom formats for site-based models; c) a scalable parallel framework for performing large ensemble simulations, using any one of a number of different impacts models, on clusters, supercomputers, distributed grids, or clouds; d) tools and data standards for reformatting outputs to common datatypes for analysis and visualization; and e) methodologies for aggregating these datatypes to arbitrary spatial scales such as administrative and environmental demarcations. By automating many time-consuming and error-prone aspects of large-scale climate impacts studies, pSIMS accelerates computational research, encourages model intercomparison, and enhances reproducibility of simulation results. We present the pSIMS design and use example assessments to demonstrate its multi-model, multi-scale, and multi-sector versatility.
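
    pSIMS itself orchestrates ensembles with dedicated grid and cloud tooling; the core pattern it automates, farming out many independent site-level model runs to a pool of workers, can be sketched as follows (the model function and scenario names are placeholders):

    ```python
    from concurrent.futures import ProcessPoolExecutor
    import itertools

    def run_impact_model(site, climate_scenario):
        # Placeholder for one site-based model run (hypothetical); in a framework
        # like pSIMS this would invoke a crop or hydrology model on one grid cell.
        return {"site": site, "scenario": climate_scenario,
                "yield": (site * 7 + len(climate_scenario)) % 100}

    sites = range(1000)
    scenarios = ["rcp4p5", "rcp8p5"]

    if __name__ == "__main__":
        tasks = list(itertools.product(sites, scenarios))
        with ProcessPoolExecutor(max_workers=8) as pool:
            results = list(pool.map(run_impact_model, *zip(*tasks), chunksize=64))
    ```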

  13. A high-speed linear algebra library with automatic parallelism

    Science.gov (United States)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  14. A Programming Model for Massive Data Parallelism with Data Dependencies

    International Nuclear Information System (INIS)

    Cui, Xiaohui; Mueller, Frank; Potok, Thomas E.; Zhang, Yongpeng

    2009-01-01

    Accelerating processors can often be more cost- and energy-effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processing units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general-purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach: we run massively data-parallel applications on GPU clusters and propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from microbenchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.
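
    A central point of the abstract is that data larger than GPU memory forces either frequent host-device transfers or a decomposition across devices that respects data dependencies. The sketch below illustrates the decomposition idea on the CPU with NumPy: an array is processed block by block, with one-element ghost (halo) copies so a simple 3-point dependency at block edges is satisfied locally (block count and stencil are illustrative assumptions).

    ```python
    import numpy as np

    def blocked_stencil(data, n_parts=4):
        """Apply a 3-point stencil block by block, with one ghost cell per side,
        mimicking how an array too large for one device is split across devices
        while the dependency at block edges is satisfied by the halo copies."""
        padded = np.pad(data, 1, mode="edge")
        out = np.empty_like(data)
        bounds = np.linspace(0, data.size, n_parts + 1, dtype=int)
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            block = padded[lo:hi + 2]                 # block plus one ghost cell per side
            out[lo:hi] = 0.5 * block[1:-1] + 0.25 * (block[:-2] + block[2:])
        return out

    data = np.arange(1_000_000, dtype=float)   # stands in for data larger than one GPU
    result = blocked_stencil(data)
    ```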

  15. Development of Parallel Code for the Alaska Tsunami Forecast Model

    Science.gov (United States)

    Bahng, B.; Knight, W. R.; Whitmore, P.

    2014-12-01

    The Alaska Tsunami Forecast Model (ATFM) is a numerical model used to forecast propagation and inundation of tsunamis generated by earthquakes and other means in both the Pacific and Atlantic Oceans. At the U.S. National Tsunami Warning Center (NTWC), the model is mainly used in a pre-computed fashion. That is, results for hundreds of hypothetical events are computed before alerts, and are accessed and calibrated with observations during tsunamis to immediately produce forecasts. ATFM uses the non-linear, depth-averaged, shallow-water equations of motion with multiply nested grids in two-way communications between domains of each parent-child pair as waves get closer to coastal waters. Even with the pre-computation the task becomes non-trivial as sub-grid resolution gets finer. Currently, the finest resolution Digital Elevation Models (DEM) used by ATFM are 1/3 arc-seconds. With a serial code, large or multiple areas of very high resolution can produce run-times that are unrealistic even in a pre-computed approach. One way to increase the model performance is code parallelization used in conjunction with a multi-processor computing environment. NTWC developers have undertaken an ATFM code-parallelization effort to streamline the creation of the pre-computed database of results with the long term aim of tsunami forecasts from source to high resolution shoreline grids in real time. Parallelization will also permit timely regeneration of the forecast model database with new DEMs; and, will make possible future inclusion of new physics such as the non-hydrostatic treatment of tsunami propagation. The purpose of our presentation is to elaborate on the parallelization approach and to show the compute speed increase on various multi-processor systems.

  16. Using Unified Modelling Language (UML) as a process-modelling technique for clinical-research process improvement.

    Science.gov (United States)

    Kumarapeli, P; De Lusignan, S; Ellis, T; Jones, B

    2007-03-01

    The Primary Care Data Quality programme (PCDQ) is a quality-improvement programme which processes routinely collected general practice computer data. Patient data collected from a wide range of different brands of clinical computer systems are aggregated, processed, and fed back to practices in an educational context to improve the quality of care. Process modelling is a well-established approach used to gain understanding of, systematically appraise, and identify areas of improvement in a business process. Unified modelling language (UML) is a general-purpose modelling technique used for this purpose. We used UML to appraise the PCDQ process to see if the efficiency and predictability of the process could be improved. Activity analysis and thinking-aloud sessions were used to collect data to generate UML diagrams. The UML model highlighted the sequential nature of the current process as a barrier to efficiency gains. It also identified the uneven distribution of process controls, lack of symmetric communication channels, critical dependencies among processing stages, and failure to implement all the lessons learned in the piloting phase. It also suggested that improved structured reporting at each stage (especially from the pilot phase), parallel processing of data, and correctly positioned process controls should improve the efficiency and predictability of research projects. Process modelling provided a rational basis for the critical appraisal of a clinical data processing system; its potential may be underutilized within health care.

  17. Parallelizing an electron transport Monte Carlo simulator (MOCASIN 2.0)

    International Nuclear Information System (INIS)

    Schwetman, H.; Burdick, S.

    1988-01-01

    Electron transport simulators are tools for studying electrical properties of semiconducting materials and devices. As demands for modeling more complex devices and new materials have emerged, so have demands for more processing power. This paper documents a project to convert an electron transport simulator (MOCASIN 2.0) to a parallel processing environment. In addition to describing the conversion, the paper presents PPL, a parallel programming version of C running on a Sequent multiprocessor system. In timing tests, models that simulated the movement of 2,000 particles for 100 time steps were executed on ten processors, with a parallel efficiency of over 97%

  18. Strong Bisimilarity and Regularity of Basic Parallel Processes is PSPACE-Hard

    DEFF Research Database (Denmark)

    Srba, Jirí

    2002-01-01

    We show that the problem of checking whether two processes definable in the syntax of Basic Parallel Processes (BPP) are strongly bisimilar is PSPACE-hard. We also demonstrate that there is a polynomial time reduction from the strong bisimilarity checking problem of regular BPP to the strong regularity (finiteness) checking of BPP. This implies that strong regularity of BPP is also PSPACE-hard.

  19. Processing communications events in parallel active messaging interface by awakening thread from wait state

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-10-22

    Processing data communications events in a parallel active messaging interface (`PAMI`) of a parallel computer that includes compute nodes that execute a parallel application, with the PAMI including data communications endpoints, and the endpoints are coupled for data communications through the PAMI and through other data communications resources, including determining by an advance function that there are no actionable data communications events pending for its context, placing by the advance function its thread of execution into a wait state, waiting for a subsequent data communications event for the context; responsive to occurrence of a subsequent data communications event for the context, awakening by the thread from the wait state; and processing by the advance function the subsequent data communications event now pending for the context.
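
    The patent abstract describes an advance function that puts its thread into a wait state when no events are pending for its context and is awakened when a new event arrives. A generic sketch of that wait/awaken pattern (plain Python threading, not IBM's PAMI implementation) is:

    ```python
    import threading
    from collections import deque

    class Context:
        def __init__(self):
            self._events = deque()
            self._cond = threading.Condition()

        def post_event(self, event):
            # Called when a data communications event arrives for this context.
            with self._cond:
                self._events.append(event)
                self._cond.notify()          # awaken the waiting advance thread

        def advance(self):
            # Advance function: sleep in a wait state while nothing is actionable,
            # then process the event that awakened us.
            with self._cond:
                while not self._events:
                    self._cond.wait()
                event = self._events.popleft()
            print("processing", event)

    ctx = Context()
    worker = threading.Thread(target=ctx.advance)
    worker.start()
    ctx.post_event("recv-complete")
    worker.join()
    ```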

  20. Parallel computing in enterprise modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  1. The Potsdam Parallel Ice Sheet Model (PISM-PIK – Part 1: Model description

    Directory of Open Access Journals (Sweden)

    R. Winkelmann

    2011-09-01

    Full Text Available We present the Potsdam Parallel Ice Sheet Model (PISM-PIK), developed at the Potsdam Institute for Climate Impact Research to be used for simulations of large-scale ice sheet-shelf systems. It is derived from the Parallel Ice Sheet Model (Bueler and Brown, 2009). Velocities are calculated by superposition of two shallow stress balance approximations within the entire ice-covered region: the shallow ice approximation (SIA) is dominant in grounded regions and accounts for shear deformation parallel to the geoid. The plug-flow type shallow shelf approximation (SSA) dominates the velocity field in ice shelf regions and serves as a basal sliding velocity in grounded regions. Ice streams can be identified diagnostically as regions with a significant contribution of membrane stresses to the local momentum balance. All lateral boundaries in PISM-PIK are free to evolve, including the grounding line and ice fronts. Ice shelf margins in particular are modeled using Neumann boundary conditions for the SSA equations, reflecting a hydrostatic stress imbalance along the vertical calving face. The ice front position is modeled using a subgrid-scale representation of calving front motion (Albrecht et al., 2011) and a physically-motivated calving law based on horizontal spreading rates. The model is tested in experiments from the Marine Ice Sheet Model Intercomparison Project (MISMIP). A dynamic equilibrium simulation of Antarctica under present-day conditions is presented in Martin et al. (2011).

  2. Assessment of Substance Abuse Behaviors in Adolescents: Integration of Self-Control into Extended Parallel Process Model

    Directory of Open Access Journals (Sweden)

    K Witte

    2005-04-01

    Full Text Available Introduction: An effective preventive health education program on drug abuse can be delivered by applying behavior change theories in a complementary fashion. Methods: The aim of this study was to assess the effectiveness of integrating self-control into the Extended Parallel Process Model for drug abuse behaviors. A sample of 189 governmental high school students participated in this survey. Information was collected individually by completing a researcher-designed questionnaire and a urinary rapid immuno-chromatography test for opium and marijuana. Results: The results of the study show that 6.9% of students used drugs (especially opium and marijuana) and that peer pressure was a determinant factor for using drugs. Moreover, the EPPM theoretical variables of perceived severity and perceived self-efficacy, together with self-control, are predictive factors of behavioral intention against substance abuse. In this manner, self-control had a significant effect on protective motivation and perceived efficacy. Low self-control was a predictive factor of drug abuse, and students with low self-control had drug abuse experience. Conclusion: The results of this study suggest that integrating self-control into the EPPM can be effective in designing primary preventive programs against drug abuse and in assessing abuse and deviance behaviors among adolescent populations, especially risk seekers.

  3. Practical enhancement factor model based on GM for multiple parallel reactions: Piperazine (PZ) CO2 capture

    DEFF Research Database (Denmark)

    Gaspar, Jozsef; Fosbøl, Philip Loldrup

    2017-01-01

    Reactive absorption is a key process for gas separation and purification, and it is the main technology for CO2 capture. Thus, reliable and simple mathematical models for mass transfer rate calculation are essential. Models which apply to parallel interacting and non-interacting reactions, for all ..., desorption and pinch conditions. In this work, we apply the GM model to multiple parallel reactions. We deduce the model for piperazine (PZ) CO2 capture and we validate it against wetted-wall column measurements using 2, 5 and 8 molal PZ for temperatures between 40 °C and 100 °C and CO2 loadings between 0.23 and 0.41 mol CO2/2 mol PZ. We show that overall second-order kinetics describes well the reaction between CO2 and PZ, accounting for the carbamate and bicarbamate reactions. Here we prove the GM model for piperazine and MEA, but we expect that this practical approach is applicable to various amines ...

  4. Depth-Averaged Non-Hydrostatic Hydrodynamic Model Using a New Multithreading Parallel Computing Method

    Directory of Open Access Journals (Sweden)

    Ling Kang

    2017-03-01

    Full Text Available Compared to the hydrostatic hydrodynamic model, the non-hydrostatic hydrodynamic model can accurately simulate flows that feature vertical accelerations. The model’s low computational efficiency severely restricts its wider application. This paper proposes a non-hydrostatic hydrodynamic model based on a multithreading parallel computing method. The horizontal momentum equation is obtained by integrating the Navier–Stokes equations from the bottom to the free surface. The vertical momentum equation is approximated by the Keller-box scheme. A two-step method is used to solve the model equations. A parallel strategy based on block decomposition computation is utilized. The original computational domain is subdivided into two subdomains that are physically connected via a virtual boundary technique. Two sub-threads are created and tasked with the computation of the two subdomains. The producer–consumer model and the thread lock technique are used to achieve synchronous communication between sub-threads. The validity of the model was verified by solitary wave propagation experiments over a flat bottom and a slope, followed by two sinusoidal wave propagation experiments over a submerged breakwater. The parallel computing method proposed here was found to effectively enhance computational efficiency and save 20%–40% computation time compared to serial computing. The parallel acceleration rate and acceleration efficiency are approximately 1.45 and 72%, respectively. The parallel computing method makes a contribution to the popularization of non-hydrostatic models.
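
    The paper's parallel strategy splits the domain into two subdomains handled by two sub-threads that synchronize across a virtual boundary. A minimal sketch of that structure, with a toy 1-D diffusion update and a barrier standing in for the paper's producer-consumer synchronization, is:

    ```python
    import threading
    import numpy as np

    N_STEPS = 100
    field = np.zeros(1002)            # global 1-D field with one ghost cell per end
    field[500] = 1.0
    mid = field.size // 2
    barrier = threading.Barrier(2)

    def run_subdomain(lo, hi):
        # Each thread advances its half [lo, hi); the cells at lo-1 and hi belong
        # to the neighbour/ghost region (the "virtual boundary").
        for _ in range(N_STEPS):
            new = 0.5 * field[lo:hi] + 0.25 * (field[lo - 1:hi - 1] + field[lo + 1:hi + 1])
            barrier.wait()            # both threads have read the old values
            field[lo:hi] = new
            barrier.wait()            # both threads have written before the next read

    t1 = threading.Thread(target=run_subdomain, args=(1, mid))
    t2 = threading.Thread(target=run_subdomain, args=(mid, field.size - 1))
    t1.start(); t2.start(); t1.join(); t2.join()
    ```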

  5. Boltzmann machines as a model for parallel annealing

    NARCIS (Netherlands)

    Aarts, E.H.L.; Korst, J.H.M.

    1991-01-01

    The potential of Boltzmann machines to cope with difficult combinatorial optimization problems is investigated. A discussion of various (parallel) models of Boltzmann machines is given based on the theory of Markov chains. A general strategy is presented for solving (approximately) combinatorial

  6. Mobile Devices and GPU Parallelism in Ionospheric Data Processing

    Science.gov (United States)

    Mascharka, D.; Pankratius, V.

    2015-12-01

    Scientific data acquisition in the field is often constrained by data transfer backchannels to analysis environments. Geoscientists are therefore facing practical bottlenecks with increasing sensor density and variety. Mobile devices, such as smartphones and tablets, offer promising solutions to key problems in scientific data acquisition, pre-processing, and validation by providing advanced capabilities in the field. This is due to affordable network connectivity options and the increasing mobile computational power. This contribution exemplifies a scenario faced by scientists in the field and presents the "Mahali TEC Processing App" developed in the context of the NSF-funded Mahali project. Aimed at atmospheric science and the study of ionospheric Total Electron Content (TEC), this app is able to gather data from various dual-frequency GPS receivers. It demonstrates parsing of full-day RINEX files on mobile devices and on-the-fly computation of vertical TEC values based on satellite ephemeris models that are obtained from NASA. Our experiments show how parallel computing on the mobile device GPU enables fast processing and visualization of up to 2 million datapoints in real-time using OpenGL. GPS receiver bias is estimated through minimum TEC approximations that can be interactively adjusted by scientists in the graphical user interface. Scientists can also perform approximate computations for "quickviews" to reduce CPU processing time and memory consumption. In the final stage of our mobile processing pipeline, scientists can upload data to the cloud for further processing. Acknowledgements: The Mahali project (http://mahali.mit.edu) is funded by the NSF INSPIRE grant no. AGS-1343967 (PI: V. Pankratius). We would like to acknowledge our collaborators at Boston College, Virginia Tech, Johns Hopkins University, Colorado State University, as well as the support of UNAVCO for loans of dual-frequency GPS receivers for use in this project, and Intel for loans of

  7. Performance of Air Pollution Models on Massively Parallel Computers

    DEFF Research Database (Denmark)

    Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

    1996-01-01

    To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on the computers. Using a realistic large-scale model, we gain detailed insight about the performance of the three computers when used to solve large-scale scientific problems...

  8. Improved Path Loss Simulation Incorporating Three-Dimensional Terrain Model Using Parallel Coprocessors

    Directory of Open Access Journals (Sweden)

    Zhang Bin Loo

    2017-01-01

    Full Text Available Current network simulators abstract out wireless propagation models due to the high computation requirements for realistic modeling. As such, there is still a large gap between the results obtained from simulators and real world scenario. In this paper, we present a framework for improved path loss simulation built on top of an existing network simulation software, NS-3. Different from the conventional disk model, the proposed simulation also considers the diffraction loss computed using Epstein and Peterson’s model through the use of actual terrain elevation data to give an accurate estimate of path loss between a transmitter and a receiver. The drawback of high computation requirements is relaxed by offloading the computationally intensive components onto an inexpensive off-the-shelf parallel coprocessor, which is a NVIDIA GPU. Experiments are performed using actual terrain elevation data provided from United States Geological Survey. As compared to the conventional CPU architecture, the experimental result shows that a speedup of 20x to 42x is achieved by exploiting the parallel processing of GPU to compute the path loss between two nodes using terrain elevation data. The result shows that the path losses between two nodes are greatly affected by the terrain profile between these two nodes. Besides this, the result also suggests that the common strategy to place the transmitter in the highest position may not always work.

  9. Efficient Out of Core Sorting Algorithms for the Parallel Disks Model.

    Science.gov (United States)

    Kundeti, Vamsi; Rajasekaran, Sanguthevar

    2011-11-01

    In this paper we present efficient algorithms for sorting on the Parallel Disks Model (PDM). Numerous asymptotically optimal algorithms have been proposed in the literature. However, many of these merge-based algorithms have large underlying constants in the time bounds, because they suffer from the lack of read parallelism on PDM. The irregular consumption of the runs during the merge affects the read parallelism and contributes to the increased sorting time. In this paper we first introduce a novel idea called dirty sequence accumulation that improves the read parallelism. Secondly, we show analytically that this idea can reduce the number of parallel I/Os required to sort the input close to the lower bound of [Formula: see text]. We experimentally verify our dirty sequence idea with the standard R-Way merge and show that our idea can reduce the number of parallel I/Os needed to sort on PDM significantly.
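
    The paper's contribution, dirty sequence accumulation, targets the irregular run consumption that hurts read parallelism on PDM, and reproducing it faithfully is beyond a short example. For orientation, the underlying R-way merge of sorted runs that the idea builds on can be sketched as:

    ```python
    import heapq

    def r_way_merge(runs):
        """Merge R sorted runs (lists standing in for runs striped across parallel
        disks) using a min-heap keyed on each run's current head element."""
        heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
        heapq.heapify(heap)
        merged = []
        while heap:
            value, run_id, pos = heapq.heappop(heap)
            merged.append(value)
            if pos + 1 < len(runs[run_id]):
                heapq.heappush(heap, (runs[run_id][pos + 1], run_id, pos + 1))
        return merged

    print(r_way_merge([[1, 4, 9], [2, 3, 8], [5, 6, 7]]))
    ```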

  10. A queueing network model to analyze the impact of parallelization of care on patient cycle time.

    Science.gov (United States)

    Jiang, Lixiang; Giachetti, Ronald E

    2008-09-01

    The total time a patient spends in an outpatient facility, called the patient cycle time, is a major contributor to overall patient satisfaction. A frequently recommended strategy to reduce the total time is to perform some activities in parallel, thereby shortening patient cycle time. To analyze patient cycle time this paper extends and improves upon an existing multi-class open queueing network (MOQN) model so that the patient flow in an urgent care center can be modeled. Results of the model are analyzed using data from an urgent care center contemplating greater parallelization of patient care activities. The results indicate that parallelization can reduce the cycle time for those patient classes which require more than one diagnostic and/or treatment intervention. However, for many patient classes there would be little, if any, improvement, indicating the importance of tools to analyze business process reengineering rules. The paper makes contributions by implementing an approximation for fork/join queues in the network and by improving the approximation for multiple-server queues in both low-traffic and high-traffic conditions. We demonstrate the accuracy of the MOQN results through comparisons to simulation results.
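
    The full MOQN involves multi-class traffic and fork/join approximations; one of its building blocks, the mean time a patient spends at a single multi-server station, can be estimated with the M/M/c (Erlang C) formulas sketched below (arrival and service rates are illustrative):

    ```python
    from math import factorial

    def mmc_mean_time(arrival_rate, service_rate, servers):
        """Mean time a patient spends at a single M/M/c station (wait + service)."""
        a = arrival_rate / service_rate            # offered load (Erlangs)
        assert a < servers, "queue must be stable"
        erlang_c_num = (a ** servers / factorial(servers)) * servers / (servers - a)
        erlang_c_den = sum(a ** k / factorial(k) for k in range(servers)) + erlang_c_num
        p_wait = erlang_c_num / erlang_c_den       # probability an arrival must wait
        wq = p_wait / (servers * service_rate - arrival_rate)
        return wq + 1.0 / service_rate

    # Example: 10 patients/hour arriving at a lab with 3 technicians, 15-minute tests.
    print(mmc_mean_time(arrival_rate=10.0, service_rate=4.0, servers=3))
    ```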

  11. Parallel object-oriented specification language

    NARCIS (Netherlands)

    Florescu, O.; Voeten, J.P.M.; Theelen, B.D.; Geilen, M.C.W.; Corporaal, H.; Burns, Alan

    2008-01-01

    The Parallel Object-Oriented Specification Language (POOSL) is an expressive modelling language for hardware/software systems [10]. It was originally defined in [7] as an object-oriented extension of process algebra CCS [6], supporting (conditional) synchronous message passing between

  12. Error Modeling and Design Optimization of Parallel Manipulators

    DEFF Research Database (Denmark)

    Wu, Guanglei

    ... /backlash, manufacturing and assembly errors and joint clearances. From the error prediction model, the distributions of the pose errors due to joint clearances are mapped within its constant-orientation workspace and the correctness of the developed model is validated experimentally. Additionally, using the screw ..., dynamic modeling etc. Next, the first-order differential equation of the kinematic closure equation of the planar parallel manipulator is obtained to develop its error model both in Polar and Cartesian coordinate systems. The established error model contains the error sources of actuation error...

  13. A web-based, collaborative modeling, simulation, and parallel computing environment for electromechanical systems

    Directory of Open Access Journals (Sweden)

    Xiaoliang Yin

    2015-03-01

    Full Text Available A complex electromechanical system is usually composed of multiple components from different domains, including mechanical, electronic, hydraulic, control, and so on. Modeling and simulation of electromechanical systems on a unified platform is one of the current research hotspots in systems engineering, and it is also the development trend in the design of complex electromechanical systems. Unified modeling techniques and tools based on the Modelica language provide a satisfactory solution. To meet the requirements of collaborative modeling, simulation, and parallel computing for complex electromechanical systems based on Modelica, a general web-based modeling and simulation prototype environment, namely WebMWorks, is designed and implemented. Based on rich Internet application technologies, an interactive graphical user interface for modeling and post-processing in the web browser was implemented; with the collaborative design module, the environment supports top-down, concurrent modeling and team cooperation; additionally, a service-oriented architecture was applied to supply compiling and solving services which run on cloud-like servers, so the environment can manage and dispatch large-scale simulation tasks in parallel on multiple computing servers simultaneously. An engineering application involving a pure electric vehicle is tested on WebMWorks. The results of the simulation and parametric experiment demonstrate that the tested web-based environment can effectively shorten the design cycle of a complex electromechanical system.

  14. cellGPU: Massively parallel simulations of dynamic vertex models

    Science.gov (United States)

    Sussman, Daniel M.

    2017-10-01

    Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation

  15. Recent development for the ITS code system: Parallel processing and visualization

    International Nuclear Information System (INIS)

    Fan, W.C.; Turner, C.D.; Halbleib, J.A. Sr.; Kensek, R.P.

    1996-01-01

    A brief overview is given for two software developments related to the ITS code system. These developments provide parallel processing and visualization capabilities and thus allow users to perform ITS calculations more efficiently. Timing results and a graphical example are presented to demonstrate these capabilities

  16. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. A CPU is good for sequential processing, while a GPU is good at accelerating software with heavy parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs started to adopt general-purpose cores, it was recognized that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs started to be used widely in workstations and supercomputers. Recently, two key technologies are highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massively parallel operations to train many-layer neural networks. With the CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization and path planning processing, and again an SoC with an integrated GPU will play a key role there. In this paper, the evolution of the GPU, which is one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced. An overview of the key GPU-demanding applications described above will also be given.

  17. Modeling and optimization of parallel and distributed embedded systems

    CERN Document Server

    Munir, Arslan; Ranka, Sanjay

    2016-01-01

    This book introduces the state-of-the-art in research in parallel and distributed embedded systems, which have been enabled by developments in silicon technology, micro-electro-mechanical systems (MEMS), wireless communications, computer networking, and digital electronics. These systems have diverse applications in domains including military and defense, medical, automotive, and unmanned autonomous vehicles. The emphasis of the book is on the modeling and optimization of emerging parallel and distributed embedded systems in relation to the three key design metrics of performance, power and dependability.

  18. Cache-aware data structure model for parallelism and dynamic load balancing

    International Nuclear Information System (INIS)

    Sridi, Marwa

    2016-01-01

    This PhD thesis is dedicated to the implementation of innovative parallel methods in the framework of fast transient fluid-structure dynamics. It improves existing methods within the EUROPLEXUS software in order to optimize the shared-memory parallel strategy, complementary to the original distributed-memory approach, bringing them together into a global hybrid strategy for clusters of multi-core nodes. Starting from a sound analysis of the state of the art concerning data structuring techniques correlated to the hierarchical memory organization of current multi-processor architectures, the proposed work introduces an approach suitable for explicit time integration (i.e. with no linear system to solve at each step). A data structure of type 'Structure of arrays' is retained for the global data storage, providing flexibility and efficiency for the usual operations on kinematics fields (displacement, velocity and acceleration). On the contrary, in the particular case of elementary operations (for generic internal force computations, as well as flux computations between cell faces for fluid models), which are particularly time consuming but localized in the program, a temporary data structure of type 'Array of structures' is used instead, to force an efficient filling of the cache memory and increase the performance of the resolution, for both serial and shared-memory parallel processing. Switching from the global structure to the temporary one is based on a cell grouping strategy, following classic cache-blocking principles but handling, specifically for this work, the neighboring data necessary for the efficient treatment of ALE fluxes for cells on the group boundaries. The proposed approach is extensively tested, from the points of view of both computation time and cache misses, comparing the gains obtained within the elementary operations with the potential overhead generated by the data structure switch. The obtained results are very satisfactory, especially
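
    A toy NumPy illustration of the thesis's idea, keeping the global kinematics fields as a structure of arrays while gathering each cell group into a compact, contiguous buffer for the hot elementary loop, might look like this (field names, group size and the force expression are placeholders):

    ```python
    import numpy as np

    n_cells = 1_000_000
    # Global storage: structure of arrays (one array per field).
    displacement = np.random.rand(n_cells)
    velocity = np.random.rand(n_cells)
    acceleration = np.random.rand(n_cells)

    GROUP = 4096   # cells per group, sized so the temporary buffer fits in cache

    def internal_forces(disp, vel, acc):
        # Placeholder for the element-level computation (hypothetical).
        return 2.0 * disp + 0.1 * vel - 0.01 * acc

    forces = np.empty(n_cells)
    for start in range(0, n_cells, GROUP):
        sl = slice(start, min(start + GROUP, n_cells))
        # Temporary array-of-structures-like buffer: the three fields of one
        # group are gathered into a single contiguous (group_size, 3) block.
        group = np.stack([displacement[sl], velocity[sl], acceleration[sl]], axis=1)
        forces[sl] = internal_forces(group[:, 0], group[:, 1], group[:, 2])
    ```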

  19. Plagiarism Detection for Indonesian Language using Winnowing with Parallel Processing

    Science.gov (United States)

    Arifin, Y.; Isa, S. M.; Wulandhari, L. A.; Abdurachman, E.

    2018-03-01

    Plagiarism takes many forms, not only copy-and-paste but also changing passive voice to active or paraphrasing without appropriate acknowledgment, and it occurs in every language, including Indonesian. Much previous research has addressed plagiarism detection in Indonesian using different methods, but there is still room for improvement. This research proposes a solution that improves the plagiarism detection technique so that it can identify more than the copy-and-paste form. The proposed solution uses Winnowing with additional steps in the pre-processing stage: stemming for Indonesian text and fingerprint generation in parallel, which reduces processing time and produces the plagiarism result for the suspected document.
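
    To make the fingerprinting step concrete, the following is a minimal Python sketch of winnowing over k-gram hashes; the values of k and the window size w are illustrative, and the stemming and parallel fingerprint generation described in the abstract are omitted.

      import hashlib

      def kgram_hashes(text, k=5):
          """Hash every k-gram of the (already pre-processed, stemmed) text."""
          return [int(hashlib.md5(text[i:i + k].encode()).hexdigest(), 16) % (1 << 32)
                  for i in range(len(text) - k + 1)]

      def winnow(hashes, w=4):
          """Keep the rightmost minimum hash of every window of w consecutive hashes."""
          fingerprints = set()
          for i in range(len(hashes) - w + 1):
              window = hashes[i:i + w]
              m = min(window)
              j = w - 1 - window[::-1].index(m)   # rightmost position of the minimum
              fingerprints.add((i + j, m))
          return fingerprints

      doc_a = winnow(kgram_hashes("mahasiswa menulis makalah tentang plagiarisme"))
      doc_b = winnow(kgram_hashes("mahasiswa menulis makalah tentang deteksi plagiarisme"))
      shared = {h for _, h in doc_a} & {h for _, h in doc_b}
      print("shared fingerprints:", len(shared))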

  20. Parallel computing of a climate model on the dawn 1000 by domain decomposition method

    Science.gov (United States)

    Bi, Xunqiang

    1997-12-01

    In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.
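
    As an illustration of the two-dimensional domain decomposition idea (not the IAP model's code), the sketch below splits a global latitude-longitude grid among a hypothetical px-by-py mesh of processes and lists each rank's subdomain bounds.

      def decompose_2d(n_lat, n_lon, px, py):
          """Yield (rank, latitude range, longitude range) for each of px * py processes."""
          for p in range(px):
              lat0, lat1 = p * n_lat // px, (p + 1) * n_lat // px
              for q in range(py):
                  lon0, lon1 = q * n_lon // py, (q + 1) * n_lon // py
                  yield p * py + q, (lat0, lat1), (lon0, lon1)

      # Example: a 64 x 128 grid distributed over a 4 x 8 process mesh.
      for rank, lat_rng, lon_rng in decompose_2d(64, 128, 4, 8):
          if rank < 3:                      # show only the first few ranks
              print(rank, lat_rng, lon_rng)

    In a real atmospheric model each rank would additionally exchange halo rows and columns with its neighbours every time step.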

  1. Parallel Motion Simulation of Large-Scale Real-Time Crowd in a Hierarchical Environmental Model

    Directory of Open Access Journals (Sweden)

    Xin Wang

    2012-01-01

    Full Text Available This paper presents a parallel real-time crowd simulation method based on a hierarchical environmental model. A dynamical model of the complex environment must be constructed to simulate the state transition and propagation of individual motions. By modeling the virtual environment where virtual crowds reside, we employ different parallel methods on a topological layer, a path layer and a perceptual layer. We propose a parallel motion path matching method based on the path layer and a parallel crowd simulation method based on the perceptual layer. Large-scale real-time crowd simulation becomes possible with these methods. Numerical experiments are carried out to demonstrate the methods and results.

  2. Multi states electromechanical switch for energy efficient parallel data processing

    KAUST Repository

    Kloub, Hussam; Smith, Casey; Hussain, Muhammad Mustafa

    2011-01-01

    We present a design, simulation results and fabrication of electromechanical switches enabling parallel data processing and multi functionality. The device is applied in logic gates AND, NOR, XNOR, and Flip-Flops. The device footprint size is 2μm by 0.5μm, and has a pull-in voltage of 5.15V which is verified by FEM simulation. © 2011 IEEE.

  4. Climate models on massively parallel computers

    International Nuclear Information System (INIS)

    Vitart, F.; Rouvillois, P.

    1993-01-01

    First results obtained on massively parallel computers (Multiple Instruction Multiple Data and Single Instruction Multiple Data) make it possible to consider building coupled models with high resolutions. This would enable simulation of the thermohaline circulation and other interaction phenomena between atmosphere and ocean. Increasing computer power, and the resulting improvement in resolution, will force us to revise our approximations: the hydrostatic approximation (in ocean circulation) will no longer be valid when the grid mesh falls below a few kilometers, and other models will have to be found. The expertise in numerical analysis acquired at the Center of Limeil-Valenton (CEL-V) will be reused to devise global models taking into account atmosphere, ocean, ice floe and biosphere, allowing climate simulation down to a regional scale.

  5. Block-Parallel Data Analysis with DIY2

    Energy Technology Data Exchange (ETDEWEB)

    Morozov, Dmitriy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Peterka, Tom [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-08-30

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
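
    The block-structured pattern can be sketched generically as follows; this is an illustration of the concept only and does not use DIY2's actual API (the block size and reduction kernel are hypothetical).

      from concurrent.futures import ProcessPoolExecutor
      import numpy as np

      def make_blocks(n, block_size):
          """Decompose a 1-D dataset into contiguous blocks."""
          data = np.arange(n, dtype=float)
          return [data[i:i + block_size] for i in range(0, n, block_size)]

      def process_block(block):
          """Placeholder per-block analysis kernel (a local reduction)."""
          return float(np.sum(block ** 2))

      if __name__ == "__main__":
          blocks = make_blocks(1_000_000, 65_536)
          with ProcessPoolExecutor() as pool:      # blocks assigned to processing elements
              partial = list(pool.map(process_block, blocks))
          print("global result:", sum(partial))    # combine the per-block results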

  6. A Hybrid FPGA/Coarse Parallel Processing Architecture for Multi-modal Visual Feature Descriptors

    DEFF Research Database (Denmark)

    Jensen, Lars Baunegaard With; Kjær-Nielsen, Anders; Alonso, Javier Díaz

    2008-01-01

    This paper describes the hybrid architecture developed for speeding up the processing of so-called multi-modal visual primitives which are sparse image descriptors extracted along contours. In the system, the first stages of visual processing are implemented on FPGAs due to their highly parallel...

  7. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  8. Solution-processed parallel tandem polymer solar cells using silver nanowires as intermediate electrode.

    Science.gov (United States)

    Guo, Fei; Kubis, Peter; Li, Ning; Przybilla, Thomas; Matt, Gebhard; Stubhan, Tobias; Ameri, Tayebeh; Butz, Benjamin; Spiecker, Erdmann; Forberich, Karen; Brabec, Christoph J

    2014-12-23

    Tandem architecture is the most relevant concept to overcome the efficiency limit of single-junction photovoltaic solar cells. Series-connected tandem polymer solar cells (PSCs) have advanced rapidly during the past decade. In contrast, the development of parallel-connected tandem cells is lagging far behind due to the big challenge in establishing an efficient interlayer with high transparency and high in-plane conductivity. Here, we report all-solution fabrication of parallel tandem PSCs using silver nanowires as intermediate charge collecting electrode. Through a rational interface design, a robust interlayer is established, enabling the efficient extraction and transport of electrons from subcells. The resulting parallel tandem cells exhibit high fill factors of ∼60% and enhanced current densities which are identical to the sum of the current densities of the subcells. These results suggest that solution-processed parallel tandem configuration provides an alternative avenue toward high performance photovoltaic devices.

  9. A simple and efficient parallel FFT algorithm using the BSP model

    NARCIS (Netherlands)

    Bisseling, R.H.; Inda, M.A.

    2000-01-01

    In this paper we present a new parallel radix FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of three in the case
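
    One common formulation of the group-cyclic distribution family can be sketched as follows; the mapping below is an assumption based on standard BSP practice (it reduces to the block distribution for c = 1 and to the cyclic distribution for c = p) and is not taken from the paper.

      def group_cyclic_owner(j, n, p, c):
          """Processor owning global index j for n elements, p processors, cycle c."""
          b = n // p                       # local block length (assumes p divides n)
          return c * (j // (c * b)) + j % c

      n, p = 16, 4
      for c in (1, 2, 4):                  # block, group-cyclic, fully cyclic
          print("c =", c, [group_cyclic_owner(j, n, p, c) for j in range(n)])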

  10. A further extension of the Extended Parallel Process Model (E-EPPM): implications of cognitive appraisal theory of emotion and dispositional coping style.

    Science.gov (United States)

    So, Jiyeon

    2013-01-01

    For two decades, the extended parallel process model (EPPM; Witte, 1992) has been one of the most widely used theoretical frameworks in health risk communication. The model has gained much popularity because it recognizes that, ironically, preceding fear appeal models do not incorporate the concept of fear as a legitimate and central part of them. As a remedy to this situation, the EPPM aims at "putting the fear back into fear appeals" (Witte, 1992, p. 330). Despite this attempt, however, this article argues that the EPPM still does not fully capture the essence of fear as an emotion. Specifically, drawing upon Lazarus's (1991) cognitive appraisal theory of emotion and the concept of dispositional coping style (Miller, 1995), this article seeks to further extend the EPPM. The revised EPPM incorporates a more comprehensive perspective on risk perceptions as a construct involving both cognitive and affective aspects (i.e., fear and anxiety) and integrates the concept of monitoring and blunting coping style as a moderator of further information seeking regarding a given risk topic.

  11. A Topological Model for Parallel Algorithm Design

    Science.gov (United States)

    1991-09-01

    effort should be directed to planning, requirements analysis, specification and design, with 20% invested into the actual coding, and then the final 40...be one more language to learn. And by investing the effort into improving the utility of an existing language instead of creating a new one, this...193) it abandons the notion of a process as a fundamental concept of parallel program design and that it facilitates program derivation by rigorously

  12. Parallelization of the Coupled Earthquake Model

    Science.gov (United States)

    Block, Gary; Li, P. Peggy; Song, Yuhe T.

    2007-01-01

    This Web-based tsunami simulation system allows users to remotely run a model on JPL's supercomputers for a given undersea earthquake. At the time of this reporting, predicting tsunamis over the Internet had never been done before. This new code directly couples the earthquake model and the ocean model on parallel computers and improves simulation speed. Seismometers can only detect information from earthquakes; they cannot detect whether or not a tsunami may occur as a result of the earthquake. When earthquake-tsunami models are coupled with the improved computational speed of modern, high-performance computers and constrained by remotely sensed data, they are able to provide early warnings for those coastal regions at risk. The software is capable of testing NASA's satellite observations of tsunamis. It has been successfully tested for several historical tsunamis, has passed all alpha and beta testing, and is well documented for users.

  13. Model-driven product line engineering for mapping parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir

    2016-01-01

    Mapping parallel algorithms to parallel computing platforms requires several activities such as the analysis of the parallel algorithm, the definition of the logical configuration of the platform, the mapping of the algorithm to the logical configuration platform and the implementation of the

  14. Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

    Directory of Open Access Journals (Sweden)

    Alexander B. Bakulev

    2012-11-01

    Full Text Available This article deals with mathematical models and algorithms that provide mobility of the parallel representation of sequential programs in a high-level language. It presents a formal model for managing the processes of the operating environment, based on the proposed model of parallel program representation, describing the computation process on multi-core processors.

  15. Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

    International Nuclear Information System (INIS)

    Hicks, D.L.

    1983-11-01

    In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references
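
    The zone-by-zone idea can be illustrated with a minimal sketch, assuming a simple ideal-gas equation of state; it is not the hydrocode described above, only a demonstration that each zone's update uses no data from other zones and can therefore be assigned to any processor.

      import numpy as np

      gamma = 1.4                                   # illustrative ideal-gas constant
      rho = np.random.uniform(0.5, 2.0, 1_000)      # zone densities
      e = np.random.uniform(1.0, 3.0, 1_000)        # zone specific internal energies

      def update_zone(i):
          """Per-zone equation-of-state evaluation; independent of all other zones."""
          return (gamma - 1.0) * rho[i] * e[i]

      # Written as a serial loop for clarity; because the iterations are independent,
      # they can be distributed zone by zone across the processors of an MIMD machine.
      pressure = np.array([update_zone(i) for i in range(rho.size)])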

  16. Understanding decimal proportions: discrete representations, parallel access, and privileged processing of zero.

    Science.gov (United States)

    Varma, Sashank; Karl, Stacy R

    2013-05-01

    Much of the research on mathematical cognition has focused on the numbers 1, 2, 3, 4, 5, 6, 7, 8, and 9, with considerably less attention paid to more abstract number classes. The current research investigated how people understand decimal proportions--rational numbers between 0 and 1 expressed in the place-value symbol system. The results demonstrate that proportions are represented as discrete structures and processed in parallel. There was a semantic interference effect: When understanding a proportion expression (e.g., "0.29"), both the correct proportion referent (e.g., 0.29) and the incorrect natural number referent (e.g., 29) corresponding to the visually similar natural number expression (e.g., "29") are accessed in parallel, and when these referents lead to conflicting judgments, performance slows. There was also a syntactic interference effect, generalizing the unit-decade compatibility effect for natural numbers: When comparing two proportions, their tenths and hundredths components are processed in parallel, and when the different components lead to conflicting judgments, performance slows. The results also reveal that zero decimals--proportions ending in zero--serve multiple cognitive functions, including eliminating semantic interference and speeding processing. The current research also extends the distance, semantic congruence, and SNARC effects from natural numbers to decimal proportions. These findings inform how people understand the place-value symbol system, and the mental implementation of mathematical symbol systems more generally. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

    Science.gov (United States)

    Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

    2017-08-01

    We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements.
    Catalogue identifier: AFBT_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GNU GPLv3
    No. of lines in distributed program, including test data, etc.: 913552
    No. of bytes in distributed program, including test data, etc.: 270876249
    Distribution format: tar.gz
    Programming language: CUDA/C, MATLAB
    Computer: Intel x64 CPU, GPU supporting CUDA technology
    Operating system: 64-bit Windows 7 Professional
    Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized
    RAM: Dependent on the user's parameters, typically between several gigabytes and several tens of gigabytes
    Classification: 6.5, 18
    Nature of problem: Speed up of data processing in optical coherence microscopy
    Solution method: Utilization of GPU for massively parallel data processing
    Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data)
    Running time: 1.8 s for one B-scan (150 × faster in comparison to the CPU

  18. Image processing with massively parallel computer Quadrics Q1

    International Nuclear Information System (INIS)

    Della Rocca, A.B.; La Porta, L.; Ferriani, S.

    1995-05-01

    To evaluate the image-processing capabilities of the massively parallel computer Quadrics Q1, a convolution algorithm was implemented and is described in this report. First the mathematical definition of discrete convolution is recalled, together with the main Q1 hardware and software features. Then the different codings of the algorithm are described and the Q1 performance is compared with that obtained on other computers. Finally, the conclusions report the main results and suggestions.
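
    For reference, a serial Python sketch of the discrete two-dimensional convolution recalled in the report is given below; on a massively parallel machine of the Quadrics class the output pixels would be computed concurrently. The image size and kernel are illustrative.

      import numpy as np

      def convolve2d(image, kernel):
          """Valid-mode discrete 2-D convolution (kernel flipped, no padding)."""
          kh, kw = kernel.shape
          out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
          flipped = kernel[::-1, ::-1]
          for y in range(out.shape[0]):
              for x in range(out.shape[1]):
                  out[y, x] = np.sum(image[y:y + kh, x:x + kw] * flipped)
          return out

      img = np.random.rand(16, 16)
      laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
      print(convolve2d(img, laplacian).shape)        # (14, 14)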

  19. Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy.

    Science.gov (United States)

    Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R

    2017-01-21

    The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve this type of problem but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reduction of computation times with respect to several previous state of the art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.

  20. Simulating electron wave dynamics in graphene superlattices exploiting parallel processing advantages

    Science.gov (United States)

    Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel

    2018-01-01

    This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.

  1. Parallel processing for a 1-D time-dependent solution to impurity rate equations for fusion plasma simulations

    International Nuclear Information System (INIS)

    Veerasingam, R.

    1990-01-01

    In fusion plasmas impurities such as carbon, oxygen or nickel can contaminate the plasma and cause degradation of the performance of a fusion device through radiation. However, impurities can also be used as diagnostics to obtain information about a plasma through spectroscopic experiments which can then be used in plasma modeling and simulations. In the past, serial algorithms have been described for either the time-dependent or steady-state problem. In this paper, we describe a parallel procedure adopted to solve the time-dependent problem. It can be shown that for the steady-state problem a parallel procedure would not be a useful application of parallelization because a few seconds of Central Processing Unit time on a CRAY-XMP or IBM 3090/600S would suffice to obtain the solution, while this is not the case for the time-dependent problem. In order to study the effects of low Z and high Z impurities on the final state of a plasma, time-dependent solutions are necessary. For purposes of diagnostics and comparisons with experiments, a fast turn-around time of the simulations would be advantageous. We have implemented a parallel algorithm on an IBM 3090/600S and tested its performance for a typical set of fusion plasma parameters. 4 refs., 1 tab

  2. Analysis and Modeling of Circulating Current in Two Parallel-Connected Inverters

    DEFF Research Database (Denmark)

    Maheshwari, Ram Krishan; Gohil, Ghanshyamsinh Vijaysinh; Bede, Lorand

    2015-01-01

    Parallel-connected inverters are gaining attention for high power applications because of the limited power handling capability of the power modules. Moreover, the parallel-connected inverters may have low total harmonic distortion of the ac current if they are operated with interleaved pulse-width modulation (PWM). However, the interleaved PWM causes a circulating current between the inverters, which in turn causes additional losses. A model describing the dynamics of the circulating current is presented in this study which shows that the circulating current depends on the common-mode voltage. Using this model, the circulating current between two parallel-connected inverters is analysed. The peak and root mean square (rms) values of the normalised circulating current are calculated for different PWM methods, which makes this analysis a valuable tool to design a filter for the circulating current.

  3. Improvements in fast-response flood modeling: desktop parallel computing and domain tracking

    Energy Technology Data Exchange (ETDEWEB)

    Judi, David R [Los Alamos National Laboratory; Mcpherson, Timothy N [Los Alamos National Laboratory; Burian, Steven J [UNIV. OF UTAH

    2009-01-01

    It is becoming increasingly important to have the ability to accurately forecast flooding, as flooding accounts for the most losses due to natural disasters in the world and the United States. Flood inundation modeling has been dominated by one-dimensional approaches. These models are computationally efficient and are considered by many engineers to produce reasonably accurate water surface profiles. However, because the profiles estimated in these models must be superimposed on digital elevation data to create a two-dimensional map, the result may be sensitive to the ability of the elevation data to capture relevant features (e.g. dikes/levees, roads, walls, etc...). Moreover, one-dimensional models do not explicitly represent the complex flow processes present in floodplains and urban environments and because two-dimensional models based on the shallow water equations have significantly greater ability to determine flow velocity and direction, the National Research Council (NRC) has recommended that two-dimensional models be used over one-dimensional models for flood inundation studies. This paper has shown that two-dimensional flood modeling computational time can be greatly reduced through the use of Java multithreading on multi-core computers which effectively provides a means for parallel computing on a desktop computer. In addition, this paper has shown that when desktop parallel computing is coupled with a domain tracking algorithm, significant computation time can be eliminated when computations are completed only on inundated cells. The drastic reduction in computational time shown here enhances the ability of two-dimensional flood inundation models to be used as a near-real time flood forecasting tool, engineering, design tool, or planning tool. Perhaps even of greater significance, the reduction in computation time makes the incorporation of risk and uncertainty/ensemble forecasting more feasible for flood inundation modeling (NRC 2000; Sayers et al

  4. Development Of A Parallel Performance Model For The THOR Neutral Particle Transport Code

    Energy Technology Data Exchange (ETDEWEB)

    Yessayan, Raffi; Azmy, Yousry; Schunert, Sebastian

    2017-02-01

    The THOR neutral particle transport code enables simulation of complex geometries for various problems from reactor simulations to nuclear non-proliferation. It is undergoing a thorough V&V requiring computational efficiency. This has motivated various improvements including angular parallelization, outer iteration acceleration, and development of peripheral tools. For guiding future improvements to the code’s efficiency, better characterization of its parallel performance is useful. A parallel performance model (PPM) can be used to evaluate the benefits of modifications and to identify performance bottlenecks. Using INL’s Falcon HPC, the PPM development incorporates an evaluation of network communication behavior over heterogeneous links and a functional characterization of the per-cell/angle/group runtime of each major code component. After evaluating several possible sources of variability, this resulted in a communication model and a parallel portion model. The former’s accuracy is bounded by the variability of communication on Falcon while the latter has an error on the order of 1%.
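
    The general shape of such a parallel performance model can be sketched as follows; the functional form and all numbers below are illustrative assumptions, not the actual THOR model or Falcon measurements.

      def predicted_runtime(n_cells, n_angles, n_groups, t_unit,
                            p, n_msgs, alpha, beta, msg_bytes):
          """Per-cell/angle/group compute term that scales with processor count plus a latency/bandwidth communication term."""
          compute = n_cells * n_angles * n_groups * t_unit / p
          communicate = n_msgs * (alpha + beta * msg_bytes)
          return compute + communicate

      # Hypothetical inputs, showing how such a model exposes where communication
      # starts to dominate and the speedup flattens out.
      for p in (1, 8, 64, 512):
          t = predicted_runtime(2_000_000, 48, 8, 5e-8, p,
                                n_msgs=200 * p ** 0.5, alpha=2e-6, beta=1e-9,
                                msg_bytes=64_000)
          print(p, "processes:", round(t, 3), "s")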

  5. Mathematical Abstraction: Constructing Concept of Parallel Coordinates

    Science.gov (United States)

    Nurhasanah, F.; Kusumah, Y. S.; Sabandar, J.; Suryadi, D.

    2017-09-01

    Mathematical abstraction is an important process in teaching and learning mathematics, so pre-service mathematics teachers need to understand and experience this process. One of the theoretical-methodological frameworks for studying this process is Abstraction in Context (AiC). Based on this framework, the abstraction process comprises the observable epistemic actions Recognition, Building-With, Construction, and Consolidation, called the RBC + C model. This study investigates and analyzes how pre-service mathematics teachers constructed and consolidated the concept of Parallel Coordinates in a group discussion. It uses the AiC framework to analyze the mathematical abstraction of a group of four pre-service teachers learning Parallel Coordinates concepts. The data were collected through video recording, students' worksheets, a test, and field notes. The result shows that the students' prior knowledge of the Cartesian coordinate concept plays a significant role in the process of constructing the Parallel Coordinates concept as new knowledge. The consolidation process is influenced by the social interaction between group members. The abstraction processes taking place in this group were dominated by empirical abstraction, which emphasizes identifying characteristics of manipulated or imagined objects during the processes of recognizing and building-with.

  6. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    Science.gov (United States)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks number of processors

  7. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    Science.gov (United States)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  8. Teaching ethics to engineers: ethical decision making parallels the engineering design process.

    Science.gov (United States)

    Bero, Bridget; Kuhlman, Alana

    2011-09-01

    In order to fulfill ABET requirements, Northern Arizona University's Civil and Environmental engineering programs incorporate professional ethics in several of its engineering courses. This paper discusses an ethics module in a 3rd year engineering design course that focuses on the design process and technical writing. Engineering students early in their student careers generally possess good black/white critical thinking skills on technical issues. Engineering design is the first time students are exposed to "grey" or multiple possible solution technical problems. To identify and solve these problems, the engineering design process is used. Ethical problems are also "grey" problems and present similar challenges to students. Students need a practical tool for solving these ethical problems. The step-wise engineering design process was used as a model to demonstrate a similar process for ethical situations. The ethical decision making process of Martin and Schinzinger was adapted for parallelism to the design process and presented to students as a step-wise technique for identification of the pertinent ethical issues, relevant moral theories, possible outcomes and a final decision. Students had greatest difficulty identifying the broader, global issues presented in an ethical situation, but by the end of the module, were better able to not only identify the broader issues, but also to more comprehensively assess specific issues, generate solutions and a desired response to the issue.

  9. The Masterson Approach with play therapy: a parallel process between mother and child.

    Science.gov (United States)

    Mulherin, M A

    2001-01-01

    This paper discusses a case in which the Masterson Approach was used with play therapy to treat a child with a developing personality disorder. It describes the parallel progression of the child and mother in adjunct therapy throughout a six-year period. The unique value of the Masterson Approach is that it provides the therapist with a framework and tool to diagnose and treat a child during the dynamic process of play. The case describes the mother-child dyad throughout therapy. It traces their parallel processes that involve separation, individuation, rapprochement, and the recovery of real self-capacities. Each stage of treatment is described, including verbal interventions. The child's internal affective state and intrapsychic structure during the various stages of treatment are illustrated by representative pictures.

  10. Parallel processing implementation for the coupled transport of photons and electrons using OpenMP

    Science.gov (United States)

    Doerner, Edgardo

    2016-05-01

    In this work the use of OpenMP to implement the parallel processing of the Monte Carlo (MC) simulation of the coupled transport of photons and electrons is presented. This implementation was carried out using a modified EGSnrc platform which enables the use of the Microsoft Visual Studio 2013 (VS2013) environment, together with the development tools available in the Intel Parallel Studio XE 2015 (XE2015). The performance study of this new implementation was carried out on a desktop PC with a multi-core CPU, taking as a reference the performance of the original platform. The results were satisfactory, both in terms of scalability and parallelization efficiency.

  11. Research on Gear Shifting Process without Disengaging Clutch for a Parallel Hybrid Electric Vehicle Equipped with AMT

    Directory of Open Access Journals (Sweden)

    Hui-Long Yu

    2014-01-01

    Full Text Available Dynamic models of a single-shaft parallel hybrid electric vehicle (HEV) equipped with an automated mechanical transmission (AMT) are described for the different working stages of a gear shifting process without disengaging the clutch. Parameters affecting the gear shifting time, component life, and gear shifting jerk in the different transient states of a gear shifting process are analyzed in depth. Mathematical models that consider the detailed synchronizer working process, which can explain gear shifting failure, overly long gear shifts, and frequent synchronizer failures in HEVs, are derived. A dynamic coordinated control strategy for the engine, motor, and actuators in the different transient states, taking into account the detailed working stages of the synchronizer during a gear shift of a HEV, is proposed for the first time according to the state-of-the-art references. Bench tests and real road tests show that the proposed control strategy significantly improves the gear shifting quality on all of its evaluation indexes.

  12. Optimal parallel algorithms for problems modeled by a family of intervals

    Science.gov (United States)

    Olariu, Stephan; Schwing, James L.; Zhang, Jingyuan

    1992-01-01

    A family of intervals on the real line provides a natural model for a vast number of scheduling and VLSI problems. Recently, a number of parallel algorithms to solve a variety of practical problems on such a family of intervals have been proposed in the literature. Computational tools are developed, and it is shown how they can be used for the purpose of devising cost-optimal parallel algorithms for a number of interval-related problems including finding a largest subset of pairwise nonoverlapping intervals, a minimum dominating subset of intervals, along with algorithms to compute the shortest path between a pair of intervals and, based on the shortest path, a parallel algorithm to find the center of the family of intervals. More precisely, with an arbitrary family of n intervals as input, all algorithms run in O(log n) time using O(n) processors in the EREW-PRAM model of computation.
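
    As a reminder of one of the problems solved, the serial greedy rule for a largest subset of pairwise non-overlapping intervals is sketched below; the paper's contribution is solving such problems in O(log n) time with O(n) processors on an EREW PRAM, which this sketch does not attempt.

      def max_nonoverlapping(intervals):
          """Greedy choice of the earliest-finishing compatible interval."""
          chosen, last_end = [], float("-inf")
          for start, end in sorted(intervals, key=lambda iv: iv[1]):
              if start >= last_end:
                  chosen.append((start, end))
                  last_end = end
          return chosen

      print(max_nonoverlapping([(1, 4), (2, 3), (3, 5), (6, 8), (5, 7)]))
      # -> [(2, 3), (3, 5), (5, 7)]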

  13. Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

    International Nuclear Information System (INIS)

    Ohshima, Hiroyuki

    2001-10-01

    A whole core thermal-hydraulic analysis program ACT is being developed for the purpose of evaluating detailed in-core thermal hydraulic phenomena of fast reactors including the effect of the flow between wrapper-tube walls (inter-wrapper flow) under various reactor operation conditions. As appropriate boundary conditions in addition to a detailed modeling of the core are essential for accurate simulations of in-core thermal hydraulics, ACT consists of not only fuel assembly and inter-wrapper flow analysis modules but also a heat transport system analysis module that gives the response of the plant dynamics to the core model. This report describes the incorporation of a simplified model into the fuel assembly analysis module and program parallelization by a message passing method toward large-scale simulations. ACT has a fuel assembly analysis module which can simulate a whole fuel pin bundle in each fuel assembly of the core; however, it may take much CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save simulation time and resources. This simplified model is applied to several parts of fuel assemblies in a core where detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed and the optimum parallelization has been done, including the improvement of the numerical simulation algorithm of ACT. Message Passing Interface (MPI) is applied to data communication between processes and synchronization in parallel calculations. The parallelized ACT was verified through a comparison simulation with the original one. In addition to the above works, input manuals of the core analysis module and the heat transport system analysis module have been prepared. (author)

  14. Distributed system for parallel data processing of ECT signals for electromagnetic flaw detection in materials

    International Nuclear Information System (INIS)

    Guliashki, Vassil; Marinova, Galia

    2002-01-01

    The paper proposes a distributed system for parallel data processing of ECT signals for flaw detection in materials. The measured data are stored in files on a host computer, where a JAVA server is located. The host computer is connected through the Internet to a set of geographically distributed client computers. The data are distributed from the host computer by means of the JAVA server to the client computers according to their requests. The software necessary for the data processing is installed on each client computer in advance. Organizing the data processing on many computers working simultaneously in parallel greatly reduces processing time, especially when a huge amount of data must be processed in a very short time. (Author)

  15. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    Science.gov (United States)

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  16. A Parallel, Multi-Scale Watershed-Hydrologic-Inundation Model with Adaptively Switching Mesh for Capturing Flooding and Lake Dynamics

    Science.gov (United States)

    Ji, X.; Shen, C.

    2017-12-01

    Flood inundation presents substantial societal hazards and also changes biogeochemistry for systems like the Amazon. It is often expensive to simulate high-resolution flood inundation and propagation in a long-term watershed-scale model. Due to the Courant-Friedrichs-Lewy (CFL) restriction, high resolution and large local flow velocity both demand prohibitively small time steps even for parallel codes. Here we develop a parallel surface-subsurface process-based model enhanced by multi-resolution meshes that are adaptively switched on or off. The high-resolution overland flow meshes are enabled only when the flood wave invades the floodplains. This model applies a semi-implicit, semi-Lagrangian (SISL) scheme to solve the dynamic wave equations and, with the assistance of the multi-mesh method, it also adaptively applies the dynamic wave equation only in areas of deep inundation. Therefore, the model achieves a balance between accuracy and computational cost.

  17. The island dynamics model on parallel quadtree grids

    Science.gov (United States)

    Mistani, Pouria; Guittet, Arthur; Bochkov, Daniil; Schneider, Joshua; Margetis, Dionisios; Ratsch, Christian; Gibou, Frederic

    2018-05-01

    We introduce an approach for simulating epitaxial growth by use of an island dynamics model on a forest of quadtree grids, and in a parallel environment. To this end, we use a parallel framework introduced in the context of the level-set method. This framework utilizes: discretizations that achieve a second-order accurate level-set method on non-graded adaptive Cartesian grids for solving the associated free boundary value problem for surface diffusion; and an established library for the partitioning of the grid. We consider the cases with: irreversible aggregation, which amounts to applying Dirichlet boundary conditions at the island boundary; and an asymmetric (Ehrlich-Schwoebel) energy barrier for attachment/detachment of atoms at the island boundary, which entails the use of a Robin boundary condition. We provide the scaling analyses performed on the Stampede supercomputer and numerical examples that illustrate the capability of our methodology to efficiently simulate different aspects of epitaxial growth. The combination of adaptivity and parallelism in our approach enables simulations that are several orders of magnitude faster than those reported in the recent literature and, thus, provides a viable framework for the systematic study of mound formation on crystal surfaces.

  18. Exploiting Thread Parallelism for Ocean Modeling on Cray XC Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Sarje, Abhinav [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Jacobsen, Douglas W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Williams, Samuel W. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Ringler, Todd [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Oliker, Leonid [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2016-05-01

    The incorporation of increasing core counts in modern processors used to build state-of-the-art supercomputers is driving application development towards exploitation of thread parallelism, in addition to distributed memory parallelism, with the goal of delivering efficient high-performance codes. In this work we describe the exploitation of threading and our experiences with it with respect to a real-world ocean modeling application code, MPAS-Ocean. We present detailed performance analysis and comparisons of various approaches and configurations for threading on the Cray XC series supercomputers.

  19. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    Full Text Available In recent years, efforts have been made to delineate a stable and unified framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far do not yet match those efforts. This paper aims to be a small contribution to them. We propose an overview of parallel programming, parallel execution and collaborative systems.

  20. Density functional theory and parallel processing

    International Nuclear Information System (INIS)

    Ward, R.C.; Geist, G.A.; Butler, W.H.

    1987-01-01

    The authors demonstrate a method for obtaining the ground state energies and charge densities of a system of atoms described within density functional theory using simulated annealing on a parallel computer
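
    As a reminder of the optimization technique, a generic simulated-annealing loop is sketched below on a toy one-dimensional objective; it is illustrative only and is not the authors' density-functional implementation or its parallel decomposition.

      import math, random

      def anneal(energy, x0, step=0.1, t0=1.0, cooling=0.995, n_steps=5000):
          """Minimize energy(x) by random moves accepted with a Boltzmann criterion."""
          x, e = x0, energy(x0)
          t = t0
          for _ in range(n_steps):
              cand = x + random.uniform(-step, step)
              e_cand = energy(cand)
              # Always accept downhill moves; accept uphill moves with probability exp(-dE/T).
              if e_cand < e or random.random() < math.exp(-(e_cand - e) / t):
                  x, e = cand, e_cand
              t *= cooling                      # gradually lower the temperature
          return x, e

      print(anneal(lambda x: (x - 2.0) ** 2 + 0.3 * math.sin(8 * x), x0=-3.0))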

  1. OpenMP parallelization of a gridded SWAT (SWATG)

    Science.gov (United States)

    Zhang, Ying; Hou, Jinliang; Cao, Yongpan; Gu, Juan; Huang, Chunlin

    2017-12-01

    Large-scale, long-term and high spatial resolution simulation is a common issue in environmental modeling. A Gridded Hydrologic Response Unit (HRU)-based Soil and Water Assessment Tool (SWATG) that integrates grid modeling scheme with different spatial representations also presents such problems. The time-consuming problem affects applications of very high resolution large-scale watershed modeling. The OpenMP (Open Multi-Processing) parallel application interface is integrated with SWATG (called SWATGP) to accelerate grid modeling based on the HRU level. Such parallel implementation takes better advantage of the computational power of a shared memory computer system. We conducted two experiments at multiple temporal and spatial scales of hydrological modeling using SWATG and SWATGP on a high-end server. At 500-m resolution, SWATGP was found to be up to nine times faster than SWATG in modeling over a roughly 2000 km2 watershed with 1 CPU and a 15 thread configuration. The study results demonstrate that parallel models save considerable time relative to traditional sequential simulation runs. Parallel computations of environmental models are beneficial for model applications, especially at large spatial and temporal scales and at high resolutions. The proposed SWATGP model is thus a promising tool for large-scale and high-resolution water resources research and management in addition to offering data fusion and model coupling ability.

  2. Exact stationary state for an asymmetric exclusion process with fully parallel dynamics

    NARCIS (Netherlands)

    Gier, J.C.|info:eu-repo/dai/nl/170218430; Nienhuis, B.

    The exact stationary state of an asymmetric exclusion process with fully parallel dynamics is obtained using the matrix product ansatz. We give a simple derivation for the deterministic case by a physical interpretation of the dimension of the matrices. We prove the stationarity via a cancellation
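
    A small simulation sketch of an exclusion process with fully parallel (synchronous) dynamics on a ring is given below; the hopping probability and system size are illustrative, and the paper itself treats the exact stationary state analytically rather than by simulation.

      import random

      def parallel_update(config, p=0.75):
          """One synchronous step: every particle with an empty right neighbour hops with probability p."""
          n = len(config)
          movers = [i for i in range(n)
                    if config[i] == 1 and config[(i + 1) % n] == 0 and random.random() < p]
          new = config[:]
          for i in movers:                       # all selected hops are applied at once
              new[i], new[(i + 1) % n] = 0, 1
          return new

      config = [1, 0] * 10                       # half-filled ring of 20 sites
      for _ in range(100):
          config = parallel_update(config)
      print(config)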

  3. Performance of MPI parallel processing implemented by MCNP5/ MCNPX for criticality benchmark problems

    International Nuclear Information System (INIS)

    Mark Dennis Usang; Mohd Hairie Rabir; Mohd Amin Sharifuldin Salleh; Mohamad Puad Abu

    2012-01-01

    MPI parallelism is implemented on a SUN workstation for running MCNPX and on the High Performance Computing Facility (HPC) for running MCNP5. Twenty-three inputs obtained from the MCNP Criticality Validation Suite are utilized to evaluate the speed-up achievable by using the parallel capabilities of MPI. More importantly, we study the economics of using more processors and the types of problem where the performance gain is obvious. This is important to enable better practices of resource sharing, especially of the HPC facility's processing time. Future endeavours in this direction might even reveal clues for the best MCNP5/MCNPX coding practices for optimum performance of MPI parallelism. (author)

  4. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the
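
    One of the simplest ideas the book builds on, sorting independent pieces of a sequence concurrently and then merging them, can be sketched as follows; the process pool merely stands in for a parallel machine and is not one of the specific architectures (linear arrays, meshes, cubes) analysed in the text.

      from concurrent.futures import ProcessPoolExecutor
      from heapq import merge
      import random

      def parallel_sort(seq, n_workers=4):
          """Sort chunks of the sequence in separate processes, then merge the results."""
          chunk = (len(seq) + n_workers - 1) // n_workers
          pieces = [seq[i:i + chunk] for i in range(0, len(seq), chunk)]
          with ProcessPoolExecutor(max_workers=n_workers) as pool:
              sorted_pieces = list(pool.map(sorted, pieces))
          return list(merge(*sorted_pieces))

      if __name__ == "__main__":
          data = [random.randint(0, 999) for _ in range(10_000)]
          assert parallel_sort(data) == sorted(data)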

  5. Parallel Solution of Robust Nonlinear Model Predictive Control Problems in Batch Crystallization

    Directory of Open Access Journals (Sweden)

    Yankai Cao

    2016-06-01

    Full Text Available Representing the uncertainties with a set of scenarios, the optimization problem resulting from a robust nonlinear model predictive control (NMPC strategy at each sampling instance can be viewed as a large-scale stochastic program. This paper solves these optimization problems using the parallel Schur complement method developed to solve stochastic programs on distributed and shared memory machines. The control strategy is illustrated with a case study of a multidimensional unseeded batch crystallization process. For this application, a robust NMPC based on min–max optimization guarantees satisfaction of all state and input constraints for a set of uncertainty realizations, and also provides better robust performance compared with open-loop optimal control, nominal NMPC, and robust NMPC minimizing the expected performance at each sampling instance. The performance of robust NMPC can be improved by generating optimization scenarios using Bayesian inference. With the efficient parallel solver, the solution time of one optimization problem is reduced from 6.7 min to 0.5 min, allowing for real-time application.

  6. Unified dataflow model for the analysis of data and pipeline parallelism, and buffer sizing

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.; Geuns, S.J.; Wiggers, M.H.; Bekooij, Marco Jan Gerrit

    2014-01-01

    Real-time stream processing applications such as software defined radios are usually executed concurrently on multiprocessor systems. Exploiting coarse-grained data parallelism by duplicating tasks is often required, besides pipeline parallelism, to meet the temporal constraints of the applications.

  7. The concept of parallel input/output processing for an electron linac

    International Nuclear Information System (INIS)

    Emoto, Takashi

    1993-01-01

    The instrumentation of and the control system for the PNC 10 MeV CW electron linac are described. A new concept of parallel input/output processing for the linac has been introduced. It is based on a substantial number of input/output processors (IOPs) used for beam control and diagnostics. The flexibility and simplicity of the hardware/software are significant advantages of this scheme. (author)

  8. Parallel asynchronous hardware implementation of image processing algorithms

    Science.gov (United States)

    Coon, Darryl D.; Perera, A. G. U.

    1990-01-01

    Research is being carried out on hardware for a new approach to focal plane processing. The hardware involves silicon injection mode devices. These devices provide a natural basis for parallel asynchronous focal plane image preprocessing. The simplicity and novel properties of the devices would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture built from arrays of the devices would form a two-dimensional (2-D) array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuron-like asynchronous pulse-coded form through the laminar processor. No multiplexing, digitization, or serial processing would occur in the preprocessing state. High performance is expected, based on pulse coding of input currents down to one picoampere with noise referred to input of about 10 femtoamperes. Linear pulse coding has been observed for input currents ranging up to seven orders of magnitude. Low power requirements suggest utility in space and in conjunction with very large arrays. Very low dark current and multispectral capability are possible because of hardware compatibility with the cryogenic environment of high performance detector arrays. The aforementioned hardware development effort is aimed at systems which would integrate image acquisition and image processing.

  9. Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

    Energy Technology Data Exchange (ETDEWEB)

    Amadio, G.; et al.

    2017-11-22

    An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting the latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel through complex geometries, exploiting instruction-level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.

  10. Morphological evidence for parallel processing of information in rat macula

    Science.gov (United States)

    Ross, M. D.

    1988-01-01

    Study of montages, tracings and reconstructions prepared from a series of 570 consecutive ultrathin sections shows that rat maculas are morphologically organized for parallel processing of linear acceleratory information. Type II cells of one terminal field distribute information to neighboring terminals as well. The findings are examined in light of physiological data which indicate that macular receptor fields have a preferred directional vector, and are interpreted by analogy to a computer technology known as an information network.

  11. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    Science.gov (United States)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.

  12. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
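    The following simplified sketch, written for CPUs with OpenMP rather than the GPUs used in the paper, illustrates the parallel multicanonical idea on a small 2D Ising lattice: several independent walkers sample with a shared weight function W(E), their energy histograms are merged, and W is refined between iterations. The lattice size, sweep counts, and the simple weight recursion are illustrative choices only, not the scheme released with the paper.

```cpp
// Simplified sketch of parallel multicanonical sampling for the 2D Ising
// model: independent walkers run Metropolis updates with a shared weight
// function W(E); merged energy histograms are used to flatten W iteratively.
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const int L = 16, N = L * L, nE = 4 * N + 1;   // energies E lie in [-2N, 2N]
    const int walkers = 8, sweeps = 200, iterations = 20;
    std::vector<double> W(nE, 0.0);                // shared log-weights

    std::vector<std::vector<int>> spin(walkers, std::vector<int>(N, 1));
    std::vector<std::mt19937> rng;
    for (int w = 0; w < walkers; ++w) rng.emplace_back(12345 + w);

    auto energy = [&](const std::vector<int>& s) {
        int E = 0;
        for (int i = 0; i < N; ++i) {
            int x = i % L, y = i / L;
            E -= s[i] * (s[y * L + (x + 1) % L] + s[((y + 1) % L) * L + x]);
        }
        return E;
    };

    for (int it = 0; it < iterations; ++it) {
        std::vector<double> H(nE, 0.0);            // merged histogram
        #pragma omp parallel for
        for (int w = 0; w < walkers; ++w) {
            std::vector<double> h(nE, 0.0);
            int E = energy(spin[w]);
            std::uniform_int_distribution<int> site(0, N - 1);
            std::uniform_real_distribution<double> uni(0.0, 1.0);
            for (long step = 0; step < (long)sweeps * N; ++step) {
                int i = site(rng[w]), x = i % L, y = i / L;
                int nb = spin[w][y * L + (x + 1) % L] + spin[w][y * L + (x + L - 1) % L]
                       + spin[w][((y + 1) % L) * L + x] + spin[w][((y + L - 1) % L) * L + x];
                int dE = 2 * spin[w][i] * nb;
                // Multicanonical acceptance with the shared weights W(E).
                if (uni(rng[w]) < std::exp(W[E + dE + 2 * N] - W[E + 2 * N])) {
                    spin[w][i] = -spin[w][i];
                    E += dE;
                }
                h[E + 2 * N] += 1.0;
            }
            #pragma omp critical
            { for (int e = 0; e < nE; ++e) H[e] += h[e]; }
        }
        // Weight update: push weight away from well-sampled energies.
        for (int e = 0; e < nE; ++e)
            if (H[e] > 0.0) W[e] -= std::log(H[e]);
    }
    std::printf("done: %d walkers x %d iterations on a %dx%d lattice\n",
                walkers, iterations, L, L);
    return 0;
}
```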

  13. Integrated Task And Data Parallel Programming: Language Design

    Science.gov (United States)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications with properties that would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities During the fall I collaborated

  14. Modelling distribution of evaporating CO2 in parallel minichannels

    DEFF Research Database (Denmark)

    Brix, Wiebke; Kærn, Martin Ryhl; Elmegaard, Brian

    2010-01-01

    The effects of airflow non-uniformity and uneven inlet qualities on the performance of a minichannel evaporator with parallel channels, using CO2 as refrigerant, are investigated numerically. For this purpose a one-dimensional discretised steady-state model was developed, applying well-known empirical correlations. Non-uniform airflow is found to lead to maldistribution of the refrigerant and considerable capacity reduction of the evaporator. Uneven inlet qualities to the different channels show only minor effects on the refrigerant distribution and evaporator capacity as long as the channels are vertically oriented with CO2 flowing upwards. For horizontal channels capacity reductions are found for both non-uniform airflow and uneven inlet qualities. For horizontal minichannels the results are very similar to those obtained using R134a as refrigerant.

  15. Frog: Asynchronous Graph Processing on GPU with Hybrid Coloring Model

    Energy Technology Data Exchange (ETDEWEB)

    Shi, Xuanhua; Luo, Xuan; Liang, Junling; Zhao, Peng; Di, Sheng; He, Bingsheng; Jin, Hai

    2018-01-01

    GPUs have been increasingly used to accelerate graph processing for complicated computational problems regarding graph theory. Many parallel graph algorithms adopt the asynchronous computing model to accelerate the iterative convergence. Unfortunately, consistent asynchronous computing requires locking or atomic operations, leading to significant penalties/overheads when implemented on GPUs. As such, a coloring algorithm is adopted to separate the vertices with potential updating conflicts, guaranteeing the consistency/correctness of the parallel processing. Common coloring algorithms, however, may suffer from low parallelism because of the large number of colors generally required for processing a large-scale graph with billions of vertices. We propose a light-weight asynchronous processing framework called Frog with a preprocessing/hybrid coloring model. The fundamental idea is based on the Pareto principle (or 80-20 rule) about coloring algorithms, as we observed through masses of real-world graph coloring cases. We find that a majority of vertices (about 80%) are colored with only a few colors, such that they can be read and updated in a very high degree of parallelism without violating the sequential consistency. Accordingly, our solution separates the processing of the vertices based on the distribution of colors. In this work, we mainly answer three questions: (1) how to partition the vertices in a sparse graph with maximized parallelism, (2) how to process large-scale graphs that cannot fit into GPU memory, and (3) how to reduce the overhead of data transfers on PCIe while processing each partition. We conduct experiments on real-world data (Amazon, DBLP, YouTube, RoadNet-CA, WikiTalk and Twitter) to evaluate our approach and make comparisons with well-known non-preprocessed (such as Totem, Medusa, MapGraph and Gunrock) and preprocessed (Cusha) approaches, by testing four classical algorithms (BFS, PageRank, SSSP and CC). On all the tested applications and

  16. Next Generation Parallelization Systems for Processing and Control of PDS Image Node Assets

    Science.gov (United States)

    Verma, R.

    2017-06-01

    We present next-generation parallelization tools to help Planetary Data System (PDS) Imaging Node (IMG) better monitor, process, and control changes to nearly 650 million file assets and over a dozen machines on which they are referenced or stored.

  17. PARALLEL ADAPTIVE MULTILEVEL SAMPLING ALGORITHMS FOR THE BAYESIAN ANALYSIS OF MATHEMATICAL MODELS

    KAUST Repository

    Prudencio, Ernesto; Cheung, Sai Hung

    2012-01-01

    In recent years, Bayesian model updating techniques based on measured data have been applied to many engineering and applied science problems. At the same time, parallel computational platforms are becoming increasingly more powerful and are being used more frequently by the engineering and scientific communities. Bayesian techniques usually require the evaluation of multi-dimensional integrals related to the posterior probability density function (PDF) of uncertain model parameters. The fact that such integrals cannot be computed analytically motivates the research of stochastic simulation methods for sampling posterior PDFs. One such algorithm is the adaptive multilevel stochastic simulation algorithm (AMSSA). In this paper we discuss the parallelization of AMSSA, formulating the necessary load balancing step as a binary integer programming problem. We present a variety of results showing the effectiveness of load balancing on the overall performance of AMSSA in a parallel computational environment.

  18. Evolution of a minimal parallel programming model

    International Nuclear Information System (INIS)

    Lusk, Ewing; Butler, Ralph; Pieper, Steven C.

    2017-01-01

    Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
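    As a minimal illustration of self-scheduled task parallelism (ADLB itself is an MPI library for distributed memory; the shared-memory sketch below only mimics the idea), idle workers repeatedly claim the next unprocessed task from a shared atomic counter, which balances irregular task costs automatically. The task contents and sizes are hypothetical.

```cpp
// Minimal shared-memory sketch of self-scheduled task parallelism: workers
// repeatedly grab the next unclaimed task from a shared counter until the
// pool is exhausted.
#include <atomic>
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int n_tasks = 1000, n_workers = 4;
    std::atomic<int> next_task{0};
    std::vector<double> result(n_tasks, 0.0);

    auto worker = [&]() {
        for (;;) {
            int t = next_task.fetch_add(1);        // claim the next task
            if (t >= n_tasks) break;               // pool exhausted
            // Irregular work: cost varies from task to task, which is where
            // self-scheduling beats a static round-robin assignment.
            double acc = 0.0;
            for (int k = 0; k < 1000 * (t % 17 + 1); ++k)
                acc += std::sin(t + 0.001 * k);
            result[t] = acc;
        }
    };

    std::vector<std::thread> pool;
    for (int w = 0; w < n_workers; ++w) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    std::printf("processed %d tasks with %d workers; result[0]=%.3f\n",
                n_tasks, n_workers, result[0]);
    return 0;
}
```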

  19. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  20. Massively Parallel Assimilation of TOGA/TAO and Topex/Poseidon Measurements into a Quasi Isopycnal Ocean General Circulation Model Using an Ensemble Kalman Filter

    Science.gov (United States)

    Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max

    1999-01-01

    A massively parallel ensemble Kalman filter (EnKF) is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP)'s quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This implementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor process the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the EnKF are also examined.

  1. The effects of fear appeal message repetition on perceived threat, perceived efficacy, and behavioral intention in the extended parallel process model.

    Science.gov (United States)

    Shi, Jingyuan Jolie; Smith, Sandi W

    2016-01-01

    This study examined the effect of moderately repeated exposure (three times) to a fear appeal message on the Extended Parallel Process Model (EPPM) variables of threat, efficacy, and behavioral intentions for the recommended behaviors in the message, as well as the proportions of systematic and message-related thoughts generated after each message exposure. The results showed that after repeated exposure to a fear appeal message about preventing melanoma, perceived threat in terms of susceptibility and perceived efficacy in terms of response efficacy significantly increased. The behavioral intentions of all recommended behaviors did not change after repeated exposure to the message. However, after the second exposure the proportions of both systematic and all message-related thoughts (relative to total thoughts) significantly decreased while the proportion of heuristic thoughts significantly increased, and this pattern held after the third exposure. The findings demonstrated that the predictions in the EPPM are likely to be operative after three exposures to a persuasive message.

  2. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU

    Directory of Open Access Journals (Sweden)

    Yong Xia

    2015-01-01

    Full Text Available Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This poses a major challenge to traditional CPU-based computing environments, which either cannot meet the full computational demand or are not easily available because of their high cost. GPUs as a parallel computing environment therefore provide an alternative for solving the large-scale computational problems of whole-heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, the multicellular tissue model was split into two components: the single cell model (ordinary differential equations) and the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations.
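    The sketch below illustrates the ODE/PDE splitting described above in a much reduced setting: a 1D cable of FitzHugh-Nagumo cells stands in for the 3D sheep atrial model, and OpenMP loops stand in for the CUDA kernels. Each time step performs a per-cell ODE update followed by an explicit diffusion update, the two passes that the paper maps to separate GPU work; all parameters are illustrative.

```cpp
// Rough CPU sketch of ODE/PDE operator splitting for excitable tissue: each
// time step runs two embarrassingly parallel passes over the cells, a local
// cell-kinetics (ODE) update and an explicit diffusion (monodomain-like)
// update on a 1D cable.
#include <cstdio>
#include <vector>

int main() {
    const int n = 2000;            // number of cells along the cable
    const double dt = 0.05, D = 0.1;
    std::vector<double> v(n, 0.0), w(n, 0.0), v_new(n, 0.0);
    for (int i = 0; i < 20; ++i) v[i] = 1.0;   // stimulate one end

    for (int step = 0; step < 4000; ++step) {
        // Pass 1: local cell kinetics (one GPU kernel in the paper's setting).
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double dv = v[i] * (v[i] - 0.1) * (1.0 - v[i]) - w[i];
            double dw = 0.01 * (v[i] - 0.5 * w[i]);
            v[i] += dt * dv;
            w[i] += dt * dw;
        }
        // Pass 2: diffusion term coupling neighbouring cells (second pass).
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double left  = (i > 0)     ? v[i - 1] : v[i];
            double right = (i < n - 1) ? v[i + 1] : v[i];
            v_new[i] = v[i] + dt * D * (left - 2.0 * v[i] + right);
        }
        v.swap(v_new);
    }
    std::printf("membrane variable at mid-cable after run: %.4f\n", v[n / 2]);
    return 0;
}
```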

  3. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm^3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm^2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  4. Parallel performance of TORT on the CRAY J90: Model and measurement

    International Nuclear Information System (INIS)

    Barnett, A.; Azmy, Y.Y.

    1997-10-01

    A limitation on the parallel performance of TORT on the CRAY J90 is the amount of extra work introduced by the multitasking algorithm itself. The extra work beyond that of the serial version of the code, called overhead, arises from the synchronization of the parallel tasks and the accumulation of results by the master task. The goal of recent updates to TORT was to reduce the time consumed by these activities. To help understand which components of the multitasking algorithm contribute significantly to the overhead, a parallel performance model was constructed and compared to measurements of actual timings of the code

  5. Implementing parallel spreadsheet models for health policy decisions: The impact of unintentional errors on model projections.

    Science.gov (United States)

    Bailey, Stephanie L; Bono, Rose S; Nash, Denis; Kimmel, April D

    2018-01-01

    Spreadsheet software is increasingly used to implement systems science models informing health policy decisions, both in academia and in practice where technical capacity may be limited. However, spreadsheet models are prone to unintentional errors that may not always be identified using standard error-checking techniques. Our objective was to illustrate, through a methodologic case study analysis, the impact of unintentional errors on model projections by implementing parallel model versions. We leveraged a real-world need to revise an existing spreadsheet model designed to inform HIV policy. We developed three parallel versions of a previously validated spreadsheet-based model; versions differed by the spreadsheet cell-referencing approach (named single cells; column/row references; named matrices). For each version, we implemented three model revisions (re-entry into care; guideline-concordant treatment initiation; immediate treatment initiation). After standard error-checking, we identified unintentional errors by comparing model output across the three versions. Concordant model output across all versions was considered error-free. We calculated the impact of unintentional errors as the percentage difference in model projections between model versions with and without unintentional errors, using +/-5% difference to define a material error. We identified 58 original and 4,331 propagated unintentional errors across all model versions and revisions. Over 40% (24/58) of original unintentional errors occurred in the column/row reference model version; most (23/24) were due to incorrect cell references. Overall, >20% of model spreadsheet cells had material unintentional errors. When examining error impact along the HIV care continuum, the percentage difference between versions with and without unintentional errors ranged from +3% to +16% (named single cells), +26% to +76% (column/row reference), and 0% (named matrices). Standard error-checking techniques may not

  6. Multibus-based parallel processor for simulation

    Science.gov (United States)

    Ogrady, E. P.; Wang, C.-H.

    1983-01-01

    A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.

  7. Co-development of Problem Gambling and Depression Symptoms in Emerging Adults: A Parallel-Process Latent Class Growth Model.

    Science.gov (United States)

    Edgerton, Jason D; Keough, Matthew T; Roberts, Lance W

    2018-02-21

    This study examines whether there are multiple joint trajectories of depression and problem gambling co-development in a sample of emerging adults. Data were from the Manitoba Longitudinal Study of Young Adults (n = 679), which was collected in 4 waves across 5 years (age 18-20 at baseline). Parallel process latent class growth modeling was used to identify 5 joint trajectory classes: low decreasing gambling, low increasing depression (81%); low stable gambling, moderate decreasing depression (9%); low stable gambling, high decreasing depression (5%); low stable gambling, moderate stable depression (3%); moderate stable problem gambling, no depression (2%). There was no evidence of reciprocal growth in problem gambling and depression in any of the joint classes. Multinomial logistic regression analyses of baseline risk and protective factors found that only neuroticism, escape-avoidance coping, and perceived level of family social support were significant predictors of joint trajectory class membership. Consistent with the pathways model framework, we observed that individuals in the problem gambling only class were more likely to be using gambling as a stable way to cope with negative emotions. Similarly, high levels of neuroticism and low levels of family support were associated with increased odds of being in a class with moderate to high levels of depressive symptoms (but low gambling problems). The results suggest that interventions for problem gambling and/or depression need to focus on promoting more adaptive coping skills among more "at-risk" young adults, and such interventions should be tailored in relation to specific subtypes of comorbid mental illness.

  8. Evaluating parallel optimization on transputers

    Directory of Open Access Journals (Sweden)

    A.G. Chalmers

    2003-12-01

    Full Text Available The increased processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved by utilising relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented, which indicate under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.

  9. Analysis of clinical complication data for radiation hepatitis using a parallel architecture model

    International Nuclear Information System (INIS)

    Jackson, A.; Haken, R.K. ten; Robertson, J.M.; Kessler, M.L.; Kutcher, G.J.; Lawrence, T.S.

    1995-01-01

    Purpose: The detailed knowledge of dose volume distributions available from the three-dimensional (3D) conformal radiation treatment of tumors in the liver (reported elsewhere) offers new opportunities to quantify the effect of volume on the probability of producing radiation hepatitis. We aim to test a new parallel architecture model of normal tissue complication probability (NTCP) with these data. Methods and Materials: Complication data and dose volume histograms from a total of 93 patients with normal liver function, treated on a prospective protocol with 3D conformal radiation therapy and intraarterial hepatic fluorodeoxyuridine, were analyzed with a new parallel architecture model. Patient treatment fell into six categories differing in doses delivered and volumes irradiated. By modeling the radiosensitivity of liver subunits, we are able to use dose volume histograms to calculate the fraction of the liver damaged in each patient. A complication results if this fraction exceeds the patient's functional reserve. To determine the patient distribution of functional reserves and the subunit radiosensitivity, the maximum likelihood method was used to fit the observed complication data. Results: The parallel model fit the complication data well, although uncertainties on the functional reserve distribution and subunit radiosensitivity are highly correlated. Conclusion: The observed radiation hepatitis complications show a threshold effect that can be described well with a parallel architecture model. However, additional independent studies are required to better determine the parameters defining the functional reserve distribution and subunit radiosensitivity.
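    As a hedged illustration of how a parallel-architecture NTCP calculation proceeds (the dose-response and functional-reserve parameters below are invented, not the fitted values from this study), the sketch computes the damaged liver fraction from a differential DVH and compares it with a normally distributed functional reserve.

```cpp
// Illustrative parallel-architecture NTCP sketch: the fraction of subunits
// damaged is computed from a differential DVH with a logistic subunit
// dose-response; a complication occurs if that fraction exceeds the
// patient's functional reserve, assumed normally distributed over the
// population.  All numbers are hypothetical.
#include <cmath>
#include <cstdio>
#include <utility>
#include <vector>

// Probability that a subunit receiving dose d (Gy) is damaged.
double subunit_damage(double d, double d50 = 40.0, double k = 3.0) {
    return 1.0 / (1.0 + std::pow(d50 / d, k));
}

// Standard normal CDF.
double phi(double x) { return 0.5 * std::erfc(-x / std::sqrt(2.0)); }

int main() {
    // Differential DVH: (dose in Gy, fractional volume) pairs - hypothetical.
    std::vector<std::pair<double, double>> dvh = {
        {10.0, 0.30}, {25.0, 0.25}, {40.0, 0.25}, {60.0, 0.20}};

    double damaged = 0.0;
    for (const auto& bin : dvh)
        damaged += bin.second * subunit_damage(bin.first);

    // Population distribution of functional reserve (mean, sd) - hypothetical.
    const double reserve_mean = 0.5, reserve_sd = 0.1;
    double ntcp = phi((damaged - reserve_mean) / reserve_sd);

    std::printf("damaged fraction = %.3f, NTCP = %.3f\n", damaged, ntcp);
    return 0;
}
```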

  10. Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters: Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, Brian B [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Purba, Victor [University of Minnesota; Jafarpour, Saber [University of California, Santa Barbara; Bullo, Francesco [University of California, Santa Barbara; Dhople, Sairaj [University of Minnesota

    2017-08-31

    Given that next-generation infrastructures will contain large numbers of grid-connected inverters and these interfaces will be satisfying a growing fraction of system load, it is imperative to analyze the impacts of power electronics on such systems. However, since each inverter model has a relatively large number of dynamic states, it would be impractical to execute complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. That is, we show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as an individual inverter in the paralleled system. Numerical simulations validate the reduced-order models.

  11. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    Energy Technology Data Exchange (ETDEWEB)

    Lichtner, Peter C. [OFM Research, Redmond, WA (United States); Hammond, Glenn E. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Lu, Chuan [Idaho National Lab. (INL), Idaho Falls, ID (United States); Karra, Satish [Los Alamos National Lab. (LANL), Los Alamos, NM (United States); Bisht, Gautam [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Andre, Benjamin [National Center for Atmospheric Research, Boulder, CO (United States); Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Mills, Richard [Intel Corporation, Portland, OR (United States); Univ. of Tennessee, Knoxville, TN (United States); Kumar, Jitendra [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2015-01-20

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 2^18 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2^32 = 4,294,967,296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.

  12. New physics beyond the standard model of particle physics and parallel universes

    Energy Technology Data Exchange (ETDEWEB)

    Plaga, R. [Franzstr. 40, 53111 Bonn (Germany)]. E-mail: rainer.plaga@gmx.de

    2006-03-09

    It is shown that if-and only if-'parallel universes' exist, an electroweak vacuum that is expected to have decayed since the big bang with a high probability might exist. It would neither necessarily render our existence unlikely nor could it be observed. In this special case the observation of certain combinations of Higgs-boson and top-quark masses-for which the standard model predicts such a decay-cannot be interpreted as evidence for new physics at low energy scales. The question of whether parallel universes exist is of interest to our understanding of the standard model of particle physics.

  13. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer.This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster

  14. A Review of Parallel Processing Approaches to Robot Kinematics and Jacobian

    OpenAIRE

    Henrich, Dominik; Karl, Joachim; Wörn, Heinz

    1997-01-01

    Due to continuously increasing demands in the area of advanced robot control, it became necessary to speed up the computation. One way to reduce the computation time is to distribute the computation onto several processing units. In this survey we present different approaches to parallel computation of robot kinematics and Jacobian. Thereby, we discuss both the forward and the reverse problem. We introduce a classification scheme and class...

  15. Data structures and languages in support of parallel image processing for astronomy

    International Nuclear Information System (INIS)

    Tanimoto, S.L.

    1985-01-01

    This paper discusses data structures and aspects of programming languages and systems that are relevant to image processing of astronomy data. Emphasis is on image processing computations, because this kind of data processing is an obvious candidate for parallelism and is important in astronomy. However, some discussion of more general possibilities is also presented. The role of algorithms is examined, since they are not dependent on a particular language. As an implementation of an algorithm, a program is equally tied to data structure, operations, architecture and language, and therefore the issue of programming resides at the center of this tetrahedron.

  16. Connectionist Models and Parallelism in High Level Vision.

    Science.gov (United States)

    1985-01-01

    Feldman, Jerome A. (Grant N00014-82-K-0193.) Computer science is just beginning to look seriously at parallel computation: it may turn out that ... the chair. The program includes intermediate-level networks that compute more complex joints and ones that compute parallelograms in the image. These ...

  17. Design of parallel intersector weld/cut robot for machining processes in ITER vacuum vessel

    International Nuclear Information System (INIS)

    Wu Huapeng; Handroos, Heikki; Kovanen, Janne; Rouvinen, Asko; Hannukainen, Petri; Saira, Tanja; Jones, Lawrence

    2003-01-01

    This paper presents a new parallel robot Penta-WH, which has five degrees of freedom driven by hydraulic cylinders. The manipulator has a large, singularity-free workspace and high stiffness and it acts as a transport device for welding, machining and inspection end-effectors inside the ITER vacuum vessel. The presented kinematic structure of a parallel robot is particularly suitable for the ITER environment. Analysis of the machining process for ITER, such as the machining methods and forces are given, and the kinematic analyses, such as workspace and force capacity are discussed

  18. Resistance to awareness of the supervisor's transferences with special reference to the parallel process.

    Science.gov (United States)

    Stimmel, B

    1995-06-01

    Supervision is an essential part of psychoanalytic education. Although not taken for granted, it is not studied with the same critical eye as is the analytic process. This paper examines the supervision specifically with a focus on the supervisor's transference towards the supervisee. The point is made, in the context of clinical examples, that one of the ways these transference reactions may be rationalised is within the setting of the parallel process so often encountered in supervision. Parallel process, a very familiar term, is used frequently and easily when discussing supervision. It may be used also as a resistance to awareness of transference phenomena within the supervisor in relation to the supervisee, particularly because of its clinical presentation. It is an enactment between supervisor and supervisee, thus ripe with possibilities for disguise, displacement and gratification. While transference reactions of the supervisee are often discussed, those of the supervisor are notably missing in our literature.

  19. Reconstruction for Time-Domain In Vivo EPR 3D Multigradient Oximetric Imaging—A Parallel Processing Perspective

    Directory of Open Access Journals (Sweden)

    Christopher D. Dharmaraj

    2009-01-01

    Full Text Available Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speedup factor of 46.66 compared with the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23×23×23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

  20. Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

    Science.gov (United States)

    Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time.

  1. Using Hadoop MapReduce for Parallel Genetic Algorithms: A Comparison of the Global, Grid and Island Models.

    Science.gov (United States)

    Ferrucci, Filomena; Salza, Pasquale; Sarro, Federica

    2017-06-29

    The need to improve the scalability of Genetic Algorithms (GAs) has motivated the research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, the parallel GAs solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs' computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the use of PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store
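    The sketch below illustrates the island model itself on the simple OneMax problem, with OpenMP threads standing in for Hadoop map tasks: islands evolve independently and periodically exchange their best individual around a ring. The population sizes, operators, and migration interval are arbitrary choices, not those used in the study.

```cpp
// Compact sketch of an island-model genetic algorithm on OneMax (maximize
// the number of 1-bits): islands evolve in parallel and exchange their best
// individual in a ring every few generations.
#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

using Individual = std::vector<int>;

int fitness(const Individual& x) {
    int s = 0; for (int b : x) s += b; return s;
}

int main() {
    const int islands = 4, pop = 40, bits = 100;
    const int generations = 200, migrate_every = 20;
    std::vector<std::vector<Individual>> isle(
        islands, std::vector<Individual>(pop, Individual(bits, 0)));
    std::vector<std::mt19937> rng;
    for (int i = 0; i < islands; ++i) rng.emplace_back(1000 + i);

    // Random initial populations, one RNG per island.
    for (int i = 0; i < islands; ++i)
        for (auto& ind : isle[i])
            for (auto& b : ind) b = rng[i]() % 2;

    for (int g = 0; g < generations; ++g) {
        #pragma omp parallel for
        for (int i = 0; i < islands; ++i) {
            std::uniform_int_distribution<int> pick(0, pop - 1), bit(0, bits - 1);
            std::vector<Individual> next;
            for (int c = 0; c < pop; ++c) {
                // Binary tournament selection of a parent.
                auto parent = [&]() -> const Individual& {
                    const Individual& a = isle[i][pick(rng[i])];
                    const Individual& b = isle[i][pick(rng[i])];
                    return fitness(a) >= fitness(b) ? a : b;
                };
                Individual child = parent();
                const Individual& other = parent();
                int cut = bit(rng[i]);                       // one-point crossover
                std::copy(other.begin() + cut, other.end(), child.begin() + cut);
                child[bit(rng[i])] ^= 1;                     // one-bit mutation
                next.push_back(std::move(child));
            }
            isle[i] = std::move(next);
        }
        // Ring migration: copy each island's best into its neighbour.
        if ((g + 1) % migrate_every == 0) {
            std::vector<Individual> best(islands);
            for (int i = 0; i < islands; ++i)
                best[i] = *std::max_element(isle[i].begin(), isle[i].end(),
                    [](const Individual& a, const Individual& b) {
                        return fitness(a) < fitness(b); });
            for (int i = 0; i < islands; ++i)
                isle[(i + 1) % islands][0] = best[i];
        }
    }
    int global_best = 0;
    for (const auto& pop_i : isle)
        for (const auto& ind : pop_i)
            global_best = std::max(global_best, fitness(ind));
    std::printf("best OneMax fitness: %d / %d bits\n", global_best, bits);
    return 0;
}
```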

  2. Final Report: Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    Mellor-Crummey, John [William Marsh Rice University

    2011-09-13

    As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.

  3. Interaction Admittance Based Modeling of Multi-Paralleled Grid-Connected Inverter with LCL-Filter

    DEFF Research Database (Denmark)

    Lu, Minghui; Blaabjerg, Frede; Wang, Xiongfei

    2016-01-01

    This paper investigates the mutual interaction and stability issues of multi-parallel LCL-filtered inverters. The stability and power quality of multiple grid-tied inverters are gaining more and more research attention as the penetration of renewables increases. In this paper, interactions and coupling effects among the multi-paralleled inverters and the power grid are explicitly revealed. An Interaction Admittance concept is introduced to express and model the interaction through the physical admittances of the network. Compared to the existing modeling methods, the proposed analysis provides...

  4. PVeStA: A Parallel Statistical Model Checking and Quantitative Analysis Tool

    KAUST Repository

    AlTurki, Musab

    2011-01-01

    Statistical model checking is an attractive formal analysis method for probabilistic systems such as, for example, cyber-physical systems which are often probabilistic in nature. This paper is about drastically increasing the scalability of statistical model checking, and making such scalability of analysis available to tools like Maude, where probabilistic systems can be specified at a high level as probabilistic rewrite theories. It presents PVeStA, an extension and parallelization of the VeStA statistical model checking tool [10]. PVeStA supports statistical model checking of probabilistic real-time systems specified as either: (i) discrete or continuous Markov Chains; or (ii) probabilistic rewrite theories in Maude. Furthermore, the properties that it can model check can be expressed in either: (i) PCTL/CSL, or (ii) the QuaTEx quantitative temporal logic. As our experiments show, the performance gains obtained from parallelization can be very high. © 2011 Springer-Verlag.

  5. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking, which leads to efficient solutions to problems on trees, such as computing lowest common ancestors. The algorithms achieve an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts.

  6. Stage-by-Stage and Parallel Flow Path Compressor Modeling for a Variable Cycle Engine

    Science.gov (United States)

    Kopasakis, George; Connolly, Joseph W.; Cheng, Larry

    2015-01-01

    This paper covers the development of stage-by-stage and parallel flow path compressor modeling approaches for a Variable Cycle Engine. The stage-by-stage compressor modeling approach is an extension of a technique for lumped volume dynamics and performance characteristic modeling. It was developed to improve the accuracy of axial compressor dynamics over lumped volume dynamics modeling. The stage-by-stage compressor model presented here is formulated into a parallel flow path model that includes both axial and rotational dynamics. This is done to enable the study of compressor and propulsion system dynamic performance under flow distortion conditions. The approaches utilized here are generic and should be applicable for the modeling of any axial flow compressor design.

  7. Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, Brian B [National Renewable Energy Laboratory (NREL), Golden, CO (United States); Purba, Victor [University of Minnesota; Jafarpour, Saber [University of California Santa-Barbara; Bullo, Francesco [University of California Santa-Barbara; Dhople, Sairaj V. [University of Minnesota

    2017-08-21

    Next-generation power networks will contain large numbers of grid-connected inverters satisfying a significant fraction of system load. Since each inverter model has a relatively large number of dynamic states, it is impractical to analyze complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model with lumped parameters for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. We show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as any individual inverter in the system. Numerical simulations validate the reduced-order model.
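    The following small numerical check illustrates the aggregation idea under the stated scaling assumption: driven by the same bridge voltage, N identical LCL-filtered inverters inject the same total grid current as a single aggregate inverter whose inductances are divided by N and whose filter capacitance is multiplied by N. The component values are illustrative, not taken from the paper.

```cpp
// Numerical sanity check of the parameter-scaling aggregation: compare N
// identical parallel LCL branches against one aggregate branch with L/N and
// C*N.  The stiff grid is modelled as a short circuit.
#include <complex>
#include <cstdio>

using cplx = std::complex<double>;

// Grid-side current of one LCL branch for a given bridge voltage phasor.
cplx grid_current(cplx v_bridge, double L1, double C, double L2, double omega) {
    const cplx j(0.0, 1.0);
    cplx Z1 = j * omega * L1, Z2 = j * omega * L2, Zc = 1.0 / (j * omega * C);
    cplx Zpar = Zc * Z2 / (Zc + Z2);          // capacitor in parallel with L2
    cplx i_bridge = v_bridge / (Z1 + Zpar);   // current leaving the bridge
    return i_bridge * Zc / (Zc + Z2);         // current divider into the grid
}

int main() {
    const int N = 10;
    const double L1 = 1.8e-3, C = 9.0e-6, L2 = 0.9e-3;   // per-inverter values
    const double omega = 2.0 * 3.141592653589793 * 50.0; // 50 Hz
    const cplx v(325.0, 0.0);                             // bridge voltage phasor

    cplx sum_parallel = double(N) * grid_current(v, L1, C, L2, omega);
    cplx aggregate    = grid_current(v, L1 / N, C * N, L2 / N, omega);

    std::printf("N parallel units : |I| = %.4f A\n", std::abs(sum_parallel));
    std::printf("aggregate model  : |I| = %.4f A\n", std::abs(aggregate));
    return 0;
}
```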

  8. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; The ATLAS collaboration; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows the sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service), which allows running AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the...

  9. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    CERN Document Server

    Calafiura, Paolo; Seuster, Rolf; Tsulaia, Vakhtang; van Gemmeren, Peter

    2015-01-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows to run AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of Ath...

  10. Accelerating population balance-Monte Carlo simulation for coagulation dynamics from the Markov jump model, stochastic algorithm and GPU parallel computing

    International Nuclear Information System (INIS)

    Xu, Zuwei; Zhao, Haibo; Zheng, Chuguang

    2015-01-01

    This paper proposes a comprehensive framework for accelerating population balance-Monte Carlo (PBMC) simulation of particle coagulation dynamics. By combining Markov jump model, weighted majorant kernel and GPU (graphics processing unit) parallel computing, a significant gain in computational efficiency is achieved. The Markov jump model constructs a coagulation-rule matrix of differentially-weighted simulation particles, so as to capture the time evolution of particle size distribution with low statistical noise over the full size range and as far as possible to reduce the number of time loopings. Here three coagulation rules are highlighted and it is found that constructing appropriate coagulation rule provides a route to attain the compromise between accuracy and cost of PBMC methods. Further, in order to avoid double looping over all simulation particles when considering the two-particle events (typically, particle coagulation), the weighted majorant kernel is introduced to estimate the maximum coagulation rates being used for acceptance–rejection processes by single-looping over all particles, and meanwhile the mean time-step of coagulation event is estimated by summing the coagulation kernels of rejected and accepted particle pairs. The computational load of these fast differentially-weighted PBMC simulations (based on the Markov jump model) is reduced greatly to be proportional to the number of simulation particles in a zero-dimensional system (single cell). Finally, for a spatially inhomogeneous multi-dimensional (multi-cell) simulation, the proposed fast PBMC is performed in each cell, and multiple cells are parallel processed by multi-cores on a GPU that can implement the massively threaded data-parallel tasks to obtain remarkable speedup ratio (comparing with CPU computation, the speedup ratio of GPU parallel computing is as high as 200 in a case of 100 cells with 10 000 simulation particles per cell). These accelerating approaches of PBMC are

  11. Modeling, analysis, and design of stationary reference frame droop controlled parallel three-phase voltage source inverters

    DEFF Research Database (Denmark)

    Vasquez, Juan Carlos; Guerrero, Josep M.; Savaghebi, Mehdi

    2013-01-01

    Power electronics based MicroGrids consist of a number of voltage source inverters (VSIs) operating in parallel. In this paper, the modeling, control design, and stability analysis of parallel connected three-phase VSIs are derived. The proposed voltage and current inner control loops and the ... control restores the frequency and amplitude deviations produced by the primary control. Also, a synchronization algorithm is presented in order to connect the MicroGrid to the grid. Experimental results are provided to validate the performance and robustness of the parallel VSI system control...

  12. Methods and models for the construction of weakly parallel tests

    NARCIS (Netherlands)

    Adema, J.J.; Adema, Jos J.

    1992-01-01

    Several methods are proposed for the construction of weakly parallel tests [i.e., tests with the same test information function (TIF)]. A mathematical programming model that constructs tests containing a prespecified TIF and a heuristic that assigns items to tests with information functions that are

  13. Methods and models for the construction of weakly parallel tests

    NARCIS (Netherlands)

    Adema, J.J.; Adema, Jos J.

    1990-01-01

    Methods are proposed for the construction of weakly parallel tests, that is, tests with the same test information function. A mathematical programing model for constructing tests with a prespecified test information function and a heuristic for assigning items to tests such that their information

  14. Parallel k-means++

    Energy Technology Data Exchange (ETDEWEB)

    2017-04-04

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
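
    Before parallelizing, it helps to see the sequential seeding step that is being targeted. The sketch below is an illustrative one-dimensional version of D²-weighted k-means++ seed selection, not the authors' C++/CUDA/OpenMP/XMT code.

        # Sequential k-means++ seeding on 1-D data (illustrative sketch).
        import random

        def kmeans_pp_seeds(points, k):
            seeds = [random.choice(points)]
            while len(seeds) < k:
                # Squared distance from each point to its nearest chosen seed;
                # points far from all current seeds get a larger selection weight.
                d2 = [min((p - s) ** 2 for s in seeds) for p in points]
                r = random.uniform(0.0, sum(d2))
                acc = 0.0
                for point, weight in zip(points, d2):
                    acc += weight
                    if acc >= r:
                        seeds.append(point)
                        break
            return seeds

        data = [random.gauss(mu, 0.5) for mu in (0.0, 5.0, 10.0) for _ in range(100)]
        print(sorted(kmeans_pp_seeds(data, 3)))

    The distance computation inside the loop is the part that parallelizes naturally, since each point's distance to the current seed set is independent of every other point's.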

  15. Parallelization of a Quantum-Classic Hybrid Model For Nanoscale Semiconductor Devices

    Directory of Open Access Journals (Sweden)

    Oscar Salas

    2011-07-01

    Full Text Available The expensive reengineering of sequential software and the difficulty of parallel programming are two of the many technical and economic obstacles to the wide use of HPC. We investigate the possibility of rapidly improving the performance of a serial numerical code for the simulation of charged-carrier transport in a Double-Gate MOSFET. We introduce the Drift-Diffusion-Schrödinger-Poisson (DDSP) model and we study a rapid parallelization strategy for the numerical procedure on shared memory architectures.

  16. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    Energy Technology Data Exchange (ETDEWEB)

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU’s application-specific architecture, harnessing the GPU’s computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000 x 1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.
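
    For orientation, the core computation being accelerated is a truncated SVD of the term-document matrix. A minimal CPU-side sketch with NumPy and random stand-in data (not the record's CUDA/CUBLAS implementation) looks like this:

        # LSA core: truncated SVD of a term-document matrix (illustrative sketch).
        import numpy as np

        rng = np.random.default_rng(0)
        term_doc = rng.random((1000, 1000))        # rows: terms, columns: documents

        k = 100                                    # number of latent dimensions kept
        u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
        doc_vectors = (np.diag(s[:k]) @ vt[:k]).T  # documents in the k-dim latent space

        print(doc_vectors.shape)                   # (1000, 100)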

  17. Flexible parallel implicit modelling of coupled thermal-hydraulic-mechanical processes in fractured rocks

    Science.gov (United States)

    Cacace, Mauro; Jacquey, Antoine B.

    2017-09-01

    Theory and numerical implementation describing groundwater flow and the transport of heat and solute mass in fully saturated fractured rocks with elasto-plastic mechanical feedbacks are developed. In our formulation, fractures are considered as being of lower dimension than the hosting deformable porous rock and we consider their hydraulic and mechanical apertures as scaling parameters to ensure continuous exchange of fluid mass and energy within the fracture-solid matrix system. The coupled system of equations is implemented in a new simulator code that makes use of a Galerkin finite-element technique. The code builds on a flexible, object-oriented numerical framework (MOOSE, Multiphysics Object Oriented Simulation Environment) which provides an extensive scalable parallel and implicit coupling to solve for the multiphysics problem. The governing equations of groundwater flow, heat and mass transport, and rock deformation are solved in a weak sense (either by classical Newton-Raphson or by free Jacobian inexact Newton-Krylow schemes) on an underlying unstructured mesh. Nonlinear feedbacks among the active processes are enforced by considering evolving fluid and rock properties depending on the thermo-hydro-mechanical state of the system and the local structure, i.e. degree of connectivity, of the fracture system. A suite of applications is presented to illustrate the flexibility and capability of the new simulator to address problems of increasing complexity and occurring at different spatial (from centimetres to tens of kilometres) and temporal scales (from minutes to hundreds of years).

  18. Flexible parallel implicit modelling of coupled thermal–hydraulic–mechanical processes in fractured rocks

    Directory of Open Access Journals (Sweden)

    M. Cacace

    2017-09-01

    Full Text Available Theory and numerical implementation describing groundwater flow and the transport of heat and solute mass in fully saturated fractured rocks with elasto-plastic mechanical feedbacks are developed. In our formulation, fractures are considered as being of lower dimension than the hosting deformable porous rock and we consider their hydraulic and mechanical apertures as scaling parameters to ensure continuous exchange of fluid mass and energy within the fracture–solid matrix system. The coupled system of equations is implemented in a new simulator code that makes use of a Galerkin finite-element technique. The code builds on a flexible, object-oriented numerical framework (MOOSE, Multiphysics Object Oriented Simulation Environment) which provides an extensive scalable parallel and implicit coupling to solve for the multiphysics problem. The governing equations of groundwater flow, heat and mass transport, and rock deformation are solved in a weak sense (either by classical Newton–Raphson or by free Jacobian inexact Newton–Krylow schemes) on an underlying unstructured mesh. Nonlinear feedbacks among the active processes are enforced by considering evolving fluid and rock properties depending on the thermo-hydro-mechanical state of the system and the local structure, i.e. degree of connectivity, of the fracture system. A suite of applications is presented to illustrate the flexibility and capability of the new simulator to address problems of increasing complexity and occurring at different spatial (from centimetres to tens of kilometres) and temporal scales (from minutes to hundreds of years).

  19. Passive and partially active fault tolerance for massively parallel stream processing engines

    DEFF Research Database (Denmark)

    Su, Li; Zhou, Yongluan

    2018-01-01

    On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE) ... also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE, and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness...

  20. Parallel, multi-stage processing of colors, faces and shapes in macaque inferior temporal cortex

    Science.gov (United States)

    Lafer-Sousa, Rosa; Conway, Bevil R.

    2014-01-01

    Visual-object processing culminates in inferior temporal (IT) cortex. To assess the organization of IT, we measured fMRI responses in alert monkey to achromatic images (faces, fruit, bodies, places) and colored gratings. IT contained multiple color-biased regions, which were typically ventral to face patches and, remarkably, yoked to them, spaced regularly at four locations predicted by known anatomy. Color and face selectivity increased for more anterior regions, indicative of a broad hierarchical arrangement. Responses to non-face shapes were found across IT, but were stronger outside color-biased regions and face patches, consistent with multiple parallel streams. IT also contained multiple coarse eccentricity maps: face patches overlapped central representations; color-biased regions spanned mid-peripheral representations; and place-biased regions overlapped peripheral representations. These results suggest that IT comprises parallel, multi-stage processing networks subject to one organizing principle. PMID:24141314

  1. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    Science.gov (United States)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  2. Mathematical Model of Thyristor Inverter Including a Series-parallel Resonant Circuit

    OpenAIRE

    Miroslaw Luft; Elzbieta Szychta

    2008-01-01

    The article presents a mathematical model of a thyristor inverter including a series-parallel resonant circuit with the aid of the state variable method. Maple procedures are used to compute current and voltage waveforms in the inverter.

  3. Parallel embedded systems: where real-time and low-power meet

    DEFF Research Database (Denmark)

    Karakehayov, Zdravko; Guo, Yu

    2008-01-01

    This paper introduces a combination of models and proofs for optimal power management via Dynamic Frequency Scaling and Dynamic Voltage Scaling. The approach is suitable for systems on a chip or microcontrollers where processors run in parallel with embedded peripherals. We have developed a software tool, called CASTLE, to provide computer assistance in the design process of energy-aware embedded systems. The tool considers single processor and parallel architectures. An example shows an energy reduction of 23% when the tool allocates two microcontrollers for parallel execution.
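
    The energy argument behind dynamic voltage and frequency scaling can be made concrete with a back-of-envelope sketch. The numbers and the simple CV² dynamic-energy model below are assumptions for illustration, not the CASTLE tool's models.

        # Rough dynamic-energy comparison for one task under voltage scaling.
        # Energy ~ C * V^2 * cycles; a lower clock permits a lower supply voltage,
        # so the same number of cycles costs less energy (leakage ignored).
        def task_energy(cycles, volts, switched_cap=1e-9):
            return switched_cap * volts ** 2 * cycles

        CYCLES = 2_000_000
        full_speed = task_energy(CYCLES, volts=3.3)   # run fast, then idle
        half_speed = task_energy(CYCLES, volts=1.8)   # half clock, reduced voltage
        print(f"full speed: {full_speed * 1e6:.0f} uJ, half speed: {half_speed * 1e6:.0f} uJ")
        print(f"energy saving: {100 * (1 - half_speed / full_speed):.0f}%")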

  4. A model of breakdown in parallel-plate detectors

    International Nuclear Information System (INIS)

    Fonte, P.

    1996-01-01

    Parallel-plate avalanche chambers (PPACs) have many desirable properties, notably being fast, large-area particle detectors. However, the maximum gain is limited by a form of violent breakdown that limits the usefulness of this detector, despite its other evident qualities. The exact nature of this phenomenon is not yet sufficiently clear to sustain possible improvements. A previous experimental study is complemented in the present work by a quantitative model of the breakdown phenomenon in PPACs, based on the streamer theory. The model reproduces well the peculiar behavior of the external current observed in PPACs and resistive-plate chambers. Other breakdown properties measured in PPACs are also well reproduced

  5. Parallel Algorithm for Solving TOV Equations for Sequence of Cold and Dense Nuclear Matter Models

    Science.gov (United States)

    Ayriyan, Alexander; Buša, Ján; Grigorian, Hovik; Poghosyan, Gevorg

    2018-04-01

    We introduce a parallel algorithm for the simulation of neutron star configurations for a set of equation of state (EoS) models. The performance of the parallel algorithm has been investigated for a testing set of EoS models on two computational systems. It scales when used with MPI on modern CPUs, and this investigation also allowed us to compare two different types of computational nodes.
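
    Since the per-EoS integrations are independent, this kind of parallelization is naturally expressed as a pool of workers, one TOV integration per EoS model. The sketch below uses a simple polytrope in geometric units (G = c = 1) as a stand-in EoS; all parameter values are illustrative assumptions, not the authors' setup.

        # One TOV integration per EoS model, distributed over a process pool.
        import numpy as np
        from multiprocessing import Pool
        from scipy.integrate import solve_ivp

        GAMMA = 2.0                                   # polytropic exponent (assumed)

        def tov_rhs(r, y, k_poly):
            m, p = y
            if p <= 0.0:
                return [0.0, 0.0]
            eps = (p / k_poly) ** (1.0 / GAMMA)       # invert P = K * eps**GAMMA
            dm = 4.0 * np.pi * r**2 * eps
            dp = -(eps + p) * (m + 4.0 * np.pi * r**3 * p) / (r * (r - 2.0 * m))
            return [dm, dp]

        def surface(r, y, k_poly):                    # event: pressure reaches ~zero
            return y[1] - 1e-12
        surface.terminal = True

        def solve_star(k_poly, p_central=1e-3):
            sol = solve_ivp(tov_rhs, (1e-6, 50.0), [0.0, p_central],
                            args=(k_poly,), events=surface, max_step=0.01)
            return k_poly, sol.t[-1], sol.y[0, -1]    # EoS parameter, radius, mass

        if __name__ == "__main__":
            eos_models = [50.0, 100.0, 150.0, 200.0]  # one polytropic constant per "EoS"
            with Pool() as pool:
                for k, radius, mass in pool.map(solve_star, eos_models):
                    print(f"K={k:6.1f}  R={radius:7.3f}  M={mass:8.5f}")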

  6. Minimizing makespan in a two-stage flow shop with parallel batch-processing machines and re-entrant jobs

    Science.gov (United States)

    Huang, J. D.; Liu, J. J.; Chen, Q. X.; Mao, N.

    2017-06-01

    Against a background of heat-treatment operations in mould manufacturing, a two-stage flow-shop scheduling problem is described for minimizing makespan with parallel batch-processing machines and re-entrant jobs. The weights and release dates of jobs are non-identical, but job processing times are equal. A mixed-integer linear programming model is developed and tested with small-scale scenarios. Given that the problem is NP-hard, three heuristic construction methods with polynomial complexity are proposed. The worst case of the new constructive heuristic is analysed in detail. A method for computing lower bounds is proposed to test heuristic performance. Heuristic efficiency is tested with sets of scenarios. Compared with the two improved heuristics, the performance of the new constructive heuristic is superior.

  7. Bessel functions: parallel display and processing.

    Science.gov (United States)

    Lohmann, A W; Ojeda-Castañeda, J; Serrano-Heredia, A

    1994-01-01

    We present an optical setup that converts planar binary curves into two-dimensional amplitude distributions, which are proportional, along one axis, to the Bessel function of order n, whereas along the other axis the order n increases. This Bessel displayer can be used for parallel Bessel transformation of a signal. Experimental verifications are included.

  8. Computer-Aided Parallelizer and Optimizer

    Science.gov (United States)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  9. Vlasov modelling of parallel transport in a tokamak scrape-off layer

    International Nuclear Information System (INIS)

    Manfredi, G; Hirstoaga, S; Devaux, S

    2011-01-01

    A one-dimensional Vlasov-Poisson model is used to describe the parallel transport in a tokamak scrape-off layer. Thanks to a recently developed 'asymptotic-preserving' numerical scheme, it is possible to lift numerical constraints on the time step and grid spacing, which are no longer limited by, respectively, the electron plasma period and Debye length. The Vlasov approach provides a good velocity-space resolution even in regions of low density. The model is applied to the study of parallel transport during edge-localized modes, with particular emphasis on the particles and energy fluxes on the divertor plates. The numerical results are compared with analytical estimates based on a free-streaming model, with good general agreement. An interesting feature is the observation of an early electron energy flux, due to suprathermal electrons escaping the ions' attraction. In contrast, the long-time evolution is essentially quasi-neutral and dominated by the ion dynamics.

  10. Vlasov modelling of parallel transport in a tokamak scrape-off layer

    Energy Technology Data Exchange (ETDEWEB)

    Manfredi, G [Institut de Physique et Chimie des Materiaux, CNRS and Universite de Strasbourg, BP 43, F-67034 Strasbourg (France); Hirstoaga, S [INRIA Nancy Grand-Est and Institut de Recherche en Mathematiques Avancees, 7 rue Rene Descartes, F-67084 Strasbourg (France); Devaux, S, E-mail: Giovanni.Manfredi@ipcms.u-strasbg.f, E-mail: hirstoaga@math.unistra.f, E-mail: Stephane.Devaux@ccfe.ac.u [JET-EFDA, Culham Science Centre, Abingdon, OX14 3DB (United Kingdom)

    2011-01-15

    A one-dimensional Vlasov-Poisson model is used to describe the parallel transport in a tokamak scrape-off layer. Thanks to a recently developed 'asymptotic-preserving' numerical scheme, it is possible to lift numerical constraints on the time step and grid spacing, which are no longer limited by, respectively, the electron plasma period and Debye length. The Vlasov approach provides a good velocity-space resolution even in regions of low density. The model is applied to the study of parallel transport during edge-localized modes, with particular emphasis on the particles and energy fluxes on the divertor plates. The numerical results are compared with analytical estimates based on a free-streaming model, with good general agreement. An interesting feature is the observation of an early electron energy flux, due to suprathermal electrons escaping the ions' attraction. In contrast, the long-time evolution is essentially quasi-neutral and dominated by the ion dynamics.

  11. Mathematical model of thyristor inverter including a series-parallel resonant circuit

    OpenAIRE

    Luft, M.; Szychta, E.

    2008-01-01

    The article presents a mathematical model of a thyristor inverter including a series-parallel resonant circuit with the aid of the state variable method. Maple procedures are used to compute current and voltage waveforms in the inverter.

  12. Parallel shooting methods for finding steady state solutions to engine simulation models

    DEFF Research Database (Denmark)

    Andersen, Stig Kildegård; Thomsen, Per Grove; Carlsen, Henrik

    2007-01-01

    Parallel single- and multiple shooting methods were tested for finding periodic steady state solutions to a Stirling engine model. The model was used to illustrate features of the methods and possibilities for optimisations. Performance was measured using simulation of an experimental data set...
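
    The idea behind (single) shooting for a periodic steady state is to search for an initial state that maps to itself after one cycle. The sketch below applies it to a toy forced, damped oscillator rather than a Stirling engine model; multiple shooting would split the period into segments whose integrations can run in parallel.

        # Single shooting for a periodic steady state of a toy forced oscillator.
        import numpy as np
        from scipy.integrate import solve_ivp
        from scipy.optimize import fsolve

        OMEGA = 2.0 * np.pi                     # forcing frequency, so the period T = 1
        T = 2.0 * np.pi / OMEGA

        def rhs(t, y):
            x, v = y
            return [v, -0.2 * v - x + np.cos(OMEGA * t)]

        def one_period(x0):
            sol = solve_ivp(rhs, (0.0, T), x0, rtol=1e-9, atol=1e-9)
            return sol.y[:, -1]

        def residual(x0):
            # Periodic steady state: the state after one period equals the initial state.
            return one_period(x0) - x0

        x_star = fsolve(residual, np.zeros(2))
        print("periodic initial state:", x_star)
        print("residual norm:", np.linalg.norm(residual(x_star)))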

  13. A review of advanced small-scale parallel bioreactor technology for accelerated process development: current state and future need.

    Science.gov (United States)

    Bareither, Rachel; Pollard, David

    2011-01-01

    The pharmaceutical and biotech industries face continued pressure to reduce development costs and accelerate process development. This challenge occurs alongside the need for increased upstream experimentation to support quality by design initiatives and the pursuit of predictive models from systems biology. A small-scale system enabling multiple reactions in parallel (n ≥ 20), with automated sampling and integration with purification, would provide a significant improvement (four- to fivefold) to development timelines. State-of-the-art attempts to pursue high-throughput process development include shake flasks, microfluidic reactors, microtiter plates and small-scale stirred reactors. The limitations of these systems are compared to the criteria desired to mimic large-scale commercial processes. The comparison shows that significant technological improvement is still required to provide automated solutions that can speed upstream process development. Copyright © 2010 American Institute of Chemical Engineers (AIChE).

  14. Encouraging early preventive dental visits for preschool-aged children enrolled in Medicaid: using the extended parallel process model to conduct formative research.

    Science.gov (United States)

    Askelson, Natoshia M; Chi, Donald L; Momany, Elizabeth; Kuthy, Raymond; Ortiz, Cristina; Hanson, Jessica D; Damiano, Peter

    2014-01-01

    Preventive dental visits for preschool-aged children can result in better oral health outcomes, especially for children from lower income families. Many children, however, still do not see a dentist for preventive visits. This qualitative study examined the potential for the Extended Parallel Process Model (EPPM) to be used to uncover potential antecedents to parents' decisions about seeking preventive dental care. Seventeen focus groups including 41 parents were conducted. The focus group protocol centered on constructs (perceived severity, perceived susceptibility, perceived self-efficacy, and perceived response efficacy) of the EPPM. Transcripts were analyzed by three coders who employed closed coding strategies. Parents' perceptions of severity of dental issues were high, particularly regarding negative health and appearance outcomes. Parents perceived susceptibility of their children to dental problems as low, primarily because most children in this study received preventive care, which parents viewed as highly efficacious. Parents' self-efficacy to obtain preventive care for their children was high. However, they were concerned about barriers including lack of dentists, especially dentists who are good with young children. Findings were consistent with EPPM, which suggests this model is a potential tool for understanding parents' decisions about seeking preventive dental care for their young children. Future research should utilize quantitative methods to test this model. © 2012 American Association of Public Health Dentistry.

  15. Hybrid parallel computing architecture for multiview phase shifting

    Science.gov (United States)

    Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

    2014-11-01

    The multiview phase-shifting method shows its powerful capability in achieving high-resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can cooperate with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented in the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4 GHz Intel Core i7 3770.

  16. Mathematical Model of Thyristor Inverter Including a Series-parallel Resonant Circuit

    Directory of Open Access Journals (Sweden)

    Miroslaw Luft

    2008-01-01

    Full Text Available The article presents a mathematical model of a thyristor inverter including a series-parallel resonant circuit with the aid of the state variable method. Maple procedures are used to compute current and voltage waveforms in the inverter.

  17. Parallelizing the spectral transform method: A comparison of alternative parallel algorithms

    International Nuclear Information System (INIS)

    Foster, I.; Worley, P.H.

    1993-01-01

    The spectral transform method is a standard numerical technique for solving partial differential equations on the sphere and is widely used in global climate modeling. In this paper, we outline different approaches to parallelizing the method and describe experiments that we are conducting to evaluate the efficiency of these approaches on parallel computers. The experiments are conducted using a testbed code that solves the nonlinear shallow water equations on a sphere, but are designed to permit evaluation in the context of a global model. They allow us to evaluate the relative merits of the approaches as a function of problem size and number of processors. The results of this study are guiding ongoing work on PCCM2, a parallel implementation of the Community Climate Model developed at the National Center for Atmospheric Research

  18. Efficient parallel simulation of CO2 geologic sequestration in saline aquifers

    International Nuclear Information System (INIS)

    Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

    2007-01-01

    An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2

  19. Parallel R-matrix computation

    International Nuclear Information System (INIS)

    Heggarty, J.W.

    1999-06-01

    For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups from around the world to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high-performance multicomputers that have emerged over the last decade, which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers being widely acknowledged as a difficult, time-consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in

  20. Does the extended parallel process model fear appeal theory explain fears and barriers to prenatal physical activity?

    Science.gov (United States)

    Redmond, Michelle L; Dong, Fanglong; Frazier, Linda M

    2015-01-01

    Few studies have looked at the impact of fear on exercise behavior during pregnancy using a fear appeal theory. It is beneficial to understand how women receive the message of safe exercise during pregnancy and whether established guidelines have any influence on their decision to exercise. Using the extended parallel process model (EPPM), we explored women's fears about prenatal physical activity. We conducted a prospective, cross-sectional study on the fears and barriers to prenatal exercise among a racially/ethnically diverse population of pregnant women. Participants were recruited from local prenatal clinics. Ninety females with a singleton pregnancy between 16 and 30 weeks gestation were enrolled in the study. The primary outcome measure was classification of risk behavior based on the EPPM theory. Women who scored high on self-efficacy for exercising safely were more likely to exercise during pregnancy (adjusted odds ratio, 5.95; 95% CI, 1.39-25.39; P=.016) for at least 90 minutes per week. Participants who exercised at least 90 minutes per week during pregnancy scored higher on their perceived ability to control danger to the baby, and lower on perceived susceptibility to harm and threat to the baby from moderate prenatal exercise. More education and counseling on specific guidelines for safely exercising during pregnancy are needed. The EPPM framework has the potential to help improve health communications about exercise safety and guidelines between patients and health care professionals during pregnancy. Copyright © 2015 Jacobs Institute of Women's Health. Published by Elsevier Inc. All rights reserved.

  1. A parallel model for SQL astronomical databases based on solid state storage. Application to the Gaia Archive PostgreSQL database

    Science.gov (United States)

    González-Núñez, J.; Gutiérrez-Sánchez, R.; Salgado, J.; Segovia, J. C.; Merín, B.; Aguado-Agelet, F.

    2017-07-01

    Query planning and optimisation algorithms in most popular relational databases were developed at a time when hard disk drives were the only storage technology available. The advent of devices with higher parallel random-access capacity, such as solid-state disks, opens up the way for intra-machine parallel computing over large datasets. We describe a two-phase parallel model for the implementation of heavy analytical processes in single-instance PostgreSQL astronomical databases. This model is particularised to two frequent astronomical problems, density maps and crossmatch computation with Quad Tree Cube (Q3C) indexes. They are implemented as part of the relational database infrastructure for the Gaia Archive and their performance is assessed. An improvement by a factor of 28.40 in comparison to sequential execution is observed in the reference implementation for a histogram computation. Speedup ratios of 3.7 and 4.0 are attained for the reference positional crossmatches considered. We observe large performance enhancements over sequential execution for both CPU- and disk-access-intensive computations, suggesting these methods might be useful with the growing data volumes in Astronomy.
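
    The density-map case maps well onto a simple two-phase pattern: partial histograms computed in parallel over partitions of the catalogue, followed by a reduction. The sketch below imitates that pattern in Python with random stand-in coordinates; the record itself parallelises inside PostgreSQL.

        # Two-phase density map: parallel partial histograms, then a sum (reduce).
        import numpy as np
        from multiprocessing import Pool

        BINS = (180, 90)                                   # 2-degree cells in (ra, dec)

        def partial_hist(chunk):
            ra, dec = chunk
            h, _, _ = np.histogram2d(ra, dec, bins=BINS,
                                     range=[[0.0, 360.0], [-90.0, 90.0]])
            return h

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            # Fake catalogue split into 8 partitions of one million sources each.
            chunks = [(rng.uniform(0.0, 360.0, 1_000_000),
                       rng.uniform(-90.0, 90.0, 1_000_000)) for _ in range(8)]
            with Pool() as pool:
                density_map = sum(pool.map(partial_hist, chunks))   # reduce phase
            print(density_map.shape, int(density_map.sum()))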

  2. Numerical modelling of the tilt casting processes of titanium aluminides

    OpenAIRE

    Wang, Hong

    2008-01-01

    This research has investigated the modelling and optimisation of the tilt casting process of Titanium Aluminides (TiAl). This study is carried out in parallel with the experimental research undertaken in the IRC at the University of Birmingham. They propose to use tilt casting inside a vacuum chamber and to combine this tilt casting process with Induction Skull Melting (ISM). A totally novel process is being developed for investment casting, which is suitable for casting gamma TiAl. As ...

  3. Two Phase Flow Split Model for Parallel Channels | Iloeje | Nigerian ...

    African Journals Online (AJOL)

    The model and code are capable of handling single and two phase flows, steady states and transients, up to ten parallel flow paths, simple and complicated geometries, including the boilers of fossil steam generators and nuclear power plants. A test calculation has been made with a simplified three-channel system ...

  4. Accuracy analysis of hybrid parallel robot for the assembling of ITER

    Energy Technology Data Exchange (ETDEWEB)

    Wang Yongbo [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland); The State Key Laboratory of Mechanical Transmission, Chongqing University (China); Pessi, Pekka [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland); Wu Huapeng [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)], E-mail: huapeng@lut.fi; Handroos, Heikki [Institute of Mechatronics and Virtual Engineering, Lappeenranta University of Technology, Skinnarilankatu 34, 53850 Lappeenranta (Finland)

    2009-06-15

    This paper presents a novel mobile parallel robot, which is able to carry out welding and machining processes from inside the international thermonuclear experimental reactor (ITER) vacuum vessel (VV). The kinematics design of the robot has been optimized for ITER access. To improve the accuracy of the parallel robot, the errors caused by the stiffness and the manufacturing process have to be compensated for or limited to a minimum value. In this paper kinematic errors and stiffness modeling are given. The simulation results are presented.

  5. Accuracy analysis of hybrid parallel robot for the assembling of ITER

    International Nuclear Information System (INIS)

    Wang Yongbo; Pessi, Pekka; Wu Huapeng; Handroos, Heikki

    2009-01-01

    This paper presents a novel mobile parallel robot, which is able to carry out welding and machining processes from inside the international thermonuclear experimental reactor (ITER) vacuum vessel (VV). The kinematics design of the robot has been optimized for ITER access. To improve the accuracy of the parallel robot, the errors caused by the stiffness and the manufacturing process have to be compensated for or limited to a minimum value. In this paper kinematic errors and stiffness modeling are given. The simulation results are presented.

  6. Experiments with parallel algorithms for combinatorial problems

    NARCIS (Netherlands)

    G.A.P. Kindervater (Gerard); H.W.J.M. Trienekens

    1985-01-01

    In the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines

  7. Application of the parallel processing computer to a nuclear disaster prevention support system

    Energy Technology Data Exchange (ETDEWEB)

    Shigehiro, Nukatsuka; Osami, Watanabe [Mitsubishi Heavy Industries, LTD (Japan)

    2003-07-01

    At the time of a nuclear emergency, it is important to identify the type and the cause of the accident. Besides these, it is also important to provide adequate information for the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity on the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed, which provides not only the current state of the nuclear power plant and the influence of the radioactivity on the environment, but also a prediction of the future development of the accident. In order to provide accurate results of these analyses quickly, MEASURES utilizes various techniques, such as a multiple nesting method, which gradually narrows down the calculation area, and a parallel processing computer for three-dimensional analyses, such as the air current distribution analysis. In this paper, the outline and features of MEASURES are presented, with particular focus on the usage of the parallel processing computer for the three-dimensional air current distribution analysis. (authors)

  8. Application of the parallel processing computer to a nuclear disaster prevention support system

    International Nuclear Information System (INIS)

    Shigehiro, Nukatsuka; Osami, Watanabe

    2003-01-01

    At the time of a nuclear emergency, it is important to identify the type and the cause of the accident. Besides these, it is also important to provide adequate information for the emergency response organization to support decision making by predicting and evaluating the development of the event and the influence of the release of radioactivity on the environment. Recently, a new type of nuclear disaster prevention support system called MEASURES (Multiple Radiological Emergency Assistance System for Urgent Response) was developed, which provides not only the current state of the nuclear power plant and the influence of the radioactivity on the environment, but also a prediction of the future development of the accident. In order to provide accurate results of these analyses quickly, MEASURES utilizes various techniques, such as a multiple nesting method, which gradually narrows down the calculation area, and a parallel processing computer for three-dimensional analyses, such as the air current distribution analysis. In this paper, the outline and features of MEASURES are presented, with particular focus on the usage of the parallel processing computer for the three-dimensional air current distribution analysis. (authors)

  9. The application of image processing in the measurement for three-light-axis parallelity of laser ranger

    Science.gov (United States)

    Wang, Yang; Wang, Qianqian

    2008-12-01

    When a laser ranger is transported or used in field operations, the transmitting axis, receiving axis and aiming axis may not be parallel. The nonparallelism of the three light axes will affect the range-measuring ability or prevent the laser ranger from being operated exactly, so testing and adjusting the three-light-axis parallelity during the production and maintenance of a laser ranger is important to ensure its reliable use. Based on a comparison of some common measurement methods for three-light-axis parallelity, the paper proposes a new measurement method using digital image processing. It uses a large-aperture off-axis paraboloid reflector to obtain the images of the laser spot and the white-light cross line, and then processes the images on the LabVIEW platform. The center of the white-light cross line can be obtained by the matching algorithm in a LabVIEW DLL, and the center of the laser spot can be obtained by grayscale transformation, binarization and area filtering in turn. The software system can set the CCD, detect the off-axis paraboloid reflector, measure the parallelity of the transmitting axis and aiming axis, and control the attenuation device. The hardware system selects the SAA7111A, a programmable video decoding chip, to perform A/D conversion. A FIFO (first-in first-out) is selected as the buffer, and a USB bus is used to transmit data to the PC. The three-light-axis parallelity can then be determined from the position bias between the centers. A device based on this method has already been used. The application proves that this method offers high precision, speed and automation.

  10. Measuring effectiveness of a university by a parallel network DEA model

    Science.gov (United States)

    Kashim, Rosmaini; Kasim, Maznah Mat; Rahman, Rosshairy Abd

    2017-11-01

    Universities contribute significantly to the development of human capital and the socio-economic improvement of a country. Due to that, Malaysian universities have carried out various initiatives to improve their performance. Most studies have used the Data Envelopment Analysis (DEA) model to measure efficiency rather than effectiveness, even though the measurement of effectiveness is important for realizing how effective a university is in achieving its ultimate goals. A university system has two major functions, namely teaching and research, and every function has different resources based on its emphasis. Therefore, a university is actually structured as a parallel production system whose overall effectiveness is the aggregated effectiveness of teaching and research. Hence, this paper proposes a parallel network DEA model to measure the effectiveness of a university. This model takes the internal operations of both the teaching and research functions into account in computing the effectiveness of a university system. In the literature, graduates and the number of programs offered are defined as the outputs, while employed graduates and the number of programs accredited by professional bodies are considered as the outcomes for measuring teaching effectiveness. The amount of grants is regarded as the output of research, while publications of different quality are considered as the outcomes of research. A system is considered effective only if all functions are effective. This model has been tested using a hypothetical set of data consisting of 14 faculties at a public university in Malaysia. The results show that none of the faculties is relatively effective in overall performance. Three faculties are effective in teaching and two faculties are effective in research. The potential applications of the parallel network DEA model allow the top management of a university to identify weaknesses in any function of their university and take rational steps for improvement.

  11. A compositional reservoir simulator on distributed memory parallel computers

    International Nuclear Information System (INIS)

    Rame, M.; Delshad, M.

    1995-01-01

    This paper presents the application of distributed memory parallel computers to field-scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field-scale applications such as tracer floods and polymer floods. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented

  12. Seamless-merging-oriented parallel inverse lithography technology

    International Nuclear Information System (INIS)

    Yang Yiwei; Shi Zheng; Shen Shanhu

    2009-01-01

    Inverse lithography technology (ILT), a promising resolution enhancement technology (RET) used in next generations of IC manufacture, has the capability to push lithography to its limit. However, the existing methods of ILT are either time-consuming, due to the large layout in a single process, or not accurate enough, due to simple block merging in the parallel process. The seamless-merging-oriented parallel ILT method proposed in this paper is fast because of the parallel process; and most importantly, convergence enhancement penalty terms (CEPT) introduced in the parallel ILT optimization process take the environment into consideration as well as environmental change through target updating. This method increases the similarity of the overlapped area between guard-bands and work units, makes the merging process approach seamless and hence reduces hot-spots. The experimental results show that seamless-merging-oriented parallel ILT not only accelerates the optimization process, but also significantly improves the quality of ILT.

  13. Vector-Parallel processing of the successive overrelaxation method

    International Nuclear Information System (INIS)

    Yokokawa, Mitsuo

    1988-02-01

    The successive overrelaxation (SOR) method is one of the iterative methods for solving linear systems of equations, and it has been computed serially with a natural ordering in many nuclear codes. After the appearance of vector processors, this natural SOR method was replaced by parallel algorithms such as the hyperplane or red-black methods, in which the calculation order is modified. These methods are suitable for vector processors, and higher-speed calculation can be obtained compared with the natural SOR method on vector processors. In this report, a new scheme named the 4-colors SOR method is proposed. We find that the 4-colors SOR method can be executed on vector-parallel processors and that it gives the fastest calculation among all SOR methods, according to the results of vector-parallel execution on the Alliant FX/8 multiprocessor system. It is also shown that the theoretical optimal acceleration parameters are equal among the five different-ordering SOR methods, and the differences between the convergence rates of these SOR methods are examined. (author)
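
    The ordering idea is easiest to see in the two-colour (red-black) special case: cells of one colour depend only on cells of the other colour, so each half-sweep is free of data dependencies and can be vectorised or parallelised. A minimal sketch for a 2-D Laplace problem is shown below; it illustrates the colouring idea only, not the report's 4-colors scheme.

        # Red-black SOR relaxation of a 2-D Laplace problem (illustrative sketch).
        import numpy as np

        def redblack_sor(u, omega=1.8, sweeps=200):
            for _ in range(sweeps):
                for colour in (0, 1):              # one colour at a time: no dependencies
                    for i in range(1, u.shape[0] - 1):
                        for j in range(1, u.shape[1] - 1):
                            if (i + j) % 2 == colour:
                                gs = 0.25 * (u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1])
                                u[i, j] += omega * (gs - u[i, j])
            return u

        u = np.zeros((32, 32))
        u[0, :] = 1.0                              # fixed boundary value on one edge
        print(redblack_sor(u)[16, 16])             # interior value after relaxation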

  14. Parallel education: what is it?

    OpenAIRE

    Amos, Michelle Peta

    2017-01-01

    In the history of education it has long been discussed that single-sex and coeducation are the two models of education present in schools. With the introduction of parallel schools over the last 15 years, there has been very little research into this 'new model'. Many people do not understand what it means for a school to be parallel or they confuse a parallel model with co-education, due to the presence of both boys and girls within the one institution. Therefore, the main obj...

  15. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of ... in the optimal O(psort(N) + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psort(N) is the parallel I/O complexity of sorting N elements using P processors.

  16. A one-dimensional heat transfer model for parallel-plate thermoacoustic heat exchangers

    NARCIS (Netherlands)

    de Jong, Anne; Wijnant, Ysbrand H.; de Boer, Andries

    2014-01-01

    A one-dimensional (1D) laminar oscillating flow heat transfer model is derived and applied to parallel-plate thermoacoustic heat exchangers. The model can be used to estimate the heat transfer from the solid wall to the acoustic medium, which is required for the heat input/output of thermoacoustic

  17. A Parallel Processing Algorithm for Remote Sensing Classification

    Science.gov (United States)

    Gualtieri, J. Anthony

    2005-01-01

    A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers using the Linux operating system. For example, on the Medusa cluster at NASA/GSFC, this provides supercomputing performance, 130 Gflops (Linpack benchmark), at moderate cost ($370K). However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
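
    The task decomposition described here, classifying disjoint tiles of pixels independently once a classifier is trained, can be sketched with a process pool. The fragment below uses scikit-learn's SVC on random stand-in spectra and assumes a fork-based start method so workers inherit the trained model; it illustrates the pattern, not the paper's cluster implementation.

        # Train an SVM once, then classify independent image tiles in parallel.
        import numpy as np
        from multiprocessing import Pool
        from sklearn.svm import SVC

        N_BANDS = 50

        def make_fake_spectra(n, label):
            # Random "spectra" shifted by the class label (stand-in for ground truth).
            return np.random.randn(n, N_BANDS) + label, np.full(n, label)

        x0, y0 = make_fake_spectra(200, 0)
        x1, y1 = make_fake_spectra(200, 1)
        clf = SVC(kernel="rbf", gamma="scale").fit(np.vstack([x0, x1]), np.hstack([y0, y1]))

        def classify_tile(tile):
            # tile: (n_pixels, N_BANDS) block of spectra -> one label per pixel.
            return clf.predict(tile)

        if __name__ == "__main__":
            # Eight independent tiles; with a fork start method (Linux) the workers
            # inherit the already-trained classifier from the parent process.
            tiles = [np.random.randn(10_000, N_BANDS) + (i % 2) for i in range(8)]
            with Pool() as pool:
                labels = pool.map(classify_tile, tiles)
            print([int(l.sum()) for l in labels])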

  18. Decreasing Data Analytics Time: Hybrid Architecture MapReduce-Massive Parallel Processing for a Smart Grid

    Directory of Open Access Journals (Sweden)

    Abdeslam Mehenni

    2017-03-01

    Full Text Available As our populations grow in a world of limited resources, enterprises seek ways to lighten our load on the planet. The idea of modifying consumer behavior appears as a foundation for smart grids. Enterprises demonstrate the value available from deep analysis of electricity consumption histories, consumers’ messages, outage alerts, etc. Enterprises mine massive structured and unstructured data. In a nutshell, smart grids result in a flood of data that needs to be analyzed, to better adjust to demand and give customers more ability to delve into their power consumption. Simply put, smart grids will increasingly have a flexible data warehouse attached to them. The key driver for the adoption of data management strategies is clearly the need to handle and analyze the large amounts of information utilities are now faced with. New approaches to data integration are gaining momentum; Hadoop is in fact now being used by the utility to help manage the huge growth in data whilst maintaining coherence of the Data Warehouse. In this paper we define a new Meter Data Management System (MDMS) architecture repository that differs from the three leading MDMSs, in which we use the MapReduce programming model for ETL and a parallel DBMS for query statements (Massively Parallel Processing, MPP).

  19. Building an Elastic Parallel OGC Web Processing Service on a Cloud-Based Cluster: A Case Study of Remote Sensing Data Processing Service

    Directory of Open Access Journals (Sweden)

    Xicheng Tan

    2015-10-01

    Full Text Available Since the Open Geospatial Consortium (OGC) proposed the geospatial Web Processing Service (WPS) standard, OGC Web Service (OWS)-based geospatial processing has become the major type of distributed geospatial application. However, improving the performance and sustainability of the distributed geospatial applications has become the dominant challenge for OWSs. This paper presents the construction of an elastic parallel OGC WPS service on a cloud-based cluster and the designs of a high-performance, cloud-based WPS service architecture, the scalability scheme of the cloud, and the algorithm of the elastic parallel geoprocessing. Experiments with the remote sensing data processing service demonstrate that our proposed method can provide a higher-performance WPS service that uses fewer computing resources. Our proposed method can also help institutions reduce hardware costs, raise the rate of hardware usage, and conserve energy, which is important in building green and sustainable geospatial services or applications.

  20. SPINning parallel systems software

    International Nuclear Information System (INIS)

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-01-01

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin

  1. Parallel and vector implementation of APROS simulator code

    International Nuclear Information System (INIS)

    Niemi, J.; Tommiska, J.

    1990-01-01

    In this paper the vector and parallel processing implementation of a general purpose simulator code is discussed. In this code the utilization of vector processing is straightforward. In addition to loop-level parallel processing, functional decomposition and domain decomposition have been considered. Results presented for a PWR-plant simulation illustrate the potential speed-up factors of the alternatives. It turns out that loop-level parallelism and domain decomposition are the most promising alternatives for employing parallel processing. (author)

  2. The Moderated Mediating Effect of Self-Efficacy on Exercise Among Older Adults in an Online Bone Health Intervention Study: A Parallel Process Latent Growth Curve Model.

    Science.gov (United States)

    Zhu, Shijun; Nahm, Eun-Shim; Resnick, Barbara; Friedmann, Erika; Brown, Clayton; Park, Jumin; Cheon, Jooyoung; Park, DoHwan

    2017-07-01

    This secondary analysis of data from a longitudinal study assessed whether self-efficacy for exercise (SEE) mediated online intervention effects on exercise among older adults and whether age (50-64 vs. ≥65 years) moderated the mediation. Data were from an online bone health intervention study. Eight hundred sixty-six older adults (≥50 years) were randomized to three arms: Bone Power (n = 301), Bone Power Plus (n = 302), or Control (n = 263). Parallel process latent growth curve modeling (LGCM) was used to jointly model growth in SEE and in exercise and to assess the mediating effect of SEE on the effect of the intervention on exercise. SEE was a significant mediator in 50- to 64-year-old adults (0.061, 95% BCI: 0.011, 0.163) but not in the ≥65 age group (-0.004, 95% BCI: -0.047, 0.025). Promotion of SEE is critical to improve exercise among 50- to 64-year-olds.
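
    In a parallel process LGCM of this kind, the mediated effect is typically quantified as the product of two path coefficients: the intervention's effect on the latent growth (slope) of the mediator and that slope's effect on the latent growth of the outcome. A generic formulation is sketched below; the notation is chosen here for illustration and is not quoted from the study.

```latex
\begin{align}
  \eta^{\text{slope}}_{M} &= a\,X + \zeta_{M}
      && \text{(intervention } X \to \text{ growth of the mediator)} \\
  \eta^{\text{slope}}_{Y} &= b\,\eta^{\text{slope}}_{M} + c'\,X + \zeta_{Y}
      && \text{(mediator growth} \to \text{ outcome growth, plus direct effect } c') \\
  \text{mediated effect} &= a \times b
\end{align}
```

    Here the slope factors would correspond to the growth of SEE and of exercise, and the product a × b is the quantity whose bias-corrected bootstrap interval (BCI) is reported above.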

  3. a Predator-Prey Model Based on the Fully Parallel Cellular Automata

    Science.gov (United States)

    He, Mingfeng; Ruan, Hongbo; Yu, Changliang

    We presented a predator-prey lattice model containing moveable wolves and sheep, which are characterized by Penna double bit strings. Sexual reproduction and child-care strategies are considered. To implement this model in an efficient way, we build a fully parallel cellular automaton based on a new definition of the neighborhood. We show the roles played by the initial densities of the populations, the mutation rate and the linear size of the lattice in the evolution of this model.
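
    The abstract gives no implementation detail, so the sketch below is a deliberately stripped-down illustration of a synchronous ("fully parallel") cellular-automaton update for a predator-prey lattice: every cell computes its next state from the previous generation only, which is what makes all cell updates independent and parallelizable. The three states, the update rules, and the probabilities are invented for illustration; the Penna bit strings, sexual reproduction, and child-care strategies of the actual model are omitted.

```python
import numpy as np

EMPTY, SHEEP, WOLF = 0, 1, 2
rng = np.random.default_rng(0)
grid = rng.choice([EMPTY, SHEEP, WOLF], size=(64, 64), p=[0.6, 0.3, 0.1])

def neighbor_count(state, species):
    """Count von Neumann neighbours of a given species for every cell at once."""
    mask = (state == species).astype(int)
    return (np.roll(mask, 1, 0) + np.roll(mask, -1, 0) +
            np.roll(mask, 1, 1) + np.roll(mask, -1, 1))

def step(state):
    """One synchronous update: every cell derives its next state from the old grid."""
    sheep_nb = neighbor_count(state, SHEEP)
    wolf_nb = neighbor_count(state, WOLF)
    new = state.copy()
    # Illustrative rules only, not the published model:
    new[(state == SHEEP) & (wolf_nb > 0)] = WOLF                       # sheep eaten by a neighbouring wolf
    new[(state == WOLF) & (sheep_nb == 0) &
        (rng.random(state.shape) < 0.1)] = EMPTY                       # some wolves starve without prey
    new[(state == EMPTY) & (sheep_nb >= 2)] = SHEEP                    # sheep colonise empty cells
    return new

for _ in range(100):
    grid = step(grid)
print("sheep:", int(np.sum(grid == SHEEP)), "wolves:", int(np.sum(grid == WOLF)))
```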

  4. Dynamic modelling of a 3-CPU parallel robot via screw theory

    Directory of Open Access Journals (Sweden)

    L. Carbonari

    2013-04-01

    Full Text Available The article describes the dynamic modelling of I.Ca.Ro., a novel Cartesian parallel robot recently designed and prototyped by the robotics research group of the Polytechnic University of Marche. By means of screw theory and the virtual work principle, a computationally efficient model has been built, with the final aim of realising advanced model-based controllers. Then a dynamic analysis has been performed in order to point out possible model simplifications that could lead to a more efficient run-time implementation.

  5. Vacuum Large Current Parallel Transfer Numerical Analysis

    Directory of Open Access Journals (Sweden)

    Enyuan Dong

    2014-01-01

    Full Text Available The stable operation and reliable breaking of large generator currents are a difficult problem in power systems. It can be solved by parallel interrupters and a proper timing sequence with phase-control technology, in which the breaker control strategy is determined by the opening times of both the first-opening and the second-opening phase. A precise model of the transfer current can provide the proper timing sequence for breaking the generator circuit breaker. By analyzing transfer-current experiments and data, the real vacuum arc resistance and a precise corrected model of the large-current transfer process are obtained in this paper. The transfer time calculated with the corrected model is very close to the actual transfer time, which provides guidance for planning a proper timing sequence and breaking the vacuum generator circuit breaker with parallel interrupters.

  6. Distributed Parallel Architecture for "Big Data"

    Directory of Open Access Journals (Sweden)

    Catalin BOJA

    2012-01-01

    Full Text Available This paper is an extension to the "Distributed Parallel Architecture for Storing and Processing Large Datasets" paper presented at the WSEAS SEPADS’12 conference in Cambridge. In its original version the paper went over the benefits of using a distributed parallel architecture to store and process large datasets. This paper analyzes the problem of storing, processing and retrieving meaningful insight from petabytes of data. It provides a survey of current distributed and parallel data processing technologies and, based on them, proposes an architecture that can be used to solve the analyzed problem. This version puts more emphasis on distributed file systems and the ETL processes involved in a distributed environment.

  7. A Parallel Strategy for Convolutional Neural Network Based on Heterogeneous Cluster for Mobile Information System

    Directory of Open Access Journals (Sweden)

    Jilin Zhang

    2017-01-01

    Full Text Available With the development of mobile systems, we gain many benefits and much convenience by leveraging mobile devices; at the same time, the information gathered by smartphones, such as location and environment, is also valuable for businesses seeking to provide more intelligent services for customers. More and more machine learning methods have been used in the field of mobile information systems to study user behavior and classify usage patterns, especially the convolutional neural network. As the number of model training parameters and the scale of data grow, traditional single-machine training can no longer meet the time requirements of practical application scenarios. Current training frameworks often use simple data-parallel or model-parallel methods to speed up training, so heterogeneous computing resources are not fully utilized. To solve these problems, our paper proposes a delay-synchronization parallel strategy for convolutional neural networks that leverages a heterogeneous system. The strategy draws on both the synchronous and the asynchronous parallel approaches: the training process reduces its dependence on the heterogeneous architecture while still ensuring model convergence, so the convolutional neural network framework adapts better to different heterogeneous system environments. The experimental results show that the proposed delay-synchronization strategy can achieve at least three times the speedup compared to traditional data parallelism.
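
    The delay-synchronization idea described above can be illustrated with a toy stale-synchronous update loop: workers apply gradient updates without a global barrier, but a fast worker blocks once it is more than a fixed number of iterations ahead of the slowest one. The sketch below is not the paper's algorithm; threads stand in for heterogeneous devices, and the staleness bound, learning rate, and random "gradients" are all invented.

```python
import threading, random, time

NUM_WORKERS, STEPS = 4, 200
STALENESS = 4                                # max gap between fastest and slowest worker (illustrative)
params = {"w": 0.0}                          # shared parameter, standing in for CNN weights
clock = {i: 0 for i in range(NUM_WORKERS)}   # per-worker iteration counters
lock = threading.Lock()

def worker(wid):
    random.seed(wid)
    for t in range(STEPS):
        grad = random.uniform(-1.0, 1.0)        # stand-in for a real minibatch gradient
        while True:
            with lock:
                if t - min(clock.values()) <= STALENESS:
                    params["w"] -= 0.01 * grad  # apply the update without a global barrier
                    clock[wid] = t + 1
                    break
            time.sleep(0.001)                   # a fast worker waits for stragglers

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print("final w:", params["w"])
```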

  8. Overtaking CPU DBMSes with a GPU in whole-query analytic processing with parallelism-friendly execution plan optimization

    NARCIS (Netherlands)

    A. Agbaria (Adnan); D. Minor (David); N. Peterfreund (Natan); E. Rozenberg (Eyal); O. Rosenberg (Ofer); Huawei Research

    2016-01-01

    textabstractExisting work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: Published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This

  9. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    Science.gov (United States)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low-level, intermediate-level, and high-level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  10. Dynamic modeling and experiment of a new type of parallel servo press considering gravity counterbalance

    Science.gov (United States)

    He, Jun; Gao, Feng; Bai, Yongjun; Wu, Shengfu

    2013-11-01

    The large-capacity servo press is traditionally realized by means of redundant actuation; however, this brings an over-constraint problem and interference among actuators, which increase the control difficulty and the product cost. A new type of press mechanism with parallel topology is presented to develop a mechanical servo press with high stamping capacity. The dynamic model considering gravity counterbalance is proposed based on the virtual work principle, and then the effect of the counterbalance cylinder on the dynamic performance of the servo press is studied. It is found that the motor torque required to operate the press is considerably lower than at other settings when the ratio of the counterbalance force to the weight of the ram is in the vicinity of 1.0. The stamping force of the real press prototype can reach up to 25 MN at a position 13 mm away from the bottom dead center. A typical deep-drawing process with a 1 200 mm stroke at 8 strokes per minute is planned by means of a fifth-order polynomial. Under this process condition, the driving torques are calculated based on the above dynamic model, and a torque measuring test is also carried out on the prototype. It is shown that the trend of the calculated torque curve is consistent with the measured result and that the average error is less than 15%. The parallel mechanism is introduced into the development of the large-capacity servo press to avoid the over-constraint and interference of traditional redundant actuation, and its dynamic characteristics with gravity counterbalance are presented.

  11. A tool for simulating parallel branch-and-bound methods

    Science.gov (United States)

    Golubeva, Yana; Orlov, Yury; Posypkin, Mikhail

    2016-01-01

    The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution; therefore the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer's interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.
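
    The two modelling ideas named in the abstract (replacing real subproblem evaluation by a stochastic branching process, and accounting for work with logical rather than wall-clock time) can be sketched for a single processor as follows. The branching probability and per-node cost are arbitrary illustration values, and the load-balancing layer of the actual simulator is omitted.

```python
import random, heapq

random.seed(42)

def simulate_branch_and_bound(p_branch=0.48, node_cost=1.0, max_nodes=100_000):
    """Replace real subproblem evaluation by a stochastic branching process:
    each node spawns two children with probability p_branch, and processing a
    node advances the processor's logical clock by node_cost (arbitrary units)."""
    logical_time = 0.0
    frontier = [0.0]                     # priority queue of node "bounds" (random keys here)
    processed = 0
    while frontier and processed < max_nodes:
        heapq.heappop(frontier)
        logical_time += node_cost        # model work done, not wall-clock time
        processed += 1
        if random.random() < p_branch:   # the node branches instead of being pruned
            heapq.heappush(frontier, random.random())
            heapq.heappush(frontier, random.random())
    return processed, logical_time

nodes, t = simulate_branch_and_bound()
print(f"processed {nodes} nodes in {t:.0f} logical time units")
```

    A multi-processor version would keep one such logical clock per simulated processor and add modelled message delays to those clocks whenever subproblems migrate between processors under a load balancing policy.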

  12. Optimization of parameter calculation in the production history matching process using the Parallel Virtual Machine (PVM); Otimizacao do calculo de parametros no processo de ajuste de historicos de producao usando PVM

    Energy Technology Data Exchange (ETDEWEB)

    Vargas Cuervo, Carlos Hernan

    1997-03-01

    The main objective of this work is to develop a methodology to optimize the simultaneous computation of two parameters in the process of production history matching. The work describes a procedure to minimize an objective function established to find the values of the parameters that are modified in the process. The parameters are chosen after a sensitivity analysis. Two optimization methods are tested: a Region Search Method (MBR) and the Polytope Method. Both are direct search methods that do not require function derivatives. The software PVM (Parallel Virtual Machine) is used to parallelize the simulation runs, allowing the acceleration of the process and the search for multiple solutions. The methodology is validated on two reservoir models, one homogeneous and one heterogeneous. The advantages of each method and of the parallelization are also presented. (author)

  13. An intelligent allocation algorithm for parallel processing

    Science.gov (United States)

    Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.

    1988-01-01

    The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of the interprocessor communication network, influence the allocation. To achieve realistic estimates of the execution durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication duration varying from zero up to values comparable to the execution durations of individual nodes. The effect of communication network structures on allocation is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network.
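
    For readers unfamiliar with critical-path-based allocation, the sketch below shows the general flavour of the approach on a tiny, invented program graph: ready nodes are dispatched to the earliest-free processor in order of decreasing critical-path length. It ignores the token-communication durations and the crossbar/star network structure that the actual algorithm takes into account.

```python
import heapq
from functools import lru_cache

# Hypothetical program graph: node -> (execution time, successors). Invented for illustration.
graph = {
    "A": (2, ["C"]), "B": (3, ["C", "D"]),
    "C": (2, ["E"]), "D": (4, ["E"]), "E": (1, []),
}

@lru_cache(maxsize=None)
def critical_path(node):
    """Longest execution-time path from node to any sink; used as the list priority."""
    cost, succs = graph[node]
    return cost + max((critical_path(s) for s in succs), default=0)

def list_schedule(num_procs=2):
    """Greedy list scheduling: among ready nodes, the longest critical path goes first."""
    indeg = {n: 0 for n in graph}
    for _, succs in graph.values():
        for s in succs:
            indeg[s] += 1
    ready = [(-critical_path(n), n) for n in graph if indeg[n] == 0]
    heapq.heapify(ready)
    proc_free = [0.0] * num_procs            # time at which each processor becomes free
    finish = {}
    while ready:
        _, node = heapq.heappop(ready)
        preds = [p for p in graph if node in graph[p][1]]
        p = min(range(num_procs), key=proc_free.__getitem__)
        start = max([proc_free[p]] + [finish[q] for q in preds])
        finish[node] = start + graph[node][0]
        proc_free[p] = finish[node]
        for s in graph[node][1]:
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-critical_path(s), s))
    return finish

print(list_schedule())  # node finish times; the makespan here is 8 time units on 2 processors
```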

  14. Parallel family trees for transfer matrices in the Potts model

    Science.gov (United States)

    Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo

    2015-02-01

    The computational cost of transfer matrix methods for the Potts model is related to the question of how many ways two layers of a lattice can be connected. Answering this question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q, v) transfer matrix methods for strips is of the order of the Catalan numbers, which grow asymptotically as O(4^m), where m is the width of the strip. Other transfer matrix methods with a smaller configuration space do exist, but they make assumptions on the temperature or the number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3^m) to build the generic (q, v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while remaining highly parallel. The resulting matrix is stored in a compressed form using O(3^m × 4^m) space, making numerical evaluation and decompression faster than evaluating the matrix in its O(4^m × 4^m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7× when running on an 8-core shared memory machine and 28× for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster

  15. High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

    International Nuclear Information System (INIS)

    Tsuji, Masashi; Chiba, Gou

    2000-01-01

    A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors, which are repeated during the iteration processes of HDD-BEM, are implemented without any intervention of the host processor that supervised parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was the dominant time-consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)

  16. A discrimination-association model for decomposing component processes of the implicit association test.

    Science.gov (United States)

    Stefanutti, Luca; Robusto, Egidio; Vianello, Michelangelo; Anselmi, Pasquale

    2013-06-01

    A formal model is proposed that decomposes the implicit association test (IAT) effect into three process components: stimuli discrimination, automatic association, and termination criterion. Both response accuracy and reaction time are considered. Four independent and parallel Poisson processes, one for each of the four label categories of the IAT, are assumed. The model parameters are the rate at which information accrues on the counter of each process and the amount of information that is needed before a response is given. The aim of this study is to present the model and an illustrative application in which the process components of a Coca-Pepsi IAT are decomposed.

  17. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
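
    The report above describes hypercube implementations; as a much simpler, language-level illustration of the same idea (domain decomposition of the sieve range across processors), the following Python sketch splits the interval into contiguous blocks, sieves each block in a separate process using the serially computed primes up to sqrt(N), and sums the per-block counts. Block sizes, worker count, and the contiguous (rather than scattered) decomposition are choices made here for brevity, not details taken from the report.

```python
from multiprocessing import Pool
from math import isqrt

def base_primes(limit):
    """Serial Sieve of Eratosthenes up to limit (shared by every worker)."""
    flags = bytearray([1]) * (limit + 1)
    flags[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            flags[p * p::p] = bytearray(len(flags[p * p::p]))
    return [i for i, f in enumerate(flags) if f]

def sieve_block(args):
    """Sieve one contiguous block [lo, hi) -- the per-processor unit of work."""
    lo, hi, small = args
    flags = bytearray([1]) * (hi - lo)
    for p in small:
        start = max(p * p, ((lo + p - 1) // p) * p)
        flags[start - lo::p] = bytearray(len(flags[start - lo::p]))
    return sum(flags)

if __name__ == "__main__":
    N, workers = 1_000_000, 4
    small = base_primes(isqrt(N))
    step = (N - 1) // workers + 1
    blocks = [(lo, min(lo + step, N + 1), small) for lo in range(2, N + 1, step)]
    with Pool(workers) as pool:
        counts = pool.map(sieve_block, blocks)
    print("primes below", N, ":", sum(counts))   # expect 78498
```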

  18. A diffusion model for two parallel queues with processor sharing: transient behavior and asymptotics

    Directory of Open Access Journals (Sweden)

    Charles Knessl

    1999-01-01

    Full Text Available We consider two identical, parallel M/M/1 queues. Both queues are fed by a Poisson arrival stream of rate λ and have service rates equal to μ. When both queues are non-empty, the two systems behave independently of each other. However, when one of the queues becomes empty, the corresponding server helps in the other queue. This is called head-of-the-line processor sharing. We study this model in the heavy traffic limit, where ρ=λ/μ→1. We formulate the heavy traffic diffusion approximation and explicitly compute the time-dependent probability of the diffusion approximation to the joint queue length process. We then evaluate the solution asymptotically for large values of space and/or time. This leads to simple expressions that show how the process achieves its steady state and other transient aspects.
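
    The paper analyses this system through a diffusion approximation; as a complementary, purely illustrative check (not from the paper), the model itself is easy to simulate directly. The sketch below runs a small continuous-time Markov chain simulation of the two queues with head-of-the-line processor sharing: whenever one queue is empty, the other is served at rate 2μ.

```python
import random

def simulate(lam=0.95, mu=1.0, horizon=100_000, seed=1):
    """Two parallel M/M/1 queues with head-of-the-line processor sharing.
    Returns the time-average queue lengths over the simulated horizon."""
    random.seed(seed)
    t, q = 0.0, [0, 0]
    area = [0.0, 0.0]
    while t < horizon:
        # Effective service rate of each queue in the current state.
        rate = [0.0, 0.0]
        for i in (0, 1):
            if q[i] > 0:
                rate[i] = 2 * mu if q[1 - i] == 0 else mu
        total = 2 * lam + rate[0] + rate[1]
        dt = random.expovariate(total)          # time to the next event
        area[0] += q[0] * dt
        area[1] += q[1] * dt
        t += dt
        u = random.random() * total             # pick which event occurred
        if u < lam:
            q[0] += 1
        elif u < 2 * lam:
            q[1] += 1
        elif u < 2 * lam + rate[0]:
            q[0] -= 1
        else:
            q[1] -= 1
    return area[0] / t, area[1] / t

print(simulate())   # both averages grow large as lam approaches mu (heavy traffic)
```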

  19. Modular high-temperature gas-cooled reactor simulation using parallel processors

    International Nuclear Information System (INIS)

    Ball, S.J.; Conklin, J.C.

    1989-01-01

    The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel-processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user (operator) interaction. In this paper the benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses.

  20. A Parallel Workload Model and its Implications for Processor Allocation

    Science.gov (United States)

    1996-11-01

    with SEV or AVG, both of which can tolerate c = 0.4 – 0.6 before their performance deteriorates significantly. On the other hand, Setia [10] has... Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 89–99, 1995. [11] Sanjeev K. Setia and Satish K. Tripathi. An analysis of several processor

  1. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution times were reduced by a factor of 1.6 when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis.

  2. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    Full Text Available The objects of research are tools to support the development of parallel programs in C/C++. Methods and software that automate the process of designing parallel applications are proposed.

  3. Modelling and simulation of multiple single - phase induction motor in parallel connection

    Directory of Open Access Journals (Sweden)

    Sujitjorn, S.

    2006-11-01

    Full Text Available A mathematical model for n parallel-connected single-phase induction motors in generalized state-space form is proposed in this paper. The motor group draws electric power from one inverter. The model is developed using dq-frame theory and was tested against four loading scenarios, in which satisfactory results were obtained.

  4. High-performance parallel approaches for three-dimensional light detection and ranging point clouds gridding

    Science.gov (United States)

    Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon

    2017-01-01

    With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the parallel programming model best suited to a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time-demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity, infrastructure, and platform limitations to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
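
    The gridding task is embarrassingly parallel, since each output cell (or row of cells) can be interpolated independently, which is what all three approaches in the study exploit. As a minimal illustration of that structure only, the sketch below distributes grid rows over worker processes with Python's multiprocessing; simple inverse-distance weighting stands in for the kriging weights actually used in the paper, and the point cloud, grid extent, and worker count are invented.

```python
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(0)
points = rng.uniform(0, 100, size=(5000, 2))                       # hypothetical LiDAR x, y coordinates
values = np.sin(points[:, 0] / 10) + 0.1 * rng.normal(size=5000)   # hypothetical elevations

def interpolate_row(row_y, xs=np.arange(0, 100, 1.0), power=2.0):
    """Interpolate one grid row; rows are independent, so they parallelise trivially.
    Inverse-distance weighting is used here as a simple stand-in for kriging."""
    out = np.empty(len(xs))
    for j, x in enumerate(xs):
        d2 = (points[:, 0] - x) ** 2 + (points[:, 1] - row_y) ** 2
        w = 1.0 / np.maximum(d2, 1e-12) ** (power / 2)
        out[j] = np.sum(w * values) / np.sum(w)
    return out

if __name__ == "__main__":
    rows = np.arange(0, 100, 1.0)
    with Pool(4) as pool:                       # one batch of rows per worker process
        grid = np.vstack(pool.map(interpolate_row, rows))
    print(grid.shape)                           # (100, 100) digital elevation model
```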

  5. Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

    Energy Technology Data Exchange (ETDEWEB)

    Pautz, A. [TUV Hannover/Sachsen-Anhalt e.V. (Germany); Langenbuch, S. [Gesellschaft fur Anlagen- und Reaktorsicherheit (GRS) mbH (Germany)

    2003-07-01

    The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort such a type of angular domain decomposition has to be done on a spatial line-by-line basis, causing the parallelism in the code to be very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort: firstly, we have used a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines based on the standard OpenMP. The second approach uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on performance and scaling behaviour of the parallel Dort code. (authors)

  6. Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

    International Nuclear Information System (INIS)

    Pautz, A.; Langenbuch, S.

    2003-01-01

    The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort such a type of angular domain decomposition has to be done on a spatial line-by-line basis, causing the parallelism in the code to be very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort: firstly, we have used a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines based on the standard OpenMP. The second approach uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on performance and scaling behaviour of the parallel Dort code. (authors)

  7. Vector and parallel computing on the IBM ES/3090, a powerful approach to solving problems in the utility industry

    International Nuclear Information System (INIS)

    Bellucci, V.J.

    1990-01-01

    This paper describes IBM's approach to parallel computing using the IBM ES/3090 computer. Parallel processing concepts are discussed, including their advantages, potential performance improvements, and limitations. Particular applications and capabilities of the IBM ES/3090 are presented, along with preliminary results from some utilities applying parallel processing to the simulation of system reliability, air pollution models, and power network dynamics.

  8. Study and simulation of a parallel numerical processing machine

    International Nuclear Information System (INIS)

    Bel Hadj, Slaheddine

    1981-12-01

    This study has been carried out with a view to implementing on a minicomputer the NEPTUNIX package (software for solving very large algebraic-differential equation systems). With the aim of increasing system performance, previous research showed the necessity of reducing the execution time of certain frequently used numerical computation tasks, and demonstrated the feasibility of handling these tasks with efficient parallel algorithms. The present work deals with the study and simulation of a parallel architecture processor adapted to the fast execution of these algorithms. A minicomputer fitted with a connection to such a parallel processor has a greatly extended computing power. The architecture of a parallel numerical processor, based on the use of VLSI microprocessors and co-processors, is then described; its design aims at the best cost/performance ratio. The last part deals with the simulation of the processor using the 'CHAMBOR' program. Results show a speed-up factor of 30 compared with execution on a MITRA 15 minicomputer. Moreover, the importance of conflicts, mainly in access to a shared resource, is evaluated. Although this implementation was designed with a dedicated application in mind, other uses could be envisaged, particularly for the simulation of nuclear reactors: operator guidance systems, behavioural studies under accident conditions, etc. (author) [fr

  9. On a model of three-dimensional bursting and its parallel implementation

    Science.gov (United States)

    Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.

    2008-04-01

    A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a linearly-implicit finite difference method, second-order accurate in both space and time, on equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared space address paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors affecting the performance of the parallel implementations is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice as efficient as the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.

  10. The cost of conservative synchronization in parallel discrete event simulations

    Science.gov (United States)

    Nicol, David M.

    1990-01-01

    The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approach the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

  11. Parallel Application Development Using Architecture View Driven Model Transformations

    NARCIS (Netherlands)

    Arkin, E.; Tekinerdogan, B.

    2015-01-01

    To realize the increased need for computing performance, the current trend is towards applying parallel computing, in which tasks are run in parallel on multiple nodes. In turn, we can observe the rapid increase of the scale of parallel computing platforms. This situation has led to a complexity

  12. Vectorization, parallelization and porting of nuclear codes. 2001

    International Nuclear Information System (INIS)

    Akiyama, Mitsunaga; Katakura, Fumishige; Kume, Etsuo; Nemoto, Toshiyuki; Tsuruoka, Takuya; Adachi, Masaaki

    2003-07-01

    Several computer codes in the nuclear field have been vectorized, parallelized and transported on the super computer system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 10 codes in fiscal 2001. In this report, the parallelization of Neutron Radiography for 3 Dimensional CT code NR3DCT, the vectorization of unsteady-state heat conduction code THERMO3D, the porting of initial program of MHD simulation, the tuning of Heat And Mass Balance Analysis Code HAMBAC, the porting and parallelization of Monte Carlo N-Particle transport code MCNP4C3, the porting and parallelization of Monte Carlo N-Particle transport code system MCNPX2.1.5, the porting of induced activity calculation code CINAC-V4, the use of VisLink library in multidimensional two-fluid model code ACD3D and the porting of experiment data processing code from GS8500 to SR8000 are described. (author)

  13. Image reconstruction method for electrical capacitance tomography based on the combined series and parallel normalization model

    International Nuclear Information System (INIS)

    Dong, Xiangyuan; Guo, Shuqing

    2008-01-01

    In this paper, a novel image reconstruction method for electrical capacitance tomography (ECT) based on the combined series and parallel model is presented. A regularization technique is used to obtain a stabilized solution of the inverse problem. Also, the adaptive coefficient of the combined model is deduced by numerical optimization. Simulation results indicate that it can produce higher quality images when compared to algorithms based on the parallel or series models for the cases tested in this paper. It provides a new algorithm for ECT applications.
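
    The abstract refers to the series and parallel normalization models without giving their formulas. For orientation, the standard forms from the ECT literature (not quoted from this paper) express the normalized capacitance under the parallel model, under the series model, and as the adaptive combination of the two:

```latex
\lambda_{p} = \frac{C_m - C_l}{C_h - C_l}, \qquad
\lambda_{s} = \frac{1/C_m - 1/C_l}{1/C_h - 1/C_l}, \qquad
\lambda = \alpha\,\lambda_{p} + (1-\alpha)\,\lambda_{s},
```

    where C_m is the measured inter-electrode capacitance, C_l and C_h are the calibration capacitances for the low- and high-permittivity filling materials, and α is the adaptive coefficient that the paper determines by numerical optimization.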

  14. A tool for simulating parallel branch-and-bound methods

    Directory of Open Access Journals (Sweden)

    Golubeva Yana

    2016-01-01

    Full Text Available The Branch-and-Bound method is known as one of the most powerful but very resource-consuming global optimization methods. Parallel and distributed computing can efficiently cope with this issue. The major difficulty in the parallel B&B method is the need for dynamic load redistribution; therefore the design and study of load balancing algorithms is a separate and very important research topic. This paper presents a tool for simulating the parallel Branch-and-Bound method. The simulator allows one to run load balancing algorithms with various numbers of processors, sizes of the search tree, and characteristics of the supercomputer’s interconnect, thereby fostering deep study of load distribution strategies. The process of resolution of the optimization problem by the B&B method is replaced by a stochastic branching process. Data exchanges are modeled using the concept of logical time. The user-friendly graphical interface to the simulator provides efficient visualization and convenient performance analysis.

  15. Leveraging human oversight and intervention in large-scale parallel processing of open-source data

    Science.gov (United States)

    Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.

    2015-05-01

    The popularity of cloud computing, along with the increased availability of cheap storage, has led to the necessity of elaborating and transforming large volumes of open-source data in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is carried out in chain. Second, it is often necessary to minimize the amount of reprocessing in order to optimize the usage of limited resources. To address these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.

  16. Experimental and modelling results of a parallel-plate based active magnetic regenerator

    DEFF Research Database (Denmark)

    Tura, A.; Nielsen, Kaspar Kirstein; Rowe, A.

    2012-01-01

    The performance of a permanent magnet magnetic refrigerator (PMMR) using gadolinium parallel plates is described. The configuration and operating parameters are described in detail. Experimental results are compared to simulations using an established two-dimensional model of an active magnetic

  17. Distributed and cloud computing from parallel processing to the Internet of Things

    CERN Document Server

    Hwang, Kai; Fox, Geoffrey C

    2012-01-01

    Distributed and Cloud Computing, named a 2012 Outstanding Academic Title by the American Library Association's Choice publication, explains how to create high-performance, scalable, reliable systems, exposing the design principles, architecture, and innovative applications of parallel, distributed, and cloud computing systems. Starting with an overview of modern distributed models, the book provides comprehensive coverage of distributed and cloud computing, including: Facilitating management, debugging, migration, and disaster recovery through virtualization Clustered systems for resear

  18. High Performance Parallel Processing Project: Industrial computing initiative. Progress reports for fiscal year 1995

    Energy Technology Data Exchange (ETDEWEB)

    Koniges, A.

    1996-02-09

    This project is a package of 11 individual CRADAs plus hardware. This innovative project established a three-year multi-party collaboration that is significantly accelerating the availability of commercial massively parallel processing computing software technology to U.S. government, academic, and industrial end-users. This report contains individual presentations from nine principal investigators along with overall program information.

  19. Parallelization and automatic data distribution for nuclear reactor simulations

    Energy Technology Data Exchange (ETDEWEB)

    Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  20. Parallelization and automatic data distribution for nuclear reactor simulations

    International Nuclear Information System (INIS)

    Liebrock, L.M.

    1997-01-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed

  1. High-Performance Parallel and Stream Processing of X-ray Microdiffraction Data on Multicores

    International Nuclear Information System (INIS)

    Bauer, Michael A; McIntyre, Stewart; Xie Yuzhen; Biem, Alain; Tamura, Nobumichi

    2012-01-01

    We present the design and implementation of a high-performance system for processing synchrotron X-ray microdiffraction (XRD) data in IBM InfoSphere Streams on multicore processors. We report on the parallel and stream processing techniques that we use to harvest the power of clusters of multicores to analyze hundreds of gigabytes of synchrotron XRD data in order to reveal the microtexture of polycrystalline materials. Processing one XRD image with a single pipeline is about ten times faster than with the best C program currently available. With the support of the InfoSphere Streams platform, our software can be scaled up to operate on clusters of multicores and process multiple images concurrently. This system provides a high-performance processing kernel to achieve near real-time analysis of image data from synchrotron experiments.

  2. Modeling of Electromagnetic Fields in Parallel-Plane Structures: A Unified Contour-Integral Approach

    Directory of Open Access Journals (Sweden)

    M. Stumpf

    2017-04-01

    Full Text Available A unified reciprocity-based modeling approach for analyzing electromagnetic fields in dispersive parallel-plane structures of arbitrary shape is described. It is shown that the use of the reciprocity theorem of the time-convolution type leads to a global contour-integral interaction quantity from which novel both time- and frequency-domain numerical schemes can be arrived at. Applications of the numerical method concerning the time-domain radiated interference and susceptibility of parallel-plane structures are discussed and illustrated on numerical examples.

  3. Fast Simulation of Large-Scale Floods Based on GPU Parallel Computing

    Directory of Open Access Journals (Sweden)

    Qiang Liu

    2018-05-01

    Full Text Available Computing speed is a significant issue of large-scale flood simulations for real-time response to disaster prevention and mitigation. Even today, most of the large-scale flood simulations are generally run on supercomputers due to the massive amounts of data and computations necessary. In this work, a two-dimensional shallow water model based on an unstructured Godunov-type finite volume scheme was proposed for flood simulation. To realize a fast simulation of large-scale floods on a personal computer, a Graphics Processing Unit (GPU)-based, high-performance computing method using the OpenACC application was adopted to parallelize the shallow water model. An unstructured data management method was presented to control the data transportation between the GPU and CPU (Central Processing Unit) with minimum overhead, and then both computation and data were offloaded from the CPU to the GPU, which exploited the computational capability of the GPU as much as possible. The parallel model was validated using various benchmarks and real-world case studies. The results demonstrate that speed-ups of up to one order of magnitude can be achieved in comparison with the serial model. The proposed parallel model provides a fast and reliable tool with which to quickly assess flood hazards in large-scale areas and, thus, has a bright application prospect for dynamic inundation risk identification and disaster assessment.

  4. Parallelization of 2-D lattice Boltzmann codes

    International Nuclear Information System (INIS)

    Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.

    1996-03-01

    Lattice Boltzmann (LB) codes to simulate two dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. A high parallel efficiency of 95.1% by the vector parallel calculation on 16 processors with a 1152x1152 grid and of 88.6% by the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability when the available memory per processor element is kept at its maximum. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author)

  5. Parallelization of 2-D lattice Boltzmann codes

    Energy Technology Data Exchange (ETDEWEB)

    Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo

    1996-03-01

    Lattice Boltzmann (LB) codes to simulate two dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. A high parallel efficiency of 95.1% by the vector parallel calculation on 16 processors with a 1152x1152 grid and of 88.6% by the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability when the available memory per processor element is kept at its maximum. Our performance model predicts that the execution time of the vector parallel code increases by about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in interprocessor communication, the vector parallel LB code is still suitable for large-scale and/or high-resolution simulations. (author).

  6. High performance parallel I/O

    CERN Document Server

    Prabhat

    2014-01-01

    Gain Critical Insight into the Parallel I/O Ecosystem. Parallel I/O is an integral component of modern high performance computing (HPC), especially in storing and processing very large datasets to facilitate scientific discovery. Revealing the state of the art in this field, High Performance Parallel I/O draws on insights from leading practitioners, researchers, software architects, developers, and scientists who shed light on the parallel I/O ecosystem. The first part of the book explains how large-scale HPC facilities scope, configure, and operate systems, with an emphasis on choices of I/O har

  7. A tomograph VMEbus parallel processing data acquisition system

    International Nuclear Information System (INIS)

    Atkins, M.S.; Wilkinson, N.A.; Rogers, J.G.

    1988-11-01

    This paper describes a VME based data acquisition system suitable for the development of Positron Volume Imaging tomographs which use 3-D data for improved image resolution over slice-oriented tomographs. The data acquisition must be flexible enough to accommodate several 3-D reconstruction algorithms; hence, a software-based system is most suitable. Furthermore, because of the increased dimensions and resolution of volume imaging tomographs, the raw data event rate is greater than that of slice-oriented machines. These dual requirements are met by our data acquisition system. Flexibility is achieved through an array of processors connected over a VMEbus, operating asynchronously and in parallel. High raw data throughput is achieved using a dedicated high speed data transfer device available for the VMEbus. The device can attain a raw data rate of 2.5 million coincidence events per second for raw events which are 64 bits wide. Real-time data acquisition and pre-processing requirements can be met by about forty 20 MHz Motorola 68020/68881 processors.

  8. Improved modelling of a parallel plate active magnetic regenerator

    International Nuclear Information System (INIS)

    Engelbrecht, K; Nielsen, K K; Bahl, C R H; Tušek, J; Kitanovski, A; Poredoš, A

    2013-01-01

    Much of the active magnetic regenerator (AMR) modelling presented in the literature considers only the solid and fluid domains of the regenerator and ignores other physical effects that have been shown to be important, such as demagnetizing fields in the regenerator, parasitic heat losses and fluid flow maldistribution in the regenerator. This paper studies the effects of these loss mechanisms and compares theoretical results with experimental results obtained on an experimental AMR device. Three parallel plate regenerators were tested, each having different demagnetizing field characteristics and fluid flow maldistributions. It was shown that when these loss mechanisms are ignored, the model significantly over predicts experimental results. Including the loss mechanisms can significantly change the model predictions, depending on the operating conditions and construction of the regenerator. The model is compared with experimental results for a range of fluid flow rates and cooling loads. (paper)

  9. Vectorization, parallelization and porting of nuclear codes (vectorization and parallelization). Progress report fiscal 1998

    International Nuclear Information System (INIS)

    Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi

    2000-03-01

    Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated Propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 on the AP3000 are described. (author)

  10. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues

  11. The Medial Temporal Lobe – Conduit of Parallel Connectivity: A model for Attention, Memory, and Perception.

    Directory of Open Access Journals (Sweden)

    Brian B. Mozaffari

    2014-11-01

    Full Text Available Based on the notion that the brain is equipped with a hierarchical organization, which embodies environmental contingencies across many time scales, this paper suggests that the medial temporal lobe (MTL) – located deep in the hierarchy – serves as a bridge connecting supra- to infra-MTL levels. Bridging the upper and lower regions of the hierarchy provides a parallel architecture that optimizes information flow between upper and lower regions to aid attention, encoding, and the processing of quick, complex visual phenomena. Bypassing intermediate hierarchy levels, information conveyed through the MTL 'bridge' allows upper levels to make educated predictions about the prevailing context and accordingly select lower representations to increase the efficiency of predictive coding throughout the hierarchy. This selection or activation/deactivation is associated with endogenous attention. In the event that these 'bridge' predictions are inaccurate, this architecture enables the rapid encoding of novel contingencies. A review of hierarchical models in relation to memory is provided along with a new theory, Medial-temporal-lobe Conduit for Parallel Connectivity (MCPC). In this scheme, consolidation is considered as a secondary process, occurring after an MTL-bridged connection, which eventually allows upper and lower levels to access each other directly. With repeated reactivations, as contingencies become consolidated, less MTL activity is predicted. Finally, MTL bridging may aid the processing of transient but structured perceptual events, by allowing communication between upper and lower levels without calling on intermediate levels of representation.

  12. The vector and parallel processing of MORSE code on Monte Carlo Machine

    International Nuclear Information System (INIS)

    Hasegawa, Yukihiro; Higuchi, Kenji.

    1995-11-01

    The multi-group Monte Carlo code for particle transport, MORSE, has been modified for high-performance computing on the Monte Carlo machine Monte-4; the method and the results are described. Monte-4 was developed specifically to realize high-performance computing of Monte Carlo particle transport codes, for which it has been difficult to obtain high performance through vector processing on conventional vector processors. Monte-4 has four vector processor units with special hardware called Monte Carlo pipelines. The vectorization and parallelization of the MORSE code and the performance evaluation on Monte-4 are described. (author)

  13. Numerical simulation of Vlasov equation with parallel tools

    International Nuclear Information System (INIS)

    Peyroux, J.

    2005-11-01

    This project aims to make the resolution of Vlasov codes even more powerful through the use of various parallelization tools (MPI, OpenMP...). A simplified test case served as a base for constructing the parallel codes, in order to obtain a data-processing skeleton which could thereafter be re-used for increasingly complex models (more than four phase-space variables). This will make it possible to treat more realistic situations linked, for example, to the injection of ultra-short and ultra-intense pulses in inertial fusion plasmas, or to the study of the trapped-ion instability, now regarded as responsible for the generation of turbulence in tokamak plasmas. (author)

  14. Parallel Monitors for Self-adaptive Sessions

    Directory of Open Access Journals (Sweden)

    Mario Coppo

    2016-06-01

    Full Text Available The paper presents a data-driven model of self-adaptivity for multiparty sessions. System choreography is prescribed by a global type. Participants are incarnated by processes associated with monitors, which control their behaviour. Each participant can access and modify a set of global data, which are able to trigger adaptations in the presence of critical changes of values. The use of the parallel composition for building global types, monitors and processes enables a significant degree of flexibility: an adaptation step can dynamically reconfigure a set of participants only, without altering the remaining participants, even if the two groups communicate.

  15. Is orthographic information from multiple parafoveal words processed in parallel: An eye-tracking study.

    Science.gov (United States)

    Cutter, Michael G; Drieghe, Denis; Liversedge, Simon P

    2017-08-01

    In the current study we investigated whether orthographic information available from 1 upcoming parafoveal word influences the processing of another parafoveal word. Across 2 experiments we used the boundary paradigm (Rayner, 1975) to present participants with an identity preview of the 2 words after the boundary (e.g., hot pan), a preview in which 2 letters were transposed between these words (e.g., hop tan), or a preview in which the same 2 letters were substituted (e.g., hob fan). We hypothesized that if these 2 words were processed in parallel in the parafovea then we may observe significant preview benefits for the condition in which the letters were transposed between words relative to the condition in which the letters were substituted. However, no such effect was observed, with participants fixating the words for the same amount of time in both conditions. This was the case both when the transposition was made between the final and first letter of the 2 words (e.g., hop tan as a preview of hot pan; Experiment 1) and when the transposition maintained within word letter position (e.g., pit hop as a preview of hit pop; Experiment 2). The implications of these findings are considered in relation to serial and parallel lexical processing during reading. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  16. Parallel generation of architecture on the GPU

    KAUST Repository

    Steinberger, Markus

    2014-05-01

    In this paper, we present a novel approach for the parallel evaluation of procedural shape grammars on the graphics processing unit (GPU). Unlike previous approaches that are either limited in the kind of shapes they allow, the amount of parallelism they can take advantage of, or both, our method supports state-of-the-art procedural modeling including stochasticity and context-sensitivity. To increase parallelism, we explicitly express independence in the grammar, reduce inter-rule dependencies required for context-sensitive evaluation, and introduce intra-rule parallelism. Our rule scheduling scheme avoids unnecessary back and forth between CPU and GPU and reduces round trips to slow global memory by dynamically grouping rules in on-chip shared memory. Our GPU shape grammar implementation is multiple orders of magnitude faster than standard CPU-based rule evaluation, while offering equal expressive power. In comparison to the state of the art in GPU shape grammar derivation, our approach is nearly 50 times faster, while adding support for geometric context-sensitivity. © 2014 The Author(s) Computer Graphics Forum © 2014 The Eurographics Association and John Wiley & Sons Ltd. Published by John Wiley & Sons Ltd.

  17. An efficient parallel stochastic simulation method for analysis of nonviral gene delivery systems

    KAUST Repository

    Kuwahara, Hiroyuki

    2011-01-01

    Gene therapy has a great potential to become an effective treatment for a wide variety of diseases. One of the main challenges in making gene therapy practical in clinical settings is the development of efficient and safe mechanisms to deliver foreign DNA molecules into the nucleus of target cells. Several computational and experimental studies have shown that the design process of synthetic gene transfer vectors can be greatly enhanced by computational modeling and simulation. This paper proposes a novel, effective parallelization of the stochastic simulation algorithm (SSA) for pharmacokinetic models that characterize the rate-limiting, multi-step processes of intracellular gene delivery. While efficient parallelization of the SSA is still an open problem in a general setting, the proposed parallel simulation method is able to substantially accelerate the next-reaction selection scheme and the reaction update scheme in the SSA by exploiting and decomposing the structures of stochastic gene delivery models. This makes computationally intensive analyses such as parameter optimization and gene dosage control for specific cell types, gene vectors, and transgene expression stability substantially more practical than they would otherwise be with the standard SSA. Here, we translated the nonviral gene delivery model based on mass-action kinetics by Varga et al. [Molecular Therapy, 4(5), 2001] into a more realistic model that captures intracellular fluctuations based on stochastic chemical kinetics, and as a case study we applied our parallel simulation to this stochastic model. Our results show that our simulation method is able to increase the efficiency of statistical analysis by at least 50% in various settings. © 2011 ACM.
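
    To make the underlying stochastic model concrete, the following is a minimal serial Gillespie SSA sketch for a small mass-action delivery chain; the species names, chain structure and rate constants are illustrative assumptions of this summary, not values from the cited paper, and the paper's contribution (parallelizing the next-reaction selection and update steps) is not reproduced here.

```python
import numpy as np

# Toy rate-limiting chain: plasmid outside the cell -> endosome -> cytoplasm -> nucleus
# (species names and rate constants are illustrative assumptions)
species = ["extracellular", "endosomal", "cytoplasmic", "nuclear"]
k = np.array([0.5, 0.2, 0.05])        # first-order rates of the three transfer steps (1/h)

def ssa(x0, t_end, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    t = 0.0
    while t < t_end:
        a = k * x[:-1]                    # propensities of the transfer steps
        a0 = a.sum()
        if a0 == 0.0:
            break
        t += rng.exponential(1.0 / a0)    # time to the next reaction
        j = rng.choice(len(a), p=a / a0)  # which reaction fires
        x[j] -= 1                         # move one molecule down the chain
        x[j + 1] += 1
    return x

print(dict(zip(species, ssa([200, 0, 0, 0], t_end=24.0))))
```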

  18. Algorithms for a parallel implementation of Hidden Markov Models with a small state space

    DEFF Research Database (Denmark)

    Nielsen, Jesper; Sand, Andreas

    2011-01-01

    Two of the most important algorithms for Hidden Markov Models are the forward and the Viterbi algorithms. We show how formulating these using linear algebra naturally lends itself to parallelization. Although the obtained algorithms are slow for Hidden Markov Models with large state spaces...
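
    A minimal NumPy sketch of the idea that the forward algorithm can be written as a sequence of matrix-vector products, which is the formulation that lends itself to parallel linear-algebra kernels; the transition matrix, emission probabilities and observation sequence below are illustrative assumptions, not values from the cited work, and the parallel kernels themselves are not shown.

```python
import numpy as np

# Illustrative 2-state HMM (assumed values)
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])           # transition matrix, A[i, j] = P(state j | state i)
B = np.array([[0.7, 0.3],
              [0.1, 0.9]])           # emission matrix, B[i, k] = P(symbol k | state i)
pi = np.array([0.5, 0.5])            # initial state distribution
obs = [0, 1, 1, 0, 1]                # observation sequence

def forward_linear_algebra(A, B, pi, obs):
    """Forward algorithm as repeated matrix-vector products:
    alpha_t = diag(B[:, obs_t]) @ (A.T @ alpha_{t-1});
    the sequence likelihood is the sum of the final alpha vector."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (A.T @ alpha) * B[:, o]   # one step = matrix-vector product + elementwise scaling
    return alpha.sum()

print(forward_linear_algebra(A, B, pi, obs))
```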

  19. Parallel computation of multigroup reactivity coefficient using iterative method

    Science.gov (United States)

    Susmikanti, Mike; Dewayatna, Winter

    2013-09-01

    One of the research activities supporting the commercial radioisotope production program is safety research on the irradiation of FPM (Fission Product Molybdenum) targets. An FPM target is a stainless steel tube containing high-enriched uranium, and its irradiation is intended to produce fission products that are widely used, in kit form, in nuclear medicine. Irradiating FPM tubes in the reactor core can disturb reactor performance, one such disturbance being changes in flux or reactivity. A method is therefore needed for calculating safety margins as the core configuration changes over the life of the reactor, which makes faster codes a necessity. An advantage of the perturbation method is that the neutron safety margin of the research reactor can be reused in the reactivity calculation without modification. The criticality and flux in a multigroup diffusion model were calculated at various irradiation positions and uranium contents. This model is computationally demanding, so several parallel iterative algorithms have been developed for the resulting large sparse matrix systems. The Red-Black Gauss-Seidel iteration and a parallel power iteration method can be used to solve the multigroup diffusion equation system and to calculate the criticality and reactivity coefficients. In this research, a code for reactivity calculation with parallel processing was developed as part of the safety analysis; the calculation can be performed more quickly and efficiently by exploiting the parallelism of a multicore computer. The code was applied to the safety-limit calculation of irradiated FPM targets with increasing uranium content.
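
    As a rough, generic illustration of one of the solution methods named above, the sketch below applies a power iteration to estimate the dominant eigenvalue (the criticality) of an operator; the small dense matrix here is only a stand-in assumption, whereas the actual multigroup diffusion problem leads to large sparse systems that the paper addresses with parallel Red-Black Gauss-Seidel and power iteration.

```python
import numpy as np

def power_iteration(M, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenvalue (e.g., k-effective) and eigenvector of M."""
    x = np.ones(M.shape[0])
    k = 1.0
    for _ in range(max_iter):
        y = M @ x
        k_new = np.linalg.norm(y) / np.linalg.norm(x)
        y /= np.linalg.norm(y)            # normalize to avoid overflow
        if abs(k_new - k) < tol:
            return k_new, y
        x, k = y, k_new
    return k, x

# Stand-in 2x2 operator (assumed values); in a real calculation M would be the
# large sparse operator arising from the discretized multigroup diffusion equations.
M = np.array([[2.0, 0.5],
              [0.3, 1.0]])
k_eff, flux = power_iteration(M)
print(k_eff, flux)
```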

  20. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

    Science.gov (United States)

    Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

    2014-01-01

    It is very time consuming to solve fractional differential equations. The computational complexity of solving the two-dimensional time fractional diffusion equation (2D-TFDE) with an iterative implicit finite difference method is O(M_x M_y N^2). In this paper, we present a parallel algorithm for the 2D-TFDE and give an in-depth discussion of this algorithm. A task distribution model and a data layout with virtual boundaries are designed for this parallel algorithm. The experimental results show that the parallel algorithm agrees well with the exact solution. The parallel algorithm on a single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on a single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We believe that parallel computing will become a basic method for computationally intensive fractional applications in the near future.
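
    The "data layout with virtual boundary" mentioned above is essentially a halo (ghost-cell) exchange between neighbouring processes. The mpi4py sketch below illustrates that layout only and is not the authors' implementation: the grid size, the row-block decomposition (which assumes the process count divides the number of rows), and the simple Jacobi-style stencil standing in for the implicit time-fractional scheme are all assumptions.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

nx, ny = 64, 64                      # global grid (illustrative; assumes size divides nx)
rows = nx // size                    # rows of the grid owned by this process
u = np.zeros((rows + 2, ny))         # +2 halo rows: the "virtual boundary"
if rank == 0:
    u[rows // 2, ny // 2] = 1.0      # toy initial condition

up = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(100):                 # Jacobi-style sweeps standing in for the implicit scheme
    # exchange halo rows with the neighbouring processes
    comm.Sendrecv(u[1, :], dest=up, recvbuf=u[rows + 1, :], source=down)
    comm.Sendrecv(u[rows, :], dest=down, recvbuf=u[0, :], source=up)
    # five-point stencil update on the owned rows only
    u[1:rows + 1, 1:-1] = 0.25 * (u[0:rows, 1:-1] + u[2:rows + 2, 1:-1]
                                  + u[1:rows + 1, 0:-2] + u[1:rows + 1, 2:])

print(rank, u[1:rows + 1].sum())
```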

  1. Category specific spatial dissociations of parallel processes underlying visual naming.

    Science.gov (United States)

    Conner, Christopher R; Chen, Gang; Pieters, Thomas A; Tandon, Nitin

    2014-10-01

    The constituent elements and dynamics of the networks responsible for word production are a central issue to understanding human language. Of particular interest is their dependency on lexical category, particularly the possible segregation of nouns and verbs into separate processing streams. We applied a novel mixed-effects, multilevel analysis to electrocorticographic data collected from 19 patients (1942 electrodes) to examine the activity of broadly disseminated cortical networks during the retrieval of distinct lexical categories. This approach was designed to overcome the issues of sparse sampling and individual variability inherent to invasive electrophysiology. Both noun and verb generation evoked overlapping, yet distinct nonhierarchical processes favoring ventral and dorsal visual streams, respectively. Notable differences in activity patterns were noted in Broca's area and superior lateral temporo-occipital regions (verb > noun) and in parahippocampal and fusiform cortices (noun > verb). Comparisons with functional magnetic resonance imaging (fMRI) results yielded a strong correlation of blood oxygen level-dependent signal and gamma power and an independent estimate of group size needed for fMRI studies of cognition. Our findings imply parallel, lexical category-specific processes and reconcile discrepancies between lesional and functional imaging studies. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Numerical simulation of Vlasov equation with parallel tools; Simulations numeriques de l'equation de Vlasov a l'aide d'outils paralleles

    Energy Technology Data Exchange (ETDEWEB)

    Peyroux, J

    2005-11-15

    This project aims to make the resolution of Vlasov codes even more powerful through the use of various parallelization tools (MPI, OpenMP...). A simplified test case served as a base for constructing the parallel codes, in order to obtain a data-processing skeleton which could thereafter be re-used for increasingly complex models (more than four phase-space variables). This will make it possible to treat more realistic situations linked, for example, to the injection of ultra-short and ultra-intense pulses in inertial fusion plasmas, or to the study of the trapped-ion instability, now regarded as responsible for the generation of turbulence in tokamak plasmas. (author)

  3. Parallel Implementation of the Discrete Green's Function Formulation of the FDTD Method on a Multicore Central Processing Unit

    Directory of Open Access Journals (Sweden)

    T. Stefański

    2014-12-01

    Full Text Available Parallel implementation of the discrete Green's function formulation of the finite-difference time-domain (DGF-FDTD) method was developed on a multicore central processing unit. DGF-FDTD avoids computations of the electromagnetic field in free-space cells and does not require domain termination by absorbing boundary conditions. Computed DGF-FDTD solutions are compatible with the FDTD grid, enabling the perfect hybridization of FDTD with the use of time-domain integral equation methods. The developed implementation can be applied to simulations of antenna characteristics. For the sake of example, arrays of Yagi-Uda antennas were simulated with the use of parallel DGF-FDTD. The efficiency of parallel computations was investigated as a function of the number of current elements in the FDTD grid. Although the developed method does not apply the fast Fourier transform for convolution computations, advantages stemming from the application of DGF-FDTD instead of FDTD can be demonstrated for one-dimensional wire antennas when simulation results are post-processed by the near-to-far-field transformation.

  4. Parallel coupling of symmetric and asymmetric exclusion processes

    International Nuclear Information System (INIS)

    Tsekouras, K; Kolomeisky, A B

    2008-01-01

    A system consisting of two parallel coupled channels where particles in one of them follow the rules of totally asymmetric exclusion processes (TASEP) and in another one move as in symmetric simple exclusion processes (SSEP) is investigated theoretically. Particles interact with each other via hard-core exclusion potential, and in the asymmetric channel they can only hop in one direction, while on the symmetric lattice particles jump in both directions with equal probabilities. Inter-channel transitions are also allowed at every site of both lattices. Stationary state properties of the system are solved exactly in the limit of strong couplings between the channels. It is shown that strong symmetric couplings between totally asymmetric and symmetric channels lead to an effective partially asymmetric simple exclusion process (PASEP) and properties of both channels become almost identical. However, strong asymmetric couplings between symmetric and asymmetric channels yield an effective TASEP with nonzero particle flux in the asymmetric channel and zero flux on the symmetric lattice. For intermediate strength of couplings between the lattices a vertical-cluster mean-field method is developed. This approximate approach treats exactly particle dynamics during the vertical transitions between the channels and it neglects the correlations along the channels. Our calculations show that in all cases there are three stationary phases defined by particle dynamics at entrances, at exits or in the bulk of the system, while phase boundaries depend on the strength and symmetry of couplings between the channels. Extensive Monte Carlo computer simulations strongly support our theoretical predictions. Theoretical calculations and computer simulations predict that inter-channel couplings have a strong effect on stationary properties. It is also argued that our results might be relevant for understanding multi-particle dynamics of motor proteins
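
    A toy discrete-time Monte Carlo sketch of the coupled two-channel geometry described above (my own illustration with assumed lattice size, boundary rates and coupling strength; it does not reproduce the paper's exact continuous-time dynamics or the vertical-cluster mean-field analysis):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 100                                # sites per channel (assumed)
alpha, beta_exit, w = 0.5, 0.5, 0.2    # entry, exit and inter-channel jump probabilities (assumed)

tasep = np.zeros(L, dtype=int)         # asymmetric channel: rightward hops only
ssep = np.zeros(L, dtype=int)          # symmetric channel: unbiased hops

def sweep():
    for _ in range(2 * L):             # one sweep = 2L random single-site update attempts
        i = rng.integers(L)
        if rng.random() < 0.5:                                    # update the TASEP channel
            if rng.random() < w and tasep[i] and not ssep[i]:
                tasep[i], ssep[i] = 0, 1                          # vertical jump to the SSEP channel
            elif i == 0 and not tasep[0] and rng.random() < alpha:
                tasep[0] = 1                                      # injection at the open left boundary
            elif i == L - 1 and tasep[-1] and rng.random() < beta_exit:
                tasep[-1] = 0                                     # extraction at the open right boundary
            elif i < L - 1 and tasep[i] and not tasep[i + 1]:
                tasep[i], tasep[i + 1] = 0, 1                     # rightward hop with hard-core exclusion
        else:                                                     # update the SSEP channel
            if rng.random() < w and ssep[i] and not tasep[i]:
                ssep[i], tasep[i] = 0, 1                          # vertical jump to the TASEP channel
            else:
                j = i + rng.choice([-1, 1])                       # unbiased hop direction
                if 0 <= j < L and ssep[i] and not ssep[j]:
                    ssep[i], ssep[j] = 0, 1

for _ in range(2000):
    sweep()
print("mean densities  TASEP:", tasep.mean(), " SSEP:", ssep.mean())
```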

  5. Overcoming artificial spatial correlations in simulations of superstructure domain growth with parallel Monte Carlo algorithms

    International Nuclear Information System (INIS)

    Schleier, W.; Besold, G.; Heinz, K.

    1992-01-01

    The authors study the applicability of parallelized/vectorized Monte Carlo (MC) algorithms to the simulation of domain growth in two-dimensional lattice gas models undergoing an ordering process after a rapid quench below an order-disorder transition temperature. As examples they consider models with 2 x 1 and c(2 x 2) equilibrium superstructures on the square and rectangular lattices, respectively. They also study the case of phase separation ('1 x 1' islands) on the square lattice. A generalized parallel checkerboard algorithm for Kawasaki dynamics is shown to give rise to artificial spatial correlations in all three models. However, only if superstructure domains evolve do these correlations modify the kinetics by influencing the nucleation process and result in a reduced growth exponent compared to the value from the conventional heat bath algorithm with random single-site updates. In order to overcome these artificial modifications, two MC algorithms with a reduced degree of parallelism ('hybrid' and 'mask' algorithms, respectively) are presented and applied. As the results indicate, these algorithms are suitable for the simulation of superstructure domain growth on parallel/vector computers. 60 refs., 10 figs., 1 tab
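
    For readers unfamiliar with checkerboard updating, the sketch below illustrates the basic decomposition on a 2D Ising-type model with single-spin Metropolis flips: sites of one sublattice have no neighbours on the same sublattice, so they can all be updated concurrently. This is only a generic illustration with assumed parameters; the paper itself studies lattice-gas models with Kawasaki (particle-exchange) dynamics, where exactly this kind of parallel update was shown to introduce artificial spatial correlations.

```python
import numpy as np

rng = np.random.default_rng(0)
L, beta = 64, 0.6                        # lattice size and inverse temperature (assumed)
spins = rng.choice([-1, 1], size=(L, L))

# Checkerboard masks: "black" and "white" sublattices
ii, jj = np.indices((L, L))
black = (ii + jj) % 2 == 0
white = ~black

def sweep(spins):
    for mask in (black, white):          # sites within one sublattice are mutually independent
        nn_sum = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
                  + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
        dE = 2.0 * spins * nn_sum        # energy change of flipping each spin (J = 1)
        accept = rng.random((L, L)) < np.exp(-beta * dE)
        spins[mask & accept] *= -1       # flip accepted spins on this sublattice only
    return spins

for _ in range(100):
    spins = sweep(spins)
print("magnetization per site:", spins.mean())
```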

  6. Parallel two-phase-flow-induced vibrations in fuel pin model

    International Nuclear Information System (INIS)

    Hara, Fumio; Yamashita, Tadashi

    1978-01-01

    This paper reports the experimental results of vibrations of a fuel pin model (herein meaning the essential form of a fuel pin from the standpoint of vibration) in a parallel air-and-water two-phase flow. The essential part of the experimental apparatus consisted of a flat elastic strip made of stainless steel, both ends of which were firmly supported in a circular channel conveying the two-phase fluid. Vibrational strain of the fuel pin model, pressure fluctuation of the two-phase flow and two-phase-flow void signals were measured. Statistical measures such as power spectral density, variance and correlation function were calculated. The authors obtained (1) the relation between variance of vibrational strain and two-phase-flow velocity, (2) the relation between variance of vibrational strain and two-phase-flow pressure fluctuation, (3) frequency characteristics of variance of vibrational strain against the dominant frequency of the two-phase-flow pressure fluctuation, and (4) frequency characteristics of variance of vibrational strain against the dominant frequency of two-phase-flow void signals. The authors conclude that there exist two kinds of excitation mechanisms in vibrations of a fuel pin model inserted in a parallel air-and-water two-phase flow; namely, (1) parametric excitation, which occurs when the fundamental natural frequency of the fuel pin model is related to the dominant travelling frequency of water slugs in the two-phase flow by the ratio 1/2, 1/1, 3/2 and so on; and (2) vibrational resonance, which occurs when the fundamental frequency coincides with the dominant frequency of the two-phase-flow pressure fluctuation. (auth.)

  7. Precise Modeling Based on Dynamic Phasors for Droop-Controlled Parallel-Connected Inverters

    DEFF Research Database (Denmark)

    Wang, L.; Guo, X.Q.; Gu, H.R.

    2012-01-01

    This paper deals with the precise modeling of droop-controlled parallel inverters. This is very attractive since that is a common structure that can be found in a stand-alone droop-controlled MicroGrid. The conventional small-signal dynamics are not able to predict instabilities of the system, so...

  8. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated

  9. Acceleration and sensitivity analysis of lattice kinetic Monte Carlo simulations using parallel processing and rate constant rescaling.

    Science.gov (United States)

    Núñez, M; Robie, T; Vlachos, D G

    2017-10-28

    Kinetic Monte Carlo (KMC) simulation provides insights into catalytic reactions unobtainable with either experiments or mean-field microkinetic models. Sensitivity analysis of KMC models assesses the robustness of the predictions to parametric perturbations and identifies rate determining steps in a chemical reaction network. Stiffness in the chemical reaction network, a ubiquitous feature, demands lengthy run times for KMC models and renders efficient sensitivity analysis based on the likelihood ratio method unusable. We address the challenge of efficiently conducting KMC simulations and performing accurate sensitivity analysis in systems with unknown time scales by employing two acceleration techniques: rate constant rescaling and parallel processing. We develop statistical criteria that ensure sufficient sampling of non-equilibrium steady state conditions. Our approach provides the twofold benefit of accelerating the simulation itself and enabling likelihood ratio sensitivity analysis, which provides further speedup relative to finite difference sensitivity analysis. As a result, the likelihood ratio method can be applied to real chemistry. We apply our methodology to the water-gas shift reaction on Pt(111).
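
    A toy sketch of the two acceleration ideas named above, with an assumed reaction network, rate constants and scaling factor (not the authors' code or chemistry): rate constants of a fast, quasi-equilibrated reaction pair are scaled down by a common factor, and independent KMC replicates are run in parallel to gather statistics.

```python
import numpy as np
from multiprocessing import Pool

# Toy network: A <-> B fast and quasi-equilibrated, B -> C slow (assumed values)
BASE_RATES = {"A->B": 1e6, "B->A": 1e6, "B->C": 1.0}
SCALEDOWN = 1e4                        # rescaling factor applied to the fast pair

def kmc_replicate(seed, n_steps=20_000):
    rng = np.random.default_rng(seed)
    rates = dict(BASE_RATES)
    rates["A->B"] /= SCALEDOWN         # rescale the fast, equilibrated reactions
    rates["B->A"] /= SCALEDOWN
    state = {"A": 100, "B": 0, "C": 0}
    t = 0.0
    for _ in range(n_steps):
        props = {
            "A->B": rates["A->B"] * state["A"],
            "B->A": rates["B->A"] * state["B"],
            "B->C": rates["B->C"] * state["B"],
        }
        total = sum(props.values())
        if total == 0.0:
            break
        t += rng.exponential(1.0 / total)                                  # time to the next event
        r = rng.choice(list(props), p=np.array(list(props.values())) / total)
        src, dst = r.split("->")
        state[src] -= 1
        state[dst] += 1
    return t, state["C"]

if __name__ == "__main__":
    with Pool(4) as pool:              # independent replicates run in parallel
        results = pool.map(kmc_replicate, range(8))
    print(results)
```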

  10. Dual processing model of medical decision-making

    Science.gov (United States)

    2012-01-01

    decision-making field, which is still to the large extent dominated by expected utility theory. The model also provides a platform for reconciling two groups of competing dual processing theories (parallel competitive with default-interventionalist theories). PMID:22943520

  11. Dual processing model of medical decision-making.

    Science.gov (United States)

    Djulbegovic, Benjamin; Hozo, Iztok; Beckstead, Jason; Tsalatsanis, Athanasios; Pauker, Stephen G

    2012-09-03

    large extent dominated by expected utility theory. The model also provides a platform for reconciling two groups of competing dual processing theories (parallel competitive with default-interventionalist theories).

  12. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    Science.gov (United States)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  13. Parallel Factor-Based Model for Two-Dimensional Direction Estimation

    Directory of Open Access Journals (Sweden)

    Nizar Tayem

    2017-01-01

    Full Text Available Two-dimensional (2D) Direction-of-Arrival (DOA) estimation for elevation and azimuth angles assuming noncoherent, mixture of coherent and noncoherent, and coherent sources using extended three parallel uniform linear arrays (ULAs) is proposed. Most of the existing schemes have drawbacks in estimating 2D DOA for multiple narrowband incident sources as follows: use of a large number of snapshots, estimation failure problems for elevation and azimuth angles in the range of typical mobile communication, and estimation of coherent sources. Moreover, DOA estimation for multiple sources requires complex pair-matching methods. The algorithm proposed in this paper is based on a first-order data matrix to overcome these problems. The main contributions of the proposed method are as follows: (1) it avoids the estimation failure problem using a new antenna configuration and estimates elevation and azimuth angles for coherent sources; (2) it reduces the estimation complexity by constructing Toeplitz data matrices, which are based on a single or a few snapshots; (3) it derives a parallel factor (PARAFAC) model to avoid pair-matching problems between multiple sources. Simulation results demonstrate the effectiveness of the proposed algorithm.
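
    As a small illustration of one ingredient mentioned above, building a Toeplitz data matrix from a single snapshot (which is what allows coherent sources to be handled with few snapshots), the sketch below constructs such a matrix with scipy.linalg.toeplitz; the array size and snapshot values are placeholders, and the antenna geometry, steering model and PARAFAC decomposition themselves are not reproduced.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(1)
M = 8                                                        # sensors in one ULA (assumed)
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # one snapshot (placeholder values)

# Toeplitz data matrix built from the single snapshot: element (i, j) = x[m + i - j]
m = M // 2
T = toeplitz(x[m:], x[m::-1])
print(T.shape)   # (M - m) x (m + 1) matrix used in place of a sample covariance
```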

  14. High performance computing of density matrix renormalization group method for 2-dimensional model. Parallelization strategy toward peta computing

    International Nuclear Information System (INIS)

    Yamada, Susumu; Igarashi, Ryo; Machida, Masahiko; Imamura, Toshiyuki; Okumura, Masahiko; Onishi, Hiroaki

    2010-01-01

    We parallelize the density matrix renormalization group (DMRG) method, which is a ground-state solver for one-dimensional quantum lattice systems. The parallelization allows us to extend the applicable range of the DMRG to n-leg ladders, i.e., quasi-two-dimensional cases. Such an extension is expected to bring about breakthroughs in, e.g., quantum physics, chemistry, and nano-engineering. However, the straightforward parallelization requires all-to-all communications between all processes, which are unsuitable for multi-core systems, currently the mainstream of parallel computers. Therefore, we optimize the all-to-all communications in the following two steps. The first is the elimination of communications between all processes by rearranging the data distribution while keeping the amount of communicated data unchanged. The second is the avoidance of communication conflicts by rescheduling the calculation and the communication. We evaluate the performance of the DMRG method on multi-core supercomputers and confirm that our two-step tuning is quite effective. (author)

  15. Current distribution characteristics of superconducting parallel circuits

    International Nuclear Information System (INIS)

    Mori, K.; Suzuki, Y.; Hara, N.; Kitamura, M.; Tominaka, T.

    1994-01-01

    In order to increase the current-carrying capacity of the current path in a superconducting magnet system, parts of the path are made as parallel circuits, such as insulated multi-strand cables or parallel persistent current switches (PCS). For superconducting parallel circuits of an insulated multi-strand cable or a parallel PCS, the current distribution during the current sweep, the persistent mode, and the quench process was investigated. Two methods were used to measure the current distribution. (1) Each strand was surrounded by a pure iron core with an air gap, in which a Hall probe was located; the accuracy of this method was degraded by the magnetic hysteresis of the iron. (2) A Rogowski coil without iron was used to measure the current of each path in a 4-parallel PCS. As a result, it was shown that the current distribution characteristics of a parallel PCS are very similar to those of an insulated multi-strand cable for the quench process

  16. Numerical modelling of series-parallel cooling systems in power plant

    Directory of Open Access Journals (Sweden)

    Regucki Paweł

    2017-01-01

    Full Text Available The paper presents a mathematical model allowing one to study series-parallel hydraulic systems such as the cooling system of a power boiler's auxiliary devices or a closed cooling system including condensers and cooling towers. The analytical approach is based on a set of non-linear algebraic equations solved using numerical techniques. As a result of the iterative process, a set of volumetric flow rates of water through all the branches of the investigated hydraulic system is obtained. The calculations indicate the influence of changes in the pipeline's geometrical parameters on the total cooling water flow rate in the analysed installation. Such an approach makes it possible to analyse different variants of the modernization of the studied systems, as well as to identify their critical elements. Based on these results, an investor can choose the variant of the reconstruction of the installation that is optimal from the economic point of view. As examples of such calculations, two hydraulic installations are described: one is a boiler auxiliary cooling installation including two screw ash coolers, the other a closed cooling system consisting of cooling towers and condensers.
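
    A minimal SciPy sketch of the underlying idea (my own illustration with assumed resistances and pressure difference, not the authors' model): mass conservation plus an equal pressure drop across parallel branches yields a set of non-linear algebraic equations for the branch flow rates, which are solved iteratively.

```python
from scipy.optimize import fsolve

# Two parallel branches fed by a common series pipe; all values are assumed.
# Pressure drop in each element is modelled as dp = r * q**2.
r_series, r1, r2 = 2.0, 8.0, 18.0      # flow resistances (arbitrary units)
dp_total = 100.0                       # available pressure difference (assumed)

def residuals(v):
    q1, q2 = v
    q = q1 + q2                                        # mass conservation at the junction
    eq1 = r_series * q**2 + r1 * q1**2 - dp_total      # pressure balance through branch 1
    eq2 = r1 * q1**2 - r2 * q2**2                      # equal drop across the parallel branches
    return [eq1, eq2]

q1, q2 = fsolve(residuals, x0=[1.0, 1.0])
print(f"branch flows: q1 = {q1:.3f}, q2 = {q2:.3f}, total = {q1 + q2:.3f}")
```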

  17. Leveraging Non-Uniform Resources for Parallel Query Processing

    DEFF Research Database (Denmark)

    Mayr, Tobias; Bonnet, Philippe; Gehrke, Johannes

    2003-01-01

    Modular clusters are now composed of non-uniform nodes with different CPUs, disks or network cards so that customers can adapt the cluster configuration to the changing technologies and to their changing needs. This challenges dataflow parallelism as the primary load balancing technique of exist...

  18. Self-critical perfectionism, dependency, and symptomatic distress in patients with personality disorder during hospitalization-based psychodynamic treatment: A parallel process growth modeling approach.

    Science.gov (United States)

    Lowyck, Benedicte; Luyten, Patrick; Vermote, Rudi; Verhaest, Yannic; Vansteelandt, Kristof

    2017-07-01

    There is growing evidence for the efficacy and effectiveness of psychotherapy in patients with personality disorder (PD), but very little is known about the factors underlying these effects. Two-polarities models of personality development provide an empirically supported approach to studying therapeutic change. Briefly, these models argue that personality pathology is characterized by an imbalance between development of the capacity for self-definition and for relatedness, with an exaggerated emphasis on issues regarding self-definition and relatedness being expressed in high levels of self-critical perfectionism (SCP) and dependency, respectively. This study used data from a study of 111 patients with PD who received long-term hospitalization-based psychodynamic treatment to investigate whether (a) treatment was related to changes in SCP, dependency, and symptomatic distress; (b) these changes could be explained by pretreatment levels of SCP, dependency, and/or symptomatic distress; and (c) changes in these personality dimensions over time were associated with symptomatic improvement. SCP, dependency, and symptomatic distress were assessed at admission (baseline), at 12 and 24 weeks into treatment, and at discharge. Parallel process multilevel growth modeling showed that (a) treatment was associated with a significant decrease in levels of SCP, dependency, and symptomatic distress, whereas (b) pretreatment levels of each of these three factors did not predict the decreases observed, and (c) changes in SCP, but not dependency, were associated with the rate of decrease in symptomatic distress over time. Implications of these findings for our understanding of therapeutic change in the treatment of PD are discussed. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  19. Parallel and distributed processing in two SGBDS: A case study

    Directory of Open Access Journals (Sweden)

    Francisco Javier Moreno

    2017-04-01

    Full Text Available Context: One of the strategies for managing large volumes of data is distributed and parallel computing. Among the tools that allow applying these characteristics are some Data Base Management Systems (DBMS), such as Oracle, DB2, and SQL Server. Method: In this paper we present a case study where we evaluate the performance of an SQL query in two of these DBMS. The evaluation is done through various forms of data distribution in a computer network with different degrees of parallelism. Results: The tests of the SQL query evidenced the performance differences between the two DBMS analyzed. However, more thorough testing and a wider variety of queries are needed. Conclusions: The differences in performance between the two DBMSs analyzed show that when evaluating this aspect, it is necessary to consider the particularities of each DBMS and the degree of parallelism of the queries.

  20. Jet formation and equatorial superrotation in Jupiter's atmosphere: Numerical modelling using a new efficient parallel code

    Science.gov (United States)

    Rivier, Leonard Gilles

    Using an efficient parallel code solving the primitive equations of atmospheric dynamics, the jet structure of a Jupiter-like atmosphere is modeled. In the first part of this thesis, a parallel spectral code solving both the shallow water equations and the multi-level primitive equations of atmospheric dynamics is built. The implementation of this code, called BOB, is done so that it runs effectively on an inexpensive cluster of workstations. A one-dimensional decomposition and transposition method ensuring load balancing among processes is used. The Legendre transform is cache-blocked. Computing the Legendre polynomials used in the spectral method "on the fly" produces a lower memory footprint and enables high-resolution runs on machines with relatively small memory. Performance studies are done using a cluster of workstations located at the National Center for Atmospheric Research (NCAR). BOB's performance is compared to the parallel benchmark code PSTSWM and the dynamical core of NCAR's CCM3.6.6. In both cases, the comparison favors BOB. In the second part of this thesis, the primitive equation version of the code described in part I is used to study the formation of organized zonal jets and equatorial superrotation in a planetary atmosphere whose parameters are chosen to best model the upper atmosphere of Jupiter. Two levels are used in the vertical and only large-scale forcing is present. The model is forced towards a baroclinically unstable flow, so that eddies are generated by baroclinic instability. We consider several types of forcing, acting on either the temperature or the momentum field. We show that only under very specific parametric conditions do zonally elongated structures form and persist, resembling the jet structure observed near the cloud-top level (1 bar) on Jupiter. We also study the effect of an equatorial heat source, meant to be a crude representation of the effect of the deep convective planetary interior on the outer atmospheric layer. We
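
    To make the "compute on the fly" idea concrete, the sketch below (a generic illustration, not the BOB code) generates Legendre polynomial values one degree at a time with the standard three-term recurrence, so a transform can consume them without storing a table of precomputed polynomials; the quadrature grid, the test field and the maximum degree are placeholder assumptions.

```python
import numpy as np

def legendre_on_the_fly(x, lmax):
    """Yield P_l(x) for l = 0..lmax using the Bonnet recurrence,
    keeping only two previous degrees in memory at a time."""
    p_prev = np.ones_like(x)          # P_0
    yield p_prev
    if lmax == 0:
        return
    p_curr = x.copy()                 # P_1
    yield p_curr
    for l in range(1, lmax):
        # (l + 1) P_{l+1}(x) = (2l + 1) x P_l(x) - l P_{l-1}(x)
        p_next = ((2 * l + 1) * x * p_curr - l * p_prev) / (l + 1)
        yield p_next
        p_prev, p_curr = p_curr, p_next

# Example: project a field sampled on Gauss-Legendre nodes onto each degree
nodes, weights = np.polynomial.legendre.leggauss(64)   # quadrature grid (illustrative)
field = np.exp(-nodes**2)                              # placeholder field
coeffs = [(weights * p * field).sum() * (2 * l + 1) / 2
          for l, p in enumerate(legendre_on_the_fly(nodes, lmax=10))]
print(np.round(coeffs, 6))
```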