WorldWideScience

Sample records for poker parallel programming

  1. Poker Phases

    DEFF Research Database (Denmark)

    Bjerg, Ole

    2011-01-01

    The subject of the article is the history of poker. It explores how different structural variations of the game have evolved and how different types of poker have been dominant at different periods in history. There are three main forms of poker: Draw, Stud and Hold'Em. In the article, it is demonstrated how the three forms emerge and become the most popular form of poker at three different periods in history. It identifies structural homologies between the historical development of poker and key elements in the manifestation of capitalism at different times in history...

  2. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race
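    The book's examples are in Fortran; as a rough modern analogue of the core concepts it names (shared memory, cooperating processes, barriers, races), here is a minimal sketch in C using POSIX threads. The program and all names in it are ours, not the book's.

```c
/* Minimal shared-memory parallelism with a barrier (POSIX threads).
 * Illustrative only; the book's own examples use Fortran. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define N 1000

static double a[N], partial[NTHREADS];    /* shared memory */
static pthread_barrier_t barrier;

static void *worker(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS), hi = lo + (N / NTHREADS);
    double s = 0.0;
    for (long i = lo; i < hi; i++) s += a[i];
    partial[id] = s;                      /* each thread has its own slot: no race */
    pthread_barrier_wait(&barrier);       /* wait until all partial sums are ready */
    if (id == 0) {                        /* one thread combines the results */
        double total = 0.0;
        for (int t = 0; t < NTHREADS; t++) total += partial[t];
        printf("sum = %f\n", total);
    }
    return NULL;
}

int main(void) {
    pthread_t tid[NTHREADS];
    for (long i = 0; i < N; i++) a[i] = 1.0;
    pthread_barrier_init(&barrier, NULL, NTHREADS);
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, (void *)t);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```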

  3. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

Blair-Chappell, Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write multicore programs and leverage the power of multicore processors in your applications. Sharing hands-on case studies and real-world examples, the

  4. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most out of this book.

  5. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  6. A heads-up no-limit Texas Hold'em poker player: Discretized betting models and automatically generated equilibrium-finding programs

    DEFF Research Database (Denmark)

    Gilpin, Andrew G.; Sandholm, Tuomas; Sørensen, Troels Bjerre

    2008-01-01

    We present Tartanian, a game theory-based player for heads-up no-limit Texas Hold'em poker. Tartanian is built from three components. First, to deal with the virtually infinite strategy space of no-limit poker, we develop a discretized betting model designed to capture the most important strategic...

  7. Writing parallel programs that work

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on auto-parallelizing compiler (KAP), and was involved in th...

  8. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    In recent years, efforts have been made to delineate a stable and unitary framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far are not commensurate with those efforts. This paper aims to make a small contribution to them. We propose an overview of parallel programming, parallel execution and collaborative systems.

  9. Parallelized direct execution simulation of message-passing parallel programs

    Science.gov (United States)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
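    To make the simulator half of this design concrete: below is a toy discrete-event core in C that delivers messages in virtual-time order. It is our illustration with an assumed fixed latency, not LAPSE's actual engine.

```c
/* Toy discrete-event core: deliver messages in timestamp order.
 * Illustrative sketch only; not LAPSE's engine. */
#include <stdio.h>

typedef struct { double time; int src, dst; } Event;

static Event heap[1024];
static int heap_n = 0;

static void push(Event e) {               /* binary min-heap keyed on time */
    int i = heap_n++;
    while (i > 0 && heap[(i - 1) / 2].time > e.time) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = e;
}

static Event pop(void) {                   /* remove earliest event */
    Event top = heap[0], last = heap[--heap_n];
    int i = 0;
    for (;;) {
        int c = 2 * i + 1;
        if (c >= heap_n) break;
        if (c + 1 < heap_n && heap[c + 1].time < heap[c].time) c++;
        if (last.time <= heap[c].time) break;
        heap[i] = heap[c];
        i = c;
    }
    heap[i] = last;
    return top;
}

int main(void) {
    const double LATENCY = 5.0;            /* assumed fixed network latency */
    push((Event){ .time = 0.0 + LATENCY, .src = 0, .dst = 1 });
    push((Event){ .time = 2.0 + LATENCY, .src = 1, .dst = 2 });
    push((Event){ .time = 1.0 + LATENCY, .src = 2, .dst = 0 });
    while (heap_n > 0) {                   /* process in virtual-time order */
        Event e = pop();
        printf("t=%4.1f  msg %d -> %d\n", e.time, e.src, e.dst);
    }
    return 0;
}
```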

  10. Structured Parallel Programming Patterns for Efficient Computation

    CERN Document Server

    McCool, Michael; Robison, Arch

    2012-01-01

    Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th

  11. Parallel programming with Easy Java Simulations

    Science.gov (United States)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  12. Are online poker problem gamblers sensation seekers?

    Science.gov (United States)

    Bonnaire, Céline

    2018-03-31

    The purpose of this research was to examine the relationship between sensation seeking and online poker gambling in a community sample of adult online poker players, when controlling for age, gender, anxiety and depression. In total, 288 online poker gamblers were recruited. Sociodemographic data, gambling behavior (CPGI), sensation seeking (SSS), depression and anxiety (HADS) were evaluated. Problem online poker gamblers have higher sensation seeking scores (total, thrill and adventure, disinhibition and boredom susceptibility subscores) and depression scores than non-problem online poker gamblers. Being male, together with higher total sensation seeking, disinhibition and depression scores, is associated with online poker problem gambling. These findings are interesting in terms of harm reduction. For example, because disinhibition could lead to increased time and money spent, protective behavioral strategies like setting time and monetary limits should be encouraged in online poker gamblers. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Parallel Programming Environment for OpenMP

    Directory of Open Access Journals (Sweden)

    Insung Park

    2001-01-01

    We present our effort to provide a comprehensive parallel programming environment for the OpenMP parallel directive language. This environment includes a parallel programming methodology for the OpenMP programming model and a set of tools (Ursa Minor and InterPol) that support this methodology. Our toolset provides automated and interactive assistance to parallel programmers in time-consuming tasks of the proposed methodology. The features provided by our tools include performance and program structure visualization, interactive optimization, support for performance modeling, and performance advising for finding and correcting performance problems. The presented evaluation demonstrates that our environment offers significant support in general parallel tuning efforts and that the toolset facilitates many common tasks in OpenMP parallel programming in an efficient manner.
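    For readers unfamiliar with the directive style such environments target, here is a minimal OpenMP example in C (standard OpenMP, unrelated to the Ursa Minor/InterPol toolset itself).

```c
/* A minimal OpenMP program of the kind such tools analyze:
 * a parallel loop with a reduction. Compile with e.g. cc -fopenmp. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)   /* iterations split across threads */
    for (int i = 0; i < n; i++)
        sum += 1.0 / ((double)i + 1.0);         /* harmonic series, just as work */

    printf("threads=%d sum=%f\n", omp_get_max_threads(), sum);
    return 0;
}
```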

  14. PDDP, A Data Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Karen H. Warren

    1996-01-01

    PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

  15. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  16. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  17. Productive Parallel Programming: The PCN Approach

    Directory of Open Access Journals (Sweden)

    Ian Foster

    1992-01-01

    We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

  18. Semantic Language Extensions for Implicit Parallel Programming

    Science.gov (United States)

    2013-09-01

    [Extraction fragments of a table comparing parallel programming approaches (Cyclone [87], X10 [52], DPJ [37], Intel TBB [162], TPL [131], C++0x [29], Erlang [21], Atomos); the column layout was lost.] ...another within WEAKC (see Chapter 5). Several solutions have been proposed that support transactional memory at the language level. Atomos [47] is... Olukotun. The Atomos transactional programming language. In Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and imple

  19. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  20. Integrated Task and Data Parallel Programming

    Science.gov (United States)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated

  1. Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Mustafa Basthikodi

    2016-04-01

    Performance growth of single-core processors came to a halt in the past decade but was re-enabled by the introduction of parallelism in processors. Multicore frameworks, along with graphical processing units, have broadly enhanced parallelism. Compilers are being updated to address the challenges of synchronization and threading. Appropriate program and algorithm classification will greatly help software engineers find opportunities for effective parallelization. In the present work we investigate current species for the classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge classification. A set of algorithms is chosen whose structure matches different issues and performs a given task. We tested these algorithms using existing automatic species extraction tools along with the Bones compiler. We added functionality to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user-defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by the original species of algorithms. We implemented new theories in the tool, enabling automatic characterization of program code.

  2. Portable parallel programming in a Fortran environment

    International Nuclear Information System (INIS)

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network-based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a ''nearly realistic'' lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs
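    PARMACs predates MPI; as a hedged sketch of the same coordinator/worker message-passing style, here is a minimal example in modern MPI rather than the original macros.

```c
/* Message-passing in the style PARMACs provided, expressed in modern MPI. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                       /* coordinator sends work out */
        for (int r = 1; r < size; r++) {
            int work = 100 + r;
            MPI_Send(&work, 1, MPI_INT, r, 0, MPI_COMM_WORLD);
        }
    } else {                               /* workers receive and report */
        int work;
        MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d got work item %d\n", rank, work);
    }
    MPI_Finalize();
    return 0;
}
```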

  3. From sequential to parallel programming with patterns

    CERN Multimedia

    CERN. Geneva

    2018-01-01

    To increase in both performance and efficiency, our programming models need to adapt to better exploit modern processors. The classic idioms and patterns for programming such as loops, branches or recursion are the pillars of almost every code and are well known among all programmers. These patterns all have in common that they are sequential in nature. Embracing parallel programming patterns, which allow us to program for multi- and many-core hardware in a natural way, greatly simplifies the task of designing a program that scales and performs on modern hardware, independently of the used programming language, and in a generic way.
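    As an illustration of recasting a sequential idiom (recursion) as a parallel pattern, here is a hedged sketch in C using OpenMP tasks; it is our example, not code from the talk.

```c
/* Recursion recast as a parallel pattern: divide-and-conquer sum
 * using OpenMP tasks. Sketch only. */
#include <omp.h>
#include <stdio.h>

static long sum(const long *a, long lo, long hi) {
    if (hi - lo < 10000) {                 /* small ranges: stay sequential */
        long s = 0;
        for (long i = lo; i < hi; i++) s += a[i];
        return s;
    }
    long mid = lo + (hi - lo) / 2, left, right;
    #pragma omp task shared(left)          /* left half runs as a child task */
    left = sum(a, lo, mid);
    right = sum(a, mid, hi);               /* right half in the current task */
    #pragma omp taskwait                   /* join the child task */
    return left + right;
}

int main(void) {
    enum { N = 1000000 };
    static long a[N];
    for (long i = 0; i < N; i++) a[i] = 1;
    long total;
    #pragma omp parallel
    #pragma omp single                     /* one thread seeds the task tree */
    total = sum(a, 0, N);
    printf("total = %ld\n", total);
    return 0;
}
```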

  4. OpenCL parallel programming development cookbook

    CERN Document Server

    Tay, Raymond

    2013-01-01

    OpenCL Parallel Programming Development Cookbook will provide a set of advanced recipes that can be utilized to optimize existing code. This book is therefore ideal for experienced developers with a working knowledge of C/C++ and OpenCL. This book is intended for software developers who have often wondered what to do with that newly bought CPU or GPU other than using it for playing computer games; this book is also for developers who have a working knowledge of C/C++ and who want to learn how to write parallel programs in OpenCL so that life isn't too boring.
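    As a taste of the baseline such recipes build on, here is a minimal OpenCL vector addition in C. This is our sketch, not code from the book; error checking is omitted for brevity.

```c
/* Minimal OpenCL vector-add: the "hello world" recipes build on. */
#include <CL/cl.h>
#include <stdio.h>

static const char *src =
    "__kernel void vadd(__global const float *a,"
    "                   __global const float *b,"
    "                   __global float *c) {"
    "    size_t i = get_global_id(0);"
    "    c[i] = a[i] + b[i];"
    "}";

int main(void) {
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    cl_platform_id plat;  cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                               sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);
    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    size_t global = N;                     /* one work-item per element */
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);
    printf("c[10] = %f\n", c[10]);         /* expect 30.0 */
    return 0;
}
```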

  5. Contributions to computational stereology and parallel programming

    DEFF Research Database (Denmark)

    Rasmusson, Allan

    ...rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has led to a significant change in paradigm for a previously developed surgical...

  6. Profiling Online Poker Players: Are Executive Functions Correlated with Poker Ability and Problem Gambling?

    Science.gov (United States)

    Schiavella, Mauro; Pelagatti, Matteo; Westin, Jerker; Lepore, Gabriele; Cherubini, Paolo

    2018-01-12

    Poker playing and responsible gambling both entail the use of the executive functions (EF), which are higher-level cognitive abilities. This study investigated if online poker players of different ability showed different performances in their EF and, if so, which functions were the most discriminating for their playing ability. Furthermore, it assessed if the EF performance was correlated to the quality of gambling, according to self-reported questionnaires (PGSI, SOGS, GRCS). Three poker experts evaluated anonymized poker hand history files and, then, a trained professional administered an extensive neuropsychological test battery. Data analysis determined which variables of the tests correlated with poker ability and gambling quality scores. The highest correlations between EF test results and poker ability and between EF test results and gambling quality assessment showed that mostly different clusters of executive functions characterize the profile of the strong(er) poker player, those of the problem gamblers (PGSI and SOGS), and that of the cognitions related to gambling (GRCS). Taking into consideration only the variables overlapping between PGSI and SOGS, we found some key predictive factors for more risky and harmful online poker playing: a lower performance in the emotional intelligence competences (Emotional Quotient Inventory Short) and, in particular, those grouped in the Intrapersonal scale (emotional self-awareness, assertiveness, self-regard, independence and self-actualization).

  7. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  8. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the forms of a tree structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  9. Strategic Style in Pared-Down Poker

    Science.gov (United States)

    Burns, Kevin

    This chapter deals with the manner of making diagnoses and decisions, called strategic style, in a gambling game called Pared-down Poker. The approach treats style as a mental mode in which choices are constrained by expected utilities. The focus is on two classes of utility, i.e., money and effort, and how cognitive styles compare to normative strategies in optimizing these utilities. The insights are applied to real-world concerns like managing the war against terror networks and assessing the risks of system failures. After "Introducing the Interactions" involved in playing poker, the contents are arranged in four sections, as follows. "Underpinnings of Utility" outlines four classes of utility and highlights the differences between them: economic utility (money), ergonomic utility (effort), informatic utility (knowledge), and aesthetic utility (pleasure). "Inference and Investment" dissects the cognitive challenges of playing poker and relates them to real-world situations of business and war, where the key tasks are inference (of cards in poker, or strength in war) and investment (of chips in poker, or force in war) to maximize expected utility. "Strategies and Styles" presents normative (optimal) approaches to inference and investment, and compares them to cognitive heuristics by which people play poker--focusing on Bayesian methods and how they differ from human styles. The normative strategy is then pitted against cognitive styles in head-to-head tournaments, and tournaments are also held between different styles. The results show that style is ergonomically efficient and economically effective, i.e., style is smart. "Applying the Analysis" explores how style spaces, of the sort used to model individual behavior in Pared-down Poker, might also be applied to real-world problems where organizations evolve in terror networks and accidents arise from system failures.
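    The inference-and-investment trade-off described here reduces, in its simplest money-only form, to a standard pot-odds calculation. The worked formula below is our illustration of that reduction, not the chapter's model.

```latex
% Pot-odds form of the money-only decision (our illustration):
% p = probability of winning, P = current pot, C = amount to call.
\mathrm{EU}(\text{call}) = p\,(P + C) - C = p\,P - (1 - p)\,C,
\qquad \mathrm{EU}(\text{fold}) = 0,
% so calling maximizes expected money exactly when
\mathrm{EU}(\text{call}) > 0 \iff p > \frac{C}{P + C}.
```

    With a pot of 90 chips and 10 chips to call, for instance, calling is the better investment whenever the estimated win probability exceeds 10/(90+10) = 0.10.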

  10. Specifying and Executing Optimizations for Parallel Programs

    Directory of Open Access Journals (Sweden)

    William Mansky

    2014-07-01

    Compiler optimizations, usually expressed as rewrites on program graphs, are a core part of all modern compilers. However, even production compilers have bugs, and these bugs are difficult to detect and resolve. The problem only becomes more complex when compiling parallel programs; from the choice of graph representation to the possibility of race conditions, optimization designers have a range of factors to consider that do not appear when dealing with single-threaded programs. In this paper we present PTRANS, a domain-specific language for formal specification of compiler transformations, and describe its executable semantics. The fundamental approach of PTRANS is to describe program transformations as rewrites on control flow graphs with temporal logic side conditions. The syntax of PTRANS allows cleaner, more comprehensible specification of program optimizations; its executable semantics allows these specifications to act as prototypes for the optimizations themselves, so that candidate optimizations can be tested and refined before going on to include them in a compiler. We demonstrate the use of PTRANS to state, test, and refine the specification of a redundant store elimination optimization on parallel programs.
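    To give the flavor of a rewrite guarded by a temporal side condition, here is a schematic version of redundant store elimination. The notation is our illustration, not actual PTRANS syntax.

```latex
% Schematic rewrite (illustrative notation only, not PTRANS syntax):
% a store to x at node n may become a no-op if, on every path leaving n,
% x is written again before it is read.
n : (\mathtt{store}\ x) \;\longrightarrow\; n : \mathtt{skip}
\quad \text{if} \quad
n \models \mathbf{AX}\,\mathbf{A}\bigl[\,\neg\mathit{read}(x)\ \mathbf{U}\ \mathit{write}(x)\,\bigr]
```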

  11. VERIFICATION OF PARALLEL AUTOMATA-BASED PROGRAMS

    OpenAIRE

    M. A. Lukin

    2014-01-01

    The paper deals with an interactive method of automatic verification for parallel automata-based programs. The hierarchical state machines can be implemented in different threads and can interact with each other. Verification is done by means of Spin tool and includes automatic Promela model construction, conversion of LTL-formula to Spin format and counterexamples in terms of automata. Interactive verification gives the possibility to decrease verification time and increase the maxi...

  12. Transformation of functional programs for identification of parallel skeletons

    OpenAIRE

    Kannan, Venkatesh

    2017-01-01

    Hardware is becoming increasingly parallel. Thus, it is essential to identify and exploit inherent parallelism in a given program to effectively utilise the computing power available. However, parallel programming is tedious and error-prone when done by hand, and is very difficult for a compiler to do automatically to the desired level. One possible approach to parallel programming is to use transformation techniques to automatically identify and explicitly specify parallel computations in a ...

  13. Poker E-Sports Reflection Cover

    OpenAIRE

    Karlsson, Stefan

    2011-01-01

    The paper reflects on Stefan Karlsson's process of developing the business plan Poker E-Sports. It is intended to be read by (1) nascent entrepreneurs, (2) students at the end of their education who are about to create a link between their academic learning and their first job, (3) university personnel involved in this development process and (4) stakeholders who are about to read the business plan Poker E-Sports. This reflection will chronologically describe the development proc...

  14. Development of parallel/serial program analyzing tool

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Nagao, Saichi; Takigawa, Yoshio; Kumakura, Toshimasa

    1999-03-01

    Japan Atomic Energy Research Institute has been developing 'KMtool', a parallel/serial program analyzing tool, in order to promote the parallelization of science and engineering computation programs. KMtool analyzes the performance of programs written in FORTRAN77 and MPI, and it reduces the effort required for parallelization. This paper describes the development purpose, design, utilization and evaluation of KMtool. (author)

  15. A Fast Causal Profiler for Task Parallel Programs

    OpenAIRE

    Yoga, Adarsh; Nagarakatte, Santosh

    2017-01-01

    This paper proposes TASKPROF, a profiler that identifies parallelism bottlenecks in task parallel programs. It leverages the structure of a task parallel execution to perform fine-grained attribution of work to various parts of the program. TASKPROF's use of hardware performance counters to perform fine-grained measurements minimizes perturbation. TASKPROF's profile execution runs in parallel using multi-cores. TASKPROF's causal profile enables users to estimate improvements in parallelism wh...

  16. A Tutorial on Parallel and Concurrent Programming in Haskell

    Science.gov (United States)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  17. Psychopathology of Online Poker Players: Review of Literature.

    Science.gov (United States)

    Moreau, Axelle; Chabrol, Henri; Chauchard, Emeline

    2016-06-01

    Background and aims: Online Texas Hold'em poker has become a spectacular form of entertainment in our society, and the number of people who use this form of gambling is increasing. It seems that online poker activity challenges existing theoretical concepts about problem gambling behaviors. The purpose of this literature review is to provide a current overview of the population of online poker players. Methods: To be selected, articles had to focus on psychopathology in a sample of online poker players, be written in English or French, and be published before November 2015. A total of 17 relevant studies were identified. Results: In this population, the proportion of problematic gamblers was higher than in other forms of gambling. Several factors predicting excessive gambling were identified, such as stress, internal attribution, dissociation, boredom, negative emotions, irrational beliefs, anxiety, and impulsivity. The population of online poker players is largely heterogeneous, with experimental players forming a specific group. Finally, the tools used to measure excessive or problematic gambling and irrational beliefs are not suitable for assessing online poker activity. Discussion and conclusions: Future studies need to confirm previous findings in the literature on online poker games. Given that skills are important in poker playing, skill development in the frame of excessive use of online poker should be explored more in depth, particularly regarding poker experience and loss chasing. Future research should focus on skills, self-regulation, and psychopathology of online poker players.

  18. VERIFICATION OF PARALLEL AUTOMATA-BASED PROGRAMS

    Directory of Open Access Journals (Sweden)

    M. A. Lukin

    2014-01-01

    The paper deals with an interactive method of automatic verification for parallel automata-based programs. The hierarchical state machines can be implemented in different threads and can interact with each other. Verification is done by means of the Spin tool and includes automatic Promela model construction, conversion of LTL-formulae to Spin format and counterexamples in terms of automata. Interactive verification gives the possibility to decrease verification time and increase the maximum size of verifiable programs. The considered method supports verification of parallel systems of hierarchical automata that interact with each other through messages and shared variables. A feature of the automaton model is that each state machine is considered as a new data type and can have an arbitrary bounded number of instances. Each state machine in the system can run a different state machine in a new thread or have a nested state machine. This method was implemented in the developed Stater tool. Stater shows correct operation for all test cases.

  19. Poker as a skill game: rational versus irrational behaviors

    Science.gov (United States)

    Javarone, Marco Alberto

    2015-03-01

    In many countries poker is one of the most popular card games. Although each variant of poker has its own rules, all involve the use of money to make the challenge meaningful. Nowadays, in the collective consciousness, some variants of poker are referred to as games of skill, others as gambling. A poker table can be viewed as a psychology lab, where human behavior can be observed and quantified. This work provides a preliminary analysis of the role of rationality in poker games, using a stylized version of Texas Hold'em. In particular, we compare the performance of two different kinds of players, i.e. rational versus irrational players, during a poker tournament. Results show that these behaviors (i.e. rationality and irrationality) affect both the outcomes of challenges and the way poker should be classified.

  20. Measuring Consumer Preferences Using Conjoint Poker

    NARCIS (Netherlands)

    O. Toubia; M.G. de Jong (Martijn); D. Stieger; J.H. Fuller (John)

    2012-01-01

    We develop and test an incentive-compatible Conjoint Poker (CP) game. The preference data collected in the context of this game are comparable to incentive-compatible choice-based conjoint (CBC) analysis data. We develop a statistical efficiency measure and an algorithm to construct

  1. Parallel solution of sparse one-dimensional dynamic programming problems

    Science.gov (United States)

    Nicol, David M.

    1989-01-01

    Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
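    One generic way a one-dimensional dynamic program admits parallelism is to keep the outer recurrence sequential while parallelizing the inner minimization over predecessors. The sketch below, in C with OpenMP, is our illustration of that generic idea, not the paper's algorithm; the cost function is a toy placeholder.

```c
/* Generic 1-D DP: outer recurrence stays sequential, inner min runs
 * in parallel. Requires OpenMP 3.1+ for the min reduction. */
#include <float.h>
#include <stdio.h>

#define N 8

/* cost(j, i): hypothetical transition cost, here just a toy formula */
static double cost(int j, int i) { return (double)(i - j) * (i - j); }

int main(void) {
    double f[N];
    f[0] = 0.0;
    for (int i = 1; i < N; i++) {          /* outer loop: true dependency */
        double best = DBL_MAX;
        #pragma omp parallel for reduction(min:best)
        for (int j = 0; j < i; j++)        /* inner min: independent work */
            if (f[j] + cost(j, i) < best)
                best = f[j] + cost(j, i);
        f[i] = best;
    }
    printf("f[%d] = %f\n", N - 1, f[N - 1]);
    return 0;
}
```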

  2. Parallelization for first principles electronic state calculation program

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Oguchi, Tamio.

    1997-03-01

    In this report we study the parallelization of a first-principles electronic state calculation program. The target machines are the NEC SX-4 for shared-memory parallelization and the FUJITSU VPP300 for distributed-memory parallelization. The features of each parallel machine are surveyed, and parallelization methods suitable for each are proposed. It is shown that 1.60 times acceleration is achieved with 2-CPU parallelization on the SX-4 and 4.97 times acceleration is achieved with 12-PE parallelization on the VPP300. (author)

  3. Data-Parallel Programming in a Multithreaded Environment

    Directory of Open Access Journals (Sweden)

    Matthew Haines

    1997-01-01

    Research on programming distributed memory multiprocessors has resulted in a well-understood programming model, namely data-parallel programming. However, data-parallel programming in a multithreaded environment is far less understood. For example, if multiple threads within the same process belong to different data-parallel computations, then the architecture, compiler, or run-time system must ensure that relative indexing and collective operations are handled properly and efficiently. We introduce a run-time-based solution for data-parallel programming in a distributed memory environment that handles the problems of relative indexing and collective communications among thread groups. As a result, the data-parallel programming model can now be executed in a multithreaded environment, such as a system using threads to support both task and data parallelism.
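    The group-local collectives problem has a close analogue in MPI communicators; the following hedged C sketch uses MPI_Comm_split as an analogy for thread groups, and is not the run-time system described in the paper.

```c
/* Collectives restricted to a group, via MPI communicators: an analogue
 * of the thread-group problem discussed above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int color = rank % 2;                  /* two groups: even and odd ranks */
    MPI_Comm group;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group);

    int val = rank, sum;
    MPI_Allreduce(&val, &sum, 1, MPI_INT, MPI_SUM, group);  /* group-local */
    printf("rank %d (group %d): group sum = %d\n", rank, color, sum);

    MPI_Comm_free(&group);
    MPI_Finalize();
    return 0;
}
```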

  4. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  5. A Verification Technique for Deterministic Parallel Programs (extended version)

    NARCIS (Netherlands)

    Darabi, Saeed; Blom, Stefan; Huisman, Marieke

    2017-01-01

    A commonly used approach to develop parallel programs is to augment a sequential program with compiler directives that indicate which program blocks may potentially be executed in parallel. This paper develops a verification technique to prove correctness of compiler directives combined with

  6. Programming parallel architectures - The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  7. Parallel programming practical aspects, models and current limitations

    CERN Document Server

    Tarkov, Mikhail S

    2014-01-01

    Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: (1) processing large data arrays (including processing images and signals in real time), and (2) simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation. Problems of the scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...
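    As a small taste of the simulation class mentioned above, here is a one-dimensional cellular automaton (rule 110) parallelized with OpenMP. This is our sketch, not code from the book.

```c
/* Parallel update of a 1-D cellular automaton (rule 110) with OpenMP:
 * every cell's next state depends only on the current generation, so
 * all cells can be updated independently. */
#include <stdio.h>
#include <string.h>

#define N 64
#define STEPS 32

int main(void) {
    unsigned char cur[N] = {0}, next[N];
    cur[N / 2] = 1;                            /* single seed cell */
    for (int t = 0; t < STEPS; t++) {
        #pragma omp parallel for               /* cells update independently */
        for (int i = 0; i < N; i++) {
            int l = cur[(i + N - 1) % N], c = cur[i], r = cur[(i + 1) % N];
            int pattern = (l << 2) | (c << 1) | r;
            next[i] = (110 >> pattern) & 1;    /* rule 110 lookup table */
        }
        memcpy(cur, next, sizeof cur);         /* generation becomes current */
        for (int i = 0; i < N; i++) putchar(cur[i] ? '#' : '.');
        putchar('\n');
    }
    return 0;
}
```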

  8. Survey on present status and trend of parallel programming environments

    International Nuclear Information System (INIS)

    Takemiya, Hiroshi; Higuchi, Kenji; Honma, Ichiro; Ohta, Hirofumi; Kawasaki, Takuji; Imamura, Toshiyuki; Koide, Hiroshi; Akimoto, Masayuki.

    1997-03-01

    This report intends to provide useful information on software tools for parallel programming through a survey of the parallel programming environments of the following six parallel computers installed at the Japan Atomic Energy Research Institute (JAERI): Fujitsu VPP300/500, NEC SX-4, Hitachi SR2201, Cray T94, IBM SP, and Intel Paragon. Moreover, the present status of R&D on parallel software (parallel languages, compilers, debuggers, performance evaluation tools, and integrated tools) is reported. This survey has been made as a part of our project of developing a basic software for parallel programming environments, which is designed on the concept of STA (Seamless Thinking Aid to programmers). (author)

  9. The BLAZE language - A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  10. The BLAZE language: A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.

  11. Professional Parallel Programming with C# Master Parallel Extensions with NET 4

    CERN Document Server

    Hillar, Gastón

    2010-01-01

    Expert guidance for those programming today's dual-core processor PCs. As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, the asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization. Teach

  12. Programming parallel architectures: The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  13. Integrated Task And Data Parallel Programming: Language Design

    Science.gov (United States)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated

  14. The FORCE: A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  15. The FORCE - A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
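    The two-level macro idea can be miniaturized in C: generic constructs are defined on top of machine-specific primitives, and only the bottom layer changes per target. In the hedged sketch below, pthreads stands in for one target machine; these are illustrative macros of ours, not the actual FORCE package.

```c
/* The two-level macro idea in miniature: a machine-dependent layer of
 * primitives, and machine-independent constructs built on top of it. */
#include <pthread.h>
#include <stdio.h>

/* ---- machine-dependent level: map primitives onto this platform ---- */
#define LOCK_DECL(name)  pthread_mutex_t name = PTHREAD_MUTEX_INITIALIZER
#define LOCK(name)       pthread_mutex_lock(&(name))
#define UNLOCK(name)     pthread_mutex_unlock(&(name))

/* ---- machine-independent level: built only on the primitives above ---- */
#define CRITICAL(name, body) do { LOCK(name); body; UNLOCK(name); } while (0)

LOCK_DECL(sum_lock);
static double sum = 0.0;

static void *worker(void *arg) {
    double contrib = *(double *)arg;
    CRITICAL(sum_lock, sum += contrib);    /* portable critical section */
    return NULL;
}

int main(void) {
    pthread_t t[4];
    double v[4] = {1, 2, 3, 4};
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, &v[i]);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("sum = %f\n", sum);             /* 10.0 regardless of schedule */
    return 0;
}
```

    Porting to another machine would mean rewriting only the three bottom-layer macros; everything built on CRITICAL stays unchanged, which is the portability argument the paper makes for FORCE.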

  16. Declarative Parallel Programming in Spreadsheet End-User Development

    DEFF Research Database (Denmark)

    Biermann, Florian

    2016-01-01

    Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations. In this literature study, we provide an overview of the publications on spreadsheet end-user programming and declarative array programming to inform further research on parallel programming in spreadsheets. Our results show that there is a clear overlap between spreadsheet programming and array programming, and we can directly apply results from functional array programming to a spreadsheet model of computations.

  17. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    Directory of Open Access Journals (Sweden)

    Stephen L. Olivier

    2013-01-01

    Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  18. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

    The development of computer architectures during the last ten years has forced programmers to move towards writing parallel programs instead of sequential ones. The homogeneous multi-core architectures from the major CPU producers like Intel and AMD have led this trend. Numerous tools have therefore aimed to ease the writing of parallel programs by raising the abstraction level, so that programmers can focus on implementing their algorithms and not on the details of the underlying hardware. The outcome has ranged from ideas on automating the parallelization of sequential programs to presenting a cluster of machines as if they were a single machine. In between are a number of tools helping programmers handle communication, share data, run loops in parallel, handle algorithms mining huge amounts of data, etc. Even though most of them do a good job performance-wise, almost all of them require that programmers learn a new programming model.

  19. Parallel adaptation of a vectorised quantumchemical program system

    International Nuclear Information System (INIS)

    Van Corler, L.C.H.; Van Lenthe, J.H.

    1987-01-01

    Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clock-cycle times more than a factor of 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers, one should consider the concepts one wants to use and the kinds of problems one will encounter when implementing existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantum-chemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI, in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallelism on the host computer (IBM 4381) and real parallelism on at most 10 attached processors (FPS-164). Quantum-chemical programs usually handle large amounts of data and very large, often sparse matrices. The transfer of so many data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallelize the programs are shown. Also, some examples are presented to illustrate the effectiveness and performance of the system in Rome for these types of calculations

  20. Program Transformation to Identify List-Based Parallel Skeletons

    Directory of Open Access Journals (Sweden)

    Venkatesh Kannan

    2016-07-01

    Full Text Available Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are defined over flat data types such as lists or arrays. However, skeleton-based parallel programming is still very challenging as it requires intricate analysis of the underlying algorithm and often uses inefficient intermediate data structures. Further, the algorithmic structure of a given program may not match those of list-based skeletons. In this paper, we present a method to automatically transform any given program to one that is defined over a list and is more likely to contain instances of list-based skeletons. This facilitates the parallel execution of a transformed program using existing implementations of list-based parallel skeletons. Further, by using an existing transformation called distillation in conjunction with our method, we produce transformed programs that contain fewer inefficient intermediate data structures.
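
    To make the idea of a list-based skeleton concrete, here is a hypothetical map skeleton in C: the function is applied to each list element independently, which is precisely the structure an existing parallel skeleton implementation can exploit (sketched here with an OpenMP loop).

        #include <stdio.h>

        typedef double (*map_fn)(double);

        /* "map" skeleton over a flat list: independent applications of f */
        static void par_map(map_fn f, const double *in, double *out, int n) {
            #pragma omp parallel for
            for (int i = 0; i < n; i++)
                out[i] = f(in[i]);
        }

        static double square(double x) { return x * x; }

        int main(void) {
            double in[8] = {1, 2, 3, 4, 5, 6, 7, 8}, out[8];
            par_map(square, in, out, 8);
            for (int i = 0; i < 8; i++)
                printf("%g ", out[i]);
            printf("\n");
            return 0;
        }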

  1. Development of massively parallel quantum chemistry program SMASH

    International Nuclear Information System (INIS)

    Ishimura, Kazuya

    2015-01-01

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program development simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer
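
    The MPI+OpenMP combination mentioned above can be illustrated with a minimal hybrid C sketch (SMASH itself is written in Fortran90/95; this is not SMASH code): MPI distributes iterations across processes while OpenMP threads share the work inside each process.

        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int provided, rank, size;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double local = 0.0;           /* stand-in for integral work */
            #pragma omp parallel for reduction(+:local)
            for (int i = rank; i < 1000000; i += size)
                local += 1.0 / (1.0 + (double)i);

            double total;
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                       MPI_COMM_WORLD);
            if (rank == 0) printf("sum = %f\n", total);
            MPI_Finalize();
            return 0;
        }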

  2. Is poker a skill game? New insights from statistical physics

    Science.gov (United States)

    Javarone, Marco Alberto

    2015-06-01

    In recent years poker has gained considerable prestige in several countries and, besides being one of the most famous card games, it represents a modern challenge for scientists from different communities, spanning from artificial intelligence to physics and from psychology to mathematics. Unlike games like chess, the task of classifying the nature of poker (i.e., as a “skill game” or as gambling) seems really hard, and it constitutes an open problem whose solution has several implications. In general, gambling offers equal winning probabilities both to rational players (i.e., those that use a strategy) and to irrational ones (i.e., those without a strategy). Therefore, in order to uncover the nature of poker, a viable way is to compare the performances of rational vs. irrational players during a series of challenges. Recently, a work on this topic revealed that rationality is a fundamental ingredient to succeed in poker tournaments. In this study we analyze a simple model of poker challenges by a statistical physics approach, with the aim of uncovering the nature of this game. As the main result we find that, under particular conditions, a few irrational players can turn poker into gambling. Therefore, although rationality is a key ingredient to succeed in poker, the format of challenges also plays an important role in these dynamics, as it can strongly influence the underlying nature of the game. The importance of our results lies in their implications, for instance in identifying the limits within which poker can be considered a “skill game” and, as a consequence, which kind of format must be chosen to devise algorithms able to compete with humans.
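
    A toy Monte Carlo sketch in C of the rational-vs-irrational comparison (illustrative only, not the model analyzed in the paper): the rational player enters a hand only above a strength threshold, while the irrational opponent plays every hand.

        #include <stdio.h>
        #include <stdlib.h>

        int main(void) {
            srand(42);
            int rounds = 100000, played = 0, wins = 0;
            for (int r = 0; r < rounds; r++) {
                double h_rat = rand() / (double)RAND_MAX; /* hand strengths */
                double h_irr = rand() / (double)RAND_MAX;
                if (h_rat <= 0.5)
                    continue;            /* rational player folds weak hands */
                played++;
                if (h_rat > h_irr)
                    wins++;              /* irrational player always calls */
            }
            /* conditional win rate is about 75% under this toy model */
            printf("rational player wins %.1f%% of played hands\n",
                   100.0 * wins / played);
            return 0;
        }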

  3. On program restructuring, scheduling, and communication for parallel processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Polychronopoulos, Constantine D. [Univ. of Illinois, Urbana, IL (United States)

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed with a single goal: to maximize program speedup or, equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into parallel form and to conduct experiments. Two new program restructuring techniques are presented: loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm, and its performance is compared to that of self-scheduling. Techniques for program partitioning and for minimization of interprocessor communication, for idealized program models and for real Fortran programs, are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
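
    Loop coalescing, one of the two restructuring techniques introduced here, can be shown in a small C sketch: a doubly nested loop is flattened into a single iteration space whose iterations can then be scheduled as one pool.

        #include <stdio.h>

        #define N 100
        #define M 80

        /* original doubly nested loop */
        static void original(double a[N][M]) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < M; j++)
                    a[i][j] *= 2.0;
        }

        /* coalesced form: one loop over N*M iterations */
        static void coalesced(double a[N][M]) {
            for (int k = 0; k < N * M; k++) {
                int i = k / M, j = k % M;   /* recover original indices */
                a[i][j] *= 2.0;
            }
        }

        int main(void) {
            static double a[N][M], b[N][M];
            a[3][5] = b[3][5] = 1.0;
            original(a);
            coalesced(b);
            printf("%g %g\n", a[3][5], b[3][5]);  /* both print 2 */
            return 0;
        }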

  4. WFIRST: Science from the Guest Investigator and Parallel Observation Programs

    Science.gov (United States)

    Postman, Marc; Nataf, David; Furlanetto, Steve; Milam, Stephanie; Robertson, Brant; Williams, Ben; Teplitz, Harry; Moustakas, Leonidas; Geha, Marla; Gilbert, Karoline; Dickinson, Mark; Scolnic, Daniel; Ravindranath, Swara; Strolger, Louis; Peek, Joshua; Marc Postman

    2018-01-01

    The Wide Field InfraRed Survey Telescope (WFIRST) mission will provide an extremely rich archival dataset that will enable a broad range of scientific investigations beyond the initial objectives of the proposed key survey programs. The scientific impact of WFIRST will thus be significantly expanded by a robust Guest Investigator (GI) archival research program. We will present examples of GI research opportunities ranging from studies of the properties of a variety of Solar System objects and surveys of the outer Milky Way halo to comprehensive studies of cluster galaxies and unique new constraints on the epoch of cosmic re-ionization and the assembly of galaxies in the early universe. WFIRST will also support the acquisition of deep wide-field imaging and slitless spectroscopic data obtained in parallel during campaigns with the coronagraphic instrument (CGI). These parallel wide-field imager (WFI) datasets can provide deep imaging data covering several square degrees with no impact on the scheduling of the CGI program. A competitively selected program of well-designed parallel WFI observation programs will, like the GI science above, maximize the overall scientific impact of WFIRST. We will give two examples of parallel observations that could be conducted during a proposed CGI program centered on a dozen nearby stars.

  5. Basic design of parallel computational program for probabilistic structural analysis

    International Nuclear Information System (INIS)

    Kaji, Yoshiyuki; Arai, Taketoshi; Gu, Wenwei; Nakamura, Hitoshi

    1999-06-01

    In our laboratory, as part of the nuclear computational science cross-over research project 'Development of a damage evaluation method for brittle structural materials by microscopic fracture mechanics and probabilistic theory', we are examining computational methods for a massively parallel computation system that couples a material strength theory, based on microscopic fracture mechanics for latent cracks, with a continuum structural model, in order to develop new structural reliability evaluation methods for ceramic structures. This technical report reviews probabilistic structural mechanics theory, the basic terms and formulas, and the parallel computation programming methods that relate to the principal elements in the basic design of the computational mechanics program. (author)

  6. A Programming Model for Massive Data Parallelism with Data Dependencies

    International Nuclear Information System (INIS)

    Cui, Xiaohui; Mueller, Frank; Potok, Thomas E.; Zhang, Yongpeng

    2009-01-01

    Accelerating processors can often be more cost- and energy-effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processing units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general-purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach: we run massively data-parallel applications on GPU clusters, and we propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro-benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.

  7. Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

    International Nuclear Information System (INIS)

    Baron, E.; Hauschildt, Peter H.

    1998-01-01

    We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and to the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society
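
    The pipelined wavelength design follows a classic MPI pattern, sketched below in C with a placeholder computation (this is an illustration, not PHOENIX code): each rank waits for the state produced at the previous wavelength point, advances its own point, and forwards the result as soon as it is known.

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double state = 0.0;            /* radiation-field state */
            if (rank > 0)                  /* wait for previous point */
                MPI_Recv(&state, 1, MPI_DOUBLE, rank - 1, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            state += 1.0;                  /* advance one wavelength point */
            if (rank < size - 1)           /* hand off as soon as known */
                MPI_Send(&state, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);
            printf("rank %d: state = %g\n", rank, state);
            MPI_Finalize();
            return 0;
        }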

  8. Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    John Mellor-Crummey

    2008-02-29

    Rice University's achievements as part of the Center for Programming Models for Scalable Parallel Computing include: (1) design and implementation of cafc, the first multi-platform CAF compiler for distributed and shared-memory machines, (2) performance studies of the efficiency of programs written using the CAF and UPC programming models, (3) a novel technique to analyze explicitly-parallel SPMD programs that facilitates optimization, (4) design, implementation, and evaluation of new language features for CAF, including communication topologies, multi-version variables, and distributed multithreading to simplify development of high-performance codes in CAF, and (5) a synchronization strength reduction transformation for automatically replacing barrier-based synchronization with more efficient point-to-point synchronization. The prototype Co-array Fortran compiler cafc developed in this project is available as open source software from http://www.hipersoft.rice.edu/caf.

  9. Identifying risk and mitigating gambling-related harm in online poker

    OpenAIRE

    Parke, A; Griffiths, MD

    2016-01-01

    The present paper conducts a critical analysis of the potential for gambling-related harm in relation to online poker participation and a theoretical evaluation of the current responsible gambling strategies employed to mitigate harm in online gambling, applying the evaluation of these strategies specifically to online poker gambling. Theoretically, the primary risk for harm in online poker is the rapid and continuous nature of poker provision online, which has been demonstrated to be associate...

  10. Anxiety, Depression and Emotion Regulation Among Regular Online Poker Players.

    Science.gov (United States)

    Barrault, Servane; Bonnaire, Céline; Herrmann, Florian

    2017-12-01

    Poker is a type of gambling that has specific features, including the need to regulate one's emotion to be successful. The aim of the present study is to assess emotion regulation, anxiety and depression in a sample of regular poker players, and to compare the results of problem and non-problem gamblers. 416 regular online poker players completed online questionnaires including sociodemographic data, measures of problem gambling (CPGI), anxiety and depression (HAD scale), and emotion regulation (ERQ). The CPGI was used to divide participants into four groups according to the intensity of their gambling practice (non-problem, low risk, moderate risk and problem gamblers). Anxiety and depression were significantly higher among severe-problem gamblers than among the other groups. Both significantly predicted problem gambling. On the other hand, there was no difference between groups in emotion regulation (cognitive reappraisal and expressive suppression), which was linked neither to problem gambling nor to anxiety and depression (except for cognitive reappraisal, which was significantly correlated to anxiety). Our results underline the links between anxiety, depression and problem gambling among poker players. If emotion regulation is involved in problem gambling among poker players, as strongly suggested by data from the literature, the emotion regulation strategies we assessed (cognitive reappraisal and expressive suppression) may not be those involved. Further studies are thus needed to investigate the involvement of other emotion regulation strategies.

  11. Online Poker Gambling in University Students: Further Findings from an Online Survey

    Science.gov (United States)

    Griffiths, Mark; Parke, Jonathan; Wood, Richard; Rigbye, Jane

    2010-01-01

    Online poker is one of the fastest growing forms of online gambling yet there has been relatively little research to date. This study comprised 422 online poker players (362 males and 60 females) and investigated some of the predicting factors of online poker success and problem gambling using an online questionnaire. Results showed that length of…

  12. Feedback Driven Annotation and Refactoring of Parallel Programs

    DEFF Research Database (Denmark)

    Larsen, Per

    , some program properties are beyond reach of such analysis for theoretical and practical reasons - but can be described by programmers. Three aspects are explored. The first is annotation of the source code. Two annotations are introduced. These allow more accurate modeling of parallelism...... and communication in embedded programs. Runtime checks are developed to ensure that annotations correctly describe observable program behavior. The performance impact of runtime checking is evaluated on several benchmark kernels and is negligible in all cases. The second aspect is compilation feedback. Annotations...

  13. Protocol-Based Verification of Message-Passing Parallel Programs

    DEFF Research Database (Denmark)

    López-Acosta, Hugo-Andrés; Eduardo R. B. Marques, Eduardo R. B.; Martins, Francisco

    2015-01-01

    a protocol language based on a dependent type system for message-passing parallel programs, which includes various communication operators, such as point-to-point messages, broadcast, reduce, array scatter and gather. For the verification of a program against a given protocol, the protocol is first......, that suffer from the state-explosion problem or that otherwise depend on parameters to the program itself. We experimentally evaluated our approach against state-of-the-art tools for MPI to conclude that our approach offers a scalable solution....

  14. Beyond chance? The persistence of performance in online poker.

    Directory of Open Access Journals (Sweden)

    Rogier J D Potter van Loon

    Full Text Available A major issue in the widespread controversy about the legality of poker and the appropriate taxation of winnings is whether poker should be considered a game of skill or a game of chance. To inform this debate we present an analysis of the role of skill in the performance of online poker players, using a large database with hundreds of millions of player-hand observations from real money ring games at three different stakes levels. We find that players whose earlier profitability was in the top (bottom) deciles perform better (worse) and are substantially more likely to end up in the top (bottom) performance deciles of the following time period. Regression analyses of performance on historical performance and other skill-related proxies provide further evidence for persistence and predictability. Simulations point out that skill dominates chance when performance is measured over 1,500 or more hands of play.

  15. Final Report: Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    Mellor-Crummey, John [William Marsh Rice University

    2011-09-13

    As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.

  16. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    Science.gov (United States)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.

  17. A scalable parallel algorithm for multiple objective linear programs

    Science.gov (United States)

    Wiecek, Malgorzata M.; Zhang, Hong

    1994-01-01

    This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives especially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.

  18. Parallelization and checkpointing of GPU applications through program transformation

    Energy Technology Data Exchange (ETDEWEB)

    Solano-Quinde, Lizandro Damian [Iowa State Univ., Ames, IA (United States)

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and

  19. An empirical study of FORTRAN programs for parallelizing compilers

    Science.gov (United States)

    Shen, Zhiyu; Li, Zhiyuan; Yew, Pen-Chung

    1990-01-01

    Some results are reported from an empirical study of program characteristics that are important to writers of parallelizing compilers, especially in the areas of data dependence analysis and program transformations. The state of the art in data dependence analysis and some parallel execution techniques are examined. The major findings are included. Many subscripts contain symbolic terms with unknown values. A few methods of determining their values at compile time are evaluated. Array references with coupled subscripts appear quite frequently; these subscripts must be handled simultaneously in a dependence test, rather than being handled separately as in current test algorithms. Nonzero coefficients of loop indexes in most subscripts are found to be simple: they are either 1 or -1. This allows an exact real-valued test to be as accurate as an exact integer-valued test for one-dimensional or two-dimensional arrays. Dependencies with uncertain distance are found to be rather common, and one of the main reasons is the frequent appearance of symbolic terms with unknown values.
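
    The coupled subscripts mentioned above can be seen in a small C example: both subscripts of the array reference involve the same loop index, so a dependence test must handle them simultaneously. Here the loop carries a flow dependence of distance 1, and the loop-index coefficients are 1, the simple case the study found to dominate.

        #include <stdio.h>

        /* a[i][i+1] written at iteration i is read as a[i-1][i] at i+1 */
        static void kernel(double a[100][100], int n) {
            for (int i = 1; i < n; i++)
                a[i][i + 1] = a[i - 1][i] + 1.0;  /* coupled: both use i */
        }

        int main(void) {
            static double a[100][100];
            kernel(a, 99);
            printf("a[5][6] = %g\n", a[5][6]);    /* chain of +1's gives 5 */
            return 0;
        }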

  20. The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition

    Directory of Open Access Journals (Sweden)

    Ashkan Tousimojarad

    2013-12-01

    Full Text Available We present the Glasgow Parallel Reduction Machine (GPRM, a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the functional language used as the intermediate representation of the bytecode running on the GPRM. Using examples in this language we show the flexibility and power of our task composition framework. We demonstrate the potential using an implementation of a merge sort algorithm on a 64-core Tilera processor, as well as on a conventional Intel quad-core processor and an AMD 48-core processor system. We also compare our framework with OpenMP tasks in a parallel pointer chasing algorithm running on the Tilera processor. Our results show that the GPRM programs outperform the corresponding OpenMP codes on all test platforms, and can greatly facilitate writing of parallel programs, in particular non-data parallel algorithms such as reductions.

  1. A Parallel Vector Machine for the PM Programming Language

    Science.gov (United States)

    Bellerby, Tim

    2016-04-01

    PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using
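
    The masked vector execution described here can be sketched in C (hypothetical names, not PM's implementation): an operation is applied only in vector cells whose mask is set, as the virtual machine's conditional constructs would arrange.

        #include <stdio.h>

        #define VLEN 8

        /* apply dst[i] += src[i] only where the mask is set */
        static void masked_add(double *dst, const double *src,
                               const int *mask, int n) {
            for (int i = 0; i < n; i++)
                if (mask[i])
                    dst[i] += src[i];
        }

        int main(void) {
            double x[VLEN] = {0}, y[VLEN] = {1, 2, 3, 4, 5, 6, 7, 8};
            int mask[VLEN]  = {1, 0, 1, 0, 1, 0, 1, 0}; /* from an if-test */
            masked_add(x, y, mask, VLEN);
            for (int i = 0; i < VLEN; i++)
                printf("%g ", x[i]);
            printf("\n");
            return 0;
        }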

  2. Judgments under competition and uncertainty : empirical evidence from online poker

    OpenAIRE

    Engelbergs, Jörg

    2010-01-01

    Playing poker has many aspects in common with the making of business decisions. Agents act in a strategically rich, dynamic environment, where they are repeatedly facing competition under uncertain prospects. Their main goal is to maximize their resources. However simply this goal is stated, it is difficult to fulfill. As vast and rich research has shown, heuristics and biases affect human decision-making. This work adds to these findings by providing empirical evidence for behavioral patte...

  3. Massive parallel electromagnetic field simulation program JEMS-FDTD design and implementation on jasmin

    International Nuclear Information System (INIS)

    Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li

    2010-01-01

    A large-scale parallel electromagnetic field simulation program JEMS-FDTD (J Electromagnetic Solver - Finite Difference Time Domain) is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate the propagation, radiation, and coupling of electromagnetic fields by explicitly solving Maxwell's equations on a structured mesh with the FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large-scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
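
    As a minimal illustration of the FDTD method itself, the following C sketch advances a 1-D Yee grid with the explicit leapfrog update; the constants are illustrative and this is not JEMS-FDTD code.

        #include <stdio.h>

        #define NX 200
        #define NSTEPS 100

        int main(void) {
            static double ez[NX], hy[NX];
            for (int t = 0; t < NSTEPS; t++) {
                for (int i = 0; i < NX - 1; i++)      /* H-field update */
                    hy[i] += 0.5 * (ez[i + 1] - ez[i]);
                for (int i = 1; i < NX; i++)          /* E-field update */
                    ez[i] += 0.5 * (hy[i] - hy[i - 1]);
                /* soft source in the middle of the grid */
                ez[NX / 2] += 1.0 / (1.0 + (t - 30.0) * (t - 30.0));
            }
            printf("ez[mid] = %g\n", ez[NX / 2]);
            return 0;
        }

    In a parallel code such as the one described above, the grid would be decomposed into subdomains and the boundary field values exchanged between neighboring processes at each time step.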

  4. Exploiting variability for energy optimization of parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Lavrijsen, Wim [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Iancu, Costin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); de Jong, Wibe [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chen, Xin [Georgia Inst. of Technology, Atlanta, GA (United States); Schwan, Karsten [Georgia Inst. of Technology, Atlanta, GA (United States)

    2016-04-18

    In this paper we present optimizations that use DVFS mechanisms to reduce the total energy usage in scientific applications. Our main insight is that noise is intrinsic to large-scale parallel executions and it appears whenever shared resources are contended. The presence of noise allows us to identify and manipulate any program regions amenable to DVFS. When compared to previous energy optimizations that make per-core decisions using predictions of the running time, our scheme uses a qualitative approach to recognize the signature of executions amenable to DVFS. By recognizing the "shape of variability" we can optimize codes with highly dynamic behavior, which pose challenges to all existing DVFS techniques. We validate our approach using offline and online analyses for one-sided and two-sided communication paradigms. We have applied our methods to NWChem, and we show best-case improvements in energy use of 12% at no loss in performance when using online optimizations running on 720 Haswell cores with one-sided communication. With NWChem on MPI two-sided and offline analysis, capturing the initialization, we find energy savings of up to 20%, with less than 1% performance cost.

  5. MulticoreBSP for C : A high-performance library for shared-memory parallel programming

    NARCIS (Netherlands)

    Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.

    2014-01-01

    The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the

  6. Poker, Sports Betting, and Less Popular Alternatives: Status, Friendship Networks, and Male Adolescent Gambling

    Science.gov (United States)

    DiCicco-Bloom, Benjamin; Romer, Daniel

    2012-01-01

    The authors argue that the recent increase in poker play among adolescent males in the United States was primarily attributable to high-status male youth who are more able to organize "informal" gambling games (e.g., poker and sports betting) than are low-status male youth who are left to gamble on "formal" games (e.g., lotteries and slot…

  7. Image Encryption Performance Evaluation Based on Poker Test

    Directory of Open Access Journals (Sweden)

    Shanshan Li

    2016-01-01

    Full Text Available The fast development of image encryption requires performance evaluation metrics. Traditional metrics like entropy do not consider the correlation between a local pixel and its neighborhood, and they cannot evaluate encryption schemes based on permutation of image pixel coordinates. A novel effectiveness evaluation metric is proposed in this paper to address the issue. The cipher-text image is transformed into a bit stream and the Poker Test is then applied. The proposed metric accounts for the neighborhood correlations of the image through neighborhood selection and clip scan. The randomness of the cipher-text image is tested by calculating the chi-square test value. Experimental results verify the efficiency of the proposed metric.
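
    The Poker Test statistic itself can be sketched in C following the classic FIPS 140 formulation with 4-bit segments (the neighborhood selection and clip scan of the proposed metric are not reproduced here): count the frequency f[i] of each of the 16 patterns among k segments and compute X = (16/k) * sum(f[i]^2) - k.

        #include <stdio.h>
        #include <stdlib.h>

        /* Poker Test over a stream of nbits single-bit values */
        static double poker_test(const unsigned char *bits, int nbits) {
            int k = nbits / 4;                /* number of 4-bit segments */
            long f[16] = {0};
            for (int s = 0; s < k; s++) {
                int v = 0;
                for (int b = 0; b < 4; b++)
                    v = (v << 1) | (bits[4 * s + b] & 1);
                f[v]++;                       /* pattern frequency */
            }
            double sum = 0.0;
            for (int i = 0; i < 16; i++)
                sum += (double)f[i] * f[i];
            return (16.0 / k) * sum - k;      /* chi-square-like statistic */
        }

        int main(void) {
            static unsigned char bits[20000];
            srand(1);
            for (int i = 0; i < 20000; i++)
                bits[i] = rand() & 1;
            /* FIPS 140-2 accepts 2.16 < X < 46.17 for 20,000 bits */
            printf("X = %f\n", poker_test(bits, 20000));
            return 0;
        }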

  8. 76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

    Science.gov (United States)

    2011-10-26

    ...] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical Products; Correction AGENCY: Food and Drug Administration, Centers for Medicare and Medicaid Services, HHS... technologies to participate in a program of parallel FDA-CMS review. The document was published with an...

  9. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    Science.gov (United States)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors increases.
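
    The pattern MPI_XSTAR implements, many independent runs distributed over MPI ranks, can be sketched in C with the XSTAR invocation replaced by a placeholder (MPI_XSTAR itself is C++; this is only an outline of the idea).

        #include <mpi.h>
        #include <stdio.h>

        #define NRUNS 32

        static void run_model(int job) {   /* stand-in for one XSTAR run */
            printf("job %d done\n", job);
        }

        int main(int argc, char **argv) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            for (int job = rank; job < NRUNS; job += size)
                run_model(job);            /* cyclic work distribution */
            MPI_Barrier(MPI_COMM_WORLD);   /* wait for all runs */
            if (rank == 0) printf("all %d runs complete\n", NRUNS);
            MPI_Finalize();
            return 0;
        }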

  10. The P4 Parallel Programming System, the Linda Environment, and Some Experiences with Parallel Computation

    Directory of Open Access Journals (Sweden)

    Allan R. Larrabee

    1993-01-01

    Full Text Available The first digital computers consisted of a single processor acting on a single stream of data. In this so-called "von Neumann" architecture, computation speed is limited mainly by the time required to transfer data between the processor and memory. This limiting factor has been referred to as the "von Neumann bottleneck". The concern that the miniaturization of silicon-based integrated circuits will soon reach theoretical limits of size and gate times has led to increased interest in parallel architectures and also spurred research into alternatives to silicon-based implementations of processors. Meanwhile, sequential processors continue to be produced that have increased clock rates, an increase in memory locally available to a processor, and an increase in the rate at which data can be transferred to and from memories, networks, and remote storage. The efficiency of compilers and operating systems is also improving over time. Although such characteristics limit maximum performance, a large improvement in the speed of scientific computations can often be achieved by utilizing more efficient algorithms, particularly those that support parallel computation. This work discusses experiences with two tools for large grain (or "macro task") parallelism.

  11. An approach to multicore parallelism using functional programming: A case study based on Presburger Arithmetic

    DEFF Research Database (Denmark)

    Dung, Phan Anh; Hansen, Michael Reichhardt

    2015-01-01

    In this paper we investigate multicore parallelism in the context of functional programming by means of two quantifier-elimination procedures for Presburger Arithmetic: one is based on Cooper's algorithm and the other is based on the Omega Test. We first develop correct-by-construction prototype implementations in a functional programming language. Thereafter, the parallelism inherent in the decision procedures is analyzed using the Directed Acyclic Graph (DAG) model of multicore parallelism. In the step from a DAG model to a parallel implementation, the parallel implementation is optimized taking into account negative factors such as cache misses, garbage collection and overhead due to task creations, because such factors may introduce sequential bottlenecks with severe consequences for the parallel efficiency. The experiments were conducted using the functional programming language F# and .NET.

  12. Retargeting of existing FORTRAN program and development of parallel compilers

    Science.gov (United States)

    Agrawal, Dharma P.

    1988-01-01

    The software models used in implementing the parallelizing compiler for the B-HIVE multiprocessor system are described. The various models and strategies used in the compiler development are: a flexible granularity model, which allows a compromise between two extreme granularity models; a communication model, which is capable of precisely describing interprocessor communication timings and patterns; a loop type detection strategy, which identifies different types of loops; a critical path with coloring scheme, which is a versatile scheduling strategy for any multicomputer with associated communication costs; and a loop allocation strategy, which realizes optimum overlap between the computation and communication of the system. Using these models, several sample routines of the AIR3D package are examined and tested. The automatically generated codes are highly parallelized to provide the maximum degree of parallelism, with speedup scaling up to a 28- to 32-processor system. A comparison of parallel codes for both the existing and the proposed communication models is performed, and the corresponding expected speedup factors are obtained. The experimentation shows that the B-HIVE compiler produces more efficient codes than existing techniques. Work is progressing well in completing the final phase of the compiler. Numerous enhancements are needed to improve the capabilities of the parallelizing compiler.

  13. Online and live regular poker players: Do they differ in impulsive sensation seeking and gambling practice?

    Science.gov (United States)

    Barrault, Servane; Varescon, Isabelle

    2016-01-01

    Background and aims Online gambling appears to have special features, such as anonymity, speed of play and permanent availability, which may contribute to the facilitation and increase in gambling practice, potentially leading to problem gambling. The aims of this study were to assess sociodemographic characteristics, gambling practice and impulsive sensation seeking among a population of regular poker players with different levels of gambling intensity and to compare online and live players. Methods 245 regular poker players (180 online players and 65 live players) completed online self-report scales assessing sociodemographic data, pathological gambling (SOGS), gambling practice (poker questionnaire) and impulsive sensation seeking (ImpSS). We used SOGS scores to rank players according to the intensity of their gambling practice (non-pathological gamblers, problem gamblers and pathological gamblers). Results All poker players displayed a particular sociodemographic profile: they were more likely to be young men, executives or students, mostly single and working full-time. Online players played significantly more often whereas live players reported significantly longer gambling sessions. Sensation seeking was high across all groups, whereas impulsivity significantly distinguished players according to the intensity of gambling. Discussion Our results show the specific profile of poker players. Both impulsivity and sensation seeking seem to be involved in pathological gambling, but playing different roles. Sensation seeking may determine interest in poker whereas impulsivity may be involved in pathological gambling development and maintenance. Conclusions This study opens up new research perspectives and insights into preventive and treatment actions for pathological poker players. PMID:28092187

  14. Cognitive distortions, anxiety, and depression among regular and pathological gambling online poker players.

    Science.gov (United States)

    Barrault, Servane; Varescon, Isabelle

    2013-03-01

    The aims were to assess cognitive distortions and psychological distress (anxiety and depression) among online poker players of different levels of gambling intensity (non-pathological gamblers [NPG], problem gamblers [PbG], and pathological gamblers [PG]), and to examine the relationship between these variables and gambling pathology. Overall, 245 regular online poker players recruited on an Internet forum completed online self-report scales assessing pathological gambling (South Oaks Gambling Screen [SOGS]), psychological distress (Hospital Anxiety and Depression Scale [HADS]) and cognitive distortions (Gambling-Related Cognition Scale). Based on their SOGS scores, poker players were ranked into three groups: NPG (n=146), PbG (n=55), and PG (n=44). All poker players appeared to be more anxious than depressive. PG exhibited higher levels of depression and anxiety than did PbG and NPG. Cognitive distortions also significantly discriminated PG from PbG and NPG. A regression model showed that the perceived inability to stop gambling, the illusion of control, depression (HADS D), and anxiety were good predictors for pathological gambling among poker players. Our results suggest that cognitive distortions play an important role in the development and maintenance of gambling pathology. This study also underlines the role of anxiety and depression in pathological gambling among poker players. It seems relevant to take these elements into account in the research, prevention, and treatment of pathological gambling poker players.

  15. Cognitive and Performance Enhancing Medication Use to Improve Performance in Poker.

    Science.gov (United States)

    Caballero, Joshua; Ownby, Raymond L; Rey, Jose A; Clauson, Kevin A

    2016-09-01

    Use of neuroenhancers has been studied in groups ranging from students to surgeons; however, use of cognitive and performance enhancing medications (CPEMs) to improve performance in poker has remained largely overlooked. To assess the use of CPEMs to improve poker performance, a survey of poker players was conducted. Participants were recruited via Internet poker forums; 198 completed the online survey. Approximately 28 % of respondents used prescription CPEMs, with the most commonly used including: amphetamine/dextroamphetamine (62 %), benzodiazepines (20 %), and methylphenidate (20 %). CPEMs were used in poker to focus (73 %), calm nerves (11 %), and stay awake (11 %). Caffeine (71 %), as well as conventionally counter-intuitive substances like marijuana (35 %) and alcohol (30 %) were also reported to enhance poker performance. Non-users of CPEMs were dissuaded from use due to not knowing where to get them (29 %), apprehension about trying them (26 %), and legal or ethical concerns (16 %). Respondents most frequently acquired CPEMs via friends/fellow poker players (52 %), or prescription from physician (38 %). Additionally, greater use of CPEMs was associated with living outside the United States (p = 0.042), prior use of prescription medications for improving non-poker related performance (p < 0.001), and amateur and semi-professional player status (p = 0.035). Unmonitored use of pharmacologically active agents and their methods of acquisition highlight safety concerns in this cohort of poker players, especially among non-professional players. The current state of guidance from national organizations on CPEM use in healthy individuals could impact prescribing patterns.

  16. Effects of alcohol, initial gambling outcomes, impulsivity, and gambling cognitions on gambling behavior using a video poker task.

    Science.gov (United States)

    Corbin, William R; Cronce, Jessica M

    2017-06-01

    Drinking and gambling frequently co-occur, and concurrent gambling and drinking may lead to greater negative consequences than either behavior alone. Building on prior research on the effects of alcohol, initial gambling outcomes, impulsivity, and gambling cognitions on gambling behaviors using a chance-based (nonstrategic) slot-machine task, the current study explored the impact of these factors on a skill-based (strategic) video poker task. We anticipated larger average bets and greater gambling persistence under alcohol relative to placebo, and expected alcohol effects to be moderated by initial gambling outcomes, impulsivity, and gambling cognitions. Participants (N = 162; 25.9% female) were randomly assigned to alcohol (target BrAC = .08g%) or placebo and were given $10 to wager on a simulated video poker task, which was programmed to produce 1 of 3 initial outcomes (win, breakeven, or lose) before beginning a progressive loss schedule. Despite evidence for validity of the video poker task and alcohol administration paradigm, primary hypotheses were not supported. Individuals who received alcohol placed smaller wagers than participants in the placebo condition, though this effect was not statistically significant, and the direction of effects was reversed in at-risk gamblers (n = 41). These findings contradict prior research and suggest that alcohol effects on gambling behavior may differ by gambling type (nonstrategic vs. strategic games). Interventions that suggest alcohol is universally disinhibiting may be at odds with young adults' lived experience and thus be less effective than those that recognize the greater complexity of alcohol effects. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  17. Cluster communication protocols for parallel-programming systems

    NARCIS (Netherlands)

    Verstoep, K.; Bhoedjang, R.A.F.; Rühl, T.; Bal, H.E.; Hofman, R.

    2004-01-01

    Clusters of workstations are a popular platform for high-performance computing. For many parallel applications, efficient use of a fast interconnection network is essential for good performance. Several modern System Area Networks include programmable network interfaces that can be tailored to

  18. Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

    Science.gov (United States)

    2008-01-01

  19. HPC parallel programming model for gyrokinetic MHD simulation

    International Nuclear Information System (INIS)

    Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

    2011-01-01

    The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over a large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N_DD-r × N_DD-z (number of radial decompositions times number of axial decompositions) small domains including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N_RP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by auto-parallelization and N_DD-r × N_DD-z × N_RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for the medium-size system of N_r × N_θ × N_z = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for a fixed number of logical cores was obtained for two threads, the maximum number of N_DD-z, and the optimum combination of N_DD-r and N_RP. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)

  20. Machine and Collection Abstractions for User-Implemented Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Magne Haveraaen

    2000-01-01

    Full Text Available Data parallelism has appeared as a fruitful approach to the parallelisation of compute-intensive programs. Data parallelism has the advantage of mimicking the sequential (and deterministic) structure of programs, as opposed to task parallelism, where the explicit interaction of processes has to be programmed. In data parallelism, data structures, typically collection classes in the form of large arrays, are distributed on the processors of the target parallel machine. Trying to extract distribution aspects from conventional code often runs into problems with a lack of uniformity in the use of the data structures and in the expression of data dependency patterns within the code. Here we propose a framework with two conceptual classes, Machine and Collection. The Machine class abstracts hardware communication and distribution properties. This gives a programmer high-level access to the important parts of the low-level architecture. The Machine class may readily be used in the implementation of a Collection class, giving the programmer full control of the parallel distribution of data, as well as allowing normal sequential implementation of this class. Any program using such a collection class will be parallelisable, without requiring any modification, by choosing between sequential and parallel versions at link time. Experiments with a commercial application, built using the Sophus library which uses this approach to parallelisation, show good parallel speed-ups, without any adaptation of the application program being needed.

  1. User's guide of parallel program development environment (PPDE). The 2nd edition

    International Nuclear Information System (INIS)

    Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio; Ohta, Hirofumi

    2000-03-01

    The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on parallel processing technology. The enhancement has been made by extending the functions of PPDE, the Parallel Program Development Environment in the STA basic system. The extended PPDE provides: 1) automatic creation of a makefile and a shell script for its execution; 2) multi-tool execution, which lets the tools on heterogeneous computers execute a task with a single operation; and 3) mirror composition, which reflects the editing results of a file on one computer into all related files on other computers. These additional functions will enhance work efficiency for program development across computers. Further functions have been added to the PPDE to support parallel program development, and new functions were designed to complement an HPF translator and a parallelization support tool working together, so that a sequential program can be efficiently converted into a parallel program. This report describes the use of the extended PPDE. (author)

  2. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.

    2017-03-13

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, RBM is a deep learning algorithm used in this project, whose runtime performance is improved through an efficient parallel implementation with the OpenACC tool, applying the best possible optimizations to RBM to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of deep learning. OpenACC is a directive-based approach to computing, where directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of RBM are available using different models such as OpenMP and CUDA, but this project is the first attempt to apply the OpenACC model to RBM.
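
    A minimal C sketch of the directive-based OpenACC style described above, applied to a stand-in elementwise kernel rather than to RBM training itself: the directive asks the compiler to offload the loop to an accelerator and manage the data movement.

        #include <stdio.h>

        #define N 1000000

        int main(void) {
            static float v[N], w[N];
            for (int i = 0; i < N; i++)
                v[i] = i * 0.001f;

            /* compiler hint: offload this loop to the accelerator */
            #pragma acc parallel loop copyin(v) copyout(w)
            for (int i = 0; i < N; i++)
                w[i] = 1.0f / (1.0f + v[i] * v[i]);

            printf("w[42] = %f\n", w[42]);
            return 0;
        }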

  3. Cell verification of parallel burnup calculation program MCBMPI based on MPI

    International Nuclear Information System (INIS)

    Yang Wankui; Liu Yaoguang; Ma Jimin; Wang Guanbo; Yang Xin; She Ding

    2014-01-01

    The parallel burnup calculation program MCBMPI was developed. The program is modularized. The parallel MCNP5 program MCNP5MPI is employed as the neutron transport calculation module, and a composite of three solution methods is used to solve the burnup equations: the matrix exponential technique, the TTA analytical solution, and Gauss-Seidel iteration. An MPI parallel zone decomposition strategy is included in the program. The program system consists only of MCNP5MPI and a burnup subroutine; the latter provides three main functions, i.e. zone decomposition, nuclide transfer and decay, and data exchange with MCNP5MPI. The program was verified with the pressurized water reactor (PWR) cell burnup benchmark. The results show that the program can be applied to burnup calculations of multiple zones, and that computation efficiency could be significantly improved with the development of computer hardware. (authors)
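
    Of the three burnup-equation solvers combined in the program, Gauss-Seidel iteration is the simplest to show in isolation. The sketch below is a generic Gauss-Seidel solver for a diagonally dominant linear system, included only to illustrate the method; it is not code from MCBMPI.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Generic Gauss-Seidel iteration for A x = b (A diagonally dominant),
// shown as an illustration of one of the solution methods named above.
bool gauss_seidel(const std::vector<std::vector<double>>& A,
                  const std::vector<double>& b,
                  std::vector<double>& x,
                  int max_iters = 1000, double tol = 1e-10) {
    const std::size_t n = b.size();
    for (int it = 0; it < max_iters; ++it) {
        double max_delta = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            double s = b[i];
            for (std::size_t j = 0; j < n; ++j)
                if (j != i) s -= A[i][j] * x[j];  // x[j] already updated for j < i
            double xi = s / A[i][i];
            max_delta = std::max(max_delta, std::fabs(xi - x[i]));
            x[i] = xi;
        }
        if (max_delta < tol) return true;         // converged
    }
    return false;                                 // hit the iteration cap
}
```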

  4. 76 FR 62808 - Pilot Program for Parallel Review of Medical Products

    Science.gov (United States)

    2011-10-11

    ... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...

  5. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    Science.gov (United States)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called F-Nets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  6. The technical feasibility of uranium enrichment for nuclear bomb construction at the parallel nuclear program plant

    International Nuclear Information System (INIS)

    Rosa, L.P.

    1990-01-01

    The role of the Parallel Nuclear Program in Brazil and the feasibility of uranium enrichment for nuclear bomb construction are discussed. This program involves two research centers, one belonging to the Brazilian navy and another to the aeronautics. Other Brazilian institutes, such as CTA, IPEN, COPESP and CETEX, are also taking part in the program. (A.C.A.S.)

  7. Vdebug: debugging tool for parallel scientific programs. Design report on vdebug

    International Nuclear Information System (INIS)

    Matsuda, Katsuyuki; Takemiya, Hiroshi

    2000-02-01

    We report on a debugging tool called vdebug which supports debugging work for parallel scientific simulation programs. It is difficult to debug scientific programs with an existing debugger, because the volume of data generated by the programs is too large for users to check in character form; existing debuggers usually show data values as text. To alleviate this, we have developed vdebug, which enables users to check the validity of large amounts of data by displaying these data values visually. Although the targets of vdebug had been restricted to sequential programs, we have made it applicable to parallel programs by adding a function that merges and visualizes data distributed across the programs on each computer node. Vdebug now works on seven kinds of parallel computers. In this report, we describe the design of vdebug. (author)

  8. The construction of weakly parallel tests by mathematical programming

    NARCIS (Netherlands)

    Adema, J.J.; Adema, Jos J.

    1990-01-01

    Data banks with items calibrated under an item response model can be used for the construction of tests. Mathematical programming models like the Maximin Model are formulated for computerized item selection from a bank. In this paper, mathematical programming models based on the Maximin Model are proposed for the construction of weakly parallel tests.

  9. User's guide of parallel program development environment (PPDE). The 2nd edition

    Energy Technology Data Exchange (ETDEWEB)

    Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)

    2000-03-01

    The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made by extending the functions of the PPDE, the Parallel Program Development Environment in the STA basic system. The extended PPDE provides: 1) automatic creation of a 'makefile' and a shell script file for its execution, 2) multi-tool execution, which lets tools on heterogeneous computers carry out a task with a single operation, and 3) mirror composition, which reflects the editing results of a file on one computer into all related files on other computers. These additional functions enhance the efficiency of program development work spanning several computers. More functions have been added to the PPDE to support parallel program development. New functions were also designed to complement an HPF translator and a parallelizing support tool when working together, so that a sequential program is efficiently converted to a parallel program. This report describes the use of the extended PPDE. (author)

  10. Source-Level Debugging of Automatically Parallelized Programs

    Science.gov (United States)

    1992-10-23

    [Only OCR fragments of this scanned report's abstract are legible. They note that debugging in such an environment can be difficult, that users typically design their programs so that they can be run (debugging a program usually requires this), and that, in general, the relationship between global and local addresses on a processor p is given by an equation in the report; stray citation fragments (ACM, San Francisco, June 1992; Bruegge 1991) also survive.]

  11. Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

    Science.gov (United States)

    Weeks, Cindy Lou

    1986-01-01

    Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.

  12. Comparative Study of Message Passing and Shared Memory Parallel Programming Models in Neural Network Training

    Energy Technology Data Exchange (ETDEWEB)

    Vitela, J.; Gordillo, J.; Cortina, L; Hanebutte, U.

    1999-12-14

    A comparative performance study is presented of a coarse-grained parallel neural network training code, implemented in both OpenMP and MPI, standards for shared memory and message passing parallel programming environments, respectively. In addition, these versions of the parallel training code are compared to an implementation utilizing SHMEM, the native SGI/Cray environment for shared memory programming. The multiprocessor platform used is an SGI/Cray Origin 2000 with up to 32 processors. It is shown that in this study the native Cray environment outperforms MPI for the entire range of processors used, while OpenMP shows better performance than the other two environments when using more than 19 processors. In this study, the efficiency is always greater than 60% regardless of the parallel programming environment used as well as of the number of processors.
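
    The contrast between the two models in the study comes down to how a reduction, such as accumulating batch error during training, is expressed. Below is a hedged sketch of the same sum in both styles; it mirrors the structure of such codes, not the study's actual implementation.

```cpp
#include <vector>
#ifdef USE_MPI
#include <mpi.h>
#endif

// Shared memory (OpenMP): threads split the loop and combine partial sums.
double batch_error_openmp(const std::vector<double>& err) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)err.size(); ++i)
        sum += err[i] * err[i];
    return sum;
}

#ifdef USE_MPI
// Message passing (MPI): each rank reduces its local slice, then all ranks
// combine partial sums with a collective operation.
double batch_error_mpi(const std::vector<double>& local_err) {
    double local = 0.0, global = 0.0;
    for (double e : local_err) local += e * e;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    return global;
}
#endif
```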

  13. On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations

    Directory of Open Access Journals (Sweden)

    Xing Cai

    2005-01-01

    Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance becomes achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.

  14. Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

    OpenAIRE

    Givelberg, Edward

    2014-01-01

    We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...

  15. Guide to development of a scalar massive parallel programming on Paragon

    Energy Technology Data Exchange (ETDEWEB)

    Ueshima, Yutaka; Arakawa, Takuya; Sasaki, Akira [Japan Atomic Energy Research Inst., Neyagawa, Osaka (Japan). Kansai Research Establishment; Yokota, Hisasi

    1998-10-01

    Parallel calculations using more than a hundred computers began in Japan only several years ago. The Intel Paragon XP/S 15GP256 and 75MP834 were introduced as pioneers at the Japan Atomic Energy Research Institute (JAERI) to pursue massively parallel simulations for advanced photon and fusion research. Recently, a large number of parallel programs have been ported or newly produced to perform parallel calculations on those computers. However, these programs were developed based on software technologies for conventional supercomputers, and therefore they sometimes cause trouble in massively parallel computing. In principle, when programs are developed under a different computer and operating system (OS), prudent directions and knowledge are needed. However, integration of knowledge and standardization of the environment are quite difficult because the number of Paragon systems and of Paragon users in Japan is very small. We have therefore summarized information gained through the process of developing a massively parallel program on the Paragon XP/S 75MP834. (author)

  16. Programming a massively parallel, computation universal system: static behavior

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.; Farber, R.

    1986-01-01

    In previous work by the authors, the ''optimum finding'' properties of Hopfield neural nets were applied to the nets themselves to create a ''neural compiler.'' This was done in such a way that the problem of programming the attractors of one neural net (called the Slave net) was expressed as an optimization problem that was in turn solved by a second neural net (the Master net). In this series of papers that approach is extended to programming nets that contain interneurons (sometimes called ''hidden neurons''), and thus deals with nets capable of universal computation. 22 refs.

  17. Concurrent Programming Using Actors: Exploiting Large-Scale Parallelism,

    Science.gov (United States)

    1985-10-07

    [Only OCR fragments of this scanned report are legible: report form fields and headers identifying the source as the MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge (accession number rendered as 'D-R162 422'; G. Agha et al.).]

  18. Method for resource control in parallel environments using program organization and run-time support

    Science.gov (United States)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    2001-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  19. A Tool for Performance Modeling of Parallel Programs

    Directory of Open Access Journals (Sweden)

    J.A. González

    2003-01-01

    Full Text Available Current performance prediction analytical models try to characterize the performance behavior of actual machines through a small set of parameters. In practice, substantial deviations are observed. These differences are due to factors such as memory hierarchies or network latency. A natural approach is to associate a different proportionality constant with each basic block, and analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, using this approach implies that the evaluation of parameters must be done for each algorithm. This is a heavy task, implying experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms. Software support is required. We present a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from the execution of the resulting code are analyzed with an interactive interpreter, giving us, among other information, the values of those parameters.

  20. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison.

    Science.gov (United States)

    Martins, W S; Del Cuvillo, J B; Useche, F J; Theobald, K B; Gao, G R

    2001-01-01

    This paper discusses the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform based on a fine-grain event-driven multithreaded program execution model. Fine-grain multithreading permits efficient parallelism exploitation in this application both by taking advantage of asynchronous point-to-point synchronizations and communication with low overheads and by effectively tolerating latency through the overlapping of computation and communication. We have implemented our scheme on EARTH, a fine-grain event-driven multithreaded execution and architecture model which has been ported to a number of parallel machines with off-the-shelf processors. Our experimental results show that the dynamic programming algorithm can be efficiently implemented on EARTH systems with high performance (e.g., speedup of 90 on 120 nodes), good programmability and reasonable cost.
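
    The parallelism exploited by such implementations comes from the DP recurrence itself: cell (i, j) depends only on (i-1, j), (i, j-1) and (i-1, j-1), so all cells on one anti-diagonal are independent. Below is a generic wavefront version of the edit-distance recurrence, shown with OpenMP rather than the fine-grain EARTH model the paper uses.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Generic wavefront-parallel sequence-comparison DP (edit distance with
// unit mismatch/gap costs). Cells on the same anti-diagonal i+j = d are
// independent, so each diagonal can be filled in parallel.
int align_score(const std::string& a, const std::string& b) {
    const int m = (int)a.size(), n = (int)b.size();
    std::vector<std::vector<int>> D(m + 1, std::vector<int>(n + 1));
    for (int i = 0; i <= m; ++i) D[i][0] = i;    // gap penalties on the borders
    for (int j = 0; j <= n; ++j) D[0][j] = j;
    for (int d = 2; d <= m + n; ++d) {           // sweep the anti-diagonals
        int lo = std::max(1, d - n), hi = std::min(m, d - 1);
        #pragma omp parallel for
        for (int i = lo; i <= hi; ++i) {
            int j = d - i;
            int sub = D[i-1][j-1] + (a[i-1] == b[j-1] ? 0 : 1);
            D[i][j] = std::min({sub, D[i-1][j] + 1, D[i][j-1] + 1});
        }
    }
    return D[m][n];                              // lower means more similar
}
```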

  1. Remote Memory Access: A Case for Portable, Efficient and Library Independent Parallel Programming

    Directory of Open Access Journals (Sweden)

    Alexandros V. Gerbessiotis

    2004-01-01

    Full Text Available In this work we make a strong case for remote memory access (RMA) as the effective way to program a parallel computer by proposing a framework that supports RMA in a library independent, simple and intuitive way. If one uses our approach, the parallel code one writes will run transparently not only under MPI-2 enabled libraries but also under bulk-synchronous parallel libraries. The advantage of using RMA is code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing under this framework a collection of benchmark programs consisting of a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm and examine their performance on a LINUX-based PC cluster under three different RMA enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the message-passing equivalent code and in the case of radix-sort substantially more efficient. In addition our work can be used as a comparative study of the relevant capabilities of the three libraries.
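
    In MPI-2 terms, programming with RMA means exposing a window of memory and using one-sided puts and gets in place of matched sends and receives. Below is a minimal example using standard MPI-2 calls; it illustrates the communication style, not the paper's framework, which layers portability over several such libraries.

```cpp
#include <mpi.h>
#include <cstdio>

// Minimal MPI-2 one-sided (RMA) example: every rank writes its rank id
// into a window on rank 0, with no matching receive posted anywhere.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int* buf = nullptr;
    MPI_Win win;
    if (rank == 0) MPI_Alloc_mem(size * sizeof(int), MPI_INFO_NULL, &buf);
    MPI_Win_create(rank == 0 ? buf : nullptr,
                   rank == 0 ? size * (MPI_Aint)sizeof(int) : 0,
                   sizeof(int), MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);                                  // open access epoch
    MPI_Put(&rank, 1, MPI_INT, 0, rank, 1, MPI_INT, win);   // one-sided write
    MPI_Win_fence(0, win);                                  // epoch ends; puts complete

    if (rank == 0)
        for (int i = 0; i < size; ++i) std::printf("slot %d = %d\n", i, buf[i]);

    MPI_Win_free(&win);
    if (rank == 0) MPI_Free_mem(buf);
    MPI_Finalize();
}
```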

  2. High performance parallelism pearls 2 multicore and many-core programming approaches

    CERN Document Server

    Jeffers, Jim

    2015-01-01

    High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t

  3. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    Science.gov (United States)

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
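
    X10 source is beyond the scope of this listing, but the dynamic load balancing the abstract evaluates can be approximated in C++: uneven work items are claimed from a shared atomic counter, so fast workers automatically take on more batches. This is a loose analogue of the idea, not the paper's X10 work-stealing scheme.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Loose analogue of dynamic load balancing over uneven integral batches:
// workers repeatedly claim the next unprocessed batch from a shared
// atomic counter instead of receiving a static partition.
int main() {
    const int n_batches = 1000;
    std::atomic<int> next{0};
    std::vector<double> result(n_batches, 0.0);

    auto worker = [&]() {
        for (;;) {
            int b = next.fetch_add(1);           // claim one batch
            if (b >= n_batches) break;
            double s = 0.0;                      // stand-in for an integral batch
            for (int k = 0; k <= b; ++k) s += 1.0 / (1.0 + k);  // uneven cost
            result[b] = s;
        }
    };

    unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    std::printf("done, last batch = %f\n", result[n_batches - 1]);
}
```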

  4. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs

  5. Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

    Science.gov (United States)

    Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

    2013-10-24

    Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) still followed the plans made on the previous scene, reflecting parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.

  6. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker.

    Science.gov (United States)

    Moravčík, Matej; Schmid, Martin; Burch, Neil; Lisý, Viliam; Morrill, Dustin; Bard, Nolan; Davis, Trevor; Waugh, Kevin; Johanson, Michael; Bowling, Michael

    2017-05-05

    Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker, the quintessential game of imperfect information, is a long-standing challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect-information settings. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition that is automatically learned from self-play using deep learning. In a study involving 44,000 hands of poker, DeepStack defeated, with statistical significance, professional poker players in heads-up no-limit Texas hold'em. The approach is theoretically sound and is shown to produce strategies that are more difficult to exploit than prior approaches. Copyright © 2017, American Association for the Advancement of Science.
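
    DeepStack itself combines deep networks with continual re-solving, which is far too large to reproduce here, but the basic strategy update inside CFR-style solvers for imperfect-information games, regret matching, is compact. Below is a generic regret-matching update; it is a component used by such solvers, not DeepStack's algorithm.

```cpp
#include <vector>

// Generic regret matching: convert cumulative regrets into a mixed
// strategy proportional to positive regret (uniform if none is positive).
// This is the core update in CFR-style solvers for imperfect-information
// games, one building block behind systems like DeepStack.
std::vector<double> regret_matching(const std::vector<double>& cum_regret) {
    std::vector<double> strategy(cum_regret.size());
    double pos_sum = 0.0;
    for (double r : cum_regret) pos_sum += (r > 0.0 ? r : 0.0);
    for (std::size_t a = 0; a < cum_regret.size(); ++a)
        strategy[a] = pos_sum > 0.0
                    ? (cum_regret[a] > 0.0 ? cum_regret[a] : 0.0) / pos_sum
                    : 1.0 / cum_regret.size();   // uniform fallback
    return strategy;
}
```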

  7. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    Science.gov (United States)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  8. Generalized Analytical Program of Thyristor Phase Control Circuit with Series and Parallel Resonance Load

    OpenAIRE

    Nakanishi, Sen-ichiro; Ishida, Hideaki; Himei, Toyoji

    1981-01-01

    A systematic analytical method is required for the AC phase control circuit by means of an inverse-parallel thyristor pair which has a series and parallel L-C resonant load, because the phase control action causes abnormal and interesting phenomena, such as an extreme increase of voltage and current, a unique increase and decrease of the contained higher harmonics, and a wide variation of power factor, etc. In this paper, the program for the analysis of the thyristor phase control circuit with...

  9. Method, systems, and computer program products for implementing function-parallel network firewall

    Science.gov (United States)

    Fulp, Errin W [Winston-Salem, NC; Farley, Ryan J [Winston-Salem, NC

    2011-10-11

    Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.
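
    The partitioning the claim describes is straightforward to sketch: the rule set is split so that each node holds less than the full policy. The types and the round-robin split below are invented for illustration and are not the patented implementation.

```cpp
#include <cstddef>
#include <vector>

// Hedged sketch of function-parallel rule partitioning: the rule set is
// split so that each firewall node holds only part of the policy. Rule
// fields and the round-robin split are illustrative assumptions.
struct Rule { int id; /* match fields, action, ... */ };

std::vector<std::vector<Rule>> partition_rules(const std::vector<Rule>& rule_set,
                                               std::size_t n_nodes) {
    std::vector<std::vector<Rule>> portions(n_nodes);
    for (std::size_t i = 0; i < rule_set.size(); ++i)
        portions[i % n_nodes].push_back(rule_set[i]);  // each node gets < full set
    return portions;
}
```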

  10. Programming Environment for a High-Performance Parallel Supercomputer with Intelligent Communication

    Directory of Open Access Journals (Sweden)

    A. Gunzinger

    1996-01-01

    Full Text Available At the Electronics Laboratory of the Swiss Federal Institute of Technology (ETH) in Zürich, the high-performance parallel supercomputer MUSIC (MUlti processor System with Intelligent Communication) has been developed. As applications like neural network simulation and molecular dynamics show, the performance of the Electronics Laboratory supercomputer is absolutely on par with that of conventional supercomputers, but electric power requirements are reduced by a factor of 1,000, weight is reduced by a factor of 400, and price is reduced by a factor of 100. Software development is a key issue of such parallel systems. This article focuses on the programming environment of the MUSIC system and on its applications.

  11. A Pilot Study to Compare Programming Effort for Two Parallel Programming Models (PREPRINT)

    National Research Council Canada - National Science Library

    Hochstein, Lorin; Basili, Victor R; Vishkin, Uzi; Gilbert, John

    2007-01-01

    CONTEXT: Writing software for the current generation of parallel systems requires significant programmer effort, and the community is seeking alternatives that reduce effort while still achieving good performance. OBJECTIVE...

  12. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2015-01-01

    Full Text Available The aim of this study is to present an approach to the introduction into pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. The modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is the essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to the comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform of constructivist learning process, thus enabling learners’ experimentation with the provided programming models, obtaining learners’ competences of the modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models and message passing interface (MPI and OpenMP parallelization tools have been chosen for implementation.
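
    The pipeline concept at the center of the course can be demonstrated even without MPI: two stages connected by a queue overlap in time, exactly like stages of a hardware pipeline. Below is a minimal version of such a teaching example, written in C++ with threads rather than the course's C/MPI/OpenMP stack.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// Minimal two-stage software pipeline: stage 1 produces items while
// stage 2 consumes them, so the stages overlap in time.
int main() {
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread stage1([&] {                       // producer stage
        for (int i = 0; i < 10; ++i) {
            { std::lock_guard<std::mutex> lk(m); q.push(i * i); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
    });

    std::thread stage2([&] {                       // consumer stage
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !q.empty() || done; });
            if (q.empty() && done) break;          // pipeline drained
            int v = q.front(); q.pop();
            lk.unlock();
            std::printf("stage 2 got %d\n", v);
        }
    });

    stage1.join();
    stage2.join();
}
```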

  13. Academic training: From Evolution Theory to Parallel and Distributed Genetic Programming

    CERN Multimedia

    2007-01-01

    2006-2007 ACADEMIC TRAINING PROGRAMME LECTURE SERIES 15, 16 March From 11:00 to 12:00 - Main Auditorium, bldg. 500 From Evolution Theory to Parallel and Distributed Genetic Programming F. FERNANDEZ DE VEGA / Univ. of Extremadura, SP Lecture No. 1: From Evolution Theory to Evolutionary Computation Evolutionary computation is a subfield of artificial intelligence (more particularly computational intelligence) involving combinatorial optimization problems, which are based to some degree on the evolution of biological life in the natural world. In this tutorial we will review the source of inspiration for this metaheuristic and its capability for solving problems. We will show the main flavours within the field, and different problems that have been successfully solved employing this kind of techniques. Lecture No. 2: Parallel and Distributed Genetic Programming The successful application of Genetic Programming (GP, one of the available Evolutionary Algorithms) to optimization problems has encouraged an ...

  14. Algorithmic differentiation of pragma-defined parallel regions differentiating computer programs containing OpenMP

    CERN Document Server

    Förster, Michael

    2014-01-01

    Numerical programs often use parallel programming techniques such as OpenMP to compute the program's output values as efficiently as possible. In addition, derivative values of these output values with respect to certain input values play a crucial role. To achieve code that computes not only the output values simultaneously but also the derivative values, this work introduces several source-to-source transformation rules. These rules are based on a technique called algorithmic differentiation. The main focus of this work lies on the important reverse mode of algorithmic differentiation. The inh
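
    The book's subject, reverse-mode differentiation of OpenMP regions via source-to-source transformation, is too involved to sketch briefly, but the flavor of differentiating a parallel region shows up already in forward mode, where each value carries its derivative through the loop. Below is a generic dual-number sketch, not the book's transformation rules.

```cpp
#include <cstdio>

// Generic forward-mode algorithmic differentiation through an OpenMP
// region: each value carries its derivative alongside it. (Reverse mode,
// the book's focus, requires source transformation and a tape.)
struct Dual { double v, d; };                    // value and derivative

static inline Dual mul(Dual a, Dual b) {         // product rule
    return {a.v * b.v, a.d * b.v + a.v * b.d};
}

int main() {
    const int n = 1000000;
    double sum_v = 0.0, sum_d = 0.0;
    #pragma omp parallel for reduction(+:sum_v,sum_d)
    for (int i = 0; i < n; ++i) {
        Dual x = {i * 1e-6, 1.0};                // seed dx/dx = 1
        Dual y = mul(x, x);                      // y = x^2, dy/dx = 2x
        sum_v += y.v;
        sum_d += y.d;
    }
    std::printf("f = %g, derivative sum = %g\n", sum_v, sum_d);
}
```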

  15. Purification and Characterization of Hemagglutinating Proteins from Poker-Chip Venus (Meretrix lusoria and Corbicula Clam (Corbicula fluminea

    Directory of Open Access Journals (Sweden)

    Chin-Fu Cheng

    2012-01-01

    Full Text Available Hemagglutinating proteins (HAPs were purified from Poker-chip Venus (Meretrix lusoria and Corbicula clam (Corbicula fluminea using gel-filtration chromatography on a Sephacryl S-300 column. The molecular weights of the HAPs obtained from Poker-chip Venus and Corbicula clam were 358 kDa and 380 kDa, respectively. Purified HAP from Poker-chip Venus yielded two subunits with molecular weights of 26 kDa and 29 kDa. However, only one HAP subunit was purified from Corbicula clam, and its molecular weight was 32 kDa. The two Poker-chip Venus HAPs possessed hemagglutinating ability (HAA for erythrocytes of some vertebrate animal species, especially tilapia. Moreover, HAA of the HAP purified from Poker-chip Venus was higher than that of the HAP of Corbicula clam. Furthermore, Poker-chip Venus HAPs possessed better HAA at a pH higher than 7.0. When the temperature was at 4°C–10°C or the salinity was less than 0.5‰, the two Poker-chip Venus HAPs possessed better HAA compared with that of Corbicula clam.

  16. Purification and characterization of hemagglutinating proteins from Poker-chip Venus (Meretrix lusoria) and Corbicula clam (Corbicula fluminea).

    Science.gov (United States)

    Cheng, Chin-Fu; Hung, Shao-Wen; Chang, Yung-Chung; Chen, Ming-Hui; Chang, Chen-Hsuan; Tsou, Li-Tse; Tu, Ching-Yu; Lin, Yu-Hsing; Liu, Pan-Chen; Lin, Shiun-Long; Wang, Way-Shyan

    2012-01-01

    Hemagglutinating proteins (HAPs) were purified from Poker-chip Venus (Meretrix lusoria) and Corbicula clam (Corbicula fluminea) using gel-filtration chromatography on a Sephacryl S-300 column. The molecular weights of the HAPs obtained from Poker-chip Venus and Corbicula clam were 358 kDa and 380 kDa, respectively. Purified HAP from Poker-chip Venus yielded two subunits with molecular weights of 26 kDa and 29 kDa. However, only one HAP subunit was purified from Corbicula clam, and its molecular weight was 32 kDa. The two Poker-chip Venus HAPs possessed hemagglutinating ability (HAA) for erythrocytes of some vertebrate animal species, especially tilapia. Moreover, HAA of the HAP purified from Poker-chip Venus was higher than that of the HAP of Corbicula clam. Furthermore, Poker-chip Venus HAPs possessed better HAA at a pH higher than 7.0. When the temperature was at 4°C-10°C or the salinity was less than 0.5‰, the two Poker-chip Venus HAPs possessed better HAA compared with that of Corbicula clam.

  17. Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

    Directory of Open Access Journals (Sweden)

    Guy Edjlali

    1997-01-01

    Full Text Available For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of high-performance Fortran (HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
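
    The core of such run-time support is recomputing each processor's data block and loop bounds whenever the processor count changes. Below is a hedged fragment of that calculation, using a generic block decomposition rather than the library's actual interface.

```cpp
#include <cstdio>

// Generic block decomposition: given a global extent n and p processors,
// compute the half-open range [lo, hi) owned by a rank. In an adaptive
// environment this is re-evaluated, and data redistributed, whenever the
// number of available processors changes.
void block_bounds(long n, int p, int rank, long* lo, long* hi) {
    long base = n / p, extra = n % p;            // first `extra` ranks get one more
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}

int main() {
    long lo, hi;
    int counts[] = {4, 6};                       // processor count changed at run time
    for (int p : counts) {
        std::printf("p = %d:\n", p);
        for (int r = 0; r < p; ++r) {
            block_bounds(10, p, r, &lo, &hi);
            std::printf("  rank %d owns [%ld, %ld)\n", r, lo, hi);
        }
    }
}
```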

  18. Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

    KAUST Repository

    Klein, Tobias

    2016-04-19

    The use of GPUs and the massively parallel computing paradigm have become wide-spread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program's function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.

  19. Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck

    Science.gov (United States)

    Yan, Jerry C.; Bailey, David (Technical Monitor)

    1996-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitor/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

  20. Effect on the Grain Size Distribution when Preparing Sand Using Poker Vibrators

    DEFF Research Database (Denmark)

    Nielsen, Søren Dam

    2017-01-01

    At Aalborg University and other research institutions, model tests are performed on small-scale foundations. These foundations are often installed in sand which has to be prepared in a reproducible way. At Aalborg University the preparation is done by using poker vibrators. This paper investigat

  1. Strategic Alliance Poker: Demonstrating the Importance of Complementary Resources and Trust in Strategic Alliance Management

    Science.gov (United States)

    Reutzel, Christopher R.; Worthington, William J.; Collins, Jamie D.

    2012-01-01

    Strategic Alliance Poker (SAP) provides instructors with an opportunity to integrate the resource based view with their discussion of strategic alliances in undergraduate Strategic Management courses. Specifically, SAP provides Strategic Management instructors with an experiential exercise that can be used to illustrate the value creation…

  2. Selling Internet Gambling: Advertising, New Media and the Content of Poker Promotion

    Science.gov (United States)

    McMullan, John L.; Kervin, Melissa

    2012-01-01

    This study examines the web design and engineering, advertising and marketing, and pedagogical features present at a random sample of 71 international poker sites obtained from the Casino City directory in the summer of 2009. We coded for 22 variables related to access, appeal, player protection, customer services, on-site security, use of images,…

  3. Exploring Chemical Equilibrium with Poker Chips: A General Chemistry Laboratory Exercise

    Science.gov (United States)

    Bindel, Thomas H.

    2012-01-01

    A hands-on laboratory exercise at the general chemistry level introduces students to chemical equilibrium through a simulation that uses poker chips and rate equations. More specifically, the exercise allows students to explore reaction tables, dynamic chemical equilibrium, equilibrium constant expressions, and the equilibrium constant based on…
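
    The chip exercise has a direct computational analogue: each round, a fixed fraction of reactant chips moves forward and a fixed fraction of product chips moves back, and the counts settle where the two transfers balance. Below is a small simulation in the spirit of the exercise; the rate fractions are made up for illustration.

```cpp
#include <cstdio>

// Simulation in the spirit of the poker-chip exercise: each round, a
// fraction kf of reactant chips goes forward and a fraction kr of product
// chips comes back. Counts settle where kf*R == kr*P, i.e. at the
// equilibrium-constant analogue K = P/R = kf/kr.
int main() {
    double R = 100.0, P = 0.0;                   // starting chip counts
    const double kf = 0.30, kr = 0.10;           // illustrative rate fractions
    for (int round = 1; round <= 30; ++round) {
        double fwd = kf * R, rev = kr * P;       // chips moved this round
        R += rev - fwd;
        P += fwd - rev;
        if (round % 5 == 0)
            std::printf("round %2d: R = %6.2f  P = %6.2f  P/R = %.2f\n",
                        round, R, P, P / R);
    }
    // P/R approaches kf/kr = 3.0: a dynamic equilibrium, with chips still
    // moving both ways each round even though the counts no longer change.
}
```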

  4. Is poker a game of skill or chance? A quasi-experimental study.

    Science.gov (United States)

    Meyer, Gerhard; von Meduna, Marc; Brosowski, Tim; Hayer, Tobias

    2013-09-01

    Due to intensive marketing and the rapid growth of online gambling, poker currently enjoys great popularity among large sections of the population. Although poker is legally a game of chance in most countries, some (particularly operators of private poker web sites) argue that it should be regarded as a game of skill or sport because the outcome of the game primarily depends on individual aptitude and skill. The available findings indicate that skill plays a meaningful role; however, serious methodological weaknesses and the absence of reliable information regarding the relative importance of chance and skill considerably limit the validity of extant research. Adopting a quasi-experimental approach, the present study examined the extent to which the influence of poker playing skill was more important than card distribution. Three average players and three experts sat down at a six-player table and played 60 computer-based hands of the poker variant "Texas Hold'em" for money. In each hand, one of the average players and one expert received (a) better-than-average cards (winner's box), (b) average cards (neutral box) and (c) worse-than-average cards (loser's box). The standardized manipulation of the card distribution controlled the factor of chance to determine differences in performance between the average and expert groups. Overall, 150 individuals participated in a "fixed-limit" game variant, and 150 individuals participated in a "no-limit" game variant. ANOVA results showed that experts did not outperform average players in terms of final cash balance. Rather, card distribution was the decisive factor for successful poker playing. However, expert players were better able to minimize losses when confronted with disadvantageous conditions (i.e., worse-than-average cards). No significant differences were observed between the game variants. Furthermore, supplementary analyses confirm differential game-related actions dependent on the card distribution, player status

  5. Purification and Characterization of Hemagglutinating Proteins from Poker-Chip Venus (Meretrix lusoria) and Corbicula Clam (Corbicula fluminea)

    OpenAIRE

    Cheng, Chin-Fu; Hung, Shao-Wen; Chang, Yung-Chung; Chen, Ming-Hui; Chang, Chen-Hsuan; Tsou, Li-Tse; Tu, Ching-Yu; Lin, Yu-Hsing; Liu, Pan-Chen; Lin, Shiun-Long; Wang, Way-Shyan

    2012-01-01

    Hemagglutinating proteins (HAPs) were purified from Poker-chip Venus (Meretrix lusoria) and Corbicula clam (Corbicula fluminea) using gel-filtration chromatography on a Sephacryl S-300 column. The molecular weights of the HAPs obtained from Poker-chip Venus and Corbicula clam were 358 kDa and 380 kDa, respectively. Purified HAP from Poker-chip Venus yielded two subunits with molecular weights of 26 kDa and 29 kDa. However, only one HAP subunit was purified from Corbicula clam, and its molecul...

  6. Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

    Science.gov (United States)

    Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

    2005-01-01

    The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed systems present the worst-case implications for today's dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirements-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.

  7. Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

    Science.gov (United States)

    Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

    2013-12-01

    Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is imbedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.
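
    Single-boundary DP, the building block that DDP and DLD extend, is easy to state: the best path reaching column j in row r extends the best path at column j-1 from a nearby row. A generic single-contour version is sketched below; the paper's variants add a second contour coupled to the first by the distance-range constraint.

```cpp
#include <algorithm>
#include <vector>

// Generic single-boundary dynamic programming: minimum-cost left-to-right
// path through a cost image, moving at most one row per column. Dual
// approaches (DDP, DLD) extend this to two coupled, nearly parallel paths.
double best_boundary(const std::vector<std::vector<double>>& cost) {
    const int rows = (int)cost.size(), cols = (int)cost[0].size();
    std::vector<std::vector<double>> D = cost;   // D[r][j]: best cost ending at (r, j)
    for (int j = 1; j < cols; ++j)
        for (int r = 0; r < rows; ++r) {
            double best = D[r][j-1];             // continue straight
            if (r > 0)        best = std::min(best, D[r-1][j-1]);  // step up
            if (r + 1 < rows) best = std::min(best, D[r+1][j-1]);  // step down
            D[r][j] = cost[r][j] + best;
        }
    double ans = D[0][cols-1];
    for (int r = 1; r < rows; ++r) ans = std::min(ans, D[r][cols-1]);
    return ans;                                  // backtracking would recover the path
}
```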

  8. MIST: An Open Source Environmental Modelling Programming Language Incorporating Easy to Use Data Parallelism.

    Science.gov (United States)

    Bellerby, Tim

    2014-05-01

    Model Integration System (MIST) is an open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements, to be directly translated into sequences of whole-array (or more generally whole data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g.: the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and separately to control storage requirements for intermediate expressions - enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter based implementations. A prototype open source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST to FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.
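
    The 3x3 averaging filter mentioned in the abstract is the canonical neighbourhood operation. What MIST expresses as a neighbourhood access inside a plain loop looks like the following when spelled out explicitly; this is generic C++ with OpenMP, not MIST syntax.

```cpp
#include <vector>

// Generic 3x3 averaging filter, the neighbourhood operation used as the
// example above. A data-parallel language hides the explicit indexing and
// parallelization shown here behind a whole-array neighbourhood access.
// `out` is assumed pre-sized to match `in`; borders are left untouched.
void mean3x3(const std::vector<std::vector<float>>& in,
             std::vector<std::vector<float>>& out) {
    const int h = (int)in.size(), w = (int)in[0].size();
    #pragma omp parallel for
    for (int y = 1; y < h - 1; ++y)
        for (int x = 1; x < w - 1; ++x) {
            float s = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)     // 3x3 local neighbourhood
                for (int dx = -1; dx <= 1; ++dx)
                    s += in[y + dy][x + dx];
            out[y][x] = s / 9.0f;
        }
}
```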

  9. Development, Verification and Validation of Parallel, Scalable Volume of Fluid CFD Program for Propulsion Applications

    Science.gov (United States)

    West, Jeff; Yang, H. Q.

    2014-01-01

    There are many instances involving liquid/gas interfaces and their dynamics in the design of liquid engine powered rockets such as the Space Launch System (SLS). Some examples of these applications are: propellant tank draining and slosh, subcritical condition injector analysis for gas generators, preburners and thrust chambers, water deluge mitigation for launch induced environments and even solid rocket motor liquid slag dynamics. Commercially available CFD programs simulating gas/liquid interfaces using the Volume of Fluid approach are currently limited in their parallel scalability. In 2010 for instance, an internal NASA/MSFC review of three commercial tools revealed that parallel scalability was seriously compromised at 8 cpus and no additional speedup was possible after 32 cpus. Other non-interface CFD applications at the time were demonstrating useful parallel scalability up to 4,096 processors or more. Based on this review, NASA/MSFC initiated an effort to implement a Volume of Fluid implementation within the unstructured mesh, pressure-based algorithm CFD program, Loci-STREAM. After verification was achieved by comparing results to the commercial CFD program CFD-Ace+, and validation by direct comparison with data, Loci-STREAM-VoF is now the production CFD tool for propellant slosh force and slosh damping rate simulations at NASA/MSFC. On these applications, good parallel scalability has been demonstrated for problem sizes of tens of millions of cells and thousands of cpu cores. Ongoing efforts are focused on the application of Loci-STREAM-VoF to predict the transient flow patterns of water on the SLS Mobile Launch Platform in order to support the phasing of water for launch environment mitigation so that detrimental effects on the vehicle are not realized.

  10. CaKernel – A Parallel Application Programming Framework for Heterogenous Computing Architectures

    Directory of Open Access Journals (Sweden)

    Marek Blazewicz

    2011-01-01

    Full Text Available With the recent advent of new heterogeneous computing architectures there is still a lack of parallel problem solving environments that can help scientists to use easily and efficiently hybrid supercomputers. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations. Stencil computations have become crucial in solving many challenging problems in various domains, e.g., engineering or physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists are struggling when it comes to the subject of implementing a new stencil-based simulation, especially on high performance hybrid supercomputers. In response to the presented need we extend our previous work on a parallel programming framework for CUDA – CaCUDA that now supports OpenCL. We present CaKernel – a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework. In the CaKernel framework, Cactus manages the inter-process communication via MPI while CaKernel manages the code running on Graphics Processing Units (GPUs and interactions between them. As a non-trivial test case we have developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.

  11. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  12. A parallel domain decomposition algorithm for coastal ocean circulation models based on integer linear programming

    Science.gov (United States)

    Jordi, Antoni; Georgas, Nickitas; Blumberg, Alan

    2017-05-01

    This paper presents a new parallel domain decomposition algorithm based on integer linear programming (ILP), a mathematical optimization method. To minimize the computation time of coastal ocean circulation models, the ILP decomposition algorithm divides the global domain into local domains with balanced work load according to the number of processors, and avoids computations over as many land grid cells as possible. In addition, it maintains the use of logically rectangular local domains and achieves the exact same results as traditional domain decomposition algorithms (such as Cartesian decomposition). However, the ILP decomposition algorithm may not converge to an exact solution for relatively large domains. To overcome this problem, we developed two ILP decomposition formulations. The first one (complete formulation) has no additional restriction, although it is impractical for large global domains. The second one (feasible) imposes local domains with the same dimensions and looks for the feasibility of such a decomposition, which allows much larger global domains. Parallel performance of both ILP formulations is compared to a base Cartesian decomposition by simulating two cases with the newly created parallel version of the Stevens Institute of Technology's Estuarine and Coastal Ocean Model (sECOM). Simulations with the ILP formulations always run faster than the ones with the base decomposition, and the complete formulation is better than the feasible one when it is applicable. In addition, parallel efficiency with the ILP decomposition may be greater than one.

  13. "To Bluff like a Man or Fold like a Girl?" - Gender Biased Deceptive Behavior in Online Poker.

    Directory of Open Access Journals (Sweden)

    Jussi Palomäki

    Full Text Available Evolutionary psychology suggests that men are more likely than women to deceive to bolster their status and influence. Also gender perception influences deceptive behavior, which is linked to pervasive gender stereotypes: women are typically viewed as weaker and more gullible than men. We assessed bluffing in an online experiment (N = 502, where participants made decisions to bluff or not in simulated poker tasks against opponents represented by avatars. Participants bluffed on average 6% more frequently at poker tables with female-only avatars than at tables with male-only or gender mixed avatars, a highly significant effect in games involving repeated decisions. Nonetheless, participants did not believe the avatar genders affected their decisions. Males bluffed 13% more frequently than females. Unlike most economic games employed exclusively in research contexts, online poker is played for money by tens of millions of people worldwide. Thus, gender effects in bluffing have significant monetary consequences for poker players.

  14. "To Bluff like a Man or Fold like a Girl?" - Gender Biased Deceptive Behavior in Online Poker.

    Science.gov (United States)

    Palomäki, Jussi; Yan, Jeff; Modic, David; Laakasuo, Michael

    2016-01-01

    Evolutionary psychology suggests that men are more likely than women to deceive to bolster their status and influence. Gender perception also influences deceptive behavior, which is linked to pervasive gender stereotypes: women are typically viewed as weaker and more gullible than men. We assessed bluffing in an online experiment (N = 502), where participants made decisions to bluff or not in simulated poker tasks against opponents represented by avatars. Participants bluffed on average 6% more frequently at poker tables with female-only avatars than at tables with male-only or gender-mixed avatars, a highly significant effect in games involving repeated decisions. Nonetheless, participants did not believe the avatar genders affected their decisions. Males bluffed 13% more frequently than females. Unlike most economic games employed exclusively in research contexts, online poker is played for money by tens of millions of people worldwide. Thus, gender effects in bluffing have significant monetary consequences for poker players.

  15. Is Parallel Programming Hard, And, If So, What Can You Do About It? (v2017.01.02a)

    OpenAIRE

    McKenney, Paul E.

    2017-01-01

    The purpose of this book is to help you program shared-memory parallel machines without risking your sanity. We hope that this book's design principles will help you avoid at least some parallel-programming pitfalls. That said, you should think of this book as a foundation on which to build, rather than as a completed cathedral. Your mission, if you choose to accept, is to help make further progress in the exciting field of parallel programming--progress that will in time render this book obsolete.

  16. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  17. Fast implementations of 3D PET reconstruction using vector and parallel programming techniques

    International Nuclear Information System (INIS)

    Guerrero, T.M.; Cherry, S.R.; Dahlbom, M.; Ricci, A.R.; Hoffman, E.J.

    1993-01-01

    Computationally intensive techniques that offer potential clinical use have arisen in nuclear medicine. Examples include iterative reconstruction, 3D PET data acquisition and reconstruction, and 3D image volume manipulation including image registration. One obstacle in achieving clinical acceptance of these techniques is the computational time required. This study focuses on methods to reduce the computation time for 3D PET reconstruction through the use of fast computer hardware, vector and parallel programming techniques, and algorithm optimization. The strengths and weaknesses of i860 microprocessor based workstation accelerator boards are investigated in implementations of 3D PET reconstruction

  18. Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

    Science.gov (United States)

    Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

    2014-09-09

    A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
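
    The distribution switch hinges on knowing, for any global matrix element, which process owns it under a two-dimensional block-cyclic layout. A minimal sketch of that ownership map in C (the parameter names nb, nprow and npcol follow ScaLAPACK-style conventions and are illustrative, not BERTHA's actual code):

        #include <stdio.h>

        /* Owner of global element (i, j) in a 2-D block-cyclic layout:
         * nb x nb blocks are dealt round-robin over a nprow x npcol
         * process grid. */
        static void owner(long i, long j, int nb, int nprow, int npcol,
                          int *prow, int *pcol)
        {
            *prow = (int)((i / nb) % nprow);  /* process-grid row    */
            *pcol = (int)((j / nb) % npcol);  /* process-grid column */
        }

        int main(void)
        {
            int pr, pc;
            owner(1234, 567, 64, 4, 4, &pr, &pc);
            printf("element (1234,567) lives on process (%d,%d)\n", pr, pc);
            return 0;
        }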

  19. Tracking online poker problem gamblers with player account-based gambling data only.

    Science.gov (United States)

    Luquiens, Amandine; Tanguy, Marie-Laure; Benyamina, Amine; Lagadec, Marthylle; Aubin, Henri-Jean; Reynaud, Michel

    2016-12-01

    The aim was to develop and validate an instrument to track online problem poker gamblers with player account-based gambling data (PABGD). We emailed an invitation to all active poker gamblers on the online gambling service provider Winamax. The 14,261 participants completed the Problem Gambling Severity Index (PGSI). The PGSI served as a gold standard to track problem gamblers (i.e., PGSI ≥ 5). We used a stepwise logistic regression to build a predictive model of problem gambling with PABGD, and validated it. Online poker problem gamblers made up 18% of the sample. The risk factors of problem gambling included in the predictive model were being male, compulsive, younger than 28 years, making a total deposit > 0 euros, having a mean loss per gambling session > 1.7 euros, losing a total of > 45 euros in the last 30 days, having a total stake > 298 euros, having > 60 gambling sessions in the last 30 days, and multi-tabling. The tracking instrument had a sensitivity of 80% and a specificity of 50%. The quality of the instrument was good. This study illustrates the feasibility of a method to develop and validate instruments to track online problem gamblers with PABGD only. Copyright © 2016 John Wiley & Sons, Ltd.
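
    The stepwise model combines these indicators in the standard logistic form; schematically, with x_1 ... x_k the risk factors above and coefficients beta fitted on the PABGD (not reported here):

        P(\text{problem gambler}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}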

  20. Searching for globally optimal functional forms for interatomic potentials using genetic programming with parallel tempering.

    Science.gov (United States)

    Slepoy, A; Peters, M D; Thompson, A P

    2007-11-30

    Molecular dynamics and other molecular simulation methods rely on a potential energy function, based only on the relative coordinates of the atomic nuclei. Such a function, called a force field, approximately represents the electronic structure interactions of a condensed matter system. Developing such approximate functions and fitting their parameters remains an arduous, time-consuming process, relying on expert physical intuition. To address this problem, a functional programming methodology was developed that may enable automated discovery of entirely new force-field functional forms, while simultaneously fitting parameter values. The method uses a combination of genetic programming, Metropolis Monte Carlo importance sampling and parallel tempering, to efficiently search a large space of candidate functional forms and parameters. The methodology was tested using a nontrivial problem with a well-defined globally optimal solution: a small set of atomic configurations was generated and the energy of each configuration was calculated using the Lennard-Jones pair potential. Starting with a population of random functions, our fully automated, massively parallel implementation of the method reproducibly discovered the original Lennard-Jones pair potential by searching for several hours on 100 processors, sampling only a minuscule portion of the total search space. This result indicates that, with further improvement, the method may be suitable for unsupervised development of more accurate force fields with completely new functional forms. Copyright (c) 2007 Wiley Periodicals, Inc.
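
    For reference, the globally optimal solution the search had to rediscover is the Lennard-Jones pair potential, with well depth \varepsilon and length scale \sigma:

        V(r) = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]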

  1. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    Science.gov (United States)

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution, and that it increases with the number of participating cores. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results hold for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  2. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    Science.gov (United States)

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas programs written in MPI can span multiple SMP nodes, at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that significant communication overhead is incurred even in OpenMP loop execution, and that it increases with the number of participating cores. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results hold for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
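
    One standard way to expose the loop-level parallelism such studies measure is a wavefront sweep over the alignment matrix: all cells on one anti-diagonal are mutually independent and can be shared among OpenMP threads. A minimal global-alignment sketch in C (illustrative scoring constants; not the authors' tiled implementation):

        #define MATCH     2
        #define MISMATCH -1
        #define GAP      -2

        static int max3(int a, int b, int c) {
            int m = a > b ? a : b;
            return m > c ? m : c;
        }

        /* H is (n+1) x (m+1), row-major; a and b are the two sequences.
         * Compile with -fopenmp. */
        void align(const char *a, const char *b, int n, int m, int *H)
        {
            for (int i = 0; i <= n; i++) H[i * (m + 1)] = i * GAP;
            for (int j = 0; j <= m; j++) H[j] = j * GAP;

            /* Cells on anti-diagonal d = i + j are independent. */
            for (int d = 2; d <= n + m; d++) {
                int ilo = d - m > 1 ? d - m : 1;
                int ihi = d - 1 < n ? d - 1 : n;
                #pragma omp parallel for schedule(static)
                for (int i = ilo; i <= ihi; i++) {
                    int j = d - i;
                    int s = (a[i - 1] == b[j - 1]) ? MATCH : MISMATCH;
                    H[i * (m + 1) + j] =
                        max3(H[(i - 1) * (m + 1) + j - 1] + s,
                             H[(i - 1) * (m + 1) + j] + GAP,
                             H[i * (m + 1) + j - 1] + GAP);
                }
            }
        }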

  3. PAREMD: A parallel program for the evaluation of momentum space properties of atoms and molecules

    Science.gov (United States)

    Meena, Deep Raj; Gadre, Shridhar R.; Balanarayan, P.

    2018-03-01

    The present work describes a code for evaluating the electron momentum density (EMD), its moments and the associated Shannon information entropy for a multi-electron molecular system. The code works specifically for electronic wave functions obtained from traditional electronic structure packages such as GAMESS and GAUSSIAN. For the momentum space orbitals, the general expression for Gaussian basis sets in position space is analytically Fourier transformed to momentum space Gaussian basis functions. The molecular orbital coefficients of the wave function are taken as an input from the output file of the electronic structure calculation. The analytic expressions of the EMD are evaluated over a fine grid, and the accuracy of the code is verified by a normalization check and a numerical kinetic energy evaluation which is compared with the analytic kinetic energy given by the electronic structure package. Apart from the electron momentum density, the electron density in position space has also been integrated into this package. The program is written in C++ and is executed through a shell script. It is also tuned for multicore machines with shared memory through OpenMP. The program has been tested for a variety of molecules and correlated methods such as CISD, Møller-Plesset second order (MP2) theory and density functional methods. For correlated methods, the PAREMD program uses natural spin orbitals as an input. The program has been benchmarked for a variety of Gaussian basis sets for different molecules, showing a linear speedup on a parallel architecture.
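
    The underlying transform is the standard one; a Gaussian in \mathbf{r} maps analytically to another Gaussian in \mathbf{p}, which is what makes the grid evaluation cheap:

        \tilde{\phi}(\mathbf{p}) = (2\pi)^{-3/2} \int \phi(\mathbf{r})\, e^{-i\,\mathbf{p}\cdot\mathbf{r}}\, \mathrm{d}^3 r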

  4. Abstract machine based execution model for computer architecture design and efficient implementation of logic programs in parallel

    Energy Technology Data Exchange (ETDEWEB)

    Hermenegildo, M.V.

    1986-01-01

    The term Logic Programming refers to a variety of computer languages and execution models based on the traditional concept of Symbolic Logic. The expressive power of these languages promises to be of great assistance in facing the programming challenges of present and future symbolic processing applications in artificial intelligence, knowledge-based systems, and many other areas of computing. This dissertation presents an efficient parallel execution model for logic programs. The model is described from the source language level down to an Abstract Machine level, suitable for direct implementation on existing parallel systems or for the design of special purpose parallel architectures. Few assumptions are made at the source language level and, therefore, the techniques developed and the general Abstract Machine design are applicable to a variety of logic (and also functional) languages. These techniques offer efficient solutions to several areas of parallel Logic Programming implementation previously considered problematic or a source of considerable overhead, such as the detection and handling of variable binding conflicts in AND-parallelism, the specification of control and management of the execution tree, the treatment of distributed backtracking, and goal scheduling and memory management issues. A parallel Abstract Machine design is offered, specifying data areas, operation, and a suitable instruction set.

  5. Drainage network extraction from a high-resolution DEM using parallel programming in the .NET Framework

    Science.gov (United States)

    Du, Chao; Ye, Aizhong; Gan, Yanjun; You, Jinjun; Duan, Qinyun; Ma, Feng; Hou, Jingwen

    2017-12-01

    High-resolution Digital Elevation Models (DEMs) can be used to extract high-accuracy drainage networks. A higher resolution means a larger number of grid cells, and with an increase in the number of grid cells, flow direction determination requires substantial computer resources and computing time. Parallel computing is a feasible method with which to resolve this problem. In this paper, we propose a parallel programming method within the .NET Framework with a C# compiler in a Windows environment. The basin is divided into sub-basins, and the different sub-basins are then processed concurrently on multiple threads to calculate flow directions. The method was applied to calculate the flow direction of the Yellow River basin from the 3 arc-second resolution SRTM DEM. Drainage networks were extracted and compared with the HydroSHEDS river network to assess their accuracy. The results demonstrate that this method can calculate flow directions from high-resolution DEMs efficiently and extract high-precision continuous drainage networks.
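
    The per-cell work is embarrassingly parallel once the basin is partitioned; the sketch below shows a D8 steepest-descent flow direction loop in C with OpenMP, standing in for the authors' C#/.NET thread implementation (grid layout and the 0..7 direction encoding are illustrative assumptions):

        /* D8 flow direction: each interior cell points to its steepest
         * downslope neighbor (0..7); -1 marks pits and flats.
         * dem is rows x cols, row-major. Compile with -fopenmp. */
        void d8_flow_dir(const float *dem, int rows, int cols, signed char *dir)
        {
            static const int di[8] = {-1,-1,-1, 0, 0, 1, 1, 1};
            static const int dj[8] = {-1, 0, 1,-1, 1,-1, 0, 1};

            #pragma omp parallel for schedule(static)
            for (int i = 1; i < rows - 1; i++) {
                for (int j = 1; j < cols - 1; j++) {
                    float z = dem[i * cols + j], best = 0.0f;
                    int bestk = -1;
                    for (int k = 0; k < 8; k++) {
                        float dz = z - dem[(i + di[k]) * cols + j + dj[k]];
                        float slope = dz / ((di[k] && dj[k]) ? 1.414214f : 1.0f);
                        if (slope > best) { best = slope; bestk = k; }
                    }
                    dir[i * cols + j] = (signed char)bestk;
                }
            }
        }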

  6. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    Directory of Open Access Journals (Sweden)

    Matthew O'keefe

    1995-01-01

    Full Text Available Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  7. Increased ventral-striatal activity during monetary decision making is a marker of problem poker gambling severity.

    Science.gov (United States)

    Brevers, Damien; Noël, Xavier; He, Qinghua; Melrose, James A; Bechara, Antoine

    2016-05-01

    The aim of this study was to examine the impact of different neural systems on monetary decision making in frequent poker gamblers, who vary in their degree of problem gambling. Fifteen frequent poker players, ranging from non-problem to high-problem gambling, and 15 non-gambler controls were scanned using functional magnetic resonance imaging (fMRI) while performing the Iowa Gambling Task (IGT). During IGT deck selection, between-group fMRI analyses showed that frequent poker gamblers exhibited higher ventral-striatal but lower dorsolateral prefrontal and orbitofrontal activations as compared with controls. Moreover, using functional connectivity analyses, we observed higher ventral-striatal connectivity in poker players, and in regions involved in attentional/motor control (posterior cingulate), visual (occipital gyrus) and auditory (temporal gyrus) processing. In poker gamblers, scores of problem gambling severity were positively associated with ventral-striatal activations and with the connectivity between the ventral-striatum seed and the occipital fusiform gyrus and the middle temporal gyrus. Present results are consistent with findings from recent brain imaging studies showing that gambling disorder is associated with heightened motivational-reward processes during monetary decision making, which may hamper one's ability to moderate his level of monetary risk taking. © 2015 Society for the Study of Addiction.

  8. Trait Mindfulness, Problem-Gambling Severity, Altered State of Awareness and Urge to Gamble in Poker-Machine Gamblers.

    Science.gov (United States)

    McKeith, Charles F A; Rock, Adam J; Clark, Gavin I

    2017-06-01

    In Australia, poker-machine gamblers represent a disproportionate number of problem gamblers. To cultivate a greater understanding of the psychological mechanisms involved in poker-machine gambling, a repeated measures cue-reactivity protocol was administered. A community sample of 38 poker-machine gamblers was assessed for problem-gambling severity and trait mindfulness. Participants were also assessed regarding altered state of awareness (ASA) and urge to gamble at baseline, following a neutral cue, and following a gambling cue. Results indicated that: (a) urge to gamble significantly increased from neutral cue to gambling cue, while controlling for baseline urge; (b) cue-reactive ASA did not significantly mediate the relationship between problem-gambling severity and cue-reactive urge (from neutral cue to gambling cue); (c) trait mindfulness was significantly negatively associated with both problem-gambling severity and cue-reactive urge (i.e., from neutral cue to gambling cue, while controlling for baseline urge); and (d) trait mindfulness did not significantly moderate the effect of problem-gambling severity on cue-reactive urge (from neutral cue to gambling cue). This is the first study to demonstrate a negative association between trait mindfulness and cue-reactive urge to gamble in a population of poker-machine gamblers. Thus, this association merits further evaluation both in relation to poker-machine gambling and other gambling modalities.

  9. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    Science.gov (United States)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
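
    For context, the SPMV kernel in compressed sparse row (CSR) form parallelizes naturally over rows, and the ordering and partitioning strategies studied in the paper decide which rows (and thus which cache lines and messages) each processor owns. A minimal shared-memory sketch in C with OpenMP (data layout assumed, not the paper's code):

        /* y = A*x, A in CSR form: rowptr has n+1 entries; colind and val
         * have nnz entries. Compile with -fopenmp. */
        void spmv_csr(int n, const int *rowptr, const int *colind,
                      const double *val, const double *x, double *y)
        {
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int k = rowptr[i]; k < rowptr[i + 1]; k++)
                    sum += val[k] * x[colind[k]];
                y[i] = sum;
            }
        }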

  10. Efficient storage, retrieval and analysis of poker hands: An adaptive data framework

    Directory of Open Access Journals (Sweden)

    Gorawski Marcin

    2017-12-01

    Full Text Available In online gambling, poker hands are one of the most popular and fundamental units of the game state and can be considered objects comprising all the events that pertain to the single hand played. In a situation where tens of millions of poker hands are produced daily and need to be stored and analysed quickly, the use of relational databases no longer provides high scalability and performance stability. The purpose of this paper is to present an efficient way of storing and retrieving poker hands in a big data environment. We propose a new, read-optimised storage model that offers significant data access improvements over traditional database systems as well as the existing Hadoop file formats such as ORC, RCFile or SequenceFile. Through index-oriented partition elimination, our file format allows reducing the number of file splits that need to be accessed, and improves query response time by up to three orders of magnitude in comparison with other approaches. In addition, our file format supports a range of new indexing structures to facilitate fast row retrieval at a split level. Both index types operate independently of the Hive execution context and allow other big data computational frameworks such as MapReduce or Spark to benefit from the optimized data access path to the hand information. Moreover, we present a detailed analysis of our storage model and its supporting index structures, and how they are organised in the overall data framework. We also describe in detail how predicate-based expression trees are used to build effective file-level execution plans. Our experimental tests conducted on a production cluster, holding nearly 40 billion hands which span over 4000 partitions, show that multi-way partition pruning outperforms other existing file formats, resulting in faster query execution times and better cluster utilisation.

  11. The electronic book of educational training tasks on parallel programming based on the MPI 2.0 standard

    Directory of Open Access Journals (Sweden)

    Mikhail E. Abramyan

    2017-12-01

    Full Text Available We discuss one approach to the development of educational parallel software and describe the Programming Taskbook for MPI-2 (http://ptaskbook.com/ru/ptformpi2/), a courseware package that includes 250 training tasks on various topics of MPI. We describe the history of its development and give an overview of its task groups. The use of the Programming Taskbook for MPI-2 in laboratory classes is illustrated by the example of solving one of the training tasks. We also describe task groups related to such new features of the MPI 2.0 interface as parallel input-output, remote memory access (one-sided communications), and dynamic creation of processes, and give formulations of some tasks from these groups. In addition, we discuss some features of the final task group devoted to parallel matrix algorithms.
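
    Of the MPI 2.0 features listed, one-sided communication departs furthest from classical message passing: one process writes directly into a memory window exposed by another. A minimal fence-synchronized MPI_Put example in C (the taskbook's own task formulations differ; run with at least two processes):

        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, value = -1;
            MPI_Win win;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            /* Every process exposes one int as an RMA window. */
            MPI_Win_create(&value, sizeof(int), sizeof(int),
                           MPI_INFO_NULL, MPI_COMM_WORLD, &win);

            MPI_Win_fence(0, win);
            if (rank == 0) {               /* rank 0 writes into rank 1 */
                int msg = 42;
                MPI_Put(&msg, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
            }
            MPI_Win_fence(0, win);

            if (rank == 1)
                printf("rank 1 received %d via MPI_Put\n", value);

            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }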

  12. Gambling Motives: Do They Explain Cognitive Distortions in Male Poker Gamblers?

    Science.gov (United States)

    Mathieu, Sasha; Barrault, Servane; Brunault, Paul; Varescon, Isabelle

    2018-03-01

    Gambling behavior is partly the result of varied motivations leading individuals to participate in gambling activities. Specific motivational profiles are found in gamblers, and gambling motives are closely linked to the development of cognitive distortions. This cross-sectional study aimed to predict cognitive distortions from gambling motives in poker players. The population was recruited in online gambling forums. Participants reported gambling at least once a week. Data included sociodemographic characteristics, the South Oaks Gambling Screen, the Gambling Motives Questionnaire-Financial and the Gambling-Related Cognition Scale. This study was conducted on 259 male poker gamblers (aged 18-69 years, 14.3% probable pathological gamblers). Univariate analyses showed that cognitive distortions were independently predicted by overall gambling motives (34.8%) and problem gambling (22.4%) (p < .001), showing a close inter-relationship between gambling motives, cognitive distortions and the severity of gambling. These data are consistent with the following theoretical process model: gambling motives lead individuals to practice and repeat the gambling experience, which may lead them to develop cognitive distortions, which in turn favor problem gambling. This study opens up new research perspectives to understand better the mechanisms underlying gambling practice and has clinical implications in terms of prevention and treatment. For example, a coupled motivational and cognitive intervention focused on gambling motives/cognitive distortions could be beneficial for individuals with gambling problems.

  13. Rocket measurement of auroral partial parallel distribution functions

    Science.gov (United States)

    Lin, C.-A.

    1980-01-01

    The auroral partial parallel distribution functions are obtained by using the observed energy spectra of electrons. The experiment package was launched by a Nike-Tomahawk rocket from Poker Flat, Alaska, over a bright auroral band and covered an altitude range of up to 180 km. Calculated partial distribution functions are presented with emphasis on their slopes. The implications of the slopes are discussed. It should be pointed out that the slope of the partial parallel distribution function obtained from one energy spectrum will be changed by superposing another energy spectrum on it.
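
    The conversion involved is the standard one in auroral particle work; under the usual non-relativistic assumption, the measured differential directional flux j(E) maps to phase-space density as (a sketch of the textbook relation, not the paper's notation):

        f(v) = \frac{m^2}{2E}\, j(E)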

  14. SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

    Science.gov (United States)

    O'Connor, B P

    2000-08-01

    Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.

  15. From functional programming to multicore parallelism: A case study based on Presburger Arithmetic

    DEFF Research Database (Denmark)

    Dung, Phan Anh; Hansen, Michael Reichhardt

    2011-01-01

    We are interested in using PA in connection with the Duration Calculus Model Checker (DCMC) [5]. There are effective decision procedures for PA, including Cooper's algorithm and the Omega Test; however, their complexity is extremely high, with a doubly exponential lower bound and a triply exponential upper bound [7]. We investigate these decision procedures in the context of multicore parallelism with the hope of exploiting multicore processing power. Unfortunately, we are not aware of any prior parallelism research related to decision procedures for PA. The closest work is the preliminary results on parallelism...
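
    For concreteness, the complexity bounds referred to say that any decision procedure for PA needs time at least doubly exponential in the formula length n, while the known procedures stay within triply exponential time (c and d constants):

        2^{2^{cn}} \;\le\; T(n) \;\le\; 2^{2^{2^{dn}}}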

  16. Automatic parallelization of nested loop programs for non-manifest real-time stream processing applications

    OpenAIRE

    Bijlsma, T.

    2011-01-01

    This thesis is concerned with the automatic parallelization of real-time stream processing applications, such that they can be executed on embedded multiprocessor systems. Stream processing applications can be found in the video and channel decoding domain. These applications often have temporal requirements and can contain non-manifest conditions and expressions. For non-manifest conditions and expressions the results cannot be evaluated at compile time. Current parallelization approaches ha...

  17. Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

    International Nuclear Information System (INIS)

    Ohshima, Hiroyuki

    2001-10-01

    A whole core thermal-hydraulic analysis program ACT is being developed for the purpose of evaluating detailed in-core thermal hydraulic phenomena of fast reactors, including the effect of the flow between wrapper-tube walls (inter-wrapper flow), under various reactor operation conditions. As appropriate boundary conditions, in addition to a detailed modeling of the core, are essential for accurate simulations of in-core thermal hydraulics, ACT consists not only of fuel assembly and inter-wrapper flow analysis modules but also of a heat transport system analysis module that gives the response of the plant dynamics to the core model. This report describes the incorporation of a simplified model into the fuel assembly analysis module and the program's parallelization by a message passing method toward large-scale simulations. ACT has a fuel assembly analysis module which can simulate a whole fuel pin bundle in each fuel assembly of the core; however, this may take substantial CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save simulation time and resources. This simplified model is applied to several parts of fuel assemblies in a core where detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed, and the parallelization was optimized, including improvements to ACT's numerical simulation algorithm. The Message Passing Interface (MPI) is applied to data communication between processes and to synchronization in parallel calculations. Parallelized ACT was verified through a comparison simulation with the original one. In addition to the above works, input manuals of the core analysis module and the heat transport system analysis module have been prepared. (author)

  18. Computer science. Heads-up limit hold'em poker is solved.

    Science.gov (United States)

    Bowling, Michael; Burch, Neil; Johanson, Michael; Tammelin, Oskari

    2015-01-09

    Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. Whereas many perfect-information games have been solved (e.g., Connect Four and checkers), no nontrivial imperfect-information game played competitively by humans has previously been solved. Here, we announce that heads-up limit Texas hold'em is now essentially weakly solved. Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a substantial advantage. This result was enabled by a new algorithm, CFR(+), which is capable of solving extensive-form games orders of magnitude larger than previously possible. Copyright © 2015, American Association for the Advancement of Science.
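
    The heart of CFR+ is a regret-matching update in which cumulative regrets are floored at zero and the next strategy is proportional to the positive regrets. A minimal sketch of that single step in C (the surrounding solver, game-tree traversal and strategy averaging, is omitted):

        /* One regret-matching+ step for an information set with n actions.
         * regret[] holds cumulative clipped regrets, instant[] the new
         * counterfactual regrets, strategy[] the resulting distribution. */
        void regret_matching_plus(int n, double *regret,
                                  const double *instant, double *strategy)
        {
            double sum = 0.0;
            for (int a = 0; a < n; a++) {
                regret[a] += instant[a];
                if (regret[a] < 0.0) regret[a] = 0.0;  /* the "+" in CFR+ */
                sum += regret[a];
            }
            for (int a = 0; a < n; a++)
                strategy[a] = (sum > 0.0) ? regret[a] / sum : 1.0 / n;
        }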

  19. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Science.gov (United States)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  20. ParaHaplo: A program package for haplotype-based whole-genome association study using parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2009-10-01

    Full Text Available Abstract Background Since more than a million single-nucleotide polymorphisms (SNPs are analyzed in any given genome-wide association study (GWAS, performing multiple comparisons can be problematic. To cope with multiple-comparison problems in GWAS, haplotype-based algorithms were developed to correct for multiple comparisons at multiple SNP loci in linkage disequilibrium. A permutation test can also control problems inherent in multiple testing; however, both the calculation of exact probability and the execution of permutation tests are time-consuming. Faster methods for calculating exact probabilities and executing permutation tests are required. Methods We developed a set of computer programs for the parallel computation of accurate P-values in haplotype-based GWAS. Our program, ParaHaplo, is intended for workstation clusters using the Intel Message Passing Interface (MPI. We compared the performance of our algorithm to that of the regular permutation test on JPT and CHB of HapMap. Results ParaHaplo can detect smaller differences between 2 populations than SNP-based GWAS. We also found that parallel-computing techniques made ParaHaplo 100-fold faster than a non-parallel version of the program. Conclusion ParaHaplo is a useful tool in conducting haplotype-based GWAS. Since the data sizes of such projects continue to increase, the use of fast computations with parallel computing--such as that used in ParaHaplo--will become increasingly important. The executable binaries and program sources of ParaHaplo are available at the following address: http://sourceforge.jp/projects/parallelgwas/?_sl=1

  1. Concurrent Collections (CnC): A new approach to parallel programming

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    A common approach in designing parallel languages is to provide some high level handles to manipulate the use of the parallel platform. This exposes some aspects of the target platform, for example, shared vs. distributed memory. It may expose some but not all types of parallelism, for example, data parallelism but not task parallelism. This approach must find a balance between the desire to provide a simple view for the domain expert and provide sufficient power for tuning. This is hard for any given architecture and harder if the language is to apply to a range of architectures. Either simplicity or power is lost. Instead of viewing the language design problem as one of providing the programmer with high level handles, we view the problem as one of designing an interface. On one side of this interface is the programmer (domain expert) who knows the application but needs no knowledge of any aspects of the platform. On the other side of the interface is the performance expert (programmer o...

  2. Parallel Object Oriented MD Simulation Program for Long Time Simulations of Metallic Glasses and Undercooled Liquids

    Science.gov (United States)

    Böddeker, B.; Teichler, H.

    The MD simulation program TABB is motivated by the need of long time simulations for the investigation of slow processes near the glass transition of glass forming alloys. TABB is written in C++ with a high degree of flexibility: TABB allows the use of any short ranged pair potentials or EAM potentials, by generating and using a spline representation of all functions and their derivatives. TABB supports several numerical integration algorithms like the Runge-Kutta or the modified Gear predictor-corrector algorithm of order five. The boundary conditions can be chosen to resemble the geometry of bulk materials or films. The simulation box length or the pressure can be fixed for each dimension separately. TABB may be used in isokinetic, isoenergetic or canonical (with random forces) mode. TABB contains a simple instruction interpreter to easily control the parameters and options during the simulation. The same source code can be compiled either for workstations or for parallel computers. The main optimization goal of TABB is to allow long time simulations of medium or small sized systems. To make this possible, much attention is paid to optimizing the communication between the nodes. TABB uses a domain decomposition procedure. To use many nodes with a small system, the domain size has to be small compared to the range of particle interactions. In the limit of many nodes for only few atoms, the bottleneck of communication is the latency time. TABB minimizes the number of pairs of domains containing atoms that interact between these domains. This procedure minimizes the need of communication calls between pairs of nodes. TABB decides automatically how many, and which, directions the decomposition is applied to. E.g., in the case of one-dimensional domain decomposition, the simulation box is only split into "slabs" along a selected direction. The three dimensional domain decomposition is best with respect to the number of interacting domains only for simulations

  3. Scalable parallel programming for high performance seismic simulation on petascale heterogeneous supercomputers

    Science.gov (United States)

    Zhou, Jun

    The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures become more common, numerical developments in earthquake system research are particularly challenged by the dependence on accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are the two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP

  4. Automatic parallelization of nested loop programs for non-manifest real-time stream processing applications

    NARCIS (Netherlands)

    Bijlsma, T.

    2011-01-01

    This thesis is concerned with the automatic parallelization of real-time stream processing applications, such that they can be executed on embedded multiprocessor systems. Stream processing applications can be found in the video and channel decoding domain. These applications often have temporal requirements and can contain non-manifest conditions and expressions.

  5. AxML: a fast program for sequential and parallel phylogenetic tree calculations based on the maximum likelihood method.

    Science.gov (United States)

    Stamatakis, Alexandros P; Ludwig, Thomas; Meier, Harald; Wolf, Marty J

    2002-01-01

    Heuristics for the NP-complete problem of calculating the optimal phylogenetic tree for a set of aligned rRNA sequences based on the maximum likelihood method are computationally expensive. In most existing algorithms the tree evaluation and branch length optimization functions, which calculate the likelihood value for each tree topology examined in the search space, account for the greatest part of overall computation time. This paper introduces AxML, a program derived from fastDNAml, incorporating a fast topology evaluation function. The algorithmic optimizations introduced represent a general approach for accelerating this function and are applicable to both sequential and parallel phylogeny programs, irrespective of their search space strategy. Therefore, their integration into three existing phylogeny programs yielded encouraging results. Experimental results on conventional processor architectures show a global run time improvement of 35% up to 47% for the various test sets and program versions we used.

  6. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    Science.gov (United States)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks ≤ number of processors)...

  7. Temperature Control with Two Parallel Small Loop Heat Pipes for GLM Program

    Science.gov (United States)

    Khrustalev, Dmitry; Stouffer, Chuck; Ku, Jentung; Hamilton, Jon; Anderson, Mark

    2014-01-01

    The concept of temperature control of an electronic component using a single Loop Heat Pipe (LHP) is well established for aerospace applications. Using two LHPs is often desirable for redundancy/reliability reasons or for increasing the overall heat source-sink thermal conductance. This effort elaborates on the temperature-controlling operation of a thermal system that includes two small ammonia LHPs thermally coupled together at the evaporator end as well as at the condenser end and operating "in parallel". A transient model of the LHP system was developed on the Thermal Desktop (TM) platform to understand some fundamental details of such parallel operation of the two LHPs. Extensive thermal-vacuum testing was conducted with two thermally coupled LHPs operating simultaneously as well as with only one LHP operating at a time. This paper outlines the temperature control procedures for two LHPs operating simultaneously with widely varying sink temperatures. The test data obtained during the thermal-vacuum testing, with both LHPs running simultaneously in comparison with only one LHP operating at a time, are presented with detailed explanations.

  8. Neurite, a finite difference large scale parallel program for the simulation of electrical signal propagation in neurites under mechanical loading.

    Directory of Open Access Journals (Sweden)

    Julián A García-Grajales

    Full Text Available With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite--explicit and implicit--were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon
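
    The passive regions mentioned are governed by the classical cable equation; with \lambda the length constant and \tau the membrane time constant, the membrane potential V(x,t) along a neurite obeys:

        \lambda^2 \frac{\partial^2 V}{\partial x^2} = \tau \frac{\partial V}{\partial t} + V

    In Neurite, the coupled mechanical model links the applied strain to these electrophysiological parameters.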

  9. Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

    Science.gov (United States)

    Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

    1999-01-01

    The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability on various multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.

  10. Parallel Programming Application to Matrix Algebra in the Spectral Method for Control Systems Analysis, Synthesis and Identification

    Directory of Open Access Journals (Sweden)

    V. Yu. Kleshnin

    2016-01-01

    Full Text Available The article describes the matrix algebra libraries based on modern technologies of parallel programming for the Spectrum software, which can use a spectral method (in the spectral form of mathematical description) to analyse, synthesise and identify deterministic and stochastic dynamical systems. The developed matrix algebra libraries use the following technologies: OmniThreadLibrary, OpenMP, Intel Threading Building Blocks, and Intel Cilk Plus for CPUs; nVidia CUDA, OpenCL, and Microsoft Accelerated Massive Parallelism for GPUs. The developed libraries support matrices with real elements (single and double precision). The matrix dimensions are limited only by the 32-bit or 64-bit memory model and the computer configuration. These libraries are general-purpose and can be used not only for the Spectrum software; they can also find application in other projects where there is a need to perform operations with large matrices. The article provides a comparative analysis of the libraries developed for various matrix operations (addition, subtraction, scalar multiplication, multiplication, powers of matrices, tensor multiplication, transpose, inverse matrix, and the solution of systems of linear equations) through numerical experiments using different CPUs and GPUs. The article contains sample programs and performance test results for matrix multiplication, which requires the most computational resources of all the operations.
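
    The benchmark kernel, dense matrix multiplication, is the canonical example for these back-ends; a minimal CPU-side OpenMP sketch in C (loop order chosen for cache locality; not code from the libraries described):

        /* C = A*B for n x n row-major matrices. Compile with -fopenmp. */
        void matmul(int n, const double *A, const double *B, double *C)
        {
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++)
                    C[i * n + j] = 0.0;
                for (int k = 0; k < n; k++) {
                    double aik = A[i * n + k];
                    for (int j = 0; j < n; j++)
                        C[i * n + j] += aik * B[k * n + j];
                }
            }
        }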

  11. A parallel program for numerical simulation of discrete fracture network and groundwater flow

    Science.gov (United States)

    Huang, Ting-Wei; Liou, Tai-Sheng; Kalatehjari, Roohollah

    2017-04-01

    The ability of modeling fluid flow in Discrete Fracture Networks (DFN) is critical to various applications such as exploration of reserves in geothermal and petroleum reservoirs, geological sequestration of carbon dioxide and final disposal of spent nuclear fuels. Although several commercial or academic DFN flow simulators are already available (e.g., FracMan and DFNWORKS), challenges in terms of computational efficiency and three-dimensional visualization still remain, which therefore motivates this study for developing a new DFN and flow simulator. A new DFN and flow simulator, DFNbox, was written in C++ under a cross-platform software development framework provided by Qt. DFNbox integrates the following capabilities into a user-friendly drop-down menu interface: DFN simulation and clipping, 3D mesh generation, fracture data analysis, connectivity analysis, flow path analysis and steady-state groundwater flow simulation. All three-dimensional visualization graphics were developed using the free OpenGL API. Similar to other DFN simulators, fractures are conceptualized as a random point process in space, with stochastic characteristics represented by orientation, size, transmissivity and aperture. Fracture meshing was implemented by Delaunay triangulation for visualization but not flow simulation purposes. The boundary element method was used for flow simulations, such that only the unknown head or flux along exterior and intersection boundaries is needed for solving the flow field in the DFN. The parallel computation concept was taken into account in developing DFNbox for calculations where it is applicable. For example, the time-consuming sequential code for fracture clipping calculations has been completely replaced by a highly efficient parallel one. This can greatly enhance computational efficiency, especially on multi-thread platforms. Furthermore, DFNbox has been successfully tested in Windows and Linux systems with equally good performance.

  12. Contributions for the optimization of the extensibility of parallel programing of turbulent plasmas

    International Nuclear Information System (INIS)

    Rozar, F.

    2015-01-01

    The work realized through this thesis focuses on the optimization of the Gysela code, which simulates plasma turbulence. Optimization of a scientific application concerns mainly one of the three following points: 1) the simulation of larger meshes, 2) the reduction of computing time and 3) the enhancement of the computation accuracy. The first part of this manuscript presents the contributions relative to the simulation of larger meshes. As in many simulation codes, obtaining more realistic simulations often amounts to refining the meshes; the finer the mesh, the larger the memory consumption. Moreover, during the last few years, supercomputers have tended to provide less and less memory per compute core. For these reasons, we have developed a library, the libMTM (Modeling and Tracing Memory), dedicated to studying precisely the memory consumption of parallel software. The libMTM tools allowed us to reduce the memory consumption of Gysela and to study its scalability. As far as we know, there is no other tool which provides equivalent features allowing this kind of memory scalability study. The second part of the manuscript presents the work relative to the optimization of the computation time and the improvement of the accuracy of the gyro-average operator. This operator represents a cornerstone of the gyrokinetic model which is used by the Gysela application. The improvement of accuracy emanates from a change in the computing method: a scheme based on 2D Hermite interpolation replaces the Padé approximation. Although the new version of the gyro-average operator is more accurate, it is also more expensive in computation time than the former one. In order to keep the simulation time reasonable, different optimizations have been performed on the new computing method to make it competitive. Finally, we have developed an MPI-parallelized version of the new gyro-average operator. The good scalability of this new gyro-average operator will eventually allow a reduction

  13. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  14. 78 FR 40196 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

    Science.gov (United States)

    2013-07-03

    ... government agencies, and educational institutions have conducted suborbital rocket launches from the PFRR...-latitude, auroral-zone rocket launching facility in the United States where a sounding rocket can readily... environmental consequences of five alternative means for continuing sounding rocket launches at PFRR. The...

  15. 77 FR 61642 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

    Science.gov (United States)

    2012-10-10

    ..., and educational institutions have conducted suborbital rocket launches from the PFRR. While the PFRR...-zone rocket launching facility in the United States where a sounding rocket can readily study the... rockets are launched and within which spent stages and payloads impact the ground. Within these flight...

  16. 76 FR 20715 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

    Science.gov (United States)

    2011-04-13

    ... to prepare the EIS and to request input regarding the definition of reasonable alternatives and... EIS. The scoping meeting locations and dates identified at this time are provided under SUPPLEMENTARY... research in auroral space physics and earth science. The PFRR is the only high-latitude, auroral-zone...

  17. What Multilevel Parallel Programs do when you are not Watching: A Performance Analysis Case Study Comparing MPI/OpenMP, MLP, and Nested OpenMP

    Science.gov (United States)

    Jost, Gabriele; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    With the current trend in parallel computer architectures towards clusters of shared-memory symmetric multi-processors, parallel programming techniques have evolved to support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse-grained parallelization and OpenMP [9] for fine-grained loop-level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed-memory architectures but is also suitable for shared-memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse-grained process-level parallelization and loop-level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
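
    A minimal illustration of the hybrid MPI/OpenMP pattern described above (a generic example, not code from the study): MPI provides the coarse-grained, private-address-space process level, while OpenMP parallelizes the fine-grained local loop.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Coarse grain: each MPI process owns a private slice of the data.
    const int n_local = 1 << 20;
    std::vector<double> x(n_local, rank + 1.0);

    // Fine grain: OpenMP parallelizes the local loop (compile with -fopenmp).
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+ : local_sum)
    for (int i = 0; i < n_local; ++i)
        local_sum += x[i];

    // Explicit message passing combines the per-process results.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);
    if (rank == 0) std::printf("global sum = %f\n", global_sum);
    MPI_Finalize();
}
```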

  18. Speeding Up the String Comparison of the IDS Snort using Parallel Programming: A Systematic Literature Review on the Parallelized Aho-Corasick Algorithm

    Directory of Open Access Journals (Sweden)

    SILVA JUNIOR, J. B.

    2016-12-01

    Full Text Available The Intrusion Detection System (IDS) needs to compare the contents of all packets arriving at the network interface with a set of signatures indicating possible attacks, a task that consumes much CPU processing time. In order to alleviate this problem, some researchers have tried to parallelize the IDS's comparison engine, transferring execution from the CPU to the GPU. This paper identifies and maps the parallelization features of the Aho-Corasick algorithm, which is used in Snort to compare patterns, in order to show this algorithm's implementation and execution issues, as well as optimization techniques for the Aho-Corasick machine. We found 147 papers in major computer science publication databases and mapped them; we selected 22 and analyzed them in order to obtain our results. Our analysis showed, among other results, that parallelization of the AC algorithm is a recent task and that authors have focused on the State Transition Table as the most common way to implement the algorithm on the GPU. Furthermore, we found that techniques which speed up the algorithm and reduce the required storage space are heavily used, such as running the algorithm in the fastest memories and using mechanisms for reducing the number of nodes and bit mapping.
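
    The reason the State Transition Table dominates the GPU ports is visible in the matching loop itself: one table lookup per input byte, with no branching on pattern structure. A minimal CPU-side sketch of STT matching (table construction omitted; a hypothetical illustration, not Snort's implementation):

```cpp
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// Dense state transition table: next[state][byte] -> next state.
struct ACMachine {
    std::vector<std::array<int32_t, 256>> next;
    std::vector<uint8_t> is_match;   // 1 if the state accepts a pattern
};

// The hot loop is a pure table walk, which GPU ports replicate
// across thousands of packets in parallel.
int countMatches(const ACMachine& m, const std::string& packet) {
    int32_t state = 0;
    int hits = 0;
    for (unsigned char c : packet) {
        state = m.next[state][c];    // single transition per input byte
        hits += m.is_match[state];
    }
    return hits;
}
```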

  19. Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

    Science.gov (United States)

    Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

    2010-12-01

    Orbital-free density functional theory (OFDFT) is a first-principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summary: Program Title: PROFESS; Catalogue identifier: AEBN_v2_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 68 721; No. of bytes in distributed program, including test data, etc.: 1 708 547; Distribution format: tar.gz; Programming language: Fortran 90; Computer
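
    The variational principle summarized above can be stated compactly: the ground-state density minimizes the energy functional under a fixed electron count, which yields the Euler-Lagrange condition below (standard OFDFT formulation, not PROFESS-specific notation):

```latex
E_0 \;=\; \min_{\rho \ge 0,\ \int \rho\, d\mathbf{r} = N} E[\rho],
\qquad
\frac{\delta E[\rho]}{\delta \rho(\mathbf{r})} \;=\; \mu ,
```

    where the Lagrange multiplier μ is the chemical potential, and the kinetic part of E[ρ] is an explicit density functional rather than a sum over orbitals.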

  20. Getting To Exascale: Applying Novel Parallel Programming Models To Lab Applications For The Next Generation Of Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Dube, Evi [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shereda, Charles [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Nau, Lee [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Harris, Lance [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2010-09-27

    As supercomputing moves toward exascale, node architectures will change significantly. CPU core counts on nodes will increase by an order of magnitude or more. Heterogeneous architectures will become more commonplace, with GPUs or FPGAs providing additional computational power. Novel programming models may make better use of on-node parallelism in these new architectures than do current models. In this paper we examine several of these novel models – UPC, CUDA, and OpenCL – to determine their suitability to LLNL scientific application codes. Our study consisted of several phases: we conducted interviews with code teams and selected two codes to port; we learned how to program in the new models and ported the codes; we debugged and tuned the ported applications; and we measured results and documented our findings. We conclude that UPC is a challenge for porting code, Berkeley UPC is not very robust, and UPC is not suitable as a general alternative to OpenMP for a number of reasons. CUDA is well supported and robust but is a proprietary NVIDIA standard, while OpenCL is an open standard. Both are well suited to a specific set of application problems that can be run on GPUs, but some problems are not suited to GPUs. Further study of the landscape of novel models is recommended.

  1. Center for Programming Models for Scalable Parallel Computing - Towards Enhancing OpenMP for Manycore and Heterogeneous Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Barbara Chapman

    2012-02-01

    OpenMP was not well recognized at the beginning of the project, around 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years it has gradually been adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standards organization to make sure OpenMP evolves in a direction aligned with DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed-memory systems by exploring different programming models, language extensions, compiler optimizations, and runtime library support.

  2. Chasing losses in online poker and casino games: characteristics and game play of Internet gamblers at risk of disordered gambling.

    Science.gov (United States)

    Gainsbury, Sally M; Suhonen, Niko; Saastamoinen, Jani

    2014-07-30

    Disordered Internet gambling is a psychological disorder that represents an important public health issue due to the increase in highly available and conveniently accessible Internet gambling sites. Chasing losses is one of the few observable markers of at-risk and problem gambling that may be used to detect early signs of disordered Internet gambling. This study examined loss-chasing behaviour in a sample of Internet casino and poker players, and the socio-demographic variables, irrational beliefs, and gambling behaviours associated with chasing losses. An online survey was completed by 10,838 Internet gamblers (58% male) from 96 countries. The results showed that Internet casino players had a greater tendency to report chasing losses than poker players, and that gamblers who reported chasing losses were more likely to hold irrational beliefs about gambling and to spend more time and money gambling than those who reported that they were unaffected by previous losses. Gamblers who played for excitement and to win money were more likely to report chasing losses. This study is one of the largest studies of Internet gamblers to date, and the results are highly significant as they provide insight into the characteristics and behaviours of gamblers using this mode of access. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  3. Parallel Program Systems for the Analysis of Wave Processes in Elastic-Plastic, Granular, Porous and Multi-Blocky Media

    Science.gov (United States)

    Sadovskaya, Oxana; Sadovskii, Vladimir

    2017-04-01

    When modeling wave propagation processes in geomaterials (granular and porous media, soils and rocks), it is necessary to take into account the structural inhomogeneity of these materials. Parallel program systems for the numerical solution of 2D and 3D problems of the dynamics of deformable media with constitutive relationships of rather general form were worked out on the basis of a universal mathematical model describing small strains of elastic, elastic-plastic, granular and porous materials. In the case of an elastic material, the model reduces to a system of equations, hyperbolic in the sense of Friedrichs, written in terms of velocities and stresses in a symmetric form. In the case of an elastic-plastic material, the model is a special formulation of the Prandtl-Reuss theory in the form of a variational inequality with one-sided constraints on the stress tensor. Generalization of the model to describe granularity and the collapse of pores is obtained by means of the rheological approach, taking into account the different resistance of materials to tension and compression. Rotational motion of particles in the material microstructure is considered within the framework of a mathematical model of the Cosserat continuum. The computational domain may have a blocky structure, composed of an arbitrary number of layers, strips in a layer and blocks in a strip from different materials with self-consistent curvilinear interfaces. At the external boundaries of the computational domain, the main types of dissipative boundary conditions in terms of velocities, stresses or mixed boundary conditions can be given. A shock-capturing algorithm is proposed for implementation of the model on supercomputers with cluster architecture. It is based on the two-cyclic splitting method with respect to spatial variables and special procedures of stress correction to take into account the plasticity, granularity or porosity of a material. An explicit monotone ENO-scheme is applied for solving one

  4. Occurrence rate of ion upflow and downflow observed by the Poker Flat Incoherent Scatter Radar (PFISR)

    Science.gov (United States)

    Zou, S.; Lu, J.; Varney, R. H.

    2017-12-01

    This study aims to investigate the occurrence rates of ion upflow and downflow events in the auroral ionosphere, using a full 3-year (2011-2013) dataset collected by the Poker Flat Incoherent Scatter Radar (PFISR) at 65.5° magnetic latitude. Ion upflow and downflow events are defined by three consecutive data points above +100 m/s or below -100 m/s, respectively, in the ion field-aligned velocity altitude profile. Their occurrence rates have been evaluated as a function of magnetic local time (MLT), season, geomagnetic activity, solar wind and interplanetary magnetic field (IMF). We found that ion upflows are twice as likely to occur on the nightside as on the dayside, and have a slightly higher occurrence rate near the Fall equinox. In contrast, ion downflow events are more likely to occur in the afternoon sector, also peaking during the Fall equinox. In addition, the occurrence rate of ion upflows on the nightside increases as the auroral electrojet index (AE) and planetary K index (Kp) increase, while downflows measured on the dayside clearly increase as AE and Kp increase. In general, the occurrence rate of ion upflows increases with enhanced solar wind and IMF drivers. This correlation is particularly strong between the nightside upflows and the solar wind dynamic pressure and IMF Bz. The lack of correlation of dayside upflows with these parameters is due to the location of PFISR, which is usually equatorward of the dayside auroral zone but within the nightside auroral zone under disturbed conditions. The occurrence rate of downflow at all MLTs does not show a strong dependence on the solar wind and IMF conditions. However, downflow occurs much more frequently on the dayside when the IMF By is strongly positive, i.e., >10 nT, and the IMF Bz is strongly negative, i.e., < -10 nT. We suggest that the increased occurrence rate of downflows on the dayside is associated with dayside storm-enhanced density and the plume.
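
    The event definition above translates directly into code; a small sketch of the three-consecutive-points criterion (a hypothetical helper, not the authors' processing chain):

```cpp
#include <cstddef>
#include <vector>

// True if the ion field-aligned velocity profile (m/s, one value per
// altitude gate) has three consecutive points above +100 m/s (upflow).
// For downflow, pass the negated profile so the same test applies.
bool hasEvent(const std::vector<double>& profile, double threshold = 100.0) {
    std::size_t run = 0;
    for (double v : profile) {
        run = (v > threshold) ? run + 1 : 0;
        if (run >= 3) return true;
    }
    return false;
}
```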

  5. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  6. Deceit and facial expression in children: the enabling role of the poker face child and the dependent personality of the detector

    Directory of Open Access Journals (Sweden)

    Marien Gadea

    2015-07-01

    Full Text Available This study presents the relation between the facial expressions of a group of children when they told a lie and the accuracy with which a sample of adults detected the lie. To evaluate the intensity and type of emotional content of the children's faces, we applied an automated method capable of analysing the facial information from the video recordings (FaceReader 5.0 software). The program classified videos as showing a neutral facial expression or an emotional one. There was a significantly higher mean of hits for the emotional than for the neutral videos, and a significant negative correlation between the intensity of the neutral expression and the number of hits from the judges. Lies expressed with an emotional facial expression were more easily recognized by adults than lies expressed with a poker face; thus, the less expressive the child, the harder the lie was to detect. The accuracy of the lie detectors was then correlated with their subclinical personality disorder traits, and participants scoring higher on dependent personality were found to be significantly better lie detectors. A non-significant tendency for women to discriminate better was also found, whereas men tended to be more suspicious than women when judging the children's veracity. This study is the first to automatically decode the facial information of the lying child and relate the results to personality characteristics of the lie detectors in the context of deceptive behavior research. Implications for forensic psychology are suggested: to explore whether inducing an emotion in a child during an interview could be useful for evaluating testimony during legal trials.

  7. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    Full Text Available The object of this research is tooling to support the development of parallel programs in C/C++. Methods and software that automate the process of designing parallel applications are proposed.

  8. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Full Text Available Abstract Background The use of missing-genotype imputation and haplotype reconstruction is valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China, samples of the HapMap dataset. The parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than the non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.
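
    The abstract does not show ParaHaplo's internals; a generic sketch of the block partitioning that makes per-SNP imputation embarrassingly parallel across MPI ranks (imputeSNP is a hypothetical stand-in for the real work, not the ParaHaplo API):

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical per-SNP kernel; each SNP can be imputed independently.
double imputeSNP(long snp) { return 0.0; /* placeholder */ }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long n_snps = 1000000;   // millions of SNPs in a typical GWAS
    // Static block partition: rank r handles SNPs in [lo, hi).
    const long lo = n_snps * rank / size;
    const long hi = n_snps * (rank + 1) / size;
    std::vector<double> local(hi - lo);
    for (long s = lo; s < hi; ++s)
        local[s - lo] = imputeSNP(s);  // independent work per SNP

    MPI_Finalize();                    // gathering of results omitted
}
```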

  9. ParaHaplo 2.0: a program package for haplotype-estimation and haplotype-based whole-genome association study using parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2010-06-01

    Full Text Available Abstract Background The use of haplotype-based association tests can improve the power of genome-wide association studies. Since the observed genotypes are unordered pairs of alleles, haplotype phase must be inferred. However, estimating haplotype phase is time consuming. When millions of single-nucleotide polymorphisms (SNPs) are analyzed in a genome-wide association study, faster methods for haplotype estimation are required. Methods We developed a program package for parallel computation of haplotype estimation. Our program package, ParaHaplo 2.0, is intended for use in workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on both the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China, samples of the HapMap dataset. Results The parallel version of ParaHaplo 2.0 can estimate haplotypes 100 times faster than the non-parallel version of ParaHaplo. Conclusion ParaHaplo 2.0 is an invaluable tool for conducting haplotype-based genome-wide association studies (GWAS). The need for fast haplotype estimation using parallel computing will become increasingly important as the data sizes of such projects continue to increase. The executable binaries and program sources of ParaHaplo are available at the following address: http://en.sourceforge.jp/projects/parallelgwas/releases/

  10. Integrating Task and Data Parallelism

    OpenAIRE

    Massingill, Berna

    1993-01-01

    Many models of concurrency and concurrent programming have been proposed; most can be categorized as either task-parallel (based on functional decomposition) or data-parallel (based on data decomposition). Task-parallel models are most effective for expressing irregular computations; data-parallel models are most effective for expressing regular computations. Some computations, however, exhibit both regular and irregular aspects. For such computations, a better programming model is one that i...
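
    To make the task/data distinction concrete, a small self-contained C++ sketch (illustrative only, not from the paper): data parallelism applies one operation across pieces of a single data set, while task parallelism runs independent computations concurrently.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Data parallelism: the same reduction applied to disjoint halves.
double parallelSum(const std::vector<double>& v) {
    auto mid = v.begin() + v.size() / 2;
    auto low = std::async(std::launch::async,
                          [&] { return std::accumulate(v.begin(), mid, 0.0); });
    double high = std::accumulate(mid, v.end(), 0.0);
    return low.get() + high;
}

// Task parallelism: two independent computations run at the same time.
int main() {
    std::vector<double> a(1 << 20, 1.0), b(1 << 20, 2.0);
    auto sumA = std::async(std::launch::async, [&] { return parallelSum(a); });
    double sumB = parallelSum(b);
    return (sumA.get() + sumB) > 0.0 ? 0 : 1;
}
```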

  11. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...
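
    The C++17 standard already carries one piece of the integration the talk calls for: the parallel algorithms express "run this in parallel" through an execution policy rather than pragmas or message passing. A generic illustration (not material from the talk):

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<double> x(1 << 22);
    std::iota(x.begin(), x.end(), 0.0);

    // The execution policy tells the standard library it may both
    // parallelize and vectorize this reduction.
    double sum_sq = std::transform_reduce(std::execution::par_unseq,
                                          x.begin(), x.end(), 0.0,
                                          std::plus<>{},
                                          [](double v) { return v * v; });
    return sum_sq > 0.0 ? 0 : 1;
}
```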

  12. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  13. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  14. Cue-Reactive Altered State of Consciousness Mediates the Relationship Between Problem-Gambling Severity and Cue-Reactive Urge in Poker-Machine Gamblers.

    Science.gov (United States)

    Tricker, Christopher; Rock, Adam J; Clark, Gavin I

    2016-06-01

    In order to enhance our understanding of the nature of poker-machine problem-gambling, a community sample of 37 poker-machine gamblers (M age = 32 years, M PGSI = 5; PGSI = Problem Gambling Severity Index) was assessed for urge to gamble (responses on a visual analogue scale) and altered state of consciousness (assessed by the Altered State of Awareness dimension of the Phenomenology of Consciousness Inventory) at baseline, after a neutral cue, and after a gambling cue. It was found that (a) problem-gambling severity (PGSI score) predicted increase in urge (from neutral cue to gambling cue, controlling for baseline; sr² = .19, p = .006) and increase in altered state of consciousness (from neutral cue to gambling cue, controlling for baseline; sr² = .57), and (b) cue-reactive altered state of consciousness (gambling cue) mediated the relationship between problem-gambling severity and increase in urge (from neutral cue to gambling cue; κ² = .40, 99% CI [.08, .71]). These findings suggest that cue-reactive altered state of consciousness is an important component of cue-reactive urge in poker-machine problem-gamblers.

  15. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high-performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  16. Parallel Simulation of Loosely Timed SystemC/TLM Programs: Challenges Raised by an Industrial Case Study

    Directory of Open Access Journals (Sweden)

    Denis Becker

    2016-05-01

    Full Text Available Transaction level models of systems-on-chip in SystemC are commonly used in the industry to provide an early simulation environment. The SystemC standard imposes coroutine semantics for the scheduling of simulated processes, to ensure determinism and reproducibility of simulations. However, because of this, sequential implementations have, for a long time, been the only option available, and still now the reference implementation is sequential. With the increasing size and complexity of models, and the multiplication of computation cores on recent machines, the parallelization of SystemC simulations is a major research concern. There have been several proposals for SystemC parallelization, but most of them are limited to cycle-accurate models. In this paper we focus on loosely timed models, which are commonly used in the industry. We present an industrial context and show that, unfortunately, most of the existing approaches for SystemC parallelization can fundamentally not apply in this context. We support this claim with a set of measurements performed on a platform used in production at STMicroelectronics. This paper surveys existing techniques, presents a visualization and profiling tool and identifies unsolved challenges in the parallelization of SystemC models at transaction level.

  17. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  18. Imaging of Polar Mesosphere Summer Echoes with the 450 MHz Poker Flat Advanced Modular Incoherent Scatter Radar

    Science.gov (United States)

    Nicolls, M. J.; Heinselman, C. J.; Hope, E. A.; Ranjan, S.; Kelley, M. C.; Kelly, J. D.

    2007-10-01

    Polar Mesosphere Summer Echoes (PMSE) occur near the mesopause during the polar summer months. PMSE are primarily studied at VHF; however, there have been some detections at higher frequencies. Here, we report on some of the first detections of PMSE with the 450 MHz (67 cm) Poker Flat Advanced Modular Incoherent Scatter Radar (PFISR). Echoes were observed with volume reflectivities (radar scattering cross section per unit volume) near 2-3 × 10^-17 m^-1. On 11 June 2007, PFISR was operating in a 26-beam position mode, with look directions spread over an approximately 80 × 80 km² region at 85 km altitude and elevation angles as low as ~50°. The measurements showed patchy (tens of kilometers) irregularity regions drifting in from the north, in addition to smaller, more localized structures. There was no evidence for strong aspect sensitivity of these UHF echoes, as PMSE was observed in all look directions with relatively uniform intensity. The observations indicate the presence of fossilized irregularities drifting with the background wind field as well as areas of developing irregularities possibly associated with the presence of active neutral air turbulence.

  19. Parallel Pascal - An extended Pascal for parallel computers

    Science.gov (United States)

    Reeves, A. P.

    1984-01-01

    Parallel Pascal is an extended version of the conventional serial Pascal programming language which includes a convenient syntax for specifying array operations. It is upward compatible with standard Pascal and involves only a small number of carefully chosen new features. Parallel Pascal was developed to reduce the semantic gap between standard Pascal and a large range of highly parallel computers. Two important design goals of Parallel Pascal were efficiency and portability. Portability is particularly difficult to achieve since different parallel computers frequently have very different capabilities.

  20. Contribution to the algorithmic and efficient programming of new parallel architectures including accelerators for neutron physics and shielding computations

    International Nuclear Information System (INIS)

    Dubois, J.

    2011-01-01

    In science, simulation is a key process for research and validation. Modern computer technology allows faster numerical experiments, which are cheaper than real models. In the field of neutron simulation, the calculation of eigenvalues is one of the key challenges. The complexity of these problems is such that a lot of computing power may be necessary. The work of this thesis is first the evaluation of new computing hardware, such as graphics cards or massively multi-core chips, and its application to eigenvalue problems for neutron simulation. Then, in order to address the massive parallelism of national supercomputers, we also study the use of asynchronous hybrid methods for solving eigenvalue problems at this very high level of parallelism. We then test the results of this research on several national supercomputers, such as the Titane hybrid machine of the Computing Centre for Research and Technology (CCRT), the Curie machine of the Very Large Computing Centre (TGCC), currently being installed, and the Hopper machine at the Lawrence Berkeley National Laboratory (LBNL). We also run our experiments on local workstations to illustrate the value of this research for everyday use with local computing resources. (author) [fr
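
    The dominant-eigenvalue computation at the core of such neutron criticality problems is classically handled by power iteration, whose matrix-vector product parallelizes naturally; a minimal shared-memory sketch of the generic method (not the thesis code, which targets GPUs and hybrid asynchronous solvers):

```cpp
#include <cmath>
#include <vector>

// Power iteration: repeated parallel mat-vec plus normalization
// converges to the dominant eigenvalue of a diagonalizable matrix A.
// Compile with -fopenmp to activate the parallel loop.
double powerIteration(const std::vector<std::vector<double>>& A,
                      int iters = 200) {
    const int n = static_cast<int>(A.size());
    std::vector<double> x(n, 1.0), y(n);
    double lambda = 0.0;
    for (int it = 0; it < iters; ++it) {
        #pragma omp parallel for
        for (int i = 0; i < n; ++i) {
            double s = 0.0;
            for (int j = 0; j < n; ++j) s += A[i][j] * x[j];
            y[i] = s;
        }
        double norm = 0.0;
        for (int i = 0; i < n; ++i) norm += y[i] * y[i];
        norm = std::sqrt(norm);
        lambda = norm;                       // ||A x|| with ||x|| = 1
        for (int i = 0; i < n; ++i) x[i] = y[i] / norm;
    }
    return lambda;
}
```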

  1. Design and Programming for Cable-Driven Parallel Robots in the German Pavilion at the EXPO 2015

    Directory of Open Access Journals (Sweden)

    Philipp Tempel

    2015-08-01

    Full Text Available In the German Pavilion at the EXPO 2015, two large cable-driven parallel robots are flying over the heads of the visitors representing two bees flying over Germany and displaying everyday life in Germany. Each robot consists of a mobile platform and eight cables suspended by winches and follows a desired trajectory, which needs to be computed in advance taking technical limitations, safety considerations and visual aspects into account. In this paper, a path planning software is presented, which includes the design process from developing a robot design and workspace estimation via planning complex trajectories considering technical limitations through to exporting a complete show. For a test trajectory, simulation results are given, which display the relevant trajectories and cable force distributions.

  2. Parallel computing: numerics, applications, and trends

    National Research Council Canada - National Science Library

    Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter

    2009-01-01

    ... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...

  3. Final Report, Center for Programming Models for Scalable Parallel Computing: Co-Array Fortran, Grant Number DE-FC02-01ER25505

    Energy Technology Data Exchange (ETDEWEB)

    Robert W. Numrich

    2008-04-22

    The major accomplishment of this project is the production of CafLib, an 'object-oriented' parallel numerical library written in Co-Array Fortran. CafLib contains distributed objects such as block vectors and block matrices along with procedures, attached to each object, that perform basic linear algebra operations such as matrix multiplication, matrix transpose and LU decomposition. It also contains constructors and destructors for each object that hide the details of data decomposition from the programmer, and it contains collective operations that allow the programmer to calculate global reductions, such as global sums, global minima and global maxima, as well as vector and matrix norms of several kinds. CafLib is designed to be extensible in such a way that programmers can define distributed grid and field objects, based on vector and matrix objects from the library, for finite difference algorithms to solve partial differential equations. A very important extra benefit that resulted from the project is the inclusion of the co-array programming model in the next Fortran standard, called Fortran 2008. It is the first parallel programming model ever included as a standard part of the language. Co-arrays will be a supported feature in all Fortran compilers, and the portability provided by standardization will encourage a large number of programmers to adopt it for new parallel application development. The combination of object-oriented programming in Fortran 2003 with co-arrays in Fortran 2008 provides a very powerful programming model for high-performance scientific computing. Additional benefits from the project, beyond the original goal, include a program to provide access to the co-array model through the Cray compiler as a resource for teaching and research. Several academics, for the first time, included the co-array model as a topic in their courses on parallel computing. A separate collaborative project with LANL and PNNL showed how to

  4. Parallel software applications in high-energy physics

    CERN Document Server

    Biskup, Marek

    2008-01-01

    Parallel programming allows the speed of computations to be increased by using multiple processors or computers working jointly on the same task. In parallel programming, difficulties that are not present in sequential programming can be encountered, for instance communication between processors. The way of writing a parallel program depends strictly on the architecture of a parallel system. An efficient program of this kind not only performs its computations faster than its sequential version, but also effectively uses the CPU time. Parallel programming has been present in high-energy physics for years. The lecture is an introduction to parallel computing in general. It discusses the motivation for parallel computations, hardware architectures of parallel systems and the key concepts of parallel programming. It also relates parallel computing to high-energy physics and presents a parallel programming application in the field, namely PROOF.

  5. Structure_threader: An improved method for automation and parallelization of programs structure, fastStructure and MavericK on multicore CPU systems.

    Science.gov (United States)

    Pina-Martins, Francisco; Silva, Diogo N; Fino, Joana; Paulo, Octávio S

    2017-11-01

    Structure_threader is a program to parallelize multiple runs of genetic clustering software that does not make use of multithreading technology (structure, fastStructure and MavericK) on multicore computers. Our approach was benchmarked across multiple systems and displayed great speed improvements relative to the single-threaded implementation, scaling very close to linearly with the number of physical cores used. Structure_threader was compared to previous software written for the same task, ParallelStructure and StrAuto, and proved to be the faster wrapper (up to 25% faster) under all tested scenarios. Furthermore, Structure_threader can perform several automatic and convenient operations, assisting the user in assessing the most biologically likely value of 'K' via implementations of tests such as "Evanno" or "Thermodynamic Integration", and can automatically draw the "meanQ" plots (static or interactive) for each value of K (or even combined plots). Structure_threader is written in Python 3 and licensed under the GPLv3. It can be downloaded free of charge at https://github.com/StuntsPT/Structure_threader. © 2017 John Wiley & Sons Ltd.

  6. An Introduction to Parallel Computation R

    Indian Academy of Sciences (India)

    found in the Suggested Reading given at the end. Basic Programming Model. A parallel computer can be programmed by providing a program for each processor in it. In most common parallel computer organizations, a processor can only access its local memory. The program provided to each processor may perform ...

  7. Comparisons of Energy Management Methods for a Parallel Plug-In Hybrid Electric Vehicle between the Convex Optimization and Dynamic Programming

    Directory of Open Access Journals (Sweden)

    Renxin Xiao

    2018-01-01

    Full Text Available This paper proposes a comparison study of energy management methods for a parallel plug-in hybrid electric vehicle (PHEV). Based on a detailed analysis of the vehicle driveline, quadratic convex functions are presented to describe the nonlinear relationship between engine fuel-rate and battery charging power at different vehicle speeds and driveline power demands. The engine-on power threshold is estimated by the simulated annealing (SA) algorithm, and the battery power command is obtained by convex optimization with the target of improving fuel economy, compared with the dynamic programming (DP) based method and the charge depleting-charge sustaining (CD/CS) method. In addition, the proposed control methods are discussed at different initial battery state of charge (SOC) values to extend the application. Simulation results validate that the proposed strategy based on convex optimization reduces fuel consumption and noticeably lowers the computational burden.
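
    For contrast with the convex approach, a toy backward dynamic-programming sketch over a discretized SOC grid, in the spirit of the DP benchmark (the cost model below is a hypothetical convex stand-in, not the paper's driveline functions):

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical convex fuel model: the engine covers whatever demand
// the battery does not supply.
double fuelRate(double demand_kw, double batt_kw) {
    double engine_kw = std::max(0.0, demand_kw - batt_kw);
    return 0.02 * engine_kw * engine_kw + 0.1 * engine_kw;
}

int main() {
    const int T = 100, S = 101;              // time steps, SOC grid points
    std::vector<double> demand(T, 20.0);     // kW, toy constant drive cycle
    std::vector<double> cost(S, 0.0), next(S);
    // Backward recursion: cost[s] = optimal fuel-to-go from SOC level s.
    for (int t = T - 1; t >= 0; --t) {
        next = cost;
        for (int s = 0; s < S; ++s) {
            double best = std::numeric_limits<double>::infinity();
            for (int ds = -2; ds <= 2; ++ds) {   // action: SOC change per step
                int s2 = s + ds;
                if (s2 < 0 || s2 >= S) continue;
                double batt_kw = -ds * 5.0;      // discharging lowers SOC
                best = std::min(best, fuelRate(demand[t], batt_kw) + next[s2]);
            }
            cost[s] = best;
        }
    }
    return cost[S / 2] >= 0.0 ? 0 : 1;
}
```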

  8. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product-unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  9. Jordan's 2002 to 2012 Fertility Stall and Parallel USAID Investments in Family Planning: Lessons From an Assessment to Guide Future Programming.

    Science.gov (United States)

    Spindler, Esther; Bitar, Nisreen; Solo, Julie; Menstell, Elizabeth; Shattuck, Dominick

    2017-12-28

    Health practitioners, researchers, and donors are stumped about Jordan's stalled fertility rate, which has stagnated between 3.7 and 3.5 children per woman from 2002 to 2012, above the national replacement level of 2.1. This stall paralleled United States Agency for International Development (USAID) funding investments in family planning in Jordan, triggering an assessment of USAID family planning programming in Jordan. This article describes the methods, results, and implications of the programmatic assessment. Methods included an extensive desk review of USAID programs in Jordan and 69 interviews with reproductive health stakeholders. We explored reasons for fertility stagnation in Jordan's total fertility rate (TFR) and assessed the effects of USAID programming on family planning outcomes over the same time period. The assessment results suggest that the increased use of less effective methods, in particular withdrawal and condoms, are contributing to Jordan's TFR stall. Jordan's limited method mix, combined with strong sociocultural determinants around reproduction and fertility desires, have contributed to low contraceptive effectiveness in Jordan. Over the same time period, USAID contributions toward increasing family planning access and use, largely focused on service delivery programs, were extensive. Examples of effective initiatives, among others, include task shifting of IUD insertion services to midwives due to a shortage of female physicians. However, key challenges to improved use of family planning services include limited government investments in family planning programs, influential service provider behaviors and biases that limit informed counseling and choice, pervasive strong social norms of family size and fertility, and limited availability of different contraceptive methods. In contexts where sociocultural norms and a limited method mix are the dominant barriers toward improved family planning use, increased national government investments

  10. Effectiveness of Inclusion of Dry Needling in a Multimodal Therapy Program for Patellofemoral Pain: A Randomized Parallel-Group Trial.

    Science.gov (United States)

    Espí-López, Gemma V; Serra-Añó, Pilar; Vicent-Ferrando, Juan; Sánchez-Moreno-Giner, Miguel; Arias-Buría, Jose L; Cleland, Joshua; Fernández-de-Las-Peñas, César

    2017-06-01

    Study Design Randomized controlled trial. Background Evidence suggests that multimodal interventions that include exercise therapy may be effective for patellofemoral pain (PFP); however, no study has investigated the effects of trigger point (TrP) dry needling (DN) in people with PFP. Objectives To compare the effects of adding TrP DN to a manual therapy and exercise program on pain, function, and disability in individuals with PFP. Methods Individuals with PFP (n = 60) recruited from a public hospital in Valencia, Spain were randomly allocated to manual therapy and exercises (n = 30) or manual therapy and exercise plus TrP DN (n = 30). Both groups received the same manual therapy and strengthening exercise program for 3 sessions (once a week for 3 weeks), and 1 group also received TrP DN to active TrPs within the vastus medialis and vastus lateralis muscles. The pain subscale of the Knee injury and Osteoarthritis Outcome Score (KOOS; 0-100 scale) was used as the primary outcome. Secondary outcomes included other subscales of the KOOS, the Knee Society Score, the International Knee Documentation Committee Subjective Knee Evaluation Form (IKDC), and the numeric pain-rating scale. Patients were assessed at baseline and at 15-day (posttreatment) and 3-month follow-ups. Analysis was conducted with mixed analyses of covariance, adjusted for baseline scores. Results At 3 months, 58 subjects (97%) completed the follow-up. No significant between-group differences (all, P>.391) were observed for any outcome: KOOS pain subscale mean difference, -2.1 (95% confidence interval [CI]: -4.6, 0.4); IKDC mean difference, 2.3 (95% CI: -0.1, 4.7); knee pain intensity mean difference, 0.3 (95% CI: -0.2, 0.8). Both groups experienced similar moderate-to-large within-group improvements in all outcomes (standardized mean differences of 0.6 to 1.1); however, only the KOOS function in sport and recreation subscale surpassed the prespecified minimum important change. Conclusion The current

  11. The parallel adult education system

    DEFF Research Database (Denmark)

    Wahlgren, Bjarne

    2015-01-01

    for competence development. The Danish university educational system includes two parallel programs: a traditional academic track (candidatus) and an alternative practice-based track (master). The practice-based program was established in 2001 and organized as part time. The total program takes half the time...

  12. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    Full Text Available James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship.   ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  13. Parallel Algorithms

    National Research Council Canada - National Science Library

    Sahni, Sartaj

    1999-01-01

    During the course of the contract we developed a model, the master-slave system, to accurately model the scheduling problem that arises when a program running on a host processor initiates many tasks...

  14. Internal combustion engine control for series hybrid electric vehicles by parallel and distributed genetic programming/multiobjective genetic algorithms

    Science.gov (United States)

    Gladwin, D.; Stewart, P.; Stewart, J.

    2011-02-01

    This article addresses the problem of maintaining a stable rectified DC output from the three-phase AC generator in a series-hybrid vehicle powertrain. The series-hybrid prime power source generally comprises an internal combustion (IC) engine driving a three-phase permanent magnet generator whose output is rectified to DC. A recent development has been to control the engine/generator combination by an electronically actuated throttle. This system can be represented as a nonlinear system with significant time delay. Previously, voltage control of the generator output has been achieved by model-predictive methods such as the Smith Predictor. These methods rely on the incorporation of an accurate system model and time delay into the control algorithm, with a consequent increase in computational complexity in the real-time controller, and necessarily rely to some extent on the accuracy of the models. Two complementary performance objectives exist for the control system: firstly, to maintain the IC engine at its optimal operating point, and secondly, to supply a stable DC supply to the traction drive inverters. Achievement of these goals minimises the transient energy storage requirements at the DC link, with a consequent reduction in both weight and cost. These objectives imply constant-velocity operation of the IC engine under external load disturbances and changes in both operating conditions and vehicle speed set-points. In order to achieve these objectives, and to reduce the complexity of implementation, in this article a controller is designed by the use of Genetic Programming methods in the Simulink modelling environment, with the aim of obtaining a relatively simple controller for the time-delay system which does not rely on the implementation of real-time system models or time-delay approximations in the controller. A methodology is presented to utilise the myriad of existing control blocks in the Simulink libraries to automatically evolve optimal control

  15. Both the caspase CSP-1 and a caspase-independent pathway promote programmed cell death in parallel to the canonical pathway for apoptosis in Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Daniel P Denning

    Full Text Available Caspases are cysteine proteases that can drive apoptosis in metazoans and have critical functions in the elimination of cells during development, the maintenance of tissue homeostasis, and responses to cellular damage. Although a growing body of research suggests that programmed cell death can occur in the absence of caspases, mammalian studies of caspase-independent apoptosis are confounded by the existence of at least seven caspase homologs that can function redundantly to promote cell death. Caspase-independent programmed cell death is also thought to occur in the invertebrate nematode Caenorhabditis elegans. The C. elegans genome contains four caspase genes (ced-3, csp-1, csp-2, and csp-3, of which only ced-3 has been demonstrated to promote apoptosis. Here, we show that CSP-1 is a pro-apoptotic caspase that promotes programmed cell death in a subset of cells fated to die during C. elegans embryogenesis. csp-1 is expressed robustly in late pachytene nuclei of the germline and is required maternally for its role in embryonic programmed cell deaths. Unlike CED-3, CSP-1 is not regulated by the APAF-1 homolog CED-4 or the BCL-2 homolog CED-9, revealing that csp-1 functions independently of the canonical genetic pathway for apoptosis. Previously we demonstrated that embryos lacking all four caspases can eliminate cells through an extrusion mechanism and that these cells are apoptotic. Extruded cells differ from cells that normally undergo programmed cell death not only by being extruded but also by not being engulfed by neighboring cells. In this study, we identify in csp-3; csp-1; csp-2 ced-3 quadruple mutants apoptotic cell corpses that fully resemble wild-type cell corpses: these caspase-deficient cell corpses are morphologically apoptotic, are not extruded, and are internalized by engulfing cells. We conclude that both caspase-dependent and caspase-independent pathways promote apoptotic programmed cell death and the phagocytosis of cell

  16. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
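
    The implicit multi-threading mentioned above is exposed through a single switch plus the declarative data-frame interface; a minimal sketch assuming a recent ROOT release (RDataFrame is the current name of the interface introduced as TDataFrame; the tree and file names below are placeholders):

```cpp
#include <TROOT.h>
#include <ROOT/RDataFrame.hxx>

int main() {
    // One call enables ROOT's implicit multi-threading: subsequent
    // RDataFrame event loops are split transparently over a thread pool.
    ROOT::EnableImplicitMT();

    // Declarative analysis; the framework decides how to parallelize.
    ROOT::RDataFrame df("Events", "data.root");    // placeholder tree/file
    auto h = df.Filter("pt > 20.0").Histo1D("pt"); // lazy, parallel booking
    h->GetEntries();   // triggers the (multi-threaded) event loop
    return 0;
}
```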

  17. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  18. Enhancing data parallel applications with task parallelism

    OpenAIRE

    Fernández, Jacqueline; Guerrero, Roberto A.; Piccoli, María Fabiana; Printista, Alicia Marcela; Villalobos, M.

    2001-01-01

    Most parallel applications contain data parallelism, and almost all discussion of its solutions has been limited to the simplest and least expressive form: flat data parallelism. Several generalizations of the flat data parallel model have been proposed because a large number of those applications need a combination of task and data parallelism to represent their natural computation structure and to achieve good performance in their results. Their aim is to allow the capability of combining the easi...

  19. Exploiting Symmetry on Parallel Architectures.

    Science.gov (United States)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs and discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  20. EDDY RESOLVING NUTRIENT ECODYNAMICS IN THE GLOBAL PARALLEL OCEAN PROGRAM AND CONNECTIONS WITH TRACE GASES IN THE SULFUR, HALOGEN AND NMHC CYCLES

    Energy Technology Data Exchange (ETDEWEB)

    S. CHU; S. ELLIOTT

    2000-08-01

    Ecodynamics and the sea-air transfer of climate relevant trace gases are intimately coupled in the oceanic mixed layer. Ventilation of species such as dimethyl sulfide and methyl bromide constitutes a key linkage within the earth system. We are creating a research tool for the study of marine trace gas distributions by implementing coupled ecology-gas chemistry in the high resolution Parallel Ocean Program (POP). The fundamental circulation model is eddy resolving, with cell sizes averaging 0.15 degree (lat/long). Here we describe ecochemistry integration. Density dependent mortality and iron geochemistry have enhanced agreement with chlorophyll measurements. Indications are that dimethyl sulfide production rates must be adjusted for latitude dependence to match recent compilations. This may reflect the need for phytoplankton to conserve nitrogen by favoring sulfurous osmolytes. Global simulations are also available for carbonyl sulfide, the methyl halides and for nonmethane hydrocarbons. We discuss future applications including interaction with atmospheric chemistry models, high resolution biogeochemical snapshots and the study of open ocean fertilization.

  1. Parallelizing Data-Centric Programs

    Science.gov (United States)

    2013-09-25

    [Abstract garbled in extraction; recoverable fragments follow.] ...datasets that represent graphs. These graphs may describe the Web, public social networks or other networks of strategic importance to organizations... time-to-solution is dramatically affected by the latency of the worst link. Goal: service response time for web services and portals...

  2. Streaming for Functional Data-Parallel Languages

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner

    In this thesis, we investigate streaming as a general solution to the space inefficiency commonly found in functional data-parallel programming languages. The data-parallel paradigm maps well to parallel SIMD-style hardware. However, the traditional fully materializing execution strategy......, and the limited memory in these architectures, severely constrains the data sets that can be processed. Moreover, the language-integrated cost semantics for nested data parallelism pioneered by NESL depends on a parallelism-flattening execution strategy that only exacerbates the problem. This is because...... by extending two existing data-parallel languages: NESL and Accelerate. In the extensions we map bulk operations to data-parallel streams that can evaluate fully sequentially, fully in parallel, or anything in between. By a dataflow, piecewise parallel execution strategy, the runtime system can adjust to any target...

  3. Pattern-Driven Automatic Parallelization

    Directory of Open Access Journals (Sweden)

    Christoph W. Kessler

    1996-01-01

    Full Text Available This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.

  4. Is Monte Carlo embarrassingly parallel?

    International Nuclear Information System (INIS)

    Hoogenboom, J. E.

    2012-01-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
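
    The rendez-vous bottleneck described above can be made concrete with a short MPI sketch: at the end of every cycle, each rank must contribute its local tally before any rank can start the next cycle. The variable names and the toy tally are illustrative only, not taken from the paper's program.

      // End-of-cycle rendez-vous: MPI_Allreduce blocks until every rank has
      // contributed, which is exactly the synchronization cost analyzed above.
      #include <mpi.h>
      #include <cstdio>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          for (int cycle = 0; cycle < 100; ++cycle) {
              double local_weight = 1.0 + 0.01 * rank;  // stand-in for tracked fission weight
              double total = 0.0;
              MPI_Allreduce(&local_weight, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
              double k_eff = total / size;              // toy estimate feeding population control
              if (rank == 0 && cycle % 10 == 0)
                  std::printf("cycle %d  k_eff %.4f\n", cycle, k_eff);
          }
          MPI_Finalize();
          return 0;
      }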

  5. Parallelization of the molecular dynamics code GROMOS87 for distributed memory parallel architectures

    NARCIS (Netherlands)

    Green, DG; Meacham, KE; vanHoesel, F; Hertzberger, B; Serazzi, G

    1995-01-01

    This paper describes the techniques and methodologies employed during parallelization of the Molecular Dynamics (MD) code GROMOS87, with the specific requirement that the program run efficiently on a range of distributed-memory parallel platforms. We discuss the preliminary results of our parallel implementation.

  6. Implementation and performance of parallelized elegant

    International Nuclear Information System (INIS)

    Wang, Y.; Borland, M.

    2008-01-01

    The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.

  7. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors

  8. Algorithms for the Construction of Parallel Tests by Zero-One Programming. Project Psychometric Aspects of Item Banking No. 7. Research Report 86-7.

    Science.gov (United States)

    Boekkooi-Timminga, Ellen

    Nine methods for automated test construction are described. All are based on the concepts of information from item response theory. Two general kinds of methods for the construction of parallel tests are presented: (1) sequential test design; and (2) simultaneous test design. Sequential design implies that the tests are constructed one after the…

  9. Massively Parallel QCD

    Energy Technology Data Exchange (ETDEWEB)

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-04-11

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.

  10. On-the-fly pipeline parallelism

    OpenAIRE

    Lee, I-Ting Angelina; Leiserson, Charles E.; Sukha, Jim; Zhang, Zhunping; Schardl, Tao Benjamin

    2013-01-01

    Pipeline parallelism organizes a parallel program as a linear sequence of s stages. Each stage processes elements of a data stream, passing each processed data element to the next stage, and then taking on a new element before the subsequent stages have necessarily completed their processing. Pipeline parallelism is used especially in streaming applications that perform video, audio, and digital signal processing. Three out of 13 benchmarks in PARSEC, a popular software benchmark suite design...
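
    The structure described above -- each stage takes on a new element before downstream stages have finished the previous ones -- can be sketched with two threads and a shared queue. The stages below are toy placeholders, not taken from the PARSEC benchmarks.

      // Minimal two-stage pipeline: stage 1 keeps producing while stage 2
      // is still consuming earlier elements.
      #include <condition_variable>
      #include <iostream>
      #include <mutex>
      #include <queue>
      #include <thread>

      std::queue<int> q;
      std::mutex m;
      std::condition_variable cv;
      bool done = false;

      void stage1() {                          // e.g., decode a frame
          for (int i = 0; i < 8; ++i) {
              std::lock_guard<std::mutex> lk(m);
              q.push(i * i);
              cv.notify_one();
          }
          { std::lock_guard<std::mutex> lk(m); done = true; }
          cv.notify_one();
      }

      void stage2() {                          // e.g., filter the decoded frame
          for (;;) {
              std::unique_lock<std::mutex> lk(m);
              cv.wait(lk, [] { return !q.empty() || done; });
              if (q.empty() && done) break;
              int x = q.front(); q.pop();
              lk.unlock();
              std::cout << "processed " << x << '\n';
          }
      }

      int main() {
          std::thread t1(stage1), t2(stage2);
          t1.join(); t2.join();
      }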

  11. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where the algorithms can be applied is the shared-memory SIMD (single instruction stream multiple data stream) computer, in which the whole sequence to be sorted can fit in the shared memory.
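
    For the shared-memory case mentioned above, a modern one-line counterpart is the C++17 parallel sort, shown in the sketch below; it assumes a standard library with execution-policy support (with GCC this typically means linking against TBB).

      // The library partitions the sequence across cores, sorts the pieces,
      // and merges them -- the divide-and-merge idea many of the book's
      // algorithms formalize for specific interconnection networks.
      #include <algorithm>
      #include <execution>
      #include <random>
      #include <vector>

      int main() {
          std::vector<double> v(1 << 20);
          std::mt19937 gen(42);
          std::uniform_real_distribution<double> dist(0.0, 1.0);
          for (auto& x : v) x = dist(gen);
          std::sort(std::execution::par, v.begin(), v.end());
          return std::is_sorted(v.begin(), v.end()) ? 0 : 1;
      }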

  12. The language parallel Pascal and other aspects of the massively parallel processor

    Science.gov (United States)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  13. Parallel BLAST on split databases.

    Science.gov (United States)

    Mathog, David R

    2003-09-22

    BLAST programs often run on large SMP machines where multiple threads can work simultaneously and there is enough memory to cache the databases between program runs. A group of programs is described which allows comparable performance to be achieved with a Beowulf configuration in which no node has enough memory to cache a database but the cluster as an aggregate does. To achieve this result, databases are split into equal-sized pieces and stored locally on each node. Each query is run on all nodes in parallel and the resultant BLAST output files from all nodes are merged to yield the final output. Source code is available from ftp://saf.bio.caltech.edu/

  14. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully applied to large-scale scientific computations. This book demonstrates how a variety of applications in physics, biology, mathematics and other sciences were implemented on real parallel computers to produce new scientific results. It investigates issues of fine-grained parallelism relevant for future supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configure different massively parallel machines, design and implement basic system software, and develop applications.

  15. Parallel simulation today

    Science.gov (United States)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  16. An object-oriented approach to nested data parallelism

    Science.gov (United States)

    Sheffler, Thomas J.; Chatterjee, Siddhartha

    1994-01-01

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
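
    The flattening transformation can be illustrated with a minimal sketch: a nested collection becomes one flat element array plus a segment descriptor, and an elementwise reduction over the nested structure becomes a segmented operation over the flat arrays. The representation below is a generic textbook form, not the paper's actual C++ classes.

      // Flattened form of the nested value {{1,2,3}, {4}, {5,6}}: one flat
      // data array plus per-segment lengths.
      #include <iostream>
      #include <numeric>
      #include <vector>

      int main() {
          std::vector<int> data = {1, 2, 3, 4, 5, 6};  // flat elements
          std::vector<int> seglen = {3, 1, 2};         // segment descriptor

          std::vector<int> sums(seglen.size());
          int offset = 0;
          // Each segment sum is independent, so this loop is trivially parallel;
          // a flattening compiler would emit one bulk segmented-sum primitive here.
          for (std::size_t s = 0; s < seglen.size(); ++s) {
              sums[s] = std::accumulate(data.begin() + offset,
                                        data.begin() + offset + seglen[s], 0);
              offset += seglen[s];
          }
          for (int x : sums) std::cout << x << ' ';    // prints: 6 4 11
      }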

  17. .NET 4.5 parallel extensions

    CERN Document Server

    Freeman, Bryan

    2013-01-01

    This book contains practical recipes on everything you will need to create task-based parallel programs using C#, .NET 4.5, and Visual Studio. The book is packed with illustrated code examples to create scalable programs. This book is intended to help experienced C# developers write applications that leverage the power of modern multicore processors. It provides the necessary knowledge for an experienced C# developer to work with .NET parallelism APIs. Previous experience of writing multithreaded applications is not necessary.

  18. Characteristics and error estimation of stratospheric ozone and ozone-related species over Poker Flat (65° N, 147° W), Alaska, observed by a ground-based FTIR spectrometer from 2001 to 2003

    Directory of Open Access Journals (Sweden)

    K. Mizutani

    2007-07-01

    Full Text Available It is important to obtain the year-to-year trend of stratospheric minor species in the context of global changes. An important example is the trend in global ozone depletion. The purpose of this paper is to report the accuracy and precision of measurements of stratospheric chemical species that are made at our Poker Flat site in Alaska (65° N, 147° W). Since 1999, minor atmospheric molecules have been observed using a Fourier-Transform solar-absorption infrared Spectrometer (FTS) at Poker Flat. Vertical profiles of the abundances of ozone, HNO3, HCl, and HF for the period from 2001 to 2003 were retrieved from FTS spectra using Rodgers' formulation of the Optimal Estimation Method (OEM). The accuracy and precision of the retrievals were estimated by formal error analysis. Errors for the total column were estimated to be 5.3%, 3.4%, 5.9%, and 5.3% for ozone, HNO3, HCl, and HF, respectively. The ozone vertical profiles were in good agreement with profiles derived from collocated ozonesonde measurements that were smoothed with averaging kernel functions that had been obtained with the retrieval procedure used in the analysis of spectra from the ground-based FTS (gb-FTS). The O3, HCl, and HF columns that were retrieved from the FTS measurements were consistent with Earth Probe/Total Ozone Mapping Spectrometer (TOMS) and HALogen Occultation Experiment (HALOE) data over Alaska within the error limits of all the respective datasets. This is the first report from the Poker Flat FTS observation site on a number of stratospheric gas profiles including a comprehensive error analysis.

  19. Computer-Aided Parallelizer and Optimizer

    Science.gov (United States)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
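
    The kind of directive such a tool inserts can be shown by hand on a small loop: the loop is identified as parallel, the temporary is private per iteration, and the accumulator is marked as a reduction variable. CAPO itself processes Fortran; this C++ analogue is only illustrative.

      // Hand-written example of an auto-insertable OpenMP directive with the
      // variable classifications described above.
      #include <cstdio>
      #include <vector>

      int main() {
          const int n = 1000000;
          std::vector<double> a(n, 1.0), b(n, 2.0);
          double sum = 0.0;
          #pragma omp parallel for reduction(+ : sum)
          for (int i = 0; i < n; ++i) {
              double tmp = a[i] * b[i];   // declared in the loop body: private per iteration
              sum += tmp;                 // partial sums combined across threads at loop end
          }
          std::printf("dot = %f\n", sum);
      }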

  20. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.
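
    Of the three decompositions mentioned, the spatial one is the easiest to sketch: particles are binned into cells no smaller than the interaction cutoff, so each cell interacts only with itself and its neighbors and cells can be processed in parallel. The one-dimensional toy below (with invented coordinates and no periodic wrap) is our own illustration, not code from the review.

      // Toy 1-D link-cell spatial decomposition; a force kernel would replace
      // the pair counter in a real MD code.
      #include <cmath>
      #include <cstdio>
      #include <vector>

      int main() {
          const double box = 10.0, cutoff = 1.0;                     // invented parameters
          const int ncell = static_cast<int>(box / cutoff);
          std::vector<double> x = {0.2, 0.3, 2.7, 5.1, 5.15, 9.9};   // toy coordinates
          std::vector<std::vector<int>> cells(ncell);
          for (int i = 0; i < static_cast<int>(x.size()); ++i)
              cells[static_cast<int>(x[i] / cutoff)].push_back(i);   // bin into cells

          // Each cell interacts only with itself and its right-hand neighbour,
          // so the outer loop parallelizes cleanly across cells.
          int pairs = 0;
          #pragma omp parallel for reduction(+ : pairs)
          for (int c = 0; c < ncell; ++c)
              for (int dc = 0; dc <= 1; ++dc) {
                  const int nc = c + dc;
                  if (nc >= ncell) continue;                 // no periodic wrap in this sketch
                  for (int i : cells[c])
                      for (int j : cells[nc])
                          if ((dc != 0 || j > i) && std::fabs(x[i] - x[j]) < cutoff)
                              ++pairs;                       // force kernel would go here
              }
          std::printf("interacting pairs: %d\n", pairs);
      }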

  1. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number...... of available processor cores compared to its sequential counterpart, thereby taking full advantage of multicore parallelism. The parallel buffer tree is a search tree data structure that supports the batched parallel processing of a sequence of N insertions, deletions, membership queries, and range queries...... in the optimal O(sort_P(N) + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and sort_P(N) is the parallel I/O complexity of sorting N elements using P processors....

  2. Stage-by-Stage and Parallel Flow Path Compressor Modeling for a Variable Cycle Engine, NASA Advanced Air Vehicles Program - Commercial Supersonic Technology Project - AeroServoElasticity

    Science.gov (United States)

    Kopasakis, George; Connolly, Joseph W.; Cheng, Larry

    2015-01-01

    This paper covers the development of stage-by-stage and parallel flow path compressor modeling approaches for a Variable Cycle Engine. The stage-by-stage compressor modeling approach is an extension of a technique for lumped volume dynamics and performance characteristic modeling. It was developed to improve the accuracy of axial compressor dynamics over lumped volume dynamics modeling. The stage-by-stage compressor model presented here is formulated into a parallel flow path model that includes both axial and rotational dynamics. This is done to enable the study of compressor and propulsion system dynamic performance under flow distortion conditions. The approaches utilized here are generic and should be applicable to the modeling of any axial flow compressor design for accurate time domain simulations. The objective of this work is as follows. Given the parameters describing the conditions of atmospheric disturbances, and utilizing the derived formulations, directly compute the transfer function poles and zeros describing these disturbances for acoustic velocity, temperature, pressure, and density. Time domain simulations of representative atmospheric turbulence can then be developed by utilizing these computed transfer functions together with the disturbance frequencies of interest.

  3. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  4. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
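
    Two of the patterns named above, a reduction and an inclusive prefix scan, have direct counterparts in the C++17 parallel algorithms, sketched below; ghost cell updates require a distributed setting and are not shown.

      // Reduction and inclusive prefix scan with parallel execution policies
      // (requires a standard library with execution-policy support).
      #include <cstdio>
      #include <execution>
      #include <numeric>
      #include <vector>

      int main() {
          std::vector<int> v = {3, 1, 4, 1, 5, 9, 2, 6};
          int total = std::reduce(std::execution::par, v.begin(), v.end(), 0);
          std::vector<int> scan(v.size());
          std::inclusive_scan(std::execution::par, v.begin(), v.end(), scan.begin());
          std::printf("sum=%d last=%d\n", total, scan.back());  // sum=31 last=31
      }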

  5. PCLIPS: Parallel CLIPS

    Science.gov (United States)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C(sup 3)I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

  6. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.

  7. Automatic Management of Parallel and Distributed System Resources

    Science.gov (United States)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  8. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......

  9. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......

  10. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

    Parallel programming is applied to multiple processors connected by Ethernet. Data exchanges between tasks located in each processing element are realized in two ways. One is the socket interface, which is a standard library on recent UNIX operating systems. The other is network-connecting software named Parallel Virtual Machine (PVM), free software developed by ORNL that allows many workstations connected to a network to be used as a parallel computer. This paper discusses the availability of parallel computing using networked UNIX workstations and compares it with specialized parallel systems (Transputer and iPSC/860) on a Monte Carlo simulation, which generally shows a high parallelization ratio. (author)

  11. Parallel reservoir simulator computations

    International Nuclear Information System (INIS)

    Hemanth-Kumar, K.; Young, L.C.

    1995-01-01

    The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90

  12. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  13. Computer program MCAP-TOSS calculates steady-state fluid dynamics of coolant in parallel channels and temperature distribution in surrounding heat-generating solid

    Science.gov (United States)

    Lee, A. Y.

    1967-01-01

    Computer program calculates the steady state fluid distribution, temperature rise, and pressure drop of a coolant, the material temperature distribution of a heat generating solid, and the heat flux distributions at the fluid-solid interfaces. It performs the necessary iterations automatically within the computer, in one machine run.

  14. On the parallelization of molecular dynamics codes

    Science.gov (United States)

    Trabado, G. P.; Plata, O.; Zapata, E. L.

    2002-08-01

    Molecular dynamics (MD) codes present a high degree of spatial data locality and a significant amount of independent computations. However, most of the parallelization strategies are usually based on the manual transformation of sequential programs either by completely rewriting the code with message passing routines or using specific libraries intended for writing new MD programs. In this paper we propose a new library-based approach (DDLY) which supports parallelization of existing short-range MD sequential codes. The novelty of this approach is that it can directly handle the distribution of common data structures used in MD codes to represent data (arrays, Verlet lists, link cells), using domain decomposition. Thus, the insertion of run-time support for distribution and communication in a MD program does not imply significant changes to its structure. The method is simple, efficient and portable. It may also be used to extend existing parallel programming languages, such as HPF.

  15. An Introduction to Parallel Computation

    Indian Academy of Sciences (India)

    Parallel computers have also been very useful for solving problems in engineering and biological sciences as well as in business data processing. ..... depends upon whether high speed is the only objective, or whether program portability and ease of program development are also important. For achieving very high speed, ...

  16. Adapting algorithms to massively parallel hardware

    CERN Document Server

    Sioulas, Panagiotis

    2016-01-01

    In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.

  17. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)

  18. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)

  19. The PARTY parallel runtime system

    Science.gov (United States)

    Saltz, J. H.; Mirchandaney, Ravi; Smith, R. M.; Crowley, Kay; Nicol, D. M.

    1989-01-01

    In the present automated system for the organization of the data and computational operations entailed by parallel problems, in ways that optimize multiprocessor performance, general heuristics for partitioning program data and control are implemented by capturing and manipulating representations of a computation at run time. These heuristics are directed toward the dynamic identification and allocation of concurrent work in computations with irregular computational patterns. An optimized static-workload partitioning is computed for such repetitive-computation pattern problems as the iterative ones employed in scientific computation.

  20. Abstract Level Parallelization of Finite Difference Methods

    Directory of Open Access Journals (Sweden)

    Edwin Vollebregt

    1997-01-01

    Full Text Available A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for the more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.
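
    The closing claim -- that index sets and stencils mechanically determine the data to be exchanged -- can be demonstrated in a few lines: given the block of indices a process owns and the stencil offsets, the required ghost indices fall out of a double loop. The toy formulation below is ours, not the article's notation.

      // Derive the halo (ghost indices) of a 1-D block from a stencil.
      #include <cstdio>
      #include <set>

      int main() {
          const int lo = 100, hi = 200;            // this process owns [lo, hi)
          const int stencil[] = {-1, 0, +1};       // 3-point finite difference
          std::set<int> ghosts;                    // indices we must receive
          for (int i = lo; i < hi; ++i)
              for (int off : stencil) {
                  int j = i + off;
                  if (j < lo || j >= hi) ghosts.insert(j);
              }
          for (int g : ghosts) std::printf("need index %d from a neighbour\n", g);
          // Prints 99 and 200: exactly the halo a message-passing code exchanges.
      }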

  1. Parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts is shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)

  2. Global Optimal Energy Management Strategy Research for a Plug-In Series-Parallel Hybrid Electric Bus by Using Dynamic Programming

    Directory of Open Access Journals (Sweden)

    Hongwen He

    2013-01-01

    Full Text Available Energy management strategy greatly influences the power performance and fuel economy of plug-in hybrid electric vehicles. To explore the fuel-saving potential of a plug-in hybrid electric bus (PHEB), this paper searched for the global optimal energy management strategy using a dynamic programming (DP) algorithm. Firstly, the simplified backward model of the PHEB was built, which is necessary for the DP algorithm. Then the torque and speed of the engine and the torque of the motor were selected as the control variables, and the battery state of charge (SOC) was selected as the state variable. The DP solution procedure was presented, along with the way to find all admissible control variables at every state of each stage. Finally, the appropriate SOC increment is determined after quantizing the state variable, and the optimal control over a long driving distance of a specific driving cycle is replaced with the optimal control of one driving cycle, which reduces the computational time significantly while keeping the same precision. The simulation results show that the fuel economy of the PHEB with the optimal energy management strategy is improved by 53.7% compared with that of the conventional bus, which can serve as a benchmark for the assessment of other control strategies.
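
    A heavily simplified sketch of the backward DP described above follows: the state is a quantized battery SOC, the control decides how the battery shifts the engine's load, and the stage cost is fuel use. All numbers and the power/fuel models are invented placeholders, not the paper's vehicle model.

      // Backward dynamic programming over a quantized SOC grid (toy model).
      #include <algorithm>
      #include <cstdio>
      #include <vector>

      int main() {
          const int T = 5, S = 11;                             // stages, SOC grid points
          std::vector<double> demand = {20, 30, 10, 40, 25};   // demanded kW per stage (invented)
          std::vector<double> cost(S, 0.0), next(S, 0.0);      // cost-to-go tables
          for (int t = T - 1; t >= 0; --t) {
              for (int s = 0; s < S; ++s) {
                  double best = 1e30;
                  for (int ds = -1; ds <= 1; ++ds) {           // battery discharges/holds/charges
                      int s2 = s + ds;
                      if (s2 < 0 || s2 >= S) continue;
                      double engine_kw = demand[t] + 10.0 * ds;  // toy power balance
                      if (engine_kw < 0) continue;
                      double fuel = 0.05 * engine_kw;            // toy fuel model
                      best = std::min(best, fuel + next[s2]);
                  }
                  cost[s] = best;
              }
              std::swap(cost, next);                           // next now holds stage t's cost-to-go
          }
          std::printf("optimal cost from mid-grid SOC: %.2f\n", next[S / 2]);
      }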

  3. A parallel PCG solver for MODFLOW.

    Science.gov (United States)

    Dong, Yanhui; Li, Guomin

    2009-01-01

    In order to simulate large-scale ground water flow problems more efficiently with MODFLOW, the OpenMP programming paradigm was used in this study to parallelize the preconditioned conjugate-gradient (PCG) solver. Incremental parallelization, a significant advantage of OpenMP on shared-memory computers, let the solver transition to a parallel program smoothly, one block of code at a time. The parallel PCG solver, suitable for both MODFLOW-2000 and MODFLOW-2005, is verified using an 8-processor computer. Both the impact of compilers and different model domain sizes were considered in the numerical experiments. Based on the timing results, execution times using the parallel PCG solver are typically about 1.40 to 5.31 times faster than those using the serial one. In addition, the simulation results are exactly the same as those of the original PCG solver, because the majority of the serial code was not changed. It is worth noting that this parallelizing approach reduces the cost of software maintenance because only a single-source PCG solver code needs to be maintained in the MODFLOW source tree. Copyright © 2009 The Author(s). Journal Compilation © 2009 National Ground Water Association.
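
    The incremental, one-block-at-a-time style described above can be sketched on the kernels inside a conjugate-gradient iteration: the matrix-vector product and the dot product each get their own OpenMP loop while the surrounding solver logic stays serial. The tridiagonal test matrix below is a generic stand-in, not MODFLOW code.

      // Two independently parallelized CG kernels; the rest of the solver
      // can remain serial and be parallelized later, block by block.
      #include <vector>

      // y = A*x for a tridiagonal test matrix (stand-in for the flow stencil)
      void matvec(const std::vector<double>& x, std::vector<double>& y) {
          const int n = static_cast<int>(x.size());
          #pragma omp parallel for
          for (int i = 0; i < n; ++i) {
              double v = 2.0 * x[i];
              if (i > 0)     v -= x[i - 1];
              if (i < n - 1) v -= x[i + 1];
              y[i] = v;
          }
      }

      double dot(const std::vector<double>& a, const std::vector<double>& b) {
          double s = 0.0;
          #pragma omp parallel for reduction(+ : s)
          for (int i = 0; i < static_cast<int>(a.size()); ++i) s += a[i] * b[i];
          return s;
      }

      int main() {
          std::vector<double> x(1000, 1.0), y(1000);
          matvec(x, y);
          return dot(y, y) > 0 ? 0 : 1;
      }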

  4. Efficiency analysis of a physical problem: different parallel computational approaches for a dynamical integrator evolution

    OpenAIRE

    Gaudiani, Adriana; Carusela, Florencia; Soba, Alejandro

    2013-01-01

    A great challenge for scientists is to execute their computational applications efficiently. Nowadays, parallel programming has become a fundamental key to achieving this goal. High-performance computing provides a solution to exploit parallel architectures in order to get optimal performance. The parallel programming model and the system architecture will maximize the benefits if, together, they are suited to the inherent parallelism of the problem. We compared three parallelized versions ...

  5. Parallel time integration software

    Energy Technology Data Exchange (ETDEWEB)

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.

  6. Effect of Combined Systematized Behavioral Modification Education Program With Desmopressin in Patients With Nocturia: A Prospective, Multicenter, Randomized, and Parallel Study

    Directory of Open Access Journals (Sweden)

    Sung Yong Cho

    2014-12-01

    Full Text Available Purpose The aims of this study were to investigate the efficacy of combining the systematized behavioral modification program (SBMP) with desmopressin therapy and to compare this with desmopressin monotherapy in the treatment of nocturnal polyuria (NPU). Methods Patients were randomized at 8 centers to receive desmopressin monotherapy (group A) or combination therapy, comprising desmopressin and the SBMP (group B). Nocturia was defined as an average of 2 or more nightly voids. The primary endpoint was a change in the mean number of nocturnal voids from baseline during the 3-month treatment period. The secondary endpoints were changes in the bladder diary parameters and questionnaire scores, and improvements in self-perception for nocturia. Results A total of 200 patients were screened and 76 were excluded from the study because they failed the screening process. A total of 124 patients were randomized to receive treatment, with group A comprising 68 patients and group B comprising 56 patients. The patients' characteristics were similar between the groups. Nocturnal voids showed a greater decline in group B (-1.5) compared with group A (-1.2), a difference that was not statistically significant. Significant differences were observed between groups A and B with respect to the NPU index (0.37 vs. 0.29, P=0.028), the change in the maximal bladder capacity (-41.3 mL vs. 13.3 mL, P<0.001), and the rate of patients lost to follow up (10.3% [7/68] vs. 0% [0/56], P=0.016). Self-perception for nocturia significantly improved in both groups. Conclusions Combination treatment did not have any additional benefits in relation to reducing nocturnal voids in patients with NPU; however, combination therapy is helpful because it increases the maximal bladder capacity and decreases the NPI. Furthermore, combination therapy increased the persistence of desmopressin in patients with NPU.

  7. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have increased the complexity of parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools that facilitate debugging, performance analysis, and optimization, thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  8. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as a starting point and reference for practitioners in the field.

  9. SPINning parallel systems software

    International Nuclear Information System (INIS)

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-01-01

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic: processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin.

  10. Automata-theoretic protocol programming

    NARCIS (Netherlands)

    Jongmans, Sung-Shik Theodorus Quirinus

    2016-01-01

    Parallel programming has become essential for writing scalable programs on general hardware. Conceptually, every parallel program consists of workers, which implement primary units of sequential computation, and protocols, which implement the rules of interaction that workers must abide by.

  11. Combined Scheduling and Mapping for Scalable Computing with Parallel Tasks

    Directory of Open Access Journals (Sweden)

    Jörg Dümmler

    2012-01-01

    Full Text Available Recent and future parallel clusters and supercomputers use symmetric multiprocessors (SMPs and multi-core processors as basic nodes, providing a huge amount of parallel resources. These systems often have hierarchically structured interconnection networks combining computing resources at different levels, starting with the interconnect within multi-core processors up to the interconnection network combining nodes of the cluster or supercomputer. The challenge for the programmer is that these computing resources should be utilized efficiently by exploiting the available degree of parallelism of the application program and by structuring the application in a way which is sensitive to the heterogeneous interconnect. In this article, we pursue a parallel programming method using parallel tasks to structure parallel implementations. A parallel task can be executed by multiple processors or cores and, for each activation of a parallel task, the actual number of executing cores can be adapted to the specific execution situation. In particular, we propose a new combined scheduling and mapping technique for parallel tasks with dependencies that takes the hierarchical structure of modern multi-core clusters into account. An experimental evaluation shows that the presented programming approach can lead to a significantly higher performance compared to standard data parallel implementations.

  12. Parallel Fast Legendre Transform

    NARCIS (Netherlands)

    Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.

    1998-01-01

    We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform. We give an introduction to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the efficiency and accuracy of our implementation. The algorithms were

  13. Parallel k-means++

    Energy Technology Data Exchange (ETDEWEB)

    2017-04-04

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.
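
    As a concrete picture of what is being parallelized, here is a hedged OpenMP sketch of k-means++ seeding in which the expensive distance pass runs in parallel; the function and variable names are invented, and the record's CUDA/Thrust and Cray XMT versions would parallelize the same loop.

```cpp
// Hedged sketch of k-means++ seeding with the distance pass parallelized.
#include <cfloat>
#include <cstdlib>
#include <vector>
#include <omp.h>

// Squared Euclidean distance between two points of dimension d.
static double dist2(const double* a, const double* b, int d) {
    double s = 0.0;
    for (int i = 0; i < d; ++i) { double t = a[i] - b[i]; s += t * t; }
    return s;
}

// Returns indices of k seed points chosen from n points (row-major, dim d).
std::vector<int> kmeanspp_seeds(const std::vector<double>& pts,
                                int n, int d, int k) {
    std::vector<int> seeds;
    seeds.push_back(std::rand() % n);          // first seed: uniform
    std::vector<double> best(n, DBL_MAX);      // D(x)^2 to nearest seed
    for (int c = 1; c < k; ++c) {
        const double* s = &pts[seeds.back() * d];
        double total = 0.0;
        // Parallel pass: refresh each point's distance to its nearest seed.
        #pragma omp parallel for reduction(+:total)
        for (int i = 0; i < n; ++i) {
            double d2 = dist2(&pts[i * d], s, d);
            if (d2 < best[i]) best[i] = d2;
            total += best[i];
        }
        // Sequential weighted draw: pick next seed with prob. D(x)^2/total.
        double r = total * (std::rand() / (double)RAND_MAX);
        int next = n - 1;
        for (int i = 0; i < n; ++i) { r -= best[i]; if (r <= 0) { next = i; break; } }
        seeds.push_back(next);
    }
    return seeds;
}
```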

  14. Parallel universes beguile science

    CERN Multimedia

    2007-01-01

    A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able -- at least not yet -- to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.

  15. Massively parallel signature sequencing.

    Science.gov (United States)

    Zhou, Daixing; Rao, Mahendra S; Walker, Roger; Khrebtukova, Irina; Haudenschild, Christian D; Miura, Takumi; Decola, Shannon; Vermaas, Eric; Moon, Keith; Vasicek, Thomas J

    2006-01-01

    Massively parallel signature sequencing is an ultra-high throughput sequencing technology. It can simultaneously sequence millions of sequence tags, and, therefore, is ideal for whole genome analysis. When applied to expression profiling, it reveals almost every transcript in the sample and provides its accurate expression level. This chapter describes the technology and its application in establishing stem cell transcriptome databases.

  16. Parallel hierarchical radiosity rendering

    Energy Technology Data Exchange (ETDEWEB)

    Carter, Michael [Iowa State Univ., Ames, IA (United States)

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  17. Parallel hierarchical global illumination

    Energy Technology Data Exchange (ETDEWEB)

    Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  18. Parallel Splash Belief Propagation

    Science.gov (United States)

    2010-08-01


  19. s2k: A parallel language for streaming text extraction and transformations

    OpenAIRE

    Herdy, Kenneth Stuart

    2015-01-01

    Computer architectures continue to evolve and expose additional hardware parallelism to software applications. Although programming systems can leverage advances in hardware parallelism for the processing of numerical data, finding ways to exploit additional hardware parallelism for text data is particularly challenging. To address this challenge we define s2k, a global-view parallel programming language for streaming text extraction and transformations that integrates stream programming abst...

  20. FRPA: A Framework for Recursive Parallel Algorithms

    Science.gov (United States)

    2015-05-01

    observations that communication costs often dominate computation costs. Previous work [1]–[3] demonstrates that carefully choosing which divide-and-conquer ... CARMA [3] matrix multiplication, mergesort and quicksort sorting algorithms, solving triangular systems of linear equations with many right-hand sides ... divide-and-conquer programs as input, and outputs parallel implementations for multiprocessor machines. However, FRPA is a much simpler framework due

  1. A Scalable Prescriptive Parallel Debugging Model

    DEFF Research Database (Denmark)

    Jensen, Nicklas Bo; Quarfot Nielsen, Niklas; Lee, Gregory L.

    2015-01-01

    Debugging is a critical step in the development of any parallel program. However, the traditional interactive debugging model, where users manually step through code and inspect their application, does not scale well even for current supercomputers due to its centralized nature. While lightweight...

  2. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  3. NDetermin: Inferring Nondeterministic Sequential Specifications for Parallelism Correctness

    Science.gov (United States)

    2011-12-16

    for Deterministic Parallel Java. In Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 97–116, 2009. [3] S. Burckhardt ... formalize these concepts next after describing the language constructs that we use to write parallel programs and their specifications. We will follow ... serializability checking techniques to verify the program against its NDSeq specification. The syntax for the language is shown in Figure 5. To

  4. Mixed Task and Data Parallel Executions in General Linear Methods

    Directory of Open Access Journals (Sweden)

    Thomas Rauber

    2007-01-01

    Full Text Available On many parallel target platforms it can be advantageous to implement parallel applications as a collection of multiprocessor tasks that are concurrently executed and are internally implemented with fine-grain SPMD parallelism. A class of applications which can benefit from this programming style are methods for solving systems of ordinary differential equations. Many recent solvers have been designed with an additional potential of method parallelism, but the actual effectiveness of mixed task and data parallelism depends on the specific communication and computation requirements imposed by the equation to be solved. In this paper we study mixed task and data parallel implementations for general linear methods realized using a library for multiprocessor task programming. Experiments on a number of different platforms show good efficiency results.

  5. Parallel finite-difference time-domain method

    CERN Document Server

    Yu, Wenhua

    2006-01-01

    The finite-difference time-domain (FDTD) method has revolutionized antenna design and electromagnetics engineering. This book raises the FDTD method to the next level by empowering it with the vast capabilities of parallel computing. It shows engineers how to exploit the natural parallel properties of FDTD to improve the existing FDTD method and to efficiently solve more complex and large problem sets. Professionals learn how to apply open source software to develop parallel software and hardware to run FDTD in parallel for their projects. The book features hands-on examples that illustrate the power of parallel FDTD and presents practical strategies for carrying out parallel FDTD. This detailed resource provides instructions on downloading, installing, and setting up the required open source software on either Windows or Linux systems, and includes a handy tutorial on parallel programming.

  6. MASSIVELY PARALLEL GENETIC PROGRAMMING ON GPUS

    OpenAIRE

    CLEOMAR PEREIRA DA SILVA

    2014-01-01

    Genetic programming enables computers to solve problems automatically, without having been programmed to do so. Drawing on Darwin's principle of natural selection, a population of programs, or individuals, is maintained, modified on the basis of genetic variation, and evaluated according to a fitness function. Genetic programming has been used successfully in a range of applications such as automatic design, pattern recognition...

  7. Parallel grid population

    Science.gov (United States)

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
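
    The two-phase scheme described above translates into a rough shared-memory sketch as follows, assuming for simplicity a one-dimensional grid, OpenMP threads as the processors, and invented type names.

```cpp
// Rough sketch of the claimed two-phase method (1-D grid for brevity).
#include <algorithm>
#include <vector>
#include <omp.h>

struct Object { float xmin, xmax; };   // extent of one object on the axis

int main() {
    const int P = omp_get_max_threads();        // n = number of processors
    const float lo = 0.f, hi = 100.f;           // grid extent
    std::vector<Object> objs;                   // filled elsewhere
    // overlap[p][q]: objects from object-set p that touch grid portion q.
    std::vector<std::vector<std::vector<int>>> overlap(
        P, std::vector<std::vector<int>>(P));

    // Phase 1: each processor scans its own distinct object set and records
    // which grid portions each object at least partially bounds.
    #pragma omp parallel
    {
        const int p = omp_get_thread_num();
        const float w = (hi - lo) / P;          // width of one grid portion
        for (size_t i = p; i < objs.size(); i += P) {
            int q0 = std::max((int)((objs[i].xmin - lo) / w), 0);
            int q1 = std::min((int)((objs[i].xmax - lo) / w), P - 1);
            for (int q = q0; q <= q1; ++q) overlap[p][q].push_back((int)i);
        }
    }
    // Phase 2: each processor owns one distinct grid portion and populates
    // it with every object recorded for that portion in phase 1.
    std::vector<std::vector<int>> portion(P);
    #pragma omp parallel
    {
        const int q = omp_get_thread_num();
        for (int p = 0; p < P; ++p)
            for (int idx : overlap[p][q]) portion[q].push_back(idx);
    }
    return 0;
}
```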

  8. Ultrascalable petaflop parallel supercomputer

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing, including a Torus, a collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. A DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  9. PARALLEL MOVING MECHANICAL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Florian Ion Tiberius Petrescu

    2014-09-01

    Full Text Available Moving mechanical systems with parallel structures are solid, fast, and accurate. Among parallel systems, Stewart platforms are notable as the oldest, being fast, solid and precise. The work outlines a few main elements of Stewart platforms, beginning with the platform geometry and its kinematic elements, and then presenting a few items of dynamics. The primary dynamic element is the determination of the kinetic energy of the entire Stewart platform. The kinematics of the mobile platform is then recorded using a rotation matrix method. If a structural motor element consists of two moving elements that translate relative to each other, it is more convenient, for the drive train and especially for the dynamics, to represent the motor element as a single moving component. We thus have seven moving parts: the six motor elements, or legs, to which the mobile platform is added as the seventh, together with one fixed base.

  10. An Expert System for the Development of Efficient Parallel Code

    Science.gov (United States)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  11. Xyce parallel electronic simulator.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  12. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include a model for analyzing generalized inter-processor communication, a pipelined architecture for search tree maintenance, and a specialized computer organization for raster

  13. Stability of parallel flows

    CERN Document Server

    Betchov, R

    2012-01-01

    Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation

  14. Anti-parallel triplexes

    DEFF Research Database (Denmark)

    Kosbar, Tamer R.; Sofan, Mamdouh A.; Waly, Mohamed A.

    2015-01-01

    The phosphoramidites of DNA monomers of 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine (Y) and 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine LNA (Z) are synthesized, and the thermal stability at pH 7.2 and 8.2 of anti-parallel triplexes modified with these two monomers is determined. When the anti-parallel TFO strand was modified with Y with one or two insertions at the end of the TFO strand, the thermal stability was increased 1.2 °C and 3 °C at pH 7.2, respectively, whereas one insertion in the middle of the TFO strand decreased the thermal stability 1.4 °C compared to the wild type oligonucleotide ... chain, especially at the end of the TFO strand. On the other hand, the thermal stability of the anti-parallel triplex was dramatically decreased when the TFO strand was modified with the LNA monomer analog Z in the middle of the TFO strand (ΔTm = -9.1 °C). Also the thermal stability decreased...

  15. The ongoing investigation of high performance parallel computing in HEP

    CERN Document Server

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  16. Comparative efficiencies of three parallel algorithms for nonlinear ...

    Indian Academy of Sciences (India)


    overall organisation and data structures of the program. Many researchers have devised algorithms for nonlinear dynamic analysis exploiting parallelism in both explicit and implicit time integration techniques. Explicit time integration algorithms like the central difference method can easily be moved on to parallel pro-
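
    To make the parallelization argument concrete: in the standard undamped form (our notation, not necessarily the paper's), the explicit central difference update is

```latex
% Central-difference update for  M u'' + K u = f(t):
\[
  u_{n+1} = 2\,u_n - u_{n-1} + \Delta t^{2}\, M^{-1}\left( f_n - K u_n \right)
\]
% With a lumped (diagonal) mass matrix M, each component of u_{n+1} depends
% only on known data from steps n and n-1, so all components can be updated
% concurrently; this is why explicit schemes move easily to parallel machines.
```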

  17. TWO PHASE FLOW SPLIT MODEL FOR PARALLEL CHANNELS

    African Journals Online (AJOL)

    Ifeanyichukwu Onwuka

    A model has been developed for the determination of two phase flow distributions between multiple parallel channels which ... transients, up to ten parallel flow paths, simple and complicated geometries, including the boilers of fossil steam generators and ..... The above model and numerical technique were programmed in ...

  18. Improved techniques of parallel gap welding and monitoring

    Science.gov (United States)

    Mardesich, N.; Gillanders, M. S.

    1984-01-01

    Welding programs which show that parallel gap welding is a reliable process are discussed. When monitoring controls and nondestructive tests are incorporated into the process, parallel gap welding becomes more reliable and cost effective. The panel fabrication techniques and the HAC thermal cycling test indicate reliable product integrity. The design and building of automated tooling and fixturing for welding are discussed.

  19. Impact of a nurse-directed, coordinated school health program to enhance physical activity behaviors and reduce body mass index among minority children: a parallel-group, randomized control trial.

    Science.gov (United States)

    Wright, Kynna; Giger, Joyce Newman; Norris, Keth; Suro, Zulma

    2013-06-01

    Underserved children, particularly girls and those in urban communities, do not meet the recommended physical activity guidelines (>60 min of daily physical activity), and this behavior can lead to obesity. The school years are known to be a critical period in the life course for shaping attitudes and behaviors. Children look to schools for much of their access to physical activity. Thus, through the provision of appropriate physical activity programs, schools have the power to influence apt physical activity choices, especially for underserved children where disparities in obesity-related outcomes exist. To evaluate the impact of a nurse-directed, coordinated, culturally sensitive, school-based, family-centered lifestyle program on activity behaviors and body mass index. This was a parallel-group, randomized controlled trial utilizing a community-based participatory research approach, through a partnership with a University and 5 community schools. Participants included 251 children ages 8-12 from elementary schools in urban, low-income neighborhoods in Los Angeles, USA. The intervention included Kids N Fitness©, a 6-week program which met weekly to provide 45 min of structured physical activity and a 45-min nutrition education class for parents and children. Intervention sites also participated in school-wide wellness activities, including health and counseling services, staff professional development in health promotion, parental education newsletters, and wellness policies for the provision of healthy foods at the school. The Child and Adolescent Trial for Cardiovascular Health School Physical Activity and Nutrition Student Questionnaire measured physical activity behavior, including: daily physical activity, participation in team sports, attending physical education class, and TV viewing/computer game playing. Anthropometric measures included height, weight, body mass index, resting blood pressure, and waist circumference. Measures were collected at baseline

  20. Resistor Combinations for Parallel Circuits.

    Science.gov (United States)

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)
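
    In the same spirit as those tables, a short program can enumerate resistor pairs whose parallel combination is a whole number, using R = R1*R2/(R1 + R2); this sketch is illustrative and is not the article's own table.

```cpp
// Enumerate resistor pairs whose parallel combination is a whole number.
#include <cstdio>

int main() {
    for (int r1 = 1; r1 <= 100; ++r1)
        for (int r2 = r1; r2 <= 100; ++r2)
            if ((r1 * r2) % (r1 + r2) == 0)     // integer total resistance
                std::printf("%3d ohm || %3d ohm = %3d ohm\n",
                            r1, r2, (r1 * r2) / (r1 + r2));
    return 0;
}
```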

  1. Requirements for supercomputing in energy research: The transition to massively parallel computing

    Energy Technology Data Exchange (ETDEWEB)

    1993-02-01

    This report discusses: The emergence of a practical path to TeraFlop computing and beyond; requirements of energy research programs at DOE; implementation: supercomputer production computing environment on massively parallel computers; and implementation: user transition to massively parallel computing.

  2. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    Science.gov (United States)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  3. Very Large Parallel Data Flow

    Science.gov (United States)

    1988-03-01

    billion characters. Teradata Corporation’s DBC/1012, a parallel relational database machine, is another high performance engine for large database ... the volume of data being processed. Such parallelism is currently exploited in multiprocessor relational database machines such as the Teradata DBC ... scale parallelism can be achieved in all-solutions relations using or-parallelism in a multiprocessor architecture. For instance, the Teradata database

  4. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts.
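
    The record's central primitive, list ranking, parallelizes by pointer jumping; the sketch below is the classic shared-memory version, not the paper's PEM algorithm, and is included only to show the problem's parallel structure.

```cpp
// Classic pointer-jumping sketch for list ranking (not the PEM algorithm).
// next[i] == i marks the tail; the result rank[i] is i's distance to it.
#include <vector>
#include <omp.h>

std::vector<int> list_rank(std::vector<int> next) {
    const int n = (int)next.size();
    std::vector<int> rank(n);
    for (int i = 0; i < n; ++i) rank[i] = (next[i] == i) ? 0 : 1;
    for (int hop = 1; hop < n; hop *= 2) {       // O(log n) rounds
        std::vector<int> r2(n), n2(n);
        #pragma omp parallel for                 // all nodes jump at once
        for (int i = 0; i < n; ++i) {
            r2[i] = rank[i] + rank[next[i]];     // skip over next[i]
            n2[i] = next[next[i]];
        }
        rank.swap(r2);
        next.swap(n2);
    }
    return rank;
}
```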

  5. Parallel Architectures and Bioinspired Algorithms

    CERN Document Server

    Pérez, José; Lanchares, Juan

    2012-01-01

    This monograph presents examples of best practices when combining bioinspired algorithms with parallel architectures. The book includes recent work by leading researchers in the field and offers a map with the main paths already explored and new ways towards the future. Parallel Architectures and Bioinspired Algorithms will be of value to specialists in bioinspired algorithms and in parallel and distributed computing, as well as to computer science students trying to understand the present and the future of Parallel Architectures and Bioinspired Algorithms.

  6. A parallel implementation of a maximum entropy reconstruction algorithm for PET images in a visual language

    International Nuclear Information System (INIS)

    Bastiens, K.; Lemahieu, I.

    1994-01-01

    The application of a maximum entropy reconstruction algorithm to PET images requires a lot of computing resources. A parallel implementation could seriously reduce the execution time. However, programming a parallel application is still a non-trivial task requiring specialized skills. In this paper a programming environment based on a visual programming language is used for a parallel implementation of the reconstruction algorithm. This programming environment allows less experienced programmers to use the performance of multiprocessor systems. (authors)

  7. Massively Parallel Genetics.

    Science.gov (United States)

    Shendure, Jay; Fields, Stanley

    2016-06-01

    Human genetics has historically depended on the identification of individuals whose natural genetic variation underlies an observable trait or disease risk. Here we argue that new technologies now augment this historical approach by allowing the use of massively parallel assays in model systems to measure the functional effects of genetic variation in many human genes. These studies will help establish the disease risk of both observed and potential genetic variants and to overcome the problem of "variants of uncertain significance." Copyright © 2016 by the Genetics Society of America.

  8. Parallel sphere rendering

    Energy Technology Data Exchange (ETDEWEB)

    Krogh, M.; Hansen, C.; Painter, J. [Los Alamos National Lab., NM (United States); de Verdiere, G.C. [CEA Centre d'Etudes de Limeil, 94 - Villeneuve-Saint-Georges (France)]

    1995-05-01

    Sphere rendering is an important method for visualizing molecular dynamics data. This paper presents a parallel divide-and-conquer algorithm that is almost 90 times faster than current graphics workstations. To render extremely large data sets and large images, the algorithm uses the MIMD features of the supercomputers to divide up the data, render independent partial images, and then finally composite the multiple partial images using an optimal method. The algorithm and performance results are presented for the CM-5 and the T3D.

  9. Parallel Repetition From Fortification

    OpenAIRE

    Moshkovitz Aaronson, Dana Hadar

    2014-01-01

    The Parallel Repetition Theorem upper-bounds the value of a repeated (tensored) two-prover game in terms of the value of the base game and the number of repetitions. In this work we give a simple transformation on games, called “fortification”, and show that for fortified games, the value of the repeated game decreases perfectly exponentially with the number of repetitions, up to an arbitrarily small additive error. Our proof is combinatorial and short. As corollaries, we obtain: (1) Starting from...

  10. Linear Bregman algorithm implemented in parallel GPU

    Science.gov (United States)

    Li, Pengyan; Ke, Jue; Sui, Dong; Wei, Ping

    2015-08-01

    At present, most compressed sensing (CS) algorithms have poor convergence speed and are thus difficult to run on a PC. To deal with this issue, we use a parallel GPU to implement a broadly used compressed sensing algorithm, the linear Bregman algorithm. The linear iterative Bregman algorithm is a reconstruction algorithm proposed by Osher and Cai. Compared with other CS reconstruction algorithms, the linear Bregman algorithm only involves vector and matrix multiplication and a thresholding operation, and is simpler and more efficient to program. We use C as the development language and adopt CUDA (Compute Unified Device Architecture) as the parallel computing architecture. In this paper, we compare the parallel Bregman algorithm with a traditional CPU implementation of the Bregman algorithm. In addition, we also compare the parallel Bregman algorithm with other CS reconstruction algorithms, such as the OMP and TwIST algorithms. Compared with these two algorithms, the results show that the parallel Bregman algorithm needs less time and is thus more convenient for real-time object reconstruction, which is important given the fast-growing demands of information technology.
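
    For reference, the iteration in question is short. Below is a hedged, single-threaded C++ sketch of the linearized Bregman recursion, assuming the common form u = delta * shrink(v, mu) with v accumulating A^T(b - Au); the names are invented, and the paper's contribution is mapping these same kernels onto the GPU with CUDA.

```cpp
// Linearized Bregman sketch: matrix-vector products plus soft-thresholding.
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;                  // dense m x n, rows of length n

static Vec matvec(const Mat& A, const Vec& x) {
    Vec y(A.size(), 0.0);
    for (size_t i = 0; i < A.size(); ++i)
        for (size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
    return y;
}

// Approximately solve: min ||u||_1 subject to A u = b (sparse recovery).
Vec linearized_bregman(const Mat& A, const Vec& b,
                       double mu, double delta, int iters) {
    const size_t m = A.size(), n = A[0].size();
    Vec u(n, 0.0), v(n, 0.0);
    for (int k = 0; k < iters; ++k) {
        Vec r = matvec(A, u);                          // r = A u
        for (size_t i = 0; i < m; ++i) r[i] = b[i] - r[i];
        for (size_t j = 0; j < n; ++j) {
            double s = 0.0;                            // v += A^T r
            for (size_t i = 0; i < m; ++i) s += A[i][j] * r[i];
            v[j] += s;
            // Thresholding step: u = delta * shrink(v, mu).
            double a = std::fabs(v[j]) - mu;
            u[j] = (a > 0) ? (v[j] > 0 ? delta * a : -delta * a) : 0.0;
        }
    }
    return u;
}
```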

  11. Equalizer: a scalable parallel rendering framework.

    Science.gov (United States)

    Eilemann, Stefan; Makhinya, Maxim; Pajarola, Renato

    2009-01-01

    Continuing improvements in CPU and GPU performance as well as increasing multi-core processor and cluster-based parallelism demand flexible and scalable parallel rendering solutions that can exploit multipipe hardware-accelerated graphics. In fact, to achieve interactive visualization, scalable rendering systems are essential to cope with the rapid growth of data sets. However, parallel rendering systems are non-trivial to develop and often only application-specific implementations have been proposed. The task of developing a scalable parallel rendering framework is even more difficult if it should be generic to support various types of data and visualization applications, and at the same time work efficiently on a cluster with distributed graphics cards. In this paper we introduce a novel system called Equalizer, a toolkit for scalable parallel rendering based on OpenGL which provides an application programming interface (API) to develop scalable graphics applications for a wide range of systems ranging from large distributed visualization clusters and multi-processor multipipe graphics systems to single-processor single-pipe desktop machines. We describe the system architecture and the basic API, discuss its advantages over previous approaches, and present example configurations and usage scenarios as well as scalability results.

  12. Parallel Nonlinear Optimization for Astrodynamic Navigation, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — CU Aerospace proposes the development of a new parallel nonlinear program (NLP) solver software package. NLPs allow the solution of complex optimization problems,...

  13. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack

    2014-02-04

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(N^d) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r^2 N^d log N). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms and an analogue of a three-dimensional generalized Radon transform were observed to strong-scale from 1 node/16 cores up to 1,024 nodes/16,384 cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  14. Fast parallel event reconstruction

    CERN Document Server

    CERN. Geneva

    2010-01-01

    On-line processing of large data volumes produced in modern HEP experiments requires using the maximum capabilities of modern and future many-core CPU and GPU architectures. One such powerful feature is a SIMD instruction set, which allows packing several data items in one register and operating on all of them at once, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD unit of modern processors, the KF-based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased by a factor of 120,000, to 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores it shows a processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provi...
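
    The SIMD idea the record builds on can be shown in a few lines; this fragment is an unrelated toy (not from the KF tracker sources), using x86 SSE intrinsics in which one 128-bit register carries four floats, so one instruction performs four additions.

```cpp
// Toy SIMD illustration: four float additions per _mm_add_ps instruction.
// Assumes an x86 target with SSE support.
#include <immintrin.h>

void add4(const float* a, const float* b, float* out, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {               // 4 lanes per iteration
        __m128 va = _mm_loadu_ps(a + i);       // pack 4 floats into a register
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];   // scalar remainder
}
```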

  15. Theory of Parallel Mechanisms

    CERN Document Server

    Huang, Zhen; Ding, Huafeng

    2013-01-01

    This book contains mechanism analysis and synthesis. In mechanism analysis, a mobility methodology is first systematically presented. This methodology is based on the author's screw theory, proposed in 1997; its generality and validity were only proved recently. Mobility is a very complex issue, researched by various scientists over the last 150 years. The principle of kinematic influence coefficients and its latest developments are described. This principle is suitable for kinematic analysis of various 6-DOF and lower-mobility parallel manipulators. The singularities are classified from a new point of view, and progress in position-singularity and orientation-singularity is stated. In addition, the concept of over-determinate input is proposed and a new method of force analysis based on screw theory is presented. In mechanism synthesis, the synthesis for spatial parallel mechanisms is discussed, and the synthesis method for difficult 4-DOF and 5-DOF symmetric mechanisms, which was first put forward by the a...

  16. “Everybody Brush!”: Protocol for a Parallel-Group Randomized Controlled Trial of a Family-Focused Primary Prevention Program With Distribution of Oral Hygiene Products and Education to Increase Frequency of Toothbrushing

    Science.gov (United States)

    2015-01-01

    Background Twice daily toothbrushing with fluoridated toothpaste is the most widely advocated preventive strategy for dental caries (tooth decay) and is recommended by professional dental associations. Not all parents, children, or adolescents follow this recommendation. This protocol describes the methods for the implementation and evaluation of a quality improvement health promotion program. Objective The objective of the study is to show a theory-informed, evidence-based program to improve twice daily toothbrushing and oral health-related quality of life that may reduce dental caries, dental treatment need, and costs. Methods The design is a parallel-group, pragmatic randomized controlled trial. Families of Medicaid-insured children and adolescents within a large dental care organization in central Oregon will participate in the trial (n=21,743). Families will be assigned to one of three groups: a test intervention, an active control, or a passive control condition. The intervention aims to address barriers and support for twice-daily toothbrushing. Families in the test condition will receive toothpaste and toothbrushes by mail for all family members every three months. In addition, they will receive education and social support to encourage toothbrushing via postcards, recorded telephone messages, and an optional participant-initiated telephone helpline. Families in the active control condition will receive the kit of supplies by mail, but no additional instructional information or telephone support. Families assigned to the passive control will be on a waiting list. The primary outcomes are restorative dental care received and, only for children younger than 36 months old at baseline, the frequency of twice-daily toothbrushing. Data will be collected through dental claims records and, for children younger than 36 months old at baseline, parent interviews and clinical exams. Results Enrollment of participants and baseline interviews have been completed. Final

  17. Does one size really fit all? The effectiveness of a non-diagnosis-specific integrated mental health care program in Germany in a prospective, parallel-group controlled multi-centre trial.

    Science.gov (United States)

    Mueller-Stierlin, Annabel Sandra; Helmbrecht, Marina Julia; Herder, Katrin; Prinz, Stefanie; Rosenfeld, Nadine; Walendzik, Julia; Holzmann, Marco; Dinc, Uemmueguelsuem; Schützwohl, Matthias; Becker, Thomas; Kilian, Reinhold

    2017-08-01

    The Network for Mental Health (NWpG-IC) is an integrated mental health care program implemented in 2009 by cooperation between health insurance companies and community mental health providers in Germany. Meanwhile about 10,000 patients have been enrolled. This is the first study evaluating the effectiveness of the program in comparison to standard mental health care in Germany. In a parallel-group controlled trial over 18 months conducted in five regions across Germany, a total of 260 patients enrolled in NWpG-IC and 251 patients in standard mental health care (TAU) were recruited between August 2013 and November 2014. The NWpG-IC patients had access to special services such as community-based multi-professional teams, case management, crisis intervention and family-oriented psychoeducation in addition to standard mental health care. The primary outcome empowerment (EPAS) and the secondary outcomes quality of life (WHO-QoL-BREF), satisfaction with psychiatric treatment (CSQ-8), psychosocial and clinical impairment (HoNOS) and information about mental health service needs (CAN) were measured four times at 6-month intervals. Linear mixed-effect regression models were used to estimate the main effects and interaction effects of treatment, time and primary diagnosis. Due to the non-randomised group assignment, propensity score adjustment was used to control the selection bias. NWpG-IC and TAU groups did not differ with respect to most primary and secondary outcomes in our participating patients who showed a broad spectrum of psychiatric diagnoses and illness severities. However, a significant improvement in terms of patients' satisfaction with psychiatric care and their perception of treatment participation in favour of the NWpG-IC group was found. Providing integrated mental health care for unspecific mentally ill target groups increases treatment participation and service satisfaction but seems not suitable to enhance the overall outcomes of mental health care in

  18. Rubus: A compiler for seamless and extensible parallelism.

    Directory of Open Access Journals (Sweden)

    Muhammad Adnan

    Full Text Available Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84
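
    To illustrate what such a compiler automates (this is hand-written OpenMP shown only for contrast; it is not Rubus input or output), compare the sequential loop a programmer writes with its parallelized equivalent:

```cpp
// Hand-written contrast: the commented loop is the sequential form; the
// OpenMP form below is what auto-parallelization effectively derives when
// the loop iterations are independent.
#include <vector>

void scale(std::vector<double>& x, double c) {
    // Sequential: one core walks the whole array.
    //   for (size_t i = 0; i < x.size(); ++i) x[i] *= c;

    // Parallelized equivalent: iterations are split across available cores.
    #pragma omp parallel for
    for (long i = 0; i < (long)x.size(); ++i) x[i] *= c;
}
```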

  19. Rubus: A compiler for seamless and extensible parallelism.

    Science.gov (United States)

    Adnan, Muhammad; Aslam, Faisal; Nawaz, Zubair; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special-purpose processing unit called the Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been

  20. Parallel computing in enterprise modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  1. Porting, parallelization and performance evaluation experiences with massively parallel supercomputing system based on transputer

    Energy Technology Data Exchange (ETDEWEB)

    Fruscione, M.; Stofella, P.; Cleri, F.; Mazzeo, M.; Ornelli, P.; Schiano, P.

    1991-02-01

    This paper describes the most important aspects and results obtained from the porting and parallelization of two programs, VPMC and EULERO, on a Meiko multiprocessor 'Computing Surface' system. The VPMC program was developed by ENEA (the Italian Agency for Energy, New Technologies and the Environment) to simulate travelling electrons. EULERO is a fluid dynamics simulation program, owned by CIRA (Centro Italiano di Ricerche Aereospaziali), which uses it for its aerospace component projects. This report gives short descriptions of the two programs and their parallelization methodologies, and provides a performance evaluation of the Meiko 'Computing Surface' system. Moreover, these performance data are compared with corresponding data obtained with IBM 3090, CRAY and other computers by ENEA and CIRA in their research and development activities.

  2. Parallel community climate model: Description and user's guide

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others]

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  3. An Educational Tool for Interactive Parallel and Distributed Processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2011-01-01

    In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation ... of the abstract problems related to designing interactive parallel and distributed systems. Indeed, MITS seems to bring a series of goals into the education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback...

  4. Inflated speedups in parallel simulations via malloc()

    Science.gov (United States)

    Nicol, David M.

    1990-01-01

    Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented user-constructed interface to malloc() can both avoid artificially inflated speedups and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
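
    A minimal sketch of the size-caching interface the abstract describes might look as follows; the function names are invented, the caller is assumed to know each block's size on release, and thread-safety is omitted for brevity.

```cpp
// Hedged sketch: freed blocks are kept on per-size free lists and reused by
// later requests of the same size class, instead of returning to malloc().
#include <cstdlib>

constexpr int BINS = 32;               // size classes 8, 16, ..., 256 bytes
static void* freelist[BINS];           // head of each size class's list

static int bin_of(size_t sz) { return (int)((sz + 7) / 8) - 1; }

void* sim_alloc(size_t sz) {
    int b = bin_of(sz);
    if (b >= 0 && b < BINS && freelist[b]) {  // reuse a cached block
        void* p = freelist[b];
        freelist[b] = *(void**)p;             // pop: first word links the list
        return p;
    }
    return std::malloc(b >= 0 && b < BINS ? (size_t)(b + 1) * 8 : sz);
}

void sim_free(void* p, size_t sz) {
    int b = bin_of(sz);
    if (b >= 0 && b < BINS) {                 // cache instead of freeing
        *(void**)p = freelist[b];
        freelist[b] = p;
    } else {
        std::free(p);
    }
}
```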

  5. The new landscape of parallel computer architecture

    Science.gov (United States)

    Shalf, John

    2007-07-01

    The past few years have seen a sea change in computer architecture that will impact every facet of our society, as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  6. Parallel Evolutionary Optimization for Neuromorphic Network Training

    Energy Technology Data Exchange (ETDEWEB)

    Schuman, Catherine D [ORNL; Disney, Adam [University of Tennessee (UT); Singh, Susheela [North Carolina State University (NCSU), Raleigh; Bruer, Grant [University of Tennessee (UT); Mitchell, John Parker [University of Tennessee (UT); Klibisz, Aleksander [University of Tennessee (UT); Plank, James [University of Tennessee (UT)

    2016-01-01

    One of the key impediments to the success of current neuromorphic computing architectures is the issue of how best to program them. Evolutionary optimization (EO) is one promising programming technique; in particular, its wide applicability makes it especially attractive for neuromorphic architectures, which can have many different characteristics. In this paper, we explore different facets of EO on a spiking neuromorphic computing model called DANNA. We focus on the performance of EO in the design of our DANNA simulator, and on how to structure EO on both multicore and massively parallel computing systems. We evaluate how our parallel methods impact the performance of EO on Titan, the U.S.'s largest open science supercomputer, and BOB, a Beowulf-style cluster of Raspberry Pis. We also focus on how to improve the EO by evaluating commonality in higher performing neural networks, and present the result of a study that evaluates the EO performed by Titan.

  7. Parallel Polarization State Generation.

    Science.gov (United States)

    She, Alan; Capasso, Federico

    2016-05-17

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by using a digital micromirror device to modulate spatially separated polarization components of a laser, which are subsequently beam-combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics, with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  8. Parallel imaging microfluidic cytometer.

    Science.gov (United States)

    Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. Copyright © 2011 Elsevier Inc. All rights reserved.

  9. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    Science.gov (United States)

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Power stability methods for parallel systems

    International Nuclear Information System (INIS)

    Wallach, Y.

    1988-01-01

    Parallel-Processing Systems are already commercially available. This paper shows that if one of them - the Alternating Sequential Parallel, or ASP system - is applied to network stability calculations it will lead to a higher speed of solution. The ASP system is first described and is then shown to be cheaper, more reliable and available than other parallel systems. Also, no deadlock need be feared and the speedup is normally very high. A number of ASP systems were already assembled (the SMS systems, Topps, DIRMU etc.). At present, an IBM Local Area Network is being modified so that it too can work in the ASP mode. Existing ASP systems were programmed in Fortran or assembly language. Since newer systems (e.g. DIRMU) are programmed in Modula-2, this language can be used. Stability analysis is based on solving nonlinear differential and algebraic equations. The algorithm for solving the nonlinear differential equations on ASP, is described and programmed in Modula-2. The speedup is computed and is shown to be almost optimal

  11. CRBLASTER: A Parallel-Processing Computational Framework for Embarrassingly-Parallel Image-Analysis Algorithms

    Science.gov (United States)

    Mighell, Kenneth John

    2011-11-01

    The development of parallel-processing image-analysis codes is generally a challenging task that requires complicated choreography of interprocessor communications. If, however, the image-analysis algorithm is embarrassingly parallel, then the development of a parallel-processing implementation of that algorithm can be a much easier task to accomplish because, by definition, there is little need for communication between the compute processes. I describe the design, implementation, and performance of a parallel-processing image-analysis application, called CRBLASTER, which does cosmic-ray rejection of CCD (charge-coupled device) images using the embarrassingly-parallel L.A.COSMIC algorithm. CRBLASTER is written in C using the high-performance computing industry standard Message Passing Interface (MPI) library. The code has been designed to be used by research scientists who are familiar with C as a parallel-processing computational framework that enables the easy development of parallel-processing image-analysis programs based on embarrassingly-parallel algorithms. The CRBLASTER source code is freely available at the official application website at the National Optical Astronomy Observatory. Removing cosmic rays from a single 800x800 pixel Hubble Space Telescope WFPC2 image takes 44 seconds with the IRAF script lacos_im.cl running on a single core of an Apple Mac Pro computer with two 2.8-GHz quad-core Intel Xeon processors. CRBLASTER is 7.4 times faster processing the same image on a single core on the same machine. Processing the same image with CRBLASTER simultaneously on all 8 cores of the same machine takes 0.875 seconds - which is a speedup factor of 50.3 times faster than the IRAF script. A detailed analysis is presented of the performance of CRBLASTER using between 1 and 57 processors on a low-power Tilera 700-MHz 64-core TILE64 processor.
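
    The essence of this embarrassingly parallel approach can be captured in a few lines of MPI. In the hedged sketch below, an image is scattered by row bands, each rank cleans its band independently, and the bands are gathered back; the filter itself is an invented stand-in, not the L.A.COSMIC algorithm, and the code assumes the number of ranks divides the image height.

        /* Embarrassingly parallel image filter skeleton: scatter row bands,
         * filter locally with no inter-rank communication, gather results. */
        #include <mpi.h>
        #include <stdlib.h>

        #define W 800
        #define H 800

        static void reject_cosmic_rays(float *band, int rows) {
            for (int i = 0; i < rows * W; i++)
                if (band[i] > 1.0e4f) band[i] = 0.0f; /* placeholder threshold */
        }

        int main(int argc, char **argv) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int rows = H / size;                    /* rows per rank */
            float *image = NULL;
            float *band = malloc((size_t)rows * W * sizeof *band);
            if (rank == 0) image = calloc((size_t)W * H, sizeof *image);

            MPI_Scatter(image, rows * W, MPI_FLOAT,
                        band,  rows * W, MPI_FLOAT, 0, MPI_COMM_WORLD);
            reject_cosmic_rays(band, rows);         /* all-local work */
            MPI_Gather(band,  rows * W, MPI_FLOAT,
                       image, rows * W, MPI_FLOAT, 0, MPI_COMM_WORLD);

            free(band); free(image);
            MPI_Finalize();
            return 0;
        }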

  12. Parallel Framework for Cooperative Processes

    Directory of Open Access Journals (Sweden)

    Mitică Craus

    2005-01-01

    Full Text Available This paper describes the work of an object oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and it should be possible to split the work between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO) parallel algorithm for the Travelling Salesman Problem (TSP) and an Image Processing (IP) parallel algorithm for the Symmetrical Neighborhood Filter (SNF). The implementations of these applications by means of the parallel framework prove to have good performance: approximately linear speedup and low communication cost.

  13. A high-speed linear algebra library with automatic parallelism

    Science.gov (United States)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting the highest performance out of workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  14. Parallel Monte Carlo reactor neutronics

    International Nuclear Information System (INIS)

    Blomquist, R.N.; Brown, F.B.

    1994-01-01

    The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved

  15. A Parallel Particle Swarm Optimizer

    National Research Council Canada - National Science Library

    Schutte, J. F; Fregly, B .J; Haftka, R. T; George, A. D

    2003-01-01

    .... Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based global optimizer, the Particle Swarm...

  16. Performance studies of the parallel VIM code

    International Nuclear Information System (INIS)

    Shi, B.; Blomquist, R.N.

    1996-01-01

    In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects in performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large scale continuous energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and the p4 message passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 microsec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by others' jobs
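
    A minimal sketch of this master/worker pattern is given below: workers request work, and the master replies with either a chunk of histories or a stop message, so faster or less loaded workers are naturally handed more chunks. The tags, chunk size, and transport routine are illustrative assumptions, not details of VIM.

        /* Master/worker dynamic load balancing over MPI: the master deals
         * out history chunks on demand until the total workload is done. */
        #include <mpi.h>

        #define TAG_WORK 1
        #define TAG_STOP 2
        #define CHUNK    1000

        static void run_histories(int n) { (void)n; /* placeholder transport */ }

        int main(int argc, char **argv) {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            if (rank == 0) {                          /* master */
                int remaining = 1000000, dummy, active = size - 1;
                MPI_Status st;
                while (active > 0) {
                    MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE,
                             MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                    if (remaining > 0) {              /* hand out a chunk */
                        int n = remaining < CHUNK ? remaining : CHUNK;
                        remaining -= n;
                        MPI_Send(&n, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                                 MPI_COMM_WORLD);
                    } else {                          /* tell worker to quit */
                        MPI_Send(&dummy, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                                 MPI_COMM_WORLD);
                        active--;
                    }
                }
            } else {                                  /* worker */
                int n = 0;
                MPI_Status st;
                for (;;) {
                    MPI_Send(&n, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
                    MPI_Recv(&n, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                    if (st.MPI_TAG == TAG_STOP) break;
                    run_histories(n);                 /* process the chunk */
                }
            }
            MPI_Finalize();
            return 0;
        }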

  17. Parallelized reliability estimation of reconfigurable computer networks

    Science.gov (United States)

    Nicol, David M.; Das, Subhendu; Palumbo, Dan

    1990-01-01

    A parallelized system, ASSURE, for computing the reliability of embedded avionics flight control systems which are able to reconfigure themselves in the event of failure is described. ASSURE accepts a grammar that describes a reliability semi-Markov state-space. From this it creates a parallel program that simultaneously generates and analyzes the state-space, placing upper and lower bounds on the probability of system failure. ASSURE is implemented on a 32-node Intel iPSC/860, and has achieved high processor efficiencies on real problems. Through a combination of improved algorithms, exploitation of parallelism, and use of an advanced microprocessor architecture, ASSURE has reduced the execution time on substantial problems by a factor of one thousand over previous workstation implementations. Furthermore, ASSURE's parallel execution rate on the iPSC/860 is an order of magnitude faster than its serial execution rate on a Cray-2 supercomputer. While dynamic load balancing is necessary for ASSURE's good performance, it is needed only infrequently; the particular method of load balancing used does not substantially affect performance.

  18. Logical inference techniques for loop parallelization

    KAUST Repository

    Oancea, Cosmin E.

    2012-01-01

    This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = Ø, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = Ø). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECTCLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers. Copyright © 2012 ACM.
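
    As a much-simplified illustration of deferring an independence check to run time, the sketch below tests a sufficient condition (all write indices distinct) before executing a loop with indirect indexing in parallel. The names and the OpenMP fallback strategy are assumptions made for illustration, not the USR/predicate machinery of the paper.

        /* Run-time sufficient-independence test: if no index in idx[]
         * repeats, the writes a[idx[i]] are pairwise independent and the
         * loop may run in parallel; otherwise fall back to sequential. */
        #include <stdbool.h>
        #include <stdlib.h>

        static bool writes_independent(const int *idx, int n, int range) {
            bool *seen = calloc((size_t)range, sizeof *seen);
            bool ok = true;
            for (int i = 0; i < n && ok; i++) {
                if (seen[idx[i]]) ok = false;  /* duplicate index: dependence */
                seen[idx[i]] = true;
            }
            free(seen);
            return ok;
        }

        void update(double *a, const int *idx, const double *v, int n, int range) {
            if (writes_independent(idx, n, range)) {
                #pragma omp parallel for        /* safe: indices are distinct */
                for (int i = 0; i < n; i++) a[idx[i]] += v[i];
            } else {
                for (int i = 0; i < n; i++) a[idx[i]] += v[i];  /* sequential */
            }
        }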

  19. Parallelization and automatic data distribution for nuclear reactor simulations

    Energy Technology Data Exchange (ETDEWEB)

    Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  20. Parallelization and automatic data distribution for nuclear reactor simulations

    International Nuclear Information System (INIS)

    Liebrock, L.M.

    1997-01-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed

  1. Truly nested data-parallelism: compiling SaC for the Microgrid architecture

    NARCIS (Netherlands)

    Herhut, S.; Joslin, C.; Scholz, S.-B.; Grelck, C.; Morazan, M.

    2009-01-01

    Data-parallel programming facilitates elegant specification of concurrency. However, the composability of data-parallel operations so far has been constrained by the requirement to have only flat data-parallel operations at runtime. In this paper, we present early results on our work to exploit

  2. The construction of parallel tests from IRT-based item banks

    NARCIS (Netherlands)

    Boekkooi-Timminga, Ellen

    1989-01-01

    The construction of parallel tests from item response theory (IRT) based item banks is discussed. Tests are considered parallel whenever their information functions are identical. After the methods for constructing parallel tests are considered, the computational complexity of 0-1 linear programming

  3. A solution for automatic parallelization of sequential assembly code

    Directory of Open Access Journals (Sweden)

    Kovačević Đorđe

    2013-01-01

    Full Text Available Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Relying on existing algorithms, this paper describes a new software tool for parallelization of sequential assembly code. The main goal of this paper is to develop a parallelizator which reads sequential assembler code and outputs parallelized code for a MIPS processor with multiple cores. The idea is the following: the parser translates the assembler input file to program objects suitable for further processing. After that, static single assignment is performed. Based on the data flow graph, the parallelization algorithm distributes instructions among the different cores. Once the sequential code is parallelized, registers are allocated with the algorithm for linear allocation, and the end result is distributed assembler code for each of the cores. In the paper we evaluate the speedup of a matrix multiplication example processed by the parallelizator of assembly code. The result is an almost linear speedup of code execution, which increases with the number of cores: the speedup on two cores is 1.99, while on 16 cores it is 13.88.

  4. Parallelizing Monte Carlo with PMC

    International Nuclear Information System (INIS)

    Rathkopf, J.A.; Jones, T.R.; Nessett, D.M.; Stanberry, L.C.

    1994-11-01

    PMC (Parallel Monte Carlo) is a system of generic interface routines that allows easy porting of Monte Carlo packages of large-scale physics simulation codes to Massively Parallel Processor (MPP) computers. By loading various versions of PMC, simulation code developers can configure their codes to run in several modes: serial, Monte Carlo runs on the same processor as the rest of the code; parallel, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on other MPP processor(s); distributed, Monte Carlo runs in parallel across many processors of the MPP with the rest of the code running on a different machine. This multi-mode approach allows maintenance of a single simulation code source regardless of the target machine. PMC handles passing of messages between nodes on the MPP, passing of messages between a different machine and the MPP, distributing work between nodes, and providing independent, reproducible sequences of random numbers. Several production codes have been parallelized under the PMC system. Excellent parallel efficiency in both the distributed and parallel modes results if sufficient workload is available per processor. Experiences with a Monte Carlo photonics demonstration code and a Monte Carlo neutronics package are described
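
    The sketch below illustrates one way such a mode-selectable interface can be organized in C: the simulation calls Monte Carlo only through a small table of function pointers, and linking or selecting a different table switches between serial and parallel modes without touching the simulation source. All names are invented; this is not the actual PMC interface.

        /* Generic Monte Carlo interface selected at configuration time. */
        typedef struct {
            void (*begin_batch)(int nhist);            /* run a batch */
            void (*collect_tallies)(double *t, int n); /* gather results */
        } mc_interface;

        /* Serial mode: everything runs on the local processor. */
        static void serial_begin(int nhist)        { (void)nhist; /* ... */ }
        static void serial_collect(double *t, int n) { (void)t; (void)n; }

        static const mc_interface serial_mode = { serial_begin, serial_collect };

        /* A parallel mode would supply the same two entry points but
         * distribute histories across MPI ranks and reduce the tallies;
         * the simulation below is unchanged either way. */
        void simulate(const mc_interface *mc) {
            double tallies[64] = {0};
            mc->begin_batch(100000);
            mc->collect_tallies(tallies, 64);
        }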

  5. Parallel context-free languages

    DEFF Research Database (Denmark)

    Skyum, Sven

    1974-01-01

    The relation between the family of context-free languages and the family of parallel context-free languages is examined in this paper. It is proved that the families are incomparable. Finally we prove that the family of languages of finite index is contained in the family of parallel context...

  6. Parallel pseudospectral domain decomposition techniques

    Science.gov (United States)

    Gottlieb, David; Hirsch, Richard S.

    1989-01-01

    The influence of interface boundary conditions on the ability to parallelize pseudospectral multidomain algorithms is investigated. Using the properties of spectral expansions, a novel parallel two domain procedure is generalized to an arbitrary number of domains each of which can be solved on a separate processor. This interface boundary condition considerably simplifies influence matrix techniques.

  7. Design Patterns: establishing a discipline of parallel software engineering

    CERN Document Server

    CERN. Geneva

    2010-01-01

    Many-core processors present us with a software challenge. We must turn our serial code into parallel code. To accomplish this wholesale transformation of our software ecosystem, we must define what established practice is in parallel programming and then develop tools to support that practice. This leads to design patterns supported by frameworks optimized at runtime with advanced autotuning compilers. In this talk I provide an update of my ongoing research with the ParLab at UC Berkeley to realize this vision. In particular, I will describe our draft parallel pattern language, our early experiments with software frameworks, and the associated runtime optimization tools. About the speaker: Tim Mattson is a parallel programmer (Ph.D. Chemistry, UCSC, 1985). He does linear algebra, finds oil, shakes molecules, solves differential equations, and models electrons in simple atomic systems. He has spent his career working with computer scientists to make sure the needs of parallel applications programmers are met. Tim has ...

  8. Finite element electromagnetic field computation on the Sequent Symmetry 81 parallel computer

    International Nuclear Information System (INIS)

    Ratnajeevan, S.; Hoole, H.

    1990-01-01

    Finite element field analysis algorithms lend themselves to parallelization and this fact is exploited in this paper to implement a finite element analysis program for electromagnetic field computation on the Sequent Symmetry 81 parallel computer with three processors. In terms of waiting time, the maximum gains are to be made in matrix solution and therefore this paper concentrates on the gains in parallelizing the solution part of finite element analysis. An outline of how parallelization could be exploited in most finite element operations is given in this paper although the actual implementation of parallelism on the Sequent Symmetry 81 parallel computer was in sparsity computation, matrix assembly and the matrix solution areas. In all cases, the algorithms were modified to suit the parallel programming application rather than allowing the compiler to parallelize the existing algorithms

  9. Parallelization of the FLAPW method

    International Nuclear Information System (INIS)

    Canning, A.; Mannstadt, W.; Freeman, A.J.

    1999-01-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer

  10. Parallel Scientific Computing in C++ and MPI

    Science.gov (United States)

    Karniadakis, George Em; Kirby, Robert M., II

    2003-06-01

    This book provides a seamless approach to numerical algorithms, modern programming techniques and parallel computing. These concepts and tools are usually taught serially across different courses and different textbooks, thus obscuring the connection between them. The necessity of integrating these subjects usually comes after such courses are concluded (e.g., during a first job or a thesis project), thus forcing the student to synthesize what is perceived to be three independent subfields into one in order to produce a solution. The book includes both basic and advanced topics and places equal emphasis on the discretization of partial differential equations and on solvers. Advanced topics include wavelets, high-order methods, non-symmetric systems and parallelization of sparse systems. A CD-ROM accompanies the text.

  11. Massively parallel sparse matrix function calculations with NTPoly

    Science.gov (United States)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
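
    The polynomial-expansion idea is easy to state for a small dense matrix: p(A) is evaluated with Horner's rule using only matrix multiplications and diagonal shifts, which is exactly the structure that sparse, communication-avoiding parallel versions exploit. The toy below is a dense, serial illustration only, not NTPoly's distributed sparse implementation.

        /* Evaluate p(A) = c[d] A^d + ... + c[1] A + c[0] I by Horner's rule:
         * start with P = c[d] I, then repeat P = P*A + c[k] I for k = d-1..0. */
        #include <string.h>

        #define N 3

        static void matmul(double A[N][N], double B[N][N], double C[N][N]) {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++) {
                    C[i][j] = 0.0;
                    for (int k = 0; k < N; k++) C[i][j] += A[i][k] * B[k][j];
                }
        }

        void matrix_poly(double A[N][N], double *c, int d, double P[N][N]) {
            memset(P, 0, sizeof(double) * N * N);
            for (int i = 0; i < N; i++) P[i][i] = c[d];  /* P = c[d] I */
            for (int k = d - 1; k >= 0; k--) {
                double T[N][N];
                matmul(P, A, T);                          /* T = P * A   */
                memcpy(P, T, sizeof T);
                for (int i = 0; i < N; i++) P[i][i] += c[k]; /* + c[k] I */
            }
        }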

  12. Directions in parallel processor architecture, and GPUs too

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  13. Programming

    International Nuclear Information System (INIS)

    Jackson, M.A.

    1982-01-01

    The programmer's task is often taken to be the construction of algorithms, expressed in hierarchical structures of procedures: this view underlies the majority of traditional programming languages, such as Fortran. A different view is appropriate to a wide class of problem, perhaps including some problems in High Energy Physics. The programmer's task is regarded as having three main stages: first, an explicit model is constructed of the reality with which the program is concerned; second, this model is elaborated to produce the required program outputs; third, the resulting program is transformed to run efficiently in the execution environment. The first two stages deal in network structures of sequential processes; only the third is concerned with procedure hierarchies. (orig.)

  14. Programming

    CERN Document Server

    Jackson, M A

    1982-01-01

    The programmer's task is often taken to be the construction of algorithms, expressed in hierarchical structures of procedures: this view underlies the majority of traditional programming languages, such as Fortran. A different view is appropriate to a wide class of problem, perhaps including some problems in High Energy Physics. The programmer's task is regarded as having three main stages: first, an explicit model is constructed of the reality with which the program is concerned; second, this model is elaborated to produce the required program outputs; third, the resulting program is transformed to run efficiently in the execution environment. The first two stages deal in network structures of sequential processes; only the third is concerned with procedure hierarchies.

  15. An integrated approach to improving the parallel applications development process

    Energy Technology Data Exchange (ETDEWEB)

    Rasmussen, Craig E [Los Alamos National Laboratory; Watson, Gregory R [IBM; Tibbitts, Beth R [IBM

    2009-01-01

    The development of parallel applications is becoming increasingly important to a broad range of industries. Traditionally, parallel programming was a niche area that was primarily exploited by scientists trying to model extremely complicated physical phenomena. It is becoming increasingly clear, however, that continued hardware performance improvements through clock scaling and feature-size reduction are simply not going to be achievable for much longer. The hardware vendors' approach to addressing this issue is to employ parallelism through multi-processor and multi-core technologies. While there is little doubt that this approach produces scaling improvements, there are still many significant hurdles to be overcome before parallelism can be employed as a general replacement for more traditional programming techniques. The Parallel Tools Platform (PTP) Project was created in 2005 in an attempt to provide developers with new tools aimed at addressing some of the parallel development issues. Since then, the introduction of a new generation of peta-scale and multi-core systems has highlighted the need for such a platform. In this paper, we describe some of the challenges facing parallel application developers, present the current state of PTP, and provide a simple case study that demonstrates how PTP can be used to locate a potential deadlock situation in an MPI code.
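
    The kind of problem such tooling targets is easy to reproduce. The sketch below shows a classic potential deadlock that an analysis tool can flag: both ranks issue a blocking send before any receive, so if the MPI implementation does not buffer the messages (which the standard does not guarantee), both ranks block forever. This is a generic example, not the case study from the paper; swapping the send/receive order on one rank, or using MPI_Sendrecv, removes the hazard.

        /* Potential deadlock: two ranks, both sending first. */
        #include <mpi.h>

        int main(int argc, char **argv) {
            int rank, out = 42, in;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            int peer = 1 - rank;               /* assumes exactly two ranks */

            /* Both ranks may block here waiting for a matching receive. */
            MPI_Send(&out, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(&in, 1, MPI_INT, peer, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

            MPI_Finalize();
            return 0;
        }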

  16. High-energy physics software parallelization using database techniques

    International Nuclear Information System (INIS)

    Argante, E.; Van der Stok, P.D.V.; Willers, I.

    1997-01-01

    A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)

  17. Parallel H.263 Encoder in Normal Coding Mode

    OpenAIRE

    Cosmas, J; Paker, Y; Pearmain, A

    1998-01-01

    A parallel H.263 video encoder, which utilises spatial parallelism, has been modelled using a multi-threaded program. Spatial parallelism is a technique where an image is subdivided into equal parts (as far as physically possible) and each part is processed by a separate processor by computing motion and texture coding, with all processors each acting on a different part of the image. This method leads to a performance increase, which is roughly in proportion to the number ...

  18. Template based parallel checkpointing in a massively parallel computer system

    Energy Technology Data Exchange (ETDEWEB)

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
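
    A sketch of the template comparison step might look like the following: the checkpoint is cut into fixed-size blocks, each block is checksummed, and only blocks whose checksum differs from the stored template are marked for transmission, rsync-style. The checksum and block size here are illustrative choices, not details of the patented method.

        /* Mark checkpoint blocks that differ from the template's checksums;
         * only those blocks need to be transmitted and stored. */
        #include <stddef.h>
        #include <stdint.h>

        #define BLOCK 4096

        /* Simple Adler-32-style checksum (illustrative choice). */
        static uint32_t checksum(const uint8_t *p, size_t n) {
            uint32_t a = 1, b = 0;
            for (size_t i = 0; i < n; i++) {
                a = (a + p[i]) % 65521;
                b = (b + a) % 65521;
            }
            return (b << 16) | a;
        }

        /* Returns the number of changed blocks; changed[i] is set per block. */
        size_t diff_blocks(const uint8_t *ckpt, const uint32_t *template_sums,
                           size_t nblocks, int *changed) {
            size_t n = 0;
            for (size_t i = 0; i < nblocks; i++) {
                changed[i] = checksum(ckpt + i * BLOCK, BLOCK) != template_sums[i];
                n += (size_t)changed[i];
            }
            return n;
        }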

  19. Balanced, parallel operation of flashlamps

    International Nuclear Information System (INIS)

    Carder, B.M.; Merritt, B.T.

    1979-01-01

    A new energy store, the Compensated Pulsed Alternator (CPA), promises to be a cost effective substitute for capacitors to drive flashlamps that pump large Nd:glass lasers. Because the CPA is large and discrete, it will be necessary that it drive many parallel flashlamp circuits, presenting a problem in equal current distribution. Current division to ±20% between parallel flashlamps has been achieved, but this is marginal for laser pumping. A method is presented here that provides equal current sharing to about 1%, and it includes fused protection against short circuit faults. The method was tested with eight parallel circuits, including both open-circuit and short-circuit fault tests

  20. Parallel implementation of electronic structure energy, gradient, and Hessian calculations

    Science.gov (United States)

    Lotrich, V.; Flocke, N.; Ponton, M.; Yau, A. D.; Perera, A.; Deumens, E.; Bartlett, R. J.

    2008-05-01

    ACES III is a newly written program in which the computationally demanding components of the computational chemistry code ACES II [J. F. Stanton et al., Int. J. Quantum Chem. 526, 879 (1992); ACES II program system, University of Florida, 1994] have been redesigned and implemented in parallel. The high-level algorithms include Hartree-Fock (HF) self-consistent field (SCF), second-order many-body perturbation theory [MBPT(2)] energy, gradient, and Hessian, and coupled cluster singles, doubles, and perturbative triples [CCSD(T)] energy and gradient. For SCF, MBPT(2), and CCSD(T), both restricted HF and unrestricted HF reference wave functions are available. For MBPT(2) gradients and Hessians, a restricted open-shell HF reference is also supported. The methods are programmed in a special language designed for the parallelization project. The language is called super instruction assembly language (SIAL). The design uses an extreme form of object-oriented programming. All compute intensive operations, such as tensor contractions and diagonalizations, all communication operations, and all input-output operations are handled by a parallel program written in C and FORTRAN 77. This parallel program, called the super instruction processor (SIP), interprets and executes the SIAL program. By separating the algorithmic complexity (in SIAL) from the complexities of execution on computer hardware (in SIP), a software system is created that allows for very effective optimization and tuning on different hardware architectures with quite manageable effort.

  1. "Feeling" Series and Parallel Resistances.

    Science.gov (United States)

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)

  2. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.

  3. Metal structures with parallel pores

    Science.gov (United States)

    Sherfey, J. M.

    1976-01-01

    Four methods of fabricating metal plates having uniformly sized parallel pores are studied: elongate bundle, wind and sinter, extrude and sinter, and corrugate stack. Such plates are suitable for electrodes for electrochemical and fuel cells.

  4. Seeing or moving in parallel

    DEFF Research Database (Denmark)

    Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo

    2013-01-01

    The underlying neural mechanisms of a perceptual bias for in-phase bimanual coordination movements are not well understood. In the present study, we measured brain activity with functional magnetic resonance imaging in healthy subjects during a task, where subjects performed bimanual index finger adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral

  5. Parallel Processing and Scientific Applications

    Science.gov (United States)

    1992-11-30

    the algorithm have computational complexity O(N) and be scalable. Multigrid Methods: Among the known scalable algorithms for elliptic PDE solution, the ... close to linear in N and scales linearly in P. However multigrid methods have a particular difficulty on parallel machines in that they do not have ... This inherent disadvantage of multigrid methods has led to the search for truly parallel MG methods - multigrid methods that can utilize all processors

  6. Parallel artificial liquid membrane extraction

    DEFF Research Database (Denmark)

    Gjelstad, Astrid; Rasmussen, Knut Einar; Parmer, Marthe Petrine

    2013-01-01

    This paper reports development of a new approach towards analytical liquid-liquid-liquid membrane extraction termed parallel artificial liquid membrane extraction. A donor plate and acceptor plate create a sandwich, in which each sample (human plasma) and acceptor solution is separated by an artificial liquid membrane. Parallel artificial liquid membrane extraction is a modification of hollow-fiber liquid-phase microextraction, where the hollow fibers are replaced by flat membranes in a 96-well plate format.

  7. A Survey of Parallel A*

    OpenAIRE

    Fukunaga, Alex; Botea, Adi; Jinnai, Yuu; Kishimoto, Akihiro

    2017-01-01

    A* is a best-first search algorithm for finding optimal-cost paths in graphs. A* benefits significantly from parallelism because in many applications, A* is limited by memory usage, so distributed memory implementations of A* that use all of the aggregate memory on the cluster enable problems that cannot be solved by serial, single-machine implementations to be solved. We survey approaches to parallel A*, focusing on decentralized approaches to A* which partition the state space among proces...
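
    In the decentralized schemes such surveys cover, a common partitioning device is a hash function over states: every process expands only the states it owns and forwards newly generated states to their owners. The sketch below shows the owner computation for a hypothetical two-integer state using an FNV-style hash; all names and the state type are invented for illustration.

        /* Hash-based state-space partitioning: the owning rank of a state
         * is derived from a hash of the state's contents. */
        #include <stdint.h>

        typedef struct { int x, y; } state;   /* hypothetical puzzle state */

        static uint64_t hash_state(state s) {
            uint64_t h = 1469598103934665603ull;          /* FNV-1a offset */
            h = (h ^ (uint64_t)(uint32_t)s.x) * 1099511628211ull;
            h = (h ^ (uint64_t)(uint32_t)s.y) * 1099511628211ull;
            return h;
        }

        /* The rank that owns (and will expand) this state. */
        int owner(state s, int nprocs) {
            return (int)(hash_state(s) % (uint64_t)nprocs);
        }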

  8. Lattice gauge theory using parallel processors

    International Nuclear Information System (INIS)

    Lee, T.D.; Chou, K.C.; Zichichi, A.

    1987-01-01

    The book's contents include: Lattice Gauge Theory Lectures: Introduction and Current Fermion Simulations; Monte Carlo Algorithms for Lattice Gauge Theory; Specialized Computers for Lattice Gauge Theory; Lattice Gauge Theory at Finite Temperature: A Monte Carlo Study; Computational Method - An Elementary Introduction to the Langevin Equation, Present Status of Numerical Quantum Chromodynamics; Random Lattice Field Theory; The GF11 Processor and Compiler; and The APE Computer and First Physics Results; Columbia Supercomputer Project: Parallel Supercomputer for Lattice QCD; Statistical and Systematic Errors in Numerical Simulations; Monte Carlo Simulation for LGT and Programming Techniques on the Columbia Supercomputer; Food for Thought: Five Lectures on Lattice Gauge Theory

  9. Parallel computation of seismic analysis of high arch dam

    Science.gov (United States)

    Chen, Houqun; Ma, Huaifa; Tu, Jin; Cheng, Guangqing; Tang, Juzhen

    2008-03-01

    Parallel computation programs are developed for three-dimensional meso-mechanics analysis of fully-graded dam concrete and seismic response analysis of high arch dams (ADs), based on the Parallel Finite Element Program Generator (PFEPG). The computational algorithms of the numerical simulation of the meso-structure of concrete specimens were studied. Taking into account damage evolution, static preload, strain rate effect, and the heterogeneity of the meso-structure of dam concrete, the fracture processes of damage evolution and configuration of the cracks can be directly simulated. In the seismic response analysis of ADs, all the following factors are involved, such as the nonlinear contact due to the opening and slipping of the contraction joints, energy dispersion of the far-field foundation, dynamic interactions of the dam-foundation-reservoir system, and the combining effects of seismic action with all static loads. The correctness, reliability and efficiency of the two parallel computational programs are verified with practical illustrations.

  10. Parallel Robot for Lower Limb Rehabilitation Exercises

    Directory of Open Access Journals (Sweden)

    Alireza Rastegarpanah

    2016-01-01

    Full Text Available The aim of this study is to investigate the capability of a 6-DoF parallel robot to perform various rehabilitation exercises. The foot trajectories of twenty healthy participants were measured by a Vicon system while they performed four different exercises. Based on the kinematics and dynamics of a parallel robot, a MATLAB program was developed to calculate the length of the actuators, the actuators' forces, the workspace, and the singularity locus of the robot during the exercises. The calculated length of the actuators and the actuators' forces were used by motion analysis in SolidWorks in order to simulate different foot trajectories with the CAD model of the robot. A physical parallel robot prototype was built in order to simulate and execute the foot trajectories of the participants. A Kinect camera was used to track the motion of the leg's model placed on the robot. The results demonstrate the robot's capability to perform a full range of various rehabilitation exercises.
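
    For a Stewart-type 6-DoF platform, the actuator-length computation reduces to a distance per leg: with platform pose (rotation R, translation t), leg i has length |t + R p_i - b_i| for platform joint p_i and base joint b_i. The C sketch below shows this inverse-kinematics step under that assumed geometry; it is a generic illustration, not the authors' MATLAB code.

        /* Inverse kinematics for one actuator of a Stewart-type platform:
         * leg length = |t + R*p - b| for platform joint p and base joint b. */
        #include <math.h>

        double leg_length(const double R[3][3], const double t[3],
                          const double p[3], const double b[3]) {
            double d[3];
            for (int i = 0; i < 3; i++) {
                d[i] = t[i] - b[i];                     /* translation part */
                for (int j = 0; j < 3; j++)
                    d[i] += R[i][j] * p[j];             /* rotated joint    */
            }
            return sqrt(d[0]*d[0] + d[1]*d[1] + d[2]*d[2]);
        }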

  11. A multitransputer parallel processing system (MTPPS)

    International Nuclear Information System (INIS)

    Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

    1993-01-01

    This report describes the design and implementation of a 16 node Multi Transputer Parallel Processing System (MTPPS), which is a platform for parallel program development. It is a MIMD machine based on the message-passing paradigm. The basic compute engine is an Inmos Transputer IMS T800-20. A transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software-controlled network configuration. The system is modularly expandable and more NODES can be added to achieve the required processing power. The system is a backend to the IBM-PC, which has been integrated into the system to provide the user I/O interface. PC resources are available to the programmer. Interface hardware between the PC and the network of transputers is INMOS compatible; therefore, all the commercially available development software compatible with INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD architectures, transputer architecture and parallel processing software development issues. LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problems have been discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes

  12. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  13. A Parallel Approach to Fractal Image Compression

    OpenAIRE

    Lubomir Dedera

    2004-01-01

    The paper deals with a parallel approach to coding and decoding algorithms in fractal image compression and presents experimental results comparing sequential and parallel algorithms from the point of view of both achieved coding and decoding time and the effectiveness of parallelization.

  14. Automatic parallelization of while-Loops using speculative execution

    International Nuclear Information System (INIS)

    Collard, J.F.

    1995-01-01

    Automatic parallelization of imperative sequential programs has focused on nests of for-loops. The most recent techniques consist in finding an affine mapping with respect to the loop indices to simultaneously capture the temporal and spatial properties of the parallelized program. Such a mapping is usually called a "space-time transformation." This work describes an extension of these techniques to while-loops using speculative execution. We show that space-time transformations are a good framework for summing up previous restructuring techniques for while-loops, such as pipelining. Moreover, we show that these transformations can be derived and applied automatically

  15. Parallel Implicit Algorithms for CFD

    Science.gov (United States)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.

  16. CRBLASTER: A Parallel-Processing Computational Framework for Embarrassingly Parallel Image-Analysis Algorithms

    Science.gov (United States)

    Mighell, Kenneth John

    2010-10-01

    The development of parallel-processing image-analysis codes is generally a challenging task that requires complicated choreography of interprocessor communications. If, however, the image-analysis algorithm is embarrassingly parallel, then the development of a parallel-processing implementation of that algorithm can be a much easier task to accomplish because, by definition, there is little need for communication between the compute processes. I describe the design, implementation, and performance of a parallel-processing image-analysis application, called crblaster, which does cosmic-ray rejection of CCD images using the embarrassingly parallel l.a.cosmic algorithm. crblaster is written in C using the high-performance computing industry standard Message Passing Interface (MPI) library. crblaster uses a two-dimensional image partitioning algorithm that partitions an input image into N rectangular subimages of nearly equal area; the subimages include sufficient additional pixels along common image partition edges such that the need for communication between computer processes is eliminated. The code has been designed to be used by research scientists who are familiar with C as a parallel-processing computational framework that enables the easy development of parallel-processing image-analysis programs based on embarrassingly parallel algorithms. The crblaster source code is freely available at the official application Web site at the National Optical Astronomy Observatory. Removing cosmic rays from a single 800 × 800 pixel Hubble Space Telescope WFPC2 image takes 44 s with the IRAF script lacos_im.cl running on a single core of an Apple Mac Pro computer with two 2.8 GHz quad-core Intel Xeon processors. crblaster is 7.4 times faster when processing the same image on a single core on the same machine. Processing the same image with crblaster simultaneously on all eight cores of the same machine takes 0.875 s—which is a speedup factor of 50.3 times faster than the

  17. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA, with the aim of significantly shortening the code's execution time. Selected routines were parallelized using the OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)
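
    In C, the OpenMP route typically amounts to annotating independent hot loops. A generic sketch of the kind of change involved (the loop body is hypothetical, not one of DIRA's actual routines; compile with an OpenMP flag such as -fopenmp):

      /* One of the independent hot loops: scale every pixel of the current
       * image estimate by a correction factor. Iterations are independent,
       * so a single pragma parallelizes the routine. */
      void apply_correction(double *img, const double *corr, int npix)
      {
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < npix; ++i)
              img[i] *= corr[i];
      }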

  18. Algorithms for computational fluid dynamics on parallel processors

    International Nuclear Information System (INIS)

    Van de Velde, E.F.

    1986-01-01

    A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries.

  19. Contribution to the algorithmic and efficient programming of new parallel architectures including accelerators for neutron physics and shielding computations

    Energy Technology Data Exchange (ETDEWEB)

    Dubois, J.

    2011-10-13

    In science, simulation is a key process for research or validation. Modern computer technology allows faster numerical experiments, which are cheaper than real models. In the field of neutron simulation, the calculation of eigenvalues is one of the key challenges. The complexity of these problems is such that a lot of computing power may be necessary. The work of this thesis is, first, the evaluation of new computing hardware such as graphics cards or massively multi-core chips, and its application to eigenvalue problems for neutron simulation. Next, in order to exploit the massive parallelism of national supercomputers, we also study the use of asynchronous hybrid methods for solving eigenvalue problems at this very high level of parallelism. We then test the results of this research on several national supercomputers, such as the Titane hybrid machine of the Computing Center, Research and Technology (CCRT), the Curie machine of the Very Large Computing Centre (TGCC), currently being installed, and the Hopper machine at the Lawrence Berkeley National Laboratory (LBNL). We also run our experiments on local workstations to illustrate the value of this research for everyday use with local computing resources. (author)

  20. Evaluating parallel optimization on transputers

    Directory of Open Access Journals (Sweden)

    A.G. Chalmers

    2003-12-01

    The faster processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved by utilising relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented, which indicate under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.

  1. Parallelizing Timed Petri Net simulations

    Science.gov (United States)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  2. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated.

  3. Parallel Signal Processing and System Simulation using aCe

    Science.gov (United States)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language, aCe, is ideally suited for parallel signal processing applications and system simulation, since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, this new C-based parallel language for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures while providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we focus on some fundamental features of aCe and present a signal processing application (FFT).

  4. Test generation for digital circuits using parallel processing

    Science.gov (United States)

    Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.

    1990-12-01

    The problem of test generation for digital logic circuits is an NP-Hard problem. Recently, the availability of low cost, high performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.
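
    The merge step the abstract mentions is cheap when each of the 15 logic values occupies one bit of a mask; intersecting candidate value sets then reduces to a bitwise AND. A hypothetical illustration in C (my encoding, not the report's actual representation):

      #include <stdint.h>

      /* Hypothetical encoding: each of the 15 logic values is one bit of a
       * mask, so a mask represents the set of values a signal may take.
       * Partial results computed by distinct processors or processes are
       * merged by a plain set intersection. */
      typedef uint16_t ValueSet;                 /* bits 0..14 in use */

      static inline ValueSet merge_results(ValueSet a, ValueSet b)
      {
          return (ValueSet)(a & b);              /* empty set => conflict */
      }

    An empty intersection signals that the partial assignments are inconsistent, so the corresponding candidate test can be discarded.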

  5. Two applications of parallel processing in power system computation

    Energy Technology Data Exchange (ETDEWEB)

    Lemaitre, C.; Thomas, B. [Electricite de France, 92 - Clamart (France). Research and Development Div.

    1996-12-31

    Performance improvements achieved in two power system software modules through the use of parallel processing techniques are discussed. The first software module, EVARISTE, outputs a voltage stability indicator for various power system situations. The second module, MEXICO, assesses power system reliability and operating costs by simulating a large number of contingencies for generation and transmission equipment. Both software modules are well suited to coarse-grain parallel processing. The first module was parallelized on a distributed-memory machine and the second on a shared-memory machine. A description of the parallelization process used in these two cases is presented, and details on the performance levels achieved are discussed, including aspects of programming, parameter selection, and machine characteristics. (author) 7 refs.

  6. The science of computing - The evolution of parallel processing

    Science.gov (United States)

    Denning, P. J.

    1985-01-01

    The present paper is concerned with the approaches to be employed to overcome the set of limitations in software technology which impedes currently an effective use of parallel hardware technology. The process required to solve the arising problems is found to involve four different stages. At the present time, Stage One is nearly finished, while Stage Two is under way. Tentative explorations are beginning on Stage Three, and Stage Four is more distant. In Stage One, parallelism is introduced into the hardware of a single computer, which consists of one or more processors, a main storage system, a secondary storage system, and various peripheral devices. In Stage Two, parallel execution of cooperating programs on different machines becomes explicit, while in Stage Three, new languages will make parallelism implicit. In Stage Four, there will be very high level user interfaces capable of interacting with scientists at the same level of abstraction as scientists do with each other.

  7. Optimal dynamic remapping of data parallel computations

    Science.gov (United States)

    Nicol, David M.; Reynolds, Paul F., Jr.

    1990-01-01

    A large class of data parallel computations is characterized by a sequence of phases, with phase changes occurring unpredictably. Dynamic remapping of the workload to processors may be required to maintain good performance. The problem considered, for which the utility of remapping and the future behavior of the workload are uncertain, arises when execution requirements are stable within a given phase but change radically between phases. In these situations, a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases. The authors address the fundamental problem of balancing the expected remapping performance gain against the delay cost, and they derive the optimal remapping decision policy. The promise of the approach is shown by application to multiprocessor implementations of an adaptive gridding fluid dynamics program and to a battlefield simulation program.

  8. ICASE Computer Science Program

    Science.gov (United States)

    1985-01-01

    The Institute for Computer Applications in Science and Engineering computer science program is discussed in outline form. Information is given on such topics as problem decomposition, algorithm development, programming languages, and parallel architectures.

  9. Parallelization of elliptic solver for solving 1D Boussinesq model

    Science.gov (United States)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by applying a staggered-grid scheme to the continuity, momentum, and elliptic equations of the model. The tridiagonal system arising from the numerical scheme for the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared-memory architectures using OpenMP. To measure the performance of the parallel program, the number of grid points is varied from 2^8 to 2^14. Two numerical test cases, propagation of a solitary wave and of a standing wave, are used to evaluate the parallel program. The numerical results are verified against analytical solutions for the solitary and standing waves. The best speedup is about 2.07 for the solitary wave case with 2^14 grid points and 1.86 for the standing wave case with 2^13 grid points, both obtained using 8 threads. Moreover, the best efficiency of the parallel program is 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
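
    Cyclic reduction parallelizes because every equation on a given elimination level is independent of the others, and each level halves the number of active equations, giving O(log n) parallel steps. A minimal OpenMP sketch for a tridiagonal system with n = 2^m - 1 unknowns (my illustration, not the authors' code; compile with an OpenMP flag such as -fopenmp):

      /* Solve A x = d for tridiagonal A with n = 2^m - 1 unknowns;
       * a = subdiagonal, b = main diagonal, c = superdiagonal.
       * Coefficient arrays are overwritten during the reduction. */
      void cyclic_reduction(int n, double *a, double *b, double *c,
                            double *d, double *x)
      {
          /* Forward elimination: equations on one level are independent. */
          for (int step = 1; step < (n + 1) / 2; step *= 2) {
              #pragma omp parallel for
              for (int i = 2 * step - 1; i < n; i += 2 * step) {
                  double alpha = -a[i] / b[i - step];
                  double gamma = -c[i] / b[i + step];
                  b[i] += alpha * c[i - step] + gamma * a[i + step];
                  d[i] += alpha * d[i - step] + gamma * d[i + step];
                  a[i]  = alpha * a[i - step];
                  c[i]  = gamma * c[i + step];
              }
          }
          /* Back substitution, again level-parallel; out-of-range
           * neighbours contribute zero. */
          for (int step = (n + 1) / 2; step >= 1; step /= 2) {
              #pragma omp parallel for
              for (int i = step - 1; i < n; i += 2 * step) {
                  double xm = (i - step >= 0) ? x[i - step] : 0.0;
                  double xp = (i + step < n) ? x[i + step] : 0.0;
                  x[i] = (d[i] - a[i] * xm - c[i] * xp) / b[i];
              }
          }
      }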

  10. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm³) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm²) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  11. Cellular automata a parallel model

    CERN Document Server

    Mazoyer, J

    1999-01-01

    Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.

  12. Parallel FFT using Eden Skeletons

    DEFF Research Database (Denmark)

    Berthold, Jost; Dieterle, Mischa; Lobachev, Oleg

    2009-01-01

    The paper investigates and compares skeleton-based Eden implementations of different FFT-algorithms on workstation clusters with distributed memory. Our experiments show that the basic divide-and-conquer versions suffer from an inherent input distribution and result collection problem. Advanced approaches like calculating the FFT using a parallel map-and-transpose skeleton provide more flexibility to overcome these problems. Assuming distributed access to input data and re-organising the computation to return results in a distributed way improves the parallel runtime behaviour.

  13. Applications of the Aurora parallel Prolog system to computational molecular biology

    Energy Technology Data Exchange (ETDEWEB)

    Lusk, E.L.; Overbeek, R. [Argonne National Lab., IL (United States); Mudambi, S. [Knox Coll., Galesburg, IL (United States); Szeredi, P. [IQSOFT, Budapest (Hungary)

    1993-09-01

    We describe an investigation into the use of the Aurora parallel Prolog system in two applications within the area of computational molecular biology. The computational requirements were large, due to the nature of the applications, and the computations were carried out on a scalable parallel computer, the BBN "Butterfly" TC-2000. Results include both a demonstration that logic programming can be effective in the context of demanding applications on large-scale parallel machines, and some insights into parallel programming in Prolog.

  14. Leveraging Parallel Data Processing Frameworks with Verified Lifting

    Directory of Open Access Journals (Sweden)

    Maaz Bin Safeer Ahmad

    2016-11-01

    Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting, tedious and error-prone, also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language, which is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and also scale better to larger datasets.

  15. An Educational Tool for Interactive Parallel and Distributed Processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2011-01-01

    In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation of the abstract problems related to designing interactive parallel and distributed systems. Indeed, MITS seems to bring a series of goals into the education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, user and multiuser interaction, which can hardly be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples show how to implement interactive...

  16. An educational tool for interactive parallel and distributed processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2012-01-01

    In this article we try to describe how the modular interactive tiles system (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing a hands-on educational tool that allows a change in the representation of abstract problems related to designing interactive parallel and distributed systems. Indeed, the MITS seems to bring a series of goals into education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, and user and multi-user interaction which can rarely be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples we show how to implement...

  17. Experiments with parallel algorithms for combinatorial problems

    NARCIS (Netherlands)

    G.A.P. Kindervater (Gerard); H.W.J.M. Trienekens

    1985-01-01

    In the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines

  18. [Falsified medicines in parallel trade].

    Science.gov (United States)

    Muckenfuß, Heide

    2017-11-01

    The number of falsified medicines on the German market has distinctly increased over the past few years. In particular, stolen pharmaceutical products, a form of falsified medicines, have increasingly been introduced into the legal supply chain via parallel trading. The reasons why parallel trading serves as a gateway for falsified medicines are most likely the complex supply chains and routes of transport. It is hardly possible for national authorities to trace the history of a medicinal product that was bought and sold by several intermediaries in different EU member states. In addition, the heterogeneous outward appearance of imported and relabelled pharmaceutical products facilitates the introduction of illegal products onto the market. Official batch release at the Paul-Ehrlich-Institut offers the possibility of checking some aspects that might provide an indication of a falsified medicine. In some circumstances, this may allow the identification of falsified medicines before they come onto the German market. However, this control is only possible for biomedicinal products that have not received a waiver regarding official batch release. For improved control of parallel trade, better networking among the EU member states would be beneficial. Europe-wide regulations, e.g., for disclosure of the complete supply chain, would help to minimise the risks of parallel trading and hinder the marketing of falsified medicines.

  19. Parallel Sparse Matrix - Vector Product

    DEFF Research Database (Denmark)

    Alexandersen, Joe; Lazarov, Boyan Stefanov; Dammann, Bernd

    This technical report contains a case study of a sparse matrix-vector product routine, implemented for parallel execution on a compute cluster with both pure MPI and hybrid MPI-OpenMP solutions. C++ classes for sparse data types were developed, and the report shows how these classes can be used...
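
    For reference, the kernel at the heart of such a study, in CSR (compressed sparse row) storage, parallelizes naturally over rows. A generic C sketch with OpenMP (my illustration, not the report's C++ classes; in the hybrid setting each MPI rank would apply the same kernel to its local block of rows):

      /* CSR sparse matrix-vector product y = A x; rows are independent,
       * so the outer loop is parallelized across threads. */
      void spmv_csr(int nrows, const int *rowptr, const int *colidx,
                    const double *vals, const double *x, double *y)
      {
          #pragma omp parallel for schedule(static)
          for (int i = 0; i < nrows; ++i) {
              double sum = 0.0;
              for (int k = rowptr[i]; k < rowptr[i + 1]; ++k)
                  sum += vals[k] * x[colidx[k]];
              y[i] = sum;
          }
      }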

  20. Elongation Cutoff Technique: Parallel Performance

    Directory of Open Access Journals (Sweden)

    Jacek Korchowiec

    2008-01-01

    It is demonstrated that the elongation cutoff technique (ECT) substantially speeds up the quantum-chemical calculation at the Hartree-Fock (HF) level of theory and is especially well suited for parallel performance. A comparison of ECT timings for water chains with the reference HF calculations is given. The analysis includes the overall CPU (central processing unit) time and its most time-consuming steps.

  1. Massively parallel quantum computer simulator

    NARCIS (Netherlands)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray

  2. Lightweight Specifications for Parallel Correctness

    Science.gov (United States)

    2012-12-05

    that typically no programmer specification is needed. However, manually determining which reported races are benign and which are bugs can be time... marked by the programmer, but no statement is enclosed with if(true*); i.e., the set S* is empty. We are given a set T of parallel execution traces, and

  3. Where are the parallel algorithms?

    Science.gov (United States)

    Voigt, R. G.

    1985-01-01

    Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected.

  4. Parallel imaging with phase scrambling.

    Science.gov (United States)

    Zaitsev, Maxim; Schultz, Gerrit; Hennig, Juergen; Gruetter, Rolf; Gallichan, Daniel

    2015-04-01

    Most existing methods for accelerated parallel imaging in MRI require additional data, which are used to derive information about the sensitivity profile of each radiofrequency (RF) channel. In this work, a method is presented to avoid the acquisition of separate coil calibration data for accelerated Cartesian trajectories. Quadratic phase is imparted to the image to spread the signals in k-space (aka phase scrambling). By rewriting the Fourier transform as a convolution operation, a window can be introduced to the convolved chirp function, allowing a low-resolution image to be reconstructed from phase-scrambled data without prominent aliasing. This image (for each RF channel) can be used to derive coil sensitivities to drive existing parallel imaging techniques. As a proof of concept, the quadratic phase was applied by introducing an offset to the x²-y² shim and the data were reconstructed using adapted versions of the image space-based sensitivity encoding and GeneRalized Autocalibrating Partially Parallel Acquisitions algorithms. The method is demonstrated in a phantom (1 × 2, 1 × 3, and 2 × 2 acceleration) and in vivo (2 × 2 acceleration) using a 3D gradient echo acquisition. Phase scrambling can be used to perform parallel imaging acceleration without acquisition of separate coil calibration data, demonstrated here for a 3D-Cartesian trajectory. Further research is required to prove the applicability to other 2D and 3D sampling schemes. © 2014 Wiley Periodicals, Inc.
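
    The identity behind this, stated loosely and in my own notation (the paper works in two and three dimensions): multiplying the image by a quadratic phase turns its Fourier transform into a convolution with a chirp, which can then be windowed. For a 1D signal s(x) with chirp rate a > 0,

      \hat{s}_a(k) = \int s(x)\, e^{i a x^2}\, e^{-i k x}\, dx
                   = \frac{1}{2\pi}\,\bigl(\hat{s} * h_a\bigr)(k),
      \qquad
      h_a(k) = \int e^{i a x^2}\, e^{-i k x}\, dx
             = \sqrt{\tfrac{\pi}{a}}\; e^{i\pi/4}\; e^{-i k^2/(4a)} .

    Truncating the chirp kernel h_a to a finite window is what permits the low-resolution, alias-suppressed reconstruction used to estimate the coil sensitivities.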

  5. Matpar: Parallel Extensions for MATLAB

    Science.gov (United States)

    Springer, P. L.

    1998-01-01

    Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

  6. PEDANT: parallel texts in Göteborg | Ridings | Lexikos

    African Journals Online (AJOL)

    ... and standard applications and SGML-aware programming tools. The SGML format for encoding translation pairs is outlined together. The methods allow working with everything from plain text to texts densely encoded with linguistic information. Keywords: sgml, parallel corpora, morphosyntactic encoding, lemmatization, ...

  7. Scaling up machine learning: parallel and distributed approaches

    National Research Council Canada - National Science Library

    Bekkerman, Ron; Bilenko, Mikhail; Langford, John

    2012-01-01

    ... presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters; concurrent programming frameworks that include CUDA, MPI, MapReduce, and DryadLINQ; and various learning settings: supervised, unsupervised, semi-supervised, and online learning. Extensive coverage of parallelizat...

  8. Analytical and experimental investigation of a parallel leaf spring guidance

    NARCIS (Netherlands)

    Meijaard, Jacob Philippus; Brouwer, Dannis Michel; Jonker, Jan B.

    2010-01-01

    The consequences for the static and dynamic system behaviour of misalignment in an overconstrained direction are analysed. Therefore, a relatively simple parallel leaf spring guidance, which is overdetermined only once, serves as a case to gain insight. A multibody program using flexible beam theory

  9. The Impact of Parallel Processing on Operating Systems

    Directory of Open Access Journals (Sweden)

    Felician ALECU

    2009-07-01

    The base entity in computer programming is the process or task. Parallelism can be achieved by executing multiple processes on different processors. Distributed systems are managed by distributed operating systems, which extend multitasking and multiprogramming operating systems to multiprocessor architectures.

  10. Using a Parallel Transcript/Subtitle Corpus for Sentence Compression

    NARCIS (Netherlands)

    Vandeghinste, V.; Tjong Kim Sang, E.F.

    2004-01-01

    In this paper we describe the collection of a parallel corpus (in Dutch) and its use in a sentence compression tool with the intention to automatically generate subtitles for the deaf from transcripts of a television program. First, the collection of the corpus is described, together with the

  11. Parallelization method for three dimensional MOC calculation

    International Nuclear Information System (INIS)

    Zhang Zhizhu; Li Qing; Wang Kan

    2013-01-01

    A parallelization method based on angular decomposition was designed for the three-dimensional MOC. To improve parallel efficiency, the directions are pre-grouped and the groups assembled so as to minimize communication. The improved parallelization method was applied to the three-dimensional MOC code TCM. The numerical results show that the parallel calculation agrees with the serial calculation, and that parallel efficiency increases markedly after communication optimization and load balancing. (authors)

  12. SPRINT: a new parallel framework for R.

    Science.gov (United States)

    Hill, Jon; Hambley, Matthew; Forster, Thorsten; Mewissen, Muriel; Sloan, Terence M; Scharinger, Florian; Trew, Arthur; Ghazal, Peter

    2008-12-29

    Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest. We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors this computation reduces by more than three times the time R takes to complete it on one processor. SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use and with further development will become more useful as more functions are added to the framework.

  13. SPRINT: A new parallel framework for R

    Directory of Open Access Journals (Sweden)

    Scharinger Florian

    2008-12-01

    Background: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest. Results: We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors this computation reduces by more than three times the time R takes to complete it on one processor. Conclusion: SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use and with further development will become more useful as more

  14. Automatic Parallelization Using OpenMP Based on STL Semantics

    Energy Technology Data Exchange (ETDEWEB)

    Liao, C; Quinlan, D J; Willcock, J J; Panas, T

    2008-06-03

    Automatic parallelization of sequential applications using OpenMP as a target has been attracting significant attention recently because of the popularity of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high level abstractions such as STL containers are largely ignored due to the lack of research compilers that are readily able to recognize high level object-oriented abstractions of STL. In this paper, we use ROSE, a multiple-language source-to-source compiler infrastructure, to build a parallelizer that can recognize such high level semantics and parallelize C++ applications using certain STL containers. The idea of our work is to automatically insert OpenMP constructs using extended conventional dependence analysis and the known domain-specific semantics of high-level abstractions with optional assistance from source code annotations. In addition, the parallelizer is followed by an OpenMP translator to translate the generated OpenMP programs into multi-threaded code targeted to a popular OpenMP runtime library. Our work extends the applicability of automatic parallelization and provides another way to take advantage of multicore processors.

  15. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high-performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators were also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow the study of higher particle energies with the use of more accurate physical models, and improve statistics, as more particle tracks can be simulated in a short response time.
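
    The overall pattern of such a parallelization is easy to sketch: each rank simulates an independent share of the histories with its own random-number stream, and the tallies are combined by a reduction. The C/MPI sketch below is my illustration, not MC4 itself (per-rank seeding is shown with POSIX rand_r for brevity, where MC4 uses SPRNG or DCMT):

      #include <mpi.h>
      #include <stdio.h>
      #include <stdlib.h>

      int main(int argc, char **argv)
      {
          int rank, size;
          long histories = 1000000L;                /* hypothetical total */
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          /* Independent stream per rank; real codes use SPRNG/DCMT. */
          unsigned int seed = 12345u + 17u * (unsigned)rank;
          long my_n = histories / size + (rank < histories % size ? 1 : 0);
          double local = 0.0, total = 0.0;
          for (long i = 0; i < my_n; ++i) {
              double u = rand_r(&seed) / (double)RAND_MAX;
              local += u;                           /* stand-in for one history */
          }
          MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
          if (rank == 0)
              printf("mean score = %f\n", total / histories);
          MPI_Finalize();
          return 0;
      }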

  16. Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers

    Directory of Open Access Journals (Sweden)

    David W. Washington

    2004-06-01

    In this research a computer program, Trubal version 1.51, based on the Discrete Element Method, was converted to run on a Connection Machine (CM-5), a massively parallel supercomputer with 512 nodes, to expedite the computational times of simulating geotechnical boundary value problems. The dynamic memory algorithm in the Trubal program did not perform efficiently on the CM-2 machine with its Single Instruction Multiple Data (SIMD) architecture. This was due to the communication overhead involving global array reductions, global array broadcasts and random data movement. Therefore, the dynamic memory algorithm in the Trubal program was converted to a static memory arrangement, and the program was successfully converted to run on CM-5 machines. The converted program was called "TRUBAL for Parallel Machines (TPM)." Simulating two physical triaxial experiments and comparing the simulation results with Trubal simulations validated the TPM program. With a 512-node CM-5 machine, TPM produced a nine-fold speedup, demonstrating the inherent parallelism within algorithms based on the Discrete Element Method.

  17. Hydraulic study of parallel channels coupled to recirculation loops

    International Nuclear Information System (INIS)

    Campos G, R. M.; Cecenas F, M.

    2009-10-01

    In this work, a model of the recirculation loops is integrated, allowing each loop to be characterized separately and making it possible to analyze events such as recirculation pump trips or their transfer from high to low speed. The recirculation model is coupled to a model of 36 parallel channels that represents the core of a BWR. Because the reactor core consists of fuel assemblies physically arranged in parallel, a parallel implementation of the complete model follows naturally: there are 36 channel tasks plus two further tasks that compute the recirculation and point kinetics, respectively. As an initial test of the system, which is still under development, a trip of both recirculation pumps was analyzed. In this transient only the hydraulic behavior is verified; the power is imposed artificially as a boundary condition that is a function of the core flow calculated by the recirculation model. The thermal-hydraulic channel model and the recirculation loops are programmed in C; the neutronics model is programmed in Fortran 77. The simulations were run on an AlphaStation DS20E workstation under Unix using the Parallel Virtual Machine system, which allows a heterogeneous collection of networked computers to work as a single virtual parallel computer. (Author)
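
    The task structure lends itself to a master-worker layout under PVM. The sketch below is a hypothetical master in C (the task name, message tags, and values are all illustrative, not the actual code): it spawns the 36 channel tasks, sends each the core-flow boundary condition, and gathers the channel results.

      #include "pvm3.h"

      #define NCH 36                        /* one task per core channel */

      int main(void)
      {
          int tids[NCH];
          double core_flow = 14000.0;       /* illustrative value       */
          double result[NCH];

          /* Spawn the channel tasks ("channel_task" is hypothetical). */
          pvm_spawn("channel_task", NULL, PvmTaskDefault, NULL, NCH, tids);
          for (int i = 0; i < NCH; ++i) {
              pvm_initsend(PvmDataDefault);
              pvm_pkdouble(&core_flow, 1, 1);
              pvm_send(tids[i], 1);         /* msgtag 1: boundary condition */
          }
          for (int i = 0; i < NCH; ++i) {
              pvm_recv(tids[i], 2);         /* msgtag 2: channel solution   */
              pvm_upkdouble(&result[i], 1, 1);
          }
          pvm_exit();
          return 0;
      }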

  18. High Performance Input/Output for Parallel Computer Systems

    Science.gov (United States)

    Ligon, W. B.

    1996-01-01

    The goal of our project is to study the I/O characteristics of parallel applications used in Earth Science data processing systems such as Regional Data Centers (RDCs) or EOSDIS. Our approach is to study the runtime behavior of typical programs and the effect of key parameters of the I/O subsystem both under simulation and with direct experimentation on parallel systems. Our three year activity has focused on two items: developing a test bed that facilitates experimentation with parallel I/O, and studying representative programs from the Earth science data processing application domain. The Parallel Virtual File System (PVFS) has been developed for use on a number of platforms including the Tiger Parallel Architecture Workbench (TPAW) simulator, the Intel Paragon, a cluster of DEC Alpha workstations, and the Beowulf system (at CESDIS). PVFS provides considerable flexibility in configuring I/O in a UNIX-like environment. Access to key performance parameters facilitates experimentation. We have studied several key applications from levels 1, 2, and 3 of the typical RDC processing scenario including instrument calibration and navigation, image classification, and numerical modeling codes. We have also considered large-scale scientific database codes used to organize image data.

  19. A Novel Parallel Algorithm for Edit Distance Computation

    Directory of Open Access Journals (Sweden)

    Muhammad Murtaza Yousaf

    2018-01-01

    The edit distance between two sequences is the minimum number of weighted transformation operations required to transform one string into the other. The weighted transformation operations are insert, remove, and substitute. A dynamic programming solution to find the edit distance exists, but it becomes computationally intensive when the lengths of the strings become very large. This work presents a novel parallel algorithm to solve the edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem, and it is able to compute each row of the edit distance table in parallel. In this way, it becomes possible to compute the complete table in min(m, n) iterations for strings of size m and n, whereas the state-of-the-art parallel algorithm solves the problem in max(m, n) iterations. The proposed algorithm also increases the amount of parallelism in each of its iterations. The algorithm is also capable of exploiting spatial locality in its implementation. Additionally, the algorithm works in a load-balanced way that further improves its performance. The algorithm is implemented for multicore systems having shared memory. An OpenMP implementation of the algorithm shows linear speedup and better execution time as compared to the state-of-the-art parallel approach. The efficiency of the algorithm is also shown to be better than that of its competitor.
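
    The dependency structure being resolved here is the classic DP recurrence. A common, coarser way to parallelize it is by anti-diagonals, since all cells on one anti-diagonal are mutually independent; the minimal OpenMP sketch below is my own illustration of that wavefront baseline (m+n-1 parallel steps), not the paper's row-parallel algorithm.

      #include <stdlib.h>
      #include <string.h>

      /* Wavefront edit distance: each anti-diagonal of the DP table is one
       * parallel step because its cells depend only on earlier diagonals. */
      int edit_distance(const char *s, const char *t)
      {
          int m = (int)strlen(s), n = (int)strlen(t);
          int *D = malloc((size_t)(m + 1) * (n + 1) * sizeof *D);

          for (int i = 0; i <= m; ++i) D[i * (n + 1)] = i;   /* first column */
          for (int j = 0; j <= n; ++j) D[j] = j;             /* first row    */

          for (int diag = 2; diag <= m + n; ++diag) {
              int lo = diag - n > 1 ? diag - n : 1;
              int hi = diag - 1 < m ? diag - 1 : m;
              #pragma omp parallel for
              for (int i = lo; i <= hi; ++i) {
                  int j = diag - i;
                  int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
                  int best = D[(i - 1) * (n + 1) + (j - 1)] + cost; /* substitute */
                  int del  = D[(i - 1) * (n + 1) + j] + 1;         /* remove     */
                  int ins  = D[i * (n + 1) + (j - 1)] + 1;         /* insert     */
                  if (del < best) best = del;
                  if (ins < best) best = ins;
                  D[i * (n + 1) + j] = best;
              }
          }
          int result = D[m * (n + 1) + n];
          free(D);
          return result;
      }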

  20. Optical flow optimization using parallel genetic algorithm

    Science.gov (United States)

    Zavala-Romero, Olmo; Botella, Guillermo; Meyer-Bäse, Anke; Meyer Base, Uwe

    2011-06-01

    A new approach to optimizing the parameters of a gradient-based optical flow model using a parallel genetic algorithm (GA) is proposed. The main characteristics of the optical flow algorithm are its bio-inspiration and robustness against contrast, static patterns and noise, besides working consistently with several optical illusions where other algorithms fail. This model depends on many parameters, which determine the number of channels, the orientations required, and the length and shape of the kernel functions used in the convolution stage, among many more. The GA is used to find a set of parameters which improve the accuracy of the optical flow on inputs where the ground-truth data are available. This set of parameters helps in understanding which of them are better suited for each type of input and can be used to estimate the parameters of the optical flow algorithm when used with videos that share similar characteristics. The proposed implementation takes into account the embarrassingly parallel nature of the GA and uses the OpenMP Application Programming Interface (API) to speed up the process of estimating an optimal set of parameters. The information obtained in this work can be used to dynamically reconfigure systems, with potential applications in robotics, medical imaging and tracking.
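
    The embarrassingly parallel part is the fitness evaluation: every individual in the population is scored independently. A hypothetical C sketch of that step (names are mine; optical_flow_error() stands in for a run of the flow model against ground-truth data):

      #define POP 64        /* population size (illustrative)           */
      #define NPARAM 8      /* parameters per individual (illustrative) */

      extern double optical_flow_error(const double *params);  /* assumed */

      /* Score every individual in parallel; dynamic scheduling balances
       * the load when evaluation times differ between individuals. */
      void evaluate_population(double pop[POP][NPARAM], double fitness[POP])
      {
          #pragma omp parallel for schedule(dynamic)
          for (int i = 0; i < POP; ++i)
              fitness[i] = optical_flow_error(pop[i]);
      }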

  1. Empirical study of parallel LRU simulation algorithms

    Science.gov (United States)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.

  2. Parallelizing SLPA for Scalable Overlapping Community Detection

    Directory of Open Access Journals (Sweden)

    Konstantin Kuzmin

    2015-01-01

    Communities in networks are groups of nodes whose connections within the community are stronger than their connections to the rest of the network. Quite often nodes participate in multiple communities; that is, communities can overlap. In this paper, we first analyze what other researchers have done to utilize high performance computing to perform efficient community detection in social, biological, and other networks. We note that detection of overlapping communities is more computationally intensive than disjoint community detection, and the former presents new challenges that algorithm designers have to face. Moreover, the efficiency of many existing algorithms grows superlinearly with the network size, making them unsuitable for processing large datasets. We use the Speaker-Listener Label Propagation Algorithm (SLPA) as the basis for our parallel overlapping community detection implementation. SLPA provides near linear time overlapping community detection and is well suited for parallelization. We explore the benefits of a multithreaded programming paradigm and show that it yields a significant performance gain over sequential execution while preserving the high quality of community detection. The algorithm was tested on four real-world datasets with up to 5.5 million nodes and 170 million edges. In order to assess the quality of community detection, at least 4 different metrics were used for each of the datasets.

  3. A parallel solver for huge dense linear systems

    Science.gov (United States)

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

    HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that facilitates the parallel solution of very large dense systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage secondary memory in order to solve huge linear systems of order 100 000. The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes, and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summary: Program title: Huge Dense System Solver (HDSS). Catalogue identifier: AEHU_v1_1. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 87 062. No. of bytes in distributed program, including test data, etc.: 1 069 110. Distribution format: tar.gz. Programming language: Fortran90, C. Computer: Parallel architectures: multiprocessors, computer clusters. Operating system

  4. Exploiting multi-level parallelism in streaming applications for heterogeneous platforms with GPUs

    OpenAIRE

    Balevic, Ana

    2013-01-01

    Heterogeneous computing platforms support the traditional types of parallelism, such as e.g., instruction-level, data, task, and pipeline parallelism, and provide the opportunity to exploit a combination of different types of parallelism at different platform levels. The architectural diversity of platform components makes tapping into the platform potential a challenging programming task. This thesis makes an important step in this direction by introducing a novel methodology for automatic g...

  5. Parallel Materialization of Large ABoxes.

    Science.gov (United States)

    Narayanan, Sivaramakrishnan; Catalyurek, Umit; Kurc, Tahsin; Saltz, Joel

    2009-01-01

    This paper is concerned with the efficient computation of materialization in a knowledge base with a large ABox. We present a framework for performing this task on a shared-nothing parallel machine. The framework partitions TBox and ABox axioms using a min-min strategy. It utilizes an existing system, like SwiftOWLIM, to perform local inference computations and coordinates exchange of relevant information between processors. Our approach is able to exploit parallelism in the axioms of the TBox to achieve speedup in a cluster. However, this approach is limited by the complexity of the TBox. We present an experimental evaluation of the framework using datasets from the Lehigh University Benchmark (LUBM).

  6. Structural synthesis of parallel robots

    CERN Document Server

    Gogu, Grigore

    This book represents the fifth part of a larger work dedicated to the structural synthesis of parallel robots. The originality of this work resides in the fact that it combines new formulae for mobility, connectivity, redundancy and overconstraints with evolutionary morphology in a unified structural synthesis approach that yields interesting and innovative solutions for parallel robotic manipulators.  This is the first book on robotics that presents solutions for coupled, decoupled, uncoupled, fully-isotropic and maximally regular robotic manipulators with Schönflies motions systematically generated by using the structural synthesis approach proposed in Part 1.  Overconstrained non-redundant/overactuated/redundantly actuated solutions with simple/complex limbs are proposed. Many solutions are presented here for the first time in the literature. The author had to make a difficult and challenging choice between protecting these solutions through patents and releasing them directly into the public domain. T...

  7. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues

  8. High-speed parallel counter

    International Nuclear Information System (INIS)

    Gus'kov, B.N.; Kalinnikov, V.A.; Krastev, V.R.; Maksimov, A.N.; Nikityuk, N.M.

    1985-01-01

    This paper describes a high-speed parallel counter that contains 31 inputs and 15 outputs and is implemented by integrated circuits of series 500. The counter is designed for fast sampling of events according to the number of particles that pass simultaneously through the hodoscopic plane of the detector. The minimum delay of the output signals relative to the input is 43 nsec. The duration of the output signals can be varied from 75 to 120 nsec

  9. An anthropologist in parallel structure

    Directory of Open Access Journals (Sweden)

    Noelle Molé Liston

    2016-08-01

    Full Text Available The essay examines the parallels between Molé Liston's studies on labor and precarity in Italy and the United States' anthropology job market. Probing the way the economic shift reshaped the field of the anthropology of Europe in the late 2000s, the piece explores how the neoliberalization of the American academy increased the value of studying the hardships and daily lives of non-western populations in Europe.

  10. New algorithms for parallel MRI

    International Nuclear Information System (INIS)

    Anzengruber, S; Ramlau, R; Bauer, F; Leitao, A

    2008-01-01

    Magnetic Resonance Imaging with parallel data acquisition requires algorithms for reconstructing the patient's image from a small number of measured lines of the Fourier domain (k-space). In contrast to well-known algorithms like SENSE and GRAPPA and their flavors, we treat the problem as a non-linear inverse problem. In order to avoid cost-intensive derivatives we use Landweber-Kaczmarz iteration, and in order to improve the overall results we add sparsity constraints.
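
    The record above mentions Landweber-Kaczmarz iteration with additional sparsity constraints but gives no algorithmic detail. As a rough illustration only (not the authors' method), the sketch below runs a plain Landweber iteration for a generic linear inverse problem Ax = y, with an optional soft-thresholding step standing in for a sparsity constraint; the matrix and all parameter choices are hypothetical.

        import numpy as np

        def landweber(A, y, steps=200, tau=None, lam=0.0):
            # Landweber update: x <- x + tau * A^T (y - A x); converges for
            # 0 < tau < 2 / ||A||^2. With lam > 0, soft-thresholding promotes
            # sparsity (an illustrative stand-in, not the paper's scheme).
            if tau is None:
                tau = 1.0 / np.linalg.norm(A, 2) ** 2
            x = np.zeros(A.shape[1])
            for _ in range(steps):
                x = x + tau * A.T @ (y - A @ x)
                if lam > 0.0:
                    x = np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
            return x

        # toy use: recover a sparse vector from an underdetermined system
        rng = np.random.default_rng(1)
        A = rng.standard_normal((40, 100))
        x_true = np.zeros(100)
        x_true[[3, 17, 42]] = [1.0, -2.0, 1.5]
        x_rec = landweber(A, A @ x_true, steps=2000, lam=1e-3)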

  11. Wakefield calculations on parallel computers

    International Nuclear Information System (INIS)

    Schoessow, P.

    1990-01-01

    The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs

  12. Combinatorics of spreads and parallelisms

    CERN Document Server

    Johnson, Norman

    2010-01-01

    Contents include: Partitions of Vector Spaces; Quasi-Subgeometry Partitions; Finite Focal-Spreads; Generalizing André Spreads; The Going Up Construction for Focal-Spreads; Subgeometry Partitions; Subgeometry and Quasi-Subgeometry Partitions; Subgeometries from Focal-Spreads; Extended André Subgeometries; Kantor's Flag-Transitive Designs; Maximal Additive Partial Spreads; Subplane Covered Nets and Baer Groups; Partial Desarguesian t-Parallelisms; Direct Products of Affine Planes; Jha-Johnson SL(2,...

  13. Parallel processing of genomics data

    Science.gov (United States)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, has made it possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per experiment, so the analysis of this flow of data poses several challenges in terms of data storage, preprocessing, and analysis. To face these issues, efficient, possibly parallel, bioinformatics software is needed to preprocess and analyze the data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the preprocessing and statistical analysis of genomics data that copes with high-dimensional data and achieves good response times. The proposed system finds statistically significant biological markers that discriminate classes of patients who respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.
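
    To make the per-marker parallelism concrete, here is a minimal sketch (not the authors' implementation) that tests each genetic marker for association with a patient class in a separate task of a process pool; the 0/1/2 genotype encoding and the chi-square test are illustrative assumptions.

        import numpy as np
        from multiprocessing import Pool
        from scipy.stats import chi2_contingency

        def marker_pvalue(args):
            # chi-square test of one marker's genotype counts vs. response class
            genotypes, classes = args              # genotypes in {0, 1, 2} per patient
            table = np.zeros((3, 2))
            for g, c in zip(genotypes, classes):
                table[g, c] += 1
            table = table[table.sum(axis=1) > 0]   # drop unobserved genotypes
            return chi2_contingency(table)[1]      # p-value

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            geno = rng.integers(0, 3, size=(1000, 200))   # 1000 markers, 200 patients
            resp = rng.integers(0, 2, size=200)           # responder / non-responder
            with Pool() as pool:                          # one task per marker
                pvals = pool.map(marker_pvalue, [(g, resp) for g in geno])
            print("candidate markers:", [i for i, p in enumerate(pvals) if p < 1e-4])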

  14. Parallel computer calculation of quantum spin lattices

    International Nuclear Information System (INIS)

    Lamarcq, J.

    1998-01-01

    Numerical simulation allows theorists to convince themselves of the validity of the models they use. In particular, by simulating spin lattices one can judge the validity of a conjecture. Simulating a system defined by a large number of degrees of freedom requires highly sophisticated machines. This study deals with modelling the magnetic interactions between the ions of a crystal. Many exact results have been found for spin-1/2 systems, but not for systems of other spins, for which many simulations have been carried out. Interest in simulations has been renewed by Haldane's conjecture, which stipulates the existence of an energy gap between the ground state and the first excited states of a spin-1 lattice. The existence of this gap has been experimentally demonstrated. This report contains the following four chapters: 1. Spin systems; 2. Calculation of eigenvalues; 3. Programming; 4. Parallel calculation
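
    The eigenvalue computation of chapter 2 can be illustrated with a small sparse-matrix example. The sketch below (a toy, not the report's code) assembles the Hamiltonian of a short open spin-1 Heisenberg chain and estimates the gap between its two lowest eigenvalues with a Lanczos solver; the chain length is an arbitrary choice.

        import numpy as np
        from scipy.sparse import csr_matrix, identity, kron
        from scipy.sparse.linalg import eigsh

        # spin-1 operators on one site (local dimension 3)
        sz = csr_matrix(np.diag([1.0, 0.0, -1.0]))
        sp = csr_matrix(np.sqrt(2.0) * np.diag([1.0, 1.0], k=1))   # raising S+
        sm = sp.T                                                   # lowering S-

        def heisenberg_chain(L, d=3):
            # H = sum_i S_i . S_{i+1} = sum_i [Sz Sz + (S+ S- + S- S+) / 2]
            H = csr_matrix((d ** L, d ** L))
            for i in range(L - 1):
                for a, b in ((sz, sz), (0.5 * sp, sm), (0.5 * sm, sp)):
                    term = kron(identity(d ** i),
                                kron(a, kron(b, identity(d ** (L - i - 2)))))
                    H = H + csr_matrix(term)
            return H

        evals = np.sort(eigsh(heisenberg_chain(8), k=2, which="SA",
                              return_eigenvectors=False))
        print("finite-size gap estimate:", evals[1] - evals[0])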

  15. MPI-hybrid Parallelism for Volume Rendering on Large, Multi-core Systems

    Energy Technology Data Exchange (ETDEWEB)

    Howison, Mark; Bethel, E. Wes; Childs, Hank

    2010-03-20

    This work studies the performance and scalability characteristics of "hybrid" parallel programming and execution as applied to raycasting volume rendering -- a staple visualization algorithm -- on a large, multi-core platform. Historically, the Message Passing Interface (MPI) has become the de facto standard for parallel programming and execution on modern parallel systems. As the computing industry trends towards multi-core processors, with four- and six-core chips common today and 128-core chips coming soon, we wish to better understand how algorithmic and parallel programming choices impact performance and scalability on large, distributed-memory multi-core systems. Our findings indicate that the hybrid-parallel implementation, at levels of concurrency ranging from 1,728 to 216,000, performs better, uses a smaller absolute memory footprint, and consumes less communication bandwidth than the traditional, MPI-only implementation.
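
    The hybrid pattern itself is easy to state: message passing between nodes, shared-memory workers within each node. The sketch below renders that structure in Python with mpi4py plus a thread pool (both assumed available); it is not the paper's raycaster, and the per-scanline kernel is a stand-in.

        # run with, e.g.:  mpiexec -n 8 python hybrid_render.py
        from concurrent.futures import ThreadPoolExecutor
        import numpy as np
        from mpi4py import MPI

        def shade_scanline(y):
            # stand-in for raycasting one scanline; threads only help here
            # because NumPy releases the GIL inside its inner loops
            return float(np.sin(np.linspace(0.0, 1.0, 4096) + y).sum())

        comm = MPI.COMM_WORLD
        rows = np.array_split(np.arange(2048), comm.size)[comm.rank]  # block decomposition

        with ThreadPoolExecutor(max_workers=4) as pool:   # shared memory within a rank
            local = sum(pool.map(shade_scanline, rows))

        total = comm.reduce(local, op=MPI.SUM, root=0)    # message passing across ranks
        if comm.rank == 0:
            print("image checksum:", total)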

  16. The PgaFrame - a frame for parallel genetic algorithms

    OpenAIRE

    Petit Silvestre, Jordi

    1997-01-01

    The PgaFrame offers non-expert users the possibility of solving optimization problems via genetic algorithms on parallel computers. This frame appeared as the natural fusion of two different projects: PgaPack and Frames. The PgaFrame combines the major features of these projects: it integrates the capabilities for specifying genetic algorithms offered by the PgaPack library and the support of Frames for easy and portable programming of parallel machines. In this way, a complex fra...

  17. A design methodology for portable software on parallel computers

    Science.gov (United States)

    Nicol, David M.; Miller, Keith W.; Chrisman, Dan A.

    1993-01-01

    This final report for research that was supported by grant number NAG-1-995 documents our progress in addressing two difficulties in parallel programming. The first difficulty is developing software that will execute quickly on a parallel computer. The second difficulty is transporting software between dissimilar parallel computers. In general, we expect that more hardware-specific information will be included in software designs for parallel computers than in designs for sequential computers. This inclusion is an instance of portability being sacrificed for high performance. New parallel computers are being introduced frequently. Trying to keep one's software on the current high performance hardware, a software developer almost continually faces yet another expensive software transportation. The problem of the proposed research is to create a design methodology that helps designers to more precisely control both portability and hardware-specific programming details. The proposed research emphasizes programming for scientific applications. We completed our study of the parallelizability of a subsystem of the NASA Earth Radiation Budget Experiment (ERBE) data processing system. This work is summarized in section two. A more detailed description is provided in Appendix A ('Programming Practices to Support Eventual Parallelism'). Mr. Chrisman, a graduate student, wrote and successfully defended a Ph.D. dissertation proposal which describes our research associated with the issues of software portability and high performance. The list of research tasks are specified in the proposal. The proposal 'A Design Methodology for Portable Software on Parallel Computers' is summarized in section three and is provided in its entirety in Appendix B. We are currently studying a proposed subsystem of the NASA Clouds and the Earth's Radiant Energy System (CERES) data processing system. This software is the proof-of-concept for the Ph.D. dissertation. We have implemented and measured

  18. Computing Optimal Cycle Mean in Parallel on CUDA

    Directory of Open Access Journals (Sweden)

    Jiří Barnat

    2011-10-01

    Full Text Available Computation of optimal cycle mean in a directed weighted graph has many applications in program analysis, performance verification in particular. In this paper we propose a data-parallel algorithmic solution to the problem and show how the computation of optimal cycle mean can be efficiently accelerated by means of CUDA technology. We show how the problem of computation of optimal cycle mean is decomposed into a sequence of data-parallel graph computation primitives and show how these primitives can be implemented and optimized for CUDA computation. Finally, we report a fivefold experimental speed up on graphs representing models of distributed systems when compared to best sequential algorithms.

  19. Parallel algorithms for network routing problems and recurrences

    International Nuclear Information System (INIS)

    Wisniewski, J.A.; Sameh, A.H.

    1982-01-01

    In this paper, we consider the parallel solution of recurrences, and linear systems in the regular algebra of Carre. These problems are equivalent to solving the shortest path problem in graph theory, and they also arise in the analysis of Fortran programs. Our methods for solving linear systems in the regular algebra are analogues of well-known methods for solving systems of linear algebraic equations. A parallel version of Dijkstra's method, which has no linear algebraic analogue, is presented. Considerations for choosing an algorithm when the problem is large and sparse are also discussed

  20. Badlands: A parallel basin and landscape dynamics model

    Directory of Open Access Journals (Sweden)

    T. Salles

    2016-01-01

    Full Text Available Over more than three decades, a number of numerical landscape evolution models (LEMs have been developed to study the combined effects of climate, sea-level, tectonics and sediments on Earth surface dynamics. Most of them are written in efficient programming languages, but often cannot be used on parallel architectures. Here, I present a LEM which ports a common core of accepted physical principles governing landscape evolution into a distributed memory parallel environment. Badlands (acronym for BAsin anD LANdscape DynamicS is an open-source, flexible, TIN-based landscape evolution model, built to simulate topography development at various space and time scales.

  1. Implementing Parallel Google Map-Reduce in Eden

    DEFF Research Database (Denmark)

    Berthold, Jost; Dieterle, Mischa; Loogen, Rita

    2009-01-01

    Recent publications have emphasised map-reduce as a general programming model (labelled Google map-reduce), and described existing high-performance implementations for large data sets. We present two parallel implementations for this Google map-reduce skeleton, one following earlier work, and one optimised version, in the parallel Haskell extension Eden. Eden's specific features, like lazy stream processing, dynamic reply channels, and nondeterministic stream merging, support the efficient implementation of the complex coordination structure of this skeleton. We compare the two implementations...
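
    For readers unfamiliar with the skeleton, the sketch below gives its shape in plain Python with a process pool rather than Eden/Haskell: a parallel map phase, a grouping "shuffle", and a per-key reduce. The word-count mapper and reducer are the canonical toy example, not the paper's benchmarks.

        from collections import defaultdict
        from itertools import chain
        from multiprocessing import Pool

        def map_reduce(inputs, mapper, reducer, workers=4):
            with Pool(workers) as pool:                 # phase 1: parallel map
                mapped = pool.map(mapper, inputs)
            groups = defaultdict(list)                  # phase 2: shuffle by key
            for key, value in chain.from_iterable(mapped):
                groups[key].append(value)
            return {k: reducer(k, vs) for k, vs in groups.items()}   # phase 3

        def wc_mapper(line):
            return [(word, 1) for word in line.split()]

        def wc_reducer(word, counts):
            return sum(counts)

        if __name__ == "__main__":
            lines = ["to be or not to be", "to parallelize or not"]
            print(map_reduce(lines, wc_mapper, wc_reducer))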

  2. A scalable implementation of RI-SCF on parallel computers

    International Nuclear Information System (INIS)

    Fruechtl, H.A.; Kendall, R.A.; Harrison, R.J.

    1996-01-01

    In order to avoid the integral bottleneck of conventional SCF calculations, the Resolution of the Identity (RI) method is used to obtain an approximate solution to the Hartree-Fock equations. In this approximation only three-center integrals are needed to build the Fock matrix. It has been implemented as part of the NWChem package of portable and scalable ab initio programs for parallel computers. Utilizing the V-approximation, both the Coulomb and exchange contributions to the Fock matrix can be calculated from a transformed set of three-center integrals which have to be precalculated and stored. A distributed in-core method as well as a disk-based implementation have been programmed. Details of the implementation as well as the parallel programming tools used are described. We also give results and timings from benchmark calculations

  3. Analysing the English-Xhosa parallel corpus of technical texts with ...

    African Journals Online (AJOL)

    Paraconc has been used in analysing a parallel corpus of English-Xhosa texts. Paraconc is a parallel concordancer which makes it easy to analyse translated texts. This software program was developed by Michael Barlow (1995, 2003). He designed this software for linguists and translators who wish to work with translated ...

  4. Data-parallel tomographic reconstruction : A comparison of filtered backprojection and direct Fourier reconstruction

    NARCIS (Netherlands)

    Roerdink, J.B.T.M.; Westenberg, M.A.

    1998-01-01

    We consider the parallelization of two standard 2D reconstruction algorithms, filtered backprojection and direct Fourier reconstruction, using the data-parallel programming style. The algorithms are implemented on a Connection Machine CM-5 with 16 processors and a peak performance of 2 Gflop/s.

  5. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

    A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and the implementation of multiple levels of parallelism on multiple-instruction multiple-data machines is indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.

  6. Parallel kinematics type, kinematics, and optimal design

    CERN Document Server

    Liu, Xin-Jun

    2014-01-01

    Parallel Kinematics- Type, Kinematics, and Optimal Design presents the results of 15 year's research on parallel mechanisms and parallel kinematics machines. This book covers the systematic classification of parallel mechanisms (PMs) as well as providing a large number of mechanical architectures of PMs available for use in practical applications. It focuses on the kinematic design of parallel robots. One successful application of parallel mechanisms in the field of machine tools, which is also called parallel kinematics machines, has been the emerging trend in advanced machine tools. The book describes not only the main aspects and important topics in parallel kinematics, but also references novel concepts and approaches, i.e. type synthesis based on evolution, performance evaluation and optimization based on screw theory, singularity model taking into account motion and force transmissibility, and others.   This book is intended for researchers, scientists, engineers and postgraduates or above with interes...

  7. Practical Parallel Divide-and-Conquer Algorithms

    National Research Council Canada - National Science Library

    Hardwick, Jonathan

    1997-01-01

    ... This thesis shows that by restricting the problem set to that of data-parallel divide-and-conquer algorithms, I can maintain the expressibility of full nested data-parallel languages while achieving...
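
    As a generic illustration of the data-parallel divide-and-conquer pattern the thesis studies (a sketch, not code from the thesis), the program below divides the input across a process pool, conquers each chunk with recursive mergesort, and merges the sorted runs.

        from concurrent.futures import ProcessPoolExecutor
        from heapq import merge

        def msort(xs):
            # sequential divide and conquer: split, recurse, merge
            if len(xs) <= 1:
                return xs
            mid = len(xs) // 2
            return list(merge(msort(xs[:mid]), msort(xs[mid:])))

        def parallel_msort(xs, workers=4):
            # divide across processes first, conquer chunks locally,
            # then merge the sorted runs in the parent process
            chunks = [xs[i::workers] for i in range(workers)]
            with ProcessPoolExecutor(max_workers=workers) as pool:
                runs = list(pool.map(msort, chunks))
            return list(merge(*runs))

        if __name__ == "__main__":
            import random
            data = random.sample(range(10000), 1000)
            assert parallel_msort(data) == sorted(data)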

  8. Applied Parallel Computing Industrial Computation and Optimization

    DEFF Research Database (Denmark)

    Madsen, Kaj; Olesen, Dorte

    Proceedings of the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96).

  9. Distributed Memory Programming on Many-Cores

    DEFF Research Database (Denmark)

    Berthold, Jost; Dieterle, Mischa; Lobachev, Oleg

    2009-01-01

    Eden is a parallel extension of the lazy functional language Haskell providing dynamic process creation and automatic data exchange. As a Haskell extension, Eden takes a high-level approach to parallel programming and thereby simplifies parallel program development. The current implementation...

  10. The parallel volume at large distances

    DEFF Research Database (Denmark)

    Kampf, Jürgen

    In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof for the fact that a planar body can only have polynomial parallel volume if it is convex. Extensions to Minkowski spaces and random sets are also discussed.

  12. Parallel algorithms and cluster computing

    CERN Document Server

    Hoffmann, Karl Heinz

    2007-01-01

    This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.

  13. Parallel Compilation of CMS Software

    CERN Document Server

    Ashby, Shaun; Schmid, Stefan; Tuura, Lassi A

    2005-01-01

    LHC experiments have large amounts of software to build. CMS has studied ways to shorten project build times using parallel and distributed builds as well as improved ways to decide what to rebuild. We have experimented with making idle desktop and server machines easily available as a virtual build cluster using distcc and zeroconf. We have also tested variations of ccache and more traditional make dependency analysis. We report on our test results, with analysis of the factors that most improve or limit build performance.

  14. Parallel computation of rotating flows

    DEFF Research Database (Denmark)

    Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

    1999-01-01

    This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. ...
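
    CGLS is conjugate gradients applied to the normal equations A^T A x = A^T b without ever forming A^T A, which is what makes it suitable for large, sparse, over-determined systems. A serial NumPy sketch follows; in the paper the parallelism would live inside the matrix-vector products, which are not shown here.

        import numpy as np

        def cgls(A, b, iters=100, tol=1e-12):
            # minimizes ||A x - b||_2 via CG on the normal equations
            x = np.zeros(A.shape[1])
            r = b - A @ x
            s = A.T @ r
            p = s.copy()
            gamma = s @ s
            for _ in range(iters):
                q = A @ p
                alpha = gamma / (q @ q)
                x += alpha * p
                r -= alpha * q
                s = A.T @ r
                gamma_new = s @ s
                if gamma_new < tol:
                    break
                p = s + (gamma_new / gamma) * p
                gamma = gamma_new
            return x

        rng = np.random.default_rng(0)
        A = rng.standard_normal((500, 50))     # 500 equations, 50 unknowns
        x_true = rng.standard_normal(50)
        print(np.linalg.norm(cgls(A, A @ x_true) - x_true))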

  15. The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

    Science.gov (United States)

    Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Mark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete

    1998-01-01

    Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

  16. Parallel Sorting System for Objects

    OpenAIRE

    Opoku, Samuel King

    2012-01-01

    Conventional sorting algorithms make use of such data structures as arrays, files and lists, which define access methods for the items to be sorted. Such traditional methods as exchange sort, divide-and-conquer sort, selection sort and insertion sort require a supervisory control program. The supervisory control program has access to the items and is responsible for arranging them in the proper order. This paper presents a different sorting algorithm that does not require a supervisory control program...

  17. Parallel Processing at the High School Level.

    Science.gov (United States)

    Sheary, Kathryn Anne

    This study investigated the ability of high school students to cognitively understand and implement parallel processing. Data indicates that most parallel processing is being taught at the university level. Instructional modules on C, Linux, and the parallel processing language, P4, were designed to show that high school students are highly…

  18. A parallel gravitational N-body kernel

    NARCIS (Netherlands)

    Portegies Zwart, S.; McMillan, S.; Groen, D.; Gualandris, A.; Sipior, M.; Vermin, W.

    2008-01-01

    We describe source code level parallelization for the kira direct gravitational N-body integrator, the workhorse of the starlab production environment for simulating dense stellar systems. The parallelization strategy, called "j-parallelization", involves the partition of the computational domain by...
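
    In a j-parallelization of direct summation, each worker accumulates the forces exerted by its own slice of field (j) particles on all target (i) particles, and the partial results are summed. The sketch below mimics that decomposition with a Python process pool; it is a toy stand-in for kira's scheme, with arbitrary softening and data.

        import numpy as np
        from multiprocessing import Pool

        def partial_acc(args):
            # accelerations induced on every particle by one slice of j-particles;
            # the i == j term vanishes because the displacement is zero
            pos, mass, jslice, eps2 = args
            acc = np.zeros_like(pos)
            for j in jslice:
                d = pos[j] - pos
                r2 = (d * d).sum(axis=1) + eps2        # softening avoids 1/0
                acc += mass[j] * d / r2[:, None] ** 1.5
            return acc

        def accelerations(pos, mass, workers=4, eps2=1e-6):
            # split the field particles across workers, then sum partial results
            jslices = np.array_split(np.arange(len(pos)), workers)
            with Pool(workers) as pool:
                parts = pool.map(partial_acc, [(pos, mass, s, eps2) for s in jslices])
            return sum(parts)

        if __name__ == "__main__":
            rng = np.random.default_rng(2)
            pos = rng.standard_normal((256, 3))
            mass = np.full(256, 1.0 / 256)
            print(accelerations(pos, mass)[0])         # acceleration of particle 0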

  19. Comparison of Parallel Viscosity with Neoclassical Theory

    OpenAIRE

    Ida, K.; Nakajima, N.

    1996-01-01

    Toroidal rotation profiles are measured with charge exchange spectroscopy for plasmas heated with tangential NBI in the CHS heliotron/torsatron device in order to estimate the parallel viscosity. The parallel viscosity derived from the toroidal rotation velocity shows good agreement with the neoclassical parallel viscosity plus the perpendicular viscosity (μ_⊥ = 2 m²/s).

  20. Identifying, Quantifying, Extracting and Enhancing Implicit Parallelism

    Science.gov (United States)

    Agarwal, Mayank

    2009-01-01

    The shift of the microprocessor industry towards multicore architectures has placed a huge burden on the programmers by requiring explicit parallelization for performance. Implicit Parallelization is an alternative that could ease the burden on programmers by parallelizing applications "under the covers" while maintaining sequential semantics…

  1. A Parallel Approach to Fractal Image Compression

    Directory of Open Access Journals (Sweden)

    Lubomir Dedera

    2004-01-01

    Full Text Available The paper deals with a parallel approach to coding and decoding algorithms in fractal image compression and presents experimental results comparing sequential and parallel algorithms from the point of view of the achieved coding and decoding times and the effectiveness of parallelization.

  2. Data-parallel DNS of turbulent flow

    NARCIS (Netherlands)

    Verstappen, R.W.C.P.; Veldman, A.E.P.; Emerson, DR; Ecer, A; Periaux, J; Satofuka, N

    1998-01-01

    This contribution deals with direct numerical simulation (DNS) of incompressible turbulent flows on parallel computers. We make use of the data-parallel model on shared memory systems as well as on a distributed memory machine. The combination of fast parallel computers and efficient numerical

  3. Parallelization of applications for networks with homogeneous and heterogeneous processors

    International Nuclear Information System (INIS)

    Colombet, L.

    1994-01-01

    The aim of this thesis is to study and develop efficient methods for the parallelization of scientific applications on parallel computers with distributed memory. The first part presents two communication libraries, PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate the performance of networks with heterogeneous processors. To evaluate such performance, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication-masking techniques. The general concept is based on communication anticipation, in particular by pipelining message-sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs

  4. PLAST: parallel local alignment search tool for database comparison

    Directory of Open Access Journals (Sweden)

    Lavenier Dominique

    2009-10-01

    Full Text Available Background: Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. Results: A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. Conclusion: A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.

  5. PLAST: parallel local alignment search tool for database comparison.

    Science.gov (United States)

    Nguyen, Van Hoa; Lavenier, Dominique

    2009-10-12

    Sequence similarity searching is an important and challenging task in molecular biology and next-generation sequencing should further strengthen the need for faster algorithms to process such vast amounts of data. At the same time, the internal architecture of current microprocessors is tending towards more parallelism, leading to the use of chips with two, four and more cores integrated on the same die. The main purpose of this work was to design an effective algorithm to fit with the parallel capabilities of modern microprocessors. A parallel algorithm for comparing large genomic banks and targeting middle-range computers has been developed and implemented in PLAST software. The algorithm exploits two key parallel features of existing and future microprocessors: the SIMD programming model (SSE instruction set) and the multithreading concept (multicore). Compared to multithreaded BLAST software, tests performed on an 8-processor server have shown speedup ranging from 3 to 6 with a similar level of accuracy. A parallel algorithmic approach driven by the knowledge of the internal microprocessor architecture allows significant speedup to be obtained while preserving standard sensitivity for similarity search problems.

  6. Xyce parallel electronic simulator design.

    Energy Technology Data Exchange (ETDEWEB)

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development was primarily circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long-term projects that there will be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  7. Advances in randomized parallel computing

    CERN Document Server

    Rajasekaran, Sanguthevar

    1999-01-01

    The technique of randomization has been employed to solve numerous problems of computing both sequentially and in parallel. Examples of randomized algorithms that are asymptotically better than their deterministic counterparts in solving various fundamental problems abound. Randomized algorithms have the advantages of simplicity and better performance both in theory and often in practice. This book is a collection of articles written by renowned experts in the area of randomized parallel computing. A brief introduction to randomized algorithms: in the analysis of algorithms, at least three different measures of performance can be used: the best case, the worst case, and the average case. Often, the average case run time of an algorithm is much smaller than the worst case. For instance, the worst case run time of Hoare's quicksort is O(n²), whereas its average case run time is only O(n log n). The average case analysis is conducted with an assumption on the input space. The assumption made to arrive at t...

  8. Integrated Task and Data Parallel Support for Dynamic Applications

    Directory of Open Access Journals (Sweden)

    James M. Rehg

    1999-01-01

    Full Text Available There is an emerging class of real‐time interactive applications that require the dynamic integration of task and data parallelism. An example is the Smart Kiosk, a free‐standing computer device that provides information and entertainment to people in public spaces. The kiosk interface is computationally demanding: It employs vision and speech sensing and an animated graphical talking face for output. The computational demands of an interactive kiosk can vary widely with the number of customers and the state of the interaction. Unfortunately this makes it difficult to apply current techniques for integrated task and data parallel computing, which can produce optimal decompositions for static problems. Using experimental results from a color‐based people tracking module, we demonstrate the existence of a small number of distinct operating regimes in the kiosk application. We refer to this type of program behavior as constrained dynamism. An application exhibiting constrained dynamism can execute efficiently by dynamically switching among a small number of statically determined fixed data parallel strategies. We present a novel framework for integrating task and data parallelism for applications that exhibit constrained dynamism. Our solution has been implemented using Stampede, a cluster programming system developed at the Cambridge Research Laboratory.

  9. User-friendly parallelization of GAUDI applications with Python

    International Nuclear Information System (INIS)

    Mato, Pere; Smith, Eoin

    2010-01-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given that a considerable amount of existing software has been developed using a serial methodology, and has existed in some cases for many years, implementing parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without rewriting thousands or millions of lines of code. In the solution we have developed, the parallelization techniques are introduced in the high-level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification and end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.
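
    The approach can be caricatured in a few lines: the driving script farms independent inputs out to worker processes and merges their results, so the core application code is untouched and a deterministic merge keeps the parallel result consistent with a serial run. The sketch below is a generic Python rendering with a hypothetical run_job worker; it is not GAUDI's actual parallelization module.

        from multiprocessing import Pool

        def run_job(input_file):
            # hypothetical worker: in the real setting this would configure and
            # run one application instance over a single input file, returning
            # a mergeable summary (histograms, counters, ...)
            return {"file": input_file, "events": 1000 + len(input_file)}

        def merge(results):
            # deterministic merge: same answer as a serial loop over the files
            return sum(r["events"] for r in results)

        if __name__ == "__main__":
            files = ["run_001.dat", "run_002.dat", "run_003.dat"]
            with Pool() as pool:
                print(merge(pool.map(run_job, files)))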

  10. Interpreting the Data: Parallel Analysis with Sawzall

    Directory of Open Access Journals (Sweden)

    Rob Pike

    2005-01-01

    Full Text Available Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large data sets are not amenable to study using traditional database techniques, if only because they can be too large to fit in a single relational database. On the other hand, many of the analyses done on them can be expressed using simple, easily distributed computations: filtering, aggregation, extraction of statistics, and so on. We present a system for automating such analyses. A filtering phase, in which a query is expressed using a new procedural programming language, emits data to an aggregation phase. Both phases are distributed over hundreds or even thousands of computers. The results are then collated and saved to a file. The design – including the separation into two phases, the form of the programming language, and the properties of the aggregators – exploits the parallelism inherent in having data and computation distributed across many machines.
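
    The two-phase structure is easy to mimic in miniature. Below, a per-record filter function emits key/value pairs and a separate aggregation phase folds them into a sum table, one of the aggregator kinds the paper describes. This is a single-machine Python toy, not Sawzall itself; in the real system both phases run distributed over many machines.

        from collections import Counter

        def filter_phase(record):
            # per-record query: emit zero or more (key, value) pairs
            host, bytes_sent = record
            if bytes_sent > 0:
                yield host, bytes_sent

        def aggregate_phase(records):
            # fold emitted pairs into a "sum" table
            table = Counter()
            for record in records:
                for key, value in filter_phase(record):
                    table[key] += value
            return table

        log = [("a.com", 10), ("b.org", 5), ("a.com", 7), ("c.net", 0)]
        print(aggregate_phase(log))   # Counter({'a.com': 17, 'b.org': 5})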

  11. An efficient implementation of parallel molecular dynamics method on SMP cluster architecture

    International Nuclear Information System (INIS)

    Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki

    2003-01-01

    The authors have applied the MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. On such an architecture, the hybrid parallel programming model, which uses a message passing library such as MPI for inter-SMP-node communication and loop directives such as OpenMP for intra-SMP-node parallelization, can be expected to be the most effective one. In this study, the parallel performance of the hybrid style has been compared with that of the conventional flat parallel programming style, which uses only MPI, both in the case where the fast multipole method (FMM) is employed for computing long-range interactions and in the case where it is not. The computing environment used here is the Hitachi SR8000/MPP placed at the University of Tokyo. The results are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 90% with the hybrid style and 75% with the flat-MPI style for an MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 60% with the hybrid style and 48% with the flat-MPI style for an MD simulation with 117,649 atoms. (author)

  12. Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

    Science.gov (United States)

    Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

    2017-01-01

    Demonstrating speedup for parallel code on a multicore shared-memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit the potential for improvement of serial code, even for so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed, such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize these computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated, while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated, but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
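
    The embarrassing parallelism comes from the Jacobian's columns being independent: column j needs only F evaluated at x and at x perturbed in coordinate j. The sketch below mirrors the idea with a Python process pool rather than MATLAB's spmd; the residual function and step size are illustrative.

        import numpy as np
        from multiprocessing import Pool

        def residual(x):
            # example nonlinear system F(x) = 0
            return np.array([x[0] ** 2 + x[1] - 3.0, x[0] + x[1] ** 2 - 5.0])

        def jac_column(args):
            # forward difference for one column (F(x) could be shared, but is
            # recomputed here to keep each task self-contained)
            x, j, h = args
            xp = x.copy()
            xp[j] += h
            return (residual(xp) - residual(x)) / h

        def jacobian(x, h=1e-7, workers=2):
            with Pool(workers) as pool:
                cols = pool.map(jac_column, [(x, j, h) for j in range(len(x))])
            return np.column_stack(cols)

        if __name__ == "__main__":
            print(jacobian(np.array([1.0, 2.0])))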

  13. Discussion about the design for mesh data structure within the parallel framework

    International Nuclear Information System (INIS)

    Shi Guangmei; Wu Ruian; Wang Keying; Ji Xiaoyu; Hao Zhiming; Mo Jun; He Yingbo

    2010-01-01

    The mesh data structure is one of the fundamental data structures within a parallel framework; its design and level of realization affect the parallel capability of the framework. By comparatively analyzing the architecture and fundamental data structures of several typical parallel frameworks, such as JASMIN, SIERRA, and ITAPS, the design philosophy behind parallel frameworks is discussed. Borrowing ideas from the layered set of services design of the SIERRA framework, and combining them with the near-term objectives of the PANDA framework, this paper presents a rudimentary layered set of services for PANDA. On this foundation, a detailed introduction is given to the definition and management of the mesh data structure, which is located in the bottom layer of the PANDA framework. The design and realization of the parallel distributed mesh data structure of PANDA are discussed in detail. PANDA framework extension and application program development based on the framework are grounded on these efforts.

  14. Vectorization, parallelization and porting of nuclear codes on the VPP500 system (parallelization). Progress report fiscal 1996

    International Nuclear Information System (INIS)

    Watanabe, Hideo; Kawai, Wataru; Nemoto, Toshiyuki

    1997-12-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These results are reported in three parts: the vectorization part, the parallelization part and the porting part. This report describes the parallelization. In the parallelization part, the parallelization of the 2-dimensional relativistic electromagnetic particle code EM2D, the cylindrical direct numerical simulation code CYLDNS and the molecular dynamics code DGR for simulating radiation damage in diamond crystals is described. In the vectorization part, the vectorization of the two- and three-dimensional discrete ordinates simulation code DORT-TORT, the gas dynamics analysis code FLOWGR and the relativistic Boltzmann-Uehling-Uhlenbeck simulation code RBUU is described. In the porting part, the porting of the reactor safety analysis codes RELAP5/MOD3.2 and RELAP5/MOD3.2.1.2, the nuclear data processing system NJOY and the 2-D multigroup discrete ordinates transport code TWOTRAN-II is described, along with a survey for the porting of the command-driven interactive data analysis plotting program IPLOT. (author)

  15. Systematic approach for deriving feasible mappings of parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.

    2017-01-01

    The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed

  16. GRADSPMHD: A parallel MHD code based on the SPH formalism

    Science.gov (United States)

    Vanaverbeke, S.; Keppens, R.; Poedts, S.

    2014-03-01

    We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B = 0 constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture.
    Catalogue identifier: AERP_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html
    Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 620503
    No. of bytes in distributed program, including test data, etc.: 19837671
    Distribution format: tar.gz
    Programming language: FORTRAN 90/MPI.
    Computer: HPC cluster.
    Operating system: Unix.
    Has the code been vectorized or parallelized?: Yes, parallelized using MPI.
    RAM: ˜30 MB for a

  17. Portable parallel stochastic optimization for the design of aeropropulsion components

    Science.gov (United States)

    Sues, Robert H.; Rhodes, G. S.

    1994-01-01

    This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initiate the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as a review of portable, parallel programming environments. The second effort was to implement the MSO methodology for a problem using the portable parallel programming library Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization...

  18. Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

    Science.gov (United States)

    Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

    2016-01-01

    Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

  19. Parallel Algorithms for Graph Optimization using Tree Decompositions

    Energy Technology Data Exchange (ETDEWEB)

    Sullivan, Blair D [ORNL]; Weerapurage, Dinesh P [ORNL]; Groer, Christopher S [ORNL]

    2012-06-01

    Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of the necessary dynamic programming tables and excessive runtimes of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree decomposition-based approach to solve the maximum weighted independent set problem. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.
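
    The dynamic programming underlying such solvers is easiest to see in the special case of an actual tree (tree-width 1) rather than a general tree decomposition. The sketch below computes a maximum weighted independent set on a tree with the classic include/exclude recurrence; it is a didactic simplification, not the paper's parallel OpenMP/MPI algorithm.

        def mwis_tree(adj, weight, root=0):
            # inc[v]: best weight in v's subtree with v included
            # exc[v]: best weight in v's subtree with v excluded
            order, seen, parent = [], {root}, {root: None}
            stack = [root]
            while stack:                       # iterative DFS (preorder)
                v = stack.pop()
                order.append(v)
                for w in adj[v]:
                    if w not in seen:
                        seen.add(w)
                        parent[w] = v
                        stack.append(w)
            inc, exc = {}, {}
            for v in reversed(order):          # children before parents
                kids = [w for w in adj[v] if w != parent[v]]
                inc[v] = weight[v] + sum(exc[w] for w in kids)
                exc[v] = sum(max(inc[w], exc[w]) for w in kids)
            return max(inc[root], exc[root])

        adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
        weight = {0: 5, 1: 1, 2: 3, 3: 4, 4: 2}
        print(mwis_tree(adj, weight))          # 11: vertices {0, 3, 4}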

  20. Device for balancing parallel strings

    Science.gov (United States)

    Mashikian, Matthew S.

    1985-01-01

    A battery plant is described which features magnetic circuit means in association with each of the battery strings in the battery plant for balancing the electrical current flow through the battery strings by equalizing the voltage across each of the battery strings. Each of the magnetic circuit means generally comprises means for sensing the electrical current flow through one of the battery strings, and a saturable reactor having a main winding connected electrically in series with the battery string, a bias winding connected to a source of alternating current and a control winding connected to a variable source of direct current controlled by the sensing means. Each of the battery strings is formed by a plurality of batteries connected electrically in series, and these battery strings are connected electrically in parallel across common bus conductors.