WorldWideScience

Sample records for poker parallel programming

  1. 78 FR 40196 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

    Science.gov (United States)

    2013-07-03

    ...; Sounding Rockets Program; Poker Flat Research Range AGENCY: National Aeronautics and Space Administration... Sounding Rockets Program (SRP) at Poker Flat Research Range (PFRR), Alaska. SUMMARY: Pursuant to the... government agencies, and educational institutions have conducted suborbital rocket launches from the PFRR...

  2. 76 FR 20715 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

    Science.gov (United States)

    2011-04-13

    ...; Sounding Rockets Program; Poker Flat Research Range AGENCY: National Aeronautics and Space Administration... continuing sounding rocket operations at Poker Flat Research Range (PFRR), Alaska. SUMMARY: Pursuant to the... information about NASA's Sounding Rocket Program (SRP) and the University of Alaska-Fairbanks' PFRR may be...

  3. Poker Phases

    DEFF Research Database (Denmark)

    Bjerg, Ole

    2011-01-01

    , it is demonstrated how the three forms emerge and become the most popular form of poker at three different periods in history. It identifies structural homologies between the historical development of poker and key elements in the manifestation of capitalism at different times in history...

  4. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  5. Poker-camp: a program for calculating detector responses and phantom organ doses in environmental gamma fields

    International Nuclear Information System (INIS)

    Koblinger, L.

    1981-09-01

    A general description, user's manual and a sample problem are given in this report on the POKER-CAMP adjoint Monte Carlo photon transport program. Gamma fields of different environmental sources which are uniformly or exponentially distributed sources or plane sources in the air, in the soil or in an intermediate layer placed between them are simulated in the code. Calculations can be made on flux, kerma and spectra of photons at any point; and on responses of point-like, cylindrical, or spherical detectors; and on doses absorbed in anthropomorphic phantoms. (author)

  6. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  7. Opponent Classification in Poker

    Science.gov (United States)

    Ahmad, Muhammad Aurangzeb; Elidrisi, Mohamed

    Modeling games has a long history in the Artificial Intelligence community. Most of the games that have been considered solved in AI are perfect information games. Imperfect information games like Poker and Bridge represent a domain where there is a great deal of uncertainty involved and additional challenges with respect to modeling the behavior of the opponent etc. Techniques developed for playing imperfect games also have many real world applications like repeated online auctions, human computer interaction, opponent modeling for military applications etc. In this paper we explore different techniques for playing poker, the core of these techniques is opponent modeling via classifying the behavior of opponent according to classes provided by domain experts. We utilize windows of full observation in the game to classify the opponent. In Poker, the behavior of an opponent is classified into four standard poker-playing styles based on a subjective function.

  8. A heads-up no-limit Texas Hold'em poker player: Discretized betting models and automatically generated equilibrium-finding programs

    DEFF Research Database (Denmark)

    Gilpin, Andrew G.; Sandholm, Tuomas; Sørensen, Troels Bjerre

    2008-01-01

    choices in the game. Second, we employ potential-aware automated abstraction algorithms for identifying strategically similar situations in order to decrease the size of the game tree. Third, we develop a new technique for automatically generating the source code of an equilibrium-finding algorithm from...... an XML-based description of a game. This automatically generated program is more efficient than what would be possible with a general-purpose equilibrium-finding program. Finally, we present results from the AAAI-07 Computer Poker Competition, in which Tartanian placed second out of ten entries....

  9. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop Parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most of this book.

  10. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  11. Writing parallel programs that work

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on auto-parallelizing compiler (KAP), and was involved in th...

  12. Refinement of Parallel and Reactive Programs

    OpenAIRE

    Back, R. J. R.

    1992-01-01

    We show how to apply the refinement calculus to stepwise refinement of parallel and reactive programs. We use action systems as our basic program model. Action systems are sequential programs which can be implemented in a parallel fashion. Hence refinement calculus methods, originally developed for sequential programs, carry over to the derivation of parallel programs. Refinement of reactive programs is handled by data refinement techniques originally developed for the sequential refinement c...

  13. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    Full Text Available In the last years, there were made efforts for delineation of a stabile and unitary frame, where the problems of logical parallel processing must find solutions at least at the level of imperative languages. The results obtained by now are not at the level of the made efforts. This paper wants to be a little contribution at these efforts. We propose an overview in parallel programming, parallel execution and collaborative systems.

  14. The Poker-Litigation Game

    OpenAIRE

    Guerra-Pujol, Enrique

    2015-01-01

    Is litigation a serious search for truth or simply a game of skill or luck? Although the process of litigation has been modeled as a Prisoner's Dilemma, as a War of Attrition, as a Game of Chicken and even as a simple coin toss, no one has formally modeled litigation as a game of poker. This paper is the first to do so. We present a simple "poker-litigation game" and find the optimal strategy for playing this game.

  15. Structured Parallel Programming Patterns for Efficient Computation

    CERN Document Server

    McCool, Michael; Robison, Arch

    2012-01-01

    Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th

  16. Parallel programming with Easy Java Simulations

    Science.gov (United States)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a java-based programming environment to treat problems in the usual undergraduate curriculum. We use the easy java simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  17. Compiling Scientific Programs for Scalable Parallel Systems

    National Research Council Canada - National Science Library

    Kennedy, Ken

    2001-01-01

    ...). The research performed in this project included new techniques for recognizing implicit parallelism in sequential programs, a powerful and precise set-based framework for analysis and transformation...

  18. PDDP, A Data Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Karen H. Warren

    1996-01-01

    Full Text Available PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements high-performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

  19. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo

    2010-01-01

    Today\\'s large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  20. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  1. Language constructs for modular parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.

    1996-03-01

    We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrence, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.

  2. Determinants of internet poker adoption.

    Science.gov (United States)

    Philander, Kahlil S; Abarbanel, B Lillian

    2014-09-01

    In nearly all jurisdictions, adoption of a new form of gambling has been a controversial and contentious subject. Online gambling has been no different, though there are many aspects that affect online gambling that do not appear in the brick and mortar environment. This study seeks to identify whether demographic, economic, political, technological, and/or sociological determinants contribute to online poker gambling adoption. A theoretical discussion of these categories' importance to online poker is provided and exploratory empirical analysis is used to examine their potential validity. The analysis revealed support for all of the proposed categories of variables thought to be predictive of online gambling legality.

  3. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    Full Text Available To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  4. Are online poker problem gamblers sensation seekers?

    Science.gov (United States)

    Bonnaire, Céline

    2018-03-31

    The purpose of this research was to examine the relationship between sensation seeking and online poker gambling in a community sample of adult online poker players, when controlling for age, gender, anxiety and depression. In total, 288 online poker gamblers were recruited. Sociodemographic data, gambling behavior (CPGI), sensation seeking (SSS), depression and anxiety (HADS) were evaluated. Problem online poker gamblers have higher sensation seeking scores (total, thrill and adventure, disinhibition and boredom susceptibility subscores) and depression scores than non-problem online poker gamblers. Being male, with total sensation seeking, disinhibition and depression scores are factors associated with online poker problem gambling. These findings are interesting in terms of harm reduction. For example, because disinhibition could lead to increased time and money spent, protective behavioral strategies like setting time and monetary limits should be encouraged in poker online gamblers. Copyright © 2018 Elsevier B.V. All rights reserved.

  5. Productive Parallel Programming: The PCN Approach

    Directory of Open Access Journals (Sweden)

    Ian Foster

    1992-01-01

    Full Text Available We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

  6. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  7. The kpx, a program analyzer for parallelization

    International Nuclear Information System (INIS)

    Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi.

    1997-03-01

    The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, that shows how much execution time is spent in program segments. The second is ptool, that shows parallelization overhead on the Paragon system. The last is xtool, that shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN cord on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)

  8. Speedup predictions on large scientific parallel programs

    International Nuclear Information System (INIS)

    Williams, E.; Bobrowicz, F.

    1985-01-01

    How much speedup can we expect for large scientific parallel programs running on supercomputers. For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N greater than or equal to 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory

  9. Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Mustafa Basthikodi

    2016-04-01

    Full Text Available Performance growth of single-core processors has come to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks along with Graphical Processing Units empowered to enhance parallelism broadly. Couples of compilers are updated to developing challenges forsynchronization and threading issues. Appropriate program and algorithm classifications will have advantage to a great extent to the group of software engineers to get opportunities for effective parallelization. In present work we investigated current species for classification of algorithms, in that related work on classification is discussed along with the comparison of issues that challenges the classification. The set of algorithms are chosen which matches the structure with different issues and perform given task. We have tested these algorithms utilizing existing automatic species extraction toolsalong with Bones compiler. We have added functionalities to existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by original speciesof algorithms. We executed new theories into the device, empowering automatic characterization of program code.

  10. Portable parallel programming in a Fortran environment

    International Nuclear Information System (INIS)

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a ''nearly realistic'' lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs

  11. From sequential to parallel programming with patterns

    CERN Document Server

    CERN. Geneva

    2018-01-01

    To increase in both performance and efficiency, our programming models need to adapt to better exploit modern processors. The classic idioms and patterns for programming such as loops, branches or recursion are the pillars of almost every code and are well known among all programmers. These patterns all have in common that they are sequential in nature. Embracing parallel programming patterns, which allow us to program for multi- and many-core hardware in a natural way, greatly simplifies the task of designing a program that scales and performs on modern hardware, independently of the used programming language, and in a generic way.

  12. Parallel Volunteer Learning during Youth Programs

    Science.gov (United States)

    Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

    2012-01-01

    Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…

  13. Contributions to computational stereology and parallel programming

    DEFF Research Database (Denmark)

    Rasmusson, Allan

    rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has lead to a significant change in paradigm for a previously developed surgical...

  14. Program For Parallel Discrete-Event Simulation

    Science.gov (United States)

    Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

    1991-01-01

    User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.

  15. Umelá inteligencia pre hru Texas Holdem poker

    OpenAIRE

    Moravčík, Matej

    2012-01-01

    Recently there has been a great expansion of poker. This includes live games, as well as games on the internet. For beginners, it may be difficult to find opponents skilled enough and thus improve their gaming performance without deposit of their own funds. Using of artificial intelligence seems as good solution for the problem, but there are only few suitable programs available. This thesis describes the overall design and development of such an application, specially designed for tournament...

  16. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. ""Massively parallel"" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVI- DIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  17. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the forms of a tree structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package called PSHED provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  18. Evolution of a minimal parallel programming model

    International Nuclear Information System (INIS)

    Lusk, Ewing; Butler, Ralph; Pieper, Steven C.

    2017-01-01

    Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.

  19. Profiling Online Poker Players: Are Executive Functions Correlated with Poker Ability and Problem Gambling?

    Science.gov (United States)

    Schiavella, Mauro; Pelagatti, Matteo; Westin, Jerker; Lepore, Gabriele; Cherubini, Paolo

    2018-01-12

    Poker playing and responsible gambling both entail the use of the executive functions (EF), which are higher-level cognitive abilities. This study investigated if online poker players of different ability showed different performances in their EF and if so, which functions were the most discriminating for their playing ability. Furthermore, it assessed if the EF performance was correlated to the quality of gambling, according to self-reported questionnaires (PGSI, SOGS, GRCS). Three poker experts evaluated anonymized poker hand history files and, then, a trained professional administered an extensive neuropsychological test battery. Data analysis determined which variables of the tests correlated with poker ability and gambling quality scores. The highest correlations between EF test results and poker ability and between EF test results and gambling quality assessment showed that mostly different clusters of executive functions characterize the profile of the strong(er) poker player and those ones of the problem gamblers (PGSI and SOGS) and the one of the cognitions related to gambling (GRCS). Taking into consideration only the variables overlapping between PGSI and SOGS, we found some key predictive factors for a more risky and harmful online poker playing: a lower performance in the emotional intelligence competences (Emotional Quotient inventory Short) and, in particular, those grouped in the Intrapersonal scale (emotional self-awareness, assertiveness, self-regard, independence and self-actualization).

  20. Development of parallel/serial program analyzing tool

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Nagao, Saichi; Takigawa, Yoshio; Kumakura, Toshimasa

    1999-03-01

    Japan Atomic Energy Research Institute has been developing 'KMtool', a parallel/serial program analyzing tool, in order to promote the parallelization of the science and engineering computation program. KMtool analyzes the performance of program written by FORTRAN77 and MPI, and it reduces the effort for parallelization. This paper describes development purpose, design, utilization and evaluation of KMtool. (author)

  1. A Tutorial on Parallel and Concurrent Programming in Haskell

    Science.gov (United States)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  2. Strategic Style in Pared-Down Poker

    Science.gov (United States)

    Burns, Kevin

    This chapter deals with the manner of making diagnoses and decisions, called strategic style, in a gambling game called Pared-down Poker. The approach treats style as a mental mode in which choices are constrained by expected utilities. The focus is on two classes of utility, i.e., money and effort, and how cognitive styles compare to normative strategies in optimizing these utilities. The insights are applied to real-world concerns like managing the war against terror networks and assessing the risks of system failures. After "Introducing the Interactions" involved in playing poker, the contents are arranged in four sections, as follows. "Underpinnings of Utility" outlines four classes of utility and highlights the differences between them: economic utility (money), ergonomic utility (effort), informatic utility (knowledge), and aesthetic utility (pleasure). "Inference and Investment" dissects the cognitive challenges of playing poker and relates them to real-world situations of business and war, where the key tasks are inference (of cards in poker, or strength in war) and investment (of chips in poker, or force in war) to maximize expected utility. "Strategies and Styles" presents normative (optimal) approaches to inference and investment, and compares them to cognitive heuristics by which people play poker--focusing on Bayesian methods and how they differ from human styles. The normative strategy is then pitted against cognitive styles in head-to-head tournaments, and tournaments are also held between different styles. The results show that style is ergonomically efficient and economically effective, i.e., style is smart. "Applying the Analysis" explores how style spaces, of the sort used to model individual behavior in Pared-down Poker, might also be applied to real-world problems where organizations evolve in terror networks and accidents arise from system failures.

  3. Parallelization for first principles electronic state calculation program

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Oguchi, Tamio.

    1997-03-01

    In this report we study the parallelization for First principles electronic state calculation program. The target machines are NEC SX-4 for shared memory type parallelization and FUJITSU VPP300 for distributed memory type parallelization. The features of each parallel machine are surveyed, and the parallelization methods suitable for each are proposed. It is shown that 1.60 times acceleration is achieved with 2 CPU parallelization by SX-4 and 4.97 times acceleration is achieved with 12 PE parallelization by VPP 300. (author)

  4. Step by step parallel programming method for molecular dynamics code

    International Nuclear Information System (INIS)

    Orii, Shigeo; Ohta, Toshio

    1996-07-01

    Parallel programming for a numerical simulation program of molecular dynamics is carried out with a step-by-step programming technique using the two phase method. As a result, within the range of a certain computing parameters, it is found to obtain parallel performance by using the level of parallel programming which decomposes the calculation according to indices of do-loops into each processor on the vector parallel computer VPP500 and the scalar parallel computer Paragon. It is also found that VPP500 shows parallel performance in wider range computing parameters. The reason is that the time cost of the program parts, which can not be reduced by the do-loop level of the parallel programming, can be reduced to the negligible level by the vectorization. After that, the time consuming parts of the program are concentrated on less parts that can be accelerated by the do-loop level of the parallel programming. This report shows the step-by-step parallel programming method and the parallel performance of the molecular dynamics code on VPP500 and Paragon. (author)

  5. Parallel phase model : a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  6. Psychopathology of Online Poker Players: Review of Literature.

    Science.gov (United States)

    Moreau, Axelle; Chabrol, Henri; Chauchard, Emeline

    2016-06-01

    Background and aims Online Texas Hold'em poker has become a spectacular form of entertainment in our society, and the number of people who use this form of gambling is increasing. It seems that online poker activity challenges existing theoretical concepts about problem gambling behaviors. The purpose of this literature review is to provide a current overview about the population of online poker players. Methods To be selected, articles had to focus on psychopathology in a sample of online poker players, be written in English or French, and be published before November 2015. A total of 17 relevant studies were identified. Results In this population, the proportion of problematic gamblers was higher than in other forms of gambling. Several factors predicting excessive gambling were identified such as stress, internal attribution, dissociation, boredom, negative emotions, irrational beliefs, anxiety, and impulsivity. The population of online poker players is largely heterogeneous, with experimental players forming a specific group. Finally, the validity of the tools used to measure excessive or problematic gambling and irrational beliefs are not suitable for assessing online poker activity. Discussion and conclusions Future studies need to confirm previous findings in the literature of online poker games. Given that skills are important in poker playing, skill development in the frames of excessive use of online poker should be explored more in depth, particularly regarding poker experience and loss chasing. Future research should focus on skills, self-regulation, and psychopathology of online poker players.

  7. Three dimensional Burn-up program parallelization using socket programming

    International Nuclear Information System (INIS)

    Haliyati R, Evi; Su'ud, Zaki

    2002-01-01

    A computer parallelization process was built with a purpose to decrease execution time of a physics program. In this case, a multi computer system was built to be used to analyze burn-up process of a nuclear reactor. This multi computer system was design need using a protocol communication among sockets, i.e. TCP/IP. This system consists of computer as a server and the rest as clients. The server has a main control to all its clients. The server also divides the reactor core geometrically to in parts in accordance with the number of clients, each computer including the server has a task to conduct burn-up analysis of 1/n part of the total reactor core measure. This burn-up analysis was conducted simultaneously and in a parallel way by all computers, so a faster program execution time was achieved close to 1/n times that of one computer. Then an analysis was carried out and states that in order to calculate the density of atoms in a reactor of 91 cm x 91 cm x 116 cm, the usage of a parallel system of 2 computers has the highest efficiency

  8. Measuring Consumer Preferences Using Conjoint Poker

    NARCIS (Netherlands)

    O. Toubia; M.G. de Jong (Martijn); D. Stieger; J.H. Fuller (John)

    2012-01-01

    textabstractWe develop and test an incentive-compatible Conjoint Poker (CP) game. The preference data collected in the context of this game are comparable to incentive-compatible choice-based conjoint (CBC) analysis data. We develop a statistical efficiency measure and an algorithm to construct

  9. "Don't worry, it's just poker!"--experience, self-rumination and self-reflection as determinants of decision-making in on-line poker.

    Science.gov (United States)

    Palomäki, Jussi; Laakasuo, Michael; Salmela, Mikko

    2013-09-01

    On-line poker is a game of chance and skill. The construct of poker playing skill has both a technical (game strategy-related) and an emotional (emotion regulation-related) aspect. A correlational on-line study (N = 354) was conducted to assess differences in technical skills and emotional characteristics related to poker playing style between experienced and inexperienced poker players. Results suggest that, with respect to emotional characteristics, experienced poker players engage in less self-rumination and more self-reflection, as compared to inexperienced players. Experienced poker players are also able to make better decisions, by mathematical standards, in a poker decision-making environment, as assessed by two fictitious on-line poker decision-making scenarios. Furthermore, this study provides supportive evidence that experienced poker players conceptualize the construct of "luck" differently from inexperienced players. A new poker playing experience scale (PES) for accurately measuring poker playing experience is presented in this paper.

  10. Programming parallel architectures - The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  11. Internet poker websites and pathological gambling prevention policy.

    Science.gov (United States)

    Khazaal, Yasser; Chatton, Anne; Bouvard, Audrey; Khiari, Hiba; Achab, Sophia; Zullino, Daniele

    2013-03-01

    Despite the widespread increase in online poker playing and the risk related to excessive poker playing, research on online poker websites is still lacking with regard to pathological gambling prevention strategies offered by the websites. The aim of the present study was to assess the pathological gambling-related prevention strategies of online poker websites. Two keywords ("poker" and "poker help") were entered into two popular World Wide Web search engines. The first 20 links related to French and English online poker websites were assessed. Seventy-four websites were assessed with a standardized tool designed to rate sites on the basis of accountability, interactivity, prevention strategies, marketing, and messages related to poker strategies. Prevention strategies appeared to be lacking. Whereas a substantial proportion of the websites offered incitation to gambling such as betting "tips," few sites offered strategies to prevent or address problem gambling. Furthermore, strategies related to poker, such as probability estimation, were mostly reported without acknowledging their limitations. Results of this study suggest that more adequate prevention strategies for risky gambling should be developed for online poker.

  12. Parallel programming practical aspects, models and current limitations

    CERN Document Server

    Tarkov, Mikhail S

    2014-01-01

    Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: 1. Processing large data arrays (including processing images and signals in real time)2. Simulation of complex physical processes and chemical reactions For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. Particles-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...

  13. On the Automatic Parallelization of Sparse and Irregular Fortran Programs

    Directory of Open Access Journals (Sweden)

    Yuan Lin

    1999-01-01

    Full Text Available Automatic parallelization is usually believed to be less effective at exploiting implicit parallelism in sparse/irregular programs than in their dense/regular counterparts. However, not much is really known because there have been few research reports on this topic. In this work, we have studied the possibility of using an automatic parallelizing compiler to detect the parallelism in sparse/irregular programs. The study with a collection of sparse/irregular programs led us to some common loop patterns. Based on these patterns new techniques were derived that produced good speedups when manually applied to our benchmark codes. More importantly, these parallelization methods can be implemented in a parallelizing compiler and can be applied automatically.

  14. Survey on present status and trend of parallel programming environments

    International Nuclear Information System (INIS)

    Takemiya, Hiroshi; Higuchi, Kenji; Honma, Ichiro; Ohta, Hirofumi; Kawasaki, Takuji; Imamura, Toshiyuki; Koide, Hiroshi; Akimoto, Masayuki.

    1997-03-01

    This report intends to provide useful information on software tools for parallel programming through the survey on parallel programming environments of the following six parallel computers, Fujitsu VPP300/500, NEC SX-4, Hitachi SR2201, Cray T94, IBM SP, and Intel Paragon, all of which are installed at Japan Atomic Energy Research Institute (JAERI), moreover, the present status of R and D's on parallel softwares of parallel languages, compilers, debuggers, performance evaluation tools, and integrated tools is reported. This survey has been made as a part of our project of developing a basic software for parallel programming environment, which is designed on the concept of STA (Seamless Thinking Aid to programmers). (author)

  15. The BLAZE language - A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  16. The BLAZE language: A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.

  17. Professional Parallel Programming with C# Master Parallel Extensions with NET 4

    CERN Document Server

    Hillar, Gastón

    2010-01-01

    Expert guidance for those programming today's dual-core processors PCs As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization.Teach

  18. A Model for Speedup of Parallel Programs

    Science.gov (United States)

    1997-01-01

    Sanjeev. K Setia . The interaction between mem- ory allocation and adaptive partitioning in message- passing multicomputers. In IPPS 󈨣 Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89{99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A compar- ative analysis of static

  19. An environment for parallel structuring of Fortran programs

    International Nuclear Information System (INIS)

    Sridharan, K.; McShea, M.; Denton, C.; Eventoff, B.; Browne, J.C.; Newton, P.; Ellis, M.; Grossbard, D.; Wise, T.; Clemmer, D.

    1990-01-01

    The paper describes and illustrates an environment for interactive support of the detection and implementation of macro-level parallelism in Fortran programs. The approach couples algorithms for dependence analysis with both innovative techniques for complexity management and capabilities for the measurement and analysis of the parallel computation structures generated through use of the environment. The resulting environment is complementary to the more common approach of seeking local parallelism by loop unrolling, either by an automatic compiler or manually. (orig.)

  20. Programming parallel architectures: The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  1. Integrated Task And Data Parallel Programming: Language Design

    Science.gov (United States)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    his research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program m. Additional 1995 Activities During the fall I collaborated

  2. The FORCE: A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  3. The FORCE - A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  4. A Programming Environment for Parallel Vision Algorithms

    Science.gov (United States)

    1990-04-11

    industrial arm on the market , while the unique head was designed by Rochester’s Computer Science and Mechanical Engineering Departments. 9a 4.1 Introduction...R. Constraining-Unification and the Programming Language Unicorn . In Logic Programming, Functions, Relations, and Equations, Degroot and Lind- strom

  5. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    Directory of Open Access Journals (Sweden)

    Stephen L. Olivier

    2013-01-01

    Full Text Available Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  6. Declarative Parallel Programming in Spreadsheet End-User Development

    DEFF Research Database (Denmark)

    Biermann, Florian

    2016-01-01

    Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations. In this liter...... can directly apply results from functional array programming to a spreadsheet model of computations.......Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations....... In this literature study, we provide an overview of the publications on spreadsheet end-user programming and declarative array programming to inform further research on parallel programming in spreadsheets. Our results show that there is a clear overlap between spreadsheet programming and array programming and we...

  7. Parallel adaptation of a vectorised quantumchemical program system

    International Nuclear Information System (INIS)

    Van Corler, L.C.H.; Van Lenthe, J.H.

    1987-01-01

    Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clockcycle times more than a factor 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers one should consider the concepts one wants to use and the kind of problems one will encounter during implementation of existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantumchemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallellism on the host computer (IBM 4381) and real parallellism on at most 10 attached processors (FPS-164). Quantumchemical programs usually handle large amounts of data and very large, often sparse matrices. The transfer of that many data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallellise the programs are shown. Also, some examples are presented to illustrate effectiveness and performance of the system in Rome for these type of calculations

  8. Program Transformation to Identify List-Based Parallel Skeletons

    Directory of Open Access Journals (Sweden)

    Venkatesh Kannan

    2016-07-01

    Full Text Available Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are defined over flat data types such as lists or arrays. However, skeleton-based parallel programming is still very challenging as it requires intricate analysis of the underlying algorithm and often uses inefficient intermediate data structures. Further, the algorithmic structure of a given program may not match those of list-based skeletons. In this paper, we present a method to automatically transform any given program to one that is defined over a list and is more likely to contain instances of list-based skeletons. This facilitates the parallel execution of a transformed program using existing implementations of list-based parallel skeletons. Further, by using an existing transformation called distillation in conjunction with our method, we produce transformed programs that contain fewer inefficient intermediate data structures.

  9. Social and psychological challenges of poker.

    Science.gov (United States)

    Siler, Kyle

    2010-09-01

    Poker is a competitive, social game of skill and luck, which presents players with numerous challenging strategic and interpersonal decisions. The adaptation of poker into a game played over the internet provides the unprecedented opportunity to quantitatively analyze extremely large numbers of hands and players. This paper analyzes roughly twenty-seven million hands played online in small-stakes, medium-stakes and high-stakes games. Using PokerTracker software, statistics are generated to (a) gauge the types of strategies utilized by players (i.e. the 'strategic demography') at each level and (b) examine the various payoffs associated with different strategies at varying levels of play. The results show that competitive edges attenuate as one moves up levels, and tight-aggressive strategies--which tend to be the most remunerative--become more prevalent. Further, payoffs for different combinations of cards, varies between levels, showing how strategic payoffs are derived from competitive interactions. Smaller-stakes players also have more difficulty appropriately weighting incentive structures with frequent small gains and occasional large losses. Consequently, the relationship between winning a large proportion of hands and profitability is negative, and is strongest in small-stakes games. These variations reveal a meta-game of rationality and psychology which underlies the card game. Adopting risk-neutrality to maximize expected value, aggression and appropriate mental accounting, are cognitive burdens on players, and underpin the rationality work--reconfiguring of personal preferences and goals--players engage into be competitive, and maximize their winning and profit chances.

  10. Development of massively parallel quantum chemistry program SMASH

    Energy Technology Data Exchange (ETDEWEB)

    Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  11. Development of massively parallel quantum chemistry program SMASH

    International Nuclear Information System (INIS)

    Ishimura, Kazuya

    2015-01-01

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C 150 H 30 ) 2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer

  12. Parallelization for X-ray crystal structural analysis program

    Energy Technology Data Exchange (ETDEWEB)

    Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji

    1997-10-01

    In this report we study vectorization and parallelization for X-ray crystal structural analysis program. The target machine is NEC SX-4 which is a distributed/shared memory type vector parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, so that it enables to obtain the 12.0 times higher performance result that the original code. Besides the above-mentioned vectorization, the parallelization by micro-task functions on SX-4 reaches 13.7 times acceleration in the part of multi-dimensional discrete Fourier transform with 14 CPUs, and 3.0 times acceleration in the whole program. Totally 35.9 times acceleration to the original 1CPU scalar version is achieved with vectorization and parallelization on SX-4. (author)

  13. HoldemML: A framework to generate No Limit Hold'em Poker agents from human player strategies

    OpenAIRE

    Luís Filipe Teófilo; Luís Paulo Reis

    2011-01-01

    Developing computer programs that play Poker at human level is considered to be challenge to the A.I research community, due to its incomplete information and stochastic nature. Due to these characteristics of the game, a competitive agent must manage luck and use opponent modeling to be successful at short term and therefore be profitable. In this paper we propose the creation of No Limit Hold'em Poker agents by copying strategies of the best human players, by analyzing past games between th...

  14. On program restructuring, scheduling, and communication for parallel processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Polychronopoulos, Constantine D. [Univ. of Illinois, Urbana, IL (United States)

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler was used to transform programs in a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.

  15. Basic design of parallel computational program for probabilistic structural analysis

    International Nuclear Information System (INIS)

    Kaji, Yoshiyuki; Arai, Taketoshi; Gu, Wenwei; Nakamura, Hitoshi

    1999-06-01

    In our laboratory, for 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)

  16. Basic design of parallel computational program for probabilistic structural analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kaji, Yoshiyuki; Arai, Taketoshi [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Gu, Wenwei; Nakamura, Hitoshi

    1999-06-01

    In our laboratory, for `development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory` (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)

  17. Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

    International Nuclear Information System (INIS)

    Baron, E.; Hauschildt, Peter H.

    1998-01-01

    We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society

  18. A Programming Model for Massive Data Parallelism with Data Dependencies

    International Nuclear Information System (INIS)

    Cui, Xiaohui; Mueller, Frank; Potok, Thomas E.; Zhang, Yongpeng

    2009-01-01

    Accelerating processors can often be more cost and energy effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processor units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA s Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach. We run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains

  19. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Full Text Available Hybrid parallel multicore architectures based on graphics processing units (GPUs can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in a unintrusive manner while preserving the legacy code.

  20. Is poker a skill game? New insights from statistical physics

    Science.gov (United States)

    Javarone, Marco Alberto

    2015-06-01

    During last years poker has gained a lot of prestige in several countries and, besides being one of the most famous card games, it represents a modern challenge for scientists belonging to different communities, spanning from artificial intelligence to physics and from psychology to mathematics. Unlike games like chess, the task of classifying the nature of poker (i.e., as “skill game” or gambling) seems really hard and it also constitutes a current problem, whose solution has several implications. In general, gambling offers equal winning probabilities both to rational players (i.e., those that use a strategy) and to irrational ones (i.e., those without a strategy). Therefore, in order to uncover the nature of poker, a viable way is comparing performances of rational vs. irrational players during a series of challenges. Recently, a work on this topic revealed that rationality is a fundamental ingredient to succeed in poker tournaments. In this study we analyze a simple model of poker challenges by a statistical physics approach, with the aim to uncover the nature of this game. As main result we found that, under particular conditions, few irrational players can turn poker into gambling. Therefore, although rationality is a key ingredient to succeed in poker, also the format of challenges has an important role in these dynamics, as it can strongly influence the underlying nature of the game. The importance of our results lies on the related implications, as for instance in identifying the limits within which poker can be considered as a “skill game” and, as a consequence, which kind of format must be chosen to devise algorithms able to face humans.

  1. Testing New Programming Paradigms with NAS Parallel Benchmarks

    Science.gov (United States)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage

  2. Scientific programming on massively parallel processor CP-PACS

    International Nuclear Information System (INIS)

    Boku, Taisuke

    1998-01-01

    The massively parallel processor CP-PACS takes various problems of calculation physics as the object, and it has been designed so that its architecture has been devised to do various numerical processings. In this report, the outline of the CP-PACS and the example of programming in the Kernel CG benchmark in NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper crossbar net, which are two great features of the architecture of the CP-PACS are described. As for the CP-PACS, the PUs based on RISC processor and added with pseudo vector processor are used. Pseudo vector processing is realized as the loop processing by scalar command. The features of the connection net of PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part that takes the time for processing most in the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. As the evaluation of the performance, the evaluation of the time for execution, the short vector processing of pseudo vector processor based on slide window, and the comparison with other parallel computers are reported. (K.I.)

  3. Final Report: Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    Mellor-Crummey, John [William Marsh Rice University

    2011-09-13

    As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.

  4. Feedback Driven Annotation and Refactoring of Parallel Programs

    DEFF Research Database (Denmark)

    Larsen, Per

    and communication in embedded programs. Runtime checks are developed to ensure that annotations correctly describe observable program behavior. The performance impact of runtime checking is evaluated on several benchmark kernels and is negligible in all cases. The second aspect is compilation feedback. Annotations...... are not effective unless programmers are told how and when they are benecial. A prototype compilation feedback system was developed in collaboration with IBM Haifa Research Labs. It reports issues that prevent further analysis to the programmer. Performance evaluation shows that three programs performes signicantly......This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism inherent in programs. Second...

  5. Emotional and Social Factors influence Poker Decision Making Accuracy.

    Science.gov (United States)

    Laakasuo, Michael; Palomäki, Jussi; Salmela, Mikko

    2015-09-01

    Poker is a social game, where success depends on both game strategic knowledge and emotion regulation abilities. Thus, poker provides a productive environment for studying the effects of emotional and social factors on micro-economic decision making. Previous research indicates that experiencing negative emotions, such as moral anger, reduces mathematical accuracy in poker decision making. Furthermore, various social aspects of the game—such as losing against "bad players" due to "bad luck"—seem to fuel these emotional states. We designed an Internet-based experiment, where participants' (N = 459) mathematical accuracy in five different poker decision making tasks were assessed. In addition, we manipulated the emotional and social conditions under which the tasks were presented, in a 2 × 2 experimental setup: (1) Anger versus neutral emotional state—participants were primed either with an anger-inducing, or emotionally neutral story, and (2) Social cue versus non-social cue—during the tasks, either an image of a pair of human eyes was "following" the mouse cursor, or an image of a black moving box was presented. The results showed that anger reduced mathematical accuracy of decision making only when participants were "being watched" by a pair of moving eyes. Experienced poker players made mathematically more accurate decisions than inexperienced ones. The results contribute to current understanding on how emotional and social factors influence decision making accuracy in economic games.

  6. Anxiety, Depression and Emotion Regulation Among Regular Online Poker Players.

    Science.gov (United States)

    Barrault, Servane; Bonnaire, Céline; Herrmann, Florian

    2017-12-01

    Poker is a type of gambling that has specific features, including the need to regulate one's emotion to be successful. The aim of the present study is to assess emotion regulation, anxiety and depression in a sample of regular poker players, and to compare the results of problem and non-problem gamblers. 416 regular online poker players completed online questionnaires including sociodemographic data, measures of problem gambling (CPGI), anxiety and depression (HAD scale), and emotion regulation (ERQ). The CPGI was used to divide participants into four groups according to the intensity of their gambling practice (non-problem, low risk, moderate risk and problem gamblers). Anxiety and depression were significantly higher among severe-problem gamblers than among the other groups. Both significantly predicted problem gambling. On the other hand, there was no difference between groups in emotion regulation (cognitive reappraisal and expressive suppression), which was linked neither to problem gambling nor to anxiety and depression (except for cognitive reappraisal, which was significantly correlated to anxiety). Our results underline the links between anxiety, depression and problem gambling among poker players. If emotion regulation is involved in problem gambling among poker players, as strongly suggested by data from the literature, the emotion regulation strategies we assessed (cognitive reappraisal and expressive suppression) may not be those involved. Further studies are thus needed to investigate the involvement of other emotion regulation strategies.

  7. Online Poker Gambling in University Students: Further Findings from an Online Survey

    Science.gov (United States)

    Griffiths, Mark; Parke, Jonathan; Wood, Richard; Rigbye, Jane

    2010-01-01

    Online poker is one of the fastest growing forms of online gambling yet there has been relatively little research to date. This study comprised 422 online poker players (362 males and 60 females) and investigated some of the predicting factors of online poker success and problem gambling using an online questionnaire. Results showed that length of…

  8. A scalable parallel algorithm for multiple objective linear programs

    Science.gov (United States)

    Wiecek, Malgorzata M.; Zhang, Hong

    1994-01-01

    This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives specially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.

  9. MPI_XSTAR: MPI-based parallelization of XSTAR program

    Science.gov (United States)

    Danehkar, A.

    2017-12-01

    MPI_XSTAR parallelizes execution of multiple XSTAR runs using Message Passing Interface (MPI). XSTAR (ascl:9910.008), part of the HEASARC's HEAsoft (ascl:1408.004) package, calculates the physical conditions and emission spectra of ionized gases. MPI_XSTAR invokes XSTINITABLE from HEASoft to generate a job list of XSTAR commands for given physical parameters. The job list is used to make directories in ascending order, where each individual XSTAR is spawned on each processor and outputs are saved. HEASoft's XSTAR2TABLE program is invoked upon the contents of each directory in order to produce table model FITS files for spectroscopy analysis tools.

  10. Parallelization and checkpointing of GPU applications through program transformation

    Energy Technology Data Exchange (ETDEWEB)

    Solano-Quinde, Lizandro Damian [Iowa State Univ., Ames, IA (United States)

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and

  11. Beyond chance? The persistence of performance in online poker.

    Directory of Open Access Journals (Sweden)

    Rogier J D Potter van Loon

    Full Text Available A major issue in the widespread controversy about the legality of poker and the appropriate taxation of winnings is whether poker should be considered a game of skill or a game of chance. To inform this debate we present an analysis into the role of skill in the performance of online poker players, using a large database with hundreds of millions of player-hand observations from real money ring games at three different stakes levels. We find that players whose earlier profitability was in the top (bottom deciles perform better (worse and are substantially more likely to end up in the top (bottom performance deciles of the following time period. Regression analyses of performance on historical performance and other skill-related proxies provide further evidence for persistence and predictability. Simulations point out that skill dominates chance when performance is measured over 1,500 or more hands of play.

  12. The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition

    Directory of Open Access Journals (Sweden)

    Ashkan Tousimojarad

    2013-12-01

    Full Text Available We present the Glasgow Parallel Reduction Machine (GPRM, a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the functional language used as the intermediate representation of the bytecode running on the GPRM. Using examples in this language we show the flexibility and power of our task composition framework. We demonstrate the potential using an implementation of a merge sort algorithm on a 64-core Tilera processor, as well as on a conventional Intel quad-core processor and an AMD 48-core processor system. We also compare our framework with OpenMP tasks in a parallel pointer chasing algorithm running on the Tilera processor. Our results show that the GPRM programs outperform the corresponding OpenMP codes on all test platforms, and can greatly facilitate writing of parallel programs, in particular non-data parallel algorithms such as reductions.

  13. An object-oriented programming paradigm for parallelization of computational fluid dynamics

    International Nuclear Information System (INIS)

    Ohta, Takashi.

    1997-03-01

    We propose an object-oriented programming paradigm for parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that the object-oriented programming design, which separates the parallel processing parts from the solver of the applications, can achieve the large improvement in the maintenance of the codes, as well as the high portability. We design the program for the two-dimensional Euler equations according to the paradigm, and evaluate the parallel performance on IBM SP2. (author)

  14. Clausewitz: On Poker, How Today’s Leaders Can Use Poker to Better Prepare Tomorrow’s Warriors

    Science.gov (United States)

    2007-01-01

    thing applies to combat leaders, and that is why the military could benefit by teaching poker. Rocks, Maniacs, Calling Stations and TA’s Clausewitz...much like churches around the world have done with bingo . Instead of playing for money, perhaps contestants could compete for a 3-day pass, or free

  15. Massive parallel electromagnetic field simulation program JEMS-FDTD design and implementation on jasmin

    International Nuclear Information System (INIS)

    Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li

    2010-01-01

    A large-scale parallel electromagnetic field simulation program JEMS-FDTD(J Electromagnetic Solver-Finite Difference Time Domain) is designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate propagation, radiation, couple of electromagnetic field by solving Maxwell equations on structured mesh explicitly with FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)

  16. Exploiting variability for energy optimization of parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Lavrijsen, Wim [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Iancu, Costin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); de Jong, Wibe [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chen, Xin [Georgia Inst. of Technology, Atlanta, GA (United States); Schwan, Karsten [Georgia Inst. of Technology, Atlanta, GA (United States)

    2016-04-18

    Here in this paper we present optimizations that use DVFS mechanisms to reduce the total energy usage in scientific applications. Our main insight is that noise is intrinsic to large scale parallel executions and it appears whenever shared resources are contended. The presence of noise allows us to identify and manipulate any program regions amenable to DVFS. When compared to previous energy optimizations that make per core decisions using predictions of the running time, our scheme uses a qualitative approach to recognize the signature of executions amenable to DVFS. By recognizing the "shape of variability" we can optimize codes with highly dynamic behavior, which pose challenges to all existing DVFS techniques. We validate our approach using offline and online analyses for one-sided and two-sided communication paradigms. We have applied our methods to NWChem, and we show best case improvements in energy use of 12% at no loss in performance when using online optimizations running on 720 Haswell cores with one-sided communication. With NWChem on MPI two-sided and offline analysis, capturing the initialization, we find energy savings of up to 20%, with less than 1% performance cost.

  17. MulticoreBSP for C : A high-performance library for shared-memory parallel programming

    NARCIS (Netherlands)

    Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.

    2014-01-01

    The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the

  18. Judgments under competition and uncertainty : empirical evidence from online poker

    OpenAIRE

    Engelbergs, Jörg

    2010-01-01

    Playing poker has many aspects in common with the making of business decisions. Agents act in a strategically rich, dynamic environment, where they are repeatedly facing competition under uncertain prospects. Their main goal is to maximize their resources. However simple this goal is put down, it is difficult to fulfill. As vast and rich research has shown, heuristics and biases affect human decision-making. This work adds to these findings by providing empirical evidence for behavioral patte...

  19. Synergistic Information Processing Encrypts Strategic Reasoning in Poker.

    Science.gov (United States)

    Frey, Seth; Albino, Dominic K; Williams, Paul L

    2018-06-14

    There is a tendency in decision-making research to treat uncertainty only as a problem to be overcome. But it is also a feature that can be leveraged, particularly in social interaction. Comparing the behavior of profitable and unprofitable poker players, we reveal a strategic use of information processing that keeps decision makers unpredictable. To win at poker, a player must exploit public signals from others. But using public inputs makes it easier for an observer to reconstruct that player's strategy and predict his or her behavior. How should players trade off between exploiting profitable opportunities and remaining unexploitable themselves? Using a recent multivariate approach to information theoretic data analysis and 1.75 million hands of online two-player No-Limit Texas Hold'em, we find that the important difference between winning and losing players is not in the amount of information they process, but how they process it. In particular, winning players are better at integrative information processing-creating new information from the interaction between their cards and their opponents' signals. We argue that integrative information processing does not just produce better decisions, it makes decision-making harder for others to reverse engineer, as an expert poker player's cards act like the private key in public-key cryptography. Poker players encrypt their reasoning with the way they process information. The encryption function of integrative information processing makes it possible for players to exploit others while remaining unexploitable. By recognizing the act of information processing as a strategic behavior in its own right, we offer a detailed account of how experts use endemic uncertainty to conceal their intentions in high-stakes competitive environments, and we highlight new opportunities between cognitive science, information theory, and game theory. Copyright © 2018 Cognitive Science Society, Inc.

  20. Effects of Testosterone Administration on Strategic Gambling in Poker Play.

    Science.gov (United States)

    van Honk, Jack; Will, Geert-Jan; Terburg, David; Raub, Werner; Eisenegger, Christoph; Buskens, Vincent

    2016-01-04

    Testosterone has been associated with economically egoistic and materialistic behaviors, but -defensibly driven by reputable status seeking- also with economically fair, generous and cooperative behaviors. Problematically, social status and economic resources are inextricably intertwined in humans, thus testosterone's primal motives are concealed. We critically addressed this issue by performing a placebo-controlled single-dose testosterone administration in young women, who played a game of bluff poker wherein concerns for status and resources collide. The profit-maximizing strategy in this game is to mislead the other players by bluffing randomly (independent of strength of the hand), thus also when holding very poor cards (cold bluffing). The profit-maximizing strategy also dictates the players in this poker game to never call the other players' bluffs. For reputable-status seeking these materialistic strategies are disadvantageous; firstly, being caught cold bluffing damages one's reputation by revealing deceptive intent, and secondly, not calling the other players' bluffs signals submission in blindly tolerating deception. Here we show that testosterone administration in this game of bluff poker significantly reduces random bluffing, as well as cold bluffing, while significantly increasing calling. Our data suggest that testosterone in humans primarily motivates for reputable-status seeking, even when this elicits behaviors that are economically disadvantageous.

  1. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Science.gov (United States)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  2. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    Science.gov (United States)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language by using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution, however, the efficiency in terms of the computing resource usage decreases with increasing the number of processors used in the parallel computing.

  3. Poker, Sports Betting, and Less Popular Alternatives: Status, Friendship Networks, and Male Adolescent Gambling

    Science.gov (United States)

    DiCicco-Bloom, Benjamin; Romer, Daniel

    2012-01-01

    The authors argue that the recent increase in poker play among adolescent males in the United States was primarily attributable to high-status male youth who are more able to organize "informal" gambling games (e.g., poker and sports betting) than are low-status male youth who are left to gamble on "formal" games (e.g., lotteries and slot…

  4. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

    and the Microsoft .NET iv framework. Normally, one would not directly think of the .NET framework when talking scientific applications, but Microsoft has in the last couple of versions of .NET introduce a number of tools for writing parallel and high performance code. The first section examines how programmers can...

  5. On the effective parallel programming of multi-core processors

    NARCIS (Netherlands)

    Varbanescu, A.L.

    2010-01-01

    Multi-core processors are considered now the only feasible alternative to the large single-core processors which have become limited by technological aspects such as power consumption and heat dissipation. However, due to their inherent parallel structure and their diversity, multi-cores are

  6. HPC parallel programming model for gyrokinetic MHD simulation

    International Nuclear Information System (INIS)

    Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

    2011-01-01

    The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N DD-r × N DD-z (number of radial decomposition times number of axial decomposition) small domains including approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N RP replicas (copies) of each decomposed domain were used (“particle decomposition”). The hybrid parallelization model of multi-threads and multi-processes was employed: threads were parallelized by the auto-parallelization and N DD-r × N DD-z × N RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for the medium size system of N r × N θ × N z = 1025 × 128 × 128 mesh with 4.196 or 8.192 billion particles. The highest speed for the fixed number of logical cores was obtained for two threads, the maximum number of N DD-z , and optimum combination of N DD-r and N RP . The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)

  7. Adapting high-level language programs for parallel processing using data flow

    Science.gov (United States)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  8. Communications oriented programming of parallel iterative solutions of sparse linear systems

    Science.gov (United States)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  9. Compiler and Runtime Support for Programming in Adaptive Parallel Environments

    Science.gov (United States)

    1998-10-15

    noother job is waiting for resources, and use a smaller number of processors when other jobs needresources. Setia et al. [15, 20] have shown that such...15] Vijay K. Naik, Sanjeev Setia , and Mark Squillante. Performance analysis of job scheduling policiesin parallel supercomputing environments. In...on networks ofheterogeneous workstations. Technical Report CSE-94-012, Oregon Graduate Institute of Scienceand Technology, 1994.[20] Sanjeev Setia

  10. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.

    2017-03-13

    Deep learning is based on a set of algorithms that attempt to model high level abstractions in data. Specifically, RBM is a deep learning algorithm that used in the project to increase it\\'s time performance using some efficient parallel implementation by OpenACC tool with best possible optimizations on RBM to harness the massively parallel power of NVIDIA GPUs. GPUs development in the last few years has contributed to growing the concept of deep learning. OpenACC is a directive based ap-proach for computing where directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Ma-chine is a stochastic neural network that essentially perform a binary version of factor analysis. RBM is a useful neural net-work basis for larger modern deep learning model, such as Deep Belief Network. RBM parameters are estimated using an efficient training method that called Contrastive Divergence. Parallel implementation of RBM is available using different models such as OpenMP, and CUDA. But this project has been the first attempt to apply OpenACC model on RBM.

  11. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.; Khan, Ayaz H.

    2017-01-01

    Deep learning is based on a set of algorithms that attempt to model high level abstractions in data. Specifically, RBM is a deep learning algorithm that used in the project to increase it's time performance using some efficient parallel implementation by OpenACC tool with best possible optimizations on RBM to harness the massively parallel power of NVIDIA GPUs. GPUs development in the last few years has contributed to growing the concept of deep learning. OpenACC is a directive based ap-proach for computing where directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Ma-chine is a stochastic neural network that essentially perform a binary version of factor analysis. RBM is a useful neural net-work basis for larger modern deep learning model, such as Deep Belief Network. RBM parameters are estimated using an efficient training method that called Contrastive Divergence. Parallel implementation of RBM is available using different models such as OpenMP, and CUDA. But this project has been the first attempt to apply OpenACC model on RBM.

  12. User's guide of parallel program development environment (PPDE). The 2nd edition

    International Nuclear Information System (INIS)

    Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio; Ohta, Hirofumi

    2000-03-01

    The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a parallelizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)

  13. Poker Player Behavior After Big Wins and Big Losses

    OpenAIRE

    Gary Smith; Michael Levere; Robert Kurtzman

    2009-01-01

    We find that experienced poker players typically change their style of play after winning or losing a big pot--most notably, playing less cautiously after a big loss, evidently hoping for lucky cards that will erase their loss. This finding is consistent with Kahneman and Tversky's (Kahneman, D., A. Tversky. 1979. Prospect theory: An analysis of decision under risk. Econometrica 47(2) 263-292) break-even hypothesis and suggests that when investors incur a large loss, it might be time to take ...

  14. Protocol-Based Verification of Message-Passing Parallel Programs

    DEFF Research Database (Denmark)

    López-Acosta, Hugo-Andrés; Eduardo R. B. Marques, Eduardo R. B.; Martins, Francisco

    2015-01-01

    We present ParTypes, a type-based methodology for the verification of Message Passing Interface (MPI) programs written in the C programming language. The aim is to statically verify programs against protocol specifications, enforcing properties such as fidelity and absence of deadlocks. We develo...

  15. Feasibility studies for a high energy physics MC program on massive parallel platforms

    International Nuclear Information System (INIS)

    Bertolotto, L.M.; Peach, K.J.; Apostolakis, J.; Bruschini, C.E.; Calafiura, P.; Gagliardi, F.; Metcalf, M.; Norton, A.; Panzer-Steindel, B.

    1994-01-01

    The parallelization of a Monte Carlo program for the NA48 experiment is presented. As a first step, a task farming structure was realized. Based on this, a further step, making use of a distributed database for showers in the electro-magnetic calorimeter, was implemented. Further possibilities for using parallel processing for a quasi-real time calibration of the calorimeter are described

  16. Cell verification of parallel burnup calculation program MCBMPI based on MPI

    International Nuclear Information System (INIS)

    Yang Wankui; Liu Yaoguang; Ma Jimin; Wang Guanbo; Yang Xin; She Ding

    2014-01-01

    The parallel burnup calculation program MCBMPI was developed. The program was modularized. The parallel MCNP5 program MCNP5MPI was employed as neutron transport calculation module. And a composite of three solution methods was used to solve burnup equation, i.e. matrix exponential technique, TTA analytical solution, and Gauss Seidel iteration. MPI parallel zone decomposition strategy was concluded in the program. The program system only consists of MCNP5MPI and burnup subroutine. The latter achieves three main functions, i.e. zone decomposition, nuclide transferring and decaying, and data exchanging with MCNP5MPI. Also, the program was verified with the pressurized water reactor (PWR) cell burnup benchmark. The results show that it,s capable to apply the program to burnup calculation of multiple zones, and the computation efficiency could be significantly improved with the development of computer hardware. (authors)

  17. 76 FR 62808 - Pilot Program for Parallel Review of Medical Products

    Science.gov (United States)

    2011-10-11

    ... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...

  18. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    Science.gov (United States)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  19. The technical feasibility of uranium enrichment for nuclear bomb construction at the parallel nuclear program plant

    International Nuclear Information System (INIS)

    Rosa, L.P.

    1990-01-01

    It is discussed the hole of the Parallel Nuclear Program is Brazil and the feasibility of uranium enrichment for nuclear bomb construction. This program involves two research centers, one belonging to the brazilian navy and another to the aeronautics. Some other brazilian institutes like CTA, IPEN, COPESP and CETEX and also taking part in the program. (A.C.A.S.)

  20. Development and benchmark verification of a parallelized Monte Carlo burnup calculation program MCBMPI

    International Nuclear Information System (INIS)

    Yang Wankui; Liu Yaoguang; Ma Jimin; Yang Xin; Wang Guanbo

    2014-01-01

    MCBMPI, a parallelized burnup calculation program, was developed. The program is modularized. Neutron transport calculation module employs the parallelized MCNP5 program MCNP5MPI, and burnup calculation module employs ORIGEN2, with the MPI parallel zone decomposition strategy. The program system only consists of MCNP5MPI and an interface subroutine. The interface subroutine achieves three main functions, i.e. zone decomposition, nuclide transferring and decaying, data exchanging with MCNP5MPI. Also, the program was verified with the Pressurized Water Reactor (PWR) cell burnup benchmark, the results showed that it's capable to apply the program to burnup calculation of multiple zones, and the computation efficiency could be significantly improved with the development of computer hardware. (authors)

  1. Vdebug: debugging tool for parallel scientific programs. Design report on vdebug

    International Nuclear Information System (INIS)

    Matsuda, Katsuyuki; Takemiya, Hiroshi

    2000-02-01

    We report on a debugging tool called vdebug which supports debugging work for parallel scientific simulation programs. It is difficult to debug scientific programs with an existing debugger, because the volume of data generated by the programs is too large for users to check data in characters. Usually, the existing debugger shows data values in characters. To alleviate it, we have developed vdebug which enables to check the validity of large amounts of data by showing these data values visually. Although targets of vdebug have been restricted to sequential programs, we have made it applicable to parallel programs by realizing the function of merging and visualizing data distributed on programs on each computer node. Now, vdebug works on seven kinds of parallel computers. In this report, we describe the design of vdebug. (author)

  2. User's guide of parallel program development environment (PPDE). The 2nd edition

    Energy Technology Data Exchange (ETDEWEB)

    Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)

    2000-03-01

    The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a paralleilizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)

  3. User's guide of parallel program development environment (PPDE). The 2nd edition

    Energy Technology Data Exchange (ETDEWEB)

    Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)

    2000-03-01

    The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D on the technology of parallel processing. The enhancement has been made through extending the function of the PPDF, Parallel Program Development Environment in the STA basic system. The extended PPDE has the function to make: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) the multi-tools execution which makes the tools on heterogeneous computers to execute with one operation a task on a computer, and 3) the mirror composition to reflect editing results of a file on a computer into all related files on other computers. These additional functions will enhance the work efficiency for program development on some computers. More functions have been added to the PPDE to provide help for parallel program development. New functions were also designed to complement a HPF translator and a paralleilizing support tool when working together so that a sequential program is efficiently converted to a parallel program. This report describes the use of extended PPDE. (author)

  4. Online and live regular poker players: Do they differ in impulsive sensation seeking and gambling practice?

    Science.gov (United States)

    Barrault, Servane; Varescon, Isabelle

    2016-01-01

    Background and aims Online gambling appears to have special features, such as anonymity, speed of play and permanent availability, which may contribute to the facilitation and increase in gambling practice, potentially leading to problem gambling. The aims of this study were to assess sociodemographic characteristics, gambling practice and impulsive sensation seeking among a population of regular poker players with different levels of gambling intensity and to compare online and live players. Methods 245 regular poker players (180 online players and 65 live players) completed online self-report scales assessing sociodemographic data, pathological gambling (SOGS), gambling practice (poker questionnaire) and impulsive sensation seeking (ImpSS). We used SOGS scores to rank players according to the intensity of their gambling practice (non-pathological gamblers, problem gamblers and pathological gamblers). Results All poker players displayed a particular sociodemographic profile: they were more likely to be young men, executives or students, mostly single and working full-time. Online players played significantly more often whereas live players reported significantly longer gambling sessions. Sensation seeking was high across all groups, whereas impulsivity significantly distinguished players according to the intensity of gambling. Discussion Our results show the specific profile of poker players. Both impulsivity and sensation seeking seem to be involved in pathological gambling, but playing different roles. Sensation seeking may determine interest in poker whereas impulsivity may be involved in pathological gambling development and maintenance. Conclusions This study opens up new research perspectives and insights into preventive and treatment actions for pathological poker players. PMID:28092187

  5. On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations

    Directory of Open Access Journals (Sweden)

    Xing Cai

    2005-01-01

    Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance become achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.

  6. Effects of alcohol, initial gambling outcomes, impulsivity, and gambling cognitions on gambling behavior using a video poker task.

    Science.gov (United States)

    Corbin, William R; Cronce, Jessica M

    2017-06-01

    Drinking and gambling frequently co-occur, and concurrent gambling and drinking may lead to greater negative consequences than either behavior alone. Building on prior research on the effects of alcohol, initial gambling outcomes, impulsivity, and gambling cognitions on gambling behaviors using a chance-based (nonstrategic) slot-machine task, the current study explored the impact of these factors on a skill-based (strategic) video poker task. We anticipated larger average bets and greater gambling persistence under alcohol relative to placebo, and expected alcohol effects to be moderated by initial gambling outcomes, impulsivity, and gambling cognitions. Participants (N = 162; 25.9% female) were randomly assigned to alcohol (target BrAC = .08g%) or placebo and were given $10 to wager on a simulated video poker task, which was programmed to produce 1 of 3 initial outcomes (win, breakeven, or lose) before beginning a progressive loss schedule. Despite evidence for validity of the video poker task and alcohol administration paradigm, primary hypotheses were not supported. Individuals who received alcohol placed smaller wagers than participants in the placebo condition, though this effect was not statistically significant, and the direction of effects was reversed in at-risk gamblers (n = 41). These findings contradict prior research and suggest that alcohol effects on gambling behavior may differ by gambling type (nonstrategic vs. strategic games). Interventions that suggest alcohol is universally disinhibiting may be at odds with young adults' lived experience and thus be less effective than those that recognize the greater complexity of alcohol effects. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  7. Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

    OpenAIRE

    Givelberg, Edward

    2014-01-01

    We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a h...

  8. Programming a massively parallel, computation universal system: static behavior

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.; Farber, R.

    1986-01-01

    In previous work by the authors, the ''optimum finding'' properties of Hopfield neural nets were applied to the nets themselves to create a ''neural compiler.'' This was done in such a way that the problem of programming the attractors of one neural net (called the Slave net) was expressed as an optimization problem that was in turn solved by a second neural net (the Master net). In this series of papers that approach is extended to programming nets that contain interneurons (sometimes called ''hidden neurons''), and thus deals with nets capable of universal computation. 22 refs.

  9. Concurrent Programming Using Actors: Exploiting Large-Scale Parallelism,

    Science.gov (United States)

    1985-10-07

    ORGANIZATION NAME AND ADDRESS 10. PROGRAM ELEMENT. PROJECT. TASK* Artificial Inteligence Laboratory AREA Is WORK UNIT NUMBERS 545 Technology Square...D-R162 422 CONCURRENT PROGRMMIZNG USING f"OS XL?ITP TEH l’ LARGE-SCALE PARALLELISH(U) NASI AC E Al CAMBRIDGE ARTIFICIAL INTELLIGENCE L. G AGHA ET AL...RESOLUTION TEST CHART N~ATIONAL BUREAU OF STANDA.RDS - -96 A -E. __ _ __ __’ .,*- - -- •. - MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL

  10. COMPSs-Mobile: parallel programming for mobile-cloud computing

    OpenAIRE

    Lordan Gomis, Francesc-Josep; Badia Sala, Rosa Maria

    2016-01-01

    The advent of Cloud and the popularization of mobile devices have led us to a shift in computing access. Computing users will have an interaction display while the real computation will be performed remotely, in the Cloud. COMPSs-Mobile is a framework that aims to ease the development of energy-efficient and high-performing applications for this environment. The framework provides an infrastructure-unaware programming model that allows developers to code regular Android applications that, ...

  11. Compiling the parallel programming language NestStep to the CELL processor

    OpenAIRE

    Holm, Magnus

    2010-01-01

    The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compil...

  12. 76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

    Science.gov (United States)

    2011-10-26

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... 11, 2011 (76 FR 62808). The document announced a pilot program for sponsors of innovative device...

  13. A Tool for Performance Modeling of Parallel Programs

    Directory of Open Access Journals (Sweden)

    J.A. González

    2003-01-01

    Full Text Available Current performance prediction analytical models try to characterize the performance behavior of actual machines through a small set of parameters. In practice, substantial deviations are observed. These differences are due to factors as memory hierarchies or network latency. A natural approach is to associate a different proportionality constant with each basic block, and analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, to use this approach implies that the evaluation of parameters must be done for each algorithm. This is a heavy task, implying experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms. Software support is required. We present a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from the execution of the resulting code are analyzed with an interactive interpreter, giving us, among other information, the values of those parameters.

  14. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison.

    Science.gov (United States)

    Martins, W S; Del Cuvillo, J B; Useche, F J; Theobald, K B; Gao, G R

    2001-01-01

    This paper discusses the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform based on a fine-grain event-driven multithreaded program execution model. Fine-grain multithreading permits efficient parallelism exploitation in this application both by taking advantage of asynchronous point-to-point synchronizations and communication with low overheads and by effectively tolerating latency through the overlapping of computation and communication. We have implemented our scheme on EARTH, a fine-grain event-driven multithreaded execution and architecture model which has been ported to a number of parallel machines with off-the-shelf processors. Our experimental results show that the dynamic programming algorithm can be efficiently implemented on EARTH systems with high performance (e.g., speedup of 90 on 120 nodes), good programmability and reasonable cost.

  15. One-Year Prospective Study on Passion and Gambling Problems in Poker Players.

    Science.gov (United States)

    Morvannou, Adèle; Dufour, Magali; Brunelle, Natacha; Berbiche, Djamal; Roy, Élise

    2018-06-01

    The concept of passion is relevant to understanding gambling behaviours and gambling problems. Longitudinal studies are useful to better understand the absence and development of gambling problems; however, only one study has specifically considered poker players. Using a longitudinal design, this study aims to determine the influence, 1 year later, of two forms of passion-harmonious and obsessive-on gambling problems in poker players. A total of 116 poker players was recruited from across Quebec, Canada. The outcome variable of interest was participants' category on the Canadian Pathological Gambling Index, and the predictive variable was the Gambling Passion Scale. Multiple logistic regression analyses were conducted to identify independent risk factors of at-risk poker players 1 year later. Obsessive passion at baseline doubled the risk of gambling problems 1 year later (p passion, there was no association. Number of gambling activities, drug problems, and impulsivity were also associated with at-risk gambling. This study highlights the links between obsessive passion and at-risk behaviours among poker players. It is therefore important to prevent the development of obsessive passion among poker players.

  16. High performance parallelism pearls 2 multicore and many-core programming approaches

    CERN Document Server

    Jeffers, Jim

    2015-01-01

    High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t

  17. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    Science.gov (United States)

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.

  18. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs

  19. Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

    Science.gov (United States)

    Bellucci, Michael A; Coker, David F

    2011-07-28

    We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency is tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics

  20. Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

    Science.gov (United States)

    Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

    2013-10-24

    Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.

  1. A language for data-parallel and task parallel programming dedicated to multi-SIMD computers. Contributions to hydrodynamic simulation with lattice gases

    International Nuclear Information System (INIS)

    Pic, Marc Michel

    1995-01-01

    Parallel programming covers task-parallelism and data-parallelism. Many problems need both parallelisms. Multi-SIMD computers allow hierarchical approach of these parallelisms. The T++ language, based on C++, is dedicated to exploit Multi-SIMD computers using a programming paradigm which is an extension of array-programming to tasks managing. Our language introduced array of independent tasks to achieve separately (MIMD), on subsets of processors of identical behaviour (SIMD), in order to translate the hierarchical inclusion of data-parallelism in task-parallelism. To manipulate in a symmetrical way tasks and data we propose meta-operations which have the same behaviour on tasks arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE in order to profit by the locally-shared memory, by the hardware virtualization, and by the multiplicity of communications networks. We analyse simultaneously a typical application of such architecture. Finite elements scheme for Fluid mechanic needs powerful parallel computers and requires large floating points abilities. Lattice gases is an alternative to such simulations. Boolean lattice bases are simple, stable, modular, need to floating point computation, but include numerical noise. Boltzmann lattice gases present large precision of computation, but needs floating points and are only locally stable. We propose a new scheme, called multi-bit, who keeps the advantages of each boolean model to which it is applied, with large numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown and implementation techniques for parallel Multi-SIMD computers detailed. (author) [fr

  2. Clausewitz: On Poker (Clausewitz was a TA). How Today’s Leaders Can Use Poker to Better Prepare Tomorrow’s Warriors

    Science.gov (United States)

    2007-01-01

    thing applies to combat leaders, and that is why the military could benefit by teaching poker. Rocks, Maniacs, Calling Stations and TA’s Clausewitz...much like churches around the world have done with bingo . Instead of playing for money, perhaps contestants could compete for a 3-day pass, or free

  3. Generalized Analytical Program of Thyristor Phase Control Circuit with Series and Parallel Resonance Load

    OpenAIRE

    Nakanishi, Sen-ichiro; Ishida, Hideaki; Himei, Toyoji

    1981-01-01

    The systematic analytical method is reqUired for the ac phase control circuit by means of an inverse parallel thyristor pair which has a series and parallel L-C resonant load, because the phase control action causes abnormal and interesting phenomena, such as an extreme increase of voltage and current, an unique increase and decrease of contained higher harmonics, and a wide variation of power factor, etc. In this paper, the program for the analysis of the thyristor phase control circuit with...

  4. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2015-01-01

    Full Text Available The aim of this study is to present an approach to the introduction into pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. The modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is the essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to the comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform of constructivist learning process, thus enabling learners’ experimentation with the provided programming models, obtaining learners’ competences of the modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models and message passing interface (MPI and OpenMP parallelization tools have been chosen for implementation.

  5. Academic training: From Evolution Theory to Parallel and Distributed Genetic Programming

    CERN Multimedia

    2007-01-01

    2006-2007 ACADEMIC TRAINING PROGRAMME LECTURE SERIES 15, 16 March From 11:00 to 12:00 - Main Auditorium, bldg. 500 From Evolution Theory to Parallel and Distributed Genetic Programming F. FERNANDEZ DE VEGA / Univ. of Extremadura, SP Lecture No. 1: From Evolution Theory to Evolutionary Computation Evolutionary computation is a subfield of artificial intelligence (more particularly computational intelligence) involving combinatorial optimization problems, which are based to some degree on the evolution of biological life in the natural world. In this tutorial we will review the source of inspiration for this metaheuristic and its capability for solving problems. We will show the main flavours within the field, and different problems that have been successfully solved employing this kind of techniques. Lecture No. 2: Parallel and Distributed Genetic Programming The successful application of Genetic Programming (GP, one of the available Evolutionary Algorithms) to optimization problems has encouraged an ...

  6. Algorithmic differentiation of pragma-defined parallel regions differentiating computer programs containing OpenMP

    CERN Document Server

    Förster, Michael

    2014-01-01

    Numerical programs often use parallel programming techniques such as OpenMP to compute the program's output values as efficient as possible. In addition, derivative values of these output values with respect to certain input values play a crucial role. To achieve code that computes not only the output values simultaneously but also the derivative values, this work introduces several source-to-source transformation rules. These rules are based on a technique called algorithmic differentiation. The main focus of this work lies on the important reverse mode of algorithmic differentiation. The inh

  7. Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

    Directory of Open Access Journals (Sweden)

    Guy Edjlali

    1997-01-01

    Full Text Available For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of high-performance Fortran (HPF-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.

  8. P3T+: A Performance Estimator for Distributed and Parallel Programs

    Directory of Open Access Journals (Sweden)

    T. Fahringer

    2000-01-01

    Full Text Available Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particular distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P3T+ which is a performance estimator for mostly regular HPF (High Performance Fortran programs but partially covers also message passing programs (MPI. P3T+ is unique by modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile-time a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by employing highly effective symbolic analysis. Communication is estimated by simulating the behavior of a communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model most critical architecture specific factors such as cache lines sizes, number of cache lines available, startup times, message transfer time per byte, etc. P3T+ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC to support programmers develop parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both accuracy and usefulness of P3T+.

  9. Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

    KAUST Repository

    Klein, Tobias

    2016-04-19

    The use of GPUs and the massively parallel computing paradigm have become wide-spread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program\\'s function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.

  10. Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck

    Science.gov (United States)

    Yan, Jerry C.; Bailey, David (Technical Monitor)

    1996-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitor/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

  11. Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

    Science.gov (United States)

    Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

    2005-01-01

    The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed system present the worst case implications for today s dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirement-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.

  12. An approach to multicore parallelism using functional programming: A case study based on Presburger Arithmetic

    DEFF Research Database (Denmark)

    Dung, Phan Anh; Hansen, Michael Reichhardt

    2015-01-01

    In this paper we investigate multicore parallelism in the context of functional programming by means of two quantifier-elimination procedures for Presburger Arithmetic: one is based on Cooper’s algorithm and the other is based on the Omega Test. We first develop correct-by-construction prototype...... platform executing on an 8-core machine. A speedup of approximately 4 was obtained for Cooper’s algorithm and a speedup of approximately 6 was obtained for the exact-shadow part of the Omega Test. The considered procedures are complex, memory-intense algorithms on huge formula trees and the case study...... reveals more general applicable techniques and guideline for deriving parallel algorithms from sequential ones in the context of data-intensive tree algorithms. The obtained insights should apply for any strict and impure functional programming language. Furthermore, the results obtained for the exact...

  13. Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

    Science.gov (United States)

    Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

    2013-12-01

    Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is imbedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.

  14. Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

    KAUST Repository

    Klein, Tobias; Bruckner, Stefan; Grö ller, M. Eduard; Hadwiger, Markus; Rautek, Peter

    2016-01-01

    The use of GPUs and the massively parallel computing paradigm have become wide-spread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program's function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.

  15. A backtracking algorithm for the stream AND-parallel execution of logic programs

    Energy Technology Data Exchange (ETDEWEB)

    Somogyi, Z.; Ramamohanarao, K.; Vaghani, J. (Univ. of Melbourne, Parkville (Australia))

    1988-06-01

    The authors present the first backtracking algorithm for stream AND-parallel logic programs. It relies on compile-time knowledge of the data flow graph of each clause to let it figure out efficiently which goals to kill or restart when a goal fails. This crucial information, which they derive from mode declarations, was not available at compile-time in any previous stream AND-parallel system. They show that modes can increase the precision of the backtracking algorithm, though their algorithm allows this precision to be traded off against overhead on a procedure-by-procedure and call-by-call basis. The modes also allow their algorithm to handle efficiently programs that manipulate partially instantiated data structures and an important class of programs with circular dependency graphs. On code that does not need backtracking, the efficiency of their algorithm approaches that of the committed-choice languages; on code that does need backtracking its overhead is comparable to that of the independent AND-parallel backtracking algorithms.

  16. HEALTHY INDIVIDUALS DECISION MAKING IN ONLINE POKER: PERSPECTIVES ON EMOTIONAL STABILITY AND WELLBEING

    OpenAIRE

    Laakasuo, Michael

    2015-01-01

    In our previous studies it has been found that a phenomenon labeled tilting is a form of moral anger. When players are in tilt they make a series of bad decisions, chase their losses and express anger by cursing their opponents. In the context of tilting, the players also report episodes of memory loss. Additionally, we also developed a scale that measures the level of a player's poker experience, and we found evidence to suggest that poker experience is associated with mature self-reflec...

  17. Development, Verification and Validation of Parallel, Scalable Volume of Fluid CFD Program for Propulsion Applications

    Science.gov (United States)

    West, Jeff; Yang, H. Q.

    2014-01-01

    There are many instances involving liquid/gas interfaces and their dynamics in the design of liquid engine powered rockets such as the Space Launch System (SLS). Some examples of these applications are: Propellant tank draining and slosh, subcritical condition injector analysis for gas generators, preburners and thrust chambers, water deluge mitigation for launch induced environments and even solid rocket motor liquid slag dynamics. Commercially available CFD programs simulating gas/liquid interfaces using the Volume of Fluid approach are currently limited in their parallel scalability. In 2010 for instance, an internal NASA/MSFC review of three commercial tools revealed that parallel scalability was seriously compromised at 8 cpus and no additional speedup was possible after 32 cpus. Other non-interface CFD applications at the time were demonstrating useful parallel scalability up to 4,096 processors or more. Based on this review, NASA/MSFC initiated an effort to implement a Volume of Fluid implementation within the unstructured mesh, pressure-based algorithm CFD program, Loci-STREAM. After verification was achieved by comparing results to the commercial CFD program CFD-Ace+, and validation by direct comparison with data, Loci-STREAM-VoF is now the production CFD tool for propellant slosh force and slosh damping rate simulations at NASA/MSFC. On these applications, good parallel scalability has been demonstrated for problems sizes of tens of millions of cells and thousands of cpu cores. Ongoing efforts are focused on the application of Loci-STREAM-VoF to predict the transient flow patterns of water on the SLS Mobile Launch Platform in order to support the phasing of water for launch environment mitigation so that vehicle determinantal effects are not realized.

  18. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be imnplemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  19. Purification and Characterization of Hemagglutinating Proteins from Poker-Chip Venus (Meretrix lusoria and Corbicula Clam (Corbicula fluminea

    Directory of Open Access Journals (Sweden)

    Chin-Fu Cheng

    2012-01-01

    Full Text Available Hemagglutinating proteins (HAPs were purified from Poker-chip Venus (Meretrix lusoria and Corbicula clam (Corbicula fluminea using gel-filtration chromatography on a Sephacryl S-300 column. The molecular weights of the HAPs obtained from Poker-chip Venus and Corbicula clam were 358 kDa and 380 kDa, respectively. Purified HAP from Poker-chip Venus yielded two subunits with molecular weights of 26 kDa and 29 kDa. However, only one HAP subunit was purified from Corbicula clam, and its molecular weight was 32 kDa. The two Poker-chip Venus HAPs possessed hemagglutinating ability (HAA for erythrocytes of some vertebrate animal species, especially tilapia. Moreover, HAA of the HAP purified from Poker-chip Venus was higher than that of the HAP of Corbicula clam. Furthermore, Poker-chip Venus HAPs possessed better HAA at a pH higher than 7.0. When the temperature was at 4°C–10°C or the salinity was less than 0.5‰, the two Poker-chip Venus HAPs possessed better HAA compared with that of Corbicula clam.

  20. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  1. Effect on the Grain Size Distribution when Preparing Sand Using Poker Vibrators

    DEFF Research Database (Denmark)

    Nielsen, Søren Dam

    2017-01-01

    At Aalborg University and other research institutions, model tests are performed on small -scale foundations. These foundations are often installed in sand which has to be prepared in a reproductive way. At Aalborg University the preparation is done by using poker vibrators. This paper investigat...

  2. Exploring Chemical Equilibrium with Poker Chips: A General Chemistry Laboratory Exercise

    Science.gov (United States)

    Bindel, Thomas H.

    2012-01-01

    A hands-on laboratory exercise at the general chemistry level introduces students to chemical equilibrium through a simulation that uses poker chips and rate equations. More specifically, the exercise allows students to explore reaction tables, dynamic chemical equilibrium, equilibrium constant expressions, and the equilibrium constant based on…

  3. Selling Internet Gambling: Advertising, New Media and the Content of Poker Promotion

    Science.gov (United States)

    McMullan, John L.; Kervin, Melissa

    2012-01-01

    This study examines the web design and engineering, advertising and marketing, and pedagogical features present at a random sample of 71 international poker sites obtained from the Casino City directory in the summer of 2009. We coded for 22 variables related to access, appeal, player protection, customer services, on-site security, use of images,…

  4. Strategic Alliance Poker: Demonstrating the Importance of Complementary Resources and Trust in Strategic Alliance Management

    Science.gov (United States)

    Reutzel, Christopher R.; Worthington, William J.; Collins, Jamie D.

    2012-01-01

    Strategic Alliance Poker (SAP) provides instructors with an opportunity to integrate the resource based view with their discussion of strategic alliances in undergraduate Strategic Management courses. Specifically, SAP provides Strategic Management instructors with an experiential exercise that can be used to illustrate the value creation…

  5. Is poker a game of skill or chance? A quasi-experimental study.

    Science.gov (United States)

    Meyer, Gerhard; von Meduna, Marc; Brosowski, Tim; Hayer, Tobias

    2013-09-01

    Due to intensive marketing and the rapid growth of online gambling, poker currently enjoys great popularity among large sections of the population. Although poker is legally a game of chance in most countries, some (particularly operators of private poker web sites) argue that it should be regarded as a game of skill or sport because the outcome of the game primarily depends on individual aptitude and skill. The available findings indicate that skill plays a meaningful role; however, serious methodological weaknesses and the absence of reliable information regarding the relative importance of chance and skill considerably limit the validity of extant research. Adopting a quasi-experimental approach, the present study examined the extent to which the influence of poker playing skill was more important than card distribution. Three average players and three experts sat down at a six-player table and played 60 computer-based hands of the poker variant "Texas Hold'em" for money. In each hand, one of the average players and one expert received (a) better-than-average cards (winner's box), (b) average cards (neutral box) and (c) worse-than-average cards (loser's box). The standardized manipulation of the card distribution controlled the factor of chance to determine differences in performance between the average and expert groups. Overall, 150 individuals participated in a "fixed-limit" game variant, and 150 individuals participated in a "no-limit" game variant. ANOVA results showed that experts did not outperform average players in terms of final cash balance. Rather, card distribution was the decisive factor for successful poker playing. However, expert players were better able to minimize losses when confronted with disadvantageous conditions (i.e., worse-than-average cards). No significant differences were observed between the game variants. Furthermore, supplementary analyses confirm differential game-related actions dependent on the card distribution, player status

  6. Fast implementations of 3D PET reconstruction using vector and parallel programming techniques

    International Nuclear Information System (INIS)

    Guerrero, T.M.; Cherry, S.R.; Dahlbom, M.; Ricci, A.R.; Hoffman, E.J.

    1993-01-01

    Computationally intensive techniques that offer potential clinical use have arisen in nuclear medicine. Examples include iterative reconstruction, 3D PET data acquisition and reconstruction, and 3D image volume manipulation including image registration. One obstacle in achieving clinical acceptance of these techniques is the computational time required. This study focuses on methods to reduce the computation time for 3D PET reconstruction through the use of fast computer hardware, vector and parallel programming techniques, and algorithm optimization. The strengths and weaknesses of i860 microprocessor based workstation accelerator boards are investigated in implementations of 3D PET reconstruction

  7. A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 1

    International Nuclear Information System (INIS)

    Ernenwein, R.; Rohmer, M.M.; Benard, M.

    1990-01-01

    We present a program system for ab initio molecular orbital calculations on vector and parallel computers. The present article is devoted to the computation of one- and two-electron integrals over contracted Gaussian basis sets involving s-, p-, d- and f-type functions. The McMurchie and Davidson (MMD) algorithm has been implemented and parallelized by distributing over a limited number of logical tasks the calculation of the 55 relevant classes of integrals. All sections of the MMD algorithm have been efficiently vectorized, leading to a scalar/vector ratio of 5.8. Different algorithms are proposed and compared for an optimal vectorization of the contraction of the 'intermediate integrals' generated by the MMD formalism. Advantage is taken of the dynamic storage allocation for tuning the length of the vector loops (i.e. the size of the vectorization buffer) as a function of (i) the total memory available for the job, (ii) the number of logical tasks defined by the user (≤13), and (iii) the storage requested by each specific class of integrals. Test calculations carried out on a CRAY-2 computer show that the average number of finite integrals computed over a (s, p, d, f) CGTO basis set is about 1180000 per second and per processor. The combination of vectorization and parallelism on this 4-processor machine reduces the CPU time by a factor larger than 20 with respect to the scalar and sequential performance. (orig.)

  8. A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 3

    International Nuclear Information System (INIS)

    Wiest, R.; Demuynck, J.; Benard, M.; Rohmer, M.M.; Ernenwein, R.

    1991-01-01

    This series of three papers presents a program system for ab initio molecular orbital calculations on vector and parallel computers. Part III is devoted to the four-index transformation on a molecular orbital basis of size NMO of the file of two-electorn integrals (pqparallelrs) generated by a contracted Gaussian set of size NATO (number of atomic orbitals). A fast Yoshimine algorithm first sorts the (pqparallelrs) integrals with respect to index pq only. This file of half-sorted integrals labelled by their rs-index can be processed without further modification to generate either the transformed integrals or the supermatrix elements. The large memory available on the CRAY-2 hase made possible to implement the transformation algorithm proposed by Bender in 1972, which requires a core-storage allocation varying as (NATO) 3 . Two versions of Bender's algorithm are included in the present program. The first version is an in-core version, where the complete file of accumulated contributions to transformed integrals in stored and updated in central memory. This version has been parallelized by distributing over a limited number of logical tasks the NATO steps corresponding to the scanning of the most external loop. The second version is an out-of-core version, in which twin files are alternatively used as input and output for the accumulated contributions to transformed integrals. This version is not parallel. The choice of one or another version and (for version 1) the determination of the number of tasks depends upon the balance between the available and the requested amounts of storage. The storage management and the choice of the proper version are carried out automatically using dynamic storage allocation. Both versions are vectorized and take advantage of the molecular symmetry. (orig.)

  9. Searching for globally optimal functional forms for interatomic potentials using genetic programming with parallel tempering.

    Science.gov (United States)

    Slepoy, A; Peters, M D; Thompson, A P

    2007-11-30

    Molecular dynamics and other molecular simulation methods rely on a potential energy function, based only on the relative coordinates of the atomic nuclei. Such a function, called a force field, approximately represents the electronic structure interactions of a condensed matter system. Developing such approximate functions and fitting their parameters remains an arduous, time-consuming process, relying on expert physical intuition. To address this problem, a functional programming methodology was developed that may enable automated discovery of entirely new force-field functional forms, while simultaneously fitting parameter values. The method uses a combination of genetic programming, Metropolis Monte Carlo importance sampling and parallel tempering, to efficiently search a large space of candidate functional forms and parameters. The methodology was tested using a nontrivial problem with a well-defined globally optimal solution: a small set of atomic configurations was generated and the energy of each configuration was calculated using the Lennard-Jones pair potential. Starting with a population of random functions, our fully automated, massively parallel implementation of the method reproducibly discovered the original Lennard-Jones pair potential by searching for several hours on 100 processors, sampling only a minuscule portion of the total search space. This result indicates that, with further improvement, the method may be suitable for unsupervised development of more accurate force fields with completely new functional forms. Copyright (c) 2007 Wiley Periodicals, Inc.

  10. "To Bluff like a Man or Fold like a Girl?" - Gender Biased Deceptive Behavior in Online Poker.

    Science.gov (United States)

    Palomäki, Jussi; Yan, Jeff; Modic, David; Laakasuo, Michael

    2016-01-01

    Evolutionary psychology suggests that men are more likely than women to deceive to bolster their status and influence. Also gender perception influences deceptive behavior, which is linked to pervasive gender stereotypes: women are typically viewed as weaker and more gullible than men. We assessed bluffing in an online experiment (N = 502), where participants made decisions to bluff or not in simulated poker tasks against opponents represented by avatars. Participants bluffed on average 6% more frequently at poker tables with female-only avatars than at tables with male-only or gender mixed avatars-a highly significant effect in games involving repeated decisions. Nonetheless, participants did not believe the avatar genders affected their decisions. Males bluffed 13% more frequently than females. Unlike most economic games employed exclusively in research contexts, online poker is played for money by tens of millions of people worldwide. Thus, gender effects in bluffing have significant monetary consequences for poker players.

  11. "To Bluff like a Man or Fold like a Girl?" - Gender Biased Deceptive Behavior in Online Poker.

    Directory of Open Access Journals (Sweden)

    Jussi Palomäki

    Full Text Available Evolutionary psychology suggests that men are more likely than women to deceive to bolster their status and influence. Also gender perception influences deceptive behavior, which is linked to pervasive gender stereotypes: women are typically viewed as weaker and more gullible than men. We assessed bluffing in an online experiment (N = 502, where participants made decisions to bluff or not in simulated poker tasks against opponents represented by avatars. Participants bluffed on average 6% more frequently at poker tables with female-only avatars than at tables with male-only or gender mixed avatars-a highly significant effect in games involving repeated decisions. Nonetheless, participants did not believe the avatar genders affected their decisions. Males bluffed 13% more frequently than females. Unlike most economic games employed exclusively in research contexts, online poker is played for money by tens of millions of people worldwide. Thus, gender effects in bluffing have significant monetary consequences for poker players.

  12. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    Science.gov (United States)

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today’s supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  13. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    Science.gov (United States)

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputer often consists of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, the OpenMP programs cannot be scaled for more than a single SMP node. However, programs written in MPI can have more than single SMP nodes. But such a programming paradigm has an overhead of internode communication. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that the communication overhead incurs significantly even in OpenMP loop execution and increases with the number of cores participating. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results are astonishing and interesting to a large variety of input data files. We have developed our own load balancing and cache optimization technique for message passing model. Our experimental results show that our own developed techniques give optimum performance of our parallel algorithm for various sizes of input parameter, such as sequence size and tile size, on a wide variety of multicore architectures.

  14. PAREMD: A parallel program for the evaluation of momentum space properties of atoms and molecules

    Science.gov (United States)

    Meena, Deep Raj; Gadre, Shridhar R.; Balanarayan, P.

    2018-03-01

    The present work describes a code for evaluating the electron momentum density (EMD), its moments and the associated Shannon information entropy for a multi-electron molecular system. The code works specifically for electronic wave functions obtained from traditional electronic structure packages such as GAMESS and GAUSSIAN. For the momentum space orbitals, the general expression for Gaussian basis sets in position space is analytically Fourier transformed to momentum space Gaussian basis functions. The molecular orbital coefficients of the wave function are taken as an input from the output file of the electronic structure calculation. The analytic expressions of EMD are evaluated over a fine grid and the accuracy of the code is verified by a normalization check and a numerical kinetic energy evaluation which is compared with the analytic kinetic energy given by the electronic structure package. Apart from electron momentum density, electron density in position space has also been integrated into this package. The program is written in C++ and is executed through a Shell script. It is also tuned for multicore machines with shared memory through OpenMP. The program has been tested for a variety of molecules and correlated methods such as CISD, Møller-Plesset second order (MP2) theory and density functional methods. For correlated methods, the PAREMD program uses natural spin orbitals as an input. The program has been benchmarked for a variety of Gaussian basis sets for different molecules showing a linear speedup on a parallel architecture.

  15. What is "the patient perspective" in patient engagement programs? Implicit logics and parallels to feminist theories.

    Science.gov (United States)

    Rowland, Paula; McMillan, Sarah; McGillicuddy, Patti; Richards, Joy

    2017-01-01

    Public and patient involvement (PPI) in health care may refer to many different processes, ranging from participating in decision-making about one's own care to participating in health services research, health policy development, or organizational reforms. Across these many forms of public and patient involvement, the conceptual and theoretical underpinnings remain poorly articulated. Instead, most public and patient involvement programs rely on policy initiatives as their conceptual frameworks. This lack of conceptual clarity participates in dilemmas of program design, implementation, and evaluation. This study contributes to the development of theoretical understandings of public and patient involvement. In particular, we focus on the deployment of patient engagement programs within health service organizations. To develop a deeper understanding of the conceptual underpinnings of these programs, we examined the concept of "the patient perspective" as used by patient engagement practitioners and participants. Specifically, we focused on the way this phrase was used in the singular: "the" patient perspective or "the" patient voice. From qualitative analysis of interviews with 20 patient advisers and 6 staff members within a large urban health network in Canada, we argue that "the patient perspective" is referred to as a particular kind of situated knowledge, specifically an embodied knowledge of vulnerability. We draw parallels between this logic of patient perspective and the logic of early feminist theory, including the concepts of standpoint theory and strong objectivity. We suggest that champions of patient engagement may learn much from the way feminist theorists have constructed their arguments and addressed critique.

  16. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    Directory of Open Access Journals (Sweden)

    Matthew O'keefe

    1995-01-01

    Full Text Available Massively parallel processors (MPPs hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  17. Drainage network extraction from a high-resolution DEM using parallel programming in the .NET Framework

    Science.gov (United States)

    Du, Chao; Ye, Aizhong; Gan, Yanjun; You, Jinjun; Duan, Qinyun; Ma, Feng; Hou, Jingwen

    2017-12-01

    High-resolution Digital Elevation Models (DEMs) can be used to extract high-accuracy prerequisite drainage networks. A higher resolution represents a larger number of grids. With an increase in the number of grids, the flow direction determination will require substantial computer resources and computing time. Parallel computing is a feasible method with which to resolve this problem. In this paper, we proposed a parallel programming method within the .NET Framework with a C# Compiler in a Windows environment. The basin is divided into sub-basins, and subsequently the different sub-basins operate on multiple threads concurrently to calculate flow directions. The method was applied to calculate the flow direction of the Yellow River basin from 3 arc-second resolution SRTM DEM. Drainage networks were extracted and compared with HydroSHEDS river network to assess their accuracy. The results demonstrate that this method can calculate the flow direction from high-resolution DEMs efficiently and extract high-precision continuous drainage networks.

  18. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    Science.gov (United States)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

  19. Mobile and replicated alignment of arrays in data-parallel programs

    Science.gov (United States)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

    1993-01-01

    When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.

  20. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    Science.gov (United States)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.

  1. Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification

    Directory of Open Access Journals (Sweden)

    Lixiong Xu

    2017-01-01

    Full Text Available As one of the most effective function mining algorithms, Gene Expression Programming (GEP algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Based on the self-evolution, GEP is able to mine an optimal function for dealing with further complicated tasks. However, in big data researches, GEP encounters low efficiency issue due to its long time mining processes. To improve the efficiency of GEP in big data researches especially for processing large-scale classification tasks, this paper presents a parallelized GEP algorithm using MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.

  2. SAM II aerosol profile measurements, Poker Flat, Alaska; July 16-19, 1979

    Science.gov (United States)

    Mccormick, M. P.; Chu, W. P.; Mcmaster, L. R.; Grams, G. W.; Herman, B. M.; Pepin, T. J.; Russell, P. B.; Swissler, T. J.

    1981-01-01

    SAM II satellite measurements during the July 1979 Poker Flat mission, yielded an aerosol extinction coefficient of 0.0004/km at 1.0 micron wavelength, in the region of the stratospheric aerosol mixing ratio peak (12-16 km). The stratospheric aerosol optical depth for these data, calculated from the tropopause through 30 km, is approximately 0.001. These results are consistent with the average 1979 summertime values found throughout the Arctic.

  3. Tracking online poker problem gamblers with player account-based gambling data only.

    Science.gov (United States)

    Luquiens, Amandine; Tanguy, Marie-Laure; Benyamina, Amine; Lagadec, Marthylle; Aubin, Henri-Jean; Reynaud, Michel

    2016-12-01

    The aim was to develop and validate an instrument to track online problem poker gamblers with player account-based gambling data (PABGD). We emailed an invitation to all active poker gamblers on the online gambling service provider Winamax. The 14,261 participants completed the Problem Gambling Severity Index (PGSI). PGSI served as a gold standard to track problem gamblers (i.e., PGSI ≥ 5). We used a stepwise logistic regression to build a predictive model of problem gambling with PABGD, and validated it. Of the sample 18% was composed of online poker problem gamblers. The risk factors of problem gambling included in the predictive model were being male, compulsive, younger than 28 years, making a total deposit > 0 euros, having a mean loss per gambling session > 1.7 euros, losing a total of > 45 euros in the last 30 days, having a total stake > 298 euros, having > 60 gambling sessions in the last 30 days, and multi-tabling. The tracking instrument had a sensitivity of 80%, and a specificity of 50%. The quality of the instrument was good. This study illustrates the feasibility of a method to develop and validate instruments to track online problem gamblers with PABGD only. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  4. Trait Mindfulness, Problem-Gambling Severity, Altered State of Awareness and Urge to Gamble in Poker-Machine Gamblers.

    Science.gov (United States)

    McKeith, Charles F A; Rock, Adam J; Clark, Gavin I

    2017-06-01

    In Australia, poker-machine gamblers represent a disproportionate number of problem gamblers. To cultivate a greater understanding of the psychological mechanisms involved in poker-machine gambling, a repeated measures cue-reactivity protocol was administered. A community sample of 38 poker-machine gamblers was assessed for problem-gambling severity and trait mindfulness. Participants were also assessed regarding altered state of awareness (ASA) and urge to gamble at baseline, following a neutral cue, and following a gambling cue. Results indicated that: (a) urge to gamble significantly increased from neutral cue to gambling cue, while controlling for baseline urge; (b) cue-reactive ASA did not significantly mediate the relationship between problem-gambling severity and cue-reactive urge (from neutral cue to gambling cue); (c) trait mindfulness was significantly negatively associated with both problem-gambling severity and cue-reactive urge (i.e., from neutral cue to gambling cue, while controlling for baseline urge); and (d) trait mindfulness did not significantly moderate the effect of problem-gambling severity on cue-reactive urge (from neutral cue to gambling cue). This is the first study to demonstrate a negative association between trait mindfulness and cue-reactive urge to gamble in a population of poker-machine gamblers. Thus, this association merits further evaluation both in relation to poker-machine gambling and other gambling modalities.

  5. Increased ventral-striatal activity during monetary decision making is a marker of problem poker gambling severity.

    Science.gov (United States)

    Brevers, Damien; Noël, Xavier; He, Qinghua; Melrose, James A; Bechara, Antoine

    2016-05-01

    The aim of this study was to examine the impact of different neural systems on monetary decision making in frequent poker gamblers, who vary in their degree of problem gambling. Fifteen frequent poker players, ranging from non-problem to high-problem gambling, and 15 non-gambler controls were scanned using functional magnetic resonance imaging (fMRI) while performing the Iowa Gambling Task (IGT). During IGT deck selection, between-group fMRI analyses showed that frequent poker gamblers exhibited higher ventral-striatal but lower dorsolateral prefrontal and orbitofrontal activations as compared with controls. Moreover, using functional connectivity analyses, we observed higher ventral-striatal connectivity in poker players, and in regions involved in attentional/motor control (posterior cingulate), visual (occipital gyrus) and auditory (temporal gyrus) processing. In poker gamblers, scores of problem gambling severity were positively associated with ventral-striatal activations and with the connectivity between the ventral-striatum seed and the occipital fusiform gyrus and the middle temporal gyrus. Present results are consistent with findings from recent brain imaging studies showing that gambling disorder is associated with heightened motivational-reward processes during monetary decision making, which may hamper one's ability to moderate his level of monetary risk taking. © 2015 Society for the Study of Addiction.

  6. SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

    Science.gov (United States)

    O'Connor, B P

    2000-08-01

    Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.

  7. Rocket measurement of auroral partial parallel distribution functions

    Science.gov (United States)

    Lin, C.-A.

    1980-01-01

    The auroral partial parallel distribution functions are obtained by using the observed energy spectra of electrons. The experiment package was launched by a Nike-Tomahawk rocket from Poker Flat, Alaska over a bright auroral band and covered an altitude range of up to 180 km. Calculated partial distribution functions are presented with emphasis on their slopes. The implications of the slopes are discussed. It should be pointed out that the slope of the partial parallel distribution function obtained from one energy spectra will be changed by superposing another energy spectra on it.

  8. Efficient storage, retrieval and analysis of poker hands: An adaptive data framework

    Directory of Open Access Journals (Sweden)

    Gorawski Marcin

    2017-12-01

    Full Text Available In online gambling, poker hands are one of the most popular and fundamental units of the game state and can be considered objects comprising all the events that pertain to the single hand played. In a situation where tens of millions of poker hands are produced daily and need to be stored and analysed quickly, the use of relational databases no longer provides high scalability and performance stability. The purpose of this paper is to present an efficient way of storing and retrieving poker hands in a big data environment. We propose a new, read-optimised storage model that offers significant data access improvements over traditional database systems as well as the existing Hadoop file formats such as ORC, RCFile or SequenceFile. Through index-oriented partition elimination, our file format allows reducing the number of file splits that needs to be accessed, and improves query response time up to three orders of magnitude in comparison with other approaches. In addition, our file format supports a range of new indexing structures to facilitate fast row retrieval at a split level. Both index types operate independently of the Hive execution context and allow other big data computational frameworks such as MapReduce or Spark to benefit from the optimized data access path to the hand information. Moreover, we present a detailed analysis of our storage model and its supporting index structures, and how they are organised in the overall data framework. We also describe in detail how predicate based expression trees are used to build effective file-level execution plans. Our experimental tests conducted on a production cluster, holding nearly 40 billion hands which span over 4000 partitions, show that multi-way partition pruning outperforms other existing file formats, resulting in faster query execution times and better cluster utilisation.

  9. From functional programming to multicore parallelism: A case study based on Presburger Arithmetic

    DEFF Research Database (Denmark)

    Dung, Phan Anh; Hansen, Michael Reichhardt

    2011-01-01

    , we are interested in using PA in connection with the Duration Calculus Model Checker (DCMC) [5]. There are effective decision procedures for PA including Cooper’s algorithm and the Omega Test; however, their complexity is extremely high with doubly exponential lower bound and triply exponential upper...... bound [7]. We investigate these decision procedures in the context of multicore parallelism with the hope of exploiting multicore powers. Unfortunately, we are not aware of any prior parallelism research related to decision procedures for PA. The closest work is the preliminary results on parallelism...

  10. Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

    International Nuclear Information System (INIS)

    Ohshima, Hiroyuki

    2001-10-01

    A whole core thermal-hydraulic analysis program ACT is being developed for the purpose of evaluating detailed in-core thermal hydraulic phenomena of fast reactors including the effect of the flow between wrapper-tube walls (inter-wrapper flow) under various reactor operation conditions. As appropriate boundary conditions in addition to a detailed modeling of the core are essential for accurate simulations of in-core thermal hydraulics, ACT consists of not only fuel assembly and inter-wrapper flow analysis modules but also a heat transport system analysis module that gives response of the plant dynamics to the core model. This report describes incorporation of a simplified model to the fuel assembly analysis module and program parallelization by a message passing method toward large-scale simulations. ACT has a fuel assembly analysis module which can simulate a whole fuel pin bundle in each fuel assembly of the core and, however, it may take much CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save the simulation time and resources. This simplified model is applied to several parts of fuel assemblies in a core where the detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed and the optimum parallelization has been done including the improvement of the numerical simulation algorithm of ACT. Message Passing Interface (MPI) is applied to data communication between processes and synchronization in parallel calculations. Parallelized ACT was verified through a comparison simulation with the original one. In addition to the above works, input manuals of the core analysis module and the heat transport system analysis module have been prepared. (author)

  11. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Science.gov (United States)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  12. Concurrent Collections (CnC): A new approach to parallel programming

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    A common approach in designing parallel languages is to provide some high level handles to manipulate the use of the parallel platform. This exposes some aspects of the target platform, for example, shared vs. distributed memory. It may expose some but not all types of parallelism, for example, data parallelism but not task parallelism. This approach must find a balance between the desire to provide a simple view for the domain expert and provide sufficient power for tuning. This is hard for any given architecture and harder if the language is to apply to a range of architectures. Either simplicity or power is lost. Instead of viewing the language design problem as one of providing the programmer with high level handles, we view the problem as one of designing an interface. On one side of this interface is the programmer (domain expert) who knows the application but needs no knowledge of any aspects of the platform. On the other side of the interface is the performance expert (programmer o...

  13. Parallel Object Oriented MD Simulation Program for Long Time Simulations of Metallic Glasses and Undercooled Liquids

    Science.gov (United States)

    Böddeker, B.; Teichler, H.

    The MD simulation program TABB is motivated by the need of long time simulations for the investigation of slow processes near the glass transition of glass forming alloys. TABB is written in C++ with a high degree of flexibility: TABB allows the use of any short ranged pair potentials or EAM potentials, by generating and using a spline representation of all functions and their derivatives. TABB supports several numerical integration algorithms like the Runge-Kotta or the modified Gear-predictor-corrector algorithm of order five. The boundary conditions can be chosen to resemble the geometry of bulk materials or films. The simulation box length or the pressure can be fixed for each dimension separately. TABB may be used in isokinetic, isoenergeric or canonic (with random forces) mode. TABB contains a simple instruction interpreter to easily control the parameters and options during the simulation. The same source code can be compiled either for workstations or for parallel computers. The main optimization goal of TABB is to allow long time simulations of medium or small sized systems. To make this possible, much attention is spent on the optimized communication between the nodes. TABB uses a domain decomposition procedure. To use many nodes with a small system, the domain size has to be small compared to the range of particle interactions. In the limit of many nodes for only few atoms, the bottle neck of communication is the latency time. TABB minimizes the number of pairs of domains containing atoms that interact between these domains. This procedure minimizes the need of communication calls between pairs of nodes. TABB decides automatically, to how many, and to which directions the decomposition shall be applied. E.g., in the case of one dimensional domain decomposition, the simulation box is only split into "slabs" along a selected direction. The three dimensional domain decomposition is best with respect to the number of interacting domains only for simulations

  14. Gambling Motives: Do They Explain Cognitive Distortions in Male Poker Gamblers?

    Science.gov (United States)

    Mathieu, Sasha; Barrault, Servane; Brunault, Paul; Varescon, Isabelle

    2018-03-01

    Gambling behavior is partly the result of varied motivations leading individuals to participate in gambling activities. Specific motivational profiles are found in gamblers, and gambling motives are closely linked to the development of cognitive distortions. This cross-sectional study aimed to predict cognitive distortions from gambling motives in poker players. The population was recruited in online gambling forums. Participants reported gambling at least once a week. Data included sociodemographic characteristics, the South Oaks Gambling Screen, the Gambling Motives Questionnaire-Financial and the Gambling-Related Cognition Scale. This study was conducted on 259 male poker gamblers (aged 18-69 years, 14.3% probable pathological gamblers). Univariate analyses showed that cognitive distortions were independently predicted by overall gambling motives (34.8%) and problem gambling (22.4%) (p gambling problems, showing a close inter-relationship between gambling motives, cognitive distortions and the severity of gambling. These data are consistent with the following theoretical process model: gambling motives lead individuals to practice and repeat the gambling experience, which may lead them to develop cognitive distortions, which in turn favor problem gambling. This study opens up new research perspectives to understand better the mechanisms underlying gambling practice and has clinical implications in terms of prevention and treatment. For example, a coupled motivational and cognitive intervention focused on gambling motives/cognitive distortions could be beneficial for individuals with gambling problems.

  15. Investigating the illusion of control in mildly depressed and nondepressed individuals during video-poker play.

    Science.gov (United States)

    Dannewitz, Holly; Weatherly, Jeffrey N

    2007-05-01

    Cognitive fallacies, such as the illusion of control, and psychological disorders, such as depression, may perpetuate gambling and thus contribute to problem gambling (e.g., R. Ladouceur, C. Sylvan, C. Boutin, & C. Doucet, 2002). Gender differences may exist across these variables (e.g., N. M. Petry, 2005). The authors investigated these possibilities by recruiting mildly depressed and nondepressed individuals to play jacks or better, 5-card draw, video poker. Across three poker sessions, participants were given (a) no choice of which cards to play, (b) information on the best cards to play but control over which cards were played, or (c) no information and complete control over which cards were played. The total amount of money gambled increased as control over the game decreased, but this result correlated with an increase in the rate of play. Depressed and nondepressed participants did not differ in how they gambled, but men gambled significantly more and sometimes made more mistakes during play than did women. These results question the role of the illusion of control and depression in perpetuating gambling. They also suggest that providing players information about which cards to play may indirectly promote gambling and provide insight as to why men are more prone to suffer from gambling problems than are women.

  16. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals.

    Science.gov (United States)

    Brown, Noam; Sandholm, Tuomas

    2018-01-26

    No-limit Texas hold'em is the most popular form of poker. Despite artificial intelligence (AI) successes in perfect-information games, the private information and massive game tree have made no-limit poker difficult to tackle. We present Libratus, an AI that, in a 120,000-hand competition, defeated four top human specialist professionals in heads-up no-limit Texas hold'em, the leading benchmark and long-standing challenge problem in imperfect-information game solving. Our game-theoretic approach features application-independent techniques: an algorithm for computing a blueprint for the overall strategy, an algorithm that fleshes out the details of the strategy for subgames that are reached during play, and a self-improver algorithm that fixes potential weaknesses that opponents have identified in the blueprint strategy. Copyright © 2018, The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.

  17. Development of parallel 3D discrete ordinates transport program on JASMIN framework

    International Nuclear Information System (INIS)

    Cheng, T.; Wei, J.; Shen, H.; Zhong, B.; Deng, L.

    2015-01-01

    A parallel 3D discrete ordinates radiation transport code JSNT-S is developed, aiming at simulating real-world radiation shielding and reactor physics applications in a reasonable time. Through the patch-based domain partition algorithm, the memory requirement is shared among processors and a space-angle parallel sweeping algorithm is developed based on data-driven algorithm. Acceleration methods such as partial current rebalance are implemented. The correctness is proved through the VENUS-3 and other benchmark models. In the radiation shielding calculation of the Qinshan-II reactor pressure vessel model with 24.3 billion DoF, only 88 seconds is required and the overall parallel efficiency of 44% is achieved on 1536 CPU cores. (author)

  18. An object-oriented bulk synchronous parallel library for multicore programming

    NARCIS (Netherlands)

    Yzelman, A.N.; Bisseling, R.H.

    2012-01-01

    We show that the bulk synchronous parallel (BSP) model, originally designed for distributed-memory systems, is also applicable for shared-memory multicore systems and, furthermore, that BSP libraries are useful in scientific computing on these systems. A proof-of-concept MulticoreBSP library has

  19. CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

    OpenAIRE

    YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

    2009-01-01

    [ABSTRACT]Graphics processing units (GP Us) originally designed for computer video cards have emerged as the most powerful chip in a high-performance workstation. In the high performance computation capabilities, graphic processing units (GPU) lead to much more powerful performance than conventional CPUs by means of parallel processing. In 2007, the birth of Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...

  20. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    Science.gov (United States)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks number of processors

  1. Computer science. Heads-up limit hold'em poker is solved.

    Science.gov (United States)

    Bowling, Michael; Burch, Neil; Johanson, Michael; Tammelin, Oskari

    2015-01-09

    Poker is a family of games that exhibit imperfect information, where players do not have full knowledge of past events. Whereas many perfect-information games have been solved (e.g., Connect Four and checkers), no nontrivial imperfect-information game played competitively by humans has previously been solved. Here, we announce that heads-up limit Texas hold'em is now essentially weakly solved. Furthermore, this computation formally proves the common wisdom that the dealer in the game holds a substantial advantage. This result was enabled by a new algorithm, CFR(+), which is capable of solving extensive-form games orders of magnitude larger than previously possible. Copyright © 2015, American Association for the Advancement of Science.

  2. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  3. Neurite, a finite difference large scale parallel program for the simulation of electrical signal propagation in neurites under mechanical loading.

    Directory of Open Access Journals (Sweden)

    Julián A García-Grajales

    Full Text Available With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computational expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite--explicit and implicit--were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon

  4. Vectorization and parallelization of Monte-Carlo programs for calculation of radiation transport

    International Nuclear Information System (INIS)

    Seidel, R.

    1995-01-01

    The versatile MCNP-3B Monte-Carlo code written in FORTRAN77, for simulation of the radiation transport of neutral particles, has been subjected to vectorization and parallelization of essential parts, without touching its versatility. Vectorization is not dependent on a specific computer. Several sample tasks have been selected in order to test the vectorized MCNP-3B code in comparison to the scalar MNCP-3B code. The samples are a representative example of the 3-D calculations to be performed for simulation of radiation transport in neutron and reactor physics. (1) 4πneutron detector. (2) High-energy calorimeter. (3) PROTEUS benchmark (conversion rates and neutron multiplication factors for the HCLWR (High Conversion Light Water Reactor)). (orig./HP) [de

  5. Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

    Science.gov (United States)

    Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

    1999-01-01

    The ability to dynamically adapt an unstructured -rid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the view point of portability on various multiprocessor platforms We address this problem by developing PLUM, tin automatic anti architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with a goal to providing a global view of system loads across processors. Experiments on, an SP2 and an Origin2000 demonstrate the portability of our approach which achieves superb load balance at the cost of minimal extra overhead.

  6. Contributions for the optimization of the extensibility of parallel programing of turbulent plasmas

    International Nuclear Information System (INIS)

    Rozar, F.

    2015-01-01

    The work realized through this thesis focuses on the optimization of the Gysela code which simulates a plasma turbulence. Optimization of a scientific application concerns mainly one of the three following points: 1) the simulation of larger meshes, 2) the reduction of computing time and 3) the enhancement of the computation accuracy. The first part of this manuscript presents the contributions relative to the simulation of larger mesh. Alike many simulation codes, getting more realistic simulations is often analogous to rene the meshes. The finer the mesh the larger the memory consumption. Moreover, during these last few years, the supercomputers had trend to provide less and less memory per computer core. For these reasons, we have developed a library, the libMTM (Modeling and Tracing Memory), dedicated to study precisely the memory consumption of parallel softwares. The libMTM tools allowed us to reduce the memory consumption of Gysela and to study its scalability. As far as we know, there is no other tool which provides equivalent features which allow the memory scalability study. The second part of the manuscript presents the works relative to the optimization of the computation time and the improvement of accuracy of the gyro-average operator. This operator represents a corner stone of the gyrokinetic model which is used by the Gysela application. The improvement of accuracy emanates from a change in the computing method: a scheme based on a 2D Hermite interpolation substitutes the Pade approximation. Although the new version of the gyro-average operator is more accurate, it is also more expensive in computation time than the former one. In order to keep the simulation in reasonable time, different optimizations have been performed on the new computing method to get it competitive. Finally, we have developed a MPI parallelized version of the new gyro-average operator. The good scalability of this new gyro-average computer will allow, eventually, a reduction

  7. Parallel Programming Application to Matrix Algebra in the Spectral Method for Control Systems Analysis, Synthesis and Identification

    Directory of Open Access Journals (Sweden)

    V. Yu. Kleshnin

    2016-01-01

    Full Text Available The article describes the matrix algebra libraries based on the modern technologies of parallel programming for the Spectrum software, which can use a spectral method (in the spectral form of mathematical description to analyse, synthesise and identify deterministic and stochastic dynamical systems. The developed matrix algebra libraries use the following technologies for the GPUs: OmniThreadLibrary, OpenMP, Intel Threading Building Blocks, Intel Cilk Plus for CPUs nVidia CUDA, OpenCL, and Microsoft Accelerated Massive Parallelism.The developed libraries support matrices with real elements (single and double precision. The matrix dimensions are limited by 32-bit or 64-bit memory model and computer configuration. These libraries are general-purpose and can be used not only for the Spectrum software. They can also find application in the other projects where there is a need to perform operations with large matrices.The article provides a comparative analysis of the libraries developed for various matrix operations (addition, subtraction, scalar multiplication, multiplication, powers of matrices, tensor multiplication, transpose, inverse matrix, finding a solution of the system of linear equations through the numerical experiments using different CPU and GPU. The article contains sample programs and performance test results for matrix multiplication, which requires most of all computational resources in regard to the other operations.

  8. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  9. Speeding Up the String Comparison of the IDS Snort using Parallel Programming: A Systematic Literature Review on the Parallelized Aho-Corasick Algorithm

    Directory of Open Access Journals (Sweden)

    SILVA JUNIOR,J. B.

    2016-12-01

    Full Text Available The Intrusion Detection System (IDS needs to compare the contents of all packets arriving at the network interface with a set of signatures for indicating possible attacks, a task that consumes much CPU processing time. In order to alleviate this problem, some researchers have tried to parallelize the IDS's comparison engine, transferring execution from the CPU to GPU. This paper identifies and maps the parallelization features of the Aho-Corasick algorithm, which is used in Snort to compare patterns, in order to show this algorithm's implementation and execution issues, as well as optimization techniques for the Aho-Corasick machine. We have found 147 papers from important computer science publications databases, and have mapped them. We selected 22 and analyzed them in order to find our results. Our analysis of the papers showed, among other results, that parallelization of the AC algorithm is a new task and the authors have focused on the State Transition Table as the most common way to implement the algorithm on the GPU. Furthermore, we found that some techniques speed up the algorithm and reduce the required machine storage space are highly used, such as the algorithm running on the fastest memories and mechanisms for reducing the number of nodes and bit maping.

  10. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    The work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic or applied current problems of nuclear and particle physics. For the applications utilizing the GEANT code development or improvement works were done on parts simulating low energy physical phenomena like radiation, transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program of neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits in a single pass the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refers to simulations for nuclear medicine applications like, for instance, development of biological probes, evaluation and characterization of the gamma cameras (collimators, crystal thickness) as well as the method for dosimetric calculations. Particularly, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to simulation of the electron channelling in crystals and simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of environment

  11. Comparative Study of Dynamic Programming and Pontryagin’s Minimum Principle on Energy Management for a Parallel Hybrid Electric Vehicle

    Directory of Open Access Journals (Sweden)

    Huei Peng

    2013-04-01

    Full Text Available This paper compares two optimal energy management methods for parallel hybrid electric vehicles using an Automatic Manual Transmission (AMT. A control-oriented model of the powertrain and vehicle dynamics is built first. The energy management is formulated as a typical optimal control problem to trade off the fuel consumption and gear shifting frequency under admissible constraints. The Dynamic Programming (DP and Pontryagin’s Minimum Principle (PMP are applied to obtain the optimal solutions. Tuning with the appropriate co-states, the PMP solution is found to be very close to that from DP. The solution for the gear shifting in PMP has an algebraic expression associated with the vehicular velocity and can be implemented more efficiently in the control algorithm. The computation time of PMP is significantly less than DP.

  12. Getting To Exascale: Applying Novel Parallel Programming Models To Lab Applications For The Next Generation Of Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Dube, Evi [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shereda, Charles [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Nau, Lee [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Harris, Lance [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2010-09-27

    As supercomputing moves toward exascale, node architectures will change significantly. CPU core counts on nodes will increase by an order of magnitude or more. Heterogeneous architectures will become more commonplace, with GPUs or FPGAs providing additional computational power. Novel programming models may make better use of on-node parallelism in these new architectures than do current models. In this paper we examine several of these novel models – UPC, CUDA, and OpenCL –to determine their suitability to LLNL scientific application codes. Our study consisted of several phases: We conducted interviews with code teams and selected two codes to port; We learned how to program in the new models and ported the codes; We debugged and tuned the ported applications; We measured results, and documented our findings. We conclude that UPC is a challenge for porting code, Berkeley UPC is not very robust, and UPC is not suitable as a general alternative to OpenMP for a number of reasons. CUDA is well supported and robust but is a proprietary NVIDIA standard, while OpenCL is an open standard. Both are well suited to a specific set of application problems that can be run on GPUs, but some problems are not suited to GPUs. Further study of the landscape of novel models is recommended.

  13. Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

    Science.gov (United States)

    Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

    2010-12-01

    Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summaryProgram Title: PROFESS Catalogue identifier: AEBN_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 68 721 No. of bytes in distributed program, including test data, etc.: 1 708 547 Distribution format: tar.gz Programming language: Fortran 90 Computer

  14. Center for Programming Models for Scalable Parallel Computing - Towards Enhancing OpenMP for Manycore and Heterogeneous Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Barbara Chapman

    2012-02-01

    OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the inmature hardware support for an efficient implementation. Yet in the recent years, it has been graduately adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked deligiently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP are evolved in the direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.

  15. 77 FR 61642 - National Environmental Policy Act; Sounding Rockets Program; Poker Flat Research Range

    Science.gov (United States)

    2012-10-10

    ... the sun-earth connection. Related Environmental Documents In recent years, concerns raised by agencies... these lands. BLM and USFWS are currently considering if and how future authorizations for rocket landing...

  16. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded ,Vandermonde ,Toeplitz ,and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  17. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    Full Text Available The object of research is the tools to support the development of parallel programs in C/C ++. The methods and software which automates the process of designing parallel applications are proposed.

  18. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Full Text Available Abstract Background Use of missing genotype imputations and haplotype reconstructions are valuable in genome-wide association studies (GWASs. By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan and Han Chinese in Beijing, and Chinese in the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  19. Chasing losses in online poker and casino games: characteristics and game play of Internet gamblers at risk of disordered gambling.

    Science.gov (United States)

    Gainsbury, Sally M; Suhonen, Niko; Saastamoinen, Jani

    2014-07-30

    Disordered Internet gambling is a psychological disorder that represents an important public health issue due to the increase in highly available and conveniently accessible Internet gambling sites. Chasing losses is one of the few observable markers of at-risk and problem gambling that may be used to detect early signs of disordered Internet gambling. This study examined loss chasing behaviour in a sample of Internet casino and poker players and the socio-demographic variables, irrational beliefs, and gambling behaviours associated with chasing losses. An online survey was completed by 10,838 Internet gamblers (58% male) from 96 countries. The results showed that Internet casino players had a greater tendency to report chasing losses than poker players and gamblers who reported chasing losses were more likely to hold irrational beliefs about gambling and spend more time and money gambling than those who reported that they were unaffected by previous losses. Gamblers who played for excitement and to win money were more likely to report chasing losses. This study is one of the largest ever studies of Internet gamblers and the results are highly significant as they provide insight into the characteristics and behaviours of gamblers using this mode of access. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  20. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  1. Occurrence rate of ion upflow and downflow observed by the Poker Flat Incoherent Scatter Radar (PFISR)

    Science.gov (United States)

    Zou, S.; Lu, J.; Varney, R. H.

    2017-12-01

    This study aims to investigate the occurrence rate of ion upflow and downflow events in the auroral ionosphere, using a full 3-year (2011-2013) dataset collected by the Poker Flat Incoherent Scatter Radar (PFISR) at 65.5° magnetic latitude. Ion upflow and downflow events are defined if there are three consecutive data points larger/smaller than 100/-100 m/s in the ion field-aligned velocity altitude profile. Their occurrence rates have been evaluated as a function of magnetic local time (MLT), season, geomagnetic activity, solar wind and interplanetary magnetic field (IMF). We found that the ion upflows are twice more likely to occur on the nightside than the dayside, and have slightly higher occurrence rate near Fall equinox. In contrast, the ion downflow events are more likely to occur in the afternoon sector but also during Fall equinox. In addition, the occurrence rate of ion upflows on the nightside increases when the aurora electrojet index (AE) and planetary K index (Kp) increase, while the downflows measured on the dayside clearly increase as the AE and Kp increase. In general, the occurrence rate of ion upflows increases with enhanced solar wind and IMF drivers. This correlation is particularly strong between the upflows on the nightside and the solar wind dynamic pressure and IMF Bz. The lack of correlation of upflows on the dayside with these parameters is due to the location of PFISR, which is usually equatorward of the dayside auroral zone and within the nightside auroral zone under disturbed conditions. The occurrence rate of downflow at all MLTs does not show strong dependence on the solar wind and IMF conditions. However, it occurs much more frequently on the dayside when the IMF By is strongly positive, i.e., >10 nT and the IMF Bz is strongly negative, i.e., < -10 nT. We suggest that the increased occurrence rate of downflows on the dayside is associated with dayside storm-enhanced density and the plume.

  2. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  3. Deceit and facial expression in children: the enabling role of the poker face child and the dependent personality of the detector

    Directory of Open Access Journals (Sweden)

    Marien eGadea

    2015-07-01

    Full Text Available This study presents the relation between the facial expression of a group of children when they told a lie and the accuracy in detecting the lie by a sample of adults. To evaluate the intensity and type of emotional content of the children’s faces, we applied an automated method capable of analysing the facial information from the video recordings (FaceReader 5.0 software. The program classified videos as showing a neutral facial expression or an emotional one. There was a significant higher mean of hits for the emotional than for the neutral videos, and a significant negative correlation between the intensity of the neutral expression and the number of hits from the judges. The lies expressed with emotional facial expression were more easily recognized by adults than the lies expressed with a poker face; thus, the less expressive was the child the harder was to guess. The accuracy of the lie detectors was then correlated with their subclinical traits of personality disorders, to find that participants scoring higher in the dependent personality were significantly better lie detectors. A non-significant tendency for women to discriminate better was also found, whereas men tending to be more suspicious than women when judging the children’s veracity. This study is the first to automatically decoding the facial information of the liar child and relating the results with personality characteristics of the lie detectors in the context of deceptive behavior research. Implications for forensic psychology were suggested: to explore whether the induction of an emotion in a child during an interview could be useful to evaluate the testimony during legal trials.

  4. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  5. Programs Lucky and LuckyC - 3D parallel transport codes for the multi-group transport equation solution for XYZ geometry by Pm Sn method

    International Nuclear Information System (INIS)

    Moriakov, A.; Vasyukhno, V.; Netecha, M.; Khacheresov, G.

    2003-01-01

    Powerful supercomputers are available today. MBC-1000M is one of Russian supercomputers that may be used by distant way access. Programs LUCKY and LUCKY C were created to work for multi-processors systems. These programs have algorithms created especially for these computers and used MPI (message passing interface) service for exchanges between processors. LUCKY may resolved shielding tasks by multigroup discreet ordinate method. LUCKY C may resolve critical tasks by same method. Only XYZ orthogonal geometry is available. Under little space steps to approximate discreet operator this geometry may be used as universal one to describe complex geometrical structures. Cross section libraries are used up to P8 approximation by Legendre polynomials for nuclear data in GIT format. Programming language is Fortran-90. 'Vector' processors may be used that lets get a time profit up to 30 times. But unfortunately MBC-1000M has not these processors. Nevertheless sufficient value for efficiency of parallel calculations was obtained under 'space' (LUCKY) and 'space and energy' (LUCKY C ) paralleling. AUTOCAD program is used to control geometry after a treatment of input data. Programs have powerful geometry module, it is a beautiful tool to achieve any geometry. Output results may be processed by graphic programs on personal computer. (authors)

  6. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  7. Gunslingers, poker players, and chickens 1: Decision making under physical performance pressure in elite athletes.

    Science.gov (United States)

    Parkin, Beth L; Warriner, Katie; Walsh, Vincent

    2017-01-01

    The cognitive skills required during sport are highly demanding; accurate decisions based on the processing of dynamic environments are made in a fraction of a second (Walsh, 2014). Optimal decision-making abilities are crucial for success in sporting competition (Bar-Eli et al., 2011; Kaya, 2014). Moreover, for the elite athlete, decision making is required under conditions of intense mental and physical pressure (Anshel and Wells, 2000), yet much of the work in this area has largely ignored the highly stressful context in which athletes operate. A number of studies have shown that conditions of elevated pressure influence athletes' decision quality (Kinrade et al., 2015; Smith et al., 2016), response times (Hepler, 2015; Smith et al., 2016) and risk taking (Pighin et al., 2015). However, almost all of this work has been undertaken in nonelite athletes and participants who do not routinely operate under conditions of high stress. Thus, there is very little known about the influence of pressure on decision making in elite athletes. This study investigated the influence of physical performance pressure on decision making in a sample of world-class elite athletes. This allowed an examination of whether findings from the previous work in nonelite athletes extend to those who routinely operate under conditions of high stress. How this work could be applied to improve insight and understanding of decision making among sport professionals is examined. We sought to introduce a categorization of decision making useful to practitioners in sport: gunslingers, poker players, and chickens. Twenty-three elite athletes who compete and have frequent success at an international level (including six Olympic medal winners) performed tasks relating to three categories of decision making under conditions of low and high physical pressure. Decision making under risk was measured with performance on the Cambridge Gambling Task (CGT; Rogers et al., 1999), decision making under

  8. Parallel Simulation of Loosely Timed SystemC/TLM Programs: Challenges Raised by an Industrial Case Study

    Directory of Open Access Journals (Sweden)

    Denis Becker

    2016-05-01

    Full Text Available Transaction level models of systems-on-chip in SystemC are commonly used in the industry to provide an early simulation environment. The SystemC standard imposes coroutine semantics for the scheduling of simulated processes, to ensure determinism and reproducibility of simulations. However, because of this, sequential implementations have, for a long time, been the only option available, and still now the reference implementation is sequential. With the increasing size and complexity of models, and the multiplication of computation cores on recent machines, the parallelization of SystemC simulations is a major research concern. There have been several proposals for SystemC parallelization, but most of them are limited to cycle-accurate models. In this paper we focus on loosely timed models, which are commonly used in the industry. We present an industrial context and show that, unfortunately, most of the existing approaches for SystemC parallelization can fundamentally not apply in this context. We support this claim with a set of measurements performed on a platform used in production at STMicroelectronics. This paper surveys existing techniques, presents a visualization and profiling tool and identifies unsolved challenges in the parallelization of SystemC models at transaction level.

  9. Calculating ellipse area by the Monte Carlo method and analysing dice poker with Excel at high school

    Science.gov (United States)

    Benacka, Jan

    2016-08-01

    This paper reports on lessons in which 18-19 years old high school students modelled random processes with Excel. In the first lesson, 26 students formulated a hypothesis on the area of ellipse by using the analogy between the areas of circle, square and rectangle. They verified the hypothesis by the Monte Carlo method with a spreadsheet model developed in the lesson. In the second lesson, 27 students analysed the dice poker game. First, they calculated the probability of the hands by combinatorial formulae. Then, they verified the result with a spreadsheet model developed in the lesson. The students were given a questionnaire to find out if they found the lesson interesting and contributing to their mathematical and technological knowledge.

  10. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  11. Embedding Topical Elements of Parallel Programming, Computer Graphics, and Artificial Intelligence across the Undergraduate CS Required Courses

    Directory of Open Access Journals (Sweden)

    James Wolfer

    2015-02-01

    Full Text Available Traditionally, topics such as parallel computing, computer graphics, and artificial intelligence have been taught as stand-alone courses in the computing curriculum. Often these are elective courses, limiting the material to the subset of students choosing to take the course. Recently there has been movement to distribute topics across the curriculum in order to ensure that all graduates have been exposed to concepts such as parallel computing. Previous work described an attempt to systematically weave a tapestry of topics into the undergraduate computing curriculum. This paper reviews that work and expands it with representative examples of assignments, demonstrations, and results as well as describing how the tools and examples deployed for these classes have a residual effect on classes such as Comptuer Literacy.

  12. Contribution to the algorithmic and efficient programming of new parallel architectures including accelerators for neutron physics and shielding computations

    International Nuclear Information System (INIS)

    Dubois, J.

    2011-01-01

    In science, simulation is a key process for research or validation. Modern computer technology allows faster numerical experiments, which are cheaper than real models. In the field of neutron simulation, the calculation of eigenvalues is one of the key challenges. The complexity of these problems is such that a lot of computing power may be necessary. The work of this thesis is first the evaluation of new computing hardware such as graphics card or massively multi-core chips, and their application to eigenvalue problems for neutron simulation. Then, in order to address the massive parallelism of supercomputers national, we also study the use of asynchronous hybrid methods for solving eigenvalue problems with this very high level of parallelism. Then we experiment the work of this research on several national supercomputers such as the Titane hybrid machine of the Computing Center, Research and Technology (CCRT), the Curie machine of the Very Large Computing Centre (TGCC), currently being installed, and the Hopper machine at the Lawrence Berkeley National Laboratory (LBNL). We also do our experiments on local workstations to illustrate the interest of this research in an everyday use with local computing resources. (author) [fr

  13. The numerical parallel computing of photon transport

    International Nuclear Information System (INIS)

    Huang Qingnan; Liang Xiaoguang; Zhang Lifa

    1998-12-01

    The parallel computing of photon transport is investigated, the parallel algorithm and the parallelization of programs on parallel computers both with shared memory and with distributed memory are discussed. By analyzing the inherent law of the mathematics and physics model of photon transport according to the structure feature of parallel computers, using the strategy of 'to divide and conquer', adjusting the algorithm structure of the program, dissolving the data relationship, finding parallel liable ingredients and creating large grain parallel subtasks, the sequential computing of photon transport into is efficiently transformed into parallel and vector computing. The program was run on various HP parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP) and very good parallel speedup has been gotten

  14. Design and Programming for Cable-Driven Parallel Robots in the German Pavilion at the EXPO 2015

    Directory of Open Access Journals (Sweden)

    Philipp Tempel

    2015-08-01

    Full Text Available In the German Pavilion at the EXPO 2015, two large cable-driven parallel robots are flying over the heads of the visitors representing two bees flying over Germany and displaying everyday life in Germany. Each robot consists of a mobile platform and eight cables suspended by winches and follows a desired trajectory, which needs to be computed in advance taking technical limitations, safety considerations and visual aspects into account. In this paper, a path planning software is presented, which includes the design process from developing a robot design and workspace estimation via planning complex trajectories considering technical limitations through to exporting a complete show. For a test trajectory, simulation results are given, which display the relevant trajectories and cable force distributions.

  15. Parallel computing: numerics, applications, and trends

    National Research Council Canada - National Science Library

    Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter

    2009-01-01

    ... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...

  16. Final Report, Center for Programming Models for Scalable Parallel Computing: Co-Array Fortran, Grant Number DE-FC02-01ER25505

    Energy Technology Data Exchange (ETDEWEB)

    Robert W. Numrich

    2008-04-22

    The major accomplishment of this project is the production of CafLib, an 'object-oriented' parallel numerical library written in Co-Array Fortran. CafLib contains distributed objects such as block vectors and block matrices along with procedures, attached to each object, that perform basic linear algebra operations such as matrix multiplication, matrix transpose and LU decomposition. It also contains constructors and destructors for each object that hide the details of data decomposition from the programmer, and it contains collective operations that allow the programmer to calculate global reductions, such as global sums, global minima and global maxima, as well as vector and matrix norms of several kinds. CafLib is designed to be extensible in such a way that programmers can define distributed grid and field objects, based on vector and matrix objects from the library, for finite difference algorithms to solve partial differential equations. A very important extra benefit that resulted from the project is the inclusion of the co-array programming model in the next Fortran standard called Fortran 2008. It is the first parallel programming model ever included as a standard part of the language. Co-arrays will be a supported feature in all Fortran compilers, and the portability provided by standardization will encourage a large number of programmers to adopt it for new parallel application development. The combination of object-oriented programming in Fortran 2003 with co-arrays in Fortran 2008 provides a very powerful programming model for high-performance scientific computing. Additional benefits from the project, beyond the original goal, include a programto provide access to the co-array model through access to the Cray compiler as a resource for teaching and research. Several academics, for the first time, included the co-array model as a topic in their courses on parallel computing. A separate collaborative project with LANL and PNNL showed how to

  17. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product-unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  18. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

    Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force has resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.

  19. Comparisons of Energy Management Methods for a Parallel Plug-In Hybrid Electric Vehicle between the Convex Optimization and Dynamic Programming

    Directory of Open Access Journals (Sweden)

    Renxin Xiao

    2018-01-01

    Full Text Available This paper proposes a comparison study of energy management methods for a parallel plug-in hybrid electric vehicle (PHEV. Based on detailed analysis of the vehicle driveline, quadratic convex functions are presented to describe the nonlinear relationship between engine fuel-rate and battery charging power at different vehicle speed and driveline power demand. The engine-on power threshold is estimated by the simulated annealing (SA algorithm, and the battery power command is achieved by convex optimization with target of improving fuel economy, compared with the dynamic programming (DP based method and the charging depleting–charging sustaining (CD/CS method. In addition, the proposed control methods are discussed at different initial battery state of charge (SOC values to extend the application. Simulation results validate that the proposed strategy based on convex optimization can save the fuel consumption and reduce the computation burden obviously.

  20. Patterns for Parallel Software Design

    CERN Document Server

    Ortega-Arjona, Jorge Luis

    2010-01-01

    Essential reading to understand patterns for parallel programming Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managin

  1. Jordan's 2002 to 2012 Fertility Stall and Parallel USAID Investments in Family Planning: Lessons From an Assessment to Guide Future Programming.

    Science.gov (United States)

    Spindler, Esther; Bitar, Nisreen; Solo, Julie; Menstell, Elizabeth; Shattuck, Dominick

    2017-12-28

    Health practitioners, researchers, and donors are stumped about Jordan's stalled fertility rate, which has stagnated between 3.7 and 3.5 children per woman from 2002 to 2012, above the national replacement level of 2.1. This stall paralleled United States Agency for International Development (USAID) funding investments in family planning in Jordan, triggering an assessment of USAID family planning programming in Jordan. This article describes the methods, results, and implications of the programmatic assessment. Methods included an extensive desk review of USAID programs in Jordan and 69 interviews with reproductive health stakeholders. We explored reasons for fertility stagnation in Jordan's total fertility rate (TFR) and assessed the effects of USAID programming on family planning outcomes over the same time period. The assessment results suggest that the increased use of less effective methods, in particular withdrawal and condoms, are contributing to Jordan's TFR stall. Jordan's limited method mix, combined with strong sociocultural determinants around reproduction and fertility desires, have contributed to low contraceptive effectiveness in Jordan. Over the same time period, USAID contributions toward increasing family planning access and use, largely focused on service delivery programs, were extensive. Examples of effective initiatives, among others, include task shifting of IUD insertion services to midwives due to a shortage of female physicians. However, key challenges to improved use of family planning services include limited government investments in family planning programs, influential service provider behaviors and biases that limit informed counseling and choice, pervasive strong social norms of family size and fertility, and limited availability of different contraceptive methods. In contexts where sociocultural norms and a limited method mix are the dominant barriers toward improved family planning use, increased national government investments

  2. The parallel adult education system

    DEFF Research Database (Denmark)

    Wahlgren, Bjarne

    2015-01-01

    for competence development. The Danish university educational system includes two parallel programs: a traditional academic track (candidatus) and an alternative practice-based track (master). The practice-based program was established in 2001 and organized as part time. The total program takes half the time...

  3. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    Full Text Available James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship.   ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  4. Mental models of adherence: parallels in perceptions, values, and expectations in adherence to prescribed home exercise programs and other personal regimens.

    Science.gov (United States)

    Rizzo, Jon; Bell, Alexandra

    2018-05-09

    A mental model is the collection of an individual's perceptions, values, and expectations about a particular aspect of their life, which strongly influences behaviors. This study explored orthopedic outpatients mental models of adherence to prescribed home exercise programs and how they related to mental models of adherence to other types of personal regimens. The study followed an interpretive description qualitative design. Data were collected via two semi-structured interviews. Interview One focused on participants prior experiences adhering to personal regimens. Interview Two focused on experiences adhering to their current prescribed home exercise program. Data analysis followed a constant comparative method. Findings revealed similarity in perceptions, values, and expectations that informed individuals mental models of adherence to personal regimens and prescribed home exercise programs. Perceived realized results, expected results, perceived social supports, and value of convenience characterized mental models of adherence. Parallels between mental models of adherence for prescribed home exercise and other personal regimens suggest that patients adherence behavior to prescribed routines may be influenced by adherence experiences in other aspects of their lives. By gaining insight into patients adherence experiences, values, and expectations across life domains, clinicians may tailor supports that enhance home exercise adherence. Implications for Rehabilitation A mental model is the collection of an individual's perceptions, values, and expectations about a particular aspect of their life, which is based on prior experiences and strongly influences behaviors. This study demonstrated similarity in orthopedic outpatients mental models of adherence to prescribed home exercise programs and adherence to personal regimens in other aspects of their lives. Physical therapists should inquire about patients non-medical adherence experiences, as strategies patients

  5. Internal combustion engine control for series hybrid electric vehicles by parallel and distributed genetic programming/multiobjective genetic algorithms

    Science.gov (United States)

    Gladwin, D.; Stewart, P.; Stewart, J.

    2011-02-01

    This article addresses the problem of maintaining a stable rectified DC output from the three-phase AC generator in a series-hybrid vehicle powertrain. The series-hybrid prime power source generally comprises an internal combustion (IC) engine driving a three-phase permanent magnet generator whose output is rectified to DC. A recent development has been to control the engine/generator combination by an electronically actuated throttle. This system can be represented as a nonlinear system with significant time delay. Previously, voltage control of the generator output has been achieved by model predictive methods such as the Smith Predictor. These methods rely on the incorporation of an accurate system model and time delay into the control algorithm, with a consequent increase in computational complexity in the real-time controller, and as a necessity relies to some extent on the accuracy of the models. Two complementary performance objectives exist for the control system. Firstly, to maintain the IC engine at its optimal operating point, and secondly, to supply a stable DC supply to the traction drive inverters. Achievement of these goals minimises the transient energy storage requirements at the DC link, with a consequent reduction in both weight and cost. These objectives imply constant velocity operation of the IC engine under external load disturbances and changes in both operating conditions and vehicle speed set-points. In order to achieve these objectives, and reduce the complexity of implementation, in this article a controller is designed by the use of Genetic Programming methods in the Simulink modelling environment, with the aim of obtaining a relatively simple controller for the time-delay system which does not rely on the implementation of real time system models or time delay approximations in the controller. A methodology is presented to utilise the miriad of existing control blocks in the Simulink libraries to automatically evolve optimal control

  6. Compiler Technology for Parallel Scientific Computation

    Directory of Open Access Journals (Sweden)

    Can Özturan

    1994-01-01

    Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL. Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

  7. Both the caspase CSP-1 and a caspase-independent pathway promote programmed cell death in parallel to the canonical pathway for apoptosis in Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Daniel P Denning

    Full Text Available Caspases are cysteine proteases that can drive apoptosis in metazoans and have critical functions in the elimination of cells during development, the maintenance of tissue homeostasis, and responses to cellular damage. Although a growing body of research suggests that programmed cell death can occur in the absence of caspases, mammalian studies of caspase-independent apoptosis are confounded by the existence of at least seven caspase homologs that can function redundantly to promote cell death. Caspase-independent programmed cell death is also thought to occur in the invertebrate nematode Caenorhabditis elegans. The C. elegans genome contains four caspase genes (ced-3, csp-1, csp-2, and csp-3, of which only ced-3 has been demonstrated to promote apoptosis. Here, we show that CSP-1 is a pro-apoptotic caspase that promotes programmed cell death in a subset of cells fated to die during C. elegans embryogenesis. csp-1 is expressed robustly in late pachytene nuclei of the germline and is required maternally for its role in embryonic programmed cell deaths. Unlike CED-3, CSP-1 is not regulated by the APAF-1 homolog CED-4 or the BCL-2 homolog CED-9, revealing that csp-1 functions independently of the canonical genetic pathway for apoptosis. Previously we demonstrated that embryos lacking all four caspases can eliminate cells through an extrusion mechanism and that these cells are apoptotic. Extruded cells differ from cells that normally undergo programmed cell death not only by being extruded but also by not being engulfed by neighboring cells. In this study, we identify in csp-3; csp-1; csp-2 ced-3 quadruple mutants apoptotic cell corpses that fully resemble wild-type cell corpses: these caspase-deficient cell corpses are morphologically apoptotic, are not extruded, and are internalized by engulfing cells. We conclude that both caspase-dependent and caspase-independent pathways promote apoptotic programmed cell death and the phagocytosis of cell

  8. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN; Tejedor, E. [CERN; Guiraud, E. [CERN; Ganis, G. [CERN; Mato, P. [CERN; Moneta, L. [CERN; Valls Pla, X. [CERN; Canal, P. [Fermilab

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  9. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  10. An Introduction to Parallel Computation R

    Indian Academy of Sciences (India)

    How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ..... The most ambitious approach to parallel computing is to develop.

  11. Exploiting Symmetry on Parallel Architectures.

    Science.gov (United States)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry -exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and discovered it a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  12. EDDY RESOLVING NUTRIENT ECODYNAMICS IN THE GLOBAL PARALLEL OCEAN PROGRAM AND CONNECTIONS WITH TRACE GASES IN THE SULFUR, HALOGEN AND NMHC CYCLES

    Energy Technology Data Exchange (ETDEWEB)

    S. CHU; S. ELLIOTT

    2000-08-01

    Ecodynamics and the sea-air transfer of climate relevant trace gases are intimately coupled in the oceanic mixed layer. Ventilation of species such as dimethyl sulfide and methyl bromide constitutes a key linkage within the earth system. We are creating a research tool for the study of marine trace gas distributions by implementing coupled ecology-gas chemistry in the high resolution Parallel Ocean Program (POP). The fundamental circulation model is eddy resolving, with cell sizes averaging 0.15 degree (lat/long). Here we describe ecochemistry integration. Density dependent mortality and iron geochemistry have enhanced agreement with chlorophyll measurements. Indications are that dimethyl sulfide production rates must be adjusted for latitude dependence to match recent compilations. This may reflect the need for phytoplankton to conserve nitrogen by favoring sulfurous osmolytes. Global simulations are also available for carbonyl sulfide, the methyl halides and for nonmethane hydrocarbons. We discuss future applications including interaction with atmospheric chemistry models, high resolution biogeochemical snapshots and the study of open ocean fertilization.

  13. Gunslingers, poker players, and chickens 3: Decision making under mental performance pressure in junior elite athletes.

    Science.gov (United States)

    Parkin, Beth L; Walsh, Vincent

    2017-01-01

    Having investigated the decision making of world class elite and subelite athletes (see Parkin and Walsh, 2017; Parkin et al., 2017), here the abilities of those at the earliest stage of entry to elite sport are examined. Junior elite athletes have undergone initial national selection and are younger than athletes examined previously (mean age 13 years). Decision making under mental pressure is explored in this sample. During performance an athlete encounters a wide array of mental pressures; these include the psychological impact of errors, negative feedback, and requirements for sustained attention in a dynamic environment (Anshel and Wells, 2000; Mellalieu et al., 2009). Such factors increase the cognitive demands of the athletes, inducing distracting anxiety-related thoughts known as rumination (Beilock and Gray, 2007). Mental pressure has been shown to reduce performance of decision-making tasks where reward and loss contingencies are explicit, with a shift toward increased risk taking (Pabst et al., 2013; Starcke et al., 2011). Mental pressure has been shown to be detrimental to decision-making speed in comparison to physical stress, highlighting the importance of considering a range of different pressures encountered by athletes (Hepler, 2015). To investigate the influence of mental pressure on key indicators of decision making in junior elite athletes. This chapter concludes a wider project examining decision making across developmental stages in elite sport. The work further explores how psychological insights can be applied in an elite sporting environment and in particular tailored to the requirements of junior athletes. Seventeen junior elite athletes (10 males, mean age: 13.80 years) enrolled on a national youth athletic development program participated in the study. Performance across three categories of decision making was assessed under conditions of low and high mental pressure. Decision making under risk was measured via the Cambridge Gambling

  14. Pattern-Driven Automatic Parallelization

    Directory of Open Access Journals (Sweden)

    Christoph W. Kessler

    1996-01-01

    Full Text Available This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.

  15. Is Monte Carlo embarrassingly parallel?

    Energy Technology Data Exchange (ETDEWEB)

    Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)

    2012-07-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turn out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  16. Is Monte Carlo embarrassingly parallel?

    International Nuclear Information System (INIS)

    Hoogenboom, J. E.

    2012-01-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turn out to be the rendez-vous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  17. Implementation and performance of parallelized elegant

    International Nuclear Information System (INIS)

    Wang, Y.; Borland, M.

    2008-01-01

    The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.

  18. Implementations of BLAST for parallel computers.

    Science.gov (United States)

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers-the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.

  19. Shared Variable Oriented Parallel Precompiler for SPMD Model

    Institute of Scientific and Technical Information of China (English)

    1995-01-01

    For the moment,commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compliers,which are just traditional sequential FORTRAN or C compilers expanded with communication statements.Programmers suffer from writing parallel programs with communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for SPMD(Single Program Multiple Data) computation model and greatly ease the parallel programming with high communication efficiency.The core function of parallel C precompiler has been successfully verified on a transputer-based parallel computer.Its prominent performance shows that SVOPP is probably a break-through in parallel programming technique.

  20. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors

  1. Massively Parallel QCD

    International Nuclear Information System (INIS)

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-01-01

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results

  2. Streaming for Functional Data-Parallel Languages

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner

    In this thesis, we investigate streaming as a general solution to the space inefficiency commonly found in functional data-parallel programming languages. The data-parallel paradigm maps well to parallel SIMD-style hardware. However, the traditional fully materializing execution strategy...... by extending two existing data-parallel languages: NESL and Accelerate. In the extensions we map bulk operations to data-parallel streams that can evaluate fully sequential, fully parallel or anything in between. By a dataflow, piecewise parallel execution strategy, the runtime system can adjust to any target...... flattening necessitates all sub-computations to materialize at the same time. For example, naive n by n matrix multiplication requires n^3 space in NESL because the algorithm contains n^3 independent scalar multiplications. For large values of n, this is completely unacceptable. We address the problem...

  3. Algorithms for the Construction of Parallel Tests by Zero-One Programming. Project Psychometric Aspects of Item Banking No. 7. Research Report 86-7.

    Science.gov (United States)

    Boekkooi-Timminga, Ellen

    Nine methods for automated test construction are described. All are based on the concepts of information from item response theory. Two general kinds of methods for the construction of parallel tests are presented: (1) sequential test design; and (2) simultaneous test design. Sequential design implies that the tests are constructed one after the…

  4. 基于OpenMP的电磁场FDTD多核并行程序设计%Design of electromagnetic field FDTD multi-core parallel program based on OpenMP

    Institute of Scientific and Technical Information of China (English)

    吕忠亭; 张玉强; 崔巍

    2013-01-01

    探讨了基于OpenMP的电磁场FDTD多核并行程序设计的方法,以期实现该方法在更复杂的算法中应用具有更理想的性能提升。针对一个一维电磁场FDTD算法问题,对其计算方法与过程做了简单描述。在Fortran语言环境中,采用OpenMP+细粒度并行的方式实现了并行化,即只对循环部分进行并行计算,并将该并行方法在一个三维瞬态场电偶极子辐射FDTD程序中进行了验证。该并行算法取得了较其他并行FDTD算法更快的加速比和更高的效率。结果表明基于OpenMP的电磁场FDTD并行算法具有非常好的加速比和效率。%The method of the electromagnetic field FDTD multi-core parallel programm design based on OpenMP is dis-cussed,in order to implement ideal performance improvement of this method in the application of more sophisticated algorithms. Aiming at a problem existing in one-dimensional electromagnetic FDTD algorithm , its calculation method and process are described briefly. In Fortran language environment,the parallelism is achieved with OpenMP technology and fine-grained parallel way,that is,the parallel computation is performed only for the cycle part. The parallel method was verified in a three-dimensional transient electromagnetic field FDTD program for dipole radiation. The parallel algorithm has achieved faster speedup and higher efficiency than other parallel FDTD algoritms. The results indicate that the electromagnetic field FDTD parallel algorithm based on OpenMP has a good speedup and efficiency.

  5. Parallel sorting algorithms

    CERN Document Server

    Akl, Selim G

    1985-01-01

    Parallel Sorting Algorithms explains how to use parallel algorithms to sort a sequence of items on a variety of parallel computers. The book reviews the sorting problem, the parallel models of computation, parallel algorithms, and the lower bounds on the parallel sorting problems. The text also presents twenty different algorithms, such as linear arrays, mesh-connected computers, cube-connected computers. Another example where algorithm can be applied is on the shared-memory SIMD (single instruction stream multiple data stream) computers in which the whole sequence to be sorted can fit in the

  6. The language parallel Pascal and other aspects of the massively parallel processor

    Science.gov (United States)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  7. Parallel computing works!

    CERN Document Server

    Fox, Geoffrey C; Messina, Guiseppe C

    2014-01-01

    A clear illustration of how parallel computers can be successfully appliedto large-scale scientific computations. This book demonstrates how avariety of applications in physics, biology, mathematics and other scienceswere implemented on real parallel computers to produce new scientificresults. It investigates issues of fine-grained parallelism relevant forfuture supercomputers with particular emphasis on hypercube architecture. The authors describe how they used an experimental approach to configuredifferent massively parallel machines, design and implement basic systemsoftware, and develop

  8. Experience with a clustered parallel reduction machine

    NARCIS (Netherlands)

    Beemster, M.; Hartel, Pieter H.; Hertzberger, L.O.; Hofman, R.F.H.; Langendoen, K.G.; Li, L.L.; Milikowski, R.; Vree, W.G.; Barendregt, H.P.; Mulder, J.C.

    A clustered architecture has been designed to exploit divide and conquer parallelism in functional programs. The programming methodology developed for the machine is based on explicit annotations and program transformations. It has been successfully applied to a number of algorithms resulting in a

  9. Parallel Computing in SCALE

    International Nuclear Information System (INIS)

    DeHart, Mark D.; Williams, Mark L.; Bowman, Stephen M.

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  10. .NET 4.5 parallel extensions

    CERN Document Server

    Freeman, Bryan

    2013-01-01

    This book contains practical recipes on everything you will need to create task-based parallel programs using C#, .NET 4.5, and Visual Studio. The book is packed with illustrated code examples to create scalable programs.This book is intended to help experienced C# developers write applications that leverage the power of modern multicore processors. It provides the necessary knowledge for an experienced C# developer to work with .NET parallelism APIs. Previous experience of writing multithreaded applications is not necessary.

  11. Parallel Atomistic Simulations

    Energy Technology Data Exchange (ETDEWEB)

    HEFFELFINGER,GRANT S.

    2000-01-18

    Algorithms developed to enable the use of atomistic molecular simulation methods with parallel computers are reviewed. Methods appropriate for bonded as well as non-bonded (and charged) interactions are included. While strategies for obtaining parallel molecular simulations have been developed for the full variety of atomistic simulation methods, molecular dynamics and Monte Carlo have received the most attention. Three main types of parallel molecular dynamics simulations have been developed, the replicated data decomposition, the spatial decomposition, and the force decomposition. For Monte Carlo simulations, parallel algorithms have been developed which can be divided into two categories, those which require a modified Markov chain and those which do not. Parallel algorithms developed for other simulation methods such as Gibbs ensemble Monte Carlo, grand canonical molecular dynamics, and Monte Carlo methods for protein structure determination are also reviewed and issues such as how to measure parallel efficiency, especially in the case of parallel Monte Carlo algorithms with modified Markov chains are discussed.

  12. Computer-Aided Parallelizer and Optimizer

    Science.gov (United States)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  13. Stage-by-Stage and Parallel Flow Path Compressor Modeling for a Variable Cycle Engine, NASA Advanced Air Vehicles Program - Commercial Supersonic Technology Project - AeroServoElasticity

    Science.gov (United States)

    Kopasakis, George; Connolly, Joseph W.; Cheng, Larry

    2015-01-01

    This paper covers the development of stage-by-stage and parallel flow path compressor modeling approaches for a Variable Cycle Engine. The stage-by-stage compressor modeling approach is an extension of a technique for lumped volume dynamics and performance characteristic modeling. It was developed to improve the accuracy of axial compressor dynamics over lumped volume dynamics modeling. The stage-by-stage compressor model presented here is formulated into a parallel flow path model that includes both axial and rotational dynamics. This is done to enable the study of compressor and propulsion system dynamic performance under flow distortion conditions. The approaches utilized here are generic and should be applicable for the modeling of any axial flow compressor design accurate time domain simulations. The objective of this work is as follows. Given the parameters describing the conditions of atmospheric disturbances, and utilizing the derived formulations, directly compute the transfer function poles and zeros describing these disturbances for acoustic velocity, temperature, pressure, and density. Time domain simulations of representative atmospheric turbulence can then be developed by utilizing these computed transfer functions together with the disturbance frequencies of interest.

  14. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of...... in the optimal OhOf(psortN + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and psortN is the parallel I/O complexity of sorting N elements using P processors....

  15. Parallel MR imaging.

    Science.gov (United States)

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A; Seiberlich, Nicole

    2012-07-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the undersampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. Copyright © 2012 Wiley Periodicals, Inc.

  16. Parallel Algorithms and Patterns

    Energy Technology Data Exchange (ETDEWEB)

    Robey, Robert W. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-16

    This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.

  17. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  18. Parallel discrete event simulation

    NARCIS (Netherlands)

    Overeinder, B.J.; Hertzberger, L.O.; Sloot, P.M.A.; Withagen, W.J.

    1991-01-01

    In simulating applications for execution on specific computing systems, the simulation performance figures must be known in a short period of time. One basic approach to the problem of reducing the required simulation time is the exploitation of parallelism. However, in parallelizing the simulation

  19. Parallel reservoir simulator computations

    International Nuclear Information System (INIS)

    Hemanth-Kumar, K.; Young, L.C.

    1995-01-01

    The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and relative to scalar calculations it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively on the CRAY C-90

  20. Automatic Management of Parallel and Distributed System Resources

    Science.gov (United States)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  1. Totally parallel multilevel algorithms

    Science.gov (United States)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which are referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  2. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.

  3. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers

  4. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

    Parallel programming is applied to multiple processors connected in Ethernet. Data exchanges between tasks located in each processing element are realized by two ways. One is socket which is standard library on recent UNIX operating systems. Another is a network connecting software, named as Parallel Virtual Machine (PVM) which is a free software developed by ORNL, to use many workstations connected to network as a parallel computer. This paper discusses the availability of parallel computing using network and UNIX workstations and comparison between specialized parallel systems (Transputer and iPSC/860) in a Monte Carlo simulation which generally shows high parallelization ratio. (author)

  5. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used...

  6. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used...

  7. Characteristics and error estimation of stratospheric ozone and ozone-related species over Poker Flat (65° N, 147° W, Alaska observed by a ground-based FTIR spectrometer from 2001 to 2003

    Directory of Open Access Journals (Sweden)

    K. Mizutani

    2007-07-01

    Full Text Available It is important to obtain the year-to-year trend of stratospheric minor species in the context of global changes. An important example is the trend in global ozone depletion. The purpose of this paper is to report the accuracy and precision of measurements of stratospheric chemical species that are made at our Poker Flat site in Alaska (65° N, 147° W. Since 1999, minor atmospheric molecules have been observed using a Fourier-Transform solar-absorption infrared Spectrometer (FTS at Poker Flat. Vertical profiles of the abundances of ozone, HNO3, HCl, and HF for the period from 2001 to 2003 were retrieved from FTS spectra using Rodgers' formulation of the Optimal Estimation Method (OEM. The accuracy and precision of the retrievals were estimated by formal error analysis. Errors for the total column were estimated to be 5.3%, 3.4%, 5.9%, and 5.3% for ozone, HNO3, HCl, and HF, respectively. The ozone vertical profiles were in good agreement with profiles derived from collocated ozonesonde measurements that were smoothed with averaging kernel functions that had been obtained with the retrieval procedure used in the analysis of spectra from the ground-based FTS (gb-FTS. The O3, HCl, and HF columns that were retrieved from the FTS measurements were consistent with Earth Probe/Total Ozone Mapping Spectrometer (TOMS and HALogen Occultation Experiment (HALOE data over Alaska within the error limits of all the respective datasets. This is the first report from the Poker Flat FTS observation site on a number of stratospheric gas profiles including a comprehensive error analysis.

  8. Computer program MCAP-TOSS calculates steady-state fluid dynamics of coolant in parallel channels and temperature distribution in surrounding heat-generating solid

    Science.gov (United States)

    Lee, A. Y.

    1967-01-01

    Computer program calculates the steady state fluid distribution, temperature rise, and pressure drop of a coolant, the material temperature distribution of a heat generating solid, and the heat flux distributions at the fluid-solid interfaces. It performs the necessary iterations automatically within the computer, in one machine run.

  9. An Evaluation of Kernel Equating: Parallel Equating with Classical Methods in the SAT Subject Tests[TM] Program. Research Report. ETS RR-09-06

    Science.gov (United States)

    Grant, Mary C.; Zhang, Lilly; Damiano, Michele

    2009-01-01

    This study investigated kernel equating methods by comparing these methods to operational equatings for two tests in the SAT Subject Tests[TM] program. GENASYS (ETS, 2007) was used for all equating methods and scaled score kernel equating results were compared to Tucker, Levine observed score, chained linear, and chained equipercentile equating…

  10. Adapting algorithms to massively parallel hardware

    CERN Document Server

    Sioulas, Panagiotis

    2016-01-01

    In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.

  11. Bayer image parallel decoding based on GPU

    Science.gov (United States)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.

  12. Parallel R-matrix computation

    International Nuclear Information System (INIS)

    Heggarty, J.W.

    1999-06-01

    For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups, from around the world, to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, is of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high performance multicomputers, that have emerged over the last decade which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine and results in the programming of multicomputers to be widely acknowledged as a difficult, time consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in

  13. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed but it also raises some fundamental questions, including: (i) which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode. (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria. (iii) How can we design new algorithms specifically for parallel systems. (iv) For multi-processor systems how can we handle the software aspects of the interprocessor communications. Aspects of these questions illustrated by examples are considered in these lectures. (orig.)

  14. Parallelism and array processing

    International Nuclear Information System (INIS)

    Zacharov, V.

    1983-01-01

    Modern computing, as well as the historical development of computing, has been dominated by sequential monoprocessing. Yet there is the alternative of parallelism, where several processes may be in concurrent execution. This alternative is discussed in a series of lectures, in which the main developments involving parallelism are considered, both from the standpoint of computing systems and that of applications that can exploit such systems. The lectures seek to discuss parallelism in a historical context, and to identify all the main aspects of concurrency in computation right up to the present time. Included will be consideration of the important question as to what use parallelism might be in the field of data processing. (orig.)

  15. Parallel magnetic resonance imaging

    International Nuclear Information System (INIS)

    Larkman, David J; Nunes, Rita G

    2007-01-01

    Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed. (invited topical review)

  16. Abstract Level Parallelization of Finite Difference Methods

    Directory of Open Access Journals (Sweden)

    Edwin Vollebregt

    1997-01-01

    Full Text Available A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”. The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automaticcode generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing. This is important because message passing, unfortunately, still is the only approach that leads to acceptable performance for many more unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.

  17. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable

  18. Global Optimal Energy Management Strategy Research for a Plug-In Series-Parallel Hybrid Electric Bus by Using Dynamic Programming

    Directory of Open Access Journals (Sweden)

    Hongwen He

    2013-01-01

    Full Text Available Energy management strategy influences the power performance and fuel economy of plug-in hybrid electric vehicles greatly. To explore the fuel-saving potential of a plug-in hybrid electric bus (PHEB, this paper searched the global optimal energy management strategy using dynamic programming (DP algorithm. Firstly, the simplified backward model of the PHEB was built which is necessary for DP algorithm. Then the torque and speed of engine and the torque of motor were selected as the control variables, and the battery state of charge (SOC was selected as the state variables. The DP solution procedure was listed, and the way was presented to find all possible control variables at every state of each stage in detail. Finally, the appropriate SOC increment is determined after quantizing the state variables, and then the optimal control of long driving distance of a specific driving cycle is replaced with the optimal control of one driving cycle, which reduces the computational time significantly and keeps the precision at the same time. The simulation results show that the fuel economy of the PEHB with the optimal energy management strategy is improved by 53.7% compared with that of the conventional bus, which can be a benchmark for the assessment of other control strategies.

  19. Domain decomposition methods and parallel computing

    International Nuclear Information System (INIS)

    Meurant, G.

    1991-01-01

    In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite dimensional system from the continuous problem and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion of floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset

  20. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and  tuning work required. This process  is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus  making a major contribution to the development of  robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  1. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction

  2. Effect of Combined Systematized Behavioral Modification Education Program With Desmopressin in Patients With Nocturia: A Prospective, Multicenter, Randomized, and Parallel Study

    Directory of Open Access Journals (Sweden)

    Sung Yong Cho

    2014-12-01

    Full Text Available Purpose The aims of this study were to investigate the efficacy of combining the systematized behavioral modification program (SBMP with desmopressin therapy and to compare this with desmopressin monotherapy in the treatment of nocturnal polyuria (NPU. Methods Patients were randomized at 8 centers to receive desmopressin monotherapy (group A or combination therapy, comprising desmopressin and the SBMP (group B. Nocturia was defined as an average of 2 or more nightly voids. The primary endpoint was a change in the mean number of nocturnal voids from baseline during the 3-month treatment period. The secondary endpoints were changes in the bladder diary parameters and questionnaires scores, and improvements in self-perception for nocturia. Results A total of 200 patients were screened and 76 were excluded from the study, because they failed the screening process. A total of 124 patients were randomized to receive treatment, with group A comprising 68 patients and group B comprising 56 patients. The patients' characteristics were similar between the groups. Nocturnal voids showed a greater decline in group B (-1.5 compared with group A (-1.2, a difference that was not statistically significant. Significant differences were observed between groups A and B with respect to the NPU index (0.37 vs. 0.29, P=0.028, the change in the maximal bladder capacity (-41.3 mL vs. 13.3 mL, P<0.001, and the rate of patients lost to follow up (10.3% [7/68] vs. 0% [0/56], P=0.016. Self-perception for nocturia significantly improved in both groups. Conclusions Combination treatment did not have any additional benefits in relation to reducing nocturnal voids in patients with NPU; however, combination therapy is helpful because it increases the maximal bladder capacity and decreases the NPI. Furthermore, combination therapy increased the persistence of desmopressin in patients with NPU.

  3. Massively parallel multicanonical simulations

    Science.gov (United States)

    Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

    2018-03-01

    Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.

  4. SPINning parallel systems software

    International Nuclear Information System (INIS)

    Matlin, O.S.; Lusk, E.; McCune, W.

    2002-01-01

    We describe our experiences in using Spin to verify parts of the Multi Purpose Daemon (MPD) parallel process management system. MPD is a distributed collection of processes connected by Unix network sockets. MPD is dynamic processes and connections among them are created and destroyed as MPD is initialized, runs user processes, recovers from faults, and terminates. This dynamic nature is easily expressible in the Spin/Promela framework but poses performance and scalability challenges. We present here the results of expressing some of the parallel algorithms of MPD and executing both simulation and verification runs with Spin

  5. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    Science.gov (United States)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).

  6. Parallel Fast Legendre Transform

    NARCIS (Netherlands)

    Alves de Inda, M.; Bisseling, R.H.; Maslen, D.K.

    1998-01-01

    We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform We give an introduction to the DriscollHealy algorithm using polynomial arithmetic and present experimental results on the eciency and accuracy of our implementation The algorithms were

  7. Parallel hierarchical radiosity rendering

    Energy Technology Data Exchange (ETDEWEB)

    Carter, Michael [Iowa State Univ., Ames, IA (United States)

    1993-07-01

    In this dissertation, the step-by-step development of a scalable parallel hierarchical radiosity renderer is documented. First, a new look is taken at the traditional radiosity equation, and a new form is presented in which the matrix of linear system coefficients is transformed into a symmetric matrix, thereby simplifying the problem and enabling a new solution technique to be applied. Next, the state-of-the-art hierarchical radiosity methods are examined for their suitability to parallel implementation, and scalability. Significant enhancements are also discovered which both improve their theoretical foundations and improve the images they generate. The resultant hierarchical radiosity algorithm is then examined for sources of parallelism, and for an architectural mapping. Several architectural mappings are discussed. A few key algorithmic changes are suggested during the process of making the algorithm parallel. Next, the performance, efficiency, and scalability of the algorithm are analyzed. The dissertation closes with a discussion of several ideas which have the potential to further enhance the hierarchical radiosity method, or provide an entirely new forum for the application of hierarchical methods.

  8. Parallel universes beguile science

    CERN Multimedia

    2007-01-01

    A staple of mind-bending science fiction, the possibility of multiple universes has long intrigued hard-nosed physicists, mathematicians and cosmologists too. We may not be able -- as least not yet -- to prove they exist, many serious scientists say, but there are plenty of reasons to think that parallel dimensions are more than figments of eggheaded imagination.

  9. Parallel k-means++

    Energy Technology Data Exchange (ETDEWEB)

    2017-04-04

    A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique. We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.

  10. Parallel plate detectors

    International Nuclear Information System (INIS)

    Gardes, D.; Volkov, P.

    1981-01-01

    A 5x3cm 2 (timing only) and a 15x5cm 2 (timing and position) parallel plate avalanche counters (PPAC) are considered. The theory of operation and timing resolution is given. The measurement set-up and the curves of experimental results illustrate the possibilities of the two counters [fr

  11. Parallel hierarchical global illumination

    Energy Technology Data Exchange (ETDEWEB)

    Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  12. Parallel Education and Defining the Fourth Sector.

    Science.gov (United States)

    Chessell, Diana

    1996-01-01

    Parallel to the primary, secondary, postsecondary, and adult/community education sectors is education not associated with formal programs--learning in arts and cultural sites. The emergence of cultural and educational tourism is an opportunity for adult/community education to define itself by extending lifelong learning opportunities into parallel…

  13. Design strategies for irregularly adapting parallel applications

    International Nuclear Information System (INIS)

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Sing, Jaswinder Pal

    2000-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability

  14. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing)

  15. Vectorization, parallelization and porting of nuclear codes (vectorization and parallelization). Progress report fiscal 1998

    International Nuclear Information System (INIS)

    Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi

    2000-03-01

    Several computer codes in the nuclear field have been vectorized, parallelized and transported on the FUJITSU VPP500 system, the AP3000 system and the Paragon system at Center for Promotion of Computational Science and Engineering in Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this vectorization and parallelization on vector processors part, the vectorization of General Tokamak Circuit Simulation Program code GTCSP, the vectorization and parallelization of Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, Eddy Current Analysis code EDDYCAL, Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of Monte Carlo N-Particle Transport code MCNP4B2, Plasma Hydrodynamics code using Cubic Interpolated Propagation Method PHCIP and Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of Monte Carlo N-Particle Transport code MCNP4B2 and Reactor Safety Analysis code RELAP5 on the AP3000 are described. (author)

  16. A Topological Model for Parallel Algorithm Design

    Science.gov (United States)

    1991-09-01

    effort should be directed to planning, requirements analysis, specification and design, with 20% invested into the actual coding, and then the final 40...be olle more language to learn. And by investing the effort into improving the utility of ai, existing language instead of creating a new one, this...193) it abandons the notion of a process as a fundemental concept of parallel program design and that it facilitates program derivation by rigorously

  17. Parallel grid population

    Science.gov (United States)

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.

  18. Ultrascalable petaflop parallel supercomputer

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

    2010-07-20

    A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.

  19. More parallel please

    DEFF Research Database (Denmark)

    Gregersen, Frans; Josephson, Olle; Kristoffersen, Gjert

    of departure that English may be used in parallel with the various local, in this case Nordic, languages. As such, the book integrates the challenge of internationalization faced by any university with the wish to improve quality in research, education and administration based on the local language......Abstract [en] More parallel, please is the result of the work of an Inter-Nordic group of experts on language policy financed by the Nordic Council of Ministers 2014-17. The book presents all that is needed to plan, practice and revise a university language policy which takes as its point......(s). There are three layers in the text: First, you may read the extremely brief version of the in total 11 recommendations for best practice. Second, you may acquaint yourself with the extended version of the recommendations and finally, you may study the reasoning behind each of them. At the end of the text, we give...

  20. PARALLEL MOVING MECHANICAL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Florian Ion Tiberius Petrescu

    2014-09-01

    Full Text Available Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 Moving mechanical systems parallel structures are solid, fast, and accurate. Between parallel systems it is to be noticed Stewart platforms, as the oldest systems, fast, solid and precise. The work outlines a few main elements of Stewart platforms. Begin with the geometry platform, kinematic elements of it, and presented then and a few items of dynamics. Dynamic primary element on it means the determination mechanism kinetic energy of the entire Stewart platforms. It is then in a record tail cinematic mobile by a method dot matrix of rotation. If a structural mottoelement consists of two moving elements which translates relative, drive train and especially dynamic it is more convenient to represent the mottoelement as a single moving components. We have thus seven moving parts (the six motoelements or feet to which is added mobile platform 7 and one fixed.

  1. Xyce parallel electronic simulator.

    Energy Technology Data Exchange (ETDEWEB)

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  2. Stability of parallel flows

    CERN Document Server

    Betchov, R

    2012-01-01

    Stability of Parallel Flows provides information pertinent to hydrodynamical stability. This book explores the stability problems that occur in various fields, including electronics, mechanics, oceanography, administration, economics, as well as naval and aeronautical engineering. Organized into two parts encompassing 10 chapters, this book starts with an overview of the general equations of a two-dimensional incompressible flow. This text then explores the stability of a laminar boundary layer and presents the equation of the inviscid approximation. Other chapters present the general equation

  3. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer.This book discusses the algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster

  4. Semantic Language Extensions for Implicit Parallel Programming

    Science.gov (United States)

    2013-09-01

    mobile CPU interacts with a GPU on the same device and a cloud based backend at a remote location presents endless possibilities for solving com...for his contribution to the compiler infrastructure . His creativity in solving research problems and expertise in architecting and implementing...92 5.5.1 Frontend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 5.5.2 Backend

  5. Discrete Hadamard transformation algorithm's parallelism analysis and achievement

    Science.gov (United States)

    Hu, Hui

    2009-07-01

    With respect to Discrete Hadamard Transformation (DHT) wide application in real-time signal processing while limitation in operation speed of DSP. The article makes DHT parallel research and its parallel performance analysis. Based on multiprocessor platform-TMS320C80 programming structure, the research is carried out to achieve two kinds of parallel DHT algorithms. Several experiments demonstrated the effectiveness of the proposed algorithms.

  6. Parallel-Architecture Simulator Development Using Hardware Transactional Memory

    OpenAIRE

    Armejach Sanosa, Adrià

    2009-01-01

    To address the need for a simpler parallel programming model, Transactional Memory (TM) has been developed and promises good parallel performance with easy-to-write parallel code. Unlike lock-based approaches, with TM, programmers do not need to explicitly specify and manage the synchronization among threads. However, programmers simply mark code segments as transactions, and the TM system manages the concurrency control for them. TM can be implemented either in software (STM) or hardware (HT...

  7. An Expert System for the Development of Efficient Parallel Code

    Science.gov (United States)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  8. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution periods were reduced by 1.6 times when four virtual processors were used. To allow even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to present a feasible approach to reduce execution times in temporal fringe pattern analysis

  9. Logical inference techniques for loop parallelization

    DEFF Research Database (Denmark)

    Oancea, Cosmin Eugen; Rauchwerger, Lawrence

    2012-01-01

    the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S={}, where S is a set expression representing array indexes. Using...... of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECT-CLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers....

  10. The ongoing investigation of high performance parallel computing in HEP

    CERN Document Server

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  11. Resistor Combinations for Parallel Circuits.

    Science.gov (United States)

    McTernan, James P.

    1978-01-01

    To help simplify both teaching and learning of parallel circuits, a high school electricity/electronics teacher presents and illustrates the use of tables of values for parallel resistive circuits in which total resistances are whole numbers. (MF)

  12. Parallel computational in nuclear group constant calculation

    International Nuclear Information System (INIS)

    Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal

    2002-01-01

    In this paper parallel computational method in nuclear group constant calculation using collision probability method will be discuss. The main focus is on the calculation of collision matrix which need large amount of computational time. The geometry treated here is concentric cylinder. The calculation of collision probability matrix is carried out using semi analytic method using Beckley Naylor Function. To accelerate computation speed some computer parallel used to solve the problem. We used LINUX based parallelization using PVM software with C or fortran language. While in windows based we used socket programming using DELPHI or C builder. The calculation results shows the important of optimal weight for each processor in case there area many type of processor speed

  13. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one o f the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of ¿(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....

  14. Parallel inter channel interaction mechanisms

    International Nuclear Information System (INIS)

    Jovic, V.; Afgan, N.; Jovic, L.

    1995-01-01

    Parallel channels interactions are examined. For experimental researches of nonstationary regimes flow in three parallel vertical channels results of phenomenon analysis and mechanisms of parallel channel interaction for adiabatic condition of one-phase fluid and two-phase mixture flow are shown. (author)

  15. Parallel processing of Monte Carlo code MCNP for particle transport problem

    Energy Technology Data Exchange (ETDEWEB)

    Higuchi, Kenji; Kawasaki, Takuji

    1996-06-01

    It is possible to vectorize or parallelize Monte Carlo codes (MC code) for photon and neutron transport problem, making use of independency of the calculation for each particle. Applicability of existing MC code to parallel processing is mentioned. As for parallel computer, we have used both vector-parallel processor and scalar-parallel processor in performance evaluation. We have made (i) vector-parallel processing of MCNP code on Monte Carlo machine Monte-4 with four vector processors, (ii) parallel processing on Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on two types of parallel or distributed memory computers. In addition, we mention the evaluation of parallel programming environments for parallel computers used in the present work as a part of the work developing STA (Seamless Thinking Aid) Basic Software. (author)

  16. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    Science.gov (United States)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  17. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack; Demanet, Laurent; Maxwell, Nicholas; Ying, Lexing

    2014-01-01

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2Nd logN). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  18. A Parallel Butterfly Algorithm

    KAUST Repository

    Poulson, Jack

    2014-02-04

    The butterfly algorithm is a fast algorithm which approximately evaluates a discrete analogue of the integral transform (Equation Presented.) at large numbers of target points when the kernel, K(x, y), is approximately low-rank when restricted to subdomains satisfying a certain simple geometric condition. In d dimensions with O(Nd) quasi-uniformly distributed source and target points, when each appropriate submatrix of K is approximately rank-r, the running time of the algorithm is at most O(r2Nd logN). A parallelization of the butterfly algorithm is introduced which, assuming a message latency of α and per-process inverse bandwidth of β, executes in at most (Equation Presented.) time using p processes. This parallel algorithm was then instantiated in the form of the open-source DistButterfly library for the special case where K(x, y) = exp(iΦ(x, y)), where Φ(x, y) is a black-box, sufficiently smooth, real-valued phase function. Experiments on Blue Gene/Q demonstrate impressive strong-scaling results for important classes of phase functions. Using quasi-uniform sources, hyperbolic Radon transforms, and an analogue of a three-dimensional generalized Radon transform were, respectively, observed to strong-scale from 1-node/16-cores up to 1024-nodes/16,384-cores with greater than 90% and 82% efficiency, respectively. © 2014 Society for Industrial and Applied Mathematics.

  19. Fast parallel event reconstruction

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    On-line processing of large data volumes produced in modern HEP experiments requires using maximum capabilities of modern and future many-core CPU and GPU architectures.One of such powerful feature is a SIMD instruction set, which allows packing several data items in one register and to operate on all of them, thus achievingmore operations per clock cycle. Motivated by the idea of using the SIMD unit ofmodern processors, the KF based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased in 120000 times with 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer.  Running on a Nehalem CPU with 8 cores it shows the processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA frameworkprovi...

  20. Parallel evolutionary computation in bioinformatics applications.

    Science.gov (United States)

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle its inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on the easiness of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of parallelism related modules allows the user to easily configure its environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  1. A parallel implementation of a maximum entropy reconstruction algorithm for PET images in a visual language

    International Nuclear Information System (INIS)

    Bastiens, K.; Lemahieu, I.

    1994-01-01

    The application of a maximum entropy reconstruction algorithm to PET images requires a lot of computing resources. A parallel implementation could seriously reduce the execution time. However, programming a parallel application is still a non trivial task, needing specialized people. In this paper a programming environment based on a visual programming language is used for a parallel implementation of the reconstruction algorithm. This programming environment allows less experienced programmers to use the performance of multiprocessor systems. (authors)

  2. A parallel implementation of a maximum entropy reconstruction algorithm for PET images in a visual language

    Energy Technology Data Exchange (ETDEWEB)

    Bastiens, K; Lemahieu, I [University of Ghent - ELIS Department, St. Pietersnieuwstraat 41, B-9000 Ghent (Belgium)

    1994-12-31

    The application of a maximum entropy reconstruction algorithm to PET images requires a lot of computing resources. A parallel implementation could seriously reduce the execution time. However, programming a parallel application is still a non trivial task, needing specialized people. In this paper a programming environment based on a visual programming language is used for a parallel implementation of the reconstruction algorithm. This programming environment allows less experienced programmers to use the performance of multiprocessor systems. (authors). 8 refs, 3 figs, 1 tab.

  3. Parallel Nonlinear Optimization for Astrodynamic Navigation, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — CU Aerospace proposes the development of a new parallel nonlinear program (NLP) solver software package. NLPs allow the solution of complex optimization problems,...

  4. Fast Inverse Distance Weighting-Based Spatiotemporal Interpolation: A Web-Based Application of Interpolating Daily Fine Particulate Matter PM2.5 in the Contiguous U.S. Using Parallel Programming and k-d Tree

    Directory of Open Access Journals (Sweden)

    Lixin Li

    2014-09-01

    Full Text Available Epidemiological studies have identified associations between mortality and changes in concentration of particulate matter. These studies have highlighted the public concerns about health effects of particulate air pollution. Modeling fine particulate matter PM2.5 exposure risk and monitoring day-to-day changes in PM2.5 concentration is a critical step for understanding the pollution problem and embarking on the necessary remedy. This research designs, implements and compares two inverse distance weighting (IDW-based spatiotemporal interpolation methods, in order to assess the trend of daily PM2.5 concentration for the contiguous United States over the year of 2009, at both the census block group level and county level. Traditionally, when handling spatiotemporal interpolation, researchers tend to treat space and time separately and reduce the spatiotemporal interpolation problems to a sequence of snapshots of spatial interpolations. In this paper, PM2.5 data interpolation is conducted in the continuous space-time domain by integrating space and time simultaneously, using the so-called extension approach. Time values are calculated with the help of a factor under the assumption that spatial and temporal dimensions are equally important when interpolating a continuous changing phenomenon in the space-time domain. Various IDW-based spatiotemporal interpolation methods with different parameter configurations are evaluated by cross-validation. In addition, this study explores computational issues (computer processing speed faced during implementation of spatiotemporal interpolation for huge data sets. Parallel programming techniques and an advanced data structure, named k-d tree, are adapted in this paper to address the computational challenges. Significant computational improvement has been achieved. Finally, a web-based spatiotemporal IDW-based interpolation application is designed and implemented where users can visualize and animate

  5. Fast Inverse Distance Weighting-Based Spatiotemporal Interpolation: A Web-Based Application of Interpolating Daily Fine Particulate Matter PM2.5 in the Contiguous U.S. Using Parallel Programming and k-d Tree

    Science.gov (United States)

    Li, Lixin; Losser, Travis; Yorke, Charles; Piltner, Reinhard

    2014-01-01

    Epidemiological studies have identified associations between mortality and changes in concentration of particulate matter. These studies have highlighted the public concerns about health effects of particulate air pollution. Modeling fine particulate matter PM2.5 exposure risk and monitoring day-to-day changes in PM2.5 concentration is a critical step for understanding the pollution problem and embarking on the necessary remedy. This research designs, implements and compares two inverse distance weighting (IDW)-based spatiotemporal interpolation methods, in order to assess the trend of daily PM2.5 concentration for the contiguous United States over the year of 2009, at both the census block group level and county level. Traditionally, when handling spatiotemporal interpolation, researchers tend to treat space and time separately and reduce the spatiotemporal interpolation problems to a sequence of snapshots of spatial interpolations. In this paper, PM2.5 data interpolation is conducted in the continuous space-time domain by integrating space and time simultaneously, using the so-called extension approach. Time values are calculated with the help of a factor under the assumption that spatial and temporal dimensions are equally important when interpolating a continuous changing phenomenon in the space-time domain. Various IDW-based spatiotemporal interpolation methods with different parameter configurations are evaluated by cross-validation. In addition, this study explores computational issues (computer processing speed) faced during implementation of spatiotemporal interpolation for huge data sets. Parallel programming techniques and an advanced data structure, named k-d tree, are adapted in this paper to address the computational challenges. Significant computational improvement has been achieved. Finally, a web-based spatiotemporal IDW-based interpolation application is designed and implemented where users can visualize and animate spatiotemporal interpolation

  6. Fast inverse distance weighting-based spatiotemporal interpolation: a web-based application of interpolating daily fine particulate matter PM2:5 in the contiguous U.S. using parallel programming and k-d tree.

    Science.gov (United States)

    Li, Lixin; Losser, Travis; Yorke, Charles; Piltner, Reinhard

    2014-09-03

    Epidemiological studies have identified associations between mortality and changes in concentration of particulate matter. These studies have highlighted the public concerns about health effects of particulate air pollution. Modeling fine particulate matter PM2.5 exposure risk and monitoring day-to-day changes in PM2.5 concentration is a critical step for understanding the pollution problem and embarking on the necessary remedy. This research designs, implements and compares two inverse distance weighting (IDW)-based spatiotemporal interpolation methods, in order to assess the trend of daily PM2.5 concentration for the contiguous United States over the year of 2009, at both the census block group level and county level. Traditionally, when handling spatiotemporal interpolation, researchers tend to treat space and time separately and reduce the spatiotemporal interpolation problems to a sequence of snapshots of spatial interpolations. In this paper, PM2.5 data interpolation is conducted in the continuous space-time domain by integrating space and time simultaneously, using the so-called extension approach. Time values are calculated with the help of a factor under the assumption that spatial and temporal dimensions are equally important when interpolating a continuous changing phenomenon in the space-time domain. Various IDW-based spatiotemporal interpolation methods with different parameter configurations are evaluated by cross-validation. In addition, this study explores computational issues (computer processing speed) faced during implementation of spatiotemporal interpolation for huge data sets. Parallel programming techniques and an advanced data structure, named k-d tree, are adapted in this paper to address the computational challenges. Significant computational improvement has been achieved. Finally, a web-based spatiotemporal IDW-based interpolation application is designed and implemented where users can visualize and animate spatiotemporal interpolation

  7. Linear parallel processing machines I

    Energy Technology Data Exchange (ETDEWEB)

    Von Kunze, M

    1984-01-01

    As is well-known, non-context-free grammars for generating formal languages happen to be of a certain intrinsic computational power that presents serious difficulties to efficient parsing algorithms as well as for the development of an algebraic theory of contextsensitive languages. In this paper a framework is given for the investigation of the computational power of formal grammars, in order to start a thorough analysis of grammars consisting of derivation rules of the form aB ..-->.. A/sub 1/ ... A /sub n/ b/sub 1/...b /sub m/ . These grammars may be thought of as automata by means of parallel processing, if one considers the variables as operators acting on the terminals while reading them right-to-left. This kind of automata and their 2-dimensional programming language prove to be useful by allowing a concise linear-time algorithm for integer multiplication. Linear parallel processing machines (LP-machines) which are, in their general form, equivalent to Turing machines, include finite automata and pushdown automata (with states encoded) as special cases. Bounded LP-machines yield deterministic accepting automata for nondeterministic contextfree languages, and they define an interesting class of contextsensitive languages. A characterization of this class in terms of generating grammars is established by using derivation trees with crossings as a helpful tool. From the algebraic point of view, deterministic LP-machines are effectively represented semigroups with distinguished subsets. Concerning the dualism between generating and accepting devices of formal languages within the algebraic setting, the concept of accepting automata turns out to reduce essentially to embeddability in an effectively represented extension monoid, even in the classical cases.

  8. Parallel computing in enterprise modeling.

    Energy Technology Data Exchange (ETDEWEB)

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'Entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principal makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center, and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  9. Rubus: A compiler for seamless and extensible parallelism.

    Directory of Open Access Journals (Sweden)

    Muhammad Adnan

    Full Text Available Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU, originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84

  10. Parallel Polarization State Generation.

    Science.gov (United States)

    She, Alan; Capasso, Federico

    2016-05-17

    The control of polarization, an essential property of light, is of wide scientific and technological interest. The general problem of generating arbitrary time-varying states of polarization (SOP) has always been mathematically formulated by a series of linear transformations, i.e. a product of matrices, imposing a serial architecture. Here we show a parallel architecture described by a sum of matrices. The theory is experimentally demonstrated by modulating spatially-separated polarization components of a laser using a digital micromirror device that are subsequently beam combined. This method greatly expands the parameter space for engineering devices that control polarization. Consequently, performance characteristics, such as speed, stability, and spectral range, are entirely dictated by the technologies of optical intensity modulation, including absorption, reflection, emission, and scattering. This opens up important prospects for polarization state generation (PSG) with unique performance characteristics with applications in spectroscopic ellipsometry, spectropolarimetry, communications, imaging, and security.

  11. Parallel imaging microfluidic cytometer.

    Science.gov (United States)

    Ehrlich, Daniel J; McKenna, Brian K; Evans, James G; Belkina, Anna C; Denis, Gerald V; Sherr, David H; Cheung, Man Ching

    2011-01-01

    By adding an additional degree of freedom from multichannel flow, the parallel microfluidic cytometer (PMC) combines some of the best features of fluorescence-activated flow cytometry (FCM) and microscope-based high-content screening (HCS). The PMC (i) lends itself to fast processing of large numbers of samples, (ii) adds a 1D imaging capability for intracellular localization assays (HCS), (iii) has a high rare-cell sensitivity, and (iv) has an unusual capability for time-synchronized sampling. An inability to practically handle large sample numbers has restricted applications of conventional flow cytometers and microscopes in combinatorial cell assays, network biology, and drug discovery. The PMC promises to relieve a bottleneck in these previously constrained applications. The PMC may also be a powerful tool for finding rare primary cells in the clinic. The multichannel architecture of current PMC prototypes allows 384 unique samples for a cell-based screen to be read out in ∼6-10 min, about 30 times the speed of most current FCM systems. In 1D intracellular imaging, the PMC can obtain protein localization using HCS marker strategies at many times for the sample throughput of charge-coupled device (CCD)-based microscopes or CCD-based single-channel flow cytometers. The PMC also permits the signal integration time to be varied over a larger range than is practical in conventional flow cytometers. The signal-to-noise advantages are useful, for example, in counting rare positive cells in the most difficult early stages of genome-wide screening. We review the status of parallel microfluidic cytometry and discuss some of the directions the new technology may take. Copyright © 2011 Elsevier Inc. All rights reserved.

  12. Parallel community climate model: Description and user`s guide

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses a standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user`s guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  13. The new landscape of parallel computer architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)

    2007-07-15

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  14. Parallel Evolutionary Optimization for Neuromorphic Network Training

    Energy Technology Data Exchange (ETDEWEB)

    Schuman, Catherine D [ORNL; Disney, Adam [University of Tennessee (UT); Singh, Susheela [North Carolina State University (NCSU), Raleigh; Bruer, Grant [University of Tennessee (UT); Mitchell, John Parker [University of Tennessee (UT); Klibisz, Aleksander [University of Tennessee (UT); Plank, James [University of Tennessee (UT)

    2016-01-01

    One of the key impediments to the success of current neuromorphic computing architectures is the issue of how best to program them. Evolutionary optimization (EO) is one promising programming technique; in particular, its wide applicability makes it especially attractive for neuromorphic architectures, which can have many different characteristics. In this paper, we explore different facets of EO on a spiking neuromorphic computing model called DANNA. We focus on the performance of EO in the design of our DANNA simulator, and on how to structure EO on both multicore and massively parallel computing systems. We evaluate how our parallel methods impact the performance of EO on Titan, the U.S.'s largest open science supercomputer, and BOB, a Beowulf-style cluster of Raspberry Pi's. We also focus on how to improve the EO by evaluating commonality in higher performing neural networks, and present the result of a study that evaluates the EO performed by Titan.

  15. The new landscape of parallel computer architecture

    International Nuclear Information System (INIS)

    Shalf, John

    2007-01-01

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models

  16. Automatic Parallelization An Overview of Fundamental Compiler Techniques

    CERN Document Server

    Midkiff, Samuel P

    2012-01-01

    Compiling for parallelism is a longstanding topic of compiler research. This book describes the fundamental principles of compiling "regular" numerical programs for parallelism. We begin with an explanation of analyses that allow a compiler to understand the interaction of data reads and writes in different statements and loop iterations during program execution. These analyses include dependence analysis, use-def analysis and pointer analysis. Next, we describe how the results of these analyses are used to enable transformations that make loops more amenable to parallelization, and

  17. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    Science.gov (United States)

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.

  18. Power stability methods for parallel systems

    International Nuclear Information System (INIS)

    Wallach, Y.

    1988-01-01

    Parallel-Processing Systems are already commercially available. This paper shows that if one of them - the Alternating Sequential Parallel, or ASP system - is applied to network stability calculations it will lead to a higher speed of solution. The ASP system is first described and is then shown to be cheaper, more reliable and available than other parallel systems. Also, no deadlock need be feared and the speedup is normally very high. A number of ASP systems were already assembled (the SMS systems, Topps, DIRMU etc.). At present, an IBM Local Area Network is being modified so that it too can work in the ASP mode. Existing ASP systems were programmed in Fortran or assembly language. Since newer systems (e.g. DIRMU) are programmed in Modula-2, this language can be used. Stability analysis is based on solving nonlinear differential and algebraic equations. The algorithm for solving the nonlinear differential equations on ASP, is described and programmed in Modula-2. The speedup is computed and is shown to be almost optimal

  19. Parallel Framework for Cooperative Processes

    Directory of Open Access Journals (Sweden)

    Mitică Craus

    2005-01-01

    Full Text Available This paper describes the work of an object oriented framework designed to be used in the parallelization of a set of related algorithms. The idea behind the system we are describing is to have a re-usable framework for running several sequential algorithms in a parallel environment. The algorithms that the framework can be used with have several things in common: they have to run in cycles and the work should be possible to be split between several "processing units". The parallel framework uses the message-passing communication paradigm and is organized as a master-slave system. Two applications are presented: an Ant Colony Optimization (ACO parallel algorithm for the Travelling Salesman Problem (TSP and an Image Processing (IP parallel algorithm for the Symmetrical Neighborhood Filter (SNF. The implementations of these applications by means of the parallel framework prove to have good performances: approximatively linear speedup and low communication cost.

  20. Parallel Monte Carlo reactor neutronics

    International Nuclear Information System (INIS)

    Blomquist, R.N.; Brown, F.B.

    1994-01-01

    The issues affecting implementation of parallel algorithms for large-scale engineering Monte Carlo neutron transport simulations are discussed. For nuclear reactor calculations, these include load balancing, recoding effort, reproducibility, domain decomposition techniques, I/O minimization, and strategies for different parallel architectures. Two codes were parallelized and tested for performance. The architectures employed include SIMD, MIMD-distributed memory, and workstation network with uneven interactive load. Speedups linear with the number of nodes were achieved

  1. Anti-parallel triplexes

    DEFF Research Database (Denmark)

    Kosbar, Tamer R.; Sofan, Mamdouh A.; Waly, Mohamed A.

    2015-01-01

    about 6.1 °C when the TFO strand was modified with Z and the Watson-Crick strand with adenine-LNA (AL). The molecular modeling results showed that, in case of nucleobases Y and Z a hydrogen bond (1.69 and 1.72 Å, respectively) was formed between the protonated 3-aminopropyn-1-yl chain and one...... of the phosphate groups in Watson-Crick strand. Also, it was shown that the nucleobase Y made a good stacking and binding with the other nucleobases in the TFO and Watson-Crick duplex, respectively. In contrast, the nucleobase Z with LNA moiety was forced to twist out of plane of Watson-Crick base pair which......The phosphoramidites of DNA monomers of 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine (Y) and 7-(3-aminopropyn-1-yl)-8-aza-7-deazaadenine LNA (Z) are synthesized, and the thermal stability at pH 7.2 and 8.2 of anti-parallel triplexes modified with these two monomers is determined. When, the anti...

  2. Parallel consensual neural networks.

    Science.gov (United States)

    Benediktsson, J A; Sveinsson, J R; Ersoy, O K; Swain, P H

    1997-01-01

    A new type of a neural-network architecture, the parallel consensual neural network (PCNN), is introduced and applied in classification/data fusion of multisource remote sensing and geographic data. The PCNN architecture is based on statistical consensus theory and involves using stage neural networks with transformed input data. The input data are transformed several times and the different transformed data are used as if they were independent inputs. The independent inputs are first classified using the stage neural networks. The output responses from the stage networks are then weighted and combined to make a consensual decision. In this paper, optimization methods are used in order to weight the outputs from the stage networks. Two approaches are proposed to compute the data transforms for the PCNN, one for binary data and another for analog data. The analog approach uses wavelet packets. The experimental results obtained with the proposed approach show that the PCNN outperforms both a conjugate-gradient backpropagation neural network and conventional statistical methods in terms of overall classification accuracy of test data.

  3. A Parallel Particle Swarm Optimizer

    National Research Council Canada - National Science Library

    Schutte, J. F; Fregly, B .J; Haftka, R. T; George, A. D

    2003-01-01

    .... Motivated by a computationally demanding biomechanical system identification problem, we introduce a parallel implementation of a stochastic population based global optimizer, the Particle Swarm...

  4. Seeing or moving in parallel

    DEFF Research Database (Denmark)

    Christensen, Mark Schram; Ehrsson, H Henrik; Nielsen, Jens Bo

    2013-01-01

    a different network, involving bilateral dorsal premotor cortex (PMd), primary motor cortex, and SMA, was more active when subjects viewed parallel movements while performing either symmetrical or parallel movements. Correlations between behavioral instability and brain activity were present in right lateral...... adduction-abduction movements symmetrically or in parallel with real-time congruent or incongruent visual feedback of the movements. One network, consisting of bilateral superior and middle frontal gyrus and supplementary motor area (SMA), was more active when subjects performed parallel movements, whereas...

  5. A high-speed linear algebra library with automatic parallelism

    Science.gov (United States)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  6. Logical inference techniques for loop parallelization

    KAUST Repository

    Oancea, Cosmin E.; Rauchwerger, Lawrence

    2012-01-01

    This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = Ø, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially-high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = Ø). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECTCLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers. Copyright © 2012 ACM.

  7. Performance studies of the parallel VIM code

    International Nuclear Information System (INIS)

    Shi, B.; Blomquist, R.N.

    1996-01-01

    In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects in performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large scale continuous energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and p4 message passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 microsec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by other's jobs

  8. Logical inference techniques for loop parallelization

    KAUST Repository

    Oancea, Cosmin E.

    2012-01-01

    This paper presents a fully automatic approach to loop parallelization that integrates the use of static and run-time analysis and thus overcomes many known difficulties such as nonlinear and indirect array indexing and complex control flow. Our hybrid analysis framework validates the parallelization transformation by verifying the independence of the loop\\'s memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S = Ø, where S is a set expression representing array indexes. Using a language instead of an array-abstraction representation for S results in a smaller number of conservative approximations but exhibits a potentially-high runtime cost. To alleviate this cost we introduce a language translation F from the USR set-expression language to an equally rich language of predicates (F(S) ⇒ S = Ø). Loop parallelization is then validated using a novel logic inference algorithm that factorizes the obtained complex predicates (F(S)) into a sequence of sufficient-independence conditions that are evaluated first statically and, when needed, dynamically, in increasing order of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECTCLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers. Copyright © 2012 ACM.

  9. Parallelization and automatic data distribution for nuclear reactor simulations

    Energy Technology Data Exchange (ETDEWEB)

    Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  10. Parallelization and automatic data distribution for nuclear reactor simulations

    International Nuclear Information System (INIS)

    Liebrock, L.M.

    1997-01-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed

  11. PARALLEL IMPORT: REALITY FOR RUSSIA

    Directory of Open Access Journals (Sweden)

    Т. А. Сухопарова

    2014-01-01

    Full Text Available Problem of parallel import is urgent question at now. Parallel import legalization in Russia is expedient. Such statement based on opposite experts opinion analysis. At the same time it’s necessary to negative consequences consider of this decision and to apply remedies to its minimization.Purchase on Elibrary.ru > Buy now

  12. Truly nested data-parallelism: compiling SaC for the Microgrid architecture

    NARCIS (Netherlands)

    Herhut, S.; Joslin, C.; Scholz, S.-B.; Grelck, C.; Morazan, M.

    2009-01-01

    Data-parallel programming facilitates elegant specification of concurrency. However, the composability of data-parallel operations so far has been constrained by the requirement to have only at data- parallel operation at runtime. In this paper, we present early results on our work to exploit

  13. Parallel Ada benchmarks for the SVMS

    Science.gov (United States)

    Collard, Philippe E.

    1990-01-01

    The use of parallel processing paradigm to design and develop faster and more reliable computers appear to clearly mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through the tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with the version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed that would measure Ada tasking efficiency on parallel architectures as well as determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools in the development of the SVMS architecture.

  14. A solution for automatic parallelization of sequential assembly code

    Directory of Open Access Journals (Sweden)

    Kovačević Đorđe

    2013-01-01

    Full Text Available Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Relying on existing algorithms, this paper describes one new software solution tool for parallelization of sequential assembly code. The main goal of this paper is to develop the parallelizator which reads sequential assembler code and at the output provides parallelized code for MIPS processor with multiple cores. The idea is the following: the parser translates assembler input file to program objects suitable for further processing. After that the static single assignment is done. Based on the data flow graph, the parallelization algorithm separates instructions on different cores. Once sequential code is parallelized by the parallelization algorithm, registers are allocated with the algorithm for linear allocation, and the result at the end of the program is distributed assembler code on each of the cores. In the paper we evaluate the speedup of the matrix multiplication example, which was processed by the parallelizator of assembly code. The result is almost linear speedup of code execution, which increases with the number of cores. The speed up on the two cores is 1.99, while on 16 cores the speed up is 13.88.

  15. The Galley Parallel File System

    Science.gov (United States)

    Nieuwejaar, Nils; Kotz, David

    1996-01-01

    Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.

  16. Parallelization of the FLAPW method

    International Nuclear Information System (INIS)

    Canning, A.; Mannstadt, W.; Freeman, A.J.

    1999-01-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer

  17. Parallelization of the FLAPW method

    Science.gov (United States)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  18. Design Patterns: establishing a discipline of parallel software engineering

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    Many core processors present us with a software challenge. We must turn our serial code into parallel code. To accomplish this wholesale transformation of our software ecosystem, we must define established practice is in parallel programming and then develop tools to support that practice. This leads to design patterns supported by frameworks optimized at runtime with advanced autotuning compilers. In this talk I provide an update of my ongoing research with the ParLab at UC Berkeley to realize this vision. In particular, I will describe our draft parallel pattern language, our early experiments with software frameworks, and the associated runtime optimization tools.About the speakerTim Mattson is a parallel programmer (Ph.D. Chemistry, UCSC, 1985). He does linear algebra, finds oil, shakes molecules, solves differential equations, and models electrons in simple atomic systems. He has spent his career working with computer scientists to make sure the needs of parallel applications programmers are met.Tim has ...

  19. Parallel computation for solving the tridiagonal linear system of equations

    International Nuclear Information System (INIS)

    Ishiguro, Misako; Harada, Hiroo; Fujii, Minoru; Fujimura, Toichiro; Nakamura, Yasuhiro; Nanba, Katsumi.

    1981-09-01

    Recently, applications of parallel computation for scientific calculations have increased from the need of the high speed calculation of large scale programs. At the JAERI computing center, an array processor FACOM 230-75 APU has installed to study the applicability of parallel computation for nuclear codes. We made some numerical experiments by using the APU on the methods of solution of tridiagonal linear equation which is an important problem in scientific calculations. Referring to the recent papers with parallel methods, we investigate eight ones. These are Gauss elimination method, Parallel Gauss method, Accelerated parallel Gauss method, Jacobi method, Recursive doubling method, Cyclic reduction method, Chebyshev iteration method, and Conjugate gradient method. The computing time and accuracy were compared among the methods on the basis of the numerical experiments. As the result, it is found that the Cyclic reduction method is best both in computing time and accuracy and the Gauss elimination method is the second one. (author)

  20. Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code

    International Nuclear Information System (INIS)

    Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen

    2013-01-01

    Parallel programming with mixed mode of messages-passing and shared-memory has several advantages when used in Monte Carlo neutron transport code, such as fitting hardware of distributed-shared clusters, economizing memory demand of Monte Carlo transport, improving parallel performance, and so on. MPI/OpenMP hybrid parallelism was implemented based on a one dimension Monte Carlo neutron transport code. Some critical factors affecting the parallel performance were analyzed and solutions were proposed for several problems such as contention access, lock contention and false sharing. After optimization the code was tested finally. It is shown that the hybrid parallel code can reach good performance just as pure MPI parallel program, while it saves a lot of memory usage at the same time. Therefore hybrid parallel is efficient for achieving large-scale parallel of Monte Carlo neutron transport. (authors)

  1. Finite element electromagnetic field computation on the Sequent Symmetry 81 parallel computer

    International Nuclear Information System (INIS)

    Ratnajeevan, S.; Hoole, H.

    1990-01-01

    Finite element field analysis algorithms lend themselves to parallelization and this fact is exploited in this paper to implement a finite element analysis program for electromagnetic field computation on the Sequent Symmetry 81 parallel computer with three processors. In terms of waiting time, the maximum gains are to be made in matrix solution and therefore this paper concentrates on the gains in parallelizing the solution part of finite element analysis. An outline of how parallelization could be exploited in most finite element operations is given in this paper although the actual implemention of parallelism on the Sequent Symmetry 81 parallel computer was in sparsity computation, matrix assembly and the matrix solution areas. In all cases, the algorithms were modified suit the parallel programming application rather than allowing the compiler to parallelize on existing algorithms

  2. Electromagnetic Physics Models for Parallel Computing Architectures

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)

  3. Electromagnetic Physics Models for Parallel Computing Architectures

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  4. Flexibility and Performance of Parallel File Systems

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1996-01-01

    As we gain experience with parallel file systems, it becomes increasingly clear that a single solution does not suit all applications. For example, it appears to be impossible to find a single appropriate interface, caching policy, file structure, or disk-management strategy. Furthermore, the proliferation of file-system interfaces and abstractions make applications difficult to port. We propose that the traditional functionality of parallel file systems be separated into two components: a fixed core that is standard on all platforms, encapsulating only primitive abstractions and interfaces, and a set of high-level libraries to provide a variety of abstractions and application-programmer interfaces (API's). We present our current and next-generation file systems as examples of this structure. Their features, such as a three-dimensional file structure, strided read and write interfaces, and I/O-node programs, are specifically designed with the flexibility and performance necessary to support a wide range of applications.

  5. Parallel channel effects under BWR LOCA conditions

    International Nuclear Information System (INIS)

    Suzuki, H.; Hatamiya, S.; Murase, M.

    1988-01-01

    Due to parallel channel effects, different flow patterns such as liquid down-flow and gas up-flow appear simultaneously in fuel bundles of a BWR core during postulated LOCAs. Applying the parallel channel effects to the fuel bundle, water drain tubes with a restricted bottom end have been developed in order to mitigate counter-current flow limiting and to increase the falling water flow rate at the upper tie plate. The upper tie plate with water drain tubes is an especially effective means of increasing the safety margin of a reactor with narrow gaps between fuel rods and high steam velocity at the upper tie plate. The characteristics of the water drain tubes have been experimentally investigated using a small-scaled steam-water system simulating a BWR core. Then, their effect on the fuel cladding temperature was evaluated using the LOCA analysis program SAFER. (orig.)

  6. A layered semantics for a parallel object-oriented language

    NARCIS (Netherlands)

    P.H.M. America (Pierre); J.J.M.M. Rutten (Jan)

    1990-01-01

    textabstractWe develop a denotational semantics for POOL, a parallel object-oriented programming language. The main contribution of this semantics is an accurate mathematical model of the most important concept in object-oriented programming: the object. This is achieved by structuring the semantics

  7. Directions in parallel processor architecture, and GPUs too

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  8. Massively parallel sparse matrix function calculations with NTPoly

    Science.gov (United States)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.

  9. Parallel integer sorting with medium and fine-scale parallelism

    Science.gov (United States)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.

  10. Template based parallel checkpointing in a massively parallel computer system

    Science.gov (United States)

    Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.

  11. Practical parallel processing

    International Nuclear Information System (INIS)

    Arendt, M.L.

    1986-01-01

    ELXSI, a San Jose based computer company, was founded in January of 1979 for the purpose of developing and marketing a tightly-coupled multiple processor system. After five years ELXSI succeeded in making the first commercial installations at Digicon Geophysical, NASA-Dryden, and Sandia National Laboratories. Since that time over fifty-one systems and ninety-three processors have been installed. The commercial success of the ELXSI system 6400(TM) is due to several significant breakthroughs in computer technology including a system bus operating at 320 million bytes per second, a new Message-Based Operating System, EMBOS (TM), and a new system organization which allows for easy expansion in any dimension without changes to the operating system, the user environment, or the application programs. (Auth.)

  12. High-energy physics software parallelization using database techniques

    International Nuclear Information System (INIS)

    Argante, E.; Van der Stok, P.D.V.; Willers, I.

    1997-01-01

    A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. By basing CoCa on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)

  13. Parallel education: what is it?

    OpenAIRE

    Amos, Michelle Peta

    2017-01-01

    In the history of education it has long been discussed that single-sex and coeducation are the two models of education present in schools. With the introduction of parallel schools over the last 15 years, there has been very little research into this 'new model'. Many people do not understand what it means for a school to be parallel or they confuse a parallel model with co-education, due to the presence of both boys and girls within the one institution. Therefore, the main obj...

  14. Balanced, parallel operation of flashlamps

    International Nuclear Information System (INIS)

    Carder, B.M.; Merritt, B.T.

    1979-01-01

    A new energy store, the Compensated Pulsed Alternator (CPA), promises to be a cost effective substitute for capacitors to drive flashlamps that pump large Nd:glass lasers. Because the CPA is large and discrete, it will be necessary that it drive many parallel flashlamp circuits, presenting a problem in equal current distribution. Current division to +- 20% between parallel flashlamps has been achieved, but this is marginal for laser pumping. A method is presented here that provides equal current sharing to about 1%, and it includes fused protection against short circuit faults. The method was tested with eight parallel circuits, including both open-circuit and short-circuit fault tests

  15. Workspace Analysis for Parallel Robot

    Directory of Open Access Journals (Sweden)

    Ying Sun

    2013-05-01

    Full Text Available As a completely new-type of robot, the parallel robot possesses a lot of advantages that the serial robot does not, such as high rigidity, great load-carrying capacity, small error, high precision, small self-weight/load ratio, good dynamic behavior and easy control, hence its range is extended in using domain. In order to find workspace of parallel mechanism, the numerical boundary-searching algorithm based on the reverse solution of kinematics and limitation of link length has been introduced. This paper analyses position workspace, orientation workspace of parallel robot of the six degrees of freedom. The result shows: It is a main means to increase and decrease its workspace to change the length of branch of parallel mechanism; The radius of the movement platform has no effect on the size of workspace, but will change position of workspace.

  16. "Feeling" Series and Parallel Resistances.

    Science.gov (United States)

    Morse, Robert A.

    1993-01-01

    Equipped with drinking straws and stirring straws, a teacher can help students understand how resistances in electric circuits combine in series and in parallel. Follow-up suggestions are provided. (ZWH)

  17. Parallel encoders for pixel detectors

    International Nuclear Information System (INIS)

    Nikityuk, N.M.

    1991-01-01

    A new method of fast encoding and determining the multiplicity and coordinates of fired pixels is described. A specific example construction of parallel encodes and MCC for n=49 and t=2 is given. 16 refs.; 6 figs.; 2 tabs

  18. Event monitoring of parallel computations

    Directory of Open Access Journals (Sweden)

    Gruzlikov Alexander M.

    2015-06-01

    Full Text Available The paper considers the monitoring of parallel computations for detection of abnormal events. It is assumed that computations are organized according to an event model, and monitoring is based on specific test sequences

  19. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.

  20. Programming

    International Nuclear Information System (INIS)

    Jackson, M.A.

    1982-01-01

    The programmer's task is often taken to be the construction of algorithms, expressed in hierarchical structures of procedures: this view underlies the majority of traditional programming languages, such as Fortran. A different view is appropriate to a wide class of problem, perhaps including some problems in High Energy Physics. The programmer's task is regarded as having three main stages: first, an explicit model is constructed of the reality with which the program is concerned; second, this model is elaborated to produce the required program outputs; third, the resulting program is transformed to run efficiently in the execution environment. The first two stages deal in network structures of sequential processes; only the third is concerned with procedure hierarchies. (orig.)

  1. Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

    Science.gov (United States)

    Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

    2011-04-01

    To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single process, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are outputted. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.

  2. Programming

    OpenAIRE

    Jackson, M A

    1982-01-01

    The programmer's task is often taken to be the construction of algorithms, expressed in hierarchical structures of procedures: this view underlies the majority of traditional programming languages, such as Fortran. A different view is appropriate to a wide class of problem, perhaps including some problems in High Energy Physics. The programmer's task is regarded as having three main stages: first, an explicit model is constructed of the reality with which the program is concerned; second, thi...

  3. Lattice gauge theory using parallel processors

    International Nuclear Information System (INIS)

    Lee, T.D.; Chou, K.C.; Zichichi, A.

    1987-01-01

    The book's contents include: Lattice Gauge Theory Lectures: Introduction and Current Fermion Simulations; Monte Carlo Algorithms for Lattice Gauge Theory; Specialized Computers for Lattice Gauge Theory; Lattice Gauge Theory at Finite Temperature: A Monte Carlo Study; Computational Method - An Elementary Introduction to the Langevin Equation, Present Status of Numerical Quantum Chromodynamics; Random Lattice Field Theory; The GF11 Processor and Compiler; and The APE Computer and First Physics Results; Columbia Supercomputer Project: Parallel Supercomputer for Lattice QCD; Statistical and Systematic Errors in Numerical Simulations; Monte Carlo Simulation for LGT and Programming Techniques on the Columbia Supercomputer; Food for Thought: Five Lectures on Lattice Gauge Theory

  4. Parallelized Seeded Region Growing Using CUDA

    Directory of Open Access Journals (Sweden)

    Seongjin Park

    2014-01-01

    Full Text Available This paper presents a novel method for parallelizing the seeded region growing (SRG algorithm using Compute Unified Device Architecture (CUDA technology, with intention to overcome the theoretical weakness of SRG algorithm of its computation time being directly proportional to the size of a segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, advocating that it can substantially assist the segmentation during massive CT screening tests.

  5. Parallel processing of neutron transport in fuel assembly calculation

    International Nuclear Information System (INIS)

    Song, Jae Seung

    1992-02-01

    Group constants, which are used for reactor analyses by nodal method, are generated by fuel assembly calculations based on the neutron transport theory, since one or a quarter of the fuel assembly corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used in a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes from the serial program was enough to parallelize the depletion calculation which has inherent parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not simple and easy because of the many coupling parameters in the calculation and data communications among CPU's. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on Quadputer with 4 CPU's operating in NURAD Lab. at KAIST. Efficiencies of 54.3 % and 78.0 % were obtained in the fuel assembly calculation and depletion calculation, respectively, which lead to the overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on the parallel computer with small size CPU's

  6. Combining Compile-Time and Run-Time Parallelization

    Directory of Open Access Journals (Sweden)

    Sungdo Moon

    1999-01-01

    Full Text Available This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1 they must combine high‐quality compile‐time analysis with low‐cost run‐time testing; and (2 they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler’s automatic parallelization system. We present results of measurements on programs from two benchmark suites – SPECFP95 and NAS sample benchmarks – which identify inherently parallel loops in these programs that are missed by the compiler. We characterize remaining parallelization opportunities, and find that most of the loops require run‐time testing, analysis of control flow, or some combination of the two. We present a new compile‐time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed to not only improve the results of compile‐time parallelization, but also to produce low‐cost, directed run‐time tests that allow the system to defer binding of parallelization until run‐time when safety cannot be proven statically. We call this approach predicated array data‐flow analysis. We augment array data‐flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data‐flow values. Predicated array data‐flow analysis allows the compiler to derive “optimistic” data‐flow values guarded by predicates; these predicates can be used to derive a run‐time test guaranteeing the safety of parallelization.

  7. New Parallel Algorithms for Landscape Evolution Model

    Science.gov (United States)

    Jin, Y.; Zhang, H.; Shi, Y.

    2017-12-01

    Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.

  8. Parallel Robot for Lower Limb Rehabilitation Exercises

    Directory of Open Access Journals (Sweden)

    Alireza Rastegarpanah

    2016-01-01

    Full Text Available The aim of this study is to investigate the capability of a 6-DoF parallel robot to perform various rehabilitation exercises. The foot trajectories of twenty healthy participants have been measured by a Vicon system during the performing of four different exercises. Based on the kinematics and dynamics of a parallel robot, a MATLAB program was developed in order to calculate the length of the actuators, the actuators’ forces, workspace, and singularity locus of the robot during the performing of the exercises. The calculated length of the actuators and the actuators’ forces were used by motion analysis in SolidWorks in order to simulate different foot trajectories by the CAD model of the robot. A physical parallel robot prototype was built in order to simulate and execute the foot trajectories of the participants. Kinect camera was used to track the motion of the leg’s model placed on the robot. The results demonstrate the robot’s capability to perform a full range of various rehabilitation exercises.

  9. A multitransputer parallel processing system (MTPPS)

    International Nuclear Information System (INIS)

    Jethra, A.K.; Pande, S.S.; Borkar, S.P.; Khare, A.N.; Ghodgaonkar, M.D.; Bairi, B.R.

    1993-01-01

    This report describes the design and implementation of a 16 node Multi Transputer Parallel Processing System(MTPPS) which is a platform for parallel program development. It is a MIMD machine based on message passing paradigm. The basic compute engine is an Inmos Transputer Ims T800-20. Transputer with local memory constitutes the processing element (NODE) of this MIMD architecture. Multiple NODES can be connected to each other in an identifiable network topology through the high speed serial links of the transputer. A Network Configuration Unit (NCU) incorporates the necessary hardware to provide software controlled network configuration. System is modularly expandable and more NODES can be added to the system to achieve the required processing power. The system is backend to the IBM-PC which has been integrated into the system to provide user I/O interface. PC resources are available to the programmer. Interface hardware between the PC and the network of transputers is INMOS compatible. Therefore, all the commercially available development software compatible to INMOS products can run on this system. While giving the details of design and implementation, this report briefly summarises MIMD Architectures, Transputer Architecture and Parallel Processing Software Development issues. LINPACK performance evaluation of the system and solutions of neutron physics and plasma physics problem have been discussed along with results. (author). 12 refs., 22 figs., 3 tabs., 3 appendixes

  10. Endpoint-based parallel data processing in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

    2014-08-12

    Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.

  11. Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

    Directory of Open Access Journals (Sweden)

    Alexander B. Bakulev

    2012-11-01

    Full Text Available This article deals with mathematical models and algorithms, providing mobility of sequential programs parallel representation on the high-level language, presents formal model of operation environment processes management, based on the proposed model of programs parallel representation, presenting computation process on the base of multi-core processors.

  12. Parallel Implicit Algorithms for CFD

    Science.gov (United States)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD.) "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in Massively Parallel Integration (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSC library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSC during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSC framework.

  13. Second derivative parallel block backward differentiation type ...

    African Journals Online (AJOL)

    Second derivative parallel block backward differentiation type formulas for Stiff ODEs. ... Log in or Register to get access to full text downloads. ... and the methods are inherently parallel and can be distributed over parallel processors. They are ...

  14. A Parallel Approach to Fractal Image Compression

    OpenAIRE

    Lubomir Dedera

    2004-01-01

    The paper deals with a parallel approach to coding and decoding algorithms in fractal image compressionand presents experimental results comparing sequential and parallel algorithms from the point of view of achieved bothcoding and decoding time and effectiveness of parallelization.

  15. Automatic parallelization of while-Loops using speculative execution

    International Nuclear Information System (INIS)

    Collard, J.F.

    1995-01-01

    Automatic parallelization of imperative sequential programs has focused on nests of for-loops. The most recent of them consist in finding an affine mapping with respect to the loop indices to simultaneously capture the temporal and spatial properties of the parallelized program. Such a mapping is usually called a open-quotes space-time transformation.close quotes This work describes an extension of these techniques to while-loops using speculative execution. We show that space-time transformations are a good framework for summing up previous restructuration techniques of while-loop, such as pipelining. Moreover, we show that these transformations can be derived and applied automatically

  16. Rubus: A compiler for seamless and extensible parallelism

    Science.gov (United States)

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages which were designed to work with machines having single core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, to parallelize legacy code can require rewriting a significant portion of code in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, on average a speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores. Whereas, for a matrix multiplication benchmark the average execution speedup of 84 times has been

  17. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelized using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with the OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. (authors)

  18. Algorithms for computational fluid dynamics n parallel processors

    International Nuclear Information System (INIS)

    Van de Velde, E.F.

    1986-01-01

    A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries

  19. Parallel fabrication of macroporous scaffolds.

    Science.gov (United States)

    Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal

    2018-07-01

    Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel, in higher throughput and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.

  20. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated

  1. Evaluating parallel optimization on transputers

    Directory of Open Access Journals (Sweden)

    A.G. Chalmers

    2003-12-01

    Full Text Available The faster processing power of modern computers and the development of efficient algorithms have made it possible for operations researchers to tackle a much wider range of problems than ever before. Further improvements in processing speed can be achieved utilising relatively inexpensive transputers to process components of an algorithm in parallel. The Davidon-Fletcher-Powell method is one of the most successful and widely used optimisation algorithms for unconstrained problems. This paper examines the algorithm and identifies the components that can be processed in parallel. The results of some experiments with these components are presented which indicates under what conditions parallel processing with an inexpensive configuration is likely to be faster than the traditional sequential implementations. The performance of the whole algorithm with its parallel components is then compared with the original sequential algorithm. The implementation serves to illustrate the practicalities of speeding up typical OR algorithms in terms of difficulty, effort and cost. The results give an indication of the savings in time a given parallel implementation can be expected to yield.

  2. An Automatic Instruction-Level Parallelization of Machine Code

    Directory of Open Access Journals (Sweden)

    MARINKOVIC, V.

    2018-02-01

    Full Text Available Prevailing multicores and novel manycores have made a great challenge of modern day - parallelization of embedded software that is still written as sequential. In this paper, automatic code parallelization is considered, focusing on developing a parallelization tool at the binary level as well as on the validation of this approach. The novel instruction-level parallelization algorithm for assembly code which uses the register names after SSA to find independent blocks of code and then to schedule independent blocks using METIS to achieve good load balance is developed. The sequential consistency is verified and the validation is done by measuring the program execution time on the target architecture. Great speedup, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g. MIPS, MicroBlaze, etc.. In particular, for 16 cores, the average speedup is 7.92x, while in some cases it reaches 14x. An approach to automatic parallelization provided by this paper is useful to researchers and developers in the area of parallelization as the basis for further optimizations, as the back-end of a compiler, or as the code parallelization tool for an embedded system.

  3. Parallelization of a numerical simulation code for isotropic turbulence

    International Nuclear Information System (INIS)

    Sato, Shigeru; Yokokawa, Mitsuo; Watanabe, Tadashi; Kaburaki, Hideo.

    1996-03-01

    A parallel pseudospectral code which solves the three-dimensional Navier-Stokes equation by direct numerical simulation is developed and execution time, parallelization efficiency, load balance and scalability are evaluated. A vector parallel supercomputer, Fujitsu VPP500 with up to 16 processors is used for this calculation for Fourier modes up to 256x256x256 using 16 processors. Good scalability for number of processors is achieved when number of Fourier mode is fixed. For small Fourier modes, calculation time of the program is proportional to NlogN which is ideal complexity of calculation for 3D-FFT on vector parallel processors. It is found that the calculation performance decreases as the increase of the Fourier modes. (author)

  4. Test generation for digital circuits using parallel processing

    Science.gov (United States)

    Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.

    1990-12-01

    The problem of test generation for digital logic circuits is an NP-Hard problem. Recently, the availability of low cost, high performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.

  5. Parallel artificial liquid membrane extraction

    DEFF Research Database (Denmark)

    Gjelstad, Astrid; Rasmussen, Knut Einar; Parmer, Marthe Petrine

    2013-01-01

    This paper reports development of a new approach towards analytical liquid-liquid-liquid membrane extraction termed parallel artificial liquid membrane extraction. A donor plate and acceptor plate create a sandwich, in which each sample (human plasma) and acceptor solution is separated by an arti......This paper reports development of a new approach towards analytical liquid-liquid-liquid membrane extraction termed parallel artificial liquid membrane extraction. A donor plate and acceptor plate create a sandwich, in which each sample (human plasma) and acceptor solution is separated...... by an artificial liquid membrane. Parallel artificial liquid membrane extraction is a modification of hollow-fiber liquid-phase microextraction, where the hollow fibers are replaced by flat membranes in a 96-well plate format....

  6. An intelligent allocation algorithm for parallel processing

    Science.gov (United States)

    Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.

    1988-01-01

    The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of interprocessor communication network, influence the allocation. To achieve realistic estimations of the executive durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication duration, varying from zero up to values comparable to the execution durations of individual nodes. The effect on allocation of communication network structures is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and characteristics of the interprocessor communication network.

  7. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.

  8. The parallel processing of EGS4 code on distributed memory scalar parallel computer:Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of Electro-Magnetic Cascade Monte Carlo Simulation Code, EGS4 on distributed memory scalar parallel computer: Intel Paragon XP/S15-256 is described. EGS4 has the feature that calculation time for one incident particle is quite different from each other because of the dynamic generation of secondary particles and different behavior of each particle. Granularity for parallel processing, parallel programming model and the algorithm of parallel random number generation are discussed and two kinds of method, each of which allocates particles dynamically or statically, are used for the purpose of realizing high speed parallel processing of this code. Among four problems chosen for performance evaluation, the speedup factors for three problems have been attained to nearly 100 times with 128 processor. It has been found that when both the calculation time for each incident particles and its dispersion are large, it is preferable to use dynamic particle allocation method which can average the load for each processor. And it has also been found that when they are small, it is preferable to use static particle allocation method which reduces the communication overhead. Moreover, it is pointed out that to get the result accurately, it is necessary to use double precision variables in EGS4 code. Finally, the workflow of program parallelization is analyzed and tools for program parallelization through the experience of the EGS4 parallelization are discussed. (author).

  9. CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

    International Nuclear Information System (INIS)

    Dunigan, T.H.

    1988-01-01

    1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated

  10. Parallelism in computations in quantum and statistical mechanics

    International Nuclear Information System (INIS)

    Clementi, E.; Corongiu, G.; Detrich, J.H.

    1985-01-01

    Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341 s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system - called LCAP - has presently a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system: description of the modifications is given. Three applications programs have been migrated from sequential to parallel: a molecular quantum mechanical, a Metropolis-Monte Carlo and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems. (orig.)

  11. Parallelization of elliptic solver for solving 1D Boussinesq model

    Science.gov (United States)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver in solving 1D Boussinesq model is presented. Numerical solution of Boussinesq model is obtained by implementing a staggered grid scheme to continuity, momentum, and elliptic equation of Boussinesq model. Tridiagonal system emerging from numerical scheme of elliptic equation is solved by cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of parallel program, large number of grids is varied from 28 to 214. Two test cases of numerical experiment, i.e. propagation of solitary and standing wave, are proposed to evaluate the parallel program. The numerical results are verified with analytical solution of solitary and standing wave. The best speedup of solitary and standing wave test cases is about 2.07 with 214 of grids and 1.86 with 213 of grids, respectively, which are executed by using 8 threads. Moreover, the best efficiency of parallel program is 76.2% and 73.5% for solitary and standing wave test cases, respectively.

  12. Cellular automata a parallel model

    CERN Document Server

    Mazoyer, J

    1999-01-01

    Cellular automata can be viewed both as computational models and modelling systems of real processes. This volume emphasises the first aspect. In articles written by leading researchers, sophisticated massive parallel algorithms (firing squad, life, Fischer's primes recognition) are treated. Their computational power and the specific complexity classes they determine are surveyed, while some recent results in relation to chaos from a new dynamic systems point of view are also presented. Audience: This book will be of interest to specialists of theoretical computer science and the parallelism challenge.

  13. Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

    Science.gov (United States)

    Sun, Xian-He

    1997-01-01

    Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm

  14. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    Science.gov (United States)

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  15. ICASE Computer Science Program

    Science.gov (United States)

    1985-01-01

    The Institute for Computer Applications in Science and Engineering computer science program is discussed in outline form. Information is given on such topics as problem decomposition, algorithm development, programming languages, and parallel architectures.

  16. Leveraging Parallel Data Processing Frameworks with Verified Lifting

    Directory of Open Access Journals (Sweden)

    Maaz Bin Safeer Ahmad

    2016-11-01

    Full Text Available Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting–tedious and error-prone–also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and scale better, as well, to larger datasets.

  17. An educational tool for interactive parallel and distributed processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2012-01-01

    In this article we try to describe how the modular interactive tiles system (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing a handson educational tool that allows a change in the representation...... of abstract problems related to designing interactive parallel and distributed systems. Indeed, the MITS seems to bring a series of goals into education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback......, connectivity, topology, island modeling, and user and multi-user interaction which can rarely be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples we show how to implement...

  18. Parallel Sparse Matrix - Vector Product

    DEFF Research Database (Denmark)

    Alexandersen, Joe; Lazarov, Boyan Stefanov; Dammann, Bernd

    This technical report contains a case study of a sparse matrix-vector product routine, implemented for parallel execution on a compute cluster with both pure MPI and hybrid MPI-OpenMP solutions. C++ classes for sparse data types were developed and the report shows how these class can be used...

  19. [Falsified medicines in parallel trade].

    Science.gov (United States)

    Muckenfuß, Heide

    2017-11-01

    The number of falsified medicines on the German market has distinctly increased over the past few years. In particular, stolen pharmaceutical products, a form of falsified medicines, have increasingly been introduced into the legal supply chain via parallel trading. The reasons why parallel trading serves as a gateway for falsified medicines are most likely the complex supply chains and routes of transport. It is hardly possible for national authorities to trace the history of a medicinal product that was bought and sold by several intermediaries in different EU member states. In addition, the heterogeneous outward appearance of imported and relabelled pharmaceutical products facilitates the introduction of illegal products onto the market. Official batch release at the Paul-Ehrlich-Institut offers the possibility of checking some aspects that might provide an indication of a falsified medicine. In some circumstances, this may allow the identification of falsified medicines before they come onto the German market. However, this control is only possible for biomedicinal products that have not received a waiver regarding official batch release. For improved control of parallel trade, better networking among the EU member states would be beneficial. European-wide regulations, e. g., for disclosure of the complete supply chain, would help to minimise the risks of parallel trading and hinder the marketing of falsified medicines.

  20. Where are the parallel algorithms?

    Science.gov (United States)

    Voigt, R. G.

    1985-01-01

    Four paradigms that can be useful in developing parallel algorithms are discussed. These include computational complexity analysis, changing the order of computation, asynchronous computation, and divide and conquer. Each is illustrated with an example from scientific computation, and it is shown that computational complexity must be used with great care or an inefficient algorithm may be selected.

  1. Parallel imaging with phase scrambling.

    Science.gov (United States)

    Zaitsev, Maxim; Schultz, Gerrit; Hennig, Juergen; Gruetter, Rolf; Gallichan, Daniel

    2015-04-01

    Most existing methods for accelerated parallel imaging in MRI require additional data, which are used to derive information about the sensitivity profile of each radiofrequency (RF) channel. In this work, a method is presented to avoid the acquisition of separate coil calibration data for accelerated Cartesian trajectories. Quadratic phase is imparted to the image to spread the signals in k-space (aka phase scrambling). By rewriting the Fourier transform as a convolution operation, a window can be introduced to the convolved chirp function, allowing a low-resolution image to be reconstructed from phase-scrambled data without prominent aliasing. This image (for each RF channel) can be used to derive coil sensitivities to drive existing parallel imaging techniques. As a proof of concept, the quadratic phase was applied by introducing an offset to the x(2) - y(2) shim and the data were reconstructed using adapted versions of the image space-based sensitivity encoding and GeneRalized Autocalibrating Partially Parallel Acquisitions algorithms. The method is demonstrated in a phantom (1 × 2, 1 × 3, and 2 × 2 acceleration) and in vivo (2 × 2 acceleration) using a 3D gradient echo acquisition. Phase scrambling can be used to perform parallel imaging acceleration without acquisition of separate coil calibration data, demonstrated here for a 3D-Cartesian trajectory. Further research is required to prove the applicability to other 2D and 3D sampling schemes. © 2014 Wiley Periodicals, Inc.

  2. Default Parallels Plesk Panel Page

    Science.gov (United States)

    services that small businesses want and need. Our software includes key building blocks of cloud service virtualized servers Service Provider Products Parallels® Automation Hosting, SaaS, and cloud computing , the leading hosting automation software. You see this page because there is no Web site at this

  3. Parallel plate transmission line transformer

    NARCIS (Netherlands)

    Voeten, S.J.; Brussaard, G.J.H.; Pemen, A.J.M.

    2011-01-01

    A Transmission Line Transformer (TLT) can be used to transform high-voltage nanosecond pulses. These transformers rely on the fact that the length of the pulse is shorter than the transmission lines used. This allows connecting the transmission lines in parallel at the input and in series at the

  4. Matpar: Parallel Extensions for MATLAB

    Science.gov (United States)

    Springer, P. L.

    1998-01-01

    Matpar is a set of client/server software that allows a MATLAB user to take advantage of a parallel computer for very large problems. The user can replace calls to certain built-in MATLAB functions with calls to Matpar functions.

  5. Massively parallel quantum computer simulator

    NARCIS (Netherlands)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massive parallel Computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray

  6. Experiments with parallel algorithms for combinatorial problems

    NARCIS (Netherlands)

    G.A.P. Kindervater (Gerard); H.W.J.M. Trienekens

    1985-01-01

    textabstractIn the last decade many models for parallel computation have been proposed and many parallel algorithms have been developed. However, few of these models have been realized and most of these algorithms are supposed to run on idealized, unrealistic parallel machines. The parallel machines

  7. Methods and models for the construction of weakly parallel tests

    NARCIS (Netherlands)

    Adema, J.J.; Adema, Jos J.

    1992-01-01

    Several methods are proposed for the construction of weakly parallel tests [i.e., tests with the same test information function (TIF)]. A mathematical programming model that constructs tests containing a prespecified TIF and a heuristic that assigns items to tests with information functions that are

  8. A Parallel Genetic Algorithm for Automated Electronic Circuit Design

    Science.gov (United States)

    Lohn, Jason D.; Colombano, Silvano P.; Haith, Gary L.; Stassinopoulos, Dimitris; Norvig, Peter (Technical Monitor)

    2000-01-01

    We describe a parallel genetic algorithm (GA) that automatically generates circuit designs using evolutionary search. A circuit-construction programming language is introduced and we show how evolution can generate practical analog circuit designs. Our system allows circuit size (number of devices), circuit topology, and device values to be evolved. We present experimental results as applied to analog filter and amplifier design tasks.

  9. digital control of external devices through the parallel port

    African Journals Online (AJOL)

    2012-11-03

    Nov 3, 2012 ... PARALLEL PORT OF A COMPUTER USING VISUAL BASIC. P.E. Orukpea, A. Adesemowob a. Department of Electrical/Electronic Engineering, University of Benin, Nigeria. .... These are software (written program) used for con- trolling the ..... Computer Aided Design and Manufacturing. Prentice. Hall, First ...

  10. Comparative efficiencies of three parallel algorithms for nonlinear ...

    Indian Academy of Sciences (India)

    R. Narasimhan (Krishtel eMaging) 1461 1996 Oct 15 13:05:22

    This algorithm is better suited for large size problems on coarse ... and reliable time integration algorithms for solving the second-order dynamic equilibrium equations that arise due ... Programming models required to take advantage of the parallel and distributed ..... In addition, MPI added the concept of a 'virtual topology'.

  11. Methods and models for the construction of weakly parallel tests

    NARCIS (Netherlands)

    Adema, J.J.; Adema, Jos J.

    1990-01-01

    Methods are proposed for the construction of weakly parallel tests, that is, tests with the same test information function. A mathematical programing model for constructing tests with a prespecified test information function and a heuristic for assigning items to tests such that their information

  12. Scaling up machine learning: parallel and distributed approaches

    National Research Council Canada - National Science Library

    Bekkerman, Ron; Bilenko, Mikhail; Langford, John

    2012-01-01

    ... presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters; concurrent programming frameworks that include CUDA, MPI, MapReduce, and DryadLINQ; and various learning settings: supervised, unsupervised, semi-supervised, and online learning. Extensive coverage of parallelizat...

  13. Collectively loading programs in a multiple program multiple data environment

    Science.gov (United States)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Gooding, Thomas M.; Miller, Samuel J.

    2016-11-08

    Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.

  14. Connectionist Models and Parallelism in High Level Vision.

    Science.gov (United States)

    1985-01-01

    GRANT NUMBER(s) Jerome A. Feldman N00014-82-K-0193 9. PERFORMING ORGANIZATION NAME AND ADDRESS 10. PROGRAM ELEMENt. PROJECT, TASK Computer Science...Connectionist Models 2.1 Background and Overviev % Computer science is just beginning to look seriously at parallel computation : it may turn out that...the chair. The program includes intermediate level networks that compute more complex joints and ones that compute parallelograms in the image. These

  15. SPRINT: A new parallel framework for R

    Directory of Open Access Journals (Sweden)

    Scharinger Florian

    2008-12-01

    Full Text Available Abstract Background Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest. Results We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors this computation reduces by more than three times the time R takes to complete it on one processor. Conclusion SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use and with further development will become more useful as more

  16. Parallelizing AT with MatlabMPI

    International Nuclear Information System (INIS)

    2011-01-01

    The Accelerator Toolbox (AT) is a high-level collection of tools and scripts specifically oriented toward solving problems dealing with computational accelerator physics. It is integrated into the MATLAB environment, which provides an accessible, intuitive interface for accelerator physicists, allowing researchers to focus the majority of their efforts on simulations and calculations, rather than programming and debugging difficulties. Efforts toward parallelization of AT have been put in place to upgrade its performance to modern standards of computing. We utilized the packages MatlabMPI and pMatlab, which were developed by MIT Lincoln Laboratory, to set up a message-passing environment that could be called within MATLAB, which set up the necessary pre-requisites for multithread processing capabilities. On local quad-core CPUs, we were able to demonstrate processor efficiencies of roughly 95% and speed increases of nearly 380%. By exploiting the efficacy of modern-day parallel computing, we were able to demonstrate incredibly efficient speed increments per processor in AT's beam-tracking functions. Extrapolating from prediction, we can expect to reduce week-long computation runtimes to less than 15 minutes. This is a huge performance improvement and has enormous implications for the future computing power of the accelerator physics group at SSRL. However, one of the downfalls of parringpass is its current lack of transparency; the pMatlab and MatlabMPI packages must first be well-understood by the user before the system can be configured to run the scripts. In addition, the instantiation of argument parameters requires internal modification of the source code. Thus, parringpass, cannot be directly run from the MATLAB command line, which detracts from its flexibility and user-friendliness. Future work in AT's parallelization will focus on development of external functions and scripts that can be called from within MATLAB and configured on multiple nodes, while

  17. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    Science.gov (United States)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summaryProgram title: SWsolver Catalogue identifier: AEGY_v1_0 Program summary URL

  18. A CS1 pedagogical approach to parallel thinking

    Science.gov (United States)

    Rague, Brian William

    Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of

  19. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.

  20. A Novel Parallel Algorithm for Edit Distance Computation

    Directory of Open Access Journals (Sweden)

    Muhammad Murtaza Yousaf

    2018-01-01

    Full Text Available The edit distance between two sequences is the minimum number of weighted transformation-operations that are required to transform one string into the other. The weighted transformation-operations are insert, remove, and substitute. Dynamic programming solution to find edit distance exists but it becomes computationally intensive when the lengths of strings become very large. This work presents a novel parallel algorithm to solve edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem and it is able to compute each row of edit distance table in parallel. In this way, it becomes possible to compute the complete table in min(m,n iterations for strings of size m and n whereas state-of-the-art parallel algorithm solves the problem in max(m,n iterations. The proposed algorithm also increases the amount of parallelism in each of its iteration. The algorithm is also capable of exploiting spatial locality while its implementation. Additionally, the algorithm works in a load balanced way that further improves its performance. The algorithm is implemented for multicore systems having shared memory. Implementation of the algorithm in OpenMP shows linear speedup and better execution time as compared to state-of-the-art parallel approach. Efficiency of the algorithm is also proven better in comparison to its competitor.

  1. Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers

    Directory of Open Access Journals (Sweden)

    David W. Washington

    2004-06-01

    Full Text Available In this research a computer program, Trubal version 1.51, based on the Discrete Element Method was converted to run on a Connection Machine (CM-5,a massively parallel supercomputer with 512 nodes, to expedite the computational times of simulating Geotechnical boundary value problems. The dynamic memory algorithm in Trubal program did not perform efficiently in CM-2 machine with the Single Instruction Multiple Data (SIMD architecture. This was due to the communication overhead involving global array reductions, global array broadcast and random data movement. Therefore, a dynamic memory algorithm in Trubal program was converted to a static memory arrangement and Trubal program was successfully converted to run on CM-5 machines. The converted program was called "TRUBAL for Parallel Machines (TPM." Simulating two physical triaxial experiments and comparing simulation results with Trubal simulations validated the TPM program. With a 512 nodes CM-5 machine TPM produced a nine-fold speedup demonstrating the inherent parallelism within algorithms based on the Discrete Element Method.

  2. QDP++: Data Parallel Interface for QCD

    Energy Technology Data Exchange (ETDEWEB)

    Robert Edwards

    2003-03-01

    This is a user's guide for the C++ binding for the QDP Data Parallel Applications Programmer Interface developed under the auspices of the US Department of Energy Scientific Discovery through Advanced Computing (SciDAC) program. The QDP Level 2 API has the following features: (1) Provides data parallel operations (logically SIMD) on all sites across the lattice or subsets of these sites. (2) Operates on lattice objects, which have an implementation-dependent data layout that is not visible above this API. (3) Hides details of how the implementation maps onto a given architecture, namely how the logical problem grid (i.el lattice) is mapped onto the machine architecture. (4) Allows asynchronous (non-blocking) shifts of lattice level objects over any permutation map of site sonto sites. However, from the user's view these instructions appear blocking and in fact may be so in some implementation. (5) Provides broadcast operations (filling a lattice quantity from a scalar value(s)), global reduction operations, and lattice-wide operations on various data-type primitives, such as matrices, vectors, and tensor products of matrices (propagators). (6) Operator syntax that support complex expression constructions.

  3. Empirical study of parallel LRU simulation algorithms

    Science.gov (United States)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.

  4. Optical flow optimization using parallel genetic algorithm

    Science.gov (United States)

    Zavala-Romero, Olmo; Botella, Guillermo; Meyer-Bäse, Anke; Meyer Base, Uwe

    2011-06-01

    A new approach to optimize the parameters of a gradient-based optical flow model using a parallel genetic algorithm (GA) is proposed. The main characteristics of the optical flow algorithm are its bio-inspiration and robustness against contrast, static patterns and noise, besides working consistently with several optical illusions where other algorithms fail. This model depends on many parameters which conform the number of channels, the orientations required, the length and shape of the kernel functions used in the convolution stage, among many more. The GA is used to find a set of parameters which improve the accuracy of the optical flow on inputs where the ground-truth data is available. This set of parameters helps to understand which of them are better suited for each type of inputs and can be used to estimate the parameters of the optical flow algorithm when used with videos that share similar characteristics. The proposed implementation takes into account the embarrassingly parallel nature of the GA and uses the OpenMP Application Programming Interface (API) to speedup the process of estimating an optimal set of parameters. The information obtained in this work can be used to dynamically reconfigure systems, with potential applications in robotics, medical imaging and tracking.

  5. Parallel Numerical Simulations of Water Reservoirs

    Science.gov (United States)

    Torres, Pedro; Mangiavacchi, Norberto

    2010-11-01

    The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this scope, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hidropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.

  6. Structural synthesis of parallel robots

    CERN Document Server

    Gogu, Grigore

    This book represents the fifth part of a larger work dedicated to the structural synthesis of parallel robots. The originality of this work resides in the fact that it combines new formulae for mobility, connectivity, redundancy and overconstraints with evolutionary morphology in a unified structural synthesis approach that yields interesting and innovative solutions for parallel robotic manipulators.  This is the first book on robotics that presents solutions for coupled, decoupled, uncoupled, fully-isotropic and maximally regular robotic manipulators with Schönflies motions systematically generated by using the structural synthesis approach proposed in Part 1.  Overconstrained non-redundant/overactuated/redundantly actuated solutions with simple/complex limbs are proposed. Many solutions are presented here for the first time in the literature. The author had to make a difficult and challenging choice between protecting these solutions through patents and releasing them directly into the public domain. T...

  7. GPU Parallel Bundle Block Adjustment

    Directory of Open Access Journals (Sweden)

    ZHENG Maoteng

    2017-09-01

    Full Text Available To deal with massive data in photogrammetry, we introduce the GPU parallel computing technology. The preconditioned conjugate gradient and inexact Newton method are also applied to decrease the iteration times while solving the normal equation. A brand new workflow of bundle adjustment is developed to utilize GPU parallel computing technology. Our method can avoid the storage and inversion of the big normal matrix, and compute the normal matrix in real time. The proposed method can not only largely decrease the memory requirement of normal matrix, but also largely improve the efficiency of bundle adjustment. It also achieves the same accuracy as the conventional method. Preliminary experiment results show that the bundle adjustment of a dataset with about 4500 images and 9 million image points can be done in only 1.5 minutes while achieving sub-pixel accuracy.

  8. A tandem parallel plate analyzer

    International Nuclear Information System (INIS)

    Hamada, Y.; Fujisawa, A.; Iguchi, H.; Nishizawa, A.; Kawasumi, Y.

    1996-11-01

    By a new modification of a parallel plate analyzer the second-order focus is obtained in an arbitrary injection angle. This kind of an analyzer with a small injection angle will have an advantage of small operational voltage, compared to the Proca and Green analyzer where the injection angle is 30 degrees. Thus, the newly proposed analyzer will be very useful for the precise energy measurement of high energy particles in MeV range. (author)

  9. High-speed parallel counter

    International Nuclear Information System (INIS)

    Gus'kov, B.N.; Kalinnikov, V.A.; Krastev, V.R.; Maksimov, A.N.; Nikityuk, N.M.

    1985-01-01

    This paper describes a high-speed parallel counter that contains 31 inputs and 15 outputs and is implemented by integrated circuits of series 500. The counter is designed for fast sampling of events according to the number of particles that pass simultaneously through the hodoscopic plane of the detector. The minimum delay of the output signals relative to the input is 43 nsec. The duration of the output signals can be varied from 75 to 120 nsec

  10. An anthropologist in parallel structure

    Directory of Open Access Journals (Sweden)

    Noelle Molé Liston

    2016-08-01

    Full Text Available The essay examines the parallels between Molé Liston’s studies on labor and precarity in Italy and the United States’ anthropology job market. Probing the way economic shift reshaped the field of anthropology of Europe in the late 2000s, the piece explores how the neoliberalization of the American academy increased the value in studying the hardships and daily lives of non-western populations in Europe.

  11. Combinatorics of spreads and parallelisms

    CERN Document Server

    Johnson, Norman

    2010-01-01

    Partitions of Vector Spaces Quasi-Subgeometry Partitions Finite Focal-SpreadsGeneralizing André SpreadsThe Going Up Construction for Focal-SpreadsSubgeometry Partitions Subgeometry and Quasi-Subgeometry Partitions Subgeometries from Focal-SpreadsExtended André SubgeometriesKantor's Flag-Transitive DesignsMaximal Additive Partial SpreadsSubplane Covered Nets and Baer Groups Partial Desarguesian t-Parallelisms Direct Products of Affine PlanesJha-Johnson SL(2,

  12. New algorithms for parallel MRI

    International Nuclear Information System (INIS)

    Anzengruber, S; Ramlau, R; Bauer, F; Leitao, A

    2008-01-01

    Magnetic Resonance Imaging with parallel data acquisition requires algorithms for reconstructing the patient's image from a small number of measured lines of the Fourier domain (k-space). In contrast to well-known algorithms like SENSE and GRAPPA and its flavors we consider the problem as a non-linear inverse problem. However, in order to avoid cost intensive derivatives we will use Landweber-Kaczmarz iteration and in order to improve the overall results some additional sparsity constraints.

  13. Wakefield calculations on parallel computers

    International Nuclear Information System (INIS)

    Schoessow, P.

    1990-01-01

    The use of parallelism in the solution of wakefield problems is illustrated for two different computer architectures (SIMD and MIMD). Results are given for finite difference codes which have been implemented on a Connection Machine and an Alliant FX/8 and which are used to compute wakefields in dielectric loaded structures. Benchmarks on code performance are presented for both cases. 4 refs., 3 figs., 2 tabs

  14. A parallel solver for huge dense linear systems

    Science.gov (United States)

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

    HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100.000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending of the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summaryProgram title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system

  15. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues

  16. Parallel processing of genomics data

    Science.gov (United States)

    Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario

    2016-10-01

    The availability of high-throughput experimental platforms for the analysis of biological samples, such as mass spectrometry, microarrays and Next Generation Sequencing, have made possible to analyze a whole genome in a single experiment. Such platforms produce an enormous volume of data per single experiment, thus the analysis of this enormous flow of data poses several challenges in term of data storage, preprocessing, and analysis. To face those issues, efficient, possibly parallel, bioinformatics software needs to be used to preprocess and analyze data, for instance to highlight genetic variation associated with complex diseases. In this paper we present a parallel algorithm for the parallel preprocessing and statistical analysis of genomics data, able to face high dimension of data and resulting in good response time. The proposed system is able to find statistically significant biological markers able to discriminate classes of patients that respond to drugs in different ways. Experiments performed on real and synthetic genomic datasets show good speed-up and scalability.

  17. Fiscal 2000 report on advanced parallelized compiler technology. Outlines; 2000 nendo advanced heiretsuka compiler gijutsu hokokusho (Gaiyo hen)

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-03-01

    Research and development was carried out concerning the automatic parallelized compiler technology which improves on the practical performance, cost/performance ratio, and ease of operation of the multiprocessor system now used for constructing supercomputers and expected to provide a fundamental architecture for microprocessors for the 21st century. Efforts were made to develop an automatic multigrain parallelization technology for extracting multigrain as parallelized from a program and for making full use of the same and a parallelizing tuning technology for accelerating parallelization by feeding back to the compiler the dynamic information and user knowledge to be acquired during execution. Moreover, a benchmark program was selected and studies were made to set execution rules and evaluation indexes for the establishment of technologies for subjectively evaluating the performance of parallelizing compilers for the existing commercial parallel processing computers, which was achieved through the implementation and evaluation of the 'Advanced parallelizing compiler technology research and development project.' (NEDO)

  18. Parallel computer calculation of quantum spin lattices

    International Nuclear Information System (INIS)

    Lamarcq, J.

    1998-01-01

    Numerical simulation allows the theorists to convince themselves about the validity of the models they use. Particularly by simulating the spin lattices one can judge about the validity of a conjecture. Simulating a system defined by a large number of degrees of freedom requires highly sophisticated machines. This study deals with modelling the magnetic interactions between the ions of a crystal. Many exact results have been found for spin 1/2 systems but not for systems of other spins for which many simulation have been carried out. The interest for simulations has been renewed by the Haldane's conjecture stipulating the existence of a energy gap between the ground state and the first excited states of a spin 1 lattice. The existence of this gap has been experimentally demonstrated. This report contains the following four chapters: 1. Spin systems; 2. Calculation of eigenvalues; 3. Programming; 4. Parallel calculation

  19. Parallel treatment of simulation particles in particle-in-cell codes on SUPRENUM

    International Nuclear Information System (INIS)

    Seldner, D.

    1990-02-01

    This report contains the program documentation and description of the program package 2D-PLAS, which has been developed at the Nuclear Research Center Karlsruhe in the Institute for Data Processing in Technology (IDT) under the auspices of the BMFT. 2D-PLAS is a parallel program version of the treatment of the simulation particles of the two-dimensional stationary particle-in-cell code BFCPIC which has been developed at the Nuclear Research Center Karlsruhe. This parallel version has been designed for the parallel computer SUPRENUM. (orig.) [de

  20. Parallel algorithms for network routing problems and recurrences

    International Nuclear Information System (INIS)

    Wisniewski, J.A.; Sameh, A.H.

    1982-01-01

    In this paper, we consider the parallel solution of recurrences, and linear systems in the regular algebra of Carre. These problems are equivalent to solving the shortest path problem in graph theory, and they also arise in the analysis of Fortran programs. Our methods for solving linear systems in the regular algebra are analogues of well-known methods for solving systems of linear algebraic equations. A parallel version of Dijkstra's method, which has no linear algebraic analogue, is presented. Considerations for choosing an algorithm when the problem is large and sparse are also discussed

  1. Badlands: A parallel basin and landscape dynamics model

    Directory of Open Access Journals (Sweden)

    T. Salles

    2016-01-01

    Full Text Available Over more than three decades, a number of numerical landscape evolution models (LEMs have been developed to study the combined effects of climate, sea-level, tectonics and sediments on Earth surface dynamics. Most of them are written in efficient programming languages, but often cannot be used on parallel architectures. Here, I present a LEM which ports a common core of accepted physical principles governing landscape evolution into a distributed memory parallel environment. Badlands (acronym for BAsin anD LANdscape DynamicS is an open-source, flexible, TIN-based landscape evolution model, built to simulate topography development at various space and time scales.

  2. An Educational Tool for Interactive Parallel and Distributed Processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2011-01-01

    In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation of the abs......In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation...... of the abstract problems related to designing interactive parallel and distributed systems. Indeed, MITS seems to bring a series of goals into the education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback......, connectivity, topology, island modeling, user and multiuser interaction, which can hardly be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples show how to implement interactive...

  3. 'Poker' association of weekly alternating 5-fluorouracil, irinotecan, bevacizumab and oxaliplatin (FIr-B/FOx) in first line treatment of metastatic colorectal cancer: a phase II study

    International Nuclear Information System (INIS)

    Bruera, Gemma; Ficorella, Corrado; Ricevuto, Enrico; Santomaggio, Alessandra; Cannita, Katia; Baldi, Paola Lanfiuti; Tudini, Marianna; De Galitiis, Federica; Mancini, Maria; Marchetti, Paolo; Antonucci, Adelmo

    2010-01-01

    This phase II study investigated efficacy and safety of weekly alternating Bevacizumab (BEV)/Irinotecan (CPT-11) or Oxaliplatin (OHP) associated to weekly 5-Fluorouracil (5-FU) in first line treatment of metastatic colorectal carcinoma (MCRC). Simon two-step design: delta 20% (p 0 50%, p 1 70%), power 80%, α 5%, β 20%. Projected objective responses (ORR): I step, 8/15 patients (pts); II step 26/43 pts. Schedule: weekly 12 h-timed-flat-infusion/5-FU 900 mg/m 2 , days 1-2, 8-9, 15-16, 22-23; CPT-11 160 mg/m 2 plus BEV 5 mg/kg, days 1,15; OHP at three dose-levels, 60-70-80 mg/m 2 , days 8, 22; every 4 weeks. Fifty consecutive, unselected pts < 75 years were enrolled: median age 63; young-elderly (yE) 24 (48%); liver metastases (LM) 33 pts, 66% Achieved OHP recommended dose, 80 mg/m 2 . ORR 82% intent-to-treat and 84% as-treated analysis. Median progression-free survival 12 months. Equivalent efficacy was obtained in yE pts. Liver metastasectomies were performed in 26% of all pts and in 39% of pts with LM. After a median follow-up of 21 months, median overall survival was 28 months. Cumulative G3-4 toxicities per patient: diarrhea 28%, mucositis 6%, neutropenia 10%, hypertension 2%. They were equivalent in yE pts. Limiting toxicity syndromes (LTS), consisting of the dose-limiting toxicity, associated or not to G2 or limiting toxicities: 44% overall, 46% in yE. Multiple versus single site LTS, respectively: overall, 24% versus 20%; yE pts, 37.5% versus 8%. Poker combination shows high activity and efficacy in first line treatment of MCRC. It increases liver metastasectomies rate and can be safely administered. Osservatorio Nazionale sulla Sperimentazione Clinica dei Medicinali (OsSC) Agenzia Italiana del Farmaco (AIFA) Numero EudraCT 2007-004946-34

  4. A scalable implementation of RI-SCF on parallel computers

    International Nuclear Information System (INIS)

    Fruechtl, H.A.; Kendall, R.A.; Harrison, R.J.

    1996-01-01

    In order to avoid the integral bottleneck of conventional SCF calculations, the Resolution of the Identity (RI) method is used to obtain an approximate solution to the Hartree-Fock equations. In this approximation only three-center integrals are needed to build the Fock matrix. It has been implemented as part of the NWChem package of portable and scalable ab initio programs for parallel computers. Utilizing the V-approximation, both the Coulomb and exchange contribution to the Fock matrix can be calculated from a transformed set of three-center integrals which have to be precalculated and stored. A distributed in-core method as well as a disk based implementation have been programmed. Details of the implementation as well as the parallel programming tools used are described. We also give results and timings from benchmark calculations

  5. Northeast Artificial Intelligence Consortium Annual Report - 1988 Parallel Vision. Volume 9

    Science.gov (United States)

    1989-10-01

    supports the Northeast Aritificial Intelligence Consortium (NAIC). Volume 9 Parallel Vision Report submitted by Christopher M. Brown Randal C. Nelson...NORTHEAST ARTIFICIAL INTELLIGENCE CONSORTIUM ANNUAL REPORT - 1988 Parallel Vision Syracuse University Christopher M. Brown and Randal C. Nelson...Technical Director Directorate of Intelligence & Reconnaissance FOR THE COMMANDER: IGOR G. PLONISCH Directorate of Plans & Programs If your address has

  6. Data-parallel tomographic reconstruction : A comparison of filtered backprojection and direct Fourier reconstruction

    NARCIS (Netherlands)

    Roerdink, J.B.T.M.; Westenberg, M.A

    1998-01-01

    We consider the parallelization of two standard 2D reconstruction algorithms, filtered backprojection and direct Fourier reconstruction, using the data-parallel programming style. The algorithms are implemented on a Connection Machine CM-5 with 16 processors and a peak performance of 2 Gflop/s.

  7. Application of Pfortran and Co-Array Fortran in the Parallelization of the GROMOS96 Molecular Dynamics Module

    Directory of Open Access Journals (Sweden)

    Piotr Bała

    2001-01-01

    Full Text Available After at least a decade of parallel tool development, parallelization of scientific applications remains a significant undertaking. Typically parallelization is a specialized activity supported only partially by the programming tool set, with the programmer involved with parallel issues in addition to sequential ones. The details of concern range from algorithm design down to low-level data movement details. The aim of parallel programming tools is to automate the latter without sacrificing performance and portability, allowing the programmer to focus on algorithm specification and development. We present our use of two similar parallelization tools, Pfortran and Cray's Co-Array Fortran, in the parallelization of the GROMOS96 molecular dynamics module. Our parallelization started from the GROMOS96 distribution's shared-memory implementation of the replicated algorithm, but used little of that existing parallel structure. Consequently, our parallelization was close to starting with the sequential version. We found the intuitive extensions to Pfortran and Co-Array Fortran helpful in the rapid parallelization of the project. We present performance figures for both the Pfortran and Co-Array Fortran parallelizations showing linear speedup within the range expected by these parallelization methods.

  8. Automatic Loop Parallelization via Compiler Guided Refactoring

    DEFF Research Database (Denmark)

    Larsen, Per; Ladelsky, Razya; Lidman, Jacob

    For many parallel applications, performance relies not on instruction-level parallelism, but on loop-level parallelism. Unfortunately, many modern applications are written in ways that obstruct automatic loop parallelization. Since we cannot identify sufficient parallelization opportunities...... for these codes in a static, off-line compiler, we developed an interactive compilation feedback system that guides the programmer in iteratively modifying application source, thereby improving the compiler’s ability to generate loop-parallel code. We use this compilation system to modify two sequential...... benchmarks, finding that the code parallelized in this way runs up to 8.3 times faster on an octo-core Intel Xeon 5570 system and up to 12.5 times faster on a quad-core IBM POWER6 system. Benchmark performance varies significantly between the systems. This suggests that semi-automatic parallelization should...

  9. Parallel kinematics type, kinematics, and optimal design

    CERN Document Server

    Liu, Xin-Jun

    2014-01-01

    Parallel Kinematics- Type, Kinematics, and Optimal Design presents the results of 15 year's research on parallel mechanisms and parallel kinematics machines. This book covers the systematic classification of parallel mechanisms (PMs) as well as providing a large number of mechanical architectures of PMs available for use in practical applications. It focuses on the kinematic design of parallel robots. One successful application of parallel mechanisms in the field of machine tools, which is also called parallel kinematics machines, has been the emerging trend in advanced machine tools. The book describes not only the main aspects and important topics in parallel kinematics, but also references novel concepts and approaches, i.e. type synthesis based on evolution, performance evaluation and optimization based on screw theory, singularity model taking into account motion and force transmissibility, and others.   This book is intended for researchers, scientists, engineers and postgraduates or above with interes...

  10. Applied Parallel Computing Industrial Computation and Optimization

    DEFF Research Database (Denmark)

    Madsen, Kaj; NA NA NA Olesen, Dorte

    Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)......Proceedings and the Third International Workshop on Applied Parallel Computing in Industrial Problems and Optimization (PARA96)...

  11. Highly parallel machines and future of scientific computing

    International Nuclear Information System (INIS)

    Singh, G.S.

    1992-01-01

    Computing requirement of large scale scientific computing has always been ahead of what state of the art hardware could supply in the form of supercomputers of the day. And for any single processor system the limit to increase in the computing power was realized a few years back itself. Now with the advent of parallel computing systems the availability of machines with the required computing power seems a reality. In this paper the author tries to visualize the future large scale scientific computing in the penultimate decade of the present century. The author summarized trends in parallel computers and emphasize the need for a better programming environment and software tools for optimal performance. The author concludes this paper with critique on parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs

  12. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

    A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levELs of parallelism on multiple instruction multiple data machines are indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations bAsed on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.

  13. Parallel algorithms and cluster computing

    CERN Document Server

    Hoffmann, Karl Heinz

    2007-01-01

    This book presents major advances in high performance computing as well as major advances due to high performance computing. It contains a collection of papers in which results achieved in the collaboration of scientists from computer science, mathematics, physics, and mechanical engineering are presented. From the science problems to the mathematical algorithms and on to the effective implementation of these algorithms on massively parallel and cluster computers we present state-of-the-art methods and technology as well as exemplary results in these fields. This book shows that problems which seem superficially distinct become intimately connected on a computational level.

  14. Parallel computation of rotating flows

    DEFF Research Database (Denmark)

    Lundin, Lars Kristian; Barker, Vincent A.; Sørensen, Jens Nørkær

    1999-01-01

    This paper deals with the simulation of 3‐D rotating flows based on the velocity‐vorticity formulation of the Navier‐Stokes equations in cylindrical coordinates. The governing equations are discretized by a finite difference method. The solution is advanced to a new time level by a two‐step process...... is that of solving a singular, large, sparse, over‐determined linear system of equations, and the iterative method CGLS is applied for this purpose. We discuss some of the mathematical and numerical aspects of this procedure and report on the performance of our software on a wide range of parallel computers. Darbe...

  15. The parallel volume at large distances

    DEFF Research Database (Denmark)

    Kampf, Jürgen

    In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to . This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume, if it is convex. Extensions to Minkowski spaces and random sets are also discussed....

  16. The parallel volume at large distances

    DEFF Research Database (Denmark)

    Kampf, Jürgen

    In this paper we examine the asymptotic behavior of the parallel volume of planar non-convex bodies as the distance tends to infinity. We show that the difference between the parallel volume of the convex hull of a body and the parallel volume of the body itself tends to 0. This yields a new proof...... for the fact that a planar body can only have polynomial parallel volume, if it is convex. Extensions to Minkowski spaces and random sets are also discussed....

  17. A Parallel Approach to Fractal Image Compression

    Directory of Open Access Journals (Sweden)

    Lubomir Dedera

    2004-01-01

    Full Text Available The paper deals with a parallel approach to coding and decoding algorithms in fractal image compressionand presents experimental results comparing sequential and parallel algorithms from the point of view of achieved bothcoding and decoding time and effectiveness of parallelization.

  18. Parallel Computing Using Web Servers and "Servlets".

    Science.gov (United States)

    Lo, Alfred; Bloor, Chris; Choi, Y. K.

    2000-01-01

    Describes parallel computing and presents inexpensive ways to implement a virtual parallel computer with multiple Web servers. Highlights include performance measurement of parallel systems; models for using Java and intranet technology including single server, multiple clients and multiple servers, single client; and a comparison of CGI (common…

  19. Comparison of parallel viscosity with neoclassical theory

    International Nuclear Information System (INIS)

    Ida, K.; Nakajima, N.

    1996-04-01

    Toroidal rotation profiles are measured with charge exchange spectroscopy for the plasma heated with tangential NBI in CHS heliotron/torsatron device to estimate parallel viscosity. The parallel viscosity derived from the toroidal rotation velocity shows good agreement with the neoclassical parallel viscosity plus the perpendicular viscosity. (μ perpendicular = 2 m 2 /s). (author)

  20. Advances in randomized parallel computing

    CERN Document Server

    Rajasekaran, Sanguthevar

    1999-01-01

    The technique of randomization has been employed to solve numerous prob­ lems of computing both sequentially and in parallel. Examples of randomized algorithms that are asymptotically better than their deterministic counterparts in solving various fundamental problems abound. Randomized algorithms have the advantages of simplicity and better performance both in theory and often in practice. This book is a collection of articles written by renowned experts in the area of randomized parallel computing. A brief introduction to randomized algorithms In the aflalysis of algorithms, at least three different measures of performance can be used: the best case, the worst case, and the average case. Often, the average case run time of an algorithm is much smaller than the worst case. 2 For instance, the worst case run time of Hoare's quicksort is O(n ), whereas its average case run time is only O( n log n). The average case analysis is conducted with an assumption on the input space. The assumption made to arrive at t...