WorldWideScience

Sample records for tuning parallel programs

  1. Methodologies and Tools for Tuning Parallel Programs: 80% Art, 20% Science, and 10% Luck

    Science.gov (United States)

    Yan, Jerry C.; Bailey, David (Technical Monitor)

    1996-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. In the past few years, the ubiquitous introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CRI's Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance instrumentation/monitoring/tuning technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time-consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g. AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

  2. Automatic performance tuning of parallel and accelerated seismic imaging kernels

    KAUST Repository

    Haberdar, Hakan

    2014-01-01

    With the increased complexity and diversity of mainstream high performance computing systems, significant effort is required to tune parallel applications in order to achieve the best possible performance for each particular platform. This task becomes more and more challenging and requires a larger set of skills. Automatic performance tuning is becoming a must for optimizing applications such as Reverse Time Migration (RTM), which is widely used in seismic imaging for oil and gas exploration. An empirical-search-based auto-tuning approach is applied to the MPI communication operations of the parallel isotropic and tilted transverse isotropic kernels. The application of auto-tuning using the Abstract Data and Communication Library improved the performance of the MPI communications as well as developer productivity by providing a higher level of abstraction. Keeping productivity in mind, we opted for pragma-based programming for accelerated computation on the latest accelerator architectures such as GPUs, using the fairly new OpenACC standard. The same auto-tuning approach is also applied to the OpenACC-accelerated seismic code to optimize the compute-intensive kernel of the Reverse Time Migration application. The application of such techniques resulted in improved performance of the original code and the ability to adapt to different execution environments.
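
    The empirical-search idea above can be illustrated with a short, hedged sketch in MPI C: time each candidate configuration of a communication operation and keep the fastest. The ring exchange, chunk-size candidates, and trial count below are made-up illustrations, not taken from the Abstract Data and Communication Library.

        #include <mpi.h>
        #include <stdio.h>

        /* Hypothetical empirical-search auto-tuning: time a ring exchange for
           several candidate message chunk sizes and select the fastest.
           All candidate values are illustrative. */
        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            enum { N = 1 << 20, TRIALS = 10 };
            static double sendbuf[N], recvbuf[N];
            int candidates[] = { N, N / 4, N / 16 };    /* doubles per message */
            int right = (rank + 1) % size, left = (rank + size - 1) % size;
            int best = -1;
            double best_t = 1e30;

            for (int c = 0; c < 3; c++) {
                int chunk = candidates[c];
                MPI_Barrier(MPI_COMM_WORLD);
                double t0 = MPI_Wtime();
                for (int t = 0; t < TRIALS; t++)
                    for (int off = 0; off < N; off += chunk)
                        MPI_Sendrecv(sendbuf + off, chunk, MPI_DOUBLE, right, 0,
                                     recvbuf + off, chunk, MPI_DOUBLE, left, 0,
                                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                double dt = MPI_Wtime() - t0, tmax;
                /* agree on the slowest rank's time so all ranks pick the same winner */
                MPI_Allreduce(&dt, &tmax, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
                if (tmax < best_t) { best_t = tmax; best = chunk; }
            }
            if (rank == 0)
                printf("selected chunk size: %d doubles (%.4f s)\n", best, best_t);
            MPI_Finalize();
            return 0;
        }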

  3. Enlargement of Tuning Range in a Ferrite-Tuned Cavity Through Superposed Orthogonal and Parallel Magnetic Bias

    CERN Document Server

    Vollinger, C

    2013-01-01

    Conventional ferrite-tuned cavities operate either with bias fields that are orthogonal or parallel to the magnetic RF-field. For a cavity that tunes rapidly over a frequency range of roughly 100-400 MHz with high Q, we use ferrite garnets exposed to a new biasing method consisting of a superposition of perpendicular and parallel magnetic fields. This method leads to a significant enlargement of the high-Q cavity tuning range by defining an operating point close to magnetic saturation and thus improving the ferrite material behaviour. A further advantage of this technique is the fast tuning speed, which results from the fact that tuning is carried out either with pure parallel biasing, or together with a very small change of operating point from the perpendicular bias. In this paper, several scaled test models of ferrite-filled resonators are shown; measurements on the set-ups are compared and discussed.

  4. Dynamic Performance Tuning Supported by Program Specification

    Directory of Open Access Journals (Sweden)

    Eduardo César

    2002-01-01

    Performance analysis and tuning of parallel/distributed applications are very difficult tasks for non-expert programmers. It is necessary to provide tools that automatically carry out these tasks. These can be static tools that carry out the analysis in a post-mortem phase, or dynamic tools that tune the application on the fly. Both kinds of tools have their target applications. Static automatic analysis tools are suitable for stable applications, while dynamic tuning tools are more appropriate for applications with dynamic behaviour. In this paper, we describe KappaPi as an example of a static automatic performance analysis tool, and also a general environment based on parallel patterns for developing and dynamically tuning parallel/distributed applications.

  5. Tuning HDF5 subfiling performance on parallel file systems

    Energy Technology Data Exchange (ETDEWEB)

    Byna, Suren [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Chaarawi, Mohamad [Intel Corp. (United States)]; Koziol, Quincey [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Mainzer, John [The HDF Group (United States)]; Willmore, Frank [The HDF Group (United States)]

    2017-05-12

    Subfiling is a technique used on parallel file systems to reduce locking and contention issues when multiple compute nodes interact with the same storage target node. Subfiling provides a compromise between the single shared file approach, which instigates lock contention problems on parallel file systems, and having one file per process, which results in a massive and unmanageable number of files. In this paper, we evaluate and tune the performance of the recently implemented subfiling feature in HDF5. Specifically, we explain the implementation strategy of the subfiling feature in HDF5, provide examples of using the feature, and evaluate and tune the parallel I/O performance of this feature on the parallel file systems of the Cray XC40 system at NERSC (Cori), which include burst buffer storage and Lustre disk-based storage. We also evaluate I/O performance on the Cray XC30 system, Edison, at NERSC. Our results show performance benefits of 1.2X to 6X with subfiling compared to writing a single shared HDF5 file. We present our exploration of configurations, such as the number of subfiles and the number of Lustre storage targets used to store files, as optimization parameters to obtain superior I/O performance. Based on this exploration, we discuss recommendations for achieving good I/O performance as well as limitations of the subfiling feature.
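
    The HDF5 implementation itself is not reproduced here, but the underlying subfiling idea (groups of ranks sharing an intermediate number of files) can be sketched with plain MPI-IO; the group count and file-naming scheme below are illustrative assumptions, not HDF5's actual layout.

        #include <mpi.h>
        #include <stdio.h>

        /* Illustrative subfiling pattern: split the ranks into NGROUPS
           communicators and let each group write one shared subfile -- a
           middle ground between one file per process and a single shared
           file. Not the actual HDF5 subfiling implementation. */
        #define NGROUPS 4

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            int group = rank % NGROUPS;          /* which subfile this rank uses */
            MPI_Comm subcomm;
            MPI_Comm_split(MPI_COMM_WORLD, group, rank, &subcomm);
            int subrank;
            MPI_Comm_rank(subcomm, &subrank);

            char fname[64];
            snprintf(fname, sizeof fname, "subfile_%d.dat", group);

            enum { N = 1024 };
            double data[N];
            for (int i = 0; i < N; i++) data[i] = rank + i * 1e-6;

            MPI_File fh;
            MPI_File_open(subcomm, fname, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                          MPI_INFO_NULL, &fh);
            MPI_Offset off = (MPI_Offset)subrank * N * sizeof(double);
            MPI_File_write_at_all(fh, off, data, N, MPI_DOUBLE, MPI_STATUS_IGNORE);
            MPI_File_close(&fh);

            MPI_Comm_free(&subcomm);
            MPI_Finalize();
            return 0;
        }

    The number of subfiles (NGROUPS here) plays the same role as the subfile count tuned in the paper: too few recreates the shared-file contention, too many recreates the file-per-process explosion.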

  6. Dealing with BIG Data - Exploiting the Potential of Multicore Parallelism and Auto-Tuning

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    Physics experiments nowadays produce tremendous amounts of data that require sophisticated analyses in order to gain new insights. At such large scale, scientists are facing non-trivial software engineering problems in addition to the physics problems. Ubiquitous multicore processors and GPGPUs have turned almost any computer into a parallel machine and have pushed compute clusters and clouds to become multicore-based and more heterogeneous. These developments complicate the exploitation of various types of parallelism within different layers of hardware and software. As a consequence, manual performance tuning is non-intuitive and tedious due to the large search space spanned by numerous inter-related tuning parameters. This talk addresses these challenges at CERN and discusses how to leverage multicore parallelization techniques in this context. It presents recent advances in automatic performance tuning to algorithmically find sweet spots with good performance. The talk also presents results from empiri...

  7. Performance Tuning and Evaluation of a Parallel Community Climate Model

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Worley, P.H.; Hammond, S.

    1999-11-13

    The Parallel Community Climate Model (PCCM) is a message-passing parallelization of version 2.1 of the Community Climate Model (CCM) developed by researchers at Argonne and Oak Ridge National Laboratories and at the National Center for Atmospheric Research in the early to mid 1990s. In preparation for use in the Department of Energy's Parallel Climate Model (PCM), PCCM has recently been updated with new physics routines from version 3.2 of the CCM, improvements to the parallel implementation, and ports to the SGI/Cray Research T3E and Origin 2000. We describe our experience in porting and tuning PCCM on these new platforms, evaluating the performance of different parallel algorithm options and comparing performance between the T3E and Origin 2000.

  8. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell, Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  9. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race

  10. Scientific programming on massively parallel processor CP-PACS

    International Nuclear Information System (INIS)

    Boku, Taisuke

    1998-01-01

    The massively parallel processor CP-PACS targets a wide range of problems in computational physics, and its architecture was designed to support diverse numerical workloads. This report outlines the CP-PACS, gives a programming example based on the Kernel CG benchmark from NAS Parallel Benchmarks version 1, and describes the two distinctive features of the CP-PACS architecture: the pseudo vector processing mechanism and the tuning of parallel scientific and technical computation using the three-dimensional hyper-crossbar network. The CP-PACS uses processing units (PUs) built on a RISC processor augmented with a pseudo vector processing facility; pseudo vector processing is realized as loop processing with scalar instructions. The features of the PU interconnection network are explained, and the algorithm of the NPB version 1 Kernel CG is shown. The most time-consuming part of the main loop is the matrix-vector product (matvec), whose parallelization is explained, and the CPU computation time is determined. As a performance evaluation, execution times, the short-vector processing of the pseudo vector processor based on a sliding window, and comparisons with other parallel computers are reported. (K.I.)
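
    As a generic illustration of the matvec parallelization discussed above (OpenMP C on ordinary hardware, not the CP-PACS code itself), a row-partitioned dense matrix-vector product looks like this:

        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        /* Row-partitioned dense matvec y = A*x, the kind of loop that
           dominates the NPB CG kernel. Each thread handles a block of rows. */
        static void matvec(int n, const double *A, const double *x, double *y)
        {
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < n; i++) {
                double s = 0.0;
                for (int j = 0; j < n; j++)
                    s += A[(size_t)i * n + j] * x[j];
                y[i] = s;
            }
        }

        int main(void)
        {
            int n = 2048;
            double *A = malloc((size_t)n * n * sizeof *A);
            double *x = malloc(n * sizeof *x), *y = malloc(n * sizeof *y);
            for (size_t k = 0; k < (size_t)n * n; k++) A[k] = 1.0 / (k % 97 + 1);
            for (int i = 0; i < n; i++) x[i] = 1.0;

            double t0 = omp_get_wtime();
            matvec(n, A, x, y);
            printf("y[0] = %g, %.4f s on %d threads\n",
                   y[0], omp_get_wtime() - t0, omp_get_max_threads());
            free(A); free(x); free(y);
            return 0;
        }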

  11. Writing parallel programs that work

    CERN Multimedia

    CERN. Geneva

    2012-01-01

    Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known, the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on the auto-parallelizing compiler (KAP), and was involved in th...

  12. Optimal Design and Tuning of PID-Type Interval Type-2 Fuzzy Logic Controllers for Delta Parallel Robots

    Directory of Open Access Journals (Sweden)

    Xingguo Lu

    2016-05-01

    In this work, we propose a new method for the optimal design and tuning of a Proportional-Integral-Derivative type (PID-type) interval type-2 fuzzy logic controller (IT2 FLC) for Delta parallel robot trajectory tracking control. The presented methodology starts with an optimal design problem of the IT2 FLC. A group of IT2 FLCs is obtained by blurring the membership functions using a variable called the blurring degree. By comparing the performance of the controllers, the optimal structure of the IT2 FLC is obtained. Then, a multi-objective optimization problem is formulated to tune the scaling factors of the PID-type IT2 FLC. The Non-dominated Sorting Genetic Algorithm (NSGA-II) is adopted to solve the constrained nonlinear multi-objective optimization problem. Simulation results of the optimized controller are presented and discussed regarding application in the Delta parallel robot. The proposed method provides an effective way to design and tune the PID-type IT2 FLC with a desired control performance.
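
    Neither NSGA-II nor the type-2 fuzzy controller fits in a short sketch, but the shape of the inner tuning loop (evaluate candidate gains against a simulated plant, keep the best) can be shown with a deliberately simplified single-objective random search over ordinary PID gains; the plant constants and gain ranges below are invented for illustration.

        #include <stdio.h>
        #include <stdlib.h>

        /* Simplified stand-in for controller tuning: random search over PID
           gains, scored by the integral of absolute error (IAE) on a
           first-order plant dy/dt = (-y + u)/tau. Not NSGA-II and not a
           fuzzy controller; all constants are illustrative. */
        static double evaluate(double kp, double ki, double kd)
        {
            const double dt = 0.01, tau = 0.5, setpoint = 1.0;
            double y = 0.0, integ = 0.0, prev_err = setpoint, iae = 0.0;
            for (int step = 0; step < 1000; step++) {
                double err = setpoint - y;
                integ += err * dt;
                double deriv = (err - prev_err) / dt;
                double u = kp * err + ki * integ + kd * deriv;
                y += dt * (-y + u) / tau;            /* first-order plant */
                prev_err = err;
                iae += (err < 0 ? -err : err) * dt;
            }
            return iae;   /* unstable gain sets give huge or NaN scores */
        }

        static double frand(double lo, double hi)
        {
            return lo + (hi - lo) * rand() / (double)RAND_MAX;
        }

        int main(void)
        {
            srand(42);
            double best_cost = 1e30, best_kp = 0, best_ki = 0, best_kd = 0;
            for (int trial = 0; trial < 5000; trial++) {
                double kp = frand(0, 10), ki = frand(0, 5), kd = frand(0, 1);
                double cost = evaluate(kp, ki, kd);
                if (cost < best_cost) {
                    best_cost = cost;
                    best_kp = kp; best_ki = ki; best_kd = kd;
                }
            }
            printf("best gains kp=%.3f ki=%.3f kd=%.3f (IAE=%.4f)\n",
                   best_kp, best_ki, best_kd, best_cost);
            return 0;
        }

    A multi-objective tuner such as NSGA-II replaces the single IAE score with a vector of objectives (e.g. tracking error and control effort) and keeps a Pareto front instead of a single best point.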

  13. Parallel programming with Easy Java Simulations

    Science.gov (United States)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.
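
    The shared-memory time-evolution loops mentioned above have a common structure; as a stand-in (an explicit 1D diffusion step rather than the complex-valued Schrödinger update, with arbitrary grid size and coefficient), the loop parallelizes like this in OpenMP C:

        #include <stdio.h>
        #include <stdlib.h>
        #include <math.h>
        #include <omp.h>

        /* Explicit time-stepping with a parallel stencil update; the
           real/imaginary Schroedinger updates in such course materials
           have the same loop shape. Parameters are arbitrary. */
        int main(void)
        {
            enum { N = 1 << 20, STEPS = 200 };
            double *u = malloc(N * sizeof *u), *v = malloc(N * sizeof *v);
            for (int i = 0; i < N; i++)          /* Gaussian initial bump */
                u[i] = exp(-1e-9 * (i - N / 2) * (double)(i - N / 2));

            const double alpha = 0.25;           /* stable for alpha <= 0.5 */
            for (int s = 0; s < STEPS; s++) {
                #pragma omp parallel for schedule(static)
                for (int i = 1; i < N - 1; i++)
                    v[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
                v[0] = v[N - 1] = 0.0;           /* fixed boundaries */
                double *tmp = u; u = v; v = tmp; /* swap time levels */
            }
            printf("u[N/2] after %d steps: %g\n", STEPS, u[N / 2]);
            free(u); free(v);
            return 0;
        }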

  14. Refinement of Parallel and Reactive Programs

    OpenAIRE

    Back, R. J. R.

    1992-01-01

    We show how to apply the refinement calculus to stepwise refinement of parallel and reactive programs. We use action systems as our basic program model. Action systems are sequential programs which can be implemented in a parallel fashion. Hence refinement calculus methods, originally developed for sequential programs, carry over to the derivation of parallel programs. Refinement of reactive programs is handled by data refinement techniques originally developed for the sequential refinement c...

  15. Structured Parallel Programming Patterns for Efficient Computation

    CERN Document Server

    McCool, Michael; Robison, Arch

    2012-01-01

    Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th

  16. Parallel phase model: a programming model for high-end parallel machines with manycores.

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Junfeng (Syracuse University, Syracuse, NY); Wen, Zhaofang; Heroux, Michael Allen; Brightwell, Ronald Brian

    2009-04-01

    This paper presents a parallel programming model, Parallel Phase Model (PPM), for next-generation high-end parallel machines based on a distributed memory architecture consisting of a networked cluster of nodes with a large number of cores on each node. PPM has a unified high-level programming abstraction that facilitates the design and implementation of parallel algorithms to exploit both the parallelism of the many cores and the parallelism at the cluster level. The programming abstraction will be suitable for expressing both fine-grained and coarse-grained parallelism. It includes a few high-level parallel programming language constructs that can be added as an extension to an existing (sequential or parallel) programming language such as C; and the implementation of PPM also includes a light-weight runtime library that runs on top of an existing network communication software layer (e.g. MPI). Design philosophy of PPM and details of the programming abstraction are also presented. Several unstructured applications that inherently require high-volume random fine-grained data accesses have been implemented in PPM with very promising results.

  17. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  18. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  19. PDDP, A Data Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Karen H. Warren

    1996-01-01

    PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

  20. Language constructs for modular parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.

    1996-03-01

    We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrency, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.

  1. Development of parallel/serial program analyzing tool

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Nagao, Saichi; Takigawa, Yoshio; Kumakura, Toshimasa

    1999-03-01

    The Japan Atomic Energy Research Institute has been developing 'KMtool', a parallel/serial program analyzing tool, in order to promote the parallelization of science and engineering computation programs. KMtool analyzes the performance of programs written in FORTRAN77 and MPI, and it reduces the effort required for parallelization. This paper describes the development purpose, design, utilization and evaluation of KMtool. (author)

  2. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach in the form of a tree-structured computational model for parallel application programs. An attempt is made to provide a standard user interface to execute programs on the BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package, called PSHED, provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  3. Professional Parallel Programming with C# Master Parallel Extensions with NET 4

    CERN Document Server

    Hillar, Gastón

    2010-01-01

    Expert guidance for those programming today's dual-core processor PCs. As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, the asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization. Teach

  4. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    In recent years, efforts have been made to delineate a stable and unified framework in which the problems of logical parallel processing can find solutions, at least at the level of imperative languages. The results obtained so far are not commensurate with those efforts. This paper aims to make a small contribution to them. We propose an overview of parallel programming, parallel execution and collaborative systems.

  5. From Single- to Multi-Objective Auto-Tuning of Programs: Advantages and Implications

    Directory of Open Access Journals (Sweden)

    Juan Durillo

    2014-01-01

    Automatic tuning (auto-tuning) of software has emerged in recent years as a promising method that tries to automatically adapt the behaviour of a program to attain different performance objectives on a given computing system. This method is gaining momentum due to the increasing complexity of modern multicore-based hardware architectures. Many solutions to auto-tuning have been explored, ranging from simple random search to more sophisticated methods like machine learning or evolutionary search. To this day, it is still unclear whether these approaches are general enough to encompass all the complexities of the problem (e.g. the search space, parameters influencing the search space, input data sensitivity, etc.), or which approach is best suited for a given problem. Furthermore, the growing interest in auto-tuning a program for several objectives is increasing this confusion even further. The goal of this paper is to formally describe the problem addressed by auto-tuning programs and review existing solutions, highlighting the advantages and drawbacks of different techniques for single-objective as well as multi-objective auto-tuning approaches.

  6. A Tutorial on Parallel and Concurrent Programming in Haskell

    Science.gov (United States)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  7. Productive Parallel Programming: The PCN Approach

    Directory of Open Access Journals (Sweden)

    Ian Foster

    1992-01-01

    We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

  8. Remote tuning of NMR probe circuits.

    Science.gov (United States)

    Kodibagkar, V D; Conradi, M S

    2000-05-01

    There are many circumstances in which the probe tuning adjustments cannot be located near the rf NMR coil. These may occur in high-temperature NMR, low-temperature NMR, and in the use of magnets with small diameter access bores. We address here circuitry for connecting a fixed-tuned probe circuit by a transmission line to a remotely located tuning network. In particular, the bandwidth over which the probe may be remotely tuned while keeping the losses in the transmission line acceptably low is considered. The results show that for all resonant circuit geometries (series, parallel, series-parallel), overcoupling of the line to the tuned circuit is key to obtaining a large tuning bandwidth. At equivalent extents of overcoupling, all resonant circuit geometries have nearly equal remote tuning bandwidths. Particularly for the case of low-loss transmission line, the tuning bandwidth can be many times the tuned circuit's bandwidth, f₀/Q. Copyright 2000 Academic Press.
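
    As a reminder of the quantity referenced above (standard resonant-circuit relations, not specific to this paper): a circuit with inductance L, capacitance C, and quality factor Q has

        \[ f_0 = \frac{1}{2\pi\sqrt{LC}}, \qquad \Delta f = \frac{f_0}{Q} \]

    so the claim is that an overcoupled line lets the remote network tune over a range many times the natural half-power bandwidth Δf.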

  9. Step by step parallel programming method for molecular dynamics code

    International Nuclear Information System (INIS)

    Orii, Shigeo; Ohta, Toshio

    1996-07-01

    Parallel programming of a molecular dynamics simulation program is carried out with a step-by-step programming technique using the two-phase method. Within a certain range of computing parameters, parallel performance is obtained by do-loop-level parallel programming, which distributes the calculation across processors according to do-loop indices, on the vector-parallel computer VPP500 and the scalar-parallel computer Paragon. The VPP500 shows parallel performance over a wider range of computing parameters. The reason is that the time cost of the program parts that cannot be reduced by do-loop-level parallel programming can be made negligible by vectorization; the time-consuming parts of the program are then concentrated in the few parts that can be accelerated by do-loop-level parallel programming. This report presents the step-by-step parallel programming method and the parallel performance of the molecular dynamics code on the VPP500 and Paragon. (author)
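
    The do-loop-level decomposition described above can be illustrated (in OpenMP C rather than the VPP500/Paragon environments the report targets) by parallelizing the outer index of a pairwise force loop, the dominant do-loop in a molecular dynamics code:

        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        /* Do-loop-level decomposition of an MD kernel: the outer particle
           loop is split across threads. Simplified O(N^2) Lennard-Jones
           forces in 1D; parameters are illustrative. */
        int main(void)
        {
            enum { N = 4096 };
            double *x = malloc(N * sizeof *x), *f = malloc(N * sizeof *f);
            for (int i = 0; i < N; i++) x[i] = 1.1 * i;   /* simple lattice */

            #pragma omp parallel for schedule(dynamic, 64)
            for (int i = 0; i < N; i++) {
                double fi = 0.0;
                for (int j = 0; j < N; j++) {
                    if (j == i) continue;
                    double r = x[i] - x[j];
                    double inv2 = 1.0 / (r * r);
                    double inv6 = inv2 * inv2 * inv2;
                    fi += 24.0 * inv6 * (2.0 * inv6 - 1.0) * inv2 * r;
                }
                f[i] = fi;             /* each f[i] written by one thread only */
            }
            printf("f[0] = %g, f[N/2] = %g\n", f[0], f[N / 2]);
            free(x); free(f);
            return 0;
        }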

  10. Program Transformation to Identify List-Based Parallel Skeletons

    Directory of Open Access Journals (Sweden)

    Venkatesh Kannan

    2016-07-01

    Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are defined over flat data types such as lists or arrays. However, skeleton-based parallel programming is still very challenging as it requires intricate analysis of the underlying algorithm and often uses inefficient intermediate data structures. Further, the algorithmic structure of a given program may not match those of list-based skeletons. In this paper, we present a method to automatically transform any given program to one that is defined over a list and is more likely to contain instances of list-based skeletons. This facilitates the parallel execution of a transformed program using existing implementations of list-based parallel skeletons. Further, by using an existing transformation called distillation in conjunction with our method, we produce transformed programs that contain fewer inefficient intermediate data structures.

  11. Portable parallel programming in a Fortran environment

    International Nuclear Information System (INIS)

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACS macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network-based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a "nearly realistic" lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs

  12. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  13. Practical tuning for Oracle

    International Nuclear Information System (INIS)

    Kwon, Sun Yong

    2005-02-01

    This book deals with tuning for Oracle applications and consists of twenty-two chapters. The contents are: what is tuning?, the tuning procedure, collecting performance data using STATSPACK, collecting performance data in real time, disk I/O dispersion, index architecture, partitions and IOTs, optimization of the cluster factor, the optimizer, analysis of execution plans, index selection, index tuning, parallel processing architecture, DML, analytic functions, join methods, join types, application analysis, lock architecture, SGA architecture, wait events, and segment tuning.

  14. The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition

    Directory of Open Access Journals (Sweden)

    Ashkan Tousimojarad

    2013-12-01

    We present the Glasgow Parallel Reduction Machine (GPRM), a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the functional language used as the intermediate representation of the bytecode running on the GPRM. Using examples in this language we show the flexibility and power of our task composition framework. We demonstrate the potential using an implementation of a merge sort algorithm on a 64-core Tilera processor, as well as on a conventional Intel quad-core processor and an AMD 48-core processor system. We also compare our framework with OpenMP tasks in a parallel pointer chasing algorithm running on the Tilera processor. Our results show that the GPRM programs outperform the corresponding OpenMP codes on all test platforms, and can greatly facilitate writing of parallel programs, in particular non-data parallel algorithms such as reductions.

  15. 6th International Parallel Tools Workshop

    CERN Document Server

    Brinkmann, Steffen; Gracia, José; Resch, Michael; Nagel, Wolfgang

    2013-01-01

    The latest advances in the High Performance Computing hardware have significantly raised the level of available compute performance. At the same time, the growing hardware capabilities of modern supercomputing architectures have caused an increasing complexity of the parallel application development. Despite numerous efforts to improve and simplify parallel programming, there is still a lot of manual debugging and tuning work required. This process is supported by special software tools, facilitating debugging, performance analysis, and optimization and thus making a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools, which were presented and discussed at the 6th International Parallel Tools Workshop, held in Stuttgart, Germany, 25-26 September 2012.

  16. Concurrent Collections (CnC): A new approach to parallel programming

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    A common approach in designing parallel languages is to provide some high level handles to manipulate the use of the parallel platform. This exposes some aspects of the target platform, for example, shared vs. distributed memory. It may expose some but not all types of parallelism, for example, data parallelism but not task parallelism. This approach must find a balance between the desire to provide a simple view for the domain expert and the need to provide sufficient power for tuning. This is hard for any given architecture, and harder if the language is to apply to a range of architectures. Either simplicity or power is lost. Instead of viewing the language design problem as one of providing the programmer with high level handles, we view the problem as one of designing an interface. On one side of this interface is the programmer (domain expert) who knows the application but needs no knowledge of any aspects of the platform. On the other side of the interface is the performance expert (programmer o...

  17. The kpx, a program analyzer for parallelization

    International Nuclear Information System (INIS)

    Matsuyama, Yuji; Orii, Shigeo; Ota, Toshiro; Kume, Etsuo; Aikawa, Hiroshi.

    1997-03-01

    The kpx is a program analyzer, developed as a common technological basis for promoting parallel processing. The kpx consists of three tools. The first is ktool, that shows how much execution time is spent in program segments. The second is ptool, that shows parallelization overhead on the Paragon system. The last is xtool, that shows parallelization overhead on the VPP system. The kpx, designed to work for any FORTRAN code on any UNIX computer, is confirmed to work well after testing on Paragon, SP2, SR2201, VPP500, VPP300, Monte-4, SX-4 and T90. (author)

  18. Speedup predictions on large scientific parallel programs

    International Nuclear Information System (INIS)

    Williams, E.; Bobrowicz, F.

    1985-01-01

    How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
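
    For context, a textbook back-of-envelope estimate (ours, not the paper's): under Amdahl's law the measured speedup of 14.4 on 16 processors corresponds to a parallel fraction p of roughly 99.3%,

        \[ S(N) = \frac{1}{(1-p) + p/N}, \qquad 14.4 = \frac{1}{(1-p) + p/16} \;\Rightarrow\; p \approx 0.993 \]

    which is consistent with the paper's focus on very large, highly parallel scientific codes.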

  19. Automatic Thread-Level Parallelization in the Chombo AMR Library

    Energy Technology Data Exchange (ETDEWEB)

    Christen, Matthias; Keen, Noel; Ligocki, Terry; Oliker, Leonid; Shalf, John; Van Straalen, Brian; Williams, Samuel

    2011-05-26

    The increasing on-chip parallelism has substantial implications for HPC applications. Currently, hybrid programming models (typically MPI+OpenMP) are employed for mapping software to the hardware in order to leverage the hardware's architectural features. In this paper, we present an approach that automatically introduces thread-level parallelism into Chombo, a parallel adaptive mesh refinement framework for finite-difference-type PDE solvers. In Chombo, core algorithms are specified in ChomboFortran, a macro language extension to F77 that is part of the Chombo framework. This domain-specific language forms an already used target language for an automatic migration of the large number of existing algorithms into a hybrid MPI+OpenMP implementation. It also provides access to the auto-tuning methodology that enables tuning certain aspects of an algorithm to hardware characteristics. Performance measurements are presented for a few of the most relevant kernels with respect to a specific application benchmark using this technique, as well as benchmark results for the entire application. The kernel benchmarks show that, using auto-tuning, up to a factor of 11 in performance was gained with 4 threads with respect to the serial reference implementation.
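
    A minimal skeleton of the hybrid MPI+OpenMP model mentioned above (generic C, not Chombo or ChomboFortran code): MPI ranks each own a piece of the domain, and OpenMP threads split the loop inside each piece.

        #include <mpi.h>
        #include <stdio.h>
        #include <omp.h>

        /* Minimal hybrid MPI+OpenMP skeleton: one MPI rank per domain
           piece, OpenMP threads across the loop inside it. Generic
           illustration only. */
        int main(int argc, char **argv)
        {
            int provided, rank;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            enum { N = 1 << 18 };
            static double box[N];
            double local = 0.0;

            #pragma omp parallel for reduction(+:local) schedule(static)
            for (int i = 0; i < N; i++) {
                box[i] = rank + 1.0 / (i + 1);   /* stand-in for a stencil update */
                local += box[i];
            }

            double global;
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("global sum %.3f with %d threads per rank\n",
                       global, omp_get_max_threads());
            MPI_Finalize();
            return 0;
        }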

  20. Declarative Parallel Programming in Spreadsheet End-User Development

    DEFF Research Database (Denmark)

    Biermann, Florian

    2016-01-01

    Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations. In this literature study, we provide an overview of the publications on spreadsheet end-user programming and declarative array programming to inform further research on parallel programming in spreadsheets. Our results show that there is a clear overlap between spreadsheet programming and array programming, and we can directly apply results from functional array programming to a spreadsheet model of computations.

  1. Programming parallel architectures - The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1989-01-01

    This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.

  2. Programming parallel architectures: The BLAZE family of languages

    Science.gov (United States)

    Mehrotra, Piyush

    1988-01-01

    Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.

  3. Resource optimised reconfigurable modular parallel pipelined stochastic approximation-based self-tuning regulator architecture with reduced latency

    Directory of Open Access Journals (Sweden)

    Varghese Mathew Vaidyan

    2015-09-01

    Present self-tuning regulator architectures based on recursive least-squares estimation are computationally expensive and require a large amount of resources and time to generate the first control signal, due to computational bottlenecks in the estimation stage, the successive matrix multiplications, and the number of intermediate variables at each iteration; this precludes their use in applications that require fast response times or that run on embedded computing platforms with low-power or low-cost constraints on resource usage. The salient feature of this study is a new modular parallel pipelined stochastic approximation-based self-tuning regulator architecture that reduces the time required to generate the first control signal, the resource usage, and the number of intermediate variables. Fast matrix multiplication, pipelining, and high-speed arithmetic function implementations were used to improve performance. Implementation results demonstrate that the proposed architecture improves control signal generation time by 38% and reduces resource usage by 41% in terms of multipliers and 44.4% in terms of adders compared with the best existing related work, opening up new possibilities for online embedded self-tuning regulators.

  4. Parallel adaptation of a vectorised quantumchemical program system

    International Nuclear Information System (INIS)

    Van Corler, L.C.H.; Van Lenthe, J.H.

    1987-01-01

    Supercomputers, like the CRAY 1 or the Cyber 205, have had, and still have, a marked influence on Quantum Chemistry. Vectorization has led to a considerable increase in the performance of Quantum Chemistry programs. However, clock-cycle times more than a factor of 10 smaller than those of the present supercomputers are not to be expected. Therefore future supercomputers will have to depend on parallel structures. Recently, the first examples of such supercomputers have been installed. To be prepared for this new generation of (parallel) supercomputers one should consider the concepts one wants to use and the kind of problems one will encounter during implementation of existing vectorized programs on those parallel systems. The authors implemented four important parts of a large quantum-chemical program system (ATMOL), i.e. integrals, SCF, 4-index and Direct-CI, in the parallel environment at ECSEC (Rome, Italy). This system offers simulated parallelism on the host computer (IBM 4381) and real parallelism on at most 10 attached processors (FPS-164). Quantum-chemical programs usually handle large amounts of data and very large, often sparse matrices. Transferring that much data can cause problems concerning communication and overhead, in view of which shared memory and shared disks must be considered. The strategy and the tools that were used to parallelize the programs are shown. Also, some examples are presented to illustrate the effectiveness and performance of the system in Rome for these types of calculations.

  5. The BLAZE language - A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  6. The BLAZE language: A parallel language for scientific programming

    Science.gov (United States)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.

  7. Carotid chemoreceptors tune breathing via multipath routing: reticular chain and loop operations supported by parallel spike train correlations.

    Science.gov (United States)

    Morris, Kendall F; Nuding, Sarah C; Segers, Lauren S; Iceman, Kimberly E; O'Connor, Russell; Dean, Jay B; Ott, Mackenzie M; Alencar, Pierina A; Shuman, Dale; Horton, Kofi-Kermit; Taylor-Clark, Thomas E; Bolser, Donald C; Lindsey, Bruce G

    2018-02-01

    We tested the hypothesis that carotid chemoreceptors tune breathing through parallel circuit paths that target distinct elements of an inspiratory neuron chain in the ventral respiratory column (VRC). Microelectrode arrays were used to monitor neuronal spike trains simultaneously in the VRC, peri-nucleus tractus solitarius (p-NTS)-medial medulla, the dorsal parafacial region of the lateral tegmental field (FTL-pF), and medullary raphe nuclei together with phrenic nerve activity during selective stimulation of carotid chemoreceptors or transient hypoxia in 19 decerebrate, neuromuscularly blocked, and artificially ventilated cats. Of 994 neurons tested, 56% had a significant change in firing rate. A total of 33,422 cell pairs were evaluated for signs of functional interaction; 63% of chemoresponsive neurons were elements of at least one pair with correlational signatures indicative of paucisynaptic relationships. We detected evidence for postinspiratory neuron inhibition of rostral VRC I-Driver (pre-Bötzinger) neurons, an interaction predicted to modulate breathing frequency, and for reciprocal excitation between chemoresponsive p-NTS neurons and more downstream VRC inspiratory neurons for control of breathing depth. Chemoresponsive pericolumnar tonic expiratory neurons, proposed to amplify inspiratory drive by disinhibition, were correlationally linked to afferent and efferent "chains" of chemoresponsive neurons extending to all monitored regions. The chains included coordinated clusters of chemoresponsive FTL-pF neurons with functional links to widespread medullary sites involved in the control of breathing. The results support long-standing concepts on brain stem network architecture and a circuit model for peripheral chemoreceptor modulation of breathing with multiple circuit loops and chains tuned by tegmental field neurons with quasi-periodic discharge patterns. NEW & NOTEWORTHY We tested the long-standing hypothesis that carotid chemoreceptors tune the

  8. Tune Up: Automotive Mechanics Instructional Program. Block 5.

    Science.gov (United States)

    O'Brien, Ralph D.

    The fifth of six instructional blocks in automotive mechanics, the lessons and supportive information in the document provide a guide for teachers in planning an instructional program in automotive tune-ups at the secondary and postsecondary level. The material, as organized, is a suggested sequence of instruction within each block. Each lesson…

  9. Parallel programming practical aspects, models and current limitations

    CERN Document Server

    Tarkov, Mikhail S

    2014-01-01

    Parallel programming is designed for the use of parallel computer systems for solving time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: (1) processing large data arrays (including processing images and signals in real time), and (2) simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation. Problems of scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...

  10. On the Automatic Parallelization of Sparse and Irregular Fortran Programs

    Directory of Open Access Journals (Sweden)

    Yuan Lin

    1999-01-01

    Automatic parallelization is usually believed to be less effective at exploiting implicit parallelism in sparse/irregular programs than in their dense/regular counterparts. However, not much is really known because there have been few research reports on this topic. In this work, we have studied the possibility of using an automatic parallelizing compiler to detect the parallelism in sparse/irregular programs. The study with a collection of sparse/irregular programs led us to some common loop patterns. Based on these patterns, new techniques were derived that produced good speedups when manually applied to our benchmark codes. More importantly, these parallelization methods can be implemented in a parallelizing compiler and can be applied automatically.
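
    One of the common loop patterns in such sparse/irregular codes is the irregular (histogram-style) reduction; a typical parallelization, sketched here generically in OpenMP C rather than taken from the paper, expands the reduction array into per-thread private copies that are merged afterwards:

        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        /* Irregular reduction y[idx[e]] += val[e] with a run-time index
           array, parallelized by "array expansion": each thread accumulates
           into a private copy of y, merged at the end. */
        int main(void)
        {
            enum { NE = 1 << 22, NY = 1 << 10 };
            int *idx = malloc(NE * sizeof *idx);
            double *val = malloc(NE * sizeof *val);
            double *y = calloc(NY, sizeof *y);
            for (int e = 0; e < NE; e++) {
                idx[e] = (int)((e * 2654435761u) % NY);  /* scrambled indices */
                val[e] = 1.0;
            }

            int nt = omp_get_max_threads();
            double *priv = calloc((size_t)nt * NY, sizeof *priv);

            #pragma omp parallel
            {
                double *mine = priv + (size_t)omp_get_thread_num() * NY;
                #pragma omp for schedule(static)
                for (int e = 0; e < NE; e++)
                    mine[idx[e]] += val[e];      /* no conflicts: private copy */
            }
            for (int t = 0; t < nt; t++)         /* merge private copies */
                for (int i = 0; i < NY; i++)
                    y[i] += priv[(size_t)t * NY + i];

            printf("y[0] = %g (expect about %g)\n", y[0], (double)NE / NY);
            free(idx); free(val); free(y); free(priv);
            return 0;
        }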

  11. Survey on present status and trend of parallel programming environments

    International Nuclear Information System (INIS)

    Takemiya, Hiroshi; Higuchi, Kenji; Honma, Ichiro; Ohta, Hirofumi; Kawasaki, Takuji; Imamura, Toshiyuki; Koide, Hiroshi; Akimoto, Masayuki.

    1997-03-01

    This report intends to provide useful information on software tools for parallel programming through a survey of the parallel programming environments of the following six parallel computers: Fujitsu VPP300/500, NEC SX-4, Hitachi SR2201, Cray T94, IBM SP, and Intel Paragon, all of which are installed at the Japan Atomic Energy Research Institute (JAERI). Moreover, the present status of R&D on parallel software, including parallel languages, compilers, debuggers, performance evaluation tools, and integrated tools, is reported. This survey has been made as a part of our project of developing basic software for parallel programming environments, which is designed on the concept of STA (Seamless Thinking Aid to programmers). (author)

  12. Parallelization for first principles electronic state calculation program

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Oguchi, Tamio.

    1997-03-01

    In this report we study the parallelization of a first-principles electronic state calculation program. The target machines are the NEC SX-4 for shared-memory parallelization and the FUJITSU VPP300 for distributed-memory parallelization. The features of each parallel machine are surveyed, and the parallelization methods suitable for each are proposed. It is shown that a 1.60-fold acceleration is achieved with 2-CPU parallelization on the SX-4, and a 4.97-fold acceleration with 12-PE parallelization on the VPP300. (author)

  13. An object-oriented programming paradigm for parallelization of computational fluid dynamics

    International Nuclear Information System (INIS)

    Ohta, Takashi.

    1997-03-01

    We propose an object-oriented programming paradigm for the parallelization of scientific computing programs, and show that the approach can be a very useful strategy. Generally, the parallelization of scientific programs tends to be complicated and unportable due to the specific requirements of each parallel computer or compiler. In this paper, we show that an object-oriented program design, which separates the parallel processing parts from the solver of the application, can achieve a large improvement in the maintainability of the codes, as well as high portability. We design a program for the two-dimensional Euler equations according to the paradigm, and evaluate its parallel performance on an IBM SP2. (author)
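
    As a rough illustration of the kind of separation the abstract describes (the class names below are hypothetical, not the paper's actual design), the solver can depend only on an abstract exchange interface, so the same numerical kernel runs serially or on top of a message passing layer without modification:

        // The solver sees only the abstract interface; a hypothetical
        // MpiExchanger subclass would wrap the message passing calls.
        struct HaloExchanger {
            virtual void exchange(double* field, int n) = 0;  // swap ghost cells
            virtual ~HaloExchanger() = default;
        };

        struct SerialExchanger : HaloExchanger {
            void exchange(double*, int) override {}           // nothing to do
        };

        void euler_step(double* field, int n, HaloExchanger& comm) {
            comm.exchange(field, n);  // parallel details hidden behind interface
            // ... update interior cells of 'field' ...
        }

        int main() {
            double field[8] = {0};
            SerialExchanger comm;
            euler_step(field, 8, comm);
        }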

  14. Testing New Programming Paradigms with NAS Parallel Benchmarks

    Science.gov (United States)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also in the increasing complexity of real applications. Technologies have been developed that aim at scaling up to thousands of processors on both distributed and shared memory systems. Developing parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. In recent years new efforts have been made in defining new parallel programming paradigms. The best examples are HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplifying the tedious tasks encountered in writing message passing programs. HPF is independent of the memory hierarchy; however, due to the immaturity of compiler technology, its performance is still questionable. Although the use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promises portability in a heterogeneous environment and offers the possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage…

  15. Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Mustafa Basthikodi

    2016-04-01

    Full Text Available Performance growth of single-core processors came to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks along with Graphical Processing Units have broadly empowered parallelism. Compilers are being updated to address the challenges of synchronization and threading. Appropriate program and algorithm classification can greatly help software engineers find opportunities for effective parallelization. In the present work we investigate current 'species' for the classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge classification. A set of algorithms is chosen whose structures match different issues and which perform a given task. We have tested these algorithms utilizing existing automatic species extraction tools along with the Bones compiler. We have added functionality to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by the original species of algorithms. We implemented the new features into the tool, enabling automatic characterization of program code.

  16. Development of massively parallel quantum chemistry program SMASH

    International Nuclear Information System (INIS)

    Ishimura, Kazuya

    2015-01-01

    A massively parallel program for quantum chemistry calculations, SMASH, was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with the MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program development simple. The speed-up of the B3LYP energy calculation for (C₁₅₀H₃₀)₂ with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
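
    SMASH itself is Fortran90/95; the minimal C++ skeleton below merely sketches the hybrid MPI plus OpenMP pattern such codes are built on, with MPI distributing work across nodes and OpenMP threads sharing each node (the loop body is a stand-in for integral evaluation):

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv) {
            int provided, rank, size;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double local = 0.0;
            #pragma omp parallel for reduction(+:local)
            for (int i = rank; i < 1000000; i += size)   // cyclic work split
                local += 1.0 / (1.0 + i);                // stand-in workload

            double total = 0.0;
            MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0) std::printf("sum = %f\n", total);
            MPI_Finalize();
        }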

  17. Tuned In emotion regulation program using music listening: Effectiveness for adolescents in educational settings

    Directory of Open Access Journals (Sweden)

    Genevieve Anita Dingle

    2016-06-01

    Full Text Available This paper presents an effectiveness study of Tuned In, a novel emotion regulation intervention that uses participant selected music to evoke emotions in session and teaches participants emotional awareness and regulation skills. The group program content is informed by a two dimensional model of emotion (arousal, valence), along with music psychology theories about how music evokes emotional responses. The program has been evaluated in two samples of adolescents: 41 at risk adolescents (76% males; Mage = 14.8 years) attending an educational re-engagement program and 216 students (100% females; Mage = 13.6 years) attending a mainstream secondary school. Results showed significant pre- to post-program improvements in measures of emotion awareness, identification, and regulation (p < .01 to p = .06 in the smaller at risk sample and all p < .001 in the mainstream school sample). Participant ratings of engagement and likelihood of using the strategies learned in the program were high. Tuned In shows promise as a brief emotion regulation intervention for adolescents, and these findings extend an earlier study with young adults. Tuned In is a-theoretical in regard to psychotherapeutic approach and could be integrated with other program components as required.

  18. Tuned In Emotion Regulation Program Using Music Listening: Effectiveness for Adolescents in Educational Settings.

    Science.gov (United States)

    Dingle, Genevieve A; Hodges, Joseph; Kunde, Ashleigh

    2016-01-01

    This paper presents an effectiveness study of Tuned In, a novel emotion regulation intervention that uses participant selected music to evoke emotions in session and teaches participants emotional awareness and regulation skills. The group program content is informed by a two dimensional model of emotion (arousal, valence), along with music psychology theories about how music evokes emotional responses. The program has been evaluated in two samples of adolescents: 41 "at risk" adolescents (76% males; M age = 14.8 years) attending an educational re-engagement program and 216 students (100% females; M age = 13.6 years) attending a mainstream secondary school. Results showed significant pre- to post-program improvements in measures of emotion awareness, identification, and regulation (p < 0.01 to p = 0.06 in the smaller "at risk" sample and all p < 0.001 in the mainstream school sample). Participant ratings of engagement and likelihood of using the strategies learned in the program were high. Tuned In shows promise as a brief emotion regulation intervention for adolescents, and these findings extend an earlier study with young adults. Tuned In is a-theoretical in regard to psychotherapeutic approach and could be integrated with other program components as required.

  19. Parallelization for X-ray crystal structural analysis program

    Energy Technology Data Exchange (ETDEWEB)

    Watanabe, Hiroshi [Japan Atomic Energy Research Inst., Tokyo (Japan); Minami, Masayuki; Yamamoto, Akiji

    1997-10-01

    In this report we study the vectorization and parallelization of an X-ray crystal structural analysis program. The target machine is the NEC SX-4, a distributed/shared memory type vector parallel supercomputer. X-ray crystal structural analysis is surveyed, and a new multi-dimensional discrete Fourier transform method is proposed. The new method is designed to have a very long vector length, which enables it to obtain 12.0 times higher performance than the original code. Besides the above-mentioned vectorization, parallelization by micro-task functions on the SX-4 reaches a 13.7-fold acceleration in the multi-dimensional discrete Fourier transform part with 14 CPUs, and a 3.0-fold acceleration of the whole program. In total, a 35.9-fold acceleration over the original 1-CPU scalar version is achieved with vectorization and parallelization on the SX-4. (author)

  20. The FORCE: A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.

  1. The FORCE - A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  2. Tuning and History: A Personal Overview

    Science.gov (United States)

    Isaacs, Ann Katherine

    2017-01-01

    The text places Tuning History in the context of the rapidly developing international collaboration among historians which began in Europe in 1989, with the ECTS Pilot project, and continued, from 2000 on, with the European History Networks (for research and for curriculum development) working in parallel and in collaboration with Tuning, in…

  3. MPI_XSTAR: MPI-based Parallelization of the XSTAR Photoionization Program

    Science.gov (United States)

    Danehkar, Ashkbiz; Nowak, Michael A.; Lee, Julia C.; Smith, Randall K.

    2018-02-01

    We describe a program for the parallel implementation of multiple runs of XSTAR, a photoionization code that is used to predict the physical properties of an ionized gas from its emission and/or absorption lines. The parallelization program, called MPI_XSTAR, has been developed and implemented in the C++ language using the Message Passing Interface (MPI) protocol, a conventional standard of parallel computing. We have benchmarked parallel multiprocessing executions of XSTAR, using MPI_XSTAR, against a serial execution of XSTAR, in terms of the parallelization speedup and the computing resource efficiency. Our experience indicates that the parallel execution runs significantly faster than the serial execution; however, the efficiency in terms of computing resource usage decreases as the number of processors used in the parallel computation increases.
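
    The core idea, distributing independent XSTAR runs over MPI ranks, can be sketched as follows (hedged: the command strings are placeholders, not real XSTAR invocation syntax, and MPI_XSTAR's actual scheduling is more elaborate):

        #include <mpi.h>
        #include <cstdlib>
        #include <string>
        #include <vector>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            // One prepared command line per model grid point (placeholders).
            std::vector<std::string> runs =
                { "./xstar_run_0.sh", "./xstar_run_1.sh", "./xstar_run_2.sh" };

            for (std::size_t i = rank; i < runs.size(); i += size)
                std::system(runs[i].c_str());   // launch this rank's share

            MPI_Barrier(MPI_COMM_WORLD);        // wait for all runs to finish
            MPI_Finalize();
        }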

  4. Development of massively parallel quantum chemistry program SMASH

    Energy Technology Data Exchange (ETDEWEB)

    Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C₁₅₀H₃₀)₂ with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  5. Three dimensional Burn-up program parallelization using socket programming

    International Nuclear Information System (INIS)

    Haliyati R, Evi; Su'ud, Zaki

    2002-01-01

    A computer parallelization process was built with the purpose of decreasing the execution time of a physics program. In this case, a multi-computer system was built and used to analyze the burn-up process of a nuclear reactor. This multi-computer system was designed using a communication protocol among sockets, i.e. TCP/IP. The system consists of one computer acting as a server and the rest as clients. The server has main control over all its clients. The server also divides the reactor core geometrically into n parts in accordance with the number of clients; each computer, including the server, has the task of conducting the burn-up analysis of 1/n of the total reactor core volume. This burn-up analysis is conducted simultaneously and in parallel by all computers, so a program execution time close to 1/n times that of one computer is achieved. An analysis was then carried out, which states that in order to calculate the density of atoms in a reactor of 91 cm x 91 cm x 116 cm, the use of a parallel system of 2 computers has the highest efficiency.
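
    A hedged sketch of the server side of such a scheme is shown below: the server accepts n clients and sends each the bounds of its 1/n slab of the core. Error handling is omitted and the two-integer message format is illustrative only, not the paper's actual protocol.

        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <sys/socket.h>
        #include <unistd.h>

        int main() {
            const int n_clients = 2, nz_total = 116;   // e.g. 116 axial cells
            int srv = socket(AF_INET, SOCK_STREAM, 0);
            sockaddr_in addr{};
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = INADDR_ANY;
            addr.sin_port = htons(5000);
            bind(srv, (sockaddr*)&addr, sizeof(addr));
            listen(srv, n_clients);

            for (int c = 0; c < n_clients; ++c) {
                int conn = accept(srv, nullptr, nullptr);
                int slab[2] = { c * nz_total / n_clients,         // first plane
                                (c + 1) * nz_total / n_clients }; // one past last
                send(conn, slab, sizeof(slab), 0);  // client burns up this slab
                close(conn);
            }
            close(srv);
        }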

  6. Integrated Task And Data Parallel Programming: Language Design

    Science.gov (United States)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support the creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated…

  7. User's guide of parallel program development environment (PPDE). The 2nd edition

    International Nuclear Information System (INIS)

    Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio; Ohta, Hirofumi

    2000-03-01

    The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D efforts on parallel processing technology. The enhancement has been made by extending the functions of the PPDE, the Parallel Program Development Environment in the STA basic system. The extended PPDE provides: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) multi-tools execution, which makes the tools on heterogeneous computers execute a task on a computer with one operation, and 3) mirror composition, to reflect the editing results of a file on one computer into all related files on the other computers. These additional functions will enhance the work efficiency of program development across computers. More functions have been added to the PPDE to help with parallel program development. New functions were also designed to complement an HPF translator and a parallelizing support tool when working together, so that a sequential program is efficiently converted to a parallel program. This report describes the use of the extended PPDE. (author)

  8. A Programming Model for Massive Data Parallelism with Data Dependencies

    International Nuclear Information System (INIS)

    Cui, Xiaohui; Mueller, Frank; Potok, Thomas E.; Zhang, Yongpeng

    2009-01-01

    Accelerating processors can often be more cost and energy effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processor units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach: we run massively data-parallel applications on GPU clusters, and we propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.

  9. An environment for parallel structuring of Fortran programs

    International Nuclear Information System (INIS)

    Sridharan, K.; McShea, M.; Denton, C.; Eventoff, B.; Browne, J.C.; Newton, P.; Ellis, M.; Grossbard, D.; Wise, T.; Clemmer, D.

    1990-01-01

    The paper describes and illustrates an environment for interactive support of the detection and implementation of macro-level parallelism in Fortran programs. The approach couples algorithms for dependence analysis with both innovative techniques for complexity management and capabilities for the measurement and analysis of the parallel computation structures generated through use of the environment. The resulting environment is complementary to the more common approach of seeking local parallelism by loop unrolling, either by an automatic compiler or manually. (orig.)

  10. Adapting high-level language programs for parallel processing using data flow

    Science.gov (United States)

    Standley, Hilda M.

    1988-01-01

    EASY-FLOW, a very high-level data flow language, is introduced for the purpose of adapting programs written in a conventional high-level language to a parallel environment. The level of parallelism provided is of the large-grained variety in which parallel activities take place between subprograms or processes. A program written in EASY-FLOW is a set of subprogram calls as units, structured by iteration, branching, and distribution constructs. A data flow graph may be deduced from an EASY-FLOW program.

  11. Compiling Scientific Programs for Scalable Parallel Systems

    National Research Council Canada - National Science Library

    Kennedy, Ken

    2001-01-01

    ...). The research performed in this project included new techniques for recognizing implicit parallelism in sequential programs, a powerful and precise set-based framework for analysis and transformation...

  12. User's guide of parallel program development environment (PPDE). The 2nd edition

    Energy Technology Data Exchange (ETDEWEB)

    Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)

    2000-03-01

    The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D efforts on parallel processing technology. The enhancement has been made by extending the functions of the PPDE, the Parallel Program Development Environment in the STA basic system. The extended PPDE provides: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) multi-tools execution, which makes the tools on heterogeneous computers execute a task on a computer with one operation, and 3) mirror composition, to reflect the editing results of a file on one computer into all related files on the other computers. These additional functions will enhance the work efficiency of program development across computers. More functions have been added to the PPDE to help with parallel program development. New functions were also designed to complement an HPF translator and a parallelizing support tool when working together, so that a sequential program is efficiently converted to a parallel program. This report describes the use of the extended PPDE. (author)

  13. Program For Parallel Discrete-Event Simulation

    Science.gov (United States)

    Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

    1991-01-01

    User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.

  14. Dynamic Modeling and Fuzzy Self-Tuning Disturbance Decoupling Control for a 3-DOF Serial-Parallel Hybrid Humanoid Arm

    Directory of Open Access Journals (Sweden)

    Yueling Wang

    2013-01-01

    Full Text Available A unique fuzzy self-tuning disturbance decoupling controller (FSDDC) is designed for a serial-parallel hybrid humanoid arm (HHA) to implement the throwing trajectory-tracking mission. Firstly, the dynamic model of the HHA is established and the input signal of the throwing process is obtained by studying the throwing process of the human arm. Secondly, the FSDDC, incorporating the disturbance decoupling controller (DDC) and the fuzzy logic controller (FLC), is designed to ensure trajectory tracking of the HHA in the presence of uncertainties and disturbances. With the FSDDC method, the HHA system can be decoupled by actively estimating and rejecting the effects of both the internal plant dynamics and external disturbances. The self-tuning parameters are adapted online to improve the performance of the FSDDC; thus, the presented FSDDC does not require detailed system parameters. Finally, the controller introduced is compared with a PD controller, which is commonly used for robot manipulator control in industry. The effectiveness of the designed FSDDC is illustrated by simulations.

  15. Cell verification of parallel burnup calculation program MCBMPI based on MPI

    International Nuclear Information System (INIS)

    Yang Wankui; Liu Yaoguang; Ma Jimin; Wang Guanbo; Yang Xin; She Ding

    2014-01-01

    The parallel burnup calculation program MCBMPI was developed. The program is modularized. The parallelized MCNP5 program MCNP5MPI is employed as the neutron transport calculation module, and a composite of three solution methods is used to solve the burnup equation, i.e. the matrix exponential technique, the TTA analytical solution, and Gauss-Seidel iteration. An MPI parallel zone decomposition strategy is included in the program. The program system only consists of MCNP5MPI and a burnup subroutine. The latter achieves three main functions, i.e. zone decomposition, nuclide transfer and decay, and data exchange with MCNP5MPI. The program was verified with the pressurized water reactor (PWR) cell burnup benchmark. The results show that the program can be applied to the burnup calculation of multiple zones, and that the computation efficiency can be significantly improved with the development of computer hardware. (authors)

  16. Vdebug: debugging tool for parallel scientific programs. Design report on vdebug

    International Nuclear Information System (INIS)

    Matsuda, Katsuyuki; Takemiya, Hiroshi

    2000-02-01

    We report on a debugging tool called vdebug which supports the debugging of parallel scientific simulation programs. It is difficult to debug scientific programs with an existing debugger, because the volume of data generated by the programs is too large for users to check in character form, which is how an existing debugger usually shows data values. To alleviate this, we have developed vdebug, which makes it possible to check the validity of large amounts of data by showing these data values visually. Although the targets of vdebug had been restricted to sequential programs, we have made it applicable to parallel programs by realizing the function of merging and visualizing data distributed over the programs on each computer node. vdebug now works on seven kinds of parallel computers. In this report, we describe the design of vdebug. (author)

  17. From sequential to parallel programming with patterns

    CERN Document Server

    CERN. Geneva

    2018-01-01

    To increase both performance and efficiency, our programming models need to adapt to better exploit modern processors. The classic idioms and patterns for programming, such as loops, branches or recursion, are the pillars of almost every code and are well known among all programmers. These patterns all have in common that they are sequential in nature. Embracing parallel programming patterns, which allow us to program multi- and many-core hardware in a natural way, greatly simplifies the task of designing a program that scales and performs on modern hardware, independently of the programming language used, and in a generic way.
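
    The simplest such pattern is a parallel map: a loop with no cross-iteration dependences becomes a declarative algorithm call that the runtime may spread over many cores. A generic C++17 illustration of the pattern (not code from the talk):

        #include <algorithm>
        #include <execution>
        #include <vector>

        int main() {
            std::vector<double> x(1 << 20, 1.0), y(x.size());
            // map: y[i] = f(x[i]) for all i, with independent iterations
            std::transform(std::execution::par, x.begin(), x.end(), y.begin(),
                           [](double v) { return 2.0 * v + 1.0; });
        }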

  18. On program restructuring, scheduling, and communication for parallel processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Polychronopoulos, Constantine D. [Univ. of Illinois, Urbana, IL (United States)

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed with a single goal in mind: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into a parallel form and to conduct experiments. Two new program restructuring techniques are presented: loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm, and its performance is compared to that of self-scheduling. Techniques for program partitioning and the minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup is discussed and experimental results are presented.
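
    In its textbook form (shown below as an illustrative C++ sketch, not code from the dissertation), loop coalescing flattens a loop nest into a single loop so that all N*M iterations form one pool that can be scheduled across processors:

        // Before: the nested loops expose only N chunks of parallel work.
        void before(double* a, int N, int M) {
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < M; ++j)
                    a[i * M + j] *= 2.0;
        }

        // After: one coalesced loop exposes N*M schedulable iterations.
        void after(double* a, int N, int M) {
            #pragma omp parallel for
            for (int k = 0; k < N * M; ++k) {
                int i = k / M, j = k % M;   // recover the original indices
                a[i * M + j] *= 2.0;
            }
        }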

  19. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    Directory of Open Access Journals (Sweden)

    Stephen L. Olivier

    2013-01-01

    Full Text Available Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  20. Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

    International Nuclear Information System (INIS)

    Baron, E.; Hauschildt, Peter H.

    1998-01-01

    We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000 - 300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard message passing interface (MPI) library calls and is fully portable between serial and parallel computers. copyright 1998 The American Astronomical Society
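
    The pipelined design can be pictured as each rank blocking on the state of the previous wavelength point, solving its own point, and forwarding the result downstream as soon as it is known. A hedged MPI sketch of one pipeline stage (illustrative only; PHOENIX's actual data layout and message structure differ):

        #include <mpi.h>
        #include <vector>

        void pipeline_step(std::vector<double>& state, int rank, int size) {
            MPI_Status st;
            if (rank > 0)        // wait for the upstream wavelength point
                MPI_Recv(state.data(), (int)state.size(), MPI_DOUBLE,
                         rank - 1, 0, MPI_COMM_WORLD, &st);

            // ... solve radiative transfer for this rank's wavelength point ...

            if (rank < size - 1) // forward the state as soon as it is known
                MPI_Send(state.data(), (int)state.size(), MPI_DOUBLE,
                         rank + 1, 0, MPI_COMM_WORLD);
        }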

  1. Vectorization, parallelization and porting of nuclear codes. 2001

    International Nuclear Information System (INIS)

    Akiyama, Mitsunaga; Katakura, Fumishige; Kume, Etsuo; Nemoto, Toshiyuki; Tsuruoka, Takuya; Adachi, Masaaki

    2003-07-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the supercomputer system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. We dealt with 10 codes in fiscal 2001. In this report, the parallelization of the Neutron Radiography 3-Dimensional CT code NR3DCT, the vectorization of the unsteady-state heat conduction code THERMO3D, the porting of an initial program of MHD simulation, the tuning of the Heat And Mass Balance Analysis Code HAMBAC, the porting and parallelization of the Monte Carlo N-Particle transport code MCNP4C3, the porting and parallelization of the Monte Carlo N-Particle transport code system MCNPX2.1.5, the porting of the induced activity calculation code CINAC-V4, the use of the VisLink library in the multidimensional two-fluid model code ACD3D, and the porting of an experiment data processing code from the GS8500 to the SR8000 are described. (author)

  2. Parallel Volunteer Learning during Youth Programs

    Science.gov (United States)

    Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

    2012-01-01

    Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…

  3. Communications oriented programming of parallel iterative solutions of sparse linear systems

    Science.gov (United States)

    Patrick, M. L.; Pratt, T. W.

    1986-01-01

    Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.

  4. Massive parallel electromagnetic field simulation program JEMS-FDTD design and implementation on jasmin

    International Nuclear Information System (INIS)

    Li Hanyu; Zhou Haijing; Dong Zhiwei; Liao Cheng; Chang Lei; Cao Xiaolin; Xiao Li

    2010-01-01

    A large-scale parallel electromagnetic field simulation program, JEMS-FDTD (J Electromagnetic Solver-Finite Difference Time Domain), has been designed and implemented on JASMIN (J parallel Adaptive Structured Mesh applications INfrastructure). This program can simulate the propagation, radiation, and coupling of electromagnetic fields by solving the Maxwell equations on a structured mesh explicitly with the FDTD method. JEMS-FDTD is able to simulate billion-mesh-scale problems on thousands of processors. In this article, the program is verified by simulating the radiation of an electric dipole. A beam waveguide is simulated to demonstrate the capability of large scale parallel computation. A parallel performance test indicates that a high parallel efficiency is obtained. (authors)
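
    The explicit FDTD update at the heart of such a solver is a pair of stencil sweeps, which is what makes the method straightforward to parallelize over a decomposed mesh. A minimal 1-D sketch (illustrative only; JEMS-FDTD itself operates on 3-D decomposed meshes):

        #include <vector>

        void fdtd_step(std::vector<double>& Ez, std::vector<double>& Hy,
                       double c1, double c2) {
            const int n = (int)Ez.size();
            #pragma omp parallel for
            for (int i = 0; i < n - 1; ++i)   // update the magnetic field
                Hy[i] += c1 * (Ez[i + 1] - Ez[i]);
            #pragma omp parallel for
            for (int i = 1; i < n; ++i)       // then the electric field
                Ez[i] += c2 * (Hy[i] - Hy[i - 1]);
        }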

  5. Java parallel secure stream for grid computing

    International Nuclear Information System (INIS)

    Chen, J.; Akers, W.; Chen, Y.; Watson, W.

    2001-01-01

    The emergence of high speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, owing to the tuning of the TCP window size needed to improve bandwidth and reduce latency on a high speed wide area network. The authors present a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously, allowing Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel streams is more effective than tuning the TCP window size. In addition, an X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally a few applications using this package are discussed.

  6. Fiscal 2000 report on advanced parallelized compiler technology. Outlines; 2000 nendo advanced heiretsuka compiler gijutsu hokokusho (Gaiyo hen)

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2001-03-01

    Research and development was carried out on automatic parallelizing compiler technology, which improves the practical performance, cost/performance ratio, and ease of use of the multiprocessor systems now used for constructing supercomputers and expected to provide a fundamental architecture for microprocessors in the 21st century. Efforts were made to develop an automatic multigrain parallelization technology for extracting multigrain parallelism from a program and making full use of it, and a parallelizing tuning technology for accelerating parallelization by feeding back to the compiler the dynamic information and user knowledge acquired during execution. Moreover, a benchmark program was selected and studies were made to set execution rules and evaluation indexes for the establishment of technologies for objectively evaluating the performance of parallelizing compilers for the existing commercial parallel processing computers. All this was achieved through the implementation and evaluation of the 'Advanced parallelizing compiler technology research and development project.' (NEDO)

  7. Development and benchmark verification of a parallelized Monte Carlo burnup calculation program MCBMPI

    International Nuclear Information System (INIS)

    Yang Wankui; Liu Yaoguang; Ma Jimin; Yang Xin; Wang Guanbo

    2014-01-01

    MCBMPI, a parallelized burnup calculation program, was developed. The program is modularized. The neutron transport calculation module employs the parallelized MCNP5 program MCNP5MPI, and the burnup calculation module employs ORIGEN2, with an MPI parallel zone decomposition strategy. The program system only consists of MCNP5MPI and an interface subroutine. The interface subroutine achieves three main functions, i.e. zone decomposition, nuclide transfer and decay, and data exchange with MCNP5MPI. The program was verified with the Pressurized Water Reactor (PWR) cell burnup benchmark; the results showed that the program can be applied to the burnup calculation of multiple zones, and that the computation efficiency can be significantly improved with the development of computer hardware. (authors)

  8. A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 1

    International Nuclear Information System (INIS)

    Ernenwein, R.; Rohmer, M.M.; Benard, M.

    1990-01-01

    We present a program system for ab initio molecular orbital calculations on vector and parallel computers. The present article is devoted to the computation of one- and two-electron integrals over contracted Gaussian basis sets involving s-, p-, d- and f-type functions. The McMurchie and Davidson (MMD) algorithm has been implemented and parallelized by distributing over a limited number of logical tasks the calculation of the 55 relevant classes of integrals. All sections of the MMD algorithm have been efficiently vectorized, leading to a scalar/vector ratio of 5.8. Different algorithms are proposed and compared for an optimal vectorization of the contraction of the 'intermediate integrals' generated by the MMD formalism. Advantage is taken of the dynamic storage allocation for tuning the length of the vector loops (i.e. the size of the vectorization buffer) as a function of (i) the total memory available for the job, (ii) the number of logical tasks defined by the user (≤13), and (iii) the storage requested by each specific class of integrals. Test calculations carried out on a CRAY-2 computer show that the average number of finite integrals computed over a (s, p, d, f) CGTO basis set is about 1180000 per second and per processor. The combination of vectorization and parallelism on this 4-processor machine reduces the CPU time by a factor larger than 20 with respect to the scalar and sequential performance. (orig.)

  9. User's guide of parallel program development environment (PPDE). The 2nd edition

    Energy Technology Data Exchange (ETDEWEB)

    Ueno, Hirokazu; Takemiya, Hiroshi; Imamura, Toshiyuki; Koide, Hiroshi; Matsuda, Katsuyuki; Higuchi, Kenji; Hirayama, Toshio [Center for Promotion of Computational Science and Engineering, Japan Atomic Energy Research Institute, Tokyo (Japan); Ohta, Hirofumi [Hitachi Ltd., Tokyo (Japan)

    2000-03-01

    The STA basic system has been enhanced to accelerate support for parallel programming on heterogeneous parallel computers, through a series of R and D efforts on parallel processing technology. The enhancement has been made by extending the functions of the PPDE, the Parallel Program Development Environment in the STA basic system. The extended PPDE provides: 1) the automatic creation of a 'makefile' and a shell script file for its execution, 2) multi-tools execution, which makes the tools on heterogeneous computers execute a task on a computer with one operation, and 3) mirror composition, to reflect the editing results of a file on one computer into all related files on the other computers. These additional functions will enhance the work efficiency of program development across computers. More functions have been added to the PPDE to help with parallel program development. New functions were also designed to complement an HPF translator and a parallelizing support tool when working together, so that a sequential program is efficiently converted to a parallel program. This report describes the use of the extended PPDE. (author)

  10. Compiling the parallel programming language NestStep to the CELL processor

    OpenAIRE

    Holm, Magnus

    2010-01-01

    The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compil...

  11. A language for data-parallel and task parallel programming dedicated to multi-SIMD computers. Contributions to hydrodynamic simulation with lattice gases

    International Nuclear Information System (INIS)

    Pic, Marc Michel

    1995-01-01

    Parallel programming covers task-parallelism and data-parallelism, and many problems need both. Multi-SIMD computers allow a hierarchical approach to these parallelisms. The T++ language, based on C++, is dedicated to exploiting Multi-SIMD computers using a programming paradigm which extends array-programming to task management. Our language introduces arrays of independent tasks, executed separately (MIMD) on subsets of processors with identical behaviour (SIMD), in order to express the hierarchical inclusion of data-parallelism within task-parallelism. To manipulate tasks and data in a symmetrical way, we propose meta-operations which have the same behaviour on task arrays and on data arrays. We explain how to implement this language on our parallel computer SYMPHONIE so as to profit from the locally-shared memory, the hardware virtualization, and the multiplicity of communication networks. We also analyse a typical application for such an architecture: finite element schemes for fluid mechanics need powerful parallel computers and require large floating-point capabilities. Lattice gases are an alternative to such simulations. Boolean lattice gases are simple, stable and modular, and need no floating-point computation, but they include numerical noise. Boltzmann lattice gases offer high computational precision, but need floating-point operations and are only locally stable. We propose a new scheme, called multi-bit, which keeps the advantages of each boolean model to which it is applied, with high numerical precision and reduced noise. Experiments on viscosity, physical behaviour, noise reduction and spurious invariants are shown, and implementation techniques for parallel Multi-SIMD computers are detailed. (author) [fr]

  12. Contributions to computational stereology and parallel programming

    DEFF Research Database (Denmark)

    Rasmusson, Allan

    rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has led to a significant change in paradigm for a previously developed surgical…

  13. Program Correctness, Verification and Testing for Exascale (Corvette)

    Energy Technology Data Exchange (ETDEWEB)

    Sen, Koushik [Univ. of California, Berkeley, CA (United States); Iancu, Costin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Demmel, James W [UC Berkeley

    2018-01-26

    The goal of this project is to provide tools to assess the correctness of parallel programs written using hybrid parallelism. There is a dire lack of both theoretical and engineering know-how in the area of finding bugs in hybrid or large scale parallel programs, which our research aims to change. In the project we have demonstrated novel approaches in several areas: 1. Low overhead automated and precise detection of concurrency bugs at scale. 2. Using low overhead bug detection tools to guide speculative program transformations for performance. 3. Techniques to reduce the concurrency required to reproduce a bug using partial program restart/replay. 4. Techniques to provide reproducible execution of floating point programs. 5. Techniques for tuning the floating point precision used in codes.

  14. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Full Text Available Hybrid parallel multicore architectures based on graphics processing units (GPUs) can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware displays a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in an unintrusive manner while preserving the legacy code.

  15. A novel two-level dynamic parallel data scheme for large 3-D SN calculations

    International Nuclear Information System (INIS)

    Sjoden, G.E.; Shedlock, D.; Haghighat, A.; Yi, C.

    2005-01-01

    We introduce a new dynamic parallel memory optimization scheme for executing large scale 3-D discrete ordinates (Sn) simulations on distributed memory parallel computers. In order for parallel transport codes to be truly scalable, they must use parallel data storage, where only the variables that are locally computed are locally stored. Even with parallel data storage for the angular variables, cumulative storage requirements for large discrete ordinates calculations can be prohibitive. To address this problem, Memory Tuning has been implemented into the PENTRAN 3-D parallel discrete ordinates code as an optimized, two-level ('large' array, 'small' array) parallel data storage scheme. Memory Tuning can be described as the process of parallel data memory optimization. Memory Tuning dynamically minimizes the amount of required parallel data in allocated memory on each processor using a statistical sampling algorithm. This algorithm is based on the integral average and standard deviation of the number of fine meshes contained in each coarse mesh in the global problem. Because PENTRAN only stores the locally computed problem phase space, optimal two-level memory assignments can be unique on each node, depending upon the parallel decomposition used (hybrid combinations of angular, energy, or spatial). As demonstrated in the two large discrete ordinates models presented (a storage cask and an OECD MOX Benchmark), Memory Tuning can save a substantial amount of memory per parallel processor, allowing one to accomplish very large scale Sn computations. (authors)

  16. iTunes music

    CERN Document Server

    Katz, Bob

    2013-01-01

    Apple's exciting new Mastered for iTunes (MFiT) initiative, introduced in early 2012, introduces new possibilities for delivering high-quality audio. For the first time, record labels and program producers are encouraged to deliver audio materials to iTunes in a high resolution format, which can produce better-sounding masters. In iTunes Music, author and world-class mastering engineer Bob Katz starts out with the basics, surveys the recent past, and brings you quickly up to the present-where the current state of digital audio is bleak. Katz explains the evolution of

  17. PAREMD: A parallel program for the evaluation of momentum space properties of atoms and molecules

    Science.gov (United States)

    Meena, Deep Raj; Gadre, Shridhar R.; Balanarayan, P.

    2018-03-01

    The present work describes a code for evaluating the electron momentum density (EMD), its moments and the associated Shannon information entropy for a multi-electron molecular system. The code works specifically for electronic wave functions obtained from traditional electronic structure packages such as GAMESS and GAUSSIAN. For the momentum space orbitals, the general expression for Gaussian basis sets in position space is analytically Fourier transformed to momentum space Gaussian basis functions. The molecular orbital coefficients of the wave function are taken as an input from the output file of the electronic structure calculation. The analytic expressions of EMD are evaluated over a fine grid and the accuracy of the code is verified by a normalization check and a numerical kinetic energy evaluation which is compared with the analytic kinetic energy given by the electronic structure package. Apart from electron momentum density, electron density in position space has also been integrated into this package. The program is written in C++ and is executed through a Shell script. It is also tuned for multicore machines with shared memory through OpenMP. The program has been tested for a variety of molecules and correlated methods such as CISD, Møller-Plesset second order (MP2) theory and density functional methods. For correlated methods, the PAREMD program uses natural spin orbitals as an input. The program has been benchmarked for a variety of Gaussian basis sets for different molecules showing a linear speedup on a parallel architecture.
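
    The OpenMP pattern involved, a parallel sweep over the momentum grid with a reduction used for the normalization check, can be sketched as follows (hedged: emd() is a placeholder, not PAREMD's actual Gaussian-basis momentum density):

        #include <cmath>

        double emd(double p) { return std::exp(-p * p); }  // placeholder density

        double normalization(int npts, double pmax) {
            const double PI = 3.141592653589793;
            const double dp = pmax / npts;
            double sum = 0.0;
            #pragma omp parallel for reduction(+:sum)
            for (int i = 0; i < npts; ++i) {
                double p = (i + 0.5) * dp;              // radial midpoint rule
                sum += 4.0 * PI * p * p * emd(p) * dp;  // spherical shell
            }
            return sum;  // should approach the electron count
        }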

  18. Control of Fermilab Booster tunes

    International Nuclear Information System (INIS)

    Johnson, R.P.; Meisner, K.; Sandberg, B.

    1977-01-01

    Control of the radial and vertical tunes of the booster is implemented using ramped correction quadrupoles. Minor modifications to the power supply cards for the 48 (previously dc) correction quadrupoles allow "the tunes" to be continuously programmed or held constant throughout the 33 ms acceleration cycle. This capability is in addition to the usual use of these quadrupoles, independently varied, to correct for harmonic distortions in the lattice. An automatic computer program measures and displays the tunes vs. time in the cycle to monitor performance and to allow the ramps to be adjusted by the machine operator.

  19. Comparative Study of Dynamic Programming and Pontryagin’s Minimum Principle on Energy Management for a Parallel Hybrid Electric Vehicle

    Directory of Open Access Journals (Sweden)

    Huei Peng

    2013-04-01

    Full Text Available This paper compares two optimal energy management methods for parallel hybrid electric vehicles using an Automatic Manual Transmission (AMT). A control-oriented model of the powertrain and vehicle dynamics is built first. The energy management is formulated as a typical optimal control problem to trade off the fuel consumption and gear shifting frequency under admissible constraints. Dynamic Programming (DP) and Pontryagin's Minimum Principle (PMP) are applied to obtain the optimal solutions. Tuned with the appropriate co-states, the PMP solution is found to be very close to that from DP. The solution for the gear shifting in PMP has an algebraic expression associated with the vehicular velocity and can be implemented more efficiently in the control algorithm. The computation time of PMP is significantly less than that of DP.
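
    The DP side of such a comparison is a backward recursion over a discretized state-of-charge (SOC) grid; at each time step and SOC node the power split minimizing fuel cost plus cost-to-go is kept. A hedged sketch with placeholder model functions (fuel() and next_soc() stand in for the powertrain model, and gear shifting penalties are omitted):

        #include <algorithm>
        #include <limits>
        #include <vector>

        double fuel(int k, int s, int u) { return 0.1 * u; }   // placeholder cost
        int next_soc(int s, int u) { return s + (u - 1); }     // placeholder model

        std::vector<std::vector<double>> solve_dp(int T, int NS, int NU) {
            std::vector<std::vector<double>> J(T + 1, std::vector<double>(NS, 0.0));
            for (int k = T - 1; k >= 0; --k)          // backward in time
                for (int s = 0; s < NS; ++s) {
                    double best = std::numeric_limits<double>::infinity();
                    for (int u = 0; u < NU; ++u) {    // enumerate power splits
                        int s2 = next_soc(s, u);
                        if (s2 < 0 || s2 >= NS) continue;  // infeasible transition
                        best = std::min(best, fuel(k, s, u) + J[k + 1][s2]);
                    }
                    J[k][s] = best;                   // optimal cost-to-go
                }
            return J;
        }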

  20. Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

    Science.gov (United States)

    Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

    2013-10-24

    Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect the parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms), in contrast, still reflected plans made before the transition, consistent with the parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.

  1. Basic design of parallel computational program for probabilistic structural analysis

    International Nuclear Information System (INIS)

    Kaji, Yoshiyuki; Arai, Taketoshi; Gu, Wenwei; Nakamura, Hitoshi

    1999-06-01

    In our laboratory, as part of the 'development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory' (nuclear computational science cross-over research), we are examining computational methods for a massively parallel computation system that couples a material strength theory, based on microscopic fracture mechanics for latent cracks, with a continuum structural model, in order to develop new structural reliability evaluation methods for ceramic structures. This technical report reviews probabilistic structural mechanics theory, basic formulae, and the parallel-computation programming methods that bear on the principal elements in the basic design of the computational mechanics program. (author)

  3. Programming Models in HPC

    Energy Technology Data Exchange (ETDEWEB)

    Shipman, Galen M. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-06-13

    These are the slides for a presentation on programming models in HPC, at the Los Alamos National Laboratory's Parallel Computing Summer School. The following topics are covered: Flynn's Taxonomy of computer architectures; single instruction single data; single instruction multiple data; multiple instruction multiple data; address space organization; definition of Trinity (Intel Xeon-Phi is a MIMD architecture); single program multiple data; multiple program multiple data; ExMatEx workflow overview; definition of a programming model, programming languages, runtime systems; programming model and environments; MPI (Message Passing Interface); OpenMP; Kokkos (Performance Portable Thread-Parallel Programming Model); Kokkos abstractions, patterns, policies, and spaces; RAJA, a systematic approach to node-level portability and tuning; overview of the Legion Programming Model; mapping tasks and data to hardware resources; interoperability: supporting task-level models; Legion S3D execution and performance details; workflow, integration of external resources into the programming model.
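
    The single program multiple data (SPMD) style mentioned in the slides can be summarized in a few lines of MPI; this is a generic textbook example, not material from the presentation.

    /* Minimal SPMD illustration: one program, every rank runs the same
     * code and branches on its rank (assumes an MPI installation). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0)
            printf("coordinator among %d processes\n", size);
        else
            printf("worker %d\n", rank);
        MPI_Finalize();
        return 0;
    }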

  4. On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations

    Directory of Open Access Journals (Sweden)

    Xing Cai

    2005-01-01

    Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance becomes achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.

  5. MulticoreBSP for C : A high-performance library for shared-memory parallel programming

    NARCIS (Netherlands)

    Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.

    2014-01-01

    The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the present work, that library is redesigned for the C programming language, yielding a high-performance library for shared-memory parallel programming.

  6. A multithreaded parallel implementation of a dynamic programming algorithm for sequence comparison.

    Science.gov (United States)

    Martins, W S; Del Cuvillo, J B; Useche, F J; Theobald, K B; Gao, G R

    2001-01-01

    This paper discusses the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform based on a fine-grain event-driven multithreaded program execution model. Fine-grain multithreading permits efficient parallelism exploitation in this application both by taking advantage of asynchronous point-to-point synchronizations and communication with low overheads and by effectively tolerating latency through the overlapping of computation and communication. We have implemented our scheme on EARTH, a fine-grain event-driven multithreaded execution and architecture model which has been ported to a number of parallel machines with off-the-shelf processors. Our experimental results show that the dynamic programming algorithm can be efficiently implemented on EARTH systems with high performance (e.g., speedup of 90 on 120 nodes), good programmability and reasonable cost.
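
    A common way to expose the kind of fine-grain parallelism such a DP table offers is the wavefront (anti-diagonal) ordering: all cells on one anti-diagonal depend only on earlier diagonals and can therefore be filled concurrently. The C/OpenMP sketch below illustrates the idea with a toy global-alignment scoring scheme; it is not the EARTH implementation.

    #include <stdio.h>
    #include <string.h>

    #define MAXN 256
    static int H[MAXN + 1][MAXN + 1];

    static int max3(int a, int b, int c) {
        int m = a > b ? a : b;
        return m > c ? m : c;
    }

    int align(const char *a, const char *b) {
        int n = strlen(a), m = strlen(b);
        for (int i = 0; i <= n; i++) H[i][0] = -i;   /* gap penalties */
        for (int j = 0; j <= m; j++) H[0][j] = -j;
        for (int d = 2; d <= n + m; d++) {           /* anti-diagonals */
            int lo = d - m > 1 ? d - m : 1;
            int hi = d - 1 < n ? d - 1 : n;
            /* cells on one diagonal are independent of each other */
            #pragma omp parallel for
            for (int i = lo; i <= hi; i++) {
                int j = d - i;
                int s = (a[i - 1] == b[j - 1]) ? 1 : -1;
                H[i][j] = max3(H[i-1][j-1] + s, H[i-1][j] - 1, H[i][j-1] - 1);
            }
        }
        return H[n][m];
    }

    int main(void) {
        printf("score = %d\n", align("GATTACA", "GCATGCU"));
        return 0;
    }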

  7. High performance parallelism pearls 2 multicore and many-core programming approaches

    CERN Document Server

    Jeffers, Jim

    2015-01-01

    High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming approach - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of the programming techniques used.

  8. Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

    KAUST Repository

    Klein, Tobias

    2016-04-19

    The use of GPUs and the massively parallel computing paradigm have become wide-spread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program's function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.

  9. Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

    Directory of Open Access Journals (Sweden)

    Guy Edjlali

    1997-01-01

    Full Text Available For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of high-performance Fortran (HPF)-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
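
    The core bookkeeping in such an adaptive environment is recomputing each process's loop bounds when the processor count changes. A minimal block-distribution helper (a hypothetical illustration, not the authors' library API) looks like this:

    #include <stdio.h>

    /* owner p among nprocs gets elements [lo, hi) of a size-n array */
    void block_bounds(int n, int nprocs, int p, int *lo, int *hi) {
        int base = n / nprocs, rem = n % nprocs;
        *lo = p * base + (p < rem ? p : rem);
        *hi = *lo + base + (p < rem ? 1 : 0);
    }

    int main(void) {
        int lo, hi;
        for (int procs = 4; procs <= 5; procs++) {   /* a processor joins */
            for (int p = 0; p < procs; p++) {
                block_bounds(100, procs, p, &lo, &hi);
                printf("procs=%d rank=%d owns [%d,%d)\n", procs, p, lo, hi);
            }
        }
        return 0;
    }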

  10. Silicon Carbide Defect Qubits/Quantum Memory with Field-Tuning: OSD Quantum Science and Engineering Program (QSEP)

    Science.gov (United States)

    2017-08-01

    TECHNICAL REPORT 3073, August 2017: Silicon Carbide Defect Qubits/Quantum Memory with Field-Tuning, prepared for the Office of the Secretary of Defense (OSD) Quantum Science and Engineering Program (QSEP) by the Advanced Concepts and Applied Research Branch (Code 71730) and the Energy and Environmental Sustainability branch. The collaboration topic was to examine the effect of electric-field tuning on silicon carbide defect qubits and quantum memory.

  11. Induction heating using induction coils in series-parallel circuits

    Science.gov (United States)

    Matsen, Marc Rollo; Geren, William Preston; Miller, Robert James; Negley, Mark Alan; Dykstra, William Chet

    2017-11-14

    A part is inductively heated by multiple, self-regulating induction coil circuits having susceptors, coupled together in parallel and in series with an AC power supply. Each of the circuits includes a tuning capacitor that tunes the circuit to resonate at the frequency of the AC power supply.
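
    The tuning capacitor in each circuit follows from the series-resonance condition f = 1/(2π√(LC)); the short C check below solves for C at an assumed supply frequency and coil inductance (the values are illustrative, not taken from the patent).

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double f = 10e3;              /* 10 kHz supply, assumed */
        double L = 50e-6;             /* 50 uH coil, assumed */
        double w = 2.0 * M_PI * f;
        double C = 1.0 / (w * w * L); /* resonance: C = 1/(w^2 L) */
        printf("tuning capacitor: %.3g F\n", C);
        return 0;
    }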

  12. The Performance of an Object-Oriented, Parallel Operating System

    Directory of Open Access Journals (Sweden)

    David R. Kohr, Jr.

    1994-01-01

    Full Text Available The nascent and rapidly evolving state of parallel systems often leaves parallel application developers at the mercy of inefficient, inflexible operating system software. Given the relatively primitive state of parallel systems software, maximizing the performance of parallel applications not only requires judicious tuning of the application software, but occasionally, the replacement of specific system software modules with others that can more readily respond to the imposed pattern of resource demands. To assess the feasibility of application and performance tuning via malleable system software and to understand the performance penalties for detailed operating system performance data capture, we describe a set of performance instrumentation techniques for parallel, object-oriented operating systems and a set of performance experiments with Choices, an experimental, object-oriented operating system designed for use with parallel systems. These performance experiments show that (a) the performance overhead for operating system data capture is modest, (b) the penalty for malleable, object-oriented operating systems is negligible, but (c) techniques are needed to strictly enforce adherence of implementation to design if operating system modules are to be replaced.

  13. Double-tuned radiofrequency coil for (19)F and (1)H imaging.

    Science.gov (United States)

    Otake, Yosuke; Soutome, Yoshihisa; Hirata, Koji; Ochi, Hisaaki; Bito, Yoshitaka

    2014-01-01

    We developed a double-tuned radiofrequency (RF) coil using a novel circuit method to double-tune for fluorine-19 (19F) and 1H magnetic resonance imaging, whose frequencies are very close to each other. The RF coil consists of three parallel-connected series inductor-capacitor circuits. A computer simulation of our double-tuned RF coil with a phantom demonstrated that the coil has tuned resonant frequency and high sensitivity for both 19F and 1H. Drug distribution was visualized at 7 tesla using this RF coil and a rat administered perfluoro 15-crown-5-ether emulsion. The double-tuned RF coil we developed may be a powerful tool for 19F and 1H imaging.

  14. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    Science.gov (United States)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  15. Evolution of a minimal parallel programming model

    International Nuclear Information System (INIS)

    Lusk, Ewing; Butler, Ralph; Pieper, Steven C.

    2017-01-01

    Here, we take a historical approach to our presentation of self-scheduled task parallelism, a programming model with its origins in early irregular and nondeterministic computations encountered in automated theorem proving and logic programming. We show how an extremely simple task model has evolved into a system, asynchronous dynamic load balancing (ADLB), and a scalable implementation capable of supporting sophisticated applications on today’s (and tomorrow’s) largest supercomputers; and we illustrate the use of ADLB with a Green’s function Monte Carlo application, a modern, mature nuclear physics code in production use. Our lesson is that by surrendering a certain amount of generality and thus applicability, a minimal programming model (in terms of its basic concepts and the size of its application programmer interface) can achieve extreme scalability without introducing complexity.
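
    The flavor of self-scheduled task parallelism can be captured with a generic master/worker loop in MPI, where idle workers pull the next unit of work on demand; this is a textbook sketch, not the ADLB API.

    /* Rank 0 hands out work units on demand; -1 signals shutdown. */
    #include <mpi.h>
    #include <stdio.h>

    #define NTASKS 20

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {                      /* master: serve tasks */
            int next = 0, done = 0, req;
            MPI_Status st;
            while (done < size - 1) {
                MPI_Recv(&req, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, &st);
                int task = (next < NTASKS) ? next++ : -1;
                if (task < 0) done++;
                MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, 1,
                         MPI_COMM_WORLD);
            }
        } else {                              /* worker: pull until -1 */
            int dummy = 0, task;
            for (;;) {
                MPI_Send(&dummy, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
                MPI_Recv(&task, 1, MPI_INT, 0, 1, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                if (task < 0) break;
                printf("rank %d runs task %d\n", rank, task);
            }
        }
        MPI_Finalize();
        return 0;
    }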

  17. A backtracking algorithm for the stream AND-parallel execution of logic programs

    Energy Technology Data Exchange (ETDEWEB)

    Somogyi, Z.; Ramamohanarao, K.; Vaghani, J. (Univ. of Melbourne, Parkville (Australia))

    1988-06-01

    The authors present the first backtracking algorithm for stream AND-parallel logic programs. It relies on compile-time knowledge of the data flow graph of each clause to let it figure out efficiently which goals to kill or restart when a goal fails. This crucial information, which they derive from mode declarations, was not available at compile-time in any previous stream AND-parallel system. They show that modes can increase the precision of the backtracking algorithm, though their algorithm allows this precision to be traded off against overhead on a procedure-by-procedure and call-by-call basis. The modes also allow their algorithm to handle efficiently programs that manipulate partially instantiated data structures and an important class of programs with circular dependency graphs. On code that does not need backtracking, the efficiency of their algorithm approaches that of the committed-choice languages; on code that does need backtracking its overhead is comparable to that of the independent AND-parallel backtracking algorithms.

  18. Feedback Driven Annotation and Refactoring of Parallel Programs

    DEFF Research Database (Denmark)

    Larsen, Per

    This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism and communication inherent in embedded programs. Runtime checks are developed to ensure that annotations correctly describe observable program behavior. The performance impact of runtime checking is evaluated on several benchmark kernels and is negligible in all cases. The second aspect is compilation feedback: annotations are not effective unless programmers are told how and when they are beneficial. A prototype compilation feedback system was developed in collaboration with IBM Haifa Research Labs. It reports issues that prevent further analysis to the programmer. Performance evaluation shows that three programs perform significantly better once the reported issues are addressed.

  19. Getting To Exascale: Applying Novel Parallel Programming Models To Lab Applications For The Next Generation Of Supercomputers

    Energy Technology Data Exchange (ETDEWEB)

    Dube, Evi [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Shereda, Charles [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Nau, Lee [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Harris, Lance [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2010-09-27

    As supercomputing moves toward exascale, node architectures will change significantly. CPU core counts on nodes will increase by an order of magnitude or more. Heterogeneous architectures will become more commonplace, with GPUs or FPGAs providing additional computational power. Novel programming models may make better use of on-node parallelism in these new architectures than do current models. In this paper we examine several of these novel models – UPC, CUDA, and OpenCL – to determine their suitability to LLNL scientific application codes. Our study consisted of several phases: we conducted interviews with code teams and selected two codes to port; we learned how to program in the new models and ported the codes; we debugged and tuned the ported applications; we measured results and documented our findings. We conclude that UPC is a challenge for porting code, Berkeley UPC is not very robust, and UPC is not suitable as a general alternative to OpenMP for a number of reasons. CUDA is well supported and robust but is a proprietary NVIDIA standard, while OpenCL is an open standard. Both are well suited to a specific set of application problems that can be run on GPUs, but some problems are not suited to GPUs. Further study of the landscape of novel models is recommended.

  20. High-Q perpendicular-biased ferrite-tuned cavity

    International Nuclear Information System (INIS)

    Carlini, R.D.; Thiessen, H.A.; Potter, J.M.

    1983-01-01

    Rapid-cycling proton synchrotrons, such as the proposed LAMPF II accelerator, require approximately 10 MV per turn rf with 17% tuning range near 50 MHz. The traditional approach to ferrite-tuned cavities uses a ferrite which is longitudinally biased (rf magnetic field parallel to bias field). This method leads to unacceptably high losses in the ferrite. At Los Alamos, we are developing a cavity with transverse bias (rf magnetic field perpendicular to the bias field) that makes use of the tensor permeability of the ferrite. Modest power tests of a small (10-cm-dia) quarter-wave singly re-entrant cavity tuned by nickel-zinc ferrites and aluminum-doped garnets indicate that the losses in the ferrite can be made negligible compared with the losses due to the surface resistivity of the copper cavity at power levels from 2 to 200 watts.

  1. Process-Oriented Parallel Programming with an Application to Data-Intensive Computing

    OpenAIRE

    Givelberg, Edward

    2014-01-01

    We introduce process-oriented programming as a natural extension of object-oriented programming for parallel computing. It is based on the observation that every class of an object-oriented language can be instantiated as a process, accessible via a remote pointer. The introduction of process pointers requires no syntax extension, identifies processes with programming objects, and enables processes to exchange information simply by executing remote methods. Process-oriented programming is a high-level approach to parallel programming.

  2. P3T+: A Performance Estimator for Distributed and Parallel Programs

    Directory of Open Access Journals (Sweden)

    T. Fahringer

    2000-01-01

    Full Text Available Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particularly distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P3T+, a performance estimator for mostly regular HPF (High Performance Fortran) programs that also partially covers message passing programs (MPI). P3T+ is unique in modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile-time a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by employing highly effective symbolic analysis. Communication is estimated by simulating the behavior of a communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model most critical architecture-specific factors such as cache line sizes, number of cache lines available, startup times, message transfer time per byte, etc. P3T+ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC) to support programmers developing parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both accuracy and usefulness of P3T+.
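
    The communication parameters P3T+ works with fit the usual linear cost model, transfer time = startup + bytes/bandwidth; the fragment below (with assumed constants, not values from P3T+) shows the arithmetic.

    #include <stdio.h>

    double transfer_time(double bytes, double startup_s, double bytes_per_s) {
        return startup_s + bytes / bytes_per_s;
    }

    int main(void) {
        /* e.g. 1 MiB message, 5 us latency, 10 GB/s link (assumptions) */
        double t = transfer_time(1 << 20, 5e-6, 10e9);
        printf("estimated transfer: %.3g s\n", t);
        return 0;
    }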

  3. Academic training: From Evolution Theory to Parallel and Distributed Genetic Programming

    CERN Multimedia

    2007-01-01

    2006-2007 ACADEMIC TRAINING PROGRAMME LECTURE SERIES 15, 16 March From 11:00 to 12:00 - Main Auditorium, bldg. 500 From Evolution Theory to Parallel and Distributed Genetic Programming F. FERNANDEZ DE VEGA / Univ. of Extremadura, SP Lecture No. 1: From Evolution Theory to Evolutionary Computation Evolutionary computation is a subfield of artificial intelligence (more particularly computational intelligence) involving combinatorial optimization problems, which are based to some degree on the evolution of biological life in the natural world. In this tutorial we will review the source of inspiration for this metaheuristic and its capability for solving problems. We will show the main flavours within the field, and different problems that have been successfully solved employing this kind of technique. Lecture No. 2: Parallel and Distributed Genetic Programming The successful application of Genetic Programming (GP, one of the available Evolutionary Algorithms) to optimization problems has encouraged an ...

  4. Application Portable Parallel Library

    Science.gov (United States)

    Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

    1995-01-01

    Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" here also includes heterogeneous collections of networked computers.) Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.

  5. Final Report: Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    Mellor-Crummey, John [William Marsh Rice University

    2011-09-13

    As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.

  6. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    Science.gov (United States)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
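
    The fictitious-cell exchange mentioned above is essentially a halo swap between neighbouring subdomains. The C/MPI sketch below shows the pattern for a 1-D decomposition; the paper's grids are unstructured, so this is only the communication skeleton, with invented sizes.

    #include <mpi.h>
    #include <stdio.h>

    #define NLOC 16   /* interior cells per rank (assumed) */

    int main(int argc, char **argv) {
        int rank, size;
        double u[NLOC + 2];   /* u[0] and u[NLOC+1] are fictitious cells */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        for (int i = 1; i <= NLOC; i++) u[i] = rank;   /* toy field */
        u[0] = u[NLOC + 1] = -1;                       /* edge sentinel */
        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
        /* send first interior cell left, receive right ghost */
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* send last interior cell right, receive left ghost */
        MPI_Sendrecv(&u[NLOC], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d ghosts: %g %g\n", rank, u[0], u[NLOC + 1]);
        MPI_Finalize();
        return 0;
    }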

  7. 76 FR 62808 - Pilot Program for Parallel Review of Medical Products

    Science.gov (United States)

    2011-10-11

    ... voluntary participation in the pilot program, as well as the guiding principles the Agencies intend to... 57045), parallel review is intended to reduce the time between FDA marketing approval and CMS national...

  8. Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

    Science.gov (United States)

    Bellucci, Michael A; Coker, David F

    2011-07-28

    We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics

  9. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    Science.gov (United States)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement, processors are divided out among the newly generated tasks.

  10. 76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

    Science.gov (United States)

    2011-10-26

    ... DEPARTMENT OF HEALTH AND HUMAN SERVICES Centers for Medicare and Medicaid Services [CMS-3180-N2] Food and Drug Administration [Docket No. FDA-2010-N-0308] Pilot Program for Parallel Review of Medical... 11, 2011 (76 FR 62808). The document announced a pilot program for sponsors of innovative device...

  11. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2015-01-01

    Full Text Available The aim of this study is to present an approach to the introduction into pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. The modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is an essential topic to be included in the curriculum. At the same time, the topic is among the most motivating tasks due to the comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform of constructivist learning process, thus enabling learners’ experimentation with the provided programming models, obtaining learners’ competences of the modern scientific research and computational thinking, and capturing the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models and the message passing interface (MPI) and OpenMP parallelization tools have been chosen for implementation.
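
    In the spirit of the course's C/MPI materials, a software pipeline can be demonstrated by making each rank one stage and streaming items through; this generic example is not taken from the paper's learning objects.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        for (int item = 0; item < 5; item++) {
            int x = item;
            if (rank > 0)   /* receive from the previous stage */
                MPI_Recv(&x, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            x += 1;         /* this stage's "service" on the item */
            if (rank < size - 1)   /* forward to the next stage */
                MPI_Send(&x, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
            else
                printf("item %d leaves pipeline as %d\n", item, x);
        }
        MPI_Finalize();
        return 0;
    }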

  12. Large tuning of birefringence in two strip silicon waveguides via optomechanical motion.

    Science.gov (United States)

    Ma, Jing; Povinelli, Michelle L

    2009-09-28

    We present an optomechanical method to tune phase and group birefringence in parallel silicon strip waveguides. We first calculate the deformation of suspended, parallel strip waveguides due to optical forces. We optimize the frequency and polarization of the pump light to obtain a 9 nm deformation for an optical power of 20 mW. Widely tunable phase and group birefringence can be achieved by varying the pump power, with maximum values of 0.026 and 0.13, respectively. The giant phase birefringence allows linear to circular polarization conversion within 30 µm for a pump power of 67 mW. The group birefringence gives a tunable differential group delay of 6 fs between orthogonal polarizations. We also evaluate the tuning performance of waveguides with different cross sections.

  13. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  15. Adaptive Self-Tuning Networks

    Science.gov (United States)

    Knox, H. A.; Draelos, T.; Young, C. J.; Lawry, B.; Chael, E. P.; Faust, A.; Peterson, M. G.

    2015-12-01

    The quality of automatic detections from seismic sensor networks depends on a large number of data processing parameters that interact in complex ways. The largely manual process of identifying effective parameters is painstaking and does not guarantee that the resulting controls are the optimal configuration settings. Yet, achieving superior automatic detection of seismic events is closely related to these parameters. We present an automated sensor tuning (AST) system that learns near-optimal parameter settings for each event type using neuro-dynamic programming (reinforcement learning) trained with historic data. AST learns to test the raw signal against all event-settings and automatically self-tunes to an emerging event in real-time. The overall goal is to reduce the number of missed legitimate event detections and the number of false event detections. Reducing false alarms early in the seismic pipeline processing will have a significant impact on this goal. Applicable both for existing sensor performance boosting and new sensor deployment, this system provides an important new method to automatically tune complex remote sensing systems. Systems tuned in this way will achieve better performance than is currently possible by manual tuning, and with much less time and effort devoted to the tuning process. With ground truth on detections in seismic waveforms from a network of stations, we show that AST increases the probability of detection while decreasing false alarms.
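
    A stripped-down version of such parameter learning is the epsilon-greedy bandit loop below, in C: each candidate detector-parameter setting is an arm, and a synthetic reward stands in for the detection/false-alarm feedback. None of the constants come from the AST system.

    #include <stdio.h>
    #include <stdlib.h>

    #define NSET 5          /* candidate parameter settings */

    static double reward(int s) {               /* synthetic feedback */
        double base[NSET] = {0.2, 0.5, 0.9, 0.4, 0.1};
        return base[s] + (rand() / (double)RAND_MAX - 0.5) * 0.2;
    }

    int main(void) {
        double value[NSET] = {0}, eps = 0.1;
        int count[NSET] = {0};
        for (int t = 0; t < 10000; t++) {
            int s = 0;
            if (rand() / (double)RAND_MAX < eps) {
                s = rand() % NSET;                 /* explore */
            } else {
                for (int i = 1; i < NSET; i++)     /* exploit best */
                    if (value[i] > value[s]) s = i;
            }
            double r = reward(s);
            count[s]++;
            value[s] += (r - value[s]) / count[s]; /* running mean */
        }
        for (int s = 0; s < NSET; s++)
            printf("setting %d: value %.3f (n=%d)\n", s, value[s], count[s]);
        return 0;
    }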

  16. Feasibility studies for a high energy physics MC program on massive parallel platforms

    International Nuclear Information System (INIS)

    Bertolotto, L.M.; Peach, K.J.; Apostolakis, J.; Bruschini, C.E.; Calafiura, P.; Gagliardi, F.; Metcalf, M.; Norton, A.; Panzer-Steindel, B.

    1994-01-01

    The parallelization of a Monte Carlo program for the NA48 experiment is presented. As a first step, a task farming structure was realized. Based on this, a further step, making use of a distributed database for showers in the electro-magnetic calorimeter, was implemented. Further possibilities for using parallel processing for a quasi-real-time calibration of the calorimeter are described.

  17. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most out of this book.

  18. Enabling Requirements-Based Programming for Highly-Dependable Complex Parallel and Distributed Systems

    Science.gov (United States)

    Hinchey, Michael G.; Rash, James L.; Rouff, Christopher A.

    2005-01-01

    The manual application of formal methods in system specification has produced successes, but in the end, despite any claims and assertions by practitioners, there is no provable relationship between a manually derived system specification or formal model and the customer's original requirements. Complex parallel and distributed systems present the worst-case implications for today's dearth of viable approaches for achieving system dependability. No avenue other than formal methods constitutes a serious contender for resolving the problem, and so recognition of requirements-based programming has come at a critical juncture. We describe a new, NASA-developed automated requirements-based programming method that can be applied to certain classes of systems, including complex parallel and distributed systems, to achieve a high degree of dependability.

  19. Compiler Technology for Parallel Scientific Computation

    Directory of Open Access Journals (Sweden)

    Can Özturan

    1994-01-01

    Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with the equational programming language (EPL). Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

  20. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  1. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  2. Parallelization and checkpointing of GPU applications through program transformation

    Energy Technology Data Exchange (ETDEWEB)

    Solano-Quinde, Lizandro Damian [Iowa State Univ., Ames, IA (United States)

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that make writing general-purpose applications for GPUs tractable has consolidated GPUs as an alternative for accelerating general-purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with large application memory footprints. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime; however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems presents new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to provide fault tolerance in GPU systems through application transformation.
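
    The data-movement core of a GPU checkpoint is staging device memory to the host and persisting it. A minimal C sketch using the CUDA runtime API follows (error handling reduced to pass/fail); it is only this core step, not the dissertation's transformation-based scheme.

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Copy a device buffer to the host and write it to a file. */
    int checkpoint(const void *dev_buf, size_t bytes, const char *path) {
        void *host = malloc(bytes);
        if (!host) return -1;
        if (cudaMemcpy(host, dev_buf, bytes,
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            free(host);
            return -1;
        }
        FILE *f = fopen(path, "wb");
        int ok = f && fwrite(host, 1, bytes, f) == bytes;
        if (f) fclose(f);
        free(host);
        return ok ? 0 : -1;
    }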

  3. Development of parallel-plate-based MEMS tunable capacitors with linearized capacitance–voltage response and extended tuning range

    International Nuclear Information System (INIS)

    Shavezipur, M; Nieva, P; Khajepour, A; Hashemi, S M

    2010-01-01

    This paper presents a design technique that can be used to linearize the capacitance–voltage (C–V) response and extend the tuning range of parallel-plate-based MEMS tunable capacitors beyond that of conventional designs. The proposed technique exploits the curvature of the capacitor's moving electrode which could be induced by either manipulating the stress gradients in the plate's material or using bi-layer structures. The change in curvature generates a nonlinear structural stiffness as the moving electrode undergoes out-of-plane deformation due to the actuation voltage. If the moving plate curvature is tailored such that the capacitance increment is proportional to the voltage increment, then a linear C–V response is obtained. The larger structural resistive force at higher bias voltage also delays the pull-in and increases the maximum tunability of the capacitor. Moreover, for capacitors containing an insulation layer between the two electrodes, the proposed technique completely eliminates the pull-in effect. The experimental data obtained from different capacitors fabricated using PolyMUMPs demonstrate the advantages of this design approach, where highly linear C–V responses and tunabilities as high as 1050% were recorded. The design methodology introduced in this paper could be easily extended to, for example, capacitive pressure and temperature sensors or infrared detectors to enhance their response characteristics.

  4. Tuning of CLIC accelerating structure prototypes at CERN

    CERN Document Server

    Shi, J; Olyunin, A; Wuensch, W

    2010-01-01

    An RF measurement system has been set up at CERN for use in the X-band accelerating structure development program of the CLIC study. Using the system, S-parameters are measured and the field distribution is obtained automatically using a bead-pull technique. The corrections for tuning the structure are calculated from an initial measurement and cell-by-cell tuning is applied to obtain the correct phase advance and minimum reflection at the operation frequency. The detailed tuning procedure is presented and explained along with an example of measurement and tuning of CLIC accelerating structure prototypes.

  5. Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

    Science.gov (United States)

    Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

    2013-12-01

    Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is embedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Improved model reduction and tuning of fractional-order PI(λ)D(μ) controllers for analytical rule extraction with genetic programming.

    Science.gov (United States)

    Das, Saptarshi; Pan, Indranil; Das, Shantanu; Gupta, Amitava

    2012-03-01

    Genetic algorithm (GA) has been used in this study for a new approach of suboptimal model reduction in the Nyquist plane and optimal time domain tuning of proportional-integral-derivative (PID) and fractional-order (FO) PI(λ)D(μ) controllers. Simulation studies show that the new Nyquist-based model reduction technique outperforms the conventional H(2)-norm-based reduced parameter modeling technique. With the tuned controller parameters and reduced-order model parameter dataset, optimum tuning rules have been developed with a test-bench of higher-order processes via genetic programming (GP). The GP performs a symbolic regression on the reduced process parameters to evolve a tuning rule which provides the best analytical expression to map the data. The tuning rules are developed for a minimum time domain integral performance index described by a weighted sum of error index and controller effort. From the reported Pareto optimal front of the GP-based optimal rule extraction technique, a trade-off can be made between the complexity of the tuning formulae and the control performance. The efficacy of the single-gene and multi-gene GP-based tuning rules has been compared with the original GA-based control performance for the PID and PI(λ)D(μ) controllers, handling four different classes of representative higher-order processes. These rules are very useful for process control engineers, as they inherit the power of the GA-based tuning methodology, but can be easily calculated without the requirement for running the computationally intensive GA every time. Three-dimensional plots of the required variation in PID/fractional-order PID (FOPID) controller parameters with reduced process parameters have been shown as a guideline for the operator. Parametric robustness of the reported GP-based tuning rules has also been shown with credible simulation examples. Copyright © 2011 ISA. Published by Elsevier Ltd. All rights reserved.

  7. Pattern-Driven Automatic Parallelization

    Directory of Open Access Journals (Sweden)

    Christoph W. Kessler

    1996-01-01

    Full Text Available This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.

  8. MPI_XSTAR: MPI-based parallelization of XSTAR program

    Science.gov (United States)

    Danehkar, A.

    2017-12-01

    MPI_XSTAR parallelizes execution of multiple XSTAR runs using Message Passing Interface (MPI). XSTAR (ascl:9910.008), part of the HEASARC's HEAsoft (ascl:1408.004) package, calculates the physical conditions and emission spectra of ionized gases. MPI_XSTAR invokes XSTINITABLE from HEASoft to generate a job list of XSTAR commands for given physical parameters. The job list is used to make directories in ascending order, where each individual XSTAR is spawned on each processor and outputs are saved. HEASoft's XSTAR2TABLE program is invoked upon the contents of each directory in order to produce table model FITS files for spectroscopy analysis tools.
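
    The rank-per-run pattern MPI_XSTAR uses can be sketched generically: each MPI rank takes a round-robin share of a job list, creates a working directory, and shells out to the command. This is a hypothetical illustration, not the actual MPI_XSTAR code; the script name job_%d.sh is a stand-in for the XSTINITABLE-generated command lines.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NJOBS 8   /* assumed number of XSTAR runs */

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        /* round-robin: rank r takes jobs r, r+size, r+2*size, ... */
        for (int j = rank; j < NJOBS; j += size) {
            char cmd[128];
            /* each job runs in its own directory, as described above */
            snprintf(cmd, sizeof cmd,
                     "mkdir -p run%03d && cd run%03d && sh ../job_%d.sh",
                     j, j, j);
            if (system(cmd) != 0)
                fprintf(stderr, "rank %d: job %d failed\n", rank, j);
        }
        MPI_Finalize();
        return 0;
    }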

  9. Jointly Tuned Plasmonic–Excitonic Photovoltaics Using Nanoshells

    KAUST Repository

    Paz-Soldan, Daniel

    2013-04-10

    Recent advances in spectrally tuned, solution-processed plasmonic nanoparticles have provided unprecedented control over light's propagation and absorption via engineering at the nanoscale. Simultaneous parallel progress in colloidal quantum dot photovoltaics offers the potential for low-cost, large-area solar power; however, these devices suffer from poor quantum efficiency in the more weakly absorbed infrared portion of the sun's spectrum. Here, we report a plasmonic-excitonic solar cell that combines two classes of solution-processed infrared materials that we tune jointly. We show through experiment and theory that a plasmonic-excitonic design using gold nanoshells with optimized single particle scattering-to-absorption cross-section ratios leads to a strong enhancement in near-field absorption and a resultant 35% enhancement in photocurrent in the performance-limiting near-infrared spectral region. © 2013 American Chemical Society.

  10. Tune measurement at GSI SIS-18. Methods and applications

    Energy Technology Data Exchange (ETDEWEB)

    Singh, Rahul

    2014-05-15

    Two parallel tune measurement systems based on different principles are installed at GSI SIS-18. The first is the Tune, Orbit and POSition measurement system (TOPOS). Its working principle involves direct digitization of BPM signals at 125 MSa/s, which is used for online bunch-by-bunch position calculation in FPGAs. In the course of this work, position calculation algorithms were developed and studied for real-time implementation in the TOPOS FPGAs. The regression fit algorithm is found to be more efficient and robust than the previously used weighted-mean algorithm with baseline restoration. The second system is the Baseband Tune measurement system, referred to as the BBQ system. The operational principle of this system was conceived at the CERN Beam Instrumentation group and is based on direct diode detection. In the framework of this work, this system was optimized and brought into operation at GSI SIS-18. Front-end data from both systems are used to calculate the tune spectrum every 250-5000 beam revolutions (turns) within SIS-18, depending on the resolution requirement and the mode of operation. Advanced non-parametric spectrum estimation methods such as the amplitude Capon estimator are compared to conventional DFT-based methods in terms of resolving power and computational requirements for the calculated spectrum. Further, the TOPOS and BBQ systems are compared and characterized in terms of sensitivity, reliability and operational usage. The results from both systems are found to be consistent with each other, and each has its favoured regimes of operation. The effects on tune spectra obtained from both systems were studied with different types of excitations at excitation power levels up to 6 mW/Hz. These systems, in association with other beam diagnostic devices at SIS-18, were used to conduct extensive experiments to understand the effect of high intensity beams on the tune spectrum. These careful measurements recorded all the relevant beam
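
    To make the weighted-mean versus regression-fit comparison concrete, the sketch below computes a bunch position from two BPM plate signals both ways; one reading of the regression approach is that fitting the difference signal against the sum signal lets the intercept absorb a common baseline offset. This interpretation, the scale factor, and the sample data are assumptions for illustration, not the TOPOS implementation.

        #include <stdio.h>

        /* Weighted mean: x = k * sum(a-b) / sum(a+b); a common baseline
         * offset on both plates biases this estimate. */
        double pos_weighted_mean(const double *a, const double *b, int n, double k)
        {
            double d = 0.0, s = 0.0;
            for (int i = 0; i < n; ++i) { d += a[i] - b[i]; s += a[i] + b[i]; }
            return k * d / s;
        }

        /* Regression: least-squares fit (a-b) = slope*(a+b) + intercept;
         * the intercept absorbs the baseline, so x = k * slope is robust. */
        double pos_regression(const double *a, const double *b, int n, double k)
        {
            double Ss = 0, Sd = 0, Sss = 0, Ssd = 0;
            for (int i = 0; i < n; ++i) {
                double s = a[i] + b[i], d = a[i] - b[i];
                Ss += s; Sd += d; Sss += s * s; Ssd += s * d;
            }
            double slope = (n * Ssd - Ss * Sd) / (n * Sss - Ss * Ss);
            return k * slope;
        }

        int main(void)
        {
            double a[5] = {1.2, 2.4, 3.0, 2.4, 1.2};   /* plate A samples */
            double b[5] = {0.8, 1.6, 2.0, 1.6, 0.8};   /* plate B samples */
            printf("weighted mean: %f mm\n", pos_weighted_mean(a, b, 5, 10.0));
            printf("regression:    %f mm\n", pos_regression(a, b, 5, 10.0));
            return 0;
        }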

  11. AUTOMOTIVE DIESEL MAINTENANCE 1. UNIT VII, ENGINE TUNE-UP--DETROIT DIESEL ENGINE.

    Science.gov (United States)

    Human Engineering Inst., Cleveland, OH.

    THIS MODULE OF A 30-MODULE COURSE IS DESIGNED TO DEVELOP AN UNDERSTANDING OF TUNE-UP PROCEDURES FOR DIESEL ENGINES. TOPICS ARE SCHEDULING TUNE-UPS, AND TUNE-UP PROCEDURES. THE MODULE CONSISTS OF A SELF-INSTRUCTIONAL BRANCH PROGRAMED TRAINING FILM "ENGINE TUNE-UP--DETROIT DIESEL ENGINE" AND OTHER MATERIALS. SEE VT 005 655 FOR FURTHER INFORMATION.…

  12. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  13. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

    Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that have been ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.

  14. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets.

    Science.gov (United States)

    Shrimankar, D D; Sathe, S R

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas MPI programs can span multiple SMP nodes at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results hold for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures.
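
    As one illustration of the shared-memory side of this trade-off, the sketch below parallelizes a Smith-Waterman-style score matrix over anti-diagonals with OpenMP, exploiting the fact that cells on one anti-diagonal are mutually independent; the scoring values are illustrative, and the paper's tile-based method and MPI layer are not reproduced here.

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        #define MAX2(x,y) ((x) > (y) ? (x) : (y))

        /* Local-alignment score with match=2, mismatch=-1, gap=-1:
         * each anti-diagonal d = i + j is a parallel loop, since a cell
         * depends only on cells from earlier diagonals. */
        int sw_score(const char *p, const char *q)
        {
            int m = (int)strlen(p), n = (int)strlen(q), best = 0;
            int *H = calloc((size_t)(m + 1) * (n + 1), sizeof *H);
            for (int d = 2; d <= m + n; ++d) {
                int ilo = MAX2(1, d - n), ihi = d - 1 < m ? d - 1 : m;
                #pragma omp parallel for reduction(max:best)
                for (int i = ilo; i <= ihi; ++i) {
                    int j = d - i;
                    int s = (p[i-1] == q[j-1]) ? 2 : -1;
                    int h = MAX2(0, H[(i-1)*(n+1) + j-1] + s);
                    h = MAX2(h, H[(i-1)*(n+1) + j] - 1);   /* gap in q */
                    h = MAX2(h, H[i*(n+1) + j-1] - 1);     /* gap in p */
                    H[i*(n+1) + j] = h;
                    best = MAX2(best, h);
                }
            }
            free(H);
            return best;
        }

        int main(void)
        {
            printf("score = %d\n", sw_score("GATTACA", "GCATGCA"));
            return 0;
        }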

  15. Analysis of Parallel Algorithms on SMP Node and Cluster of Workstations Using Parallel Programming Models with New Tile-based Method for Large Biological Datasets

    Science.gov (United States)

    Shrimankar, D. D.; Sathe, S. R.

    2016-01-01

    Sequence alignment is an important tool for describing the relationships between DNA sequences. Many sequence alignment algorithms exist, differing in efficiency, in their models of the sequences, and in the relationship between sequences. The focus of this study is to obtain an optimal alignment between two sequences of biological data, particularly DNA sequences. The algorithm is discussed with particular emphasis on time, speedup, and efficiency optimizations. Parallel programming presents a number of critical challenges to application developers. Today's supercomputers often consist of clusters of SMP nodes. Programming paradigms such as OpenMP and MPI are used to write parallel codes for such architectures. However, OpenMP programs cannot scale beyond a single SMP node, whereas MPI programs can span multiple SMP nodes at the cost of internode communication overhead. In this work, we explore the tradeoffs between using OpenMP and MPI. We demonstrate that communication overhead is significant even in OpenMP loop execution and increases with the number of participating cores. We also demonstrate a communication model to approximate the overhead from communication in OpenMP loops. Our results hold for a large variety of input data files. We have developed our own load balancing and cache optimization technique for the message-passing model. Our experimental results show that these techniques give optimum performance of our parallel algorithm for various input parameters, such as sequence size and tile size, on a wide variety of multicore architectures. PMID:27932868

  16. The numerical parallel computing of photon transport

    International Nuclear Information System (INIS)

    Huang Qingnan; Liang Xiaoguang; Zhang Lifa

    1998-12-01

    The parallel computing of photon transport is investigated, and the parallel algorithm and the parallelization of programs on parallel computers both with shared memory and with distributed memory are discussed. By analyzing the inherent structure of the mathematical and physical model of photon transport in light of the architectural features of parallel computers, using a divide-and-conquer strategy, adjusting the algorithm structure of the program, resolving data dependences, identifying parallelizable components and creating large-grain parallel subtasks, the sequential computation of photon transport is efficiently transformed into parallel and vector computation. The program was run on various high-performance parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup was obtained

  17. Shared Variable Oriented Parallel Precompiler for SPMD Model

    Institute of Scientific and Technical Information of China (English)

    1995-01-01

    For the moment, commercial parallel computer systems with distributed memory architecture are usually provided with parallel FORTRAN or parallel C compilers, which are just traditional sequential FORTRAN or C compilers expanded with communication statements. Programmers suffer from writing parallel programs with communication statements. The Shared Variable Oriented Parallel Precompiler (SVOPP) proposed in this paper can automatically generate appropriate communication statements based on shared variables for the SPMD (Single Program Multiple Data) computation model and greatly ease parallel programming with high communication efficiency. The core function of the parallel C precompiler has been successfully verified on a transputer-based parallel computer. Its prominent performance shows that SVOPP is probably a break-through in parallel programming technique.

  18. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    Science.gov (United States)

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculations of a large molecule with a high-quality basis set running on up to 1024 cores of a high-performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.

  19. A scalable parallel algorithm for multiple objective linear programs

    Science.gov (United States)

    Wiecek, Malgorzata M.; Zhang, Hong

    1994-01-01

    This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives especially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.

  20. Feedback and feedforward control of frequency tuning to naturalistic stimuli.

    Science.gov (United States)

    Chacron, Maurice J; Maler, Leonard; Bastian, Joseph

    2005-06-08

    Sensory neurons must respond to a wide variety of natural stimuli that can have very different spatiotemporal characteristics. Optimal responsiveness to subsets of these stimuli can be achieved by devoting specialized neural circuitry to different stimulus categories, or, alternatively, this circuitry can be modulated or tuned to optimize responsiveness to current stimulus conditions. This study explores the mechanisms that enable neurons within the initial processing station of the electrosensory system of weakly electric fish to shift their tuning properties based on the spatial extent of the stimulus. These neurons are tuned to low frequencies when the stimulus is restricted to a small region within the receptive field center but are tuned to higher frequencies when the stimulus impinges on large regions of the sensory epithelium. Through a combination of modeling and in vivo electrophysiology, we reveal the respective contributions of the filtering characteristics of extended dendritic structures and feedback circuitry to this shift in tuning. Our results show that low-frequency tuning can result from the cable properties of an extended dendrite that conveys receptor-afferent information to the cell body. The shift from low- to high-frequency tuning, seen in response to spatially extensive stimuli, results from increased wide-band input attributable to activation of larger populations of receptor afferents, as well as the activation of parallel fiber feedback from the cerebellum. This feedback provides a cancellation signal with low-pass characteristics that selectively attenuates low-frequency responsiveness. Thus, with spatially extensive stimuli, these cells preferentially respond to the higher-frequency components of the receptor-afferent input.

  1. Tuning of tool dynamics for increased stability of parallel (simultaneous) turning processes

    Science.gov (United States)

    Ozturk, E.; Comak, A.; Budak, E.

    2016-01-01

    Parallel (simultaneous) turning operations make use of more than one cutting tool acting on a common workpiece, offering potential for higher productivity. However, dynamic interaction between the tools and workpiece and the resulting chatter vibrations may create quality problems on machined surfaces. In order to determine chatter-free cutting process parameters, stability models can be employed. In this paper, stability of parallel turning processes is formulated in the frequency and time domains for two different parallel turning cases. Predictions of the frequency and time domain methods demonstrated reasonable agreement with each other. In addition, the predicted stability limits are verified experimentally. Simulation and experimental results show multi-regional stability diagrams which can be used to select the most favorable set of process parameters for higher stable material removal rates. In addition to parameter selection, the developed models can be used to determine the natural frequency ratio of the tools that results in the highest stable depths of cut. It is concluded that the most stable operations are obtained when the natural frequencies of the tools are slightly offset from each other, and the worst stability occurs when the natural frequencies of the tools are exactly the same.

  2. MEMS variable capacitance devices utilizing the substrate: I. Novel devices with a customizable tuning range

    International Nuclear Information System (INIS)

    Elshurafa, Amro M; El-Masry, Ezz I

    2010-01-01

    This paper, the first in a series of two, presents a paradigm shift in the design of MEMS parallel plate PolyMUMPS variable capacitance devices by proposing two structures that utilize the substrate and are able to provide predetermined, customizable, tuning ranges and/or ratios. The proposed structures can provide theoretical tuning ranges anywhere from 4.9 to 35 and from 3.4 to 26 respectively with a simple, yet effective, layout modification as opposed to the previously reported devices where the tuning range is fixed and cannot be varied. Theoretical analysis is carried out and verified with measurements of fabricated devices. The first proposed device possessed initially a tuning range of 4.4. Two variations of the structure having tuning ranges of 3 and 3.4, all at 1 GHz, were also successfully developed and tested. The second proposed variable capacitance device behaved as a switch.

  3. Linear beam-beam tune shift calculations for the Tevatron Collider

    International Nuclear Information System (INIS)

    Johnson, D.

    1989-01-01

    A realistic estimate of the linear beam-beam tune shift is necessary for the selection of an optimum working point in the tune diagram. Estimates of the beam-beam tune shift using the ''Round Beam Approximation'' (RBA) have overestimated the tune shift for the Tevatron. For a hadron machine with unequal lattice functions and beam sizes, an explicit calculation using the beam size at the crossings is required. Calculations for various Tevatron lattices used in Collider operation are presented. Comparisons between the RBA and the explicit calculation, for elliptical beams, are presented. This paper discusses the calculation of the linear tune shift using the program SYNCH. Selection of a working point is discussed. The magnitude of the tune shift is influenced by the choice of crossing points in the lattice as determined by the pbar ''cogging effects''. Current cogging procedures are also discussed, and results of calculations for tune shifts at various crossing points in the lattice are presented. Finally, a comparison of early pbar tune measurements with the present linear tune shift calculations is presented. 17 refs., 13 figs., 3 tabs
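
    For reference, the textbook expressions behind such a comparison (standard formulas, not quoted from the report) are, for an elliptical beam of rms sizes \sigma_x, \sigma_y,

        \xi_{x,y} = \frac{N r_p \beta^*_{x,y}}{2\pi \gamma\, \sigma_{x,y} (\sigma_x + \sigma_y)},
        \qquad
        \xi_{\mathrm{round}} = \frac{N r_p \beta^*}{4\pi \gamma \sigma^2}

    where N is the opposing bunch population, r_p the classical proton radius, \gamma the Lorentz factor and \beta^* the lattice function at the crossing; the round-beam form follows by setting \sigma_x = \sigma_y, which is why the RBA overestimates the shift when the beams are flat.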

  4. Algorithmic differentiation of pragma-defined parallel regions differentiating computer programs containing OpenMP

    CERN Document Server

    Förster, Michael

    2014-01-01

    Numerical programs often use parallel programming techniques such as OpenMP to compute the program's output values as efficiently as possible. In addition, derivative values of these output values with respect to certain input values play a crucial role. To achieve code that computes not only the output values simultaneously but also the derivative values, this work introduces several source-to-source transformation rules. These rules are based on a technique called algorithmic differentiation. The main focus of this work lies on the important reverse mode of algorithmic differentiation. The inh

  5. Generalized Analytical Program of Thyristor Phase Control Circuit with Series and Parallel Resonance Load

    OpenAIRE

    Nakanishi, Sen-ichiro; Ishida, Hideaki; Himei, Toyoji

    1981-01-01

    A systematic analytical method is required for the ac phase control circuit by means of an inverse-parallel thyristor pair which has a series and parallel L-C resonant load, because the phase control action causes abnormal and interesting phenomena, such as an extreme increase of voltage and current, a unique increase and decrease of the contained higher harmonics, and a wide variation of power factor, etc. In this paper, the program for the analysis of the thyristor phase control circuit with...

  6. Optimization under uncertainty of parallel nonlinear energy sinks

    Science.gov (United States)

    Boroson, Ethan; Missoum, Samy; Mattei, Pierre-Olivier; Vergez, Christophe

    2017-04-01

    Nonlinear Energy Sinks (NESs) are a promising technique for passively reducing the amplitude of vibrations. Through nonlinear stiffness properties, a NES is able to passively and irreversibly absorb energy. Unlike the traditional Tuned Mass Damper (TMD), NESs do not require a specific tuning and absorb energy over a wider range of frequencies. Nevertheless, they are still only efficient over a limited range of excitations. In order to mitigate this limitation and maximize the efficiency range, this work investigates the optimization of multiple NESs configured in parallel. It is well known that the efficiency of a NES is extremely sensitive to small perturbations in loading conditions or design parameters. In fact, the efficiency of a NES has been shown to be nearly discontinuous in the neighborhood of its activation threshold. For this reason, uncertainties must be taken into account in the design optimization of NESs. In addition, the discontinuities require a specific treatment during the optimization process. In this work, the objective of the optimization is to maximize the expected value of the efficiency of NESs in parallel. The optimization algorithm is able to tackle design variables with uncertainty (e.g., nonlinear stiffness coefficients) as well as aleatory variables such as the initial velocity of the main system. The optimal design of several parallel NES configurations for maximum mean efficiency is investigated. Specifically, NES nonlinear stiffness properties, considered random design variables, are optimized for cases with 1, 2, 3, 4, 5, and 10 NESs in parallel. The distributions of efficiency for the optimal parallel configurations are compared to distributions of efficiencies of non-optimized NESs. It is observed that the optimization enables a sharp increase in the mean value of efficiency while reducing the corresponding variance, thus leading to more robust NES designs.

  7. Elastomeric composites with tuned electromagnetic characteristics

    International Nuclear Information System (INIS)

    Wheeland, Sara; Bayatpur, Farhad; Amirkhizi, Alireza V; Nemat-Nasser, Sia

    2013-01-01

    This paper presents a novel elastomeric composite that exhibits a deformation-induced change in chirality. Previous efforts primarily dealt with a coil array in air without chiral tuning. Here, a composite is created that consists of an array of parallel, metallic helices of the same handedness embedded in a polymer matrix. The chiral response of the composite depends on pitch, coil diameter, wire thickness and coil spacing; however, pitch has the greatest effect on electromagnetic performance. The present study explores this effect by using helical elements to construct a chiral medium that can be mechanically stretched to adjust pitch. This adjustment directly affects the overall chirality of the composite. A prototype sample of the composite, fabricated for operation between 5.5–12.5 GHz, demonstrates repeatable elastic deformation. Using a transmit/receive measurement setup, the composite scattering response is measured over the frequency interval. The results indicate substantial tuning of chirality through deformation. An increase in axial strain of up to 30% yields a ∼18% change in axial chirality. (paper)

  8. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  9. Building a parallel file system simulator

    International Nuclear Information System (INIS)

    Molina-Estolano, E; Maltzahn, C; Brandt, S A; Bent, J

    2009-01-01

    Parallel file systems are gaining in popularity in high-end computing centers as well as commercial data centers. High-end computing systems are expected to scale exponentially and to pose new challenges to their storage scalability in terms of cost and power. To address these challenges, scientists and file system designers will need a thorough understanding of the design space of parallel file systems. Yet there exist few systematic studies of parallel file system behavior at petabyte and exabyte scales. An important reason is the significant cost of getting access to large-scale hardware to test parallel file systems. To contribute to this understanding, we are building a parallel file system simulator that can simulate parallel file systems at very large scale. Our goal is to simulate petabyte-scale parallel file systems on a small cluster or even a single machine in reasonable time and with reasonable fidelity. With this simulator, file system experts will be able to tune existing file systems for specific workloads, scientists and file system deployment engineers will be able to better communicate workload requirements, file system designers and researchers will be able to try out design alternatives and innovations at scale, and instructors will be able to study very large-scale parallel file system behavior in the classroom. In this paper we describe our approach and provide preliminary results that are encouraging both in terms of fidelity and simulation scalability.

  10. MEMS variable capacitance devices utilizing the substrate: I. Novel devices with a customizable tuning range

    KAUST Repository

    Elshurafa, Amro M.

    2010-03-22

    This paper, the first in a series of two, presents a paradigm shift in the design of MEMS parallel plate PolyMUMPS variable capacitance devices by proposing two structures that utilize the substrate and are able to provide predetermined, customizable, tuning ranges and/or ratios. The proposed structures can provide theoretical tuning ranges anywhere from 4.9 to 35 and from 3.4 to 26 respectively with a simple, yet effective, layout modification as opposed to the previously reported devices where the tuning range is fixed and cannot be varied. Theoretical analysis is carried out and verified with measurements of fabricated devices. The first proposed device possessed initially a tuning range of 4.4. Two variations of the structure having tuning ranges of 3 and 3.4, all at 1 GHz, were also successfully developed and tested. The second proposed variable capacitance device behaved as a switch. © 2010 IOP Publishing Ltd.

  11. Measurement of Beam Tunes in the Tevatron Using the BBQ System

    Energy Technology Data Exchange (ETDEWEB)

    Edstrom, Dean R.; /Indiana U.

    2009-04-01

    Measuring the betatron tunes in any synchrotron is of critical importance to ensuring the stability of beam in the synchrotron. The Base Band Tune, or BBQ, measurement system was developed by Marek Gasior of CERN and has been installed at Brookhaven and Fermilab as a part of the LHC Accelerator Research Program, or LARP. The BBQ was installed in the Tevatron to evaluate its effectiveness at reading proton and antiproton tunes at its flattop energy of 980 GeV. The primary objectives of this thesis are to examine the methods used to measure the tune using the BBQ tune measurement system, to incorporate the system into the Fermilab accelerator controls system, ACNET, and to compare the BBQ to existing tune measurement systems in the Tevatron.

  12. Measurement of Beam Tunes in the Tevatron Using the BBQ System

    International Nuclear Information System (INIS)

    Edstrom, Dean R.

    2009-01-01

    Measuring the betatron tunes in any synchrotron is of critical importance to ensuring the stability of beam in the synchrotron. The Base Band Tune, or BBQ, measurement system was developed by Marek Gasior of CERN and has been installed at Brookhaven and Fermilab as a part of the LHC Accelerator Research Program, or LARP. The BBQ was installed in the Tevatron to evaluate its effectiveness at reading proton and antiproton tunes at its flattop energy of 980 GeV. The primary objectives of this thesis are to examine the methods used to measure the tune using the BBQ tune measurement system, to incorporate the system into the Fermilab accelerator controls system, ACNET, and to compare the BBQ to existing tune measurement systems in the Tevatron

  13. Automatic Monte-Carlo tuning for minimum bias events at the LHC

    Energy Technology Data Exchange (ETDEWEB)

    Kama, Sami

    2010-06-22

    The Large Hadron Collider near Geneva, Switzerland will ultimately collide protons at a center-of-mass energy of 14 TeV and a 40 MHz bunch crossing rate with a luminosity of L = 10^{34} cm^{-2} s^{-1}. At each bunch crossing about 20 soft proton-proton interactions are expected to happen. In order to study new phenomena and improve our current knowledge of the physics these events must be understood. However, the physics of soft interactions is not completely known at such high energies. Different phenomenological models, trying to explain these interactions, are implemented in several Monte-Carlo (MC) programs such as PYTHIA, PHOJET and EPOS. Some parameters in such MC programs can be tuned to improve the agreement with the data. In this thesis a new method for tuning the MC programs, based on Genetic Algorithms and distributed analysis techniques, has been presented. This method represents the first fully automated MC tuning technique that is based on true MC distributions. It is an alternative to parametrization-based automatic tuning. This new method is used in finding new tunes for PYTHIA 6 and 8. These tunes are compared to the tunes found by alternative methods, such as the PROFESSOR framework and manual tuning, and found to be equivalent or better. Charged particle multiplicity, dN_{ch}/dη, Lorentz-invariant yield, transverse momentum and mean transverse momentum distributions at various center-of-mass energies are generated using default tunes of EPOS, PHOJET and the Genetic Algorithm tunes of PYTHIA 6 and 8. These distributions are compared to measurements from UA5, CDF, CMS and ATLAS in order to investigate the best model available. Their predictions for the ATLAS detector at LHC energies have been investigated both with generator level and full detector simulation studies. Comparison with the data did not favor any model implemented in the generators, but EPOS is found to describe the investigated distributions better. New data from ATLAS and

  14. Automatic Monte-Carlo tuning for minimum bias events at the LHC

    International Nuclear Information System (INIS)

    Kama, Sami

    2010-01-01

    The Large Hadron Collider near Geneva, Switzerland will ultimately collide protons at a center-of-mass energy of 14 TeV and a 40 MHz bunch crossing rate with a luminosity of L = 10^{34} cm^{-2} s^{-1}. At each bunch crossing about 20 soft proton-proton interactions are expected to happen. In order to study new phenomena and improve our current knowledge of the physics these events must be understood. However, the physics of soft interactions is not completely known at such high energies. Different phenomenological models, trying to explain these interactions, are implemented in several Monte-Carlo (MC) programs such as PYTHIA, PHOJET and EPOS. Some parameters in such MC programs can be tuned to improve the agreement with the data. In this thesis a new method for tuning the MC programs, based on Genetic Algorithms and distributed analysis techniques, has been presented. This method represents the first fully automated MC tuning technique that is based on true MC distributions. It is an alternative to parametrization-based automatic tuning. This new method is used in finding new tunes for PYTHIA 6 and 8. These tunes are compared to the tunes found by alternative methods, such as the PROFESSOR framework and manual tuning, and found to be equivalent or better. Charged particle multiplicity, dN_{ch}/dη, Lorentz-invariant yield, transverse momentum and mean transverse momentum distributions at various center-of-mass energies are generated using default tunes of EPOS, PHOJET and the Genetic Algorithm tunes of PYTHIA 6 and 8. These distributions are compared to measurements from UA5, CDF, CMS and ATLAS in order to investigate the best model available. Their predictions for the ATLAS detector at LHC energies have been investigated both with generator level and full detector simulation studies. Comparison with the data did not favor any model implemented in the generators, but EPOS is found to describe the investigated distributions better. New data from ATLAS and CMS show higher

  15. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN- or C-programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16-port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256-node, 5 GFlop system is under construction. 10 refs., 7 figs

  16. Development, Verification and Validation of Parallel, Scalable Volume of Fluid CFD Program for Propulsion Applications

    Science.gov (United States)

    West, Jeff; Yang, H. Q.

    2014-01-01

    There are many instances involving liquid/gas interfaces and their dynamics in the design of liquid engine powered rockets such as the Space Launch System (SLS). Some examples of these applications are: propellant tank draining and slosh, subcritical condition injector analysis for gas generators, preburners and thrust chambers, water deluge mitigation for launch induced environments and even solid rocket motor liquid slag dynamics. Commercially available CFD programs simulating gas/liquid interfaces using the Volume of Fluid approach are currently limited in their parallel scalability. In 2010 for instance, an internal NASA/MSFC review of three commercial tools revealed that parallel scalability was seriously compromised at 8 CPUs and no additional speedup was possible after 32 CPUs. Other non-interface CFD applications at the time were demonstrating useful parallel scalability up to 4,096 processors or more. Based on this review, NASA/MSFC initiated an effort to implement a Volume of Fluid implementation within the unstructured mesh, pressure-based algorithm CFD program, Loci-STREAM. After verification was achieved by comparing results to the commercial CFD program CFD-Ace+, and validation by direct comparison with data, Loci-STREAM-VoF is now the production CFD tool for propellant slosh force and slosh damping rate simulations at NASA/MSFC. On these applications, good parallel scalability has been demonstrated for problem sizes of tens of millions of cells and thousands of CPU cores. Ongoing efforts are focused on the application of Loci-STREAM-VoF to predict the transient flow patterns of water on the SLS Mobile Launch Platform in order to support the phasing of water for launch environment mitigation so that detrimental effects on the vehicle are avoided.

  17. The technical feasibility of uranium enrichment for nuclear bomb construction at the parallel nuclear program plant

    International Nuclear Information System (INIS)

    Rosa, L.P.

    1990-01-01

    The role of the Parallel Nuclear Program in Brazil and the feasibility of uranium enrichment for nuclear bomb construction are discussed. This program involves two research centers, one belonging to the Brazilian navy and another to the aeronautics branch. Some other Brazilian institutes, such as CTA, IPEN, COPESP and CETEX, are also taking part in the program. (A.C.A.S.)

  18. Predictive Performance Tuning of OpenACC Accelerated Applications

    KAUST Repository

    Siddiqui, Shahzeb

    2014-05-04

    Graphics Processing Units (GPUs) are gradually becoming mainstream in supercomputing as their capabilities to significantly accelerate a large spectrum of scientific applications have been clearly identified and proven. Moreover, with the introduction of high level programming models such as OpenACC [1] and OpenMP 4.0 [2], these devices are becoming more accessible and practical to use by a larger scientific community. However, performance optimization of OpenACC accelerated applications usually requires an in-depth knowledge of the hardware and software specifications. We suggest a prediction-based performance tuning mechanism [3] to quickly tune OpenACC parameters for a given application to dynamically adapt to the execution environment on a given system. This approach is applied to a finite difference kernel to tune the OpenACC gang and vector clauses for mapping the compute kernels into the underlying accelerator architecture. Our experiments show a significant performance improvement against the default compiler parameters and a faster tuning by an order of magnitude compared to the brute force search tuning.
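
    A minimal sketch of the kind of kernel and mapping involved: a 2-D finite-difference stencil whose num_gangs and vector_length clauses are the parameters a tuner would search over. The kernel, array sizes and the particular clause values are assumptions for illustration, one sample point rather than recommended settings.

        #include <stdio.h>

        #define NX 512
        #define NY 512

        static float u[NX][NY], unew[NX][NY];

        /* Tunable OpenACC mapping: a predictive tuner would vary the
         * num_gangs and vector_length values below per device. */
        void step(void)
        {
            #pragma acc parallel loop num_gangs(256) vector_length(128) \
                collapse(2) copyin(u) copyout(unew)
            for (int i = 1; i < NX - 1; ++i)
                for (int j = 1; j < NY - 1; ++j)
                    unew[i][j] = 0.25f * (u[i-1][j] + u[i+1][j] +
                                          u[i][j-1] + u[i][j+1]);
        }

        int main(void)
        {
            u[NX/2][NY/2] = 1.0f;   /* point source */
            step();
            printf("%f\n", unew[NX/2][NY/2 + 1]);   /* expect 0.25 */
            return 0;
        }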

  19. Implementations of BLAST for parallel computers.

    Science.gov (United States)

    Jülich, A

    1995-02-01

    The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.

  20. Exploiting Symmetry on Parallel Architectures.

    Science.gov (United States)

    Stiller, Lewis Benjamin

    1995-01-01

    This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  1. The language parallel Pascal and other aspects of the massively parallel processor

    Science.gov (United States)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  2. Re-tuning tuned mass dampers using ambient vibration measurements

    International Nuclear Information System (INIS)

    Hazra, B; Sadhu, A; Narasimhan, S; Lourenco, R

    2010-01-01

    Deterioration, accidental changes in the operating conditions, or incorrect estimates of the structure modal properties lead to de-tuning in tuned mass dampers (TMDs). To restore optimal performance, it is necessary to estimate the modal properties of the system, and re-tune the TMD to its optimal state. The presence of closely spaced modes and a relatively large amount of damping in the dominant modes renders the process of identification difficult. Furthermore, the process of estimating the modal properties of the bare structure using ambient vibration measurements of the structure with the TMD is challenging. In order to overcome these challenges, a novel identification and re-tuning algorithm is proposed. The process of identification consists of empirical mode decomposition to separate the closely spaced modes, followed by the blind identification of the remaining modes. Algorithms for estimating the fundamental frequency and the mode shape of the primary structure necessary for re-tuning the TMD are proposed. Experimental results from the application of the proposed algorithms to identify and re-tune a laboratory structure TMD system are presented
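
    For context, the optimal state a TMD is re-tuned to is classically given by Den Hartog's formulas for an undamped primary structure with mass ratio \mu = m_d/m_s (standard results, not specific to this paper):

        \frac{\omega_d}{\omega_s} = \frac{1}{1+\mu},
        \qquad
        \zeta_d^{\mathrm{opt}} = \sqrt{\frac{3\mu}{8(1+\mu)^3}}

    so once the identification step recovers the primary structure's frequency \omega_s and mode shape, the damper frequency \omega_d and damping ratio \zeta_d can be reset accordingly.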

  3. An approach to multicore parallelism using functional programming: A case study based on Presburger Arithmetic

    DEFF Research Database (Denmark)

    Dung, Phan Anh; Hansen, Michael Reichhardt

    2015-01-01

    In this paper we investigate multicore parallelism in the context of functional programming by means of two quantifier-elimination procedures for Presburger Arithmetic: one is based on Cooper's algorithm and the other is based on the Omega Test. We first develop correct-by-construction prototype...... platform executing on an 8-core machine. A speedup of approximately 4 was obtained for Cooper's algorithm and a speedup of approximately 6 was obtained for the exact-shadow part of the Omega Test. The considered procedures are complex, memory-intense algorithms on huge formula trees, and the case study...... reveals more generally applicable techniques and guidelines for deriving parallel algorithms from sequential ones in the context of data-intensive tree algorithms. The obtained insights should apply to any strict and impure functional programming language. Furthermore, the results obtained for the exact...

  4. Parallel computing: numerics, applications, and trends

    National Research Council Canada - National Science Library

    Trobec, Roman; Vajteršic, Marián; Zinterhof, Peter

    2009-01-01

    ... and/or distributed systems. The contributions to this book are focused on topics most concerned in the trends of today's parallel computing. These range from parallel algorithmics, programming, tools, network computing to future parallel computing. Particular attention is paid to parallel numerics: linear algebra, differential equations, numerica...

  5. Parallel Computing Strategies for Irregular Algorithms

    Science.gov (United States)

    Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.

  6. Tune shift and betatron modulations due to insertion devices in SPEAR

    International Nuclear Information System (INIS)

    Corbett, W.J.

    1989-12-01

    SPEAR will soon operate as a dedicated synchrotron radiation source with up to 5 beamlines fed from insertion devices. These magnets introduce additional focusing forces into the storage ring lattice which increase the vertical betatron tune and modulate the beam envelope in the vertical plane. The lattice simulation code 'GEMINI' is used to evaluate the tune shifts and estimate the degree of betatron modulation as each magnetic insertion device is brought up to full power. A program is recommended to correct the tunes with the FODO cell quadrupoles. 4 refs., 8 figs., 1 tab
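
    A commonly used thin-lens estimate for the vertical tune shift produced by a planar insertion device of length L and peak field B_0 (a standard approximation, not taken from the GEMINI study) is

        \Delta Q_y \approx \frac{\beta_y L}{8\pi} \left( \frac{B_0}{B\rho} \right)^2

    where B\rho is the beam rigidity and \beta_y the vertical beta function averaged over the device, which shows why each device contributes a positive vertical tune shift that the FODO quadrupoles must compensate.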

  7. NMR parallel Q-meter with double-balanced-mixer detection for polarized target experiments

    International Nuclear Information System (INIS)

    Boissevain, J.; Tippens, W.B.

    1983-01-01

    A constant-voltage, parallel-tuned nuclear magnetic resonance (NMR) circuit, patterned after a Liverpool design, has been developed for polarized target experiments. Measuring the admittance of the resonance circuit allows advantageous use of double-balanced mixer detection. The resonant circuit is tolerant of stray capacitance between the NMR coil and the target cavity, thus easing target-cell-design constraints. The reference leg of the circuit includes a voltage-controlled attenuator and phase shifter for ease of tuning. The NMR output features a flat background and has good linearity and stability
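
    In the ideal lumped-element picture (an illustration, not the exact Liverpool network), the admittance measured across a parallel-tuned circuit is

        Y(\omega) = \frac{1}{R} + j\left( \omega C - \frac{1}{\omega L} \right),
        \qquad
        \omega_0 = \frac{1}{\sqrt{LC}}

    so at resonance the reactive part cancels, and the NMR signal, which perturbs the coil inductance through the sample's complex susceptibility, appears as a small change in the admittance that the double-balanced mixer detects against the reference leg.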

  8. Tune-Based Halo Diagnostics

    International Nuclear Information System (INIS)

    Cameron, Peter

    2003-01-01

    Tune-based halo diagnostics can be divided into two categories -- diagnostics for halo prevention, and diagnostics for halo measurement. Diagnostics for halo prevention are standard fare in accumulators, synchrotrons, and storage rings, and again can be divided into two categories -- diagnostics to measure the tune distribution (primarily to avoid resonances), and diagnostics to identify instabilities (which will not be discussed here). These diagnostic systems include kicked (coherent) tune measurement, phase-locked loop (PLL) tune measurement, Schottky tune measurement, beam transfer function (BTF) measurements, and measurement of transverse quadrupole mode envelope oscillations. We refer briefly to tune diagnostics used at RHIC and intended for the SNS, and then present experimental results. Tune-based diagnostics for halo measurement (as opposed to prevention) are considerably more difficult. We present one brief example of tune-based halo measurement

  9. SOFTWARE FOR DESIGNING PARALLEL APPLICATIONS

    Directory of Open Access Journals (Sweden)

    M. K. Bouza

    2017-01-01

    Full Text Available The object of this research is tools to support the development of parallel programs in C/C++. Methods and software which automate the process of designing parallel applications are proposed.

  10. Overview of implementation of DARPA GPU program in SAIC

    Science.gov (United States)

    Braunreiter, Dennis; Furtek, Jeremy; Chen, Hai-Wen; Healy, Dennis

    2008-04-01

    This paper reviews the implementation of the DARPA MTO STAP-BOY program for both Phase I and Phase II, conducted at Science Applications International Corporation (SAIC). The STAP-BOY program develops fast covariance factorization and tuning techniques for space-time adaptive processing (STAP) algorithm implementation on graphics processing unit (GPU) architectures for embedded systems. The first part of our presentation on the DARPA STAP-BOY program focuses on GPU implementation and algorithm innovations for a prototype radar STAP algorithm. The STAP algorithm will be implemented on the GPU, using stream programming (from companies such as PeakStream, ATI Technologies' CTM, and NVIDIA) and traditional graphics APIs. This algorithm will include fast range-adaptive STAP weight updates and beamforming applications, each of which has been modified to exploit the parallel nature of graphics architectures.

  11. Automatically tuned adaptive differencing algorithm for 3-D SN implemented in PENTRAN

    International Nuclear Information System (INIS)

    Sjoden, G.; Courau, T.; Manalo, K.; Yi, C.

    2009-01-01

    We present an adaptive algorithm with an automated tuning feature to augment optimum differencing scheme selection for 3-D SN computations in Cartesian geometry. This adaptive differencing scheme has been implemented in the PENTRAN parallel SN code. Individual fixed zeroth spatial transport moment based schemes, including Diamond Zero (DZ), Directional Theta Weighted (DTW), and Exponential Directional Iterative (EDI) 3-D SN methods were evaluated and compared with solutions generated using a code-tuned adaptive algorithm. Model problems considered include a fixed source slab problem (using reflected y- and z-axes) which contained mixed shielding and diffusive regions, and a 17 x 17 PWR assembly eigenvalue test problem; these problems were benchmarked against multigroup MCNP5 Monte Carlo computations. Both problems were effective in highlighting the performance of the adaptive scheme compared to single schemes, and demonstrated that the adaptive tuning handles exceptions to the standard DZ-DTW-EDI adaptive strategy. The tuning feature includes special scheme selection provisions for optically thin cells, and incorporates the ratio of the angular source density relative to the total angular collision density to best select the differencing method. Overall, the adaptive scheme demonstrated the best overall solution accuracy in the test problems. (authors)
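
    For context, the zeroth-moment closure underlying the DZ scheme relates cell-average and cell-edge angular fluxes along a sweep direction by the standard diamond relation (the DTW and EDI weightings generalize this; the selection logic is the paper's contribution):

        \bar{\psi} = \tfrac{1}{2}\left( \psi_{\mathrm{in}} + \psi_{\mathrm{out}} \right)
        \quad \Longrightarrow \quad
        \psi_{\mathrm{out}} = 2\bar{\psi} - \psi_{\mathrm{in}}

    with negative outgoing fluxes set to zero in the Diamond Zero variant, which is what the adaptive algorithm must detect and work around in optically thick or thin cells.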

  12. Angular tuning of the magnetic birefringence in rippled cobalt films

    Energy Technology Data Exchange (ETDEWEB)

    Arranz, Miguel A., E-mail: MiguelAngel.Arranz@uclm.es [Facultad de Ciencias y Tecnologías Químicas, Universidad de Castilla-La Mancha, Avda. Camilo J. Cela 10, 13071 Ciudad Real (Spain); Colino, José M. [Instituto de Nanociencia, Nanotecnología y Materiales Moleculares, Universidad de Castilla-La Mancha, Campus de la Fábrica de Armas, 45071 Toledo (Spain)

    2015-06-22

    We report the measurement of magnetically induced birefringence in rippled Co films. For this purpose, the magneto-optical properties of ion beam eroded ferromagnetic films were studied using Kerr magnetometry and magnetic birefringence in the transmitted light intensity. Upon sufficient ion sculpting, these ripple surface nanostructures developed a defined uniaxial anisotropy in the in-plane magnetization, finely tuning the magnetic birefringence effect. We have studied its dependence on the relative orientation between the ripple direction and the magnetic field, and found this effect to be dramatically correlated with the capability to neatly distinguish the mechanisms for the in-plane magnetization reversal, i.e., rotation and nucleation. This double refraction corresponds univocally to the two magnetization axes, parallel and perpendicular to the ripples direction. We have also observed tuned birefringence in stack assemblies of rippled Co films, which enables us to manipulate the number and direction of the refraction axes.

  13. Angular tuning of the magnetic birefringence in rippled cobalt films

    International Nuclear Information System (INIS)

    Arranz, Miguel A.; Colino, José M.

    2015-01-01

    We report the measurement of magnetically induced birefringence in rippled Co films. For this purpose, the magneto-optical properties of ion beam eroded ferromagnetic films were studied using Kerr magnetometry and magnetic birefringence in the transmitted light intensity. Upon sufficient ion sculpting, these ripple surface nanostructures developed a defined uniaxial anisotropy in the in-plane magnetization, finely tuning the magnetic birefringence effect. We have studied its dependence on the relative orientation between the ripple direction and the magnetic field, and found this effect to be dramatically correlated with the capability to neatly distinguish the mechanisms for the in-plane magnetization reversal, i.e., rotation and nucleation. This double refraction corresponds univocally to the two magnetization axes, parallel and perpendicular to the ripples direction. We have also observed tuned birefringence in stack assemblies of rippled Co films, which enables us to manipulate the number and direction of the refraction axes

  14. Fully parallel write/read in resistive synaptic array for accelerating on-chip learning

    Science.gov (United States)

    Gao, Ligang; Wang, I.-Ting; Chen, Pai-Yu; Vrudhula, Sarma; Seo, Jae-sun; Cao, Yu; Hou, Tuo-Hung; Yu, Shimeng

    2015-11-01

    A neuro-inspired computing paradigm beyond the von Neumann architecture is emerging and it generally takes advantage of massive parallelism and is aimed at complex tasks that involve intelligence and learning. The cross-point array architecture with synaptic devices has been proposed for on-chip implementation of the weighted sum and weight update in the learning algorithms. In this work, forming-free, silicon-process-compatible Ta/TaO_x/TiO_2/Ti synaptic devices are fabricated, in which >200 levels of conductance states could be continuously tuned by identical programming pulses. In order to demonstrate the advantages of parallelism of the cross-point array architecture, a novel fully parallel write scheme is designed and experimentally demonstrated in a small-scale crossbar array to accelerate the weight update in the training process, at a speed that is independent of the array size. Compared to the conventional row-by-row write scheme, it achieves >30× speed-up and >30× improvement in energy efficiency as projected in a large-scale array. If realistic synaptic device characteristics such as device variations are taken into an array-level simulation, the proposed array architecture is able to achieve ∼95% recognition accuracy of MNIST handwritten digits, which is close to the accuracy achieved by software using the ideal sparse coding algorithm.
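
    The array-level identities that make both operations parallel are generic crossbar relations (not device specifics): a read performs the weighted sum, and a fully parallel write realizes a rank-one (outer-product) weight update in a single step,

        I_j = \sum_i G_{ij} V_i,
        \qquad
        \Delta G_{ij} \propto x_i\, y_j

    where V_i are row voltages and G_{ij} the device conductances; the update is obtained by applying x-dependent pulses to all rows and y-dependent pulses to all columns simultaneously, which is why the write time is independent of the array size.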

  15. Fully parallel write/read in resistive synaptic array for accelerating on-chip learning

    International Nuclear Information System (INIS)

    Gao, Ligang; Chen, Pai-Yu; Seo, Jae-sun; Cao, Yu; Yu, Shimeng; Wang, I-Ting; Hou, Tuo-Hung; Vrudhula, Sarma

    2015-01-01

    A neuro-inspired computing paradigm beyond the von Neumann architecture is emerging and it generally takes advantage of massive parallelism and is aimed at complex tasks that involve intelligence and learning. The cross-point array architecture with synaptic devices has been proposed for on-chip implementation of the weighted sum and weight update in the learning algorithms. In this work, forming-free, silicon-process-compatible Ta/TaO_x/TiO_2/Ti synaptic devices are fabricated, in which >200 levels of conductance states could be continuously tuned by identical programming pulses. In order to demonstrate the advantages of parallelism of the cross-point array architecture, a novel fully parallel write scheme is designed and experimentally demonstrated in a small-scale crossbar array to accelerate the weight update in the training process, at a speed that is independent of the array size. Compared to the conventional row-by-row write scheme, it achieves >30× speed-up and >30× improvement in energy efficiency as projected in a large-scale array. If realistic synaptic device characteristics such as device variations are taken into an array-level simulation, the proposed array architecture is able to achieve ∼95% recognition accuracy of MNIST handwritten digits, which is close to the accuracy achieved by software using the ideal sparse coding algorithm. (paper)
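
    The speed advantage of the parallel scheme comes from updating every cross-point at once instead of one row per write step. The sketch below illustrates this with a numpy outer-product update; the array size, activity vector, and error term are illustrative stand-ins, not values from the paper.

      import numpy as np

      # Sketch of row-by-row vs. fully parallel weight update on an N x N
      # resistive crossbar (illustrative values, not the authors' scheme).
      N = 8
      rng = np.random.default_rng(0)
      W = rng.uniform(0.0, 1.0, (N, N))    # normalized conductance states
      x = rng.uniform(0.0, 1.0, N)         # presynaptic activity
      delta = rng.uniform(-0.1, 0.1, N)    # postsynaptic error term

      # Row-by-row write: N sequential steps, one row programmed at a time.
      W_serial = W.copy()
      for i in range(N):
          W_serial[i, :] += x[i] * delta

      # Fully parallel write: one step updates every cell simultaneously,
      # so the write duration is independent of the array size.
      W_parallel = W + np.outer(x, delta)

      assert np.allclose(W_serial, W_parallel)   # same result, ~N x fewer steps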

  16. A novel harmonic current sharing control strategy for parallel-connected inverters

    DEFF Research Database (Denmark)

    Guan, Yajuan; Guerrero, Josep M.; Savaghebi, Mehdi

    2017-01-01

    A novel control strategy which enables proportional sharing of linear and nonlinear loads among paralleled inverters, together with voltage harmonic suppression, is proposed in this paper. The proposed method is based on the autonomous currents sharing controller (ACSC) instead of conventional power droop control...... to provide fast transient response, decoupling control and a large stability margin. The current components at different sequences and orders are decomposed by a multi-second-order generalized integrator-based frequency locked loop (MSOGI-FLL). A harmonic-orthogonal-virtual-resistances controller (HOVR......) is used to proportionally share current components at different sequences and orders independently among the paralleled inverters. Proportional-resonant controllers tuned at selected frequencies are used to suppress voltage harmonics. Simulations based on two 2.2 kW paralleled three-phase inverters...
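
    A proportional-resonant branch provides high gain only in a narrow band around its tuned frequency, which is what makes the selective harmonic suppression above possible. The following sketch discretizes one damped resonant term with SciPy; the sample rate, gains and bandwidth are illustrative assumptions, not values from the paper.

      import numpy as np
      from scipy.signal import bilinear, lfilter

      fs = 10_000.0                          # controller sample rate [Hz]
      f0 = 250.0                             # tuned at the 5th harmonic of 50 Hz
      w0, wc = 2 * np.pi * f0, 2 * np.pi * 5.0
      Kp, Kr = 0.5, 50.0                     # illustrative gains

      # Damped resonant term Gr(s) = 2*Kr*wc*s / (s^2 + 2*wc*s + w0^2),
      # discretized with the bilinear (Tustin) transform.
      b, a = bilinear([2 * Kr * wc, 0.0], [1.0, 2 * wc, w0 ** 2], fs)

      t = np.arange(0.0, 0.2, 1.0 / fs)
      error = np.sin(w0 * t)                 # tracking error at the tuned frequency
      u = Kp * error + lfilter(b, a, error)  # resonant branch adds gain ~Kr at w0
      print("steady-state output amplitude ~", np.abs(u[-200:]).max())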

  17. Development of whole core thermal-hydraulic analysis program ACT. 4. Simplified fuel assembly model and parallelization by MPI

    International Nuclear Information System (INIS)

    Ohshima, Hiroyuki

    2001-10-01

    A whole-core thermal-hydraulic analysis program, ACT, is being developed for the purpose of evaluating detailed in-core thermal-hydraulic phenomena of fast reactors, including the effect of the flow between wrapper-tube walls (inter-wrapper flow), under various reactor operating conditions. As appropriate boundary conditions, in addition to a detailed modeling of the core, are essential for accurate simulations of in-core thermal hydraulics, ACT consists not only of fuel assembly and inter-wrapper flow analysis modules but also of a heat transport system analysis module that gives the response of the plant dynamics to the core model. This report describes the incorporation of a simplified model into the fuel assembly analysis module and the parallelization of the program by a message-passing method, toward large-scale simulations. ACT's fuel assembly analysis module can simulate the whole fuel pin bundle of each fuel assembly in the core; however, this may require excessive CPU time for a large-scale core simulation. Therefore, a simplified fuel assembly model that is thermal-hydraulically equivalent to the detailed one has been incorporated in order to save simulation time and resources. This simplified model is applied to those fuel assemblies in the core for which detailed simulation results are not required. With regard to the program parallelization, the calculation load and the data flow of ACT were analyzed, and an optimal parallelization was implemented, including improvements to ACT's numerical simulation algorithm. The Message Passing Interface (MPI) is used for data communication between processes and for synchronization in parallel calculations. The parallelized ACT was verified through a comparison simulation against the original version. In addition to the above work, input manuals for the core analysis module and the heat transport system analysis module have been prepared. (author)
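
    The MPI communication pattern described above, neighbouring subdomains exchanging boundary data and then synchronizing, can be sketched in a few lines with mpi4py. The 1-D slab decomposition, array size and smoothing step below are illustrative assumptions, not details of ACT itself (run with, e.g., mpiexec -n 4 python halo_demo.py).

      from mpi4py import MPI
      import numpy as np

      comm = MPI.COMM_WORLD
      rank, size = comm.Get_rank(), comm.Get_size()

      n_local = 16
      u = np.full(n_local + 2, float(rank))   # local slab, one ghost cell per side

      left = rank - 1 if rank > 0 else MPI.PROC_NULL
      right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

      # Halo exchange: send edge cells to neighbours, receive theirs into ghosts.
      comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
      comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

      u[1:-1] = 0.5 * (u[:-2] + u[2:])        # e.g. one smoothing/update step
      comm.Barrier()                          # synchronization point between steps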

  18. Mobile and replicated alignment of arrays in data-parallel programs

    Science.gov (United States)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

    1993-01-01

    When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.

  19. Parallel Breadth-First Search on Distributed Memory Systems

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division; Buluc, Aydin; Madduri, Kamesh

    2011-04-15

    Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse-matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex-based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
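
    The level-synchronous strategy is easiest to see in serial form: the frontier is expanded one BFS level at a time, and it is this per-level loop that the paper partitions across processors (1-D by vertex, or 2-D by blocks of the sparse adjacency matrix). A minimal sketch with a toy graph:

      from collections import defaultdict

      def bfs_levels(adj, source):
          level = {source: 0}
          frontier = [source]
          depth = 0
          while frontier:
              depth += 1
              next_frontier = []
              for u in frontier:            # in the parallel code, each rank
                  for v in adj[u]:          # scans only the vertices it owns
                      if v not in level:
                          level[v] = depth
                          next_frontier.append(v)
              frontier = next_frontier      # implicit global synchronization here
          return level

      adj = defaultdict(list)
      for a, b in [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]:
          adj[a].append(b)
          adj[b].append(a)
      print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}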

  20. Parallelizing Gene Expression Programming Algorithm in Enabling Large-Scale Classification

    Directory of Open Access Journals (Sweden)

    Lixiong Xu

    2017-01-01

    Full Text Available As one of the most effective function mining algorithms, the Gene Expression Programming (GEP) algorithm has been widely used in classification, pattern recognition, prediction, and other research fields. Through self-evolution, GEP is able to mine an optimal function for dealing with complicated tasks. However, in big data research, GEP suffers from low efficiency due to its long mining times. To improve the efficiency of GEP in big data research, especially for processing large-scale classification tasks, this paper presents a parallelized GEP algorithm using the MapReduce computing model. The experimental results show that the presented algorithm is scalable and efficient for processing large-scale classification tasks.
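
    The expensive step that MapReduce parallelizes here is evaluating each candidate expression over the full training set. The sketch below mimics that pattern with a process pool: the map phase scores every candidate on one data shard, and the reduce phase sums the partial errors. The toy candidates and data are illustrative; a real GEP system decodes chromosomes into expression trees.

      from functools import reduce
      from multiprocessing import Pool

      # Toy candidate functions and (input, target) data.
      CANDIDATES = [lambda x: 2 * x, lambda x: x * x, lambda x: x + 3]
      DATA = [(x, 2 * x) for x in range(10_000)]

      def map_shard(shard):
          # Map phase: partial squared error of every candidate on one shard.
          return [sum((f(x) - y) ** 2 for x, y in shard) for f in CANDIDATES]

      def reduce_partials(a, b):
          # Reduce phase: sum the per-shard partial errors.
          return [pa + pb for pa, pb in zip(a, b)]

      if __name__ == "__main__":
          shards = [DATA[i::4] for i in range(4)]   # partition the training data
          with Pool(4) as pool:
              partials = pool.map(map_shard, shards)
          fitness = reduce(reduce_partials, partials)
          print("best candidate index:", fitness.index(min(fitness)))   # -> 0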

  1. Output-Mirror-Tuning Terahertz-Wave Parametric Oscillator with an Asymmetrical Porro-Prism Resonator Configuration

    Science.gov (United States)

    Zhang, Ruiliang; Qu, Yanchen; Zhao, Weijiang; Liu, Chuang; Chen, Zhenlei

    2017-06-01

    We demonstrate a terahertz-wave parametric oscillator (TPO) with an asymmetrical porro-prism (PP) resonator configuration, consisting of a close PP corner reflector and a distant output mirror relative to the MgO:LiNbO3 crystal. Based on this cavity, frequency tuning of Stokes and the accompanied terahertz (THz) waves is realized just by rotating the plane mirror. Furthermore, THz output with high efficiency and wide tuning range is obtained. Compared with a conventional TPO employing a plane-parallel resonator of the same cavity length and output loss, the low end of the frequency tuning range is extended to 0.96 THz from 1.2 THz. The highest output obtained at 1.28 THz is enhanced by about 25%, and the oscillation threshold pump energy measured at 1.66 THz is reduced by about 4.5%. This resonator configuration also shows some potential to simplify the structure and application for intracavity TPOs.

  2. Parameter estimation in large-scale systems biology models: a parallel and self-adaptive cooperative strategy.

    Science.gov (United States)

    Penas, David R; González, Patricia; Egea, Jose A; Doallo, Ramón; Banga, Julio R

    2017-01-21

    The development of large-scale kinetic models is one of the current key issues in computational systems biology and bioinformatics. Here we consider the problem of parameter estimation in nonlinear dynamic models. Global optimization methods can be used to solve problems of this type, but the associated computational cost is very large. Moreover, many of these methods need the tuning of a number of adjustable search parameters, requiring a number of initial exploratory runs and therefore further increasing the computation times. Here we present a novel parallel method, self-adaptive cooperative enhanced scatter search (saCeSS), to accelerate the solution of this class of problems. The method is based on the scatter search optimization metaheuristic and incorporates several key new mechanisms: (i) asynchronous cooperation between parallel processes, (ii) coarse and fine-grained parallelism, and (iii) self-tuning strategies. The performance and robustness of saCeSS are illustrated by solving a set of challenging parameter estimation problems, including medium and large-scale kinetic models of the bacterium E. coli, baker's yeast S. cerevisiae, the vinegar fly D. melanogaster, Chinese Hamster Ovary cells, and a generic signal transduction network. The results consistently show that saCeSS is a robust and efficient method, allowing very significant reduction of computation times with respect to several previous state of the art methods (from days to minutes, in several cases) even when only a small number of processors is used. The new parallel cooperative method presented here allows the solution of medium and large scale parameter estimation problems in reasonable computation times and with small hardware requirements. Further, the method includes self-tuning mechanisms which facilitate its use by non-experts. We believe that this new method can play a key role in the development of large-scale and even whole-cell dynamic models.
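
    The core idea, parallel solvers that periodically exchange their best solutions, can be illustrated with a deliberately simplified, synchronous version (saCeSS itself uses asynchronous cooperation and scatter search rather than the toy hill-climber below; all names and settings here are illustrative):

      from concurrent.futures import ProcessPoolExecutor
      import numpy as np

      def rosenbrock(p):
          x, y = p
          return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

      def local_search(args):
          start, seed, sigma = args
          rng = np.random.default_rng(seed)
          best = np.asarray(start, dtype=float)
          fbest = rosenbrock(best)
          for _ in range(2000):                  # cheap stochastic hill-climber
              cand = best + rng.normal(0.0, sigma, 2)
              fc = rosenbrock(cand)
              if fc < fbest:
                  best, fbest = cand, fc
          return fbest, best

      if __name__ == "__main__":
          incumbent = np.array([-1.5, 2.0])
          finc = float("inf")
          with ProcessPoolExecutor(max_workers=4) as ex:
              for rnd in range(5):               # cooperation: periodic exchange
                  step = 0.5 / (rnd + 1)         # crude stand-in for self-tuning
                  jobs = [(incumbent, 100 * rnd + i, step) for i in range(4)]
                  for f, p in ex.map(local_search, jobs):
                      if f < finc:
                          finc, incumbent = f, p
          print("best f:", finc, "at", incumbent)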

  3. PERI auto-tuning

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, D H; Williams, S [Lawrence Berkeley National Laboratory, Berkeley, CA 94720 (United States); Chame, J; Chen, C; Hall, M [USC/ISI, Marina del Rey, CA 90292 (United States); Dongarra, J; Moore, S; Seymour, K; You, H [University of Tennessee, Knoxville, TN 37996 (United States); Hollingsworth, J K; Tiwari, A [University of Maryland, College Park, MD 20742 (United States); Hovland, P; Shin, J [Argonne National Laboratory, Argonne, IL 60439 (United States)], E-mail: mhall@isi.edu

    2008-07-15

    The enormous and growing complexity of today's high-end systems has increased the already significant challenges of obtaining high performance on equally complex scientific applications. Application scientists are faced with a daunting challenge in tuning their codes to exploit performance-enhancing architectural features. The Performance Engineering Research Institute (PERI) is working toward the goal of automating portions of the performance tuning process. This paper describes PERI's overall strategy for auto-tuning tools and recent progress in both building auto-tuning tools and demonstrating their success on kernels, some taken from large-scale applications.
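
    At its core, empirical auto-tuning is a search loop: generate code variants or parameter settings, time each on the target machine, and keep the fastest. A minimal sketch of that loop (the kernel and candidate set are illustrative, not PERI's generators):

      import time
      import numpy as np

      def blocked_transpose(a, out, bs):
          # Cache-blocked transpose; bs is the tuning parameter.
          n = a.shape[0]
          for i in range(0, n, bs):
              for j in range(0, n, bs):
                  out[j:j + bs, i:i + bs] = a[i:i + bs, j:j + bs].T
          return out

      n = 2048
      a, out = np.random.rand(n, n), np.empty((n, n))

      timings = {}
      for bs in (16, 32, 64, 128, 256):        # candidate tuning parameters
          t0 = time.perf_counter()
          blocked_transpose(a, out, bs)
          timings[bs] = time.perf_counter() - t0

      best = min(timings, key=timings.get)     # keep the empirically fastest
      print(f"selected block size: {best} ({timings[best] * 1e3:.1f} ms)")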

  4. Tuning the LEDA RFQ 6.7 MeV accelerator

    International Nuclear Information System (INIS)

    Young, L.M.; Rybarcyk, L.

    1998-01-01

    This paper presents the results of tuning the 8-meter-long Radio Frequency Quadrupole (RFQ) built for the Low Energy Demonstration Accelerator (LEDA). This 350-MHz RFQ is built as four 2-meter-long RFQs that are joined by resonant coupling to form an 8-meter-long RFQ, which improves both the longitudinal and the transverse stability of this long structure. The frequencies of the modes near the RFQ mode are measured. The authors show the effect on the RF fields of an error in the temperature of each one of the 2-meter-long RFQs. Slug tuners distributed along the outer walls tune the RFQ. The program RFQTUNE is used to determine the length of the tuners, which are machined to length when the final tuning is complete.

  5. HPC parallel programming model for gyrokinetic MHD simulation

    International Nuclear Information System (INIS)

    Naitou, Hiroshi; Yamada, Yusuke; Tokuda, Shinji; Ishii, Yasutomo; Yagi, Masatoshi

    2011-01-01

    The 3-dimensional gyrokinetic PIC (particle-in-cell) code for MHD simulation, Gpic-MHD, was installed on SR16000 (“Plasma Simulator”), which is a scalar cluster system consisting of 8,192 logical cores. The Gpic-MHD code advances particle and field quantities in time. In order to distribute calculations over a large number of logical cores, the total simulation domain in cylindrical geometry was broken up into N_DD-r × N_DD-z (number of radial decompositions times number of axial decompositions) small domains containing approximately the same number of particles. The axial direction was uniformly decomposed, while the radial direction was non-uniformly decomposed. N_RP replicas (copies) of each decomposed domain were used (“particle decomposition”). A hybrid parallelization model of multiple threads and multiple processes was employed: threads were parallelized by auto-parallelization, and the N_DD-r × N_DD-z × N_RP processes were parallelized by MPI (message-passing interface). The parallelization performance of Gpic-MHD was investigated for a medium-size system with an N_r × N_θ × N_z = 1025 × 128 × 128 mesh and 4.096 or 8.192 billion particles. The highest speed for a fixed number of logical cores was obtained with two threads, the maximum value of N_DD-z, and the optimum combination of N_DD-r and N_RP. The observed optimum speeds demonstrated good scaling up to 8,192 logical cores. (author)
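
    The three-level layout above assigns each MPI rank a radial-domain index, an axial-domain index, and a replica index. A small sketch of such a mapping (the decomposition sizes are illustrative):

      # Flat MPI rank -> (radial domain, axial domain, replica) for an
      # N_DD-r x N_DD-z x N_RP process layout (illustrative sizes).
      N_DD_R, N_DD_Z, N_RP = 4, 8, 2

      def rank_to_coords(rank):
          i_r, rest = divmod(rank, N_DD_Z * N_RP)
          i_z, i_rep = divmod(rest, N_RP)
          return i_r, i_z, i_rep

      for rank in range(N_DD_R * N_DD_Z * N_RP):
          i_r, i_z, i_rep = rank_to_coords(rank)
          assert i_r < N_DD_R and i_z < N_DD_Z and i_rep < N_RP

      print(rank_to_coords(0), rank_to_coords(63))   # (0, 0, 0) (3, 7, 1)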

  6. An Introduction to Parallel Computation R

    Indian Academy of Sciences (India)

    How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ... The most ambitious approach to parallel computing is to develop ...

  7. The parallel processing of EGS4 code on distributed memory scalar parallel computer:Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of the electromagnetic cascade Monte Carlo simulation code EGS4 on the distributed-memory scalar parallel computer Intel Paragon XP/S15-256 is described. EGS4 has the feature that the calculation time for one incident particle differs greatly from particle to particle because of the dynamic generation of secondary particles and the different behavior of each particle. Granularity for parallel processing, the parallel programming model, and the algorithm for parallel random number generation are discussed, and two methods, which allocate particles either dynamically or statically, are used to realize high-speed parallel processing of this code. Among the four problems chosen for performance evaluation, speedup factors of nearly 100 were attained for three of them with 128 processors. It was found that when both the calculation time per incident particle and its dispersion are large, the dynamic particle allocation method, which averages the load across processors, is preferable; when they are small, the static particle allocation method, which reduces the communication overhead, is preferable. Moreover, it is pointed out that double precision variables must be used in the EGS4 code to obtain accurate results. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed in the light of the EGS4 experience. (author)
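
    The trade-off between the two allocation methods is easy to reproduce: with widely varying per-particle cost, dealing work out dynamically balances the load, while static blocks minimize scheduling overhead. A sketch with a process pool (the cost model and counts are illustrative):

      import random
      import time
      from multiprocessing import Pool

      def track_particle(seed):
          rng = random.Random(seed)
          t = rng.expovariate(200)      # per-particle cost varies widely, as in EGS4
          time.sleep(t)                 # stand-in for tracking one shower
          return t

      if __name__ == "__main__":
          particles = list(range(200))
          with Pool(4) as pool:
              t0 = time.perf_counter()
              # Static allocation: one fixed block of particles per worker.
              pool.map(track_particle, particles, chunksize=50)
              static_t = time.perf_counter() - t0

              t0 = time.perf_counter()
              # Dynamic allocation: particles dealt out one by one on demand.
              list(pool.imap_unordered(track_particle, particles, chunksize=1))
              dynamic_t = time.perf_counter() - t0
          print(f"static {static_t:.2f}s vs dynamic {dynamic_t:.2f}s")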

  8. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  9. Rubus: A compiler for seamless and extensible parallelism

    Science.gov (United States)

    Adnan, Muhammad; Aslam, Faisal; Sarwar, Syed Mansoor

    2017-01-01

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer’s expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, whereas for a matrix multiplication benchmark an average execution speedup of 84 times has been achieved.

  10. Rubus: A compiler for seamless and extensible parallelism.

    Directory of Open Access Journals (Sweden)

    Muhammad Adnan

    Full Text Available Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called the Graphic Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, the traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, the code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent in code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, whereas for a matrix multiplication benchmark an average execution speedup of 84 times has been achieved.

  11. Automated tune measurements in the Advanced Light Source storage ring using a LabVIEW application

    International Nuclear Information System (INIS)

    Hinkson, J.A.; Chin, M.; Kim, C.H.; Nishimura, H.

    1994-06-01

    Horizontal and vertical betatron tunes and the synchrotron tune are measured frequently during storage ring commissioning. The measurements are tedious and subject to human error. Automating this kind of repetitive measurement is underway using LabVIEW for Windows, a software application supplied by National Instruments Corporation that provides acquisition, graphing, and analysis of data as well as instrument control through the General Purpose Interface Bus (GPIB). We have added LabVIEW access to the Advanced Light Source (ALS) database and control system. LabVIEW is a fast and efficient tool for accelerator commissioning and beam physics studies. Hardware used to perform tune measurements includes a tracking generator (or a white noise generator), stripline electrodes for external excitation of the beam, button monitors, and a spectrum analyzer. All three tunes are displayed simultaneously on the spectrum analyzer. Our program automatically identifies the three tunes by applying and analyzing small variations and reports the results. This routine can be encapsulated in other applications, for instance, in a chromaticity measurement and correction program.

  12. Tuned Chamber Core Panel Acoustic Test Results

    Science.gov (United States)

    Schiller, Noah H.; Allen, Albert R.

    2016-01-01

    This report documents acoustic testing of tuned chamber core panels, which can be used to supplement the low-frequency performance of conventional acoustic treatment. The tuned chamber core concept incorporates low-frequency noise control directly within the primary structure and is applicable to sandwich constructions with a directional core, including corrugated-, truss-, and fluted-core designs. These types of sandwich structures have long, hollow channels (or chambers) in the core. By adding small holes through one of the facesheets, the hollow chambers can be utilized as an array of low-frequency acoustic resonators. These resonators can then be used to attenuate low-frequency noise (below 400 Hz) inside a vehicle compartment without increasing the weight or size of the structure. The results of this test program demonstrate that the tuned chamber core concept is effective when used in isolation or combined with acoustic foam treatments. Specifically, an array of acoustic resonators integrated within the core of the panels was shown to improve both the low-frequency absorption and transmission loss of the structure in targeted one-third octave bands.
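
    To first order, each vented chamber behaves as a Helmholtz resonator (this standard estimate is an assumption here, not a formula quoted from the report): a chamber of volume V, vented through a facesheet hole of area A with effective neck length L_eff, resonates near

        f_0 = (c / 2π) · sqrt( A / (V · L_eff) ),

    where c is the speed of sound; choosing A, V and L_eff places f_0 in the targeted one-third octave bands below 400 Hz.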

  13. .NET 4.5 parallel extensions

    CERN Document Server

    Freeman, Bryan

    2013-01-01

    This book contains practical recipes on everything you will need to create task-based parallel programs using C#, .NET 4.5, and Visual Studio. The book is packed with illustrated code examples to create scalable programs. This book is intended to help experienced C# developers write applications that leverage the power of modern multicore processors. It provides the necessary knowledge for an experienced C# developer to work with .NET parallelism APIs. Previous experience of writing multithreaded applications is not necessary.

  14. Patterns for Parallel Software Design

    CERN Document Server

    Ortega-Arjona, Jorge Luis

    2010-01-01

    Essential reading to understand patterns for parallel programming Software patterns have revolutionized the way we think about how software is designed, built, and documented, and the design of parallel software requires you to consider other particular design aspects and special skills. From clusters to supercomputers, success heavily depends on the design skills of software developers. Patterns for Parallel Software Design presents a pattern-oriented software architecture approach to parallel software design. This approach is not a design method in the classic sense, but a new way of managing ...

  15. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Full Text Available Abstract Background The use of missing-genotype imputation and haplotype reconstruction is valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared the performance of ParaHaplo 3.0 on the Japanese in Tokyo, Japan, and Han Chinese in Beijing, China, samples from the HapMap dataset. A parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than a non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  16. SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

    Science.gov (United States)

    O'Connor, B P

    2000-08-01

    Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.
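
    Horn's parallel analysis, the first of the two procedures, compares the eigenvalues of the real data's correlation matrix with the mean eigenvalues obtained from random data of the same dimensions, and retains components only up to the point where the real eigenvalues drop below the random ones. A numpy sketch (the simulated two-factor data set is illustrative):

      import numpy as np

      def parallel_analysis(data, n_iter=100, seed=0):
          """Horn's parallel analysis: number of components to retain."""
          n, p = data.shape
          rng = np.random.default_rng(seed)
          real_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
          rand_eig = np.zeros(p)
          for _ in range(n_iter):
              r = rng.standard_normal((n, p))
              rand_eig += np.linalg.eigvalsh(np.corrcoef(r, rowvar=False))[::-1]
          rand_eig /= n_iter
          below = np.nonzero(real_eig <= rand_eig)[0]   # first crossing point
          return int(below[0]) if below.size else p

      # Illustrative data: two genuine factors behind ten observed variables.
      rng = np.random.default_rng(1)
      scores = rng.standard_normal((500, 2))
      X = scores @ rng.standard_normal((2, 10)) + 0.5 * rng.standard_normal((500, 10))
      print("components to retain:", parallel_analysis(X))   # typically 2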

  17. Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

    Science.gov (United States)

    Lawson, Gary; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

    2017-01-01

    In many fields, real-world applications for High Performance Computing have already been developed. For these applications to stay up-to-date, new parallel strategies must be explored to yield the best performance; however, restructuring or modifying a real-world application may be daunting depending on the size of the code. In this case, a mini-app may be employed to quickly explore such options without modifying the entire code. In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23 was measured for MPI+SMPI, but only 11 was measured for MPI+OpenMP.

  18. Adapting algorithms to massively parallel hardware

    CERN Document Server

    Sioulas, Panagiotis

    2016-01-01

    In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.

  19. Contrast invariance of orientation tuning in the lateral geniculate nucleus of the feline visual system.

    Science.gov (United States)

    Viswanathan, Sivaram; Jayakumar, Jaikishan; Vidyasagar, Trichur R

    2015-09-01

    Responses of most neurons in the primary visual cortex of mammals are markedly selective for stimulus orientation and their orientation tuning does not vary with changes in stimulus contrast. The basis of such contrast invariance of orientation tuning has been shown to be the higher variability in the response for low-contrast stimuli. Neurons in the lateral geniculate nucleus (LGN), which provides the major visual input to the cortex, have also been shown to have higher variability in their response to low-contrast stimuli. Parallel studies have also long established mild degrees of orientation selectivity in LGN and retinal cells. In our study, we show that contrast invariance of orientation tuning is already present in the LGN. In addition, we show that the variability of spike responses of LGN neurons increases at lower stimulus contrasts, especially for non-preferred orientations. We suggest that such contrast- and orientation-sensitive variability not only explains the contrast invariance observed in the LGN but can also underlie the contrast-invariant orientation tuning seen at the level of the primary visual cortex. © 2015 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.

  20. Design strategies for irregularly adapting parallel applications

    International Nuclear Information System (INIS)

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Singh, Jaswinder Pal

    2000-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability

  1. A development framework for parallel CFD applications: TRIOU project

    International Nuclear Information System (INIS)

    Calvin, Ch.

    2003-01-01

    We present in this paper the parallel structure of a thermal-hydraulic framework, Trio-U. This development platform has been designed in order to solve large 3-dimensional structured or unstructured CFD (computational fluid dynamics) problems. The code is intrinsically parallel, and an object-oriented design, described in UML, is used. The implementation language chosen is C++. All the parallelism management and the communication routines have been encapsulated. Parallel I/O and communication classes over the standard I/O streams of C++ have been defined, which allows the developer to use the different modules of the application easily, without dealing with basic parallel process management and communications. Moreover, the encapsulation of the communication routines guarantees the portability of the application and allows an efficient tuning of the basic communication methods in order to achieve the best performance on the target architecture. The speed-ups of parallel applications designed using the Trio-U framework are very good: for instance, we obtained an efficiency of up to 90% on 20 processors for complex turbulent-flow large eddy simulation (LES) computations. The efficiencies obtained for direct numerical simulations of two-phase flows are similar, the speed-up being nearly equal to 7.5 for a 3-dimensional simulation using a one-million-element mesh on 8 processors. The purpose of this paper is to focus on the main concepts, and their implementation, that were the guidelines of the design of the parallel architecture of the code. (author)

  2. Implementation and performance of parallelized elegant

    International Nuclear Information System (INIS)

    Wang, Y.; Borland, M.

    2008-01-01

    The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.

  3. SQL Tuning

    CERN Document Server

    Tow, Dan

    2003-01-01

    A poorly performing database application not only costs users time, but also has an impact on other applications running on the same computer or the same network. SQL Tuning provides an essential next step for SQL developers and database administrators who want to extend their SQL tuning expertise and get the most from their database applications. There are two basic issues to focus on when tuning SQL: how to find and interpret the execution plan of an SQL statement and how to change SQL to get a specific alternate execution plan. SQL Tuning provides answers to these questions and addresses a third issue that's even more important: how to find the optimal execution plan for the query to use. Author Dan Tow outlines a timesaving method he's developed for finding the optimum execution plan--rapidly and systematically--regardless of the complexity of the SQL or the database platform being used. You'll learn how to understand and control SQL execution plans and how to diagram SQL queries to deduce the best execution plan.

  4. The parallel adult education system

    DEFF Research Database (Denmark)

    Wahlgren, Bjarne

    2015-01-01

    ... for competence development. The Danish university educational system includes two parallel programs: a traditional academic track (candidatus) and an alternative practice-based track (master). The practice-based program was established in 2001 and organized as part time. The total program takes half the time...

  5. Parallelization in Modern C++

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI are both targeting lower level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuations...

  6. Switching current imbalance mitigation in power modules with parallel connected SiC MOSFETs

    DEFF Research Database (Denmark)

    Beczkowski, Szymon; Jørgensen, Asger Bjørn; Li, Helong

    2017-01-01

    Multichip power modules use parallel connected chips to achieve high current rating. Due to a finite flexibility in a DBC layout, some electrical asymmetries will occur in the module. Parallel connected transistors will exhibit uneven static and dynamic current sharing due to these asymmetries....... Especially important are the couplings between gate and power loops of individual transistors. Fast changing source currents cause gate voltage imbalances yielding uneven switching currents. Equalizing gate voltages seen by paralleled transistors, done by adjusting source bond wires, is proposed...... in this paper. Analysis is performed on an industry standard DBC layout using numerically extracted module parasitics. The method of tuning individual source inductances shows clear improvement in dynamic current balancing and prevents excessive current overshoot during transistors turn-on....

  7. Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Science.gov (United States)

    Harper, Richard

    1989-01-01

    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described.

  8. Tuning the mesomorphic properties of phenoxy-terminated smectic liquid crystals: the effect of fluoro substitution.

    Science.gov (United States)

    Thompson, Matthew; Carkner, Carolyn; Mosey, Nicholas J; Kapernaum, Nadia; Lemieux, Robert P

    2015-05-21

    The mesomorphic properties of phenoxy-terminated 5-alkoxy-2-(4-alkoxyphenyl)pyrimidine liquid crystals can be tuned in a predictable fashion with fluoro substituents on the phenoxy end-group. We show that an ortho-fluoro substituent promotes the formation of a tilted smectic C (SmC) phase whereas a para-fluoro substituent promotes the formation of an orthogonal smectic A (SmA) phase. The balance between SmA and SmC phases may be understood in terms of the energetic preference of the phenoxy end-groups to self-assemble via arene-arene interactions in a parallel or antiparallel geometry, and how these non-covalent interactions may cause either a suppression or enhancement of out-of-layer fluctuations at the interface of smectic layers. Calculations of changes in the potential energy of association ΔE for non-covalent dimers of fluoro-substituted n-butyloxybenzene molecules in parallel and antiparallel geometries support this hypothesis. We also show how mesomorphic properties can be further tuned by difluoro and perfluoro substitution, including difluoro substitution at the ortho positions, which uniquely promotes the formation of a SmC-nematic phase sequence.

  9. Parallelism in computations in quantum and statistical mechanics

    International Nuclear Information System (INIS)

    Clementi, E.; Corongiu, G.; Detrich, J.H.

    1985-01-01

    Often very fundamental biochemical and biophysical problems defy simulations because of limitations in today's computers. We present and discuss a distributed system composed of two IBM 4341s and/or an IBM 4381 as front-end processors and ten FPS-164 attached array processors. This parallel system - called LCAP - presently has a peak performance of about 110 Mflops; extensions to higher performance are discussed. Presently, the system applications use a modified version of VM/SP as the operating system; a description of the modifications is given. Three application programs have been migrated from sequential to parallel: a molecular quantum mechanics, a Metropolis Monte Carlo and a molecular dynamics program. Descriptions of the parallel codes are briefly outlined. Use of these parallel codes has already opened up new capabilities for our research. The very positive performance comparisons with today's supercomputers allow us to conclude that parallel computers and programming, of the type we have considered, represent a pragmatic answer to many computationally intensive problems. (orig.)

  10. CUBESIM, Hypercube and Denelcor Hep Parallel Computer Simulation

    International Nuclear Information System (INIS)

    Dunigan, T.H.

    1988-01-01

    1 - Description of program or function: CUBESIM is a set of subroutine libraries and programs for the simulation of message-passing parallel computers and shared-memory parallel computers. Subroutines are supplied to simulate the Intel hypercube and the Denelcor HEP parallel computers. The system permits a user to develop and test parallel programs written in C or FORTRAN on a single processor. The user may alter such hypercube parameters as message startup times, packet size, and the computation-to-communication ratio. The simulation generates a trace file that can be used for debugging, performance analysis, or graphical display. 2 - Method of solution: The CUBESIM simulator is linked with the user's parallel application routines to run as a single UNIX process. The simulator library provides a small operating system to perform process and message management. 3 - Restrictions on the complexity of the problem: Up to 128 processors can be simulated with a virtual memory limit of 6 million bytes. Up to 1000 processes can be simulated.

  11. A Novel Technique for Design of Ultra High Tunable Electrostatic Parallel Plate RF MEMS Variable Capacitor

    Science.gov (United States)

    Baghelani, Masoud; Ghavifekr, Habib Badri

    2017-12-01

    This paper introduces a novel method for designing low-actuation-voltage, high-tuning-ratio electrostatic parallel-plate RF MEMS variable capacitors. With the proposed method it is feasible to achieve ultra-high tuning ratios well beyond the 1.5:1 barrier imposed by the pull-in effect. The method is based on strengthening the springs of the structure just before the unstable region. Spring strengthening is realized by embedding dimples of precisely chosen height on the spring arms; these dimples shorten the effective spring length once they reach the substrate. With the proposed method, tuning ratios as high as 7.5:1 are attainable with only four dimple sets. The actuation voltage required for this high tuning ratio is 14.33 V, which is easily generated on-chip by charge pump circuits. The effect of Brownian noise is also discussed, and the mechanical natural frequency of the structure is calculated.
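
    The 1.5:1 barrier mentioned above follows from the classical parallel-plate pull-in analysis (a standard textbook result, not a derivation from this paper): with plate area A, permittivity ε and initial gap g, the capacitance is C(x) = εA/(g − x), and electrostatic pull-in makes deflections beyond x = g/3 unstable. The largest stable capacitance ratio is therefore

        C_max / C_min = [εA/(g − g/3)] / [εA/g] = g / (2g/3) = 1.5,

    which is exactly the limit that the spring-strengthening dimples are designed to circumvent.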

  12. On the Tuning and the Mass of the Composite Higgs

    CERN Document Server

    Panico, Giuliano; Tesi, Andrea; Wulzer, Andrea

    2013-01-01

    We analyze quantitatively the tuning of composite Higgs models with partial compositeness and its interplay with the predicted Higgs mass. In this respect we identify three classes of models, characterized by different quantum numbers of the fermionic colored resonances associated with the top quark, the so-called top partners. The main result of this classification is that in all models with moderate tuning a light Higgs, of 125 GeV mass, requires the presence of light top partners, around 1 TeV. The minimal tuning is comparable to that of the most attractive supersymmetric models, in particular the ones realizing Natural SUSY. This gives further support to an extensive program of top partner searches at the LHC that can already probe the natural region of composite Higgs models.

  13. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Science.gov (United States)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  14. Reducing Children Behavior Problems: A Pilot Study of Tuning in to Kids in Iran

    Directory of Open Access Journals (Sweden)

    Fateme Aghaie Meybodi

    2017-09-01

    Discussion: The Tuning in to Kids program appears to be a promising parenting intervention for mothers and children with disruptive behavior problems, offering a useful addition to the programs currently used in Iran.

  15. rf measurements and tuning of the 750 MHz radio frequency quadrupole

    Science.gov (United States)

    Koubek, Benjamin; Grudiev, Alexej; Timmins, Marc

    2017-08-01

    In the framework of the program on medical applications, a compact 750 MHz RFQ has been designed and built to be used as an injector for a hadron therapy linac. This RFQ was designed to accelerate protons to an energy of 5 MeV within only 2 m length. It is divided into four segments and equipped with 32 tuners in total. The length of the RFQ corresponds to 5λ, which is considered to be close to the limit for field adjustment using only piston tuners. Moreover the high frequency, which is about double that of existing RFQs, results in a sensitive structure and requires careful tuning. In this paper we present the tuning algorithm, the tuning procedure and rf measurements of the RFQ.
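
    A common way to formalize such a tuning procedure, though not necessarily the exact algorithm used here, is a linearized response-matrix step: measure how each piston tuner perturbs the field profile, then solve a least-squares problem for the tuner moves that flatten the measured field error. A sketch with synthetic data:

      import numpy as np

      # Generic linearized tuning step (illustrative sizes and data).
      rng = np.random.default_rng(0)
      n_samples, n_tuners = 48, 32                 # field samples, piston tuners

      R = rng.normal(0, 1, (n_samples, n_tuners))  # measured response matrix:
                                                   # field change per unit tuner move
      field_error = rng.normal(0, 0.05, n_samples) # deviation from the flat field

      # Least-squares tuner corrections: minimize ||R @ dx + field_error||.
      dx, *_ = np.linalg.lstsq(R, -field_error, rcond=None)
      residual = field_error + R @ dx
      print(f"rms field error: {field_error.std():.4f} -> {residual.std():.4f}")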

  16. PERI - auto-tuning memory-intensive kernels for multicore

    International Nuclear Information System (INIS)

    Williams, S; Carter, J; Oliker, L; Shalf, J; Yelick, K; Bailey, D; Datta, K

    2008-01-01

    We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to sparse matrix vector multiplication (SpMV), the explicit heat equation PDE on a regular grid (Stencil), and a lattice Boltzmann application (LBMHD). We explore one of the broadest sets of multicore architectures in the high-performance computing literature, including the Intel Xeon Clovertown, AMD Opteron Barcelona, Sun Victoria Falls, and the Sony-Toshiba-IBM (STI) Cell. Rather than hand-tuning each kernel for each system, we develop a code generator for each kernel that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned kernel applications often achieve a better than 4x improvement compared with the original code. Additionally, we analyze a Roofline performance model for each platform to reveal hardware bottlenecks and software challenges for future multicore systems and applications.

  17. PERI - Auto-tuning Memory Intensive Kernels for Multicore

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, David H; Williams, Samuel; Datta, Kaushik; Carter, Jonathan; Oliker, Leonid; Shalf, John; Yelick, Katherine; Bailey, David H

    2008-06-24

    We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizations, popular in linear algebra and FFT libraries, to application-specific computational kernels. Our work applies this strategy to Sparse Matrix Vector Multiplication (SpMV), the explicit heat equation PDE on a regular grid (Stencil), and a lattice Boltzmann application (LBMHD). We explore one of the broadest sets of multicore architectures in the HPC literature, including the Intel Xeon Clovertown, AMD Opteron Barcelona, Sun Victoria Falls, and the Sony-Toshiba-IBM (STI) Cell. Rather than hand-tuning each kernel for each system, we develop a code generator for each kernel that allows us to identify a highly optimized version for each platform, while amortizing the human programming effort. Results show that our auto-tuned kernel applications often achieve a better than 4X improvement compared with the original code. Additionally, we analyze a Roofline performance model for each platform to reveal hardware bottlenecks and software challenges for future multicore systems and applications.
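
    The Roofline model referred to above bounds attainable performance by the lesser of peak compute throughput and the product of memory bandwidth and a kernel's arithmetic intensity (flops per byte). A sketch with illustrative machine numbers and intensities:

      # Roofline bound: min(peak compute, bandwidth * arithmetic intensity).
      def roofline(peak_gflops, bw_gbs, intensity):
          return min(peak_gflops, bw_gbs * intensity)

      # Illustrative machine: 75 GF/s peak, 20 GB/s sustained bandwidth.
      for name, ai in [("SpMV", 0.25), ("Stencil", 0.5),
                       ("LBMHD", 1.0), ("DGEMM", 8.0)]:
          bound = roofline(75.0, 20.0, ai)
          print(f"{name:8s} AI={ai:4.2f} -> {bound:6.1f} GF/s bound")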

  18. Exploiting variability for energy optimization of parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Lavrijsen, Wim [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Iancu, Costin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); de Jong, Wibe [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chen, Xin [Georgia Inst. of Technology, Atlanta, GA (United States); Schwan, Karsten [Georgia Inst. of Technology, Atlanta, GA (United States)

    2016-04-18

    In this paper we present optimizations that use DVFS mechanisms to reduce the total energy usage in scientific applications. Our main insight is that noise is intrinsic to large scale parallel executions and it appears whenever shared resources are contended. The presence of noise allows us to identify and manipulate any program regions amenable to DVFS. When compared to previous energy optimizations that make per-core decisions using predictions of the running time, our scheme uses a qualitative approach to recognize the signature of executions amenable to DVFS. By recognizing the "shape of variability" we can optimize codes with highly dynamic behavior, which pose challenges to all existing DVFS techniques. We validate our approach using offline and online analyses for one-sided and two-sided communication paradigms. We have applied our methods to NWChem, and we show best-case improvements in energy use of 12% at no loss in performance when using online optimizations running on 720 Haswell cores with one-sided communication. With NWChem on MPI two-sided and offline analysis, capturing the initialization, we find energy savings of up to 20%, with less than 1% performance cost.

  19. Automatic Management of Parallel and Distributed System Resources

    Science.gov (United States)

    Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

    1990-01-01

    Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.

  20. High-performance computing — an overview

    Science.gov (United States)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  1. Loss of Kdm5c Causes Spurious Transcription and Prevents the Fine-Tuning of Activity-Regulated Enhancers in Neurons

    Directory of Open Access Journals (Sweden)

    Marilyn Scandaglia

    2017-10-01

    During development, chromatin-modifying enzymes regulate both the timely establishment of cell-type-specific gene programs and the coordinated repression of alternative cell fates. To dissect the role of one such enzyme, the intellectual-disability-linked lysine demethylase 5C (Kdm5c), in the developing and adult brain, we conducted parallel behavioral, transcriptomic, and epigenomic studies in Kdm5c-null and forebrain-restricted inducible knockout mice. Together, genomic analyses and functional assays demonstrate that Kdm5c plays a critical role as a repressor responsible for the developmental silencing of germline genes during cellular differentiation and in fine-tuning activity-regulated enhancers during neuronal maturation. Although the importance of these functions declines after birth, Kdm5c retains an important genome surveillance role preventing the incorrect activation of non-neuronal and cryptic promoters in adult neurons.

  2. Streaming for Functional Data-Parallel Languages

    DEFF Research Database (Denmark)

    Madsen, Frederik Meisner

    In this thesis, we investigate streaming as a general solution to the space inefficiency commonly found in functional data-parallel programming languages. The data-parallel paradigm maps well to parallel SIMD-style hardware. However, the traditional fully materializing execution strategy...... by extending two existing data-parallel languages: NESL and Accelerate. In the extensions we map bulk operations to data-parallel streams that can evaluate fully sequential, fully parallel or anything in between. By a dataflow, piecewise parallel execution strategy, the runtime system can adjust to any target...... flattening necessitates all sub-computations to materialize at the same time. For example, naive n by n matrix multiplication requires n^3 space in NESL because the algorithm contains n^3 independent scalar multiplications. For large values of n, this is completely unacceptable. We address the problem...

  3. Automatic Parallelization An Overview of Fundamental Compiler Techniques

    CERN Document Server

    Midkiff, Samuel P

    2012-01-01

    Compiling for parallelism is a longstanding topic of compiler research. This book describes the fundamental principles of compiling "regular" numerical programs for parallelism. We begin with an explanation of analyses that allow a compiler to understand the interaction of data reads and writes in different statements and loop iterations during program execution. These analyses include dependence analysis, use-def analysis and pointer analysis. Next, we describe how the results of these analyses are used to enable transformations that make loops more amenable to parallelization, and
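
    The dependence analysis mentioned above decides, per loop, whether any iteration reads a value another iteration writes. A two-loop Python illustration (ours, not from the book):

        n = 10
        a = [0.0] * n
        b = [float(i) for i in range(n)]

        # Independent: no iteration touches data written by another one,
        # so a dependence analysis would mark this loop parallelizable.
        for i in range(n):
            a[i] = 2.0 * b[i]

        # Loop-carried flow dependence: iteration i reads a[i-1], written
        # by iteration i-1, so naive parallelization is unsafe.
        for i in range(1, n):
            a[i] = a[i - 1] + b[i]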

  4. Automated tuning of an eight-channel cardiac transceive array at 7 tesla using piezoelectric actuators.

    Science.gov (United States)

    Keith, Graeme A; Rodgers, Christopher T; Hess, Aaron T; Snyder, Carl J; Vaughan, J Thomas; Robson, Matthew D

    2015-06-01

    Ultra-high field (UHF) MR scanning in the body requires novel coil designs due to B1 field inhomogeneities. In the transverse electromagnetic field (TEM) design, maximum B1 transmit power can only be achieved if each individual transmit element is tuned and matched for different coil loads, which requires a considerable amount of valuable scanner time. An integrated system for autotuning a multichannel parallel transmit (pTx) cardiac TEM array was devised, using piezoelectric actuators, power monitoring equipment and control software. The reproducibility and performance of the system were tested and the power responses of the coil elements were profiled. An automated optimization method was devised and evaluated. The time required to tune an eight-element pTx cardiac RF array was reduced from a mean of 30 min to less than 10 min with the use of this system. Piezoelectric actuators are an attractive means of tuning RF coil arrays to yield more efficient B1 transmission into the subject. An automated mechanism for tuning these elements provides a practical solution for cardiac imaging at UHF, bringing this technology closer to clinical use. © 2014 The Authors. Magnetic Resonance in Medicine published by Wiley Periodicals, Inc. on behalf of International Society for Magnetic Resonance in Medicine.

  5. A solution for automatic parallelization of sequential assembly code

    Directory of Open Access Journals (Sweden)

    Kovačević Đorđe

    2013-01-01

    Since modern multicore processors can execute existing sequential programs only on a single core, there is a strong need for automatic parallelization of program code. Relying on existing algorithms, this paper describes a new software tool for the parallelization of sequential assembly code. The main goal is to develop a parallelizer that reads sequential assembly code and outputs parallelized code for a MIPS processor with multiple cores. The idea is the following: a parser translates the assembly input file into program objects suitable for further processing, after which static single assignment is performed. Based on the data-flow graph, the parallelization algorithm distributes instructions across the cores. Once the sequential code has been parallelized, registers are allocated with a linear allocation algorithm, and the final result is distributed assembly code for each of the cores. In the paper we evaluate the speedup of a matrix multiplication example processed by the parallelizer. The result is almost linear speedup of code execution, which increases with the number of cores: 1.99 on two cores and 13.88 on 16 cores.
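
    As a back-of-the-envelope check (assuming Amdahl's law fits these measurements, which the paper does not claim), the reported speedups imply a serial fraction of roughly 0.5-1%:

        # Amdahl's law: S = 1 / (s + (1 - s) / p), hence
        # s = (p / S - 1) / (p - 1) for measured speedup S on p cores.
        def implied_serial_fraction(S, p):
            return (p / S - 1.0) / (p - 1.0)

        print(implied_serial_fraction(1.99, 2))    # ~0.005 (0.5% serial)
        print(implied_serial_fraction(13.88, 16))  # ~0.010 (1.0% serial)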

  6. Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

    Science.gov (United States)

    Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

    2017-07-01

    Calculation of matrix-vector products in real-world problems often involves large matrices of arbitrary size, so parallelization is needed to speed up a computation that would otherwise take a long time. The graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication of arbitrary size, because graph partitioning assumes a square, symmetric matrix. Hypergraph partitioning overcomes this shortcoming. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model created by NVIDIA and implemented on the GPU (graphics processing unit).
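
    The operation being distributed is easy to state. A serial Python sketch of row-wise partitioning of a CSR sparse matrix-vector product follows; the record additionally balances the partitions with hypergraph partitioning and maps them to CUDA threads, neither of which is shown here.

        import numpy as np
        from scipy.sparse import random as sprandom

        def spmv_partitioned(A_csr, x, nparts):
            # Split the rows into nparts contiguous slices, one per "worker".
            n = A_csr.shape[0]
            y = np.zeros(n)
            bounds = np.linspace(0, n, nparts + 1, dtype=int)
            for lo, hi in zip(bounds[:-1], bounds[1:]):
                y[lo:hi] = A_csr[lo:hi] @ x
            return y

        A = sprandom(1000, 800, density=0.01, format="csr")
        x = np.random.rand(800)
        assert np.allclose(spmv_partitioned(A, x, 4), A @ x)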

  7. Java performance tuning

    CERN Document Server

    Shirazi, Jack

    2003-01-01

    Performance has been an important issue for Java developers ever since the first version hit the streets. Over the years, Java performance has improved dramatically, but tuning is essential to get the best results, especially for J2EE applications. You can never have code that runs too fast. Java Performance Tuning, 2nd edition provides a comprehensive and indispensable guide to eliminating all types of performance problems. Using many real-life examples to work through the tuning process in detail, JPT shows how tricks such as minimizing object creation and replacing strings with arrays can

  8. Parallel Object-Oriented Computation Applied to a Finite Element Problem

    Directory of Open Access Journals (Sweden)

    Jon B. Weissman

    1993-01-01

    The conventional wisdom in the scientific computing community is that the best way to solve large-scale numerically intensive scientific problems on today's parallel MIMD computers is to use Fortran or C programmed in a data-parallel style using low-level message-passing primitives. This approach inevitably leads to nonportable codes and extensive development time, and restricts parallel programming to the domain of the expert programmer. We believe that these problems are not inherent to parallel computing but are the result of the programming tools used. We will show that comparable performance can be achieved with little effort if better tools that present higher-level abstractions are used. The vehicle for our demonstration is a 2D electromagnetic finite element scattering code we have implemented in Mentat, an object-oriented parallel processing system. We briefly describe the application and Mentat, outline the implementation, and present performance results for both a Mentat and a hand-coded parallel Fortran version.

  9. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used...

  10. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately......The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used...

  11. Computer-Aided Parallelizer and Optimizer

    Science.gov (United States)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO is currently integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components of the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes can potentially run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops at the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts have also been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
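
    CAPO emits OpenMP directives for Fortran; as a Python-flavoured analogue of the directives it inserts (an illustration using Numba, not CAPO output), the loop below is parallelized with the index private to each iteration and the accumulator recognized as a reduction:

        import numpy as np
        from numba import njit, prange

        @njit(parallel=True)
        def dot(a, b):
            s = 0.0
            for i in prange(a.size):  # parallel loop; s is a reduction
                s += a[i] * b[i]
            return s

        a = np.random.rand(1_000_000)
        b = np.random.rand(1_000_000)
        print(dot(a, b))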

  12. The ongoing investigation of high performance parallel computing in HEP

    CERN Document Server

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  13. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  14. A Model for Speedup of Parallel Programs

    Science.gov (United States)

    1997-01-01

    Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static

  15. Tuning Bacterial Hydrodynamics with Magnetic Fields: A Path to Bacterial Robotics

    Science.gov (United States)

    Pierce, Christopher; Mumper, Eric; Brangham, Jack; Wijesinghe, Hiran; Lower, Stephen; Lower, Brian; Yang, Fengyuan; Sooryakumar, Ratnasingham

    Magnetotactic Bacteria (MTB) are a group of motile prokaryotes that synthesize chains of lipid-bound, magnetic nano-particles. In this study, the innate magnetism of these flagellated swimmers is exploited to explore their hydrodynamics near confining surfaces, using the magnetic field as a tuning parameter. With weak (Gauss), uniform, external magnetic fields and the field gradients arising from micro-magnetic surface patterns, the relative strength of hydrodynamic, magnetic and flagellar force components is tuned through magnetic control of the bacteria's orientation and position. In addition to direct measurement of several hydrodynamic quantities related to the motility of individual cells, their tunable dynamics reveal a number of novel, highly controllable swimming behaviors with potential value in micro-robotics applications. Specifically, the experiments permit the MTB cells to be directed along parallel or divergent trajectories, suppress their flagellar forces through magnetic means, and induce transitions between planar, circulating trajectories and drifting, vertically oriented "top-like" motion. The implications of the work for fundamental hydrodynamics research as well as bacterially driven robotics applications will be discussed.

  16. Fast implementations of 3D PET reconstruction using vector and parallel programming techniques

    International Nuclear Information System (INIS)

    Guerrero, T.M.; Cherry, S.R.; Dahlbom, M.; Ricci, A.R.; Hoffman, E.J.

    1993-01-01

    Computationally intensive techniques that offer potential clinical use have arisen in nuclear medicine. Examples include iterative reconstruction, 3D PET data acquisition and reconstruction, and 3D image volume manipulation including image registration. One obstacle in achieving clinical acceptance of these techniques is the computational time required. This study focuses on methods to reduce the computation time for 3D PET reconstruction through the use of fast computer hardware, vector and parallel programming techniques, and algorithm optimization. The strengths and weaknesses of i860 microprocessor based workstation accelerator boards are investigated in implementations of 3D PET reconstruction

  17. Modernising educational programmes in ICT based on the Tuning methodology

    Directory of Open Access Journals (Sweden)

    Alexander Bedny

    2014-07-01

    An analysis is presented of the experience of modernising undergraduate educational programmes using the TUNING methodology, based on the example of the area of studies "Fundamental computer science and information technology" (FCSIT) implemented at Lobachevsky State University of Nizhni Novgorod (Russia). The algorithm for reforming curricula for the subject area of information technology in accordance with the TUNING methodology is explained. A comparison is drawn between the existing Russian and European standards in the area of ICT education, including the European e-Competence Framework, with the focus on relevant competences. Some guidelines for the preparation of educational programmes are also provided.

  18. Micro-mechanical Simulations of Soils using Massively Parallel Supercomputers

    Directory of Open Access Journals (Sweden)

    David W. Washington

    2004-06-01

    In this research a computer program, Trubal version 1.51, based on the Discrete Element Method, was converted to run on a Connection Machine (CM-5), a massively parallel supercomputer with 512 nodes, to expedite the computational times of simulating geotechnical boundary value problems. The dynamic memory algorithm in the Trubal program did not perform efficiently on the CM-2 machine with its Single Instruction Multiple Data (SIMD) architecture, due to the communication overhead involving global array reductions, global array broadcasts and random data movement. Therefore, the dynamic memory algorithm in Trubal was converted to a static memory arrangement, and the program was successfully ported to CM-5 machines. The converted program was called "TRUBAL for Parallel Machines" (TPM). Simulating two physical triaxial experiments and comparing the simulation results with Trubal simulations validated the TPM program. With a 512-node CM-5 machine, TPM produced a nine-fold speedup, demonstrating the inherent parallelism within algorithms based on the Discrete Element Method.

  19. Simultaneous gains tuning in boiler/turbine PID-based controller clusters using iterative feedback tuning methodology.

    Science.gov (United States)

    Zhang, Shu; Taft, Cyrus W; Bentsman, Joseph; Hussey, Aaron; Petrus, Bryan

    2012-09-01

    Tuning a complex multi-loop PID-based control system requires considerable experience. In today's power industry the number of available qualified tuners is dwindling and there is a great need for better tuning tools to maintain and improve the performance of complex multivariable processes. Multi-loop PID tuning is the procedure for the online tuning of a cluster of PID controllers operating in a closed loop with a multivariable process. This paper presents the first application of the simultaneous tuning technique to the multi-input-multi-output (MIMO) PID-based nonlinear controller in the power plant control context, with the closed-loop system consisting of a MIMO nonlinear boiler/turbine model and a nonlinear cluster of six PID-type controllers. Although simplified, the dynamics and cross-coupling of the process and the PID cluster are similar to those used in a real power plant. The particular technique selected, iterative feedback tuning (IFT), utilizes the linearized version of the PID cluster for signal conditioning, but the data collection and tuning are carried out on the full nonlinear closed-loop system. Based on the figure of merit for control system performance, IFT is shown to deliver performance favorably comparable to that attained through empirical tuning carried out by an experienced control engineer. Copyright © 2012 ISA. Published by Elsevier Ltd. All rights reserved.
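
    The overall shape of such a tuning loop, stripped to its skeleton, is: run a closed-loop experiment, evaluate a tracking cost, update the gains, repeat. The toy below descends a finite-difference gradient on a simulated first-order plant with a PI controller; real IFT instead estimates gradients from dedicated closed-loop experiments without a plant model, so treat this only as the iterative scaffolding, not the IFT algorithm.

        import numpy as np

        def closed_loop_cost(Kp, Ki, dt=0.01, T=5.0):
            # Unit-step tracking cost for plant dy/dt = -y + u under PI control.
            y, integ, cost = 0.0, 0.0, 0.0
            for _ in range(int(T / dt)):
                e = 1.0 - y
                integ += e * dt
                u = Kp * e + Ki * integ
                y += dt * (-y + u)
                cost += e * e * dt
            return cost

        gains = np.array([0.5, 0.5])
        for _ in range(50):  # iterate: experiment, estimate gradient, update
            grad = np.zeros(2)
            for j in range(2):
                d = np.zeros(2)
                d[j] = 1e-3
                grad[j] = (closed_loop_cost(*(gains + d)) -
                           closed_loop_cost(*(gains - d))) / 2e-3
            gains -= 0.5 * grad
        print("tuned Kp, Ki:", gains)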

  20. Power stability methods for parallel systems

    International Nuclear Information System (INIS)

    Wallach, Y.

    1988-01-01

    Parallel-processing systems are already commercially available. This paper shows that if one of them, the Alternating Sequential Parallel (ASP) system, is applied to network stability calculations, it will lead to a higher speed of solution. The ASP system is first described and is then shown to be cheaper, more reliable and more available than other parallel systems. Also, no deadlock need be feared and the speedup is normally very high. A number of ASP systems have already been assembled (the SMS systems, Topps, DIRMU, etc.). At present, an IBM Local Area Network is being modified so that it too can work in the ASP mode. Existing ASP systems were programmed in Fortran or assembly language. Since newer systems (e.g. DIRMU) are programmed in Modula-2, this language can be used. Stability analysis is based on solving nonlinear differential and algebraic equations. The algorithm for solving the nonlinear differential equations on ASP is described and programmed in Modula-2. The speedup is computed and is shown to be almost optimal.

  1. A lightweight, flow-based toolkit for parallel and distributed bioinformatics pipelines

    Directory of Open Access Journals (Sweden)

    Cieślik Marcin

    2011-02-01

    Background: Bioinformatic analyses typically proceed as chains of data-processing tasks. A pipeline, or 'workflow', is a well-defined protocol, with a specific structure defined by the topology of data-flow interdependencies, and a particular functionality arising from the data transformations applied at each step. In computer science, the dataflow programming (DFP) paradigm defines software systems constructed in this manner, as networks of message-passing components. Thus, bioinformatic workflows can be naturally mapped onto DFP concepts. Results: To enable the flexible creation and execution of bioinformatics dataflows, we have written a modular framework for parallel pipelines in Python ('PaPy'). A PaPy workflow is created from re-usable components connected by data-pipes into a directed acyclic graph, which together define nested higher-order map functions. The successive functional transformations of input data are evaluated on flexibly pooled compute resources, either local or remote. Input items are processed in batches of adjustable size, allowing one to tune the trade-off between parallelism and lazy evaluation (memory consumption). An add-on module ('NuBio') facilitates the creation of bioinformatics workflows by providing domain-specific data containers (e.g., for biomolecular sequences, alignments, structures) and functionality (e.g., to parse/write standard file formats). Conclusions: PaPy offers a modular framework for the creation and deployment of parallel and distributed data-processing workflows. Pipelines derive their functionality from user-written, data-coupled components, so PaPy can also be viewed as a lightweight toolkit for extensible, flow-based bioinformatics data-processing. The simplicity and flexibility of distributed PaPy pipelines may help users bridge the gap between traditional desktop/workstation and grid computing. PaPy is freely distributed as open-source Python code at http://muralab.org/PaPy, and
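
    The dataflow shape PaPy builds is the same one plain Python generators give for free. A generic sketch (chained generators, not the PaPy API; "input.txt" is a placeholder path): items stream through the pipe lazily, so only the items in flight occupy memory.

        def read_records(path):
            with open(path) as f:
                for line in f:
                    yield line.strip()

        def transform(records):
            for r in records:
                yield r.upper()  # stand-in for a real data transformation

        def sink(records):
            for r in records:
                print(r)

        # Composing the stages wires up the dataflow graph; nothing runs
        # until the sink pulls items through the pipe.
        # sink(transform(read_records("input.txt")))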

  2. Automatic performance tuning of parallel and accelerated seismic imaging kernels

    KAUST Repository

    Haberdar, Hakan; Siddiqui, Shahzeb; Feki, Saber

    2014-01-01

    the performance of the MPI communications as well as developer productivity by providing a higher level of abstraction. Keeping productivity in mind, we opted toward pragma based programming for accelerated computation on latest accelerated architectures

  3. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers

  4. A parallel solver for huge dense linear systems

    Science.gov (United States)

    Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

    2011-11-01

    HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems (on the order of 100 000 equations). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summary. Program title: Huge Dense System Solver (HDSS). Catalogue identifier: AEHU_v1_1. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 87 062. No. of bytes in distributed program, including test data, etc.: 1 069 110. Distribution format: tar.gz. Programming language: Fortran90, C. Computer: Parallel architectures: multiprocessors, computer clusters. Operating system

  5. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    Science.gov (United States)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary. Program title: SWsolver. Catalogue identifier: AEGY_v1_0. Program summary URL

  6. Algorithms for computational fluid dynamics on parallel processors

    International Nuclear Information System (INIS)

    Van de Velde, E.F.

    1986-01-01

    A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries

  7. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

    Parallel programming is applied to multiple processors connected via Ethernet. Data exchange between tasks located in each processing element is realized in two ways. One is the socket interface, a standard library on recent UNIX operating systems. The other is Parallel Virtual Machine (PVM), free network-connection software developed by ORNL that allows many workstations connected to a network to be used as a parallel computer. This paper discusses the viability of parallel computing using networked UNIX workstations, and compares it with specialized parallel systems (Transputer and iPSC/860) on a Monte Carlo simulation, which generally exhibits a high parallelization ratio. (author)

  8. Parallel community climate model: Description and user's guide

    Energy Technology Data Exchange (ETDEWEB)

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others]

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  9. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

    and the Microsoft .NET framework. Normally, one would not directly think of the .NET framework when talking about scientific applications, but Microsoft has in the last couple of versions of .NET introduced a number of tools for writing parallel and high-performance code. The first section examines how programmers can...

  10. An Expert System for the Development of Efficient Parallel Code

    Science.gov (United States)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  11. Directions in parallel processor architecture, and GPUs too

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker: Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  12. The ATLAS Monte Carlo tuning system

    CERN Document Server

    Wahrmund, S

    2012-01-01

    The ATLAS experiment moved the tuning of the underlying event and minimum bias event shape modeling, previously done in a manual fashion, to the automated Professor tuning tool, employed in connection with the Rivet analysis framework, when the first corresponding experimental analysis from LHC became available. The tuning effort for the Pythia 8 generator, which includes improved models for diffraction, has been started in this automated way in ATLAS, with the aim of getting a good description of the pile-up generated by multiple minimum bias interactions. The first results for these Pythia 8 tunes, as well as Pythia 6 shower tunes are presented, including a study of tunes for various PDFs.

  13. rf measurements and tuning of the 750 MHz radio frequency quadrupole

    Directory of Open Access Journals (Sweden)

    Benjamin Koubek

    2017-08-01

    In the framework of the program on medical applications, a compact 750 MHz RFQ has been designed and built to be used as an injector for a hadron therapy linac. This RFQ was designed to accelerate protons to an energy of 5 MeV within only 2 m length. It is divided into four segments and equipped with 32 tuners in total. The length of the RFQ corresponds to 5λ, which is considered to be close to the limit for field adjustment using only piston tuners. Moreover, the high frequency, which is about double the frequency of existing RFQs, results in a sensitive structure and requires careful tuning. In this paper we present the tuning algorithm, the tuning procedure and rf measurements of the RFQ.
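
    A quick consistency check on the quoted numbers (ours, not the paper's): at 750 MHz the free-space wavelength is about 0.4 m, so 5λ comes to about 2 m, matching the stated RFQ length.

        c = 299_792_458.0    # speed of light, m/s
        f = 750e6            # RFQ frequency, Hz
        lam = c / f
        print(lam, 5 * lam)  # ~0.3997 m and ~1.999 m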

  14. Spectral tuning of near-field radiative heat transfer by graphene-covered metasurfaces

    Science.gov (United States)

    Zheng, Zhiheng; Wang, Ao; Xuan, Yimin

    2018-03-01

    When two gratings are each covered by a layer of graphene sheet, the near-field radiative heat transfer between two parallel gratings made of silica (SiO2) can be greatly improved. As the material properties of doped silicon (n-type doping concentration of 10^20 cm^-3, denoted Si-20) and SiO2 differ greatly, we theoretically investigate the near-field radiative heat transfer between two parallel graphene-covered gratings made of Si-20 to explore some different phenomena, especially for modulating the spectral properties. The radiative heat flux between two parallel bulks made of Si-20 can be enhanced by using gratings instead of bulks. When the two gratings are each covered by a layer of graphene sheet, the radiative heat flux between the two Si-20 gratings can be further enhanced. By tuning the graphene chemical potential μ and the grating filling factor f, due to the interaction between the surface plasmon polaritons (SPPs) of the graphene sheets and the grating structures, the spectral properties of the radiative heat flux between two parallel graphene-covered gratings can be effectively regulated. This work develops and supplements the understanding of material effects on near-field radiative heat transfer for this kind of system configuration, paving the way to modulating the spectral properties of near-field radiative heat transfer.

  15. Abstract Level Parallelization of Finite Difference Methods

    Directory of Open Access Journals (Sweden)

    Edwin Vollebregt

    1997-01-01

    A formalism is proposed for describing finite difference calculations in an abstract way. The formalism consists of index sets and stencils, for characterizing the structure of sets of data items and interactions between data items (“neighbouring relations”). The formalism provides a means for lifting programming to a more abstract level. This simplifies the tasks of performance analysis and verification of correctness, and opens the way for automatic code generation. The notation is particularly useful in parallelization, for the systematic construction of parallel programs in a process/channel programming paradigm (e.g., message passing). This is important because message passing, unfortunately, is still the only approach that leads to acceptable performance for many unstructured or irregular problems on parallel computers that have non-uniform memory access times. It will be shown that the use of index sets and stencils greatly simplifies the determination of which data must be exchanged between different computing processes.

  16. Block-Parallel Data Analysis with DIY2

    Energy Technology Data Exchange (ETDEWEB)

    Morozov, Dmitriy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Peterka, Tom [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-08-30

    DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
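
    The core abstraction is small enough to sketch. In miniature (a Python illustration of the idea only; DIY2 itself is a C++ library): decompose the domain into blocks, assign blocks to processing elements, and write the computation as an iteration over local blocks.

        import numpy as np

        def decompose(n, nblocks):
            bounds = np.linspace(0, n, nblocks + 1, dtype=int)
            return list(zip(bounds[:-1], bounds[1:]))

        def assign(blocks, nprocs):
            # Round-robin assignment of blocks to processing elements.
            return {p: blocks[p::nprocs] for p in range(nprocs)}

        data = np.arange(100, dtype=float)
        blocks = decompose(len(data), nblocks=8)
        for rank, local in assign(blocks, nprocs=3).items():
            for lo, hi in local:  # computation as iteration over blocks
                print(rank, (lo, hi), data[lo:hi].sum())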

  17. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors

  18. Parallelization of the model-based iterative reconstruction algorithm DIRA

    International Nuclear Information System (INIS)

    Oertenberg, A.; Sandborg, M.; Alm Carlsson, G.; Malusek, A.; Magnusson, M.

    2016-01-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelization of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelization of the model-based iterative reconstruction algorithm DIRA, with the aim of significantly shortening the code's execution time. Selected routines were parallelized using the OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelization of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelization with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures; the resulting speedup was substantially lower than the theoretical peak performance of the GPU, and the cause is explained. (authors)

  19. Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

    Science.gov (United States)

    Tam, Wing-Kin; Yang, Zhi

    2018-05-01

    Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
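
    For scale, the simplest member of the toolbox's algorithm family fits in a few lines on the CPU. A minimal threshold spike detector (a generic sketch of ours; NPE's EC-PC detection and its GPU kernels are far more involved):

        import numpy as np

        def detect_spikes(trace, k=4.5):
            # Robust noise estimate via the median absolute deviation,
            # a common convention; threshold at k * sigma.
            sigma = np.median(np.abs(trace)) / 0.6745
            below = trace < -k * sigma  # negative-going spikes
            return np.flatnonzero(below[1:] & ~below[:-1]) + 1

        rng = np.random.default_rng(0)
        x = rng.normal(0, 1, 10_000)
        x[[1000, 5000, 9000]] -= 12.0  # three synthetic spikes
        print(detect_spikes(x))        # -> [1000 5000 9000]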

  20. Searching for globally optimal functional forms for interatomic potentials using genetic programming with parallel tempering.

    Science.gov (United States)

    Slepoy, A; Peters, M D; Thompson, A P

    2007-11-30

    Molecular dynamics and other molecular simulation methods rely on a potential energy function, based only on the relative coordinates of the atomic nuclei. Such a function, called a force field, approximately represents the electronic structure interactions of a condensed matter system. Developing such approximate functions and fitting their parameters remains an arduous, time-consuming process, relying on expert physical intuition. To address this problem, a functional programming methodology was developed that may enable automated discovery of entirely new force-field functional forms, while simultaneously fitting parameter values. The method uses a combination of genetic programming, Metropolis Monte Carlo importance sampling and parallel tempering, to efficiently search a large space of candidate functional forms and parameters. The methodology was tested using a nontrivial problem with a well-defined globally optimal solution: a small set of atomic configurations was generated and the energy of each configuration was calculated using the Lennard-Jones pair potential. Starting with a population of random functions, our fully automated, massively parallel implementation of the method reproducibly discovered the original Lennard-Jones pair potential by searching for several hours on 100 processors, sampling only a minuscule portion of the total search space. This result indicates that, with further improvement, the method may be suitable for unsupervised development of more accurate force fields with completely new functional forms. Copyright (c) 2007 Wiley Periodicals, Inc.
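
    A stripped-down version of the paper's test problem, fitting only the parameters rather than evolving the functional form (so this is the easy half of what the genetic programming does): reference energies are generated from a known Lennard-Jones potential, and a naive random search recovers epsilon and sigma.

        import numpy as np

        def lj(r, eps, sig):
            return 4 * eps * ((sig / r) ** 12 - (sig / r) ** 6)

        rng = np.random.default_rng(1)
        r = rng.uniform(0.9, 2.5, size=200)  # pair distances (reduced units)
        target = lj(r, eps=1.0, sig=1.0)     # "reference" energies

        best, best_err = None, np.inf
        for _ in range(20_000):              # naive random search
            eps, sig = rng.uniform(0.1, 2.0, size=2)
            err = np.mean((lj(r, eps, sig) - target) ** 2)
            if err < best_err:
                best, best_err = (eps, sig), err
        print(best, best_err)                # -> close to (1.0, 1.0)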

  1. Chattering-Free Neuro-Sliding Mode Control of 2-DOF Planar Parallel Manipulators

    Directory of Open Access Journals (Sweden)

    Tien Dung Le

    2013-01-01

    This paper proposes a novel chattering-free neuro-sliding mode controller for the trajectory tracking control of two-degrees-of-freedom (DOF) parallel manipulators, which have a complicated dynamic model including modelling uncertainties, frictional uncertainties and external disturbances. A feedforward neural network (NN) is combined with an error estimator to completely compensate the large nonlinear uncertainties and external disturbances of the parallel manipulators. The online weight-tuning algorithms of the NN and the structure of the error estimator are derived with a strict theoretical stability proof based on the Lyapunov theorem. Neither the upper bound of the uncertainties nor the upper bound of the approximation errors is required to be known in advance in order to guarantee the stability of the closed-loop system. Simulation results show the effectiveness of the proposed control strategy for the tracking control of a 2-DOF parallel manipulator: the controller is chattering-free, achieves very small tracking errors, and is robust against uncertainties and external disturbances.

  2. Experience with a clustered parallel reduction machine

    NARCIS (Netherlands)

    Beemster, M.; Hartel, Pieter H.; Hertzberger, L.O.; Hofman, R.F.H.; Langendoen, K.G.; Li, L.L.; Milikowski, R.; Vree, W.G.; Barendregt, H.P.; Mulder, J.C.

    A clustered architecture has been designed to exploit divide and conquer parallelism in functional programs. The programming methodology developed for the machine is based on explicit annotations and program transformations. It has been successfully applied to a number of algorithms resulting in a

  3. SC tuning fork

    CERN Document Server

The tuning fork used to modulate the radiofrequency system of the synchrocyclotron (SC) from 1957 to 1973. This piece is an unused spare part. The SC was the first accelerator built at CERN. It operated from August 1957 until it was closed down at the end of 1990. In the SC the magnetic field did not change with time, and the particles were accelerated in successive pulses by a radiofrequency voltage of some 20 kV which varied in frequency as they spiralled outwards towards the extraction radius. The frequency varied from 30 MHz to about 17 MHz in each pulse. The tuning fork vibrated at 55 Hz in vacuum in an enclosure which formed a variable capacitor in the tuning circuit of the RF system, allowing the RF to vary over the appropriate range to accelerate protons from the centre of the machine up to 600 MeV at the extraction radius. In operation the tips of the tuning fork blade had an amplitude of movement of over 1 cm. The SC accelerator underwent extensive improvements from 1973 to 1975, including the installation of a...

  4. A high-speed linear algebra library with automatic parallelism

    Science.gov (United States)

    Boucher, Michael L.

    1994-01-01

    Parallel or distributed processing is key to getting the highest performance from workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely limited even though there are numerous computationally demanding programs that would significantly benefit from parallel processing. This paper describes DSSLIB, a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.

  5. Design and construction of a novel 1H/19F double-tuned coil system using PIN-diode switches at 9.4T.

    Science.gov (United States)

    Choi, Chang-Hoon; Hong, Suk-Min; Ha, YongHyun; Shah, N Jon

    2017-06-01

    A double-tuned 1H/19F coil using PIN-diode switches was developed and its performance evaluated. The key difference from previous designs is that a PIN-diode switch was placed in series with an additionally inserted inductor, in parallel with one of the capacitors on the loop. The probe was tuned to 19F when the reverse bias voltage was applied (PIN-diode OFF), and switched to 1H when forward current was flowing (PIN-diode ON). S-parameters and Q-factors of single- and double-tuned coils were examined and compared with and without a phantom on the bench. Imaging experiments were carried out on a 9.4T preclinical scanner. All coils tuned to their resonance frequencies and matched well. It is shown that the Q-ratio and SNR of the double-tuned coil at the 19F frequency are nearly as good as those of a single-tuned coil. Since the operating frequency was tuned to 19F when the PIN-diodes were turned off, losses due to the PIN-diodes were substantially lower, resulting in excellent image quality for X-nuclei. Copyright © 2017 Elsevier Inc. All rights reserved.
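
    The two frequencies the switch toggles between follow from the Larmor relation (frequency proportional to B0, with 42.577 MHz/T for 1H and 40.078 MHz/T for 19F), and the capacitance needed to resonate the loop at each follows from f = 1/(2π√(LC)). A small worked example; the 50 nH coil inductance is a made-up illustrative value, not from the paper.

        from math import pi

        B0 = 9.4             # tesla
        f_h = 42.577e6 * B0  # 1H Larmor frequency: ~400.2 MHz
        f_f = 40.078e6 * B0  # 19F Larmor frequency: ~376.7 MHz
        L = 50e-9            # assumed coil inductance, henry

        def c_for(f, L):
            # Resonance condition: f = 1 / (2*pi*sqrt(L*C)).
            return 1.0 / ((2 * pi * f) ** 2 * L)

        print(f_h / 1e6, f_f / 1e6)                # MHz
        print(c_for(f_h, L) * 1e12, "pF for 1H")   # ~3.2 pF
        print(c_for(f_f, L) * 1e12, "pF for 19F")  # ~3.6 pF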

  6. Drainage network extraction from a high-resolution DEM using parallel programming in the .NET Framework

    Science.gov (United States)

    Du, Chao; Ye, Aizhong; Gan, Yanjun; You, Jinjun; Duan, Qinyun; Ma, Feng; Hou, Jingwen

    2017-12-01

    High-resolution Digital Elevation Models (DEMs) can be used to extract high-accuracy drainage networks. A higher resolution, however, means a larger number of grid cells, and as the number of cells increases, the flow-direction determination requires substantial computer resources and computing time. Parallel computing is a feasible method with which to resolve this problem. In this paper, we propose a parallel programming method within the .NET Framework with a C# compiler in a Windows environment. The basin is divided into sub-basins, and the different sub-basins then run on multiple threads concurrently to calculate flow directions. The method was applied to calculate the flow direction of the Yellow River basin from the 3 arc-second resolution SRTM DEM. Drainage networks were extracted and compared with the HydroSHEDS river network to assess their accuracy. The results demonstrate that this method can calculate flow directions from high-resolution DEMs efficiently and extract high-precision continuous drainage networks.
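
    The per-cell kernel being parallelized is the classic D8 rule: each cell drains towards the steepest-descent neighbour among its eight. A Python sketch computing D8 directions on row tiles concurrently (the row tiles and multiprocessing here are illustrative assumptions; the record splits by sub-basin on .NET threads):

        import numpy as np
        from concurrent.futures import ProcessPoolExecutor

        OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]

        def d8_tile(args):
            dem, r0, r1 = args
            out = np.full((r1 - r0, dem.shape[1]), -1, dtype=int)
            for i in range(max(r0, 1), min(r1, dem.shape[0] - 1)):
                for j in range(1, dem.shape[1] - 1):
                    drops = [dem[i, j] - dem[i + di, j + dj]
                             for di, dj in OFFSETS]
                    k = int(np.argmax(drops))
                    out[i - r0, j] = k if drops[k] > 0 else -1  # -1: pit/flat
            return out

        if __name__ == "__main__":
            dem = np.random.rand(400, 400)
            rows = np.linspace(0, 400, 5, dtype=int)
            tiles = [(dem, lo, hi) for lo, hi in zip(rows[:-1], rows[1:])]
            with ProcessPoolExecutor() as ex:
                flow = np.vstack(list(ex.map(d8_tile, tiles)))
            print(flow.shape)

    Each tile here receives the whole DEM for simplicity; a real implementation would ship only the tile plus a one-cell halo to each worker.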

  7. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    Directory of Open Access Journals (Sweden)

    Matthew O'keefe

    1995-01-01

    Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines, by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  8. Combining Compile-Time and Run-Time Parallelization

    Directory of Open Access Journals (Sweden)

    Sungdo Moon

    1999-01-01

    Full Text Available This paper demonstrates that significant improvements to automatic parallelization technology require that existing systems be extended in two ways: (1) they must combine high-quality compile-time analysis with low-cost run-time testing; and (2) they must take control flow into account during analysis. We support this claim with the results of an experiment that measures the safety of parallelization at run time for loops left unparallelized by the Stanford SUIF compiler's automatic parallelization system. We present results of measurements on programs from two benchmark suites - SPECFP95 and NAS sample benchmarks - which identify inherently parallel loops in these programs that are missed by the compiler. We characterize remaining parallelization opportunities, and find that most of the loops require run-time testing, analysis of control flow, or some combination of the two. We present a new compile-time analysis technique that can be used to parallelize most of these remaining loops. This technique is designed to not only improve the results of compile-time parallelization, but also to produce low-cost, directed run-time tests that allow the system to defer binding of parallelization until run-time when safety cannot be proven statically. We call this approach predicated array data-flow analysis. We augment array data-flow analysis, which the compiler uses to identify independent and privatizable arrays, by associating predicates with array data-flow values. Predicated array data-flow analysis allows the compiler to derive "optimistic" data-flow values guarded by predicates; these predicates can be used to derive a run-time test guaranteeing the safety of parallelization.
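
    A minimal C++/OpenMP sketch of the deferred-binding idea (illustrative only, not the SUIF implementation): the compiler's run-time test reduces here to a cheap no-overlap predicate on the arrays, and the parallel version is bound only when the predicate proves independence:

        #include <cstddef>

        // Illustrative run-time predicate: the loop below is independent when the
        // destination and source regions do not overlap.
        static bool disjoint(const double* a, const double* b, std::size_t n) {
            return a + n <= b || b + n <= a;
        }

        // The compiler would emit both versions and defer the binding: the cheap
        // predicate selects the parallel loop only when safety is proven.
        void scaled_copy(double* dst, const double* src, std::size_t n, double s) {
            if (disjoint(dst, src, n)) {
                #pragma omp parallel for
                for (long i = 0; i < (long)n; ++i)
                    dst[i] = s * src[i];
            } else {
                for (std::size_t i = 0; i < n; ++i)  // safe sequential fallback
                    dst[i] = s * src[i];
            }
        }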

  9. A CS1 pedagogical approach to parallel thinking

    Science.gov (United States)

    Rague, Brian William

    Almost all collegiate programs in Computer Science offer an introductory course in programming primarily devoted to communicating the foundational principles of software design and development. The ACM designates this introduction to computer programming course for first-year students as CS1, during which methodologies for solving problems within a discrete computational context are presented. Logical thinking is highlighted, guided primarily by a sequential approach to algorithm development and made manifest by typically using the latest, commercially successful programming language. In response to the most recent developments in accessible multicore computers, instructors of these introductory classes may wish to include training on how to design workable parallel code. Novel issues arise when programming concurrent applications which can make teaching these concepts to beginning programmers a seemingly formidable task. Student comprehension of design strategies related to parallel systems should be monitored to ensure an effective classroom experience. This research investigated the feasibility of integrating parallel computing concepts into the first-year CS classroom. To quantitatively assess student comprehension of parallel computing, an experimental educational study using a two-factor mixed group design was conducted to evaluate two instructional interventions in addition to a control group: (1) topic lecture only, and (2) topic lecture with laboratory work using a software visualization Parallel Analysis Tool (PAT) specifically designed for this project. A new evaluation instrument developed for this study, the Perceptions of Parallelism Survey (PoPS), was used to measure student learning regarding parallel systems. The results from this educational study show a statistically significant main effect among the repeated measures, implying that student comprehension levels of parallel concepts as measured by the PoPS improve immediately after the delivery of

  10. Data Driven Tuning of Inventory Controllers

    DEFF Research Database (Denmark)

    Huusom, Jakob Kjøbsted; Santacoloma, Paloma Andrade; Poulsen, Niels Kjølstad

    2007-01-01

    A systematic method for criterion-based tuning of inventory controllers based on data-driven iterative feedback tuning is presented. This tuning method circumvents problems with modeling bias. The process model used for the design of the inventory control is utilized in the tuning...... as an approximation to reduce time required on experiments. The method is illustrated in an application with a multivariable inventory control implementation on a four tank system....

  11. A structured representation for parallel algorithm design on multicomputers

    International Nuclear Information System (INIS)

    Sun, Xian-He; Ni, L.M.

    1991-01-01

    Traditionally, parallel algorithms have been designed by brute force methods and fine-tuned on each architecture to achieve high performance. Rather than studying the design case by case, a systematic approach is proposed. A notation is first developed. Using this notation, most of the frequently used scientific and engineering applications can be presented by simple formulas. The formulas constitute the structured representation of the corresponding applications. The structured representation is simple, adequate and easy to understand. It also contains sufficient information about uneven allocation and communication latency degradations. With the structured representation, applications can be compared, classified and partitioned. Some of the basic building blocks, called computation models, of frequently used applications are identified and studied. Most applications are combinations of some computation models. The structured representation relates general applications to computation models. Studying computation models leads to a guideline for efficient parallel algorithm design for general applications. 6 refs., 7 figs

  12. Finite element electromagnetic field computation on the Sequent Symmetry 81 parallel computer

    International Nuclear Information System (INIS)

    Ratnajeevan, S.; Hoole, H.

    1990-01-01

    Finite element field analysis algorithms lend themselves to parallelization, and this fact is exploited in this paper to implement a finite element analysis program for electromagnetic field computation on the Sequent Symmetry 81 parallel computer with three processors. In terms of waiting time, the maximum gains are to be made in matrix solution, and therefore this paper concentrates on the gains from parallelizing the solution part of finite element analysis. An outline of how parallelization could be exploited in most finite element operations is given, although the actual implementation of parallelism on the Sequent Symmetry 81 was in the sparsity computation, matrix assembly and matrix solution areas. In all cases, the algorithms were modified to suit the parallel programming application rather than allowing the compiler to parallelize existing algorithms

  13. Transverse betatron tune measurements

    International Nuclear Information System (INIS)

    Serio, M.

    1989-01-01

    In this paper the concept of the betatron tune and the techniques to measure it are discussed. The smooth approximation is introduced along with the terminology of betatron oscillations, phase advance and tune. Single particle and beam spectra in the presence of synchro-betatron oscillations are treated with emphasis on the consequences of sampling the beam position. After a general presentation of various kinds of beam position monitors and transverse kickers, the time domain and frequency domain analysis of the beam response to a transverse excitation are discussed and several methods and applications of the tune measurements are listed

  14. The ATLAS Monte Carlo tuning system

    CERN Document Server

    Wahrmund, S; The ATLAS collaboration

    2011-01-01

    The ATLAS experiment moved the tuning of the underlying event and minimum bias event shape modeling, previously done in a manual fashion, to the automated Professor tuning tool, employed in connection with the Rivet analysis framework, when the first corresponding experimental analysis from LHC became available. The tuning effort for the Pythia 8 generator, which includes improved models for diffraction, has been started in this automated way in ATLAS, with the aim of getting a good description of the pile-up generated by multiple minimum bias interactions. The first results for these Pythia 8 tunes are presented, including a study of tunes for various PDFs.

  15. Oracle SQL tuning with Oracle SQLTXPLAIN

    CERN Document Server

    Charalambides, Stelios

    2013-01-01

    Oracle SQL Tuning with SQLTXPLAIN is a practical guide to SQL tuning the way Oracle's own experts do it, using a freely downloadable tool called SQLTXPLAIN. Using this simple tool you'll learn how to tune even the most complex SQL, and you'll learn to do it quickly, without the huge learning curve usually associated with tuning as a whole.  Firmly based in real world problems, this book helps you reclaim system resources and avoid the most common bottleneck in overall performance, badly tuned SQL.  You'll learn how the optimizer works, how to take advantage of its latest features, and when it'

  16. A pilot Tuning Project-based national study on recently graduated medical students' self-assessment of competences - the TEST study

    OpenAIRE

    Grilo Diogo, Pedro; Barbosa, Joselina; Amélia Ferreira, Maria

    2015-01-01

    Background The Tuning Project is an initiative funded by the European Commission that developed core competences for primary medical degrees in Europe. Students' grouped self-assessments are used for program evaluation and improvement of curricula. The TEST study aimed to assess how Portuguese medical graduates self-assess their acquisition of core competences and experiences of contact with patients in core settings according to the Tuning framework. Methods Translation of the Tuning's co...

  17. Discrete Hadamard transformation algorithm's parallelism analysis and achievement

    Science.gov (United States)

    Hu, Hui

    2009-07-01

    The Discrete Hadamard Transformation (DHT) is widely used in real-time signal processing, but the operation speed of a single DSP is limited. This article investigates the parallelization of the DHT and analyzes its parallel performance. Based on the programming structure of the multiprocessor platform TMS320C80, two kinds of parallel DHT algorithms were implemented. Several experiments demonstrated the effectiveness of the proposed algorithms.
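
    The article's platform is the TMS320C80; as a generic illustration of where the DHT's parallelism lies (assuming the standard in-place fast Walsh-Hadamard butterfly formulation, not the article's code), the butterfly blocks within one stage are independent and can be split across threads, here with OpenMP in C++:

        #include <vector>

        // In-place fast Walsh-Hadamard transform; x.size() must be a power of two.
        // All butterfly blocks within one stage are independent, so the block loop
        // is shared across OpenMP threads; the stages themselves stay sequential.
        void fwht(std::vector<double>& x) {
            const long long n = (long long)x.size();
            for (long long len = 1; len < n; len <<= 1) {
                #pragma omp parallel for
                for (long long i = 0; i < n; i += 2 * len) {
                    for (long long j = i; j < i + len; ++j) {
                        double a = x[j], b = x[j + len];
                        x[j]       = a + b;
                        x[j + len] = a - b;
                    }
                }
            }
        }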

  18. Is Monte Carlo embarrassingly parallel?

    Energy Technology Data Exchange (ETDEWEB)

    Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)

    2012-07-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
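
    A stripped-down C++/MPI sketch of the synchronization pattern analyzed above (illustrative, not the paper's program): each rank transports its share of histories independently, and the per-cycle MPI_Allreduce is the rendezvous point where the fission source total and the k-eff estimate are formed before the next cycle can start:

        #include <mpi.h>
        #include <cstdio>
        #include <cstdlib>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            std::srand(1234u + (unsigned)rank);

            const int cycles = 10, histories = 100000;
            double local_source = 1000.0 / size;  // this rank's share of the source weight

            for (int c = 0; c < cycles; ++c) {
                // "Transport" phase: embarrassingly parallel, no communication at all.
                double local_produced = 0.0;
                for (int h = 0; h < histories; ++h)
                    local_produced += 2.0 * std::rand() / RAND_MAX * local_source / histories;

                // Rendezvous: every rank must stop here once per cycle; this is the
                // synchronization point whose cost the paper analyzes.
                double produced = 0.0, source = 0.0;
                MPI_Allreduce(&local_produced, &produced, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
                MPI_Allreduce(&local_source, &source, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

                double keff = produced / source;          // cycle estimate of k-eff
                local_source = produced / (keff * size);  // population control: renormalize
                if (rank == 0) std::printf("cycle %d: keff = %f\n", c, keff);
            }
            MPI_Finalize();
            return 0;
        }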

  19. Is Monte Carlo embarrassingly parallel?

    International Nuclear Information System (INIS)

    Hoogenboom, J. E.

    2012-01-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Other time losses in the parallel calculation are also identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  20. A program system for ab initio MO calculations on vector and parallel processing machines. Pt. 3

    International Nuclear Information System (INIS)

    Wiest, R.; Demuynck, J.; Benard, M.; Rohmer, M.M.; Ernenwein, R.

    1991-01-01

    This series of three papers presents a program system for ab initio molecular orbital calculations on vector and parallel computers. Part III is devoted to the four-index transformation, on a molecular orbital basis of size NMO, of the file of two-electron integrals (pq||rs) generated by a contracted Gaussian set of size NATO (number of atomic orbitals). A fast Yoshimine algorithm first sorts the (pq||rs) integrals with respect to index pq only. This file of half-sorted integrals labelled by their rs-index can be processed without further modification to generate either the transformed integrals or the supermatrix elements. The large memory available on the CRAY-2 has made it possible to implement the transformation algorithm proposed by Bender in 1972, which requires a core-storage allocation varying as (NATO)^3. Two versions of Bender's algorithm are included in the present program. The first version is an in-core version, where the complete file of accumulated contributions to transformed integrals is stored and updated in central memory. This version has been parallelized by distributing over a limited number of logical tasks the NATO steps corresponding to the scanning of the most external loop. The second version is an out-of-core version, in which twin files are alternatively used as input and output for the accumulated contributions to transformed integrals. This version is not parallel. The choice of one or the other version and (for version 1) the determination of the number of tasks depend upon the balance between the available and the requested amounts of storage. The storage management and the choice of the proper version are carried out automatically using dynamic storage allocation. Both versions are vectorized and take advantage of the molecular symmetry. (orig.)

  1. Parallelization of elliptic solver for solving 1D Boussinesq model

    Science.gov (United States)

    Tarwidi, D.; Adytia, D.

    2018-03-01

    In this paper, a parallel implementation of an elliptic solver for the 1D Boussinesq model is presented. The numerical solution of the Boussinesq model is obtained by implementing a staggered grid scheme for the continuity, momentum, and elliptic equations of the model. The tridiagonal system emerging from the numerical scheme of the elliptic equation is solved by the cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of the parallel program, the number of grids is varied from 2^8 to 2^14. Two test cases, i.e. propagation of a solitary and a standing wave, are proposed to evaluate the parallel program. The numerical results are verified against the analytical solutions of the solitary and standing waves. The best speedups for the solitary and standing wave test cases are about 2.07 with 2^14 grids and 1.86 with 2^13 grids, respectively, both executed using 8 threads. Moreover, the best efficiencies of the parallel program are 76.2% and 73.5% for the solitary and standing wave test cases, respectively.
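
    The paper parallelizes cyclic reduction with OpenMP; a minimal C++ sketch of one forward-reduction level for the tridiagonal system a[i]x[i-1] + b[i]x[i] + c[i]x[i+1] = d[i] (the indexing conventions are assumptions, not the authors' code) shows why the row updates within a level can be distributed across threads:

        #include <vector>

        // One forward-reduction level of cyclic reduction. Rows at odd multiples of
        // 'stride' are folded into their neighbours at i - stride and i + stride;
        // all updates in the loop are mutually independent, hence the parallel for.
        void reduce_level(std::vector<double>& a, std::vector<double>& b,
                          std::vector<double>& c, std::vector<double>& d, long stride) {
            const long n = (long)b.size();
            #pragma omp parallel for
            for (long i = 2 * stride - 1; i < n; i += 2 * stride) {
                long lo = i - stride, hi = i + stride;
                double alpha = a[i] / b[lo];
                double gamma = (hi < n) ? c[i] / b[hi] : 0.0;
                a[i] = -alpha * a[lo];
                c[i] = (hi < n) ? -gamma * c[hi] : 0.0;
                b[i] -= alpha * c[lo] + ((hi < n) ? gamma * a[hi] : 0.0);
                d[i] -= alpha * d[lo] + ((hi < n) ? gamma * d[hi] : 0.0);
            }
        }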

  2. Tuning magnet power supply

    International Nuclear Information System (INIS)

    Han, B.M.; Karady, G.G.; Thiessen, H.A.

    1989-01-01

    The particles in a Rapid Cycling Accelerator are accelerated by rf cavities, which are tuned by dc biased ferrite cores. The tuning is achieved by regulation of the bias current, which is produced by a power supply. The tuning magnet power supply utilizes a bridge circuit, supplied by a three-phase rectifier. During the rise of the current, when the particles are accelerated, the current is controlled with precision by the bridge, which operates as a power amplifier. During the fall of the current, the bridge operates in a switching mode and recovers the energy stored in the ferrites. The recovered energy is stored in a capacitor bank. The bridge circuit is built with 150 power transistors. The drive, protection and control circuits were designed and built from commercial components. The system will be used for an rf cavity experiment at Los Alamos and will serve as a prototype tuning power supply for future accelerators. 1 ref., 7 figs

  3. Thermodynamically Tuned Nanophase Materials for reversible Hydrogen storage

    Energy Technology Data Exchange (ETDEWEB)

    Ping Liu; John J. Vajo

    2010-02-28

    This program was devoted to significantly extending the limits of hydrogen storage technology for practical transportation applications. To meet the hydrogen capacity goals set forth by the DOE, solid-state materials consisting of light elements were developed. Many light element compounds are known that have high capacities. However, most of these materials are thermodynamically too stable, and they release and store hydrogen much too slowly for practical use. In this project we developed new light element chemical systems that have high hydrogen capacities while also having suitable thermodynamic properties. In addition, we developed methods for increasing the rates of hydrogen exchange in these new materials. The program has significantly advanced (1) the application of combined hydride systems for tuning thermodynamic properties and (2) the use of nanoengineering for improving hydrogen exchange. For example, we found that our strategy for thermodynamic tuning allows both entropy and enthalpy to be favorably adjusted. In addition, we demonstrated that using porous supports as scaffolds to confine hydride materials to nanoscale dimensions could improve rates of hydrogen exchange by > 50x. Although a hydrogen storage material meeting the requirements for commercial development was not achieved, this program has provided a foundation and direction for future efforts. More broadly, nanoconfinement using scaffolds has application in other energy storage technologies, including batteries and supercapacitors. The overall goal of this program was to develop a safe and cost-effective nanostructured light-element hydride material that overcomes the thermodynamic and kinetic barriers to hydrogen reaction and diffusion in current materials and thereby achieve > 6 weight percent hydrogen capacity at temperatures and equilibrium pressures consistent with DOE target values.

  4. Tuning Recurrent Neural Networks for Recognizing Handwritten Arabic Words

    KAUST Repository

    Qaralleh, Esam

    2013-10-01

    Artificial neural networks have the ability to learn by example and are capable of solving problems that are hard to solve using ordinary rule-based programming. They have many design parameters that affect their performance, such as the number and sizes of the hidden layers. Large sizes are slow and small sizes are generally not accurate. Tuning the neural network size is a hard task because the design space is often large and training is often a long process. We use design-of-experiments techniques to tune the recurrent neural network used in an Arabic handwriting recognition system. We show that the best results are achieved with three hidden layers and two subsampling layers. To tune the sizes of these five layers, we use a fractional factorial experiment design to limit the number of experiments to a feasible number. Moreover, we replicate the experiment configuration multiple times to overcome the randomness in the training process. The accuracy and time measurements are analyzed and modeled. The two models are then used to locate network sizes that are on the Pareto optimal frontier. The approach described in this paper reduces the label error from 26.2% to 19.8%.

  5. An improved design of virtual output impedance loop for droop-controlled parallel three-phase Voltage Source Inverters

    DEFF Research Database (Denmark)

    Wang, Xiongfei; Blaabjerg, Frede; Chen, Zhe

    2012-01-01

    The virtual output impedance loop is known as an effective way to enhance the load sharing stability and quality of droop-controlled parallel inverters. This paper proposes an improved design of the virtual output impedance loop for parallel three-phase voltage source inverters. In the approach, a negative-sequence virtual resistance is provided even in the case of feeding a balanced three-phase load. Furthermore, to adapt to the variety of unbalanced loads, a dynamically-tuned negative-sequence resistance loop is designed, such that a good compromise between the quality of inverter output voltage and the performance of load sharing can be obtained. Finally, laboratory test results of two parallel three-phase voltage source inverters are shown to confirm the validity of the proposed method.

  6. Vectorization, parallelization and porting of nuclear codes (vectorization and parallelization). Progress report fiscal 1998

    International Nuclear Information System (INIS)

    Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi

    2000-03-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system, the AP3000 system and the Paragon system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part and the porting part. In this report, we describe the vectorization and parallelization on vector processors. In this part, the vectorization of the General Tokamak Circuit Simulation Program code GTCSP, and the vectorization and parallelization of the Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, the Eddy Current Analysis code EDDYCAL, the Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and the MHD Equilibrium code SELENEJ on the VPP500 are described. In the parallelization on scalar processors part, the parallelization of the Monte Carlo N-Particle Transport code MCNP4B2, the Plasma Hydrodynamics code using the Cubic Interpolated Propagation Method PHCIP and the Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon are described. In the porting part, the porting of the Monte Carlo N-Particle Transport code MCNP4B2 and the Reactor Safety Analysis code RELAP5 onto the AP3000 are described. (author)

  7. Fabrication and Final Field Tuning of Copper Cavity Models for a High-Current SRF ERL at 703.75 MHz

    CERN Document Server

    Cole, Michael; Burger, Al; Falletta, Michael; Holmes, Douglas; Peterson, Ed; Wong, Robert

    2005-01-01

    Advanced Energy Systems is currently under contract to BNL to fabricate a five-cell superconducting cavity and cryomodule for the RHIC eCooler SRF Energy Recovery Linac (ERL) program.* The cavity is designed and optimized for ampere-class SRF ERL service. As part of this program, we have fabricated two low-power copper models of the RF cavities. During the fabrication process a series of frequency measurements were made and compared to the frequency expected at that point in the fabrication process. Where possible, the cavity was modified either before or during the next fabrication step to tune the cavity frequency toward the target frequency. Following completion of the cavities they were tuned for field flatness and frequency. This paper will review the measurements made and the frequency tuning performed, and discuss discrepancies between the expected and measured results. We will also review the as-fabricated field profiles and the results of the tuning steps. Further, the cost and benefits of extensive in pro...

  8. Massively parallel sparse matrix function calculations with NTPoly

    Science.gov (United States)

    Dawson, William; Nakajima, Takahito

    2018-04-01

    We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well-developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization-free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication-avoiding sparse matrix multiplication algorithm. OpenMP task parallelization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
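
    NTPoly itself is a Fortran library operating on distributed sparse matrices; a dense toy version of the underlying idea, evaluating a matrix polynomial p(A) by Horner's rule in C++ (all names here are illustrative assumptions, not NTPoly's API), looks like this:

        #include <vector>

        using Matrix = std::vector<std::vector<double>>;

        // Dense helper, purely illustrative; NTPoly instead uses a distributed,
        // communication-avoiding sparse multiply.
        Matrix multiply(const Matrix& A, const Matrix& B) {
            const std::size_t n = A.size();
            Matrix C(n, std::vector<double>(n, 0.0));
            for (std::size_t i = 0; i < n; ++i)
                for (std::size_t k = 0; k < n; ++k)
                    for (std::size_t j = 0; j < n; ++j)
                        C[i][j] += A[i][k] * B[k][j];
            return C;
        }

        // Horner evaluation of p(A) = c[0] I + c[1] A + ... + c[m] A^m;
        // c must be non-empty. Each step performs P = P * A + c[k] I.
        Matrix polyEval(const Matrix& A, const std::vector<double>& c) {
            const std::size_t n = A.size();
            Matrix P(n, std::vector<double>(n, 0.0));
            for (std::size_t i = 0; i < n; ++i) P[i][i] = c.back();
            for (std::size_t k = c.size() - 1; k-- > 0; ) {
                P = multiply(P, A);
                for (std::size_t i = 0; i < n; ++i) P[i][i] += c[k];
            }
            return P;
        }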

  9. An efficient implementation of parallel molecular dynamics method on SMP cluster architecture

    International Nuclear Information System (INIS)

    Suzuki, Masaaki; Okuda, Hiroshi; Yagawa, Genki

    2003-01-01

    The authors have applied the MPI/OpenMP hybrid parallel programming model to parallelize a molecular dynamics (MD) method on a symmetric multiprocessor (SMP) cluster architecture. On that architecture, the hybrid parallel programming model, which uses a message passing library such as MPI for inter-SMP node communication and loop directives such as OpenMP for intra-SMP node parallelization, can be expected to be the most effective one. In this study, the parallel performance of the hybrid style has been compared with that of the conventional flat parallel programming style, which uses only MPI, both in the case where the fast multipole method (FMM) is employed for computing long-range interactions and the case where it is not. The computing environment used here is the Hitachi SR8000/MPP at the University of Tokyo. The results are as follows. Without FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 90% with the hybrid style and 75% with the flat-MPI style for an MD simulation with 33,402 atoms. With FMM, the parallel efficiency using 16 SMP nodes (128 PEs) is 60% with the hybrid style and 48% with the flat-MPI style for an MD simulation with 117,649 atoms. (author)
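
    A skeletal C++ sketch of the hybrid pattern compared above (structure only; the toy pair force and problem sizes are placeholders, not the authors' MD code): MPI ranks map to SMP nodes and communicate at the end of a step, while OpenMP threads share each rank's particle loop:

        #include <mpi.h>
        #include <vector>
        #include <cstdio>

        int main(int argc, char** argv) {
            // FUNNELED: only the main thread makes MPI calls, the usual hybrid setup.
            int provided;
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int nLocal = 10000;  // particles owned by this rank
            std::vector<double> x(nLocal, rank), f(nLocal, 0.0);

            // Intra-node parallelism: the O(N^2) force loop is shared by OpenMP threads.
            #pragma omp parallel for schedule(static)
            for (int i = 0; i < nLocal; ++i) {
                double fi = 0.0;
                for (int j = 0; j < nLocal; ++j)
                    if (j != i) fi += 1.0 / (1.0 + (x[i] - x[j]) * (x[i] - x[j])); // toy pair force
                f[i] = fi;
            }

            // Inter-node communication: reduce a global diagnostic across SMP nodes.
            double eLocal = 0.0, eGlobal = 0.0;
            for (int i = 0; i < nLocal; ++i) eLocal += f[i];
            MPI_Allreduce(&eLocal, &eGlobal, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

            if (rank == 0) std::printf("global diagnostic = %g on %d ranks\n", eGlobal, size);
            MPI_Finalize();
            return 0;
        }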

  10. ATLAS Run 1 Pythia8 tunes

    CERN Document Server

    The ATLAS collaboration

    2014-01-01

    We present tunes of the Pythia8 Monte Carlo event generator's parton shower and multiple parton interaction parameters to a range of data observables from ATLAS Run 1. Four new tunes have been constructed, corresponding to the four leading-order parton density functions, CTEQ6L1, MSTW2008LO, NNPDF23LO, and HERAPDF15LO, each simultaneously tuning ten generator parameters. A set of systematic variations is provided for the NNPDF tune, based on the eigentune method. These tunes improve the modeling of observables that can be described by leading-order + parton shower simulation, and are primarily intended for use in situations where next-to-leading-order and/or multileg parton-showered simulations are unavailable or impractical.

  11. An adaptive tuned mass damper based on the emulation of positive and negative stiffness with an MR damper

    International Nuclear Information System (INIS)

    Weber, F; Boston, C; Maślanka, M

    2011-01-01

    This paper presents a new adaptive tuned mass damper (TMD) whose stiffness and damping can be tuned in real time to changing frequencies of a target structure. The adaptive TMD consists of a tuned mass, a tuned passive spring and a magnetorheological (MR) damper. The MR damper is used to emulate controlled friction–viscous damping and controlled stiffness. The controlled positive or negative stiffness emulated by the MR damper works in parallel to the stiffness of the passive TMD spring. The resulting overall TMD stiffness can therefore be varied around the passive spring stiffness using the MR damper. Both the emulated stiffness and the friction–viscous damping in the MR damper are controlled such that the resulting overall TMD stiffness and damping are adjusted according to Den Hartog's formulae. Simulations demonstrate that the adaptive TMD with a controlled MR damper provides the same reduction of steady-state vibration amplitudes in the target structure as a passive TMD if the target structure vibrates at the nominal frequency. However, if the target structure vibrates at different frequencies, e.g. due to changed service loads, the adaptive TMD with a controlled MR damper outperforms the passive TMD by up to several hundred percent, depending on the frequency change
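
    For reference, the Den Hartog tuning rules mentioned above take the following classical textbook form for an undamped primary structure, with mass ratio \mu = m_{TMD}/m_{structure} (standard notation, not taken from this paper):

        f_{\mathrm{opt}} = \frac{\omega_{\mathrm{TMD}}}{\omega_{\mathrm{structure}}} = \frac{1}{1+\mu},
        \qquad
        \zeta_{\mathrm{opt}} = \sqrt{\frac{3\mu}{8\,(1+\mu)^{3}}}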

  12. Application of Coherent Tune Shift Measurements to the Characterization of Electron Cloud Growth

    International Nuclear Information System (INIS)

    Kreinick, D.L.; Crittenden, J.A.; Dugan, G.; Holtzapple, R.L.; Randazzo, M.; Furman, M.A.; Venturini, M.; Palmer, M.A.; Ramirez, G.

    2011-01-01

    Measurements of coherent tune shifts at the Cornell Electron Storage Ring Test Accelerator (CesrTA) have been made for electron and positron beams under a wide variety of beam energies, bunch charge, and bunch train configurations. Comparing the observed tunes with the predictions of several electron cloud simulation programs allows the evaluation of important parameters in these models. These simulations will be used to predict the behavior of the electron cloud in damping rings for future linear colliders. We outline recent improvements to the analysis techniques that should improve the fidelity of the modeling.

  13. Expressing Parallelism with ROOT

    Energy Technology Data Exchange (ETDEWEB)

    Piparo, D. [CERN]; Tejedor, E. [CERN]; Guiraud, E. [CERN]; Ganis, G. [CERN]; Mato, P. [CERN]; Moneta, L. [CERN]; Valls Pla, X. [CERN]; Canal, P. [Fermilab]

    2017-11-22

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.

  14. Expressing Parallelism with ROOT

    Science.gov (United States)

    Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.

    2017-10-01

    The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
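
    As a taste of the implicit-parallelism interface described above, a minimal C++ sketch (the tree name "Events", the file name "data.root" and the column "pt" are invented placeholders): a single call enables ROOT's thread pool, after which the data-frame event loop runs multi-threaded:

        #include <ROOT/RDataFrame.hxx>
        #include <TROOT.h>

        int main() {
            ROOT::EnableImplicitMT();                    // let ROOT parallelize event loops
            ROOT::RDataFrame df("Events", "data.root");  // hypothetical tree/file names
            auto h = df.Filter("pt > 10").Histo1D("pt"); // lazily booked computation
            h->Print();                                  // triggers the (parallel) event loop
            return 0;
        }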

  15. Robust Self Tuning Controllers

    DEFF Research Database (Denmark)

    Poulsen, Niels Kjølstad

    1985-01-01

    The present thesis concerns robustness properties of adaptive controllers. It is addressed to methods for robustifying self tuning controllers with respect to abrupt changes in the plant parameters. In the thesis an algorithm for estimating abruptly changing parameters is presented. The estimator has several operation modes and a detector for controlling the mode. A special self tuning controller has been developed to regulate plants with changing time delay.

  16. Parallel computation for biological sequence comparison: comparing a portable model to the native model for the Intel Hypercube.

    Science.gov (United States)

    Nadkarni, P M; Miller, P L

    1991-01-01

    A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.

  17. Pre-tuning of TRISTAN superconducting RF cavities

    International Nuclear Information System (INIS)

    Tajima, Tsuyoshi; Furuya, Takaaki; Suzuki, Toshiji; Iino, Yohsuke.

    1990-01-01

    Pre-tuning of thirty-two TRISTAN superconducting cavities has been done. This paper describes the pre-tuning system and the results for all the cavities. The average field flatness was 1.4% after pre-tuning. From our experience, the following points are important: 1) the cavity should be evacuated during pre-tuning to avoid the uncertainty introduced by evacuation; 2) pre-tuning is needed after annealing, because annealing causes changes of the cell length and the field profile; 3) field flatness sometimes changes when the cavity is expanded; and 4) cells should not be expanded more than 1.5 mm after pre-tuning, since inelastic deformation occurs. (author)

  18. Comparative Evaluation and Case Studies of Shared-Memory and Data-Parallel Execution Patterns

    Directory of Open Access Journals (Sweden)

    Xiaodong Zhang

    1999-01-01

    Full Text Available Shared-memory and data-parallel programming models are two important paradigms for scientific applications. Both models provide high-level program abstractions, and simple and uniform views of network structures. The common features of the two models significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns are significantly different between the two models due to their programming constraints, and due to different and complex structures of the interconnection networks and systems which support the two models. We performed this experimental study to present implications and comparisons of execution patterns on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared-memory model on the KSR-1 and the data-parallel model on the CM-5. Our objectives are to examine the execution pattern changes required for an implementation transformation between the two models; to study memory access patterns; to address scalability issues; and to investigate relative costs and advantages/disadvantages of using the two models for scientific computations. Our results indicate that the EM program tends to become computation-intensive in the KSR-1 shared-memory system, and memory-demanding in the CM-5 data-parallel system when the systems and the problems are scaled. The EM program, a highly data-parallel program, performed extremely well, and the linear system solver, a highly control-structured program, suffered significantly in the data-parallel model on the CM-5. Our study provides further evidence that matching execution patterns of algorithms to parallel architectures would achieve better performance.

  19. How safe is tuning a radio?: using the radio tuning task as a benchmark for distracted driving.

    Science.gov (United States)

    Lee, Ja Young; Lee, John D; Bärgman, Jonas; Lee, Joonbum; Reimer, Bryan

    2018-01-01

    Drivers engage in non-driving tasks while driving, such as interacting with entertainment systems. Studies have identified glance patterns related to such interactions, and manual radio tuning has been used as a reference task to set an upper bound on the acceptable demand of interactions. Consequently, some view the risk associated with radio tuning as defining the upper limit of glance measures associated with visual-manual in-vehicle activities. However, we have little knowledge about the actual degree of crash risk that radio tuning poses and, by extension, the risk of tasks that have similar glance patterns to the radio tuning task. In the current study, we use counterfactual simulation to take the glance patterns for manual radio tuning tasks from an on-road experiment and apply these patterns to lead-vehicle events observed in naturalistic driving studies. We then quantify how often the glance patterns from radio tuning are associated with rear-end crashes, compared to driving-only situations. We used the pre-crash kinematics from 34 crash events from the SHRP2 naturalistic driving study to investigate the effect of radio tuning in crash-imminent situations, and we also investigated the effect of radio tuning on 2,475 routine braking events from the Safety Pilot project. The counterfactual simulation showed that off-road glances transform some near-crashes that could have been avoided into crashes, and glance patterns observed in the on-road radio tuning experiment produced 2.85-5.00 times more crashes than baseline driving. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. An Educational Tool for Interactive Parallel and Distributed Processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2011-01-01

    In this paper we try to describe how the Modular Interactive Tiles System (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing an educational hands-on tool that allows a change of representation of the abstract problems related to designing interactive parallel and distributed systems. Indeed, MITS seems to bring a series of goals into the education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, and user and multiuser interaction, which can hardly be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples show how to implement interactive

  1. Parallel treatment of simulation particles in particle-in-cell codes on SUPRENUM

    International Nuclear Information System (INIS)

    Seldner, D.

    1990-02-01

    This report contains the program documentation and description of the program package 2D-PLAS, which has been developed at the Nuclear Research Center Karlsruhe in the Institute for Data Processing in Technology (IDT) under the auspices of the BMFT. 2D-PLAS is a parallel version of the simulation-particle treatment of the two-dimensional stationary particle-in-cell code BFCPIC, which has been developed at the Nuclear Research Center Karlsruhe. This parallel version has been designed for the parallel computer SUPRENUM. (orig.) [de]

  2. Collectively loading programs in a multiple program multiple data environment

    Science.gov (United States)

    Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.; Miller, Samuel J.

    2016-11-08

    Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
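
    Class routes are a Blue Gene-style collective mechanism; a rough C++/MPI analogue of the load-leader idea (illustrative only, not the disclosed implementation) splits the world communicator by program identifier, elects rank 0 of each group as the leader, and broadcasts the image within the group:

        #include <mpi.h>
        #include <vector>
        #include <cstdio>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int world_rank;
            MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

            // Hypothetical: which MPMD program this node must load (in the patent,
            // taken from a load description file); derived from the rank here.
            int program_id = world_rank % 2;

            // One communicator per program, playing the role of a class route.
            MPI_Comm route;
            MPI_Comm_split(MPI_COMM_WORLD, program_id, world_rank, &route);
            int route_rank;
            MPI_Comm_rank(route, &route_rank);

            // The leader (route rank 0) would read the program image from the file
            // system; everyone else receives it via the broadcast.
            std::vector<char> image(1 << 20);
            if (route_rank == 0) image.assign(image.size(), (char)program_id);
            MPI_Bcast(image.data(), (int)image.size(), MPI_CHAR, 0, route);

            std::printf("rank %d loaded program %d\n", world_rank, program_id);
            MPI_Comm_free(&route);
            MPI_Finalize();
            return 0;
        }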

  3. Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

    Science.gov (United States)

    Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

    2010-12-01

    Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summary:
    Program Title: PROFESS
    Catalogue identifier: AEBN_v2_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 68 721
    No. of bytes in distributed program, including test data, etc.: 1 708 547
    Distribution format: tar.gz
    Programming language: Fortran 90
    Computer

  4. Nonlinear effects in varactor-tuned resonators.

    Science.gov (United States)

    Everard, Jeremy; Zhou, Liang

    2006-05-01

    This paper describes the effects of RF power level on the performance of varactor-tuned resonator circuits. A variety of topologies are considered, including series and parallel resonators operating in both unbalanced and balanced modes. As these resonators were designed to produce oscillators with minimum phase noise, the initial small-signal insertion loss was set to 6 dB and, hence, QL/Q0 = 1/2. To enable accurate analysis and simulation, S-parameter and PSPICE models for the varactors were optimized and developed. It is shown that these resonators start to exhibit nonlinear operation at very low power levels, showing saturation and a lowering of the resonant frequency. On occasion, squegging is observed for modified bias conditions. The nonlinear effects depend on the unloaded Q (Q0), the ratio of loaded to unloaded Q (QL/Q0), the bias voltage, and the circuit configuration, with typical nonlinear effects occurring at -8 dBm in a circuit with a loaded Q of 63 and a varactor bias voltage of 3 V. Analysis, simulation, and measurements that show close correlation are presented.
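
    The underlying mechanism is the bias dependence of the junction capacitance. In the standard junction form (a generic textbook relation; the built-in potential V_{bi} and grading exponent \gamma depend on the device and are not taken from this paper),

        C_j(V) = \frac{C_{j0}}{\left(1 + V/V_{bi}\right)^{\gamma}},
        \qquad
        f_0 = \frac{1}{2\pi\sqrt{L\,C_j(V)}}

    so a large RF swing raises the cycle-averaged capacitance of the convex C-V curve and pulls the resonant frequency down, consistent with the lowering the authors observe.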

  5. Betatron tune measurement

    International Nuclear Information System (INIS)

    Dinev, D.

    2001-01-01

    On the basis of a comparative review of the methods for betatron tune measurement in cyclic accelerators of the synchrotron type, these methods are examined from the point of view of their applicability to the Nuclotron. Both methods using measurement of the statistical fluctuations of the beam current (Schottky noise) and methods using coherent beam excitation are discussed. The emphasis is on the final results of importance for tune measurement practice. Signal processing is briefly discussed too

  6. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    Science.gov (United States)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less

  7. Parallel-Architecture Simulator Development Using Hardware Transactional Memory

    OpenAIRE

    Armejach Sanosa, Adrià

    2009-01-01

    To address the need for a simpler parallel programming model, Transactional Memory (TM) has been developed and promises good parallel performance with easy-to-write parallel code. Unlike lock-based approaches, with TM, programmers do not need to explicitly specify and manage the synchronization among threads. Instead, programmers simply mark code segments as transactions, and the TM system manages the concurrency control for them. TM can be implemented either in software (STM) or hardware (HT...

  8. Upgrades to PEP-II Tune Measurements

    Energy Technology Data Exchange (ETDEWEB)

    Fisher, Alan S.

    2002-07-30

    The tune monitors for the two-ring PEP-II collider convert signals from one set of four BPM-type pickup buttons per ring into horizontal and vertical differences, which are then downconverted from 952 MHz (twice the RF) to baseband. Two-channel 10-MHz FFT spectrum analyzers show spectra in X-window displays in the Control Room, to assist PEP operators. When operating with the original system near the beam-beam limit, collisions broadened and flattened the tune peaks, often bringing them near the noise floor. We recently installed new downconverters that increase the signal-to-noise ratio by about 5 dB. In addition, we went from one to two sets of pickups per ring, near focusing and defocusing quadrupoles, so that signals for both planes originate at locations with large amplitudes. We also have just installed a tune tracker, based on a digital lock-in amplifier (one per tune plane) that is controlled by an EPICS software feedback loop. The tracker monitors the phase of the beam's response to a sinusoidal excitation, and adjusts the drive frequency to track the middle of the 180-degree phase transition across the tune resonance. We plan next to test an outer loop controlling the tune quadrupoles based on this tune measurement.

  9. Upgrades to PEP-II Tune Measurements

    Energy Technology Data Exchange (ETDEWEB)

    Fisher, Alan S.

    2002-07-30

    The tune monitors for the two-ring PEP-II collider convert signals from one set of four BPM-type pickup buttons per ring into horizontal and vertical differences, which are then downconverted from 952 MHz (twice the RF) to baseband. Two-channel 10-MHz FFT spectrum analyzers show spectra in X-window displays in the Control Room, to assist PEP operators. When operating with the original system near the beam-beam limit, collisions broadened and flattened the tune peaks, often bringing them near the noise floor. We recently installed new downconverters that increase the signal-to-noise ratio by about 5 dB. In addition, we went from one to two sets of pickups per ring, near focusing and defocusing quadrupoles, so that signals for both planes originate at locations with large amplitudes. We also have just installed a tune tracker, based on a digital lock-in amplifier (one per tune plane) that is controlled by an EPICS software feedback loop. The tracker monitors the phase of the beam's response to a sinusoidal excitation, and adjusts the drive frequency to track the middle of the 180-degree phase transition across the tune resonance. We plan next to test an outer loop controlling the tune quadrupoles based on this tune measurement.

  10. Upgrades to PEP-II tune measurements

    International Nuclear Information System (INIS)

    Fisher, Alan S.; Petree, Mark; Wienands, Uli; Allison, Stephanie; Laznovsky, Michael; Seeman, Michael; Robin, Jolene

    2002-01-01

    The tune monitors for the two-ring PEP-II collider convert signals from one set of four BPM-type pickup buttons per ring into horizontal and vertical differences, which are then downconverted from 952 MHz (twice the RF) to baseband. Two-channel 10-MHz FFT spectrum analyzers show spectra in X-window displays in the Control Room, to assist PEP operators. When operating with the original system near the beam-beam limit, collisions broadened and flattened the tune peaks, often bringing them near the noise floor. We recently installed new downconverters that increase the signal-to-noise ratio by about 5 dB. In addition, we went from one to two sets of pickups per ring, near focusing and defocusing quadrupoles, so that signals for both planes originate at locations with large amplitudes. We also have just installed a tune tracker, based on a digital lock-in amplifier (one per tune plane) that is controlled by an EPICS software feedback loop. The tracker monitors the phase of the beam's response to a sinusoidal excitation, and adjusts the drive frequency to track the middle of the 180-degree phase transition across the tune resonance. We plan next to test an outer loop controlling the tune quadrupoles based on this tune measurement

  11. Automatic tuning of free electron lasers

    Energy Technology Data Exchange (ETDEWEB)

    Agapov, Ilya; Zagorodnov, Igor [Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany); Geloni, Gianluca [European XFEL, Schenefeld (Germany); Tomin, Sergey [European XFEL, Schenefeld (Germany); NRC Kurchatov Institute, Moscow (Russian Federation)

    2017-04-07

    Existing FEL facilities often suffer from stability issues: so electron orbit, transverse electron optics, electron bunch compression and other parameters have to be readjusted often to account for drifts in performance of various components. The tuning procedures typically employed in operation are often manual and lengthy. We have been developing a combination of model-free and model-based automatic tuning methods to meet the needs of present and upcoming XFEL facilities. Our approach has been implemented at FLASH to achieve automatic SASE tuning using empirical control of orbit, electron optics and bunch compression. In this paper we describe our approach to empirical tuning, the software which implements it, and the results of using it at FLASH. We also discuss the potential of using machine learning and model-based techniques in tuning methods.

  12. Automatic tuning of free electron lasers

    International Nuclear Information System (INIS)

    Agapov, Ilya; Zagorodnov, Igor; Geloni, Gianluca; Tomin, Sergey

    2017-01-01

    Existing FEL facilities often suffer from stability issues, so electron orbit, transverse electron optics, electron bunch compression and other parameters have to be readjusted often to account for drifts in the performance of various components. The tuning procedures typically employed in operation are often manual and lengthy. We have been developing a combination of model-free and model-based automatic tuning methods to meet the needs of present and upcoming XFEL facilities. Our approach has been implemented at FLASH to achieve automatic SASE tuning using empirical control of orbit, electron optics and bunch compression. In this paper we describe our approach to empirical tuning, the software which implements it, and the results of using it at FLASH. We also discuss the potential of using machine learning and model-based techniques in tuning methods.

  13. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    Directory of Open Access Journals (Sweden)

    Hari Radhakrishnan

    2015-01-01

    Full Text Available This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
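
    The binary-tree summation that removed this bottleneck can be sketched briefly. The paper's code is Fortran 2008 with coarrays and is not reproduced here; the stand-in below expresses the same communication pattern in C with MPI point-to-point calls (in practice one would simply call MPI_Allreduce, or the collective summation in newer Fortran).

      /* Binary-tree sum: at each stride, the rank at an odd multiple of the
       * stride sends its partial sum one stride down and drops out, so the
       * total arrives at rank 0 in log2(P) steps instead of P-1. */
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double local = rank + 1.0;          /* this image's partial sum */

          for (int stride = 1; stride < size; stride *= 2) {
              if (rank % (2 * stride) == stride) {        /* sender */
                  MPI_Send(&local, 1, MPI_DOUBLE, rank - stride, 0,
                           MPI_COMM_WORLD);
                  break;                                  /* done */
              } else if (rank % (2 * stride) == 0 && rank + stride < size) {
                  double other;                           /* receiver */
                  MPI_Recv(&other, 1, MPI_DOUBLE, rank + stride, 0,
                           MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                  local += other;
              }
          }
          if (rank == 0)
              printf("tree sum = %g (expected %g)\n",
                     local, size * (size + 1) / 2.0);
          MPI_Finalize();
          return 0;
      }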

  14. Parallel Programming Application to Matrix Algebra in the Spectral Method for Control Systems Analysis, Synthesis and Identification

    Directory of Open Access Journals (Sweden)

    V. Yu. Kleshnin

    2016-01-01

    Full Text Available The article describes the matrix algebra libraries based on the modern technologies of parallel programming for the Spectrum software, which can use a spectral method (in the spectral form of mathematical description) to analyse, synthesise and identify deterministic and stochastic dynamical systems. The developed matrix algebra libraries use the following technologies: OmniThreadLibrary, OpenMP, Intel Threading Building Blocks and Intel Cilk Plus for CPUs; nVidia CUDA, OpenCL, and Microsoft Accelerated Massive Parallelism for GPUs. The developed libraries support matrices with real elements (single and double precision). The matrix dimensions are limited by the 32-bit or 64-bit memory model and the computer configuration. These libraries are general-purpose and can be used not only for the Spectrum software; they can also find application in other projects where there is a need to perform operations with large matrices. The article provides a comparative analysis of the libraries developed for various matrix operations (addition, subtraction, scalar multiplication, multiplication, powers of matrices, tensor multiplication, transpose, inverse matrix, finding a solution of a system of linear equations) through numerical experiments using different CPUs and GPUs. The article contains sample programs and performance test results for matrix multiplication, which requires the most computational resources of all the operations.
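
    As a flavor of the matrix-multiplication benchmarks the article reports, here is a minimal CPU-only sketch in C with OpenMP; the size, data, and ikj loop order are illustrative choices, not the Spectrum libraries' actual code.

      /* Timed dense multiply C += A*B with OpenMP; the ikj loop order streams
       * rows of B and C for cache-friendly access. */
      #include <omp.h>
      #include <stdio.h>
      #include <stdlib.h>

      #define N 512

      static double a[N][N], b[N][N], c[N][N];

      int main(void)
      {
          for (int i = 0; i < N; ++i)
              for (int j = 0; j < N; ++j) {
                  a[i][j] = rand() / (double)RAND_MAX;
                  b[i][j] = rand() / (double)RAND_MAX;
              }

          double t0 = omp_get_wtime();
          #pragma omp parallel for
          for (int i = 0; i < N; ++i)
              for (int k = 0; k < N; ++k) {
                  double aik = a[i][k];
                  for (int j = 0; j < N; ++j)
                      c[i][j] += aik * b[k][j];
              }
          double t1 = omp_get_wtime();

          printf("%dx%d multiply: %.3f s, %.2f GFLOP/s\n", N, N, t1 - t0,
                 2.0 * N * N * N / (t1 - t0) / 1e9);
          return 0;
      }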

  15. Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS

    Science.gov (United States)

    Huang, Fang; Liu, Dingsheng; Tan, Xicheng; Wang, Jian; Chen, Yunping; He, Binbin

    2011-04-01

    To design and implement an open-source parallel GIS (OP-GIS) based on a Linux cluster, the parallel inverse distance weighting (IDW) interpolation algorithm has been chosen as an example to explore the working model and the principle of algorithm parallel pattern (APP), one of the parallelization patterns for OP-GIS. Based on an analysis of the serial IDW interpolation algorithm of GRASS GIS, this paper has proposed and designed a specific parallel IDW interpolation algorithm, incorporating both single process, multiple data (SPMD) and master/slave (M/S) programming modes. The main steps of the parallel IDW interpolation algorithm are: (1) the master node packages the related information, and then broadcasts it to the slave nodes; (2) each node calculates its assigned data extent along one row using the serial algorithm; (3) the master node gathers the data from all nodes; and (4) iterations continue until all rows have been processed, after which the results are output. According to the experiments performed in the course of this work, the parallel IDW interpolation algorithm can attain an efficiency greater than 0.93 compared with similar algorithms, which indicates that the parallel algorithm can greatly reduce processing time and maximize speed and performance.
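
    The four steps above map naturally onto MPI. The sketch below is a hedged, self-contained illustration with toy sizes, assuming the row count divides evenly among ranks and that all rows are distributed in a single pass; it is not the GRASS-based implementation of the paper.

      /* SPMD master/slave IDW sketch: (1) master broadcasts the samples,
       * (2) each rank interpolates its block of grid rows with the serial
       * power-2 IDW formula, (3)-(4) the master gathers the blocks. */
      #include <math.h>
      #include <mpi.h>
      #include <stdio.h>

      #define NS    64               /* scattered sample points */
      #define NROWS 128              /* grid rows (assumed divisible by P) */
      #define NCOLS 128

      static double idw(double x, double y, const double sx[],
                        const double sy[], const double sv[])
      {
          double num = 0.0, den = 0.0;
          for (int k = 0; k < NS; ++k) {
              double d2 = (x - sx[k]) * (x - sx[k]) + (y - sy[k]) * (y - sy[k]);
              if (d2 < 1e-12) return sv[k];   /* exactly on a sample */
              num += sv[k] / d2;              /* inverse-distance weight */
              den += 1.0 / d2;
          }
          return num / den;
      }

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double sx[NS], sy[NS], sv[NS];
          if (rank == 0)                      /* master packages the data */
              for (int k = 0; k < NS; ++k) {
                  sx[k] = k % 8; sy[k] = k / 8; sv[k] = sin((double)k);
              }
          MPI_Bcast(sx, NS, MPI_DOUBLE, 0, MPI_COMM_WORLD);  /* step 1 */
          MPI_Bcast(sy, NS, MPI_DOUBLE, 0, MPI_COMM_WORLD);
          MPI_Bcast(sv, NS, MPI_DOUBLE, 0, MPI_COMM_WORLD);

          int myrows = NROWS / size;          /* step 2: local rows */
          double part[myrows][NCOLS];
          for (int r = 0; r < myrows; ++r)
              for (int col = 0; col < NCOLS; ++col)
                  part[r][col] = idw(col * 0.1,
                                     (rank * myrows + r) * 0.1, sx, sy, sv);

          static double grid[NROWS][NCOLS];   /* steps 3-4: gather */
          MPI_Gather(part, myrows * NCOLS, MPI_DOUBLE,
                     grid, myrows * NCOLS, MPI_DOUBLE, 0, MPI_COMM_WORLD);
          if (rank == 0)
              printf("grid[0][0] = %f\n", grid[0][0]);
          MPI_Finalize();
          return 0;
      }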

  16. Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

    Science.gov (United States)

    Oliker, Leonid; Heber, Gerd; Biswas, Rupak

    2000-01-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
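
    Since the SPMV kernel dominates each CG iteration, the ordering experiments in this paper act on exactly the indirect access pattern visible in the standard compressed sparse row (CSR) kernel sketched below; the tiny hard-coded matrix is only for illustration.

      /* y = A*x in CSR form; the x[col[j]] gather is what a good row/column
       * ordering keeps cache-resident.  The pragma is ignored when compiled
       * without OpenMP, leaving correct serial code. */
      #include <stdio.h>

      int main(void)
      {
          /* 3x3 tridiagonal example: rows [4 1 0; 1 4 1; 0 1 4] */
          int    rowptr[] = {0, 2, 5, 7};
          int    col[]    = {0, 1, 0, 1, 2, 1, 2};
          double val[]    = {4, 1, 1, 4, 1, 1, 4};
          double x[]      = {1, 2, 3};
          double y[3];
          int n = 3;

          #pragma omp parallel for schedule(static)
          for (int i = 0; i < n; ++i) {       /* one independent row each */
              double sum = 0.0;
              for (int j = rowptr[i]; j < rowptr[i + 1]; ++j)
                  sum += val[j] * x[col[j]];  /* indirect, ordering-sensitive */
              y[i] = sum;
          }

          for (int i = 0; i < n; ++i)
              printf("y[%d] = %g\n", i, y[i]);
          return 0;
      }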

  17. Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages

    Science.gov (United States)

    2013-01-02

    an array similar to that in the scalar case, but where each item in the array is a C struct instead of a built-in datatype; this is called an array of...provide methods in our class that enable execution. In addition, we restrict the elemental datatypes to a subset of those supported by NumPy [88... datatypes and C++ datatypes when necessary, but in the case of the Scala backend, the DSEL uses Asp's Scala support to translate datatypes using the Apache

  18. Model-independent particle accelerator tuning

    Directory of Open Access Journals (Sweden)

    Alexander Scheinker

    2013-10-01

    Full Text Available We present a new model-independent dynamic feedback technique, rotation rate tuning, for automatically and simultaneously tuning coupled components of uncertain, complex systems. The main advantages of the method are: (1) it has the ability to handle unknown, time-varying systems, (2) it gives known bounds on parameter update rates, (3) we give an analytic proof of its convergence and its stability, and (4) it has a simple digital implementation through a control system such as the experimental physics and industrial control system (EPICS). Because this technique is model independent it may be useful as a real-time, in-hardware, feedback-based optimization scheme for uncertain and time-varying systems. In particular, it is robust enough to handle uncertainty due to coupling, thermal cycling, misalignments, and manufacturing imperfections. As a result, it may be used as a fine-tuning supplement for existing accelerator tuning/control schemes. We present multiparticle simulation results demonstrating the scheme’s ability to simultaneously adaptively adjust the set points of 22 quadrupole magnets and two rf buncher cavities in the Los Alamos Neutron Science Center (LANSCE) Linear Accelerator’s transport region, while the beam properties and rf phase shift are continuously varying. The tuning is based only on beam current readings, without knowledge of particle dynamics. We also present an outline of how to implement this general scheme in software for optimization, and in hardware for feedback-based control/tuning, for a wide range of systems.
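
    For readers who want the flavor of such model-independent tuning, the sketch below implements a textbook dithering-based extremum-seeking loop: each knob oscillates at its own frequency, the single measured cost modulates the oscillation phase, and on average the knobs drift downhill. This is a generic variant, not necessarily the paper's exact rotation-rate update law, and the two-knob quadratic "loss" is a made-up stand-in for a beam-based measurement.

      /* Generic extremum seeking: p[i] oscillates at frequency w[i]; the
       * measured loss shifts the phase, producing on average a drift toward
       * lower loss, with a bounded per-step update of dt*sqrt(alpha*w). */
      #include <math.h>
      #include <stdio.h>

      static double loss(const double p[2])    /* hypothetical objective,  */
      {                                        /* minimized at (1.0, -0.5) */
          double dx = p[0] - 1.0, dy = p[1] + 0.5;
          return dx * dx + dy * dy;
      }

      int main(void)
      {
          double p[2] = {0.0, 0.0};            /* initial knob settings */
          const double w[2] = {300.0, 370.0};  /* distinct dither frequencies */
          const double alpha = 0.5, k = 1.0, dt = 1.0e-4;

          for (long n = 0; n <= 2000000; ++n) {
              double c = loss(p);              /* the only measurement used */
              for (int i = 0; i < 2; ++i)
                  p[i] += dt * sqrt(alpha * w[i]) * cos(w[i] * n * dt + k * c);
              if (n % 400000 == 0)
                  printf("t = %6.1f  p = (%+.3f, %+.3f)  loss = %.4f\n",
                         n * dt, p[0], p[1], c);
          }
          return 0;
      }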

  19. Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems

    KAUST Repository

    Mudigere, Dheevatsa

    2015-05-01

    In this work, we revisit the 1999 Gordon Bell Prize winning PETSc-FUN3D aerodynamics code, extending it with highly-tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain, and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel Xeon E5-2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.

  20. Study on MPI/OpenMP hybrid parallelism for Monte Carlo neutron transport code

    International Nuclear Information System (INIS)

    Liang Jingang; Xu Qi; Wang Kan; Liu Shiwen

    2013-01-01

    Parallel programming with a mixed mode of message-passing and shared-memory has several advantages when used in Monte Carlo neutron transport codes, such as fitting the hardware of distributed-shared-memory clusters, economizing the memory demand of Monte Carlo transport, and improving parallel performance. MPI/OpenMP hybrid parallelism was implemented based on a one-dimensional Monte Carlo neutron transport code. Some critical factors affecting the parallel performance were analyzed, and solutions were proposed for several problems such as access contention, lock contention and false sharing. After optimization the code was tested. It is shown that the hybrid parallel code can reach performance as good as a pure MPI parallel program, while saving a lot of memory usage at the same time. Therefore hybrid parallelism is efficient for achieving large-scale parallelism in Monte Carlo neutron transport. (authors)
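
    The contention problems named above are commonly avoided by giving each OpenMP thread its own tally array and merging once at the end, with MPI combining the per-rank results. The sketch below shows that structure on a deliberately minimal 1-D slab problem; it illustrates the pattern only and is not the paper's code.

      /* Hybrid MPI/OpenMP Monte Carlo sketch: ranks split the particle
       * batch, threads track particles independently, and each thread
       * tallies into a private array, so no locks are taken during tracking
       * and no cache line is shared (no lock contention, no false sharing). */
      #include <math.h>
      #include <mpi.h>
      #include <omp.h>
      #include <stdio.h>

      #define NPART 1000000          /* particles per rank */
      #define NBINS 10
      #define L     10.0             /* slab thickness */
      #define SIG_T 1.0              /* total cross section */
      #define SIG_A 0.3              /* absorption cross section */

      static double rng(unsigned long long *s)    /* tiny per-thread LCG */
      {
          *s = *s * 6364136223846793005ULL + 1442695040888963407ULL;
          return (double)(*s >> 11) / 9007199254740992.0;   /* in [0,1) */
      }

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          double tally[NBINS] = {0};           /* rank-level tally */

          #pragma omp parallel
          {
              double mine[NBINS] = {0};        /* thread-private tally */
              unsigned long long seed = 12345ULL
                  + 977ULL * (rank * omp_get_num_threads()
                              + omp_get_thread_num());
              #pragma omp for
              for (int p = 0; p < NPART; ++p) {
                  double x = 0.0, mu = 1.0;    /* birth at the left face */
                  for (;;) {
                      x += mu * (-log(rng(&seed)) / SIG_T);
                      if (x < 0.0 || x >= L) break;         /* leaked */
                      if (rng(&seed) < SIG_A / SIG_T) {     /* absorbed */
                          mine[(int)(x / L * NBINS)] += 1.0;
                          break;
                      }
                      mu = (rng(&seed) < 0.5) ? -1.0 : 1.0; /* scattered */
                  }
              }
              #pragma omp critical             /* one short merge per thread */
              for (int b = 0; b < NBINS; ++b) tally[b] += mine[b];
          }

          double total[NBINS];                 /* combine the ranks */
          MPI_Reduce(tally, total, NBINS, MPI_DOUBLE, MPI_SUM, 0,
                     MPI_COMM_WORLD);
          if (rank == 0)
              for (int b = 0; b < NBINS; ++b)
                  printf("bin %d: %.0f absorptions\n", b, total[b]);
          MPI_Finalize();
          return 0;
      }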

  1. RF Measurements and Tuning of the 750 MHz HF-RFQ

    CERN Document Server

    Koubek, Benjamin; Timmins, Marc; CERN. Geneva. ATS Department

    2017-01-01

    In the frame of the program on medical applications CERN has built a compact 750 MHz RFQ to be used as an injector for a hadron therapy linac. This RFQ was designed to accelerate protons to an energy of 5 MeV within only 2 m length. It is divided into four segments and equipped with 32 tuners in total. The length of the RFQ corresponds to 5 λ, which is considered to be close to the limit for field adjustment using only piston tuners. Moreover the high frequency, which is about double the frequency of existing RFQs, results in a sensitive structure and requires careful tuning by means of the alignment of the pumping ports and fixed tuners. This report summarises the tuning procedure, RF and bead pull measurements of the RFQ.

  2. Truly nested data-parallelism: compiling SaC for the Microgrid architecture

    NARCIS (Netherlands)

    Herhut, S.; Joslin, C.; Scholz, S.-B.; Grelck, C.; Morazan, M.

    2009-01-01

    Data-parallel programming facilitates elegant specification of concurrency. However, the composability of data-parallel operations so far has been constrained by the requirement to have only flat data-parallel operations at runtime. In this paper, we present early results on our work to exploit

  3. Fixed field alternating gradient accelerator with small orbit shift and tune excursion

    Directory of Open Access Journals (Sweden)

    Suzanne L. Sheehy

    2010-04-01

    Full Text Available A new design principle of a nonscaling fixed field alternating gradient accelerator is proposed. It is based on optics that produce approximate scaling properties. A large field index k is chosen to squeeze the orbit shift as much as possible by setting the betatron oscillation frequency in the second stability region of Hill’s equation. Then, the lattice magnets and their alignment are simplified. To simplify the magnets, we expand the field profile of r^{k} into multipoles and keep only a few lower order terms. A rectangular-shaped magnet is assumed with lines of constant field parallel to the magnet axis. The lattice employs a triplet of rectangular magnets for focusing, which are parallel to one another to simplify alignment. These simplifications along with fringe fields introduce finite chromaticity and the fixed field alternating gradient accelerator is no longer a scaling one. However, the tune excursion of the whole ring can be within half an integer and we avoid the crossing of strong resonances.

  4. Final Report, Center for Programming Models for Scalable Parallel Computing: Co-Array Fortran, Grant Number DE-FC02-01ER25505

    Energy Technology Data Exchange (ETDEWEB)

    Robert W. Numrich

    2008-04-22

    The major accomplishment of this project is the production of CafLib, an 'object-oriented' parallel numerical library written in Co-Array Fortran. CafLib contains distributed objects such as block vectors and block matrices along with procedures, attached to each object, that perform basic linear algebra operations such as matrix multiplication, matrix transpose and LU decomposition. It also contains constructors and destructors for each object that hide the details of data decomposition from the programmer, and it contains collective operations that allow the programmer to calculate global reductions, such as global sums, global minima and global maxima, as well as vector and matrix norms of several kinds. CafLib is designed to be extensible in such a way that programmers can define distributed grid and field objects, based on vector and matrix objects from the library, for finite difference algorithms to solve partial differential equations. A very important extra benefit that resulted from the project is the inclusion of the co-array programming model in the next Fortran standard called Fortran 2008. It is the first parallel programming model ever included as a standard part of the language. Co-arrays will be a supported feature in all Fortran compilers, and the portability provided by standardization will encourage a large number of programmers to adopt it for new parallel application development. The combination of object-oriented programming in Fortran 2003 with co-arrays in Fortran 2008 provides a very powerful programming model for high-performance scientific computing. Additional benefits from the project, beyond the original goal, include a program to provide access to the co-array model through the Cray compiler as a resource for teaching and research. Several academics, for the first time, included the co-array model as a topic in their courses on parallel computing. A separate collaborative project with LANL and PNNL showed how to

  5. Pthreads vs MPI Parallel Performance of Angular-Domain Decomposed Sn

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Barnett, D.A.

    2000-01-01

    Two programming models for parallelizing the Angular Domain Decomposition (ADD) of the discrete ordinates (Sn) approximation of the neutron transport equation are examined. These are the shared memory model based on the POSIX threads (Pthreads) standard, and the message passing model based on the Message Passing Interface (MPI) standard. These standard libraries are available on most multiprocessor platforms, thus making the resulting parallel codes widely portable. The question is: on a fixed platform, and for a particular code solving a given test problem, which of the two programming models delivers better parallel performance? Such a comparison is possible on Symmetric Multi-Processor (SMP) architectures, in which several CPUs physically share a common memory and in addition are capable of emulating message passing functionality. Implementation of the two-dimensional Sn Arbitrarily High Order Transport (AHOT) code for solving neutron transport problems using these two parallelization models is described. Measured parallel performance of each model on the COMPAQ AlphaServer 8400 and the SGI Origin 2000 platforms is described, and a comparison of the observed speedup for the two programming models is reported. For the case presented in this paper it appears that the MPI implementation scales better than the Pthreads implementation on both platforms.

  6. Fermilab main accelerator quadrupole transistorized regulators for improved tune stability

    International Nuclear Information System (INIS)

    Yarema, R.J.; Pfeffer, H.

    1977-01-01

    During early operation of the Fermilab Main Accelerator, tune fluctuations, caused by the SCR-controlled power supplies in the quad bus, limited the beam aperture at low energies. To correct this problem, two transistorized power supplies were built in 1975 to regulate and filter the main ring quad magnet current during injection and beam acceleration through the rf transition region. There is one power supply in series with each quad bus. Each supply uses 320 parallel power transistors and is rated at 300 A, 120 V. Since the voltage and current capabilities of the transistorized supplies are limited, the supplies are turned off at about 25 GeV. A real-time computer system initiates turn-on of the SCR-controlled power supplies and regulation takeover by the SCR-controlled supplies at the appropriate times.

  7. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.

    2017-03-13

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, RBM is a deep learning algorithm used in this project to improve its time performance through an efficient parallel implementation with the OpenACC tool, applying the best possible optimizations to RBM to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of the concept of deep learning. OpenACC is a directive-based approach to computing, where directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of RBM are available using different models such as OpenMP and CUDA, but this project has been the first attempt to apply the OpenACC model to RBM.
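
    The kind of kernel such a project accelerates can be sketched briefly: computing hidden-unit activation probabilities from a visible vector, the first half of a Contrastive Divergence step. The directives below are standard OpenACC; the sizes and data are toy values, and this is an illustration of the approach rather than the project's actual code.

      /* RBM hidden-layer activation h = sigmoid(b + v*W) with OpenACC.
       * A full CD step would also sample, reconstruct, and update W inside
       * the same data region to avoid repeated host-device transfers. */
      #include <math.h>
      #include <stdio.h>

      #define NV 784                 /* visible units */
      #define NH 256                 /* hidden units */

      static float W[NV][NH], v[NV], b[NH], h[NH];

      int main(void)
      {
          for (int i = 0; i < NV; ++i) {
              v[i] = (i % 7) / 7.0f;
              for (int j = 0; j < NH; ++j)
                  W[i][j] = 0.01f * ((i + j) % 11 - 5);
          }

          #pragma acc data copyin(W, v, b) copyout(h)
          {
              #pragma acc parallel loop
              for (int j = 0; j < NH; ++j) {   /* one hidden unit per thread */
                  float s = b[j];
                  #pragma acc loop reduction(+:s)
                  for (int i = 0; i < NV; ++i)
                      s += v[i] * W[i][j];
                  h[j] = 1.0f / (1.0f + expf(-s));   /* sigmoid */
              }
          }
          printf("h[0] = %f\n", h[0]);
          return 0;
      }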

  8. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.; Khan, Ayaz H.

    2017-01-01

    Deep learning is based on a set of algorithms that attempt to model high-level abstractions in data. Specifically, RBM is a deep learning algorithm used in this project to improve its time performance through an efficient parallel implementation with the OpenACC tool, applying the best possible optimizations to RBM to harness the massively parallel power of NVIDIA GPUs. GPU development in the last few years has contributed to the growth of the concept of deep learning. OpenACC is a directive-based approach to computing, where directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Machine is a stochastic neural network that essentially performs a binary version of factor analysis. RBM is a useful neural network basis for larger modern deep learning models, such as the Deep Belief Network. RBM parameters are estimated using an efficient training method called Contrastive Divergence. Parallel implementations of RBM are available using different models such as OpenMP and CUDA, but this project has been the first attempt to apply the OpenACC model to RBM.

  9. Development of parallel benchmark code by sheet metal forming simulator 'ITAS'

    International Nuclear Information System (INIS)

    Watanabe, Hiroshi; Suzuki, Shintaro; Minami, Kazuo

    1999-03-01

    This report describes the development of a parallel benchmark code based on the sheet metal forming simulator 'ITAS'. ITAS is a nonlinear elasto-plastic analysis program using the finite element method for the simulation of sheet metal forming. ITAS adopts the dynamic analysis method, which computes the displacement of the sheet metal at every time step, and utilizes the implicit method with a direct linear equation solver; the simulator is therefore very robust. However, it requires a lot of computational time and memory capacity. In the development of the parallel benchmark code, we designed the code with MPI programming to reduce the computational time. In numerical experiments on five kinds of parallel supercomputers at CCSE JAERI, i.e., SP2, SR2201, SX-4, T94 and VPP300, good performance is observed. The results will be made public through the WWW so that the benchmark results may serve as a guideline for research and development of parallel programs. (author)

  10. Selective enhancement of orientation tuning before saccades.

    Science.gov (United States)

    Ohl, Sven; Kuper, Clara; Rolfs, Martin

    2017-11-01

    Saccadic eye movements cause a rapid sweep of the visual image across the retina and bring the saccade's target into high-acuity foveal vision. Even before saccade onset, visual processing is selectively prioritized at the saccade target. To determine how this presaccadic attention shift exerts its influence on visual selection, we compare the dynamics of perceptual tuning curves before movement onset at the saccade target and in the opposite hemifield. Participants monitored a 30-Hz sequence of randomly oriented gratings for a target orientation. Combining a reverse correlation technique previously used to study orientation tuning in neurons and general additive mixed modeling, we found that perceptual reports were tuned to the target orientation. The gain of orientation tuning increased markedly within the last 100 ms before saccade onset. In addition, we observed finer orientation tuning right before saccade onset. This increase in gain and tuning occurred at the saccade target location and was not observed at the incongruent location in the opposite hemifield. The present findings suggest, therefore, that presaccadic attention exerts its influence on vision in a spatially and feature-selective manner, enhancing performance and sharpening feature tuning at the future gaze location before the eyes start moving.

  11. On а Recursive-Parallel Algorithm for Solving the Knapsack Problem

    Directory of Open Access Journals (Sweden)

    Vladimir V. Vasilchikov

    2018-01-01

    Full Text Available In this paper, we offer an efficient parallel algorithm for solving the NP-complete Knapsack Problem in its basic, so-called 0-1 variant. To find its exact solution, algorithms belonging to the category of "branch and bound methods" have long been used. To speed up the solving with varying degrees of efficiency, various options for parallelizing computations are also used. We propose here an algorithm for solving the problem, based on the paradigm of recursive-parallel computations. We consider it well suited for problems of this kind, when it is difficult to immediately break up the computations into a sufficient number of subtasks that are comparable in complexity, since they appear dynamically at run time. We used the RPM ParLib library, developed by the author, as the main tool to program the algorithm. This library allows us to develop effective applications for parallel computing on a local network in the .NET Framework. Such applications have the ability to generate parallel branches of computation directly during program execution and dynamically redistribute work between computing modules. Any language with support for the .NET Framework can be used as a programming language in conjunction with this library. For our experiments, we developed some C# applications using this library. The main purpose of these experiments was to study the acceleration achieved by recursive-parallel computing. A detailed description of the algorithm and its testing, as well as the results obtained, are also given in the paper.
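
    The RPM ParLib and C# sources are not reproduced here, but the same recursive-parallel idea can be shown compactly with OpenMP tasks in C: each branch of the include/exclude decision tree may become a new task, created dynamically at run time exactly as the article describes, with a simple remaining-value bound for pruning.

      /* Recursive-parallel 0-1 knapsack (branch and bound).  Tasks are
       * spawned near the top of the tree only; deeper calls run serially
       * via final() to limit task overhead.  A stale read of 'best' in the
       * bound test only weakens pruning, never correctness. */
      #include <stdio.h>

      #define N   24
      #define CAP 60

      static const int wt[N]  = { 5, 9, 3, 7, 4, 8, 2, 6, 5, 9, 3, 7,
                                  4, 8, 2, 6, 5, 9, 3, 7, 4, 8, 2, 6 };
      static const int val[N] = { 6, 11, 4, 9, 5, 10, 3, 8, 6, 11, 4, 9,
                                  5, 10, 3, 8, 6, 11, 4, 9, 5, 10, 3, 8 };
      static int best = 0;

      static void branch(int i, int cap, int value, int rest)
      {
          int snapshot;
          #pragma omp atomic read
          snapshot = best;
          if (value + rest <= snapshot) return;   /* bound: cannot win */
          if (i == N) {
              #pragma omp critical
              if (value > best) best = value;
              return;
          }
          if (wt[i] <= cap) {                     /* "include item i" branch */
              #pragma omp task firstprivate(i, cap, value, rest) final(i > 6)
              branch(i + 1, cap - wt[i], value + val[i], rest - val[i]);
          }
          branch(i + 1, cap, value, rest - val[i]);   /* "exclude" branch */
          #pragma omp taskwait
      }

      int main(void)
      {
          int rest = 0;
          for (int i = 0; i < N; ++i) rest += val[i];
          #pragma omp parallel
          #pragma omp single                      /* one root; tasks fan out */
          branch(0, CAP, 0, rest);
          printf("best value = %d\n", best);
          return 0;
      }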

  12. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    Work in the field of parallel processing has developed as research activities using several numerical Monte Carlo simulations related to basic and applied current problems of nuclear and particle physics. For applications utilizing the GEANT code, development and improvement work was done on the parts simulating low-energy physical phenomena such as radiation transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other works in this field refer to simulations for nuclear medicine applications such as, for instance, the development of biological probes and the evaluation and characterization of gamma cameras (collimators, crystal thickness), as well as methods for dosimetric calculations. In particular, these calculations are suited for a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Other works mentioned in the same field refer to the simulation of electron channelling in crystals and the simulation of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of the environment.

  13. Parallel evolutionary computation in bioinformatics applications.

    Science.gov (United States)

    Pinho, Jorge; Sobral, João Luis; Rocha, Miguel

    2013-05-01

    A large number of optimization problems within the field of Bioinformatics require methods able to handle their inherent complexity (e.g. NP-hard problems) and also demand increased computational efforts. In this context, the use of parallel architectures is a necessity. In this work, we propose ParJECoLi, a Java-based library that offers a large set of metaheuristic methods (such as Evolutionary Algorithms) and also addresses the issue of its efficient execution on a wide range of parallel architectures. The proposed approach focuses on ease of use, making the adaptation to distinct parallel environments (multicore, cluster, grid) transparent to the user. Indeed, this work shows how the development of the optimization library can proceed independently of its adaptation for several architectures, making use of Aspect-Oriented Programming. The pluggable nature of the parallelism-related modules allows the user to easily configure the environment, adding parallelism modules to the base source code when needed. The performance of the platform is validated with two case studies within biological model optimization. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.

  14. Domain decomposition methods and parallel computing

    International Nuclear Information System (INIS)

    Meurant, G.

    1991-01-01

    In this paper, we show how to efficiently solve large linear systems on parallel computers. These linear systems arise from the discretization of scientific computing problems described by systems of partial differential equations. We show how to get a discrete finite-dimensional system from the continuous problem, and the chosen conjugate gradient iterative algorithm is briefly described. Then, the different kinds of parallel architectures are reviewed and their advantages and deficiencies are emphasized. We sketch the problems found in programming the conjugate gradient method on parallel computers. For this algorithm to be efficient on parallel machines, domain decomposition techniques are introduced. We give results of numerical experiments showing that these techniques allow a good rate of convergence for the conjugate gradient algorithm as well as computational speeds in excess of a billion floating point operations per second. (author). 5 refs., 11 figs., 2 tabs., 1 inset

  15. Performance Tuning of x86 OpenMP Codes with MAQAO

    Science.gov (United States)

    Barthou, Denis; Charif Rubial, Andres; Jalby, William; Koliai, Souad; Valensi, Cédric

    Failing to find the best optimization sequence for a given application code can lead to compiler-generated code with poor performance or inappropriate code. It is necessary to analyze performance from the generated assembly code to improve on the compilation process. This paper presents a tool for the performance analysis of multithreaded codes (OpenMP programs are supported at the moment). MAQAO relies on static performance evaluation to identify compiler optimizations and assess the performance of loops. It exploits static binary rewriting for reading and instrumenting object files or executables. Static binary instrumentation allows the insertion of probes at the instruction level. Memory accesses can be captured to help tune the code, but such traces need to be compressed. MAQAO can analyze the results and provide hints for tuning the code. We show on some examples how this can help users improve their OpenMP applications.

  16. Execution Model of Three Parallel Languages: OpenMP, UPC and CAF

    Directory of Open Access Journals (Sweden)

    Ami Marowka

    2005-01-01

    Full Text Available The aim of this paper is to present a qualitative evaluation of three state-of-the-art parallel languages: OpenMP, Unified Parallel C (UPC) and Co-Array Fortran (CAF). OpenMP and UPC are explicit parallel programming languages based on the ANSI standard. CAF is an implicit programming language. On the one hand, OpenMP is designed for shared-memory architectures and extends the base language by using compiler directives that annotate the original source code. On the other hand, UPC and CAF are designed for distributed shared-memory architectures and extend the base language with new parallel constructs. We deconstruct each language into its basic components, show examples, make a detailed analysis, compare them, and finally draw some conclusions.

  17. A parallel implementation of a maximum entropy reconstruction algorithm for PET images in a visual language

    International Nuclear Information System (INIS)

    Bastiens, K.; Lemahieu, I.

    1994-01-01

    The application of a maximum entropy reconstruction algorithm to PET images requires a lot of computing resources. A parallel implementation could seriously reduce the execution time. However, programming a parallel application is still a non-trivial task, needing specialized people. In this paper a programming environment based on a visual programming language is used for a parallel implementation of the reconstruction algorithm. This programming environment allows less experienced programmers to use the performance of multiprocessor systems. (authors)

  18. High performance computing of density matrix renormalization group method for 2-dimensional model. Parallelization strategy toward peta computing

    International Nuclear Information System (INIS)

    Yamada, Susumu; Igarashi, Ryo; Machida, Masahiko; Imamura, Toshiyuki; Okumura, Masahiko; Onishi, Hiroaki

    2010-01-01

    We parallelize the density matrix renormalization group (DMRG) method, which is a ground-state solver for one-dimensional quantum lattice systems. The parallelization allows us to extend the applicable range of the DMRG to n-leg ladders, i.e., quasi-two-dimensional cases. Such an extension is expected to bring about several breakthroughs in, e.g., quantum physics, chemistry, and nano-engineering. However, the straightforward parallelization requires all-to-all communications between all processes, which are unsuitable for multi-core systems, the mainstream of current parallel computers. Therefore, we optimize the all-to-all communications in the following two steps. The first is the elimination of the communications between all processes by merely rearranging the data distribution, keeping the amount of communicated data unchanged. The second is the avoidance of communication conflicts by rescheduling the calculation and the communication. We evaluate the performance of the DMRG method on multi-core supercomputers and confirm that our two-step tuning is quite effective. (author)
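
    The DMRG-specific data rearrangement is not reproduced here, but the second step, rescheduling so that communication proceeds concurrently with calculation, follows the generic nonblocking MPI pattern sketched below: post the exchange early, do purely local work while messages are in flight, and wait only when the remote data is actually needed.

      /* Ring exchange overlapped with local computation via Isend/Irecv. */
      #include <mpi.h>
      #include <stdio.h>

      #define NLOC 4096

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          int left  = (rank + size - 1) % size;
          int right = (rank + 1) % size;

          static double mine[NLOC], from_left[NLOC];
          for (int i = 0; i < NLOC; ++i) mine[i] = rank + i * 1e-6;

          MPI_Request req[2];                  /* post the exchange first */
          MPI_Irecv(from_left, NLOC, MPI_DOUBLE, left, 0,
                    MPI_COMM_WORLD, &req[0]);
          MPI_Isend(mine, NLOC, MPI_DOUBLE, right, 0,
                    MPI_COMM_WORLD, &req[1]);

          double local = 0.0;                  /* overlap: local-only work */
          for (int i = 0; i < NLOC; ++i) local += mine[i] * mine[i];

          MPI_Waitall(2, req, MPI_STATUSES_IGNORE);  /* block only here */
          printf("rank %d: local %.3f, first remote value %.6f\n",
                 rank, local, from_left[0]);
          MPI_Finalize();
          return 0;
      }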

  19. Widespread auditory deficits in tune deafness.

    Science.gov (United States)

    Jones, Jennifer L; Zalewski, Christopher; Brewer, Carmen; Lucker, Jay; Drayna, Dennis

    2009-02-01

    The goal of this study was to investigate auditory function in individuals with deficits in musical pitch perception. We hypothesized that such individuals have deficits in nonspeech areas of auditory processing. We screened 865 randomly selected individuals to identify those who scored poorly on the Distorted Tunes test (DTT), a measure of musical pitch recognition ability. Those who scored poorly were given a comprehensive audiologic examination, and those with hearing loss or other confounding audiologic factors were excluded from further testing. Thirty-five individuals with tune deafness constituted the experimental group. Thirty-four individuals with normal hearing and normal DTT scores, matched for age, gender, handedness, and education, and without overt or reported psychiatric disorders made up the normal control group. Individual and group performance for pure-tone frequency discrimination at 1000 Hz was determined by measuring the difference limen for frequency (DLF). Auditory processing abilities were assessed using tests of pitch pattern recognition, duration pattern recognition, and auditory gap detection. In addition, we evaluated both attention and short- and long-term memory as variables that might influence performance on our experimental measures. Differences between groups were evaluated statistically using Wilcoxon nonparametric tests and t-tests as appropriate. The DLF at 1000 Hz in the group with tune deafness was significantly larger than that of the normal control group. However, approximately one-third of participants with tune deafness had DLFs within the range of performance observed in the control group. Many individuals with tune deafness also displayed a high degree of variability in their intertrial frequency discrimination performance that could not be explained by deficits in memory or attention. Pitch and duration pattern discrimination and auditory gap-detection ability were significantly poorer in the group with tune deafness

  20. Programs Lucky and LuckyC - 3D parallel transport codes for the multi-group transport equation solution for XYZ geometry by Pm Sn method

    International Nuclear Information System (INIS)

    Moriakov, A.; Vasyukhno, V.; Netecha, M.; Khacheresov, G.

    2003-01-01

    Powerful supercomputers are available today. MBC-1000M is one of the Russian supercomputers and may be used through remote access. The programs LUCKY and LUCKY_C were created for multiprocessor systems. These programs have algorithms created especially for these computers and use the MPI (Message Passing Interface) service for exchanges between processors. LUCKY can solve shielding tasks by the multigroup discrete ordinates method; LUCKY_C can solve criticality tasks by the same method. Only XYZ orthogonal geometry is available. With small space steps to approximate the discrete operator, this geometry may be used as a universal one to describe complex geometrical structures. Cross-section libraries in GIT format are used, with nuclear data up to a P8 approximation by Legendre polynomials. The programming language is Fortran-90. 'Vector' processors could give a speedup of up to 30 times, but unfortunately the MBC-1000M has no such processors. Nevertheless, sufficient parallel efficiency was obtained with 'space' (LUCKY) and 'space and energy' (LUCKY_C) parallelization. The AUTOCAD program is used to check the geometry after treatment of the input data. The programs have a powerful geometry module, a convenient tool to realize any geometry. Output results may be processed by graphics programs on a personal computer. (authors)

  1. Parallel processing of neutron transport in fuel assembly calculation

    International Nuclear Information System (INIS)

    Song, Jae Seung

    1992-02-01

    Group constants, which are used for reactor analyses by the nodal method, are generated by fuel assembly calculations based on neutron transport theory, since one or a quarter of a fuel assembly corresponds to a unit mesh in the current nodal calculation. The group constant calculation for a fuel assembly is performed through spectrum calculations, a two-dimensional fuel assembly calculation, and depletion calculations. The purpose of this study is to develop a parallel algorithm to be used in a parallel processor for the fuel assembly calculation and the depletion calculations of the group constant generation. A serial program, which solves the neutron integral transport equation using the transmission probability method and the linear depletion equation, was prepared and verified by a benchmark calculation. Small changes from the serial program were enough to parallelize the depletion calculation, which has inherent parallel characteristics. In the fuel assembly calculation, however, efficient parallelization is not simple and easy because of the many coupling parameters in the calculation and the data communications among CPUs. In this study, the group distribution method is introduced for the parallel processing of the fuel assembly calculation to minimize the data communications. The parallel processing was performed on a Quadputer with 4 CPUs operating in the NURAD Lab at KAIST. Efficiencies of 54.3% and 78.0% were obtained in the fuel assembly calculation and depletion calculation, respectively, which lead to an overall speedup of about 2.5. As a result, it is concluded that the computing time consumed for the group constant generation can be easily reduced by parallel processing on a parallel computer with small-size CPUs.

  2. Tune splitting in the presence of linear coupling

    International Nuclear Information System (INIS)

    Parzen, G.

    1991-01-01

    The presence of random skew quadrupole field errors will couple the x and y motions. The x and y motions are then each given by the sum of two normal modes with the tunes v1 and v2, which may differ appreciably from vx and vy, the unperturbed tunes. This is often called tune splitting, since |v1 - v2| is usually larger than |vx - vy|. This tune splitting may be large in proton accelerators using superconducting magnets, because of the relatively large random skew quadrupole field errors that are expected in these magnets. This effect is also increased by the required insertions in proton colliders, which generate large β-functions in the insertion region. This tune splitting has been studied in the RHIC accelerator. For RHIC, a tune splitting as large as 0.2 was found in one worst case. A correction system has been developed for correcting this large tune splitting, which uses two families of skew quadrupole correctors. It has been found that this correction system corrects most of the large tune splitting, but a residual tune splitting remains that is still appreciable. This paper discusses the corrections to this residual tune splitting.

  3. Telling in-tune from out-of-tune: widespread evidence for implicit absolute intonation.

    Science.gov (United States)

    Van Hedger, Stephen C; Heald, Shannon L M; Huang, Alex; Rutstein, Brooke; Nusbaum, Howard C

    2017-04-01

    Absolute pitch (AP) is the rare ability to name or produce an isolated musical note without the aid of a reference note. One skill thought to be unique to AP possessors is the ability to provide absolute intonation judgments (e.g., classifying an isolated note as "in-tune" or "out-of-tune"). Recent work has suggested that absolute intonation perception among AP possessors is not crystallized in a critical period of development, but is dynamically maintained by the listening environment, in which the vast majority of Western music is tuned to a specific cultural standard. Given that all listeners of Western music are constantly exposed to this specific cultural tuning standard, our experiments address whether absolute intonation perception extends beyond AP possessors. We demonstrate that non-AP listeners are able to accurately judge the intonation of completely isolated notes. Both musicians and nonmusicians showed evidence for absolute intonation recognition when listening to familiar timbres (piano and violin). When testing unfamiliar timbres (triangle and inverted sine waves), only musicians showed weak evidence of absolute intonation recognition (Experiment 2). Overall, these results highlight a previously unknown similarity between AP and non-AP possessors' long-term musical note representations, including evidence of sensitivity to frequency.

  4. High-energy physics software parallelization using database techniques

    International Nuclear Information System (INIS)

    Argante, E.; Van der Stok, P.D.V.; Willers, I.

    1997-01-01

    A programming model for software parallelization, called CoCa, is introduced that copes with problems caused by typical features of high-energy physics software. Because CoCa is based on the database transaction paradigm, the complexity induced by the parallelization is for a large part transparent to the programmer, resulting in a higher level of abstraction than the native message passing software. CoCa is implemented on a Meiko CS-2 and on a SUN SPARCcenter 2000 parallel computer. On the CS-2, the performance is comparable with the performance of native PVM and MPI. (orig.)

  5. A Tuning Process in a Tunable Architecture Computer System

    OpenAIRE

    深沢, 良彰; 岸野, 覚; 門倉, 敏夫

    1986-01-01

    A tuning process in a tunable architecture computer is described. We have designed a computer system with tunable architecture. The main components of this computer are four AM2903 bit-slice chips. The control scheme of the micro instructions is horizontal-type, and the length of each instruction is 104 bits. Our tuning algorithm utilizes an execution history of machine-level instructions, because the execution history can be regarded as a property of the user program. In execution histories of simila...

  6. A parallel implementation of a maximum entropy reconstruction algorithm for PET images in a visual language

    Energy Technology Data Exchange (ETDEWEB)

    Bastiens, K; Lemahieu, I [University of Ghent - ELIS Department, St. Pietersnieuwstraat 41, B-9000 Ghent (Belgium)

    1994-12-31

    The application of a maximum entropy reconstruction algorithm to PET images requires a lot of computing resources. A parallel implementation could seriously reduce the execution time. However, programming a parallel application is still a non-trivial task, needing specialized people. In this paper a programming environment based on a visual programming language is used for a parallel implementation of the reconstruction algorithm. This programming environment allows less experienced programmers to use the performance of multiprocessor systems. (authors). 8 refs, 3 figs, 1 tab.

  7. Apple iTunes music store

    OpenAIRE

    Lenzi, R.; Schmucker, M.; Spadoni, F.

    2003-01-01

    This technical report analyses the Apple iTunes Music Store and its success factors. Besides the technical aspects, user and customer aspects as well as content aspects are considered. Furthermore, the iTunes Music Store's impact on online music distribution services is analysed and a short outlook on future online music distribution is given.

  8. Test generation for digital circuits using parallel processing

    Science.gov (United States)

    Hartmann, Carlos R.; Ali, Akhtar-Uz-Zaman M.

    1990-12-01

    The problem of test generation for digital logic circuits is an NP-Hard problem. Recently, the availability of low cost, high performance parallel machines has spurred interest in developing fast parallel algorithms for computer-aided design and test. This report describes a method of applying a 15-valued logic system for digital logic circuit test vector generation in a parallel programming environment. A concept called fault site testing allows for test generation, in parallel, that targets more than one fault at a given location. The multi-valued logic system allows results obtained by distinct processors and/or processes to be merged by means of simple set intersections. A machine-independent description is given for the proposed algorithm.

  9. Single-particle dynamics in a nonlinear accelerator lattice: attaining a large tune spread with octupoles in IOTA

    Energy Technology Data Exchange (ETDEWEB)

    Antipov, S. A.; Nagaitsev, S.; Valishev, A.

    2017-04-01

    Fermilab is constructing the Integrable Optics Test Accelerator (IOTA) as the centerpiece of the Accelerator R&D Program towards high-intensity circular machines. One of the factors limiting the beam intensity in present circular accelerators is collective instabilities, which can be suppressed by a spread of betatron frequencies (tunes) through the Landau damping mechanism or by an external damper, if the instability is slow enough. The spread is usually created by octupole magnets, which introduce a tune dependence on the amplitude and, in some cases, by a chromatic spread (tune dependence on the particle's momentum). The introduction of octupoles usually leads to resonant behavior and a reduction of the dynamic aperture. One of the goals of the IOTA research program is to achieve a high betatron tune spread, while retaining a large dynamic aperture, using conventional octupole magnets in a special but realistic accelerator configuration. In this report, we present results of computer simulations of an electron beam in the IOTA by particle tracking and Frequency Map Analysis. The results show that the ring's octupole magnets can be configured to provide a betatron tune shift of 0.08 (for particles at large amplitudes) with a dynamic aperture of over 20 beam sigma for a 150-MeV electron beam. The influence of the synchrotron motion, lattice errors, and magnet imperfections is insignificant for the parameters and levels of tolerances set by the design of the ring. The described octupole insert could be beneficial for suppression of space-charge induced instabilities in high intensity machines.

  10. An educational tool for interactive parallel and distributed processing

    DEFF Research Database (Denmark)

    Pagliarini, Luigi; Lund, Henrik Hautop

    2012-01-01

    In this article we try to describe how the modular interactive tiles system (MITS) can be a valuable tool for introducing students to interactive parallel and distributed processing programming. This is done by providing a hands-on educational tool that allows a change in the representation of abstract problems related to designing interactive parallel and distributed systems. Indeed, the MITS seems to bring a series of goals into education, such as parallel programming, distributedness, communication protocols, master dependency, software behavioral models, adaptive interactivity, feedback, connectivity, topology, island modeling, and user and multi-user interaction, which can rarely be found in other tools. Finally, we introduce the system of modular interactive tiles as a tool for easy, fast, and flexible hands-on exploration of these issues, and through examples we show how to implement

  11. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction

  12. On the effective parallel programming of multi-core processors

    NARCIS (Netherlands)

    Varbanescu, A.L.

    2010-01-01

    Multi-core processors are considered now the only feasible alternative to the large single-core processors which have become limited by technological aspects such as power consumption and heat dissipation. However, due to their inherent parallel structure and their diversity, multi-cores are

  13. Parallel processing of Monte Carlo code MCNP for particle transport problem

    Energy Technology Data Exchange (ETDEWEB)

    Higuchi, Kenji; Kawasaki, Takuji

    1996-06-01

    It is possible to vectorize or parallelize Monte Carlo (MC) codes for photon and neutron transport problems, making use of the independence of the calculation for each particle. The applicability of existing MC codes to parallel processing is discussed. As for parallel computers, we have used both a vector-parallel processor and a scalar-parallel processor in the performance evaluation. We have carried out (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on the Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on these two types of parallel or distributed-memory computers. In addition, we mention the evaluation of parallel programming environments for the parallel computers used in the present work, as a part of the work developing the STA (Seamless Thinking Aid) Basic Software. (author)

  14. User-friendly parallelization of GAUDI applications with Python

    International Nuclear Information System (INIS)

    Mato, Pere; Smith, Eoin

    2010-01-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and that end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.

  15. User-friendly parallelization of GAUDI applications with Python

    Energy Technology Data Exchange (ETDEWEB)

    Mato, Pere; Smith, Eoin, E-mail: pere.mato@cern.c [PH Department, CERN, 1211 Geneva 23 (Switzerland)

    2010-04-01

    GAUDI is a software framework in C++ used to build event data processing applications using a set of standard components with well-defined interfaces. Simulation, high-level trigger, reconstruction, and analysis programs used by several experiments are developed using GAUDI. These applications can be configured and driven by simple Python scripts. Given the fact that a considerable amount of existing software has been developed using serial methodology, and has existed in some cases for many years, implementation of parallelisation techniques at the framework level may offer a way of exploiting current multi-core technologies to maximize performance and reduce latencies without re-writing thousands/millions of lines of code. In the solution we have developed, the parallelization techniques are introduced to the high level Python scripts which configure and drive the applications, such that the core C++ application code requires no modification, and that end users need make only minimal changes to their scripts. The developed solution leverages existing generic Python modules that support parallel processing. Naturally, the parallel version of a given program should produce results consistent with its serial execution. The evaluation of several prototypes incorporating various parallelization techniques is presented and discussed.

  16. A Performance Tuning Methodology with Compiler Support

    Directory of Open Access Journals (Sweden)

    Oscar Hernandez

    2008-01-01

    Full Text Available We have developed an environment, based upon robust, existing, open source software, for tuning applications written using MPI, OpenMP or both. The goal of this effort, which integrates the OpenUH compiler and several popular performance tools, is to increase user productivity by providing an automated, scalable performance measurement and optimization system. In this paper we describe our environment, show how these complementary tools can work together, and illustrate the synergies possible by exploiting their individual strengths and combined interactions. We also present a methodology for performance tuning that is enabled by this environment. One of the benefits of using compiler technology in this context is that it can direct the performance measurements to capture events at different levels of granularity and help assess their importance, which we have shown to significantly reduce the measurement overheads. The compiler can also help when attempting to understand the performance results: it can supply information on how a code was translated and whether optimizations were applied. Our methodology combines two performance views of the application to find bottlenecks. The first is a high level view that focuses on OpenMP/MPI performance problems such as synchronization cost and load imbalances; the second is a low level view that focuses on hardware counter analysis with derived metrics that assess the efficiency of the code. Our experiments have shown that our approach can significantly reduce overheads for both profiling and tracing to acceptable levels and limit the number of times the application needs to be run with selected hardware counters. In this paper, we demonstrate the workings of this methodology by illustrating its use with selected NAS Parallel Benchmarks and a cloud resolving code.

  17. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    Science.gov (United States)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  18. A Novel Parallel Algorithm for Edit Distance Computation

    Directory of Open Access Journals (Sweden)

    Muhammad Murtaza Yousaf

    2018-01-01

    Full Text Available The edit distance between two sequences is the minimum number of weighted transformation operations required to transform one string into the other. The weighted transformation operations are insert, remove, and substitute. A dynamic programming solution to find the edit distance exists, but it becomes computationally intensive when the lengths of the strings become very large. This work presents a novel parallel algorithm for the edit distance problem of string matching. The algorithm is based on resolving dependencies in the dynamic programming solution of the problem, and it is able to compute each row of the edit distance table in parallel. In this way, it becomes possible to compute the complete table in min(m,n) iterations for strings of size m and n, whereas the state-of-the-art parallel algorithm solves the problem in max(m,n) iterations. The proposed algorithm also increases the amount of parallelism in each of its iterations. The algorithm is also capable of exploiting spatial locality in its implementation. Additionally, the algorithm works in a load-balanced way that further improves its performance. The algorithm is implemented for multicore systems with shared memory. An OpenMP implementation of the algorithm shows linear speedup and better execution time compared to the state-of-the-art parallel approach. The efficiency of the algorithm is also shown to be better than that of its competitor.
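
    For reference, the dependency structure that such algorithms exploit can be shown with a minimal wavefront sketch: cells on the same anti-diagonal of the table are mutually independent. This is the textbook anti-diagonal scheme, not the authors' row-parallel algorithm, and in CPython the thread pool mainly illustrates the dependency structure rather than delivering real speedup.

    ```python
    # Wavefront computation of the edit-distance table: all cells with the
    # same i + j depend only on earlier anti-diagonals, so each anti-diagonal
    # can be computed in parallel.
    from concurrent.futures import ThreadPoolExecutor

    def edit_distance(a, b):
        m, n = len(a), len(b)
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            D[i][0] = i                          # delete all of a[:i]
        for j in range(n + 1):
            D[0][j] = j                          # insert all of b[:j]

        def cell(ij):
            i, j = ij
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,       # remove
                          D[i][j - 1] + 1,       # insert
                          D[i - 1][j - 1] + cost)  # substitute

        with ThreadPoolExecutor() as pool:
            for k in range(2, m + n + 1):        # anti-diagonal index k = i + j
                cells = [(i, k - i) for i in range(max(1, k - n), min(m, k - 1) + 1)]
                list(pool.map(cell, cells))
        return D[m][n]

    print(edit_distance("kitten", "sitting"))    # 3
    ```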

  19. Automatic parallelization of while-Loops using speculative execution

    International Nuclear Information System (INIS)

    Collard, J.F.

    1995-01-01

    Automatic parallelization of imperative sequential programs has focused on nests of for-loops. The most recent techniques consist in finding an affine mapping with respect to the loop indices to simultaneously capture the temporal and spatial properties of the parallelized program. Such a mapping is usually called a "space-time transformation." This work describes an extension of these techniques to while-loops using speculative execution. We show that space-time transformations are a good framework for summing up previous restructuring techniques for while-loops, such as pipelining. Moreover, we show that these transformations can be derived and applied automatically
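
    A minimal sketch of the speculation idea follows, assuming the guard and body are pure so that misspeculated iterations can simply be discarded. This only illustrates the concept; the paper derives such schedules automatically via space-time transformations.

    ```python
    # Speculative while-loop parallelization: run the next `window` iterations
    # in parallel, then commit results in order up to the first failing guard.
    from concurrent.futures import ThreadPoolExecutor

    def speculative_while(guard, body, start=0, window=4):
        results, i = [], start
        with ThreadPoolExecutor(max_workers=window) as pool:
            while True:
                block = list(range(i, i + window))
                guards = list(pool.map(guard, block))  # speculate the next
                bodies = list(pool.map(body, block))   # `window` iterations
                for ok, value in zip(guards, bodies):
                    if not ok:             # first failing guard ends the loop;
                        return results     # results speculated past it are dropped
                    results.append(value)  # commit in original order
                i += window

    # Sequential equivalent: i = 0; while i*i < 50: emit(i*i); i += 1
    print(speculative_while(lambda i: i * i < 50, lambda i: i * i))
    ```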

  20. Logical inference techniques for loop parallelization

    DEFF Research Database (Denmark)

    Oancea, Cosmin Eugen; Rauchwerger, Lawrence

    2012-01-01

    the parallelization transformation by verifying the independence of the loop's memory references. To this end it represents array references using the USR (uniform set representation) language and expresses the independence condition as an equation, S={}, where S is a set expression representing array indexes. Using...... of their estimated complexities. We evaluate our automated solution on 26 benchmarks from PERFECT-CLUB and SPEC suites and show that our approach is effective in parallelizing large, complex loops and obtains much better full program speedups than the Intel and IBM Fortran compilers....
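
    A toy version of the independence equation S = {} (illustrative only; the USR language and the paper's inference engine are far more general): collect the array indexes written and read by each iteration and check that no iteration's writes collide with any other iteration's accesses.

    ```python
    # Toy version of the independence test S = {}: a loop parallelizes if no
    # iteration writes an array index that any other iteration reads or writes.
    # writes(i) and reads(i) return the sets of indexes accessed by iteration i.
    def independent(writes, reads, n_iters):
        for i in range(n_iters):
            others = set().union(*(writes(j) | reads(j)
                                   for j in range(n_iters) if j != i))
            if writes(i) & others:      # S nonempty: cross-iteration dependence
                return False
        return True                     # S = {} for every iteration

    # a[i] = a[i] + 1 : iteration i touches only index i, so the loop is parallel.
    print(independent(lambda i: {i}, lambda i: {i}, 8))       # True
    # a[i+1] = a[i]   : iteration i writes i+1 and reads i, a flow dependence.
    print(independent(lambda i: {i + 1}, lambda i: {i}, 8))   # False
    ```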

  1. The minimally tuned minimal supersymmetric standard model

    International Nuclear Information System (INIS)

    Essig, Rouven; Fortin, Jean-Francois

    2008-01-01

    The regions in the Minimal Supersymmetric Standard Model with the minimal amount of fine-tuning of electroweak symmetry breaking are presented for general messenger scale. No a priori relations among the soft supersymmetry breaking parameters are assumed and fine-tuning is minimized with respect to all the important parameters which affect electroweak symmetry breaking. The superpartner spectra in the minimally tuned region of parameter space are quite distinctive with large stop mixing at the low scale and negative squark soft masses at the high scale. The minimal amount of tuning increases enormously for a Higgs mass beyond roughly 120 GeV
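
    The abstract does not define its fine-tuning measure, but quantitative studies of this kind conventionally use the Barbieri-Giudice sensitivity measure (a standard definition, quoted here for orientation; the precise variant used in the paper may differ):

    ```latex
    \Delta \;=\; \max_{p}\,\left|\frac{\partial \ln M_Z^{2}}{\partial \ln p}\right|
    ```

    where p runs over the soft parameters affecting electroweak symmetry breaking, so that \Delta \approx 100 corresponds to tuning at the 1% level.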

  2. Application of genetic algorithms to tuning fuzzy control systems

    Science.gov (United States)

    Espy, Todd; Vombrack, Endre; Aldridge, Jack

    1993-01-01

    Real number genetic algorithms (GA) were applied for tuning fuzzy membership functions of three controller applications. The first application is our 'Fuzzy Pong' demonstration, a controller that controls a very responsive system. The performance of the automatically tuned membership functions exceeded that of manually tuned membership functions both when the algorithm started with randomly generated functions and with the best manually-tuned functions. The second GA tunes input membership functions to achieve a specified control surface. The third application is a practical one, a motor controller for a printed circuit manufacturing system. The GA alters the positions and overlaps of the membership functions to accomplish the tuning. The applications, the real number GA approach, the fitness function and population parameters, and the performance improvements achieved are discussed. Directions for further research in tuning input and output membership functions and in tuning fuzzy rules are described.
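
    A minimal real-number GA of the kind described might look as follows (a sketch with a placeholder fitness; a real application would score each individual by running the fuzzy controller):

    ```python
    # Real-number GA sketch for tuning membership-function parameters.
    # An individual is a flat list of breakpoints (e.g., left/center/right of
    # a triangular membership function); the fitness is a toy placeholder.
    import random

    def mutate(ind, sigma=0.1):
        return [g + random.gauss(0, sigma) for g in ind]

    def crossover(a, b):
        # Blend crossover, a common choice for real-coded GAs.
        return [random.uniform(x, y) for x, y in zip(a, b)]

    def ga(fitness, init, pop_size=20, generations=200):
        pop = [mutate(init, 0.5) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: pop_size // 2]          # truncation selection
            children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                        for _ in range(pop_size - len(parents))]
            pop = parents + children
        return max(pop, key=fitness)

    # Toy fitness: recover a known-good set of breakpoints.
    target = [0.0, 0.5, 1.0]
    best = ga(lambda ind: -sum((g - t) ** 2 for g, t in zip(ind, target)),
              init=[0.3, 0.3, 0.8])
    print([round(g, 2) for g in best])   # approaches [0.0, 0.5, 1.0]
    ```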

  3. Simplifying the parallelization of scientific codes by a function-centric approach in Python

    International Nuclear Information System (INIS)

    Nilsen, Jon K; Cai Xing; Langtangen, Hans Petter; Hoeyland, Bjoern

    2010-01-01

    The purpose of this paper is to show how existing scientific software can be parallelized using a separate thin layer of Python code where all parallelization-specific tasks are implemented. We provide specific examples of such a Python code layer, which can act as templates for parallelizing a wide set of serial scientific codes. The use of Python for parallelization is motivated by the fact that the language is well suited for reusing existing serial codes programmed in other languages. The extreme flexibility of Python with regard to handling functions makes it very easy to wrap up decomposed computational tasks of a serial scientific application as Python functions. Many parallelization-specific components can be implemented as generic Python functions, which may take as input those wrapped functions that perform concrete computational tasks. The overall programming effort needed by this parallelization approach is limited, and the resulting parallel Python scripts have a compact and clean structure. The usefulness of the parallelization approach is exemplified by three different classes of application in natural and social sciences.
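
    The template idea in miniature (hypothetical names; the paper's layer also handles communication of subdomain boundaries, which this sketch omits): the serial code's computational task is wrapped as a Python function, and a generic parallel layer maps it over decomposed pieces of the problem.

    ```python
    # Miniature function-centric template: all parallel logic lives in
    # run_parallel(); the wrapped serial task never changes.
    from multiprocessing import Pool

    def run_parallel(task, subdomains, nprocs=4):
        """Generic layer: map a wrapped serial task over decomposed pieces."""
        with Pool(nprocs) as pool:
            partials = pool.map(task, subdomains)
        return sum(partials)            # problem-specific reduction

    # Wrapped serial task: integrate x^2 over one subinterval (midpoint rule).
    def integrate_piece(bounds, n=100_000):
        a, b = bounds
        h = (b - a) / n
        return sum((a + (k + 0.5) * h) ** 2 for k in range(n)) * h

    if __name__ == "__main__":
        pieces = [(i / 4, (i + 1) / 4) for i in range(4)]   # decompose [0, 1]
        print(run_parallel(integrate_piece, pieces))        # ~1/3
    ```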

  4. Development of parallel 3D discrete ordinates transport program on JASMIN framework

    International Nuclear Information System (INIS)

    Cheng, T.; Wei, J.; Shen, H.; Zhong, B.; Deng, L.

    2015-01-01

    A parallel 3D discrete ordinates radiation transport code, JSNT-S, has been developed, aiming at simulating real-world radiation shielding and reactor physics applications in a reasonable time. Through a patch-based domain partition algorithm, the memory requirement is shared among processors, and a space-angle parallel sweeping algorithm has been developed based on a data-driven approach. Acceleration methods such as partial current rebalance are implemented. Correctness is verified with the VENUS-3 and other benchmark models. In the radiation shielding calculation of the Qinshan-II reactor pressure vessel model with 24.3 billion DoF, only 88 seconds are required, and an overall parallel efficiency of 44% is achieved on 1536 CPU cores. (author)

  5. Revisiting fine-tuning in the MSSM

    Energy Technology Data Exchange (ETDEWEB)

    Ross, Graham G. [Oxford Univ. (United Kingdom). Rudolf Peierls Centre for Theoretical Physics; Schmidt-Hoberg, Kai [Deutsches Elektronen-Synchrotron (DESY), Hamburg (Germany); Staub, Florian [Karlsruher Institut fuer Technologie (KIT), Karlsruhe (Germany). Inst. fuer Theoretische Physik; Karlsruher Institut fuer Technologie (KIT), Eggenstein-Leopoldshafen (Germany). Inst. fuer Experimentelle Kernphysik

    2017-03-15

    We evaluate the amount of fine-tuning in constrained versions of the minimal supersymmetric standard model (MSSM), with different boundary conditions at the GUT scale. Specifically we study the fully constrained version as well as the cases of non-universal Higgs and gaugino masses. We allow for the presence of additional non-holomorphic soft-terms which we show further relax the fine-tuning. Of particular importance is the possibility of a Higgsino mass term and we discuss possible origins for such a term in UV complete models. We point out that loop corrections typically lead to a reduction in the fine-tuning by a factor of about two compared to the estimate at tree-level, which has been overlooked in many recent works. Taking these loop corrections into account, we discuss the impact of current limits from SUSY searches and dark matter on the fine-tuning. Contrary to common lore, we find that the MSSM fine-tuning can be as small as 10 while remaining consistent with all experimental constraints. If, in addition, the dark matter abundance is fully explained by the neutralino LSP, the fine-tuning can still be as low as ∼20 in the presence of additional non-holomorphic soft-terms. We also discuss future prospects of these models and find that the MSSM will remain natural even in the case of a non-discovery in the foreseeable future.

  6. Revisiting fine-tuning in the MSSM

    Energy Technology Data Exchange (ETDEWEB)

    Ross, Graham G. [Rudolf Peierls Centre for Theoretical Physics, University of Oxford, 1 Keble Road, Oxford OX1 3NP (United Kingdom); Schmidt-Hoberg, Kai [DESY, Notkestraße 85, D-22607 Hamburg (Germany); Staub, Florian [Institute for Theoretical Physics (ITP), Karlsruhe Institute of Technology, Engesserstraße 7, D-76128 Karlsruhe (Germany); Institute for Nuclear Physics (IKP), Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen (Germany)

    2017-03-06

    We evaluate the amount of fine-tuning in constrained versions of the minimal supersymmetric standard model (MSSM), with different boundary conditions at the GUT scale. Specifically we study the fully constrained version as well as the cases of non-universal Higgs and gaugino masses. We allow for the presence of additional non-holomorphic soft-terms which we show further relax the fine-tuning. Of particular importance is the possibility of a Higgsino mass term and we discuss possible origins for such a term in UV complete models. We point out that loop corrections typically lead to a reduction in the fine-tuning by a factor of about two compared to the estimate at tree-level, which has been overlooked in many recent works. Taking these loop corrections into account, we discuss the impact of current limits from SUSY searches and dark matter on the fine-tuning. Contrary to common lore, we find that the MSSM fine-tuning can be as small as 10 while remaining consistent with all experimental constraints. If, in addition, the dark matter abundance is fully explained by the neutralino LSP, the fine-tuning can still be as low as ∼20 in the presence of additional non-holomorphic soft-terms. We also discuss future prospects of these models and find that the MSSM will remain natural even in the case of a non-discovery in the foreseeable future.

  7. A mechanism for tuning 5 GHz HTS filters

    Energy Technology Data Exchange (ETDEWEB)

    Ohsaka, M.; Takeuchi, S.; Ono, S.; Lee, J.H.; Saito, A. [Department of Electrical Engineering, Yamagata University, 4-3-16 Johnan, Yonezawa 992-8510 (Japan); Akasegawa, A.; Yamanaka, K.; Kurihara, K. [Fujitsu LTD., 10-1 Wakamiya, Morinosato, Atsugi, Kanagawa 243-0197 (Japan); Ohshima, S. [Department of Electrical Engineering, Yamagata University, 4-3-16 Johnan, Yonezawa 992-8510 (Japan)], E-mail: ohshima@yz.yamagata-u.ac.jp

    2008-09-15

    We developed a tuning mechanism for HTS filters with a dielectric tuning plate, dielectric trimming rods, and conducting trimming rods. The tuning plate has windows through which the dielectric and conducting trimming rods pass. The tuning plate was designed for a 3-pole filter with 5 GHz center frequency (f{sub c}) and 100 MHz bandwidth (BW) using a 3-dimensional electromagnetic simulator. We were able to shift the f{sub c} by up to 500 MHz using the tuning plate with a dielectric constant of 45. However, the insertion loss (IL) and the pass-band ripple of the filter became more severe and the BW of the filter was narrower after tuning. We attempted to improve the filter properties after tuning using the dielectric and conducting trimming rods. We decreased the IL and the pass-band ripple by adjusting the height of the dielectric trimming rods above the resonators. Also, the BW was improved by using copper (Cu) trimming rods above the spaces between the resonators. The tuning plate and the trimming rods did not affect the IL. Thus, we simulated 500 MHz of tuning without deterioration of the filter properties at f{sub c} = 5 GHz. Experimentally, we verified that the f{sub c} could be shifted by 340 MHz using the dielectric plate, that the pass-band ripple could be decreased by ripple trimming using the dielectric rods, and that the BW could be increased by 31 MHz by BW trimming using the Cu rods.

  8. Leveraging Parallel Data Processing Frameworks with Verified Lifting

    Directory of Open Access Journals (Sweden)

    Maaz Bin Safeer Ahmad

    2016-11-01

    Full Text Available Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting, which is tedious and error-prone, also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and also scale better to larger datasets.
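
    The essence of the lifting step, shown language-neutrally in Python rather than through Casper's Java-to-Hadoop toolchain: a sequential fold is summarized as a map followed by an associative reduce, which is the form a MapReduce framework can parallelize.

    ```python
    # The kind of rewrite verified lifting automates, shown in Python for
    # illustration (Casper itself translates Java to Hadoop MapReduce).
    from functools import reduce

    data = [3, 1, 4, 1, 5, 9, 2, 6]

    # Sequential fragment: a fold over the input.
    total = 0
    for x in data:
        total += x * x

    # Lifted summary: the same computation as map + reduce, parallelizable
    # by a framework like Hadoop because + is associative.
    lifted = reduce(lambda a, b: a + b, map(lambda x: x * x, data), 0)

    assert total == lifted
    ```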

  9. Summary of ATLAS Pythia 8 tunes

    CERN Document Server

    The ATLAS collaboration

    2012-01-01

    We summarize the latest ATLAS Pythia 8 minimum bias and underlying event tunes. The Pythia 8 MPI tunes in this note have been constructed for nine different PDFs, making use of a new x-dependent hadronic matter distribution model.

  10. Bayer image parallel decoding based on GPU

    Science.gov (United States)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In photoelectrical tracking systems, Bayer images are traditionally decoded with a CPU-based method. However, this is too slow when the images become large, for example 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA Graphics Processing Units (GPUs) supporting the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallel part, and the last is a data-parallel part including inverse quantization, the inverse discrete wavelet transform (IDWT) and image post-processing. To reduce the execution time, the task-parallel part is optimized with OpenMP techniques. The data-parallel part gains its efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, the IDWT is significantly sped up by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallel part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.

  11. A method of paralleling computer calculation for two-dimensional kinetic plasma model

    International Nuclear Information System (INIS)

    Brazhnik, V.A.; Demchenko, V.V.; Dem'yanov, V.G.; D'yakov, V.E.; Ol'shanskij, V.V.; Panchenko, V.I.

    1987-01-01

    A method for parallel computer calculation, and the OSIRIS program complex that implements it, designed for numerical plasma simulation by the macroparticle method, are described. The calculation can be carried out either on one BESM-6 computer or on two simultaneously, which is made possible by a package of interacting programs functioning in each computer. Program interaction in each computer is based on the event techniques implemented in OS DISPAK. Parallel calculation with two BESM-6 computers speeds up the computation by a factor of 1.5

  12. Vectorization, parallelization and porting of nuclear codes on the VPP500 system (parallelization). Progress report fiscal 1996

    Energy Technology Data Exchange (ETDEWEB)

    Watanabe, Hideo; Kawai, Wataru; Nemoto, Toshiyuki [Fujitsu Ltd., Tokyo (Japan); and others

    1997-12-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. The results are reported in three parts: the vectorization part, the parallelization part and the porting part. This report describes the parallelization. In the parallelization part, the parallelization of the 2-dimensional relativistic electromagnetic particle code EM2D, the cylindrical direct numerical simulation code CYLDNS and the molecular dynamics code DGR, for simulating radiation damage in diamond crystals, are described. In the vectorization part, the vectorization of the two- and three-dimensional discrete ordinates simulation code DORT-TORT, the gas dynamics analysis code FLOWGR and the relativistic Boltzmann-Uehling-Uhlenbeck simulation code RBUU are described. In the porting part, the porting of the reactor safety analysis codes RELAP5/MOD3.2 and RELAP5/MOD3.2.1.2, the nuclear data processing system NJOY and the 2-D multigroup discrete ordinates transport code TWOTRAN-II are described. A survey for the porting of the command-driven interactive data analysis plotting program IPLOT is also described. (author)

  13. Parallel computational in nuclear group constant calculation

    International Nuclear Information System (INIS)

    Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal

    2002-01-01

    In this paper, a parallel computational method for nuclear group constant calculation using the collision probability method is discussed. The main focus is the calculation of the collision matrix, which requires a large amount of computational time. The geometry treated here is a set of concentric cylinders. The calculation of the collision probability matrix is carried out semi-analytically using the Beckley-Naylor function. To accelerate the computation, several computers are used in parallel. Under LINUX, the parallelization uses PVM with C or Fortran, while under Windows it uses socket programming with Delphi or C++ Builder. The calculation results show the importance of assigning an optimal weight to each processor when processors of different speeds are combined

  14. GRADSPMHD: A parallel MHD code based on the SPH formalism

    Science.gov (United States)

    Vanaverbeke, S.; Keppens, R.; Poedts, S.

    2014-03-01

    We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B = 0 constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI. RAM: ˜30 MB for a

  15. A proposed experimental search for chameleons using asymmetric parallel plates

    International Nuclear Information System (INIS)

    Burrage, Clare; Copeland, Edmund J.; Stevenson, James A.

    2016-01-01

    Light scalar fields coupled to matter are a common consequence of theories of dark energy and attempts to solve the cosmological constant problem. The chameleon screening mechanism is commonly invoked in order to suppress the fifth forces mediated by these scalars, sufficiently to avoid current experimental constraints, without fine tuning. The force is suppressed dynamically by allowing the mass of the scalar to vary with the local density. Recently it has been shown that near future cold atoms experiments using atom-interferometry have the ability to access a large proportion of the chameleon parameter space. In this work we demonstrate how experiments utilising asymmetric parallel plates can push deeper into the remaining parameter space available to the chameleon.

  16. A proposed experimental search for chameleons using asymmetric parallel plates

    Energy Technology Data Exchange (ETDEWEB)

    Burrage, Clare; Copeland, Edmund J.; Stevenson, James A., E-mail: Clare.Burrage@nottingham.ac.uk, E-mail: ed.copeland@nottingham.ac.uk, E-mail: james.stevenson@nottingham.ac.uk [School of Physics and Astronomy, University of Nottingham, Nottingham, NG7 2RD (United Kingdom)

    2016-08-01

    Light scalar fields coupled to matter are a common consequence of theories of dark energy and attempts to solve the cosmological constant problem. The chameleon screening mechanism is commonly invoked in order to suppress the fifth forces mediated by these scalars, sufficiently to avoid current experimental constraints, without fine tuning. The force is suppressed dynamically by allowing the mass of the scalar to vary with the local density. Recently it has been shown that near future cold atoms experiments using atom-interferometry have the ability to access a large proportion of the chameleon parameter space. In this work we demonstrate how experiments utilising asymmetric parallel plates can push deeper into the remaining parameter space available to the chameleon.

  17. Programming and Tuning a Quantum Annealing Device to Solve Real World Problems

    Science.gov (United States)

    Perdomo-Ortiz, Alejandro; O'Gorman, Bryan; Fluegemann, Joseph; Smelyanskiy, Vadim

    2015-03-01

    Solving real-world applications with quantum algorithms requires overcoming several challenges, ranging from translating the computational problem at hand to the quantum-machine language to tuning parameters of the quantum algorithm that have a significant impact on the performance of the device. In this talk, we discuss these challenges, strategies developed to enhance performance, and also a more efficient implementation of several applications. Although we will focus on applications of interest to NASA's Quantum Artificial Intelligence Laboratory, the methods and concepts presented here apply to a broader family of hard discrete optimization problems, including those that occur in many machine-learning algorithms.

  18. Frequency Tuning of Vibration Absorber Using Topology Optimization

    Science.gov (United States)

    Harel, Swapnil Subhash

    A tuned mass absorber is a system for reducing the amplitude of oscillation in one oscillator by coupling it to a second oscillator. If tuned correctly, the maximum amplitude of the first oscillator in response to a periodic driver will be lowered, and much of the vibration will be 'transferred' to the second oscillator. The tuned vibration absorber (TVA) has been utilized for vibration control purposes in many sectors of civil, automotive and aerospace engineering for many decades since its inception. Time and again we come across situations in which a vibratory system is required to run near resonance. In the past, approaches have been made to design such auxiliary spring-mass tuned absorbers for the safety of structures. This research focuses on the development and optimization of continuously tuned mass absorbers as a substitute for discretely tuned mass absorbers (spring-mass systems). After studying the structural behavior, the boundary conditions and the frequency to which the absorber is to be tuned are determined. The modal analysis approach is used to determine mode shapes and frequencies. The absorber is designed and optimized using a topology optimization tool, which simultaneously designs, optimizes and tunes the absorber to the desired frequency. The tuned, optimized absorber, after post-processing, is attached to the target structure. The number of absorbers is increased to widen the bandwidth and thereby improve the safety of the structure over a wide frequency range. The frequency response analysis is carried out using various combinations of structures and numbers of absorber cells.
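
    For comparison with the discrete absorbers this work replaces, the classical Den Hartog rules (standard textbook results, not taken from this thesis) fix a discrete absorber's frequency ratio and damping from the absorber-to-structure mass ratio \mu = m_a/m:

    ```latex
    f_{\mathrm{opt}} \;=\; \frac{\omega_a}{\omega_n} \;=\; \frac{1}{1+\mu},
    \qquad
    \zeta_{\mathrm{opt}} \;=\; \sqrt{\frac{3\mu}{8\,(1+\mu)^{3}}}
    ```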

  19. A high-quality narrow passband filter for elastic SV waves via aligned parallel separated thin polymethylmethacrylate plates

    OpenAIRE

    Jun Zhang; Yaolu Liu; Wensheng Yan; Ning Hu

    2017-01-01

    We designed a high-quality filter that consists of aligned parallel polymethylmethacrylate (PMMA) thin plates with small gaps for elastic SV waves propagating in metals. Both the theoretical model and the full numerical simulation show that the transmission spectrum of the elastic SV waves through such a filter has several sharp peaks with flawless transmission within the investigated frequencies. These peaks can be readily tuned by manipulating the geometry parameters of the PMMA plates. Our invest...

  20. Parallel Ada benchmarks for the SVMS

    Science.gov (United States)

    Collard, Philippe E.

    1990-01-01

    The use of the parallel processing paradigm to design and develop faster and more reliable computers appears clearly to mark the future of information processing. NASA started the development of such an architecture: the Spaceborne VHSIC Multi-processor System (SVMS). Ada will be one of the languages used to program the SVMS. One of the unique characteristics of Ada is that it supports parallel processing at the language level through its tasking constructs. It is important for the SVMS project team to assess how efficiently the SVMS architecture will be implemented, as well as how efficiently the Ada environment will be ported to the SVMS. AUTOCLASS II, a Bayesian classifier written in Common Lisp, was selected as one of the benchmarks for SVMS configurations. The purpose of the R and D effort was to provide the SVMS project team with a version of AUTOCLASS II, written in Ada, that would make use of Ada tasking constructs as much as possible so as to constitute a suitable benchmark. Additionally, a set of programs was developed to measure Ada tasking efficiency on parallel architectures and to determine the critical parameters influencing tasking efficiency. All this was designed to provide the SVMS project team with a set of suitable tools for the development of the SVMS architecture.

  1. From functional programming to multicore parallelism: A case study based on Presburger Arithmetic

    DEFF Research Database (Denmark)

    Dung, Phan Anh; Hansen, Michael Reichhardt

    2011-01-01

    , we are interested in using PA in connection with the Duration Calculus Model Checker (DCMC) [5]. There are effective decision procedures for PA including Cooper’s algorithm and the Omega Test; however, their complexity is extremely high with doubly exponential lower bound and triply exponential upper bound [7]. We investigate these decision procedures in the context of multicore parallelism with the hope of exploiting multicore powers. Unfortunately, we are not aware of any prior parallelism research related to decision procedures for PA. The closest work is the preliminary results on parallelism

  2. Efficient tuning in supervised machine learning

    NARCIS (Netherlands)

    Koch, Patrick

    2013-01-01

    The tuning of learning algorithm parameters has become more and more important during the last years. With the fast growth of computational power and available memory, databases have grown dramatically. This is very challenging for the tuning of parameters arising in machine learning, since the

  3. Introduction to massively-parallel computing in high-energy physics

    CERN Document Server

    AUTHOR|(CDS)2083520

    1993-01-01

    Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semi-conductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has led all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...

  4. A tuning method for nonuniform traveling-wave accelerating structures

    International Nuclear Information System (INIS)

    Gong Cunkui; Zheng Shuxin; Shao Jiahang; Jia Xiaoyu; Chen Huaibi

    2013-01-01

    The tuning method for uniform traveling-wave structures based on non-resonant perturbation field-distribution measurement has been widely used to tune both constant-impedance and constant-gradient structures. In this paper, a method for tuning nonuniform structures is proposed on the basis of the above theory. The internal reflection coefficient of each cell is obtained by analyzing the normalized voltage distribution. A numerical simulation of the tuning process based on coupled-cavity-chain theory has been carried out, and the result shows that each cell has the correct phase advance after tuning. The method will be used in the tuning of a disk-loaded traveling-wave structure being developed at the Accelerator Laboratory, Tsinghua University. (authors)

  5. An optimal tuning strategy for tidal turbines

    Science.gov (United States)

    2016-01-01

    Tuning wind and tidal turbines is critical to maximizing their power output. Adopting a wind turbine tuning strategy of maximizing the output at any given time is shown to be an extremely poor strategy for large arrays of tidal turbines in channels. This ‘impatient-tuning strategy’ results in far lower power output, much higher structural loads and greater environmental impacts due to flow reduction than an existing ‘patient-tuning strategy’ which maximizes the power output averaged over the tidal cycle. This paper presents a ‘smart patient tuning strategy’, which can increase array output by up to 35% over the existing strategy. This smart strategy forgoes some power generation early in the half tidal cycle in order to allow stronger flows to develop later in the cycle. It extracts enough power from these stronger flows to produce more power from the cycle as a whole than the existing strategy. Surprisingly, the smart strategy can often extract more power without increasing maximum structural loads on the turbines, while also maintaining stronger flows along the channel. This paper also shows that, counterintuitively, for some tuning strategies imposing a cap on turbine power output to limit loads can increase a turbine’s average power output. PMID:27956870

  6. An optimal tuning strategy for tidal turbines.

    Science.gov (United States)

    Vennell, Ross

    2016-11-01

    Tuning wind and tidal turbines is critical to maximizing their power output. Adopting a wind turbine tuning strategy of maximizing the output at any given time is shown to be an extremely poor strategy for large arrays of tidal turbines in channels. This 'impatient-tuning strategy' results in far lower power output, much higher structural loads and greater environmental impacts due to flow reduction than an existing 'patient-tuning strategy' which maximizes the power output averaged over the tidal cycle. This paper presents a 'smart patient tuning strategy', which can increase array output by up to 35% over the existing strategy. This smart strategy forgoes some power generation early in the half tidal cycle in order to allow stronger flows to develop later in the cycle. It extracts enough power from these stronger flows to produce more power from the cycle as a whole than the existing strategy. Surprisingly, the smart strategy can often extract more power without increasing maximum structural loads on the turbines, while also maintaining stronger flows along the channel. This paper also shows that, counterintuitively, for some tuning strategies imposing a cap on turbine power output to limit loads can increase a turbine's average power output.

  7. A layered semantics for a parallel object-oriented language

    NARCIS (Netherlands)

    P.H.M. America (Pierre); J.J.M.M. Rutten (Jan)

    1990-01-01

    We develop a denotational semantics for POOL, a parallel object-oriented programming language. The main contribution of this semantics is an accurate mathematical model of the most important concept in object-oriented programming: the object. This is achieved by structuring the semantics

  8. Iterative Feedback Tuning in district heating systems; Iterative Feedback Tuning i vaermeproduktionsanlaeggningar

    Energy Technology Data Exchange (ETDEWEB)

    Raaberg, Martin; Velut, Stephane; Bari, Siavosh Amanat

    2010-10-15

    The project goal is to evaluate and describe how Iterative Feedback Tuning (IFT) can be used to tune controllers in the typical control loops of heat and power plants. Only a few practical studies of IFT have been carried out, and they are not directly relevant to power and heat processes. The focus of this project is the practical problems of implementing IFT and the results of the tuning. The project starts with theoretical studies of the IFT method, followed by implementation and simple simulations in Scilab. The IFT equations are then implemented in Freelance 2000, an ABB control system, for practical tests on a SISO and a MIMO process. By performing reproducible experiments on the process and analyzing the results, IFT adjusts the controller parameters to minimize a cost function that represents the control goal. For the SISO experiments, a pressure controller in an oil transport system was selected. The pressure in the oil transport system is regulated by controlling the position of a control valve on the return line to the supply tank. A disturbance in oil pressure can be introduced by changing the position of a valve that lets oil through to the day tank. The selected MIMO process is a pre-heater in a degassing process. In this process, a valve on the secondary side is used to control the flow in the secondary system. A valve on the primary side is used to control the district heating water flow through the heat exchanger, which controls the temperature on the secondary side. An increased secondary flow increases the heat demand and thus requires an increase in primary flow to maintain the secondary-side outlet temperature. This cross-coupling is why it is advantageous to treat the process as multivariable. Using the IFT method, the two original PID controllers and a feed-forward controller are tuned simultaneously. The IFT method was difficult to implement but worked well both in simulations and on the real processes.
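
    For reference, the standard IFT iteration (in the form given by Hjalmarsson et al.; the report's exact cost function may differ) updates the controller parameter vector \rho by a gradient step whose gradient estimate is obtained from the reproducible experiments described above:

    ```latex
    \rho_{i+1} \;=\; \rho_i \;-\; \gamma_i\, R_i^{-1}\, \widehat{\nabla J}(\rho_i),
    \qquad
    J(\rho) \;=\; \frac{1}{2N} \sum_{t=1}^{N} E\!\left[\tilde{y}_t(\rho)^{2} + \lambda\, u_t(\rho)^{2}\right]
    ```

    where \tilde{y}_t is the deviation of the achieved output from the desired response, u_t the control signal, \gamma_i a step size, and R_i a positive-definite matrix such as a Gauss-Newton approximation of the Hessian.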

  9. Small Commercial Building Re-tuning: A Primer

    Energy Technology Data Exchange (ETDEWEB)

    Cort, Katherine A.; Hostick, Donna J.; Underhill, Ronald M.; Fernandez, Nicholas; Katipamula, Srinivas

    2013-09-30

    To help building owners and managers address issues related to energy-efficient operation of small buildings, DOE has developed a Small Building Re-tuning training curriculum. This "primer" provides additional background information to understand some of the concepts presented in the Small Building Re-tuning training. The intent is that those who are less familiar with the building energy concepts will review this material before taking the building re-tuning training class.

  10. A comparison of high-order explicit Runge–Kutta, extrapolation, and deferred correction methods in serial and parallel

    KAUST Repository

    Ketcheson, David I.

    2014-06-13

    We compare the three main types of high-order one-step initial value solvers: extrapolation, spectral deferred correction, and embedded Runge–Kutta pairs. We consider orders four through twelve, including both serial and parallel implementations. We cast extrapolation and deferred correction methods as fixed-order Runge–Kutta methods, providing a natural framework for the comparison. The stability and accuracy properties of the methods are analyzed by theoretical measures, and these are compared with the results of numerical tests. In serial, the eighth-order pair of Prince and Dormand (DOP8) is most efficient. But other high-order methods can be more efficient than DOP8 when implemented in parallel. This is demonstrated by comparing a parallelized version of the well-known ODEX code with the (serial) DOP853 code. For an N-body problem with N = 400, the experimental extrapolation code is as fast as the tuned Runge–Kutta pair at loose tolerances, and is up to two times as fast at tight tolerances.
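
    As a usage note (this is SciPy's implementation of the method, not the authors' benchmark code), the DOP853 pair referenced above is available through scipy.integrate.solve_ivp:

    ```python
    # Illustrative use of the Dormand-Prince 8(5,3) pair that serves as the
    # serial baseline above. Problem: harmonic oscillator y'' = -y.
    import numpy as np
    from scipy.integrate import solve_ivp

    def rhs(t, y):
        return [y[1], -y[0]]

    sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0],
                    method="DOP853", rtol=1e-10, atol=1e-12)
    print(sol.y[0, -1], np.cos(10.0))   # should agree to ~1e-10
    ```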

  11. Parallelization of a numerical simulation code for isotropic turbulence

    International Nuclear Information System (INIS)

    Sato, Shigeru; Yokokawa, Mitsuo; Watanabe, Tadashi; Kaburaki, Hideo.

    1996-03-01

    A parallel pseudospectral code which solves the three-dimensional Navier-Stokes equation by direct numerical simulation is developed, and its execution time, parallelization efficiency, load balance and scalability are evaluated. A vector parallel supercomputer, the Fujitsu VPP500 with up to 16 processors, is used for this calculation for Fourier modes up to 256x256x256 using 16 processors. Good scalability with the number of processors is achieved when the number of Fourier modes is fixed. For small numbers of Fourier modes, the calculation time of the program is proportional to N log N, which is the ideal complexity of a 3D-FFT on vector parallel processors. It is found that the calculation performance decreases as the number of Fourier modes increases. (author)

  12. Visual analysis of inter-process communication for large-scale parallel computing.

    Science.gov (United States)

    Muelder, Chris; Gygi, Francois; Ma, Kwan-Liu

    2009-01-01

    In serial computation, program profiling is often helpful for optimization of key sections of code. When moving to parallel computation, not only does the code execution need to be considered but also communication between the different processes which can induce delays that are detrimental to performance. As the number of processes increases, so does the impact of the communication delays on performance. For large-scale parallel applications, it is critical to understand how the communication impacts performance in order to make the code more efficient. There are several tools available for visualizing program execution and communications on parallel systems. These tools generally provide either views which statistically summarize the entire program execution or process-centric views. However, process-centric visualizations do not scale well as the number of processes gets very large. In particular, the most common representation of parallel processes is a Gantt chart with a row for each process. As the number of processes increases, these charts can become difficult to work with and can even exceed screen resolution. We propose a new visualization approach that affords more scalability and then demonstrate it on systems running with up to 16,384 processes.

  13. Oracle SQL Tuning pocket Reference

    CERN Document Server

    Gurry, Mark

    2002-01-01

    One of the most important challenges faced by Oracle database administrators and Oracle developers is the need to tune SQL statements so that they execute efficiently. Poorly tuned SQL statements are one of the leading causes of substandard database performance and poor response time. SQL statements that perform poorly result in frustration for users, and can even prevent a company from serving its customers in a timely manner

  14. Final module tuning of the 805 MHz side-coupled cavities for the Fermilab linac group

    International Nuclear Information System (INIS)

    Qian, Z.; Champion, M.; Miller, H.W.; Moretti, A.; Padilla, R.

    1992-01-01

    As part of the Fermilab Tevatron collider upgrade program, the last four linac drift-tube tanks are to be replaced with seven side-coupled cavity modules that will operate at an accelerating gradient of 8 MV/m. Each module is composed of four accelerating sections connected by three bridge couplers and is driven by a 12 MW, 805 MHz klystron rf power supply. Sixteen accelerating cells and fifteen coupling cells are brazed into an accelerating section. The modules were tuned such that the π/2 mode of each section and the TM 010 mode of the individual bridge coupler agreed within 2 kHz of the module accelerating mode, the accelerating cell frequency was tuned within ± % kHz, and the section stopbands were 50-100 kHz under vacuum. The main cell rms field deviation was in general <1% within any section and the section average rms field deviation was in all but one case <1%. The phase shift from section to section was tuned to <1 degree. The coupling between waveguide and cavity was tuned to match the 30 mA beam loading. 3 tabs., 4 figs., 6 refs

  15. A scalable implementation of RI-SCF on parallel computers

    International Nuclear Information System (INIS)

    Fruechtl, H.A.; Kendall, R.A.; Harrison, R.J.

    1996-01-01

    In order to avoid the integral bottleneck of conventional SCF calculations, the Resolution of the Identity (RI) method is used to obtain an approximate solution to the Hartree-Fock equations. In this approximation only three-center integrals are needed to build the Fock matrix. It has been implemented as part of the NWChem package of portable and scalable ab initio programs for parallel computers. Utilizing the V-approximation, both the Coulomb and exchange contributions to the Fock matrix can be calculated from a transformed set of three-center integrals which have to be precalculated and stored. A distributed in-core method as well as a disk based implementation have been programmed. Details of the implementation as well as the parallel programming tools used are described. We also give results and timings from benchmark calculations

  16. Stress-tuned conductor-polymer composite for use in sensors

    Science.gov (United States)

    Martin, James E; Read, Douglas H

    2013-10-22

    A method for making a composite polymeric material with electrical conductivity determined by stress-tuning of the conductor-polymer composite, and sensors made with the stress-tuned conductor-polymer composite made by this method. Stress tuning is achieved by mixing a miscible liquid into the polymer precursor solution or by absorbing into the precursor solution a soluble compound from vapor in contact with the polymer precursor solution. The conductor may or may not be ordered by application of a magnetic field. The composite is formed by polymerization with the stress-tuning agent in the polymer matrix. The stress-tuning agent is removed following polymerization to produce a conductor-polymer composite with a stress field that depends on the amount of stress-tuning agent employed.

  17. Parallel Nonlinear Optimization for Astrodynamic Navigation, Phase I

    Data.gov (United States)

    National Aeronautics and Space Administration — CU Aerospace proposes the development of a new parallel nonlinear program (NLP) solver software package. NLPs allow the solution of complex optimization problems,...

  18. New ATLAS event generator tunes to 2010 data

    CERN Document Server

    The ATLAS collaboration

    2011-01-01

    This note describes the Monte Carlo event generator tunings for the Pythia 6 and Herwig/Jimmy generators in the ATLAS MC11 simulation production. New tunes have been produced for these generators, making maximal use of available published data from ATLAS and from the Tevatron and LEP experiments. Particular emphasis has been placed on improvement of the description of e+ e− event shape and jet rate data, and on description of hadron collider event shape observables in Pythia, as well as the established procedure of tuning the multiple parton interactions of both models to describe underlying event and minimum bias data. The tuning of Pythia is provided at this time for the MRST LO∗∗ PDF, while the purely MPI tune of Herwig/Jimmy is performed for ten different PDFs.

  19. The new landscape of parallel computer architecture

    International Nuclear Information System (INIS)

    Shalf, John

    2007-01-01

    The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models

  20. The new landscape of parallel computer architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)

    2007-07-15

    The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  1. GASPRNG: GPU accelerated scalable parallel random number generator library

    Science.gov (United States)

    Gao, Shuang; Peterson, Gregory D.

    2013-04-01

    Graphics processors represent a promising technology for accelerating computational science applications. Many computational science applications require fast and scalable random number generation with good statistical properties, so they use the Scalable Parallel Random Number Generators library (SPRNG). We present the GPU Accelerated SPRNG library (GASPRNG) to accelerate SPRNG in GPU-based high performance computing systems. GASPRNG includes code for a host CPU and CUDA code for execution on NVIDIA graphics processing units (GPUs) along with a programming interface to support various usage models for pseudorandom numbers and computational science applications executing on the CPU, GPU, or both. This paper describes the implementation approach used to produce high performance and also describes how to use the programming interface. The programming interface allows a user to be able to use GASPRNG the same way as SPRNG on traditional serial or parallel computers as well as to develop tightly coupled programs executing primarily on the GPU. We also describe how to install GASPRNG and use it. To help illustrate linking with GASPRNG, various demonstration codes are included for the different usage models. GASPRNG on a single GPU shows up to 280x speedup over SPRNG on a single CPU core and is able to scale for larger systems in the same manner as SPRNG. Because GASPRNG generates identical streams of pseudorandom numbers as SPRNG, users can be confident about the quality of GASPRNG for scalable computational science applications. Catalogue identifier: AEOI_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOI_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: UTK license. No. of lines in distributed program, including test data, etc.: 167900 No. of bytes in distributed program, including test data, etc.: 1422058 Distribution format: tar.gz Programming language: C and CUDA. Computer: Any PC or

  2. Frequency tuning allows flow direction control in microfluidic networks with passive features.

    Science.gov (United States)

    Jain, Rahil; Lutz, Barry

    2017-05-02

    Frequency tuning has emerged as an attractive alternative to conventional pumping techniques in microfluidics. Oscillating (AC) flow driven through a passive valve can be rectified to create steady (DC) flow, and tuning the excitation frequency to the characteristic (resonance) frequency of the underlying microfluidic network allows control of flow magnitude using simple hardware, such as an on-chip piezo buzzer. In this paper, we report that frequency tuning can also be used to control the direction (forward or backward) of the rectified DC flow in a single device. Initially, we observed that certain devices provided DC flow in the "forward" direction expected from previous work with a similar valve geometry, and the maximum DC flow occurred at the same frequency as a prominent peak in the AC flow magnitude, as expected. However, devices of a slightly different geometry provided the DC flow in the opposite direction and at a frequency well below the peak AC flow. Using an equivalent electrical circuit model, we found that the "forward" DC flow occurred at the series resonance frequency (with large AC flow peak), while the "backward" DC flow occurred at a less obvious parallel resonance (a valley in AC flow magnitude). We also observed that the DC flow occurred only when there was a measurable differential in the AC flow magnitude across the valve, and the DC flow direction was from the channel with large AC flow magnitude to that with small AC flow magnitude. Using these observations and the AC flow predictions from the equivalent circuit model, we designed a device with an AC flowrate frequency profile that was expected to allow the DC flow in opposite directions at two distinct frequencies. The fabricated device showed the expected flow reversal at the expected frequencies. This approach expands the flow control toolkit to include both magnitude and direction control in frequency-tuned microfluidic pumps. The work also raises interesting questions about the
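
    The series-versus-parallel resonance behaviour invoked above is generic to lumped RLC models; a sketch with arbitrary component values (not the paper's fitted circuit) reproduces the impedance minimum at series resonance (the AC flow peak) and the impedance maximum at parallel resonance (the AC flow valley):

    ```python
    # Generic illustration of series vs. parallel resonance in a lumped RLC
    # model of the kind used for microfluidic networks (component values are
    # arbitrary, not fitted to the device in the paper).
    import numpy as np

    R, L, C = 10.0, 1e-3, 1e-6        # resistance / inertance / compliance analogues

    f = np.linspace(100.0, 20000.0, 200000)
    w = 2 * np.pi * f
    Z_series = R + 1j * w * L + 1 / (1j * w * C)            # series RLC branch
    Z_parallel = 1 / (1 / (R + 1j * w * L) + 1j * w * C)    # (R+L) parallel with C

    print("series resonance   ~%.0f Hz" % f[np.argmin(abs(Z_series))])    # flow peak
    print("parallel resonance ~%.0f Hz" % f[np.argmax(abs(Z_parallel))])  # flow valley
    ```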

  3. Spin tune dependence on closed orbit in RHIC

    International Nuclear Information System (INIS)

    Ptitsyn, V.; Bai, M.; Roser, T.

    2010-01-01

    Polarized proton beams are accelerated in RHIC to 250 GeV energy with the help of Siberian Snakes. The pair of Siberian Snakes in each RHIC ring holds the design spin tune at 1/2 to avoid polarization loss during acceleration. However, in the presence of closed orbit errors, the actual spin tune can be shifted away from the exact 1/2 value. It leads to a corresponding shift of locations of higher-order ('snake') resonances and limits the available betatron tune space. The largest closed orbit effect on the spin tune comes from the horizontal orbit angle between the two snakes. During RHIC Run in 2009 dedicated measurements with polarized proton beams were taken to verify the dependence of the spin tune on the local orbits at the Snakes. The experimental results are presented along with the comparison with analytical predictions.

  4. Bi-directional series-parallel elastic actuator and overlap of the actuation layers.

    Science.gov (United States)

    Furnémont, Raphaël; Mathijssen, Glenn; Verstraten, Tom; Lefeber, Dirk; Vanderborght, Bram

    2016-01-27

    Several robotics applications require high torque-to-weight ratios and energy-efficient actuators. Progress in that direction was made by introducing compliant elements into the actuation. A large variety of actuators were developed, such as series elastic actuators (SEAs), variable stiffness actuators and parallel elastic actuators (PEAs). SEAs can reduce the peak power, while PEAs can reduce the torque requirement on the motor. Nonetheless, these actuators still cannot achieve performance close to that of human muscle. To combine both advantages, the series parallel elastic actuator (SPEA) was developed. The principle is inspired by biological muscles. Muscles are composed of motor units, placed in parallel, which are variably recruited as the required effort increases. This biological principle is exploited in the SPEA, where springs (layers), placed in parallel, can be recruited one by one. This recruitment is performed by an intermittent mechanism. This paper presents the development of a SPEA using the MACCEPA principle with a self-closing mechanism. This actuator can deliver a bi-directional output torque, variable stiffness and reduced friction. The load on the motor can also be reduced, leading to a lower power consumption. The variable recruitment of the parallel springs can also be tuned in order to further decrease the consumption of the actuator for a given task. First, an explanation of the concept and a brief description of prior work will be given. Next, the design and the model of one of the layers will be presented. The working principle of the full actuator will then be given. At the end of this paper, experiments showing the electric consumption of the actuator demonstrate the advantage of the SPEA over an equivalent stiff actuator.

  5. Visual Neurons in the Superior Colliculus Innervated by Islet2+ or Islet2− Retinal Ganglion Cells Display Distinct Tuning Properties

    Directory of Open Access Journals (Sweden)

    Rachel B. Kay

    2017-10-01

    Throughout the visual system, different subtypes of neurons are tuned to distinct aspects of the visual scene, establishing parallel circuits. Defining the mechanisms by which such tuning arises has been a long-standing challenge for neuroscience. To investigate this, we have focused on the retina’s projection to the superior colliculus (SC), where multiple visual neuron subtypes have been described. The SC receives inputs from a variety of retinal ganglion cell (RGC) subtypes; however, which RGCs drive the tuning of different SC neurons remains unclear. Here, we pursued a genetic approach that allowed us to determine the tuning properties of neurons innervated by molecularly defined subpopulations of RGCs. In homozygous Islet2-EphA3 knock-in (Isl2EA3/EA3) mice, Isl2+ and Isl2− RGCs project to non-overlapping sub-regions of the SC. Based on molecular and anatomic data, we show that significantly more Isl2− RGCs are direction-selective (DS) in comparison with Isl2+ RGCs. Targeted recordings of visual responses from each SC sub-region in Isl2EA3/EA3 mice revealed that Isl2− RGC-innervated neurons were significantly more DS than those innervated by Isl2+ RGCs. Axis-selective (AS) neurons were found in both sub-regions, though AS neurons innervated by Isl2+ RGCs were more tightly tuned. Despite this segregation, DS and AS neurons innervated by Isl2+ or Isl2− RGCs did not differ in their spatial summation or spatial frequency (SF) tuning. Further, we did not observe alterations in receptive field (RF) size or structure of SC neurons innervated by Isl2+ or Isl2− RGCs. Together, these data show that innervation by Isl2+ and Isl2− RGCs results in distinct tuning in the SC and set the stage for future studies investigating the mechanisms by which these circuits are built.

  6. Integrated unaligned resonant modulator tuning

    Energy Technology Data Exchange (ETDEWEB)

    Zortman, William A.; Lentine, Anthony L.

    2017-10-03

    Methods and systems for tuning a resonant modulator are disclosed. One method includes receiving a carrier signal modulated by the resonant modulator with a stream of data having an approximately equal number of high and low bits, determining an average power of the modulated carrier signal, comparing the average power to a predetermined threshold, and operating a tuning device coupled to the resonant modulator based on the comparison of the average power and the predetermined threshold. One system includes an input structure, a plurality of processing elements, and a digital control element. The input structure is configured to receive, from the resonant modulator, a modulated carrier signal. The plurality of processing elements are configured to determine an average power of the modulated carrier signal. The digital control element is configured to operate a tuning device coupled to the resonant modulator based on the average power of the modulated carrier signal.
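
    The first method lends itself to a compact control-loop sketch. Everything below is illustrative: the detected power levels, the threshold, and the sign convention of the heater update are assumptions, not values from the record.

      import random

      def average_power(samples):
          return sum(samples) / len(samples)

      def tune_step(avg_power, threshold, heater_setting, step=0.01):
          """One control update: nudge the tuning device (e.g. a micro-heater)
          depending on whether the average modulated power sits above or below
          the threshold. The sign convention is an assumption of this sketch."""
          if avg_power < threshold:
              return heater_setting + step
          return heater_setting - step

      # Toy data: random bits give roughly equal numbers of highs and lows,
      # so the average power tracks the modulator's bias point.
      bits = [random.randint(0, 1) for _ in range(1000)]
      samples = [0.9 if b else 0.1 for b in bits]    # hypothetical power levels
      heater = tune_step(average_power(samples), threshold=0.5, heater_setting=0.5)
      print("new heater setting:", heater)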

  7. Efficient Thermal Tuning Employing Metallic Microheater With Slow Light Effect

    DEFF Research Database (Denmark)

    Yan, Siqi; Chen, Hao; Gao, Shengqian

    2018-01-01

    Thermal tuning plays one of the most fundamental roles in integrated silicon photonics, since it provides flexibility and reconfigurability. Low tuning power and fast tuning speed are long-pursued goals for thermal tuning performance. Here we propose and experimentall...

  8. Computer codes for automatic tuning of the beam transport at the UNILAC

    International Nuclear Information System (INIS)

    Dahl, L.; Ehrich, A.

    1984-01-01

    For application in routine operation, fully automatic, computer-controlled algorithms have been developed for tuning the beam transport elements at the Unilac. Computations based on emittance measurements simulate the beam behaviour and evaluate quadrupole settings, in order to produce defined beam properties at specified positions along the accelerator. The interactive program is controlled using a graphic display on which the beam emittances and envelopes are plotted. To align the beam onto the ion-optical axis of the accelerator, two automatic computer-controlled procedures have been developed. The misalignment of the beam is determined by varying quadrupole or steering-magnet settings while simultaneously measuring the beam distribution on profile grids. According to the result, a pair of steering-magnet settings is adjusted to bend the beam onto the axis. The effects of computer-controlled tuning on beam quality and operation are reported.
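
    The steering-correction step can be illustrated with a least-squares sketch (the response matrix and offsets below are hypothetical numbers, not UNILAC data): beam offsets measured on the profile grids respond approximately linearly to the steering-magnet settings, so settings that cancel the offsets follow from a linear solve.

      import numpy as np

      # Hypothetical response matrix R (mm per unit steerer setting) and
      # measured offsets x0 (mm) on three profile grids; x = x0 + R @ s.
      R = np.array([[0.8, 0.3],
                    [0.5, 0.9],
                    [0.2, 1.1]])
      x0 = np.array([1.2, -0.4, 0.7])

      # Pair of steering-magnet settings that cancels the offsets in the
      # least-squares sense, i.e. bends the beam back onto the axis.
      s, *_ = np.linalg.lstsq(R, -x0, rcond=None)
      print("steerer settings:", s)
      print("residual offsets:", x0 + R @ s)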

  9. An Automatic Instruction-Level Parallelization of Machine Code

    Directory of Open Access Journals (Sweden)

    MARINKOVIC, V.

    2018-02-01

    Prevailing multicores and novel manycores have made parallelization of embedded software, much of which is still written as sequential code, a great challenge of the modern day. In this paper, automatic code parallelization is considered, focusing on developing a parallelization tool at the binary level as well as on the validation of this approach. A novel instruction-level parallelization algorithm for assembly code is developed: it uses the register names after SSA conversion to find independent blocks of code, and then schedules the independent blocks using METIS to achieve good load balance. Sequential consistency is verified, and validation is done by measuring the program execution time on the target architecture. Great speedup, taken as the performance measure in the validation process, and optimal load balancing are achieved for multicore RISC processors with 2 to 16 cores (e.g. MIPS, MicroBlaze, etc.). In particular, for 16 cores, the average speedup is 7.92x, while in some cases it reaches 14x. The approach to automatic parallelization provided by this paper is useful to researchers and developers in the area of parallelization as the basis for further optimizations, as the back-end of a compiler, or as the code parallelization tool for an embedded system.
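
    The core independence test can be sketched in a few lines (the data layout is hypothetical): after SSA renaming, every virtual register has a single definition, so two blocks are data-independent exactly when neither reads a register the other defines.

      def independent(block_a, block_b):
          """True when two post-SSA instruction blocks share no def-use pair,
          i.e. neither block reads a register the other one defines."""
          defs_a = {d for d, _ in block_a}
          defs_b = {d for d, _ in block_b}
          uses_a = set().union(*(u for _, u in block_a))
          uses_b = set().union(*(u for _, u in block_b))
          return defs_a.isdisjoint(uses_b) and defs_b.isdisjoint(uses_a)

      # Each instruction is (defined register, set of used registers).
      b1 = [("r1", {"r0"}), ("r2", {"r1"})]
      b2 = [("r3", {"r0"}), ("r4", {"r3"})]
      print(independent(b1, b2))   # True: the blocks can run on different cores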

  10. Online control loop tuning in Pickering Nuclear Generating Stations

    International Nuclear Information System (INIS)

    Yu, K.X.; Harrington, S.

    2008-01-01

    Most analog controllers in the Pickering B Nuclear Generating Stations adopted a PID control scheme. In replacing the analog controllers with digital controllers, the PID control strategies, including the original tuning parameters, were retained. The replacement strategy resulted in minimum effort on control loop tuning. In a few cases, however, it was found during commissioning that control loop tuning was required as a result of poor control loop performance, typically due to slow response and controlled-process oscillation. Several factors account for the necessity of control loop re-tuning. Our experience in commissioning the digital controllers showed that online control loop tuning posed some challenges in a nuclear power plant. (author)
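
    For reference, the PID law that such controllers retain has a standard discrete form. A generic sketch (gains and sampling time are illustrative, not plant values from the station):

      class PID:
          """Textbook discrete PID controller."""
          def __init__(self, kp, ki, kd, dt):
              self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
              self.integral = 0.0
              self.prev_error = 0.0

          def update(self, setpoint, measurement):
              error = setpoint - measurement
              self.integral += error * self.dt              # I term accumulates
              derivative = (error - self.prev_error) / self.dt
              self.prev_error = error
              return (self.kp * error + self.ki * self.integral
                      + self.kd * derivative)

      controller = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
      print(controller.update(setpoint=1.0, measurement=0.8))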

  11. Parallel R-matrix computation

    International Nuclear Information System (INIS)

    Heggarty, J.W.

    1999-06-01

    For almost thirty years, sequential R-matrix computation has been used by atomic physics research groups from around the world to model collision phenomena involving the scattering of electrons or positrons with atomic or molecular targets. As considerable progress has been made in the understanding of fundamental scattering processes, new data, obtained from more complex calculations, are of current interest to experimentalists. Performing such calculations, however, places considerable demands on the computational resources to be provided by the target machine, in terms of both processor speed and memory requirement. Indeed, in some instances the computational requirements are so great that the proposed R-matrix calculations are intractable, even when utilising contemporary classic supercomputers. Historically, increases in the computational requirements of R-matrix computation were accommodated by porting the problem codes to a more powerful classic supercomputer. Although this approach has been successful in the past, it is no longer considered to be a satisfactory solution due to the limitations of current (and future) Von Neumann machines. As a consequence, there has been considerable interest in the high-performance multicomputers that have emerged over the last decade, which appear to offer the computational resources required by contemporary R-matrix research. Unfortunately, developing codes for these machines is not as simple a task as it was to develop codes for successive classic supercomputers. The difficulty arises from the considerable differences in the computing models that exist between the two types of machine, and has led to the programming of multicomputers being widely acknowledged as a difficult, time-consuming and error-prone task. Nevertheless, unless parallel R-matrix computation is realised, important theoretical and experimental atomic physics research will continue to be hindered. This thesis describes work that was undertaken in

  12. Topological evolution and photoluminescent properties of a series of divalent zinc-based metal–organic frameworks tuned via ancillary ligating spacers

    Energy Technology Data Exchange (ETDEWEB)

    Lian, Xiao-Min; Zhao, Wen [Shanghai Key Laboratory of Green Chemistry and Chemical Processes, Department of Chemistry, East China Normal University, 3663 North Zhongshan Road, Shanghai 200062 (China); Zhao, Xiao-Li, E-mail: xlzhao@chem.ecnu.edu.cn [Shanghai Key Laboratory of Green Chemistry and Chemical Processes, Department of Chemistry, East China Normal University, 3663 North Zhongshan Road, Shanghai 200062 (China)

    2013-04-15

    The combination of divalent zinc ions, 4-(4-carboxybenzamido)benzoic acid and exo-bidentate bipyridine ligands gave rise to a series of new MOFs: [ZnL(bipy)]·DMF·H{sub 2}O (1), [ZnL(bpe)]·1.5H{sub 2}O (2), [ZnL(bpa)]·4H{sub 2}O (3) and [ZnL(bpp)]·1.75H{sub 2}O (4) (MOF=metal-organic framework, bipy=4,4′-bipyridine, bpe=trans-1,2-bis(4-pyridyl)ethylene, bpa=1,2-bis(4-pyridinyl)ethane, bpp=1,3-bis(4-pyridinyl)propane, H{sub 2}L=4,4′-(carbonylimino)dibenzoic acid). Fine tuning of the topology of the MOFs was achieved by systematically varying the geometric length of the second ligating bipyridine ligands. Single-crystal X-ray analysis reveals that complex 1 has a triply interpenetrated three-dimensional (3D) framework with elongated primitive cubic topology, whereas isostructural complexes 2 and 3 each possesses a 6-fold interpenetrated diamondoid 3D framework. Further expansion of the length of the bipyridine ligand to bpp leads to the formation of 4, which features an interesting entangled architecture of 2D→3D parallel polycatenation. In addition, the thermogravimetric analyses and solid-state photoluminescent properties of the selected complexes are investigated. - Graphical abstract: The incorporation of exo-bidentate bipyridine spacers into the Zn–H{sub 2}L system has yielded a series of new MOFs exhibiting topological evolution from 3-fold interpenetration to 6-fold interpenetration and 2D→3D parallel polycatenation. Highlights: ► The effect of the pyridyl-based spacers on the formation of MOFs was explored. ► Fine tuning of the topology of the MOFs was achieved. ► An interesting structure of 2D→3D parallel polycatenation is reported.

  13. Topological evolution and photoluminescent properties of a series of divalent zinc-based metal–organic frameworks tuned via ancillary ligating spacers

    International Nuclear Information System (INIS)

    Lian, Xiao-Min; Zhao, Wen; Zhao, Xiao-Li

    2013-01-01

    The combination of divalent zinc ions, 4-(4-carboxybenzamido)benzoic acid and exo-bidentate bipyridine ligands gave rise to a series of new MOFs: [ZnL(bipy)]·DMF·H 2 O (1), [ZnL(bpe)]·1.5H 2 O (2), [ZnL(bpa)]·4H 2 O (3) and [ZnL(bpp)]·1.75H 2 O (4) (MOF=metal-organic framework, bipy=4,4′-bipyridine, bpe=trans-1,2-bis(4-pyridyl)ethylene, bpa=1,2-bis(4-pyridinyl)ethane, bpp=1,3-bis(4-pyridinyl)propane, H 2 L=4,4′-(carbonylimino)dibenzoic acid). Fine tuning of the topology of the MOFs was achieved by systematically varying the geometric length of the second ligating bipyridine ligands. Single-crystal X-ray analysis reveals that complex 1 has a triply interpenetrated three-dimensional (3D) framework with elongated primitive cubic topology, whereas isostructural complexes 2 and 3 each possesses a 6-fold interpenetrated diamondoid 3D framework. Further expansion of the length of the bipyridine ligand to bpp leads to the formation of 4, which features an interesting entangled architecture of 2D→3D parallel polycatenation. In addition, the thermogravimetric analyses and solid-state photoluminescent properties of the selected complexes are investigated. - Graphical abstract: The incorporation of exo-bidentate bipyridine spacers into the Zn–H 2 L system has yielded a series of new MOFs exhibiting topological evolution from 3-fold interpenetration to 6-fold interpenetration and 2D→3D parallel polycatenation. Highlights: ► The effect of the pyridyl-based spacers on the formation of MOFs was explored. ► Fine tuning of the topology of the MOFs was achieved. ► An interesting structure of 2D→3D parallel polycatenation is reported.

  14. A massively-parallel electronic-structure calculations based on real-space density functional theory

    International Nuclear Information System (INIS)

    Iwata, Jun-Ichi; Takahashi, Daisuke; Oshiyama, Atsushi; Boku, Taisuke; Shiraishi, Kenji; Okada, Susumu; Yabana, Kazuhiro

    2010-01-01

    Based on the real-space finite-difference method, we have developed a first-principles density functional program that efficiently performs large-scale calculations on massively-parallel computers. In addition to efficient parallel implementation, we also implemented several computational improvements, substantially reducing the computational costs of O(N^3) operations such as the Gram-Schmidt procedure and subspace diagonalization. Using the program on a massively-parallel computer cluster with a theoretical peak performance of several TFLOPS, we perform electronic-structure calculations for a system consisting of over 10,000 Si atoms, and obtain a self-consistent electronic structure in a few hundred hours. We analyze in detail the costs of the program in terms of computation and of inter-node communications to clarify the efficiency, the applicability, and the possibility for further improvements.

  15. Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

    Directory of Open Access Journals (Sweden)

    Alexander B. Bakulev

    2012-11-01

    This article deals with mathematical models and algorithms that provide mobility of the parallel representation of sequential programs in a high-level language. It presents a formal model of operating-environment process management, based on the proposed model of parallel program representation, describing the computation process on multi-core processors.

  16. Heavy superpartners with less tuning from hidden sector renormalisation

    International Nuclear Information System (INIS)

    Hardy, Edward

    2014-01-01

    In supersymmetric extensions of the Standard Model, superpartner masses consistent with collider bounds typically introduce significant tuning of the electroweak scale. We show that hidden sector renormalisation can greatly reduce such a tuning if the supersymmetry breaking, or mediating, sector runs through a region of strong coupling not far from the weak scale. In the simplest models, only the tuning due to the gaugino masses is improved, and a weak scale gluino mass in the region of 5 TeV may be obtained with an associated tuning of only one part in ten. In models with more complex couplings between the visible and hidden sectors, the tuning with respect to sfermions can also be reduced. We give an example of a model, with low scale gauge mediation and superpartner masses allowed by current LHC bounds, that has an overall tuning of one part in twenty

  17. An object-oriented bulk synchronous parallel library for multicore programming

    NARCIS (Netherlands)

    Yzelman, A.N.; Bisseling, R.H.

    2012-01-01

    We show that the bulk synchronous parallel (BSP) model, originally designed for distributed-memory systems, is also applicable for shared-memory multicore systems and, furthermore, that BSP libraries are useful in scientific computing on these systems. A proof-of-concept MulticoreBSP library has
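
    The BSP model itself is easy to sketch. The toy below (plain Python threads, not the MulticoreBSP API) shows the canonical superstep structure: local computation, communication, then a bulk-synchronizing barrier.

      import threading

      NTHREADS = 4
      barrier = threading.Barrier(NTHREADS)
      inbox = [[] for _ in range(NTHREADS)]     # one message list per thread
      result = [0] * NTHREADS

      def worker(pid, data):
          local = sum(data)          # superstep 1: purely local computation
          for q in inbox:            # communication phase: broadcast local sum
              q.append(local)
          barrier.wait()             # bulk synchronization ends the superstep
          result[pid] = sum(inbox[pid])   # superstep 2: combine received values

      chunks = [range(p, 100, NTHREADS) for p in range(NTHREADS)]
      threads = [threading.Thread(target=worker, args=(p, chunks[p]))
                 for p in range(NTHREADS)]
      for t in threads: t.start()
      for t in threads: t.join()
      print(result[0])               # 4950 == sum(range(100)), computed BSP-style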

  18. Neurite, a finite difference large scale parallel program for the simulation of electrical signal propagation in neurites under mechanical loading.

    Directory of Open Access Journals (Sweden)

    Julián A García-Grajales

    With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite (explicit and implicit) were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon
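
    The passive-region model referenced above is the cable equation, whose explicit finite-difference update is compact. A minimal sketch (all parameter values are hypothetical, in arbitrary consistent units; the real code couples this to Hodgkin-Huxley kinetics and a mechanical model):

      import numpy as np

      # Explicit finite-difference step for the passive cable equation,
      #   c_m * dV/dt = (d / (4 * r_a)) * d2V/dx2 - V / r_m,
      # with hypothetical parameters in arbitrary consistent units.
      c_m, r_m, r_a, d = 1e-2, 1.0, 1.0, 1e-4
      dx, dt = 1e-4, 1e-6
      coef = d / (4.0 * r_a)

      V = np.zeros(200)
      V[0] = 1.0                                    # clamp at the left end
      for _ in range(1000):
          d2V = (np.roll(V, -1) - 2 * V + np.roll(V, 1)) / dx**2
          V[1:-1] += dt / c_m * (coef * d2V[1:-1] - V[1:-1] / r_m)
      print(V[:5])                                  # decaying voltage profile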

  19. Mammalian odorant receptor tuning breadth persists across distinct odorant panels.

    Directory of Open Access Journals (Sweden)

    Devin Kepchia

    The molecular receptive range (MRR) of a mammalian odorant receptor (OR) is the set of odorant structures that activate the OR, while the distribution of these odorant structures across odor space is the tuning breadth of the OR. Variation in tuning breadth is thought to be an important property of ORs, with the MRRs of these receptors varying from narrowly to broadly tuned. However, defining the tuning breadth of an OR is a technical challenge. For practical reasons, a screening panel that broadly covers odor space must be limited to sparse coverage of the many potential structures in that space. When screened with such a panel, ORs with different odorant specificities, but equal tuning breadths, might appear to have different tuning breadths due to chance. We hypothesized that ORs would maintain their tuning breadths across distinct odorant panels. We constructed a new screening panel that was broadly distributed across an estimated odor space and contained compounds distinct from previous panels. We used this new screening panel to test several murine ORs that were previously characterized as having different tuning breadths. ORs were expressed in Xenopus laevis oocytes and assayed by two-electrode voltage clamp electrophysiology. MOR256-17, an OR previously characterized as broadly tuned, responded to nine novel compounds from our new screening panel that were structurally diverse and broadly dispersed across an estimated odor space. MOR256-22, an OR previously characterized as narrowly tuned, responded to a single novel compound that was structurally similar to a previously known ligand for this receptor. MOR174-9, a well-characterized receptor with a narrowly tuned MRR, did not respond to any novel compounds in our new panel. These results support the idea that variation in tuning breadth among these three ORs is not an artifact of the screening protocol, but is an intrinsic property of the receptors.

  20. FPA Tuned Fuzzy Logic Controlled Synchronous Buck Converter for a Wave/SC Energy System

    Directory of Open Access Journals (Sweden)

    SAHIN, E.

    2017-02-01

    This paper presents a flower pollination algorithm (FPA) tuned fuzzy logic controlled (FLC) synchronous buck converter (SBC) for an integrated wave/supercapacitor (SC) hybrid energy system. In order to compensate for the irregular wave effects on the electrical side of the wave energy converter (WEC), an SC unit charged by solar panels is connected in parallel to the WEC system, and an SBC is controlled to provide more reliable and stable voltage to the DC load. In order to test the performance of the designed FLC, a classical proportional-integral-derivative (PID) controller is also employed. Both controllers are optimized by the FPA, a fairly new optimization algorithm, and by the well-known particle swarm optimization (PSO) algorithm, to minimize the integral of time-weighted absolute error (ITAE) performance index. Other error-based objective functions are also considered. The entire energy system and controllers are developed in Matlab/Simulink and realized experimentally. Real-time applications are carried out through the DS1104 Controller Board. The simulation and experimental results show that the FPA tuned fuzzy logic controller provides lower performance-index values than the conventional PID controller, reducing output voltage sags and swells of the wave/SC energy system.
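
    The ITAE criterion that both optimizers minimize is straightforward to evaluate on sampled data. A small sketch (the error signal below is a hypothetical stand-in for the output-voltage error):

      import numpy as np

      def itae(t, e):
          """J = integral of t*|e(t)| dt, by the trapezoidal rule."""
          w = t * np.abs(e)
          return float(np.sum((w[1:] + w[:-1]) * np.diff(t)) / 2.0)

      t = np.linspace(0.0, 5.0, 501)
      e = np.exp(-t) * np.cos(3 * t)      # stand-in for the voltage error
      print(f"ITAE = {itae(t, e):.4f}")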

  1. Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

    Science.gov (United States)

    Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

    2016-01-01

    Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

  2. Parallelization and automatic data distribution for nuclear reactor simulations

    Energy Technology Data Exchange (ETDEWEB)

    Liebrock, L.M. [Liebrock-Hicks Research, Calumet, MI (United States)

    1997-07-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.

  3. Parallelization and automatic data distribution for nuclear reactor simulations

    International Nuclear Information System (INIS)

    Liebrock, L.M.

    1997-01-01

    Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine can not run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed

  4. A Topological Model for Parallel Algorithm Design

    Science.gov (United States)

    1991-09-01

    effort should be directed to planning, requirements analysis, specification and design, with 20% invested into the actual coding, and then the final 40...be one more language to learn. And by investing the effort into improving the utility of an existing language instead of creating a new one, this...193) it abandons the notion of a process as a fundamental concept of parallel program design and that it facilitates program derivation by rigorously

  5. Application of Pfortran and Co-Array Fortran in the Parallelization of the GROMOS96 Molecular Dynamics Module

    Directory of Open Access Journals (Sweden)

    Piotr Bała

    2001-01-01

    After at least a decade of parallel tool development, parallelization of scientific applications remains a significant undertaking. Typically parallelization is a specialized activity supported only partially by the programming tool set, with the programmer involved with parallel issues in addition to sequential ones. The details of concern range from algorithm design down to low-level data movement details. The aim of parallel programming tools is to automate the latter without sacrificing performance and portability, allowing the programmer to focus on algorithm specification and development. We present our use of two similar parallelization tools, Pfortran and Cray's Co-Array Fortran, in the parallelization of the GROMOS96 molecular dynamics module. Our parallelization started from the GROMOS96 distribution's shared-memory implementation of the replicated algorithm, but used little of that existing parallel structure. Consequently, our parallelization was close to starting with the sequential version. We found the intuitive extensions to Pfortran and Co-Array Fortran helpful in the rapid parallelization of the project. We present performance figures for both the Pfortran and Co-Array Fortran parallelizations showing linear speedup within the range expected by these parallelization methods.

  6. Parallel computing for data science with examples in R, C++ and CUDA

    CERN Document Server

    Matloff, Norman

    2015-01-01

    Parallel Computing for Data Science: With Examples in R, C++ and CUDA is one of the first parallel computing books to concentrate exclusively on parallel data structures, algorithms, software tools, and applications in data science. It includes examples not only from the classic "n observations, p variables" matrix format but also from time series, network graph models, and numerous other structures common in data science. The examples illustrate the range of issues encountered in parallel programming. With the main focus on computation, the book shows how to compute on three types of platfor

  7. Astronomical tunings of the Oligocene-Miocene transition from Pacific Ocean Site U1334 and implications for the carbon cycle

    Science.gov (United States)

    Beddow, Helen M.; Liebrand, Diederik; Wilson, Douglas S.; Hilgen, Frits J.; Sluijs, Appy; Wade, Bridget S.; Lourens, Lucas J.

    2018-03-01

    Astronomical tuning of sediment sequences requires both unambiguous cycle pattern recognition in climate proxy records and astronomical solutions, as well as independent information about the phase relationship between these two. Here we present two different astronomically tuned age models for the Oligocene-Miocene transition (OMT) from Integrated Ocean Drilling Program Site U1334 (equatorial Pacific Ocean) to assess the effect tuning has on astronomically calibrated ages and the geologic timescale. These alternative age models (roughly from ~22 to ~24 Ma) are based on different tunings between proxy records and eccentricity: the first age model is based on aligning CaCO3 weight content (wt%) to Earth's orbital eccentricity, and the second age model is based on a direct age calibration of benthic foraminiferal stable carbon isotope ratios (δ13C) to eccentricity. To independently test which tuned age model and associated tuning assumptions are in best agreement with independent ages based on tectonic plate-pair spreading rates, we assign the tuned ages to magnetostratigraphic reversals identified in deep-marine magnetic anomaly profiles. Subsequently, we compute tectonic plate-pair spreading rates based on the tuned ages. The resultant alternative spreading-rate histories indicate that the CaCO3 tuned age model is most consistent with a conservative assumption of constant, or linearly changing, spreading rates. The CaCO3 tuned age model thus provides robust ages and durations for polarity chrons C6Bn.1n-C7n.1r, which are not based on astronomical tuning in the latest iteration of the geologic timescale. Furthermore, it provides independent evidence that the relatively large (several 10 000 years) time lags documented in the benthic foraminiferal isotope records relative to orbital eccentricity constitute a real feature of the Oligocene-Miocene climate system and carbon cycle. The age constraints from Site U1334 thus indicate that the delayed responses of the

  8. Design Patterns: establishing a discipline of parallel software engineering

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    Many-core processors present us with a software challenge. We must turn our serial code into parallel code. To accomplish this wholesale transformation of our software ecosystem, we must define what established practice is in parallel programming and then develop tools to support that practice. This leads to design patterns supported by frameworks optimized at runtime with advanced autotuning compilers. In this talk I provide an update of my ongoing research with the ParLab at UC Berkeley to realize this vision. In particular, I will describe our draft parallel pattern language, our early experiments with software frameworks, and the associated runtime optimization tools. About the speaker: Tim Mattson is a parallel programmer (Ph.D. Chemistry, UCSC, 1985). He does linear algebra, finds oil, shakes molecules, solves differential equations, and models electrons in simple atomic systems. He has spent his career working with computer scientists to make sure the needs of parallel applications programmers are met. Tim has ...

  9. Parallel computing for event reconstruction in high-energy physics

    International Nuclear Information System (INIS)

    Wolbers, S.

    1993-01-01

    Parallel computing has been recognized as a solution to large computing problems. In High Energy Physics, offline event reconstruction of detector data is a very large computing problem that has been solved with parallel computing techniques. A review is given of the parallel programming package CPS (Cooperative Processes Software), developed and used at Fermilab for offline reconstruction of Terabytes of data requiring the delivery of hundreds of Vax-Years per experiment. The Fermilab UNIX farms, consisting of 180 Silicon Graphics workstations and 144 IBM RS6000 workstations, are used to provide the computing power for the experiments. Fermilab has had a long history of providing production parallel computing, starting with the ACP (Advanced Computer Project) Farms in 1986. The Fermilab UNIX Farms have been in production for over 2 years with 24 hour/day service to experimental user groups. Additional tools for managing, controlling and monitoring these large systems will be described. Possible future directions for parallel computing in High Energy Physics will be given

  10. On-line event reconstruction using a parallel in-memory data base

    OpenAIRE

    Argante, E; Van der Stok, P D V; Willers, Ian Malcolm

    1995-01-01

    PORS is a system designed for on-line event reconstruction in high energy physics (HEP) experiments. It uses the CPREAD reconstruction program. Central to the system is a parallel in-memory database which is used as communication medium between parallel workers. A farming control structure is implemented with PORS in a natural way. The database provides structured storage of data with a short life time. PORS serves as a case study for the construction of a methodology on how to apply parallel...

  11. 70 MeV injector auto tuning system handbook

    International Nuclear Information System (INIS)

    Ellis, J.E.; Munn, R.W.; Sandels, E.G.

    1976-06-01

    The handbook is in three sections: (1) description and location; (2) operating instructions; and (3) design notes on the tank and debuncher auto tuning systems for the 70 MeV injector. The purpose of the auto tuning system is to maintain the 'tune' of the four tanks and debuncher to within a few Hz, stabilizing against changes of temperature and other physical factors affecting the resonant frequency of the tanks. (U.K.)

  12. Broader visual orientation tuning in patients with schizophrenia

    Directory of Open Access Journals (Sweden)

    Ariel eRokem

    2011-11-01

    Reduced gamma-aminobutyric acid (GABA) levels in cerebral cortex are thought to contribute to information processing deficits in patients with schizophrenia (SZ), and we have previously reported lower in vivo GABA levels in the visual cortex of patients with SZ. GABA-mediated inhibition plays a role in sharpening orientation tuning of visual cortical neurons. Therefore, we predicted that tuning for visual stimulus orientation would be wider in SZ. We measured orientation tuning with a psychophysical procedure in which subjects performed a target detection task of a low-contrast oriented grating, following adaptation to a high-contrast grating. Contrast detection thresholds were determined for a range of adapter-target orientation offsets. For both SZ and healthy controls, contrast thresholds decreased as orientation offset increased, suggesting that this tuning curve reflects the selectivity of visual cortical neurons for stimulus orientation. After accounting for generalized deficits in task performance in SZ, there was no difference between patients and controls for detection of target stimuli having either the same orientation as the adapter or orientations far from the adapter. However, patients’ thresholds were significantly higher for intermediate adapter-target offsets. In addition, the mean width parameter of a Gaussian fit to the psychophysical orientation tuning curves was significantly larger for the patient group. We also present preliminary data relating visual cortical GABA levels, as measured with magnetic resonance spectroscopy, and orientation tuning width. These results suggest that our finding of broader orientation tuning in SZ may be due to diminished visual cortical GABA levels.
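
    The width comparison rests on a standard Gaussian fit. A sketch with made-up data (offsets in degrees; the threshold elevations are hypothetical, not values from the study):

      import numpy as np
      from scipy.optimize import curve_fit

      def gaussian(offset, amplitude, width, baseline):
          """Threshold elevation vs adapter-target orientation offset (deg);
          'width' is the tuning-width parameter compared between groups."""
          return baseline + amplitude * np.exp(-offset**2 / (2 * width**2))

      offsets = np.array([0.0, 5.0, 10.0, 20.0, 40.0, 80.0])
      elevation = np.array([1.9, 1.7, 1.2, 0.6, 0.2, 0.1])

      params, _ = curve_fit(gaussian, offsets, elevation, p0=[1.5, 15.0, 0.1])
      print(f"fitted tuning width: {params[1]:.1f} deg")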

  13. ATLAS tunes of PYTHIA 6 and Pythia 8 for MC11

    CERN Document Server

    The ATLAS collaboration

    2011-01-01

    We present the latest developments of the ATLAS MC generator tuning project for the Pythia family of event generators, including the C++ Pythia 8 code for the first time. The PYTHIA 6 tunes presented here, titled AMBT2B and AUET2B and constructed for a variety of PDFs, constitute alternatives to the AMBT2/AUET2 tunes previously presented as a candidate for MC11 event simulation. They systematically differ from the AMBT2/AUET2 PYTHIA 6 tunes in the treatment of alpha_S, to address concerns with those tunes. Systematic tune variations are also presented. The Pythia 8 tunes have been constructed for two different PDFs, and are aimed at an optimal description of minimum bias, for use in pile-up simulation. PDF-sensitive effects are observed and discussed in the MPI tunings of both generators.

  14. Distributed Memory Programming on Many-Cores

    DEFF Research Database (Denmark)

    Berthold, Jost; Dieterle, Mischa; Lobachev, Oleg

    2009-01-01

    Eden is a parallel extension of the lazy functional language Haskell providing dynamic process creation and automatic data exchange. As a Haskell extension, Eden takes a high-level approach to parallel programming and thereby simplifies parallel program development. The current implementation is ...

  15. Self-tuning control studies of the plasma vertical position problem

    International Nuclear Information System (INIS)

    Zheng, Guang Lin; Wellstead, P.E.; Browne, M.L.

    1993-01-01

    The plasma vertical position system in a tokamak device can be open-loop unstable with time-varying dynamics, such that the instability grows as the system dynamics change. Time-varying unstable dynamics make the plasma vertical position particularly difficult to control with traditional fixed-coefficient controllers. A self-tuning technique offers a new solution to the plasma vertical position control problem via an adaptive control approach. Specifically, the self-tuning controller automatically tunes the controller parameters without a priori knowledge of the system dynamics and continuously tracks dynamical changes within the system, thereby providing the system with auto-tuning and adaptive tuning capabilities. An overview of the self-tuning methods is given, and their applicability is illustrated on a simulation of the Joint European Torus (JET) vertical plasma position system. Specifically, the applicability of pole-assignment and generalized predictive control self-tuning methods to the vertical plasma position system is demonstrated. 26 refs., 16 figs., 1 tab
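
    At the core of such self-tuning controllers is an on-line parameter estimator. A generic recursive least-squares sketch with a forgetting factor follows (textbook material, not the JET implementation; all names and values are illustrative):

      import numpy as np

      class RLSEstimator:
          """Recursive least squares with forgetting factor: the on-line
          model estimator feeding a pole-assignment or predictive design."""
          def __init__(self, n_params, forgetting=0.98):
              self.theta = np.zeros(n_params)       # parameter estimates
              self.P = np.eye(n_params) * 1000.0    # covariance (large = uncertain)
              self.lam = forgetting                 # <1 tracks time-varying dynamics

          def update(self, phi, y):
              phi = np.asarray(phi, dtype=float)
              k = self.P @ phi / (self.lam + phi @ self.P @ phi)   # gain vector
              self.theta += k * (y - phi @ self.theta)             # prediction error
              self.P = (self.P - np.outer(k, phi @ self.P)) / self.lam
              return self.theta

      # Toy usage: identify y[t] = a*y[t-1] + b*u[t-1]; theta -> (a, b).
      est = RLSEstimator(2)
      print(est.update([0.5, 1.0], y=0.9))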

  16. A frequency domain approach for MPC tuning

    NARCIS (Netherlands)

    Özkan, L.; Meijs, J.B.; Backx, A.C.P.M.; Karimi, I.A.; Srinivasan, R.

    2012-01-01

    This paper presents a frequency-domain-based approach to tune the penalty weights in the model predictive control (MPC) formulation. The two-step tuning method involves the design of a favourite controller, taking into account the model-plant mismatch, followed by controller matching. We implement

  17. Natural tuning: towards a proof of concept

    Science.gov (United States)

    Dubovsky, Sergei; Gorbenko, Victor; Mirbabayi, Mehrdad

    2013-09-01

    The cosmological constant problem and the absence of new natural physics at the electroweak scale, if confirmed by the LHC, may either indicate that nature is fine-tuned or that a refined notion of naturalness is required. We construct a family of toy UV complete quantum theories providing a proof of concept for the second possibility. Low energy physics is described by a tuned effective field theory, which exhibits relevant interactions not protected by any symmetries and separated by an arbitrarily large mass gap from the new "gravitational" physics, represented by a set of irrelevant operators. Nevertheless, the only available language to describe dynamics at all energy scales does not require any fine-tuning. The interesting novel feature of this construction is that UV physics is not described by a fixed point, but rather exhibits asymptotic fragility. Observation of additional unprotected scalars at the LHC would be a smoking gun for this scenario. Natural tuning also favors TeV scale unification.

  18. Accurate guitar tuning by cochlear implant musicians.

    Directory of Open Access Journals (Sweden)

    Thomas Lu

    Modern cochlear implant (CI) users understand speech but find difficulty in music appreciation due to poor pitch perception. Still, some deaf musicians continue to perform with their CI. Here we show unexpected results: CI musicians can reliably tune a guitar by CI alone and, under controlled conditions, match simultaneously presented tones to <0.5 Hz. One subject had normal contralateral hearing and produced more accurate tuning with his CI than with his normal ear. To understand these counterintuitive findings, we presented tones sequentially and found that tuning error was larger at ∼30 Hz for both subjects. A third subject, a non-musician CI user with normal contralateral hearing, showed similar trends in performance between the CI and normal-hearing ears, but with less precision. This difference, along with electric analysis, showed that accurate tuning was achieved by listening to beats rather than discriminating pitch, effectively turning a spectral task into a temporal discrimination task.

  19. Neutron transport solver parallelization using a Domain Decomposition method

    International Nuclear Information System (INIS)

    Van Criekingen, S.; Nataf, F.; Have, P.

    2008-01-01

    A domain decomposition (DD) method is investigated for the parallel solution of the second-order even-parity form of the time-independent Boltzmann transport equation. The spatial discretization is performed using finite elements, and the angular discretization using spherical harmonic expansions (P_N method). The main idea developed here is due to P.L. Lions. It consists in having sub-domains exchange not only interface point flux values, but also interface flux 'derivative' values. (The word 'derivative' is used here with quotes because, in the case considered, it in fact consists of the Ω·∇ operator, with Ω the angular variable vector and ∇ the spatial gradient operator.) A parameter α is introduced as the proportionality coefficient between point flux and 'derivative' values. This parameter can be tuned - so far heuristically - to optimize the method. (authors)
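
    The role of α can be seen in a toy 1D problem (an illustration constructed for this record, not taken from the paper): solving u'' = 0 on [0,1] with u(0) = 0, u(1) = 1, split at x = 0.5, each subdomain solution is linear, so the subdomain solves are closed-form and the Robin-type exchange converges at a rate set by α.

      def lions_dd(alpha, iters=12):
          """Jacobi-style iteration with Robin exchange at the interface
          (normal derivative + alpha * point value). Subdomain solutions are
          u_i = a_i*x + b_i, so only the slopes need updating; the exact
          answer is a1 = a2 = 1 (u = x)."""
          a1 = a2 = 0.0
          for _ in range(iters):
              a1, a2 = ((a2 * (1 - 0.5 * alpha) + alpha) / (1 + 0.5 * alpha),
                        (a1 * (1 - 0.5 * alpha) + alpha) / (1 + 0.5 * alpha))
          return a1, a2

      for alpha in (0.5, 1.0, 2.0):
          print(alpha, lions_dd(alpha))   # alpha = 2 converges essentially at once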

  20. 7th International Workshop on Parallel Tools for High Performance Computing

    CERN Document Server

    Gracia, José; Nagel, Wolfgang; Resch, Michael

    2014-01-01

    Current advances in High Performance Computing (HPC) increasingly impact efficient software development workflows. Programmers for HPC applications need to consider trends such as increased core counts, multiple levels of parallelism, reduced memory per core, and I/O system challenges in order to derive well performing and highly scalable codes. At the same time, the increasing complexity adds further sources of program defects. While novel programming paradigms and advanced system libraries provide solutions for some of these challenges, appropriate supporting tools are indispensable. Such tools aid application developers in debugging, performance analysis, or code optimization and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 7th International Parallel Tools Workshop, held in Dresden, Germany, September 3-4, 2013.  

  1. Further ATLAS tunes of PYTHIA6 and Pythia 8

    CERN Document Server

    The ATLAS collaboration

    2011-01-01

    We present the latest developments of the ATLAS MC generator tuning project for the Pythia family of event generators, including the C++ Pythia 8 code. The PYTHIA 6 tunes presented here complete the "AUET2B" set by addition of parton shower and multi-parton interaction model tunings with three next-to-leading order (NLO) PDFs in addition to the leading-order and MC-adapted PDFs previously presented. This note also presents systematic variation "eigentunes" for the parton shower configurations in the AMBT2B/AUET2B tune series. The Pythia 8 MPI tunes in this note have been constructed for six different PDFs, making use of a new $x$-dependent hadronic matter distribution model. MPI eigentunes are constructed for the PDFs intended for use in ATLAS bulk MC production.

  2. Parallel Evolutionary Optimization for Neuromorphic Network Training

    Energy Technology Data Exchange (ETDEWEB)

    Schuman, Catherine D [ORNL; Disney, Adam [University of Tennessee (UT); Singh, Susheela [North Carolina State University (NCSU), Raleigh; Bruer, Grant [University of Tennessee (UT); Mitchell, John Parker [University of Tennessee (UT); Klibisz, Aleksander [University of Tennessee (UT); Plank, James [University of Tennessee (UT)

    2016-01-01

    One of the key impediments to the success of current neuromorphic computing architectures is the issue of how best to program them. Evolutionary optimization (EO) is one promising programming technique; in particular, its wide applicability makes it especially attractive for neuromorphic architectures, which can have many different characteristics. In this paper, we explore different facets of EO on a spiking neuromorphic computing model called DANNA. We focus on the performance of EO in the design of our DANNA simulator, and on how to structure EO on both multicore and massively parallel computing systems. We evaluate how our parallel methods impact the performance of EO on Titan, the U.S.'s largest open science supercomputer, and BOB, a Beowulf-style cluster of Raspberry Pi's. We also focus on how to improve the EO by evaluating commonality in higher performing neural networks, and present the result of a study that evaluates the EO performed by Titan.

  3. Northeast Artificial Intelligence Consortium Annual Report - 1988 Parallel Vision. Volume 9

    Science.gov (United States)

    1989-10-01

    This report supports the Northeast Artificial Intelligence Consortium (NAIC). Volume 9, Parallel Vision, Syracuse University; report submitted by Christopher M. Brown and Randal C. Nelson.

  4. PERFORMANCE EVALUATION OF OR1200 PROCESSOR WITH EVOLUTIONARY PARALLEL HPRC USING GEP

    Directory of Open Access Journals (Sweden)

    R. Maheswari

    2012-04-01

    In this fast computing era, most embedded systems require more computing power to complete complex functions/tasks in a lesser amount of time. One way to achieve this is by boosting up the processor performance, which allows the processor core to run faster. This paper presents a novel technique for increasing the performance by parallel HPRC (High Performance Reconfigurable Computing) in the CPU/DSP (Digital Signal Processor) unit of OR1200 (Open Reduced Instruction Set Computer (RISC) 1200) using Gene Expression Programming (GEP), an evolutionary programming model. OR1200 is a soft-core RISC processor of the Intellectual Property cores that can efficiently run any modern operating system. In the manufacturing process of OR1200, a parallel HPRC is placed internally in the Integer Execution Pipeline unit of the CPU/DSP core to increase the performance. The GEP parallel HPRC is activated/deactivated by triggering the signals (i) HPRC_Gene_Start and (ii) HPRC_Gene_End. A Verilog HDL (Hardware Description Language) functional code for the Gene Expression Programming parallel HPRC is developed and synthesised using XILINX ISE in the former part of the work, and a CoreMark processor core benchmark is used to test the performance of the OR1200 soft core in the latter part of the work. The result of the implementation shows the overall speed-up increased to 20.59% with the GEP-based parallel HPRC in the execution unit of OR1200.

  5. Parallel Algorithms for Graph Optimization using Tree Decompositions

    Energy Technology Data Exchange (ETDEWEB)

    Sullivan, Blair D [ORNL; Weerapurage, Dinesh P [ORNL; Groer, Christopher S [ORNL

    2012-06-01

    Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of the necessary dynamic programming tables and excessive runtimes of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree decomposition-based approach to solve the maximum weighted independent set problem. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.
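
    The flavor of the underlying dynamic program is easy to show in the simplest case, where the graph is itself a tree (bags of one vertex each). A sketch with a hypothetical graph and weights; the paper's contribution is parallelizing tables like these over general tree decompositions.

      def mwis_tree(adj, weight, root=0):
          """Max weighted independent set on a tree: for each vertex return
          (best value if taken, best value if skipped) over its subtree."""
          def dp(v, parent):
              take, skip = weight[v], 0
              for u in adj[v]:
                  if u == parent:
                      continue
                  t, s = dp(u, v)
                  take += s            # v taken -> children must be skipped
                  skip += max(t, s)    # v skipped -> children choose freely
              return take, skip
          return max(dp(root, None))

      adj = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
      weight = {0: 5, 1: 1, 2: 3, 3: 4, 4: 2}
      print(mwis_tree(adj, weight))    # 11: take vertices 0, 3 and 4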

  6. Parallelizing an electron transport Monte Carlo simulator (MOCASIN 2.0)

    International Nuclear Information System (INIS)

    Schwetman, H.; Burdick, S.

    1988-01-01

    Electron transport simulators are tools for studying electrical properties of semiconducting materials and devices. As demands for modeling more complex devices and new materials have emerged, so have demands for more processing power. This paper documents a project to convert an electron transport simulator (MOCASIN 2.0) to a parallel processing environment. In addition to describing the conversion, the paper presents PPL, a parallel programming version of C running on a Sequent multiprocessor system. In timing tests, models that simulated the movement of 2,000 particles for 100 time steps were executed on ten processors, with a parallel efficiency of over 97%
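
    The quoted figure follows from the usual definitions. A one-liner sketch (the timings below are made-up numbers of the same shape as the reported result):

      def parallel_efficiency(t_serial, t_parallel, processors):
          """Speedup and efficiency: efficiency = t_serial / (p * t_parallel)."""
          speedup = t_serial / t_parallel
          return speedup, speedup / processors

      s, e = parallel_efficiency(t_serial=100.0, t_parallel=10.3, processors=10)
      print(f"speedup {s:.2f}x, efficiency {e:.1%}")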

  7. A Programming Environment for Parallel Vision Algorithms

    Science.gov (United States)

    1990-04-11

    industrial arm on the market, while the unique head was designed by Rochester’s Computer Science and Mechanical Engineering Departments. 4.1 Introduction ... R. Constraining-Unification and the Programming Language Unicorn. In Logic Programming, Functions, Relations, and Equations, DeGroot and Lindstrom

  8. Parallel algorithms for network routing problems and recurrences

    International Nuclear Information System (INIS)

    Wisniewski, J.A.; Sameh, A.H.

    1982-01-01

    In this paper, we consider the parallel solution of recurrences and linear systems in the regular algebra of Carré. These problems are equivalent to solving the shortest path problem in graph theory, and they also arise in the analysis of Fortran programs. Our methods for solving linear systems in the regular algebra are analogues of well-known methods for solving systems of linear algebraic equations. A parallel version of Dijkstra's method, which has no linear algebraic analogue, is presented. Considerations for choosing an algorithm when the problem is large and sparse are also discussed.
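
    The connection the abstract draws is that shortest-path problems are linear systems over a (min, +) algebra. A compact sketch of the Gauss-Jordan analogue, Floyd-Warshall, on a hypothetical four-vertex graph:

      import math

      # min plays the role of addition and + the role of multiplication,
      # so the triple loop below is elimination in the regular algebra.
      INF = math.inf
      D = [[0, 3, INF, 7],
           [8, 0, 2, INF],
           [5, INF, 0, 1],
           [2, INF, INF, 0]]

      n = len(D)
      for k in range(n):
          for i in range(n):
              for j in range(n):
                  D[i][j] = min(D[i][j], D[i][k] + D[k][j])

      print(D[0])   # shortest distances from vertex 0: [0, 3, 5, 6]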

  9. Just-in-Time Compilation-Inspired Methodology for Parallelization of Compute Intensive Java Code

    Directory of Open Access Journals (Sweden)

    GHULAM MUSTAFA

    2017-01-01

    Compute intensive programs generally consume a significant fraction of execution time in a small amount of repetitive code. Such repetitive code is commonly known as hotspot code. We observed that compute intensive hotspots often possess exploitable loop-level parallelism. A JIT (Just-in-Time) compiler profiles a running program to identify its hotspots. Hotspots are then translated into native code for efficient execution. Using a similar approach, we propose a methodology to identify hotspots and exploit their parallelization potential on multicore systems. The proposed methodology selects and parallelizes each DOALL loop that is either contained in a hotspot method or calls a hotspot method. The methodology could be integrated in the front-end of a JIT compiler to parallelize sequential code, just before native translation. However, compilation to native code is out of the scope of this work. As a case study, we analyze eighteen JGF (Java Grande Forum) benchmarks to determine the parallelization potential of hotspots. Eight benchmarks demonstrate a speedup of up to 7.6x on an 8-core system.
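
    A DOALL loop is one with no cross-iteration dependences, so its iterations can be farmed out unchanged. A sketch (in Python for brevity; the paper targets JIT-compiled Java hotspots):

      from multiprocessing import Pool

      def body(i):
          return i * i          # independent iteration: reads only its own index

      if __name__ == "__main__":
          with Pool(processes=8) as pool:
              result = pool.map(body, range(1000))   # parallel DOALL execution
          print(sum(result))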

  10. Just-in-time compilation-inspired methodology for parallelization of compute intensive java code

    International Nuclear Information System (INIS)

    Mustafa, G.; Ghani, M.U.

    2017-01-01

    Compute intensive programs generally consume a significant fraction of execution time in a small amount of repetitive code. Such repetitive code is commonly known as hotspot code. We observed that compute intensive hotspots often possess exploitable loop-level parallelism. A JIT (Just-in-Time) compiler profiles a running program to identify its hotspots. Hotspots are then translated into native code for efficient execution. Using a similar approach, we propose a methodology to identify hotspots and exploit their parallelization potential on multicore systems. The proposed methodology selects and parallelizes each DOALL loop that is either contained in a hotspot method or calls a hotspot method. The methodology could be integrated in the front-end of a JIT compiler to parallelize sequential code, just before native translation. However, compilation to native code is out of the scope of this work. As a case study, we analyze eighteen JGF (Java Grande Forum) benchmarks to determine the parallelization potential of hotspots. Eight benchmarks demonstrate a speedup of up to 7.6x on an 8-core system. (author)

  11. Parallel Education and Defining the Fourth Sector.

    Science.gov (United States)

    Chessell, Diana

    1996-01-01

    Parallel to the primary, secondary, postsecondary, and adult/community education sectors is education not associated with formal programs--learning in arts and cultural sites. The emergence of cultural and educational tourism is an opportunity for adult/community education to define itself by extending lifelong learning opportunities into parallel…

  12. Parallelization of a beam dynamics code and first large scale radio frequency quadrupole simulations

    Directory of Open Access Journals (Sweden)

    J. Xu

    2007-01-01

    Full Text Available The design and operation support of hadron (proton and heavy-ion) linear accelerators require substantial use of beam dynamics simulation tools. The beam dynamics code TRACK was originally developed at Argonne National Laboratory (ANL) to fulfill the special requirements of the rare isotope accelerator (RIA) systems. From the beginning, the code has been developed to make it useful in the three stages of a linear accelerator project, namely, the design, commissioning, and operation of the machine. To realize this concept, the code has unique features such as end-to-end simulations from the ion source to the final beam destination and automatic procedures for tuning a multiple-charge-state heavy-ion beam. The TRACK code has become a general beam dynamics code for hadron linacs and has found wide application worldwide. Until recently, the code remained serial except for a simple parallelization used to simulate multiple seeds in studying machine errors. To speed up computation, the TRACK Poisson solver has been parallelized. This paper discusses different parallel models for solving the Poisson equation, with the primary goal of extending the scalability of the code to 1024 and more processors of the new generation of supercomputers known as BlueGene (BG/L). Domain decomposition techniques have been adapted and incorporated into the parallel version of the TRACK code. To demonstrate the new capabilities of the parallelized TRACK code, the dynamics of a 45 mA proton beam represented by 10^{8} particles has been simulated through the 325 MHz radio frequency quadrupole and initial accelerator section of the proposed FNAL proton driver. The results show the benefits and advantages of large-scale parallel computing in beam dynamics simulations.
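
    To make the domain-decomposition idea concrete, the sketch below (our illustration, not the TRACK solver) applies a 1-D slab decomposition to a Jacobi iteration for the Poisson equation u'' = f, with halo exchange between neighbouring MPI ranks; the grid size and right-hand side are placeholders:

    ```python
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    N = 1024                         # global interior points, assumed divisible by size
    local_n = N // size
    h = 1.0 / (N + 1)
    u = np.zeros(local_n + 2)        # local slab plus one halo cell on each side
    f = np.ones(local_n + 2)         # placeholder right-hand side

    left = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    for it in range(500):
        # exchange halo cells with the neighbouring subdomains
        comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
        comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
        # Jacobi update on the interior of the local slab
        u[1:-1] = 0.5 * (u[:-2] + u[2:] - h * h * f[1:-1])
    ```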

  13. Reducing the fine-tuning of gauge-mediated SUSY breaking

    Energy Technology Data Exchange (ETDEWEB)

    Casas, J.A.; Moreno, Jesus M. [Universidad Autonoma de Madrid, Instituto de Fisica Teorica, IFT-UAM/CSIC, Madrid (Spain); Robles, Sandra [Universidad Autonoma de Madrid, Instituto de Fisica Teorica, IFT-UAM/CSIC, Madrid (Spain); Universidad Autonoma de Madrid, Departamento de Fisica Teorica, Madrid (Spain); Rolbiecki, Krzysztof [Universidad Autonoma de Madrid, Instituto de Fisica Teorica, IFT-UAM/CSIC, Madrid (Spain); University of Warsaw, Faculty of Physics, Warsaw (Poland)

    2016-08-15

    Despite their appealing features, models with gauge-mediated supersymmetry breaking (GMSB) typically present a high degree of fine-tuning, due to the initial absence of the top trilinear scalar couplings, A{sub t} = 0. In this paper, we carefully evaluate such tuning, showing that it is worse than one per mil in the minimal model. Then, we examine some existing proposals to generate an A{sub t} ≠ 0 term in this context. We find that, although the stops can be made lighter, the tuning usually does not improve (it may even be worse), with some exceptions, which involve the generation of A{sub t} at one loop or at tree level. We examine both possibilities and propose a conceptually simplified version of the latter, which is arguably the optimal GMSB setup (with minimal matter content) concerning the fine-tuning issue. The resulting fine-tuning is better than one per mil, still severe but similar to other minimal supersymmetric standard model constructions. We also explore the so-called "little A{sub t}{sup 2}/m{sup 2} problem", i.e. the fact that a large A{sub t}-term is normally accompanied by a similar or larger sfermion mass, which typically implies an increase in the fine-tuning. Finally, we find the version of GMSB for which this ratio is optimized, which, nevertheless, does not minimize the fine-tuning. (orig.)
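
    For orientation, the degree of fine-tuning quoted in analyses of this kind is usually the standard Barbieri-Giudice sensitivity measure (our gloss; the abstract does not spell out its definition), where the θ_i are the independent parameters of the model:

    ```latex
    \Delta \;=\; \max_i \left| \frac{\partial \ln m_Z^2}{\partial \ln \theta_i} \right|
    ```

    On this reading, a tuning "worse than one per mil" corresponds to Δ ≳ 1000.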

  14. Real-time trajectory optimization on parallel processors

    Science.gov (United States)

    Psiaki, Mark L.

    1993-01-01

    A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm suitable for real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an Intel iPSC/860 message-passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search-direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on three example problems: the Goddard problem, the acceleration-limited planar minimum-time-to-the-origin problem, and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall-clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32 nodes instead of 1 node to solve a 64-stage Goddard problem.
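
    The augmented-Lagrangian machinery at the core of such a solver is compact. A hedged sketch on a toy equality-constrained problem (our illustration, not Psiaki's algorithm; the objective and constraint are invented): minimize f(x) subject to c(x) = 0 by alternating an unconstrained inner solve with a multiplier update.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: (x[0] - 2) ** 2 + (x[1] + 1) ** 2   # toy objective
    c = lambda x: x[0] + x[1] - 3.0                    # single equality constraint

    lam, rho = 0.0, 10.0                               # multiplier and penalty weight
    x = np.zeros(2)
    for _ in range(10):
        L = lambda y: f(y) + lam * c(y) + 0.5 * rho * c(y) ** 2
        x = minimize(L, x).x                           # inner unconstrained solve
        lam += rho * c(x)                              # first-order multiplier update
    print(x, c(x))                                     # -> approx [3, 0], residual ~0
    ```

    In the paper's setting the inner calculations are themselves parallelized by decomposing the problem along the time axis into stages.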

  15. Connectionist Models and Parallelism in High Level Vision.

    Science.gov (United States)

    1985-01-01

    Only report front matter and abstract fragments survive extraction (PI: Jerome A. Feldman; grant N00014-82-K-0193). Recoverable text: "Computer science is just beginning to look seriously at parallel computation: it may turn out that..."; "...the chair. The program includes intermediate level networks that compute more complex joints and ones that compute parallelograms in the image. These..."

  16. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

    A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and the implementation of multiple levels of parallelism on multiple-instruction multiple-data machines is indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communication. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution-time speed-up ratios are given.

  17. Highly parallel machines and future of scientific computing

    International Nuclear Information System (INIS)

    Singh, G.S.

    1992-01-01

    The computing requirements of large-scale scientific computing have always been ahead of what the state-of-the-art hardware, in the form of the supercomputers of the day, could supply. For any single-processor system, the limit on growth in computing power was recognized some years ago. Now, with the advent of parallel computing systems, the availability of machines with the required computing power seems a reality. In this paper the author tries to visualize large-scale scientific computing in the closing decade of the present century. The author summarizes trends in parallel computers and emphasizes the need for a better programming environment and software tools for optimal performance. The paper concludes with a critique of parallel architectures, software tools and algorithms. (author). 10 refs., 2 tabs

  18. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in studies of transient phenomena but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data (SPMD) model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution times were reduced by a factor of 1.6 when four virtual processors were used. To achieve even lower execution times with an increasing number of processors, the time allocated for data transfer, data read, and waiting should be minimized. Parallel computing is found here to be a feasible approach to reducing execution times in temporal fringe pattern analysis.
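
    A minimal SPMD-style sketch (our illustration; the per-frame analysis function is a hypothetical stand-in): every worker runs the same program over its own chunk of the temporal frame stack.

    ```python
    import numpy as np
    from multiprocessing import Pool

    def analyse(frame):
        # stand-in for per-frame fringe processing (e.g. a phase-extraction step)
        return float(np.angle(np.fft.fft2(frame)).mean())

    if __name__ == "__main__":
        frames = [np.random.rand(256, 256) for _ in range(64)]   # synthetic stack
        with Pool(4) as pool:              # four workers, as with four virtual processors
            results = pool.map(analyse, frames, chunksize=16)
        print(len(results))
    ```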

  19. Distributed Tuning of Boundary Resources

    DEFF Research Database (Denmark)

    Eaton, Ben; Elaluf-Calderwood, Silvia; Sørensen, Carsten

    2015-01-01

    in the context of a paradoxical tension between the logic of generative and democratic innovations and the logic of infrastructural control. Boundary resources play a critical role in managing the tension as a firm that owns the infrastructure can secure its control over the service system while independent...... firms can participate in the service system. In this study, we explore the evolution of boundary resources. Drawing on Pickering’s (1993) and Barrett et al.’s (2012) conceptualizations of tuning, the paper seeks to forward our understanding of how heterogeneous actors engage in the tuning of boundary...

  20. What is "the patient perspective" in patient engagement programs? Implicit logics and parallels to feminist theories.

    Science.gov (United States)

    Rowland, Paula; McMillan, Sarah; McGillicuddy, Patti; Richards, Joy

    2017-01-01

    Public and patient involvement (PPI) in health care may refer to many different processes, ranging from participating in decision-making about one's own care to participating in health services research, health policy development, or organizational reforms. Across these many forms of public and patient involvement, the conceptual and theoretical underpinnings remain poorly articulated. Instead, most public and patient involvement programs rely on policy initiatives as their conceptual frameworks. This lack of conceptual clarity contributes to dilemmas of program design, implementation, and evaluation. This study contributes to the development of theoretical understandings of public and patient involvement. In particular, we focus on the deployment of patient engagement programs within health service organizations. To develop a deeper understanding of the conceptual underpinnings of these programs, we examined the concept of "the patient perspective" as used by patient engagement practitioners and participants. Specifically, we focused on the way this phrase was used in the singular: "the" patient perspective or "the" patient voice. From qualitative analysis of interviews with 20 patient advisers and 6 staff members within a large urban health network in Canada, we argue that "the patient perspective" is referred to as a particular kind of situated knowledge, specifically an embodied knowledge of vulnerability. We draw parallels between this logic of patient perspective and the logic of early feminist theory, including the concepts of standpoint theory and strong objectivity. We suggest that champions of patient engagement may learn much from the way feminist theorists have constructed their arguments and addressed critique.

  1. Massively parallel computation of PARASOL code on the Origin 3800 system

    International Nuclear Information System (INIS)

    Hosokawa, Masanari; Takizuka, Tomonori

    2001-10-01

    The divertor particle simulation code named PARASOL simulates open-field plasmas between divertor walls self-consistently by using an electrostatic PIC method and a binary collision Monte Carlo model. PARASOL, parallelized with MPI-1.1 for scalar parallel computers, ran on an Intel Paragon XP/S system. An SGI Origin 3800 system was newly installed in May 2001, and the parallel programming was improved at this switchover. As a result of the high-performance new hardware and this improvement, PARASOL is sped up by about 60 times with the same number of processors. (author)

  2. Towards automatic parameter tuning of stream processing systems

    KAUST Repository

    Bilal, Muhammad; Canini, Marco

    2017-01-01

    for automating parameter tuning for stream-processing systems. Our framework supports standard black-box optimization algorithms as well as a novel gray-box optimization algorithm. We demonstrate the multiple benefits of automated parameter tuning in optimizing

  3. Data-parallel tomographic reconstruction : A comparison of filtered backprojection and direct Fourier reconstruction

    NARCIS (Netherlands)

    Roerdink, J.B.T.M.; Westenberg, M.A.

    1998-01-01

    We consider the parallelization of two standard 2D reconstruction algorithms, filtered backprojection and direct Fourier reconstruction, using the data-parallel programming style. The algorithms are implemented on a Connection Machine CM-5 with 16 processors and a peak performance of 2 Gflop/s.
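
    The structure that makes the data-parallel formulation natural is visible in a serial sketch of filtered backprojection (our illustration, not the CM-5 code): the ramp filtering is independent per projection, and the backprojection accumulates one projection at a time, so both loops distribute over angles.

    ```python
    import numpy as np

    def filtered_backprojection(sinogram, angles, n):
        """sinogram: (n_angles, n_det) array of parallel-beam projections."""
        n_det = sinogram.shape[1]
        ramp = np.abs(np.fft.fftfreq(n_det))                     # ramp filter |f|
        filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp, axis=1))
        xs = np.arange(n) - (n - 1) / 2.0
        X, Y = np.meshgrid(xs, xs)
        centre = (n_det - 1) / 2.0
        image = np.zeros((n, n))
        for p, theta in zip(filtered, angles):                   # data-parallel over angles
            t = X * np.cos(theta) + Y * np.sin(theta) + centre   # detector coordinate
            image += np.interp(t.ravel(), np.arange(n_det), p, 0.0, 0.0).reshape(n, n)
        return image * np.pi / len(angles)                       # approximate normalization
    ```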

  4. Tuning and backreaction in F-term axion monodromy inflation

    Directory of Open Access Journals (Sweden)

    Arthur Hebecker

    2015-05-01

    Full Text Available We continue the development of axion monodromy inflation, focusing in particular on the backreaction of complex structure moduli. In our setting, the shift symmetry comes from a partial large complex structure limit of the underlying type IIB orientifold or F-theory fourfold. The coefficient of the inflaton term in the superpotential has to be tuned small to avoid conflict with Kähler moduli stabilisation. To allow such a tuning, this coefficient necessarily depends on further complex structure moduli. At large values of the inflaton field, these moduli are then in danger of backreacting too strongly. To avoid this, further tunings are necessary. In weakly coupled type IIB theory at the orientifold point, implementing these tunings appears to be difficult if not impossible. However, fourfolds or models with mobile D7-branes provide enough structural freedom. We calculate the resulting inflaton potential and study the feasibility of the overall tuning given the limited freedom of the flux landscape. Our preliminary investigations suggest that, even imposing all tuning conditions, the remaining choice of flux vacua can still be large enough for such models to provide a promising path to large-field inflation in string theory.

  5. Massively Parallel QCD

    International Nuclear Information System (INIS)

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-01-01

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites, while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect, with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds, LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.

  6. Self tuning fuzzy PID type load and frequency controller

    International Nuclear Information System (INIS)

    Yesil, E.; Guezelkaya, M.; Eksin, I.

    2004-01-01

    In this paper, a self-tuning fuzzy PID-type controller is proposed for solving the load frequency control (LFC) problem. The fuzzy PID-type controller is constructed as a set of control rules, and the control signal is directly deduced from the knowledge base and the fuzzy inference. Moreover, there is a self-tuning mechanism that adjusts, in an on-line manner, the input scaling factor corresponding to the derivative coefficient and the output scaling factor corresponding to the integral coefficient of the PID-type fuzzy logic controller. The self-tuning mechanism depends on the peak observer idea, which is modified and adapted to the LFC problem. A two-area interconnected system is used for demonstration. The proposed self-tuning fuzzy PID-type controller has been compared with the fuzzy PID-type controller without a self-tuning mechanism and with the conventional integral controller through several performance indices.
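
    To illustrate the flavour of on-line scaling-factor adaptation (a hedged sketch; the paper's mechanism is a fuzzy rule base driven by a peak observer, which is not reproduced here), consider a PID loop whose output scaling factor shrinks after large observed overshoot peaks:

    ```python
    class SelfTuningPID:
        def __init__(self, kp, ki, kd, dt):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral = 0.0
            self.prev_error = 0.0
            self.out_scale = 1.0              # output scaling factor, tuned on line

        def step(self, setpoint, measurement):
            e = setpoint - measurement
            self.integral += e * self.dt
            d = (e - self.prev_error) / self.dt
            self.prev_error = e
            u = self.kp * e + self.ki * self.integral + self.kd * d
            return self.out_scale * u

        def observe_peak(self, overshoot):
            # crude stand-in for the peak-observer rule: damp the output after
            # large overshoots, restoring it slowly otherwise
            self.out_scale *= 0.9 if overshoot > 0.05 else 1.01
    ```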

  7. The Research of the Parallel Computing Development from the Angle of Cloud Computing

    Science.gov (United States)

    Peng, Zhensheng; Gong, Qingge; Duan, Yanyu; Wang, Yun

    2017-10-01

    Cloud computing is the development of parallel computing, distributed computing and grid computing, and its growth has brought parallel computing into people's lives. Firstly, this paper expounds the concept of cloud computing and introduces several traditional parallel programming models. Secondly, it analyzes the principles, advantages and disadvantages of OpenMP, MPI and MapReduce respectively. Finally, it compares the MPI and OpenMP models with MapReduce from the angle of cloud computing. The results of this paper are intended to provide a reference for the development of parallel computing.
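
    The three models split the work differently: OpenMP shares loops among threads, MPI exchanges messages among processes, and MapReduce expresses the computation as a map phase emitting key-value pairs and a reduce phase aggregating them. A minimal word-count sketch of the MapReduce style (our illustration, independent of any particular framework):

    ```python
    from collections import defaultdict
    from itertools import chain

    def map_phase(line):
        return [(word, 1) for word in line.split()]   # emit (key, value) pairs

    def reduce_phase(pairs):
        counts = defaultdict(int)
        for word, n in pairs:                         # aggregate values per key
            counts[word] += n
        return dict(counts)

    lines = ["parallel computing", "cloud computing"]
    print(reduce_phase(chain.from_iterable(map_phase(l) for l in lines)))
    # -> {'parallel': 1, 'computing': 2, 'cloud': 1}
    ```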

  8. Accelerating the explicitly restarted Arnoldi method with GPUs using an auto-tuned matrix vector product

    International Nuclear Information System (INIS)

    Dubois, J.; Calvin, Ch.; Dubois, J.; Petiton, S.

    2011-01-01

    This paper presents a parallelized hybrid single-vector Arnoldi algorithm for computing approximations to eigenpairs of a nonsymmetric matrix. We are interested in the use of accelerators and multi-core units to speed up the Arnoldi process. The main goal is to propose a parallel version of the Arnoldi solver, which can efficiently use multiple multi-core processors or multiple graphics processing units (GPUs) in a mixed coarse- and fine-grain fashion. In the proposed algorithms, this is achieved by auto-tuning the matrix-vector product before starting the Arnoldi eigensolver, as well as by reorganizing the data and global communications so that communication time is reduced. The execution time, performance, and scalability are assessed with well-known dense and sparse test matrices on multiple Nehalem processors, a GT200 NVidia Tesla, and the next-generation Fermi Tesla. With one processor, we see a performance speedup of 2 to 3x when using all the physical cores, and a total speedup of 2 to 8x when adding a GPU to this multi-core unit, and hence a speedup of 4 to 24x compared to the sequential solver. (authors)
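
    The Arnoldi process at the heart of such a solver is short; a minimal dense sketch (the standard algorithm, not the paper's tuned GPU kernels) shows why the matrix-vector product dominates and is the natural target for auto-tuning:

    ```python
    import numpy as np

    def arnoldi(A, v0, m):
        """Build V (orthonormal Krylov basis) and H (Hessenberg) with A V_m = V_{m+1} H."""
        n = len(v0)
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = v0 / np.linalg.norm(v0)
        for j in range(m):
            w = A @ V[:, j]                  # the matrix-vector product to auto-tune
            for i in range(j + 1):           # modified Gram-Schmidt orthogonalization
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]    # assumes no breakdown (norm > 0)
        return V, H

    A = np.random.rand(100, 100)
    V, H = arnoldi(A, np.random.rand(100), 20)
    print(np.linalg.norm(A @ V[:, :20] - V @ H))   # residual of the Arnoldi relation, ~1e-14
    ```

    Ritz approximations to the eigenpairs then come from the small matrix H[:m, :m].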

  9. Performance evaluation of the HEP, ELXSI and CRAY X-MP parallel processors on hydrocode test problems

    International Nuclear Information System (INIS)

    Liebrock, L.M.; McGrath, J.F.; Hicks, D.L.

    1986-01-01

    Parallel programming promises improved processing speeds for hydrocodes, magnetohydrocodes, multiphase flow codes, thermal-hydraulics codes, wavecodes and other continuum dynamics codes. This paper presents the results of investigations of parallel algorithms for computational continuum dynamics on three parallel processors: the CRAY X-MP, ELXSI and HEP computers. These programs (hydrocodes, wavecodes, etc.) produce simulations of the solutions to problems arising in the motion of continua: solid dynamics, liquid dynamics, gas dynamics, plasma dynamics, multiphase flow dynamics, thermal-hydraulic dynamics and multimaterial flow dynamics. This report restricts its scope to one-dimensional algorithms such as the von Neumann-Richtmyer (1950) scheme.

  10. A high-quality narrow passband filter for elastic SV waves via aligned parallel separated thin polymethylmethacrylate plates

    Directory of Open Access Journals (Sweden)

    Jun Zhang

    2017-08-01

    Full Text Available We designed a high-quality filter, consisting of aligned parallel polymethylmethacrylate (PMMA) thin plates with small gaps, for elastic SV waves propagating in metals. Both the theoretical model and the full numerical simulation show that the transmission spectrum of elastic SV waves through such a filter has several sharp peaks with flawless transmission within the investigated frequencies. These peaks can be readily tuned by manipulating the geometry parameters of the PMMA plates. Our investigation finds that the same filter performs well for different metals in which the elastic SV waves propagate.

  11. Protocol-Based Verification of Message-Passing Parallel Programs

    DEFF Research Database (Denmark)

    López-Acosta, Hugo-Andrés; Eduardo R. B. Marques, Eduardo R. B.; Martins, Francisco

    2015-01-01

    We present ParTypes, a type-based methodology for the verification of Message Passing Interface (MPI) programs written in the C programming language. The aim is to statically verify programs against protocol specifications, enforcing properties such as fidelity and absence of deadlocks. We develo...

  12. Big Data GPU-Driven Parallel Processing Spatial and Spatio-Temporal Clustering Algorithms

    Science.gov (United States)

    Konstantaras, Antonios; Skounakis, Emmanouil; Kilty, James-Alexander; Frantzeskakis, Theofanis; Maravelakis, Emmanuel

    2016-04-01

    Advances in graphics processing unit technology towards parallel architectures [1], comprising thousands of cores and multiples of parallel threads, provide the hardware foundation for the rapid processing of various parallel applications in seismic big-data analysis. Seismic data are normally stored as collections of vectors in massive matrices, growing rapidly in size as wider areas are covered, denser recording networks are established and decades of data are compiled together [2]. Yet many processes in seismic data analysis are performed on each seismic event independently, or as distinct tiles [3] of specific grouped seismic events within a much larger data set. Such processes, independent of one another, can be performed in parallel, narrowing down processing times drastically [1,3]. This research work presents the development and implementation of three parallel processing algorithms using Cuda C [4] for the investigation of potentially distinct seismic regions [5,6] present in the vicinity of the southern Hellenic seismic arc. The algorithms, programmed and executed in parallel for comparison, are: fuzzy k-means clustering with expert knowledge [7] in assigning the overall number of clusters; density-based clustering [8]; and a self-developed spatio-temporal clustering algorithm encompassing expert [9] and empirical knowledge [10] for the specific area under investigation. Indexing terms: GPU parallel programming, Cuda C, heterogeneous processing, distinct seismic regions, parallel clustering algorithms, spatio-temporal clustering References [1] Kirk, D. and Hwu, W.: 'Programming massively parallel processors - A hands-on approach', 2nd Edition, Morgan Kaufman Publisher, 2013 [2] Konstantaras, A., Valianatos, F., Varley, M.R. and Makris, J.P.: 'Soft-Computing Modelling of Seismicity in the Southern Hellenic Arc', Geoscience and Remote Sensing Letters, vol. 5 (3), pp. 323-327, 2008 [3] Papadakis, S. and
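
    The assignment step that dominates such clustering is embarrassingly parallel: each event's nearest centre is computed independently, which is what maps onto thousands of GPU threads. A hedged CPU sketch of that step (our illustration in Python; the paper's implementation is in Cuda C, and the centres below are invented placeholders):

    ```python
    import numpy as np
    from multiprocessing import Pool

    CENTRES = np.array([[35.0, 25.0], [36.5, 27.0], [34.2, 26.1]])   # placeholder centres

    def assign(chunk):
        # distance of every event in the chunk to every centre, then nearest index
        d = np.linalg.norm(chunk[:, None, :] - CENTRES[None, :, :], axis=2)
        return d.argmin(axis=1)

    if __name__ == "__main__":
        events = np.random.rand(100_000, 2) * 3 + 34.0    # synthetic epicentres
        with Pool(8) as pool:                             # 8 workers stand in for GPU blocks
            labels = np.concatenate(pool.map(assign, np.array_split(events, 8)))
        print(np.bincount(labels))
    ```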

  13. Non-parametric Tuning of PID Controllers A Modified Relay-Feedback-Test Approach

    CERN Document Server

    Boiko, Igor

    2013-01-01

    The relay feedback test (RFT) has become a popular and efficient tool used in process identification and automatic controller tuning. Non-parametric Tuning of PID Controllers couples new modifications of the classical RFT with application-specific optimal tuning rules to form a non-parametric method of test-and-tuning. Test and tuning are coordinated through a set of common parameters so that a PID controller can obtain the desired gain or phase margins in a system exactly, even with unknown process dynamics. The concept of process-specific optimal tuning rules in the non-parametric setup, with corresponding tuning rules for flow, level, pressure, and temperature control loops, is presented in the text. Common problems of tuning accuracy based on parametric and non-parametric approaches are addressed. In addition, the text treats the parametric approach to tuning based on the modified RFT approach and the exact model of oscillations in the system under test using the locus of a perturbed relay system (LPRS) meth...

  14. TME (Task Mapping Editor): tool for executing distributed parallel computing. TME user's manual

    International Nuclear Information System (INIS)

    Takemiya, Hiroshi; Yamagishi, Nobuhiro; Imamura, Toshiyuki

    2000-03-01

    At the Center for Promotion of Computational Science and Engineering, a software environment PPExe has been developed to support scientific computing on a parallel computer cluster (distributed parallel scientific computing). TME (Task Mapping Editor) is one of the components of PPExe and provides a visual programming environment for distributed parallel scientific computing. Users can specify data dependences among tasks (programs) visually as a data-flow diagram and map these tasks onto computers interactively through the GUI of TME. The specified tasks are processed by other components of PPExe, such as the Meta-scheduler, RIM (Resource Information Monitor), and EMS (Execution Management System), according to the execution order determined by TME. In this report, we describe the usage of TME. (author)

  15. Center for Programming Models for Scalable Parallel Computing - Towards Enhancing OpenMP for Manycore and Heterogeneous Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Barbara Chapman

    2012-02-01

    OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years, it has been gradually adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.

  16. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  17. The CRAFT Fortran Programming Model

    Directory of Open Access Journals (Sweden)

    Douglas M. Pase

    1994-01-01

    Full Text Available Many programming models for massively parallel machines exist, and each has its advantages and disadvantages. In this article we present a programming model that combines features from other programming models that (1) can be efficiently implemented on present and future Cray Research massively parallel processor (MPP) systems and (2) are useful in constructing highly parallel programs. The model supports several styles of programming: message-passing, data parallel, global address (shared data), and work-sharing. These styles may be combined within the same program. The model includes features that allow a user to define a program in terms of the behavior of the system as a whole, where the behavior of individual tasks is implicit from this systemic definition. (In general, features marked as shared are designed to support this perspective.) It also supports an opposite perspective, where a program may be defined in terms of the behaviors of individual tasks, and a program is implicitly the sum of the behaviors of all tasks. (Features marked as private are designed to support this perspective.) Users can exploit any combination of either set of features without ambiguity and thus are free to define a program from whatever perspective is most appropriate to the problem at hand.

  18. Static and dynamic load-balancing strategies for parallel reservoir simulation

    International Nuclear Information System (INIS)

    Anguille, L.; Killough, J.E.; Li, T.M.C.; Toepfer, J.L.

    1995-01-01

    Accurate simulation of the complex phenomena that occur in flow in porous media can tax even the most powerful serial computers. The emergence of new parallel computer architectures as a future efficient tool in reservoir simulation may overcome this difficulty. Unfortunately, major problems remain to be solved before parallel computers can be used commercially: production serial programs must be rewritten to be efficient in parallel environments, and load-balancing methods must be explored to evenly distribute the workload on each processor during the simulation. This study implements both a static load-balancing algorithm and a receiver-initiated dynamic load-sharing algorithm to achieve high parallel efficiencies on both the IBM SP2 and Intel iPSC/860 parallel computers. Significant speedup improvement was recorded for both methods. Further optimization of these algorithms yielded a technique with efficiencies as high as 90% and 70% on 8 and 32 nodes, respectively. The increased performance was the result of the minimization of message-passing overhead.
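
    The difference between the two strategies is easy to sketch (our illustration, not the reservoir simulator; the task body is a stand-in): static balancing fixes the work partition up front, while dynamic sharing lets idle workers pull the next task as they finish.

    ```python
    import random
    import time
    from multiprocessing import Pool

    def task(cell_block):
        # stand-in for advancing one block of grid cells; cost varies per block
        time.sleep(random.random() * 0.01)
        return cell_block

    if __name__ == "__main__":
        blocks = list(range(256))
        with Pool(8) as pool:
            # static: one fixed contiguous chunk per worker
            pool.map(task, blocks, chunksize=len(blocks) // 8)
            # dynamic: workers pull one block at a time as they become idle
            pool.map(task, blocks, chunksize=1)
    ```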

  19. Parallel computation for solving the tridiagonal linear system of equations

    International Nuclear Information System (INIS)

    Ishiguro, Misako; Harada, Hiroo; Fujii, Minoru; Fujimura, Toichiro; Nakamura, Yasuhiro; Nanba, Katsumi.

    1981-09-01

    Recently, applications of parallel computation to scientific calculations have increased with the need for high-speed calculation in large-scale programs. At the JAERI computing center, an array processor, the FACOM 230-75 APU, has been installed to study the applicability of parallel computation to nuclear codes. We made numerical experiments using the APU on methods for solving tridiagonal linear equations, an important problem in scientific calculations. Referring to recent papers on parallel methods, we investigate eight of them: the Gauss elimination method, the parallel Gauss method, the accelerated parallel Gauss method, the Jacobi method, the recursive doubling method, the cyclic reduction method, the Chebyshev iteration method, and the conjugate gradient method. The computing time and accuracy were compared among the methods on the basis of the numerical experiments. As a result, the cyclic reduction method is found to be best in both computing time and accuracy, with the Gauss elimination method second. (author)
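
    The method the experiments rank best is easy to state in code. A minimal serial sketch of cyclic reduction (our illustration; on an array processor, the strided rows at each level are updated simultaneously) for a system of size n = 2^m - 1, with the unused entries a[0] = c[-1] = 0:

    ```python
    import numpy as np

    def cyclic_reduction(a, b, c, d):
        """Solve T x = d, T tridiagonal with sub-diagonal a, diagonal b, super-diagonal c."""
        a, b, c, d = (np.asarray(v, float).copy() for v in (a, b, c, d))
        n = len(b)
        m = int(np.log2(n + 1))
        for l in range(1, m):                  # forward elimination
            s, h = 2 ** l, 2 ** (l - 1)
            for i in range(s - 1, n, s):       # these rows are mutually independent
                al, be = a[i] / b[i - h], c[i] / b[i + h]
                b[i] -= al * c[i - h] + be * a[i + h]
                d[i] -= al * d[i - h] + be * d[i + h]
                a[i], c[i] = -al * a[i - h], -be * c[i + h]
        x = np.zeros(n)
        x[n // 2] = d[n // 2] / b[n // 2]      # single remaining equation
        for l in range(m - 1, 0, -1):          # back substitution, also strided
            s, h = 2 ** l, 2 ** (l - 1)
            for i in range(h - 1, n, s):
                lft = a[i] * x[i - h] if i - h >= 0 else 0.0
                rgt = c[i] * x[i + h] if i + h < n else 0.0
                x[i] = (d[i] - lft - rgt) / b[i]
        return x

    n = 7
    a = np.r_[0.0, -np.ones(n - 1)]; c = np.r_[-np.ones(n - 1), 0.0]
    d = np.zeros(n); d[0] = d[-1] = 1.0
    print(cyclic_reduction(a, np.full(n, 2.0), c, d))   # discrete Laplacian; x = all ones
    ```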

  20. Comparison of the Efficiency of Tuned Mass and Tuned Liquid Dampers at High-Rise Structures under Near and Far Fault Earthquakes

    Directory of Open Access Journals (Sweden)

    Hamed Rahman Shokrgozar

    2017-02-01

    Full Text Available Tuned mass and tuned liquid dampers are the most common passive control systems used to decrease the seismic responses of buildings. In this study, the performance of high-rise buildings with TM and TL dampers is evaluated under seven near-fault and seven far-fault earthquakes. For this purpose, a twenty-four-story steel moment-frame building has been considered, and time-history dynamic analyses are performed for both the controlled and uncontrolled states. Moreover, this building has also been modelled with five different mass, stiffness and damping ratios. The results show that the reduction of structural responses in tall buildings is greater under near-fault earthquakes than under far-fault earthquakes, due to the effect of higher modes. Furthermore, the tuned mass damper performs better at reducing the responses than the tuned liquid damper.