WorldWideScience

Sample records for high-level parallel programming

  1. High-Level Parallel Programming.

    Science.gov (United States)

    ...parallel programming languages. These issues were evaluated via the use of a language called UC. UC is a programming language aimed at balancing notational simplicity with execution efficiency and portability. UC accomplishes this by separating the programming task from the efficiency issues. This report gives a description of the language, its current implementation, its verification methodology, and its use in designing various ...

  2. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

    The development of computer architectures during the last ten years has forced programmers to move towards writing parallel programs instead of sequential ones. The homogeneous multi-core architectures from the major CPU producers like Intel and AMD have led this trend, but the introduction ..., the general increase in the usage of graphics cards for general-purpose programming (GPGPU) has meant that programmers today must be able to write parallel programs that can utilize not only a small number of computational cores but perhaps hundreds or even thousands. However, most programmers will agree that doing so is not a simple task, and for many non-computer scientists, like chemists and physicists writing programs for simulating their experiments, the task can easily become overwhelming. During the last decades, a lot of research effort has been put into creating tools that will simplify writing ...

  3. Parallel Libraries to support High-Level Programming

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard

    ... of the more exotic, though short-lived, heterogeneous CELL Broadband Engine (CELL-BE) architecture added to this shift. Furthermore, the use of cluster computers made of commodity hardware and specialized supercomputers has greatly increased in industry as well as in the academic world. Finally ... as if they were a single machine. In between are a number of tools helping the programmers handle communication, share data, run loops in parallel, handle algorithms mining huge amounts of data, etc. Even though most of them do a good job performance-wise, almost all of them require that the programmers learn ... model requires the programmer to think a bit differently, but at the same time the implemented algorithms will perform very well, as shown by the initial tests presented. In the second part of this thesis, I will change focus from the CELL-BE architecture to the more traditional x86 architecture ...

  4. Parallel programming with PCN

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).
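
    PCN's core idea, composing concurrently executing components while reusing existing C and Fortran code inside the composition, can be loosely illustrated with standard C++ threads. The sketch below is only an analogue of that idea, not PCN notation:

    ```cpp
    // Loose C++ analogue of PCN-style parallel composition (not PCN
    // notation): two reusable "legacy" routines run concurrently and
    // the composition completes when both have finished.
    #include <cstdio>
    #include <thread>

    // Stand-ins for the existing C/Fortran code PCN would wrap.
    void legacy_c_task()       { std::puts("C task done"); }
    void legacy_fortran_task() { std::puts("Fortran task done"); }

    int main() {
        std::thread a(legacy_c_task);       // run both components in parallel
        std::thread b(legacy_fortran_task);
        a.join();                           // the composition finishes when
        b.join();                           // both components finish
        return 0;
    }
    ```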

  5. Hardware-software-co-design of parallel and distributed systems using a behavioural programming and multi-process model with high-level synthesis

    Science.gov (United States)

    Bosse, Stefan

    2011-05-01

    A new design methodology for parallel and distributed embedded systems is presented, using the behavioural hardware compiler ConPro, which provides an imperative programming model based on concurrently communicating sequential processes (CSP) with an extensive set of interprocess-communication primitives and guarded atomic actions. The programming language and the compiler-based synthesis process enable the design of constrained power- and resource-aware embedded systems with pure Register-Transfer Logic (RTL) efficiently mapped to FPGA and ASIC technologies. Concurrency is modelled explicitly on the control- and data-path level. Additionally, concurrency on the data-path level can be automatically explored and optimized by different schedulers. The CSP programming model can be synthesized to hardware (SoC) and software (C, ML) models and targets. A common source for both hardware and software implementation with identical functional behaviour is used. Processes and objects of the entire design can be distributed on different hardware and software platforms, for example, several FPGA components and software executed on several microprocessors, providing a parallel and distributed system. Intersystem, interprocess, and object communication is automatically implemented with serial links, not visible at the programming level. The presented design methodology has the benefits of high modularity and freedom of choice of target technologies and system architecture. Algorithms can be well matched to, and distributed on, different suitable execution platforms and implementation technologies, using a unique programming model and providing a balance of concurrency and resource complexity. An extended case study of a communication protocol used in high-density sensor-actuator networks demonstrates and compares the design of hardware and software targets. The communication protocol is suited for high-density intra- and interchip networks.
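
    The CSP model described here (sequential processes plus blocking interprocess communication) can be sketched in software; the following is a plain C++17 illustration of the model, not ConPro syntax, and the Channel class is a hypothetical name:

    ```cpp
    // Minimal C++ analogue of the CSP model above (not ConPro syntax):
    // two sequential processes communicate over a one-slot channel
    // guarded by a mutex and condition variable.
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <optional>
    #include <thread>

    template <typename T>
    class Channel {                 // one-slot blocking channel
        std::mutex m;
        std::condition_variable cv;
        std::optional<T> slot;
    public:
        void send(T v) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !slot.has_value(); });  // wait for empty slot
            slot = std::move(v);
            cv.notify_all();
        }
        T recv() {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return slot.has_value(); });   // wait for a value
            T v = std::move(*slot);
            slot.reset();
            cv.notify_all();
            return v;
        }
    };

    int main() {
        Channel<int> ch;
        std::thread producer([&] { for (int i = 0; i < 3; ++i) ch.send(i); });
        std::thread consumer([&] { for (int i = 0; i < 3; ++i)
                                       std::cout << ch.recv() << '\n'; });
        producer.join();
        consumer.join();
    }
    ```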

  6. Parallel programming with PCN

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  7. Developing Parallel Programs

    Directory of Open Access Journals (Sweden)

    Ranjan Sen

    2012-09-01

    Parallel programming is an extension of sequential programming; today, it is becoming the mainstream paradigm in day-to-day information processing. Its aim is to build the fastest programs on parallel computers. The methodologies for developing a parallel program can be put into integrated frameworks. Development focuses on algorithms, languages, and how the program is deployed on the parallel computer.

  8. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race ...
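
    Of the topics the book lists, race conditions are the easiest to demonstrate concretely. A minimal C++ illustration (my own example, not taken from the book, which uses Fortran):

    ```cpp
    // Concurrent increments need synchronization to be correct; this
    // is the kind of race condition discussed under "barriers and race".
    #include <atomic>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        std::atomic<long> counter{0};   // a plain `long` here would be a data race
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back([&] {
                for (int i = 0; i < 100000; ++i)
                    counter.fetch_add(1, std::memory_order_relaxed);
            });
        for (auto& w : workers) w.join();
        std::cout << counter << '\n';   // always 400000 with the atomic
    }
    ```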

  9. Very-High Level Concurrent Programming.

    Science.gov (United States)

    1984-12-01

    ...individual modules at runtime (section 11.8). 3.5 CODE GENERATION: In this phase, the Configurator generates JCL and PVI programs for the execution of a ... A pair of nodes v and w in a configuration is said to be DIRECTLY EQUIVALENT, denoted v <-> w, if and only if ...

  10. Parallel Programming Paradigms

    Science.gov (United States)

    1987-07-01

    Parallel Programming Paradigms. Philip Arne Nelson, Department of Computer ... studied. ... 8416878 and by the Office of Naval Research Contracts No. N00014-86-K-0264 and No. N00014-85-K-0328.

  11. Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore

    Energy Technology Data Exchange (ETDEWEB)

    Liao, C; Quinlan, D J; Willcock, J J; Panas, T

    2008-12-12

    Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructure which preserves the high-level abstractions and gives us access to their semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-based computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.
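
    As an illustration of the transformation described, the sketch below shows the kind of code a semantics-aware parallelizer could emit for an STL loop once its iterations are known to be independent; it is an assumed example, not actual ROSE output:

    ```cpp
    // Assumed output of a semantic-aware parallelizer: knowing that
    // std::vector provides random access and that iterations are
    // independent lets the tool emit an OpenMP index loop.
    #include <cstddef>
    #include <vector>

    void scale(std::vector<double>& a, double s) {
        const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
        #pragma omp parallel for
        for (std::ptrdiff_t i = 0; i < n; ++i)
            a[i] *= s;
    }

    int main() {
        std::vector<double> v(1000, 1.0);
        scale(v, 2.0);                  // compile with -fopenmp
        return v[999] == 2.0 ? 0 : 1;
    }
    ```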

  12. High-level waste immobilization program: an overview

    Energy Technology Data Exchange (ETDEWEB)

    Bonner, W.R.

    1979-09-01

    The High-Level Waste Immobilization Program is providing technology to allow safe, affordable immobilization and disposal of nuclear waste. Waste forms and processes are being developed on a schedule consistent with national needs for immobilization of high-level wastes stored at Savannah River, Hanford, Idaho National Engineering Laboratory, and West Valley, New York. This technology is directly applicable to high-level wastes from potential reprocessing of spent nuclear fuel. The program is removing one more obstacle previously seen as a potential restriction on the use and further development of nuclear power, and is thus meeting a critical technological need within the national objective of energy independence.

  13. Semantic-Aware Automatic Parallelization of Modern Applications Using High-Level Abstractions

    Energy Technology Data Exchange (ETDEWEB)

    Liao, C; Quinlan, D J; Willcock, J J; Panas, T

    2009-12-21

    Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. Modern applications using high-level abstractions, such as C++ STL containers and complex user-defined class types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we use a source-to-source compiler infrastructure, ROSE, to explore compiler techniques to recognize high-level abstractions and to exploit their semantics for automatic parallelization. Several representative parallelization candidate kernels are used to study semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Preliminary results have shown that semantics of abstractions can help extend the applicability of automatic parallelization to modern applications and expose more opportunities to take advantage of multicore processors.

  14. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio. Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore processors and leverage their power in your programs. Sharing hands-on case studies and real-world examples, the ...

  15. Hanford long-term high-level waste management program

    Energy Technology Data Exchange (ETDEWEB)

    Wodrich, D.D.

    1976-06-24

    An overview of the Hanford Long-Term High-Level Waste Management Program is presented. Four topics are discussed: first, the kinds and quantities of waste that will exist and are included in this program; second, how the plan is structured to solve this problem; third, the alternative waste management methods being considered; and fourth, the technology program that is in progress to carry out this plan. (LK)

  16. Parallel programming with PCN. Revision 1

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  17. Parallel programming with MPI

    Energy Technology Data Exchange (ETDEWEB)

    Tatebe, Osamu [Electrotechnical Lab., Tsukuba, Ibaraki (Japan)]

    1998-03-01

    MPI is a practical, portable, efficient, and flexible standard for message passing, which has been implemented on most MPPs and networks of workstations by machine vendors, universities, and national laboratories. MPI avoids specifying how operations will take place and avoids superfluous work, to achieve efficiency as well as portability, and it is also designed to encourage the overlapping of communication and computation to hide communication latencies. This presentation briefly explains the MPI standard and comments on efficient parallel programming to improve performance. (author)
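
    A minimal example of the communication/computation overlap the abstract highlights, written against the standard MPI C API (a nonblocking ring exchange; compile with an MPI wrapper such as mpicxx and launch with mpirun):

    ```cpp
    // Nonblocking MPI send/receive lets local computation proceed
    // while messages are in flight, hiding communication latency.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int right = (rank + 1) % size, left = (rank + size - 1) % size;
        int out = rank, in = -1;
        MPI_Request reqs[2];
        MPI_Irecv(&in,  1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&out, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);

        double local = 0.0;                 // overlapped local computation
        for (int i = 0; i < 1000; ++i)
            local += i * 0.5;

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        std::printf("rank %d received %d (local=%g)\n", rank, in, local);
        MPI_Finalize();
        return 0;
    }
    ```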

  18. Architectural Adaptability in Parallel Programming

    Science.gov (United States)

    1991-05-01

    AD-A247 516. Architectural Adaptability in Parallel Programming, by Lawrence Alan Crowl. Technical Report 381, May 1991, University of Rochester, Computer Science. Submitted in partial fulfillment of the ... in the development of their programs. In applying abstraction to parallel programming, we can use abstractions to represent potential parallelism ...

  19. High-level waste management technology program plan

    Energy Technology Data Exchange (ETDEWEB)

    Harmon, H.D.

    1995-01-01

    The purpose of this plan is to document the integrated technology program plan for the Savannah River Site (SRS) High-Level Waste (HLW) Management System. The mission of the SRS HLW System is to receive and store SRS high-level wastes in a safe and environmentally sound manner, and to convert these wastes into forms suitable for final disposal. These final disposal forms are borosilicate glass to be sent to the Federal Repository, Saltstone grout to be disposed of on site, and treated waste water to be released to the environment via a permitted outfall. Thus, the technology development activities described herein are those activities required to enable successful accomplishment of this mission. The technology program is based on specific needs of the SRS HLW System and organized following the systems engineering level 3 functions. Technology needs for each level 3 function are listed as reference, enhancements, and alternatives. Finally, FY-95 funding, deliverables, and schedules are summarized in Chapter IV, with details on the specific tasks that are funded in FY-95 provided in Appendix A. The information in this report represents the vision of activities as defined at the beginning of the fiscal year. Depending on emergent issues, funding changes, and other factors, programs and milestones may be adjusted during the fiscal year. The FY-95 SRS HLW technology program strongly emphasizes startup support for the Defense Waste Processing Facility and In-Tank Precipitation. Closure of technical issues associated with these operations has been given highest priority. Consequently, efforts on longer term enhancements and alternatives are receiving minimal funding. However, High-Level Waste Management is committed to participation in the national Radioactive Waste Tank Remediation Technology Focus Area. 4 refs., 5 figs., 9 tabs.

  20. Design of a Real-Time Face Detection Parallel Architecture Using High-Level Synthesis

    Directory of Open Access Journals (Sweden)

    Yang Fan

    2008-01-01

    We describe a High-Level Synthesis implementation of a parallel architecture for face detection. The chosen face detection method is the well-known Convolutional Face Finder (CFF) algorithm, which consists of a pipeline of convolution operations. We rely on dataflow modelling of the algorithm and we use a high-level synthesis tool in order to specify the local dataflows of our Processing Element (PE), by describing in C language inter-PE communication, fine scheduling of the successive convolutions, and memory distribution and bandwidth. Using this approach, we explore several implementation alternatives in order to find a compromise between processing speed and area of the PE. We then build a parallel architecture composed of a PE ring and a FIFO memory, which constitutes a generic architecture capable of processing images of different sizes. A ring of 25 PEs running at 80 MHz is able to process 127 QVGA images per second or 35 VGA images per second.
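
    The building block of the CFF pipeline is a 2D convolution such as the one below; this plain sequential C++ version is only meant to make the per-PE work concrete, since the paper's contribution lies in scheduling such loops across a ring of PEs:

    ```cpp
    // Valid-mode 2D convolution: out is (h-k+1) x (w-k+1). A CFF-style
    // pipeline chains several of these, which is what the PE ring schedules.
    #include <vector>

    std::vector<float> conv2d(const std::vector<float>& img, int h, int w,
                              const std::vector<float>& ker, int k) {
        int oh = h - k + 1, ow = w - k + 1;
        std::vector<float> out(oh * ow, 0.0f);
        for (int y = 0; y < oh; ++y)
            for (int x = 0; x < ow; ++x)
                for (int i = 0; i < k; ++i)
                    for (int j = 0; j < k; ++j)
                        out[y * ow + x] += img[(y + i) * w + (x + j)] * ker[i * k + j];
        return out;
    }

    int main() {
        std::vector<float> img(16, 1.0f), ker(9, 1.0f);   // 4x4 image, 3x3 kernel
        auto out = conv2d(img, 4, 4, ker, 3);             // 2x2 result, each 9.0
        return out[0] == 9.0f ? 0 : 1;
    }
    ```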

  21. Computer Assisted Parallel Program Generation

    CERN Document Server

    Kawata, Shigeo

    2015-01-01

    Parallel computation is widely employed in scientific research, engineering activities, and product development. Parallel program writing itself is not always a simple task, depending on the problem being solved. Large-scale scientific computing, huge data analyses, and precise visualizations, for example, would require parallel computation, and parallel computing needs parallelization techniques. In this chapter, support for parallel program generation is discussed, and a computer-assisted parallel program generation system, P-NCAS, is introduced. Computer-assisted problem solving is one of the key methods to promote innovations in science and engineering, and contributes to enriching our society and our life toward a programming-free environment in computing science. Problem solving environment (PSE) research activities started to enhance the programming power in the 1970s. P-NCAS is one of the PSEs; the PSE concept provides an integrated human-friendly computational software and hardware system to solve a target ...

  22. Approach of generating parallel programs from parallelized algorithm design strategies

    Institute of Scientific and Technical Information of China (English)

    WAN Jian-yi; LI Xiao-ying

    2008-01-01

    Today, parallel programming is dominated by message passing libraries, such as the message passing interface (MPI). This article intends to simplify parallel programming by generating parallel programs from parallelized algorithm design strategies. It uses skeletons to abstract parallelized algorithm design strategies, as well as parallel architectures. Starting from a problem specification, an abstract parallel programming language (Apla+) program is generated from parallelized algorithm design strategies and problem-specific function definitions. By combining with parallel architectures, the implicit parallelism inside the parallelized algorithm design strategies is exploited. With implementation and transformation, a C++ with parallel virtual machine (CPPVM) parallel program is finally generated. The parallelized branch and bound (B&B) and parallelized divide and conquer (D&C) algorithm design strategies are studied in this article as examples, and the approach is illustrated with a case study.
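
    The divide-and-conquer skeleton idea can be sketched in a few lines of C++; the rendering below is hypothetical (it does not reproduce Apla+ or CPPVM), but it shows the separation the article describes between a reusable strategy and problem-specific functions:

    ```cpp
    // Hypothetical divide-and-conquer skeleton (not Apla+/CPPVM): the
    // strategy is written once; problem-specific functions are plugged
    // in, and subproblems run as parallel tasks.
    #include <future>
    #include <iostream>
    #include <numeric>
    #include <vector>

    template <typename P, typename R, typename IsBase, typename Base,
              typename Divide, typename Combine>
    R dandc(P p, IsBase is_base, Base base, Divide divide, Combine combine) {
        if (is_base(p)) return base(p);
        std::vector<std::future<R>> futs;
        for (auto& sub : divide(p))          // solve subproblems in parallel
            futs.push_back(std::async(std::launch::async, [&, sub] {
                return dandc<P, R>(sub, is_base, base, divide, combine);
            }));
        std::vector<R> results;
        for (auto& f : futs) results.push_back(f.get());
        return combine(results);
    }

    struct Range { int lo, hi; };

    int main() {
        // Example instantiation: parallel sum of 0 .. 65535.
        long sum = dandc<Range, long>(
            Range{0, 1 << 16},
            [](Range r) { return r.hi - r.lo <= 1024; },
            [](Range r) { long s = 0; for (int i = r.lo; i < r.hi; ++i) s += i; return s; },
            [](Range r) { int m = (r.lo + r.hi) / 2;
                          return std::vector<Range>{{r.lo, m}, {m, r.hi}}; },
            [](const std::vector<long>& v) { return std::accumulate(v.begin(), v.end(), 0L); });
        std::cout << sum << '\n';            // prints 2147450880
    }
    ```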

  23. Patterns For Parallel Programming

    CERN Document Server

    Mattson, Timothy G; Massingill, Berna L

    2005-01-01

    From grids and clusters to next-generation game consoles, parallel computing is going mainstream. Innovations such as Hyper-Threading Technology, HyperTransport Technology, and multicore microprocessors from IBM, Intel, and Sun are accelerating the movement's growth. Only one thing is missing: programmers with the skills to meet the soaring demand for parallel software.

  24. Defense High-Level Waste Leaching Mechanisms Program. Final report

    Energy Technology Data Exchange (ETDEWEB)

    Mendel, J.E. (compiler)

    1984-08-01

    The Defense High-Level Waste Leaching Mechanisms Program brought six major US laboratories together for three years of cooperative research. The participants reached a consensus that solubility of the leached glass species, particularly solubility in the altered surface layer, is the dominant factor controlling the leaching behavior of defense waste glass in a system in which the flow of leachant is constrained, as it will be in a deep geologic repository. Also, once the surface of waste glass is contacted by ground water, the kinetics of establishing solubility control are relatively rapid. The concentrations of leached species reach saturation, or steady-state concentrations, within a few months to a year at 70 to 90°C. Thus, reaction kinetics, which were the main subject of earlier leaching mechanisms studies, are now shown to assume much less importance. The dominance of solubility means that the leach rate is, in fact, directly proportional to ground water flow rate. Doubling the flow rate doubles the effective leach rate. This relationship is expected to obtain in most, if not all, repository situations.

  25. Parallel programming with PCN. Revision 2

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  26. Parallel programming with Python

    CERN Document Server

    Palach, Jan

    2014-01-01

    A fast, easy-to-follow, and clear tutorial to help you develop parallel computing systems using Python. Along with explaining the fundamentals, the book will also introduce you to slightly advanced concepts and will help you in implementing these techniques in the real world. If you are an experienced Python programmer and are willing to utilize the available computing resources by parallelizing applications in a simple way, then this book is for you. You are required to have a basic knowledge of Python development to get the most out of this book.

  27. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  28. Writing parallel programs that work

    CERN Document Server

    CERN. Geneva

    2012-01-01

    Serial algorithms typically run inefficiently on parallel machines. This may sound like an obvious statement, but it is the root cause of why parallel programming is considered to be difficult. The current state of the computer industry is still that almost all programs in existence are serial. This talk will describe the techniques used in the Intel Parallel Studio to provide a developer with the tools necessary to understand the behaviors and limitations of the existing serial programs. Once the limitations are known, the developer can refactor the algorithms and reanalyze the resulting programs with the tools in the Intel Parallel Studio to create parallel programs that work. About the speaker: Paul Petersen is a Sr. Principal Engineer in the Software and Solutions Group (SSG) at Intel. He received a Ph.D. degree in Computer Science from the University of Illinois in 1993. After UIUC, he was employed at Kuck and Associates, Inc. (KAI) working on the auto-parallelizing compiler (KAP), and was involved in th...

  29. A Heterogeneous Parallel Programming Capability

    Science.gov (United States)

    1990-11-30

    ...the various implementations of Express attempted to address only the first of these issues - providing a portable, standard platform for parallel programming on a wide variety of different systems. Each implementation, however, was independent, but allowed programs to execute on a single ...

  30. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    Science.gov (United States)

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-01

    ...environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long-time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
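
    The parallel-in-time idea behind this paper follows the parareal pattern: a cheap coarse propagator sweeps serially, while the accurate fine propagator (standing in for the expensive ab initio force evaluation) can be applied to all time slices in parallel. A minimal scalar-ODE sketch, with assumed propagators G and F:

    ```cpp
    // Minimal parareal-style sketch (assumed structure) for dy/dt = -y.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    double G(double y, double dt) { return y * (1.0 - dt); }   // one Euler step
    double F(double y, double dt) {                            // 100 small steps
        for (int i = 0; i < 100; ++i) y *= (1.0 - dt / 100);
        return y;
    }

    int main() {
        const int N = 10;                  // time slices
        const double dt = 0.1;
        std::vector<double> y(N + 1), g_old(N + 1), f(N + 1);
        y[0] = 1.0;
        for (int i = 0; i < N; ++i)        // initial serial coarse sweep
            y[i + 1] = g_old[i + 1] = G(y[i], dt);
        for (int k = 0; k < 5; ++k) {      // parareal iterations
            for (int i = 0; i < N; ++i)    // fine solves: embarrassingly parallel
                f[i + 1] = F(y[i], dt);
            for (int i = 0; i < N; ++i) {  // serial correction sweep
                double g_new = G(y[i], dt);
                y[i + 1] = g_new + f[i + 1] - g_old[i + 1];
                g_old[i + 1] = g_new;
            }
        }
        std::printf("y(1) = %.6f (exact %.6f)\n", y[N], std::exp(-1.0));
    }
    ```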

  31. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    Energy Technology Data Exchange (ETDEWEB)

    Bylaska, Eric J., E-mail: Eric.Bylaska@pnnl.gov [Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, P.O. Box 999, Richland, Washington 99352 (United States); Weare, Jonathan Q., E-mail: weare@uchicago.edu [Department of Mathematics, University of Chicago, Chicago, Illinois 60637 (United States); Weare, John H., E-mail: jweare@ucsd.edu [Department of Chemistry and Biochemistry, University of California, San Diego, La Jolla, California 92093 (United States)

    2013-08-21

    ...to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long-time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.

  32. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

    Science.gov (United States)

    Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

    2013-08-21

    ...distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long-time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.

  33. The FORCE: A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
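
    The two-level macro scheme can be imitated in miniature with the C preprocessor; the sketch below is an assumed analogue in C++ (DOALL and PAR_LOOP are invented names, not actual FORCE constructs): the low level chooses a machine-specific expansion, and the high-level construct is built on top of it.

    ```cpp
    // Toy two-level analogue of the FORCE's macro layering.
    #include <cstdio>

    // Level 1: machine-dependent expansion; OpenMP here, serial fallback
    // elsewhere. Porting means swapping only this level.
    #if defined(_OPENMP)
    #  define PAR_LOOP _Pragma("omp parallel for")
    #else
    #  define PAR_LOOP
    #endif

    // Level 2: machine-independent construct built on the level below.
    #define DOALL(i, lo, hi) PAR_LOOP for (int i = (lo); i < (hi); ++i)

    int main() {
        static double a[1000];
        DOALL(i, 0, 1000) a[i] = i * 0.5;   // identical source on every target
        std::printf("%f\n", a[999]);
    }
    ```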

  34. The FORCE - A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  35. Parallel Programming Archetypes in Combinatorics and Optimization

    Science.gov (United States)

    1995-06-12

    A parallel programming archetype is a language-independent program design strategy. We describe two archetypes in combinatorics and optimization ... the systematic design of efficient sequential and parallel programs. The research whose results are presented in this document is part of the ongoing project on Parallel Programming Archetypes.

  36. Graphics-Based Parallel Programming Tools

    Science.gov (United States)

    1991-09-01

    AD-A254 406. Final Report: Graphics-Based Parallel Programming Tools. Janice E. Cuny, Principal Investigator, Department of ... suggest parallel (either because we use a parallel graph rewriting mechanism or because we apply our results to parallel programming), we interpret it to ... was to provide support for the explicit representation of graphs for use within a parallel programming environment. In our environment, we view a ...

  37. A Parallel Programming Model With Sequential Semantics

    Science.gov (United States)

    1996-01-01

    Parallel programming is more difficult than sequential programming in part because of the complexity of reasoning, testing, and debugging in the ... context of concurrency. In the thesis, we present and investigate a parallel programming model that provides direct control of parallelism in a notation ...

  38. Structured Parallel Programming Patterns for Efficient Computation

    CERN Document Server

    McCool, Michael; Robison, Arch

    2012-01-01

    Programming is now parallel programming. Much as structured programming revolutionized traditional serial programming decades ago, a new kind of structured programming, based on patterns, is relevant to parallel programming today. Parallel computing experts and industry insiders Michael McCool, Arch Robison, and James Reinders describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give detailed concrete examples using multiple programming models. Examples are primarily given using two of th...

  39. About Parallel Programming: Paradigms, Parallel Execution and Collaborative Systems

    Directory of Open Access Journals (Sweden)

    Loredana MOCEAN

    2009-01-01

    In recent years, efforts have been made to delineate a stable and unitary framework in which the problems of parallel logical processing can find solutions, at least at the level of imperative languages. The results obtained so far are not commensurate with the efforts made. This paper aims to make a small contribution to these efforts. We propose an overview of parallel programming, parallel execution, and collaborative systems.

  40. The ParaScope parallel programming environment

    Science.gov (United States)

    Cooper, Keith D.; Hall, Mary W.; Hood, Robert T.; Kennedy, Ken; Mckinley, Kathryn S.; Mellor-Crummey, John M.; Torczon, Linda; Warren, Scott K.

    1993-01-01

    The ParaScope parallel programming environment, developed to support scientific programming of shared-memory multiprocessors, includes a collection of tools that use global program analysis to help users develop and debug parallel programs. This paper focuses on ParaScope's compilation system, its parallel program editor, and its parallel debugging system. The compilation system extends the traditional single-procedure compiler by providing a mechanism for managing the compilation of complete programs. Thus, ParaScope can support both traditional single-procedure optimization and optimization across procedure boundaries. The ParaScope editor brings both compiler analysis and user expertise to bear on program parallelization. It assists the knowledgeable user by displaying and managing analysis and by providing a variety of interactive program transformations that are effective in exposing parallelism. The debugging system detects and reports timing-dependent errors, called data races, in execution of parallel programs. The system combines static analysis, program instrumentation, and run-time reporting to provide a mechanical system for isolating errors in parallel program executions. Finally, we describe a new project to extend ParaScope to support programming in FORTRAN D, a machine-independent parallel programming language intended for use with both distributed-memory and shared-memory parallel computers.

  41. Parallel Programming in the Age of Ubiquitous Parallelism

    Science.gov (United States)

    Pingali, Keshav

    2014-04-01

    Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but none of these approaches has succeeded except in a few niche application areas. In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers. Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs.

  42. Genetic Parallel Programming: design and implementation.

    Science.gov (United States)

    Cheang, Sin Man; Leung, Kwong Sak; Lee, Kin Hong

    2006-01-01

    This paper presents a novel Genetic Parallel Programming (GPP) paradigm for evolving parallel programs running on a Multi-Arithmetic-Logic-Unit (Multi-ALU) Processor (MAP). The MAP is a Multiple Instruction-streams, Multiple Data-streams (MIMD), general-purpose register machine that can be implemented on modern Very Large-Scale Integrated Circuits (VLSIs) in order to evaluate genetic programs at high speed. For human programmers, writing parallel programs is more difficult than writing sequential programs. However, experimental results show that GPP evolves parallel programs with less computational effort than that of their sequential counterparts. It creates a new approach to evolving a feasible problem solution in parallel program form and then serializes it into a sequential program if required. The effectiveness and efficiency of GPP are investigated using a suite of 14 well-studied benchmark problems. Experimental results show that GPP speeds up evolution substantially.

  43. Requirements for Data-Parallel Programming Environments

    Science.gov (United States)

    1994-04-22

    ...fully automatic techniques would be insufficient by themselves to support general parallel programming, even in the limited domain of scientific computation. In other words, in an effective parallel programming system, the programmer would have to provide additional information to help the system ... convey an understanding of the tools and strategies that will be needed to adequately support efficient, machine-independent, data-parallel programming.

  44. Parallel Programming Environment for OpenMP

    Directory of Open Access Journals (Sweden)

    Insung Park

    2001-01-01

    We present our effort to provide a comprehensive parallel programming environment for the OpenMP parallel directive language. This environment includes a parallel programming methodology for the OpenMP programming model and a set of tools (Ursa Minor and InterPol) that support this methodology. Our toolset provides automated and interactive assistance to parallel programmers in time-consuming tasks of the proposed methodology. The features provided by our tools include performance and program structure visualization, interactive optimization, support for performance modeling, and performance advising for finding and correcting performance problems. The presented evaluation demonstrates that our environment offers significant support in general parallel tuning efforts and that the toolset facilitates many common tasks in OpenMP parallel programming in an efficient manner.

  45. PDDP, A Data Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Karen H. Warren

    1996-01-01

    PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.
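
    The owner-computes arithmetic behind a BLOCK distribution of the kind PDDP implements can be written down directly; the helper below is illustrative C++ (my_block is a hypothetical name), computing which slice of a global array each processor owns:

    ```cpp
    // Global index range [0, n) distributed blockwise over p processors,
    // with the remainder spread over the lowest-ranked processors.
    #include <cstdio>

    void my_block(int n, int p, int rank, int* lo, int* hi) {
        int base = n / p, rem = n % p;        // first `rem` ranks get one extra
        *lo = rank * base + (rank < rem ? rank : rem);
        *hi = *lo + base + (rank < rem ? 1 : 0);
    }

    int main() {
        int n = 10, p = 4;
        for (int rank = 0; rank < p; ++rank) {
            int lo, hi;
            my_block(n, p, rank, &lo, &hi);
            std::printf("rank %d owns [%d, %d)\n", rank, lo, hi);
        }
    }
    ```

    For n = 10 over p = 4 processors this prints [0, 3), [3, 6), [6, 8), [8, 10), the usual block rule.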

  46. Synthetic models of distributed memory parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Poplawski, D.A. (Michigan Technological Univ., Houghton, MI (USA). Dept. of Computer Science)

    1990-09-01

    This paper deals with the construction and use of simple synthetic programs that model the behavior of more complex, real parallel programs. Synthetic programs can be used in many ways: to construct an easily ported suite of benchmark programs, to experiment with alternate parallel implementations of a program without actually writing them, and to predict the behavior and performance of an algorithm on a new or hypothetical machine. Synthetic programs are constructed easily from scratch, from existing programs, and can even be constructed using nothing but information obtained from traces of the real program's execution.

  47. Parallel programming characteristics of a DSP-based parallel system

    Institute of Scientific and Technical Information of China (English)

    GAO Shu; GUO Qing-ping

    2006-01-01

    This paper first introduces the structure and working principle of a DSP-based parallel system, a parallel accelerating board, and the SHARC DSP chip. It then investigates the system's programming characteristics, especially its mode of communication, discusses how to design parallel algorithms, and presents a domain-decomposition-based complete multi-grid parallel algorithm with virtual boundary forecast (VBF) to solve large-scale, complicated heat problems. In the end, the Mandelbrot set and a non-linear heat transfer equation of a ceramic/metal composite material are taken as examples to illustrate the implementation of the proposed algorithm. The results show that the solutions are highly efficient and achieve linear speedup.

  48. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  49. Experiences in Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Terry W. Clark

    1997-01-01

    To efficiently parallelize a scientific application with a data-parallel compiler requires certain structural properties in the source program, and conversely, the absence of others. A recent parallelization effort of ours reinforced this observation and motivated this correspondence. Specifically, we have transformed a Fortran 77 version of GROMOS, a popular dusty-deck program for molecular dynamics, into Fortran D, a data-parallel dialect of Fortran. During this transformation we have encountered a number of difficulties that probably are neither limited to this particular application nor do they seem likely to be addressed by improved compiler technology in the near future. Our experience with GROMOS suggests a number of points to keep in mind when developing software that may at some time in its life cycle be parallelized with a data-parallel compiler. This note presents some guidelines for engineering data-parallel applications that are compatible with Fortran D or High Performance Fortran compilers.

  50. Productive Parallel Programming: The PCN Approach

    Directory of Open Access Journals (Sweden)

    Ian Foster

    1992-01-01

    We describe the PCN programming system, focusing on those features designed to improve the productivity of scientists and engineers using parallel supercomputers. These features include a simple notation for the concise specification of concurrent algorithms, the ability to incorporate existing Fortran and C code into parallel applications, facilities for reusing parallel program components, a portable toolkit that allows applications to be developed on a workstation or small parallel computer and run unchanged on supercomputers, and integrated debugging and performance analysis tools. We survey representative scientific applications and identify problem classes for which PCN has proved particularly useful.

  51. A survey of parallel programming tools

    Science.gov (United States)

    Cheng, Doreen Y.

    1991-01-01

    This survey examines 39 parallel programming tools. Focus is placed on those tool capabilities needed for parallel scientific programming rather than for general computer science. The tools are classified with current and future needs of the Numerical Aerodynamic Simulator (NAS) in mind: existing and anticipated NAS supercomputers and workstations; operating systems; programming languages; and applications. They are divided into four categories: suggested acquisitions; tools already brought in; tools worth tracking; and tools eliminated from further consideration at this time.

  52. Using the High-Level Based Program Interface to Facilitate the Large Scale Scientific Computing

    Directory of Open Access Journals (Sweden)

    Yizi Shang

    2014-01-01

    This paper presents further research on facilitating large-scale scientific computing on grid and desktop grid platforms. The related issues include the programming method, the overhead of the high-level program-interface-based middleware, and data anticipation migration. The block-based Gauss-Jordan algorithm, as a real example of large-scale scientific computing, is used to evaluate the issues presented above. The results show that the high-level based program interface makes complex scientific applications on a large-scale scientific platform easier to write, though a little overhead is unavoidable. Also, the data anticipation migration mechanism can improve the efficiency of a platform which needs to process big-data-based scientific applications.
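
    For concreteness, the computation at the heart of the paper's benchmark is Gauss-Jordan elimination; the unblocked sequential version below (a sketch, not the paper's block-based grid implementation) shows what the platform distributes:

    ```cpp
    // Gauss-Jordan with partial pivoting, solving A x = b in place;
    // A is n x n (row-major), b has length n; b holds x on return.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    bool gauss_jordan(std::vector<double>& A, std::vector<double>& b, int n) {
        for (int col = 0; col < n; ++col) {
            int piv = col;                               // partial pivoting
            for (int r = col + 1; r < n; ++r)
                if (std::fabs(A[r * n + col]) > std::fabs(A[piv * n + col])) piv = r;
            if (std::fabs(A[piv * n + col]) < 1e-12) return false;
            for (int c = 0; c < n; ++c) std::swap(A[col * n + c], A[piv * n + c]);
            std::swap(b[col], b[piv]);
            double d = A[col * n + col];                 // normalize pivot row
            for (int c = 0; c < n; ++c) A[col * n + c] /= d;
            b[col] /= d;
            for (int r = 0; r < n; ++r) {                // eliminate in all other rows
                if (r == col) continue;
                double f = A[r * n + col];
                for (int c = 0; c < n; ++c) A[r * n + c] -= f * A[col * n + c];
                b[r] -= f * b[col];
            }
        }
        return true;
    }

    int main() {
        std::vector<double> A = {2, 1, 1, 3}, b = {3, 5};   // 2x2 system
        if (gauss_jordan(A, b, 2)) std::printf("x = (%g, %g)\n", b[0], b[1]);
    }
    ```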

  53. Towards Implementation of a Generalized Architecture for High-Level Quantum Programming Language

    Science.gov (United States)

    Ameen, El-Mahdy M.; Ali, Hesham A.; Salem, Mofreh M.; Badawy, Mahmoud

    2017-08-01

    This paper investigates a novel architecture for the problem of quantum computer programming. A generalized architecture for a high-level quantum programming language is proposed, so that the evolution from complicated quantum-based programming to high-level, quantum-independent programming can be achieved. The proposed architecture receives high-level source code and automatically transforms it into the equivalent quantum representation. The architecture involves two layers, the programmer layer and the compilation layer, implemented in three main stages: pre-classification, classification, and post-classification. The basic building block of each stage is divided into subsequent phases, each of which performs the required transformations from one representation to another. A verification process using a case study investigated the ability of the compiler to perform all transformation processes. Experimental results showed that the proposed compiler achieves a correspondence correlation coefficient of about R ≈ 1 between outputs and targets. A clear improvement was also obtained in the time consumed by the optimization process compared to other techniques: in online optimization, the consumed time increases exponentially with the amount of accuracy needed, whereas in the proposed offline optimization it increases only gradually.

  54. Effectiveness of one-time psychoeducational programming for students with high levels of eating concerns.

    Science.gov (United States)

    Tillman, Kathleen S; Sell, Darcie M; Yates, Lindsay A; Mueller, Nichole

    2015-12-01

    This study investigated the effectiveness of on-campus programming for National Eating Disorder Awareness Week at increasing knowledge of available treatment options and help-seeking intentions for participants with low and high levels of eating concerns. Program attendees were approached as they entered the space reserved for programming and were asked to participate in the study. One hundred thirty-six college students completed the study questionnaire both immediately before attending programming (pre-test) and immediately after attending programming (post-test). Results indicate that after programming, both populations reported significantly greater knowledge of on-campus resources and help-seeking intentions for themselves. Only low eating concern participants reported significantly increased help-seeking intentions for a friend. Psychoeducational programming for eating disorders can be effective at increasing access to treatment and encouraging help-seeking behaviors for students.

  55. Integrated Task and Data Parallel Programming

    Science.gov (United States)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated ...

  56. The PISCES 2 parallel programming environment

    Science.gov (United States)

    Pratt, Terrence W.

    1987-01-01

    PISCES 2 is a programming environment for scientific and engineering computations on MIMD parallel computers. It is currently implemented on a flexible FLEX/32 at NASA Langley, a 20 processor machine with both shared and local memories. The environment provides an extended Fortran for applications programming, a configuration environment for setting up a run on the parallel machine, and a run-time environment for monitoring and controlling program execution. This paper describes the overall design of the system and its implementation on the FLEX/32. Emphasis is placed on several novel aspects of the design: the use of a carefully defined virtual machine, programmer control of the mapping of virtual machine to actual hardware, forces for medium-granularity parallelism, and windows for parallel distribution of data. Some preliminary measurements of storage use are included.

  18. High-level Programming and Symbolic Reasoning on IoT Resource Constrained Devices

    Directory of Open Access Journals (Sweden)

    Salvatore Gaglio

    2015-05-01

    Full Text Available While the vision of Internet of Things (IoT is rather inspiring, its practical implementation remains challenging. Conventional programming approaches prove unsuitable to provide IoT resource constrained devices with the distributed processing capabilities required to implement intelligent, autonomic, and self-organizing behaviors. In our previous work, we had already proposed an alternative programming methodology for such systems that is characterized by high-level programming and symbolic expressions evaluation, and developed a lightweight middleware to support it. Our approach allows for interactive programming of deployed nodes, and it is based on the simple but effective paradigm of executable code exchange among nodes. In this paper, we show how our methodology can be used to provide IoT resource constrained devices with reasoning abilities by implementing a Fuzzy Logic symbolic extension on deployed nodes at runtime.
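    As an illustration of the executable-code-exchange paradigm described in this record, the following Python sketch (ours, not the authors' middleware; the rule string and environment are invented) shows a node safely evaluating a small symbolic expression shipped to it by another node:

    ```python
    import ast
    import operator as op

    OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul}

    def evaluate(node, env):
        # Walk a tiny arithmetic subset of Python's AST; anything else is rejected.
        if isinstance(node, ast.Expression):
            return evaluate(node.body, env)
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](evaluate(node.left, env), evaluate(node.right, env))
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        raise ValueError("unsupported expression")

    # "Code exchange": the sender ships source text; the receiving node parses
    # and evaluates it against its own local environment.
    received_code = "temperature * 1.8 + 32"   # hypothetical rule sent to a node
    local_env = {"temperature": 21.0}          # the node's local sensor reading
    print(evaluate(ast.parse(received_code, mode="eval"), local_env))
    ```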

  19. Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Mustafa Basthikodi

    2016-04-01

    Full Text Available Performance growth of single-core processors came to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks, along with graphics processing units, have broadly enhanced parallelism. Several compilers have been updated to address the challenges of synchronization and threading. Appropriate program and algorithm classification will be of great advantage to software engineers seeking opportunities for effective parallelization. In the present work we investigate current species for the classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge classification. A set of algorithms is chosen whose structures match different issues and perform a given task. We tested these algorithms using existing automatic species-extraction tools along with the Bones compiler. We have added functionalities to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user-defined types, constants, and mathematical functions. With this, we can retain significant data which is not captured by the original species of algorithms. We implemented the new theories in the tool, enabling automatic characterization of program code.

  20. Guidelines for development of structural integrity programs for DOE high-level waste storage tanks

    Energy Technology Data Exchange (ETDEWEB)

    Bandyopadhyay, K.; Bush, S.; Kassir, M.; Mather, B.; Shewmon, P.; Streicher, M.; Thompson, B.; Rooyen, D. van; Weeks, J.

    1997-01-01

    Guidelines are provided for developing programs to promote the structural integrity of high-level waste storage tanks and transfer lines at the facilities of the Department of Energy. Elements of the program plan include a leak-detection system, definition of appropriate loads, collection of data for possible material and geometric changes, assessment of the tank structure, and non-destructive examination. Possible aging degradation mechanisms are explored for both steel and concrete components of the tanks, and evaluated to screen out nonsignificant aging mechanisms and to indicate methods of controlling the significant aging mechanisms. Specific guidelines for assessing structural adequacy will be provided in companion documents. Site-specific structural integrity programs can be developed drawing on the relevant portions of the material in this document.

  1. Plasmonics and the parallel programming problem

    Science.gov (United States)

    Vishkin, Uzi; Smolyaninov, Igor; Davis, Chris

    2007-02-01

    While many parallel computers have been built, it has generally been too difficult to program them. Now, all computers are effectively becoming parallel machines. Biannual doubling in the number of cores on a single chip, or faster, over the coming decade is planned by most computer vendors. Thus, the parallel programming problem is becoming more critical. The only known solution to the parallel programming problem in the theory of computer science is through a parallel algorithmic theory called PRAM. Unfortunately, some of the PRAM theory assumptions regarding the bandwidth between processors and memories did not properly reflect a parallel computer that could be built in previous decades. Reaching memories, or other processors in a multi-processor organization, required off-chip connections through pins on the boundary of each electric chip. Using the number of transistors that is becoming available on chip, on-chip architectures that adequately support the PRAM are becoming possible. However, the bandwidth of off-chip connections remains insufficient and the latency remains too high. This creates a bottleneck at the boundary of the chip for a PRAM-On-Chip architecture. This also prevents scalability to larger "supercomputing" organizations spanning across many processing chips that can handle massive amounts of data. Instead of connections through pins and wires, power-efficient CMOS-compatible on-chip conversion to plasmonic nanowaveguides is introduced for improved latency and bandwidth. Proper incorporation of our ideas offers exciting avenues to resolving the parallel programming problem, and an alternative way of building faster, more usable and much more compact supercomputers.

  2. OpenCL parallel programming development cookbook

    CERN Document Server

    Tay, Raymond

    2013-01-01

    OpenCL Parallel Programming Development Cookbook provides a set of advanced recipes that can be utilized to optimize existing code. This book is therefore ideal for experienced developers with a working knowledge of C/C++ and OpenCL. It is intended for software developers who have wondered what to do with that newly bought CPU or GPU other than use it for playing computer games, and who want to learn how to write parallel programs in OpenCL so that life isn't too boring.
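    The book works in C/C++, but the host/kernel pattern it teaches carries over directly to the pyopencl Python bindings; the sketch below (ours, with an invented kernel and sizes) shows the usual sequence of context, buffers, compiled kernel, and one work-item per data element:

    ```python
    import numpy as np
    import pyopencl as cl

    # Build a context and command queue on whatever OpenCL device is available.
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    a = np.arange(16, dtype=np.float32)
    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

    # One work-item per element doubles its input: data parallelism in miniature.
    prg = cl.Program(ctx, """
    __kernel void dbl(__global const float *a, __global float *out) {
        int i = get_global_id(0);
        out[i] = 2.0f * a[i];
    }
    """).build()
    prg.dbl(queue, a.shape, None, a_buf, out_buf)

    result = np.empty_like(a)
    cl.enqueue_copy(queue, result, out_buf)  # read the device buffer back
    print(result)
    ```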

  3. International program to study subseabed disposal of high-level radioactive wastes

    Energy Technology Data Exchange (ETDEWEB)

    Carlin, E.M.; Hinga, K.R.; Knauss, J.A.

    1984-01-01

    This report provides an overview of the international program to study seabed disposal of nuclear wastes. Its purpose is to inform legislators, other policy makers, and the general public as to the history of the program, technological requirements necessary for feasibility assessment, legal questions involved, international coordination of research, national policies, and research and development activities. Each of these major aspects of the program is presented in a separate section. The objective of seabed burial, similar to its continental counterparts, is to contain and to isolate the wastes. The subseabed option should not be confused with past practices of ocean dumping which have introduced wastes into ocean waters. Seabed disposal refers to the emplacement of solidified high-level radioactive waste (with or without reprocessing) in certain geologically stable sediments of the deep ocean floor. Specially designed surface ships would transport waste canisters from a port facility to the disposal site. Canisters would be buried from a few tens to a few hundreds of meters below the surface of ocean bottom sediments, and hence would not be in contact with the overlying ocean water. The concept is a multi-barrier approach for disposal. Barriers, including waste form, canister, and deep ocean sediments, will separate wastes from the ocean environment. High-level wastes (HLW) would be stabilized by conversion into a leach-resistant solid form such as glass. This solid would be placed inside a metallic canister or other type of package which represents a second barrier. The deep ocean sediments, a third barrier, are discussed in the Feasibility Assessment section. The waste form and canister would provide a barrier for several hundred years, and the sediments would be relied upon as a barrier for thousands of years. 62 references, 3 figures, 2 tables.

  4. Contributions to computational stereology and parallel programming

    DEFF Research Database (Denmark)

    Rasmusson, Allan

    rotator, even without the need for isotropic sections. To meet the need for computational power to perform image restoration of virtual tissue sections, parallel programming on GPUs has also been part of the project. This has led to a significant change in paradigm for a previously developed surgical

  5. Parallel Volunteer Learning during Youth Programs

    Science.gov (United States)

    Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi

    2012-01-01

    Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…

  6. FPGA Implementation of Blue Whale Calls Classifier Using High-Level Programming Tool

    Directory of Open Access Journals (Sweden)

    Mohammed Bahoura

    2016-02-01

    Full Text Available In this paper, we propose a hardware-based architecture for automatic blue whale calls classification based on the short-time Fourier transform and a multilayer perceptron neural network. The proposed architecture is implemented on a field-programmable gate array (FPGA) using Xilinx System Generator (XSG) and the Nexys-4 Artix-7 FPGA board. This high-level programming tool allows us to design, simulate and execute the compiled design in the Matlab/Simulink environment quickly and easily. Intermediate signals obtained at various steps of the proposed system are presented for typical blue whale calls. Classification performances based on the fixed-point XSG/FPGA implementation are compared to those obtained by the floating-point Matlab simulation, using a representative database of the blue whale calls.
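    The signal path described here, an STFT front end feeding a multilayer perceptron, can be sketched in floating-point NumPy/SciPy (unlike the paper's fixed-point FPGA design); the sampling rate, synthetic call, and random untrained weights below are placeholders, not the paper's data:

    ```python
    import numpy as np
    from scipy.signal import stft

    fs = 250                                # Hz (hypothetical sampling rate)
    t = np.arange(0, 4, 1 / fs)
    signal = np.sin(2 * np.pi * 17 * t)     # stand-in for a low-frequency call

    # Front end: short-time Fourier transform, averaged into one feature vector.
    f, times, Z = stft(signal, fs=fs, nperseg=256)
    features = np.abs(Z).mean(axis=1)

    # Back end: a one-hidden-layer perceptron with random placeholder weights.
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, features.size)), np.zeros(16)
    W2, b2 = rng.normal(size=(3, 16)), np.zeros(3)   # 3 hypothetical call classes

    hidden = np.tanh(W1 @ features + b1)
    scores = W2 @ hidden + b2
    print("predicted class:", int(np.argmax(scores)))
    ```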

  7. Automatic Compilation from High-Level Biologically-Oriented Programming Language to Genetic Regulatory Networks

    Science.gov (United States)

    Beal, Jacob; Lu, Ting; Weiss, Ron

    2011-01-01

    Background: The field of synthetic biology promises to revolutionize our ability to engineer biological systems, providing important benefits for a variety of applications. Recent advances in DNA synthesis and automated DNA assembly technologies suggest that it is now possible to construct synthetic systems of significant complexity. However, while a variety of novel genetic devices and small engineered gene networks have been successfully demonstrated, the regulatory complexity of synthetic systems that have been reported recently has somewhat plateaued due to a variety of factors, including the complexity of biology itself and the lag in our ability to design and optimize sophisticated biological circuitry. Methodology/Principal Findings: To address the gap between DNA synthesis and circuit design capabilities, we present a platform that enables synthetic biologists to express desired behavior using a convenient high-level biologically-oriented programming language, Proto. The high-level specification is compiled, using a regulatory motif based mechanism, to a gene network, optimized, and then converted to a computational simulation for numerical verification. Through several example programs we illustrate the automated process of biological system design with our platform, and show that our compiler optimizations can yield significant reductions in the number of genes and in the latency of the optimized engineered gene networks. Conclusions/Significance: Our platform provides a convenient and accessible tool for the automated design of sophisticated synthetic biological systems, bridging an important gap between DNA synthesis and circuit design capabilities. Our platform is user-friendly and features biologically relevant compiler optimizations, providing an important foundation for the development of sophisticated biological systems. PMID:21850228

  8. Concurrency-based approaches to parallel programming

    Science.gov (United States)

    Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.

    1995-01-01

    The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches is their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in single applications.

  9. Towards HPC++: A Unified Approach to Parallel Programming in C++

    Science.gov (United States)

    1998-10-30

    Compositional C++ or CC++, is a general purpose parallel programming language designed to support a wide range of parallel programming styles. By...appropriate for parallelizing the range of applications that one would write in C++. CC++ supports the integration of different parallel programming styles

  10. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts about parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  11. Specifying and Executing Optimizations for Parallel Programs

    Directory of Open Access Journals (Sweden)

    William Mansky

    2014-07-01

    Full Text Available Compiler optimizations, usually expressed as rewrites on program graphs, are a core part of all modern compilers. However, even production compilers have bugs, and these bugs are difficult to detect and resolve. The problem only becomes more complex when compiling parallel programs; from the choice of graph representation to the possibility of race conditions, optimization designers have a range of factors to consider that do not appear when dealing with single-threaded programs. In this paper we present PTRANS, a domain-specific language for formal specification of compiler transformations, and describe its executable semantics. The fundamental approach of PTRANS is to describe program transformations as rewrites on control flow graphs with temporal logic side conditions. The syntax of PTRANS allows cleaner, more comprehensible specification of program optimizations; its executable semantics allows these specifications to act as prototypes for the optimizations themselves, so that candidate optimizations can be tested and refined before going on to include them in a compiler. We demonstrate the use of PTRANS to state, test, and refine the specification of a redundant store elimination optimization on parallel programs.

  12. Timing-Sequence Testing of Parallel Programs

    Institute of Scientific and Technical Information of China (English)

    LIANG Yu; LI Shu; ZHANG Hui; HAN Chengde

    2000-01-01

    Testing of parallel programs involves two parts: testing of control flow within the processes and testing of timing-sequence. This paper focuses on the latter, particularly on the timing-sequence of message-passing paradigms. First, the coarse-grained SYN-sequence model is built up to describe the execution of distributed programs; all of the topics discussed in this paper are based on it. The most direct way to test a program is to run it. A fault-free parallel program should produce both correct computing results and a proper SYN-sequence. In order to analyze the validity of an observed SYN-sequence, this paper presents a formal specification (in Backus normal form) of the valid SYN-sequence. To date there has been little work on testing coverage for distributed programs. Calculating the number of valid SYN-sequences is the key to the coverage problem, but this number is extremely large and it is very hard to obtain the combination law among SYN-events. To resolve this problem, this paper proposes an efficient testing strategy, atomic SYN-event testing, which first linearizes the SYN-sequence (making it consist only of serial atomic SYN-events) and then tests each atomic SYN-event independently. The paper particularly provides the calculating formula for the number of valid SYN-sequences of a tree-topology atomic SYN-event (broadcast and combine). Furthermore, the number of valid SYN-sequences also, to some degree, mirrors the testability of parallel programs. Taking the tree-topology atomic SYN-event as an example, this paper examines the testability and communication speed of the tree-topology atomic SYN-event under different numbers of branches, in order to achieve a more satisfactory tradeoff between testability and communication efficiency.

  13. MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

    Science.gov (United States)

    Taft, James R.

    1999-01-01

    Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.

  14. Parallel Programming with MatlabMPI

    CERN Document Server

    Kepner, J V

    2001-01-01

    MatlabMPI is a Matlab implementation of the Message Passing Interface (MPI) standard and allows any Matlab program to exploit multiple processors. MatlabMPI currently implements the basic six functions that are the core of the MPI point-to-point communications standard. The key technical innovation of MatlabMPI is that it implements the widely used MPI ``look and feel'' on top of standard Matlab file I/O, resulting in an extremely compact (~100 lines) and ``pure'' implementation which runs anywhere Matlab runs. The performance has been tested on both shared and distributed memory parallel computers. MatlabMPI can match the bandwidth of C based MPI at large message sizes. A test image filtering application using MatlabMPI achieved a speedup of ~70 on a parallel computer.
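    MatlabMPI's key trick, point-to-point messaging implemented on top of plain file I/O, is easy to mimic; the following Python sketch (ours, with invented function names, not MatlabMPI's API) conveys the idea of using the file system as the transport:

    ```python
    # Illustrative sketch of message passing over ordinary file I/O.
    import os
    import pickle
    import time

    def send(dest_rank, tag, data, workdir="."):
        path = os.path.join(workdir, f"msg_{dest_rank}_{tag}.pkl")
        with open(path + ".tmp", "wb") as fh:
            pickle.dump(data, fh)
        os.rename(path + ".tmp", path)   # atomic rename marks the message complete

    def recv(my_rank, tag, workdir="."):
        path = os.path.join(workdir, f"msg_{my_rank}_{tag}.pkl")
        while not os.path.exists(path):  # poll until the sender's file appears
            time.sleep(0.01)
        with open(path, "rb") as fh:
            data = pickle.load(fh)
        os.remove(path)
        return data

    send(dest_rank=1, tag=0, data={"payload": [1, 2, 3]})
    print(recv(my_rank=1, tag=0))
    ```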

  15. Array distribution in data-parallel programs

    Science.gov (United States)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Sheffler, Thomas J.

    1994-01-01

    We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We propose a divide-and-conquer algorithm for distribution that initially assigns a common distribution to each node of the graph and successively refines this assignment, taking computation, realignment, and redistribution costs into account. We explain how to estimate the effect of distribution on computation cost and how to choose a candidate set of distributions. We present the results of an implementation of our algorithms on several test problems.

  16. XJava: Exploiting Parallelism with Object-Oriented Stream Programming

    Science.gov (United States)

    Otto, Frank; Pankratius, Victor; Tichy, Walter F.

    This paper presents the XJava compiler for parallel programs. It exploits parallelism based on an object-oriented stream programming paradigm. XJava extends Java with new parallel constructs that do not expose programmers to low-level details of parallel programming on shared memory machines. Tasks define composable parallel activities, and new operators allow an easier expression of parallel patterns, such as pipelines, divide and conquer, or master/worker. We also present an automatic run-time mechanism that extends our previous work to automatically map tasks and parallel statements to threads.

  17. Profiling parallel Mercury programs with ThreadScope

    CERN Document Server

    Bone, Paul

    2011-01-01

    The behavior of parallel programs is even harder to understand than the behavior of sequential programs. Parallel programs may suffer from any of the performance problems affecting sequential programs, as well as from several problems unique to parallel systems. Many of these problems are quite hard (or even practically impossible) to diagnose without help from specialized tools. We present a proposal for a tool for profiling the parallel execution of Mercury programs, a proposal whose implementation we have already started. This tool is an adaptation and extension of the ThreadScope profiler that was first built to help programmers visualize the execution of parallel Haskell programs.

  18. A Tutorial on Parallel and Concurrent Programming in Haskell

    Science.gov (United States)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  19. Programming in Manticore, a Heterogenous Parallel Functional Language

    Science.gov (United States)

    Fluet, Matthew; Bergstrom, Lars; Ford, Nic; Rainey, Mike; Reppy, John; Shaw, Adam; Xiao, Yingqi

    The Manticore project is an effort to design and implement a new functional language for parallel programming. Unlike many earlier parallel languages, Manticore is a heterogeneous language that supports parallelism at multiple levels. Specifically, the Manticore language combines Concurrent ML-style explicit concurrency with fine-grain, implicitly threaded, parallel constructs. These lectures will introduce the Manticore language and explore a variety of programs written to take advantage of heterogeneous parallelism.

  20. Four styles of parallel and net programming

    Institute of Scientific and Technical Information of China (English)

    Zhiwei XU; Yongqiang HE; Wei LIN; Li ZHA

    2009-01-01

    This paper reviews the programming landscape for parallel and network computing systems, focusing on four styles of concurrent programming models, and example languages/libraries. The four styles correspond to four scales of the targeted systems. At the smallest coprocessor scale, Single Instruction Multiple Thread (SIMT) and Compute Unified Device Architecture (CUDA) are considered. Transactional memory is discussed at the multicore or process scale. The MapReduce style is examined at the datacenter scale. At the Internet scale, Grid Service Markup Language (GSML) is reviewed, which intends to integrate resources distributed across multiple datacenters. The four styles are concerned with and emphasize different issues, which are needed by systems at different scales. This paper discusses issues related to efficiency, ease of use, and expressiveness.
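    Of the four styles, MapReduce is the easiest to demonstrate compactly; here is a minimal word count in that style using only the Python standard library (a generic illustration, with no datacenter framework implied):

    ```python
    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_phase(chunk):
        # Each map task counts words in its own chunk of input.
        return Counter(chunk.split())

    def reduce_phase(acc, counts):
        # The reduce step merges partial counts into one total.
        acc.update(counts)
        return acc

    if __name__ == "__main__":
        chunks = ["to be or not to be", "to parallelize or not"]
        with Pool(2) as pool:
            partials = pool.map(map_phase, chunks)   # map tasks run in parallel
        total = reduce(reduce_phase, partials, Counter())
        print(total.most_common(3))
    ```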

  1. Parallel Programming Strategies for Irregular Adaptive Applications

    Science.gov (United States)

    Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance for such computations. In this work, we examine two typical irregular adaptive applications, Dynamic Remeshing and N-Body, under competing programming methodologies and across various parallel architectures. The Dynamic Remeshing application simulates flow over an airfoil, and refines localized regions of the underlying unstructured mesh. The N-Body experiment models two neighboring Plummer galaxies that are about to undergo a merger. Both problems demonstrate dramatic changes in processor workloads and interprocessor communication with time; thus, dynamic load balancing is a required component.

  2. VERIFICATION OF PARALLEL AUTOMATA-BASED PROGRAMS

    Directory of Open Access Journals (Sweden)

    M. A. Lukin

    2014-01-01

    Full Text Available The paper deals with an interactive method of automatic verification for parallel automata-based programs. The hierarchical state machines can be implemented in different threads and can interact with each other. Verification is done by means of the Spin tool and includes automatic Promela model construction, conversion of LTL formulae to Spin format, and counterexamples in terms of automata. Interactive verification makes it possible to decrease verification time and increase the maximum size of verifiable programs. The considered method supports verification of parallel systems of hierarchical automata that interact with each other through messages and shared variables. A feature of the automaton model is that each state machine is considered a new data type and can have an arbitrary bounded number of instances. Each state machine in the system can run a different state machine in a new thread or have a nested state machine. This method was implemented in the developed Stater tool. Stater shows correct operation for all test cases.

  3. Hanford High-Level Waste Vitrification Program at the Pacific Northwest National Laboratory: technology development - annotated bibliography

    Energy Technology Data Exchange (ETDEWEB)

    Larson, D.E.

    1996-09-01

    This report provides a collection of annotated bibliographies for documents prepared under the Hanford High-Level Waste Vitrification (Plant) Program. The bibliographies are for documents from Fiscal Year 1983 through Fiscal Year 1995, and include work conducted at or under the direction of the Pacific Northwest National Laboratory. The bibliographies included focus on the technology developed over the specified time period for vitrifying Hanford pretreated high-level waste. The following subject areas are included: General Documentation; Program Documentation; High-Level Waste Characterization; Glass Formulation and Characterization; Feed Preparation; Radioactive Feed Preparation and Glass Properties Testing; Full-Scale Feed Preparation Testing; Equipment Materials Testing; Melter Performance Assessment and Evaluations; Liquid-Fed Ceramic Melter; Cold Crucible Melter; Stirred Melter; High-Temperature Melter; Melter Off-Gas Treatment; Vitrification Waste Treatment; Process, Product Control and Modeling; Analytical; and Canister Closure, Decontamination, and Handling

  4. Multilanguage parallel programming of heterogeneous machines

    Energy Technology Data Exchange (ETDEWEB)

    Bisiani, R.; Forin, A.

    1988-08-01

    The authors designed and implemented a system, Agora, that supports the development of multilanguage parallel applications for heterogeneous machines. Agora hinges on two ideas: the first is that shared memory can be a suitable abstraction for programming concurrent, multilanguage modules running on heterogeneous machines. The second is that a shared memory abstraction can be efficiently supported across different computer architectures that are not connected by physical shared memory, for example local area network workstations or ensemble machines. Agora has been in use for more than a year. This paper describes the Agora shared memory and its software implementation on both tightly and loosely coupled architectures. Measurements of the current implementation are also included.

  5. F-Nets and Software Cabling: Deriving a Formal Model and Language for Portable Parallel Programming

    Science.gov (United States)

    DiNucci, David C.; Saini, Subhash (Technical Monitor)

    1998-01-01

    Parallel programming is still being based upon antiquated sequence-based definitions of the terms "algorithm" and "computation", resulting in programs which are architecture dependent and difficult to design and analyze. By focusing on obstacles inherent in existing practice, a more portable model is derived here, which is then formalized into a model called Soviets which utilizes a combination of imperative and functional styles. This formalization suggests more general notions of algorithm and computation, as well as insights into the meaning of structured programming in a parallel setting. To illustrate how these principles can be applied, a very-high-level graphical architecture-independent parallel language, called Software Cabling, is described, with many of the features normally expected from today's computer languages (e.g. data abstraction, data parallelism, and object-based programming constructs).

  6. Human factors programs for high-level radioactive waste handling systems

    Energy Technology Data Exchange (ETDEWEB)

    Pond, D.J.

    1992-04-01

    Human Factors is the discipline concerned with the acquisition of knowledge about human capabilities and limitations, and the application of such knowledge to the design of systems. This paper discusses the range of human factors issues relevant to high-level radioactive waste (HLRW) management systems and, based on examples from other organizations, presents mechanisms through which to assure application of such expertise in the safe, efficient, and effective management and disposal of high-level waste. Additionally, specific attention is directed toward consideration of who might be classified as a human factors specialist, why human factors expertise is critical to the success of the HLRW management system, and determining when human factors specialists should become involved in the design and development process.

  7. Machine and Collection Abstractions for User-Implemented Data-Parallel Programming

    Directory of Open Access Journals (Sweden)

    Magne Haveraaen

    2000-01-01

    Full Text Available Data parallelism has appeared as a fruitful approach to the parallelisation of compute-intensive programs. Data parallelism has the advantage of mimicking the sequential (and deterministic) structure of programs, as opposed to task parallelism, where the explicit interaction of processes has to be programmed. In data parallelism, data structures, typically collection classes in the form of large arrays, are distributed on the processors of the target parallel machine. Trying to extract distribution aspects from conventional code often runs into problems with a lack of uniformity in the use of the data structures and in the expression of data dependency patterns within the code. Here we propose a framework with two conceptual classes, Machine and Collection. The Machine class abstracts hardware communication and distribution properties. This gives a programmer high-level access to the important parts of the low-level architecture. The Machine class may readily be used in the implementation of a Collection class, giving the programmer full control of the parallel distribution of data, as well as allowing normal sequential implementation of this class. Any program using such a collection class will be parallelisable, without requiring any modification, by choosing between sequential and parallel versions at link time. Experiments with a commercial application, built using the Sophus library which uses this approach to parallelisation, show good parallel speed-ups, without any adaptation of the application program being needed.
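    A rough Python rendering of the two conceptual classes may make the proposal concrete; the class names follow the paper's terminology, but the method bodies and the block distribution are our invention:

    ```python
    class Machine:
        """Abstracts distribution: how many processors, and who owns which index."""
        def __init__(self, nprocs):
            self.nprocs = nprocs

        def owner(self, index, length):
            # Simple block distribution of indices across processors.
            return index * self.nprocs // length

    class Collection:
        """A distributed array built on a Machine; nprocs=1 gives the sequential case."""
        def __init__(self, data, machine):
            self.data, self.machine = list(data), machine

        def local_indices(self, rank):
            n = len(self.data)
            return [i for i in range(n) if self.machine.owner(i, n) == rank]

    m = Machine(nprocs=4)
    c = Collection(range(10), m)
    print(c.local_indices(rank=0))   # indices this processor would own
    ```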

  8. Automatic Generation of Optimized and Synthesizable Hardware Implementation from High-Level Dataflow Programs

    Directory of Open Access Journals (Sweden)

    Khaled Jerbi

    2012-01-01

    Full Text Available In this paper, we introduce the Reconfigurable Video Coding (RVC) standard, based on the idea that video processing algorithms can be defined as a library of components that can be updated and standardized separately. The MPEG RVC framework aims at providing a unified high-level specification of current MPEG coding technologies using a dataflow language called CAL Actor Language (CAL). CAL is associated with a set of tools to design dataflow applications and to generate hardware and software implementations. Before this work, the existing CAL hardware compilers did not support high-level features of CAL. After presenting the main notions of the RVC standard, this paper introduces an automatic transformation process that analyses the non-compliant features and makes the required changes in the intermediate representation of the compiler while keeping the same behavior. Finally, the implementation results of the transformation on video and still-image decoders are summarized. We show that the obtained results can largely satisfy real-time constraints for an embedded design on FPGA, as we obtain a throughput of 73 FPS for the MPEG-4 decoder and 34 FPS for the coding and decoding process of the LAR coder using video of CIF image size. This work resolves the main limitation of hardware generation from CAL designs.

  9. Data-Parallel Programming in a Multithreaded Environment

    Directory of Open Access Journals (Sweden)

    Matthew Haines

    1997-01-01

    Full Text Available Research on programming distributed memory multiprocessors has resulted in a well-understood programming model, namely data-parallel programming. However, data-parallel programming in a multithreaded environment is far less understood. For example, if multiple threads within the same process belong to different data-parallel computations, then the architecture, compiler, or run-time system must ensure that relative indexing and collective operations are handled properly and efficiently. We introduce a run-time-based solution for data-parallel programming in a distributed memory environment that handles the problems of relative indexing and collective communications among thread groups. As a result, the data-parallel programming model can now be executed in a multithreaded environment, such as a system using threads to support both task and data parallelism.

  10. Automatic Parallelization and Optimization of Programs by Proof Rewriting

    NARCIS (Netherlands)

    Hurlin, C.; Palsberg, J.; Su, Z.

    2009-01-01

    We show how, given a program and its separation logic proof, one can parallelize and optimize this program and transform its proof simultaneously to obtain a proven parallelized and optimized program. To achieve this goal, we present new proof rules for generating proof trees and a rewrite system on

  11. On the Performance of the Python Programming Language for Serial and Parallel Scientific Computations

    Directory of Open Access Journals (Sweden)

    Xing Cai

    2005-01-01

    Full Text Available This article addresses the performance of scientific applications that use the Python programming language. First, we investigate several techniques for improving the computational efficiency of serial Python codes. Then, we discuss the basic programming techniques in Python for parallelizing serial scientific applications. It is shown that an efficient implementation of the array-related operations is essential for achieving good parallel performance, as for the serial case. Once the array-related operations are efficiently implemented, probably using a mixed-language implementation, good serial and parallel performance become achievable. This is confirmed by a set of numerical experiments. Python is also shown to be well suited for writing high-level parallel programs.
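    The article's point about array-related operations can be seen in miniature: the same update written as an interpreted Python loop and as a vectorized NumPy expression (a generic illustration, not the article's benchmark):

    ```python
    import numpy as np
    import time

    u = np.random.rand(2_000_000)

    start = time.perf_counter()
    v_loop = np.empty_like(u)
    for i in range(u.size):          # interpreted loop: one dispatch per element
        v_loop[i] = 2.0 * u[i] + 1.0
    loop_time = time.perf_counter() - start

    start = time.perf_counter()
    v_vec = 2.0 * u + 1.0            # vectorized: the loop runs in compiled code
    vec_time = time.perf_counter() - start

    assert np.allclose(v_loop, v_vec)
    print(f"loop {loop_time:.3f}s vs vectorized {vec_time:.4f}s")
    ```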

  12. What a Parallel Programming Language Has to Let You Say,

    Science.gov (United States)

    1984-09-01

    Alan Bawden and Philip E. Agre, "What a Parallel Programming Language Has to Let You Say," Massachusetts Institute of Technology, Artificial Intelligence Laboratory, AI Memo 796, September 1984.

  13. PDDP: A data parallel programming model. Revision 1

    Energy Technology Data Exchange (ETDEWEB)

    Warren, K.H.

    1995-06-01

    PDDP, the Parallel Data Distribution Preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared-memory style and generates codes that are portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.

  14. Optimizing FORTRAN Programs for Hierarchical Memory Parallel Processing Systems

    Institute of Scientific and Technical Information of China (English)

    金国华; 陈福接

    1993-01-01

    Parallel loops account for the greatest amount of parallelism in numerical programs. Executing nested loops in parallel with low run-time overhead is thus very important for achieving high performance in parallel processing systems. However, in parallel processing systems with caches or local memories in memory hierarchies, a "thrashing problem" may arise whenever data move back and forth between the caches or local memories of different processors. Previous techniques can only deal with rather simple cases involving one linear function in the perfectly nested loop. In this paper, we present a parallel program optimizing technique called hybrid loop interchange (HLI) for cases with multiple linear functions and loop-carried data dependences in the nested loop. With HLI we can easily eliminate or reduce the thrashing phenomena without reducing the program parallelism.
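    Plain loop interchange, the transformation HLI builds on, can be illustrated with a small sketch (in Python rather than the paper's Fortran setting): both nests compute the same column sums, but the interchanged nest walks the array contiguously, row by row:

    ```python
    import numpy as np

    n = 300
    a = np.random.rand(n, n)

    # Original nest: j outer, i inner -- strided accesses down each column.
    col_sum = np.zeros(n)
    for j in range(n):
        for i in range(n):
            col_sum[j] += a[i, j]

    # Interchanged nest: i outer, j inner -- contiguous accesses along each row.
    col_sum2 = np.zeros(n)
    for i in range(n):
        for j in range(n):
            col_sum2[j] += a[i, j]

    assert np.allclose(col_sum, col_sum2)   # interchange preserves the result
    ```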

  15. Parallel Programming with Matrix Distributed Processing

    CERN Document Server

    Di Pierro, Massimo

    2005-01-01

    Matrix Distributed Processing (MDP) is a C++ library for fast development of efficient parallel algorithms. It constitutes the core of FermiQCD. MDP enables programmers to focus on algorithms, while parallelization is dealt with automatically and transparently. Here we present a brief overview of MDP and examples of applications in Computer Science (Cellular Automata), Engineering (PDE Solver) and Physics (Ising Model).

  16. COMPARATIVE-EVALUATION OF HIGH-LEVEL REAL-TIME PROGRAMMING-LANGUAGES

    NARCIS (Netherlands)

    HALANG, WA; STOYENKO, AD

    1990-01-01

    Owing to the fast-growing need for better means of building real-time systems, a number of representative languages used in real-time programming are surveyed. The evaluation focuses on seven languages which possess explicit real-time features. Based on a categorization of the latter, the seven langu

  17. An interactive parallel programming environment applied in atmospheric science

    Science.gov (United States)

    vonLaszewski, G.

    1996-01-01

    This article introduces an interactive parallel programming environment (IPPE) that simplifies the generation and execution of parallel programs. One of the tasks of the environment is to generate message-passing parallel programs for homogeneous and heterogeneous computing platforms. The parallel programs are represented by using visual objects. This is accomplished with the help of a graphical programming editor that is implemented in Java and enables portability to a wide variety of computer platforms. In contrast to other graphical programming systems, reusable parts of the programs can be stored in a program library to support rapid prototyping. In addition, runtime performance data on different computing platforms is collected in a database. A selection process determines dynamically the software and the hardware platform to be used to solve the problem in minimal wall-clock time. The environment is currently being tested on a Grand Challenge problem, the NASA four-dimensional data assimilation system.

  18. Deterministic Consistency: A Programming Model for Shared Memory Parallelism

    OpenAIRE

    Aviram, Amittai; Ford, Bryan

    2009-01-01

    The difficulty of developing reliable parallel software is generating interest in deterministic environments, where a given program and input can yield only one possible result. Languages or type systems can enforce determinism in new code, and runtime systems can impose synthetic schedules on legacy parallel code. To parallelize existing serial code, however, we would like a programming model that is naturally deterministic without language restrictions or artificial scheduling. We propose "...

  19. The Interface Between Distributed Operating System and High-Level Programming Language. Revision.

    Science.gov (United States)

    1986-09-01

    O.S. pair. Relatively little attention has been devoted to the relationship between languages and O.S. kernels in a distributed setting. Amoeba [16] ... interaction not only between the pieces of a multi-process application, but also between separate applications and between user programs and long-lived ... a process the right to send requests (it is still free to send replies). Allow restores that right. Retry is equivalent to forbid followed by allow

  1. ALPHN: A computer program for calculating (α,n) neutron production in canisters of high-level waste

    Energy Technology Data Exchange (ETDEWEB)

    Salmon, R.; Hermann, O.W.

    1992-10-01

    The rate of neutron production from (α,n) reactions in canisters of immobilized high-level waste containing borosilicate glass or glass-ceramic compositions is significant and must be considered when estimating neutron shielding requirements. The personal computer program ALPHN calculates the (α,n) neutron production rate of a canister of vitrified high-level waste. The user supplies the chemical composition of the glass or glass-ceramic and the curies of the alpha-emitting actinides present. The output of the program gives the (α,n) neutron production of each actinide in neutrons per second and the total for the canister. The (α,n) neutron production rates are source terms only; that is, they are production rates within the glass and do not take into account the shielding effect of the glass. For a given glass composition, the user can calculate up to eight cases simultaneously; these cases are based on the same glass composition but contain different quantities of actinides per canister. In a typical application, these cases might represent the same canister of vitrified high-level waste at eight different decay times. Run time for a typical problem containing 20 chemical species, 24 actinides, and 8 decay times was 35 s on an IBM AT personal computer. Results of an example based on an expected canister composition at the Defense Waste Processing Facility are shown.
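    The structure of such a calculation, per-actinide rates summed to a canister total, can be sketched as follows; the actinide list and neutron-yield values are hypothetical placeholders for illustration, not ALPHN's data (only the curie-to-decay conversion factor is a physical constant):

    ```python
    CURIES_TO_ALPHAS_PER_S = 3.7e10           # 1 Ci = 3.7e10 decays/s

    neutrons_per_alpha = {                    # hypothetical yields for one glass
        "Am-241": 1.0e-7,
        "Pu-238": 8.0e-8,
        "Cm-244": 9.0e-8,
    }
    canister_curies = {"Am-241": 120.0, "Pu-238": 45.0, "Cm-244": 30.0}  # hypothetical

    total = 0.0
    for actinide, curies in canister_curies.items():
        # rate = alpha emission rate times (alpha, n) yield for this composition
        rate = curies * CURIES_TO_ALPHAS_PER_S * neutrons_per_alpha[actinide]
        print(f"{actinide}: {rate:.3e} n/s")
        total += rate
    print(f"canister total: {total:.3e} n/s")
    ```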

  2. Parallel programming practical aspects, models and current limitations

    CERN Document Server

    Tarkov, Mikhail S

    2014-01-01

    Parallel programming is designed for the use of parallel computer systems to solve time-consuming problems that cannot be solved on a sequential computer in a reasonable time. These problems can be divided into two classes: (1) processing large data arrays (including processing images and signals in real time) and (2) simulation of complex physical processes and chemical reactions. For each of these classes, prospective methods are designed for solving problems. For data processing, one of the most promising technologies is the use of artificial neural networks. The particle-in-cell method and cellular automata are very useful for simulation. Problems of the scalability of parallel algorithms and the transfer of existing parallel programs to future parallel computers are very acute now. An important task is to optimize the use of the equipment (including the CPU cache) of parallel computers. Along with parallelizing information processing, it is essential to ensure the processing reliability by the relevant organization ...

  3. RPython high-level synthesis

    Science.gov (United States)

    Cieszewski, Radoslaw; Linczuk, Maciej

    2016-09-01

    The development of FPGA technology and the increasing complexity of applications in recent decades have forced compilers to move to higher abstraction levels. Compilers interpret an algorithmic description of a desired behavior written in High-Level Languages (HLLs) and translate it to Hardware Description Languages (HDLs). This paper presents an RPython-based High-Level Synthesis (HLS) compiler. The compiler takes the configuration parameters and maps the RPython program to VHDL. The VHDL code can then be used to program FPGA chips. In comparison with other technologies, FPGAs have the potential to achieve far greater performance than software as a result of omitting the fetch-decode-execute operations of general-purpose processors (GPPs), and to introduce more parallel computation. This can be exploited by utilizing many resources at the same time. Creating parallel algorithms computed with FPGAs in pure HDL is difficult and time consuming. Implementation time can be greatly reduced with a high-level synthesis compiler. This article describes design methodologies and tools, the implementation, and first results of the VHDL backend created for the RPython compiler.

  4. Professional Parallel Programming with C# Master Parallel Extensions with NET 4

    CERN Document Server

    Hillar, Gastón

    2010-01-01

    Expert guidance for those programming today's dual-core processor PCs. As PC processors explode from one or two to now eight processors, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, the asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization. Teach

  5. Selecting Simulation Models when Predicting Parallel Program Behaviour

    OpenAIRE

    Broberg, Magnus; Lundberg, Lars; Grahn, Håkan

    2002-01-01

    The use of multiprocessors is an important way to increase the performance of a supercomputing program. This means that the program has to be parallelized to make use of the multiple processors. The parallelization is unfortunately not an easy task. Development tools supporting parallel programs are important. Further, it is the customer that decides the number of processors in the target machine, and as a result the developer has to make sure that the program runs efficiently on any numbe...

  6. Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

    Science.gov (United States)

    Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

    1994-01-01

    Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.

  7. A Performance Analysis Tool for PVM Parallel Programs

    Institute of Scientific and Technical Information of China (English)

    Chen Wang; Yin Liu; Changjun Jiang; Zhaoqing Zhang

    2004-01-01

    In this paper, we introduce the design and implementation of ParaVT, which is a visual performance analysis and parallel debugging tool. In ParaVT, we propose an automated instrumentation mechanism. Based on this mechanism, ParaVT automatically analyzes the performance bottleneck of parallel applications and provides a visual user interface to monitor and analyze the performance of parallel programs. In addition, it also supports certain extensions.

  8. Owlready: Ontology-oriented programming in Python with automatic classification and high level constructs for biomedical ontologies.

    Science.gov (United States)

    Lamy, Jean-Baptiste

    2017-07-01

    Ontologies are widely used in the biomedical domain. While many tools exist for the edition, alignment or evaluation of ontologies, few solutions have been proposed for ontology programming interfaces, i.e. for accessing and modifying an ontology within a programming language. Existing query languages (such as SPARQL) and APIs (such as OWLAPI) are not as easy to use as object programming languages are. Moreover, they provide few solutions to the difficulties encountered with biomedical ontologies. Our objective was to design a tool for easily accessing the entities of an OWL ontology, with high-level constructs helping with biomedical ontologies. From our experience on medical ontologies, we identified two difficulties: (1) many entities are represented by classes (rather than individuals), but the existing tools do not permit manipulating classes as easily as individuals; (2) ontologies rely on the open-world assumption, whereas medical reasoning must consider only evidence-based medical knowledge as true. We designed a Python module for ontology-oriented programming. It allows access to the entities of an OWL ontology as if they were objects in the programming language. We propose a simple high-level syntax for managing classes and the associated "role-filler" constraints. We also propose an algorithm for performing local closed world reasoning in simple situations. We developed Owlready, a Python module for high-level access to OWL ontologies. The paper describes the architecture and the syntax of the module, version 2. It details how we integrated the OWL ontology model with the Python object model. The paper provides examples based on Gene Ontology (GO). We also demonstrate the interest of Owlready in a use case focused on the automatic comparison of the contraindications of several drugs. This use case illustrates the use of the specific syntax proposed for manipulating classes and for performing local closed world reasoning. Owlready has been successfully
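    A small taste of this ontology-oriented style, using the owlready2 package (the ontology IRI and the classes below are invented for the demo, not taken from the paper):

    ```python
    from owlready2 import get_ontology, Thing

    onto = get_ontology("http://example.org/demo.owl")

    with onto:                          # entities defined here join the ontology
        class Drug(Thing):
            pass
        class Contraindication(Thing):
            pass

    aspirin = Drug("aspirin")           # OWL individuals behave like Python objects
    print(list(onto.classes()), aspirin.iri)
    ```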

  9. How to Shape a Successful Repository Program: Staged Development of Geologic Repositories for High-Level Waste

    Energy Technology Data Exchange (ETDEWEB)

    Isaacs, T.

    2004-10-03

    Programs to manage and ultimately dispose of high-level radioactive wastes are unique from scientific and technological as well as socio-political perspectives. From a scientific and technological perspective, high-level radioactive wastes remain potentially hazardous for geological time periods--many millennia--and scientific and technological programs must be put in place that result in a system that provides high confidence that the wastes will be isolated from the accessible environment for these many thousands of years. Of course, "proof" in the classical sense is not possible at the outset, since the performance of the system can only be known with assurance, if ever, after the waste has been emplaced for those geological time periods. Adding to this challenge, many uncertainties exist in both the natural and engineered systems that are intended to isolate the wastes, and some of the uncertainties will remain regardless of the time and expense in attempting to characterize the system and assess its performance.

  10. Architectural Adaptability in Parallel Programming via Control Abstraction

    Science.gov (United States)

    1991-01-01

    Technical Report 359 January 1991 Abstract Parallel programming involves finding the potential parallelism in an application, choos - ing an...during the development of this paper. 34 References [Albert et ai, 1988] Eugene Albert, Kathleen Knobe, Joan D. Lukas, and Guy L. Steele, Jr

  11. Six Sigma Evaluation of the High Level Waste Tank Farm Corrosion Control Program at the Savannah River Site

    Energy Technology Data Exchange (ETDEWEB)

    Hill, P. J.

    2003-02-26

    Six Sigma is a disciplined approach to process improvement based on customer requirements and data. The goal is to develop or improve processes with defects that are measured at only a few parts per million. The process includes five phases: Identify, Measure, Analyze, Improve, and Control. This report describes the application of the Six Sigma process to improving the High Level Waste (HLW) Tank Farm Corrosion Control Program. The report documents the work performed and the tools utilized while applying the Six Sigma process from September 28, 2001 to April 1, 2002. During Fiscal Year 2001, the High Level Waste Division spent $5.9 million to analyze samples from the F and H Tank Farms. The largest portion of these analytical costs was $2.45 million that was spent to analyze samples taken to support the Corrosion Control Program. The objective of the Process Improvement Project (PIP) team was to reduce the number of analytical tasks required to support the Corrosion Control Program by 50 percent. Based on the data collected, the corrosion control decision process flowchart, and the use of the X-Y Matrix tool, the team determined that analyses in excess of the requirements of the corrosion control program were being performed. Only two of the seven analytical tasks currently performed are required for the 40 waste tanks governed by the Corrosion Control Program. Two additional analytical tasks are required for a small subset of the waste tanks resulting in an average of 2.7 tasks per sample compared to the current 7 tasks per sample. Forty HLW tanks are sampled periodically as part of the Corrosion Control Program. For each of these tanks, an analysis was performed to evaluate the stability of the chemistry in the tank and then to determine the statistical capability of the tank to meet minimum corrosion inhibitor limits. The analyses proved that most of the tanks were being sampled too frequently. Based on the results of these analyses and th e use of additional

  12. Detection of And—Parallelism in Logic Programs

    Institute of Scientific and Technical Information of China (English)

    黄志毅; 胡守仁

    1990-01-01

    In this paper,we present a detection technique of and-parallelism in logic programs.The detection consists of three phases:analysis of entry modes,derivation of exit modes and determination of execution graph expressions.Compared with other techniques[2,4,5],our approach with the compile-time program-level data-dependence analysis of logic programs,can efficiently exploit and-parallelism in logic programs.Two precompilers,based on our technique and DeGroot's approach[3] respectively,have been implemented in SES-PIM system[12],Through compiling and running some typical benchmarks in SES-PIM,we conclude that our technique can,in most cases,exploit as much and-parallelism as the dynamic approach[13]does under“produces-consumer”scheme,and needs less dynamic overhead while exploiting more and parallelism than DeGroot's approach does.

  13. Integrated Task And Data Parallel Programming: Language Design

    Science.gov (United States)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    his research investigates the combination of task and data parallel language constructs within a single programming language. There are an number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program m. Additional 1995 Activities During the fall I collaborated

  14. Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

    Directory of Open Access Journals (Sweden)

    Stephen L. Olivier

    2013-01-01

    Full Text Available Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

  15. Program Transformation to Identify List-Based Parallel Skeletons

    Directory of Open Access Journals (Sweden)

    Venkatesh Kannan

    2016-07-01

    Full Text Available Algorithmic skeletons are used as building-blocks to ease the task of parallel programming by abstracting the details of parallel implementation from the developer. Most existing libraries provide implementations of skeletons that are defined over flat data types such as lists or arrays. However, skeleton-based parallel programming is still very challenging as it requires intricate analysis of the underlying algorithm and often uses inefficient intermediate data structures. Further, the algorithmic structure of a given program may not match those of list-based skeletons. In this paper, we present a method to automatically transform any given program to one that is defined over a list and is more likely to contain instances of list-based skeletons. This facilitates the parallel execution of a transformed program using existing implementations of list-based parallel skeletons. Further, by using an existing transformation called distillation in conjunction with our method, we produce transformed programs that contain fewer inefficient intermediate data structures.

  16. Development of massively parallel quantum chemistry program SMASH

    Energy Technology Data Exchange (ETDEWEB)

    Ishimura, Kazuya [Department of Theoretical and Computational Molecular Science, Institute for Molecular Science 38 Nishigo-Naka, Myodaiji, Okazaki, Aichi 444-8585 (Japan)

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  17. Web Based Parallel Programming Workshop for Undergraduate Education.

    Science.gov (United States)

    Marcus, Robert L.; Robertson, Douglass

    Central State University (Ohio), under a contract with Nichols Research Corporation, has developed a World Wide web based workshop on high performance computing entitled "IBN SP2 Parallel Programming Workshop." The research is part of the DoD (Department of Defense) High Performance Computing Modernization Program. The research…

  18. Protocol-Based Verification of Message-Passing Parallel Programs

    DEFF Research Database (Denmark)

    López-Acosta, Hugo-Andrés; Eduardo R. B. Marques, Eduardo R. B.; Martins, Francisco;

    2015-01-01

    a protocol language based on a dependent type system for message-passing parallel programs, which includes various communication operators, such as point-to-point messages, broadcast, reduce, array scatter and gather. For the verification of a program against a given protocol, the protocol is first...

  19. Accelerate Performance on the Parallel Programming Super Highway

    Science.gov (United States)

    2010-04-01

    barriers  associated with  parallel   programming Dataflow languages ought to be considered along with  traditional (imperative) programming solutions 2...dasymptotic con ition (3 GHz) Moore’s Law may still be valid, but the Law of  Thermodynamics is also valid Parallel   Programming  options exist, but...languages can address some major  challenges associated with  parallel   programming Many dataflow languages exist today, and should be  considered along  ith

  20. On program restructuring, scheduling, and communication for parallel processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Polychronopoulos, Constantine D.

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler was used to transform programs in a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented. 69 refs., 74 figs., 14 tabs.

  1. Optimized Parallel Execution of Declarative Programs on Distributed Memory Multiprocessors

    Institute of Scientific and Technical Information of China (English)

    沈美明; 田新民; 等

    1993-01-01

    In this paper,we focus on the compiling implementation of parlalel logic language PARLOG and functional language ML on distributed memory multiprocessors.Under the graph rewriting framework, a Heterogeneous Parallel Graph Rewriting Execution Model(HPGREM)is presented firstly.Then based on HPGREM,a parallel abstact machine PAM/TGR is described.Furthermore,several optimizing compilation schemes for executing declarative programs on transputer array are proposed.The performance statistics on transputer array demonstrate the effectiveness of our model,parallel abstract machine,optimizing compilation strategies and compiler.

  2. The parallel programming of voluntary and reflexive saccades.

    Science.gov (United States)

    Walker, Robin; McSorley, Eugene

    2006-06-01

    A novel two-step paradigm was used to investigate the parallel programming of consecutive, stimulus-elicited ('reflexive') and endogenous ('voluntary') saccades. The mean latency of voluntary saccades, made following the first reflexive saccades in two-step conditions, was significantly reduced compared to that of voluntary saccades made in the single-step control trials. The latency of the first reflexive saccades was modulated by the requirement to make a second saccade: first saccade latency increased when a second voluntary saccade was required in the opposite direction to the first saccade, and decreased when a second saccade was required in the same direction as the first reflexive saccade. A second experiment confirmed the basic effect and also showed that a second reflexive saccade may be programmed in parallel with a first voluntary saccade. The results support the view that voluntary and reflexive saccades can be programmed in parallel on a common motor map.

  3. Basic design of parallel computational program for probabilistic structural analysis

    Energy Technology Data Exchange (ETDEWEB)

    Kaji, Yoshiyuki; Arai, Taketoshi [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Gu, Wenwei; Nakamura, Hitoshi

    1999-06-01

    In our laboratory, for `development of damage evaluation method of structural brittle materials by microscopic fracture mechanics and probabilistic theory` (nuclear computational science cross-over research) we examine computational method related to super parallel computation system which is coupled with material strength theory based on microscopic fracture mechanics for latent cracks and continuum structural model to develop new structural reliability evaluation methods for ceramic structures. This technical report is the review results regarding probabilistic structural mechanics theory, basic terms of formula and program methods of parallel computation which are related to principal terms in basic design of computational mechanics program. (author)

  4. Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program; 2, Wavelength Parallelization

    CERN Document Server

    Baron, E A; Hauschildt, Peter H.

    1997-01-01

    We describe an important addition to the parallel implementation of our generalized NLTE stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000--300,0...

  5. Python based high-level synthesis compiler

    Science.gov (United States)

    Cieszewski, Radosław; Pozniak, Krzysztof; Romaniuk, Ryszard

    2014-11-01

    This paper presents a python based High-Level synthesis (HLS) compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and map it to VHDL. FPGA combines many benefits of both software and ASIC implementations. Like software, the mapped circuit is flexible, and can be reconfigured over the lifetime of the system. FPGAs therefore have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. Creating parallel programs implemented in FPGAs is not trivial. This article describes design, implementation and first results of created Python based compiler.

  6. Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    John Mellor-Crummey

    2008-02-29

    Rice University's achievements as part of the Center for Programming Models for Scalable Parallel Computing include: (1) design and implemention of cafc, the first multi-platform CAF compiler for distributed and shared-memory machines, (2) performance studies of the efficiency of programs written using the CAF and UPC programming models, (3) a novel technique to analyze explicitly-parallel SPMD programs that facilitates optimization, (4) design, implementation, and evaluation of new language features for CAF, including communication topologies, multi-version variables, and distributed multithreading to simplify development of high-performance codes in CAF, and (5) a synchronization strength reduction transformation for automatically replacing barrier-based synchronization with more efficient point-to-point synchronization. The prototype Co-array Fortran compiler cafc developed in this project is available as open source software from http://www.hipersoft.rice.edu/caf.

  7. MELD: A Logical Approach to Distributed and Parallel Programming

    Science.gov (United States)

    2012-03-01

    USA: ACM, 1974, pp. 249–264. [14] M. Isard , M. Budiu, Y. Yu, A. Birrell, and D. Fetterly, “Dryad: Distributed data-parallel programs from sequential...TR-2006-140. [Online]. Available: http://budiu.info/work/eurosys07.pdf [15] Y. Yu, M. Isard , D. Fetterly, M. Budiu, Ú . Erlingsson, P. K. Gunda

  8. Modelling parallel programs and multiprocessor architectures with AXE

    Science.gov (United States)

    Yan, Jerry C.; Fineman, Charles E.

    1991-01-01

    AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate for parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior is described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.

  9. A Large-Grain Parallel Programming Environment for Non-Programmers

    OpenAIRE

    Lewis, Ted

    1994-01-01

    1994 International Conference on Parallel Processing Banger is a parallel programming environment used by non-professional programmers to write explicitly parallel large-grain parallel programs. The goals of Banger are: 1. extreme ease of use, 2. immediate feedback, and 3. machine-independence. Banger is based on three principles: 1. separation of parallel programming-in-the-large from sequential programming-in-the-small, 2. separation of programming environment from target machine ...

  10. Heterogeneous Multicore Parallel Programming for Graphics Processing Units

    Directory of Open Access Journals (Sweden)

    Francois Bodin

    2009-01-01

    Full Text Available Hybrid parallel multicore architectures based on graphics processing units (GPUs can provide tremendous computing power. Current NVIDIA and AMD Graphics Product Group hardware display a peak performance of hundreds of gigaflops. However, exploiting GPUs from existing applications is a difficult task that requires non-portable rewriting of the code. In this paper, we present HMPP, a Heterogeneous Multicore Parallel Programming workbench with compilers, developed by CAPS entreprise, that allows the integration of heterogeneous hardware accelerators in a unintrusive manner while preserving the legacy code.

  11. Advanced parallel programming models research and development opportunities.

    Energy Technology Data Exchange (ETDEWEB)

    Wen, Zhaofang.; Brightwell, Ronald Brian

    2004-07-01

    There is currently a large research and development effort within the high-performance computing community on advanced parallel programming models. This research can potentially have an impact on parallel applications, system software, and computing architectures in the next several years. Given Sandia's expertise and unique perspective in these areas, particularly on very large-scale systems, there are many areas in which Sandia can contribute to this effort. This technical report provides a survey of past and present parallel programming model research projects and provides a detailed description of the Partitioned Global Address Space (PGAS) programming model. The PGAS model may offer several improvements over the traditional distributed memory message passing model, which is the dominant model currently being used at Sandia. This technical report discusses these potential benefits and outlines specific areas where Sandia's expertise could contribute to current research activities. In particular, we describe several projects in the areas of high-performance networking, operating systems and parallel runtime systems, compilers, application development, and performance evaluation.

  12. Efficient Thread Labeling for Monitoring Programs with Nested Parallelism

    Science.gov (United States)

    Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee

    It is difficult and cumbersome to detect data races occurred in an execution of parallel programs. Any on-the-fly race detection techniques using Lamport's happened-before relation needs a thread labeling scheme for generating unique identifiers which maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth for every fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from the parent threads or by updating the pointer in a constant amount of time and space. This labeling is more efficient than the NR labeling, because its efficiency does not depend on the nesting depth for every fork and join operation. Some experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelisms varying from 10,000 to 1,000,000. The results show that e-NR is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining the thread labels. In average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.

  13. Feedback Driven Annotation and Refactoring of Parallel Programs

    DEFF Research Database (Denmark)

    Larsen, Per

    This thesis combines programmer knowledge and feedback to improve modeling and optimization of software. The research is motivated by two observations. First, there is a great need for automatic analysis of software for embedded systems - to expose and model parallelism inherent in programs. Second......, some program properties are beyond reach of such analysis for theoretical and practical reasons - but can be described by programmers. Three aspects are explored. The first is annotation of the source code. Two annotations are introduced. These allow more accurate modeling of parallelism...... are not effective unless programmers are told how and when they are benecial. A prototype compilation feedback system was developed in collaboration with IBM Haifa Research Labs. It reports issues that prevent further analysis to the programmer. Performance evaluation shows that three programs performes signicantly...

  14. Exploration Of Deep Learning Algorithms Using Openacc Parallel Programming Model

    KAUST Repository

    Hamam, Alwaleed A.

    2017-03-13

    Deep learning is based on a set of algorithms that attempt to model high level abstractions in data. Specifically, RBM is a deep learning algorithm that used in the project to increase it\\'s time performance using some efficient parallel implementation by OpenACC tool with best possible optimizations on RBM to harness the massively parallel power of NVIDIA GPUs. GPUs development in the last few years has contributed to growing the concept of deep learning. OpenACC is a directive based ap-proach for computing where directives provide compiler hints to accelerate code. The traditional Restricted Boltzmann Ma-chine is a stochastic neural network that essentially perform a binary version of factor analysis. RBM is a useful neural net-work basis for larger modern deep learning model, such as Deep Belief Network. RBM parameters are estimated using an efficient training method that called Contrastive Divergence. Parallel implementation of RBM is available using different models such as OpenMP, and CUDA. But this project has been the first attempt to apply OpenACC model on RBM.

  15. Sisal 3.2: functional language for scientific parallel programming

    Science.gov (United States)

    Kasyanov, Victor

    2013-05-01

    Sisal 3.2 is a new input language of system of functional programming (SFP) which is under development at the Institute of Informatics Systems in Novosibirsk as an interactive visual environment for supporting of scientific parallel programming. This paper contains an overview of Sisal 3.2 and a description of its new features compared with previous versions of the SFP input language such as the multidimensional array support, new abstractions like parametric types and generalised procedures, more flexible user-defined reductions, improved interoperability with other programming languages and specification of several optimising source text annotations.

  16. Programming Massively Parallel Architectures using MARTE: a Case Study

    CERN Document Server

    Rodrigues, Wendell; Dekeyser, Jean-Luc

    2011-01-01

    Nowadays, several industrial applications are being ported to parallel architectures. These applications take advantage of the potential parallelism provided by multiple core processors. Many-core processors, especially the GPUs(Graphics Processing Unit), have led the race of floating-point performance since 2003. While the performance improvement of general- purpose microprocessors has slowed significantly, the GPUs have continued to improve relentlessly. As of 2009, the ratio between many-core GPUs and multicore CPUs for peak floating-point calculation throughput is about 10 times. However, as parallel programming requires a non-trivial distribution of tasks and data, developers find it hard to implement their applications effectively. Aiming to improve the use of many-core processors, this work presents an case-study using UML and MARTE profile to specify and generate OpenCL code for intensive signal processing applications. Benchmark results show us the viability of the use of MDE approaches to generate G...

  17. Testing New Programming Paradigms with NAS Parallel Benchmarks

    Science.gov (United States)

    Jin, H.; Frumkin, M.; Schultz, M.; Yan, J.

    2000-01-01

    Over the past decade, high performance computing has evolved rapidly, not only in hardware architectures but also with increasing complexity of real applications. Technologies have been developing to aim at scaling up to thousands of processors on both distributed and shared memory systems. Development of parallel programs on these computers is always a challenging task. Today, writing parallel programs with message passing (e.g. MPI) is the most popular way of achieving scalability and high performance. However, writing message passing programs is difficult and error prone. Recent years new effort has been made in defining new parallel programming paradigms. The best examples are: HPF (based on data parallelism) and OpenMP (based on shared memory parallelism). Both provide simple and clear extensions to sequential programs, thus greatly simplify the tedious tasks encountered in writing message passing programs. HPF is independent of memory hierarchy, however, due to the immaturity of compiler technology its performance is still questionable. Although use of parallel compiler directives is not new, OpenMP offers a portable solution in the shared-memory domain. Another important development involves the tremendous progress in the internet and its associated technology. Although still in its infancy, Java promisses portability in a heterogeneous environment and offers possibility to "compile once and run anywhere." In light of testing these new technologies, we implemented new parallel versions of the NAS Parallel Benchmarks (NPBs) with HPF and OpenMP directives, and extended the work with Java and Java-threads. The purpose of this study is to examine the effectiveness of alternative programming paradigms. NPBs consist of five kernels and three simulated applications that mimic the computation and data movement of large scale computational fluid dynamics (CFD) applications. We started with the serial version included in NPB2.3. Optimization of memory and cache usage

  18. Performance Evaluation Methodologies and Tools for Massively Parallel Programs

    Science.gov (United States)

    Yan, Jerry C.; Sarukkai, Sekhar; Tucker, Deanne (Technical Monitor)

    1994-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. The recent introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CSI'S Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance tool technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g., AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2)) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

  19. Regulatory perspectives on model validation in high-level radioactive waste management programs: A joint NRC/SKI white paper

    Energy Technology Data Exchange (ETDEWEB)

    Wingefors, S.; Andersson, J.; Norrby, S. [Swedish Nuclear Power lnspectorate, Stockholm (Sweden). Office of Nuclear Waste Safety; Eisenberg, N.A.; Lee, M.P.; Federline, M.V. [U.S. Nuclear Regulatory Commission, Washington, DC (United States). Office of Nuclear Material Safety and Safeguards; Sagar, B.; Wittmeyer, G.W. [Center for Nuclear Waste Regulatory Analyses, San Antonio, TX (United States)

    1999-03-01

    Validation (or confidence building) should be an important aspect of the regulatory uses of mathematical models in the safety assessments of geologic repositories for the disposal of spent nuclear fuel and other high-level radioactive wastes (HLW). A substantial body of literature exists indicating the manner in which scientific validation of models is usually pursued. Because models for a geologic repository performance assessment cannot be tested over the spatial scales of interest and long time periods for which the models will make estimates of performance, the usual avenue for model validation- that is, comparison of model estimates with actual data at the space-time scales of interest- is precluded. Further complicating the model validation process in HLW programs are the uncertainties inherent in describing the geologic complexities of potential disposal sites, and their interactions with the engineered system, with a limited set of generally imprecise data, making it difficult to discriminate between model discrepancy and inadequacy of input data. A successful strategy for model validation, therefore, should attempt to recognize these difficulties, address their resolution, and document the resolution in a careful manner. The end result of validation efforts should be a documented enhancement of confidence in the model to an extent that the model's results can aid in regulatory decision-making. The level of validation needed should be determined by the intended uses of these models, rather than by the ideal of validation of a scientific theory. This white Paper presents a model validation strategy that can be implemented in a regulatory environment. It was prepared jointly by staff members of the U.S. Nuclear Regulatory Commission and the Swedish Nuclear Power Inspectorate-SKI. This document should not be viewed as, and is not intended to be formal guidance or as a staff position on this matter. Rather, based on a review of the literature and previous

  20. Scientific programming on massively parallel processor CP-PACS

    Energy Technology Data Exchange (ETDEWEB)

    Boku, Taisuke [Tsukuba Univ., Ibaraki (Japan). Inst. of Information Sciences and Electronics

    1998-03-01

    The massively parallel processor CP-PACS takes various problems of calculation physics as the object, and it has been designed so that its architecture has been devised to do various numerical processings. In this report, the outline of the CP-PACS and the example of programming in the Kernel CG benchmark in NAS Parallel Benchmarks, version 1, are shown, and the pseudo vector processing mechanism and the parallel processing tuning of scientific and technical computation utilizing the three-dimensional hyper crossbar net, which are two great features of the architecture of the CP-PACS are described. As for the CP-PACS, the PUs based on RISC processor and added with pseudo vector processor are used. Pseudo vector processing is realized as the loop processing by scalar command. The features of the connection net of PUs are explained. The algorithm of the NPB version 1 Kernel CG is shown. The part that takes the time for processing most in the main loop is the product of matrix and vector (matvec), and the parallel processing of the matvec is explained. The time for the computation by the CPU is determined. As the evaluation of the performance, the evaluation of the time for execution, the short vector processing of pseudo vector processor based on slide window, and the comparison with other parallel computers are reported. (K.I.)

  1. Automatic array alignment in data-parallel programs

    Science.gov (United States)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Teng, Shang-Hua

    1993-01-01

    FORTRAN 90 and other data-parallel languages express parallelism in the form of operations on data aggregates such as arrays. Misalignment of the operands of an array operation can reduce program performance on a distributed-memory parallel machine by requiring nonlocal data accesses. Determining array alignments that reduce communication is therefore a key issue in compiling such languages. We present a framework for the automatic determination of array alignments in array-based, data-parallel languages. Our language model handles array sectioning, reductions, spreads, transpositions, and masked operations. We decompose alignment functions into three constituents: axis, stride, and offset. For each of these subproblems, we show how to solve the alignment problem for a basic block of code, possibly containing common subexpressions. Alignments are generated for all array objects in the code, both named program variables and intermediate results. We assign computation to processors by virtue of explicit alignment of all temporaries; the resulting work assignment is in general better than that provided by the 'owner-computes' rule. Finally, we present some ideas for dealing with control flow, replication, and dynamic alignments that depend on loop induction variables.

  2. Evidence of gene orthology and trans-species polymorphism, but not of parallel evolution, despite high levels of concerted evolution in the major histocompatibility complex of flamingo species.

    Science.gov (United States)

    Gillingham, M A F; Courtiol, A; Teixeira, M; Galan, M; Bechet, A; Cezilly, F

    2016-02-01

    The major histocompatibility complex (MHC) is a cornerstone in the study of adaptive genetic diversity. Intriguingly, highly polymorphic MHC sequences are often not more similar within species than between closely related species. Divergent selection of gene duplicates, balancing selection maintaining trans-species polymorphism (TSP) that predate speciation and parallel evolution of species sharing similar selection pressures can all lead to higher sequence similarity between species. In contrast, high rates of concerted evolution increase sequence similarity of duplicated loci within species. Assessing these evolutionary models remains difficult as relatedness and ecological similarities are often confounded. As sympatric species of flamingos are more distantly related than allopatric species, flamingos represent an ideal model to disentangle these evolutionary models. We characterized MHC Class I exon 3, Class IIB exon 2 and exon 3 of the six extant flamingo species. We found up to six MHC Class I loci and two MHC Class IIB loci. As all six species shared the same number of MHC Class IIB loci, duplication appears to predate flamingo speciation. However, the high rate of concerted evolution has prevented the divergence of duplicated loci. We found high sequence similarity between all species regardless of codon position. The latter is consistent with balancing selection maintaining TSP, as under this mechanism amino acid sites under pathogen-mediated selection should be characterized by fewer synonymous codons (due to their common ancestry) than under parallel evolution. Overall, balancing selection maintaining TSP appears to result in high MHC similarity between species regardless of species relatedness and geographical distribution. © 2015 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2015 European Society For Evolutionary Biology.

  3. Programming N-Cubes with a Graphical Parallel Programming Environment Versus an Extended Sequential Language.

    Science.gov (United States)

    1986-11-01

    parallel programming environment and language Poker. Our example programs, an implementation of a Cholesky algorithm for a banded matrix, were written in both languages and compiled into object codes that ran on the Cosmic Cube. However the program written in Poker is shorter, faster and easier to write, easier to debug, and portable without changes to other parallel computer architectures. The Poker program was slower than the program written directly in Cosmic Cube C, however the experiments provided insights into changes that make Poker programs nearly as fast.

  4. On the utility of threads for data parallel programming

    Science.gov (United States)

    Fahringer, Thomas; Haines, Matthew; Mehrotra, Piyush

    1995-01-01

    Threads provide a useful programming model for asynchronous behavior because of their ability to encapsulate units of work that can then be scheduled for execution at runtime, based on the dynamic state of a system. Recently, the threaded model has been applied to the domain of data parallel scientific codes, and initial reports indicate that the threaded model can produce performance gains over non-threaded approaches, primarily through the use of overlapping useful computation with communication latency. However, overlapping computation with communication is possible without the benefit of threads if the communication system supports asynchronous primitives, and this comparison has not been made in previous papers. This paper provides a critical look at the utility of lightweight threads as applied to data parallel scientific programming.

  5. Final Report: Center for Programming Models for Scalable Parallel Computing

    Energy Technology Data Exchange (ETDEWEB)

    Mellor-Crummey, John [William Marsh Rice University

    2011-09-13

    As part of the Center for Programming Models for Scalable Parallel Computing, Rice University collaborated with project partners in the design, development and deployment of language, compiler, and runtime support for parallel programming models to support application development for the “leadership-class” computer systems at DOE national laboratories. Work over the course of this project has focused on the design, implementation, and evaluation of a second-generation version of Coarray Fortran. Research and development efforts of the project have focused on the CAF 2.0 language, compiler, runtime system, and supporting infrastructure. This has involved working with the teams that provide infrastructure for CAF that we rely on, implementing new language and runtime features, producing an open source compiler that enabled us to evaluate our ideas, and evaluating our design and implementation through the use of benchmarks. The report details the research, development, findings, and conclusions from this work.

  6. VPC - A Proposal for a Vector Parallel C Programming Language.

    Science.gov (United States)

    1987-10-30

    181 B. Kernighan and D. Ritchie. Th~e C Programming Language. Prentice-11all, 1978. [91 B. Kernighan and R. Pike. The Unix Programming Environment...designed to be an extended version of the C language as defined by Kernighan and Ritchie (Ref. 8). Rather than taking the approach of extending...basis. Unix is a trademark of AT&T Bell Laboratories. e .’r % 7% The Vector Parallel C Language 3 tion calls that activate the FX/8’s proprietary

  7. An informal introduction to program transformation and parallel processors

    Energy Technology Data Exchange (ETDEWEB)

    Hopkins, K.W. [Southwest Baptist Univ., Bolivar, MO (United States)

    1994-08-01

    In the summer of 1992, I had the opportunity to participate in a Faculty Research Program at Argonne National Laboratory. I worked under Dr. Jim Boyle on a project transforming code written in pure functional Lisp to Fortran code to run on distributed-memory parallel processors. To perform this project, I had to learn three things: the transformation system, the basics of distributed-memory parallel machines, and the Lisp programming language. Each of these topics in computer science was unfamiliar to me as a mathematician, but I found that they (especially parallel processing) are greatly impacting many fields of mathematics and science. Since most mathematicians have some exposure to computers, but.certainly are not computer scientists, I felt it was appropriate to write a paper summarizing my introduction to these areas and how they can fit together. This paper is not meant to be a full explanation of the topics, but an informal introduction for the ``mathematical layman.`` I place myself in that category as well as my previous use of computers was as a classroom demonstration tool.

  8. Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

    Directory of Open Access Journals (Sweden)

    Fuad Abujarad

    2009-12-01

    Full Text Available Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1 generation of fault-span, the set of states reachable in the presence of faults, and (2 resolving deadlock states, from where the program has no outgoing transitions. Of these, the former closely resembles with model checking and, hence, techniques for efficient verification are directly applicable to it. Hence, we focus on expediting the latter with the use of multi-core technology. We present two approaches for parallelization by considering different design choices. The first approach is based on the computation of equivalence classes of program transitions (called group computation that are needed due to the issue of distribution (i.e., inability of processes to atomically read and write all program variables. We show that in most cases the speedup of this approach is close to the ideal speedup and in some cases it is superlinear. The second approach uses traditional technique of partitioning deadlock states among multiple threads. However, our experiments show that the speedup for this approach is small. Consequently, our analysis demonstrates that a simple approach of parallelizing the group computation is likely to be the effective method for using multi-core computing in the context of deadlock resolution.

  9. Advanced Programming Platform for efficient use of Data Parallel Hardware

    CERN Document Server

    Cabellos, Luis

    2012-01-01

    Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly interesting for scientific groups, which traditionally use mainly CPU as a work horse, and now can profit of the arrival of GPU hardware to HPC clusters. This new GPU hardware promises a boost in peak performance, but it is not trivial to use. In this article a programming platform designed to promote a direct use of this specialized hardware is presented. This platform includes a visual editor of parallel data flows and it is oriented to the execution in distributed clusters with GPUs. Examples of application in two characteristic problems, Fast Fourier Transform and Image Compression, are also shown.

  10. Automatic Performance Debugging of SPMD-style Parallel Programs

    CERN Document Server

    Liu, Xu; Zhan, Kunlin; Shi, Weisong; Yuan, Lin; Meng, Dan; Wang, Lei

    2011-01-01

    The simple program and multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in terms of two features: first, without any apriori knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the size of performance data to be collected and analyzed. Our contributions are three-fold: first, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code region behavior disparity, respectively; meanwhile, we present two searching algorithms to locate bottlenecks; second, on a basis o...

  11. NavP: Structured and Multithreaded Distributed Parallel Programming

    Science.gov (United States)

    Pan, Lei

    2007-01-01

    We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the current prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and it does not change the code structure of an original algorithm. This is in sharp contrast to MP as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. Approaches such as DSM or HPF have failed to deliver satisfying performance as of today in contrast, even if they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.

  12. StreaMorph: A Case for Synthesizing Energy-Efficient Adaptive Programs Using High-Level Abstractions

    Science.gov (United States)

    2013-08-12

    SDF, a program is decom- posed into multiple autonomous filters connected using FIFO chan- nels. As a result, stream programs can dynamically...straightforward. For example, in the left of Figure 2, a stream program is composed of three tasks T1, T2 and T3 con- nected through two FIFO channels...through FIFO channels. Each filter has a set of input and output ports. Each channel connects an output port of a filter to an input port of another actor

  13. Parallelization and checkpointing of GPU applications through program transformation

    Energy Technology Data Exchange (ETDEWEB)

    Solano-Quinde, Lizandro Damian [Iowa State Univ., Ames, IA (United States)

    2012-01-01

    GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solve the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and

  14. Energy consumption model over parallel programs implemented on multicore architectures

    Directory of Open Access Journals (Sweden)

    Ricardo Isidro-Ramirez

    2015-06-01

    Full Text Available In High Performance Computing, energy consump-tion is becoming an important aspect to consider. Due to the high costs that represent energy production in all countries it holds an important role and it seek to find ways to save energy. It is reflected in some efforts to reduce the energy requirements of hardware components and applications. Some options have been appearing in order to scale down energy use and, con-sequently, scale up energy efficiency. One of these strategies is the multithread programming paradigm, whose purpose is to produce parallel programs able to use the full amount of computing resources available in a microprocessor. That energy saving strategy focuses on efficient use of multicore processors that are found in various computing devices, like mobile devices. Actually, as a growing trend, multicore processors are found as part of various specific purpose computers since 2003, from High Performance Computing servers to mobile devices. However, it is not clear how multiprogramming affects energy efficiency. This paper presents an analysis of different types of multicore-based architectures used in computing, and then a valid model is presented. Based on Amdahl’s Law, a model that considers different scenarios of energy use in multicore architectures it is proposed. Some interesting results were found from experiments with the developed algorithm, that it was execute of a parallel and sequential way. A lower limit of energy consumption was found in a type of multicore architecture and this behavior was observed experimentally.

  15. An empirical study of FORTRAN programs for parallelizing compilers

    Science.gov (United States)

    Shen, Zhiyu; Li, Zhiyuan; Yew, Pen-Chung

    1990-01-01

    Some results are reported from an empirical study of program characteristics that are important in parallelizing compiler writers, especially in the area of data dependence analysis and program transformations. The state of the art in data dependence analysis and some parallel execution techniques are examined. The major findings are included. Many subscripts contain symbolic terms with unknown values. A few methods of determining their values at compile time are evaluated. Array references with coupled subscripts appear quite frequently; these subscripts must be handled simultaneously in a dependence test, rather than being handled separately as in current test algorithms. Nonzero coefficients of loop indexes in most subscripts are found to be simple: they are either 1 or -1. This allows an exact real-valued test to be as accurate as an exact integer-valued test for one-dimensional or two-dimensional arrays. Dependencies with uncertain distance are found to be rather common, and one of the main reasons is the frequent appearance of symbolic terms with unknown values.

  16. Poker on the Cosmic Cube: The First Retargetable Parallel Programming Language and Environment.

    Science.gov (United States)

    1986-06-01

    The retargeting of Poker, a parallel programming environment, to new parallel architectures is described. The specifics are illustrated by describing the retargeting of Poker to CalTech's Cosmic Cube. Poker requires only three features from the target architecture: MIMD operation, message-passing inter-process communication, and a sequential language (e.g., C) for the processor elements. In return, Poker gives the new architecture a complete parallel programming environment which will compile Poker parallel programs, without modification, into efficient object code for the new architecture.

  17. A Parallel Vector Machine for the PM Programming Language

    Science.gov (United States)

    Bellerby, Tim

    2016-04-01

    PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using
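
    The record stops short of PM's communication layer, and PM's own syntax is not shown; the masked execution strategy it describes can nevertheless be sketched in plain C, where an if-then-else over a vector becomes a Boolean mask applied to each branch:

        /* Sketch of mask-based conditional execution over a vector, in the
         * spirit of the vectorised virtual machine described above
         * (illustrative C, not PM syntax). */
        #include <stddef.h>

        void masked_if_then_else(size_t n, const double *a, double *out)
        {
            unsigned char mask[n];              /* C99 variable-length array */

            /* The "if" condition computes a Boolean mask... */
            for (size_t i = 0; i < n; i++)
                mask[i] = a[i] > 0.0;

            /* ...the "then" branch executes under the mask... */
            for (size_t i = 0; i < n; i++)
                if (mask[i]) out[i] = a[i] * 2.0;

            /* ...and the "else" branch executes under its complement. A real
             * implementation would switch to a list of active cell locations
             * once the mask becomes sufficiently sparse. */
            for (size_t i = 0; i < n; i++)
                if (!mask[i]) out[i] = 0.0;
        }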

  18. Branch technical position on the use of expert elicitation in the high-level radioactive waste program

    Energy Technology Data Exchange (ETDEWEB)

    Kotra, J.P.; Lee, M.P.; Eisenberg, N.A. [Nuclear Regulatory Commission, Washington, DC (United States)]; DeWispelare, A.R. [Center for Nuclear Waste Regulatory Analyses, San Antonio, TX (United States)]

    1996-11-01

    Should the site be found suitable, DOE will apply to the US Nuclear Regulatory Commission for permission to construct and then operate a proposed geologic repository for the disposal of spent nuclear fuel and other high-level radioactive waste at Yucca Mountain. In deciding whether to grant or deny DOE's license application for a geologic repository, NRC will closely examine the facts and expert judgment set forth in any potential DOE license application. NRC expects that subjective judgments of individual experts and, in some cases, groups of experts, will be used by DOE to interpret data obtained during site characterization and to address the many technical issues and inherent uncertainties associated with predicting the performance of a repository system for thousands of years. NRC has traditionally accepted, for review, expert judgment to evaluate and interpret the factual bases of license applications and is expected to give appropriate consideration to the judgments of DOE's experts regarding the geologic repository. Such consideration, however, envisions DOE using expert judgments to complement and supplement other sources of scientific and technical information, such as data collection, analyses, and experimentation. In this document, the NRC staff has set forth technical positions that: (1) provide general guidelines on those circumstances that may warrant the use of a formal process for obtaining the judgments of more than one expert (i.e., expert elicitation); and (2) describe acceptable procedures for conducting expert elicitation when formally elicited judgments are used to support a demonstration of compliance with NRC's geologic disposal regulation, currently set forth in 10 CFR Part 60. 76 refs.

  19. Parallelizing dynamic sequential programs using polyhedral process networks

    NARCIS (Netherlands)

    Nadezhkin, Dmitry

    2012-01-01

    The Polyhedral Process Network (PPN) is a suitable parallel model of computation (MoC) used to specify embedded streaming applications in a parallel form facilitating the efficient mapping onto embedded parallel execution platforms. Unfortunately, specifying an application using a parallel MoC is a

  20. Open-MP and Parallel Programming

    Institute of Scientific and Technical Information of China (English)

    陈崚; 陈宏建; 秦玲

    2003-01-01

    The application programming interface Open-MP for shared-memory parallel computer systems and its characteristics are described, and Open-MP is compared with the parallel programming tool MPI. To overcome the disadvantage of large overhead in Open-MP programs, several optimization methods for Open-MP programming are presented to increase execution efficiency.
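
    As a concrete illustration of the interface the paper discusses, the sketch below parallelizes a dot product in C with Open-MP (OpenMP). Using the built-in reduction clause and keeping the number of parallel regions small are typical of the overhead-reducing optimizations referred to above.

        /* Minimal OpenMP example in C: a dot product computed with a single
         * parallel region and a reduction clause. */
        #include <stdio.h>
        #include <omp.h>

        int main(void)
        {
            enum { N = 1000000 };
            static double x[N], y[N];
            double dot = 0.0;

            for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

            #pragma omp parallel for reduction(+:dot) schedule(static)
            for (int i = 0; i < N; i++)
                dot += x[i] * y[i];

            printf("dot = %.1f using up to %d threads\n",
                   dot, omp_get_max_threads());
            return 0;
        }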

  1. Parallel functional programming in Sisal: Fictions, facts, and future

    Energy Technology Data Exchange (ETDEWEB)

    McGraw, J.R.

    1993-07-01

    This paper provides a status report on the progress of research and development on the functional language Sisal. This project focuses on providing a highly effective method of writing large scientific applications that can efficiently execute on a spectrum of different multiprocessors. The paper includes sections on the language definition, compilation strategies, and programming techniques intended for readers with little or no background with Sisal. The section on performance presents our most recent results on execution speed for shared-memory multiprocessors, our findings using Sisal to develop codes, and our experiences migrating the same source code to different machines. For large programs, the execution performance of Sisal (with minimal supporting advice from the programmer) usually exceeds that of the best available automatic, vector/parallel Fortran compilers. Our evidence also indicates that Sisal programs tend to be shorter in length, faster to write, and clearer to understand than equivalent algorithms in Fortran. The paper concludes with a substantial discussion of common criticisms of the language and our plans for addressing them. Most notably, efficient implementations for distributed memory machines are lacking; an issue we plan to remedy.

  2. DOMAIN: An Abstraction Means for Parallel Program Conceptual Design

    Institute of Scientific and Technical Information of China (English)

    董超群; 陆林生

    2001-01-01

    DPHL is a Data Parallel High-level modeling Language with which users can describe a problem-solving course at the algorithmic level, closer to their perspectives. The domain is a basic concept introduced in DPHL. This paper discusses the definition of the domain and its abstraction method in supporting parallel program conceptual design.

  3. Hanford Waste Vitrification Plant Quality Assurance Program description for high-level waste form development and qualification. Revision 3, Part 2

    Energy Technology Data Exchange (ETDEWEB)

    1993-08-01

    The Hanford Waste Vitrification Plant Project has been established to convert the high-level radioactive waste associated with nuclear defense production at the Hanford Site into a waste form suitable for disposal in a deep geologic repository. The Hanford Waste Vitrification Plant will mix processed radioactive waste with borosilicate material, then heat the mixture to its melting point (vitrification) to form a glass-like substance that traps the radionuclides in the glass matrix upon cooling. The Hanford Waste Vitrification Plant Quality Assurance Program has been established to support the mission of the Hanford Waste Vitrification Plant. This Quality Assurance Program Description has been written to document the Hanford Waste Vitrification Plant Quality Assurance Program.

  4. The Rochester Checkers Player: Multi-Model Parallel Programming for Animate Vision

    Science.gov (United States)

    1991-06-01

    No single model of parallel programming is likely to serve for all tasks, however. Early vision algorithms are intensely data parallel, often utilizing fine-grain parallel computations that share an image, while cognition algorithms decompose naturally by function, often consisting of loosely-coupled, coarse-grain parallel units. A typical animate vision application will likely consist of many tasks, each of which may require a different parallel programming model, and all of which must cooperate to achieve the desired behavior. These multi-model programs require an

  5. Scoring methods and results for qualitative evaluation of public health impacts from the Hanford high-level waste tanks. Integrated Risk Assessment Program

    Energy Technology Data Exchange (ETDEWEB)

    Buck, J.W.; Gelston, G.M.; Farris, W.T.

    1995-09-01

    The objective of this analysis is to qualitatively rank the Hanford Site high-level waste (HLW) tanks according to their potential public health impacts through various (groundwater, surface water, and atmospheric) exposure pathways. Data from all 149 single-shell tanks (SSTs) and 23 of the 28 double-shell tanks (DSTs) in the Tank Waste Remediation System (TWRS) Program were analyzed for chemical and radiological carcinogenic as well as chemical noncarcinogenic health impacts. The preliminary aggregate score (PAS) ranking system was used to generate information from various release scenarios. Results based on the PAS ranking values should be considered relative health impacts rather than absolute risk values.

  6. Parallel Programming Methodologies for Non-Uniform Structured Problems in Materials Science

    Science.gov (United States)

    1993-10-01

    Interim report, covering 12/01/92 to 09/30/93, for the project "Parallel Programming Methodologies for Non-Uniform Structured Problems in Materials Science"; apart from the report documentation page and cover-letter fragments, no abstract text is recoverable from this record.

  7. Current Development of Parallel Programming Languages

    Institute of Scientific and Technical Information of China (English)

    韩卫; 郝红宇; 代丽

    2003-01-01

    In this paper we introduce the history of parallel programming languages and list some current parallel programming languages. Then, according to a classification principle, we analyze some representative parallel programming languages in detail. Finally, we outline future directions for parallel programming languages.

  8. Processor Allocation for Optimistic Parallelization of Irregular Programs

    CERN Document Server

    Versaci, Francesco

    2012-01-01

    Optimistic parallelization is a promising approach for the parallelization of irregular algorithms: potentially interfering tasks are launched dynamically, and the runtime system detects conflicts between concurrent activities, aborting and rolling back conflicting tasks. However, parallelism in irregular algorithms is very complex. In a regular algorithm like dense matrix multiplication, the amount of parallelism can usually be expressed as a function of the problem size, so it is reasonably straightforward to determine how many processors should be allocated to execute a regular algorithm of a certain size (this is called the processor allocation problem). In contrast, parallelism in irregular algorithms can be a function of input parameters, and the amount of parallelism can vary dramatically during the execution of the irregular algorithm. Therefore, the processor allocation problem for irregular algorithms is very difficult. In this paper, we describe the first systematic strategy for addressing this pro...

  9. Exploiting variability for energy optimization of parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Lavrijsen, Wim [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Iancu, Costin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; de Jong, Wibe [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)]; Chen, Xin [Georgia Inst. of Technology, Atlanta, GA (United States)]; Schwan, Karsten [Georgia Inst. of Technology, Atlanta, GA (United States)]

    2016-04-18

    In this paper we present optimizations that use DVFS mechanisms to reduce the total energy usage in scientific applications. Our main insight is that noise is intrinsic to large-scale parallel executions and it appears whenever shared resources are contended. The presence of noise allows us to identify and manipulate any program regions amenable to DVFS. When compared to previous energy optimizations that make per-core decisions using predictions of the running time, our scheme uses a qualitative approach to recognize the signature of executions amenable to DVFS. By recognizing the "shape of variability" we can optimize codes with highly dynamic behavior, which pose challenges to all existing DVFS techniques. We validate our approach using offline and online analyses for one-sided and two-sided communication paradigms. We have applied our methods to NWChem, and we show best-case improvements in energy use of 12% at no loss in performance when using online optimizations running on 720 Haswell cores with one-sided communication. With NWChem on MPI two-sided and offline analysis, capturing the initialization, we find energy savings of up to 20%, with less than 1% performance cost.

  10. MulticoreBSP for C : A high-performance library for shared-memory parallel programming

    NARCIS (Netherlands)

    Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.

    2014-01-01

    The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures.
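
    The record does not show the library's C interface; the sketch below shows the superstep structure of a bulk synchronous parallel program, assuming a classic BSPlib-style API (bsp_init, bsp_begin/bsp_end, bsp_pid, bsp_nprocs, bsp_sync) of the kind MulticoreBSP for C provides. The header name and details may differ per implementation.

        /* Superstep structure of a BSP program, assuming a BSPlib-style API;
         * illustrative sketch only. */
        #include <stdio.h>
        #include "bsp.h"   /* header name varies by implementation */

        static void spmd(void)
        {
            bsp_begin(bsp_nprocs());   /* run the SPMD section on all cores */

            int p = bsp_pid();         /* my processor id */
            int local = p * p;         /* superstep 1: local computation */

            bsp_sync();                /* barrier ends superstep 1; bsp_put/
                                          bsp_get communication issued before
                                          it would be delivered here */

            printf("processor %d computed %d\n", p, local); /* superstep 2 */

            bsp_end();
        }

        int main(int argc, char **argv)
        {
            bsp_init(spmd, argc, argv);
            spmd();
            return 0;
        }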

  11. HPF: a data parallel programming interface for large-scale numerical simulations

    Energy Technology Data Exchange (ETDEWEB)

    Seo, Yoshiki; Suehiro, Kenji; Murai, Hitoshi [NEC Corp., Tokyo (Japan)]

    1998-03-01

    HPF (High Performance Fortran) is a data parallel language designed for programming on distributed memory parallel systems. The first draft of HPF 1.0 was defined in 1993 as a de facto standard language. Recently, relatively reliable HPF compilers have become available on several distributed memory parallel systems. Many projects to parallelize real-world programs have started, mainly in the U.S. and Europe, and the weak and strong points of the current HPF have been made clear. In this paper, the major data transfer patterns required to parallelize numerical simulations, such as SHIFT, matrix transposition, reduction, GATHER/SCATTER and irregular communication, and the programming methods to implement them with HPF are described. The problems in the current HPF interface for developing efficient parallel programs and recent activities to deal with them are presented as well. (author)

  12. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems.

    Science.gov (United States)

    Stone, John E; Gohara, David; Shi, Guochun

    2010-05-01

    We provide an overview of the key architectural features of recent microprocessor designs and describe the programming model and abstractions provided by OpenCL, a new parallel programming standard targeting these architectures.
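
    The OpenCL model separates a host program from kernels compiled at run time for the device. A minimal host-side sketch in C (error checking and resource release omitted for brevity) looks like this:

        /* Minimal OpenCL host program in C: select a device, build a kernel
         * from source, run it over a 1-D range, and read back the result. */
        #include <stdio.h>
        #include <CL/cl.h>

        static const char *src =
            "__kernel void scale(__global float *x, const float a) {\n"
            "    size_t i = get_global_id(0);\n"
            "    x[i] *= a;\n"
            "}\n";

        int main(void)
        {
            enum { N = 1024 };
            float host[N];
            for (int i = 0; i < N; i++) host[i] = (float)i;

            cl_platform_id plat;  clGetPlatformIDs(1, &plat, NULL);
            cl_device_id   dev;   clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT,
                                                 1, &dev, NULL);
            cl_context       ctx = clCreateContext(NULL, 1, &dev,
                                                   NULL, NULL, NULL);
            cl_command_queue q   = clCreateCommandQueue(ctx, dev, 0, NULL);

            cl_program prog = clCreateProgramWithSource(ctx, 1, &src,
                                                        NULL, NULL);
            clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
            cl_kernel k = clCreateKernel(prog, "scale", NULL);

            cl_mem buf = clCreateBuffer(ctx,
                                        CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                        sizeof host, host, NULL);
            float a = 2.0f;
            clSetKernelArg(k, 0, sizeof buf, &buf);
            clSetKernelArg(k, 1, sizeof a, &a);

            size_t gsz = N;
            clEnqueueNDRangeKernel(q, k, 1, NULL, &gsz, NULL, 0, NULL, NULL);
            clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof host, host,
                                0, NULL, NULL);

            printf("host[3] = %g (expected 6)\n", host[3]);
            return 0;
        }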

  13. A Verified Integration of Imperative Parallel Programming Paradigms in an Object-Oriented Language

    OpenAIRE

    Sivilotti, Paul

    1993-01-01

    CC++ is a parallel object-oriented programming language that uses parallel composition, atomic functions, and single-assignment variables to express concurrency. We show that this programming paradigm is equivalent to several traditional imperative communication and synchronization models, namely: semaphores, monitors, and asynchronous channels. A collection of libraries which integrates these traditional models with CC++ is specified, implemented, and formally verified.

  14. Discussion on High-Level Language Programming Teaching Methods

    Institute of Scientific and Technical Information of China (English)

    陈丛

    2012-01-01

    Based on the training objectives and student characteristics of independent colleges, this article analyzes the teaching position and current teaching situation of the High-Level Language Programming course, summarizes the curriculum reform activities of the course, and identifies a number of methods for improving teaching quality and students' practical ability.

  15. Ageing management program for the Spanish low and intermediate level waste disposal and spent fuel and high-level waste centralised storage facilities

    Directory of Open Access Journals (Sweden)

    Andrade C.

    2011-04-01

    Full Text Available The generic design of the centralised spent fuel storage facility was approved by the Spanish Safety Authority in 2006. The planned operational life is 60 years, while the design service life is 100 years. Durability studies and surveillance of the behaviour have been considered from the initial design steps, taking into account the accessibility limitations and temperatures involved. The paper presents an overview of the ageing management program set in support of the Performance Assessment and Safety Review of the El Cabril low and intermediate level waste (LILW) disposal facility. Based on the experience gained for LILW, ENRESA has developed a preliminary definition of the Ageing Management Plan for the Centralised Interim Storage Facility of spent Fuel and High Level Waste (HLW), which addresses the behaviour of spent fuel, its retrievability, the confinement system and the reinforced concrete structure. It includes test plans and surveillance design considerations, based on the El Cabril LILW disposal facility.

  16. Japan-Australia co-operative program on research and development of technology for the management of high level radioactive wastes. Final report 1985 to 1998

    Energy Technology Data Exchange (ETDEWEB)

    Hart, K.; Vance, E.; Lumpkin, G. [Australian Nuclear Science and Technology Organisation, Lucas Heights, NSW (Australia)]; Mitamura, H.; Banba, T. [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan)]

    1998-12-01

    The overall aim of the Co-operative Program has been to promote the exchange of information on technology for the management of High-Level Wastes (HLW) and to encourage research and development relevant to such technology. During the 13 years that the Program has been carried out, HLW management strategies have matured and developed internationally, and Japan has commenced construction of a domestic reprocessing and vitrification facility for HLW. The preferred HLW management strategy is a national decision. Many countries are using vitrification, direct disposal of spent fuel, or a combination of both to handle their existing wastes, whereas others have deferred the decision. The work carried out in the Co-operative Program provides strong scientific evidence that the durability of ceramic waste forms is not significantly affected by radiation damage and that high loadings of actinide elements can be incorporated into specially designed ceramic waste forms. Moreover, natural minerals have been shown to remain as closed systems for U and Th for up to 2.5 billion years. All of these results give confidence in the ability of second-generation waste forms, such as Synroc, to handle future waste arisings that may not be suitable for vitrification. 87 refs., 15 tabs., 22 figs.

  17. Retargeting of existing FORTRAN program and development of parallel compilers

    Science.gov (United States)

    Agrawal, Dharma P.

    1988-01-01

    The software models used in implementing the parallelizing compiler for the B-HIVE multiprocessor system are described. The various models and strategies used in the compiler development are: the flexible granularity model, which allows a compromise between two extreme granularity models; the communication model, which is capable of precisely describing interprocessor communication timings and patterns; the loop type detection strategy, which identifies different types of loops; the critical path with coloring scheme, which is a versatile scheduling strategy for any multicomputer with some associated communication costs; and the loop allocation strategy, which realizes optimum overlapped operations between computation and communication of the system. Using these models, several sample routines of the AIR3D package are examined and tested. It may be noted that automatically generated codes are highly parallelized to provide the maximum degree of parallelism, obtaining speedups on up to a 28- to 32-processor system. A comparison of parallel codes for both the existing and proposed communication models is performed and the corresponding expected speedup factors are obtained. The experimentation shows that the B-HIVE compiler produces more efficient codes than existing techniques. Work is progressing well in completing the final phase of the compiler. Numerous enhancements are needed to improve the capabilities of the parallelizing compiler.

  18. A simple and efficient explicit parallelization of logic programs using low-level threading primitives

    CERN Document Server

    Saha, Diptikalyan

    2009-01-01

    In this work, we present an automatic way to parallelize logic programs for finding all the answers to queries using a transformation to low-level threading primitives. Although much work was done on the parallelization of logic programming more than a decade ago (e.g., Aurora, Muse, YapOR), the current state of parallelizing logic programs is still very poor. This work presents a way to parallelize tabled logic programs in XSB Prolog under the well-founded semantics. An important contribution of this work lies in merging answer-tables from multiple children threads without incurring copying or full-sharing and synchronization of data-structures. The implementation of the parent-children shared answer-tables surpasses in efficiency all the other data-structures currently implemented for completion of answers in parallelization using multi-threading. The transformation and its lower-level answer merging predicates were implemented as an extension to the XSB system.

  19. High-level verification

    CERN Document Server

    Lerner, Sorin; Kundu, Sudipta

    2011-01-01

    Given the growing size and heterogeneity of Systems on Chip (SOC), the design process from initial specification to chip fabrication has become increasingly complex. This growing complexity provides incentive for designers to use high-level languages such as C, SystemC, and SystemVerilog for system-level design. While a major goal of these high-level languages is to enable verification at a higher level of abstraction, allowing early exploration of system-level designs, the focus so far for validation purposes has been on traditional testing techniques such as random testing and scenario-based

  20. Declarative Parallel Programming in Spreadsheet End-User Development

    DEFF Research Database (Denmark)

    Biermann, Florian

    2016-01-01

    Spreadsheets are first-order functional languages and are widely used in research and industry as a tool to conveniently perform all kinds of computations. Because cells on a spreadsheet are immutable, there are possibilities for implicit parallelization of spreadsheet computations...

  1. Molecular dynamics simulation on a network of workstations using a machine-independent parallel programming language.

    OpenAIRE

    1991-01-01

    Molecular dynamics simulations investigate local and global motion in molecules. Several parallel computing approaches have been taken to attack the most computationally expensive phase of molecular simulations, the evaluation of long range interactions. This paper develops a straightforward but effective algorithm for molecular dynamics simulations using the machine-independent parallel programming language, Linda. The algorithm was run both on a shared memory parallel computer and on a network of workstations.

  2. The Effect of Parallel Programming Languages on the Performance and Energy Consumption of HPC Applications

    Directory of Open Access Journals (Sweden)

    Muhammad Aqib

    2016-02-01

    Full Text Available Big and complex applications need many resources and long computation times to execute sequentially. In this scenario, all of an application's processes are handled in sequential fashion even if they are independent of each other. In a high-performance computing environment, multiple processors are available for running applications in parallel, so mutually independent blocks of code can run in parallel. This approach not only increases the efficiency of the system without affecting the results but also saves a significant amount of energy. Many parallel programming models or APIs, such as Open MPI, OpenMP, CUDA, etc., are available for running multiple instructions in parallel. In this paper, the efficiency and energy consumption of two well-known tasks, i.e., matrix multiplication and quicksort, are analyzed using different parallel programming models and a multiprocessor machine. The obtained results, which can be generalized, outline the effect of choosing a programming model on the efficiency and energy consumption when running different codes on different machines.

  3. CRBLASTER: a fast parallel-processing program for cosmic ray rejection

    Science.gov (United States)

    Mighell, Kenneth J.

    2008-08-01

    Many astronomical image-analysis programs are based on algorithms that can be described as being embarrassingly parallel, where the analysis of one subimage generally does not affect the analysis of another subimage. Yet few parallel-processing astrophysical image-analysis programs exist that can easily take full advantage of today's fast multi-core servers costing a few thousand dollars. A major reason for the shortage of state-of-the-art parallel-processing astrophysical image-analysis codes is that the writing of parallel codes has been perceived to be difficult. I describe a new fast parallel-processing image-analysis program called crblaster which does cosmic ray rejection using van Dokkum's L.A.Cosmic algorithm. crblaster is written in C using the industry standard Message Passing Interface (MPI) library. Processing a single 800×800 HST WFPC2 image takes 1.87 seconds using 4 processes on an Apple Xserve with two dual-core 3.0-GHz Intel Xeons; the efficiency of the program running with the 4 processors is 82%. The code can be used as a software framework for easy development of parallel-processing image-analysis programs using embarrassingly parallel algorithms; the biggest required modification is the replacement of the core image processing function with an alternative image-analysis function based on a single-processor algorithm. I describe the design, implementation and performance of the program.
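
    The embarrassingly parallel structure described here amounts to splitting the image into independent strips. The MPI sketch below illustrates that decomposition in C; it is not crblaster's actual code, filter_strip() is a hypothetical stand-in for the per-strip algorithm (e.g., L.A.Cosmic), and the boundary overlap between strips that a real cosmic-ray rejector needs is ignored.

        /* Illustrative embarrassingly parallel image processing with MPI:
         * rank 0 scatters horizontal strips, every rank filters its strip
         * independently, and rank 0 gathers the results. */
        #include <stdlib.h>
        #include <mpi.h>

        #define W 800
        #define H 800

        static void filter_strip(float *rows, int nrows)
        {
            /* hypothetical stand-in for the single-processor algorithm */
            for (long i = 0; i < (long)nrows * W; i++)
                rows[i] *= 1.0f;
        }

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int nrows = H / size;   /* assume size divides H evenly */
            float *strip = malloc((size_t)nrows * W * sizeof *strip);
            float *image = NULL;
            if (rank == 0)          /* full image; would be read from disk */
                image = calloc((size_t)W * H, sizeof *image);

            MPI_Scatter(image, nrows * W, MPI_FLOAT,
                        strip, nrows * W, MPI_FLOAT, 0, MPI_COMM_WORLD);
            filter_strip(strip, nrows);     /* independent per-strip work */
            MPI_Gather(strip, nrows * W, MPI_FLOAT,
                       image, nrows * W, MPI_FLOAT, 0, MPI_COMM_WORLD);

            free(strip); free(image);
            MPI_Finalize();
            return 0;
        }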

  4. Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs

    Energy Technology Data Exchange (ETDEWEB)

    Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal; Kulkarni, Milind

    2017-01-26

    Modern hardware contains parallel execution resources that are well-suited for data parallelism (vector units) and task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task and data parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently on multicores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or multicores. We show that these schedulers are asymptotically optimal and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task-block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.

  5. Concurrent Collections (CnC): A new approach to parallel programming

    CERN Document Server

    CERN. Geneva

    2010-01-01

    A common approach in designing parallel languages is to provide some high level handles to manipulate the use of the parallel platform. This exposes some aspects of the target platform, for example, shared vs. distributed memory. It may expose some but not all types of parallelism, for example, data parallelism but not task parallelism. This approach must find a balance between the desire to provide a simple view for the domain expert and provide sufficient power for tuning. This is hard for any given architecture and harder if the language is to apply to a range of architectures. Either simplicity or power is lost. Instead of viewing the language design problem as one of providing the programmer with high level handles, we view the problem as one of designing an interface. On one side of this interface is the programmer (domain expert) who knows the application but needs no knowledge of any aspects of the platform. On the other side of the interface is the performance expert (programmer o...

  6. Computer simulation program for parallel SITAN. [Sandia Inertia Terrain-Aided Navigation, in FORTRAN

    Energy Technology Data Exchange (ETDEWEB)

    Andreas, R.D.; Sheives, T.C.

    1980-11-01

    This computer program simulates the operation of parallel SITAN using digitized terrain data. An actual trajectory is modeled including the effects of inertial navigation errors and radar altimeter measurements.

  7. Identification of a subset of human natural killer cells expressing high levels of programmed death 1: A phenotypic and functional characterization.

    Science.gov (United States)

    Pesce, Silvia; Greppi, Marco; Tabellini, Giovanna; Rampinelli, Fabio; Parolini, Silvia; Olive, Daniel; Moretta, Lorenzo; Moretta, Alessandro; Marcenaro, Emanuela

    2017-01-01

    Programmed death 1 (PD-1) is an immunologic checkpoint that limits immune responses by delivering potent inhibitory signals to T cells on interaction with specific ligands expressed on tumor/virus-infected cells, thus contributing to immune escape mechanisms. Therapeutic PD-1 blockade has been shown to mediate tumor eradication with impressive clinical results. Little is known about the expression/function of PD-1 on human natural killer (NK) cells. We sought to clarify whether human NK cells can express PD-1 and analyze their phenotypic/functional features. We performed multiparametric cytofluorimetric analysis of PD-1(+) NK cells and their functional characterization using degranulation, cytokine production, and proliferation assays. We provide unequivocal evidence that PD-1 is highly expressed (PD-1(bright)) on an NK cell subset detectable in the peripheral blood of approximately one fourth of healthy subjects. These donors are always serologically positive for human cytomegalovirus. PD-1 is expressed by CD56(dim) but not CD56(bright) NK cells and is confined to fully mature NK cells characterized by the NKG2A(-)KIR(+)CD57(+) phenotype. Proportions of PD-1(bright) NK cells were higher in the ascites of a cohort of patients with ovarian carcinoma, suggesting their possible induction/expansion in tumor environments. Functional analysis revealed a reduced proliferative capability in response to cytokines, low degranulation, and impaired cytokine production on interaction with tumor targets. We have identified and characterized a novel subpopulation of human NK cells expressing high levels of PD-1. These cells have the phenotypic characteristics of fully mature NK cells and are increased in patients with ovarian carcinoma. They display low proliferative responses and impaired antitumor activity that can be partially restored by antibody-mediated disruption of PD-1/programmed death ligand interaction.

  8. Resolutions of the Coulomb operator: VIII. Parallel implementation using the modern programming language X10.

    Science.gov (United States)

    Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P

    2014-10-30

    Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine.

  9. ALICE High Level Trigger

    CERN Multimedia

    Alt, T

    2013-01-01

    The ALICE High Level Trigger (HLT) is a computing farm designed and built for the real-time, online processing of the raw data produced by the ALICE detectors. Events are fully reconstructed from the raw data, analyzed, and compressed. The analysis summary, together with the compressed data and a trigger decision, is sent to the DAQ. In addition, the reconstruction of the events allows for online monitoring of physical observables, and this information is provided to the Data Quality Monitor (DQM). The HLT can process event rates of up to 2 kHz for proton-proton collisions and 200 Hz for central Pb-Pb collisions.

  10. Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

    Science.gov (United States)

    Weeks, Cindy Lou

    1986-01-01

    Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.

  11. Using CLIPS in the domain of knowledge-based massively parallel programming

    Science.gov (United States)

    Dvorak, Jiri J.

    1994-01-01

    The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.

  12. Protocol-Based Verification of Message-Passing Parallel Programs

    DEFF Research Database (Denmark)

    López-Acosta, Hugo-Andrés; Eduardo R. B. Marques, Eduardo R. B.; Martins, Francisco

    2015-01-01

    translated into a representation read by VCC, a software verifier for C. We successfully verified several MPI programs in a running time that is independent of the number of processes or other input parameters. This contrasts with alternative techniques, notably model checking and runtime verification...

  13. Guide to development of a scalar massive parallel programming on Paragon

    Energy Technology Data Exchange (ETDEWEB)

    Ueshima, Yutaka; Arakawa, Takuya; Sasaki, Akira [Japan Atomic Energy Research Inst., Neyagawa, Osaka (Japan). Kansai Research Establishment; Yokota, Hisasi

    1998-10-01

    Parallel calculations using more than a hundred computers began in Japan only several years ago. The Intel Paragon XP/S 15GP256 and 75MP834 were introduced as pioneers at the Japan Atomic Energy Research Institute (JAERI) to pursue massive parallel simulations for advanced photon and fusion research. Recently, a large number of parallel programs have been ported or newly produced to perform parallel calculations with those computers. However, these programs were developed based on software technologies for conventional supercomputers, and therefore they sometimes cause trouble in massive parallel computing. In principle, when programs are developed under a different computer and operating system (OS), prudent directions and knowledge are needed. However, integration of knowledge and standardization of the environment are quite difficult because the number of Paragon systems and Paragon users is very small in Japan. Therefore, we summarize information obtained through the process of developing a massive parallel program on the Paragon XP/S 75MP834. (author)

  14. Parallelized Solution to Semidefinite Programmings in Quantum Complexity Theory

    CERN Document Server

    Wu, Xiaodi

    2010-01-01

    In this paper we present an equilibrium-value-based framework for solving SDPs via the multiplicative weight update method, different from the one in Kale's thesis [Kale07]. One of the main advantages of the new framework is that we can guarantee the convertibility from approximate to exact feasibility for a much more general class of SDPs than previous results. Another advantage is that the design of the oracle, which is necessary for applying the multiplicative weight update method, is much simplified in general cases. This leads to an alternative and easier solution to the SDPs used in the previous results QIP(2) ⊆ PSPACE [JainUW09] and QMAM = PSPACE [JainJUW09]. Furthermore, we provide a generic form of SDPs which can be solved in a similar way. By parallelizing every step in our solution, we are able to solve a class of SDPs in NC. Although our motivation is from quantum computing, our result also applies directly to any SDP which satisfie...

  15. Programming a massively parallel, computation universal system: static behavior

    Energy Technology Data Exchange (ETDEWEB)

    Lapedes, A.; Farber, R.

    1986-01-01

    In previous work by the authors, the "optimum finding" properties of Hopfield neural nets were applied to the nets themselves to create a "neural compiler." This was done in such a way that the problem of programming the attractors of one neural net (called the Slave net) was expressed as an optimization problem that was in turn solved by a second neural net (the Master net). In this series of papers that approach is extended to programming nets that contain interneurons (sometimes called "hidden neurons"), and thus deals with nets capable of universal computation. 22 refs.

  16. Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

    Science.gov (United States)

    2008-01-01

    In Sections 2 and 3, precise definitions for distributed programs, specifications, and fault-tolerance are presented and the problem is formally stated. Experimental results and analysis are presented in Section 6, related work is discussed in Section 7, and conclusions follow. The surviving fragment of the technical development concerns deadlock states: a finite computation may be extended to an infinite one by stuttering at a state sl, whereas a deadlock state sd has no outgoing transition (or only a self-loop).

  17. Concurrent Programming Using Actors: Exploiting Large-Scale Parallelism,

    Science.gov (United States)

    1985-10-07

    Report documentation page for "Concurrent Programming Using Actors: Exploiting Large-Scale Parallelism" (G. Agha et al., MIT Artificial Intelligence Laboratory, 545 Technology Square, Cambridge, MA); no abstract text is recoverable from the OCR residue in this record.

  18. Describing, using 'recognition cones'. [parallel-series model with English-like computer program

    Science.gov (United States)

    Uhr, L.

    1973-01-01

    A parallel-serial 'recognition cone' model is examined, taking into account the model's ability to describe scenes of objects. An actual program is presented in an English-like language. The concept of a 'description' is discussed together with possible types of descriptive information. Questions regarding the level and the variety of detail are considered along with approaches for improving the serial representations of parallel systems.

  19. Method for resource control in parallel environments using program organization and run-time support

    Science.gov (United States)

    Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)

    2001-01-01

    A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.

  20. Task scheduling of parallel programs to optimize communications for cluster of SMPs

    Institute of Scientific and Technical Information of China (English)

    郑纬民; 杨博; 林伟坚; 李志光

    2001-01-01

    This paper discusses the compile-time task scheduling of parallel programs running on clusters of SMP workstations. First, the problem is stated formally, transformed into a graph partition problem, and proved to be NP-complete. A heuristic algorithm, MMP-Solver, is then proposed to solve the problem. Experimental results show that the task scheduling can greatly reduce the communication overhead of parallel applications and that MMP-Solver outperforms existing algorithms.

  1. Efficient Parallelization of the Stochastic Dual Dynamic Programming Algorithm Applied to Hydropower Scheduling

    Directory of Open Access Journals (Sweden)

    Arild Helseth

    2015-12-01

    Full Text Available Stochastic dual dynamic programming (SDDP has become a popular algorithm used in practical long-term scheduling of hydropower systems. The SDDP algorithm is computationally demanding, but can be designed to take advantage of parallel processing. This paper presents a novel parallel scheme for the SDDP algorithm, where the stage-wise synchronization point traditionally used in the backward iteration of the SDDP algorithm is partially relaxed. The proposed scheme was tested on a realistic model of a Norwegian water course, proving that the synchronization point relaxation significantly improves parallel efficiency.

  2. CRBLASTER: A Fast Parallel-Processing Program for Cosmic Ray Rejection in Space-Based Observations

    Science.gov (United States)

    Mighell, K.

    Many astronomical image analysis tasks are based on algorithms that can be described as being embarrassingly parallel - where the analysis of one subimage generally does not affect the analysis of another subimage. Yet few parallel-processing astrophysical image-analysis programs exist that can easily take full advantage of today's fast multi-core servers costing a few thousand dollars. One reason for the shortage of state-of-the-art parallel processing astrophysical image-analysis codes is that the writing of parallel codes has been perceived to be difficult. I describe a new fast parallel-processing image-analysis program called CRBLASTER which does cosmic ray rejection using van Dokkum's L.A.Cosmic algorithm. CRBLASTER is written in C using the industry standard Message Passing Interface library. Processing a single 800 x 800 Hubble Space Telescope Wide-Field Planetary Camera 2 (WFPC2) image takes 1.9 seconds using 4 processors on an Apple Xserve with two dual-core 3.0-GHz Intel Xeons; the efficiency of the program running with the 4 cores is 82%. The code has been designed to be used as a software framework for the easy development of parallel-processing image-analysis programs using embarrassingly parallel algorithms; all that needs to be done is to replace the core image processing task (in this case the C function that performs the L.A.Cosmic algorithm) with an alternative image analysis task based on a single processor algorithm. I describe the design and implementation of the program and then discuss how it could possibly be used to quickly do time-critical analysis applications such as those involved with space surveillance or do complex calibration tasks as part of the pipeline processing of images from large focal plane arrays.

  3. A Tool for Performance Modeling of Parallel Programs

    Directory of Open Access Journals (Sweden)

    J.A. González

    2003-01-01

    Full Text Available Current performance prediction analytical models try to characterize the performance behavior of actual machines through a small set of parameters. In practice, substantial deviations are observed. These differences are due to factors as memory hierarchies or network latency. A natural approach is to associate a different proportionality constant with each basic block, and analogously, to associate different latencies and bandwidths with each "communication block". Unfortunately, to use this approach implies that the evaluation of parameters must be done for each algorithm. This is a heavy task, implying experiment design, timing, statistics, pattern recognition and multi-parameter fitting algorithms. Software support is required. We present a compiler that takes as source a C program annotated with complexity formulas and produces as output an instrumented code. The trace files obtained from the execution of the resulting code are analyzed with an interactive interpreter, giving us, among other information, the values of those parameters.

  4. Remote Memory Access: A Case for Portable, Efficient and Library Independent Parallel Programming

    Directory of Open Access Journals (Sweden)

    Alexandros V. Gerbessiotis

    2004-01-01

    Full Text Available In this work we make a strong case for remote memory access (RMA) as the effective way to program a parallel computer by proposing a framework that supports RMA in a library-independent, simple and intuitive way. If one uses our approach, the parallel code one writes will run transparently under MPI-2 enabled libraries as well as under bulk-synchronous parallel libraries. The advantages of using RMA are code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing under this framework a collection of benchmark programs consisting of a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm, and we examine their performance on a LINUX-based PC cluster under three different RMA enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives lead to code that is as efficient as the message-passing equivalent code and, in the case of radix-sort, substantially more efficient. In addition our work can be used as a comparative study of the relevant capabilities of the three libraries.
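
    The RMA style argued for here maps directly onto MPI-2 one-sided operations. A minimal C sketch of a fence-synchronized MPI_Put, in which each rank writes its id into its right-hand neighbour's window:

        /* Minimal MPI-2 one-sided (RMA) example in C: each rank exposes one
         * integer in a window and writes its rank id into its right-hand
         * neighbour's window with MPI_Put, synchronized by fences. */
        #include <stdio.h>
        #include <mpi.h>

        int main(int argc, char **argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            int local = -1;                       /* memory exposed to RMA */
            MPI_Win win;
            MPI_Win_create(&local, sizeof local, sizeof local,
                           MPI_INFO_NULL, MPI_COMM_WORLD, &win);

            MPI_Win_fence(0, win);                /* open the access epoch */
            int right = (rank + 1) % size;
            MPI_Put(&rank, 1, MPI_INT, right, 0, 1, MPI_INT, win);
            MPI_Win_fence(0, win);                /* complete all puts */

            printf("rank %d received %d from its left neighbour\n",
                   rank, local);
            MPI_Win_free(&win);
            MPI_Finalize();
            return 0;
        }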

  5. High performance parallelism pearls 2 multicore and many-core programming approaches

    CERN Document Server

    Jeffers, Jim

    2015-01-01

    High Performance Parallelism Pearls Volume 2 offers another set of examples that demonstrate how to leverage parallelism. Similar to Volume 1, the techniques included here explain how to use processors and coprocessors with the same programming - illustrating the most effective ways to combine Xeon Phi coprocessors with Xeon and other multicore processors. The book includes examples of successful programming efforts, drawn from across industries and domains such as biomed, genetics, finance, manufacturing, imaging, and more. Each chapter in this edited work includes detailed explanations of t

  6. Empirical valence bond models for reactive potential energy surfaces: a parallel multilevel genetic program approach.

    Science.gov (United States)

    Bellucci, Michael A; Coker, David F

    2011-07-28

    We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency is tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent.

  7. Parallel programming of saccades during natural scene viewing: evidence from eye movement positions.

    Science.gov (United States)

    Wu, Esther X W; Gilani, Syed Omer; van Boxtel, Jeroen J A; Amihai, Ido; Chua, Fook Kee; Yen, Shih-Cheng

    2013-10-24

    Previous studies have shown that saccade plans during natural scene viewing can be programmed in parallel. This evidence comes mainly from temporal indicators, i.e., fixation durations and latencies. In the current study, we asked whether eye movement positions recorded during scene viewing also reflect parallel programming of saccades. As participants viewed scenes in preparation for a memory task, their inspection of the scene was suddenly disrupted by a transition to another scene. We examined whether saccades after the transition were invariably directed immediately toward the center or were contingent on saccade onset times relative to the transition. The results, which showed a dissociation in eye movement behavior between two groups of saccades after the scene transition, supported the parallel programming account. Saccades with relatively long onset times (>100 ms) after the transition were directed immediately toward the center of the scene, probably to restart scene exploration. Saccades with short onset times (<100 ms) followed the plans made before the transition, consistent with parallel programming of saccades during scene viewing. Additionally, results from the analyses of intersaccadic intervals were also consistent with the parallel programming hypothesis.

  8. Grid Service Framework:Supporting Multi-Models Parallel Grid Programming

    Institute of Scientific and Technical Information of China (English)

    邓倩妮; 陆鑫达

    2004-01-01

    Web services are a grid computing technology that promises greater ease of use and interoperability than previous distributed computing technologies. This paper proposes the Grid Service Framework, a grid computing platform based on Microsoft .NET that uses web services to: (1) locate and harness volunteer computing resources for different applications; (2) support multiple parallel programming paradigms, such as Master/Slave, Divide and Conquer, and Phase Parallel, in a grid environment; and (3) allocate data and balance load dynamically and transparently for grid computing applications. The framework was used to implement several simple parallel computing applications, and the results show that it is suitable for generic parallel numerical computing.

  9. Architecture-Adaptive Computing Environment: A Tool for Teaching Parallel Programming

    Science.gov (United States)

    Dorband, John E.; Aburdene, Maurice F.

    2002-01-01

    Recently, networked and cluster computation have become very popular. This paper is an introduction to a new C based parallel language for architecture-adaptive programming, aCe C. The primary purpose of aCe (Architecture-adaptive Computing Environment) is to encourage programmers to implement applications on parallel architectures by providing them the assurance that future architectures will be able to run their applications with a minimum of modification. A secondary purpose is to encourage computer architects to develop new types of architectures by providing an easily implemented software development environment and a library of test applications. This new language should be an ideal tool to teach parallel programming. In this paper, we will focus on some fundamental features of aCe C.

  10. Parallel Implementation of a Semidefinite Programming Solver based on CSDP in a distributed memory cluster

    NARCIS (Netherlands)

    Ivanov, I.D.; de Klerk, E.

    2007-01-01

    In this paper we present the algorithmic framework and practical aspects of implementing a parallel version of a primal-dual semidefinite programming solver on a distributed memory computer cluster. Our implementation is based on the CSDP solver and uses a message passing interface (MPI).

  11. All-pairs Shortest Path Algorithm based on MPI+CUDA Distributed Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Qingshuang Wu

    2013-12-01

    Full Text Available In view of the fact that computing shortest paths in a graph is a complex and time-consuming process, and that traditional algorithms relying solely on the CPU as the computing unit cannot meet the demands of real-time processing, in this paper we present an all-pairs shortest-paths algorithm using the MPI+CUDA hybrid programming model, which can exploit the overwhelming computing power of a GPU cluster to speed up the processing. The proposed algorithm combines the advantages of the MPI and CUDA programming models and realizes two-level parallel computing. At the cluster level, we use the MPI programming model to achieve coarse-grained parallel computing between the computational nodes of the GPU cluster. At the node level, we use the CUDA programming model to achieve GPU-accelerated fine-grained parallel computing within each computational node. The experimental results show that the MPI+CUDA-based parallel algorithm can take full advantage of the powerful computing capability of the GPU cluster and achieve a speedup of around a hundred times; the whole algorithm has good computing performance, reliability and scalability, and it is able to meet the demands of real-time processing of massive spatial shortest-path analysis.
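
    The two-level decomposition described above has roughly the following shape (a schematic C/MPI sketch under illustrative names and sizes; the per-node relaxation is shown as a plain CPU loop where the paper launches a CUDA kernel instead):

      #include <mpi.h>
      #include <stdio.h>
      #include <string.h>

      #define N 8
      #define INF 1000000
      /* Blocked Floyd-Warshall: MPI distributes row blocks across nodes
         (coarse grain); the inner relaxation loop is what the paper offloads
         to a CUDA kernel on each node (fine grain). Assumes N % size == 0. */
      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          static int d[N][N];
          for (int i = 0; i < N; i++)
              for (int j = 0; j < N; j++)
                  d[i][j] = (i == j) ? 0 : ((j == i + 1) ? 1 : INF); /* toy graph */
          int rows = N / size, lo = rank * rows, hi = lo + rows;
          int rowk[N];
          for (int k = 0; k < N; k++) {
              if (k >= lo && k < hi) memcpy(rowk, d[k], sizeof(rowk));
              MPI_Bcast(rowk, N, MPI_INT, k / rows, MPI_COMM_WORLD);
              for (int i = lo; i < hi; i++)      /* <- CUDA kernel in the paper */
                  for (int j = 0; j < N; j++)
                      if (d[i][k] + rowk[j] < d[i][j])
                          d[i][j] = d[i][k] + rowk[j];
          }
          if (rank == 0) printf("d[0][%d] = %d\n", N - 1, d[0][N - 1]); /* expect 7 */
          MPI_Finalize();
          return 0;
      }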

  13. Convex quadratic programming relaxations for parallel machine scheduling with controllable processing times subject to release times

    Institute of Scientific and Technical Information of China (English)

    ZHANG Feng; CHEN Feng; TANG Guochun

    2004-01-01

    Scheduling unrelated parallel machines with controllable processing times subject to release times is investigated. Based on the convex quadratic programming relaxation and the randomized rounding strategy, a 2-approximation algorithm is obtained for a special case with the all-or-none property, and then a 3-approximation algorithm is presented for the general problem.

  14. High Level of Integration in Integrated Disease Management Leads to Higher Usage in the e-Vita Study: Self-Management of Chronic Obstructive Pulmonary Disease With Web-Based Platforms in a Parallel Cohort Design.

    Science.gov (United States)

    Talboom-Kamp, Esther Pwa; Verdijk, Noortje A; Kasteleyn, Marise J; Harmans, Lara M; Talboom, Irvin Jsh; Numans, Mattijs E; Chavannes, Niels H

    2017-05-31

    Worldwide, nearly 3 million people die of chronic obstructive pulmonary disease (COPD) every year. Integrated disease management (IDM) improves disease-specific quality of life and exercise capacity for people with COPD, but can also reduce hospital admissions and hospital days. Self-management of COPD through eHealth interventions has been shown to be an effective method to improve the quality and efficiency of IDM in several settings, but it remains unknown which factors influence usage of eHealth and change in behavior of patients. Our study, e-Vita COPD, compares different levels of integration of Web-based self-management platforms in IDM in three primary care settings. The main aim of this study is to analyze the factors that successfully promote the use of a self-management platform for COPD patients. The e-Vita COPD study compares three different approaches to incorporating eHealth via Web-based self-management platforms into IDM of COPD using a parallel cohort design. Three groups integrated the platforms to different levels. In groups 1 (high integration) and 2 (medium integration), randomization was performed to two levels of personal assistance for patients (high and low assistance); in group 3 there was no integration into disease management. Every visit to the e-Vita and Zorgdraad COPD Web platforms was tracked objectively by collecting log data (sessions and services). At the first log-in, patients completed a baseline questionnaire. Baseline characteristics were automatically extracted from the log files, including age, gender, education level, scores on the Clinical COPD Questionnaire (CCQ), dyspnea scale (MRC), and quality of life questionnaire (EQ5D). To predict the use of the platforms, multiple linear regression analyses were performed for the different independent variables: integration in IDM (high, medium, none), personal assistance for the participants (high vs low), educational level, and self-efficacy level (General Self-Efficacy Scale).

  15. LDRD final report on massively-parallel linear programming : the parPCx system.

    Energy Technology Data Exchange (ETDEWEB)

    Parekh, Ojas (Emory University, Atlanta, GA); Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project "Massively-Parallel Linear Programming". We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver, called parPCx, and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods, including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas, and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer).
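
    The sparse linear system at the computational core has, in the standard interior-point formulation (a generic textbook form, not necessarily parPCx's exact variant), the shape of the normal equations

      \[ (A D^{2} A^{\top})\,\Delta y = r, \qquad D^{2} = X S^{-1}, \]

    where X and S are diagonal matrices holding the current primal iterates and dual slacks. Each predictor and each corrector step solves this system for a different right-hand side r, which is why its assembly and (direct or iterative) solution dominates the run time and drives the data-distribution choices mentioned above.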

  16. Speedup properties of phases in the execution profile of distributed parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Carlson, B.M. [Toronto Univ., ON (Canada). Computer Systems Research Institute; Wagner, T.D.; Dowdy, L.W. [Vanderbilt Univ., Nashville, TN (United States). Dept. of Computer Science; Worley, P.H. [Oak Ridge National Lab., TN (United States)

    1992-08-01

    The execution profile of a distributed-memory parallel program specifies the number of busy processors as a function of time. Periods of homogeneous processor utilization are manifested in many execution profiles. These periods can usually be correlated with the algorithms implemented in the underlying parallel code. Three families of methods for smoothing execution profile data are presented. These approaches simplify the problem of detecting end points of periods of homogeneous utilization. These periods, called phases, are then examined in isolation, and their speedup characteristics are explored. A specific workload executed on an Intel iPSC/860 is used for validation of the techniques described.
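
    One simple instance of such smoothing is a centered moving average over the profile (a hypothetical illustration in C; the report presents three families of methods, not necessarily this one):

      #include <stdio.h>

      #define W 2   /* half-width of the smoothing window */

      /* Smooth an execution profile (busy processors per time step) with a
         centered moving average; plateaus in the output suggest phases of
         homogeneous utilization whose end points are then easier to detect. */
      void smooth(const double *busy, double *out, int n) {
          for (int t = 0; t < n; t++) {
              double sum = 0.0;
              int cnt = 0;
              for (int k = t - W; k <= t + W; k++)
                  if (k >= 0 && k < n) { sum += busy[k]; cnt++; }
              out[t] = sum / cnt;
          }
      }

      int main(void) {
          double busy[12] = {8,8,7,8,8,2,2,2,2,8,8,8}, out[12];
          smooth(busy, out, 12);
          for (int t = 0; t < 12; t++)
              printf("%2d %5.2f\n", t, out[t]);
          return 0;
      }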

  17. Method, systems, and computer program products for implementing function-parallel network firewall

    Science.gov (United States)

    Fulp, Errin W.; Farley, Ryan J.

    2011-10-11

    Methods, systems, and computer program products for providing function-parallel firewalls are disclosed. According to one aspect, a function-parallel firewall includes a first firewall node for filtering received packets using a first portion of a rule set including a plurality of rules. The first portion includes less than all of the rules in the rule set. At least one second firewall node filters packets using a second portion of the rule set. The second portion includes at least one rule in the rule set that is not present in the first portion. The first and second portions together include all of the rules in the rule set.

  18. Programming Environment for a High-Performance Parallel Supercomputer with Intelligent Communication

    Directory of Open Access Journals (Sweden)

    A. Gunzinger

    1996-01-01

    Full Text Available At the Electronics Laboratory of the Swiss Federal Institute of Technology (ETH) in Zürich, the high-performance parallel supercomputer MUSIC (MUlti processor System with Intelligent Communication) has been developed. As applications like neural network simulation and molecular dynamics show, its performance is on par with that of conventional supercomputers, while electric power requirements are reduced by a factor of 1,000, weight by a factor of 400, and price by a factor of 100. Software development is a key issue of such parallel systems. This article focuses on the programming environment of the MUSIC system and on its applications.

  19. Teaching Scientific Computing: A Model-Centered Approach to Pipeline and Parallel Programming with C

    Directory of Open Access Journals (Sweden)

    Vladimiras Dolgopolovas

    2015-01-01

    Full Text Available The aim of this study is to present an approach to the introduction to pipeline and parallel computing, using a model of the multiphase queueing system. Pipeline computing, including software pipelines, is among the key concepts in modern computing and electronics engineering. Modern computer science and engineering education requires a comprehensive curriculum, so the introduction to pipeline and parallel computing is an essential topic to be included. At the same time, the topic is among the most motivating tasks due to its comprehensive multidisciplinary and technical requirements. To enhance the educational process, the paper proposes a novel model-centered framework and develops the relevant learning objects. It allows implementing an educational platform for a constructivist learning process, thus enabling learners' experimentation with the provided programming models, the acquisition of competences in modern scientific research and computational thinking, and the capture of the relevant technical knowledge. It also provides an integral platform that allows a simultaneous and comparative introduction to pipelining and parallel computing. The programming language C for developing programming models, and the message passing interface (MPI) and OpenMP parallelization tools, have been chosen for the implementation.
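
    As a flavour of the pipeline pattern such a course targets, the sketch below chains MPI ranks into stages (an illustrative example assuming an MPI installation; the stage function is hypothetical, and at least two ranks are expected):

      #include <mpi.h>
      #include <stdio.h>

      /* Minimal MPI pipeline: rank 0 produces items, each middle rank applies
         a stage function, the last rank consumes. Run with e.g. mpirun -np 4. */
      static double stage(double x, int rank) { return x + rank; } /* placeholder work */

      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);
          const int N = 8;                           /* items flowing through */
          for (int i = 0; i < N; i++) {
              double item;
              if (rank == 0)
                  item = (double)i;                  /* produce */
              else
                  MPI_Recv(&item, 1, MPI_DOUBLE, rank - 1, 0,
                           MPI_COMM_WORLD, MPI_STATUS_IGNORE);
              item = stage(item, rank);              /* this stage's work */
              if (rank < size - 1)
                  MPI_Send(&item, 1, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD);
              else
                  printf("item %d -> %.1f\n", i, item); /* consume */
          }
          MPI_Finalize();
          return 0;
      }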

  20. Academic training: From Evolution Theory to Parallel and Distributed Genetic Programming

    CERN Multimedia

    2007-01-01

    2006-2007 ACADEMIC TRAINING PROGRAMME LECTURE SERIES 15, 16 March From 11:00 to 12:00 - Main Auditorium, bldg. 500 From Evolution Theory to Parallel and Distributed Genetic Programming F. FERNANDEZ DE VEGA / Univ. of Extremadura, SP Lecture No. 1: From Evolution Theory to Evolutionary Computation Evolutionary computation is a subfield of artificial intelligence (more particularly computational intelligence) involving combinatorial optimization problems, which are based to some degree on the evolution of biological life in the natural world. In this tutorial we will review the source of inspiration for this metaheuristic and its capability for solving problems. We will show the main flavours within the field, and different problems that have been successfully solved employing this kind of techniques. Lecture No. 2: Parallel and Distributed Genetic Programming The successful application of Genetic Programming (GP, one of the available Evolutionary Algorithms) to optimization problems has encouraged an ...

  1. Parallel programming of exogenous and endogenous components in the antisaccade task.

    Science.gov (United States)

    Massen, Cristina

    2004-04-01

    In the antisaccade task subjects are required to suppress the reflexive tendency to look at a peripherally presented stimulus and to perform a saccade in the opposite direction instead. The present studies aimed at investigating the inhibitory mechanisms responsible for successful performance in this task, testing a hypothesis of parallel programming of exogenous and endogenous components: A reflexive saccade to the stimulus is automatically programmed and competes with the concurrently established voluntary programme to look in the opposite direction. The experiments followed the logic of selectively manipulating the speed of processing of these components and testing the prediction that a selective slowing of the exogenous component should result in a reduced error rate in this task, while a selective slowing of the endogenous component should have the opposite effect. The results provide evidence for the hypothesis of parallel programming and are discussed in the context of alternative accounts of antisaccade performance.

  2. Calculation of illumination conditions at the lunar south pole - parallel programming approach

    Science.gov (United States)

    Figuera, R. Marco; Gläser, P.; Oberst, J.; De Rosa, D.

    2014-04-01

    In this paper we present a parallel programming approach to evaluating illumination conditions at the lunar south pole. Due to the small inclination (1.54°) of the lunar rotational axis with respect to the ecliptic plane and the topography of the lunar south pole, which allows long illumination periods, the study of illumination conditions is of great importance. Several tests were conducted in order to check the viability of the study and to optimize the tool used to calculate the illumination. First results from a simulated case study showed a reduction of the computation time by a factor of 8-12 when using parallel programming on the Graphics Processing Unit (GPU) in comparison with sequential programming on the Central Processing Unit (CPU).

  3. Next Evolution of the Seneca College Outdoor Recreation Program: One Year of High Level Professional Outdoor Training and Development for Post-Diploma/Post-Degree Students.

    Science.gov (United States)

    Magee, Clare

    1998-01-01

    Describes the steps in utilizing fast-tracking to phase out the overloaded two-year Outdoor Recreation Technician Co-op program at Seneca College (Ontario) and phase in a one-year graduate Outdoor Recreation Certificate program with a lower teacher-student ratio. A concept model relates generalist core skills to specializations and outdoor…

  4. Algorithmic differentiation of pragma-defined parallel regions differentiating computer programs containing OpenMP

    CERN Document Server

    Förster, Michael

    2014-01-01

    Numerical programs often use parallel programming techniques such as OpenMP to compute the program's output values as efficiently as possible. In addition, derivative values of these output values with respect to certain input values play a crucial role. To achieve code that computes not only the output values simultaneously but also the derivative values, this work introduces several source-to-source transformation rules. These rules are based on a technique called algorithmic differentiation. The main focus of this work lies on the important reverse mode of algorithmic differentiation.
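
    To see the kind of code such a source-to-source transformation produces, here is a hand-written forward-mode example on an OpenMP loop (the book's focus is the harder reverse mode; this sketch and its names are purely illustrative, not the book's transformation rules):

      #include <stdio.h>

      /* Forward-mode AD of an OpenMP loop: the original statement
         y[i] = x[i]*x[i] is preceded by a tangent statement propagating
         d_x -> d_y, and both parallelize over the same iteration space. */
      int main(void) {
          const int n = 8;
          double x[8], d_x[8], y[8], d_y[8];
          for (int i = 0; i < n; i++) { x[i] = i; d_x[i] = 1.0; } /* seed dx/dx = 1 */
          #pragma omp parallel for
          for (int i = 0; i < n; i++) {
              d_y[i] = 2.0 * x[i] * d_x[i]; /* inserted tangent statement */
              y[i]   = x[i] * x[i];         /* original statement */
          }
          printf("y[3]=%.1f dy[3]=%.1f\n", y[3], d_y[3]); /* 9.0, 6.0 */
          return 0;
      }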

  5. Fixed-dimensional parallel linear programming via relative ε-approximations

    Energy Technology Data Exchange (ETDEWEB)

    Goodrich, M.T.

    1996-12-31

    We show that linear programming in R^d can be solved deterministically in O((log log n)^d) time using linear work in the PRAM model of computation, for any fixed constant d. Our method is developed for the CRCW variant of the PRAM parallel computation model, and can be easily implemented to run in O(log n (log log n)^(d-1)) time using linear work on an EREW PRAM. A key component in these algorithms is a new, efficient parallel method for constructing ε-nets and ε-approximations (which have wide applicability in computational geometry). In addition, we introduce a new deterministic set approximation for range spaces with finite VC-exponent, which we call the δ-relative ε-approximation, and we show how such approximations can be efficiently constructed in parallel.

  6. Molecular dynamics simulation on a network of workstations using a machine-independent parallel programming language.

    Science.gov (United States)

    Shifman, M A; Windemuth, A; Schulten, K; Miller, P L

    1992-04-01

    Molecular dynamics simulations investigate local and global motion in molecules. Several parallel computing approaches have been taken to attack the most computationally expensive phase of molecular simulations, the evaluation of long range interactions. This paper reviews these approaches and develops a straightforward but effective algorithm using the machine-independent parallel programming language, Linda. The algorithm was run both on a shared memory parallel computer and on a network of high performance Unix workstations. Performance benchmarks were performed on both systems using two proteins. This algorithm offers a portable cost-effective alternative for molecular dynamics simulations. In view of the increasing numbers of networked workstations, this approach could help make molecular dynamics simulations more easily accessible to the research community.
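
    The paper farms the expensive pairwise-interaction phase out through Linda's tuple space across workstations; as a stand-in illustration of the same decomposition (not Linda code, and not the authors' algorithm), here is the O(N^2) long-range loop parallelized with OpenMP in C, with a toy force law and illustrative names:

      #include <math.h>
      #include <stdio.h>

      #define N 512
      static double px[N], py[N], pz[N], fx[N], fy[N], fz[N];

      /* Evaluate all pairwise long-range interactions; each iteration of the
         outer loop is an independent work unit, which is exactly what the
         master-worker scheme distributes. */
      void long_range_forces(void) {
          #pragma omp parallel for schedule(dynamic)
          for (int i = 0; i < N; i++) {
              for (int j = 0; j < N; j++) {
                  if (i == j) continue;
                  double dx = px[j]-px[i], dy = py[j]-py[i], dz = pz[j]-pz[i];
                  double r2 = dx*dx + dy*dy + dz*dz + 1e-12; /* softening */
                  double inv = 1.0 / (r2 * sqrt(r2));        /* toy 1/r^2 force */
                  fx[i] += dx*inv; fy[i] += dy*inv; fz[i] += dz*inv;
              }
          }
      }

      int main(void) {
          for (int i = 0; i < N; i++) { px[i] = i; py[i] = 0.5 * i; pz[i] = 0.0; }
          long_range_forces();
          printf("f[0] = (%g, %g, %g)\n", fx[0], fy[0], fz[0]);
          return 0;
      }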

  7. Sequence alignment tools: one parallel pattern to rule them all?

    Science.gov (United States)

    Misale, Claudia; Ferrero, Giulio; Torquati, Massimo; Aldinucci, Marco

    2014-01-01

    In this paper, we advocate high-level programming methodology for next generation sequencers (NGS) alignment tools for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from all complex aspects of parallel programming, such as synchronisation protocols, and task scheduling, gaining more possibility for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets.

  8. Sequence Alignment Tools: One Parallel Pattern to Rule Them All?

    Directory of Open Access Journals (Sweden)

    Claudia Misale

    2014-01-01

    Full Text Available In this paper, we advocate high-level programming methodology for next generation sequencers (NGS) alignment tools for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from all complex aspects of parallel programming, such as synchronisation protocols, and task scheduling, gaining more possibility for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets.

  9. Run-Time and Compiler Support for Programming in Adaptive Parallel Environments

    Directory of Open Access Journals (Sweden)

    Guy Edjlali

    1997-01-01

    Full Text Available For better utilization of computing resources, it is important to consider parallel programming environments in which the number of available processors varies at run-time. In this article, we discuss run-time support for data-parallel programming in such an adaptive environment. Executing programs in an adaptive environment requires redistributing data when the number of processors changes, and also requires determining new loop bounds and communication patterns for the new set of processors. We have developed a run-time library to provide this support. We discuss how the run-time library can be used by compilers of high-performance Fortran (HPF)-like languages to generate code for an adaptive environment. We present performance results for a Navier-Stokes solver and a multigrid template run on a network of workstations and an IBM SP-2. Our experiments show that if the number of processors is not varied frequently, the cost of data redistribution is not significant compared to the time required for the actual computation. Overall, our work establishes the feasibility of compiling HPF for a network of nondedicated workstations, which are likely to be an important resource for parallel programming in the future.
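
    The core bookkeeping such a run-time library performs when the processor count changes is recomputing the data and loop partitions. A minimal sketch of block-distribution bounds in C (illustrative formulas only, not the library's API):

      #include <stdio.h>

      /* Recompute block-distributed loop bounds for n iterations over nprocs
         processes; called again whenever nprocs changes at run time. */
      void block_bounds(int n, int nprocs, int rank, int *lo, int *hi) {
          int base = n / nprocs, rem = n % nprocs;
          *lo = rank * base + (rank < rem ? rank : rem);
          *hi = *lo + base + (rank < rem ? 1 : 0);   /* half-open [lo, hi) */
      }

      int main(void) {
          int lo, hi;
          for (int p = 3; p <= 4; p++) {             /* a processor is added: 3 -> 4 */
              for (int r = 0; r < p; r++) {
                  block_bounds(10, p, r, &lo, &hi);
                  printf("p=%d rank=%d owns [%d,%d)\n", p, r, lo, hi);
              }
          }
          return 0;
      }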

  10. Towards Interactive Visual Exploration of Parallel Programs using a Domain-Specific Language

    KAUST Repository

    Klein, Tobias

    2016-04-19

    The use of GPUs and the massively parallel computing paradigm have become widespread. We describe a framework for the interactive visualization and visual analysis of the run-time behavior of massively parallel programs, especially OpenCL kernels. This facilitates understanding a program's function and structure, finding the causes of possible slowdowns, locating program bugs, and interactively exploring and visually comparing different code variants in order to improve performance and correctness. Our approach enables very specific, user-centered analysis, both in terms of the recording of the run-time behavior and the visualization itself. Instead of having to manually write instrumented code to record data, simple code annotations tell the source-to-source compiler which code instrumentation to generate automatically. The visualization part of our framework then enables the interactive analysis of kernel run-time behavior in a way that can be very specific to a particular problem or optimization goal, such as analyzing the causes of memory bank conflicts or understanding an entire parallel algorithm.

  11. Design and Development of Parallel Programs on the Bulk Synchronous Parallel Model

    Institute of Scientific and Technical Information of China (English)

    赖树华; 陆朝俊; 孙永强

    2001-01-01

    The Bulk Synchronous Parallel (BSP) model is briefly introduced, and the advantages of designing and developing parallel programs on the BSP model are discussed. The paper then analyses how to design and develop parallel programs on the BSP model and summarizes several principles the developer must comply with. Finally, a useful parallel programming method based on the BSP model is presented: the two-phase method of BSP parallel program design. Taking the multiplication of two matrices as an example, the paper illustrates how to use this two-phase method, together with a BSP performance prediction tool, to design and develop BSP parallel programs.
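
    A BSP superstep, the unit around which such designs are organized, looks as follows in C with the classic BSPlib primitives (a minimal sketch assuming an installed BSPlib implementation such as the Oxford BSP toolset; this is illustrative, not code from the paper):

      #include <stdio.h>
      #include "bsp.h"   /* BSPlib header */

      /* One superstep: every process computes a local value, writes it into
         its right neighbour's memory with bsp_put, then all synchronize. */
      int main(void) {
          bsp_begin(bsp_nprocs());
          int p = bsp_nprocs(), pid = bsp_pid();
          int local = pid * pid, recv = -1;
          bsp_push_reg(&recv, sizeof(int));       /* make 'recv' remotely writable */
          bsp_sync();
          bsp_put((pid + 1) % p, &local, &recv, 0, sizeof(int)); /* communication */
          bsp_sync();                             /* barrier ends the superstep */
          printf("pid %d received %d\n", pid, recv);
          bsp_pop_reg(&recv);
          bsp_end();
          return 0;
      }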

  12. The (parallel) approximability of non-boolean satisfiability problems and restricted integer programming

    Science.gov (United States)

    Serna, Maria; Trevisan, Luca; Xhafa, Fatos

    We present parallel approximation algorithms for maximization problems expressible by integer linear programs of a restricted syntactic form introduced by Barland et al. [BKT96]. One of our motivations was to show whether the approximation results in the framework of Barland et al. hold in the parallel setting. Our results confirm this, and thus we have a new common framework for both computational settings. Also, we prove almost tight non-approximability results, thus solving a main open question of Barland et al. We obtain the results through the constraint satisfaction problem over multi-valued domains, for which we show non-approximability results and develop parallel approximation algorithms. Our parallel approximation algorithms are based on linear programming and randomized rounding; they are better than previously known sequential algorithms. The non-approximability results are based on recent progress in the fields of Probabilistically Checkable Proofs and Multi-Prover One-Round Proof Systems [Raz95, Hås97, AS97, RS97].
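
    LP-based randomized rounding, as used in this line of work, takes the following generic form (a standard textbook sketch, not the paper's exact procedure):

      \[ x^{*} = \arg\max \{\, c^{\top}x : Ax \le b,\ x \in [0,1]^{n} \,\}, \qquad \Pr[\hat{x}_i = 1] = x_i^{*}, \]
      \[ \mathbb{E}[c^{\top}\hat{x}] = c^{\top}x^{*} \ \ge\ \mathrm{OPT}, \]

    after which the analysis bounds how far the randomly rounded solution can drift from this expectation and from feasibility of the constraints.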

  13. From functional programming to multicore parallelism: A case study based on Presburger Arithmetic

    DEFF Research Database (Denmark)

    Dung, Phan Anh; Hansen, Michael Reichhardt

    2011-01-01

    The overall goal of this work is to study the parallelization of functional programs, with the specific case study of decision procedures for Presburger Arithmetic (PA). PA is a first-order theory of integers accepting addition as its only operation. Whereas it has wide applications in different areas, we are interested in using PA in connection with the Duration Calculus Model Checker (DCMC) [5]. There are effective decision procedures for PA, including Cooper's algorithm and the Omega Test; however, their complexity is extremely high, with a doubly exponential lower bound and a triply exponential upper bound. Presburger formulas can also be handled by the SMT-solver Z3 [8], which has the capability of solving Presburger formulas. Functional programming is well-suited for the domain of decision procedures, and its immutability feature helps to reduce the parallelization effort, while Haskell has progressed with a lot of parallelism-related research [6].

  14. Class Notes: Programming Parallel Algorithms CS 15-840B (Fall 1992)

    Science.gov (United States)

    1993-02-01

    Scanned excerpt from the course notes (Lecture #15, Thursday, 6 Nov 92; scribe: Bob Wheeler), covering connected components (continued) and minimum spanning trees, followed by a list of student projects (singular value decomposition, EEG analysis, speech recognition, matrix operations) and bibliography fragments, including L. W. Tucker, C. R. Feynman, and D. M. Fritzsche, "Object recognition using the Connection Machine," Proceedings CVPR.

  15. Dynamic programming in parallel boundary detection with application to ultrasound intima-media segmentation.

    Science.gov (United States)

    Zhou, Yuan; Cheng, Xinyao; Xu, Xiangyang; Song, Enmin

    2013-12-01

    Segmentation of carotid artery intima-media in longitudinal ultrasound images for measuring its thickness to predict cardiovascular diseases can be simplified as detecting two nearly parallel boundaries within a certain distance range, when plaque with irregular shapes is not considered. In this paper, we improve the implementation of two dynamic programming (DP) based approaches to parallel boundary detection, dual dynamic programming (DDP) and piecewise linear dual dynamic programming (PL-DDP). Then, a novel DP based approach, dual line detection (DLD), which translates the original 2-D curve position to a 4-D parameter space representing two line segments in a local image segment, is proposed to solve the problem while maintaining efficiency and rotation invariance. To apply the DLD to ultrasound intima-media segmentation, it is imbedded in a framework that employs an edge map obtained from multiplication of the responses of two edge detectors with different scales and a coupled snake model that simultaneously deforms the two contours for maintaining parallelism. The experimental results on synthetic images and carotid arteries of clinical ultrasound images indicate improved performance of the proposed DLD compared to DDP and PL-DDP, with respect to accuracy and efficiency.
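
    To give a flavour of dynamic-programming boundary detection (a much simplified single-boundary sketch in C, not the authors' DDP, PL-DDP, or DLD algorithms; backtracking of the optimal path is omitted):

      #include <stdio.h>
      #include <float.h>

      /* Track a minimum-cost boundary through an edge map, column by column,
         allowing the row to move by at most 1 between adjacent columns
         (smoothness constraint). Toy data and sizes for illustration. */
      #define ROWS 5
      #define COLS 6
      int main(void) {
          double edge[ROWS][COLS] = {0};   /* higher = stronger edge response */
          edge[2][0]=edge[2][1]=edge[3][2]=edge[3][3]=edge[2][4]=edge[2][5]=1.0;
          double cost[ROWS][COLS];
          for (int r = 0; r < ROWS; r++) cost[r][0] = -edge[r][0];
          for (int c = 1; c < COLS; c++)
              for (int r = 0; r < ROWS; r++) {
                  double best = DBL_MAX;
                  for (int d = -1; d <= 1; d++) {
                      int pr = r + d;
                      if (pr >= 0 && pr < ROWS && cost[pr][c-1] < best)
                          best = cost[pr][c-1];
                  }
                  cost[r][c] = best - edge[r][c];  /* reward strong edges */
              }
          int rbest = 0;
          for (int r = 1; r < ROWS; r++)
              if (cost[r][COLS-1] < cost[rbest][COLS-1]) rbest = r;
          printf("boundary ends at row %d, cost %.1f\n", rbest, cost[rbest][COLS-1]);
          return 0;
      }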

  16. Japan-Australia Co-operative Program on research and development of technology for the management of high level radioactive wastes: phase II (1990-1995)

    Energy Technology Data Exchange (ETDEWEB)

    Banba, Tsunetaka [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Hart, K.P. [eds.

    1996-05-01

    The major activities associated with the Japan-Australia Co-operative Program were the preparation, characterization and subsequent testing of both Cm-doped Synroc containing PW-4b simulated waste and Cm-doped single-phase zirconolite and perovskite, and the initiation of studies on naturally-occurring zirconolites to study the long-term durability of this mineral phase over geological time. The preparation of the Cm-doped samples was carried out in JAERI's WASTEF facility at Tokai, with technical information and assistance provided by ANSTO where necessary. The experiments were designed to induce accelerated radiation damage in Synroc samples that would correspond to periods of Synroc storage of up to 100,000 years. The results are of considerable importance in evaluating the potential of the Synroc process as a means of dealing with HLW waste streams and represent a significant contribution to the understanding of the ability of Synroc to immobilize HLW elements. Overall the Phase II Co-operative Program has continued the excellent co-operative working relationship between the staff at the two institutions, and provided a better understanding of the potential advantages and limitations of Synroc as a second generation waste form. The work has shown the need for additional studies to be carried out on the effect of the levels of Cm-doping on the Cm leach rate, extension of natural analogue studies to define the geological conditions under which zirconolite is stable, and development of models to provide long-term predictions of releases of HLW elements from Synroc under a range of repository conditions. It is strongly recommended that the program carried out in Phase II of the Co-operative Agreement be extended for a further three years to allow additional information on the above areas to be collected and reported in a document providing an overview of the Co-operative Program and recommendations on HLW management strategies. (J.P.N.).

  17. CLUSTEREASY:A Program for Simulating Scalar Field Evolution on Parallel Computers

    CERN Document Server

    Felder, Gary N

    2007-01-01

    We describe a new, parallel programming version of the scalar field simulation program LATTICEEASY. The new C++ program, CLUSTEREASY, can simulate arbitrary scalar field models on distributed-memory clusters. The speed and memory requirements scale well with the number of processors. As with the serial version of LATTICEEASY, CLUSTEREASY can run simulations in one, two, or three dimensions, with or without expansion of the universe, with customizable parameters and output. The program and its full documentation are available on the LATTICEEASY website at http://www.science.smith.edu/departments/Physics/fstaff/gfelder/latticeeasy/. In this paper we provide a brief overview of what CLUSTEREASY does and the ways in which it does and doesn't differ from the serial version of LATTICEEASY.

  18. CaKernel – A Parallel Application Programming Framework for Heterogenous Computing Architectures

    Directory of Open Access Journals (Sweden)

    Marek Blazewicz

    2011-01-01

    Full Text Available With the recent advent of new heterogeneous computing architectures there is still a lack of parallel problem-solving environments that can help scientists use hybrid supercomputers easily and efficiently. Many scientific simulations that use structured grids to solve partial differential equations in fact rely on stencil computations. Stencil computations have become crucial in solving many challenging problems in various domains, e.g., engineering or physics. Although many parallel stencil computing approaches have been proposed, in most cases they solve only particular problems. As a result, scientists struggle when it comes to implementing a new stencil-based simulation, especially on high-performance hybrid supercomputers. In response to this need, we extend our previous work on a parallel programming framework for CUDA, CaCUDA, which now supports OpenCL. We present CaKernel, a tool that simplifies the development of parallel scientific applications on hybrid systems. CaKernel is built on the highly scalable and portable Cactus framework. In the CaKernel framework, Cactus manages the inter-process communication via MPI, while CaKernel manages the code running on Graphics Processing Units (GPUs) and the interactions between them. As a non-trivial test case, we have developed a 3D CFD code to demonstrate the performance and scalability of the automatically generated code.
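
    A stencil computation, the pattern CaKernel generates and distributes, has the following shape in its simplest form (a generic 5-point Jacobi sweep in C with OpenMP, purely illustrative and not CaKernel-generated code):

      #include <stdio.h>

      /* One 5-point stencil sweep on a 2-D grid: each interior point of v is
         the average of its four neighbours in u. Frameworks like CaKernel
         generate, tile, and distribute exactly this kind of loop nest. */
      #define NX 8
      #define NY 8
      int main(void) {
          static double u[NX][NY], v[NX][NY];
          u[NX/2][NY/2] = 1.0;                   /* toy initial condition */
          #pragma omp parallel for
          for (int i = 1; i < NX-1; i++)
              for (int j = 1; j < NY-1; j++)
                  v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);
          printf("v next to the center = %.3f\n", v[NX/2+1][NY/2]); /* 0.250 */
          return 0;
      }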

  19. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    Energy Technology Data Exchange (ETDEWEB)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs.

  20. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is a part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the Force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the Force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  1. Review and critique of the US Department of Energy environmental program plan for site characterization for a high-level waste repository at Yucca Mountain, Nevada

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1992-12-31

    This report provides a review and critique of the US Department of Energy (DOE) environmental program plan for site characterization activities at Yucca Mountain which principally addresses compliance with federal and state environmental regulation and to a lesser extent monitoring and mitigation of significant adverse impacts and reclamation of disturbed areas. There are 15 documents which comprise the plan and focus on complying with the environmental requirements of the Nuclear Waste Policy Act, as amended, (NWPA) and with single-media environmental statutes and their regulations. All elements of the plan follow from the 1986 statutory environmental assessment (EA) required by NWPA which concluded that no significant adverse impacts would result from characterization of the Yucca Mountain site. The lack of appropriate environmental planning and review for site characterization at Yucca Mountain points to the need for an oversight function by the State of Nevada. It cannot be assumed that on its own DOE will properly comply with environmental requirements, especially the substantive requirements that comprise the intent of NEPA. Thus, procedures must be established to assure that the environmental interests of the State are addressed in the course of the Yucca Mountain Project. Accordingly, steps will be taken by the State of Nevada to review the soundness and efficacy of the DOE field surveys, monitoring and mitigation activities, reclamation actions, and ecological impact studies that follow from the DOE environmental program plans addressed by this review.

  2. Performance Evaluation of Parallel Message Passing and Thread Programming Model on Multicore Architectures

    CERN Document Server

    Hasta, D T

    2010-01-01

    The current trend toward multicore architectures on shared-memory systems underscores the need for parallelism. While there are several programming models for expressing parallelism, the thread programming model has become a standard for such systems, e.g., OpenMP and POSIX threads. MPI (Message Passing Interface), which remains the dominant model used in high-performance computing today, faces this challenge. The previous version of MPI, MPI-1, has no shared-memory concept, and the current version, MPI-2, has only limited support for shared-memory systems. In this research, MPI-2 is compared with OpenMP to see how well MPI performs on multicore/SMP (symmetric multiprocessor) machines. The comparison between OpenMP, representing the thread programming model, and MPI, representing the message-passing programming model, is conducted on multicore shared-memory machine architectures to see which yields better performance in terms of speed and throughput. An application is used to assess the scalability of the evaluated parallel programming models.
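
    The two models under comparison can be seen side by side in a single hybrid C program (a minimal illustrative pair, not the study's benchmark): OpenMP threads reduce within a process, while MPI reduces across processes.

      #include <mpi.h>
      #include <stdio.h>

      /* The same sum computed two ways: OpenMP threads inside each process
         (shared memory) and an MPI reduction across processes (message passing). */
      int main(int argc, char **argv) {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          double local = 0.0;
          #pragma omp parallel for reduction(+:local)   /* thread model */
          for (int i = 0; i < 1000000; i++)
              local += 1e-6;
          double global;
          MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                     MPI_COMM_WORLD);                   /* message-passing model */
          if (rank == 0) printf("global sum = %.3f\n", global);
          MPI_Finalize();
          return 0;
      }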

  3. The neural basis of parallel saccade programming: an fMRI study.

    Science.gov (United States)

    Hu, Yanbo; Walker, Robin

    2011-11-01

    The neural basis of parallel saccade programming was examined in an event-related fMRI study using a variation of the double-step saccade paradigm. Two double-step conditions were used: one enabled the second saccade to be partially programmed in parallel with the first saccade while in a second condition both saccades had to be prepared serially. The intersaccadic interval, observed in the parallel programming (PP) condition, was significantly reduced compared with latency in the serial programming (SP) condition and also to the latency of single saccades in control conditions. The fMRI analysis revealed greater activity (BOLD response) in the frontal and parietal eye fields for the PP condition compared with the SP double-step condition and when compared with the single-saccade control conditions. By contrast, activity in the supplementary eye fields was greater for the double-step condition than the single-step condition but did not distinguish between the PP and SP requirements. The role of the frontal eye fields in PP may be related to the advanced temporal preparation and increased salience of the second saccade goal that may mediate activity in other downstream structures, such as the superior colliculus. The parietal lobes may be involved in the preparation for spatial remapping, which is required in double-step conditions. The supplementary eye fields appear to have a more general role in planning saccade sequences that may be related to error monitoring and the control over the execution of the correct sequence of responses.

  4. The ARES High-level Intermediate Representation

    Energy Technology Data Exchange (ETDEWEB)

    Moss, Nicholas David [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-03-03

    The LLVM intermediate representation (IR) lacks semantic constructs for depicting common high-performance operations such as parallel and concurrent execution, communication, and synchronization. Currently, representing such semantics in LLVM requires either extending the intermediate form (a significant undertaking) or the use of ad hoc indirect means such as encoding them as intrinsics and/or the use of metadata constructs. In this paper we discuss work in progress to explore the design and implementation of a new compilation stage and associated high-level intermediate form that is placed between the abstract syntax tree and the point at which it is lowered to LLVM's IR. This high-level representation is a superset of LLVM IR and supports the direct representation of these common parallel computing constructs, along with the infrastructure for supporting analysis and transformation passes on this representation.

  5. Facing competition: Neural mechanisms underlying parallel programming of antisaccades and prosaccades.

    Science.gov (United States)

    Talanow, Tobias; Kasparbauer, Anna-Maria; Steffens, Maria; Meyhöfer, Inga; Weber, Bernd; Smyrnis, Nikolaos; Ettinger, Ulrich

    2016-08-01

    The antisaccade task is a prominent tool to investigate the response inhibition component of cognitive control. Recent theoretical accounts explain performance in terms of parallel programming of exogenous and endogenous saccades, linked to the horse-race metaphor. Previous studies have tested the hypothesis of competing saccade signals at the behavioral level by selectively slowing the programming of endogenous or exogenous processes, e.g., by manipulating the probability of antisaccades in an experimental block. To gain a better understanding of inhibitory control processes in parallel saccade programming, we analyzed task-related eye movements and blood oxygenation level dependent (BOLD) responses obtained using functional magnetic resonance imaging (fMRI) at 3T from 16 healthy participants in a mixed antisaccade and prosaccade task. The frequency of antisaccade trials was manipulated across blocks of high (75%) and low (25%) antisaccade frequency. In blocks with high antisaccade frequency, antisaccade latencies were shorter and error rates lower, whilst prosaccade latencies were longer and error rates higher. At the level of BOLD, activations in the task-related saccade network (left inferior parietal lobe, right inferior parietal sulcus, left precentral gyrus reaching into left middle frontal gyrus and inferior frontal junction) and deactivations in components of the default mode network (bilateral temporal cortex, ventromedial prefrontal cortex) compensated for increased cognitive control demands. These findings illustrate context-dependent mechanisms underlying the coordination of competing decision signals in volitional gaze control.

  6. Eighth SIAM conference on parallel processing for scientific computing: Final program and abstracts

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-12-31

    This SIAM conference is the premier forum for developments in parallel numerical algorithms, a field that has seen very lively and fruitful developments over the past decade, and whose health is still robust. Themes for this conference were: combinatorial optimization; data-parallel languages; large-scale parallel applications; message-passing; molecular modeling; parallel I/O; parallel libraries; parallel software tools; parallel compilers; particle simulations; problem-solving environments; and sparse matrix computations.

  7. Program Suite for Conceptual Designing of Parallel Mechanism-Based Robots and Machine Tools

    Directory of Open Access Journals (Sweden)

    Slobodan Tabaković

    2013-06-01

    This paper describes the categorization of criteria for the conceptual design of parallel mechanism-based robots or machine tools, resulting from workspace analysis, as well as the procedure for defining them. Furthermore, it also presents the design methodology that was implemented in the program for the creation of a robot or machine tool space model and the optimization of the resulting solution. For verification of the criteria and the program suite, three common, conceptually different mechanisms with similar mechanical structures and kinematic characteristics were used.

  8. Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

    Science.gov (United States)

    Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

    2014-09-09

    A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
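
    The switch between distribution schemes hinges on index bookkeeping of the following kind: mapping a global matrix index to its owning process and local position under a 1-D block-cyclic distribution (generic formulas as used in parallel linear algebra, not code from BERTHA):

      #include <stdio.h>

      /* 1-D block-cyclic ownership: global index g with block size NB over P
         processes. Blocks are dealt round-robin; each process stores its
         blocks contiguously. */
      void block_cyclic(int g, int NB, int P, int *owner, int *local) {
          int blk = g / NB;                 /* which block the index falls in */
          *owner = blk % P;                 /* round-robin block assignment   */
          *local = (blk / P) * NB + g % NB; /* position in the owner's storage */
      }

      int main(void) {
          int owner, local;
          for (int g = 0; g < 8; g++) {
              block_cyclic(g, 2, 3, &owner, &local);
              printf("global %d -> proc %d, local %d\n", g, owner, local);
          }
          return 0;
      }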

  9. A comparison of distributed memory and virtual shared memory parallel programming models

    Energy Technology Data Exchange (ETDEWEB)

    Keane, J.A. [Univ. of Manchester (United Kingdom). Dept. of Computer Science; Grant, A.J. [Univ. of Manchester (United Kingdom). Computer Graphics Unit; Xu, M.Q. [Argonne National Lab., IL (United States)

    1993-04-01

    The virtues of the different parallel programming models, shared memory and distributed memory, have been much debated. Conventionally the debate could be reduced to programming convenience on the one hand, and high scalability factors on the other. More recently the debate has become somewhat blurred with the provision of virtual shared memory models built on machines with physically distributed memory. The intention of such models/machines is to provide scalable shared memory, i.e., to provide both programmer convenience and high scalability. In this paper, the different models are considered from experiences gained with a number of systems ranging from applications in both commerce and science to languages and operating systems. Case studies are introduced as appropriate.

  10. High-Level Radioactive Waste.

    Science.gov (United States)

    Hayden, Howard C.

    1995-01-01

    Presents a method to calculate the amount of high-level radioactive waste by taking into consideration the following factors: the fission process that yields the waste, identification of the waste, the energy required to run a 1-GWe plant for one year, and the uranium mass required to produce that energy. Briefly discusses waste disposal and…

  11. High-level Petri Nets

    DEFF Research Database (Denmark)

    This book presents a selection of some of the most important papers on the application and theory of high-level Petri nets. In this way it makes the relevant literature more available. It is our hope that the book will be a useful source of information and that, e.g., it can be used in the organization of Petri net courses. In low-level nets there is only one kind of token, and this means that the state of a place is described by an integer (and in many cases even by a boolean). In high-level nets each token can carry complex information/data, which, e.g., may describe the entire state of a process or a data base. Today most practical applications of Petri nets use high-level nets, and some early papers have been superseded by other papers. Thus, e.g., none of the original papers introducing the first versions of high-level Petri nets have been included. The introductions to the individual sections mention a number of researchers who have contributed to the development of high-level Petri nets, with detailed references.

  12. GRADE: a graphical programming environment for multicomputers

    OpenAIRE

    Kacsuk, P; G. Dózsa; T. Fadgyas; R. Lovas

    2012-01-01

    To provide high-level graphical support for developing message passing programs, an integrated programming environment (GRADE) is being developed. GRADE currently provides tools to construct, execute, debug, monitor and visualize message-passing based parallel programs. GRADE offers the programmer an integrated graphical user interface during the whole life-cycle of program development and provides high-level graphical programming abstraction mechanisms to construct parallel applications.

  13. High level binocular rivalry effects

    Directory of Open Access Journals (Sweden)

    Michal eWolf

    2011-12-01

    Full Text Available Binocular rivalry (BR) occurs when the brain cannot fuse percepts from the two eyes because they are different. We review results relating to an ongoing controversy regarding the cortical site of the BR mechanism. Some BR qualities suggest it is low-level: (1) BR, as its name implies, is usually between eyes, and only low levels have access to utrocular information. (2) All input to one eye is suppressed: blurring doesn't stimulate accommodation; pupillary constrictions are reduced; probe detection is reduced. (3) Rivalry is affected by low-level attributes: contrast, spatial frequency, brightness, motion. (4) There is limited priming due to suppressed words or pictures. On the other hand, recent studies favor a high-level mechanism: (1) Rivalry occurs between patterns, not eyes, as in patchwork rivalry or a swapping paradigm. (2) Attention affects alternations. (3) Context affects dominance. There is conflicting evidence from physiological studies (single cell and fMRI) regarding the cortical level(s) of conscious perception. We discuss the possibility of multiple BR sites and theoretical considerations that rule out this solution. We present new data regarding the locus of the BR switch by manipulating stimulus semantic content or high-level characteristics. Since these variations are represented at higher cortical levels, their affecting rivalry supports high-level BR intervention. In Experiment I, we measure rivalry when one eye views words and the other nonwords and find significantly longer dominance durations for nonwords. In Experiment II, we find longer dominance times for line drawings of simple, structurally impossible figures than for similar, possible objects. In Experiment III, we test the influence of idiomatic context on rivalry between words. Results show that generally words within their idiomatic context have longer mean dominance durations. We conclude that binocular rivalry has high-level cortical influences and may be controlled by a high-level mechanism.

  14. What is "the patient perspective" in patient engagement programs? Implicit logics and parallels to feminist theories.

    Science.gov (United States)

    Rowland, Paula; McMillan, Sarah; McGillicuddy, Patti; Richards, Joy

    2017-01-01

    Public and patient involvement (PPI) in health care may refer to many different processes, ranging from participating in decision-making about one's own care to participating in health services research, health policy development, or organizational reforms. Across these many forms of public and patient involvement, the conceptual and theoretical underpinnings remain poorly articulated. Instead, most public and patient involvement programs rely on policy initiatives as their conceptual frameworks. This lack of conceptual clarity participates in dilemmas of program design, implementation, and evaluation. This study contributes to the development of theoretical understandings of public and patient involvement. In particular, we focus on the deployment of patient engagement programs within health service organizations. To develop a deeper understanding of the conceptual underpinnings of these programs, we examined the concept of "the patient perspective" as used by patient engagement practitioners and participants. Specifically, we focused on the way this phrase was used in the singular: "the" patient perspective or "the" patient voice. From qualitative analysis of interviews with 20 patient advisers and 6 staff members within a large urban health network in Canada, we argue that "the patient perspective" is referred to as a particular kind of situated knowledge, specifically an embodied knowledge of vulnerability. We draw parallels between this logic of patient perspective and the logic of early feminist theory, including the concepts of standpoint theory and strong objectivity. We suggest that champions of patient engagement may learn much from the way feminist theorists have constructed their arguments and addressed critique.

  15. The ALICE high level trigger

    Science.gov (United States)

    Alt, T.; Grastveit, G.; Helstrup, H.; Lindenstruth, V.; Loizides, C.; Röhrich, D.; Skaali, B.; Steinbeck, T.; Stock, R.; Tilsner, H.; Ullaland, K.; Vestbø, A.; Vik, T.; Wiebalck, A.; the ALICE Collaboration

    2004-08-01

    The ALICE experiment at the LHC will implement a high-level trigger system for online event selection and/or data compression. The largest computing challenge is posed by the TPC detector, which requires real-time pattern recognition. The system entails a very large processing farm that is designed for an anticipated input data stream of 25 GB/s. In this paper, we present the architecture of the system and the current state of the tracking methods and data compression applications.

  16. DOE SBIR Phase-1 Report on Hybrid CPU-GPU Parallel Development of the Eulerian-Lagrangian Barracuda Multiphase Program

    Energy Technology Data Exchange (ETDEWEB)

    Dr. Dale M. Snider

    2011-02-28

    This report gives the results from the Phase-1 work on demonstrating greater than 10x speedup of the Barracuda computer program using parallel methods and GPUs (general-purpose graphics processing units). Phase-1 demonstrated a 12x speedup on a typical Barracuda function using the GPU processor. The problem test case used about 5 million particles and 250,000 Eulerian grid cells. The relative speedup, compared to a single CPU, increases with the number of particles, giving greater than 12x speedup. Phase-1 work provided a path for data structure modifications that give good parallel performance while keeping a friendly environment for new physics development and code maintenance. The implementation of the data structure changes will be in Phase-2. Phase-1 laid the groundwork for the complete parallelization of Barracuda in Phase-2, with the caveat that the parallel programming practices implemented in Phase-1 give immediate speedup in the current serial Barracuda code. The Phase-1 tasks were completed successfully, laying the framework for Phase-2. The detailed results of Phase-1 are within this document. In general, the speedup of one function would be expected to be higher than the speedup of the entire code because of I/O functions and communication between the algorithms. However, because one of the most difficult Barracuda algorithms was parallelized in Phase-1, and because advanced parallelization methods and proposed parallelization optimization techniques identified in Phase-1 will be used in Phase-2, an overall Barracuda code speedup (relative to a single CPU) is expected to be greater than 10x. This means that a job which takes 30 days to complete will be done in 3 days. Tasks completed in Phase-1 are: Task 1: Profile the entire Barracuda code and select which subroutines are to be parallelized (see Section "Choosing a Function to Accelerate"); Task 2: Select a GPU consultant company and

  17. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    Directory of Open Access Journals (Sweden)

    Matthew O'Keefe

    1995-01-01

    Full Text Available Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
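
    To make the self-similar property concrete, a language-neutral sketch (ours, not the Fortran-P tools): the same kernel runs unchanged on the whole domain or on any subdomain, so decomposition for an MPP does not alter the algorithm.

      import numpy as np

      def relax(u):
          """One smoothing sweep; identical on the global grid or a subdomain."""
          v = u.copy()
          v[1:-1] = 0.5 * (u[:-2] + u[2:])
          return v

      grid = np.linspace(0.0, 1.0, 16)
      subdomains = np.split(grid, 2)               # decompose the global domain
      subdomains = [relax(s) for s in subdomains]  # same kernel per subdomain
      # (halo/boundary exchange between subdomains is omitted for brevity)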

  18. A Review on Large Scale Graph Processing Using Big Data Based Parallel Programming Models

    Directory of Open Access Journals (Sweden)

    Anuraj Mohan

    2017-02-01

    Full Text Available Processing big graphs has become an increasingly essential activity in various fields like engineering, business intelligence and computer science. Social networks and search engines usually generate large graphs, which demand sophisticated techniques for social network analysis and web structure mining. Recent trends in graph processing tend towards using Big Data platforms for parallel graph analytics. MapReduce has emerged as a Big Data based programming model for processing massively large datasets. Apache Giraph, an open-source implementation of Google Pregel based on the Bulk Synchronous Parallel (BSP) model, is used for graph analytics in social networks like Facebook. The proposed work investigates the algorithmic effects of the MapReduce and BSP models on graph problems. The triangle counting problem in graphs is taken as a benchmark, and evaluations are made on the basis of computation time on the same cluster, scalability in relation to graph and cluster size, resource utilization, and the structure of the graph.
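
    For concreteness, a sequential Python sketch (ours) of the triangle counting benchmark; the per-vertex check is the natural unit of parallel work in both the MapReduce and BSP formulations:

      from itertools import combinations

      def count_triangles(edges):
          """Count triangles: for each vertex, test which pairs of its
          neighbours are themselves connected."""
          adj = {}
          for u, v in edges:
              adj.setdefault(u, set()).add(v)
              adj.setdefault(v, set()).add(u)
          total = 0
          for neigh in adj.values():
              for v, w in combinations(neigh, 2):
                  if w in adj.get(v, set()):
                      total += 1
          return total // 3  # each triangle is counted once per vertex

      print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))  # 1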

  19. Mobile and replicated alignment of arrays in data-parallel programs

    Science.gov (United States)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert

    1993-01-01

    When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.

  20. Distributed Memory Programming on Many-Cores

    DEFF Research Database (Denmark)

    Berthold, Jost; Dieterle, Mischa; Lobachev, Oleg;

    2009-01-01

    Eden is a parallel extension of the lazy functional language Haskell providing dynamic process creation and automatic data exchange. As a Haskell extension, Eden takes a high-level approach to parallel programming and thereby simplifies parallel program development. The current implementation is ...

  1. Research on Task Parallel Programming Model

    Institute of Scientific and Technical Information of China (English)

    王蕾; 崔慧敏; 陈莉; 冯晓兵

    2013-01-01

    The task parallel programming model is a widely used parallel programming model on multi-core platforms. Intended to simplify parallel programming and improve the utilization of multiple cores, this paper introduces the essential programming interfaces and supporting mechanisms used in task parallel programming models, and discusses open issues and the latest achievements from three perspectives: parallelism expression, data management and task scheduling. Finally, future research directions in this area are discussed.
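
    As a minimal illustration of the spawn/join interface such models expose (a generic Python sketch, not any particular model from the survey):

      from concurrent.futures import ProcessPoolExecutor

      def work(i):
          return i * i

      if __name__ == "__main__":
          with ProcessPoolExecutor(max_workers=4) as pool:
              tasks = [pool.submit(work, i) for i in range(8)]  # spawn tasks
              print([t.result() for t in tasks])                # join tasks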

  2. An approach to multicore parallelism using functional programming: A case study based on Presburger Arithmetic

    DEFF Research Database (Denmark)

    Dung, Phan Anh; Hansen, Michael Reichhardt

    2015-01-01

    platform executing on an 8-core machine. A speedup of approximately 4 was obtained for Cooper's algorithm and a speedup of approximately 6 was obtained for the exact-shadow part of the Omega Test. The considered procedures are complex, memory-intense algorithms on huge formula trees, and the case study reveals more generally applicable techniques and guidelines for deriving parallel algorithms from sequential ones in the context of data-intensive tree algorithms. The obtained insights should apply to any strict and impure functional programming language. Furthermore, the results obtained for the exact-shadow elimination procedure have wider applicability because they can be transferred directly to the Fourier–Motzkin elimination method.

  3. Managing Communication Latency-Hiding at Runtime for Parallel Programming Languages and Libraries

    CERN Document Server

    Kristensen, Mads Ruben Burgdorff

    2012-01-01

    This work introduces a runtime model for managing communication with support for latency-hiding. The model enables non-computer science researchers to exploit communication latency-hiding techniques seamlessly. For compiled languages, it is often possible to create efficient schedules for communication, but this is not the case for interpreted languages. By maintaining data dependencies between scheduled operations, it is possible to initiate communication aggressively and evaluate tasks lazily, allowing maximal time for the communication to finish before entering a wait state. We implement a heuristic of this model in DistNumPy, an auto-parallelizing version of numerical Python that allows sequential NumPy programs to run on distributed memory architectures. Furthermore, we present performance comparisons for eight benchmarks with and without automatic latency-hiding. The results show that our model reduces the time spent waiting for communication by as much as 27 times, from a maximum of 54% to only 2% of t...
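
    The underlying idea - start communication early, compute while it is in flight, and wait only when the data is needed - can be sketched with non-blocking MPI calls in mpi4py (our illustration, not DistNumPy code; assumes the job runs with exactly two ranks):

      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()
      peer = 1 - rank  # assumes exactly two ranks

      send_buf = np.full(1000, rank, dtype="d")
      recv_buf = np.empty(1000, dtype="d")

      requests = [comm.Isend(send_buf, dest=peer),
                  comm.Irecv(recv_buf, source=peer)]

      local = send_buf * 2.0         # independent work overlaps the transfer

      MPI.Request.Waitall(requests)  # block only when recv_buf is needed
      print(rank, local[0], recv_buf[0])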

  4. Parallel conjugate gradient: effects of ordering strategies, programming paradigms, and architectural platforms

    Energy Technology Data Exchange (ETDEWEB)

    Oliker, L.; Li, X.; Heber, G.; Biswas, R.

    2000-05-01

    The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique for solving sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectural platforms. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution. However, a multithreaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
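
    For reference, a minimal compressed sparse row (CSR) matrix-vector multiply, the kernel that dominates CG iterations (our sketch; row and column order determine the memory locality that the ordering strategies above exploit):

      import numpy as np

      def spmv(row_ptr, col_idx, values, x):
          y = np.zeros(len(row_ptr) - 1)
          for i in range(len(y)):
              lo, hi = row_ptr[i], row_ptr[i + 1]
              y[i] = values[lo:hi] @ x[col_idx[lo:hi]]
          return y

      # 2x2 example: [[4, 1], [1, 3]] @ [1, 2] = [6, 7]
      print(spmv(np.array([0, 2, 4]),
                 np.array([0, 1, 0, 1]),
                 np.array([4.0, 1.0, 1.0, 3.0]),
                 np.array([1.0, 2.0])))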

  5. Structured Parallel Programming: How Informatics Can Help Overcome the Software Dilemma

    Directory of Open Access Journals (Sweden)

    Helmar Burkhart

    1996-01-01

    Full Text Available The state-of-the-art programming of parallel computers is far from being successful. The main challenge today is, therefore, the development of techniques and tools that improve programmers' productivity. Programmability, portability, and reusability are key issues to be solved. In this article we shall report about our ongoing efforts in this direction. After a short discussion of the software dilemma found today, we shall present the Basel approach. We shall summarize our algorithm description methodology and discuss the basic concepts of the proposed skeleton language. An algorithmic example and comments on implementation aspects will explain our work in more detail. We shall summarize the current state of the implementation and conclude with a discussion of related work.

  6. Adaptive Representations for Improving Evolvability, Parameter Control, and Parallelization of Gene Expression Programming

    Directory of Open Access Journals (Sweden)

    Nigel P. A. Browne

    2010-01-01

    Full Text Available Gene Expression Programming (GEP) is a genetic algorithm that evolves linear chromosomes encoding nonlinear (tree-like) structures. In the original GEP algorithm, the genome size is problem specific and is determined through trial and error. In this work, a method for adaptive control of the genome size is presented. The approach introduces mutation, transposition, and recombination operators that enable a population of heterogeneously structured chromosomes, something the original GEP algorithm does not support. This permits crossbreeding between normally incompatible individuals and speciation within a population, increases the evolvability of the representations, and enhances parallel GEP. To test our approach, an assortment of problems was used, including symbolic regression, classification, and parameter optimization. Our experimental results show that our approach provides a solution for the problem of self-adaptive control of the genome size of GEP's representation.

  7. The siting record: An account of the programs of federal agencies and events that have led to the selection of a potential site for a geologic respository for high-level radioactive waste

    Energy Technology Data Exchange (ETDEWEB)

    Lomenick, T.F.

    1996-03-01

    This record of siting a geologic repository for high-level radioactive wastes (HLW) and spent fuel describes the many investigations that culminated on December 22, 1987 in the designation of Yucca Mountain (YM) as the site to undergo detailed geologic characterization. It recounts the important issues and events that have been instrumental in shaping the course of siting over the last three and one half decades. In this long task, which was initiated in 1954, more than 60 regions, areas, or sites involving nine different rock types have been investigated. This effort became sharply focused in 1983 with the identification of nine potentially suitable sites for the first repository. From these nine sites, five were subsequently nominated by the U.S. Department of Energy (DOE) as suitable for characterization and then, in 1986, as required by the Nuclear Waste Policy Act of 1982 (NWPA), three of these five were recommended to the President as candidates for site characterization. President Reagan approved the recommendation on May 28, 1986. DOE was preparing site characterization plans for the three candidate sites, namely Deaf Smith County, Texas; Hanford Site, Washington; and YM, when, as a consequence of the 1987 Amendment to the NWPA, only the last of these was authorized to undergo detailed characterization. A final Site Characterization Plan for Yucca Mountain was published in 1988. Prior to 1954, there was no program for the siting of disposal facilities for HLW. In the 1940s and 1950s, the volume of waste, which was small and which resulted entirely from military weapons and research programs, was stored as liquid in large steel tanks buried at geographically remote government installations, principally in Washington and Tennessee.

  8. The ALICE high level trigger

    Energy Technology Data Exchange (ETDEWEB)

    Alt, T [Kirchhoff Institute for Physics, University of Heidelberg (Germany); Grastveit, G [Department of Physics and Technology, University of Bergen (Norway); Helstrup, H [Faculty of Engineering, Bergen University College (Norway); Lindenstruth, V [Kirchhoff Institute for Physics, University of Heidelberg (Germany); Loizides, C [Institute for Nuclear Physics, University of Frankfurt (Germany); Roehrich, D [Department of Physics and Technology, University of Bergen (Norway); Skaali, B [Department of Physics, University of Oslo (Norway); Steinbeck, T [Kirchhoff Institute for Physics, University of Heidelberg (Germany); Stock, R [Institute for Nuclear Physics, University of Frankfurt (Germany); Tilsner, H [Kirchhoff Institute for Physics, University of Heidelberg (Germany); Ullaland, K [Department of Physics and Technology, University of Bergen (Norway); Vestboe, A [Faculty of Engineering, Bergen University College (Norway); Vik, T [Department of Physics, University of Oslo (Norway); Wiebalck, A [Kirchhoff Institute for Physics, University of Heidelberg (Germany)

    2004-08-01

    The ALICE experiment at LHC will implement a high-level trigger system for online event selection and/or data compression. The largest computing challenge is posed by the TPC detector, which requires real-time pattern recognition. The system entails a very large processing farm that is designed for an anticipated input data stream of 25 GB/s. In this paper, we present the architecture of the system and the current state of the tracking methods and data compression applications.

  9. The CMS High Level Trigger

    CERN Document Server

    Adam, W; Deldicque, C; Ero, J; Frühwirth, R; Jeitler, Manfred; Kastner, K; Köstner, S; Neumeister, N; Porth, M; Padrta P; Rohringer, H; Sakulinb, H; Strauss, J; Taurok, A; Walzel, G; Wulz, C E; Lowette, S; Van De Vyver, B; De Lentdecker, G; Vanlaer, P; Delaere, C; Lemaître, V; Ninane, A; van der Aa, O; Damgov, J; Karimäki, V; Kinnunen, R; Lampen, T; Lassila-Perini, K M; Lehti, S; Nysten, J; Tuominiemi, J; Busson, P; Todorov, T; Schwering, G; Gras, P; Daskalakis, G; Sfyrla, A; Barone, M; Geralis, T; Markou, C; Zachariadou, K; Hidas, P; Banerjee, S; Mazumdara, K; Abbrescia, M; Colaleoa, A; D'Amato, N; De Filippis, N; Giordano, D; Loddo, F; Maggi, M; Silvestris, L; Zito, G; Arcelli, S; Bonacorsi, D; Capiluppi, P; Dallavalle, G M; Fanfani, A; Grandi, C; Marcellini, S; Montanari, A; Odorici, F; Travaglini, R; Costa, S; Tricomi, A; Ciulli, a V; Magini, N; Ranieri, R; Berti, L; Biasotto, M; Gulminia, M; Maron, G; Toniolo, N; Zangrando, L; Bellato, M; Gasparini, U; Lacaprara, S; Parenti, A; Ronchese, P; Vanini, S; Zotto, S; Ventura P L; Perugia; Benedetti, D; Biasini, M; Fano, L; Servoli, L; Bagliesi, a G; Boccali, T; Dutta, S; Gennai, S; Giassi, A; Palla, F; Segneri, G; Starodumov, A; Tenchini, R; Meridiani, P; Organtini, G; Amapane, a N; Bertolino, F; Cirio, R; Kim, J Y; Lim, I T; Pac, Y; Joo, K; Kim, S B; Suwon; Choi, Y I; Yu, I T; Cho, K; Chung, J; Ham, S W; Kim, D H; Kim, G N; Kim, W; CKim, J; Oh, S K; Park, H; Ro, S R; Son, D C; Suh, J S; Aftab, Z; Hoorani, H; Osmana, A; Bunkowski, K; Cwiok, M; Dominik, Wojciech; Doroba, K; Kazana, M; Królikowski, J; Kudla, I; Pietrusinski, M; Pozniak, Krzysztof T; Zabolotny, W M; Zalipska, J; Zych, P; Goscilo, L; Górski, M; Wrochna, G; Zalewski, P; Alemany-Fernandez, R; Almeida, C; Almeida, N; Da Silva, J C; Santos, M; Teixeira, I; Teixeira, J P; Varelaa, J; Vaz-Cardoso, N; Konoplyanikov, V F; Urkinbaev, A R; Toropin, A; Gavrilov, V; Kolosov, V; Krokhotin, A; Oulianov, A; Stepanov, N; Kodolova, O L; Vardanyan, I; Ilic, J; Skoro, G P; Albajar, C; De Troconiz, J F; Calderón, A; López-Virto, M A; Marco, R; Martínez-Rivero, C; Matorras, F; Vila, I; Cucciarelli, S; Konecki, M; Ashby, S; Barney, D; Bartalini, P; Benetta, R; Brigljevic, V; Bruno, G; Cano, E; Cittolin, S; Della Negra, M; de Roeck, A; Favre, P; Frey, A; Funk, W; Futyan, D; Gigi, D; Glege, F; Gutleber, J; Hansen, M; Innocente, V; Jacobs, C; Jank, W; Kozlovszky, Miklos; Larsen, H; Lenzi, M; Magrans, I; Mannelli, M; Meijers, F; Meschi, E; Mirabito, L; Murray, S J; Oh, A; Orsini, L; Palomares-Espiga, C; Pollet, L; Rácz, A; Reynaud, S; Samyn, D; Scharff-Hansen, P; Schwick, C; Sguazzoni, G; Sinanis, N; Sphicas, P; Spiropulu, M; Strandlie, A; Taylor, B G; Van Vulpen, I; Wellisch, J P; Winkler, M; Villigen; Kotlinski, D; Zurich; Prokofiev, K; Speer, T; Dumanoglu, I; Bristol; Bailey, S; Brooke, J J; Cussans, D; Heath, G P; Machin, D; Nash, S J; Newbold, D; Didcot; Coughlan, A; Halsall, R; Haynes, W J; Tomalin, I R; Marinelli, N; Nikitenko, A; Rutherford, S; Seeza, C; Sharif, O; Antchev, G; Hazen, E; Rohlf, J; Wu, S; Breedon, R; Cox, P T; Murray, P; Tripathi, M; Cousins, R; Erhan, S; Hauser, J; Kreuzer, P; Lindgren, M; Mumford, J; Schlein, P E; Shi, Y; Tannenbaum, B; Valuev, V; Von der Mey, M; Andreevaa, I; Clare, R; Villa, S; Bhattacharya, S; Branson, J G; Fisk, I; Letts, J; Mojaver, M; Paar, H P; Trepagnier, E; Litvine, V; Shevchenko, S; Singh, S; Wilkinson, R; Aziz, S; Bowden, M; Elias, J E; Graham, G; Green, D; Litmaath, M; Los, S; O'Dell, V; Ratnikova, N; Suzuki, I; Wenzel, H; Acosta, D; Bourilkov, 
D; Korytov, A; Madorsky, A; Mitselmakher, G; Rodríguez, J L; Scurlock, B; Abdullin, S; Baden, D; Eno, S; Grassi, T; Kunori, S; Pavlon, S; Sumorok, K; Tether, S; Cremaldi, L M; Sanders, D; Summers, D; Osborne, I; Taylor, L; Tuura, L; Fisher,W C; Mans6, J; Stickland, D P; Tully, C; Wildish, T; Wynhoff, S; Padley, B P; Chumney, P; Dasu, S; Smith, W H; CMS Trigger Data Acquisition Group

    2006-01-01

    At the Large Hadron Collider at CERN the proton bunches cross at a rate of 40 MHz. At the Compact Muon Solenoid experiment the original collision rate is reduced by a factor of O(1000) using a Level-1 hardware trigger. A subsequent factor of O(1000) data reduction is obtained by a software-implemented High Level Trigger (HLT) selection that is executed on a multi-processor farm. In this review we present in detail prototype CMS HLT physics selection algorithms, expected trigger rates and trigger performance in terms of both physics efficiency and timing.

  10. Study on Parallel Computing

    Institute of Scientific and Technical Information of China (English)

    Guo-Liang Chen; Guang-Zhong Sun; Yun-Quan Zhang; Ze-Yao Mo

    2006-01-01

    In this paper, we present a general survey of parallel computing. The main contents include the parallel computer system, which is the hardware platform of parallel computing; parallel algorithms, which are its theoretical basis; and parallel programming, which is its software support. After that, we also introduce some parallel applications and enabling technologies. We argue that parallel computing research should form an integrated methodology of "architecture - algorithm - programming - application". Only in this way can parallel computing research develop continuously and stay realistic.

  11. Implementation of GAMMA on a Massively Parallel Computer

    Institute of Scientific and Technical Information of China (English)

    黄林鹏; 童维勤; 等

    1997-01-01

    The GAMMA paradigm was recently proposed by Banatre and Le Metayer to describe the systematic construction of parallel programs without introducing artificial sequentiality. This paper presents two synchronous execution models for GAMMA and discusses how to implement them on the MasPar MP-1, a massively data parallel computer. The results show that the GAMMA paradigm can be implemented very naturally on data parallel machines, and that a very high-level language such as GAMMA, in which parallelism is left implicit, is suitable for specifying massively parallel applications.
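
    A tiny sequential sketch (ours, not from the paper) of the GAMMA idea: a program is a multiset plus a reaction rule, applied until no pair of elements can react; since reactions are independent, they may conceptually fire in parallel.

      def gamma(multiset, reacts, action):
          """Repeatedly replace a reacting pair by action(a, b) until stable."""
          ms = list(multiset)
          changed = True
          while changed:
              changed = False
              for i in range(len(ms)):
                  for j in range(len(ms)):
                      if i != j and reacts(ms[i], ms[j]):
                          a, b = ms[i], ms[j]
                          ms = [x for k, x in enumerate(ms) if k not in (i, j)]
                          ms.extend(action(a, b))
                          changed = True
                          break
                  if changed:
                      break
          return ms

      # Maximum of a multiset: the smaller element of any reacting pair vanishes.
      print(gamma([3, 1, 4, 1, 5], lambda x, y: x <= y, lambda x, y: [y]))  # [5]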

  12. Reusable, Extensible High-Level Data-Distribution Concept

    Science.gov (United States)

    James, Mark; Zima, Hans; Diaconescua, Roxana

    2007-01-01

    A framework for high-level specification of data distributions in data-parallel application programs has been conceived. [As used here, distributions signifies means to express locality (more specifically, locations of specified pieces of data) in a computing system composed of many processor and memory components connected by a network.] Inasmuch as distributions exert a great effect on the performances of application programs, it is important that a distribution strategy be flexible, so that distributions can be adapted to the requirements of those programs. At the same time, for the sake of productivity in programming and execution, it is desirable that users be shielded from such error-prone, tedious details as those of communication and synchronization. As desired, the present framework enables a user to refine a distribution type and adjust it to optimize the performance of an application program and conceals, from the user, the low-level details of communication and synchronization. The framework provides for a reusable, extensible, data-distribution design, denoted the design pattern, that is independent of a concrete implementation. The design pattern abstracts over coding patterns that have been found to be commonly encountered in both manually and automatically generated distributed parallel programs. The following description of the present framework is necessarily oversimplified to fit within the space available for this article. Distributions are among the elements of a conceptual data-distribution machinery, some of the other elements being denoted domains, index sets, and data collections (see figure). Associated with each domain is one index set and one distribution. A distribution class interface (where "class" is used in the object-oriented-programming sense) includes operations that enable specification of the mapping of an index to a unit of locality. Thus, "Map(Index)" specifies a unit, while "LocalLayout(Index)" specifies the local address
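
    A hedged sketch of the interface just described, assuming a simple block-distribution rule (the operation names follow the article; the rest is illustrative):

      class BlockDistribution:
          """Map(Index) gives the owning unit of locality; LocalLayout(Index)
          gives the index's address within that unit's local storage."""

          def __init__(self, n_indices, n_units):
              self.block = -(-n_indices // n_units)  # ceiling division

          def map(self, index):
              return index // self.block

          def local_layout(self, index):
              return index % self.block

      d = BlockDistribution(n_indices=100, n_units=4)
      print(d.map(42), d.local_layout(42))  # unit 1, local offset 17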

  13. High-level graphic and visual environments for the Intel Paragon

    Energy Technology Data Exchange (ETDEWEB)

    Frost, R.; McCurdy, C. [San Diego Supercomputer Center, CA (United States)]; Kelly, P. [Univ. of California, San Diego, CA (United States)] [and others]

    1995-12-01

    Scalable parallel systems have proven to be excellent computational platforms for a wide range of applications. Nonetheless, the industry has been driven by hardware development which has for the most part outpaced end-user software. Here we discuss the design and implementation of a high-level graphics library for SPMD-style programs and the porting of a well-known visual programming system to the Intel Paragon. It is our belief that this work will lead to wider use of scalable parallel systems in science and industry. Although these two efforts might seem unrelated, they both require the transport and assimilation of distributed data between independent node partitions. This is a challenge rarely found in single-partition, multi-node compute applications.

  14. Manipulation of stimulus onset delay in reading: evidence for parallel programming of saccades.

    Science.gov (United States)

    Morrison, R E

    1984-10-01

    On-line eye movement recording of 12 subjects who read short stories on a cathode ray tube enabled a test of direct control and preprogramming models of eye movements in reading. Contingent upon eye position, a mask was displayed in place of the letters in central vision after each saccade, delaying the onset of the stimulus in each eye fixation. The duration of the delay was manipulated in fixed or randomized blocks. Although the length of the delay strongly affected the duration of the fixations, there was no difference due to the conditions of delay manipulation, indicating that fixation duration is under direct control. However, not all fixations were lengthened by the period of the delay. Some ended while the mask was still present, suggesting they had been preprogrammed. But these "anticipation" eye movements could not have been completely determined before the fixation was processed because their fixation durations and saccade lengths were affected by the spatial extent of the mask, which varied randomly. Neither preprogramming nor existing serial direct control models of eye guidance can adequately account for these data. Instead, a model with direct control and parallel programming of saccades is proposed to explain the data and eye movements in reading in general.

  15. A pattern recognition system for prostate mass spectra discrimination based on the CUDA parallel programming model

    Science.gov (United States)

    Kostopoulos, Spiros; Glotsos, Dimitris; Sidiropoulos, Konstantinos; Asvestas, Pantelis; Cavouras, Dionisis; Kalatzis, Ioannis

    2014-03-01

    The aim of the present study was to implement a pattern recognition system for the discrimination of healthy from malignant prostate tumors using proteomic Mass Spectroscopy (MS) samples, and to identify m/z intervals of potential biomarkers associated with prostate cancer. One hundred and six MS-spectra were studied in total. Sixty-three spectra corresponded to healthy cases (PSA 10). The MS-spectra are publicly available from the NCI Clinical Proteomics Database. The pre-processing comprised the steps: denoising, normalization, peak extraction and peak alignment. Due to the enormous number of features that arose from the MS-spectra as informative peaks, and in order to secure an optimal system design, the classification task was performed by programming in parallel the multiprocessors of an nVIDIA GPU card, using the CUDA framework. The proposed system achieved 98.1% accuracy. The identified m/z intervals displayed significant statistical differences between the two classes and were found to possess adequate discriminatory power in characterizing prostate samples when employed in the design of the classification system. Those intervals should be further investigated, since they might lead to the identification of potential new biomarkers for prostate cancer.

  16. 3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

    Energy Technology Data Exchange (ETDEWEB)

    Sofronov, I.D.; Voronin, B.L.; Butnev, O.I. [VNIIEF (Russian Federation)] [and others]

    1997-12-31

    The aim of the work performed is to develop a 3D parallel program for the numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS), satisfying the condition that numerical results be independent of the number of processors involved. Two basically different approaches to the structure of massively parallel computations have been developed. The first approach uses a 3D data matrix decomposition reconstructed at each temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shared memory. The second approach is based on a 3D data matrix decomposition that is not reconstructed during a temporal cycle. The program was developed on the 8-processor CS MP-3 made at VNIIEF and was adapted to the massively parallel CS Meiko-2 at LLNL by the joint efforts of VNIIEF and LLNL staff. A large number of numerical experiments have been carried out with different numbers of processors, up to 256, and the efficiency of parallelization has been evaluated as a function of the number of processors and their parameters.

  17. Research on Parallel Application Program Scheduling Strategy

    Institute of Scientific and Technical Information of China (English)

    李爱玲; 王璐; 彭云峰

    2012-01-01

    To improve the execution efficiency of parallel application programs on heterogeneous platforms, parallel components are classified from the point of view of paradigm and granularity, and a corresponding component model is designed. The model supports serial paradigms, message-passing parallel paradigms and shared-memory parallel paradigms at coarse, medium and fine grain, and components can be programmed against the component programming language of each paradigm. Based on the description of component paradigm, granularity and resource usage, a component scheduling policy is then presented. Tests show that the component model and scheduling strategy improve the execution of parallel application programs and raise the resource utilization of heterogeneous platforms.

  18. Parallelization of While Loops in Nested Loop Programs for Shared-Memory Multiprocessor Systems

    NARCIS (Netherlands)

    Geuns, Stefan J.; Bekooij, Marco J.G.; Bijlsma, Tjerk; Corporaal, Henk

    2011-01-01

    Many applications contain loops with an undetermined number of iterations. These loops have to be parallelized in order to increase the throughput when executed on an embedded multiprocessor platform. This paper presents a method to automatically extract a parallel task graph based on function level

  19. SPSS and SAS programs for determining the number of components using parallel analysis and velicer's MAP test.

    Science.gov (United States)

    O'Connor, B P

    2000-08-01

    Popular statistical software packages do not have the proper procedures for determining the number of components in factor and principal components analyses. Parallel analysis and Velicer's minimum average partial (MAP) test are validated procedures, recommended widely by statisticians. However, many researchers continue to use alternative, simpler, but flawed procedures, such as the eigenvalues-greater-than-one rule. Use of the proper procedures might be increased if these procedures could be conducted within familiar software environments. This paper describes brief and efficient programs for using SPSS and SAS to conduct parallel analyses and the MAP test.
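
    The logic of parallel analysis is compact; a minimal Python sketch (ours, not the SPSS/SAS programs from the article): retain components whose eigenvalues exceed the 95th percentile of eigenvalues obtained from random data of the same shape.

      import numpy as np

      def parallel_analysis(data, n_sims=100, seed=0):
          rng = np.random.default_rng(seed)
          n, p = data.shape
          real = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
          rand = np.empty((n_sims, p))
          for s in range(n_sims):
              fake = rng.standard_normal((n, p))
              rand[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(fake, rowvar=False)))[::-1]
          # Number of components whose eigenvalues beat the random benchmark.
          return int(np.sum(real > np.percentile(rand, 95, axis=0)))

      rng = np.random.default_rng(1)
      x = rng.standard_normal((200, 6))
      x[:, 1] += x[:, 0]  # one correlated pair -> roughly one real component
      print(parallel_analysis(x))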

  20. A component analysis based on serial results analyzing performance of parallel iterative programs

    Energy Technology Data Exchange (ETDEWEB)

    Richman, S.C. [Dalhousie Univ. (Canada)

    1994-12-31

    This research is concerned with the parallel performance of iterative methods for solving large, sparse, nonsymmetric linear systems. Most of the iterative methods are first presented with their time costs and convergence rates examined intensively on sequential machines, and then adapted to parallel machines. The analysis of parallel iterative performance is more complicated than that of serial performance, since the former can be affected by many new factors, such as data communication schemes, number of processors used, and ordering and mapping techniques. Although the author is able to summarize results from data obtained after examining certain cases by experiments, two questions remain: (1) How to explain the results obtained? (2) How to extend the results from these cases to general cases? To answer these two questions quantitatively, the author introduces a tool called component analysis based on serial results. This component analysis is introduced because the iterative methods consist mainly of several basic functions such as linked triads, inner products, and triangular solves, which have different intrinsic parallelisms and are suitable for different parallel techniques. The parallel performance of each iterative method is first expressed as a weighted sum of the parallel performance of the basic functions that are the components of the method. Then, one separately examines the performance of basic functions and the weighting distributions of iterative methods, from which two independent sets of information are obtained when solving a given problem. In this component approach, all the weightings require only serial costs, not parallel costs, and each iterative method for solving a given problem is represented by its unique weighting distribution. The information given by the basic functions is independent of iterative method, while that given by weightings is independent of parallel technique, parallel machine and number of processors.
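
    The component approach reduces to simple arithmetic once kernel rates and weightings are measured; an illustrative sketch with invented numbers (the harmonic-mean combination rule is ours):

      basic_rates = {  # measured Mflop/s of each basic function
          "linked_triad": 120.0,
          "inner_product": 95.0,
          "triangular_solve": 40.0,
      }
      weights = {      # fraction of the method's flops in each function
          "linked_triad": 0.55,
          "inner_product": 0.25,
          "triangular_solve": 0.20,
      }

      # Time per flop is the weighted sum of each kernel's time per flop.
      time_per_flop = sum(weights[k] / basic_rates[k] for k in weights)
      print(f"predicted rate: {1.0 / time_per_flop:.1f} Mflop/s")  # ~81.9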

  1. On the Execution Mechanisms of Parallel Graph Reduction

    Institute of Scientific and Technical Information of China (English)

    王鼎兴; 郑纬民; 等

    1990-01-01

    Parallel graph reduction is a promising model for new generation computers because of its amenability to both programming and parallel computing. In this paper, an initial design for a parallel graph reduction model, the PGR model, is presented, which employs an eager evaluation strategy to exploit conservative parallelism and provides primitives and associated node tags to synchronize concurrent tasks. Moreover, a direct operational description of graph reduction in terms of high-level instructions (primitives) is given to obtain a virtual machine, called PGRVM.

  2. Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

    Science.gov (United States)

    Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

    2016-01-01

    Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.

  3. High-level trigger system for the LHC ALICE experiment

    Energy Technology Data Exchange (ETDEWEB)

    Bramm, R.; Helstrup, H.; Lien, J.; Lindenstruth, V.; Loizides, C.; Roehrich, D.; Skaali, B.; Steinbeck, T.; Stock, R.; Ullaland, K.; Vestboe, A. E-mail: vestbo@fi.uib.no; Wiebalck, A

    2003-04-21

    The central detectors of the ALICE experiment at LHC will produce a data size of up to 75 MB/event at an event rate ≤200 Hz, resulting in a data rate of ~15 GB/s. Online processing of the data is necessary in order to select interesting (sub)events ('High Level Trigger'), or to compress data efficiently by modeling techniques. Processing this data requires a massive parallel computing system (High Level Trigger System). The system will consist of a farm of clustered SMP-nodes based on off-the-shelf PCs connected with a high bandwidth low latency network.

  4. High-level trigger system for the LHC ALICE experiment

    CERN Document Server

    Bramm, R; Lien, J A; Lindenstruth, V; Loizides, C; Röhrich, D; Skaali, B; Steinbeck, T M; Stock, Reinhard; Ullaland, K; Vestbø, A S; Wiebalck, A

    2003-01-01

    The central detectors of the ALICE experiment at LHC will produce a data size of up to 75 MB/event at an event rate ≤200 Hz, resulting in a data rate of ~15 GB/s. Online processing of the data is necessary in order to select interesting (sub)events ("High Level Trigger"), or to compress data efficiently by modeling techniques. Processing this data requires a massive parallel computing system (High Level Trigger System). The system will consist of a farm of clustered SMP-nodes based on off-the-shelf PCs connected with a high bandwidth low latency network.

  5. High-level trigger system for the LHC ALICE experiment

    Science.gov (United States)

    Bramm, R.; Helstrup, H.; Lien, J.; Lindenstruth, V.; Loizides, C.; Röhrich, D.; Skaali, B.; Steinbeck, T.; Stock, R.; Ullaland, K.; Vestbø, A.; Wiebalck, A.; ALICE Collaboration

    2003-04-01

    The central detectors of the ALICE experiment at LHC will produce a data size of up to 75 MB/event at an event rate ≤200 Hz, resulting in a data rate of ~15 GB/s. Online processing of the data is necessary in order to select interesting (sub)events ("High Level Trigger"), or to compress data efficiently by modeling techniques. Processing this data requires a massive parallel computing system (High Level Trigger System). The system will consist of a farm of clustered SMP-nodes based on off-the-shelf PCs connected with a high bandwidth low latency network.

  6. ParaHaplo: A program package for haplotype-based whole-genome association study using parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2009-10-01

    Full Text Available Abstract Background Since more than a million single-nucleotide polymorphisms (SNPs) are analyzed in any given genome-wide association study (GWAS), performing multiple comparisons can be problematic. To cope with multiple-comparison problems in GWAS, haplotype-based algorithms were developed to correct for multiple comparisons at multiple SNP loci in linkage disequilibrium. A permutation test can also control problems inherent in multiple testing; however, both the calculation of exact probability and the execution of permutation tests are time-consuming. Faster methods for calculating exact probabilities and executing permutation tests are required. Methods We developed a set of computer programs for the parallel computation of accurate P-values in haplotype-based GWAS. Our program, ParaHaplo, is intended for workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on JPT and CHB of HapMap. Results ParaHaplo can detect smaller differences between 2 populations than SNP-based GWAS. We also found that parallel-computing techniques made ParaHaplo 100-fold faster than a non-parallel version of the program. Conclusion ParaHaplo is a useful tool in conducting haplotype-based GWAS. Since the data sizes of such projects continue to increase, the use of fast computations with parallel computing--such as that used in ParaHaplo--will become increasingly important. The executable binaries and program sources of ParaHaplo are available at the following address: http://sourceforge.jp/projects/parallelgwas/?_sl=1

  7. ParaHaplo: A program package for haplotype-based whole-genome association study using parallel computing.

    Science.gov (United States)

    Misawa, Kazuharu; Kamatani, Naoyuki

    2009-10-21

    Since more than a million single-nucleotide polymorphisms (SNPs) are analyzed in any given genome-wide association study (GWAS), performing multiple comparisons can be problematic. To cope with multiple-comparison problems in GWAS, haplotype-based algorithms were developed to correct for multiple comparisons at multiple SNP loci in linkage disequilibrium. A permutation test can also control problems inherent in multiple testing; however, both the calculation of exact probability and the execution of permutation tests are time-consuming. Faster methods for calculating exact probabilities and executing permutation tests are required. We developed a set of computer programs for the parallel computation of accurate P-values in haplotype-based GWAS. Our program, ParaHaplo, is intended for workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on JPT and CHB of HapMap. ParaHaplo can detect smaller differences between 2 populations than SNP-based GWAS. We also found that parallel-computing techniques made ParaHaplo 100-fold faster than a non-parallel version of the program. ParaHaplo is a useful tool in conducting haplotype-based GWAS. Since the data sizes of such projects continue to increase, the use of fast computations with parallel computing--such as that used in ParaHaplo--will become increasingly important. The executable binaries and program sources of ParaHaplo are available at the following address: http://sourceforge.jp/projects/parallelgwas/?_sl=1.
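
    Permutation tests parallelize well because each permutation is independent; a generic Python sketch of farming out permutation batches to worker processes (ours, not ParaHaplo code):

      import numpy as np
      from multiprocessing import Pool

      def perm_batch(args):
          seed, n_perm, case, control = args
          rng = np.random.default_rng(seed)
          observed = case.mean() - control.mean()
          pooled = np.concatenate([case, control])
          hits = 0
          for _ in range(n_perm):
              rng.shuffle(pooled)
              stat = pooled[:len(case)].mean() - pooled[len(case):].mean()
              hits += stat >= observed
          return hits

      if __name__ == "__main__":
          rng = np.random.default_rng(0)
          case, control = rng.normal(0.3, 1, 50), rng.normal(0.0, 1, 50)
          jobs = [(seed, 2500, case, control) for seed in range(4)]
          with Pool(4) as pool:                  # 4 x 2500 = 10000 permutations
              p_value = sum(pool.map(perm_batch, jobs)) / 10000
          print(p_value)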

  8. High-level programming for the control of a teleoperated mobile robot with line following

    Energy Technology Data Exchange (ETDEWEB)

    Bernal U, E. [Instituto Tecnologico de Toluca, Metepec, Estado de Mexico (Mexico)

    2006-07-01

    The TRASMAR automated vehicle was built to transport radioactive materials. It has a kinematic structure similar to that of a tricycle: the front wheel provides traction and steering, while the two rear wheels rotate freely on a common axle. The electronic design is based on a Motorola MC68HC811 microcontroller. Among its features, the robot has an obstacle-perception system with three ultrasonic sensors located at the front of the vehicle to avoid collisions. The robot has two operating modes: the main mode is manual, commanded through an infrared remote control, but it can also move autonomously using a line-following technique with two reflective infrared sensors. As with any electronic system, the mobile robot required improvements and upgrades, in this case focused on the control stage. The upgrade consisted of incorporating the MC68HC912B32 microcontroller and replacing the assembly language typical of such systems with a high-level language for microcontrollers of this type, in this case FORTH. The program also implements autonomous displacement by line following using a fuzzy-logic controller. The work is organized as follows: Chapter 1 describes the robot's characteristics, the objectives set at the beginning of the project and the motivation for this upgrade. Chapters 2 to 5 present the theoretical background for the upgrade: the microcontroller modules used, the main characteristics of the FORTH language, the theory of fuzzy logic and the design of the stage

  9. Scalable parallel programming for high performance seismic simulation on petascale heterogeneous supercomputers

    Science.gov (United States)

    Zhou, Jun

    The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures become more common, numerical developments in earthquake system research are particularly challenged by the dependence on accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are the two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability. An earthquake simulation workflow has also been developed to support efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of AWP

  10. General purpose parallel programing using new generation graphic processors: CPU vs GPU comparative analysis and opportunities research

    Directory of Open Access Journals (Sweden)

    Donatas Krušna

    2013-03-01

    Full Text Available OpenCL, a modern programming language for heterogeneous parallel systems, enables problems to be partitioned and executed on modern CPU and GPU hardware, which increases the performance of such applications considerably. Since GPUs are optimized for, and specialize in, floating-point and vector operations, they greatly outperform general-purpose CPUs in this field. The language greatly simplifies the creation of applications for such heterogeneous systems since it is cross-platform, vendor independent and embeddable, letting it be used from any general-purpose programming language via libraries. More and more tools are being developed that are aimed both at low-level programmers and at scientists and engineers developing applications or libraries for today's CPUs and GPUs as well as other heterogeneous platforms. The tendency today is to increase the number of cores or CPUs in the hope of increasing performance; however, the growing difficulty of parallelizing applications for such systems and the ever-increasing overhead of communication and synchronization limit the potential performance. This means there is a point at which adding cores or CPUs no longer increases, and can even diminish, application performance. Even though parallel programming and GPUs with stream computing capabilities have decreased the need for communication and synchronization (since only the final result needs to be committed to memory), this remains a weak link in developing such applications.

  11. Parallel implementation of inverse adding-doubling and Monte Carlo multi-layered programs for high performance computing systems with shared and distributed memory

    Science.gov (United States)

    Chugunov, Svyatoslav; Li, Changying

    2015-09-01

    Parallel implementations of two numerical tools popular in optical studies of biological materials - the Inverse Adding-Doubling (IAD) program and the Monte Carlo Multi-Layered (MCML) program - were developed and tested in this study. The implementation was based on the Message Passing Interface (MPI) and standard C language. The parallel versions of the IAD and MCML programs were compared to their sequential counterparts in validation and performance tests. Additionally, the portability of the programs was tested using a local high performance computing (HPC) cluster, the Penguin-On-Demand HPC cluster, and an Amazon EC2 cluster. Parallel IAD was tested with up to 150 parallel cores using 1223 input datasets. It demonstrated linear scalability, with speedup proportional to the number of parallel cores (up to 150x). Parallel MCML was tested with up to 1001 parallel cores using problem sizes of 10^4 to 10^9 photon packets. It demonstrated classical performance curves featuring communication overhead and a performance saturation point. An optimal performance curve was derived for parallel MCML as a function of problem size. Typical speedup achieved for parallel MCML (up to 326x) increased linearly with problem size. The precision of MCML results was estimated in a series of tests: a problem size of 10^6 photon packets was found optimal for calculations of total optical response, and 10^8 photon packets for spatially-resolved results. The presented parallel versions of the MCML and IAD programs are portable on multiple computing platforms. The parallel programs could significantly speed up simulations for scientists and be utilized to their full potential in computing systems that are readily available without additional costs.
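
    An illustrative sketch (ours, not the authors' code) of why MCML-style Monte Carlo parallelizes so well: photon packets are independent, so batches can run on separate cores and the tallies are summed afterwards. The "transport" below is a toy stand-in.

      import numpy as np
      from multiprocessing import Pool

      def simulate_batch(args):
          seed, n_packets = args
          rng = np.random.default_rng(seed)
          # Toy transport: sample exponential path lengths and tally the
          # packets absorbed within one mean free path.
          depths = rng.exponential(scale=1.0, size=n_packets)
          return int(np.sum(depths < 1.0))

      if __name__ == "__main__":
          jobs = [(seed, 250_000) for seed in range(4)]  # 10^6 packets total
          with Pool(4) as pool:
              absorbed = sum(pool.map(simulate_batch, jobs))
          print(absorbed / 1_000_000)  # fraction absorbed; ~1 - 1/e = 0.632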

  12. The high-level trigger of ALICE

    Energy Technology Data Exchange (ETDEWEB)

    Tilsner, H.; Lindenstruth, V.; Steinbeck, T. [Kirchhoff Institute for Physics, University of Heidelberg (Germany); Alt, T.; Aurbakken, K.; Grastveit, G.; Nystrand, J.; Roehrich, D.; Ullaland, K.; Vestbo, A. [Department of Physics, University of Bergen (Norway); Helstrup, H. [Bergen College (Norway); Loizides, C. [Institute of Nuclear Physics, University of Frankfurt (Germany); Skaali, B.; Vik, T. [Department of Physics, University of Oslo (Norway)

    2004-07-01

    One of the main tracking detectors of the forthcoming ALICE Experiment at the LHC is a cylindrical Time Projection Chamber (TPC) with an expected data volume of about 75 MByte per event. This data volume, in combination with the presumed maximum bandwidth of 1.2 GByte/s to the mass storage system, would limit the maximum event rate to 20 Hz. In order to achieve higher event rates, online data processing has to be applied. This implies either the detection and read-out of only those events which contain interesting physical signatures or an efficient compression of the data by modeling techniques. In order to cope with the anticipated data rate, massive parallel computing power is required. It will be provided in the form of a clustered farm of SMP-nodes, based on off-the-shelf PCs, which are connected with a high bandwidth low overhead network. This High-Level Trigger (HLT) will be able to process a data rate of 25 GByte/s online. The front-end electronics of the individual sub-detectors is connected to the HLT via an optical link and a custom PCI card which is mounted in the clustered PCs. The PCI card is equipped with an FPGA necessary for the implementation of the PCI-bus protocol. Therefore, this FPGA can also be used to assist the host processor with first-level processing. The first-level processing done on the FPGA includes conventional cluster-finding for low multiplicity events and local track finding based on the Hough Transformation of the raw data for high multiplicity events. (orig.)

  13. The high-level trigger of ALICE

    Science.gov (United States)

    Tilsner, H.; Alt, T.; Aurbakken, K.; Grastveit, G.; Helstrup, H.; Lindenstruth, V.; Loizides, C.; Nystrand, J.; Roehrich, D.; Skaali, B.; Steinbeck, T.; Ullaland, K.; Vestbo, A.; Vik, T.

    One of the main tracking detectors of the forthcoming ALICE Experiment at the LHC is a cylindrical Time Projection Chamber (TPC) with an expected data volume of about 75 MByte per event. This data volume, in combination with the presumed maximum bandwidth of 1.2 GByte/s to the mass storage system, would limit the maximum event rate to 20 Hz. In order to achieve higher event rates, online data processing has to be applied. This implies either the detection and read-out of only those events which contain interesting physical signatures or an efficient compression of the data by modeling techniques. In order to cope with the anticipated data rate, massive parallel computing power is required. It will be provided in the form of a clustered farm of SMP-nodes, based on off-the-shelf PCs, which are connected with a high bandwidth low overhead network. This High-Level Trigger (HLT) will be able to process a data rate of 25 GByte/s online. The front-end electronics of the individual sub-detectors is connected to the HLT via an optical link and a custom PCI card which is mounted in the clustered PCs. The PCI card is equipped with an FPGA necessary for the implementation of the PCI-bus protocol. Therefore, this FPGA can also be used to assist the host processor with first-level processing. The first-level processing done on the FPGA includes conventional cluster-finding for low multiplicity events and local track finding based on the Hough Transformation of the raw data for high multiplicity events. PACS: 07.05.-t Computers in experimental physics - 07.05.Hd Data acquisition: hardware and software - 29.85.+c Computer data analysis

  14. Automatic generation of scheduling and communication code in real-time parallel programs

    NARCIS (Netherlands)

    Bakkers, André; Sunter, Johan; Ploeg, Evert

    1995-01-01

    Inter-process communication and scheduling are notorious problem areas in the design of real-time systems. Using CASE tools, the system design phase will in general result in a system description in the form of parallel processes. Manual allocation of these processes to processors may result in erro

  15. A survey of parallel execution strategies for transitive closure and logic programs

    NARCIS (Netherlands)

    Cacace, F.; Ceri, S.; Houtsma, M.A.W.

    1993-01-01

    An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In

  16. An object-oriented bulk synchronous parallel library for multicore programming

    NARCIS (Netherlands)

    Yzelman, A.N.; Bisseling, R.H.

    2012-01-01

    We show that the bulk synchronous parallel (BSP) model, originally designed for distributed-memory systems, is also applicable for shared-memory multicore systems and, furthermore, that BSP libraries are useful in scientific computing on these systems. A proof-of-concept MulticoreBSP library has

  17. Mathematical Programming Method Based on Chaos Anti-Control for the Solution of Forward Displacement of Parallel Robot Mechanisms

    Directory of Open Access Journals (Sweden)

    Youxin Luo

    2013-01-01

    Full Text Available The strong coupling in parallel robots makes the pose of the moving platform possible, but as a consequence its forward displacement is very difficult to obtain. Different methods of establishing the forward displacement lead to different numbers of variables and different solving speeds for the nonlinear equations. The nonlinear equations with nine variables for forward displacement in the general 6-6 type parallel mechanism were created using the rotation transformation matrix R, translation vector P and the constraint conditions of the rod length. Given the problems that only one solution is found, and that convergence sometimes fails, when solving nonlinear equations with the Newton method and the quasi-Newton method, the Euler equation for free rotation of a rigid body was applied to a chaotic system by using chaos anti-control, and chaotic sequences were produced. Combining the characteristics of the chaotic sequence with the mathematical programming method, a new mathematical programming method based on chaos anti-control was put forward, with the aim of finding all real solutions of the nonlinear equations for forward displacement in the general 6-6 type parallel mechanism. The numerical example shows that the new method has some positive characteristics: it runs in the initial value range, it converges quickly, it can find all the real solutions that exist, and comparison with other methods proves the correctness and validity of the method.

  18. Implementing the PM Programming Language using MPI and OpenMP - a New Tool for Programming Geophysical Models on Parallel Systems

    Science.gov (United States)

    Bellerby, Tim

    2015-04-01

    PM (Parallel Models) is a new parallel programming language specifically designed for writing environmental and geophysical models. The language is intended to enable implementers to concentrate on the science behind the model rather than the details of running on parallel hardware. At the same time PM leaves the programmer in control - all parallelisation is explicit and the parallel structure of any given program may be deduced directly from the code. This paper describes a PM implementation based on the Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) standards, looking at issues involved with translating the PM parallelisation model to MPI/OpenMP protocols and considering performance in terms of the competing factors of finer-grained parallelisation and increased communication overhead. In order to maximise portability, the implementation stays within the MPI 1.3 standard as much as possible, with MPI-2 MPI-IO file handling the only significant exception. Moreover, it does not assume a thread-safe implementation of MPI. PM adopts a two-tier abstract representation of parallel hardware. A PM processor is a conceptual unit capable of efficiently executing a set of language tasks, with a complete parallel system consisting of an abstract N-dimensional array of such processors. PM processors may map to single cores executing tasks using cooperative multi-tasking, to multiple cores or even to separate processing nodes, efficiently sharing tasks using algorithms such as work stealing. While tasks may move between hardware elements within a PM processor, they may not move between processors without specific programmer intervention. Tasks are assigned to processors using a nested parallelism approach, building on ideas from Reyes et al. (2009). The main program owns all available processors. When the program enters a parallel statement then either processors are divided out among the newly generated tasks (number of new tasks number of processors
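
    The MPI/OpenMP layering described above, combined with the assumption of a non-thread-safe MPI, corresponds to the FUNNELED threading level, where only the master thread of each process performs communication. A minimal hybrid C skeleton of that arrangement (our illustration, unrelated to the actual PM runtime) looks like this:

        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int provided, rank, nprocs;

            /* FUNNELED: threads exist, but only the master thread calls MPI. */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            double local = 0.0;
            #pragma omp parallel reduction(+:local)
            {
                /* Each thread contributes a share of the node-local work. */
                local += omp_get_thread_num() + rank * 100.0;
            }

            /* Back on the master thread only: communicate between processes. */
            double global = 0.0;
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("global sum = %f (%d processes)\n", global, nprocs);

            MPI_Finalize();
            return 0;
        }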

  19. Parallel biocomputing

    Directory of Open Access Journals (Sweden)

    Witte John S

    2011-03-01

    Full Text Available Abstract Background With the advent of high throughput genomics and high-resolution imaging techniques, there is a growing necessity in biology and medicine for parallel computing, and with the low cost of computing, it is now cost-effective for even small labs or individuals to build their own personal computation cluster. Methods Here we briefly describe how to use commodity hardware to build a low-cost, high-performance compute cluster, and provide an in-depth example and sample code for parallel execution of R jobs using MOSIX, a mature extension of the Linux kernel for parallel computing. A similar process can be used with other cluster platform software. Results As a statistical genetics example, we use our cluster to run a simulated eQTL experiment. Because eQTL is computationally intensive, and is conceptually easy to parallelize, like many statistics/genetics applications, parallel execution with MOSIX gives a linear speedup in analysis time with little additional effort. Conclusions We have used MOSIX to run a wide variety of software programs in parallel with good results. The limitations and benefits of using MOSIX are discussed and compared to other platforms.
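
    Because MOSIX migrates ordinary Linux processes transparently, the program itself can stay as simple as forking one process per independent replicate; the cluster then spreads the processes across nodes. The C sketch below shows that pattern on a single machine, with an invented workload standing in for one simulation replicate.

        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/wait.h>
        #include <unistd.h>

        #define NJOBS 8

        /* Stand-in for one independent replicate of a simulated experiment. */
        static void run_replicate(int id)
        {
            srand(id);                     /* per-job seed */
            double s = 0.0;
            for (long i = 0; i < 1000000; i++)
                s += rand() / (double)RAND_MAX;
            printf("replicate %d: mean = %f\n", id, s / 1000000.0);
        }

        int main(void)
        {
            for (int j = 0; j < NJOBS; j++) {
                pid_t pid = fork();
                if (pid == 0) {            /* child: one independent job */
                    run_replicate(j);
                    _exit(0);
                }
            }
            for (int j = 0; j < NJOBS; j++)
                wait(NULL);                /* reap all children */
            return 0;
        }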

  1. Management of data quality of high level waste characterization

    Energy Technology Data Exchange (ETDEWEB)

    Winters, W.I., Westinghouse Hanford

    1996-06-12

    Over the past 10 years, the Hanford Site has been transitioning from nuclear materials production to Site cleanup operations. High-level waste characterization at the Hanford Site provides data to support present waste processing operations, tank safety programs, and future waste disposal programs. Quality elements in the high-level waste characterization program will be presented by following a sample through the data quality objective, sampling, laboratory analysis and data review process. Transition from production to cleanup has resulted in changes in quality systems and program; the changes, as well as other issues in these quality programs, will be described. Laboratory assessment through quality control and performance evaluation programs will be described, and data assessments in the laboratory and final reporting in the tank characterization reports will be discussed.

  2. Strength analysis of parallel robot components in PLM Siemens NX 8.5 program

    Science.gov (United States)

    Ociepka, P.; Herbus, K.

    2015-11-01

    This article presents a series of numerical analyses carried out to identify the states of stress which arise in elements during the operation of the mechanism. The object of the research was a parallel robot, which is the basis for the prototype of a driving simulator. The Motion Simulation module and the RecurDyn solver were used to conduct the dynamic analysis. The joints which occur in the mechanism of a parallel robot were created in this module. Next, dynamic analyses were performed to determine the maximal forces that will be applied to the analyzed elements. The platform motion was also analyzed during the simulation of a collision of a car with a wall. In the next step, based on the results obtained in the dynamic analysis, the strength analyses were performed in the Advanced Simulation module, using the NX Nastran solver for the calculations.

  3. Structured Parallel Programming: patterns for efficient computation : Michael McCool, Arch D Robison, James Reinders, Morgan Kaufmann-Elsevier 2012

    OpenAIRE

    De Giusti, Armando Eduardo

    2015-01-01

    In this book the authors, who are parallel computing experts and industry insiders, describe how to design and implement maintainable and efficient parallel algorithms using a pattern-based approach. They present both theory and practice, and give some specific examples using multiple programming models.

  4. Programming Environment for a High-Performance Parallel Supercomputer with Intelligent Communication

    OpenAIRE

    A. Gunzinger; Bäumle, B.; Frey, M.; Klebl, M.; Kocheisen, M.; Kohler, P.; Morel, R.; Müller, U.; Rosenthal, M

    1996-01-01

    At the Electronics Laboratory of the Swiss Federal Institute of Technology (ETH) in Zürich, the high-performance parallel supercomputer MUSIC (MUlti processor System with Intelligent Communication) has been developed. As applications like neural network simulation and molecular dynamics show, the Electronics Laboratory supercomputer is absolutely on par with those of conventional supercomputers, but electric power requirements are reduced by a factor of 1,000, weight is reduced by a factor of...

  5. High-level language computer architecture

    CERN Document Server

    Chu, Yaohan

    1975-01-01

    High-Level Language Computer Architecture offers a tutorial on high-level language computer architecture, including von Neumann architecture and syntax-oriented architecture as well as direct and indirect execution architecture. Design concepts of Japanese-language data processing systems are discussed, along with the architecture of stack machines and the SYMBOL computer system. The conceptual design of a direct high-level language processor is also described.Comprised of seven chapters, this book first presents a classification of high-level language computer architecture according to the pr

  6. Compiling the functional data-parallel language SaC for Microgrids of Self-Adaptive Virtual Processors

    NARCIS (Netherlands)

    Grelck, C.; Herhut, S.; Jesshope, C.; Joslin, C.; Lankamp, M.; Scholz, S.-B.; Shafarenko, A.

    2009-01-01

    We present preliminary results from compiling the high-level, functional and data-parallel programming language SaC into a novel multi-core design: Microgrids of Self-Adaptive Virtual Processors (SVPs). The side-effect free nature of SaC in conjunction with its data-parallel foundation make it an id

  7. On the Problem of Optimizing Parallel Programs for Complex Memory Hierarchies

    Institute of Scientific and Technical Information of China (English)

    金国华; 陈福接

    1994-01-01

    Based on a thorough study of the relationship between array element accesses and loop indices of the nested loop, a method is presented with which the staggering relation and the compacting relation between the threads of the nested loop (either with a single linear function or with multiple linear functions) can be determined at compile time, and accordingly the nested loop (either a perfectly nested one or an imperfectly nested one) can be restructured to avoid the thrashing problem. Due to its simplicity, our method can be efficiently implemented in any parallel compiler, and the improvement of the performance is significant, as shown by the experimental results.
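
    The kind of compile-time loop restructuring the paper automates can be illustrated by hand on the simplest case: interchanging a nested loop so the innermost index walks memory with stride one. The C fragment below is a generic textbook example of the locality problem, not the paper's staggering/compacting transformation itself.

        #include <stdio.h>

        #define N 1024
        static double a[N][N], b[N][N];

        /* Poor locality: the inner loop walks down a column, so consecutive
           iterations touch addresses N*sizeof(double) apart, thrashing the
           cache on each access. */
        void copy_column_major(void)
        {
            for (int j = 0; j < N; j++)
                for (int i = 0; i < N; i++)
                    a[i][j] = b[i][j];
        }

        /* Restructured: interchanging the loops makes the inner loop
           stride-1, the kind of transformation a compiler can apply once it
           has analyzed the access/index relationship. */
        void copy_row_major(void)
        {
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    a[i][j] = b[i][j];
        }

        int main(void)
        {
            copy_column_major();
            copy_row_major();
            printf("%f\n", a[0][0]);
            return 0;
        }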

  8. Hybrid Parallel Programming Models for AMR Neutron Monte-Carlo Transport

    Science.gov (United States)

    Dureau, David; Poëtte, Gaël

    2014-06-01

    This paper deals with High Performance Computing (HPC) applied to neutron transport theory on complex geometries, thanks to both an Adaptive Mesh Refinement (AMR) algorithm and a Monte-Carlo (MC) solver. Several parallelism models are presented and analyzed in this context, among them shared-memory and distributed-memory ones such as Domain Replication and Domain Decomposition, together with hybrid strategies. The study is illustrated by weak and strong scalability tests on complex benchmarks on several thousands of cores on the petaflop supercomputer Tera100.
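
    Domain Replication, the simplest of the strategies named above, gives every process a full copy of the geometry and an independent share of the particle histories, with tallies combined at the end. A minimal MPI sketch in C follows; the "physics" is an invented absorption probability, not the paper's solver.

        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv)
        {
            int rank, nprocs;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            /* Every rank holds the full (implicit) geometry and tracks an
               independent share of the histories. */
            const long histories = 1000000;
            long local_n = histories / nprocs;
            srand(rank + 1);                   /* independent streams (toy) */

            long absorbed = 0;
            for (long i = 0; i < local_n; i++) {
                double u = rand() / (double)RAND_MAX;
                if (u < 0.3) absorbed++;       /* stand-in for a random walk */
            }

            long total = 0;                    /* combine tallies at the end */
            MPI_Reduce(&absorbed, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
            if (rank == 0)
                printf("absorption fraction = %f\n",
                       (double)total / (local_n * nprocs));
            MPI_Finalize();
            return 0;
        }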

  9. 76 FR 66309 - Pilot Program for Parallel Review of Medical Products; Correction

    Science.gov (United States)

    2011-10-26

    ... Federal Register of October 11, 2011 (76 FR 62808). The document announced a pilot program for sponsors of...-796-6579. SUPPLEMENTARY INFORMATION: In FR Doc. 2011-25907, appearing on page 62808 in the Federal... HUMAN SERVICES Centers for Medicare and Medicaid Services Food and Drug Administration Pilot Program...

  10. Asynchronous Adaptive Optimisation for Generic Data-Parallel Array Programming and Beyond

    NARCIS (Netherlands)

    Grelck, C.

    2011-01-01

    We present the concept of an adaptive compiler optimisation framework for the functional array programming language SaC, Single Assignment C. SaC advocates shape- and rank-generic programming with multidimensional arrays. A sophisticated, highly optimising compiler technology nonetheless achieves co

  11. An overview of very high level software design methods

    Science.gov (United States)

    Asdjodi, Maryam; Hooper, James W.

    1988-01-01

    Very High Level design methods emphasize automatic transfer of requirements to formal design specifications, and/or may concentrate on automatic transformation of formal design specifications that include some semantic information of the system into machine-executable form. Very high level design methods range from general domain-independent methods to approaches implementable for specific applications or domains. Applying AI techniques, abstract programming methods, domain heuristics, software engineering tools, library-based programming and other methods, different approaches for higher level software design are being developed. Though one finds that a given approach does not always fall exactly in any specific class, this paper provides a classification for very high level design methods, including examples for each class. These methods are analyzed and compared based on their basic approaches, strengths and feasibility for future expansion toward automatic development of software systems.

  12. A Study on the Effect of Communication Performance on Message-Passing Parallel Programs: Methodology and Case Studies

    Science.gov (United States)

    Sarukkai, Sekhar R.; Yan, Jerry; Woodrow, Thomas (Technical Monitor)

    1994-01-01

    From a source-program perspective, the performance achieved on distributed/parallel systems is governed by the underlying message-passing library overhead and the network capabilities of the architecture. Studying the impact of changes in these features on the source program can have a significant influence on the development of next-generation system designs. In this paper we introduce a simple and robust tool that can be used for this purpose. This tool is based on event-driven simulation of programs that generates a new set of trace events - preserving causality and partial order - corresponding to the expected execution of the program in the simulated environment. Trace events can be visualized, and source-level profile information can be used to pinpoint locations of the program which are most significantly affected by changing system parameters in the simulated environment. We present a number of examples from the NAS benchmark suite, executed on the Intel Paragon and iPSC/860, that are used to identify and expose performance bottlenecks with varying system parameters. Specific aspects of the system that significantly affect these benchmarks are presented and discussed.

  13. Neurite, a finite difference large scale parallel program for the simulation of electrical signal propagation in neurites under mechanical loading.

    Directory of Open Access Journals (Sweden)

    Julián A García-Grajales

    Full Text Available With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite--explicit and implicit--were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon
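
    The passive (cable-theory) part of such a solver reduces, in its simplest explicit form, to a finite-difference update of the membrane voltage along a chain of compartments. The following self-contained C sketch uses arbitrary toy coefficients and boundary conditions; it is not taken from Neurite.

        #include <stdio.h>

        #define NX 200      /* spatial compartments along the neurite */
        #define NT 2000     /* time steps */

        /* Explicit finite-difference update for a passive cable,
           dV/dt = (D * d2V/dx2 - V) / tau, with dx = 1 (all toy values). */
        int main(void)
        {
            double V[NX] = {0}, Vn[NX];
            const double D = 0.25, tau = 1.0, dt = 0.01;

            V[0] = 100.0;                       /* voltage clamp at one end */
            for (int t = 0; t < NT; t++) {
                for (int i = 1; i < NX - 1; i++)
                    Vn[i] = V[i] + dt * ((D * (V[i-1] - 2.0*V[i] + V[i+1]) - V[i]) / tau);
                Vn[0] = 100.0;                  /* clamped boundary */
                Vn[NX-1] = Vn[NX-2];            /* sealed (no-flux) end */
                for (int i = 0; i < NX; i++)
                    V[i] = Vn[i];
            }
            printf("V at midpoint after %d steps: %f\n", NT, V[NX / 2]);
            return 0;
        }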

  14. Parallel Programming Application to Matrix Algebra in the Spectral Method for Control Systems Analysis, Synthesis and Identification

    Directory of Open Access Journals (Sweden)

    V. Yu. Kleshnin

    2016-01-01

    Full Text Available The article describes the matrix algebra libraries based on modern parallel programming technologies for the Spectrum software, which can use a spectral method (in the spectral form of mathematical description) to analyse, synthesise and identify deterministic and stochastic dynamical systems. The developed matrix algebra libraries use the following technologies: OmniThreadLibrary, OpenMP, Intel Threading Building Blocks and Intel Cilk Plus for CPUs; nVidia CUDA, OpenCL and Microsoft Accelerated Massive Parallelism for GPUs. The developed libraries support matrices with real elements (single and double precision). The matrix dimensions are limited only by the 32-bit or 64-bit memory model and the computer configuration. These libraries are general-purpose and can be used not only for the Spectrum software; they can also find application in other projects where there is a need to perform operations with large matrices. The article provides a comparative analysis of the developed libraries for various matrix operations (addition, subtraction, scalar multiplication, multiplication, powers of matrices, tensor multiplication, transpose, inverse matrix, finding a solution of a system of linear equations) through numerical experiments using different CPUs and GPUs. The article contains sample programs and performance test results for matrix multiplication, which requires the most computational resources of all the operations.

  15. Study on Patterns for Parallel Programming Based on CMP Systems

    Institute of Scientific and Technical Information of China (English)

    胥秀峰; 鲍广宇; 黄海燕; 吴亚宁

    2014-01-01

    Studying patterns for parallel programming based on CMP (Chip Multiple Processors) systems aims at building a complete method for developing parallel programs on CMP systems. The paper first gives a brief introduction to multi-core parallel computing, and then, by summarizing the problems of parallel computing on CMP systems, proposes a conceptual model of patterns for parallel programming based on CMP systems; the model contains four core elements: parallel architecture, parallel algorithm design model, development environment, and parallel program implementation model. The main connotations of these elements and their sub-concepts are then explained. Finally, the parallel programming patterns are illustrated with an example, initially verifying the reasonableness of the patterns.

  16. High-level manpower movement and Japan's foreign aid.

    Science.gov (United States)

    Furuya, K

    1992-01-01

    "Japan's technical assistance programs to Asian countries are summarized. Movements of high-level manpower accompanying direct foreign investments by private enterprise are also reviewed. Proposals for increased human resources development include education and training of foreigners in Japan as well as the training of Japanese aid experts and the development of networks for information exchange."

  17. Generating local addresses and communication sets for data-parallel programs

    Science.gov (United States)

    Chatterjee, Siddhartha; Gilbert, John R.; Long, Fred J. E.; Schreiber, Robert; Teng, Shang-Hua

    1993-01-01

    Generating local addresses and communication sets is an important issue in distributed-memory implementations of data-parallel languages such as High Performance FORTRAN. We show that, for an array A affinely aligned to a template that is distributed across p processors with a cyclic(k) distribution and a computation involving the regular section A(l:h:s), the local memory access sequence for any processor is characterized by a finite state machine of at most k states. We present fast algorithms for computing the essential information about these state machines, and extend the framework to handle multidimensional arrays. We also show how to generate communication sets using the state machine approach. Performance results show that this solution requires very little run-time overhead and acceptable preprocessing time.
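
    The local address sequence described in the record can be made concrete with a small enumeration: under a cyclic(k) distribution over p processors, global element g lives on processor (g/k) mod p at local offset (g/(k*p))*k + g mod k. The C sketch below simply scans the regular section l:h:s and prints those pairs for one processor; the paper's finite-state machine produces the same sequence without the scan. All parameters are invented for the example.

        #include <stdio.h>

        /* Enumerate elements of the section l:h:s stored on processor `pid`
           under a cyclic(k) distribution over p processors (0-based global
           indices). Toy version: scan and test ownership. */
        void local_elements(int l, int h, int s, int k, int p, int pid)
        {
            for (int g = l; g <= h; g += s) {
                int owner = (g / k) % p;             /* block-cyclic owner */
                if (owner == pid) {
                    int course = g / (k * p);        /* which of my blocks */
                    int local  = course * k + g % k; /* local memory address */
                    printf("global %3d -> local %3d on P%d\n", g, local, pid);
                }
            }
        }

        int main(void)
        {
            /* Section A(3:40:2), cyclic(4) over 3 processors, processor 1. */
            local_elements(3, 40, 2, 4, 3, 1);
            return 0;
        }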

  1. Portable Parallel Programming for the Dynamic Load Balancing of Unstructured Grid Applications

    Science.gov (United States)

    Biswas, Rupak; Das, Sajal K.; Harvey, Daniel; Oliker, Leonid

    1999-01-01

    The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parallel implementation is rather difficult, particularly from the viewpoint of portability on various multiprocessor platforms. We address this problem by developing PLUM, an automatic and architecture-independent framework for adaptive numerical computations in a message-passing environment. Portability is demonstrated by comparing performance on an SP2, an Origin2000, and a T3E, without any code modifications. We also present a general-purpose load balancer that utilizes symmetric broadcast networks (SBN) as the underlying communication pattern, with the goal of providing a global view of system loads across processors. Experiments on an SP2 and an Origin2000 demonstrate the portability of our approach, which achieves superb load balance at the cost of minimal extra overhead.

  2. Parallel programming of gradient-based iterative image reconstruction schemes for optical tomography.

    Science.gov (United States)

    Hielscher, Andreas H; Bartel, Sebastian

    2004-02-01

    Optical tomography (OT) is a fast-developing novel imaging modality that uses near-infrared (NIR) light to obtain cross-sectional views of optical properties inside the human body. A major challenge remains the time-consuming, computationally intensive image reconstruction problem that converts NIR transmission measurements into cross-sectional images. To increase the speed of iterative image reconstruction schemes that are commonly applied for OT, we have developed and implemented several parallel algorithms on a cluster of workstations. Static process distribution as well as dynamic load balancing schemes suitable for heterogeneous clusters and varying machine performances are introduced and tested. The resulting algorithms are shown to accelerate the reconstruction process to various degrees, substantially reducing the computation times for clinically relevant problems.
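
    A common way to realize the dynamic load balancing mentioned above is a master-worker scheme in which idle workers pull the next task as soon as they finish one, so faster machines automatically take more work. The MPI sketch below in C uses a squared integer as a placeholder for the real per-task computation (e.g. one forward solve); it is a generic pattern, not the paper's implementation.

        #include <mpi.h>
        #include <stdio.h>

        #define TAG_WORK 1
        #define TAG_STOP 2

        int main(int argc, char **argv)
        {
            int rank, nprocs;
            const int ntasks = 100;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            if (rank == 0) {                          /* master */
                MPI_Status st;
                int sent = 0, done = 0, result;
                for (int w = 1; w < nprocs; w++) {    /* initial distribution */
                    if (sent < ntasks) {
                        MPI_Send(&sent, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                        sent++;
                    } else {                          /* more workers than tasks */
                        MPI_Send(&sent, 0, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
                    }
                }
                while (done < sent) {                 /* refill on completion */
                    MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                             MPI_COMM_WORLD, &st);
                    done++;
                    if (sent < ntasks) {
                        MPI_Send(&sent, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                                 MPI_COMM_WORLD);
                        sent++;
                    } else {
                        MPI_Send(&sent, 0, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                                 MPI_COMM_WORLD);
                    }
                }
            } else {                                  /* worker */
                MPI_Status st;
                int task;
                for (;;) {
                    MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
                    if (st.MPI_TAG == TAG_STOP) break;
                    int result = task * task;         /* placeholder computation */
                    MPI_Send(&result, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
                }
            }
            MPI_Finalize();
            return 0;
        }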

  3. SIGWX Charts - High Level Significant Weather

    Data.gov (United States)

    National Oceanic and Atmospheric Administration, Department of Commerce — High level significant weather (SIGWX) forecasts are provided for the en-route portion of international flights. NOAA's National Weather Service Aviation Center...

  4. High-Level Dialogue on International Migration

    Directory of Open Access Journals (Sweden)

    UNHCR

    2006-08-01

    Full Text Available UNHCR wishes to bring the following observations and recommendations to the attention of the High-Level Dialogue (HLD) on International Migration and Development, to be held in New York, 14-15 September 2006:

  5. The Feasibility of Using OpenCL Instead of OpenMP for Parallel CPU Programming

    OpenAIRE

    Karimi, Kamran

    2015-01-01

    OpenCL, along with CUDA, is one of the main tools used to program GPGPUs. However, it allows running the same code on multi-core CPUs too, making it a rival for the long-established OpenMP. In this paper we compare OpenCL and OpenMP when developing and running compute-heavy code on a CPU. Both ease of programming and performance aspects are considered. Since, unlike a GPU, no memory copy operation is involved, our comparisons measure the code generation quality, as well as thread management e...

  6. FPGA Co-processor for the ALICE High Level Trigger

    CERN Document Server

    Grastveit, G; Lindenstruth, V.; Loizides, C.; Roehrich, D.; Skaali, B.; Steinbeck, T.; Stock, R.; Tilsner, H.; Ullaland, K.; Vestbo, A.; Vik, T.

    2003-01-01

    The High Level Trigger (HLT) of the ALICE experiment requires massively parallel computing. One of the main tasks of the HLT system is two-dimensional cluster finding on raw data of the Time Projection Chamber (TPC), which is the main data source of ALICE. To reduce the number of computing nodes needed in the HLT farm, FPGAs, which are an intrinsic part of the system, will be utilized for this task. VHDL code implementing the Fast Cluster Finder algorithm has been written, a testbed for functional verification of the code has been developed, and the code has been synthesized.

  7. Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming

    Science.gov (United States)

    Peredo, Oscar; Ortiz, Julián M.; Herrero, José R.

    2015-12-01

    The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and today it is still in use by many practitioners and researchers. Despite its widespread use, few attempts have been reported to bring this package into the multi-core era. Using all CPU resources, GSLIB algorithms can handle large datasets and grids, where tasks are compute- and memory-intensive applications. In this work, a methodology is presented to accelerate GSLIB applications using code optimization and hybrid parallel processing, specifically for compute-intensive applications. Minimal code modifications are added, decreasing as much as possible the elapsed execution time of the studied routines. If multi-core processing is available, the user can activate OpenMP directives to speed up the execution using all resources of the CPU. If multi-node processing is available, the execution is enhanced using MPI messages between the compute nodes. Four case studies are presented: experimental variogram calculation, kriging estimation, and sequential gaussian and indicator simulation. For each application, three scenarios (small, large and extra large) are tested using a desktop environment with 4 CPU-cores and a multi-node server with 128 CPU-nodes. Elapsed times, speedup and efficiency results are shown.
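
    Of the four case studies, the experimental variogram is the easiest to picture as a parallel loop: the pair contributions are independent and combine with a reduction over the lag bins. The C/OpenMP sketch below is a generic 1-D illustration with invented data, not GSLIB's variogram routine; the array-section reduction requires OpenMP 4.5 or later.

        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define N 4000        /* data points (invented) */
        #define NLAGS 20      /* number of lag bins */
        #define LAGW 1.0      /* lag bin width */

        int main(void)
        {
            static double x[N], v[N];
            double gamma[NLAGS] = {0};
            long npairs[NLAGS] = {0};

            for (int i = 0; i < N; i++) {            /* synthetic 1-D field */
                x[i] = i * 0.1;
                v[i] = sin(x[i]) + rand() / (double)RAND_MAX;
            }

            /* The O(N^2) pair loop dominates the run time; each (i,j)
               contribution is independent, so a reduction suffices. */
            #pragma omp parallel for schedule(dynamic) \
                reduction(+:gamma[0:NLAGS], npairs[0:NLAGS])
            for (int i = 0; i < N; i++)
                for (int j = i + 1; j < N; j++) {
                    int lag = (int)(fabs(x[j] - x[i]) / LAGW);
                    if (lag < NLAGS) {
                        double d = v[i] - v[j];
                        gamma[lag] += 0.5 * d * d;
                        npairs[lag]++;
                    }
                }

            for (int l = 0; l < NLAGS; l++)
                if (npairs[l])
                    printf("lag %2d: semivariance = %f\n", l, gamma[l] / npairs[l]);
            return 0;
        }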

  8. The Establishment of Parallel Systems on Microcomputers and the Development of Parallel Programs

    Institute of Scientific and Technical Information of China (English)

    王顺绪; 李志英

    2001-01-01

    The significance of establishing a parallel environment on microcomputers and carrying out parallel simulation is illustrated, and methods for installing PVM on a microcomputer are given, together with the .cshrc file that makes PVM run correctly and examples of PVM application programs in the master/slave programming model.

  9. Design and Implementation of an FEM Parallel Program

    Institute of Scientific and Technical Information of China (English)

    余天堂; 姜弘道

    2000-01-01

    Parallel computation of FEM over a distributed network is an important direction of FEM parallel computation. A program design method, and its implementation, for FEM parallel analysis under a network based on PVM is presented. The substructure parallel analysis method with multi-front parallel processing is adopted in the FEM parallel computation, and the interface equations are solved with the Preconditioned Conjugate Gradient (PCG) method. The implementation of this design method is easy. An example shows that the design method can obtain a higher speedup ratio.
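
    The Preconditioned Conjugate Gradient method named in the record is compact enough to sketch whole. The C version below solves a small dense SPD system with a Jacobi (diagonal) preconditioner; the paper's setting differs in that the system is the distributed substructure interface problem, but the iteration structure is the same.

        #include <math.h>
        #include <stdio.h>

        #define N 4

        static void matvec(const double A[N][N], const double *x, double *y)
        {
            for (int i = 0; i < N; i++) {
                y[i] = 0.0;
                for (int j = 0; j < N; j++) y[i] += A[i][j] * x[j];
            }
        }

        static double dot(const double *a, const double *b)
        {
            double s = 0.0;
            for (int i = 0; i < N; i++) s += a[i] * b[i];
            return s;
        }

        int main(void)
        {
            double A[N][N] = {{4,1,0,0},{1,4,1,0},{0,1,4,1},{0,0,1,4}};
            double b[N] = {1, 2, 3, 4}, x[N] = {0};
            double r[N], z[N], p[N], q[N];

            /* r = b - A*x; z = M^-1 r with Jacobi M = diag(A); p = z */
            matvec(A, x, q);
            for (int i = 0; i < N; i++) {
                r[i] = b[i] - q[i];
                z[i] = r[i] / A[i][i];
                p[i] = z[i];
            }
            double rz = dot(r, z);

            for (int it = 0; it < 100 && sqrt(dot(r, r)) > 1e-10; it++) {
                matvec(A, p, q);
                double alpha = rz / dot(p, q);
                for (int i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * q[i]; }
                for (int i = 0; i < N; i++) z[i] = r[i] / A[i][i];
                double rz_new = dot(r, z);
                double beta = rz_new / rz;
                rz = rz_new;
                for (int i = 0; i < N; i++) p[i] = z[i] + beta * p[i];
            }
            for (int i = 0; i < N; i++) printf("x[%d] = %f\n", i, x[i]);
            return 0;
        }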

  10. Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages

    Science.gov (United States)

    2013-01-02

    Domain-specific embedded languages are used in many programming languages such as Lisp ... designed to be extensible using metaprogramming, including Haskell and variants of Lisp. These DSELs generally transformed host language code into

  11. Practical parallel computing

    CERN Document Server

    Morse, H Stephen

    1994-01-01

    Practical Parallel Computing provides information pertinent to the fundamental aspects of high-performance parallel processing. This book discusses the development of parallel applications on a variety of equipment.Organized into three parts encompassing 12 chapters, this book begins with an overview of the technology trends that converge to favor massively parallel hardware over traditional mainframes and vector machines. This text then gives a tutorial introduction to parallel hardware architectures. Other chapters provide worked-out examples of programs using several parallel languages. Thi

  12. A distributed computing approach to improve the performance of the Parallel Ocean Program (v2.1)

    Directory of Open Access Journals (Sweden)

    B. van Werkhoven

    2013-09-01

    Full Text Available The Parallel Ocean Program (POP) is used in many strongly eddying ocean circulation simulations. Ideally one would like to do thousand-year-long simulations, but the current performance of POP prohibits this type of simulation. In this work, using a new distributed computing approach, two innovations to improve the performance of POP are presented. The first is a new block partitioning scheme for the optimization of the load balancing of POP such that it can be run efficiently in a multi-platform setting. The second is an implementation of part of the POP model code on Graphics Processing Units. We show that the combination of both innovations leads to a substantial performance increase, also when running POP simultaneously over multiple computational platforms.

  13. Comparative Study of Dynamic Programming and Pontryagin’s Minimum Principle on Energy Management for a Parallel Hybrid Electric Vehicle

    Directory of Open Access Journals (Sweden)

    Huei Peng

    2013-04-01

    Full Text Available This paper compares two optimal energy management methods for parallel hybrid electric vehicles using an Automatic Manual Transmission (AMT. A control-oriented model of the powertrain and vehicle dynamics is built first. The energy management is formulated as a typical optimal control problem to trade off the fuel consumption and gear shifting frequency under admissible constraints. The Dynamic Programming (DP and Pontryagin’s Minimum Principle (PMP are applied to obtain the optimal solutions. Tuning with the appropriate co-states, the PMP solution is found to be very close to that from DP. The solution for the gear shifting in PMP has an algebraic expression associated with the vehicular velocity and can be implemented more efficiently in the control algorithm. The computation time of PMP is significantly less than DP.
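
    Dynamic Programming, as used in the comparison above, discretises the battery state of charge and sweeps backwards through the drive cycle, minimising stage cost plus cost-to-go at every grid point. The C sketch below is a deliberately toy power-split model (all coefficients and the SOC update rule are invented), meant only to show the structure of the backward sweep, not the paper's vehicle model.

        #include <math.h>
        #include <stdio.h>

        #define T 50          /* time steps of the drive cycle */
        #define NS 101        /* discretised battery SOC grid */
        #define NU 21         /* discretised power-split control grid */

        int main(void)
        {
            static double J[T + 1][NS];              /* cost-to-go table */
            double demand[T];                        /* toy power demand, kW */
            for (int t = 0; t < T; t++)
                demand[t] = 20.0 + 10.0 * sin(0.3 * t);

            for (int s = 0; s < NS; s++)             /* terminal constraint: */
                J[T][s] = fabs(s - 50) * 1.0;        /* penalise SOC far from 50% */

            for (int t = T - 1; t >= 0; t--)         /* backward DP sweep */
                for (int s = 0; s < NS; s++) {
                    double best = 1e30;
                    for (int u = 0; u < NU; u++) {
                        double batt = (u / (double)(NU - 1)) * demand[t];
                        double fuel = 0.08 * (demand[t] - batt); /* engine burn */
                        int s2 = s - (int)lround(batt * 0.2);    /* SOC update */
                        if (s2 < 0 || s2 >= NS) continue;        /* infeasible */
                        double c = fuel + J[t + 1][s2];
                        if (c < best) best = c;
                    }
                    J[t][s] = best;
                }
            printf("optimal cost from SOC=50%%: %f\n", J[0][50]);
            return 0;
        }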

  14. SKIRT: Hybrid parallelization of radiative transfer simulations

    Science.gov (United States)

    Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.

    2017-07-01

    We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
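
    The lock-free accumulation mentioned in the record can be shown in miniature with C11 atomics: worker threads add their batch tallies to a shared counter with an atomic fetch-and-add instead of taking a mutex. This is an illustration of the idea only, not SKIRT code.

        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdio.h>

        #define NTHREADS 8
        #define PHOTONS_PER_THREAD 100000

        static _Atomic long absorbed = 0;     /* shared tally, no lock needed */

        static void *shoot(void *arg)
        {
            (void)arg;
            long local = 0;
            for (int i = 0; i < PHOTONS_PER_THREAD; i++)
                if (i % 3 == 0) local++;      /* stand-in for a photon's fate */
            atomic_fetch_add(&absorbed, local); /* one atomic op per batch */
            return NULL;
        }

        int main(void)
        {
            pthread_t th[NTHREADS];
            for (int i = 0; i < NTHREADS; i++)
                pthread_create(&th[i], NULL, shoot, NULL);
            for (int i = 0; i < NTHREADS; i++)
                pthread_join(th[i], NULL);
            printf("absorbed = %ld\n", atomic_load(&absorbed));
            return 0;
        }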

  15. EAP high-level product architecture

    DEFF Research Database (Denmark)

    Guðlaugsson, Tómas Vignir; Mortensen, Niels Henrik; Sarban, Rahimullah

    2013-01-01

    High level product architecture has been developed for the mechanical part of EAP transducers, as the foundation for platform development, supporting the design of EAP technology products while keeping complexity under control. A generic description of an EAP transducer forms the core of the high level product architecture; it allows the function of the EAP transducers to be changed by basing the EAP transducers on a different combination of organ alternatives. A model providing an overview of the high level product architecture has been developed to support daily development and cooperation across development teams. Initial results from applying the platform on demonstrator design for potential applications are promising. The scope of the article does not include technical details. © 2013 SPIE.

  16. Research on a Parallel Program Design Method Based on Distributed Objects

    Institute of Scientific and Technical Information of China (English)

    龚向坚; 邹腊梅; 马淑萍

    2011-01-01

    This paper studies the parallel implementation of distributed objects and its optimization, proposes a parallel programming method based on distributed objects, and puts forward a parallel programming model based on distributed objects. Using this method, the design and implementation of a virtual computer network experimental system is completed. The experimental results show that the virtual computer network experimental system has good parallelism and a moderate response time, demonstrating that the parallel programming method based on distributed objects is effective in improving the parallelism of microcomputer systems.

  17. Speeding Up the String Comparison of the IDS Snort using Parallel Programming: A Systematic Literature Review on the Parallelized Aho-Corasick Algorithm

    Directory of Open Access Journals (Sweden)

    SILVA JUNIOR,J. B.

    2016-12-01

    Full Text Available The Intrusion Detection System (IDS) needs to compare the contents of all packets arriving at the network interface with a set of signatures indicating possible attacks, a task that consumes much CPU processing time. In order to alleviate this problem, some researchers have tried to parallelize the IDS's comparison engine, transferring execution from the CPU to the GPU. This paper identifies and maps the parallelization features of the Aho-Corasick algorithm, which is used in Snort to compare patterns, in order to show this algorithm's implementation and execution issues, as well as optimization techniques for the Aho-Corasick machine. We found 147 papers in important computer science publication databases and mapped them; we selected 22 and analyzed them in order to obtain our results. Our analysis showed, among other results, that parallelization of the AC algorithm is a recent task and that authors have focused on the State Transition Table as the most common way to implement the algorithm on the GPU. Furthermore, techniques that speed up the algorithm and reduce the required storage space are widely used, such as running the automaton in the fastest memories and mechanisms for reducing the number of nodes and bit mapping.
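
    With the State Transition Table representation highlighted above, the matching phase is a single table lookup per input byte. The C sketch below hand-builds a tiny table for the single pattern "he" purely for illustration; real tables come from the Aho-Corasick goto/failure construction, and the GPU ports run many such scans in parallel.

        #include <stdio.h>

        #define NSTATES 3
        #define ALPHA 3                 /* 0:'h'  1:'e'  2:anything else */

        /* Dense state transition table for the single pattern "he",
           hand-built for illustration. */
        static const int stt[NSTATES][ALPHA] = {
            {1, 0, 0},                  /* state 0: start     */
            {1, 2, 0},                  /* state 1: seen "h"  */
            {1, 0, 0},                  /* state 2: seen "he" */
        };
        static const int is_final[NSTATES] = {0, 0, 1};

        static int encode(char c) { return c == 'h' ? 0 : c == 'e' ? 1 : 2; }

        int main(void)
        {
            const char *text = "she heard the echo";
            int state = 0;
            /* Matching phase: one table lookup per input character. */
            for (int i = 0; text[i]; i++) {
                state = stt[state][encode(text[i])];
                if (is_final[state])
                    printf("match of \"he\" ending at position %d\n", i);
            }
            return 0;
        }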

  18. Design strategies for irregularly adapting parallel applications

    Energy Technology Data Exchange (ETDEWEB)

    Oliker, Leonid; Biswas, Rupak; Shan, Hongzhang; Singh, Jaswinder Pal

    2000-11-01

    Achieving scalable performance for dynamic irregular applications is eminently challenging. Traditional message-passing approaches have been making steady progress towards this goal; however, they suffer from complex implementation requirements. The use of a global address space greatly simplifies the programming task, but can degrade the performance of dynamically adapting computations. In this work, we examine two major classes of adaptive applications, under five competing programming methodologies and four leading parallel architectures. Results indicate that it is possible to achieve message-passing performance using shared-memory programming techniques by carefully following the same high level strategies. Adaptive applications have computational work loads and communication patterns which change unpredictably at runtime, requiring dynamic load balancing to achieve scalable performance on parallel machines. Efficient parallel implementations of such adaptive applications are therefore a challenging task. This work examines the implementation of two typical adaptive applications, Dynamic Remeshing and N-Body, across various programming paradigms and architectural platforms. We compare several critical factors of the parallel code development, including performance, programmability, scalability, algorithmic development, and portability.

  19. PAIRWISE BLENDING OF HIGH LEVEL WASTE (HLW)

    Energy Technology Data Exchange (ETDEWEB)

    CERTA, P.J.

    2006-02-22

    The primary objective of this study is to demonstrate a mission scenario that uses pairwise and incidental blending of high level waste (HLW) to reduce the total mass of HLW glass. Secondary objectives include understanding how recent refinements to the tank waste inventory and solubility assumptions affect the mass of HLW glass and how logistical constraints may affect the efficacy of HLW blending.

  20. Algorithms and programming tools for image processing on the MPP, introduction. Thesis

    Science.gov (United States)

    1985-01-01

    The programming tools and parallel algorithms created for the Massively Parallel Processor (MPP) located at the NASA Goddard Space Center are discussed. A user-friendly environment for high level language parallel algorithm development was developed. The issues involved in implementing certain algorithms on the MPP were researched. The expected results were compared with the actual results.

  1. High-level binocular rivalry effects.

    Science.gov (United States)

    Wolf, Michal; Hochstein, Shaul

    2011-01-01

    Binocular rivalry (BR) occurs when the brain cannot fuse percepts from the two eyes because they are different. We review results relating to an ongoing controversy regarding the cortical site of the BR mechanism. Some BR qualities suggest it is low-level: (1) BR, as its name implies, is usually between eyes, and only low levels have access to utrocular information. (2) All input to one eye is suppressed: blurring doesn't stimulate accommodation; pupillary constrictions are reduced; probe detection is reduced. (3) Rivalry is affected by low-level attributes: contrast, spatial frequency, brightness, motion. (4) There is limited priming due to suppressed words or pictures. On the other hand, recent studies favor a high-level mechanism: (1) Rivalry occurs between patterns, not eyes, as in patchwork rivalry or a swapping paradigm. (2) Attention affects alternations. (3) Context affects dominance. There is conflicting evidence from physiological studies (single cell and fMRI) regarding cortical level(s) of conscious perception. We discuss the possibility of multiple BR sites and theoretical considerations that rule out this solution. We present new data regarding the locus of the BR switch by manipulating stimulus semantic content or high-level characteristics. Since these variations are represented at higher cortical levels, their affecting rivalry supports high-level BR intervention. In Experiment I, we measure rivalry when one eye views words and the other non-words and find significantly longer dominance durations for non-words. In Experiment II, we find longer dominance times for line drawings of simple, structurally impossible figures than for similar, possible objects. In Experiment III, we test the influence of idiomatic context on rivalry between words. Results show that generally words within their idiomatic context have longer mean dominance durations. We conclude that BR has high-level cortical influences, and may be controlled by a high-level mechanism.

  2. Center for Programming Models for Scalable Parallel Computing - Towards Enhancing OpenMP for Manycore and Heterogeneous Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Barbara Chapman

    2012-02-01

    OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years it has been gradually adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.

  3. Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

    Science.gov (United States)

    Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

    2010-12-01

    Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summaryProgram Title: PROFESS Catalogue identifier: AEBN_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 68 721 No. of bytes in distributed program, including test data, etc.: 1 708 547 Distribution format: tar.gz Programming language: Fortran 90 Computer

  4. The HST Frontier Fields: Complete High-Level Science Data Products for All 6 Clusters

    Science.gov (United States)

    Koekemoer, Anton M.; Mack, Jennifer; Lotz, Jennifer M.; Borncamp, David; Khandrika, Harish G.; Lucas, Ray A.; Martlin, Catherine; Porterfield, Blair; Sunnquist, Ben; Anderson, Jay; Avila, Roberto J.; Barker, Elizabeth A.; Grogin, Norman A.; Gunning, Heather C.; Hilbert, Bryan; Ogaz, Sara; Robberto, Massimo; Sembach, Kenneth; Flanagan, Kathryn; Mountain, Matt; HST Frontier Fields Team

    2017-01-01

    The Hubble Space Telescope Frontier Fields program (PI: J. Lotz) is a large Director's Discretionary program of 840 orbits, to obtain ultra-deep observations of six strong lensing clusters of galaxies, together with parallel deep blank fields, making use of the strong lensing amplification by these clusters of distant background galaxies to detect the faintest galaxies currently observable in the high-redshift universe. The entire program has now completed successfully for all 6 clusters, namely Abell 2744, Abell S1063, Abell 370, MACS J0416.1-2403, MACS J0717.5+3745 and MACS J1149.5+2223. Each of these was observed over two epochs, to a total depth of 140 orbits on the main cluster and an associated parallel field, obtaining images in ACS (F435W, F606W, F814W) and WFC3/IR (F105W, F125W, F140W, F160W) on both the main cluster and the parallel field in all cases. Full sets of high-level science products have been generated for all these clusters by the team at STScI, including cumulative-depth data releases during each epoch, as well as full-depth releases after the completion of each epoch. These products include all the full-depth distortion-corrected drizzled mosaics and associated products for each cluster, which are science-ready to facilitate the construction of lensing models as well as enabling a wide range of other science projects. Many improvements beyond default calibration for ACS and WFC3/IR are implemented in these data products, including corrections for persistence, time-variable sky, and low-level dark current residuals, as well as improvements in astrometric alignment to achieve milliarcsecond-level accuracy. The full set of resulting high-level science products and mosaics are publicly delivered to the community via the Mikulski Archive for Space Telescopes (MAST) to enable the widest scientific use of these data, as well as ensuring a public legacy dataset of the highest possible quality that is of lasting value to the entire community.

  5. Handbook of high-level radioactive waste transportation

    Energy Technology Data Exchange (ETDEWEB)

    Sattler, L.R.

    1992-10-01

    The High-Level Radioactive Waste Transportation Handbook serves as a reference to which state officials and members of the general public may turn for information on radioactive waste transportation and on the federal government's system for transporting this waste under the Civilian Radioactive Waste Management Program. The Handbook condenses and updates information contained in the Midwestern High-Level Radioactive Waste Transportation Primer. It is intended primarily to assist legislators who, in the future, may be called upon to enact legislation pertaining to the transportation of radioactive waste through their jurisdictions. The Handbook is divided into two sections. The first section places the federal government's program for transporting radioactive waste in context. It provides background information on nuclear waste production in the United States and traces the emergence of federal policy for disposing of radioactive waste. The second section covers the history of radioactive waste transportation; summarizes major pieces of legislation pertaining to the transportation of radioactive waste; and provides an overview of the radioactive waste transportation program developed by the US Department of Energy (DOE). To supplement this information, a summary of pertinent federal and state legislation and a glossary of terms are included as appendices, as is a list of publications produced by the Midwestern Office of The Council of State Governments (CSG-MW) as part of the Midwestern High-Level Radioactive Waste Transportation Project.

  6. Extending Java for High-Level Web Service Construction

    DEFF Research Database (Denmark)

    Christensen, Aske Simon; Møller, Anders; Schwartzbach, Michael Ignatieff

    2003-01-01

    We incorporate innovations from the <bigwig> project into the Java language to provide high-level features for Web service programming. The resulting language, JWIG, contains an advanced session model and a flexible mechanism for dynamic construction of XML documents, in particular XHTML. To support program development we provide a suite of program analyses that at compile time verify for a given program that no runtime errors can occur while building documents or receiving form input, and that all documents being shown are valid according to the document type definition for XHTML 1.0. We compare JWIG with Servlets and JSP, which are widely used Web service development platforms. Our implementation and evaluation of JWIG indicate that the language extensions can simplify the program structure and that the analyses are sufficiently fast and precise to be practically useful.

  7. MPI-based Parallel Programming and Implementation of the Knapsack Problem

    Institute of Scientific and Technical Information of China (English)

    张居晓

    2011-01-01

    MPI (Message Passing Interface) is one of the standards for message-passing parallel programming. This paper outlines the concept and composition of MPI, focuses on the Message Passing Interface support for parallel programming and on parallel program design methods in the MPI environment, and gives an MPI parallel programming example to illustrate the MPI program design flow and its relation to ordinary serial program design.
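
    As a concrete instance of the kind of example such a paper walks through, the C/MPI sketch below brute-forces a small 0/1 knapsack by giving each rank a contiguous slice of the 2^N subsets and combining the best values with MPI_Reduce. The item data and sizes are invented; this is an illustration, not the paper's program.

        #include <mpi.h>
        #include <stdio.h>

        #define N 20   /* items: 2^20 subsets, split across processes */

        int main(int argc, char **argv)
        {
            int w[N], v[N], rank, nprocs;
            const int capacity = 50;
            for (int i = 0; i < N; i++) {            /* synthetic item data */
                w[i] = (i * 7) % 13 + 1;
                v[i] = (i * 5) % 17 + 1;
            }

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            long nsub = 1L << N;                     /* subset space slice */
            long lo = rank * (nsub / nprocs);
            long hi = (rank == nprocs - 1) ? nsub : lo + nsub / nprocs;

            int best = 0;
            for (long m = lo; m < hi; m++) {         /* scan my slice */
                int wt = 0, val = 0;
                for (int i = 0; i < N; i++)
                    if (m & (1L << i)) { wt += w[i]; val += v[i]; }
                if (wt <= capacity && val > best) best = val;
            }

            int global = 0;                          /* combine with MAX */
            MPI_Reduce(&best, &global, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);
            if (rank == 0) printf("best value = %d\n", global);
            MPI_Finalize();
            return 0;
        }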

  8. Thread-Based Automatic Parallel Conversion Technique for Java Programs

    Institute of Scientific and Technical Information of China (English)

    刘英; 刘磊; 张乃孝

    2001-01-01

    The study of parallelism for Java programs is an important current research topic. This paper presents an automatic parallel conversion technique for Java programs which makes full use of the multithreading mechanism provided by the Java language itself: by means of operation conflict detection and related methods, serial Java programs are automatically transformed into parallel ones. With the support of a multiprocessor operating system, the transformed parallel programs can run on shared-memory multiprocessor systems, thus improving program efficiency.

  9. Determining the Number of Factors to Retain in EFA: An easy-to-use computer program for carrying out Parallel Analysis

    Directory of Open Access Journals (Sweden)

    Rubin Daniel Ledesma

    2007-02-01

    Full Text Available Parallel Analysis is a Monte Carlo simulation technique that aids researchers in determining the number of factors to retain in Principal Component and Exploratory Factor Analysis. This method provides a superior alternative to other techniques that are commonly used for the same purpose, such as the Scree test or Kaiser's eigenvalue-greater-than-one rule. Nevertheless, Parallel Analysis is not well known among researchers, in part because it is not included as an analysis option in the most popular statistical packages. This paper describes and illustrates how to apply Parallel Analysis with an easy-to-use computer program called ViSta-PARAN. ViSta-PARAN is a user-friendly application that can compute and interpret Parallel Analysis. Its user interface is fully graphic and includes a dialog box to specify parameters, and specialized graphics to visualize the analysis output.

  10. High-Level Waste Melter Study Report

    Energy Technology Data Exchange (ETDEWEB)

    Perez, Joseph M.; Bickford, Dennis F.; Day, Delbert E.; Kim, Dong-Sang; Lambert, Steven L.; Marra, Sharon L.; Peeler, David K.; Strachan, Denis M.; Triplett, Mark B.; Vienna, John D.; Wittman, Richard S.

    2001-07-13

    At the Hanford Site in Richland, Washington, the path to site cleanup involves vitrification of the majority of the wastes that currently reside in large underground tanks. A Joule-heated glass melter is the equipment of choice for vitrifying the high-level fraction of these wastes. Even though this technology has general national and international acceptance, opportunities may exist to improve or change the technology to reduce the enormous cost of accomplishing the mission of site cleanup. Consequently, the U.S. Department of Energy requested the staff of the Tanks Focus Area to review immobilization technologies, waste forms, and modifications to requirements for solidification of the high-level waste fraction at Hanford to determine what aspects could affect cost reductions with reasonable long-term risk. The results of this study are summarized in this report.

  11. Service-oriented high level architecture

    CERN Document Server

    Wang, Wenguang; Li, Qun; Wang, Weiping; Liu, Xichun

    2009-01-01

    Service-oriented High Level Architecture (SOHLA) refers to the High Level Architecture (HLA) enabled by techniques such as Service-Oriented Architecture (SOA) and Web Services, which support distributed interoperating services. Detailed comparisons between HLA and SOA are made to illustrate the importance of their combination. Several key enhancements and changes of the HLA Evolved Web Service API are then introduced in comparison with the native APIs, covering the Federation Development and Execution Process, communication mechanisms, data encoding, session handling, the testing environment and performance analysis. Approaches to Web-enabling HLA are summarized at the communication layer, the HLA interface specification layer, the federate interface layer and the application layer. Finally, open problems in current research are discussed and future directions are pointed out.

  13. A Parallel and Concurrent Implementation of Lin-Kernighan Heuristic (LKH-2 for Solving Traveling Salesman Problem for Multi-Core Processors using SPC3 Programming Model

    Directory of Open Access Journals (Sweden)

    Muhammad Ali Ismail

    2011-08-01

    With the arrival of multi-core processors, every processor now has built-in parallel computational power, which can be fully utilized only if the program in execution is written accordingly. This study is part of on-going research on the design of a new parallel programming model for multi-core processors. In this paper we present a combined parallel and concurrent implementation of the Lin-Kernighan heuristic (LKH-2) for solving the Traveling Salesman Problem (TSP) using a newly developed parallel programming model, SPC3 PM, for general-purpose multi-core processors. This implementation is found to be very simple, highly efficient and scalable, and less time consuming compared with the existing serial LKH-2 implementations in a multi-core processing environment. We have tested our parallel implementation of LKH-2 with medium and large TSP instances from TSPLIB. For all these tests our proposed approach has shown much improved performance and scalability.

  14. High-level radioactive wastes. Supplement 1

    Energy Technology Data Exchange (ETDEWEB)

    McLaren, L.H. (ed.)

    1984-09-01

    This bibliography contains information on high-level radioactive wastes included in the Department of Energy's Energy Data Base from August 1982 through December 1983. These citations are to research reports, journal articles, books, patents, theses, and conference papers from worldwide sources. Five indexes, each preceded by a brief description, are provided: Corporate Author, Personal Author, Subject, Contract Number, and Report Number. 1452 citations.

  15. Performance Evaluation of Parallel Programs in an xVM Virtual Environment

    Institute of Scientific and Technical Information of China (English)

    陈剑

    2012-01-01

    This paper introduces speedup as the basic performance metric for parallel programs and analyzes the performance differences of parallel programs running in guest virtual machines of different types and numbers. The experiments show that the performance of MPI parallel programs in an xVM virtualized environment is close to that in a native, non-virtualized environment, and that the performance of parallel programs in a para-virtualized environment exceeds that in a full-virtualized environment.
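
    The speedup metric used in this record has the standard definition

        S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p},

    where T_1 is the run time of the serial program, T_p the run time on p processors (here, the virtual CPUs of the guest machines), and E(p) the parallel efficiency.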

  16. VISUAL-ORIENTED PARALLEL PROGRAMMING BASED ON STM

    Institute of Scientific and Technical Information of China (English)

    王力生; 黄鹏

    2012-01-01

    The coding process of parallel programming is complicated by the need to consider issues such as inter-process synchronization. Visual parallel programming provides programmers with graphical programming templates and skeletons for designing parallel programs, which reduces the difficulty of parallel program design to a certain extent. This paper first studies the software transactional memory (STM) model, which, compared with conventional parallel programming approaches, offers a simple and flexible interface and strong scalability. The STM model is then applied to visual programming, with its programming interfaces provided to programmers in the form of UML activity diagrams. The approach no longer relies on a specific software or hardware environment, which improves the universality and scalability of visual parallel programming.
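
    To make the STM idea concrete, the sketch below reduces the optimistic read-validate-commit cycle to a single shared word updated with compare-and-swap. Real STM systems track whole read and write sets; this toy and its names are ours, not the paper's system.

    // stm_sketch.cpp -- minimal illustration of the optimistic retry-on-conflict
    // cycle behind software transactional memory, on one shared counter.
    #include <atomic>
    #include <thread>
    #include <vector>
    #include <cstdio>

    std::atomic<long> account{1000};

    void deposit(long amount) {
        long expected = account.load();
        // "Transaction": compute against a snapshot, commit only if no other
        // thread changed the value in the meantime; otherwise retry.
        while (!account.compare_exchange_weak(expected, expected + amount)) {
            // on failure, expected is refreshed with the current value
        }
    }

    int main() {
        std::vector<std::thread> pool;
        for (int t = 0; t < 8; ++t)
            pool.emplace_back([] { for (int i = 0; i < 10000; ++i) deposit(1); });
        for (auto& th : pool) th.join();
        std::printf("balance = %ld (expected 81000)\n", account.load());
        return 0;
    }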

  17. High Level Trigger System for the ALICE Experiment

    Institute of Scientific and Technical Information of China (English)

    U. Frankenfeld; H. Helstrup; et al.

    2001-01-01

    The ALICE experiment [1] at the Large Hadron Collider (LHC) at CERN will detect up to 20,000 particles in a single Pb-Pb event, resulting in a data rate of ~75 MByte/event. The event rate is limited by the bandwidth of the data storage system. Higher rates are possible by selecting interesting events and subevents (High Level Trigger) or by compressing the data efficiently with modeling techniques. Both require fast parallel pattern recognition. One possible solution to process the detector data at such rates is a farm of clustered SMP nodes, based on off-the-shelf PCs, connected by a high bandwidth, low latency network.

  18. Mammut: High-level management of system knobs and sensors

    Science.gov (United States)

    De Sensi, Daniele; Torquati, Massimo; Danelutto, Marco

    Managing low-level architectural features for controlling performance and power consumption is a growing demand in the parallel computing community. Such features include, but are not limited to: energy profiling, platform topology analysis, CPU core disabling and frequency scaling. However, these low-level mechanisms are usually managed by specific tools, without any interaction between each other, thus hampering their usability. More importantly, most existing tools can only be used through a command line interface and do not provide any API. Moreover, in most cases, they only allow monitoring and managing the same machine on which the tools are used. MAMMUT provides and integrates architectural management utilities through a high-level and easy-to-use object-oriented interface. By using MAMMUT, it is possible to link together different pieces of collected information and to exploit them on both local and remote systems, to build architecture-aware applications.

  19. Portable parallel portfolio optimization in the Aurora Financial Management System

    Science.gov (United States)

    Laure, Erwin; Moritsch, Hans

    2001-07-01

    Financial planning problems are formulated as large scale, stochastic, multiperiod, tree structured optimization problems. An efficient technique for solving this kind of problems is the nested Benders decomposition method. In this paper we present a parallel, portable, asynchronous implementation of this technique. To achieve our portability goals we elected the programming language Java for our implementation and used a high level Java based framework, called OpusJava, for expressing the parallelism potential as well as synchronization constraints. Our implementation is embedded within a modular decision support tool for portfolio and asset liability management, the Aurora Financial Management System.

  20. Parallelism in matrix computations

    CERN Document Server

    Gallopoulos, Efstratios; Sameh, Ahmed H

    2016-01-01

    This book is primarily intended as a research monograph that could also be used in graduate courses for the design of parallel algorithms in matrix computations. It assumes general but not extensive knowledge of numerical linear algebra, parallel architectures, and parallel programming paradigms. The book consists of four parts: (I) Basics; (II) Dense and Special Matrix Computations; (III) Sparse Matrix Computations; and (IV) Matrix functions and characteristics. Part I deals with parallel programming paradigms and fundamental kernels, including reordering schemes for sparse matrices. Part II is devoted to dense matrix computations such as parallel algorithms for solving linear systems, linear least squares, the symmetric algebraic eigenvalue problem, and the singular-value decomposition. It also deals with the development of parallel algorithms for special linear systems such as banded, Vandermonde, Toeplitz, and block Toeplitz systems. Part III addresses sparse matrix computations: (a) the development of pa...

  1. Horde: A framework for parallel programming on multi-core clusters

    Institute of Scientific and Technical Information of China (English)

    薛巍; 张凯; 陈康

    2011-01-01

    Parallel programming utilizes the capacity of parallel hardware to improve program performance. However, parallel applications are difficult to program on multi-core clusters. This paper presents Horde, a framework for parallel programming on multi-core clusters. The framework provides a set of easy-to-use message-passing interfaces and an event-driven programming model, which helps programmers express the parallelism latent in their application logic and decouples the computational decomposition strategy from the hardware architecture. Horde thus releases programmers from the difficulties of building complex parallel programs and maps flexibly onto clusters with different underlying architectures while maintaining good performance. Horde also provides task-object migration, a key technology for dynamic load balancing and fault tolerance. Tests on a 128-core cluster demonstrate that Horde enables high-performance parallel programs as well as effective task-object migration.
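
    Horde's API is not reproduced in this record, so the following miniature (all names invented) only illustrates the event-driven, message-passing style it describes: handlers are registered per message tag and a dispatch loop drains an inbox.

    // event_loop_sketch.cpp -- hypothetical miniature of an event-driven
    // message-passing model; not the Horde framework itself.
    #include <functional>
    #include <map>
    #include <queue>
    #include <string>
    #include <cstdio>

    struct Message { std::string tag; int payload; };

    class EventLoop {
        std::map<std::string, std::function<void(const Message&)>> handlers;
        std::queue<Message> inbox;
    public:
        void on(const std::string& tag, std::function<void(const Message&)> h) {
            handlers[tag] = std::move(h);
        }
        void send(Message m) { inbox.push(std::move(m)); }
        void run() {                       // dispatch until the inbox drains
            while (!inbox.empty()) {
                Message m = inbox.front(); inbox.pop();
                auto it = handlers.find(m.tag);
                if (it != handlers.end()) it->second(m);
            }
        }
    };

    int main() {
        EventLoop loop;
        loop.on("work", [](const Message& m) {
            std::printf("handling work item %d\n", m.payload);
        });
        loop.send({"work", 1});
        loop.send({"work", 2});
        loop.run();
        return 0;
    }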

  2. The ALICE Dimuon Spectrometer High Level Trigger

    CERN Document Server

    Becker, B; Cicalo, Corrado; Das, Indranil; de Vaux, Gareth; Fearick, Roger; Lindenstruth, Volker; Marras, Davide; Sanyal, Abhijit; Siddhanta, Sabyasachi; Staley, Florent; Steinbeck, Timm; Szostak, Artur; Usai, Gianluca; Vilakazi, Zeblon

    2009-01-01

    The ALICE Dimuon Spectrometer High Level Trigger (dHLT) is an on-line processing stage whose primary function is to select interesting events that contain distinct physics signals from heavy resonance decays such as J/psi and Gamma particles, amidst unwanted background events. It forms part of the High Level Trigger of the ALICE experiment, whose goal is to reduce the large data rate of about 25 GB/s from the ALICE detectors by an order of magnitude, without losing interesting physics events. The dHLT has been implemented as a software trigger within a high performance and fault tolerant data transportation framework, which is run on a large cluster of commodity compute nodes. To reach the required processing speeds, the system is built as a concurrent system with a hierarchy of processing steps. The main algorithms perform partial event reconstruction, starting with hit reconstruction on the level of the raw data received from the spectrometer. Then a tracking algorithm finds track candidates from the recon...

  3. Commissioning of the CMS High Level Trigger

    CERN Document Server

    Agostino, Lorenzo; Beccati, Barbara; Behrens, Ulf; Berryhil, Jeffrey; Biery, Kurt; Bose, Tulika; Brett, Angela; Branson, James; Cano, Eric; Cheung, Harry; Ciganek, Marek; Cittolin, Sergio; Coarasa, Jose Antonio; Dahmes, Bryan; Deldicque, Christian; Dusinberre, Elizabeth; Erhan, Samim; Gigi, Dominique; Glege, Frank; Gomez-Reino, Robert; Gutleber, Johannes; Hatton, Derek; Laurens, Jean-Francois; Loizides, Constantin; Ma, Frank; Meijers, Frans; Meschi, Emilio; Meyer, Andreas; Mommsen, Remigius K; Moser, Roland; O'Dell, Vivian; Oh, Alexander; Orsini, Luciano; Patras, Vaios; Paus, Christoph; Petrucci, Andrea; Pieri, Marco; Racz, Attila; Sakulin, Hannes; Sani, Matteo; Schieferdeckerd, Philipp; Schwick, Christoph; Serrano Margaleff, Josep Francesc; Shpakov, Dennis; Simon, Sean; Sumorok, Konstanty; Sungho Yoon, Andre; Wittich, Peter; Zanetti, Marco

    2009-01-01

    The CMS experiment will collect data from the proton-proton collisions delivered by the Large Hadron Collider (LHC) at a centre-of-mass energy up to 14 TeV. The CMS trigger system is designed to cope with unprecedented luminosities and LHC bunch-crossing rates up to 40 MHz. The unique CMS trigger architecture only employs two trigger levels. The Level-1 trigger is implemented using custom electronics, while the High Level Trigger (HLT) is based on software algorithms running on a large cluster of commercial processors, the Event Filter Farm. We present the major functionalities of the CMS High Level Trigger system as of the start of LHC beam operations in September 2008. The validation of the HLT system in the online environment with Monte Carlo simulated data and its commissioning during cosmic ray data taking campaigns are discussed in detail. We conclude with a description of the HLT operations with the first circulating LHC beams before the incident that occurred on 19 September 2008.

  4. Commissioning of the CMS High Level Trigger

    Energy Technology Data Exchange (ETDEWEB)

    Agostino, Lorenzo; et al.

    2009-08-01

    The CMS experiment will collect data from the proton-proton collisions delivered by the Large Hadron Collider (LHC) at a centre-of-mass energy up to 14 TeV. The CMS trigger system is designed to cope with unprecedented luminosities and LHC bunch-crossing rates up to 40 MHz. The unique CMS trigger architecture only employs two trigger levels. The Level-1 trigger is implemented using custom electronics, while the High Level Trigger (HLT) is based on software algorithms running on a large cluster of commercial processors, the Event Filter Farm. We present the major functionalities of the CMS High Level Trigger system as of the start of LHC beam operations in September 2008. The validation of the HLT system in the online environment with Monte Carlo simulated data and its commissioning during cosmic ray data taking campaigns are discussed in detail. We conclude with a description of the HLT operations with the first circulating LHC beams before the incident that occurred on 19 September 2008.

  6. Revisiting FPGA Acceleration of Molecular Dynamics Simulation with Dynamic Data Flow Behavior in High-Level Synthesis

    CERN Document Server

    Cong, Jason; Kianinejad, Hassan; Wei, Peng

    2016-01-01

    Molecular dynamics (MD) simulation is one of the past decade's most important tools for enabling biology scientists and researchers to explore human health and diseases. However, due to the computational complexity of the MD algorithm, it takes weeks or even months to simulate a comparatively simple biological entity on conventional multicore processors. The critical path in molecular dynamics simulations is the force calculation between particles inside the simulated environment, which has abundant parallelism. Among various acceleration platforms, the FPGA is an attractive alternative because of its low power and high energy efficiency. However, due to its high programming cost using RTL, none of the mainstream MD software packages has yet adopted FPGAs for acceleration. In this paper we revisit FPGA acceleration of MD in high-level synthesis (HLS) so as to provide affordable programming cost. Our experience with the MD acceleration demonstrates that HLS optimizations such as loop pipelining, module duplication a...
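
    The HLS optimizations named in this record are expressed as source-level directives. The sketch below shows the flavor with Vivado/Vitis HLS-style pragmas on an invented pairwise kernel; the kernel shape and all names are illustrative, not taken from the paper.

    // force_kernel.cpp -- sketch of HLS loop pipelining/unrolling directives on
    // a stand-in pairwise kernel; unknown pragmas are ignored by ordinary
    // compilers, so this also builds as plain C++.
    #include <cstdio>
    #define N 256

    void force_kernel(const float x[N], const float y[N], float f[N]) {
        for (int i = 0; i < N; ++i) {
            float acc = 0.0f;
            for (int j = 0; j < N; ++j) {
    #pragma HLS PIPELINE II=1
    #pragma HLS UNROLL factor=4
                float d = x[i] - y[j];
                acc += d * d;            // stand-in for a pairwise force term
            }
            f[i] = acc;
        }
    }

    // Trivial driver so the kernel can also be exercised in C simulation.
    int main() {
        static float x[N], y[N], f[N];
        for (int i = 0; i < N; ++i) { x[i] = (float)i; y[i] = (float)(N - i); }
        force_kernel(x, y, f);
        std::printf("f[0] = %f\n", f[0]);
        return 0;
    }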

  7. Algorithms and parallel computing

    CERN Document Server

    Gebali, Fayez

    2011-01-01

    There is a software gap between the hardware potential and the performance that can be attained using today's software parallel program development tools. The tools need manual intervention by the programmer to parallelize the code. Programming a parallel computer requires closely studying the target algorithm or application, more so than in the traditional sequential programming we have all learned. The programmer must be aware of the communication and data dependencies of the algorithm or application. This book provides the techniques to explore the possible ways to

  8. ADAPTATION OF PARALLEL VIRTUAL MACHINES MECHANISMS TO PARALLEL SYSTEMS

    Directory of Open Access Journals (Sweden)

    Zafer DEMİR

    2001-02-01

    In this study, the Parallel Virtual Machine (PVM) is first reviewed. Since it is based upon parallel processing, it is similar in principle to parallel systems in terms of architecture. PVM is neither an operating system nor a programming language; it is a specific software tool that supports heterogeneous parallel systems and takes advantage of the features of both to bring users close to parallel systems. Since tasks can be executed in parallel on parallel systems by PVM, there is an important similarity between PVM, distributed systems and multiple processors. In this study, the relations in question are examined by making use of the master-slave programming technique. In conclusion, PVM is tested with a simple factorial computation on a distributed system to observe its adaptation to parallel architectures.
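
    PVM itself is rarely installed today, so the following master-slave sketch uses MPI as a stand-in for the PVM send/receive calls; the structure (the master farms out one factorial task per slave, the slaves compute and reply) mirrors the experiment described in this record. Task inputs and names are ours; run with at least two ranks.

    // master_slave.cpp -- master/slave factorial sketch in the spirit of the
    // record above, written against MPI rather than the original PVM calls.
    #include <mpi.h>
    #include <cstdio>

    long factorial(int n) { long f = 1; for (int i = 2; i <= n; ++i) f *= i; return f; }

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                       // master: send tasks, gather results
            for (int s = 1; s < size; ++s) {
                int task = 5 + s;              // arbitrary input per slave
                MPI_Send(&task, 1, MPI_INT, s, 0, MPI_COMM_WORLD);
            }
            for (int s = 1; s < size; ++s) {
                long result;
                MPI_Recv(&result, 1, MPI_LONG, s, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("slave %d returned %ld\n", s, result);
            }
        } else {                               // slave: receive, compute, reply
            int task;
            MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            long result = factorial(task);
            MPI_Send(&result, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }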

  9. ParaHaplo 3.0: A program package for imputation and a haplotype-based whole-genome association study using hybrid parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2011-05-01

    Abstract Background The use of missing-genotype imputation and haplotype reconstruction is valuable in genome-wide association studies (GWASs). By modeling the patterns of linkage disequilibrium in a reference panel, genotypes not directly measured in the study samples can be imputed and used for GWASs. Since millions of single nucleotide polymorphisms need to be imputed in a GWAS, faster methods for genotype imputation and haplotype reconstruction are required. Results We developed a program package for parallel computation of genotype imputation and haplotype reconstruction. Our program package, ParaHaplo 3.0, is intended for use in workstation clusters using the Intel Message Passing Interface. We compared its performance on the Japanese in Tokyo, Japan and Han Chinese in Beijing, China samples of the HapMap dataset. The parallel version of ParaHaplo 3.0 can conduct genotype imputation 20 times faster than the non-parallel version of ParaHaplo. Conclusions ParaHaplo 3.0 is an invaluable tool for conducting haplotype-based GWASs. The need for faster genotype imputation and haplotype reconstruction using parallel computing will become increasingly important as the data sizes of such projects continue to increase. ParaHaplo executable binaries and program sources are available at http://en.sourceforge.jp/projects/parallelgwas/releases/.

  10. ParaHaplo 2.0: a program package for haplotype-estimation and haplotype-based whole-genome association study using parallel computing

    Directory of Open Access Journals (Sweden)

    Kamatani Naoyuki

    2010-06-01

    Abstract Background The use of haplotype-based association tests can improve the power of genome-wide association studies. Since the observed genotypes are unordered pairs of alleles, haplotype phase must be inferred; however, estimating haplotype phase is time consuming. When millions of single-nucleotide polymorphisms (SNPs) are analyzed in a genome-wide association study, faster methods for haplotype estimation are required. Methods We developed a program package for parallel computation of haplotype estimation. Our program package, ParaHaplo 2.0, is intended for use in workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on both the Japanese in Tokyo, Japan and Han Chinese in Beijing, China samples of the HapMap dataset. Results The parallel version of ParaHaplo 2.0 can estimate haplotypes 100 times faster than the non-parallel version of ParaHaplo. Conclusion ParaHaplo 2.0 is an invaluable tool for conducting haplotype-based genome-wide association studies (GWAS). The need for fast haplotype estimation using parallel computing will become increasingly important as the data sizes of such projects continue to increase. The executable binaries and program sources of ParaHaplo are available at the following address: http://en.sourceforge.jp/projects/parallelgwas/releases/

  11. Optimization analysis of parallel codes for single program multiple data

    Institute of Scientific and Technical Information of China (English)

    胡悦; 童维勤

    2014-01-01

    In developing parallel programs, effective optimization is needed to fully exploit the execution efficiency of the hardware and software. This paper re-examines Amdahl's law and, on that basis, analyzes how the computing time and parallel efficiency of SPMD (single program multiple data) parallel codes for data-intensive problems change after optimization. The trends are demonstrated by equations, so that Amdahl's law can be used to best guide parallel program optimization. The experimental results indicate the effectiveness of the analysis.
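
    For reference, Amdahl's law, on which this record's analysis builds, bounds the speedup of a program whose parallelizable fraction is f when run on p processors:

        S(p) = \frac{1}{(1 - f) + f/p}, \qquad \lim_{p \to \infty} S(p) = \frac{1}{1 - f}.

    Optimizing the serial fraction therefore raises the ceiling on achievable speedup, which is the lever the paper's analysis quantifies.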

  12. The science of computing - The evolution of parallel processing

    Science.gov (United States)

    Denning, P. J.

    1985-01-01

    The present paper is concerned with the approaches to be employed to overcome the set of limitations in software technology which currently impedes an effective use of parallel hardware technology. The process required to solve the arising problems is found to involve four different stages. At the present time, Stage One is nearly finished, while Stage Two is under way. Tentative explorations are beginning on Stage Three, and Stage Four is more distant. In Stage One, parallelism is introduced into the hardware of a single computer, which consists of one or more processors, a main storage system, a secondary storage system, and various peripheral devices. In Stage Two, parallel execution of cooperating programs on different machines becomes explicit, while in Stage Three, new languages will make parallelism implicit. In Stage Four, there will be very high level user interfaces capable of interacting with scientists at the same level of abstraction as scientists do with each other.

  13. Research and Development on Parallel Programming Models

    Institute of Scientific and Technical Information of China (English)

    董仁举; 祝永志

    2011-01-01

    Parallel programming models play a very important role in distributed computing. With the increasing demand for high-performance computing and the appearance of many new technologies, parallel programming models are continually being explored and improved. Two major parallel programming models are first compared in detail. Based on their advantages and disadvantages, the scope of application and advantages of a two-level parallel model are studied. Finally, in view of new hardware developments, TBB+MPI is proposed as a future direction for parallel programming models, and a matrix multiplication algorithm is implemented on a CMP-based cluster system. The experimental results show that the new model improves performance, so the TBB+MPI model is better suited to multi-core clusters.
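
    The two-level TBB+MPI structure described above can be sketched as follows for the matrix multiplication example: MPI distributes row blocks across nodes while tbb::parallel_for spreads each block over a node's cores. Sizes and data here are illustrative, not the paper's benchmark.

    // tbb_mpi_matmul.cpp -- sketch of a two-level TBB+MPI matrix multiplication.
    #include <mpi.h>
    #include <tbb/parallel_for.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 256;                       // assume n % size == 0
        const int rows = n / size;
        std::vector<double> A(rows * n, 1.0);    // this rank's row block of A
        std::vector<double> B(n * n, 1.0);       // B replicated on every rank
        std::vector<double> C(rows * n, 0.0);

        tbb::parallel_for(0, rows, [&](int i) {  // intra-node parallelism
            for (int j = 0; j < n; ++j) {
                double s = 0.0;
                for (int k = 0; k < n; ++k) s += A[i * n + k] * B[k * n + j];
                C[i * n + j] = s;
            }
        });

        std::vector<double> full;                // inter-node combination
        if (rank == 0) full.resize(n * n);
        MPI_Gather(C.data(), rows * n, MPI_DOUBLE,
                   full.data(), rows * n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("C[0][0] = %f\n", full[0]);
        MPI_Finalize();
        return 0;
    }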

  14. Intergenerational ethics of high level radioactive waste

    Energy Technology Data Exchange (ETDEWEB)

    Takeda, Kunihiko [Nagoya Univ., Graduate School of Engineering, Nagoya, Aichi (Japan)]; Nasu, Akiko; Maruyama, Yoshihiro [Shibaura Inst. of Tech., Tokyo (Japan)]

    2003-03-01

    The validity of intergenerational ethics concerning the geological disposal of high-level radioactive waste from nuclear power plants was studied. The study of geological disposal technology showed that the current disposal method can be judged scientifically reliable for several hundred years, and that the radioactivity level will be less than one tenth of the tolerable amount after 1,000 years or more. This implies that consideration of the intergenerational ethics of geological disposal is meaningless. The ethics developed in Western society state that the consent of people in the future is necessary if the disposal has an influence on them. Moreover, these ethics depend on generally accepted ideas in Western society and on preconceptions based on racism and sexism. The irrationality becomes clearer by comparing the dangers of the exhaustion of natural resources and of pollution from harmful substances in a recycling society. (author)

  15. Reliability-Centric High-Level Synthesis

    CERN Document Server

    Tosun, S; Arvas, E; Kandemir, M; Xie, Yuan

    2011-01-01

    Importance of addressing soft errors in both safety critical applications and commercial consumer products is increasing, mainly due to ever shrinking geometries, higher-density circuits, and employment of power-saving techniques such as voltage scaling and component shut-down. As a result, it is becoming necessary to treat reliability as a first-class citizen in system design. In particular, reliability decisions taken early in system design can have significant benefits in terms of design quality. Motivated by this observation, this paper presents a reliability-centric high-level synthesis approach that addresses the soft error problem. The proposed approach tries to maximize reliability of the design while observing the bounds on area and performance, and makes use of our reliability characterization of hardware components such as adders and multipliers. We implemented the proposed approach, performed experiments with several designs, and compared the results with those obtained by a prior proposal.

  16. Tracking at High Level Trigger in CMS

    CERN Document Server

    Tosi, Mia

    2014-01-01

    A reduction of several orders of magnitude of the event rate is needed to reach values compatible with detector readout, offline storage and analysis capability. The CMS experiment has been designed with a two-level trigger system: the Level-1 Trigger (L1T), implemented on custom-designed electronics, and the High Level Trigger (HLT), a streamlined version of the CMS offline reconstruction software running on a computer farm. A software trigger system requires a trade-off between the complexity of the algorithms, the sustainable output rate, and the selection efficiency. With the computing power available during the 2012 data taking the maximum reconstruction time at HLT was about 200 ms per event, at the nominal L1T rate of 100 kHz. Track reconstruction algorithms are widely used in the HLT, for the reconstruction of the physics objects as well as in the identification of b-jets and lepton iso...

  17. Progress and future direction for the interim safe storage and disposal of Hanford high-level waste

    Energy Technology Data Exchange (ETDEWEB)

    Kinzer, J.E.; Wodrich, D.D. [Dept. of Energy, Richland, WA (United States)]; Bacon, R.F. [Westinghouse Hanford Company, Richland, WA (United States)]; [and others]

    1996-12-31

    This paper describes the progress made at the largest environmental cleanup program in the United States. The U.S. Department of Energy (DOE) and its Hanford Site contractors have realized substantial advances in methods to start interim safe storage of Hanford Site high-level wastes, in waste characterization to support both safety- and disposal-related information needs, and in proceeding with cost-effective disposal. Challenges facing the Tank Waste Remediation System (TWRS) Program, which is charged with the dual and parallel missions of interim safe storage and disposal of the high-level tank waste stored at the Hanford Site, are described. In these times of budget austerity, implementing an ongoing program that combines technical excellence and cost effectiveness is the near-term challenge. The technical initiatives and progress described in this paper are made more cost effective by DOE's focus on work force productivity improvement, reduction of overhead costs, and reduction, integration and simplification of DOE regulations and operations requirements to more closely model those used in the private sector.

  18. Parallel implementation of electronic structure energy, gradient, and Hessian calculations.

    Science.gov (United States)

    Lotrich, V; Flocke, N; Ponton, M; Yau, A D; Perera, A; Deumens, E; Bartlett, R J

    2008-05-21

    ACES III is a newly written program in which the computationally demanding components of the computational chemistry code ACES II [J. F. Stanton et al., Int. J. Quantum Chem. 526, 879 (1992); ACES II program system, University of Florida, 1994] have been redesigned and implemented in parallel. The high-level algorithms include Hartree-Fock (HF) self-consistent field (SCF), second-order many-body perturbation theory [MBPT(2)] energy, gradient, and Hessian, and coupled cluster singles, doubles, and perturbative triples [CCSD(T)] energy and gradient. For SCF, MBPT(2), and CCSD(T), both restricted HF and unrestricted HF reference wave functions are available. For MBPT(2) gradients and Hessians, a restricted open-shell HF reference is also supported. The methods are programmed in a special language designed for the parallelization project. The language is called super instruction assembly language (SIAL). The design uses an extreme form of object-oriented programming. All compute intensive operations, such as tensor contractions and diagonalizations, all communication operations, and all input-output operations are handled by a parallel program written in C and FORTRAN 77. This parallel program, called the super instruction processor (SIP), interprets and executes the SIAL program. By separating the algorithmic complexity (in SIAL) from the complexities of execution on computer hardware (in SIP), a software system is created that allows for very effective optimization and tuning on different hardware architectures with quite manageable effort.

  20. High level cognitive information processing in neural networks

    Science.gov (United States)

    Barnden, John A.; Fields, Christopher A.

    1992-01-01

    Two related research efforts were addressed: (1) high-level connectionist cognitive modeling; and (2) local neural circuit modeling. The goals of the first effort were to develop connectionist models of high-level cognitive processes such as problem solving or natural language understanding, and to understand the computational requirements of such models. The goals of the second effort were to develop biologically-realistic models of local neural circuits, and to understand the computational behavior of such models. In keeping with the nature of NASA's Innovative Research Program, all the work conducted under the grant was highly innovative. For instance, the following ideas, all summarized, are contributions to the study of connectionist/neural networks: (1) the temporal-winner-take-all, relative-position encoding, and pattern-similarity association techniques; (2) the importation of logical combinators into connectionism; (3) the use of analogy-based reasoning as a bridge across the gap between the traditional symbolic paradigm and the connectionist paradigm; and (4) the application of connectionism to the domain of belief representation/reasoning. The work on local neural circuit modeling also departs significantly from the work of related researchers. In particular, its concentration on low-level neural phenomena that could support high-level cognitive processing is unusual within the area of biological local circuit modeling, and also serves to expand the horizons of the artificial neural net field.

  1. Leveraging Parallel Data Processing Frameworks with Verified Lifting

    Directory of Open Access Journals (Sweden)

    Maaz Bin Safeer Ahmad

    2016-11-01

    Many parallel data frameworks have been proposed in recent years that let sequential programs access parallel processing. To capitalize on the benefits of such frameworks, existing code must often be rewritten to the domain-specific languages that each framework supports. This rewriting, which is tedious and error-prone, also requires developers to choose the framework that best optimizes performance given a specific workload. This paper describes Casper, a novel compiler that automatically retargets sequential Java code for execution on Hadoop, a parallel data processing framework that implements the MapReduce paradigm. Given a sequential code fragment, Casper uses verified lifting to infer a high-level summary expressed in our program specification language that is then compiled for execution on Hadoop. We demonstrate that Casper automatically translates Java benchmarks into Hadoop. The translated results execute on average 3.3x faster than the sequential implementations and also scale better to larger datasets.
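
    The essence of the lifting step is recognizing that a sequential fold computes a map followed by an associative reduction, which is what makes it executable under MapReduce. The hand-written analogue below (not output of the Casper tool, and in C++ rather than Java for consistency with the other examples in this listing) shows the two equivalent forms.

    // lifting_sketch.cpp -- a sequential fold and its map+reduce summary.
    #include <vector>
    #include <numeric>
    #include <functional>
    #include <cstdio>

    int main() {
        std::vector<int> words_per_doc{3, 5, 2, 8};

        // Sequential fragment, as a developer might write it:
        long total_seq = 0;
        for (int w : words_per_doc) total_seq += w;

        // Lifted summary: map(identity) then reduce(+). Because + is associative
        // and commutative, the reduce can run over parallel partitions.
        long total_mr = std::transform_reduce(words_per_doc.begin(), words_per_doc.end(),
                                              0L, std::plus<>{},
                                              [](int w) { return (long)w; });

        std::printf("%ld == %ld\n", total_seq, total_mr);
        return 0;
    }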

  2. High-Level Operations in Nonprocedural Programming Languages.

    Science.gov (United States)

    1983-12-01

    defining type specific operators. Abstract data types have been applied to the nonprocedural language NOPAL [Sang80]. It was primarily used as a tool...

  3. Work stealing for GPU-accelerated parallel programs in a global address space framework

    Energy Technology Data Exchange (ETDEWEB)

    Arafat, Humayun [Department of Computer Science and Engineering, The Ohio State University, Columbus OH USA]; Dinan, James [Mathematics and Computer Science Division, Argonne National Laboratory, Lemont IL USA]; Krishnamoorthy, Sriram [Computer Science and Mathematics Division, Pacific Northwest National Laboratory, Richland WA USA]; Balaji, Pavan [Mathematics and Computer Science Division, Argonne National Laboratory, Lemont IL USA]; Sadayappan, P. [Department of Computer Science and Engineering, The Ohio State University, Columbus OH USA]

    2016-01-06

    Task parallelism is an attractive approach to automatically load balance the computation in a parallel system and adapt to dynamism exhibited by parallel systems. Exploiting task parallelism through work stealing has been extensively studied in shared and distributed-memory contexts. In this paper, we study the design of a system that uses work stealing for dynamic load balancing of task-parallel programs executed on hybrid distributed-memory CPU-graphics processing unit (GPU) systems in a global-address space framework. We take into account the unique nature of the accelerator model employed by GPUs, the significant performance difference between GPU and CPU execution as a function of problem size, and the distinct CPU and GPU memory domains. We consider various alternatives in designing a distributed work stealing algorithm for CPU-GPU systems, while taking into account the impact of task distribution and data movement overheads. These strategies are evaluated using microbenchmarks that capture various execution configurations as well as the state-of-the-art CCSD(T) application module from the computational chemistry domain.
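
    As a deliberately small, CPU-only illustration of the stealing mechanism (none of the paper's GPU or global-address-space machinery; the task counts and names below are ours), the sketch gives each worker its own deque: work is popped from the back locally and stolen from the front when a worker runs dry.

    // work_stealing_sketch.cpp -- two workers, per-worker deques, front stealing.
    #include <deque>
    #include <mutex>
    #include <thread>
    #include <atomic>
    #include <cstdio>

    struct TaskQueue {
        std::deque<int> tasks;
        std::mutex m;
        bool pop_back(int& t) {
            std::lock_guard<std::mutex> g(m);
            if (tasks.empty()) return false;
            t = tasks.back(); tasks.pop_back(); return true;
        }
        bool steal_front(int& t) {
            std::lock_guard<std::mutex> g(m);
            if (tasks.empty()) return false;
            t = tasks.front(); tasks.pop_front(); return true;
        }
    };

    TaskQueue queues[2];
    std::atomic<long> done{0};

    void worker(int id) {
        int t;
        while (done.load() < 2000) {
            if (queues[id].pop_back(t) || queues[1 - id].steal_front(t))
                done.fetch_add(1);            // "execute" the task
            else
                std::this_thread::yield();    // nothing local, nothing to steal
        }
    }

    int main() {
        for (int i = 0; i < 2000; ++i) queues[0].tasks.push_back(i); // imbalanced load
        std::thread w0(worker, 0), w1(worker, 1);
        w0.join(); w1.join();
        std::printf("executed %ld tasks\n", done.load());
        return 0;
    }

    Because all 2000 tasks start on worker 0's deque, worker 1 makes progress only by stealing, which is the load-balancing effect the paper studies at much larger scale.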

  4. A Review of Parallel Programming Models and Technologies

    Institute of Scientific and Technical Information of China (English)

    李嘉欣

    2012-01-01

    Parallel programming models are basic to parallel computing. Facing the new challenges of multi-core processors, determining the mainstream of future parallel computing models is an important problem. The concepts, features and application environments of common parallel programming models and technologies are introduced, to serve as references for effectively choosing a parallel programming model and technology when developing parallel applications.

  5. Proton Affinity Calculations with High Level Methods.

    Science.gov (United States)

    Kolboe, Stein

    2014-08-12

    Proton affinities, stretching from small reference compounds up to the methylbenzenes and naphthalene and anthracene, have been calculated with high-accuracy computational methods, viz. W1BD, G4, G3B3, CBS-QB3, and M06-2X. Computed proton affinities and the currently accepted reference values are generally in excellent accord, but there are deviations. The literature value for propene appears to be 6-7 kJ/mol too high. Reported proton affinities for the methylbenzenes seem 4-5 kJ/mol too high. G4 and G3 computations generally give results in good accord with the high-level W1BD. Proton affinity values computed with the CBS-QB3 scheme are too low, and the error increases with increasing molecule size, reaching nearly 10 kJ/mol for the xylenes. The functional M06-2X fails markedly for some of the small reference compounds, in particular for CO and ketene, but calculates methylbenzene proton affinities with high accuracy.

  6. The ATLAS High Level Trigger Steering

    CERN Document Server

    Berger, N; Eifert, T; Fischer, G; George, S; Haller, J; Höcker, A; Masik, J; Zur Nedden, M; Pérez-Réale, V; Risler, C; Schiavi, C; Stelzer, J; Wu, X; International Conference on Computing in High Energy and Nuclear Physics

    2008-01-01

    The High Level Trigger (HLT) of the ATLAS experiment at the Large Hadron Collider receives events which pass the LVL1 trigger at ~75 kHz and has to reduce the rate to ~200 Hz while retaining the most interesting physics. It is a software trigger and performs the reduction in two stages: the LVL2 trigger and the Event Filter (EF). At the heart of the HLT is the Steering software. To minimise processing time and data transfers it implements the novel event selection strategies of seeded, step-wise reconstruction and early rejection. The HLT is seeded by regions of interest identified at LVL1. These and the static configuration determine which algorithms are run to reconstruct event data and test the validity of trigger signatures. The decision to reject the event or continue is based on the valid signatures, taking into account pre-scale and pass-through. After the EF, event classification tags are assigned for streaming purposes. Several powerful new features for commissioning and operation have been added: co...

  7. Performance of the CMS High Level Trigger

    CERN Document Server

    Perrotta, Andrea

    2015-01-01

    The CMS experiment has been designed with a 2-level trigger system. The first level is implemented using custom-designed electronics. The second level is the so-called High Level Trigger (HLT), a streamlined version of the CMS offline reconstruction software running on a computer farm. For Run II of the Large Hadron Collider, the increases in center-of-mass energy and luminosity will raise the event rate to a level challenging for the HLT algorithms. The increase in the number of interactions per bunch crossing, on average 25 in 2012, and expected to be around 40 in Run II, will be an additional complication. We present here the expected performance of the main triggers that will be used during the 2015 data taking campaign, paying particular attention to the new approaches that have been developed to cope with the challenges of the new run. This includes improvements in HLT electron and photon reconstruction as well as better performing muon triggers. We will also present the performance of the improved trac...

  8. Tracking at High Level Trigger in CMS

    CERN Document Server

    Tosi, Mia

    2016-01-01

    The trigger systems of the LHC detectors play a crucial role in determining the physics capabili- ties of the experiments. A reduction of several orders of magnitude of the event rate is needed to reach values compatible with detector readout, offline storage and analysis capability. The CMS experiment has been designed with a two-level trigger system: the Level-1 Trigger (L1T), implemented on custom-designed electronics, and the High Level Trigger (HLT), a stream- lined version of the CMS offline reconstruction software running on a computer farm. A software trigger system requires a trade-off between the complexity of the algorithms, the sustainable out- put rate, and the selection efficiency. With the computing power available during the 2012 data taking the maximum reconstruction time at HLT was about 200 ms per event, at the nominal L1T rate of 100 kHz. Track reconstruction algorithms are widely used in the HLT, for the reconstruction of the physics objects as well as in the identification of b-jets and ...

  9. Programs Lucky and Lucky_C - 3D parallel transport codes for the multi-group transport equation solution for XYZ geometry by the Pm Sn method

    Energy Technology Data Exchange (ETDEWEB)

    Moriakov, A. [Russian Research Centre, Kurchatov Institute, Moscow (Russian Federation)]; Vasyukhno, V.; Netecha, M.; Khacheresov, G. [Research and Development Institute of Power Engineering, Moscow (Russian Federation)]

    2003-07-01

    Powerful supercomputers are available today. MBC-1000M is one of the Russian supercomputers that may be used through remote access. The programs LUCKY and LUCKY_C were created for multi-processor systems. These programs have algorithms created especially for these computers and use the MPI (message passing interface) service for exchanges between processors. LUCKY can solve shielding tasks by the multigroup discrete ordinates method; LUCKY_C can solve criticality tasks by the same method. Only XYZ orthogonal geometry is available, but with small spatial steps to approximate the discrete operator this geometry may be used as a universal one to describe complex geometrical structures. Cross-section libraries are used up to P8 approximation by Legendre polynomials for nuclear data in GIT format. The programming language is Fortran-90. 'Vector' processors could give a time gain of up to 30 times, but unfortunately the MBC-1000M does not have such processors. Nevertheless, good efficiency of parallel calculations was obtained under 'space' (LUCKY) and 'space and energy' (LUCKY_C) parallelization. The AUTOCAD program is used to check the geometry after treatment of the input data. The programs have a powerful geometry module, which is a convenient tool to describe any geometry. Output results may be processed by graphic programs on a personal computer. (authors)

  10. Managing Algorithmic Skeleton Nesting Requirements in Realistic Image Processing Applications: The Case of the SKiPPER-II Parallel Programming Environment's Operating Model

    Directory of Open Access Journals (Sweden)

    Duculty Florent

    2005-01-01

    SKiPPER is a SKeleton-based Parallel Programming EnviRonment being developed since 1996 at LASMEA Laboratory, Blaise Pascal University, France. The main goal of the project was to demonstrate the applicability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This paper deals with the special features embedded in the latest version of the project: algorithmic skeleton nesting capabilities and a fully dynamic operating model. Throughout the case study of a complete and realistic image processing application, in which we have pointed out the requirement for skeleton nesting, we present the operating model of this feature. The work described here is one of the few reported experiments showing the application of skeleton nesting facilities for the parallelisation of a realistic application, especially in the area of image processing. The image processing application we have chosen is a 3D face-tracking algorithm from appearance.
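
    As a minimal illustration of the skeleton idea behind environments like SKiPPER (this is not SKiPPER's actual API, and all names are ours), a "farm" can be written as a higher-order function; skeleton nesting then amounts to calling one skeleton from another skeleton's worker.

    // farm_skeleton.cpp -- a naive farm skeleton: apply a worker to all inputs
    // in parallel (one thread per input, purely for illustration).
    #include <vector>
    #include <thread>
    #include <functional>
    #include <cstdio>

    template <typename In, typename Out>
    std::vector<Out> farm(const std::vector<In>& inputs,
                          std::function<Out(const In&)> worker) {
        std::vector<Out> results(inputs.size());
        std::vector<std::thread> pool;
        for (size_t i = 0; i < inputs.size(); ++i)
            pool.emplace_back([&, i] { results[i] = worker(inputs[i]); });
        for (auto& t : pool) t.join();
        return results;
    }

    int main() {
        std::vector<int> frames{1, 2, 3, 4};
        auto processed = farm<int, int>(frames, [](const int& f) {
            return f * f;                      // stand-in for per-frame processing
        });
        for (int r : processed) std::printf("%d ", r);
        std::printf("\n");
        return 0;
    }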

  11. Parallelization in Modern C++

    CERN Document Server

    CERN. Geneva

    2016-01-01

    The traditionally used and well established parallel programming models OpenMP and MPI both target lower-level parallelism and are meant to be as language agnostic as possible. For a long time, those models were the only widely available portable options for developing parallel C++ applications beyond using plain threads. This has strongly limited the optimization capabilities of compilers, has inhibited extensibility and genericity, and has restricted the use of those models together with other, modern higher level abstractions introduced by the C++11 and C++14 standards. The recent revival of interest in the industry and wider community for the C++ language has also spurred a remarkable amount of standardization proposals and technical specifications being developed. Those efforts however have so far failed to build a vision on how to seamlessly integrate various types of parallelism, such as iterative parallel execution, task-based parallelism, asynchronous many-task execution flows, continuation s...

  12. An Improved Method for Optimizing MPI Parallel Program Performance

    Institute of Scientific and Technical Information of China (English)

    柯鹏; 聂鑫

    2011-01-01

    MPI has been proven to be an ideal parallel programming model on distributed storage systems. Because MPI is based on message passing and realizes communication between nodes by calling library functions, the performance of an MPI parallel program depends strongly on the efficiency of its communication code. By replacing point-to-point communication with collective communication functions, and by using derived datatypes and creating new communication domains, the MPI parallel implementation of DNS is improved twice over and its performance increased. Based on these experiments, a general approach for optimizing MPI parallel programs is put forward.
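
    The replacement described in this record can be shown directly: a root-to-all distribution written as a loop of sends becomes a single collective, which the MPI library can implement in O(log p) communication steps. Buffer sizes and names below are illustrative, not taken from the paper's DNS code.

    // collective_vs_p2p.cpp -- replacing a point-to-point distribution loop
    // with one MPI_Bcast collective.
    #include <mpi.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        std::vector<double> field(1 << 16, 0.0);
        if (rank == 0) field.assign(field.size(), 3.14);

        // Point-to-point version (what the optimization removes):
        //   if (rank == 0)
        //       for (int s = 1; s < size; ++s)
        //           MPI_Send(field.data(), (int)field.size(), MPI_DOUBLE, s, 0,
        //                    MPI_COMM_WORLD);
        //   else
        //       MPI_Recv(field.data(), (int)field.size(), MPI_DOUBLE, 0, 0,
        //                MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Collective version: one call, O(log p) steps instead of O(p) sends.
        MPI_Bcast(field.data(), (int)field.size(), MPI_DOUBLE, 0, MPI_COMM_WORLD);

        printf("rank %d: field[0] = %f\n", rank, field[0]);
        MPI_Finalize();
        return 0;
    }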

  13. Spent fuel and high-level radioactive waste transportation report

    Energy Technology Data Exchange (ETDEWEB)

    1990-11-01

    This publication is intended to provide its readers with an introduction to the issues surrounding the subject of transportation of spent nuclear fuel and high-level radioactive waste, especially as those issues impact the southern region of the United States. It was originally issued by the Southern States Energy Board (SSEB) in July 1987 as the Spent Nuclear Fuel and High-Level Radioactive Waste Transportation Primer, a document patterned on work performed by the Western Interstate Energy Board and designed as a "comprehensive overview of the issues." This work differs from that earlier effort in that it is designed for the educated layman with little or no background in nuclear waste issues. In addition, this document is not a comprehensive examination of nuclear waste issues but should instead serve as a general introduction to the subject. Owing to changes in the nuclear waste management system, program activities by the US Department of Energy and other federal agencies, and developing technologies, much of this information becomes dated quickly. While this report uses the most recent data available, readers should keep in mind that some of the material is subject to rapid change. SSEB plans periodic updates in the future to account for changes in the program. Replacement pages will be supplied to all parties in receipt of this publication provided they remain on the SSEB mailing list.

  14. Evaluation and selection of candidate high-level waste forms

    Energy Technology Data Exchange (ETDEWEB)

    Bernadzikowski, T. A.; Allender, J. S.; Butler, J. L.; Gordon, D. E.; Gould, Jr., T. H.; Stone, J. A.

    1982-03-01

    Seven candidate waste forms being developed under the direction of the Department of Energy's National High-Level Waste (HLW) Technology Program, were evaluated as potential media for the immobilization and geologic disposal of high-level nuclear wastes. The evaluation combined preliminary waste form evaluations conducted at DOE defense waste-sites and independent laboratories, peer review assessments, a product performance evaluation, and a processability analysis. Based on the combined results of these four inputs, two of the seven forms, borosilicate glass and a titanate based ceramic, SYNROC, were selected as the reference and alternative forms for continued development and evaluation in the National HLW Program. Both the glass and ceramic forms are viable candidates for use at each of the DOE defense waste-sites; they are also potential candidates for immobilization of commercial reprocessing wastes. This report describes the waste form screening process, and discusses each of the four major inputs considered in the selection of the two forms.

  15. Spent Fuel and High-Level Radioactive Waste Transportation Report

    Energy Technology Data Exchange (ETDEWEB)

    1992-03-01

    This publication is intended to provide its readers with an introduction to the issues surrounding the subject of transportation of spent nuclear fuel and high-level radioactive waste, especially as those issues impact the southern region of the United States. It was originally issued by SSEB in July 1987 as the Spent Nuclear Fuel and High-Level Radioactive Waste Transportation Primer, a document patterned on work performed by the Western Interstate Energy Board and designed as a "comprehensive overview of the issues." This work differs from that earlier effort in that it is designed for the educated layman with little or no background in nuclear waste issues. In addition, this document is not a comprehensive examination of nuclear waste issues but should instead serve as a general introduction to the subject. Owing to changes in the nuclear waste management system, program activities by the US Department of Energy and other federal agencies, and developing technologies, much of this information becomes dated quickly. While this report uses the most recent data available, readers should keep in mind that some of the material is subject to rapid change. SSEB plans periodic updates in the future to account for changes in the program. Replacement pages will be supplied to all parties in receipt of this publication provided they remain on the SSEB mailing list.

  16. Spent fuel and high-level radioactive waste transportation report

    Energy Technology Data Exchange (ETDEWEB)

    1989-11-01

    This publication is intended to provide its readers with an introduction to the issues surrounding the subject of transportation of spent nuclear fuel and high-level radioactive waste, especially as those issues impact the southern region of the United States. It was originally issued by the Southern States Energy Board (SSEB) in July 1987 as the Spent Nuclear Fuel and High-Level Radioactive Waste Transportation Primer, a document patterned on work performed by the Western Interstate Energy Board and designed as a "comprehensive overview of the issues." This work differs from that earlier effort in that it is designed for the educated layman with little or no background in nuclear waste issues. In addition, this document is not a comprehensive examination of nuclear waste issues but should instead serve as a general introduction to the subject. Owing to changes in the nuclear waste management system, program activities by the US Department of Energy and other federal agencies, and developing technologies, much of this information becomes dated quickly. While this report uses the most recent data available, readers should keep in mind that some of the material is subject to rapid change. SSEB plans periodic updates in the future to account for changes in the program. Replacement pages will be supplied to all parties in receipt of this publication provided they remain on the SSEB mailing list.

  17. Research on a Hierarchical Visual Modeling System for Parallel Programs

    Institute of Scientific and Technical Information of China (English)

    徐祯; 孙济洲; 于策; 孙超; 汤善江

    2011-01-01

    Visual modeling technology can effectively reduce the difficulty of parallel program design, but complex hardware architectures still pose challenges for parallel program design methods at the software level. To address these issues, this paper proposes a visual modeling methodology based on a hierarchical modeling idea and a layered modeling scheme for parallel programs, and designs and implements a visual modeling system called e-ParaModel for multi-core cluster environments. A modeling example is completed to verify the system's feasibility and applicability.

  18. Survey of new vector computers: The CRAY-1S from Cray Research; the CYBER 205 from CDC and the parallel computer from ICL - architecture and programming

    Science.gov (United States)

    Gentzsch, W.

    1982-01-01

    Problems which can arise with vector and parallel computers are discussed in a user-oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The systems' performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.

  19. High-Level Professional and Technical Experts Training Based on Business Development Needs --- Exploration and Practice of Special Technical Training Programs in Petroleum Engineering

    Institute of Scientific and Technical Information of China (English)

    房新娜; 盛湘

    2013-01-01

    High-level professional and technical experts training has gained more and more attention among oil companies, but because the trainees are highly educated, hold senior titles, and possess rich working expertise, training organizers are troubled by problems such as the difficulty of finding competent teachers, designing the curriculum, and organizing the teaching. Drawing on the practice of the special technical training programs in petroleum engineering, this paper discusses measures to improve the effectiveness of high-level professional and technical experts training from the aspects of training content design, teacher arrangement, and teaching organization.

  20. The Practice Difficulties and Solutions of the Higher Educational Policy of the High Level Minority Talents Program

    Institute of Scientific and Technical Information of China (English)

    蒋馨岚

    2014-01-01

    The implementation of the high level minority talents program education policy has trained a group of high-level talents for the social and economic development of China's minority areas. However, the policy faces practical difficulties in its implementation, such as the unreasonable distribution of enrollment quotas, an unbalanced distribution of disciplines and specialties, and an outdated unified basic training mode. To resolve these difficulties, measures should be taken such as designing the quota distribution reasonably, optimizing the structure of disciplines and specialties, reforming the basic training mode, and innovating the employment mechanism.

  1. Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

    Science.gov (United States)

    Choudhary, Alok Nidhi

    1989-01-01

    Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.

  2. Parallel Simulation of Loosely Timed SystemC/TLM Programs: Challenges Raised by an Industrial Case Study

    Directory of Open Access Journals (Sweden)

    Denis Becker

    2016-05-01

    Transaction level models of systems-on-chip in SystemC are commonly used in the industry to provide an early simulation environment. The SystemC standard imposes coroutine semantics for the scheduling of simulated processes, to ensure determinism and reproducibility of simulations. However, because of this, sequential implementations have, for a long time, been the only option available, and still now the reference implementation is sequential. With the increasing size and complexity of models, and the multiplication of computation cores on recent machines, the parallelization of SystemC simulations is a major research concern. There have been several proposals for SystemC parallelization, but most of them are limited to cycle-accurate models. In this paper we focus on loosely timed models, which are commonly used in the industry. We present an industrial context and show that, unfortunately, most of the existing approaches for SystemC parallelization can fundamentally not apply in this context. We support this claim with a set of measurements performed on a platform used in production at STMicroelectronics. This paper surveys existing techniques, presents a visualization and profiling tool and identifies unsolved challenges in the parallelization of SystemC models at transaction level.

  3. SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws

    Science.gov (United States)

    Cooke, Daniel; Rushton, Nelson

    2013-01-01

    With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for high-end computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language, that is, a programming language that is closer to a human's way of thinking than to a machine's. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequential/single-core code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify-Produce (CSP) and Normalize-Transpose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less
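
    To make the Normalize-Transpose idea concrete, here is a minimal sketch in Python (standing in for SequenceL, whose syntax differs): a scalar function applied to sequences is automatically distributed over their elements in parallel, with no parallel annotations in the user-level code. The names nt_apply and scale_add are invented for illustration.

        # Normalize-Transpose style implicit data parallelism: a scalar
        # function applied to sequences is mapped over their elements.
        from multiprocessing import Pool

        def nt_apply(f, *args):
            if not any(isinstance(a, list) for a in args):
                return f(*args)                      # base case: plain scalar call
            n = max(len(a) for a in args if isinstance(a, list))
            # "normalize": broadcast scalars so every argument has length n
            cols = [a if isinstance(a, list) else [a] * n for a in args]
            # "transpose": regroup into per-element argument tuples, then map
            with Pool() as pool:
                return pool.starmap(f, zip(*cols))

        def scale_add(x, y, c):
            return c * x + y                         # ordinary sequential code

        if __name__ == "__main__":
            print(nt_apply(scale_add, [1, 2, 3], [10, 20, 30], 2))  # [12, 24, 36]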

  4. Parallel computing works

    Energy Technology Data Exchange (ETDEWEB)

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations? As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  5. High-Level Waste Systems Plan. Revision 7

    Energy Technology Data Exchange (ETDEWEB)

    Brooke, J.N.; Gregory, M.V.; Paul, P.; Taylor, G.; Wise, F.E.; Davis, N.R.; Wells, M.N.

    1996-10-01

    This revision of the High-Level Waste (HLW) System Plan aligns SRS HLW program planning with the DOE Savannah River (DOE-SR) Ten Year Plan (QC-96-0005, Draft 8/6), which was issued in July 1996. The objective of the Ten Year Plan is to complete cleanup at most nuclear sites within the next ten years. The two key principles of the Ten Year Plan are to accelerate the reduction of the most urgent risks to human health and the environment and to reduce mortgage costs. Accordingly, this System Plan describes the HLW program that will remove HLW from all 24 old-style tanks, and close 20 of those tanks, by 2006, with vitrification of all HLW by 2018. To achieve these goals, the DWPF canister production rate is projected to climb to 300 canisters per year starting in FY06 and remain at that rate through the end of the program in FY18. (Compare that to past System Plans, in which DWPF production peaked at 200 canisters per year and the program did not complete until 2026.) An additional $247M (FY98 dollars) must be made available as requested over the ten year planning period, including a one-time $10M to enhance Late Wash attainment. If appropriate resources are made available, facility attainment issues are resolved, and regulatory support is sufficient, then completion of the HLW program in 2018 would achieve a $3.3 billion cost savings to DOE versus the cost of completing the program in 2026. Facility status information is current as of October 31, 1996.

  6. Implementation, Installation and Configuration of the MPJ Parallel Programming Framework

    Institute of Scientific and Technical Information of China (English)

    刘俊莉; 林晓锐; 王楚斌; 谭子义; 司徒祝坤

    2009-01-01

    The MPJ programming interface provides MPI-like message passing for Java applications. This article gives an overview of the design and implementation of the MPJ parallel programming framework, covering its architecture, implementation mechanisms, and related technical features. The installation and configuration process of MPJ Express is described in detail, and a working MPJ program example is given.
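
    For readers unfamiliar with the message-passing model that MPJ brings to Java, the sketch below shows the same style of point-to-point exchange in Python using mpi4py; it illustrates the programming model only, not MPJ's Java API. Run with, e.g., mpiexec -n 2 python demo.py.

        # Two-process MPI-style exchange, analogous to an MPJ send/recv.
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        if rank == 0:
            comm.send({"msg": "hello", "payload": [1, 2, 3]}, dest=1, tag=0)
            print("rank 0 sent a message")
        elif rank == 1:
            data = comm.recv(source=0, tag=0)
            print("rank 1 received:", data)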

  7. Cloning, high-level expression, purification and characterization of a ...

    African Journals Online (AJOL)

    Cloning, high-level expression, purification and characterization of a staphylokinase variant, SakøC, ... African Journal of Biotechnology ... Hence in this study, we reported the cloning, high-level expression, purification and characterization of ...

  8. An abstract machine based execution model for computer architecture design and efficient implementation of logic programs in parallel.

    OpenAIRE

    Hermenegildo, Manuel V.

    1986-01-01

    The term "Logic Programming" refers to a variety of computer languages and execution models which are based on the traditional concept of Symbolic Logic. The expressive power of these languages offers promise to be of great assistance in facing the programming challenges of present and future symbolic processing applications in Artificial Intelligence, Knowledge-based systems, and many other areas of computing. The sequential execution speed of logic programs has been greatly improved sinc...

  9. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  10. 4MOST: science operations for a large spectroscopic survey program with multiple science cases executed in parallel

    Science.gov (United States)

    Walcher, C. Jakob; de Jong, Roelof S.; Dwelly, Tom; Bellido, Olga; Boller, Thomas; Chiappini, Cristina; Feltzing, Sofia; Irwin, Mike; McMahon, Richard; Merloni, Andrea; Schnurr, Olivier; Walton, Nicholas A.

    2016-07-01

    The 4MOST instrument is a multi-object spectrograph to be mounted to the VISTA telescope at ESO's La Silla-Paranal observatory. 4MOST will deliver several tens of millions of spectra from surveys typically lasting 5 years. 4MOST will address Galactic and extra-galactic science cases simultaneously, i.e. by observing targets from a large number of different surveys within one science exposure. This parallel mode of operations as well as the survey nature of 4MOST require some 4MOST-specific operations features within the overall operations model of ESO. These features are necessary to minimize any changes to the ESO operations model at the La Silla-Paranal observatory on the one hand, and to enable parallel science observing and thus the most efficient use of the instrument on the other hand. The main feature is that the 4MOST consortium will not only deliver the instrument, but also contractual services to the user community, which is why 4MOST is also described as a 'facility'. We describe the operations model for 4MOST as seen by the consortium building the instrument. Among others this encompasses: 1) A joint science team for all participating surveys (i.e. including community surveys as well as those from the instrument-building consortium). 2) Common centralized tasks in observing preparation and data management provided as service by the consortium. 3) Transparency of all decisions to all stakeholders. 4) Close interaction between science and facility operations. Here we describe our efforts to make parallel observing mode efficient, flexible, and manageable.

  11. C++ and Massively Parallel Computers

    Directory of Open Access Journals (Sweden)

    Daniel J. Lickly

    1993-01-01

    Our goal is to apply the software engineering advantages of object-oriented programming to the raw power of massively parallel architectures. To do this we have constructed a hierarchy of C++ classes to support the data-parallel paradigm. Feasibility studies and initial coding can be supported by any serial machine that has a C++ compiler. Parallel execution requires an extended Cfront, which understands the data-parallel classes and generates C* code. (C* is a data-parallel superset of ANSI C developed by Thinking Machines Corporation.) This approach provides potential portability across parallel architectures and leverages the existing compiler technology for translating data-parallel programs onto both SIMD and MIMD hardware.

  12. A program system for ab initio MO calculations on vector and parallel processing machines III. Integral reordering and four-index transformation

    Science.gov (United States)

    Wiest, Roland; Demuynck, Jean; Bénard, Marc; Rohmer, Marie-Madeleine; Ernenwein, René

    1991-01-01

    This series of three papers presents a program system for ab initio molecular orbital calculations on vector and parallel computers. Part III is devoted to the four-index transformation onto a molecular orbital basis of size NMO of the file of two-electron integrals (pq∥rs) generated by a contracted Gaussian set of size NATO (number of atomic orbitals). A fast Yoshimine algorithm first sorts the (pq∥rs) integrals with respect to index pq only. This file of half-sorted integrals, labelled by their rs-index, can be processed without further modification to generate either the transformed integrals or the supermatrix elements. The large memory available on the CRAY-2 has made it possible to implement the transformation algorithm proposed by Bender in 1972, which requires a core-storage allocation varying as (NATO)^3. Two versions of Bender's algorithm are included in the present program. The first version is an in-core version, where the complete file of accumulated contributions to transformed integrals is stored and updated in central memory. This version has been parallelized by distributing the NATO steps of the outermost loop over a limited number of logical tasks. The second version is an out-of-core version, in which twin files are alternately used as input and output for the accumulated contributions to transformed integrals. This version is not parallel. The choice of one version or the other and (for version 1) the determination of the number of tasks depend upon the balance between the available and the requested amounts of storage. The storage management and the choice of the proper version are carried out automatically using dynamic storage allocation. Both versions are vectorized and take advantage of the molecular symmetry.
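
    The algebra behind the four-index transformation is easy to show in miniature. The NumPy sketch below performs the transformation as four successive quarter transformations, each an O(N^5) contraction, and checks the result against the naive O(N^8) quadruple sum; it illustrates the mathematics only, not the paper's out-of-core Yoshimine sort or Bender's algorithm.

        # Four-index transformation (pq|rs) -> (ij|kl) done as four
        # successive quarter transformations, each an O(n^5) contraction.
        import numpy as np

        n = 8                                   # toy basis size (NATO)
        rng = np.random.default_rng(0)
        eri = rng.random((n, n, n, n))          # AO two-electron integrals
        C = rng.random((n, n))                  # MO coefficients, columns = MOs

        t1 = np.einsum('pi,pqrs->iqrs', C, eri)
        t2 = np.einsum('qj,iqrs->ijrs', C, t1)
        t3 = np.einsum('rk,ijrs->ijks', C, t2)
        mo_eri = np.einsum('sl,ijks->ijkl', C, t3)

        # check against the naive O(n^8) quadruple contraction
        ref = np.einsum('pi,qj,rk,sl,pqrs->ijkl', C, C, C, C, eri)
        assert np.allclose(mo_eri, ref)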

  13. Design and Programming for Cable-Driven Parallel Robots in the German Pavilion at the EXPO 2015

    Directory of Open Access Journals (Sweden)

    Philipp Tempel

    2015-08-01

    In the German Pavilion at the EXPO 2015, two large cable-driven parallel robots are flying over the heads of the visitors representing two bees flying over Germany and displaying everyday life in Germany. Each robot consists of a mobile platform and eight cables suspended by winches and follows a desired trajectory, which needs to be computed in advance taking technical limitations, safety considerations and visual aspects into account. In this paper, a path planning software is presented, which includes the design process from developing a robot design and workspace estimation via planning complex trajectories considering technical limitations through to exporting a complete show. For a test trajectory, simulation results are given, which display the relevant trajectories and cable force distributions.

  14. Final Report, Center for Programming Models for Scalable Parallel Computing: Co-Array Fortran, Grant Number DE-FC02-01ER25505

    Energy Technology Data Exchange (ETDEWEB)

    Robert W. Numrich

    2008-04-22

    The major accomplishment of this project is the production of CafLib, an 'object-oriented' parallel numerical library written in Co-Array Fortran. CafLib contains distributed objects such as block vectors and block matrices along with procedures, attached to each object, that perform basic linear algebra operations such as matrix multiplication, matrix transpose and LU decomposition. It also contains constructors and destructors for each object that hide the details of data decomposition from the programmer, and it contains collective operations that allow the programmer to calculate global reductions, such as global sums, global minima and global maxima, as well as vector and matrix norms of several kinds. CafLib is designed to be extensible in such a way that programmers can define distributed grid and field objects, based on vector and matrix objects from the library, for finite difference algorithms to solve partial differential equations. A very important extra benefit that resulted from the project is the inclusion of the co-array programming model in the next Fortran standard called Fortran 2008. It is the first parallel programming model ever included as a standard part of the language. Co-arrays will be a supported feature in all Fortran compilers, and the portability provided by standardization will encourage a large number of programmers to adopt it for new parallel application development. The combination of object-oriented programming in Fortran 2003 with co-arrays in Fortran 2008 provides a very powerful programming model for high-performance scientific computing. Additional benefits from the project, beyond the original goal, include a program to provide access to the co-array model through the Cray compiler as a resource for teaching and research. Several academics, for the first time, included the co-array model as a topic in their courses on parallel computing. A separate collaborative project with LANL and PNNL showed how to

  15. Practical Use of High-level Petri Net

    DEFF Research Database (Denmark)

    The aim of the workshop is to bring together researchers and practitioners with interests in the use of high-level nets and their tools for practical applications. A typical paper is expected to report on a case study where high-level Petri nets and their tools have been used in practice. We also...... welcome papers describing a tool, a methodology, or other developments that have proved successful to make high-level Petri nets more applicable in practice....

  16. PARALLEL STABILIZATION

    Institute of Scientific and Technical Information of China (English)

    J.L.LIONS

    1999-01-01

    A new algorithm for the stabilization of (possibly turbulent, chaotic) distributed systems, governed by linear or non linear systems of equations, is presented. The SPA (Stabilization Parallel Algorithm) is based on a systematic parallel decomposition of the problem (related to arbitrarily overlapping decomposition of domains) and on a penalty argument. SPA is presented here for the case of linear parabolic equations, with distributed or boundary control. It extends to practically all linear and non linear evolution equations, as will be presented in several other publications.

  17. Playable Serious Games for Studying and Programming Computational STEM and Informatics Applications of Distributed and Parallel Computer Architectures

    Science.gov (United States)

    Amenyo, John-Thones

    2012-01-01

    Carefully engineered playable games can serve as vehicles for students and practitioners to learn and explore the programming of advanced computer architectures to execute applications, such as high performance computing (HPC) and complex, inter-networked, distributed systems. The article presents families of playable games that are grounded in…

  18. Easy and Effective Parallel Programmable ETL

    DEFF Research Database (Denmark)

    Thomsen, Christian; Pedersen, Torben Bach

    2011-01-01

    Extract–Transform–Load (ETL) programs are used to load data into data warehouses (DWs). An ETL program must extract data from sources, apply different transformations to it, and use the DW to look up/insert the data. It is both time consuming to develop and to run an ETL program. It is, however, typically the case that the ETL program can exploit both task parallelism and data parallelism to run faster. This, on the other hand, makes the development time longer, as it is complex to create a parallel ETL program. To remedy this situation, we propose efficient ways to parallelize typical ETL tasks and we implement these new constructs in an ETL framework. The constructs are easy to apply and only require a few modifications to an ETL program to parallelize it. They support both task and data parallelism and give the programmer different possibilities to choose from. An experimental evaluation

  19. Parallel Worlds

    DEFF Research Database (Denmark)

    Steno, Anne Mia

    2013-01-01

    as a symbol of something else, for instance as a way of handling uncertainty in difficult times, magical practice should also be seen as an emic concept. In this context, understanding the existence of two parallel universes, the profane and the magic, is important because the witches’ movements across...

  20. The ATLAS High Level Trigger Configuration and Steering

    CERN Document Server

    Stelzer, J; The ATLAS collaboration

    2010-01-01

    In March 2010 the four LHC experiments saw the first proton-proton collisions at 7 TeV. Still within the year, a collision rate of nearly 10 MHz is expected. At ATLAS, events of potential interest for ATLAS physics are selected by a three-level trigger system, with a final recording rate of about 200 Hz. The first level (L1) is implemented in customized hardware; the two levels of the high level trigger (HLT) are software triggers. Within the ATLAS physics program more than 500 trigger signatures are defined. The HLT tests each signature on each L1-accepted event; the test outcome is recorded for later analysis. The HLT-Steering is responsible for this. It foremost ensures the independent test of each signature, guaranteeing unbiased trigger decisions. Yet, to minimize data readout and execution time, cached detector data and once-calculated trigger objects are reused to form the decision. Some signature tests are performed only on a scaled-down fraction of candidate events, in order to reduce the output rate a...

  1. Multi-Core Parallel Programming Based on Communication on the Win32 Platform

    Institute of Scientific and Technical Information of China (English)

    李青; 徐璐娜

    2014-01-01

    With the development of computer hardware, multi-core parallel computing appears more and more frequently in computer software and its application fields. Current multi-core programming adopts a thread-level parallel model; existing multi-threaded parallel programming models mainly include thread libraries, directive models, and tasking models. This paper proposes a communication-based method, similar to the MPI parallel programming model, to implement parallel programming on the Win32 platform, and implements the MTI parallel programming model on this basis. The execution results of several typical tests of parallel programming with MTI are given, and they show that MTI is effective and easy to use.

  2. Structure_threader: An improved method for automation and parallelization of programs structure, fastStructure and MavericK on multicore CPU systems.

    Science.gov (United States)

    Pina-Martins, Francisco; Silva, Diogo N; Fino, Joana; Paulo, Octávio S

    2017-08-04

    Structure_threader is a program to parallelize multiple runs of genetic clustering software that does not make use of multithreading technology (structure, fastStructure and MavericK) on multicore computers. Our approach was benchmarked across multiple systems and displayed great speed improvements relative to the single-threaded implementation, scaling very close to linearly with the number of physical cores used. Structure_threader was compared to previous software written for the same task, ParallelStructure and StrAuto, and was shown to be the faster wrapper (up to 25% faster) under all tested scenarios. Furthermore, Structure_threader can perform several automatic and convenient operations, assisting the user in assessing the most biologically likely value of 'K' via implementations such as the "Evanno" or "Thermodynamic Integration" tests, and automatically draws the "meanQ" plots (static or interactive) for each value of K (or even combined plots). Structure_threader is written in Python 3 and licensed under the GPLv3. It can be downloaded free of charge at https://github.com/StuntsPT/Structure_threader. © 2017 John Wiley & Sons Ltd.
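
    The wrapper approach itself is simple to sketch: launch many independent, single-threaded runs (one per K value and replicate) across the available cores and collect them as they finish. The command line below is hypothetical, for illustration only; it is not Structure_threader's real interface.

        # Run independent single-threaded analyses in parallel across cores.
        import subprocess
        from concurrent.futures import ProcessPoolExecutor, as_completed

        def run_one(k, rep):
            # hypothetical single-threaded clustering program and flags
            cmd = ["cluster_prog", "-K", str(k), "--seed", str(rep),
                   "-o", f"results_K{k}_rep{rep}"]
            subprocess.run(cmd, check=True)
            return k, rep

        if __name__ == "__main__":
            jobs = [(k, rep) for k in range(1, 7) for rep in range(10)]
            with ProcessPoolExecutor(max_workers=4) as pool:
                futures = [pool.submit(run_one, k, rep) for k, rep in jobs]
                for fut in as_completed(futures):
                    print("finished K=%d replicate %d" % fut.result())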

  3. The STAPL Parallel Graph Library

    KAUST Repository

    Harshvardhan,

    2013-01-01

    This paper describes the stapl Parallel Graph Library, a high-level framework that abstracts the user from data-distribution and parallelism details and allows them to concentrate on parallel graph algorithm development. It includes a customizable distributed graph container and a collection of commonly used parallel graph algorithms. The library introduces pGraph pViews that separate algorithm design from the container implementation. It supports three graph processing algorithmic paradigms, level-synchronous, asynchronous and coarse-grained, and provides common graph algorithms based on them. Experimental results demonstrate improved scalability in performance and data size over existing graph libraries on more than 16,000 cores and on internet-scale graphs containing over 16 billion vertices and 250 billion edges. © Springer-Verlag Berlin Heidelberg 2013.
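
    Of the three paradigms, the level-synchronous one is the easiest to illustrate: all vertices of the current frontier are processed before the next level begins, which is the natural point for parallel workers to synchronize. The plain-Python breadth-first search below shows that structure; it is not stapl's C++ API.

        # Level-synchronous BFS: the whole frontier is processed before the
        # next level starts, which is where a parallel version would place
        # its barrier between levels.
        def bfs_levels(adj, source):
            """adj: dict vertex -> list of neighbours. Returns vertex -> level."""
            level = {source: 0}
            frontier = [source]
            depth = 0
            while frontier:
                depth += 1
                next_frontier = []
                for u in frontier:              # in parallel per frontier vertex
                    for v in adj[u]:
                        if v not in level:      # a parallel version needs an
                            level[v] = depth    # atomic test-and-set here
                            next_frontier.append(v)
                frontier = next_frontier        # implicit barrier between levels
            return level

        adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
        print(bfs_levels(adj, 0))               # {0: 0, 1: 1, 2: 1, 3: 2}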

  4. 40 CFR 227.30 - High-level radioactive waste.

    Science.gov (United States)

    2010-07-01

    40 CFR 227.30 - High-level radioactive waste. High-level radioactive waste means the aqueous waste resulting from the operation of the first cycle solvent extraction system, or equivalent, and the concentrated waste from...

  5. Process for solidifying high-level nuclear waste

    Science.gov (United States)

    Ross, Wayne A.

    1978-01-01

    The addition of a small amount of reducing agent to a mixture of a high-level radioactive waste calcine and glass frit before the mixture is melted will produce a more homogeneous glass which is leach-resistant and suitable for long-term storage of high-level radioactive waste products.

  6. Research on Multi-Core Parallel Program Evaluation

    Institute of Scientific and Technical Information of China (English)

    龚溪东

    2011-01-01

    In competition settings, multi-core parallel program evaluation is an improvement over traditional single-process program evaluation that achieves better efficiency while the accuracy of the results is still guaranteed. Taking into account its own hardware environment and the current system load, the judging machine dynamically decides whether to release a child process from the process pool to participate in the evaluation, so as to take full advantage of multi-core computing resources and improve evaluation efficiency while preserving accuracy.

  7. The Model of Instruction-Level Parallel Program Execution

    Institute of Scientific and Technical Information of China (English)

    乔林; 汤志忠; 容红波; 张赤红

    1999-01-01

    A formal Instruction-Level Parallel Program Execution Model (ILPPEM) is proposed. ILPPEM can describe not only the behavior of actual program executions but also the behavior of feasible executions arising from timing variations that are undetermined at compile time and run time. The concept of isomorphism between program executions is also introduced, and it is proved that every feasible program execution is isomorphic to an actual program execution, thereby providing a theoretical basis for the compilation and verification of parallel programs.

  8. Spent nuclear fuel project high-level information management plan

    Energy Technology Data Exchange (ETDEWEB)

    Main, G.C.

    1996-09-13

    This document presents the results of the Spent Nuclear Fuel Project (SNFP) Information Management Planning Project (IMPP), a short-term project that identified information management (IM) issues and opportunities within the SNFP and outlined a high-level plan to address them. This high-level plan for SNFP IM focuses on specific examples from within the SNFP. The plan's recommendations can be characterized in several ways. Some recommendations address specific challenges that the SNFP faces. Others form the basis for making smooth transitions in several important IM areas. Still others identify areas where further study and planning are indicated. The team's knowledge of developments in the IM industry and at the Hanford Site was crucial in deciding where to recommend that the SNFP act and where they should wait for Site plans to be made. Because of the fast pace of the SNFP and demands on SNFP staff, input and interaction were primarily between the IMPP team and members of the SNFP Information Management Steering Committee (IMSC). Key input to the IMPP came from a workshop where IMSC members and their delegates developed a set of draft IM principles. These principles, described in Section 2, became the foundation for the recommendations found in the transition plan outlined in Section 5. Availability of SNFP staff was limited, so project documents were used as a basis for much of the work. The team, realizing that the status of the project and the environment are continually changing, tried to keep abreast of major developments since those documents were generated. To the extent possible, the information contained in this document is current as of the end of fiscal year (FY) 1995. Programs and organizations on the Hanford Site as a whole are trying to maximize their return on IM investments. They are coordinating IM activities and trying to leverage existing capabilities. However, the SNFP cannot just rely on Sitewide activities to meet its IM requirements

  9. Floorplan-Driven Multivoltage High-Level Synthesis

    Directory of Open Access Journals (Sweden)

    Xianwu Xing

    2009-01-01

    As the semiconductor technology advances, interconnect plays a more and more important role in power consumption in VLSI systems. This also imposes a challenge in high-level synthesis, in which physical information is limited and conventionally considered after high-level synthesis. To close the gap between high-level synthesis and physical implementation, integration of physical synthesis and high-level synthesis is essential. In this paper, a technique named FloM is proposed for integrating floorplanning into high-level synthesis of VLSI systems with multivoltage datapaths. Experimental results obtained show that the proposed technique is effective and the energy consumed by both the datapath and the wires can be reduced by more than 40%.

  10. Coarrays for Parallel Processing

    Science.gov (United States)

    Snyder, W. Van

    2011-01-01

    The design of the Coarray feature of Fortran 2008 was guided by answering the question "What is the smallest change required to convert Fortran to a robust and efficient parallel language?" Two fundamental issues that any parallel programming model must address are work distribution and data distribution. In order to coordinate work distribution and data distribution, methods for communication and synchronization must be provided. Although originally designed for Fortran, the Coarray paradigm has stimulated development in other languages. X10, Chapel, UPC, Titanium, and class libraries being developed for C++ have the same conceptual framework.
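
    A rough Python analogue conveys the model: every worker (an "image" in coarray terms) owns a slice of one global array, computes on it independently, and may read another image's slice only after a synchronization point. Coarrays are a Fortran 2008 feature; the multiprocessing sketch below only mirrors the concepts of work distribution, data distribution, and synchronization.

        # Coarray-flavoured sketch: each process owns one slice of a shared
        # array and reads a neighbour's slice only after a barrier.
        from multiprocessing import Process, Array, Barrier

        NIMAGES, CHUNK = 4, 5

        def image(rank, data, barrier):
            lo = rank * CHUNK
            for i in range(lo, lo + CHUNK):
                data[i] = rank + 1                  # work on the local slice
            barrier.wait()                          # like Fortran's "sync all"
            left = (rank - 1) % NIMAGES             # now safe to read remote data
            neighbour_val = data[left * CHUNK]      # like x(1)[left] on a coarray
            print(f"image {rank} sees neighbour value {neighbour_val}")

        if __name__ == "__main__":
            data = Array('d', NIMAGES * CHUNK)      # shared "global" array
            barrier = Barrier(NIMAGES)
            procs = [Process(target=image, args=(r, data, barrier))
                     for r in range(NIMAGES)]
            for p in procs: p.start()
            for p in procs: p.join()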

  11. Effectiveness of Inclusion of Dry Needling in a Multimodal Therapy Program for Patellofemoral Pain: A Randomized Parallel-Group Trial.

    Science.gov (United States)

    Espí-López, Gemma V; Serra-Añó, Pilar; Vicent-Ferrando, Juan; Sánchez-Moreno-Giner, Miguel; Arias-Buría, Jose L; Cleland, Joshua; Fernández-de-Las-Peñas, César

    2017-06-01

    Study Design Randomized controlled trial. Background Evidence suggests that multimodal interventions that include exercise therapy may be effective for patellofemoral pain (PFP); however, no study has investigated the effects of trigger point (TrP) dry needling (DN) in people with PFP. Objectives To compare the effects of adding TrP DN to a manual therapy and exercise program on pain, function, and disability in individuals with PFP. Methods Individuals with PFP (n = 60) recruited from a public hospital in Valencia, Spain, were randomly allocated to manual therapy and exercises (n = 30) or manual therapy and exercise plus TrP DN (n = 30). Both groups received the same manual therapy and strengthening exercise program for 3 sessions (once a week for 3 weeks), and 1 group also received TrP DN to active TrPs within the vastus medialis and vastus lateralis muscles. The pain subscale of the Knee injury and Osteoarthritis Outcome Score (KOOS; 0-100 scale) was used as the primary outcome. Secondary outcomes included other subscales of the KOOS, the Knee Society Score, the International Knee Documentation Committee Subjective Knee Evaluation Form (IKDC), and the numeric pain-rating scale. Patients were assessed at baseline and at 15-day (posttreatment) and 3-month follow-ups. Analysis was conducted with mixed analyses of covariance, adjusted for baseline scores. Results At 3 months, 58 subjects (97%) completed the follow-up. No significant between-group differences (all, P>.391) were observed for any outcome: KOOS pain subscale mean difference, -2.1 (95% confidence interval [CI]: -4.6, 0.4); IKDC mean difference, 2.3 (95% CI: -0.1, 4.7); knee pain intensity mean difference, 0.3 (95% CI: -0.2, 0.8). Both groups experienced similar moderate-to-large within-group improvements in all outcomes (standardized mean differences of 0.6 to 1.1); however, only the KOOS function in sport and recreation subscale surpassed the prespecified minimum important change. Conclusion The current

  12. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

    The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force has resulted in structured parallel programs that are ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.

  13. The parallel adult education system

    DEFF Research Database (Denmark)

    Wahlgren, Bjarne

    2015-01-01

    for competence development. The Danish university educational system includes two parallel programs: a traditional academic track (candidatus) and an alternative practice-based track (master). The practice-based program was established in 2001 and organized as part time. The total program takes half the time...

  14. Introduction to GPU Parallel Programming Technology

    Institute of Scientific and Technical Information of China (English)

    王泽寰; 王鹏

    2013-01-01

    In recent years, general-purpose GPU computing (GPGPU) has developed rapidly, and the numbers of developers and of GPGPU applications have grown quickly. To meet the demands of different applications and the programming habits of different developers, NVIDIA and its partners have co-developed a variety of programming technologies for GPUs based on the CUDA architecture. This article gives a detailed introduction to these technologies, their respective characteristics, and their target users, in the hope of helping developers choose the technology best suited to their programming habits and application requirements.

  15. The Parallel C Preprocessor

    Directory of Open Access Journals (Sweden)

    Eugene D. Brooks III

    1992-01-01

    We describe a parallel extension of the C programming language designed for multiprocessors that provide a facility for sharing memory between processors. The programming model was initially developed on conventional shared memory machines with small processor counts such as the Sequent Balance and Alliant FX/8, but has more recently been used on a scalable massively parallel machine, the BBN TC2000. The programming model is split-join rather than fork-join. Concurrency is exploited to use a fixed number of processors more efficiently rather than to exploit more processors as in the fork-join model. Team splitting, a mechanism to split the team of processors executing a code into subteams to handle parallel subtasks, is used to provide an efficient mechanism to exploit nested concurrency. We have found the split-join programming model to have an inherent implementation advantage, compared to the fork-join model, when the number of processors in a machine becomes large.
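
    The split-join model is easy to contrast with fork-join in a sketch: a fixed team of workers is created once and executes the entire program, and "team splitting" partitions the team into subteams for parallel subtasks instead of forking new processes. The Python below is a conceptual analogue, not the paper's C extension.

        # Split-join sketch: a fixed team executes the whole program; team
        # splitting assigns rank ranges to subteams for parallel subtasks.
        from multiprocessing import Process, Barrier

        TEAM = 4

        def program(rank, barrier):
            # every team member executes the same program text (split model)
            print(f"member {rank}: serial section")
            barrier.wait()
            # team splitting: members 0-1 form subteam A, members 2-3 subteam B
            if rank < TEAM // 2:
                print(f"member {rank}: subteam A works on subtask 1")
            else:
                print(f"member {rank}: subteam B works on subtask 2")
            barrier.wait()                      # subteams rejoin the full team
            print(f"member {rank}: serial section again")

        if __name__ == "__main__":
            barrier = Barrier(TEAM)
            team = [Process(target=program, args=(r, barrier)) for r in range(TEAM)]
            for p in team: p.start()
            for p in team: p.join()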

  16. PLUTONIUM/HIGH-LEVEL VITRIFIED WASTE BDBE DOSE CALCULATION

    Energy Technology Data Exchange (ETDEWEB)

    D.C. Richardson

    2003-03-19

    In accordance with the Nuclear Waste Policy Amendments Act of 1987, Yucca Mountain was designated as the site to be investigated as a potential repository for the disposal of high-level radioactive waste. The Yucca Mountain site is an undeveloped area located on the southwestern edge of the Nevada Test Site (NTS), about 100 miles northwest of Las Vegas. The site currently lacks rail service or an existing right-of-way. If the Yucca Mountain site is found suitable for the repository, rail service is desirable to the Office of Civilian Radioactive Waste Management (OCRWM) Program because of the potential of rail transportation to reduce costs and to reduce the number of shipments relative to highway transportation. A Preliminary Rail Access Study evaluated 13 potential rail spur options. Alternative routes within the major options were also developed. Each of these options was then evaluated for potential land use conflicts and access to regional rail carriers. Three potential routes having few land use conflicts and having access to regional carriers were recommended for further investigation. Figure 1-1 shows these three routes. The Jean route is estimated to be about 120 miles long, the Carlin route about 365 miles long, and the Caliente route about 365 miles long. The remaining ten routes continue to be monitored and, should any of the present conflicts change, a re-evaluation of that route will be made. Complete details of the evaluation of the 13 routes can be found in the previous study. The DOE has not identified any preferred route and recognizes that the transportation issues need a full and open treatment under the National Environmental Policy Act. The issue of transportation will be included in public hearings to support development of the Environmental Impact Statement (EIS) proceedings for either the Monitored Retrievable Storage Facility or the Yucca Mountain Project or both.

  17. Engineering neural systems for high-level problem solving.

    Science.gov (United States)

    Sylvester, Jared; Reggia, James

    2016-07-01

    There is a long-standing, sometimes contentious debate in AI concerning the relative merits of a symbolic, top-down approach vs. a neural, bottom-up approach to engineering intelligent machine behaviors. While neurocomputational methods excel at lower-level cognitive tasks (incremental learning for pattern classification, low-level sensorimotor control, fault tolerance and processing of noisy data, etc.), they are largely non-competitive with top-down symbolic methods for tasks involving high-level cognitive problem solving (goal-directed reasoning, metacognition, planning, etc.). Here we take a step towards addressing this limitation by developing a purely neural framework named galis. Our goal in this work is to integrate top-down (non-symbolic) control of a neural network system with more traditional bottom-up neural computations. galis is based on attractor networks that can be "programmed" with temporal sequences of hand-crafted instructions that control problem solving by gating the activity retention of, communication between, and learning done by other neural networks. We demonstrate the effectiveness of this approach by showing that it can be applied successfully to solve sequential card matching problems, using both human performance and a top-down symbolic algorithm as experimental controls. Solving this kind of problem makes use of top-down attention control and the binding together of visual features in ways that are easy for symbolic AI systems but not for neural networks to achieve. Our model can not only be instructed on how to solve card matching problems successfully, but its performance also qualitatively (and sometimes quantitatively) matches the performance of both human subjects that we had perform the same task and the top-down symbolic algorithm that we used as an experimental control. We conclude that the core principles underlying the galis framework provide a promising approach to engineering purely neurocomputational systems for problem

  18. Electric Grid Expansion Planning with High Levels of Variable Generation

    Energy Technology Data Exchange (ETDEWEB)

    Hadley, Stanton W. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); You, Shutang [Univ. of Tennessee, Knoxville, TN (United States); Shankar, Mallikarjun [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Liu, Yilu [Univ. of Tennessee, Knoxville, TN (United States); Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2016-02-01

    Renewables are taking a large proportion of generation capacity in U.S. power grids. As their randomness has increasing influence on power system operation, it is necessary to consider their impact on system expansion planning. To this end, this project studies the generation and transmission expansion co-optimization problem of the US Eastern Interconnection (EI) power grid with a high wind power penetration rate. In this project, the generation and transmission expansion problem for the EI system is modeled as a mixed-integer programming (MIP) problem. This study analyzed a time series creation method to capture the diversity of load and wind power across balancing regions in the EI system. The obtained time series can be easily introduced into the MIP co-optimization problem and then solved robustly through available MIP solvers. Simulation results show that the proposed time series generation method and the expansion co-optimization model can improve the expansion results significantly once the diversity of wind and load across EI regions is considered. The improved expansion plan that combines generation and transmission will aid system planners and policy makers in maximizing the social welfare. This study shows that modelling load and wind variations and diversities across balancing regions will produce significantly different expansion results compared with former studies. For example, if wind is modeled in more detail (by increasing the number of wind output levels) so that more wind blocks are considered in expansion planning, transmission expansion will be larger and the expansion timing will be earlier. Regarding generation expansion, more wind scenarios will slightly reduce wind generation expansion in the EI system and increase the expansion of other generation such as gas. Also, adopting detailed wind scenarios will reveal that it may be uneconomic to expand transmission networks for transmitting a large amount of wind power through a long distance.
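
    The co-optimization problem itself can be illustrated with a toy MIP: choose new wind and gas capacity plus one candidate transmission line to serve demand at minimum cost. The sketch below uses the PuLP modelling library; every cost, limit, and capacity factor is an invented illustrative number, not data from this study.

        # Toy generation/transmission expansion MIP: wind in region A, gas
        # and demand in region B, and a binary decision on an A->B line.
        from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, value

        prob = LpProblem("toy_expansion", LpMinimize)

        wind = LpVariable("wind_MW", lowBound=0)          # new wind in region A
        gas = LpVariable("gas_MW", lowBound=0)            # new gas in region B
        line = LpVariable("build_line", cat=LpBinary)     # A->B candidate line

        CAP_FACTOR = 0.35                                 # wind availability level
        LINE_CAP = 400                                    # MW rating if built
        DEMAND_B = 500                                    # MW peak in region B

        # objective: annualized capacity costs plus a fixed line cost
        prob += 15_000 * wind + 60_000 * gas + 2_000_000 * line

        # region B peak balance: local gas plus wind imported over the line
        prob += gas + CAP_FACTOR * wind >= DEMAND_B
        # wind can only reach B if the line is built, up to its rating
        prob += CAP_FACTOR * wind <= LINE_CAP * line

        prob.solve()
        print("wind:", value(wind), "gas:", value(gas), "line:", value(line))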

  19. Burning high-level TRU waste in fusion fission reactors

    National Research Council Canada - National Science Library

    Shen, Yaosong

    2016-01-01

    .... A new method of burning high-level transuranic (TRU) waste combined with Thorium–Uranium (Th–U) fuel in the subcritical reactors driven by external fusion neutron sources is proposed in this paper...

  20. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  1. Both the caspase CSP-1 and a caspase-independent pathway promote programmed cell death in parallel to the canonical pathway for apoptosis in Caenorhabditis elegans.

    Directory of Open Access Journals (Sweden)

    Daniel P Denning

    Caspases are cysteine proteases that can drive apoptosis in metazoans and have critical functions in the elimination of cells during development, the maintenance of tissue homeostasis, and responses to cellular damage. Although a growing body of research suggests that programmed cell death can occur in the absence of caspases, mammalian studies of caspase-independent apoptosis are confounded by the existence of at least seven caspase homologs that can function redundantly to promote cell death. Caspase-independent programmed cell death is also thought to occur in the invertebrate nematode Caenorhabditis elegans. The C. elegans genome contains four caspase genes (ced-3, csp-1, csp-2, and csp-3, of which only ced-3 has been demonstrated to promote apoptosis. Here, we show that CSP-1 is a pro-apoptotic caspase that promotes programmed cell death in a subset of cells fated to die during C. elegans embryogenesis. csp-1 is expressed robustly in late pachytene nuclei of the germline and is required maternally for its role in embryonic programmed cell deaths. Unlike CED-3, CSP-1 is not regulated by the APAF-1 homolog CED-4 or the BCL-2 homolog CED-9, revealing that csp-1 functions independently of the canonical genetic pathway for apoptosis. Previously we demonstrated that embryos lacking all four caspases can eliminate cells through an extrusion mechanism and that these cells are apoptotic. Extruded cells differ from cells that normally undergo programmed cell death not only by being extruded but also by not being engulfed by neighboring cells. In this study, we identify in csp-3; csp-1; csp-2 ced-3 quadruple mutants apoptotic cell corpses that fully resemble wild-type cell corpses: these caspase-deficient cell corpses are morphologically apoptotic, are not extruded, and are internalized by engulfing cells. We conclude that both caspase-dependent and caspase-independent pathways promote apoptotic programmed cell death and the phagocytosis of cell

  2. High-Level Waste System Process Interface Description

    Energy Technology Data Exchange (ETDEWEB)

    d'Entremont, P.D.

    1999-01-14

    The High-Level Waste System is a set of six different processes interconnected by pipelines. These processes function as one large treatment plant that receives, stores, and treats high-level wastes from various generators at SRS and converts them into forms suitable for final disposal. The three major forms are borosilicate glass, which will eventually be disposed of in a Federal Repository; Saltstone, to be buried on site; and treated water effluent that is released to the environment.

  3. Tank waste remediation system phase I high-level waste feed processability assessment report

    Energy Technology Data Exchange (ETDEWEB)

    Lambert, S.L.; Stegen, G.E., Westinghouse Hanford

    1996-08-01

    This report evaluates the effects of feed composition on the Phase I high-level waste immobilization process and interim storage facility requirements for the high-level waste glass. Several different Phase I staging (retrieval, blending, and pretreatment) scenarios were used to generate example feed compositions for glass formulations, testing, and glass sensitivity analysis. Glass models and data from laboratory glass studies were used to estimate achievable waste loading and corresponding glass volumes for various Phase I feeds. Key issues related to feed processability, feed composition uncertainty, and immobilization process technology are identified for future consideration in other tank waste disposal program activities.

  4. Lock-Free Parallel Access Collections

    Directory of Open Access Journals (Sweden)

    Bruce P. Lester

    2014-06-01

    All new computers have multicore processors. To exploit this hardware parallelism for improved performance, the predominant approach today is multithreading using shared variables and locks. This approach has potential data races that can create a nondeterministic program. This paper presents a promising new approach to parallel programming that is both lock-free and deterministic. The standard forall primitive for parallel execution of for-loop iterations is extended into a more highly structured primitive called a Parallel Operation (POP. Each parallel process created by a POP may read shared variables (or shared collections freely. Shared collections modified by a POP must be selected from a special set of predefined Parallel Access Collections (PAC. Each PAC has several Write Modes that govern parallel updates in a deterministic way. This paper presents an overview of a Prototype Library that implements this POP-PAC approach for the C++ language, including performance results for two benchmark parallel programs.
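
    The determinism argument can be sketched in a few lines: each worker of a Parallel Operation writes only private partial results, which are merged in a fixed order afterwards, so the outcome never depends on scheduling. The Python below illustrates the idea with an accumulate-style write mode; the names pop and body are invented, not the paper's C++ API.

        # Deterministic, lock-free parallel operation: workers produce
        # private partials; a fixed-order merge acts as the "write mode".
        from multiprocessing import Pool

        def body(chunk):
            """Loop body: returns a private partial update and never
            touches shared state directly, so there are no data races."""
            partial = {}
            for key, val in chunk:
                partial[key] = partial.get(key, 0) + val
            return partial

        def pop(data, workers=4):
            chunks = [data[i::workers] for i in range(workers)]
            with Pool(workers) as p:
                partials = p.map(body, chunks)      # parallel, no shared writes
            merged = {}
            for part in partials:                   # fixed merge order =>
                for k, v in part.items():           # deterministic result
                    merged[k] = merged.get(k, 0) + v
            return merged

        if __name__ == "__main__":
            data = [("a", 1), ("b", 2), ("a", 3), ("c", 4)] * 1000
            print(pop(data))                        # {'a': 4000, 'b': 2000, 'c': 4000}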

  5. Advanced High-Level Waste Glass Research and Development Plan

    Energy Technology Data Exchange (ETDEWEB)

    Peeler, David K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Vienna, John D. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Schweiger, Michael J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Fox, Kevin M. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL)

    2015-07-01

    The U.S. Department of Energy Office of River Protection (ORP) has implemented an integrated program to increase the loading of Hanford tank wastes in glass while meeting melter lifetime expectancies and process, regulatory, and product quality requirements. The integrated ORP program is focused on providing a technical, science-based foundation from which key decisions can be made regarding the successful operation of the Hanford Tank Waste Treatment and Immobilization Plant (WTP) facilities. The fundamental data stemming from this program will support development of advanced glass formulations, key process control models, and tactical processing strategies to ensure safe and successful operations for both the low-activity waste (LAW) and high-level waste (HLW) vitrification facilities with an appreciation toward reducing overall mission life. The purpose of this advanced HLW glass research and development plan is to identify the near-, mid-, and longer-term research and development activities required to develop and validate advanced HLW glasses and their associated models to support facility operations at WTP, including both direct feed and full pretreatment flowsheets. This plan also integrates technical support of facility operations and waste qualification activities to show the interdependence of these activities with the advanced waste glass (AWG) program to support the full WTP mission. Figure ES-1 shows these key ORP programmatic activities and their interfaces with both WTP facility operations and qualification needs. The plan is a living document that will be updated to reflect key advancements and mission strategy changes. The research outlined here is motivated by the potential for substantial economic benefits (e.g., significant increases in waste throughput and reductions in glass volumes) that will be realized when advancements in glass formulation continue and models supporting facility operations are implemented. Developing and applying advanced

  6. Easy and Effective Parallel Programmable ETL

    DEFF Research Database (Denmark)

    Thomsen, Christian; Pedersen, Torben Bach

    2011-01-01

    Extract–Transform–Load (ETL) programs are used to load data into data warehouses (DWs). An ETL program must extract data from sources, apply different transformations to it, and use the DW to look up/insert the data. An ETL program is both time consuming to develop and to run. It is, however, typically the case that the ETL program can exploit both task parallelism and data parallelism to run faster. This, on the other hand, makes the development time longer, as it is complex to create a parallel ETL program. To remedy this situation, we propose efficient ways to parallelize typical ETL tasks and we implement these new constructs in an ETL framework. The constructs are easy to apply and require only few modifications to an ETL program to parallelize it. They support both task and data parallelism and give the programmer different possibilities to choose from. An experimental evaluation...
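
    As an illustration of the data-parallelism half of this idea, the generic C++ sketch below partitions a row transformation across threads. It is not the authors' framework, only the underlying technique it packages.

```cpp
// Generic data-parallel ETL transform step: disjoint partitions of the
// rows are processed by separate threads, so no synchronization is needed.
#include <algorithm>
#include <cctype>
#include <string>
#include <thread>
#include <vector>

// The "transform" step of an ETL flow: normalize a field in each row.
static void transform_rows(std::vector<std::string>& rows, size_t begin, size_t end) {
    for (size_t i = begin; i < end; ++i)
        std::transform(rows[i].begin(), rows[i].end(), rows[i].begin(),
                       [](unsigned char c) { return std::toupper(c); });
}

int main() {
    std::vector<std::string> rows(100000, "customer_name");
    const size_t nthreads = 4, chunk = rows.size() / nthreads;

    std::vector<std::thread> pool;
    for (size_t t = 0; t < nthreads; ++t) {
        size_t b = t * chunk;
        size_t e = (t == nthreads - 1) ? rows.size() : b + chunk;
        pool.emplace_back(transform_rows, std::ref(rows), b, e); // data parallelism
    }
    for (auto& th : pool) th.join();
    return 0;
}
```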

  7. Evaluation of the FIR Example using Xilinx Vivado High-Level Synthesis Compiler

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Zheming [Argonne National Lab. (ANL), Argonne, IL (United States); Finkel, Hal [Argonne National Lab. (ANL), Argonne, IL (United States); Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Cappello, Franck [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-07-28

    Compared to central processing units (CPUs) and graphics processing units (GPUs), field programmable gate arrays (FPGAs) have major advantages in reconfigurability and performance achieved per watt. The FPGA development flow has been augmented with a high-level synthesis (HLS) flow that can convert programs written in a high-level programming language to a Hardware Description Language (HDL). Using high-level programming languages such as C, C++, and OpenCL for FPGA-based development could allow software developers, who have little FPGA knowledge, to take advantage of FPGA-based application acceleration. This improves developer productivity and makes FPGA-based acceleration accessible to hardware and software developers. The Xilinx Vivado HLS compiler is a high-level synthesis tool that enables C, C++ and SystemC specifications to be directly targeted into Xilinx FPGAs without the need to create RTL manually. The white paper [1] published recently by Xilinx uses a finite impulse response (FIR) example to demonstrate the variable-precision features in the Vivado HLS compiler and the resource and power benefits of converting a design from floating point to fixed point. To get a better understanding of the variable-precision features in terms of resource usage and performance, this report presents the experimental results of evaluating the FIR example using Vivado HLS 2017.1 and a Kintex UltraScale FPGA. In addition, we evaluated the half-precision floating-point data type against the double-precision and single-precision data types and present the detailed results.
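
    For context, a minimal FIR kernel in the Vivado HLS C++ style is sketched below. It uses the ap_fixed arbitrary-precision type on which the variable-precision comparison rests; the tap count and bit widths are assumptions chosen for the sketch, not the white paper's configuration.

```cpp
// Illustrative FIR in Vivado HLS C++ style (not the white paper's exact
// design). ap_fixed<16,2> means 16 total bits with 2 integer bits; these
// widths and the tap count are assumptions made for this sketch.
#include <ap_fixed.h>

typedef ap_fixed<16, 2> data_t;
typedef ap_fixed<16, 2> coef_t;
typedef ap_fixed<32, 8> acc_t;   // wider accumulator to absorb bit growth

#define N_TAPS 16

data_t fir(data_t x, const coef_t c[N_TAPS]) {
    static data_t shift_reg[N_TAPS]; // sample history kept across calls
    acc_t acc = 0;

Shift_Accum_Loop:
    for (int i = N_TAPS - 1; i >= 0; i--) {
#pragma HLS PIPELINE II=1
        if (i == 0) {
            acc += x * c[0];
            shift_reg[0] = x;
        } else {
            shift_reg[i] = shift_reg[i - 1];   // shift the delay line
            acc += shift_reg[i] * c[i];        // multiply-accumulate
        }
    }
    return (data_t)acc;
}
```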

  8. Cascade Boosting-Based Object Detection from High-Level Description to Hardware Implementation

    Directory of Open Access Journals (Sweden)

    Khattab K

    2009-01-01

    Full Text Available Object detection forms the first step of a larger setup for a wide variety of computer vision applications. The focus of this paper is the implementation of a real-time embedded object detection system while relying on a high-level description language such as SystemC. Boosting-based object detection algorithms are among the fastest accurate object detection algorithms known today. However, the implementation of a real-time solution for such algorithms is still a challenge. A new parallel implementation, which exploits the parallelism and the pipelining in these algorithms, is proposed. We show that using a SystemC description model paired with a mainstream automatic synthesis tool can lead to an efficient embedded implementation. We also discuss some of the tradeoffs and considerations required for this implementation to be effective. The implementation proves capable of achieving 42 fps while bringing regularity to the processing time.

  9. Parallel Execution of Prolog on Shared-Memory Multiprocessors

    Institute of Scientific and Technical Information of China (English)

    高耀清; 王鼎兴; et al.

    1993-01-01

    Logic programs offer many opportunities for the exploitation of parallelism.But the parallel execution of a task incurs various overheads.This paper focuses on the issues relevant to parallelizing Prolog on shared-memory multiprocessors efficiently.

  10. Compiler Technology for Parallel Scientific Computation

    Directory of Open Access Journals (Sweden)

    Can Özturan

    1994-01-01

    Full Text Available There is a need for compiler technology that, given the source program, will generate efficient parallel codes for different architectures with minimal user involvement. Parallel computation is becoming indispensable in solving large-scale problems in science and engineering. Yet, the use of parallel computation is limited by the high costs of developing the needed software. To overcome this difficulty we advocate a comprehensive approach to the development of scalable architecture-independent software for scientific computation based on our experience with equational programming language (EPL. Our approach is based on a program decomposition, parallel code synthesis, and run-time support for parallel scientific computation. The program decomposition is guided by the source program annotations provided by the user. The synthesis of parallel code is based on configurations that describe the overall computation as a set of interacting components. Run-time support is provided by the compiler-generated code that redistributes computation and data during object program execution. The generated parallel code is optimized using techniques of data alignment, operator placement, wavefront determination, and memory optimization. In this article we discuss annotations, configurations, parallel code generation, and run-time support suitable for parallel programs written in the functional parallel programming language EPL and in Fortran.

  11. Process Design Concepts for Stabilization of High Level Waste Calcine

    Energy Technology Data Exchange (ETDEWEB)

    T. R. Thomas; A. K. Herbst

    2005-06-01

    The current baseline assumption is that packaging "as is" and direct disposal of high level waste (HLW) calcine in a Monitored Geologic Repository will be allowed. The fall-back position is to develop a stabilized waste form for the HLW calcine that will meet the repository waste acceptance criteria currently in place, in case regulatory initiatives are unsuccessful. A decision between direct disposal and a stabilization alternative is anticipated by June 2006. The purposes of this Engineering Design File (EDF) are to provide a pre-conceptual design of three low-temperature processes under development for stabilization of high level waste calcine (i.e., the grout, hydroceramic grout, and iron phosphate ceramic processes) and to support a down-selection among the three candidates. The key assumptions for the pre-conceptual design assessment are that a) a waste treatment plant would operate over eight years for 200 days a year, b) a design processing rate of 3.67 m3/day or 4670 kg/day of HLW calcine would be needed, c) the performance of the waste form would remove the HLW calcine from the hazardous waste category, and d) the waste form loadings would range from about 21-25 wt% calcine. The conclusions of this EDF study are that: (a) To date, the grout formulation appears to be the best candidate stabilizer among the three being tested for HLW calcine and appears to be the easiest to mix, pour, and cure. (b) Only minor differences would exist between the process steps of the grout and hydroceramic grout stabilization processes. If temperature control of the mixer at about 80°C is required, it would add a major level of complexity to the iron phosphate stabilization process. (c) It is too early in the development program to determine which stabilizer will produce the minimum amount of stabilized waste form for the entire HLW inventory, but the volume is assumed to be within the range of 12,250 to 14,470 m3. (d) The stacked vessel height of the hot process vessels

  12. Parallel Programming Language with Shared Variable Holding Declaration

    Institute of Scientific and Technical Information of China (English)

    汪晨; 张昱; 付小朋; 张伟

    2011-01-01

    Today's parallel programming practice mostly uses locks to synchronize access to shared resources, which is difficult and error-prone. The newly introduced atomic section construct simplifies programming, but the software and hardware techniques supporting its implementation are not yet satisfactory. This paper proposes a new language-level abstraction for synchronization, the holding declaration of shared variables. It allows programmers to declare, from the local viewpoint of a thread, the current thread's holding requirement on a shared variable s: that is, to declare that no other thread may access s between the current thread's previous access to s and its present access. Programmers therefore need not consider how to synchronize access to shared variables with concrete mechanisms such as locks, and some problems faced by atomic sections can be avoided or solved. The paper presents the syntax and semantics of the shared variable holding declaration and discusses methods for generating shared-variable access control code from such declaration information.
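
    A small C++ sketch of what such a declaration guarantees may be useful. The paper's construct is language-level; the lock-based code below only approximates the access-control code a compiler could generate from a holding declaration on s.

```cpp
// What a "holding declaration" on shared variable s guarantees, expressed
// with plain C++ locking: no other thread may touch s between this thread's
// previous access and its current one. The declaration itself is a language
// construct; this guard is just one possible generated implementation.
#include <mutex>
#include <thread>

int s = 0;              // shared variable
std::mutex s_guard;     // generated access-control guard for s

void writer() {
    // Region corresponding to "hold s from first to last access".
    std::lock_guard<std::mutex> hold(s_guard);
    int t = s;          // previous access to s
    // ... no other thread can access s in between ...
    s = t + 1;          // current access to s
}

int main() {
    std::thread t1(writer), t2(writer);
    t1.join(); t2.join();
    return s != 2;      // always 2: the two increments cannot interleave
}
```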

  13. SymexTRON: Symbolic Execution of High-Level Transformation Languages

    DEFF Research Database (Denmark)

    Al-Sibahi, Ahmad Salim; Dimovski, Aleksandar; Wasowski, Andrzej

    2016-01-01

    Transformations form an important part of developing domain specific languages, where they are used to provide semantics for typing and evaluation. Yet, few solutions exist for verifying transformations written in expressive high-level transformation languages. We take a step towards that goal, by developing a general symbolic execution technique that handles programs written in these high-level transformation languages. We use logical constraints to describe structured symbolic values, including containment, acyclicity, simple unordered collections (sets) and to handle deep type-based querying of syntax hierarchies. We evaluate this symbolic execution technique on a collection of refactoring and model transformation programs, showing that the white-box test generation tool based on symbolic execution obtains better code coverage than a black box test generator for such programs in almost all...

  14. Synchronizing Parallel Tasks Using STM

    Directory of Open Access Journals (Sweden)

    Ryan Saptarshi Ray

    2015-03-01

    Full Text Available The past few years have marked the start of a historic transition from sequential to parallel computation. The necessity to write parallel programs is increasing as systems are getting more complex while processor speed increases are slowing down. Current parallel programming uses low-level programming constructs like threads and explicit synchronization using locks to coordinate thread execution. Parallel programs written with these constructs are difficult to design, program and debug. Also, locks have many drawbacks which make them a suboptimal solution. One such drawback is that locks should be used to enclose only the critical section of the parallel-processing code: if locks are used to enclose the entire code, the performance of the code drastically decreases. Software Transactional Memory (STM) is a promising new approach to programming shared-memory parallel processors. It is a concurrency control mechanism that is widely considered to be easier to use by programmers than locking. It allows portions of a program to execute in isolation, without regard to other, concurrently executing tasks. A programmer can reason about the correctness of code within a transaction and need not worry about complex interactions with other, concurrently executing parts of the program. If STM is used to enclose the entire code, the performance is the same as when STM encloses only the critical section, and far better than when locks enclose the entire code. So STM is easier to use than locks, as the critical section does not need to be identified. This paper shows the concept of writing code using Software Transactional Memory (STM) and compares the performance of code using locks with code using STM. It also shows why the use of STM in parallel-processing code is better than the use of locks.
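
    To make the lock-granularity point concrete, here is a small C++ sketch (not taken from the paper): only the shared update sits inside the critical section, while each thread's local computation runs unlocked.

```cpp
// Locking granularity: the local computation runs outside any lock, and
// only the shared update is enclosed in the critical section. Putting the
// whole loop under the lock would serialize the threads.
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

std::mutex m;
long total = 0;

void worker(const std::vector<long>& chunk) {
    long local = 0;
    for (long v : chunk) local += v * v;   // unlocked local work
    std::lock_guard<std::mutex> lock(m);   // critical section only
    total += local;
}

int main() {
    std::vector<long> a(1000), b(1000);
    std::iota(a.begin(), a.end(), 0);
    std::iota(b.begin(), b.end(), 1000);
    std::thread t1(worker, std::cref(a)), t2(worker, std::cref(b));
    t1.join(); t2.join();
    return 0;
}
```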

  15. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    Full Text Available James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship.   ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  16. High-level waste processing at the Savannah River Site: An update

    Energy Technology Data Exchange (ETDEWEB)

    Marra, J.E.; Bennett, W.M.; Elder, H.H.; Lee, E.D.; Marra, S.L.; Rutland, P.L.

    1997-09-01

    The Defense Waste Processing Facility (DWPF) at the Savannah River Site (SRS) in Aiken, SC began immobilizing high-level radioactive waste in borosilicate glass in 1996. Currently, the radioactive glass is being produced as a "sludge-only" composition by combining washed high-level waste sludge with glass frit. The glass is poured into stainless steel canisters which will eventually be disposed of in a permanent geological repository. To date, DWPF has produced about 100 canisters of vitrified waste. Future processing operations will be based on a "coupled" feed of washed high-level waste sludge, precipitated cesium, and glass frit. This paper provides an update of the processing activities completed to date, operational/flowsheet problems encountered, and programs underway to increase production rates.

  17. High level resistance to aminoglycosides in enterococci from Riyadh.

    Science.gov (United States)

    Al-Ballaa, S R; Qadri, S M; Al-Ballaa, S R; Kambal, A M; Saldin, H; Al-Qatary, K

    1994-07-01

    Enterococci with high-level aminoglycoside resistance are being reported from different parts of the world with increasing frequency. Treatment of infections caused by such isolates is associated with a high incidence of failure or relapse. This is attributed to the loss of the synergistic effect of aminoglycosides and cell-wall-active agents against isolates exhibiting this type of resistance. To determine the prevalence of enterococci with high-level resistance to aminoglycosides in Riyadh, Saudi Arabia, 241 distinct clinical isolates were examined by the disk diffusion method using high-content aminoglycoside disks. Seventy-four isolates (30%) were resistant to one or more of the aminoglycosides tested. The most common pattern of resistance was that to streptomycin and kanamycin. Of the 241 isolates tested, 29 (12%) were resistant to high levels of gentamicin, 35 (15%) to tobramycin, 65 (27%) to kanamycin and 53 (22%) to streptomycin. The highest rate of resistance to a high level of gentamicin was found among enterococcal blood isolates (30%). Eighteen of the isolates were identified as Enterococcus faecium; 13 (72%) of these showed high-level resistance to two or more of the aminoglycosides tested.

  18. Static Data Race Detection for X10 Parallel Programs

    Institute of Scientific and Technical Information of China (English)

    王旭; 陈雨亭

    2012-01-01

    A multi-threaded program can contain a data race when two or more threads access the same memory location under no ordering constraints and at least one access is a write operation. The existence of data races can lead to many kinds of harmful program behaviors, including determinism violations, corrupted memory, and so on. This paper proposes a new algorithm for static detection of data races in X10 parallel programs, consisting of four stages: computing source access pairs, computing reachable access pairs, computing clock-synchronized access pairs, and escape analysis of the access pairs. The essential idea of the approach is to construct the call graph of the program on top of the WALA framework and then compute the source access pairs from which potential data races are detected. Experimental results show that the algorithm performs well and can find and report potential data races cost-effectively, without significantly increasing the overall running time of the X10 program.
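
    A minimal example of the kind of race such an analysis reports is sketched below in C++ (the paper targets X10; C++ is used here purely for illustration).

```cpp
// A data race in its simplest form: two threads write the same memory
// location with no ordering constraint between them, so the final value
// depends on scheduling. This is exactly the unordered access pair a
// static race detector is meant to flag.
#include <iostream>
#include <thread>

int shared = 0; // accessed by both threads; both accesses are writes

int main() {
    std::thread t1([] { shared = 1; });   // unsynchronized write
    std::thread t2([] { shared = 2; });   // unsynchronized write
    t1.join(); t2.join();
    std::cout << shared << "\n";          // nondeterministic: 1 or 2
    return 0;
}
```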

  19. Study on Graphical Parallel Program Development System GPPDS

    Institute of Scientific and Technical Information of China (English)

    郑利平; 刘晓平

    2008-01-01

    Writing, debugging, and analyzing the performance of parallel programs is complex and difficult, which greatly hinders the adoption of parallel computing. A reasonably complete parallel program development environment is therefore needed to help developers write parallel programs, monitor program execution, and analyze program performance. Targeting message-passing parallel programs, this paper studies and develops a graphical parallel program development system, GPPDS, which includes a graphical parallel program development module, a remote submission and compilation module, a performance data monitoring module, and a performance visualization module. GPPDS is a lightweight development environment whose functionality is practical, simple, and easy to use.

  20. An analysis of the technical status of high level radioactive waste and spent fuel management systems

    Science.gov (United States)

    English, T.; Miller, C.; Bullard, E.; Campbell, R.; Chockie, A.; Divita, E.; Douthitt, C.; Edelson, E.; Lees, L.

    1977-01-01

    The technical status of the old U.S. mainline program for high level radioactive nuclear waste management and the newly developing program for disposal of unreprocessed spent fuel were assessed. The method of long term containment for both of these waste forms is considered to be deep geologic isolation in bedded salt. Each major component of both waste management systems is analyzed in terms of its scientific feasibility, technical achievability and engineering achievability. The resulting matrix leads to a systematic identification of major unresolved technical or scientific questions and/or gaps in these programs.

  1. Building high-level features using large scale unsupervised learning

    CERN Document Server

    Le, Quoc V; Devin, Matthieu; Corrado, Greg; Chen, Kai; Ranzato, Marc'Aurelio; Dean, Jeff; Ng, Andrew Y

    2011-01-01

    We consider the problem of building detectors for high-level concepts using only unsupervised feature learning. For example, we would like to understand if it is possible to learn a face detector using only unlabeled images downloaded from the internet. To answer this question, we trained a simple feature learning algorithm on a large dataset of images (10 million images, each image is 200x200). The simulation is performed on a cluster of 1000 machines with fast network hardware for one week. Extensive experimental results reveal surprising evidence that such high-level concepts can indeed be learned using only unlabeled data and a simple learning algorithm.

  2. Sterilization, high-level disinfection, and environmental cleaning.

    Science.gov (United States)

    Rutala, William A; Weber, David J

    2011-03-01

    Failure to perform proper disinfection and sterilization of medical devices may lead to introduction of pathogens, resulting in infection. New techniques have been developed for achieving high-level disinfection and adequate environmental cleanliness. This article examines new technologies for sterilization and high-level disinfection of critical and semicritical items, respectively, and because semicritical items carry the greatest risk of infection, the authors discuss reprocessing semicritical items such as endoscopes and automated endoscope reprocessors, endocavitary probes, prostate biopsy probes, tonometers, laryngoscopes, and infrared coagulation devices. In addition, current issues and practices associated with environmental cleaning are reviewed.

  3. High Level Waste (HLW) Feed Process Control Strategy

    Energy Technology Data Exchange (ETDEWEB)

    STAEHR, T.W.

    2000-06-14

    The primary purpose of this document is to describe the overall process control strategy for monitoring and controlling the functions associated with the Phase 1B high-level waste feed delivery. This document provides the basis for process monitoring and control functions and requirements needed throughout the double-shell tank system during Phase 1 high-level waste feed delivery. This document is intended to be used by (1) the developers of the future Process Control Plan and (2) the developers of the monitoring and control system.

  4. Final report on cermet high-level waste forms

    Energy Technology Data Exchange (ETDEWEB)

    Kobisk, E.H.; Quinby, T.C.; Aaron, W.S.

    1981-08-01

    Cermets are being developed as an alternate method for the fixation of defense and commercial high level radioactive waste in a terminal disposal form. Following initial feasibility assessments of this waste form, consisting of ceramic particles dispersed in an iron-nickel base alloy, significantly improved processing methods were developed. The characterization of cermets has continued through property determinations on samples prepared by various methods from a variety of simulated and actual high-level wastes. This report describes the status of development of the cermet waste form as it has evolved since 1977. 6 tables, 18 figures.

  5. Parallelizing Data-Centric Programs

    Science.gov (United States)

    2013-09-25

    manner. We focused on coordination related to data of some sort, for example the location of a joint trip or a joint tactical strike; we call this...

  6. Research of Parallel Programming Techniques of Hierarchical Model Based on SMP Clusters

    Institute of Scientific and Technical Information of China (English)

    祝永志; 张丹丹; 曹宝香; 禹继国

    2012-01-01

    For multi-core SMP cluster systems, this paper discusses hybrid parallel programming techniques based on MPI and OpenMP. We propose a new hierarchical hybrid parallel programming method that is aware of the architecture hierarchy of SMP cluster systems. We design a hierarchical parallel algorithm for the N-body problem and compare its performance with traditional hybrid parallel algorithms on the Dawning 5000A cluster. The results indicate that the hierarchical hybrid parallel algorithm has better scalability and speedup than the others.
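
    A hedged sketch of the hybrid pattern follows: one MPI process per node combined with OpenMP threads across each node's cores. This shows the general MPI+OpenMP technique, not the paper's N-body code.

```cpp
// MPI+OpenMP hybrid pattern: coarse-grain parallelism across nodes via
// MPI processes, fine-grain parallelism within a node via OpenMP threads.
#include <mpi.h>
#include <omp.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long N = 1 << 20;
    double local = 0.0;

    // Inside the node: OpenMP threads share this process's iterations.
    #pragma omp parallel for reduction(+ : local)
    for (long i = rank; i < N; i += nprocs)   // cyclic split across processes
        local += 1.0 / (1.0 + i);

    // Across nodes: combine the per-process partial sums with MPI.
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("sum = %f\n", global);
    MPI_Finalize();
    return 0;
}
```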

  7. Typewriter Modifications for Persons Who Are High-Level Quadriplegics.

    Science.gov (United States)

    O'Reagan, James R.; And Others

    Standard, common electric typewriters are not completely suited to the needs of a high-level quadriplegic typing with a mouthstick. Experiences show that for complete control of a typewriter a mouthstick user needs the combined features of one-button correction, electric forward and reverse indexing, and easy character viewing. To modify a…

  8. Site suitability criteria for solidified high level waste repositories

    Energy Technology Data Exchange (ETDEWEB)

    Heckman, R.A.; Holdsworth, T.; Towse, D.F.

    1979-03-07

    Activities devoted to development of regulations, criteria, and standards for storage of solidified high-level radioactive wastes are reported. The work is summarized in sections on site suitability regulations, risk calculations, geological models, aquifer models, human usage model, climatology model, and repository characteristics. Proposed additional analytical work is also summarized. (JRD)

  9. High-Level Overview of Data Needs for RE Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Lopez, Anthony

    2016-12-22

    This presentation provides a high-level overview of analysis topics and associated data needs. Types of renewable energy analysis are grouped into two buckets: first, analysis of renewable energy potential, and second, analysis for other goals. Data requirements are similar, and they build upon one another.

  10. Reachability Trees for High-level Petri Nets

    DEFF Research Database (Denmark)

    Jensen, Kurt; Jensen, Arne M.; Jepsen, Leif Obel;

    1986-01-01

    the necessary analysis methods. In other papers it is shown how to generalize the concept of place- and transition invariants from place/transition nets to high-level Petri nets. Our present paper contributes to this with a generalization of reachability trees, which is one of the other important analysis...

  11. High-level expression, purification, polyclonal antibody preparation ...

    African Journals Online (AJOL)

    user

    2011-02-14

    Feb 14, 2011 ... Full Length Research Paper. High-level expression ... resistance severely compromises effective therapeutic options. ... In the present study, we first report the expression of the oprD ... databases of National Center for Biotechnology Information (NCBI) ...

  12. Murine erythrocytes contain high levels of lysophospholipase activity

    NARCIS (Netherlands)

    Kamp, J.A.F. op den; Roelofsen, B.; Sanderink, G.; Middelkoop, E.; Hamer, R.

    1984-01-01

    Murine erythrocytes were found to be unique in the high levels of lysophospholipase activity in the cytosol of these cells. The specific activity of the enzyme in the cytosol of the murine cells is 10-times higher than in the cytosol of rabbit erythrocytes and approximately three orders of magnitude

  13. Mercury Phase II Study - Mercury Behavior across the High-Level Waste Evaporator System

    Energy Technology Data Exchange (ETDEWEB)

    Bannochie, C. J. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Crawford, C. L. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Jackson, D. G. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL); Shah, H. B. [Savannah River Remediation, LLC., Aiken, SC (United States); Jain, V. [Savannah River Remediation, LLC., Aiken, SC (United States); Occhipinti, J. E. [Savannah River Remediation, LLC., Aiken, SC (United States); Wilmarth, W. R. [Savannah River Site (SRS), Aiken, SC (United States). Savannah River National Lab. (SRNL)

    2016-06-17

    The Mercury Program team’s effort continues to develop more fundamental information concerning mercury behavior across the liquid waste facilities and unit operations. Previously, the team examined the mercury chemistry across salt processing, including the Actinide Removal Process/Modular Caustic Side Solvent Extraction Unit (ARP/MCU), and the Defense Waste Processing Facility (DWPF) flowsheets. This report documents the data and understanding of mercury across the high level waste 2H and 3H evaporator systems.

  14. Parallel programming of heterogeneous system based on OpenCL

    Institute of Scientific and Technical Information of China (English)

    詹云; 赵新灿; 谭同德

    2012-01-01

    Aiming at the low utilization of heterogeneous processors in traditional general-purpose computing, a new general-purpose computing technology based on OpenCL (Open Computing Language) is proposed, providing a unified programming model. OpenCL's features, architecture, and implementation principles are described, and performance optimization strategies for OpenCL are proposed. OpenCL is then compared with CUDA (Compute Unified Device Architecture) and other general-purpose computing technologies. The comparison shows that OpenCL can fully exploit the performance potential of the various processors in a heterogeneous platform and distribute tasks reasonably, providing a powerful new tool for large-scale parallel computing.
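
    As a minimal illustration of the unified model (not code from the paper), the host sketch below compiles one kernel source string and runs it on whatever conforming OpenCL device is found first; error checking and resource release are omitted for brevity.

```cpp
// Minimal OpenCL host sketch: the same kernel source can target any
// conforming CPU or GPU device, which is the "unified model" point above.
#include <CL/cl.h>
#include <cstdio>

const char* src =
    "__kernel void add(__global const float* a, __global const float* b,"
    "                  __global float* c) {"
    "    int i = get_global_id(0);"
    "    c[i] = a[i] + b[i];"
    "}";

int main() {
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, nullptr);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &dev, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, nullptr);

    float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, nullptr);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, nullptr);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, nullptr, nullptr);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, nullptr, nullptr);
    clBuildProgram(prog, 1, &dev, nullptr, nullptr, nullptr);
    cl_kernel k = clCreateKernel(prog, "add", nullptr);
    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);

    size_t global = 4;                       // one work-item per element
    clEnqueueNDRangeKernel(q, k, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, nullptr, nullptr);
    for (float v : c) std::printf("%g ", v);  // 11 22 33 44
    std::printf("\n");
    return 0;
}
```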

  15. RETENTION OF SULFATE IN HIGH LEVEL RADIOACTIVE WASTE GLASS

    Energy Technology Data Exchange (ETDEWEB)

    Fox, K.

    2010-09-07

    High level radioactive wastes are being vitrified at the Savannah River Site for long term disposal. Many of the wastes contain sulfate at concentrations that can be difficult to retain in borosilicate glass. This study involves efforts to optimize the composition of a glass frit for combination with the waste to improve sulfate retention while meeting other process and product performance constraints. The fabrication and characterization of several series of simulated waste glasses are described. The experiments are detailed chronologically, to provide insight into part of the engineering studies used in developing frit compositions for an operating high level waste vitrification facility. The results lead to the recommendation of a specific frit composition and a concentration limit for sulfate in the glass for the next batch of sludge to be processed at Savannah River.

  16. Storage of High Level Nuclear Waste in Germany

    Directory of Open Access Journals (Sweden)

    Dietmar P. F. Möller

    2007-01-01

    Full Text Available Nuclear energy is very often used to generate electricity. But first the energy must be released from atoms, which can be done in two ways: nuclear fusion and nuclear fission. Nuclear power plants use nuclear fission to produce electrical energy. The electrical energy generated in nuclear power plants is produced without polluting combustion gases, an important fact that could play a key role in helping to reduce global greenhouse gas emissions and in tackling global warming, especially as electricity demand rises in the years ahead. This could be seen as an ideal win-win situation, but the reverse side of the medal is that the production of high-level nuclear waste outweighs this advantage. Hence the paper attempts to highlight possible state-of-the-art concepts for the safe and sustained storage of high-level nuclear waste in Germany.

  17. Web Based Technologies to Support High Level Process Maturity

    Directory of Open Access Journals (Sweden)

    A. V. Sharmila

    2013-07-01

    Full Text Available This paper discusses the use of web-based technologies to support high-level process maturity in an organization. It also provides an overview of CMMI, focusing on the importance of centralized data storage and data access for sustaining high maturity levels of CMMI. Further, it elaborates on web-based technology, stressing that a changeover to web-based applications is extremely helpful for maintaining a centralized data repository, collecting data for process capability baselines, and tracking process performance management, with reduced maintenance effort and ease of data access. A case study analysis of the advantages of adopting web-based technology is also narrated. Finally, the paper concludes that high-level process maturity can be sustained by adopting web application technology.

  18. VHDL Specification Methodology from High-level Specification

    Directory of Open Access Journals (Sweden)

    M. Benmohammed

    2005-01-01

    Full Text Available Design complexity has been increasing exponentially over the last decade. In order to cope with such an increase and to keep up designers' productivity, higher-level specifications were required. Moreover, new synthesis systems, starting from a high-level specification, have been developed in order to automate and speed up processor design. This study presents a VHDL specification methodology aimed at extending structured design methodologies to the behavioral level. The goal is to develop VHDL modeling strategies in order to master the design and analysis of large and complex systems. Structured design methodologies are combined with a high-level synthesis system, a VHDL-based behavioral synthesis tool, in order to allow hierarchical design and component re-use.

  19. Performance Analysis of Parallel Programs Based on the CC-NUMA System Simulator

    Institute of Scientific and Technical Information of China (English)

    陈渝; 庞立会; 杨学军; 陈福接

    2001-01-01

    According to the characteristics of CC-NUMA parallel systems, this paper describes the design and implementation of a simulator, AMY, which runs under the Linux operating system on common x86 PCs. AMY adopts many optimization techniques, so it can report the time overhead of parallel programs and the statistics of the CC-NUMA parallel system quickly and accurately, with little memory overhead. Based on the execution of several typical parallel benchmarks on AMY, this paper presents the simulation results, analyzes the behavior and overhead of the parallel benchmarks, and concludes with helpful guidelines for optimizing the performance of parallel programs on CC-NUMA parallel systems.

  20. Software For Diagnosis Of Parallel Processing

    Science.gov (United States)

    Hontalas, Philip; Yan, Jerry; Fineman, Charles

    1995-01-01

    Ames Instrumentation System (AIMS) computer program package of software tools for measuring and analyzing performance of parallel-processing application programs. Helps programmer to debug and refine, and to monitor and visualize execution of, parallel-processing application software for Intel iPSC/860 (or equivalent) multicomputer. Performance data collected are displayed graphically on computer workstations supporting X-Windows.

  1. WAPM: A parallel programming model in large-scale Internet distributed computing environments

    Institute of Scientific and Technical Information of China (English)

    付崇国; 徐胜超

    2009-01-01

    Early programming models such as MPI and OpenMP are not suitable for large-scale, dynamic, wide-area Internet environments because of scalability limits or differences in parallel granularity. This paper proposes a Wide Area Programming Model (WAPM) for parallel programming across wide-area networks, providing a new feasible solution for programming distributed computing applications. WAPM consists of three modules: a communication library, communication protocols, and an application programming interface, and it features general programmability, adaptive parallelism, and fault tolerance. By choosing a suitable programming language, it can form a wide-area parallel programming environment. Taking the distributed computing platform P2HP as the working platform, the implementation of WAPM distributed computing is described. In order to verify the efficiency of WAPM, a series of simulation experiments were performed. The results obtained from performance analysis show that WAPM is a general, feasible programming model with good performance.

  2. Case for retrievable high-level nuclear waste disposal

    Science.gov (United States)

    Roseboom, Eugene H.

    1994-01-01

    Plans for the nation's first high-level nuclear waste repository have called for permanently closing and sealing the repository soon after it is filled. However, the hydrologic environment of the proposed site at Yucca Mountain, Nevada, should allow the repository to be kept open and the waste retrievable indefinitely. This would allow direct monitoring of the repository and maintain the options for future generations to improve upon the disposal methods or use the uranium in the spent fuel as an energy resource.

  3. Online pattern recognition for the ALICE high level trigger

    CERN Document Server

    Bramm, R; Lien, J A; Lindenstruth, V; Loizides, C; Röhrich, D; Skaali, B; Steinbeck, T M; Stock, Reinhard; Ullaland, K; Vestbø, A S; Wiebalck, A

    2003-01-01

    The ALICE High Level Trigger system needs to reconstruct events online at high data rates. Focusing on the Time Projection Chamber we present two pattern recognition methods under investigation: the sequential approach (cluster finding, track follower) and the iterative approach (Hough Transform, cluster assignment, re-fitting). The implementation of the former in hardware indicates that we can reach the designed inspection rate for p-p collisions of 1 kHz with 98% efficiency.

  4. Online pattern recognition for the ALICE high level trigger

    Energy Technology Data Exchange (ETDEWEB)

    Bramm, R.; Helstrup, H.; Lien, J.; Lindenstruth, V.; Loizides, C. E-mail: loizides@ikf.uni-frankfurt.de; Rohrich, D.; Skaali, B.; Steinbeck, T.; Stock, R.; Ullaland, K.; Vestboe, A.; Wiebalck, A

    2003-04-21

    The ALICE High Level Trigger system needs to reconstruct events online at high data rates. Focusing on the Time Projection Chamber we present two pattern recognition methods under investigation: the sequential approach (cluster finding, track follower) and the iterative approach (Hough Transform, cluster assignment, re-fitting). The implementation of the former in hardware indicates that we can reach the designed inspection rate for p-p collisions of 1 kHz with 98% efficiency.

  5. High-level Component Interfaces for Collaborative Development: A Proposal

    Directory of Open Access Journals (Sweden)

    Thomas Marlowe

    2009-12-01

    Full Text Available Software development has rapidly moved toward collaborative development models where multiple partners collaborate in creating and evolving software-intensive systems or components of sophisticated ubiquitous socio-technical ecosystems. In this paper we extend the concept of a software interface to a flexible high-level interface as a means for accommodating change and for localizing, controlling and managing the exchange of knowledge and of functional, behavioral, quality, project and business related information between the partners and between the developed components.

  6. Nuclear reactor high-level waste: origin and safe disposal

    Energy Technology Data Exchange (ETDEWEB)

    Chua, C.; Tsipis, K. (Massachusetts Inst. of Tech., Cambridge, MA (USA))

    High-level waste (HLW) is a natural component of the nuclear fuel cycle. Because of its radioactivity, HLW needs to be handled with great care. Different alternatives for permanently storing HLW are evaluated. Studies have shown that the disposal of HLW is safest when the waste is first vitrified before storage. Simple calculations show that vitrified HLW that is properly buried in deep, carefully chosen crystalline rock structures poses insignificant health risks. (author).

  7. Mixing Processes in High-Level Waste Tanks - Final Report

    Energy Technology Data Exchange (ETDEWEB)

    Peterson, P.F.

    1999-05-24

    Mixing processes in large, complex enclosures are modeled using one-dimensional differential equations, with transport in free and wall jets modeled using standard integral techniques. With this goal in mind, we constructed a simple, computationally efficient numerical tool, the Berkeley Mechanistic Mixing Model, which can be used to predict the transient evolution of fuel and oxygen concentrations in DOE high-level waste tanks following loss of ventilation, and validated the model against a series of experiments.

  8. Execution of a High Level Real-Time Language

    OpenAIRE

    Luqi; Berzins, Valdis

    1988-01-01

    Prototype System Description Language (PSDL) is a high-level real-time language with special features for hard real-time system specification and design. It can be used to firm up requirements through execution of its software prototypes. The language is designed based on a real-time model merging data and control flow, and its implementation is beyond conventional compiler technology because of the need to meet real-time constraints. In this paper we describe and illustrate our research result...

  9. Theory and Methods for Supporting High Level Military Decisionmaking

    Science.gov (United States)

    2007-01-01

    Gompert, and Kugler, 1996; Davis, 2002a). The relationship between defense applications and finance is more metaphorical than mathematical. ... be summarized as the fractal problem: describing objectives, strategies, tactics, and tasks is a fractal matter, i.e., the concepts apply and are needed at each level, whether that of the president, the theater commander

  10. Thermoelastic analysis of spent fuel and high level radioactive waste repositories in salt. A semi-analytical solution. [JUDITH

    Energy Technology Data Exchange (ETDEWEB)

    St. John, C.M.

    1977-04-01

    An underground repository containing heat generating, High Level Waste or Spent Unreprocessed Fuel may be approximated as a finite number of heat sources distributed across the plane of the repository. The resulting temperature, displacement and stress changes may be calculated using analytical solutions, providing linear thermoelasticity is assumed. This report documents a computer program based on this approach and gives results that form the basis for a comparison between the effects of disposing of High Level Waste and Spent Unreprocessed Fuel.

  11. High-Level Development of Multiserver Online Games

    Directory of Open Access Journals (Sweden)

    Frank Glinka

    2008-01-01

    Full Text Available Multiplayer online games with support for high user numbers must provide mechanisms to support an increasing amount of players by using additional resources. This paper provides a comprehensive analysis of the practically proven multiserver distribution mechanisms, zoning, instancing, and replication, and the tasks for the game developer implied by them. We propose a novel, high-level development approach which integrates the three distribution mechanisms seamlessly in today's online games. As a possible base for this high-level approach, we describe the real-time framework (RTF) middleware system which liberates the developer from low-level tasks and allows him to stay at a high level of design abstraction. We explain how RTF supports the implementation of single-server online games and how RTF allows the three multiserver distribution mechanisms to be incorporated during the development process. Finally, we describe briefly how RTF provides manageability and maintenance functionality for online games in a grid context with dynamic resource allocation scenarios.

  12. Pattern-Driven Automatic Parallelization

    Directory of Open Access Journals (Sweden)

    Christoph W. Kessler

    1996-01-01

    Full Text Available This article describes a knowledge-based system for automatic parallelization of a wide class of sequential numerical codes operating on vectors and dense matrices, and for execution on distributed memory message-passing multiprocessors. Its main feature is a fast and powerful pattern recognition tool that locally identifies frequently occurring computations and programming concepts in the source code. This tool also works for dusty deck codes that have been "encrypted" by former machine-specific code transformations. Successful pattern recognition guides sophisticated code transformations including local algorithm replacement such that the parallelized code need not emerge from the sequential program structure by just parallelizing the loops. It allows access to an expert's knowledge on useful parallel algorithms, available machine-specific library routines, and powerful program transformations. The partially restored program semantics also supports local array alignment, distribution, and redistribution, and allows for faster and more exact prediction of the performance of the parallelized target code than is usually possible.
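
    As an illustration of the kind of computation such a recognizer targets, here is a sum-reduction pattern and a drop-in parallel replacement, sketched in shared-memory C++17; the article's system performs the analogous replacement for message-passing targets.

```cpp
// A frequently occurring pattern -- a sum reduction -- and the kind of
// local algorithm replacement a pattern-driven parallelizer performs.
// (Generic C++17 illustration; the article's system targets distributed
// memory message-passing multiprocessors, not this shared-memory form.)
#include <execution>
#include <numeric>
#include <vector>

double sum_sequential(const std::vector<double>& a) {
    double s = 0.0;
    for (double v : a) s += v;   // recognizable reduction pattern
    return s;
}

double sum_parallel(const std::vector<double>& a) {
    // Drop-in replacement with a parallel reduction.
    return std::reduce(std::execution::par, a.begin(), a.end(), 0.0);
}
```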

  13. TBB Task-oriented Mixed-parallel Programming Model for Multi-Core Cluster

    Institute of Scientific and Technical Information of China (English)

    顾慧; 郑晓薇; 张建强; 吴华平

    2011-01-01

    To take full advantage of the structural characteristics of multi-core clusters and use each CPU core efficiently, a hybrid parallel programming model for multi-core clusters is constructed. The model combines shared-memory, task-oriented TBB programming with MPI message-passing programming. Using the advantages of both, it achieves two-level parallelism: processes mapped to nodes and, within each process, threads mapped to processor cores. Compared with algorithms under a single programming model, algorithms using this hybrid model not only reduce program execution time and obtain better speedup and execution efficiency, but also markedly improve cluster performance.
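
    A hedged sketch of the TBB half of the model (an illustration, not the paper's code): tbb::parallel_for splits an iteration range into tasks that the scheduler maps onto a node's cores. In the hybrid model, a loop like this runs inside each MPI process.

```cpp
// Task-oriented TBB data parallelism within one node: parallel_for splits
// the blocked_range into tasks; each task updates its own sub-range, so
// there are no shared writes to synchronize.
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> v(1 << 20, 1.0);

    tbb::parallel_for(
        tbb::blocked_range<size_t>(0, v.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                v[i] = v[i] * 2.0 + 1.0;   // per-task, disjoint work
        });

    std::printf("v[0] = %g\n", v[0]);      // 3
    return 0;
}
```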

  14. Parallelization of the molecular dynamics code GROMOS87 for distributed memory parallel architectures

    NARCIS (Netherlands)

    Green, DG; Meacham, KE; vanHoesel, F; Hertzberger, B; Serazzi, G

    1995-01-01

    This paper describes the techniques and methodologies employed during parallelization of the Molecular Dynamics (MD) code GROMOS87, with the specific requirement that the program run efficiently on a range of distributed-memory parallel platforms. We discuss the preliminary results of our parallel

  15. High-Level Synthesis: Productivity, Performance, and Software Constraints

    Directory of Open Access Journals (Sweden)

    Yun Liang

    2012-01-01

    Full Text Available FPGAs are an attractive platform for applications with high computation demand and low energy consumption requirements. However, design effort for FPGA implementations remains high, often an order of magnitude larger than design effort using high-level languages. Instead of this time-consuming process, high-level synthesis (HLS) tools generate hardware implementations from algorithm descriptions in languages such as C/C++ and SystemC. Such tools reduce design effort: high-level descriptions are more compact and less error prone. HLS tools promise hardware development abstracted from software designer knowledge of the implementation platform. In this paper, we present an unbiased study of the performance, usability and productivity of HLS using AutoPilot (a state-of-the-art HLS tool). In particular, we first evaluate AutoPilot using the popular embedded benchmark kernels. Then, to evaluate the suitability of HLS on real-world applications, we perform a case study of stereo matching, an active area of computer vision research that uses techniques also common for image denoising, image retrieval, feature matching, and face recognition. Based on our study, we provide insights on current limitations of mapping general-purpose software to hardware using HLS and some future directions for HLS tool development. We also offer several guidelines for hardware-friendly software design. For popular embedded benchmark kernels, the designs produced by HLS achieve 4X to 126X speedup over the software version. The stereo matching algorithms achieve between 3.5X and 67.9X speedup over software (but still less than manual RTL design), with a fivefold reduction in design effort versus manual RTL design.

  16. High level radioactive waste (HLW) disposal a global challenge

    CERN Document Server

    PUSCH, R; NAKANO, M

    2011-01-01

    High Level Radioactive Waste (HLW) Disposal, A Global Challenge presents the most recent information on proposed methods of disposal for the most dangerous radioactive waste and for assessing their function from short- and long-term perspectives. It discusses new aspects of the disposal of such waste, especially HLW. The book is unique in the literature in making it clear that, due to tectonics and long-term changes in rock structure, rock can serve only as a "mechanical support to the chemical apparatus" and that effective containment of hazardous elements can only be managed by properly des

  17. High level trigger online calibration framework in ALICE

    Energy Technology Data Exchange (ETDEWEB)

    Bablok, S R; Djuvsland, Oe; Kanaki, K; Nystrand, J; Richter, M; Roehrich, D; Skjerdal, K; Ullaland, K; Oevrebekk, G; Larsen, D; Alme, J [Department of Physics and Technology, University of Bergen (Norway); Alt, T; Lindenstruth, V; Steinbeck, T M; Thaeder, J; Kebschull, U; Boettger, S; Kalcher, S; Lara, C; Panse, R [Kirchhoff Institute of Physics, Ruprecht-Karls-University Heidelberg (Germany)], E-mail: Sebastian.Bablok@uib.no (and others)

    2008-07-01

    The ALICE High Level Trigger (HLT) is designed to perform event analysis of heavy ion and proton-proton collisions as well as calibration calculations online. A large PC farm, currently under installation, enables analysis algorithms to process these computationally intensive tasks. The HLT receives event data from all major detectors in ALICE. Interfaces to the various other systems provide the analysis software with required additional information. Processed results are sent back to the corresponding systems. To allow online performance monitoring of the detectors an interface for visualizing these results has been developed.

  18. Market Designs for High Levels of Variable Generation: Preprint

    Energy Technology Data Exchange (ETDEWEB)

    Milligan, M.; Holttinen, H.; Kiviluoma, J.; Orths, A.; Lynch, M.; Soder, L.

    2014-10-01

    Variable renewable generation is increasing in penetration in modern power systems, leading to higher variability in the supply and price of electricity as well as lower average spot prices. This raises new challenges, particularly in ensuring sufficient capacity and flexibility from conventional technologies. Because the fixed costs and lifetimes of electricity generation investments are significant, designing markets and regulations that ensure the efficient integration of renewable generation is a significant challenge. This paper reviews the state of play of market designs for high levels of variable generation in the United States and Europe and considers new developments in both regions.

  19. High-level neutron coincidence counter maintenance manual

    Energy Technology Data Exchange (ETDEWEB)

    Swansen, J.; Collinsworth, P.

    1983-05-01

    High-level neutron coincidence counter operational (field) calibration and usage is well known. This manual makes explicit basic (shop) check-out, calibration, and testing of new units and is a guide for repair of failed in-service units. Operational criteria for the major electronic functions are detailed, as are adjustments and calibration procedures, and recurrent mechanical/electromechanical problems are addressed. Some system tests are included for quality assurance. Data on nonstandard large-scale integrated (circuit) components and a schematic set are also included.

  20. High Level Synthesis for Loop-Based BIST

    Institute of Scientific and Technical Information of China (English)

    李晓维; 张英相

    2000-01-01

    Area and test time are two major overheads encountered during data path high-level synthesis for BIST. This paper presents an approach to behavioral synthesis for loop-based BIST. By taking the requirements of the BIST scheme into account during behavioral synthesis, an area-optimal BIST solution can be obtained. The approach is based on the reuse of test resources, which results in fewer registers being modified to be test registers. This is achieved by incorporating self-testability constraints during register assignment operations. Experimental results on benchmarks are presented to demonstrate the effectiveness of the approach.