WorldWideScience

Sample records for monolithic parallel processor

  1. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    These proceedings contain the articles presented at the named conference. They concern hardware and software for vector and parallel processors, numerical methods and algorithms for computation on such processors, and applications of these methods to different fields of physics and related sciences. See hints under the relevant topics. (HSI)

  2. An interactive parallel processor for data analysis

    International Nuclear Information System (INIS)

    Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.

    1984-01-01

    A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors.
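
    Event-level analysis of this kind is embarrassingly parallel, which is why throughput scales with the processor count. As a rough illustration (not the authors' minicomputer code), the Python sketch below farms independent events out to a pool of worker processes; the event contents and the analyze() function are invented placeholders.

```python
# Minimal sketch: farm independent data events out to N worker processes.
# analyze() and the synthetic events are hypothetical placeholders.
from multiprocessing import Pool

def analyze(event):
    # Stand-in for per-event analysis (e.g., reducing a kiloparameter event)
    return sum(x * x for x in event)

if __name__ == "__main__":
    events = [[float(i + j) for j in range(1000)] for i in range(1000)]
    with Pool(processes=8) as pool:          # one worker per processor
        results = pool.map(analyze, events)  # events are independent, so
    print(len(results))                      # throughput scales ~linearly
```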

  3. Aspects of computation on asynchronous parallel processors

    International Nuclear Information System (INIS)

    Wright, M.

    1989-01-01

    The increasing availability of asynchronous parallel processors has provided opportunities for original and useful work in scientific computing. However, the field of parallel computing is still in a highly volatile state, and researchers display a wide range of opinion about many fundamental questions such as models of parallelism, approaches for detecting and analyzing parallelism of algorithms, and tools that allow software developers and users to make effective use of diverse forms of complex hardware. This volume collects the work of researchers specializing in different aspects of parallel computing, who met to discuss the framework and the mechanics of numerical computing. The far-reaching impact of high-performance asynchronous systems is reflected in the wide variety of topics, which include scientific applications (e.g. linear algebra, lattice gauge simulation, ordinary and partial differential equations), models of parallelism, parallel language features, task scheduling, automatic parallelization techniques, tools for algorithm development in parallel environments, and system design issues.

  4. Parallel processor for fast event analysis

    International Nuclear Information System (INIS)

    Hensley, D.C.

    1983-01-01

    Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed in which up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the Am29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system.
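
    The stated rates imply the 70 ns target directly: 5000 events/s times 3000 operations/event is 1.5 x 10^7 operations/s, or roughly 67 ns per operation. A short LaTeX restatement of this arithmetic (ours, not from the abstract; the 280 ns per-processor cycle time is an inferred example):

```latex
% Required effective cycle time from the stated data rates:
\[
  t_{\mathrm{eff}} \le \frac{1}{(5000\ \mathrm{events/s})
                              (3000\ \mathrm{ops/event})}
  \approx 67\ \mathrm{ns},
\]
% which four parallel processors meet if each sustains about 280 ns/op:
\[
  t_{\mathrm{eff}} = \frac{t_{\mathrm{proc}}}{N}
                   = \frac{280\ \mathrm{ns}}{4} = 70\ \mathrm{ns}.
\]
```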

  5. Parallel processor programs in the Federal Government

    Science.gov (United States)

    Schneck, P. B.; Austin, D.; Squires, S. L.; Lehmann, J.; Mizell, D.; Wallgren, K.

    1985-01-01

    In 1982, a report dealing with the nation's research needs in high-speed computing called for increased access to supercomputing resources for the research community, research in computational mathematics, and increased research in the technology base needed for the next generation of supercomputers. Since that time a number of programs addressing future generations of computers, particularly parallel processors, have been started by U.S. government agencies. The present paper provides a description of the largest government programs in parallel processing. Established in fiscal year 1985 by the Institute for Defense Analyses for the National Security Agency, the Supercomputing Research Center will pursue research to advance the state of the art in supercomputing. Attention is also given to the DOE applied mathematical sciences research program, the NYU Ultracomputer project, the DARPA multiprocessor system architectures program, NSF research on multiprocessor systems, ONR activities in parallel computing, and NASA parallel processor projects.

  6. Vector and parallel processors in computational science

    International Nuclear Information System (INIS)

    Duff, I.S.; Reid, J.K.

    1985-01-01

    This book presents the papers given at a conference which reviewed the new developments in parallel and vector processing. Topics considered at the conference included hardware (array processors, supercomputers), programming languages, software aids, numerical methods (e.g., Monte Carlo algorithms, iterative methods, finite elements, optimization), and applications (e.g., neutron transport theory, meteorology, image processing).

  7. Multibus-based parallel processor for simulation

    Science.gov (United States)

    O'Grady, E. P.; Wang, C.-H.

    1983-01-01

    A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.

  8. Lattice gauge theory using parallel processors

    International Nuclear Information System (INIS)

    Lee, T.D.; Chou, K.C.; Zichichi, A.

    1987-01-01

    The book's contents include: Lattice Gauge Theory Lectures: Introduction and Current Fermion Simulations; Monte Carlo Algorithms for Lattice Gauge Theory; Specialized Computers for Lattice Gauge Theory; Lattice Gauge Theory at Finite Temperature: A Monte Carlo Study; Computational Method - An Elementary Introduction to the Langevin Equation; Present Status of Numerical Quantum Chromodynamics; Random Lattice Field Theory; The GF11 Processor and Compiler; The APE Computer and First Physics Results; Columbia Supercomputer Project: Parallel Supercomputer for Lattice QCD; Statistical and Systematic Errors in Numerical Simulations; Monte Carlo Simulation for LGT and Programming Techniques on the Columbia Supercomputer; and Food for Thought: Five Lectures on Lattice Gauge Theory.

  9. Programming massively parallel processors a hands-on approach

    CERN Document Server

    Kirk, David B

    2010-01-01

    Programming Massively Parallel Processors discusses basic concepts of parallel programming and GPU architecture. "Massively parallel" refers to the use of a large number of processors to perform a set of computations in a coordinated parallel way. The book details various techniques for constructing parallel programs. It also discusses the development process, performance level, floating-point format, parallel patterns, and dynamic parallelism. The book serves as a teaching guide where parallel programming is the main topic of the course. It builds on the basics of C programming for CUDA, a parallel programming environment that is supported on NVIDIA GPUs. Composed of 12 chapters, the book begins with basic information about the GPU as a parallel computer source. It also explains the main concepts of CUDA, data parallelism, and the importance of memory access efficiency using CUDA. The target audience of the book is graduate and undergraduate students from all science and engineering disciplines who ...

  10. Simulation of a parallel processor on a serial processor: The neutron diffusion equation

    International Nuclear Information System (INIS)

    Honeck, H.C.

    1981-01-01

    Parallel processors could provide the nuclear industry with very high computing power at a very moderate cost. Will we be able to make effective use of this power? This paper explores the use of a very simple parallel processor for solving the neutron diffusion equation to predict power distributions in a nuclear reactor. We first describe a simple parallel processor and estimate its theoretical performance based on current hardware technology. Next, we show how the parallel processor could be used to solve the neutron diffusion equation. We then present the results of some simulations of a parallel processor run on a serial processor and measure some of the expected inefficiencies. Finally we extrapolate the results to estimate how actual design codes would perform. We find that the standard numerical methods for solving the neutron diffusion equation are still applicable when used on a parallel processor. However, some simple modifications to these methods will be necessary if we are to achieve the full power of these new computers. (orig.)
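
    The standard iterative methods referred to above update each mesh point from its neighbors, which maps naturally onto one processor (or block of points) per node. A minimal Python/NumPy sketch of such a Jacobi-style sweep for a diffusion-like equation; the grid size, source term, and constants are illustrative, not from the paper:

```python
# Minimal sketch: Jacobi-style sweep for a 2-D diffusion-like equation.
# On a parallel processor, every interior point can be updated
# concurrently; NumPy expresses that same data-parallel update.
import numpy as np

n = 64
phi = np.zeros((n, n))          # initial flux guess
src = np.ones((n, n))           # fixed source (illustrative)

for it in range(500):
    new = phi.copy()
    new[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                              phi[1:-1, :-2] + phi[1:-1, 2:] +
                              src[1:-1, 1:-1])
    if np.max(np.abs(new - phi)) < 1e-6:   # convergence test
        break
    phi = new
```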

  11. On the effective parallel programming of multi-core processors

    NARCIS (Netherlands)

    Varbanescu, A.L.

    2010-01-01

    Multi-core processors are considered now the only feasible alternative to the large single-core processors which have become limited by technological aspects such as power consumption and heat dissipation. However, due to their inherent parallel structure and their diversity, multi-cores are ...

  12. The study of image processing of parallel digital signal processor

    International Nuclear Information System (INIS)

    Liu Jie

    2000-01-01

    The author analyzes the basic characteristics of the parallel DSP (digital signal processor) TMS320C80, and proposes optimized image algorithms and a parallel processing method based on this DSP. Real-time performance for many image-processing tasks can be achieved in this way.

  13. Fast parallel computation of polynomials using few processors

    DEFF Research Database (Denmark)

    Valiant, Leslie; Skyum, Sven

    1981-01-01

    It is shown that any multivariate polynomial that can be computed sequentially in $C$ steps and has degree $d$ can be computed in parallel in $O((\log d)(\log C + \log d))$ steps using only $(Cd)^{O(1)}$ processors...

  14. Event analysis using a massively parallel processor

    International Nuclear Information System (INIS)

    Bale, A.; Gerelle, E.; Messersmith, J.; Warren, R.; Hoek, J.

    1990-01-01

    This paper describes a system for performing histogramming of n-tuple data at interactive rates using a commercial SIMD processor array connected to a workstation running the well-known Physics Analysis Workstation software (PAW). Results indicate that an order of magnitude performance improvement over current RISC technology is easily achievable.
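
    The SIMD idea here is that one operation is applied to every event simultaneously. A hedged NumPy sketch of data-parallel n-tuple histogramming (the PAW interface and the processor array are not modeled; the n-tuple contents and the cut are invented):

```python
# Sketch of data-parallel n-tuple histogramming in the SIMD spirit:
# the same cut and binning operations act on all events at once.
import numpy as np

rng = np.random.default_rng(0)
ntuple = rng.normal(size=(1_000_000, 4))        # 1M events, 4 variables

mask = ntuple[:, 1] > 0.0                       # one cut applied to all events
hist, edges = np.histogram(ntuple[mask, 0],     # histogram variable 0
                           bins=100, range=(-5.0, 5.0))
print(hist.sum(), "events passed the cut")
```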

  15. Particle simulation on a distributed memory highly parallel processor

    International Nuclear Information System (INIS)

    Sato, Hiroyuki; Ikesaka, Morio

    1990-01-01

    This paper describes parallel molecular dynamics simulation of atoms governed by local force interaction. The space in the model is divided into cubic subspaces and mapped to the processor array of the CAP-256, a distributed memory, highly parallel processor developed at Fujitsu Labs. We developed a new technique to avoid redundant calculation of forces between atoms in different processors. Experiments showed the communication overhead was less than 5%, and the idle time due to load imbalance was less than 11% for two model problems which contain 11,532 and 46,128 argon atoms. From the software simulation, the CAP-II, which is under development, is estimated to be about 45 times faster than the CAP-256 and will be able to run the same problem about 40 times faster than Fujitsu's M-380 mainframe when 256 processors are used. (author)
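
    The abstract does not spell out the non-redundancy technique; a common way to halve pair work in cell-decomposed molecular dynamics is to visit each cell pair from one side only and apply Newton's third law. A serial Python sketch of that convention (our reconstruction, not Fujitsu's code; box size, cutoff, and pair force are stand-ins):

```python
# Sketch: cubic-subspace (cell) decomposition with every atom pair computed
# exactly once. Visiting only a half-shell of neighbor cells per cell avoids
# redundant pair work; Newton's third law supplies the opposite force.
import numpy as np
from itertools import product

L_box, rc = 10.0, 1.0                       # box edge, interaction cutoff
rng = np.random.default_rng(0)
pos = rng.random((2000, 3)) * L_box
ncell = int(L_box / rc)

cells = {}
for i, p in enumerate(pos):
    cells.setdefault(tuple((p / rc).astype(int) % ncell), []).append(i)

# the cell itself plus the 13 lexicographically positive neighbor offsets:
# every cell pair is then visited from one side only
OFFSETS = [(0, 0, 0)] + [o for o in product((-1, 0, 1), repeat=3)
                         if o > (0, 0, 0)]

forces = np.zeros_like(pos)
for c, atoms in cells.items():
    for off in OFFSETS:
        nb = tuple((c[k] + off[k]) % ncell for k in range(3))
        for i in atoms:
            for j in cells.get(nb, []):
                if nb == c and j <= i:
                    continue                 # each pair handled exactly once
                d = pos[i] - pos[j]
                d -= L_box * np.round(d / L_box)   # periodic minimum image
                r2 = float(d @ d)
                if r2 < rc * rc:
                    f = d / r2               # toy repulsive pair force
                    forces[i] += f           # Newton's third law: equal and
                    forces[j] -= f           # opposite force on the partner
```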

  16. Global synchronization of parallel processors using clock pulse width modulation

    Science.gov (United States)

    Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

    2013-04-02

    A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and performs a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.

  17. Scientific programming on massively parallel processor CP-PACS

    International Nuclear Information System (INIS)

    Boku, Taisuke

    1998-01-01

    The massively parallel processor CP-PACS targets a wide range of problems in computational physics, and its architecture has been designed to support a variety of numerical workloads. This report outlines the CP-PACS and gives a programming example, the Kernel CG benchmark from NAS Parallel Benchmarks version 1, to illustrate the two main features of the architecture: the pseudo-vector processing mechanism and the parallel-processing tuning of scientific and technical computation over the three-dimensional hyper-crossbar network. The CP-PACS processing units (PUs) are based on a RISC processor augmented with a pseudo-vector facility; pseudo-vector processing is realized as loop processing with scalar instructions. The features of the PU interconnection network are explained, and the algorithm of the NPB version 1 Kernel CG is shown. The most time-consuming part of the main loop is the matrix-vector product (matvec), whose parallelization is explained, and the CPU computation time is determined. As performance evaluation, the report covers execution-time measurements, short-vector processing with the slide-window-based pseudo-vector processor, and comparison with other parallel computers. (K.I.)
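
    The matvec parallelization can be pictured abstractly: partition the matrix by rows across PUs, each PU computing its own slice of the product. A hedged Python/NumPy sketch (the partitioning is simplified to one dimension; CP-PACS and NPB details are omitted):

```python
# Sketch: row-partitioned matrix-vector product, the dominant kernel in
# the NPB Kernel CG benchmark. Each "processor" owns a band of rows;
# the bands are computed in a loop here to show the decomposition.
import numpy as np

n, nprocs = 1024, 8
A = np.random.rand(n, n)
x = np.random.rand(n)

rows = np.array_split(np.arange(n), nprocs)   # one band per PU
y = np.empty(n)
for band in rows:                             # each band is independent,
    y[band] = A[band, :] @ x                  # so PUs can run concurrently
assert np.allclose(y, A @ x)
```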

  18. Intelligent spatial ecosystem modeling using parallel processors

    International Nuclear Information System (INIS)

    Maxwell, T.; Costanza, R.

    1993-01-01

    Spatial modeling of ecosystems is essential if one's modeling goals include developing a relatively realistic description of past behavior and predictions of the impacts of alternative management policies on future ecosystem behavior. Development of these models has been limited in the past by the large amount of input data required and the difficulty of even large mainframe serial computers in dealing with large spatial arrays. These two limitations have begun to erode with the increasing availability of remote sensing data and GIS systems to manipulate it, and the development of parallel computer systems which allow computation of large, complex, spatial arrays. Although many forms of dynamic spatial modeling are highly amenable to parallel processing, the primary focus in this project is on process-based landscape models. These models simulate spatial structure by first compartmentalizing the landscape into some geometric design and then describing flows within compartments and spatial processes between compartments according to location-specific algorithms. The authors are currently building and running parallel spatial models at the regional scale for the Patuxent River region in Maryland, the Everglades in Florida, and Barataria Basin in Louisiana. The authors are also planning a project to construct a series of spatially explicit linked ecological and economic simulation models aimed at assessing the long-term potential impacts of global climate change.

  19. Dynamic overset grid communication on distributed memory parallel processors

    Science.gov (United States)

    Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

    1993-01-01

    A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.

  20. Fast Parallel Computation of Polynomials Using Few Processors

    DEFF Research Database (Denmark)

    Valiant, Leslie G.; Skyum, Sven; Berkowitz, S.

    1983-01-01

    It is shown that any multivariate polynomial of degree $d$ that can be computed sequentially in $C$ steps can be computed in parallel in $O((\log d)(\log C + \log d))$ steps using only $(Cd)^{O(1)}$ processors...
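
    To give the bound some scale (illustrative numbers of ours, not from the paper): a degree-8 polynomial computed sequentially in 1000 steps admits a parallel schedule of depth on the order of

```latex
% Illustrative evaluation of the Valiant-Skyum depth bound (logs base 2):
\[
  O\bigl((\log d)(\log C + \log d)\bigr)
  = O\bigl((\log 8)(\log 1000 + \log 8)\bigr)
  \approx O(3 \cdot 13) = O(39)\ \text{steps},
\]
% using a polynomially bounded number of processors, $(Cd)^{O(1)}$.
```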

  1. Real-time trajectory optimization on parallel processors

    Science.gov (United States)

    Psiaki, Mark L.

    1993-01-01

    A parallel algorithm has been developed for rapidly solving trajectory optimization problems. The goal of the work has been to develop an algorithm suitable for real-time, on-line optimal guidance through repeated solution of a trajectory optimization problem. The algorithm has been developed on an INTEL iPSC/860 message-passing parallel processor. It uses a zero-order-hold discretization of a continuous-time problem and solves the resulting nonlinear programming problem using a custom-designed augmented Lagrangian nonlinear programming algorithm. The algorithm achieves parallelism of function, derivative, and search direction calculations through the principle of domain decomposition applied along the time axis. It has been encoded and tested on three example problems: the Goddard problem; the acceleration-limited, planar minimum-time-to-the-origin problem; and a National Aerospace Plane minimum-fuel ascent guidance problem. Execution times as fast as 118 sec of wall clock time have been achieved for a 128-stage Goddard problem solved on 32 processors. A 32-stage minimum-time problem has been solved in 151 sec on 32 processors. A 32-stage National Aerospace Plane problem required 2 hours when solved on 32 processors. A speed-up factor of 7.2 has been achieved by using 32 nodes instead of 1 node to solve a 64-stage Goddard problem.

  2. Parallel Processor for 3D Recovery from Optical Flow

    Directory of Open Access Journals (Sweden)

    Jose Hugo Barron-Zambrano

    2009-01-01

    3D recovery from motion has received a major effort in computer vision systems in recent years. The main problem lies in the number of operations and memory accesses to be performed by the majority of the existing techniques when translated to hardware or software implementations. This paper proposes a parallel processor for 3D recovery from optical flow. Its main feature is the maximum reuse of data and the low number of clock cycles to calculate the optical flow, along with the precision with which 3D recovery is achieved. The results of the proposed architecture as well as those from processor synthesis are presented.

  3. A Parallel Workload Model and its Implications for Processor Allocation

    Science.gov (United States)

    1996-11-01

    with SEV or AVG, both of which can tolerate c = 0.4-0.6 before their performance deteriorates significantly. On the other hand, Setia [10] has... Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS '95 Workshop on Job Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [11] Sanjeev K. Setia and Satish K. Tripathi. An analysis of several processor...

  4. Directions in parallel processor architecture, and GPUs too

    CERN Multimedia

    CERN. Geneva

    2014-01-01

    Modern computing is power-limited in every domain of computing. Performance increments extracted from instruction-level parallelism (ILP) are no longer power-efficient; they haven't been for some time. Thread-level parallelism (TLP) is a more easily exploited form of parallelism, at the expense of programmer effort to expose it in the program. In this talk, I will introduce you to disparate topics in parallel processor architecture that will impact programming models (and you) in both the near and far future. About the speaker: Olivier is a senior GPU (SM) architect at NVIDIA and an active participant in the concurrency working group of the ISO C++ committee. He has also worked on very large diesel engines as a mechanical engineer, and taught at McGill University (Canada) as a faculty instructor.

  5. GPU: the biggest key processor for AI and parallel processing

    Science.gov (United States)

    Baji, Toru

    2017-07-01

    Two types of processors exist in the market. One is the conventional CPU and the other is the Graphics Processing Unit (GPU). A typical CPU is composed of 1 to 8 cores, while a GPU has thousands of cores. CPUs are good for sequential processing, while GPUs are good at accelerating software with heavy parallel execution. The GPU was initially dedicated to 3D graphics. However, from 2006, when GPUs adopted general-purpose cores, it was noticed that this architecture can be used as a general-purpose massively parallel processor. NVIDIA developed a software framework, Compute Unified Device Architecture (CUDA), that makes it possible to easily program the GPU for these applications. With CUDA, GPUs came to be used widely in workstations and supercomputers. Recently two key technologies are highlighted in the industry: Artificial Intelligence (AI) and autonomous driving cars. AI requires massive parallel operations to train many-layer neural networks. With CPUs alone, it was impossible to finish the training in a practical time; the latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For autonomous driving cars, TOPS-class performance is required to implement perception, localization, and path-planning processing, and again SoCs with integrated GPUs will play a key role there. In this paper, the evolution of the GPU, one of the biggest commercial devices requiring state-of-the-art fabrication technology, will be introduced, along with an overview of the key GPU-demanding applications described above.

  6. Algorithms for computational fluid dynamics on parallel processors

    International Nuclear Information System (INIS)

    Van de Velde, E.F.

    1986-01-01

    A study of parallel algorithms for the numerical solution of partial differential equations arising in computational fluid dynamics is presented. The actual implementation on parallel processors of shared and nonshared memory design is discussed. The performance of these algorithms is analyzed in terms of machine efficiency, communication time, bottlenecks and software development costs. For elliptic equations, a parallel preconditioned conjugate gradient method is described, which has been used to solve pressure equations discretized with high order finite elements on irregular grids. A parallel full multigrid method and a parallel fast Poisson solver are also presented. Hyperbolic conservation laws were discretized with parallel versions of finite difference methods like the Lax-Wendroff scheme and with the Random Choice method. Techniques are developed for comparing the behavior of an algorithm on different architectures as a function of problem size and local computational effort. Effective use of these advanced architecture machines requires the use of machine dependent programming. It is shown that the portability problems can be minimized by introducing high level operations on vectors and matrices structured into program libraries.

  7. Sn transport calculations on vector and parallel processors

    International Nuclear Information System (INIS)

    Rhoades, W.A.; Childs, R.L.

    1987-01-01

    The transport of radiation from the source to the location of people or equipment gives rise to some of the most challenging of calculations. A problem may involve as many as a billion unknowns, each evaluated several times to resolve interdependence. Such calculations run many hours on a Cray computer, and a typical study involves many such calculations. This paper will discuss the steps taken to vectorize the DOT code, which solves transport problems in two space dimensions (2-D); the extension of this code to 3-D; and the plans for extension to parallel processors.

  8. The language parallel Pascal and other aspects of the massively parallel processor

    Science.gov (United States)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  9. Parallelization of applications for networks with homogeneous and heterogeneous processors

    International Nuclear Information System (INIS)

    Colombet, L.

    1994-01-01

    The aim of this thesis is to study and develop efficient methods for the parallelization of scientific applications on parallel computers with distributed memory. The first part presents the PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication libraries. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate the performance of networks with heterogeneous processors. To evaluate such performance, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs.

  10. On program restructuring, scheduling, and communication for parallel processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Polychronopoulos, Constantine D. [Univ. of Illinois, Urbana, IL (United States)

    1986-08-01

    This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler was used to transform programs in a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
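
    Loop coalescing, one of the two restructuring techniques named above, flattens a nested loop into a single iteration space so a scheduler can treat all iterations as one pool. A minimal Python sketch of the transformation (the original Parafrase work targeted Fortran; index recovery is shown explicitly):

```python
# Loop coalescing: a doubly nested loop over (i, j) becomes one loop over
# a single index k, which a scheduler can then block across processors.
N, M = 4, 6
a = [[0] * M for _ in range(N)]

# original nested form:
for i in range(N):
    for j in range(M):
        a[i][j] = i + j

# coalesced form: one iteration space of size N*M
for k in range(N * M):
    i, j = divmod(k, M)        # recover the original indices
    assert a[i][j] == i + j    # same work, now trivially schedulable
```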

  11. Vector and parallel processors in computational science. Proceedings

    Energy Technology Data Exchange (ETDEWEB)

    Duff, I S; Reid, J K

    1985-01-01

    This volume contains papers from most of the invited talks and from several of the contributed talks and poster sessions presented at VAPP II. The contents present an extensive coverage of all important aspects of vector and parallel processors, including hardware, languages, numerical algorithms and applications. The topics covered include descriptions of new machines (both research and commercial machines), languages and software aids, and general discussions of whole classes of machines and their uses. Numerical methods papers include Monte Carlo algorithms, iterative and direct methods for solving large systems, finite elements, optimization, random number generation and mathematical software. The specific applications covered include neutron diffusion calculations, molecular dynamics, weather forecasting, lattice gauge calculations, fluid dynamics, flight simulation, cartography, image processing and cryptography. Most machines and architecture types are being used for these applications. many refs.

  12. Beam dynamics calculations and particle tracking using massively parallel processors

    International Nuclear Information System (INIS)

    Ryne, R.D.; Habib, S.

    1995-01-01

    During the past decade massively parallel processors (MPPs) have slowly gained acceptance within the scientific community. At present these machines typically contain a few hundred to one thousand off-the-shelf microprocessors and a total memory of up to 32 GBytes. The potential performance of these machines is illustrated by the fact that a month-long job on a high-end workstation might require only a few hours on an MPP. The acceptance of MPPs has been slow for a variety of reasons. For example, some algorithms are not easily parallelizable. Also, in the past these machines were difficult to program. But in recent years the development of Fortran-like languages such as CM Fortran and High Performance Fortran has made MPPs much easier to use. In the following we will describe how MPPs can be used for beam dynamics calculations and long-term particle tracking.

  13. Monolithic Parallel Tandem Organic Photovoltaic Cell with Transparent Carbon Nanotube Interlayer

    Science.gov (United States)

    Tanaka, S.; Mielczarek, K.; Ovalle-Robles, R.; Wang, B.; Hsu, D.; Zakhidov, A. A.

    2009-01-01

    We demonstrate an organic photovoltaic cell with a monolithic tandem structure in parallel connection. Transparent multiwalled carbon nanotube sheets are used as an interlayer anode electrode for this parallel tandem. The characteristics of front and back cells are measured independently. The short circuit current density of the parallel tandem cell is larger than the currents of each individual cell. The wavelength dependence of photocurrent for the parallel tandem cell shows the superposition spectrum of the two spectral sensitivities of the front and back cells. The monolithic three-electrode photovoltaic cell indeed operates as a parallel tandem with improved efficiency.
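
    In a parallel-connected tandem the sub-cells share a voltage and their currents add, which is what the measured superposition reflects. Schematically (an idealized relation in our notation, not an equation from the paper):

```latex
% Idealized parallel-tandem relations (shared voltage, summed current):
\[
  V_{\mathrm{tandem}} = V_{\mathrm{front}} = V_{\mathrm{back}},
  \qquad
  J_{\mathrm{sc,tandem}} \approx J_{\mathrm{sc,front}} + J_{\mathrm{sc,back}},
\]
% consistent with the observed superposition of the two spectral responses.
```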

  14. Parallel computation for distributed parameter system-from vector processors to Adena computer

    Energy Technology Data Exchange (ETDEWEB)

    Nogi, T

    1983-04-01

    Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.

  15. 'Iconic' tracking algorithms for high energy physics using the TRAX-I massively parallel processor

    International Nuclear Information System (INIS)

    Vesztergombi, G.

    1989-01-01

    TRAX-I, a cost-effective parallel microcomputer, applying associative string processor (ASP) architecture with 16 K parallel processing elements, is being built by Aspex Microsystems Ltd. (UK). When applied to the tracking problem of very complex events with several hundred tracks, the large number of processors allows one to dedicate one or more processors to each wire (in MWPC), each pixel (in digitized images from streamer chambers or other visual detectors), or each pad (in TPC) to perform very efficient pattern recognition. Some linear tracking algorithms based on this 'iconic' representation are presented. (orig.)

  16. 'Iconic' tracking algorithms for high energy physics using the TRAX-I massively parallel processor

    International Nuclear Information System (INIS)

    Vesztergombi, G.

    1989-11-01

    TRAX-I, a cost-effective parallel microcomputer, applying Associative String Processor (ASP) architecture with 16 K parallel processing elements, is being built by Aspex Microsystems Ltd. (UK). When applied to the tracking problem of very complex events with several hundred tracks, the large number of processors allows one to dedicate one or more processors to each wire (in MWPC), each pixel (in digitized images from streamer chambers or other visual detectors), or each pad (in TPC) to perform very efficient pattern recognition. Some linear tracking algorithms based on this 'iconic' representation are presented. (orig.)

  17. Monte Carlo photon transport on shared memory and distributed memory parallel processors

    International Nuclear Information System (INIS)

    Martin, W.R.; Wan, T.C.; Abdel-Rahman, T.S.; Mudge, T.N.; Miura, K.

    1987-01-01

    Parallelized Monte Carlo algorithms for analyzing photon transport in an inertially confined fusion (ICF) plasma are considered. Algorithms were developed for shared memory (vector and scalar) and distributed memory (scalar) parallel processors. The shared memory algorithm was implemented on the IBM 3090/400, and timing results are presented for dedicated runs with two, three, and four processors. Two alternative distributed memory algorithms (replication and dispatching) were implemented on a hypercube parallel processor (1 through 64 nodes). The replication algorithm yields essentially full efficiency for all cube sizes; with the 64-node configuration, the absolute performance is nearly the same as with the CRAY X-MP. The dispatching algorithm also yields efficiencies above 80% in a large simulation for the 64-processor configuration.
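
    In the replication strategy, each node runs the full problem on its own share of the particle histories with an independent random stream, and the private tallies are merged at the end; only the history count is divided, which is why efficiency stays near 100%. A hedged Python sketch of that pattern (the toy absorption test is a placeholder, not the ICF transport model):

```python
# Sketch of the "replication" strategy: every processor runs the whole
# problem on 1/N of the histories with its own random seed, then the
# tallies are summed. The toy "transport" below is a stand-in.
from multiprocessing import Pool
import random

def run_histories(args):
    seed, n = args
    rng = random.Random(seed)            # independent stream per node
    tally = 0
    for _ in range(n):
        if rng.random() < 0.3:           # stand-in for photon absorption
            tally += 1
    return tally

if __name__ == "__main__":
    nodes, total = 8, 1_000_000
    work = [(seed, total // nodes) for seed in range(nodes)]
    with Pool(nodes) as pool:
        tallies = pool.map(run_histories, work)
    print("absorbed fraction ~", sum(tallies) / total)
```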

  18. JIST: Just-In-Time Scheduling Translation for Parallel Processors

    Directory of Open Access Journals (Sweden)

    Giovanni Agosta

    2005-01-01

    The application fields of bytecode virtual machines and VLIW processors overlap in the area of embedded and mobile systems, where the two technologies offer different benefits, namely high code portability, low power consumption and reduced hardware cost. Dynamic compilation makes it possible to bridge the gap between the two technologies, but special attention must be paid to software instruction scheduling, a must for the VLIW architectures. We have implemented JIST, a Virtual Machine and JIT compiler for Java Bytecode targeted to a VLIW processor. We show the impact of various optimizations on the performance of code compiled with JIST through an experimental study on a set of benchmark programs. We report significant speedups, and increments in the number of instructions issued per cycle of up to 50% with respect to the non-scheduling version of the JIT compiler. Further optimizations are discussed.

  19. Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

    Directory of Open Access Journals (Sweden)

    Alexander B. Bakulev

    2012-11-01

    This article deals with mathematical models and algorithms that provide a mobile, parallel representation of sequential programs in a high-level language. It presents a formal model of operating-environment process management, based on the proposed model of parallel program representation, describing the computation process on multi-core processors.

  20. PERFORMANCE EVALUATION OF OR1200 PROCESSOR WITH EVOLUTIONARY PARALLEL HPRC USING GEP

    Directory of Open Access Journals (Sweden)

    R. Maheswari

    2012-04-01

    In this fast computing era, most embedded systems require more computing power to complete complex functions/tasks in a smaller amount of time. One way to achieve this is by boosting up processor performance, which allows the processor core to run faster. This paper presents a novel technique for increasing performance via parallel HPRC (High Performance Reconfigurable Computing) in the CPU/DSP (Digital Signal Processor) unit of the OR1200 (Open Reduced Instruction Set Computer (RISC) 1200) using Gene Expression Programming (GEP), an evolutionary programming model. OR1200 is a soft-core RISC processor of the Intellectual Property cores that can efficiently run any modern operating system. In the manufacturing process of the OR1200, a parallel HPRC is placed internally in the Integer Execution Pipeline unit of the CPU/DSP core to increase performance. The GEP parallel HPRC is activated/deactivated by triggering the signals (i) HPRC_Gene_Start and (ii) HPRC_Gene_End. A Verilog HDL (Hardware Description Language) functional code for the Gene Expression Programming parallel HPRC is developed and synthesised using XILINX ISE in the former part of the work, and the CoreMark processor core benchmark is used to test the performance of the OR1200 soft core in the latter part. The results of the implementation show that the overall speed-up increased to 20.59% with the GEP-based parallel HPRC in the execution unit of the OR1200.

  1. Parallel processors and nonlinear structural dynamics algorithms and software

    Science.gov (United States)

    Belytschko, Ted

    1989-01-01

    A nonlinear structural dynamics finite element program was developed to run on a shared memory multiprocessor with pipeline processors. The program, WHAMS, was used as a framework for this work. The program employs explicit time integration and has the capability to handle both the nonlinear material behavior and large displacement response of 3-D structures. The elasto-plastic material model uses an isotropic strain hardening law which is input as a piecewise linear function. Geometric nonlinearities are handled by a corotational formulation in which a coordinate system is embedded at the integration point of each element. Currently, the program has an element library consisting of a beam element based on Euler-Bernoulli theory and triangular and quadrilateral plate elements based on Mindlin theory.

  2. Modular high-temperature gas-cooled reactor simulation using parallel processors

    International Nuclear Information System (INIS)

    Ball, S.J.; Conklin, J.C.

    1989-01-01

    The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel-processing Fortran, and the resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user (operator) interaction. In this paper the benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses.

  3. Supertracker: A Programmable Parallel Pipeline Arithmetic Processor For Auto-Cueing Target Processing

    Science.gov (United States)

    Mack, Harold; Reddi, S. S.

    1980-04-01

    Supertracker represents a programmable parallel pipeline computer architecture that has been designed to meet the real-time image processing requirements of auto-cueing target data processing. The prototype breadboard currently under development will be designed to perform input video preprocessing and processing for 525-line and 875-line TV-format FLIR video, automatic display gain and contrast control, and automatic target cueing, classification, and tracking. The video preprocessor is capable of performing operations on full frames of video data in real time, e.g., frame integration, storage, 3 x 3 convolution, and neighborhood processing. The processor architecture is being implemented using bit-slice microprogrammable arithmetic processors operating in parallel. Each processor is capable of up to 20 million operations per second. Multiple frame memories are used for additional flexibility.

  4. Computations on the massively parallel processor at the Goddard Space Flight Center

    Science.gov (United States)

    Strong, James P.

    1991-01-01

    Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.

  5. Real-time simulation of MHD/steam power plants by digital parallel processors

    International Nuclear Information System (INIS)

    Johnson, R.M.; Rudberg, D.A.

    1981-01-01

    Attention is given to a large FORTRAN-coded program which simulates the dynamic response of the MHD/steam plant on either a SEL 32/55 or VAX 11/780 computer. The code realizes a detailed first-principles model of the plant. Quite recently, in addition to the VAX 11/780, an AD-10 has been installed for use as a real-time simulation facility. The parallel processor AD-10 is capable of simulating the MHD/steam plant at several times real-time rates. This is desirable in order to rapidly develop a large data base of varied plant operating conditions. The combined-cycle MHD/steam plant model is discussed, taking into account a number of disadvantages. The disadvantages can be overcome with the aid of an array processor used as an adjunct to the unit processor. The conversion of some computations for real-time simulation is considered.

  6. Discussion paper for a highly parallel array processor-based machine

    International Nuclear Information System (INIS)

    Hagstrom, R.; Bolotin, G.; Dawson, J.

    1984-01-01

    The architectural plan for a quickly realizable implementation of a highly parallel special-purpose computer system with peak performance in the range of 6 billion floating point operations per second is discussed. The architecture is suitable for lattice gauge theoretical computations of fundamental physics interest and may be applicable to a range of other numerically intensive computational problems. The plan is quickly realizable because it employs a maximum of commercially available hardware subsystems and because the architecture is software-transparent to the individual processors, allowing straightforward re-use of whatever commercially available operating systems and support software are suitable to run on the commercially produced processors. A tiny prototype instrument, designed along this architecture, has already operated. A few elementary examples of programs which can run efficiently are presented. The large machine which the authors propose to build would be based upon a highly competent array processor, the ST-100 Array Processor, and specific design possibilities are discussed. The first step toward realizing this plan practically is to install a single ST-100 to allow algorithm development to proceed while a demonstration unit is built using two of the ST-100 Array Processors.

  7. Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

    Science.gov (United States)

    Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

    2008-04-01

    This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
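
    The work-farm pattern described above is easy to express in software terms: one input stream fans out to a set of identical workers and their outputs merge back into one stream. A minimal Python rendering of the pattern (on the MPPA this runs over hardware channels; here processes and a pool stand in, and the per-item work function is a placeholder):

```python
# Sketch of the work-farm pattern: one input stream, parallel workers,
# one output stream. worker() is a stand-in for the real object code.
from multiprocessing import Pool

def worker(item):
    return item * item      # e.g., stand-in for a video macroblock encode

if __name__ == "__main__":
    stream_in = range(10_000)
    with Pool(processes=8) as farm:
        # imap preserves stream order while workers run concurrently
        stream_out = list(farm.imap(worker, stream_in, chunksize=64))
    print(stream_out[:5])
```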

  8. Scalable High-Performance Parallel Design for Network Intrusion Detection Systems on Many-Core Processors

    OpenAIRE

    Jiang, Hayang; Xie, Gaogang; Salamatian, Kavé; Mathy, Laurent

    2013-01-01

    Network Intrusion Detection Systems (NIDSes) face significant challenges coming from the relentless network link speed growth and increasing complexity of threats. Both hardware accelerated and parallel software-based NIDS solutions, based on commodity multi-core and GPU processors, have been proposed to overcome these challenges. ...

  9. An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications

    Energy Technology Data Exchange (ETDEWEB)

    Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.; Catalyurek, Umit V.; Kurc, Tahsin; Sadayappan, Ponnuswamy; Saltz, Joel H.

    2009-08-01

    Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task- and data-parallelism, and combining the two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks, and the inter-task data communication volumes. A locality-conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.

  10. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    Directory of Open Access Journals (Sweden)

    Kaufmann Michael

    2004-09-01

    Background: Parallel computing is frequently used to speed up computationally expensive tasks in bioinformatics. Results: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN; since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions: By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
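
    Scheme (a) is pure task parallelism over sequence pairs. A hedged sketch of how that distribution could look in Python (a toy scoring function replaces DIALIGN's actual pairwise alignment step):

```python
# Sketch of scheme (a): all pairwise alignments are independent, so they
# can be farmed out to processors; results then feed the multiple
# alignment. score_pair() is a toy stand-in for DIALIGN's pairwise step.
from multiprocessing import Pool
from itertools import combinations

def score_pair(pair):
    a, b = pair
    return (a, b, sum(x == y for x, y in zip(a, b)))  # toy similarity

if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGAACGT", "TCGTACCT", "ACGTTCGT"]
    pairs = list(combinations(seqs, 2))      # n*(n-1)/2 independent tasks
    with Pool(processes=4) as pool:
        pairwise = pool.map(score_pair, pairs)
    print(pairwise)
```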

  11. Phase space simulation of collisionless stellar systems on the massively parallel processor

    International Nuclear Information System (INIS)

    White, R.L.

    1987-01-01

    A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self-gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two-dimensional phase-space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray X-MP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest-neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.

  12. Block iterative restoration of astronomical images with the massively parallel processor

    International Nuclear Information System (INIS)

    Heap, S.R.; Lindler, D.J.

    1987-01-01

    A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block-iterative approach is used to reduce the problem to solving 4900 linear systems of size 121 x 121. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
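
    The block-iterative idea is to sweep over many small sub-systems instead of factoring one enormous one, and the independent block solves are what the MPP ran in parallel. A hedged NumPy sketch of block-Jacobi iteration on a partitioned linear system (toy sizes and a made-up well-conditioned matrix, not the restoration operator):

```python
# Sketch of block-iterative solution of A x = b: solve each diagonal
# block exactly, iterate on the coupling between blocks. The block
# solves are mutually independent, hence parallelizable.
import numpy as np

n, nb = 120, 10                          # 10 blocks of size 12 (toy scale)
rng = np.random.default_rng(0)
A = rng.random((n, n)) * 0.01
A += np.diag(np.full(n, 1.0))            # diagonally dominant -> converges
b = rng.random(n)
x = np.zeros(n)

blocks = np.array_split(np.arange(n), nb)
for sweep in range(100):
    x_new = x.copy()
    for blk in blocks:                   # independent block solves
        Abb = A[np.ix_(blk, blk)]
        r = b[blk] - A[blk, :] @ x + Abb @ x[blk]
        x_new[blk] = np.linalg.solve(Abb, r)
    if np.linalg.norm(x_new - x) < 1e-10:
        break
    x = x_new
print("residual:", np.linalg.norm(A @ x - b))
```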

  13. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Science.gov (United States)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2017-08-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
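
    The parallelism exploited here comes from applying the same Kalman operations to many track candidates at once, which is exactly what vector units and GPUs are good at. A heavily simplified NumPy sketch of a batched Kalman update over many tracks (real track states are 5-6 dimensional; the 1-D state and noise values below are illustrative only):

```python
# Sketch: one Kalman gain/update step applied to many tracks at once,
# the structure SIMD units and GPUs exploit. States are reduced to one
# dimension per track for brevity.
import numpy as np

n_tracks = 100_000
x = np.zeros(n_tracks)               # state estimate per track
P = np.ones(n_tracks)                # covariance per track
R = 0.1                              # measurement noise (assumed)

meas = np.random.default_rng(1).normal(1.0, np.sqrt(R), n_tracks)

K = P / (P + R)                      # gain, computed for all tracks at once
x = x + K * (meas - x)               # update every track in one operation
P = (1.0 - K) * P
print(x.mean(), P.mean())
```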

  14. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Directory of Open Access Journals (Sweden)

    Cerati Giuseppe

    2017-01-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  15. Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

    Energy Technology Data Exchange (ETDEWEB)

    Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U.; Riley, Daniel [Cornell U., LNS; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego

    2017-01-01

    For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPUs), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offline. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progress toward understanding these processors and the new developments to port the Kalman filter to NVIDIA GPUs.

  16. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    Science.gov (United States)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    A capability was developed for rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP), employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.

  17. Compiling the functional data-parallel language SaC for Microgrids of Self-Adaptive Virtual Processors

    NARCIS (Netherlands)

    Grelck, C.; Herhut, S.; Jesshope, C.; Joslin, C.; Lankamp, M.; Scholz, S.-B.; Shafarenko, A.

    2009-01-01

    We present preliminary results from compiling the high-level, functional and data-parallel programming language SaC into a novel multi-core design: Microgrids of Self-Adaptive Virtual Processors (SVPs). The side-effect free nature of SaC in conjunction with its data-parallel foundation make it an

  18. Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

    International Nuclear Information System (INIS)

    Collette, Thierry

    1992-01-01

    Speeding up image processing is mainly achieved using parallel computers; SIMD processors (single instruction stream, multiple data stream) have been developed and have proven highly efficient for low-level image processing operations. Nevertheless, their performance drops for most intermediate- or high-level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend the capabilities of the SIMD computer so that it performs more efficiently at the intermediate level of image processing. The study of some representative algorithms of this class points out the limits of this computer; these limits can, however, be removed by architectural modifications. This leads us to propose SYMPATIX, a new SIMD parallel computer. To validate this new concept, a behavioural model written in VHDL (Hardware Description Language) has been elaborated. With this model, the performance of the new computer has been estimated by running simulations of image processing algorithms. The VHDL modelling approach allows top-down electronic design of the system, providing an easy coupling between architectural modifications and their electronic cost. The results obtained show SYMPATIX to be an efficient computer for low- and intermediate-level image processing. It can be connected to a high-level computer, opening up the development of new computer vision applications. This thesis also presents a top-down design method, based on VHDL, intended for electronic system architects. (author) [fr

  19. Fundamental physics issues of multilevel logic in developing a parallel processor.

    Science.gov (United States)

    Bandyopadhyay, Anirban; Miki, Kazushi

    2007-06-01

    In the last century, On and Off physical switches were equated with the two decisions 0 and 1, so that every piece of information could be expressed in binary digits and physically realized by switches connected in a circuit. Beyond a significant increase in memory density, having more possible choices within a given space makes pattern-logic a reality, and manipulating the pattern would allow logic to be controlled, generating a new kind of processor. Von Neumann's computer is based on sequential logic, processing bits one by one; but since pattern-logic is generated on a surface, viewing the whole pattern at a time is truly parallel processing. Following von Neumann's and Shannon's fundamental thermodynamical approaches, we have built a compatible model based on a series of single-molecule multibit logic systems of 4-12 bits in a UHV-STM. Multilevel communication and pattern formation have been verified experimentally on their monolayer. Furthermore, the developed intelligent monolayer is trained by an artificial neural network. The fundamental weak interactions needed for building a truly parallel processor are therefore explored here both physically and theoretically.

  20. Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

    Science.gov (United States)

    Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

    2012-11-01

    In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performance and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel version of the application shows near-linear speed-up when using only the CPU cores, and executes more than 20 times faster when the GPU is used as well.
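
    The stream-processing and dynamic load-balancing scheme can be caricatured as heterogeneous workers pulling sub-volumes from a shared queue, so a fast device naturally claims more blocks than a slow one. The sketch below is an assumption about the general pattern, not the authors' code; the worker names, relative speeds and block counts are invented.

      import queue
      import threading
      import time

      def worker(name, speed, tasks, done):
          while True:
              try:
                  block = tasks.get_nowait()     # dynamic balancing: pull on demand
              except queue.Empty:
                  return
              time.sleep(0.01 / speed)           # stand-in for the FE kernel
              done.append((name, block))

      tasks, done = queue.Queue(), []
      for block in range(40):                    # 40 sub-volumes to process
          tasks.put(block)
      workers = [("gpu", 8.0), ("cpu0", 1.0), ("cpu1", 1.0)]   # invented speeds
      threads = [threading.Thread(target=worker, args=(n, s, tasks, done))
                 for n, s in workers]
      for t in threads: t.start()
      for t in threads: t.join()
      print({n: sum(1 for m, _ in done if m == n) for n, _ in workers})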

  1. Evaluation of the Intel iWarp parallel processor for space flight applications

    Science.gov (United States)

    Hine, Butler P., III; Fong, Terrence W.

    1993-01-01

    The potential of a DARPA-sponsored advanced processor, the Intel iWarp, for use in future SSF Data Management Systems (DMS) upgrades is evaluated through integration into the Ames DMS testbed and applications testing. The iWarp is a distributed, parallel computing system well suited for high performance computing applications such as matrix operations and image processing. The system architecture is modular, supports systolic and message-based computation, and is capable of providing massive computational power in a low-cost, low-power package. As a consequence, the iWarp offers significant potential for advanced space-based computing. This research seeks to determine the iWarp's suitability as a processing device for space missions. In particular, the project focuses on evaluating the ease of integrating the iWarp into the SSF DMS baseline architecture and the iWarp's ability to support computationally stressing applications representative of SSF tasks.

  2. On-board landmark navigation and attitude reference parallel processor system

    Science.gov (United States)

    Gilbert, L. E.; Mahajan, D. T.

    1978-01-01

    An approach to autonomous navigation and attitude reference for earth-observing spacecraft is described along with the landmark identification technique based on a sequential similarity detection algorithm (SSDA). Laboratory experiments undertaken to determine whether better-than-one-pixel registration accuracy can be achieved, consistent with onboard processor timing and capacity constraints, are included. The SSDA is implemented using a multi-microprocessor system including synchronization logic and a chip library. The data is processed in parallel stages, effectively reducing the time to match the small known image within a larger image as seen by the onboard imaging system. Shared memory is incorporated in the system to help communicate intermediate results among microprocessors. The functions include finding mean values and summation of absolute differences over the image search area. The hardware is a low power, compact unit suitable for onboard application with the flexibility to provide for different parameters depending upon the environment.
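
    The heart of the SSDA is a sum-of-absolute-differences search: slide the known landmark template over the larger image and keep the offset with the smallest accumulated difference. A minimal single-processor sketch follows (the flight system split this accumulation across microprocessors, and a production SSDA would also abandon a candidate offset early once its partial sum exceeds a threshold):

      import numpy as np

      def ssda_match(search, template):
          """Return the (row, col) offset minimising the sum of absolute
          differences between the template and the search image."""
          th, tw = template.shape
          sh, sw = search.shape
          best, best_pos = np.inf, (0, 0)
          for r in range(sh - th + 1):
              for c in range(sw - tw + 1):
                  sad = np.abs(search[r:r+th, c:c+tw] - template).sum()
                  if sad < best:
                      best, best_pos = sad, (r, c)
          return best_pos, best

      rng = np.random.default_rng(0)
      scene = rng.random((64, 64))
      landmark = scene[20:28, 30:38].copy()   # known landmark cut from the scene
      print(ssda_match(scene, landmark))      # -> ((20, 30), 0.0)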

  3. A diffusion model for two parallel queues with processor sharing: transient behavior and asymptotics

    Directory of Open Access Journals (Sweden)

    Charles Knessl

    1999-01-01

    Full Text Available We consider two identical, parallel M/M/1 queues. Both queues are fed by a Poisson arrival stream of rate λ and have service rates equal to μ. When both queues are non-empty, the two systems behave independently of each other. However, when one of the queues becomes empty, the corresponding server helps in the other queue. This is called head-of-the-line processor sharing. We study this model in the heavy traffic limit, where ρ=λ/μ→1. We formulate the heavy traffic diffusion approximation and explicitly compute the time-dependent probability distribution of the diffusion approximation to the joint queue-length process. We then evaluate the solution asymptotically for large values of space and/or time. This leads to simple expressions that show how the process achieves its steady state and other transient aspects.
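
    A quick numerical check of such asymptotics is to simulate the underlying Markov chain directly. The event-driven sketch below is my own illustration of the model, not code from the paper: two queues with Poisson arrivals of rate lam and service rate mu each, where an idle server doubles the head-of-line service rate in the other queue.

      import random

      def simulate(lam=0.95, mu=1.0, horizon=20000.0, seed=1):
          rng = random.Random(seed)
          t, q, area = 0.0, [0, 0], [0.0, 0.0]
          while t < horizon:
              rates = [lam, lam]                 # arrivals to queues 0 and 1
              for i in (0, 1):
                  # Service rate doubles when the other server is idle and helps.
                  rates.append(mu * (2.0 if q[1 - i] == 0 else 1.0)
                               if q[i] > 0 else 0.0)
              total = sum(rates)
              dt = rng.expovariate(total)
              area = [a + n * dt for a, n in zip(area, q)]
              t += dt
              u, acc = rng.random() * total, 0.0
              for k, r in enumerate(rates):      # pick which event fired
                  acc += r
                  if u <= acc:
                      break
              if k < 2:
                  q[k] += 1                      # arrival
              else:
                  q[k - 2] -= 1                  # service completion
          return [a / t for a in area]           # time-averaged queue lengths

      print(simulate())                          # grows as lam/mu -> 1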

  4. High-performance parallel processors based on star-coupled wavelength division multiplexing optical interconnects

    Science.gov (United States)

    Deri, Robert J.; DeGroot, Anthony J.; Haigh, Ronald E.

    2002-01-01

    As the performance of individual elements within parallel processing systems increases, increased communication capability between distributed processor and memory elements is required. There is great interest in using fiber optics to improve interconnect communication beyond that attainable using electronic technology. Several groups have considered WDM, star-coupled optical interconnects. The invention uses a fiber optic transceiver to provide low latency, high bandwidth channels for such interconnects using a robust multimode fiber technology. Instruction-level simulation is used to quantify the bandwidth, latency, and concurrency required for such interconnects to scale to 256 nodes, each operating at 1 GFLOPS performance. Performance scales have been shown to .apprxeq.100 GFLOPS for scientific application kernels using a small number of wavelengths (8 to 32), only one wavelength received per node, and achievable optoelectronic bandwidth and latency.

  5. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    Directory of Open Access Journals (Sweden)

    Matthew O'keefe

    1995-01-01

    Full Text Available Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  6. Preliminary design of an advanced programmable digital filter network for large passive acoustic ASW systems. [Parallel processor

    Energy Technology Data Exchange (ETDEWEB)

    McWilliams, T.; Widdoes, Jr., L. C.; Wood, L.

    1976-09-30

    The design of an extremely high performance programmable digital filter of novel architecture, the LLL Programmable Digital Filter, is described. The digital filter is a high-performance multiprocessor having general purpose applicability and high programmability; it is extremely cost effective in either a uniprocessor or a multiprocessor configuration. The architecture and instruction set of the individual processor were optimized with regard to the multiple-processor configuration. The optimal structure of a parallel processing system was determined for addressing the specific Navy application centering on the advanced digital filtering of passive acoustic ASW data of the type obtained from the SOSUS net. 148 figures. (RWR)

  7. Mathematical and numerical models to achieve high speed with special-purpose parallel processors

    International Nuclear Information System (INIS)

    Cheng, H.S.; Wulff, W.; Mallen, A.N.

    1986-01-01

    Historically, safety analyses and plant dynamic simulations have been and still are being carried out by means of detailed FORTRAN codes on expensive mainframe computers in time-consuming batch processing mode. These codes have grown to be so expensive to execute that their utilization depends increasingly on the availability of very expensive supercomputers. Thus, advanced technology for high-speed, low-cost, and accurate plant dynamic simulations is very much needed. Ideally, a low-cost facility based on a modern minicomputer can be dedicated to the staff of a power plant, which is easy and convenient to use, and which can realistically simulate plant transients at faster-than-real-time speeds. Such a simulation capability can enhance safety and plant utilization. One such simulation facility that has been developed is the Brookhaven National Laboratory (BNL) Plant Analyzer, currently set up for boiling water reactor plant simulations at up to seven times faster than real-time process speeds. The principal hardware components of the BNL Plant Analyzer are two units of special-purpose parallel processors, the AD10 of Applied Dynamics International, and a PDP-11/34 host computer

  8. Track recognition in 4 μs by a systolic trigger processor using a parallel Hough transform

    International Nuclear Information System (INIS)

    Klefenz, F.; Noffz, K.H.; Conen, W.; Zoz, R.; Kugel, A.; Maenner, R.; Univ. Heidelberg

    1993-01-01

    A parallel Hough transform processor has been developed that identifies circular particle tracks in a 2D projection of the OPAL jet chamber. The high-speed requirements imposed by the 8-bunch crossing mode of LEP could be fulfilled by computing the starting angle and the radius of curvature for each well-defined track in less than 4 μs. The system consists of a Hough transform processor that determines well-defined tracks, and a Euler processor that counts their number by applying the Euler relation to the thresholded result of the Hough transform. A prototype of a systolic processor has been built that handles one sector of the jet chamber. It consists of 35 x 32 processing elements that were loaded into 21 programmable gate arrays (XILINX). This processor runs at a clock rate of 40 MHz. It has been tested offline with about 1,000 original OPAL events. No deviations from the offline simulation have been found. A trigger efficiency of 93% has been obtained. The prototype together with the associated drift time measurement unit has been installed at the OPAL detector at LEP and 100k events have been sampled to evaluate the system under detector conditions.
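
    The transform is easy to prototype in software: each hit votes, for every candidate starting angle, for the curvature of the circle through the origin that would explain it, so genuine tracks pile up as peaks in a 2D accumulator. In the systolic processor each accumulator cell is a processing element; in this hedged sketch a NumPy array stands in for that grid, and the geometry and binning are illustrative choices, not the OPAL parameters.

      import numpy as np

      def hough_circles(hits, n_phi=64, n_kappa=32, kappa_max=0.5):
          phi0 = np.linspace(0.0, np.pi, n_phi, endpoint=False)
          acc = np.zeros((n_phi, n_kappa), dtype=int)
          for r, theta in hits:
              # A circle through the origin with tangent angle phi0 and radius R
              # satisfies r = 2 R sin(theta - phi0), so kappa = 2 sin(theta - phi0) / r.
              kappa = 2.0 * np.sin(theta - phi0) / r
              ok = (kappa > 0) & (kappa < kappa_max)
              bins = (kappa[ok] / kappa_max * n_kappa).astype(int)
              acc[np.flatnonzero(ok), bins] += 1     # every cell votes in parallel
          return acc, phi0

      kappa_true, phi_true = 0.2, np.pi / 4          # one synthetic track
      r = np.linspace(2.0, 8.0, 12)
      theta = phi_true + np.arcsin(kappa_true * r / 2.0)
      acc, phi0 = hough_circles(list(zip(r, theta)))
      i, j = np.unravel_index(acc.argmax(), acc.shape)
      print(phi0[i], (j + 0.5) * 0.5 / 32)           # peak near (0.785, 0.2)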

  9. Real-Time Adaptive Lossless Hyperspectral Image Compression using CCSDS on Parallel GPGPU and Multicore Processor Systems

    Science.gov (United States)

    Hopson, Ben; Benkrid, Khaled; Keymeulen, Didier; Aranki, Nazeeh; Klimesh, Matt; Kiely, Aaron

    2012-01-01

    The proposed CCSDS (Consultative Committee for Space Data Systems) Lossless Hyperspectral Image Compression Algorithm was designed to facilitate a fast hardware implementation. This paper analyses that algorithm with regard to available parallelism and describes fast parallel implementations in software for GPGPU and Multicore CPU architectures. We show that careful software implementation, using hardware acceleration in the form of GPGPUs or even just multicore processors, can exceed the performance of existing hardware and software implementations by up to 11x and break the real-time barrier for the first time for a typical test application.
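
    At its core such a compressor pairs a local predictor with entropy coding of mapped residuals, and the exploitable parallelism comes from samples being predictable independently once their neighbours are known. The sketch below is a heavily simplified stand-in of my own (a previous-pixel predictor with a zigzag residual mapping), not the actual CCSDS predictor:

      import numpy as np

      def mapped_residuals(band):
          """Predict each sample from its left neighbour and fold the signed
          residual into a non-negative code, as lossless coders typically do."""
          pred = np.empty_like(band)
          pred[:, 0] = 0                         # no neighbour for first column
          pred[:, 1:] = band[:, :-1]             # trivial previous-pixel predictor
          delta = band - pred
          return np.where(delta >= 0, 2 * delta, -2 * delta - 1)  # zigzag map

      band = np.array([[10, 12, 11, 11],
                       [9,  9, 14, 13]], dtype=np.int64)
      print(mapped_residuals(band))   # small codes cluster near zero and
                                      # entropy-code cheaply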

  10. Performance evaluation of the HEP, ELXSI and CRAY X-MP parallel processors on hydrocode test problems

    International Nuclear Information System (INIS)

    Liebrock, L.M.; McGrath, J.F.; Hicks, D.L.

    1986-01-01

    Parallel programming promises improved processing speeds for hydrocodes, magnetohydrocodes, multiphase flow codes, thermal-hydraulics codes, wavecodes and other continuum dynamics codes. This paper presents the results of some investigations of parallel algorithms on three parallel processors: the CRAY X-MP, ELXSI and the HEP computers. Introduction and Background: We report the results of investigations of parallel algorithms for computational continuum dynamics. These programs (hydrocodes, wavecodes, etc.) produce simulations of the solutions to problems arising in the motion of continua: solid dynamics, liquid dynamics, gas dynamics, plasma dynamics, multiphase flow dynamics, thermal-hydraulic dynamics and multimaterial flow dynamics. This report restricts its scope to one-dimensional algorithms such as the von Neumann-Richtmyer (1950) scheme

  11. Choosing processor array configuration by performance modeling for a highly parallel linear algebra algorithm

    International Nuclear Information System (INIS)

    Littlefield, R.J.; Maschhoff, K.J.

    1991-04-01

    Many linear algebra algorithms utilize an array of processors across which matrices are distributed. Given a particular matrix size and a maximum number of processors, what configuration of processors, i.e., what size and shape array, will execute the fastest? The answer to this question depends on tradeoffs between load balancing, communication startup and transfer costs, and computational overhead. In this paper we analyze in detail one algorithm: the blocked factored Jacobi method for solving dense eigensystems. A performance model is developed to predict execution time as a function of the processor array and matrix sizes, plus the basic computation and communication speeds of the underlying computer system. In experiments on a large hypercube (up to 512 processors), this model has been found to be highly accurate (mean error ∼ 2%) over a wide range of matrix sizes (10 x 10 through 200 x 200) and processor counts (1 to 512). The model reveals, and direct experiment confirms, that the tradeoffs mentioned above can be surprisingly complex and counterintuitive. We propose decision procedures based directly on the performance model to choose configurations for fastest execution. The model-based decision procedures are compared to a heuristic strategy and shown to be significantly better. 7 refs., 8 figs., 1 tab
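
    The decision procedure can be miniaturized: enumerate every feasible r x c grid up to the processor budget, predict each configuration's execution time from simple load-balance and communication terms, and keep the minimum. The model below is a toy of my own with invented constants, not the paper's calibrated model; it only illustrates why small matrices favour small grids.

      def predict_time(n, r, c, t_flop=1e-8, t_start=1e-4, t_word=1e-7):
          rows, cols = -(-n // r), -(-n // c)    # ceil: local block per node
          compute = rows * cols * n * t_flop     # O(n^3) work split on the grid
          # Per-step broadcasts along grid rows and columns: startup + volume.
          comm = n * (t_start * (r + c) + t_word * (rows + cols))
          return compute + comm

      def best_grid(n, max_p):
          configs = [(r, p // r) for p in range(1, max_p + 1)
                     for r in range(1, p + 1) if p % r == 0]
          return min(configs, key=lambda rc: predict_time(n, *rc))

      for n in (50, 200, 1000):
          print(n, best_grid(n, 64))             # bigger matrices, bigger grids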

  12. Space-charge-dominated beam dynamics simulations using the massively parallel processors (MPPs) of the Cray T3D

    International Nuclear Information System (INIS)

    Liu, H.

    1996-01-01

    Computer simulations using the multi-particle code PARMELA with a three-dimensional point-by-point space charge algorithm have turned out to be very helpful in supporting injector commissioning and operations at Thomas Jefferson National Accelerator Facility (Jefferson Lab, formerly called CEBAF). However, this algorithm, which defines a typical N² problem in CPU time scaling, is very time-consuming when N, the number of macro-particles, is large. Therefore, it is attractive to use massively parallel processors (MPPs) to speed up the simulations. Motivated by this, the authors modified the space charge subroutine for using the MPPs of the Cray T3D. The techniques used to parallelize and optimize the code on the T3D are discussed in this paper. The performance of the code on the T3D is examined in comparison with a Cray C90 parallel vector processing supercomputer and an HP 735/15 high-end workstation.
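
    The reason the point-by-point algorithm parallelizes cleanly is that the O(N^2) pairwise sum splits into independent row blocks, one per processing element. In this sketch (illustrative only, with a softened Coulomb-like kernel rather than PARMELA's space-charge kernel) each block_forces call is independent and could run on a different processor.

      import numpy as np

      def block_forces(pos, q, lo, hi, eps=1e-3):
          """Force on particles lo:hi from all N particles (softened kernel)."""
          d = pos[lo:hi, None, :] - pos[None, :, :]          # (hi-lo, N, 3)
          r2 = (d * d).sum(axis=2) + eps                     # softened distances
          w = q[lo:hi, None] * q[None, :] / r2 ** 1.5
          return (w[:, :, None] * d).sum(axis=1)             # partial forces

      rng = np.random.default_rng(2)
      N, P = 2000, 4                                         # particles, "PEs"
      pos, q = rng.normal(size=(N, 3)), np.ones(N)
      bounds = np.linspace(0, N, P + 1).astype(int)
      # Each call below is independent; on the T3D each PE would own one slice.
      forces = np.vstack([block_forces(pos, q, bounds[p], bounds[p + 1])
                          for p in range(P)])
      print(forces.shape)                                    # (2000, 3)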

  13. Design of massively parallel hardware multi-processors for highly-demanding embedded applications

    NARCIS (Netherlands)

    Jozwiak, L.; Jan, Y.

    2013-01-01

    Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip

  14. A parallel FPGA implementation for real-time 2D pixel clustering for the ATLAS Fast Tracker Processor

    International Nuclear Information System (INIS)

    Sotiropoulou, C L; Gkaitatzis, S; Kordas, K; Nikolaidis, S; Petridou, C; Annovi, A; Beretta, M; Volpi, G

    2014-01-01

    The parallel 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors from inner ATLAS read out drivers (RODs) at full rate, for a total of 760 Gb/s, as sent by the RODs after level-1 triggers. Clustering serves two purposes: the first is to reduce the high rate of the received data before further processing; the second is to determine the cluster centroid, to obtain the best spatial measurement. For the pixel detectors the clustering is implemented by using a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The cluster detection window size can be adjusted to optimize the cluster identification process. Additionally, the implementation can be parallelized by instantiating multiple cores to identify different clusters independently, thus exploiting more FPGA resources. This flexibility makes the implementation suitable for a variety of demanding image processing applications. The implementation is robust against bit errors in the input data stream and drops all data that cannot be identified. In the unlikely event of missing control words, the implementation will ensure stable data processing by inserting the missing control words in the data stream. The 2D pixel clustering implementation is developed and tested in both single-flow and parallel versions. The first parallel version with 16 parallel cluster identification engines is presented. The input data from the RODs are received through S-Links, and the processing units that follow the clustering implementation also require a single data stream; therefore data parallelizing (demultiplexing) and serializing (multiplexing) modules are introduced in order to accommodate the parallelized version and restore the data stream afterwards. The results of the first hardware tests of
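
    A software model makes the task concrete: group adjacent fired pixels and report each cluster's charge-weighted centroid, with a window bounding how far a cluster may extend from its seed. The following is an illustration of the idea only, not the FTK firmware, and its window semantics are a simplification.

      from collections import deque

      def cluster_pixels(hits, window=8):
          """hits: dict {(row, col): charge}. Returns centroid per cluster."""
          remaining, centroids = dict(hits), []
          while remaining:
              seed = next(iter(remaining))
              r0, c0 = seed
              group, todo = [], deque([seed])
              while todo:
                  p = todo.popleft()
                  if p not in remaining:
                      continue
                  r, c = p
                  if abs(r - r0) >= window or abs(c - c0) >= window:
                      continue                   # outside the detection window
                  group.append((p, remaining.pop(p)))
                  todo.extend((r + dr, c + dc) for dr in (-1, 0, 1)
                              for dc in (-1, 0, 1) if (dr, dc) != (0, 0))
              charge = sum(q for _, q in group)
              centroids.append((sum(r * q for (r, _), q in group) / charge,
                                sum(c * q for (_, c), q in group) / charge))
          return centroids

      hits = {(3, 3): 5.0, (3, 4): 3.0, (4, 3): 2.0, (20, 20): 4.0}
      print(cluster_pixels(hits))                # -> [(3.2, 3.3), (20.0, 20.0)]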

  15. A Parallel FPGA Implementation for Real-Time 2D Pixel Clustering for the ATLAS Fast TracKer Processor

    CERN Document Server

    Sotiropoulou, C-L; The ATLAS collaboration; Annovi, A; Beretta, M; Kordas, K; Nikolaidis, S; Petridou, C; Volpi, G

    2014-01-01

    The parallel 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors from inner ATLAS read out drivers (RODs) at full rate, for a total of 760 Gb/s, as sent by the RODs after level-1 triggers. Clustering serves two purposes: the first is to reduce the high rate of the received data before further processing; the second is to determine the cluster centroid, to obtain the best spatial measurement. For the pixel detectors the clustering is implemented by using a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The cluster detection window size can be adjusted to optimize the cluster identification process. Additionally, the implementation can be parallelized by instantiating multiple cores to identify different clusters independently, thus exploiting more FPGA resources. ...

  17. A longitudinal multi-bunch feedback system using parallel digital signal processors

    International Nuclear Information System (INIS)

    Sapozhnikov, L.; Fox, J.D.; Olsen, J.J.; Oxoby, G.; Linscott, I.; Drago, A.; Serio, M.

    1994-01-01

    A programmable longitudinal feedback system based on four AT&T 1610 digital signal processors has been developed as a component of the PEP-II R&D program. This longitudinal quick prototype is a proof of concept for the PEP-II system and implements full-speed bunch-by-bunch signal processing for storage rings with bunch spacings of 4 ns. The design incorporates a phase-detector-based front end that digitizes the oscillation phases of bunches at the 250 MHz crossing rate, four programmable signal processors that compute correction signals, and a 250-MHz hold buffer/kicker driver stage that applies correction signals back on the beam. The design implements a general-purpose, table-driven downsampler that allows the system to be operated at several accelerator facilities. The hardware architecture of the signal processing is described, and the software algorithms used in the feedback signal computation are discussed. The system configuration used for tests at the LBL Advanced Light Source is presented

  18. Compiling the parallel programming language NestStep to the CELL processor

    OpenAIRE

    Holm, Magnus

    2010-01-01

    The goal of this project is to create a source-to-source compiler which will translate NestStep code to C code. The compiler's job is to replace NestStep constructs with a series of function calls to the NestStep runtime system. NestStep is a parallel programming language extension based on the BSP model. It adds constructs for parallel programming on top of an imperative programming language. For this project, only constructs extending the C language are relevant. The output code will compil...

  19. Three-dimensional magnetic field computation on a distributed memory parallel processor

    International Nuclear Information System (INIS)

    Barion, M.L.

    1990-01-01

    The analysis of three-dimensional magnetic fields by finite element methods frequently proves too onerous a task for the computing resource on which it is attempted. When non-linear and transient effects are included, it may become impossible to calculate the field distribution to sufficient resolution. One approach to this problem is to exploit the natural parallelism in the finite element method via parallel processing. This paper reports on an implementation of a finite element code for non-linear three-dimensional low-frequency magnetic field calculation on Intel's iPSC/2

  20. A Design of a New Column-Parallel Analog-to-Digital Converter Flash for Monolithic Active Pixel Sensor.

    Science.gov (United States)

    Chakir, Mostafa; Akhamal, Hicham; Qjidaa, Hassan

    2017-01-01

    The CMOS Monolithic Active Pixel Sensor (MAPS) for the International Linear Collider (ILC) vertex detector (VXD) imposes stringent requirements on its analog readout electronics, specifically on the analog-to-digital converter (ADC). This paper concerns designing and optimizing a new architecture for a low-power, high-speed, and small-area 4-bit column-parallel flash ADC. Later in this study, we propose to interpose an S/H block in the converter. This integration of an S/H block increases the sensitivity of the converter to the very small amplitude of the input signal from the sensor and gives the converter sufficient time to code the input signal. This ADC is developed in a 0.18 μm CMOS process with a pixel pitch of 35 μm. The proposed ADC responds to the constraints of power dissipation, size, and speed for the MAPS composed of a matrix of 64 rows and 48 columns, where each column ADC covers a small area of 35 × 336.76 μm². The proposed ADC consumes low power at a 1.8 V supply and 100 MS/s sampling rate with a dynamic range of 125 mV. Its DNL and INL are 0.0812/-0.0787 LSB and 0.0811/-0.0787 LSB, respectively. Furthermore, this ADC achieves a speed of more than 5 GHz.
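
    Behaviourally, a flash conversion compares the held sample against all 2^bits - 1 ladder references at once and encodes the resulting thermometer code to binary. The sketch below models that behaviour only; the S/H stage is reduced to passing in a frozen value, the 125 mV range echoes the figure quoted above, and everything else is schematic rather than taken from the paper.

      import numpy as np

      def flash_adc(v_held, v_ref=0.125, bits=4):
          """Convert a held sample in [0, v_ref) volts to an integer code."""
          levels = 2 ** bits
          # One reference per comparator, as on a resistor-ladder flash ADC.
          refs = v_ref * np.arange(1, levels) / levels
          thermometer = v_held > refs            # all comparators fire at once
          return int(thermometer.sum())          # priority-encoder equivalent

      for v in (0.0, 0.031, 0.062, 0.124):       # inputs within the 125 mV range
          print(v, flash_adc(v))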

  1. Kemari: A Portable High Performance Fortran System for Distributed Memory Parallel Processors

    Directory of Open Access Journals (Sweden)

    T. Kamachi

    1997-01-01

    Full Text Available We have developed a compilation system which extends High Performance Fortran (HPF) in various aspects. We support the parallelization of well-structured problems with loop distribution and alignment directives similar to HPF's data distribution directives. Such directives give both additional control to the user and simplify the compilation process. For the support of unstructured problems, we provide directives for dynamic data distribution through user-defined mappings. The compiler also allows integration of message-passing interface (MPI) primitives. The system is part of a complete programming environment which also comprises a parallel debugger and a performance monitor and analyzer. After an overview of the compiler, we describe the language extensions and related compilation mechanisms in detail. Performance measurements demonstrate the compiler's applicability to a variety of application classes.

  2. Intelligent trigger by massively parallel processors for high energy physics experiments

    International Nuclear Information System (INIS)

    Rohrbach, F.; Vesztergombi, G.

    1992-01-01

    The CERN-MPPC collaboration concentrates its effort on the development of machines based on massive parallelism with thousands of integrated processing elements, arranged in a string. Seven applications are under detailed studies within the collaboration: three for LHC, one for SSC, two for fixed target high energy physics at CERN and one for HDTV. Preliminary results are presented. They show that the objectives should be reached with the use of the ASP architecture. (author)

  3. Design of a family of integrated parallel co-processors for images processing

    International Nuclear Information System (INIS)

    Court, Thierry

    1991-01-01

    The design of parallel image processing systems that join sophisticated microprocessors and specialised operators within the same architecture is a difficult task, because of the various problems to be taken into account. The current study identifies a way of realizing such dedicated operators and interfacing them to a central microprocessor-type unit. The two guidelines of this work are the search for polyvalent, specialised, reconfigurable operators, and their connection to a system bus rather than to specialised video buses. This research work proposes an architecture of circuits dedicated to image processing and two proposals for realizing them, one of which was realized in this study using silicon compiler tools. This work belongs to a larger project, whose aim is the development of a high-performance, modular industrial image processing system, based on the parallelization, in MIMD structures, of an elementary, autonomous image processing unit integrating a microprocessor equipped with a parallel coprocessor suited to image processing. (author) [fr

  4. KENO-VA-PVM KENO-VA-SM, KENO5A for Parallel Processors

    International Nuclear Information System (INIS)

    Ramon, Javier; Pena, Jorge

    2002-01-01

    1 - Description of program or function: This package contains the versions KENO-Va-SM (Shared Memory version) and KENO-Va-PVM (Parallel Virtual Machine version), based on SCALE-4.1. KENO-Va solves the three-dimensional Boltzmann transport equation for neutron-multiplying systems. The primary purpose of KENO-Va is to determine k-effective. Other calculated quantities include lifetime and generation time, energy-dependent leakages, energy- and region-dependent absorptions, fissions, fluxes, and fission densities. 2 - Method of solution: KENO-Va employs the Monte Carlo technique

  5. Supercomputers and parallel computation. Based on the proceedings of a workshop on progress in the use of vector and array processors organised by the Institute of Mathematics and its Applications and held in Bristol, 2-3 September 1982

    International Nuclear Information System (INIS)

    Paddon, D.J.

    1984-01-01

    This book is based on the proceedings of a conference on parallel computing held in 1982. There are 18 papers which cover the following topics: VLSI parallel architectures, the theory of parallel computing and vector and array processor computing. One paper on 'Tough Problems in Reactor Design' is indexed separately. All the contributions are on research done in the United Kingdom. Although much of the experience in array processor computing is associated with the ICL distributed array processor (DAP) and this is reflected in the contributions, the research relating to the ICL DAP is relevant to all types of array processors. (UK)

  6. A Design of a New Column-Parallel Analog-to-Digital Converter Flash for Monolithic Active Pixel Sensor

    Directory of Open Access Journals (Sweden)

    Mostafa Chakir

    2017-01-01

    Full Text Available The CMOS Monolithic Active Pixel Sensor (MAPS) for the International Linear Collider (ILC) vertex detector (VXD) imposes stringent requirements on its analog readout electronics, specifically on the analog-to-digital converter (ADC). This paper concerns designing and optimizing a new architecture for a low-power, high-speed, and small-area 4-bit column-parallel flash ADC. Later in this study, we propose to interpose an S/H block in the converter. This integration of an S/H block increases the sensitivity of the converter to the very small amplitude of the input signal from the sensor and gives the converter sufficient time to code the input signal. This ADC is developed in a 0.18 μm CMOS process with a pixel pitch of 35 μm. The proposed ADC responds to the constraints of power dissipation, size, and speed for the MAPS composed of a matrix of 64 rows and 48 columns, where each column ADC covers a small area of 35 × 336.76 μm². The proposed ADC consumes low power at a 1.8 V supply and 100 MS/s sampling rate with a dynamic range of 125 mV. Its DNL and INL are 0.0812/−0.0787 LSB and 0.0811/−0.0787 LSB, respectively. Furthermore, this ADC achieves a speed of more than 5 GHz.

  7. A 1,000 Frames/s Programmable Vision Chip with Variable Resolution and Row-Pixel-Mixed Parallel Image Processors

    Directory of Open Access Journals (Sweden)

    Nanjian Wu

    2009-07-01

    Full Text Available A programmable vision chip with variable resolution and row-pixel-mixed parallel image processors is presented. The chip consists of a CMOS sensor array with row-parallel 6-bit algorithmic ADCs, row-parallel gray-scale image processors, a pixel-parallel SIMD Processing Element (PE) array, and an instruction controller. The resolution of the image in the chip is variable: high resolution for a focused area and low resolution for the general view. It implements gray-scale and binary mathematical morphology algorithms in series to carry out low-level and mid-level image processing and sends out features of the image for various applications. It can perform image processing at over 1,000 frames/s (fps). A prototype chip with 64 × 64 pixel resolution and 6-bit gray scale is fabricated in a 0.18 μm standard CMOS process. The chip area is 1.5 mm × 3.5 mm. Each pixel is 9.5 μm × 9.5 μm and each processing element is 23 μm × 29 μm. The experimental results demonstrate that the chip can perform low-level and mid-level image processing and can be applied to real-time vision applications, such as high-speed target tracking.
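
    The morphology such a PE array executes is easy to state in software: each output pixel is the minimum (erosion) or maximum (dilation) over its neighbourhood, computed for all pixels simultaneously. A small stand-in sketch, with NumPy slices playing the role of the per-pixel processing elements:

      import numpy as np

      def erode(img):
          """3x3 gray-scale erosion; on the chip every pixel's PE takes the
          minimum over its neighbourhood in the same instruction cycle."""
          p = np.pad(img, 1, mode='edge')
          h, w = img.shape
          shifts = [p[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
          return np.minimum.reduce(shifts)

      def dilate(img):
          return 255 - erode(255 - img)          # duality on 8-bit gray images

      img = np.zeros((8, 8), dtype=np.uint8)
      img[2:6, 2:6] = 200                        # bright square on dark field
      opened = dilate(erode(img))                # opening removes <3x3 specks
      print(opened[2:6, 2:6].min())              # square survives -> 200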

  8. XOP, a fast versatile processor, as a building block for parallel processing in high energy physics experiments

    International Nuclear Information System (INIS)

    Baehler, P.; Bosco, N.; Lingjaerde, T.; Ljuslin, C.; Van Praag, A.; Werner, P.

    1986-01-01

    The XOP processor has been designed for trigger calculation and data compression in high energy physics experiments. Therefore, emphasis has been placed upon fast execution and a high input/output rate. The fast execution is achieved by a wide instruction word holding operations which are executed concurrently. Thus, the arithmetic operations, data address calculations, data accessing, condition checking, loop count checking and next instruction evaluation all overlap in time. In conventional micro-processors these operations are performed sequentially. In addition, the instruction set comprises not only the classical computer instructions, but also specialized instructions suitable for trigger calculations, such as bit search, population count, loose compare and vector instructions. In order to achieve a high input/output rate, each XOP ECLine interface board is equipped with an input and an output port which fulfil the LeCroy ECLine specifications. The autonomous input port allows a data rate of 40 Mbytes/sec, while the program controlled output port allows 20 Mbytes/sec. For Fastbus-based systems a dual Fastbus master interface is under design which makes it possible to build up a Fastbus multi-processor system. This design is being done in collaboration with LAPP Annecy for the CERN LEP L3 experiment. Their scheme comprises 4-5 XOP processors, each of them with a master interface on a data input segment and a master interface on a data output segment. This paper describes the structure of the XOP processor, the interface capabilities and the software development and debugging tools. (Auth.)

  9. A procedure for solving the neutron diffusion equation on a parallel micro-processor; modifications to the nodal expansion codes RECNEC and HEXNEC to implement the procedure

    International Nuclear Information System (INIS)

    Putney, J.M.

    1983-05-01

    The characteristics of a simple parallel micro-processor (PMP) are reviewed and its software requirements discussed. One of the more immediate applications is the multi-spatial simulation of a nuclear reactor station. This is of particular interest because 3D reactor simulation might then be possible as part of the operating procedure for PFR and CDFR. A major part of a multi-spatial reactor simulator is the solution of the neutron diffusion equation. A procedure is described for solving the equation on a PMP, which is applied to the nodal expansion method with modifications to the nodal expansion codes RECNEC and HEXNEC. Estimates of the micro-processor requirements for the simulation of both PFR and CDFR are given. (U.K.)

  10. Neuron splitting in compute-bound parallel network simulations enables runtime scaling with twice as many processors.

    Science.gov (United States)

    Hines, Michael L; Eichner, Hubert; Schürmann, Felix

    2008-08-01

    Neuron tree topology equations can be split into two subtrees and solved on different processors with no change in accuracy, stability, or computational effort; communication costs involve only sending and receiving two double precision values by each subtree at each time step. Splitting cells is useful in attaining load balance in neural network simulations, especially when there is a wide range of cell sizes and the number of cells is about the same as the number of processors. For compute-bound simulations load balance results in almost ideal runtime scaling. Application of the cell splitting method to two published network models exhibits good runtime scaling on twice as many processors as could be effectively used with whole-cell balancing.
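
    A toy makespan calculation, my own illustration of the paper's point rather than its method, shows why splitting helps: one large cell pins the runtime no matter how many processors are added, whereas halving every cell lets a greedy balancer exploit twice as many processors.

      import heapq

      def makespan(loads, nproc):
          """Longest-processing-time greedy assignment; returns the max load."""
          heap = [0.0] * nproc                   # running load per processor
          for w in sorted(loads, reverse=True):
              heapq.heappush(heap, heapq.heappop(heap) + w)
          return max(heap)

      cells = [100.0] + [10.0] * 30              # one giant cell, many small
      halves = [w / 2 for w in cells for _ in range(2)]
      # Whole cells on 8 processors vs. split cells on 16 processors.
      print(makespan(cells, 8), makespan(halves, 16))   # -> 100.0 50.0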

  11. Very Long Instruction Word Processors

    Indian Academy of Sciences (India)

    Pentium processors have modified the processor architecture to exploit parallelism in a program. ... The type of operation itself is encoded using 14 bits. ... context of designing simple architectures with low power consumption and executing x86 ...

  12. Parallelization of applications for networks with homogeneous and heterogeneous processors; Parallelisation d`applications pour des reseaux de processeurs homogenes ou heterogenes

    Energy Technology Data Exchange (ETDEWEB)

    Colombet, L

    1994-10-07

    The aim of this thesis is to study and develop efficient methods for the parallelization of scientific applications on parallel computers with distributed memory. The first part presents two communication libraries, PVM (Parallel Virtual Machine) and MPI (Message Passing Interface). They allow programs to be implemented on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate the performance of networks with heterogeneous processors. To evaluate such performance, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication-masking techniques. The general concept is based on communication anticipation, in particular by pipelining message-sending operations. Experimental results on Cray T3D and IBM SP1 machines validate the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs.
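
    The communication-anticipation idea can be illustrated generically (this is my own stand-in, not the thesis code): hand each block's result to a background sender as soon as it is ready, so its transfer overlaps the computation of the next block instead of following it.

      import time
      from concurrent.futures import ThreadPoolExecutor

      def compute(block):
          time.sleep(0.02)                       # stand-in for local work
          return block * block

      def send(result):
          time.sleep(0.02)                       # stand-in for a message transfer
          return result

      start = time.time()
      with ThreadPoolExecutor(max_workers=1) as sender:
          pending = None
          for block in range(10):
              out = compute(block)               # compute block i...
              if pending is not None:
                  pending.result()               # ...while block i-1 was sent
              pending = sender.submit(send, out)
          pending.result()
      print(f"pipelined: {time.time() - start:.2f}s vs ~0.40s unpipelined")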

  13. Optimization and parallelization of B-spline based orbital evaluations in QMC on multi/many-core shared memory processors

    OpenAIRE

    Mathuriya, Amrita; Luo, Ye; Benali, Anouar; Shulenburger, Luke; Kim, Jeongnim

    2016-01-01

    B-spline based orbital representations are widely used in Quantum Monte Carlo (QMC) simulations of solids, historically taking as much as 50% of the total run time. Random accesses to a large four-dimensional array make it challenging to efficiently utilize caches and wide vector units of modern CPUs. We present node-level optimizations of B-spline evaluations on multi/many-core shared memory processors. To increase SIMD efficiency and bandwidth utilization, we first apply data layout transfo...

  14. A Methodology, Based on Analytical Modeling, for the Design of Parallel and Distributed Architectures for Relational Database Query Processors.

    Science.gov (United States)

    1987-12-01

    [Figure residue: Figure 2, "Intelligent Disk Controller", and Figure 5, "Processor-Per-Head", architecture diagrams of application programs, database management system, operating system, host, and disk controller layers.] These additional properties have been proven in classical set and relation theory [75] and are described here.

  15. An Energy-Efficient and Scalable Deep Learning/Inference Processor With Tetra-Parallel MIMD Architecture for Big Data Applications.

    Science.gov (United States)

    Park, Seong-Wook; Park, Junyoung; Bong, Kyeongryeol; Shin, Dongjoo; Lee, Jinmook; Choi, Sungpill; Yoo, Hoi-Jun

    2015-12-01

    Deep learning algorithms are widely used for various pattern recognition applications such as text recognition, object recognition and action recognition because of their best-in-class recognition accuracy compared to hand-crafted and shallow-learning-based algorithms. The long learning time caused by their complex structure, however, has so far limited their usage to high-cost servers or many-core GPU platforms. On the other hand, the demand for customized pattern recognition within personal devices will grow gradually as more deep learning applications are developed. This paper presents a SoC implementation that enables deep learning applications to run on low-cost platforms such as mobile or portable devices. Different from conventional works that have adopted massively-parallel architectures, this work adopts a task-flexible architecture and exploits multiple forms of parallelism to cover the complex functions of the convolutional deep belief network, one of the popular deep learning/inference algorithms. In this paper, we implement the most energy-efficient deep learning and inference processor for wearable systems. The implemented 2.5 mm × 4.0 mm deep learning/inference processor is fabricated using 65 nm 8-metal CMOS technology for a battery-powered platform with real-time deep inference and deep learning operation. It consumes 185 mW average power, and 213.1 mW peak power at 200 MHz operating frequency and 1.2 V supply voltage. It achieves 411.3 GOPS peak performance and 1.93 TOPS/W energy efficiency, which is 2.07× higher than the state-of-the-art.

  16. Numeric algorithms for parallel processors computer architectures with applications to the few-groups neutron diffusion equations

    International Nuclear Information System (INIS)

    Zee, S.K.

    1987-01-01

    A numeric algorithm and an associated computer code were developed for the rapid solution of the finite-difference method representation of the few-group neutron-diffusion equations on parallel computers. Applications of the numeric algorithm on both SIMD (vector pipeline) and MIMD/SIMD (multi-CPU/vector pipeline) architectures were explored. The algorithm was successfully implemented in the two-group, 3-D neutron diffusion computer code named DIFPAR3D (DIFfusion PARallel 3-Dimension). Numerical-solution techniques used in the code include the Chebyshev polynomial acceleration technique in conjunction with the power method of outer iteration. For inner iterations, a parallel form of red-black (cyclic) line SOR with automated determination of group-dependent relaxation factors and iteration numbers required to achieve specified inner iteration error tolerance is incorporated. The code employs a macroscopic depletion model with trace capability for selected fission products' transients and critical boron. In addition to this, moderator and fuel temperature feedback models are also incorporated into the DIFPAR3D code, for realistic simulation of power reactor cores. The physics models used were proven acceptable in separate benchmarking studies.
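
    The parallel form of red-black (cyclic) SOR rests on a simple observation: colouring grid points like a checkerboard makes every point of one colour depend only on points of the other colour, so each half-sweep updates all of its points in parallel. A compact point-wise (rather than line-wise) sketch on a Poisson-like model problem, standing in for one energy group:

      import numpy as np

      def redblack_sor(src, omega=1.8, iters=200):
          phi = np.zeros_like(src)
          colour = (np.add.outer(np.arange(src.shape[0]),
                                 np.arange(src.shape[1])) % 2).astype(bool)
          interior = np.zeros_like(colour)
          interior[1:-1, 1:-1] = True
          for _ in range(iters):
              for mask in (colour & interior, ~colour & interior):
                  avg = np.zeros_like(phi)       # Gauss-Seidel 5-point average
                  avg[1:-1, 1:-1] = 0.25 * (phi[:-2, 1:-1] + phi[2:, 1:-1] +
                                            phi[1:-1, :-2] + phi[1:-1, 2:] +
                                            src[1:-1, 1:-1])
                  phi[mask] += omega * (avg[mask] - phi[mask])   # one half-sweep
          return phi

      src = np.zeros((33, 33))
      src[16, 16] = 1.0                          # point source, zero boundary
      print(redblack_sor(src)[16, 16])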

  17. Parallel Computation on Multicore Processors Using Explicit Form of the Finite Element Method and C++ Standard Libraries

    Directory of Open Access Journals (Sweden)

    Rek Václav

    2016-11-01

    Full Text Available In this paper, the modifications required to existing sequential code, written in the C or C++ programming language, for the calculation of various kinds of structures using the explicit form of the Finite Element Method (Dynamic Relaxation Method, Explicit Dynamics) in the NEXX system are introduced. The NEXX system is the core of the engineering software NEXIS, Scia Engineer, RFEM and RENEX. It supports multithreaded execution, which can now be expressed in the native C++ programming language using the standard libraries. Thanks to the high degree of abstraction that the contemporary C++ programming language provides, a library created in this way can be generalized for other uses of parallelism in computational mechanics.

  18. Array processor architecture

    Science.gov (United States)

    Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

    1983-01-01

    A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data-dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not lock-stepped but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.

  19. Monolithic spectrometer

    Energy Technology Data Exchange (ETDEWEB)

    Rajic, Slobodan (Knoxville, TN); Egert, Charles M. (Oak Ridge, TN); Kahl, William K. (Knoxville, TN); Snyder, Jr., William B. (Knoxville, TN); Evans, III, Boyd M. (Oak Ridge, TN); Marlar, Troy A. (Knoxville, TN); Cunningham, Joseph P. (Oak Ridge, TN)

    1998-01-01

    A monolithic spectrometer is disclosed for use in spectroscopy. The spectrometer is a single body of translucent material with positioned surfaces for the transmission, reflection and spectral analysis of light rays.

  20. Architectural design and analysis of a programmable image processor

    International Nuclear Information System (INIS)

    Siyal, M.Y.; Chowdhry, B.S.; Rajput, A.Q.K.

    2003-01-01

    In this paper we present the architectural design and analysis of a programmable image processor, nicknamed Snake. The processor was designed with a high degree of parallelism to speed up a range of image processing operations. The data parallelism found in array processors has been incorporated into the architecture of the proposed processor. The implementation of commonly used image processing algorithms and their performance evaluation are also discussed. The performance of Snake is also compared with that of other processor architectures. (author)

  1. Multimode power processor

    Science.gov (United States)

    O'Sullivan, George A.; O'Sullivan, Joseph A.

    1999-01-01

    In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.

  2. Processors and systems (picture processing)

    Energy Technology Data Exchange (ETDEWEB)

    Gemmar, P

    1983-01-01

    Automatic picture processing requires high performance computers and high transmission capacities in the processor units. The author examines the possibilities of operating processors in parallel in order to accelerate the processing of pictures. He therefore discusses a number of available processors and systems for picture processing and illustrates their capacities for special types of picture processing. He stresses the fact that the amount of storage required for picture processing is exceptionally high. The author concludes that it is as yet difficult to decide whether very large groups of simple processors or highly complex multiprocessor systems will provide the best solution. Both methods will be aided by the development of VLSI. New solutions have already been offered (systolic arrays and 3-d processing structures) but they also are subject to losses caused by inherently parallel algorithms. Greater efforts must be made to produce suitable software for multiprocessor systems. Some possibilities for future picture processing systems are discussed. 33 references.

  3. Selection and integration of a network of parallel processors in the real time acquisition system of the 4π DIAMANT multidetector: modeling, realization and evaluation of the software installed on this network

    International Nuclear Information System (INIS)

    Guirande, F.

    1997-01-01

    The increase in sensitivity of 4π arrays such as EUROBALL or DIAMANT has led to an increase in the data flow rate into the data acquisition system. If at the electronic level, the data flow has been distributed onto several data acquisition buses, it is necessary in the data processing system to increase the processing power. This work regards the modelling and implementation of the software allocated onto an architecture of parallel processors. Object analysis and formal methods were used, benchmark and evolution in the future of this architecture are presented. The thesis consists of two parts. Part A, devoted to 'Nuclear Spectroscopy with 4 π multidetectors', contains a first chapter entitled 'The Physics of 4π multidetectors' and a second chapter entitled 'Integral architecture of 4π multidetectors'. Part B, devoted to 'Parallel acquisition system of DIAMANT' contains three chapters entitled 'Material architecture', 'Software architecture' and 'Validation and Performances'. Four appendices and a term glossary close this work. (author)

  4. Introduction to parallel programming

    CERN Document Server

    Brawer, Steven

    1989-01-01

    Introduction to Parallel Programming focuses on the techniques, processes, methodologies, and approaches involved in parallel programming. The book first offers information on Fortran, hardware and operating system models, and processes, shared memory, and simple parallel programs. Discussions focus on processes and processors, joining processes, shared memory, time-sharing with multiple processors, hardware, loops, passing arguments in function/subroutine calls, program structure, and arithmetic expressions. The text then elaborates on basic parallel programming techniques, barriers and race conditions.

  5. A video Hartmann wavefront diagnostic that incorporates a monolithic microlens array

    International Nuclear Information System (INIS)

    Toeppen, J.S.; Bliss, E.S.; Long, T.W.; Salmon, J.T.

    1991-07-01

    We have developed a video Hartmann wavefront sensor that incorporates a monolithic array of photofabricated microlenses as the focusing elements. Combined with a video processor, this system reveals local gradients of the wavefront at a video frame rate of 30 Hz. Higher bandwidth is easily attainable with a camera and video processor that have faster frame rates. When used with a temporal filter, the reconstructed wavefront error is less than 1/10th wave.

  6. Parallel grid population

    Science.gov (United States)

    Wald, Ingo; Ize, Santiago

    2015-07-28

    Parallel population of a grid with a plurality of objects using a plurality of processors. One example embodiment is a method for parallel population of a grid with a plurality of objects using a plurality of processors. The method includes a first act of dividing a grid into n distinct grid portions, where n is the number of processors available for populating the grid. The method also includes acts of dividing a plurality of objects into n distinct sets of objects, assigning a distinct set of objects to each processor such that each processor determines by which distinct grid portion(s) each object in its distinct set of objects is at least partially bounded, and assigning a distinct grid portion to each processor such that each processor populates its distinct grid portion with any objects that were previously determined to be at least partially bounded by its distinct grid portion.
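
    As a minimal illustration of the two-phase method just described, the following Python sketch assumes a one-dimensional strip decomposition of the grid and objects given as axis-aligned intervals; the names (classify, populate) and the interval representation are illustrative assumptions, not taken from the patent.

        from multiprocessing import Pool

        N_PORTIONS = 4                       # n = number of available processors
        GRID_MIN, GRID_MAX = 0.0, 100.0
        WIDTH = (GRID_MAX - GRID_MIN) / N_PORTIONS

        def classify(objs):
            # Phase 1: map each object (lo, hi) to every grid portion by
            # which it is at least partially bounded.
            hits = []
            for lo, hi in objs:
                first = max(int((lo - GRID_MIN) // WIDTH), 0)
                last = min(int((hi - GRID_MIN) // WIDTH), N_PORTIONS - 1)
                hits.extend((p, (lo, hi)) for p in range(first, last + 1))
            return hits

        def populate(args):
            # Phase 2: populate one distinct grid portion with its objects.
            portion, objs = args
            return portion, sorted(objs)

        if __name__ == "__main__":
            objects = [(3.0, 5.0), (24.0, 27.5), (49.0, 52.0), (80.0, 99.0)]
            object_sets = [objects[i::N_PORTIONS] for i in range(N_PORTIONS)]
            with Pool(N_PORTIONS) as pool:
                hits = [h for part in pool.map(classify, object_sets) for h in part]
                buckets = {p: [] for p in range(N_PORTIONS)}
                for p, obj in hits:
                    buckets[p].append(obj)
                print(dict(pool.map(populate, sorted(buckets.items()))))

    Note that the object spanning (49.0, 52.0) straddles two portions and is assigned to both, matching the "at least partially bounded" test in the claim.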

  7. Very Long Instruction Word Processors

    Indian Academy of Sciences (India)

    Explicitly Parallel Instruction Computing (EPIC) is an instruction processing paradigm that has been in the spotlight due to its adoption by the next generation of Intel processors, starting with the IA-64. The EPIC processing paradigm is an evolution of the Very Long Instruction Word (VLIW) paradigm. This article gives an ...

  8. Monoliths in Bioprocess Technology

    Directory of Open Access Journals (Sweden)

    Vignesh Rajamanickam

    2015-04-01

    Monolithic columns are a special type of chromatography column, which can be used for the purification of different biomolecules. They have become popular due to their high mass transfer properties and short purification times. Several articles have already discussed monolith manufacturing, as well as monolith characteristics. In contrast, this review focuses on the applied aspect of monoliths and discusses the most relevant biomolecules that can be successfully purified by them. We describe success stories for viruses, nucleic acids and proteins and compare them to conventional purification methods. Furthermore, the advantages of monolithic columns over particle-based resins, as well as the limitations of monoliths are discussed. With a compilation of commercially available monolithic columns, this review aims at serving as a ‘yellow pages’ for bioprocess engineers who face the challenge of purifying a certain biomolecule using monoliths.

  9. Invasive tightly coupled processor arrays

    CERN Document Server

    LARI, VAHID

    2016-01-01

    This book introduces new massively parallel multiprocessor system-on-chip (MPSoC) architectures called invasive tightly coupled processor arrays (TCPAs). It proposes strategies, architecture designs, and programming interfaces for invasive TCPAs that allow invading and subsequently executing loop programs with strict requirements or guarantees of non-functional execution qualities such as performance, power consumption, and reliability. For the first time, such a configurable processor array architecture consisting of locally interconnected VLIW processing elements can be claimed by programs, either in full or in part, using the principle of invasive computing. Invasive TCPAs provide unprecedented energy efficiency for the parallel execution of nested loop programs by avoiding the global memory accesses typical of GPUs, and may even support loops with complex dependencies, such as loop-carried dependencies, that are not amenable to parallel execution on GPUs. For this purpose, the book proposes different invasion strategies for claiming a desire...

  10. The LASS hardware processor

    International Nuclear Information System (INIS)

    Kunz, P.F.

    1976-01-01

    The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)

  11. Design and evaluation of an architecture for a digital signal processor for instrumentation applications

    Science.gov (United States)

    Fellman, Ronald D.; Kaneshiro, Ronald T.; Konstantinides, Konstantinos

    1990-03-01

    The authors present the design and evaluation of an architecture for a monolithic, programmable, floating-point digital signal processor (DSP) for instrumentation applications. An investigation of the most commonly used algorithms in instrumentation led to a design that satisfies the requirements for high computational and I/O (input/output) throughput. In the arithmetic unit, a 16 × 16-bit multiplier and a 32-bit accumulator provide the capability for single-cycle multiply/accumulate operations, and three format adjusters automatically adjust the data format for increased accuracy and dynamic range. An on-chip I/O unit is capable of handling data block transfers through a direct memory access port and real-time data streams through a pair of parallel I/O ports. I/O operations and program execution are performed in parallel. In addition, the processor includes two data memories with independent addressing units, a microsequencer with instruction RAM, and multiplexers for internal data redirection. The authors also present the structure and implementation of a design environment suitable for the algorithmic, behavioral, and timing simulation of a complete DSP system. Various benchmarking results are reported.

  12. Algorithms for parallel computers

    International Nuclear Information System (INIS)

    Churchhouse, R.F.

    1985-01-01

    Until relatively recently almost all the algorithms for use on computers had been designed on the (usually unstated) assumption that they were to be run on single-processor, serial machines. With the introduction of vector processors, array processors and interconnected systems of mainframes, minis and micros, however, various forms of parallelism have become available. The advantage of parallelism is that it offers increased overall processing speed, but it also raises some fundamental questions, including: (i) Which, if any, of the existing 'serial' algorithms can be adapted for use in the parallel mode? (ii) How close to optimal can such adapted algorithms be and, where relevant, what are the convergence criteria? (iii) How can we design new algorithms specifically for parallel systems? (iv) For multi-processor systems, how can we handle the software aspects of the interprocessor communications? Aspects of these questions, illustrated by examples, are considered in these lectures. (orig.)

  13. A monolithic integrated photonic microwave filter

    Science.gov (United States)

    Fandiño, Javier S.; Muñoz, Pascual; Doménech, David; Capmany, José

    2017-02-01

    Meeting the increasing demand for capacity in wireless networks requires the harnessing of higher regions in the radiofrequency spectrum, reducing cell size, as well as more compact, agile and power-efficient base stations that are capable of smoothly interfacing the radio and fibre segments. Fully functional microwave photonic chips are promising candidates in attempts to meet these goals. In recent years, many integrated microwave photonic chips have been reported in different technologies. To the best of our knowledge, none has monolithically integrated all the main active and passive optoelectronic components. Here, we report the first demonstration of a tunable microwave photonics filter that is monolithically integrated into an indium phosphide chip. The reconfigurable radiofrequency photonic filter includes all the necessary elements (for example, lasers, modulators and photodetectors), and its response can be tuned by means of control electric currents. This is an important step in demonstrating the feasibility of integrated and programmable microwave photonic processors.

  14. High-speed special-purpose processor for event selection by number of direct tracks

    International Nuclear Information System (INIS)

    Kalinnikov, V.A.; Krastev, V.R.; Chudakov, E.A.

    1986-01-01

    A processor which uses data on events from five detector planes is described. To increase economy and speed in parallel processing, the processor converts the input data to superposition code and recognizes tracks by a generated search mask. The resolving time of the processor is ≤300 nsec. The processor is CAMAC-compatible and uses ECL integrated circuits

  15. Neurovision processor for designing intelligent sensors

    Science.gov (United States)

    Gupta, Madan M.; Knopf, George K.

    1992-03-01

    A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multifunctional vision sensor that performs a variety of information processing operations on time-varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.

  16. Online track processor for the CDF upgrade

    International Nuclear Information System (INIS)

    Thomson, E. J.

    2002-01-01

    A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented

  17. Monolithic exploding foil initiator

    Science.gov (United States)

    Welle, Eric J; Vianco, Paul T; Headley, Paul S; Jarrell, Jason A; Garrity, J. Emmett; Shelton, Keegan P; Marley, Stephen K

    2012-10-23

    A monolithic exploding foil initiator (EFI) or slapper detonator and the method for making the monolithic EFI, wherein the exploding bridge and the dielectric from which the flyer will be generated are integrated directly onto the header. In some embodiments, the barrel is integrated directly onto the header.

  18. Second derivative parallel block backward differentiation type ...

    African Journals Online (AJOL)

    Second derivative parallel block backward differentiation type formulas for Stiff ODEs. ... and the methods are inherently parallel and can be distributed over parallel processors. They are ...

  19. VIRTUS: a multi-processor system in FASTBUS

    International Nuclear Information System (INIS)

    Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.

    1986-01-01

    VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)

  20. Joint Experimentation on Scalable Parallel Processors (JESPP)

    National Research Council Canada - National Science Library

    Davis, Dan M; Lucas, Robert F; Yao, Ka-Thia; Wagenbrath, Gene

    2006-01-01

    ...) required expansion of its joint semi-automated forces (JSAF) code capabilities, including number of entities, behavior complexity, terrain resolution, infrastructure features, environmental realism, and analytical potential...

  1. Probabilistic programmable quantum processors

    International Nuclear Information System (INIS)

    Buzek, V.; Ziman, M.; Hillery, M.

    2004-01-01

    We analyze how to improve the performance of probabilistic programmable quantum processors. We show how the probability of success of the probabilistic processor can be enhanced by using the processor in loops. In addition, we show that arbitrary SU(2) transformations of qubits can be encoded in the program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can be generalized to qudits. (Abstract Copyright [2004], Wiley Periodicals, Inc.)

  2. Parallel reservoir simulator computations

    International Nuclear Information System (INIS)

    Hemanth-Kumar, K.; Young, L.C.

    1995-01-01

    The adaptation of a reservoir simulator for parallel computations is described. The simulator was originally designed for vector processors. It performs approximately 99% of its calculations in vector/parallel mode and, relative to scalar calculations, it achieves speedups of 65 and 81 for black oil and EOS simulations, respectively, on the CRAY C-90.

  3. Acoustooptic linear algebra processors - Architectures, algorithms, and applications

    Science.gov (United States)

    Casasent, D.

    1984-01-01

    Architectures, algorithms, and applications for systolic processors are described, with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products on such architectures, are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed, with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.

  4. Embedded Processor Laboratory

    Data.gov (United States)

    Federal Laboratory Consortium — The Embedded Processor Laboratory provides the means to design, develop, fabricate, and test embedded computers for missile guidance electronics systems in support...

  5. Multithreading in vector processors

    Science.gov (United States)

    Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi

    2018-01-16

    In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
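
    A toy software model of this mechanism, assuming a vectorized program counter register whose entries are activated round robin with one thread issuing per cycle; the instruction streams below are made up for illustration.

        programs = {                       # one made-up instruction stream per thread
            0: ["load", "add", "store"],
            1: ["mul", "add"],
            2: ["load", "sub", "xor", "store"],
        }
        pc = {tid: 0 for tid in programs}  # the vectorized program counter register
        live = sorted(programs)            # threads still executing
        i = cycle = 0
        while live:
            i %= len(live)
            tid = live[i]                  # round-robin activation, one PC per cycle
            print(f"cycle {cycle}: thread {tid} issues {programs[tid][pc[tid]]}")
            pc[tid] += 1
            cycle += 1
            if pc[tid] == len(programs[tid]):
                live.pop(i)                # a finished thread leaves the rotation
            else:
                i += 1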

  6. Parallel Programming with Intel Parallel Studio XE

    CERN Document Server

    Blair-Chappell , Stephen

    2012-01-01

    Optimize code for multi-core processors with Intel's Parallel Studio Parallel programming is rapidly becoming a "must-know" skill for developers. Yet, where to start? This teach-yourself tutorial is an ideal starting point for developers who already know Windows C and C++ and are eager to add parallelism to their code. With a focus on applying tools, techniques, and language extensions to implement parallelism, this essential resource teaches you how to write programs for multicore and leverage the power of multicore in your programs. Sharing hands-on case studies and real-world examples, the

  7. Broadcasting collective operation contributions throughout a parallel computer

    Science.gov (United States)

    Faraj, Ahmad [Rochester, MN

    2012-02-21

    Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
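
    A hedged mpi4py sketch of this two-phase pattern: contributions first travel between processors on the same compute node, then between nodes. Plain allgathers stand in for the patent's designated network links and serial processor transmission sequence, so this approximates the data movement rather than reproducing the claimed method.

        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        node = comm.Split_type(MPI.COMM_TYPE_SHARED)  # ranks sharing a node
        my_contrib = comm.rank                        # stand-in contribution

        # Phase 1: intra-node exchange of contributions.
        node_contribs = node.allgather(my_contrib)

        # Phase 2: one leader per node exchanges the node's contributions
        # with the other leaders, then shares the result inside its node.
        color = 0 if node.rank == 0 else MPI.UNDEFINED
        leaders = comm.Split(color, key=comm.rank)
        all_contribs = None
        if leaders != MPI.COMM_NULL:
            all_contribs = leaders.allgather(node_contribs)
        all_contribs = node.bcast(all_contribs, root=0)
        print(comm.rank, sorted(c for part in all_contribs for c in part))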

  8. Selection and integration of a network of parallel processors in the real time acquisition system of the 4π DIAMANT multidetector: modeling, realization and evaluation of the software installed on this network; Choix et integration d'un reseau de processeurs paralleles dans le systeme d'acquisition temps reel du multidetecteur 4π DIAMANT: modelisation, realisation et evaluation du logiciel implante sur ce reseau

    Energy Technology Data Exchange (ETDEWEB)

    Guirande, F. [Ecole Doctorale des Sciences Physiques et de l'Ingenieur, Bordeaux-1 Univ., 33 (France)]

    1997-07-11

    The increase in sensitivity of 4π arrays such as EUROBALL or DIAMANT has led to an increase in the data flow rate into the data acquisition system. If at the electronic level, the data flow has been distributed onto several data acquisition buses, it is necessary in the data processing system to increase the processing power. This work regards the modelling and implementation of the software allocated onto an architecture of parallel processors. Object analysis and formal methods were used, benchmark and evolution in the future of this architecture are presented. The thesis consists of two parts. Part A, devoted to 'Nuclear Spectroscopy with 4π multidetectors', contains a first chapter entitled 'The Physics of 4π multidetectors' and a second chapter entitled 'Integral architecture of 4π multidetectors'. Part B, devoted to 'Parallel acquisition system of DIAMANT' contains three chapters entitled 'Material architecture', 'Software architecture' and 'Validation and Performances'. Four appendices and a term glossary close this work. (author) 58 refs.

  9. M7--a high speed digital processor for second level trigger selections

    International Nuclear Information System (INIS)

    Droege, T.F.; Gaines, I.; Turner, K.J.

    1978-01-01

    A digital processor is described which reconstructs mass and momentum as a second-level trigger selection. The processor is a five-address, microprogrammed, pipelined, ECL machine with simultaneous memory access to four operands which load two parallel multipliers and an ALU. Source data modules are extensions of the processor.

  10. RISC Processors and High Performance Computing

    Science.gov (United States)

    Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.

  11. Fibrous monolithic ceramics

    International Nuclear Information System (INIS)

    Kovar, D.; King, B.H.; Trice, R.W.; Halloran, J.W.

    1997-01-01

    Fibrous monolithic ceramics are an example of a laminate in which a controlled, three-dimensional structure has been introduced on a submillimeter scale. This unique structure allows this all-ceramic material to fail in a nonbrittle manner. Materials have been fabricated and tested with a variety of architectures. The influence of the structure of the constituent phases, and of the architecture in which they are arranged, on mechanical properties at room temperature and at high temperature is discussed. The elastic properties of these materials can be effectively predicted using existing models. These models also can be extended to predict the strength of fibrous monoliths with an arbitrary orientation and architecture. However, the mechanisms that govern the energy absorption capacity of fibrous monoliths are unique, and experimental results do not follow existing models. Energy dissipation occurs through two dominant mechanisms--delamination of the weak interphases and then frictional sliding after cracking occurs. The properties of the constituent phases that maximize energy absorption are discussed. In this article, the authors examine the structure of Si3N4-BN fibrous monoliths from the submillimeter scale of the crack-deflecting cell-cell boundary features to the nanometer scale of the BN cell boundaries.

  12. Integrated fuel processor development

    International Nuclear Information System (INIS)

    Ahmed, S.; Pereira, C.; Lee, S. H. D.; Krumpelt, M.

    2001-01-01

    The Department of Energy's Office of Advanced Automotive Technologies has been supporting the development of fuel-flexible fuel processors at Argonne National Laboratory. These fuel processors will enable fuel cell vehicles to operate on fuels available through the existing infrastructure. The constraints of on-board space and weight require that these fuel processors be designed to be compact and lightweight, while meeting the performance targets for efficiency and gas quality needed for the fuel cell. This paper discusses the performance of a prototype fuel processor that has been designed and fabricated to operate with liquid fuels, such as gasoline, ethanol, methanol, etc. Rated for a capacity of 10 kWe (one-fifth of that needed for a car), the prototype fuel processor integrates the unit operations (vaporization, heat exchange, etc.) and processes (reforming, water-gas shift, preferential oxidation reactions, etc.) necessary to produce the hydrogen-rich gas (reformate) that will fuel the polymer electrolyte fuel cell stacks. The fuel processor work is being complemented by analytical and fundamental research. With the ultimate objective of meeting on-board fuel processor goals, these studies include: modeling fuel cell systems to identify design and operating features; evaluating alternative fuel processing options; and developing appropriate catalysts and materials. Issues and outstanding challenges that need to be overcome in order to develop practical, on-board devices are discussed

  13. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    Science.gov (United States)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    Current trends in processor manufacturing focus on multi-core architectures rather than increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is well suited to SIMD-architecture-based GPUs. This paper reviews the architectural aspects of multi/many-core processors and graphics processors. Different case studies are used to compare the performance of throughput computing applications using shared-memory programming in OpenMP and CUDA-API-based programming.

  14. Control structures for high speed processors

    Science.gov (United States)

    Maki, G. K.; Mankin, R.; Owsley, P. A.; Kim, G. M.

    1982-01-01

    A special processor was designed to function as a Reed Solomon decoder with a throughput data rate in the MHz range. This data rate is significantly greater than is possible with conventional digital architectures. To achieve this rate, the processor design includes sequential, pipelined, distributed, and parallel processing. The processor was designed using a high-level register transfer language (RTL). The RTL can be used to describe how the different processes are implemented by the hardware. One problem of special interest was the development of dependent processes, which are analogous to software subroutines. For greater flexibility, the RTL control structure was implemented in ROM. The special purpose hardware required approximately 1000 SSI and MSI components. The data rate throughput is 2.5 megabits/second. This data rate is achieved through the use of pipelined and distributed processing. This data rate can be compared with 800 kilobits/second in a recently proposed very large scale integration design of a Reed Solomon encoder.

  15. Logistic Fuel Processor Development

    National Research Council Canada - National Science Library

    Salavani, Reza

    2004-01-01

    ... to light gases, then steam reforms the light gases into a hydrogen-rich stream. This report documents the efforts in developing a fuel processor capable of providing hydrogen to a 3 kW fuel cell stack...

  16. 3081/E processor

    International Nuclear Information System (INIS)

    Kunz, P.F.; Gravina, M.; Oxoby, G.

    1984-04-01

    The 3081/E project was formed to prepare a much improved IBM mainframe emulator for the future. Its design is based on a large amount of experience in using the 168/E processor to increase available CPU power in both online and offline environments. The processor will be at least equal to the execution speed of a 370/168 and up to 1.5 times faster for heavy floating point code. A single processor will thus be at least four times more powerful than the VAX 11/780, and five processors on a system would equal at least the performance of the IBM 3081K. With its large memory space and simple but flexible high speed interface, the 3081/E is well suited for the online and offline needs of high energy physics in the future

  17. Logistic Fuel Processor Development

    National Research Council Canada - National Science Library

    Salavani, Reza

    2004-01-01

    The Air Base Technologies Division of the Air Force Research Laboratory has developed a logistic fuel processor that removes the sulfur content of the fuel and in the process converts logistic fuel...

  18. Adaptive signal processor

    Energy Technology Data Exchange (ETDEWEB)

    Walz, H.V.

    1980-07-01

    An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed.

  19. Adaptive signal processor

    International Nuclear Information System (INIS)

    Walz, H.V.

    1980-07-01

    An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed

  20. Parallelization of 2-D lattice Boltzmann codes

    International Nuclear Information System (INIS)

    Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo.

    1996-03-01

    Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. High parallel efficiencies of 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability in keeping the available memory size of one processor element at maximum. Our performance model predicts that the execution time of the vector parallel code increases about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in the interprocessor communication, the vector parallel LB code is still suitable for large scale and/or high resolution simulations. (author)
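
    The 1-D decomposition can be pictured with a short mpi4py/numpy sketch: each rank owns a strip of rows and exchanges one ghost row with each neighbour per time step. The lattice Boltzmann collision and streaming update is stubbed out here, and the code assumes the row count divides evenly among ranks.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        NX, NY = 1152, 1152              # global grid, as in the paper
        rows = NX // comm.size           # this rank's strip of rows
        f = np.zeros((rows + 2, NY))     # +2 ghost rows, top and bottom
        up = (comm.rank - 1) % comm.size
        down = (comm.rank + 1) % comm.size

        for step in range(10):
            # Only two ghost-row exchanges per step (periodic boundaries).
            comm.Sendrecv(f[1], dest=up, recvbuf=f[-1], source=down)
            comm.Sendrecv(f[-2], dest=down, recvbuf=f[0], source=up)
            # Stub standing in for the collision/streaming update, which
            # vectorizes along the axis perpendicular to the decomposition.
            f[1:-1] = 0.5 * (f[:-2] + f[2:])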

  1. Parallelization of 2-D lattice Boltzmann codes

    Energy Technology Data Exchange (ETDEWEB)

    Suzuki, Soichiro; Kaburaki, Hideo; Yokokawa, Mitsuo

    1996-03-01

    Lattice Boltzmann (LB) codes to simulate two-dimensional fluid flow are developed on the vector parallel computer Fujitsu VPP500 and the scalar parallel computer Intel Paragon XP/S. While a 2-D domain decomposition method is used for the scalar parallel LB code, a 1-D domain decomposition method is used for the vector parallel LB code so that it can be vectorized along the axis perpendicular to the direction of the decomposition. High parallel efficiencies of 95.1% for the vector parallel calculation on 16 processors with a 1152x1152 grid and 88.6% for the scalar parallel calculation on 100 processors with an 800x800 grid are obtained. Performance models are developed to analyze the performance of the LB codes. It is shown by our performance models that the execution speed of the vector parallel code is about one hundred times faster than that of the scalar parallel code with the same number of processors, up to 100 processors. We also analyze the scalability in keeping the available memory size of one processor element at maximum. Our performance model predicts that the execution time of the vector parallel code increases about 3% on 500 processors. Although the 1-D domain decomposition method has in general a drawback in the interprocessor communication, the vector parallel LB code is still suitable for large scale and/or high resolution simulations. (author).

  2. Functional unit for a processor

    NARCIS (Netherlands)

    Rohani, A.; Kerkhoff, Hans G.

    2013-01-01

    The invention relates to a functional unit for a processor, such as a Very Large Instruction Word Processor. The invention further relates to a processor comprising at least one such functional unit. The invention further relates to a functional unit and processor capable of mitigating the effect of

  3. Monolith electroplating process

    Science.gov (United States)

    Agarrwal, Rajev R.

    2001-01-01

    An electroplating process for preparing a monolith metal layer over a polycrystalline base metal and the plated monolith product. A monolith layer has a variable thickness of one crystal. The process is typically carried out in molten salt electrolytes, such as the halide salts, under an inert atmosphere at an elevated temperature, and over deposition time periods and film thicknesses sufficient to sinter and recrystallize completely the nucleating metal particles into one single crystal or crystals having very large grains. In the process, a close-packed film of submicron particles (20) is formed on a suitable substrate at an elevated temperature. The temperature has the significance of annealing particles as they are formed, and substrates on which the particles can populate are desirable. As the packed bed thickens, the submicron particles develop necks (21) and as they merge into each other shrinkage (22) occurs. Then as micropores also close (23) by surface tension, metal density is reached and the film consists of unstable metal grains (24) that at high enough temperature recrystallize (25), and recrystallized grains grow into an annealed single crystal over the electroplating time span. While cadmium was used in the experimental work, other soft metals may be used.

  4. A parallel buffer tree

    DEFF Research Database (Denmark)

    Sitchinava, Nodar; Zeh, Norbert

    2012-01-01

    We present the parallel buffer tree, a parallel external memory (PEM) data structure for batched search problems. This data structure is a non-trivial extension of Arge's sequential buffer tree to a private-cache multiprocessor environment and reduces the number of I/O operations by the number of ... in the optimal O(sort_P(N) + K/PB) parallel I/O complexity, where K is the size of the output reported in the process and sort_P(N) is the parallel I/O complexity of sorting N elements using P processors.

  5. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest ... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts.

  6. High-Speed General Purpose Genetic Algorithm Processor.

    Science.gov (United States)

    Hoseini Alinodehi, Seyed Pourya; Moshfe, Sajjad; Saber Zaeimian, Masoumeh; Khoei, Abdollah; Hadidi, Khairollah

    2016-07-01

    In this paper, an ultrafast steady-state genetic algorithm processor (GAP) is presented. Due to the heavy computational load of genetic algorithms (GAs), they usually take a long time to find optimum solutions. Hardware implementation is a significant approach to overcome the problem by speeding up the GA procedure. Hence, we designed a digital CMOS implementation of a GA in a [Formula: see text] process. The proposed processor is not bounded to a specific application. Indeed, it is a general-purpose processor, which is capable of performing optimization in any possible application. Utilizing speed-boosting techniques, such as a pipeline scheme, parallel coarse-grained processing, parallel fitness computation, parallel selection of parents, a dual-population scheme, and support for pipelined fitness computation, the proposed processor significantly reduces the processing time. Furthermore, by relying on a built-in discard operator, the proposed hardware may be used in constrained problems that are very common in control applications. In the proposed design, a large search space is achievable through the bit string length extension of individuals in the genetic population by connecting the 32-bit GAPs. In addition, the proposed processor supports parallel processing, in which the GA procedure can be run on several connected processors simultaneously.
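
    A rough software analogue of several listed features, assuming a steady-state GA with batched parallel fitness evaluation and a discard operator; the paper describes a CMOS hardware processor, so this Python sketch only mirrors the algorithmic structure, and the objective and constraint are toys.

        import random
        from multiprocessing import Pool

        BITS = 32                          # matches the 32-bit GAP word length

        def fitness(x):
            return bin(x).count("1")       # toy objective: count set bits

        def feasible(x):
            return x % 3 != 0              # toy constraint for the discard operator

        def offspring(parents):
            a, b = parents
            point = random.randint(1, BITS - 1)
            mask = (1 << point) - 1
            child = (a & mask) | (b & ~mask)              # one-point crossover
            return child ^ (1 << random.randrange(BITS))  # one-bit mutation

        if __name__ == "__main__":
            pop = [random.getrandbits(BITS) for _ in range(64)]
            with Pool() as pool:
                fit = pool.map(fitness, pop)    # parallel fitness computation
                for _ in range(200):            # steady-state loop
                    batch = []
                    while len(batch) < 8:
                        child = offspring(random.sample(pop, 2))
                        if feasible(child):     # infeasible offspring are discarded
                            batch.append(child)
                    for child, cf in zip(batch, pool.map(fitness, batch)):
                        worst = min(range(len(pop)), key=fit.__getitem__)
                        if cf > fit[worst]:
                            pop[worst], fit[worst] = child, cf
            print(max(fit))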

  7. Practical parallel programming

    CERN Document Server

    Bauer, Barr E

    2014-01-01

    This is the book that will teach programmers to write faster, more efficient code for parallel processors. The reader is introduced to a vast array of procedures and paradigms on which actual coding may be based. Examples and real-life simulations using these devices are presented in C and FORTRAN.

  8. 3081/E processor

    International Nuclear Information System (INIS)

    Kunz, P.F.; Gravina, M.; Oxoby, G.; Trang, Q.; Fucci, A.; Jacobs, D.; Martin, B.; Storr, K.

    1983-03-01

    Since the introduction of the 168/E, emulating processors have been successful over an amazingly wide range of applications. This paper will describe a second generation processor, the 3081/E. This new processor, which is being developed as a collaboration between SLAC and CERN, goes beyond just fixing the obvious faults of the 168/E. Not only will the 3081/E have much more memory space, incorporate many more IBM instructions, and have full double precision floating point arithmetic, but it will also have faster execution times and be much simpler to build, debug, and maintain. The simple interface and reasonable cost of the 168/E will be maintained for the 3081/E.

  9. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    Science.gov (United States)

    Barr, David R. W.; Dudek, Piotr

    2009-12-01

    We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  10. APRON: A Cellular Processor Array Simulation and Hardware Design Tool

    Directory of Open Access Journals (Sweden)

    David R. W. Barr

    2009-01-01

    We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.

  11. High-speed packet filtering utilizing stream processors

    Science.gov (United States)

    Hummel, Richard J.; Fulp, Errin W.

    2009-04-01

    Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency of less than 82 microseconds. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example, the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results will also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues will be discussed.

  12. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    Science.gov (United States)

    Hristov, Ivan; Goranov, Goran; Hristova, Radoslava

    2018-02-01

    We consider a standing wave simulation that is interesting from a computational point of view, solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named "Knights Landing", KNL). The results show 2 times better performance on the KNL processor.

  13. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    Directory of Open Access Journals (Sweden)

    Hristov Ivan

    2018-01-01

    We consider a standing wave simulation that is interesting from a computational point of view, solving coupled 2D perturbed Sine-Gordon equations. We make an OpenMP realization which explores both thread and SIMD levels of parallelism. We test the OpenMP program on two different energy equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named “Ivy Bridge-EP”) in the Hybrilit cluster, and a Xeon Phi 7250 processor (code-named “Knights Landing”, KNL). The results show 2 times better performance on the KNL processor.

  14. Porous polymer monolithic columns

    Directory of Open Access Journals (Sweden)

    Lydia Terborg

    2015-05-01

    A new approach has been developed for the preparation of mixed-mode stationary phases to separate proteins. The pore surface of monolithic poly(glycidyl methacrylate-co-ethylene dimethacrylate) capillary columns was functionalized with thiols and coated with gold nanoparticles. The final mixed-mode surface chemistry was formed by attaching, in a single step, alkanethiols, mercaptoalkanoic acids, and their mixtures on the free surface of attached gold nanoparticles. Use of these mixtures allowed fine tuning of the hydrophobic/hydrophilic balance. The amount of attached gold nanoparticles according to thermal gravimetric analysis was 44.8 wt.%. This value together with results of frontal elution enabled calculation of surface coverage with the alkanethiol and mercaptoalkanoic acid ligands. Interestingly, alkanethiol coverage in a range of 4.46–4.51 molecules/nm2 significantly exceeded that of mercaptoalkanoic acids with 2.39–2.45 molecules/nm2. The mixed-mode character of these monolithic stationary phases was for the first time demonstrated in the separations of proteins that could be achieved in the same column using gradient elution conditions typical of reversed phase (using a gradient of acetonitrile in water) and ion exchange (applying a gradient of salt in water) chromatographic modes, respectively.

  15. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2008-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chip. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  16. Parallelism and Scalability in an Image Processing Application

    DEFF Research Database (Denmark)

    Rasmussen, Morten Sleth; Stuart, Matthias Bo; Karlsson, Sven

    2009-01-01

    The recent trends in processor architecture show that parallel processing is moving into new areas of computing in the form of many-core desktop processors and multi-processor system-on-chips. This means that parallel processing is required in application areas that traditionally have not used parallel programs. This paper investigates parallelism and scalability of an embedded image processing application. The major challenges faced when parallelizing the application were to extract enough parallelism from the application and to reduce load imbalance. The application has limited immediately...

  17. The Central Trigger Processor (CTP)

    CERN Multimedia

    Franchini, Matteo

    2016-01-01

    The Central Trigger Processor (CTP) receives trigger information from the calorimeter and muon trigger processors, as well as from other sources of trigger. It makes the Level-1 decision (L1A) based on a trigger menu.

  18. Speedup predictions on large scientific parallel programs

    International Nuclear Information System (INIS)

    Williams, E.; Bobrowicz, F.

    1985-01-01

    How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem we extend the parallel processing environment currently existing on the Cray X-MP (a shared memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
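
    For intuition, the reported figure is consistent with a highly parallel code under Amdahl's law; the law is an assumption here, since the paper's predictions come from simulation rather than from this formula.

        # Amdahl's law: S = 1 / ((1 - f) + f / N). Solving for the parallel
        # fraction f implied by the measured speedup S on N processors:
        N, S = 16, 14.4
        f = (1 - 1 / S) * N / (N - 1)
        print(f"implied parallel fraction f = {f:.4f}")   # ~0.993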

  19. Systematic approach for deriving feasible mappings of parallel algorithms to parallel computing platforms

    NARCIS (Netherlands)

    Arkin, Ethem; Tekinerdogan, Bedir; Imre, Kayhan M.

    2017-01-01

    The need for high-performance computing together with the increasing trend from single processor to parallel computer architectures has leveraged the adoption of parallel computing. To benefit from parallel computing power, usually parallel algorithms are defined that can be mapped and executed

  20. Is Monte Carlo embarrassingly parallel?

    Energy Technology Data Exchange (ETDEWEB)

    Hoogenboom, J. E. [Delft Univ. of Technology, Mekelweg 15, 2629 JB Delft (Netherlands); Delft Nuclear Consultancy, IJsselzoom 2, 2902 LB Capelle aan den IJssel (Netherlands)

    2012-07-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)
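
    A minimal mpi4py sketch of the per-cycle rendezvous identified above as a bottleneck, with a toy stand-in for the transport kernel: every rank must stop at the end of each cycle while the full fission source and the multiplication factor estimate are assembled, so the slowest rank sets the pace.

        import random
        from mpi4py import MPI

        comm = MPI.COMM_WORLD

        def simulate_cycle(source):
            # Toy stand-in for neutron tracking: each source point spawns
            # a perturbed fission site and a random fission weight.
            sites = [s + random.uniform(-0.1, 0.1) for s in source]
            weight = sum(random.uniform(0.9, 1.1) for _ in source)
            return sites, weight

        local = [random.random() for _ in range(1024 // comm.size)]
        for cycle in range(10):
            sites, weight = simulate_cycle(local)
            # Rendezvous point: all ranks synchronize here every cycle.
            full_source = [s for part in comm.allgather(sites) for s in part]
            k_eff = comm.allreduce(weight) / len(full_source)
            # Population control: resample the next cycle's local source.
            random.seed(cycle)       # identical resampling on all ranks (toy)
            local = random.sample(full_source, len(local))
        print(comm.rank, round(k_eff, 3))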

  1. Is Monte Carlo embarrassingly parallel?

    International Nuclear Information System (INIS)

    Hoogenboom, J. E.

    2012-01-01

    Monte Carlo is often stated as being embarrassingly parallel. However, running a Monte Carlo calculation, especially a reactor criticality calculation, in parallel using tens of processors shows a serious limitation in speedup, and the execution time may even increase beyond a certain number of processors. In this paper the main causes of the loss of efficiency when using many processors are analyzed using a simple Monte Carlo program for criticality. The basic mechanism for parallel execution is MPI. One of the bottlenecks turns out to be the rendezvous points in the parallel calculation used for synchronization and exchange of data between processors. This happens at least at the end of each cycle for fission source generation in order to collect the full fission source distribution for the next cycle and to estimate the effective multiplication factor, which is not only part of the requested results, but also input to the next cycle for population control. Basic improvements to overcome this limitation are suggested and tested. Also other time losses in the parallel calculation are identified. Moreover, the threading mechanism, which allows the parallel execution of tasks based on shared memory using OpenMP, is analyzed in detail. Recommendations are given to get the maximum efficiency out of a parallel Monte Carlo calculation. (authors)

  2. The Molen Polymorphic Media Processor

    NARCIS (Netherlands)

    Kuzmanov, G.K.

    2004-01-01

    In this dissertation, we address high performance media processing based on a tightly coupled co-processor architectural paradigm. More specifically, we introduce a reconfigurable media augmentation of a general purpose processor and implement it into a fully operational processor prototype. The

  3. Dual-core Itanium Processor

    CERN Multimedia

    2006-01-01

    Intel’s first dual-core Itanium processor, code-named "Montecito" is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor. It has lower power consumption and thermal dissipation.

  4. Monolitni katalizatori i reaktori: osnovne značajke, priprava i primjena (Monolith catalysts and reactors: basic features, preparation and applications)

    Directory of Open Access Journals (Sweden)

    Tomašić, V.

    2004-12-01

    Monolithic (honeycomb) catalysts are continuous unitary structures containing many narrow, parallel and usually straight channels (or passages). Catalytically active components are dispersed uniformly over the whole porous ceramic monolith structure (so-called incorporated monolithic catalysts) or are in a layer of porous material that is deposited on the walls of channels in the monolith's structure (washcoated monolithic catalysts). The material of the main monolithic construction is not limited to ceramics but includes metals, as well. Monolithic catalysts are commonly used in gas phase catalytic processes, such as treatment of automotive exhaust gases, selective catalytic reduction of nitrogen oxides, catalytic removal of volatile organic compounds from industrial processes, etc. Monoliths continue to be the preferred support for environmental applications due to their high geometric surface area, different design options, low pressure drop, high temperature durability, mechanical strength, ease of orientation in a reactor and effectiveness as a support for a catalytic washcoat. As known, monolithic catalysts belong to the class of structured catalysts and/or reactors (in some cases the distinction between "catalyst" and "reactor" has vanished). Structured catalysts can greatly intensify chemical processes, resulting in smaller, safer, cleaner and more energy efficient technologies. Monolith reactors can be considered as multifunctional reactors, in which chemical conversion is advantageously integrated with another unit operation, such as separation, heat exchange, a secondary reaction, etc. Finally, structured catalysts and/or reactors appear to be one of the most significant and promising developments in the field of heterogeneous catalysis and chemical engineering of recent years. This paper gives a description of the background and perspectives for application and development of monolithic materials. Different methods and techniques

  5. Nested dissection on a mesh-connected processor array

    International Nuclear Information System (INIS)

    Worley, P.H.; Schreiber, R.

    1986-01-01

    The authors present a parallel implementation of Gaussian elimination without pivoting using the nested dissection ordering for solving Ax=b, where A is an N x N symmetric positive definite matrix. If the graph of A is a √N x √N finite element mesh, then a parallel complexity of O(√N) can be achieved for Gaussian elimination with the nested dissection ordering. The authors' implementation achieves this parallel complexity on a two-dimensional MIMD processor array with N processors and nearest-neighbor interconnections. Thus nested dissection is a near-optimal algorithm for this problem on this interconnection topology. The parallel implementation on this architecture requires 158√N + O(log₂(√N)) parallel floating point multiplications. It is faster than a Kung-Leiserson systolic array for banded matrices for N≥961, and faster than a serial implementation for N as small as 9.

  6. Parallelization of quantum molecular dynamics simulation code

    International Nuclear Information System (INIS)

    Kato, Kaori; Kunugi, Tomoaki; Shibahara, Masahiko; Kotake, Susumu

    1998-02-01

    A quantum molecular dynamics simulation code has been developed at the Kansai Research Establishment for analyzing the thermalization of photon energies in molecules and materials. The simulation code has been parallelized for both a scalar massively parallel computer (Intel Paragon XP/S75) and a vector-parallel computer (Fujitsu VPP300/12). Scalable speedup was obtained on both parallel computers by distributing particle groups among the processor units. On the Intel Paragon XP/S75, even higher parallel performance was achieved by distributing work to the processor units not only by particle group but also by the fine-grained per-particle calculations. (author)

  7. Parallel computing, failure recovery, and extreme values

    DEFF Research Database (Denmark)

    Andersen, Lars Nørvang; Asmussen, Søren

    A task of random size T is split into M subtasks of lengths T1, ..., TM, each of which is sent to one of M parallel processors. Each processor may fail at a random time before completing its allocated task, and then has to restart it from the beginning. If X1, ..., XM are the total task ...

  8. An Introduction to Parallel Computation R

    Indian Academy of Sciences (India)

    How are they programmed? This article provides an introduction. A parallel computer is a network of processors built for ... and have been used to solve problems much faster than a single ... in parallel computer design is to select an organization which ..... The most ambitious approach to parallel computing is to develop.

  9. Massively parallel mathematical sieves

    Energy Technology Data Exchange (ETDEWEB)

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
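
    The marking loop of the sieve is where nearly all the work lies, and it parallelizes directly. As a shared-memory stand-in for the paper's scattered hypercube decomposition, a sketch in C with OpenMP (problem size illustrative):

        /* Parallel Sieve of Eratosthenes: primes up to sqrt(N) are found
         * serially; the marking of their multiples is split among threads.
         * All writers store the same value, so no synchronization is needed. */
        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(void)
        {
            const long N = 10000000;
            char *composite = calloc(N + 1, 1);
            long limit = (long)sqrt((double)N);

            for (long p = 2; p <= limit; ++p) {
                if (composite[p]) continue;
                #pragma omp parallel for schedule(static)
                for (long m = p * p; m <= N; m += p)
                    composite[m] = 1;
            }

            long count = 0;
            #pragma omp parallel for reduction(+:count)
            for (long i = 2; i <= N; ++i)
                if (!composite[i]) ++count;
            printf("%ld primes up to %ld\n", count, N);
            free(composite);
            return 0;
        }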

  10. Adapting algorithms to massively parallel hardware

    CERN Document Server

    Sioulas, Panagiotis

    2016-01-01

    In the recent years, the trend in computing has shifted from delivering processors with faster clock speeds to increasing the number of cores per processor. This marks a paradigm shift towards parallel programming in which applications are programmed to exploit the power provided by multi-cores. Usually there is gain in terms of the time-to-solution and the memory footprint. Specifically, this trend has sparked an interest towards massively parallel systems that can provide a large number of processors, and possibly computing nodes, as in the GPUs and MPPAs (Massively Parallel Processor Arrays). In this project, the focus was on two distinct computing problems: k-d tree searches and track seeding cellular automata. The goal was to adapt the algorithms to parallel systems and evaluate their performance in different cases.

  11. Parallelization of the FLAPW method

    International Nuclear Information System (INIS)

    Canning, A.; Mannstadt, W.; Freeman, A.J.

    1999-01-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about one hundred atoms due to a lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel computer

  12. Parallelization of the FLAPW method

    Science.gov (United States)

    Canning, A.; Mannstadt, W.; Freeman, A. J.

    2000-08-01

    The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.

  13. A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

    International Nuclear Information System (INIS)

    Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

    1993-01-01

    The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed-memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique was applied that maps each history dynamically to a processor and assigns the control process to a dedicated processor. The efficiency of the parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer over conventional computers in the field of shielding analysis by the Monte Carlo method. (orig.)

  14. Algorithmically specialized parallel computers

    CERN Document Server

    Snyder, Lawrence; Gannon, Dennis B

    1985-01-01

    Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include a model for analyzing generalized inter-processor communication, a pipelined architecture for search tree maintenance, and a specialized computer organization for raster

  15. Video frame processor

    International Nuclear Information System (INIS)

    Joshi, V.M.; Agashe, Alok; Bairi, B.R.

    1993-01-01

    This report provides a technical description of the Video Frame Processor (VFP) developed at Bhabha Atomic Research Centre. The instrument captures video images in CCIR format. Two memory planes, each with a capacity of 512 x 512 x 8 bits, enable storage of two video image frames. The stored image can be processed on-line, and on-line image subtraction can also be carried out for image comparison. The VFP is a PC add-on board and is I/O mapped within the host IBM PC/AT compatible computer. (author). 9 refs., 4 figs., 19 photographs

  16. Trigger and decision processors

    International Nuclear Information System (INIS)

    Franke, G.

    1980-11-01

    In recent years there have been many attempts in high energy physics to make trigger and decision processes faster and more sophisticated. This has become necessary due to a continual increase in the number of sensitive detector elements in wire chambers and calorimeters, and it has become possible because of the fast developments in integrated circuit technology. In this paper the present situation is reviewed. The discussion is mainly focussed upon event filtering by pure software methods, on the more hardware-related microprogrammable processors, and on random access memory triggers. (orig.)

  17. Optical Finite Element Processor

    Science.gov (United States)

    Casasent, David; Taylor, Bradley K.

    1986-01-01

    A new high-accuracy optical linear algebra processor (OLAP) with many advantageous features is described. It achieves floating point accuracy, handles bipolar data by sign-magnitude representation, performs LU decomposition using only one channel, easily partitions and considers data flow. A new application (finite element (FE) structural analysis) for OLAPs is introduced and the results of a case study presented. Error sources in encoded OLAPs are addressed for the first time. Their modeling and simulation are discussed and quantitative data are presented. Dominant error sources and the effects of composite error sources are analyzed.

  18. Parallelization of MCNP Monte Carlo neutron and photon transport code in parallel virtual machine and message passing interface

    International Nuclear Information System (INIS)

    Deng Li; Xie Zhongsheng

    1999-01-01

    The coupled neutron and photon transport Monte Carlo code MCNP (version 3B) has been parallelized in parallel virtual machine (PVM) and message passing interface (MPI) by modifying a previous serial code. The new code has been verified by solving sample problems. The speedup increases linearly with the number of processors, and the average efficiency is up to 99% for 12 processors. (author)

  19. A Scalable Parallel PWTD-Accelerated SIE Solver for Analyzing Transient Scattering from Electrically Large Objects

    KAUST Repository

    Liu, Yang; Yucel, Abdulkadir; Bagci, Hakan; Michielssen, Eric

    2015-01-01

    of processors by leveraging two mechanisms: (i) a hierarchical parallelization strategy to evenly distribute the computation and memory loads at all levels of the PWTD tree among processors, and (ii) a novel asynchronous communication scheme to reduce the cost

  20. Massively Parallel Computing at Sandia and Its Application to National Defense

    National Research Council Canada - National Science Library

    Dosanjh, Sudip

    1991-01-01

    Two years ago, researchers at Sandia National Laboratories showed that a massively parallel computer with 1024 processors could solve scientific problems more than 1000 times faster than a single processor...

  1. Dual-scale topology optoelectronic processor.

    Science.gov (United States)

    Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

    1991-12-15

    The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.

  2. AMD's 64-bit Opteron processor

    CERN Multimedia

    CERN. Geneva

    2003-01-01

    This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included. Biographies: David Rich directs AMD's efforts in high performance computing and also in the use of Opteron processors...

  3. Optimization of Particle-in-Cell Codes on RISC Processors

    Science.gov (United States)

    Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.

    1996-01-01

    General strategies are developed to optimize particle-in-cell (PIC) codes written in Fortran for the RISC processors commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve the efficiency of arithmetic pipelines.
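
    The strategies are described only in general terms above; one common concrete form of the cache-oriented data reorganization is to keep the particle array sorted by cell index, so that consecutive particles touch the same grid data. A hypothetical counting-sort sketch in C (the struct layout is an assumption):

        /* Cache-oriented particle reorganization: a stable counting sort by
         * cell index, O(n), run every few time steps so that particles that
         * share grid cells are adjacent in memory. */
        #include <stdlib.h>
        #include <string.h>

        typedef struct { float x, y, vx, vy; int cell; } Particle;

        void sort_by_cell(Particle *p, Particle *scratch, int n, int ncells)
        {
            int *offset = calloc(ncells + 1, sizeof(int));
            for (int i = 0; i < n; ++i) offset[p[i].cell + 1]++;
            for (int c = 0; c < ncells; ++c) offset[c + 1] += offset[c];
            for (int i = 0; i < n; ++i) scratch[offset[p[i].cell]++] = p[i];
            memcpy(p, scratch, n * sizeof(Particle));
            free(offset);
        }

        int main(void)
        {
            enum { N = 4, NC = 2 };
            Particle p[N] = {{0,0,0,0,1}, {0,0,0,0,0}, {0,0,0,0,1}, {0,0,0,0,0}};
            Particle tmp[N];
            sort_by_cell(p, tmp, N, NC);    /* p now holds cells 0,0,1,1 */
            return 0;
        }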

  4. Temporal fringe pattern analysis with parallel computing

    International Nuclear Information System (INIS)

    Tuck Wah Ng; Kar Tien Ang; Argentini, Gianluca

    2005-01-01

    Temporal fringe pattern analysis is invaluable in transient phenomena studies but necessitates long processing times. Here we describe a parallel computing strategy based on the single-program multiple-data (SPMD) model and hyperthreading processor technology to reduce the execution time. In a two-node cluster workstation configuration we found that execution times were reduced by a factor of 1.6 when four virtual processors were used. To achieve even lower execution times with an increasing number of processors, the time spent on data transfer, data reads, and waiting should be minimized. Parallel computing is found here to be a feasible approach to reducing execution times in temporal fringe pattern analysis.
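
    A sketch of the SPMD division of labour, under stated assumptions: each thread takes a block of pixels and reduces that pixel's entire time history independently. The frame layout and the phase-by-temporal-correlation formula are illustrative, not the authors' exact method.

        /* SPMD split of temporal fringe analysis: pixels are divided among
         * threads; each thread reduces one pixel's full time history to a
         * wrapped phase value. frames holds T stacked frames of npix pixels. */
        #include <math.h>

        void temporal_phase(const float *frames, float *phase, int T, int npix)
        {
            const double TWO_PI = 6.283185307179586;
            #pragma omp parallel for schedule(static)
            for (int p = 0; p < npix; ++p) {
                double s = 0.0, c = 0.0;
                for (int t = 0; t < T; ++t) {   /* correlate with one period */
                    double w = TWO_PI * t / T;
                    s += frames[t * npix + p] * sin(w);
                    c += frames[t * npix + p] * cos(w);
                }
                phase[p] = (float)atan2(s, c);  /* wrapped phase at this pixel */
            }
        }

    Since each pixel's history is independent, the loop needs no synchronization, which matches the SPMD strategy described above.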

  5. 3D-Flow processor for a programmable Level-1 trigger (feasibility study)

    International Nuclear Information System (INIS)

    Crosetto, D.

    1992-10-01

    A feasibility study has been made to use the 3D-Flow processor in a pipelined programmable parallel processing architecture to identify particles such as electrons, jets, muons, etc., in high-energy physics experiments

  6. Acoustic of monolithic dome structures

    Directory of Open Access Journals (Sweden)

    Mostafa Refat Ismail

    2018-03-01

    The interiors of monolithic domes have perfect concave shapes that carry sound across the dome and collect it at focal points. When these dome structures serve domestic use, their scale places the focal points within the spaces of daily activities, thereby affecting the sonic comfort of the internal space. This study examines the various acoustic treatments and parametric configurations of monolithic dome sizes. A geometric relationship between acoustic treatment and dome radius is established to give architects guidelines on the correct selection of the absorption needed to maintain the acoustic comfort of these special spaces.

  7. Parallel processing of Monte Carlo code MCNP for particle transport problem

    Energy Technology Data Exchange (ETDEWEB)

    Higuchi, Kenji; Kawasaki, Takuji

    1996-06-01

    It is possible to vectorize or parallelize Monte Carlo (MC) codes for photon and neutron transport problems by making use of the independence of the calculation for each particle. The applicability of existing MC codes to parallel processing is discussed. For the performance evaluation we used both a vector-parallel processor and a scalar-parallel processor: (i) vector-parallel processing of the MCNP code on the Monte Carlo machine Monte-4 with four vector processors, and (ii) parallel processing on the Paragon XP/S with 256 processors. In this report we describe the methodology and results for parallel processing on these two types of parallel or distributed-memory computers. In addition, we evaluate the parallel programming environments of the parallel computers used in the present work, as part of the work on developing the STA (Seamless Thinking Aid) Basic Software. (author)

  8. Composable processor virtualization for embedded systems

    NARCIS (Netherlands)

    Molnos, A.M.; Milutinovic, A.; She, D.; Goossens, K.G.W.

    2010-01-01

    Processor virtualization divides a physical processor's time among a set of virtual machines, enabling efficient hardware utilization and application security and allowing the co-existence of different operating systems on the same processor. Though initially intended for the server domain, virtualization

  9. Using of opportunities of graphic processors for acceleration of scientific and technical calculations

    International Nuclear Information System (INIS)

    Dudnik, V.A.; Kudryavtsev, V.I.; Sereda, T.M.; Us, S.A.; Shestakov, M.V.

    2009-01-01

    The new opportunities offered by modern graphics processors (GPUs) for accelerating scientific and technical calculations, by dividing a computing task between the central processor and the GPU, are described. The use of the NVIDIA CUDA technology to exploit the parallel computing capabilities of the GPU in programming some computationally intensive mathematical tasks is described. Examples are given comparing the performance achieved when these tasks are computed without a GPU and when the NVIDIA CUDA capabilities of the GeForce 8800 graphics processor are used.

  10. Parallel algorithms for continuum dynamics

    International Nuclear Information System (INIS)

    Hicks, D.L.; Liebrock, L.M.

    1987-01-01

    Simply porting existing parallel programs to a new parallel processor may not achieve the full speedup possible; to achieve the maximum efficiency may require redesigning the parallel algorithms for the specific architecture. The authors discuss here parallel algorithms that were developed first for the HEP processor and then ported to the CRAY X-MP/4, the ELXSI/10, and the Intel iPSC/32. Focus is mainly on the most recent parallel processing results produced, i.e., those on the Intel Hypercube. The applications are simulations of continuum dynamics in which the momentum and stress gradients are important. Examples of these are inertial confinement fusion experiments, severe breaks in the coolant system of a reactor, weapons physics, shock-wave physics. Speedup efficiencies on the Intel iPSC Hypercube are very sensitive to the ratio of communication to computation. Great care must be taken in designing algorithms for this machine to avoid global communication. This is much more critical on the iPSC than it was on the three previous parallel processors

  11. Nonlinear Wave Simulation on the Xeon Phi Knights Landing Processor

    OpenAIRE

    Hristov Ivan; Goranov Goran; Hristova Radoslava

    2018-01-01

    We consider a standing-wave simulation that is interesting from a computational point of view: solving coupled 2D perturbed sine-Gordon equations. We make an OpenMP realization which exploits both the thread and SIMD levels of parallelism. We test the OpenMP program on two energy-equivalent Intel architectures: 2× Xeon E5-2695 v2 processors (code-named "Ivy Bridge-EP") in the HybriLIT cluster, and the Xeon Phi 7250 processor (code-named "Knights Landing", KNL). The results show 2 times better per...
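
    A sketch of how the two levels of parallelism combine on one explicit time step of a 2D sine-Gordon update: OpenMP threads across rows, SIMD along each row. The grid size, step sizes and the unperturbed sin(u) term are assumptions for illustration, not the authors' setup.

        /* One leapfrog step of u_tt = u_xx + u_yy - sin(u): threads across
         * rows, SIMD within a row. r = (dt/h)^2 must satisfy the usual
         * stability limit for the explicit scheme. */
        #include <math.h>

        #define NX 1024
        #define NY 1024

        void step(double u[NY][NX], double uprev[NY][NX], double unew[NY][NX],
                  double dt, double h)
        {
            double r = (dt * dt) / (h * h);
            #pragma omp parallel for
            for (int j = 1; j < NY - 1; ++j) {
                #pragma omp simd
                for (int i = 1; i < NX - 1; ++i) {
                    double lap = u[j][i-1] + u[j][i+1] + u[j-1][i] + u[j+1][i]
                               - 4.0 * u[j][i];
                    unew[j][i] = 2.0 * u[j][i] - uprev[j][i]
                               + r * lap - dt * dt * sin(u[j][i]);
                }
            }
        }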

  12. Matrix preconditioning: a robust operation for optical linear algebra processors.

    Science.gov (United States)

    Ghosh, A; Paparao, P

    1987-07-15

    Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
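
    The paper evaluates the preconditioning optically; purely to illustrate the algebra involved, the sketch below applies the simplest diagonal (Jacobi) preconditioner in C, scaling Ax = b by diag(A)^-1 so that a gradient solver sees a system with a lower condition number.

        /* Jacobi preconditioning: divide each row of A x = b by its diagonal
         * entry, giving a unit-diagonal system with comparably scaled rows. */
        #include <stdio.h>

        #define N 3

        int main(void)
        {
            double A[N][N] = {{100, 1, 0}, {1, 50, 2}, {0, 2, 1}};
            double b[N]    = {1, 2, 3};

            for (int i = 0; i < N; ++i) {
                double d = A[i][i];
                for (int j = 0; j < N; ++j) A[i][j] /= d;
                b[i] /= d;
            }
            for (int i = 0; i < N; ++i)
                printf("%8.4f %8.4f %8.4f | %8.4f\n",
                       A[i][0], A[i][1], A[i][2], b[i]);
            return 0;
        }

    Left scaling like this breaks symmetry; when symmetry must be preserved (e.g. for conjugate gradients), the symmetric form diag(A)^-1/2 A diag(A)^-1/2 is used instead.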

  13. Monolithic fiber optic sensor assembly

    Science.gov (United States)

    Sanders, Scott

    2015-02-10

    A remote sensor element for spectrographic measurements employs a monolithic assembly of one or two fiber optics to two optical elements separated by a supporting structure to allow the flow of gases or particulates therebetween. In a preferred embodiment, the sensor element components are fused ceramic to resist high temperatures and failure from large temperature changes.

  14. Monolithic Integrated Ceramic Waveguide Filters

    OpenAIRE

    Hunter, IC; Sandhu, MY

    2014-01-01

    Design techniques for a new class of integrated monolithic high-permittivity ceramic waveguide filters are presented. These filters enable a size reduction of 50% compared to air-filled TEM filters with the same unloaded Q-factor. Designs for both Chebyshev and asymmetric generalized Chebyshev filters are presented, with experimental results for an 1800 MHz Chebyshev filter showing excellent agreement with theory.

  15. Vectorization, parallelization and porting of nuclear codes (vectorization and parallelization). Progress report fiscal 1998

    International Nuclear Information System (INIS)

    Ishizuki, Shigeru; Kawai, Wataru; Nemoto, Toshiyuki; Ogasawara, Shinobu; Kume, Etsuo; Adachi, Masaaki; Kawasaki, Nobuo; Yatake, Yo-ichi

    2000-03-01

    Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system, the AP3000 system and the Paragon system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. We dealt with 12 codes in fiscal 1998. These results are reported in 3 parts, i.e., the vectorization and parallelization on vector processors part, the parallelization on scalar processors part, and the porting part. In this report, we describe the vectorization and parallelization on vector processors: the vectorization of the General Tokamak Circuit Simulation Program code GTCSP, and the vectorization and parallelization of the Molecular Dynamics NTV (n-particle, Temperature and Velocity) Simulation code MSP2, the Eddy Current Analysis code EDDYCAL, the Thermal Analysis Code for Test of Passive Cooling System by HENDEL T2 code THANPACST2 and the MHD Equilibrium code SELENEJ on the VPP500. In the parallelization on scalar processors part, the parallelization of the Monte Carlo N-Particle Transport code MCNP4B2, the Plasma Hydrodynamics code using the Cubic Interpolated Propagation Method PHCIP and the Vectorized Monte Carlo code (continuous energy model / multi-group model) MVP/GMVP on the Paragon is described. In the porting part, the porting of the Monte Carlo N-Particle Transport code MCNP4B2 and the Reactor Safety Analysis code RELAP5 to the AP3000 is described. (author)

  16. Protective Skins for Aerogel Monoliths

    Science.gov (United States)

    Leventis, Nicholas; Johnston, James C.; Kuczmarski, Maria A.; Meador, Ann B.

    2007-01-01

    A method of imparting relatively hard protective outer skins to aerogel monoliths has been developed. Even more than aerogel beads, aerogel monoliths are attractive as thermal-insulation materials, but the commercial utilization of aerogel monoliths in thermal-insulation panels has been inhibited by their fragility and the consequent difficulty of handling them. Therefore, there is a need to afford sufficient protection to aerogel monoliths to facilitate handling, without compromising the attractive bulk properties (low density, high porosity, low thermal conductivity, high surface area, and low permittivity) of aerogel materials. The present method was devised to satisfy this need. The essence of the present method is to coat an aerogel monolith with an outer polymeric skin, by painting or spraying. Apparently, the reason spraying and painting were not attempted until now is that it is well known in the aerogel industry that aerogels collapse in contact with liquids. In the present method, one prevents such collapse through the proper choice of coating liquid and process conditions: In particular, one uses a viscous polymer precursor liquid and (a) carefully controls the amount of liquid applied and/or (b) causes the liquid to become cured to the desired hard polymeric layer rapidly enough that there is not sufficient time for the liquid to percolate into the aerogel bulk. The method has been demonstrated by use of isocyanates, which, upon exposure to atmospheric moisture, become cured to polyurethane/polyurea-type coats. The method has also been demonstrated by use of commercial epoxy resins. The method could also be implemented by use of a variety of other resins, including polyimide precursors (for forming high-temperature-resistant protective skins) or perfluorinated monomers (for forming coats that impart hydrophobicity and some increase in strength).

  17. Professional Parallel Programming with C# Master Parallel Extensions with NET 4

    CERN Document Server

    Hillar, Gastón

    2010-01-01

    Expert guidance for those programming today's dual-core processor PCs. As PC processors grow from one or two to eight cores, there is an urgent need for programmers to master concurrent programming. This book dives deep into the latest technologies available to programmers for creating professional parallel applications using C#, .NET 4, and Visual Studio 2010. The book covers task-based programming, coordination data structures, PLINQ, thread pools, the asynchronous programming model, and more. It also teaches other parallel programming techniques, such as SIMD and vectorization.

  18. Distributed processor systems

    International Nuclear Information System (INIS)

    Zacharov, B.

    1976-01-01

    In recent years, there has been a growing tendency in high-energy physics and in other fields to solve computational problems by distributing tasks among the resources of inter-coupled processing devices and associated system elements. This trend has gained further momentum more recently with the increased availability of low-cost processors and with the development of the means of data distribution. In two lectures, the broad question of distributed computing systems is examined and the historical development of such systems reviewed. An attempt is made to examine the reasons for the existence of these systems and to discern the main trends for the future. The components of distributed systems are discussed in some detail and particular emphasis is placed on the importance of standards and conventions in certain key system components. The ideas and principles of distributed systems are discussed in general terms, but these are illustrated by a number of concrete examples drawn from the context of the high-energy physics environment. (Auth.)

  19. A high-speed digital signal processor for atmospheric radar, part 7.3A

    Science.gov (United States)

    Brosnahan, J. W.; Woodard, D. M.

    1984-01-01

    The Model SP-320 device is a monolithic realization of a complex general-purpose signal processor, incorporating such features as a 32-bit ALU, a 16-bit x 16-bit combinatorial multiplier, and a 16-bit barrel shifter. The SP-320 is designed to operate as a slave processor to a host general-purpose computer in applications such as coherent integration of a radar return signal over multiple ranges, or dedicated FFT processing. Presently available is an I/O module conforming to the Intel Multichannel interface standard; other I/O modules will be designed to meet specific user requirements. The main processor board includes input and output FIFO (first in, first out) memories, both with depths of 4096 words, to permit asynchronous operation between the source of data and the host computer. This design permits burst data rates in excess of 5 Mwords/s.
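
    The decoupling role of the input/output FIFOs is easy to illustrate: a fixed-depth ring buffer lets the data source and the host run asynchronously as long as the buffer is neither full nor empty. A minimal single-producer, single-consumer sketch in C, with the depth chosen to echo the 4096-word FIFOs (this is not the SP-320's hardware design, and concurrent use would additionally require atomic index handling):

        #include <stdbool.h>
        #include <stdint.h>

        #define DEPTH 4096u                 /* power of two, as in the SP-320 */

        typedef struct {
            uint16_t buf[DEPTH];
            unsigned head, tail;            /* head: next write, tail: next read */
        } Fifo;

        static bool fifo_push(Fifo *f, uint16_t w)
        {
            if (f->head - f->tail == DEPTH) return false;   /* full: source waits */
            f->buf[f->head++ % DEPTH] = w;
            return true;
        }

        static bool fifo_pop(Fifo *f, uint16_t *w)
        {
            if (f->head == f->tail) return false;           /* empty: host waits */
            *w = f->buf[f->tail++ % DEPTH];
            return true;
        }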

  20. Customizable Memory Schemes for Data Parallel Architectures

    NARCIS (Netherlands)

    Gou, C.

    2011-01-01

    Memory system efficiency is crucial for any processor to achieve high performance, especially in the case of data-parallel machines. The processing capabilities of parallel lanes will be wasted when data requests are not satisfied in a sustainable and timely manner. Irregular vector memory accesses

  1. Green Secure Processors: Towards Power-Efficient Secure Processor Design

    Science.gov (United States)

    Chhabra, Siddhartha; Solihin, Yan

    With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.

  2. Parallel algorithms on the ASTRA SIMD machine

    International Nuclear Information System (INIS)

    Odor, G.; Rohrbach, F.; Vesztergombi, G.; Varga, G.; Tatrai, F.

    1996-01-01

    In view of the tremendous computing power jump of modern RISC processors the interest in parallel computing seems to be thinning out. Why use a complicated system of parallel processors if the problem can be solved by a single powerful micro-chip? It is a general law, however, that exponential growth always ends in some kind of saturation, and then parallelism will again become a hot topic. We try to prepare ourselves for this eventuality. The MPPC project started in 1990 in the heyday of parallelism and produced four ASTRA machines (presented at CHEP '92) with 4k processors (expandable to 16k) based on yesterday's chip technology (chip presented at CHEP '91). These machines now provide excellent test-beds for algorithmic developments in a complete, real environment. We are developing, for example, fast pattern recognition algorithms which could be used in high-energy physics experiments at the LHC (planned to be operational after 2004 at CERN) for triggering and data reduction. The basic feature of our ASP (Associative String Processor) approach is to use extremely simple (thus very cheap) processor elements, but in huge quantities (up to millions of processors), connected together by a very simple string-like communication chain. In this paper we present powerful algorithms based on this architecture, indicating the performance perspectives if the hardware quality reaches present or even future technology levels. (author)

  3. Monolithic Microwave Integrated Circuit (MMIC) technology for space communications applications

    Science.gov (United States)

    Connolly, Denis J.; Bhasin, Kul B.; Romanofsky, Robert R.

    1987-01-01

    Future communications satellites are likely to use gallium arsenide (GaAs) monolithic microwave integrated-circuit (MMIC) technology in most, if not all, communications payload subsystems. Multiple-scanning-beam antenna systems are expected to use GaAs MMIC's to increase functional capability, to reduce volume, weight, and cost, and to greatly improve system reliability. RF and IF matrix switch technology based on GaAs MMIC's is also being developed for these reasons. MMIC technology, including gigabit-rate GaAs digital integrated circuits, offers substantial advantages in power consumption and weight over silicon technologies for high-throughput, on-board baseband processor systems. For the more distant future pseudomorphic indium gallium arsenide (InGaAs) and other advanced III-V materials offer the possibility of MMIC subsystems well up into the millimeter wavelength region. All of these technology elements are in NASA's MMIC program. Their status is reviewed.

  4. Processor farming in two-level analysis of historical bridge

    Science.gov (United States)

    Krejčí, T.; Kruis, J.; Koudelka, T.; Šejnoha, M.

    2017-11-01

    This contribution presents a processor-farming method in connection with a multi-scale analysis. In this method, each macro-scopic integration point or each finite element is connected with a certain meso-scopic problem represented by an appropriate representative volume element (RVE). The solution of a meso-scale problem then provides the effective parameters needed on the macro-scale. Such an analysis is suitable for parallel computing because the meso-scale problems can be distributed among many processors. The application of the processor-farming method to a real-world masonry structure is illustrated by an analysis of Charles Bridge in Prague. The three-dimensional numerical model simulates the coupled heat and moisture transfer of one half of arch No. 3, and it is part of a complex hygro-thermo-mechanical analysis which has been developed to determine the influence of climatic loading on the current state of the bridge.
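
    A skeleton of the processor-farming pattern in C with MPI, under stated assumptions: rank 0 farms out one meso-scale (RVE) problem at a time and collects an effective parameter from each worker; the task and result are reduced to single numbers for brevity.

        /* Processor farming: rank 0 (the farmer) deals out tasks one at a
         * time; each worker solves its RVE stand-in and returns an effective
         * parameter, then receives either more work or a stop tag. */
        #include <mpi.h>

        #define NTASKS   64
        #define TAG_WORK 1
        #define TAG_STOP 2

        int main(int argc, char **argv)
        {
            int rank, size;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            if (rank == 0) {                     /* farmer */
                int next = 0, outstanding = 0;
                for (int w = 1; w < size; ++w) { /* prime every worker */
                    int tag = (next < NTASKS) ? TAG_WORK : TAG_STOP;
                    MPI_Send(&next, 1, MPI_INT, w, tag, MPI_COMM_WORLD);
                    if (tag == TAG_WORK) { ++next; ++outstanding; }
                }
                while (outstanding > 0) {        /* collect, then refill */
                    double eff; MPI_Status st;
                    MPI_Recv(&eff, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                             MPI_COMM_WORLD, &st);
                    --outstanding;
                    int tag = (next < NTASKS) ? TAG_WORK : TAG_STOP;
                    MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, tag,
                             MPI_COMM_WORLD);
                    if (tag == TAG_WORK) { ++next; ++outstanding; }
                }
            } else {                             /* worker */
                for (;;) {
                    int id; MPI_Status st;
                    MPI_Recv(&id, 1, MPI_INT, 0, MPI_ANY_TAG,
                             MPI_COMM_WORLD, &st);
                    if (st.MPI_TAG == TAG_STOP) break;
                    double eff = 1.0 / (id + 1); /* stand-in for an RVE solve */
                    MPI_Send(&eff, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
                }
            }
            MPI_Finalize();
            return 0;
        }

    Handing out one task per request keeps the load balanced automatically even when individual RVE solves take very different times.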

  5. A single chip pulse processor for nuclear spectroscopy

    International Nuclear Information System (INIS)

    Hilsenrath, F.; Bakke, J.C.; Voss, H.D.

    1985-01-01

    A high-performance digital pulse processor, integrated into a single gate-array microcircuit, has been developed for spaceflight applications. The new approach takes advantage of the latest CMOS high-speed A/D flash converters and low-power gated logic arrays. The pulse processor measures pulse height, pulse area and the required timing information (e.g. multi-detector coincidence and pulse pile-up detection). The pulse processor features a high throughput rate (e.g. 0.5 MHz for 2 μs Gaussian pulses) and improved differential linearity (e.g. ± 0.2 LSB for a ± 1 LSB A/D). Because of the parallel digital architecture of the device, the interface is microprocessor bus compatible. A satellite flight application of this module is presented for use in the X-ray imager and high-energy particle spectrometers of the PEM experiment on the Upper Atmosphere Research Satellite

  6. Seismometer array station processors

    International Nuclear Information System (INIS)

    Key, F.A.; Lea, T.G.; Douglas, A.

    1977-01-01

    A description is given of the design, construction and initial testing of two types of Seismometer Array Station Processor (SASP), one to work with data stored on magnetic tape in analogue form, the other with data in digital form. The purpose of a SASP is to detect the short period P waves recorded by a UK-type array of 20 seismometers and to edit these onto a digital library tape or disc. The edited data are then processed to obtain a rough location for the source and to produce seismograms (after optimum processing) for analysis by a seismologist. SASPs are an important component in the scheme for monitoring underground explosions advocated by the UK in the Conference of the Committee on Disarmament. With digital input a SASP can operate at 30 times real time using a linear detection process and at 20 times real time using the log detector of Weichert. Although the log detector is slower, it has the advantage over the linear detector that signals with lower signal-to-noise ratio can be detected and spurious large amplitudes are less likely to produce a detection. It is recommended, therefore, that where possible array data should be recorded in digital form for input to a SASP and that the log detector of Weichert be used. Trial runs show that a SASP is capable of detecting signals down to signal-to-noise ratios of about two with very few false detections, and at mid-continental array sites it should be capable of detecting most, if not all, the signals with magnitude above m_b 4.5; the UK argues that, given a suitable network, it is realistic to hope that sources of this magnitude and above can be detected and identified by seismological means alone. (author)

  7. Satellite on-board real-time SAR processor prototype

    Science.gov (United States)

    Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

    2017-11-01

    A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), i.e. breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically by applying dedicated Fourier transformations. This, however, can also be performed optically in real time; indeed, the first SAR images were processed optically. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel, yielding real-time performance, i.e. without a resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and

  8. Benchmarking NWP Kernels on Multi- and Many-core Processors

    Science.gov (United States)

    Michalakes, J.; Vachharajani, M.

    2008-12-01

    Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.

  9. Development of readout electronics for monolithic integration with diode strip detectors

    International Nuclear Information System (INIS)

    Hosticka, B.J.; Wrede, M.; Zimmer, G.; Kemmer, J.; Hofmann, R.; Lutz, G.

    1984-03-01

    Parallel-in, serial-out analog readout electronics integrated with silicon strip detectors will bring a reduction of two orders of magnitude in external electronics. The readout concept and the chosen CMOS technology solve the basic problem of low-noise and low-power requirements. A hybrid solution is an intermediate step towards the final goal of monolithic integration of detector and electronics. (orig.)

  10. A Parallel Encryption Algorithm Based on Piecewise Linear Chaotic Map

    Directory of Open Access Journals (Sweden)

    Xizhong Wang

    2013-01-01

    We introduce a parallel chaos-based encryption algorithm that takes advantage of multicore processors. The chaotic cryptosystem is generated by the piecewise linear chaotic map (PWLCM). The parallel algorithm is designed with a master/slave communication model using the Message Passing Interface (MPI). The algorithm is suitable not only for multicore processors but also for single-processor architectures. The experimental results show that the chaos-based cryptosystem possesses good statistical properties, and that the parallel algorithm provides much better performance than the serial one; it would be useful for encrypting/decrypting large files or multimedia.
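
    The PWLCM itself is simple to state; the hedged sketch below iterates it as a keystream generator and XORs plaintext bytes, which is the usual serial core of such a cipher. The key values and the byte-extraction rule are illustrative, and the paper's MPI master/slave splitting of the buffer is omitted.

        /* PWLCM keystream cipher core. For p in (0, 0.5):
         *   F(x) = x/p            on [0, p)
         *   F(x) = (x-p)/(0.5-p)  on [p, 0.5)
         *   F(x) = F(1-x)         on [0.5, 1]
         * The secret key is (x0, p); XOR with the keystream both encrypts
         * and decrypts. */
        #include <stdio.h>
        #include <string.h>

        static double pwlcm(double x, double p)
        {
            if (x >= 0.5) x = 1.0 - x;          /* symmetric branch */
            return (x < p) ? x / p : (x - p) / (0.5 - p);
        }

        int main(void)
        {
            unsigned char msg[] = "parallel chaos-based encryption";
            size_t n = strlen((char *)msg);
            double x = 0.3141592653589793, p = 0.25;   /* key (x0, p) */

            for (size_t i = 0; i < n; ++i) {
                x = pwlcm(x, p);
                msg[i] ^= (unsigned char)((int)(x * 256.0) & 0xFF);
            }
            printf("%zu bytes encrypted\n", n);  /* same loop again decrypts */
            return 0;
        }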

  11. A compact narrow-linewidth laser with a low-Q monolithic cavity

    International Nuclear Information System (INIS)

    Peng, Yu

    2013-01-01

    We demonstrate an approach to narrowing the linewidth of a diode laser to around 15×10³ Hz with a compact setup of confocal and parallel monolithic Fabry–Perot cavities (MFCs). Resonances of the confocal and parallel MFCs with low finesse are obtained. Diode lasers with optical feedback from confocal and parallel MFCs are demonstrated. The frequency could be tuned over 80×10⁶ Hz by changing the grating position of the external-cavity diode laser based on the confocal MFC, and over 100×10⁶ Hz by tuning the temperature of the plane MFC over 0.02 °C for the external-cavity diode laser based on the parallel MFC. (paper)

  12. Portable parallel programming in a Fortran environment

    International Nuclear Information System (INIS)

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACS macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network-based MIMD processors. The parallelism is implemented using standard UNIX tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a "nearly realistic" lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs

  13. Parallel 3-D method of characteristics in MPACT

    International Nuclear Information System (INIS)

    Kochunas, B.; Downar, T. J.; Liu, Z.

    2013-01-01

    A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism, which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to O(10⁴) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k-eff differs from the benchmark results for the rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors, all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time for the 500-processor case was 231 seconds, and 19 seconds for the case with 15625 processors. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)

  14. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    Science.gov (United States)

    Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332

  15. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    Science.gov (United States)

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  16. Scalable Parallelization of Skyline Computation for Multi-core Processors

    DEFF Research Database (Denmark)

    Chester, Sean; Sidlauskas, Darius; Assent, Ira

    2015-01-01

    The skyline is an important query operator for multi-criteria decision making. It reduces a dataset to only those points that offer optimal trade-offs of dimensions. In general, it is very expensive to compute. Recently, multi-core CPU algorithms have been proposed to accelerate the computation of the skyline. However, they do not sufficiently minimize dominance tests and so are not competitive with state-of-the-art sequential algorithms. In this paper, we introduce a novel multi-core skyline algorithm, Hybrid, which processes points in blocks. It maintains a shared, global skyline among all threads...
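
    The dominance test at the heart of any skyline algorithm, with a naive O(n²) skyline built from it (C, "smaller is better" in every dimension); the multi-core Hybrid algorithm above is engineered precisely to minimize how often this test runs.

        /* a dominates b when a is no worse in every dimension and strictly
         * better in at least one. */
        #include <stdbool.h>
        #include <stdio.h>

        #define D 2

        static bool dominates(const double *a, const double *b)
        {
            bool strictly = false;
            for (int k = 0; k < D; ++k) {
                if (a[k] > b[k]) return false;
                if (a[k] < b[k]) strictly = true;
            }
            return strictly;
        }

        int main(void)
        {
            double pts[][D] = {{1, 4}, {2, 2}, {3, 1}, {3, 3}, {4, 4}};
            int n = 5;
            for (int i = 0; i < n; ++i) {        /* naive O(n^2) skyline */
                bool in_skyline = true;
                for (int j = 0; j < n && in_skyline; ++j)
                    if (j != i && dominates(pts[j], pts[i]))
                        in_skyline = false;
                if (in_skyline)
                    printf("(%g, %g)\n", pts[i][0], pts[i][1]);
            }
            return 0;                            /* prints (1,4) (2,2) (3,1) */
        }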

  17. Approximate inference for spatial functional data on massively parallel processors

    DEFF Research Database (Denmark)

    Raket, Lars Lau; Markussen, Bo

    2014-01-01

    With continually increasing data sizes, the relevance of the big n problem of classical likelihood approaches is greater than ever. The functional mixed-effects model is a well established class of models for analyzing functional data. Spatial functional data in a mixed-effects setting ... in linear time. An extremely efficient GPU implementation is presented, and the proposed methods are illustrated by conducting a classical statistical analysis of 2D chromatography data consisting of more than 140 million spatially correlated observation points.

  18. Optimal job splitting in parallel processor sharing queues

    NARCIS (Netherlands)

    Hoekstra, G.; van der Mei, R.D.; Bhulai, S.

    2012-01-01

    The main barrier to the sustained growth of wireless communications is the Shannon limit that applies to the channel capacity. A promising means to realize high-capacity enhancements is the use of multi-path communication solutions to improve reliability and network performance in areas that are

  19. Optimal job splitting in parallel processor sharing queues

    NARCIS (Netherlands)

    G.J. Hoekstra (Gerard); R.D. van der Mei (Rob); S. Bhulai (Sandjai)

    2012-01-01

    The main barrier to the sustained growth of wireless communications is the Shannon limit that applies to the channel capacity. A promising means to realize high-capacity enhancements is the use of multi-path communication solutions to improve reliability and network performance in

  20. Parallel processing of dose calculation for external photon beam therapy

    International Nuclear Information System (INIS)

    Kunieda, Etsuo; Ando, Yutaka; Tsukamoto, Nobuhiro; Ito, Hisao; Kubo, Atsushi

    1994-01-01

    We implemented external photon beam dose calculation programs on a parallel processor system consisting of transputers, 32-bit processors especially suited to multi-processor configurations. Two network conformations, binary tree and pipeline, were evaluated for rectangular- and irregular-field dose calculation algorithms. Although computation speed increased in proportion to the number of CPUs, substantial overhead caused by inter-processor communication occurred when a smaller computation load was delivered to each processor. On the other hand, for irregular-field calculation, which requires more computation for each calculation point, the communication overhead remained small even when more than 50 processors were involved. Real-time responses could be expected for more complex algorithms by increasing the number of processors. (author)

  1. Parallel processing data network of master and slave transputers controlled by a serial control network

    Science.gov (United States)

    Crosetto, Dario B.

    1996-01-01

    The present device provides a dynamically configurable communication network: a multi-processor parallel processing system having a serial communication network and a high-speed parallel communication network. The serial communication network is used to disseminate commands from a master processor (100) to a plurality of slave processors (200) to effect the communication protocol, to control the transmission of high-density data among nodes, and to monitor each slave processor's status. The high-speed parallel processing network is used to effect the transmission of high-density data among nodes in the parallel processing system. Each node comprises a transputer (104), a digital signal processor (114), a parallel transfer controller (106), and two three-port memory devices. A communication switch (108) within each node (100) connects it to a fast parallel hardware channel (70) through which all high-density data arrives or leaves the node.

  2. XL-100S microprogrammable processor

    International Nuclear Information System (INIS)

    Gorbunov, N.V.; Guzik, Z.; Sutulin, V.A.; Forytski, A.

    1983-01-01

    The XL-100S microprogrammable processor, providing the multiprocessor operation mode in the XL system crate, is described. The processor meets the EUR 6500 CAMAC standards, addresses up to 4 Mbytes of memory, and interacts with 7 CAMAC branches. Eight external requests initiate operations preset by a sequence of microcommands in a memory with a capacity of up to 64 kwords of 32 bits. The microprocessor architecture allows one to emulate the commands of the majority of mini- or micro-computers, including floating point operations. The XL-100S processor may be used in various branches of experimental physics: for control of physical experiment apparatus, fast selection of useful physical events, organization of input/output operations, organization of direct access to memory, etc. The Am2900 microprocessor set is used as the component base. The device is made in the form of a single-width CAMAC module

  3. Making CSB + -Trees Processor Conscious

    DEFF Research Database (Denmark)

    Samuel, Michael; Pedersen, Anders Uhl; Bonnet, Philippe

    2005-01-01

    Cache-conscious indexes, such as CSB+-tree, are sensitive to the underlying processor architecture. In this paper, we focus on how to adapt the CSB+-tree so that it performs well on a range of different processor architectures. Previous work has focused on the impact of node size on the performance of the CSB+-tree. We argue that it is necessary to consider a larger group of parameters in order to adapt CSB+-tree to processor architectures as different as Pentium and Itanium. We identify this group of parameters and study how it impacts the performance of CSB+-tree on Itanium 2. Finally, we propose a systematic method for adapting CSB+-tree to new platforms. This work is a first step towards integrating CSB+-tree in MySQL's heap storage manager.
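
    The node-size parameter discussed in this abstract is usually derived from the cache-line size of the target processor. The arithmetic can be sketched as follows (hypothetical Python; the key, pointer and line widths are assumed values, not figures from the paper):

        # Hypothetical node-sizing arithmetic for a cache-conscious B+-tree.
        LINE_BYTES = 64   # assumed cache-line size of the target processor
        KEY_BYTES = 4     # assumed 32-bit integer keys
        PTR_BYTES = 8     # one child pointer per node, CSB+-style layout

        def keys_per_node(lines_per_node):
            usable = lines_per_node * LINE_BYTES - PTR_BYTES
            return usable // KEY_BYTES

        for lines in (1, 2, 4):
            print(lines, "cache line(s):", keys_per_node(lines), "keys per node")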

  4. Java Processor Optimized for RTSJ

    Directory of Open Access Journals (Sweden)

    Tu Shiliang

    2007-01-01

    Full Text Available Due to the preeminent work of the real-time specification for Java (RTSJ), Java is increasingly expected to become the leading programming language in real-time systems. To provide a Java platform suitable for real-time applications, a Java processor which can directly execute Java bytecode is proposed in this paper. It provides efficient hardware support for some mechanisms specified in the RTSJ and offers a simpler programming model by ameliorating the scoped memory of the RTSJ. The worst case execution time (WCET) of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all processing that interferes with predictability is handled before bytecode execution. A further advantage of this method is that it makes the implementation of the processor simpler and suited to a low-cost FPGA chip.

  5. PDDP, A Data Parallel Programming Model

    Directory of Open Access Journals (Sweden)

    Karen H. Warren

    1996-01-01

    Full Text Available PDDP, the parallel data distribution preprocessor, is a data parallel programming model for distributed memory parallel computers. PDDP implements High Performance Fortran-compatible data distribution directives and parallelism expressed by the use of Fortran 90 array syntax, the FORALL statement, and the WHERE construct. Distributed data objects belong to a global name space; other data objects are treated as local and replicated on each processor. PDDP allows the user to program in a shared memory style and generates code that is portable to a variety of parallel machines. For interprocessor communication, PDDP uses the fastest communication primitives on each platform.
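
    The Fortran 90 constructs that PDDP parallelizes have close analogues in array languages. The NumPy fragment below is an illustrative analogue only (it is not PDDP): the whole-array expression plays the role of FORALL and the masked update plays the role of WHERE.

        # Illustrative NumPy analogue of Fortran 90 array syntax:
        #   FORALL (i=1:n) a(i) = 2*b(i)   and   WHERE (a > 0) c = sqrt(a)
        import numpy as np

        n = 8
        b = np.arange(n, dtype=float)
        a = 2.0 * b                      # whole-array expression, like FORALL
        c = np.zeros(n)
        mask = a > 0.0
        c[mask] = np.sqrt(a[mask])       # masked update, like WHERE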

  6. High performance deformable image registration algorithms for manycore processors

    CERN Document Server

    Shackleford, James; Sharp, Gregory

    2013-01-01

    High Performance Deformable Image Registration Algorithms for Manycore Processors develops highly data-parallel image registration algorithms suitable for use on modern multi-core architectures, including graphics processing units (GPUs). Focusing on deformable registration, we show how to develop data-parallel versions of the registration algorithm suitable for execution on the GPU. Image registration is the process of aligning two or more images into a common coordinate frame and is a fundamental step to be able to compare or fuse data obtained from different sensor measurements. E

  7. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    Science.gov (United States)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  8. Optical Array Processor: Laboratory Results

    Science.gov (United States)

    Casasent, David; Jackson, James; Vaerewyck, Gerard

    1987-01-01

    A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.

  9. Fast processor for dilepton triggers

    International Nuclear Information System (INIS)

    Katsanevas, S.; Kostarakis, P.; Baltrusaitis, R.

    1983-01-01

    We describe a fast trigger processor, developed for and used in Fermilab experiment E-537, for selecting high-mass dimuon events produced by negative pions and anti-protons. The processor finds candidate tracks by matching hit information received from drift chambers and scintillation counters, and determines their momenta. Invariant masses are calculated for all possible pairs of tracks and an event is accepted if any invariant mass is greater than some preselectable minimum mass. The whole process, accomplished within 5 to 10 microseconds, achieves up to a ten-fold reduction in trigger rate

  10. Parallelization of a numerical simulation code for isotropic turbulence

    International Nuclear Information System (INIS)

    Sato, Shigeru; Yokokawa, Mitsuo; Watanabe, Tadashi; Kaburaki, Hideo.

    1996-03-01

    A parallel pseudospectral code which solves the three-dimensional Navier-Stokes equation by direct numerical simulation is developed, and its execution time, parallelization efficiency, load balance and scalability are evaluated. A vector parallel supercomputer, a Fujitsu VPP500 with up to 16 processors, is used for calculations with Fourier modes of up to 256x256x256 on 16 processors. Good scalability with the number of processors is achieved when the number of Fourier modes is fixed. For small numbers of Fourier modes, the calculation time of the program is proportional to NlogN, which is the ideal computational complexity for a 3D-FFT on vector parallel processors. It is found that the calculation performance decreases as the number of Fourier modes increases. (author)
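
    One way to read the scalability result above is through a simple execution-time model. As an approximation (the constants are machine dependent and are not given in the abstract), the cost per time step on P processors of a 3D pseudospectral code with N Fourier modes per dimension can be written as

        T(N, P) \approx \alpha \, \frac{N^{3} \log N}{P} + \beta \, C(N, P),

    where the first term is the ideal 3D-FFT work divided among the processors and C(N, P) collects the inter-processor transpose and communication volume, whose growth is consistent with the performance drop observed at larger mode counts.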

  11. Architecture-Aware Optimization of an HEVC decoder on Asymmetric Multicore Processors

    OpenAIRE

    Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

    2016-01-01

    Low-power asymmetric multicore processors (AMPs) attract considerable attention due to their appealing performance-power ratio for energy-constrained environments. However, these processors pose a significant programming challenge due to the integration of cores with different performance capabilities, asking for an asymmetry-aware scheduling solution that carefully distributes the workload. The recent HEVC standard, which offers several high-level parallelization strategies, is an important ...

  12. Rational calculation accuracy in acousto-optical matrix-vector processor

    Science.gov (United States)

    Oparin, V. V.; Tigin, Dmitry V.

    1994-01-01

    The high speed of parallel computations for a comparatively small-size processor and acceptable power consumption makes the usage of acousto-optic matrix-vector multiplier (AOMVM) attractive for processing of large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. The reduced accuracy requirements allow for considerable simplification of the AOMVM architecture and the reduction of the demands on its components.

  13. Some questions of using the algebraic coding theory for construction of special-purpose processors in high energy physics spectrometers

    International Nuclear Information System (INIS)

    Nikityuk, N.M.

    1989-01-01

    The results of investigations into the use of algebraic coding theory for the creation of parallel encoders, majority coincidence schemes and coordinate processors for the first and second trigger levels are described. Concrete examples of the calculation and structure of a special-purpose processor using the table arithmetic method are given for multiplicity t ≤ 5. The question of using parallel and sequential syndrome coding methods for the registration of events with clusters is discussed. 30 refs.; 10 figs

  14. A monolithic silicon detector telescope

    International Nuclear Information System (INIS)

    Cardella, G.; Amorini, F.; Cabibbo, M.; Di Pietro, A.; Fallica, G.; Franzo, G.; Figuera, P.; Papa, M.; Pappalardo, G.; Percolla, G.; Priolo, F.; Privitera, V.; Rizzo, F.; Tudisco, S.

    1996-01-01

    An ultrathin (1 μm thick) silicon detector implanted on a standard 400 μm Si detector has been built to realize a monolithic telescope detector for simultaneous charge and energy determination of charged particles. The performance of the telescope has been tested using standard alpha sources and fragments emitted in nuclear reactions with different projectile-target colliding systems. An excellent charge resolution has been obtained for low energy (less than 5 MeV) light nuclei. A multi-array layout of such detectors is under construction to charge-identify the particles emitted in reactions induced by low energy radioactive beams. (orig.)

  15. Imaging monolithic silicon detector telescopes

    International Nuclear Information System (INIS)

    Amorini, F.; Sipala, V.; Cardella, G.; Boiano, C.; Carbone, B.; Cosentino, L.; Costa, E.; Di Pietro, A.; Emanuele, U.; Fallica, G.; Figuera, P.; Finocchiaro, P.; La Guidara, E.; Marchetta, C.; Pappalardo, A.; Piazza, A.; Randazzo, N.; Rizzo, F.; Russo, G.V.; Russotto, P.

    2008-01-01

    We show the results of some beam tests performed on a new monolithic strip silicon detector telescope developed in collaboration with INFN and STMicroelectronics. Using an appropriate design, the induction on the ΔE stages, generated by the charge released in the E stage, was used to obtain the position of the detected particle. The position measurement, together with the low threshold for particle charge identification, allows the new detector to be used for a large variety of applications thanks to its position sensitivity of only a few microns in both directions

  16. Parallel programming with Easy Java Simulations

    Science.gov (United States)

    Esquembre, F.; Christian, W.; Belloni, M.

    2018-01-01

    Nearly all of today's processors are multicore, and ideally programming and algorithm development utilizing the entire processor should be introduced early in the computational physics curriculum. Parallel programming is often not introduced because it requires a new programming environment and uses constructs that are unfamiliar to many teachers. We describe how we decrease the barrier to parallel programming by using a Java-based programming environment to treat problems in the usual undergraduate curriculum. We use the Easy Java Simulations programming and authoring tool to create the program's graphical user interface together with objects based on those developed by Kaminsky [Building Parallel Programs (Course Technology, Boston, 2010)] to handle common parallel programming tasks. Shared-memory parallel implementations of physics problems, such as time evolution of the Schrödinger equation, are available as source code and as ready-to-run programs from the AAPT-ComPADRE digital library.

  17. A survey of Tumult, a real-time multi-processor system

    International Nuclear Information System (INIS)

    Jansen, P.G.

    1986-01-01

    Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular extendible multiprocessor system. All memory is distributed and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message passing mechanism. Communication links and processes are dynamically created and disposed by the application. In this article a brief description of the system is given; communication aspects are emphasized. (Auth.)

  18. Monolithic CMOS imaging x-ray spectrometers

    Science.gov (United States)

    Kenter, Almus; Kraft, Ralph; Gauron, Thomas; Murray, Stephen S.

    2014-07-01

    The Smithsonian Astrophysical Observatory (SAO) in collaboration with SRI/Sarnoff is developing monolithic CMOS detectors optimized for x-ray astronomy. The goal of this multi-year program is to produce CMOS x-ray imaging spectrometers that are Fano noise limited over the 0.1-10keV energy band while incorporating the many benefits of CMOS technology. These benefits include: low power consumption, radiation "hardness", high levels of integration, and very high read rates. Small format test devices from a previous wafer fabrication run (2011-2012) have recently been back-thinned and tested for response below 1keV. These devices perform as expected with regard to dark current, read noise, spectral response and Quantum Efficiency (QE). We demonstrate that running these devices at rates ~> 1Mpix/second eliminates the need for cooling, as shot noise from any dark current is greatly mitigated. The test devices were fabricated on 15μm, high resistivity custom (~30kΩ-cm) epitaxial silicon and have a 16 by 192 pixel format. They incorporate 16μm pitch, 6 Transistor Pinned Photo Diode (6TPPD) pixels which have ~40μV/electron sensitivity and a highly parallel analog CDS signal chain. Newer, improved, lower noise detectors have just been fabricated (October 2013). These new detectors are fabricated on 9μm epitaxial silicon and have a 1k by 1k format. They incorporate similar 16μm pitch, 6TPPD pixels but have ~50% higher sensitivity and much (3×) lower read noise. These new detectors have undergone preliminary testing for functionality in Front Illuminated (FI) form and are presently being prepared for back thinning and packaging. Monolithic CMOS devices such as these would be ideal candidate detectors for the focal planes of Solar, planetary and other space-borne x-ray astronomy missions. The high throughput, low noise and excellent low energy response provide high dynamic range and good time resolution; bright, time varying x-ray features could be temporally and

  19. A Tutorial on Parallel and Concurrent Programming in Haskell

    Science.gov (United States)

    Peyton Jones, Simon; Singh, Satnam

    This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.

  20. Introducing 'bones' : a parallelizing source-to-source compiler based on algorithmic skeletons.

    NARCIS (Netherlands)

    Nugteren, C.; Corporaal, H.

    2012-01-01

    Recent advances in multi-core and many-core processors require programmers to exploit an increasing amount of parallelism from their applications. Data parallel languages such as CUDA and OpenCL make it possible to take advantage of such processors, but still require a large amount of effort from

  1. Rapid prototyping and evaluation of programmable SIMD SDR processors in LISA

    Science.gov (United States)

    Chen, Ting; Liu, Hengzhu; Zhang, Botao; Liu, Dongpei

    2013-03-01

    With the development of international wireless communication standards, there is an increase in the computational requirements of baseband signal processors. Time-to-market pressure makes it impossible to completely redesign new processors for the evolving standards. Due to their high flexibility and low power, software defined radio (SDR) digital signal processors have been proposed as a promising technology to replace traditional ASIC and FPGA approaches. In addition, large amounts of data are processed in parallel in computation-intensive functions, which fosters the development of single instruction multiple data (SIMD) architectures on SDR platforms. A new way must therefore be found to prototype SDR processors efficiently. In this paper we present a bit- and cycle-accurate model of programmable SIMD SDR processors in a machine description language, LISA. LISA is a language for instruction set architecture description that enables rapid modeling at the architectural level. In order to evaluate the proposed processor, three common baseband functions, FFT, FIR digital filter and matrix multiplication, have been mapped onto the SDR platform. Analytical results showed that the SDR processor achieved a maximum performance boost of 47.1% relative to a competing processor.

  2. VON WISPR Family Processors: Volume 1

    National Research Council Canada - National Science Library

    Wagstaff, Ronald

    1997-01-01

    ...) and the background noise they are embedded in. Processors utilizing those fluctuations, such as the von WISPR Family Processors discussed herein, are methods or algorithms that preferentially attenuate the fluctuating signals and noise...

  3. Design Principles for Synthesizable Processor Cores

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; McKee, Sally A.; Karlsson, Sven

    2012-01-01

    As FPGAs get more competitive, synthesizable processor cores become an attractive choice for embedded computing. Currently popular commercial processor cores do not fully exploit current FPGA architectures. In this paper, we propose general design principles to increase instruction throughput...

  4. Simulation of Particulate Flows Multi-Processor Machines with Distributed Memory

    Energy Technology Data Exchange (ETDEWEB)

    Uhlmann, M.

    2004-07-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI communication standard. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized and relies upon a parallel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. For the solid phase we employ a master-slave technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighboring processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub-domain. The parallel efficiency of some preliminary computations is presented. (Author) 9 refs.

  5. A design of a computer complex including vector processors

    International Nuclear Information System (INIS)

    Asai, Kiyoshi

    1982-12-01

    We, members of the Computing Center of the Japan Atomic Energy Research Institute, have been engaged for the past six years in research on the adaptability of vector processing to large-scale nuclear codes. The research has been done in collaboration with researchers and engineers of JAERI and a computer manufacturer. In this research, forty large-scale nuclear codes were investigated from the viewpoint of vectorization. Among them, twenty-six codes were actually vectorized and executed. As a result of the investigation, it is now estimated that about seventy percent of nuclear codes, accounting for about seventy percent of the total CPU time at JAERI, are highly vectorizable. Based on the data obtained by the investigation, (1) currently vectorizable CPU time, (2) the necessary number of vector processors, (3) the manpower necessary for vectorization of nuclear codes, (4) the computing speed, memory size, number of parallel I/O paths, and size and speed of the I/O buffer of a vector processor suitable for our applications, and (5) the necessary software and operational policy for the use of vector processors are discussed, and finally (6) a computer complex including vector processors is presented in this report. (author)

  6. High performance graphics processors for medical imaging applications

    International Nuclear Information System (INIS)

    Goldwasser, S.M.; Reynolds, R.A.; Talton, D.A.; Walsh, E.S.

    1989-01-01

    This paper describes a family of high-performance graphics processors with special hardware for interactive visualization of 3D human anatomy. The basic architecture expands to multiple parallel processors, each processor using pipelined arithmetic and logical units for high-speed rendering of Computed Tomography (CT), Magnetic Resonance (MR) and Positron Emission Tomography (PET) data. User-selectable display alternatives include multiple 2D axial slices, reformatted images in sagittal or coronal planes and shaded 3D views. Special facilities support applications requiring color-coded display of multiple datasets (such as radiation therapy planning), or dynamic replay of time-varying volumetric data (such as cine-CT or gated MR studies of the beating heart). The current implementation is a single processor system which generates reformatted images in true real time (30 frames per second), and shaded 3D views in a few seconds per frame. It accepts full scale medical datasets in their native formats, so that minimal preprocessing delay exists between data acquisition and display

  7. Deterministic chaos in the processor load

    International Nuclear Information System (INIS)

    Halbiniak, Zbigniew; Jozwiak, Ireneusz J.

    2007-01-01

    In this article we present the results of research whose purpose was to identify the phenomenon of deterministic chaos in the processor load. We analysed the time series of the processor load during efficiency tests of database software. Our research was done on a Sparc Alpha processor running under the UNIX Sun Solaris 5.7 operating system. The conducted analyses proved the presence of the deterministic chaos phenomenon in the processor load in this particular case

  8. Parallel integer sorting with medium and fine-scale parallelism

    Science.gov (United States)

    Dagum, Leonardo

    1993-01-01

    Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128-processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
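
    The vectorized bucket sort used as the serial baseline above partitions keys by value range before sorting each bucket; barrel-sort distributes essentially the same buckets across processors. A minimal sketch of the serial idea (hypothetical Python, not Dagum's implementation):

        # Minimal bucket-sort sketch: partition keys by range, sort each bucket.
        def bucket_sort(keys, n_buckets=16):
            lo, hi = min(keys), max(keys)
            width = (hi - lo) / n_buckets or 1   # guard against all-equal keys
            buckets = [[] for _ in range(n_buckets)]
            for k in keys:
                i = min(int((k - lo) / width), n_buckets - 1)
                buckets[i].append(k)
            out = []
            for b in buckets:
                out.extend(sorted(b))
            return out

        print(bucket_sort([5, 1, 9, 3, 7, 2, 8, 4, 6, 0]))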

  9. JPP: A Java Pre-Processor

    OpenAIRE

    Kiniry, Joseph R.; Cheong, Elaine

    1998-01-01

    The Java Pre-Processor, or JPP for short, is a parsing pre-processor for the Java programming language. Unlike its namesake (the C/C++ Pre-Processor, cpp), JPP provides functionality above and beyond simple textual substitution. JPP's capabilities include code beautification, code standard conformance checking, class and interface specification and testing, and documentation generation.

  10. Microfluidic devices and methods including porous polymer monoliths

    Science.gov (United States)

    Hatch, Anson V; Sommer, Gregory J; Singh, Anup K; Wang, Ying-Chih; Abhyankar, Vinay V

    2014-04-22

    Microfluidic devices and methods including porous polymer monoliths are described. Polymerization techniques may be used to generate porous polymer monoliths having pores defined by a liquid component of a fluid mixture. The fluid mixture may contain iniferters and the resulting porous polymer monolith may include surfaces terminated with iniferter species. Capture molecules may then be grafted to the monolith pores.

  11. Design and implementation of a high performance network security processor

    Science.gov (United States)

    Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

    2010-03-01

    The last few years have seen many significant progresses in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload the computation intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogeneous parallel crypto engine arrays lead to a Gbps-rate NSP, which is programmable with domain specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.
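
    The descriptor-driven flow described above, fragmenting a large packet and distributing the pieces across parallel crypto engines, can be mimicked in software. The sketch below is hypothetical (the real descriptor format and engine interfaces are not modeled); a XOR stands in for the cipher:

        # Hypothetical sketch of descriptor-style packet fragmentation with
        # fragments dispatched to a pool of parallel "crypto engines".
        from concurrent.futures import ThreadPoolExecutor

        FRAGMENT = 4096

        def engine_encrypt(fragment):
            return bytes(b ^ 0x5A for b in fragment)   # stand-in for a cipher

        def encrypt_packet(packet, engines=4):
            frags = [packet[i:i + FRAGMENT]
                     for i in range(0, len(packet), FRAGMENT)]
            with ThreadPoolExecutor(max_workers=engines) as pool:
                return b"".join(pool.map(engine_encrypt, frags))

        if __name__ == "__main__":
            ciphertext = encrypt_packet(b"\x00" * 10000)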

  12. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  13. Massively Parallel Finite Element Programming

    KAUST Repository

    Heister, Timo; Kronbichler, Martin; Bangerth, Wolfgang

    2010-01-01

    Today's large finite element simulations require parallel algorithms to scale on clusters with thousands or tens of thousands of processor cores. We present data structures and algorithms to take advantage of the power of high performance computers in generic finite element codes. Existing generic finite element libraries often restrict the parallelization to parallel linear algebra routines. This is a limiting factor when solving on more than a few hundreds of cores. We describe routines for distributed storage of all major components coupled with efficient, scalable algorithms. We give an overview of our effort to enable the modern and generic finite element library deal.II to take advantage of the power of large clusters. In particular, we describe the construction of a distributed mesh and develop algorithms to fully parallelize the finite element calculation. Numerical results demonstrate good scalability. © 2010 Springer-Verlag.

  14. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

    2015-01-01

    The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) purpose-designed to execute pattern matching with a high degree of parallelism. It finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 2 Gb/s serial links. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. We report on the performance of the intermedia...

  15. The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

    CERN Document Server

    Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

    2015-01-01

    The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed to execute pattern matching with a high degree of parallelism. The AM system finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 828 2 Gbit/s serial links for a total in/out bandwidth of 56 Gb/s. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. ...

  16. Designing High-Performance Fuzzy Controllers Combining IP Cores and Soft Processors

    Directory of Open Access Journals (Sweden)

    Oscar Montiel-Ross

    2012-01-01

    Full Text Available This paper presents a methodology to integrate a fuzzy coprocessor described in VHDL (VHSIC Hardware Description Language) with a soft processor embedded into an FPGA, which increases the throughput of the whole system, since the controller uses parallelism at the circuit level for high-speed-demanding applications while the rest of the application can be written in C/C++. We used a 32-bit ARM soft processor, which allows sequential and parallel programming. The FLC coprocessor incorporates a tuning method that allows the system response to be manipulated. We show experimental results using a fuzzy PD+I controller as the embedded coprocessor.

  17. Online Fastbus processor for LEP

    International Nuclear Information System (INIS)

    Mueller, H.

    1986-01-01

    The author describes the online computing aspects of Fastbus systems using a processor module which has been developed at CERN and is now available commercially. These General Purpose Master/Slaves (GPMs) are based on 68000/10 (or optionally 68020/68881) processors. Applications include use as event filters (DELPHI), supervisory controllers, stand-alone Fastbus diagnostic tools, and multiprocessor array components. The direct mapping of single, 32-bit assembly instructions to execute Fastbus protocols makes the use of a GPM both simple and flexible. Loosely coupled processing in Fastbus networks is possible between GPMs as they support access semaphores and use a two-port memory as an I/O buffer for Fastbus. Both master and slave ports support block transfers up to 20 Mbytes/s. The CERN standard Fastbus software and the MoniCa symbolic debugging monitor are available on the GPM with real-time, multiprocessing support. (Auth.)

  18. Parallel thermal radiation transport in two dimensions

    International Nuclear Information System (INIS)

    Smedley-Stevenson, R.P.; Ball, S.R.

    2003-01-01

    This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massively parallel processing) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)

  19. Parallel thermal radiation transport in two dimensions

    Energy Technology Data Exchange (ETDEWEB)

    Smedley-Stevenson, R.P.; Ball, S.R. [AWE Aldermaston (United Kingdom)

    2003-07-01

    This paper describes the distributed memory parallel implementation of a deterministic thermal radiation transport algorithm in a 2-dimensional ALE hydrodynamics code. The parallel algorithm consists of a variety of components which are combined in order to produce a state of the art computational capability, capable of solving large thermal radiation transport problems using Blue-Oak, the 3 Tera-Flop MPP (massively parallel processing) computing facility at AWE (United Kingdom). Particular aspects of the parallel algorithm are described together with examples of the performance on some challenging applications. (author)

  20. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost-effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors, each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN- or C-programmable pipelined 20 Mflops (peak), 10 MByte single-board computer. These are plugged into a 16-port crossbar switch crate which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256-node, 5 GFlop system is under construction

  1. Parallel Jacobi EVD Methods on Integrated Circuits

    Directory of Open Access Journals (Sweden)

    Chi-Chia Sun

    2014-01-01

    Full Text Available Design strategies for parallel iterative algorithms are presented. In order to further study different tradeoff strategies in design criteria for integrated circuits, a 10 × 10 Jacobi Brent-Luk-EVD array with the simplified μ-CORDIC processor is used as an example. The experimental results show that using the μ-CORDIC processor is beneficial for the design criteria, as it yields a smaller area, faster overall computation time, and less energy consumption than the regular CORDIC processor. It is worth noting that the proposed parallel EVD method can be applied to real-time and low-power array signal processing algorithms performing beamforming or DOA estimation.

  2. Parallel algorithms for boundary value problems

    Science.gov (United States)

    Lin, Avi

    1991-01-01

    A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step where all the P available processors work in parallel, and the global step where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are twofold. First, this suggested approach is very flexible, especially in the local step and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Secondly, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples for using this strategy are discussed.
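
    The global step above reduces to a tridiagonal linear system of order P, which a single processor can solve in O(P) time. A standard Thomas-algorithm sketch (generic Python, not specific to Lin's formulation) shows how cheap that step is:

        # Thomas algorithm for the order-P tridiagonal system of the global step.
        # a: sub-diagonal, b: diagonal, c: super-diagonal, d: right-hand side.
        def thomas(a, b, c, d):
            n = len(d)
            cp, dp = [0.0] * n, [0.0] * n
            cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
            for i in range(1, n):
                m = b[i] - a[i] * cp[i - 1]
                cp[i] = c[i] / m if i < n - 1 else 0.0
                dp[i] = (d[i] - a[i] * dp[i - 1]) / m
            x = [0.0] * n
            x[-1] = dp[-1]
            for i in range(n - 2, -1, -1):
                x[i] = dp[i] - cp[i] * x[i + 1]
            return x

        # P = 4 processors -> an order-4 interface system (illustrative numbers)
        print(thomas([0, 1, 1, 1], [4, 4, 4, 4], [1, 1, 1, 0], [5, 6, 6, 5]))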

  3. Computation and parallel implementation for early vision

    Science.gov (United States)

    Gualtieri, J. Anthony

    1990-01-01

    The problem of early vision is to transform one or more retinal illuminance images (pixel arrays) into image representations built out of such primitive visual features as edges, regions, disparities, and clusters. These transformed representations form the input to later vision stages that perform higher level vision tasks including matching and recognition. Researchers developed algorithms for: (1) edge finding in the scale space formulation; (2) correlation methods for computing matches between pairs of images; and (3) clustering of data by neural networks. These algorithms are formulated for parallel implementation on SIMD machines, such as the Massively Parallel Processor, a 128 x 128 array processor with 1024 bits of local memory per processor. For some cases, researchers can show speedups of three orders of magnitude over serial implementations.

  4. On synchronous parallel computations with independent probabilistic choice

    International Nuclear Information System (INIS)

    Reif, J.H.

    1984-01-01

    This paper introduces probabilistic choice to synchronous parallel machine models, in particular parallel RAMs. The power of probabilistic choice in parallel computations is illustrated by parallelizing some known probabilistic sequential algorithms. The authors characterize the computational complexity of time-, space-, and processor-bounded probabilistic parallel RAMs in terms of the computational complexity of probabilistic sequential RAMs. They show that parallelism uniformly speeds up time bounded probabilistic sequential RAM computations by nearly a quadratic factor. They also show that probabilistic choice can be eliminated from parallel computations by introducing nonuniformity

  5. NMRFx Processor: a cross-platform NMR data processing program

    International Nuclear Information System (INIS)

    Norris, Michael; Fetler, Bayard; Marchant, Jan; Johnson, Bruce A.

    2016-01-01

    NMRFx Processor is a new program for the processing of NMR data. Written in the Java programming language, NMRFx Processor is a cross-platform application and runs on Linux, Mac OS X and Windows operating systems. The application can be run in both a graphical user interface (GUI) mode and from the command line. Processing scripts are written in the Python programming language and executed so that the low-level Java commands are automatically run in parallel on computers with multiple cores or CPUs. Processing scripts can be generated automatically from the parameters of NMR experiments or interactively constructed in the GUI. A wide variety of processing operations are provided, including methods for processing of non-uniformly sampled datasets using iterative soft thresholding. The interactive GUI also enables the use of the program as an educational tool for teaching basic and advanced techniques in NMR data analysis.
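
    The execution model described, a Python-level script whose low-level operations are fanned out across cores, can be illustrated generically. The fragment below is a hypothetical stand-in and does not reproduce the actual NMRFx scripting API:

        # Generic illustration of a processing script whose per-vector
        # operations run in parallel across cores (not the NMRFx API).
        from concurrent.futures import ProcessPoolExecutor

        def process_vector(vec):
            # stand-in for apodization, Fourier transform, phasing, ...
            return [0.5 * v for v in vec]

        def process_dataset(rows):
            with ProcessPoolExecutor() as pool:
                return list(pool.map(process_vector, rows))

        if __name__ == "__main__":
            rows = [[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]
            print(process_dataset(rows))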

  6. The fast tracker processor for hadron collider triggers

    CERN Document Server

    Annovi, A; Bardi, A; Carosi, R; Dell'Orso, Mauro; D'Onofrio, M; Giannetti, P; Iannaccone, G; Morsani, E; Pietri, M; Varotto, G

    2001-01-01

    Perspectives for precise and fast track reconstruction in future hadron collider experiments are addressed. We discuss the feasibility of a pipelined highly parallel processor dedicated to the implementation of a very fast tracking algorithm. The algorithm is based on the use of a large bank of pre-stored combinations of trajectory points, called patterns, for extremely complex tracking systems. The CMS experiment at LHC is used as a benchmark. Tracking data from the events selected by the level-1 trigger are sorted and filtered by the Fast Tracker processor at an input rate of 100 kHz. This data organization allows the level-2 trigger logic to reconstruct full resolution tracks with transverse momentum above a few GeV and search for secondary vertices within typical level-2 times. (15 refs).

  7. The fast tracker processor for hadronic collider triggers

    CERN Document Server

    Annovi, A; Bardi, A; Carosi, R; Dell'Orso, Mauro; D'Onofrio, M; Giannetti, P; Iannaccone, G; Morsani, F; Pietri, M; Varotto, G

    2000-01-01

    Perspectives for precise and fast track reconstruction in future hadronic collider experiments are addressed. We discuss the feasibility of a pipelined, highly parallelized processor dedicated to the implementation of a very fast algorithm. The algorithm is based on the use of a large bank of pre-stored combinations of trajectory points (patterns) for extremely complex tracking systems. The CMS experiment at LHC is used as a benchmark. Tracking data from the events selected by the level-1 trigger are sorted and filtered by the Fast Tracker processor at a rate of 100 kHz. This data organization allows the level-2 trigger logic to reconstruct full resolution tracks with transverse momentum above a few GeV and search for secondary vertices within typical level-2 times. 15 Refs.

  8. NMRFx Processor: a cross-platform NMR data processing program

    Energy Technology Data Exchange (ETDEWEB)

    Norris, Michael; Fetler, Bayard [One Moon Scientific, Inc. (United States); Marchant, Jan [University of Maryland Baltimore County, Howard Hughes Medical Institute (United States); Johnson, Bruce A., E-mail: bruce.johnson@asrc.cuny.edu [One Moon Scientific, Inc. (United States)

    2016-08-15

    NMRFx Processor is a new program for the processing of NMR data. Written in the Java programming language, NMRFx Processor is a cross-platform application and runs on Linux, Mac OS X and Windows operating systems. The application can be run in both a graphical user interface (GUI) mode and from the command line. Processing scripts are written in the Python programming language and executed so that the low-level Java commands are automatically run in parallel on computers with multiple cores or CPUs. Processing scripts can be generated automatically from the parameters of NMR experiments or interactively constructed in the GUI. A wide variety of processing operations are provided, including methods for processing of non-uniformly sampled datasets using iterative soft thresholding. The interactive GUI also enables the use of the program as an educational tool for teaching basic and advanced techniques in NMR data analysis.

  9. A VLSI image processor via pseudo-mersenne transforms

    International Nuclear Information System (INIS)

    Sei, W.J.; Jagadeesh, J.M.

    1986-01-01

    The computational burden of image processing in medical fields, where a large amount of information must be processed quickly and accurately, has led to consideration of special-purpose image processor chip designs for some time. The very large scale integration (VLSI) revolution has made it cost-effective and feasible to consider the design of special-purpose chips for medical imaging fields. This paper describes a VLSI CMOS chip suitable for parallel implementation of image processing algorithms and cyclic convolutions by using the Pseudo-Mersenne Number Transform (PMNT). The main advantages of the PMNT over the Fast Fourier Transform (FFT) are: (1) no multiplications are required; (2) integer arithmetic is used. The design and development of this processor, which operates on 32-point convolutions or a 5 x 5 window image, are described

  10. Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

    Directory of Open Access Journals (Sweden)

    Ausif Mahmood

    1996-01-01

    Full Text Available The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.

  11. Optimisation of a parallel ocean general circulation model

    OpenAIRE

    M. I. Beare; D. P. Stevens

    1997-01-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by...

  12. Data communications in a parallel active messaging interface of a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Ratterman, Joseph D; Smith, Brian E

    2013-11-12

    Data communications in a parallel active messaging interface (`PAMI`) of a parallel computer composed of compute nodes that execute a parallel application, each compute node including application processors that execute the parallel application and at least one management processor dedicated to gathering information regarding data communications. The PAMI is composed of data communications endpoints, each endpoint composed of a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes and the endpoints coupled for data communications through the PAMI and through data communications resources. Embodiments function by gathering call site statistics describing data communications resulting from execution of data communications instructions and identifying in dependence upon the call site statistics a data communications algorithm for use in executing a data communications instruction at a call site in the parallel application.

  13. .NET 4.5 parallel extensions

    CERN Document Server

    Freeman, Bryan

    2013-01-01

    This book contains practical recipes on everything you will need to create task-based parallel programs using C#, .NET 4.5, and Visual Studio. The book is packed with illustrated code examples to create scalable programs. This book is intended to help experienced C# developers write applications that leverage the power of modern multicore processors. It provides the necessary knowledge for an experienced C# developer to work with .NET parallelism APIs. Previous experience of writing multithreaded applications is not necessary.

  14. HTGR core seismic analysis using an array processor

    International Nuclear Information System (INIS)

    Shatoff, H.; Charman, C.M.

    1983-01-01

    A Floating Point Systems array processor performs nonlinear dynamic analysis of the high-temperature gas-cooled reactor (HTGR) core with significant time and cost savings. The graphite HTGR core consists of approximately 8000 blocks of various shapes which are subject to motion and impact during a seismic event. Two-dimensional computer programs (CRUNCH2D, MCOCO) can perform explicit step-by-step dynamic analyses of up to 600 blocks for time-history motions. However, use of two-dimensional codes was limited by the large cost and run times required. Three-dimensional analysis of the entire core, or even a large part of it, had been considered totally impractical. Because of the needs of the HTGR core seismic program, a Floating Point Systems array processor was used to enhance computer performance of the two-dimensional core seismic computer programs, MCOCO and CRUNCH2D. This effort began by converting the computational algorithms used in the codes to a form which takes maximum advantage of the parallel and pipeline processors offered by the architecture of the Floating Point Systems array processor. The subsequent conversion of the vectorized FORTRAN coding to the array processor required a significant programming effort to make the system work on the General Atomic (GA) UNIVAC 1100/82 host. These efforts were quite rewarding, however, since the cost of running the codes has been reduced approximately 50-fold and the time threefold. The core seismic analysis with large two-dimensional models has now become routine and extension to three-dimensional analysis is feasible. These codes simulate the one-fifth-scale full-array HTGR core model. This paper compares the analysis with the test results for sine-sweep motion
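
    The conversion step described above, recasting scalar loops into a form the array processor's parallel and pipelined units can stream through, has a familiar modern analogue in array programming. The NumPy sketch below is illustrative only and is unrelated to the original FORTRAN codes:

        # Scalar loop vs. vectorized form: the same recasting that lets
        # pipelined arithmetic units stream through the block data.
        import numpy as np

        def contact_force_scalar(gap, k):
            f = np.zeros_like(gap)
            for i in range(len(gap)):
                if gap[i] < 0.0:             # blocks in contact
                    f[i] = -k * gap[i]
            return f

        def contact_force_vector(gap, k):
            return np.where(gap < 0.0, -k * gap, 0.0)

        gap = np.array([0.1, -0.2, 0.05, -0.4])
        assert np.allclose(contact_force_scalar(gap, 2.0),
                           contact_force_vector(gap, 2.0))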

  15. Hypercube Expert System Shell - Applying Production Parallelism.

    Science.gov (United States)

    1989-12-01

    possible processor organizations, or interconnection methods, for parallel architectures. The following are examples of commonly used interconnection...this timing analysis because match speed-up available from production parallelism is proportional to the average number of affected productions...

  16. Learning and Parallelization Boost Constraint Search

    Science.gov (United States)

    Yun, Xi

    2013-01-01

    Constraint satisfaction problems are a powerful way to abstract and represent academic and real-world problems from both artificial intelligence and operations research. A constraint satisfaction problem is typically addressed by a sequential constraint solver running on a single processor. Rather than construct a new, parallel solver, this work…

  17. Heuristic framework for parallel sorting computations | Nwanze ...

    African Journals Online (AJOL)

    Parallel sorting techniques have become of practical interest with the advent of new multiprocessor architectures. The decreasing cost of these processors will probably, in the future, make the solutions derived from them more appealing. Efficient algorithms for sorting schemes that are encountered in a number of ...

  18. Monolithic solid-state lasers for spaceflight

    Science.gov (United States)

    Krainak, Michael A.; Yu, Anthony W.; Stephen, Mark A.; Merritt, Scott; Glebov, Leonid; Glebova, Larissa; Ryasnyanskiy, Aleksandr; Smirnov, Vadim; Mu, Xiaodong; Meissner, Stephanie; Meissner, Helmuth

    2015-02-01

    A new solution for building high power, solid state lasers for space flight is to fabricate the whole laser resonator in a single (monolithic) structure or alternatively to build a contiguous diffusion bonded or welded structure. Monolithic lasers provide numerous advantages for space flight solid-state lasers by minimizing misalignment concerns. The closed cavity is immune to contamination. The number of components is minimized thus increasing reliability. Bragg mirrors serve as the high reflector and output coupler thus minimizing optical coatings and coating damage. The Bragg mirrors also provide spectral and spatial mode selection for high fidelity. The monolithic structure allows short cavities resulting in short pulses. Passive saturable absorber Q-switches provide a soft aperture for spatial mode filtering and improved pointing stability. We will review our recent commercial and in-house developments toward fully monolithic solid-state lasers.

  19. Monolithically integrated 8-channel WDM reflective modulator

    NARCIS (Netherlands)

    Stopinski, S.T.; Malinowski, M.; Piramidowicz, R.; Smit, M.K.; Leijtens, X.J.M.

    2013-01-01

    In this work the design and characterization of a monolithically integrated photonic circuit acting as a reflective modulator for eight WDM channels is presented. The chip was designed and fabricated in a generic integration technology.

  20. High-speed parallel solution of the neutron diffusion equation with the hierarchical domain decomposition boundary element method incorporating parallel communications

    International Nuclear Information System (INIS)

    Tsuji, Masashi; Chiba, Gou

    2000-01-01

    A hierarchical domain decomposition boundary element method (HDD-BEM) for solving the multiregion neutron diffusion equation (NDE) has been fully parallelized, both for numerical computations and for data communications, to accomplish a high parallel efficiency on distributed memory message passing parallel computers. Data exchanges between node processors that are repeated during iteration processes of HDD-BEM are implemented without any intervention of the host processor that was used to supervise parallel processing in the conventional parallelized HDD-BEM (P-HDD-BEM). Thus, the parallel processing can be executed with only cooperative operations of node processors. The communication overhead was the dominant time-consuming part in the conventional P-HDD-BEM, and the parallelization efficiency decreased steeply with the increase of the number of processors. With the parallel data communication, the efficiency is affected only by the number of boundary elements assigned to decomposed subregions, and the communication overhead can be drastically reduced. This feature can be particularly advantageous in the analysis of three-dimensional problems where a large number of processors are required. The proposed P-HDD-BEM offers a promising solution to the deterioration problem of parallel efficiency and opens a new path to parallel computations of NDEs on distributed memory message passing parallel computers. (author)
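
    The key point above, repeated boundary-data exchanges performed directly between node processors with no host mediation, corresponds to a standard message-passing idiom. The sketch below uses hypothetical mpi4py code (not the original implementation); exchanges with MPI.PROC_NULL at the chain ends are no-ops:

        # Hypothetical mpi4py sketch: each node exchanges subregion boundary
        # values directly with its neighbours each iteration; no host involved.
        # Run with, e.g.: mpiexec -n 4 python exchange.py
        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()
        left = rank - 1 if rank > 0 else MPI.PROC_NULL
        right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

        boundary = np.full(8, float(rank))   # this node's interface values
        incoming = np.empty(8)

        for _ in range(10):                  # iteration loop, HDD-BEM style
            comm.Sendrecv(boundary, dest=right, recvbuf=incoming, source=left)
            # ... update the local boundary-element solution from incoming ...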

  1. Monolithic multinozzle emitters for nanoelectrospray mass spectrometry

    Science.gov (United States)

    Wang, Daojing [Daly City, CA; Yang, Peidong [Kensington, CA; Kim, Woong [Seoul, KR; Fan, Rong [Pasadena, CA

    2011-09-20

    Novel and significantly simplified procedures for fabrication of fully integrated nanoelectrospray emitters have been described. For nanofabricated monolithic multinozzle emitters (NM² emitters), a bottom up approach using silicon nanowires on a silicon sliver is used. For microfabricated monolithic multinozzle emitters (M³ emitters), a top down approach using MEMS techniques on silicon wafers is used. The emitters have performance comparable to that of commercially-available silica capillary emitters for nanoelectrospray mass spectrometry.

  2. Decomposition of monolithic web application to microservices

    OpenAIRE

    Zaymus, Mikulas

    2017-01-01

    Solteq Oyj has an internal Wellbeing project for massage reservations. The task of this thesis was to transform the monolithic architecture of this application into microservices. The thesis starts with a detailed comparison between microservice and monolithic applications. It points out the benefits and disadvantages microservice architecture can bring to the project. Next, it describes the theory and possible strategies that can be used in the process of decomposition of an existing monoli...

  3. Activated Carbon Fiber Monoliths as Supercapacitor Electrodes

    Directory of Open Access Journals (Sweden)

    Gelines Moreno-Fernandez

    2017-01-01

Activated carbon fibers (ACF) are interesting candidates for electrodes in electrochemical energy storage devices; however, one major drawback for practical application is their low density. In the present work, monoliths were synthesized from two different ACFs, reaching densities 3 times higher than the original ACFs' apparent densities. The porosity of the monoliths was only slightly decreased with respect to the pristine ACFs, as the employed PVDC binder develops additional porosity upon carbonization. The ACF monoliths are essentially microporous and reach BET surface areas of up to 1838 m² g⁻¹. SEM analysis reveals that the ACFs are well embedded into the monolith structure and that their length was significantly reduced by the monolith preparation process. The carbonized monoliths were studied as supercapacitor electrodes in two- and three-electrode cells with 2 M H2SO4 as the electrolyte. Maximum capacitances of around 200 F g⁻¹ were reached. The results confirm that the capacitance of the bisulfate anions originates essentially from the double layer, while hydronium cations contribute a mixture of double-layer capacitance and pseudocapacitance.

  4. Parallel rendering

    Science.gov (United States)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  5. Processor farming method for multi-scale analysis of masonry structures

    Science.gov (United States)

    Krejčí, Tomáš; Koudelka, Tomáš

    2017-07-01

This paper describes a processor farming method for coupled heat and moisture transport in masonry using a two-level approach. The motivation for the two-level description comes from difficulties connected with masonry structures, where the size of the stone blocks is much larger than the size of the mortar layers, so a very fine finite element mesh has to be used. The two-level approach is suitable for parallel computing because nearly all computations can be performed independently, with little synchronization. This approach is called processor farming. The master processor deals with the macro-scale level (the structure), while the slave processors deal with a homogenization procedure on the meso-scale level, represented by an appropriate representative volume element.
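
    A minimal mpi4py sketch of the processor-farming pattern described above: the master (rank 0) farms meso-scale tasks, one per representative volume element, out to slave ranks. The task content and the homogenize stand-in are invented for illustration; run with at least 2 ranks:

        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()

        def homogenize(point):
            # stand-in for a meso-scale RVE solve at one macro integration point
            return sum(point) / len(point)

        if rank == 0:                                  # master: macro-scale level
            points = [(float(i), float(i + 1)) for i in range(8)]
            for i, p in enumerate(points):             # farm the tasks out round-robin
                comm.send(p, dest=1 + i % (size - 1))
            results = [comm.recv(source=MPI.ANY_SOURCE) for _ in points]
            for s in range(1, size):                   # tell the slaves to stop
                comm.send(None, dest=s)
            print("macro-scale step assembled from", len(results), "RVE results")
        else:                                          # slave: meso-scale level
            while True:
                task = comm.recv(source=0)
                if task is None:
                    break
                comm.send(homogenize(task), dest=0)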

  6. Use of a track and vertex processor in a fixed-target charm experiment

    International Nuclear Information System (INIS)

    Schub, M.H.; Carey, T.A.; Hsiung, Y.B.; Kaplan, D.M.; Lee, C.; Miller, G.; Sa, J.; Teng, P.K.

    1996-01-01

    We have constructed and operated a high-speed parallel-pipelined track and vertex processor and used it to trigger data acquisition in a high-rate charm and beauty experiment at Fermilab. The processor uses information from hodoscopes and wire chambers to reconstruct tracks in the bend view of a magnetic spectrometer, and uses these tracks to find the corresponding tracks in a set of silicon-strip detectors. The processor then forms vertices and triggers the experiment if at least one vertex is downstream of the target. Under typical charm running conditions, with an interaction rate of ∼5 MHz, the processor rejects 80-90% of lower-level triggers while maintaining efficiency of ∼70% for two-prong D-meson decays. (orig.)
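
    An illustrative software analogue of the trigger decision, with invented track and vertex representations, target position and cut; the real processor performs this in pipelined hardware:

        TARGET_Z = 0.0      # assumed target position along the beam axis (cm)
        CUT_Z = 1.0         # assumed minimum downstream separation (cm)

        def vertex_z(z1, z2):
            """Crude two-track vertex: average of the two tracks' z positions."""
            return 0.5 * (z1 + z2)

        def trigger(track_pairs):
            # accept the event if at least one vertex lies downstream of the target
            return any(vertex_z(z1, z2) > TARGET_Z + CUT_Z for z1, z2 in track_pairs)

        print(trigger([(0.1, 0.2), (3.4, 2.9)]))   # True: the second vertex is downstream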

  7. Parallel computations

    CERN Document Server

    1982-01-01

    Parallel Computations focuses on parallel computation, with emphasis on algorithms used in a variety of numerical and physical applications and for many different types of parallel computers. Topics covered range from vectorization of fast Fourier transforms (FFTs) and of the incomplete Cholesky conjugate gradient (ICCG) algorithm on the Cray-1 to calculation of table lookups and piecewise functions. Single tridiagonal linear systems and vectorized computation of reactive flow are also discussed.Comprised of 13 chapters, this volume begins by classifying parallel computers and describing techn

  8. Vectorization, parallelization and porting of nuclear codes. Vectorization and parallelization. Progress report fiscal 1999

    Energy Technology Data Exchange (ETDEWEB)

    Adachi, Masaaki; Ogasawara, Shinobu; Kume, Etsuo [Japan Atomic Energy Research Inst., Tokai, Ibaraki (Japan). Tokai Research Establishment; Ishizuki, Shigeru; Nemoto, Toshiyuki; Kawasaki, Nobuo; Kawai, Wataru [Fujitsu Ltd., Tokyo (Japan); Yatake, Yo-ichi [Hitachi Ltd., Tokyo (Japan)

    2001-02-01

Several computer codes in the nuclear field have been vectorized, parallelized and ported to the FUJITSU VPP500 system, the AP3000 system, the SX-4 system and the Paragon system at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. We dealt with 18 codes in fiscal 1999. These results are reported in three parts: vectorization and parallelization on vector processors, parallelization on scalar processors, and porting. In this report we describe the vectorization and parallelization on vector processors: the vectorization and parallelization of the Relativistic Molecular Orbital Calculation code RSCAT, the microscopic transport code for high-energy nuclear collisions JAM, the three-dimensional non-steady thermal-fluid analysis code STREAM, the Relativistic Density Functional Theory code RDFT and the High Speed Three-Dimensional Nodal Diffusion code MOSRA-Light on the VPP500 and SX-4 systems. (author)

  9. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    Science.gov (United States)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
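
    The serial kernel that all of this parallelizes is the Metropolis acceptance rule applied to each proposed cell move; a toy sketch follows (the cost model and cooling rate are placeholders; the paper's contribution is the parallel move evaluation and error control built around this rule):

        import math, random

        def accept(delta_cost, temperature):
            # Metropolis rule: always accept improvements, sometimes accept uphill moves
            if delta_cost <= 0:
                return True
            return random.random() < math.exp(-delta_cost / temperature)

        T = 100.0
        for step in range(1000):
            delta = random.uniform(-1.0, 1.0)   # stand-in for a cell-exchange cost change
            if accept(delta, T):
                pass                            # apply the cell exchange / displacement
            T *= 0.995                          # placeholder cooling schedule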

  10. Novel memory architecture for video signal processor

    Science.gov (United States)

    Hung, Jen-Sheng; Lin, Chia-Hsing; Jen, Chein-Wei

    1993-11-01

An on-chip memory architecture for video signal processors (VSP) is proposed. This memory structure is a two-level design reflecting the different data localities in video applications. The upper level, Memory A, provides enough storage capacity to reduce the impact of the limited chip I/O bandwidth, and the lower level, Memory B, provides enough data parallelism and flexibility to meet the requirements of multiple reconfigurable pipeline function units in a single VSP chip. The needed memory size is determined by a memory-usage analysis of video algorithms and by the number of function units. Both levels of memory adopt a dual-port memory scheme to sustain simultaneous read and write operations. In particular, Memory B uses multiple one-read-one-write memory banks to emulate a real multiport memory. One can therefore reconfigure Memory B into several sets of memories with variable numbers of read/write ports by adjusting the bus switches, so that the numbers of read and write ports in the proposed memory meet the requirements of the data-flow patterns in different video coding algorithms. We have finished a prototype memory design using 1.2-μm SPDM SRAM technology and will fabricate it through TSMC in Taiwan.

  11. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    Science.gov (United States)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo-random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster, consisting of 800 MHz Intel Pentium III processors, shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors, based on availability), which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors, on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture, Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster than the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central-axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
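
    The ingredient that makes such a parallel Monte Carlo valid is a statistically independent random stream per rank. The paper uses the SPRNG library; the sketch below substitutes numpy's SeedSequence.spawn to illustrate the same idea (history counts and the tally are toy values):

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()

        root_seq = np.random.SeedSequence(12345)
        rng = np.random.default_rng(root_seq.spawn(size)[rank])  # independent stream

        n_local = 10**6 // size                         # split the histories across ranks
        hits = int(np.sum(rng.random(n_local) < 0.3))   # toy "interaction" tally
        total = comm.reduce(hits, op=MPI.SUM, root=0)
        if rank == 0:
            print("estimated probability:", total / (n_local * size))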

  12. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications

    International Nuclear Information System (INIS)

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J.

    2004-01-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo-random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster, consisting of 800 MHz Intel Pentium III processors, shows an almost linear speedup up to 32 processors for simulating 1 × 10^8 or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors, based on availability), which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors, on increasing the problem size up to 8 × 10^8 histories. For a smaller number of histories (1 × 10^8) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 × 10^8 histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture, Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster than the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central-axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.

  13. Parallelising a molecular dynamics algorithm on a multi-processor workstation

    Science.gov (United States)

    Müller-Plathe, Florian

    1990-12-01

The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transferred in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
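
    A rough shared-memory analogue of this master-slave split, assuming a toy pair list and force law: only the nonbonded loop is divided among slave threads, while the master keeps all other bookkeeping and combines the partial results:

        from concurrent.futures import ThreadPoolExecutor
        import numpy as np

        pos = np.random.rand(200, 3)                 # toy particle positions
        pairs = [(i, j) for i in range(200) for j in range(i + 1, 200)]  # "neighbour list"

        def partial_forces(chunk):
            # slave task: nonbonded forces for one slice of the pair list
            f = np.zeros_like(pos)
            for i, j in chunk:
                rij = pos[j] - pos[i]
                fij = rij / (rij @ rij + 1e-12)      # placeholder pair force
                f[i] += fij
                f[j] -= fij
            return f

        n_slaves = 4
        chunks = [pairs[k::n_slaves] for k in range(n_slaves)]
        with ThreadPoolExecutor(max_workers=n_slaves) as pool:
            forces = sum(pool.map(partial_forces, chunks))  # master combines the pieces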

  14. High Fidelity, Numerical Investigation of Cross Talk in a Multi-Qubit Xmon Processor

    Science.gov (United States)

    Najafi-Yazdi, Alireza; Kelly, Julian; Martinis, John

    Unwanted electromagnetic interference between qubits, transmission lines, flux lines and other elements of a superconducting quantum processor poses a challenge in engineering such devices. This problem is exacerbated with scaling up the number of qubits. High fidelity, massively parallel computational toolkits, which can simulate the 3D electromagnetic environment and all features of the device, are instrumental in addressing this challenge. In this work, we numerically investigated the crosstalk between various elements of a multi-qubit quantum processor designed and tested by the Google team. The processor consists of 6 superconducting Xmon qubits with flux lines and gatelines. The device also consists of a Purcell filter for readout. The simulations are carried out with a high fidelity, massively parallel EM solver. We will present our findings regarding the sources of crosstalk in the device, as well as numerical model setup, and a comparison with available experimental data.

  15. Uncooled monolithic ferroelectric IRFPA technology

    Science.gov (United States)

    Belcher, James F.; Hanson, Charles M.; Beratan, Howard R.; Udayakumar, K. R.; Soch, Kevin L.

    1998-10-01

    Once relegated to expensive military platforms, occasionally to civilian platforms, and envisioned for individual soldiers, uncooled thermal imaging affords cost-effective solutions for police cars, commercial surveillance, driving aids, and a variety of other industrial and consumer applications. System prices are continuing to drop, and swelling production volume will soon drive prices substantially lower. The impetus for further development is to improve performance. Hybrid barium strontium titanate (BST) detectors currently in production are relatively inexpensive, but have limited potential for improved performance. The MTF at high frequencies is limited by thermal conduction through the optical coating. Microbolometer arrays in development at Raytheon have recently demonstrated performance superior to hybrid detectors. However, microbolometer technology lacks a mature, low-cost system technology and an abundance of upgradable, deployable system implementations. Thin-film ferroelectric (TFFE) detectors have all the performance potential of microbolometers. They are also compatible with numerous fielded and planned system implementations. Like the resistive microbolometer, the TFFE detector is monolithic; i.e., the detector material is deposited directly on the readout IC rather than being bump bonded to it. Imaging arrays of 240 X 320 pixels have been produced, demonstrating the feasibility of the technology.

  16. Low-communication parallel quantum multi-target preimage search

    NARCIS (Netherlands)

    Banegas, G.S.; Bernstein, D.J.; Adams, Carlisle; Camenisch, Jan

    2017-01-01

The most important pre-quantum threat to AES-128 is the 1994 van Oorschot–Wiener “parallel rho method”, a low-communication parallel pre-quantum multi-target preimage-search algorithm. This algorithm uses a mesh of p small processors, each running for approximately 2^128/(pt) fast steps, to

  17. Parallel Algorithm for Adaptive Numerical Integration

    International Nuclear Information System (INIS)

    Sujatmiko, M.; Basarudin, T.

    1997-01-01

This paper presents an automatic algorithm for integration using the adaptive trapezoidal method. The interval is divided adaptively, so that the widths of the subintervals differ and fit the behaviour of the function. For a function f, an integration on the interval [a,b] can be obtained, with maximum tolerance ε, using the estimation (f, a, b, ε). The estimated solution is valid if the error is still in a reasonable range, i.e. fulfils certain criteria. If the error is large, however, the problem is solved by dividing it into two similar and independent subproblems on the separate intervals [a, (a+b)/2] and [(a+b)/2, b], i.e. the estimations (f, a, (a+b)/2, ε/2) and (f, (a+b)/2, b, ε/2). The problems are solved by two different kinds of processors: a root processor and worker processors. The root processor's function is to divide the main problem into subproblems and distribute them to the worker processors. The division mechanism may go further until all subproblems are resolved. The solution of each subproblem is then submitted to the root processor so that the solution of the main problem can be obtained. The algorithm is implemented on a C-based distributed computer network under the Parallel Virtual Machine (PVM) platform.
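
    A runnable sketch of the estimation (f, a, b, ε) described above: the trapezoid estimate is accepted if it agrees with its two-halves refinement, and otherwise the interval is split into the two independent subproblems (which the root processor would hand to workers; here they simply recurse):

        def estimate(f, a, b, eps):
            m = 0.5 * (a + b)
            whole = 0.5 * (b - a) * (f(a) + f(b))             # one trapezoid
            halves = 0.25 * (b - a) * (f(a) + 2 * f(m) + f(b))  # two trapezoids
            if abs(whole - halves) <= eps:    # error within tolerance: accept
                return halves
            # split into (f, a, m, eps/2) and (f, m, b, eps/2)
            return estimate(f, a, m, eps / 2) + estimate(f, m, b, eps / 2)

        import math
        print(estimate(math.sin, 0.0, math.pi, 1e-6))   # ~2.0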

18. Parallel community climate model: Description and user's guide

    Energy Technology Data Exchange (ETDEWEB)

Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H. [and others]

    1996-07-15

This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written, and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed, along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.

  19. Does the Intel Xeon Phi processor fit HEP workloads?

    Science.gov (United States)

    Nowak, A.; Bitzes, G.; Dotti, A.; Lazzaro, A.; Jarp, S.; Szostek, P.; Valsan, L.; Botezatu, M.; Leduc, J.

    2014-06-01

This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  20. Negative base encoding in optical linear algebra processors

    Science.gov (United States)

    Perlee, C.; Casasent, D.

    1986-01-01

    In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
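
    A worked example of the encoding itself: in base -2 every integer, positive or negative, is represented with digits 0 and 1 only, which is the property that avoids sign-magnitude or 2's-complement handling. The routine below is a textbook conversion, not taken from the paper:

        def to_negabase(n, base=-2):
            # digits of n in a negative base, most significant digit first
            if n == 0:
                return [0]
            digits = []
            while n != 0:
                n, r = divmod(n, base)
                if r < 0:            # keep every remainder non-negative
                    r -= base        # i.e. r += |base|
                    n += 1
                digits.append(r)
            return digits[::-1]

        print(to_negabase(6))   # [1, 1, 0, 1, 0], since 16 - 8 - 2 = 6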

  1. Does the Intel Xeon Phi processor fit HEP workloads?

    International Nuclear Information System (INIS)

    Nowak, A; Bitzes, G; Dotti, A; Lazzaro, A; Jarp, S; Szostek, P; Valsan, L; Botezatu, M; Leduc, J

    2014-01-01

This paper summarizes the five years of CERN openlab's efforts focused on the Intel Xeon Phi co-processor, from the time of its inception to public release. We consider the architecture of the device vis-à-vis the characteristics of HEP software and identify key opportunities for HEP processing, as well as scaling limitations. We report on improvements and speedups linked to parallelization and vectorization on benchmarks involving software frameworks such as Geant4 and ROOT. Finally, we extrapolate current software and hardware trends and project them onto accelerators of the future, with the specifics of offline and online HEP processing in mind.

  2. Addressing Thermal and Performance Variability Issues in Dynamic Processors

    Energy Technology Data Exchange (ETDEWEB)

    Yoshii, Kazutomo [Argonne National Lab. (ANL), Argonne, IL (United States); Llopis, Pablo [Univ. Carlos III de Madrid (Spain); Zhang, Kaicheng [Northwestern Univ., Evanston, IL (United States); Luo, Yingyi [Northwestern Univ., Evanston, IL (United States); Ogrenci-Memik, Seda [Northwestern Univ., Evanston, IL (United States); Memik, Gokhan [Northwestern Univ., Evanston, IL (United States); Sankaran, Rajesh [Argonne National Lab. (ANL), Argonne, IL (United States); Beckman, Pete [Argonne National Lab. (ANL), Argonne, IL (United States)

    2017-03-01

    As CMOS scaling nears its end, parameter variations (process, temperature and voltage) are becoming a major concern. To overcome parameter variations and provide stability, modern processors are becoming dynamic, opportunistically adjusting voltage and frequency based on thermal and energy constraints, which negatively impacts traditional bulk-synchronous parallelism-minded hardware and software designs. As node-level architecture is growing in complexity, implementing variation control mechanisms only with hardware can be a challenging task. In this paper we investigate a software strategy to manage hardwareinduced variations, leveraging low-level monitoring/controlling mechanisms.

  3. RAMA: A file system for massively parallel computers

    Science.gov (United States)

    Miller, Ethan L.; Katz, Randy H.

    1993-01-01

This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.

  4. Accuracies Of Optical Processors For Adaptive Optics

    Science.gov (United States)

    Downie, John D.; Goodman, Joseph W.

    1992-01-01

The paper presents an analysis of the accuracies of, and the accuracy requirements for, optical linear-algebra processors (OLAPs) in adaptive-optics imaging systems. OLAPs are much faster than digital electronic processors and eliminate some residual distortion. The question is whether the errors introduced by the analog processing of an OLAP overcome the advantage of greater speed. The paper addresses this issue by estimating the accuracy required of a general OLAP for it to yield a smaller average residual wave-front aberration than a digital electronic processor computing at a given speed.

  5. Functional Verification of Enhanced RISC Processor

    OpenAIRE

    SHANKER NILANGI; SOWMYA L

    2013-01-01

This paper presents the design and verification of a 32-bit enhanced RISC processor core with floating-point computations integrated within the core, designed to reduce cost and complexity. The 3-stage pipelined 32-bit RISC processor is based on the ARM7 processor architecture, with a single-precision floating-point multiplier, a floating-point adder/subtractor for floating-point operations, and a 32 × 32 Booth's multiplier added to the ARM7 integer core. The binary representati...

  6. Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

    Science.gov (United States)

    Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

    1994-01-01

    An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.

  7. Alternative Water Processor Test Development

    Science.gov (United States)

    Pickering, Karen D.; Mitchell, Julie; Vega, Leticia; Adam, Niklas; Flynn, Michael; Wjee (er. Rau); Lunn, Griffin; Jackson, Andrew

    2012-01-01

The Next Generation Life Support Project is developing an Alternative Water Processor (AWP) as a candidate water recovery system for long-duration exploration missions. The AWP consists of a biological water processor (BWP) integrated with a forward osmosis secondary treatment system (FOST). The basis of the BWP is a membrane-aerated biological reactor (MABR), developed in concert with Texas Tech University. Bacteria located within the MABR metabolize organic material in wastewater, converting approximately 90% of the total organic carbon to carbon dioxide. In addition, bacteria convert a portion of the ammonia-nitrogen present in the wastewater to nitrogen gas, through a combination of nitrification and denitrification. The effluent from the BWP system is low in organic contaminants, but high in total dissolved solids. The FOST system, integrated downstream of the BWP, removes dissolved solids through a combination of concentration-driven forward osmosis and pressure-driven reverse osmosis. The integrated system is expected to produce water with a total organic carbon of less than 50 mg/l and dissolved solids that meet potable water requirements for spaceflight. This paper describes the test definition, the design of the BWP and FOST subsystems, and plans for integrated testing.

  8. The UA1 trigger processor

    International Nuclear Information System (INIS)

    Grayer, G.H.

    1981-01-01

Experiment UA1 is a large multi-purpose spectrometer at the CERN proton-antiproton collider, scheduled for late 1981. The principal trigger is formed on the basis of the energy deposition in calorimeters. A trigger decision taken in under 2.4 microseconds can avoid dead-time losses due to the bunched nature of the beam. To achieve this we have built fast 8-bit charge-to-digital converters followed by two identical digital processors tailored to the experiment. The outputs of groups of the 2440 photomultipliers in the calorimeters are summed to form a total of 288 input channels to the ADCs. A look-up table in RAM is used to convert the digitised photomultiplier signals to energies; the processor sums combinations of input channels and also counts the number of clusters with electromagnetic or hadronic energy above pre-determined levels. Up to twelve combinations of these conditions, together with external information, may be combined in coincidence or in veto to form the final trigger. Provision has been made for testing using simulated data in an off-line mode, and for sampling real data when on-line. (orig.)

  9. Efficient Execution of Video Applications on Heterogeneous Multi- and Many-Core Processors

    NARCIS (Netherlands)

    Pereira de Azevedo Filho, A.

    2011-01-01

    In this dissertation we present methodologies and evaluations aiming at increasing the efficiency of video coding applications for heterogeneous many-core processors composed of SIMD-only, scratchpad memory based cores. Our contributions are spread in three different fronts: thread-level parallelism

  10. Multi-processor system for real-time flow estimation in medical ultrasound imaging

    DEFF Research Database (Denmark)

    Stetson, Paul F.; Jensen, Jesper Lomborg; Antonius, Peter

    1997-01-01

the processed data. The generous bandwidth of the links makes it easy to balance the computational load among the processors. In order to manage the shared system memory and to make use of the parallel processing capabilities of the system, a real-time multitasking kernel has been developed. The kernel uses...

  11. Use of Digital Signal Processors (DSP) in high energy physics experiments

    International Nuclear Information System (INIS)

    Crosetto, D.

    1988-01-01

The FDDP - Fast Digital Data Processor - is a modular system for executing parallel digital processing algorithms to perform programmable trigger decisions or programmable on-line data reduction. Typical applications involve zero suppression and pulse shape analysis. The characteristics of the system are modularity, expandability and flexibility. (author). 4 refs, 5 figs
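
    Zero suppression, the first application named above, reduces to a very small kernel; a toy software rendition with an invented pedestal threshold:

        # Drop samples below the pedestal threshold and keep (index, value)
        # pairs, reducing the event size on-line.
        def zero_suppress(samples, threshold=5):
            return [(i, v) for i, v in enumerate(samples) if v > threshold]

        event = [0, 1, 0, 42, 3, 17, 0, 0, 9, 1]
        print(zero_suppress(event))   # [(3, 42), (5, 17), (8, 9)]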

  12. Data register and processor for multiwire chambers

    International Nuclear Information System (INIS)

    Karpukhin, V.V.

    1985-01-01

A data register and a processor for receiving and processing data from the drift chambers of a facility for investigating relativistic positronium are described. The data arrive at the register input as an 8-bit Gray code, are stored, and are transformed into a position code. The register information is delivered to the CAMAC dataway and to the front-panel connector. The processor selects particle tracks in the horizontal plane of the facility. The maximum coordinate divergence ΔY and the minimum number of points on a track are set from the processor front panel. The processor decision time is 16 μs; the maximum number of simultaneously analyzed coordinates is 16.
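
    The register's central transformation is the Gray-to-position-code conversion; a standard software rendition of it (the cumulative-XOR formulation, illustrating the mapping rather than the device's actual logic):

        def gray_to_binary(g):
            # binary value = g XOR (g >> 1) XOR (g >> 2) XOR ...
            b = g
            shift = 1
            while (g >> shift) != 0:
                b ^= g >> shift
                shift += 1
            return b

        # round-trip check against the inverse (binary -> Gray) mapping
        binary_to_gray = lambda b: b ^ (b >> 1)
        assert all(gray_to_binary(binary_to_gray(n)) == n for n in range(256))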

13. Many-body simulations using an array processor

    International Nuclear Information System (INIS)

    Rapaport, D.C.

    1985-01-01

    Simulations of microscopic models of water and polypeptides using molecular dynamics and Monte Carlo techniques have been carried out with the aid of an FPS array processor. The computational techniques are discussed, with emphasis on the development and optimization of the software to take account of the special features of the processor. The computing requirements of these simulations exceed what could be reasonably carried out on a normal 'scientific' computer. While the FPS processor is highly suited to the kinds of models described, several other computationally intensive problems in statistical mechanics are outlined for which alternative processor architectures are more appropriate

  14. Sensitometric control of roentgen film processors

    International Nuclear Information System (INIS)

    Forsberg, H.; Karolinska Sjukhuset, Stockholm

    1987-01-01

Monitoring of film processor performance is essential, since image quality, patient dose and costs are all influenced by it. A system for sensitometric constancy control of film processors and their associated components is described, together with three years of experience with the system implemented on 17 film processors. Modern high-quality film processors have a stability that makes a test frequency of once a week sufficient to maintain adequate image quality. The test system is so sensitive that corrective actions have almost invariably been taken before any technical problem degraded the image quality to a visible degree. (orig.)

  15. Applications of the parallel computing system using network

    International Nuclear Information System (INIS)

    Ido, Shunji; Hasebe, Hiroki

    1994-01-01

Parallel programming is applied to multiple processors connected by Ethernet. Data exchanges between tasks located on different processing elements are realized in two ways. One is the socket, a standard library on recent UNIX operating systems. The other is network-connecting software named Parallel Virtual Machine (PVM), free software developed by ORNL that uses many networked workstations as a single parallel computer. This paper discusses the suitability of network-based parallel computing with UNIX workstations and compares it with specialized parallel systems (Transputer and iPSC/860) for a Monte Carlo simulation, which generally exhibits a high parallelization ratio. (author)
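
    A minimal UNIX illustration of the socket approach mentioned above: a parent task ships a Monte Carlo work description to a forked child over a socket pair and collects the partial result back. The message layout and the pi-estimation task are invented; PVM hides exactly this kind of plumbing:

        import os, socket, struct, random

        parent_sock, child_sock = socket.socketpair()
        if os.fork() == 0:                              # child: worker task
            n, seed = struct.unpack("!II", child_sock.recv(8))
            random.seed(seed)
            hits = sum(random.random()**2 + random.random()**2 < 1.0 for _ in range(n))
            child_sock.send(struct.pack("!I", hits))
            os._exit(0)
        else:                                           # parent: master task
            parent_sock.send(struct.pack("!II", 100000, 42))
            (hits,) = struct.unpack("!I", parent_sock.recv(4))
            print("pi estimate:", 4.0 * hits / 100000)
            os.wait()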

  16. Parallelization of Subchannel Analysis Code MATRA

    International Nuclear Information System (INIS)

    Kim, Seongjin; Hwang, Daehyun; Kwon, Hyouk

    2014-01-01

A stand-alone MATRA calculation takes an acceptable computing time for thermal margin calculations, but a considerably longer time is needed to solve whole-core pin-by-pin problems. In addition, improving the computation speed of the MATRA code is strongly required to satisfy the overall performance of multi-physics coupling calculations. Therefore, a parallel approach to improve and optimize the computability of the MATRA code is proposed and verified in this study. The parallel algorithm is embodied in the MATRA code using the MPI communication method, and modification of the previous code structure was minimized. The improvement is confirmed by comparing the results of the single-processor and multi-processor algorithms. The speedup and efficiency are also evaluated as the number of processors increases. The parallel algorithm was implemented in the subchannel code MATRA using MPI, and its performance was verified by comparing the results with those from MATRA on a single processor. It is also noticed that the performance of the MATRA code was greatly improved by implementing the parallel algorithm for the 1/8-core and whole-core problems.

  17. Improvement of Parallel Algorithm for MATRA Code

    International Nuclear Information System (INIS)

    Kim, Seong-Jin; Seo, Kyong-Won; Kwon, Hyouk; Hwang, Dae-Hyun

    2014-01-01

A feasibility study on parallelizing the MATRA code was conducted at KAERI early this year. As a result, a parallel algorithm for the MATRA code has been developed to decrease the considerable computing time required to solve a big problem, such as a whole-core pin-by-pin problem of a typical PWR, and to improve the overall performance of multi-physics coupling calculations. It was shown that the performance of the MATRA code was greatly improved by implementing the parallel algorithm using MPI communication. For 1/8-core and whole-core problems of the SMART reactor, a speedup of about 10 was obtained with 25 processors. However, it was also shown that the performance deteriorated as the number of axial nodes increased. In this paper, the communication procedure between processors is optimized to improve the previous parallel algorithm. To remedy the performance deterioration of the parallelized MATRA code, a new interprocessor communication algorithm is presented. It is shown that the speedup is improved and remains stable regardless of the number of axial nodes.

  18. Automatic Parallelization Tool: Classification of Program Code for Parallel Computing

    Directory of Open Access Journals (Sweden)

    Mustafa Basthikodi

    2016-04-01

Performance growth of single-core processors came to a halt in the past decade, but was re-enabled by the introduction of parallelism in processors. Multicore frameworks, along with graphical processing units, have broadly empowered parallelism. Several compilers have been updated to address the challenges of synchronization and threading. Appropriate program and algorithm classification greatly helps software engineers identify opportunities for effective parallelization. In the present work we investigate current approaches to the species-based classification of algorithms; related work on classification is discussed, along with a comparison of the issues that challenge classification. A set of algorithms is chosen whose structures match different issues while performing a given task. We have tested these algorithms using existing automatic species-extraction tools along with the Bones compiler. We have added functionalities to the existing tool, providing a more detailed characterization. The contributions of our work include support for pointer arithmetic, conditional and incremental statements, user-defined types, constants and mathematical functions. With this, we can retain significant data which is not captured by the original species of algorithms. We implemented these extensions in the tool, enabling automatic characterization of program code.

  19. Parallel computational in nuclear group constant calculation

    International Nuclear Information System (INIS)

    Su'ud, Zaki; Rustandi, Yaddi K.; Kurniadi, Rizal

    2002-01-01

In this paper a parallel computational method for nuclear group constant calculation using the collision probability method is discussed. The main focus is on the calculation of the collision matrix, which needs a large amount of computational time. The geometry treated here consists of concentric cylinders. The collision probability matrix is calculated by a semi-analytic method based on Bickley-Naylor functions. To accelerate the computation, several computers are used in parallel to solve the problem. Under Linux, parallelization uses the PVM software with C or Fortran, while under Windows socket programming with Delphi or C++ Builder is used. The calculation results show the importance of an optimal weight for each processor when there are many different processor speeds.
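
    The closing point about optimal weights can be made concrete: give each machine a share of the collision-matrix rows proportional to its relative speed. A toy sketch with invented benchmark figures:

        speeds = {"node0": 1.0, "node1": 0.5, "node2": 2.0}   # relative speeds
        n_rows = 120                                          # collision-matrix rows

        total = sum(speeds.values())
        shares = {k: round(n_rows * s / total) for k, s in speeds.items()}
        shares["node2"] += n_rows - sum(shares.values())      # absorb rounding error
        print(shares)   # {'node0': 34, 'node1': 17, 'node2': 69}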

  20. Parallel algorithms

    CERN Document Server

    Casanova, Henri; Robert, Yves

    2008-01-01

    ""…The authors of the present book, who have extensive credentials in both research and instruction in the area of parallelism, present a sound, principled treatment of parallel algorithms. … This book is very well written and extremely well designed from an instructional point of view. … The authors have created an instructive and fascinating text. The book will serve researchers as well as instructors who need a solid, readable text for a course on parallelism in computing. Indeed, for anyone who wants an understandable text from which to acquire a current, rigorous, and broad vi

  1. Implementations of BLAST for parallel computers.

    Science.gov (United States)

    Jülich, A

    1995-02-01

The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared-memory machine Cray Y-MP 8/864 and the distributed-memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799-residue protein query sequence and the protein database PIR were used.

  2. Language constructs for modular parallel programs

    Energy Technology Data Exchange (ETDEWEB)

    Foster, I.

    1996-03-01

We describe programming language constructs that facilitate the application of modular design techniques in parallel programming. These constructs allow us to isolate resource management and processor scheduling decisions from the specification of individual modules, which can themselves encapsulate design decisions concerned with concurrency, communication, process mapping, and data distribution. This approach permits development of libraries of reusable parallel program components and the reuse of these components in different contexts. In particular, alternative mapping strategies can be explored without modifying other aspects of program logic. We describe how these constructs are incorporated in two practical parallel programming languages, PCN and Fortran M. Compilers have been developed for both languages, allowing experimentation in substantial applications.

  3. Feed-forward volume rendering algorithm for moderately parallel MIMD machines

    Science.gov (United States)

    Yagel, Roni

    1993-01-01

    Algorithms for direct volume rendering on parallel and vector processors are investigated. Volumes are transformed efficiently on parallel processors by dividing the data into slices and beams of voxels. Equal sized sets of slices along one axis are distributed to processors. Parallelism is achieved at two levels. Because each slice can be transformed independently of others, processors transform their assigned slices with no communication, thus providing maximum possible parallelism at the first level. Within each slice, consecutive beams are incrementally transformed using coherency in the transformation computation. Also, coherency across slices can be exploited to further enhance performance. This coherency yields the second level of parallelism through the use of the vector processing or pipelining. Other ongoing efforts include investigations into image reconstruction techniques, load balancing strategies, and improving performance.

  4. Massively parallel performance of neutron transport response matrix algorithms

    International Nuclear Information System (INIS)

    Hanebutte, U.R.; Lewis, E.E.

    1993-01-01

Massively parallel red/black response matrix algorithms for the solution of within-group neutron transport problems are implemented on the Connection Machines CM-2, CM-200 and CM-5. The response matrices are derived from the diamond-difference and linear-linear nodal discrete ordinates and variational nodal P3 approximations. The unaccelerated performance of the iterative procedure is examined relative to the maximum rated performances of the machines. The effects of processor partition size, of virtual processor ratio and of problem size are examined in detail. For the red/black algorithm, the ratio of inter-node communication to computing time is found to be quite small, normally of the order of ten percent or less. Performance increases with problem size and with virtual processor ratio, within the memory-per-physical-processor limitation. Algorithm adaptation to coarser-grain machines is straightforward, with total computing time being virtually inversely proportional to the number of physical processors. (orig.)

  5. Producing chopped firewood with firewood processors

    International Nuclear Information System (INIS)

    Kaerhae, K.; Jouhiaho, A.

    2009-01-01

The TTS Institute's research and development project studied both the productivity of new, chopped firewood processors (cross-cutting and splitting machines) suitable for professional and independent small-scale production, and the costs of the chopped firewood produced. Seven chopped firewood processors were tested in the research, six of which were sawing processors and one a shearing processor. The chopping work was carried out using wood feeding racks and a wood lifter. The work was also carried out without any feeding appliances. Altogether 132.5 solid m³ of wood were chopped in the time studies. The firewood processor used had the most significant impact on chopping work productivity. In addition to the firewood processor, the stem mid-diameter, the length of the raw material, and the length of the firewood were also found to affect productivity. The wood feeding systems also affected productivity. If there is a feeding rack and hydraulic grapple loader available for use in chopping firewood, then it is worth using the wood feeding rack. A wood lifter is only worth using with the largest stems (over 20 cm mid-diameter) if a feeding rack cannot be used. When producing chopped firewood from small-diameter wood, i.e. with a mid-diameter less than 10 cm, the costs of chopping work were over 10 EUR per solid m³ with sawing firewood processors. The shearing firewood processor with a guillotine blade achieved a cost level of 5 EUR per solid m³ when the mid-diameter of the chopped stem was 10 cm. In addition to the raw material, cost-efficient chopping work also requires several hundred annual operating hours with a firewood processor, which is difficult for individual firewood entrepreneurs to achieve. The operating hours of firewood processors can be increased to the required level by the joint use of the processors by a number of firewood entrepreneurs. (author)

  6. Research in Parallel Algorithms and Software for Computational Aerosciences

    Science.gov (United States)

    Domel, Neal D.

    1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed-memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, an SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message-passing protocol, for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  7. Parallel discrete event simulation: A shared memory approach

    Science.gov (United States)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.

  8. Parallel processing approach to transform-based image coding

    Science.gov (United States)

    Normile, James O.; Wright, Dan; Chu, Ken; Yeh, Chia L.

    1991-06-01

This paper describes a flexible parallel processing architecture designed for use in real-time video processing. The system consists of floating-point DSP processors connected to each other via fast serial links; each processor has access to a globally shared memory. A multiple-bus architecture in combination with a dual-ported memory allows communication with a host control processor. The system has been applied to prototyping of video compression and decompression algorithms. The decomposition of transform-based algorithms for decompression into a form suitable for parallel processing is described. A technique for automatic load balancing among the processors is developed and discussed, and results are presented with image statistics and data rates. Finally, techniques for accelerating the system throughput are analyzed and results from the application of one such modification described.

  9. Exploiting Symmetry on Parallel Architectures.

    Science.gov (United States)

    Stiller, Lewis Benjamin

    1995-01-01

This thesis describes techniques for the design of parallel programs that solve well-structured problems with inherent symmetry. Part I demonstrates the reduction of such problems to generalized matrix multiplication by a group-equivariant matrix. Fast techniques for this multiplication are described, including factorization, orbit decomposition, and Fourier transforms over finite groups. Our algorithms entail interaction between two symmetry groups: one arising at the software level from the problem's symmetry and the other arising at the hardware level from the processors' communication network. Part II illustrates the applicability of our symmetry-exploitation techniques by presenting a series of case studies of the design and implementation of parallel programs. First, a parallel program that solves chess endgames by factorization of an associated dihedral group-equivariant matrix is described. This code runs faster than previous serial programs, and it discovered a number of results. Second, parallel algorithms for Fourier transforms for finite groups are developed, and preliminary parallel implementations for group transforms of dihedral and of symmetric groups are described. Applications in learning, vision, pattern recognition, and statistics are proposed. Third, parallel implementations solving several computational science problems are described, including the direct n-body problem, convolutions arising from molecular biology, and some communication primitives such as broadcast and reduce. Some of our implementations ran orders of magnitude faster than previous techniques, and were used in the investigation of various physical phenomena.

  10. Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

    Science.gov (United States)

    Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

    2017-06-01

An edge switch is an operation on a graph (or network) where two edges are selected randomly and one end vertex of each is swapped with the other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors, leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed-memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
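
    A single edge-switch step that keeps the graph simple looks as follows in sequential toy form (the set-of-frozensets representation is chosen for the sketch; the paper's contribution is performing many such steps correctly in parallel):

        import random

        def edge_switch(edges):
            """One switch attempt; edges is a set of frozenset({u, v}) pairs."""
            (u, v), (x, y) = (tuple(e) for e in random.sample(list(edges), 2))
            e1, e2 = frozenset((u, y)), frozenset((x, v))
            if len(e1) < 2 or len(e2) < 2:       # would create a self-loop
                return False
            if e1 in edges or e2 in edges:       # would create a parallel edge
                return False
            edges -= {frozenset((u, v)), frozenset((x, y))}
            edges |= {e1, e2}
            return True

        g = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 1)]}
        print(edge_switch(g), g)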

  11. Parallel pic plasma simulation through particle decomposition techniques

    International Nuclear Information System (INIS)

    Briguglio, S.; Vlad, G.; Di Martino, B.; Naples, Univ. 'Federico II'

    1998-02-01

Particle-in-cell (PIC) codes are among the major candidates to yield a satisfactory description of the detail of kinetic effects, such as the resonant wave-particle interaction, relevant in determining the transport mechanism in magnetically confined plasmas. A significant improvement of the simulation performance of such codes can be expected from parallelization, e.g., by distributing the particle population among several parallel processors. Parallelization of a hybrid magnetohydrodynamic-gyrokinetic code has been accomplished within the High Performance Fortran (HPF) framework, and tested on the IBM SP2 parallel system, using a 'particle decomposition' technique. The adopted technique requires a moderate effort in porting the code to parallel form and results in intrinsic load balancing and modest interprocessor communication. The performance tests confirm the hypothesis that the strategy is highly effective if targeted towards moderately parallel architectures. Optimal use of resources is also discussed with reference to a specific physics problem.

  12. Micro processors for plant protection

    International Nuclear Information System (INIS)

    McAffer, N.T.C.

    1976-01-01

Microcomputers can be used satisfactorily in general protection duties, with economic advantages over hard-wired systems. The reliability of such protection functions can be enhanced by keeping the task performed by each protection microprocessor simple and by avoiding any substantial dependence of that task on others. This implies that vital work done for any task is kept within it, and that communications to or from the outside are restricted to those controlling data transfer. The amount of this data should also be the minimum consistent with satisfactory task execution. Technology is changing rapidly, and devices may become obsolete and be supplanted by new ones before their theoretical reliability can be confirmed or otherwise by field service. This emphasises the need for users to pool device-performance data so that effective reliability judgements can be made within the lifetime of the devices. (orig.)

  13. Fire resistance of prefabricated monolithic slab

    Directory of Open Access Journals (Sweden)

    Gravit Marina

    2017-01-01

    Full Text Available A prefabricated monolithic slab (PMS) has a number of valuable advantages: it significantly decreases the weight of the structure while keeping the necessary load-bearing capacity, speeds up and reduces the cost of construction work, and improves the heat-insulating properties of the enclosing structure [1]. In order to create a method for designing the fire resistance of prefabricated monolithic slabs, it is necessary to perform a series of PMS tests, one of which is described in this article. The test specimen is a fragment of a prefabricated monolithic slab 250 mm thick, with polystyrene concrete inserts along beams of bent metal profile, with a 2.7 m span, loaded with an evenly distributed load of 600 kg/m2. After 3 hours of fire-resistance testing [2], no signs of ultimate behavior of the structure were detected.

  14. Acceleration of iterative tomographic reconstruction using graphics processors

    International Nuclear Information System (INIS)

    Belzunce, M.A.; Osorio, A.; Verrastro, C.A.

    2009-01-01

    Using iterative algorithms for image reconstruction in 3D Positron Emission Tomography has been shown to produce images with better quality than analytical methods. However, these algorithms are computationally expensive. New Graphics Processing Units (GPUs) provide high performance at low cost, together with programming tools that make it possible to execute parallel algorithms easily in scientific applications. In this work, we attempt to accelerate image reconstruction algorithms for 3D PET by using a GPU. A parallel implementation of the 3D ML-EM algorithm was developed, using the Siddon algorithm as projector and back-projector. Results show that accelerations of more than one order of magnitude can be achieved while keeping similar image quality. (author)
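
    For illustration, the core ML-EM update referred to above fits in a few lines; the dense system matrix below stands in for the Siddon ray-tracing projector, which in practice is evaluated on the fly on the GPU (a numpy sketch, not the authors' CUDA code):

        import numpy as np

        def mlem(A, y, n_iter=50):
            """ML-EM reconstruction. A: (n_lors, n_voxels) system matrix
            (Siddon ray lengths in the paper); y: measured coincidence counts."""
            x = np.ones(A.shape[1])                    # uniform initial image
            sens = A.T @ np.ones(A.shape[0])           # sensitivity image A^T 1
            for _ in range(n_iter):
                proj = A @ x                           # forward projection
                ratio = np.divide(y, proj, out=np.zeros_like(proj), where=proj > 0)
                x *= (A.T @ ratio) / np.maximum(sens, 1e-12)   # back-project and update
            return x

    The forward and back projections dominate the cost and are embarrassingly parallel over lines of response and voxels, which is why GPU ports yield order-of-magnitude speedups.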

  15. Fast parallel event reconstruction

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    On-line processing of the large data volumes produced in modern HEP experiments requires using the maximum capabilities of modern and future many-core CPU and GPU architectures. One such powerful feature is the SIMD instruction set, which allows several data items to be packed into one register and operated on simultaneously, thus achieving more operations per clock cycle. Motivated by the idea of using the SIMD unit of modern processors, the Kalman filter (KF) based track fit has been adapted for parallelism, including memory optimization, numerical analysis, vectorization with inline operator overloading, and optimization using SDKs. The speed of the algorithm has been increased by a factor of 120000, to 0.1 ms/track, running in parallel on 16 SPEs of a Cell Blade computer. Running on a Nehalem CPU with 8 cores it shows a processing speed of 52 ns/track using the Intel Threading Building Blocks. The same KF algorithm running on an Nvidia GTX 280 in the CUDA framework provi...
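
    The SIMD-friendly trick described here is to store many tracks in structure-of-arrays form so one vector instruction updates them all. A hedged numpy illustration (a vectorized straight-line fit across tracks, not the actual Kalman filter code):

        import numpy as np

        # structure-of-arrays: hits for n_tracks tracks at shared detector planes z
        n_tracks, n_planes = 10_000, 8
        z = np.linspace(0.0, 1.0, n_planes)               # detector plane positions
        hits = np.random.default_rng(0).normal(size=(n_tracks, n_planes))

        # least-squares line fit y = a + b*z for ALL tracks at once;
        # each numpy operation maps onto SIMD lanes, i.e. many scalar fits in parallel
        zc = z - z.mean()
        b = (hits * zc).sum(axis=1) / (zc ** 2).sum()     # one slope per track
        a = hits.mean(axis=1) - b * z.mean()              # one intercept per track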

  16. Towards a Process Algebra for Shared Processors

    DEFF Research Database (Denmark)

    Buchholtz, Mikael; Andersen, Jacob; Løvengreen, Hans Henrik

    2002-01-01

    We present initial work on a timed process algebra that models sharing of processor resources allowing preemption at arbitrary points in time. This enables us to model both the functional and the timely behaviour of concurrent processes executed on a single processor. We give a refinement relation...

  17. The communication processor of TUMULT-64

    NARCIS (Netherlands)

    Smit, Gerardus Johannes Maria; Jansen, P.G.

    1988-01-01

    Tumult (Twente University MULTi-processor system) is a modular extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type,

  18. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    OpenAIRE

    Abdul Kareem PARCHUR; Ram Asaray SINGH

    2012-01-01

    High performance is a critical requirement for all microprocessor manufacturers. The present paper compares the performance of two main Intel Xeon processor series (Type A: Intel Xeon X5260, X5460, E5450 and L5320; and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is based on a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...

  19. The Glasgow Parallel Reduction Machine: Programming Shared-memory Many-core Systems using Parallel Task Composition

    Directory of Open Access Journals (Sweden)

    Ashkan Tousimojarad

    2013-12-01

    Full Text Available We present the Glasgow Parallel Reduction Machine (GPRM, a novel, flexible framework for parallel task-composition based many-core programming. We allow the programmer to structure programs into task code, written as C++ classes, and communication code, written in a restricted subset of C++ with functional semantics and parallel evaluation. In this paper we discuss the GPRM, the virtual machine framework that enables the parallel task composition approach. We focus the discussion on GPIR, the functional language used as the intermediate representation of the bytecode running on the GPRM. Using examples in this language we show the flexibility and power of our task composition framework. We demonstrate the potential using an implementation of a merge sort algorithm on a 64-core Tilera processor, as well as on a conventional Intel quad-core processor and an AMD 48-core processor system. We also compare our framework with OpenMP tasks in a parallel pointer chasing algorithm running on the Tilera processor. Our results show that the GPRM programs outperform the corresponding OpenMP codes on all test platforms, and can greatly facilitate writing of parallel programs, in particular non-data parallel algorithms such as reductions.
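
    GPRM's task-composition API is C++; purely as an illustration of the composition idea the merge sort demonstration relies on (independent sort tasks evaluated in parallel, then merged functionally), here is a hedged Python sketch using a process pool:

        import heapq
        from multiprocessing import Pool

        def parallel_merge_sort(data, n_tasks=4):
            """Split into independent sort tasks, run them in parallel,
            then merge the sorted runs (the 'composition' step)."""
            step = -(-len(data) // n_tasks)                  # ceiling division
            chunks = [data[i:i + step] for i in range(0, len(data), step)]
            with Pool(len(chunks)) as pool:
                runs = pool.map(sorted, chunks)              # parallel subtasks
            return list(heapq.merge(*runs))                  # functional combination

        if __name__ == "__main__":                           # required by multiprocessing
            print(parallel_merge_sort([5, 3, 8, 1, 9, 2, 7, 4]))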

  20. A fast filter processor as a part of the trigger logic in an elastic scattering experiment

    International Nuclear Information System (INIS)

    Kenyon Gjerpe, I.

    1981-01-01

    A fast special-purpose processor used as part of the trigger logic in an elastic scattering experiment is described. The decision to incorporate such a processor was taken because the trigger rate was estimated to be an order of magnitude higher than the data-taking capability of the on-line minicomputer, a NORD 10. The processor is capable of checking the coplanarity and the opening angle of the two outgoing tracks within about 100 μs. This is done with a spatial resolution of 1 mm, using two points on each track given by 3 MWPCs. For comparison, this is two orders of magnitude faster than the same algorithm coded in assembly language on a PDP 11/40. The main contribution to this increased speed is the extensive use of pipelining and parallelism. When running with the processor in the trigger, 75% more elastic events per incoming beam particle were collected, and 3 times as many elastic events per trigger were recorded on tape for further in-depth analysis than previously. Due to major improvements in the primary trigger logic, this was less than the gain initially anticipated. A first version of the processor was designed and constructed in the CERN DD division by J. Joosten, M. Letheren and B. Martin under the supervision of C. Verkerk. The author was involved in the final design, construction and testing, and subsequently was responsible for the integration, programming and running of the processor in the experiment. (orig.)
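
    The two cuts the processor evaluates are simple geometry; a numpy sketch under assumed conventions (beam along z, track directions taken from the MWPC points; the tolerances are hypothetical, not the experiment's values):

        import numpy as np

        def is_elastic_candidate(track1_pts, track2_pts, dphi_tol=0.02, open_min=0.5):
            """track*_pts: (3, 3) arrays of MWPC points along each outgoing track."""
            d1 = track1_pts[-1] - track1_pts[0]            # direction of track 1
            d2 = track2_pts[-1] - track2_pts[0]            # direction of track 2
            d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
            # coplanarity with the beam (z axis): azimuthal angles back-to-back
            dphi = np.arctan2(d1[1], d1[0]) - np.arctan2(d2[1], d2[0])
            coplanar = abs(abs(dphi) - np.pi) < dphi_tol
            opening = np.arccos(np.clip(d1 @ d2, -1.0, 1.0))
            return coplanar and opening > open_min

    In hardware, the same arithmetic is pipelined so that several events are in flight through the coplanarity and opening-angle stages at once.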

  1. Development of a highly reliable CRT processor

    International Nuclear Information System (INIS)

    Shimizu, Tomoya; Saiki, Akira; Hirai, Kenji; Jota, Masayoshi; Fujii, Mikiya

    1996-01-01

    Although CRT processors have been employed by the main control board to reduce the operator's workload during monitoring, the control systems are still operated by hardware switches. For further advancement, direct controller operation through a display device is expected. A CRT processor providing direct controller operation must be as reliable as the hardware switches are. The authors are developing a new type of highly reliable CRT processor that enables direct controller operations. In this paper, we discuss the design principles behind a highly reliable CRT processor. The principles are defined by studies of software reliability and of the functional reliability of the monitoring and operation systems. The functional configuration of an advanced CRT processor is also addressed. (author)

  2. Computer Generated Inputs for NMIS Processor Verification

    International Nuclear Information System (INIS)

    J. A. Mullens; J. E. Breeding; J. A. McEvers; R. W. Wysor; L. G. Chiang; J. R. Lenarduzzi; J. T. Mihalczo; J. K. Mattingly

    2001-01-01

    Proper operation of the Nuclear Materials Identification System (NMIS) processor can be verified using computer-generated inputs [BIST (Built-In-Self-Test)] at the digital inputs. Preselected sequences of input pulses to all channels, with known correlation functions, are compared to the output of the processor. These types of verification have been utilized in NMIS-type correlation processors at the Oak Ridge National Laboratory since 1984. The use of this test confirmed a malfunction in an NMIS processor at the All-Russian Scientific Research Institute of Experimental Physics (VNIIEF) in 1998. The NMIS processor boards were returned to the U.S. for repair and subsequently used in NMIS passive and active measurements with Pu at VNIIEF in 1999.

  3. Monolithic JFET preamplifier for ionization chamber calorimeter

    International Nuclear Information System (INIS)

    Radeka, V.; Rescia, S.; Manfredi, P.F.; Speziali, V.

    1990-10-01

    A monolithic charge-sensitive preamplifier using exclusively n-channel diffused JFETs has been designed and is now being fabricated by INTERFET Corp. by means of a dielectrically isolated process which preserves as much as possible of the technology upon which discrete JFETs are based. A first prototype built by means of a junction-isolated process has been delivered. The characteristics of monolithically integrated JFETs compare favorably with discrete devices. First results of tests of a preamplifier which uses these devices are reported. 4 refs

  4. Increased thermal conductivity monolithic zeolite structures

    Science.gov (United States)

    Klett, James; Klett, Lynn; Kaufman, Jonathan

    2008-11-25

    A monolith comprises a zeolite, a thermally conductive carbon, and a binder. The zeolite is included in the form of beads, pellets, powders, and mixtures thereof. The thermally conductive carbon can be carbon nano-fibers, diamond, or graphite, which provide thermal conductivities from in excess of about 100 W/mK to more than 1,000 W/mK. A method of preparing a zeolite monolith is given, which includes the steps of mixing a zeolite dispersion in an aqueous colloidal silica binder with a dispersion of carbon nano-fibers in water, followed by dehydration and curing of the binder.

  5. Technology development for SOI monolithic pixel detectors

    International Nuclear Information System (INIS)

    Marczewski, J.; Domanski, K.; Grabiec, P.; Grodner, M.; Jaroszewicz, B.; Kociubinski, A.; Kucharski, K.; Tomaszewski, D.; Caccia, M.; Kucewicz, W.; Niemiec, H.

    2006-01-01

    A monolithic detector of ionizing radiation has been manufactured using silicon-on-insulator (SOI) wafers with a high-resistivity substrate. In this paper the integration of a standard 3 μm CMOS technology, originally designed for bulk devices, with the fabrication of pixels in the bottom wafer of an SOI substrate is described. Both technological sequences have been merged, minimizing the thermal budget and providing suitable properties of all the technological layers. The achieved performance proves that a fully depleted monolithic active pixel matrix might be a viable option for a wide spectrum of future applications

  6. Solving Large Quadratic Assignment Problems in Parallel

    DEFF Research Database (Denmark)

    Clausen, Jens; Perregaard, Michael

    1997-01-01

    ... processors, and have hence not been ideally suited for computations essentially involving non-vectorizable computations on integers. In this paper we investigate the combination of one of the best bound functions for a Branch-and-Bound algorithm (the Gilmore-Lawler bound) and various testing, variable binding ... and recalculation of bounds between branchings when used in a parallel Branch-and-Bound algorithm. The algorithm has been implemented on a 16-processor MEIKO Computing Surface with Intel i860 processors. Computational results from the solution of a number of large QAPs, including the classical Nugent 20 ...
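
    For reference, the Gilmore-Lawler bound mentioned above can be sketched compactly: each cost entry pairs the sorted flow row with the reverse-sorted distance row (the minimal scalar product), and a linear assignment over those costs yields the bound. A hedged sketch for the symmetric Koopmans-Beckmann form, using scipy (not the authors' implementation):

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def gilmore_lawler_bound(F, D):
            """Lower bound for the QAP min_p sum_ik F[i,k] * D[p(i),p(k)]."""
            n = F.shape[0]
            C = np.empty((n, n))
            for i in range(n):
                f = np.sort(np.delete(F[i], i))               # off-diagonal row, ascending
                for j in range(n):
                    d = np.sort(np.delete(D[j], j))[::-1]     # off-diagonal row, descending
                    C[i, j] = F[i, i] * D[j, j] + f @ d       # minimal scalar product
            rows, cols = linear_sum_assignment(C)             # optimal assignment on C
            return C[rows, cols].sum()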

  7. Image matrix processor for fast multi-dimensional computations

    Science.gov (United States)

    Roberson, George P.; Skeate, Michael F.

    1996-01-01

    An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image from respective two-dimensional data sets. A high-speed, parallel-access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two-dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading the resulting data into the target cache. The local host processor operates to load the source cache with a two-dimensional data set, load the coefficient table, and transfer resulting data out of the target cache to the storage system, or to another destination.

  8. Point and track-finding processors for multiwire chambers

    CERN Document Server

    Hansroul, M

    1973-01-01

    The hardware processors described below are designed to be used in conjunction with multi-wire chambers. They have the characteristic of being based on computational methods, in contrast to analogue procedures. In a sense, they are hardware implementations of computer programs. But, being specially designed for their purpose, they are free of the restrictions imposed by the architecture of the computer on which the equivalent program would run. The parallelism inherent in the algorithms can thus be fully exploited. Combined with the use of fast-access scratch-pad memories and the non-sequential nature of the control program, the parallelism accounts for the fact that these processors are expected to execute 2-3 orders of magnitude faster than the equivalent Fortran programs on a CDC 7600 or 6600. As a consequence, methods which are simple and straightforward, but which are impractical because they require an exorbitant amount of computer time, can on the contrary be very attractive for hardware implementation. ...

  9. Parallel-Processing Test Bed For Simulation Software

    Science.gov (United States)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is a multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  10. Construction of a sputtering reactor for the coating and processing of monolithic U-Mo nuclear fuel

    International Nuclear Information System (INIS)

    Schmid, Wolfgang

    2011-01-01

    In the presented thesis, sputter deposition was used for the first time to coat monolithic U-Mo nuclear fuel foils with diffusion-inhibiting materials. The intention of these coatings is to prevent the formation of an interdiffusion layer between the U-Mo and the Al cladding during use of the fuel. A small sputtering reactor was built, in which the method was tested and processing parameters were investigated. In parallel, a larger sputtering reactor was constructed that allows full-size monolithic U-Mo nuclear fuel foils to be coated and was used to test an industrial application of the technique. As a result, a method based on sputter deposition and erosion is presented that allows the surface of monolithic U-Mo nuclear fuel foils to be both cleaned and coated in excellent quality. It can be inserted at any point into the manufacturing chain for U-Mo fuel elements, which is currently being developed.

  11. A class of parallel algorithms for computation of the manipulator inertia matrix

    Science.gov (United States)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on the composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms, with topological variation, onto a two-dimensional processor array with nearest-neighbor connections, and, with cardinality variation, onto a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, with significantly higher efficiency.

  12. Parallel discrete event simulation using shared memory

    Science.gov (United States)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1988-01-01

    With traditional event-list techniques, evaluating a detailed discrete-event simulation model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared-memory experiments, using the Chandy-Misra distributed-simulation algorithm to simulate networks of queues, is presented. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
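
    A toy, sequential emulation of the Chandy-Misra safety rule for a single merge node with two input channels (the null messages stand in for the lookahead mechanism; the topology and rates are illustrative, not the paper's experiments):

        from collections import deque

        # two upstream LPs feed one merge LP; each channel is a FIFO of (time, kind)
        chan = [deque((t, "job") for t in range(0, 100, 3)),
                deque((t, "job") for t in range(0, 100, 5))]
        for q in chan:
            q.append((100, "null"))         # final null message advances the channel clock

        processed = []
        # Chandy-Misra rule: an event is safe only while every input channel is
        # non-empty, so the head with the minimum timestamp cannot be overtaken
        while all(chan):
            k = min(range(2), key=lambda i: chan[i][0][0])
            t, kind = chan[k].popleft()
            if kind == "job":
                processed.append((t, k))    # causally safe, in timestamp order
        assert processed == sorted(processed)

    The synchronization cost visible even in this toy (waiting on the slower channel, null-message traffic) is the overhead that made the approach a questionable alternative in the paper's queueing experiments.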

  13. Programmable level-1 trigger with 3D-Flow processor array

    International Nuclear Information System (INIS)

    Crosetto, D.

    1994-01-01

    The 3D-Flow parallel processing system is a new concept in processor architecture, system architecture, and assembly architecture. Compared to the electronics used in present systems, this approach reduces the cost and complexity of the hardware and allows easy assembly, disassembly, incremental upgrading, and maintenance of different interconnection topologies. The 3D-Flow parallel-processing system benefits high energy physics (HEP) by allowing: (1) common, less costly hardware to be used in different experiments; (2) new uses of existing installations; (3) tuning of the trigger based on the first analyzed data; and (4) selection of desired events directly from raw data. The goal of this parallel-processing architecture is to acquire multiple data in parallel (up to 100 million frames per second) and to process them at high speed, accomplishing digital filtering on the input data, pattern recognition (particle identification), data moving, and data formatting. The main features of the system are its programmability, scalability, high-speed communication, and low cost. The compactness of the 3D-Flow parallel-processing system, in concert with the processor architecture, allows processor interconnections to be mapped onto the geometry of sensors (detectors in HEP) without large interconnection signal delay, enabling real-time pattern recognition. The overall 3D-Flow project has passed a major design review at Fermilab (reviewers included experts in computers, triggering, system assembly, and electronics)

  14. Performance evaluation for compressible flow calculations on five parallel computers of different architectures

    International Nuclear Information System (INIS)

    Kimura, Toshiya.

    1997-03-01

    A two-dimensional explicit Euler solver has been implemented on five MIMD parallel computers of different machine architectures at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These parallel computers are the Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized by several parallelization methods, and a typical compressible flow problem was calculated for different grid sizes, varying the number of processors. The effective performance for parallel calculations, such as calculation speed, speed-up ratio, and parallel efficiency, has been investigated and evaluated. The communication time among processors has also been measured and evaluated. As a result, the differences in performance and characteristics between vector-parallel and scalar-parallel computers can be pointed out, and the results provide basic data for efficient use of parallel computers and for large-scale CFD simulations on parallel computers. (author)

  15. Performance studies of the parallel VIM code

    International Nuclear Information System (INIS)

    Shi, B.; Blomquist, R.N.

    1996-01-01

    In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects on performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large-scale continuous-energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and the p4 message-passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 μs latency and 35 MB/s bandwidth. For uniform and reproducible performance, the authors used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by others' jobs
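
    The master/worker history partitioning described above follows a standard message-passing pattern; a hedged mpi4py sketch (the chunk count, the "simulate" stand-in, and the tally format are illustrative, not VIM's):

        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        N_CHUNKS, STOP = 100, -1

        if rank == 0:                                # master: deal out history chunks
            status, next_chunk, active = MPI.Status(), 0, comm.Get_size() - 1
            while active:
                tally = comm.recv(source=MPI.ANY_SOURCE, status=status)
                worker = status.Get_source()
                # ... accumulate fission sites / tallies from 'tally' here ...
                if next_chunk < N_CHUNKS:
                    comm.send(next_chunk, dest=worker)
                    next_chunk += 1
                else:
                    comm.send(STOP, dest=worker)
                    active -= 1
        else:                                        # worker: request, simulate, report
            result = None                            # the first recv doubles as "ready"
            while True:
                comm.send(result, dest=0)
                chunk = comm.recv(source=0)
                if chunk == STOP:
                    break
                result = sum(range(chunk, chunk + 10))   # stand-in for tracking histories

    Because idle workers pull the next chunk as soon as they finish, variations in history length and processor speed are absorbed automatically, exactly the dynamic load balancing the abstract describes.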

  16. Simulation of Particulate Flows on Multi-Processor Machines with Distributed Memory

    International Nuclear Information System (INIS)

    Uhlmann, M.

    2004-01-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized and relies upon a parallel tri-diagonal solver; the Poisson problem is solved by means of the parallel multi-grid package MUDPACK. For the solid phase we employ a master-slaves technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub-domain. The parallel efficiency for some preliminary computations is presented. (Author) 9 refs

  17. Upside to downsizing : Acceleware's graphic processor technology propels seismic data processing revolution

    Energy Technology Data Exchange (ETDEWEB)

    Smith, M.

    2009-11-15

    Acceleware has developed graphics processing unit (GPU) technology that is transforming the petroleum industry. The benefits of the technology are its small footprint, low wattage, and high speed. The software brings supercomputing speed to the desktop by leveraging the massive parallel processing capacity of the very latest in GPU technology. This article discussed the GPU technology and its emergence as a powerful supercomputing tool. Acceleware's partnering with California-based NVIDIA was also outlined. The advantages of the technology were also discussed, including its smaller footprint. Acceleware's hardware takes up a fraction of the space and uses up to 70 per cent less power than a traditional central processing unit. By combining Acceleware's core knowledge in making complex algorithms run in parallel with an in-house team of seismic industry experts, the company provides software solutions for seismic data processors that access the massively parallel processing capabilities of GPUs. 1 fig.

  18. An “artificial retina” processor for track reconstruction at the full LHC crossing rate

    International Nuclear Information System (INIS)

    Abba, A.; Bedeschi, F.; Caponio, F.; Cenci, R.; Citterio, M.; Cusimano, A.; Fu, J.; Geraci, A.; Grizzuti, M.; Lusardi, N.; Marino, P.; Morello, M.J.; Neri, N.; Ninci, D.; Petruzzo, M.; Piucci, A.; Punzi, G.; Ristori, L.; Spinella, F.

    2016-01-01

    We present the latest results of an R&D study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired by the quick detection of edges in the mammalian visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond latency, we are implementing a prototype using common high-bandwidth FPGA devices.

  19. An “artificial retina” processor for track reconstruction at the full LHC crossing rate

    Energy Technology Data Exchange (ETDEWEB)

    Abba, A. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Politecnico di Milano, Milano (Italy); Bedeschi, F. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Caponio, F. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Politecnico di Milano, Milano (Italy); Cenci, R., E-mail: riccardo.cenci@pi.infn.it [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Scuola Normale Superiore, Pisa (Italy); Citterio, M. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Cusimano, A. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Politecnico di Milano, Milano (Italy); Fu, J. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Geraci, A.; Grizzuti, M.; Lusardi, N. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Politecnico di Milano, Milano (Italy); Marino, P.; Morello, M.J. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Scuola Normale Superiore, Pisa (Italy); Neri, N. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Ninci, D. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Università di Pisa, Pisa (Italy); Petruzzo, M. [Istituto Nazionale di Fisica Nucleare – Sez. di Milano, Milano (Italy); Università di Milano, Milano (Italy); Piucci, A.; Punzi, G. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); Università di Pisa, Pisa (Italy); Ristori, L. [Fermi National Accelerator Laboratory, Batavia, IL (United States); Spinella, F. [Istituto Nazionale di Fisica Nucleare – Sez. di Pisa, Pisa (Italy); and others

    2016-07-11

    We present the latest results of an R&D study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired by the quick detection of edges in the mammalian visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond latency, we are implementing a prototype using common high-bandwidth FPGA devices.

  20. An "artificial retina" processor for track reconstruction at the full LHC crossing rate

    Science.gov (United States)

    Abba, A.; Bedeschi, F.; Caponio, F.; Cenci, R.; Citterio, M.; Cusimano, A.; Fu, J.; Geraci, A.; Grizzuti, M.; Lusardi, N.; Marino, P.; Morello, M. J.; Neri, N.; Ninci, D.; Petruzzo, M.; Piucci, A.; Punzi, G.; Ristori, L.; Spinella, F.; Stracka, S.; Tonelli, D.; Walsh, J.

    2016-07-01

    We present the latest results of an R&D study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired by the quick detection of edges in the mammalian visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond latency, we are implementing a prototype using common high-bandwidth FPGA devices.
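
    The "retina" response itself is easy to sketch: a grid of track hypotheses, each accumulating a Gaussian-weighted vote from every hit, with tracks appearing as local maxima. A hedged numpy illustration for 2D straight tracks (the parametrisation and the width σ are assumptions, not the collaboration's implementation):

        import numpy as np

        def retina_response(hits, m_grid, q_grid, sigma=0.05):
            """hits: (n, 2) array of (x, y); returns a grid of 'neural' responses
            for straight-track hypotheses y = m*x + q."""
            m = m_grid[:, None, None]                     # shape (Nm, 1, 1)
            q = q_grid[None, :, None]                     # shape (1, Nq, 1)
            x, y = hits[:, 0], hits[:, 1]                 # shape (n,)
            resid = y - (m * x + q)                       # (Nm, Nq, n) residuals
            return np.exp(-resid**2 / (2 * sigma**2)).sum(axis=-1)

        # cells whose response exceeds their neighbours' flag track candidates
        R = retina_response(np.random.rand(200, 2),
                            np.linspace(-1, 1, 64), np.linspace(0, 1, 64))

    Every cell's accumulation is independent, which is what makes the algorithm map so naturally onto massively parallel FPGA logic.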

  1. An 'artificial retina' processor for track reconstruction at the full LHC crossing rate

    OpenAIRE

    Abba, A; Bedeschi, F; Caponio, F; Cenci, R; Citterio, M; Cusimano, A; Fu, J; Geraci, A; Grizzuti, M; Lusardi, N; Marino, P; Morello, M J; Neri, N; Ninci, D; Petruzzo, M

    2016-01-01

    We present the latest results of an R&D study for a specialized processor capable of reconstructing, in a silicon pixel detector, high-quality tracks from high-energy collision events at 40 MHz. The processor applies a highly parallel pattern-recognition algorithm inspired by the quick detection of edges in the mammalian visual cortex. After a detailed study of a real-detector application, demonstrating that online reconstruction of offline-quality tracks is feasible at 40 MHz with sub-microsecond lat...

  2. Analytical Bounds on the Threads in IXP1200 Network Processor

    OpenAIRE

    Ramakrishna, STGS; Jamadagni, HS

    2003-01-01

    Increasing link speeds have placed an enormous burden on processing requirements, and the processors are expected to carry out a variety of tasks. Network Processor (NP) [1] [2] is the blanket name given to processors that trade between flexibility and performance. Network Processors are offered by a number of vendors to take over the main burden of network-related processing from conventional processors. The Network Processors cover a spectrum of design trad...

  3. A hybrid algorithm for parallel molecular dynamics simulations

    Science.gov (United States)

    Mangiardi, Chris M.; Meyer, R.

    2017-10-01

    This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
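
    As an illustration of the SIMD-vectorization half of the paper's scheme (not the authors' code), the inner force loop can be written so each particle's interactions are evaluated as one vector operation; the Lennard-Jones form, cutoff, and periodic box here are assumptions:

        import numpy as np

        def lj_forces(pos, cutoff=2.5, box=10.0):
            """Naive O(N^2) Lennard-Jones forces with a vectorized, SIMD-friendly
            inner loop over each particle's candidate neighbours."""
            n = len(pos)
            forces = np.zeros_like(pos)
            for i in range(n):                       # outer loop: one particle at a time
                d = pos - pos[i]
                d -= box * np.round(d / box)         # minimum-image convention
                r2 = (d * d).sum(axis=1)
                mask = (r2 < cutoff**2) & (r2 > 0)
                r2m, dm = r2[mask], d[mask]
                inv6 = 1.0 / r2m**3
                # 12-6 pair force factor, applied to all neighbours in one vector op
                f = (24 * inv6 * (2 * inv6 - 1) / r2m)[:, None] * dm
                forces[i] -= f.sum(axis=0)
            return forces

    A production code would restrict the candidate neighbours with cell lists and split the box across domains and threads, as the paper describes, but the vectorized inner kernel has the same shape.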

  4. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    Science.gov (United States)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  5. Effect of processor temperature on film dosimetry

    International Nuclear Information System (INIS)

    Srivastava, Shiv P.; Das, Indra J.

    2012-01-01

    Optical density (OD) of a radiographic film plays an important role in radiation dosimetry, which depends on various parameters, including beam energy, depth, field size, film batch, dose, dose rate, air film interface, postexposure processing time, and temperature of the processor. Most of these parameters have been studied for Kodak XV and extended dose range (EDR) films used in radiation oncology. There is very limited information on processor temperature, which is investigated in this study. Multiple XV and EDR films were exposed in the reference condition (dmax, 10 × 10 cm², 100 cm) to a given dose. An automatic film processor (X-Omat 5000) was used for processing films. The temperature of the processor was adjusted manually with increasing temperature. At each temperature, a set of films was processed to evaluate OD at a given dose. For both films, OD is a linear function of processor temperature in the range of 29.4–40.6°C (85–105°F) for various dose ranges. The changes in processor temperature are directly related to the dose by a quadratic function. A simple linear equation is provided for the changes in OD vs. processor temperature, which could be used for correcting dose in radiation dosimetry when film is used.

  6. Optical Associative Processors For Visual Perception"

    Science.gov (United States)

    Casasent, David; Telfer, Brian

    1988-05-01

    We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required, and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both types of associative processor and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature-space data), and the analysis of multiple objects in high noise (achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature-space and symbolic data, as well as adaptive associative processors.

  7. FLUIDIZED BED STEAM REFORMER MONOLITH FORMATION

    International Nuclear Information System (INIS)

    Jantzen, C

    2006-01-01

    Fluidized Bed Steam Reforming (FBSR) is being considered as an alternative technology for the immobilization of a wide variety of aqueous high-sodium radioactive wastes at various DOE facilities in the United States. The addition of clay, charcoal, and a catalyst as co-reactants converts aqueous Low Activity Wastes (LAW) to a granular or 'mineralized' waste form, while converting organic components to CO2 and steam, and nitrate/nitrite components, if any, to N2. The waste form produced is a multiphase mineral assemblage of Na-Al-Si (NAS) feldspathoid minerals with cage-like structures that atomically bond radionuclides like Tc-99 and anions such as SO4, I, F, and Cl. The granular product has been shown to be as durable as LAW glass. Shallow land burial requires that the mineralized waste form be able to sustain the weight of soil overburden and potential intrusion by future generations. The strength requirement necessitates binding the granular product into a monolith. FBSR mineral products were formulated into a variety of monoliths, including various cements, Ceramicrete, and hydroceramics. All but one of the nine monoliths tested met the durability specification for Na and Re (simulant for Tc-99) when tested using the Product Consistency Test (PCT; ASTM C1285). Of the nine monoliths tested, the cements produced with 80-87 wt% FBSR product, the Ceramicrete, and the hydroceramic produced with 83.3 wt% FBSR product met the compressive strength and durability requirements for an LAW waste form

  8. Package Holds Five Monolithic Microwave Integrated Circuits

    Science.gov (United States)

    Mysoor, Narayan R.; Decker, D. Richard; Olson, Hilding M.

    1996-01-01

    Packages protect and hold monolithic microwave integrated circuit (MMIC) chips while providing dc and radio-frequency (RF) electrical connections for chips undergoing development. Required to be compact, lightweight, and rugged. Designed to minimize undesired resonances, reflections, losses, and impedance mismatches.

  9. An Algorithm for Parallel Sn Sweeps on Unstructured Meshes

    International Nuclear Information System (INIS)

    Pautz, Shawn D.

    2002-01-01

    A new algorithm for performing parallel Sn sweeps on unstructured meshes is developed. The algorithm uses a low-complexity list ordering heuristic to determine a sweep ordering on any partitioned mesh. For typical problems and with 'normal' mesh partitionings, nearly linear speedups on up to 126 processors are observed. This is an important and desirable result, since although analyses of structured meshes indicate that parallel sweeps will not scale with normal partitioning approaches, no severe asymptotic degradation in the parallel efficiency is observed with modest (≤100) levels of parallelism. This result is a fundamental step in the development of efficient parallel Sn methods
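
    The essence of any sweep ordering is a topological sort of the cell-to-cell dependence graph induced by the sweep direction; cells in the same level form a wavefront that can be processed in parallel. A minimal sketch on a structured grid (the paper's heuristic is more general, handling unstructured, partitioned meshes):

        from collections import deque

        def sweep_levels(deps):
            """deps: dict cell -> list of downstream cells. Returns wavefront levels
            via Kahn's algorithm; the cells within a level are mutually independent."""
            indeg = {c: 0 for c in deps}
            for outs in deps.values():
                for c in outs:
                    indeg[c] += 1
            frontier = deque(c for c, d in indeg.items() if d == 0)
            levels = []
            while frontier:
                levels.append(list(frontier))
                nxt = deque()
                for c in frontier:
                    for m in deps[c]:
                        indeg[m] -= 1
                        if indeg[m] == 0:
                            nxt.append(m)
                frontier = nxt
            return levels

        # 2x2 grid, sweeping towards +x,+y: cell (i,j) feeds (i+1,j) and (i,j+1)
        grid = {(i, j): [(i + 1, j), (i, j + 1)] for i in range(2) for j in range(2)}
        grid = {c: [m for m in outs if m in grid] for c, outs in grid.items()}
        print(sweep_levels(grid))   # [[(0, 0)], [(1, 0), (0, 1)], [(1, 1)]]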

  10. SPRINT: A new parallel framework for R

    Directory of Open Access Journals (Sweden)

    Scharinger Florian

    2008-12-01

    Full Text Available Abstract
    Background: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis, and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest.
    Results: We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors, this computation completes in less than a third of the time R takes on one processor.
    Conclusion: SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use and with further development will become more useful as more
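
    The pairwise-correlation example translates directly into any multi-process setting; a hedged Python sketch of the same idea (SPRINT itself is an R/MPI wrapper, so the names here are illustrative):

        import numpy as np
        from multiprocessing import Pool

        def _init(z):                        # give every worker the standardized matrix
            global Z
            Z = z

        def _corr_block(rows):
            return Z[rows] @ Z.T             # correlations of a block of genes vs all

        def parallel_corr(X, n_procs=8):
            """Row-wise Pearson correlation matrix of X, split across processes."""
            z = X - X.mean(axis=1, keepdims=True)
            z /= np.linalg.norm(z, axis=1, keepdims=True)
            blocks = np.array_split(np.arange(X.shape[0]), n_procs)
            with Pool(n_procs, initializer=_init, initargs=(z,)) as pool:
                return np.vstack(pool.map(_corr_block, blocks))

        if __name__ == "__main__":
            X = np.random.rand(2000, 50)     # genes x samples
            C = parallel_corr(X)

    Standardizing the rows first means each block's dot products are already Pearson correlations, so workers need no coordination beyond the final vstack.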

  11. Advanced parallel processing with supercomputer architectures

    International Nuclear Information System (INIS)

    Hwang, K.

    1987-01-01

    This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers

  12. Parallel plasma fluid turbulence calculations

    International Nuclear Information System (INIS)

    Leboeuf, J.N.; Carreras, B.A.; Charlton, L.A.; Drake, J.B.; Lynch, V.E.; Newman, D.E.; Sidikman, K.L.; Spong, D.A.

    1994-01-01

    The study of plasma turbulence and transport is a complex problem of critical importance for fusion-relevant plasmas. To this day, the fluid treatment of plasma dynamics is the best approach to realistic physics at the high resolution required for certain experimentally relevant calculations. Core and edge turbulence in a magnetic fusion device have been modeled using state-of-the-art, nonlinear, three-dimensional, initial-value fluid and gyrofluid codes. Parallel implementation of these models on diverse platforms--vector parallel (National Energy Research Supercomputer Center's CRAY Y-MP C90), massively parallel (Intel Paragon XP/S 35), and serial parallel (clusters of high-performance workstations using the Parallel Virtual Machine protocol)--offers a variety of paths to high resolution and significant improvements in real-time efficiency, each with its own advantages. The largest and most efficient calculations have been performed at the 200 Mword memory limit on the C90 in dedicated mode, where an overlap of 12 to 13 out of a maximum of 16 processors has been achieved with a gyrofluid model of core fluctuations. The richness of the physics captured by these calculations is commensurate with the increased resolution and efficiency and is limited only by the ingenuity brought to the analysis of the massive amounts of data generated

  13. Monolithic blue LED series arrays for high-voltage AC operation

    Energy Technology Data Exchange (ETDEWEB)

    Ao, Jin-Ping [Satellite Venture Business Laboratory, University of Tokushima, Tokushima 770-8506 (Japan); Sato, Hisao; Mizobuchi, Takashi; Morioka, Kenji; Kawano, Shunsuke; Muramoto, Yoshihiko; Sato, Daisuke; Sakai, Shiro [Nitride Semiconductor Co. Ltd., Naruto, Tokushima 771-0360 (Japan); Lee, Young-Bae; Ohno, Yasuo [Department of Electrical and Electronic Engineering, University of Tokushima, Tokushima 770-8506 (Japan)

    2002-12-16

    The design and fabrication of monolithic blue LED series arrays that can be operated at high AC voltage are described. Several LEDs (3, 7, or 20) are connected in series and in parallel to suit AC operation. The chip size of a single device is 150 μm × 120 μm, and the total size is 1.1 mm × 1 mm for a 40 (20+20) LED array. Deep dry etching was performed for device isolation. Two-layer interconnection and air bridges are utilized to connect the devices in an array. The monolithic series arrays exhibit the expected operation under DC and AC bias. The output power and forward voltage are almost proportional to the number of LEDs connected in series. On-wafer measurement shows that the output power is 40 mW for the 40 (20+20) LED array under 72 V AC. (Abstract Copyright [2002], Wiley Periodicals, Inc.)

  14. Study and simulation of a parallel numerical processing machine

    International Nuclear Information System (INIS)

    Bel Hadj, Slaheddine

    1981-12-01

    This study has been carried out with a view to implementing the NEPTUNIX package (software for the resolution of very large algebraic-differential equation systems) on a minicomputer. Aiming at increasing system performance, previous research has shown the necessity of reducing the execution time of certain frequently used numerical computation tasks, and has demonstrated the feasibility of handling these tasks with efficient algorithms of parallel type. The present work deals with the study and simulation of a parallel-architecture processor adapted to the fast execution of these algorithms. A minicomputer fitted with a connection to such a parallel processor has greatly extended computing power. The architecture of a parallel numerical processor, based on the use of VLSI microprocessors and co-processors, is then described; its design aims at the best cost/performance ratio. The last part deals with the simulation of the processor with the 'CHAMBOR' program. Results show a speedup factor of 30 in comparison with execution on a MITRA 15 minicomputer. Moreover, the importance of conflicts, mainly in access to a shared resource, is evaluated. Although this implementation has been designed with a dedicated application in mind, other uses could be envisaged, particularly for the simulation of nuclear reactors: operator guidance systems, behavioural studies under accident conditions, etc. (author)

  15. Methacrylate monolithic columns functionalized with epinephrine for capillary electrochromatography applications.

    Science.gov (United States)

    Carrasco-Correa, Enrique Javier; Ramis-Ramos, Guillermo; Herrero-Martínez, José Manuel

    2013-07-12

    Epinephrine-bonded polymeric monoliths for capillary electrochromatography (CEC) were developed by nucleophilic substitution reaction of epoxide groups of poly(glycidyl-methacrylate-co-ethylenedimethacrylate) (poly(GMA-co-EDMA)) monoliths using epinephrine as nucleophilic reagent. The ring opening reaction under dynamic conditions was optimized. Successful chemical modification of the monolith surface was ascertained by in situ Raman spectroscopy characterization. In addition, the amount of epinephrine groups that was bound to the monolith surface was evaluated by oxidation of the catechol groups with Ce(IV), followed by spectrophotometric measurement of unreacted Ce(IV). About 9% of all theoretical epoxide groups of the parent monolith were bonded to epinephrine. The chromatographic behavior of the epinephrine-bonded monolith in CEC conditions was assessed with test mixtures of alkyl benzenes, aniline derivatives and substituted phenols. In comparison to the poly(GMA-co-EDMA) monoliths, the epinephrine-bonded monoliths exhibited a much higher retention and slight differences in selectivity. The epinephrine-bonded monolith was further modified by oxidation with a Ce(IV) solution and compared with the epinephrine-bonded monoliths. The resulting monolithic stationary phases were evaluated in terms of reproducibility, giving RSD values below 9% in the parameters investigated. Copyright © 2013 Elsevier B.V. All rights reserved.

  16. A two-level parallel direct search implementation for arbitrarily sized objective functions

    Energy Technology Data Exchange (ETDEWEB)

    Hutchinson, S.A.; Shadid, N.; Moffat, H.K. [Sandia National Labs., Albuquerque, NM (United States)] [and others

    1994-12-31

    In the past, many optimization schemes for massively parallel computers have attempted to achieve parallel efficiency using one of two methods. In the case of large and expensive objective function calculations, the optimization itself may be run in serial and the objective function calculations parallelized. In contrast, if the objective function calculations are relatively inexpensive and can be performed on a single processor, then the actual optimization routine itself may be parallelized. In this paper, a scheme based upon the Parallel Direct Search (PDS) technique is presented which allows the objective function calculations to be done on an arbitrarily large number (p2) of processors. If p, the number of processors available, is greater than or equal to 2p2, then the optimization may be parallelized as well. This allows for efficient use of computational resources since the objective function calculations can be performed on the number of processors that allow for peak parallel efficiency and then further speedup may be achieved by parallelizing the optimization. Results are presented for an optimization problem which involves the solution of a PDE using a finite-element algorithm as part of the objective function calculation. The optimum number of processors for the finite-element calculations is less than p/2. Thus, the PDS method is also parallelized. Performance comparisons are given for an nCUBE 2 implementation.

  17. Development of Innovative Design Processor

    International Nuclear Information System (INIS)

    Park, Y.S.; Park, C.O.

    2004-01-01

    Nuclear design analysis requires time-consuming and error-prone model-input preparation, code runs, output analysis, and a quality assurance process. To reduce human effort and improve design quality and productivity, the Innovative Design Processor (IDP) is being developed. The two basic principles of IDP are document-oriented design and web-based design. Document-oriented design means that, if the designer writes a design document called an active document and feeds it to a special program, the final document with complete analysis, tables and plots is produced automatically. The active documents can be written with ordinary HTML editors or created automatically on the web, which is the other framework of IDP. Using a proper mix of server-side and client-side programming under the LAMP (Linux/Apache/MySQL/PHP) environment, the design process on the web is modeled in a design-wizard style, so that even a novice designer can produce the design document easily. This automation using the IDP is now being implemented for all reload designs of Korea Standard Nuclear Power Plant (KSNP) type PWRs. The introduction of this process will allow a large reduction in all KSNP reload design efforts and provide a platform for design and R and D tasks of KNFC. (authors)

  18. Onboard spectral imager data processor

    Science.gov (United States)

    Otten, Leonard J.; Meigs, Andrew D.; Franklin, Abraham J.; Sears, Robert D.; Robison, Mark W.; Rafert, J. Bruce; Fronterhouse, Donald C.; Grotbeck, Ronald L.

    1999-10-01

    Previous papers have described the concept behind the MightySat II.1 program, the satellite's Fourier Transform imaging spectrometer's optical design, the design for the spectral imaging payload, and its initial qualification testing. This paper discusses the on board data processing designed to reduce the amount of downloaded data by an order of magnitude and provide a demonstration of a smart spaceborne spectral imaging sensor. Two custom components, a spectral imager interface 6U VME card that moves data at over 30 MByte/sec, and four TI C-40 processors mounted to a second 6U VME and daughter card, are used to adapt the sensor to the spacecraft and provide the necessary high speed processing. A system architecture that offers both on board real time image processing and high-speed post data collection analysis of the spectral data has been developed. In addition to the on board processing of the raw data into a usable spectral data volume, one feature extraction technique has been incorporated. This algorithm operates on the basic interferometric data. The algorithm is integrated within the data compression process to search for uploadable feature descriptions.

  19. Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

    Science.gov (United States)

    Bhandarkar, S M; Chirravuri, S; Arnold, J

    1996-01-01

    Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048-processor MasPar MP-2 system, which is a SIMD 2-D toroidal mesh architecture, whereas the MIMD algorithms are implemented on an 8-processor Intel iPSC/860, which is a MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented, in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
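
    A compact sketch of the serial simulated-annealing core that both the SIMD and MIMD variants parallelize (the linkage matrix, swap move, and cooling schedule are illustrative, not the paper's problem-specific heuristics):

        import numpy as np

        def anneal_ordering(W, n_steps=20000, t0=10.0, alpha=0.9995, seed=0):
            """Minimise sum_ij W[i,j]*|pos(i)-pos(j)| (Optimal Linear Arrangement)
            over clone orderings by simulated annealing with swap moves."""
            rng = np.random.default_rng(seed)
            n = W.shape[0]
            pos = rng.permutation(n)                     # pos[i] = position of clone i
            dist = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
            cost = (W * dist[pos][:, pos]).sum() / 2
            t = t0
            for _ in range(n_steps):
                a, b = rng.integers(n, size=2)
                new_pos = pos.copy()
                new_pos[a], new_pos[b] = new_pos[b], new_pos[a]   # swap two clones
                new_cost = (W * dist[new_pos][:, new_pos]).sum() / 2
                # Metropolis acceptance: always take improvements, sometimes uphill moves
                if new_cost < cost or rng.random() < np.exp((cost - new_cost) / t):
                    pos, cost = new_pos, new_cost
                t *= alpha                                # geometric cooling
            return pos, cost

    The parallel variants in the paper differ mainly in how replicas of this loop interact: periodically exchanging states on the SIMD mesh versus running independently on the MIMD hypercube.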

  20. A data base processor semantics specification package

    Science.gov (United States)

    Fishwick, P. A.

    1983-01-01

    A Semantics Specification Package (DBPSSP) for the Intel Data Base Processor (DBP) is defined. DBPSSP serves as a collection of cross assembly tools that allow the analyst to assemble request blocks on the host computer for passage to the DBP. The assembly tools discussed in this report may be effectively used in conjunction with a DBP compatible data communications protocol to form a query processor, precompiler, or file management system for the database processor. The source modules representing the components of DBPSSP are fully commented and included.

  1. Hardware trigger processor for the MDT system

    CERN Document Server

    AUTHOR|(SzGeCERN)757787; The ATLAS collaboration; Hazen, Eric; Butler, John; Black, Kevin; Gastler, Daniel Edward; Ntekas, Konstantinos; Taffard, Anyes; Martinez Outschoorn, Verena; Ishino, Masaya; Okumura, Yasuyuki

    2017-01-01

    We are developing a low-latency hardware trigger processor for the Monitored Drift Tube system in the muon spectrometer. The processor will fit candidate muon tracks in the drift tubes in real time, significantly improving the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an associative-memory alternative implementation, an ARM (Zynq) processor-based track fitter, and a compact ATCA carrier board architecture. The ATCA architecture is designed to allow a modular, staged approach to deployment of the system and exploration of alternative technologies.

  2. Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

    CERN Document Server

    Biedermann, Alexander

    2014-01-01

    Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy-aware systems, self-healing systems, and parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by the introduction …

  3. Parallel execution of chemical software on EGEE Grid

    CERN Document Server

    Sterzel, Mariusz

    2008-01-01

    Constant interest among the chemical community in studying ever larger molecules forces the parallelization of existing computational methods in chemistry and the development of new ones. These are the main reasons for frequent port updates and requests from the community for Grid ports of new packages to satisfy their computational demands. Unfortunately, some parallelization schemes used by chemical packages cannot be directly used in the Grid environment. Here we present a solution for the Gaussian package. The current state of development of Grid middleware allows easy parallel execution for software using any MPI flavour. Unfortunately, many chemical packages do not use MPI for parallelization, so special treatment is needed. Gaussian can be executed in parallel on SMP architectures or via Linda. These require reservation of a certain number of processors/cores on a given WN and an equal number of processors/cores on each WN, respectively. The current implementation of EGEE middleware does not offer such f...

  4. Frontiers of massively parallel scientific computation

    International Nuclear Information System (INIS)

    Fischer, J.R.

    1987-07-01

    Practical applications using massively parallel computer hardware first appeared during the 1980s. Their development was motivated by the need for computing power orders of magnitude beyond that available today for tasks such as numerical simulation of complex physical and biological processes, generation of interactive visual displays, satellite image analysis, and knowledge based systems. Representative of the first generation of this new class of computers is the Massively Parallel Processor (MPP). A team of scientists was provided the opportunity to test and implement their algorithms on the MPP. The first results are presented. The research spans a broad variety of applications including Earth sciences, physics, signal and image processing, computer science, and graphics. The performance of the MPP was very good. Results obtained using the Connection Machine and the Distributed Array Processor (DAP) are presented

  5. Options for Parallelizing a Planning and Scheduling Algorithm

    Science.gov (United States)

    Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.

    2011-01-01

    Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.

  6. A High-Speed and Low-Energy-Consumption Processor for SVD-MIMO-OFDM Systems

    Directory of Open Access Journals (Sweden)

    Hiroki Iwaizumi

    2013-01-01

    A processor design for singular value decomposition (SVD) and compression/decompression of feedback matrices, which are mandatory operations for SVD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems, is proposed and evaluated. SVD-MIMO is a transmission method for suppressing multistream interference and improving communication quality by beamforming. An application-specific instruction-set processor (ASIP) architecture is adopted to achieve flexibility in terms of operations and matrix size. The proposed processor realizes a high-speed, low-power design and real-time processing through the parallelization of floating-point units (FPUs) and arithmetic instructions specialized for complex matrix operations.

  7. Photonics and Fiber Optics Processor Lab

    Data.gov (United States)

    Federal Laboratory Consortium — The Photonics and Fiber Optics Processor Lab develops, tests and evaluates high speed fiber optic network components as well as network protocols. In addition, this...

  8. Aspects of parallel processing and control engineering

    OpenAIRE

    McKittrick, Brendan J

    1991-01-01

    The concept of parallel processing is not a new one, but the application of it to control engineering tasks is a relatively recent development, made possible by contemporary hardware and software innovation. It has long been accepted that, if properly orchestrated, several processors/CPUs combined can form a powerful processing entity. What prevented this from being implemented in commercial systems was the adequacy of the microprocessor for most tasks and hence the expense of a multi-pro...

  9. Advances in gallium arsenide monolithic microwave integrated-circuit technology for space communications systems

    Science.gov (United States)

    Bhasin, K. B.; Connolly, D. J.

    1986-01-01

    Future communications satellites are likely to use gallium arsenide (GaAs) monolithic microwave integrated-circuit (MMIC) technology in most, if not all, communications payload subsystems. Multiple-scanning-beam antenna systems are expected to use GaAs MMIC's to increase functional capability, to reduce volume, weight, and cost, and to greatly improve system reliability. RF and IF matrix switch technology based on GaAs MMIC's is also being developed for these reasons. MMIC technology, including gigabit-rate GaAs digital integrated circuits, offers substantial advantages in power consumption and weight over silicon technologies for high-throughput, on-board baseband processor systems. In this paper, current developments in GaAs MMIC technology are described, and the status and prospects of the technology are assessed.

  10. Keystone Business Models for Network Security Processors

    OpenAIRE

    Arthur Low; Steven Muegge

    2013-01-01

    Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor...

  11. Real time monitoring of electron processors

    International Nuclear Information System (INIS)

    Nablo, S.V.; Kneeland, D.R.; McLaughlin, W.L.

    1995-01-01

    A real time radiation monitor (RTRM) has been developed for monitoring the dose rate (current density) of electron beam processors. The system provides continuous monitoring of processor output, electron beam uniformity, and an independent measure of operating voltage or electron energy. In view of the device's ability to replace labor-intensive dosimetry in verification of machine performance on a real-time basis, its application to providing archival performance data for in-line processing is discussed. (author)

  12. Parallel Implicit Algorithms for CFD

    Science.gov (United States)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSc library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSc during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSc framework.
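    The matrix-free Newton-Krylov core of NKS is easy to demonstrate outside PETSc. The sketch below uses SciPy's newton_krylov on a small Bratu-type problem: the solver only ever evaluates the residual and approximates Jacobian-vector products by finite differences; the Schwarz (domain decomposition) preconditioning layer is omitted here.

    import numpy as np
    from scipy.optimize import newton_krylov

    def residual(u):
        # Discrete 1-D Laplacian with zero Dirichlet boundaries plus a
        # Bratu-type nonlinearity: solve lap(u) + exp(u) = 0.
        h = 1.0 / (len(u) + 1)
        lap = np.empty_like(u)
        lap[1:-1] = (u[:-2] - 2 * u[1:-1] + u[2:]) / h**2
        lap[0] = (-2 * u[0] + u[1]) / h**2
        lap[-1] = (u[-2] - 2 * u[-1]) / h**2
        return lap + np.exp(u)

    # The Krylov solver (LGMRES) never sees an assembled Jacobian.
    sol = newton_krylov(residual, np.zeros(100), method="lgmres")
    print("max residual:", np.abs(residual(sol)).max())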

  13. Parallel computation

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Maurel, G.; Silva, J.; Wolff-Bacha, F.

    1997-01-01

    Work in the field of parallel processing has developed around several numerical Monte Carlo simulations related to current basic and applied problems of nuclear and particle physics. For applications utilizing the GEANT code, development and improvement work was done on the parts simulating low-energy physical phenomena such as radiation transport and interaction. The problem of actinide burning by means of accelerators was approached using a simulation with the GEANT code. A program for neutron tracking in the range of low energies up to the thermal region has been developed. It is coupled to the GEANT code and permits, in a single pass, the simulation of a hybrid reactor core receiving a proton burst. Other work in this field concerns simulations for nuclear medicine applications, for instance the development of biological probes and the evaluation and characterization of gamma cameras (collimators, crystal thickness), as well as methods for dosimetric calculations. These calculations are particularly suited to a geometrical parallelization approach especially adapted to parallel machines of the TN310 type. Further work in the same field concerns simulation of electron channelling in crystals and of the beam-beam interaction effect in colliders. The GEANT code was also used to simulate the operation of germanium detectors designed for natural and artificial radioactivity monitoring of the environment

  14. Accuracy Limitations in Optical Linear Algebra Processors

    Science.gov (United States)

    Batsell, Stephen Gordon

    1990-01-01

    One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.
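    A first-order flavour of such a system noise model can be had from a quick Monte Carlo: inject per-element gain errors and additive output noise into an analog matrix-vector multiply and convert the resulting relative error into equivalent bits. The noise magnitudes and the bits-of-accuracy convention below are illustrative assumptions, not the dissertation's second-order model.

    import numpy as np

    def effective_bits(n=64, sigma_mult=0.01, sigma_add=0.005, trials=2000):
        # Multiplicative gain error models spatial variation across the
        # optical array; additive noise models the detection stage.
        rng = np.random.default_rng(0)
        errs = []
        for _ in range(trials):
            A = rng.uniform(0.0, 1.0, (n, n))
            x = rng.uniform(0.0, 1.0, n)
            gain = 1.0 + sigma_mult * rng.standard_normal((n, n))
            y_noisy = (A * gain) @ x + sigma_add * rng.standard_normal(n)
            y_true = A @ x
            errs.append(np.max(np.abs(y_noisy - y_true)) / np.max(np.abs(y_true)))
        # Relative error 2**-b corresponds to roughly b bits of accuracy.
        return -np.log2(np.median(errs))

    print(f"~{effective_bits():.1f} effective bits")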

  15. Parallel algorithms for mapping pipelined and parallel computations

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm³)-time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm²) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
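    The flavour of these mapping problems shows up already in a small dynamic program: partition a chain of module weights into contiguous blocks, one per processor, minimizing the most-loaded processor (the bottleneck that governs pipelined throughput). This plain O(nm²) version is for illustration only; the improved algorithms in the paper reach O(nm log m) time and O(m) space.

    def map_modules(weights, n_procs):
        # cost[p][i]: best achievable bottleneck when the first i modules
        # are mapped onto p processors; cut[p][i] records the last split.
        m = len(weights)
        prefix = [0]
        for w in weights:
            prefix.append(prefix[-1] + w)
        INF = float("inf")
        cost = [[INF] * (m + 1) for _ in range(n_procs + 1)]
        cut = [[0] * (m + 1) for _ in range(n_procs + 1)]
        cost[0][0] = 0
        for p in range(1, n_procs + 1):
            for i in range(1, m + 1):
                for j in range(p - 1, i):        # last block: modules j..i-1
                    c = max(cost[p - 1][j], prefix[i] - prefix[j])
                    if c < cost[p][i]:
                        cost[p][i], cut[p][i] = c, j
        blocks, i = [], m                        # recover block boundaries
        for p in range(n_procs, 0, -1):
            j = cut[p][i]
            blocks.append(list(range(j, i)))
            i = j
        return cost[n_procs][m], blocks[::-1]

    load, blocks = map_modules([3, 1, 4, 1, 5, 9, 2, 6], 3)
    print(load, blocks)   # bottleneck load and the three contiguous blocks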

  16. Angular parallelization of a curvilinear Sn transport theory method

    International Nuclear Information System (INIS)

    Haghighat, A.

    1991-01-01

    In this paper a parallel algorithm for angular domain decomposition (or parallelization) of an r-dependent spherical Sn transport theory method is derived. The parallel formulation is incorporated into TWOTRAN-II using the IBM Parallel Fortran compiler and implemented on an IBM 3090/400 (with four processors). The behavior of the parallel algorithm for different physical problems is studied, and it is concluded that the parallel algorithm behaves differently in the presence of a fission source as opposed to the absence of a fission source; this is attributed to the relative contributions of the source and the angular redistribution terms in the Sn algorithm. Further, the parallel performance of the algorithm is measured for various problem sizes and different combinations of angular subdomains or processors. Poor parallel efficiencies between ~35% and 50% are achieved in situations where the relative difference of parallel to serial iterations is ~50%. High parallel efficiencies between ~60% and 90% are obtained in situations where the relative difference of parallel to serial iterations is <35%

  17. A Real-Time Sound Field Rendering Processor

    Directory of Open Access Journals (Sweden)

    Tan Yiyu

    2017-12-01

    Real-time sound field renderings are computationally intensive and memory-intensive. Traditional rendering systems based on computer simulations are limited by memory bandwidth and arithmetic units. The computation is time-consuming, and the sample rate of the output sound is low because of the long computation time at each time step. In this work, a processor with a hybrid architecture is proposed to speed up computation and improve the sample rate of the output sound, and an interface is developed for system scalability by simply cascading many chips to enlarge the simulated area. To render a three-minute Beethoven wave sound in a small shoe-box room with dimensions of 1.28 m × 1.28 m × 0.64 m, the field-programmable gate array (FPGA) based prototype machine with the proposed architecture carries out the sound rendering at run time, while a software simulation with OpenMP parallelization takes about 12.70 min on a personal computer (PC) with 32 GB random access memory (RAM) and an Intel i7-6800K six-core processor running at 3.4 GHz. The throughput of the software simulation is about 194 M grids/s, while that of the prototype machine is 51.2 G grids/s, even though the clock frequency of the prototype machine is much lower than that of the PC. The rendering processor with one processing element (PE) and interfaces consumes about 238,515 gates after fabrication in the 0.18 µm process technology from ROHM Semiconductor Co., Ltd. (Kyoto, Japan), and the power consumption is about 143.8 mW.

  18. Parallel transposition of sparse data structures

    DEFF Research Database (Denmark)

    Wang, Hao; Liu, Weifeng; Hou, Kaixi

    2016-01-01

    Many applications in computational sciences and social sciences exploit sparsity and connectivity of acquired data. Even though many parallel sparse primitives such as sparse matrix-vector (SpMV) multiplication have been extensively studied, some other important building blocks, e.g., parallel transposition, … transposition in the latest vendor-supplied library on an Intel multicore CPU platform, and the MergeTrans approach achieves an average 3.4-fold (up to 11.7-fold) speedup on an Intel Xeon Phi many-core processor.
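    The structure such algorithms exploit is visible in a plain counting-sort sketch of CSR transposition: a per-column histogram, a prefix sum, and a scatter, each of which has a well-known parallel formulation. This is an illustrative sequential baseline, not the ScanTrans/MergeTrans implementation.

    import numpy as np

    def csr_transpose(n_rows, n_cols, indptr, indices, data):
        # 1. Histogram: number of entries in each column.
        counts = np.bincount(indices, minlength=n_cols)
        # 2. Exclusive prefix sum: row pointer of the transposed matrix.
        t_indptr = np.zeros(n_cols + 1, dtype=np.int64)
        np.cumsum(counts, out=t_indptr[1:])
        # 3. Scatter: place every entry, tracking per-column fill cursors.
        cursor = t_indptr[:-1].copy()
        t_indices = np.empty(len(indices), dtype=indices.dtype)
        t_data = np.empty(len(data), dtype=data.dtype)
        for row in range(n_rows):
            for k in range(indptr[row], indptr[row + 1]):
                dst = cursor[indices[k]]
                t_indices[dst], t_data[dst] = row, data[k]
                cursor[indices[k]] += 1
        return t_indptr, t_indices, t_data

    # Check against SciPy on a small random matrix.
    from scipy.sparse import random as sprand
    A = sprand(5, 7, density=0.3, format="csr", random_state=0)
    tp, ti, td = csr_transpose(5, 7, A.indptr, A.indices, A.data)
    B = A.T.tocsr()
    assert np.array_equal(tp, B.indptr) and np.array_equal(ti, B.indices)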

  19. Structured building model reduction toward parallel simulation

    Energy Technology Data Exchange (ETDEWEB)

    Dobbs, Justin R. [Cornell University; Hencey, Brondon M. [Cornell University

    2013-08-26

    Building energy model reduction exchanges accuracy for improved simulation speed by reducing the number of dynamical equations. Parallel computing aims to improve simulation times without loss of accuracy but is poorly utilized by contemporary simulators and is inherently limited by inter-processor communication. This paper bridges these disparate techniques to implement efficient parallel building thermal simulation. We begin with a survey of three structured reduction approaches that compares their performance to a leading unstructured method. We then use structured model reduction to find thermal clusters in the building energy model and allocate processing resources. Experimental results demonstrate faster simulation and low error without any interprocessor communication.

  20. NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

    Science.gov (United States)

    Cheung, Kit; Schultz, Simon R; Luk, Wayne

    2015-01-01

    NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
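    The per-neuron update that such platforms parallelize is small and regular. Below is a minimal NumPy sketch of the Izhikevich model advanced for a whole population at once, using the standard regular-spiking parameters; the noisy input current is an assumption for the demo.

    import numpy as np

    def izhikevich_step(v, u, I, dt=1.0, a=0.02, b=0.2, c=-65.0, d=8.0):
        # Spike, reset, then one Euler step of the two model equations.
        fired = v >= 30.0                 # spike threshold (mV)
        v = np.where(fired, c, v)         # membrane reset
        u = np.where(fired, u + d, u)     # recovery-variable reset
        v = v + dt * (0.04 * v * v + 5.0 * v + 140.0 - u + I)
        u = u + dt * a * (b * v - u)
        return v, u, fired

    n = 100_000                           # one update rule, n independent lanes
    v, u = np.full(n, -65.0), 0.2 * np.full(n, -65.0)
    spikes = 0
    for t in range(1000):                 # one second at 1 ms resolution
        v, u, fired = izhikevich_step(v, u, I=5.0 * np.random.randn(n))
        spikes += int(fired.sum())
    print("mean rate:", spikes / n, "Hz")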

  1. A lock circuit for a multi-core processor

    DEFF Research Database (Denmark)

    2015-01-01

    An integrated circuit comprising multiple processor cores and a lock circuit that comprises a queue register with respective bits set or reset via respective connections dedicated to respective processor cores, whereby the queue register identifies those among the multiple processor cores … that are enqueued in the queue register. Furthermore, the integrated circuit comprises a current register and a selector circuit configured to select a processor core and identify that processor core by a value in the current register. A selected processor core is a prioritized processor core among the cores … configured with an integrated circuit; and a silicon die configured with an integrated circuit …

  2. Performance of a plasma fluid code on the Intel parallel computers

    International Nuclear Information System (INIS)

    Lynch, V.E.; Carreras, B.A.; Drake, J.B.; Leboeuf, J.N.; Liewer, P.

    1992-01-01

    One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchstone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchstone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel σ machine gives an improvement factor close to 64 over the single-processor CRAY-2

  5. Comparison Of Hybrid Sorting Algorithms Implemented On Different Parallel Hardware Platforms

    Directory of Open Access Journals (Sweden)

    Dominik Zurek

    2013-01-01

    Sorting is a common problem in computer science. There are many well-known sorting algorithms created for sequential execution on a single processor. Recent hardware platforms enable the creation of wide parallel algorithms: standard processors consist of multiple cores, and hardware accelerators such as the GPU are available. Graphics cards, with their parallel architecture, offer new possibilities for speeding up many algorithms. In this paper we describe the results of implementing several different sorting algorithms on GPU cards and multicore processors. A hybrid algorithm is then presented which consists of parts executed on both platforms, the standard CPU and the GPU.
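    The hybrid idea, sorting chunks in parallel and then merging, can be sketched compactly. The worker processes below stand in for the GPU/CPU split described above, and a sequential k-way merge combines the sorted chunks.

    import heapq
    import random
    from multiprocessing import Pool

    def parallel_sort(data, n_chunks=4):
        # Phase 1: sort the chunks in parallel worker processes.
        size = (len(data) + n_chunks - 1) // n_chunks
        chunks = [data[i:i + size] for i in range(0, len(data), size)]
        with Pool(n_chunks) as pool:
            sorted_chunks = pool.map(sorted, chunks)
        # Phase 2: k-way merge of the sorted runs.
        return list(heapq.merge(*sorted_chunks))

    if __name__ == "__main__":
        data = [random.random() for _ in range(200_000)]
        assert parallel_sort(data) == sorted(data)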

  6. Test of the TRAPPISTe monolithic detector system

    Science.gov (United States)

    Soung Yee, L.; Álvarez, P.; Martin, E.; Cortina, E.; Ferrer, C.

    2013-12-01

    A monolithic pixel detector named TRAPPISTe-2 has been developed in Silicon-on-Insulator (SOI) technology. A p-n junction is implanted in the bottom handle wafer and connected to readout electronics integrated in the top active layer. The two parts are insulated from each other by a buried oxide layer resulting in a monolithic detector. Two small pixel matrices have been fabricated: one containing a 3-transistor readout and a second containing a charge sensitive amplifier readout. These two readout structures have been characterized and the pixel matrices were tested with an infrared laser source. The readout circuits are adversely affected by the backgate effect, which limits the voltage that can be applied to the metal back plane to deplete the sensor, thus narrowing the depletion width of the sensor. Despite the low depletion voltages, the integrated pixel matrices were able to respond to and track a laser source.

  7. FLUIDIZED BED STEAM REFORMER MONOLITH FORMATION

    Energy Technology Data Exchange (ETDEWEB)

    Jantzen, C

    2006-12-22

    Fluidized Bed Steam Reforming (FBSR) is being considered as an alternative technology for the immobilization of a wide variety of aqueous high-sodium-containing radioactive wastes at various DOE facilities in the United States. The addition of clay, charcoal, and a catalyst as co-reactants converts aqueous Low Activity Wastes (LAW) to a granular or "mineralized" waste form while converting organic components to CO₂ and steam, and nitrate/nitrite components, if any, to N₂. The waste form produced is a multiphase mineral assemblage of Na-Al-Si (NAS) feldspathoid minerals with cage-like structures that atomically bond radionuclides like Tc-99 and anions such as SO₄, I, F, and Cl. The granular product has been shown to be as durable as LAW glass. Shallow land burial requires that the mineralized waste form be able to sustain the weight of soil overburden and potential intrusion by future generations. The strength requirement necessitates binding the granular product into a monolith. FBSR mineral products were formulated into a variety of monoliths including various cements, Ceramicrete, and hydroceramics. All but one of the nine monoliths tested met the <2 g/m² durability specification for Na and Re (simulant for Tc-99) when tested using the Product Consistency Test (PCT; ASTM C1285). Of the nine monoliths tested, the cements produced with 80-87 wt% FBSR product, the Ceramicrete, and the hydroceramic produced with 83.3 wt% FBSR product met the compressive strength and durability requirements for an LAW waste form.

  8. Discovering Motifs in Biological Sequences Using the Micron Automata Processor.

    Science.gov (United States)

    Roy, Indranil; Aluru, Srinivas

    2016-01-01

    Finding approximately conserved sequences, called motifs, across multiple DNA or protein sequences is an important problem in computational biology. In this paper, we consider the (l, d) motif search problem of identifying one or more motifs of length l present in at least q of the n given sequences, with each occurrence differing from the motif in at most d substitutions. The problem is known to be NP-complete, and the largest solved instance reported to date is (26,11). We propose a novel algorithm for the (l,d) motif search problem using streaming execution over a large set of non-deterministic finite automata (NFA). This solution is designed to take advantage of the Micron Automata Processor, a new technology close to deployment that can simultaneously execute multiple NFA in parallel. We demonstrate the capability for solving much larger instances of the (l, d) motif search problem using the resources available within a single automata processor board, by estimating run-times for problem instances (39,18) and (40,17). The paper serves as a useful guide to solving problems using this new accelerator technology.
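    A brute-force software rendering of the (l, d) search makes the role of the automata processor clear: every candidate motif needs its own approximate matcher, and the hardware simply runs thousands of such matchers in parallel as NFAs. The sketch below is exhaustive over all 4^l candidates, so it is only practical for tiny l; the sequences are made up.

    from itertools import product

    def occurs_with_mismatches(motif, seq, d):
        # The per-candidate test: does `motif` occur in `seq` with at
        # most d substitutions? (One NFA per candidate on the hardware.)
        l = len(motif)
        return any(
            sum(a != b for a, b in zip(motif, seq[i:i + l])) <= d
            for i in range(len(seq) - l + 1)
        )

    def motif_search(seqs, l, d, q):
        # Report every l-mer over {A,C,G,T} found (with <= d mismatches)
        # in at least q sequences. Exponential in l: fine for l = 5,
        # hopeless at the (39, 18) scale estimated in the paper.
        return [
            "".join(cand)
            for cand in product("ACGT", repeat=l)
            if sum(occurs_with_mismatches("".join(cand), s, d) for s in seqs) >= q
        ]

    seqs = ["ACGTACGTGACG", "TTACGGACGTTA", "GGGACGTTTACG"]
    print(motif_search(seqs, l=5, d=1, q=3)[:10])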

  9. Parallel R

    CERN Document Server

    McCallum, Ethan

    2011-01-01

    It's tough to argue with R as a high-quality, cross-platform, open source statistical software product, unless you're in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets. You'll learn the basics of Snow, Multicore, Parallel, and some Hadoop-related tools, including how to find them, how to use them, when they work well, and when they don't. With these packages, you can overcome R's single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R's memory barrier.

  10. MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

    Energy Technology Data Exchange (ETDEWEB)

    Barhen, Jacob [ORNL; Kerekes, Ryan A [ORNL; ST Charles, Jesse Lee [ORNL; Buckner, Mark A [ORNL

    2008-01-01

    High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Doppler-sensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to a 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphics processor applied to document feature clustering. EnLight Optical Core Processor: optical processing is inherently capable of high parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5×5 cm²) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core

  12. An overview of monolithic zirconia in dentistry

    Directory of Open Access Journals (Sweden)

    Özlem Malkondu

    2016-07-01

    Zirconia restorations have been used successfully for years in dentistry owing to their biocompatibility and good mechanical properties. Because of their lack of translucency, zirconia cores are generally veneered with porcelain, which makes restorations weaker due to failure of the adhesion between the two materials. In recent years, all-ceramic zirconia restorations have been introduced in the dental sector with the intent to solve this problem. Besides the elimination of chipping, the reduced occlusal space requirement seems to be a clear advantage of monolithic zirconia restorations. However, scientific evidence is needed to recommend this relatively new application for clinical use. This mini-review discusses the current scientific literature on monolithic zirconia restorations. The results of in vitro studies suggest that monolithic zirconia may be the best choice for posterior fixed partial dentures in the presence of high occlusal loads and minimal occlusal restoration space. These results should be supported by many more in vitro and particularly in vivo studies before a final conclusion can be drawn.

  13. Characterization of SOI monolithic detector system

    Science.gov (United States)

    Álvarez-Rengifo, P. L.; Soung Yee, L.; Martin, E.; Cortina, E.; Ferrer, C.

    2013-12-01

    A monolithic active pixel sensor for charged particle tracking was developed. This research is performed within the framework of an R&D project called TRAPPISTe (Tracking Particles for Physics Instrumentation in SOI Technology) whose aim is to evaluate the feasibility of developing a Monolithic Active Pixel Sensor (MAPS) with Silicon-on-Insulator (SOI) technology. Two chips were fabricated: TRAPPISTe-1 and TRAPPISTe-2. TRAPPISTe-1 was produced at the WINFAB facility at the Université catholique de Louvain (UCL), Belgium, in a 2 μm fully depleted (FD-SOI) CMOS process. TRAPPISTe-2 was fabricated with the LAPIS 0.2 μm FD-SOI CMOS process. The electrical characterization on single transistor test structures and of the electronic readout for the TRAPPISTe series of monolithic pixel detectors was carried out. The behavior of the prototypes’ electronics as a function of the back voltage was studied. Results showed that both readout circuits exhibited sensitivity to the back voltage. Despite this unwanted secondary effect, the responses of TRAPPISTe-2 amplifiers can be improved by a variation in the circuit parameters.

  14. Metal oxide nanorod arrays on monolithic substrates

    Energy Technology Data Exchange (ETDEWEB)

    Gao, Pu-Xian; Guo, Yanbing; Ren, Zheng

    2018-01-02

    A metal oxide nanorod array structure according to embodiments disclosed herein includes a monolithic substrate having a surface and multiple channels, an interface layer bonded to the surface of the substrate, and a metal oxide nanorod array coupled to the substrate surface via the interface layer. The metal oxide can include ceria, zinc oxide, tin oxide, alumina, zirconia, cobalt oxide, and gallium oxide. The substrate can include a glass substrate, a plastic substrate, a silicon substrate, a ceramic monolith, and a stainless steel monolith. The ceramic can include cordierite, alumina, tin oxide, and titania. The nanorod array structure can include a perovskite shell, such as a lanthanum-based transition metal oxide, or a metal oxide shell, such as ceria, zinc oxide, tin oxide, alumina, zirconia, cobalt oxide, and gallium oxide, or a coating of metal particles, such as platinum, gold, palladium, rhodium, and ruthenium, over each metal oxide nanorod. Structures can be bonded to the surface of a substrate and resist erosion if exposed to high velocity flow rates.

  15. Fracture-resistant monolithic dental crowns.

    Science.gov (United States)

    Zhang, Yu; Mai, Zhisong; Barani, Amir; Bush, Mark; Lawn, Brian

    2016-03-01

    To quantify the splitting resistance of monolithic zirconia, lithium disilicate and nanoparticle-composite dental crowns. Fracture experiments were conducted on anatomically-correct monolithic crown structures cemented to standard dental composite dies, by axial loading of a hard sphere placed between the cusps. The structures were observed in situ during fracture testing, and critical loads to split the structures were measured. Extended finite element modeling (XFEM), with provision for step-by-step extension of embedded cracks, was employed to simulate full failure evolution. Experimental measurements and XFEM predictions were self-consistent within data scatter. In conjunction with a fracture mechanics equation for critical splitting load, the data were used to predict load-sustaining capacity for crowns on actual dentin substrates and for loading with a sphere of different size. Stages of crack propagation within the crown and support substrate were quantified. Zirconia crowns showed the highest fracture loads, lithium disilicate intermediate, and dental nanocomposite lowest. Dental nanocomposite crowns have comparable fracture resistance to natural enamel. The results confirm that monolithic crowns are able to sustain high bite forces. The analysis indicates what material and geometrical properties are important in optimizing crown performance and longevity.

  16. Implementing Shared Memory Parallelism in MCBEND

    Directory of Open Access Journals (Sweden)

    Bird Adam

    2017-01-01

    MCBEND is a general-purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To utilise parallel hardware more effectively, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.

  17. Monolithic silicon photonics in a sub-100nm SOI CMOS microprocessor foundry: progress from devices to systems

    Science.gov (United States)

    Popović, Miloš A.; Wade, Mark T.; Orcutt, Jason S.; Shainline, Jeffrey M.; Sun, Chen; Georgas, Michael; Moss, Benjamin; Kumar, Rajesh; Alloatti, Luca; Pavanello, Fabio; Chen, Yu-Hsin; Nammari, Kareem; Notaros, Jelena; Atabaki, Amir; Leu, Jonathan; Stojanović, Vladimir; Ram, Rajeev J.

    2015-02-01

    We review recent progress of an effort led by the Stojanović (UC Berkeley), Ram (MIT) and Popović (CU Boulder) research groups to enable the design of photonic devices, and complete on-chip electro-optic systems and interfaces, directly in standard microelectronics CMOS processes in a microprocessor foundry, with no in-foundry process modifications. This approach allows tight and large-scale monolithic integration of silicon photonics with state-of-the-art (sub-100nm-node) microelectronics, here a 45nm SOI CMOS process. It enables natural scale-up to manufacturing, and rapid advances in device design due to process repeatability. The initial driver application was addressing the processor-to-memory communication energy bottleneck. Device results include 5Gbps modulators based on an interleaved junction that take advantage of the high resolution of the sub-100nm CMOS process. We demonstrate operation at 5fJ/bit with 1.5dB insertion loss and 8dB extinction ratio. We also demonstrate the first infrared detectors in a zero-change CMOS process, using absorption in transistor source/drain SiGe stressors. Subsystems described include the first monolithically integrated electronic-photonic transmitter on chip (modulator+driver) with 20-70fJ/bit wall plug energy/bit (2-3.5Gbps), to our knowledge the lowest transmitter energy demonstrated to date. We also demonstrate native-process infrared receivers at 220fJ/bit (5Gbps). These are encouraging signs for the prospects of monolithic electronics-photonics integration. Beyond processor-to-memory interconnects, our approach to photonics as a "More-than- Moore" technology inside advanced CMOS promises to enable VLSI electronic-photonic chip platforms tailored to a vast array of emerging applications, from optical and acoustic sensing, high-speed signal processing, RF and optical metrology and clocks, through to analog computation and quantum technology.

  18. The Associative Memory system for the FTK processor at ATLAS

    CERN Document Server

    Cipriani, R; The ATLAS collaboration; Donati, S; Giannetti, P; Lanza, A; Luciano, P; Magalotti, D; Piendibene, M

    2013-01-01

    Modern experiments search for extremely rare processes hidden in much larger background levels. As experiment complexity, accelerator backgrounds, and luminosity increase, we need increasingly complex and exclusive selections. We present results and performance of a new prototype of the Associative Memory system, the core of the Fast Tracker processor (FTK). FTK is a real-time tracking device for the ATLAS experiment trigger upgrade. The AM system provides massive computing power to minimize the online execution time of complex tracking algorithms. The time-consuming pattern recognition problem, generally referred to as the "combinatorial challenge", is beaten by the Associative Memory (AM) technology exploiting parallelism to the maximum level: it compares the event to pre-calculated "expectations" or "patterns" (pattern matching) at once, looking for candidate tracks called "roads". The problem is solved by the time data are loaded into the AM devices. We report on the tests of the integrate...

  20. The Associative Memory system for the FTK processor at ATLAS

    CERN Document Server

    Cipriani, R; The ATLAS collaboration; Donati, S; Giannetti, P; Lanza, A; Luciano, P; Magalotti, D; Piendibene, M

    2013-01-01

    Experiments at the LHC hadron collider search for extremely rare processes hidden in much larger background levels. As experiment complexity, accelerator backgrounds and instantaneous luminosity increase, increasingly complex and exclusive selections are necessary. We present results and performance of a new prototype of the Associative Memory (AM) system, the core of the Fast Tracker processor (FTK). FTK is a real-time tracking device for the ATLAS experiment trigger upgrade. The AM system provides massive computing power to minimize the online execution time of complex tracking algorithms. The time-consuming pattern recognition problem, generally referred to as the "combinatorial challenge", is beaten by the AM technology exploiting parallelism to the maximum level. The Associative Memory compares the event to pre-calculated "expectations" or "patterns" (pattern matching) at once and looks for candidate tracks called "roads". The problem is solved by the time data are loaded into the AM devices. We report ...
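    In software terms, the AM approach amounts to pattern matching against a pre-calculated bank of coarse hit combinations. The toy sketch below coarsens hits into "super-strips" and reports the roads whose pattern fires in every layer; the layer count, the coarsening and the data are illustrative assumptions, and a set lookup stands in for the chip's massively parallel comparison.

    N_LAYERS = 4
    coarsen = lambda hit: hit // 16        # 16 strips per super-strip

    def make_bank(training_tracks):
        # Pre-calculated patterns: one tuple of super-strips per track.
        return {tuple(coarsen(h) for h in trk) for trk in training_tracks}

    def find_roads(bank, event_hits):
        # A pattern becomes a road when every layer's super-strip fired.
        fired = [{coarsen(h) for h in event_hits[l]} for l in range(N_LAYERS)]
        return [p for p in bank if all(p[l] in fired[l] for l in range(N_LAYERS))]

    bank = make_bank([[100, 210, 330, 450], [40, 80, 120, 160]])
    event = {0: [98, 500], 1: [215], 2: [328], 3: [449, 12]}
    print(find_roads(bank, event))         # -> [(6, 13, 20, 28)]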

  1. Parallel implementation of the PHOENIX generalized stellar atmosphere program. II. Wavelength parallelization

    International Nuclear Information System (INIS)

    Baron, E.; Hauschildt, Peter H.

    1998-01-01

    We describe an important addition to the parallel implementation of our generalized nonlocal thermodynamic equilibrium (NLTE) stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data- and task-parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is, distributing the radial zones, individual spectral lines, or characteristic rays among different processors, and employ, in addition, task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000-300,000), and hence parallelization over wavelength can lead both to considerable speedup in calculation time and to the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point are sent to the processor working on the succeeding wavelength point as soon as they are known. Our implementation uses a MIMD design based on a relatively small number of standard Message Passing Interface (MPI) library calls and is fully portable between serial and parallel computers.
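    A minimal sketch of the pipelined wavelength loop, assuming mpi4py (run with at least two ranks, e.g. mpiexec -n 4): wavelength points are dealt round-robin to the ranks, and each rank forwards its converged state to the rank holding the next point as soon as it is known. solve_point is a hypothetical stand-in for the per-point radiative transfer solve.

    from mpi4py import MPI

    def solve_point(lam, upstream):
        # Stand-in for the transfer solve at one wavelength point, which
        # depends on the converged state of the previous point.
        return 0.5 * upstream + lam

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    N = 64                                  # total wavelength points

    # Round-robin ownership: rank r handles points r, r+size, r+2*size, ...
    for lam in range(rank, N, size):
        if lam == 0:
            upstream = 0.0                  # initial condition at the bluest point
        else:
            # Block until the owner of point lam-1 forwards its state.
            upstream = comm.recv(source=(lam - 1) % size, tag=lam - 1)
        state = solve_point(lam, upstream)
        if lam + 1 < N:
            # Hand off immediately so the downstream rank can start.
            comm.send(state, dest=(lam + 1) % size, tag=lam)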

  2. Optimisation of a parallel ocean general circulation model

    Science.gov (United States)

    Beare, M. I.; Stevens, D. P.

    1997-10-01

    This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.

  3. DIMACS Workshop on Interconnection Networks and Mapping, and Scheduling Parallel Computations

    CERN Document Server

    Rosenberg, Arnold L; Sotteau, Dominique; NSF Science and Technology Center in Discrete Mathematics and Theoretical Computer Science; Interconnection networks and mapping and scheduling parallel computations

    1995-01-01

    The interconnection network is one of the most basic components of a massively parallel computer system. Such systems consist of hundreds or thousands of processors interconnected to work cooperatively on computations. One of the central problems in parallel computing is the task of mapping a collection of processes onto the processors and routing network of a parallel machine. Once this mapping is done, it is critical to schedule computations within, and communication among, processors so that inputs for a process are available where and when the process is scheduled to be computed. The workshop brought together researchers from universities and laboratories, as well as practitioners involved in the design, implementation, and application of massively parallel systems. Focusing on interconnection networks of parallel architectures of today and of the near future, the book includes topics such as network topologies, network properties, message routing, network embeddings, network emulation, mappings, and efficient scheduling. This book contains the refereed pro...

  4. Benchmarking Data Analysis and Machine Learning Applications on the Intel KNL Many-Core Processor

    OpenAIRE

    Byun, Chansup; Kepner, Jeremy; Arcand, William; Bestor, David; Bergeron, Bill; Gadepally, Vijay; Houle, Michael; Hubbell, Matthew; Jones, Michael; Klein, Anna; Michaleas, Peter; Milechin, Lauren; Mullen, Julie; Prout, Andrew; Rosa, Antonio

    2017-01-01

    Knights Landing (KNL) is the code name for the second-generation Intel Xeon Phi product family. KNL has generated significant interest in the data analysis and machine learning communities because its new many-core architecture targets both of these workloads. The KNL many-core vector processor design enables it to exploit much higher levels of parallelism. At the Lincoln Laboratory Supercomputing Center (LLSC), the majority of users are running data analysis applications such as MATLAB and O...

  5. Power estimation on functional level for programmable processors

    Directory of Open Access Journals (Sweden)

    M. Schneider

    2004-01-01

    … the input parameters of the arithmetic functions, e.g. the achieved degree of parallelism or the kind and number of memory accesses, can be computed. This approach is exemplarily demonstrated and evaluated applying two modern digital signal processors and a variety of basic algorithms of digital signal processing. The resulting estimation values for the inspected algorithms are compared to physically measured values. A resulting maximum estimation error of 3% is achieved.

  7. The parallel algorithm for the 2D discrete wavelet transform

    Science.gov (United States)

    Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel

    2018-04-01

    The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of points at which data are exchanged, and these points often form a performance bottleneck. Our approach rearranges the calculations inside the transform and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. In evaluations on multi-core CPUs, we consistently outperform the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
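    The synchronization problem is visible in the lifting formulation itself: each lifting step is a whole-array update, and every step becomes a synchronization point once the work is split across cores. A one-level CDF 5/3 lifting sketch (periodic boundary handling assumed for brevity) makes that step structure explicit.

    import numpy as np

    def dwt53_lifting(x):
        # Split into even (s) and odd (d) samples.
        s, d = x[0::2].astype(float), x[1::2].astype(float)
        # Predict step: detail = odd - mean of neighbouring evens.
        d -= 0.5 * (s + np.roll(s, -1))
        # Update step: approximation = even + quarter of neighbouring details.
        s += 0.25 * (d + np.roll(d, 1))
        # Each of the two steps above is one barrier in a multi-core run.
        return s, d

    approx, detail = dwt53_lifting(np.arange(16, dtype=float))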

  8. Radiation-magnetohydrodynamics of fusion plasmas on parallel supercomputers

    International Nuclear Information System (INIS)

    Yasar, O.; Moses, G.A.; Tautges, T.J.

    1993-01-01

    A parallel computational model to simulate fusion plasmas in the radiation-magnetohydrodynamics (R-MHD) framework is presented. Plasmas are often treated in a fluid dynamics context (magnetohydrodynamics, MHD), but when the flow field is coupled with the radiation field it falls into a more complex category, radiation magnetohydrodynamics (R-MHD), where the interaction between the flow field and the radiation field is nonlinear. The solution for the radiation field usually dominates the R-MHD computation. To solve for the radiation field, one usually chooses the SN discrete ordinates method (a deterministic method) rather than the Monte Carlo method if the geometry is not complex. The discrete ordinates method is implemented on a massively parallel processor (Intel iPSC/860). The speedup is 14 for a run on 16 processors and the performance is 3.7 times better than a single CRAY YMP processor implementation. (orig./DG)

  9. Multicentre evaluation of the Naída CI Q70 sound processor: feedback from cochlear implant users and professionals

    Directory of Open Access Journals (Sweden)

    Jeanette Martin

    2016-12-01

    The aim of this survey was to gather data from both implant recipients and professionals on the ease of use of the Naída CI Q70 (Naída CI) sound processor from Advanced Bionics and on the usefulness of the new functions and features available. A secondary objective was to investigate fitting practices with the new processor. A comprehensive user satisfaction survey was conducted with a total of 186 subjects from 24 centres. In parallel, 23 professional questionnaires were collected from 11 centres. Overall, there was high satisfaction with the Naída CI processor among adults, children, experienced and new CI users, as well as among professionals. The Naída CI processor was shown to be easy to use by recipients of all ages and by professionals. The majority of experienced CI users rated the Naída CI processor as similar to or better than their previous processor in all areas surveyed. The Naída CI was recommended by the professionals for fitting in all populations. Features like UltraZoom, ZoomControl and DuoPhone would not be fitted to very young children, in contrast to adults. Positive ratings were obtained for ease of use, comfort and usefulness of the new functions and features of the Naída CI sound processor. Seventy-seven percent of the experienced CI users rated the new processor as better than their previous sound processor from a general point of view. The survey also showed that fitting practices were influenced by the age of the user.

  10. Processor architecture exploration and synthesis of massively parallel multi-processor accelerators in application to LDPC decoding

    NARCIS (Netherlands)

    Jan, Y.; Jóźwiak, Lech

    Numerous modern applications in various fields, such as communication and networking, multimedia, encryption, etc., impose extremely high demands regarding performance while at the same time requiring low energy consumption, low cost, and short design time. Often these very high demands cannot be

  11. Quality-driven model-based design of multi-processor accelerators : an application to LDPC decoders

    NARCIS (Netherlands)

    Jan, Y.

    2012-01-01

    The recent spectacular progress in nano-electronic technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs). However in parallel, new highly demanding complex embedded applications are emerging, in fields like communication and networking,

  12. Parallel Lines

    Directory of Open Access Journals (Sweden)

    James G. Worner

    2017-05-01

    James Worner is an Australian-based writer and scholar currently pursuing a PhD at the University of Technology Sydney. His research seeks to expose masculinities lost in the shadow of Australia’s Anzac hegemony while exploring new opportunities for contemporary historiography. He is the recipient of the Doctoral Scholarship in Historical Consciousness at the university’s Australian Centre of Public History and will be hosted by the University of Bologna during 2017 on a doctoral research writing scholarship. ‘Parallel Lines’ is one of a collection of stories, The Shapes of Us, exploring liminal spaces of modern life: class, gender, sexuality, race, religion and education. It looks at lives, like lines, that do not meet but which travel in proximity, simultaneously attracted and repelled. James’ short stories have been published in various journals and anthologies.

  13. Parallelization of MCNP 4, a Monte Carlo neutron and photon transport code system, in highly parallel distributed memory type computer

    International Nuclear Information System (INIS)

    Masukawa, Fumihiro; Takano, Makoto; Naito, Yoshitaka; Yamazaki, Takao; Fujisaki, Masahide; Suzuki, Koichiro; Okuda, Motoi.

    1993-11-01

    In order to improve the accuracy and calculating speed of shielding analyses, MCNP 4, a Monte Carlo neutron and photon transport code system, has been parallelized and its efficiency measured on the AP1000, a highly parallel distributed-memory computer. The code has been analyzed statically and dynamically, and a suitable parallelization algorithm has been determined for the shielding analysis functions of MCNP 4. This includes a strategy in which a new history is assigned to an idling processor element dynamically during the execution. Furthermore, to avoid congestion of the communication processing, the batch concept, in which multiple histories are processed as a unit, has been introduced. In analyzing a sample cask problem with 2,000,000 histories on the AP1000 with 512 processor elements, a parallelization efficiency of 82% is achieved, and the calculational speed is estimated to be around 50 times that of a FACOM M-780. (author)
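
    The record's two key ideas, dynamic assignment of histories to idle processor elements and batching to limit communication, can be sketched in a few lines of Python. This is a stand-in illustration, not MCNP code; the transport physics is replaced by a trivial random tally.

```python
import multiprocessing as mp
import random

def run_batch(args):
    seed, n_histories = args
    rng = random.Random(seed)          # independent stream per batch
    # Stand-in physics: count "transmitted" histories.
    return sum(rng.random() < 0.1 for _ in range(n_histories))

if __name__ == "__main__":
    total, batch = 2_000_000, 10_000   # history count taken from the record
    batches = [(seed, batch) for seed in range(total // batch)]
    with mp.Pool(processes=8) as pool:
        # imap_unordered hands a new batch to whichever worker idles first,
        # the dynamic-assignment strategy described above.
        tally = sum(pool.imap_unordered(run_batch, batches))
    print(f"transmitted fraction: {tally / total:.4f}")
```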

  14. Bayer image parallel decoding based on GPU

    Science.gov (United States)

    Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

    2012-11-01

    In the photoelectrical tracking system, the Bayer image is traditionally decompressed on the CPU. However, this is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate Bayer image decoding, this paper introduces a parallel speedup method for NVIDIA's Graphics Processing Unit (GPU), which supports the CUDA architecture. The decoding procedure can be divided into three parts: the first is a serial part, the second is a task-parallelism part, and the last is a data-parallelism part that includes inverse quantization, the inverse discrete wavelet transform (IDWT) and image post-processing. To reduce the execution time, the task-parallelism part is optimized with OpenMP techniques. The data-parallelism part gains its efficiency by executing on the GPU as a CUDA parallel program. The optimization techniques include instruction optimization, shared-memory access optimization, coalesced memory access optimization and texture memory optimization. In particular, the IDWT can be significantly sped up by rewriting the 2D (two-dimensional) serial IDWT as a 1D parallel IDWT. In experiments with a 1K×1K×16bit Bayer image, the data-parallelism part is more than 10 times faster than the CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental results show that it achieves a 3 to 5 times speed increase compared to the serial CPU method.

  15. A Versatile Image Processor For Digital Diagnostic Imaging And Its Application In Computed Radiography

    Science.gov (United States)

    Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.

    1986-06-01

    In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality and distributed and parallel processing, and aims at real-time response. Parallel processing and real-time response are facilitated in part by a dual bus system: a VME control bus and a high-speed image data bus consisting of 8 independent parallel 16-bit busses, capable of handling a combined rate of up to 144 MBytes/s. The presented image processor is versatile enough to meet the video-rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic workstation. Several hardware modules are described in detail. To illustrate the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing operations are presented.

  16. Real time processor for array speckle interferometry

    Science.gov (United States)

    Chin, Gordon; Florez, Jose; Borelli, Renan; Fong, Wai; Miko, Joseph; Trujillo, Carlos

    1989-02-01

    The authors are constructing a real-time processor to acquire image frames, perform array flat-fielding, execute a 64 x 64 element two-dimensional complex FFT (fast Fourier transform) and average the power spectrum, all within the 25 ms coherence time for speckles at near-IR (infrared) wavelength. The processor will be a compact unit controlled by a PC with real-time display and data storage capability. This will provide the ability to optimize observations and obtain results on the telescope rather than waiting several weeks before the data can be analyzed and viewed with offline methods. The image acquisition and processing, design criteria, and processor architecture are described.
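
    A software stand-in for the per-frame pipeline can make the 25 ms budget concrete. The following Python sketch (frame and flat-field data are invented) performs the flat-fielding, the 64 x 64 complex FFT and the power-spectrum accumulation described above, and reports the time spent per frame.

```python
import time
import numpy as np

frame_shape = (64, 64)
flat = np.random.uniform(0.9, 1.1, frame_shape)    # hypothetical flat field
power_sum = np.zeros(frame_shape)

t0 = time.perf_counter()
for _ in range(100):                               # 100 simulated frames
    frame = np.random.poisson(50.0, frame_shape)   # hypothetical frame data
    corrected = frame / flat                       # array flat-fielding
    spectrum = np.fft.fft2(corrected)              # 64 x 64 complex FFT
    power_sum += np.abs(spectrum) ** 2             # power-spectrum average
per_frame_ms = (time.perf_counter() - t0) / 100 * 1e3
print(f"{per_frame_ms:.2f} ms per frame (coherence budget: 25 ms)")
```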

  17. The UA1 upgrade calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, M.; Charleton, D.; Ellis, N.; Garvey, J.; Gregory, J.; Jimack, M.P.; Jovanovic, P.; Kenyon, I.R.; Baird, S.A.; Campbell, D.; Cawthraw, M.; Coughlan, J.; Flynn, P.; Galagedera, S.; Grayer, G.; Halsall, R.; Shah, T.P.; Stephens, R.; Biddulph, P.; Eisenhandler, E.; Fensome, I.F.; Landon, M.; Robinson, D.; Oliver, J.; Sumorok, K.

    1990-01-01

    The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no dead time. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (orig.)

  18. Embedded processor extensions for image processing

    Science.gov (United States)

    Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy

    2008-04-01

    The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. Use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a twofold gain is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.

  19. The UA1 upgrade calorimeter trigger processor

    International Nuclear Information System (INIS)

    Bains, N.; Baird, S.A.; Biddulph, P.

    1990-01-01

    The increased luminosity of the improved CERN Collider and the more subtle signals of second-generation collider physics demand increasingly sophisticated triggering. We have built a new first-level trigger processor designed to use the excellent granularity of the UA1 upgrade calorimeter. This device is entirely digital and handles events in 1.5 μs, thus introducing no deadtime. Its most novel feature is fast two-dimensional electromagnetic cluster-finding with the possibility of demanding an isolated shower of limited penetration. The processor allows multiple combinations of triggers on electromagnetic showers, hadronic jets and energy sums, including a total-energy veto of multiple interactions and a full vector sum of missing transverse energy. This hard-wired processor is about five times more powerful than its predecessor, and makes extensive use of pipelining techniques. It was used extensively in the 1988 and 1989 runs of the CERN Collider. (author)

  20. Development methods for VLSI-processors

    International Nuclear Information System (INIS)

    Horninger, K.; Sandweg, G.

    1982-01-01

    The aim of this project, which was originally planned for 3 years, was the development of modern system and circuit concepts for VLSI processors having a 32-bit-wide data path. The result of this first year's work is the concept of a general-purpose processor. This processor is not only logically but also physically (on the chip) divided into four functional units: a microprogrammable instruction unit, an execution unit in slice technique, a fully associative cache memory and an I/O unit. For the ALU of the execution unit, circuits in PLA and slice techniques have been realized. On the basis of regularity, area consumption and achievable performance, the slice technique has been preferred. The designs utilize self-testing circuitry. (orig.) [de

  1. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs

  2. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-01-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)

  3. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    Science.gov (United States)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.

  4. An intelligent allocation algorithm for parallel processing

    Science.gov (United States)

    Carroll, Chester C.; Homaifar, Abdollah; Ananthram, Kishan G.

    1988-01-01

    The problem of allocating nodes of a program graph to processors in a parallel processing architecture is considered. The algorithm is based on critical path analysis, some allocation heuristics, and the execution granularity of nodes in a program graph. These factors, and the structure of the interprocessor communication network, influence the allocation. To achieve realistic estimates of the execution durations of allocations, the algorithm considers the fact that nodes in a program graph have to communicate through varying numbers of tokens. Coarse and fine granularities have been implemented, with interprocessor token-communication durations varying from zero up to values comparable to the execution durations of individual nodes. The effect of communication network structure on allocation is demonstrated by performing allocations for crossbar (non-blocking) and star (blocking) networks. The algorithm assumes the availability of as many processors as it needs for the optimal allocation of any program graph. Hence, the focus of allocation has been on varying token-communication durations rather than on varying the number of processors. The algorithm always utilizes as many processors as necessary for the optimal allocation of any program graph, depending upon granularity and the characteristics of the interprocessor communication network.
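
    The critical-path metric at the core of such allocators can be stated compactly. The sketch below (node durations and the fixed per-token communication cost are invented) computes, for each node of a small program graph, the length of the longest path to the exit; nodes on the longest path are the natural candidates to schedule first.

```python
from functools import cache

exec_time = {"a": 2, "b": 3, "c": 1, "d": 4}    # node durations (assumed)
comm_time = 1                                   # per-token transfer cost (assumed)
succs = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

@cache
def critical_path(node):
    """Length of the longest node-to-exit path, counting communication."""
    tails = [comm_time + critical_path(s) for s in succs[node]]
    return exec_time[node] + max(tails, default=0)

for n in succs:
    print(n, critical_path(n))   # a: 11, b: 8, c: 6, d: 4 -> a-b-d is critical
```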

  5. Multi-petascale highly efficient parallel supercomputer

    Science.gov (United States)

    Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

    2015-07-14

    A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enable a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, each having full access to all system resources. This enables adaptive partitioning of the processors to functions such as compute or messaging I/O on an application-by-application basis and, preferably, adaptive partitioning of functions in accordance with various algorithmic phases within an application; if I/O or other processors are underutilized, they can participate in computation or communication. Nodes are interconnected by a five-dimensional torus network with DMA that optimally maximizes the throughput of packet communications between nodes and minimizes latency.

  6. Performance modeling of parallel algorithms for solving neutron diffusion problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1995-01-01

    Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers
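
    The qualitative message-passing behaviour reported above can be reproduced with a toy model in which computation time shrinks as 1/p while communication cost grows with the number of processors. The sketch below is illustrative only, not the paper's actual performance model; its parameters are invented.

```python
# Toy performance model: T(p) = t_comp / p + t_comm * p, so speedup
# saturates and parallel efficiency decays as p grows, the behaviour
# the record reports for the message-passing architecture.
t_comp, t_comm = 100.0, 0.2            # hypothetical model parameters

def speedup(p):
    return t_comp / (t_comp / p + t_comm * p)

for p in (1, 4, 16, 64, 256):
    s = speedup(p)
    print(f"p={p:3d}  speedup={s:6.2f}  efficiency={s / p:5.2f}")
```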

  7. Parallel alternating direction preconditioner for isogeometric simulations of explicit dynamics

    KAUST Repository

    Łoś, Marcin; Woźniak, Maciej; Paszyński, Maciej; Dalcin, Lisandro; Calo, Victor M.

    2015-01-01

    incorporated as a part of PETIGA, an isogeometric framework [7] built on top of PETSc [8]. We show the scalability of the parallel algorithm on the STAMPEDE Linux cluster up to 10,000 processors, as well as the convergence rate of the PCG solver

  8. Computation of watersheds based on parallel graph algorithms

    NARCIS (Netherlands)

    Meijster, A.; Roerdink, J.B.T.M.; Maragos, P; Schafer, RW; Butt, MA

    1996-01-01

    In this paper the implementation of a parallel watershed algorithm is described. The algorithm has been implemented on a Cray J932, which is a shared memory architecture with 32 processors. The watershed transform has generally been considered to be inherently sequential, but recently a few research

  9. Software-defined reconfigurable microwave photonics processor.

    Science.gov (United States)

    Pérez, Daniel; Gasulla, Ivana; Capmany, José

    2015-06-01

    We propose, for the first time to our knowledge, a software-defined reconfigurable microwave photonics signal processor architecture that can be integrated on a chip and is capable of performing all the main functionalities by suitable programming of its control signals. The basic configuration is presented and a thorough end-to-end design model derived that accounts for the performance of the overall processor taking into consideration the impact and interdependencies of both its photonic and RF parts. We demonstrate the model versatility by applying it to several relevant application examples.

  10. Time Manager Software for a Flight Processor

    Science.gov (United States)

    Zoerne, Roger

    2012-01-01

    Data analysis is a process of inspecting, cleaning, transforming, and modeling data to highlight useful information and suggest conclusions. Accurate timestamps and a timeline of vehicle events are needed to analyze flight data. By moving the timekeeping to the flight processor, there is no longer a need for a redundant time source. If each flight processor is initially synchronized to GPS, it can freewheel and maintain fairly accurate time throughout the flight with no additional GPS time messages received. However, additional GPS time messages will ensure even greater accuracy. When a timestamp is required, a gettime function is called that immediately reads the time-base register.
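
    A minimal sketch of this timekeeping scheme follows. The class and method names are hypothetical, and time.monotonic() stands in for the flight processor's time-base register: synchronize once to GPS, store the offset, then freewheel.

```python
import time

class TimeManager:
    def sync_to_gps(self, gps_seconds):
        # Record the offset between GPS time and the local time base.
        self.offset = gps_seconds - time.monotonic()

    def gettime(self):
        # Freewheel: read the time base and apply the stored offset.
        # Any later GPS message simply calls sync_to_gps again.
        return time.monotonic() + self.offset

tm = TimeManager()
tm.sync_to_gps(1_700_000_000.0)        # hypothetical GPS epoch seconds
time.sleep(0.1)
print(f"timestamp: {tm.gettime():.3f}")
```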

  11. Selective oxidation of cyclohexene through gold functionalized silica monolith microreactors

    Science.gov (United States)

    Alotaibi, Mohammed T.; Taylor, Martin J.; Liu, Dan; Beaumont, Simon K.; Kyriakou, Georgios

    2016-04-01

    Two simple, reproducible methods of preparing evenly distributed Au nanoparticle containing mesoporous silica monoliths are investigated. These Au nanoparticle containing monoliths are subsequently investigated as flow reactors for the selective oxidation of cyclohexene. In the first strategy, the silica monolith was directly impregnated with Au nanoparticles during the formation of the monolith. The second approach was to pre-functionalize the monolith with thiol groups tethered within the silica mesostructure. These can act as evenly distributed anchors for the Au nanoparticles to be incorporated by flowing a Au nanoparticle solution through the thiol functionalized monolith. Both methods led to successfully achieving even distribution of Au nanoparticles along the length of the monolith as demonstrated by ICP-OES. However, the impregnation method led to strong agglomeration of the Au nanoparticles during subsequent heating steps while the thiol anchoring procedure maintained the nanoparticles in the range of 6.8 ± 1.4 nm. Both Au nanoparticle containing monoliths as well as samples with no Au incorporated were tested for the selective oxidation of cyclohexene under constant flow at 30 °C. The Au free materials were found to be catalytically inactive with Au being the minimum necessary requirement for the reaction to proceed. The impregnated Au-containing monolith was found to be less active than the thiol functionalized Au-containing material, attributable to the low metal surface area of the Au nanoparticles. The reaction on the thiol functionalized Au-containing monolith was found to depend strongly on the type of oxidant used: tert-butyl hydroperoxide (TBHP) was more active than H2O2, likely due to the thiol induced hydrophobicity in the monolith.

  12. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures, including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improve statistics, as more particle tracks can be simulated in a short response time.
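
    The central requirement for parallel Monte Carlo, that each worker draw from an independent pseudo-random stream, can be illustrated without SPRNG or DCMT. The Python sketch below uses NumPy's SeedSequence.spawn as a stand-in to give each worker process its own disjoint stream; the "transport" is a trivial placeholder.

```python
import multiprocessing as mp
import numpy as np

def track_particles(seed_seq):
    rng = np.random.default_rng(seed_seq)    # independent per-worker stream
    steps = rng.exponential(scale=1.0, size=100_000)   # stand-in transport
    return steps.sum()

if __name__ == "__main__":
    streams = np.random.SeedSequence(2024).spawn(8)    # 8 disjoint streams
    with mp.Pool(8) as pool:
        totals = pool.map(track_particles, streams)
    print(f"mean path length: {sum(totals) / (8 * 100_000):.4f}")
```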

  13. Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

    Directory of Open Access Journals (Sweden)

    Abdul Kareem PARCHUR

    2012-08-01

    High performance is a critical requirement for all microprocessor manufacturers. The present paper describes a comparison of performance between two main Intel Xeon series of processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320; Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented on the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many key application areas in the modern generation. The scaling of performance across the two series has been analyzed using the performance numbers of 12 CPU2006 integer benchmarks, numbers that exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand the performance scaling in modern-generation processors.

  14. Real time image synthesis on a SIMD linear array processor: algorithms and architectures

    International Nuclear Information System (INIS)

    Letellier, Laurent

    1993-01-01

    Nowadays, image synthesis has become a widely used technique. The impressive computing power required for real time applications necessitates the use of parallel architectures. In this context, we evaluate an SIMD linear parallel architecture, SYMPATI2, dedicated to image processing. The objective of this study is to propose a cost-effective graphics accelerator relying on SYMPATI2's modular and programmable structure. The parallelization of basic image synthesis algorithms on SYMPATI2 enables us to determine its limits in this application field. These limits lead us to evaluate a new structure with a fast intercommunication network between processors, but processors have to support the message consistency, which brings about a strong decrease in performance. To solve this problem, we suggest a simple network whose access priorities are represented by tokens. The simulations of this new architecture indicate that the SIMD mode causes a drastic cut in parallelism. To cope with this drawback, we propose a context switching procedure which reduces the SIMD rigidity and increases the parallelism rate significantly. Then, the graphics accelerator we propose is compared with existing graphics workstations. This comparison indicates that our structure, which is able to accelerate both image synthesis and image processing, is competitive and well-suited for multimedia applications. (author) [fr

  15. Compilation Techniques Specific for a Hardware Cryptography-Embedded Multimedia Mobile Processor

    Directory of Open Access Journals (Sweden)

    Masa-aki FUKASE

    2007-12-01

    The development of single-chip VLSI processors is a key technology of ever-growing pervasive computing, answering overall demands for usability, mobility, speed, security, etc. We have so far developed a hardware cryptography-embedded multimedia mobile processor architecture, HCgorilla. Since HCgorilla integrates a wide range of techniques from architectures to applications and languages, a one-sided design approach is not always useful. HCgorilla needs a more complicated strategy, that is, hardware/software (H/S) codesign. Thus, we exploit the software support of HCgorilla, composed of a Java interface and parallelizing compilers. They are assumed to be installed in servers in order to reduce the load and increase the performance of HCgorilla-embedded clients. Since compilers are the essence of the software's responsibility, we focus in this article on our recent results about the design, specifications, and prototyping of parallelizing compilers for HCgorilla. The parallelizing compilers are composed of a multicore compiler and a LIW compiler. They are specified to abstract parallelism from executable serial codes or the Java interface output, and to output codes executable in parallel by HCgorilla. The prototype compilers are written in Java. An evaluation using an arithmetic test program shows that the prototype compilers compare reasonably with hand compilation.

  16. Kalman Filter Tracking on Parallel Architectures

    International Nuclear Information System (INIS)

    Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

    2016-01-01

    Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment
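
    The reason Kalman filtering maps well onto vector units is that the same predict/update arithmetic is applied to many track candidates independently. The following sketch (a scalar 1D filter, not the collaboration's code; noise parameters are invented) applies one predict/update step to a whole batch of tracks in lock-step with NumPy.

```python
import numpy as np

n_tracks = 10_000
x = np.zeros(n_tracks)                 # state estimate per track
P = np.ones(n_tracks)                  # variance per track
Q, R = 0.01, 0.5                       # process / measurement noise (assumed)

measurements = np.random.default_rng(0).normal(1.0, np.sqrt(R), n_tracks)
# Predict: state unchanged here, variance grows by the process noise.
P = P + Q
# Update: one gain, residual and correction for all tracks in lock-step,
# which is exactly the shape of work a vector unit executes efficiently.
K = P / (P + R)                        # Kalman gain
x = x + K * (measurements - x)
P = (1.0 - K) * P
print(f"mean estimate {x.mean():.3f}, mean variance {P.mean():.3f}")
```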

  17. Special purpose processors for high energy physics applications

    International Nuclear Information System (INIS)

    Verkerk, C.

    1978-01-01

    This review covers hardware processors ranging from very fast decision logic for the split field magnet facility at CERN to a point-finding processor used to relieve the data-acquisition minicomputer of the task of monitoring the SPS experiment. Block diagrams of a decision-making processor, a point-finding processor, a complanarity and opening-angle processor and a programmable track selector module are presented and discussed. The review also covers the applications of fully programmable but slower processors on the one hand, and of very fast programmable decision logic on the other.

  18. Safety characteristics of the monolithic CFC divertor

    International Nuclear Information System (INIS)

    Zucchetti, M.; Merola, M.; Matera, R.

    1994-01-01

    The main distinguishing feature of the monolithic CFC divertor is the use of a single material, a carbon fibre reinforced carbon, for the protective armour, the heat sink and the cooling channels. This removes joint interface problems which are one of the most important concerns related to the reference solutions of the ITER CDA divertor. An activation analysis of the different coolant options for this concept is presented. It turns out that neither short-term nor long-term activation are a concern for any coolants investigated. Therefore the proposed concept proves to be attractive from a safety stand-point also. ((orig.))

  19. Safety characteristics of the monolithic CFC divertor

    Science.gov (United States)

    Zucchetti, M.; Merola, M.; Matera, R.

    1994-09-01

    The main distinguishing feature of the monolithic CFC divertor is the use of a single material, a carbon fibre reinforced carbon, for the protective armour, the heat sink and the cooling channels. This removes joint interface problems which are one of the most important concerns related to the reference solutions of the ITER CDA divertor. An activation analysis of the different coolant options for this concept is presented. It turns out that neither short-term nor long-term activation are a concern for any coolants investigated. Therefore the proposed concept proves to be attractive from a safety stand-point also.

  20. A novel photocatalytic monolith reactor for multiphase heterogeneous photocatalysis

    NARCIS (Netherlands)

    Du, P.; Carneiro, J.T.; Moulijn, J.A.; Mul, Guido

    2008-01-01

    A novel reactor for multi-phase photocatalysis is presented, the so-called internally illuminated monolith reactor (IIMR). In the concept of the IIMR, side light emitting fibers are placed inside the channels of a ceramic monolith, equipped with a TiO2 photocatalyst coated on the wall of each

  1. Immobilisation of shredded soft waste in cement monolith

    International Nuclear Information System (INIS)

    Brown, D.J.; Dalton, M.J.; Smith, D.L.

    1983-04-01

    A grouting process for the immobilisation of shredded contaminated laboratory waste in a cement monolith is being developed at the Atomic Energy Establishment Winfrith. The objective is to produce a 'monolithic' type package which is acceptable both for sea and land disposal. The work carried out on this project in the period April 1982 - March 1983 is summarised in this report. (author)

  2. Fabrication of mesoporous polymer monolith: a template-free approach.

    Science.gov (United States)

    Okada, Keisuke; Nandi, Mahasweta; Maruyama, Jun; Oka, Tatsuya; Tsujimoto, Takashi; Kondoh, Katsuyoshi; Uyama, Hiroshi

    2011-07-14

    Mesoporous polyacrylonitrile (PAN) monolith has been fabricated by a template-free approach using the unique affinity of PAN towards a water/dimethyl sulfoxide (DMSO) mixture. A newly developed Thermally Induced Phase Separation Technique (TIPS) has been used to obtain the polymer monoliths and their microstructures have been controlled by optimizing the concentration and cooling temperature.

  3. Creating deep soil core monoliths: Beyond the solum

    Science.gov (United States)

    Soil monoliths serve as useful teaching aids in the study of the Earth’s critical zone where rock, soil, water, air, and organisms interact. Typical monolith preparation has so far been confined to the 1 to 2-m depth of the solum. Critical ecosystem services provided by soils include materials from ...

  4. A Monolithic Perovskite Structure for Use as a Magnetic Regenerator

    DEFF Research Database (Denmark)

    Pryds, Nini; Clemens, Frank; Menon, Mohan

    2011-01-01

    A La0.67Ca0.26Sr0.07Mn1.05O3 (LCSM) perovskite was prepared for the first time as a ceramic monolithic regenerator used in a regenerative magnetic refrigeration device. The parameters influencing the extrusion process and the performance of the regenerator, such as the nature of the monolith paste...

  5. Fine-grain concrete from mining waste for monolithic construction

    Science.gov (United States)

    Lesovik, R. V.; Ageeva, M. S.; Lesovik, G. A.; Sopin, D. M.; Kazlitina, O. V.; Mitrokhin, A. A.

    2018-03-01

    The technology of a monolithic construction is a well-established practice among most Russian real estate developers. The strong points of the technology are low cost of materials and lower demand for qualified workers. The monolithic construction uses various types of reinforced slabs and foamed concrete, since they are easy to use and highly durable; they also need practically no additional treatment.

  6. Energy Absorption of Monolithic and Fibre Reinforced Aluminium Cylinders

    NARCIS (Netherlands)

    De Kanter, J.L.C.G.

    2006-01-01

    Summary accompanying the thesis: Energy Absorption of Monolithic and Fibre Reinforced Aluminium Cylinders by Jens de Kanter This thesis presents the investigation of the crush behaviour of both monolithic aluminium cylinders and externally fibre reinforced aluminium cylinders. The research is based

  7. Media Presentation Synchronisation for Non-monolithic Rendering Architectures

    NARCIS (Netherlands)

    I. Vaishnavi (Ishan); D.C.A. Bulterman (Dick); P.S. Cesar Garcia (Pablo Santiago); B. Gao (Bo)

    2007-01-01

    Non-monolithic renderers are physically distributed media playback engines. Non-monolithic renderers may use a number of different underlying network connection types to transmit media items belonging to a presentation. There is therefore a need for a media based and inter-network-type

  8. Noise limitations in optical linear algebra processors.

    Science.gov (United States)

    Batsell, S G; Jong, T L; Walkup, J F; Krile, T F

    1990-05-10

    A general statistical noise model is presented for optical linear algebra processors. A statistical analysis which includes device noise, the multiplication process, and the addition operation is undertaken. We focus on those processes which are architecturally independent. Finally, experimental results which verify the analytical predictions are also presented.

  9. Cassava processors' awareness of occupational and environmental ...

    African Journals Online (AJOL)

    A larger percentage (74.5%) of the respondents indicated that the Agricultural Development Programme (ADP) is their source of information. The result also showed that processor's awareness of occupational hazards associated with the different stages of cassava processing vary because their involvement in these stages

  10. A high-speed analog neural processor

    NARCIS (Netherlands)

    Masa, P.; Masa, Peter; Hoen, Klaas; Hoen, Klaas; Wallinga, Hans

    1994-01-01

    Targeted at high-energy physics research applications, our special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds. The decision-making process of the implemented feedforward neural network enables this type of computation to tolerate weight

  11. Image processing with the Micron Automata Processor

    OpenAIRE

    Goyens, Frank

    2017-01-01

    This thesis investigates image-processing applications on the Micron Automata Processor hardware. The hardware is compared with popular present-day hardware. The study also contains useful information and strategies for developing new applications. Findings in this study include proof-of-concept algorithms and a practical application.

  12. 7 CFR 1215.14 - Processor.

    Science.gov (United States)

    2010-01-01

    ... 7 Agriculture 10 2010-01-01 2010-01-01 false Processor. 1215.14 Section 1215.14 Agriculture Regulations of the Department of Agriculture (Continued) AGRICULTURAL MARKETING SERVICE (MARKETING AGREEMENTS... CONSUMER INFORMATION Popcorn Promotion, Research, and Consumer Information Order Definitions § 1215.14...

  13. Simplifying cochlear implant speech processor fitting

    NARCIS (Netherlands)

    Willeboer, C.

    2008-01-01

    Conventional fittings of the speech processor of a cochlear implant (CI) rely to a large extent on the implant recipient's subjective responses. For each of the 22 intracochlear electrodes the recipient has to indicate the threshold level (T-level) and comfortable loudness level (C-level) while

  14. Space Station Water Processor Process Pump

    Science.gov (United States)

    Parker, David

    1995-01-01

    This report presents the results of the development program conducted under contract NAS8-38250-12 related to the International Space Station (ISS) Water Processor (WP) Process Pump. The results of the Process Pump evaluation conducted in this program indicate that further development is required in order to achieve the performance and life requirements for the ISS WP.

  15. Interleaved Subtask Scheduling on Multi Processor SOC

    NARCIS (Netherlands)

    Zhe, M.

    2006-01-01

    The ever-progressing semiconductor processing technique has integrated more and more embedded processors on a single system-on-a-chip (SoC). With such powerful SoC platforms, and also due to the stringent time-to-market deadlines, many functionalities which used to be implemented in ASICs are

  16. User manual Dieka PreProcessor

    NARCIS (Netherlands)

    Valkering, Kasper

    2000-01-01

    This is the user manual belonging to the Dieka-PreProcessor. This application was written by Wenhua Cao and revised and expanded by Kasper Valkering. The aim of this preprocessor is to enable drawing and meshing extrusion dies in ProEngineer, and doing the FE-calculation in Dieka. The preprocessor makes

  17. Globe hosts launch of new processor

    CERN Multimedia

    2006-01-01

    Launch of the quad-core processor chip at the Globe. On 14 November, in a series of major media events around the world, the chip-maker Intel launched its new 'quad-core' processor. For the regions of Europe, the Middle East and Africa, the day-long launch event took place in CERN's Globe of Science and Innovation, with over 30 journalists in attendance, coming from as far away as Johannesburg and Dubai. CERN was a significant choice for the event: the first tests of this new generation of processor in Europe had been made at CERN over the preceding months, as part of CERN openlab, a research partnership with leading IT companies such as Intel, HP and Oracle. The event also provided the opportunity for the journalists to visit ATLAS and the CERN Computer Centre. The strategy of putting multiple processor cores on the same chip, which has been pursued by Intel and other chip-makers in the last few years, represents an important departure from the more traditional improvements in the sheer speed of such chips. ...

  18. Synthesis of Porous Carbon Monoliths Using Hard Templates.

    Science.gov (United States)

    Klepel, Olaf; Danneberg, Nina; Dräger, Matti; Erlitz, Marcel; Taubert, Michael

    2016-03-21

    The preparation of porous carbon monoliths with a defined shape via template-assisted routes is reported. Monoliths made from porous concrete and zeolite were each used as the template. The porous concrete-derived carbon monoliths exhibited high gravimetric specific surface areas up to 2000 m²·g⁻¹. The pore system comprised macro-, meso-, and micropores. These pores were hierarchically arranged. The pore system was created by the complex interplay of the actions of both the template and the activating agent as well. On the other hand, zeolite-made template shapes allowed for the preparation of microporous carbon monoliths with a high volumetric specific surface area. This feature could be beneficial if carbon monoliths must be integrated into technical systems under space-limited conditions.

  19. An accurate projection algorithm for array processor based SPECT systems

    International Nuclear Information System (INIS)

    King, M.A.; Schwinger, R.B.; Cool, S.L.

    1985-01-01

    A data re-projection algorithm has been developed for use in single photon emission computed tomography (SPECT) on an array processor based computer system. The algorithm makes use of an accurate representation of pixel activity (uniform square pixel model of intensity distribution), and is rapidly performed due to the efficient handling of an array based algorithm and the Fast Fourier Transform (FFT) on parallel processing hardware. The algorithm consists of using a pixel driven nearest neighbour projection operation to an array of subdivided projection bins. This result is then convolved with the projected uniform square pixel distribution before being compressed to original bin size. This distribution varies with projection angle and is explicitly calculated. The FFT combined with a frequency space multiplication is used instead of a spatial convolution for more rapid execution. The new algorithm was tested against other commonly used projection algorithms by comparing the accuracy of projections of a simulated transverse section of the abdomen against analytically determined projections of that transverse section. The new algorithm was found to yield comparable or better standard error and yet result in easier and more efficient implementation on parallel hardware. Applications of the algorithm include iterative reconstruction and attenuation correction schemes and evaluation of regions of interest in dynamic and gated SPECT
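
    The key optimization, replacing the spatial convolution with a frequency-space multiplication, can be sketched in a few lines. The kernel below is a hypothetical stand-in for the projected uniform-square-pixel distribution, which in the algorithm itself is recalculated for each projection angle.

```python
import numpy as np

bins = np.zeros(256)
bins[100:140] = 1.0                    # nearest-neighbour projection counts
kernel = np.zeros(256)
kernel[:5] = 1.0 / 5.0                 # projected pixel footprint (assumed)

# Circular convolution via FFT: two forward transforms, one point-wise
# product, one inverse transform -- faster than a spatial convolution.
smoothed = np.fft.irfft(np.fft.rfft(bins) * np.fft.rfft(kernel), n=256)
print(smoothed[95:145].round(2))
```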

  20. Concurrent and Accurate Short Read Mapping on Multicore Processors.

    Science.gov (United States)

    Martínez, Héctor; Tárraga, Joaquín; Medina, Ignacio; Barrachina, Sergio; Castillo, Maribel; Dopazo, Joaquín; Quintana-Ortí, Enrique S

    2015-01-01

    We introduce a parallel aligner with a work-flow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, HPG Aligner SA (an open-source application available at http://www.opencb.org), exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads), as well as leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads. The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of HPG Aligner SA, on RNA reads of 100-400 nucleotides, which excels in execution time/sensitivity compared to state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice, and STAR.

  1. Data-parallel tomographic reconstruction : A comparison of filtered backprojection and direct Fourier reconstruction

    NARCIS (Netherlands)

    Roerdink, J.B.T.M.; Westenberg, M.A

    1998-01-01

    We consider the parallelization of two standard 2D reconstruction algorithms, filtered backprojection and direct Fourier reconstruction, using the data-parallel programming style. The algorithms are implemented on a Connection Machine CM-5 with 16 processors and a peak performance of 2 Gflop/s.

  2. An efficient implementation of a backpropagation learning algorithm on quadrics parallel supercomputer

    International Nuclear Information System (INIS)

    Taraglio, S.; Massaioli, F.

    1995-08-01

    A parallel implementation of a library to build and train Multi Layer Perceptrons via the Back Propagation algorithm is presented. The target machine is the SIMD massively parallel supercomputer Quadrics. Performance measures are provided on three different machines with different number of processors, for two network examples. A sample source code is given
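
    The kernel that such an implementation maps onto the SIMD machine is the dense-matrix arithmetic of one backpropagation step. The NumPy sketch below (one hidden layer; sizes and learning rate are illustrative, and it is not the library described in the record) shows the forward pass, the error backpropagation and the weight updates.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 8))                   # batch of 32 input vectors
t = rng.normal(size=(32, 1))                   # targets
W1, W2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 1))
lr = 0.01

h = np.tanh(X @ W1)                            # forward pass, hidden layer
y = h @ W2                                     # forward pass, output layer
delta2 = (y - t) / len(X)                      # output-layer error
delta1 = (delta2 @ W2.T) * (1.0 - h**2)        # backpropagated error
W2 -= lr * h.T @ delta2                        # gradient-descent updates
W1 -= lr * X.T @ delta1
print(f"batch MSE: {float(((y - t) ** 2).mean()):.4f}")
```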

  3. Parallel computing in plasma physics: Nonlinear instabilities

    International Nuclear Information System (INIS)

    Pohn, E.; Kamelander, G.; Shoucri, M.

    2000-01-01

    A Vlasov-Poisson system is used for studying the time evolution of the charge separation at a spatial one- as well as a two-dimensional plasma edge. Ions are advanced in time using the Vlasov equation. The whole three-dimensional velocity space is considered, leading to very time-consuming four- resp. five-dimensional fully kinetic simulations. In the 1D simulations electrons are assumed to behave adiabatically, i.e. they are Boltzmann-distributed, leading to a nonlinear Poisson equation. In the 2D simulations a gyro-kinetic approximation is used for the electrons. The plasma is assumed to be initially neutral. The simulations are performed on an equidistant grid. A constant time-step is used for advancing the density distribution function in time. The time evolution of the distribution function is performed using a splitting scheme: each dimension (x, y, v_x, v_y, v_z) of the phase space is advanced in time separately. The value of the distribution function for the next time is calculated from the value of an - in general - interstitial point at the present time (fractional shift). One-dimensional cubic-spline interpolation is used for calculating the interstitial function values. After the fractional shifts are performed for each dimension of the phase space, a whole time-step for advancing the distribution function is finished. Afterwards the charge density is calculated, the Poisson equation is solved and the electric field is calculated before the next time-step is performed. The fractional shift method sketched above was parallelized for p processors as follows. Considering first the shifts in the y-direction, a proper parallelization strategy is to split the grid into p disjoint v_z-slices, which are sub-grids, each containing a different 1/p-th part of the v_z range but the whole range of all other dimensions. Each processor is responsible for performing the y-shifts on a different slice, which can be done in parallel without any communication between
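
    The fractional-shift step for a single dimension can be sketched as follows. For simplicity the sketch uses linear interpolation (np.interp) with a periodic boundary in place of the cubic-spline interpolation used in the paper; the grid and the shift are invented.

```python
import numpy as np

nx = 128
x = np.linspace(0.0, 1.0, nx, endpoint=False)
f = np.exp(-200.0 * (x - 0.3) ** 2)            # density along one dimension
shift = 0.37 * (x[1] - x[0])                   # non-integer grid shift

# Fractional shift: the new value at each grid point is the old value at
# the interstitial point x - shift, obtained by interpolation.
f_new = np.interp((x - shift) % 1.0, x, f, period=1.0)
print(f"mass before {f.sum():.4f}, after {f_new.sum():.4f}")
```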

  4. Parallel embedded systems: where real-time and low-power meet

    DEFF Research Database (Denmark)

    Karakehayov, Zdravko; Guo, Yu

    2008-01-01

    This paper introduces a combination of models and proofs for optimal power management via Dynamic Frequency Scaling and Dynamic Voltage Scaling. The approach is suitable for systems on a chip or microcontrollers where processors run in parallel with embedded peripherals. We have developed a software tool, called CASTLE, to provide computer assistance in the design process of energy-aware embedded systems. The tool considers single processor and parallel architectures. An example shows an energy reduction of 23% when the tool allocates two microcontrollers for parallel execution.
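
    The trade-off CASTLE exploits can be illustrated with the usual first-order model in which dynamic power scales as C·V²·f. The sketch below uses invented capacitance, voltage and frequency values (so the computed saving differs from the 23% quoted above); it only shows why running two processors slower, at lower voltage, can cost less energy than one at full speed.

```python
C = 1e-9                                   # switched capacitance (assumed)

def energy(volts, freq_hz, cycles):
    power = C * volts**2 * freq_hz         # dynamic power: C * V^2 * f
    return power * (cycles / freq_hz)      # energy = power * runtime

work = 2e8                                 # cycles for the whole task
single = energy(3.3, 100e6, work)          # one MCU at full V and f
dual = 2 * energy(2.4, 50e6, work / 2)     # two MCUs, halved f, lower V
print(f"energy saving: {(1 - dual / single) * 100:.0f}%")
```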

  5. Eigensolution of finite element problems in a completely connected parallel architecture

    Science.gov (United States)

    Akl, Fred A.; Morel, Michael R.

    1989-01-01

    A parallel algorithm for the solution of the generalized eigenproblem in linear elastic finite element analysis, K·Φ = M·Φ·Ω, where K and M are of order N, and Ω is of order q, is presented. The parallel algorithm is based on a completely connected parallel architecture in which each processor is allowed to communicate with all other processors. The algorithm has been successfully implemented on a tightly coupled multiple-instruction-multiple-data (MIMD) parallel processing computer, Cray X-MP. A finite element model is divided into m domains each of which is assumed to process n elements. Each domain is then assigned to a processor, or to a logical processor (task) if the number of domains exceeds the number of physical processors. The macro-tasking library routines are used in mapping each domain to a user task. Computational speed-up and efficiency are used to determine the effectiveness of the algorithm. The effect of the number of domains, the number of degrees-of-freedom located along the global fronts and the dimension of the subspace on the performance of the algorithm are investigated. For a 64-element rectangular plate, speed-ups of 1.86, 3.13, 3.18 and 3.61 are achieved on two, four, six and eight processors, respectively.

  6. Growth techniques for monolithic YBCO solenoidal magnets

    International Nuclear Information System (INIS)

    Scruggs, S.J.; Putman, P.T.; Fang, H.; Alessandrini, M.; Salama, K.

    2006-01-01

    The possibility of growing large single-domain YBCO solenoids by the use of a large seed has been investigated. There are two known methods for producing a similar solenoid. The first is a conventional top-seeded melt growth process followed by a post-processing machining step to create the bore. The second involves using multiple seeds spaced around the magnet bore. The appeal of the new technique lies in decreasing processing time compared to the single-seed technique, while avoiding the alignment problems found in the multiple-seeding technique. By avoiding these problems, larger diameter monoliths can be produced. Large diameter monoliths are beneficial because the maximum magnetic field produced by a trapped field magnet is proportional to the radius of the sample. Furthermore, the availability of trapped field magnets with large diameter could enable their use in applications that traditionally have been considered to require wound electromagnets, such as beam bending magnets for particle accelerators or electric propulsion. A comparison is made of YBCO solenoids grown by the use of a large seed and by the use of two small seeds simulating multiple seeding. Trapped field measurements as well as microstructure evaluation were used in the characterization of each solenoid. Results indicate that high quality growth occurs only in the vicinity of the seeds for the multiple-seeded sample, while the sample with the large seed exhibited high quality growth throughout the entire sample

  7. Growth techniques for monolithic YBCO solenoidal magnets

    Energy Technology Data Exchange (ETDEWEB)

    Scruggs, S.J. [Texas Center for Superconductivity at University of Houston, 4800 Calhoun, Houston, TX 77204 (United States)]. E-mail: Sscruggs2@uh.edu; Putman, P.T. [Texas Center for Superconductivity at University of Houston, 4800 Calhoun, Houston, TX 77204 (United States); Fang, H. [Texas Center for Superconductivity at University of Houston, 4800 Calhoun, Houston, TX 77204 (United States); Alessandrini, M. [Texas Center for Superconductivity at University of Houston, 4800 Calhoun, Houston, TX 77204 (United States); Salama, K. [Texas Center for Superconductivity at University of Houston, 4800 Calhoun, Houston, TX 77204 (United States)

    2006-10-01

    The possibility of growing large single-domain YBCO solenoids by the use of a large seed has been investigated. There are two known methods for producing a similar solenoid. The first is a conventional top-seeded melt growth process followed by a post-processing machining step to create the bore. The second involves using multiple seeds spaced around the magnet bore. The appeal of the new technique lies in decreasing processing time compared to the single-seed technique, while avoiding the alignment problems found in the multiple-seeding technique. By avoiding these problems, larger diameter monoliths can be produced. Large diameter monoliths are beneficial because the maximum magnetic field produced by a trapped field magnet is proportional to the radius of the sample. Furthermore, the availability of trapped field magnets with large diameter could enable their use in applications that traditionally have been considered to require wound electromagnets, such as beam bending magnets for particle accelerators or electric propulsion. A comparison is made of YBCO solenoids grown by the use of a large seed and by the use of two small seeds simulating multiple seeding. Trapped field measurements as well as microstructure evaluation were used in the characterization of each solenoid. Results indicate that high quality growth occurs only in the vicinity of the seeds for the multiple-seeded sample, while the sample with the large seed exhibited high quality growth throughout the entire sample.

  8. Data collection from FASTBUS to a DEC UNIBUS processor through the UNIBUS-Processor Interface

    International Nuclear Information System (INIS)

    Larwill, M.; Barsotti, E.; Lesny, D.; Pordes, R.

    1983-01-01

    This paper describes the use of the UNIBUS Processor Interface (UPI), an interface between FASTBUS and the Digital Equipment Corporation UNIBUS. The UPI was developed by Fermilab and the University of Illinois. Details of the use of this interface in a high energy physics experiment at Fermilab are given. The paper includes a discussion of the operation of the UPI on the UNIBUS of a VAX-11, and plans for using the UPI to perform data acquisition from FASTBUS to a VAX-11 processor.

  9. XOP: A second generation fast processor for on-line use in high energy physics experiments

    International Nuclear Information System (INIS)

    Lingjaerde, T.

    1981-01-01

    Processors for trigger calculations and data compression in high energy physics are characterized by a high data input capability combined with fast execution of relatively simple routines. In order to achieve the required performance it is advantageous to replace the classical computer instruction set by microcoded instructions, the various fields of which control the internal subunits in parallel. The fast processor called ESOP is based on such a principle: the different operations are handled step by step by dedicated optimized modules under control of a central instruction unit. Thus, the arithmetic operations, address calculations, conditional checking, loop counts and next-instruction evaluation all overlap in time. Based upon the experience from ESOP, the architecture of a new processor, XOP, is taking shape, which will be faster and easier to use. In this context the most important innovations are: easy handling of operands in the arithmetic unit by means of three data buses and large data files, a powerful data addressing unit for easy handling of vectors as well as single operands, and a very flexible logic for conditional branching. Input/output will be made transparent through the introduction of internal fast processors, which will be used in conjunction with powerful firmware as a software debugging aid. (orig.)
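
    The principle described, a wide microinstruction whose fields drive the subunits concurrently, can be pictured as one horizontal control word per cycle. A minimal sketch of decoding such a word (the field names and widths are invented for illustration and are not ESOP's or XOP's actual format):

        # Hypothetical 16-bit horizontal microinstruction layout:
        #   [15:12] ALU op | [11:8] address calc | [7:4] condition | [3:0] sequencer
        FIELDS = {"alu": (12, 0xF), "addr": (8, 0xF), "cond": (4, 0xF), "seq": (0, 0xF)}

        def decode(word):
            """Split one control word into per-subunit fields; in hardware all
            four subunits act on their fields in the same cycle, which is how
            arithmetic, addressing, condition checks and sequencing overlap."""
            return {name: (word >> shift) & mask
                    for name, (shift, mask) in FIELDS.items()}

        print(decode(0x3A51))  # {'alu': 3, 'addr': 10, 'cond': 5, 'seq': 1}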

  10. Array processors based on Gaussian fraction-free method

    Energy Technology Data Exchange (ETDEWEB)

    Peng, S; Sedukhin, S [Aizu Univ., Aizuwakamatsu, Fukushima (Japan); Sedukhin, I

    1998-03-01

    The design of algorithmic array processors for solving linear systems of equations using the fraction-free Gaussian elimination method is presented. The design is based on a formal approach which constructs a family of planar array processors systematically. These array processors are synthesized and analyzed. It is shown that some array processors are optimal in the framework of linear allocation of computations and in terms of the number of processing elements and the computing time. (author)
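
    Fraction-free (Bareiss) elimination keeps every intermediate value an exact integer by folding the previous pivot into each update, which is what makes the method attractive for fixed-word-length processing elements. A minimal single-processor sketch of the recurrence that such an array would evaluate in parallel (an illustration of the method, not the authors' systolic design; pivoting is omitted):

        def bareiss_eliminate(a):
            """Fraction-free Gaussian elimination (Bareiss) on an augmented
            integer matrix a (n rows, n+1 columns). The divisions below are
            exact, so every intermediate stays an integer; nonzero pivots
            are assumed (production code would pivot rows)."""
            n = len(a)
            prev = 1
            for k in range(n - 1):
                for i in range(k + 1, n):
                    for j in range(k + 1, n + 1):
                        a[i][j] = (a[i][j] * a[k][k] - a[i][k] * a[k][j]) // prev
                    a[i][k] = 0
                prev = a[k][k]
            return a  # upper triangular; back-substitution gives the solution

        m = [[ 2,  1, 1,  5],
             [ 4, -6, 0, -2],
             [-2,  7, 2,  9]]
        print(bareiss_eliminate(m))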

  11. Custom Hardware Processor to Compute a Figure of Merit for the Fit of X-Ray Diffraction

    International Nuclear Information System (INIS)

    Gomez-Pulido, P.J.A.; Vega-Rodriguez, M.A.; Sanchez-Perez, J.M.; Sanchez-Bajo, F.; Santos, S.P.D.

    2008-01-01

    A custom processor based on reconfigurable hardware technology is proposed in order to compute the figure of merit used to measure the quality of the fit of X-ray diffraction peaks. As experimental X-ray profiles can present many severely overlapped peaks, it is necessary to select the best model among a large set of reasonably good solutions. Determining the best solution is computationally intensive, because this is a hard combinatorial optimization problem. The proposed processors, working in parallel, increase the performance relative to a software implementation.
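
    A typical figure of merit for profile fitting is a weighted sum of squared residuals between the observed and calculated profiles, evaluated once per candidate model; the combinatorial search then keeps the lowest-scoring model. A minimal sketch of such a merit function, assuming a generic pseudo-Voigt peak shape and Poisson-style weighting (both are illustrative choices, not the record's exact definition):

        import numpy as np

        def pseudo_voigt(two_theta, center, fwhm, eta, intensity):
            """Generic pseudo-Voigt peak: a weighted mix of Gaussian and
            Lorentzian shapes, commonly used for diffraction profiles."""
            x = (two_theta - center) / (fwhm / 2.0)
            gauss = np.exp(-np.log(2.0) * x**2)
            lorentz = 1.0 / (1.0 + x**2)
            return intensity * (eta * lorentz + (1.0 - eta) * gauss)

        def figure_of_merit(observed, two_theta, peaks):
            """Weighted sum of squared residuals; lower means a better fit."""
            calc = sum(pseudo_voigt(two_theta, *p) for p in peaks)
            weights = 1.0 / np.maximum(observed, 1.0)   # Poisson-style weights
            return float(np.sum(weights * (observed - calc) ** 2))

        # Score one candidate model: two severely overlapped peaks near 30 deg.
        tt = np.linspace(28.0, 32.0, 400)
        obs = (pseudo_voigt(tt, 29.9, 0.25, 0.5, 900)
               + pseudo_voigt(tt, 30.2, 0.30, 0.5, 600))
        print(figure_of_merit(obs, tt, [(29.9, 0.25, 0.5, 900),
                                        (30.2, 0.30, 0.5, 600)]))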

  12. Parallel algorithms and architecture for computation of manipulator forward dynamics

    Science.gov (United States)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n exp 2), and the O(n exp 3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class NC and that the time and processor bounds are of O((log n) exp 2) and O(n exp 4), respectively. However, the fastest stable parallel algorithms achieve a computation time of O(n) and can be derived by parallelization of the O(n exp 3) serial algorithms. Parallel computation of the O(n exp 3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders of magnitude in the computation of the forward dynamics.
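
    The O(n exp 3) route that is parallelized here can be written compactly: inverse dynamics with zero acceleration gives the bias torques, unit accelerations assemble the inertia matrix column by column, and a Cholesky decomposition with two triangular solves yields the joint accelerations. A minimal dense-algebra sketch of that composition (the Newton-Euler routine is supplied by the caller; the toy pendulum check is illustrative):

        import numpy as np
        from scipy.linalg import cho_factor, cho_solve

        def forward_dynamics(q, qd, tau, inverse_dynamics):
            """Solve M(q) * qdd = tau - b(q, qd) for joint accelerations.
            `inverse_dynamics(q, qd, qdd)` returns the torques producing a
            given motion (e.g. a Newton-Euler routine from the caller)."""
            n = len(q)
            zero = np.zeros(n)
            b = inverse_dynamics(q, qd, zero)    # bias: zero-acceleration torques
            g = inverse_dynamics(q, zero, zero)  # gravity-only reference
            # Inertia matrix, one column per unit joint acceleration.
            M = np.column_stack([inverse_dynamics(q, zero, e) - g
                                 for e in np.eye(n)])
            # Symmetric positive definite solve: Cholesky plus two triangular
            # solves, the step pipelined by the triangular processor array.
            return cho_solve(cho_factor(M), tau - b)

        # Toy check: 1-DOF pendulum with m*l^2 = 2 and gravity torque 9.81*sin(q).
        toy_id = lambda q, qd, qdd: np.array([2.0 * qdd[0] + 9.81 * np.sin(q[0])])
        print(forward_dynamics(np.array([0.5]), np.zeros(1), np.array([1.0]), toy_id))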

  13. Parallelization of simulation code for liquid-gas model of lattice-gas fluid

    International Nuclear Information System (INIS)

    Kawai, Wataru; Ebihara, Kenichi; Kume, Etsuo; Watanabe, Tadashi

    2000-03-01

    A simulation code for hydrodynamical phenomena based on the liquid-gas model of lattice-gas fluid is parallelized by using the MPI (Message Passing Interface) library. The parallelized code can be applied to larger simulations than the non-parallelized code. The calculation times of the parallelized code on the VPP500 (a vector-parallel supercomputer with distributed memory), the AP3000 (a scalar-parallel server with distributed memory), and a workstation cluster decreased in inverse proportion to the number of processors. (author)
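
    The parallelization described, splitting the lattice among MPI processes, is typically a one-dimensional domain decomposition with a ghost-row (halo) exchange each time step. A minimal mpi4py sketch of that pattern (the site update is a placeholder, not the liquid-gas collision and propagation rules; the lattice height is assumed divisible by the process count):

        from mpi4py import MPI
        import numpy as np

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        NX, NY, STEPS = 512, 512, 100            # NX assumed divisible by size
        local = np.zeros((NX // size + 2, NY))   # two extra ghost rows
        up, down = (rank - 1) % size, (rank + 1) % size  # periodic neighbours

        for _ in range(STEPS):
            # Halo exchange: send boundary rows, receive into ghost rows.
            comm.Sendrecv(local[1], dest=up, recvbuf=local[-1], source=down)
            comm.Sendrecv(local[-2], dest=down, recvbuf=local[0], source=up)
            # Placeholder update; a lattice-gas code would apply propagation
            # and collision rules to the interior rows here.
            local[1:-1] = 0.5 * (local[:-2] + local[2:])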

  14. Fast robot kinematics modeling by using a parallel simulator (PSIM)

    International Nuclear Information System (INIS)

    El-Gazzar, H.M.; Ayad, N.M.A.

    2002-01-01

    High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and the safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure, such as the kinematic modeling of a robot manipulator. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods for solving the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (the Stanford Arm). Execution time comparisons between von Neumann (uniprocessor) and parallel processor architectures, obtained using the parallel simulator package (PSIM), are presented. The results favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the most convenient and optimum number of processors

  15. Fast robot kinematics modeling by using a parallel simulator (PSIM)

    Energy Technology Data Exchange (ETDEWEB)

    El-Gazzar, H M; Ayad, N M.A. [Atomic Energy Authority, Reactor Dept., Computer and Control Lab., P.O. Box no 13759 (Egypt)

    2002-09-15

    High-speed computers are strongly needed not only for solving scientific and engineering problems, but also for numerous industrial applications. Such applications include computer-aided design, oil exploration, weather prediction, space applications and the safety of nuclear reactors. The rapid development in VLSI technology makes it possible to implement time-consuming algorithms in real-time situations. Parallel processing approaches can now be used to reduce the processing time for models of very high mathematical structure, such as the kinematic modeling of a robot manipulator. This system is used to construct and evaluate the performance and cost effectiveness of several proposed methods for solving the Jacobian algorithm. Parallelism is introduced to the algorithms by using different task allocations and dividing the whole job into subtasks. Detailed analysis is performed and results are obtained for the case of a six-DOF (degree of freedom) robot arm (the Stanford Arm). Execution time comparisons between von Neumann (uniprocessor) and parallel processor architectures, obtained using the parallel simulator package (PSIM), are presented. The results favour the parallel techniques, with improvements of at least fifty percent. Further studies are needed to determine the most convenient and optimum number of processors.
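
    The task allocation studied in these two records maps naturally onto the structure of the manipulator Jacobian: each column depends only on the pose of the joints preceding it, so the columns can be computed as independent subtasks. A minimal sketch for a serial arm with revolute joints (the geometry below is a toy two-joint example; a parallel implementation would distribute the column loop across processors):

        import numpy as np

        def jacobian_columns(origins, z_axes, p_end):
            """Geometric Jacobian of a serial arm with revolute joints.
            origins[i], z_axes[i]: position and z-axis of joint frame i in
            the base frame; p_end: end-effector position. Column i is
            independent of column j != i, so each is a parallel subtask."""
            cols = []
            for o, z in zip(origins, z_axes):
                linear = np.cross(z, p_end - o)   # tip velocity from joint i
                cols.append(np.concatenate([linear, z]))
            return np.column_stack(cols)          # 6 x n Jacobian

        # Toy 2-joint planar arm: both joint axes along z, links along x.
        J = jacobian_columns(
            origins=[np.zeros(3), np.array([1.0, 0.0, 0.0])],
            z_axes=[np.array([0.0, 0.0, 1.0])] * 2,
            p_end=np.array([2.0, 0.0, 0.0]),
        )
        print(J)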

  16. Rubus: A compiler for seamless and extensible parallelism.

    Directory of Open Access Journals (Sweden)

    Muhammad Adnan

    Nowadays, a typical processor may have multiple processing cores on a single chip. Furthermore, a special purpose processing unit called a Graphics Processing Unit (GPU), originally designed for 2D/3D games, is now available for general purpose use in computers and mobile devices. However, traditional programming languages, which were designed to work with machines having single-core CPUs, cannot utilize the parallelism available on multi-core processors efficiently. Therefore, to exploit the extraordinary processing power of multi-core processors, researchers are working on new tools and techniques to facilitate parallel programming. To this end, languages like CUDA and OpenCL have been introduced, which can be used to write code with parallelism. The main shortcoming of these languages is that the programmer needs to specify all the complex details manually in order to parallelize the code across multiple cores. Therefore, code written in these languages is difficult to understand, debug and maintain. Furthermore, parallelizing legacy code can require rewriting a significant portion of it in CUDA or OpenCL, which can consume significant time and resources. Thus, the amount of parallelism achieved is proportional to the skills of the programmer and the time spent on code optimizations. This paper proposes a new open source compiler, Rubus, to achieve seamless parallelism. The Rubus compiler relieves the programmer from manually specifying the low-level details. It analyses and transforms a sequential program into a parallel program automatically, without any user intervention. This achieves massive speedup and better utilization of the underlying hardware without requiring a programmer's expertise in parallel programming. For five different benchmarks, an average speedup of 34.54 times has been achieved by Rubus as compared to Java on a basic GPU having only 96 cores, whereas for a matrix multiplication benchmark the average execution speedup of 84…
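
    The contrast drawn in the abstract, manual low-level parallelization versus compiler-driven parallelization, is easiest to see on a loop whose iterations are independent, which is exactly the property such a compiler must establish. A minimal sketch of the rewrite, assuming a hypothetical element-wise kernel (illustrative only; the record's compiler performs an equivalent transformation automatically):

        from concurrent.futures import ProcessPoolExecutor

        def step(x):
            """One loop iteration, independent of every other iteration,
            the property a parallelizing compiler must prove before it
            rewrites the loop."""
            return x * x + 1.0

        if __name__ == "__main__":
            data = range(100_000)
            # Sequential form: what the programmer writes.
            seq = [step(x) for x in data]
            # Manually parallelized form: the rewrite that CUDA/OpenCL-style
            # programming demands of the programmer, and that a parallelizing
            # compiler can emit automatically.
            with ProcessPoolExecutor() as pool:
                par = list(pool.map(step, data, chunksize=5_000))
            assert par == seq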

  17. A task parallel implementation of fast multipole methods

    KAUST Repository

    Taura, Kenjiro

    2012-11-01

    This paper describes a task parallel implementation of ExaFMM, an open source implementation of fast multipole methods (FMM), using the lightweight task parallel library MassiveThreads. Although there have been many attempts at parallelizing FMM, experience has almost exclusively been limited to formulations based on flat homogeneous parallel loops. FMM in fact contains operations that cannot be readily expressed in such conventional but restrictive models. We show that task parallelism, or parallel recursion in particular, allows us to parallelize all operations of FMM naturally and scalably. Moreover, it allows us to parallelize a ''mutual interaction'' for force/potential evaluation, which is roughly twice as efficient as the more conventional, unidirectional force/potential evaluation. The net result is an open source FMM that is clearly among the fastest single node implementations, including those on GPUs; with a million particles on a 32-core Sandy Bridge 2.20 GHz node, it completes a single time step including tree construction and force/potential evaluation in 65 milliseconds. The study clearly showcases both the programmability and performance benefits of flexible parallel constructs over more monolithic parallel loops. © 2012 IEEE.
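
    The parallel recursion advocated here is, at bottom, a fork-join over the tree: each cell spawns tasks for its children and joins their results, which expresses FMM's irregular traversals more naturally than a flat parallel loop. A minimal fork-join sketch of that pattern (a plain tree walk with a depth cutoff standing in for an FMM phase; this is not ExaFMM's actual code):

        from concurrent.futures import ThreadPoolExecutor

        class Cell:
            """A node of the spatial tree: a leaf with bodies, or an
            interior cell with children."""
            def __init__(self, bodies=(), children=()):
                self.bodies, self.children = list(bodies), list(children)

        def evaluate(cell, pool, depth=0, cutoff=3):
            """Fork-join traversal: recurse on children as parallel tasks
            near the root, sequentially below the cutoff (task overhead
            dominates for small subtrees). Returns a stand-in work count."""
            if not cell.children:
                return len(cell.bodies)        # leaf kernel (P2P in FMM terms)
            if depth >= cutoff:
                return sum(evaluate(c, pool, depth + 1, cutoff)
                           for c in cell.children)
            futures = [pool.submit(evaluate, c, pool, depth + 1, cutoff)
                       for c in cell.children]      # fork one task per child
            return sum(f.result() for f in futures)  # join

        # Tiny balanced tree: four leaves of 10 bodies each.
        leaves = [Cell(bodies=range(10)) for _ in range(4)]
        root = Cell(children=[Cell(children=leaves[:2]),
                              Cell(children=leaves[2:])])
        with ThreadPoolExecutor(max_workers=16) as pool:
            print(evaluate(root, pool))        # 40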

  18. Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Nelson, Michael

    2008-01-01

    … about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs.
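
    Prefix sums is the simplest of the building blocks named above, and its cache-friendly parallel form is a two-pass scheme: each processor reduces its private block, an exclusive scan over the p block sums provides per-block offsets, and a second pass rescans each block. A minimal sketch of that two-pass structure (written sequentially but partitioned the way the parallel algorithm would be; the blocks stand in for per-processor private caches):

        def prefix_sums(values, p):
            """Two-pass block prefix sums: pass 1 reduces each of the p
            blocks, an exclusive scan of the p partial sums yields per-block
            offsets, and pass 2 rescans each block with its offset. In the
            PEM setting each block lives in one processor's private cache,
            so both passes run in parallel across processors."""
            n = len(values)
            bounds = [n * i // p for i in range(p + 1)]
            block_sums = [sum(values[bounds[i]:bounds[i + 1]])
                          for i in range(p)]          # pass 1: local reduce

            offsets, running = [], 0                  # scan of the p partials
            for s in block_sums:
                offsets.append(running)
                running += s

            out = [0] * n                             # pass 2: local rescans
            for i in range(p):
                acc = offsets[i]
                for j in range(bounds[i], bounds[i + 1]):
                    acc += values[j]
                    out[j] = acc
            return out

        print(prefix_sums([1, 2, 3, 4, 5, 6, 7, 8], p=4))
        # [1, 3, 6, 10, 15, 21, 28, 36]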

  19. Parallel discrete-event simulation of FCFS stochastic queueing networks

    Science.gov (United States)

    Nicol, David M.

    1988-01-01

    Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments) which has proven effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations; performance data are given that demonstrate the method's effectiveness under moderate to heavy loads; and performance tradeoffs between the quality of lookahead and the cost of computing it are discussed.
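
    For an FCFS server, lookahead has a concrete form: since service is non-preemptive, the server can bound its next departure time from the job in service (or, when idle, from the minimum service demand of any future arrival), and a neighboring processor may safely simulate up to that appointment. A minimal sketch of such a bound (illustrative; the appointments protocol adds the surrounding synchronization):

        def fcfs_lookahead(now, remaining_service, min_service):
            """Lower bound on this FCFS server's next departure time.
            Busy server: the in-service job departs first, after its
            remaining service time. Idle server: any future arrival still
            needs at least min_service before it can depart. A neighbour
            can be promised (an 'appointment') that no message will arrive
            before this time, and may simulate safely up to it."""
            if remaining_service is not None:   # busy: in-service job leads
                return now + remaining_service
            return now + min_service            # idle: arrival plus service

        print(fcfs_lookahead(now=10.0, remaining_service=2.5, min_service=0.5))
        # 12.5
        print(fcfs_lookahead(now=10.0, remaining_service=None, min_service=0.5))
        # 10.5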

  20. Lipsi: Probably the Smallest Processor in the World

    DEFF Research Database (Denmark)

    Schoeberl, Martin

    2018-01-01

    While research on high-performance processors is important, it is also interesting to explore processor architectures at the other end of the spectrum: tiny processor cores for auxiliary functions. While it is common to implement small circuits for such functions, such as a serial port, in dedicated … at a minimal cost.
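
    What a tiny auxiliary core amounts to in practice is a minimal accumulator machine: one register, a small memory, and a handful of instructions driving a memory-mapped peripheral. A minimal interpreter sketch of such a machine (a hypothetical instruction set for illustration, not Lipsi's actual one):

        def run(program, io):
            """Interpreter for a hypothetical 8-bit accumulator machine in
            the spirit of tiny auxiliary cores (not Lipsi's actual ISA).
            Each instruction is (opcode, operand); `io` models a memory-
            mapped peripheral such as a serial port."""
            acc, pc, mem = 0, 0, [0] * 256
            while pc < len(program):
                op, arg = program[pc]
                pc += 1
                if   op == "LDI": acc = arg & 0xFF              # load immediate
                elif op == "LD":  acc = mem[arg]                # load from memory
                elif op == "ST":  mem[arg] = acc                # store to memory
                elif op == "ADD": acc = (acc + mem[arg]) & 0xFF # 8-bit add
                elif op == "OUT": io.append(acc)                # write peripheral
                elif op == "BRZ": pc = arg if acc == 0 else pc  # branch if zero
            return acc

        out = []
        run([("LDI", 3), ("ST", 0), ("LDI", 4), ("ADD", 0), ("OUT", 0)], out)
        print(out)  # [7]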