WorldWideScience

Sample records for distributed-memory multiprocessor computers

  1. Multiprocessor shared-memory information exchange

    International Nuclear Information System (INIS)

    Santoline, L.L.; Bowers, M.D.; Crew, A.W.; Roslund, C.J.; Ghrist, W.D. III

    1989-01-01

    In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically, the protocol allows multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.
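    The abstract does not give the buffer mechanics, but the heart of a master-slave shared-memory exchange is easy to sketch. The C fragment below is a hedged illustration only: the buffer count, field names, and lock-free publish/claim handshake are assumptions chosen for exposition, not the published MSMIE design (which additionally guarantees a deterministic cycle time and manages buffer reuse).

        /* Hypothetical sketch of a master-slave shared-memory exchange in the
         * spirit of MSMIE; buffer count and field names are assumptions.
         * 'newest' must be initialised to -1 before use; deciding which
         * scratch buffer the slave may safely refill is left out. */
        #include <stdatomic.h>
        #include <stdint.h>
        #include <string.h>

        #define NBUF 3                /* one filling, one being read, one spare */
        #define BUFLEN 256

        typedef struct {
            uint8_t data[NBUF][BUFLEN];
            _Atomic int newest;       /* last completed buffer, -1 if none */
        } shm_region;

        /* Slave: assemble a buffer in 'scratch', then publish it atomically. */
        void slave_publish(shm_region *shm, int scratch,
                           const uint8_t *msg, size_t n)
        {
            memcpy(shm->data[scratch], msg, n);
            atomic_store_explicit(&shm->newest, scratch, memory_order_release);
        }

        /* Master: claim the newest buffer; returns its index, or -1 if none. */
        int master_claim(shm_region *shm)
        {
            return atomic_exchange_explicit(&shm->newest, -1,
                                            memory_order_acquire);
        }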

  2. Distributed parallel messaging for multiprocessor systems

    Science.gov (United States)

    Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

    2013-06-04

    A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.

  3. A multiprocessor computer simulation model employing a feedback scheduler/allocator for memory space and bandwidth matching and TMR processing

    Science.gov (United States)

    Bradley, D. B.; Irwin, J. D.

    1974-01-01

    A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching a multiprocessor's memory space, memory bandwidth, and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (triple modular redundant) and SIMPLEX (non-redundant) jobs.

  4. Ring interconnection for distributed memory automation and computing system

    Energy Technology Data Exchange (ETDEWEB)

    Vinogradov, V I [Inst. for Nuclear Research of the Russian Academy of Sciences, Moscow (Russian Federation)]

    1996-12-31

    Problems in the development of measurement, acquisition and central systems based on distributed memory and a ring interface are discussed. It has been found that a RAM LINK-type protocol can be used for ringlet links in a non-symmetrical distributed-memory multiprocessor system architecture. 5 refs.

  5. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-05-01

    This paper describes the present and expected future development of distributed-memory parallel computers for high energy physics experiments. It covers the use of event-parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXes. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point-to-point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs

  6. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    International Nuclear Information System (INIS)

    Nash, T.

    1989-01-01

    This paper describes the present and expected future development of distributed-memory parallel computers for high energy physics experiments. It covers the use of event-parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXes. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point-to-point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)

  7. Event parallelism: Distributed memory parallel computing for high energy physics experiments

    Science.gov (United States)

    Nash, Thomas

    1989-12-01

    This paper describes the present and expected future development of distributed-memory parallel computers for high energy physics experiments. It covers the use of event-parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXes. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point-to-point, rather than bussed, communication will be required. Developments in this direction are described.

  8. Assessing Programming Costs of Explicit Memory Localization on a Large Scale Shared Memory Multiprocessor

    Directory of Open Access Journals (Sweden)

    Silvio Picano

    1992-01-01

    We present detailed experimental work involving a commercially available large-scale shared-memory multiple instruction stream-multiple data stream (MIMD) parallel computer having a software-controlled cache coherence mechanism. To make effective use of such an architecture, the programmer is responsible for designing the program's structure to match the underlying multiprocessor's capabilities. We describe the techniques used to exploit our multiprocessor (the BBN TC2000) on a network simulation program, showing the resulting performance gains and the associated programming costs. We show that an efficient implementation relies heavily on the user's ability to explicitly manage the memory system.

  9. Multigrid solution of diffusion equations on distributed memory multiprocessor systems

    International Nuclear Information System (INIS)

    Finnemann, H.

    1988-01-01

    The subject is the solution of partial differential equations for simulation of the reactor core on high-performance computers. The parallelization and implementation of nodal multigrid diffusion algorithms on array and ring configurations of the DIRMU multiprocessor system is outlined. The particular iteration scheme employed in the nodal expansion method appears similarly efficient in serial and parallel environments. The combination of modern multi-level techniques with innovative hardware (vector-multiprocessor systems) provides powerful tools needed for real-time simulation of physical systems. The parallel efficiencies range from 70 to 90%. The same performance is estimated for large problems on large multiprocessor systems being designed at present. (orig.)

  10. Communication and Memory Architecture Design of Application-Specific High-End Multiprocessors

    Directory of Open Access Journals (Sweden)

    Yahya Jan

    2012-01-01

    This paper is devoted to the design of communication and memory architectures of massively parallel hardware multiprocessors necessary for the implementation of highly demanding applications. We demonstrated that for massively parallel hardware multiprocessors the traditionally used flat communication architectures and multi-port memories do not scale well, and the memory and communication network influence on both the throughput and circuit area dominates the processors' influence. To resolve these problems and ensure scalability, we proposed to design highly optimized application-specific hierarchical and/or partitioned communication and memory architectures through exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we proposed some data distribution and related data mapping schemes in the shared (global) partitioned memories with the aim of eliminating memory access conflicts, as well as ensuring that our communication design strategies will be applicable. We incorporated these architecture synthesis strategies into our quality-driven model-based multiprocessor design method and related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate many important features of the synthesized memory and communication architectures. They also demonstrate that our method and related framework are able to efficiently synthesize well-scalable memory and communication architectures even for high-end multiprocessors. Gains as high as 12 times in performance and 25 times in area can be obtained when using hierarchical communication networks instead of flat networks. However, for high parallelism levels only the partitioned approach ensures scalability in performance.

  11. GOTHIC memory management : a multiprocessor shared single level store

    OpenAIRE

    Michel , Béatrice

    1990-01-01

    Gothic's purpose is to build an object-oriented fault-tolerant distributed operating system for a local area network of multiprocessor workstations. This paper describes the Gothic memory manager. It realizes the sharing of the secondary memory space between any processes running on the Gothic system. Processes on different processors can communicate by sharing permanent information. The manager implements a shared single-level storage with an invalidation protocol working on disk-pages to maintain s...

  12. DiFX: A software correlator for very long baseline interferometry using multi-processor computing environments

    OpenAIRE

    Deller, A. T.; Tingay, S. J.; Bailes, M.; West, C.

    2007-01-01

    We describe the development of an FX style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of softw...

  13. A general model for memory interference in a multiprocessor system with memory hierarchy

    Science.gov (United States)

    Taha, Badie A.; Standley, Hilda M.

    1989-01-01

    The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.

  14. Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

    International Nuclear Information System (INIS)

    Nishiura, Daisuke; Sakaguchi, Hide

    2011-01-01

    Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles move freely within a given space, and so on a distributed-memory system load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. Shared-memory systems, in contrast, achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by the label of the cell in the domain to which each particle belongs. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, a scalar supercomputer, a vector supercomputer, and a graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
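    The pre-conditioning sort described above is straightforward to picture in code. The sketch below is a single-threaded, hedged illustration of only the sort-and-pair steps (cell geometry, neighboring-cell searches, and the paper's force-summation table are omitted; all names are hypothetical).

        /* Hypothetical sketch: sort particle indices by cell label, then pair
         * particles sharing a cell as contact candidates. Neighbor cells and
         * the parallel force summation from the paper are omitted. */
        #include <stdio.h>
        #include <stdlib.h>

        typedef struct { int particle; int cell; } tag;

        static int by_cell(const void *a, const void *b)
        {
            return ((const tag *)a)->cell - ((const tag *)b)->cell;
        }

        int main(void)
        {
            tag t[] = { {0, 2}, {1, 0}, {2, 2}, {3, 1}, {4, 2} };
            size_t n = sizeof t / sizeof t[0];

            qsort(t, n, sizeof t[0], by_cell);   /* pre-conditioning sort */

            /* particles are now contiguous per cell: pair within each run */
            for (size_t i = 0; i < n; i++)
                for (size_t j = i + 1; j < n && t[j].cell == t[i].cell; j++)
                    printf("candidate pair: %d-%d (cell %d)\n",
                           t[i].particle, t[j].particle, t[i].cell);
            return 0;
        }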

  15. Elastic pointer directory organization for scalable shared memory multiprocessors

    Institute of Scientific and Technical Information of China (English)

    Yuhang Liu; Mingfa Zhu; Limin Xiao

    2014-01-01

    In the field of supercomputing, one key issue for scalable shared-memory multiprocessors is the design of the directory which denotes the sharing state for a cache block. A good directory design intends to achieve three key attributes: reasonable memory overhead, sharer position precision and implementation complexity. However, researchers often face the problem that gaining one attribute may result in losing another. The paper proposes an elastic pointer directory (EPD) structure based on the analysis of shared-memory applications, exploiting the fact that the number of sharers for each directory entry is typically small. Analysis results show that for 4096 nodes, the ratio of memory overhead to the full-map directory is 2.7%. Theoretical analysis and cycle-accurate execution-driven simulations on a 16 and 64-node cache coherence non-uniform memory access (CC-NUMA) multiprocessor show that the corresponding pointer overflow probability is reduced significantly. The performance is observed to be better than that of a limited pointers directory and almost identical to the full-map directory, except for the slight implementation complexity. Using the directory cache to explore directory access locality is also studied. The experimental result shows that this is a promising approach to be used in the state-of-the-art high performance computing domain.
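    The EPD encoding itself is not given in the abstract; the struct below sketches the limited-pointer idea it builds on (a few sharer pointers per entry, degrading to broadcast on overflow). Field widths and names are illustrative assumptions, not the paper's format.

        /* Hypothetical limited-pointer directory entry: a handful of sharer
         * pointers per cache block, falling back to broadcast on overflow.
         * Sizes are illustrative, not the EPD encoding from the paper. */
        #include <stdbool.h>
        #include <stdint.h>

        #define NPTR 4                     /* pointers kept per entry */

        typedef struct {
            uint16_t sharer[NPTR];         /* node ids of known sharers */
            uint8_t  used;                 /* how many pointers are valid */
            bool     overflow;             /* sharer set exceeded NPTR */
        } dir_entry;

        /* Record a new sharer; on overflow the entry degrades to broadcast. */
        void dir_add_sharer(dir_entry *e, uint16_t node)
        {
            for (uint8_t i = 0; i < e->used; i++)
                if (e->sharer[i] == node)
                    return;                /* already tracked */
            if (e->used < NPTR)
                e->sharer[e->used++] = node;
            else
                e->overflow = true;        /* invalidations must broadcast */
        }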

  16. The ACP [Advanced Computer Program] multiprocessor system at Fermilab

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.

    1986-09-01

    The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost effective for many high energy physics problems. The system is based on single board computers which cost under $2000 each to build, including 2 Mbytes of on-board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32-bit microprocessor, the other runs with AT&T's 32100. Both include the corresponding floating point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53-processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing ''nodes'' sit are connected via a high speed ''Branch Bus'' to one or more MicroVAX computers which act as hosts handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use which has been tested error free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.

  17. The ACP (Advanced Computer Program) multiprocessor system at Fermilab

    Energy Technology Data Exchange (ETDEWEB)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Case, G.; Cook, A.; Fischler, M.; Gaines, I.; Hance, R.; Husby, D.

    1986-09-01

    The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost effective for many high energy physics problems. The system is based on single board computers which cost under $2000 each to build, including 2 Mbytes of on-board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32-bit microprocessor, the other runs with AT&T's 32100. Both include the corresponding floating point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53-processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing ''nodes'' sit are connected via a high speed ''Branch Bus'' to one or more MicroVAX computers which act as hosts handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use which has been tested error free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.

  18. Distributed-memory matrix computations

    DEFF Research Database (Denmark)

    Balle, Susanne Mølleskov

    1995-01-01

    The main goal of this project is to investigate, develop, and implement algorithms for numerical linear algebra on parallel computers in order to acquire expertise in methods for parallel computations. An important motivation for analyzing and investigating the potential for parallelism in these algorithms is that many scientific applications rely heavily on the performance of the involved dense linear algebra building blocks. Even though we consider the distributed-memory as well as the shared-memory programming paradigm, the major part of the thesis is dedicated to distributed-memory architectures. We emphasize distributed-memory massively parallel computers - such as the Connection Machine model CM-200 and model CM-5/CM-5E - available to us at UNI-C and at Thinking Machines Corporation. The CM-200 was, at the time this project started, one of the few existing massively parallel computers...

  19. Debugging in a multi-processor environment

    International Nuclear Information System (INIS)

    Spann, J.M.

    1981-01-01

    The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a shared memory as the data exchange medium. Debugging more than one program in the multi-processor environment is a difficult process. This paper describes the new tools that were developed and how the testing of software is performed in the SCDS for the MFTF project.

  20. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    Energy Technology Data Exchange (ETDEWEB)

    Ohmacht, Martin

    2017-08-15

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
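    As a rough software analogy of the claim above (the patent describes hardware built around OR-reduce trees; the two-slot scheme, names, and single-synchronizer restriction below are assumptions of this sketch), accesses can be tagged with the generation current when they issue, and a synchronization can open a new generation and wait for the old one to drain.

        /* Software analogy of generation-based synchronization; not the
         * patented hardware. One synchronizing thread is assumed. */
        #include <stdatomic.h>

        static _Atomic unsigned inflight[2];  /* in-flight accesses per slot */
        static _Atomic unsigned gen;          /* generation counter */

        unsigned access_begin(void)           /* a memory access issues */
        {
            for (;;) {
                unsigned g = atomic_load(&gen) & 1;
                atomic_fetch_add(&inflight[g], 1);
                if ((atomic_load(&gen) & 1) == g)
                    return g;                 /* tagged a stable generation */
                atomic_fetch_sub(&inflight[g], 1);   /* gen moved: retry */
            }
        }

        void access_end(unsigned g)           /* the access completes */
        {
            atomic_fetch_sub(&inflight[g], 1);
        }

        void msync(void)                      /* single synchronizer assumed */
        {
            unsigned old = atomic_fetch_add(&gen, 1) & 1;  /* new generation */
            while (atomic_load(&inflight[old]) != 0)
                ;                             /* drain all older accesses */
        }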

  1. Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

    Science.gov (United States)

    Ohmacht, Martin

    2014-09-09

    In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.

  2. Matrix factorization on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Geist, G.A.; Heath, M.T.

    1985-08-01

    This paper is concerned with parallel algorithms for matrix factorization on distributed-memory, message-passing multiprocessors, with special emphasis on the hypercube. Both Cholesky factorization of symmetric positive definite matrices and LU factorization of nonsymmetric matrices using partial pivoting are considered. The use of the resulting triangular factors to solve systems of linear equations by forward and back substitutions is also considered. Efficiencies of various parallel computational approaches are compared in terms of empirical results obtained on an Intel iPSC hypercube. 19 refs., 6 figs., 2 tabs
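    For orientation, the serial kernel that such parallel schemes distribute is compact. The sketch below is a plain column-oriented Cholesky factorization in C; the paper's hypercube mapping, interprocessor communication, and the pivoted LU variant are not reproduced here.

        /* Serial column Cholesky (A = L*L^T); the lower triangle of a is
         * overwritten by L. The hypercube distribution is omitted. */
        #include <math.h>

        int cholesky(int n, double a[n][n])
        {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < j; k++)        /* update column j */
                    for (int i = j; i < n; i++)
                        a[i][j] -= a[i][k] * a[j][k];
                if (a[j][j] <= 0.0)
                    return -1;                     /* not positive definite */
                a[j][j] = sqrt(a[j][j]);
                for (int i = j + 1; i < n; i++)    /* scale column j */
                    a[i][j] /= a[j][j];
            }
            return 0;
        }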

  3. Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

    Science.gov (United States)

    Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

    2014-03-01

    The processor-memory performance gap, commonly referred to as the "Memory Wall" problem, stems from the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.

  4. Multiprocessor systems and their concurrency

    Energy Technology Data Exchange (ETDEWEB)

    Starke, P H

    1984-01-01

    A multiprocessor system can be considered as a collection of finite automata which communicate over channels or shared memory units. The behaviour of such a system can be described by a semilanguage. This approach makes it possible to define a numerical measure for the concurrency of multiprocessor systems and of distributed systems. This measure is characterized algebraically, and the reconfiguration problem, which asks for an algorithm to construct an l-processor system equivalent to a given n-processor system, is solved in the paper. 6 references.

  5. Utilizing a multiprocessor architecture - The performance of MIDAS

    International Nuclear Information System (INIS)

    Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

    1983-01-01

    The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements

  6. The structural robustness of multiprocessor computing system

    Directory of Open Access Journals (Sweden)

    N. Andronaty

    1996-03-01

    A model of a transputer-based multiprocessor computing system is described which permits the evaluation of structural robustness (viability, survivability).

  7. Shared random access memory resource for multiprocessor real-time systems

    International Nuclear Information System (INIS)

    Dimmler, D.G.; Hardy, W.H. II

    1977-01-01

    A shared random-access memory resource is described which is used within real-time data acquisition and control systems with multiprocessor and multibus organizations. Hardware and software aspects are discussed in a specific example where interconnections are done via a UNIBUS. The general applicability of the approach is also discussed

  8. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, Jong Min; Kim, Byung Soo; Kim, Chang Hoi; Hwang, Suk Yong; Sohn, Surg Won; Yoon, Tae Seob; Lee, Yong Bum; Kim, Woong Ki

    1988-02-01

    A multiprocessor system that is essential to A.I. (Artificial Intelligence) robot control was developed. A.I. robot control needs very complex real-time control. A multiprocessor system interconnecting many SBCs (Single Board Computers) is much faster and more accurate than a single SBC. Various multiprocessor systems and their applications were compared and discussed. The multiprocessor architecture system is specially designed to be used in nuclear environments. The main functions are job distribution, multitasking, and intelligent remote control by SDLC protocol using optical fiber. The system can be applied to position control for locomotion and manipulation, data fusion systems, and image processing. (Author)

  9. File-System Workload on a Scientific Multiprocessor

    Science.gov (United States)

    Kotz, David; Nieuwejaar, Nils

    1995-01-01

    Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors and their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to be most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize I/O in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload on an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.

  10. Parallelising a molecular dynamics algorithm on a multi-processor workstation

    Science.gov (United States)

    Müller-Plathe, Florian

    1990-12-01

    The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transferred in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.

  11. Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT

    Energy Technology Data Exchange (ETDEWEB)

    Secchi, Simone; Tumeo, Antonino; Villa, Oreste

    2011-07-27

    Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy in reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.

  12. Multiprocessor architecture: Synthesis and evaluation

    Science.gov (United States)

    Standley, Hilda M.

    1990-01-01

    Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization, covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.

  13. On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

    Directory of Open Access Journals (Sweden)

    A. Averbuch

    1994-01-01

    Parallel elliptic single/multigrid solutions around an aligned and nonaligned body are presented and implemented on two multi-user and single-user shared-memory multiprocessors (Sequent Symmetry and MOS) and on a distributed-memory multiprocessor (a Transputer network). Our parallel implementation uses the Virtual Machine for Multi-Processors (VMMP), a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD) multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It is seen to fit the above architectures well when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that when using VMMP, the portability of the algorithms is straightforward and efficient.

  14. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, Jong Min; Kim, Seung Ho; Hwang, Suk Yeoung; Sohn, Surg Won; Kim, Byung Soo; Kim, Chang Hoi; Lee, Yong Bum; Kim, Woong Ki

    1988-12-01

    The object of this project is to develop a multiprocessor system which is essential to robot technology. A multiprocessor system interconnecting many single board computers is much faster and more flexible than a single processor. The developed multiprocessor will be used to control a nuclear mobile robot, so a loosely coupled system is adopted as the robot controller. The total configuration of the controller is divided into three main parts according to function: a supervisory control part, a functional control part, and a remote control part. The designed control system can be expanded easily for further use thanks to its modular architecture, so functional independence of the sub-systems is maintained throughout the system structure. Electromagnetic interference affecting the control system is minimized by using optical fiber as the communication medium between robot and control system. System performance is enhanced not only by using a distributed architecture in hardware, but also by adopting a real-time, multi-tasking operating system in software. The iRMX86 OS is used and reconfigured for real-time, multi-tasking operation. The RS-485 serial communication protocol is used between the functional control part and the remote control part. Since the developed multiprocessor control system is an essential and fundamental technology for artificial intelligence robots, the results of this project can be applied directly to nuclear mobile robots. (Author)

  15. Real-Time Multiprocessor Programming Language (RTMPL) user's manual

    Science.gov (United States)

    Arpasi, D. J.

    1985-01-01

    A real-time multiprocessor programming language (RTMPL) has been developed to provide for high-order programming of real-time simulations on systems of distributed computers. RTMPL is a structured, engineering-oriented language. The RTMPL utility supports a variety of multiprocessor configurations and types by generating assembly language programs according to user-specified targeting information. Many programming functions are assumed by the utility (e.g., data transfer and scaling) to reduce the programming chore. This manual describes RTMPL from a user's viewpoint. Source generation, applications, utility operation, and utility output are detailed. An example simulation is generated to illustrate many RTMPL features.

  16. Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

    Science.gov (United States)

    Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

    1990-01-01

    Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells can be mapped in a two-dimensional area of a chip onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches in controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
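    The serial move-acceptance step underneath all of these parallel variants is the standard Metropolis test, sketched below in C for orientation (the paper's parallel error control, heuristic cell coloring, and tree broadcast are not shown).

        /* Standard Metropolis acceptance test for annealing-based placement;
         * the parallel coordination from the paper is omitted. */
        #include <math.h>
        #include <stdbool.h>
        #include <stdlib.h>

        bool accept_move(double delta_cost, double temperature)
        {
            if (delta_cost <= 0.0)
                return true;                       /* downhill: always accept */
            double u = (double)rand() / RAND_MAX;  /* uniform in [0,1] */
            return u < exp(-delta_cost / temperature);  /* uphill: sometimes */
        }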

  17. A compositional reservoir simulator on distributed memory parallel computers

    International Nuclear Information System (INIS)

    Rame, M.; Delshad, M.

    1995-01-01

    This paper presents the application of distributed-memory parallel computers to field-scale reservoir simulations using a parallel version of UTCHEM, the University of Texas Chemical Flooding Simulator. The model is a general purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed-memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed-memory computing performance of the parallel simulator are presented for field-scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
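    Domain decomposition of this kind needs a ghost-cell (halo) exchange between adjacent subdomains before each stencil computation. The minimal 1-D MPI sketch below illustrates the pattern only; it is not UTCHEM code, and the names are invented.

        /* Minimal 1-D halo exchange between neighboring subdomains with MPI.
         * u has nlocal owned cells in u[1..nlocal] plus ghosts u[0] and
         * u[nlocal+1]. A stand-in for the data-sharing step described. */
        #include <mpi.h>

        void halo_exchange(double *u, int nlocal, MPI_Comm comm)
        {
            int rank, size;
            MPI_Comm_rank(comm, &rank);
            MPI_Comm_size(comm, &size);
            int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
            int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

            /* send first owned cell left, receive right ghost from right */
            MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                         &u[nlocal + 1], 1, MPI_DOUBLE, right, 0,
                         comm, MPI_STATUS_IGNORE);
            /* send last owned cell right, receive left ghost from left */
            MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 1,
                         &u[0], 1, MPI_DOUBLE, left, 1,
                         comm, MPI_STATUS_IGNORE);
        }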

  18. Distributed power management of real-time applications on a GALS multiprocessor SOC

    NARCIS (Netherlands)

    Nelson, Andrew; Goossens, Kees

    2015-01-01

    It is generally desirable to reduce the power consumption of embedded systems. Dynamic Voltage and Frequency Scaling (DVFS) is a commonly applied technique to achieve power reduction at the cost of computational performance. Multiprocessor System on Chips (MPSoCs) can have multiple voltage and

  19. Hardware support for CSP on a Java chip multiprocessor

    DEFF Research Database (Denmark)

    Gruian, Flavius; Schoeberl, Martin

    2013-01-01

    Due to memory bandwidth limitations, chip multiprocessors (CMPs) adopting the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can lead to further performance increases for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also specific to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were...

  20. A possible approach to estimating the operational efficiency of multiprocessor systems

    International Nuclear Information System (INIS)

    Kuznetsov, N.Y.; Gorlach, S.P.; Sumskaya, A.A.

    1984-01-01

    This article presents a mathematical model that constructs the upper and lower estimates evaluating the efficiency of solution of a large class of problems using a multiprocessor system with a specific architecture. Efficiency depends on a system's architecture (e.g., the number of processors, memory volume, the number of communication links, commutation speed) and the types of problems it is intended to solve. The behavior of the model is considered in a stationary mode. The model is used to evaluate the efficiency of a particular algorithm implemented in a multiprocessor system. It is concluded that the model is flexible and enables the investigation of a broad class of problems in computational mathematics, including linear algebra and boundary-value problems of mathematical physics

  1. Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

    Science.gov (United States)

    Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

    1989-01-01

    Several techniques to perform static and dynamic load balancing techniques for vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.

  2. Performances of multiprocessor multidisk architectures for continuous media storage

    Science.gov (United States)

    Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

    1996-03-01

    Multimedia interfaces increase the need for large image databases capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes, through bottleneck performance evaluation and simulation, the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located close to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.

  3. Some algorithms for the solution of the symmetric eigenvalue problem on a multiprocessor electronic computer

    International Nuclear Information System (INIS)

    Molchanov, I.N.; Khimich, A.N.

    1984-01-01

    This article shows how a reflection method can be used to find the eigenvalues of a matrix by transforming the matrix to tridiagonal form. The method of conjugate gradients is used to find the smallest eigenvalue and the corresponding eigenvector of symmetric positive-definite band matrices. Topics considered include the computational scheme of the reflection method, the organization of parallel calculations by the reflection method, the computational scheme of the conjugate gradient method, the organization of parallel calculations by the conjugate gradient method, and the effectiveness of parallel algorithms. It is concluded that the overall effectiveness of multiprocessor electronic computers can be increased either by letting the newly available processors operate on a new problem in multiprocessor mode, or by improving the coefficient of uniform partitioning of the original information.

  4. A simple multiprocessor management system for event-parallel computing

    International Nuclear Information System (INIS)

    Bracker, S.; Gounder, K.; Hendrix, K.; Summers, D.

    1996-01-01

    Offline software using Transmission Control Protocol/Internet Protocol (TCP/IP) sockets to distribute particle physics events to multiple UNIX/RISC workstations is described. A modular, building-block approach was taken that allowed tailoring to solve specific tasks efficiently and simply as they arose. The modest initial cost was having to learn about sockets for interprocess communication. This multiprocessor management software has been used to control the reconstruction of eight billion raw data events from Fermilab Experiment E791.
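    The building-block flavor of such software is easy to illustrate: a worker connects to an event server and reads length-prefixed event records from a TCP socket. The C sketch below is generic; the port, framing, and processing hook are assumptions, not E791's actual protocol.

        /* Hypothetical worker loop: connect to an event server and read
         * length-prefixed event buffers. Framing and port are assumptions. */
        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <stdint.h>
        #include <sys/socket.h>
        #include <sys/types.h>
        #include <unistd.h>

        int main(void)
        {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in srv = { .sin_family = AF_INET,
                                       .sin_port = htons(9000) }; /* assumed */
            inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);
            if (connect(fd, (struct sockaddr *)&srv, sizeof srv) < 0)
                return 1;

            uint32_t len;
            static unsigned char event[1 << 20];
            while (read(fd, &len, sizeof len) == (ssize_t)sizeof len) {
                len = ntohl(len);                  /* 4-byte length header */
                if (len > sizeof event)
                    break;
                for (uint32_t got = 0; got < len; ) {   /* read one event */
                    ssize_t n = read(fd, event + got, len - got);
                    if (n <= 0)
                        return 1;
                    got += (uint32_t)n;
                }
                /* process_event(event, len);  -- hypothetical hook */
            }
            close(fd);
            return 0;
        }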

  5. Modeling and Analyzing Real-Time Multiprocessor Systems

    NARCIS (Netherlands)

    Wiggers, M.H.; Thiele, Lothar; Lee, Edward A.; Schlieker, Simon; Bekooij, Marco Jan Gerrit

    2010-01-01

    Researchers have proposed approaches to verify that real-time multiprocessor systems meet their timeliness constraints. These approaches make assumptions on the model of computation, the load placed on the multiprocessor system, and the faults that can arise. This heterogeneous set of assumptions

  6. Multiprocessors for high energy physics

    International Nuclear Information System (INIS)

    Pohl, M.

    1987-01-01

    I review the role, status and progress of multiprocessor projects relevant to high energy physics. A short overview of the large variety of multiprocessor architectures is given, with special emphasis on machines suitable for experimental data reconstruction. A lot of progress has been made in the attempt to make the use of multiprocessors less painful by creating a ''Parallel Programming Environment'' supporting the non-expert user. A high degree of usability has been reached for coarse-grain (event level) parallelism. The program development tools available on various systems (subroutine packages, preprocessors and parallelizing compilers) are discussed in some detail. Tools for execution control and debugging are also developing, thus opening the path from dedicated systems for large-scale, stable production towards support of a more general job mix. In the medium term, multiprocessors will thus cover a growing fraction of the typical high energy physics computing task. (orig.)

  7. Distributed Memory Parallel Computing with SEAWAT

    Science.gov (United States)

    Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

    2017-12-01

    Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in such a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation, and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model (10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources.
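    Inside a parallel Krylov solver of this kind, the global reductions are where all subdomains must communicate on every iteration. The usual pattern for a distributed dot product is sketched below in C with MPI; it illustrates the idea only and is not PKS source.

        /* Generic distributed dot product as used inside parallel CG or
         * BiCGSTAB: each rank reduces its local part, then MPI_Allreduce
         * combines the partial sums. Not the actual PKS implementation. */
        #include <mpi.h>

        double pdot(const double *x, const double *y, int nlocal, MPI_Comm comm)
        {
            double local = 0.0, global = 0.0;
            for (int i = 0; i < nlocal; i++)
                local += x[i] * y[i];
            MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
            return global;
        }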

  8. Using a commercial symmetric multiprocessor for lattice QCD

    International Nuclear Information System (INIS)

    Brower, R.C.; Chen, D.; Negele, J.W.

    1998-01-01

    In its evolution, the computer industry has reached the point where considerable computing power can be packaged on a single microprocessor chip. At the same time, the costs of designing a computer system around such a CPU are growing. For these reasons we decided to explore the possibility of using commercially available symmetric multiprocessors (SMPs) as building blocks for the LQCD computer. Careful analysis of the architecture allowed us to build a QCD primitive library running close to peak performance on the UltraSPARC processor. As a result, multithreaded QCD code (both the heatbath and the Wilson fermion inverter) runs at about 50% efficiency on a single SMP. The communication between different CPUs is handled by a coherent memory system. Currently we are planning to connect several SMPs with a high bandwidth network into a single system. (orig.)

  9. A survey of Tumult, a real-time multi-processor system

    International Nuclear Information System (INIS)

    Jansen, P.G.

    1986-01-01

    Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular extendible multiprocessor system. All memory is distributed and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message passing mechanism. Communication links and processes are dynamically created and disposed by the application. In this article a brief description of the system is given; communication aspects are emphasized. (Auth.)

  10. Recommending the heterogeneous cluster type multi-processor system computing

    International Nuclear Information System (INIS)

    Iijima, Nobukazu

    2010-01-01

    A real-time reactor simulator had been developed by reusing equipment from the Musashi reactor, and improving its performance as a research tool became indispensable, namely increasing the sampling rate by introducing arithmetic units built on a multi-Digital Signal Processor (DSP) system (cluster). In order to realize heterogeneous cluster-type multiprocessor computing, a combination of two kinds of Control Processors (CPs), the Cluster Control Processor (CCP) and the System Control Processor (SCP), was proposed, with a Large System Control Processor (LSCP) for hierarchical clusters if needed. The faster computing performance of this system was well supported by simulation results for the simultaneous execution of plural jobs and also for pipeline processing between clusters, which showed that the system leads to effective use of the existing equipment and enhancement of the cost performance. (T. Tanaka)

  11. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    Science.gov (United States)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a friction Reynolds number Reτ = 180 has shown good agreement with existing data.
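
    Since the line solves in an ADI half-sweep are mutually independent, the shared-memory parallelism the abstract describes can be hedged into a generic C/OpenMP sketch; the sizes and names are illustrative assumptions, not the LESTool code (compile with -fopenmp):

        /* One ADI half-sweep: independent tridiagonal line solves distributed
         * over threads with OpenMP; each thread works on contiguous rows,
         * the cache-friendly access pattern the paper emphasizes. */
        #include <stdio.h>

        #define NX 256
        #define NY 256

        /* Thomas algorithm for one tridiagonal line; rhs is overwritten */
        static void solve_line(double *lo, double *di, double *up,
                               double *rhs, int n) {
            for (int i = 1; i < n; i++) {            /* forward elimination */
                double m = lo[i] / di[i - 1];
                di[i]  -= m * up[i - 1];
                rhs[i] -= m * rhs[i - 1];
            }
            rhs[n - 1] /= di[n - 1];                 /* back substitution */
            for (int i = n - 2; i >= 0; i--)
                rhs[i] = (rhs[i] - up[i] * rhs[i + 1]) / di[i];
        }

        int main(void) {
            static double lo[NY][NX], di[NY][NX], up[NY][NX], rhs[NY][NX];
            for (int j = 0; j < NY; j++)
                for (int i = 0; i < NX; i++) {
                    lo[j][i] = up[j][i] = -1.0;      /* diagonally dominant */
                    di[j][i] = 4.0;
                    rhs[j][i] = 1.0;
                }
            #pragma omp parallel for                 /* lines are independent */
            for (int j = 0; j < NY; j++)
                solve_line(lo[j], di[j], up[j], rhs[j], NX);
            printf("u[0][0] = %f\n", rhs[0][0]);
            return 0;
        }

    In the hybrid MPI/OpenMP variant proposed for clusters, an outer domain decomposition over MPI ranks would wrap this threaded loop.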

  12. Software for the ACP [Advanced Computer Program] multiprocessor system

    International Nuclear Information System (INIS)

    Biel, J.; Areti, H.; Atac, R.

    1987-01-01

    Software has been developed for use with the Fermilab Advanced Computer Program (ACP) multiprocessor system. The software was designed to make a system of a hundred independent node processors as easy to use as a single, powerful CPU. Subroutines have been developed by which a user's host program can send data to and get results from the program running in each of his ACP node processors. Utility programs make it easy to compile and link host and node programs, to debug a node program on an ACP development system, and to submit a debugged program to an ACP production system.

  13. Embedded multiprocessors scheduling and synchronization

    CERN Document Server

    Sriram, Sundararajan

    2009-01-01

    Techniques for Optimizing Multiprocessor Implementations of Signal Processing Applications. An indispensable component of the information age, signal processing is embedded in a variety of consumer devices, including cell phones and digital television, as well as in communication infrastructure, such as media servers and cellular base stations. Multiple programmable processors, along with custom hardware running in parallel, are needed to achieve the computation throughput required of such applications. Reviews important research in key areas related to the multiprocessor implementation of multi...

  14. Performance of Multithreaded Chip Multiprocessors And Implications for Operating System Design

    OpenAIRE

    Fedorova, Alexandra; Seltzer, Margo I.; Small, Christopher A.; Nussbaum, Daniel

    2005-01-01

    An operating system’s design is often influenced by the architecture of the target hardware. While uniprocessor and multiprocessor architectures are well understood, such is not the case for multithreaded chip multiprocessors (CMT) – a new generation of processors designed to improve performance of memory-intensive applications. The first systems equipped with CMT processors are just becoming available, so it is critical that we now understand how to obtain the best performance from such syst...

  15. Runtime adaptive multi-processor system-on-chip: RAMPSoC

    OpenAIRE

    Göhringer, D.; Hübner, M.; Schatz, V.; Becker, J.

    2008-01-01

    Current trends in high performance computing show that the use of multiprocessor systems-on-chip is one approach to meeting the requirements of computing-intensive applications. The multiprocessor system-on-chip (MPSoC) approaches often provide a static and homogeneous infrastructure of networked microprocessors on the chip die. A novel idea in this research area is to introduce the dynamic adaptivity of reconfigurable hardware in order to provide a flexible heterogeneous set of processing elemen...

  16. Parallel External Memory Graph Algorithms

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

    2010-01-01

    In this paper, we study parallel I/O efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking which leads to efficient solutions to problems on trees, such as computing lowest...... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....

  17. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    KAUST Repository

    Woźniak, Maciej; Paszyński, Maciej R.; Pardo, D.; Dalcin, Lisandro; Calo, Victor M.

    2015-01-01

    This paper derives theoretical estimates of the computational cost for an isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for the C^(p-1) global continuity of the isogeometric solution...

  18. Multiprocessor programming environment

    Energy Technology Data Exchange (ETDEWEB)

    Smith, M.B.; Fornaro, R.

    1988-12-01

    Programming tools and techniques have been well developed for traditional uniprocessor computer systems. The focus of this research project is on the development of a programming environment for a high speed real time heterogeneous multiprocessor system, with special emphasis on languages and compilers. The new tools and techniques will allow a smooth transition for programmers with experience only on single processor systems.

  19. Cache aware mapping of streaming applications on a multiprocessor system-on-chip

    NARCIS (Netherlands)

    Moonen, A.J.M.; Bekooij, M.J.G.; Berg, van den R.M.J.; Meerbergen, van J.; Sciuto, D.; Peng, Z.

    2008-01-01

    Efficient use of the memory hierarchy is critical for achieving high performance in a multiprocessor system-on-chip. An external memory that is shared between processors is a bottleneck in current and future systems. Cache misses and a large cache miss penalty contribute to a low processor...

  20. Multiprocessor based data acquisition system for radiation monitoring in nuclear reactors

    International Nuclear Information System (INIS)

    Pansare, M.G.; Narsaiah, A.; Anantha Krishnan, T.S.

    1989-01-01

    Expensive minicomputers are required for building powerful Data Acquisition Systems (DAS) capable of scanning and processing a large number of signals in a real-time environment. However, by using inexpensive microprocessors in a multiprocessor configuration it is possible to build DASs that are as powerful as minicomputer-based systems at much lower cost. This paper describes such a multiprocessor-based DAS designed for acquiring data from various radiation monitoring instruments of a nuclear reactor. The system is built using MULTIBUS standard boards based on the Intel 8086 16-bit microprocessor, with local and shared memory. The system monitors up to 128 analog input channels and 64 digital input channels, and actuates up to 128 digital output contacts. The system continuously checks for alarm conditions on the input channels and displays the alarm status on an ALARM CRT. A facility has been provided for the transfer of data to a central computer. At any instant of time, the information regarding the different channels being monitored is available from the local console as well as through five remote terminals located at various places in the reactor building. (author)

  1. Development of a VME multi-processor system for plasma control at the JT-60 Upgrade

    International Nuclear Information System (INIS)

    Takahashi, M.; Kurihara, K.; Kawamata, Y.; Akasaka, H.; Kimura, T.

    1992-01-01

    Design and initial operation results are reported for a VME multi-processor system [1] for plasma control at a large fusion device named 'the JT-60 Upgrade', utilizing three 32-bit MC88100-based RISC computers and VME components. Development of the system was stimulated by faster and more accurate computation requirements for the plasma position and current control. The RISC computers operate at 25 MHz along with two MC88200 cache memory units. We developed new VME bus modules (an up/down counter, an analog-to-digital converter and a clock pulse generator) for measuring magnetic field and coil current and for synchronizing the processing in the three RISCs and the direct digital controllers (DDCs) of the magnet power supplies. We also verified that the speed of the data transfer between the VME bus system and the DDCs through CAMAC highways satisfies the above requirements. In the initial operation of the JT-60 Upgrade, it has been proved that the VME multi-processor system controls the plasma position and current well, with a sampling period of 250 μsec and a delay of 500 μsec. (author)

  2. Sparse distributed memory overview

    Science.gov (United States)

    Raugh, Mike

    1990-01-01

    The Sparse Distributed Memory (SDM) project is investigating the theory and applications of a massively parallel computing architecture, called sparse distributed memory, that will support the storage and retrieval of sensory and motor patterns characteristic of autonomous systems. The immediate objectives of the project are centered in studies of the memory itself and in the use of the memory to solve problems in speech, vision, and robotics. Investigation of methods for encoding sensory data is an important part of the research. Examples of NASA missions that may benefit from this work are Space Station, planetary rovers, and solar exploration. Sparse distributed memory offers promising technology for systems that must learn through experience and be capable of adapting to new circumstances, and for operating any large complex system requiring automatic monitoring and control. Sparse distributed memory is a massively parallel architecture motivated by efforts to understand how the human brain works. Sparse distributed memory is an associative memory, able to retrieve information from cues that only partially match patterns stored in the memory. It is able to store long temporal sequences derived from the behavior of a complex system, such as progressive records of the system's sensory data and correlated records of the system's motor controls.
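
    The associative read/write mechanics can be made concrete with a small C sketch of a Kanerva-style memory: a write accumulates per-bit counters at every hard location within a Hamming radius of the write address, and a read recovers a word by majority vote over the activated locations. All sizes, the radius and the layout are assumptions for illustration, not the project's design (GCC/Clang popcount builtin assumed).

        /* Toy sparse distributed memory: write spreads a word over nearby
         * hard locations; read sums counters and takes a per-bit majority. */
        #include <stdio.h>
        #include <stdlib.h>

        #define N_LOC  2048            /* number of hard locations */
        #define DIM    256             /* bits per address/word */
        #define WORDS  (DIM / 64)
        #define RADIUS 111             /* Hamming activation radius */

        static unsigned long long addr[N_LOC][WORDS]; /* random hard locations */
        static int counters[N_LOC][DIM];              /* per-bit counters */

        static int hamming(const unsigned long long *a, const unsigned long long *b) {
            int d = 0;
            for (int w = 0; w < WORDS; w++) d += __builtin_popcountll(a[w] ^ b[w]);
            return d;
        }

        static void sdm_write(const unsigned long long *key,
                              const unsigned long long *data) {
            for (int i = 0; i < N_LOC; i++) {
                if (hamming(addr[i], key) > RADIUS) continue;  /* not activated */
                for (int b = 0; b < DIM; b++)   /* +1 for a 1-bit, -1 for a 0-bit */
                    counters[i][b] += ((data[b / 64] >> (b % 64)) & 1) ? 1 : -1;
            }
        }

        static void sdm_read(const unsigned long long *key, unsigned long long *out) {
            static int sums[DIM];
            for (int b = 0; b < DIM; b++) sums[b] = 0;
            for (int i = 0; i < N_LOC; i++) {
                if (hamming(addr[i], key) > RADIUS) continue;
                for (int b = 0; b < DIM; b++) sums[b] += counters[i][b];
            }
            for (int w = 0; w < WORDS; w++) out[w] = 0;
            for (int b = 0; b < DIM; b++)                  /* majority vote */
                if (sums[b] > 0) out[b / 64] |= 1ULL << (b % 64);
        }

        static unsigned long long rnd64(void) {
            unsigned long long r = 0;
            for (int k = 0; k < 4; k++) r = (r << 16) | (rand() & 0xFFFF);
            return r;
        }

        int main(void) {
            srand(1);
            for (int i = 0; i < N_LOC; i++)
                for (int w = 0; w < WORDS; w++) addr[i][w] = rnd64();

            unsigned long long pattern[WORDS], cue[WORDS], out[WORDS];
            for (int w = 0; w < WORDS; w++) pattern[w] = rnd64();
            sdm_write(pattern, pattern);                /* autoassociative store */

            for (int w = 0; w < WORDS; w++) cue[w] = pattern[w];
            cue[0] ^= 0xFFFFULL;                        /* corrupt 16 cue bits */
            sdm_read(cue, out);
            printf("bits wrong after recall: %d\n", hamming(out, pattern));
            return 0;
        }

    The partial-match retrieval mentioned in the abstract corresponds to the corrupted cue here: the noisy address still activates mostly the same hard locations, so the stored pattern is recovered.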

  3. Safety-critical Java with cyclic executives on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2012-01-01

    Chip-multiprocessors offer increased processing power at a low cost. However, in order to use them for real-time systems, tasks have to be scheduled efficiently and predictably. It is well known that finding optimal schedules is a computationally hard problem. In this paper we present a solution ...... for multiprocessors, we have implemented it in the context of safety-critical Java on a Java processor....

  4. On a Multiprocessor Computer Farm for Online Physics Data Processing

    CERN Document Server

    Sinanis, N J

    1999-01-01

    The topic of this thesis is the design-phase performance evaluation of a large multiprocessor (MP) computer farm intended for the on-line data processing of the Compact Muon Solenoid (CMS) experiment. CMS is a high-energy physics experiment, planned to operate at CERN (Geneva, Switzerland) during the year 2005. The CMS computer farm consists of 1,000 MP computer systems and a 1,000 x 1,000 communications switch. The approach followed for the farm performance evaluation is through simulation studies and the evaluation of small prototype systems, the building blocks of the farm. For the purposes of the simulation studies, we have developed a discrete-event, event-driven simulator that is capable of describing the high-level architecture of the farm and giving estimates of the farm's performance. The simulator is designed in a modular way to facilitate the development of various modules that model the behavior of the farm building blocks in the desired level of detail. With the aid of this simulator, we make a particular...

  5. Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

    Energy Technology Data Exchange (ETDEWEB)

    De Supinski, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Caliga, D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2017-09-28

    The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array ("FPGA")-based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures a cost-effective solution for large-scale scientific computing.

  6. Distributed-Memory Fast Maximal Independent Set

    Energy Technology Data Exchange (ETDEWEB)

    Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew

    2017-09-13

    The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All of those algorithms were designed with shared-memory machines in mind and are analyzed using the PRAM model; they do not have direct, efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluate their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
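
    The round structure of the Luby algorithms the authors extend can be illustrated with a toy, single-process C sketch of the random-priority variant; the graph, names and tie-breaking rule are invented, and the paper's distributed-memory versions additionally partition vertices over ranks and exchange priorities by message passing.

        /* Toy sketch of a Luby-style randomized MIS round: every live vertex
         * draws a random priority, local winners join the MIS, and their
         * live neighbours are knocked out. Serial, illustrative only. */
        #include <stdio.h>
        #include <stdlib.h>

        #define NV 6
        enum { LIVE, IN_MIS, REMOVED };

        /* adjacency matrix of a small cycle graph 0-1-2-3-4-5-0 */
        static const int adj[NV][NV] = {
            {0,1,0,0,0,1}, {1,0,1,0,0,0}, {0,1,0,1,0,0},
            {0,0,1,0,1,0}, {0,0,0,1,0,1}, {1,0,0,0,1,0},
        };

        int main(void) {
            int state[NV], live = NV;
            double pri[NV];
            for (int v = 0; v < NV; v++) state[v] = LIVE;
            srand(7);

            while (live > 0) {
                int join[NV] = {0};
                for (int v = 0; v < NV; v++) pri[v] = rand() / (double)RAND_MAX;

                /* a live vertex wins iff its priority beats all live neighbours
                 * (index breaks the vanishingly rare ties) */
                for (int v = 0; v < NV; v++) {
                    if (state[v] != LIVE) continue;
                    join[v] = 1;
                    for (int u = 0; u < NV; u++)
                        if (adj[v][u] && state[u] == LIVE &&
                            (pri[u] < pri[v] || (pri[u] == pri[v] && u < v)))
                            join[v] = 0;
                }
                /* commit winners, then remove their live neighbours */
                for (int v = 0; v < NV; v++) if (join[v]) state[v] = IN_MIS;
                for (int v = 0; v < NV; v++) {
                    if (state[v] != IN_MIS) continue;
                    for (int u = 0; u < NV; u++)
                        if (adj[v][u] && state[u] == LIVE) state[u] = REMOVED;
                }
                live = 0;
                for (int v = 0; v < NV; v++) if (state[v] == LIVE) live++;
            }
            for (int v = 0; v < NV; v++)
                if (state[v] == IN_MIS) printf("vertex %d in MIS\n", v);
            return 0;
        }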

  7. Memory Hierarchy Design for Next Generation Scalable Many-core Platforms

    OpenAIRE

    Azarkhish, Erfan

    2016-01-01

    Performance and energy consumption in modern computing platforms is largely dominated by the memory hierarchy. The increasing computational power in the multiprocessors and accelerators, and the emergence of the data-intensive workloads (e.g. large-scale graph traversal and scientific algorithms) requiring fast transfer of large volumes of data, are two main trends which intensify this problem by putting even higher pressure on the memory hierarchy. This increasing gap between computation spe...

  8. Use of a genetic algorithm to solve two-fluid flow problems on an NCUBE multiprocessor computer

    International Nuclear Information System (INIS)

    Pryor, R.J.; Cline, D.D.

    1992-01-01

    A method of solving the two-phase fluid flow equations using a genetic algorithm on an NCUBE multiprocessor computer is presented. The topics discussed are the two-phase flow equations, the genetic representation of the unknowns, the fitness function, the genetic operators, and the implementation of the algorithm on the NCUBE computer. The efficiency of the implementation is investigated using a pipe blowdown problem. Effects of varying the genetic parameters and the number of processors are presented.
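
    The ingredients the abstract lists (genetic representation, fitness function, genetic operators) can be sketched generically in C; this toy maximizes a trivial scalar fitness over 16-bit chromosomes rather than solving the two-fluid equations, and every parameter is an illustrative assumption. On a hypercube such as the NCUBE, the fitness evaluations of the population would typically be spread across processors.

        /* Toy genetic algorithm: tournament selection, one-point crossover,
         * bit-flip mutation. Illustrative only; not the paper's solver. */
        #include <stdio.h>
        #include <stdlib.h>

        #define POP    32
        #define GENS   200
        #define TARGET 40000u        /* value we want chromosomes to encode */

        static double fitness(unsigned x) {          /* higher is better */
            double d = (double)x - (double)TARGET;
            return -d * d;
        }

        static unsigned tournament(const unsigned *pop) {
            unsigned a = pop[rand() % POP], b = pop[rand() % POP];
            return fitness(a) > fitness(b) ? a : b;
        }

        int main(void) {
            unsigned pop[POP], next[POP];
            srand(3);
            for (int i = 0; i < POP; i++) pop[i] = rand() & 0xFFFF;

            for (int g = 0; g < GENS; g++) {
                for (int i = 0; i < POP; i++) {
                    unsigned mom = tournament(pop), dad = tournament(pop);
                    int cut = rand() % 16;               /* one-point crossover */
                    unsigned mask = (1u << cut) - 1;
                    unsigned child = (mom & ~mask) | (dad & mask);
                    if (rand() % 100 < 5)                /* 5% bit-flip mutation */
                        child ^= 1u << (rand() % 16);
                    next[i] = child;
                }
                for (int i = 0; i < POP; i++) pop[i] = next[i];
            }
            unsigned best = pop[0];
            for (int i = 1; i < POP; i++)
                if (fitness(pop[i]) > fitness(best)) best = pop[i];
            printf("best chromosome %u (target %u)\n", best, TARGET);
            return 0;
        }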

  9. Use of a genetic algorithm to solve two-fluid flow problems on an NCUBE multiprocessor computer

    International Nuclear Information System (INIS)

    Pryor, R.J.; Cline, D.D.

    1993-01-01

    A method of solving the two-phase fluid flow equations using a genetic algorithm on an NCUBE multiprocessor computer is presented. The topics discussed are the two-phase flow equations, the genetic representation of the unknowns, the fitness function, the genetic operators, and the implementation of the algorithm on the NCUBE computer. The efficiency of the implementation is investigated using a pipe blowdown problem. Effects of varying the genetic parameters and the number of processors are presented. (orig.)

  10. Computational cost of isogeometric multi-frontal solvers on parallel distributed memory machines

    KAUST Repository

    Woźniak, Maciej

    2015-02-01

    This paper derives theoretical estimates of the computational cost for an isogeometric multi-frontal direct solver executed on parallel distributed memory machines. We show theoretically that for the C^(p-1) global continuity of the isogeometric solution, both the computational cost and the communication cost of a direct solver are of order O(log(N) p^2) for the one dimensional (1D) case, O(N p^2) for the two dimensional (2D) case, and O(N^(4/3) p^2) for the three dimensional (3D) case, where N is the number of degrees of freedom and p is the polynomial order of the B-spline basis functions. The theoretical estimates are verified by numerical experiments performed with three parallel multi-frontal direct solvers: MUMPS, PaStiX and SuperLU, available through the PetIGA toolkit built on top of PETSc. Numerical results confirm these theoretical estimates both in terms of p and N. For a given problem size, the strong efficiency rapidly decreases as the number of processors increases, becoming about 20% for 256 processors for a 3D example with 128^3 unknowns and linear B-splines with C^0 global continuity, and 15% for a 3D example with 64^3 unknowns and quartic B-splines with C^3 global continuity. At the same time, one cannot arbitrarily increase the problem size, since the memory required by higher order continuity spaces is large, quickly consuming all the available memory resources even in the parallel distributed memory version. Numerical results also suggest that the use of distributed parallel machines is highly beneficial when solving higher order continuity spaces, although the number of processors that one can efficiently employ is somewhat limited.

  11. Job-mix modeling and system analysis of an aerospace multiprocessor.

    Science.gov (United States)

    Mallach, E. G.

    1972-01-01

    An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.

  12. A Time-predictable Memory Network-on-Chip

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Chong, David VH; Puffitsch, Wolfgang

    2014-01-01

    To derive safe bounds on worst-case execution times (WCETs), all components of a computer system need to be time-predictable: the processor pipeline, the caches, the memory controller, and memory arbitration on a multicore processor. This paper presents a solution for time-predictable memory...... arbitration and access for chip-multiprocessors. The memory network-on-chip is organized as a tree with time-division multiplexing (TDM) of accesses to the shared memory. The TDM based arbitration completely decouples processor cores and allows WCET analysis of the memory accesses on individual cores without...

  13. USC orthogonal multiprocessor for image processing with neural networks

    Science.gov (United States)

    Hwang, Kai; Panda, Dhabaleswar K.; Haddadi, Navid

    1990-07-01

    This paper presents the architectural features and imaging applications of the Orthogonal MultiProcessor (OMP) system, which is under construction at the University of Southern California with research funding from NSF and assistance from several industrial partners. The prototype OMP is being built with 16 Intel i860 RISC microprocessors and 256 parallel memory modules using custom-designed spanning buses, which are 2-D interleaved and orthogonally accessed without conflicts. The 16-processor OMP prototype is targeted to achieve 430 MIPS and 600 Mflops, which have been verified by simulation experiments based on the design parameters used. The prototype OMP machine will be initially applied for image processing, computer vision, and neural network simulation applications. We summarize important vision and imaging algorithms that can be restructured with neural network models. These algorithms can efficiently run on the OMP hardware with linear speedup. The ultimate goal is to develop a high-performance Visual Computer (Viscom) for integrated low- and high-level image processing and vision tasks.

  14. Solution of the Euler and Navier-Stokes equations on MIMD distributed memory multiprocessors using cyclic reduction

    International Nuclear Information System (INIS)

    Curchitser, E.N.; Pelz, R.B.; Marconi, F.

    1992-01-01

    The Euler and Navier-Stokes equations are solved for the steady, two-dimensional flow over a NACA 0012 airfoil using a 1024-node nCUBE/2 multiprocessor. Second-order, upwind-discretized difference equations are solved implicitly using ADI factorization. Parallel cyclic reduction is employed to solve the block tridiagonal systems. For realistic problems, communication times are negligible compared to calculation times. The processors are tightly synchronized, and their loads are well balanced. When the flux Jacobians are frozen, the wall-clock time for one implicit timestep is about equal to that of a multistage explicit scheme. 10 refs.

  15. 3D-TV Rendering on a Multiprocessor System on a Chip

    NARCIS (Netherlands)

    Van Eijndhoven, J.T.J.; Li, X.

    2006-01-01

    This thesis focuses on the issue of mapping 3D-TV rendering applications to a multiprocessor platform. The target platform aims to address tomorrow's multi-media consumer market. The prototype chip, called Wasabi, contains a set of TriMedia processors that communicate via a shared memory, fast...

  16. A Case for Tamper-Resistant and Tamper-Evident Computer Systems

    National Research Council Canada - National Science Library

    Solihin, Yan

    2007-01-01

    .... These attacks attempt to snoop or modify data transfer between various chips in a computer system such as between the processor and memory, and between processors in a multiprocessor interconnect network...

  17. Software/hardware distributed processing network supporting the Ada environment

    Science.gov (United States)

    Wood, Richard J.; Pryk, Zen

    1993-09-01

    A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 RISC for processing, VHSIC ASICs for high-speed, reliable inter-node communications, and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada-implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for a space-qualified RISC-based network, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.

  18. A real-time multichannel memory controller and optimal mapping of memory clients to memory channels

    NARCIS (Netherlands)

    Gomony, M.D.; Akesson, K.B.; Goossens, K.G.W.

    2015-01-01

    Ever-increasing demands for main memory bandwidth and memory speed/power tradeoff led to the introduction of memories with multiple memory channels, such as Wide IO DRAM. Efficient utilization of a multichannel memory as a shared resource in multiprocessor real-time systems depends on mapping of the

  19. Studies of electron collisions with polyatomic molecules using distributed-memory parallel computers

    International Nuclear Information System (INIS)

    Winstead, C.; Hipes, P.G.; Lima, M.A.P.; McKoy, V.

    1991-01-01

    Elastic electron scattering cross sections from 5-30 eV are reported for the molecules C2H4, C2H6, C3H8, Si2H6, and GeH4, obtained using an implementation of the Schwinger multichannel method for distributed-memory parallel computer architectures. These results, obtained within the static-exchange approximation, are in generally good agreement with the available experimental data. These calculations demonstrate the potential of highly parallel computation in the study of collisions between low-energy electrons and polyatomic gases. The computational methodology discussed is also directly applicable to the calculation of elastic cross sections at higher levels of approximation (target polarization) and of electronic excitation cross sections.

  20. Plasma physics modeling and the Cray-2 multiprocessor

    International Nuclear Information System (INIS)

    Killeen, J.

    1985-01-01

    The importance of computer modeling in the magnetic fusion energy research program is discussed. The need for the most advanced supercomputers is described. To meet the demand for more powerful scientific computers to solve larger and more complicated problems, the computer industry is developing multiprocessors. The role of the Cray-2 in plasma physics modeling is discussed with some examples. 28 refs., 2 figs., 1 tab

  1. Advanced lectures on multiprocessor programming (1/3)

    CERN Multimedia

    CERN. Geneva

    2011-01-01

    Three classes (60 mins) on Multiprocessor Programming Prof. Dr. Christoph von Praun Georg-Simon-Ohm University of Applied Sciences Nuremberg, Germany This is an advanced class on multiprocessor programming. The class gives an introduction to principles of concurrent objects and the notion of different progress guarantees that concurrent computations can have. The focus of this class is on non-blocking computations, i.e. concurrent programs that do not make use of locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects 2nd class: Principles of non-blocking synchronization 3rd class: Concurrent queues Brief Bio of Christoph von Praun Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. His most recent research studies programming models and tools that support transactional synchronization. In prior work, which he also did at the IBM T.J. Watson Research Center in Yorktown Height...

  2. Academic training: Advanced lectures on multiprocessor programming

    CERN Multimedia

    PH Department

    2011-01-01

    Academic Training Lecture - Regular Programme 31 October and 1-2 November 2011, from 11:00 to 12:00 - IT Auditorium, Bldg. 31   Three classes (60 mins) on Multiprocessor Programming Prof. Dr. Christoph von Praun Georg-Simon-Ohm University of Applied Sciences Nuremberg, Germany This is an advanced class on multiprocessor programming. The class gives an introduction to principles of concurrent objects and the notion of different progress guarantees that concurrent computations can have. The focus of this class is on non-blocking computations, i.e. concurrent programs that do not make use of locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects 2nd class: Principles of non-blocking synchronization 3rd class: Concurrent queues Brief Bio of Christoph von Praun Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. His most recent research studies programming models an...

  3. Distributed computing feasibility in a non-dedicated homogeneous distributed system

    Science.gov (United States)

    Leutenegger, Scott T.; Sun, Xian-He

    1993-01-01

    The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that the task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.

  4. Conditional load and store in a shared memory

    Science.gov (United States)

    Blumrich, Matthias A; Ohmacht, Martin

    2015-02-03

    A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
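
    The retry idiom that load-reserve/store-conditional supports can be sketched with portable C11 atomics; this is a hedged analogue built on compare-and-swap, not the patented reservation-register mechanism (C11 <threads.h> assumed; with older toolchains, pthreads would play the same role).

        /* Lock-free increment in the load-reserve/store-conditional style:
         * read the value, attempt a conditional store, retry if another
         * processor wrote to the location in between. */
        #include <stdatomic.h>
        #include <stdio.h>
        #include <threads.h>

        static atomic_int counter = 0;

        static int worker(void *arg) {
            (void)arg;
            for (int i = 0; i < 100000; i++) {
                int old = atomic_load(&counter);   /* "load-reserve" analogue */
                /* "store-conditional" analogue: succeeds only if nobody
                 * else updated the location; otherwise old is refreshed
                 * and the attempt is retried */
                while (!atomic_compare_exchange_weak(&counter, &old, old + 1))
                    ;
            }
            return 0;
        }

        int main(void) {
            thrd_t t[4];
            for (int i = 0; i < 4; i++) thrd_create(&t[i], worker, NULL);
            for (int i = 0; i < 4; i++) thrd_join(t[i], NULL);
            printf("counter = %d (expected %d)\n", counter, 4 * 100000);
            return 0;
        }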

  5. A scalable single-chip multi-processor architecture with on-chip RTOS kernel

    NARCIS (Netherlands)

    Theelen, B.D.; Verschueren, A.C.; Reyes Suarez, V.V.; Stevens, M.P.J.; Nunez, A.

    2003-01-01

    Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a

  6. External parallel sorting with multiprocessor computers

    International Nuclear Information System (INIS)

    Comanceau, S.I.

    1984-01-01

    This article describes methods of external sorting in which the entire main computer memory is used for the internal sorting of entries, forming from them sorted segments of the greatest possible size and outputting them to external memories. The obtained segments are merged into larger segments until all entries form one ordered segment. The described methods are suitable for sequential files stored on magnetic tape. The needs of the sorting algorithm can be met by using the relatively slow peripheral storage devices (e.g., tapes, disks, drums). The efficiency of the external sorting methods is determined by calculating the total sorting time as a function of the number of entries to be sorted and the number of parallel processors participating in the sorting process.
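
    The merge phase at the heart of such external sorting reduces to the following C sketch, in which two small sorted arrays stand in for tape segments; a real external sort streams blocks through main-memory buffers, and, as in the article, several such merges can proceed on parallel processors.

        /* Sequential two-run merge, the building block of external sorting;
         * runs are read front-to-back only, which suits tape storage. */
        #include <stdio.h>

        static void merge_runs(const int *a, int na,
                               const int *b, int nb, int *out) {
            int i = 0, j = 0, k = 0;
            while (i < na && j < nb)
                out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
            while (i < na) out[k++] = a[i++];   /* drain the remaining run */
            while (j < nb) out[k++] = b[j++];
        }

        int main(void) {
            const int run1[] = {1, 4, 9, 12}, run2[] = {2, 3, 10, 15};
            int merged[8];
            merge_runs(run1, 4, run2, 4, merged);
            for (int k = 0; k < 8; k++) printf("%d ", merged[k]);
            printf("\n");
            return 0;
        }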

  7. Operating system for a real-time multiprocessor propulsion system simulator. User's manual

    Science.gov (United States)

    Cole, G. L.

    1985-01-01

    The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real-time, high-fidelity simulations of air-breathing propulsion systems. Specifically, the real-time multiprocessor simulator project focuses on the use of multiple microprocessors to achieve the required computing speed and accuracy at relatively low cost. Operating systems for such hardware configurations are generally not available. A real time multiprocessor operating system (RTMPOS) that supports a variety of multiprocessor configurations was developed at Lewis. With some modification, RTMPOS can also support various microprocessors. RTMPOS, by means of menus and prompts, provides the user with a versatile, user-friendly environment for interactively loading, running, and obtaining results from a multiprocessor-based simulator. The menu functions are described and an example simulation session is included to demonstrate the steps required to go from the simulation loading phase to the execution phase.

  8. The art of multiprocessor programming

    CERN Document Server

    Herlihy, Maurice

    2012-01-01

    Revised and updated with improvements conceived in parallel programming courses, The Art of Multiprocessor Programming is an authoritative guide to multicore programming. It introduces a higher level set of software development skills than that needed for efficient single-core programming. This book provides comprehensive coverage of the new principles, algorithms, and tools necessary for effective multiprocessor programming. Students and professionals alike will benefit from thorough coverage of key multiprocessor programming issues. This revised edition incorporates much-demanded updates t

  9. High-performance computing — an overview

    Science.gov (United States)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  10. Single-chip serial channel enhances multi-processor systems

    Energy Technology Data Exchange (ETDEWEB)

    Millar, J.

    1982-01-01

    In this paper multiprocessor systems are described and explained. The impact that VLSI advancements are having on multiprocessor design is pointed out. The TMS 7041 single-chip microcomputer is described briefly, highlighting its multiprocessor communication capability. Finally, a typical multiprocessor system implemented with the TMS 7041 is shown.

  11. Performance of the coupled thermalhydraulics/neutron kinetics code R/P/C on workstation clusters and multiprocessor systems

    International Nuclear Information System (INIS)

    Hammer, C.; Paffrath, M.; Boeer, R.; Finnemann, H.; Jackson, C.J.

    1996-01-01

    The light water reactor core simulation code PANBOX has been coupled with the transient analysis code RELAP5 for the purpose of performing plant safety analyses with a three-dimensional (3-D) neutron kinetics model. The system has been parallelized to improve the computational efficiency. The paper describes the features of this system with emphasis on performance aspects. Performance results are given for different types of parallelization, i.e., for using an automatic parallelizing compiler, using the portable PVM platform on a workstation cluster, using PVM on a shared memory multiprocessor, and for using machine dependent interfaces. (author)

  12. A Multiprocessor Operating System Simulator

    Science.gov (United States)

    Johnston, Gary M.; Campbell, Roy H.

    1988-01-01

    This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall semester of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the 'Choices' family of operating systems for loosely- and tightly-coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.

  13. Using high performance interconnects in a distributed computing and mass storage environment

    International Nuclear Information System (INIS)

    Ernst, M.

    1994-01-01

    Detector Collaborations of the HERA Experiments typically involve more than 500 physicists from a few dozen institutes. These physicists require access to large amounts of data in a fully transparent manner. Important issues include Distributed Mass Storage Management Systems in a Distributed and Heterogeneous Computing Environment. At the very center of a distributed system, including tens of CPUs and network-attached mass storage peripherals, are the communication links. Today scientists are witnessing an integration of computing and communication technology, with the 'network' becoming the computer. This contribution reports on a centrally operated computing facility for the HERA Experiments at DESY, including Symmetric Multiprocessor Machines (84 processors), presently more than 400 GByte of magnetic disk and 40 TB of automated tape storage, tied together by a HIPPI 'network'. Focusing on the High Performance Interconnect technology, details are provided about the HIPPI-based 'Backplane', configured around a 20 Gigabit/s Multi Media Router, and the performance and efficiency of the related computer interfaces.

  14. Adaptive Dynamic Process Scheduling on Distributed Memory Parallel Computers

    Directory of Open Access Journals (Sweden)

    Wei Shu

    1994-01-01

    One of the challenges in programming distributed memory parallel machines is deciding how to allocate work to processors. This problem is particularly important for computations with unpredictable dynamic behaviors or irregular structures. We present a scheme for dynamic scheduling of medium-grained processes that is useful in this context. The adaptive contracting within neighborhood (ACWN) scheme is dynamic, distributed, load-dependent, and scalable. It deals with dynamic and unpredictable creation of processes and adapts to different systems. The scheme is described and contrasted with two other schemes that have been proposed in this context, namely randomized allocation and the gradient model. The performance of the three schemes on an Intel iPSC/2 hypercube is presented and analyzed. The experimental results show that even though the ACWN algorithm incurs somewhat larger overhead than randomized allocation, it achieves better performance in most cases due to its adaptiveness. Its feature of quickly spreading the work helps it outperform the gradient model in performance and scalability.

  15. Efficient implementation of multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2012-01-10

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.
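
    The redistribution step at the center of the claim can be sketched at toy scale with one MPI collective; here each rank holds one row of a size x size matrix and receives one column, i.e. a machine-wide transpose between the two 1-D FFT phases. The randomized send order of the patent is omitted, and all names and sizes are illustrative.

        /* Toy all-to-all redistribution: element j of each rank's row goes
         * to rank j, so the matrix is transposed across the machine in one
         * collective call; a 1-D FFT along the new dimension would follow. */
        #include <mpi.h>
        #include <stdio.h>
        #include <stdlib.h>

        int main(int argc, char **argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            double *row = malloc(size * sizeof *row);  /* my row (dim 1 done) */
            double *col = malloc(size * sizeof *col);  /* my column afterwards */
            for (int j = 0; j < size; j++) row[j] = rank * size + j;

            MPI_Alltoall(row, 1, MPI_DOUBLE, col, 1, MPI_DOUBLE, MPI_COMM_WORLD);

            printf("rank %d now holds column: ", rank);
            for (int j = 0; j < size; j++) printf("%g ", col[j]);
            printf("\n");
            free(row); free(col);
            MPI_Finalize();
            return 0;
        }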

  16. TUMULT, the Twente University multiprocessor

    NARCIS (Netherlands)

    Scholten, Johan; Jansen, P.G.

    1988-01-01

    TUMULT (Twente University multiprocessor) is described. Its aim is the design and implementation of a modular extendable multiprocessor system. Up to 15 processing elements are connected through an interprocessor communication network, using message-passing for the exchange of data. The hardware is...

  17. A Comparison of Two Paradigms for Distributed Shared Memory

    NARCIS (Netherlands)

    Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.; Tanenbaum, A.S.

    1992-01-01

    Two paradigms for distributed shared memory on loosely‐coupled computing systems are compared: the shared data‐object model as used in Orca, a programming language specially designed for loosely‐coupled computing systems, and the shared virtual memory model. For both paradigms two systems are

  18. Virtual memory support for distributed computing environments using a shared data object model

    Science.gov (United States)

    Huang, F.; Bacon, J.; Mapp, G.

    1995-12-01

    Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.

  19. Application of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the OECD/NRC BWR turbine trip benchmark and its performance on multi-processor computers

    International Nuclear Information System (INIS)

    Langenbuch, S.; Schmidt, K.D.; Velkov, K.

    2003-01-01

    The OECD/NRC BWR Turbine Trip (TT) Benchmark is investigated to perform code-to-code comparison of coupled codes, including a comparison to measured data which are available from turbine trip experiments at Peach Bottom 2. This benchmark problem for a BWR over-pressure transient represents a challenging application of coupled codes which integrate 3-dimensional neutron kinetics into thermal-hydraulic system codes for best-estimate simulation of plant transients. Such transients are typically computed with coupled codes on powerful workstations using a single CPU. Nowadays, multi-CPU systems are much more readily available: powerful workstations already provide 4 to 8 CPUs, and computer centers give access to multi-processor systems with CPU counts on the order of 16 up to several hundred. Therefore, the performance of the coupled code Athlet-Quabox/Cubbox on multi-processor systems is studied. Different cases of application lead to changing requirements of the code efficiency, because the amount of computer time spent in different parts of the code varies. This paper presents main results of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the BWR TT Benchmark, together with evaluations of the code performance on multi-processor computers. (authors)

  20. Geometric Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Ajwani, Deepak; Sitchinava, Nodari; Zeh, Norbert

    2010-01-01

    We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors. We show how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls. These results are obtained by analyzing adaptations of either the PEM merge sort algorithm or PRAM algorithms. For the second group of problems—orthogonal line segment intersection reporting, batched range reporting, and related problems—more effort is required. What distinguishes these problems from the ones in the previous group is the variable output size, which requires I/O-efficient load balancing strategies based on the contribution of the individual input elements to the output size. To obtain nearly optimal algorithms for these problems, we introduce a parallel distribution...

  1. Language Constructs for Data Partitioning and Distribution

    Directory of Open Access Journals (Sweden)

    P. Crooks

    1995-01-01

    This article presents a survey of language features for distributed memory multiprocessor systems (DMMs), in particular systems that provide features for data partitioning and distribution. In these systems the programmer is freed from consideration of the low-level details of the target architecture, in that there is no need to program explicit processes or specify interprocess communication. Programs are written according to the shared memory programming paradigm, but the programmer is required to specify, by means of directives, additional syntax or interactive methods, how the data of the program are decomposed and distributed.

  2. Multiprocessor system with multiple concurrent modes of execution

    Science.gov (United States)

    Ahn, Daniel; Ceze, Luis H; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

    2013-12-31

    A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.

  3. Task-FIFO co-scheduling of streaming applications on MPSoCs with predictable memory hierarchy

    NARCIS (Netherlands)

    Tang, Q.; Basten, A.A.; Geilen, M.C.W.; Stuijk, S.; Wei, Ji-Bo

    This article studies the scheduling of real-time streaming applications on multiprocessor systems-on-chips with predictable memory hierarchy. An iteration-based task-FIFO co-scheduling framework is proposed for this problem. We obtain FIFO size distributions using Pareto space searching, based on

  4. Task-FIFO co-scheduling of streaming applications on MPSoCs with predictable memory hierarchy

    NARCIS (Netherlands)

    Tang, Q.; Basten, T.; Geilen, M.; Stuijk, S.; Wei, J.B.

    2017-01-01

    This article studies the scheduling of real-time streaming applications on multiprocessor systems-on-chips with predictable memory hierarchy. An iteration-based task-FIFO co-scheduling framework is proposed for this problem. We obtain FIFO size distributions using Pareto space searching, based on

  5. Multiprocessor scheduling for real-time systems

    CERN Document Server

    Baruah, Sanjoy; Buttazzo, Giorgio

    2015-01-01

    This book provides a comprehensive overview of both theoretical and pragmatic aspects of resource-allocation and scheduling in multiprocessor and multicore hard-real-time systems.  The authors derive new, abstract models of real-time tasks that capture accurately the salient features of real application systems that are to be implemented on multiprocessor platforms, and identify rules for mapping application systems onto the most appropriate models.  New run-time multiprocessor scheduling algorithms are presented, which are demonstrably better than those currently used, both in terms of run-time efficiency and tractability of off-line analysis.  Readers will benefit from a new design and analysis framework for multiprocessor real-time systems, which will translate into a significantly enhanced ability to provide formally verified, safety-critical real-time systems at a significantly lower cost.

  6. Multiprocessor data acquisition system

    International Nuclear Information System (INIS)

    Haumann, J.R.; Crawford, R.K.

    1987-01-01

    A multiprocessor data acquisition system has been built to replace the single processor systems at the Intense Pulsed Neutron Source (IPNS) at Argonne National Laboratory. The multiprocessor system was needed to accommodate the higher data rates at IPNS brought about by improvements in the source and changes in instrument configurations. This paper describes the hardware configuration of the system and the method of task sharing, and compares results to the single processor system.

  7. Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

    Directory of Open Access Journals (Sweden)

    Daniel Pierre Bovet

    2008-01-01

    Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high responsiveness, open-source hard real-time operating system for multiprocessor systems capable of providing high real-time performance while maintaining the code simple and not impacting on the performances of the rest of the system. Moreover, ASMP-LINUX does not require code changing or application recompiling/relinking. In order to assess the performances of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.

  8. Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

    Directory of Open Access Journals (Sweden)

    Betti Emiliano

    2008-01-01

    Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high responsiveness, open-source hard real-time operating system for multiprocessor systems capable of providing high real-time performance while maintaining the code simple and not impacting on the performances of the rest of the system. Moreover, ASMP-LINUX does not require code changing or application recompiling/relinking. In order to assess the performances of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.

  9. Efficient implementation of a multidimensional fast fourier transform on a distributed-memory parallel multi-node computer

    Science.gov (United States)

    Bhanot, Gyan V [Princeton, NJ; Chen, Dong [Croton-On-Hudson, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2008-01-01

    The present invention is directed to a method, system and program storage device for efficiently implementing a multidimensional Fast Fourier Transform (FFT) of a multidimensional array comprising a plurality of elements initially distributed in a multi-node computer system comprising a plurality of nodes in communication over a network, comprising: distributing the plurality of elements of the array in a first dimension across the plurality of nodes of the computer system over the network to facilitate a first one-dimensional FFT; performing the first one-dimensional FFT on the elements of the array distributed at each node in the first dimension; re-distributing the one-dimensional FFT-transformed elements at each node in a second dimension via "all-to-all" distribution in random order across other nodes of the computer system over the network; and performing a second one-dimensional FFT on elements of the array re-distributed at each node in the second dimension, wherein the random order facilitates efficient utilization of the network thereby efficiently implementing the multidimensional FFT. The "all-to-all" re-distribution of array elements is further efficiently implemented in applications other than the multidimensional FFT on the distributed-memory parallel supercomputer.

  10. Software for event oriented processing on multiprocessor systems

    International Nuclear Information System (INIS)

    Fischler, M.; Areti, H.; Biel, J.; Bracker, S.; Case, G.; Gaines, I.; Husby, D.; Nash, T.

    1984-08-01

    Computing-intensive problems that require the processing of numerous essentially independent events are natural customers for large-scale multi-microprocessor systems. This paper describes the software required to support users with such problems in a multiprocessor environment. It is based on experience with, and development work aimed at, processing very large amounts of high energy physics data.

  11. VME multiprocessor system for plasma control at the JT-60 Upgrade

    International Nuclear Information System (INIS)

    Kimura, T.; Kurihara, K.; Takahashi, M.; Kawamata, Y.; Akasaka, H.; Matsukawa, M.

    1989-01-01

    In this paper, the design and preliminary tests of a VME multiprocessor system for JT-60 Upgrade plasma control, utilizing three MC88100-based RISC computers and VME buses, are reported. The design of the VME system was stimulated by faster and more accurate computation requirements for the plasma position and shape control.

  12. Parallel implementations of 2D explicit Euler solvers

    International Nuclear Information System (INIS)

    Giraud, L.; Manzini, G.

    1996-01-01

    In this work we present a subdomain partitioning strategy applied to an explicit high-resolution Euler solver. We describe the design of a portable parallel multi-domain code suitable for parallel environments. We present several implementations on a representative range of MIMD computers that include shared memory multiprocessors, distributed virtual shared memory computers, as well as networks of workstations. Computational results are given to illustrate the efficiency, the scalability, and the limitations of the different approaches. We also discuss the effect of the communication protocol on the optimal domain partitioning strategy for the distributed memory computers.

  13. ClimateSpark: An In-memory Distributed Computing Framework for Big Climate Data Analytics

    Science.gov (United States)

    Hu, F.; Yang, C. P.; Duffy, D.; Schnase, J. L.; Li, Z.

    2016-12-01

    Massive array-based climate data is being generated from global surveillance systems and model simulations. It is widely used to analyze environmental problems, such as climate change, natural hazards, and public health. However, extracting the underlying information from these big climate datasets is challenging due to both data- and computing-intensive issues in data processing and analysis. To tackle these challenges, this paper proposes ClimateSpark, an in-memory distributed computing framework to support big climate data processing. In ClimateSpark, a spatiotemporal index is developed to enable Apache Spark to treat array-based climate data (e.g. netCDF4, HDF4) as native formats, stored in the Hadoop Distributed File System (HDFS) without any preprocessing. Based on the index, spatiotemporal query services are provided to retrieve datasets according to a defined geospatial and temporal bounding box. The data subsets are read out, and a data partition strategy is applied to split the queried data equally across the computing nodes and store it in memory as climateRDDs for processing. By leveraging Spark SQL and User Defined Functions (UDFs), climate data analysis operations can be expressed in the intuitive SQL language. ClimateSpark is evaluated with two use cases using the NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA) climate reanalysis dataset. One use case conducts a spatiotemporal query and visualizes the subset results as an animation; the other compares different climate model outputs using a Taylor-diagram service. Experimental results show that ClimateSpark can significantly accelerate data query and processing, and enables complex analysis services to be served in an SQL-style fashion.

  14. Operating experience with a VMEbus multiprocessor system for data acquisition and reduction in nuclear physics

    International Nuclear Information System (INIS)

    Kutt, P.H.; Balamuth, D.P.

    1989-01-01

    A multiprocessor system based on commercially available VMEbus components has been developed for the acquisition and reduction of event-mode data in nuclear physics experiments. The system contains seven 68000 CPUs and 14 MB of memory. A minimal operating system handles data transfer and task allocation, and a compiler for a specially designed event analysis language produces code for the processors. The system has been in operation for four years at the University of Pennsylvania Tandem Accelerator Laboratory. Computation rates over 3 times that of a MicroVAX II have been achieved at a fraction of the cost. The use of WORM optical disks for event recording allows the processing of gigabyte data sets without operator intervention. A more powerful system is being planned which will make use of recently developed RISC processors to obtain an order of magnitude increase in computing power per node.

  15. Construction and Application of an AMR Algorithm for Distributed Memory Computers

    OpenAIRE

    Deiterding, Ralf

    2003-01-01

    While the parallelization of blockstructured adaptive mesh refinement techniques is relatively straightforward on shared memory architectures, appropriate distribution strategies for the emerging generation of distributed memory machines are a topic of ongoing research. In this paper, a locality-preserving domain decomposition is proposed that partitions the entire AMR hierarchy from the base level on. It is shown that the approach reduces the communication costs and simplifies the implementation.

  16. A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

    Directory of Open Access Journals (Sweden)

    Sergio Briguglio

    2003-01-01

    A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle-in-cell (PIC) codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage) and OpenMP (at the intra-node one). The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.

  17. 2: Local area networks as a multiprocessor treatment planning system

    International Nuclear Information System (INIS)

    Neblett, D.L.; Hogan, S.E.

    1987-01-01

    The creation of a local area network (LAN) of interconnected computers provides an environment of multiple computer processors that adds a new dimension to treatment planning. A LAN system provides the opportunity to have two or more computers working on a plan in parallel. With high-speed interprocessor transfer, time-consuming tasks such as correcting several individual beams for contours and inhomogeneities can be performed simultaneously, effectively creating a parallel multiprocessor treatment planning system.

  18. Control and Reliability of Optical Networks in Multiprocessors

    Science.gov (United States)

    Olsen, James Jonathan

    1993-01-01

    Optical communication links have great potential to improve the performance of interconnection networks within large parallel multiprocessors, but the problems of semiconductor laser drive control and reliability inhibit their wide use. These problems have been solved in the telecommunications context, but the telecommunications solutions, based on a small number of links, are often too bulky, complex, power-hungry, and expensive to be feasible for use in a multiprocessor network with thousands of optical links. The main problems with the telecommunications approaches are that they are, by definition, designed for long-distance communication and therefore deal with communications links in isolation, instead of in an overall systems context. By taking a system-level approach to solving the laser reliability problem in a multiprocessor, and by exploiting the short-distance nature of the links, one can achieve small, simple, low-power, and inexpensive solutions, practical for implementation in the thousands of optical links that might be used in a multiprocessor. Through modeling and experimentation, I demonstrate that such system-level solutions exist, and are feasible for use in a multiprocessor network. I divide semiconductor laser reliability problems into two classes: transient errors and hard failures, and develop solutions to each type of problem in the context of a large multiprocessor. I find that for transient errors, the computer system would require a very low bit-error-rate (BER), such as 10^{-23}, if no provision were made for error control. Optical links cannot achieve such rates directly, but I find that a much more reasonable link-level BER (such as 10^{-7}) would be acceptable with simple error detection coding. I then propose a feedback system that will enable lasers to achieve these error levels even when laser threshold current varies. Instead of telecommunications techniques, which require laser output power monitors, I describe a software

  19. Languages, compilers and run-time environments for distributed memory machines

    CERN Document Server

    Saltz, J

    1992-01-01

    Papers presented within this volume cover a wide range of topics related to programming distributed memory machines. Distributed memory architectures, although having the potential to supply the very high levels of performance required to support future computing needs, present awkward programming problems. The major issue is to design methods which enable compilers to generate efficient distributed memory programs from relatively machine-independent program specifications. This book is a compilation of papers describing a wide range of research efforts aimed at easing the task of programming.

  1. DOLIB: Distributed Object Library

    Energy Technology Data Exchange (ETDEWEB)

    D'Azevedo, E.F.; Romine, C.H.

    1994-10-01

    This report describes the use and implementation of DOLIB (Distributed Object Library), a library of routines that emulates global or virtual shared memory on Intel multiprocessor systems. Access to a distributed global array is through explicit calls to gather and scatter. Advantages of using DOLIB include: dynamic allocation and freeing of huge (gigabyte) distributed arrays, both C and FORTRAN callable interfaces, and the ability to mix shared-memory and message-passing programming models for ease of use and optimal performance. DOLIB is independent of language and compiler extensions and requires no special operating system support. DOLIB also supports automatic caching of read-only data for high performance. The virtual shared memory support provided in DOLIB is well suited for implementing Lagrangian particle tracking techniques. We have also used DOLIB to create DONIO (Distributed Object Network I/O Library), which obtains over a 10-fold improvement in disk I/O performance on the Intel Paragon.
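
    DOLIB itself is a C- and FORTRAN-callable library for Intel machines; purely as a hedged sketch of the gather/scatter idea it describes, the following re-expresses single-element gather and scatter on a virtual global array using MPI one-sided windows (mpi4py). The chunk size, index, and function names are invented for illustration and are not DOLIB's API.

      # Sketch of DOLIB-style gather/scatter on a distributed global array,
      # re-expressed with MPI one-sided windows (mpi4py); not DOLIB's API.
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, nprocs = comm.Get_rank(), comm.Get_size()

      chunk = 1000                                # elements owned per rank (assumed)
      local = np.zeros(chunk, dtype='d')
      win = MPI.Win.Create(local, disp_unit=local.itemsize, comm=comm)

      def gather(i):
          # Read element i of the virtual global array, wherever it lives.
          owner, offset = divmod(i, chunk)
          buf = np.empty(1, dtype='d')
          win.Lock(owner)
          win.Get(buf, owner, target=offset)
          win.Unlock(owner)
          return buf[0]

      def scatter(i, value):
          # Write element i of the virtual global array.
          owner, offset = divmod(i, chunk)
          buf = np.array([value], dtype='d')
          win.Lock(owner)
          win.Put(buf, owner, target=offset)
          win.Unlock(owner)

      gi = chunk * nprocs // 2                    # a global index owned by some rank
      scatter(gi, 3.14)                           # any rank may write any element
      comm.Barrier()
      x = gather(gi)                              # ...and any rank may read it back
      win.Free()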

  2. Directions for memory hierarchies and their components: research and development

    International Nuclear Information System (INIS)

    Smith, A.J.

    1978-10-01

    The memory hierarchy is usually the largest identifiable part of a computer system and making effective use of it is critical to the operation and use of the system. The levels of such a memory hierarchy are considered and the state of the art and likely directions for both research and development are described. Algorithmic and logical features of the hierarchy not directly associated with specific components are also discussed. Among the problems believed to be the most significant are the following: (a) evaluate the effectiveness of gap filler technology as a level of storage between main memory and disk, and if it proves to be effective, determine how/where it should be used, (b) develop algorithms for the use of mass storage in a large computer system, and (c) determine how cache memories should be implemented in very large, fast multiprocessor systems

  3. Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

    Energy Technology Data Exchange (ETDEWEB)

    Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

    2015-12-09

    Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operations. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor-based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.

  4. A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories

    Science.gov (United States)

    1989-02-01

    ...frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of custom memory chips (208 bits x 128 pixels) on a renderer board, providing generality for rendering curved surfaces, volume data, and objects described with Constructive Solid Geometry, for rendering scenes using the radiosity lighting model, and for computing a spherical radiosity lighting model (see Section 7.6).

  5. A view of Kanerva's sparse distributed memory

    Science.gov (United States)

    Denning, P. J.

    1986-01-01

    Pentti Kanerva is working on a new class of computers, which are called pattern computers. Pattern computers may close the gap between capabilities of biological organisms to recognize and act on patterns (visual, auditory, tactile, or olfactory) and capabilities of modern computers. Combinations of numeric, symbolic, and pattern computers may one day be capable of sustaining robots. An overview of the requirements for a pattern computer, a summary of Kanerva's Sparse Distributed Memory (SDM), and examples of tasks this computer can be expected to perform well are given.

  6. Hybrid shared/distributed parallelism for 3D characteristics transport solvers

    International Nuclear Information System (INIS)

    Dahmani, M.; Roy, R.

    2005-01-01

    In this paper, we present a new hybrid parallel model for solving large-scale 3-dimensional neutron transport problems used in nuclear reactor simulations. Large heterogeneous reactor problems, like those that occur when simulating CANDU cores, have remained computationally intensive and impractical for routine applications on single-node or even vector computers. Based on the characteristics method, this new model is designed to solve the transport equation after distributing the calculation load on a network of shared memory multiprocessors. The tracks are either generated on the fly at each characteristics sweep or stored in sequential files. Load balancing is taken into account by estimating the calculation load of tracks and by distributing batches of uniform load on each node of the network. Moreover, the communication overhead can be predicted by benchmarking the latency and bandwidth using an appropriate network test suite. These models are useful for predicting the performance of the parallel applications and for analyzing the scalability of the parallel systems. (authors)

  7. Design of massively parallel hardware multi-processors for highly-demanding embedded applications

    NARCIS (Netherlands)

    Jozwiak, L.; Jan, Y.

    2013-01-01

    Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor systems-on-a-chip are required.

  8. Over-Distribution in Source Memory

    Science.gov (United States)

    Brainerd, C. J.; Reyna, V. F.; Holliday, R. E.; Nakamura, K.

    2012-01-01

    Semantic false memories are confounded with a second type of error, over-distribution, in which items are attributed to contradictory episodic states. Over-distribution errors have proved to be more common than false memories when the two are disentangled. We investigated whether over-distribution is prevalent in another classic false memory paradigm: source monitoring. It is. Conventional false memory responses (source misattributions) were predominantly over-distribution errors, but unlike semantic false memory, over-distribution also accounted for more than half of true memory responses (correct source attributions). Experimental control of over-distribution was achieved via a series of manipulations that affected either recollection of contextual details or item memory (concreteness, frequency, list-order, number of presentation contexts, and individual differences in verbatim memory). A theoretical model (conjoint process dissociation) was used to analyze the data; it predicts that (a) over-distribution is directly proportional to item memory but inversely proportional to recollection and (b) item memory is not a necessary precondition for recollection of contextual details. The results were consistent with both predictions. PMID:21942494

  9. Use of the CAMAC-MULTIBUS combined protocol for organizing multi-processor operation in a crate

    International Nuclear Information System (INIS)

    Glejbman, Eh.M.

    1985-01-01

    Problems of developing electronic units for large on-line systems for the automation of nuclear physics experiments, based on the principles of distributed control and data processing, are discussed. Crates are described in which CAMAC modules (EUR-4100) and modules implementing the MULTIBUS protocol are placed and operate simultaneously, which is achieved by combining the CAMAC and MULTIBUS protocols on the crate dataway. Application of job scheduler and executor modules in the MULTIBUS interface makes it possible to organize multiprocessor operation, to separate data streams, and to increase the total computational capacity of the crate.

  10. Sparse distributed memory

    Science.gov (United States)

    Denning, Peter J.

    1989-01-01

    Sparse distributed memory was proposed by Pentti Kanerva as a realizable architecture that could store large patterns and retrieve them based on partial matches with patterns representing current sensory inputs. This memory exhibits behaviors, both in theory and in experiment, that resemble those previously unapproached by machines - e.g., rapid recognition of faces or odors, discovery of new connections between seemingly unrelated ideas, continuation of a sequence of events when given a cue from the middle, knowing that one doesn't know, or getting stuck with an answer on the tip of one's tongue. These behaviors are now within reach of machines that can be incorporated into the computing systems of robots capable of seeing, talking, and manipulating. Kanerva's theory is a break with the Western rationalistic tradition, allowing a new interpretation of learning and cognition that respects biology and the mysteries of individual human beings.
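
    Kanerva's design is concrete enough to sketch numerically. The following is a minimal, hedged illustration of the SDM write/read cycle (fixed random hard locations, activation within a Hamming radius, per-bit counters); the sizes and radius are illustrative choices, not canonical values.

      # Minimal sketch of a Kanerva-style sparse distributed memory; the
      # sizes and the activation radius below are illustrative, not canonical.
      import numpy as np

      rng = np.random.default_rng(0)
      N_BITS, N_LOC, RADIUS = 256, 2000, 112      # word width, hard locations, Hamming radius

      addresses = rng.integers(0, 2, size=(N_LOC, N_BITS))   # fixed random hard locations
      counters = np.zeros((N_LOC, N_BITS), dtype=np.int32)   # one counter per location per bit

      def active(addr):
          # Hard locations within Hamming distance RADIUS of the address.
          return np.where((addresses != addr).sum(axis=1) <= RADIUS)[0]

      def write(addr, data):
          # Increment counters under 1-bits, decrement under 0-bits.
          counters[active(addr)] += np.where(data == 1, 1, -1)

      def read(addr):
          # Sum counters over active locations and threshold at zero.
          return (counters[active(addr)].sum(axis=0) >= 0).astype(int)

      pattern = rng.integers(0, 2, size=N_BITS)
      write(pattern, pattern)                     # autoassociative store
      noisy = pattern.copy()
      noisy[:20] ^= 1                             # corrupt 20 bits of the cue
      recovered = read(noisy)                     # should closely match `pattern`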

  11. Operating system for a real-time multiprocessor propulsion system simulator

    Science.gov (United States)

    Cole, G. L.

    1984-01-01

    The success of the Real Time Multiprocessor Operating System (RTMPOS) in the development and evaluation of experimental hardware and software systems for real-time interactive simulation of air-breathing propulsion systems is evaluated. RTMPOS provides the user with a versatile, interactive means for loading, running, debugging and obtaining results from a multiprocessor-based simulator. A front end processor (FEP) serves as the simulator controller and interface between the user and the simulator. These functions are facilitated by the RTMPOS, which resides on the FEP. The RTMPOS acts in conjunction with the FEP's manufacturer-supplied disk operating system that provides typical utilities like an assembler, linkage editor, text editor, file handling services, etc. Once a simulation is formulated, the RTMPOS provides for engineering-level, run-time operations such as loading, modifying and specifying computation flow of programs, simulator mode control, data handling and run-time monitoring. Run-time monitoring is a powerful feature of RTMPOS that allows the user to record all actions taken during a simulation session and to receive advisories from the simulator via the FEP. The RTMPOS is programmed mainly in PASCAL along with some assembly language routines. The RTMPOS software is easily modified to be applicable to hardware from different manufacturers.

  12. Scientific applications and numerical algorithms on the MIDAS multiprocessor system

    International Nuclear Information System (INIS)

    Logan, D.; Maples, C.

    1986-01-01

    The MIDAS multiprocessor system is a multi-level, hierarchical structure designed at the Advanced Computer Architecture Laboratory of the University of California's Lawrence Berkeley Laboratory. A two-stage, 11-processor system has been operational for over a year and is currently undergoing expansion. It has been employed to investigate the performance of different methods of decomposing various problems and algorithms into a multiprocessor environment. The results of such tests on a variety of applications, such as scientific data analysis, Monte Carlo calculations, and image processing, are discussed. Often such decompositions involve investigating the parallel structure of fundamental algorithms. Several basic algorithms dealing with random number generation, matrix diagonalization, fast Fourier transforms, and finite element methods in solving partial differential equations are also discussed. The performance and projected extensibilities of these decompositions on the MIDAS system are reported.

  13. ClimateSpark: An in-memory distributed computing framework for big climate data analytics

    Science.gov (United States)

    Hu, Fei; Yang, Chaowei; Schnase, John L.; Duffy, Daniel Q.; Xu, Mengchao; Bowen, Michael K.; Lee, Tsengdar; Song, Weiwei

    2018-06-01

    The unprecedented growth of climate data creates new opportunities for climate studies, and yet big climate data pose a grand challenge to climatologists to efficiently manage and analyze big data. The complexity of climate data content and analytical algorithms increases the difficulty of implementing algorithms on high performance computing systems. This paper proposes an in-memory, distributed computing framework, ClimateSpark, to facilitate complex big data analytics and time-consuming computational tasks. A chunked data structure improves parallel I/O efficiency, while a spatiotemporal index is built for the chunks to avoid unnecessary data reading and preprocessing. An integrated, multi-dimensional, array-based data model (ClimateRDD) and ETL operations are developed to address big climate data variety by integrating the processing components of the climate data lifecycle. ClimateSpark utilizes Spark SQL and Apache Zeppelin to develop a web portal to facilitate the interaction among climatologists, climate data, analytic operations and computing resources (e.g., using SQL query and Scala/Python notebook). Experimental results show that ClimateSpark conducts different spatiotemporal data queries/analytics with high efficiency and data locality. ClimateSpark is easily adaptable to other big multiple-dimensional, array-based datasets in various geoscience domains.

  14. Shared performance monitor in a multiprocessor system

    Science.gov (United States)

    Chiu, George; Gara, Alan G.; Salapura, Valentina

    2012-07-24

    A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor units, each processor unit for generating signals representing occurrences of events in the processor unit, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more of the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor units of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.

  15. An optimal multi-channel memory controller for real-time systems

    NARCIS (Netherlands)

    Gomony, M.D.; Akesson, K.B.; Goossens, K.G.W.

    2013-01-01

    Optimal utilization of a multi-channel memory, such as Wide IO DRAM, as shared memory in multi-processor platforms depends on the mapping of memory clients to the memory channels, the granularity at which the memory requests are interleaved in each channel, and the bandwidth and memory capacity allocated to each client.

  16. Multi-processor developments in the United States for future high energy physics experiments and accelerators

    International Nuclear Information System (INIS)

    Gaines, I.

    1988-03-01

    The use of multi-processors for analysis and high-level triggering in High Energy Physics experiments, pioneered by the early emulator systems, has reached maturity, in particular with the multiple microprocessor systems in use at Fermilab. It is widely acknowledged that such systems will fulfill the major portion of the computing needs of future large experiments. Recent developments at Fermilab's Advanced Computer Program will make such systems even more powerful, cost-effective, and easier to use than they are at present. The next generation of microprocessors, already available, will provide CPU power of about one VAX 780 equivalent/$300, while supporting most VMS FORTRAN extensions and large (>8MB) amounts of memory. Low cost high density mass storage devices (based on video tape cartridge technology) will allow parallel I/O to remove potential I/O bottlenecks in systems of over 1000 VAX-equivalent processors. New interconnection schemes and system software will allow more flexible topologies and extremely high data bandwidth, especially for on-line systems. This talk will summarize the work at the Advanced Computer Program and the rest of the US in this field. 3 refs., 4 figs.

  17. Simulation of Particulate Flows on Multi-Processor Machines with Distributed Memory

    Energy Technology Data Exchange (ETDEWEB)

    Uhlmann, M.

    2004-07-01

    We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized and relies upon a parallel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. For the solid phase we employ a master-slave technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub-domain. The parallel efficiency for some preliminary computations is presented. (Author) 9 refs.

  18. Trade-Off Exploration for Target Tracking Application in a Customized Multiprocessor Architecture

    Directory of Open Access Journals (Sweden)

    Yassin El-Hillali

    2009-01-01

    This paper presents the design of an FPGA-based multiprocessor-system-on-chip (MPSoC) architecture optimized for Multiple Target Tracking (MTT) in automotive applications. An MTT system uses an automotive radar to track the speed and relative position of all the vehicles (targets) within its field of view. As the number of targets increases, the computational needs of the MTT system also increase, making it difficult for a single processor to handle it alone. Our implementation distributes the computational load among multiple soft processor cores optimized for executing specific computational tasks. The paper explains how we designed and profiled the MTT application to partition it among different processors. It also explains how we applied different optimizations to customize the individual processor cores to their assigned tasks and to assess their impact on performance and FPGA resource utilization. The result is a complete MTT application running on an optimized MPSoC architecture that fits in a contemporary medium-sized FPGA and that meets the application's real-time constraints.

  19. A general purpose subroutine for fast fourier transform on a distributed memory parallel machine

    Science.gov (United States)

    Dubey, A.; Zubair, M.; Grosch, C. E.

    1992-01-01

    One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.
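
    One widely used scheme that fits the row-block distribution mentioned above is the transpose-based 2-D FFT: transform the locally held dimension, redistribute with an all-to-all exchange, then transform the other dimension. The mpi4py sketch below illustrates that pattern under the assumptions stated in the comments; it is an illustration of the general technique, not the paper's iPSC/860 code.

      # Transpose-based 2-D FFT sketch with a row-block distribution (one of
      # the distributions discussed above); assumes N divisible by the number
      # of ranks. Illustrative only, not the paper's iPSC/860 implementation.
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank, P = comm.Get_rank(), comm.Get_size()
      N = 512                                     # global array is N x N
      assert N % P == 0
      rows = N // P                               # block of rows owned by this rank

      local = np.random.default_rng(rank).random((rows, N)).astype(complex)

      # 1) Transform the locally contiguous dimension.
      local = np.fft.fft(local, axis=1)

      # 2) Global transpose: block-split the columns and exchange all-to-all.
      send = np.ascontiguousarray(local.reshape(rows, P, rows).transpose(1, 0, 2))
      recv = np.empty_like(send)
      comm.Alltoall(send, recv)
      local = recv.transpose(2, 0, 1).reshape(rows, N)   # now a column block, transposed

      # 3) Transform the other dimension; the result is the 2-D FFT, transposed.
      local = np.fft.fft(local, axis=1)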

  20. Design and simulation of parallel and distributed architectures for images processing

    International Nuclear Information System (INIS)

    Pirson, Alain

    1990-01-01

    The exploitation of visual information requires special computers. The diversity of operations and the computing power involved bring about structures founded on the concepts of concurrency and distributed processing. This work identifies a vision computer with an association of dedicated intelligent entities, exchanging messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, tasks being allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision-dedicated co-processors and powerful communication devices. (author) [fr]

  1. Method for wiring allocation and switch configuration in a multiprocessor environment

    Science.gov (United States)

    Aridor, Yariv [Zichron Ya'akov, IL; Domany, Tamar [Kiryat Tivon, IL; Frachtenberg, Eitan [Jerusalem, IL; Gal, Yoav [Haifa, IL; Shmueli, Edi [Haifa, IL; Stockmeyer, legal representative, Robert E.; Stockmeyer, Larry Joseph [San Jose, CA

    2008-07-15

    A method for wiring allocation and switch configuration in a multiprocessor computer, the method including employing depth-first tree traversal to determine a plurality of paths among a plurality of processing elements allocated to a job along a plurality of switches and wires in a plurality of D-lines, and selecting one of the paths in accordance with at least one selection criterion.

  2. Parallel algorithms for geometric connected component labeling on a hypercube multiprocessor

    Science.gov (United States)

    Belkhale, K. P.; Banerjee, P.

    1992-01-01

    Different algorithms for the geometric connected component labeling (GCCL) problem are defined, each of which involves d stages of message passing for a d-dimensional hypercube. The major idea is that in each stage a hypercube multiprocessor increases its knowledge of the domain. The algorithms under consideration include the QUAD algorithm for a small number of processors and the Overlap Quad algorithm for a large number of processors, subject to the locality of the connected sets. These algorithms differ in their run time, memory requirements, and message complexity. They were implemented on an Intel iPSC2/D4/MX hypercube.
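
    The d-stage structure is the classic hypercube dimension-exchange pattern: at stage i every node exchanges what it currently knows with the neighbor whose address differs in bit i, so after d stages each node has domain-wide knowledge. A minimal sequential simulation of that pattern (with "knowledge" reduced to a set of node labels) is sketched below; it illustrates the communication structure only, not the QUAD algorithms themselves.

      # Sequential simulation of the d-stage exchange: in stage i each node
      # merges knowledge with its neighbor across dimension i, so after d
      # stages every node knows the whole domain.
      d = 4                                       # 2^d = 16 nodes
      knowledge = [{node} for node in range(2 ** d)]

      for stage in range(d):
          knowledge = [knowledge[node] | knowledge[node ^ (1 << stage)]
                       for node in range(2 ** d)]

      assert all(k == set(range(2 ** d)) for k in knowledge)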

  3. Meeting the memory challenges of brain-scale network simulation

    Directory of Open Access Journals (Sweden)

    Susanne eKunkel

    2012-01-01

    The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10^5 neurons with up to 10^9 synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are one or two orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been studied in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture, where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of a neuronal simulator as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of key contributing components to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place.
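
    The shape of such a linear model can be sketched directly. In the hedged toy version below, per-core memory is a fixed overhead, plus terms for the neurons and synapses actually hosted on the core, plus an infrastructure term that every core pays for all N neurons; all coefficients are invented placeholders, not the article's measured values. The O(N) infrastructure term is what saturates memory at brain scale.

      # Hedged sketch of a linear per-core memory model of the kind the
      # article develops; every coefficient below is an invented placeholder,
      # not a measured value.
      def memory_per_core(n_neurons, k_syn_per_neuron, n_cores,
                          m_base=5e8,       # fixed simulator overhead (bytes), assumed
                          m_infra=16.0,     # per-neuron infrastructure held on every core, assumed
                          m_neuron=1500.0,  # bytes per neuron object, assumed
                          m_syn=48.0):      # bytes per synapse object, assumed
          # Bytes of memory one core needs for a network of n_neurons.
          local_neurons = n_neurons / n_cores
          local_synapses = local_neurons * k_syn_per_neuron
          return (m_base
                  + m_infra * n_neurons     # O(N) term that saturates memory at scale
                  + m_neuron * local_neurons
                  + m_syn * local_synapses)

      # The O(N) infrastructure term dominates as N grows, reproducing the
      # saturation effect identified above.
      print(memory_per_core(1e8, 1e4, 65536) / 2 ** 30, "GiB per core")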

  4. Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

    Science.gov (United States)

    Leamy, Michael J.; Springer, Adam C.

    In this research we report a parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straightforward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.

  5. Cyclic executive for safety-critical Java on chip-multiprocessors

    DEFF Research Database (Denmark)

    Ravn, Anders P.; Schoeberl, Martin

    2010-01-01

    The proposed cyclic executive uses model checking to find a static schedule, if one exists at all, which gives an implementation of a table-driven multiprocessor scheduler. To evaluate the proposed cyclic executive for multiprocessors we have implemented it in the context of safety-critical Java on a Java processor.

  6. A parallelization study of the general purpose Monte Carlo code MCNP4 on a distributed memory highly parallel computer

    International Nuclear Information System (INIS)

    Yamazaki, Takao; Fujisaki, Masahide; Okuda, Motoi; Takano, Makoto; Masukawa, Fumihiro; Naito, Yoshitaka

    1993-01-01

    The general purpose Monte Carlo code MCNP4 has been implemented on the Fujitsu AP1000 distributed memory highly parallel computer. Parallelization techniques developed and studied are reported. A shielding analysis function of the MCNP4 code is parallelized in this study. A technique to map a history to each processor dynamically and to map the control process to a certain processor was applied. The efficiency of the parallelized code is up to 80% for a typical practical problem with 512 processors. These results demonstrate the advantages of a highly parallel computer over conventional computers in the field of shielding analysis by the Monte Carlo method. (orig.)

  7. Energy-Aware Real-Time Task Scheduling for Heterogeneous Multiprocessors with Particle Swarm Optimization Algorithm

    Directory of Open Access Journals (Sweden)

    Weizhe Zhang

    2014-01-01

    Energy consumption in computer systems has become a more and more important issue. High energy consumption has already damaged the environment to some extent, especially in heterogeneous multiprocessors. In this paper, we first formulate and describe the energy-aware real-time task scheduling problem in heterogeneous multiprocessors. Then we propose a particle swarm optimization (PSO) based algorithm, which can successfully reduce the energy cost and the time for searching feasible solutions. Experimental results show that the PSO-based energy-aware metaheuristic uses 40%–50% less energy than the GA-based and SFLA-based algorithms and spends 10% less time than the SFLA-based algorithm in finding the solutions. Besides, it can also find 19% more feasible solutions than the SFLA-based algorithm.
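
    As a hedged illustration of how PSO maps onto this problem, the toy sketch below encodes a task-to-processor assignment as a continuous particle position (rounded to processor indices) and minimizes a simple energy = time x power objective over heterogeneous processors. The energy model, all constants, and the omission of deadline constraints are simplifications for illustration, not the paper's formulation.

      # Toy PSO for task-to-processor assignment on a heterogeneous machine;
      # the energy model, constants, and omission of deadlines are invented
      # simplifications, not the paper's formulation.
      import numpy as np

      rng = np.random.default_rng(1)
      n_tasks, n_procs = 20, 4
      cycles = rng.uniform(1e6, 1e8, n_tasks)         # work per task
      speed = np.array([1.0, 1.5, 2.0, 3.0]) * 1e8    # cycles/s per processor
      power = np.array([1.0, 2.0, 3.5, 6.0])          # watts per processor

      def energy(assignment):
          # Energy = sum over tasks of execution time x power on its processor.
          a = assignment.astype(int)
          return float(np.sum(cycles / speed[a] * power[a]))

      n_particles, iters, w, c1, c2 = 30, 200, 0.7, 1.5, 1.5
      pos = rng.uniform(0, n_procs, (n_particles, n_tasks))  # continuous encoding
      vel = np.zeros_like(pos)
      pbest, pbest_val = pos.copy(), np.array([energy(p) for p in pos])
      gbest = pbest[np.argmin(pbest_val)].copy()

      for _ in range(iters):
          r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
          vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
          pos = np.clip(pos + vel, 0, n_procs - 1e-9)
          vals = np.array([energy(p) for p in pos])
          better = vals < pbest_val
          pbest[better], pbest_val[better] = pos[better], vals[better]
          gbest = pbest[np.argmin(pbest_val)].copy()

      print("best assignment:", gbest.astype(int), "energy (J):", pbest_val.min())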

  8. Multiprocessor Global Scheduling on Frame-Based DVFS Systems

    OpenAIRE

    Berten, Vandy; Goossens, Joël

    2008-01-01

    In this work, we are interested in multiprocessor energy efficient systems where task durations are not known in advance but are known stochastically. More precisely we consider global scheduling algorithms for frame-based multiprocessor stochastic DVFS (Dynamic Voltage and Frequency Scaling) systems. Moreover we consider processors with a discrete set of available frequencies. We provide a global scheduling algorithm, and formally show that no deadline will ever be missed.

  9. System-Level Design Methodologies for Networked Multiprocessor Systems-on-Chip

    DEFF Research Database (Denmark)

    Virk, Kashif Munir

    2008-01-01

    is the first such attempt in the published literature. The second part of the thesis deals with the issues related to the development of system-level design methodologies for networked multiprocessor systems-on-chip at various levels of design abstraction with special focus on the modeling and design ... at the system-level. The multiprocessor modeling framework is then extended to include models of networked multiprocessor systems-on-chip, which is then employed to model wireless sensor networks both at the sensor node level as well as the wireless network level. In the third and the final part, the thesis ... to the transaction-level model. The thesis, as a whole, makes contributions by describing a design methodology for networked multiprocessor embedded systems at three layers of abstraction from system-level through transaction-level to the cycle-accurate level as well as demonstrating it practically by implementing...

  10. Domain Decomposition: A Bridge between Nature and Parallel Computers

    Science.gov (United States)

    1992-09-01

    B., "Domain Decomposition Algorithms for Indefinite Elliptic Problems," SIAM Journal of Scientific and Statistical Computing, Vol. 13, 1992, pp. ... AD-A256 575; NASA Contractor Report 189709; ICASE Report No. 92-44: Domain Decomposition: A Bridge Between Nature and Parallel Computers. ... effectively implemented on distributed memory multiprocessors. In 1990 (as reported in Ref. 38 using the tile algorithm), a 103,201-unknown 2D elliptic

  12. Investigation of implementing a synchronization protocol under multiprocessors hierarchical scheduling

    NARCIS (Netherlands)

    Nemati, F.; Behnam, M.; Bril, R.J.

    2009-01-01

    In the multi-core and multiprocessor domain, there has been considerable work done on scheduling techniques assuming that real-time tasks are independent. In practice a typical real-time system usually shares logical resources among tasks. However, synchronization in the multiprocessor area has not received the same attention.

  13. Realtime multiprocessor for mobile ad hoc networks

    Directory of Open Access Journals (Sweden)

    T. Jungeblut

    2008-05-01

    This paper introduces a real-time Multiprocessor System-on-Chip (MPSoC) for low power wireless applications. The multiprocessor is based on eight 32-bit RISC processors that are connected via a Network-on-Chip (NoC). The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard real-time requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC, which has been fabricated in 180 nm UMC standard cell technology, is 772 mW.

  14. Scalable shared-memory multiprocessing

    CERN Document Server

    Lenoski, Daniel E

    1995-01-01

    Dr. Lenoski and Dr. Weber have experience with leading-edge research and practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved with commercializing scalable shared-memory technology.

  15. Hardware locks for a real-time Java chip multiprocessor

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

    2016-01-01

    A software locking mechanism commonly protects shared resources for multithreaded applications. This mechanism can, especially in chip-multiprocessor systems, result in a large synchronization overhead. For real-time systems in particular, this overhead increases the worst-case execution time. ... This improvement can allow a larger number of real-time tasks to be reliably scheduled on a multiprocessor real-time platform.

  16. Best Speed Fit EDF Scheduling for Performance Asymmetric Multiprocessors

    Directory of Open Access Journals (Sweden)

    Peng Wu

    2017-01-01

    In order to improve the performance of a real-time system, asymmetric multiprocessors have been proposed. The benefits of improved system performance and reduced power consumption from such architectures cannot be fully exploited unless suitable task scheduling and task allocation approaches are implemented at the operating system level. Unfortunately, most of the previous research on scheduling algorithms for performance asymmetric multiprocessors is focused on task priority assignment: it simply assigns the highest priority task to the fastest processor. In this paper, we propose BSF-EDF (best speed fit for earliest deadline first) for performance asymmetric multiprocessor scheduling. This approach chooses a suitable processor rather than the fastest one when allocating tasks. With this proposed BSF-EDF scheduling, we also derive an effective schedulability test.
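
    The record does not reproduce the allocation rule itself; as a hedged sketch of one plausible "best speed fit" reading, the snippet below places each task on the processor whose remaining capacity most tightly accommodates the task's utilization (rather than on the fastest processor), using the uniprocessor EDF bound that total utilization on a speed-s processor must not exceed s. This is an illustration, not the paper's exact algorithm or schedulability test.

      # Hedged "best speed fit" allocation sketch (illustrative reading, not
      # the paper's exact algorithm). A processor of speed s can host tasks
      # whose summed utilization u_i = C_i / T_i (C_i at unit speed) stays
      # within s, by the uniprocessor EDF bound.
      def best_speed_fit(tasks, speeds):
          # tasks: list of (wcet_at_unit_speed, period); speeds: per processor.
          load = [0.0] * len(speeds)
          assignment = {}
          for i, (c, t) in enumerate(tasks):
              u = c / t
              # Processors that still fit u, ordered by remaining slack:
              fits = sorted((s - load[p], p)
                            for p, s in enumerate(speeds) if load[p] + u <= s)
              if not fits:
                  raise ValueError(f"task {i} unschedulable under this heuristic")
              p = fits[0][1]                      # tightest fit, not the fastest CPU
              load[p] += u
              assignment[i] = p
          return assignment

      tasks = [(2, 10), (3, 10), (8, 20), (1, 5), (6, 12)]
      print(best_speed_fit(tasks, speeds=[1.0, 1.0, 2.0]))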

  17. Multiprocessor Priority Ceiling Emulation for Safety-Critical Java

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Schoeberl, Martin

    2015-01-01

    Priority ceiling emulation has preferable properties on uniprocessor systems, such as avoiding priority inversion and being deadlock free. This has made it a popular locking protocol. According to the safety-critical Java specification, priority ceiling emulation is a requirement for implementations. ... However, implementing the protocol for multiprocessor systems is more complex, so implementations might perform worse than non-preemptive implementations. In this paper we compare two multiprocessor lock implementations with hardware support for the Java optimized processor: non-preemptive locking

  18. Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

    Science.gov (United States)

    Quealy, Angela; Cole, Gary L.; Blech, Richard A.

    1993-01-01

    The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.

  19. FPGA-based distributed computing microarchitecture for complex physical dynamics investigation.

    Science.gov (United States)

    Borgese, Gianluca; Pace, Calogero; Pantano, Pietro; Bilotta, Eleonora

    2013-09-01

    In this paper, we present a distributed computing system, called DCMARK, aimed at solving the partial differential equations at the basis of many investigation fields, such as solid state physics, nuclear physics, and plasma physics. This distributed architecture is based on the cellular neural network paradigm, which allows us to divide the differential equation system solving into many parallel integration operations to be executed by a custom multiprocessor system. We push the number of processors to the limit of one processor for each equation. In order to test the present idea, we choose to implement DCMARK on a single FPGA, designing the single processor in order to minimize its hardware requirements and to obtain a large number of easily interconnected processors. This approach is particularly suited to studying the properties of 1-, 2- and 3-D locally interconnected dynamical systems. In order to test the computing platform, we implement a 200-cell Korteweg-de Vries (KdV) equation solver and perform a comparison between simulations conducted on a high performance PC and on our system. Since our distributed architecture takes a constant computing time to solve the equation system, independently of the number of dynamical elements (cells) of the CNN array, it allows us to reduce the computation time more than other similar systems in the literature. To ensure a high level of reconfigurability, we design a compact system on programmable chip managed by a softcore processor, which controls the fast data/control communication between our system and a PC host. An intuitive graphical user interface allows us to change the calculation parameters and plot the results.

  20. Fundamental Parallel Algorithms for Private-Cache Chip Multiprocessors

    DEFF Research Database (Denmark)

    Arge, Lars Allan; Goodrich, Michael T.; Nelson, Michael

    2008-01-01

    We make no assumptions about the way cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. Our algorithms are asymptotically optimal in terms of parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache CMPs.
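
    Prefix sums, named above as a building block, follow a standard two-pass blocked pattern in such models: each processor scans its own block, an exclusive scan of the block totals yields per-block offsets, and a second pass adds the offsets. The sketch below simulates that pattern sequentially; it illustrates the structure, not the paper's PEM-optimal algorithm.

      # Two-pass blocked prefix sum, simulated sequentially: local scans per
      # block, an exclusive scan of block totals for offsets, then a fix-up
      # pass. Structure only; not the paper's PEM-optimal algorithm.
      from itertools import accumulate

      def blocked_prefix_sums(data, n_blocks):
          size = (len(data) + n_blocks - 1) // n_blocks
          blocks = [data[i * size:(i + 1) * size] for i in range(n_blocks)]

          # Pass 1 (one block per processor in the parallel setting):
          local = [list(accumulate(b)) for b in blocks]

          # Exclusive scan of block totals gives each block's starting offset.
          totals = [b[-1] if b else 0 for b in local]
          offsets = [0] + list(accumulate(totals))[:-1]

          # Pass 2 (parallel): shift every element of a block by its offset.
          return [x + off for b, off in zip(local, offsets) for x in b]

      assert blocked_prefix_sums(list(range(10)), 3) == list(accumulate(range(10)))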

  1. Embedded software design and programming of multiprocessor system-on-chip: Simulink and SystemC case studies

    CERN Document Server

    Popovici, Katalin; Jerraya, Ahmed A; Wolf, Marilyn

    2010-01-01

    Current multimedia and telecom applications require complex, heterogeneous multiprocessor system on chip (MPSoC) architectures with specific communication infrastructure in order to achieve the required performance. Heterogeneous MPSoC includes different types of processing units (DSP, microcontroller, ASIP) and different communication schemes (fast links, non standard memory organization and access). Programming an MPSoC requires the generation of efficient software running on MPSoC from a high level environment, by using the characteristics of the architecture. This task is known to be tedious.

  2. The performance of disk arrays in shared-memory database machines

    Science.gov (United States)

    Katz, Randy H.; Hong, Wei

    1993-01-01

    In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small form-factor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.

  3. PGHPF – An Optimizing High Performance Fortran Compiler for Distributed Memory Machines

    Directory of Open Access Journals (Sweden)

    Zeki Bozkus

    1997-01-01

    High Performance Fortran (HPF) is the first widely supported, efficient, and portable parallel programming language for shared and distributed memory systems. HPF is realized through a set of directive-based extensions to Fortran 90. It enables application developers and Fortran end-users to write compact, portable, and efficient software that will compile and execute on workstations, shared memory servers, clusters, traditional supercomputers, or massively parallel processors. This article describes a production-quality HPF compiler for a set of parallel machines. Compilation techniques such as data and computation distribution, communication generation, run-time support, and optimization issues are elaborated as the basis for an HPF compiler implementation on distributed memory machines. The performance of this compiler on benchmark programs demonstrates that high efficiency can be achieved executing HPF code on parallel architectures.
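
    Central to the data-distribution step such a compiler performs is the owner-computes mapping from a global array index to a (processor, local index) pair. The sketch below shows the usual textbook formulas for the HPF-style BLOCK and CYCLIC distributions of a 1-D array; these are generic illustrations, not PGHPF internals.

      # Textbook owner-computes index mappings for HPF-style DISTRIBUTE
      # (BLOCK) and (CYCLIC) of a 1-D array over p processors; generic
      # formulas, not PGHPF internals.
      def block_owner(i, n, p):
          # Global index i -> (processor, local index) under BLOCK.
          b = (n + p - 1) // p                    # block size = ceil(n / p)
          return i // b, i % b

      def cyclic_owner(i, n, p):
          # Global index i -> (processor, local index) under CYCLIC.
          return i % p, i // p

      n, p = 10, 3
      print([block_owner(i, n, p) for i in range(n)])    # proc 0 owns 0..3, ...
      print([cyclic_owner(i, n, p) for i in range(n)])   # round-robin ownership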

  4. Migration of vectorized iterative solvers to distributed memory architectures

    Energy Technology Data Exchange (ETDEWEB)

    Pommerell, C. [AT& T Bell Labs., Murray Hill, NJ (United States); Ruehl, R. [CSCS-ETH, Manno (Switzerland)

    1994-12-31

    Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved: smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this 'natural' parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorize or parallelize the dominating operation, structured or unstructured sparse matrix-vector multiplication, or to increase locality and parallelism by reformulating the algorithm, reducing global synchronization in inner products or local data exchange in preconditioners. Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.

  5. Paging memory from random access memory to backing storage in a parallel computer

    Science.gov (United States)

    Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E

    2013-05-21

    Paging memory from random access memory (RAM) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.

  6. A system-level multiprocessor system-on-chip modeling framework

    DEFF Research Database (Denmark)

    Virk, Kashif Munir; Madsen, Jan

    2004-01-01

    We present a system-level modeling framework to model system-on-chips (SoC) consisting of heterogeneous multiprocessors and network-on-chip communication structures in order to enable the developers of today's SoC designs to take advantage of the flexibility and scalability of network-on-chip and ... SoC design. We show how a hand-held multimedia terminal, consisting of JPEG, MP3 and GSM applications, can be modeled as a multiprocessor SoC in our framework.

  7. The computational nature of memory modification.

    Science.gov (United States)

    Gershman, Samuel J; Monfils, Marie-H; Norman, Kenneth A; Niv, Yael

    2017-03-15

    Retrieving a memory can modify its influence on subsequent behavior. We develop a computational theory of memory modification, according to which modification of a memory trace occurs through classical associative learning, but which memory trace is eligible for modification depends on a structure learning mechanism that discovers the units of association by segmenting the stream of experience into statistically distinct clusters (latent causes). New memories are formed when the structure learning mechanism infers that a new latent cause underlies current sensory observations. By the same token, old memories are modified when old and new sensory observations are inferred to have been generated by the same latent cause. We derive this framework from probabilistic principles, and present a computational implementation. Simulations demonstrate that our model can reproduce the major experimental findings from studies of memory modification in the Pavlovian conditioning literature.

  8. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    Science.gov (United States)

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  9. Memory intensive functional architecture for distributed computer control systems

    International Nuclear Information System (INIS)

    Dimmler, D.G.

    1983-10-01

    A memory-intensive functional architecture for distributed data-acquisition, monitoring, and control systems with large numbers of nodes has been conceptually developed and applied in several large-scale and some smaller systems. This discussion concentrates on: (1) the basic architecture; (2) recent expansions of the architecture which now become feasible in view of the rapidly developing component technologies in microprocessors and functional large-scale integration circuits; and (3) implementation of some key hardware and software structures and one system implementation: a system for performing control and data acquisition of a neutron spectrometer at the Brookhaven High Flux Beam Reactor. The spectrometer is equipped with a large-area position-sensitive neutron detector

  10. Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems.

    Science.gov (United States)

    Shehzad, Danish; Bozkuş, Zeki

    2016-01-01

    Increase in the complexity of neuronal network models has escalated efforts to make the NEURON simulation environment efficient. Computational neuroscientists divide the equations into subnets among multiple processors to achieve better hardware performance. On parallel machines for neuronal networks, interprocessor spike exchange consumes a large share of the overall simulation time. NEURON uses the Message Passing Interface (MPI) for communication between processors; the MPI_Allgather collective is exercised for spike exchange after each interval across distributed memory systems. Increasing the number of processors yields greater concurrency and better performance, but it adversely affects MPI_Allgather, whose communication time grows with processor count. This necessitates an improved communication methodology to decrease the spike exchange time on distributed memory systems. This work improves on MPI_Allgather using Remote Memory Access (RMA) by moving from two-sided to one-sided communication, and the use of a recursive doubling mechanism achieves efficient communication between the processors in precise steps. This approach enhances communication concurrency and improves overall runtime, making NEURON more efficient for simulation of large neuronal network models.
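
    As an illustration of the mechanism described above, the following mpi4py sketch gathers per-rank spike blocks with one-sided Get operations arranged in recursive-doubling rounds. It assumes a power-of-two number of ranks, a fixed block size per rank, and fence synchronization; it is a toy stand-in for the paper's optimized NEURON implementation, not that code.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()  # size must be a power of two

k = 4                                    # spikes contributed per rank (toy)
buf = np.zeros(size * k, dtype=np.intc)
buf[rank * k:(rank + 1) * k] = rank      # stand-in for local spike data

# Expose the whole buffer as an RMA window so partners can read it.
win = MPI.Win.Create(buf, disp_unit=buf.itemsize, comm=comm)

step = 1
while step < size:                       # log2(size) rounds
    partner = rank ^ step
    pbase = (partner // step) * step     # first block the partner's group holds
    win.Fence()
    # One-sided read of the 'step' blocks the partner has gathered so far.
    win.Get([buf[pbase * k:(pbase + step) * k], MPI.INT],
            partner, target=(pbase * k, step * k, MPI.INT))
    win.Fence()
    step *= 2
win.Free()
# buf now equals what MPI_Allgather over the per-rank blocks would return.
```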

  11. Optimizing NEURON Simulation Environment Using Remote Memory Access with Recursive Doubling on Distributed Memory Systems

    Directory of Open Access Journals (Sweden)

    Danish Shehzad

    2016-01-01

    Increase in the complexity of neuronal network models has escalated efforts to make the NEURON simulation environment efficient. Computational neuroscientists divide the equations into subnets among multiple processors to achieve better hardware performance. On parallel machines for neuronal networks, interprocessor spike exchange consumes a large share of the overall simulation time. NEURON uses the Message Passing Interface (MPI) for communication between processors; the MPI_Allgather collective is exercised for spike exchange after each interval across distributed memory systems. Increasing the number of processors yields greater concurrency and better performance, but it adversely affects MPI_Allgather, whose communication time grows with processor count. This necessitates an improved communication methodology to decrease the spike exchange time on distributed memory systems. This work improves on MPI_Allgather using Remote Memory Access (RMA) by moving from two-sided to one-sided communication, and the use of a recursive doubling mechanism achieves efficient communication between the processors in precise steps. This approach enhances communication concurrency and improves overall runtime, making NEURON more efficient for simulation of large neuronal network models.

  12. The computational nature of memory modification

    Science.gov (United States)

    Gershman, Samuel J; Monfils, Marie-H; Norman, Kenneth A; Niv, Yael

    2017-01-01

    Retrieving a memory can modify its influence on subsequent behavior. We develop a computational theory of memory modification, according to which modification of a memory trace occurs through classical associative learning, but which memory trace is eligible for modification depends on a structure learning mechanism that discovers the units of association by segmenting the stream of experience into statistically distinct clusters (latent causes). New memories are formed when the structure learning mechanism infers that a new latent cause underlies current sensory observations. By the same token, old memories are modified when old and new sensory observations are inferred to have been generated by the same latent cause. We derive this framework from probabilistic principles, and present a computational implementation. Simulations demonstrate that our model can reproduce the major experimental findings from studies of memory modification in the Pavlovian conditioning literature. DOI: http://dx.doi.org/10.7554/eLife.23763.001 PMID:28294944

  13. One-Step Programmable Arbiters for Multiprocessors

    DEFF Research Database (Denmark)

    Højberg, Kristian Søe

    1978-01-01

    When processors in a multiprocessor system demand service from a shared bus in an asynchronous mode, a synchronous state arbiter resolves conflicts and allocates resources. Independent of the combination of requests, only one state transition is required from a free to an allocated resource...

  14. Blocking Optimality in Distributed Real-Time Locking Protocols

    Directory of Open Access Journals (Sweden)

    Björn Bernhard Brandenburg

    2014-09-01

    Lower and upper bounds on the maximum priority inversion blocking (pi-blocking) that is generally unavoidable in distributed multiprocessor real-time locking protocols (where resources may be accessed only from specific synchronization processors) are established. Prior work on suspension-based shared-memory multiprocessor locking protocols (which require resources to be accessible from all processors) has established asymptotically tight bounds of Ω(m) and Ω(n) maximum pi-blocking under suspension-oblivious and suspension-aware analysis, respectively, where m denotes the total number of processors and n denotes the number of tasks. In this paper, it is shown that, in the case of distributed semaphore protocols, there exist two different task allocation scenarios that give rise to distinct lower bounds. In the case of co-hosted task allocation, where application tasks may also be assigned to synchronization processors (i.e., processors hosting critical sections), Ω(Φ · n) maximum pi-blocking is unavoidable for some tasks under any locking protocol under both suspension-aware and suspension-oblivious schedulability analysis, where Φ denotes the ratio of the maximum response time to the shortest period. In contrast, in the case of disjoint task allocation (i.e., if application tasks may not be assigned to synchronization processors), only Ω(m) and Ω(n) maximum pi-blocking is fundamentally unavoidable under suspension-oblivious and suspension-aware analysis, respectively, as in the shared-memory case. These bounds are shown to be asymptotically tight with the construction of two new distributed real-time locking protocols that ensure O(m) and O(n) maximum pi-blocking under suspension-oblivious and suspension-aware analysis, respectively.

  15. Prefetching in file systems for MIMD multiprocessors

    Science.gov (United States)

    Kotz, David F.; Ellis, Carla Schlatter

    1990-01-01

    The question of whether prefetching blocks of a file into the block cache can effectively reduce the overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in this environment.
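
    For intuition, a minimal one-block-lookahead cache in Python shows the kind of policy studied; the real testbed prefetched within parallel file access patterns, which this sequential sketch (capacity of at least two blocks assumed) does not model.

```python
from collections import OrderedDict

class PrefetchingCache:
    """One-block-lookahead block cache: a read of block b also pulls in
    block b+1, on the bet that a sequential reader will want it next."""

    def __init__(self, fetch, capacity=64):      # capacity >= 2 assumed
        self.fetch = fetch                       # block number -> block data
        self.capacity = capacity
        self.blocks = OrderedDict()              # LRU order, oldest first

    def _install(self, b):
        if b not in self.blocks:
            self.blocks[b] = self.fetch(b)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used

    def read(self, b):
        hit = b in self.blocks
        if hit:
            self.blocks.move_to_end(b)           # refresh LRU position
        else:
            self._install(b)                     # demand fetch
        self._install(b + 1)                     # prefetch the next block
        return self.blocks[b], hit
```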

  16. Parallel diffusion calculation for the PHAETON on-line multiprocessor computer

    International Nuclear Information System (INIS)

    Collart, J.M.; Fedon-Magnaud, C.; Lautard, J.J.

    1987-04-01

    The aim of the PHAETON project is the design of an on-line computer to increase the immediate knowledge of the main operating and safety parameters in power plants. A significant stage is the computation of the three-dimensional flux distribution. For cost and safety reasons, a computer based on a parallel microprocessor architecture has been studied. This paper presents a first approach to parallelized three-dimensional diffusion calculation. The computing software has been written and run on a four-processor demonstrator. The realization of the final equipment, currently in progress, is presented. 8 refs

  17. Parallel structures in human and computer memory

    Science.gov (United States)

    Kanerva, Pentti

    1986-08-01

    If we think of our experiences as being recorded continuously on film, then human memory can be compared to a film library that is indexed by the contents of the film strips stored in it. Moreover, approximate retrieval cues suffice to retrieve information stored in this library: We recognize a familiar person in a fuzzy photograph or a familiar tune played on a strange instrument. This paper is about how to construct a computer memory that would allow a computer to recognize patterns and to recall sequences the way humans do. Such a memory is remarkably similar in structure to a conventional computer memory and also to the neural circuits in the cortex of the cerebellum of the human brain. The paper concludes that the frame problem of artificial intelligence could be solved by the use of such a memory if we were able to encode information about the world properly.
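
    The memory Kanerva describes is his Sparse Distributed Memory. A tiny numpy sketch captures its core operations: a fixed set of random "hard locations", activation of every location within a Hamming radius of the address, counter updates on write, and majority readout on read. The dimensions and radius below are illustrative choices, not Kanerva's.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, R = 256, 2000, 112        # word length, hard locations, radius (toy)

hard = rng.integers(0, 2, size=(M, N), dtype=np.int8)  # fixed random addresses
counters = np.zeros((M, N), dtype=np.int32)

def _active(addr):
    # every hard location within Hamming distance R of the address fires
    return np.count_nonzero(hard != addr, axis=1) <= R

def write(addr, word):
    # increment counters where the stored bit is 1, decrement where it is 0
    counters[_active(addr)] += np.where(word == 1, 1, -1).astype(np.int32)

def read(addr):
    # pool the counters of the active locations and take a majority vote
    return (counters[_active(addr)].sum(axis=0) > 0).astype(np.int8)
```

    Because dozens of locations fire for any address, a noisy retrieval cue still activates mostly the same locations as the original write, which is what gives the approximate-cue recall described above.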

  18. Exact distributions of two-sample rank statistics and block rank statistics using computer algebra

    NARCIS (Netherlands)

    Wiel, van de M.A.

    1998-01-01

    We derive generating functions for various rank statistics and we use computer algebra to compute the exact null distribution of these statistics. We present various techniques for reducing time and memory space used by the computations. We use the results to write Mathematica notebooks for

  19. On the Scalability of Time-predictable Chip-Multiprocessing

    DEFF Research Database (Denmark)

    Puffitsch, Wolfgang; Schoeberl, Martin

    2012-01-01

    Real-time systems need a time-predictable execution platform to be able to determine the worst-case execution time statically. In order to be time-predictable, several advanced processor features, such as out-of-order execution and other forms of speculation, have to be avoided. However, just using...... simple processors is not an option for embedded systems with high demands on computing power. In order to provide high performance and predictability we argue to use multiprocessor systems with a time-predictable memory interface. In this paper we present the scalability of a Java chip......-multiprocessor system that is designed to be time-predictable. Adding time-predictable caches is mandatory to achieve scalability with a shared memory multi-processor system. As Java bytecode retains information about the nature of memory accesses, it is possible to implement a memory hierarchy that takes...

  20. Computational design of RNA parts, devices, and transcripts with kinetic folding algorithms implemented on multiprocessor clusters.

    Science.gov (United States)

    Thimmaiah, Tim; Voje, William E; Carothers, James M

    2015-01-01

    With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.

  1. PSHED: a simplified approach to developing parallel programs

    International Nuclear Information System (INIS)

    Mahajan, S.M.; Ramesh, K.; Rajesh, K.; Somani, A.; Goel, M.

    1992-01-01

    This paper presents a simplified approach, in the form of a tree-structured computational model, for parallel application programs. An attempt is made to provide a standard user interface to execute programs on the BARC Parallel Processing System (BPPS), a scalable distributed memory multiprocessor. The interface package, called PSHED, provides a basic framework for representing and executing parallel programs on different parallel architectures. The PSHED package incorporates concepts from a broad range of previous research in programming environments and parallel computations. (author). 6 refs

  2. Memory systems, computation, and the second law of thermodynamics

    International Nuclear Information System (INIS)

    Wolpert, D.H.

    1992-01-01

    A memory is a physical system for transferring information from one moment in time to another, where that information concerns something external to the system itself. This paper argues on information-theoretic and statistical-mechanical grounds that useful memories must be of one of two types, exemplified by memory in abstract computer programs and by memory in photographs. Photograph-type memories work by exploiting a collapse of state-space flow onto an attractor state. (This attractor state is the "initialized" state of the memory.) The central assumption of the theory of reversible computation tells us that in any such collapse, the entropy of the system must increase. In concert with the second law, this establishes the logical necessity of the empirical observation that photograph-type memories are temporally asymmetric (they can tell us about the past but not about the future). Under the assumption that human memory is a photograph-type memory, this result also explains why we humans can remember only our past and not our future. In contrast to photograph-type memories, computer-type memories do not require any initialization, and therefore are not directly affected by the second law. As a result, computer memories can be of the future as easily as of the past, even if the program running on the computer is logically irreversible. This is entirely in accord with the well-known temporal reversibility of the process of computation. The paper ends by arguing that the asymmetry of the psychological arrow of time is a direct consequence of the asymmetry of human memory. With the rest of this paper, this explains, explicitly and rigorously, why the psychological and thermodynamic arrows of time are correlated with one another. 24 refs

  3. Distributed Optimization System

    Science.gov (United States)

    Hurtado, John E.; Dohrmann, Clark R.; Robinett, III, Rush D.

    2004-11-30

    A search system and method for controlling multiple agents to optimize an objective using distributed sensing and cooperative control. The search agent can be one or more physical agents, such as a robot, and can be software agents for searching cyberspace. The objective can be: chemical sources, temperature sources, radiation sources, light sources, evaders, trespassers, explosive sources, time dependent sources, time independent sources, function surfaces, maximization points, minimization points, and optimal control of a system such as a communication system, an economy, a crane, and a multi-processor computer.

  4. Self-Testing Computer Memory

    Science.gov (United States)

    Chau, Savio, N.; Rennels, David A.

    1988-01-01

    Memory system for computer repeatedly tests itself during brief, regular interruptions of normal processing of data. Detects and corrects transient faults such as single-event upsets (changes in bits due to ionizing radiation) within milliseconds of their occurrence. Self-testing concept surpasses conventional approaches by actively flushing latent defects out of memory and attempting to correct them before they accumulate beyond the capacity for self-correction or detection. Cost of improvement is a modest increase in complexity of circuitry and operating time.
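
    One simple way to picture the active-flushing idea is scrubbing over redundant copies: periodically majority-vote every word and rewrite any disagreeing copy, so a single-copy upset is removed before a second fault can defeat correction. The sketch below uses triplicated storage for clarity; the article's memory uses error-detecting/correcting circuitry rather than this exact scheme.

```python
def scrub(copies):
    """One scrub pass over triplicated memory: majority-vote every word and
    rewrite any disagreeing copy. Assumes at most one faulty copy per word,
    so a majority always exists. Returns the number of corrections made."""
    fixed = 0
    for i in range(len(copies[0])):
        a, b, c = copies[0][i], copies[1][i], copies[2][i]
        winner = a if a in (b, c) else b   # b == c when a is the odd one out
        for copy in copies:
            if copy[i] != winner:
                copy[i] = winner           # flush the latent fault now
                fixed += 1
    return fixed
```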

  5. Supporting Multiprocessors in the Icecap Safety-Critical Java Run-Time Environment

    DEFF Research Database (Denmark)

    Zhao, Shuai; Wellings, Andy; Korsholm, Stephan Erbs

    The current version of the Safety Critical Java (SCJ) specification defines three compliance levels. Level 0 targets single processor programs while Level 1 and 2 can support multiprocessor platforms. Level 1 programs must be fully partitioned but Level 2 programs can also be more globally...... scheduled. As of yet, there is no official Reference Implementation for SCJ. However, the icecap project has produced a Safety-Critical Java Run-time Environment based on the Hardware-near Virtual Machine (HVM). This supports SCJ at all compliance levels and provides an implementation of the safety......-critical Java (javax.safetycritical) package. This is still work-in-progress and lacks certain key features. Among these is the ability to support multiprocessor platforms. In this paper, we explore two possible options to adding multiprocessor support to this environment: the “green thread” and the “native...

  6. Parallel iterative solution of the Hermite Collocation equations on GPUs II

    International Nuclear Information System (INIS)

    Vilanakis, N; Mathioudakis, E

    2014-01-01

    Hermite Collocation is a high-order finite element method for boundary value problems modelling applications in several fields of science and engineering. Application of this integration-free numerical solver to linear BVPs results in a large and sparse general system of algebraic equations, suggesting the use of an efficient iterative solver, especially for realistic simulations. In part I of this work, an efficient parallel algorithm coupling the Schur complement method with the Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver was designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work the proposed algorithm has been extended to high performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a distributed GPU and shared CPU memory parallel architecture, a hybrid memory treatment is needed for the development of the parallel algorithm. The algorithm was realized on an HP SL390 multiprocessor machine with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation

  7. Lifetime-Based Memory Management for Distributed Data Processing Systems

    DEFF Research Database (Denmark)

    Lu, Lu; Shi, Xuanhua; Zhou, Yongluan

    2016-01-01

    In-memory caching of intermediate data and eager combining of data in shuffle buffers have been shown to be very effective in minimizing the re-computation and I/O cost in distributed data processing systems like Spark and Flink. However, it has also been widely reported that these techniques would...... create a large amount of long-living data objects in the heap, which may quickly saturate the garbage collector, especially when handling a large dataset, and hence would limit the scalability of the system. To eliminate this problem, we propose a lifetime-based memory management framework, which...... reduces the garbage collection time by up to 99.9%, achieves up to 22.7x speedup in terms of execution time in cases without data spilling and 41.6x speedup in cases with data spilling, and consumes up to 46.6% less memory.

  8. Processor tradeoffs in distributed real-time systems

    Science.gov (United States)

    Krishna, C. M.; Shin, Kang G.; Bhandari, Inderpal S.

    1987-01-01

    The problem of the optimization of the design of real-time distributed systems is examined with reference to a class of computer architectures similar to the continuously reconfigurable multiprocessor flight control system structure, CM2FCS. Particular attention is given to the impact of processor replacement and the burn-in time on the probability of dynamic failure and mean cost. The solution is obtained numerically and interpreted in the context of real-time applications.

  9. ARTiS, an Asymmetric Real-Time Scheduler for Linux on Multi-Processor Architectures

    OpenAIRE

    Piel, Éric; Marquet, Philippe; Soula, Julien; Osuna, Christophe; Dekeyser, Jean-Luc

    2005-01-01

    The ARTiS system is a real-time extension of the GNU/Linux scheduler dedicated to SMP (Symmetric Multi-Processor) systems. It allows High Performance Computing and real-time processing to be mixed. ARTiS exploits the SMP architecture to guarantee the preemption of a processor when the system has to schedule a real-time task. The implementation is available as a modification of the Linux kernel, especially focusing on (but not restricted to) the IA-64 architecture. The basic idea of ARTiS is to assign a selected se...

  10. Parallel Breadth-First Search on Distributed Memory Systems

    Energy Technology Data Exchange (ETDEWEB)

    Computational Research Division; Buluc, Aydin; Madduri, Kamesh

    2011-04-15

    Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for Breadth-First Search (BFS), a key subroutine in several graph algorithms. We present two highly-tuned parallel approaches for BFS on large parallel systems: a level-synchronous strategy that relies on a simple vertex-based partitioning of the graph, and a two-dimensional sparse matrix-partitioning-based approach that mitigates parallel communication overhead. For both approaches, we also present hybrid versions with intra-node multithreading. Our novel hybrid two-dimensional algorithm reduces communication times by up to a factor of 3.5, relative to a common vertex based approach. Our experimental study identifies execution regimes in which these approaches will be competitive, and we demonstrate extremely high performance on leading distributed-memory parallel systems. For instance, for a 40,000-core parallel execution on Hopper, an AMD Magny-Cours based system, we achieve a BFS performance rate of 17.8 billion edge visits per second on an undirected graph of 4.3 billion vertices and 68.7 billion edges with skewed degree distribution.
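
    The level-synchronous, vertex-partitioned strategy can be sketched compactly with mpi4py: each rank owns the vertices v with v % size == rank, expands its frontier locally, routes discovered edges to their owners with an all-to-all exchange, and stops when a global reduction sees an empty frontier. This toy version (Python objects, no bitmaps or 2-D partitioning) only illustrates the communication pattern, not the tuned implementation.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def bfs_1d(local_adj, root):
    """Level-synchronous BFS with a 1-D vertex partition: vertex v is owned
    by rank v % size; local_adj maps every owned vertex to its neighbours."""
    parent = {v: -1 for v in local_adj}       # -1 marks 'not yet visited'
    frontier = []
    if root % size == rank:
        parent[root] = root
        frontier = [root]
    while True:
        outgoing = [[] for _ in range(size)]  # bucket edges by owning rank
        for u in frontier:
            for v in local_adj[u]:
                outgoing[v % size].append((v, u))
        incoming = comm.alltoall(outgoing)
        frontier = []
        for bucket in incoming:
            for v, u in bucket:
                if parent[v] == -1:           # first discovery wins
                    parent[v] = u
                    frontier.append(v)
        if comm.allreduce(len(frontier), op=MPI.SUM) == 0:
            return parent                     # unreachable vertices keep -1
```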

  11. Large scale particle simulations in a virtual memory computer

    International Nuclear Information System (INIS)

    Gray, P.C.; Million, R.; Wagner, J.S.; Tajima, T.

    1983-01-01

    Virtual memory computers are capable of executing large-scale particle simulations even when the memory requirements exceed the computer core size. The required address space is automatically mapped onto slow disc memory by the operating system. When the simulation size is very large, frequent random accesses to slow memory occur during the charge accumulation and particle pushing processes. Accesses to slow memory significantly reduce the execution rate of the simulation. We demonstrate in this paper that with the proper choice of sorting algorithm, a nominal amount of sorting to keep physically adjacent particles near particles with neighboring array indices can reduce random access to slow memory, increase the efficiency of the I/O system, and hence, reduce the required computing time. (orig.)

  12. Large-scale particle simulations in a virtual-memory computer

    International Nuclear Information System (INIS)

    Gray, P.C.; Wagner, J.S.; Tajima, T.; Million, R.

    1982-08-01

    Virtual memory computers are capable of executing large-scale particle simulations even when the memory requirements exceed the computer core size. The required address space is automatically mapped onto slow disc memory by the operating system. When the simulation size is very large, frequent random accesses to slow memory occur during the charge accumulation and particle pushing processes. Accesses to slow memory significantly reduce the execution rate of the simulation. We demonstrate in this paper that with the proper choice of sorting algorithm, a nominal amount of sorting to keep physically adjacent particles near particles with neighboring array indices can reduce random access to slow memory, increase the efficiency of the I/O system, and hence, reduce the required computing time
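
    The sorting idea reduces to reordering the particle arrays by grid cell every so many time steps, so that particles touching the same grid points sit in nearby (hence same-page) array slots. A numpy sketch, assuming unit-sized cells on an nx-wide grid with coordinates inside the domain:

```python
import numpy as np

def sort_by_cell(x, y, vx, vy, nx):
    """Reorder particle arrays so particles in the same grid cell occupy
    adjacent array slots, cutting random access to slow (paged) memory
    during charge accumulation and particle pushing."""
    cell = np.floor(y).astype(np.int64) * nx + np.floor(x).astype(np.int64)
    order = np.argsort(cell, kind='stable')   # stable keeps prior locality
    return x[order], y[order], vx[order], vy[order]
```

    Because particles move only a little per step, an occasional re-sort keeps the ordering "nominally" correct, which is exactly the trade-off the abstract describes.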

  13. Total recall in distributive associative memories

    Science.gov (United States)

    Danforth, Douglas G.

    1991-01-01

    Iterative error correction of asymptotically large associative memories is equivalent to a one-step learning rule. This rule is the inverse of the activation function of the memory. Spectral representations of nonlinear activation functions are used to obtain the inverse in closed form for Sparse Distributed Memory, Selected-Coordinate Design, and Radial Basis Functions.

  14. Reproducibility in a multiprocessor system

    Science.gov (United States)

    Bellofatto, Ralph A; Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Gooding, Thomas M; Haring, Rudolf A; Heidelberger, Philip; Kopcsay, Gerard V; Liebsch, Thomas A; Ohmacht, Martin; Reed, Don D; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

    2013-11-26

    Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.

  15. Lower bounds for the head-body-tail problem on parallel machines: a computational study for the multiprocessor flow shop

    NARCIS (Netherlands)

    A. Vandevelde; J.A. Hoogeveen; C.A.J. Hurkens (Cor); J.K. Lenstra (Jan Karel)

    2005-01-01

    The multiprocessor flow-shop is the generalization of the flow-shop in which each machine is replaced by a set of identical machines. As finding a minimum-length schedule is NP-hard, we set out to find good lower and upper bounds. The lower bounds are based on relaxation of the

  16. Ring-array processor distribution topology for optical interconnects

    Science.gov (United States)

    Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

    1992-01-01

    The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.

  17. Distributed plant simulator with two-phase flow analysis code using drift-flux non-equilibrium model for pressurized water reactors

    International Nuclear Information System (INIS)

    Yamamoto, Takaya; Kitamura, Masashi; Ohi, Tadashi; Akagi, Katsumi

    1999-01-01

    As advanced monitoring and control systems, such as the advanced main control console and the operator support system, have been developed, the simulation accuracy of real-time simulators must be improved and their simulation limits extended. The authors have therefore developed a distributed simulation system that achieves high processing performance on low-cost hardware. Moreover, the authors have developed a thermal-hydraulic computer code using a drift-flux non-equilibrium model, which realizes high-precision two-phase flow analysis, considered to have the same prediction capability as two-fluid models, while achieving the speed and stability needed for real-time simulators. The distributed plant simulator for PWR plants was realized as a result. The distributed simulator consists of multiple processors connected to each other by an optical fiber network. Control software for synchronized scheduling and memory transfer was also developed. The simulation results of the four-loop PWR simulator are compared with experimental data and real plant data; the agreement is satisfactory for a plant simulator. The simulation speed is also satisfactory, being twice as fast as real time. (author)

  18. A Transparent Runtime Data Distribution Engine for OpenMP

    Directory of Open Access Journals (Sweden)

    Dimitrios S. Nikolopoulos

    2000-01-01

    This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that due to the low remote-to-local memory access latency ratio of contemporary NUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution, incur modest performance losses. Second, the paper presents a transparent, user-level page migration engine with an ability to gain back any performance loss that stems from suboptimal placement of pages in iterative OpenMP programs. The main body of the paper describes how our OpenMP runtime environment uses page migration for implementing implicit data distribution and redistribution schemes without programmer intervention. Our experimental results verify the effectiveness of the proposed framework and provide a proof of concept that it is not necessary to introduce data distribution directives in OpenMP and warrant the simplicity or the portability of the programming model.

  19. Computing betweenness centrality in external memory

    DEFF Research Database (Denmark)

    Arge, Lars; Goodrich, Michael T.; Walderveen, Freek van

    2013-01-01

    Betweenness centrality is one of the most well-known measures of the importance of nodes in a social-network graph. In this paper we describe the first known external-memory and cache-oblivious algorithms for computing betweenness centrality. We present four different external-memory algorithms...
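
    All of these algorithms build on Brandes' betweenness centrality computation. For reference, a compact in-memory version for unweighted graphs is sketched below; the paper's contribution is performing this computation in external memory, which the sketch makes no attempt at.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted graph given as {v: [neighbours]}.
    For undirected graphs, halve the returned values."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, q = [], deque([s])
        while q:                              # BFS from s, counting paths
            u = q.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
                if dist[w] == dist[u] + 1:    # u is on a shortest path to w
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):             # accumulate dependencies
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```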

  20. Human Memory Organization for Computer Programs.

    Science.gov (United States)

    Norcio, A. F.; Kerst, Stephen M.

    1983-01-01

    Results of study investigating human memory organization in processing of computer programming languages indicate that algorithmic logic segments form a cognitive organizational structure in memory for programs. Statement indentation and internal program documentation did not enhance organizational process of recall of statements in five Fortran…

  1. Optical interconnection networks for high-performance computing systems

    International Nuclear Information System (INIS)

    Biberman, Aleksandr; Bergman, Keren

    2012-01-01

    Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers. (review article)

  2. Lower bounds for the head-body-tail problem on parallel machines : a computational study of the multiprocessor flow shop

    NARCIS (Netherlands)

    Vandevelde, A.; Hoogeveen, J.A.; Hurkens, C.A.J.; Lenstra, J.K.

    2005-01-01

    The multiprocessor flow-shop is the generalization of the flow-shop in which each machine is replaced by a set of identical machines. As finding a minimum-length schedule is NP-hard, we set out to find good lower and upper bounds. The lower bounds are based on relaxation of the capacities of all

  3. Overview of the Scalable Coherent Interface, IEEE STD 1596 (SCI)

    International Nuclear Information System (INIS)

    Gustavson, D.B.; James, D.V.; Wiggers, H.A.

    1992-10-01

    The Scalable Coherent Interface standard defines a new generation of interconnection that spans the full range from supercomputer memory 'bus' to campus-wide network. SCI provides bus-like services and a shared-memory software model while using an underlying packet protocol on many independent communication links. Initially these links are 1 GByte/s (wires) and 1 GBit/s (fiber), but the protocol scales well to future faster or lower-cost technologies. The interconnect may use switches, meshes, and rings. The SCI distributed-shared-memory model is simple and versatile, enabling for the first time a smooth integration of highly parallel multiprocessors, workstations, personal computers, I/O, networking and data acquisition

  4. The FORCE: A portable parallel programming language supporting computational structural mechanics

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Brehm, Juergen; Ramanan, Aruna

    1989-01-01

    This project supports the conversion of codes in Computational Structural Mechanics (CSM) to a parallel form which will efficiently exploit the computational power available from multiprocessors. The work is part of a comprehensive, FORTRAN-based system to form a basis for a parallel version of the NICE/SPAR combination which will form the CSM Testbed. The software is macro-based and rests on the Force methodology developed by the principal investigator in connection with an early scientific multiprocessor. Machine independence is an important characteristic of the system so that retargeting it to the Flex/32, or any other multiprocessor on which NICE/SPAR might be implemented, is well supported. The principal investigator has experience in producing parallel software for both full and sparse systems of linear equations using the Force macros. Other researchers have used the Force in finite element programs. It has been possible to rapidly develop software which performs at maximum efficiency on a multiprocessor. The inherent machine independence of the system also means that the parallelization will not be limited to a specific multiprocessor.

  5. Multi-processor network implementations in Multibus II and VME

    International Nuclear Information System (INIS)

    Briegel, C.

    1992-01-01

    ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)

  6. Interoperable mesh components for large-scale, distributed-memory simulations

    International Nuclear Information System (INIS)

    Devine, K; Leung, V; Diachin, L; Miller, M

    2009-01-01

    SciDAC applications have a demonstrated need for advanced software tools to manage the complexities associated with sophisticated geometry, mesh, and field manipulation tasks, particularly as computer architectures move toward the petascale. In this paper, we describe a software component - an abstract data model and programming interface - designed to provide support for parallel unstructured mesh operations. We describe key issues that must be addressed to successfully provide high-performance, distributed-memory unstructured mesh services and highlight some recent research accomplishments in developing new load balancing and MPI-based communication libraries appropriate for leadership class computing. Finally, we give examples of the use of parallel adaptive mesh modification in two SciDAC applications.

  7. Operating System for Runtime Reconfigurable Multiprocessor Systems

    Directory of Open Access Journals (Sweden)

    Diana Göhringer

    2011-01-01

    Operating systems traditionally handle the task scheduling of one or more application instances on processor-like hardware architectures. RAMPSoC, a novel runtime adaptive multiprocessor System-on-Chip, exploits dynamic reconfiguration on FPGAs to generate, start and terminate hardware and software tasks. The hardware tasks have to be transferred to the reconfigurable hardware via a configuration access port. The software tasks can be loaded into the local memory of the respective IP core either via the configuration access port or via the on-chip communication infrastructure (e.g., a Network-on-Chip). Recent series of Xilinx FPGAs, such as the Virtex-5, provide two Internal Configuration Access Ports, which cannot be accessed simultaneously. To prevent conflicts, the access to these ports as well as the hardware resource management needs to be controlled, e.g. by a special-purpose operating system running on an embedded processor. For that purpose, and to handle the relations between temporally and spatially scheduled operations, the novel approach of an operating system is of high importance. This special-purpose operating system, called CAP-OS (Configuration Access Port Operating System), which will be presented in this paper, supports the clients using the configuration port with the services of priority-based access scheduling, hardware task mapping and resource management.

  8. Contributing to the design of run-time systems dedicated to high performance computing

    International Nuclear Information System (INIS)

    Perache, M.

    2006-10-01

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high-level paradigms which are optimized for the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  9. Dynamic computing random access memory

    International Nuclear Information System (INIS)

    Traversa, F L; Bonani, F; Pershin, Y V; Di Ventra, M

    2014-01-01

    The present von Neumann computing paradigm involves a significant amount of information transfer between a central processing unit and memory, with concomitant limitations in the actual execution speed. However, it has been recently argued that a different form of computation, dubbed memcomputing (Di Ventra and Pershin 2013 Nat. Phys. 9 200–2) and inspired by the operation of our brain, can resolve the intrinsic limitations of present day architectures by allowing for computing and storing of information on the same physical platform. Here we show a simple and practical realization of memcomputing that utilizes easy-to-build memcapacitive systems. We name this architecture dynamic computing random access memory (DCRAM). We show that DCRAM provides massively-parallel and polymorphic digital logic, namely it allows for different logic operations with the same architecture, by varying only the control signals. In addition, by taking into account realistic parameters, its energy expenditures can be as low as a few fJ per operation. DCRAM is fully compatible with CMOS technology, can be realized with current fabrication facilities, and therefore can really serve as an alternative to the present computing technology. (paper)

  10. A QDWH-Based SVD Software Framework on Distributed-Memory Manycore Systems

    KAUST Repository

    Sukkari, Dalal

    2017-01-01

    This paper presents a high performance software framework for computing a dense SVD on distributed-memory manycore systems. Originally introduced by Nakatsukasa et al. (Nakatsukasa et al. 2010; Nakatsukasa and Higham 2013), the SVD solver relies on the polar decomposition using the QR Dynamically-Weighted Halley algorithm (QDWH). Although the QDWH-based SVD algorithm performs a significant amount of extra floating-point operations compared to the traditional SVD with the one-stage bidiagonal reduction, the inherent high level of concurrency associated with Level 3 BLAS compute-bound kernels ultimately compensates for the arithmetic complexity overhead. Using the ScaLAPACK two-dimensional block cyclic data distribution with a rectangular processor topology, the resulting QDWH-SVD further reduces excessive communications during the panel factorization, while increasing the degree of parallelism during the update of the trailing submatrix, as opposed to relying on the default square processor grid. After detailing the algorithmic complexity and the memory footprint of the algorithm, we conduct a thorough performance analysis and study the impact of the grid topology on the performance by looking at the communication and computation profiling trade-offs. We report performance results against state-of-the-art existing QDWH software implementations (e.g., Elemental) and their SVD extensions on large-scale distributed-memory manycore systems based on commodity Intel x86 Haswell processors and the Knights Landing (KNL) architecture. The QDWH-SVD framework achieves speedups of up to 3-fold and 8-fold on the Haswell- and KNL-based platforms, respectively, against ScaLAPACK PDGESVD, and turns out to be a competitive alternative for both well- and ill-conditioned matrices. We finally derive a performance model from these empirical results. Our QDWH-based polar decomposition and its SVD extension are freely available at https://github.com/ecrc/qdwh.git and https
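
    The overall structure, an SVD obtained through the polar decomposition, fits in a few lines of numpy/scipy. The sketch below uses scipy's polar routine as a stand-in for the QDWH iteration and assumes a full-rank square or tall matrix; the paper's contribution is the distributed-memory, communication-optimized realization of these same steps.

```python
import numpy as np
from scipy.linalg import polar

def svd_via_polar(A):
    # Step 1: polar decomposition A = Up @ P (QDWH computes this factor
    # with QR-based dynamically weighted Halley iterations).
    Up, P = polar(A)
    # Step 2: symmetric eigendecomposition of the Hermitian factor,
    # P = V @ diag(s) @ V.T, where s are the singular values of A.
    s, V = np.linalg.eigh(P)
    s, V = s[::-1], V[:, ::-1]     # descending order, SVD convention
    return Up @ V, s, V.T          # A = (Up @ V) @ diag(s) @ V.T
```

    For a full-rank test matrix, np.allclose(A, U @ np.diag(s) @ Vt) holds up to rounding, which is a quick way to sanity-check the factorization.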

  11. Computationally efficient implementation of combustion chemistry in parallel PDF calculations

    International Nuclear Information System (INIS)

    Lu Liuyan; Lantz, Steven R.; Ren Zhuyin; Pope, Stephen B.

    2009-01-01

    In parallel calculations of combustion processes with realistic chemistry, the serial in situ adaptive tabulation (ISAT) algorithm [S.B. Pope, Computationally efficient implementation of combustion chemistry using in situ adaptive tabulation, Combustion Theory and Modelling, 1 (1997) 41-63; L. Lu, S.B. Pope, An improved algorithm for in situ adaptive tabulation, Journal of Computational Physics 228 (2009) 361-386] substantially speeds up the chemistry calculations on each processor. To improve the parallel efficiency of large ensembles of such calculations in parallel computations, in this work, the ISAT algorithm is extended to the multi-processor environment, with the aim of minimizing the wall clock time required for the whole ensemble. Parallel ISAT strategies are developed by combining the existing serial ISAT algorithm with different distribution strategies, namely purely local processing (PLP), uniformly random distribution (URAN), and preferential distribution (PREF). The distribution strategies enable the queued load redistribution of chemistry calculations among processors using message passing. They are implemented in the software x2f_mpi, which is a Fortran 95 library for facilitating many parallel evaluations of a general vector function. The relative performance of the parallel ISAT strategies is investigated in different computational regimes via the PDF calculations of multiple partially stirred reactors burning methane/air mixtures. The results show that the performance of ISAT with a fixed distribution strategy strongly depends on certain computational regimes, based on how much memory is available and how much overlap exists between tabulated information on different processors. No one fixed strategy consistently achieves good performance in all the regimes. Therefore, an adaptive distribution strategy, which blends PLP, URAN and PREF, is devised and implemented. It yields consistently good performance in all regimes. In the adaptive parallel
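
    Of the fixed strategies, URAN is the simplest to sketch: every rank scatters its chemistry evaluations to uniformly random ranks, computes what it receives, and routes each result back to its owner. The mpi4py toy below is an illustration of that pattern, not the x2f_mpi implementation; the `evaluate` callback stands in for an ISAT-accelerated reaction mapping.

```python
from mpi4py import MPI
import random

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
rng = random.Random(1234 + rank)           # independent stream per rank

def uran_exchange(local_tasks, evaluate):
    """Scatter this rank's tasks to uniformly random ranks, evaluate what
    arrives, and route every result back to the task's owner."""
    out = [[] for _ in range(size)]
    for i, task in enumerate(local_tasks):
        out[rng.randrange(size)].append((rank, i, task))
    incoming = comm.alltoall(out)          # tasks, tagged with owner + index
    back = [[] for _ in range(size)]
    for bucket in incoming:
        for owner, i, task in bucket:
            back[owner].append((i, evaluate(task)))
    returned = comm.alltoall(back)         # results go home
    results = [None] * len(local_tasks)
    for bucket in returned:
        for i, res in bucket:
            results[i] = res
    return results
```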

  12. Efficient process migration in the EMPS multiprocessor system

    NARCIS (Netherlands)

    van Dijk, G.J.W.; Gils, van M.J.

    1992-01-01

    The process migration facility in the Eindhoven multiprocessor system (EMPS) is presented. In the EMPS system, mailboxes are used for interprocess communication. These mailboxes provide transparency of location for communicating processes. The major advantages of mailbox communication in the EMPS

  13. Distributed optimization system and method

    Science.gov (United States)

    Hurtado, John E.; Dohrmann, Clark R.; Robinett, III, Rush D.

    2003-06-10

    A search system and method for controlling multiple agents to optimize an objective using distributed sensing and cooperative control. The search agent can be one or more physical agents, such as a robot, and can be software agents for searching cyberspace. The objective can be: chemical sources, temperature sources, radiation sources, light sources, evaders, trespassers, explosive sources, time dependent sources, time independent sources, function surfaces, maximization points, minimization points, and optimal control of a system such as a communication system, an economy, a crane, and a multi-processor computer.

  14. 3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

    Science.gov (United States)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.

  15. Monte Carlo photon transport on shared memory and distributed memory parallel processors

    International Nuclear Information System (INIS)

    Martin, W.R.; Wan, T.C.; Abdel-Rahman, T.S.; Mudge, T.N.; Miura, K.

    1987-01-01

    Parallelized Monte Carlo algorithms for analyzing photon transport in an inertially confined fusion (ICF) plasma are considered. Algorithms were developed for shared memory (vector and scalar) and distributed memory (scalar) parallel processors. The shared memory algorithm was implemented on the IBM 3090/400, and timing results are presented for dedicated runs with two, three, and four processors. Two alternative distributed memory algorithms (replication and dispatching) were implemented on a hypercube parallel processor (1 through 64 nodes). The replication algorithm yields essentially full efficiency for all cube sizes; with the 64-node configuration, the absolute performance is nearly the same as with the CRAY X-MP. The dispatching algorithm also yields efficiencies above 80% in a large simulation for the 64-processor configuration
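
    The replication algorithm is the classic embarrassingly parallel Monte Carlo pattern: every node holds the full problem, runs an independent stream of histories, and the tallies are summed at the end, which is why it scales with near-full efficiency. A minimal mpi4py sketch with placeholder physics:

```python
from mpi4py import MPI
import random

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def run_histories(n, seed):
    """Placeholder for n photon histories; returns a scalar tally."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n))   # stand-in physics

total = 1_000_000
local_tally = run_histories(total // size, seed=1234 + rank)
grand_tally = comm.reduce(local_tally, op=MPI.SUM, root=0)
if rank == 0:
    print("mean score per history:", grand_tally / total)
```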

  16. Reduction of Used Memory Ensemble Kalman Filtering (RumEnKF): A data assimilation scheme for memory intensive, high performance computing

    Science.gov (United States)

    Hut, Rolf; Amisigo, Barnabas A.; Steele-Dunne, Susan; van de Giesen, Nick

    2015-12-01

    Reduction of Used Memory Ensemble Kalman Filtering (RumEnKF) is introduced as a variant on the Ensemble Kalman Filter (EnKF). RumEnKF differs from EnKF in that it does not store the entire ensemble, but rather only saves the first two moments of the ensemble distribution. In this way, the number of ensemble members that can be calculated is less dependent on available memory, and mainly on available computing power (CPU). RumEnKF is developed to make optimal use of current generation super computer architecture, where the number of available floating point operations (flops) increases more rapidly than the available memory and where inter-node communication can quickly become a bottleneck. RumEnKF reduces the used memory compared to the EnKF when the number of ensemble members is greater than half the number of state variables. In this paper, three simple models are used (auto-regressive, low dimensional Lorenz and high dimensional Lorenz) to show that RumEnKF performs similarly to the EnKF. Furthermore, it is also shown that increasing the ensemble size has a similar impact on the estimation error from the three algorithms.
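
    The heart of RumEnKF, keeping only the first two moments rather than the whole ensemble, amounts to a streaming mean/covariance update, so each ensemble member can be generated, folded in, and discarded. A numpy sketch using a Welford-style recurrence (an illustrative reading of the idea, not the authors' code):

```python
import numpy as np

class RunningMoments:
    """Streaming mean and covariance of an ensemble, so members can be
    generated, folded in, and discarded one at a time."""

    def __init__(self, nstate):
        self.n = 0
        self.mean = np.zeros(nstate)
        self.m2 = np.zeros((nstate, nstate))

    def add(self, x):
        self.n += 1
        d = x - self.mean
        self.mean += d / self.n
        self.m2 += np.outer(d, x - self.mean)   # Welford-style update

    def cov(self):
        return self.m2 / (self.n - 1)           # needs at least two members
```

    Memory use is O(nstate^2) regardless of ensemble size, which is the trade the abstract describes: member count becomes limited mainly by CPU rather than by memory.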

  17. Distributed 3-D iterative reconstruction for quantitative SPECT

    International Nuclear Information System (INIS)

    Ju, Z.W.; Frey, E.C.; Tsui, B.M.W.

    1995-01-01

    The authors describe a distributed three-dimensional (3-D) iterative reconstruction library for quantitative single-photon emission computed tomography (SPECT). This library includes 3-D projector-backprojector pairs (PBPs) and distributed 3-D iterative reconstruction algorithms. The 3-D PBPs accurately and efficiently model various combinations of the image degrading factors including attenuation, detector response and scatter response. These PBPs were validated by comparing projection data computed using the projectors with that from direct Monte Carlo (MC) simulations. The distributed 3-D iterative algorithms spread the projection-backprojection operations for all the projection angles over a heterogeneous network of single or multi-processor computers to reduce the reconstruction time. Based on a master/slave paradigm, these distributed algorithms provide dynamic load balancing and fault tolerance. The distributed algorithms were verified by comparing images reconstructed using both the distributed and non-distributed algorithms. Computation times for distributed 3-D reconstructions running on up to 4 identical processors were reduced by a factor of approximately 80–90% times the number of participating processors, compared to those for non-distributed 3-D reconstructions running on a single processor. When combined with faster affordable computers, this library provides an efficient means for implementing accurate reconstruction and compensation methods to improve quality and quantitative accuracy in SPECT images

  18. Cache-aware network-on-chip for chip multiprocessors

    Science.gov (United States)

    Tatas, Konstantinos; Kyriacou, Costas; Dekoulis, George; Demetriou, Demetris; Avraam, Costas; Christou, Anastasia

    2009-05-01

    This paper presents the hardware prototype of a Network-on-Chip (NoC) for a chip multiprocessor that provides support for cache coherence, cache prefetching and cache-aware thread scheduling. A NoC with support to these cache related mechanisms can assist in improving systems performance by reducing the cache miss ratio. The presented multi-core system employs the Data-Driven Multithreading (DDM) model of execution. In DDM thread scheduling is done according to data availability, thus the system is aware of the threads to be executed in the near future. This characteristic of the DDM model allows for cache aware thread scheduling and cache prefetching. The NoC prototype is a crossbar switch with output buffering that can support a cache-aware 4-node chip multiprocessor. The prototype is built on the Xilinx ML506 board equipped with a Xilinx Virtex-5 FPGA.

  19. Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks

    Directory of Open Access Journals (Sweden)

    Vishal Aslot

    2003-01-01

    Full Text Available The state of modern computer systems has evolved to allow easy access to multiprocessor systems by supporting multiple processors on a single physical package. As the multiprocessor hardware evolves, new ways of programming it are also developed. Some inventions may merely be adopting and standardizing the older paradigms. One such evolving standard for programming shared-memory parallel computers is the OpenMP API. The Standard Performance Evaluation Corporation (SPEC) has created a suite of parallel programs called SPEC OMP to compare and evaluate modern shared-memory multiprocessor systems using the OpenMP standard. We have studied these benchmarks in detail to understand their performance on a modern architecture. In this paper, we present detailed measurements of the benchmarks. We organize, summarize, and display our measurements using a Quantitative Model. We present a detailed discussion and derivation of the model. Also, we discuss the important loops in the SPEC OMPM2001 benchmarks and the reasons for less than ideal speedup on our platform.

  20. Simulation of radiation effects on three-dimensional computer optical memories

    Science.gov (United States)

    Moscovitch, M.; Emfietzoglou, D.

    1997-01-01

    A model was developed to simulate the effects of heavy charged-particle (HCP) radiation on the information stored in three-dimensional computer optical memories. The model is based on (i) the HCP track radial dose distribution, (ii) the spatial and temporal distribution of temperature in the track, (iii) the matrix-specific radiation-induced changes that will affect the response, and (iv) the kinetics of transition of photochromic molecules from the colored to the colorless isomeric form (bit flip). It is shown that information stored in a volume of several nanometers radius around the particle's track axis may be lost. The magnitude of the effect is dependent on the particle's track structure.

  1. Distributed multiscale computing

    NARCIS (Netherlands)

    Borgdorff, J.

    2014-01-01

    Multiscale models combine knowledge, data, and hypotheses from different scales. Simulating a multiscale model often requires extensive computation. This thesis evaluates distributing these computations, an approach termed distributed multiscale computing (DMC). First, the process of multiscale

  2. Intelligent discrete particle swarm optimization for multiprocessor task scheduling problem

    Directory of Open Access Journals (Sweden)

    S Sarathambekai

    2017-03-01

    Full Text Available Discrete particle swarm optimization is one of the most recently developed population-based meta-heuristic optimization algorithms in swarm intelligence and can be used in any discrete optimization problem. This article presents a discrete particle swarm optimization algorithm to efficiently schedule tasks in heterogeneous multiprocessor systems. All optimization algorithms share a common algorithmic step, namely population initialization. It plays a significant role because it can affect the convergence speed and the quality of the final solution. Random initialization is the method most commonly used in evolutionary algorithms to generate the initial population. Good-quality initial solutions can help the algorithm locate the optimal solution, whereas poor ones may prevent it from finding the optimal solution at all. Intelligence should therefore be incorporated into generating the initial population in order to avoid premature convergence. This article presents a discrete particle swarm optimization algorithm which incorporates an opposition-based technique to generate the initial population and a greedy algorithm to balance the load of the processors. Makespan, flow time, and reliability cost are three different measures used to evaluate the efficiency of the proposed discrete particle swarm optimization algorithm for scheduling independent tasks in distributed systems. Computational simulations are done based on a set of benchmark instances to assess the performance of the proposed algorithm.
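
    Opposition-based initialization for this scheduling problem can be illustrated as follows (a Python/NumPy sketch under assumed conventions, not the authors' code: a particle maps tasks to processors, the "opposite" of processor p is n_procs-1-p, and fitness is the makespan):

        import numpy as np

        def opposition_init(n_pop, n_tasks, n_procs, cost, rng):
            """Generate a random population of task->processor assignments plus
            their 'opposites', then keep the best half by makespan."""
            pop = rng.integers(0, n_procs, size=(n_pop, n_tasks))
            opp = (n_procs - 1) - pop          # opposite assignment of each particle
            both = np.vstack([pop, opp])

            def makespan(assign):
                loads = np.zeros(n_procs)
                for t, p in enumerate(assign):
                    loads[p] += cost[t, p]
                return loads.max()

            fitness = np.array([makespan(a) for a in both])
            best = np.argsort(fitness)[:n_pop]  # smaller makespan is better
            return both[best]

        rng = np.random.default_rng(1)
        cost = rng.uniform(1, 10, size=(20, 4))   # task x processor execution times
        init_pop = opposition_init(n_pop=10, n_tasks=20, n_procs=4, cost=cost, rng=rng)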

  3. Static Memory Deduplication for Performance Optimization in Cloud Computing

    Directory of Open Access Journals (Sweden)

    Gangyong Jia

    2017-04-01

    Full Text Available In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand for memory capacity and a subsequent increase in energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques, which reduce memory demand through page sharing, are being adopted. However, such techniques suffer from overheads in terms of the number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirements and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirements and improves performance. We demonstrate that, compared to other approaches, the cost in terms of response time is negligible.
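
    The offline page-detection idea can be sketched as follows (illustrative Python, not the SMD implementation; the page size, the use of hashing, and the data structures are assumptions): hash each page of every VM's code segment and map identical pages to a single shared copy.

        import hashlib

        PAGE = 4096  # assumed page size in bytes

        def dedup_code_pages(segments):
            """Offline pass: hash each page of every VM's code segment and
            map identical pages to a single shared copy."""
            shared = {}          # content hash -> canonical page
            page_table = []      # (vm, page index) -> hash, the shared mapping
            saved = 0
            for vm, code in enumerate(segments):
                for i in range(0, len(code), PAGE):
                    page = code[i:i + PAGE]
                    h = hashlib.sha256(page).hexdigest()
                    if h in shared:
                        saved += len(page)   # duplicate: reuse existing copy
                    else:
                        shared[h] = page
                    page_table.append(((vm, i // PAGE), h))
            return shared, page_table, saved

        # Two VMs running the same binary: duplicate pages are shared both
        # within and across VMs.
        code = bytes(range(256)) * 64        # 16 KiB fake code segment
        _, _, saved = dedup_code_pages([code, code])
        print(f"bytes saved: {saved}")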

  4. Static Memory Deduplication for Performance Optimization in Cloud Computing.

    Science.gov (United States)

    Jia, Gangyong; Han, Guangjie; Wang, Hao; Yang, Xuan

    2017-04-27

    In a cloud computing environment, the number of virtual machines (VMs) on a single physical server and the number of applications running on each VM are continuously growing. This has led to an enormous increase in the demand for memory capacity and a subsequent increase in energy consumption in the cloud. Lack of enough memory has become a major bottleneck for scalability and performance of virtualization interfaces in cloud computing. To address this problem, memory deduplication techniques, which reduce memory demand through page sharing, are being adopted. However, such techniques suffer from overheads in terms of the number of online comparisons required for the memory deduplication. In this paper, we propose a static memory deduplication (SMD) technique which can reduce memory capacity requirements and provide performance optimization in cloud computing. The main innovation of SMD is that the process of page detection is performed offline, thus potentially reducing the performance cost, especially in terms of response time. In SMD, page comparisons are restricted to the code segment, which has the highest shared content. Our experimental results show that SMD efficiently reduces memory capacity requirements and improves performance. We demonstrate that, compared to other approaches, the cost in terms of response time is negligible.

  5. Distributed learning enhances relational memory consolidation.

    Science.gov (United States)

    Litman, Leib; Davachi, Lila

    2008-09-01

    It has long been known that distributed learning (DL) provides a mnemonic advantage over massed learning (ML). However, the underlying mechanisms that drive this robust mnemonic effect remain largely unknown. In two experiments, we show that DL across a 24 hr interval does not enhance immediate memory performance but instead slows the rate of forgetting relative to ML. Furthermore, we demonstrate that this savings in forgetting is specific to relational, but not item, memory. In the context of extant theories and knowledge of memory consolidation, these results suggest that an important mechanism underlying the mnemonic benefit of DL is enhanced memory consolidation. We speculate that synaptic strengthening mechanisms supporting long-term memory consolidation may be differentially mediated by the spacing of memory reactivation. These findings have broad implications for the scientific study of episodic memory consolidation and, more generally, for educational curriculum development and policy.

  6. Chip-Multiprocessor Hardware Locks for Safety-Critical Java

    DEFF Research Database (Denmark)

    Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

    2013-01-01

    and may void a task set's schedulability. In this paper we present a hardware locking mechanism to reduce the synchronization overhead. The solution is implemented for the chip-multiprocessor version of the Java Optimized Processor in the context of safety-critical Java. The implementation is compared...

  7. Multi-processor data acquisition and monitoring systems for particle physics

    International Nuclear Information System (INIS)

    White, V.; Burch, B.; Eng, K.; Heinicke, P.; Pyatetsky, M.; Ritchie, D.

    1983-01-01

    A high speed distributed processing system, using PDP-11 and VAX processors, is being developed at Fermilab. The acquisition of data is done using one or more PDP-11s. Additional processors are connected to provide either data logging or extra data analysis capabilities. Within this framework, functional interchangeability of PDP-11 and VAX processors and of the PDP-11 operating systems, RT-11 and RSX-11M, has been maintained. Inter-processor connections have been implemented in a general way using the 5 megabit DR11-W hardware currently selected for the purpose. Using this approach, the authors have been able to make use of several existing data acquisition and analysis packages, such as RT/MULTI, in a multi-processor system.

  8. A distributed-memory hierarchical solver for general sparse linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Chen, Chao [Stanford Univ., CA (United States). Inst. for Computational and Mathematical Engineering; Pouransari, Hadi [Stanford Univ., CA (United States). Dept. of Mechanical Engineering; Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research; Boman, Erik G. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States). Center for Computing Research; Darve, Eric [Stanford Univ., CA (United States). Inst. for Computational and Mathematical Engineering and Dept. of Mechanical Engineering

    2017-12-20

    We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by every processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.
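
    A minimal sketch of the key ingredient, compressing a fill-in block by its numerical low rank (Python/NumPy; a truncated SVD stands in for whatever low-rank approximation the solver actually uses, and the tolerance plays the role of the direct-solver versus preconditioner knob):

        import numpy as np

        def compress_block(B, tol=1e-8):
            """Replace an off-diagonal (fill-in) block by a truncated-SVD
            low-rank factorization B ~ U @ V, keeping singular values above tol."""
            U, s, Vt = np.linalg.svd(B, full_matrices=False)
            k = int(np.sum(s > tol * s[0]))    # numerical rank at relative tol
            return U[:, :k] * s[:k], Vt[:k]    # store 2*n*k numbers, not n*n

        # A smooth interaction block between well-separated clusters is
        # numerically low-rank.
        x = np.linspace(0, 1, 200)
        B = 1.0 / (2.0 + np.subtract.outer(x, x))
        Uk, Vk = compress_block(B, tol=1e-10)
        print(Uk.shape[1], np.linalg.norm(B - Uk @ Vk) / np.linalg.norm(B))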

  9. Resistive content addressable memory based in-memory computation architecture

    KAUST Repository

    Salama, Khaled N.; Zidan, Mohammed A.; Kurdahi, Fadi; Eltawil, Ahmed M.

    2016-01-01

    Various examples are provided related to resistive content addressable memory (RCAM) based in-memory computation architectures. In one example, a system includes a content addressable memory (CAM) including an array of cells having a memristor based crossbar and an interconnection switch matrix having a gateless memristor array, which is coupled to an output of the CAM. In another example, a method includes comparing activated bit values stored in a key register with corresponding bit values in a row of a CAM, setting a tag bit value to indicate that the activated bit values match the corresponding bit values, and writing masked key bit values to corresponding bit locations in the row of the CAM based on the tag bit value.
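
    A software sketch of the masked compare-and-write cycle (illustrative Python/NumPy; this models the behaviour described above, not the memristor hardware):

        import numpy as np

        def cam_search(cam, key, mask):
            """Compare the activated (mask=1) bits of the key against every CAM
            row in one logical step; the tag bit of each matching row is set."""
            return np.all((cam == key) | (mask == 0), axis=1).astype(int)

        cam = np.array([[1, 0, 1, 1],
                        [1, 1, 0, 0],
                        [1, 0, 0, 1]])
        key  = np.array([1, 0, 1, 0])
        mask = np.array([1, 1, 0, 0])        # only the two leftmost bits compare
        tags = cam_search(cam, key, mask)    # -> [1, 0, 1]

        # Conditional write: masked key bits go into the rows whose tag is set.
        write_mask = np.array([0, 0, 1, 1])
        cam[tags == 1] = np.where(write_mask == 1, key, cam[tags == 1])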

  10. Resistive content addressable memory based in-memory computation architecture

    KAUST Repository

    Salama, Khaled N.

    2016-12-08

    Various examples are provided related to resistive content addressable memory (RCAM) based in-memory computation architectures. In one example, a system includes a content addressable memory (CAM) including an array of cells having a memristor based crossbar and an interconnection switch matrix having a gateless memristor array, which is coupled to an output of the CAM. In another example, a method includes comparing activated bit values stored in a key register with corresponding bit values in a row of a CAM, setting a tag bit value to indicate that the activated bit values match the corresponding bit values, and writing masked key bit values to corresponding bit locations in the row of the CAM based on the tag bit value.

  11. The fast Amsterdam multiprocessor (FAMP) system hardware

    International Nuclear Information System (INIS)

    Hertzberger, L.O.; Kieft, G.; Kisielewski, B.; Wiggers, L.W.; Engster, C.; Koningsveld, L. van

    1981-01-01

    The architecture of a multiprocessor system is described that will be used for on-line filter and second-stage trigger applications. The system is based on the MC 68000 microprocessor from Motorola. Emphasis is placed on hardware aspects, in particular the modularity, processor communication and interfacing, whereas the system software and the applications will be described in separate articles. (orig.)

  12. An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

    KAUST Repository

    Bonny, Talal

    2012-07-28

    Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data, which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts the implementation according to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (with a query length of 511 amino acids). © 2010 IEEE.
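
    The workload distribution can be sketched as follows (illustrative Python; the throughput figures below are made up for the example and would in practice come from profiling each processing unit):

        def split_workload(sequences, throughputs):
            """Partition database sequences across processing units in
            proportion to each unit's measured alignment throughput (GCUPS)."""
            total = sum(throughputs.values())
            shares, start = {}, 0
            units = list(throughputs)
            for i, unit in enumerate(units):
                # the last unit takes the remainder so nothing is dropped
                n = len(sequences) - start if i == len(units) - 1 \
                    else round(len(sequences) * throughputs[unit] / total)
                shares[unit] = sequences[start:start + n]
                start += n
            return shares

        db = [f"seq{i}" for i in range(1000)]
        shares = split_workload(db, {"gpu": 10.4, "cpu0": 1.7, "cpu1": 1.7})
        print({u: len(s) for u, s in shares.items()})  # GPU gets ~75% of the work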

  13. A Taxonomy of Reconfigurable Single-/Multiprocessor Systems-on-Chip

    Directory of Open Access Journals (Sweden)

    Diana Göhringer

    2009-01-01

    Full Text Available Runtime adaptivity of hardware in processor architectures is a novel trend, which is under investigation in a variety of research labs all over the world. The runtime exchange of modules, implemented on reconfigurable hardware, affects the instruction flow (e.g., in reconfigurable instruction set processors) or the data flow, which has a strong impact on the performance of an application. Furthermore, the choice of a certain processor architecture related to the class of target applications is a crucial point in application development. A simple example is the domain of high-performance computing applications found in meteorology or high-energy physics, where vector processors are the optimal choice. A classification scheme for computer systems was provided in 1966 by Flynn, where single/multiple data and instruction streams were combined into four types of architectures. This classification is now used as a foundation for an extended classification scheme including runtime adaptivity as a further degree of freedom for processor architecture design. The developed scheme is validated by a multiprocessor system implemented on reconfigurable hardware as well as by a classification of existing static and reconfigurable processor systems.

  14. Design of a modular digital computer system, DRL 4. [for meeting future requirements of spaceborne computers

    Science.gov (United States)

    1972-01-01

    The design is reported of an advanced modular computer system designated the Automatically Reconfigurable Modular Multiprocessor System, which anticipates requirements for higher computing capacity and reliability for future spaceborne computers. Subjects discussed include: an overview of the architecture, mission analysis, synchronous and nonsynchronous scheduling control, reliability, and data transmission.

  15. Techniques for Reducing Consistency-Related Communication in Distributed Shared Memory System

    OpenAIRE

    Zwaenepoel, W; Bennett, J.K.; Carter, J.B.

    1995-01-01

    Distributed shared memory (DSM) is an abstraction of shared memory on a distributed memory machine. Hardware DSM systems support this abstraction at the architecture level; software DSM systems support the abstraction within the runtime system. One of the key problems in building an efficient software DSM system is to reduce the amount of communication needed to keep the distributed memories consistent. In this paper we present four techniques for doing so: 1) software release consistency; 2)...

  16. High Performance Polar Decomposition on Distributed Memory Systems

    KAUST Repository

    Sukkari, Dalal E.

    2016-08-08

    The polar decomposition of a dense matrix is an important operation in linear algebra. It can be directly calculated through the singular value decomposition (SVD) or iteratively using the QR dynamically-weighted Halley algorithm (QDWH). The former is difficult to parallelize due to the preponderant number of memory-bound operations during the bidiagonal reduction. We investigate the latter scenario, which performs more floating-point operations but exposes at the same time more parallelism, and therefore runs closer to the theoretical peak performance of the system, thanks to more compute-bound matrix operations. Profiling results show the performance scalability of QDWH for calculating the polar decomposition using around 9200 MPI processes on well- and ill-conditioned matrices of 100K×100K problem size. We then study the performance impact of the QDWH-based polar decomposition as a pre-processing step toward calculating the SVD itself. The new distributed-memory implementation of the QDWH-SVD solver achieves up to five-fold speedup against current state-of-the-art vendor SVD implementations. © Springer International Publishing Switzerland 2016.
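
    For reference, a small Python/NumPy sketch of the polar decomposition itself (not the distributed QDWH implementation): the SVD route computes A = QH directly, while a plain Newton iteration illustrates the compute-bound iterative style that QDWH refines with QR-based dynamic weighting.

        import numpy as np

        def polar_svd(A):
            """Polar decomposition A = Q H via the SVD (the route QDWH avoids
            because the bidiagonal reduction is memory-bound)."""
            U, s, Vt = np.linalg.svd(A, full_matrices=False)
            Q = U @ Vt                       # orthogonal polar factor
            H = Vt.T @ np.diag(s) @ Vt       # symmetric positive semidefinite factor
            return Q, H

        def polar_newton(A, iters=20):
            """Newton iteration X <- (X + X^{-T})/2 for nonsingular square A,
            a simple compute-bound alternative in the spirit of QDWH."""
            X = A.copy()
            for _ in range(iters):
                X = 0.5 * (X + np.linalg.inv(X).T)
            return X

        A = np.random.default_rng(2).standard_normal((50, 50))
        Q1, _ = polar_svd(A)
        Q2 = polar_newton(A)
        print(np.linalg.norm(Q1 - Q2))   # both converge to the same Q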

  17. Associative Memory Computing Power and Its Simulation

    CERN Document Server

    Volpi, G; The ATLAS collaboration

    2014-01-01

    The associative memory (AM) system is a computing device made of hundreds of AM ASIC chips designed to perform “pattern matching” at very high speed. Since each AM chip stores a database of 130000 pre-calculated patterns and large numbers of chips can be easily assembled together, it is possible to produce huge AM banks. Speed and size of the system are crucial for real-time High Energy Physics applications, such as the ATLAS Fast TracKer (FTK) Processor. Using 80 million channels of the ATLAS tracker, FTK finds tracks within 100 microseconds. The simulation of such a parallelized system is an extremely complex task if executed in commercial computers based on normal CPUs. The algorithm performance is limited, due to the lack of parallelism, and in addition the memory requirement is very large. In fact the AM chip uses a content addressable memory (CAM) architecture. Any data inquiry is broadcast to all memory elements simultaneously, thus data retrieval time is independent of the database size. The gr...

  18. Associative Memory computing power and its simulation

    CERN Document Server

    Ancu, L S; The ATLAS collaboration; Britzger, D; Giannetti, P; Howarth, J W; Luongo, C; Pandini, C; Schmitt, S; Volpi, G

    2014-01-01

    The associative memory (AM) system is a computing device made of hundreds of AM ASIC chips designed to perform “pattern matching” at very high speed. Since each AM chip stores a database of 130000 pre-calculated patterns and large numbers of chips can be easily assembled together, it is possible to produce huge AM banks. Speed and size of the system are crucial for real-time High Energy Physics applications, such as the ATLAS Fast TracKer (FTK) Processor. Using 80 million channels of the ATLAS tracker, FTK finds tracks within 100 microseconds. The simulation of such a parallelized system is an extremely complex task if executed in commercial computers based on normal CPUs. The algorithm performance is limited, due to the lack of parallelism, and in addition the memory requirement is very large. In fact the AM chip uses a content addressable memory (CAM) architecture. Any data inquiry is broadcast to all memory elements simultaneously, thus data retrieval time is independent of the database size. The gr...

  19. Distributed Shared Memory for the Cell Broadband Engine (DSMCBE)

    DEFF Research Database (Denmark)

    Larsen, Morten Nørgaard; Skovhede, Kenneth; Vinter, Brian

    2009-01-01

    in and out of non-coherent local storage blocks for each special processor element. In this paper we present a software library, namely the Distributed Shared Memory for the Cell Broadband Engine (DSMCBE). By using techniques known from distributed shared memory DSMCBE allows programmers to program the CELL...

  20. Intelligent distributed computing

    CERN Document Server

    Thampi, Sabu

    2015-01-01

    This book contains a selection of refereed and revised papers of the Intelligent Distributed Computing Track originally presented at the third International Symposium on Intelligent Informatics (ISI-2014), September 24-27, 2014, Delhi, India.  The papers selected for this Track cover several Distributed Computing and related topics including Peer-to-Peer Networks, Cloud Computing, Mobile Clouds, Wireless Sensor Networks, and their applications.

  1. Numerical fluid flow and heat transfer calculations on multiprocessor systems

    Energy Technology Data Exchange (ETDEWEB)

    Oehman, G.A.; Malen, T.E.; Kuusela, P.

    1989-01-01

    The first part of the report presents the basic principles of parallel processing, and factors influencing the efficiency of practical applications are discussed. In a multiprocessor computer, different parts of the program code are executed in parallel, i.e. simultaneously with respect to time, on different processors, and thus it becomes possible to decrease the overall computation time by a factor which in the ideal case is equal to the number of processors. The application study starts from the numerical solution of the two-dimensional Laplace equation, which describes the steady heat conduction in a solid plate, and advances through the solution of the three-dimensional Laplace equation to the study of laminar fluid flow in a two-dimensional box at Reynolds numbers up to 20. Hereby the stream function-vorticity method is applied first, followed by the SIMPLER method. The conventional (sequential) numerical algorithms for these fluid flow and heat transfer problems are found not to be ideally suited for conversion to parallel computation, but speed-up ratios considerably above 50% of the theoretical maximum are regularly achieved in the runs. The numerical procedures were coded in the OCCAM-2 language and the test runs were performed at Åbo Akademi on the experimental HATHI computers containing 16 INMOS T414 and 100 INMOS T800 transputers, respectively.
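
    The starting point of the application study, relaxation of the two-dimensional Laplace equation, can be sketched in a few lines (Python/NumPy rather than OCCAM-2; the grid size and iteration count are illustrative). Each Jacobi sweep reads only the previous iterate, which is what makes the method straightforward to split across processors:

        import numpy as np

        def jacobi_laplace(u, iters):
            """Jacobi relaxation for the 2-D Laplace equation (steady heat
            conduction): each sweep averages the four neighbours; updates
            read only the old array, so sweeps parallelize naturally."""
            for _ in range(iters):
                u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                        + u[1:-1, :-2] + u[1:-1, 2:])
            return u

        # Unit plate with one hot edge; the interior relaxes toward the
        # harmonic (steady-state) temperature distribution.
        u = np.zeros((64, 64))
        u[0, :] = 1.0                    # fixed boundary condition
        u = jacobi_laplace(u, iters=2000)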

  2. Numerical fluid flow and heat transfer calculations on multiprocessor systems

    Energy Technology Data Exchange (ETDEWEB)

    Oehman, G.A.; Malen, T.E.; Kuusela, P.

    1989-12-31

    The first part of the report presents the basic principles of parallel processing, and factors influencing the efficiency of practical applications are discussed. In a multiprocessor computer, different parts of the program code are executed in parallel, i.e. simultaneously with respect to time, on different processors, and thus it becomes possible to decrease the overall computation time by a factor which in the ideal case is equal to the number of processors. The application study starts from the numerical solution of the two-dimensional Laplace equation, which describes the steady heat conduction in a solid plate, and advances through the solution of the three-dimensional Laplace equation to the study of laminar fluid flow in a two-dimensional box at Reynolds numbers up to 20. Hereby the stream function-vorticity method is applied first, followed by the SIMPLER method. The conventional (sequential) numerical algorithms for these fluid flow and heat transfer problems are found not to be ideally suited for conversion to parallel computation, but speed-up ratios considerably above 50% of the theoretical maximum are regularly achieved in the runs. The numerical procedures were coded in the OCCAM-2 language and the test runs were performed at Åbo Akademi on the experimental HATHI computers containing 16 INMOS T414 and 100 INMOS T800 transputers, respectively.

  3. Stream-processing pipelines: processing of streams on multiprocessor architecture

    NARCIS (Netherlands)

    Kavaldjiev, N.K.; Smit, Gerardus Johannes Maria; Jansen, P.G.

    In this paper we study the timing aspects of the operation of stream-processing applications that run on a multiprocessor architecture. Dependencies are derived for the processing and communication times of the processors in such a system. Three cases of real-time constrained operation and four

  4. A Performance Evaluation of the Hemingway DSM System on a Network of SMPs

    National Research Council Canada - National Science Library

    Aggarwal, Anshu; Grumwald, Dirk

    1997-01-01

    .... In this paper we investigate the performance of a software distributed shared memory system, Hemingway, which is built out of such multiprocessor workstations, utilizing off-the-shelf communication networks...

  5. A combined PLC and CPU approach to multiprocessor control

    International Nuclear Information System (INIS)

    Harris, J.J.; Broesch, J.D.; Coon, R.M.

    1995-10-01

    A sophisticated multiprocessor control system has been developed for use in the E-Power Supply System Integrated Control (EPSSIC) on the DIII-D tokamak. EPSSIC provides control and interlocks for the ohmic heating coil power supply and its associated systems. Of particular interest is the architecture of this system: both a Programmable Logic Controller (PLC) and a Central Processor Unit (CPU) have been combined on a standard VME bus. The PLC and CPU input and output signals are routed through signal conditioning modules, which provide the necessary voltage and ground isolation. Additionally, these modules adapt the signal levels to those of the VME I/O boards. One set of I/O signals is shared between the two processors. The resulting multiprocessor system provides a number of advantages: redundant operation for mission-critical situations, flexible communications using conventional TCP/IP protocols, the simplicity of ladder logic programming for the majority of the control code, and an easily maintained and expandable non-proprietary system.

  6. Simulation of radiation effects on three-dimensional computer optical memories

    International Nuclear Information System (INIS)

    Moscovitch, M.; Emfietzoglou, D.

    1997-01-01

    A model was developed to simulate the effects of heavy charged-particle (HCP) radiation on the information stored in three-dimensional computer optical memories. The model is based on (i) the HCP track radial dose distribution, (ii) the spatial and temporal distribution of temperature in the track, (iii) the matrix-specific radiation-induced changes that will affect the response, and (iv) the kinetics of transition of photochromic molecules from the colored to the colorless isomeric form (bit flip). It is shown that information stored in a volume of several nanometers radius around the particle's track axis may be lost. The magnitude of the effect is dependent on the particle's track structure. copyright 1997 American Institute of Physics

  7. Basic operation principles of associate co-processor module for ...

    African Journals Online (AJOL)

    With such an organization, it is necessary to read each memory module address and ... for specialized computer systems using a modern element base. ... multiprocessor systems that perform associative functions and data storage functions.

  8. Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

    Science.gov (United States)

    Gara, Alan; Ohmacht, Martin

    2014-09-16

    In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first-level cache to the second-level cache. After the write-through, the corresponding line is deleted from the first-level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second-level cache. The second-level cache keeps track of multiple versions of data where more than one speculative thread is running in parallel, while the first-level cache does not hold any of the versions during speculation. A switch allows choosing between modes of operation of a speculation-blind first-level cache.
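
    A toy software model of the policy (illustrative Python; the class names and the version bookkeeping are assumptions, not the patented hardware design):

        class L2Cache:
            def __init__(self):
                self.versions = {}   # addr -> {thread_id: value} speculative versions

            def write(self, addr, value, thread_id):
                self.versions.setdefault(addr, {})[thread_id] = value

            def read(self, addr, thread_id=0):
                return self.versions.get(addr, {}).get(thread_id, 0)

        class L1Cache:
            """Write-through, evict-on-write: a speculative write is pushed to
            L2 (which tracks versions) and the L1 copy is invalidated, so later
            accesses must fetch the versioned data from L2."""
            def __init__(self, l2):
                self.lines = {}      # addr -> value (no versions kept here)
                self.l2 = l2

            def read(self, addr):
                if addr not in self.lines:           # miss: refill from L2
                    self.lines[addr] = self.l2.read(addr)
                return self.lines[addr]

            def write(self, addr, value, thread_id):
                self.l2.write(addr, value, thread_id)  # write through to L2
                self.lines.pop(addr, None)             # evict on write

        l2 = L2Cache()
        l1 = L1Cache(l2)
        l1.write(0x40, 7, thread_id=1)   # goes to L2; the line leaves L1
        print(0x40 in l1.lines)          # False: the next access must go to L2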

  9. Memory controllers for high-performance and real-time MPSoCs : requirements, architectures, and future trends

    NARCIS (Netherlands)

    Akesson, K.B.; Huang, Po-Chun; Clermidy, F.; Dutoit, D.; Goossens, K.G.W.; Chang, Yuan-Hao; Kuo, Tei-Wei; Vivet, P.; Wingard, D.

    2011-01-01

    Designing memory controllers for complex real-time and high-performance multi-processor systems-on-chip is challenging, since sufficient capacity and (real-time) performance must be provided in a reliable manner at low cost and with low power consumption. This special session contains four

  10. FTMP (Fault Tolerant Multiprocessor) programmer's manual

    Science.gov (United States)

    Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

    1986-01-01

    The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high-level language called the Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along with many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.

  11. Estimating Performance of Single Bus, Shared Memory Multiprocessors

    Science.gov (United States)

    1987-05-01

    [Chandy78] K.M. Chandy, C.M. Sauer, "Approximate methods for analyzing queueing network models of computing systems," Computing Surveys, vol. 10, no. 3 ... [Denning78] P. Denning, J. Buzen, "The operational analysis of queueing network models," Computing Surveys, vol. 10, no. 3, September 1978, pp. 225-261

  12. Persistent Memory in Single Node Delay-Coupled Reservoir Computing.

    Science.gov (United States)

    Kovac, André David; Koall, Maximilian; Pipa, Gordon; Toutounji, Hazem

    2016-01-01

    Delays are ubiquitous in biological systems, ranging from genetic regulatory networks and synaptic conductances, to predator/prey population interactions. The evidence is mounting, not only to the presence of delays as physical constraints in signal propagation speed, but also to their functional role in providing dynamical diversity to the systems that comprise them. The latter observation in biological systems inspired the recent development of a computational architecture that harnesses this dynamical diversity, by delay-coupling a single nonlinear element to itself. This architecture is a particular realization of Reservoir Computing, where stimuli are injected into the system in time rather than in space as is the case with classical recurrent neural network realizations. This architecture also exhibits an internal memory which fades in time, an important prerequisite to the functioning of any reservoir computing device. However, fading memory is also a limitation to any computation that requires persistent storage. In order to overcome this limitation, the current work introduces an extended version to the single node Delay-Coupled Reservoir, that is based on trained linear feedback. We show by numerical simulations that adding task-specific linear feedback to the single node Delay-Coupled Reservoir extends the class of solvable tasks to those that require nonfading memory. We demonstrate, through several case studies, the ability of the extended system to carry out complex nonlinear computations that depend on past information, whereas the computational power of the system with fading memory alone quickly deteriorates. Our findings provide the theoretical basis for future physical realizations of a biologically-inspired ultrafast computing device with extended functionality.
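
    A minimal simulation of a single-node delay-coupled reservoir (illustrative Python/NumPy; the parameters, the tanh nonlinearity and the sequential virtual-node update are common choices in this literature, not necessarily those of the paper):

        import numpy as np

        rng = np.random.default_rng(3)
        N = 50                                   # virtual nodes in the delay loop
        theta = 0.2                              # virtual-node separation
        mask = rng.choice([-1.0, 1.0], size=N)   # temporal input mask

        def dcr_run(u, eta=0.5, gamma=0.05):
            """Each scalar input u[t] is time-multiplexed over N virtual nodes
            of one nonlinear element; node i couples to its ring predecessor
            (x[-1] wraps to the previous pass through the delay line)."""
            x = np.zeros(N)
            states = np.empty((len(u), N))
            for t, ut in enumerate(u):
                for i in range(N):
                    x[i] += theta * (-x[i]
                                     + np.tanh(eta * x[i - 1]
                                               + gamma * mask[i] * ut))
                states[t] = x
            return states

        u = rng.standard_normal(200)
        S = dcr_run(u)
        # A linear readout (and, in the extended model, trained linear
        # feedback) would then be fitted on S, e.g. by ridge regression.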

  13. Computational Fluid Dynamics (CFD) Computations With Zonal Navier-Stokes Flow Solver (ZNSFLOW) Common High Performance Computing Scalable Software Initiative (CHSSI) Software

    National Research Council Canada - National Science Library

    Edge, Harris

    1999-01-01

    ...), computational fluid dynamics (CFD) 6 project. Under the project, a proven zonal Navier-Stokes solver was rewritten for scalable parallel performance on both shared memory and distributed memory high performance computers...

  14. A simplified computational memory model from information processing.

    Science.gov (United States)

    Zhang, Lanhua; Zhang, Dongsheng; Deng, Yuqin; Ding, Xiaoqian; Wang, Yan; Tang, Yiyuan; Sun, Baoliang

    2016-11-23

    This paper proposes a computational model of memory from the viewpoint of information processing. The model, called simplified memory information retrieval network (SMIRN), is a bi-modular hierarchical functional memory network built by abstracting memory function and simulating memory information processing. First, meta-memory is defined to express the neuron or brain cortices based on biology and graph theories, and an intra-modular network is developed with the modeling algorithm by mapping nodes and edges; the bi-modular network is then delineated with intra-modular and inter-modular connections. Finally, a polynomial retrieval algorithm is introduced. In this paper we simulate the memory phenomena and the functions of memorization and strengthening by information processing algorithms. The theoretical analysis and the simulation results show that the model is in accordance with memory phenomena from an information processing view.

  15. Scalable Distributed Architectures for Information Retrieval

    National Research Council Canada - National Science Library

    Lu, Zhihong

    1999-01-01

    .... Our distributed architectures exploit parallelism in information retrieval on a cluster of parallel IR servers using symmetric multiprocessors, and use partial collection replication and selection...

  16. E-Token Energy-Aware Proportionate Sharing Scheduling Algorithm for Multiprocessor Systems

    Directory of Open Access Journals (Sweden)

    Pasupuleti Ramesh

    2017-01-01

    Full Text Available WSNs play a vital role, from small-range healthcare surveillance systems to large-scale environmental monitoring, and their design for energy-constrained applications is a challenging issue. Sensors in WSNs are expected to run unattended for long periods: replacing exhausted batteries is costly and is not even possible in hostile situations. Multiprocessors are used in WSNs for high-performance scientific computing, where each processor is assigned the same or a different workload. When the computational demands of the system increase, energy-efficient approaches play an important role in extending system lifetime. Energy efficiency is commonly pursued with a proportionate fair scheduler, but this introduces an abnormal overloading effect. To overcome these problems, E-token Energy-Aware Proportionate Sharing (EEAPS) scheduling is proposed here. The power consumption of each thread/task is calculated and the tasks are allotted to the multiple processors through an auctioning mechanism. The algorithm is simulated using the real-time simulator (RTSIM) and the results are tested.
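
    Proportionate sharing itself can be illustrated with a classic stride scheduler (a Python sketch; EEAPS adds energy awareness and the auctioning mechanism on top of this basic proportional mechanism):

        import heapq

        def stride_schedule(tasks, slots):
            """Proportionate (stride) scheduling: each task receives CPU slots
            in proportion to its share; the smallest pass value runs next."""
            BIG = 1 << 20
            heap = [(BIG // share, BIG // share, name) for name, share in tasks]
            heapq.heapify(heap)
            timeline = []
            for _ in range(slots):
                passv, stride, name = heapq.heappop(heap)
                timeline.append(name)
                heapq.heappush(heap, (passv + stride, stride, name))
            return timeline

        # Shares 3:2:1 -> A runs about 3x as often as C over any window.
        print(stride_schedule([("A", 3), ("B", 2), ("C", 1)], slots=12))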

  17. Energy-efficient fault tolerance in multiprocessor real-time systems

    Science.gov (United States)

    Guo, Yifeng

    The recent progress in multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4-8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in the near future. For such systems, in addition to the general temporal requirements common to all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual-processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with an arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is

  18. Interference control by best-effort process duty-cycling in chip multi-processor systems for real-time medical image processing

    NARCIS (Netherlands)

    Westmijze, M.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

    2013-01-01

    Systems with chip multi-processors are currently used for several applications that have real-time requirements. In chip multi-processor architectures, many hardware resources such as parts of the cache hierarchy are shared between cores and by using such resources, applications can significantly

  19. Spin-transfer torque magnetoresistive random-access memory technologies for normally off computing (invited)

    International Nuclear Information System (INIS)

    Ando, K.; Yuasa, S.; Fujita, S.; Ito, J.; Yoda, H.; Suzuki, Y.; Nakatani, Y.; Miyazaki, T.

    2014-01-01

    Most parts of present computer systems are made of volatile devices, and the power to supply them to avoid information loss causes huge energy losses. We can eliminate this meaningless energy loss by utilizing the non-volatile function of advanced spin-transfer torque magnetoresistive random-access memory (STT-MRAM) technology and create a new type of computer, i.e., normally off computers. Critical tasks to achieve normally off computers are implementations of STT-MRAM technologies in the main memory and low-level cache memories. STT-MRAM technology for applications to the main memory has been successfully developed by using perpendicular STT-MRAMs, and faster STT-MRAM technologies for applications to the cache memory are now being developed. The present status of STT-MRAMs and challenges that remain for normally off computers are discussed.

  20. Cloud Computing as Evolution of Distributed Computing – A Case Study for SlapOS Distributed Cloud Computing Platform

    Directory of Open Access Journals (Sweden)

    George SUCIU

    2013-01-01

    Full Text Available The cloud computing paradigm has been defined from several points of view, the main two directions being either as an evolution of the grid and distributed computing paradigm, or, on the contrary, as a disruptive revolution in the classical paradigms of operating systems, network layers and web applications. This paper presents a distributed cloud computing platform called SlapOS, which unifies technologies and communication protocols into a new technology model for offering any application as a service. Both cloud and distributed computing can be efficient methods for optimizing resources that are aggregated from a grid of standard PCs hosted in homes, offices and small data centers. The paper fills a gap in the existing distributed computing literature by providing a distributed cloud computing model which can be applied for deploying various applications.

  1. Distributed terascale volume visualization using distributed shared virtual memory

    KAUST Repository

    Beyer, Johanna; Hadwiger, Markus; Schneider, Jens; Jeong, Wonki; Pfister, Hanspeter

    2011-01-01

    Table 1 illustrates the impact of different distribution unit sizes, different screen resolutions, and numbers of GPU nodes. We use two and four GPUs (NVIDIA Quadro 5000 with 2.5 GB memory) and a mouse cortex EM dataset (see Figure 2) of resolution

  2. The distribution and the functions of autobiographical memories: Why do older adults remember autobiographical memories from their youth?

    Science.gov (United States)

    Wolf, Tabea; Zimprich, Daniel

    2016-09-01

    In the present study, the distribution of autobiographical memories was examined from a functional perspective: we examined whether the extent to which long-term autobiographical memories were rated as having a self-, a directive, or a social function affects the location (mean age) and scale (standard deviation) of the memory distribution. Analyses were based on a total of 5598 autobiographical memories generated by 149 adults aged between 50 and 81 years in response to 51 cue-words. Participants provided their age at the time when the recalled events had happened and rated how frequently they recall these events for self-, directive, and social purposes. While more frequently using autobiographical memories for self-functions was associated with an earlier mean age, memories frequently shared with others showed a narrower distribution around a later mean age. The directive function, by contrast, did not affect the memory distribution. The results strengthen the assumption that experiences from an individual's late adolescence serve to maintain a sense of self-continuity throughout the lifespan. Experiences that are frequently shared with others, in contrast, stem from a narrow age range located in young adulthood.

  3. Persistent Memory in Single Node Delay-Coupled Reservoir Computing.

    Directory of Open Access Journals (Sweden)

    André David Kovac

    Full Text Available Delays are ubiquitous in biological systems, ranging from genetic regulatory networks and synaptic conductances, to predator/prey population interactions. The evidence is mounting, not only to the presence of delays as physical constraints in signal propagation speed, but also to their functional role in providing dynamical diversity to the systems that comprise them. The latter observation in biological systems inspired the recent development of a computational architecture that harnesses this dynamical diversity, by delay-coupling a single nonlinear element to itself. This architecture is a particular realization of Reservoir Computing, where stimuli are injected into the system in time rather than in space as is the case with classical recurrent neural network realizations. This architecture also exhibits an internal memory which fades in time, an important prerequisite to the functioning of any reservoir computing device. However, fading memory is also a limitation to any computation that requires persistent storage. In order to overcome this limitation, the current work introduces an extended version to the single node Delay-Coupled Reservoir, that is based on trained linear feedback. We show by numerical simulations that adding task-specific linear feedback to the single node Delay-Coupled Reservoir extends the class of solvable tasks to those that require nonfading memory. We demonstrate, through several case studies, the ability of the extended system to carry out complex nonlinear computations that depend on past information, whereas the computational power of the system with fading memory alone quickly deteriorates. Our findings provide the theoretical basis for future physical realizations of a biologically-inspired ultrafast computing device with extended functionality.

  4. A simplified computational memory model from information processing

    Science.gov (United States)

    Zhang, Lanhua; Zhang, Dongsheng; Deng, Yuqin; Ding, Xiaoqian; Wang, Yan; Tang, Yiyuan; Sun, Baoliang

    2016-01-01

    This paper proposes a computational model of memory from the viewpoint of information processing. The model, called simplified memory information retrieval network (SMIRN), is a bi-modular hierarchical functional memory network built by abstracting memory function and simulating memory information processing. First, meta-memory is defined to express the neuron or brain cortices based on biology and graph theories, and an intra-modular network is developed with the modeling algorithm by mapping nodes and edges; the bi-modular network is then delineated with intra-modular and inter-modular connections. Finally, a polynomial retrieval algorithm is introduced. In this paper we simulate the memory phenomena and the functions of memorization and strengthening by information processing algorithms. The theoretical analysis and the simulation results show that the model is in accordance with memory phenomena from an information processing view. PMID:27876847

  5. Safe and Efficient Support for Embedded Multi-Processors in Ada

    Science.gov (United States)

    Ruiz, Jose F.

    2010-08-01

    New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time-analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.

  6. Distributed, Embedded and Real-time Java Systems

    CERN Document Server

    Wellings, Andy

    2012-01-01

    Research on real-time Java technology has been prolific over the past decade, leading to a large number of corresponding hardware and software solutions, and frameworks for distributed and embedded real-time Java systems.  This book is aimed primarily at researchers in real-time embedded systems, particularly those who wish to understand the current state of the art in using Java in this domain.  Much of the work in real-time distributed, embedded and real-time Java has focused on the Real-time Specification for Java (RTSJ) as the underlying base technology, and consequently many of the Chapters in this book address issues with, or solve problems using, this framework. Describes innovative techniques in: scheduling, memory management, quality of service and communication systems supporting real-time Java applications; Includes coverage of multiprocessor embedded systems and parallel programming; Discusses state-of-the-art resource management for embedded systems, including Java’s real-time garbage collect...

  7. A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

    Science.gov (United States)

    Afshar, Yaser; Sbalzarini, Ivo F

    2016-01-01

    Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in the computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate collectively solving the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.
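
    The decomposition step can be sketched as follows (illustrative Python/NumPy on one machine; in the actual distributed setting each tile would go to a different node and the halo rows would be refreshed by network communication):

        import numpy as np

        def split_with_halo(img, tiles, halo):
            """Decompose an image into sub-images with overlapping halo rows,
            as a distributed segmentation would ship to different nodes."""
            h = img.shape[0]
            step = h // tiles
            parts = []
            for k in range(tiles):
                lo = max(0, k * step - halo)
                hi = min(h, (k + 1) * step + halo)
                parts.append((lo, hi, img[lo:hi]))
            return parts

        def stitch(parts, shape, tiles):
            """Recombine per-tile results, discarding the halo rows each node
            only needed for boundary consistency."""
            out = np.empty(shape)
            step = shape[0] // tiles
            for k, (lo, hi, tile) in enumerate(parts):
                a, b = k * step, min(shape[0], (k + 1) * step)
                out[a:b] = tile[a - lo:b - lo]
            return out

        img = np.random.rand(512, 512)
        parts = split_with_halo(img, tiles=4, halo=8)
        # Stand-in for per-node processing: double every tile, then stitch.
        result = stitch([(lo, hi, t * 2) for lo, hi, t in parts], img.shape, 4)
        assert np.allclose(result, img * 2)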

  8. Simulation of the behaviour of electron-optical systems using a parallel computer

    International Nuclear Information System (INIS)

    Balladore, J.L.; Hawkes, P.W.

    1990-01-01

    The advantage of using a multiprocessor computer for the calculation of electron-optical properties is investigated. A considerable reduction of computing time is obtained by reorganising the finite-element field computation. (orig.)

  9. Efficient calculation of open quantum system dynamics and time-resolved spectroscopy with distributed memory HEOM (DM-HEOM).

    Science.gov (United States)

    Kramer, Tobias; Noack, Matthias; Reinefeld, Alexander; Rodríguez, Mirta; Zelinskyy, Yaroslav

    2018-06-11

    Time- and frequency-resolved optical signals provide insights into the properties of light-harvesting molecular complexes, including excitation energies, dipole strengths and orientations, as well as into the exciton energy flow through the complex. The hierarchical equations of motion (HEOM) provide a unifying theory, which allows one to study the combined effects of system-environment dissipation and non-Markovian memory without making restrictive assumptions about weak or strong couplings or separability of vibrational and electronic degrees of freedom. With increasing system size the exact solution of the open quantum system dynamics requires memory and compute resources beyond a single compute node. To overcome this barrier, we developed a scalable variant of HEOM. Our distributed memory HEOM, DM-HEOM, is a universal tool for open quantum system dynamics. It is used to accurately compute all experimentally accessible time- and frequency-resolved processes in light-harvesting molecular complexes with arbitrary system-environment couplings for a wide range of temperatures and complex sizes. © 2018 Wiley Periodicals, Inc.

  10. Proceedings: Sisal `93

    Energy Technology Data Exchange (ETDEWEB)

    Feo, J.T. [ed.

    1993-10-01

    This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for the functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.

  11. Monitoring of Computing Resource Use of Active Software Releases in ATLAS

    CERN Document Server

    Limosani, Antonio; The ATLAS collaboration

    2016-01-01

    The LHC is the world's most powerful particle accelerator, colliding protons at a centre-of-mass energy of 13 TeV. As the energy and frequency of collisions have grown in the search for new physics, so too has the demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the Tier0 at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as "MemoryMonitor", to measure the memory shared across processes in jobs. Resource consumption is broken down into software domains and displayed...

  12. Computational modelling of memory retention from synapse to behaviour

    Science.gov (United States)

    van Rossum, Mark C. W.; Shippi, Maria

    2013-03-01

    One of our most intriguing mental abilities is the capacity to store information and recall it from memory. Computational neuroscience has been influential in developing models and concepts of learning and memory. In this tutorial review we focus on the interplay between learning and forgetting. We discuss recent advances in the computational description of the learning and forgetting processes on synaptic, neuronal, and systems levels, as well as recent data that open up new challenges for statistical physicists.

  13. Computational modelling of memory retention from synapse to behaviour

    International Nuclear Information System (INIS)

    Van Rossum, Mark C W; Shippi, Maria

    2013-01-01

    One of our most intriguing mental abilities is the capacity to store information and recall it from memory. Computational neuroscience has been influential in developing models and concepts of learning and memory. In this tutorial review we focus on the interplay between learning and forgetting. We discuss recent advances in the computational description of the learning and forgetting processes on synaptic, neuronal, and systems levels, as well as recent data that open up new challenges for statistical physicists. (paper)

  14. Sparse Distributed Memory: understanding the speed and robustness of expert memory

    Directory of Open Access Journals (Sweden)

    Marcelo Salhab Brogliato

    2014-04-01

    How can experts, sometimes in exacting detail, almost immediately and very precisely recall memory items from a vast repertoire? The problem in which we will be interested concerns models of theoretical neuroscience that could explain the speed and robustness of an expert's recollection. The approach is based on Sparse Distributed Memory, which has been shown to be plausible, both neuroscientifically and psychologically, in a number of ways. A crucial characteristic concerns the limits of human recollection, the 'tip-of-the-tongue' memory event, which is found at a non-linearity in the model. We expand the theoretical framework, deriving an optimization formula to solve for this non-linearity. Numerical results demonstrate how a higher frequency of rehearsal, through work or study, increases the robustness and speed associated with expert memory.
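    The underlying Kanerva-style read/write cycle can be sketched compactly. The following is a minimal generic SDM, not the paper's variant; the address width, number of hard locations, and activation radius are assumed values.

```python
# Minimal sketch of a Kanerva-style Sparse Distributed Memory.
# Parameters (address bits, hard locations, radius) are assumptions.
import numpy as np

rng = np.random.default_rng(0)
N, M, R = 256, 2000, 112   # address bits, hard locations, activation radius

hard_addrs = rng.integers(0, 2, size=(M, N), dtype=np.int8)
counters = np.zeros((M, N), dtype=np.int32)

def active(addr):
    """Hard locations within Hamming distance R of the address."""
    return np.count_nonzero(hard_addrs != addr, axis=1) <= R

def write(addr, data):
    sel = active(addr)
    counters[sel] += np.where(data == 1, 1, -1)  # increment/decrement bits

def read(addr):
    sel = active(addr)
    return (counters[sel].sum(axis=0) > 0).astype(np.int8)  # majority vote

pattern = rng.integers(0, 2, size=N, dtype=np.int8)
write(pattern, pattern)                  # autoassociative store
noisy = pattern.copy()
flip = rng.choice(N, 20, replace=False)  # corrupt 20 of 256 address bits
noisy[flip] ^= 1
print("recovered:", np.array_equal(read(noisy), pattern))
```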

  15. Mapping of H.264 decoding on a multiprocessor architecture

    Science.gov (United States)

    van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.

    2003-05-01

    Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result, Systems-on-Chip (SoCs) contain a growing amount of fully programmable media processing devices, as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, making the cost of the functional units less significant. Moreover, continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs that are also scalable over advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of, e.g., a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling, e.g., SD-resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system executing the same software. Experimental results show that data communication is reduced by up to 65%, directly improving the overall performance. Apart from the considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the...

  16. Implementation and Performance of Munin

    OpenAIRE

    Bennett, J.K.; Carter, J.B.; Zwaenepoel, W.

    1991-01-01

    Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, shared program variables are annotated with their expected access pattern, and these annotations are then used by the runtime system to choose a consistency protocol best suited to that access patt...

  17. ATLAS Distributed Computing

    CERN Document Server

    Schovancova, J; The ATLAS collaboration

    2011-01-01

    The poster details the different aspects of the ATLAS Distributed Computing experience after the first year of LHC data taking. We describe the performance of the ATLAS distributed computing system and the lessons learned during the 2010 run, pointing out parts of the system which were in good shape, and also spotting areas which required improvement. Improvements ranged from hardware upgrades of the ATLAS Tier-0 computing pools to improve data distribution rates, to tuning of FTS channels between CERN and the Tier-1s, and studying data access patterns for Grid analysis to improve the global processing rate. We show recent software development driven by operational needs, with emphasis on data management and job execution in the ATLAS production system.

  18. Development of a parallel DBMS on the basis of PostgreSQL

    OpenAIRE

    Pan, C.

    2011-01-01

    The paper describes the architecture and the design of PargreSQL parallel database management system (DBMS) for distributed memory multiprocessors. PargreSQL is based upon PostgreSQL open-source DBMS and exploits partitioned parallelism.
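    Partitioned parallelism means hash-splitting a table's rows into disjoint fragments so that each worker processes its fragment independently and partial results are merged afterwards. A toy illustration, with an invented schema and aggregation:

```python
# Toy illustration of partitioned (hash) parallelism, the execution
# model mentioned in the abstract. Schema and aggregation are invented.
from multiprocessing import Pool

rows = [(i, i % 7) for i in range(100_000)]   # (id, value) pairs
N_WORKERS = 4

# Hash-partition rows by key so each worker owns a disjoint fragment.
fragments = [[] for _ in range(N_WORKERS)]
for row in rows:
    fragments[hash(row[0]) % N_WORKERS].append(row)

def local_sum(fragment):
    """Each worker aggregates its own fragment independently."""
    return sum(value for _id, value in fragment)

if __name__ == "__main__":
    with Pool(N_WORKERS) as pool:
        partials = pool.map(local_sum, fragments)
    print("total =", sum(partials))  # merge step combines partial results
```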

  19. Multiprocessor development for robot control

    International Nuclear Information System (INIS)

    Lee, John Min; Kim, Seung Ho; Kim, Chang Hoi; Kim, Byung Soo; Hwang, Suk Yeong; Lee, Young Bum; Sohn, Suk Won; Kim, Woon Gi

    1990-01-01

    The aim of this study is to develop a real-time controller for autonomous robotic systems operating in hostile environments. The control system is designed around a multiprocessor to achieve independence and reliability, as well as easy extensibility. It is organized into three distinct subsystems (a supervisory control part, a functional control part, and a remote control part). To evaluate the functional performance of the developed controller, a prototype mobile robot fitted with a 4-DOF manipulator was designed and manufactured. Initial tests showed that the robot could turn with a radius of 38 cm, reach a maximum speed of 1.26 km/hr, and go over obstacles 18 cm in height. (author)

  20. Energy-aware memory management for embedded multimedia systems a computer-aided design approach

    CERN Document Server

    Balasa, Florin

    2011-01-01

    Energy-Aware Memory Management for Embedded Multimedia Systems: A Computer-Aided Design Approach presents recent computer-aided design (CAD) ideas that address memory management tasks, particularly the optimization of energy consumption in the memory subsystem. It explains how to efficiently implement CAD solutions, including theoretical methods and novel algorithms. The book covers various energy-aware design techniques, including data-dependence analysis techniques, memory size estimation methods, extensions of mapping approaches, and memory banking approaches. It shows how these techniques

  1. Bringing high-performance computing to the biologist's workbench: approaches, applications, and challenges

    International Nuclear Information System (INIS)

    Oehmen, C S; Cannon, W R

    2008-01-01

    Data-intensive and high-performance computing are poised to significantly impact the future of biological research, which is increasingly driven by the prevalence of high-throughput experimental methodologies for genome sequencing, transcriptomics, proteomics, and other areas. Large centers (such as NIH's National Center for Biotechnology Information, The Institute for Genomic Research, and the DOE's Joint Genome Institute) have made extensive use of multiprocessor architectures to deal with some of the challenges of processing, storing and curating exponentially growing genomic and proteomic datasets, thus enabling users to rapidly access a growing public data source, as well as use analysis tools transparently on high-performance computing resources. Applying this computational power to single-investigator analysis, however, often relies on users to provide their own computational resources, forcing them to endure the learning curve of porting, building, and running software on multiprocessor architectures. Solving the next generation of large-scale biology challenges using multiprocessor machines, from small clusters to emerging petascale machines, can most practically be realized if this learning curve can be minimized through a combination of workflow management, data management and resource allocation, as well as intuitive interfaces and compatibility with existing common data formats.

  2. Memory-assisted measurement-device-independent quantum key distribution

    Science.gov (United States)

    Panayi, Christiana; Razavi, Mohsen; Ma, Xiongfeng; Lütkenhaus, Norbert

    2014-04-01

    A protocol with the potential of beating the existing distance records for conventional quantum key distribution (QKD) systems is proposed. It borrows ideas from quantum repeaters by using memories in the middle of the link, and that of measurement-device-independent QKD, which only requires optical source equipment at the user's end. For certain memories with short access times, our scheme allows a higher repetition rate than that of quantum repeaters with single-mode memories, thereby requiring lower coherence times. By accounting for various sources of nonideality, such as memory decoherence, dark counts, misalignment errors, and background noise, as well as timing issues with memories, we develop a mathematical framework within which we can compare QKD systems with and without memories. In particular, we show that with the state-of-the-art technology for quantum memories, it is potentially possible to devise memory-assisted QKD systems that, at certain distances of practical interest, outperform current QKD implementations.

  3. Memory-assisted measurement-device-independent quantum key distribution

    International Nuclear Information System (INIS)

    Panayi, Christiana; Razavi, Mohsen; Ma, Xiongfeng; Lütkenhaus, Norbert

    2014-01-01

    A protocol with the potential of beating the existing distance records for conventional quantum key distribution (QKD) systems is proposed. It borrows ideas from quantum repeaters by using memories in the middle of the link, and that of measurement-device-independent QKD, which only requires optical source equipment at the user's end. For certain memories with short access times, our scheme allows a higher repetition rate than that of quantum repeaters with single-mode memories, thereby requiring lower coherence times. By accounting for various sources of nonideality, such as memory decoherence, dark counts, misalignment errors, and background noise, as well as timing issues with memories, we develop a mathematical framework within which we can compare QKD systems with and without memories. In particular, we show that with the state-of-the-art technology for quantum memories, it is potentially possible to devise memory-assisted QKD systems that, at certain distances of practical interest, outperform current QKD implementations. (paper)

  4. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    International Nuclear Information System (INIS)

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-01

    In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs

  5. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    Energy Technology Data Exchange (ETDEWEB)

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-04

    In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.

  6. A Screen Space GPGPU Surface LIC Algorithm for Distributed Memory Data Parallel Sort Last Rendering Infrastructures

    Energy Technology Data Exchange (ETDEWEB)

    Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim

    2014-07-01

    The surface line integral convolution (LIC) visualization technique produces dense visualizations of vector fields on arbitrary surfaces. We present a screen-space surface LIC algorithm for use in distributed-memory data-parallel sort-last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL, we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen-space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.

  7. The Distributed Nature of Working Memory

    NARCIS (Netherlands)

    Christophel, Thomas B.; Klink, P. Christiaan; Spitzer, Bernhard; Roelfsema, Pieter R.; Haynes, John-Dylan

    2017-01-01

    Studies in humans and non-human primates have provided evidence for storage of working memory contents in multiple regions ranging from sensory to parietal and prefrontal cortex. We discuss potential explanations for these distributed representations: (i) features in sensory regions versus

  8. A 32-bit computer for large memory applications on the FASTBUS

    International Nuclear Information System (INIS)

    Kellner, R.; Blossom, J.M.; Hung, J.P.

    1985-01-01

    A FASTBUS-based 32-bit computer is being built at Los Alamos National Laboratory for use in systems requiring large fast memory in the FASTBUS environment. A separate local execution bus allows data reduction to proceed concurrently with other FASTBUS operations. The computer, which can operate in either master or slave mode, includes the National Semiconductor NS32032 chip set with demand paged memory management, floating point slave processor, interrupt control unit, timers, and time-of-day clock. The 16.0 megabytes of random access memory are interleaved to allow windowed direct memory access on and off the FASTBUS at 80 megabytes per second

  9. Soft-error tolerance and energy consumption evaluation of embedded computer with magnetic random access memory in practical systems using computer simulations

    Science.gov (United States)

    Nebashi, Ryusuke; Sakimura, Noboru; Sugibayashi, Tadahiko

    2017-08-01

    We evaluated the soft-error tolerance and energy consumption of an embedded computer with magnetic random access memory (MRAM) using two computer simulators. One is a central processing unit (CPU) simulator of a typical embedded computer system. We simulated the radiation-induced single-event-upset (SEU) probability in a spin-transfer-torque MRAM cell and also the failure rate of a typical embedded computer due to its main memory SEU error. The other is a delay tolerant network (DTN) system simulator. It simulates the power dissipation of wireless sensor network nodes of the system using a revised CPU simulator and a network simulator. We demonstrated that the SEU effect on the embedded computer with 1 Gbit MRAM-based working memory is less than 1 failure in time (FIT). We also demonstrated that the energy consumption of the DTN sensor node with MRAM-based working memory can be reduced to 1/11. These results indicate that MRAM-based working memory enhances the disaster tolerance of embedded computers.

  10. Three-dimensional magnetic field computation on a distributed memory parallel processor

    International Nuclear Information System (INIS)

    Barion, M.L.

    1990-01-01

    The analysis of three-dimensional magnetic fields by finite element methods frequently proves too onerous a task for the computing resource on which it is attempted. When non-linear and transient effects are included, it may become impossible to calculate the field distribution to sufficient resolution. One approach to this problem is to exploit the natural parallelism in the finite element method via parallel processing. This paper reports on an implementation of a finite element code for non-linear three-dimensional low-frequency magnetic field calculation on Intel's iPSC/2

  11. Computer Icons and the Art of Memory.

    Science.gov (United States)

    McNair, John R.

    1996-01-01

    States that key aspects of "memoria," the ancient Art of Memory, especially its focus on vivid representational images set against distinct backgrounds, can be helpful in creating memorable, universal, and easily retrievable computer icons. (PA)

  12. Distributed computing system with dual independent communications paths between computers and employing split tokens

    Science.gov (United States)

    Rasmussen, Robert D. (Inventor); Manning, Robert M. (Inventor); Lewis, Blair F. (Inventor); Bolotin, Gary S. (Inventor); Ward, Richard S. (Inventor)

    1990-01-01

    This is a distributed computing system providing flexible fault tolerance; ease of software design and concurrency specification; and dynamic balancing of loads. The system comprises a plurality of computers, each having a first input/output interface and a second input/output interface for interfacing to communications networks, each second input/output interface including a bypass for bypassing the associated computer. A global communications network interconnects the first input/output interfaces, providing each computer the ability to broadcast messages simultaneously to the remainder of the computers. A meshwork communications network interconnects the second input/output interfaces, providing each computer with the ability to establish a communications link with another of the computers, bypassing the remainder of the computers. Each computer is controlled by a resident copy of a common operating system. Communications between respective computers is by means of split tokens, each having a moving first portion which is sent from computer to computer and a resident second portion which is disposed in the memory of at least one of the computers, and wherein the location of the second portion is part of the first portion. The split tokens represent both functions to be executed by the computers and data to be employed in the execution of the functions. The first input/output interfaces each include logic for detecting a collision between messages and for terminating the broadcasting of a message, whereby collisions between messages are detected and avoided.
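    The split-token idea can be caricatured in a few lines. Everything below is invented for illustration (the patent describes the mechanism abstractly): a small moving portion names the function to run and the location of the resident portion, which stays put in one computer's memory.

```python
# Invented illustration of the split-token concept from the abstract:
# a small "moving" part travels between computers, while the bulky
# "resident" part stays in one computer's memory and is referenced
# by its location from the moving part. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class ResidentPortion:
    data: bytes                  # payload that stays in one node's memory

@dataclass
class MovingPortion:
    function: str                # function the receiving computer should run
    home_node: int               # which computer holds the resident portion
    address: int                 # where in that computer's memory it lives

# Node-local memories, indexed by node id, then by address.
memories = {2: {0x40: ResidentPortion(data=b"payload bytes")}}

token = MovingPortion(function="reduce", home_node=2, address=0x40)

def execute(token):
    resident = memories[token.home_node][token.address]  # fetch by location
    print(f"run {token.function} on {len(resident.data)} bytes "
          f"from node {token.home_node}")

execute(token)
```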

  13. Abstractions for aperiodic multiprocessor scheduling of real-time stream processing applications

    NARCIS (Netherlands)

    Hausmans, J.P.H.M.

    2015-01-01

    Embedded multiprocessor systems are often used in the domain of real-time stream processing applications to keep up with increasing power and performance requirements. Examples of such real-time stream processing applications are digital radio baseband processing and WLAN transceivers. These stream

  14. The Spacetime Memory of Geometric Phases and Quantum Computing

    CERN Document Server

    Binder, B

    2002-01-01

    Spacetime memory is defined with a holonomic approach to information processing, where multi-state stability is introduced by a non-linear phase-locked loop. Geometric phases serve as the carrier of physical information and geometric memory (of orientation) given by a path integral measure of curvature that is periodically refreshed. Regarding the resulting spin-orbit coupling and gauge field, the geometric nature of spacetime memory suggests to assign intrinsic computational properties to the electromagnetic field.

  15. A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images

    Science.gov (United States)

    Afshar, Yaser; Sbalzarini, Ivo F.

    2016-01-01

    Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments. PMID:27046144
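    The decomposition step described above can be sketched as follows; the tile size and halo (ghost-layer) width are arbitrary assumptions, not values from the paper, and real use would ship each tile to a different machine rather than keep them in one list.

```python
# Illustrative tile decomposition with halo (ghost) layers, as used when
# distributing a large image across machines. Sizes are arbitrary.
import numpy as np

image = np.random.rand(4096, 4096)   # stand-in for a huge microscopy image
TILE, HALO = 1024, 16                # tile edge and ghost-layer width

tiles = []
for y0 in range(0, image.shape[0], TILE):
    for x0 in range(0, image.shape[1], TILE):
        # Extend each tile by HALO pixels so neighboring tiles overlap;
        # the overlap lets local segmentations be stitched consistently.
        ys = slice(max(0, y0 - HALO), min(image.shape[0], y0 + TILE + HALO))
        xs = slice(max(0, x0 - HALO), min(image.shape[1], x0 + TILE + HALO))
        tiles.append(((y0, x0), image[ys, xs].copy()))

# Each (origin, sub-image) pair would be shipped to a different worker.
print(len(tiles), "tiles; first tile shape:", tiles[0][1].shape)
```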

  16. A Parallel Distributed-Memory Particle Method Enables Acquisition-Rate Segmentation of Large Fluorescence Microscopy Images.

    Directory of Open Access Journals (Sweden)

    Yaser Afshar

    Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rate. This creates a bottleneck in computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit the main memory of a single computer. We address both issues by developing a distributed parallel algorithm for segmentation of large fluorescence microscopy images. The method is based on the versatile Discrete Region Competition algorithm, which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels), but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data compression and interactive experiments.

  17. Multiple-User, Multitasking, Virtual-Memory Computer System

    Science.gov (United States)

    Generazio, Edward R.; Roth, Don J.; Stang, David B.

    1993-01-01

    Computer system designed and programmed to serve multiple users in research laboratory. Provides for computer control and monitoring of laboratory instruments, acquisition and analysis of data from those instruments, and interaction with users via remote terminals. System provides fast access to shared central processing units and associated large (from megabytes to gigabytes) memories. Underlying concept of system also applicable to monitoring and control of industrial processes.

  18. Hardware for Accelerating N-Modular Redundant Systems for High-Reliability Computing

    Science.gov (United States)

    Dobbs, Carl, Sr.

    2012-01-01

    A hardware unit has been designed that reduces the cost, in terms of performance and power consumption, for implementing N-modular redundancy (NMR) in a multiprocessor device. The innovation monitors transactions to memory, and calculates a form of sumcheck on-the-fly, thereby relieving the processors of calculating the sumcheck in software

  19. Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities

    OpenAIRE

    Bonifaci, Vincenzo; Brandenburg, Björn; D'Angelo, Gianlorenzo; Marchetti-Spaccamela, Alberto

    2016-01-01

    Many multiprocessor real-time operating systems offer the possibility to restrict the migrations of any task to a specified subset of processors by setting affinity masks. A notion of "strong arbitrary processor affinity scheduling" (strong APA scheduling) has been proposed; this notion avoids schedulability losses due to overly simple implementations of processor affinities. Due to potential overheads, strong APA has not been implemented so far in a real-time operat...

  20. A class Hierarchical, object-oriented approach to virtual memory management

    Science.gov (United States)

    Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.

    1989-01-01

    The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.
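    Choices is a C++ system; purely to illustrate the hierarchy described, the following Python sketch mirrors the structure with invented method bodies: an abstract memory object with read/write methods, specialized by physical memory and by a paged virtual address space backed by another memory object.

```python
# Schematic illustration of the Choices-style memory-object hierarchy
# described above. Choices itself is C++; this sketch only mirrors the
# structure, and all names and method bodies are invented.
from abc import ABC, abstractmethod

class MemoryObject(ABC):
    """Abstract storage mechanism with encapsulated data."""
    @abstractmethod
    def read(self, offset: int, length: int) -> bytes: ...
    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...

class PhysicalMemory(MemoryObject):
    def __init__(self, size: int):
        self._bytes = bytearray(size)
    def read(self, offset, length):
        return bytes(self._bytes[offset:offset + length])
    def write(self, offset, data):
        self._bytes[offset:offset + len(data)] = data

class PagedVirtualAddressSpace(MemoryObject):
    """Maps pages onto a backing MemoryObject lower in the hierarchy."""
    PAGE = 4096
    def __init__(self, backing: MemoryObject):
        self.backing = backing
        self.page_table = {}          # virtual page -> physical page
    def _translate(self, offset):
        vpage, off = divmod(offset, self.PAGE)
        ppage = self.page_table.setdefault(vpage, len(self.page_table))
        return ppage * self.PAGE + off
    def read(self, offset, length):
        return self.backing.read(self._translate(offset), length)
    def write(self, offset, data):
        self.backing.write(self._translate(offset), data)

vas = PagedVirtualAddressSpace(PhysicalMemory(1 << 20))
vas.write(0x2000, b"hello")
print(vas.read(0x2000, 5))
```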

  1. Self-powered information measuring wireless networks using the distribution of tasks within multicore processors

    Science.gov (United States)

    Zhuravska, Iryna M.; Koretska, Oleksandra O.; Musiyenko, Maksym P.; Surtel, Wojciech; Assembay, Azat; Kovalev, Vladimir; Tleshova, Akmaral

    2017-08-01

    The article presents basic approaches to developing self-powered information-measuring wireless networks (SPIM-WN) for critical applications, using the distribution of tasks within multicore processors. These networks are based on the interaction of movable components, both for data transmission and for the wireless transfer of energy harvested from polymetric sensors. A baseline mathematical model of task scheduling in multiprocessor systems was extended to schedule and allocate tasks among the cores of a system-on-chip (SoC) in order to increase the energy efficiency of SPIM-WN objects.

  2. Intelligent Distributed Computing VI : Proceedings of the 6th International Symposium on Intelligent Distributed Computing

    CERN Document Server

    Badica, Costin; Malgeri, Michele; Unland, Rainer

    2013-01-01

    This book represents the combined peer-reviewed proceedings of the Sixth International Symposium on Intelligent Distributed Computing -- IDC~2012, of the International Workshop on Agents for Cloud -- A4C~2012 and of the Fourth International Workshop on Multi-Agent Systems Technology and Semantics -- MASTS~2012. All the events were held in Calabria, Italy during September 24-26, 2012. The 37 contributions published in this book address many topics related to theory and applications of intelligent distributed computing and multi-agent systems, including: adaptive and autonomous distributed systems, agent programming, ambient assisted living systems, business process modeling and verification, cloud computing, coalition formation, decision support systems, distributed optimization and constraint satisfaction, gesture recognition, intelligent energy management in WSNs, intelligent logistics, machine learning, mobile agents, parallel and distributed computational intelligence, parallel evolutionary computing, trus...

  3. Distributed terascale volume visualization using distributed shared virtual memory

    KAUST Repository

    Beyer, Johanna

    2011-10-01

    Table 1 illustrates the impact of different distribution unit sizes, different screen resolutions, and numbers of GPU nodes. We use two and four GPUs (NVIDIA Quadro 5000 with 2.5 GB memory) and a mouse cortex EM dataset (see Figure 2) of resolution 21,494 x 25,790 x 1,850 = 955 GB. The size of the virtual distribution units significantly influences the data distribution between nodes. Small distribution units result in a high depth complexity for compositing. Large distribution units lead to a low utilization of GPUs, because in the worst case only a single distribution unit will be in view, which is rendered by only a single node. The choice of an optimal distribution unit size depends on three major factors: the output screen resolution, the block cache size on each node, and the number of nodes. Currently, we are working on optimizing the compositing step and network communication between nodes. © 2011 IEEE.

  4. Thermal-Aware Scheduling for Future Chip Multiprocessors

    Directory of Open Access Journals (Sweden)

    Pedro Trancoso

    2007-04-01

    The increased complexity and operating frequencies of current single-chip microprocessors are yielding diminishing performance improvements. Consequently, major manufacturers offer chip multiprocessor (CMP) architectures in order to keep up with the expected performance gains. This architecture is successfully being introduced in many markets, including that of embedded systems. Nevertheless, the integration of several cores onto the same chip may lead to increased heat dissipation and consequently additional cooling costs, higher power consumption, decreased reliability, and thermal-induced performance loss, among others. In this paper, we analyze the evolution of thermal issues for future chip multiprocessor architectures and show that as the number of on-chip cores increases, the thermal-induced problems will worsen. In addition, we present several scenarios that result in excessive thermal stress on the CMP chip or significant performance loss. In order to minimize or even eliminate these problems, we propose thermal-aware scheduler (TAS) algorithms. When assigning processes to cores, TAS takes their temperature and cooling ability into account in order to avoid thermal stress and at the same time improve performance. Experimental results have shown that a TAS algorithm that also considers the temperatures of neighboring cores is able to significantly reduce the temperature-induced performance loss while at the same time decreasing the chip's temperature across many different operation and configuration scenarios.
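    A toy version of such a thermal-aware assignment rule is easy to state: schedule the next process on the core minimizing its own temperature plus a weighted mean of its neighbors' temperatures. The grid size, temperatures, and weight below are invented; this illustrates the idea, not the paper's TAS algorithms.

```python
# Toy thermal-aware scheduler (TAS) sketch: pick the core whose own
# temperature plus a weighted neighbor-temperature term is lowest.
# Grid size, weight, and temperatures are invented for illustration.
import numpy as np

temps = np.array([[62.0, 71.0, 65.0],
                  [58.0, 74.0, 69.0],
                  [55.0, 60.0, 63.0]])   # per-core temperatures (C), 3x3 CMP
NEIGHBOR_WEIGHT = 0.3

def thermal_score(temps, y, x):
    """Core temperature plus weighted mean of its neighbors' temperatures."""
    neighbors = [temps[j, i]
                 for j in range(max(0, y - 1), min(temps.shape[0], y + 2))
                 for i in range(max(0, x - 1), min(temps.shape[1], x + 2))
                 if (j, i) != (y, x)]
    return temps[y, x] + NEIGHBOR_WEIGHT * np.mean(neighbors)

scores = np.array([[thermal_score(temps, y, x)
                    for x in range(temps.shape[1])]
                   for y in range(temps.shape[0])])
y, x = np.unravel_index(np.argmin(scores), scores.shape)
print(f"schedule next process on core ({y}, {x}), score {scores[y, x]:.1f}")
```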

  5. Highway traffic simulation on multi-processor computers

    Energy Technology Data Exchange (ETDEWEB)

    Hanebutte, U.R.; Doss, E.; Tentner, A.M.

    1997-04-01

    A computer model has been developed to simulate highway traffic for various degrees of automation with a high level of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway traffic system and allows for the use of Intelligent Transportation System (ITS) technologies such as an Automated Intelligent Cruise Control (AICC). The structure of the computer model facilitates the use of parallel computers for the highway traffic simulation, since domain decomposition techniques can be applied in a straightforward fashion. In this model, the highway system (i.e. a network of road links) is divided into multiple regions; each region is controlled by a separate link manager residing on an individual processor. A graphical user interface augments the computer model by allowing for real-time interactive simulation control and interaction with each individual vehicle and road side infrastructure element on each link. Average speed and traffic volume data are collected at user-specified loop detector locations. Further, as a measure of safety, the so-called Time To Collision (TTC) parameter is recorded.

  6. Parallelization of MCNP 4, a Monte Carlo neutron and photon transport code system, in highly parallel distributed memory type computer

    International Nuclear Information System (INIS)

    Masukawa, Fumihiro; Takano, Makoto; Naito, Yoshitaka; Yamazaki, Takao; Fujisaki, Masahide; Suzuki, Koichiro; Okuda, Motoi.

    1993-11-01

    In order to improve the accuracy and speed of shielding analyses, MCNP 4, a Monte Carlo neutron and photon transport code system, has been parallelized and its efficiency measured on the AP1000, a highly parallel distributed-memory computer. The code was analyzed statically and dynamically, and a suitable parallelization algorithm was determined for the shielding analysis functions of MCNP 4. This includes a strategy in which a new history is assigned to an idling processor element dynamically during execution. Furthermore, to avoid congestion in communication, the batch concept, processing multiple histories as a unit, has been introduced. By analyzing a sample cask problem with 2,000,000 histories on the AP1000 with 512 processor elements, a parallelization efficiency of 82% is achieved, and the calculation speed is estimated to be around 50 times that of the FACOM M-780. (author)
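    The dynamic batch-assignment strategy can be sketched with an ordinary master-worker pool; the batch size and the stand-in "history" work below are assumptions, and this is only an illustration of the scheduling idea, not the AP1000 code.

```python
# Illustration of the dynamic history-batching strategy described above:
# histories are handed to idle workers in batches to limit communication.
# Not the AP1000 implementation; batch size and "simulation" are invented.
from multiprocessing import Pool
import random

TOTAL_HISTORIES = 2_000_000
BATCH = 10_000                      # histories per unit of work

def run_batch(args):
    seed, n = args
    rng = random.Random(seed)
    # Stand-in for tracking n particle histories; returns a tally.
    return sum(rng.random() for _ in range(n))

if __name__ == "__main__":
    batches = [(seed, BATCH) for seed in range(TOTAL_HISTORIES // BATCH)]
    with Pool() as pool:
        # imap_unordered hands the next batch to whichever worker idles
        # first, which is the dynamic assignment idea in the abstract.
        tally = sum(pool.imap_unordered(run_batch, batches))
    print("mean score per history:", tally / TOTAL_HISTORIES)
```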

  7. PCI bus content-addressable-memory (CAM) implementation on FPGA for pattern recognition/image retrieval in a distributed environment

    Science.gov (United States)

    Megherbi, Dalila B.; Yan, Yin; Tanmay, Parikh; Khoury, Jed; Woods, C. L.

    2004-11-01

    Recently, surveillance and Automatic Target Recognition (ATR) applications have been increasing as the cost of the computing power needed to process massive amounts of information continues to fall. This computing power has been made possible partly by the latest advances in FPGAs and SOPCs. In particular, to design and implement state-of-the-art electro-optical imaging systems that provide advanced surveillance capabilities, there is a need to integrate several technologies (e.g. telescopes, precise optics, cameras, and image/computer vision algorithms, which can be geographically distributed or share distributed resources) into programmable and DSP systems. Additionally, pattern recognition techniques and fast information retrieval are often important components of intelligent systems. The aim of this work is to use an embedded FPGA as a fast, configurable, and synthesizable search engine for fast image pattern recognition/retrieval in a distributed hardware/software co-design environment. In particular, we propose and demonstrate a low-cost Content Addressable Memory (CAM)-based distributed embedded FPGA hardware architecture with real-time recognition and computing capabilities for pattern look-up, pattern recognition, and image retrieval. We show how the distributed CAM-based architecture offers a performance advantage of an order of magnitude over RAM-based (Random Access Memory) search architectures for implementing high-speed pattern recognition for image retrieval. The methods of designing, implementing, and analyzing the proposed CAM-based embedded architecture are described. Other SOPC solutions/design issues are covered. Finally, experimental results, hardware verification, and performance evaluations using both the Xilinx Virtex-II and the Altera Apex20k are provided to show the potential and power of the proposed method for low-cost reconfigurable fast image pattern recognition/retrieval at the hardware/software co-design level.

  8. Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

    Energy Technology Data Exchange (ETDEWEB)

    Perache, M

    2006-10-15

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on top of clusters of multiprocessors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high-level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  9. Multiprocessor Real-Time Locking Protocols for Replicated Resources

    Science.gov (United States)

    2016-07-01

    In prior work on replicated resources, k-exclusion locks have been used, but this restricts tasks to lock only one replica at a time. When locking protocols are used, tasks may experience delays due to both... replicas to execute. To solve the assignment problem, the actual identities of the allocated replicas must be known.

  10. A new communication scheme for the neutron diffusion nodal method in a distributed computing environment

    International Nuclear Information System (INIS)

    Kirk, B.L.; Azmy, Y.

    1994-01-01

    A modified scheme is developed for solving the two-dimensional nodal diffusion equations on distributed memory computers. The scheme is aimed at minimizing the volume of communication among processors while maximizing the tasks in parallel. Results show a significant improvement in parallel efficiency on the Intel iPSC/860 hypercube compared to previous algorithms

  11. Multi-processor system-level synthesis for multiple applications on platform FPGA

    NARCIS (Netherlands)

    Kumar, A.; Fernando, S.D.; Ha, Y.; Mesman, B.; Corporaal, H.; Bertels, Koen

    2007-01-01

    Multiprocessor systems-on-chip (MPSoC) are being developed in increasing numbers to support the high number of applications running on modern embedded systems. Designing and programming such systems prove to be a major challenge. Most of the current design methodologies rely on creating the design

  12. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    Science.gov (United States)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  13. Hybrid computing using a neural network with dynamic external memory.

    Science.gov (United States)

    Graves, Alex; Wayne, Greg; Reynolds, Malcolm; Harley, Tim; Danihelka, Ivo; Grabska-Barwińska, Agnieszka; Colmenarejo, Sergio Gómez; Grefenstette, Edward; Ramalho, Tiago; Agapiou, John; Badia, Adrià Puigdomènech; Hermann, Karl Moritz; Zwols, Yori; Ostrovski, Georg; Cain, Adam; King, Helen; Summerfield, Christopher; Blunsom, Phil; Kavukcuoglu, Koray; Hassabis, Demis

    2016-10-27

    Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read-write memory.
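    The heart of such a memory-augmented network is content-based addressing: a read weighting is a softmax over cosine similarities between a key and each memory row. A generic numpy sketch of that mechanism follows (shapes and the key-strength value are illustrative, and this is not the published DNC code).

```python
# Generic content-based read from an external memory matrix, the core
# addressing mechanism of memory-augmented networks such as the DNC.
# Shapes and the sharpness (key strength) value are illustrative.
import numpy as np

def cosine_similarity(M, k):
    """Similarity of key k to every row (slot) of memory M."""
    num = M @ k
    den = np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8
    return num / den

def content_read(M, k, beta=5.0):
    """Softmax over similarities (beta = key strength), then weighted read."""
    sims = beta * cosine_similarity(M, k)
    w = np.exp(sims - sims.max())
    w /= w.sum()                     # read weighting over memory slots
    return w @ M                     # read vector: blend of memory rows

rng = np.random.default_rng(1)
M = rng.standard_normal((128, 20))  # 128 slots, 20-dimensional words
key = M[42] + 0.1 * rng.standard_normal(20)   # noisy query for slot 42
r = content_read(M, key)
print("closest slot recovered:", np.argmax(cosine_similarity(M, r)) == 42)
```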

  14. An Alternative Algorithm for Computing Watersheds on Shared Memory Parallel Computers

    NARCIS (Netherlands)

    Meijster, A.; Roerdink, J.B.T.M.

    1995-01-01

    In this paper a parallel implementation of a watershed algorithm is proposed. The algorithm can easily be implemented on shared memory parallel computers. The watershed transform is generally considered to be inherently sequential since the discrete watershed of an image is defined using recursion.

  15. Recognition of simple visual images using a sparse distributed memory: Some implementations and experiments

    Science.gov (United States)

    Jaeckel, Louis A.

    1990-01-01

    A method was previously described for representing a class of simple visual images so that they could be used with a Sparse Distributed Memory (SDM). Herein, two possible implementations of an SDM are described for which these images, suitably encoded, serve both as addresses to the memory and as data to be stored in the memory. A key feature of both implementations is that a pattern represented as an unordered set with a variable number of members can be used as an address to the memory. In the first model, an image is encoded as a 9072-bit string to be used as a read or write address; the bit string may also be used as data to be stored in the memory. Another representation, in which an image is encoded as a 256-bit string, may be used with either model as data to be stored in the memory, but not as an address. In the second model, an image is not represented as a vector of fixed length to be used as an address. Instead, a rule is given for determining which memory locations are to be activated in response to an encoded image. This activation rule treats the pieces of an image as an unordered set. With this model, the memory can be simulated, based on a method of computing the approximate result of a read operation.

  16. Computing with memory for energy-efficient robust systems

    CERN Document Server

    Paul, Somnath

    2013-01-01

    This book analyzes energy and reliability as major challenges faced by designers of computing frameworks in the nanometer technology regime.  The authors describe the existing solutions to address these challenges and then reveal a new reconfigurable computing platform, which leverages high-density nanoscale memory for both data storage and computation to maximize the energy-efficiency and reliability. The energy and reliability benefits of this new paradigm are illustrated and the design challenges are discussed. Various hardware and software aspects of this exciting computing paradigm are de

  17. Data-acquisition systems for the present and the future

    International Nuclear Information System (INIS)

    Drobnis, D.D.

    1982-09-01

    Basic components of today's acquisition systems are surveyed. These include front-end tools such as microprocessors, programmable controllers, and CAMAC interfaces. Some key concepts in large central real-time systems are examined: hardware and software architecture, and data base structure. Some trends in present data acquisition system design are analyzed, including the increasing distribution of system functions and expansion to hierarchical multiprocessor networks. With the evolution of microprocessors, front-end intelligence is growing into front-end computing power. Real-time host systems are becoming increasingly sophisticated human-interface and data-base-management tools, with increasingly complex operating systems and increasing amounts of memory, mass storage, and computing power. And the ultimate analysis of plasma data is becoming increasingly sophisticated

  18. High-speed computation of the EM algorithm for PET image reconstruction

    International Nuclear Information System (INIS)

    Rajan, K.; Patnaik, L.M.; Ramakrishna, J.

    1994-01-01

    The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution backprojection algorithms. However, two major drawbacks have impeded the routine use of the EM algorithm, namely, the long computational time due to slow convergence and the large memory required for storage of the image, projection data and the probability matrix. In this study, the authors attempt to solve these two problems by parallelizing the EM algorithm on a multiprocessor system. The authors have implemented an extended hypercube (EH) architecture for high-speed computation of the EM algorithm, using commercially available fast floating-point digital signal processor (DSP) chips as the processing elements (PEs). The authors discuss and compare the performance of the EM algorithm on a 386/387 machine, a CD 4360 mainframe, and the EH system. The results show that the computational speed of an EH using DSP chips as PEs executing the EM image reconstruction algorithm is about 130 times better than that of the CD 4360 mainframe. The EH topology is expandable with a larger number of PEs
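    The iteration being parallelized is the standard multiplicative MLEM update, x_j <- (x_j / sum_i a_ij) * sum_i a_ij * y_i / (Ax)_i. A compact generic version with a random toy system matrix (not the paper's hypercube implementation):

```python
# Generic MLEM update for emission tomography, the iteration whose
# parallelization the paper discusses. The system matrix here is a
# random toy stand-in, not a real PET geometry.
import numpy as np

rng = np.random.default_rng(0)
n_bins, n_pix = 200, 100
A = rng.random((n_bins, n_pix))          # system (probability) matrix
x_true = rng.random(n_pix) * 10
y = rng.poisson(A @ x_true)              # measured projection counts

x = np.ones(n_pix)                       # uniform initial image
sens = A.sum(axis=0)                     # sensitivity: sum_i a_ij
for it in range(50):
    proj = A @ x                         # forward projection
    ratio = y / np.maximum(proj, 1e-12)  # measured / estimated counts
    x *= (A.T @ ratio) / sens            # multiplicative MLEM update

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```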

  19. Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

    Energy Technology Data Exchange (ETDEWEB)

    Perache, M

    2006-10-15

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on top of clusters of multiprocessors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high-level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  20. Single-Chip Computers With Microelectromechanical Systems-Based Magnetic Memory

    NARCIS (Netherlands)

    Carley, L. Richard; Bain, James A.; Fedder, Gary K.; Greve, David W.; Guillou, David F.; Lu, Michael S.C.; Mukherjee, Tamal; Santhanam, Suresh; Abelmann, Leon; Min, Seungook

    This article describes an approach for implementing a complete computer system (CPU, RAM, I/O, and nonvolatile mass memory) on a single integrated-circuit substrate (a chip)—hence, the name "single-chip computer." The approach presented combines advances in the field of microelectromechanical

  1. Continuous-variable quantum computing in optical time-frequency modes using quantum memories.

    Science.gov (United States)

    Humphreys, Peter C; Kolthammer, W Steven; Nunn, Joshua; Barbieri, Marco; Datta, Animesh; Walmsley, Ian A

    2014-09-26

    We develop a scheme for time-frequency encoded continuous-variable cluster-state quantum computing using quantum memories. In particular, we propose a method to produce, manipulate, and measure two-dimensional cluster states in a single spatial mode by exploiting the intrinsic time-frequency selectivity of Raman quantum memories. Time-frequency encoding enables the scheme to be extremely compact, requiring a number of memories that are a linear function of only the number of different frequencies in which the computational state is encoded, independent of its temporal duration. We therefore show that quantum memories can be a powerful component for scalable photonic quantum information processing architectures.

  2. Distributed memory in a heterogeneous network, as used in the CERN-PS complex timing system

    CERN Document Server

    Kovaltsov, V I

    1995-01-01

    The Distributed Table Manager (DTM) is a fast and efficient utility for distributing named binary data structures called Tables, of arbitrary size and structure, around a heterogeneous network of computers to a set of registered clients. The Tables are transmitted over a UDP network between DTM servers in network format, where the servers perform the conversions to and from host format for local clients. The servers provide clients with synchronization mechanisms, a choice of network data flows, and table options such as keeping table disc copies, shared memory or heap memory table allocation, table read/write permissions, and table subnet broadcasting. DTM has been designed to be easily maintainable, and to automatically recover from the type of errors typically encountered in a large control system network. The DTM system is based on a three level server daemon hierarchy, in which an inter daemon protocol handles network failures, and incorporates recovery procedures which will guarantee table consistency w...

  3. ATLAS Distributed Computing Automation

    CERN Document Server

    Schovancova, J; The ATLAS collaboration; Borrego, C; Campana, S; Di Girolamo, A; Elmsheuser, J; Hejbal, J; Kouba, T; Legger, F; Magradze, E; Medrano Llamas, R; Negri, G; Rinaldi, L; Sciacca, G; Serfon, C; Van Der Ster, D C

    2012-01-01

    The ATLAS Experiment benefits from computing resources distributed worldwide at more than 100 WLCG sites. The ATLAS Grid sites provide over 100k CPU job slots, over 100 PB of storage space on disk or tape. Monitoring of status of such a complex infrastructure is essential. The ATLAS Grid infrastructure is monitored 24/7 by two teams of shifters distributed world-wide, by the ATLAS Distributed Computing experts, and by site administrators. In this paper we summarize automation efforts performed within the ATLAS Distributed Computing team in order to reduce manpower costs and improve the reliability of the system. Different aspects of the automation process are described: from the ATLAS Grid site topology provided by the ATLAS Grid Information System, via automatic site testing by the HammerCloud, to automatic exclusion from production or analysis activities.

  4. Multiprocessor performance modeling with ADAS

    Science.gov (United States)

    Hayes, Paul J.; Andrews, Asa M.

    1989-01-01

    A graph managing strategy referred to as the Algorithm to Architecture Mapping Model (ATAMM) appears useful for the time-optimized execution of application algorithm graphs in embedded multiprocessors and for the performance prediction of graph designs. This paper reports the modeling of ATAMM in the Architecture Design and Assessment System (ADAS) to make an independent verification of ATAMM's performance prediction capability and to provide a user framework for the evaluation of arbitrary algorithm graphs. Following an overview of ATAMM and its major functional rules are descriptions of the ADAS model of ATAMM, methods to enter an arbitrary graph into the model, and techniques to analyze the simulation results. The performance of a 7-node graph example is evaluated using the ADAS model and verifies the ATAMM concept by substantiating previously published performance results.

  5. A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.

    Energy Technology Data Exchange (ETDEWEB)

    Vineyard, Craig Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verzi, Stephen Joseph [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-09-01

    As high performance computing architectures pursue more computational power, there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an open challenge, and in this research we sought to investigate whether neural-inspired approaches can meaningfully help with memory management. In particular, we explored neurogenesis-inspired resource allocation, and were able to show that a neural-inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.

  6. The reminiscence bump without memories: The distribution of imagined word-cued and important autobiographical memories in a hypothetical 70-year-old.

    Science.gov (United States)

    Koppel, Jonathan; Berntsen, Dorthe

    2016-08-01

    The reminiscence bump is the disproportionate number of autobiographical memories dating from adolescence and early adulthood. It has often been ascribed to a consolidation of the mature self in the period covered by the bump. Here we stripped away factors relating to the characteristics of autobiographical memories per se, most notably factors that aid in their encoding or retention, by asking students to generate imagined word-cued and imagined 'most important' autobiographical memories of a hypothetical, prototypical 70-year-old of their own culture and gender. We compared the distribution of these fictional memories with the distributions of actual word-cued and most important autobiographical memories in a sample of 61-70-year-olds. We found a striking similarity between the temporal distributions of the imagined memories and the actual memories. These results suggest that the reminiscence bump is largely driven by constructive, schematic factors at retrieval, thereby challenging most existing theoretical accounts. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Automatic code generation for distributed robotic systems

    International Nuclear Information System (INIS)

    Jones, J.P.

    1993-01-01

    Hetero Helix is a software environment which supports relatively large robotic system development projects. The environment supports a heterogeneous set of message-passing LAN-connected common-bus multiprocessors, but the programming model seen by software developers is a simple shared memory. The conceptual simplicity of shared memory makes it an extremely attractive programming model, especially in large projects where coordinating a large number of people can itself become a significant source of complexity. We present results from three system development efforts conducted at Oak Ridge National Laboratory over the past several years. Each of these efforts used automatic software generation to create 10 to 20 percent of the system

  8. emMAW: computing minimal absent words in external memory.

    Science.gov (United States)

    Héliou, Alice; Pissis, Solon P; Puglisi, Simon J

    2017-09-01

The biological significance of minimal absent words has been investigated in genomes of organisms from all domains of life. For instance, three minimal absent words of the human genome were found in Ebola virus genomes. There exists an O(n)-time and O(n)-space algorithm for computing all minimal absent words of a sequence of length n on a fixed-sized alphabet based on suffix arrays. A standard implementation of this algorithm, when applied to a large sequence of length n, requires more than 20n bytes of RAM. Such memory requirements are a significant hurdle to the computation of minimal absent words in large datasets. We present emMAW, the first external-memory algorithm for computing minimal absent words. A free open-source implementation of our algorithm is made available. This allows for computation of minimal absent words on far bigger data sets than was previously possible. Our implementation requires less than 3 h on a standard workstation to process the full human genome when as little as 1 GB of RAM is made available. We stress that our implementation, despite making use of external memory, is fast; indeed, even on relatively smaller datasets when enough RAM is available to hold all necessary data structures, it is less than two times slower than state-of-the-art internal-memory implementations. https://github.com/solonas13/maw (free software under the terms of the GNU GPL). alice.heliou@lix.polytechnique.fr or solon.pissis@kcl.ac.uk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  9. Injecting Artificial Memory Errors Into a Running Computer Program

    Science.gov (United States)

    Bornstein, Benjamin J.; Granat, Robert A.; Wagstaff, Kiri L.

    2008-01-01

Single-event upsets (SEUs) or bitflips are computer memory errors caused by radiation. BITFLIPS (Basic Instrumentation Tool for Fault Localized Injection of Probabilistic SEUs) is a computer program that deliberately injects SEUs into another computer program, while the latter is running, for the purpose of evaluating the fault tolerance of that program. BITFLIPS was written as a plug-in extension of the open-source Valgrind debugging and profiling software. BITFLIPS can inject SEUs into any program that can be run on the Linux operating system, without needing to modify the program's source code. Further, if access to the original program source code is available, BITFLIPS offers fine-grained control over exactly when and which areas of memory (as specified via program variables) will be subjected to SEUs. The rate of injection of SEUs is controlled by specifying either a fault probability or a fault rate based on memory size and radiation exposure time, in units of SEUs per byte per second. BITFLIPS can also log each SEU that it injects and, if program source code is available, report the magnitude of effect of the SEU on a floating-point value or other program variable.
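
    The probability-driven injection mode can be sketched in a few lines of C. This is a minimal illustration of the idea, not BITFLIPS itself; the function name and interface are invented for the example. Given a fault rate in SEUs per byte per second and an exposure time, the per-bit probability would be derived from their product.

        #include <stddef.h>
        #include <stdint.h>
        #include <stdlib.h>

        /* Flip each bit of a buffer independently with probability p_fault
         * (illustrative sketch of probabilistic SEU injection). */
        static void inject_seus(uint8_t *buf, size_t len, double p_fault) {
            for (size_t i = 0; i < len; i++)
                for (int bit = 0; bit < 8; bit++)
                    if ((double)rand() / RAND_MAX < p_fault)
                        buf[i] ^= (uint8_t)(1u << bit);  /* single-bit upset */
        }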

  10. Computational and empirical simulations of selective memory impairments: Converging evidence for a single-system account of memory dissociations.

    Science.gov (United States)

    Curtis, Evan T; Jamieson, Randall K

    2018-04-01

    Current theory has divided memory into multiple systems, resulting in a fractionated account of human behaviour. By an alternative perspective, memory is a single system. However, debate over the details of different single-system theories has overshadowed the converging agreement among them, slowing the reunification of memory. Evidence in favour of dividing memory often takes the form of dissociations observed in amnesia, where amnesic patients are impaired on some memory tasks but not others. The dissociations are taken as evidence for separate explicit and implicit memory systems. We argue against this perspective. We simulate two key dissociations between classification and recognition in a computational model of memory, A Theory of Nonanalytic Association. We assume that amnesia reflects a quantitative difference in the quality of encoding. We also present empirical evidence that replicates the dissociations in healthy participants, simulating amnesic behaviour by reducing study time. In both analyses, we successfully reproduce the dissociations. We integrate our computational and empirical successes with the success of alternative models and manipulations and argue that our demonstrations, taken in concert with similar demonstrations with similar models, provide converging evidence for a more general set of single-system analyses that support the conclusion that a wide variety of memory phenomena can be explained by a unified and coherent set of principles.

  11. Monitoring of computing resource use of active software releases at ATLAS

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00219183; The ATLAS collaboration

    2017-01-01

The LHC is the world’s most powerful particle accelerator, colliding protons at a centre-of-mass energy of 13 TeV. As the energy and frequency of collisions have grown in the search for new physics, so too has demand for computing resources needed for event reconstruction. We will report on the evolution of resource usage in terms of CPU and RAM in key ATLAS offline reconstruction workflows at the Tier-0 at CERN and on the WLCG. Monitoring of workflows is achieved using the ATLAS PerfMon package, which is the standard ATLAS performance monitoring system running inside Athena jobs. Systematic daily monitoring has recently been expanded to include all workflows beginning at Monte Carlo generation through to end-user physics analysis, beyond that of event reconstruction. Moreover, the move to a multiprocessor mode in production jobs has facilitated the use of tools, such as “MemoryMonitor”, to measure the memory shared across processors in jobs. Resource consumption is broken down into software domains and dis...

  12. Translation techniques for distributed-shared memory programming models

    Energy Technology Data Exchange (ETDEWEB)

    Fuller, Douglas James [Iowa State Univ., Ames, IA (United States)

    2005-01-01

    The high performance computing community has experienced an explosive improvement in distributed-shared memory hardware. Driven by increasing real-world problem complexity, this explosion has ushered in vast numbers of new systems. Each new system presents new challenges to programmers and application developers. Part of the challenge is adapting to new architectures with new performance characteristics. Different vendors release systems with widely varying architectures that perform differently in different situations. Furthermore, since vendors need only provide a single performance number (total MFLOPS, typically for a single benchmark), they only have strong incentive initially to optimize the API of their choice. Consequently, only a fraction of the available APIs are well optimized on most systems. This causes issues porting and writing maintainable software, let alone issues for programmers burdened with mastering each new API as it is released. Also, programmers wishing to use a certain machine must choose their API based on the underlying hardware instead of the application. This thesis argues that a flexible, extensible translator for distributed-shared memory APIs can help address some of these issues. For example, a translator might take as input code in one API and output an equivalent program in another. Such a translator could provide instant porting for applications to new systems that do not support the application's library or language natively. While open-source APIs are abundant, they do not perform optimally everywhere. A translator would also allow performance testing using a single base code translated to a number of different APIs. Most significantly, this type of translator frees programmers to select the most appropriate API for a given application based on the application (and developer) itself instead of the underlying hardware.

  13. Report on evaluation of research and development of superhigh-function electronic computers; Chokoseino denshi keisanki no kenkyu kaihatsu ni kansuru hyoka hokokusho

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1973-02-20

Described herein is development of superhigh-function electronic computers. This project was implemented as a 6-year joint project, beginning in FY 1966, by the government, industrial and academic circles, with the objective to develop standard, large-size computers comparable with those of the world's highest functions by the beginning of the 70's. The computers developed by this project met almost all of the specifications of the world's representative, large-size commercial computers, partly surpassing those machines. In particular, integration of the virtual memory, buffer memory and multi-processor functions, which were considered to be the central technical features of the computers of the next generation, into one system was a uniquely Japanese concept, not seen in other countries. Other developments considered to have great ripple effects are the LSIs, and the techniques for utilizing and mounting them and for improving their reliability. Development of magnetic discs is another notable result for the peripheral devices. Development of the input/output devices was started to correspond to inputting, outputting and reading Chinese characters, which are characteristic of Japan. The software developed has sufficient functions for common use and is considered to be the world's leading, large-size operating system, although its evaluation largely awaits results from actual applications. (NEDO)

  14. Communication strategies for angular domain decomposition of transport calculations on message passing multiprocessors

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1997-01-01

The effect of three communication schemes for solving Arbitrarily High Order Transport (AHOT) methods of the Nodal type on parallel performance is examined via direct measurements and performance models. The target architecture in this study is Oak Ridge National Laboratory's 128 node Paragon XP/S 5 computer and the parallelization is based on the Parallel Virtual Machine (PVM) library. However, the conclusions reached can be easily generalized to a large class of message passing platforms and communication software. The three schemes considered here are: (1) PVM's global operations (broadcast and reduce) which utilizes the Paragon's native corresponding operations based on a spanning tree routing; (2) the Bucket algorithm wherein the angular domain decomposition of the mesh sweep is complemented with a spatial domain decomposition of the accumulation process of the scalar flux from the angular flux and the convergence test; (3) a distributed memory version of the Bucket algorithm that pushes the spatial domain decomposition one step farther by actually distributing the fixed source and flux iterates over the memories of the participating processes. The conclusion is that the Bucket algorithm is the most efficient of the three if all participating processes have sufficient memories to hold the entire problem arrays. Otherwise, the third scheme becomes necessary at an additional cost to speedup and parallel efficiency that is quantifiable via the parallel performance model.
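
    The global-operation scheme (1) amounts to summing per-process partial scalar fluxes across all angular subdomains. A minimal sketch, written with MPI rather than the paper's PVM library (the function and variable names are illustrative):

        #include <mpi.h>

        /* After a mesh sweep, each process holds the scalar-flux contribution
         * of its own angular directions; a global sum combines them so that
         * every process obtains the full scalar flux for the convergence test. */
        void reduce_scalar_flux(const double *partial, double *total, int ncells) {
            MPI_Allreduce(partial, total, ncells,
                          MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        }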

  15. A memory-array architecture for computer vision

    Energy Technology Data Exchange (ETDEWEB)

    Balsara, P.T.

    1989-01-01

With the fast advances in the area of computer vision and robotics there is a growing need for machines that can understand images at a very high speed. A conventional von Neumann computer is not suited for this purpose because it takes a tremendous amount of time to solve most typical image processing problems. Exploiting the inherent parallelism present in various vision tasks can significantly reduce the processing time. Fortunately, parallelism is increasingly affordable as hardware gets cheaper. Thus it is now imperative to study computer vision in a parallel processing framework. The approach taken is first to design a computational structure which is well suited for a wide range of vision tasks, and then to develop parallel algorithms which can run efficiently on this structure. Recent advances in VLSI technology have led to several proposals for parallel architectures for computer vision. In this thesis the author demonstrates that a memory array architecture with efficient local and global communication capabilities can be used for high speed execution of a wide range of computer vision tasks. This architecture, called the Access Constrained Memory Array Architecture (ACMAA), is efficient for VLSI implementation because of its modular structure, simple interconnect and limited global control. Several parallel vision algorithms have been designed for this architecture. The choice of vision problems demonstrates the versatility of ACMAA for a wide range of vision tasks. These algorithms were simulated on a high level ACMAA simulator running on the Intel iPSC/2 hypercube, a parallel architecture. The results of this simulation are compared with those of sequential algorithms running on a single hypercube node. Details of the ACMAA processor architecture are also presented.

  16. Centralized multiprocessor control system for the Frascati storage rings DAΦNE

    International Nuclear Information System (INIS)

    Di Pirro, G.; Milardi, C.; Serio, M.

    1992-01-01

    We describe the status of the DANTE (DAΦne New Tools Environment) control system for the new DAΦNE Φ-factory under construction at the Frascati National Laboratories. The system is based on a centralized communication architecture for simplicity and reliability. A central processor unit coordinates all communications between the consoles and the lower level distributed processing power, and continuously updates a central memory that contains the whole machine status. We have developed a system of VME Fiber Optic interfaces allowing very fast point to point communication between distant processors. Macintosh II personal computers are used as consoles. The lower levels are all built using the VME standard. (author)

  17. The discrete-dipole-approximation code ADDA: capabilities and known limitations

    NARCIS (Netherlands)

    Yurkin, M.A.; Hoekstra, A.G.

    2011-01-01

    The open-source code ADDA is described, which implements the discrete dipole approximation (DDA), a method to simulate light scattering by finite 3D objects of arbitrary shape and composition. Besides standard sequential execution, ADDA can run on a multiprocessor distributed-memory system,

  18. Memory-assisted quantum key distribution resilient against multiple-excitation effects

    Science.gov (United States)

    Lo Piparo, Nicolò; Sinclair, Neil; Razavi, Mohsen

    2018-01-01

    Memory-assisted measurement-device-independent quantum key distribution (MA-MDI-QKD) has recently been proposed as a technique to improve the rate-versus-distance behavior of QKD systems by using existing, or nearly-achievable, quantum technologies. The promise is that MA-MDI-QKD would require less demanding quantum memories than the ones needed for probabilistic quantum repeaters. Nevertheless, early investigations suggest that, in order to beat the conventional memory-less QKD schemes, the quantum memories used in the MA-MDI-QKD protocols must have high bandwidth-storage products and short interaction times. Among different types of quantum memories, ensemble-based memories offer some of the required specifications, but they typically suffer from multiple excitation effects. To avoid the latter issue, in this paper, we propose two new variants of MA-MDI-QKD both relying on single-photon sources for entangling purposes. One is based on known techniques for entanglement distribution in quantum repeaters. This scheme turns out to offer no advantage even if one uses ideal single-photon sources. By finding the root cause of the problem, we then propose another setup, which can outperform single memory-less setups even if we allow for some imperfections in our single-photon sources. For such a scheme, we compare the key rate for different types of ensemble-based memories and show that certain classes of atomic ensembles can improve the rate-versus-distance behavior.

  19. DAEDALUS: System-Level Design Methodology for Streaming Multiprocessor Embedded Systems on Chips

    NARCIS (Netherlands)

    Stefanov, T.; Pimentel, A.; Nikolov, H.; Ha, S.; Teich, J.

    2017-01-01

    The complexity of modern embedded systems, which are increasingly based on heterogeneous multiprocessor system-on-chip (MPSoC) architectures, has led to the emergence of system-level design. To cope with this design complexity, system-level design aims at raising the abstraction level of the design

  20. Parallel discrete ordinates algorithms on distributed and common memory systems

    International Nuclear Information System (INIS)

    Wienke, B.R.; Hiromoto, R.E.; Brickner, R.G.

    1987-01-01

The S_n algorithm employs iterative techniques in solving the linear Boltzmann equation. These methods, both ordered and chaotic, were compared on both the Denelcor HEP and the Intel hypercube. Strategies are linked to the organization and accessibility of memory (common memory versus distributed memory architectures), with common concern for acquisition of global information. Apart from this, the inherent parallelism of the algorithm maps directly onto the two architectures. Results comparing execution times, speedup, and efficiency are based on a representative 16-group (full upscatter and downscatter) sample problem. Calculations were performed on both the Los Alamos National Laboratory (LANL) Denelcor HEP and the LANL Intel hypercube. The Denelcor HEP is a 64-bit multiple-instruction, multiple-data (MIMD) machine consisting of up to 16 process execution modules (PEMs), each capable of executing 64 processes concurrently. Each PEM can cooperate on a job, or run several unrelated jobs, and share a common global memory through a crossbar switch. The Intel hypercube, on the other hand, is a distributed memory system composed of 128 processing elements, each with its own local memory. Processing elements are connected in a nearest-neighbor hypercube configuration and sharing of data among processors requires execution of explicit message-passing constructs.

  1. Parallel grid generation algorithm for distributed memory computers

    Science.gov (United States)

    Moitra, Stuti; Moitra, Anutosh

    1994-01-01

A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levels of parallelism on multiple instruction multiple data machines is indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations based on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.

  2. Design considerations for a multiprocessor based data acquisition system

    International Nuclear Information System (INIS)

    Tippie, J.W.; Kulaga, J.E.

    1979-01-01

The rapid advance of digital technology has provided the systems designer with many new design options. Hardware is no longer the controlling expense. Complex operating systems provide the flexibility and development tools needed by software designers, but restrict throughput. Multiprocessor-based systems can be used to "front-end" high-throughput applications while maintaining the many advantages offered by multitasking operating systems. The design of a high-throughput data acquisition system for application in low energy nuclear physics is considered.

  3. Performance modeling of parallel algorithms for solving neutron diffusion problems

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1995-01-01

    Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message passing and shared memory multiprocessors represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated through several test problems, and these models are used to estimate the performance of each of the two considered architectures in situations typical of practical applications, such as fine meshes and a large number of participating processors. While message passing computers are capable of producing speedup, the parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably for massively parallel computers so that only small- to medium-sized message passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared memory architecture predicts very high efficiency over a wide range of number of processors reasonable for this architecture. Furthermore, the model efficiency of the Sequent remains superior to that of the hypercube if its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared memory computers are better suited for this parallel algorithm than message passing computers
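
    The quantities underlying such performance models are the standard parallel metrics; the split of the parallel run time into computation and communication terms below is a generic message-passing cost model for illustration, not the paper's exact formulas.

        \[
          S(p) = \frac{T(1)}{T(p)}, \qquad
          E(p) = \frac{S(p)}{p}, \qquad
          T(p) \approx \frac{T_{\mathrm{comp}}}{p} + T_{\mathrm{comm}}(p)
        \]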

  4. Reprogrammable logic in memristive crossbar for in-memory computing

    Science.gov (United States)

    Cheng, Long; Zhang, Mei-Yun; Li, Yi; Zhou, Ya-Xiong; Wang, Zhuo-Rui; Hu, Si-Yu; Long, Shi-Bing; Liu, Ming; Miao, Xiang-Shui

    2017-12-01

Memristive stateful logic has emerged as a promising next-generation in-memory computing paradigm to address escalating computing-performance pressures in traditional von Neumann architecture. Here, we present a nonvolatile reprogrammable logic method that can process data between different rows and columns in a memristive crossbar array based on material implication (IMP) logic. Arbitrary Boolean logic can be executed with a reprogrammable cell containing four memristors in a crossbar array. In the fabricated Ti/HfO2/W memristive array, some fundamental functions, such as universal NAND logic and data transfer, were experimentally implemented. Moreover, using eight memristors in a 2 × 4 array, a one-bit full adder was theoretically designed and verified by simulation to exhibit the feasibility of our method to accomplish complex computing tasks. In addition, some critical logic-related performances were further discussed, such as the flexibility of data processing, cascading problem and bit error rate. Such a method could be a step forward in developing IMP-based memristive nonvolatile logic for large-scale in-memory computing architecture.
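
    The IMP-to-NAND construction that such stateful logic relies on can be checked with an ordinary bit-level simulation. The three-step sequence below follows the standard memristive IMP scheme (initialize a work cell to FALSE, then apply two implications); the C code merely models the Boolean logic, not the device physics.

        #include <stdio.h>

        /* Material implication: p IMP q = (NOT p) OR q.
         * In a crossbar this corresponds to one conditional-write step. */
        static int imp(int p, int q) { return (!p) | q; }

        int main(void) {
            for (int p = 0; p <= 1; p++)
                for (int q = 0; q <= 1; q++) {
                    int s = 0;         /* step 0: clear the work cell      */
                    s = imp(p, s);     /* step 1: s = NOT p                */
                    s = imp(q, s);     /* step 2: s = (NOT q) OR (NOT p)   */
                    printf("NAND(%d,%d) = %d\n", p, q, s);
                }
            return 0;
        }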

  5. Coping with distributed computing

    International Nuclear Information System (INIS)

    Cormell, L.

    1992-09-01

The rapid increase in the availability of high performance, cost-effective RISC/UNIX workstations has been both a blessing and a curse. The blessing of having extremely powerful computing engines available on the desk top is well-known to many users. The user has tremendous freedom, flexibility, and control of his environment. That freedom can, however, become the curse of distributed computing. The user must become a system manager to some extent: he must worry about backups, maintenance, upgrades, etc. Traditionally these activities have been the responsibility of a central computing group. The central computing group, however, may find that it can no longer provide all of the traditional services. With the plethora of workstations now found on so many desktops throughout the entire campus or lab, the central computing group may be swamped by support requests. This talk will address several of these computer support and management issues by providing some examples of the approaches taken at various HEP institutions. In addition, a brief review of commercial directions or products for distributed computing and management will be given

  6. Differentiation and Response Bias in Episodic Memory: Evidence from Reaction Time Distributions

    Science.gov (United States)

    Criss, Amy H.

    2010-01-01

    In differentiation models, the processes of encoding and retrieval produce an increase in the distribution of memory strength for targets and a decrease in the distribution of memory strength for foils as the amount of encoding increases. This produces an increase in the hit rate and decrease in the false-alarm rate for a strongly encoded compared…

  7. A distributed real-time operating system

    International Nuclear Information System (INIS)

    Tuynman, F.; Hertzberger, L.O.

    1984-07-01

    A distributed real-time operating system, Fados, has been developed for an embedded multi-processor system. The operating system is based on a host target approach and provides for communication between arbitrary processes on host and target machine. The facilities offered are, apart from process communication, access to the file system on the host by programs on the target machine and monitoring and debugging of programs on the target machine from the host. The process communication has been designed in such a way that the possibilities are the same as those offered by the Ada programming language. The operating system is implemented on a MC 68000 based multiprocessor system in combination with a Unix host. (orig.)

  8. An FPGA design flow for reconfigurable network-based multi-processor systems on chip

    NARCIS (Netherlands)

    Kumar, A.; Hansson, M.A; Huisken, J.; Corporaal, H.

    2007-01-01

    Multi-processor systems on chip (MPSoC) platforms are becoming increasingly more heterogeneous and are shifting towards a more communication-centric methodology. Networks on chip (NoC) have emerged as the design paradigm for scalable on-chip communication architectures. As the system complexity

  9. Centaure: an heterogeneous parallel architecture for computer vision

    International Nuclear Information System (INIS)

    Peythieux, Marc

    1997-01-01

This dissertation deals with the architecture of parallel computers dedicated to computer vision. In the first chapter, the problem to be solved is presented, as well as the architecture of the Sympati and Symphonie computers, on which this work is based. The second chapter is about the state of the art of computers and integrated processors that can execute computer vision and image processing codes. The third chapter contains a description of the architecture of Centaure. It has a heterogeneous structure: it is composed of a multiprocessor system based on the Analog Devices ADSP21060 Sharc digital signal processor, and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer, tightly coupled to a Sharc thanks to a dual-ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with a performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark which is typical of industrial vision are presented and compared. In the first place, these results show that the basic node of Centaure allows a faster execution than Symphonie, and that increasing the size of the tested computer leads to a better speed-up with Centaure than with Symphonie. In the second place, these results validate the choice of running the low level structure of Centaure in a multi-SIMD fashion. (author) [fr

  10. Parallel computing by Monte Carlo codes MVP/GMVP

    International Nuclear Information System (INIS)

    Nagaya, Yasunobu; Nakagawa, Masayuki; Mori, Takamasa

    2001-01-01

General-purpose Monte Carlo codes MVP/GMVP are well-vectorized and thus enable us to perform high-speed Monte Carlo calculations. In order to achieve more speedups, we parallelized the codes on different types of parallel computing platforms or by using the standard parallelization library MPI. The platforms used for benchmark calculations are a distributed-memory vector-parallel computer Fujitsu VPP500, a distributed-memory massively parallel computer Intel Paragon and distributed-memory scalar-parallel computers Hitachi SR2201 and IBM SP2. As is generally the case, linear speedup could be obtained for large-scale problems, but parallelization efficiency decreased as the batch size per processing element (PE) became smaller. It was also found that the statistical uncertainty for assembly powers was less than 0.1% by the PWR full-core calculation with more than 10 million histories and it took about 1.5 hours by massively parallel computing. (author)

  11. Massively Parallel Polar Decomposition on Distributed-Memory Systems

    KAUST Repository

    Ltaief, Hatem; Sukkari, Dalal E.; Esposito, Aniello; Nakatsukasa, Yuji; Keyes, David E.

    2018-01-01

We present a high-performance implementation of the Polar Decomposition (PD) on distributed-memory systems. Building upon the QR-based Dynamically Weighted Halley (QDWH) algorithm, the key idea lies in finding the best rational approximation

  12. Distributed GPU Computing in GIScience

    Science.gov (United States)

    Jiang, Y.; Yang, C.; Huang, Q.; Li, J.; Sun, M.

    2013-12-01

Geoscientists strive to discover potential principles and patterns hidden inside ever-growing Big Data for scientific discoveries. To better achieve this objective, more capable computing resources are required to process, analyze and visualize Big Data (Ferreira et al., 2003; Li et al., 2013). Current CPU-based computing techniques cannot promptly meet the computing challenges caused by the increasing amount of datasets from different domains, such as social media, earth observation, and environmental sensing (Li et al., 2013). Meanwhile, CPU-based computing resources structured as clusters or supercomputers are costly. In the past several years, with GPU-based technology matured in both capability and performance, GPU-based computing has emerged as a new computing paradigm. Compared to the traditional microprocessor, the modern GPU, as a compelling alternative, offers outstanding parallel processing capability with cost-effectiveness and efficiency (Owens et al., 2008), although it was initially designed for graphics rendering in the visualization pipeline. This presentation reports a distributed GPU computing framework for integrating GPU-based computing within a distributed environment. Within this framework, 1) for each single computer, computing resources of both GPU-based and CPU-based can be fully utilized to improve the performance of visualizing and processing Big Data; 2) within a network environment, a variety of computers can be used to build up a virtual super computer to support CPU-based and GPU-based computing in a distributed computing environment; 3) GPUs, as a specific graphic targeted device, are used to greatly improve the rendering efficiency in distributed geo-visualization, especially for 3D/4D visualization. Key words: Geovisualization, GIScience, Spatiotemporal Studies Reference: 1. Ferreira de Oliveira, M. C., & Levkowitz, H. (2003). From visual data exploration to visual data mining: A survey. Visualization and Computer Graphics, IEEE

  13. Spin-wave interference patterns created by spin-torque nano-oscillators for memory and computation

    International Nuclear Information System (INIS)

    Macia, Ferran; Kent, Andrew D; Hoppensteadt, Frank C

    2011-01-01

    Magnetization dynamics in nanomagnets has attracted broad interest since it was predicted that a dc current flowing through a thin magnetic layer can create spin-wave excitations. These excitations are due to spin momentum transfer, a transfer of spin angular momentum between conduction electrons and the background magnetization, that enables new types of information processing. Here we show how arrays of spin-torque nano-oscillators can create propagating spin-wave interference patterns of use for memory and computation. Memristic transponders distributed on the thin film respond to threshold tunnel magnetoresistance values, thereby allowing spin-wave detection and creating new excitation patterns. We show how groups of transponders create resonant (reverberating) spin-wave interference patterns that may be used for polychronous wave computation and information storage.

  14. Distributed computing and nuclear reactor analysis

    International Nuclear Information System (INIS)

    Brown, F.B.; Derstine, K.L.; Blomquist, R.N.

    1994-01-01

    Large-scale scientific and engineering calculations for nuclear reactor analysis can now be carried out effectively in a distributed computing environment, at costs far lower than for traditional mainframes. The distributed computing environment must include support for traditional system services, such as a queuing system for batch work, reliable filesystem backups, and parallel processing capabilities for large jobs. All ANL computer codes for reactor analysis have been adapted successfully to a distributed system based on workstations and X-terminals. Distributed parallel processing has been demonstrated to be effective for long-running Monte Carlo calculations

  15. Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory

    KAUST Repository

    Pearce, Roger; Gokhale, Maya; Amato, Nancy M.

    2013-01-01

    We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash

  16. Mobile Thread Task Manager

    Science.gov (United States)

    Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.

    2013-01-01

The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTM architecture wraps access to data with thread management to move threads to the home processor for that data so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.
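
    The draw-from-a-global-queue strategy can be sketched with POSIX threads. This is a minimal illustration of the pattern with invented names, not the MTTM interface; thread migration and cache homing are omitted.

        #include <pthread.h>

        typedef struct {
            pthread_mutex_t lock;      /* protects the shared counter   */
            int next, total;           /* next task id, number of tasks */
            void (*run)(int task_id);  /* task body                     */
        } job_queue_t;

        /* Each worker repeatedly draws the next task id from the global
         * queue until the queue is drained, which balances load across
         * threads regardless of per-task cost. */
        static void *worker(void *arg) {
            job_queue_t *q = arg;
            for (;;) {
                pthread_mutex_lock(&q->lock);
                int id = (q->next < q->total) ? q->next++ : -1;
                pthread_mutex_unlock(&q->lock);
                if (id < 0) break;   /* queue drained */
                q->run(id);          /* execute the task */
            }
            return NULL;
        }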

  17. Solution of equations of electric power networks in multiprocessor computers; Solucao de equacoes de redes de energia eletrica em computadores multiprocessadores

    Energy Technology Data Exchange (ETDEWEB)

    Feltrin, Antonio Padilha

    1991-05-01

This thesis describes a methodology for decomposing the repeat solution process of the equation Ax = b into independent tasks to be done in parallel, based on the matrix inverse factors (W-matrix) with partitions. The partitioning scheme proposed in this thesis consists of breaking up the W-matrix according to the depths of the factorization path tree. In this scheme, all the information needed to generate the partitions can be obtained straightforwardly from the network factorization path tree. The partitioning algorithm is simple and easy to implement. The elements of the W-matrices, except those of the last partition, can be obtained directly from L-matrix elements, not requiring extra work. The proposed scheme guarantees that additional fill-ins are created only in the last partition. The forward and backward solutions are performed by rows, and the strategy proposed is to schedule on each processor the operations corresponding to a row of each partition. It should be kept in mind that multiprocessor environments are equipped with powerful unit processors, so it seems a sound strategy to perform the multi-add elementary operations inside the hardware in order to exploit its computing efficiency. This strategy seeks to match the parallel algorithm to the parallel architecture. The precedence relations - which give rise to delays - are replaced by multi-add operations performed inside the processor without external communication. Within a partition, the forward, diagonal and backward solutions may be gathered, so that all the operations can be expressed as the product of the matrix W_lp by the updated components of vector b. The performance results show that the potential speedup of the solution time is essentially bounded by the floating point operation capability of each processor, indicating that the methodology is a suitable way to exploit the growing power of computing technology. 46 figs, 40 refs, 49 tabs.
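
    Each partition application in such a scheme is a sparse matrix-vector product whose rows are independent and can therefore be scheduled on different processors. A minimal sketch in C with compressed sparse row (CSR) storage; the data layout and names are illustrative, not the thesis code.

        /* One W-partition stored in CSR form. */
        typedef struct {
            int n;         /* matrix dimension */
            int *rowptr;   /* n+1 row offsets  */
            int *col;      /* column indices   */
            double *val;   /* nonzero values   */
        } csr_t;

        /* y = W x; the outer loop over rows is the candidate parallel loop,
         * with one row of each partition assigned to each processor. */
        static void csr_matvec(const csr_t *W, const double *x, double *y) {
            for (int i = 0; i < W->n; i++) {
                double s = 0.0;
                for (int k = W->rowptr[i]; k < W->rowptr[i + 1]; k++)
                    s += W->val[k] * x[W->col[k]];
                y[i] = s;
            }
        }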

  18. Childhood amnesia in the making: different distributions of autobiographical memories in children and adults.

    Science.gov (United States)

    Bauer, Patricia J; Larkina, Marina

    2014-04-01

    Within the memory literature, a robust finding is of childhood amnesia: a relative paucity among adults for autobiographical or personal memories from the first 3 to 4 years of life, and from the first 7 years, a smaller number of memories than would be expected based on normal forgetting. Childhood amnesia is observed in spite of strong evidence that during the period eventually obscured by the amnesia, children construct and preserve autobiographical memories. Why early memories seemingly are lost to recollection is an unanswered question. In the present research, we examined the issue by using the cue word technique to chart the distributions of autobiographical memories in samples of children ages 7 to 11 years and samples of young and middle-aged adults. Among adults, the distributions were best fit by the power function, whereas among children, the exponential function provided a better fit to the distributions of memories. The findings suggest that a major source of childhood amnesia is a constant rate of forgetting in childhood, seemingly resulting from failed consolidation, the outcome of which is a smaller pool of memories available for later retrieval.
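
    The two retention functions contrasted above imply qualitatively different forgetting dynamics. Writing m(t) for the number of retrievable memories of age t (notation introduced here for illustration):

        \[
          \text{power: } m(t) = a\,t^{-b}
          \qquad \text{vs.} \qquad
          \text{exponential: } m(t) = a\,e^{-\lambda t}
        \]

    The exponential corresponds to forgetting at a constant proportional rate, consistent with the authors' account of a constant rate of forgetting in childhood, whereas the power function's effective rate slows as memories age.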

  19. Support system for ATLAS distributed computing operations

    CERN Document Server

    Kishimoto, Tomoe; The ATLAS collaboration

    2018-01-01

    The ATLAS distributed computing system has allowed the experiment to successfully meet the challenges of LHC Run 2. In order for distributed computing to operate smoothly and efficiently, several support teams are organized in the ATLAS experiment. The ADCoS (ATLAS Distributed Computing Operation Shifts) is a dedicated group of shifters who follow and report failing jobs, failing data transfers between sites, degradation of ATLAS central computing services, and more. The DAST (Distributed Analysis Support Team) provides user support to resolve issues related to running distributed analysis on the grid. The CRC (Computing Run Coordinator) maintains a global view of the day-to-day operations. In this presentation, the status and operational experience of the support system for ATLAS distributed computing in LHC Run 2 will be reported. This report also includes operations experience from the grid site point of view, and an analysis of the errors that create the biggest waste of wallclock time. The report of oper...

  20. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    Science.gov (United States)

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems, as well as a reference manual, are available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. A high speed multi-tasking, multi-processor telemetry system

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Kung Chris [Univ. of Texas, El Paso, TX (United States)

    1996-12-31

This paper describes a small size, light weight, multitasking, multiprocessor telemetry system capable of collecting 32 channels of differential signals at a sampling rate of 6.25 kHz per channel. The system is designed to collect data from remote wind turbine research sites and transfer the data via wireless communication. A description of operational theory, hardware components, and itemized cost is provided. Synchronization with other data acquisition systems and test data on data transmission rates are also given. 11 refs., 7 figs., 4 tabs.
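
    As a rough consistency check on the stated acquisition rate (assuming 16-bit samples, which the abstract does not specify), the aggregate raw data rate would be:

        \[
          32 \ \text{channels} \times 6250 \ \text{samples/s} \times 2 \ \text{bytes}
          = 400\,000 \ \text{bytes/s} \approx 3.2 \ \text{Mbit/s}
        \]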

  2. Quality-driven model-based design of multi-processor accelerators : an application to LDPC decoders

    NARCIS (Netherlands)

    Jan, Y.

    2012-01-01

    The recent spectacular progress in nano-electronic technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs). However in parallel, new highly demanding complex embedded applications are emerging, in fields like communication and networking,

  3. Energy efficient distributed computing systems

    CERN Document Server

    Lee, Young-Choon

    2012-01-01

    The energy consumption issue in distributed computing systems raises various monetary, environmental and system performance concerns. Electricity consumption in the US doubled from 2000 to 2005.  From a financial and environmental standpoint, reducing the consumption of electricity is important, yet these reforms must not lead to performance degradation of the computing systems.  These contradicting constraints create a suite of complex problems that need to be resolved in order to lead to 'greener' distributed computing systems.  This book brings together a group of outsta

  4. Distributed computing for global health

    CERN Multimedia

    CERN. Geneva; Schwede, Torsten; Moore, Celia; Smith, Thomas E; Williams, Brian; Grey, François

    2005-01-01

    Distributed computing harnesses the power of thousands of computers within organisations or over the Internet. In order to tackle global health problems, several groups of researchers have begun to use this approach to exceed by far the computing power of a single lab. This event illustrates how companies, research institutes and the general public are contributing their computing power to these efforts, and what impact this may have on a range of world health issues. Grids for neglected diseases Vincent Breton, CNRS/EGEE This talk introduces the topic of distributed computing, explaining the similarities and differences between Grid computing, volunteer computing and supercomputing, and outlines the potential of Grid computing for tackling neglected diseases where there is little economic incentive for private R&D efforts. Recent results on malaria drug design using the Grid infrastructure of the EU-funded EGEE project, which is coordinated by CERN and involves 70 partners in Europe, the US and Russi...

  5. Data Provenance for Agent-Based Models in a Distributed Memory

    Directory of Open Access Journals (Sweden)

    Delmar B. Davis

    2018-04-01

Full Text Available Agent-Based Models (ABMs) assist with studying emergent collective behavior of individual entities in social, biological, economic, network, and physical systems. Data provenance can support ABM by explaining individual agent behavior. However, there is no provenance support for ABMs in a distributed setting. The Multi-Agent Spatial Simulation (MASS) library provides a framework for simulating ABMs at fine granularity, where agents and spatial data are shared application resources in a distributed memory. We introduce a novel approach to capture ABM provenance in a distributed memory, called ProvMASS. We evaluate our technique with traditional data provenance queries and performance measures. Our results indicate that a configurable approach can capture provenance that explains coordination of distributed shared resources, simulation logic, and agent behavior while limiting performance overhead. We also show the ability to support practical analyses (e.g., agent tracking) and storage requirements for different capture configurations.

  6. Evaluation of External Memory Access Performance on a High-End FPGA Hybrid Computer

    Directory of Open Access Journals (Sweden)

    Konstantinos Kalaitzis

    2016-10-01

Full Text Available The motivation of this research was to evaluate the main memory performance of a hybrid supercomputer such as the Convey HC-x, and ascertain how the controller performs in several access scenarios, vis-à-vis hand-coded memory prefetches. Such memory patterns are very useful in stencil computations. The theoretical bandwidth of the memory of the Convey is compared with the results of our measurements. The accurate study of the memory subsystem is particularly useful for users when they are developing their application-specific personality. Experiments were performed to measure the bandwidth between the coprocessor and the memory subsystem. The experiments aimed mainly at measuring the reading access speed of the memory from Application Engines (FPGAs). Different ways of accessing data were used in order to find the most efficient way to access memory. This way was proposed for future work in the Convey HC-x. When performing a series of accesses to memory, non-uniform latencies occur. The Memory Controller of the Convey HC-x in the coprocessor attempts to cover this latency. We measure memory efficiency as a ratio of the number of memory accesses and the number of execution cycles. The result of this measurement converges to one in most cases. In addition, we performed experiments with hand-coded memory accesses. The analysis of the experimental results shows how the memory subsystem and Memory Controllers work. From this work we conclude that the memory controllers do an excellent job, largely because (transparently to the user) they seem to cache large amounts of data, and hence hand-coding is not needed in most situations.

  7. 2-D fluid dynamics models for laser driven fusion on IBM 3090 vector multiprocessors

    International Nuclear Information System (INIS)

    Atzeni, S.

    1988-01-01

Fluid-dynamics codes for laser fusion are complex research codes, consisting of many distinct modules and embodying a variety of numerical methods. They are therefore good candidates for testing general purpose advanced computer architectures and the related software. In this paper, after a brief outline of the basic concepts of laser fusion, the implementation of the 2-D laser fusion fluid code DUED on the IBM 3090 VF vector multiprocessors is discussed. Emphasis is put on parallelization, performed by means of IBM Parallel FORTRAN (PF). It is shown how different modules have been optimized by using different features of PF: i) modules based on depth-2 nested loops exploit automatic parallelization; ii) laser light ray tracing is partitioned by scheduling; iii) the ICCG algorithm is executed in parallel by appropriately synchronized parallel subroutines. Performance results are given for separate modules of the code, as well as for typical complete runs.

  8. Neuronal model with distributed delay: analysis and simulation study for gamma distribution memory kernel.

    Science.gov (United States)

    Karmeshu; Gupta, Varun; Kadambari, K V

    2011-06-01

A single neuronal model incorporating distributed delay (memory) is proposed. The stochastic model has been formulated as a Stochastic Integro-Differential Equation (SIDE) which results in the underlying process being non-Markovian. A detailed analysis of the model when the distributed delay kernel has exponential form (weak delay) has been carried out. The selection of exponential kernel has enabled the transformation of the non-Markovian model to a Markovian model in an extended state space. For the study of First Passage Time (FPT) with exponential delay kernel, the model has been transformed to a system of coupled Stochastic Differential Equations (SDEs) in two-dimensional state space. Simulation studies of the SDEs provide insight into the effect of the weak delay kernel on the Inter-Spike Interval (ISI) distribution. A measure based on Jensen-Shannon divergence is proposed which can be used to make a choice between two competing models viz. distributed delay model vis-à-vis LIF model. An interesting feature of the model is that the behavior of CV_ISI(t) (coefficient of variation) of the ISI distribution with respect to the memory kernel time constant parameter η reveals that the neuron can switch from a bursting state to a non-bursting state as the noise intensity parameter changes. The membrane potential exhibits decaying auto-correlation structure with or without damped oscillatory behavior depending on the choice of parameters. This behavior is in agreement with empirically observed pattern of spike count in a fixed time window. The power spectral density derived from the auto-correlation function is found to exhibit single and double peaks. The model is also examined for the case of strong delay with memory kernel having the form of Gamma distribution. In contrast to fast decay of damped oscillations of the ISI distribution for the model with weak delay kernel, the decay of damped oscillations is found to be slower for the model with strong delay kernel.
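
    The weak-delay case admits a simple numerical illustration: the exponential kernel's Markovian embedding gives a two-dimensional system in which a filtered input z drives the membrane potential v, integrable by the Euler-Maruyama method. The equations and parameter values below are a generic sketch of this construction, not the paper's exact model.

        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Box-Muller standard normal deviate. */
        static double gauss(void) {
            double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
            double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
            return sqrt(-2.0 * log(u1)) * cos(2.0 * M_PI * u2);
        }

        int main(void) {
            double v = 0.0, z = 0.0;
            const double tau = 10.0, eta = 5.0;  /* membrane / kernel time constants */
            const double mu = 1.2, sigma = 0.5;  /* input mean / noise intensity     */
            const double dt = 0.01, vth = 1.0;   /* step size / firing threshold     */
            for (long k = 0; k < 200000; k++) {
                /* exponentially filtered (weak-delay) input */
                z += (mu - z) / eta * dt + (sigma / eta) * sqrt(dt) * gauss();
                /* leaky membrane driven by the filtered input */
                v += (-v / tau + z) * dt;
                if (v >= vth) {                  /* spike: print time, reset */
                    printf("%g\n", k * dt);
                    v = 0.0;
                }
            }
            return 0;
        }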

  9. COMPUTING: International symposium

    International Nuclear Information System (INIS)

    Anon.

    1984-01-01

Recent Developments in Computing, Processor, and Software Research for High Energy Physics, a four-day international symposium, was held in Guanajuato, Mexico, from 8-11 May, with 112 attendees from nine countries. The symposium was the third in a series of meetings exploring activities in leading-edge computing technology in both processor and software research and their effects on high energy physics. Topics covered included fixed-target on- and off-line reconstruction processors; lattice gauge and general theoretical processors and computing; multiprocessor projects; electron-positron collider on- and off-line reconstruction processors; state-of-the-art in university computer science and industry; software research; accelerator processors; and proton-antiproton collider on- and off-line reconstruction processors.

  10. Computational cost estimates for parallel shared memory isogeometric multi-frontal solvers

    KAUST Repository

    Woźniak, Maciej; Kuźnik, Krzysztof M.; Paszyński, Maciej R.; Calo, Victor M.; Pardo, D.

    2014-01-01

In this paper we present computational cost estimates for parallel shared memory isogeometric multi-frontal solvers. The estimates show that the ideal isogeometric shared memory parallel direct solver scales as O(p^2 log(N/p)) for one-dimensional problems, O(N p^2) for two-dimensional problems, and O(N^{4/3} p^2) for three-dimensional problems, where N is the number of degrees of freedom and p is the polynomial order of approximation. The computational costs of the shared memory parallel isogeometric direct solver are compared with those of the sequential isogeometric direct solver, the latter being equal to O(N p^2) for the one-dimensional case, O(N^{1.5} p^3) for the two-dimensional case, and O(N^2 p^3) for the three-dimensional case. The shared memory version significantly reduces the computational cost in terms of both N and p. Theoretical estimates are compared with numerical experiments performed with linear, quadratic, cubic, quartic, and quintic B-splines, in one and two spatial dimensions. © 2014 Elsevier Ltd. All rights reserved.
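
    Restating the abstract's estimates in display form for side-by-side comparison:

        \[
          T_{\text{shared}}(N,p) =
          \begin{cases}
            O(p^{2}\log(N/p)) & \text{1D} \\
            O(N p^{2})        & \text{2D} \\
            O(N^{4/3} p^{2})  & \text{3D}
          \end{cases}
          \qquad
          T_{\text{seq}}(N,p) =
          \begin{cases}
            O(N p^{2})       & \text{1D} \\
            O(N^{1.5} p^{3}) & \text{2D} \\
            O(N^{2} p^{3})   & \text{3D}
          \end{cases}
        \]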

  12. The discrete-dipole-approximation code ADDA: Capabilities and known limitations

    International Nuclear Information System (INIS)

    Yurkin, Maxim A.; Hoekstra, Alfons G.

    2011-01-01

    The open-source code ADDA is described, which implements the discrete dipole approximation (DDA), a method to simulate light scattering by finite 3D objects of arbitrary shape and composition. Besides standard sequential execution, ADDA can run on a multiprocessor distributed-memory system, parallelizing a single DDA calculation. Hence the size parameter of the scatterer is in principle limited only by total available memory and computational speed. ADDA is written in C99 and is highly portable. It provides full control over the scattering geometry (particle morphology and orientation, and incident beam) and allows one to calculate a wide variety of integral and angle-resolved scattering quantities (cross sections, the Mueller matrix, etc.). Moreover, ADDA incorporates a range of state-of-the-art DDA improvements, aimed at increasing the accuracy and computational speed of the method. We discuss both physical and computational aspects of the DDA simulations and provide a practical introduction to performing such simulations with the ADDA code. We also present several simulation results, in particular, for a sphere with size parameter 320 (100-wavelength diameter) and refractive index 1.05.
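
    As an illustration of the linear-system structure that a DDA discretization produces, the toy sketch below solves a scalar coupled-dipole system for a handful of dipoles. It is emphatically not ADDA's formulation (ADDA uses the full dyadic Green's tensor and FFT-accelerated iterative solvers); the polarizability, geometry, and extinction prefactor are illustrative assumptions.

```python
import numpy as np

# Toy *scalar* coupled-dipole system: each polarization P_i satisfies
# P_i = alpha * (E_inc(r_i) + sum_{j != i} G(r_i, r_j) P_j),
# which rearranges into a dense linear system A P = alpha * E_inc.
k = 2 * np.pi            # wavenumber (wavelength = 1)
alpha = 0.01 + 0.001j    # hypothetical scalar polarizability
rng = np.random.default_rng(1)
pos = rng.uniform(0, 0.5, size=(50, 3))          # 50 dipoles in a small cube

n = len(pos)
A = np.eye(n, dtype=complex)
for i in range(n):
    for j in range(n):
        if i != j:
            r = np.linalg.norm(pos[i] - pos[j])
            A[i, j] = -alpha * np.exp(1j * k * r) / r   # scalar Green's function
E_inc = np.exp(1j * k * pos[:, 2])               # plane wave along z

P = np.linalg.solve(A, alpha * E_inc)            # dipole polarizations
C_ext = 4 * np.pi * k * np.imag(np.vdot(E_inc, P))   # extinction, up to constants
print(f"Solved {n}x{n} coupled-dipole system; C_ext ~ {C_ext:.4f}")
```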

  13. Parallel algorithms for quantum chemistry. I. Integral transformations on a hypercube multiprocessor

    International Nuclear Information System (INIS)

    Whiteside, R.A.; Binkley, J.S.; Colvin, M.E.; Schaefer, H.F. III

    1987-01-01

    For many years it has been recognized that fundamental physical constraints such as the speed of light will limit the ultimate speed of single processor computers to less than about three billion floating point operations per second (3 GFLOPS). This limitation is becoming increasingly restrictive as commercially available machines are now within an order of magnitude of this asymptotic limit. A natural way to avoid this limit is to harness together many processors to work on a single computational problem. In principle, these parallel processing computers have speeds limited only by the number of processors one chooses to acquire. The usefulness of potentially unlimited processing speed to a computationally intensive field such as quantum chemistry is obvious. If these methods are to be applied to significantly larger chemical systems, parallel schemes will have to be employed. For this reason we have developed distributed-memory algorithms for a number of standard quantum chemical methods. We are currently implementing these on a 32-processor Intel hypercube. In this paper we present our algorithm and benchmark results for one of the bottleneck steps in quantum chemical calculations: the four-index integral transformation
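
    Why this step is a bottleneck, and why it factors so well, is visible in a few lines: transforming all four indices at once costs O(N^8), while four successive quarter transformations cost O(N^5) each step. The sketch below (random stand-in integrals, hypothetical tiny basis) verifies the factored form against the direct contraction:

```python
import numpy as np

n = 8                                     # tiny basis for illustration
rng = np.random.default_rng(2)
eri = rng.random((n, n, n, n))            # AO-basis integrals (pq|rs), random stand-in
C = np.linalg.qr(rng.random((n, n)))[0]   # AO->MO coefficient matrix

# Four O(N^5) quarter transformations instead of one O(N^8) contraction;
# on a hypercube each stage can be distributed over the untransformed indices.
t1 = np.einsum('pqrs,pi->iqrs', eri, C)
t2 = np.einsum('iqrs,qj->ijrs', t1, C)
t3 = np.einsum('ijrs,rk->ijks', t2, C)
mo = np.einsum('ijks,sl->ijkl', t3, C)

# Check against the direct (expensive) transformation on this tiny case:
direct = np.einsum('pqrs,pi,qj,rk,sl->ijkl', eri, C, C, C, C)
print("max deviation:", np.max(np.abs(mo - direct)))
```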

  14. A Web-based Distributed Voluntary Computing Platform for Large Scale Hydrological Computations

    Science.gov (United States)

    Demir, I.; Agliamzanov, R.

    2014-12-01

    Distributed volunteer computing can enable researchers and scientists to form large parallel computing environments that utilize the computing power of the millions of computers on the Internet, and to use them to run large scale environmental simulations and models that serve the common good of local communities and the world. Recent developments in web technologies and standards allow client-side scripting languages to run at speeds close to native applications, and to utilize the power of Graphics Processing Units (GPU). Using a client-side scripting language like JavaScript, we have developed an open distributed computing framework that makes it easy for researchers to write their own hydrologic models and run them on volunteer computers. Website owners can easily enable visitors to volunteer their computing resources toward running advanced hydrological models and simulations. Using a web-based system allows users to start volunteering their computational resources within seconds without installing any software. The framework distributes the model simulation to thousands of nodes in small spatial and computational sizes. A relational database system is utilized for managing data connections and queue management for the distributed computing nodes. In this paper, we present a web-based distributed volunteer computing platform to enable large scale hydrological simulations and model runs in an open and integrated environment.

  15. A microcomputer network for the control of digitising machines

    International Nuclear Information System (INIS)

    Seller, P.

    1981-01-01

    A distributed microcomputing network operates in the Bubble Chamber Research Group Scanning Laboratory at the Rutherford and Appleton Laboratories. A microcomputer at each digitising table buffers information, controls the functioning of the table and enhances the machine/operator interface. The system consists of fourteen microcomputers together with a VAX 11/780 computer used for data analysis. These are interconnected via a packet switched network. This paper will describe the features of the combined system, including the distributed computing architecture and the packet switched method of communication. It will also describe in detail a high speed packet switching controller used as the central node of the network. This controller is a multiprocessor microcomputer system with eighteen central processor units, thirty-four direct memory access channels and thirty-four prioritised and vectored interrupt channels. This microcomputer is of general interest as a communications controller due to its totally programmable nature. (orig.)

  16. P3T+: A Performance Estimator for Distributed and Parallel Programs

    Directory of Open Access Journals (Sweden)

    T. Fahringer

    2000-01-01

    Full Text Available Developing distributed and parallel programs on today's multiprocessor architectures is still a challenging task. Particularly distressing is the lack of effective performance tools that support the programmer in evaluating changes in code, problem and machine sizes, and target architectures. In this paper we introduce P3T+, a performance estimator for mostly regular HPF (High Performance Fortran) programs that also partially covers message passing programs (MPI). P3T+ is unique in modeling programs, compiler code transformations, and parallel and distributed architectures. It computes at compile-time a variety of performance parameters including work distribution, number of transfers, amount of data transferred, transfer times, computation times, and number of cache misses. Several novel technologies are employed to compute these parameters: loop iteration spaces, array access patterns, and data distributions are modeled by employing highly effective symbolic analysis. Communication is estimated by simulating the behavior of a communication library used by the underlying compiler. Computation times are predicted through pre-measured kernels on every target architecture of interest. We carefully model the most critical architecture-specific factors such as cache line sizes, number of cache lines available, startup times, message transfer time per byte, etc. P3T+ has been implemented and is closely integrated with the Vienna High Performance Compiler (VFC) to support programmers in developing parallel and distributed applications. Experimental results for realistic kernel codes taken from real-world applications are presented to demonstrate both the accuracy and usefulness of P3T+.
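
    A minimal sketch of the style of analytical model described above: communication time estimated from per-message startup and per-byte transfer parameters, computation time from a pre-measured per-operation cost. All machine parameters and the example workload below are hypothetical, not P3T+'s measured values.

```python
# Hypothetical machine parameters of the kind a performance estimator pre-measures
alpha = 50e-6        # message startup time (s)
beta = 1e-9          # transfer time per byte (s)
flop_time = 1e-9     # pre-measured time per floating-point operation (s)

def transfer_time(n_messages, bytes_per_message):
    return n_messages * (alpha + beta * bytes_per_message)

def computation_time(flops):
    return flops * flop_time

# Example: block-distributed stencil sweep on a 1024x1024 array over 16 processors
comm = transfer_time(n_messages=2, bytes_per_message=1024 * 8)   # halo exchange
comp = computation_time(flops=5 * 1024 * 1024 / 16)
print(f"estimated per-iteration time: {comm + comp:.6f} s "
      f"(comm {comm:.6f}, comp {comp:.6f})")
```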

  17. An Applet-based Anonymous Distributed Computing System.

    Science.gov (United States)

    Finkel, David; Wills, Craig E.; Ciaraldi, Michael J.; Amorin, Kevin; Covati, Adam; Lee, Michael

    2001-01-01

    Defines anonymous distributed computing systems and focuses on the specifics of a Java, applet-based approach for large-scale, anonymous, distributed computing on the Internet. Explains the possibility of a large number of computers participating in a single computation and describes a test of the functionality of the system. (Author/LRW)

  18. Computer Use and Its Effect on the Memory Process in Young and Adults

    Science.gov (United States)

    Alliprandini, Paula Mariza Zedu; Straub, Sandra Luzia Wrobel; Brugnera, Elisangela; de Oliveira, Tânia Pitombo; Souza, Isabela Augusta Andrade

    2013-01-01

    This work investigates the effect of computer use on the memory process in young people and adults under the Perceptual and Memory experimental conditions. The memory condition involved the phases of information acquisition and recovery, at time intervals (2 min, 24 hours and 1 week) in pre- and post-test situations (before and after the participants…

  19. Computational Model-Based Prediction of Human Episodic Memory Performance Based on Eye Movements

    Science.gov (United States)

    Sato, Naoyuki; Yamaguchi, Yoko

    Subjects' episodic memory performance is not simply reflected by eye movements. We use a ‘theta phase coding’ model of the hippocampus to predict subjects' memory performance from their eye movements. Results demonstrate the ability of the model to predict subjects' memory performance. These studies provide a novel approach to computational modeling in the human-machine interface.

  20. Fiscal 1998 research report on super compiler technology; 1998 nendo super konpaira technology no chosa kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1999-03-01

    For next-generation supercomputing systems, research was conducted on parallel and distributed compiler technology for enhancing effective performance, and on the related software and architectures that enhance performance in coordination with compilers. As for parallel compiler technology, the research areas of scalable automated parallel compiler technology, parallel tuning tools, and operating systems that use multi-processor resources effectively are pointed out to be important concrete technical development issues. In addition, by extending these research results to the architecture technology of single-chip multi-processors, the report points out the possibility of developing and expanding the PC, WS and HPC (high-performance computer) markets, and of creating new industries. Although wide-area distributed computing is attracting attention as a next-generation computing industry, the concrete industrial fields for such computing are not yet clear, and research remains at an exploratory stage. (NEDO)

  1. Graphical Visualization on Computational Simulation Using Shared Memory

    International Nuclear Information System (INIS)

    Lima, A B; Correa, Eberth

    2014-01-01

    The Shared Memory technique is a powerful tool for parallelizing computer codes. In particular, it can be used to visualize the results "on the fly" without stopping a running simulation. In this presentation we discuss and show how to use the technique in conjunction with a visualization code using OpenGL.
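
    A minimal sketch of the idea in Python, using multiprocessing.shared_memory rather than the presentation's OpenGL setup: a simulation process keeps updating a shared array while the parent reads it concurrently; a renderer would read the same buffer in the same way. Shapes, timings, and the fake simulation step are illustrative.

```python
import time
import numpy as np
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory

def simulate(shm_name, shape):
    # Attach to the shared buffer and update the field "simulation" style.
    shm = SharedMemory(name=shm_name)
    field = np.ndarray(shape, dtype=np.float64, buffer=shm.buf)
    for step in range(5):
        field[:] = np.random.default_rng(step).random(shape)  # fake time step
        time.sleep(0.2)
    shm.close()

if __name__ == "__main__":
    shape = (64, 64)
    shm = SharedMemory(create=True, size=64 * 64 * 8)
    field = np.ndarray(shape, dtype=np.float64, buffer=shm.buf)
    field[:] = 0.0

    p = Process(target=simulate, args=(shm.name, shape))
    p.start()
    for _ in range(5):                      # "visualize" while the run continues
        print(f"live mean = {field.mean():.4f}")
        time.sleep(0.2)
    p.join()
    shm.close()
    shm.unlink()
```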

  2. Temporal analysis and scheduling of hard real-time radios running on a multi-processor

    NARCIS (Netherlands)

    Moreira, O.

    2012-01-01

    On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while each meets its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for

  3. Optical backplane interconnect switch for data processors and computers

    Science.gov (United States)

    Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

    1989-01-01

    An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.

  4. PRISMA database machine: A distributed, main-memory approach

    NARCIS (Netherlands)

    Schmidt, J.W.; Apers, Peter M.G.; Ceri, S.; Kersten, Martin L.; Oerlemans, Hans C.M.; Missikoff, M.

    1988-01-01

    The PRISMA project is a large-scale research effort in the design and implementation of a highly parallel machine for data and knowledge processing. The PRISMA database machine is a distributed, main-memory database management system implemented in an object-oriented language that runs on top of a

  5. Power profiling of Cholesky and QR factorizations on distributed memory systems

    KAUST Repository

    Bosilca, George; Ltaief, Hatem; Dongarra, Jack

    2012-01-01

    with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely

  6. Distributed computing at the SSCL

    International Nuclear Information System (INIS)

    Cormell, L.; White, R.

    1993-05-01

    The rapid increase in the availability of high performance, cost-effective RISC/UNIX workstations has been both a blessing and a curse. The blessing of having extremely powerful computing engines available on the desk top is well-known to many users. The user has tremendous freedom, flexibility, and control of his environment. That freedom can, however, become the curse of distributed computing. The user must become a system manager to some extent; he must worry about backups, maintenance, upgrades, etc. Traditionally these activities have been the responsibility of a central computing group. The central computing group, however, may find that it can no longer provide all of the traditional services. With the plethora of workstations now found on so many desktops throughout the entire campus or lab, the central computing group may be swamped by support requests. This talk will address several of these computer support and management issues by discussing the approach taken at the Superconducting Super Collider Laboratory. In addition, a brief review of the future directions of commercial products for distributed computing and management will be given

  7. Distributed computing at the SSCL

    International Nuclear Information System (INIS)

    Cormell, L.R.; White, R.C.

    1994-01-01

    The rapid increase in the availability of high performance, cost-effective RISC/UNIX workstations has been both a blessing and a curse. The blessing of having extremely powerful computing engines available on the desk top is well-known to many users. The user has tremendous freedom, flexibility, and control of his environment. That freedom can, however, become the curse of distributed computing. The user must become a system manager to some extent; he must worry about backups, maintenance, upgrades, etc. Traditionally these activities have been the responsibility of the central computing group. The central computing group, however, may find that it can no longer provide all of the traditional services. With the plethora of workstations now found on so many desktops throughout the entire campus or lab, the central computing group may be swamped by support requests. This talk will address several of these computer support and management issues by discussing the approach taken at the Superconducting Super Collider Laboratory (SSCL). In addition, a brief review of the future directions of commercial products for distributed computing and management will be given

  8. HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing.

    Science.gov (United States)

    Wan, Shixiang; Zou, Quan

    2017-01-01

    Multiple sequence alignment (MSA) plays a key role in biological sequence analyses, especially in phylogenetic tree construction. The extreme increase in next-generation sequencing data has resulted in a shortage of efficient approaches for ultra-large biological sequence alignment that can cope with different sequence types. Distributed and parallel computing represents a crucial technique for accelerating ultra-large (e.g., files of more than 1 GB) sequence analyses. Based on HAlign and the Spark distributed computing system, we implement a highly cost-efficient and time-efficient HAlign-II tool to address ultra-large multiple biological sequence alignment and phylogenetic tree construction. Experiments on large-scale DNA and protein data sets, with files of more than 1 GB, showed that HAlign-II could save time and space and that it outperformed current software tools. HAlign-II can efficiently carry out MSA and construct phylogenetic trees with ultra-large numbers of biological sequences. HAlign-II shows extremely high memory efficiency and scales well with increases in computing resources. HAlign-II provides a user-friendly web server based on our distributed computing infrastructure. HAlign-II with open-source codes and datasets was established at http://lab.malab.cn/soft/halign.

  9. Insect olfactory coding and memory at multiple timescales.

    Science.gov (United States)

    Gupta, Nitin; Stopfer, Mark

    2011-10-01

    Insects can learn, allowing them great flexibility for locating seasonal food sources and avoiding wily predators. Because insects are relatively simple and accessible to manipulation, they provide good experimental preparations for exploring mechanisms underlying sensory coding and memory. Here we review how the intertwining of memory with computation enables the coding, decoding, and storage of sensory experience at various stages of the insect olfactory system. Individual parts of this system are capable of multiplexing memories at different timescales, and conversely, memory on a given timescale can be distributed across different parts of the circuit. Our sampling of the olfactory system emphasizes the diversity of memories, and the importance of understanding these memories in the context of computations performed by different parts of a sensory system. Published by Elsevier Ltd.

  10. The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

    International Nuclear Information System (INIS)

    McGhee, J.M.; Roberts, R.M.; Morel, J.E.

    1997-01-01

    A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated
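
    The core solver named above is a preconditioned conjugate gradient. The sketch below shows a generic Jacobi-preconditioned CG on a small SPD system purely as a self-contained illustration; DANTE's actual preconditioner is the diffusion-based one described in the abstract.

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, max_iter=500):
    """Conjugate gradient with a diagonal (Jacobi) preconditioner M^{-1}."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r                 # apply the preconditioner
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        step = rz / (p @ Ap)
        x += step * p
        r -= step * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Small SPD test system: 1-D Laplacian
n = 200
A = (np.diag(np.full(n, 2.0))
     + np.diag(np.full(n - 1, -1.0), 1)
     + np.diag(np.full(n - 1, -1.0), -1))
b = np.ones(n)
x = pcg(A, b, M_inv_diag=1.0 / np.diag(A))
print("residual:", np.linalg.norm(b - A @ x))
```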

  11. Projection multiplex recording of computer-synthesised one-dimensional Fourier holograms for holographic memory systems: mathematical and experimental modelling

    Energy Technology Data Exchange (ETDEWEB)

    Betin, A Yu; Bobrinev, V I; Verenikina, N M; Donchenko, S S; Odinokov, S B [Research Institute ' Radiotronics and Laser Engineering' , Bauman Moscow State Technical University, Moscow (Russian Federation); Evtikhiev, N N; Zlokazov, E Yu; Starikov, S N; Starikov, R S [National Reseach Nuclear University MEPhI (Moscow Engineering Physics Institute), Moscow (Russian Federation)

    2015-08-31

    A multiplex method of recording computer-synthesised one-dimensional Fourier holograms intended for holographic memory devices is proposed. The method potentially allows increasing the recording density in the previously proposed holographic memory system based on the computer synthesis and projection recording of data page holograms. (holographic memory)

  12. 10th International Symposium on Intelligent Distributed Computing

    CERN Document Server

    Seghrouchni, Amal; Beynier, Aurélie; Camacho, David; Herpson, Cédric; Hindriks, Koen; Novais, Paulo

    2017-01-01

    This book presents the combined peer-reviewed proceedings of the tenth International Symposium on Intelligent Distributed Computing (IDC’2016), which was held in Paris, France from October 10th to 12th, 2016. The 23 contributions address a range of topics related to theory and application of intelligent distributed computing, including: Intelligent Distributed Agent-Based Systems, Ambient Intelligence and Social Networks, Computational Sustainability, Intelligent Distributed Knowledge Representation and Processing, Smart Networks, Networked Intelligence and Intelligent Distributed Applications, amongst others.

  13. DIRAC distributed computing services

    International Nuclear Information System (INIS)

    Tsaregorodtsev, A

    2014-01-01

    DIRAC Project provides a general-purpose framework for building distributed computing systems. It is used now in several HEP and astrophysics experiments as well as for user communities in other scientific domains. There is a large interest from smaller user communities to have a simple tool like DIRAC for accessing grid and other types of distributed computing resources. However, small experiments cannot afford to install and maintain dedicated services. Therefore, several grid infrastructure projects are providing DIRAC services for their respective user communities. These services are used for user tutorials as well as to help port applications to the grid for practical day-to-day work. The services typically give access to several grid infrastructures as well as to standalone computing clusters accessible by the target user communities. In the paper we will present the experience of running DIRAC services provided by the France-Grilles NGI and other national grid infrastructure projects.

  14. High speed vision processor with reconfigurable processing element array based on full-custom distributed memory

    Science.gov (United States)

    Chen, Zhe; Yang, Jie; Shi, Cong; Qin, Qi; Liu, Liyuan; Wu, Nanjian

    2016-04-01

    In this paper, a hybrid vision processor based on a compact full-custom distributed memory for near-sensor high-speed image processing is proposed. The proposed processor consists of a reconfigurable processing element (PE) array, a row processor (RP) array, and a dual-core microprocessor. The PE array includes two-dimensional processing elements with a compact full-custom distributed memory. It supports real-time reconfiguration between the PE array and the self-organized map (SOM) neural network. The vision processor is fabricated using a 0.18 µm CMOS technology. The circuit area of the distributed memory is reduced markedly, to 1/3 of that of the conventional memory, so that the circuit area of the vision processor is reduced by 44.2%. Experimental results demonstrate that the proposed design achieves correct functions.

  15. A learnable parallel processing architecture towards unity of memory and computing.

    Science.gov (United States)

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.

  16. A learnable parallel processing architecture towards unity of memory and computing

    Science.gov (United States)

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-08-01

    Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.

  17. Computer Graphics Simulations of Sampling Distributions.

    Science.gov (United States)

    Gordon, Florence S.; Gordon, Sheldon P.

    1989-01-01

    Describes the use of computer graphics simulations to enhance student understanding of sampling distributions that arise in introductory statistics. Highlights include the distribution of sample proportions, the distribution of the difference of sample means, the distribution of the difference of sample proportions, and the distribution of sample…
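
    The kind of simulation such a tool animates can be reproduced in a few lines; the sketch below tabulates (rather than plots) the sampling distribution of the sample proportion for increasing sample sizes, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
p_true, n_samples = 0.3, 10_000   # population proportion, number of simulated samples

for n in (5, 30, 200):
    # Each entry is the proportion of successes in one sample of size n.
    props = rng.binomial(n, p_true, size=n_samples) / n
    print(f"n={n:3d}: mean={props.mean():.3f} (expect {p_true}), "
          f"sd={props.std():.3f} (expect {np.sqrt(p_true * (1 - p_true) / n):.3f})")
```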

  18. Computer Game Play Reduces Intrusive Memories of Experimental Trauma via Reconsolidation-Update Mechanisms.

    Science.gov (United States)

    James, Ella L; Bonsall, Michael B; Hoppitt, Laura; Tunbridge, Elizabeth M; Geddes, John R; Milton, Amy L; Holmes, Emily A

    2015-08-01

    Memory of a traumatic event becomes consolidated within hours. Intrusive memories can then flash back repeatedly into the mind's eye and cause distress. We investigated whether reconsolidation-the process during which memories become malleable when recalled-can be blocked using a cognitive task and whether such an approach can reduce these unbidden intrusions. We predicted that reconsolidation of a reactivated visual memory of experimental trauma could be disrupted by engaging in a visuospatial task that would compete for visual working memory resources. We showed that intrusive memories were virtually abolished by playing the computer game Tetris following a memory-reactivation task 24 hr after initial exposure to experimental trauma. Furthermore, both memory reactivation and playing Tetris were required to reduce subsequent intrusions (Experiment 2), consistent with reconsolidation-update mechanisms. A simple, noninvasive cognitive-task procedure administered after emotional memory has already consolidated (i.e., > 24 hours after exposure to experimental trauma) may prevent the recurrence of intrusive memories of those emotional events. © The Author(s) 2015.

  19. A highly efficient parallel algorithm for solving the neutron diffusion nodal equations on shared-memory computers

    International Nuclear Information System (INIS)

    Azmy, Y.Y.; Kirk, B.L.

    1990-01-01

    Modern parallel computer architectures offer an enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. Even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, however, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model for the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations
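
    A toy model with the same shape as the one described, efficiency as a function of problem size N and processor count P, can be written down directly. The coefficients below are hypothetical, not the paper's fitted values; the point is only that efficiency rises with problem size and falls as synchronization cost grows with processor count.

```python
def efficiency(N, P, t_flop=1e-8, t_sync=1e-4, work_per_node=50):
    """Illustrative model: parallel time = serial work / P + P-dependent sync cost."""
    t_serial = work_per_node * N * t_flop
    t_parallel = t_serial / P + t_sync * P
    return t_serial / (P * t_parallel)

for N in (10_000, 1_000_000):
    for P in (2, 4, 8):
        print(f"N={N:>9}, P={P}: efficiency ~ {efficiency(N, P):.1%}")
```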

  20. Power profiling of Cholesky and QR factorizations on distributed memory systems

    KAUST Repository

    Bosilca, George

    2012-08-30

    This paper presents the power profile of two high performance dense linear algebra libraries on distributed memory systems, ScaLAPACK and DPLASMA. From the algorithmic perspective, their methodologies are opposite. The former is based on block algorithms and relies on multithreaded BLAS and a two-dimensional block cyclic data distribution to achieve high parallel performance. The latter is based on tile algorithms running on top of a tile data layout and uses fine-grained task parallelism combined with a dynamic distributed scheduler (DAGuE) to leverage distributed memory systems. We present performance results (Gflop/s) as well as the power profile (Watts) of two common dense factorizations needed to solve linear systems of equations, namely Cholesky and QR. The reported numbers show that DPLASMA surpasses ScaLAPACK not only in terms of performance (up to 2X speedup) but also in terms of energy efficiency (up to 62 %). © 2012 Springer-Verlag (outside the USA).

  1. Distributed computing for macromolecular crystallography.

    Science.gov (United States)

    Krissinel, Evgeny; Uski, Ville; Lebedev, Andrey; Winn, Martyn; Ballard, Charles

    2018-02-01

    Modern crystallographic computing is characterized by the growing role of automated structure-solution pipelines, which represent complex expert systems utilizing a number of program components, decision makers and databases. They also require considerable computational resources and regular database maintenance, which is increasingly more difficult to provide at the level of individual desktop-based CCP4 setups. On the other hand, there is a significant growth in data processed in the field, which brings up the issue of centralized facilities for keeping both the data collected and structure-solution projects. The paradigm of distributed computing and data management offers a convenient approach to tackling these problems, which has become more attractive in recent years owing to the popularity of mobile devices such as tablets and ultra-portable laptops. In this article, an overview is given of developments by CCP4 aimed at bringing distributed crystallographic computations to a wide crystallographic community.

  2. Memory allocation and computations for Laplace’s equation of 3-D arbitrary boundary problems

    Directory of Open Access Journals (Sweden)

    Tsay Tswn-Syau

    2017-01-01

    Full Text Available Computation iteration schemes and a memory allocation technique for the finite difference method are presented in this paper. The transformed form of a groundwater flow problem in generalized curvilinear coordinates is taken as the illustrative example, and a 3-dimensional second order accurate 19-point scheme is presented. Traditional element-by-element methods (e.g., SOR) are preferred since they are simple and memory efficient but time consuming in computation. For efficient memory allocation, an index method is presented to store the sparse non-symmetric matrix of the problem. For computations, conjugate-gradient-like methods are reported to be computationally efficient; among them, using incomplete Cholesky decomposition as a preconditioner is reported to be a good method for iteration convergence. In general, the index method developed in this paper has the following advantages: (1) adaptable to various governing and boundary conditions, (2) flexible for higher order approximation, (3) independent of problem dimension, (4) efficient for complex problems when the global matrix is not symmetric, (5) convenient for general sparse matrices, (6) computationally efficient in the most time consuming procedure of matrix multiplication, and (7) applicable to any developed matrix solver.
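
    The sketch below illustrates one familiar instance of such an index method, compressed sparse row storage, together with the matrix-vector product that dominates conjugate-gradient-like iterations. The paper's own scheme differs in detail; this only shows the idea of storing indices alongside values.

```python
import numpy as np

# CSR layout for the 3x3 matrix [[4,-1,0],[-1,4,-1],[0,-1,4]]:
values  = np.array([4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0])
col_idx = np.array([0,    1,    0,   1,    2,    1,   2])
row_ptr = np.array([0, 2, 5, 7])   # row i occupies values[row_ptr[i]:row_ptr[i+1]]

def spmv(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x using the index arrays."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        y[i] = values[lo:hi] @ x[col_idx[lo:hi]]
    return y

x = np.array([1.0, 2.0, 3.0])
print(spmv(values, col_idx, row_ptr, x))   # matches the dense product: [2. 4. 10.]
```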

  3. Distributed simulation of large computer systems

    International Nuclear Information System (INIS)

    Marzolla, M.

    2001-01-01

    Sequential simulation of large complex physical systems is often regarded as a computationally expensive task. In order to speed up complex discrete-event simulations, the paradigm of Parallel and Distributed Discrete Event Simulation (PDES) has been introduced since the late 70s. The authors analyze the applicability of PDES to the modeling and analysis of large computer systems; such systems are increasingly common in the area of High Energy and Nuclear Physics, because many modern experiments make use of large 'compute farms'. Some feasibility tests have been performed on a prototype distributed simulator

  4. Bayesian optimization for computationally extensive probability distributions.

    Science.gov (United States)

    Tamura, Ryo; Hukushima, Koji

    2018-01-01

    An efficient method for finding a better maximizer of computationally extensive probability distributions is proposed on the basis of a Bayesian optimization technique. A key idea of the proposed method is to use extreme values of acquisition functions by Gaussian processes for the next training phase, which should be located near a local maximum or a global maximum of the probability distribution. Our Bayesian optimization technique is applied to the posterior distribution in the effective physical model estimation, which is a computationally extensive probability distribution. Even when the number of sampling points on the posterior distributions is fixed to be small, the Bayesian optimization provides a better maximizer of the posterior distributions in comparison to those by the random search method, the steepest descent method, or the Monte Carlo method. Furthermore, the Bayesian optimization improves the results efficiently by combining the steepest descent method and thus it is a powerful tool to search for a better maximizer of computationally extensive probability distributions.
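
    A minimal sketch of the loop described above: a Gaussian-process surrogate (RBF kernel) is fit to the points evaluated so far, and the maximizer of an expected-improvement acquisition over a grid picks the next evaluation. The target function stands in for an expensive posterior density; the kernel, jitter, grid, and budget are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def target(x):                      # pretend each evaluation is expensive
    return np.exp(-0.5 * (x - 2.0) ** 2) + 0.5 * np.exp(-2.0 * (x + 1.0) ** 2)

def rbf(a, b, ell=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

rng = np.random.default_rng(4)
X = rng.uniform(-4, 4, size=3)      # small initial design
y = target(X)
grid = np.linspace(-4, 4, 400)

for _ in range(10):
    K = rbf(X, X) + 1e-6 * np.eye(len(X))
    Ks = rbf(grid, X)
    mu = Ks @ np.linalg.solve(K, y)                       # GP posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    sd = np.sqrt(np.maximum(var, 1e-12))
    imp = mu - y.max()
    z = imp / sd
    ei = imp * norm.cdf(z) + sd * norm.pdf(z)             # expected improvement
    x_next = grid[np.argmax(ei)]                          # acquisition maximizer
    X = np.append(X, x_next)
    y = np.append(y, target(x_next))

print(f"best maximizer found: x = {X[np.argmax(y)]:.3f}, f = {y.max():.4f}")
```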

  5. A measurement-based performability model for a multiprocessor system

    Science.gov (United States)

    Hsueh, M. C.; Iyer, Ravi K.; Trivedi, K. S.

    1987-01-01

    A measurement-based performability model based on real error data collected on a multiprocessor system is described. Model development from the raw error data to the estimation of cumulative reward is described. Both normal and failure behavior of the system are characterized. The measured data show that the holding times in key operational and failure states are not simple exponential and that a semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different failure types and recovery procedures.

  6. An efficient communication scheme for solving Sn equations on message-passing multiprocessors

    International Nuclear Information System (INIS)

    Azmy, Y.Y.

    1993-01-01

    Early models of Intel's hypercube multiprocessors, e.g., the iPSC/1 and iPSC/2, were characterized by the high latency of message passing. This relatively weak dependence of the communication penalty on the size of messages, in contrast to its strong dependence on the number of messages, justified using the Fan-in Fan-out algorithm (which implements a minimum spanning tree path) to perform global operations such as global sums. Recent models of message-passing computers, such as the iPSC/860 and the Paragon, have been found to possess much smaller latency, thus forcing a reexamination of the issue of performance optimization with respect to communication schemes. Essentially, the Fan-in Fan-out scheme minimizes the number of nonsimultaneous messages sent but not the volume of data traffic across the network. Furthermore, if a global operation is performed in conjunction with the message passing, a large fraction of the attached nodes remains idle as the number of utilized processors is halved in each step of the process. On the other hand, the Recursive Halving scheme offers the smallest communication cost for global operations but has some drawbacks
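
    The trade-off can be made concrete with textbook cost models (alpha = per-message latency, beta = per-byte transfer time). The formulas below are generic models of the two schemes, not measurements from the iPSC machines: the tree ships the full vector up and down a spanning tree, while recursive halving moves less total data, an advantage that matters more as latency shrinks.

```python
import math

def fan_in_fan_out(P, n_bytes, alpha, beta):
    """Spanning-tree reduce + broadcast: full vector at every level."""
    return 2 * math.log2(P) * (alpha + beta * n_bytes)

def recursive_halving(P, n_bytes, alpha, beta):
    """Reduce-scatter + allgather: same start-ups, far less data volume."""
    return 2 * math.log2(P) * alpha + 2 * ((P - 1) / P) * beta * n_bytes

for alpha in (500e-6, 20e-6):            # high-latency vs low-latency machine
    t_tree = fan_in_fan_out(64, 1 << 20, alpha, 1e-9)
    t_half = recursive_halving(64, 1 << 20, alpha, 1e-9)
    print(f"alpha={alpha * 1e6:5.0f} us: tree {t_tree * 1e3:.2f} ms, "
          f"halving {t_half * 1e3:.2f} ms")
```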

  7. Distributed computing environments for future space control systems

    Science.gov (United States)

    Viallefont, Pierre

    1993-01-01

    The aim of this paper is to present the results of a CNES research project on distributed computing systems. The purpose of this research was to study the impact of the use of new computer technologies in the design and development of future space applications. The first part of this study was a state-of-the-art review of distributed computing systems. One of the interesting ideas arising from this review is the concept of a 'virtual computer' allowing the distributed hardware architecture to be hidden from a software application. The 'virtual computer' can improve system performance by adapting the best architecture (addition of computers) to the software application without having to modify its source code. This concept can also decrease the cost and obsolescence of the hardware architecture. In order to verify the feasibility of the 'virtual computer' concept, a prototype representative of a distributed space application is being developed independently of the hardware architecture.

  8. A Distributed Computational Infrastructure for Science and Education

    Directory of Open Access Journals (Sweden)

    Rustam K. Bazarov

    2014-06-01

    Full Text Available Researchers have lately been paying increasing attention to parallel and distributed algorithms for solving high-dimensionality problems. In this regard, the issue of acquiring or renting computational resources becomes a topical one for employees of scientific and educational institutions. This article examines technology and methods for organizing a distributed computational infrastructure. The author describes the experience of creating a high-performance system powered by existing clusterization and grid computing technology. The approach examined in the article helps minimize financial costs, aggregate territorially distributed computational resources and ensure a more rational use of available computer equipment, eliminating its downtimes.

  9. Computer-Presented Organizational/Memory Aids as Instruction for Solving Pico-Fomi Problems.

    Science.gov (United States)

    Steinberg, Esther R.; And Others

    1985-01-01

    Describes investigation of effectiveness of computer-presented organizational/memory aids (matrix and verbal charts controlled by computer or learner) as instructional technique for solving Pico-Fomi problems, and the acquisition of deductive inference rules when such aids are present. Results indicate chart use control should be adapted to…

  10. Organization of the secure distributed computing based on multi-agent system

    Science.gov (United States)

    Khovanskov, Sergey; Rumyantsev, Konstantin; Khovanskova, Vera

    2018-04-01

    Nowadays the development of methods for distributed computing receives much attention, and one such method is the use of multi-agent systems. Distributed computing organized over conventional networked computers can be exposed to security threats posed by the computational processes themselves. The authors have developed a unified agent algorithm for a control system governing the operation of computing network nodes, with networked PCs used as computing nodes. The proposed multi-agent control system makes it possible to quickly harness the processing power of the computers of any existing network to solve large tasks by creating a distributed computing system. Agents running on a computer network can: configure the distributed computing system; distribute the computational load among the computers operated by agents; and optimize the distributed computing system according to the computing power of the computers on the network. The number of computing nodes can be increased by connecting additional computers to the system, which leads to an increase in overall processing power. Adding a central agent to the multi-agent system increases the security of the distributed computing. This organization of the distributed computing system reduces problem-solving time and increases the fault tolerance (vitality) of computing processes in a changing computing environment (dynamic change of the number of computers on the network). The developed multi-agent system also detects falsification of the results produced by the distributed system, which could otherwise lead to wrong decisions; in addition, the system checks and corrects wrong results.

  11. System and method for programmable bank selection for banked memory subsystems

    Energy Technology Data Exchange (ETDEWEB)

    Blumrich, Matthias A. (Ridgefield, CT); Chen, Dong (Croton on Hudson, NY); Gara, Alan G. (Mount Kisco, NY); Giampapa, Mark E. (Irvington, NY); Hoenicke, Dirk (Seebruck-Seeon, DE); Ohmacht, Martin (Yorktown Heights, NY); Salapura, Valentina (Chappaqua, NY); Sugavanam, Krishnan (Mahopac, NY)

    2010-09-07

    A programmable memory system and method for enabling one or more processor devices access to shared memory in a computing environment, the shared memory including one or more memory storage structures having addressable locations for storing data. The system comprises: one or more first logic devices associated with a respective one or more processor devices, each first logic device for receiving physical memory address signals and programmable for generating a respective memory storage structure select signal upon receipt of pre-determined address bit values at selected physical memory address bit locations; and, a second logic device responsive to each respective select signal for generating an address signal used for selecting a memory storage structure for processor access. The system thus enables each processor device of a computing environment to access memory storage distributed across the one or more memory storage structures.
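
    A minimal sketch of the programmable-selection idea: a configurable list of physical-address bit positions is extracted and packed into a bank index, so the same logic can interleave across banks on different strides. The bit positions below are illustrative, not the patent's.

```python
def make_bank_selector(bit_positions):
    """Return a function mapping an address to a bank index built from the
    configured address bits (bit_positions[0] becomes the low bank bit)."""
    def select(addr):
        bank = 0
        for out_bit, addr_bit in enumerate(bit_positions):
            bank |= ((addr >> addr_bit) & 1) << out_bit
        return bank
    return select

# e.g., 4 banks selected by address bits 7 and 8 (128-byte interleave)
select = make_bank_selector([7, 8])
for addr in (0x000, 0x080, 0x100, 0x180, 0x200):
    print(f"addr {addr:#06x} -> bank {select(addr)}")
```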

  12. Towards Modeling False Memory With Computational Knowledge Bases.

    Science.gov (United States)

    Li, Justin; Kohanyi, Emma

    2017-01-01

    One challenge to creating realistic cognitive models of memory is the inability to account for the vast common-sense knowledge of human participants. Large computational knowledge bases such as WordNet and DBpedia may offer a solution to this problem but may pose other challenges. This paper explores some of these difficulties through a semantic network spreading activation model of the Deese-Roediger-McDermott false memory task. In three experiments, we show that these knowledge bases only capture a subset of human associations, while irrelevant information introduces noise and makes efficient modeling difficult. We conclude that the contents of these knowledge bases must be augmented and, more important, that the algorithms must be refined and optimized, before large knowledge bases can be widely used for cognitive modeling. Copyright © 2016 Cognitive Science Society, Inc.
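
    The DRM setup the paper models can be caricatured in a few lines of spreading activation over a hand-built miniature network. The associations and weights below are made up for illustration; a real knowledge base such as WordNet would supply far noisier links, which is exactly the difficulty the paper reports.

```python
# Tiny hand-built semantic network: word -> {associate: weight}
network = {
    "bed":   {"sleep": 0.8, "rest": 0.3},
    "rest":  {"sleep": 0.6, "bed": 0.3},
    "awake": {"sleep": 0.7, "tired": 0.4},
    "tired": {"sleep": 0.6, "awake": 0.4},
    "sleep": {"bed": 0.4, "rest": 0.4},
}
studied = ["bed", "rest", "awake", "tired"]   # the lure "sleep" is never presented

activation = {w: 0.0 for w in network}
for word in studied:
    activation[word] += 1.0
for _ in range(3):                            # a few rounds of spreading with decay
    spread = {w: 0.0 for w in network}
    for src, links in network.items():
        for dst, weight in links.items():
            spread[dst] += 0.5 * weight * activation[src]
    activation = {w: activation[w] * 0.6 + spread[w] for w in network}

for w, a in sorted(activation.items(), key=lambda kv: -kv[1]):
    print(f"{w:6s} {a:.3f}")                  # the unstudied lure ranks highest
```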

  13. Immigration, language proficiency, and autobiographical memories: Lifespan distribution and second-language access.

    Science.gov (United States)

    Esposito, Alena G; Baker-Ward, Lynne

    2016-08-01

    This investigation examined two controversies in the autobiographical literature: how cross-language immigration affects the distribution of autobiographical memories across the lifespan and under what circumstances language-dependent recall is observed. Both Spanish/English bilingual immigrants and English monolingual non-immigrants participated in a cue word study, with the bilingual sample taking part in a within-subject language manipulation. The expected bump in the number of memories from early life was observed for non-immigrants but not immigrants, who reported more memories for events surrounding immigration. Aspects of the methodology addressed possible reasons for past discrepant findings. Language-dependent recall was influenced by second-language proficiency. Results were interpreted as evidence that bilinguals with high second-language proficiency, in contrast to those with lower second-language proficiency, access a single conceptual store through either language. The final multi-level model predicting language-dependent recall, including second-language proficiency, age of immigration, internal language, and cue word language, explained three-quarters of the between-person variance and one-fifth of the within-person variance. We arrive at two conclusions. First, major life transitions influence the distribution of memories. Second, concept representation across multiple languages follows a developmental model. In addition, the results underscore the importance of considering language experience in research involving memory reports.

  14. A distributed computing model for telemetry data processing

    Science.gov (United States)

    Barry, Matthew R.; Scott, Kevin L.; Weismuller, Steven P.

    1994-05-01

    We present a new approach to distributing processed telemetry data among spacecraft flight controllers within the control centers at NASA's Johnson Space Center. This approach facilitates the development of application programs which integrate spacecraft-telemetered data and ground-based synthesized data, then distributes this information to flight controllers for analysis and decision-making. The new approach combines various distributed computing models into one hybrid distributed computing model. The model employs both client-server and peer-to-peer distributed computing models cooperating to provide users with information throughout a diverse operations environment. Specifically, it provides an attractive foundation upon which we are building critical real-time monitoring and control applications, while simultaneously lending itself to peripheral applications in playback operations, mission preparations, flight controller training, and program development and verification. We have realized the hybrid distributed computing model through an information sharing protocol. We shall describe the motivations that inspired us to create this protocol, along with a brief conceptual description of the distributed computing models it employs. We describe the protocol design in more detail, discussing many of the program design considerations and techniques we have adopted. Finally, we describe how this model is especially suitable for supporting the implementation of distributed expert system applications.

  15. A distributed computing model for telemetry data processing

    Science.gov (United States)

    Barry, Matthew R.; Scott, Kevin L.; Weismuller, Steven P.

    1994-01-01

    We present a new approach to distributing processed telemetry data among spacecraft flight controllers within the control centers at NASA's Johnson Space Center. This approach facilitates the development of application programs which integrate spacecraft-telemetered data and ground-based synthesized data, then distributes this information to flight controllers for analysis and decision-making. The new approach combines various distributed computing models into one hybrid distributed computing model. The model employs both client-server and peer-to-peer distributed computing models cooperating to provide users with information throughout a diverse operations environment. Specifically, it provides an attractive foundation upon which we are building critical real-time monitoring and control applications, while simultaneously lending itself to peripheral applications in playback operations, mission preparations, flight controller training, and program development and verification. We have realized the hybrid distributed computing model through an information sharing protocol. We shall describe the motivations that inspired us to create this protocol, along with a brief conceptual description of the distributed computing models it employs. We describe the protocol design in more detail, discussing many of the program design considerations and techniques we have adopted. Finally, we describe how this model is especially suitable for supporting the implementation of distributed expert system applications.

  16. The parallel processing of EGS4 code on distributed memory scalar parallel computer:Intel Paragon XP/S15-256

    Energy Technology Data Exchange (ETDEWEB)

    Takemiya, Hiroshi; Ohta, Hirofumi; Honma, Ichirou

    1996-03-01

    The parallelization of the Electro-Magnetic Cascade Monte Carlo Simulation Code EGS4 on the distributed memory scalar parallel computer Intel Paragon XP/S15-256 is described. EGS4 has the feature that the calculation time varies greatly from one incident particle to another because of the dynamic generation of secondary particles and the different behavior of each particle. Granularity for parallel processing, the parallel programming model and the algorithm of parallel random number generation are discussed, and two kinds of method, which allocate particles either dynamically or statically, are used for the purpose of realizing high speed parallel processing of this code. Among four problems chosen for performance evaluation, the speedup factors for three problems reached nearly 100 with 128 processors. It has been found that when both the calculation time for each incident particle and its dispersion are large, it is preferable to use the dynamic particle allocation method, which can average the load on each processor; and when they are small, it is preferable to use the static particle allocation method, which reduces the communication overhead. Moreover, it is pointed out that to obtain accurate results, it is necessary to use double precision variables in the EGS4 code. Finally, the workflow of program parallelization is analyzed, and tools for program parallelization are discussed in light of the experience gained from the EGS4 parallelization. (author).
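
    The dispersion-driven trade-off reported above can be reproduced with a toy scheduling simulation; the per-task time distribution, worker count, and the per-task overhead charged to the dynamic scheme are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def static_makespan(times, workers):
    """Pre-split the particles evenly by count (no communication overhead)."""
    return max(chunk.sum() for chunk in np.array_split(times, workers))

def dynamic_makespan(times, workers, overhead):
    """Hand each task to the least-loaded worker, paying a per-task overhead."""
    loads = np.zeros(workers)
    for t in times:
        loads[loads.argmin()] += t + overhead
    return loads.max()

for disp, label in ((0.05, "small dispersion"), (2.0, "large dispersion")):
    times = rng.lognormal(mean=0.0, sigma=disp, size=4000) * 1e-3
    s = static_makespan(times, 128)
    d = dynamic_makespan(times, 128, overhead=0.0005)
    print(f"{label}: static {s:.3f}s vs dynamic {d:.3f}s")
```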

  17. Context-aware distributed cloud computing using CloudScheduler

    Science.gov (United States)

    Seuster, R.; Leavett-Brown, CR; Casteels, K.; Driemel, C.; Paterson, M.; Ring, D.; Sobie, RJ; Taylor, RP; Weldon, J.

    2017-10-01

    The distributed cloud using the CloudScheduler VM provisioning service is one of the longest running systems for HEP workloads. It has run millions of jobs for ATLAS and Belle II over the past few years using private and commercial clouds around the world. Our goal is to scale the distributed cloud to the 10,000-core level, with the ability to run any type of application (low I/O, high I/O and high memory) on any cloud. To achieve this goal, we have been implementing changes that utilize context-aware computing designs that are currently employed in the mobile communication industry. Context-awareness makes use of real-time and archived data to respond to user or system requirements. In our distributed cloud, we have many opportunistic clouds with no local HEP services, software or storage repositories. A context-aware design significantly improves the reliability and performance of our system by locating the nearest location of the required services. We describe how we are collecting and managing contextual information from our workload management systems, the clouds, the virtual machines and our services. This information is used not only to monitor the system but also to carry out automated corrective actions. We are incrementally adding new alerting and response services to our distributed cloud. This will enable us to scale the number of clouds and virtual machines. Further, a context-aware design will enable us to run analysis or high I/O application on opportunistic clouds. We envisage an open-source HTTP data federation (for example, the DynaFed system at CERN) as a service that would provide us access to existing storage elements used by the HEP experiments.

  18. Optical computing, optical memory, and SBIRs at Foster-Miller

    Science.gov (United States)

    Domash, Lawrence H.

    1994-03-01

    A desktop design and manufacturing system for binary diffractive elements, MacBEEP, was developed with the optical researcher in mind. Optical processing systems for specialized tasks such as cellular automation computation and fractal measurement were constructed. A new family of switchable holograms has enabled several applications for control of laser beams in optical memories. New spatial light modulators and optical logic elements have been demonstrated based on a more manufacturable semiconductor technology. Novel synthetic and polymeric nonlinear materials for optical storage are under development in an integrated memory architecture. SBIR programs enable creative contributions from smaller companies, both product oriented and technology oriented, and support advances that might not otherwise be developed.

  19. A homotopy method for solving Riccati equations on a shared memory parallel computer

    International Nuclear Information System (INIS)

    Zigic, D.; Watson, L.T.; Collins, E.G. Jr.; Davis, L.D.

    1993-01-01

    Although there are numerous algorithms for solving Riccati equations, there still remains a need for algorithms which can operate efficiently on large problems and on parallel machines. This paper gives a new homotopy-based algorithm for solving Riccati equations on a shared memory parallel computer. The central part of the algorithm is the computation of the kernel of the Jacobian matrix, which is essential for the corrector iterations along the homotopy zero curve. Using a Schur decomposition the tensor product structure of various matrices can be efficiently exploited. The algorithm allows for efficient parallelization on shared memory machines

  20. LHCb Distributed Data Analysis on the Computing Grid

    CERN Document Server

    Paterson, S; Parkes, C

    2006-01-01

    LHCb is one of the four Large Hadron Collider (LHC) experiments based at CERN, the European Organisation for Nuclear Research. The LHC experiments will start taking an unprecedented amount of data when they come online in 2007. Since no single institute has the compute resources to handle this data, resources must be pooled to form the Grid. Where the Internet has made it possible to share information stored on computers across the world, Grid computing aims to provide access to computing power and storage capacity on geographically distributed systems. LHCb software applications must work seamlessly on the Grid allowing users to efficiently access distributed compute resources. It is essential to the success of the LHCb experiment that physicists can access data from the detector, stored in many heterogeneous systems, to perform distributed data analysis. This thesis describes the work performed to enable distributed data analysis for the LHCb experiment on the LHC Computing Grid.

  1. An Overview of Cloud Computing in Distributed Systems

    Science.gov (United States)

    Divakarla, Usha; Kumari, Geetha

    2010-11-01

    Cloud computing is the emerging trend in the field of distributed computing. Cloud computing evolved from grid computing and distributed computing. The cloud plays an important role in large organizations for maintaining huge amounts of data with limited resources. The cloud also helps in resource sharing through specific virtual machines provided by the cloud service provider. This paper gives an overview of cloud organization and some of the basic security issues pertaining to the cloud.

  2. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Li, Xiaoye S.; Demmel, James W.

    2002-03-27

    In this paper, we present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with focus on scalability issues, and demonstrate the parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of data structures and communication pattern for sparse Gaussian elimination, which makes it more scalable on distributed memory machines. Based on this a priori knowledge, we designed highly parallel and scalable algorithms for both LU decomposition and triangular solve and we show that they are suitable for large-scale distributed memory machines.
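
    The sketch below illustrates only the a-priori character of static pivoting: a row permutation is fixed before factorization so that large entries land on the diagonal, which in turn fixes the data structures and communication pattern of the parallel factorization. SuperLU_DIST itself uses a weighted bipartite matching with scaling; this greedy pass is a simplified stand-in, not the package's algorithm.

        import numpy as np

        def greedy_static_pivot(A):
            # Pick, for each diagonal position j, the unused row with the
            # largest magnitude in column j. The order is decided entirely
            # a priori, so no pivot changes occur during the factorization.
            n = A.shape[0]
            perm = np.full(n, -1)
            free = set(range(n))
            for j in range(n):
                best = max(free, key=lambda i: abs(A[i, j]))
                perm[j] = best
                free.remove(best)
            return A[perm, :], perm

        # Usage sketch: factor the permuted matrix with a no-pivoting LU,
        # then recover accuracy with iterative refinement if needed.
        A = np.array([[0.0, 2.0], [3.0, 1.0]])
        PA, perm = greedy_static_pivot(A)   # PA has 3 and 2 on the diagonal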

  3. Distributed metadata in a high performance computing environment

    Science.gov (United States)

    Bent, John M.; Faibish, Sorin; Zhang, Zhenhua; Liu, Xuezhao; Tang, Haiying

    2017-07-11

    A computer-executable method, system, and computer program product for managing metadata in a distributed storage system, wherein the distributed storage system includes one or more burst buffers enabled to operate with a distributed key-value store, the computer-executable method, system, and computer program product comprising receiving a request for metadata associated with a block of data stored in a first burst buffer of the one or more burst buffers in the distributed storage system, wherein the metadata is associated with a key-value, determining which of the one or more burst buffers stores the requested metadata, and upon determination that a first burst buffer of the one or more burst buffers stores the requested metadata, locating the key-value in a portion of the distributed key-value store accessible from the first burst buffer.

  4. Resource Allocation Model for Modelling Abstract RTOS on Multiprocessor System-on-Chip

    DEFF Research Database (Denmark)

    Virk, Kashif Munir; Madsen, Jan

    2003-01-01

    Resource allocation is an important problem in RTOSs and has been an active area of research. Numerous approaches have been developed and many different techniques have been combined for a wide range of applications. In this paper, we address the problem of resource allocation in the context of modelling an abstract RTOS on multiprocessor SoC platforms. We discuss the implementation details of a simplified basic priority inheritance protocol for our abstract system model in SystemC.
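
    A minimal sketch of the basic priority inheritance protocol that the paper models, written here in plain Python rather than SystemC; class and attribute names are illustrative.

        class Task:
            def __init__(self, name, priority):
                self.name = name
                self.base_priority = priority
                self.priority = priority          # may be boosted by inheritance

        class PIPMutex:
            # Basic priority inheritance: the lock holder temporarily inherits
            # the highest priority among the tasks blocked on the lock.
            def __init__(self):
                self.holder = None
                self.waiters = []

            def acquire(self, task):
                if self.holder is None:
                    self.holder = task
                    return True                   # lock granted
                self.waiters.append(task)
                top = max(w.priority for w in self.waiters)
                self.holder.priority = max(self.holder.priority, top)
                return False                      # task blocks

            def release(self):
                self.holder.priority = self.holder.base_priority
                if self.waiters:                  # hand off to highest waiter
                    self.waiters.sort(key=lambda w: w.priority, reverse=True)
                    self.holder = self.waiters.pop(0)
                else:
                    self.holder = None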

  5. Computational dissection of human episodic memory reveals mental process-specific genetic profiles.

    Science.gov (United States)

    Luksys, Gediminas; Fastenrath, Matthias; Coynel, David; Freytag, Virginie; Gschwind, Leo; Heck, Angela; Jessen, Frank; Maier, Wolfgang; Milnik, Annette; Riedel-Heller, Steffi G; Scherer, Martin; Spalek, Klara; Vogler, Christian; Wagner, Michael; Wolfsgruber, Steffen; Papassotiropoulos, Andreas; de Quervain, Dominique J-F

    2015-09-01

    Episodic memory performance is the result of distinct mental processes, such as learning, memory maintenance, and emotional modulation of memory strength. Such processes can be effectively dissociated using computational models. Here we performed gene set enrichment analyses of model parameters estimated from the episodic memory performance of 1,765 healthy young adults. We report robust and replicated associations of the amine compound SLC (solute-carrier) transporters gene set with the learning rate, of the collagen formation and transmembrane receptor protein tyrosine kinase activity gene sets with the modulation of memory strength by negative emotional arousal, and of the L1 cell adhesion molecule (L1CAM) interactions gene set with the repetition-based memory improvement. Furthermore, in a large functional MRI sample of 795 subjects we found that the association between L1CAM interactions and memory maintenance revealed large clusters of differences in brain activity in frontal cortical areas. Our findings provide converging evidence that distinct genetic profiles underlie specific mental processes of human episodic memory. They also provide empirical support to previous theoretical and neurobiological studies linking specific neuromodulators to the learning rate and linking neural cell adhesion molecules to memory maintenance. Furthermore, our study suggests additional memory-related genetic pathways, which may contribute to a better understanding of the neurobiology of human memory.

  6. Computational dissection of human episodic memory reveals mental process-specific genetic profiles

    Science.gov (United States)

    Luksys, Gediminas; Fastenrath, Matthias; Coynel, David; Freytag, Virginie; Gschwind, Leo; Heck, Angela; Jessen, Frank; Maier, Wolfgang; Milnik, Annette; Riedel-Heller, Steffi G.; Scherer, Martin; Spalek, Klara; Vogler, Christian; Wagner, Michael; Wolfsgruber, Steffen; Papassotiropoulos, Andreas; de Quervain, Dominique J.-F.

    2015-01-01

    Episodic memory performance is the result of distinct mental processes, such as learning, memory maintenance, and emotional modulation of memory strength. Such processes can be effectively dissociated using computational models. Here we performed gene set enrichment analyses of model parameters estimated from the episodic memory performance of 1,765 healthy young adults. We report robust and replicated associations of the amine compound SLC (solute-carrier) transporters gene set with the learning rate, of the collagen formation and transmembrane receptor protein tyrosine kinase activity gene sets with the modulation of memory strength by negative emotional arousal, and of the L1 cell adhesion molecule (L1CAM) interactions gene set with the repetition-based memory improvement. Furthermore, in a large functional MRI sample of 795 subjects we found that the association between L1CAM interactions and memory maintenance revealed large clusters of differences in brain activity in frontal cortical areas. Our findings provide converging evidence that distinct genetic profiles underlie specific mental processes of human episodic memory. They also provide empirical support to previous theoretical and neurobiological studies linking specific neuromodulators to the learning rate and linking neural cell adhesion molecules to memory maintenance. Furthermore, our study suggests additional memory-related genetic pathways, which may contribute to a better understanding of the neurobiology of human memory. PMID:26261317

  7. A Distributed Snapshot Protocol for Efficient Artificial Intelligence Computation in Cloud Computing Environments

    Directory of Open Access Journals (Sweden)

    JongBeom Lim

    2018-01-01

    Many artificial intelligence applications often require a huge amount of computing resources. As a result, cloud computing adoption rates are increasing in the artificial intelligence field. To support the demand for artificial intelligence applications and guarantee the service level agreement, cloud computing should provide not only computing resources but also fundamental mechanisms for efficient computing. In this regard, a snapshot protocol has been used to create a consistent snapshot of the global state in cloud computing environments. However, the existing snapshot protocols are not optimized in the context of artificial intelligence applications, where large-scale iterative computation is the norm. In this paper, we present a distributed snapshot protocol for efficient artificial intelligence computation in cloud computing environments. The proposed snapshot protocol is based on a distributed algorithm to run on multiple interconnected nodes in a scalable fashion. Our snapshot protocol is able to deal with artificial intelligence applications, in which a large number of computing nodes are running. We reveal that our distributed snapshot protocol guarantees the correctness, safety, and liveness conditions.
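
    For orientation, the sketch below shows the classical Chandy-Lamport marker protocol on which consistent global snapshots are based; the paper's protocol adds optimizations for large-scale iterative computation that are not reproduced here, and all names are illustrative.

        class SnapshotNode:
            # Channels are identified by ids; `network.send(channel, msg)` is
            # an assumed transport primitive.
            def __init__(self, node_id, in_channels, out_channels, network):
                self.node_id = node_id
                self.in_channels = in_channels
                self.out_channels = out_channels
                self.network = network
                self.local_state = None
                self.channel_state = {}      # channel -> recorded in-flight msgs
                self.done = set()            # channels whose marker has arrived

            def start_snapshot(self, current_state):
                # Record own state, then flood markers downstream.
                self.local_state = current_state
                self.channel_state = {c: [] for c in self.in_channels}
                for c in self.out_channels:
                    self.network.send(c, ("MARKER", self.node_id))

            def on_message(self, channel, msg, current_state):
                if msg[0] == "MARKER":
                    if self.local_state is None:   # first marker: snapshot now
                        self.start_snapshot(current_state)
                    self.done.add(channel)         # channel recording complete
                elif self.local_state is not None and channel not in self.done:
                    self.channel_state[channel].append(msg)  # in-flight message

            def complete(self):
                return (self.local_state is not None
                        and self.done == set(self.in_channels))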

  8. Proceedings of workshop on distributed computing and network

    International Nuclear Information System (INIS)

    Abe, F.; Yuasa, F.

    1993-02-01

    'Distributed Computing and Network' is one of the hot topics in the field of computing. Recent progress in computer technology is providing new paradigms for computing, even in High Energy Physics. In particular, workstation-based computer systems are opening a new, active field of computer application to the sciences. The major topics discussed in this symposium are distributed computing and wide area research networks for domestic and international links. The two-day symposium provided enough topics to foresee the next direction of our computing environment. Seventy people got together to discuss these interesting themes and to exchange information on computer technologies. (J.P.N.)

  9. Mnemonic transmission, social contagion, and emergence of collective memory: Influence of emotional valence, group structure, and information distribution.

    Science.gov (United States)

    Choi, Hae-Yoon; Kensinger, Elizabeth A; Rajaram, Suparna

    2017-09-01

    Social transmission of memory and its consequence on collective memory have generated enduring interdisciplinary interest because of their widespread significance in interpersonal, sociocultural, and political arenas. We tested the influence of 3 key factors (emotional salience of information, group structure, and information distribution) on mnemonic transmission, social contagion, and collective memory. Participants individually studied emotionally salient (negative or positive) and nonemotional (neutral) picture-word pairs that were completely shared, partially shared, or unshared within participant triads, and then completed 3 consecutive recalls in 1 of 3 conditions: individual-individual-individual (control), collaborative-collaborative (identical group; insular structure)-individual, and collaborative-collaborative (reconfigured group; diverse structure)-individual. Collaboration enhanced negative memories, especially in the insular group structure and especially for shared information, and promoted collective forgetting of positive memories. The diverse group structure reduced this negativity effect. Unequally distributed information led to social contagion that creates false memories; the diverse structure propagated a greater variety of false memories, whereas the insular structure promoted confidence in false recognition and false collective memory. A simultaneous assessment of network structure, information distribution, and emotional valence breaks new ground in specifying how network structure shapes the spread of negative memories and false memories, and the emergence of collective memory. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  10. High threshold distributed quantum computing with three-qubit nodes

    International Nuclear Information System (INIS)

    Li Ying; Benjamin, Simon C

    2012-01-01

    In the distributed quantum computing paradigm, well-controlled few-qubit ‘nodes’ are networked together by connections which are relatively noisy and failure prone. A practical scheme must offer high tolerance to errors while requiring only simple (i.e. few-qubit) nodes. Here we show that relatively modest, three-qubit nodes can support advanced purification techniques and so offer robust scalability: the infidelity in the entanglement channel may be permitted to approach 10% if the infidelity in local operations is of order 0.1%. Our tolerance of network noise is therefore an order of magnitude beyond prior schemes, and our architecture remains robust even in the presence of considerable decoherence rates (memory errors). We compare the performance with that of schemes involving nodes of lower and higher complexity. Ion traps, and NV-centres in diamond, are two highly relevant emerging technologies: they possess the requisite properties of good local control, rapid and reliable readout, and methods for entanglement-at-a-distance. (paper)

  11. Efficient computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

    Science.gov (United States)

    Janetzke, David C.; Murthy, Durbha V.

    1991-01-01

    Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic capability on a distributed memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a 3-D unsteady aerodynamic model and a panel discretization. Efficiencies up to 85 percent were demonstrated using 32 processors. The effects of subtask ordering, problem size, and network topology are presented. A comparison to results on a shared memory computer indicates that higher speedup is achieved on the distributed memory system.

  12. Parallel computation of aerodynamic influence coefficients for aeroelastic analysis on a transputer network

    Science.gov (United States)

    Janetzke, D. C.; Murthy, D. V.

    1991-01-01

    Aeroelastic analysis is multi-disciplinary and computationally expensive. Hence, it can greatly benefit from parallel processing. As part of an effort to develop an aeroelastic analysis capability on a distributed-memory transputer network, a parallel algorithm for the computation of aerodynamic influence coefficients is implemented on a network of 32 transputers. The aerodynamic influence coefficients are calculated using a three-dimensional unsteady aerodynamic model and a panel discretization. Efficiencies up to 85 percent are demonstrated using 32 processors. The effects of subtask ordering, problem size and network topology are presented. A comparison to results on a shared-memory computer indicates that higher speedup is achieved on the distributed-memory system.

  13. 9th International Symposium on Intelligent Distributed Computing

    CERN Document Server

    Camacho, David; Analide, Cesar; Seghrouchni, Amal; Badica, Costin

    2016-01-01

    This book represents the combined peer-reviewed proceedings of the ninth International Symposium on Intelligent Distributed Computing – IDC’2015, of the Workshop on Cyber Security and Resilience of Large-Scale Systems – WSRL’2015, and of the International Workshop on Future Internet and Smart Networks – FI&SN’2015. All the events were held in Guimarães, Portugal during October 7th-9th, 2015. The 46 contributions published in this book address many topics related to theory and applications of intelligent distributed computing, including: Intelligent Distributed Agent-Based Systems, Ambient Intelligence and Social Networks, Computational Sustainability, Intelligent Distributed Knowledge Representation and Processing, Smart Networks, Networked Intelligence and Intelligent Distributed Applications, amongst others.

  14. Study and obtention of exact, and approximation, algorithms and heuristics for a mesh partitioning problem under memory constraints

    International Nuclear Information System (INIS)

    Morais, Sebastien

    2016-01-01

    In many scientific areas, the size and the complexity of numerical simulations lead to intensive use of massively parallel runs on High Performance Computing (HPC) architectures. Such computers consist of a set of processing units (PUs) over which memory is distributed. The distribution of simulation data is therefore crucial: it has to minimize the computation time of the simulation while ensuring that the data allocated to each PU can be stored locally in memory. For most numerical simulations, the physical and numerical data are based on a mesh, and the computations are performed at the cell level (for example within triangles and quadrilaterals in 2D, or within tetrahedrons and hexahedrons in 3D). More specifically, a computing cost and a memory cost can be associated with each cell. In our context, where the mathematical methods used are finite elements or finite volumes, performing the computations associated with a cell may require information carried by neighboring cells. The standard implementation is to store the needed data of this neighborhood locally on the PU, even if the cells of this neighborhood are not computed locally. Such stored but non-computed cells are called ghost cells, and they can have a significant impact on the memory consumption of a PU. The problem to solve is thus not only to partition a mesh into several parts by assigning each cell to one and only one part while minimizing the computational load of each part. It is also necessary to ensure that the memory load of both the cells where the computations are performed and their neighbors fits into PU memory. This leads to partitioning the computations while distributing the mesh with overlaps. Explicitly taking these data overlaps into account is the problem we propose to study. (author)
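
    A minimal sketch of the evaluation problem described above: given a candidate partition, compute each PU's computational load (owned cells only) and memory load (owned cells plus their ghost cells). The data structures are illustrative.

        from collections import defaultdict

        def partition_loads(parts, neighbors, compute_cost, memory_cost):
            # parts: cell -> part id; neighbors: cell -> iterable of cells.
            owned = defaultdict(set)
            for cell, p in parts.items():
                owned[p].add(cell)
            compute, memory = {}, {}
            for p, cells in owned.items():
                ghosts = set()
                for c in cells:               # neighbors owned by other parts
                    ghosts.update(n for n in neighbors[c] if parts[n] != p)
                compute[p] = sum(compute_cost[c] for c in cells)
                memory[p] = sum(memory_cost[c] for c in cells | ghosts)
            return compute, memory

        # A feasible partition keeps max(memory.values()) within per-PU memory
        # while minimizing max(compute.values()).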

  15. Feature-Based Visual Short-Term Memory Is Widely Distributed and Hierarchically Organized.

    Science.gov (United States)

    Dotson, Nicholas M; Hoffman, Steven J; Goodell, Baldwin; Gray, Charles M

    2018-06-15

    Feature-based visual short-term memory is known to engage both sensory and association cortices. However, the extent of the participating circuit and the neural mechanisms underlying memory maintenance are still a matter of vigorous debate. To address these questions, we recorded neuronal activity from 42 cortical areas in monkeys performing a feature-based visual short-term memory task and an interleaved fixation task. We find that task-dependent differences in firing rates are widely distributed throughout the cortex, while stimulus-specific changes in firing rates are more restricted and hierarchically organized. We also show that microsaccades during the memory delay encode the stimuli held in memory and that units modulated by microsaccades are more likely to exhibit stimulus specificity, suggesting that eye movements contribute to visual short-term memory processes. These results support a framework in which most cortical areas, within a modality, contribute to mnemonic representations at timescales that increase along the cortical hierarchy. Copyright © 2018 Elsevier Inc. All rights reserved.

  16. ATLAS Distributed Computing: Experience and Evolution

    CERN Document Server

    Nairz, A; The ATLAS collaboration

    2013-01-01

    The ATLAS experiment has just concluded its first running period which commenced in 2010. After two years of remarkable performance from the LHC and ATLAS, the experiment has accumulated more than 25 fb⁻¹ of data. The total volume of beam and simulated data products exceeds 100 PB distributed across more than 150 computing centers around the world, managed by the experiment's distributed data management system. These sites have provided up to 150,000 computing cores to ATLAS's global production and analysis processing system, enabling a rich physics program including the discovery of the Higgs-like boson in 2012. The wealth of accumulated experience in global data-intensive computing at this massive scale, and the considerably more challenging requirements of LHC computing from 2014 when the LHC resumes operation, are driving a comprehensive design and development cycle to prepare a revised computing model together with data processing and management systems able to meet the demands of higher trigger rates, energies and event complexities.

  17. ATLAS distributed computing: experience and evolution

    CERN Document Server

    Nairz, A; The ATLAS collaboration

    2014-01-01

    The ATLAS experiment has just concluded its first running period which commenced in 2010. After two years of remarkable performance from the LHC and ATLAS, the experiment has accumulated more than 25 fb⁻¹ of data. The total volume of beam and simulated data products exceeds 100 PB distributed across more than 150 computing centres around the world, managed by the experiment's distributed data management system. These sites have provided up to 150,000 computing cores to ATLAS's global production and analysis processing system, enabling a rich physics programme including the discovery of the Higgs-like boson in 2012. The wealth of accumulated experience in global data-intensive computing at this massive scale, and the considerably more challenging requirements of LHC computing from 2015 when the LHC resumes operation, are driving a comprehensive design and development cycle to prepare a revised computing model together with data processing and management systems able to meet the demands of higher trigger rates, energies and event complexities.

  18. Computational strategies for three-dimensional flow simulations on distributed computer systems

    Science.gov (United States)

    Sankar, Lakshmi N.; Weed, Richard A.

    1995-08-01

    This research effort is directed towards an examination of issues involved in porting large computational fluid dynamics codes in use within the industry to a distributed computing environment. This effort addresses strategies for implementing the distributed computing in a device independent fashion and load balancing. A flow solver called TEAM presently in use at Lockheed Aeronautical Systems Company was acquired to start this effort. The following tasks were completed: (1) The TEAM code was ported to a number of distributed computing platforms including a cluster of HP workstations located in the School of Aerospace Engineering at Georgia Tech; a cluster of DEC Alpha Workstations in the Graphics visualization lab located at Georgia Tech; a cluster of SGI workstations located at NASA Ames Research Center; and an IBM SP-2 system located at NASA ARC. (2) A number of communication strategies were implemented. Specifically, the manager-worker strategy and the worker-worker strategy were tested. (3) A variety of load balancing strategies were investigated. Specifically, the static load balancing, task queue balancing and the Crutchfield algorithm were coded and evaluated. (4) The classical explicit Runge-Kutta scheme in the TEAM solver was replaced with an LU implicit scheme. And (5) the implicit TEAM-PVM solver was extensively validated through studies of unsteady transonic flow over an F-5 wing, undergoing combined bending and torsional motion. These investigations are documented in extensive detail in the dissertation, 'Computational Strategies for Three-Dimensional Flow Simulations on Distributed Computing Systems', enclosed as an appendix.

  19. Computational strategies for three-dimensional flow simulations on distributed computer systems

    Science.gov (United States)

    Sankar, Lakshmi N.; Weed, Richard A.

    1995-01-01

    This research effort is directed towards an examination of issues involved in porting large computational fluid dynamics codes in use within the industry to a distributed computing environment. This effort addresses strategies for implementing the distributed computing in a device independent fashion and load balancing. A flow solver called TEAM presently in use at Lockheed Aeronautical Systems Company was acquired to start this effort. The following tasks were completed: (1) The TEAM code was ported to a number of distributed computing platforms including a cluster of HP workstations located in the School of Aerospace Engineering at Georgia Tech; a cluster of DEC Alpha Workstations in the Graphics visualization lab located at Georgia Tech; a cluster of SGI workstations located at NASA Ames Research Center; and an IBM SP-2 system located at NASA ARC. (2) A number of communication strategies were implemented. Specifically, the manager-worker strategy and the worker-worker strategy were tested. (3) A variety of load balancing strategies were investigated. Specifically, the static load balancing, task queue balancing and the Crutchfield algorithm were coded and evaluated. (4) The classical explicit Runge-Kutta scheme in the TEAM solver was replaced with an LU implicit scheme. And (5) the implicit TEAM-PVM solver was extensively validated through studies of unsteady transonic flow over an F-5 wing, undergoing combined bending and torsional motion. These investigations are documented in extensive detail in the dissertation, 'Computational Strategies for Three-Dimensional Flow Simulations on Distributed Computing Systems', enclosed as an appendix.

  20. Distributed computer systems theory and practice

    CERN Document Server

    Zedan, H S M

    2014-01-01

    Distributed Computer Systems: Theory and Practice is a collection of papers dealing with the design and implementation of operating systems, including distributed systems, such as the amoeba system, argus, Andrew, and grapevine. One paper discusses the concepts and notations for concurrent programming, particularly language notation used in computer programming, synchronization methods, and also compares three classes of languages. Another paper explains load balancing or load redistribution to improve system performance, namely, static balancing and adaptive load balancing. For program efficiency ...

  1. Distributed computing for real-time petroleum reservoir monitoring

    Energy Technology Data Exchange (ETDEWEB)

    Ayodele, O. R. [University of Alberta, Edmonton, AB (Canada)

    2004-05-01

    Computer software architecture is presented to illustrate how the concept of distributed computing can be applied to real-time reservoir monitoring processes, permitting the continuous monitoring of the dynamic behaviour of petroleum reservoirs at much shorter intervals. The paper describes the fundamental technologies driving distributed computing, namely the Java 2 Platform, Enterprise Edition (J2EE) by Sun Microsystems, and the Microsoft Dot-Net (Microsoft.Net) initiative, and explains the challenges involved in distributed computing. These are: (1) availability of permanently placed downhole equipment to acquire and transmit seismic data; (2) availability of high bandwidth to transmit the data; (3) security considerations; (4) adaptation of existing legacy codes to run on networks as downloads on demand; and (5) credibility issues concerning data security over the Internet. Other applications of distributed computing in the petroleum industry are also considered, specifically MWD, LWD and SWD (measurement-while-drilling, logging-while-drilling, and simulation-while-drilling), and drill-string vibration monitoring. 23 refs., 1 fig.

  2. Computing visibility on terrains in external memory

    NARCIS (Netherlands)

    Haverkort, H.J.; Toma, L.; Zhuang, Yi

    2007-01-01

    We describe a novel application of the distribution sweeping technique to computing visibility on terrains. Given an arbitrary viewpoint v, the basic problem we address is computing the visibility map or viewshed of v, which is the set of points in the terrain that are visible from v. We give the

  3. Dynamic overset grid communication on distributed memory parallel processors

    Science.gov (United States)

    Barszcz, Eric; Weeratunga, Sisira K.; Meakin, Robert L.

    1993-01-01

    A parallel distributed memory implementation of intergrid communication for dynamic overset grids is presented. Included are discussions of various options considered during development. Results are presented comparing an Intel iPSC/860 to a single processor Cray Y-MP. Results for grids in relative motion show the iPSC/860 implementation to be faster than the Cray implementation.

  4. A database for on-line event analysis on a distributed memory machine

    CERN Document Server

    Argante, E; Van der Stok, P D V; Willers, Ian Malcolm

    1995-01-01

    Parallel in-memory databases can enhance the structuring and parallelization of programs used in High Energy Physics (HEP). Efficient database access routines are used as communication primitives which hide the communication topology, in contrast to more explicit communication libraries like PVM or MPI. A parallel in-memory database, called SPIDER, has been implemented on a 32 node Meiko CS-2 distributed memory machine. The SPIDER primitives generate lower overhead than PVM or MPI. The event reconstruction program of the CPLEAR experiment, CPREAD, has been used as a test case. Performance was measured at the event rate generated by CPLEAR.

  5. AMD's 64-bit Opteron processor

    CERN Multimedia

    CERN. Geneva

    2003-01-01

    This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included. Biographies: David Rich. David directs AMD's efforts in high performance computing and also in the use of Opteron processors...

  6. Towards distributed multiscale computing for the VPH

    NARCIS (Netherlands)

    Hoekstra, A.G.; Coveney, P.

    2010-01-01

    Multiscale modeling is fundamental to the Virtual Physiological Human (VPH) initiative. Most detailed three-dimensional multiscale models lead to prohibitive computational demands. As a possible solution we present MAPPER, a computational science infrastructure for Distributed Multiscale Computing.

  7. Perspective: Memcomputing: Leveraging memory and physics to compute efficiently

    Science.gov (United States)

    Di Ventra, Massimiliano; Traversa, Fabio L.

    2018-05-01

    It is well known that physical phenomena may be of great help in computing some difficult problems efficiently. A typical example is prime factorization that may be solved in polynomial time by exploiting quantum entanglement on a quantum computer. There are, however, other types of (non-quantum) physical properties that one may leverage to compute efficiently a wide range of hard problems. In this perspective, we discuss how to employ one such property, memory (time non-locality), in a novel physics-based approach to computation: Memcomputing. In particular, we focus on digital memcomputing machines (DMMs) that are scalable. DMMs can be realized with non-linear dynamical systems with memory. The latter property allows the realization of a new type of Boolean logic, one that is self-organizing. Self-organizing logic gates are "terminal-agnostic," namely, they do not distinguish between the input and output terminals. When appropriately assembled to represent a given combinatorial/optimization problem, the corresponding self-organizing circuit converges to the equilibrium points that express the solutions of the problem at hand. In doing so, DMMs take advantage of the long-range order that develops during the transient dynamics. This collective dynamical behavior, reminiscent of a phase transition, or even the "edge of chaos," is mediated by families of classical trajectories (instantons) that connect critical points of increasing stability in the system's phase space. The topological character of the solution search renders DMMs robust against noise and structural disorder. Since DMMs are non-quantum systems described by ordinary differential equations, not only can they be built in hardware with the available technology, they can also be simulated efficiently on modern classical computers. As an example, we will show the polynomial-time solution of the subset-sum problem for the worst cases, and point to other types of hard problems where simulations of DMMs

  8. On the impact of communication complexity in the design of parallel numerical algorithms

    Science.gov (United States)

    Gannon, D.; Vanrosendale, J.

    1984-01-01

    This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
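
    Hockney's characterization, which the first model generalizes, is commonly written for a message of n words as

        t(n) = t_0 + \frac{n}{r_\infty} = \frac{n_{1/2} + n}{r_\infty},

    where r_\infty is the asymptotic transfer rate, n_{1/2} is the message length at which half of r_\infty is achieved, and t_0 = n_{1/2} / r_\infty is the startup latency.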

  9. Standard interfaces for program-modular multiprocessor systems

    International Nuclear Information System (INIS)

    Chernykh, E.V.

    1982-01-01

    The peculiarities of the structures of existing and proposed standard interfaces used in automation systems for nuclear physics experiments are considered. General structural characteristics of multiprocessor system interfaces are identified. The existing CAMAC crate system and the proposed COMPEX, E3S and FASTBUS interface standards are compared by capacity and relative cost. The analysis of the given data shows that any interface operates most advantageously at rates close to its capacity, where the relative cost is at a minimum. In this case the advantage lies with interfaces of greater capacity, for which the relative cost grows more slowly under a moderate decrease in the exchange or request-processing rate. Higher single-cycle exchange capacity is provided by functional specialization of the data ways in the interface. The conclusion is drawn that the most promising direction in the development of automation systems for high energy physics experiments is the use of the FASTBUS standard.

  10. Computation of the efficiency distribution of a multichannel focusing collimator

    International Nuclear Information System (INIS)

    Balasubramanian, A.; Venkateswaran, T.V.

    1977-01-01

    This article describes two computer methods of calculating the point source efficiency distribution functions of a focusing collimator with round tapered holes. The first method, which computes only the geometric efficiency distribution, is adequate for low energy collimators, while the second method, which computes both geometric and penetration efficiencies, can be used for medium and high energy collimators. The scatter contribution to the efficiency is not taken into account. In the first method the efficiency distribution of a single cone of the collimator is obtained and the data are used for computing the distribution of the whole collimator. For high energy collimators the entire detector region is imagined to be divided into elemental areas. The efficiency of each elemental area is computed after suitably weighting for the penetration within the collimator septa, which is determined by three-dimensional geometric techniques. The method of computing the line source efficiency distribution from the point source distribution is also explained. The formulations have been tested by computing the efficiency distribution of several commercial collimators and collimators fabricated by us. (Auth.)

  11. Distributed-Memory Breadth-First Search on Massive Graphs

    Energy Technology Data Exchange (ETDEWEB)

    Buluc, Aydin [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Computational Research Division; Beamer, Scott [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences; Madduri, Kamesh [Pennsylvania State Univ., University Park, PA (United States). Computer Science & Engineering Dept.; Asanovic, Krste [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences; Patterson, David [Univ. of California, Berkeley, CA (United States). Dept. of Electrical Engineering and Computer Sciences

    2017-09-26

    This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.
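
    A minimal level-synchronous top-down BFS over a 1D-partitioned graph, sketched here with mpi4py; the direction-optimizing variant and the CSR/DCSC data structures analyzed in the chapter are omitted.

        from mpi4py import MPI

        def bfs_1d(local_adj, owner, root, comm):
            # local_adj: locally owned vertex -> neighbor list
            # owner(v): rank that owns vertex v (1D decomposition)
            rank, size = comm.Get_rank(), comm.Get_size()
            parent = {v: -1 for v in local_adj}
            frontier = [root] if owner(root) == rank else []
            if frontier:
                parent[root] = root
            while comm.allreduce(len(frontier), op=MPI.SUM) > 0:
                outgoing = [[] for _ in range(size)]      # expand local frontier
                for u in frontier:
                    for v in local_adj[u]:
                        outgoing[owner(v)].append((v, u))
                incoming = comm.alltoall(outgoing)        # level-wide exchange
                frontier = []
                for bucket in incoming:                   # claim unvisited vertices
                    for v, u in bucket:
                        if parent[v] == -1:
                            parent[v] = u
                            frontier.append(v)
            return parent                                 # BFS tree, one entry per local vertex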

  12. Parallelization of a Monte Carlo particle transport simulation code

    Science.gov (United States)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in low response time.
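
    The parallelization pattern described above (independent per-rank random streams plus a reduction of the tallies) can be sketched with mpi4py and NumPy as follows; the toy kernel samples exponential path lengths and is a placeholder, not the MC4 physics.

        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        rank, size = comm.Get_rank(), comm.Get_size()

        n_total = 1_000_000
        n_local = n_total // size + (rank < n_total % size)

        # One statistically independent stream per rank.
        stream = np.random.SeedSequence(42).spawn(size)[rank]
        rng = np.random.default_rng(stream)

        paths = rng.exponential(scale=1.0, size=n_local)  # toy track lengths
        local_tally = paths.sum()

        total = comm.reduce(local_tally, op=MPI.SUM, root=0)
        if rank == 0:
            print("mean track length:", total / n_total)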

  13. A Weibull distribution accrual failure detector for cloud computing.

    Science.gov (United States)

    Liu, Jiaxi; Wu, Zhibo; Wu, Jin; Dong, Jian; Zhao, Yao; Wen, Dongxin

    2017-01-01

    Failure detectors are a fundamental component for building high-availability distributed systems. To meet the requirements of complicated large-scale distributed systems, accrual failure detectors that can adapt to multiple applications have been studied extensively. However, several implementations of accrual failure detectors do not adapt well to the cloud service environment. To solve this problem, a new accrual failure detector based on the Weibull distribution, called the Weibull Distribution Failure Detector, has been proposed specifically for cloud computing. It can adapt to the dynamic and unexpected network conditions in cloud computing. The performance of the Weibull Distribution Failure Detector is evaluated and compared based on public classical experiment data and cloud computing experiment data. The results show that the Weibull Distribution Failure Detector has better performance in terms of speed and accuracy in unstable scenarios, especially in cloud computing.
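
    A sketch of the accrual mechanism with a Weibull inter-arrival model: the suspicion level phi is the negative log-probability that a heartbeat arrives later than the time already elapsed. The fixed shape and moment-style scale fit below are simplifications; the paper's estimator may differ.

        import math

        class WeibullAccrualDetector:
            def __init__(self, shape=1.5, window=100):
                self.shape = shape        # k: fixed here; estimated in practice
                self.scale = 1.0          # lambda: refitted from the window
                self.intervals = []
                self.window = window
                self.last = None

            def heartbeat(self, now):
                if self.last is not None:
                    self.intervals.append(now - self.last)
                    self.intervals = self.intervals[-self.window:]
                    mean = sum(self.intervals) / len(self.intervals)
                    # E[X] = scale * Gamma(1 + 1/k); invert for the scale.
                    self.scale = mean / math.gamma(1.0 + 1.0 / self.shape)
                self.last = now

            def phi(self, now):
                # Suspicion level: -log10 P(next heartbeat arrives later than
                # the elapsed time), via the Weibull survival function.
                t = now - self.last
                survival = math.exp(-((t / self.scale) ** self.shape))
                return -math.log10(max(survival, 1e-300))

        # A process is suspected once phi(now) exceeds a chosen threshold.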

  14. Exploiting short-term memory in soft body dynamics as a computational resource.

    Science.gov (United States)

    Nakajima, K; Li, T; Hauser, H; Pfeifer, R

    2014-11-06

    Soft materials are not only highly deformable, but they also possess rich and diverse body dynamics. Soft body dynamics exhibit a variety of properties, including nonlinearity, elasticity and potentially infinitely many degrees of freedom. Here, we demonstrate that such soft body dynamics can be employed to conduct certain types of computation. Using body dynamics generated from a soft silicone arm, we show that they can be exploited to emulate functions that require memory and to embed robust closed-loop control into the arm. Our results suggest that soft body dynamics have a short-term memory and can serve as a computational resource. This finding paves the way towards exploiting passive body dynamics for control of a large class of underactuated systems. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
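
    The computational scheme can be sketched as reservoir computing: only a linear readout is trained on top of the body states. The leaky random network below merely stands in for recorded sensor data from the silicone arm; the task (recalling the input from five steps earlier) is solvable only if the dynamics retain short-term memory.

        import numpy as np

        rng = np.random.default_rng(0)
        T, n_sensors, delay = 2000, 50, 5

        u = rng.uniform(-1.0, 1.0, T)            # input stream driving the body
        W_in = rng.standard_normal(n_sensors)
        W_res = rng.standard_normal((n_sensors, n_sensors))
        W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # fading memory

        X = np.zeros((T, n_sensors))             # stand-in for sensor recordings
        for t in range(1, T):
            X[t] = np.tanh(W_in * u[t] + W_res @ X[t - 1])

        target = np.roll(u, delay)               # delayed echo requires memory
        lam = 1e-2                               # ridge regularization
        W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_sensors), X.T @ target)
        nmse = np.mean((X @ W_out - target) ** 2) / np.var(target)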

  15. Parallel SN algorithms in shared- and distributed-memory environments

    International Nuclear Information System (INIS)

    Haghighat, Alireza; Hunter, Melissa A.; Mattis, Ronald E.

    1995-01-01

    Different 2-D spatial domain partitioning Sn transport theory algorithms have been developed on the basis of the Block-Jacobi iterative scheme. These algorithms have been incorporated into TWOTRAN-II, and tested on a shared-memory CRAY Y-MP C90 and a distributed-memory IBM SP1. For a series of fixed source r-z geometry homogeneous problems, parallel efficiencies in a range of 50-90% are achieved on the C90 with 6 processors, and lower values (20-60%) are obtained on the SP1. It is demonstrated that better performance is attainable if one addresses issues such as convergence rate, load-balancing, and granularity for both architectures, as well as message passing (network bandwidth and latency) for SP1. (author). 17 refs, 4 figs
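
    A generic Block-Jacobi iteration of the kind underlying this domain decomposition is sketched below: each subdomain solves its own block while neighbour data stays frozen at the previous iterate, and all subdomains synchronize once per iteration. The actual Sn sweep structure of TWOTRAN-II is not reproduced.

        import numpy as np

        def block_jacobi(A, b, blocks, iters=50):
            # blocks: list of index arrays, one per subdomain; the block
            # solves can run in parallel since they only read the old iterate.
            x = np.zeros_like(b)
            for _ in range(iters):
                x_new = x.copy()
                for idx in blocks:
                    rest = np.setdiff1d(np.arange(len(b)), idx)
                    rhs = b[idx] - A[np.ix_(idx, rest)] @ x[rest]
                    x_new[idx] = np.linalg.solve(A[np.ix_(idx, idx)], rhs)
                x = x_new                        # one synchronization per sweep
            return x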

  16. The Principals and Practice of Distributed High Throughput Computing

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    The potential of Distributed Processing Systems to deliver computing capabilities with qualities ranging from high availability and reliability to easy expansion in functionality and capacity was recognized and formalized in the 1970s. For more than three decades these principles of Distributed Computing have guided the development of the HTCondor resource and job management system. The widely adopted suite of software tools offered by HTCondor is based on novel distributed computing technologies and is driven by the evolving needs of High Throughput scientific applications. We will review the principles that underpin our work, the distributed computing frameworks and technologies we developed, and the lessons we learned from delivering effective and dependable software tools in an ever-changing landscape of computing technologies and needs that range today from a desktop computer to tens of thousands of cores offered by commercial clouds. About the speaker: Miron Livny received a B.Sc. degree in Physics and Mat...

  17. 7th International Symposium on Intelligent Distributed Computing

    CERN Document Server

    Jung, Jason; Badica, Costin

    2014-01-01

    This book represents the combined peer-reviewed proceedings of the Seventh International Symposium on Intelligent Distributed Computing - IDC-2013, of the Second Workshop on Agents for Clouds - A4C-2013, of the Fifth International Workshop on Multi-Agent Systems Technology and Semantics - MASTS-2013, and of the International Workshop on Intelligent Robots - iR-2013. All the events were held in Prague, Czech Republic during September 4-6, 2013. The 41 contributions published in this book address many topics related to theory and applications of intelligent distributed computing and multi-agent systems, including: agent-based data processing, ambient intelligence, bio-informatics, collaborative systems, cryptography and security, distributed algorithms, grid and cloud computing, information extraction, intelligent robotics, knowledge management, linked data, mobile agents, ontologies, pervasive computing, self-organizing systems, peer-to-peer computing, social networks and trust, and swarm intelligence.

  18. ATLAS distributed computing: experience and evolution

    International Nuclear Information System (INIS)

    Nairz, A

    2014-01-01

    The ATLAS experiment has just concluded its first running period which commenced in 2010. After two years of remarkable performance from the LHC and ATLAS, the experiment has accumulated more than 25 fb⁻¹ of data. The total volume of beam and simulated data products exceeds 100 PB distributed across more than 150 computing centres around the world, managed by the experiment's distributed data management system. These sites have provided up to 150,000 computing cores to ATLAS's global production and analysis processing system, enabling a rich physics programme including the discovery of the Higgs-like boson in 2012. The wealth of accumulated experience in global data-intensive computing at this massive scale, and the considerably more challenging requirements of LHC computing from 2015 when the LHC resumes operation, are driving a comprehensive design and development cycle to prepare a revised computing model together with data processing and management systems able to meet the demands of higher trigger rates, energies and event complexities. An essential requirement will be the efficient utilisation of current and future processor technologies as well as a broad range of computing platforms, including supercomputing and cloud resources. We will report on experience gained thus far and our progress in preparing ATLAS computing for the future

  19. From shoebox to performative agent: the computer as personal memory machine

    NARCIS (Netherlands)

    van Dijck, J.

    2005-01-01

    Digital technologies offer new opportunities in the everyday lives of people: with still expanding memory capacities, the computer is rapidly becoming a giant storage and processing facility for recording and retrieving ‘bits of life’. Software engineers and companies promise not only to expand the

  20. Distributed computing by oblivious mobile robots

    CERN Document Server

    Flocchini, Paola; Santoro, Nicola

    2012-01-01

    The study of what can be computed by a team of autonomous mobile robots, originally started in robotics and AI, has become increasingly popular in theoretical computer science (especially in distributed computing), where it is now an integral part of the investigations on computability by mobile entities. The robots are identical computational entities located and able to move in a spatial universe; they operate without explicit communication and are usually unable to remember the past; they are extremely simple, with limited resources, and individually quite weak. However, collectively the ro

  1. Simulation model of load balancing in distributed computing systems

    Science.gov (United States)

    Botygin, I. A.; Popov, V. N.; Frolov, S. G.

    2017-02-01

    The availability of high-performance computing, high-speed data transfer over the network, and the widespread use of software for design and pre-production in mechanical engineering have led to the fact that, at present, large industrial enterprises and small engineering companies alike implement complex computer systems for the efficient solution of production and management tasks. Such computer systems are generally built on the basis of distributed heterogeneous computer systems. The analytical problems solved by such systems are the key subjects of research, but the system-wide problems of efficiently distributing (balancing) the computational load and accommodating the input, intermediate and output databases are no less important. The main tasks of this balancing system are load and condition monitoring of compute nodes, and the selection of a node to which a user's request is dispatched in accordance with a predetermined algorithm. Load balancing is one of the most widely used methods of increasing the productivity of distributed computing systems through the optimal allocation of tasks between the computer system nodes. Therefore, the development of methods and algorithms for computing an optimal schedule in a distributed system that dynamically changes its infrastructure is an important task.
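
    The core of such a balancing system reduces to per-node load monitoring plus a node-selection policy. The sketch below shows a least-loaded policy and a load-weighted random policy; all names and the load metric are illustrative.

        import random

        class Balancer:
            def __init__(self, nodes):
                self.load = {n: 0.0 for n in nodes}   # filled by monitoring

            def update(self, node, load):
                self.load[node] = load                # condition-monitoring callback

            def pick_least_loaded(self):
                return min(self.load, key=self.load.get)

            def pick_weighted(self):
                # Probabilistic variant: favor lightly loaded nodes.
                nodes = list(self.load)
                weights = [1.0 / (1.0 + self.load[n]) for n in nodes]
                return random.choices(nodes, weights=weights)[0]

        balancer = Balancer(["node-a", "node-b", "node-c"])
        balancer.update("node-a", 10.0)
        balancer.update("node-b", 2.0)
        balancer.update("node-c", 7.0)
        assert balancer.pick_least_loaded() == "node-b"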

  2. Parallel hierarchical global illumination

    Energy Technology Data Exchange (ETDEWEB)

    Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)

    1997-10-08

    Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.

  3. The FORCE - A highly portable parallel programming language

    Science.gov (United States)

    Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

    1989-01-01

    This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.

  4. Ultra-fast fluence optimization for beam angle selection algorithms

    Science.gov (United States)

    Bangert, M.; Ziegenhein, P.; Oelfke, U.

    2014-03-01

    Beam angle selection (BAS) including fluence optimization (FO) is among the most extensive computational tasks in radiotherapy. Precomputed dose influence data (DID) of all considered beam orientations (up to 100 GB for complex cases) has to be handled in the main memory and repeated FOs are required for different beam ensembles. In this paper, the authors describe concepts accelerating FO for BAS algorithms using off-the-shelf multiprocessor workstations. The FO runtime is not dominated by the arithmetic load of the CPUs but by the transportation of DID from the RAM to the CPUs. On multiprocessor workstations, however, the speed of data transportation from the main memory to the CPUs is non-uniform across the RAM; every CPU has a dedicated memory location (node) with minimum access time. We apply a thread node binding strategy to ensure that CPUs only access DID from their preferred node. Ideal load balancing for arbitrary beam ensembles is guaranteed by distributing the DID of every candidate beam equally to all nodes. Furthermore we use a custom sorting scheme of the DID to minimize the overall data transportation. The framework is implemented on an AMD Opteron workstation. One FO iteration comprising dose, objective function, and gradient calculation takes between 0.010 s (9 beams, skull, 0.23 GB DID) and 0.070 s (9 beams, abdomen, 1.50 GB DID). Our overall FO time is < 1 s for small cases, larger cases take ~ 4 s. BAS runs including FOs for 1000 different beam ensembles take ~ 15-70 min, depending on the treatment site. This enables an efficient clinical evaluation of different BAS algorithms.
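
    One FO iteration as described (dose projection, quadratic objective, gradient) can be sketched with NumPy. The matrix sizes are toy values; real DID reaches many gigabytes and, in the paper, is distributed across NUMA nodes so that each bound thread reads only node-resident rows, which is not modeled here.

        import numpy as np

        rng = np.random.default_rng(1)
        n_vox, n_beamlets = 10_000, 200          # toy sizes (real DID: many GB)
        # Sparse-ish dose influence matrix: most beamlets miss most voxels.
        D = rng.random((n_vox, n_beamlets)) * (rng.random((n_vox, n_beamlets)) > 0.99)
        d_presc = np.ones(n_vox)                 # prescribed dose (toy values)
        w = np.full(n_beamlets, 0.5)             # fluence weights

        d = D @ w                                # dose calculation
        diff = d - d_presc
        f = float(diff @ diff)                   # quadratic objective
        grad = 2.0 * (D.T @ diff)                # gradient w.r.t. fluences
        w = np.maximum(w - 1e-5 * grad, 0.0)     # projected gradient step (w >= 0)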

  5. Ultra-fast fluence optimization for beam angle selection algorithms

    International Nuclear Information System (INIS)

    Bangert, M; Ziegenhein, P; Oelfke, U

    2014-01-01

    Beam angle selection (BAS) including fluence optimization (FO) is among the most extensive computational tasks in radiotherapy. Precomputed dose influence data (DID) of all considered beam orientations (up to 100 GB for complex cases) has to be handled in the main memory and repeated FOs are required for different beam ensembles. In this paper, the authors describe concepts accelerating FO for BAS algorithms using off-the-shelf multiprocessor workstations. The FO runtime is not dominated by the arithmetic load of the CPUs but by the transportation of DID from the RAM to the CPUs. On multiprocessor workstations, however, the speed of data transportation from the main memory to the CPUs is non-uniform across the RAM; every CPU has a dedicated memory location (node) with minimum access time. We apply a thread node binding strategy to ensure that CPUs only access DID from their preferred node. Ideal load balancing for arbitrary beam ensembles is guaranteed by distributing the DID of every candidate beam equally to all nodes. Furthermore we use a custom sorting scheme of the DID to minimize the overall data transportation. The framework is implemented on an AMD Opteron workstation. One FO iteration comprising dose, objective function, and gradient calculation takes between 0.010 s (9 beams, skull, 0.23 GB DID) and 0.070 s (9 beams, abdomen, 1.50 GB DID). Our overall FO time is < 1 s for small cases, larger cases take ∼ 4 s. BAS runs including FOs for 1000 different beam ensembles take ∼ 15–70 min, depending on the treatment site. This enables an efficient clinical evaluation of different BAS algorithms.

  6. Irrelevant sensory stimuli interfere with working memory storage: evidence from a computational model of prefrontal neurons.

    Science.gov (United States)

    Bancroft, Tyler D; Hockley, William E; Servos, Philip

    2013-03-01

    The encoding of irrelevant stimuli into the memory store has previously been suggested as a mechanism of interference in working memory (e.g., Lange & Oberauer, Memory, 13, 333-339, 2005; Nairne, Memory & Cognition, 18, 251-269, 1990). Recently, Bancroft and Servos (Experimental Brain Research, 208, 529-532, 2011) used a tactile working memory task to provide experimental evidence that irrelevant stimuli were, in fact, encoded into working memory. In the present study, we replicated Bancroft and Servos's experimental findings using a biologically based computational model of prefrontal neurons, providing a neurocomputational model of overwriting in working memory. Furthermore, our modeling results show that inhibition acts to protect the contents of working memory, and they suggest a need for further experimental research into the capacity of vibrotactile working memory.

  7. Modeling Workflow Management in a Distributed Computing System ...

    African Journals Online (AJOL)

    Distributed computing is becoming increasingly important in our daily life. This is because it enables the people who use it to share information more rapidly and increases their productivity. A major characteristic feature of distributed computing is the explicit representation of process logic within a communication system, ...

  8. Method for prefetching non-contiguous data structures

    Science.gov (United States)

    Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

    2009-05-05

    A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.

  9. Frequent Statement and Dereference Elimination for Imperative and Object-Oriented Distributed Programs

    Science.gov (United States)

    El-Zawawy, Mohamed A.

    2014-01-01

    This paper introduces new approaches for the analysis of frequent statement and dereference elimination for imperative and object-oriented distributed programs running on parallel machines equipped with hierarchical memories. The paper uses languages whose address spaces are globally partitioned. Distributed programs allow defining data layout and threads writing to and reading from other thread memories. Three type systems (for imperative distributed programs) are the tools of the proposed techniques. The first type system defines for every program point a set of calculated (ready) statements and memory accesses. The second type system uses an enriched version of the types of the first type system and determines which of the ready statements and memory accesses are used later in the program. The third type system uses the information gathered so far to eliminate unnecessary statement computations and memory accesses (the analysis of frequent statement and dereference elimination). Extensions to these type systems are also presented to cover object-oriented distributed programs. Two advantages of our work over related work are the following. The hierarchical style of concurrent parallel computers is similar to the memory model used in this paper. In our approach, each analysis result is assigned a type derivation (which serves as a correctness proof). PMID:24892098
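
    The effect of the third type system, removing a dereference whose value is already "ready", can be pictured with a hand-worked example. The paper targets PGAS-style distributed languages in which each dereference may be a remote access; the plain C below is only an illustration of the hoisting rewrite such an analysis would justify, not the paper's notation.

    /* Frequent-dereference elimination by hand. In a PGAS-style program,
       p may point into another thread's partition, so each dereference
       can be a remote memory access worth eliminating. */
    #include <stdio.h>

    static double norm_before(const double *p, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += p[0] * p[i];        /* p[0] re-dereferenced every iteration */
        return s;
    }

    static double norm_after(const double *p, int n)
    {
        const double p0 = p[0];      /* ready value loaded once */
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += p0 * p[i];
        return s;
    }

    int main(void)
    {
        double v[4] = { 2.0, 1.0, 3.0, 4.0 };
        printf("%g %g\n", norm_before(v, 4), norm_after(v, 4));
        return 0;
    }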

  10. Investigating Solution Convergence in a Global Ocean Model Using a 2048-Processor Cluster of Distributed Shared Memory Machines

    Directory of Open Access Journals (Sweden)

    Chris Hill

    2007-01-01

    Full Text Available Up to 1920 processors of a cluster of distributed shared memory machines at the NASA Ames Research Center are being used to simulate ocean circulation globally at horizontal resolutions of 1/4, 1/8, and 1/16-degree with the Massachusetts Institute of Technology General Circulation Model, a finite volume code that can scale to large numbers of processors. The study aims to understand physical processes responsible for skill improvements as resolution is increased and to gain insight into what resolution is sufficient for particular purposes. This paper focuses on the computational aspects of reaching the technical objective of efficiently performing these global eddy-resolving ocean simulations. At 1/16-degree resolution the model grid contains 1.2 billion cells. At this resolution it is possible to simulate approximately one month of ocean dynamics in about 17 hours of wallclock time with a model timestep of two minutes on a cluster of four 512-way NUMA Altix systems. The Altix systems' large main memory and I/O subsystems allow computation and disk storage of rich sets of diagnostics during each integration, supporting the scientific objective to develop a better understanding of global ocean circulation model solution convergence as model resolution is increased.

  11. Realtime Audio with Garbage Collection

    OpenAIRE

    Matheussen, Kjetil Svalastog

    2010-01-01

    Two non-moving concurrent garbage collectors tailored for realtime audio processing are described. Both collectors work on copies of the heap to avoid cache misses and audio-disruptive synchronizations. Both collectors are targeted at multiprocessor personal computers. The first garbage collector works in uncooperative environments, and can replace Hans Boehm's conservative garbage collector for C and C++. The collector does not access the virtual memory system. Neither doe...

  12. Quality-Driven Model-Based Design of MultiProcessor Embedded Systems for Highlydemanding Applications

    DEFF Research Database (Denmark)

    Jozwiak, Lech; Madsen, Jan

    2013-01-01

    The recent spectacular progress in modern nano-dimension semiconductor technology enabled implementation of a complete complex multi-processor system on a single chip (MPSoC), global networking and mobile wire-less communication, and facilitated a fast progress in these areas. New important...... accessible or distant) objects, installations, machines or devices, or even implanted in a human or animal body can serve as examples. However, many of the modern embedded applications impose very stringent functional and parametric demands. Moreover, the spectacular advances in microelectronics introduced......

  13. AGIS: Integration of new technologies used in ATLAS Distributed Computing

    CERN Document Server

    AUTHOR|(INSPIRE)INSPIRE-00291854; The ATLAS collaboration; Di Girolamo, Alessandro; Alandes Pradillo, Maria

    2017-01-01

    The variety of the ATLAS Distributed Computing infrastructure requires a central information system to define the topology of computing resources and to store different parameters and configuration data which are needed by various ATLAS software components. The ATLAS Grid Information System (AGIS) is the system designed to integrate configuration and status information about resources, services and topology of the computing infrastructure used by ATLAS Distributed Computing applications and services. Being an intermediate middleware system between clients and external information sources (like central BDII, GOCDB, MyOSG), AGIS defines the relations between experiment-specific resources and physical distributed computing capabilities. Being in production during LHC Run 1, AGIS became the central information system for Distributed Computing in ATLAS and it is continuously evolving to fulfil new user requests, enable enhanced operations and follow the extension of the ATLAS Computing model. The ATLAS Computin...

  14. Analysis and Optimisation of Hierarchically Scheduled Multiprocessor Embedded Systems

    DEFF Research Database (Denmark)

    Pop, Traian; Pop, Paul; Eles, Petru

    2008-01-01

    We present an approach to the analysis and optimisation of heterogeneous multiprocessor embedded systems. The systems are heterogeneous not only in terms of hardware components, but also in terms of communication protocols and scheduling policies. When several scheduling policies share a resource......, they are organised in a hierarchy. In this paper, we first develop a holistic scheduling and schedulability analysis that determines the timing properties of a hierarchically scheduled system. Second, we address design problems that are characteristic to such hierarchically scheduled systems: assignment...... of scheduling policies to tasks, mapping of tasks to hardware components, and the scheduling of the activities. We also present several algorithms for solving these problems. Our heuristics are able to find schedulable implementations under limited resources, achieving an efficient utilisation of the system...

  15. The numerical parallel computing of photon transport

    International Nuclear Information System (INIS)

    Huang Qingnan; Liang Xiaoguang; Zhang Lifa

    1998-12-01

    The parallel computing of photon transport is investigated, and the parallel algorithm and the parallelization of programs on parallel computers, both with shared memory and with distributed memory, are discussed. By analyzing the inherent law of the mathematical and physical model of photon transport according to the structural features of parallel computers, using the strategy of 'divide and conquer', adjusting the algorithm structure of the program, dissolving the data relationships, finding parallelizable ingredients and creating large-grain parallel subtasks, the sequential computing of photon transport is efficiently transformed into parallel and vector computing. The program was run on various high-performance parallel computers such as the HY-1 (PVP), the Challenge (SMP) and the YH-3 (MPP), and very good parallel speedup has been obtained.
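
    The 'divide and conquer' decomposition lends itself naturally to message passing, since photon histories are independent. The toy MPI program below is a sketch in C, not the program described above; the "physics" is a placeholder, and the seeding scheme is only a crude way to vary the per-rank random streams.

    /* Toy divide-and-conquer parallelization of photon transport: each rank
       simulates its share of the histories and a reduction collects the
       tally. Compile with mpicc; run with mpirun. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    static double simulate_photon(unsigned *seed)
    {
        /* placeholder history: a few scatterings, then absorption */
        double deposited = 0.0;
        for (int k = 0; k < 10; ++k) {
            double u = (double)rand_r(seed) / RAND_MAX;
            deposited += u;
            if (u < 0.2) break;          /* "absorbed" */
        }
        return deposited;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long total = 1000000;      /* photon histories overall */
        long mine = total / size + (rank < total % size ? 1 : 0);
        unsigned seed = 12345u + 17u * (unsigned)rank;

        double local = 0.0, global = 0.0;
        for (long i = 0; i < mine; ++i) local += simulate_photon(&seed);

        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("mean energy deposited per photon: %g\n", global / total);
        MPI_Finalize();
        return 0;
    }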

  16. AGIS: Integration of new technologies used in ATLAS Distributed Computing

    Science.gov (United States)

    Anisenkov, Alexey; Di Girolamo, Alessandro; Alandes Pradillo, Maria

    2017-10-01

    The variety of the ATLAS Distributed Computing infrastructure requires a central information system to define the topology of computing resources and to store different parameters and configuration data which are needed by various ATLAS software components. The ATLAS Grid Information System (AGIS) is the system designed to integrate configuration and status information about resources, services and topology of the computing infrastructure used by ATLAS Distributed Computing applications and services. Being an intermediate middleware system between clients and external information sources (like central BDII, GOCDB, MyOSG), AGIS defines the relations between experiment-specific resources and physical distributed computing capabilities. Being in production during LHC Run 1, AGIS became the central information system for Distributed Computing in ATLAS and it is continuously evolving to fulfil new user requests, enable enhanced operations and follow the extension of the ATLAS Computing model. The ATLAS Computing model and the data structures used by Distributed Computing applications and services are continuously evolving and tend to fit newer requirements from the ADC community. In this note, we describe the evolution and the recent developments of AGIS functionalities related to the integration of new technologies that have recently become widely used in ATLAS Computing, like the flexible utilization of opportunistic Cloud and HPC resources, the integration of ObjectStore services for the Distributed Data Management (Rucio) and ATLAS workload management (PanDA) systems, the unified declaration of storage protocols required for PanDA Pilot site movers, and others. The improvements of the information model and general updates are also shown; in particular, we explain how collaborations outside ATLAS could benefit from the system as a computing-resources information catalogue. AGIS is evolving towards a common information system, not coupled to a specific experiment.

  17. A distributed computer system for digitising machines

    International Nuclear Information System (INIS)

    Bairstow, R.; Barlow, J.; Waters, M.; Watson, J.

    1977-07-01

    This paper describes a Distributed Computing System, based on micro computers, for the monitoring and control of digitising tables used by the Rutherford Laboratory Bubble Chamber Research Group in the measurement of bubble chamber photographs. (author)

  18. EGSNRC distributed systems on commercial network

    International Nuclear Information System (INIS)

    McCormack, J.M.

    2001-01-01

    Full text: EGSnrc is a Monte Carlo-based simulation program for determining radiation dose distribution within a body. Computational times are large, as each individual photon path must be calculated and every energy absorption event stored. This means that EGSnrc lends itself to distributed processing, as each photon is independent of the next, and code is included within the package to enable this. EGSnrc is currently only supported on Unix-based computer systems, whilst the department has ∼45 Pentium II and III class workstations all operating under Windows NT within a Novell network. This investigation demonstrates the capability of a Windows-based system to perform distributed computation of EGSnrc. All Unix scripts were modified to work as one single Windows NT batch file. The source code was then compiled using the gcc C compiler (a Windows NT version of the Unix compiler) without modification of the underlying source code. A small Visual Basic program was used as a trigger to start the simulation as a Windows NT service, with Novell Z.E.N. Works distributing the trigger code to each system. When a trigger was received, the computer began a simulation as a low-priority task, in such a way that the user did not see anything on the screen and the simulation did not slow down the general running of the computer. The results were then transferred to the network and collated on a central computer. As an unattended system, a calculation can start within 15 minutes of any desired time, calculate the desired results, and return the results for collation. This effectively demonstrated a distributed Windows NT™ EGSnrc system. Simulations must be chosen carefully to ensure that each photon can be considered independent, as photon histories do not get distributed. Each system used for EGSnrc was required to be capable of running the full EGSnrc simulation on its own. EGSnrc stored the entire result array locally, so a large, high-resolution body required

  19. Retrieval and organizational strategies in conceptual memory a computer model

    CERN Document Server

    Kolodner, Janet L

    2014-01-01

    'Someday we expect that computers will be able to keep us informed about the news. People have imagined being able to ask their home computers questions such as "What's going on in the world?"…'. Originally published in 1984, this book is a fascinating look at the world of memory and computers before the internet became the mainstream phenomenon it is today. It looks at the early development of a computer system that could keep us informed in a way that we now take for granted. Presenting a theory of remembering, based on human information processing, it begins to address many of the hard problems implicated in the quest to make computers remember. The book had two purposes in presenting this theory of remembering. First, to be used in implementing intelligent computer systems, including fact retrieval systems and intelligent systems in general. Any intelligent program needs to store and use a great deal of knowledge. The strategies and structures in the book were designed to be used for that purpos...

  20. Computing possibilities in the mid 1990s

    International Nuclear Information System (INIS)

    Nash, T.

    1988-09-01

    This paper describes the kind of computing resources it may be possible to make available for experiments in high energy physics in the mid and late 1990s. We outline some of the work going on today, particularly at Fermilab's Advanced Computer Program, that projects to the future. We attempt to define areas in which coordinated R and D efforts should prove fruitful to provide for on and off-line computing in the SSC era. Because of extraordinary components anticipated from industry, we can be optimistic even to the level of predicting million VAX equivalent on-line multiprocessor/data acquisition systems for SSC detectors. Managing this scale of computing will require a new approach to large hardware and software systems. 15 refs., 6 figs

  1. Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

    Science.gov (United States)

    Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

    2017-12-01

    This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
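
    The fictitious-cell exchange mentioned above is, in essence, a halo exchange. The C/MPI sketch below shows the communication pattern on a one-dimensional strip of cells; real SIMPLE-type solvers exchange unstructured-grid halos, so the layout and sizes here are purely illustrative, not the authors' code.

    /* 1-D halo ("fictitious cell") exchange sketch: each rank owns NLOC
       interior cells plus two ghost cells filled from its neighbours.
       Compile with mpicc. */
    #include <mpi.h>
    #include <stdio.h>

    #define NLOC 8                       /* interior cells per rank (illustrative) */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double u[NLOC + 2];              /* u[0], u[NLOC+1] are fictitious cells */
        for (int i = 1; i <= NLOC; ++i) u[i] = rank;  /* stand-in field values */
        u[0] = u[NLOC + 1] = -1.0;       /* ghost defaults for boundary ranks */

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* send first interior cell left; receive right neighbour's into u[NLOC+1] */
        MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  0,
                     &u[NLOC + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* send last interior cell right; receive left neighbour's into u[0] */
        MPI_Sendrecv(&u[NLOC],     1, MPI_DOUBLE, right, 1,
                     &u[0],        1, MPI_DOUBLE, left,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("rank %d halos: left=%g right=%g\n", rank, u[0], u[NLOC + 1]);
        MPI_Finalize();
        return 0;
    }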

  2. The Sensitivity of Memory Consolidation and Reconsolidation to Inhibitors of Protein Synthesis and Kinases: Computational Analysis

    Science.gov (United States)

    Zhang, Yili; Smolen, Paul; Baxter, Douglas A.; Byrne, John H.

    2010-01-01

    Memory consolidation and reconsolidation require kinase activation and protein synthesis. Blocking either process during or shortly after training or recall disrupts memory stabilization, which suggests the existence of a critical time window during which these processes are necessary. Using a computational model of kinase synthesis and…

  3. Exploring memory hierarchy design with emerging memory technologies

    CERN Document Server

    Sun, Guangyu

    2014-01-01

    This book equips readers with tools for computer architecture of high performance, low power, and high reliability memory hierarchy in computer systems based on emerging memory technologies, such as STTRAM, PCM, FBDRAM, etc. The techniques described offer advantages of high density, near-zero static power, and immunity to soft errors, which have the potential of overcoming the "memory wall." The authors discuss memory design from various perspectives: emerging memory technologies are employed in the memory hierarchy with novel architecture modification; hybrid memory structure is introduced to leverage advantages from multiple memory technologies; an analytical model named "Moguls" is introduced to explore quantitatively the optimization design of a memory hierarchy; finally, the vulnerability of the CMPs to radiation-based soft errors is improved by replacing different levels of on-chip memory with STT-RAMs. · Provides a holistic study of using emerging memory technologies i...

  4. Impact of singular excessive computer game and television exposure on sleep patterns and memory performance of school-aged children.

    Science.gov (United States)

    Dworak, Markus; Schierl, Thomas; Bruns, Thomas; Strüder, Heiko Klaus

    2007-11-01

    Television and computer game consumption are a powerful influence in the lives of most children. Previous evidence has supported the notion that media exposure could impair a variety of behavioral characteristics. Excessive television viewing and computer game playing have been associated with many psychiatric symptoms, especially emotional and behavioral symptoms, somatic complaints, attention problems such as hyperactivity, and family interaction problems. Nevertheless, there is insufficient knowledge about the relationship between singular excessive media consumption on sleep patterns and linked implications on children. The aim of this study was to investigate the effects of singular excessive television and computer game consumption on sleep patterns and memory performance of children. Eleven school-aged children were recruited for this polysomnographic study. Children were exposed to voluntary excessive television and computer game consumption. In the subsequent night, polysomnographic measurements were conducted to measure sleep-architecture and sleep-continuity parameters. In addition, a visual and verbal memory test was conducted before media stimulation and after the subsequent sleeping period to determine visuospatial and verbal memory performance. Only computer game playing resulted in significant reduced amounts of slow-wave sleep as well as significant declines in verbal memory performance. Prolonged sleep-onset latency and more stage 2 sleep were also detected after previous computer game consumption. No effects on rapid eye movement sleep were observed. Television viewing reduced sleep efficiency significantly but did not affect sleep patterns. The results suggest that television and computer game exposure affect children's sleep and deteriorate verbal cognitive performance, which supports the hypothesis of the negative influence of media consumption on children's sleep, learning, and memory.

  5. Learning to read aloud: A neural network approach using sparse distributed memory

    Science.gov (United States)

    Joglekar, Umesh Dwarkanath

    1989-01-01

    An attempt to solve a text-to-phoneme mapping problem that does not appear amenable to solution by standard algorithmic procedures is described. Experiments based on a model of distributed processing are also described. This model, sparse distributed memory (SDM), can be used in an iterative supervised learning mode to solve the problem. Additional improvements aimed at obtaining better performance are suggested.
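
    A Kanerva-style sparse distributed memory can be stated compactly: a fixed set of random hard locations, activation of every location within a Hamming radius of the address, and counter superposition for writes and reads. The C sketch below is a minimal illustration with invented sizes and a 32-bit word format; it is not the model used in the thesis.

    /* Compact SDM sketch: write adds +/-1 to the bit counters of all
       activated hard locations; read sums the counters and thresholds. */
    #include <stdio.h>
    #include <stdlib.h>

    #define M 256          /* hard locations (illustrative) */
    #define N 32           /* address/data width in bits */
    #define RADIUS 11      /* activation radius in Hamming distance */

    static unsigned addr[M];
    static int counters[M][N];

    static int hamming(unsigned a, unsigned b) { return __builtin_popcount(a ^ b); }

    static void sdm_write(unsigned a, unsigned d)
    {
        for (int m = 0; m < M; ++m)
            if (hamming(addr[m], a) <= RADIUS)
                for (int b = 0; b < N; ++b)
                    counters[m][b] += (d >> b & 1u) ? 1 : -1;
    }

    static unsigned sdm_read(unsigned a)
    {
        int sum[N] = {0};
        for (int m = 0; m < M; ++m)
            if (hamming(addr[m], a) <= RADIUS)
                for (int b = 0; b < N; ++b)
                    sum[b] += counters[m][b];
        unsigned d = 0;
        for (int b = 0; b < N; ++b)
            if (sum[b] > 0) d |= 1u << b;
        return d;
    }

    int main(void)
    {
        srand(7);
        for (int m = 0; m < M; ++m) addr[m] = (unsigned)rand();
        unsigned key = 0x5A5A1234u, val = 0xDEADBEEFu;
        sdm_write(key, val);
        printf("stored %08x, recalled %08x, noisy-key recall %08x\n",
               val, sdm_read(key), sdm_read(key ^ 0x3u)); /* tolerates noise */
        return 0;
    }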

  6. Method of computer generation and projection recording of microholograms for holographic memory systems: mathematical modelling and experimental implementation

    International Nuclear Information System (INIS)

    Betin, A Yu; Bobrinev, V I; Evtikhiev, N N; Zherdev, A Yu; Zlokazov, E Yu; Lushnikov, D S; Markin, V V; Odinokov, S B; Starikov, S N; Starikov, R S

    2013-01-01

    A method of computer generation and projection recording of microholograms for holographic memory systems is presented; the results of mathematical modelling and experimental implementation of the method are demonstrated. (holographic memory)

  7. Process Management and Exception Handling in Multiprocessor Operating Systems Using Object-Oriented Design Techniques. Revised Sep. 1988

    Science.gov (United States)

    Russo, Vincent; Johnston, Gary; Campbell, Roy

    1988-01-01

    The programming of the interrupt handling mechanisms, process switching primitives, scheduling mechanism, and synchronization primitives of an operating system for a multiprocessor requires both efficient code in order to support the needs of high-performance or real-time applications and careful organization to facilitate maintenance. Although many advantages have been claimed for object-oriented class hierarchical languages and their corresponding design methodologies, the application of these techniques to the design of the primitives within an operating system has not been widely demonstrated. To investigate the role of class hierarchical design in systems programming, the authors have constructed the Choices multiprocessor operating system architecture using the C++ programming language. During the implementation, it was found that many operating system design concerns can be represented advantageously using a class hierarchical approach, including: the separation of mechanism and policy; the organization of an operating system into layers, each of which represents an abstract machine; and the notions of process and exception management. In this paper, we discuss an implementation of the low-level primitives of this system and outline the strategy by which we developed our solution.

  8. A parallel solution-adaptive scheme for predicting multi-phase core flows in solid propellant rocket motors

    International Nuclear Information System (INIS)

    Sachdev, J.S.; Groth, C.P.T.; Gottlieb, J.J.

    2003-01-01

    The development of a parallel adaptive mesh refinement (AMR) scheme is described for solving the governing equations for multi-phase (gas-particle) core flows in solid propellant rocket motors (SRM). An Eulerian formulation is used to describe the coupled motion between the gas and particle phases. A cell-centred upwind finite-volume discretization and the use of limited solution reconstruction, Riemann-solver-based flux functions for the gas and particle phases, and explicit multi-stage time-stepping allow for high solution accuracy and computational robustness. A Riemann problem is formulated for prescribing boundary data at the burning surface. Efficient and scalable parallel implementations are achieved with domain decomposition on distributed memory multiprocessor architectures. Numerical results are described to demonstrate the capabilities of the approach for predicting SRM core flows. (author)

  9. Overview of the Force Scientific Parallel Language

    Directory of Open Access Journals (Sweden)

    Gita Alaghband

    1994-01-01

    Full Text Available The Force parallel programming language designed for large-scale shared-memory multiprocessors is presented. The language provides a number of parallel constructs as extensions to the ordinary Fortran language and is implemented as a two-level macro preprocessor to support portability across shared memory multiprocessors. The global parallelism model on which the Force is based provides a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that have been ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples to illustrate some parallel programming approaches using the Force are also presented.
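
    Although the Force's own syntax is not reproduced here, its prescheduled parallel loops and implicit barrier synchronization have a close modern analogue in OpenMP. The C sketch below is that analogue, offered only as an illustration of the shared-memory programming style the Force pioneered; it is not Force code.

    /* OpenMP analogue of prescheduled loops with barrier synchronization.
       Compile with: cc -fopenmp force_analogue.c */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void)
    {
        static double x[N];
        double sum = 0.0;

        #pragma omp parallel
        {
            /* like a prescheduled DO: iterations split statically */
            #pragma omp for schedule(static)
            for (int i = 0; i < N; ++i) x[i] = 0.5 * i;

            /* the implicit barrier here plays the synchronization role */

            #pragma omp for reduction(+ : sum)
            for (int i = 0; i < N; ++i) sum += x[i];
        }
        printf("sum = %g\n", sum);
        return 0;
    }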

  10. Commodity multi-processor systems in the ATLAS level-2 trigger

    International Nuclear Information System (INIS)

    Abolins, M.; Blair, R.; Bock, R.; Bogaerts, A.; Dawson, J.; Ermoline, Y.; Hauser, R.; Kugel, A.; Lay, R.; Muller, M.; Noffz, K.-H.; Pope, B.; Schlereth, J.; Werner, P.

    2000-01-01

    Low cost SMP (Symmetric Multi-Processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS the authors consider them as intelligent input buffers (active ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4-processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term program of work. The SMP systems may be considered as an important building block in future data acquisition systems

  11. Commodity multi-processor systems in the ATLAS level-2 trigger

    CERN Document Server

    Abolins, M; Bock, R; Bogaerts, J A C; Dawson, J; Ermoline, Y; Hauser, R; Kugel, A; Lay, R; Müller, M; Noffz, K H; Pope, B; Schlereth, J L; Werner, P

    2000-01-01

    Low cost SMP (symmetric multi-processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS we consider them as intelligent input buffers (an "active" ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4- processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term programme of work. The SMP systems may be considered as an important building block in future data acquisition systems. (9 refs).

  12. Distributed Cognition (DCOG): Foundations for a Computational Associative Memory Model

    National Research Council Canada - National Science Library

    Eggleston, Robert G; McCreight, Katherine L

    2006-01-01

    .... In this report, we describe the foundations of a different type of computational architecture; one that we believe will be less susceptible to cognitive brittleness and can better scale to complex and ill-structured work domains...

  13. Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

    Directory of Open Access Journals (Sweden)

    Ausif Mahmood

    1996-01-01

    Full Text Available The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of the design and to evaluate its performance. A high-level language can provide maximum flexibility in this respect if constructs for handling concurrent processes and a time-mapping mechanism are added. This paper describes a novel technique for emulating the hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time-mapping mechanism. Finally, a high-level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
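
    The timing-wheel mechanism can be sketched briefly: the subprograms obtained by splitting a process at its entry points are hung on the wheel slot for their due tick, and the simulator dispatches one slot per tick. The C fragment below is a minimal illustration with invented names and sizes, not the authors' simulator.

    /* Minimal timing wheel: schedule() hangs a subprogram on a future
       slot; the main loop advances simulated time one tick at a time. */
    #include <stdio.h>
    #include <stdlib.h>

    #define WHEEL_SLOTS 64

    typedef void (*subprog_t)(int tick);

    typedef struct event {
        subprog_t fn;
        struct event *next;
    } event_t;

    static event_t *wheel[WHEEL_SLOTS];
    static int now;

    static void schedule(subprog_t fn, int delay)
    {
        event_t *e = malloc(sizeof *e);
        e->fn = fn;
        int slot = (now + delay) % WHEEL_SLOTS;  /* assumes delay < WHEEL_SLOTS */
        e->next = wheel[slot];
        wheel[slot] = e;
    }

    static void stage2(int tick) { printf("t=%d: process resumes past entry point\n", tick); }
    static void stage1(int tick)
    {
        printf("t=%d: process runs to entry point, reschedules\n", tick);
        schedule(stage2, 3);                     /* models a 3-cycle hardware delay */
    }

    int main(void)
    {
        schedule(stage1, 1);
        for (now = 0; now < 10; ++now) {         /* advance simulated time */
            event_t *e = wheel[now % WHEEL_SLOTS];
            wheel[now % WHEEL_SLOTS] = NULL;
            while (e) { event_t *n = e->next; e->fn(now); free(e); e = n; }
        }
        return 0;
    }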

  14. Visualization and Data Analysis for High-Performance Computing

    Energy Technology Data Exchange (ETDEWEB)

    Sewell, Christopher Meyer [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-09-27

    This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.

  15. Mobile Agents in Networking and Distributed Computing

    CERN Document Server

    Cao, Jiannong

    2012-01-01

    The book focuses on mobile agents, which are computer programs that can autonomously migrate between network sites. This text introduces the concepts and principles of mobile agents, provides an overview of mobile agent technology, and focuses on applications in networking and distributed computing.

  16. Distributed quantum computing with single photon sources

    International Nuclear Information System (INIS)

    Beige, A.; Kwek, L.C.

    2005-01-01

    Full text: Distributed quantum computing requires the ability to perform nonlocal gate operations between the distant nodes (stationary qubits) of a large network. To achieve this, it has been proposed to interconvert stationary qubits with flying qubits. In contrast to this, we show that distributed quantum computing only requires the ability to encode stationary qubits into flying qubits but not the conversion of flying qubits into stationary qubits. We describe a scheme for the realization of an eventually deterministic controlled phase gate by performing measurements on pairs of flying qubits. Our scheme could be implemented with a linear optics quantum computing setup including sources for the generation of single photons on demand, linear optics elements and photon detectors. In the presence of photon loss and finite detector efficiencies, the scheme could be used to build large cluster states for one way quantum computing with a high fidelity. (author)

  17. Effect of Computer-Presented Organizational/Memory Aids on Problem Solving Behavior.

    Science.gov (United States)

    Steinberg, Esther R.; And Others

    This research studied the effects of computer-presented organizational/memory aids on problem solving behavior. The aids were either matrix or verbal charts shown on the display screen next to the problem. The 104 college student subjects were randomly assigned to one of the four conditions: type of chart (matrix or verbal chart) and use of charts…

  18. Design of Networks-on-Chip for Real-Time Multi-Processor Systems-on-Chip

    DEFF Research Database (Denmark)

    Sparsø, Jens

    2012-01-01

    This paper addresses the design of networks-on-chips for use in multi-processor systems-on-chips - the hardware platforms used in embedded systems. These platforms typically have to guarantee real-time properties, and as the network is a shared resource, it has to provide service guarantees...... (bandwidth and/or latency) to different communication flows. The paper reviews some past work in this field and the lessons learned, and the paper discusses ongoing research conducted as part of the project "Time-predictable Multi-Core Architecture for Embedded Systems" (T-CREST), supported by the European...

  19. Memory and selective attention in multiple sclerosis: cross-sectional computer-based assessment in a large outpatient sample.

    Science.gov (United States)

    Adler, Georg; Lembach, Yvonne

    2015-08-01

    Cognitive impairments may have a severe impact on everyday functioning and quality of life of patients with multiple sclerosis (MS). However, there are methodological problems in their assessment, and only a few studies allow a representative estimate of the prevalence and severity of cognitive impairments in MS patients. We applied a computer-based method, the memory and attention test (MAT), in 531 outpatients with MS, who were assessed at nine neurological practices or specialized outpatient clinics. The findings were compared with those obtained in an age-, sex- and education-matched control group of 84 healthy subjects. Episodic short-term memory was substantially decreased in the MS patients; about 20% of them reached a score of less than two standard deviations below the mean of the control group. The episodic short-term memory score was negatively correlated with the EDSS score. Minor but still significant impairments in the MS patients were found for verbal short-term memory, episodic working memory and selective attention. The computer-based MAT was found to be useful for the routine assessment of cognition in MS outpatients.

  20. Distributed MRI reconstruction using Gadgetron-based cloud computing.

    Science.gov (United States)

    Xue, Hui; Inati, Souheil; Sørensen, Thomas Sangild; Kellman, Peter; Hansen, Michael S

    2015-03-01

    To expand the open source Gadgetron reconstruction framework to support distributed computing and to demonstrate that a multinode version of the Gadgetron can be used to provide nonlinear reconstruction with clinically acceptable latency. The Gadgetron framework was extended with new software components that enable an arbitrary number of Gadgetron instances to collaborate on a reconstruction task. This cloud-enabled version of the Gadgetron was deployed on three different distributed computing platforms ranging from a heterogeneous collection of commodity computers to the commercial Amazon Elastic Compute Cloud. The Gadgetron cloud was used to provide nonlinear, compressed sensing reconstruction on a clinical scanner with low reconstruction latency (e.g., for cardiac and neuroimaging applications). The proposed setup was able to handle acquisition and ℓ1-SPIRiT reconstruction of nine high temporal resolution, real-time cardiac short-axis cine acquisitions, covering the ventricles for functional evaluation, in under 1 min. A three-dimensional high-resolution brain acquisition with 1 mm³ isotropic pixel size was acquired and reconstructed with nonlinear reconstruction in less than 5 min. A distributed computing enabled Gadgetron provides a scalable way to improve reconstruction performance using commodity cluster computing. Nonlinear, compressed sensing reconstruction can be deployed clinically with low image reconstruction latency. © 2014 Wiley Periodicals, Inc.