large-scale shared-memory multiprocessors: Topics by WorldWideScience.org

Sample records for large-scale shared-memory multiprocessors

Assessing Programming Costs of Explicit Memory Localization on a Large Scale Shared Memory Multiprocessor

Directory of Open Access Journals (Sweden)

Silvio Picano

1992-01-01

We present detailed experimental work involving a commercially available large-scale shared-memory multiple instruction stream, multiple data stream (MIMD) parallel computer that has a software-controlled cache coherence mechanism. To make effective use of such an architecture, the programmer is responsible for designing the program's structure to match the underlying multiprocessor's capabilities. We describe the techniques used to exploit our multiprocessor (the BBN TC2000) on a network simulation program, showing the resulting performance gains and the associated programming costs. We show that an efficient implementation relies heavily on the user's ability to explicitly manage the memory system.
Multiprocessor shared-memory information exchange

International Nuclear Information System (INIS)

Santoline, L.L.; Bowers, M.D.; Crew, A.W.; Roslund, C.J.; Ghrist, W.D. III

1989-01-01

In distributed microprocessor-based instrumentation and control systems, the inter- and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol that addresses the intra-subsystem communications problem. Specifically, the protocol allows multiple processors to exchange information via a shared-memory interface. The authors' primary goal is to provide a reliable means of exchanging information between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resulting Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, is designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.
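The abstract does not describe the MSMIE buffer-management rules, so the following is only a sketch of a generic technique with similar goals: triple buffering between one writer board and one reader board. The writer always has a free buffer and the reader always gets the newest complete one, so neither side waits on the other's copy; only an index swap happens under a lock, which here stands in for whatever atomic operation the shared-memory hardware would provide. The class and method names are hypothetical.

```python
import threading

class TripleBuffer:
    """Single-writer/single-reader buffer exchange (generic triple buffering,
    not the MSMIE specification). The three indices are always distinct, so
    the writer never overwrites the buffer the reader currently holds."""

    def __init__(self, size):
        self._buffers = [[0] * size for _ in range(3)]
        self._write_idx = 0            # buffer being filled by the writer
        self._ready_idx = 1            # most recently published buffer
        self._read_idx = 2             # buffer currently held by the reader
        self._fresh = False            # ready buffer not yet taken by reader
        self._lock = threading.Lock()  # stands in for a hardware atomic swap

    def write(self, data):
        # Fill the private buffer without holding the lock (data should
        # have the configured size).
        self._buffers[self._write_idx][:] = data
        with self._lock:               # publish by swapping write/ready slots
            self._write_idx, self._ready_idx = self._ready_idx, self._write_idx
            self._fresh = True

    def read(self):
        with self._lock:               # take the newest buffer, if any
            if self._fresh:
                self._read_idx, self._ready_idx = self._ready_idx, self._read_idx
                self._fresh = False
        return list(self._buffers[self._read_idx])
```

Each `write` or `read` does a bounded amount of work regardless of what the other side is doing, which is the property a fixed, deterministic exchange cycle would need; older unread data is simply superseded by newer data.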
GOTHIC memory management : a multiprocessor shared single level store

OpenAIRE

Michel, Béatrice

1990-01-01

The purpose of Gothic is to build an object-oriented, fault-tolerant distributed operating system for a local area network of multiprocessor workstations. This paper describes the Gothic memory manager, which shares the secondary memory space among all processes running on the Gothic system. Processes on different processors can communicate by sharing permanent information. The manager implements a shared single-level store with an invalidation protocol working on disk pages to maintain s...
Elastic pointer directory organization for scalable shared memory multiprocessors

Institute of Scientific and Technical Information of China (English)

Yuhang Liu; Mingfa Zhu; Limin Xiao

2014-01-01

In the field of supercomputing, one key issue for scalable shared-memory multiprocessors is the design of the directory, which records the sharing state of each cache block. A good directory design seeks to balance three key attributes: reasonable memory overhead, precision in recording sharer positions, and low implementation complexity. However, researchers often find that gaining one attribute means losing another. The paper proposes an elastic pointer directory (EPD) structure based on an analysis of shared-memory applications, exploiting the fact that the number of sharers for each directory entry is typically small. Analysis results show that for 4,096 nodes, the memory overhead relative to the full-map directory is 2.7%. Theoretical analysis and cycle-accurate execution-driven simulations on 16- and 64-node cache-coherent non-uniform memory access (CC-NUMA) multiprocessors show that the corresponding pointer overflow probability is reduced significantly. The performance is observed to be better than that of a limited-pointer directory and almost identical to that of the full-map directory, at the cost of slightly higher implementation complexity. Using a directory cache to exploit directory access locality is also studied. The experimental results show that this is a promising approach for the state-of-the-art high-performance computing domain.
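To make the overhead comparison concrete, the sketch below counts bits per directory entry for a full-map directory (one presence bit per node) and for a generic limited-pointer directory, and models a pointer entry that overflows. It is not the EPD structure, whose layout the abstract does not give; the one-bit overflow flag and the "stop tracking on overflow" policy are assumptions for illustration.

```python
import math

def full_map_bits(nodes):
    """Bits per entry for a full-map directory: one presence bit per node."""
    return nodes

def limited_pointer_bits(nodes, pointers):
    """Bits per entry for a directory holding `pointers` node IDs of
    ceil(log2(nodes)) bits each, plus one overflow flag (assumed layout)."""
    return pointers * math.ceil(math.log2(nodes)) + 1

def overhead_ratio(nodes, pointers):
    """Pointer-directory entry size as a fraction of a full-map entry."""
    return limited_pointer_bits(nodes, pointers) / full_map_bits(nodes)

class LimitedPointerEntry:
    """Entry that tracks sharers precisely until its pointers run out, then
    sets an overflow flag (e.g. fall back to broadcast invalidation)."""

    def __init__(self, max_pointers):
        self.max_pointers = max_pointers
        self.sharers = set()
        self.overflowed = False

    def add_sharer(self, node):
        if self.overflowed:
            return
        self.sharers.add(node)
        if len(self.sharers) > self.max_pointers:
            self.overflowed = True   # sharer positions are no longer precise
            self.sharers.clear()
```

With 4,096 nodes a node ID needs 12 bits, so each pointer costs 12 bits against the 4,096 bits of a full-map entry. That gap is why a handful of pointers per entry can keep overhead at a few percent when sharer counts are typically small, and why overflow handling decides how much precision is lost.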
Scalable shared-memory multiprocessing

CERN Document Server

Lenoski, Daniel E

1995-01-01

Dr. Lenoski and Dr. Weber have experience with leading-edge research and practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved with commercializing scalable shared-memory technology.
Shared random access memory resource for multiprocessor real-time systems

International Nuclear Information System (INIS)

Dimmler, D.G.; Hardy, W.H. II

1977-01-01

A shared random-access memory resource is described that is used within real-time data acquisition and control systems with multiprocessor and multibus organizations. Hardware and software aspects are discussed in a specific example in which interconnections are made via a UNIBUS. The general applicability of the approach is also discussed.
Communication and Memory Architecture Design of Application-Specific High-End Multiprocessors

Directory of Open Access Journals (Sweden)

Yahya Jan

2012-01-01

This paper is devoted to the design of communication and memory architectures for the massively parallel hardware multiprocessors needed to implement highly demanding applications. We demonstrated that, for massively parallel hardware multiprocessors, the traditionally used flat communication architectures and multi-port memories do not scale well, and that the influence of the memory and communication network on both throughput and circuit area dominates that of the processors. To resolve these problems and ensure scalability, we proposed designing highly optimized application-specific hierarchical and/or partitioned communication and memory architectures by exploring and exploiting the regularity and hierarchy of the actual data flows of a given application. Furthermore, we proposed data distribution and related data mapping schemes for the shared (global) partitioned memories, with the aim of eliminating memory access conflicts and ensuring that our communication design strategies are applicable. We incorporated these architecture synthesis strategies into our quality-driven, model-based multiprocessor design method and the related automated architecture exploration framework. Using this framework, we performed a large series of experiments that demonstrate many important features of the synthesized memory and communication architectures. They also demonstrate that our method and framework can efficiently synthesize well-scalable memory and communication architectures even for high-end multiprocessors. Gains of up to 12 times in performance and 25 times in area can be obtained by using hierarchical communication networks instead of flat networks. However, at high parallelism levels, only the partitioned approach ensures scalability in performance.
A general model for memory interference in a multiprocessor system with memory hierarchy

Science.gov (United States)

Taha, Badie A.; Standley, Hilda M.

1989-01-01

The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
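For intuition about the quantity being measured, the sketch below uses the classic single-level model in which, every cycle, each of p processors independently addresses one of m memory modules uniformly at random; the expected number of busy modules is then m(1 - (1 - 1/m)^p). This is a simplification assumed for illustration, not the hierarchical queuing model of the paper: it ignores the processing time between requests and does not resubmit blocked requests. A Monte Carlo estimate is included to check the formula.

```python
import random

def expected_busy_modules(processors, modules):
    """Closed-form expected number of distinct modules addressed in one
    cycle when each processor picks a module uniformly at random."""
    return modules * (1.0 - (1.0 - 1.0 / modules) ** processors)

def simulate_busy_modules(processors, modules, cycles, seed=0):
    """Monte Carlo estimate of the same quantity."""
    rng = random.Random(seed)
    total = 0
    for _ in range(cycles):
        requested = {rng.randrange(modules) for _ in range(processors)}
        total += len(requested)      # modules hit by at least one request
    return total / cycles
```

With 8 processors and 8 modules the formula gives about 5.25 busy modules, so roughly a third of the requests collide in any cycle; a hierarchical model refines this by accounting for which buses and clusters each request must traverse.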
Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

International Nuclear Information System (INIS)

Nishiura, Daisuke; Sakaguchi, Hide

2011-01-01

Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have increased continuously. Hence, considerable effort is being made to develop parallel computing techniques on various platforms. In such simulations, particles move freely within a given space, so on a distributed-memory system, load balancing (i.e., assigning an equal number of particles to each processor) is not guaranteed. Shared-memory systems, by contrast, achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by the label of the cell in the domain to which each particle belongs. A list of contact candidates is then constructed by pairing the sorted particle labels. For the second problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle over all of its contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, a scalar supercomputer, a vector supercomputer, and a graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
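The two steps described in this abstract can be sketched in 2-D as follows: particle labels are sorted by cell label, candidates are paired within the same or adjacent cells, and forces are summed by gathering, per particle, the indexes of the pairs it belongs to. This is a minimal serial illustration under stated assumptions (cell size at least the contact distance; `pair_forces[k]` is the force on the first particle of pair k), not the authors' code. In the gather loop each particle's total is written by exactly one iteration, which is what removes write conflicts when that loop is parallelised.

```python
from collections import defaultdict

def cell_of(p, cell_size):
    """2-D cell label of a particle position."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))

def build_candidate_pairs(positions, cell_size):
    """Sort particle labels by cell label, then pair particles that share a
    cell or sit in adjacent cells. Each unordered pair appears once (i < j)."""
    labels = sorted(range(len(positions)),
                    key=lambda i: cell_of(positions[i], cell_size))
    cells = defaultdict(list)
    for i in labels:                       # members of a cell are contiguous
        cells[cell_of(positions[i], cell_size)].append(i)

    pairs = []
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j:
                            pairs.append((i, j))
    return pairs

def sum_forces(n_particles, pairs, pair_forces):
    """Gather-style summation using a per-particle table of pair indexes.
    By Newton's third law, the second particle of a pair gets the negated force."""
    table = [[] for _ in range(n_particles)]
    for k, (i, j) in enumerate(pairs):
        table[i].append((k, 1.0))
        table[j].append((k, -1.0))
    forces = []
    for p in range(n_particles):           # iterations are independent
        fx = sum(sign * pair_forces[k][0] for k, sign in table[p])
        fy = sum(sign * pair_forces[k][1] for k, sign in table[p])
        forces.append((fx, fy))
    return forces
```

The alternative, scattering each pair's force into both particles' totals, is simpler serially but makes two threads race on the same particle whenever their pairs share it.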
Shared performance monitor in a multiprocessor system

Science.gov (United States)

Chiu, George; Gara, Alan G.; Salapura, Valentina

2012-07-24

A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor units, each generating signals representing occurrences of events in that processor unit, and a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters, each for counting signals representing occurrences of events from one or more of the plurality of processor units in the multiprocessor system; and a plurality of input devices for receiving the event signals from one or more of the plurality of processor units, the plurality of input devices being programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT

Energy Technology Data Exchange (ETDEWEB)

Secchi, Simone; Tumeo, Antonino; Villa, Oreste

2011-07-27

Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems in which a large virtually shared address space is mapped onto a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploiting massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot spots and enable high scaling. To model the performance of such large-scale machines, parallel simulation has proven to be a promising approach for achieving good accuracy in reasonable times. One of the most critical factors in resolving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated into a full-system XMT simulator. We start by measuring the effects of network contention on a 128-processor XMT machine and then investigate the trade-off between simulation accuracy and speed by comparing three network models that operate at different levels of accuracy. The comparison and model validation are performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.
Cache aware mapping of streaming applications on a multiprocessor system-on-chip

NARCIS (Netherlands)

Moonen, A.J.M.; Bekooij, M.J.G.; Berg, van den R.M.J.; Meerbergen, van J.; Sciuto, D.; Peng, Z.

2008-01-01

Efficient use of the memory hierarchy is critical for achieving high performance in a multiprocessor system-on-chip. An external memory that is shared between processors is a bottleneck in current and future systems. Cache misses and a large cache miss penalty contribute to a low processor
Shared Memory Parallelization of an Implicit ADI-type CFD Code

Science.gov (United States)

Hauser, Th.; Huang, P. G.

1999-01-01

A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described, and performance measurements for the single- and multiprocessor implementations are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared-memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a friction Reynolds number of Re_τ = 180 has shown good agreement with existing data.
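ADI schemes replace one multidimensional implicit solve with a sequence of one-dimensional sweeps, and with a second-order central difference each sweep is a set of independent tridiagonal systems, one per grid line. The sketch below assumes a simple diffusion-like operator with zero Dirichlet neighbours and solves each line with the Thomas algorithm; the paper's actual discretisation and OpenMP directives are not given in the abstract. The loop over rows is the one an OpenMP code would mark as a parallel loop, and walking each row contiguously is the kind of cache-friendly access the abstract emphasises.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c and right-hand side d (a[0] and c[-1] are unused)."""
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def sweep_rows(grid, alpha):
    """One implicit sweep along x: each row solves
    (1 + 2*alpha) u_i - alpha (u_{i-1} + u_{i+1}) = rhs_i independently."""
    out = []
    for row in grid:                            # rows are independent
        n = len(row)
        a = [-alpha] * n
        b = [1.0 + 2.0 * alpha] * n
        c = [-alpha] * n
        out.append(thomas_solve(a, b, c, row))
    return out
```

A full ADI step would follow this with the same sweep along columns; storing the grid so that the second sweep also reads contiguous memory (or transposing between sweeps) is a typical cache optimisation for this kind of code.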
The performance of disk arrays in shared-memory database machines

Science.gov (United States)

Katz, Randy H.; Hong, Wei

1993-01-01

In this paper, we examine how disk arrays and shared-memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small form-factor disk drives. We introduce a storage system metric, data temperature, as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
DiFX: A software correlator for very long baseline interferometry using multi-processor computing environments

OpenAIRE

Deller, A. T.; Tingay, S. J.; Bailes, M.; West, C.

2007-01-01

We describe the development of an FX-style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high-performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of softw...
Hardware support for CSP on a Java chip multiprocessor

DEFF Research Database (Denmark)

Gruian, Flavius; Schoeberl, Martin

2013-01-01

Due to memory bandwidth limitations, chip multiprocessors (CMPs) that adopt the convenient shared-memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can further increase performance for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), which reduce the bandwidth pressure on the shared memory. The presented solution is scalable and also specific to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were...
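CSP communication is a rendezvous: the sender blocks until the receiver has taken the value. The software sketch below models one unbuffered channel between two threads with a condition variable. It is a behavioural illustration only, not the ring-based hardware channels described in this abstract, and it assumes a single sender and a single receiver per channel (the `Channel` class is hypothetical).

```python
import threading

class Channel:
    """Unbuffered CSP-style channel for one sender and one receiver:
    send() returns only after receive() has taken the value."""

    def __init__(self):
        self._cond = threading.Condition()
        self._value = None
        self._full = False

    def send(self, value):
        with self._cond:
            while self._full:               # wait for the slot to be empty
                self._cond.wait()
            self._value, self._full = value, True
            self._cond.notify_all()         # wake a waiting receiver
            while self._full:               # rendezvous: wait until taken
                self._cond.wait()

    def receive(self):
        with self._cond:
            while not self._full:           # wait for a value
                self._cond.wait()
            value, self._full = self._value, False
            self._cond.notify_all()         # release the blocked sender
            return value
```

Because no data passes through a shared buffer that many cores poll, a hardware version of this pattern moves traffic off the shared-memory path, which is the motivation the abstract gives for on-chip channels.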
Multiprocessor systems and their concurrency

Energy Technology Data Exchange (ETDEWEB)

Starke, P H

1984-01-01

A multiprocessor system can be considered as a collection of finite automata that communicate over channels or shared memory units. The behaviour of such a system can be described by a semilanguage. This approach allows one to define a numerical measure for the concurrency of multiprocessor systems and of distributed systems. This measure is characterized algebraically, and the reconfiguration problem, which asks for an algorithm to construct an l-processor system equivalent to a given n-processor system, is solved in the paper. 6 references.
Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

Energy Technology Data Exchange (ETDEWEB)

Ohmacht, Martin

2017-08-15

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

Science.gov (United States)

Ohmacht, Martin

2014-09-09

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
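The abstract does not spell out the protocol, so the sketch below is a hypothetical software model of the general idea: each memory access is tagged with the current generation, a synchronization request opens a new generation, and the request is satisfied once every older generation has no accesses in flight. The `reclaim` field plays the role of a reclaim pointer that trails, and never passes, the generation counter. Names and structure are illustrative assumptions; the OR-reduce trees and point-to-point signalling of the patent are not modelled.

```python
from collections import Counter

class GenerationSync:
    """Toy model of generation-based memory synchronization."""

    def __init__(self):
        self.current = 0            # generation assigned to new accesses
        self.reclaim = 0            # oldest generation that may be in flight
        self.in_flight = Counter()  # outstanding accesses per generation

    def issue(self):
        """Start an access; the returned tag is used when it completes."""
        self.in_flight[self.current] += 1
        return self.current

    def complete(self, generation):
        self.in_flight[generation] -= 1
        self._advance_reclaim()

    def request_sync(self):
        """Open a new generation; the sync is done once reclaim reaches it."""
        self.current += 1
        self._advance_reclaim()
        return self.current

    def sync_done(self, token):
        return self.reclaim >= token

    def _advance_reclaim(self):
        # Move past fully drained generations, never beyond `current`.
        while self.reclaim < self.current and self.in_flight[self.reclaim] == 0:
            self.reclaim += 1
```

Accesses issued after a sync request belong to the new generation, so they never delay that request; only the accesses that were already in flight do, which is the point of tagging by generation rather than draining all traffic.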
Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

Science.gov (United States)

Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

2014-03-01

The processor-memory performance gap, commonly referred to as the "Memory Wall" problem, stems from the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to devote more than 50% of the chip real estate to caching. In this article, we present our recent work, spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss e/o router subsystems with up to Tb/s routing capacity for cache interconnection within CMP configurations, currently pursued within the FP7 PhoxTrot project.

On the Parallel Elliptic Single/Multigrid Solutions about Aligned and Nonaligned Bodies Using the Virtual Machine for Multiprocessors

Directory of Open Access Journals (Sweden)

A. Averbuch

1994-01-01

Parallel elliptic single/multigrid solutions around an aligned and a nonaligned body are presented and implemented on two shared-memory multiprocessors, one multi-user and one single-user (Sequent Symmetry and MOS), and on a distributed-memory multiprocessor (a Transputer network). Our parallel implementation uses the Virtual Machine for Multi-Processors (VMMP), a software package that provides a coherent set of services for explicitly parallel application programs running on diverse multiple instruction multiple data (MIMD) multiprocessors, both shared memory and message passing. VMMP is intended to simplify parallel program writing and to promote portable and efficient programming. Furthermore, it ensures high portability of application programs by implementing the same services on all target multiprocessors. The performance of our algorithm is investigated in detail. It fits the above architectures well when the number of processors is less than the maximal number of grid points along the axes. In general, the efficiency in the nonaligned case is higher than in the aligned case. Alignment overhead is observed to be up to 200% in the shared-memory case and up to 65% in the message-passing case. We have demonstrated that, when using VMMP, porting the algorithms is straightforward and efficient.
Parallelising a molecular dynamics algorithm on a multi-processor workstation

Science.gov (United States)

Müller-Plathe, Florian

1990-12-01

The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach in which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transferred in part or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
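A Verlet neighbour list stores, for each particle, the partners within the cutoff plus a skin distance, so the list can be reused for several time steps. The sketch below builds the list by brute force in a cubic periodic box and applies a common rebuild criterion (any displacement larger than half the skin). Both the O(N^2) construction and the criterion are standard choices assumed here, not details taken from the paper; in a master-slave scheme like the one this abstract describes, each slave would evaluate nonbonded forces for a share of the list rows.

```python
def build_verlet_list(positions, box, r_cut, skin):
    """Brute-force Verlet list: pairs closer than r_cut + skin, using the
    minimum-image convention in a cubic periodic box of side `box`."""
    r_list2 = (r_cut + skin) ** 2
    n = len(positions)
    neighbours = [[] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            d2 = 0.0
            for k in range(3):
                d = positions[i][k] - positions[j][k]
                d -= box * round(d / box)    # minimum image
                d2 += d * d
            if d2 < r_list2:
                neighbours[i].append(j)      # each pair stored once (i < j)
    return neighbours

def needs_rebuild(positions, reference, skin):
    """True once any particle has moved more than skin/2 since the list was
    built. Assumes unwrapped coordinates (no periodic jumps)."""
    limit2 = (0.5 * skin) ** 2
    for p, q in zip(positions, reference):
        if sum((a - b) ** 2 for a, b in zip(p, q)) > limit2:
            return True
    return False
```

The half-skin criterion is conservative: two particles approaching each other can each move up to skin/2 before a pair outside the list could come within the cutoff.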
Performances of multiprocessor multidisk architectures for continuous media storage

Science.gov (United States)

Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

1996-03-01

Multimedia interfaces increase the need for large image databases capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. To fulfill these requirements, we consider a parallel image server architecture that relies on arrays of intelligent disk nodes, each composed of one processor and one or more disks. Through bottleneck performance evaluation and simulation, this contribution analyzes the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s), and that an architecture with addressable local memories located close to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.
Conditional load and store in a shared memory

Science.gov (United States)

Blumrich, Matthias A; Ohmacht, Martin

2015-02-03

A method, system and computer program product for implementing load-reserve and store-conditional instructions in a multi-processor computing system. The computing system includes a multitude of processor units and a shared memory cache, and each of the processor units has access to the memory cache. In one embodiment, the method comprises providing the memory cache with a series of reservation registers, and storing in these registers addresses reserved in the memory cache for the processor units as a result of issuing load-reserve requests. In this embodiment, when one of the processor units makes a request to store data in the memory cache using a store-conditional request, the reservation registers are checked to determine if an address in the memory cache is reserved for that processor unit. If an address in the memory cache is reserved for that processor, the data are stored at this address.
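The reservation-register scheme in this abstract can be modelled in a few lines. In the sketch below, `load_reserve` records the address in the caller's register and `store_conditional` writes only if that reservation is still held. One behaviour is an assumption taken from standard load-reserve/store-conditional semantics rather than from the abstract: any successful store to an address cancels every reservation on it. The class and helper names are hypothetical.

```python
class ReservationCache:
    """Toy shared cache with one reservation register per processor."""

    def __init__(self, n_processors):
        self.memory = {}
        self.reservation = [None] * n_processors

    def load_reserve(self, cpu, addr):
        self.reservation[cpu] = addr
        return self.memory.get(addr, 0)

    def _store(self, addr, value):
        self.memory[addr] = value
        for cpu, reserved in enumerate(self.reservation):
            if reserved == addr:
                self.reservation[cpu] = None   # cancel competing reservations

    def store_conditional(self, cpu, addr, value):
        if self.reservation[cpu] != addr:
            return False                       # reservation lost: caller retries
        self._store(addr, value)
        return True

def atomic_increment(cache, cpu, addr):
    """Classic LL/SC retry loop built on the model above."""
    while True:
        old = cache.load_reserve(cpu, addr)
        if cache.store_conditional(cpu, addr, old + 1):
            return old + 1
```

If two processors reserve the same address, whichever stores first wins; the other's store-conditional fails and its retry loop rereads the updated value, so no increment is lost.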
Multigrid solution of diffusion equations on distributed memory multiprocessor systems

International Nuclear Information System (INIS)

Finnemann, H.

1988-01-01

The subject is the solution of partial differential equations for simulation of the reactor core on high-performance computers. The parallelization and implementation of nodal multigrid diffusion algorithms on array and ring configurations of the DIRMU multiprocessor system are outlined. The particular iteration scheme employed in the nodal expansion method appears similarly efficient in serial and parallel environments. The combination of modern multi-level techniques with innovative hardware (vector-multiprocessor systems) provides powerful tools needed for real-time simulation of physical systems. The parallel efficiencies range from 70 to 90%. The same performance is estimated for large problems on large multiprocessor systems being designed at present. (orig.)
Debugging in a multi-processor environment

International Nuclear Information System (INIS)

Spann, J.M.

1981-01-01

The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system that uses a shared memory as the data exchange medium. Debugging more than one program in the multi-processor environment is a difficult process. This paper describes the new tools that were developed and how software testing is performed in the SCDS for the MFTF project.
Multiprocessors for high energy physics

International Nuclear Information System (INIS)

Pohl, M.

1987-01-01

I review the role, status and progress of multiprocessor projects relevant to high energy physics. A short overview of the large variety of multiprocessor architectures is given, with special emphasis on machines suitable for experimental data reconstruction. Much progress has been made in making multiprocessors less painful to use by creating a "Parallel Programming Environment" that supports the non-expert user. A high degree of usability has been reached for coarse-grain (event-level) parallelism. The program development tools available on various systems (subroutine packages, preprocessors and parallelizing compilers) are discussed in some detail. Tools for execution control and debugging are also being developed, opening the path from dedicated systems for large-scale, stable production toward support of a more general job mix. In the medium term, multiprocessors will thus cover a growing fraction of the typical high energy physics computing task. (orig.)
3D-TV Rendering on a Multiprocessor System on a Chip

NARCIS (Netherlands)

Van Eijndhoven, J.T.J.; Li, X.

2006-01-01

This thesis focuses on the issue of mapping 3D-TV rendering applications to a multiprocessor platform. The target platform aims to address tomorrow's multimedia consumer market. The prototype chip, called Wasabi, contains a set of TriMedia processors that communicate via a shared memory, fast
Hardware locks for a real-time Java chip multiprocessor

DEFF Research Database (Denmark)

Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

2016-01-01

A software locking mechanism commonly protects shared resources for multithreaded applications. This mechanism can, especially in chip-multiprocessor systems, result in a large synchronization overhead. For real-time systems in particular, this overhead increases the worst-case execution time. ... This improvement can allow a larger number of real-time tasks to be reliably scheduled on a multiprocessor real-time platform. ...
A real-time multichannel memory controller and optimal mapping of memory clients to memory channels

NARCIS (Netherlands)

Gomony, M.D.; Akesson, K.B.; Goossens, K.G.W.

2015-01-01

Ever-increasing demands for main memory bandwidth and the memory speed/power trade-off have led to the introduction of memories with multiple memory channels, such as Wide IO DRAM. Efficient utilization of a multichannel memory as a shared resource in multiprocessor real-time systems depends on mapping of the
A Stream Tilling Approach to Surface Area Estimation for Large Scale Spatial Data in a Shared Memory System

Directory of Open Access Journals (Sweden)

Liu Jiping

2017-12-01

Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large-scale spatial data, input/output (I/O) can easily become the bottleneck in parallelizing the algorithm because of limited physical memory and very slow disk transfer rates. In this paper, we propose a stream tilling approach to surface area estimation that first decomposes a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping between the input and the computing process is broken. We then build a streaming framework that schedules the I/O processes and computing units. Each computing unit encapsulates an identical copy of the estimation algorithm, and multiple asynchronous computing units can work independently in parallel. Finally, experiments demonstrate that our stream tilling estimation efficiently alleviates the heavy pressure of the I/O-bound work, and that the measured speedup of the optimized version greatly outperforms that of the direct parallel versions in shared-memory systems with multi-core processors.
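The tiling idea described in this abstract can be illustrated on a regular height grid. The sketch below assumes a common estimator that splits each grid cell into two triangles (the paper's own estimator and streaming scheduler are not described in the abstract). Each tile carries one extra row and column of vertices shared with its neighbours, a simple stand-in for the "topological expansion", so every tile can be processed independently and the tile areas sum to the whole-grid area. Function names are hypothetical.

```python
import math

def triangle_area(p, q, r):
    """Area of a 3-D triangle via the cross product of two edges."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def grid_area(z, spacing):
    """Surface area of a height grid z (list of rows of vertex heights),
    each cell split into two triangles along its diagonal."""
    total = 0.0
    for i in range(len(z) - 1):
        for j in range(len(z[0]) - 1):
            a = (0.0, 0.0, z[i][j])                 # local cell coordinates:
            b = (spacing, 0.0, z[i][j + 1])         # area is translation
            c = (0.0, spacing, z[i + 1][j])         # invariant
            d = (spacing, spacing, z[i + 1][j + 1])
            total += triangle_area(a, b, d) + triangle_area(a, d, c)
    return total

def tiled_area(z, spacing, tile):
    """Same estimate computed tile by tile; tiles are independent work units."""
    n_rows, n_cols = len(z) - 1, len(z[0]) - 1      # number of cells
    total = 0.0
    for r0 in range(0, n_rows, tile):
        for c0 in range(0, n_cols, tile):
            r1, c1 = min(r0 + tile, n_rows), min(c0 + tile, n_cols)
            # Cells r0..r1-1 need vertex rows r0..r1: one row of overlap.
            sub = [row[c0:c1 + 1] for row in z[r0:r1 + 1]]
            total += grid_area(sub, spacing)
    return total
```

Because each tile only needs its own vertices plus the shared boundary, tiles can be read from disk, processed and discarded in a stream, which decouples the I/O from the computing units in the way the abstract describes.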
A multiprocessor computer simulation model employing a feedback scheduler/allocator for memory space and bandwidth matching and TMR processing

Science.gov (United States)

Bradley, D. B.; Irwin, J. D.

1974-01-01

A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching a multiprocessor's memory space, memory bandwidth, and number and speed of processors with aggregate job set characteristics. The model assumes an input workload of a set of recurrent jobs. The model includes a feedback scheduler/allocator that attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (triple modular redundant) and SIMPLEX (non-redundant) jobs.
Multiprocessor based data acquisition system for radiation monitoring in nuclear reactors

International Nuclear Information System (INIS)

Pansare, M.G.; Narsaiah, A.; Anantha Krishnan, T.S.

1989-01-01

Expensive minicomputers are required for building powerful Data Acquisition Systems (DAS) capable of scanning and processing a large number of signals in a real-time environment. However, by using inexpensive microprocessors in a multiprocessor configuration, it is possible to build DASs that are as powerful as minicomputer-based systems at much lower cost. This paper describes such a multiprocessor-based DAS designed for acquiring data from various radiation monitoring instruments of a nuclear reactor. The system is built from MULTIBUS standard boards based on the Intel 8086 16-bit microprocessor, with local and shared memory. The system monitors up to 128 analog input channels and 64 digital input channels, and actuates up to 128 digital output contacts. The system continuously checks the input channels for alarm conditions and displays the alarm status on an ALARM CRT. A facility has been provided for transferring data to a central computer. At any instant, information on the channels being monitored is available from the local console as well as through five remote terminals located at various places in the reactor building. (author)
Multiprocessor architecture: Synthesis and evaluation

Science.gov (United States)

Standley, Hilda M.

1990-01-01

Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. The results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization, covering the spectrum from no shared memory (all local memory) to one globally accessed memory. These performance determiners may be loosely classified as software related or hardware related, although this distinction is unclear or even inappropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is given. Effort directed toward removing the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Multiprocessor data acquisition system

International Nuclear Information System (INIS)

Haumann, J.R.; Crawford, R.K.

1987-01-01

A multiprocessor data acquisition system has been built to replace the single-processor systems at the Intense Pulsed Neutron Source (IPNS) at Argonne National Laboratory. The multiprocessor system was needed to accommodate the higher data rates at IPNS brought about by improvements in the source and changes in instrument configurations. This paper describes the hardware configuration of the system and the method of task sharing, and compares the results with those of the single-processor system.
Topology Optimization of Large Scale Stokes Flow Problems

DEFF Research Database (Denmark)

Aage, Niels; Poulsen, Thomas Harpsøe; Gersborg-Hansen, Allan

2008-01-01

This note considers topology optimization of large-scale 2D and 3D Stokes flow problems using parallel computations. We solve problems with up to 1,125,000 elements in 2D and 128,000 elements in 3D on a shared-memory computer consisting of Sun UltraSparc IV CPUs.
Overview of the Force Scientific Parallel Language

Directory of Open Access Journals (Sweden)

Gita Alaghband

1994-01-01

Full Text Available The Force parallel programming language, designed for large-scale shared-memory multiprocessors, is presented. The language provides a number of parallel constructs as extensions to ordinary Fortran and is implemented as a two-level macro preprocessor to support portability across shared-memory multiprocessors. The global parallelism model on which the Force is based yields a powerful parallel language. The parallel constructs, generic synchronization, and freedom from process management supported by the Force have resulted in structured parallel programs that have been ported to the many multiprocessors on which the Force is implemented. Two new parallel constructs for looping and functional decomposition are discussed. Several programming examples illustrating parallel programming approaches using the Force are also presented.
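The Force itself extends Fortran, so the following Python sketch mimics only its execution model and is not Force syntax. It assumes threads stand in for Force processes; the function name and problem are hypothetical. Every process runs the same program body, a prescheduled loop assigns iteration `i` to process `i % np_` in advance, and a barrier holds every process until all have finished the loop, after which each process safely reads data written by others.

```python
import threading


def run_force_style(np_=4, n=20):
    """Compute sum(i*i) over 0..n-1 with np_ SPMD-style workers."""
    a = [0] * n
    partial = [0] * np_
    barrier = threading.Barrier(np_)

    def body(me):
        # Prescheduled loop: iteration i is statically assigned to process i % np_.
        for i in range(me, n, np_):
            a[i] = i * i
        # Barrier: no process continues until every process has finished the loop.
        barrier.wait()
        # Read a contiguous block that was mostly written by *other* processes.
        lo, hi = me * n // np_, (me + 1) * n // np_
        partial[me] = sum(a[lo:hi])

    threads = [threading.Thread(target=body, args=(p,)) for p in range(np_)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)
```

Without the barrier, a process could sum its block before another process had written into it, which is exactly the kind of synchronization the language's generic constructs take off the programmer's hands.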
Large scale particle simulations in a virtual memory computer

International Nuclear Information System (INIS)

Gray, P.C.; Million, R.; Wagner, J.S.; Tajima, T.

1983-01-01

Virtual memory computers are capable of executing large-scale particle simulations even when the memory requirements exceed the computer core size. The required address space is automatically mapped onto slow disc memory by the operating system. When the simulation size is very large, frequent random accesses to slow memory occur during the charge accumulation and particle pushing processes. Accesses to slow memory significantly reduce the execution rate of the simulation. We demonstrate in this paper that with the proper choice of sorting algorithm, a nominal amount of sorting to keep physically adjacent particles near particles with neighboring array indices can reduce random access to slow memory, increase the efficiency of the I/O system, and hence reduce the required computing time. (orig.)
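The abstract does not say which sorting algorithm was chosen. As one plausible option, the sketch below uses a counting sort by grid-cell index in an assumed 1D periodic domain; the function names and parameters are hypothetical. After sorting, particles in the same or neighboring cells also have neighboring array indices, so charge accumulation walks the grid nearly sequentially instead of touching pages at random.

```python
import random


def cell_index(x, length, n_cells):
    """Grid cell containing position x in a periodic domain [0, length)."""
    return int(x / length * n_cells) % n_cells


def sort_particles_by_cell(xs, vs, length, n_cells):
    """Counting sort of particle arrays (positions xs, velocities vs) by cell index."""
    cells = [cell_index(x, length, n_cells) for x in xs]
    counts = [0] * n_cells
    for c in cells:
        counts[c] += 1
    # Exclusive prefix sum: first output slot for each cell.
    starts, total = [], 0
    for c in counts:
        starts.append(total)
        total += c
    out_x, out_v = [0.0] * len(xs), [0.0] * len(vs)
    for x, v, c in zip(xs, vs, cells):
        out_x[starts[c]] = x
        out_v[starts[c]] = v
        starts[c] += 1
    return out_x, out_v


rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(1000)]
vs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sx, sv = sort_particles_by_cell(xs, vs, 10.0, 64)
```

A counting sort runs in linear time and keeps each particle's position and velocity together, which matters because only occasional re-sorting is needed as particles drift between cells.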
Large-scale particle simulations in a virtual-memory computer

International Nuclear Information System (INIS)

Gray, P.C.; Wagner, J.S.; Tajima, T.; Million, R.

1982-08-01

Virtual memory computers are capable of executing large-scale particle simulations even when the memory requirements exceed the computer core size. The required address space is automatically mapped onto slow disc memory by the operating system. When the simulation size is very large, frequent random accesses to slow memory occur during the charge accumulation and particle pushing processes. Accesses to slow memory significantly reduce the execution rate of the simulation. We demonstrate in this paper that with the proper choice of sorting algorithm, a nominal amount of sorting to keep physically adjacent particles near particles with neighboring array indices can reduce random access to slow memory, increase the efficiency of the I/O system, and hence, reduce the required computing time
Utilizing a multiprocessor architecture - The performance of MIDAS

International Nuclear Information System (INIS)

Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

1983-01-01

The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements

Memory Hierarchy Design for Next Generation Scalable Many-core Platforms

OpenAIRE

Azarkhish, Erfan

2016-01-01

Performance and energy consumption in modern computing platforms are largely dominated by the memory hierarchy. The increasing computational power of multiprocessors and accelerators, and the emergence of data-intensive workloads (e.g. large-scale graph traversal and scientific algorithms) requiring fast transfer of large volumes of data, are two main trends that intensify this problem by putting even higher pressure on the memory hierarchy. This increasing gap between computation spe...
An optimal multi-channel memory controller for real-time systems

NARCIS (Netherlands)

Gomony, M.D.; Akesson, K.B.; Goossens, K.G.W.

2013-01-01

Optimal utilization of a multi-channel memory, such as Wide IO DRAM, as shared memory in multi-processor platforms depends on the mapping of memory clients to the memory channels, the granularity at which the memory requests are interleaved in each channel, and the bandwidth and memory capacity
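To make "the granularity at which the memory requests are interleaved" concrete, the sketch below maps byte addresses to channels round-robin in blocks of `granularity` bytes. It illustrates the concept only and is not the controller proposed in the paper; the function names and the simple modulo mapping are assumptions.

```python
def map_address(addr, n_channels, granularity):
    """Map a byte address to (channel, address within that channel).

    Consecutive blocks of `granularity` bytes are spread round-robin over channels.
    """
    block, offset = divmod(addr, granularity)
    channel = block % n_channels
    local_block = block // n_channels
    return channel, local_block * granularity + offset


def channels_touched(addr, size, n_channels, granularity):
    """Set of channels that a request of `size` bytes starting at `addr` spans."""
    first = addr // granularity
    last = (addr + size - 1) // granularity
    return {b % n_channels for b in range(first, last + 1)}
```

Fine interleaving spreads one large request over several channels for bandwidth, while coarse interleaving keeps it on a single channel, leaving the other channels free for other clients; choosing this trade-off per client is part of the mapping problem the abstract describes.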
Meeting the memory challenges of brain-scale network simulation

Directory of Open Access Journals (Sweden)

Susanne Kunkel

2012-01-01

Full Text Available The development of high-performance simulation software is crucial for studying the brain connectome. Using connectome data to generate neurocomputational models requires software capable of coping with models on a variety of scales: from the microscale, investigating plasticity and dynamics of circuits in local networks, to the macroscale, investigating the interactions between distinct brain regions. Prior to any serious dynamical investigation, the first task of network simulations is to check the consistency of data integrated in the connectome and to constrain ranges for yet unknown parameters. Thanks to distributed computing techniques, it is possible today to routinely simulate local cortical networks of around 10^5 neurons with up to 10^9 synapses on clusters and multi-processor shared-memory machines. However, brain-scale networks are one or two orders of magnitude larger than such local networks, in terms of numbers of neurons and synapses as well as in terms of computational load. Such networks have been studied in individual studies, but the underlying simulation technologies have neither been described in sufficient detail to be reproducible nor made publicly available. Here, we discover that as the network model sizes approach the regime of meso- and macroscale simulations, memory consumption on individual compute nodes becomes a critical bottleneck. This is especially relevant on modern supercomputers such as the Blue Gene/P architecture, where the available working memory per CPU core is rather limited. We develop a simple linear model to analyze the memory consumption of the constituent components of a neuronal simulator as a function of network size and the number of cores used. This approach has multiple benefits. The model enables identification of key contributing components to memory saturation and prediction of the effects of potential improvements to code before any implementation takes place.
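A linear memory model of the kind described can be sketched as below. The component split and every coefficient are hypothetical placeholders, not values from the paper: N is the number of neurons, K the synapses per neuron, and M the number of cores. The point of such a model is that terms proportional to N/M shrink as cores are added, while any per-core bookkeeping proportional to the full N does not.

```python
# Hypothetical coefficients in bytes; a real model would fit them to measurements.
DEFAULT_COEFFS = {
    "base": 50e6,               # fixed cost of the simulator process per core
    "per_neuron_global": 8.0,   # bookkeeping every core keeps for *all* N neurons
    "per_neuron_local": 1500.0, # cost of each neuron hosted on this core
    "per_synapse": 48.0,        # cost of each synapse stored on this core
}


def memory_breakdown(n_neurons, k_syn, n_cores, coeffs=DEFAULT_COEFFS):
    """Per-core memory of each component under a linear model."""
    local = n_neurons / n_cores
    return {
        "base": coeffs["base"],
        "global": coeffs["per_neuron_global"] * n_neurons,
        "neurons": coeffs["per_neuron_local"] * local,
        "synapses": coeffs["per_synapse"] * local * k_syn,
    }


def memory_per_core(n_neurons, k_syn, n_cores, coeffs=DEFAULT_COEFFS):
    """Total predicted per-core memory in bytes."""
    return sum(memory_breakdown(n_neurons, k_syn, n_cores, coeffs).values())
```

With these made-up numbers, a network of 10^8 neurons with 10^4 synapses each on 10^5 cores spends more memory per core on the global term (800 MB) than on its local synapses (480 MB), showing how such a model can flag a component for optimization before any code is changed.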
Software for event oriented processing on multiprocessor systems

International Nuclear Information System (INIS)

Fischler, M.; Areti, H.; Biel, J.; Bracker, S.; Case, G.; Gaines, I.; Husby, D.; Nash, T.

1984-08-01

Computing-intensive problems that require the processing of numerous essentially independent events are natural customers for large-scale multi-microprocessor systems. This paper describes the software required to support users with such problems in a multiprocessor environment. It is based on experience with, and development work aimed at, processing very large amounts of high energy physics data.
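Because the events are essentially independent, a simple processor farm is enough: workers pull events from a shared queue and no coordination is needed beyond collecting results. The sketch below is a generic illustration, not the software described in the paper; threads stand in for microprocessors, and `handler` is a hypothetical per-event reconstruction routine.

```python
import queue
import threading


def process_events(events, handler, n_workers=4):
    """Farm independent events out to workers; return results in event order."""
    todo = queue.Queue()
    for item in enumerate(events):
        todo.put(item)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                idx, event = todo.get_nowait()
            except queue.Empty:
                return                 # no work left for this worker
            out = handler(event)       # events are independent: no shared state
            with lock:
                results[idx] = out

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [results[i] for i in range(len(events))]
```

Pulling work from a queue balances load automatically when events take different amounts of time. In CPython, CPU-bound handlers would need processes rather than threads to run in parallel.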
A survey of Tumult, a real-time multi-processor system

International Nuclear Information System (INIS)

Jansen, P.G.

1986-01-01

Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular, extendible multiprocessor system. All memory is distributed, and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message-passing mechanism. Communication links and processes are dynamically created and disposed of by the application. In this article a brief description of the system is given, with emphasis on communication aspects. (Auth.)
Investigation of implementing a synchronization protocol under multiprocessors hierarchical scheduling

NARCIS (Netherlands)

Nemati, F.; Behnam, M.; Bril, R.J.

2009-01-01

In the multi-core and multiprocessor domain, considerable work has been done on scheduling techniques that assume real-time tasks are independent. In practice, a typical real-time system shares logical resources among tasks. However, synchronization in the multiprocessor area has not
A possible approach to estimating the operational efficiency of multiprocessor systems

International Nuclear Information System (INIS)

Kuznetsov, N.Y.; Gorlach, S.P.; Sumskaya, A.A.

1984-01-01

This article presents a mathematical model that constructs upper and lower estimates of the efficiency with which a multiprocessor system of a specific architecture solves a large class of problems. Efficiency depends on the system's architecture (e.g., the number of processors, memory volume, the number of communication links, switching speed) and on the types of problems it is intended to solve. The behavior of the model is considered in a stationary mode. The model is used to evaluate the efficiency of a particular algorithm implemented on a multiprocessor system. It is concluded that the model is flexible and enables the investigation of a broad class of problems in computational mathematics, including linear algebra and boundary-value problems of mathematical physics.
Hybrid shared/distributed parallelism for 3D characteristics transport solvers

International Nuclear Information System (INIS)

Dahmani, M.; Roy, R.

2005-01-01

In this paper, we present a new hybrid parallel model for solving large-scale three-dimensional neutron transport problems used in nuclear reactor simulations. Large heterogeneous reactor problems, such as those that occur when simulating CANDU cores, have remained computationally intensive and impractical for routine applications on single-node or even vector computers. Based on the characteristics method, this new model is designed to solve the transport equation after distributing the calculation load over a network of shared-memory multiprocessors. The tracks are either generated on the fly at each characteristics sweep or stored in sequential files. Load balancing is taken into account by estimating the calculation load of the tracks and distributing batches of uniform load to each node of the network. Moreover, the communication overhead can be predicted after benchmarking the latency and bandwidth with an appropriate network test suite. These models are useful for predicting the performance of the parallel applications and for analyzing the scalability of the parallel systems. (authors)
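The two estimates mentioned, per-track load and communication cost, can be illustrated as follows. The abstract does not give the balancing algorithm, so this sketch swaps in the standard greedy longest-processing-time (LPT) heuristic; the latency-bandwidth formula is the usual first-order model. Function names and inputs are assumptions.

```python
import heapq


def balance_tracks(track_costs, n_nodes):
    """Greedy LPT: give each track, heaviest first, to the least-loaded node so far.

    Returns (assignment: node -> list of track ids, loads: node -> total cost).
    """
    heap = [(0.0, node) for node in range(n_nodes)]   # (current load, node id)
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}
    ordered = sorted(enumerate(track_costs), key=lambda tc: tc[1], reverse=True)
    for track, cost in ordered:
        load, node = heapq.heappop(heap)
        assignment[node].append(track)
        heapq.heappush(heap, (load + cost, node))
    loads = {node: load for load, node in heap}
    return assignment, loads


def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order estimate of one message's cost from benchmarked latency and bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s
```

LPT is simple and usually close to balanced, though not optimal: for costs 5, 4, 3, 3, 3 on two nodes it yields loads 8 and 10, whereas 9 and 9 is possible.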
Scaling Non-Regular Shared-Memory Codes by Reusing Custom Loop Schedules

Directory of Open Access Journals (Sweden)

Dimitrios S. Nikolopoulos

2003-01-01

Full Text Available In this paper we explore the idea of customizing and reusing loop schedules to improve the scalability of non-regular numerical codes on shared-memory architectures with non-uniform memory access latency. The main objective is to implicitly set up affinity links between threads and data by devising loop schedules that achieve balanced work distribution within irregular data spaces, and by reusing them as much as possible during the execution of the program for better memory access locality. This transformation provides a great deal of flexibility in optimizing locality without compromising the simplicity of the shared-memory programming paradigm. In particular, the programmer does not need to explicitly distribute data among processors. The paper presents practical examples from real applications and experiments showing the efficiency of the approach.
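One way to build such a schedule is sketched below, assuming the per-iteration costs are known or estimated (for example, nonzeros per row of a sparse matrix). Prefix-sum partitioning into contiguous chunks is a stand-in chosen for illustration, not necessarily the paper's method, and the names are hypothetical. Because each thread receives the same contiguous range on every sweep, the schedule computed once also fixes which data each thread touches.

```python
from bisect import bisect_left
from itertools import accumulate


def balanced_schedule(costs, n_threads):
    """Split iterations 0..n-1 into contiguous (start, stop) ranges of roughly equal cost."""
    if not costs:
        return [(0, 0)] * n_threads
    prefix = list(accumulate(costs))        # prefix[i] = total cost of iterations 0..i
    total = prefix[-1]
    bounds = [0]
    for t in range(1, n_threads):
        target = total * t / n_threads
        # End this chunk just after the first iteration whose running cost reaches the target.
        bounds.append(max(bounds[-1], bisect_left(prefix, target) + 1))
    bounds.append(len(costs))
    return [(bounds[t], bounds[t + 1]) for t in range(n_threads)]
```

Typical use is to compute `schedule = balanced_schedule(row_costs, n_threads)` once and reuse it for every time step while the irregular structure is unchanged, so thread `tid` always processes rows `schedule[tid][0]` to `schedule[tid][1] - 1` and its data stays in nearby memory.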
An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

Science.gov (United States)

Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

2018-03-01

Cache replacement policies in chip multiprocessors (CMPs) have been investigated extensively and have been shown to improve shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also arise from drawbacks of the commonly used Least Recently Used (LRU) policy in multiprocessor systems, namely cache lines residing in the cache longer than required. In image processing analysis of, for example, extrapulmonary tuberculosis (TB), an accurate diagnosis of tissue specimens is required. Therefore, a fast and reliable shared memory management system is needed to execute algorithms that process vast numbers of specimen images. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near-distance promotion, and the concept of ownership in the eviction policy to effectively reduce cache thrashing and to avoid resource stealing among the processors.
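A simplified, hypothetical reading of the insertion and promotion parts of MI2PP for a single cache set is sketched below: new lines enter at a fixed middle position of the recency stack, a hit promotes a line by two positions instead of straight to MRU, and the LRU line is evicted. The ownership-based eviction and the cache partitioning from the abstract are not modeled, and the class name and parameters are assumptions.

```python
class MiddleInsertPromoteSet:
    """One cache set managed with a recency stack (index 0 = MRU, last index = LRU)."""

    def __init__(self, ways, insert_pos=None, promote_by=2):
        self.ways = ways
        self.insert_pos = ways // 2 if insert_pos is None else insert_pos
        self.promote_by = promote_by
        self.stack = []

    def access(self, tag):
        """Return True on a hit, False on a miss, updating the recency stack."""
        if tag in self.stack:
            idx = self.stack.index(tag)
            self.stack.pop(idx)
            # Near-distance promotion: move up only `promote_by` positions, not to MRU.
            self.stack.insert(max(0, idx - self.promote_by), tag)
            return True
        if len(self.stack) == self.ways:
            self.stack.pop()               # evict the LRU line
        # Static middle insertion: new lines start halfway down the stack.
        self.stack.insert(min(self.insert_pos, len(self.stack)), tag)
        return False
```

Because new lines enter halfway down the stack, a burst of single-use lines only churns the lower half of the set, so reused lines in the upper half survive; this is the scan resistance that middle-insertion policies aim for.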
The FORCE - A highly portable parallel programming language

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
One-Step Programmable Arbiters for Multiprocessors

DEFF Research Database (Denmark)

Højberg, Kristian Søe

1978-01-01

When processors in a multiprocessor system demand service from a shared bus in an asynchronous mode, a synchronous state arbiter resolves conflicts and allocates resources. Independent of the combination of requests, only one state transition is required from a free to allocated resource...
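As a behavioral illustration only (the paper describes a hardware state machine, not software), a programmable priority arbiter can be modeled as below: one call corresponds to one state transition, and "programmable" is represented by a priority list that can be changed between grants. The round-robin reprogramming rule and all names are assumptions.

```python
def arbitrate(requests, priority):
    """Grant the shared bus to exactly one requester in a single step.

    requests : set of processor ids currently requesting the bus
    priority : programmable order, highest priority first
    Returns the granted id, or None if no processor is requesting.
    """
    for proc in priority:
        if proc in requests:
            return proc
    return None


def rotate(priority, granted):
    """Optional reprogramming for fairness: the winner drops to lowest priority."""
    i = priority.index(granted)
    return priority[i + 1:] + priority[:i + 1]
```

Whatever combination of requests is present, the grant is decided in one pass, which mirrors the single free-to-allocated transition claimed in the abstract.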
Speedup predictions on large scientific parallel programs

International Nuclear Information System (INIS)

Williams, E.; Bobrowicz, F.

1985-01-01

How much speedup can we expect for large scientific parallel programs running on supercomputers? For insight into this problem, we extend the parallel processing environment currently existing on the Cray X-MP (a shared-memory multiprocessor with at most four processors) to a simulated N-processor environment, where N ≥ 1. Several large scientific parallel programs from Los Alamos National Laboratory were run in this simulated environment, and speedups were predicted. A speedup of 14.4 on 16 processors was measured for one of the three most used codes at the Laboratory.
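To put the reported figure in context, the worked arithmetic below computes the parallel efficiency E and the experimentally determined serial fraction e (the Karp-Flatt metric) for S = 14.4 on N = 16. The metric is not used in the abstract; it is added here as an interpretation that assumes Amdahl's-law behavior.

```latex
E = \frac{S}{N} = \frac{14.4}{16} = 0.90,
\qquad
e = \frac{1/S - 1/N}{1 - 1/N}
  = \frac{1/14.4 - 1/16}{1 - 1/16}
  = \frac{1/144}{15/16}
  = \frac{1}{135} \approx 0.0074
```

Under that assumption, a serial fraction of under 1% is enough to cost 10% efficiency at 16 processors, and Amdahl's law would cap the speedup of this code at 1/e = 135 no matter how many processors were added.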
Shared memories reveal shared structure in neural activity across individuals

Science.gov (United States)

Chen, J.; Leong, Y.C.; Honey, C.J.; Yong, C.H.; Norman, K.A.; Hasson, U.

2016-01-01

Our lives revolve around sharing experiences and memories with others. When different people recount the same events, how similar are their underlying neural representations? Participants viewed a fifty-minute movie, then verbally described the events during functional MRI, producing unguided detailed descriptions lasting up to forty minutes. As each person spoke, event-specific spatial patterns were reinstated in default-network, medial-temporal, and high-level visual areas. Individual event patterns were both highly discriminable from one another and similar between people, suggesting consistent spatial organization. In many high-order areas, patterns were more similar between people recalling the same event than between recall and perception, indicating systematic reshaping of percept into memory. These results reveal the existence of a common spatial organization for memories in high-level cortical areas, where encoded information is largely abstracted beyond sensory constraints; and that neural patterns during perception are altered systematically across people into shared memory representations for real-life events. PMID:27918531
One-way shared memory

DEFF Research Database (Denmark)

Schoeberl, Martin

2018-01-01

Standard multicore processors use the shared main memory via the on-chip caches for communication between cores. However, this form of communication has two limitations: (1) it is hardly time-predictable and therefore not a good solution for real-time systems and (2) this single shared memory...... is a bottleneck in the system. This paper presents a communication architecture for time-predictable multicore systems where core-local memories are distributed on the chip. A network-on-chip constantly copies data from a sender core-local memory to a receiver core-local memory. As this copying is performed...... in one direction we call this architecture a one-way shared memory. With the use of time-division multiplexing for the memory accesses and the network-on-chip routers we achieve a time-predictable solution where the communication latency and bandwidth can be bounded. An example architecture for a 3...
Episodic memory in aspects of large-scale brain networks

Science.gov (United States)

Jeong, Woorim; Chung, Chun Kee; Kim, June Sic

2015-01-01

Understanding human episodic memory in terms of large-scale brain networks has become one of the central themes in neuroscience over the last decade. Traditionally, episodic memory was regarded as relying mostly on medial temporal lobe (MTL) structures. However, recent studies have suggested the involvement of a more widely distributed cortical network and the importance of its interactive roles in the memory process. Both direct and indirect neuro-modulations of the memory network have been tried in experimental treatments of memory disorders. In this review, we focus on the functional organization of the MTL and other neocortical areas in episodic memory. Task-related neuroimaging studies together with lesion studies suggested that specific sub-regions of the MTL are responsible for specific components of memory. However, recent studies have emphasized that connectivity within MTL structures and even their network dynamics with other cortical areas are essential in the memory process. Resting-state functional network studies have also revealed that memory function is subserved not only by the MTL system but also by a distributed network, particularly the default-mode network (DMN). Furthermore, researchers have begun to investigate memory networks throughout the entire brain, not restricted to a specific resting-state network (RSN). Altered patterns of functional connectivity (FC) among distributed brain regions were observed in patients with memory impairments. Recently, studies have shown that brain stimulation may impact memory by modulating functional networks, with future implications for a novel interventional therapy for memory impairment. PMID:26321939
Episodic memory in aspects of large-scale brain networks

Directory of Open Access Journals (Sweden)

Woorim Jeong

2015-08-01

Full Text Available Understanding human episodic memory in terms of large-scale brain networks has become one of the central themes in neuroscience over the last decade. Traditionally, episodic memory was regarded as relying mostly on medial temporal lobe (MTL) structures. However, recent studies have suggested the involvement of a more widely distributed cortical network and the importance of its interactive roles in the memory process. Both direct and indirect neuro-modulations of the memory network have been tried in experimental treatments of memory disorders. In this review, we focus on the functional organization of the MTL and other neocortical areas in episodic memory. Task-related neuroimaging studies together with lesion studies suggested that specific sub-regions of the MTL are responsible for specific components of memory. However, recent studies have emphasized that connectivity within MTL structures and even their network dynamics with other cortical areas are essential in the memory process. Resting-state functional network studies have also revealed that memory function is subserved not only by the MTL system but also by a distributed network, particularly the default-mode network. Furthermore, researchers have begun to investigate memory networks throughout the entire brain, not restricted to a specific resting-state network. Altered patterns of functional connectivity among distributed brain regions were observed in patients with memory impairments. Recently, studies have shown that brain stimulation may impact memory by modulating functional networks, with future implications for a novel interventional therapy for memory impairment.
A Time-predictable Memory Network-on-Chip

DEFF Research Database (Denmark)

Schoeberl, Martin; Chong, David VH; Puffitsch, Wolfgang

2014-01-01

To derive safe bounds on worst-case execution times (WCETs), all components of a computer system need to be time-predictable: the processor pipeline, the caches, the memory controller, and memory arbitration on a multicore processor. This paper presents a solution for time-predictable memory...... arbitration and access for chip-multiprocessors. The memory network-on-chip is organized as a tree with time-division multiplexing (TDM) of accesses to the shared memory. The TDM based arbitration completely decouples processor cores and allows WCET analysis of the memory accesses on individual cores without...
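The latency bound that TDM arbitration provides can be sketched as follows. The assumptions, not taken from the paper, are that core `c` owns every slot whose index is congruent to `c` modulo the number of cores, and that an access may start only at the beginning of its own slot and fits within it; the function names are hypothetical. The bound depends only on the schedule, which is why each core can be analyzed in isolation.

```python
def next_slot_start(core, cycle, n_cores, slot_cycles):
    """First cycle >= `cycle` at which one of `core`'s TDM slots begins."""
    k = -(-cycle // slot_cycles)          # ceiling: index of the next slot boundary
    k += (core - k) % n_cores             # advance to the next slot owned by `core`
    return k * slot_cycles


def worst_case_latency(n_cores, slot_cycles, access_cycles):
    """Upper bound on one memory access under TDM arbitration.

    The worst case is a request arriving one cycle after its own slot began:
    it waits n_cores * slot_cycles - 1 cycles, regardless of other cores' traffic.
    """
    return n_cores * slot_cycles - 1 + access_cycles
```

Brute-forcing every arrival cycle over one TDM period confirms that the longest wait equals the bound's waiting term.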
Performance of Multithreaded Chip Multiprocessors And Implications for Operating System Design

OpenAIRE

Fedorova, Alexandra; Seltzer, Margo I.; Small, Christopher A.; Nussbaum, Daniel

2005-01-01

An operating system’s design is often influenced by the architecture of the target hardware. While uniprocessor and multiprocessor architectures are well understood, such is not the case for multithreaded chip multiprocessors (CMT) – a new generation of processors designed to improve performance of memory-intensive applications. The first systems equipped with CMT processors are just becoming available, so it is critical that we now understand how to obtain the best performance from such syst...
Monte Carlo photon transport on shared memory and distributed memory parallel processors

International Nuclear Information System (INIS)

Martin, W.R.; Wan, T.C.; Abdel-Rahman, T.S.; Mudge, T.N.; Miura, K.

1987-01-01

Parallelized Monte Carlo algorithms for analyzing photon transport in an inertially confined fusion (ICF) plasma are considered. Algorithms were developed for shared memory (vector and scalar) and distributed memory (scalar) parallel processors. The shared memory algorithm was implemented on the IBM 3090/400, and timing results are presented for dedicated runs with two, three, and four processors. Two alternative distributed memory algorithms (replication and dispatching) were implemented on a hypercube parallel processor (1 through 64 nodes). The replication algorithm yields essentially full efficiency for all cube sizes; with the 64-node configuration, the absolute performance is nearly the same as with the CRAY X-MP. The dispatching algorithm also yields efficiencies above 80% in a large simulation for the 64-processor configuration
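The replication strategy can be illustrated with a toy problem: photons crossing a purely absorbing one-dimensional slab, whose exact transmission probability is exp(-mu * thickness). This is not the ICF photon transport model from the paper; the problem, the per-node seeds, and the function names are assumptions, and the nodes are simulated sequentially. Each node runs the full problem on its own random stream, and only the final tallies are combined, which is why replication needs almost no communication.

```python
import math
import random


def transmitted_histories(n_histories, mu, thickness, seed):
    """Count photons that cross a purely absorbing slab without interacting.

    Each history samples an exponential free path; the photon is transmitted
    when that path is longer than the slab thickness.
    """
    rng = random.Random(seed)             # separate random stream per node
    return sum(1 for _ in range(n_histories) if rng.expovariate(mu) > thickness)


def replicated_run(n_nodes, histories_per_node, mu, thickness):
    """Replication: every node solves the same problem; tallies are summed at the end."""
    tallies = [transmitted_histories(histories_per_node, mu, thickness, seed=node)
               for node in range(n_nodes)]
    return sum(tallies) / (n_nodes * histories_per_node)


estimate = replicated_run(8, 25000, mu=1.0, thickness=1.0)
exact = math.exp(-1.0)
```

With 200,000 histories in total, the statistical error of the estimate is roughly 0.001, so it should agree with the exact value to about two decimal places.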

Control and Reliability of Optical Networks in Multiprocessors

Science.gov (United States)

Olsen, James Jonathan

1993-01-01

-based feedback system using BER levels for laser drive control. For hard failures, a common telecommunications solution is to provide redundant spare optical links to replace failed ones. Unfortunately, this involves the inclusion of many extra, otherwise unneeded optical links, most of which will remain unused throughout the system lifetime. I present a new approach, which I call 'bandwidth fallback', which allows continued use of partially-failed channels while still accepting full-width data inputs. This provides, at a very small performance penalty, a high reliability level while needing no spare links at all. I conclude that the drive control and reliability problems of semiconductor lasers do not bar their use in large scale multiprocessors, since inexpensive system-level solutions to them are possible. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617 -253-5668; Fax 617-253-1690.) (Abstract shortened by UMI.).
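The abstract does not describe how bandwidth fallback is implemented. One hypothetical reading is sketched below: a full-width word is striped over only the lanes that still work, taking ceil(W / w) transfer cycles for W data bits and w working lanes, and the receiver reassembles the bits in the same lane order. The function names and data layout are assumptions; `working_lanes` must be non-empty.

```python
def send_with_fallback(word_bits, working_lanes):
    """Serialize a full-width word over only the lanes that still work.

    word_bits     : list of 0/1 values, one per lane of the full-width channel
    working_lanes : sorted, non-empty list of lane indices that have not failed
    Returns a list of transfer cycles; each cycle maps lane index -> bit.
    """
    cycles = []
    for start in range(0, len(word_bits), len(working_lanes)):
        chunk = word_bits[start:start + len(working_lanes)]
        cycles.append(dict(zip(working_lanes, chunk)))
    return cycles


def receive_with_fallback(cycles, working_lanes):
    """Reassemble the word in the same lane order the sender used."""
    bits = []
    for cycle in cycles:
        bits.extend(cycle[lane] for lane in working_lanes if lane in cycle)
    return bits
```

With two of eight lanes failed, a byte takes two cycles instead of one: throughput drops, but the sender still accepts full-width input and no spare links are needed.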
E-Token Energy-Aware Proportionate Sharing Scheduling Algorithm for Multiprocessor Systems

Directory of Open Access Journals (Sweden)

Pasupuleti Ramesh

2017-01-01

Full Text Available Wireless sensor networks (WSNs) play a vital role in applications ranging from small-scale healthcare surveillance systems to large-scale environmental monitoring. Designing them for energy-constrained applications is challenging. Sensors in WSNs are expected to run unattended for long periods; replacing exhausted batteries is costly, and in hostile environments it may not be possible at all. Multiprocessors are used in WSNs for high-performance scientific computing, where each processor is assigned the same or a different workload. As the computational demands on the system increase, energy-efficient approaches play an important role in extending system lifetime. Energy efficiency is commonly achieved with a proportionate fair scheduler, but this introduces an abnormal overloading effect. To overcome these problems, E-token Energy-Aware Proportionate Sharing (EEAPS) scheduling is proposed here. The power consumption of each thread/task is calculated, and the tasks are allotted to the multiple processors through an auctioning mechanism. The algorithm is simulated using the real-time simulator RTSIM, and the results are evaluated.
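The auctioning step can be illustrated with a minimal sketch, which does not model e-tokens or proportionate sharing. Each processor bids an estimated energy cost for a task, the lowest bid wins, and the winner's load grows. The bid formula, with its `base_power` and `load_penalty` parameters, is purely hypothetical, chosen only so that busier processors bid higher.

```python
def auction_allocate(task_powers, n_procs, base_power=1.0, load_penalty=0.2):
    """Assign each task to the processor with the lowest energy bid.

    A processor's bid is the task's power estimate scaled by a hypothetical
    load factor (base_power + load_penalty * current load).
    Ties go to the lowest-numbered processor.
    """
    load = [0.0] * n_procs
    assignment = []
    for power in task_powers:
        bids = [power * (base_power + load_penalty * load[p]) for p in range(n_procs)]
        winner = min(range(n_procs), key=bids.__getitem__)   # lowest bid wins
        load[winner] += power
        assignment.append(winner)
    return assignment, load
```

Because bids rise with load, the auction spreads work across processors instead of piling it onto one, which is the overloading effect the abstract sets out to avoid.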
Distributed parallel messaging for multiprocessor systems

Science.gov (United States)

Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

2013-06-04

A method and apparatus for distributed parallel messaging in a parallel computing system. At each node of a multiprocessor network, the apparatus includes multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into, and multiple reception from, a network in parallel. The reception side of the messaging unit (MU) includes a switch interface that enables data from a packet received from the network to be written to the memory system. The transmission side of the messaging unit includes a switch interface for reading from the memory system when injecting packets into the network.
Parallel clustering algorithm for large-scale biological data sets.

Science.gov (United States)

Wang, Minchao; Zhang, Wu; Ding, Wang; Dai, Dongbo; Zhang, Huiran; Xie, Hao; Chen, Luonan; Guo, Yike; Xie, Jiang

2014-01-01

The recent explosion of biological data poses a great challenge for traditional clustering algorithms. As data sets grow, cluster identification requires much more memory and longer runtimes. The affinity propagation algorithm outperforms many other classical clustering algorithms and is widely applied in biological research. However, its time and space complexity become a major bottleneck when handling large-scale data sets. Moreover, the algorithm clusters data sets based on the similarities between data pairs, so it needs a similarity matrix before it can run, and constructing that matrix takes a long time. This paper proposes two types of parallel architecture to accelerate both the construction of the similarity matrix and the affinity propagation algorithm itself. A shared-memory architecture is used to construct the similarity matrix. A distributed system, chosen for its large memory size and computing capacity, runs the affinity propagation algorithm. The method uses a data partitioning and reduction scheme designed to minimize the global communication cost among processes. A speedup of 100 is achieved with 128 cores, and the runtime is reduced from several hours to a few seconds. This indicates that the parallel algorithm can handle large-scale data sets effectively. The parallel affinity propagation also performs well when clustering large-scale gene data (microarray) and detecting families in large protein superfamilies.
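The shared-memory step described above can be illustrated with a small Python sketch. It builds a similarity matrix by giving each worker a disjoint set of rows in one shared matrix, so the workers never need to exchange data. It assumes the common affinity-propagation choice of negative squared Euclidean distance as the similarity. The function names are hypothetical, and this is not the authors' code. Python threads only show the structure here, because the interpreter lock prevents real speedup for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def neg_sq_dist(a, b):
    """Similarity commonly used by affinity propagation (assumption):
    the negative squared Euclidean distance between two points."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def similarity_matrix(points, n_workers=4):
    """Build the full n x n similarity matrix in one shared buffer.

    Rows are partitioned cyclically across workers. Because each worker
    owns a disjoint set of rows, no locking or communication is needed.
    """
    n = len(points)
    S = [[0.0] * n for _ in range(n)]

    def fill_rows(rows):
        for i in rows:
            for j in range(n):
                S[i][j] = neg_sq_dist(points[i], points[j])

    partitions = [range(w, n, n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(fill_rows, partitions))  # list() surfaces worker errors
    return S
```

Affinity propagation also needs "preference" values on the diagonal. Those are usually set separately, for example to the median similarity, before clustering starts.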
Multiprocessor Global Scheduling on Frame-Based DVFS Systems

OpenAIRE

Berten, Vandy; Goossens, Joël

2008-01-01

In this work, we are interested in multiprocessor energy efficient systems where task durations are not known in advance but are known stochastically. More precisely we consider global scheduling algorithms for frame-based multiprocessor stochastic DVFS (Dynamic Voltage and Frequency Scaling) systems. Moreover we consider processors with a discrete set of available frequencies. We provide a global scheduling algorithm, and formally show that no deadline will ever be mi...
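The general idea of picking a speed from a discrete set of frequencies can be sketched in a few lines. The sketch picks the lowest available frequency that still lets the remaining work finish before the frame deadline. It is not the authors' global stochastic algorithm. It makes two simplifying assumptions: it uses a worst-case cycle count instead of a probability distribution, and it assumes execution time is simply cycles divided by frequency.

```python
def lowest_safe_frequency(remaining_cycles, time_left, frequencies):
    """Return the smallest available frequency (Hz) that completes
    `remaining_cycles` of worst-case work within `time_left` seconds.

    Assumes execution time = cycles / frequency (no memory stalls) and
    that `frequencies` is the processor's discrete set of speeds.
    Returns None if even the fastest frequency would miss the deadline.
    """
    for f in sorted(frequencies):
        if remaining_cycles / f <= time_left:
            return f
    return None
```

A scheduler would call this at each decision point within a frame, passing the time left until the frame's end. It chooses the slowest safe speed because dynamic power drops sharply as frequency and voltage are lowered.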
Performing an allreduce operation using shared memory

Science.gov (United States)

Archer, Charles J [Rochester, MN]; Dozsa, Gabor [Ardsley, NY]; Ratterman, Joseph D [Rochester, MN]; Smith, Brian E [Rochester, MN]

2012-04-17

Methods, apparatus, and products are disclosed for performing an allreduce operation using shared memory that include: receiving, by at least one of a plurality of processing cores on a compute node, an instruction to perform an allreduce operation; establishing, by the core that received the instruction, a job status object for specifying a plurality of shared memory allreduce work units, the plurality of shared memory allreduce work units together performing the allreduce operation on the compute node; determining, by an available core on the compute node, a next shared memory allreduce work unit in the job status object; and performing, by that available core on the compute node, that next shared memory allreduce work unit.
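The work-unit scheme described above can be modeled in Python threads. A shared "job status object" hands out slices of the vectors as work units, and each available core claims the next unit and reduces that slice across all contributions. The class and function names, the slice-based work units, and the use of summation are all hypothetical choices for illustration. The patent does not specify them.

```python
import threading

class JobStatus:
    """Hypothetical job status object: a list of allreduce work units
    (index ranges of the vectors) plus a cursor guarded by a lock."""

    def __init__(self, length, unit_size):
        self.units = [(s, min(s + unit_size, length))
                      for s in range(0, length, unit_size)]
        self.next_unit = 0
        self.lock = threading.Lock()

    def claim(self):
        """Atomically return the next unclaimed work unit, or None when done."""
        with self.lock:
            if self.next_unit == len(self.units):
                return None
            unit = self.units[self.next_unit]
            self.next_unit += 1
            return unit

def shared_memory_allreduce(contributions, n_cores=4, unit_size=2):
    """Element-wise sum of all contributions; every contributor gets a copy.

    The caller plays the role of the core that received the allreduce
    instruction and set up the job status object.
    """
    length = len(contributions[0])
    result = [0] * length                     # shared result buffer
    job = JobStatus(length, unit_size)

    def core_loop():
        # Each "core" keeps claiming units until none remain, so faster
        # cores simply end up doing more of the work.
        while (unit := job.claim()) is not None:
            start, end = unit
            for i in range(start, end):
                result[i] = sum(vec[i] for vec in contributions)

    threads = [threading.Thread(target=core_loop) for _ in range(n_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [list(result) for _ in contributions]
```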
Method for prefetching non-contiguous data structures

Science.gov (United States)

Blumrich, Matthias A [Ridgefield, CT]; Chen, Dong [Croton On Hudson, NY]; Coteus, Paul W [Yorktown Heights, NY]; Gara, Alan G [Mount Kisco, NY]; Giampapa, Mark E [Irvington, NY]; Heidelberger, Philip [Cortlandt Manor, NY]; Hoenicke, Dirk [Ossining, NY]; Ohmacht, Martin [Brewster, NY]; Steinmacher-Burow, Burkhard D [Mount Kisco, NY]; Takken, Todd E [Mount Kisco, NY]; Vranas, Pavlos M [Bedford Hills, NY]

2009-05-05

Low-latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device. The locking device supports synchronization between the processors and the orderly sharing of the resources. A processor may access a resource only when it owns the lock associated with that resource. An attempt by a processor to acquire a lock requires only a single load operation, rather than the traditional atomic load followed by a store. The processor performs only a read, and the hardware locking device, not the processor, performs the subsequent write. A simple prefetching scheme for non-contiguous data structures is also disclosed. A memory line is redefined so that, in addition to the normal physical memory data, every line includes a pointer large enough to point to any other line in memory. These pointers, rather than some other predictive algorithm, determine which memory line to prefetch. This enables hardware to effectively prefetch memory access patterns that are non-contiguous but repetitive.
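A toy model can show why a per-line pointer helps with repetitive but non-contiguous access patterns. The abstract does not say how the pointers are filled in. This sketch therefore assumes a hypothetical training rule: whenever line b is accessed right after line a, b's address is stored in a's pointer. The cache here is an unbounded set that is flushed explicitly. Both are simplifications, not the patented hardware.

```python
class PointerPrefetchMemory:
    """Toy model: every memory line carries a pointer to another line.

    Hypothetical training rule (not from the source): when line b is
    accessed right after line a, b's address is stored in a's pointer.
    On each access, the line named by the current line's pointer is
    prefetched into the cache.
    """

    def __init__(self, n_lines):
        self.pointer = [None] * n_lines   # per-line prefetch pointer
        self.cache = set()                # addresses currently cached
        self.prev = None
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        if addr in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache.add(addr)          # demand fetch on a miss
        if self.prev is not None:
            self.pointer[self.prev] = addr  # remember the observed successor
        nxt = self.pointer[addr]
        if nxt is not None:
            self.cache.add(nxt)           # prefetch the pointed-to line
        self.prev = addr

    def flush(self):
        """Empty the cache but keep the learned per-line pointers."""
        self.cache.clear()
```

With the pattern 7, 2, 9, 4, the first pass misses on every access. After a cache flush, a second pass misses only on line 7. Each later line has already been prefetched through the pointer stored in the line before it.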
Performance of the coupled thermalhydraulics/neutron kinetics code R/P/C on workstation clusters and multiprocessor systems

International Nuclear Information System (INIS)

Hammer, C.; Paffrath, M.; Boeer, R.; Finnemann, H.; Jackson, C.J.

1996-01-01

The light water reactor core simulation code PANBOX has been coupled with the transient analysis code RELAP5 for the purpose of performing plant safety analyses with a three-dimensional (3-D) neutron kinetics model. The system has been parallelized to improve the computational efficiency. The paper describes the features of this system with emphasis on performance aspects. Performance results are given for different types of parallelization, i.e., using an automatic parallelizing compiler, using the portable PVM platform on a workstation cluster, using PVM on a shared memory multiprocessor, and using machine-dependent interfaces. (author)
Parallel algorithms for geometric connected component labeling on a hypercube multiprocessor

Science.gov (United States)

Belkhale, K. P.; Banerjee, P.

1992-01-01

Different algorithms for the geometric connected component labeling (GCCL) problem are defined, each of which involves d stages of message passing on a d-dimensional hypercube. The major idea is that in each stage, each processor of the hypercube multiprocessor increases its knowledge of the domain. The algorithms under consideration include the QUAD algorithm for a small number of processors and the Overlap Quad algorithm for a large number of processors, subject to the locality of the connected sets. These algorithms differ in their run time, memory requirements, and message complexity. They were implemented on an Intel iPSC2/D4/MX hypercube.
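The d-stage structure mentioned above follows the familiar hypercube dimension-exchange pattern. In stage k, node i communicates with neighbor i XOR 2^k, so after d stages every node has heard, indirectly, from all 2^d nodes. This sketch simulates that pattern, with each node's "knowledge" modeled as a set of label equivalences (an assumption). It then merges the equivalences with union-find. It is not the QUAD or Overlap Quad algorithm, which exchange only boundary information and bound message sizes.

```python
def hypercube_allgather(local_knowledge, d):
    """Dimension exchange on a simulated d-dimensional hypercube.

    local_knowledge[i] is the set held by node i (2**d nodes). In stage k,
    node i exchanges with neighbour i ^ (1 << k) and both keep the union,
    so after d stages every node holds everything.
    """
    assert len(local_knowledge) == 2 ** d
    know = [set(s) for s in local_knowledge]
    for k in range(d):
        know = [know[i] | know[i ^ (1 << k)] for i in range(2 ** d)]
    return know

def components(labels, equivalences):
    """Union-find over label equivalences; each label maps to the smallest
    label in its connected component."""
    parent = {x: x for x in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for a, b in equivalences:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    return {x: find(x) for x in labels}
```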
A Shared Scratchpad Memory with Synchronization Support

DEFF Research Database (Denmark)

Hansen, Henrik Enggaard; Maroun, Emad Jacob; Kristensen, Andreas Toftegaard

2017-01-01

Multicore processors usually communicate via shared memory, which is backed by a shared level 2 cache and a cache coherence protocol. However, this solution is not a good fit for real-time systems, where we need to provide tight guarantees on execution and memory access times. In this paper, we propose a shared scratchpad memory as a time-predictable communication and synchronization structure, instead of the level 2 cache. The shared on-chip memory is accessed via a time-division multiplexing arbiter, which isolates the execution time of load and store instructions between processing cores. Furthermore, the arbiter supports an extended time slot in which an atomic load and store instruction can be executed to implement synchronization primitives. In the evaluation we show that a shared scratchpad memory is an efficient communication structure for a small number of processors; in our setup, 9 cores...
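Time-division multiplexing makes memory access time predictable because each core's waiting time is bounded by the schedule, not by other cores' behavior. This sketch assumes a plain round-robin schedule with one equal-length slot per core. It also assumes an access can be served anywhere inside the core's own slot. The extended atomic slot mentioned above is not modeled, and the function names are hypothetical.

```python
def slot_owner(cycle, n_cores, slot_len):
    """Core that owns the shared memory in a given clock cycle under a
    round-robin TDM schedule with one `slot_len`-cycle slot per core."""
    return (cycle // slot_len) % n_cores

def wait_cycles(core, cycle, n_cores, slot_len):
    """Cycles that `core`, issuing a request at `cycle`, waits until its
    own slot begins (0 if it already owns the current slot)."""
    period = n_cores * slot_len
    slot_start = core * slot_len
    pos = cycle % period
    if slot_start <= pos < slot_start + slot_len:
        return 0
    return (slot_start - pos) % period

def worst_case_wait(n_cores, slot_len):
    """Bound used for timing analysis: a core that just missed its slot
    waits for every other core's slot to pass."""
    return (n_cores - 1) * slot_len
```

For example, with 4 cores and 2-cycle slots, a core that just missed its slot waits 6 cycles. This bound holds no matter what the other cores are doing, which is the property that matters for real-time analysis.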
Computational cost estimates for parallel shared memory isogeometric multi-frontal solvers

KAUST Repository

Woźniak, Maciej; Kuźnik, Krzysztof M.; Paszyński, Maciej R.; Calo, Victor M.; Pardo, D.

2014-01-01

In this paper we present computational cost estimates for parallel shared memory isogeometric multi-frontal solvers. The estimates show that the ideal isogeometric shared memory parallel direct solver scales as $O(p^2\log(N/p))$ for one-dimensional problems, $O(Np^2)$ for two-dimensional problems, and $O(N^{4/3}p^2)$ for three-dimensional problems, where $N$ is the number of degrees of freedom and $p$ is the polynomial order of approximation. The computational costs of the shared memory parallel isogeometric direct solver are compared with those of the sequential isogeometric direct solver. The sequential costs are $O(Np^2)$ for the one-dimensional case, $O(N^{1.5}p^3)$ for the two-dimensional case, and $O(N^2p^3)$ for the three-dimensional case. The shared memory version significantly reduces the growth of the computational cost in terms of both $N$ and $p$. Theoretical estimates are compared with numerical experiments performed with linear, quadratic, cubic, quartic, and quintic B-splines, in one and two spatial dimensions. © 2014 Elsevier Ltd. All rights reserved.
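Dividing the sequential cost expressions quoted above by the corresponding shared memory expressions, with constant factors ignored, gives the ideal speedup these estimates imply:

$$
S_{1D} = \frac{Np^2}{p^2\log(N/p)} = \frac{N}{\log(N/p)}, \qquad
S_{2D} = \frac{N^{1.5}p^3}{Np^2} = N^{0.5}p, \qquad
S_{3D} = \frac{N^{2}p^3}{N^{4/3}p^2} = N^{2/3}p.
$$

In two and three dimensions, the ideal gain therefore grows with both the problem size $N$ and the polynomial order $p$.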
Computational cost estimates for parallel shared memory isogeometric multi-frontal solvers

KAUST Repository

Woźniak, Maciej

2014-06-01

In this paper we present computational cost estimates for parallel shared memory isogeometric multi-frontal solvers. The estimates show that the ideal isogeometric shared memory parallel direct solver scales as $O(p^2\log(N/p))$ for one-dimensional problems, $O(Np^2)$ for two-dimensional problems, and $O(N^{4/3}p^2)$ for three-dimensional problems, where $N$ is the number of degrees of freedom and $p$ is the polynomial order of approximation. The computational costs of the shared memory parallel isogeometric direct solver are compared with those of the sequential isogeometric direct solver. The sequential costs are $O(Np^2)$ for the one-dimensional case, $O(N^{1.5}p^3)$ for the two-dimensional case, and $O(N^2p^3)$ for the three-dimensional case. The shared memory version significantly reduces the growth of the computational cost in terms of both $N$ and $p$. Theoretical estimates are compared with numerical experiments performed with linear, quadratic, cubic, quartic, and quintic B-splines, in one and two spatial dimensions. © 2014 Elsevier Ltd. All rights reserved.
Interference control by best-effort process duty-cycling in chip multi-processor systems for real-time medical image processing

NARCIS (Netherlands)

Westmijze, M.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

2013-01-01

Systems with chip multi-processors are currently used for several applications that have real-time requirements. In chip multi-processor architectures, many hardware resources such as parts of the cache hierarchy are shared between cores and by using such resources, applications can significantly
Switch/router architectures shared-bus and shared-memory based systems

CERN Document Server

Aweya, James

2018-01-01

A practicing engineer's inclusive review of communication systems based on shared-bus and shared-memory switch/router architectures. This book delves into the inner workings of router and switch design in a comprehensive manner that is accessible to a broad audience. It begins by describing the role of switch/routers in a network, then moves on to the functional composition of a switch/router. A comparison of centralized versus distributed design of the architecture is also presented. The author discusses use of bus versus shared-memory for communication within a design, and also covers Quality of Service (QoS) mechanisms and configuration tools. Written in a simple style and language to allow readers to easily understand and appreciate the material presented, Switch/Router Architectures: Shared-Bus and Shared-Memory Based Systems discusses the design of multilayer switches—starting with the basic concepts and on to the basic architectures. It describes the evolution of multilayer switch designs and highli...
Attention and Visuospatial Working Memory Share the Same Processing Resources

Directory of Open Access Journals (Sweden)

Jing Feng

2012-04-01

Full Text Available Attention and visuospatial working memory (VWM share very similar characteristics; both have the same upper bound of about four items in capacity and they recruit overlapping brain regions. We examined whether both attention and visuospatial working memory share the same processing resources using a novel dual-task-costs approach based on a load-varying dual-task technique. With sufficiently large loads on attention and VWM, considerable interference between the two processes was observed. A further load increase on either process produced reciprocal increases in interference on both processes, indicating that attention and VWM share common resources. More critically, comparison among four experiments on the reciprocal interference effects, as measured by the dual-task costs, demonstrates no significant contribution from additional processing other than the shared processes. These results support the notion that attention and VWM share the same processing resources.
SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

International Nuclear Information System (INIS)

Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

1999-01-01

In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. Today's multiprocessors have a complex organization with several memory hierarchies. This has forced the scientific programmer to choose between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
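The contrast between conventional "horizontal" execution and the "vertical" execution in the title can be sketched with two element-wise statements. Horizontal execution runs each statement over every block before starting the next one. Vertical execution runs all statements on one block while that block is still hot in cache, and hands independent blocks to a pool of workers. The statements, block size, and function names are hypothetical. This is not the SMARTS or POOMA API. In particular, SMARTS discovers the block-level dependencies at run time through macro-dataflow, whereas here they are hard-coded.

```python
from concurrent.futures import ThreadPoolExecutor

def split_blocks(n, block):
    """Partition indices 0..n-1 into contiguous blocks."""
    return [range(s, min(s + block, n)) for s in range(0, n, block)]

# Two hypothetical data-parallel statements: A = B + 1, then C = A * 2.
# Both are element-wise, so statement 2 on a block depends only on
# statement 1 having finished on the same block.
def stmt1(A, B, idx):
    for i in idx:
        A[i] = B[i] + 1

def stmt2(C, A, idx):
    for i in idx:
        C[i] = A[i] * 2

def horizontal(B, block):
    """Run each statement over all blocks before starting the next."""
    n = len(B)
    A, C = [0] * n, [0] * n
    blocks = split_blocks(n, block)
    for idx in blocks:
        stmt1(A, B, idx)
    for idx in blocks:
        stmt2(C, A, idx)
    return C

def vertical(B, block, workers=2):
    """Run all statements on one block before moving on; independent
    blocks are executed by a thread pool."""
    n = len(B)
    A, C = [0] * n, [0] * n

    def run_block(idx):
        stmt1(A, B, idx)   # this block of A is produced...
        stmt2(C, A, idx)   # ...and consumed while it is still cached

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_block, split_blocks(n, block)))
    return C
```

Both orders compute the same result. The vertical order simply reuses each block of `A` immediately instead of after the whole array has been swept.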
SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

Energy Technology Data Exchange (ETDEWEB)

Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

1999-01-04

In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. Today's multiprocessors have a complex organization with several memory hierarchies. This has forced the scientific programmer to choose between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.
Matrix factorization on a hypercube multiprocessor

International Nuclear Information System (INIS)

Geist, G.A.; Heath, M.T.

1985-08-01

This paper is concerned with parallel algorithms for matrix factorization on distributed-memory, message-passing multiprocessors, with special emphasis on the hypercube. Both Cholesky factorization of symmetric positive definite matrices and LU factorization of nonsymmetric matrices using partial pivoting are considered. The use of the resulting triangular factors to solve systems of linear equations by forward and back substitutions is also considered. Efficiencies of various parallel computational approaches are compared in terms of empirical results obtained on an Intel iPSC hypercube. 19 refs., 6 figs., 2 tabs
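A common way to distribute Cholesky factorization on a hypercube is to assign matrix columns to processors in wrap (cyclic) order. This serial Python sketch simulates that mapping: it performs a right-looking column Cholesky and records which simulated processor would update each column. It then solves the system by forward and back substitution, as the abstract describes. The cyclic mapping is an assumption chosen for illustration. The broadcast of each pivot column is only indicated in a comment, and LU with partial pivoting is not shown.

```python
import math

def cholesky_column_wrapped(A, n_procs):
    """Right-looking column Cholesky (A = L L^T) of a symmetric positive
    definite matrix, simulating a wrap mapping owner(j) = j % n_procs.

    Returns L and the number of column updates each simulated processor
    would perform.
    """
    n = len(A)
    L = [row[:] for row in A]
    work = [0] * n_procs
    for k in range(n):
        # The owner of column k computes the pivot and scales its column.
        L[k][k] = math.sqrt(L[k][k])
        for i in range(k + 1, n):
            L[i][k] /= L[k][k]
        # Column k would now be broadcast; each owner updates its columns.
        for j in range(k + 1, n):
            work[j % n_procs] += 1
            for i in range(j, n):
                L[i][j] -= L[i][k] * L[j][k]
    for i in range(n):                 # clear the strict upper triangle
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L, work

def solve_spd(A, b, n_procs=2):
    """Solve A x = b: forward substitution L y = b, then back substitution
    L^T x = y."""
    L, _ = cholesky_column_wrapped(A, n_procs)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))) / L[i][i]
    return x
```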
Real-Time Multiprocessor Programming Language (RTMPL) user's manual

Science.gov (United States)

Arpasi, D. J.

1985-01-01

A real-time multiprocessor programming language (RTMPL) has been developed to provide for high-order programming of real-time simulations on systems of distributed computers. RTMPL is a structured, engineering-oriented language. The RTMPL utility supports a variety of multiprocessor configurations and types by generating assembly language programs according to user-specified targeting information. Many programming functions are assumed by the utility (e.g., data transfer and scaling) to reduce the programming chore. This manual describes RTMPL from a user's viewpoint. Source generation, applications, utility operation, and utility output are detailed. An example simulation is generated to illustrate many RTMPL features.
A scalable single-chip multi-processor architecture with on-chip RTOS kernel

NARCIS (Netherlands)

Theelen, B.D.; Verschueren, A.C.; Reyes Suarez, V.V.; Stevens, M.P.J.; Nunez, A.

2003-01-01

Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a

Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory

KAUST Repository

Pearce, Roger; Gokhale, Maya; Amato, Nancy M.

2013-01-01

We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash
The art of multiprocessor programming

CERN Document Server

Herlihy, Maurice

2012-01-01

Revised and updated with improvements conceived in parallel programming courses, The Art of Multiprocessor Programming is an authoritative guide to multicore programming. It introduces a higher-level set of software development skills than that needed for efficient single-core programming. This book provides comprehensive coverage of the new principles, algorithms, and tools necessary for effective multiprocessor programming. Students and professionals alike will benefit from thorough coverage of key multiprocessor programming issues. This revised edition incorporates much-demanded updates t
Externalising the autobiographical self: sharing personal memories online facilitated memory retention.

Science.gov (United States)

Wang, Qi; Lee, Dasom; Hou, Yubo

2017-07-01

Internet technology provides a new means of recalling and sharing personal memories in the digital age. What is the mnemonic consequence of posting personal memories online? Theories of transactive memory and autobiographical memory would make contrasting predictions. In the present study, college students completed a daily diary for a week, listing at the end of each day all the events that happened to them on that day. They also reported whether they posted any of the events online. Participants received a surprise memory test after the completion of the diary recording and then another test a week later. At both tests, events posted online were significantly more likely than those not posted online to be recalled. It appears that sharing memories online may provide unique opportunities for rehearsal and meaning-making that facilitate memory retention.
Improvement of multiprocessing performance by using optical centralized shared bus

Science.gov (United States)

Han, Xuliang; Chen, Ray T.

2004-06-01

With the ever-increasing need to solve larger and more complex problems, multiprocessing is attracting more and more research effort. One of the challenges facing multiprocessor designers is to support, effectively, the communication among processes running in parallel on multiple processors. The conventional electrical backplane bus offers limited bandwidth because of the physical limitations of electrical interconnects. In the electrical domain, to operate at high frequency, the backplane topology has changed from the simple shared bus to the complicated switched medium. However, the switched medium is an indirect network and cannot support multicast/broadcast as effectively as the shared bus. Besides the additional latency of passing through intermediate switching nodes, signal routing introduces substantial delay and considerable system complexity. Optics, by contrast, is well known for its interconnect capability, so it has become imperative to investigate how optical interconnects can improve multiprocessing performance. From the implementation standpoint, existing optical technologies still cannot provide the intelligent functions of a switch fabric as effectively as their electronic counterparts. Thus, an innovative optical technology that provides sufficient bandwidth while retaining the essential merits of the shared-bus topology is highly desirable for improving multiprocessing performance. In this paper, the optical centralized shared bus is proposed for use in multiprocessing systems. This novel optical interconnect architecture exploits the beneficial characteristics of optics and also retains the desirable properties of the shared-bus topology. Meanwhile, from the architecture standpoint, it fits well into the centralized shared-memory multiprocessing scheme. Therefore, a smooth migration with substantial
Single-chip serial channel enhances multi-processor systems

Energy Technology Data Exchange (ETDEWEB)

Millar, J.

1982-01-01

In this paper multiprocessor systems are described and explained. The impact that VLSI advancements are having on multiprocessor design is pointed out. The TMS 7041 single-chip microcomputer is described briefly, highlighting its multiprocessor communication capability. And finally, a typical multiprocessor system is shown, implementing the TMS 7041.
Direct access inter-process shared memory

Science.gov (United States)

Brightwell, Ronald B; Pedretti, Kevin; Hudson, Trammell B

2013-10-22

A technique for directly sharing physical memory between processes executing on processor cores is described. The technique includes loading a plurality of processes into the physical memory for execution on a corresponding plurality of processor cores sharing the physical memory. An address space is mapped to each of the processes by populating a first entry in a top level virtual address table for each of the processes. The address space of each of the processes is cross-mapped into each of the processes by populating one or more subsequent entries of the top level virtual address table with the first entry in the top level virtual address table from other processes.
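The cross-mapping idea can be modeled with plain Python objects. Each process's own memory sits behind entry 0 of its top-level table, and later entries are copies of other processes' entry 0. As a result, a load through one of those entries reaches another process's physical memory directly, with no copy. The 512-entry table and the 2^39-byte region per entry are assumptions borrowed from x86-64 four-level paging. Translation is reduced to a single top-level lookup, and the class and function names are hypothetical.

```python
ENTRIES = 512        # assumed top-level table size (as in x86-64 paging)
REGION = 1 << 39     # assumed bytes of virtual space per top-level entry

class Process:
    def __init__(self, pid, memory):
        self.pid = pid
        self.memory = memory                  # this process's physical region
        self.top_level = [None] * ENTRIES     # top-level virtual address table

def cross_map(processes):
    """Entry 0 of each table maps the process's own space. Entry k + 1 of
    every table is a copy of process k's entry 0 (the process itself is
    included, for simplicity), so all processes see each other's memory
    at fixed virtual offsets."""
    for p in processes:
        p.top_level[0] = p.memory
    for p in processes:
        for k, other in enumerate(processes):
            p.top_level[k + 1] = other.top_level[0]

def load(process, vaddr):
    """Translate using only the top-level entry selected by vaddr."""
    entry = process.top_level[vaddr // REGION]
    if entry is None:
        raise ValueError("unmapped virtual address")
    return entry[vaddr % REGION]
```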
Large-scale data analytics

CERN Document Server

Gkoulalas-Divanis, Aris

2014-01-01

Provides cutting-edge research in large-scale data analytics from diverse scientific areas. Surveys varied subject areas and reports on individual results of research in the field. Shares many tips and insights into large-scale data analytics from authors and editors with long-term experience and specialization in the field.
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems

KAUST Repository

Mudigere, Dheevatsa

2015-05-01

In this work, we revisit the 1999 Gordon Bell Prize-winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® Xeon™ E5-2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments

Energy Technology Data Exchange (ETDEWEB)

Jin, Shuangshuang; Chen, Yousu; Wu, Di; Diao, Ruisheng; Huang, Zhenyu

2015-12-09

Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching. It involves a large set of differential and algebraic equations, which is computationally intensive and challenging to solve with single-processor dynamic simulation. High-performance computing (HPC)-based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation: one using Open Multi-Processing (OpenMP) on a shared-memory platform, and one using the Message Passing Interface (MPI) on distributed-memory clusters. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performance in running parallel dynamic simulation is compared and demonstrated.
TUMULT, the Twente University multiprocessor

NARCIS (Netherlands)

Scholten, Johan; Jansen, P.G.

1988-01-01

TUMULT (Twente University multiprocessor) is described. Its aim is the design and implementation of a modular, extendable multiprocessor system. Up to 15 processing elements are connected through an interprocessor communication network, using message passing for the exchange of data. The hardware is
A Comparison of Two Paradigms for Distributed Shared Memory

NARCIS (Netherlands)

Levelt, W.G.; Kaashoek, M.F.; Bal, H.E.; Tanenbaum, A.S.

1992-01-01

Two paradigms for distributed shared memory on loosely‐coupled computing systems are compared: the shared data‐object model as used in Orca, a programming language specially designed for loosely‐coupled computing systems, and the shared virtual memory model. For both paradigms two systems are
Multiprocessor development for robot control

International Nuclear Information System (INIS)

Lee, Jong Min; Kim, Byung Soo; Kim, Chang Hoi; Hwang, Suk Yong; Sohn, Surg Won; Yoon, Tae Seob; Lee, Yong Bum; Kim, Woong Ki

1988-02-01

A multiprocessor system, which is essential to A.I. (Artificial Intelligence) robot control, was developed. A.I. robot control requires very complex real-time control. A multiprocessor system interconnecting many SBCs (Single Board Computers) is much faster and more accurate than a single SBC. Various multiprocessor systems and their applications were compared and discussed. The multiprocessor architecture is specially designed for use in nuclear environments. Its main functions are job distribution, multitasking, and intelligent remote control by SDLC protocol over optical fiber. The system can be applied to position control for locomotion and manipulation, data fusion systems, and image processing. (Author)
Vertex trigger implementation using shared memory technology

CERN Document Server

Müller, H

1998-01-01

The implementation of a first-level vertex trigger for LHC-B is particularly difficult due to the high (1 MHz) input data rate. With ca. 350 silicon hits per event, the R strips and Phi strips of the detectors together produce ca. 2 Gbyte/s of zero-suppressed data. This note follows up on the idea, described in LHB note 97-006, of using R-phi coordinates for fast integer line finding in programmable hardware. For an implementation we propose an FPGA preprocessing stage operating at 1 MHz. This substantially reduces the amount of data to be transmitted to the CPUs and frees a large fraction of CPU time. Interconnected via 4 Gbit/s SCI technology, a shared memory system can be built that allows data-driven event building to be performed with or without preprocessing. A fully data-driven architecture between source modules and destination memories provides a highly reliable, very low-latency memory-to-memory transfer mechanism. The event building is performed via associating events at the sourc...
Self-Stabilization of Wait-Free Shared Memory Objects

NARCIS (Netherlands)

Hoepman, J.H.; Papatriantafilou, Marina; Tsigas, Philippas

2002-01-01

This paper proposes a general definition of self-stabilizing wait-free shared memory objects. The definition ensures that, even in the face of processor failures, every execution after a transient memory failure is linearizable except for an a priori bounded number of actions. Shared registers have
Exploiting multi-scale parallelism for large scale numerical modelling of laser wakefield accelerators

International Nuclear Information System (INIS)

Fonseca, R A; Vieira, J; Silva, L O; Fiuza, F; Davidson, A; Tsung, F S; Mori, W B

2013-01-01

A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes. However, large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues. These range from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over $\sim 10^6$ cores and sustained performance of over $\sim 2$ PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios. (paper)
The FORCE: A highly portable parallel programming language

Science.gov (United States)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors. It is also shown how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
Multiprocessor system with multiple concurrent modes of execution

Science.gov (United States)

Ahn, Daniel; Ceze, Luis H; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

2013-12-31

A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include transactional memory (TM), thread-level speculation (TLS), and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
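The idea of one ID pool split into per-mode domains can be sketched as a small allocator. In the patent this is done in hardware with a central state table and hardware pointers. Here each domain is simply a free list, and the mode names and domain sizes are hypothetical. The point shown is isolation: exhausting one mode's domain cannot starve the other modes.

```python
class SpeculationIdPool:
    """Hypothetical software model of a speculation-ID pool divided into
    per-mode domains (e.g. "TM", "TLS", "rollback")."""

    def __init__(self, domain_sizes):
        # domain_sizes: mode -> number of consecutive IDs reserved for it
        self.free = {}
        next_id = 0
        for mode, size in domain_sizes.items():
            self.free[mode] = list(range(next_id, next_id + size))
            next_id += size
        self.owner = {}                     # allocated ID -> its mode

    def allocate(self, mode):
        """Give a speculative thread an ID from its mode's domain,
        or None if that domain is exhausted."""
        if not self.free[mode]:
            return None
        sid = self.free[mode].pop(0)
        self.owner[sid] = mode
        return sid

    def release(self, sid):
        """Return an ID to the domain it came from (on commit or abort)."""
        mode = self.owner.pop(sid)
        self.free[mode].append(sid)
```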
Is sharing specific autobiographical memories a distinct form of self-disclosure?

Science.gov (United States)

Beike, Denise R; Brandon, Nicole R; Cole, Holly E

2016-04-01

Theories of autobiographical memory posit a social function, meaning that recollecting and sharing memories of specific discrete events creates and maintains relationship intimacy. Eight studies with 1,271 participants tested whether sharing specific autobiographical memories in conversations increases feelings of closeness among conversation partners, relative to sharing other self-related information. The first 2 studies revealed that conversations in which specific autobiographical memories were shared were also accompanied by feelings of closeness among conversation partners. The next 5 studies experimentally introduced specific autobiographical memories versus general information about the self into conversations between mostly unacquainted pairs of participants. Discussing specific autobiographical memories led to greater closeness among conversation partners than discussing nonself-related topics, but no greater closeness than discussing other, more general self-related information. In the final study, unacquainted pairs in whom feelings of closeness had been experimentally induced through shared humor were more likely to discuss specific autobiographical memories than unacquainted control participant pairs. We conclude that sharing specific autobiographical memories may express more than create relationship closeness, and discuss how relationship closeness may afford sharing of specific autobiographical memories by providing common ground, a social display, or a safety signal. (c) 2016 APA, all rights reserved.
Multiprocessor scheduling for real-time systems

CERN Document Server

Baruah, Sanjoy; Buttazzo, Giorgio

2015-01-01

This book provides a comprehensive overview of both theoretical and pragmatic aspects of resource-allocation and scheduling in multiprocessor and multicore hard-real-time systems. The authors derive new, abstract models of real-time tasks that capture accurately the salient features of real application systems that are to be implemented on multiprocessor platforms, and identify rules for mapping application systems onto the most appropriate models. New run-time multiprocessor scheduling algorithms are presented, which are demonstrably better than those currently used, both in terms of run-time efficiency and tractability of off-line analysis. Readers will benefit from a new design and analysis framework for multiprocessor real-time systems, which will translate into a significantly enhanced ability to provide formally verified, safety-critical real-time systems at a significantly lower cost.
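The book's own algorithms are not reproduced here. As a point of reference for what mapping tasks onto processors involves, the sketch below implements a classic baseline, first-fit-decreasing partitioning with a per-processor EDF utilization test. It assumes implicit-deadline periodic tasks given as (wcet, period) integer pairs; the function name is hypothetical.

```python
from fractions import Fraction


def first_fit_edf(tasks, n_processors):
    """Partition implicit-deadline periodic tasks onto processors, first-fit decreasing.

    tasks: list of (wcet, period) integer pairs.
    Under uniprocessor EDF, such a task set is schedulable iff its total
    utilization is <= 1, so a processor accepts a task only while that holds.
    Returns a list of task-index lists (one per processor), or None on failure.
    """
    util = [Fraction(c, t) for c, t in tasks]          # exact arithmetic, no rounding
    load = [Fraction(0)] * n_processors
    assignment = [[] for _ in range(n_processors)]
    order = sorted(range(len(tasks)), key=lambda i: util[i], reverse=True)
    for i in order:
        for p in range(n_processors):
            if load[p] + util[i] <= 1:
                load[p] += util[i]
                assignment[p].append(i)
                break
        else:
            return None  # no processor can take this task
    return assignment
```

With tasks `[(1, 2), (1, 2), (1, 3), (2, 3)]` and two processors, this returns `[[3, 2], [0, 1]]`, filling both processors exactly; with one processor it returns `None`. Partitioned heuristics like this are simple to analyze offline but can fail on sets that a smarter algorithm would schedule, which is the gap such books address.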
Large Scale Self-Organizing Information Distribution System

National Research Council Canada - National Science Library

Low, Steven

2005-01-01

This project investigates issues in "large-scale" networks. Here "large-scale" refers to networks with a large number of high-capacity nodes and transmission links, shared by a large number of users...

Large scale integration of flexible non-volatile, re-addressable memories using P(VDF-TrFE) and amorphous oxide transistors

International Nuclear Information System (INIS)

Gelinck, Gerwin H; Cobb, Brian; Van Breemen, Albert J J M; Myny, Kris

2015-01-01

Ferroelectric polymers and amorphous metal oxide semiconductors have emerged as important materials for re-programmable non-volatile memories and high-performance, flexible thin-film transistors, respectively. However, realizing sophisticated transistor memory arrays has proven to be a challenge, and reliable writing to and reading from such a large-scale memory has thus far not been demonstrated. Here, we report an integration of ferroelectric, P(VDF-TrFE), transistor memory arrays with thin-film circuitry that can address each individual memory element in that array. n-type indium gallium zinc oxide is used as the active channel material in both the memory and logic thin-film transistors. The maximum process temperature is 200 °C, allowing plastic films to be used as substrate material. The technology was scaled up to 150 mm wafer size, and offers good reproducibility, high device yield and low device variation. This forms the basis for successful demonstration of memory arrays, read and write circuitry, and the integration of these. (paper)
A combined PLC and CPU approach to multiprocessor control

International Nuclear Information System (INIS)

Harris, J.J.; Broesch, J.D.; Coon, R.M.

1995-10-01

A sophisticated multiprocessor control system has been developed for use in the E-Power Supply System Integrated Control (EPSSIC) on the DIII-D tokamak. EPSSIC provides control and interlocks for the ohmic heating coil power supply and its associated systems. Of particular interest is the architecture of this system: both a Programmable Logic Controller (PLC) and a Central Processor Unit (CPU) have been combined on a standard VME bus. The PLC and CPU input and output signals are routed through signal conditioning modules, which provide the necessary voltage and ground isolation. Additionally, these modules adapt the signal levels to those of the VME I/O boards. One set of I/O signals is shared between the two processors. The resulting multiprocessor system provides a number of advantages: redundant operation for mission-critical situations, flexible communications using conventional TCP/IP protocols, the simplicity of ladder logic programming for the majority of the control code, and an easily maintained and expandable non-proprietary system.
Use of the CAMAC-MULTIBUS combined protocol for organizing multi-processor operation in a crate

International Nuclear Information System (INIS)

Glejbman, Eh.M.

1985-01-01

Problems in developing electronic units for large on-line systems that automate nuclear-physics experiments, built on the principles of distributed control and data processing, are discussed. Crates are described in which CAMAC modules (EUR-4100) and modules implementing the MULTIBUS protocol are housed and operated simultaneously on the dataway. This is achieved by sharing the CAMAC and MULTIBUS protocols on the crate dataway. Using job-scheduler and executor modules on the MULTIBUS interface makes it possible to organize multiprocessor operation, to separate data streams, and to increase the total computational capacity of the crate.
Working memory resources are shared across sensory modalities.

Science.gov (United States)

Salmela, V R; Moisala, M; Alho, K

2014-10-01

A common assumption in the working memory literature is that the visual and auditory modalities have separate and independent memory stores. Recent evidence on visual working memory has suggested that resources are shared between representations, and that the precision of representations sets the limit for memory performance. We tested whether memory resources are also shared across sensory modalities. Memory precision for two visual (spatial frequency and orientation) and two auditory (pitch and tone duration) features was measured separately for each feature and for all possible feature combinations. Thus, only the memory load was varied, from one to four features, while keeping the stimuli similar. In Experiment 1, two gratings and two tones, both containing two varying features, were presented simultaneously. In Experiment 2, two gratings and two tones, each containing only one varying feature, were presented sequentially. The memory precision (delayed discrimination threshold) for a single feature was close to the perceptual threshold. However, as the number of features to be remembered was increased, the discrimination thresholds increased more than twofold. Importantly, the decrease in memory precision did not depend on the modality of the other feature(s), or on whether the features were in the same or in separate objects. Hence, simultaneously storing one visual and one auditory feature had an effect on memory precision equal to that of simultaneously storing two visual or two auditory features. The results show that working memory is limited by the precision of the stored representations, and that working memory can be described as a resource pool that is shared across modalities.
Universal algorithm of time sharing

International Nuclear Information System (INIS)

Silin, I.N.; Fedyun'kin, E.D.

1979-01-01

A time-sharing algorithm is proposed for a wide class of single-processor and multiprocessor computer configurations. The dynamic priority is a piecewise-constant function of the channel characteristic and the system time quantum. The interactive job quantum has variable length. A characteristic recurrence formula is derived. The concept of a background job is introduced: the background job loads the processor when high-priority jobs are inactive. A quality function for the background job is defined on the basis of statistical data collected during time sharing. The algorithm includes an optimal swap-out procedure for replacing jobs in memory. Sharing of system time in proportion to external priorities is guaranteed for all sufficiently active computing channels (including the background channel). A fast response is guaranteed for interactive jobs that use little time and memory. Control of external priorities is retained by the high-level scheduler. Experience implementing the algorithm on the BESM-6 computer at JINR is discussed.
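The guarantee that system time is shared in proportion to external priorities can be illustrated with a generic proportional-share rule. The sketch below is a swapped-in technique, not the BESM-6 algorithm: it ignores the piecewise-constant dynamic priority, the variable interactive quantum, and memory swapping, and simply grants each quantum to the channel with the smallest ratio of time used to external priority (integer weights assumed; the function name is hypothetical).

```python
from fractions import Fraction


def proportional_share(weights, n_quanta):
    """Grant n_quanta time quanta to always-active channels in proportion to their weights.

    weights: positive integer external priorities, one per channel.
    Each quantum goes to the channel with the smallest virtual time
    (quanta used / weight); ties go to the lower channel index.
    Returns the number of quanta each channel received.
    """
    used = [0] * len(weights)
    for _ in range(n_quanta):
        chosen = min(range(len(weights)), key=lambda i: (Fraction(used[i], weights[i]), i))
        used[chosen] += 1
    return used
```

For instance, `proportional_share([3, 1], 8)` returns `[6, 2]`: a channel with three times the external priority receives three times the processor time.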
Development of a VME multi-processor system for plasma control at the JT-60 Upgrade

International Nuclear Information System (INIS)

Takahashi, M.; Kurihara, K.; Kawamata, Y.; Akasaka, H.; Kimura, T.

1992-01-01

The design and initial operating results of a VME multi-processor system [1] for plasma control at a large fusion device, the JT-60 Upgrade, are reported. The system uses three 32-bit MC88100-based RISC computers and VME components. Its development was motivated by requirements for faster and more accurate computation for plasma position and current control. The RISC computers operate at 25 MHz with two MC88200 cache memories. We newly developed VME bus modules (an up/down counter, an analog-to-digital converter, and a clock pulse generator) for measuring the magnetic field and coil current and for synchronizing processing in the three RISCs and the direct digital controllers (DDCs) of the magnet power supplies. We also verified that the data transfer speed between the VME bus system and the DDCs through CAMAC highways satisfies the above requirements. In the initial operation of the JT-60 Upgrade, the VME multi-processor system was shown to control the plasma position and current well, with a sampling period of 250 μsec and a delay of 500 μsec. (author)
A homotopy method for solving Riccati equations on a shared memory parallel computer

International Nuclear Information System (INIS)

Zigic, D.; Watson, L.T.; Collins, E.G. Jr.; Davis, L.D.

1993-01-01

Although there are numerous algorithms for solving Riccati equations, there is still a need for algorithms that can operate efficiently on large problems and on parallel machines. This paper gives a new homotopy-based algorithm for solving Riccati equations on a shared memory parallel computer. The central part of the algorithm is the computation of the kernel of the Jacobian matrix, which is essential for the corrector iterations along the homotopy zero curve. Using a Schur decomposition, the tensor product structure of various matrices can be efficiently exploited. The algorithm allows for efficient parallelization on shared memory machines.
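A scalar analogue shows the predictor-corrector idea behind homotopy methods. Assuming the continuous-time algebraic Riccati form 2ax - (b^2/r)x^2 + q = 0, the sketch deforms the problem from q = 0, where the stabilizing root (a + |a|)/s with s = b^2/r is known, to the full q, using the previous root as the predictor and Newton steps as the corrector. It steps the homotopy parameter directly; the paper's method tracks the zero curve using Jacobian kernels and Schur decompositions of matrices, which this sketch does not attempt. The function name is hypothetical.

```python
import math


def riccati_homotopy(a, b, q, r, steps=20, newton_iters=8):
    """Stabilizing root of the scalar Riccati equation 2*a*x - (b*b/r)*x**2 + q = 0.

    Continuation in lam from 0 to 1 on F(x, lam) = 2*a*x - s*x**2 + lam*q, s = b*b/r.
    Predictor: the root from the previous lam. Corrector: Newton's method.
    """
    if a == 0:
        raise ValueError("this sketch assumes a != 0, so the start point is nonsingular")
    s = b * b / r
    x = (a + abs(a)) / s  # stabilizing root when q is switched off (lam = 0)
    for k in range(1, steps + 1):
        lam = k / steps
        for _ in range(newton_iters):
            f = 2 * a * x - s * x * x + lam * q
            df = 2 * a - 2 * s * x  # = 2 * closed-loop pole; nonzero on this branch for q >= 0
            x -= f / df
    return x
```

With a = b = q = r = 1 the result agrees with the closed form (a + sqrt(a^2 + s*q))/s = 1 + sqrt(2).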
Temporal analysis and scheduling of hard real-time radios running on a multi-processor

NARCIS (Netherlands)

Moreira, O.

2012-01-01

On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor while each meets its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for
Parallel simulated annealing algorithms for cell placement on hypercube multiprocessors

Science.gov (United States)

Banerjee, Prithviraj; Jones, Mark Howard; Sargent, Jeff S.

1990-01-01

Two parallel algorithms for standard cell placement using simulated annealing are developed to run on distributed-memory message-passing hypercube multiprocessors. The cells in the two-dimensional area of a chip can be mapped onto processors in an n-dimensional hypercube in two ways, such that both small and large cell exchange and displacement moves can be applied. The computation of the cost function in parallel among all the processors in the hypercube is described, along with a distributed data structure that needs to be stored in the hypercube to support the parallel cost evaluation. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. A dynamic parallel annealing schedule estimates the errors due to interacting parallel moves and adapts the rate of synchronization automatically. Two novel approaches to controlling error in parallel algorithms are described: heuristic cell coloring and adaptive sequence control.
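The paper's tree broadcasting strategy is described as novel and is not reproduced here. For orientation, the standard binomial-tree broadcast on a hypercube reaches all 2^n nodes in n steps: at step k, every node that already holds the data sends it to its neighbour across dimension k. A sketch of the resulting schedule (function name hypothetical):

```python
def hypercube_broadcast_schedule(dim, root=0):
    """Binomial-tree broadcast schedule on a dim-dimensional hypercube.

    Returns one list of (sender, receiver) pairs per step; every pair
    differs in exactly one address bit, i.e. is a hypercube link.
    """
    have = {root}
    steps = []
    for k in range(dim):
        pairs = [(node, node ^ (1 << k)) for node in sorted(have)]
        steps.append(pairs)
        have.update(receiver for _, receiver in pairs)
    return steps
```

For `dim=3`, the three steps involve 1, 2, and 4 messages, and every node has the data after the last step; the number of holders doubles at each step because no link is used twice.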
Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data.

Science.gov (United States)

Li, Wenyuan; Gong, Ke; Li, Qingjiao; Alber, Frank; Zhou, Xianghong Jasmine

2015-03-15

Genome-wide proximity ligation assays, e.g. Hi-C and its variant TCC, have recently become important tools to study spatial genome organization. Removing biases from chromatin contact matrices generated by such techniques is a critical preprocessing step of subsequent analyses. The continuing decline of sequencing costs has led to an ever-improving resolution of the Hi-C data, resulting in very large matrices of chromatin contacts. Such large matrices, however, pose a great challenge to the memory usage and speed of their normalization. Therefore, there is an urgent need for fast and memory-efficient methods for normalization of Hi-C data. We developed Hi-Corrector, an easy-to-use, open source implementation of the Hi-C data normalization algorithm. Its salient features are (i) scalability: the software is capable of normalizing Hi-C data of any size in reasonable times; (ii) memory efficiency: the sequential version can run on any single computer with very limited memory, no matter how little; (iii) fast speed: the parallel version can run very fast on multiple computing nodes with limited local memory. The sequential version is implemented in ANSI C and can be easily compiled on any system; the parallel version is implemented in ANSI C with the MPI library (a standardized and portable parallel environment designed for solving large-scale scientific problems). The package is freely available at http://zhoulab.usc.edu/Hi-Corrector/. © The Author 2014. Published by Oxford University Press.
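A common way to remove such biases is matrix balancing: each bin receives a multiplicative bias chosen so that every row of the corrected matrix has the same total. The sketch below is a damped (square-root) variant of iterative correction for a small dense matrix; it assumes this balancing goal and is not Hi-Corrector's code, which targets matrices far too large to handle this way. The function name is hypothetical.

```python
import math


def balance_contacts(matrix, iterations=50):
    """Damped iterative correction of a symmetric contact matrix.

    Each pass divides entry (i, j) by f_i * f_j, where
    f_i = sqrt(row_sum_i / mean_row_sum); all-zero rows are left alone.
    Returns the balanced matrix and the accumulated per-bin bias, so that
    balanced[i][j] == matrix[i][j] / (bias[i] * bias[j]).
    """
    n = len(matrix)
    m = [list(map(float, row)) for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(iterations):
        sums = [sum(row) for row in m]
        nonzero = [s for s in sums if s > 0]
        mean = sum(nonzero) / len(nonzero)
        f = [math.sqrt(s / mean) if s > 0 else 1.0 for s in sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= f[i] * f[j]  # symmetric rescaling keeps the matrix symmetric
        bias = [b * fi for b, fi in zip(bias, f)]
    return m, bias
```

The square root halves each correction step, which avoids the row sums overshooting and oscillating on small, strongly diagonal matrices; the full-step form converges in many practical cases but is not what this sketch shows.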
Distributed Shared Memory for the Cell Broadband Engine (DSMCBE)

DEFF Research Database (Denmark)

Larsen, Morten Nørgaard; Skovhede, Kenneth; Vinter, Brian

2009-01-01

in and out of non-coherent local storage blocks for each special processor element. In this paper we present a software library, namely the Distributed Shared Memory for the Cell Broadband Engine (DSMCBE). By using techniques known from distributed shared memory DSMCBE allows programmers to program the CELL...
The ACP [Advanced Computer Program] multiprocessor system at Fermilab

International Nuclear Information System (INIS)

Nash, T.; Areti, H.; Atac, R.

1986-09-01

The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost-effective for many high energy physics problems. The system is based on single-board computers which cost under $2000 each to build, including 2 Mbytes of on-board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32-bit microprocessor, the other runs with AT&T's 32100. Both include the corresponding floating-point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53-processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing "nodes" sit are connected via a high-speed "Branch Bus" to one or more MicroVAX computers which act as hosts, handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use and has been tested error-free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.
File-System Workload on a Scientific Multiprocessor

Science.gov (United States)

Kotz, David; Nieuwejaar, Nils

1995-01-01

Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors and their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to be most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize I/O in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload on an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.
Modeling and Analyzing Real-Time Multiprocessor Systems

NARCIS (Netherlands)

Wiggers, M.H.; Thiele, Lothar; Lee, Edward A.; Schlieker, Simon; Bekooij, Marco Jan Gerrit

2010-01-01

Researchers have proposed approaches to verify that real-time multiprocessor systems meet their timeliness constraints. These approaches make assumptions on the model of computation, the load placed on the multiprocessor system, and the faults that can arise. This heterogeneous set of assumptions
A shared resource between declarative memory and motor memory.

Science.gov (United States)

Keisler, Aysha; Shadmehr, Reza

2010-11-03

The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation, changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and nondeclarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/nondeclarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system.
A shared resource between declarative memory and motor memory

Science.gov (United States)

Keisler, Aysha; Shadmehr, Reza

2010-01-01

The neural systems that support motor adaptation in humans are thought to be distinct from those that support the declarative system. Yet, during motor adaptation, changes in motor commands are supported by a fast adaptive process that has important properties (rapid learning, fast decay) that are usually associated with the declarative system. The fast process can be contrasted to a slow adaptive process that also supports motor memory, but learns gradually and shows resistance to forgetting. Here we show that after people stop performing a motor task, the fast motor memory can be disrupted by a task that engages declarative memory, but the slow motor memory is immune from this interference. Furthermore, we find that the fast/declarative component plays a major role in the consolidation of the slow motor memory. Because of the competitive nature of declarative and non-declarative memory during consolidation, impairment of the fast/declarative component leads to improvements in the slow/non-declarative component. Therefore, the fast process that supports formation of motor memory is not only neurally distinct from the slow process, but it shares critical resources with the declarative memory system. PMID:21048140
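The fast and slow processes are commonly formalized as two linear states driven by the same error, one learning and forgetting quickly and the other slowly. The sketch below simulates such a two-state model; the parameter values are illustrative assumptions, not values fitted in this study, and the model does not capture the declarative-task interference the abstract reports. The function name is hypothetical.

```python
def two_state(perturbation, feedback, a_fast=0.92, b_fast=0.1, a_slow=0.996, b_slow=0.02):
    """Simulate fast and slow adaptive states trial by trial.

    perturbation[n]: size of the imposed perturbation on trial n.
    feedback[n]: False on trials without error feedback (states then only decay).
    a_*: retention factors (closer to 1 = slower forgetting).
    b_*: learning rates (larger = faster learning).
    Returns the lists of fast and slow states after each trial.
    """
    xf = xs = 0.0
    fast, slow = [], []
    for p, fb in zip(perturbation, feedback):
        error = (p - (xf + xs)) if fb else 0.0  # motor output is the sum of both states
        xf = a_fast * xf + b_fast * error
        xs = a_slow * xs + b_slow * error
        fast.append(xf)
        slow.append(xs)
    return fast, slow
```

Running 200 adaptation trials followed by 50 trials without feedback shows the qualitative pattern described above: the fast state dominates early learning and then largely decays, while most of the slow state is retained.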
Automatic code generation for distributed robotic systems

International Nuclear Information System (INIS)

Jones, J.P.

1993-01-01

Hetero Helix is a software environment which supports relatively large robotic system development projects. The environment supports a heterogeneous set of message-passing LAN-connected common-bus multiprocessors, but the programming model seen by software developers is a simple shared memory. The conceptual simplicity of shared memory makes it an extremely attractive programming model, especially in large projects where coordinating a large number of people can itself become a significant source of complexity. We present results from three system development efforts conducted at Oak Ridge National Laboratory over the past several years. Each of these efforts used automatic software generation to create 10 to 20 percent of the system
Real-world-time simulation of memory consolidation in a large-scale cerebellar model

Directory of Open Access Journals (Sweden)

Masato Gosui

2016-03-01

We report the development of a large-scale spiking network model of the cerebellum composed of more than 1 million neurons. The model is implemented on graphics processing units (GPUs), which are dedicated hardware for parallel computing. Using 4 GPUs simultaneously, we achieve real-time simulation, in which computer simulation of cerebellar activity for 1 sec completes within 1 sec of real-world time, with a temporal resolution of 1 msec. This allows us to carry out very long-term computer simulations of cerebellar activity in a practical time with millisecond temporal resolution. Using the model, we carry out computer simulation of long-term gain adaptation of optokinetic response (OKR) eye movements for 5 days, aimed at studying the neural mechanisms of posttraining memory consolidation. The simulation results are consistent with animal experiments and our theory of posttraining memory consolidation. These results suggest that real-time computing provides a useful means to study very slow neural processes such as memory consolidation in the brain.
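The real-time constraint can be made concrete with simple arithmetic, using only figures from the abstract (5 simulated days, a 1 ms time step, more than 10^6 neurons); the per-step bound just restates what "real time" means here.

```latex
N_{\text{steps}} = \frac{5\ \text{days} \times 86\,400\ \text{s/day}}{10^{-3}\ \text{s/step}} = 4.32 \times 10^{8},
\qquad
t_{\text{wall}}(\text{one step}) \le 1\ \text{ms},
\qquad
N_{\text{neuron updates}} > 10^{6} \times 4.32 \times 10^{8} = 4.32 \times 10^{14}.
```

A simulator running ten times slower than real time would need about 50 days of wall-clock time for the same 5-day experiment, which is why real-time speed matters for slow processes such as consolidation.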
Sharing specific "We" autobiographical memories in close relationships: the role of contact frequency.

Science.gov (United States)

Beike, Denise R; Cole, Holly E; Merrick, Carmen R

2017-11-01

Sharing memories in conversations with close others is posited to be part of the social function of autobiographical memory. The present research focused on the sharing of a particular type of memory: Specific memories about one-time co-experienced events, which we termed Specific We memories. Two studies with 595 total participants examined the factors that lead to and/or are influenced by the sharing of Specific We memories. In Study 1, participants reported on their most recent conversation. Specific We memories were reportedly discussed most often in conversations with others who were close and with whom the participant had frequent communication. In Study 2, participants were randomly assigned either to increase or to simply record the frequency of communication with a close other (parent). Increases in the frequency of reported sharing of Specific We memories as well as closeness to the parent resulted. Mediation analyses of both studies revealed causal relationships among reported sharing of Specific We memories and closeness. We discuss the relevance of these results for understanding the social function of autobiographical memory.
Multiprocessor development for robot control

International Nuclear Information System (INIS)

Lee, Jong Min; Kim, Seung Ho; Hwang, Suk Yeoung; Sohn, Surg Won; Kim, Byung Soo; Kim, Chang Hoi; Lee, Yong Bum; Kim, Woong Ki

1988-12-01

The object of this project is to develop a multiprocessor system, which is essential to robot technology. A multiprocessor system interconnecting many single-board computers is much faster and more flexible than a single processor. The developed multiprocessor will be used to control a nuclear mobile robot, so a loosely coupled system is adopted as the robot controller. The overall controller configuration is divided into three main parts according to function: a supervisory control part, a functional control part, and a remote control part. The control system is designed with a modular architecture so that it can be expanded easily for further use, and functional independence among sub-systems is maintained throughout the system structure. Electromagnetic interference affecting the control system is minimized by using optical fiber as the communication medium between the robot and the control system. System performance is enhanced not only by using a distributed architecture in hardware, but also by adopting a real-time, multi-tasking operating system in software. The iRMX86 OS is used and reconfigured for real-time, multi-tasking operation. The RS-485 serial communication protocol is used between the functional control part and the remote control part. Since the developed multiprocessor control system is an essential and fundamental technology for artificially intelligent robots, the results of this project can be applied directly to nuclear mobile robots. (Author)

The ACP (Advanced Computer Program) multiprocessor system at Fermilab

Energy Technology Data Exchange (ETDEWEB)

Nash, T.; Areti, H.; Atac, R.; Biel, J.; Case, G.; Cook, A.; Fischler, M.; Gaines, I.; Hance, R.; Husby, D.

1986-09-01

The Advanced Computer Program at Fermilab has developed a multiprocessor system which is easy to use and uniquely cost-effective for many high energy physics problems. The system is based on single-board computers which cost under $2000 each to build, including 2 Mbytes of on-board memory. These standard VME modules each run experiment reconstruction code in Fortran at speeds approaching that of a VAX 11/780. Two versions have been developed: one uses Motorola's 68020 32-bit microprocessor, the other runs with AT&T's 32100. Both include the corresponding floating-point coprocessor chip. The first system, when fully configured, uses 70 each of the two types of processors. A 53-processor system has been operated for several months with essentially no down time by computer operators in the Fermilab Computer Center, performing at nearly the capacity of 6 CDC Cyber 175 mainframe computers. The VME crates in which the processing "nodes" sit are connected via a high-speed "Branch Bus" to one or more MicroVAX computers which act as hosts, handling system resource management and all I/O in offline applications. An interface from Fastbus to the Branch Bus has been developed for online use and has been tested error-free at 20 Mbytes/sec for 48 hours. ACP hardware modules are now available commercially. A major package of software, including a simulator that runs on any VAX, has been developed. It allows easy migration of existing programs to this multiprocessor environment. This paper describes the ACP Multiprocessor System and early experience with it at Fermilab and elsewhere.
Embedded multiprocessors scheduling and synchronization

CERN Document Server

Sriram, Sundararajan

2009-01-01

Techniques for Optimizing Multiprocessor Implementations of Signal Processing Applications. An indispensable component of the information age, signal processing is embedded in a variety of consumer devices, including cell phones and digital television, as well as in communication infrastructure, such as media servers and cellular base stations. Multiple programmable processors, along with custom hardware running in parallel, are needed to achieve the computation throughput required of such applications. Reviews important research in key areas related to the multiprocessor implementation of multi
A class hierarchical, object-oriented approach to virtual memory management

Science.gov (United States)

Russo, Vincent F.; Campbell, Roy H.; Johnston, Gary M.

1989-01-01

The Choices family of operating systems exploits class hierarchies and object-oriented programming to facilitate the construction of customized operating systems for shared memory and networked multiprocessors. The software is being used in the Tapestry laboratory to study the performance of algorithms, mechanisms, and policies for parallel systems. Described here are the architectural design and class hierarchy of the Choices virtual memory management system. The software and hardware mechanisms and policies of a virtual memory system implement a memory hierarchy that exploits the trade-off between response times and storage capacities. In Choices, the notion of a memory hierarchy is captured by abstract classes. Concrete subclasses of those abstractions implement a virtual address space, segmentation, paging, physical memory management, secondary storage, and remote (that is, networked) storage. Captured in the notion of a memory hierarchy are classes that represent memory objects. These classes provide a storage mechanism that contains encapsulated data and have methods to read or write the memory object. Each of these classes provides specializations to represent the memory hierarchy.
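Capturing the memory hierarchy with abstract classes can be sketched briefly. Choices itself is written in C++; the Python classes below (`MemoryObject`, `PhysicalMemory`, `PagedMemory`) are hypothetical names chosen for illustration. They show only how a concrete subclass can implement read and write by delegating to the next level of the hierarchy.

```python
from abc import ABC, abstractmethod


class MemoryObject(ABC):
    """Abstract level of the memory hierarchy: something that can be read and written."""

    @abstractmethod
    def read(self, offset: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...


class PhysicalMemory(MemoryObject):
    """Bottom level: a flat, fixed-size byte store."""

    def __init__(self, size: int):
        self._store = bytearray(size)

    def read(self, offset, length):
        return bytes(self._store[offset:offset + length])

    def write(self, offset, data):
        self._store[offset:offset + len(data)] = data


class PagedMemory(MemoryObject):
    """Splits an address space into pages, each backed lazily by a lower-level object."""

    def __init__(self, page_size: int, backing_factory):
        self.page_size = page_size
        self._factory = backing_factory  # builds the lower-level object for a page
        self._pages = {}

    def _page(self, number):
        if number not in self._pages:  # first touch acts like a page fault
            self._pages[number] = self._factory(self.page_size)
        return self._pages[number]

    def read(self, offset, length):
        out = bytearray()
        while length > 0:
            page, off = divmod(offset, self.page_size)
            chunk = min(length, self.page_size - off)
            out += self._page(page).read(off, chunk)
            offset += chunk
            length -= chunk
        return bytes(out)

    def write(self, offset, data):
        pos = 0
        while pos < len(data):
            page, off = divmod(offset + pos, self.page_size)
            chunk = min(len(data) - pos, self.page_size - off)
            self._page(page).write(off, data[pos:pos + chunk])
            pos += chunk
```

For example, `PagedMemory(4, PhysicalMemory)` is a virtual address space whose 4-byte pages are physical blocks; swapping the factory for a class that stores pages remotely would change the policy without touching code written against `MemoryObject`.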
Concurrent Operations of O2-Tree on Shared Memory Multicore Architectures

OpenAIRE

Daniel Ohene-Kwofie; E. J. Otoo; Gideon Nimako

2014-01-01

Modern computer architectures provide high performance computing capability by having multiple CPU cores. Such systems are also typically associated with very large main-memory capacities, thereby allowing them to be used for fast processing of in-memory database applications. However, most of the concurrency control mechanisms associated with the index structures of these memory-resident databases do not scale well under high transaction rates. This paper presents the O2-Tree, a fast main me...
Overview of the Scalable Coherent Interface, IEEE STD 1596 (SCI)

International Nuclear Information System (INIS)

Gustavson, D.B.; James, D.V.; Wiggers, H.A.

1992-10-01

The Scalable Coherent Interface standard defines a new generation of interconnection that spans the full range from supercomputer memory 'bus' to campus-wide network. SCI provides bus-like services and a shared-memory software model while using an underlying packet protocol on many independent communication links. Initially these links are 1 GByte/s (wires) and 1 GBit/s (fiber), but the protocol scales well to future faster or lower-cost technologies. The interconnect may use switches, meshes, and rings. The SCI distributed-shared-memory model is simple and versatile, enabling for the first time a smooth integration of highly parallel multiprocessors, workstations, personal computers, I/O, networking and data acquisition.
Algorithm 896: LSA: Algorithms for Large-Scale Optimization

Czech Academy of Sciences Publication Activity Database

Lukšan, Ladislav; Matonoha, Ctirad; Vlček, Jan

2009-01-01

Roč. 36, č. 3 (2009), 16-1-16-29 ISSN 0098-3500 R&D Projects: GA AV ČR IAA1030405; GA ČR GP201/06/P397 Institutional research plan: CEZ:AV0Z10300504 Keywords : algorithms * design * large-scale optimization * large-scale nonsmooth optimization * large-scale nonlinear least squares * large-scale nonlinear minimax * large-scale systems of nonlinear equations * sparse problems * partially separable problems * limited-memory methods * discrete Newton methods * quasi-Newton methods * primal interior-point methods Subject RIV: BB - Applied Statistics, Operational Research Impact factor: 1.904, year: 2009
Multithreaded Asynchronous Graph Traversal for In-Memory and Semi-External Memory

KAUST Repository

Pearce, Roger

2010-11-01

Processing large graphs is becoming increasingly important for many domains such as social networks, bioinformatics, etc. Unfortunately, many algorithms and implementations do not scale with increasing graph sizes. As a result, researchers have attempted to meet the growing data demands using parallel and external memory techniques. We present a novel asynchronous approach to compute Breadth-First-Search (BFS), Single-Source-Shortest-Paths, and Connected Components for large graphs in shared memory. Our highly parallel asynchronous approach hides data latency due to both poor locality and delays in the underlying graph data storage. We present an experimental study applying our technique to both In-Memory and Semi-External Memory graphs utilizing multi-core processors and solid-state memory devices. Our experiments using synthetic and real-world datasets show that our asynchronous approach is able to overcome data latencies and provide significant speedup over alternative approaches. For example, on billion-vertex graphs our asynchronous BFS scales up to 14x on 16 cores. © 2010 IEEE.
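The abstract does not spell out the algorithm. One common way to make BFS asynchronous is label correcting: instead of processing the graph level by level with a barrier between levels, pending work may be processed in any order, and a vertex's level is lowered (and its neighbours revisited) whenever a shorter path turns up later. The single-threaded sketch below simulates that idea by draining the work list in a deliberately random order; it is an assumption-level illustration, not the authors' multithreaded implementation, and `async_bfs` is a hypothetical name.

```python
import random


def async_bfs(adj: dict[int, list[int]], source: int, seed: int = 0) -> dict[int, int]:
    """Label-correcting BFS: levels converge even when work runs out of order."""
    rng = random.Random(seed)
    level = {source: 0}
    work = [(source, 0)]  # pending visitors: (vertex, candidate level)
    while work:
        # Pick an arbitrary pending visitor to mimic asynchronous scheduling.
        i = rng.randrange(len(work))
        work[i], work[-1] = work[-1], work[i]
        v, d = work.pop()
        if d > level[v]:
            continue  # stale visitor: a shorter path to v was found meanwhile
        for w in adj.get(v, []):
            if d + 1 < level.get(w, float("inf")):
                level[w] = d + 1  # correct (lower) w's label and revisit it
                work.append((w, d + 1))
    return level
```

Whatever order the visitors run in, the final labels equal the BFS levels; the price of dropping the per-level barrier is that some vertices may be visited more than once.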
Techniques for Reducing Consistency-Related Communication in Distributed Shared Memory System

OpenAIRE

Zwaenepoel, W; Bennett, J.K.; Carter, J.B.

1995-01-01

Distributed shared memory (DSM) is an abstraction of shared memory on a distributed memory machine. Hardware DSM systems support this abstraction at the architecture level; software DSM systems support the abstraction within the runtime system. One of the key problems in building an efficient software DSM system is to reduce the amount of communication needed to keep the distributed memories consistent. In this paper we present four techniques for doing so: 1) software release consistency; 2)...
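Under release consistency, a node's writes only have to become visible to other nodes when it releases a lock, so a software DSM runtime can buffer writes locally and propagate them in one batch at release time instead of sending one message per write. The sketch below models that with hypothetical `Node` objects and counts messages; it illustrates the general principle only, not the specific protocol or data structures of the paper.

```python
class Node:
    """One DSM node holding a local copy of shared data (modelled as a dict)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memory = {}   # local copy of shared variables
        self.pending = {}  # writes buffered until the next release
        self.peers = []    # other nodes that cache the same data
        self.messages_sent = 0

    def write(self, key: str, value: int) -> None:
        self.memory[key] = value
        self.pending[key] = value  # a later write to the same key replaces the earlier one

    def release(self) -> None:
        """At lock release, send all buffered writes in one message per peer."""
        if not self.pending:
            return
        for peer in self.peers:
            peer.memory.update(self.pending)
            self.messages_sent += 1
        self.pending.clear()
```

In this toy model, 100 writes to the same variable inside one critical section cost a single message per peer at release, whereas propagating each write eagerly would cost 100.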
Building a columnar database on shared main memory-based storage

OpenAIRE

Tinnefeld, Christian

2014-01-01

In the field of disk-based parallel database management systems, there exists a great variety of solutions based on a shared-storage or a shared-nothing architecture. In contrast, main memory-based parallel database management systems are dominated solely by the shared-nothing approach, as it preserves the in-memory performance advantage by processing data locally on each server. We argue that this unilateral development is going to cease due to the combination of the following three trends: a) Nowad...
Scaling Law of Urban Ride Sharing

Science.gov (United States)

Tachet, R.; Sagarra, O.; Santi, P.; Resta, G.; Szell, M.; Strogatz, S. H.; Ratti, C.

2017-03-01

Sharing rides could drastically improve the efficiency of car and taxi transportation. Unleashing such potential, however, requires understanding how urban parameters affect the fraction of individual trips that can be shared, a quantity that we call shareability. Using data on millions of taxi trips in New York City, San Francisco, Singapore, and Vienna, we compute the shareability curves for each city, and find that a natural rescaling collapses them onto a single, universal curve. We explain this scaling law theoretically with a simple model that predicts the potential for ride sharing in any city, using a few basic urban quantities and no adjustable parameters. Accurate extrapolations of this type will help planners, transportation companies, and society at large to shape a sustainable path for urban growth.
Benefits of transactive memory systems in large-scale development

OpenAIRE

Aivars, Sablis

2016-01-01

Context. Large-scale software development projects are those consisting of a large number of teams, maybe even spread across multiple locations, and working on large and complex software tasks. That means that neither a team member individually nor an entire team holds all the knowledge about the software being developed and teams have to communicate and coordinate their knowledge. Therefore, teams and team members in large-scale software development projects must acquire and manage expertise...
Exploiting Data Sparsity for Large-Scale Matrix Computations

KAUST Repository

Akbudak, Kadir

2018-02-24

Exploiting data sparsity in dense matrices is an algorithmic bridge between architectures that are increasingly memory-austere on a per-core basis and extreme-scale applications. The Hierarchical matrix Computations on Manycore Architectures (HiCMA) library tackles this challenging problem by achieving significant reductions in time to solution and memory footprint, while preserving a specified accuracy requirement of the application. HiCMA provides a high-performance implementation on distributed-memory systems of one of the most widely used matrix factorizations in large-scale scientific applications, i.e., the Cholesky factorization. It employs the tile low-rank data format to compress the dense data-sparse off-diagonal tiles of the matrix. It then decomposes the matrix computations into interdependent tasks and relies on the dynamic runtime system StarPU for asynchronous out-of-order scheduling, while allowing high user productivity. Performance comparisons and memory footprint on matrix dimensions up to eleven million show a performance gain and memory saving of more than an order of magnitude for both metrics on thousands of cores, against state-of-the-art open-source and vendor optimized numerical libraries. This represents an important milestone in enabling large-scale matrix computations toward solving big data problems in geospatial statistics for climate/weather forecasting applications.
Exploiting Data Sparsity for Large-Scale Matrix Computations

KAUST Repository

Akbudak, Kadir; Ltaief, Hatem; Mikhalev, Aleksandr; Charara, Ali; Keyes, David E.

2018-01-01

Exploiting data sparsity in dense matrices is an algorithmic bridge between architectures that are increasingly memory-austere on a per-core basis and extreme-scale applications. The Hierarchical matrix Computations on Manycore Architectures (HiCMA) library tackles this challenging problem by achieving significant reductions in time to solution and memory footprint, while preserving a specified accuracy requirement of the application. HiCMA provides a high-performance implementation on distributed-memory systems of one of the most widely used matrix factorizations in large-scale scientific applications, i.e., the Cholesky factorization. It employs the tile low-rank data format to compress the dense data-sparse off-diagonal tiles of the matrix. It then decomposes the matrix computations into interdependent tasks and relies on the dynamic runtime system StarPU for asynchronous out-of-order scheduling, while allowing high user productivity. Performance comparisons and memory footprint on matrix dimensions up to eleven million show a performance gain and memory saving of more than an order of magnitude for both metrics on thousands of cores, against state-of-the-art open-source and vendor optimized numerical libraries. This represents an important milestone in enabling large-scale matrix computations toward solving big data problems in geospatial statistics for climate/weather forecasting applications.
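The tile low-rank idea is to replace a data-sparse off-diagonal tile A (nb x nb) by two thin factors U and V with A ≈ U Vᵀ, keeping only as many columns as the requested accuracy needs. HiCMA's own compression kernels and rank-selection rule are not described here; the NumPy sketch below assumes a truncated SVD with a threshold relative to the largest singular value, which is one standard way to do this. The function names are hypothetical.

```python
import numpy as np


def compress_tile(tile: np.ndarray, tol: float) -> tuple[np.ndarray, np.ndarray]:
    """Compress a dense tile to U @ V.T, keeping singular values above tol * s_max."""
    u, s, vt = np.linalg.svd(tile, full_matrices=False)
    rank = max(1, int(np.count_nonzero(s > tol * s[0])))  # numerical rank at this accuracy
    # Fold the kept singular values into U so that tile ≈ U @ V.T.
    return u[:, :rank] * s[:rank], vt[:rank, :].T


def tile_storage(nb: int, rank: int) -> tuple[int, int]:
    """Stored entries for one nb x nb tile: (dense, low-rank factors U and V)."""
    return nb * nb, 2 * nb * rank
```

For a 64 x 64 tile of numerical rank 2, storage drops from 4096 to 256 entries; the savings the abstract reports come from many off-diagonal tiles having ranks far below the tile size.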
Shared Semantics and the Use of Organizational Memories for E-Mail Communications.

Science.gov (United States)

Schwartz, David G.

1998-01-01

Examines the use of shared semantics information to link concepts in an organizational memory to e-mail communications. Presents a framework for determining shared semantics based on organizational and personal user profiles. Illustrates how shared semantics are used by the HyperMail system to help link organizational memories (OM) content to…
Distributed power management of real-time applications on a GALS multiprocessor SOC

NARCIS (Netherlands)

Nelson, Andrew; Goossens, Kees

2015-01-01

It is generally desirable to reduce the power consumption of embedded systems. Dynamic Voltage and Frequency Scaling (DVFS) is a commonly applied technique to achieve power reduction at the cost of computational performance. Multiprocessor System on Chips (MPSoCs) can have multiple voltage and
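To first order, the dynamic power of CMOS logic is P ≈ C·V²·f, so a task of N cycles takes N/f seconds and consumes about C·V²·N joules: lowering frequency alone only stretches execution, while lowering voltage with it saves energy. The back-of-the-envelope sketch below uses that model; the effective capacitance, the (voltage, frequency) operating points and the function names are made-up illustrative assumptions, and static (leakage) power is ignored.

```python
def dynamic_energy(cycles: float, volts: float, hz: float, c_eff: float) -> tuple[float, float]:
    """Return (execution time in s, dynamic energy in J) under P = c_eff * V^2 * f."""
    time_s = cycles / hz
    power_w = c_eff * volts ** 2 * hz
    return time_s, power_w * time_s


def pick_operating_point(cycles: float, deadline_s: float,
                         points: list[tuple[float, float]], c_eff: float) -> tuple[float, float]:
    """Choose the lowest-energy (V, f) point that still meets the deadline.

    Raises ValueError (from min) if no operating point is fast enough.
    """
    feasible = [(v, f) for v, f in points if cycles / f <= deadline_s]
    return min(feasible, key=lambda p: dynamic_energy(cycles, p[0], p[1], c_eff)[1])
```

With 10⁹ cycles and C = 1 nF, running at 1.2 V / 1 GHz takes 1 s and about 1.44 J, while 0.9 V / 500 MHz takes 2 s and about 0.81 J; a real-time power manager trades that slack against the task's deadline.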
Highly Scalable Trip Grouping for Large Scale Collective Transportation Systems

DEFF Research Database (Denmark)

Gidofalvi, Gyozo; Pedersen, Torben Bach; Risch, Tore

2008-01-01

Transportation-related problems, like road congestion, parking, and pollution, are increasing in most cities. In order to reduce traffic, recent work has proposed methods for vehicle sharing, for example for sharing cabs by grouping "closeby" cab requests and thus minimizing transportation cost...... and utilizing cab space. However, the methods published so far do not scale to large data volumes, which is necessary to facilitate large-scale collective transportation systems, e.g., ride-sharing systems for large cities. This paper presents highly scalable trip grouping algorithms, which generalize previous...
Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory

KAUST Repository

Pearce, Roger

2013-05-01

We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash. We apply an edge list partitioning technique, designed to accommodate high-degree vertices (hubs) that create scaling challenges when processing scale-free graphs. In addition to partitioning hubs, we use ghost vertices to represent the hubs to reduce communication hotspots. We present a scaling study with three important graph algorithms: Breadth-First Search (BFS), K-Core decomposition, and Triangle Counting. We also demonstrate scalability on BG/P Intrepid by comparing to best known Graph500 results. We show results on two clusters with local NVRAM storage that are capable of traversing trillion-edge scale-free graphs. By leveraging node-local NAND Flash, our approach can process thirty-two times larger datasets with only a 39% performance degradation in Traversed Edges Per Second (TEPS). © 2013 IEEE.
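In edge list partitioning, the edge list (sorted by source vertex) is cut into equal-sized contiguous chunks, so every partition holds the same number of edges and a hub whose adjacency list is too large simply spans several partitions. The sketch below shows only that balancing step under simplifying assumptions; ghost vertices and the distributed traversal are not modelled, and both function names are hypothetical.

```python
def edge_list_partition(edges: list[tuple[int, int]], parts: int) -> list[list[tuple[int, int]]]:
    """Split a source-sorted edge list into `parts` contiguous, near-equal chunks."""
    ordered = sorted(edges)
    base, extra = divmod(len(ordered), parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)  # spread the remainder over the first chunks
        chunks.append(ordered[start:start + size])
        start += size
    return chunks


def owners_of(vertex: int, chunks: list[list[tuple[int, int]]]) -> list[int]:
    """Partitions holding at least one out-edge of `vertex` (a hub may have several)."""
    return [i for i, chunk in enumerate(chunks) if any(u == vertex for u, _ in chunk)]
```

With a hub of degree 9 and three ordinary edges split four ways, each partition gets exactly three edges and the hub is shared by three partitions; a vertex-based (1D) split would instead put all nine hub edges on one partition.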
Directions for memory hierarchies and their components: research and development

International Nuclear Information System (INIS)

Smith, A.J.

1978-10-01

The memory hierarchy is usually the largest identifiable part of a computer system and making effective use of it is critical to the operation and use of the system. The levels of such a memory hierarchy are considered and the state of the art and likely directions for both research and development are described. Algorithmic and logical features of the hierarchy not directly associated with specific components are also discussed. Among the problems believed to be the most significant are the following: (a) evaluate the effectiveness of gap filler technology as a level of storage between main memory and disk, and if it proves to be effective, determine how/where it should be used, (b) develop algorithms for the use of mass storage in a large computer system, and (c) determine how cache memories should be implemented in very large, fast multiprocessor systems.
A Hybrid Approach to Processing Big Data Graphs on Memory-Restricted Systems

KAUST Repository

Harshvardhan,

2015-05-01

With the advent of big data, processing large graphs quickly has become increasingly important. Most existing approaches either utilize in-memory processing techniques that can only process graphs that fit completely in RAM, or disk-based techniques that sacrifice performance. In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into subgraphs that fit in RAM and uses a paging-like technique to load subgraphs. We show that without modifying the algorithms, this approach can scale from small memory-constrained systems (such as tablets) to large-scale distributed machines with 16,000+ cores.
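The paging-like scheme can be pictured as a cache of subgraph partitions: a fixed number of partitions stay resident in RAM and the rest are loaded from disk on demand. In the sketch below the "disk" is any loader function (a dictionary lookup in the usage example), and the class name, capacity parameter and least-recently-used eviction policy are assumptions; the abstract does not say which replacement policy the authors use.

```python
from collections import OrderedDict
from typing import Callable


class SubgraphPager:
    """Keep at most `capacity` subgraphs in memory, evicting the least recently used."""

    def __init__(self, load: Callable[[int], dict], capacity: int) -> None:
        self._load = load          # reads one subgraph from external storage
        self._capacity = capacity
        self._resident = OrderedDict()  # partition id -> subgraph, in LRU order
        self.loads = 0

    def get(self, part: int) -> dict:
        if part in self._resident:
            self._resident.move_to_end(part)  # mark as most recently used
        else:
            if len(self._resident) >= self._capacity:
                self._resident.popitem(last=False)  # evict the LRU subgraph
            self._resident[part] = self._load(part)
            self.loads += 1
        return self._resident[part]
```

Because the graph algorithm only ever calls `get`, the same traversal code runs whether everything fits in RAM (no evictions) or only a few partitions do, which mirrors the abstract's claim that the algorithms need no modification.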
A dual shared stack for FSLM in Erika Enterprise

NARCIS (Netherlands)

Balasubramanian, S.M.N.; Afshar, S.; Gai, P.; Behnam, M.; Bril, R.J.

2017-01-01

Recently, the flexible spin-lock model (FSLM) has been introduced, unifying spin-based and suspension-based resource sharing protocols for real-time multi-core platforms. Unlike the multiprocessor stack resource policy (MSRP), FSLM doesn’t allow tasks on a core to share a single stack, however. In

Cyclic executive for safety-critical Java on chip-multiprocessors

DEFF Research Database (Denmark)

Ravn, Anders P.; Schoeberl, Martin

2010-01-01

, that uses model checking to find a static schedule, if one exists at all, which gives an implementation of a table driven multiprocessor scheduler. To evaluate the proposed cyclic executive for multiprocessors we have implemented it in the context of safety-critical Java on a Java processor....
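A cyclic executive replays a precomputed static table: the major cycle is divided into minor frames, and each core runs the tasks listed for the current frame in order. The sketch below only dispatches such a table; producing the table (which the paper does with model checking) is out of scope, and the table layout, task names and function name are hypothetical.

```python
from typing import Callable

# Hypothetical static schedule: schedule[core][frame] = tasks run in that minor frame.
Schedule = list[list[list[str]]]


def run_major_cycles(schedule: Schedule, tasks: dict[str, Callable[[], None]],
                     cycles: int) -> list[tuple[int, int, str]]:
    """Dispatch the table for `cycles` major cycles; return a (frame, core, task) log."""
    frames = len(schedule[0])
    assert all(len(column) == frames for column in schedule), "cores must share frames"
    log = []
    for _ in range(cycles):
        for frame in range(frames):
            # On real hardware each core runs its own column in parallel and then
            # waits for the next frame tick; here the cores are visited in turn.
            for core, column in enumerate(schedule):
                for name in column[frame]:
                    tasks[name]()
                    log.append((frame, core, name))
    return log
```

Since the table is fixed offline, the run-time dispatcher has no scheduling decisions to make, which is what makes its timing easy to analyse.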
Human visual system automatically represents large-scale sequential regularities.

Science.gov (United States)

Kimura, Motohiro; Widmann, Andreas; Schröger, Erich

2010-03-04

Our brain recordings reveal that large-scale sequential regularities defined across non-adjacent stimuli can be automatically represented in visual sensory memory. To show that, we adapted an auditory paradigm developed by Sussman, E., Ritter, W., and Vaughan, H. G. Jr. (1998; Predictability of stimulus deviance and the mismatch negativity, NeuroReport, 9, 4167-4170) and Sussman, E., and Gumenyuk, V. (2005; Organization of sequential sounds in auditory memory, NeuroReport, 16, 1519-1523) to the visual domain by presenting task-irrelevant infrequent luminance-deviant stimuli (D, 20%) inserted among task-irrelevant frequent stimuli being of standard luminance (S, 80%) in randomized (randomized condition, SSSDSSSSSDSSSSD...) and fixed manners (fixed condition, SSSSDSSSSDSSSSD...). Comparing the visual mismatch negativity (visual MMN), an event-related brain potential (ERP) index of memory-mismatch processes in the human visual sensory system, revealed that visual MMN elicited by deviant stimuli was reduced in the fixed compared to the randomized condition. Thus, the large-scale sequential regularity being present in the fixed condition (SSSSD) must have been represented in visual sensory memory. Interestingly, this effect did not occur in conditions with stimulus-onset asynchronies (SOAs) of 480 and 800 ms but was confined to the 160-ms SOA condition, supporting the hypothesis that large-scale regularity extraction was based on perceptual grouping of the five successive stimuli defining the regularity. © 2010 Elsevier B.V. All rights reserved.
Peak performance: remote memory revisited

NARCIS (Netherlands)

Mühleisen, H.; Gonçalves, R.; Kersten, M.; Johnson, R.; Kemper, A.

2013-01-01

Many database systems share a need for large amounts of fast storage. However, economies of scale limit the utility of extending a single machine with an arbitrary amount of memory. The recent broad availability of the zero-copy data transfer protocol RDMA over low-latency and high-throughput
A Hybrid Approach to Processing Big Data Graphs on Memory-Restricted Systems

KAUST Repository

Harshvardhan,; West, Brandon; Fidel, Adam; Amato, Nancy M.; Rauchwerger, Lawrence

2015-01-01

that sacrifice performance. In this work, we propose a novel RAM-Disk hybrid approach to graph processing that can scale well from a single shared-memory node to large distributed-memory systems. It works by partitioning the graph into subgraphs that fit in RAM
Group Clustering Mechanism for P2P Large Scale Data Sharing Collaboration

Institute of Scientific and Technical Information of China (English)

DENG Qianni; LU Xinda; CHEN Li

2005-01-01

Research shows that P2P scientific collaboration networks exhibit a small-world topology, as do a large number of social networks for which the same pattern has been documented. In this paper we propose a topology-building protocol to benefit from the small-world feature. We find that the idea of Freenet resembles the dynamic pattern of social interactions in scientific data sharing, and that the small-world characteristic of Freenet is favorable to improving file-locating performance in scientific data sharing. However, Freenet's LRU (least recently used) datastore cache replacement scheme is not suitable for use in a scientific data sharing network. Based on the group locality of scientific collaboration, we propose an enhanced group clustering cache replacement scheme. Simulation shows that this scheme improves the request hit ratio dramatically while keeping the average number of hops per successful request comparable to LRU.
Multiprocessor programming environment

Energy Technology Data Exchange (ETDEWEB)

Smith, M.B.; Fornaro, R.

1988-12-01

Programming tools and techniques have been well developed for traditional uniprocessor computer systems. The focus of this research project is on the development of a programming environment for a high speed real time heterogeneous multiprocessor system, with special emphasis on languages and compilers. The new tools and techniques will allow a smooth transition for programmers with experience only on single processor systems.
System-Level Design Methodologies for Networked Multiprocessor Systems-on-Chip

DEFF Research Database (Denmark)

Virk, Kashif Munir

2008-01-01

is the first such attempt in the published literature. The second part of the thesis deals with the issues related to the development of system-level design methodologies for networked multiprocessor systems-on-chip at various levels of design abstraction with special focus on the modeling and design...... at the system-level. The multiprocessor modeling framework is then extended to include models of networked multiprocessor systems-on-chip which is then employed to model wireless sensor networks both at the sensor node level as well as the wireless network level. In the third and the final part, the thesis...... to the transaction-level model. The thesis, as a whole, makes contributions by describing a design methodology for networked multiprocessor embedded systems at three layers of abstraction from system-level through transaction-level to the cycle-accurate level as well as demonstrating it practically by implementing...
Memory Transmission in Small Groups and Large Networks: An Agent-Based Model.

Science.gov (United States)

Luhmann, Christian C; Rajaram, Suparna

2015-12-01

The spread of social influence in large social networks has long been an interest of social scientists. In the domain of memory, collaborative memory experiments have illuminated cognitive mechanisms that allow information to be transmitted between interacting individuals, but these experiments have focused on small-scale social contexts. In the current study, we took a computational approach, circumventing the practical constraints of laboratory paradigms and providing novel results at scales unreachable by laboratory methodologies. Our model embodied theoretical knowledge derived from small-group experiments and replicated foundational results regarding collaborative inhibition and memory convergence in small groups. Ultimately, we investigated large-scale, realistic social networks and found that agents are influenced by the agents with which they interact, but we also found that agents are influenced by nonneighbors (i.e., the neighbors of their neighbors). The similarity between these results and the reports of behavioral transmission in large networks offers a major theoretical insight by linking behavioral transmission to the spread of information. © The Author(s) 2015.
On the Scalability of Time-predictable Chip-Multiprocessing

DEFF Research Database (Denmark)

Puffitsch, Wolfgang; Schoeberl, Martin

2012-01-01

Real-time systems need a time-predictable execution platform to be able to determine the worst-case execution time statically. In order to be time-predictable, several advanced processor features, such as out-of-order execution and other forms of speculation, have to be avoided. However, just using...... simple processors is not an option for embedded systems with high demands on computing power. In order to provide high performance and predictability, we argue for using multiprocessor systems with a time-predictable memory interface. In this paper we present the scalability of a Java chip-multiprocessor system that is designed to be time-predictable. Adding time-predictable caches is mandatory to achieve scalability with a shared memory multi-processor system. As Java bytecode retains information about the nature of memory accesses, it is possible to implement a memory hierarchy that takes...
Using a commercial symmetric multiprocessor for lattice QCD

International Nuclear Information System (INIS)

Brower, R.C.; Chen, D.; Negele, J.W.

1998-01-01

In its evolution, the computer industry has reached the point where considerable computing power can be packaged on a single microprocessor chip. At the same time, the costs of designing a computer system around such a CPU are growing. For these reasons we decided to explore the possibility of using commercially available symmetric multiprocessors (SMPs) as building blocks for the LQCD computer. Careful analysis of the architecture allowed us to build a QCD primitive library running close to the peak performance on the UltraSPARC processor. As a result, multithreaded QCD code (both the heatbath and the Wilson fermion inverter) runs at about 50% efficiency on a single SMP. The communication between different CPUs is handled by a coherent memory system. Currently we are planning to connect several SMPs with a high-bandwidth network into a single system. (orig.)
Penalized Estimation in Large-Scale Generalized Linear Array Models

DEFF Research Database (Denmark)

Lund, Adam; Vincent, Martin; Hansen, Niels Richard

2017-01-01

Large-scale generalized linear array models (GLAMs) can be challenging to fit. Computation and storage of their tensor product design matrices can be impossible due to time and memory constraints, and previously considered design matrix free algorithms do not scale well with the dimension...
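The identity that lets array-model algorithms avoid forming a tensor product design matrix is, for column-major vectorization, (X₂ ⊗ X₁) vec(B) = vec(X₁ B X₂ᵀ). The NumPy sketch below checks it for a two-factor design; it illustrates only this linear-algebra step, not the paper's fitting algorithms or their handling of higher-dimensional arrays, and `kron_matvec` is a hypothetical name.

```python
import numpy as np


def kron_matvec(x1: np.ndarray, x2: np.ndarray, coef: np.ndarray) -> np.ndarray:
    """Compute (x2 ⊗ x1) @ vec(coef) without forming the Kronecker product.

    x1 is n1 x p1, x2 is n2 x p2 and coef is p1 x p2; vec() stacks columns
    (Fortran order), so the result has length n1 * n2.
    """
    return (x1 @ coef @ x2.T).reshape(-1, order="F")
```

The explicit design matrix has n₁n₂ x p₁p₂ entries, while this route only touches the factor matrices and an n₁ x n₂ intermediate, which is why the storage problem the abstract mentions can be sidestepped.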
Realtime multiprocessor for mobile ad hoc networks

Directory of Open Access Journals (Sweden)

T. Jungeblut

2008-05-01

Full Text Available This paper introduces a real-time Multiprocessor System-On-Chip (MPSoC) for low power wireless applications. The multiprocessor is based on eight 32-bit RISC processors that are connected via a Network-On-Chip (NoC). The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard real-time requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC that has been fabricated in 180 nm UMC standard cell technology is 772 mW.
Best Speed Fit EDF Scheduling for Performance Asymmetric Multiprocessors

Directory of Open Access Journals (Sweden)

Peng Wu

2017-01-01

Full Text Available In order to improve the performance of a real-time system, asymmetric multiprocessors have been proposed. The benefits of improved system performance and reduced power consumption from such architectures cannot be fully exploited unless suitable task scheduling and task allocation approaches are implemented at the operating system level. Unfortunately, most of the previous research on scheduling algorithms for performance asymmetric multiprocessors is focused on task priority assignment. They simply assign the highest priority task to the fastest processor. In this paper, we propose BSF-EDF (best speed fit for earliest deadline first) for performance asymmetric multiprocessor scheduling. This approach chooses a suitable processor rather than the fastest one, when allocating tasks. With this proposed BSF-EDF scheduling, we also derive an effective schedulability test.
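The abstract states the idea (choose a suitable processor rather than the fastest) but not the exact rule. A plausible reading, assumed here and not taken from the paper, is a best-fit bin-packing heuristic for partitioned EDF: a processor of speed s can host implicit-deadline periodic tasks whose utilizations (measured at unit speed) sum to at most s, and each task goes to the feasible processor that would have the least spare capacity left. The function name and task model are hypothetical, and this is not the paper's schedulability test.

```python
from typing import Optional


def best_fit_edf(utils: list[float], speeds: list[float]) -> Optional[list[int]]:
    """Assign tasks (utilization at unit speed) to processors; None if some task fails.

    A processor of speed s passes the EDF utilization test while its assigned
    utilization is <= s. Tasks are placed largest first, each on the feasible
    processor with the least spare capacity remaining (best fit).
    """
    load = [0.0] * len(speeds)
    placement = [-1] * len(utils)
    for task in sorted(range(len(utils)), key=lambda t: -utils[t]):
        best = None
        for p, speed in enumerate(speeds):
            spare = speed - load[p] - utils[task]
            if spare >= -1e-12 and (best is None or spare < best[0]):
                best = (spare, p)
        if best is None:
            return None  # no processor can accept this task
        load[best[1]] += utils[task]
        placement[task] = best[1]
    return placement
```

With speeds [1.0, 2.0] and utilizations [0.9, 0.8, 0.5], the 0.9 task fits snugly on the slow core and the other two share the fast one, leaving it headroom; always picking the fastest core first would load the fast core with the 0.9 task instead.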
Ring interconnection for distributed memory automation and computing system

Energy Technology Data Exchange (ETDEWEB)

Vinogradov, V I [Inst. for Nuclear Research of the Russian Academy of Sciences, Moscow (Russian Federation)]

1996-12-31

Problems of development of measurement, acquisition and central systems based on a distributed memory and a ring interface are discussed. It has been found that the RAM LINK-type protocol can be used for ringlet links in non-symmetrical distributed memory architecture multiprocessor system interaction. 5 refs.
Multiprocessor Priority Ceiling Emulation for Safety-Critical Java

DEFF Research Database (Denmark)

Strøm, Torur Biskopstø; Schoeberl, Martin

2015-01-01

Priority ceiling emulation has preferable properties on uniprocessor systems, such as avoiding priority inversion and being deadlock free. This has made it a popular locking protocol. According to the safety-critical Java specification, priority ceiling emulation is a requirement for implementations....... However, implementing the protocol for multiprocessor systems is more complex, so implementations might perform worse than non-preemptive implementations. In this paper we compare two multiprocessor lock implementations with hardware support for the Java optimized processor: non-preemptive locking...
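Under priority ceiling emulation, each lock is given a ceiling equal to the highest priority of any task that may use it, and a task that acquires the lock immediately runs at that ceiling until it releases it. The uniprocessor bookkeeping is sketched below with hypothetical `Task` and `CeilingLock` classes; it does not model the multiprocessor variants or the hardware support compared in the paper.

```python
class Task:
    def __init__(self, name: str, priority: int) -> None:
        self.name = name
        self.base_priority = priority
        self.active_priority = priority


class CeilingLock:
    """Lock whose holder runs at the lock's ceiling priority while holding it."""

    def __init__(self, ceiling: int) -> None:
        self.ceiling = ceiling
        self.owner = None
        self._saved = None  # holder's priority before acquiring, restored on release

    def acquire(self, task: Task) -> None:
        if task.active_priority > self.ceiling:
            raise RuntimeError("ceiling violation: task priority exceeds lock ceiling")
        if self.owner is not None:
            # On a uniprocessor, a holder running at the ceiling cannot be preempted
            # by another user of the lock (assuming no suspension while holding it).
            raise RuntimeError("lock already held")
        self.owner = task
        self._saved = task.active_priority
        task.active_priority = self.ceiling

    def release(self, task: Task) -> None:
        if self.owner is not task:
            raise RuntimeError("only the owner may release the lock")
        task.active_priority = self._saved
        self.owner = None
```

Raising the holder to the ceiling up front is what removes unbounded priority inversion: no task that could contend for the lock can preempt the holder, so on a single core contention never arises.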
Implementation and Performance of Munin

OpenAIRE

Bennett, J.K.; Carter, J.B.; Zwaenepoel, W

1991-01-01

Munin is a distributed shared memory (DSM) system that allows shared memory parallel programs to be executed efficiently on distributed memory multiprocessors. Munin is unique among existing DSM systems in its use of multiple consistency protocols and in its use of release consistency. In Munin, shared program variables are annotated with their expected access pattern, and these annotations are then used by the runtime system to choose a consistency protocol best suited to that access patt...
Safety-critical Java with cyclic executives on chip-multiprocessors

DEFF Research Database (Denmark)

Ravn, Anders P.; Schoeberl, Martin

2012-01-01

Chip-multiprocessors offer increased processing power at a low cost. However, in order to use them for real-time systems, tasks have to be scheduled efficiently and predictably. It is well known that finding optimal schedules is a computationally hard problem. In this paper we present a solution ...... for multiprocessors, we have implemented it in the context of safety-critical Java on a Java processor....
Embedded software design and programming of multiprocessor system-on-chip simulink and system C case studies

CERN Document Server

Popovici, Katalin; Jerraya, Ahmed A; Wolf, Marilyn

2010-01-01

Current multimedia and telecom applications require complex, heterogeneous multiprocessor system on chip (MPSoC) architectures with specific communication infrastructure in order to achieve the required performance. Heterogeneous MPSoC includes different types of processing units (DSP, microcontroller, ASIP) and different communication schemes (fast links, non-standard memory organization and access). Programming an MPSoC requires the generation of efficient software running on MPSoC from a high level environment, by using the characteristics of the architecture. This task is known to be tediou
Design of Networks-on-Chip for Real-Time Multi-Processor Systems-on-Chip

DEFF Research Database (Denmark)

Sparsø, Jens

2012-01-01

This paper addresses the design of networks-on-chips for use in multi-processor systems-on-chips - the hardware platforms used in embedded systems. These platforms typically have to guarantee real-time properties, and as the network is a shared resource, it has to provide service guarantees...... (bandwidth and/or latency) to different communication flows. The paper reviews some past work in this field and the lessons learned, and the paper discusses ongoing research conducted as part of the project "Time-predictable Multi-Core Architecture for Embedded Systems" (T-CREST), supported by the European...
Dataflow models for shared memory access latency analysis

NARCIS (Netherlands)

Staschulat, Jan; Bekooij, Marco Jan Gerrit

2009-01-01

Performance analysis of applications in multi-core platforms is challenging because of temporal interference while accessing shared resources. Especially, memory arbiters introduce a non-constant delay which significantly influences the execution time of a task. In this paper, we selected a

Working memory training mostly engages general-purpose large-scale networks for learning.

Science.gov (United States)

Salmi, Juha; Nyberg, Lars; Laine, Matti

2018-03-21

The present meta-analytic study examined brain activation changes following working memory (WM) training, a form of cognitive training that has attracted considerable interest. Comparisons with perceptual-motor (PM) learning revealed that WM training engages domain-general large-scale networks for learning encompassing the dorsal attention and salience networks, sensory areas, and striatum. Also the dynamics of the training-induced brain activation changes within these networks showed a high overlap between WM and PM training. The distinguishing feature for WM training was the consistent modulation of the dorso- and ventrolateral prefrontal cortex (DLPFC/VLPFC) activity. The strongest candidate for mediating transfer to similar untrained WM tasks was the frontostriatal system, showing higher striatal and VLPFC activations, and lower DLPFC activations after training. Modulation of transfer-related areas occurred mostly with longer training periods. Overall, our findings place WM training effects into a general perception-action cycle, where some modulations may depend on the specific cognitive demands of a training task. Copyright © 2018 Elsevier Ltd. All rights reserved.
Estimating Performance of Single Bus, Shared Memory Multiprocessors

Science.gov (United States)

1987-05-01

[Chandy78] K.M. Chandy, C.M. Sauer, "Approximate methods for analyzing queuing network models of computing systems," Computing Surveys, vol. 10, no. 3... [Denning78] P. Denning, J. Buzen, "The operational analysis of queueing network models," Computing Surveys, vol. 10, no. 3, September 1978, pp. 225-261
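Only fragments of this record's bibliography survive, and they cite the queueing-network literature. As one illustration of that family of techniques (not necessarily the report's own model), exact Mean Value Analysis can estimate a single-bus system by treating the N processors as a closed population that alternates between local computation (a delay of Z per bus request) and a bus transfer (service demand D at a single FCFS queue). The parameter names are assumptions.

```python
def bus_mva(n_procs: int, bus_demand: float, think_time: float) -> tuple[float, float, float]:
    """Exact single-class MVA for n_procs >= 1 processors sharing one bus.

    Returns (bus requests per unit time, bus response time, bus utilization).
    """
    queue = 0.0  # mean bus queue length with one fewer processor
    throughput = resp = 0.0
    for n in range(1, n_procs + 1):
        resp = bus_demand * (1.0 + queue)  # arrival theorem: wait behind the queue
        throughput = n / (resp + think_time)
        queue = throughput * resp          # Little's law applied to the bus
    return throughput, resp, throughput * bus_demand
```

With D = 1 and Z = 4, one processor sees no contention (utilization 0.2), while beyond about (D + Z)/D = 5 processors the bus saturates and adding processors mostly adds waiting time.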
Implementing Shared Memory Parallelism in MCBEND

Directory of Open Access Journals (Sweden)

Bird Adam

2017-01-01

Full Text Available MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler’s ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
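The memory argument is that N independent copies of a calculation hold N copies of the model, whereas threads can share one. The sketch below mimics that structure: one shared, read-only model, a separately seeded random stream per worker, and a final reduction of per-worker tallies. It is a Python analogue of the approach, not MCBEND's OpenMP code, and the toy physics (a purely absorbing slab at normal incidence, whose exact answer is exp(-μt)) is an assumption chosen so the result can be checked. Python threads will not speed up this CPU-bound loop; the point is the data layout.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# One shared, read-only model for all workers (the large object in a real code).
MODEL = {"mu": 1.0, "thickness": 1.0}  # attenuation coefficient (1/cm), slab width (cm)


def transmitted(histories: int, seed: int) -> int:
    """Count particles crossing the slab; each worker owns its own RNG stream."""
    rng = random.Random(seed)
    mu, t = MODEL["mu"], MODEL["thickness"]
    hits = 0
    for _ in range(histories):
        # Distance to first collision ~ Exp(mu); a collision inside the slab absorbs.
        if rng.expovariate(mu) >= t:
            hits += 1
    return hits


def run(workers: int, histories_per_worker: int) -> float:
    """Estimate the transmission probability with `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(transmitted, [histories_per_worker] * workers, range(workers))
        total = sum(counts)  # reduction of per-thread tallies
    return total / (workers * histories_per_worker)


def exact_transmission() -> float:
    return math.exp(-MODEL["mu"] * MODEL["thickness"])
```

Only the small tallies are private to each worker; the model is stored once regardless of the thread count, which is what removes the memory limit on the number of parallel instances.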
Efficient implementations of block sparse matrix operations on shared memory vector machines

International Nuclear Information System (INIS)

Washio, T.; Maruyama, K.; Osoda, T.; Doi, S.; Shimizu, F.

2000-01-01

In this paper, we propose vectorization and shared memory-parallelization techniques for block-type random sparse matrix operations in finite element (FEM) applications. Here, a block corresponds to unknowns on one node in the FEM mesh and we assume that the block size is constant over the mesh. First, we discuss some basic vectorization ideas (the jagged diagonal (JAD) format and the segmented scan algorithm) for the sparse matrix-vector product. Then, we extend these ideas to the shared memory parallelization. After that, we show that the techniques can be applied not only to the sparse matrix-vector product but also to the sparse matrix-matrix product, the incomplete or complete sparse LU factorization and preconditioning. Finally, we report the performance evaluation results obtained on an NEC SX-4 shared memory vector machine for linear systems in some FEM applications. (author)
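The jagged diagonal (JAD) idea mentioned above can be sketched as follows. This is a minimal illustration, assuming scalar entries (block size 1) rather than the constant-size node blocks the paper uses, and plain Python lists rather than vector hardware; the function names `csr_to_jad` and `jad_matvec` are hypothetical. Rows are permuted by decreasing length, and the d-th jagged diagonal collects the d-th nonzero of every row that has one.

```python
def csr_to_jad(n_rows, row_ptr, col_idx, vals):
    """Convert a CSR matrix to jagged diagonal (JAD) storage (scalar entries)."""
    nnz_per_row = [row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)]
    # Permute rows so that longer rows come first (stable sort keeps ties in order).
    perm = sorted(range(n_rows), key=lambda i: -nnz_per_row[i])
    max_len = nnz_per_row[perm[0]] if n_rows else 0
    jd_ptr, jd_col, jd_val = [0], [], []
    for d in range(max_len):
        # Jagged diagonal d holds the d-th nonzero of every row long enough to have one.
        for i in perm:
            if nnz_per_row[i] <= d:
                break  # rows are sorted by length, so all later rows are shorter
            k = row_ptr[i] + d
            jd_col.append(col_idx[k])
            jd_val.append(vals[k])
        jd_ptr.append(len(jd_val))
    return perm, jd_ptr, jd_col, jd_val


def jad_matvec(perm, jd_ptr, jd_col, jd_val, x):
    """Compute y = A x from JAD storage."""
    y_perm = [0.0] * len(perm)
    for d in range(len(jd_ptr) - 1):
        start, end = jd_ptr[d], jd_ptr[d + 1]
        # Long, branch-free inner loop: the part a vector unit can pipeline.
        for i in range(end - start):
            y_perm[i] += jd_val[start + i] * x[jd_col[start + i]]
    # Undo the row permutation.
    y = [0.0] * len(perm)
    for i, row in enumerate(perm):
        y[row] = y_perm[i]
    return y


# A = [[1, 0, 2], [0, 3, 0], [4, 5, 6]] in CSR form.
jad = csr_to_jad(3, [0, 2, 3, 6], [0, 2, 1, 0, 1, 2], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(jad_matvec(*jad, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 32.0]
```

The inner loop length is the number of rows still active in a diagonal rather than the (often short) length of one row, which is why this layout suits vector machines.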
Network Partitioning Domain Knowledge Multiobjective Application Mapping for Large-Scale Network-on-Chip

Directory of Open Access Journals (Sweden)

Yin Zhen Tei

2014-01-01

Full Text Available This paper proposes a multiobjective application mapping technique targeted at large-scale network-on-chip (NoC). As the number of intellectual property (IP) cores in a multiprocessor system-on-chip (MPSoC) increases, NoC application mapping to find an optimum core-to-topology mapping becomes more challenging. In addition, the conflicting cost and performance trade-off makes multiobjective application mapping techniques even more complex. This paper proposes an application mapping technique that incorporates domain knowledge into a genetic algorithm (GA). The initial population of the GA is initialized with network partitioning (NP), while the crossover operator is guided by knowledge of communication demands. NP reduces the complexity of large-scale application mapping and provides the GA with a promising mapping search space. The proposed genetic operator is compared with state-of-the-art genetic operators in terms of solution quality. In this work, multiobjective optimization of energy and thermal balance is considered. Simulations show that knowledge-based initial mapping significantly improves the Pareto front compared with the widely used random initial mapping. The proposed knowledge-based crossover also yields a better Pareto front than the state-of-the-art knowledge-based crossover.
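Mapping objectives of this kind are commonly built on a hop-weighted communication cost. The sketch below assumes a 2D mesh with XY routing and uses traffic volume times hop count as a stand-in for communication energy; it is not the paper's energy or thermal-balance model, and the function names and task graph are hypothetical. A GA would evaluate a cost like this for each candidate core-to-tile mapping.

```python
def mesh_hops(tile_a, tile_b, width):
    """Manhattan (XY-routing) hop count between two tiles of a width-wide 2D mesh."""
    ax, ay = tile_a % width, tile_a // width
    bx, by = tile_b % width, tile_b // width
    return abs(ax - bx) + abs(ay - by)


def communication_cost(mapping, edges, width):
    """Sum of traffic volume times hop count over all communicating core pairs.

    mapping: core id -> tile id; edges: (src core, dst core, volume) tuples.
    """
    return sum(vol * mesh_hops(mapping[s], mapping[d], width) for s, d, vol in edges)


# Hypothetical 4-core task graph on a 2x2 mesh: cores 0-1 and 2-3 talk heavily.
edges = [(0, 1, 100.0), (1, 2, 10.0), (2, 3, 100.0)]
clustered = {0: 0, 1: 1, 2: 2, 3: 3}   # heavy pairs placed on adjacent tiles
scattered = {0: 0, 1: 3, 2: 1, 3: 2}   # heavy pairs placed diagonally
print(communication_cost(clustered, edges, width=2))  # 220.0
print(communication_cost(scattered, edges, width=2))  # 410.0
```

A partitioning-based initial population aims to start the search near mappings like `clustered`, where heavily communicating cores sit close together.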
An interactive display system for large-scale 3D models

Science.gov (United States)

Liu, Zijian; Sun, Kun; Tao, Wenbing; Liu, Liman

2018-04-01

With the improvement of 3D reconstruction theory and the rapid development of computer hardware, reconstructed 3D models are growing in scale and complexity. Models with tens of thousands of 3D points or triangular meshes are common in practical applications. Because of storage and computing-power limitations, common 3D display software such as MeshLab has difficulty displaying and interacting with large-scale 3D models in real time. In this paper, we propose a display system for large-scale 3D scene models. We construct an LOD (Levels of Detail) model of the reconstructed 3D scene in advance and then use an out-of-core, view-dependent multi-resolution rendering scheme to display the large-scale 3D model in real time. With the proposed method, our display system renders in real time while the user roams the reconstructed scene, and it can also display 3D camera poses. Furthermore, memory consumption is significantly reduced by a mechanism that exchanges data between internal and external memory, making it possible to display a large-scale reconstructed scene with millions of 3D points or triangular meshes on a regular PC with only 4 GB of RAM.
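View-dependent LOD selection can be illustrated with a simple screen-space error test. This is a hypothetical sketch, not the authors' scheme: it assumes level 0 is the coarsest, that each finer level halves the geometric error, and that a level is acceptable once its error projected to the screen is below a pixel tolerance. The function name and parameters are illustrative.

```python
def select_lod(distance, base_error, num_levels, focal_px, tolerance_px=1.0):
    """Pick the coarsest level whose projected geometric error is within tolerance.

    Assumption: level 0 is the coarsest, and each finer level halves the
    geometric error (in world units) of the level above it.
    """
    for level in range(num_levels):
        error = base_error / (2 ** level)
        # Perspective projection: screen-space error shrinks with distance.
        projected_px = error * focal_px / max(distance, 1e-9)
        if projected_px <= tolerance_px:
            return level
    return num_levels - 1  # even the finest level is too coarse; use it anyway


# Nearby geometry gets finer levels; only those levels need to be paged in.
for d in (1000.0, 100.0, 1.0):
    print(d, select_lod(d, base_error=1.0, num_levels=8, focal_px=1000.0))
```

In an out-of-core renderer, only the levels selected for the current view would need to be resident in main memory, which is what keeps the memory footprint bounded.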
Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks

Directory of Open Access Journals (Sweden)

Vishal Aslot

2003-01-01

Full Text Available The state of modern computer systems has evolved to allow easy access to multiprocessor systems by supporting multiple processors on a single physical package. As multiprocessor hardware evolves, new ways of programming it are also developed; some of these may merely adopt and standardize older paradigms. One such evolving standard for programming shared-memory parallel computers is the OpenMP API. The Standard Performance Evaluation Corporation (SPEC) has created a suite of parallel programs called SPEC OMP to compare and evaluate modern shared-memory multiprocessor systems using the OpenMP standard. We have studied these benchmarks in detail to understand their performance on a modern architecture. In this paper, we present detailed measurements of the benchmarks, which we organize, summarize, and display using a Quantitative Model. We present a detailed discussion and derivation of the model. We also discuss the important loops in the SPEC OMPM2001 benchmarks and the reasons for less-than-ideal speedup on our platform.
Performance modeling of parallel algorithms for solving neutron diffusion problems

International Nuclear Information System (INIS)

Azmy, Y.Y.; Kirk, B.L.

1995-01-01

Neutron diffusion calculations are the most common computational methods used in the design, analysis, and operation of nuclear reactors and related activities. Here, mathematical performance models are developed for the parallel algorithm used to solve the neutron diffusion equation on message-passing and shared-memory multiprocessors, represented by the Intel iPSC/860 and the Sequent Balance 8000, respectively. The performance models are validated on several test problems and are then used to estimate the performance of each architecture in situations typical of practical applications, such as fine meshes and a large number of participating processors. Although message-passing computers are capable of producing speedup, their parallel efficiency deteriorates rapidly as the number of processors increases. Furthermore, the speedup fails to improve appreciably on massively parallel computers, so only small- to medium-sized message-passing multiprocessors offer a reasonable platform for this algorithm. In contrast, the performance model for the shared-memory architecture predicts very high efficiency over the wide range of processor counts reasonable for this architecture. Furthermore, the modeled efficiency of the Sequent remains superior to that of the hypercube even when its model parameters are adjusted to make its processors as fast as those of the iPSC/860. It is concluded that shared-memory computers are better suited to this parallel algorithm than message-passing computers.
Dissecting the large-scale galactic conformity

Science.gov (United States)

Seo, Seongu

2018-01-01

Galactic conformity is the observed phenomenon that galaxies located in the same region have similar properties, such as star formation rate, color, and gas fraction. Conformity was first observed among galaxies within the same halos ("one-halo conformity"), which can be readily explained by mutual interactions among galaxies within a halo. Recent observations, however, have revealed a puzzling connection among galaxies with no direct interaction. In particular, galaxies located within a sphere of ~5 Mpc radius tend to show similarities even though they do not share common halos ("two-halo conformity" or "large-scale conformity"). Using the cosmological hydrodynamic simulation Illustris, we investigate the physical origin of two-halo conformity and put forward two scenarios. First, back-splash galaxies are likely responsible for large-scale conformity: they have evolved into red galaxies through ram-pressure stripping in a galaxy cluster and now happen to reside within a ~5 Mpc sphere. Second, galaxies in strong tidal fields induced by large-scale structure also seem to give rise to large-scale conformity, because the strong tides suppress star formation in these galaxies. We discuss the importance of large-scale conformity in the context of galaxy evolution.
Runtime adaptive multi-processor system-on-chip: RAMPSoC

OpenAIRE

Göhringer, D.; Hübner, M.; Schatz, V.; Becker, J.

2008-01-01

Current trends in high-performance computing show that multiprocessor systems-on-chip are one approach to meeting the requirements of compute-intensive applications. Multiprocessor system-on-chip (MPSoC) approaches often provide a static and homogeneous infrastructure of networked microprocessors on the chip die. A novel idea in this research area is to introduce the dynamic adaptivity of reconfigurable hardware in order to provide a flexible heterogeneous set of processing elemen...
Development of the Wechsler Memory Scale--Revised.

Science.gov (United States)

Herman, David O.; Young, Laura

A study involving a sample of people selected to represent the nonimpaired American population, aged 16 to 74 years, was undertaken to determine the effectiveness of the Wechsler Memory Scale--Revised. The scale's subtests were designed to assess memory of personal and general knowledge, logical memory, verbal paired association, figural memory,…
Investigating Solution Convergence in a Global Ocean Model Using a 2048-Processor Cluster of Distributed Shared Memory Machines

Directory of Open Access Journals (Sweden)

Chris Hill

2007-01-01

Full Text Available Up to 1920 processors of a cluster of distributed shared memory machines at the NASA Ames Research Center are being used to simulate ocean circulation globally at horizontal resolutions of 1/4, 1/8, and 1/16-degree with the Massachusetts Institute of Technology General Circulation Model, a finite volume code that can scale to large numbers of processors. The study aims to understand physical processes responsible for skill improvements as resolution is increased and to gain insight into what resolution is sufficient for particular purposes. This paper focuses on the computational aspects of reaching the technical objective of efficiently performing these global eddy-resolving ocean simulations. At 1/16-degree resolution the model grid contains 1.2 billion cells. At this resolution it is possible to simulate approximately one month of ocean dynamics in about 17 hours of wallclock time with a model timestep of two minutes on a cluster of four 512-way NUMA Altix systems. The Altix systems' large main memory and I/O subsystems allow computation and disk storage of rich sets of diagnostics during each integration, supporting the scientific objective to develop a better understanding of global ocean circulation model solution convergence as model resolution is increased.
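The reported 1/16-degree run implies a per-step cost that can be checked with simple arithmetic. The sketch assumes a 30-day month and counts one update per grid cell per timestep as a rough throughput measure; the timestep, wall-clock time, and cell count come from the abstract.

```python
SECONDS_PER_DAY = 24 * 60 * 60

timestep_s = 2 * 60             # model timestep: two minutes (from the abstract)
simulated_days = 30             # "approximately one month": assumed to be 30 days
wallclock_s = 17 * 60 * 60      # "about 17 hours" of wall-clock time
cells = 1.2e9                   # grid cells at 1/16-degree resolution

steps = simulated_days * SECONDS_PER_DAY // timestep_s
seconds_per_step = wallclock_s / steps
cell_updates_per_s = cells * steps / wallclock_s   # one update per cell per step (assumed)
speed_ratio = simulated_days * SECONDS_PER_DAY / wallclock_s

print(f"{steps} timesteps, {seconds_per_step:.2f} s of wall-clock time per step")
print(f"{cell_updates_per_s:.2e} cell updates per second")
print(f"{speed_ratio:.1f} simulated days per wall-clock day")
```

Under these assumptions, each two-minute model step takes roughly 2.8 seconds of wall-clock time on the cluster.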
Parallel External Memory Graph Algorithms

DEFF Research Database (Denmark)

Arge, Lars Allan; Goodrich, Michael T.; Sitchinava, Nodari

2010-01-01

In this paper, we study parallel I/O-efficient graph algorithms in the Parallel External Memory (PEM) model, one of the private-cache chip multiprocessor (CMP) models. We study the fundamental problem of list ranking, which leads to efficient solutions to problems on trees, such as computing lowest ... an optimal speedup of Θ(P) in parallel I/O complexity and parallel computation time, compared to the single-processor external memory counterparts....
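To make the list ranking problem concrete, the classic PRAM pointer-jumping technique (Wyllie's algorithm) is sketched below. This is a swapped-in textbook method, not the PEM algorithm of the paper: it performs O(n log n) work and ignores cache and I/O behavior, which the PEM model is specifically designed to account for. The function name is hypothetical.

```python
def list_rank(succ):
    """Rank each node of a linked list by its distance to the tail.

    succ[i] is the successor of node i; the tail points to itself.
    Each while-iteration models one synchronous PRAM round of pointer jumping.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Both comprehensions read the previous round's arrays, as in a synchronous step.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank


# List 3 -> 0 -> 2 -> 1 (tail).
print(list_rank([2, 1, 1, 0]))  # [2, 0, 1, 3]
```

Each round doubles the distance every pointer spans, so about log2(n) rounds suffice.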
Large-Scale Image Analytics Using Deep Learning

Science.gov (United States)

Ganguly, S.; Nemani, R. R.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.

2014-12-01

High-resolution land cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Few studies demonstrate the state of the art in deriving very high resolution (VHR) land cover products. In addition, most methods rely heavily on commercial software that is difficult to scale to the region of study (e.g., continents to the globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. Moreover, VHR satellite datasets are of the order of terabytes, and features extracted from these datasets are of the order of petabytes. In our present study, we acquired the National Agricultural Imaging Program (NAIP) dataset for the Continental United States at a spatial resolution of 1 m. The data come as image tiles (a total of a quarter million image scenes with ~60 million pixels) and have a total size of ~100 terabytes for a single acquisition. Features extracted from the entire dataset would amount to ~8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted in the principles of "deep learning" to delineate the percentage of tree cover. To perform image analytics in such a granular system, it is essential to devise an intelligent archiving and query system for image retrieval, file structuring, metadata processing, and filtering of all available image scenes. Using the Open NASA Earth Exchange (NEX) initiative, a partnership with Amazon Web Services (AWS), we have developed an end-to-end architecture for designing the database and the deep belief network (following the DistBelief computing model) to solve the grand challenge of scaling this process across a quarter million NAIP tiles that cover the entire Continental United States. The
Large-Scale Unsupervised Hashing with Shared Structure Learning.

Science.gov (United States)

Liu, Xianglong; Mu, Yadong; Zhang, Danchen; Lang, Bo; Li, Xuelong

2015-09-01

Hashing methods are effective in generating compact binary signatures for images and videos. This paper addresses an important open issue in the literature, i.e., how to learn compact hash codes by enhancing the complementarity among different hash functions. Most prior studies solve this problem either by adopting time-consuming sequential learning algorithms or by generating hash functions that are subject to deliberately designed constraints (e.g., enforcing hash functions orthogonal to one another). We analyze the drawbacks of past works and propose a new solution to this problem. Our idea is to decompose the feature space into a subspace shared by all hash functions and its complementary subspace. On the one hand, the shared subspace, corresponding to the common structure across different hash functions, conveys most of the information relevant to the hashing task. Similar to data de-noising, irrelevant information is explicitly suppressed during hash function generation. On the other hand, because the complementary subspace may also contain useful information for specific hash functions, the final form of our proposed hashing scheme is a compromise between these two kinds of subspaces. To make hash functions not only preserve the local neighborhood structure but also capture the global cluster distribution of the whole data, an objective function incorporating spectral embedding loss, binary quantization loss, and shared subspace contribution is introduced to guide hash function learning. We propose an efficient alternating optimization method to learn both the shared structure and the hash functions simultaneously. Experimental results on three well-known benchmarks, CIFAR-10, NUS-WIDE, and a-TRECVID, demonstrate that our approach significantly outperforms state-of-the-art hashing methods.
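The binary-quantization step shared by many hashing methods can be sketched with random hyperplanes. This is a generic LSH-style illustration, not the paper's learned shared-structure method: the projections here are random Gaussian vectors rather than learned hash functions, and the function names are hypothetical. Each hash function contributes one bit (the sign of a projection), and codes are compared by Hamming distance.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Random Gaussian hyperplanes, standing in for learned hash functions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(x, projections):
    """Binary quantization: bit b is set when x projects non-negatively onto w_b."""
    code = 0
    for b, w in enumerate(projections):
        if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0:
            code |= 1 << b
    return code


def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


proj = make_projections(dim=4, n_bits=16)
x = [1.0, 2.0, 3.0, 4.0]
near = [1.1, 2.0, 2.9, 4.0]
opposite = [-v for v in x]
print(hamming(hash_code(x, proj), hash_code(near, proj)))
print(hamming(hash_code(x, proj), hash_code(opposite, proj)))  # 16: every bit flips
```

Learned methods such as the one described replace the random `proj` with functions optimized so that Hamming distance better reflects neighborhood and cluster structure.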
Working Memory Span Development: A Time-Based Resource-Sharing Model Account

Science.gov (United States)

Barrouillet, Pierre; Gavens, Nathalie; Vergauwe, Evie; Gaillard, Vinciane; Camos, Valerie

2009-01-01

The time-based resource-sharing model (P. Barrouillet, S. Bernardin, & V. Camos, 2004) assumes that during complex working memory span tasks, attention is frequently and surreptitiously switched from processing to reactivate decaying memory traces before their complete loss. Three experiments involving children from 5 to 14 years of age…
Concurrent Operations of O2-Tree on Shared Memory Multicore Architectures

Directory of Open Access Journals (Sweden)

Daniel Ohene-Kwofie

2014-05-01

Full Text Available Modern computer architectures provide high-performance computing capability by having multiple CPU cores. Such systems are also typically associated with very large main-memory capacities, allowing them to be used for fast processing of in-memory database applications. However, most of the concurrency control mechanisms associated with the index structures of these memory-resident databases do not scale well under high transaction rates. This paper presents the O2-Tree, a fast main-memory-resident index that is also highly scalable and tolerant of high transaction rates in a concurrent environment, using the relaxed balancing tree algorithm. The O2-Tree is a modified red-black tree in which the leaf nodes are formed into blocks that hold key-value pairs, while each internal node stores a single key that results from splitting leaf nodes. Multi-threaded concurrent manipulation of the O2-Tree outperforms the popular NoSQL-based key-value stores considered in this paper.
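The leaf-block layout described above can be illustrated with a toy, single-threaded index. This is a simplified sketch under loud assumptions: the class name `BlockIndex` is hypothetical, the separator keys live in a sorted Python list searched with `bisect` instead of a red-black tree, and there is no relaxed balancing or concurrency control. What it does show is how splitting a full leaf block produces the single separator key that an internal node would store.

```python
import bisect


class BlockIndex:
    """Toy index: sorted leaf blocks of key-value pairs plus one separator key per split."""

    def __init__(self, block_capacity=4):
        self.capacity = block_capacity
        self.separators = []   # separators[i] is the smallest key of blocks[i + 1]
        self.blocks = [[]]     # each block is a sorted list of (key, value) pairs

    def _block_for(self, key):
        # Number of separators <= key identifies the block that may hold key.
        return bisect.bisect_right(self.separators, key)

    def get(self, key, default=None):
        block = self.blocks[self._block_for(key)]
        i = bisect.bisect_left(block, (key,))   # (key,) sorts before (key, value)
        if i < len(block) and block[i][0] == key:
            return block[i][1]
        return default

    def put(self, key, value):
        b = self._block_for(key)
        block = self.blocks[b]
        i = bisect.bisect_left(block, (key,))
        if i < len(block) and block[i][0] == key:
            block[i] = (key, value)   # update an existing key in place
            return
        block.insert(i, (key, value))
        if len(block) > self.capacity:
            # Split the overfull block; the right half's first key becomes a new
            # separator, playing the role of the single key in an internal node.
            mid = len(block) // 2
            right = block[mid:]
            del block[mid:]
            self.blocks.insert(b + 1, right)
            self.separators.insert(b, right[0][0])


idx = BlockIndex(block_capacity=2)
for k in [5, 1, 9, 3, 7, 2, 8]:
    idx.put(k, f"v{k}")
print(idx.separators)          # [2, 5, 7, 8]
print(idx.get(7), idx.get(4))  # v7 None
```

Keeping many key-value pairs per leaf block means the balanced structure above it only has to index one key per block, which is the space and update advantage the design aims at.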
Analysis and Optimisation of Hierarchically Scheduled Multiprocessor Embedded Systems

DEFF Research Database (Denmark)

Pop, Traian; Pop, Paul; Eles, Petru

2008-01-01

We present an approach to the analysis and optimisation of heterogeneous multiprocessor embedded systems. The systems are heterogeneous not only in terms of hardware components, but also in terms of communication protocols and scheduling policies. When several scheduling policies share a resource......, they are organised in a hierarchy. In this paper, we first develop a holistic scheduling and schedulability analysis that determines the timing properties of a hierarchically scheduled system. Second, we address design problems that are characteristic of such hierarchically scheduled systems: assignment...... of scheduling policies to tasks, mapping of tasks to hardware components, and the scheduling of the activities. We also present several algorithms for solving these problems. Our heuristics are able to find schedulable implementations under limited resources, achieving an efficient utilisation of the system...
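Schedulability analyses of this kind are typically built up from response-time analysis. As a point of reference only, the sketch below shows the textbook fixed-priority recurrence for one processor, R = C_i + sum over higher-priority tasks j of ceil(R / T_j) * C_j, iterated to a fixed point. It is not the paper's holistic method: it assumes independent periodic tasks with no release jitter, blocking, or communication, and the function name is hypothetical.

```python
import math


def response_times(tasks):
    """Fixed-priority response-time analysis on a single processor.

    tasks: list of (C, T, D) = (worst-case execution time, period, deadline),
    ordered from highest to lowest priority. Returns each task's worst-case
    response time, or None if it cannot meet its deadline.
    """
    results = []
    for i, (c_i, _t_i, d_i) in enumerate(tasks):
        r = c_i
        while True:
            # Preemptions by every higher-priority task within a window of length r.
            interference = sum(math.ceil(r / t_j) * c_j for c_j, t_j, _ in tasks[:i])
            r_next = c_i + interference
            if r_next > d_i:
                results.append(None)   # deadline miss: unschedulable
                break
            if r_next == r:
                results.append(r)      # fixed point reached
                break
            r = r_next
    return results


print(response_times([(1, 4, 4), (2, 6, 6), (3, 12, 12)]))  # [1, 3, 10]
print(response_times([(2, 4, 4), (3, 6, 6)]))               # [2, None]
```

Holistic analyses extend this idea by feeding response times on one resource into the release jitter of dependent activities on the next, iterating until all values stabilize.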
A Multiprocessor Operating System Simulator

Science.gov (United States)

Johnston, Gary M.; Campbell, Roy H.

1988-01-01

This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall semester of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the 'Choices' family of operating systems for loosely- and tightly-coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.
Energy-efficient fault tolerance in multiprocessor real-time systems

Science.gov (United States)

Guo, Yifeng

The recent progress in multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications and industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4-8 processors are common, and many-core systems with dozens of processing cores are expected to become available in the near future. For such systems, in addition to the temporal requirements common to all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, because tolerating faults (e.g., permanent or transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual-processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with an arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem in which tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is

A system-level multiprocessor system-on-chip modeling framework

DEFF Research Database (Denmark)

Virk, Kashif Munir; Madsen, Jan

2004-01-01

We present a system-level modeling framework to model systems-on-chip (SoCs) consisting of heterogeneous multiprocessors and network-on-chip communication structures, in order to enable developers of today's SoC designs to take advantage of the flexibility and scalability of network-on-chip and...... SoC design. We show how a hand-held multimedia terminal, consisting of JPEG, MP3, and GSM applications, can be modeled as a multiprocessor SoC in our framework....
A Performance Evaluation of the Hemingway DSM System on a Network of SMPs

National Research Council Canada - National Science Library

Aggarwal, Anshu; Grumwald, Dirk

1997-01-01

.... In this paper we investigate the performance of a software distributed shared memory system, Hemingway, which is built out of such multiprocessor workstations, utilizing off-the-shelf communication networks...
Multi-processor developments in the United States for future high energy physics experiments and accelerators

International Nuclear Information System (INIS)

Gaines, I.

1988-03-01

The use of multi-processors for analysis and high-level triggering in High Energy Physics experiments, pioneered by the early emulator systems, has reached maturity, in particular with the multiple microprocessor systems in use at Fermilab. It is widely acknowledged that such systems will fulfill the major portion of the computing needs of future large experiments. Recent developments at Fermilab's Advanced Computer Program will make such systems even more powerful, cost-effective, and easier to use than they are at present. The next generation of microprocessors, already available, will provide CPU power of about one VAX 780 equivalent per $300, while supporting most VMS FORTRAN extensions and large (>8 MB) amounts of memory. Low-cost, high-density mass storage devices (based on video tape cartridge technology) will allow parallel I/O to remove potential I/O bottlenecks in systems of over 1000 VAX-equivalent processors. New interconnection schemes and system software will allow more flexible topologies and extremely high data bandwidth, especially for on-line systems. This talk will summarize the work at the Advanced Computer Program and the rest of the US in this field. 3 refs., 4 figs
Operating system for a real-time multiprocessor propulsion system simulator. User's manual

Science.gov (United States)

Cole, G. L.

1985-01-01

The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real-time, high-fidelity simulations of air-breathing propulsion systems. Specifically, the real-time multiprocessor simulator project focuses on the use of multiple microprocessors to achieve the required computing speed and accuracy at relatively low cost. Operating systems for such hardware configurations are generally not available. A real-time multiprocessor operating system (RTMPOS) that supports a variety of multiprocessor configurations was developed at Lewis. With some modification, RTMPOS can also support various microprocessors. By means of menus and prompts, RTMPOS provides the user with a versatile, user-friendly environment for interactively loading, running, and obtaining results from a multiprocessor-based simulator. The menu functions are described, and an example simulation session is included to demonstrate the steps required to go from the simulation loading phase to the execution phase.
Brain Information Sharing During Visual Short-Term Memory Binding Yields a Memory Biomarker for Familial Alzheimer's Disease.

Science.gov (United States)

Parra, Mario A; Mikulan, Ezequiel; Trujillo, Natalia; Sala, Sergio Della; Lopera, Francisco; Manes, Facundo; Starr, John; Ibanez, Agustin

2017-01-01

Alzheimer's disease (AD) is a disconnection syndrome that disrupts both brain information sharing and memory binding functions. The extent to which these two phenotypic expressions share pathophysiological mechanisms remains unknown. This study aimed to unveil the electrophysiological correlates of integrative memory impairments in AD, towards new memory biomarkers for its prodromal stages. Patients with a 100% risk of familial AD (FAD) and healthy controls underwent assessment with the Visual Short-Term Memory binding test (VSTMBT) while we recorded their EEG. We applied a novel brain connectivity method (Weighted Symbolic Mutual Information) to the EEG data. Patients showed significant deficits during the VSTMBT. Reduced brain connectivity was observed during rest as well as during correct VSTM binding, particularly over frontal and posterior regions. Increased connectivity was found during VSTM binding performance over central regions. While decreased connectivity was found in cases at more advanced stages of FAD, increased brain connectivity appeared in cases at earlier stages. Such altered patterns of task-related connectivity were found in 89% of the assessed patients. VSTM binding in the prodromal stages of FAD is associated with altered patterns of brain connectivity, confirming the link between integrative memory deficits and impaired brain information sharing in prodromal FAD. While significant loss of brain connectivity seems to be a feature of the advanced stages of FAD, increased brain connectivity characterizes its earlier stages. These findings are discussed in light of recent proposals about the earliest pathophysiological mechanisms of AD and their clinical expression. Copyright© Bentham Science Publishers.
Data management strategies for multinational large-scale systems biology projects.

Science.gov (United States)

Wruck, Wasco; Peuker, Martin; Regenbrecht, Christian R A

2014-01-01

Good accessibility of publicly funded research data is essential to secure an open scientific system and is becoming mandatory [Wellcome Trust will Penalise Scientists Who Don't Embrace Open Access. The Guardian 2012]. With the use of high-throughput methods in many research areas, from physics to systems biology, large data collections are increasingly important as raw material for research. Here, we present strategies worked out by international and national institutions that target open access to publicly funded research data via incentives or obligations to share data. Funding organizations such as the British Wellcome Trust have therefore developed data sharing policies and request commitment to data management and sharing in grant applications. Increased citation rates are a strong argument for sharing publication data. Pre-publication sharing might be rewarded by a data citation credit system via digital object identifiers (DOIs), which have initially been used for data objects. Besides policies and incentives, good practice in data management is indispensable. However, appropriate systems for data management of large-scale projects, for example in systems biology, are hard to find. Here, we give an overview of a selection of open-source data management systems that have been employed successfully in large-scale projects.
Context-dependent encoding of fear and extinction memories in a large-scale network model of the basal amygdala.

Science.gov (United States)

Vlachos, Ioannis; Herry, Cyril; Lüthi, Andreas; Aertsen, Ad; Kumar, Arvind

2011-03-01

The basal nucleus of the amygdala (BA) is involved in the formation of context-dependent conditioned fear and extinction memories. To understand the underlying neural mechanisms, we developed a large-scale neuron network model of the BA composed of excitatory and inhibitory leaky integrate-and-fire neurons. Excitatory BA neurons received conditioned stimulus (CS)-related input from the adjacent lateral nucleus (LA) and contextual input from the hippocampus or medial prefrontal cortex (mPFC). We implemented a plasticity mechanism in which CS and contextual synapses were potentiated if CS and contextual inputs temporally coincided on the afferents of the excitatory neurons. Our simulations revealed a differential recruitment of two distinct subpopulations of BA neurons during conditioning and extinction, mimicking the activation of experimentally observed cell populations. We propose that these two subgroups encode the contextual specificity of fear and extinction memories, respectively. Mutual competition between them, mediated by feedback inhibition and driven by contextual inputs, regulates the activity in the central amygdala (CEA), thereby controlling amygdala output and fear behavior. The model makes multiple testable predictions that may advance our understanding of fear and extinction memories.
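The network described above is built from leaky integrate-and-fire neurons. A single such neuron can be sketched with forward-Euler integration of tau_m dV/dt = -(V - V_rest) + R_m I, with a spike and reset whenever V crosses threshold. All parameter values below are generic textbook-style assumptions, not the values used in the paper, and the plasticity rule is not modeled.

```python
def simulate_lif(current, dt=0.1, tau_m=20.0, v_rest=-70.0, v_reset=-70.0,
                 v_th=-50.0, r_m=10.0):
    """Forward-Euler simulation of one leaky integrate-and-fire neuron.

    current: input current per time step (nA). Units (assumed): ms, mV, MOhm.
    Returns the list of spike times in ms.
    """
    v = v_rest
    spikes = []
    for step, i_ext in enumerate(current):
        # Membrane potential leaks toward v_rest and is driven by r_m * i_ext.
        v += dt * (-(v - v_rest) + r_m * i_ext) / tau_m
        if v >= v_th:
            spikes.append(step * dt)
            v = v_reset  # reset after the spike (no refractory period modeled)
    return spikes


# 100 ms of constant input: 1.5 nA stays subthreshold (steady state -55 mV),
# while 3.0 nA drives regular firing.
print(len(simulate_lif([1.5] * 1000)))
spikes = simulate_lif([3.0] * 1000)
print(len(spikes), spikes[0])
```

For constant input the first spike time can be checked analytically as tau_m * ln(R_m I / (R_m I - (V_th - V_rest))), about 22 ms for 3.0 nA under these assumed parameters, which the Euler result approximates.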
Context-dependent encoding of fear and extinction memories in a large-scale network model of the basal amygdala.

Directory of Open Access Journals (Sweden)

Ioannis Vlachos

2011-03-01

Full Text Available The basal nucleus of the amygdala (BA) is involved in the formation of context-dependent conditioned fear and extinction memories. To understand the underlying neural mechanisms, we developed a large-scale neuron network model of the BA composed of excitatory and inhibitory leaky integrate-and-fire neurons. Excitatory BA neurons received conditioned stimulus (CS)-related input from the adjacent lateral nucleus (LA) and contextual input from the hippocampus or medial prefrontal cortex (mPFC). We implemented a plasticity mechanism in which CS and contextual synapses were potentiated if CS and contextual inputs temporally coincided on the afferents of the excitatory neurons. Our simulations revealed a differential recruitment of two distinct subpopulations of BA neurons during conditioning and extinction, mimicking the activation of experimentally observed cell populations. We propose that these two subgroups encode the contextual specificity of fear and extinction memories, respectively. Mutual competition between them, mediated by feedback inhibition and driven by contextual inputs, regulates the activity in the central amygdala (CEA), thereby controlling amygdala output and fear behavior. The model makes multiple testable predictions that may advance our understanding of fear and extinction memories.
Graphical Visualization on Computational Simulation Using Shared Memory

International Nuclear Information System (INIS)

Lima, A B; Correa, Eberth

2014-01-01

The shared memory technique is a powerful tool for parallelizing computer codes. In particular, it can be used to visualize results on the fly without stopping the simulation. In this presentation we discuss and show how to use the technique combined with a visualization code based on OpenGL.
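The abstract gives no code. The sketch below shows the idea with Python's standard `multiprocessing.shared_memory` module, which stands in for whatever interface the authors used. The simulation publishes each state into a named segment. A separate viewer process attaches to that segment and copies from it, so the solver never has to stop. The header layout and function names are hypothetical. No synchronization is shown beyond writing the data before the step counter; a real coupling would add a lock or double buffering.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: [step: int64][n: int64][n x float64 state values]
HEADER = struct.Struct("qq")


def create_segment(n_values):
    """Simulation side: allocate a segment big enough for one snapshot."""
    return shared_memory.SharedMemory(create=True, size=HEADER.size + 8 * n_values)


def publish(shm, step, values):
    """Simulation side: copy the current state into shared memory.
    The values are written before the header, so the step counter changes
    last (Python gives no cross-process ordering guarantee for this)."""
    struct.pack_into(f"{len(values)}d", shm.buf, HEADER.size, *values)
    HEADER.pack_into(shm.buf, 0, step, len(values))


def read_snapshot(segment_name):
    """Viewer side: attach by name, copy the latest state, then detach."""
    shm = shared_memory.SharedMemory(name=segment_name)
    try:
        step, n = HEADER.unpack_from(shm.buf, 0)
        values = list(struct.unpack_from(f"{n}d", shm.buf, HEADER.size))
    finally:
        shm.close()
    return step, values


if __name__ == "__main__":
    shm = create_segment(4)
    try:
        for step in range(3):                  # stand-in for the solver loop
            publish(shm, step, [0.5 * step] * 4)
        print(read_snapshot(shm.name))         # a viewer process would call this
    finally:
        shm.close()
        shm.unlink()
```

The viewer only copies a snapshot and detaches. A rendering loop, such as an OpenGL program, can therefore poll at its own frame rate while the solver continues.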
The Climate-G testbed: towards a large scale data sharing environment for climate change

Science.gov (United States)

Aloisio, G.; Fiore, S.; Denvil, S.; Petitdidier, M.; Fox, P.; Schwichtenberg, H.; Blower, J.; Barbera, R.

2009-04-01

The Climate-G testbed provides an experimental large-scale data environment for climate change that addresses challenging data and metadata management issues. The main scope of Climate-G is to allow scientists to carry out geographical and cross-institutional climate data discovery, access, visualization and sharing. Climate-G is a multidisciplinary collaboration involving both climate and computer scientists, and it currently involves several partners such as Centro Euro-Mediterraneo per i Cambiamenti Climatici (CMCC), Institut Pierre-Simon Laplace (IPSL), Fraunhofer Institut für Algorithmen und Wissenschaftliches Rechnen (SCAI), National Center for Atmospheric Research (NCAR), University of Reading, University of Catania and University of Salento. To perform distributed metadata search and discovery, we adopted a CMCC metadata solution (which provides a high level of scalability, transparency, fault tolerance and autonomy) leveraging both P2P and grid technologies (GRelC Data Access and Integration Service). Moreover, data are available through OPeNDAP/THREDDS services, the Live Access Server and the OGC-compliant Web Map Service, and they can be downloaded, visualized and accessed within the proposed environment through the Climate-G Data Distribution Centre (DDC), the web gateway to the Climate-G digital library. The DDC is a data-grid portal that allows users to easily, securely and transparently perform search/discovery, metadata management, data access, data visualization, etc. Godiva2 (integrated into the DDC) displays 2D maps (and animations) and also exports maps for display on the Google Earth virtual globe. Presently, Climate-G publishes (through the DDC) about 2 TB of data related to the ENSEMBLES project (including distributed replicas of data) as well as to the IPCC AR4. The main results of the proposed work are: wide data access/sharing environment for climate change; P2P/grid metadata approach; production-level Climate-G DDC; high quality tools for
Job-mix modeling and system analysis of an aerospace multiprocessor.

Science.gov (United States)

Mallach, E. G.

1972-01-01

An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.
The effect of the order in which episodic autobiographical memories versus autobiographical knowledge are shared on feelings of closeness.

Science.gov (United States)

Brandon, Nicole R; Beike, Denise R; Cole, Holly E

2017-07-01

Autobiographical memories (AMs) can be used to create and maintain closeness with others [Alea, N., & Bluck, S. (2003). Why are you telling me that? A conceptual model of the social function of autobiographical memory. Memory, 11(2), 165-178]. However, the differential effects of memory specificity are not well established. Two studies with 148 participants tested whether the order in which autobiographical knowledge (AK) and specific episodic AM (EAM) are shared affects feelings of closeness. Participants read two memories hypothetically shared by each of four strangers. The strangers first shared either AK or an EAM, and then shared either AK or an EAM. Participants were randomly assigned to read either positive or negative AMs from the strangers. Findings suggest that people feel closer to those who share positive AMs in the same way they construct memories: starting with general and moving to specific.
Iterative schemes for parallel Sn algorithms in a shared-memory computing environment

International Nuclear Information System (INIS)

Haghighat, A.; Hunter, M.A.; Mattis, R.E.

1995-01-01

Several two-dimensional spatial domain partitioning Sn transport theory algorithms are developed on the basis of different iterative schemes. These algorithms are incorporated into TWOTRAN-II and tested on the shared-memory CRAY Y-MP C90 computer. For a series of fixed-source r-z geometry homogeneous problems, it is demonstrated that the concurrent red-black algorithms can achieve high parallel efficiencies (>60%) on the C90. It is also demonstrated that, for a realistic shielding problem, the use of the negative flux fixup causes high load imbalance, which results in a significant loss of parallel efficiency.
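Red-black ordering is what makes such sweeps concurrent. The sketch below is not the TWOTRAN-II transport algorithm. It applies the same colouring idea to a hypothetical model problem: Gauss-Seidel on the 2D Poisson equation with zero boundary values. The point is to show why all points of one colour can be updated in parallel. The grid size, source and tolerance are assumed values.

```python
def red_black_sweep(u, f, h):
    """One red-black Gauss-Seidel sweep for -laplace(u) = f.
    The outer ring of u holds fixed boundary values. A point is 'red' if
    (i + j) is even and 'black' if odd; every red update reads only black
    neighbours and vice versa, so all points of one colour are independent
    and could be split across processors."""
    n = len(u)
    for colour in (0, 1):
        for i in range(1, n - 1):
            start = 1 + (i + 1 + colour) % 2   # first interior j with (i + j) % 2 == colour
            for j in range(start, n - 1, 2):
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1] + h * h * f[i][j])


def residual_norm(u, f, h):
    """Largest absolute residual of the 5-point discretization."""
    n = len(u)
    worst = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            neg_lap = (4 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                       - u[i][j - 1] - u[i][j + 1]) / (h * h)
            worst = max(worst, abs(neg_lap - f[i][j]))
    return worst


def solve(n=17, source=1.0, tol=1e-6, max_sweeps=10_000):
    """Solve on the unit square with u = 0 on the boundary; returns (u, sweeps)."""
    h = 1.0 / (n - 1)
    u = [[0.0] * n for _ in range(n)]
    f = [[source] * n for _ in range(n)]
    for sweep in range(1, max_sweeps + 1):
        red_black_sweep(u, f, h)
        if residual_norm(u, f, h) < tol:
            return u, sweep
    return u, max_sweeps
```

In a parallel version, each colour phase would be split across processors, with one synchronization between the red and black phases.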
Theme II Joint Work Plan -2017 Collaboration and Knowledge Sharing on Large-scale Demonstration Projects

Energy Technology Data Exchange (ETDEWEB)

Zhang, Xiaoliang [World Resources Inst. (WRI), Washington, DC (United States); Stauffer, Philip H. [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2017-09-25

This effort is designed to expedite learnings from existing and planned large demonstration projects and their associated research through effective knowledge sharing among participants in the US and China.
The structural robustness of multiprocessor computing system

Directory of Open Access Journals (Sweden)

N. Andronaty

1996-03-01

A model of a transputer-based multiprocessor computing system is described that makes it possible to evaluate the system's structural robustness (viability, survivability).
USC orthogonal multiprocessor for image processing with neural networks

Science.gov (United States)

Hwang, Kai; Panda, Dhabaleswar K.; Haddadi, Navid

1990-07-01

This paper presents the architectural features and imaging applications of the Orthogonal MultiProcessor (OMP) system, which is under construction at the University of Southern California with research funding from NSF and assistance from several industrial partners. The prototype OMP is being built with 16 Intel i860 RISC microprocessors and 256 parallel memory modules using custom-designed spanning buses, which are 2-D interleaved and orthogonally accessed without conflicts. The 16-processor OMP prototype is targeted to achieve 430 MIPS and 600 Mflops, figures that have been verified by simulation experiments based on the design parameters used. The prototype OMP machine will initially be applied to image processing, computer vision, and neural network simulation applications. We summarize important vision and imaging algorithms that can be restructured with neural network models. These algorithms can run efficiently on the OMP hardware with linear speedup. The ultimate goal is to develop a high-performance Visual Computer (Viscom) for integrated low- and high-level image processing and vision tasks.
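To show why orthogonal access avoids conflicts, the sketch below models a hypothetical n x n grid of memory modules. Processor i owns row bus i and column bus i. This is a simplification for illustration, not the OMP's actual bus protocol. When every processor uses the same access mode, the module sets are disjoint. Mixing modes can cause collisions.

```python
from itertools import combinations


def modules_accessed(proc, mode, n):
    """Modules (row, col) reached by processor `proc` in one access phase of a
    hypothetical n x n orthogonally interleaved memory: row mode uses row
    bus `proc`, column mode uses column bus `proc`."""
    if mode == "row":
        return {(proc, k) for k in range(n)}
    if mode == "column":
        return {(k, proc) for k in range(n)}
    raise ValueError(f"unknown mode: {mode!r}")


def conflict_free(modes, n):
    """True if no module is reached by two processors, given each
    processor's mode (modes[p] is 'row' or 'column')."""
    sets = [modules_accessed(p, m, n) for p, m in enumerate(modes)]
    return all(not (a & b) for a, b in combinations(sets, 2))


if __name__ == "__main__":
    n = 16  # 16 processors and 16 x 16 = 256 modules, as in the prototype
    print(conflict_free(["row"] * n, n))                    # True
    print(conflict_free(["column"] * n, n))                 # True
    print(conflict_free(["row", "column"] * (n // 2), n))   # False
```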
High Performance Programming Using Explicit Shared Memory Model on Cray T3D

Science.gov (United States)

Simon, Horst D.; Saini, Subhash; Grassi, Charles

1994-01-01

The Cray T3D system is the first-phase system in Cray Research, Inc.'s (CRI) three-phase massively parallel processing (MPP) program. This system features a heterogeneous architecture that closely couples DEC's Alpha microprocessors and CRI's parallel-vector technology, i.e., the Cray Y-MP and Cray C90. An overview of the Cray T3D hardware and available programming models is presented. Under the Cray Research Adaptive Fortran (CRAFT) model, four programming methods are defined: data parallel, work sharing, message passing using PVM, and the explicit shared memory model. However, at this time the data-parallel and work-sharing models are not yet available to the user community. The differences between standard PVM and CRI's PVM are highlighted with performance measurements such as latencies and communication bandwidths. We have found that neither standard PVM nor CRI's PVM exploits the hardware capabilities of the T3D. The reasons for the poor performance of PVM as a native message-passing library are presented. This is illustrated by the performance of the NAS Parallel Benchmarks (NPB) programmed in the explicit shared memory model on the Cray T3D. In general, the performance of standard PVM is about 4 to 5 times lower than that obtained with the explicit shared memory model. A similar degradation is seen on the CM-5, where applications using the native message-passing library CMMD perform about 4 to 5 times worse than with data-parallel methods. The issues involved when programming in the explicit shared memory model (such as barriers, synchronization, invalidating the data cache, aligning the data cache, etc.) are discussed. The comparative performance of NPB using the explicit shared memory programming model on the Cray T3D and other highly parallel systems, such as the TMC CM-5, Intel Paragon, Cray C90, IBM SP1, etc., is presented.
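Latency and bandwidth figures like those compared here are often summarized with Hockney's linear model, t(n) = t0 + n / r_inf. The sketch below uses that model as a stand-in; the paper's own analysis may differ. The parameter values are hypothetical. The sketch shows why a communication layer with higher start-up latency delivers much lower effective bandwidth for small messages.

```python
def transfer_time(n_bytes, t0, r_inf):
    """Hockney model: seconds to move n_bytes, given start-up latency t0 (s)
    and asymptotic bandwidth r_inf (bytes/s)."""
    return t0 + n_bytes / r_inf


def effective_bandwidth(n_bytes, t0, r_inf):
    """Bandwidth actually seen by a single message of n_bytes."""
    return n_bytes / transfer_time(n_bytes, t0, r_inf)


def half_performance_length(t0, r_inf):
    """n_1/2 = t0 * r_inf: message size that reaches half of r_inf."""
    return t0 * r_inf


if __name__ == "__main__":
    r_inf = 100e6                                   # hypothetical link bandwidth, bytes/s
    for label, t0 in (("low-latency put", 2e-6),    # hypothetical latencies
                      ("message-passing library", 50e-6)):
        print(label, half_performance_length(t0, r_inf),
              effective_bandwidth(1024, t0, r_inf))
```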
MulticoreBSP for C : A high-performance library for shared-memory parallel programming

NARCIS (Netherlands)

Yzelman, A. N.; Bisseling, R. H.; Roose, D.; Meerbergen, K.

2014-01-01

The bulk synchronous parallel (BSP) model, as well as parallel programming interfaces based on BSP, classically target distributed-memory parallel architectures. In earlier work, Yzelman and Bisseling designed a MulticoreBSP for Java library specifically for shared-memory architectures. In the
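The BSP model organizes a program into supersteps of local computation and communication, each ending in a barrier. The following sketch imitates one such program with Python threads and `threading.Barrier`. It does not use the MulticoreBSP API, and the function name is hypothetical.

```python
import threading


def bsp_sum(values, p):
    """Global sum over p threads in two BSP-style supersteps.
    Superstep 1: each thread sums its cyclic share of `values` and stores the
    partial result in a shared array, then waits at the barrier.
    Superstep 2: after the barrier all partials are visible, so every thread
    can form the total locally. Returns each thread's copy of the total."""
    partial = [0] * p
    totals = [None] * p
    barrier = threading.Barrier(p)

    def worker(pid):
        partial[pid] = sum(values[pid::p])   # local computation + "put"
        barrier.wait()                       # end of superstep 1
        totals[pid] = sum(partial)           # superstep 2 reads remote data

    threads = [threading.Thread(target=worker, args=(pid,)) for pid in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return totals
```

On shared memory, the "communication" step reduces to writes into a shared array. The barrier is what guarantees those writes are complete before any thread reads them.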
Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

KAUST Repository

Wu, Xingfu; Taylor, Valerie

2013-01-01

In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers (IBM POWER4, POWER5+ and BlueGene/P), and we analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks for initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers, because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to these benchmarks, we use a weak-scaling hybrid MPI/OpenMP large-scale scientific application, the Gyrokinetic Toroidal Code (GTC) in magnetic fusion, to validate our performance model of the hybrid application on these multicore supercomputers. The validation results show that our performance modeling method predicts the performance of hybrid MPI/OpenMP GTC on up to 512 cores of these multicore supercomputers with an error rate of less than 7.77%. © 2013 Elsevier Inc.
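The abstract does not give the model's equations. The sketch below illustrates the general shape of such a model. It adds three terms:

- compute time;
- memory time under bandwidth contention, with a node's sustained STREAM-like bandwidth split evenly among its active cores;
- a latency-plus-bandwidth communication term.

It then computes the prediction error rate. All parameter names are hypothetical, and the three phases are assumed not to overlap.

```python
def predicted_time(flops, bytes_moved, n_msgs, msg_bytes, *,
                   flop_rate, node_bw, cores_per_node, latency, link_bw):
    """Hypothetical per-core run-time estimate for a weak-scaling code.
    Memory contention: the node's sustained (STREAM-like) bandwidth node_bw
    is split evenly among the cores_per_node active cores.
    Communication: n_msgs messages, each costing latency + msg_bytes / link_bw.
    Compute, memory and communication phases are assumed not to overlap."""
    t_compute = flops / flop_rate
    t_memory = bytes_moved / (node_bw / cores_per_node)
    t_comm = n_msgs * (latency + msg_bytes / link_bw)
    return t_compute + t_memory + t_comm


def error_rate(predicted, measured):
    """Relative prediction error, e.g. 0.0777 for 7.77%."""
    return abs(predicted - measured) / measured
```

Under weak scaling, the per-core work stays fixed. Packing more active cores onto a node then raises the memory term, which is the contention effect such a model tries to capture.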
Performance modeling of hybrid MPI/OpenMP scientific applications on large-scale multicore supercomputers

KAUST Repository

Wu, Xingfu

2013-12-01

In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore supercomputers (IBM POWER4, POWER5+ and BlueGene/P), and we analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks and Intel's MPI benchmarks for initial performance analysis and model validation of MPI and OpenMP applications on these multicore supercomputers, because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to these benchmarks, we use a weak-scaling hybrid MPI/OpenMP large-scale scientific application, the Gyrokinetic Toroidal Code (GTC) in magnetic fusion, to validate our performance model of the hybrid application on these multicore supercomputers. The validation results show that our performance modeling method predicts the performance of hybrid MPI/OpenMP GTC on up to 512 cores of these multicore supercomputers with an error rate of less than 7.77%. © 2013 Elsevier Inc.

Low rank approximation methods for MR fingerprinting with large scale dictionaries.

Science.gov (United States)

Yang, Mingrui; Ma, Dan; Jiang, Yun; Hamilton, Jesse; Seiberlich, Nicole; Griswold, Mark A; McGivney, Debra

2018-04-01

This work proposes new low-rank approximation approaches with significant memory savings for large-scale MR fingerprinting (MRF) problems. We introduce a compressed MRF with randomized singular value decomposition method to significantly reduce the memory requirement for calculating a low-rank approximation of large MRF dictionaries. We further relax this requirement by exploiting the structure of MRF dictionaries in the randomized singular value decomposition space and fitting them to low-degree polynomials to generate high-resolution MRF parameter maps. In vivo 1.5T and 3T brain scan data are used to validate the approaches. T1, T2 and off-resonance maps are in good agreement with those of the standard MRF approach. Moreover, the memory savings are up to 1000-fold for the MRF-fast imaging with steady-state precession sequence and more than 15-fold for the MRF-balanced steady-state free precession sequence. The proposed compressed MRF with randomized singular value decomposition and dictionary fitting methods are memory-efficient low-rank approximation methods that can benefit the use of MRF in clinical settings. They also have great potential in large-scale MRF problems, such as problems considering multi-component MRF parameters or high resolution in the parameter space. Magn Reson Med 79:2392-2400, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
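The core step, a randomized SVD of a dictionary, can be sketched with NumPy. The sketch follows the standard range-finder approach:

- random sampling of the column space;
- a few power iterations;
- a QR factorization;
- an exact SVD of a small matrix.

This is a generic sketch, not the authors' compressed-MRF implementation. `oversample` and `n_iter` are assumed defaults. The power iterations omit re-orthonormalization, which is usually adequate for a small `n_iter`.

```python
import numpy as np


def randomized_svd(D, rank, oversample=10, n_iter=2, seed=0):
    """Rank-`rank` approximation D ~ U @ diag(s) @ Vt by randomized SVD.
    D may be complex (MRF dictionaries usually are)."""
    rng = np.random.default_rng(seed)
    m, n = D.shape
    k = min(rank + oversample, m, n)
    Y = D @ rng.standard_normal((n, k))        # random sample of D's column space
    for _ in range(n_iter):                    # power iterations sharpen the spectrum
        Y = D @ (D.conj().T @ Y)
    Q, _ = np.linalg.qr(Y)                     # orthonormal basis, m x k
    B = Q.conj().T @ D                         # small k x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]
```

In one common MRF use, the rows of D are dictionary atoms and the rows of `Vt` span a low-dimensional temporal subspace. Both atoms and measured signals are projected onto that subspace before matching.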
Programming parallel architectures - The BLAZE family of languages

Science.gov (United States)

Mehrotra, Piyush

1989-01-01

This paper gives an overview of the various approaches to programming multiprocessor architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive, since they remove much of the burden of exploiting parallel architectures from the user. This paper also describes recent work in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described.
A highly efficient parallel algorithm for solving the neutron diffusion nodal equations on shared-memory computers

International Nuclear Information System (INIS)

Azmy, Y.Y.; Kirk, B.L.

1990-01-01

Modern parallel computer architectures offer enormous potential for reducing CPU and wall-clock execution times of large-scale computations commonly performed in various applications in science and engineering. Recently, several authors have reported their efforts in developing and implementing parallel algorithms for solving the neutron diffusion equation on a variety of shared- and distributed-memory parallel computers. Testing of these algorithms for a variety of two- and three-dimensional meshes showed significant speedup of the computation. However, even for very large problems (i.e., three-dimensional fine meshes) executed concurrently on a few nodes in serial (nonvector) mode, the measured computational efficiency is very low (40 to 86%). In this paper, the authors present a highly efficient (∼85 to 99.9%) algorithm for solving the two-dimensional nodal diffusion equations on the Sequent Balance 8000 parallel computer. Also presented is a model of the performance, represented by the efficiency, as a function of problem size and the number of participating processors. The model is validated through several tests and then extrapolated to larger problems and more processors to predict the performance of the algorithm in more computationally demanding situations.
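Efficiency here is E = T1 / (p Tp). The sketch below uses a hypothetical cost model, not the authors' model. Work divides evenly across processors, and a synchronization overhead grows linearly with p. That gives E = 1 / (1 + p^2 t_sync / (N t_cell)). Efficiency therefore rises with problem size N and falls with processor count p, which is the qualitative trend such a model is meant to capture.

```python
def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E = T1 / (p * Tp)."""
    return t_serial / (p * t_parallel)


def modelled_efficiency(n_cells, p, t_cell, t_sync):
    """Efficiency under a hypothetical cost model:
    T1 = n_cells * t_cell            (serial work)
    Tp = T1 / p + p * t_sync         (evenly divided work + overhead growing with p)."""
    t1 = n_cells * t_cell
    tp = t1 / p + p * t_sync
    return efficiency(t1, tp, p)
```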
Backup flexibility classes in emerging large-scale renewable electricity systems

International Nuclear Information System (INIS)

Schlachtberger, D.P.; Becker, S.; Schramm, S.; Greiner, M.

2016-01-01

Highlights: • Flexible backup demand in a European wind and solar based power system is modelled. • Three flexibility classes are defined based on production and consumption timescales. • Seasonal backup capacities are shown to be used only below 50% renewable penetration. • Large-scale transmission between countries can reduce fast flexible capacities. - Abstract: High shares of intermittent renewable power generation in a European electricity system will require flexible backup power generation on the dominant diurnal, synoptic, and seasonal weather timescales. The same three timescales are already covered by today’s dispatchable electricity generation facilities, which are able to follow the typical load variations on the intra-day, intra-week, and seasonal timescales. This work aims to quantify the changing demand for those three backup flexibility classes in emerging large-scale electricity systems as they transform from low to high shares of variable renewable power generation. A weather-driven model is used, which aggregates eight years of wind and solar power generation data as well as load data over Germany and Europe, and splits the backup system required to cover the residual load into three flexibility classes distinguished by their respective maximum rates of change of power output. This modelling shows that the slowly flexible backup system is dominant at low renewable shares, but its optimized capacity decreases and drops close to zero once the average renewable power generation exceeds 50% of the mean load. The medium flexible backup capacities increase for modest renewable shares, peak at around a 40% renewable share, and then continuously decrease to almost zero once the average renewable power generation becomes larger than 100% of the mean load. The dispatch capacity of the highly flexible backup system becomes dominant for renewable shares beyond 50% and reaches its maximum around a 70% renewable share. For renewable shares
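The abstract separates backup by how fast its output must change. A simple stand-in for that separation, not the authors' method, is to low-pass filter the positive residual load with moving averages of different lengths. The sketch below splits the backup demand into slow, medium and fast parts that sum back to the original series. It also reports each part's largest step-to-step ramp. Window lengths are hypothetical.

```python
def moving_average(x, window):
    """Centered moving average; the window is truncated at the series ends."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def split_backup(residual_load, slow_window, medium_window):
    """Split backup demand max(residual, 0) into slow + medium + fast parts.
    slow   = long moving average (seasonal-like)
    medium = short moving average minus slow (synoptic-like)
    fast   = remainder (diurnal and faster).
    The three parts sum to the backup demand; medium and fast may be negative."""
    backup = [max(r, 0.0) for r in residual_load]
    slow = moving_average(backup, slow_window)
    smooth = moving_average(backup, medium_window)
    medium = [m - s for m, s in zip(smooth, slow)]
    fast = [b - m for b, m in zip(backup, smooth)]
    return slow, medium, fast


def max_ramp(series):
    """Largest step-to-step change, a proxy for required flexibility."""
    return max(abs(b - a) for a, b in zip(series, series[1:]))
```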
Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

Energy Technology Data Exchange (ETDEWEB)

Pautz, A. [TUV Hannover/Sachsen-Anhalt e.V. (Germany); Langenbuch, S. [Gesellschaft fur Anlagen- und Reaktorsicherheit (GRS) mbH (Germany)

2003-07-01

The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort, this type of angular domain decomposition has to be done on a spatial line-by-line basis, which makes the parallelism in the code very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort. The first is a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines, based on the OpenMP standard. The second uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on the performance and scaling behaviour of the parallel Dort code. (authors)
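The angular independence described above can be sketched as follows. Each discrete direction is swept through a 1D slab on its own, and only the final weighted sum into the scalar flux combines results. The sketch assumes a hypothetical 1D slab, S4 Gauss-Legendre quadrature and step (upwind) differencing rather than Dort's 2D scheme. It uses a thread pool only to show the structure; in CPython, real speedup would need processes, OpenMP or MPI.

```python
from concurrent.futures import ThreadPoolExecutor

# S4 Gauss-Legendre quadrature on [-1, 1]; the weights sum to 2.
MU = (-0.8611363115940526, -0.3399810435848563, 0.3399810435848563, 0.8611363115940526)
W = (0.3478548451374538, 0.6521451548625461, 0.6521451548625461, 0.3478548451374538)


def sweep(mu, sigma_t, q, dx):
    """Sweep one direction mu through a 1D slab of equal cells (vacuum
    boundaries, isotropic source q per cell, angular source q/2) using step
    (upwind) differencing: |mu| (psi_i - psi_upstream) / dx + sigma_t psi_i = q_i / 2."""
    n = len(q)
    psi = [0.0] * n
    a = abs(mu)
    upstream = 0.0                                   # vacuum inflow
    for i in (range(n) if mu > 0 else range(n - 1, -1, -1)):
        psi[i] = (0.5 * q[i] * dx + a * upstream) / (a + sigma_t * dx)
        upstream = psi[i]
    return psi


def scalar_flux(sigma_t, q, dx, workers=4):
    """Angular domain decomposition: every direction is swept independently
    (here in a thread pool), then one weighted reduction forms the scalar flux."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        psis = list(pool.map(lambda mu: sweep(mu, sigma_t, q, dx), MU))
    return [sum(w * psi[i] for w, psi in zip(W, psis)) for i in range(len(q))]
```

The final reduction is the only step that touches every direction's result. In a real code, that step corresponds to the inter-thread or inter-process communication the abstract mentions.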
Experiences in the parallelization of the discrete ordinates method using OpenMP and MPI

International Nuclear Information System (INIS)

Pautz, A.; Langenbuch, S.

2003-01-01

The method of Discrete Ordinates is in principle parallelizable to a high degree, since the transport 'mesh sweeps' are mutually independent for all angular directions. However, in the well-known production code Dort, this type of angular domain decomposition has to be done on a spatial line-by-line basis, which makes the parallelism in the code very fine-grained. The construction of scalar fluxes and moments requires a large effort for inter-thread or inter-process communication. We have implemented two different parallelization approaches in Dort. The first is a shared-memory model suitable for SMP (Symmetric Multiprocessor) machines, based on the OpenMP standard. The second uses the well-known Message Passing Interface (MPI) to establish communication between parallel processes running in a distributed-memory environment. We investigate the benefits and drawbacks of both models and show first results on the performance and scaling behaviour of the parallel Dort code. (authors)
Large scale testing of nitinol shape memory alloy devices for retrofitting of bridges

International Nuclear Information System (INIS)

Johnson, Rita; Emmanuel Maragakis, M; Saiid Saiidi, M; Padgett, Jamie E; DesRoches, Reginald

2008-01-01

A large scale testing program was conducted to determine the effects of shape memory alloy (SMA) restrainer cables on the seismic performance of in-span hinges of a representative multiple-frame concrete box girder bridge subjected to earthquake excitations. Another objective of the study was to compare the performance of SMA restrainers to that of traditional steel restrainers as restraining devices for reducing hinge displacement and the likelihood of collapse during earthquakes. The results of the tests show that SMA restrainers performed very well as restraining devices. The forces in the SMA and steel restrainers were comparable. However, the SMA restrainer cables had minimal residual strain after repeated loading and exhibited the ability to undergo many cycles with little strength and stiffness degradation. In addition, the hysteretic damping that was observed in the larger ground accelerations demonstrated the ability of the materials to dissipate energy. An analytical study was conducted to assess the anticipated seismic response of the test setup and evaluate the accuracy of the analytical model. The results of the analytical simulation illustrate that the analytical model was able to match the responses from the experimental tests, including peak stresses, strains, forces, and hinge openings
Supporting Multiprocessors in the Icecap Safety-Critical Java Run-Time Environment

DEFF Research Database (Denmark)

Zhao, Shuai; Wellings, Andy; Korsholm, Stephan Erbs

The current version of the Safety Critical Java (SCJ) specification defines three compliance levels. Level 0 targets single-processor programs, while Levels 1 and 2 can support multiprocessor platforms. Level 1 programs must be fully partitioned, but Level 2 programs can also be more globally...... scheduled. As yet, there is no official Reference Implementation for SCJ. However, the icecap project has produced a Safety-Critical Java Run-time Environment based on the Hardware-near Virtual Machine (HVM). This supports SCJ at all compliance levels and provides an implementation of the safety......-critical Java (javax.safetycritical) package. This is still a work in progress and lacks certain key features, among them the ability to support multiprocessor platforms. In this paper, we explore two possible options for adding multiprocessor support to this environment: the “green thread” and the “native...
Pthreads vs MPI Parallel Performance of Angular-Domain Decomposed S

International Nuclear Information System (INIS)

Azmy, Y.Y.; Barnett, D.A.

2000-01-01

Two programming models for parallelizing the Angular Domain Decomposition (ADD) of the discrete ordinates (Sn) approximation of the neutron transport equation are examined. These are the shared memory model based on the POSIX threads (Pthreads) standard and the message passing model based on the Message Passing Interface (MPI) standard. These standard libraries are available on most multiprocessor platforms, which makes the resulting parallel codes widely portable. The question is: on a fixed platform, and for a particular code solving a given test problem, which of the two programming models delivers better parallel performance? Such a comparison is possible on Symmetric Multi-Processor (SMP) architectures, in which several CPUs physically share a common memory and can also emulate message passing functionality. The implementation of the two-dimensional Sn Arbitrarily High Order Transport (AHOT) code for solving neutron transport problems using these two parallelization models is described. The measured parallel performance of each model on the COMPAQ AlphaServer 8400 and the SGI Origin 2000 platforms is described, and a comparison of the observed speedup for the two programming models is reported. For the case presented in this paper, it appears that the MPI implementation scales better than the Pthreads implementation on both platforms.
Scaling properties in time-varying networks with memory

Science.gov (United States)

Kim, Hyewon; Ha, Meesoon; Jeong, Hawoong

2015-12-01

The formation of network structure is mainly influenced by an individual node's activity and its memory, where activity can usually be interpreted as the individual inherent property and memory can be represented by the interaction strength between nodes. In our study, we define the activity through the appearance pattern in the time-aggregated network representation, and quantify the memory through the contact pattern of empirical temporal networks. To address the role of activity and memory in epidemics on time-varying networks, we propose temporal-pattern coarsening of activity-driven growing networks with memory. In particular, we focus on the relation between time-scale coarsening and spreading dynamics in the context of dynamic scaling and finite-size scaling. Finally, we discuss the universality issue of spreading dynamics on time-varying networks for various memory-causality tests.
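One widely used way to add memory to an activity-driven network is a reinforcement rule. An active node with n distinct past contacts links to a new node with probability c / (n + c); otherwise it reuses an old contact. The sketch below implements that rule as an assumption, since the abstract does not specify its memory mechanism. Old contacts are chosen uniformly at random.

```python
import random


def simulate(activities, steps, c=1.0, seed=0):
    """Activity-driven network with a reinforcement-type memory (assumed rule).
    Each step, node i fires with probability activities[i]. A firing node with
    n distinct past contacts links to a new node with probability c / (n + c),
    otherwise to one of its past contacts chosen uniformly at random.
    Returns (edges per step, set of distinct contacts per node)."""
    rng = random.Random(seed)
    n_nodes = len(activities)
    contacts = [set() for _ in range(n_nodes)]
    snapshots = []
    for _ in range(steps):
        edges = []
        for i, a in enumerate(activities):
            if rng.random() >= a:
                continue                       # node i inactive this step
            known = contacts[i]
            explore = rng.random() < c / (len(known) + c)
            unknown = [j for j in range(n_nodes) if j != i and j not in known]
            if unknown and (explore or not known):
                j = rng.choice(unknown)        # new tie
            elif known:
                j = rng.choice(sorted(known))  # reinforce an old tie
            else:
                continue                       # no possible partner
            contacts[i].add(j)
            contacts[j].add(i)
            edges.append((i, j))
        snapshots.append(edges)
    return snapshots, contacts
```

The per-step edge lists are the time-resolved network. Aggregating them over longer windows is one way to study the kind of time-scale coarsening the abstract describes.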
Large-Scale Cubic-Scaling Random Phase Approximation Correlation Energy Calculations Using a Gaussian Basis.

Science.gov (United States)

Wilhelm, Jan; Seewald, Patrick; Del Ben, Mauro; Hutter, Jürg

2016-12-13

We present an algorithm for computing the correlation energy in the random phase approximation (RPA) in a Gaussian basis requiring O(N^3) operations and O(N^2) memory. The method is based on the resolution of the identity (RI) with the overlap metric, a reformulation of RI-RPA in the Gaussian basis, imaginary time, and imaginary frequency integration techniques, and the use of sparse linear algebra. Additional memory reduction without extra computations can be achieved by an iterative scheme that overcomes the memory bottleneck of canonical RPA implementations. We report a massively parallel implementation that is the key for the application to large systems. Finally, cubic-scaling RPA is applied to a thousand water molecules using a correlation-consistent triple-ζ quality basis.
Solution of the Euler and Navier-Stokes equations on MIMD distributed memory multiprocessors using cyclic reduction

International Nuclear Information System (INIS)

Curchitser, E.N.; Pelz, R.B.; Marconi, F.

1992-01-01

The Euler and Navier-Stokes equations are solved for steady, two-dimensional flow over a NACA 0012 airfoil using a 1024-node nCUBE/2 multiprocessor. Second-order, upwind-discretized difference equations are solved implicitly using ADI factorization. Parallel cyclic reduction is employed to solve the block tridiagonal systems. For realistic problems, communication times are negligible compared to calculation times. The processors are tightly synchronized, and their loads are well balanced. When the flux Jacobians are frozen, the wall-clock time for one implicit timestep is about equal to that of a multistage explicit scheme. 10 refs
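Cyclic reduction eliminates every other unknown at each level. All eliminations within a level are independent, which is what makes the method suitable for many processors. The sketch below solves a scalar tridiagonal system with n = 2^k - 1 unknowns serially. The paper applies the method to block tridiagonal systems distributed across nodes, which this sketch does not attempt.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1,
    with a[0] = c[n-1] = 0 and n = 2**k - 1, by cyclic reduction.
    All updates inside one level are mutually independent, so each level
    could be distributed across processors."""
    n = len(b)
    if n < 1 or (n + 1) & n:
        raise ValueError("cyclic_reduction needs n = 2**k - 1 unknowns")
    a, b, c, d = list(a), list(b), list(c), list(d)   # work on copies
    s = 1
    # Forward reduction: equation i (with (i + 1) % 2s == 0) absorbs its
    # neighbours at distance s and then couples unknowns at distance 2s.
    while 2 * s <= n:
        for i in range(2 * s - 1, n, 2 * s):
            lo, hi = i - s, i + s
            alpha, gamma = -a[i] / b[lo], -c[i] / b[hi]
            b[i] += alpha * c[lo] + gamma * a[hi]
            d[i] += alpha * d[lo] + gamma * d[hi]
            a[i], c[i] = alpha * a[lo], gamma * c[hi]
        s *= 2
    # Back substitution: solve the coarsest level first, then refine.
    x = [0.0] * n
    while s >= 1:
        for i in range(s - 1, n, 2 * s):
            left = x[i - s] if i - s >= 0 else 0.0
            right = x[i + s] if i + s < n else 0.0
            x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
        s //= 2
    return x
```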
Generalized Load Sharing for Homogeneous Networks of Distributed Environment

Directory of Open Access Journals (Sweden)

A. Satheesh

2008-01-01

We propose job migration policies that consider effective usage of global memory in addition to CPU load sharing in distributed systems. When a node is identified as lacking sufficient memory space to serve its jobs, one or more of its jobs are migrated to remote nodes with low memory allocations. If the memory space is sufficiently large, the jobs are scheduled by a CPU-based load sharing policy. Following the principle of sharing both CPU and memory resources, we present several load sharing alternatives. Our objective is to reduce the number of page faults caused by unbalanced memory allocations for jobs among distributed nodes, so that the overall performance of a distributed system can be significantly improved. We have conducted trace-driven simulations to compare CPU-based load sharing policies with our policies. We show that our load sharing policies not only improve the performance of memory-bound jobs but also maintain the same load sharing quality as the CPU-based policies for CPU-bound jobs. Regarding remote execution and preemptive migration strategies, our experiments indicate that the choice of strategy in load sharing depends on the memory demand of jobs: remote execution is more effective for memory-bound jobs, and preemptive migration is more effective for CPU-bound jobs. Our CPU-memory-based policy, using either the high-performance or the high-throughput approach together with the remote execution strategy, performs best for both CPU-bound and memory-bound jobs in homogeneous networks of distributed environments.
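A minimal sketch of the CPU-memory-based idea follows. It covers remote execution only (no preemptive migration) and uses hypothetical node and job representations. If the job would overflow its home node's memory, it runs on the node with the most free memory. Otherwise a CPU-based rule (shortest queue) applies.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    memory_mb: int
    jobs: list = field(default_factory=list)   # memory demand (MB) of each job

    @property
    def free_mb(self):
        return self.memory_mb - sum(self.jobs)

    @property
    def queue_length(self):
        return len(self.jobs)


def place_job(nodes, home, demand_mb):
    """Hypothetical CPU-memory-based placement of a new job arriving at `home`:
    if the job would overflow the home node's memory, execute it remotely on
    the node with the most free memory; otherwise apply a CPU-based rule
    (shortest queue, ties broken by list order). Returns the chosen node."""
    if demand_mb > home.free_mb:
        target = max(nodes, key=lambda node: node.free_mb)
    else:
        target = min(nodes, key=lambda node: node.queue_length)
    target.jobs.append(demand_mb)
    return target
```

Checking memory first and falling back to the CPU rule mirrors the abstract's ordering. Memory shortage triggers remote placement; otherwise ordinary CPU load sharing decides.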
Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

Energy Technology Data Exchange (ETDEWEB)

De Supinski, B. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Caliga, D. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

2017-09-28

The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array (FPGA)-based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory access technology represents an important step towards making reconfigurable symmetric multiprocessor (SMP) architectures a cost-effective solution for large-scale scientific computing.
Performance Modeling of Hybrid MPI/OpenMP Scientific Applications on Large-scale Multicore Cluster Systems

KAUST Repository

Wu, Xingfu; Taylor, Valerie

2011-01-01

In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore clusters (IBM POWER4, POWER5+ and Blue Gene/P), and we analyze the performance of these MPI, OpenMP and hybrid applications. We use STREAM memory benchmarks for initial performance analysis and model validation of MPI and OpenMP applications on these multicore clusters, because the measured sustained memory bandwidth can provide insight into the memory bandwidth that a system should sustain on scientific applications with the same amount of workload per core. In addition to these benchmarks, we use a weak-scaling hybrid MPI/OpenMP large-scale scientific application, the Gyrokinetic Toroidal Code (GTC) in magnetic fusion, to validate our performance model of the hybrid application on these multicore clusters. The validation results show that our performance modeling method predicts the performance of hybrid MPI/OpenMP GTC on up to 512 cores of these multicore clusters with an error rate of less than 7.77%. © 2011 IEEE.
Performance Modeling of Hybrid MPI/OpenMP Scientific Applications on Large-scale Multicore Cluster Systems

KAUST Repository

Wu, Xingfu

2011-08-01

In this paper, we present a performance modeling framework based on memory bandwidth contention time and a parameterized communication model to predict the performance of OpenMP, MPI and hybrid applications with weak scaling on three large-scale multicore clusters (IBM POWER4, POWER5+ and Blue Gene/P), and we analyze the performance of these MPI, OpenMP and hybrid applications. We use the STREAM memory benchmarks for initial performance analysis and for model validation of MPI and OpenMP applications on these multicore clusters, because the measured sustained memory bandwidth provides insight into the memory bandwidth a system should sustain on scientific applications with the same workload per core. In addition to these benchmarks, we use a weak-scaling hybrid MPI/OpenMP large-scale scientific application, the Gyrokinetic Toroidal Code (GTC) from magnetic fusion, to validate our performance model of the hybrid application on these multicore clusters. The validation results show that our performance modeling method predicts the performance of hybrid MPI/OpenMP GTC on up to 512 cores of these multicore clusters with an error rate below 7.77%. © 2011 IEEE.
Specification and development of the sharing memory data management module for a nuclear processes simulator

International Nuclear Information System (INIS)

Telesforo R, D.

2003-01-01

A nuclear processes simulator for research and teaching purposes is currently being developed at the Faculty of Engineering of UNAM. It consists of several modules, including the one described in the present work: the shared memory module. The module uses the IPC mechanisms of the UNIX System V operating system and was written in C. The RELAP code is used to model the various components of the simulator. The function of the module is to allocate shared memory segments in which to store the variables needed for interaction among the simulator's processes. The information generated while the simulation program runs can be read from and written to these segments, and the module can also interact with the internal variables of the code at run time. The graphical displays (mimic diagrams, pictorials, trend graphs, virtual instrumentation, etc.) also obtain their information from shared memory. In turn, user actions on the interactive displays modify the shared memory segments, and the information is sent to the RELAP code to alter the course of the simulation. The program has two start-up modes: automatic and manual. In automatic mode, it takes a RELAP input file (indta) and attaches the control variables that appear in it to shared memory. In manual mode, the user attaches, reads and writes the desired control variables, provided they exist in the input file (indta). This is a dynamic way of interacting with the simulator directly, and even of altering values when they do not exist in the board elements associated with the variables. (Author)
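The module itself uses System V IPC from C (`shmget`/`shmat`). As a hedged illustration of the same pattern, the sketch below swaps in Python's `multiprocessing.shared_memory` (named shared memory, not System V IPC): one segment holds a table of control variables at fixed offsets, the simulation writes them, and a display attaches to the same segment by name and reads them. The variable names and layout are hypothetical, not RELAP's.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical control variables; each occupies one float64 slot.
VARIABLES = ["pressure", "temperature", "flow"]
SLOT = struct.calcsize("d")


def create_segment():
    """Allocate a segment large enough for every control variable."""
    return shared_memory.SharedMemory(create=True, size=SLOT * len(VARIABLES))


def attach(name):
    """Attach to an existing segment by name (as a display process would)."""
    return shared_memory.SharedMemory(name=name)


def write_var(shm, var, value):
    struct.pack_into("d", shm.buf, VARIABLES.index(var) * SLOT, value)


def read_var(shm, var):
    return struct.unpack_from("d", shm.buf, VARIABLES.index(var) * SLOT)[0]


if __name__ == "__main__":
    sim = create_segment()           # owner: the simulation
    display = attach(sim.name)       # normally a separate process
    try:
        write_var(sim, "pressure", 7.1)
        print(read_var(display, "pressure"))
    finally:
        display.close()
        sim.close()
        sim.unlink()                 # only the owner removes the segment
```

Fixed offsets per variable play the role of the module's variable table: any process that knows the segment name and the layout can read or modify a control variable without copying the whole state.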
Reproducibility in a multiprocessor system

Science.gov (United States)

Bellofatto, Ralph A; Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Gooding, Thomas M; Haring, Rudolf A; Heidelberger, Philip; Kopcsay, Gerard V; Liebsch, Thomas A; Ohmacht, Martin; Reed, Don D; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

2013-11-26

Fixing a problem is usually much easier if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed: a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, a precise stop of the system, and a scan of the system state.
Shared mushroom body circuits underlie visual and olfactory memories in Drosophila

Science.gov (United States)

Vogt, Katrin; Schnaitmann, Christopher; Dylla, Kristina V; Knapek, Stephan; Aso, Yoshinori; Rubin, Gerald M; Tanimoto, Hiromu

2014-01-01

In nature, animals form memories associating reward or punishment with stimuli from different sensory modalities, such as smells and colors. It is unclear, however, how distinct sensory memories are processed in the brain. We established appetitive and aversive visual learning assays for Drosophila that are comparable to the widely used olfactory learning assays. These assays share critical features, such as reinforcing stimuli (sugar reward and electric shock punishment), and allow direct comparison of the cellular requirements for visual and olfactory memories. We found that the same subsets of dopamine neurons drive formation of both sensory memories. Furthermore, distinct yet partially overlapping subsets of mushroom body intrinsic neurons are required for visual and olfactory memories. Thus, our results suggest that distinct sensory memories are processed in a common brain center. Such centralization of related brain functions is an economical design that avoids the repetition of similar circuit motifs. DOI: http://dx.doi.org/10.7554/eLife.02395.001 PMID:25139953
Nanographene charge trapping memory with a large memory window

International Nuclear Information System (INIS)

Meng, Jianling; Yang, Rong; Zhao, Jing; He, Congli; Wang, Guole; Shi, Dongxia; Zhang, Guangyu

2015-01-01

Nanographene is a promising alternative to metal nanoparticles or semiconductor nanocrystals for charge trapping memory. In general, a high density of nanographene is required to achieve a high charge trapping capacity. Here, we demonstrate a fabrication strategy for high-density nanographene for charge trapping memory with a large memory window. The fabrication includes two steps: (1) direct growth of a continuous nanographene film; and (2) isolation of the as-grown film into high-density nanographene by plasma etching. Compared with directly grown isolated nanographene islands, argon or oxygen plasma etching forms abundant defects and edges in the nanographene, i.e. more isolated nanographene islands are obtained, which provides more charge trapping sites. The as-fabricated nanographene charge trapping memory shows outstanding memory properties, with a memory window as wide as ∼9 V at a relatively low sweep voltage of ±8 V, a program/erase speed of ∼1 ms and robust endurance of >1000 cycles. High-density nanographene charge trapping memory provides an outstanding alternative for downscaling technology beyond current flash memory. (paper)

Circuit engineering principles for construction of bipolar large-scale integrated circuit storage devices and very large-scale main memory

Science.gov (United States)

Neklyudov, A. A.; Savenkov, V. N.; Sergeyez, A. G.

1984-06-01

Memories are improved by increasing either their speed or the memory capacity of a single chip. The most effective means of increasing the speed of bipolar memories are current-control circuits, which offer the lowest read-out times for a given power consumption (1/4 pJ/bit). The current-control circuitry comprises multistage current switches and circuits that accelerate transient processes in storage elements and links. Circuit principles for designing bipolar memories with maximum speed for a given minimum of circuit topology are analyzed. Two main classes of current-controlled storage are considered: the ECL type and the super-integrated injection type, with data capacities of N = 1/4 and N 4/16, respectively. The circuits reduce logic voltage swings and the capacitances of the word, bit and control-circuit buses. The limiting speed is determined by the noise-immunity requirements of the memory in its storage and read-out modes.
Performance Health Monitoring of Large-Scale Systems

Energy Technology Data Exchange (ETDEWEB)

Rajamony, Ram [IBM Research, Austin, TX (United States)

2014-11-20

This report details the progress made on the ASCR-funded project Performance Health Monitoring for Large Scale Systems. A large-scale application may not achieve its full performance potential because of degraded performance in even a single subsystem. Detecting performance faults, isolating them, and taking remedial action are critical at the scale of the systems on the horizon. PHM aims to develop techniques and tools that can identify and mitigate such performance problems. We accomplish this through two main components. The PHM framework encompasses diagnostics, system monitoring, fault isolation, and performance evaluation capabilities that indicate when a performance fault has been detected, whether due to an anomaly in the system itself or to contention for shared resources between concurrently executing jobs. Software components called the PHM Control system then build on the capabilities provided by the PHM framework to mitigate degradation caused by performance problems.
Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.

Science.gov (United States)

Chen, Shizhi; Yang, Xiaodong; Tian, Yingli

2015-09-01

A key challenge in large-scale image classification is achieving efficiency in both computation and memory without compromising classification accuracy. Learning-based classifiers achieve state-of-the-art accuracy but have been criticized for computational complexity that grows linearly with the number of classes. Nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, the discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of learning-based and NN-based classifiers. The complexity of the D-HKTree grows only sublinearly with the number of categories, which is much better than recent hierarchical support vector machine-based methods. Its memory requirement is an order of magnitude lower than that of recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves state-of-the-art accuracy with significantly lower computation cost and memory requirements.
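To make the tree idea concrete, here is a minimal sketch of a plain hierarchical k-means tree classifier in stdlib-only Python: each internal node splits its samples with Lloyd's k-means, a query descends to the nearest centroid at every level, and the reached leaf is resolved by 1-nearest-neighbour. The discriminative training that distinguishes D-HKTree is not reproduced; the function names, `k` and `leaf_size` are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(v, centroids[i]))


def kmeans(points, k, iters=10, rng=None):
    """Plain Lloyd's k-means; an empty cluster keeps its previous centroid."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def build(samples, k=2, leaf_size=4, rng=None):
    """samples: list of (vector, label). Returns a nested dict tree."""
    rng = rng or random.Random(0)
    if len(samples) <= max(leaf_size, k):
        return {"leaf": samples}
    centroids = kmeans([v for v, _ in samples], k, rng=rng)
    parts = [[] for _ in centroids]
    for v, label in samples:
        parts[nearest(v, centroids)].append((v, label))
    if max(len(p) for p in parts) == len(samples):   # k-means failed to split
        return {"leaf": samples}
    return {"children": [(c, build(p, k, leaf_size, rng))
                         for c, p in zip(centroids, parts) if p]}


def classify(tree, v):
    """Descend by nearest centroid, then 1-NN among the leaf's samples."""
    while "children" in tree:
        _, tree = min(tree["children"], key=lambda child: sq_dist(v, child[0]))
    return min(tree["leaf"], key=lambda s: sq_dist(v, s[0]))[1]
```

For a reasonably balanced tree, a query compares against only `k` centroids per level plus one small leaf, which is why this family of trees scales sublinearly with the number of stored samples.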
Large-scale Ising-machines composed of magnetic neurons

Science.gov (United States)

Mizushima, Koichi; Goto, Hayato; Sato, Rie

2017-10-01

We propose Ising machines composed of magnetic neurons, that is, magnetic bits in a recording track. In large-scale machines, the sizes of both neurons and synapses need to be reduced, and simple, efficient connections among neurons are also required to achieve all-to-all connectivity among them. These requirements can be fulfilled by adopting magnetic recording technologies such as racetrack memories and skyrmion tracks, because the area of a magnetic bit is almost two orders of magnitude smaller than that of static random access memory (SRAM), which has normally been used as a semiconductor neuron, and because the connections among neurons can be realized with the read and write methods of these technologies.
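For reference, the quantity an Ising machine minimizes is the Ising energy H(s) = -Σ_{i<j} J_ij s_i s_j - Σ_i h_i s_i over spins s_i ∈ {-1, +1}. The sketch below evaluates H and finds a ground state by exhaustive search; this is only feasible for a handful of spins and is meant as a correctness check for whatever hardware or heuristic solver is used. The function names are illustrative.

```python
from itertools import product


def ising_energy(J, s, h=None):
    """H(s) = -sum_{i<j} J[i][j] s_i s_j - sum_i h[i] s_i, with s_i in {-1, +1}."""
    n = len(s)
    energy = -sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    if h is not None:
        energy -= sum(h[i] * s[i] for i in range(n))
    return energy


def ground_state(J, h=None):
    """Exhaustive search over all 2**n spin configurations (small n only)."""
    n = len(J)
    return min(product((-1, 1), repeat=n), key=lambda s: ising_energy(J, s, h))
```

A positive coupling J_ij favours aligned spins and a negative one favours anti-aligned spins, so a ferromagnetic triangle has two degenerate ground states with all spins equal, while an external field h breaks that tie.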
Soil moisture memory at sub-monthly time scales

Science.gov (United States)

Mccoll, K. A.; Entekhabi, D.

2017-12-01

For soil moisture-climate feedbacks to occur, the soil moisture storage must have `memory' of past atmospheric anomalies. Quantifying soil moisture memory is, therefore, essential for mapping and characterizing land-atmosphere interactions globally. Most previous studies estimate soil moisture memory using metrics based on the autocorrelation function of the soil moisture time series (e.g., the e-folding autocorrelation time scale). This approach was first justified by Delworth and Manabe (1988) on the assumption that monthly soil moisture time series can be modelled as red noise. While this is a reasonable model for monthly soil moisture averages, at sub-monthly scales, the model is insufficient due to the highly non-Gaussian behavior of the precipitation forcing. Recent studies have shown that significant soil moisture-climate feedbacks appear to occur at sub-monthly time scales. Therefore, alternative metrics are required for defining and estimating soil moisture memory at these shorter time scales. In this study, we introduce metrics, based on the positive and negative increments of the soil moisture time series, that can be used to estimate soil moisture memory at sub-monthly time scales. The positive increments metric corresponds to a rapid drainage time scale. The negative increments metric represents a slower drying time scale that is most relevant to the study of land-atmosphere interactions. We show that autocorrelation-based metrics mix the two time scales, confounding physical interpretation. The new metrics are used to estimate soil moisture memory at sub-monthly scales from in-situ and satellite observations of soil moisture. Reference: Delworth, Thomas L., and Syukuro Manabe. "The Influence of Potential Evaporation on the Variabilities of Simulated Soil Wetness and Climate." Journal of Climate 1, no. 5 (May 1, 1988): 523-47. doi:10.1175/1520-0442(1988)0012.0.CO;2.
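Two ingredients of this discussion can be sketched briefly, assuming a regularly sampled soil moisture series in plain Python: the conventional e-folding metric (the first lag at which the sample autocorrelation drops below 1/e) and the separation of successive increments into positive and negative sets. The abstract does not give the exact form of the new increment-based metrics, so only the separation step is shown; the function names are hypothetical.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation at `lag` (covariance normalized by the lag-0 sum)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        raise ValueError("constant series has no defined autocorrelation")
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def efolding_lag(x):
    """First lag (in samples) where autocorrelation falls below 1/e, else None."""
    for lag in range(1, len(x)):
        if autocorrelation(x, lag) < 1 / math.e:
            return lag
    return None


def split_increments(x):
    """Separate successive differences into positive and negative increments."""
    inc = [b - a for a, b in zip(x, x[1:])]
    return [d for d in inc if d > 0], [d for d in inc if d < 0]
```

Because the autocorrelation is computed over the whole series, rapid rain-driven jumps and slow drying both contribute to a single e-folding lag; splitting the increments first keeps the two kinds of change apart.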
Sharing and Unsharing Memories of Jews of Moroccan Origin in Montréal and Paris Compared

Directory of Open Access Journals (Sweden)

Yolande Cohen

2012-11-01

Full Text Available This text explores, through their life stories, the memories of Moroccan Jews who left their country of origin for France and Canada. By questioning how a shared memory and a group memory are constituted, it stresses the value of adopting a generational perspective to better understand the migration of this population. While some interviewees emphasize the rationalization of their departure, the younger ones consider leaving a natural step in their many migrations. These distinctions are central to showing how the memory of the departures and the depiction of colonial society are shared by members of a group, and unshared with the larger Moroccan society.
Advanced lectures on multiprocessor programming (1/3)

CERN Multimedia

CERN. Geneva

2011-01-01

Three classes (60 minutes each) on multiprocessor programming, given by Prof. Dr. Christoph von Praun, Georg-Simon-Ohm University of Applied Sciences, Nuremberg, Germany. This is an advanced class on multiprocessor programming. The class introduces the principles of concurrent objects and the different progress guarantees that concurrent computations can have. The focus is on non-blocking computations, i.e. concurrent programs that do not use locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects; 2nd class: Principles of non-blocking synchronization; 3rd class: Concurrent queues. Brief bio of Christoph von Praun: Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. His most recent research studies programming models and tools that support transactional synchronization. In prior work, which he also did at the IBM T.J. Watson Research Center in Yorktown Height...
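As an illustration of the non-blocking style these classes cover, here is a sketch of a Treiber stack, a classic lock-free stack in which `push` and `pop` retry a compare-and-swap (CAS) on the top pointer until it succeeds. Python exposes no user-level CAS, so the hypothetical `AtomicRef` class emulates one with a lock; only the single CAS step is atomic, and the stack built on top is the usual retry loop. The code is not taken from the lectures.

```python
import threading


class AtomicRef:
    """A reference cell with compare-and-set (CAS), emulated with a lock.

    CPython offers no user-level CAS instruction; the lock only makes the single
    CAS step atomic, as the hardware instruction would."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        """Install `new` only if the current value is still `expected`."""
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class _Node:
    __slots__ = ("value", "next")

    def __init__(self, value, next_node):
        self.value = value
        self.next = next_node


class TreiberStack:
    def __init__(self):
        self._top = AtomicRef(None)

    def push(self, value):
        while True:                          # retry until our CAS wins
            old = self._top.get()
            if self._top.compare_and_set(old, _Node(value, old)):
                return

    def pop(self):
        """Return the top value, or None if the stack is empty."""
        while True:
            old = self._top.get()
            if old is None:
                return None
            if self._top.compare_and_set(old, old.next):
                return old.value
```

Because nodes are garbage-collected and cannot be reused while a thread still holds a reference to them, the identity check in `compare_and_set` avoids the ABA problem that manual memory reuse causes in C or C++ versions of this stack.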
Academic training: Advanced lectures on multiprocessor programming

CERN Multimedia

PH Department

2011-01-01

Academic Training Lecture - Regular Programme: 31 October and 1, 2 November 2011, from 11:00 to 12:00, IT Auditorium, Bldg. 31. Three classes (60 minutes each) on multiprocessor programming, given by Prof. Dr. Christoph von Praun, Georg-Simon-Ohm University of Applied Sciences, Nuremberg, Germany. This is an advanced class on multiprocessor programming. The class introduces the principles of concurrent objects and the different progress guarantees that concurrent computations can have. The focus is on non-blocking computations, i.e. concurrent programs that do not use locks. We discuss the implementation of practical non-blocking data structures in detail. 1st class: Introduction to concurrent objects; 2nd class: Principles of non-blocking synchronization; 3rd class: Concurrent queues. Brief bio of Christoph von Praun: Christoph worked on a variety of analysis techniques and runtime platforms for parallel programs. His most recent research studies programming models an...
Large-scale in vitro expansion of polyclonal human switched-memory B lymphocytes.

Directory of Open Access Journals (Sweden)

Sonia Néron

Full Text Available Polyclonal preparations of therapeutic immunoglobulins, namely intravenous immunoglobulins (IVIg), are essential in the treatment of immunodeficiency and are increasingly used to treat autoimmune and inflammatory diseases. Currently, patients' access to IVIg depends exclusively on volunteer blood donations followed by fractionation of pooled human plasma obtained from thousands of individuals. At present, no in vitro cell culture procedure allows the preparation of polyclonal human antibodies. All human therapeutic antibodies currently generated in vitro are monoclonal antibodies, mostly derived from genetic engineering or single-cell antibody technologies. Here, we describe an in vitro cell culture system, using CD40-CD154 interactions, that leads to a 1×10^6-fold expansion of switched memory B lymphocytes in approximately 50 days. These expanded cells secrete polyclonal IgG whose distribution into IgG1, IgG2, IgG3 and IgG4 is similar to that of normal human serum. This in vitro generated IgG showed relatively low self-reactivity, interacting moderately with only 24 human antigens out of a total of 9484 targets. Furthermore, up to one liter of IgG-secreting cells can be produced in about 40 days. This experimental model, providing large-scale expansion of human B lymphocytes, represents a critical step toward the in vitro production of polyclonal human IgG and a new method for the ex vivo expansion of B cells for therapeutic purposes.
Operating System for Runtime Reconfigurable Multiprocessor Systems

Directory of Open Access Journals (Sweden)

Diana Göhringer

2011-01-01

Full Text Available Operating systems traditionally handle the task scheduling of one or more application instances on processor-like hardware architectures. RAMPSoC, a novel runtime adaptive multiprocessor System-on-Chip, exploits dynamic reconfiguration on FPGAs to generate, start and terminate hardware and software tasks. Hardware tasks have to be transferred to the reconfigurable hardware via a configuration access port. Software tasks can be loaded into the local memory of the respective IP core either via the configuration access port or via the on-chip communication infrastructure (e.g. a Network-on-Chip). Recent series of Xilinx FPGAs, such as the Virtex-5, provide two Internal Configuration Access Ports, which cannot be accessed simultaneously. To prevent conflicts, access to these ports, as well as hardware resource management, needs to be controlled, e.g. by a special-purpose operating system running on an embedded processor. For that purpose, and to handle the relations between temporally and spatially scheduled operations, such a novel operating system approach is of high importance. This special-purpose operating system, called CAP-OS (Configuration Access Port-Operating System), which is presented in this paper, provides clients of the configuration port with priority-based access scheduling, hardware task mapping and resource management services.
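The abstract mentions priority-based access scheduling for a configuration port that only one client may use at a time. The following is a minimal discrete-time sketch of such an arbiter, assuming non-preemptive service, lower numbers meaning higher priority, and first-come-first-served order among equal priorities; the abstract does not say whether CAP-OS preempts transfers, and all names here are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival: int     # time at which the task asks for the port
    priority: int    # lower value = more urgent (assumed convention)
    duration: int    # time the transfer occupies the port


def schedule(requests):
    """Return (name, start, end) per request; the port serves one client at a time."""
    pending = sorted(requests, key=lambda r: r.arrival)
    ready, order = [], []
    t, i, seq = 0, 0, 0
    while i < len(pending) or ready:
        # Admit every request that has arrived by time t.
        while i < len(pending) and pending[i].arrival <= t:
            r = pending[i]
            heapq.heappush(ready, (r.priority, seq, r))   # seq keeps FIFO among ties
            seq += 1
            i += 1
        if not ready:
            t = pending[i].arrival       # port idle until the next arrival
            continue
        _, _, r = heapq.heappop(ready)
        order.append((r.name, t, t + r.duration))
        t += r.duration                  # non-preemptive: run to completion
    return order
```

The sequence counter in each heap entry both preserves arrival order among equal priorities and prevents the heap from ever comparing two `Request` objects directly.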
Large scale electrolysers

International Nuclear Information System (INIS)

B Bello; M Junker

2006-01-01

Hydrogen production by water electrolysis represents nearly 4% of world hydrogen production. Future development of hydrogen vehicles will require large quantities of hydrogen, so large-scale hydrogen production plants will need to be installed. In this context, developing low-cost large-scale electrolysers that could use 'clean power' seems necessary. ALPHEA HYDROGEN, a European network and center of expertise on hydrogen and fuel cells, carried out a study for its members in 2005 to evaluate the potential of large-scale electrolysers to produce hydrogen in the future. The different electrolysis technologies were compared. Then, a state-of-the-art review of currently available electrolysis modules was made. A review of the large-scale electrolysis plants installed worldwide was also carried out, and the main projects related to large-scale electrolysis were listed. The economics of large-scale electrolysers were discussed, and the influence of energy prices on the cost of hydrogen produced by large-scale electrolysis was evaluated. (authors)
Social Transmission of False Memory in Small Groups and Large Networks.

Science.gov (United States)

Maswood, Raeya; Rajaram, Suparna

2018-05-21

Sharing information and memories is a key feature of social interactions, making social contexts important for developing and transmitting accurate memories and also false memories. False memory transmission can have wide-ranging effects, including shaping personal memories of individuals as well as collective memories of a network of people. This paper reviews a collection of key findings and explanations in cognitive research on the transmission of false memories in small groups. It also reviews the emerging experimental work on larger networks and collective false memories. Given the reconstructive nature of memory, the abundance of misinformation in everyday life, and the variety of social structures in which people interact, an understanding of transmission of false memories has both scientific and societal implications. © 2018 Cognitive Science Society, Inc.
An efficient and novel computation method for simulating diffraction patterns from large-scale coded apertures on large-scale focal plane arrays

Science.gov (United States)

Shrekenhamer, Abraham; Gottesman, Stephen R.

2012-10-01

A novel, memory-efficient method has been developed and tested for computing diffraction patterns produced on large-scale focal planes by large-scale Coded Apertures at wavelengths where diffraction effects are significant. The scheme, readily implementable on portable computers, overcomes the memory limitations of present state-of-the-art simulation codes such as Zemax. The method first calculates a set of reference complex field (amplitude and phase) patterns on the focal plane produced by a single (reference) central hole, extending to twice the focal plane array size, with one such pattern for each Line-of-Sight (LOS) direction and wavelength in the scene, and with the pattern amplitude corresponding to the square root of the spectral irradiance from each such LOS direction in the scene at selected wavelengths. Next, the set of reference patterns is transformed to generate pattern sets for the other holes. The transformation consists of a translational pattern shift corresponding to each hole's position offset and an electrical phase shift corresponding to each hole's position offset and the incoming radiance's direction and wavelength. The set of complex patterns for each direction and wavelength is then summed coherently and squared for each detector to yield a set of power patterns unique to each direction and wavelength. Finally, the set of power patterns is summed to produce the full-waveband diffraction pattern from the scene. With this tool, researchers can efficiently simulate diffraction patterns produced from scenes by large-scale Coded Apertures onto large-scale focal plane arrays to support the development and optimization of coded aperture masks and image reconstruction algorithms.
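The procedure above can be sketched in one dimension, assuming stdlib Python, detector positions measured in pixel pitches, and holes located at integer pixel offsets. The toy `reference_field` (a single-slit Fraunhofer sinc scaled by the square root of the irradiance) is a hypothetical stand-in for the paper's computed reference pattern; the remaining steps follow the description: shift the reference pattern per hole, apply a phase shift from the hole offset, direction and wavelength, sum coherently, square, and sum the power patterns over the scene.

```python
import cmath
import math


def reference_field(n_det, pitch, focal, hole_w, wavelength, angle, irradiance):
    """Toy single-hole field sampled at 2 * n_det positions (twice the array)."""
    field = []
    for pos in range(-n_det, n_det):
        u = math.pi * hole_w * (pos * pitch / focal - math.sin(angle)) / wavelength
        sinc = 1.0 if u == 0 else math.sin(u) / u
        field.append(math.sqrt(irradiance) * sinc)
    return field


def power_pattern(ref, n_det, hole_offsets, pitch, wavelength, angle):
    """Coherent sum over holes for one direction and wavelength, then |E|^2."""
    total = [0j] * n_det
    for k in hole_offsets:                     # hole offset in pixel pitches
        phase = cmath.exp(2j * math.pi * k * pitch * math.sin(angle) / wavelength)
        for d in range(n_det):
            pos = d - n_det // 2               # detector position, centred on axis
            idx = pos - k + n_det              # translated reference pattern
            if 0 <= idx < 2 * n_det:
                total[d] += ref[idx] * phase
    return [abs(e) ** 2 for e in total]


def scene_pattern(scene, n_det, hole_offsets, pitch, focal, hole_w):
    """scene: iterable of (angle, wavelength, irradiance); returns summed power."""
    out = [0.0] * n_det
    for angle, wavelength, irradiance in scene:
        ref = reference_field(n_det, pitch, focal, hole_w, wavelength, angle, irradiance)
        power = power_pattern(ref, n_det, hole_offsets, pitch, wavelength, angle)
        out = [a + b for a, b in zip(out, power)]
    return out
```

The reference pattern is computed once per direction and wavelength and reused for every hole by shifting; storing it at twice the array size is what lets any hole offset within half an array width be served from the same pattern.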
Parallel iterative solution of the Hermite Collocation equations on GPUs II

International Nuclear Information System (INIS)

Vilanakis, N; Mathioudakis, E

2014-01-01

Hermite Collocation is a high-order finite element method for Boundary Value Problems (BVPs) modelling applications in several fields of science and engineering. Applying this integration-free numerical solver to linear BVPs results in a large, sparse, general system of algebraic equations, which calls for an efficient iterative solver, especially for realistic simulations. In Part I of this work, an efficient parallel algorithm of the Schur complement method coupled with the Bi-Conjugate Gradient Stabilized (BiCGSTAB) iterative solver was designed for multicore computing architectures with a Graphics Processing Unit (GPU). In the present work, the proposed algorithm has been extended to high performance computing environments consisting of multiprocessor machines with multiple GPUs. Since this is a parallel architecture with distributed GPU memory and shared CPU memory, a hybrid memory treatment is needed in developing the parallel algorithm. The algorithm was implemented on an HP SL390 multiprocessor machine with Tesla M2070 GPUs using the OpenMP and OpenACC standards. Execution time measurements reveal the efficiency of the parallel implementation.
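For readers unfamiliar with BiCGSTAB, the following is a minimal, unpreconditioned dense version of van der Vorst's iteration in plain Python. It is only a sketch: it omits the Schur complement reduction, the GPU kernels and the OpenMP/OpenACC parallelization described above, and the function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab(A, b, tol=1e-10, max_iter=200):
    """Unpreconditioned BiCGSTAB for a dense, possibly nonsymmetric A."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                # r = b - A x for x = 0
    if dot(r, r) ** 0.5 < tol:
        return x
    r_hat = list(r)                            # fixed shadow residual
    rho_prev = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_iter):
        rho = dot(r_hat, r)
        if rho == 0.0:
            raise ArithmeticError("BiCGSTAB breakdown (rho = 0)")
        beta = (rho / rho_prev) * (alpha / omega)
        p = [ri + beta * (pj - omega * vj) for ri, pj, vj in zip(r, p, v)]
        v = matvec(A, p)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vj for ri, vj in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:             # converged after the half step
            return [xj + alpha * pj for xj, pj in zip(x, p)]
        t = matvec(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xj + alpha * pj + omega * sj for xj, pj, sj in zip(x, p, s)]
        r = [sj - omega * tj for sj, tj in zip(s, t)]
        if dot(r, r) ** 0.5 < tol:
            return x
        rho_prev = rho
    raise RuntimeError("BiCGSTAB did not converge")
```

Unlike conjugate gradients, this iteration does not require A to be symmetric, which matters for the general (nonsymmetric) systems that collocation methods produce.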
Success Factors of Large Scale ERP Implementation in Thailand

OpenAIRE

Rotchanakitumnuai, Siriluck

2010-01-01

The objective of the study is to examine the determinants of success in ERP implementation. The results indicate that large-scale ERP implementation success consists of eight factors: project management competence, knowledge sharing, ERP system quality, understanding, user involvement, business process re-engineering, top management support, and organization readiness.
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics

Energy Technology Data Exchange (ETDEWEB)

Simmhan, Yogesh; Kumbhare, Alok; Wickramaarachchi, Charith; Nagarkar, Soonil; Ravi, Santosh; Raghavendra, Cauligi; Prasanna, Viktor

2014-08-25

Large-scale graph processing is a major research area for Big Data exploration. Vertex-centric programming models like Pregel are gaining traction due to their simple abstraction, which naturally allows scalable execution on distributed systems. However, this approach has limitations that cause vertex-centric algorithms to under-perform: a poor compute-to-communication overhead ratio and slow convergence of iterative supersteps. In this paper, we introduce GoFFish, a scalable sub-graph centric framework co-designed with a distributed persistent graph storage for large-scale graph analytics on commodity clusters. We introduce a sub-graph centric programming abstraction that combines the scalability of a vertex-centric approach with the flexibility of shared-memory sub-graph computation. We map Connected Components, SSSP and PageRank algorithms to this model to illustrate its flexibility. Further, we empirically analyze GoFFish using several real-world graphs and demonstrate a significant performance improvement, by orders of magnitude in some cases, compared to Apache Giraph, the leading open-source vertex-centric implementation.
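The sub-graph-centric idea can be illustrated with connected components, one of the algorithms mapped to the model. In this hedged sketch, each partition first finds its local connected sub-graphs with an ordinary shared-memory BFS, and a second phase merges sub-graphs joined by cut edges; a union-find pass stands in for the message exchange across supersteps. This is not GoFFish's programming interface; the function names are hypothetical and vertex ids are assumed to be comparable (e.g. integers).

```python
from collections import deque


def local_subgraphs(vertices, edges):
    """BFS inside one partition using only edges with both ends in it.

    Returns {vertex: sub-graph id}, the id being the BFS root vertex."""
    adj = {v: [] for v in vertices}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].append(b)
            adj[b].append(a)
    comp = {}
    for root in sorted(vertices):
        if root in comp:
            continue
        comp[root] = root
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp[w] = root
                    queue.append(w)
    return comp


def connected_components(partitions, edges):
    # Phase 1: label sub-graphs independently in every partition.
    label = {}
    for part in partitions:
        label.update(local_subgraphs(part, edges))
    # Phase 2: merge sub-graphs connected by edges (only cut edges can merge).
    parent = {s: s for s in set(label.values())}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]      # path halving
            s = parent[s]
        return s

    for a, b in edges:
        ra, rb = find(label[a]), find(label[b])
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    return {v: find(s) for v, s in label.items()}
```

The point of the design is that phase 2 works on whole sub-graphs rather than individual vertices, so a large connected region inside one partition costs a single merge step instead of many vertex-level supersteps.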
Efficient process migration in the EMPS multiprocessor system

NARCIS (Netherlands)

van Dijk, G.J.W.; Gils, van M.J.

1992-01-01

The process migration facility in the Eindhoven multiprocessor system (EMPS) is presented. In the EMPS system, mailboxes are used for interprocess communication. These mailboxes provide transparency of location for communicating processes. The major advantages of mailbox communication in the EMPS
Parallel SN algorithms in shared- and distributed-memory environments

International Nuclear Information System (INIS)

Haghighat, Alireza; Hunter, Melissa A.; Mattis, Ronald E.

1995-01-01

Several 2-D spatial domain partitioning SN transport theory algorithms have been developed on the basis of the Block-Jacobi iterative scheme. These algorithms have been incorporated into TWOTRAN-II and tested on a shared-memory CRAY Y-MP C90 and a distributed-memory IBM SP1. For a series of fixed-source r-z geometry homogeneous problems, parallel efficiencies in the range of 50-90% are achieved on the C90 with 6 processors, and lower values (20-60%) are obtained on the SP1. It is demonstrated that better performance is attainable if one addresses issues such as convergence rate, load balancing, and granularity for both architectures, as well as message passing (network bandwidth and latency) for the SP1. (author). 17 refs, 4 figs
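The Block-Jacobi idea behind the spatial decomposition can be shown on a much simpler model problem. The sketch below is a stand-in under explicit assumptions, not an SN transport sweep: it solves the 1-D system tridiag(-1, 2, -1) u = f by splitting the unknowns into contiguous blocks (subdomains), solving each block exactly with the Thomas algorithm while using the previous iterate's values from neighbouring blocks, and repeating. Because every block uses only old neighbour data, the block solves within one iteration are independent and could run on separate processors. It also includes the usual parallel efficiency E = T1 / (p · Tp). Function names are hypothetical.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system; a = sub-, b = main, c = super-diagonal.

    a[0] and c[-1] do not affect the result."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def block_jacobi(f, n_blocks, iters):
    """Block-Jacobi for tridiag(-1, 2, -1) u = f with zero Dirichlet ends.

    Assumes len(f) is divisible by n_blocks."""
    n = len(f)
    size = n // n_blocks
    u = [0.0] * n
    for _ in range(iters):
        new = [0.0] * n
        for blk in range(n_blocks):          # independent: parallelizable
            lo, hi = blk * size, (blk + 1) * size
            rhs = f[lo:hi]                   # slice is a copy
            if lo > 0:
                rhs[0] += u[lo - 1]          # old value from left neighbour
            if hi < n:
                rhs[-1] += u[hi]             # old value from right neighbour
            m = hi - lo
            new[lo:hi] = thomas([-1.0] * m, [2.0] * m, [-1.0] * m, rhs)
        u = new
    return u


def parallel_efficiency(t_serial, t_parallel, procs):
    """E = T1 / (p * Tp)."""
    return t_serial / (procs * t_parallel)
```

The trade-off the abstract alludes to is visible here: more blocks expose more parallelism, but each block sees its neighbours only through lagged values, so the number of iterations needed to converge grows.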
A new shared-memory programming paradigm for molecular dynamics simulations on the Intel Paragon

International Nuclear Information System (INIS)

D'Azevedo, E.F.; Romine, C.H.

1994-12-01

This report describes the use of shared memory emulation with DOLIB (Distributed Object Library) to simplify parallel programming on the Intel Paragon. A molecular dynamics application is used as an example to illustrate the use of the DOLIB shared memory library. SOTON-PAR, a parallel molecular dynamics code with explicit message passing that uses a Lennard-Jones 6-12 potential, is rewritten using DOLIB primitives. The resulting code has no explicit message-passing primitives and resembles a serial code. The new code can perform dynamic load balancing and achieves better performance than the original parallel code with explicit message passing.
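The Lennard-Jones 6-12 potential named above is V(r) = 4ε[(σ/r)^12 - (σ/r)^6], with pair force F(r) = -dV/dr = (24ε/r)[2(σ/r)^12 - (σ/r)^6]. The serial Python sketch below evaluates both and a cut-off total energy; the function names and the 2.5σ cutoff are illustrative assumptions, and DOLIB's shared-array access is not modelled.

```python
import math


def lj_potential(r, epsilon=1.0, sigma=1.0):
    """V(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """F(r) = -dV/dr; positive values are repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def total_energy(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Sum of pair energies for pairs closer than cutoff * sigma."""
    energy = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(positions[i], positions[j])
            if r < cutoff * sigma:
                energy += lj_potential(r, epsilon, sigma)
    return energy
```

The potential crosses zero at r = σ and has its minimum of -ε at r = 2^(1/6)σ, where the force vanishes; these two points are a quick sanity check for any implementation.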
An Alternative Algorithm for Computing Watersheds on Shared Memory Parallel Computers

NARCIS (Netherlands)

Meijster, A.; Roerdink, J.B.T.M.

1995-01-01

In this paper a parallel implementation of a watershed algorithm is proposed. The algorithm can easily be implemented on shared memory parallel computers. The watershed transform is generally considered to be inherently sequential since the discrete watershed of an image is defined using recursion.
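To see why the transform is usually treated as sequential, here is a minimal 1-D flooding watershed in the spirit of priority-queue (Meyer-style) algorithms; it is not the authors' parallel method. Basins grow from strict local minima in order of increasing height, and a sample reached by two different basins becomes a watershed point (label 0). Each label depends on labels assigned earlier at lower levels, which is the ordering dependence a parallel algorithm has to work around. The function name and the restriction to strict minima are assumptions of the sketch.

```python
import heapq


def watershed_1d(signal):
    """Flood from strict local minima; returns basin labels (1, 2, ...), 0 = watershed.

    Samples unreachable from any strict minimum (e.g. a flat signal) stay None."""
    n = len(signal)
    labels = [None] * n
    next_label = 1
    for i in range(n):
        left = signal[i - 1] if i > 0 else float("inf")
        right = signal[i + 1] if i < n - 1 else float("inf")
        if signal[i] < left and signal[i] < right:
            labels[i] = next_label
            next_label += 1
    heap = []
    for i in range(n):
        if labels[i] is not None:
            for j in (i - 1, i + 1):
                if 0 <= j < n and labels[j] is None:
                    heapq.heappush(heap, (signal[j], j))
    while heap:
        _, j = heapq.heappop(heap)
        if labels[j] is not None:              # already flooded via another path
            continue
        basins = {labels[k] for k in (j - 1, j + 1)
                  if 0 <= k < n and labels[k]}  # labelled, non-watershed neighbours
        labels[j] = basins.pop() if len(basins) == 1 else 0
        if labels[j]:
            for k in (j - 1, j + 1):
                if 0 <= k < n and labels[k] is None:
                    heapq.heappush(heap, (signal[k], k))
    return labels
```

The priority queue enforces a global processing order by height, so two distant samples cannot simply be labelled independently; that global ordering is the source of the "inherently sequential" reputation.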

Report of the Workshop on Petascale Systems Integration for Large-Scale Facilities

Energy Technology Data Exchange (ETDEWEB)

Kramer, William T.C.; Walter, Howard; New, Gary; Engle, Tom; Pennington, Rob; Comes, Brad; Bland, Buddy; Tomlison, Bob; Kasdorf, Jim; Skinner, David; Regimbal, Kevin

2007-10-01

There are significant issues regarding large-scale system integration that are not being addressed in other forums, such as current research portfolios or vendor user groups. Unfortunately, issues in the area of large-scale system integration often fall into a netherworld: not research, not facilities, not procurement, not operations, not user services. Taken together, these issues, along with the impact of sub-optimal integration technology, mean that the time required to deploy, integrate and stabilize a large-scale system may consume up to 20 percent of its useful life. Improving the state of the art in large-scale systems integration has the potential to increase the scientific productivity of these systems. Sites have significant expertise, but there are no easy ways to leverage this expertise among them. Many issues inhibit the sharing of information, including available time and effort, as well as issues with sharing proprietary information. Vendors also benefit in the long run from solutions to issues detected during site testing and integration. There is a great deal of enthusiasm for making large-scale system integration a full-fledged partner, along with the other major thrusts supported by funding agencies, in the definition, design, and use of petascale systems. Integration technology and issues should have a full 'seat at the table' as petascale and exascale initiatives and programs are planned. The workshop attendees identified a wide range of issues and suggested paths forward. Pursuing these with funding opportunities and innovation offers the opportunity to dramatically improve the state of large-scale system integration.
Performance of large-scale scientific applications on the IBM ASCI Blue-Pacific system

International Nuclear Information System (INIS)

Mirin, A.

1998-01-01

The IBM ASCI Blue-Pacific System is a scalable, distributed/shared memory architecture designed to reach multi-teraflop performance. The IBM SP pieces together a large number of nodes, each having a modest number of processors. The system is designed to accommodate a mixed programming model as well as a pure message-passing paradigm. We examine a number of applications on this architecture and evaluate their performance and scalability.
Cache-aware network-on-chip for chip multiprocessors

Science.gov (United States)

Tatas, Konstantinos; Kyriacou, Costas; Dekoulis, George; Demetriou, Demetris; Avraam, Costas; Christou, Anastasia

2009-05-01

This paper presents the hardware prototype of a Network-on-Chip (NoC) for a chip multiprocessor that provides support for cache coherence, cache prefetching and cache-aware thread scheduling. A NoC with support for these cache-related mechanisms can help improve system performance by reducing the cache miss ratio. The presented multi-core system employs the Data-Driven Multithreading (DDM) model of execution. In DDM, thread scheduling is done according to data availability, so the system is aware of the threads to be executed in the near future. This characteristic of the DDM model allows for cache-aware thread scheduling and cache prefetching. The NoC prototype is a crossbar switch with output buffering that can support a cache-aware 4-node chip multiprocessor. The prototype is built on the Xilinx ML506 board equipped with a Xilinx Virtex-5 FPGA.
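The data-availability rule described above can be sketched in a few lines. This is a minimal software illustration of a ready-count scheduler and not the DDM hardware's interface. The names `data_driven_order`, `consumers`, and `producer_counts` are hypothetical. The point of interest is the moment a thread enters the ready queue, because that is when a cache-aware system could start prefetching.

```python
from collections import deque

def data_driven_order(consumers, producer_counts):
    """
    Data-driven scheduling sketch: a thread becomes ready only when all of its
    producer threads have finished (its 'ready count' drops to zero).
    consumers:       thread -> list of threads that consume its results
    producer_counts: thread -> number of producer threads it waits for
    Returns the order in which threads would be dispatched.
    """
    counts = dict(producer_counts)
    ready = deque(t for t, n in counts.items() if n == 0)
    order = []
    while ready:
        thread = ready.popleft()
        order.append(thread)
        for succ in consumers.get(thread, []):
            counts[succ] -= 1
            if counts[succ] == 0:
                # The scheduler now knows 'succ' will run soon; a cache-aware
                # system could prefetch succ's input data at this point.
                ready.append(succ)
    return order
```

For a graph where A and B both feed C, and C feeds D, the dispatch order is `["A", "B", "C", "D"]`.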
A generic library for large scale solution of PDEs on modern heterogeneous architectures

DEFF Research Database (Denmark)

Glimberg, Stefan Lemvig; Engsig-Karup, Allan Peter

2012-01-01

Adapting to new programming models for modern multi- and many-core architectures requires rewriting code and changing algorithms and data structures in order to achieve good efficiency and scalability. We present a generic library for solving large-scale partial differential equations (PDEs …), capable of utilizing heterogeneous CPU/GPU environments. The library can be used for fast prototyping of PDE solvers based on finite difference approximations of spatial derivatives in one, two, or three dimensions. In order to efficiently solve large-scale problems, we keep memory consumption … and memory access low, using a low-storage implementation of flexible-order finite difference operators. We will illustrate the use of library components by assembling such matrix-free operators to be used with one of the supported iterative solvers, such as GMRES, CG, Multigrid or Defect Correction...
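The idea of a matrix-free operator passed to an iterative solver can be sketched as follows. This is a minimal Python illustration and not the library's C++/GPU API. It assumes a fixed second-order 3-point stencil in 1D instead of the library's flexible-order operators, and plain unpreconditioned CG. The stencil is applied as a function, so no matrix is ever stored.

```python
def apply_neg_laplacian(u, h):
    """Matrix-free -d2/dx2 with the second-order 3-point stencil and zero Dirichlet
    boundary values; u holds interior nodes only. No matrix is ever stored."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG; apply_a is any callable computing A x for an SPD operator A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = apply_a(p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Solving -u'' = 2 on (0, 1) with u(0) = u(1) = 0 and h = 0.1 (nine interior nodes) via `conjugate_gradient(lambda v: apply_neg_laplacian(v, 0.1), [2.0] * 9)` reproduces u = x(1 - x) at the nodes, since the 3-point stencil is exact for quadratics.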
Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

Directory of Open Access Journals (Sweden)

Daniel Pierre Bovet

2008-01-01

Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever-increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high-responsiveness, open-source hard real-time operating system for multiprocessor systems, capable of providing high real-time performance while keeping the code simple and without affecting the performance of the rest of the system. Moreover, ASMP-LINUX does not require code changes or application recompiling/relinking. To assess the performance of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.
Hard Real-Time Performances in Multiprocessor-Embedded Systems Using ASMP-Linux

Directory of Open Access Journals (Sweden)

Betti Emiliano

2008-01-01

Multiprocessor systems, especially those based on multicore or multithreaded processors, and new operating system architectures can satisfy the ever-increasing computational requirements of embedded systems. ASMP-LINUX is a modified, high-responsiveness, open-source hard real-time operating system for multiprocessor systems, capable of providing high real-time performance while keeping the code simple and without affecting the performance of the rest of the system. Moreover, ASMP-LINUX does not require code changes or application recompiling/relinking. To assess the performance of ASMP-LINUX, benchmarks have been performed on several hardware platforms and configurations.
Parallel discrete event simulation using shared memory

Science.gov (United States)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1988-01-01

With traditional event-list techniques, evaluating a detailed discrete-event simulation model can often require hours or even days of computation time. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared-memory experiments is presented that uses the Chandy-Misra distributed-simulation algorithm to simulate networks of queues. Parameters of the study include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
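The causality rule at the heart of the Chandy-Misra approach can be sketched for a single logical process (LP). This is a minimal sequential illustration and not the experimental code from the study. The function name, the channel representation, and the use of `None` as the null-message payload are assumptions. An LP may consume an event only while every input channel has a pending message. When it blocks, it sends a null message carrying the timestamp `clock + lookahead`, which lets downstream LPs advance and avoids deadlock.

```python
from collections import deque

NULL = None  # payload of a null message: carries a timestamp promise but no work

def conservative_merge(channels, clock, lookahead):
    """
    One blocking step of a Chandy-Misra logical process (LP).
    channels:  dict channel name -> deque of (timestamp, payload); each channel
               delivers messages in non-decreasing timestamp order.
    clock:     the LP's current simulation time.
    lookahead: minimum delay the LP adds before any output it sends.
    Returns (real_events_processed, new_clock, null_message_timestamp).
    """
    processed = []
    # An event is safe only while every input channel has a pending message:
    # otherwise an empty channel could still deliver something earlier.
    while all(channels[name] for name in channels):
        name = min(channels, key=lambda n: channels[n][0][0])
        timestamp, payload = channels[name].popleft()
        clock = timestamp
        if payload is not NULL:
            processed.append((timestamp, payload))
    # Blocked: promise downstream LPs no message earlier than clock + lookahead.
    return processed, clock, clock + lookahead
```

With channel `a` holding events at times 1.0 and 4.0 and channel `b` holding an event at 2.0 followed by a null message at 3.0, the LP processes the events at 1.0 and 2.0. It then advances to 3.0 and must leave the 4.0 event queued until `b` promises more.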
Memory-Optimized Software Synthesis from Dataflow Program Graphs with Large Size Data Samples

Directory of Open Access Journals (Sweden)

Hyunok Oh

2003-05-01

In multimedia and graphics applications, data samples of nonprimitive type require a significant amount of buffer memory. This paper addresses the problem of minimizing the buffer memory requirement for such applications in embedded software synthesis from graphical dataflow programs based on the synchronous dataflow (SDF) model with a given execution order of nodes. We propose a memory minimization technique that separates global memory buffers from local pointer buffers: the global buffers store live data samples and the local buffers store pointers to the global buffer entries. Compared with unshared versions, the proposed algorithm reduces memory by 67% for a JPEG encoder and by 40% for an H.263 encoder; for the H.263 encoder it also reduces memory by 22% compared with the previous sharing algorithm. Through extensive buffer sharing optimization, we believe that automatic software synthesis from dataflow program graphs achieves code quality comparable to manually optimized code in terms of memory requirement.
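The separation of global sample buffers from local pointer buffers can be illustrated at run time. The paper performs this allocation at synthesis time, so this is only a minimal sketch, and the class and the schedule A A B B C C on the chain A -> B -> C are hypothetical. The sketch also assumes actor B may release its input slot before writing its output, i.e., it can reuse that storage.

```python
from collections import deque

class GlobalBufferPool:
    """Global buffers hold the (large) data samples; dataflow edges keep only
    small integer pointers into this pool, so freed slots can be reused by any edge."""

    def __init__(self):
        self.slots = {}      # slot index -> live sample
        self.free = []       # indices of released slots, available for reuse
        self.allocated = 0   # number of distinct global slots ever created
        self.peak_live = 0

    def put(self, sample):
        if self.free:
            slot = self.free.pop()
        else:
            slot = self.allocated
            self.allocated += 1
        self.slots[slot] = sample
        self.peak_live = max(self.peak_live, len(self.slots))
        return slot

    def take(self, slot):
        sample = self.slots.pop(slot)
        self.free.append(slot)
        return sample


def run_chain():
    """Hypothetical schedule A A B B C C on the chain A -> B -> C."""
    pool = GlobalBufferPool()
    ab, bc = deque(), deque()            # local pointer buffers, one per edge
    for frame in ("f0", "f1"):           # A fires twice
        ab.append(pool.put(frame))
    for _ in range(2):                   # B fires twice
        sample = pool.take(ab.popleft())
        bc.append(pool.put(sample + "'"))
    outputs = [pool.take(bc.popleft()) for _ in range(2)]  # C fires twice
    return pool, outputs
```

In this schedule only two global sample buffers are ever allocated. Separate per-edge sample buffers would need four: two on edge A -> B and two on edge B -> C.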
Analytical derivation of traffic patterns in cache-coherent shared-memory systems

DEFF Research Database (Denmark)

Stuart, Matthias Bo; Sparsø, Jens

2011-01-01

This paper presents an analytical method to derive the worst-case traffic pattern caused by a task graph mapped to a cache-coherent shared-memory system. Our analysis allows designers to rapidly evaluate the impact of different mappings of tasks to IP cores on the traffic pattern. The accuracy...
Comparing vector-based and Bayesian memory models using large-scale datasets: User-generated hashtag and tag prediction on Twitter and Stack Overflow.

Science.gov (United States)

Stanley, Clayton; Byrne, Michael D

2016-12-01

The growth of social media and user-created content on online sites provides unique opportunities to study models of human declarative memory. By framing the task of choosing a hashtag for a tweet and tagging a post on Stack Overflow as a declarative memory retrieval problem, two cognitively plausible declarative memory models were applied to millions of posts and tweets and evaluated on how accurately they predict a user's chosen tags. An ACT-R based Bayesian model and a random permutation vector-based model were tested on the large data sets. The results show that past user behavior of tag use is a strong predictor of future behavior. Furthermore, past behavior was successfully incorporated into the random permutation model that previously used only context. Also, ACT-R's attentional weight term was linked to an entropy-weighting natural language processing method used to attenuate high-frequency words (e.g., articles and prepositions). Word order was not found to be a strong predictor of tag use, and the random permutation model performed comparably to the Bayesian model without including word order. This shows that the strength of the random permutation model is not in the ability to represent word order, but rather in the way in which context information is successfully compressed. The results of the large-scale exploration show how the architecture of the two memory models can be modified to significantly improve accuracy, and may suggest task-independent general modifications that can help improve model fit to human data in a much wider range of domains. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
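One standard way ACT-R turns past use into a prediction is the base-level learning equation B = ln(Σ_j t_j^(-d)), where t_j is the time since the j-th past use and d is a decay rate, conventionally 0.5. The sketch below applies that textbook equation to rank a user's tags. It is an assumption for illustration: the abstract does not state the exact form used in the study's Bayesian model, and the helper names are hypothetical.

```python
import math

def base_level_activation(ages, decay=0.5):
    """ACT-R base-level learning: B = ln(sum_j t_j ** -d), where t_j > 0 is the time
    since the j-th past use of an item (e.g. a tag) and d is the decay rate."""
    return math.log(sum(t ** -decay for t in ages))

def rank_tags(history, decay=0.5):
    """history: tag -> list of ages (time units since each past use); most active first."""
    return sorted(history, key=lambda tag: base_level_activation(history[tag], decay), reverse=True)
```

Under this equation, a tag used recently and often, such as `{"python": [1.0, 2.0]}`, outranks one used once long ago, such as `{"java": [50.0]}`.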
Operating system for a real-time multiprocessor propulsion system simulator

Science.gov (United States)

Cole, G. L.

1984-01-01

The success of the Real Time Multiprocessor Operating System (RTMPOS) in the development and evaluation of experimental hardware and software systems for real-time interactive simulation of air-breathing propulsion systems was evaluated. The RTMPOS provides the user with a versatile, interactive means for loading, running, debugging and obtaining results from a multiprocessor-based simulator. A front-end processor (FEP) serves as the simulator controller and as the interface between the user and the simulator. These functions are facilitated by the RTMPOS, which resides on the FEP. The RTMPOS acts in conjunction with the FEP's manufacturer-supplied disk operating system, which provides typical utilities such as an assembler, linkage editor, text editor and file handling services. Once a simulation is formulated, the RTMPOS provides for engineering-level, run-time operations such as loading, modifying and specifying the computation flow of programs, simulator mode control, data handling and run-time monitoring. Run-time monitoring is a powerful feature of RTMPOS that allows the user to record all actions taken during a simulation session and to receive advisories from the simulator via the FEP. The RTMPOS is programmed mainly in PASCAL, along with some assembly language routines. The RTMPOS software is easily modified to be applicable to hardware from different manufacturers.
Information and processes underlying semantic and episodic memory across tasks, items, and individuals.

Science.gov (United States)

Cox, Gregory E; Hemmer, Pernille; Aue, William R; Criss, Amy H

2018-04-01

The development of memory theory has been constrained by a focus on isolated tasks rather than the processes and information that are common to situations in which memory is engaged. We present results from a study in which 453 participants took part in five different memory tasks: single-item recognition, associative recognition, cued recall, free recall, and lexical decision. Using hierarchical Bayesian techniques, we jointly analyzed the correlations between tasks within individuals (reflecting the degree to which tasks rely on shared cognitive processes) and within items (reflecting the degree to which tasks rely on the same information conveyed by the item). Among other things, we find that (a) the processes involved in lexical access and episodic memory are largely separate and rely on different kinds of information, (b) access to lexical memory is driven primarily by perceptual aspects of a word, (c) all episodic memory tasks rely to an extent on a set of shared processes which make use of semantic features to encode both single words and associations between words, and (d) recall involves additional processes likely related to contextual cuing and response production. These results provide a large-scale picture of memory across different tasks which can serve to drive the development of comprehensive theories of memory. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Parallel implementations of 2D explicit Euler solvers

International Nuclear Information System (INIS)

Giraud, L.; Manzini, G.

1996-01-01

In this work we present a subdomain partitioning strategy applied to an explicit high-resolution Euler solver. We describe the design of a portable parallel multi-domain code suitable for parallel environments. We present several implementations on a representative range of MIMD computers that include shared memory multiprocessors, distributed virtual shared memory computers, as well as networks of workstations. Computational results are given to illustrate the efficiency, the scalability, and the limitations of the different approaches. We also discuss the effect of the communication protocol on the optimal domain partitioning strategy for the distributed memory computers.
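Subdomain partitioning for an explicit scheme hinges on a halo (ghost-cell) exchange between neighbouring subdomains. The sketch below shows that exchange for a 1D periodic grid. As a stand-in for the 2D Euler equations it assumes the first-order upwind scheme for linear advection, u_t + a u_x = 0 with a > 0, so each cell needs only its left neighbour. It is an illustration, not the paper's multi-domain code. Because each subdomain receives one ghost cell from its left neighbour, the decomposed update matches the global one exactly.

```python
def upwind_step(u, c):
    """Global explicit first-order upwind step for u_t + a*u_x = 0 with a > 0,
    periodic boundaries, Courant number c = a*dt/dx (stable for 0 <= c <= 1)."""
    return [u[i] - c * (u[i] - u[i - 1]) for i in range(len(u))]  # u[-1] wraps around


def split(u, parts):
    """Partition the grid into equal contiguous subdomains (len(u) must divide evenly)."""
    size = len(u) // parts
    return [u[k * size:(k + 1) * size] for k in range(parts)]


def decomposed_step(subdomains, c):
    """Same update, computed subdomain by subdomain. Each subdomain needs one
    ghost cell: the last cell of its left neighbour (index k - 1 wraps periodically)."""
    new = []
    for k, local in enumerate(subdomains):
        padded = [subdomains[k - 1][-1]] + local  # halo exchange with the left neighbour
        new.append([padded[i] - c * (padded[i] - padded[i - 1]) for i in range(1, len(padded))])
    return new
```

On a distributed-memory machine, each `subdomains[k - 1][-1]` read would become a message from the neighbouring processor. The size and frequency of these messages is where the communication protocol affects the choice of partitioning.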
Chip-Multiprocessor Hardware Locks for Safety-Critical Java

DEFF Research Database (Denmark)

Strøm, Torur Biskopstø; Puffitsch, Wolfgang; Schoeberl, Martin

2013-01-01

and may void a task set's schedulability. In this paper we present a hardware locking mechanism to reduce the synchronization overhead. The solution is implemented for the chip-multiprocessor version of the Java Optimized Processor in the context of safety-critical Java. The implementation is compared...
Optimization and parallelization of B-spline based orbital evaluations in QMC on multi/many-core shared memory processors

OpenAIRE

Mathuriya, Amrita; Luo, Ye; Benali, Anouar; Shulenburger, Luke; Kim, Jeongnim

2016-01-01

B-spline based orbital representations are widely used in Quantum Monte Carlo (QMC) simulations of solids, historically taking as much as 50% of the total run time. Random accesses to a large four-dimensional array make it challenging to efficiently utilize caches and wide vector units of modern CPUs. We present node-level optimizations of B-spline evaluations on multi/many-core shared memory processors. To increase SIMD efficiency and bandwidth utilization, we first apply data layout transfo...
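The access pattern behind these optimizations can be sketched in 1D. All orbitals evaluated at one point share the same four cubic B-spline weights. Storing the orbital index innermost, a structure-of-arrays style layout, therefore turns the inner loop into a contiguous, vectorizable sweep. This is a minimal sketch under assumptions: the production code works on a 3D grid (64 coefficients per point), and the function names and the segment convention (x = i + t uses coefficients i..i+3) are illustrative.

```python
def cubic_bspline_weights(t):
    """Uniform cubic B-spline basis weights for fractional position t in [0, 1)."""
    t2, t3 = t * t, t * t * t
    return [
        (1 - t) ** 3 / 6.0,
        (3 * t3 - 6 * t2 + 4) / 6.0,
        (-3 * t3 + 3 * t2 + 3 * t + 1) / 6.0,
        t3 / 6.0,
    ]


def evaluate_orbitals(coefs, x):
    """
    coefs[g][n]: spline coefficient of orbital n at grid point g (orbital index
    innermost, so the inner loop walks contiguous memory and can be vectorized).
    x: position in grid units; requires 0 <= x and int(x) + 3 < len(coefs).
    """
    i = int(x)
    w = cubic_bspline_weights(x - i)
    n_orb = len(coefs[0])
    values = [0.0] * n_orb
    for k in range(4):
        row = coefs[i + k]
        for n in range(n_orb):  # the same weight w[k] is applied across all orbitals
            values[n] += w[k] * row[n]
    return values
```

The four weights always sum to 1, so constant coefficients per orbital reproduce those constants at any position.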
Recommending the heterogeneous cluster type multi-processor system computing

International Nuclear Information System (INIS)

Iijima, Nobukazu

2010-01-01

A real-time reactor simulator had been developed by reusing the equipment of the Musashi reactor, and improving its performance became indispensable for its use as a research tool, in order to increase the sampling rate through the introduction of arithmetic units using a multi-Digital Signal Processor (DSP) system (cluster). To realize heterogeneous cluster-type multi-processor system computing, a combination of two kinds of Control Processors (CPs), a Cluster Control Processor (CCP) and a System Control Processor (SCP), was proposed, with a Large System Control Processor (LSCP) for hierarchical clusters if needed. The faster computing performance of this system was confirmed by simulation results for simultaneous execution of multiple jobs and for pipeline processing between clusters, which showed that the system makes effective use of the existing equipment and improves cost performance. (T. Tanaka)
The fast Amsterdam multiprocessor (FAMP) system hardware

International Nuclear Information System (INIS)

Hertzberger, L.O.; Kieft, G.; Kisielewski, B.; Wiggers, L.W.; Engster, C.; Koningsveld, L. van

1981-01-01

The architecture of a multiprocessor system is described that will be used for on-line filter and second stage trigger applications. The system is based on the MC 68000 microprocessor from Motorola. Emphasis is placed on hardware aspects, in particular modularity, processor communication and interfacing, whereas the system software and the applications will be described in separate articles. (orig.)
Robust large-scale parallel nonlinear solvers for simulations.

Energy Technology Data Exchange (ETDEWEB)

Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson (Sandia National Laboratories, Livermore, CA)

2005-11-01

This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using models other than Newton's: a lower-order model, Broyden's method, and a higher-order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, the Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and compute a step from a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any
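The key property of Broyden's method, converging without ever evaluating the Jacobian, can be seen in a small dense sketch. This is not the report's limited-memory large-scale implementation. It is the textbook "good" Broyden update applied to an inverse-Jacobian approximation H via the Sherman-Morrison formula: H ← H + (s − H y) sᵀH / (sᵀ H y). It assumes H starts as the identity, which is reasonable only when the Jacobian is close to I near the starting point.

```python
def broyden_solve(f, x0, tol=1e-10, max_iter=50):
    """Solve F(x) = 0 with Broyden's 'good' method, updating an inverse-Jacobian
    approximation H by the Sherman-Morrison formula (no Jacobian evaluations).
    Assumption: H starts as the identity, i.e. the Jacobian is close to I near x0."""
    n = len(x0)
    x = list(x0)
    fx = f(x)
    h = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        if max(abs(v) for v in fx) < tol:
            return x
        step = [-sum(h[i][j] * fx[j] for j in range(n)) for i in range(n)]  # s = -H F(x)
        x_new = [xi + si for xi, si in zip(x, step)]
        f_new = f(x_new)
        y = [a - b for a, b in zip(f_new, fx)]                              # y = F(x_new) - F(x)
        hy = [sum(h[i][j] * y[j] for j in range(n)) for i in range(n)]     # H y
        s_h = [sum(step[i] * h[i][j] for i in range(n)) for j in range(n)]  # s^T H
        denom = sum(si * v for si, v in zip(step, hy))                      # s^T H y
        if denom == 0.0:
            raise ZeroDivisionError("Broyden update is undefined (s^T H y = 0)")
        u = [si - v for si, v in zip(step, hy)]
        h = [[h[i][j] + u[i] * s_h[j] / denom for j in range(n)] for i in range(n)]
        x, fx = x_new, f_new
    raise RuntimeError("Broyden iteration did not converge")
```

For the mildly nonlinear system x0 + 0.1·x1² = 1, x1 + 0.1·x0² = 1, starting from (0, 0), the iteration converges to x0 = x1 = (√1.4 − 1)/0.2 ≈ 0.9161.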
Shared memory parallelism for 3D cartesian discrete ordinates solver

International Nuclear Information System (INIS)

Moustafa, S.; Dutka-Malen, I.; Plagne, L.; Poncot, A.; Ramet, P.

2013-01-01

This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multi-core + SIMD, Single Instruction Multiple Data) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, which usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full-core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46×10^6 spatial cells and 1×10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool. (authors)
Event parallelism: Distributed memory parallel computing for high energy physics experiments

International Nuclear Information System (INIS)

Nash, T.

1989-05-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXes. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. 6 figs
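Event parallelism works because each recorded event can be processed independently, so a farm simply hands events to whichever worker is free. The sketch below uses Python threads in a single process to stand in for farm nodes. This is an assumption for illustration only: the farms described here are separate processors connected by a bus or point-to-point links. The `reconstruct` function and the event format are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def reconstruct(event):
    """Hypothetical per-event work: events share no state, so any worker can take any event."""
    return {"id": event["id"], "energy": sum(event["hits"])}

def process_farm(events, workers=4):
    """Distribute independent events over a pool of workers; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(reconstruct, events))
```

Because no event depends on another, throughput scales with the number of workers until input/output or communication becomes the bottleneck. That bottleneck is what motivates the shift from bussed to point-to-point communication described above.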

Event parallelism: Distributed memory parallel computing for high energy physics experiments

International Nuclear Information System (INIS)

Nash, T.

1989-01-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXes. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described. (orig.)
Event parallelism: Distributed memory parallel computing for high energy physics experiments

Science.gov (United States)

Nash, Thomas

1989-12-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXes. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC systems, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
Large capacity temporary visual memory

Science.gov (United States)

Endress, Ansgar D.; Potter, Mary C.

2014-01-01

Visual working memory (WM) capacity is thought to be limited to three or four items. However, many cognitive activities seem to require larger temporary memory stores. Here, we provide evidence for a temporary memory store with much larger capacity than past WM capacity estimates. Further, based on previous WM research, we show that a single factor — proactive interference — is sufficient to bring capacity estimates down to the range of previous WM capacity estimates. Participants saw a rapid serial visual presentation (RSVP) of 5 to 21 pictures of familiar objects or words presented at rates of 4/s or 8/s, respectively, and thus too fast for strategies such as rehearsal. Recognition memory was tested with a single probe item. When new items were used on all trials, no fixed memory capacities were observed, with estimates of up to 9.1 retained pictures for 21-item lists, and up to 30.0 retained pictures for 100-item lists, and no clear upper bound to how many items could be retained. Further, memory items were not stored in a temporally stable form of memory, but decayed almost completely after a few minutes. In contrast, when, as in most WM experiments, a small set of items was reused across all trials, thus creating proactive interference among items, capacity remained in the range reported in previous WM experiments. These results show that humans have a large-capacity temporary memory store in the absence of proactive interference, and raise the question of whether temporary memory in everyday cognitive processing is severely limited as in WM experiments, or has the much larger capacity found in the present experiments. PMID:23937181
Task-FIFO co-scheduling of streaming applications on MPSoCs with predictable memory hierarchy

NARCIS (Netherlands)

Tang, Q.; Basten, A.A.; Geilen, M.C.W.; Stuijk, S.; Wei, Ji-Bo

This article studies the scheduling of real-time streaming applications on multiprocessor systems-on-chips with predictable memory hierarchy. An iteration-based task-FIFO co-scheduling framework is proposed for this problem. We obtain FIFO size distributions using Pareto space searching, based on
Task-FIFO co-scheduling of streaming applications on MPSoCs with predictable memory hierarchy

NARCIS (Netherlands)

Tang, Q.; Basten, T.; Geilen, M.; Stuijk, S.; Wei, J.B.

2017-01-01

This article studies the scheduling of real-time streaming applications on multiprocessor systems-on-chips with predictable memory hierarchy. An iteration-based task-FIFO co-scheduling framework is proposed for this problem. We obtain FIFO size distributions using Pareto space searching, based on
Design exploration of emerging nano-scale non-volatile memory

CERN Document Server

Yu, Hao

2014-01-01

This book presents the latest techniques for characterization, modeling and design for nano-scale non-volatile memory (NVM) devices. Coverage focuses on fundamental NVM device fabrication and characterization, internal state identification of memristic dynamics with physics modeling, NVM circuit design, and hybrid NVM memory system design-space optimization. The authors discuss design methodologies for nano-scale NVM devices from a circuits/systems perspective, including the general foundations for the fundamental memristic dynamics in NVM devices. Coverage includes physical modeling, as well as the development of a platform to explore novel hybrid CMOS and NVM circuit and system design.
• Offers readers a systematic and comprehensive treatment of emerging nano-scale non-volatile memory (NVM) devices;
• Focuses on the internal state of NVM memristic dynamics, novel NVM readout and memory cell circuit design, and hybrid NVM memory system optimization;
• Provides both theoretical analysis and pr...
Large-Scale Functional Brain Network Abnormalities in Alzheimer’s Disease: Insights from Functional Neuroimaging

Directory of Open Access Journals (Sweden)

Bradford C. Dickerson

2009-01-01

Functional MRI (fMRI) studies of mild cognitive impairment (MCI) and Alzheimer’s disease (AD) have begun to reveal abnormalities in large-scale memory and cognitive brain networks. Since the medial temporal lobe (MTL) memory system is a site of very early pathology in AD, a number of studies have focused on this region of the brain. Yet it is clear that other regions of the large-scale episodic memory network are also affected early in the disease, and fMRI has begun to illuminate functional abnormalities in frontal, temporal, and parietal cortices in MCI and AD as well. Besides predictable hypoactivation of brain regions as they accrue pathology and undergo atrophy, there are also areas of hyperactivation in brain memory and cognitive circuits, possibly representing attempted compensatory activity. Recent fMRI data in MCI and AD are beginning to reveal relationships between abnormalities of functional activity in the MTL memory system and in functionally connected brain regions, such as the precuneus. Additional work with “resting state” fMRI data is illuminating functional-anatomic brain circuits and their disruption by disease. As this work continues to mature, it will likely contribute to our understanding of fundamental memory processes in the human brain and how these are perturbed in memory disorders. We hope these insights will lead to the incorporation of measures of task-related brain function into diagnostic assessment or therapeutic monitoring, which may one day be useful for demonstrating beneficial effects of treatments being tested in clinical trials.
Rendering Large-Scale Terrain Models and Positioning Objects in Relation to 3D Terrain

National Research Council Canada - National Science Library

Hittner, Brian

2003-01-01

.... Rendering large scale landscapes based on 3D geometry generally did not occur because the scenes generated tended to use up too much system memory and overburden 3D graphics cards with too many polygons...
Large-scale modeling of epileptic seizures: scaling properties of two parallel neuronal network simulation algorithms.

Science.gov (United States)

Pesce, Lorenzo L; Lee, Hyong C; Hereld, Mark; Visser, Sid; Stevens, Rick L; Wildeman, Albert; van Drongelen, Wim

2013-01-01

Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation was very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms

Directory of Open Access Journals (Sweden)

Lorenzo L. Pesce

2013-01-01

Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation was very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers.
Active power reserves evaluation in large scale PVPPs

DEFF Research Database (Denmark)

Crăciun, Bogdan-Ionut; Kerekes, Tamas; Sera, Dezso

2013-01-01

The present trend of investing in renewable electricity generation to the detriment of conventional fossil fuel-based plants will lead to a point where renewable plants have to provide ancillary services and contribute to overall grid stability. Photovoltaic (PV) power has the fastest … growth among all renewable energies and has reached high penetration levels, creating instabilities which at the moment are corrected by conventional generation. This paradigm will change in future scenarios where most of the power is supplied by large-scale renewable plants and parts … of the ancillary services have to be shared by the renewable plants. The main focus of the proposed paper is to analyze, technically and economically, the possibility of having active power reserves in large-scale PV power plants (PVPPs) without any auxiliary storage equipment. The provided reserves should...
The development and validation of the Memory Support Rating Scale.

Science.gov (United States)

Lee, Jason Y; Worrell, Frank C; Harvey, Allison G

2016-06-01

Patient memory for treatment information is poor, and worse memory for treatment information is associated with poorer clinical outcomes. Memory support techniques have been harnessed to improve patient memory for treatment. However, a measure of memory support used by treatment providers during sessions has yet to be established. The present study reports on the development and psychometric properties of the Memory Support Rating Scale (MSRS)-an observer-rated scale designed to measure memory support. Adults with major depressive disorder (MDD; N = 42) were randomized to either cognitive therapy plus memory support (CT + MS; n = 22) or cognitive therapy as-usual (CT-as-usual; n = 20). At posttreatment, patients freely recalled treatment points via the patient recall task. Sessions (n = 171) were coded for memory support using the MSRS, 65% of which were also assessed for the quality of cognitive therapy via the Cognitive Therapy Rating Scale (CTRS). A unidimensional scale composed of 8 items was developed using exploratory factor analysis, though a larger sample is needed to further assess the factor structure of MSRS scores. High interrater and test-retest reliabilities of MSRS scores were observed across 7 MSRS coders. MSRS scores were higher in the CT + MS condition compared with CT-as-usual, demonstrating group differentiation ability. MSRS scores were positively associated with patient recall task scores but not associated with CTRS scores, demonstrating convergent and discriminant validity, respectively. Results indicate that the MSRS yields reliable and valid scores for measuring treatment providers' use of memory support while delivering cognitive therapy. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Synaptic scaling enables dynamically distinct short- and long-term memory formation.

Directory of Open Access Journals (Sweden)

Christian Tetzlaff

2013-10-01

Full Text Available Memory storage in the brain relies on mechanisms acting on time scales from minutes, for long-term synaptic potentiation, to days, for memory consolidation. During such processes, neural circuits distinguish synapses relevant for forming a long-term storage, which are consolidated, from synapses of short-term storage, which fade. How time scale integration and synaptic differentiation are simultaneously achieved remains unclear. Here we show that synaptic scaling, a slow process usually associated with the maintenance of activity homeostasis, combined with synaptic plasticity may simultaneously achieve both, thereby providing a natural separation of short- from long-term storage. The interaction between plasticity and scaling also provides an explanation for an established paradox where memory consolidation critically depends on the exact order of learning and recall. These results indicate that scaling may be fundamental for stabilizing memories, providing a dynamic link between early and late memory formation processes.
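To make the combination of plasticity and scaling concrete, here is a minimal single-synapse sketch. It assumes a rate-based rule of the form dw/dt = μ·u·v + γ·(v_T − v)·w², a Hebbian term plus a scaling term that pulls the postsynaptic rate v toward a target v_T. This rule, the linear neuron v = w·u, and all parameter values are assumptions for illustration; the paper's exact equations may differ.

```python
import math

def simulate_synapse(w0, u=1.0, v_target=0.5, mu=0.1, gamma=1.0, dt=0.01, steps=10_000):
    """Euler-integrate one synaptic weight under Hebbian plasticity plus synaptic scaling.

    Assumed (illustrative) rule: dw/dt = mu*u*v + gamma*(v_target - v)*w**2,
    with a linear neuron v = w*u.
    """
    w = w0
    for _ in range(steps):
        v = w * u                                  # postsynaptic rate (linear neuron, assumption)
        hebbian = mu * u * v                       # plasticity: grows with correlated activity
        scaling = gamma * (v_target - v) * w ** 2  # scaling: pushes v toward v_target
        w += dt * (hebbian + scaling)
    return w

def fixed_point(u=1.0, v_target=0.5, mu=0.1, gamma=1.0):
    """Positive root of gamma*u*w**2 - gamma*v_target*w - mu*u**2 = 0, i.e. dw/dt = 0 with w > 0."""
    a, b, c = gamma * u, -gamma * v_target, -mu * u ** 2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Starting from either a weak weight (`simulate_synapse(0.1)`) or a strong one (`simulate_synapse(1.5)`), the weight settles at the same nonzero value given by `fixed_point()`, a toy version of the stabilizing role the abstract attributes to scaling. Reproducing the separation of short- from long-term storage would require the paper's full network model.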
Synaptic scaling enables dynamically distinct short- and long-term memory formation.

Science.gov (United States)

Tetzlaff, Christian; Kolodziejski, Christoph; Timme, Marc; Tsodyks, Misha; Wörgötter, Florentin

2013-10-01

A Visual Approach to Investigating Shared and Global Memory Behavior of CUDA Kernels

KAUST Repository

Rosen, Paul

2013-01-01

We present an approach to investigate the memory behavior of a parallel kernel executing on thousands of threads simultaneously within the CUDA architecture. Our top-down approach allows for quickly identifying any significant differences between the execution of the many blocks and warps. As interesting warps are identified, we allow further investigation of memory behavior by visualizing the shared memory bank conflicts and global memory coalescence, first with an overview of a single warp with many operations and, subsequently, with a detailed view of a single warp and a single operation. We demonstrate the strength of our approach in the context of a parallel matrix transpose kernel and a parallel 1D Haar Wavelet transform kernel. © 2013 The Author(s) Computer Graphics Forum © 2013 The Eurographics Association and Blackwell Publishing Ltd.
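The two memory effects the tool visualizes can be estimated by hand. The sketch below assumes 32 shared-memory banks of 4-byte words and 128-byte global-memory segments, which match many NVIDIA GPUs but vary by architecture; the function names are hypothetical. It computes the bank-conflict degree of one warp's shared-memory access and the number of segments one warp's global access touches, using the column read of a 32×32 transpose tile as the example.

```python
def bank_conflict_degree(word_indices, num_banks=32):
    """Maximum number of distinct 4-byte words one warp requests from the same bank.

    Threads reading the same word are served by a broadcast, so duplicates are ignored.
    1 means conflict-free; n means the access is serialized n ways.
    """
    words_per_bank = {}
    for word in set(word_indices):
        words_per_bank.setdefault(word % num_banks, set()).add(word)
    return max(len(words) for words in words_per_bank.values())

def global_segments(byte_addresses, segment_bytes=128):
    """Number of aligned memory segments touched by one warp's global access (assumed 128 B)."""
    return len({addr // segment_bytes for addr in byte_addresses})

WARP = 32
FLOAT_BYTES = 4

# Reading column 0 of a float tile[32][32] in shared memory: thread t reads tile[t][0].
column_read = [t * 32 + 0 for t in range(WARP)]
# The same read with the common padding trick, tile[32][33].
padded_column_read = [t * 33 + 0 for t in range(WARP)]

# Global reads: consecutive floats (coalesced) vs. a stride of one 32-float row.
coalesced = [t * FLOAT_BYTES for t in range(WARP)]
strided = [t * 32 * FLOAT_BYTES for t in range(WARP)]

print(bank_conflict_degree(column_read))          # 32-way conflict
print(bank_conflict_degree(padded_column_read))   # conflict-free: 1
print(global_segments(coalesced), global_segments(strided))  # 1 segment vs. 32 segments
```

Padding each tile row by one word shifts every row to a different bank, which is why the padded column read becomes conflict-free.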
A Visual Approach to Investigating Shared and Global Memory Behavior of CUDA Kernels

KAUST Repository

Rosen, Paul

2013-06-01

Exploring Hardware Support For Scaling Irregular Applications on Multi-node Multi-core Architectures

Energy Technology Data Exchange (ETDEWEB)

Secchi, Simone; Ceriani, Marco; Tumeo, Antonino; Villa, Oreste; Palermo, Gianluca; Raffo, Luigi

2013-06-05

With the recent emergence of large-scale knowledge discovery, data mining and social network analysis, irregular applications have gained renewed interest. Classic cache-based high-performance architectures do not perform optimally on such workloads, mainly because of the very low spatial and temporal locality of their irregular control and memory access patterns. In this paper, we present a multi-node, multi-core, fine-grained multi-threaded shared-memory system architecture specifically designed for the execution of large-scale irregular applications and built on three pillars that we believe are fundamental to supporting these workloads. First, we offer transparent hardware support for Partitioned Global Address Space (PGAS) to provide a large globally shared address space with no software library overhead. Second, we employ multi-threaded multi-core processing nodes to achieve the latency tolerance required when accessing global memory, which may reside on a remote node. Third, we devise hardware support for inter-thread synchronization across the whole global address space. We first model performance using an analytical model that takes into account the main architecture and application characteristics. We then describe the hardware design of the custom architectural building blocks that support the three pillars above. Finally, we present a limited-scale evaluation of the system on a multi-board FPGA prototype with typical irregular kernels and benchmarks. The experimental evaluation demonstrates the performance scalability of the architecture across different configurations of the whole system.
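One way to picture transparent PGAS support is an address split performed in hardware: part of each global address names the node that owns the data and the rest is an offset in that node's memory, so an ordinary load either stays local or becomes a remote request. The sketch below assumes a 64-bit address with an 8-bit node field; the field widths and function names are hypothetical and not taken from the paper.

```python
NODE_BITS = 8                    # hypothetical: up to 256 nodes
OFFSET_BITS = 64 - NODE_BITS     # remaining bits address memory within one node
OFFSET_MASK = (1 << OFFSET_BITS) - 1

def make_global(node, offset):
    """Pack a (node, local offset) pair into one global address."""
    if not (0 <= node < (1 << NODE_BITS)) or not (0 <= offset <= OFFSET_MASK):
        raise ValueError("node or offset out of range")
    return (node << OFFSET_BITS) | offset

def split_global(addr):
    """Recover (node, local offset) from a global address."""
    return addr >> OFFSET_BITS, addr & OFFSET_MASK

def route(addr, local_node):
    """Decide whether a load is served locally or must be sent to the owning node."""
    node, offset = split_global(addr)
    kind = "local" if node == local_node else "remote"
    return kind, node, offset

addr = make_global(node=3, offset=0x1000)
print(route(addr, local_node=3))   # ('local', 3, 4096)
print(route(addr, local_node=0))   # ('remote', 3, 4096)
```

Remote results like the second one are where latency tolerance matters: on a fine-grained multi-threaded core, other threads can keep issuing work while such a request is outstanding.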
Large-scale digitizer system, analog converters

International Nuclear Information System (INIS)

Althaus, R.F.; Lee, K.L.; Kirsten, F.A.; Wagner, L.J.

1976-10-01

Analog-to-digital converter circuits that are based on the sharing of common resources, including those which are critical to the linearity and stability of the individual channels, are described. Simplicity of circuit composition is valued over other, more costly approaches. These circuits are intended for use in a large-scale processing and digitizing system for high-energy physics detectors such as drift chambers or phototube-scintillator arrays. Signal distribution techniques are of paramount importance in maintaining an adequate signal-to-noise ratio. Noise, in both the amplitude and the time-jitter sense, is held sufficiently low that conversions with 10-bit charge resolution and 12-bit time resolution are achieved.
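The quoted resolutions translate directly into numbers of output codes: a 10-bit conversion distinguishes 2¹⁰ = 1024 charge levels and a 12-bit conversion 2¹² = 4096 time levels. A minimal sketch of ideal uniform quantization follows; the full-scale ranges are hypothetical parameters, and real converters add the noise and nonlinearity the abstract is concerned with.

```python
def quantize(value, full_scale, bits):
    """Ideal uniform ADC: map value in [0, full_scale) to an integer code in [0, 2**bits - 1]."""
    levels = 1 << bits
    code = int(value / full_scale * levels)   # floor for non-negative inputs
    return min(max(code, 0), levels - 1)      # clip out-of-range inputs to the end codes

def lsb(full_scale, bits):
    """Size of one code step (least significant bit) in input units."""
    return full_scale / (1 << bits)

print(quantize(0.5, 1.0, 10))   # 512: mid-scale input on a 10-bit converter
```

For example, with a hypothetical 100 pC full-scale charge range, `lsb(100.0, 10)` gives one 10-bit code step of about 0.098 pC.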
Stream-processing pipelines: processing of streams on multiprocessor architecture

NARCIS (Netherlands)

Kavaldjiev, N.K.; Smit, Gerardus Johannes Maria; Jansen, P.G.

In this paper we study the timing aspects of the operation of stream-processing applications that run on a multiprocessor architecture. Dependencies are derived for the processing and communication times of the processors in such a system. Three cases of real-time constrained operation and four
Parallel k-means++ for Multiple Shared-Memory Architectures

Energy Technology Data Exchange (ETDEWEB)

Mackey, Patrick S.; Lewis, Robert R.

2016-09-22

In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varying data sizes.
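For reference, the exact k-means++ seeding that the paper parallelizes picks the first center uniformly at random and each further center with probability proportional to D(x)², the squared distance from x to its nearest already-chosen center. The sequential sketch below marks the O(n) distance update, which is the natural place to split work across threads; it is not the paper's parallel algorithm, and the function names are illustrative.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_seed(points, k, rng=random):
    """Exact k-means++ seeding (D^2 sampling). Returns up to k distinct center points."""
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        cumulative = list(accumulate(d2))
        total = cumulative[-1]
        if total == 0:                        # every point coincides with a center
            break
        r = rng.random() * total              # uniform in [0, total)
        idx = bisect_right(cumulative, r)     # first index whose running sum exceeds r
        new_center = points[idx]
        centers.append(new_center)
        # Embarrassingly parallel step: each point's update is independent of the others
        d2 = [min(d, squared_distance(p, new_center)) for d, p in zip(d2, points)]
    return centers
```

Because a point that already coincides with a center has D(x)² = 0, it can never be selected again, so the returned centers are distinct.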

Sharing Memories

DEFF Research Database (Denmark)

Rodil, Kasper; Nielsen, Emil Byskov; Nielsen, Jonathan Bernstorff

2018-01-01

in which it was to be contextualized and through a close partnership between aphasics and their caretakers. The underlying design methodology for the MemoryBook is Participatory Design manifested through the collaboration and creations by two aphasic residents and one member of the support staff. The idea...
MULTI: a shared memory approach to cooperative molecular modeling.

Science.gov (United States)

Darden, T; Johnson, P; Smith, H

1991-03-01

A general-purpose molecular modeling system, MULTI, based on the UNIX shared memory and semaphore facilities for interprocess communication, is described. In addition to the normal querying or monitoring of geometric data, MULTI also provides processes for manipulating conformations, and for displaying peptide or nucleic acid ribbons, Connolly surfaces, close nonbonded contacts, crystal-symmetry related images, least-squares superpositions, and so forth. This paper outlines the basic techniques used in MULTI to ensure cooperation among these specialized processes, and then describes how they can work together to provide a flexible modeling environment.
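The two UNIX facilities MULTI is built on have direct modern equivalents. The Python sketch below uses `multiprocessing.shared_memory` (POSIX-style shared memory, not necessarily the System V calls MULTI used) and a `multiprocessing.Semaphore` acting as a binary lock around a hypothetical record of three float64 atom coordinates; the record layout and helper names are assumptions.

```python
import struct
from multiprocessing import Semaphore, shared_memory

COORDS = struct.Struct("3d")   # hypothetical record: x, y, z of one atom as float64

def write_coords(shm, sem, xyz):
    """Update the shared record; the semaphore keeps readers from seeing a half-written record."""
    with sem:
        COORDS.pack_into(shm.buf, 0, *xyz)

def read_coords(shm, sem):
    """Read a consistent copy of the shared record."""
    with sem:
        return COORDS.unpack_from(shm.buf, 0)

if __name__ == "__main__":
    segment = shared_memory.SharedMemory(create=True, size=COORDS.size)
    sem = Semaphore(1)   # binary semaphore used as a mutex
    try:
        write_coords(segment, sem, (1.0, 2.0, 3.0))
        print(read_coords(segment, sem))   # (1.0, 2.0, 3.0)
    finally:
        segment.close()
        segment.unlink()   # remove the segment once no process needs it
```

In a multi-process setup, each cooperating process (a display process, a conformation editor, and so on) would attach to the same segment with `shared_memory.SharedMemory(name=...)` and share the semaphore, rather than receiving the objects directly as in this single-process demo.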
Transistor memory devices with large memory windows, using multi-stacking of densely packed, hydrophobic charge trapping metal nanoparticle array

International Nuclear Information System (INIS)

Cho, Ikjun; Cho, Jinhan; Kim, Beom Joon; Cho, Jeong Ho; Ryu, Sook Won

2014-01-01

Organic field-effect transistor (OFET) memories have rapidly evolved from low-cost, flexible electronics with relatively low memory capacities to memory devices that require high capacity, such as smart memory cards or solid-state hard drives. Here, we report high-capacity OFET memories based on the multilayer stacking of densely packed hydrophobic metal NP layers, in place of traditional transistor memory systems based on a single charge trapping layer. We demonstrate that the memory performance of the devices can be significantly enhanced by controlling the adsorption isotherm behavior, the multilayer stacking structure and the hydrophobicity of the metal NPs. For this study, tetraoctylammonium (TOA)-stabilized Au nanoparticles (TOA-Au NPs) were consecutively layer-by-layer (LbL) assembled with an amine-functionalized poly(amidoamine) dendrimer (PAD). The resulting (PAD/TOA-Au NP)ₙ films were used as a multilayer stacked charge trapping layer at the interface between the tunneling dielectric layer and the SiO₂ gate dielectric layer. For a single Au NP layer (i.e., (PAD/TOA-Au NP)₁) with a number density of 1.82 × 10¹² cm⁻², the memory window of the OFET memory device was measured to be approximately 97 V. The multilayer stacked OFET memory devices prepared with four Au NP layers exhibited excellent programmable memory properties (a large memory window (ΔVth) exceeding 145 V, a fast switching speed (1 μs), a high program/erase (P/E) current ratio (greater than 10⁶) and good electrical reliability) during writing and erasing over a relatively short time scale under an operation voltage of 100 V applied at the gate.
Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

Science.gov (United States)

Gara, Alan; Ohmacht, Martin

2014-09-16

In a multiprocessor system with at least two levels of cache, a speculative thread may run on a processor core in parallel with other threads. When the thread writes to main memory, the access is written through the first-level cache to the second-level cache. After the write-through, the corresponding line is evicted from the first-level cache and/or prefetch unit, so that any further accesses to the same main-memory location have to be served by the second-level cache. The second-level cache keeps track of multiple versions of data when more than one speculative thread is running in parallel, while the first-level cache holds none of the versions during speculation. A switch allows choosing between modes of operation of the speculation-blind first-level cache.
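The write path can be modeled in a few lines. In the toy model below (class names hypothetical; one speculative thread per first-level cache is assumed), a write goes through to the second-level cache as a per-thread version and evicts the line from the first-level cache, which only ever caches committed data and so stays blind to speculative versions. Commit is reduced to the bare minimum, and coherence invalidation of other cores is omitted.

```python
class L2Cache:
    """Shared second-level cache: committed data plus per-thread speculative versions."""

    def __init__(self):
        self.committed = {}   # addr -> committed value
        self.versions = {}    # (thread_id, addr) -> speculative value

    def write_speculative(self, thread_id, addr, value):
        self.versions[(thread_id, addr)] = value

    def read(self, thread_id, addr):
        """Return (value, is_speculative); a thread sees its own version if it has one."""
        if (thread_id, addr) in self.versions:
            return self.versions[(thread_id, addr)], True
        return self.committed.get(addr, 0), False

    def commit(self, thread_id):
        """Make one thread's speculative writes architecturally visible."""
        for key in [k for k in self.versions if k[0] == thread_id]:
            self.committed[key[1]] = self.versions.pop(key)


class SpeculationBlindL1:
    """First-level cache of one core running one speculative thread (an assumption)."""

    def __init__(self, l2, thread_id):
        self.l2 = l2
        self.thread_id = thread_id
        self.lines = {}   # addr -> value; only ever holds committed data

    def write(self, addr, value):
        self.l2.write_speculative(self.thread_id, addr, value)  # write through to L2
        self.lines.pop(addr, None)                               # evict on write

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr]
        value, speculative = self.l2.read(self.thread_id, addr)
        if not speculative:
            self.lines[addr] = value   # never fill L1 with a speculative version
        return value


l2 = L2Cache()
l2.committed[0x40] = 1
core_a = SpeculationBlindL1(l2, thread_id=1)
core_b = SpeculationBlindL1(l2, thread_id=2)
core_a.read(0x40)        # L1 miss, filled with committed value 1
core_a.write(0x40, 7)    # written through to L2 as thread 1's version; L1 line evicted
print(core_a.read(0x40), core_b.read(0x40))   # 7 1: each thread sees its own view
```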
Memory controllers for high-performance and real-time MPSoCs : requirements, architectures, and future trends

NARCIS (Netherlands)

Akesson, K.B.; Huang, Po-Chun; Clermidy, F.; Dutoit, D.; Goossens, K.G.W.; Chang, Yuan-Hao; Kuo, Tei-Wei; Vivet, P.; Wingard, D.

2011-01-01

Designing memory controllers for complex real-time and high-performance multi-processor systems-on-chip is challenging, since sufficient capacity and (real-time) performance must be provided in a reliable manner at low cost and with low power consumption. This special session contains four
Large-scale inverse model analyses employing fast randomized data reduction

Science.gov (United States)

Lin, Youzuo; Le, Ellen B.; O'Malley, Daniel; Vesselinov, Velimir V.; Bui-Thanh, Tan

2017-08-01

When the number of observations is large, it is computationally challenging to apply classical inverse modeling techniques. We have developed a new computationally efficient technique for solving inverse problems with a large number of observations (e.g., on the order of 10⁷ or greater). Our method, which we call the randomized geostatistical approach (RGA), is built upon the principal component geostatistical approach (PCGA). We employ a data reduction technique combined with the PCGA to improve the computational efficiency and reduce the memory usage. Specifically, we employ a randomized numerical linear algebra technique based on a so-called "sketching" matrix to effectively reduce the dimension of the observations without losing the information content needed for the inverse analysis. In this way, the computational and memory costs for RGA scale with the information content rather than the size of the calibration data. Our algorithm is coded in Julia and implemented in the MADS open-source high-performance computational framework (http://mads.lanl.gov). We apply our new inverse modeling method to invert for a synthetic transmissivity field. Compared to a standard geostatistical approach (GA), our method is more efficient when the number of observations is large. Most importantly, our method is capable of solving larger inverse problems than the standard GA and PCGA approaches. Therefore, our new model inversion method is a powerful tool for solving large-scale inverse problems. The method can be applied in any field and is not limited to hydrogeological applications such as the characterization of aquifer heterogeneity.
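The idea of sketching can be shown on a toy least-squares problem: multiply both the observation vector d and the forward matrix H by a random k × n matrix S with k ≪ n, then solve the much smaller system. The sketch below uses a Gaussian S, a two-parameter linear model and pure Python; it illustrates generic sketch-and-solve, not RGA or PCGA themselves, and all names and sizes are assumptions.

```python
import math
import random

def sketch_rows(rows, k, rng):
    """Compute S @ rows for a k x n Gaussian sketch S with entries N(0, 1/k).

    Columns of S are drawn on the fly, so S is never stored: memory is O(k * p)
    instead of O(n * p) for an n x p input.
    """
    p = len(rows[0])
    out = [[0.0] * p for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    for row in rows:
        column = [rng.gauss(0.0, scale) for _ in range(k)]
        for r, s in enumerate(column):
            for j, value in enumerate(row):
                out[r][j] += s * value
    return out

def solve_2x2_least_squares(A, b):
    """Normal equations for a two-parameter model: (A^T A) x = A^T b, solved by Cramer's rule."""
    a11 = sum(r[0] * r[0] for r in A)
    a12 = sum(r[0] * r[1] for r in A)
    a22 = sum(r[1] * r[1] for r in A)
    c1 = sum(r[0] * y for r, y in zip(A, b))
    c2 = sum(r[1] * y for r, y in zip(A, b))
    det = a11 * a22 - a12 * a12
    return (c1 * a22 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

rng = random.Random(42)
n, k = 5000, 40
# Hypothetical synthetic observations d_i = 2 + 3 x_i + noise, with model rows [1, x_i]
data = []
for i in range(n):
    x = i / n
    data.append([1.0, x, 2.0 + 3.0 * x + rng.gauss(0.0, 0.01)])

sketched = sketch_rows(data, k, rng)   # the same S is applied to the augmented matrix [H | d]
A = [row[:2] for row in sketched]
b = [row[2] for row in sketched]
intercept, slope = solve_2x2_least_squares(A, b)
print(round(intercept, 2), round(slope, 2))   # close to 2 and 3
```

The solve now works on 40 sketched rows instead of 5000 observations, which is the sense in which cost tracks information content rather than data size.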
Scientific applications and numerical algorithms on the midas multiprocessor system

International Nuclear Information System (INIS)

Logan, D.; Maples, C.

1986-01-01

The MIDAS multiprocessor system is a multi-level, hierarchical structure designed at the Advanced Computer Architecture Laboratory of the University of California's Lawrence Berkeley Laboratory. A two-stage, 11-processor system has been operational for over a year and is currently undergoing expansion. It has been employed to investigate the performance of different methods of decomposing various problems and algorithms for a multiprocessor environment. The results of such tests on a variety of applications, such as scientific data analysis, Monte Carlo calculations, and image processing, are discussed. Often such decompositions involve investigating the parallel structure of fundamental algorithms. Several basic algorithms dealing with random number generation, matrix diagonalization, fast Fourier transforms, and finite element methods for solving partial differential equations are also discussed. The performance and projected extensibility of these decompositions on the MIDAS system are reported.
An efficient implementation of 3D high-resolution imaging for large-scale seismic data with GPU/CPU heterogeneous parallel computing

Science.gov (United States)

Xu, Jincheng; Liu, Wei; Wang, Jin; Liu, Linong; Zhang, Jianfeng

2018-02-01

De-absorption pre-stack time migration (QPSTM) compensates for the absorption and dispersion of seismic waves by introducing an effective Q parameter, making it an effective tool for 3D high-resolution imaging of seismic data. Although the optimal aperture obtained via stationary-phase migration reduces the computational cost of 3D QPSTM (yielding 3D stationary-phase QPSTM), computational efficiency is still the main problem when producing 3D high-resolution images from real large-scale seismic data. In this paper, we propose a division method for large-scale 3D seismic data to optimize the performance of stationary-phase QPSTM on clusters of graphics processing units (GPUs). We then design an imaging-point parallel strategy to achieve optimal parallel computing performance, and adopt an asynchronous double-buffering scheme with multiple streams to perform the GPU/CPU parallel computing. Moreover, several key computation and storage optimization strategies based on the compute unified device architecture (CUDA) are adopted to accelerate the 3D stationary-phase QPSTM algorithm. Compared with the initial GPU code, the key optimization steps, including thread optimization, shared memory optimization, register optimization and the use of special function units (SFU), greatly improved efficiency. A numerical example employing real large-scale 3D seismic data showed that our scheme is nearly 80 times faster than the CPU-QPSTM algorithm. Our GPU/CPU heterogeneous parallel computing framework significantly reduces the computational cost and facilitates 3D high-resolution imaging of large-scale seismic data.
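The asynchronous double-buffering idea does not depend on CUDA: while one buffer is being processed, the next block of data is already being transferred into the other. The host-side Python sketch below imitates this with a single background loader thread; `load_chunk` and `process_chunk` are hypothetical stand-ins for a host-to-device copy and the migration kernel, and a real implementation would use CUDA streams with asynchronous copies instead.

```python
from concurrent.futures import ThreadPoolExecutor

def load_chunk(data, start, size):
    """Hypothetical stand-in for transferring one block of seismic traces."""
    return list(data[start:start + size])

def process_chunk(chunk):
    """Hypothetical stand-in for the imaging kernel applied to one block."""
    return sum(x * x for x in chunk)

def double_buffered_sum(data, chunk_size):
    """Process data block by block, loading block i+1 while block i is computed."""
    starts = iter(range(0, len(data), chunk_size))
    first = next(starts, None)
    if first is None:
        return 0
    total = 0
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load_chunk, data, first, chunk_size)      # fill the first buffer
        for start in starts:
            current = pending.result()                                    # wait for the filled buffer
            pending = loader.submit(load_chunk, data, start, chunk_size)  # start filling the other one
            total += process_chunk(current)                               # compute overlaps the load
        total += process_chunk(pending.result())                          # drain the last buffer
    return total
```

The result is identical to processing the data sequentially, for example `double_buffered_sum(list(range(10)), 3)` gives the same 285 as `sum(x * x for x in range(10))`; only the overlap of transfer and computation changes.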
To share and be shared

DEFF Research Database (Denmark)

Winther, Ida Wentzel

2018-01-01

to another. To a certain degree, they share their everyday lives, things, places, memories, and past/future, but as the ones who move back and forth, they belong a little less in each place. This article is about children who are shared between their parent, households and siblings. They are shared...
Programming parallel architectures: The BLAZE family of languages

Science.gov (United States)

Mehrotra, Piyush

1988-01-01

Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
An Adaptive Hybrid Multiprocessor technique for bioinformatics sequence alignment

KAUST Repository

Bonny, Talal

2012-07-28

Sequence alignment algorithms such as the Smith-Waterman algorithm are among the most important applications in the development of bioinformatics. Sequence alignment algorithms must process large amounts of data, which may take a long time. Here, we introduce our Adaptive Hybrid Multiprocessor technique to accelerate the implementation of the Smith-Waterman algorithm. Our technique utilizes both the graphics processing unit (GPU) and the central processing unit (CPU). It adapts the implementation to the number of CPUs given as input by efficiently distributing the workload between the processing units. Using existing resources (GPU and CPU) in an efficient way is a novel approach. The peak performance achieved for the platforms GPU + CPU, GPU + 2CPUs, and GPU + 3CPUs is 10.4 GCUPS, 13.7 GCUPS, and 18.6 GCUPS, respectively (with a query length of 511 amino acids). © 2010 IEEE.
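The computation being accelerated is the Smith-Waterman dynamic-programming recurrence: each cell H[i][j] holds the best local-alignment score ending at a[i] and b[j], floored at zero, and GCUPS counts how many billions of these cells are updated per second. The sketch below uses a linear gap penalty and toy scoring values (assumptions; protein searches normally use a substitution matrix such as BLOSUM62 with affine gaps) and keeps only one row of the matrix at a time. Because each database sequence can be scored independently, a hybrid scheme can hand different sequences to the GPU and to each CPU.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)       # row i-1 of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)   # row i; column 0 stays 0
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a score may restart at 0 instead of going negative
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def gcups(query_len, db_len, seconds):
    """Giga cell updates per second: DP cells computed, in billions, per second."""
    return query_len * db_len / seconds / 1e9

print(smith_waterman_score("TTACGTTT", "ACGT"))   # 8: ACGT aligned exactly
```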
Large-scale molecular dynamics simulations of self-assembling systems.

Science.gov (United States)

Klein, Michael L; Shinoda, Wataru

2008-08-08

Relentless increases in the size and performance of multiprocessor computers, coupled with new algorithms and methods, have led to novel applications of simulations across chemistry. This Perspective focuses on the use of classical molecular dynamics and so-called coarse-grain models to explore phenomena involving self-assembly in complex fluids and biological systems.
Software Toolchain for Large-Scale RE-NFA Construction on FPGA

Directory of Open Access Journals (Sweden)

Yi-Hua E. Yang

2009-01-01

and O(n×m) memory by our software. A large number of RE-NFAs are placed onto a two-dimensional staged pipeline, allowing scalability to thousands of RE-NFAs with linear area increase and little clock rate penalty due to scaling. On a PC with a 2 GHz Athlon64 processor and 2 GB of memory, our prototype software constructs hundreds of RE-NFAs used by Snort in less than 10 seconds. We also designed a benchmark generator that can produce RE-NFAs with configurable pattern complexity parameters, including state count, state fan-in, loop-back and feed-forward distances. Several regular expressions with various complexities are used to test the performance of our RE-NFA construction software.
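One common way to map an RE-NFA to an FPGA is one flip-flop per state, with all states updated in parallel on every input character. The sketch below simulates that behavior in software with a bitmask of active states for a hand-built NFA of the hypothetical regular expression `a(b|c)*d`; it does not construct the NFA from a regular expression the way the paper's toolchain does.

```python
# Hypothetical hand-built NFA for the regular expression a(b|c)*d.
# Each transition is (source_state, characters_accepted, target_state).
TRANSITIONS = [(0, "a", 1), (1, "bc", 1), (1, "d", 2)]
START_STATE, ACCEPT_STATE = 0, 2

def nfa_search(text, transitions=TRANSITIONS, start=START_STATE, accept=ACCEPT_STATE):
    """Return True if any substring of text matches, simulating one bit per NFA state."""
    start_bit = 1 << start
    active = start_bit
    for ch in text:
        # All transitions are evaluated against the old state vector, as the
        # flip-flops of a hardware NFA would be on one clock edge.
        nxt = start_bit                  # keep the start state active: unanchored search
        for src, chars, dst in transitions:
            if (active >> src) & 1 and ch in chars:
                nxt |= 1 << dst
        active = nxt
        if (active >> accept) & 1:
            return True
    return False

print(nfa_search("xxabcbd"), nfa_search("abx"))   # True False
```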
Translation techniques for distributed-shared memory programming models

Energy Technology Data Exchange (ETDEWEB)

Fuller, Douglas James [Iowa State Univ., Ames, IA (United States)

2005-01-01

The high performance computing community has experienced an explosive improvement in distributed-shared memory hardware. Driven by increasing real-world problem complexity, this explosion has ushered in vast numbers of new systems. Each new system presents new challenges to programmers and application developers. Part of the challenge is adapting to new architectures with new performance characteristics. Different vendors release systems with widely varying architectures that perform differently in different situations. Furthermore, since vendors need only provide a single performance number (total MFLOPS, typically for a single benchmark), they only have strong incentive initially to optimize the API of their choice. Consequently, only a fraction of the available APIs are well optimized on most systems. This causes issues porting and writing maintainable software, let alone issues for programmers burdened with mastering each new API as it is released. Also, programmers wishing to use a certain machine must choose their API based on the underlying hardware instead of the application. This thesis argues that a flexible, extensible translator for distributed-shared memory APIs can help address some of these issues. For example, a translator might take as input code in one API and output an equivalent program in another. Such a translator could provide instant porting for applications to new systems that do not support the application's library or language natively. While open-source APIs are abundant, they do not perform optimally everywhere. A translator would also allow performance testing using a single base code translated to a number of different APIs. Most significantly, this type of translator frees programmers to select the most appropriate API for a given application based on the application (and developer) itself instead of the underlying hardware.
Using memory-efficient algorithm for large-scale time-domain modeling of surface plasmon polaritons propagation in organic light emitting diodes

Science.gov (United States)

Zakirov, Andrey; Belousov, Sergei; Valuev, Ilya; Levchenko, Vadim; Perepelkina, Anastasia; Zempo, Yasunari

2017-10-01

We demonstrate an efficient approach to numerical modeling of optical properties of large-scale structures with typical dimensions much greater than the wavelength of light. For this purpose, we use the finite-difference time-domain (FDTD) method enhanced with a memory efficient Locally Recursive non-Locally Asynchronous (LRnLA) algorithm called DiamondTorre and implemented for General Purpose Graphical Processing Units (GPGPU) architecture. We apply our approach to simulation of optical properties of organic light emitting diodes (OLEDs), which is an essential step in the process of designing OLEDs with improved efficiency. Specifically, we consider a problem of excitation and propagation of surface plasmon polaritons (SPPs) in a typical OLED, which is a challenging task given that SPP decay length can be about two orders of magnitude greater than the wavelength of excitation. We show that with our approach it is possible to extend the simulated volume size sufficiently so that SPP decay dynamics is accounted for. We further consider an OLED with periodically corrugated metallic cathode and show how the SPP decay length can be greatly reduced due to scattering off the corrugation. Ultimately, we compare the performance of our algorithm to the conventional FDTD and demonstrate that our approach can efficiently be used for large-scale FDTD simulations with the use of only a single GPGPU-powered workstation, which is not practically feasible with the conventional FDTD.
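For orientation, the conventional FDTD method that DiamondTorre builds on is a leapfrog update of staggered electric and magnetic fields on a Yee grid. The one-dimensional sketch below uses normalized units with Courant number S = cΔt/Δx = 0.5, perfectly conducting ends and a soft Gaussian source; all values are assumptions. It shows only the plain time-stepping loop, not the LRnLA traversal order that gives DiamondTorre its memory efficiency.

```python
import math

def fdtd_1d(n_cells=200, n_steps=300, courant=0.5, source_cell=50):
    """Conventional 1D FDTD (Yee leapfrog) in normalized units.

    ez lives on integer grid points and hy on half points; ez[0] and ez[-1] are
    never updated, which acts as perfectly conducting (PEC) walls.
    """
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    for t in range(n_steps):
        # Magnetic field update from the spatial difference of E
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Electric field update from the spatial difference of H (interior points only)
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft Gaussian source: added to, rather than overwriting, the field
        ez[source_cell] += math.exp(-((t - 30) / 10) ** 2)
    return ez
```

Every time step sweeps both full field arrays, so for large 3D grids the run time is dominated by memory traffic; that access pattern is what cache-aware traversals such as LRnLA algorithms aim to improve.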
Coupling Computer Codes for The Analysis of Severe Accident Using A Pseudo Shared Memory Based on MPI

International Nuclear Information System (INIS)

Cho, Young Chul; Park, Chang-Hwan; Kim, Dong-Min

2016-01-01

Because there are four codes for severe accident analysis, namely an in-vessel analysis code (CSPACE), an ex-vessel analysis code (SACAP), a corium behavior analysis code (COMPASS), and a fission product behavior analysis code, it is complex to couple them using methodologies similar to those used for RELAP and CONTEMPT or SPACE and CAP. For this reason, an efficient coupling approach, the so-called pseudo shared memory architecture, was introduced. In this paper, coupling methodologies are compared and the methodology used for the analysis of severe accidents is discussed in detail. The barrier between in-vessel and ex-vessel analysis has been removed for severe accident analysis by coupling the computer codes through a pseudo shared memory architecture based on MPI. What remains is the proper choice and checking of variables and values for the selected severe accident scenarios, e.g., the TMI accident. Even though it is possible to couple more than two computer codes with the pseudo shared memory architecture, the methodology should be revised to couple parallel codes, especially when they are programmed using MPI.
Coupling Computer Codes for The Analysis of Severe Accident Using A Pseudo Shared Memory Based on MPI

Energy Technology Data Exchange (ETDEWEB)

Cho, Young Chul; Park, Chang-Hwan; Kim, Dong-Min [FNC Technology Co., Yongin (Korea, Republic of)

2016-10-15

TensorFlow: A system for large-scale machine learning

OpenAIRE

Abadi, Martín; Barham, Paul; Chen, Jianmin; Chen, Zhifeng; Davis, Andy; Dean, Jeffrey; Devin, Matthieu; Ghemawat, Sanjay; Irving, Geoffrey; Isard, Michael; Kudlur, Manjunath; Levenberg, Josh; Monga, Rajat; Moore, Sherry; Murray, Derek G.

2016-01-01

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexib...
Safe and Efficient Support for Embedded Multi-Processors in ADA

Science.gov (United States)

Ruiz, Jose F.

2010-08-01

New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.
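A fully partitioned scheme binds each task statically to one processor, so each processor can be analyzed as a uniprocessor. As one hedged illustration (not the paper's method), the sketch below assigns periodic tasks first-fit in decreasing utilization order and accepts a processor only if the Liu and Layland rate-monotonic bound n(2^(1/n) − 1) still holds there; this is a sufficient, not necessary, test, and the task parameters are hypothetical.

```python
def liu_layland_bound(n):
    """Sufficient utilization bound for rate-monotonic scheduling of n tasks on one CPU."""
    return n * (2 ** (1 / n) - 1)

def partition_first_fit(tasks, n_cpus):
    """Statically assign (wcet, period) tasks to CPUs; return per-CPU lists or None."""
    cpus = [[] for _ in range(n_cpus)]
    # Place the heaviest tasks first; they are the hardest to fit.
    for task in sorted(tasks, key=lambda t: t[0] / t[1], reverse=True):
        for assigned in cpus:
            candidate = assigned + [task]
            utilization = sum(c / p for c, p in candidate)
            if utilization <= liu_layland_bound(len(candidate)):
                assigned.append(task)   # task is bound to this CPU for its lifetime
                break
        else:
            return None                 # no CPU passes the test: reject the task set
    return cpus

print(partition_first_fit([(1, 4), (1, 4), (2, 5), (1, 10)], 2))
```

Once the partition is fixed, each processor's task set can be analyzed on its own, which is what keeps a partitioned extension amenable to the single-processor schedulability analysis the abstract mentions.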
The Impact of Process Scaling on Scratchpad Memory Energy Savings

Directory of Open Access Journals (Sweden)

Bennion Redd

2014-09-01

Scratchpad memories have been shown to reduce power consumption, but the different characteristics of nanometer-scale processes, such as increased leakage power, motivate an examination of how the benefits of these memories change with process scaling. Process and application characteristics affect the amount of energy saved by a scratchpad memory. Increases in leakage as a percentage of total power particularly impact applications that rarely access memory. This study examines how the benefits of scratchpad memories have changed in newer processes, based on the measured performance of the WIMS (Wireless Integrated MicroSystems) microcontroller implemented in 180- and 65-nm processes and upon simulations of this microcontroller implemented in a 32-nm process. The results demonstrate that scratchpad memories will continue to improve the power dissipation of many applications, given the leakage anticipated in the foreseeable future.

Plasma physics modeling and the Cray-2 multiprocessor

International Nuclear Information System (INIS)

Killeen, J.

1985-01-01

The importance of computer modeling in the magnetic fusion energy research program is discussed. The need for the most advanced supercomputers is described. To meet the demand for more powerful scientific computers to solve larger and more complicated problems, the computer industry is developing multiprocessors. The role of the Cray-2 in plasma physics modeling is discussed with some examples. 28 refs., 2 figs., 1 tab
Operating experience with a VMEbus multiprocessor system for data acquisition and reduction in nuclear physics

International Nuclear Information System (INIS)

Kutt, P.H.; Balamuth, D.P.

1989-01-01

A multiprocessor system based on commercially available VMEbus components has been developed for the acquisition and reduction of event-mode data in nuclear physics experiments. The system contains seven 68000 CPUs and 14 MB of memory. A minimal operating system handles data transfer and task allocation, and a compiler for a specially designed event analysis language produces code for the processors. The system has been in operation for four years at the University of Pennsylvania Tandem Accelerator Laboratory. Computation rates over 3 times that of a MicroVAX II have been achieved at a fraction of the cost. The use of WORM optical disks for event recording allows the processing of gigabyte data sets without operator intervention. A more powerful system is being planned which will make use of recently developed RISC processors to obtain an order of magnitude increase in computing power per node.
Decomposing the relationship between cognitive functioning and self-referent memory beliefs in older adulthood: what's memory got to do with it?

Science.gov (United States)

Payne, Brennan R; Gross, Alden L; Hill, Patrick L; Parisi, Jeanine M; Rebok, George W; Stine-Morrow, Elizabeth A L

2017-07-01

With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2802). Accounting for a general fluid cognitive functioning factor (composed of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability.
Mapping of H.264 decoding on a multiprocessor architecture

Science.gov (United States)

van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.

2003-05-01

Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result, Systems-on-Chip (SoCs) contain a growing number of fully programmable media processing devices, as opposed to application-specific systems, which offered the most attractive solutions owing to their high performance density. Two factors motivate this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, which makes the cost of the functional units less significant. Second, continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs that also scale with advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by, e.g., a VLIW multiprocessor architecture. To provide this scalability, we propose to partition the data over the processors instead of using traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Accordingly, a software implementation is discussed that enables, e.g., SD-resolution H.264 decoding on a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system executing the same software. Experimental results show that data communication is reduced considerably, by up to 65%, which directly improves the overall performance. Apart from the considerable improvement in memory bandwidth, this novel partitioning concept offers a natural approach for optimally balancing the load of all processors, thereby further improving the
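
The data-partitioning idea can be sketched by splitting each frame's macroblock rows into contiguous bands, one per processor. This Python sketch is an illustrative assumption rather than the authors' implementation: it balances only the row count and ignores the inter-row dependencies (intra prediction, motion-vector prediction, deblocking) that a real H.264 decoder must respect, for example by processing bands in wavefront order.

```python
from typing import List, Tuple

MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma samples


def mb_rows(height_px: int) -> int:
    """Number of macroblock rows needed to cover a frame of the given height."""
    return -(-height_px // MB_SIZE)  # ceiling division


def partition_rows(height_px: int, processors: int) -> List[Tuple[int, int]]:
    """Split macroblock rows into contiguous, nearly equal bands, one per processor.

    Returns half-open ranges [start, end) of macroblock-row indices. Band sizes
    differ by at most one row, which keeps the per-processor load roughly balanced.
    """
    rows = mb_rows(height_px)
    base, extra = divmod(rows, processors)
    bands, start = [], 0
    for p in range(processors):
        end = start + base + (1 if p < extra else 0)
        bands.append((start, end))
        start = end
    return bands
```

For a 576-line SD frame (36 macroblock rows) and two processors, each processor gets 18 rows; a 1080-line HD frame spans 68 rows (the last one partially), which eight processors share as bands of 9 or 8 rows. Because every processor runs the same decoding code on its own band, the same software serves both configurations.
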
Functions of Memory Sharing and Mother-Child Reminiscing Behaviors: Individual and Cultural Variations

Science.gov (United States)

Kulkofsky, Sarah; Wang, Qi; Koh, Jessie Bee Kim

2009-01-01

This study examined maternal beliefs about the functions of memory sharing and the relations between these beliefs and mother-child reminiscing behaviors in a cross-cultural context. Sixty-three European American and 47 Chinese mothers completed an open-ended questionnaire concerning their beliefs about the functions of parent-child memory…
Knowledge Sharing Strategies for Large Complex Building Projects.

Directory of Open Access Journals (Sweden)

Esra Bektas

2013-06-01

The construction industry is a project-based sector with a myriad of actors such as architects, construction companies, consultants, and producers of building materials (Anumba et al., 2005). The interaction between the project partners is often quite limited, which leads to insufficient knowledge sharing during the project and to knowledge being unavailable for reuse (Fruchter et al., 2002). The result can be a considerable amount of extra work, delays, and cost overruns. Design outcomes that are supposed to function as boundary objects across different disciplines can lead to misinterpretation of requirements, project content, and objectives. In this research, knowledge is seen as resulting from social interactions; knowledge resides in communities and is generated through social relationships (Wenger, 1998; Olsson et al., 2008). Knowledge is often tacit, intangible, and context-dependent, and it is articulated in the changing responsibilities, roles, attitudes, and values that are present in the work environment (Bresnen et al., 2003). In a project environment, knowledge enables individuals to solve problems, take decisions, and apply these decisions to actions. To achieve a shared understanding and minimize misunderstandings and misinterpretations among project actors, it is necessary to share knowledge (Fong, 2003). Sharing knowledge is particularly crucial in large complex building projects (LCBPs) in order to accelerate the building process, improve architectural quality, and prevent mistakes or undesirable results. However, knowledge sharing is often hampered by professional or organizational boundaries or by contractual concerns. When knowledge is seen as an organizational asset, there is little willingness among project organizations to share their knowledge. Individual people may recognize the need to promote knowledge sharing throughout the project, but typically there is no deliberate strategy agreed by all project partners to address
Large-scale computation in solid state physics - Recent developments and prospects

International Nuclear Information System (INIS)

DeVreese, J.T.

1985-01-01

During the past few years, interest in large-scale computation has been growing. Several initiatives were taken to evaluate and exploit the potential of "supercomputers" such as the CRAY-1 (or X-MP) or the CYBER-205. In the U.S.A., the Lax report first appeared in 1982, and subsequently (1984) the National Science Foundation announced a program to promote large-scale computation at the universities. In Europe, too, several CRAY and CYBER-205 systems have been installed. Although the presently available mainframes are the result of a continuous growth in speed and memory, they might have induced a discontinuous transition in the evolution of the scientific method: between theory and experiment, a third methodology, "computational science", has become or is becoming operational.
Large-scale solar purchasing

International Nuclear Information System (INIS)

1999-01-01

The principal objective of the project was to participate in the definition of a new IEA task concerning solar procurement ("the Task") and to assess whether involvement in the task would be in the interest of the UK active solar heating industry. The project also aimed to assess the importance of large scale solar purchasing to UK active solar heating market development and to evaluate the level of interest in large scale solar purchasing amongst potential large scale purchasers (in particular housing associations and housing developers). A further aim of the project was to consider means of stimulating large scale active solar heating purchasing activity within the UK. (author)
Multiprocessor development for robot control

International Nuclear Information System (INIS)

Lee, John Min; Kim, Seung Ho; Kim, Chang Hoi; Kim, Byung Soo; Hwang, Suk Yeong; Lee, Young Bum; Sohn, Suk Won; Kim, Woon Gi

1990-01-01

The objective of this project is to develop a real-time controller for autonomous robotic systems operating in hostile environments. The control system is designed as a multiprocessor to achieve independence and reliability and to allow the system to be extended easily. It is organized into three distinct subsystems (a supervisory control part, a functional control part, and a remote control part). To review the functional performance of the controller, a prototype mobile robot equipped with a 4-DOF manipulator was designed and manufactured. Initial tests showed that the robot could turn with a radius of 38 cm, reach a maximum speed of 1.26 km/h, and go over obstacles 18 cm in height. (author)
Parallel discrete event simulation: A shared memory approach

Science.gov (United States)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1987-01-01

With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
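
The conservative synchronization rule behind Chandy-Misra simulation can be sketched for a single logical process: an event may be processed only when no message with an earlier timestamp can still arrive on any input channel, and null messages (a timestamp with no work attached) advance channel clocks so that neighbors are not blocked forever. The class and method names below are hypothetical, and the sketch assumes FIFO channels with nondecreasing timestamps and a fixed lookahead.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Message = Tuple[float, Optional[str]]  # (timestamp, payload); payload None = null message


class LogicalProcess:
    """One simulated entity in a conservative (Chandy-Misra style) simulation.

    Each input channel delivers messages in nondecreasing timestamp order, so the
    last timestamp seen on a channel bounds anything that can still arrive there.
    """

    def __init__(self, inputs: List[str], lookahead: float):
        self.queues: Dict[str, Deque[Message]] = {ch: deque() for ch in inputs}
        self.channel_clock: Dict[str, float] = {ch: 0.0 for ch in inputs}
        self.clock = 0.0
        self.lookahead = lookahead
        self.processed: List[Message] = []

    def receive(self, channel: str, msg: Message) -> None:
        self.channel_clock[channel] = msg[0]
        self.queues[channel].append(msg)

    def safe_time(self) -> float:
        """No message with a smaller timestamp can still arrive on any input."""
        return min(self.channel_clock.values())

    def step(self) -> Optional[Message]:
        """Consume the earliest queued message if that cannot violate causality."""
        heads = [(q[0], ch) for ch, q in self.queues.items() if q]
        if not heads:
            return None
        (ts, payload), ch = min(heads, key=lambda h: (h[0][0], h[1]))
        if ts > self.safe_time():
            return None  # block: an earlier message may still arrive on another channel
        self.queues[ch].popleft()
        self.clock = ts
        if payload is not None:  # null messages only advance channel clocks
            self.processed.append((ts, payload))
        return (ts, payload)

    def null_message(self) -> Message:
        """Promise to neighbors: nothing will be sent before clock + lookahead."""
        return (self.clock + self.lookahead, None)
```

The blocking in `step` is the cost the abstract alludes to: a process with work queued may still have to wait for null messages, which is one reason the measured speedups can fall short of linear.
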
Decomposing the relationship between cognitive functioning and self-referent memory beliefs in older adulthood: What’s memory got to do with it?

Science.gov (United States)

Payne, Brennan R.; Gross, Alden L.; Hill, Patrick L.; Parisi, Jeanine M.; Rebok, George W.; Stine-Morrow, Elizabeth A. L.

2018-01-01

With advancing age, episodic memory performance shows marked declines along with concurrent reports of lower subjective memory beliefs. Given that normative age-related declines in episodic memory co-occur with declines in other cognitive domains, we examined the relationship between memory beliefs and multiple domains of cognitive functioning. Confirmatory bi-factor structural equation models were used to parse the shared and independent variance among factors representing episodic memory, psychomotor speed, and executive reasoning in one large cohort study (Senior Odyssey, N = 462), and replicated using another large cohort of healthy older adults (ACTIVE, N = 2,802). Accounting for a general fluid cognitive functioning factor (composed of the shared variance among measures of episodic memory, speed, and reasoning) attenuated the relationship between objective memory performance and subjective memory beliefs in both samples. Moreover, the general cognitive functioning factor was the strongest predictor of memory beliefs in both samples. These findings are consistent with the notion that dispositional memory beliefs may reflect perceptions of cognition more broadly. This may be one reason why memory beliefs have broad predictive validity for interventions that target fluid cognitive ability. PMID:27685541
A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories

Science.gov (United States)

1989-02-01

...frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of ... generality for rendering curved surfaces, volume data, objects described with Constructive Solid Geometry, for rendering scenes using the radiosity ... faces and for computing a spherical radiosity lighting model (see Section 7.6).
(Figure: Custom Memory Chips, 208 bits × 128 pixels; Renderer Board.)
Adaptive scaling of reward in episodic memory: a replication study.

Science.gov (United States)

Mason, Alice; Ludwig, Casimir; Farrell, Simon

2017-11-01

Reward is thought to enhance episodic memory formation via dopaminergic consolidation. Bunzeck, Dayan, Dolan, and Duzel [(2010). A common mechanism for adaptive scaling of reward and novelty. Human Brain Mapping, 31, 1380-1394] provided functional magnetic resonance imaging (fMRI) and behavioural evidence that reward and episodic memory systems are sensitive to the contextual value of a reward (whether it is relatively higher or lower), as opposed to its absolute value or prediction error. We carried out a direct replication of their behavioural study and did not replicate their finding that memory performance associated with reward follows this pattern of adaptive scaling. An effect of reward outcome was in the opposite direction to that in the original study, with lower reward outcomes leading to better memory than higher outcomes. There was a marginal effect of reward context, suggesting that expected value affected memory performance. We discuss the robustness of the reward-memory relationship to variations in reward context, and whether other reward-related factors have a more reliable influence on episodic memory.
A Study of Shared-Memory Mutual Exclusion Protocols Using CADP

Science.gov (United States)

Mateescu, Radu; Serwe, Wendelin

Mutual exclusion protocols are an essential building block of concurrent systems: indeed, such a protocol is required whenever a shared resource has to be protected against concurrent non-atomic accesses. Hence, many variants of mutual exclusion protocols exist in the shared-memory setting, such as Peterson's or Dekker's well-known protocols. Although the functional correctness of these protocols has been studied extensively, relatively little attention has been paid to their non-functional aspects, such as their performance in the long run. In this paper, we report on experiments with the performance evaluation of mutual exclusion protocols using Interactive Markov Chains. Steady-state analysis provides an additional criterion for comparing protocols, which complements the verification of their functional properties. We also carefully re-examined the functional properties, whose accurate formulation as temporal logic formulas in the action-based setting turns out to be quite involved.
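
Peterson's protocol is small enough that its mutual-exclusion property can be checked by enumerating every interleaving of two processes. The Python sketch below is a plain breadth-first state-space search, not CADP, its modeling language, or the Interactive Markov Chain performance analysis used in the paper. It assumes sequentially consistent memory and treats the two reads in the waiting test as one atomic step (a simplification); the state encoding is an illustrative choice. Setting `flag_first=False` swaps the two writes of the entry protocol, a known incorrect variant, so that the search finds a violation.

```python
from collections import deque
from typing import List, Optional, Tuple

# State = (pc0, pc1, flag0, flag1, turn). Program-counter values:
#   0 = non-critical, 1 = first entry write done, 2 = waiting, 3 = critical section
State = Tuple[int, int, bool, bool, int]


def successors(s: State, flag_first: bool = True) -> List[State]:
    """All states reachable by letting exactly one process take one atomic step."""
    out: List[State] = []
    for i in (0, 1):
        j = 1 - i
        pc, flag, turn = [s[0], s[1]], [s[2], s[3]], s[4]
        if pc[i] == 0:            # first write of the entry protocol
            if flag_first:
                flag[i] = True
            else:
                turn = j
            pc[i] = 1
        elif pc[i] == 1:          # second write of the entry protocol
            if flag_first:
                turn = j
            else:
                flag[i] = True
            pc[i] = 2
        elif pc[i] == 2:          # wait while (flag[j] and turn == j)
            if flag[j] and turn == j:
                continue          # still blocked: busy-waiting adds no new state
            pc[i] = 3
        else:                     # exit protocol
            flag[i] = False
            pc[i] = 0
        out.append((pc[0], pc[1], flag[0], flag[1], turn))
    return out


def find_violation(flag_first: bool = True) -> Optional[State]:
    """Breadth-first search over all interleavings from the initial state.

    Returns a reachable state with both processes in the critical section,
    or None if mutual exclusion holds in every reachable state.
    """
    init: State = (0, 0, False, False, 0)
    seen = {init}
    frontier = deque([init])
    while frontier:
        s = frontier.popleft()
        if s[0] == 3 and s[1] == 3:
            return s
        for nxt in successors(s, flag_first):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None
```

This covers only the functional (safety) side; the long-run performance comparison described in the abstract additionally needs timing or rate information attached to each transition.
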
Modeling spatial-temporal operations with context-dependent associative memories.

Science.gov (United States)

Mizraji, Eduardo; Lin, Juan

2015-10-01

We organize our behavior and store structured information with many procedures that require the coding of spatial and temporal order in specific neural modules. In the simplest cases, spatial and temporal relations are condensed in prepositions like "below" and "above", "behind" and "in front of", or "before" and "after", etc. Neural operators lie beneath these words, sharing some similarities with logical gates that compute spatial and temporal asymmetric relations. We show how these operators can be modeled by means of neural matrix memories acting on Kronecker tensor products of vectors. The complexity of these memories is further enhanced by their ability to store episodes unfolding in space and time. How does the brain scale up from the raw plasticity of contingent episodic memories to the apparent stable connectivity of large neural networks? We clarify this transition by analyzing a model that flexibly codes episodic spatial and temporal structures into contextual markers capable of linking different memory modules.
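
A context-dependent matrix memory of the kind described can be sketched as a sum of outer products, M = sum_i g_i (f_i ⊗ p_i)^T, where f_i is an input, p_i a context, and g_i the associated output; presenting f ⊗ p then retrieves the output stored for that pair. The pure-Python sketch below assumes orthonormal (here one-hot) inputs and contexts, which makes recall exact. It does not reproduce the paper's spatial and temporal operators, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def kron(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Kronecker product of two vectors: [a0*b0, a0*b1, ..., a1*b0, ...]."""
    return [x * y for x in a for y in b]


def build_memory(associations: List[Tuple[Vector, Vector, Vector]]) -> Matrix:
    """M = sum_i g_i (f_i kron p_i)^T for triples (output g_i, input f_i, context p_i)."""
    g0, f0, p0 = associations[0]
    m = [[0.0] * (len(f0) * len(p0)) for _ in g0]
    for g, f, p in associations:
        key = kron(f, p)
        for r, g_r in enumerate(g):       # add the outer product g key^T
            for c, k_c in enumerate(key):
                m[r][c] += g_r * k_c
    return m


def recall(m: Matrix, f: Vector, p: Vector) -> Vector:
    """Output of the memory for input f presented in context p: M (f kron p)."""
    x = kron(f, p)
    return [sum(m_rc * x_c for m_rc, x_c in zip(row, x)) for row in m]
```

The same input vector yields different outputs under different contexts, which is the property that lets such a memory act like an operator selected by a word such as "before" or "after".
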
Virtual memory support for distributed computing environments using a shared data object model

Science.gov (United States)

Huang, F.; Bacon, J.; Mapp, G.

1995-12-01

Conventional storage management systems provide one interface for accessing memory segments and another for accessing secondary storage objects. This hinders application programming and affects overall system performance due to mandatory data copying and user/kernel boundary crossings, which in the microkernel case may involve context switches. Memory-mapping techniques may be used to provide programmers with a unified view of the storage system. This paper extends such techniques to support a shared data object model for distributed computing environments in which good support for coherence and synchronization is essential. The approach is based on a microkernel, typed memory objects, and integrated coherence control. A microkernel architecture is used to support multiple coherence protocols and the addition of new protocols. Memory objects are typed and applications can choose the most suitable protocols for different types of object to avoid protocol mismatch. Low-level coherence control is integrated with high-level concurrency control so that the number of messages required to maintain memory coherence is reduced and system-wide synchronization is realized without severely impacting the system performance. These features together contribute a novel approach to the support for flexible coherence under application control.
A pipelined architecture for real time correction of non-uniformity in infrared focal plane arrays imaging system using multiprocessors

Science.gov (United States)

Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan

2010-07-01

This paper proposes a pipelined circuit architecture implemented in an FPGA, a very large scale integrated circuit (VLSI), which efficiently performs the real-time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this imaging system. Each processor undertakes its own task while coordinating its work with the others. The system on a programmable chip (SOPC) in the FPGA works steadily at a global clock frequency of 96 MHz. The ample timing margin lets the FPGA perform the NUC image pre-processing algorithm with ease, which in turn provides a sound basis for the image post-processing in the DSP. In addition, this paper presents a hardware (HW) and software (SW) co-design in the FPGA. Thus, this architecture yields a multiprocessor image processing system and a smart solution that delivers satisfactory system performance.
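
The abstract does not say which NUC algorithm the pipeline implements; a common choice for IRFPAs is two-point calibration, in which each pixel gets its own gain and offset so that its response to two uniform reference scenes matches the array average. The Python sketch below shows only that arithmetic, as an assumption about the kind of per-pixel operation such a pipeline performs; the function names are hypothetical, and nothing here models the FPGA, Nios II, or DSP partitioning.

```python
from typing import List, Tuple

Frame = List[float]  # one frame, flattened to a list of pixel values


def two_point_calibration(low: Frame, high: Frame) -> Tuple[List[float], List[float]]:
    """Per-pixel gain and offset from two uniform (flat-field) reference frames.

    `low` and `high` are captured while the array views uniform sources at two
    different temperatures. Each pixel is mapped linearly so that its response
    matches the array-average response at both reference points.
    """
    mean_low = sum(low) / len(low)
    mean_high = sum(high) / len(high)
    gains, offsets = [], []
    for lo, hi in zip(low, high):
        gain = (mean_high - mean_low) / (hi - lo)  # assumes hi != lo for every pixel
        gains.append(gain)
        offsets.append(mean_low - gain * lo)
    return gains, offsets


def correct(frame: Frame, gains: List[float], offsets: List[float]) -> Frame:
    """Apply y = gain * x + offset to every pixel (the per-frame, real-time step)."""
    return [g * x + o for x, g, o in zip(frame, gains, offsets)]
```

Calibration runs once per pair of reference scenes; only the multiply-add in `correct` has to keep pace with the frame rate, which is the part that suits a hardware pipeline.
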
Abstractions for aperiodic multiprocessor scheduling of real-time stream processing applications

NARCIS (Netherlands)

Hausmans, J.P.H.M.

2015-01-01

Embedded multiprocessor systems are often used in the domain of real-time stream processing applications to keep up with increasing power and performance requirements. Examples of such real-time stream processing applications are digital radio baseband processing and WLAN transceivers. These stream
Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities

OpenAIRE

Bonifaci, Vincenzo; Brandenburg, Björn; D'Angelo, Gianlorenzo; Marchetti-Spaccamela, Alberto

2016-01-01

Many multiprocessor real-time operating systems offer the possibility to restrict the migrations of any task to a specified subset of processors by setting affinity masks. A notion of "strong arbitrary processor affinity scheduling" (strong APA scheduling) has been proposed; this notion avoids schedulability losses due to overly simple implementations of processor affinities. Due to potential overheads, strong APA has not been implemented so far in a real-time operat...
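
The schedulability loss mentioned here can be seen at a single dispatching instant. Below, a naive dispatcher gives each task, in priority order, its lowest-numbered free allowed processor. The second dispatcher may instead shift already-placed tasks to other processors in their affinity masks (an augmenting path in the task-processor bipartite graph) so that a lower-priority task can also run. This Python sketch is only in the spirit of strong APA as described above; it is not the authors' algorithm or any operating system's implementation, and the task and function names are hypothetical.

```python
from typing import Dict, List, Set

Affinity = Dict[str, Set[int]]  # task -> processors it may run on


def greedy_dispatch(tasks: List[str], affinity: Affinity) -> Dict[str, int]:
    """Naive: in priority order, take the lowest-numbered free allowed processor."""
    owner: Dict[int, str] = {}
    for t in tasks:
        free = sorted(affinity[t] - set(owner))
        if free:
            owner[free[0]] = t
    return {t: cpu for cpu, t in owner.items()}


def _try_place(t: str, affinity: Affinity, owner: Dict[int, str], visited: Set[int]) -> bool:
    """Look for an augmenting path for task t, moving placed tasks within their masks."""
    for cpu in sorted(affinity[t]):
        if cpu in visited:
            continue
        visited.add(cpu)
        if cpu not in owner or _try_place(owner[cpu], affinity, owner, visited):
            owner[cpu] = t
            return True
    return False


def shifting_dispatch(tasks: List[str], affinity: Affinity) -> Dict[str, int]:
    """In priority order, schedule each task if shifting already-placed tasks makes room.

    Placed tasks may move to another allowed processor but are never descheduled,
    so a lower-priority task never displaces a higher-priority one.
    """
    owner: Dict[int, str] = {}
    for t in tasks:
        _try_place(t, affinity, owner, set())
    return {t: cpu for cpu, t in owner.items()}
```

With a high-priority task allowed on processors 0 and 1 and a low-priority task pinned to processor 0, the naive dispatcher leaves processor 1 idle, while the shifting dispatcher moves the first task to processor 1 and runs both. The extra search is also where the implementation overhead comes from.
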
Hierarchical approach to optimization of parallel matrix multiplication on large-scale platforms

KAUST Repository

Hasanov, Khalid

2014-03-04

© 2014, Springer Science+Business Media New York. Many state-of-the-art parallel algorithms, which are widely used in scientific applications executed on high-end computing systems, were designed in the twentieth century with relatively small-scale parallelism in mind. Indeed, while in the 1990s a system with a few hundred cores was considered a powerful supercomputer, modern top supercomputers have millions of cores. In this paper, we present a hierarchical approach to optimization of message-passing parallel algorithms for execution on large-scale distributed-memory systems. The idea is to reduce the communication cost by introducing hierarchy and hence more parallelism in the communication scheme. We apply this approach to SUMMA, the state-of-the-art parallel algorithm for matrix–matrix multiplication, and demonstrate both theoretically and experimentally that the modified Hierarchical SUMMA significantly improves the communication cost and the overall performance on large-scale platforms.
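
SUMMA computes C = AB on a q × q process grid as a sequence of broadcasts followed by local multiply-accumulates. The Python sketch below simulates the plain (non-hierarchical) algorithm sequentially, assuming square n × n matrices with n divisible by q and a panel width equal to the block size. Roughly speaking, Hierarchical SUMMA reorganizes these row and column broadcasts across groups of processes to cut communication cost; that variant is not shown, and the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def block(m: Matrix, bi: int, bj: int, b: int) -> Matrix:
    """Extract the b x b block at block coordinates (bi, bj)."""
    return [row[bj * b:(bj + 1) * b] for row in m[bi * b:(bi + 1) * b]]


def matmul_acc(c: Matrix, a: Matrix, b: Matrix) -> None:
    """In place: c += a @ b for small dense blocks."""
    for i, a_row in enumerate(a):
        for k, a_ik in enumerate(a_row):
            for j, b_kj in enumerate(b[k]):
                c[i][j] += a_ik * b_kj


def summa(a: Matrix, b: Matrix, q: int) -> Matrix:
    """Simulate SUMMA on a q x q process grid for n x n matrices (n divisible by q).

    Process (i, j) owns block C[i][j]. In step k, block A[i][k] is broadcast along
    process row i and block B[k][j] along process column j; every process then
    performs a local multiply-accumulate into its own C block.
    """
    n = len(a)
    bs = n // q
    c_blocks = [[[[0.0] * bs for _ in range(bs)] for _ in range(q)] for _ in range(q)]
    for k in range(q):
        row_bcast = [block(a, i, k, bs) for i in range(q)]  # A(:, k) along process rows
        col_bcast = [block(b, k, j, bs) for j in range(q)]  # B(k, :) along process columns
        for i in range(q):
            for j in range(q):
                matmul_acc(c_blocks[i][j], row_bcast[i], col_bcast[j])
    # Reassemble the distributed result into a single n x n matrix.
    return [[c_blocks[r // bs][col // bs][r % bs][col % bs] for col in range(n)]
            for r in range(n)]
```

Each of the q steps costs one broadcast per process row and per process column, so broadcast latency dominates as the grid grows; that is the term the hierarchical variant targets.
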

A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

Directory of Open Access Journals (Sweden)

Sergio Briguglio

2003-01-01

A performance-prediction model is presented that describes different hierarchical workload decomposition strategies for particle-in-cell (PIC) codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model with respect to memory occupancy, parallelization efficiency, and required programming effort. These strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage) and OpenMP (at the intra-node one). The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.
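
The two-level structure can be sketched with the simplest possible strategy: particles are split into contiguous index blocks among nodes, and each node's block is split again among its processors. This is an assumption for illustration only. The paper evaluates several strategies (which may also involve the field grid), and in its implementations HPF handles the inter-node level and OpenMP the intra-node level, whereas the Python below only computes index ranges. The function names are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end)


def split(r: Range, parts: int) -> List[Range]:
    """Split a range into `parts` contiguous pieces whose sizes differ by at most one."""
    start, end = r
    base, extra = divmod(end - start, parts)
    out = []
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        out.append((start, start + size))
        start += size
    return out


def hierarchical_decomposition(n_particles: int, nodes: int,
                               procs_per_node: int) -> List[List[Range]]:
    """Higher level: particles among nodes; lower level: each node's share among its processors."""
    return [split(node_range, procs_per_node)
            for node_range in split((0, n_particles), nodes)]
```

Only the higher-level split implies message passing between nodes; the lower-level split divides work inside a shared-memory node, which is why the two levels can be handled by different programming models.
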
A Parallel Saturation Algorithm on Shared Memory Architectures

Science.gov (United States)

Ezekiel, Jonathan; Siminiceanu

2007-01-01

Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor, dual-core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, compared to running our algorithm on a single core only.
Thermal-Aware Scheduling for Future Chip Multiprocessors

Directory of Open Access Journals (Sweden)

Pedro Trancoso

2007-04-01

The increased complexity and operating frequency of current single-chip microprocessors are resulting in diminishing performance improvements. Consequently, major manufacturers offer chip multiprocessor (CMP) architectures in order to keep up with the expected performance gains. This architecture is successfully being introduced in many markets, including that of embedded systems. Nevertheless, the integration of several cores onto the same chip may lead to increased heat dissipation and consequently to additional cooling costs, higher power consumption, reduced reliability, and thermal-induced performance loss, among others. In this paper, we analyze the evolution of the thermal issues for future chip multiprocessor architectures and show that as the number of on-chip cores increases, the thermal-induced problems will worsen. In addition, we present several scenarios that result in excessive thermal stress to the CMP chip or significant performance loss. To minimize or even eliminate these problems, we propose thermal-aware scheduler (TAS) algorithms. When assigning processes to cores, TAS takes their temperature and cooling ability into account in order to avoid thermal stress and at the same time improve performance. Experimental results show that a TAS algorithm that also considers the temperatures of neighboring cores is able to significantly reduce the temperature-induced performance loss while, at the same time, decreasing the chip's temperature across many different operation and configuration scenarios.
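
A placement decision that also accounts for the temperatures of neighboring cores can be sketched as picking the idle core with the lowest combined thermal score. The scoring formula, the `neighbor_weight` parameter, and the function name are hypothetical assumptions; the paper's TAS algorithms, including how they use each core's cooling ability, are not reproduced.

```python
from typing import Dict, List, Optional, Set


def pick_core(temps: List[float], neighbors: Dict[int, List[int]], idle: Set[int],
              neighbor_weight: float = 0.5) -> Optional[int]:
    """Return the idle core with the lowest thermal score, or None if no core is idle.

    score(core) = temps[core] + neighbor_weight * (mean temperature of adjacent cores),
    so a cool core wedged between hot neighbors is penalized. Ties go to the
    lower core index.
    """
    def score(core: int) -> float:
        adj = neighbors.get(core, [])
        mean_adj = sum(temps[n] for n in adj) / len(adj) if adj else 0.0
        return temps[core] + neighbor_weight * mean_adj

    return min(idle, key=lambda core: (score(core), core), default=None)
```

Setting `neighbor_weight` to zero reduces the policy to "coolest idle core first"; a positive weight can steer work away from a core that is itself cool but sits between hot ones.
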
A Latent Factor Analysis of Working Memory Measures Using Large-Scale Data

Directory of Open Access Journals (Sweden)

Otto Waris

2017-06-01

Working memory (WM) is a key cognitive system that is strongly related to other cognitive domains and relevant for everyday life. However, the structure of WM is yet to be determined. A number of WM models have been put forth, especially by factor-analytical studies. In broad terms, these models vary by their emphasis on WM contents (e.g., visuospatial, verbal) vs. WM processes (e.g., maintenance, updating) as critical, dissociable elements. Here we conducted confirmatory and exploratory factor analyses on a broad set of WM tasks, half of them numerical-verbal and half of them visuospatial, representing four commonly used task paradigms: simple span, complex span, running memory, and n-back. The tasks were selected to allow the detection of both content-based (visuospatial, numerical-verbal) and process-based (maintenance, updating) divisions. The data were collected online, which allowed the recruitment of a large and demographically diverse sample of adults (n = 711). Both factor-analytical methods pointed to a clear division according to task content for all paradigms except n-back, while there was no indication of a process-based division. Besides the content-based division, confirmatory factor analyses supported a model that also included a general WM factor. The n-back tasks had the highest loadings on the general factor, suggesting that this factor reflected high-level cognitive resources such as executive functioning and fluid intelligence that are engaged by all WM tasks, and possibly even more so by the n-back. Together with earlier findings that indicate high variability of process-based WM divisions, we conclude that the most robust division of WM is along its contents (visuospatial vs. numerical-verbal), rather than along its hypothetical subprocesses.
The Evolution of the Wechsler Memory Scale: A Selective Review.

Science.gov (United States)

Kent, Phillip

2013-02-27

In clinical use since 1940, the Wechsler Memory Scale was formally introduced to the psychological community in 1945. By 1946, it ranked 90th out of the 100 most frequently used psychological tests. By 1969, it was the 19th most used psychological test and the 2nd most used test of memory. By 1982, it was the 12th most used test and the most used memory test, a popularity it continues to enjoy. The present article briefly traces the origin of the Wechsler Memory Scale and examines its evolution across the revisions that appeared in 1987, 1997, and 2009. Issues with norming and standardization, as well as reliability and validity, are summarized. It is argued that the test continues to have several serious shortcomings, including a lack of anchoring in an explicit neuroanatomical theory of memory and an underlying factor structure that appears to have changed little despite changes in the manifest structure and content of the test.
Multiprocessor Real-Time Locking Protocols for Replicated Resources

Science.gov (United States)

Jarrett, Catherine E.; Yang, Kecheng; Yang, Ming; Ekberg, Pontus; James H...

2016-07-01

...assignment problem, the actual identities of the allocated replicas must be known. When locking protocols are used, tasks may experience delays due to both ... replicas to execute. In prior work on replicated resources, k-exclusion locks have been used, but this restricts tasks to lock only one replica at a time. To
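
The limitation mentioned here can be illustrated in Python. A k-exclusion lock behaves like a counting semaphore, which admits up to k holders but gives each holder exactly one anonymous unit. The hypothetical `ReplicaPool` below instead grants several replicas in one all-or-nothing request and returns their identities. It is a plain blocking sketch with no real-time priority queueing and no bound on blocking time, and it is not one of the paper's locking protocols.

```python
import threading
from typing import List, Set


class ReplicaPool:
    """Grants several replicas of a resource at once and reports which ones.

    A request for n replicas blocks until n are free, then takes all of them
    atomically, so a task never holds a partial set while waiting for the rest.
    """

    def __init__(self, k: int):
        self._k = k
        self._free: Set[int] = set(range(k))
        self._cond = threading.Condition()

    def acquire(self, n: int) -> List[int]:
        if not 1 <= n <= self._k:
            raise ValueError("request must be between 1 and k replicas")
        with self._cond:
            self._cond.wait_for(lambda: len(self._free) >= n)
            granted = sorted(self._free)[:n]
            self._free.difference_update(granted)
            return granted

    def release(self, replicas: List[int]) -> None:
        with self._cond:
            self._free.update(replicas)
            self._cond.notify_all()  # waiters re-check whether enough replicas are free
```

Taking all requested replicas atomically avoids the hold-and-wait pattern that acquiring replicas one at a time through a k-exclusion lock would create. A real-time protocol would also have to decide which waiter to serve first and bound how long it can wait.
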
Multi-processor system-level synthesis for multiple applications on platform FPGA

NARCIS (Netherlands)

Kumar, A.; Fernando, S.D.; Ha, Y.; Mesman, B.; Corporaal, H.; Bertels, Koen

2007-01-01

Multiprocessor systems-on-chip (MPSoC) are being developed in increasing numbers to support the high number of applications running on modern embedded systems. Designing and programming such systems prove to be a major challenge. Most of the current design methodologies rely on creating the design
Energy transfers in large-scale and small-scale dynamos

Science.gov (United States)

Samtaney, Ravi; Kumar, Rohit; Verma, Mahendra

2015-11-01

We present the energy transfers, mainly energy fluxes and shell-to-shell energy transfers, in small-scale dynamo (SSD) and large-scale dynamo (LSD) using numerical simulations of MHD turbulence for Pm = 20 (SSD) and Pm = 0.2 (LSD) on a 1024³ grid. For SSD, we demonstrate that the magnetic energy growth is caused by nonlocal energy transfers from the large-scale or forcing-scale velocity field to the small-scale magnetic field. The peak of these energy transfers moves towards lower wavenumbers as the dynamo evolves, which is the reason for the growth of the magnetic fields at the large scales. The energy transfers U2U (velocity to velocity) and B2B (magnetic to magnetic) are forward and local. For LSD, we show that the magnetic energy growth takes place via energy transfers from the large-scale velocity field to the large-scale magnetic field. We observe forward U2U and B2B energy fluxes, similar to SSD.
On initial Brain Activity Mapping of episodic and semantic memory code in the hippocampus.

Science.gov (United States)

Tsien, Joe Z; Li, Meng; Osan, Remus; Chen, Guifen; Lin, Longian; Wang, Phillip Lei; Frey, Sabine; Frey, Julietta; Zhu, Dajiang; Liu, Tianming; Zhao, Fang; Kuang, Hui

2013-10-01

It has been widely recognized that understanding the brain code would require large-scale recording and decoding of brain activity patterns. In 2007, with support from the Georgia Research Alliance, we launched the Brain Decoding Project Initiative, based on the same basic idea now advocated by the BRAIN project or Brain Activity Map proposal. As the planning of the BRAIN project is currently underway, we share our insights and lessons from our efforts in mapping real-time episodic memory traces in the hippocampus of freely behaving mice. We show that appropriate large-scale statistical methods are essential to decipher and measure real-time memory traces and neural dynamics. We also provide an example of how carefully designed, sometimes thinking-outside-the-box, behavioral paradigms can be highly instrumental in unraveling the memory-coding cell assembly organizing principle in the hippocampus. Our observations to date have led us to conclude that the specific-to-general categorical and combinatorial feature-coding cell assembly mechanism represents an emergent property that enables neural networks to generate and organize not only episodic memory, but also semantic knowledge and imagination. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Shared filtering processes link attentional and visual short-term memory capacity limits.

Science.gov (United States)

Bettencourt, Katherine C; Michalka, Samantha W; Somers, David C

2011-09-30

Both visual attention and visual short-term memory (VSTM) have been shown to have capacity limits of 4 ± 1 objects, driving the hypothesis that they share a visual processing buffer. However, these capacity limitations also show strong individual differences, making the degree to which these capacities are related unclear. Moreover, other research has suggested a distinction between attention and VSTM buffers. To explore the degree to which capacity limitations reflect the use of a shared visual processing buffer, we compared individual subjects' capacities on attentional and VSTM tasks completed in the same testing session. We used a multiple object tracking (MOT) and a VSTM change detection task, with varying levels of distractors, to measure capacity. Significant correlations in capacity were not observed between the MOT and VSTM tasks when distractor filtering demands differed between the tasks. Instead, significant correlations were seen when the tasks shared spatial filtering demands. Moreover, these filtering demands impacted capacity similarly in both attention and VSTM tasks. These observations fail to support the view that visual attention and VSTM capacity limits result from a shared buffer but instead highlight the role of the resource demands of underlying processes in limiting capacity.
Mind-to-mind heteroclinic coordination: Model of sequential episodic memory initiation

Science.gov (United States)

Afraimovich, V. S.; Zaks, M. A.; Rabinovich, M. I.

2018-05-01

Retrieval of episodic memory is a dynamical process in large-scale brain networks. In social groups, the neural patterns associated with specific events directly experienced by single members are encoded, recalled, and shared by all participants. Here, we construct and study a dynamical model for the formation and maintenance of episodic memory in small ensembles of interacting minds. We prove that the unconventional dynamical attractor of this process, the nonsmooth heteroclinic torus, is structurally stable within the Lotka-Volterra-like sets of equations. Dynamics on this torus combines the absence of chaos with asymptotic instability of every separate trajectory; its appropriate quantitative characteristics are length-related Lyapunov exponents. Variation of the coupling strength between the participants results in different types of sequential switching between metastable states; we interpret them as stages in the formation and modification of episodic memory.
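For orientation, the generalized Lotka-Volterra form commonly used for sequential switching between metastable states ("winnerless competition") can be written as below. This is a sketch of the model class, assuming the standard form; the paper's specific coupling between participants is not reproduced here.

```latex
\dot{x}_i = x_i \Big( \sigma_i - \sum_{j=1}^{N} \rho_{ij}\, x_j \Big),
\qquad x_i \ge 0, \quad i = 1, \dots, N
```

Here $x_i$ is the activity of the $i$-th mode, $\sigma_i > 0$ its growth rate, and $\rho_{ij}$ the inhibition of mode $i$ by mode $j$. A non-symmetric matrix $\rho_{ij}$ can produce heteroclinic sequences of saddle states, each saddle corresponding to one metastable state in the switching sequence.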
Multiprocessor performance modeling with ADAS

Science.gov (United States)

Hayes, Paul J.; Andrews, Asa M.

1989-01-01

A graph managing strategy referred to as the Algorithm to Architecture Mapping Model (ATAMM) appears useful for the time-optimized execution of application algorithm graphs in embedded multiprocessors and for the performance prediction of graph designs. This paper reports the modeling of ATAMM in the Architecture Design and Assessment System (ADAS) to make an independent verification of ATAMM's performance prediction capability and to provide a user framework for the evaluation of arbitrary algorithm graphs. Following an overview of ATAMM and its major functional rules are descriptions of the ADAS model of ATAMM, methods to enter an arbitrary graph into the model, and techniques to analyze the simulation results. The performance of a 7-node graph example is evaluated using the ADAS model and verifies the ATAMM concept by substantiating previously published performance results.
Verbalizing, Visualizing, and Navigating: The Effect of Strategies on Encoding a Large-Scale Virtual Environment

Science.gov (United States)

Kraemer, David J. M.; Schinazi, Victor R.; Cawkwell, Philip B.; Tekriwal, Anand; Epstein, Russell A.; Thompson-Schill, Sharon L.

2017-01-01

Using novel virtual cities, we investigated the influence of verbal and visual strategies on the encoding of navigation-relevant information in a large-scale virtual environment. In 2 experiments, participants watched videos of routes through 4 virtual cities and were subsequently tested on their memory for observed landmarks and their ability to…
A Comparison of Laboratory and Clinical Working Memory Tests and Their Prediction of Fluid Intelligence

Science.gov (United States)

Shelton, Jill T.; Elliott, Emily M.; Hill, B. D.; Calamia, Matthew R.; Gouvier, Wm. Drew

2010-01-01

The working memory (WM) construct is conceptualized similarly across domains of psychology, yet the methods used to measure WM function vary widely. The present study examined the relationship between WM measures used in the laboratory and those used in applied settings. A large sample of undergraduates completed three laboratory-based WM measures (operation span, listening span, and n-back), as well as the WM subtests from the Wechsler Adult Intelligence Scale-III and the Wechsler Memory Scale-III. Performance on all of the WM subtests of the clinical batteries shared positive correlations with the lab measures; however, the Arithmetic and Spatial Span subtests shared lower correlations than the other WM tests. Factor analyses revealed that a factor comprising scores from the three lab WM measures and the clinical subtest, Letter-Number Sequencing (LNS), provided the best measurement of WM. Additionally, a latent variable approach was taken using fluid intelligence as a criterion construct to further discriminate between the WM tests. The results revealed that the lab measures, along with the LNS task, were the best predictors of fluid abilities. PMID:20161647
Shared neuroanatomical substrates of impaired phonological working memory across reading disability and autism

OpenAIRE

Lu, Chunming; Qi, Zhenghan; Harris, Adrianne; Weil, Lisa Wisman; Han, Michelle; Halverson, Kelly; Perrachione, Tyler K.; Kjelgaard, Margaret; Wexler, Kenneth; Tager-Flusberg, Helen; Gabrieli, John D. E.

2016-01-01

Background Individuals with reading disability and individuals with autism spectrum disorder (ASD) are characterized, respectively, by their difficulties in reading and social communication, but both groups often have impaired phonological working memory (PWM). It is not known whether the impaired PWM reflects distinct or shared neuroanatomical abnormalities in these two diagnostic groups. Methods White-matter structural connectivity via diffusion weighted imaging was examined in 64 children,...
Memory for target height is scaled to observer height.

Science.gov (United States)

Twedt, Elyssa; Crawford, L Elizabeth; Proffitt, Dennis R

2012-04-01

According to the embodied approach to visual perception, individuals scale the environment to their bodies. This approach highlights the central role of the body for immediate, situated action. The present experiments addressed whether body scaling--specifically, eye-height scaling--occurs in memory when action is not immediate. Participants viewed standard targets that were either the same height as, taller than, or shorter than themselves. Participants then viewed a comparison target and judged whether the comparison was taller or shorter than the standard target. Participants were most accurate when the standard target height matched their own heights, taking into account postural changes. Participants were biased to underestimate standard target height, in general, and to push standard target height away from their own heights. These results are consistent with the literature on eye-height scaling in visual perception and suggest that body scaling is not only a useful metric for perception and action, but is also preserved in memory.
Domain-general involvement of the posterior frontolateral cortex in time-based resource-sharing in working memory: An fMRI study

NARCIS (Netherlands)

Vergauwe, E.; Hartstra, E.; Barrouillet, P.; Brass, M.

2015-01-01

Working memory is often defined in cognitive psychology as a system devoted to the simultaneous processing and maintenance of information. In line with the time-based resource-sharing model of working memory (TBRS; Barrouillet and Camos, 2015; Barrouillet et al., 2004), there is accumulating
Design of massively parallel hardware multi-processors for highly-demanding embedded applications

NARCIS (Netherlands)

Jozwiak, L.; Jan, Y.

2013-01-01

Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip
Mathematical Analysis of Vehicle Delivery Scale of Bike-Sharing Rental Nodes

Science.gov (United States)

Zhai, Y.; Liu, J.; Liu, L.

2018-04-01

Aiming at the lack of scientifically sound judgment of vehicle delivery scale and the insufficient optimization of scheduling decisions, this paper, based on the features of bike-sharing usage, analyses the applicability of a discrete-time, discrete-state Markov chain and proves that the chain is irreducible, aperiodic and positive recurrent. Based on this analysis, the paper concludes that the limiting (steady-state) probability distribution of the bike-sharing Markov chain exists, is unique, and is independent of the initial probability distribution. The paper then analyses the difficulty, in the traditional solution algorithm for the bike-sharing Markov chain, of estimating the transition probability matrix parameters and of solving the system of linear equations. To improve feasibility, the paper proposes a "virtual two-node vehicle scale solution" algorithm, which treats all nodes other than the node to be solved as a single virtual node, and gives the transition probability matrix, the steady-state linear equations, and the computational methods for the steady-state scale, steady-state arrival time and scheduling decision of the node to be solved. Finally, the paper evaluates the rationality and accuracy of the steady-state probability of the proposed algorithm by comparing it with the traditional algorithm. By solving the steady-state scale of the nodes one by one, the proposed algorithm is shown to be highly feasible, because it lowers the computational difficulty and reduces the number of statistics to be collected, which will help bike-sharing companies optimize the scale and scheduling of nodes.
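The "virtual two-node" idea can be illustrated with the simplest possible case. This minimal sketch assumes that one bike's location is tracked as a two-state chain: state 0 means the bike is at the node being solved, and state 1 means it is anywhere else, lumped into the virtual node. The one-step probabilities a (leave the node) and b (return to it) are hypothetical, and the paper's actual transition matrix and estimation procedure are not reproduced. For 0 < a, b < 1 the chain is irreducible and aperiodic, so its stationary distribution (b/(a+b), a/(a+b)) is also the limit from any starting distribution. Power iteration confirms this.

```python
def stationary_two_state(a, b):
    """Closed-form stationary distribution of P = [[1-a, a], [b, 1-b]]."""
    return b / (a + b), a / (a + b)


def power_iterate(P, dist, n_steps):
    """Propagate a row-vector distribution through n_steps of the chain (dist <- dist @ P)."""
    n = len(P)
    for _ in range(n_steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


if __name__ == "__main__":
    a, b = 0.3, 0.1  # hypothetical leave/return probabilities
    P = [[1 - a, a], [b, 1 - b]]
    print(stationary_two_state(a, b))
    # Two very different starting distributions converge to the same limit.
    for start in ([1.0, 0.0], [0.0, 1.0]):
        print(power_iterate(P, start, 200))
```

The closed form needs no large linear system, which mirrors the abstract's point that collapsing the other nodes reduces both computation and the statistics that must be estimated.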
MATHEMATICAL ANALYSIS OF VEHICLE DELIVERY SCALE OF BIKE-SHARING RENTAL NODES

Directory of Open Access Journals (Sweden)

Y. Zhai

2018-04-01

Full Text Available Aiming at the lack of scientifically sound judgment of vehicle delivery scale and the insufficient optimization of scheduling decisions, this paper, based on the features of bike-sharing usage, analyses the applicability of a discrete-time, discrete-state Markov chain and proves that the chain is irreducible, aperiodic and positive recurrent. Based on this analysis, the paper concludes that the limiting (steady-state) probability distribution of the bike-sharing Markov chain exists, is unique, and is independent of the initial probability distribution. The paper then analyses the difficulty, in the traditional solution algorithm for the bike-sharing Markov chain, of estimating the transition probability matrix parameters and of solving the system of linear equations. To improve feasibility, the paper proposes a "virtual two-node vehicle scale solution" algorithm, which treats all nodes other than the node to be solved as a single virtual node, and gives the transition probability matrix, the steady-state linear equations, and the computational methods for the steady-state scale, steady-state arrival time and scheduling decision of the node to be solved. Finally, the paper evaluates the rationality and accuracy of the steady-state probability of the proposed algorithm by comparing it with the traditional algorithm. By solving the steady-state scale of the nodes one by one, the proposed algorithm is shown to be highly feasible, because it lowers the computational difficulty and reduces the number of statistics to be collected, which will help bike-sharing companies optimize the scale and scheduling of nodes.

Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer

Energy Technology Data Exchange (ETDEWEB)

Villa, Oreste; Tumeo, Antonino; Secchi, Simone; Manzano Franco, Joseph B.

2012-12-31

Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures, owing to issues such as the size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, shared-memory multiprocessors (SMPs) with multi-core processors have become an attractive platform for simulating large-scale machines. In this paper, we introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run time and includes a network model that takes contention into account. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10% accuracy. Emulation is only 25 to 200 times slower than real time.
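The abstract rests on the idea that highly multithreaded processors tolerate unpredictable memory latency by switching among many hardware threads. The following is a deliberately simplified cycle-level sketch of that idea. It assumes that one instruction is issued per cycle from any ready thread and that every instruction is followed by a fixed latency. Real ThreadStorm pipelines, lookahead and network contention are not modelled, and the function name and latency value are hypothetical. With N threads and latency L, utilization approaches min(1, N/L).

```python
import heapq


def issue_utilization(n_threads, latency, n_cycles):
    """Fraction of cycles in which some hardware thread could issue an instruction."""
    ready_at = [0] * n_threads  # min-heap of the cycle at which each thread becomes ready
    heapq.heapify(ready_at)
    issued = 0
    for cycle in range(n_cycles):
        if ready_at[0] <= cycle:  # at least one thread is ready this cycle
            # Issue from the earliest-ready thread; it then waits `latency` cycles.
            heapq.heapreplace(ready_at, cycle + latency)
            issued += 1
    return issued / n_cycles


if __name__ == "__main__":
    for threads in (1, 10, 64, 128):
        print(threads, issue_utilization(threads, latency=100, n_cycles=10_000))
```

Under these assumptions, 128 threads keep the issue slot busy every cycle when latency is about 100 cycles, while a single thread leaves it idle 99% of the time.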
Memory bias for negative emotional words in recognition memory is driven by effects of category membership.

Science.gov (United States)

White, Corey N; Kapucu, Aycan; Bruno, Davide; Rotello, Caren M; Ratcliff, Roger

2014-01-01

Recognition memory studies often find that emotional items are more likely than neutral items to be labelled as studied. Previous work suggests this bias is driven by increased memory strength/familiarity for emotional items. We explored strength and bias interpretations of this effect with the conjecture that emotional stimuli might seem more familiar because they share features with studied items from the same category. Categorical effects were manipulated in a recognition task by presenting lists with a small, medium or large proportion of emotional words. The liberal memory bias for emotional words was only observed when a medium or large proportion of categorised words were presented in the lists. Similar, though weaker, effects were observed with categorised words that were not emotional (animal names). These results suggest that liberal memory bias for emotional items may be largely driven by effects of category membership.
VME multiprocessor system for plasma control at the JT-60 Upgrade

International Nuclear Information System (INIS)

Kimura, T.; Kurihara, K.; Takahashi, M.; Kawamata, Y.; Akasaka, H.; Matsukawa, M.

1989-01-01

This paper reports the design and preliminary tests of a VME multiprocessor system for JT-60 Upgrade plasma control, utilizing three MC88100-based RISC computers and VME buses. The design of the VME system was motivated by requirements for faster and more accurate computation for plasma position and shape control
Memory as the "whole brain work": a large-scale model based on "oscillations in super-synergy".

Science.gov (United States)

Başar, Erol

2005-01-01

According to recent trends, memory depends on several brain structures working in concert across many levels of neural organization; "memory is a constant work-in-progress." The proposition of a brain theory based on super-synergy in neural populations is highly pertinent to understanding this constant work in progress. This report introduces a new model of memory based on the processes of EEG oscillations and brain dynamics. This model is shaped by the following conceptual and experimental steps: 1. The machineries of super-synergy in the whole brain are responsible for the formation of sensory-cognitive percepts. 2. The expression "dynamic memory" is used for memory processes that evoke relevant changes in alpha, gamma, theta and delta activities. The concerted action of distributed multiple oscillatory processes provides a major key to understanding distributed memory. It also encompasses phyletic memory and reflexes. 3. The evolving memory, which incorporates reciprocal actions or reverberations in the APLR alliance and during working memory processes, is especially emphasized. 4. A new model related to a "hierarchy of memories as a continuum" is introduced. 5. The notions of "longer activated memory" and "persistent memory" are proposed instead of long-term memory. 6. The new analysis of face recognition emphasizes the importance of EEG oscillations in neurophysiology and Gestalt analysis. 7. The proposed basic framework, called "Memory in the Whole Brain Work", emphasizes that memory and all brain functions are inseparable and act as a "whole" in the whole brain. 8. According to recent publications, genetic factors play a fundamental role in living systems and in oscillations, and accordingly in memory. 9. A link from the "whole brain" to the "whole body", incorporating the vegetative and neurological systems, is proposed, with EEG oscillations and ultraslow oscillations acting as a control parameter.
Large-scale grid management

International Nuclear Information System (INIS)

Langdal, Bjoern Inge; Eggen, Arnt Ove

2003-01-01

The network companies in the Norwegian electricity industry now have to establish large-scale network management, a concept essentially characterized by (1) a broader focus (broadband, multi-utility, ...) and (2) bigger units with large networks and more customers. Research by SINTEF Energy Research so far shows that approaches to large-scale network management may be structured according to three main challenges: centralization, decentralization and outsourcing. The article is part of a planned series
Energy-Aware Real-Time Task Scheduling for Heterogeneous Multiprocessors with Particle Swarm Optimization Algorithm

Directory of Open Access Journals (Sweden)

Weizhe Zhang

2014-01-01

Full Text Available Energy consumption in computer systems has become an increasingly important issue. High energy consumption has already damaged the environment to some extent, especially in heterogeneous multiprocessors. In this paper, we first formulate and describe the energy-aware real-time task scheduling problem in heterogeneous multiprocessors. Then we propose a particle swarm optimization (PSO)-based algorithm, which can successfully reduce both the energy cost and the time needed to search for feasible solutions. Experimental results show that the PSO-based energy-aware metaheuristic uses 40%–50% less energy than the GA-based and SFLA-based algorithms and spends 10% less time than the SFLA-based algorithm in finding solutions. In addition, it finds 19% more feasible solutions than the SFLA-based algorithm.
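The abstract does not describe how tasks are encoded as particles. The minimal sketch below is a generic PSO, not the authors' algorithm, and rests on several assumptions: each particle coordinate is floored to a processor index (a common continuous-to-discrete encoding), every processor has a fixed power figure, every task has a known execution time on each processor, and a single common deadline is enforced through a penalty term. The instance and all names are hypothetical.

```python
import random


def decode(position, n_procs):
    """Map each continuous coordinate to a processor index by flooring."""
    return [min(int(x), n_procs - 1) for x in position]


def schedule_cost(assign, exec_time, power, deadline, penalty=1000.0):
    """Energy of an assignment plus a penalty for load exceeding the deadline."""
    load = [0.0] * len(power)
    energy = 0.0
    for task, proc in enumerate(assign):
        t = exec_time[task][proc]
        load[proc] += t
        energy += t * power[proc]
    overload = sum(max(0.0, l - deadline) for l in load)
    return energy + penalty * overload


def pso_schedule(exec_time, power, deadline, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    n_tasks, n_procs = len(exec_time), len(power)
    hi = n_procs - 1e-9  # keep positions strictly below n_procs
    pos = [[rng.uniform(0, hi) for _ in range(n_tasks)] for _ in range(n_particles)]
    vel = [[0.0] * n_tasks for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [schedule_cost(decode(p, n_procs), exec_time, power, deadline) for p in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), hi)
            c = schedule_cost(decode(pos[i], n_procs), exec_time, power, deadline)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return decode(gbest, n_procs), gbest_cost


if __name__ == "__main__":
    # Hypothetical: processor 1 is twice as slow but uses a third of the power.
    exec_time = [[2, 4], [3, 6], [1, 2], [4, 8]]
    power = [3.0, 1.0]
    print(pso_schedule(exec_time, power, deadline=10.0))
```

The penalty approach lets infeasible schedules take part in the search while still ranking below feasible ones. A real-time formulation would also need per-task deadlines and ordering.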
The distribution and the functions of autobiographical memories: Why do older adults remember autobiographical memories from their youth?

Science.gov (United States)

Wolf, Tabea; Zimprich, Daniel

2016-09-01

In the present study, the distribution of autobiographical memories was examined from a functional perspective: we examined whether the extent to which long-term autobiographical memories were rated as having a self-, a directive, or a social function affects the location (mean age) and scale (standard deviation) of the memory distribution. Analyses were based on a total of 5598 autobiographical memories generated by 149 adults aged between 50 and 81 years in response to 51 cue-words. Participants provided their age at the time when the recalled events had happened and rated how frequently they recall these events for self-, directive, and social purposes. While more frequently using autobiographical memories for self-functions was associated with an earlier mean age, memories frequently shared with others showed a narrower distribution around a later mean age. The directive function, by contrast, did not affect the memory distribution. The results strengthen the assumption that experiences from an individual's late adolescence serve to maintain a sense of self-continuity throughout the lifespan. Experiences that are frequently shared with others, in contrast, stem from a narrow age range located in young adulthood.
Interoperable mesh components for large-scale, distributed-memory simulations

International Nuclear Information System (INIS)

Devine, K; Leung, V; Diachin, L; Miller, M

2009-01-01

SciDAC applications have a demonstrated need for advanced software tools to manage the complexities associated with sophisticated geometry, mesh, and field manipulation tasks, particularly as computer architectures move toward the petascale. In this paper, we describe a software component - an abstract data model and programming interface - designed to provide support for parallel unstructured mesh operations. We describe key issues that must be addressed to successfully provide high-performance, distributed-memory unstructured mesh services and highlight some recent research accomplishments in developing new load balancing and MPI-based communication libraries appropriate for leadership class computing. Finally, we give examples of the use of parallel adaptive mesh modification in two SciDAC applications.
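Load balancing is one of the services the abstract mentions. To illustrate one widely used geometric method, recursive coordinate bisection (RCB), here is a minimal serial sketch. It assumes that mesh elements are represented only by centroid coordinates. It is not the paper's library; a distributed implementation would compute the splits cooperatively across MPI ranks.

```python
def rcb(points, ids, levels):
    """Recursive coordinate bisection: split ids into up to 2**levels balanced parts.

    points: list of coordinate tuples (e.g. element centroids), indexed by id
    ids:    element ids to partition
    """
    if levels == 0 or len(ids) <= 1:
        return [ids]
    dims = len(points[ids[0]])
    # Cut perpendicular to the axis with the largest spatial extent.
    extents = [max(points[i][d] for i in ids) - min(points[i][d] for i in ids)
               for d in range(dims)]
    axis = extents.index(max(extents))
    ordered = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ordered) // 2  # median split keeps the two halves equal in size
    return rcb(points, ordered[:mid], levels - 1) + rcb(points, ordered[mid:], levels - 1)


if __name__ == "__main__":
    grid = [(x, y) for x in range(4) for y in range(4)]  # 16 hypothetical centroids
    for part in rcb(grid, list(range(len(grid))), levels=2):
        print(sorted(part))
```

Median cuts balance element counts but ignore communication volume. Weighted elements or graph-based partitioners are the usual refinements.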
DAEDALUS: System-Level Design Methodology for Streaming Multiprocessor Embedded Systems on Chips

NARCIS (Netherlands)

Stefanov, T.; Pimentel, A.; Nikolov, H.; Ha, S.; Teich, J.

2017-01-01

The complexity of modern embedded systems, which are increasingly based on heterogeneous multiprocessor system-on-chip (MPSoC) architectures, has led to the emergence of system-level design. To cope with this design complexity, system-level design aims at raising the abstraction level of the design
Large-Scale Astrophysical Visualization on Smartphones

Science.gov (United States)

Becciani, U.; Massimino, P.; Costa, A.; Gheller, C.; Grillo, A.; Krokos, M.; Petta, C.

2011-07-01

Nowadays digital sky surveys and long-duration, high-resolution numerical simulations using high performance computing and grid systems produce multidimensional astrophysical datasets on the order of several petabytes. Sharing visualizations of such datasets within communities and collaborating research groups is of paramount importance for disseminating results and advancing astrophysical research. Moreover, educational and public outreach programs can benefit greatly from novel ways of presenting these datasets by promoting understanding of complex astrophysical processes, e.g., the formation of stars and galaxies. We have previously developed VisIVO Server, a grid-enabled platform for high-performance large-scale astrophysical visualization. This article reviews the latest developments on VisIVO Web, a custom-designed web portal wrapped around VisIVO Server, then introduces VisIVO Smartphone, a gateway connecting VisIVO Web and data repositories for mobile astrophysical visualization. We discuss current work and summarize future developments.
THEORETICAL REVIEW The Hippocampus, Time, and Memory Across Scales

Science.gov (United States)

Howard, Marc W.; Eichenbaum, Howard

2014-01-01

A wealth of experimental studies with animals have offered insights about how neural networks within the hippocampus support the temporal organization of memories. These studies have revealed the existence of “time cells” that encode moments in time, much as the well-known “place cells” map locations in space. Another line of work inspired by human behavioral studies suggests that episodic memories are mediated by a state of temporal context that changes gradually over long time scales, up to at least a few thousand seconds. In this view, the “mental time travel” hypothesized to support the experience of episodic memory corresponds to a “jump back in time” in which a previous state of temporal context is recovered. We suggest that these 2 sets of findings could be different facets of a representation of temporal history that maintains a record of the last few thousand seconds of experience. The ability to represent long time scales comes at the cost of discarding precise information about when a stimulus was experienced; this uncertainty becomes greater for events further in the past. We review recent computational work that describes a mechanism that could construct such a scale-invariant representation. Taken as a whole, this suggests the hippocampus plays its role in multiple aspects of cognition by representing events embedded in a general spatiotemporal context. The representation of internal time can be useful across nonhippocampal memory systems. PMID:23915126
Tunnel field-effect transistor charge-trapping memory with steep subthreshold slope and large memory window

Science.gov (United States)

Kino, Hisashi; Fukushima, Takafumi; Tanaka, Tetsu

2018-04-01

Charge-trapping memory requires an increase in bit density per cell and a larger memory window for lower-power operation. A tunnel field-effect transistor (TFET) can increase the bit density per cell owing to its steep subthreshold slope. In addition, a TFET has an asymmetric structure, which is promising for achieving a larger memory window. A TFET with an N-type gate shows a higher electric field between the P-type source and the N-type gate edge than the conventional FET structure. This high electric field enables large amounts of charge to be injected into the charge storage layer. In this study, we fabricated silicon-oxide-nitride-oxide-semiconductor (SONOS) memory devices with the TFET structure and observed a steep subthreshold slope and a larger memory window.
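For reference, the subthreshold slope (SS) is defined below, together with the thermionic limit that conventional MOSFETs cannot beat at room temperature. Both are the standard textbook expressions, not taken from the paper.

```latex
SS = \left( \frac{\partial \log_{10} I_D}{\partial V_G} \right)^{-1},
\qquad
SS_{\mathrm{MOSFET}} \ \ge\ \frac{k_B T}{q} \ln 10
\approx 0.02585\ \mathrm{V} \times 2.3026
\approx 59.5\ \mathrm{mV/decade} \quad (T = 300\ \mathrm{K})
```

A TFET injects carriers by band-to-band tunnelling rather than thermionic emission over a barrier, so it is not bound by this limit. This is what makes the "steep subthreshold slope" in the abstract possible.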
A shared memory based interface of MARTe with EPICS for real-time applications

International Nuclear Information System (INIS)

Yun, Sangwon; Neto, André C.; Park, Mikyung; Lee, Sangil; Park, Kaprai

2014-01-01

Highlights: • We implemented a shared memory based interface of MARTe with EPICS. • We implemented an EPICS module providing device and driver support. • We implemented an example EPICS IOC and CSS OPI for evaluation. - Abstract: The Multithreaded Application Real-Time executor (MARTe) is a multi-platform C++ middleware designed for the implementation of real-time control systems. It currently supports the Linux, Linux + RTAI, VxWorks, Solaris and MS Windows platforms. In the fusion community, MARTe is used at JET, COMPASS, ISTTOK, FTU and RFX [1]. The Experimental Physics and Industrial Control System (EPICS), a standard framework for the control systems of KSTAR and ITER, is a set of software tools and applications that provide a software infrastructure for building distributed control systems to operate devices. For a MARTe-based application to cooperate with an EPICS-based application, an interface layer between MARTe and EPICS is required. To solve this issue, a number of interfacing solutions have been proposed and some of them have been implemented. Nevertheless, a new approach is required to mitigate the functional limitations of existing solutions and to improve their performance for real-time applications. This paper describes the design and implementation of a shared memory based interface between MARTe and EPICS.
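The abstract does not describe the interface's data layout or synchronization. As a minimal sketch, assuming a sequence-lock ("seqlock") protocol that is often used when a real-time writer must never block on a slower reader, one sample can be exchanged through a POSIX-style shared memory segment using Python's standard `multiprocessing.shared_memory`. The class `SharedSample` and the memory layout are hypothetical. The writer makes the counter odd while updating and even when done; the reader retries until it sees the same even counter before and after reading.

```python
import struct
from multiprocessing import shared_memory

SEQ = struct.Struct("<Q")     # 8-byte sequence counter at offset 0
SAMPLE = struct.Struct("<d")  # one float64 sample at offset 8
SIZE = SEQ.size + SAMPLE.size


class SharedSample:
    """Hypothetical single-writer/single-reader exchange of one float64 via a seqlock."""

    def __init__(self, name=None):
        create = name is None  # no name: create a new segment; otherwise attach to it
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=SIZE)
        if create:
            SEQ.pack_into(self.shm.buf, 0, 0)
            SAMPLE.pack_into(self.shm.buf, SEQ.size, 0.0)

    def write(self, value):
        # Writer side (e.g. the real-time loop): odd counter = update in progress.
        seq = SEQ.unpack_from(self.shm.buf, 0)[0]
        SEQ.pack_into(self.shm.buf, 0, seq + 1)
        SAMPLE.pack_into(self.shm.buf, SEQ.size, value)
        SEQ.pack_into(self.shm.buf, 0, seq + 2)

    def read(self):
        # Reader side (e.g. an EPICS-facing process): retry until a consistent snapshot.
        while True:
            before = SEQ.unpack_from(self.shm.buf, 0)[0]
            value = SAMPLE.unpack_from(self.shm.buf, SEQ.size)[0]
            after = SEQ.unpack_from(self.shm.buf, 0)[0]
            if before == after and before % 2 == 0:
                return value

    def close(self):
        self.shm.close()


if __name__ == "__main__":
    writer = SharedSample()
    reader = SharedSample(writer.shm.name)  # normally attached from another process
    writer.write(1.5)
    print(reader.read())
    reader.close()
    writer.close()
    writer.shm.unlink()
```

This only illustrates the protocol. In C or C++ the counter would need atomic operations with acquire/release ordering, and Python gives no such cross-process memory-ordering guarantee.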
A shared memory based interface of MARTe with EPICS for real-time applications

Energy Technology Data Exchange (ETDEWEB)

Yun, Sangwon, E-mail: yunsw@nfri.re.kr [National Fusion Research Institute (NFRI), Gwahangno 169-148, Yuseong-Gu, Daejeon 305-806 (Korea, Republic of); Neto, André C. [Associação EURATOM/IST, Instituto de Plasmas e Fusão Nuclear, Instituto Superior Técnico, Universidade Técnica de Lisboa, P-1049-001 Lisboa (Portugal); Park, Mikyung; Lee, Sangil; Park, Kaprai [National Fusion Research Institute (NFRI), Gwahangno 169-148, Yuseong-Gu, Daejeon 305-806 (Korea, Republic of)

2014-05-15

Highlights: • We implemented a shared memory based interface of MARTe with EPICS. • We implemented an EPICS module providing device and driver support. • We implemented an example EPICS IOC and CSS OPI for evaluation. - Abstract: The Multithreaded Application Real-Time executor (MARTe) is a multi-platform C++ middleware designed for the implementation of real-time control systems. It currently supports the Linux, Linux + RTAI, VxWorks, Solaris and MS Windows platforms. In the fusion community, MARTe is used at JET, COMPASS, ISTTOK, FTU and RFX [1]. The Experimental Physics and Industrial Control System (EPICS), a standard framework for the control systems of KSTAR and ITER, is a set of software tools and applications that provide a software infrastructure for building distributed control systems to operate devices. For a MARTe-based application to cooperate with an EPICS-based application, an interface layer between MARTe and EPICS is required. To solve this issue, a number of interfacing solutions have been proposed and some of them have been implemented. Nevertheless, a new approach is required to mitigate the functional limitations of existing solutions and to improve their performance for real-time applications. This paper describes the design and implementation of a shared memory based interface between MARTe and EPICS.
Evaluation of a Connectionless NoC for a Real-Time Distributed Shared Memory Many-Core System

NARCIS (Netherlands)

Rutgers, J.H.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

2012-01-01

Real-time embedded systems like smartphones tend to comprise an ever increasing number of processing cores. For scalability and the need for guaranteed performance, the use of a connection-oriented network-on-chip (NoC) is advocated. Furthermore, a distributed shared memory architecture is preferred
Optimal data replication: A new approach to optimizing parallel EM algorithms on a mesh-connected multiprocessor for 3D PET image reconstruction

International Nuclear Information System (INIS)

Chen, C.M.; Lee, S.Y.

1995-01-01

The EM algorithm promises an estimated image with the maximal likelihood for 3D PET image reconstruction. However, due to its long computation time, the EM algorithm has not been widely used in practice. While several parallel implementations of the EM algorithm have been developed to make the EM algorithm feasible, they do not guarantee an optimal parallelization efficiency. In this paper, the authors propose a new parallel EM algorithm which maximizes the performance by optimizing data replication on a mesh-connected message-passing multiprocessor. To optimize data replication, the authors have formally derived the optimal allocation of shared data, group sizes, integration and broadcasting of replicated data as well as the scheduling of shared data accesses. The proposed parallel EM algorithm has been implemented on an iPSC/860 with 16 PEs. The experimental and theoretical results, which are consistent with each other, have shown that the proposed parallel EM algorithm could improve performance substantially over those using unoptimized data replication
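For reference, the serial MLEM update that such parallel versions distribute is the standard multiplicative rule $\lambda_j^{(k+1)} = \frac{\lambda_j^{(k)}}{\sum_i a_{ij}} \sum_i a_{ij} \frac{y_i}{\sum_l a_{il}\lambda_l^{(k)}}$. The sketch below implements it in plain Python on a tiny, hypothetical system matrix `A` (rows are detector bins, columns are voxels); it is not the authors' code and does not model the mesh topology or their optimized data replication. The forward-projection and backprojection sums are the parts a parallel implementation would split across PEs.

```python
def mlem(A, y, n_iter=50):
    """Maximum-likelihood EM for counts y ~ Poisson(A @ lam), in plain Python."""
    n_bins, n_vox = len(A), len(A[0])
    # Sensitivity of each voxel: column sums of the system matrix.
    sens = [sum(A[i][j] for i in range(n_bins)) for j in range(n_vox)]
    lam = [1.0] * n_vox  # uniform, strictly positive starting image
    for _ in range(n_iter):
        # Forward projection: expected counts in each detector bin.
        proj = [sum(A[i][j] * lam[j] for j in range(n_vox)) for i in range(n_bins)]
        # Ratio of measured to expected counts (skip empty projections).
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_bins)]
        # Backprojection of the ratios.
        back = [sum(A[i][j] * ratio[i] for i in range(n_bins)) for j in range(n_vox)]
        # Multiplicative update keeps the image non-negative.
        lam = [lam[j] * back[j] / sens[j] if sens[j] > 0 else 0.0 for j in range(n_vox)]
    return lam
```

Each iteration preserves the sensitivity-weighted total ($\sum_j s_j\lambda_j = \sum_i y_i$, with $s_j = \sum_i a_{ij}$), which is a convenient sanity check on any parallel version.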
Ethics of large-scale change

OpenAIRE

Arler, Finn

2006-01-01

The subject of this paper is long-term large-scale changes in human society. Some very significant examples of large-scale change are presented: human population growth, human appropriation of land and primary production, the human use of fossil fuels, and climate change. The question is posed, which kind of attitude is appropriate when dealing with large-scale changes like these from an ethical point of view. Three kinds of approaches are discussed: Aldo Leopold's mountain thinking, th...
Assessing Knowledge Sharing Among Academics: A Validation of the Knowledge Sharing Behavior Scale (KSBS).

Science.gov (United States)

Ramayah, T; Yeap, Jasmine A L; Ignatius, Joshua

2014-04-01

There is a belief that academics tend to hold on tightly to their knowledge and intellectual resources. However, little effort has been put into creating a valid and reliable instrument to measure knowledge sharing behavior among academics. The aim was to apply and validate the Knowledge Sharing Behavior Scale (KSBS) as a measure of knowledge sharing behavior within the academic community. Respondents (N = 447) were academics from arts and science streams in 10 local, public universities in Malaysia. Data were collected using the 28-item KSBS, which assessed four dimensions of knowledge sharing behavior, namely written contributions, organizational communications, personal interactions, and communities of practice. The exploratory factor analysis showed that the items loaded on the dimension constructs they were supposed to represent, thus proving construct validity. A within-factor analysis revealed that each set of items representing its intended dimension loaded on only one construct, therefore establishing convergent validity. None of the four dimensions was perfectly correlated with the others or with organizational citizenship behavior, thereby proving discriminant validity. However, all four dimensions correlated with organizational commitment, thus confirming predictive validity. Furthermore, all four factors correlated with both tacit and explicit sharing, which confirmed their concurrent validity. All measures also possessed sufficient reliability (α > .70). The KSBS is a valid and reliable instrument that can be used to formally assess the types of knowledge artifacts residing among academics and the degree of knowledge sharing in relation to those artifacts. © The Author(s) 2014.
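A reliability threshold such as α > .70 conventionally refers to Cronbach's alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$ for $k$ items with item variances $\sigma_i^2$ and total-score variance $\sigma_T^2$. The abstract does not name the coefficient, so treat that as an assumption. A minimal sketch of the computation:

```python
from statistics import pvariance


def cronbach_alpha(items):
    """items[i][r] = score of respondent r on item i (k items, n respondents)."""
    k = len(items)
    # Total score of each respondent across all k items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1.0 - sum_item_var / pvariance(totals))
```

Population or sample variances give the same alpha as long as one is used consistently, because the common $n/(n-1)$ factor cancels in the ratio.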
Secret Sharing Schemes with a large number of players from Toric Varieties

DEFF Research Database (Denmark)

Hansen, Johan P.

A general theory for constructing linear secret sharing schemes over a finite field $\mathbb{F}_q$ from toric varieties is introduced. The number of players can be as large as $(q-1)^r-1$ for $r\geq 1$. We present general methods for obtaining the reconstruction and privacy thresholds as well as conditions ... for multiplication on the associated secret sharing schemes. In particular, we apply the method to certain toric surfaces. The main results are ideal linear secret sharing schemes where the number of players can be as large as $(q-1)^2-1$. We determine bounds for the reconstruction and privacy thresholds...
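The toric construction itself is beyond a short example, but the terms used above (linear scheme, reconstruction and privacy thresholds) can be illustrated with the classic one-variable evaluation scheme, Shamir's secret sharing over a prime field. This is a different, simpler scheme shown only for orientation, not the paper's construction: shares are evaluations of a random polynomial of degree $t-1$ whose constant term is the secret, any $t$ shares reconstruct it by Lagrange interpolation, and any $t-1$ shares reveal nothing. The prime modulus is an arbitrary choice.

```python
import random

PRIME = 2**61 - 1  # a Mersenne prime; the field is the integers mod PRIME


def make_shares(secret, threshold, n_players, rng=random):
    """Evaluate a random degree-(threshold-1) polynomial with f(0) = secret."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(threshold - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n_players + 1)]


def reconstruct(shares):
    """Lagrange interpolation at x = 0 from any `threshold` shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Because the scheme is linear, adding two sets of shares pointwise yields valid shares of the sum of the secrets. Multiplication is the harder property: products of shares lie on a polynomial of twice the degree, so more shares are needed to reconstruct.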
Design considerations for a multiprocessor based data acquisition system

International Nuclear Information System (INIS)

Tippie, J.W.; Kulaga, J.E.

1979-01-01

The rapid advance of digital technology has provided the systems designer with many new design options. Hardware is no longer the controlling expense. Complex operating systems provide the flexibility and development tools needed by software designers, but restrict throughput. Multiprocessor-based systems can be used to "front-end" high-throughput applications while maintaining the many advantages offered by multitasking operating systems. The design of a high-throughput data acquisition system for application in low energy nuclear physics is considered.

A simple multiprocessor management system for event-parallel computing

International Nuclear Information System (INIS)

Bracker, S.; Gounder, K.; Hendrix, K.; Summers, D.

1996-01-01

Offline software using Transmission Control Protocol/Internet Protocol (TCP/IP) sockets to distribute particle physics events to multiple UNIX/RISC workstations is described. A modular, building block approach was taken that allowed tailoring to solve specific tasks efficiently and simply as they arose. The modest, initial cost was having to learn about sockets for interprocess communication. This multiprocessor management software has been used to control the reconstruction of eight billion raw data events from Fermilab Experiment E791
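The socket-based distribution pattern can be sketched with the Python standard library: a dispatcher sends length-prefixed events to workers over stream sockets and collects the results. In this self-contained version, `socket.socketpair()` and threads stand in for TCP connections to separate UNIX/RISC workstations, and the JSON framing and the `reconstruct_event` function are hypothetical placeholders for real event data and reconstruction code.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    """Length-prefixed JSON framing so message boundaries survive the byte stream."""
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        buf += chunk
    return buf


def recv_msg(sock):
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def reconstruct_event(event):
    # Hypothetical stand-in for the real per-event reconstruction code.
    return sum(event["hits"])


def worker(sock):
    while True:
        event = recv_msg(sock)
        if event is None:  # sentinel: no more events
            break
        send_msg(sock, {"id": event["id"], "result": reconstruct_event(event)})
    sock.close()


def distribute(events, n_workers=3):
    pairs = [socket.socketpair() for _ in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(w,)) for _, w in pairs]
    for t in threads:
        t.start()
    results = {}
    # Hand out one event per worker per round, then collect that round's replies.
    for start in range(0, len(events), n_workers):
        batch = events[start:start + n_workers]
        for (ctrl, _), ev in zip(pairs, batch):
            send_msg(ctrl, ev)
        for (ctrl, _), _ev in zip(pairs, batch):
            reply = recv_msg(ctrl)
            results[reply["id"]] = reply["result"]
    for ctrl, _ in pairs:
        send_msg(ctrl, None)
    for t in threads:
        t.join()
    for ctrl, _ in pairs:
        ctrl.close()
    return results
```

Against real workstations, the dispatcher would instead connect to listening worker processes (for example with `socket.create_connection`); the framing and round-based dispatch would stay the same.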
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction

International Nuclear Information System (INIS)

Yang, C L; Wei, H Y; Soleimani, M; Adler, A

2013-01-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current–voltage data. This paper proposes a time- and memory-efficient method for solving a large-scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large amount of measurement data can produce a large Jacobian matrix, which can cause difficulties in computer storage and in the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining image quality. First, a sparse matrix reduction technique is proposed that uses thresholding to set very small values of the Jacobian matrix to zero. When the Jacobian is stored in a sparse format, the zeroed elements need not be stored, which reduces the memory requirement. Second, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. The sparse Jacobian with block-wise CG enables the large-scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results. (paper)
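The two ingredients can be sketched in plain Python: thresholding the Jacobian and storing only the surviving entries row-wise, then solving the least-squares problem with CGLS (conjugate gradients on the normal equations, using only products with $J$ and $J^T$). The relative threshold `rel_tol` and the tiny test matrix are illustrative assumptions, not values from the paper. The block-wise parallel part is not shown; because $J^T r$ is a sum of contributions from row blocks, those blocks are a natural unit to distribute.

```python
def threshold_to_sparse(J, rel_tol):
    """Drop entries with |J_ij| <= rel_tol * max|J|; keep (col, value) pairs per row."""
    cutoff = rel_tol * max(abs(v) for row in J for v in row)
    return [[(j, v) for j, v in enumerate(row) if abs(v) > cutoff] for row in J]


def matvec(rows, x):
    """Compute J x from the row-wise sparse storage."""
    return [sum(v * x[j] for j, v in row) for row in rows]


def rmatvec(rows, y, n_cols):
    """Compute J^T y from the row-wise sparse storage."""
    out = [0.0] * n_cols
    for yi, row in zip(y, rows):
        for j, v in row:
            out[j] += v * yi
    return out


def cgls(rows, b, n_cols, n_iter=50, tol=1e-20):
    """CGLS: conjugate gradients for min ||J x - b||, without forming J^T J."""
    x = [0.0] * n_cols
    r = list(b)
    s = rmatvec(rows, r, n_cols)
    p = list(s)
    gamma = sum(v * v for v in s)
    for _ in range(n_iter):
        if gamma <= tol:
            break
        q = matvec(rows, p)
        qq = sum(v * v for v in q)
        if qq == 0.0:
            break
        alpha = gamma / qq
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        s = rmatvec(rows, r, n_cols)
        gamma_new = sum(v * v for v in s)
        p = [si + (gamma_new / gamma) * pi for si, pi in zip(s, p)]
        gamma = gamma_new
    return x
```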
Reducing computational costs in large scale 3D EIT by using a sparse Jacobian matrix with block-wise CGLS reconstruction.

Science.gov (United States)

Yang, C L; Wei, H Y; Adler, A; Soleimani, M

2013-06-01

Electrical impedance tomography (EIT) is a fast and cost-effective technique to provide a tomographic conductivity image of a subject from boundary current-voltage data. This paper proposes a time- and memory-efficient method for solving a large-scale 3D EIT inverse problem using a parallel conjugate gradient (CG) algorithm. A 3D EIT system with a large amount of measurement data can produce a large Jacobian matrix, which can cause difficulties in computer storage and in the inversion process. One of the challenges in 3D EIT is to decrease the reconstruction time and memory usage while retaining image quality. First, a sparse matrix reduction technique is proposed that uses thresholding to set very small values of the Jacobian matrix to zero. When the Jacobian is stored in a sparse format, the zeroed elements need not be stored, which reduces the memory requirement. Second, a block-wise CG method for parallel reconstruction has been developed. The proposed method has been tested using simulated data as well as experimental test samples. The sparse Jacobian with block-wise CG enables the large-scale EIT problem to be solved efficiently. Image quality measures are presented to quantify the effect of sparse matrix reduction on the reconstruction results.
CSCW Challenges in Large-Scale Technical Projects - a case study

DEFF Research Database (Denmark)

Grønbæk, Kaj; Kyng, Morten; Mogensen, Preben Holst

1992-01-01

This paper investigates CSCW aspects of large-scale technical projects based on a case study of a specific Danish engineering company and uncovers challenges to CSCW applications in this setting. The company is responsible for management and supervision of one of the world's largest tunnel ... The initial qualitative analysis identified a number of bottlenecks in daily work where support for cooperation is needed. Examples of bottlenecks are: sharing materials, issuing tasks, and keeping track of task status. Grounded in the analysis, cooperative design workshops based on scenarios of future work...
Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

Science.gov (United States)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Several techniques for static and dynamic load balancing in vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same or have similar computational characteristics. These techniques are evaluated by applying them to a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.
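One simple way to turn data-dependent cost estimates into a static assignment is a greedy longest-processing-time rule: estimate each chunk's cost from the data (for example, the number of features extracted in an image region) and always give the next most expensive chunk to the least loaded processor. This is a generic sketch, not the authors' mapping; the chunk format and the `cost_of` function are hypothetical.

```python
import heapq


def assign_by_estimated_cost(chunks, n_procs, cost_of):
    """Greedy LPT: most expensive chunk first, to the currently least-loaded processor."""
    loads = [(0.0, p) for p in range(n_procs)]  # min-heap of (load, processor)
    heapq.heapify(loads)
    assignment = {p: [] for p in range(n_procs)}
    for chunk in sorted(chunks, key=cost_of, reverse=True):
        load, p = heapq.heappop(loads)
        assignment[p].append(chunk)
        heapq.heappush(loads, (load + cost_of(chunk), p))
    return assignment
```

For example, chunks with estimated costs 9, 7, 6, 5, 4 and 3 on two processors end up with a load of 17 each.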
Preparing laboratory and real-world EEG data for large-scale analysis: A containerized approach

Directory of Open Access Journals (Sweden)

Nima Bigdely-Shamlo

2016-03-01

Full Text Available Large-scale analysis of EEG and other physiological measures promises new insights into brain processes and more accurate and robust brain-computer interface (BCI) models. However, the absence of standardized vocabularies for annotating events in a machine-understandable manner, the welter of collection-specific data organizations, the difficulty in moving data across processing platforms, and the unavailability of agreed-upon standards for preprocessing have prevented large-scale analyses of EEG. Here we describe a containerized approach and freely available tools we have developed to facilitate the process of annotating, packaging, and preprocessing EEG data collections to enable data sharing, archiving, large-scale machine learning/data mining and (meta-)analysis. The EEG Study Schema (ESS) comprises three data Levels, each with its own XML-document schema and file/folder convention, plus a standardized (PREP) pipeline to move raw (Data Level 1) data to a basic preprocessed state (Data Level 2) suitable for application of a large class of EEG analysis methods. Researchers can ship a study as a single unit and operate on its data using a standardized interface. ESS does not require a central database and provides all the metadata necessary to execute a wide variety of EEG processing pipelines. The primary focus of ESS is automated in-depth analysis and meta-analysis of EEG studies. However, ESS can also encapsulate meta-information for other modalities, such as eye tracking, that are increasingly used in both laboratory and real-world neuroimaging. ESS schema and tools are freely available at eegstudy.org, and a central catalog of over 850 GB of existing data in ESS format is available at study-catalog.org. These tools and resources are part of a larger effort to enable data sharing at sufficient scale for researchers to engage in truly large-scale EEG analysis and data mining (BigEEG.org).
Writing to and reading from a nano-scale crossbar memory based on memristors

International Nuclear Information System (INIS)

Vontobel, Pascal O; Robinett, Warren; Kuekes, Philip J; Stewart, Duncan R; Straznicky, Joseph; Stanley Williams, R

2009-01-01

We present a design study for a nano-scale crossbar memory system that uses memristors with symmetrical but highly nonlinear current-voltage characteristics as memory elements. The memory is non-volatile since the memristors retain their state when unpowered. To address the nano-wires that make up this nano-scale crossbar, we use two coded demultiplexers implemented using mixed-scale crossbars (in which CMOS wires cross nano-wires and in which the crosspoint junctions have one-time configurable memristors). This memory system does not use the kind of devices (diodes or transistors) that are normally used to isolate the memory cell being written to and read from in conventional memories. Instead, special techniques are introduced to perform the writing and reading operations reliably by taking advantage of the nonlinearity of the type of memristors used. After discussing both writing and reading strategies for our memory system in general, we focus on a 64 x 64 memory array and present simulation results that show the feasibility of these writing and reading procedures. Besides simulating the case where all device parameters assume exactly their nominal values, we also simulate the much more realistic case where the device parameters stray around their nominal values: we observe a degradation in margins, but writing and reading are still feasible. These simulation results are based on a device model for memristors derived from measurements of fabricated devices in nano-scale crossbars using Pt and Ti nano-wires and using oxygen-depleted TiO2 as the switching material.
'You should at least ask'. The expectations, hopes and fears of rare disease patients on large-scale data and biomaterial sharing for genomics research.

Science.gov (United States)

McCormack, Pauline; Kole, Anna; Gainotti, Sabina; Mascalzoni, Deborah; Molster, Caron; Lochmüller, Hanns; Woods, Simon

2016-10-01

Within the myriad articles about participants' opinions of genomics research, the views of a distinct group, people with a rare disease (RD), are unknown. It is important to understand whether their opinions differ from those of the general public by dint of having a rare disease and the vulnerabilities inherent in this. Here we document RD patients' attitudes to participation in genomics research, particularly around large-scale, international data and biosample sharing. This work is unique in exploring the views of people with a range of rare disorders from many different countries. The authors work within an international, multidisciplinary consortium, RD-Connect, which has developed an integrated platform connecting databases, registries, biobanks and clinical bioinformatics for RD research. Focus groups were conducted with 52 RD patients from 16 countries. Using a scenario-based approach, participants were encouraged to raise topics relevant to their own experiences, rather than these being determined by the researcher. Issues include wide data sharing, and consent for new uses of historic samples and for children. Focus group members are positively disposed towards research and towards allowing data and biosamples to be shared internationally. Expressions of trust and attitudes to risk are often affected by the nature of the RD of which they have experience, as well as by regulatory and cultural practices in their home country. Participants are concerned about data security and misuse. There is an acute recognition of the vulnerability inherent in having an RD and of the possibility that open knowledge of this could lead to discrimination.
Coarse-Grain Bandwidth Estimation Scheme for Large-Scale Network

Science.gov (United States)

Cheung, Kar-Ming; Jennings, Esther H.; Sergui, John S.

2013-01-01

A large-scale network that supports a large number of users can have an aggregate data rate of hundreds of Mbps at any time. High-fidelity simulation of a large-scale network might be too complicated and memory-intensive for typical commercial-off-the-shelf (COTS) tools. Unlike a large commercial wide-area network (WAN) that shares diverse network resources among diverse users and has a complex topology requiring routing mechanisms and flow control, the ground communication links of a space network operate under the assumption of a guaranteed dedicated bandwidth allocation between specific sparse endpoints in a star-like topology. This work solved the network design problem of estimating the bandwidths of a ground network architecture option that offers different service classes to meet the latency requirements of different user data types. In this work, a top-down analysis and simulation approach was created to size the bandwidths of a store-and-forward network for a given network topology, a mission traffic scenario, and a set of data types with different latency requirements. These techniques were used to estimate the WAN bandwidths of the ground links for different architecture options of the proposed Integrated Space Communication and Navigation (SCaN) Network. A new analytical approach, called the "leveling scheme," was developed to model the store-and-forward mechanism of the network data flow. The term "leveling" refers to spreading data across a longer time horizon without violating the corresponding latency requirement of the data type. Two versions of the leveling scheme were developed: 1. A straightforward version that simply spreads the data of each data type across the time horizon, does not take into account the interactions among data types within a pass or between data types across overlapping passes at a network node, and is inherently sub-optimal. 2. A two-state Markov leveling scheme that takes into account the second order behavior of
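The first, straightforward leveling version can be illustrated with a small sketch: each data product's volume is spread evenly over its allowed latency window, per-slot rates are summed across data types, and the peak summed rate is the bandwidth estimate. The tuple format `(arrival_slot, volume, latency_slots)` and the uniform spreading are assumptions for illustration; as the abstract notes, this version ignores interactions among data types and is sub-optimal.

```python
def level_bandwidth(arrivals):
    """arrivals: list of (arrival_slot, volume, latency_slots), one per data product.

    Returns (peak_rate, per_slot_rates) after spreading each volume evenly
    over the slots allowed by its latency requirement.
    """
    horizon = max(t + lat for t, _, lat in arrivals)
    rate = [0.0] * horizon
    for t, volume, lat in arrivals:
        per_slot = volume / lat  # uniform spread over the allowed latency window
        for s in range(t, t + lat):
            rate[s] += per_slot
    return max(rate), rate
```

With `[(0, 100.0, 4), (2, 60.0, 2)]` the leveled rates are `[25.0, 25.0, 55.0, 55.0]`, so the peak is 55 units per slot instead of the 100 needed if the first product were sent in a single slot.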
On the impact of communication complexity in the design of parallel numerical algorithms

Science.gov (United States)

Gannon, D.; Vanrosendale, J.

1984-01-01

This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
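Hockney's basic model, which the first model generalizes, describes the time to process or move $n$ elements with two parameters: the asymptotic rate $r_\infty$ and the half-performance length $n_{1/2}$. The block below states only that standard form, not the paper's generalization:

```latex
t(n) = t_0 + \frac{n}{r_\infty} = \frac{n + n_{1/2}}{r_\infty},
\qquad n_{1/2} = r_\infty \, t_0,
\qquad r(n) = \frac{n}{t(n)} = \frac{r_\infty}{1 + n_{1/2}/n}
```

At $n = n_{1/2}$ the achieved rate is exactly $r_\infty/2$, and short operations ($n \ll n_{1/2}$) are dominated by the startup time $t_0$.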
An FPGA design flow for reconfigurable network-based multi-processor systems on chip

NARCIS (Netherlands)

Kumar, A.; Hansson, M.A; Huisken, J.; Corporaal, H.

2007-01-01

Multi-processor systems on chip (MPSoC) platforms are becoming increasingly more heterogeneous and are shifting towards a more communication-centric methodology. Networks on chip (NoC) have emerged as the design paradigm for scalable on-chip communication architectures. As the system complexity
Large-scale educational telecommunications systems for the US: An analysis of educational needs and technological opportunities

Science.gov (United States)

Morgan, R. P.; Singh, J. P.; Rothenberg, D.; Robinson, B. E.

1975-01-01

The needs to be served, the subsectors in which the system might be used, the technology employed, and the prospects for future utilization of an educational telecommunications delivery system are described and analyzed. Educational subsectors are analyzed with emphasis on the current status and trends within each subsector. Issues which affect future development, and prospects for future use of media, technology, and large-scale electronic delivery within each subsector are included. Information on technology utilization is presented. Educational telecommunications services are identified and grouped into categories: public television and radio, instructional television, computer aided instruction, computer resource sharing, and information resource sharing. Technology based services, their current utilization, and factors which affect future development are stressed. The role of communications satellites in providing these services is discussed. Efforts to analyze and estimate future utilization of large-scale educational telecommunications are summarized. Factors which affect future utilization are identified. Conclusions are presented.
Intelligent discrete particle swarm optimization for multiprocessor task scheduling problem

Directory of Open Access Journals (Sweden)

S Sarathambekai

2017-03-01

Full Text Available Discrete particle swarm optimization is one of the most recently developed population-based meta-heuristic optimization algorithms in swarm intelligence, and it can be applied to any discrete optimization problem. This article presents a discrete particle swarm optimization algorithm to efficiently schedule tasks in heterogeneous multiprocessor systems. All such optimization algorithms share a common algorithmic step, namely population initialization. This step plays a significant role because it can affect both the convergence speed and the quality of the final solution. Random initialization is the most commonly used method in the majority of evolutionary algorithms for generating the initial population. Good-quality initial solutions can help the algorithm locate the optimal solution; poor ones may prevent it from finding the optimal solution. Intelligence should therefore be incorporated into generating the initial population in order to avoid premature convergence. The proposed discrete particle swarm optimization algorithm incorporates an opposition-based technique to generate the initial population and a greedy algorithm to balance the load of the processors. Makespan, flow time, and reliability cost are the three measures used to evaluate the efficiency of the proposed algorithm for scheduling independent tasks in distributed systems. Computational simulations on a set of benchmark instances are used to assess the performance of the proposed algorithm.
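The opposition-based initialization step can be sketched for a discrete encoding in which `assign[t]` is the processor of task `t`. For each random candidate, the opposite candidate maps processor `p` to `m - 1 - p` (the discrete form of the usual $a + b - x$ rule on the range $[a, b]$), and the fittest half of the combined set, ranked by makespan, becomes the initial swarm. The encoding, the opposite-point rule for processor indices, and the `task_times` table are assumptions; the greedy load-balancing step and the PSO velocity updates are not shown.

```python
import random


def makespan(assign, task_times, n_procs):
    """assign[t] = processor of task t; task_times[t][p] = run time of task t on p."""
    loads = [0.0] * n_procs
    for t, p in enumerate(assign):
        loads[p] += task_times[t][p]
    return max(loads)


def opposition_init(pop_size, task_times, n_procs, rng):
    """Opposition-based initial population for a task-to-processor encoding."""
    n_tasks = len(task_times)
    candidates = []
    for _ in range(pop_size):
        x = [rng.randrange(n_procs) for _ in range(n_tasks)]
        # Opposite point of x in {0, ..., m-1}: a + b - x with a = 0, b = m - 1.
        x_opp = [n_procs - 1 - p for p in x]
        candidates.extend([x, x_opp])
    # Keep the pop_size fittest of the random and opposite candidates.
    candidates.sort(key=lambda a: makespan(a, task_times, n_procs))
    return candidates[:pop_size]
```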
Explicit time integration of finite element models on a vectorized, concurrent computer with shared memory

Science.gov (United States)

Gilbertsen, Noreen D.; Belytschko, Ted

1990-01-01

The implementation of a nonlinear explicit program on a vectorized, concurrent computer with shared memory is described and studied. The conflict between vectorization and concurrency is described and some guidelines are given for optimal block sizes. Several example problems are summarized to illustrate the types of speed-ups which can be achieved by reprogramming as compared to compiler optimization.
Comparison of Conjugate Gradient Density Matrix Search and Chebyshev Expansion Methods for Avoiding Diagonalization in Large-Scale Electronic Structure Calculations

Science.gov (United States)

Bates, Kevin R.; Daniels, Andrew D.; Scuseria, Gustavo E.

1998-01-01

We report a comparison of two linear-scaling methods which avoid the diagonalization bottleneck of traditional electronic structure algorithms. The Chebyshev expansion method (CEM) is implemented for carbon tight-binding calculations of large systems and its memory and timing requirements compared to those of our previously implemented conjugate gradient density matrix search (CG-DMS). Benchmark calculations are carried out on icosahedral fullerenes from C60 to C8640 and the linear scaling memory and CPU requirements of the CEM demonstrated. We show that the CPU requisites of the CEM and CG-DMS are similar for calculations with comparable accuracy.
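The idea behind the Chebyshev expansion method can be sketched as follows: approximate the Fermi function $f(x) = 1/(1 + e^{\beta(x-\mu)})$ by a Chebyshev series and evaluate it on the Hamiltonian with the recurrence $T_{k+1}(H) = 2HT_k(H) - T_{k-1}(H)$, so that the density matrix $P = f(H)$ is built from matrix products without diagonalization. This is a minimal dense, pure-Python illustration under stated assumptions ($H$ already scaled so its spectrum lies in $[-1, 1]$; `mu` and `beta` are illustrative parameters). The linear scaling reported in the paper relies on sparse matrices and truncation, which this sketch does not attempt.

```python
import math


def cheb_coeffs(f, n_terms):
    """Chebyshev interpolation coefficients of f on [-1, 1] (c_0 already halved)."""
    nodes = [math.pi * (j + 0.5) / n_terms for j in range(n_terms)]
    coeffs = []
    for k in range(n_terms):
        s = sum(f(math.cos(th)) * math.cos(k * th) for th in nodes)
        coeffs.append(2.0 * s / n_terms)
    coeffs[0] /= 2.0
    return coeffs


def matmul(A, B):
    return [[sum(A[i][m] * B[m][j] for m in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def density_matrix_cheb(H, mu, beta, n_terms=200):
    """P = f(H) for the Fermi function f, using only matrix products."""
    fermi = lambda x: 1.0 / (1.0 + math.exp(beta * (x - mu)))
    c = cheb_coeffs(fermi, n_terms)
    n = len(H)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    t_prev, t_curr = eye, H  # T_0(H) = I, T_1(H) = H
    P = [[c[0] * eye[i][j] + c[1] * H[i][j] for j in range(n)] for i in range(n)]
    for k in range(2, n_terms):
        ht = matmul(H, t_curr)
        t_next = [[2.0 * ht[i][j] - t_prev[i][j] for j in range(n)] for i in range(n)]
        P = [[P[i][j] + c[k] * t_next[i][j] for j in range(n)] for i in range(n)]
        t_prev, t_curr = t_curr, t_next
    return P
```

The trace of $P$ approximates the number of occupied states, which can be checked against the eigenvalues on a small matrix.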
Method for wiring allocation and switch configuration in a multiprocessor environment

Science.gov (United States)

Aridor, Yariv [Zichron Ya'akov, IL; Domany, Tamar [Kiryat Tivon, IL; Frachtenberg, Eitan [Jerusalem, IL; Gal, Yoav [Haifa, IL; Shmueli, Edi [Haifa, IL; Stockmeyer, legal representative, Robert E.; Stockmeyer, Larry Joseph [San Jose, CA

2008-07-15

A method for wiring allocation and switch configuration in a multiprocessor computer, the method including employing depth-first tree traversal to determine a plurality of paths among a plurality of processing elements allocated to a job along a plurality of switches and wires in a plurality of D-lines, and selecting one of the paths in accordance with at least one selection criterion.
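A generic reading of the claimed method is sketched below: depth-first traversal enumerates candidate paths between two allocated processing elements through the switch graph, and one path is chosen by a selection criterion (here, fewest hops among paths that avoid wires already in use). The graph, node names and criterion are hypothetical; the patent's D-line structure and its actual selection criteria are not modeled.

```python
def dfs_paths(graph, node, dst, path, out):
    """Depth-first enumeration of simple paths from node to dst."""
    if node == dst:
        out.append(list(path))
        return
    for nxt in graph.get(node, []):
        if nxt not in path:  # never revisit a node on the current path
            path.append(nxt)
            dfs_paths(graph, nxt, dst, path, out)
            path.pop()


def select_path(graph, src, dst, busy_wires=frozenset()):
    """Pick the shortest candidate path that avoids wires already in use."""
    candidates = []
    dfs_paths(graph, src, dst, [src], candidates)

    def uses_busy_wire(p):
        return any((a, b) in busy_wires or (b, a) in busy_wires
                   for a, b in zip(p, p[1:]))

    free = [p for p in candidates if not uses_busy_wire(p)]
    return min(free, key=len) if free else None
```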
Frontal Neurons Modulate Memory Retrieval across Widely Varying Temporal Scales

Science.gov (United States)

Zhang, Wen-Hua; Williams, Ziv M.

2015-01-01

Once a memory has formed, it is thought to undergo a gradual transition within the brain from short- to long-term storage. This putative process, however, also poses a unique problem to the memory system in that the same learned items must also be retrieved across broadly varying time scales. Here, we find that neurons in the ventrolateral…
Working memory performance inversely predicts spontaneous delta and theta-band scaling relations.

Science.gov (United States)

Euler, Matthew J; Wiltshire, Travis J; Niermeyer, Madison A; Butner, Jonathan E

2016-04-15

Electrophysiological studies have strongly implicated theta-band activity in human working memory processes. Concurrently, work on spontaneous, non-task-related oscillations has revealed the presence of long-range temporal correlations (LRTCs) within sub-bands of the ongoing EEG, and has begun to demonstrate their functional significance. However, few studies have yet assessed the relation of LRTCs (also called scaling relations) to individual differences in cognitive abilities. The present study addressed the intersection of these two literatures by investigating the relation of narrow-band EEG scaling relations to individual differences in working memory ability, with a particular focus on the theta band. Fifty-four healthy adults completed standardized assessments of working memory and separate recordings of their spontaneous, non-task-related EEG. Scaling relations were quantified in each of the five classical EEG frequency bands via the estimation of the Hurst exponent obtained from detrended fluctuation analysis. A multilevel modeling framework was used to characterize the relation of working memory performance to scaling relations as a function of general scalp location in Cartesian space. Overall, results indicated an inverse relationship between both delta and theta scaling relations and working memory ability, which was most prominent at posterior sensors, and was independent of either spatial or individual variability in band-specific power. These findings add to the growing literature demonstrating the relevance of neural LRTCs for understanding brain functioning, and support a construct- and state-dependent view of their functional implications. Copyright © 2016 Elsevier B.V. All rights reserved.
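Detrended fluctuation analysis, as used here to estimate scaling exponents, can be sketched in a few steps: integrate the mean-removed signal into a profile, split it into windows of size $n$, remove a least-squares line from each window, and take the slope of $\log F(n)$ against $\log n$, where $F(n)$ is the RMS residual. This pure-Python version (first-order detrending, non-overlapping windows, caller-chosen scales, synthetic test signals) is an illustrative assumption, not the study's pipeline.

```python
import math
import random


def _detrended_sq_residual(segment):
    """Sum of squared residuals after a least-squares line fit (DFA-1 detrending)."""
    n = len(segment)
    mx = (n - 1) / 2.0
    my = sum(segment) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(segment))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(segment))


def dfa_exponent(signal, scales):
    """DFA scaling exponent: slope of log F(n) versus log n."""
    mean = sum(signal) / len(signal)
    profile, acc = [], 0.0
    for v in signal:  # integrated, mean-removed profile
        acc += v - mean
        profile.append(acc)
    log_n, log_f = [], []
    for n in scales:
        n_win = len(profile) // n  # non-overlapping windows; remainder ignored
        sq = sum(_detrended_sq_residual(profile[w * n:(w + 1) * n]) for w in range(n_win))
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq / (n_win * n)))  # log of RMS fluctuation F(n)
    mx, my = sum(log_n) / len(log_n), sum(log_f) / len(log_f)
    return (sum((x - mx) * (y - my) for x, y in zip(log_n, log_f))
            / sum((x - mx) ** 2 for x in log_n))


def white_noise(n, seed=0):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(n, seed=0):
    walk, acc = [], 0.0
    for v in white_noise(n, seed):
        acc += v
        walk.append(acc)
    return walk
```

Uncorrelated noise gives an exponent near 0.5 and its cumulative sum (a random walk) gives about 1.5; for a stationary signal, values above 0.5 indicate persistent long-range temporal correlations.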
Cognitive psychopathology in Schizophrenia: Comparing memory performances with Obsessive-compulsive disorder patients and normal subjects on the Wechsler Memory Scale-IV.

Science.gov (United States)

Cammisuli, Davide Maria; Sportiello, Marco Timpano

2016-06-01

Memory is one of the cognitive domains most severely impaired in schizophrenia. Within the theoretical framework of cognitive psychopathology, we compared the performance of schizophrenia patients on the Wechsler Memory Scale-IV with that of matched patients with obsessive-compulsive disorder and that of healthy control subjects to establish the specific nature of memory deficits in schizophrenia. Thirty schizophrenia patients, 30 obsessive-compulsive disorder patients and 40 healthy controls completed the Wechsler Memory Scale-IV. Schizophrenia symptom severity was assessed with the Positive and Negative Syndrome Scale (PANSS). Performance on the memory battery, including Index and subtest scores, was compared by one-way ANOVA (Scheffé post-hoc test). Spearman rank correlations were computed between scores on PANSS subscales and symptoms and WMS-IV Indexes and subtests, respectively. Schizophrenia patients showed a memory profile characterized by mild difficulties in auditory memory and visual working memory and poor functioning of visual, immediate and delayed memory. As expected, schizophrenia patients scored lower than healthy controls on all WMS-IV measures. With regard to the WMS-IV Indexes, schizophrenia patients performed worse than obsessive-compulsive disorder patients on Auditory Memory, Visual Memory, Immediate and Delayed Memory, but not on Visual Working Memory. This pattern was even clearer for specific tasks such as immediate and delayed recall, spatial recall and memory for visual details, as revealed by the lowest scores on the Logical Memory (immediate and delayed conditions) and Designs (immediate condition) subtests, respectively. Significant negative correlations were found between Logical Memory I and II and the PANSS Excitement symptom, as well as between Designs I (DE I) and the PANSS Tension symptom. Significant positive correlations between LM II and PANSS Blunted affect and Poor rapport symptoms as well as DE I and PANSS Blunted affect
Large-scale dynamo action due to α fluctuations in a linear shear flow

Science.gov (United States)

Sridhar, S.; Singh, Nishant K.

2014-12-01

We present a model of large-scale dynamo action in a shear flow that has stochastic, zero-mean fluctuations of the α parameter. This is based on a minimal extension of the Kraichnan-Moffatt model to include a background linear shear and Galilean-invariant α-statistics. Using the first-order smoothing approximation, we derive a linear integro-differential equation for the large-scale magnetic field, which is non-perturbative in the shearing rate $S$ and the α-correlation time $\tau_\alpha$. The white-noise case, $\tau_\alpha = 0$, is solved exactly, and it is concluded that the necessary condition for dynamo action is identical to that of the Kraichnan-Moffatt model without shear; this is because white noise does not allow for memory effects, whereas shear needs time to act. To explore memory effects, we reduce the integro-differential equation to a partial differential equation, valid for slowly varying fields when $\tau_\alpha$ is small but non-zero. Seeking exponential modal solutions, we solve the modal dispersion relation and obtain an explicit expression for the growth rate as a function of the six independent parameters of the problem. A non-zero $\tau_\alpha$ gives rise to new physical scales, and dynamo action is completely different from the white-noise case; e.g. even weak α fluctuations can give rise to a dynamo. We argue that, at any wavenumber, both Moffatt drift and Shear always contribute to increasing the growth rate. Two examples are presented: (a) a Moffatt drift dynamo in the absence of shear and (b) a Shear dynamo in the absence of Moffatt drift.

Political consultation and large-scale research

International Nuclear Information System (INIS)

Bechmann, G.; Folkers, H.

1977-01-01

Large-scale research and policy consulting occupy an intermediary position between sociological subsystems. While large-scale research coordinates science, policy and production, policy consulting coordinates science, policy and the political sphere. In this position, large-scale research and policy consulting lack the institutional guarantees and rational background guarantees that are characteristic of their sociological environment. Thus large-scale research can neither deal with the production of innovative goods with regard to profitability, nor can it hope for full recognition by the basic-research-oriented scientific community. Policy consulting possesses neither the decision-making competence assigned to the political system, nor can it stand up to the critical standards of established social science, at least as far as the present situation is concerned. This intermediary position of large-scale research and policy consulting has consequences in three respects that support the thesis that it represents a new form of institutionalization of science: 1) external control, 2) the form of organization, and 3) the theoretical conception of large-scale research and policy consulting. (orig.)
Oscillatory mechanisms of process binding in memory.

Science.gov (United States)

Klimesch, Wolfgang; Freunberger, Roman; Sauseng, Paul

2010-06-01

A central topic in cognitive neuroscience is the question of which processes underlie large-scale communication within and between different neural networks. The basic assumption is that oscillatory phase synchronization plays an important role in process binding (the transient linking of different cognitive processes), which may be considered a special type of large-scale communication. We investigate this question for memory processes on the basis of different types of oscillatory synchronization mechanisms. The reviewed findings suggest that theta and alpha phase coupling (and phase reorganization) reflect control processes in two large memory systems, a working memory and a complex knowledge system that comprises semantic long-term memory. It is suggested that alpha phase synchronization may be interpreted in terms of processes that coordinate top-down control (a process guided by expectancy to focus on relevant search areas) and access to memory traces (a process leading to the activation of a memory trace). An analogous interpretation is suggested for theta oscillations and the controlled access to episodic memories. Copyright (c) 2009 Elsevier Ltd. All rights reserved.
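One widely used way to quantify the phase coupling discussed above is the phase-locking value (PLV): the magnitude of the average unit phasor of the phase difference between two signals. The sketch below assumes the instantaneous phases have already been extracted (for example by band-pass filtering and a Hilbert transform, not shown). The abstract does not say which synchronization measure the reviewed studies used, and the function name is hypothetical.

```python
import cmath
import math

def phase_locking_value(phases_a, phases_b):
    """Magnitude of the mean unit phasor exp(i*(phi_a - phi_b)).

    1.0 means a perfectly constant phase lag; values near 0 mean no consistent lag.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)

# Two 6 Hz (theta-band) phase series sampled at 250 Hz with a fixed 0.5 rad lag.
fs, f = 250.0, 6.0
theta_a = [2 * math.pi * f * n / fs for n in range(500)]
theta_b = [p - 0.5 for p in theta_a]
print(round(phase_locking_value(theta_a, theta_b), 6))  # 1.0
```

Because the PLV only looks at the consistency of the lag, a constant offset of any size still gives 1.0; what lowers it is variability of the lag across samples or trials.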
Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

Science.gov (United States)

Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

2015-09-01

The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
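The access-pattern point above can be illustrated with the core SPH step, the density summation ρ_i = Σ_j m_j W(|x_i − x_j|, h). The sketch uses the standard 1D cubic-spline kernel and a "gather" loop in which each particle writes only its own density. This is the kind of loop an OpenMP parallel-for can split across threads without locks. It is an illustration under stated assumptions, not the authors' code: the neighbour search is brute force, the function names are hypothetical, and Python threads show the structure only, since the interpreter lock prevents a real speed-up for pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def cubic_spline_1d(r, h):
    """Standard 1D cubic-spline SPH kernel W(r, h) with compact support 2h."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalisation constant
    if q < 1.0:
        return sigma * (1.0 - 1.5 * q**2 + 0.75 * q**3)
    if q < 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0

def density_chunk(start, stop, x, m, h, rho):
    # Gather formulation: particle i reads its neighbours but writes only rho[i],
    # so different chunks never write the same slot (no atomics or locks needed).
    for i in range(start, stop):
        rho[i] = sum(m[j] * cubic_spline_1d(x[i] - x[j], h) for j in range(len(x)))

def sph_density(x, m, h, n_workers=4):
    rho = [0.0] * len(x)
    chunk = (len(x) + n_workers - 1) // n_workers
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(density_chunk, s, min(s + chunk, len(x)), x, m, h, rho)
                   for s in range(0, len(x), chunk)]
        for fut in futures:
            fut.result()  # re-raise any exception from a worker
    return rho

# 20 particles with spacing dx = h = 0.1 and mass dx * rho0, with rho0 = 1000.
x = [0.1 * i for i in range(20)]
m = [100.0] * 20
rho = sph_density(x, m, h=0.1)
print(round(rho[10], 6), round(rho[0], 3))
```

Interior particles recover ρ0 because, on this lattice, the kernel weights at offsets 0, ±h and ±2h sum to 1/h. The edge particles show the usual SPH density deficit caused by missing neighbours.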
High-performance computing — an overview

Science.gov (United States)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

Science.gov (United States)

Quealy, Angela; Cole, Gary L.; Blech, Richard A.

1993-01-01

The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
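The portability idea above is that application code calls one fixed set of communication primitives, and each machine supplies its own implementation. The abstract does not list APPL's actual routine names, so the Python classes below are hypothetical. They show a single send/recv interface with one shared-memory backend built on thread-safe queues. A distributed backend (for example over sockets) could implement the same two methods without any change to the application.

```python
import queue
import threading
from abc import ABC, abstractmethod

class Communicator(ABC):
    """Hypothetical portable interface: application code sees only send/recv."""
    @abstractmethod
    def send(self, dest, tag, data): ...
    @abstractmethod
    def recv(self, source, tag): ...

class SharedMemoryFabric:
    """Mailboxes keyed by (source, dest, tag), living in one shared address space."""
    def __init__(self):
        self._boxes = {}
        self._lock = threading.Lock()

    def box(self, src, dest, tag):
        with self._lock:  # create each mailbox exactly once, even under races
            return self._boxes.setdefault((src, dest, tag), queue.Queue())

class SharedMemoryComm(Communicator):
    def __init__(self, rank, fabric):
        self.rank, self.fabric = rank, fabric

    def send(self, dest, tag, data):
        self.fabric.box(self.rank, dest, tag).put(data)  # non-blocking: unbounded queue

    def recv(self, source, tag):
        return self.fabric.box(source, self.rank, tag).get()  # blocks until a message arrives

def worker(comm, n_ranks, results):
    # Ring exchange: send own rank to the next rank, receive from the previous one.
    comm.send((comm.rank + 1) % n_ranks, tag=0, data=comm.rank)
    results[comm.rank] = comm.recv((comm.rank - 1) % n_ranks, tag=0)

fabric, n = SharedMemoryFabric(), 4
results = [None] * n
threads = [threading.Thread(target=worker, args=(SharedMemoryComm(r, fabric), n, results))
           for r in range(n)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [3, 0, 1, 2]
```

The ring cannot deadlock here because sends never block. With bounded buffers, which message-passing systems often have, the ordering of sends and receives would matter.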
GPU-based large-scale visualization

KAUST Repository

Hadwiger, Markus

2013-11-19

Recent advances in image and volume acquisition as well as computational advances in simulation have led to an explosion of the amount of data that must be visualized and analyzed. Modern techniques combine the parallel processing power of GPUs with out-of-core methods and data streaming to enable the interactive visualization of giga- and terabytes of image and volume data. A major enabler for interactivity is making both the computational and the visualization effort proportional to the amount of data that is actually visible on screen, decoupling it from the full data size. This leads to powerful display-aware multi-resolution techniques that enable the visualization of data of almost arbitrary size. The course consists of two major parts: An introductory part that progresses from fundamentals to modern techniques, and a more advanced part that discusses details of ray-guided volume rendering, novel data structures for display-aware visualization and processing, and the remote visualization of large online data collections. You will learn how to develop efficient GPU data structures and large-scale visualizations, implement out-of-core strategies and concepts such as virtual texturing that have only been employed recently, as well as how to use modern multi-resolution representations. These approaches reduce the GPU memory requirements of extremely large data to a working set size that fits into current GPUs. You will learn how to perform ray-casting of volume data of almost arbitrary size and how to render and process gigapixel images using scalable, display-aware techniques. We will describe custom virtual texturing architectures as well as recent hardware developments in this area. We will also describe client/server systems for distributed visualization, on-demand data processing and streaming, and remote visualization. We will describe implementations using OpenGL as well as CUDA, exploiting parallelism on GPUs combined with additional asynchronous
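Virtual texturing, mentioned above, keeps a very large texture in out-of-core storage split into fixed-size pages. Only the pages that are actually sampled are held in a small physical cache, which stands in for GPU memory, and a page table maps virtual pages to cache slots. The CPU-side toy below assumes integer texel coordinates, a synchronous page loader and LRU eviction. Real systems do the lookup on the GPU and stream pages asynchronously, and the class and callback names here are hypothetical.

```python
from collections import OrderedDict

class VirtualTextureCache:
    """Toy page table: virtual page id -> (physical slot, page data), LRU-evicted."""

    def __init__(self, page_size, n_physical_slots, load_page):
        self.page_size = page_size
        self.load_page = load_page           # streams one page from out-of-core storage
        self.table = OrderedDict()           # insertion order doubles as recency order
        self.free_slots = list(range(n_physical_slots))
        self.misses = 0

    def sample(self, u, v):
        """Return the texel at integer virtual coordinates (u, v)."""
        page = (u // self.page_size, v // self.page_size)
        if page in self.table:
            self.table.move_to_end(page)     # hit: mark as most recently used
        else:
            self.misses += 1
            if not self.free_slots:          # cache full: evict the least recently used page
                _, (slot, _) = self.table.popitem(last=False)
                self.free_slots.append(slot)
            self.table[page] = (self.free_slots.pop(), self.load_page(page))
        _, data = self.table[page]
        return data[v % self.page_size][u % self.page_size]  # pages are stored row-major

PS = 4
def load_page(page):
    px, py = page
    # Synthetic page whose texels store their own global (u, v) coordinates.
    return [[(px * PS + x, py * PS + y) for x in range(PS)] for y in range(PS)]

vt = VirtualTextureCache(page_size=PS, n_physical_slots=2, load_page=load_page)
texels = [vt.sample(u, v) for u, v in [(0, 0), (1, 2), (5, 0), (9, 0), (0, 0)]]
print(vt.misses)  # 4
```

The memory footprint is bounded by the number of physical slots rather than by the texture size, which is the "working set that fits into current GPUs" property described in the abstract.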
Large-scale multimedia modeling applications

International Nuclear Information System (INIS)

Droppo, J.G. Jr.; Buck, J.W.; Whelan, G.; Strenge, D.L.; Castleton, K.J.; Gelston, G.M.

1995-08-01

Over the past decade, the US Department of Energy (DOE) and other agencies have faced increasing scrutiny for a wide range of environmental issues related to past and current practices. A number of large-scale applications have been undertaken that required analysis of large numbers of potential environmental issues over a wide range of environmental conditions and contaminants. Several of these applications, referred to here as large-scale applications, have addressed long-term public health risks using a holistic approach for assessing impacts from potential waterborne and airborne transport pathways. Multimedia models such as the Multimedia Environmental Pollutant Assessment System (MEPAS) were designed for use in such applications. MEPAS integrates radioactive and hazardous contaminants impact computations for major exposure routes via air, surface water, ground water, and overland flow transport. A number of large-scale applications of MEPAS have been conducted to assess various endpoints for environmental and human health impacts. These applications are described in terms of lessons learned in the development of an effective approach for large-scale applications
Computational design of RNA parts, devices, and transcripts with kinetic folding algorithms implemented on multiprocessor clusters.

Science.gov (United States)

Thimmaiah, Tim; Voje, William E; Carothers, James M

2015-01-01

With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.
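The high-throughput screening step above can be sketched as follows: enumerate a spacer library, evaluate every candidate concurrently and keep the best-ranked ones. The scoring function below is a hypothetical placeholder that returns GC content. In the procedure the chapter describes, this step would instead run many stochastic kinetic folding simulations (for example kinefold) on the full transcript, on a cluster rather than a single machine.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

def spacer_library(length, alphabet="ACGU"):
    """Every spacer sequence of the given length (len(alphabet) ** length candidates)."""
    return ["".join(p) for p in itertools.product(alphabet, repeat=length)]

def score_spacer(spacer):
    """Hypothetical placeholder: GC fraction of the spacer. A real workflow would
    launch folding simulations for the transcript containing this spacer here."""
    return sum(base in "GC" for base in spacer) / len(spacer)

def screen(spacers, n_workers=8, top=5):
    # If each evaluation shells out to an external folding program, threads are
    # enough to keep many simulations in flight: the heavy work runs in the
    # child processes, not in the Python interpreter.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(score_spacer, spacers))
    return sorted(zip(scores, spacers), reverse=True)[:top]

best = screen(spacer_library(3), top=3)
print(best)  # [(1.0, 'GGG'), (1.0, 'GGC'), (1.0, 'GCG')]
```

Because `pool.map` preserves input order, scores stay aligned with their spacers. Ties are broken here by reverse alphabetical order, which a real design criterion would replace.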
Associative-memory representations emerge as shared spatial patterns of theta activity spanning the primate temporal cortex.

Science.gov (United States)

Nakahara, Kiyoshi; Adachi, Ken; Kawasaki, Keisuke; Matsuo, Takeshi; Sawahata, Hirohito; Majima, Kei; Takeda, Masaki; Sugiyama, Sayaka; Nakata, Ryota; Iijima, Atsuhiko; Tanigawa, Hisashi; Suzuki, Takafumi; Kamitani, Yukiyasu; Hasegawa, Isao

2016-06-10

Highly localized neuronal spikes in primate temporal cortex can encode associative memory; however, whether memory formation involves area-wide reorganization of ensemble activity, which often accompanies rhythmicity, or just local microcircuit-level plasticity, remains elusive. Using high-density electrocorticography, we capture local-field potentials spanning the monkey temporal lobes, and show that the visual pair-association (PA) memory is encoded in spatial patterns of theta activity in areas TE, 36, and, partially, in the parahippocampal cortex, but not in the entorhinal cortex. The theta patterns elicited by learned paired associates are distinct between pairs, but similar within pairs. This pattern similarity, emerging through novel PA learning, allows a machine-learning decoder trained on theta patterns elicited by a particular visual item to correctly predict the identity of those elicited by its paired associate. Our results suggest that the formation and sharing of widespread cortical theta patterns via learning-induced reorganization are involved in the mechanisms of associative memory representation.
Aggregated Representation of Distribution Networks for Large-Scale Transmission Network Simulations

DEFF Research Database (Denmark)

Göksu, Ömer; Altin, Müfit; Sørensen, Poul Ejnar

2014-01-01

As a common practice in large-scale transmission network analysis, distribution networks have been represented as aggregated loads. However, with the increasing share of distributed generation, especially wind and solar power, in the distribution networks, it has become necessary to include ... the distributed generation in those analyses. In this paper, a practical methodology to obtain the aggregated behaviour of the distributed generation is proposed. The methodology, which is based on the use of the IEC standard wind turbine models, is applied to a benchmark distribution network via simulations...
A Temporal Domain Decomposition Algorithmic Scheme for Large-Scale Dynamic Traffic Assignment

Directory of Open Access Journals (Sweden)

Eric J. Nava

2012-03-01

This paper presents a temporal decomposition scheme for large spatial- and temporal-scale dynamic traffic assignment, in which the entire analysis period is divided into Epochs. Vehicle assignment is performed sequentially in each Epoch, thus improving the model scalability and confining the peak run-time memory requirement regardless of the total analysis period. A proposed self-tuning scheme adaptively searches for the run-time-optimal Epoch setting during iterations, regardless of the characteristics of the modeled network. Extensive numerical experiments confirm the promising performance of the proposed algorithmic schemes.
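The memory argument above can be made concrete with a minimal Epoch loop. Only vehicles that depart in the current Epoch, or are still en route from the previous one, are held at once, so peak memory follows the Epoch length rather than the full horizon. The sketch assumes a toy network model in which each trip has a fixed travel time. It omits the equilibrium iterations and the adaptive Epoch-length search, and the function names are hypothetical.

```python
import math

def run_epochs(horizon, epoch_length, demand_for, simulate_epoch):
    """Process the analysis period one Epoch at a time, carrying unfinished trips forward."""
    carried, peak = [], 0
    for k in range(math.ceil(horizon / epoch_length)):
        start = k * epoch_length
        end = min(start + epoch_length, horizon)
        active = carried + demand_for(start, end)     # only this Epoch's vehicles in memory
        peak = max(peak, len(active))
        carried = simulate_epoch(start, end, active)  # trips still en route at `end`
    return carried, peak

# Toy demand: one vehicle every 5 minutes, each needing 12 minutes to arrive.
TRIPS = [(vid, 5 * vid, 12) for vid in range(12)]     # (id, departure, travel time)

def demand_for(start, end):
    return [trip for trip in TRIPS if start <= trip[1] < end]

def simulate_epoch(start, end, active):
    # Stand-in for the traffic simulator: a trip finishes if it arrives by `end`.
    return [trip for trip in active if trip[1] + trip[2] > end]

still_en_route, peak_active = run_epochs(60, 15, demand_for, simulate_epoch)
print([t[0] for t in still_en_route], peak_active)    # [10, 11] 5
```

With 15-minute Epochs, at most 5 of the 12 trips are active at any time. Shorter Epochs lower the peak further but add per-Epoch overhead, which is the trade-off an Epoch-length search has to balance.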
Sex-dependent dissociation between emotional appraisal and memory: a large-scale behavioral and fMRI study

OpenAIRE

Spalek, Klara; Fastenrath, Matthias; Ackermann, Sandra; Auschra, Bianca; Coynel, David; Frey, Julia; Gschwind, Leo; Hartmann, Francina; van der Maarel, Nadine; Papassotiropoulos, Andreas; de Quervain, Dominique; Milnik, Annette

2015-01-01

Extensive evidence indicates that women outperform men in episodic memory tasks. Furthermore, women are known to evaluate emotional stimuli as more arousing than men. Because emotional arousal typically increases episodic memory formation, the females' memory advantage might be more pronounced for emotionally arousing information than for neutral information. Here, we report behavioral data from 3398 subjects, who performed picture rating and memory tasks, and corresponding fMRI data from up ...
DOLIB: Distributed Object Library

Energy Technology Data Exchange (ETDEWEB)

D'Azevedo, E.F.

1994-01-01

This report describes the use and implementation of DOLIB (Distributed Object Library), a library of routines that emulates global or virtual shared memory on Intel multiprocessor systems. Access to a distributed global array is through explicit calls to gather and scatter. Advantages of using DOLIB include: dynamic allocation and freeing of huge (gigabyte) distributed arrays, both C and FORTRAN callable interfaces, and the ability to mix shared-memory and message-passing programming models for ease of use and optimal performance. DOLIB is independent of language and compiler extensions and requires no special operating system support. DOLIB also supports automatic caching of read-only data for high performance. The virtual shared memory support provided in DOLIB is well suited for implementing Lagrangian particle tracking techniques. We have also used DOLIB to create DONIO (Distributed Object Network I/O Library), which obtains over a 10-fold improvement in disk I/O performance on the Intel Paragon.
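The access model described above (a global array split across processors and reachable only through explicit gather and scatter calls, with optional caching of read-only data) can be mimicked in a single process. In the sketch below, "processors" are just blocks of a Python list and each fetch stands in for a message. The class, its methods and the rule that any write clears the cache are assumptions for illustration, not DOLIB's real interface.

```python
class DistributedArray:
    """Toy block-distributed global array accessed only via gather/scatter."""

    def __init__(self, size, n_procs):
        self.block = -(-size // n_procs)    # ceil(size / n_procs) elements per processor
        self.parts = [[0.0] * max(0, min(self.block, size - p * self.block))
                      for p in range(n_procs)]
        self.fetches = 0                    # element transfers; each stands in for a message
        self._ro_cache = {}                 # local copies of read-only elements

    def _owner(self, i):
        return divmod(i, self.block)        # (owning processor, local offset)

    def gather(self, indices, cache=False):
        out = []
        for i in indices:
            if cache and i in self._ro_cache:
                out.append(self._ro_cache[i])   # served locally, no transfer
                continue
            proc, off = self._owner(i)
            self.fetches += 1
            value = self.parts[proc][off]
            if cache:
                self._ro_cache[i] = value
            out.append(value)
        return out

    def scatter(self, indices, values):
        self._ro_cache.clear()              # conservative: a write invalidates cached copies
        for i, value in zip(indices, values):
            proc, off = self._owner(i)
            self.parts[proc][off] = value

a = DistributedArray(size=10, n_procs=4)
a.scatter(range(10), [float(i * i) for i in range(10)])
print(a.gather([9, 0, 4]))  # [81.0, 0.0, 16.0]
a.gather([9, 9], cache=True)
print(a.fetches)  # 4
```

The second request for element 9 is served from the cache without a transfer. That saving is what makes caching attractive for read-mostly data such as the particle-tracking workloads the report mentions.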
DOLIB: Distributed Object Library

Energy Technology Data Exchange (ETDEWEB)

D'Azevedo, E.F.; Romine, C.H.

1994-10-01

This report describes the use and implementation of DOLIB (Distributed Object Library), a library of routines that emulates global or virtual shared memory on Intel multiprocessor systems. Access to a distributed global array is through explicit calls to gather and scatter. Advantages of using DOLIB include: dynamic allocation and freeing of huge (gigabyte) distributed arrays, both C and FORTRAN callable interfaces, and the ability to mix shared-memory and message-passing programming models for ease of use and optimal performance. DOLIB is independent of language and compiler extensions and requires no special operating system support. DOLIB also supports automatic caching of read-only data for high performance. The virtual shared memory support provided in DOLIB is well suited for implementing Lagrangian particle tracking techniques. We have also used DOLIB to create DONIO (Distributed Object Network I/O Library), which obtains over a 10-fold improvement in disk I/O performance on the Intel Paragon.
MFTF supervisory control and diagnostics system hardware

International Nuclear Information System (INIS)

Butner, D.N.

1979-01-01

The Supervisory Control and Diagnostics System (SCDS) for the Mirror Fusion Test Facility (MFTF) is a multiprocessor minicomputer system designed so that for most single-point failures, the hardware may be quickly reconfigured to provide continued operation of the experiment. The system is made up of nine Perkin-Elmer computers - a mixture of 8/32's and 7/32's. Each computer has ports on a shared memory system consisting of two independent shared memory modules. Each processor can signal other processors through hardware external to the shared memory. The system communicates with the Local Control and Instrumentation System, which consists of approximately 65 microprocessors. Each of the six system processors has facilities for communicating with a group of microprocessors; the groups consist of from four to 24 microprocessors. There are hardware switches so that if an SCDS processor communicating with a group of microprocessors fails, another SCDS processor takes over the communication
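The takeover behaviour above (a surviving SCDS processor assumes communication with the microprocessor group of a failed one) comes down to reassigning groups to processors. The abstract says this is done with hardware switches and does not specify a policy. The sketch below assumes a simple "least-loaded survivor" rule and models only the bookkeeping, with hypothetical group and processor names.

```python
def reassign_on_failure(assignment, processors, failed):
    """assignment maps each microprocessor group to the SCDS processor serving it.
    Groups served by `failed` move to the surviving processor with the fewest
    groups (ties broken by processor name). Returns a new mapping."""
    survivors = [p for p in processors if p != failed]
    if not survivors:
        raise RuntimeError("no surviving processor can take over")
    new = dict(assignment)
    for group in sorted(g for g, p in assignment.items() if p == failed):
        load = {p: 0 for p in survivors}
        for p in new.values():
            if p in load:
                load[p] += 1
        new[group] = min(survivors, key=lambda p: (load[p], p))
    return new

before = {"g1": "P1", "g2": "P1", "g3": "P2", "g4": "P3"}
after = reassign_on_failure(before, ["P1", "P2", "P3"], failed="P1")
print(after)  # {'g1': 'P2', 'g2': 'P3', 'g3': 'P2', 'g4': 'P3'}
```

Spreading the orphaned groups across survivors avoids overloading a single backup processor. A fixed primary/backup pairing would be simpler to wire in hardware.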
A high speed multi-tasking, multi-processor telemetry system

Energy Technology Data Exchange (ETDEWEB)

Wu, Kung Chris [Univ. of Texas, El Paso, TX (United States)]

1996-12-31

This paper describes a small, lightweight, multitasking, multiprocessor telemetry system capable of collecting 32 channels of differential signals at a sampling rate of 6.25 kHz per channel. The system is designed to collect data from remote wind turbine research sites and transfer the data via wireless communication. A description of operational theory, hardware components, and itemized cost is provided. Synchronization with other data acquisition systems and test data on data transmission rates are also given. 11 refs., 7 figs., 4 tabs.
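The stated channel count and per-channel sampling rate fix the aggregate sample rate: 32 × 6,250 = 200,000 samples per second. The raw bit rate also depends on the sample width, which the abstract does not give. The 16 bits used below is an assumption, and framing or protocol overhead is ignored.

```python
def telemetry_rates(channels, rate_hz, bits_per_sample):
    """Aggregate sample rate and raw payload bit rate, excluding protocol overhead."""
    samples_per_s = channels * rate_hz
    return samples_per_s, samples_per_s * bits_per_sample

samples, bits = telemetry_rates(32, 6250, 16)  # 16-bit samples are an assumption
print(samples, bits / 1e6)  # 200000 3.2
```

Under that assumption, the wireless link must sustain at least 3.2 Mbit/s of payload for continuous streaming, or the data must be buffered and sent in bursts.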
Multi-scale analysis of collective behavior in 2D self-propelled particle models of swarms: An Advection-Diffusion with Memory Approach

Science.gov (United States)

Raghib, Michael; Levin, Simon; Kevrekidis, Ioannis

2010-05-01

2. At long times, the msd of the centroid walk scales linearly with time for naïve groups (diffusion), but shows a sharp transition to quadratic scaling (advection) for informed ones. These observations suggest that the mesoscopic variables of interest are the magnitude of the drift, the diffusion coefficient, and the time scales at which the anomalous and the asymptotic behavior, respectively, dominate transport, the latter being linked to the time scale at which the group reaches a decision. In order to estimate these summary statistics from the msd, we assumed that the configuration centroid follows an uncoupled Continuous Time Random Walk (CTRW) with smooth jump and waiting-time pdfs. The mesoscopic transport equation for this type of random walk corresponds to an Advection-Diffusion Equation with Memory (ADEM). The introduction of memory, and thus of non-Markovian effects, is necessary in order to correctly account for the two time scales present. Although we were not able to calculate the memory directly from the individual-level rules, we show that it can be estimated from a single, relatively short simulation run using a Mittag-Leffler function as a template. With this function it is possible to accurately predict the behavior of the msd, as well as the full pdf for the position of the centroid. The resulting ADEM is self-consistent in the sense that transport parameters estimated from the memory via a Kubo relationship coincide with those estimated from the moments of the jump-size pdf of the associated CTRW for a large number of group sizes, proportions of informed individuals, and degrees of bias along the preferred direction. We also discuss the phase diagrams for the transport coefficients estimated with this method, in which we observe velocity-precision trade-offs, where precision is a measure of the deviation of realized group orientations with respect to the informed direction. We also note that the time scale to collective decision is invariant
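The diffusive-versus-advective distinction above is read from how the msd grows with time: msd ∝ t gives a log-log slope of 1, and msd ∝ t² gives a slope of 2. The sketch below fits that slope by least squares over a time window. It applies the fit to a synthetic msd that combines drift and diffusion, with an assumed speed, diffusion coefficient and dimension. This is only the simpler first step of reading off scaling regimes. It is not the authors' CTRW/ADEM estimator with a Mittag-Leffler memory template.

```python
import math

def msd_scaling_exponent(times, msd):
    """Least-squares slope of log(msd) versus log(t): about 1 for diffusive
    spreading, about 2 for ballistic (advective) motion."""
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def log_spaced(t0, decades=1, n=10):
    return [t0 * 10 ** (decades * k / (n - 1)) for k in range(n)]

# Synthetic centroid msd with drift and diffusion (all parameter values assumed).
v, D, dim = 0.3, 0.5, 2
def msd_model(t):
    return (v * t) ** 2 + 2 * dim * D * t   # ballistic drift term + diffusive term

early, late = log_spaced(0.1), log_spaced(1000.0)
print(msd_scaling_exponent(early, [msd_model(t) for t in early]))  # close to 1
print(msd_scaling_exponent(late, [msd_model(t) for t in late]))    # close to 2
```

The crossover between the two windows occurs around t ≈ 2·dim·D / v², where the drift and diffusion terms are equal. This is one simple way to see why two time scales must be captured.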
Sex-dependent dissociation between emotional appraisal and memory: a large-scale behavioral and fMRI study.

Science.gov (United States)

Spalek, Klara; Fastenrath, Matthias; Ackermann, Sandra; Auschra, Bianca; Coynel, David; Frey, Julia; Gschwind, Leo; Hartmann, Francina; van der Maarel, Nadine; Papassotiropoulos, Andreas; de Quervain, Dominique; Milnik, Annette

2015-01-21

Extensive evidence indicates that women outperform men in episodic memory tasks. Furthermore, women are known to evaluate emotional stimuli as more arousing than men. Because emotional arousal typically increases episodic memory formation, the females' memory advantage might be more pronounced for emotionally arousing information than for neutral information. Here, we report behavioral data from 3398 subjects, who performed picture rating and memory tasks, and corresponding fMRI data from up to 696 subjects. We were interested in the interaction between sex and valence category on emotional appraisal, memory performances, and fMRI activity. The behavioral results showed that females evaluate in particular negative (p pictures, as emotionally more arousing (pinteraction recall females outperformed males not only in positive (p picture recall (p pictures (pinteraction memory advantage during free recall was absent in a recognition setting. We identified activation differences in fMRI, which corresponded to the females' stronger appraisal of especially negative pictures, but no activation differences that reflected the interaction effect in the free recall memory task. In conclusion, females' valence-category-specific memory advantage is only observed in a free recall, but not a recognition setting and does not depend on females' higher emotional appraisal. Copyright © 2015 the authors 0270-6474/15/350920-16$15.00/0.
Quality-driven model-based design of multi-processor accelerators : an application to LDPC decoders

NARCIS (Netherlands)

Jan, Y.

2012-01-01

The recent spectacular progress in nano-electronic technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs). However, in parallel, new highly demanding complex embedded applications are emerging, in fields like communication and networking,
Effects of motor congruence on visual working memory.

Science.gov (United States)

Quak, Michel; Pecher, Diane; Zeelenberg, Rene

2014-10-01

Grounded-cognition theories suggest that memory shares processing resources with perception and action. The motor system could be used to help memorize visual objects. In two experiments, we tested the hypothesis that people use motor affordances to maintain object representations in working memory. Participants performed a working memory task on photographs of manipulable and nonmanipulable objects. The manipulable objects were objects that required either a precision grip (i.e., small items) or a power grip (i.e., large items) to use. A concurrent motor task that could be congruent or incongruent with the manipulable objects caused no difference in working memory performance relative to nonmanipulable objects. Moreover, the precision- or power-grip motor task did not affect memory performance on small and large items differently. These findings suggest that the motor system plays no part in visual working memory.

Decentralized Large-Scale Power Balancing

DEFF Research Database (Denmark)

Halvgaard, Rasmus; Jørgensen, John Bagterp; Poulsen, Niels Kjølstad

2013-01-01

problem is formulated as a centralized large-scale optimization problem but is then decomposed into smaller subproblems that are solved locally by each unit connected to an aggregator. For large-scale systems the method is faster than solving the full problem and can be distributed to include an arbitrary...
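The decomposition above (a centralized balancing problem split into local subproblems coordinated by an aggregator) can be illustrated with price-based dual decomposition. This is one common such scheme, and the abstract does not confirm it is the authors' method. Each unit is assumed to have a quadratic cost a_i/2 · p_i² and no capacity limits. Given a price λ, a unit's local optimum is p_i = λ / a_i. The aggregator sees only the total and adjusts λ until supply matches demand.

```python
def balance_power(costs, demand, step=0.5, iters=100, tol=1e-9):
    """Dual decomposition for: minimize sum(a_i/2 * p_i**2) subject to sum(p_i) == demand.

    Each unit solves its own subproblem; the aggregator updates a single price.
    Converges when 0 < step < 2 / sum(1/a_i).
    """
    price = 0.0
    for _ in range(iters):
        powers = [price / a for a in costs]   # solved locally by each unit
        mismatch = demand - sum(powers)       # the only quantity the aggregator needs
        if abs(mismatch) < tol:
            break
        price += step * mismatch              # raise the price when supply is short
    return price, powers

price, powers = balance_power([1.0, 2.0, 4.0], demand=7.0)
print(round(price, 6), [round(p, 6) for p in powers])  # 4.0 [4.0, 2.0, 1.0]
```

The exact optimum here is λ* = demand / Σ(1/a_i) = 7 / 1.75 = 4. Each iteration costs one broadcast of the price and one aggregated reply, independent of the number of units, which is where the scalability of decomposition comes from.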
Behavior characterization of the shared last-level cache in a chip multiprocessor

OpenAIRE

Benedicte Illescas, Pedro

2014-01-01

This project consists of analyzing different aspects of the memory hierarchy and understanding their influence on overall system performance. The aspects analyzed are cache replacement algorithms, memory mapping schemes and memory page policies.
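The effect of a cache replacement algorithm can be seen by replaying the same access trace under different policies and counting misses. The sketch compares LRU and FIFO on a fully associative cache. This is a deliberate simplification: a shared last-level cache is set-associative, and its address-to-set mapping scheme is another of the aspects the project studies. The trace and function name are illustrative.

```python
from collections import OrderedDict, deque

def count_misses(trace, capacity, policy):
    """Misses of a fully associative cache of `capacity` blocks under 'LRU' or 'FIFO'."""
    misses = 0
    if policy == "LRU":
        cache = OrderedDict()
        for block in trace:
            if block in cache:
                cache.move_to_end(block)        # a hit refreshes recency
            else:
                misses += 1
                if len(cache) == capacity:
                    cache.popitem(last=False)   # evict the least recently used block
                cache[block] = True
        return misses
    if policy == "FIFO":
        resident, order = set(), deque()
        for block in trace:
            if block not in resident:           # hits do not change FIFO order
                misses += 1
                if len(resident) == capacity:
                    resident.discard(order.popleft())  # evict the oldest insertion
                resident.add(block)
                order.append(block)
        return misses
    raise ValueError(f"unknown policy: {policy}")

trace = [1, 2, 3, 1, 4, 1, 5]
print(count_misses(trace, 3, "LRU"), count_misses(trace, 3, "FIFO"))  # 5 6
```

On this trace, block 1 is reused often. LRU keeps it resident, while FIFO evicts it simply because it was loaded first, which costs one extra miss.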
Discrete memory impairments in largely pure chronic users of MDMA.

Science.gov (United States)

Wunderli, Michael D; Vonmoos, Matthias; Fürst, Marina; Schädelin, Katrin; Kraemer, Thomas; Baumgartner, Markus R; Seifritz, Erich; Quednow, Boris B

2017-10-01

Chronic use of 3,4-methylenedioxymethamphetamine (MDMA, "ecstasy") has repeatedly been associated with deficits in working memory, declarative memory, and executive functions. However, previous findings regarding working memory and executive function remain inconclusive, as most studies did not adequately control for concomitant stimulant use, which is known to affect these functions. Therefore, we compared the cognitive performance of 26 stimulant-free and largely pure (primary) MDMA users, 25 stimulant-using polydrug MDMA users, and 56 MDMA/stimulant-naïve controls by applying a comprehensive neuropsychological test battery. Neuropsychological tests were grouped into four cognitive domains. Recent drug use was objectively quantified by 6-month hair analyses for 17 substances and metabolites. Considerably lower mean hair concentrations of stimulants (amphetamine, methamphetamine, methylphenidate, cocaine), opioids (morphine, methadone, codeine), and hallucinogens (ketamine, 2C-B) were detected in primary compared with polydrug users, whereas the two user groups did not differ in MDMA hair concentration. Cohen's d effect sizes for both comparisons, i.e., primary MDMA users vs. controls and polydrug MDMA users vs. controls, were highest for declarative memory (d primary = 0.90, d polydrug = 1.21), followed by working memory (d primary = 0.52, d polydrug = 0.96), executive functions (d primary = 0.46, d polydrug = 0.86), and attention (d primary = 0.23, d polydrug = 0.70). Thus, primary MDMA users showed strong and relatively discrete declarative memory impairments, whereas polydrug MDMA users displayed broad and unspecific cognitive impairments. Consequently, even largely pure chronic MDMA use is associated with decreased performance in declarative memory, while the additional deficits in working memory and executive functions displayed by polydrug MDMA users are likely driven by stimulant co-use. Copyright © 2017 Elsevier B.V. and ECNP. All rights reserved.
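The effect sizes above are Cohen's d values. The abstract does not state which variant was used. The most common form divides the difference in group means by the pooled standard deviation, as in the sketch below. The sample numbers are illustrative only, not data from the study.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: (mean_a - mean_b) / pooled SD, using sample variances (n - 1)."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

controls = [10, 12, 14]   # illustrative test scores
users = [8, 10, 12]
print(cohens_d(controls, users))  # 1.0
```

With controls first, a positive d means the user group scored lower on a higher-is-better test. By Cohen's conventional benchmarks, values around 0.2, 0.5 and 0.8 are read as small, medium and large.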
On a Multiprocessor Computer Farm for Online Physics Data Processing

CERN Document Server

Sinanis, N J

1999-01-01

The topic of this thesis is the design-phase performance evaluation of a large multiprocessor (MP) computer farm intended for the on-line data processing of the Compact Muon Solenoid (CMS) experiment. CMS is a high-energy physics experiment, planned to operate at CERN (Geneva, Switzerland) in 2005. The CMS computer farm consists of 1,000 MP computer systems and a 1,000 × 1,000 communications switch. The farm performance is evaluated through simulation studies and through evaluation of small prototype systems that serve as building blocks of the farm. For the purposes of the simulation studies, we have developed a discrete-event, event-driven simulator that is capable of describing the high-level architecture of the farm and giving estimates of the farm's performance. The simulator is designed in a modular way to facilitate the development of various modules that model the behavior of the farm building blocks at the desired level of detail. With the aid of this simulator, we make a particular...
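The core of the discrete-event, event-driven simulator described above is a time-ordered event queue. The simulation repeatedly pops the earliest event, updates the system state and schedules any follow-up events. The minimal sketch below models identical processing nodes fed from one FIFO queue. The thesis's simulator is modular and far more detailed, and the names and the single-queue model are assumptions.

```python
import heapq
from collections import deque

def simulate_farm(arrivals, service_times, n_nodes):
    """Event-driven simulation of n identical nodes behind one FIFO queue.
    Returns the completion time of each input event."""
    events = [(t, i, "arrive", i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    seq = len(arrivals)           # unique tie-breaker, so labels are never compared
    waiting, idle, done = deque(), n_nodes, {}
    while events:
        now, _, kind, i = heapq.heappop(events)
        if kind == "arrive":
            waiting.append(i)
        else:                     # "finish": record completion and free the node
            done[i] = now
            idle += 1
        while idle and waiting:   # dispatch queued work to idle nodes
            j = waiting.popleft()
            idle -= 1
            heapq.heappush(events, (now + service_times[j], seq, "finish", j))
            seq += 1
    return [done[i] for i in range(len(arrivals))]

print(simulate_farm([0, 0, 0, 1], [2, 2, 2, 2], n_nodes=2))  # [2, 2, 4, 4]
```

Simulated time jumps directly from one event to the next instead of advancing in fixed steps. This lets such a simulator cover long periods of farm operation cheaply, and more detailed building-block models can be plugged in as new event types.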
SharePoint governance

OpenAIRE

Ali, Mudassar

2013-01-01

Master's thesis in information and communication technology, IKT590, 2013, University of Agder, Grimstad. SharePoint is a web-based business collaboration platform from Microsoft which is very robust and dynamic in nature. The platform has been on the market for more than a decade and has been adopted by a large number of organisations worldwide. The platform has become larger in scale, richer in features and is improving consistently with every new version. However, SharePoint ...
Some algorithms for the solution of the symmetric eigenvalue problem on a multiprocessor electronic computer

International Nuclear Information System (INIS)

Molchanov, I.N.; Khimich, A.N.

1984-01-01

This article shows how a reflection method can be used to find the eigenvalues of a matrix by transforming the matrix to tridiagonal form. The method of conjugate gradients is used to find the smallest eigenvalue and the corresponding eigenvector of symmetric positive-definite band matrices. Topics considered include the computational scheme of the reflection method, the organization of parallel calculations by the reflection method, the computational scheme of the conjugate gradient method, the organization of parallel calculations by the conjugate gradient method, and the effectiveness of parallel algorithms. It is concluded that it is possible to increase the overall effectiveness of the multiprocessor electronic computers by either letting the newly available processors of a new problem operate in the multiprocessor mode, or by improving the coefficient of uniform partition of the original information
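The reflection method above is most commonly realized with Householder reflections. For each column k, a reflector H = I − 2vvᵀ (with ‖v‖ = 1) zeroes the entries below the subdiagonal, and the similarity transform A ← HAH keeps the matrix symmetric and leaves its eigenvalues unchanged. The sequential sketch below forms each H explicitly for clarity, which is wasteful at O(n³) per step. It does not show the parallel organization studied in the article.

```python
import math

def matmul(p, q):
    return [[sum(p[i][t] * q[t][j] for t in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]

def householder_tridiagonalize(a):
    """Reduce a symmetric matrix (list of lists) to tridiagonal form by A <- H A H."""
    n = len(a)
    a = [row[:] for row in a]
    for k in range(n - 2):
        x = [a[i][k] for i in range(k + 1, n)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue                                   # column already reduced
        alpha = -norm_x if x[0] >= 0 else norm_x       # sign choice avoids cancellation
        v = x[:]
        v[0] -= alpha                                  # v = x - alpha * e1, so H x = alpha * e1
        norm_v = math.sqrt(sum(vi * vi for vi in v))
        v = [0.0] * (k + 1) + [vi / norm_v for vi in v]
        h = [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] for j in range(n)]
             for i in range(n)]
        a = matmul(matmul(h, a), h)
    return a

A = [[4.0, 1.0, -2.0, 2.0],
     [1.0, 2.0, 0.0, 1.0],
     [-2.0, 0.0, 3.0, -2.0],
     [2.0, 1.0, -2.0, -1.0]]
T = householder_tridiagonalize(A)
for row in T:
    print(["%.4f" % value for value in row])
```

Because each step is an orthogonal similarity transform, the trace and the Frobenius norm of T equal those of A. This gives a cheap correctness check before the tridiagonal eigenvalue solver runs.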
Automating large-scale reactor systems

International Nuclear Information System (INIS)

Kisner, R.A.

1985-01-01

This paper conveys a philosophy for developing automated large-scale control systems that behave in an integrated, intelligent, flexible manner. Methods for operating large-scale systems under varying degrees of equipment degradation are discussed, and a design approach that separates the effort into phases is suggested. 5 refs., 1 fig
Analogical reasoning in working memory: resources shared among relational integration, interference resolution, and maintenance.

Science.gov (United States)

Cho, Soohyun; Holyoak, Keith J; Cannon, Tyrone D

2007-09-01

We report a series of experiments using a pictorial analogy task designed to manipulate relational integration, interference resolution, and active maintenance simultaneously. The difficulty of the problems was varied in terms of the number of relations to be integrated, the need for interference resolution, and the duration of maintenance required to correctly solve the analogy. The participants showed decreases in performance when integrating multiple relations, as compared with a single relation, and when interference resolution was required in solving the analogy. When the participants were required to integrate multiple relations while simultaneously engaged in interference resolution, performance was worse, as compared with problems that incorporated either of these features alone. Maintenance of information across delays in the range of 1-4.5 sec led to greater decrements in visual memory, as compared with analogical reasoning. Misleading information caused interference when it had been necessarily attended to and maintained in working memory and, hence, had to be actively suppressed. However, sources of conflict within information that had not been attended to or encoded into working memory did not interfere with the ongoing controlled information processing required for relational integration. The findings provide evidence that relational integration and interference resolution depend on shared cognitive resources in working memory during analogical reasoning.
2: Local area networks as a multiprocessor treatment planning system

International Nuclear Information System (INIS)

Neblett, D.L.; Hogan, S.E.

1987-01-01

The creation of a local area network (LAN) of interconnected computers provides a multicomputer environment that adds a new dimension to treatment planning. A LAN system makes it possible to have two or more computers working on a plan in parallel. With high-speed interprocessor transfer, time-consuming tasks such as correcting several individual beams for contours and inhomogeneities can be performed simultaneously, effectively creating a parallel multiprocessor treatment planning system.
When the globe is your classroom: teaching and learning about large-scale environmental change online

Science.gov (United States)

Howard, E. A.; Coleman, K. J.; Barford, C. L.; Kucharik, C.; Foley, J. A.

2005-12-01

Understanding environmental problems that cross physical and disciplinary boundaries requires a more holistic view of the world - a "systems" approach. Yet it is a challenge for many learners to start thinking this way, particularly when the problems are large in scale and not easily visible. We will describe our online university course, "Humans and the Changing Biosphere," which takes a whole-systems perspective for teaching regional to global-scale environmental science concepts, including climate, hydrology, ecology, and human demographics. We will share our syllabus and learning objectives and summarize our efforts to incorporate "best" practices for online teaching. We will describe challenges we have faced, and our efforts to reach different learner types. Our goals for this presentation are: (1) to communicate how a systems approach ties together environmental sciences (including climate, hydrology, ecology, biogeochemistry, and demography) that are often taught as separate disciplines; (2) to generate discussion about challenges of teaching large-scale environmental processes; (3) to share our experiences in teaching these topics online; (4) to receive ideas and feedback on future teaching strategies. We will explain why we developed this course online, and share our experiences about benefits and challenges of teaching over the web - including some suggestions about how to use technology to supplement face-to-face learning experiences (and vice versa). We will summarize assessment data about what students learned during the course, and discuss key misconceptions and barriers to learning. We will highlight the role of an online discussion board in creating classroom community, identifying misconceptions, and engaging different types of learners.
The Wechsler Memory Scale: A Review of Research.

Science.gov (United States)

Ivison, David

1990-01-01

Research on the standardization, reliability, validity, factor structure, and subtests of the Wechsler Memory Scale (WMS) (1945) and its revised version (1987) is reviewed. Much research relating to the WMS appears to be relevant to the revised version. Use of the instrument in Australia is discussed. (SLD)
Efficacy of the SU(3) scheme for ab initio large-scale calculations beyond the lightest nuclei

Energy Technology Data Exchange (ETDEWEB)

Dytrych, T. [Academy of Sciences of the Czech Republic (ASCR), Prague (Czech Republic); Louisiana State Univ., Baton Rouge, LA (United States); Maris, Pieter [Iowa State Univ., Ames, IA (United States); Launey, K. D. [Louisiana State Univ., Baton Rouge, LA (United States); Draayer, J. P. [Louisiana State Univ., Baton Rouge, LA (United States); Vary, James [Iowa State Univ., Ames, IA (United States); Langr, D. [Czech Technical Univ., Prague (Czech Republic); Aerospace Research and Test Establishment, Prague (Czech Republic); Saule, E. [Univ. of North Carolina, Charlotte, NC (United States); Caprio, M. A. [Univ. of Notre Dame, IN (United States); Catalyurek, U. [The Ohio State Univ., Columbus, OH (United States). Dept. of Electrical and Computer Engineering; Sosonkina, M. [Old Dominion Univ., Norfolk, VA (United States)

2016-06-09

We report on the computational characteristics of ab initio nuclear structure calculations in a symmetry-adapted no-core shell model (SA-NCSM) framework. We examine the computational complexity of the current implementation of the SA-NCSM approach, dubbed LSU3shell, by analyzing ab initio results for ⁶Li and ¹²C in large harmonic oscillator model spaces and SU(3)-selected subspaces. We demonstrate LSU3shell's strong-scaling properties achieved with highly-parallel methods for computing the many-body matrix elements. Results compare favorably with complete model space calculations, and significant memory savings are achieved in physically important applications. In particular, a well-chosen symmetry-adapted basis affords memory savings in calculations of states with a fixed total angular momentum in large model spaces while exactly preserving translational invariance.
Implementation of a large-scale hospital information infrastructure for multi-unit health-care services.

Science.gov (United States)

Yoo, Sun K; Kim, Dong Keun; Kim, Jung C; Park, Youn Jung; Chang, Byung Chul

2008-01-01

With the increase in demand for high-quality medical services, an innovative hospital information system has become essential. An improved system has been implemented in all hospital units of the Yonsei University Health System. Interoperability among the multiple units required an appropriate hardware infrastructure and software architecture. This large-scale hospital information system encompassed PACS (Picture Archiving and Communications Systems), EMR (Electronic Medical Records) and ERP (Enterprise Resource Planning). It involved two tertiary hospitals and 50 community hospitals. The integrated hospital information system produces about 1.8 TByte of data per month, and the total quantity of data produced so far is about 60 TByte. Large-scale information exchange and sharing will be particularly useful for telemedicine applications.
Large-Scale Ocean Circulation-Cloud Interactions Reduce the Pace of Transient Climate Change

Science.gov (United States)

Trossman, D. S.; Palter, J. B.; Merlis, T. M.; Huang, Y.; Xia, Y.

2016-01-01

Changes to the large scale oceanic circulation are thought to slow the pace of transient climate change due, in part, to their influence on radiative feedbacks. Here we evaluate the interactions between CO2-forced perturbations to the large-scale ocean circulation and the radiative cloud feedback in a climate model. Both the change of the ocean circulation and the radiative cloud feedback strongly influence the magnitude and spatial pattern of surface and ocean warming. Changes in the ocean circulation reduce the amount of transient global warming caused by the radiative cloud feedback by helping to maintain low cloud coverage in the face of global warming. The radiative cloud feedback is key in affecting atmospheric meridional heat transport changes and is the dominant radiative feedback mechanism that responds to ocean circulation change. Uncertainty in the simulated ocean circulation changes due to CO2 forcing may contribute a large share of the spread in the radiative cloud feedback among climate models.
The Development of Time-Based Prospective Memory in Childhood: The Role of Working Memory Updating

Science.gov (United States)

Voigt, Babett; Mahy, Caitlin E. V.; Ellis, Judi; Schnitzspahn, Katharina; Krause, Ivonne; Altgassen, Mareike; Kliegel, Matthias

2014-01-01

This large-scale study examined the development of time-based prospective memory (PM) across childhood and the roles that working memory updating and time monitoring play in driving age effects in PM performance. One hundred and ninety-seven children aged 5 to 14 years completed a time-based PM task where working memory updating load was…
The Software Reliability of Large Scale Integration Circuit and Very Large Scale Integration Circuit

OpenAIRE

Artem Ganiyev; Jan Vitasek

2010-01-01

This article describes a method for evaluating the fault-free operation of large scale integration (LSI) and very large scale integration (VLSI) circuits. It presents a comparative analysis of the factors that determine the fault-free operation of integrated circuits, an analysis of existing methods, and a model for evaluating the fault-free operation of LSI and VLSI circuits. The main part describes a proposed algorithm and program for analyzing the fault rate of LSI and VLSI circuits.
Camera memory study for large space telescope. [charge coupled devices]

Science.gov (United States)

Hoffman, C. P.; Brewer, J. E.; Brager, E. A.; Farnsworth, D. L.

1975-01-01

Specifications were developed for a memory system to be used as the storage medium for camera detectors on the large space telescope (LST) satellite. Detectors with limited internal storage time, such as intensified charge coupled devices and silicon intensified targets, are implied. The general characteristics of different approaches to the memory system are reported, with comparisons made within the guidelines set forth for the LST application. Comparisons are prioritized on the basis of cost, reliability, power, and physical characteristics. Specific rationales are provided for rejecting unsuitable memory technologies. A recommended technology was selected and used to establish specifications for a breadboard memory. Procurement scheduling is provided for delivery of system breadboards in 1976, prototypes in 1978, and space-qualified units in 1980.
A large-scale application of the Kalman alignment algorithm to the CMS tracker

International Nuclear Information System (INIS)

Widl, E; Fruehwirth, R

2008-01-01

The Kalman alignment algorithm has been developed specifically to cope with the demands that arise from the specifications of the CMS Tracker. The algorithmic concept is based on the Kalman filter formalism and is designed to avoid the inversion of large matrices. Most notably, the algorithm strikes a balance between conventional global and local track-based alignment algorithms by restricting the computation of alignment parameters to the alignable objects hit by the same track together with all other alignable objects that are significantly correlated with them. Nevertheless, this feature comes with various trade-offs: mechanisms are needed that determine which alignable objects are significantly correlated and that keep track of these correlations. Due to the large number of alignable objects involved in each update (at least compared with local alignment algorithms), the time spent retrieving and writing alignment parameters, as well as the required user memory, becomes a significant factor. The large-scale test presented here applies the Kalman alignment algorithm to the (misaligned) CMS Tracker barrel and demonstrates the feasibility of the algorithm in a realistic scenario. It is shown that both the computation time and the amount of required user memory are within reasonable bounds, given the available computing resources, and that the obtained results are satisfactory.
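To illustrate how a Kalman formalism can avoid inverting large matrices, the minimal Python sketch below handles the simplest case: one scalar measurement m = h·x + noise updates a parameter vector x and its covariance P using only a scalar division. It is a generic Kalman measurement update, not the CMS implementation; the function name and the linear measurement model are assumptions for illustration.

```python
def kalman_update(x, p, h, m, r):
    """One Kalman measurement update for a scalar measurement m = h . x + noise.

    x: state estimate (list), p: symmetric covariance (list of lists),
    h: measurement row (list), r: measurement noise variance.
    Only the scalar innovation variance s is inverted, never a matrix.
    """
    n = len(x)
    ph = [sum(p[i][j] * h[j] for j in range(n)) for i in range(n)]  # P h^T
    s = sum(h[i] * ph[i] for i in range(n)) + r  # innovation variance (scalar)
    k = [v / s for v in ph]  # Kalman gain
    resid = m - sum(h[i] * x[i] for i in range(n))  # measurement residual
    x_new = [x[i] + k[i] * resid for i in range(n)]
    # P - K h P; since P is symmetric, (h P)_j equals (P h^T)_j
    p_new = [[p[i][j] - k[i] * ph[j] for j in range(n)] for i in range(n)]
    return x_new, p_new
```

Feeding a few noise-free measurement directions repeatedly drives x to the true values and shrinks P. In a track-based alignment, h would play the role of the derivative of a track residual with respect to the alignment parameters of the modules the track crosses; that mapping is our reading, not a detail given in the abstract.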
Reducing the market impact of large shares of intermittent energy in Denmark

DEFF Research Database (Denmark)

Jacobsen, Henrik; Zvingilaite, Erika

2010-01-01

The increasing prevalence of renewable and intermittent energy sources in the electricity system is creating new challenges for the interaction of the system. In Denmark, high renewable shares have been achieved without great difficulty, mainly due to the flexibility of the nearby Nordic hydro......-power dominated system. Further increases in the share of renewable energy sources require that additional options are considered to facilitate integration with the lowest possible cost. With large shares of intermittent energy, the impact can be observed on wholesale prices, giving both lower prices and higher...... and the attractiveness of additional interconnection capacity. This paper also analyses options for increasing the flexibility of heat generation involving large and decentralized CHP plants and heat generation based on electricity. The incentives that the market provides for shifting demand and using electricity...
The broadcast of shared attention and its impact on political persuasion.

Science.gov (United States)

Shteynberg, Garriy; Bramlett, James M; Fles, Elizabeth H; Cameron, Jaclyn

2016-11-01

In democracies where multitudes wield political influence, so does the broadcast media that reaches those multitudes. However, broadcast media may be powerful not simply because it reaches a certain audience, but because each of the recipients is aware of that fact. That is, watching broadcast media can evoke a state of shared attention, or the perception of simultaneous coattention with others. Whereas past research has investigated the effects of shared attention with a few socially close others (i.e., friends, acquaintances, minimal ingroup members), we examine the impact of shared attention with a multitude of unfamiliar others in the context of televised broadcasting. In this paper, we explore whether shared attention increases the psychological impact of televised political speeches, and whether a smaller number of coattending others diminishes this effect. Five studies investigate whether the perception of simultaneous coattention, or shared attention, on a mass broadcasted political speech leads to more extreme judgments. The results indicate that the perception of synchronous coattention (as compared with coattending asynchronously and attending alone) renders persuasive speeches more persuasive, and unpersuasive speeches more unpersuasive. We also find that recall memory for the content of the speech mediates the effect of shared attention on political persuasion. The results are consistent with the notion that shared attention on mass broadcasted information results in deeper processing of the content, rendering judgments more extreme. In all, our findings imply that shared attention is a cognitive capacity that supports large-scale social coordination, where multitudes of people can cognitively prioritize simultaneously coattended information. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.

Science.gov (United States)

Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T

2014-06-01

Due to the upcoming deluge of genome data, storing and processing large-scale genome data, providing easy access to biomedical analysis tools, and sharing and retrieving data efficiently have become significant challenges. The variability in data volume results in variable computing and storage requirements; therefore, biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as a performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.
Infrastructure for Large-Scale Quality-Improvement Projects: Early Lessons from North Carolina Improving Performance in Practice

Science.gov (United States)

Newton, Warren P.; Lefebvre, Ann; Donahue, Katrina E.; Bacon, Thomas; Dobson, Allen

2010-01-01

Introduction: Little is known regarding how to accomplish large-scale health care improvement. Our goal is to improve the quality of chronic disease care in all primary care practices throughout North Carolina. Methods: Methods for improvement include (1) common quality measures and shared data system; (2) rapid cycle improvement principles; (3)…
Key Recovery Using Noised Secret Sharing with Discounts over Large Clouds

OpenAIRE

Jajodia, Sushil; Litwin, Witold; Schwarz, Thomas

2013-01-01

Encryption key loss is the Achilles' heel of cryptography. Key escrow helps, but it favors disclosures. Schemes for recoverable encryption keys through noised secret sharing alleviate this dilemma. The key owner escrows a specially encrypted backup, and recovery requires a large cloud; the cloud cost and the money trail should make illegal recovery attempts rare. We now propose noised secret sharing schemes supporting discounts. A recovery request with a discount code lowers the recovery complexity, easily by orde...
Parallel processors and nonlinear structural dynamics algorithms and software

Science.gov (United States)

Belytschko, Ted

1989-01-01

A nonlinear structural dynamics finite element program was developed to run on a shared memory multiprocessor with pipeline processors. The program, WHAMS, was used as a framework for this work. The program employs explicit time integration and can handle both the nonlinear material behavior and the large displacement response of 3-D structures. The elasto-plastic material model uses an isotropic strain hardening law which is input as a piecewise linear function. Geometric nonlinearities are handled by a corotational formulation in which a coordinate system is embedded at the integration point of each element. Currently, the program has an element library consisting of a beam element based on Euler-Bernoulli theory and triangular and quadrilateral plate elements based on Mindlin theory.
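Explicit time integration advances the solution without solving a system of equations: accelerations come straight from the current internal forces. The Python sketch below shows the common central-difference (leapfrog) scheme on a single linear spring-mass degree of freedom. That scheme, the one-DOF model and the function name are assumptions for illustration; the abstract does not say which explicit integrator WHAMS uses, and a nonlinear code would evaluate the internal force from the elasto-plastic, corotational state instead of k·u.

```python
def central_difference(mass, stiffness, u0, v0, dt, n_steps):
    """Integrate m*u'' + k*u = 0 with the explicit central-difference (leapfrog) scheme.

    Velocities live at half steps. Each step only evaluates the internal force
    at the current displacement; no stiffness matrix is assembled or solved.
    Stable only for dt < 2 / omega, with omega = sqrt(k / m).
    """
    u = u0
    a = -stiffness * u / mass
    v_half = v0 + 0.5 * dt * a  # start-up: velocity at t = dt/2
    for _ in range(n_steps):
        u += dt * v_half           # displacement at the next full step
        a = -stiffness * u / mass  # acceleration from the internal force
        v_half += dt * a           # velocity at the next half step
    return u
```

With m = k = 1 and dt = 0.001, about 6283 steps cover one period and return u close to its starting value; with dt = 2.5, beyond the 2/ω limit, the same loop diverges. Each step is cheap, but the step size is bounded by that stability limit.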
Phylogenetic distribution of large-scale genome patchiness

Directory of Open Access Journals (Sweden)

Hackenberg Michael

2008-04-01

Full Text Available Abstract Background: The phylogenetic distribution of large-scale genome structure (i.e., mosaic compositional patchiness) has been explored mainly by analytical ultracentrifugation of bulk DNA. However, with the availability of large, good-quality chromosome sequences and recently developed computational methods for analyzing patchiness directly on the genome sequence, an evolutionary comparative analysis can be carried out at the sequence level. Results: The local variations in the scaling exponent of the Detrended Fluctuation Analysis are used here to analyze large-scale genome structure and directly uncover the characteristic scales present in genome sequences. Furthermore, shuffling experiments on selected genome regions identified computationally determined, isochore-like regions as the biological source of the uncovered large-scale genome structure. The phylogenetic distribution of short- and large-scale patchiness was determined in the best-sequenced genome assemblies from eleven eukaryotic genomes: mammals (Homo sapiens, Pan troglodytes, Mus musculus, Rattus norvegicus, and Canis familiaris), birds (Gallus gallus), fishes (Danio rerio), invertebrates (Drosophila melanogaster and Caenorhabditis elegans), plants (Arabidopsis thaliana), and yeasts (Saccharomyces cerevisiae). We found large-scale patchiness of genome structure, associated with in silico determined, isochore-like regions, throughout this wide phylogenetic range. Conclusion: Large-scale genome structure is detected by directly analyzing DNA sequences in a wide range of eukaryotic chromosome sequences, from human to yeast. In all these genomes, large-scale patchiness can be associated with the isochore-like regions, as directly detected in silico at the sequence level.
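Detrended Fluctuation Analysis integrates a numeric version of the sequence, removes a local linear trend in windows of size s, and reads a scaling exponent from how the residual fluctuation F(s) grows with s. The Python sketch below computes a single first-order DFA exponent over a range of scales. The G/C = +1, A/T = -1 encoding, the scale choices and all function names are assumptions for illustration; the study examines local variations of the exponent, which this single global fit does not reproduce.

```python
import math
import random


def gc_steps(seq):
    """Map a DNA string to +1 (G/C) and -1 (A/T); one possible numeric encoding."""
    return [1.0 if base in "GCgc" else -1.0 for base in seq]


def dfa_exponent(series, scales):
    """First-order DFA: least-squares slope of log F(s) versus log s."""
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for value in series:
        acc += value - mean
        profile.append(acc)  # integrated, mean-removed profile
    log_s, log_f = [], []
    for s in scales:
        xs = range(s)
        x_bar = (s - 1) / 2.0
        sxx = sum((x - x_bar) ** 2 for x in xs)
        n_seg = len(profile) // s
        sq = 0.0
        for seg in range(n_seg):
            ys = profile[seg * s:(seg + 1) * s]
            y_bar = sum(ys) / s
            slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
            # squared residuals around the local least-squares line
            sq += sum((y - y_bar - slope * (x - x_bar)) ** 2 for x, y in zip(xs, ys))
        log_s.append(math.log(s))
        log_f.append(math.log(math.sqrt(sq / (n_seg * s))))  # RMS fluctuation F(s)
    mx, my = sum(log_s) / len(log_s), sum(log_f) / len(log_f)
    return (sum((a - mx) * (b - my) for a, b in zip(log_s, log_f))
            / sum((a - mx) ** 2 for a in log_s))


def random_dna(n, seed=0):
    """Uncorrelated test sequence (no patchiness), for illustration only."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))
```

For an uncorrelated sequence the exponent comes out near 0.5, and for a random walk fed in directly it is near 1.5. Long-range compositional correlations such as patchiness would typically show up as exponents above 0.5 over the scales where the patches live.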
Language Constructs for Data Partitioning and Distribution

Directory of Open Access Journals (Sweden)

P. Crooks

1995-01-01

Full Text Available This article presents a survey of language features for distributed memory multiprocessor systems (DMMs), in particular systems that provide features for data partitioning and distribution. In these systems the programmer is freed from the low-level details of the target architecture: there is no need to program explicit processes or specify interprocess communication. Programs are written according to the shared memory programming paradigm, but the programmer is required to specify, by means of directives, additional syntax or interactive methods, how the data of the program are decomposed and distributed.
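As one concrete picture of what such data-distribution directives specify, the Python sketch below computes which processor owns each array element under block and cyclic distributions, modelled on the BLOCK and CYCLIC distributions of High Performance Fortran. HPF is our chosen example, not necessarily one of the surveyed languages, and the function names are hypothetical.

```python
def block_owner(i, n, p):
    """Processor owning global index i under a BLOCK distribution of n items on p processors."""
    block = -(-n // p)  # ceiling(n / p) items per processor
    return i // block


def cyclic_owner(i, p):
    """Processor owning global index i under a CYCLIC (round-robin) distribution."""
    return i % p


def owned_indices(owner, n, rank):
    """Global indices stored by processor `rank`, given an owner function."""
    return [i for i in range(n) if owner(i) == rank]
```

With 10 elements on 4 processors, the block layout gives processors 0 to 2 three consecutive elements each and processor 3 the last one, while the cyclic layout deals elements out round-robin, so processor 1 holds elements 1, 5 and 9. A compiler can derive the needed communication from such ownership maps, which is how the programmer is spared from writing explicit messages.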
Managing large-scale models: DBS

International Nuclear Information System (INIS)

1981-05-01

A set of fundamental management tools for developing and operating a large-scale model and database system is presented. Experience in operating and developing a large-scale computerized system indicates that the only reasonable way to gain strong management control of such a system is to implement appropriate controls and procedures. Chapter I discusses the purpose of the book. Chapter II classifies a broad range of generic management problems into three groups: documentation, operations, and maintenance. System problems are first identified, and then solutions for gaining management control are discussed. Chapters III, IV, and V present practical methods for dealing with these problems. These methods were developed for managing SEAS but have general application to large-scale models and databases.
Distributed-Memory Fast Maximal Independent Set

Energy Technology Data Exchange (ETDEWEB)

Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew

2017-09-13

The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective way to provide the necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All of these algorithms were designed for shared-memory machines and are analyzed using the PRAM model; they do not have direct, efficient distributed-memory implementations. In this paper, we extend two of Luby's seminal MIS algorithms, "Luby(A)" and "Luby(B)," to distributed-memory execution, and we evaluate their performance. We compare our results with the "Filtered MIS" implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.
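For readers unfamiliar with Luby-style algorithms, the Python sketch below simulates, in a single process, a common random-priority formulation: in each round every remaining vertex draws a random number, vertices that beat all their remaining neighbours join the set, and they and their neighbours are removed. We do not claim it matches Luby(A) or Luby(B) exactly, and it is not the distributed-memory implementation evaluated in the paper; the adjacency-dict representation and the function name are assumptions.

```python
import random


def luby_mis(adj, seed=0):
    """Random-priority MIS. adj maps each vertex to the set of its neighbours (undirected, no self-loops)."""
    rng = random.Random(seed)
    remaining = set(adj)
    mis = set()
    while remaining:
        prio = {v: rng.random() for v in remaining}  # fresh random priorities each round
        # A vertex wins if its priority is below that of every remaining neighbour.
        winners = {v for v in remaining
                   if all(prio[v] < prio[u] for u in adj[v] if u in remaining)}
        mis |= winners
        removed = set(winners)
        for v in winners:
            removed |= adj[v]  # neighbours of winners can no longer join
        remaining -= removed
    return mis
```

Two adjacent vertices can never both win a round, so the output is independent; the loop stops only when every vertex is in the set or adjacent to it, so the set is maximal. Within a round, each vertex's comparison depends only on its neighbours, which is why the scheme maps naturally onto PRAM-style parallel execution.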
Large scale structure and baryogenesis

International Nuclear Information System (INIS)

Kirilova, D.P.; Chizhov, M.V.

2001-08-01

We discuss a possible connection between large-scale structure formation and baryogenesis in the universe. An updated review of the observational indications for a very large scale of 120 h⁻¹ Mpc in the distribution of visible matter in the universe is provided. The possibility of generating a periodic distribution with the characteristic scale of 120 h⁻¹ Mpc through a mechanism producing quasi-periodic baryon density perturbations during the inflationary stage is discussed. The evolution of the baryon charge density distribution is explored in the framework of a low-temperature boson condensate baryogenesis scenario. Both the observed very large scale of the visible matter distribution in the universe and the observed baryon asymmetry value could naturally appear as a result of the evolution of a complex scalar field condensate formed at the inflationary stage. Moreover, for some model parameters, a natural separation of matter superclusters from antimatter ones can be achieved. (author)
Automatic management software for large-scale cluster system

International Nuclear Information System (INIS)

Weng Yunjian; Chinese Academy of Sciences, Beijing; Sun Gongxing

2007-01-01

At present, large-scale cluster systems are difficult to manage. For example, administrators carry a heavy workload and must spend much time managing and maintaining the system. The nodes of a large-scale cluster can easily become disorganized: thousands of nodes are placed in large machine rooms, so administrators can easily confuse one machine with another. How can accurate management be carried out effectively in a large-scale cluster system? The article introduces ELFms for large-scale cluster systems and proposes a way to manage such systems automatically. (authors)
A Large-Scale Initiative Inviting Patients to Share Personal Fitness Tracker Data with Their Providers: Initial Results.

Directory of Open Access Journals (Sweden)

Joshua M Pevnick

Full Text Available Personal fitness trackers (PFT) have substantial potential to improve healthcare. The objective was to quantify and characterize early adopters who shared their PFT data with providers. We used bivariate statistics and logistic regression to compare patients who shared any PFT data with patients who did not. A patient portal was used to invite 79,953 registered portal users to share their data. Of 66,105 users included in our analysis, 499 (0.8%) uploaded data during an initial 37-day study period. Bivariate and regression analysis showed that early adopters were more likely than non-adopters to be younger, male, white, health system employees, and to have higher BMIs. Neither comorbidities nor utilization predicted adoption. Our results demonstrate that patients had little intrinsic desire to share PFT data with their providers, and suggest that patients most at risk for poor health outcomes are least likely to share PFT data. Marketing, incentives, and/or cultural change may be needed to induce such data-sharing.
Parallel statistical image reconstruction for cone-beam x-ray CT on a shared memory computation platform

International Nuclear Information System (INIS)

Kole, J S; Beekman, F J

2005-01-01

Statistical reconstruction methods offer possibilities of improving image quality as compared to analytical methods, but current reconstruction times prohibit routine clinical applications. To reduce reconstruction times we have parallelized a statistical reconstruction algorithm for cone-beam x-ray CT, the ordered subset convex algorithm (OSC), and evaluated it on a shared memory computer. Two different parallelization strategies were developed: one that employs parallelism by computing the work for all projections within a subset in parallel, and one that divides the total volume into parts and processes the work for each sub-volume in parallel. Both methods are used to reconstruct a three-dimensional mathematical phantom on two different grid densities. The reconstructed images are binary identical to the result of the serial (non-parallelized) algorithm. The speed-up factor equals approximately 30 when using 32 to 40 processors, and scales almost linearly with the number of CPUs for both methods. The huge reduction in computation time allows us to apply statistical reconstruction to clinically relevant studies for the first time.
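One reason a volume split can stay bitwise identical to the serial code is that each voxel's arithmetic does not depend on how the volume is divided. The Python sketch below illustrates that property with a generic multiplicative per-voxel update x_j ← x_j · n_j / d_j standing in for the real OSC step; the update rule, the slab split and all names are assumptions, since the abstract does not give the OSC formulas. Threads are used only to show the decomposition; under CPython's global interpreter lock they would not deliver the reported speed-ups for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def update_slab(volume, numer, denom, start, stop):
    """Hypothetical per-voxel update x_j * numer_j / denom_j for voxels [start, stop).

    Each voxel depends only on its own inputs, so the split cannot change results.
    """
    return [volume[j] * numer[j] / denom[j] if denom[j] != 0 else volume[j]
            for j in range(start, stop)]


def parallel_update(volume, numer, denom, n_workers):
    """Split the volume into contiguous slabs, update them concurrently, reassemble in order."""
    n = len(volume)
    bounds = [(w * n // n_workers, (w + 1) * n // n_workers) for w in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda b: update_slab(volume, numer, denom, *b), bounds))
    return [v for part in parts for v in part]
```

The projection-wise strategy is harder to keep bitwise identical in general, because summing contributions from several projections in a different order can change floating-point rounding unless the reduction order is fixed.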
A 32-bit computer for large memory applications on the FASTBUS

International Nuclear Information System (INIS)

Kellner, R.; Blossom, J.M.; Hung, J.P.

1985-01-01

A FASTBUS-based 32-bit computer is being built at Los Alamos National Laboratory for use in systems requiring large, fast memory in the FASTBUS environment. A separate local execution bus allows data reduction to proceed concurrently with other FASTBUS operations. The computer, which can operate in either master or slave mode, includes the National Semiconductor NS32032 chip set with demand-paged memory management, floating point slave processor, interrupt control unit, timers, and time-of-day clock. The 16.0 megabytes of random access memory are interleaved to allow windowed direct memory access on and off the FASTBUS at 80 megabytes per second.
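Interleaving spreads consecutive memory words across several banks so that one bank can recover while the next is being accessed, which helps sustain high transfer rates. The Python sketch below shows low-order interleaving as an address-to-bank map. The word size, the number of banks, the reading of "16.0 megabytes" as 16 MiB and the function name are all assumptions; the abstract gives neither the interleave factor nor the bank organization of this machine.

```python
WORD_BYTES = 4                   # assumption: 32-bit memory words
N_BANKS = 4                      # assumption: interleave factor not stated in the abstract
MEMORY_BYTES = 16 * 1024 * 1024  # "16.0 megabytes", read here as 16 MiB


def bank_and_offset(addr):
    """Low-order interleaving: consecutive words go to consecutive banks.

    Returns (bank number, byte offset within that bank).
    """
    word, byte = divmod(addr, WORD_BYTES)
    bank, row = word % N_BANKS, word // N_BANKS
    return bank, row * WORD_BYTES + byte
```

A sequential transfer of word addresses 0, 4, 8, 12, ... then cycles through banks 0, 1, 2, 3, 0, ..., and each bank holds a quarter of the memory under these assumed parameters.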
Resource-sharing in multiple-component working memory

OpenAIRE

Doherty, Jason M.; Logie, Robert H.

2016-01-01

Working memory research often focuses on measuring the capacity of the system and how it relates to other cognitive abilities. However, research into the structure of working memory is less concerned with an overall capacity measure but rather with the intricacies of underlying components and their contribution to different tasks. A number of models of working memory structure have been proposed, each with different assumptions and predictions, but none of which adequately accounts for the fu...
Scaling dependence of memory windows and different carrier charging behaviors in Si nanocrystal nonvolatile memory devices

Science.gov (United States)

Yu, Jie; Chen, Kun-ji; Ma, Zhong-yuan; Zhang, Xin-xin; Jiang, Xiao-fan; Wu, Yang-qing; Huang, Xin-fan; Oda, Shunri

2016-09-01

Given the charge storage mode of silicon nanocrystal (Si-NC) nonvolatile memory (NVM) devices, it is important to investigate how their memory performance depends on device scaling in order to determine their scaling-down limit. In this work, we made eight kinds of test key cells with different gate widths and lengths using 0.13-μm node complementary metal oxide semiconductor (CMOS) technology. The memory windows of the eight kinds of test key cells are found to be nearly identical, about 1.64 V at ±7 V/1 ms; they are independent of the gate area and are mainly determined by the average size (12 nm) and areal density (1.8 × 10¹¹/cm²) of the Si-NCs. The program/erase (P/E) speed characteristics are almost independent of gate width and length. However, the erase speed of the test key cells is faster than their program speed, owing to the different charging behaviors of electrons and holes during operation. Furthermore, the data retention characteristic is also independent of the gate area. Our findings are useful for further scaling down Si-NC NVM devices to improve performance and on-chip integration. Project supported by the State Key Development Program for Basic Research of China (Grant No. 2010CB934402) and the National Natural Science Foundation of China (Grant Nos. 11374153, 61571221, and 61071008).
Apparently abnormal Wechsler Memory Scale index score patterns in the normal population.

Science.gov (United States)

Carrasco, Roman Marcus; Grups, Josefine; Evans, Brittney; Simco, Edward; Mittenberg, Wiley

2015-01-01

Interpretation of the Wechsler Memory Scale-Fourth Edition may involve examination of multiple memory index score contrasts and similar comparisons with Wechsler Adult Intelligence Scale-Fourth Edition ability indexes. Standardization sample data suggest that 15-point differences between any specific pair of index scores are relatively uncommon in normal individuals, but these base rates refer to a comparison between a single pair of indexes rather than multiple simultaneous comparisons among indexes. This study provides normative data for the occurrence of multiple index score differences calculated by using Monte Carlo simulations and validated against standardization data. Differences of 15 points between any two memory indexes or between memory and ability indexes occurred in 60% and 48% of the normative sample, respectively. Wechsler index score discrepancies are normally common and therefore not clinically meaningful when numerous such comparisons are made. Explicit prior interpretive hypotheses are necessary to reduce the number of index comparisons and associated false-positive conclusions. Monte Carlo simulation accurately predicts these false-positive rates.
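The argument above is a multiple-comparisons effect. A single pre-chosen pair of indexes rarely differs by 15 points, but the chance that *some* pair does grows with the number of indexes compared. The sketch below illustrates this with a Monte Carlo simulation. It assumes equicorrelated normal index scores (mean 100, SD 15) with a hypothetical correlation `r`, which is not the actual WMS-IV/WAIS-IV correlation matrix, so its rates will not reproduce the 60% and 48% figures reported above. The function name `discrepancy_rates` is illustrative only.

```python
import math
import random


def discrepancy_rates(n_indexes, r, threshold=15.0, n_sims=20_000, seed=1):
    """Monte Carlo estimate of index-score discrepancy base rates.

    Each simulated person gets n_indexes scores (mean 100, SD 15) that are
    equicorrelated with correlation r (a simplifying assumption). Returns
    (rate for one pre-chosen pair, rate for at least one of all pairs).
    """
    if n_indexes < 2 or not 0.0 <= r <= 1.0:
        raise ValueError("need n_indexes >= 2 and 0 <= r <= 1")
    rng = random.Random(seed)
    shared, unique = math.sqrt(r), math.sqrt(1.0 - r)
    single_hits = any_hits = 0
    for _ in range(n_sims):
        g = rng.gauss(0.0, 1.0)  # factor common to all indexes -> correlation r
        scores = [100.0 + 15.0 * (shared * g + unique * rng.gauss(0.0, 1.0))
                  for _ in range(n_indexes)]
        # One pre-specified comparison, as in single-pair base-rate tables.
        if abs(scores[0] - scores[1]) >= threshold:
            single_hits += 1
        # Some pair differs by >= threshold exactly when the range does.
        if max(scores) - min(scores) >= threshold:
            any_hits += 1
    return single_hits / n_sims, any_hits / n_sims


if __name__ == "__main__":
    for k in (2, 4, 6):
        one, anyp = discrepancy_rates(k, r=0.7)
        print(f"{k} indexes: one pair {one:.1%}, any pair {anyp:.1%}")
```

With two indexes the two rates coincide. As more indexes are compared, the "any pair" rate climbs well above the single-pair rate. This is why the abstract recommends stating interpretive hypotheses in advance to limit the number of comparisons.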
[Evaluation of memory in acquired brain injury: a comparison between the Wechsler Memory Scale and the Rivermead Behavioural Memory Test].

Science.gov (United States)

Guinea-Hidalgo, A; Luna-Lario, P; Tirapu-Ustárroz, J

Learning processes and memory are frequently compromised in acquired brain injury (ABI). At the same time, such involvement is often heterogeneous and a source of deficits in other cognitive capacities and of significant functional limitations. A good neuropsychological evaluation of memory is designed to study not only the type, intensity and nature of the problems, but also the way they manifest in daily life. This study examines the correlation between a traditional memory test, the Wechsler Memory Scale-III (WMS-III), and a memory test that is considered to be functional, the Rivermead Behavioural Memory Test (RBMT), in a sample of 60 patients with ABI. All the correlations observed were moderate. Greater correlations were found among the verbal memory subtests than among the visual memory tests. A substantial number of subjects with below-normal scaled scores on the WMS-III correctly performed (either fully or partially) the corresponding test in the RBMT. The joint use of the WMS-III and RBMT in evaluation can provide a more comprehensive analysis of memory deficits and their rehabilitation. The lower scores obtained on the WMS-III compared with those on the RBMT indicate greater sensitivity of the former. Nevertheless, further testing is needed to compare performance on the tests after the patients and those around them have subjectively assessed their functional limitations. This would make it possible to determine which of the two tests offers the best balance between sensitivity and specificity, as well as a higher predictive value.
Software for the ACP [Advanced Computer Program] multiprocessor system

International Nuclear Information System (INIS)

Biel, J.; Areti, H.; Atac, R.

1987-01-01

Software has been developed for use with the Fermilab Advanced Computer Program (ACP) multiprocessor system. The software was designed to make a system of a hundred independent node processors as easy to use as a single, powerful CPU. Subroutines have been developed by which a user's host program can send data to and get results from the program running in each of his ACP node processors. Utility programs make it easy to compile and link host and node programs, to debug a node program on an ACP development system, and to submit a debugged program to an ACP production system.
The Human Salivary Microbiome Is Shaped by Shared Environment Rather than Genetics: Evidence from a Large Family of Closely Related Individuals.

Science.gov (United States)

Shaw, Liam; Ribeiro, Andre L R; Levine, Adam P; Pontikos, Nikolas; Balloux, Francois; Segal, Anthony W; Roberts, Adam P; Smith, Andrew M

2017-09-12

The human microbiome is affected by multiple factors, including the environment and host genetics. In this study, we analyzed the salivary microbiomes of an extended family of Ashkenazi Jewish individuals living in several cities and investigated associations with both shared household and host genetic similarities. We found that environmental effects dominated over genetic effects. While there was weak evidence of geographical structuring at the level of cities, we observed a large and significant effect of shared household on microbiome composition, supporting the role of the immediate shared environment in dictating the presence or absence of taxa. This effect was also seen when including adults who had grown up in the same household but moved out prior to the time of sampling, suggesting that the establishment of the salivary microbiome earlier in life may affect its long-term composition. We found weak associations between host genetic relatedness and microbiome dissimilarity when using family pedigrees as proxies for genetic similarity. However, this association disappeared when using more-accurate measures of kinship based on genome-wide genetic markers, indicating that the environment rather than host genetics is the dominant factor affecting the composition of the salivary microbiome in closely related individuals. Our results support the concept that there is a consistent core microbiome conserved across global scales but that small-scale effects due to a shared living environment significantly affect microbial community composition. IMPORTANCE Previous research shows that the salivary microbiomes of relatives are more similar than those of nonrelatives, but it remains difficult to distinguish the effects of relatedness and shared household environment. Furthermore, pedigree measures may not accurately measure host genetic similarity. In this study, we include genetic relatedness based on genome-wide single nucleotide polymorphisms (SNPs) (rather than
Large scale network-centric distributed systems

CERN Document Server

Sarbazi-Azad, Hamid

2014-01-01

A highly accessible reference offering a broad range of topics and insights on large scale network-centric distributed systems Evolving from the fields of high-performance computing and networking, large scale network-centric distributed systems continues to grow as one of the most important topics in computing and communication and many interdisciplinary areas. Dealing with both wired and wireless networks, this book focuses on the design and performance issues of such systems. Large Scale Network-Centric Distributed Systems provides in-depth coverage ranging from ground-level hardware issu

Challenges in Managing Trustworthy Large-scale Digital Science

Science.gov (United States)

Evans, B. J. K.

2017-12-01

The increased use of large-scale international digital science has opened a number of challenges for managing, handling, using and preserving scientific information. The large volumes of information are driven by three main categories: model outputs, including coupled models and ensembles; data products that have been processed to a level of usability; and, increasingly, heuristically driven data analysis. These data products are increasingly the ones usable by the broad communities, and they far exceed the raw instrument data outputs. The data, software and workflows are then shared and replicated to allow broad use at an international scale. This places further demands on the infrastructure that must manage the information reliably across distributed resources. Users necessarily rely on these underlying "black boxes" to remain productive in producing new scientific outcomes. The software for these systems depends on computational infrastructure, interconnected software systems, and information capture systems. These range from the fundamental reliability of the compute hardware to system software stacks and libraries and the model software. Because of these complexities and the capacity of the infrastructure, there is an increased emphasis on transparency of the approach and robustness of the methods over full reproducibility. Furthermore, with large-volume data management, it is increasingly difficult to store the historical versions of all model and derived data. Instead, the emphasis is on the ability to access the updated products. It is also on the reliability with which previous outcomes remain relevant and can be updated with new information. We will discuss these challenges and some of the approaches underway to address these issues.
Large-Scale Outflows in Seyfert Galaxies

Science.gov (United States)

Colbert, E. J. M.; Baum, S. A.

1995-12-01

Highly collimated outflows extend out to Mpc scales in many radio-loud active galaxies. In Seyfert galaxies, which are radio-quiet, the outflows extend out to kpc scales and do not appear to be as highly collimated. In order to study the nature of large-scale (≳1 kpc) outflows in Seyferts, we have conducted optical, radio and X-ray surveys of a distance-limited sample of 22 edge-on Seyfert galaxies. Results of the optical emission-line imaging and spectroscopic survey imply that large-scale outflows are present in ≳1/4 of all Seyferts. The radio (VLA) and X-ray (ROSAT) surveys show that large-scale radio and X-ray emission is present at about the same frequency. Kinetic luminosities of the outflows in Seyferts are comparable to those in starburst-driven superwinds. Large-scale radio sources in Seyferts appear diffuse, but do not resemble radio halos found in some edge-on starburst galaxies (e.g. M82). We discuss the feasibility of the outflows being powered by the active nucleus (e.g. a jet) or a circumnuclear starburst.
Manipulations of attention dissociate fragile visual short-term memory from visual working memory.

Science.gov (United States)

Vandenbroucke, Annelinde R E; Sligte, Ilja G; Lamme, Victor A F

2011-05-01

People often rely on information that is no longer in view, but maintained in visual short-term memory (VSTM). Traditionally, VSTM is thought to operate on either a short time-scale with high capacity - iconic memory - or a long time scale with small capacity - visual working memory. Recent research suggests that in addition, an intermediate stage of memory in between iconic memory and visual working memory exists. This intermediate stage has a large capacity and a lifetime of several seconds, but is easily overwritten by new stimulation. We therefore termed it fragile VSTM. In previous studies, fragile VSTM has been dissociated from iconic memory by the characteristics of the memory trace. In the present study, we dissociated fragile VSTM from visual working memory by showing a differentiation in their dependency on attention. A decrease in attention during presentation of the stimulus array greatly reduced the capacity of visual working memory, while this had only a small effect on the capacity of fragile VSTM. We conclude that fragile VSTM is a separate memory store from visual working memory. Thus, a tripartite division of VSTM appears to be in place, comprising iconic memory, fragile VSTM and visual working memory. Copyright © 2011 Elsevier Ltd. All rights reserved.
Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)

Directory of Open Access Journals (Sweden)

Salvator Abreu

2009-10-01

Full Text Available We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications. We present a parallel version of a constraint-based local search algorithm that has been implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (a total of 16 SPUs per blade). This algorithm was chosen because it fits the Cell/BE architecture very well and requires neither shared memory nor communication between processors, while retaining a compact memory footprint. We study the performance on several large optimization benchmarks and show that this achieves mostly linear speedups, sometimes even super-linear ones. This is possible because the parallel implementation may simultaneously explore different parts of the search space, and therefore converge faster towards the best sub-space and thus towards a solution. Besides the speedups, the resulting times exhibit a much smaller variance, which benefits applications where a timely reply is critical.
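The parallel scheme described above needs no shared memory and no communication. Each processor runs its own independently seeded local search, and the first one to reach a solution ends the run, so the parallel time is the minimum over the walks. This is also why the run-time variance shrinks. The sketch below is not the authors' Cell/BE code. It swaps in N-queens as a stand-in constraint problem, uses min-conflicts with a small random-walk probability `noise` as the local search, and runs the walks sequentially to approximate the parallel cost. All function names and parameters are hypothetical.

```python
import random


def attacks(cols, row, col):
    """Count queens (one per row, cols[r] = column) attacking (row, col), excluding `row`."""
    return sum(1 for r, c in enumerate(cols)
               if r != row and (c == col or abs(c - col) == abs(r - row)))


def local_search_walk(n, seed, noise=0.1, max_steps=100_000):
    """One min-conflicts walk for n-queens with a random-walk escape step.

    Returns (steps_taken, board) on success or (None, board) if max_steps is hit.
    """
    rng = random.Random(seed)
    cols = [rng.randrange(n) for _ in range(n)]
    for step in range(max_steps):
        conflicted = [r for r in range(n) if attacks(cols, r, cols[r]) > 0]
        if not conflicted:
            return step, cols
        row = rng.choice(conflicted)
        if rng.random() < noise:
            cols[row] = rng.randrange(n)  # random move to escape local minima
        else:
            costs = [attacks(cols, row, c) for c in range(n)]
            best = min(costs)
            cols[row] = rng.choice([c for c in range(n) if costs[c] == best])
    return None, cols


def independent_multi_walk(n, n_walks, base_seed=0):
    """Walks share no memory and exchange no messages; each needs only its seed.

    They run one after another here. On parallel hardware they would run
    concurrently and the first to finish would stop the rest, so the parallel
    cost is approximated by the fewest steps among the walks.
    """
    results = [local_search_walk(n, base_seed + w)[0] for w in range(n_walks)]
    finished = [s for s in results if s is not None]
    return (min(finished) if finished else None), results


if __name__ == "__main__":
    best, per_walk = independent_multi_walk(n=12, n_walks=8)
    print("steps per walk:", per_walk, "-> parallel cost:", best)
```

Taking the minimum over independent run times explains how speedups can occasionally be super-linear. With k walks, the expected minimum can drop by more than a factor of k when individual run times are highly variable.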
Adaptive scaling of reward in episodic memory: a replication study

OpenAIRE

Mason, Alice; Ludwig, Casimir; Farrell, Simon

2017-01-01

Reward is thought to enhance episodic memory formation via dopaminergic consolidation. Bunzeck, Dayan, Dolan, and Duzel [(2010). A common mechanism for adaptive scaling of reward and novelty. Human Brain Mapping, 31, 1380–1394] provided functional magnetic resonance imaging (fMRI) and behavioural evidence that reward and episodic memory systems are sensitive to the contextual value of a reward—whether it is relatively higher or lower—as opposed to absolute value or prediction error. We carrie...
Coping with Memory Loss

Science.gov (United States)

... be evaluated by a health professional. What Causes Memory Loss? Anything that affects cognition—the process of ...
Improving Large-scale Storage System Performance via Topology-aware and Balanced Data Placement

Energy Technology Data Exchange (ETDEWEB)

Wang, Feiyi [ORNL]; Oral, H Sarp [ORNL]; Vazhkudai, Sudharshan S [ORNL]

2014-01-01

With the advent of big data, the I/O subsystems of large-scale compute clusters are becoming a center of focus, with more applications putting greater demands on end-to-end I/O performance. These subsystems are often complex in design. They comprise multiple hardware and software layers to cope with the increasing capacity, capability and scalability requirements of data-intensive applications. The shared nature of storage resources and the intrinsic interactions across these layers make realizing user-level, end-to-end performance gains a great challenge. We propose a topology-aware resource load balancing strategy to improve per-application I/O performance. We demonstrate the effectiveness of our algorithm on an extreme-scale compute cluster, Titan, at the Oak Ridge Leadership Computing Facility (OLCF). Our experiments with both synthetic benchmarks and a real-world application show that, even under congestion, our proposed algorithm can improve large-scale application I/O performance significantly. This results in both reduced application run times and higher-resolution simulation runs.
SCALE INTERACTION IN A MIXING LAYER. THE ROLE OF THE LARGE-SCALE GRADIENTS

KAUST Repository

Fiscaletti, Daniele

2015-08-23

The interaction between scales is investigated in a turbulent mixing layer. The large-scale amplitude modulation of the small scales, already observed in other works, depends on the crosswise location. Large-scale positive fluctuations correlate with stronger small-scale activity on the low-speed side of the mixing layer and reduced activity on the high-speed side. However, from physical considerations we would expect the scales to interact in a qualitatively similar way within the flow and across different turbulent flows. Therefore, the modulation of the small scales by the large-scale gradients, rather than by the large-scale fluctuations, has additionally been investigated.
Asynchronous Two-Level Checkpointing Scheme for Large-Scale Adjoints in the Spectral-Element Solver Nek5000

Energy Technology Data Exchange (ETDEWEB)

Schanen, Michel; Marin, Oana; Zhang, Hong; Anitescu, Mihai

2016-01-01

Adjoints are an important computational tool for large-scale sensitivity evaluation, uncertainty quantification, and derivative-based optimization. An essential component of their performance is the storage/recomputation balance, in which efficient checkpointing methods play a key role. We introduce a novel asynchronous two-level adjoint checkpointing scheme for multistep numerical time discretizations targeted at large-scale numerical simulations. The checkpointing scheme combines bandwidth-limited disk checkpointing and binomial memory checkpointing. Based on assumptions about the target petascale systems, which we later demonstrate to be realistic on the IBM Blue Gene/Q system Mira, we create a model of the expected performance of our checkpointing approach. We validate it using the highly scalable Navier-Stokes spectral-element solver Nek5000 on small to moderate subsystems of the Mira supercomputer. In turn, this allows us to predict optimal algorithmic choices when using all of Mira. We also demonstrate that two-level checkpointing is significantly superior to single-level checkpointing when adjoining a large number of time integration steps. To our knowledge, this is the first time two-level checkpointing has been designed, implemented, tuned, and demonstrated on fluid dynamics codes at a large scale of 50k+ cores.
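The "binomial memory checkpointing" level mentioned above refers to Griewank's binomial bound. With c memory checkpoints and repetition number r, at most β(c, r) = C(c + r, c) time steps can be reversed. Conventions for exactly how c and r are counted differ slightly between references. The sketch below only evaluates that bound and inverts it. It is not the authors' asynchronous two-level scheme, and the function names are hypothetical.

```python
from math import comb


def max_steps(checkpoints: int, repetitions: int) -> int:
    """Griewank's binomial bound beta(c, r) = C(c + r, c): the largest number
    of time steps whose adjoint can be computed with c memory checkpoints and
    repetition number r (counting conventions vary slightly between sources)."""
    return comb(checkpoints + repetitions, checkpoints)


def min_repetition_number(steps: int, checkpoints: int) -> int:
    """Smallest repetition number r with beta(checkpoints, r) >= steps."""
    if steps < 1 or checkpoints < 1:
        raise ValueError("steps and checkpoints must be positive")
    r = 0
    while max_steps(checkpoints, r) < steps:
        r += 1
    return r


if __name__ == "__main__":
    # 20 in-memory checkpoints and 10,000 time steps:
    # C(23, 3) = 1771 < 10,000 <= C(24, 4) = 10,626, so this prints 4.
    print(min_repetition_number(10_000, 20))
```

The required repetition number grows only slowly with the number of steps. That slow growth is what makes the binomial scheme attractive as the in-memory level. In a two-level scheme, one would presumably apply the bound to the steps between consecutive disk checkpoints; that reading is an assumption, not a detail given in the abstract.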
Multi-processor system for real-time flow estimation in medical ultrasound imaging

DEFF Research Database (Denmark)

Stetson, Paul F.; Jensen, Jesper Lomborg; Antonius, Peter

1997-01-01

the processed data. The generous bandwidth of the links makes it easy to balance the computational load among the processors. In order to manage the shared system memory and to make use of the parallel processing capabilities of the system, a real-time multitasking kernel has been developed. The kernel uses...
Scales of Memory in the Archaeology of the Second World War

Directory of Open Access Journals (Sweden)

Gabriel Moshenska

2006-11-01

Full Text Available The growing interest in archaeologies of the recent past has included attempts to link archaeology with memory in its various forms but has lacked a coherent theoretical and methodological approach. This paper outlines a model for engaging with memory in the archaeology of the Second World War, drawing on recent work in memory studies and oral history. One of the principal pitfalls in memory work is the conflation and confusion of individual and social memory: in this paper I attempt to identify and outline different forms or scales of memory: individual memory, group narratives, and social memorialisation. If we distinguish between these models in relation to Second World War archaeological sites we can assess their accuracy and usefulness and begin to trace the intricate power relations implicit in memory work. The sites in question, a Nazi prison in Berlin and a Prisoner of War camp in Poland, illustrate the contested and highly politicised nature of memory-based work and archaeological studies of this period. By opening up such sites to the popular gaze, archaeologists have the power to bring these debates into the public sphere, potentially undermining the hegemony of officially sanctioned memory and making the production of meaningful pasts a more inclusive process.
The Contribution of Working Memory to Fluid Reasoning: Capacity, Control, or Both?

Science.gov (United States)

Chuderski, Adam; Necka, Edward

2012-01-01

Fluid reasoning shares a large part of its variance with working memory capacity (WMC). The literature on working memory (WM) suggests that the capacity of the focus of attention responsible for simultaneous maintenance and integration of information within WM, as well as the effectiveness of executive control exerted over WM, determines…
Complex dewetting scenarios of ultrathin silicon films for large-scale nanoarchitectures.

Science.gov (United States)

Naffouti, Meher; Backofen, Rainer; Salvalaglio, Marco; Bottein, Thomas; Lodari, Mario; Voigt, Axel; David, Thomas; Benkouider, Abdelmalek; Fraj, Ibtissem; Favre, Luc; Ronda, Antoine; Berbezier, Isabelle; Grosso, David; Abbarchi, Marco; Bollani, Monica

2017-11-01

Dewetting is a ubiquitous phenomenon in nature: many different thin films of organic and inorganic substances (such as liquids, polymers, metals, and semiconductors) share this shape instability driven by surface tension and mass transport. Via templated solid-state dewetting, we frame complex nanoarchitectures of monocrystalline silicon on insulator with unprecedented precision and reproducibility over large scales. Phase-field simulations reveal the dominant role of surface diffusion as a driving force for dewetting and provide a predictive tool to further engineer this hybrid top-down/bottom-up self-assembly method. Our results demonstrate that patches of thin monocrystalline films of metals and semiconductors share the same dewetting dynamics. We also demonstrate the potential of our method by performing nanotransfer molding of metal oxide xerogels on silicon and glass substrates. This method opens the novel possibility of transferring these Si-based patterns to different materials that do not usually undergo dewetting, offering great potential for microfluidic or sensing applications as well.
Parallel hierarchical global illumination

Energy Technology Data Exchange (ETDEWEB)

Snell, Quinn O. [Iowa State Univ., Ames, IA (United States)]

1997-10-08

Solving the global illumination problem is equivalent to determining the intensity of every wavelength of light in all directions at every point in a given scene. The complexity of the problem has led researchers to use approximation methods for solving the problem on serial computers. Rather than using an approximation method, such as backward ray tracing or radiosity, the authors have chosen to solve the Rendering Equation by direct simulation of light transport from the light sources. This paper presents an algorithm that solves the Rendering Equation to any desired accuracy, and can be run in parallel on distributed memory or shared memory computer systems with excellent scaling properties. It appears superior in both speed and physical correctness to recent published methods involving bidirectional ray tracing or hybrid treatments of diffuse and specular surfaces. Like progressive radiosity methods, it dynamically refines the geometry decomposition where required, but does so without the excessive storage requirements for ray histories. The algorithm, called Photon, produces a scene which converges to the global illumination solution. This amounts to a huge task for a 1997-vintage serial computer, but using the power of a parallel supercomputer significantly reduces the time required to generate a solution. Currently, Photon can be run on most parallel environments from a shared memory multiprocessor to a parallel supercomputer, as well as on clusters of heterogeneous workstations.
Parallelization of a Monte Carlo particle transport simulation code

Science.gov (United States)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high-performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared- and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures. These include an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors, and a 200 dual-processor HP cluster. For large problem sizes, which are limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow higher particle energies to be studied with more accurate physical models. They also improve statistics, as more particle tracks can be simulated in a short response time.
Development of the Large-Scale Statistical Analysis System of Satellites Observations Data with Grid Datafarm Architecture

Science.gov (United States)

Yamamoto, K.; Murata, K.; Kimura, E.; Honda, R.

2006-12-01

In the Solar-Terrestrial Physics (STP) field, the amount of satellite observation data has been increasing every year. Three problems must be solved to achieve large-scale statistical analyses of such large volumes of data. (i) More CPU power and larger memory and disk capacity are required. However, the total power of personal computers is not enough to analyze such amounts of data. Supercomputers provide high-performance CPUs and ample memory, but they are usually isolated from the Internet or connected only for programming or data file transfer. (ii) Most observation data files are managed at distributed data sites across the Internet, so users have to know where the data files are located. (iii) Since no common data format is currently available in the STP field, users have to prepare a reading program for each dataset themselves. To overcome problems (i) and (ii), we constructed a parallel and distributed data analysis environment based on the Gfarm reference implementation of the Grid Datafarm architecture. Gfarm both shares computational resources and performs parallel distributed processing. In addition, Gfarm provides the Gfarm filesystem, which can be used as a virtual directory tree across nodes. The Gfarm environment is composed of three parts. A metadata server manages information about the distributed files, and filesystem nodes provide computational resources. A client submits jobs to the metadata server and manages the scheduling of data processing. In the present study, both data files and data processes are parallelized on Gfarm with 6 filesystem nodes, each with a Pentium V 1 GHz CPU, 256 MB of memory and a 40 GB disk. To evaluate the performance of the present Gfarm system, we scanned many data files, each about 300 MB in size, using three processing methods: sequential processing on one node, sequential processing by each node, and parallel processing by each node. As a result, in comparison between the
Domain-general involvement of the posterior frontolateral cortex in time-based resource-sharing in working memory: An fMRI study.

Science.gov (United States)

Vergauwe, Evie; Hartstra, Egbert; Barrouillet, Pierre; Brass, Marcel

2015-07-15

Working memory is often defined in cognitive psychology as a system devoted to the simultaneous processing and maintenance of information. The time-based resource-sharing model of working memory (TBRS; Barrouillet and Camos, 2015; Barrouillet et al., 2004) predicts that, when memory items have to be maintained while performing a concurrent activity, memory performance depends on the cognitive load of this activity, independently of the domain involved. In line with this model, there is accumulating evidence for that prediction. The present study used fMRI to identify regions in the brain that are sensitive to variations in cognitive load in a domain-general way. More precisely, we aimed at identifying brain areas that activate during maintenance of memory items as a direct function of the cognitive load induced by both verbal and spatial concurrent tasks. Results show that the right IFJ and bilateral SPL/IPS are the only areas showing increased involvement as cognitive load increases, and they do so in a domain-general manner. When the fMRI signal was correlated with the approximated cognitive load as defined by the TBRS model, the main focus of the cognitive load-related activation was located in the right IFJ. The present findings indicate that the IFJ makes domain-general contributions to time-based resource-sharing in working memory. They also allowed us to generate the novel hypothesis that the IFJ might be the neural basis for the process of rapid switching. We argue that the IFJ might be a crucial part of a central attentional bottleneck in the brain because of its inability to upload more than one task rule at once. Copyright © 2015 Elsevier Inc. All rights reserved.
Large-scale simulations of plastic neural networks on neuromorphic hardware

Directory of Open Access Journals (Sweden)

James Courtney Knight

2016-04-01

Full Text Available SpiNNaker is a digital, neuromorphic architecture designed for simulating large-scale spiking neural networks at speeds close to biological real-time. Rather than using bespoke analog or digital hardware, the basic computational unit of a SpiNNaker system is a general-purpose ARM processor, allowing it to be programmed to simulate a wide variety of neuron and synapse models. This flexibility is particularly valuable in the study of biological plasticity phenomena. A recently proposed learning rule based on the Bayesian Confidence Propagation Neural Network (BCPNN) paradigm offers a generic framework for modeling the interaction of different plasticity mechanisms using spiking neurons. However, it can be computationally expensive to simulate large networks with BCPNN learning, since it requires multiple state variables for each synapse, each of which needs to be updated every simulation time-step. We discuss the trade-offs in efficiency and accuracy involved in developing an event-based BCPNN implementation for SpiNNaker based on an analytical solution to the BCPNN equations. We also detail the steps taken to fit this within the limited computational and memory resources of the SpiNNaker architecture. We demonstrate this learning rule by learning temporal sequences of neural activity within a recurrent attractor network, which we simulate at scales of up to 20,000 neurons and 51,200,000 plastic synapses: the largest plastic neural network ever to be simulated on neuromorphic hardware. We also run a comparable simulation on a Cray XC-30 supercomputer system and find that, if it is to match the run-time of our SpiNNaker simulation, the supercomputer system uses approximately more power. This suggests that cheaper, more power-efficient neuromorphic systems are becoming useful discovery tools in the study of plasticity in large-scale brain models.
In Whom Do We Trust - Sharing Security Events

NARCIS (Netherlands)

Steinberger, Jessica; Kuhnert, Benjamin; Sperotto, Anna; Baier, Harald; Pras, Aiko

2016-01-01

Security event sharing is deemed of critical importance to counteract large-scale attacks at Internet service provider (ISP) networks as these attacks have become larger, more sophisticated and frequent. On the one hand, security event sharing is regarded to speed up organization's mitigation and
Lightweight computational steering of very large scale molecular dynamics simulations

International Nuclear Information System (INIS)

Beazley, D.M.

1996-01-01

We present a computational steering approach for controlling, analyzing, and visualizing very large scale molecular dynamics simulations involving tens to hundreds of millions of atoms. Our approach relies on extensible scripting languages and an easy-to-use tool for building extensions and modules. The system is extremely easy to modify, works with existing C code, is memory efficient, and can be used from inexpensive workstations and networks. We demonstrate how we have used this system to manipulate data from production MD simulations involving as many as 104 million atoms running on the CM-5 and Cray T3D. We also show how this approach can be used to build systems that integrate common scripting languages (including Tcl/Tk, Perl, and Python), simulation code, user extensions, and commercial data analysis packages.

The Development of Time-Based Prospective Memory in Childhood: The Role of Working Memory Updating

NARCIS (Netherlands)

Voigt, B.; Mahy, C.E.V.; Ellis, J.; Schnitzspahn, K.M.; Krause, I.; Altgassen, A.M.; Kliegel, M.

2014-01-01

This large-scale study examined the development of time-based prospective memory (PM) across childhood and the roles that working memory updating and time monitoring play in driving age effects in PM performance. One hundred and ninety-seven children aged 5 to 14 years completed a time-based PM task
Large-scale perspective as a challenge

NARCIS (Netherlands)

Plomp, M.G.A.

2012-01-01

1. Scale forms a challenge for chain researchers: when exactly is something ‘large-scale’? What are the underlying factors (e.g. number of parties, data, objects in the chain, complexity) that determine this? It appears to be a continuum between small- and large-scale, where positioning on that
Scale interactions in a mixing layer – the role of the large-scale gradients

KAUST Repository

Fiscaletti, D.

2016-02-15

© 2016 Cambridge University Press. The interaction between the large and the small scales of turbulence is investigated in a mixing layer, at a Reynolds number based on the Taylor microscale of , via direct numerical simulations. The analysis is performed in physical space, and the local vorticity root-mean-square (r.m.s.) is taken as a measure of the small-scale activity. It is found that positive large-scale velocity fluctuations correspond to large vorticity r.m.s. on the low-speed side of the mixing layer, whereas they correspond to low vorticity r.m.s. on the high-speed side. The relationship between large and small scales thus depends on position if the vorticity r.m.s. is correlated with the large-scale velocity fluctuations. By contrast, the correlation coefficient is nearly constant throughout the mixing layer and close to unity if the vorticity r.m.s. is correlated with the large-scale velocity gradients. Therefore, the small-scale activity appears closely related to large-scale gradients, while the correlation between the small-scale activity and the large-scale velocity fluctuations is shown to reflect a property of the large scales. Furthermore, the vorticity from unfiltered (small scales) and from low-pass filtered (large scales) velocity fields tend to be aligned when examined within vortical tubes. These results provide evidence for the so-called 'scale invariance' (Meneveau & Katz, Annu. Rev. Fluid Mech., vol. 32, 2000, pp. 1-32), and suggest that some of the large-scale characteristics are not lost at the small scales, at least at the Reynolds number achieved in the present simulation.
Working Memory and Reasoning Benefit from Different Modes of Large-scale Brain Dynamics in Healthy Older Adults.

Science.gov (United States)

Lebedev, Alexander V; Nilsson, Jonna; Lövdén, Martin

2018-07-01

Researchers have proposed that solving complex reasoning problems, a key indicator of fluid intelligence, involves the same cognitive processes as solving working memory tasks. This proposal is supported by an overlap of the functional brain activations associated with the two types of tasks and by high correlations between interindividual differences in performance. We replicated these findings in 53 older participants but also showed that solving reasoning and working memory problems benefits from different configurations of the functional connectome and that this dissimilarity increases with a higher difficulty load. Specifically, superior performance in a typical working memory paradigm (n-back) was associated with upregulation of modularity (increased between-network segregation), whereas performance in the reasoning task was associated with effective downregulation of modularity. We also showed that working memory training promotes task-invariant increases in modularity. Because superior reasoning performance is associated with downregulation of modular dynamics, training may thus have fostered an inefficient way of solving the reasoning tasks. This could help explain why working memory training does little to promote complex reasoning performance. The study concludes that complex reasoning abilities cannot be reduced to working memory and suggests the need to reconsider the feasibility of using working memory training interventions to attempt to achieve effects that transfer to broader cognition.
CMOL/CMOS hardware architectures and performance/price for Bayesian memory - The building block of intelligent systems

Science.gov (United States)

Zaveri, Mazad Shaheriar

The semiconductor/computer industry has been following Moore's law for several decades and has reaped the benefits in speed and density of the resultant scaling. Transistor density has reached almost one billion per chip, and transistor delays are in picoseconds. However, scaling has slowed down, and the semiconductor industry is now facing several challenges. Hybrid CMOS/nano technologies, such as CMOL, are considered an interim solution to some of these challenges. Another potential architectural solution is specialized architectures for applications and models in the intelligent computing domain, one aspect of which includes abstract computational models inspired by the neuro/cognitive sciences. Consequently, in this dissertation, we focus on hardware implementations of Bayesian Memory (BM), which is a (Bayesian) Biologically Inspired Computational Model (BICM). This model is a simplified version of George and Hawkins' model of the visual cortex, which includes an inference framework based on Judea Pearl's belief propagation. We then present a "hardware design space exploration" methodology for implementing and analyzing the (digital and mixed-signal) hardware for the BM. This methodology involves analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom hardware architectures using both traditional CMOS and hybrid nanotechnology (CMOL), and investigating the baseline performance/price of these architectures. The results suggest that CMOL is a promising candidate for implementing a BM. Such implementations can exploit the very high density storage/computation benefits of these new nano-scale technologies much more efficiently; for example, the throughput per 858 mm² (TPM) obtained for CMOL-based architectures is 32 to 40 times better than the TPM for a CMOS-based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a PC.
Large-scale matrix-handling subroutines 'ATLAS'

International Nuclear Information System (INIS)

Tsunematsu, Toshihide; Takeda, Tatsuoki; Fujita, Keiichi; Matsuura, Toshihiko; Tahara, Nobuo

1978-03-01

The subroutine package "ATLAS" has been developed for handling large-scale matrices. The package is composed of four kinds of subroutines: basic arithmetic routines, routines for solving simultaneous linear equations, routines for solving general eigenvalue problems, and utility routines. The subroutines are useful in large-scale plasma-fluid simulations. (auth.)
Clinical utility of the Wechsler memory scale - fourth edition (WMS-IV) in patients with intractable temporal lobe epilepsy

NARCIS (Netherlands)

Bouman, Zita; Elhorst, Didi; Hendriks, Marc P H; Kessels, Roy P C; Aldenkamp, Albert P.

2016-01-01

Introduction: The Wechsler Memory Scale (WMS) is one of the most widely used test batteries to assess memory functions in patients with brain dysfunctions of different etiologies. This study examined the clinical validation of the Dutch Wechsler Memory Scale - Fourth Edition (WMS-IV-NL) in patients
Clinical utility of the Wechsler Memory Scale - Fourth Edition (WMS-IV) in patients with intractable temporal lobe epilepsy

NARCIS (Netherlands)

Bouman, Z.; Elhorst, D.; Hendriks, M.P.H.; Kessels, R.P.C.; Aldenkamp, A.P.

2016-01-01

Introduction: The Wechsler Memory Scale (WMS) is one of the most widely used test batteries to assess memory functions in patients with brain dysfunctions of different etiologies. This study examined the clinical validation of the Dutch Wechsler Memory Scale-Fourth Edition (WMS-IV-NL) in patients
Large-scale solar heat

Energy Technology Data Exchange (ETDEWEB)

Tolonen, J.; Konttinen, P.; Lund, P. [Helsinki Univ. of Technology, Otaniemi (Finland). Dept. of Engineering Physics and Mathematics]

1998-12-31

In this project a large domestic solar heating system was built and a solar district heating system was modelled and simulated. Objectives were to improve the performance and reduce costs of a large-scale solar heating system. As a result of the project the benefit/cost ratio can be increased by 40 % through dimensioning and optimising the system at the designing stage. (orig.)
A measurement-based performability model for a multiprocessor system

Science.gov (United States)

Hsueh, M. C.; Iyer, Ravi K.; Trivedi, K. S.

1987-01-01

A measurement-based performability model based on real error data collected on a multiprocessor system is described. Model development, from the raw error data to the estimation of cumulative reward, is described. Both normal and failure behavior of the system are characterized. The measured data show that the holding times in key operational and failure states are not simply exponential and that a semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different failure types and recovery procedures.
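The abstract does not give the exact form of the reward function, so the following sketch only illustrates the standard semi-Markov computation such a model relies on: the long-run fraction of time spent in state i is pi_i * m_i / sum_j(pi_j * m_j), where pi is the stationary distribution of the embedded jump chain and m_i is the mean holding time in state i; the expected steady-state reward rate is the occupancy-weighted sum of per-state rewards. The three states, transition probabilities, holding times, and reward values below are invented for illustration and are not the paper's measured data.

```python
def solve(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def embedded_stationary(p):
    """Stationary distribution pi of the embedded jump chain: pi P = pi, sum(pi) = 1."""
    n = len(p)
    # Row j encodes sum_i pi_i * (P[i][j] - delta_ij) = 0.
    a = [[p[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    a[-1] = [1.0] * n  # replace one redundant balance equation by normalization
    b = [0.0] * (n - 1) + [1.0]
    return solve(a, b)


def steady_state_reward(p, holding, reward):
    """Time-weighted state occupancy of a semi-Markov process and its reward rate."""
    pi = embedded_stationary(p)
    weights = [pi_i * m_i for pi_i, m_i in zip(pi, holding)]
    total = sum(weights)
    occupancy = [w / total for w in weights]
    rate = sum(o * r for o, r in zip(occupancy, reward))
    return rate, occupancy


# Hypothetical 3-state example: normal, degraded (error), recovery.
P = [[0.0, 0.9, 0.1],
     [0.7, 0.0, 0.3],
     [1.0, 0.0, 0.0]]
holding = [100.0, 5.0, 2.0]  # mean holding times, arbitrary units (any distribution)
reward = [1.0, 0.5, 0.0]     # hypothetical reward rate earned in each state
rate, occupancy = steady_state_reward(P, holding, reward)
print(rate, occupancy)
```

Only the mean holding times enter the long-run occupancy, which is why non-exponential holding times can still be handled this way; transient or cumulative-reward measures would need the full holding-time distributions.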
Economic Model Predictive Control for Large-Scale and Distributed Energy Systems

DEFF Research Database (Denmark)

Standardi, Laura

In this thesis, we consider control strategies for large and distributed energy systems that are important for the implementation of smart grid technologies. An electrical grid has to ensure reliability and avoid long-term interruptions in the power supply. Moreover, the share of Renewable Energy Sources (RESs) in smart grids is increasing. These energy sources bring uncertainty to production because of their fluctuations. Hence, smart grids need suitable control systems that are able to continuously balance power production and consumption. We apply the Economic Model Predictive Control (EMPC) strategy to optimise the economic performance of the energy systems and to balance power production and consumption. In the case of large-scale energy systems, the electrical grid connects a high number of power units. Because of this, the related control problem involves a high number of variables...
Geometric Algorithms for Private-Cache Chip Multiprocessors

DEFF Research Database (Denmark)

Ajwani, Deepak; Sitchinava, Nodari; Zeh, Norbert

2010-01-01

We study techniques for obtaining efficient algorithms for geometric problems on private-cache chip multiprocessors. We show how to obtain optimal algorithms for interval stabbing counting, 1-D range counting, weighted 2-D dominance counting, and for computing 3-D maxima, 2-D lower envelopes, and 2-D convex hulls. These results are obtained by analyzing adaptations of either the PEM merge sort algorithm or PRAM algorithms. For the second group of problems (orthogonal line segment intersection reporting, batched range reporting, and related problems), more effort is required. What distinguishes these problems from the ones in the previous group is the variable output size, which requires I/O-efficient load balancing strategies based on the contribution of the individual input elements to the output size. To obtain nearly optimal algorithms for these problems, we introduce a parallel distribution...
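To make the first problem on that list concrete: interval stabbing counting asks, for each query point q, how many input intervals contain q. The sequential sketch below solves it with sorting and binary search, using the identity count(q) = #{intervals with start <= q} - #{intervals with end < q} for closed intervals [s, e] with s <= e. It illustrates the problem only; it is not the parallel PEM algorithm developed in the paper, and the function name `stabbing_counts` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def stabbing_counts(intervals, queries):
    """For each query q, count closed intervals [s, e] (s <= e) with s <= q <= e.

    Sequential O((n + m) log n) baseline for n intervals and m queries.
    """
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    # Intervals that have started by q, minus those that already ended before q.
    return [bisect_right(starts, q) - bisect_left(ends, q) for q in queries]


intervals = [(1, 5), (2, 3), (4, 8), (6, 6)]
queries = [0, 2, 3, 4, 6, 9]
print(stabbing_counts(intervals, queries))
```

Every interval ending before q also started before q, so the subtraction never goes negative; the hard part addressed in the paper is doing this cache-efficiently across private-cache cores.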
Probes of large-scale structure in the Universe

International Nuclear Information System (INIS)

Suto, Yasushi; Gorski, K.; Juszkiewicz, R.; Silk, J.

1988-01-01

Recent progress in observational techniques has made it possible to confront quantitatively various models for the large-scale structure of the Universe with detailed observational data. We develop a general formalism to show that the gravitational instability theory for the origin of large-scale structure is now capable of critically confronting observational results on cosmic microwave background radiation angular anisotropies, large-scale bulk motions and large-scale clumpiness in the galaxy counts. (author)
Modeling the behaviour of shape memory materials under large deformations

Science.gov (United States)

Rogovoy, A. A.; Stolbova, O. S.

2017-06-01

In this study, the models describing the behavior of shape memory alloys, ferromagnetic materials and polymers have been constructed, using a formalized approach to develop the constitutive equations for complex media under large deformations. The kinematic and constitutive equations, satisfying the principles of thermodynamics and objectivity, have been derived. The application of the Galerkin procedure to the systems of equations of solid mechanics allowed us to obtain the Lagrange variational equation and variational formulation of the magnetostatics problems. These relations have been tested in the context of the problems of finite deformation in shape memory alloys and ferromagnetic materials during forward and reverse martensitic transformations and in shape memory polymers during forward and reverse relaxation transitions from a highly elastic to a glassy state.
Event management for large scale event-driven digital hardware spiking neural networks.

Science.gov (United States)

Caron, Louis-Charles; D'Haene, Michiel; Mailhot, Frédéric; Schrauwen, Benjamin; Rouat, Jean

2013-09-01

The interest in brain-like computation has led to the design of a plethora of innovative neuromorphic systems. Individually, spiking neural networks (SNNs), event-driven simulation and digital hardware neuromorphic systems receive considerable attention. Despite the popularity of event-driven SNNs in software, very few digital hardware architectures exist. This is because existing hardware solutions for event management scale badly with the number of events. This paper introduces the structured heap queue, a pipelined digital hardware data structure, and demonstrates its suitability for event management. The structured heap queue scales gracefully with the number of events, allowing the efficient implementation of large-scale digital hardware event-driven SNNs. The scaling is linear for memory, logarithmic for logic resources and constant for processing time. The use of the structured heap queue is demonstrated on a field-programmable gate array (FPGA) with an image segmentation experiment and an SNN of 65,536 neurons and 513,184 synapses. Events can be processed at the rate of 1 every 7 clock cycles, and a 406×158 pixel image is segmented in 200 ms. Copyright © 2013 Elsevier Ltd. All rights reserved.
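The role an event queue plays in an event-driven SNN, always yielding the earliest pending spike so that simulation advances from event to event, can be sketched in software with a binary heap. This is a stand-in under stated assumptions: Python's `heapq` is an ordinary software heap, not the paper's pipelined hardware structured heap queue, and `EventQueue`, `simulate`, and the toy integrate-and-fire model are hypothetical.

```python
import heapq
import itertools


class EventQueue:
    """Software binary-heap event queue (not the paper's hardware design)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: equal-time events stay FIFO

    def push(self, time, target):
        heapq.heappush(self._heap, (time, next(self._counter), target))

    def pop(self):
        time, _, target = heapq.heappop(self._heap)
        return time, target

    def __len__(self):
        return len(self._heap)


def simulate(synapses, external_inputs, threshold, t_end):
    """Toy event-driven network (hypothetical model, for illustration only).

    Each event is an input arriving at a neuron and adds 1 to its potential.
    When the potential reaches `threshold`, the neuron fires, resets to 0,
    and schedules one arrival per outgoing synapse after that synapse's delay.
    synapses: dict mapping neuron -> list of (target, delay).
    """
    queue = EventQueue()
    for time, neuron in external_inputs:
        queue.push(time, neuron)
    potential = {}
    fired = []
    while queue:
        time, neuron = queue.pop()  # earliest pending event, O(log n)
        if time > t_end:
            break
        potential[neuron] = potential.get(neuron, 0) + 1
        if potential[neuron] >= threshold:
            potential[neuron] = 0
            fired.append((time, neuron))
            for target, delay in synapses.get(neuron, []):
                queue.push(time + delay, target)
    return fired


synapses = {0: [(1, 2.0), (2, 3.0)], 1: [(2, 1.0)]}
inputs = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0)]
fired = simulate(synapses, inputs, threshold=2, t_end=10.0)
print(fired)
```

Work is done only when an event exists, so quiet neurons cost nothing; the abstract's point is that a hardware queue must keep insertion and extraction cheap as the number of pending events grows, which a software heap does in logarithmic time per operation.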
Large-scale grid management; Storskala Nettforvaltning

Energy Technology Data Exchange (ETDEWEB)

Langdal, Bjoern Inge; Eggen, Arnt Ove

2003-07-01

The network companies in the Norwegian electricity industry now have to establish large-scale network management, a concept essentially characterized by (1) a broader focus (Broad Band, Multi Utility,...) and (2) bigger units with large networks and more customers. Research done by SINTEF Energy Research so far shows that the approaches to large-scale network management may be structured according to three main challenges: centralization, decentralization and outsourcing. The article is part of a planned series.
Data driven parallelism in experimental high energy physics applications

International Nuclear Information System (INIS)

Pohl, M.

1987-01-01

I present global design principles for the implementation of high energy physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of high energy physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this, while preserving the necessary portability of the code, I propose a computational model with purely data-driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme makes it possible to map the internal layout of the data structure closely onto the layout of local and shared memory in a parallel architecture. It thus makes it possible to optimize the application with respect to both synchronization and data transport overheads. I present a coarse top-level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms). (orig.)
Data driven parallelism in experimental high energy physics applications

Science.gov (United States)

Pohl, Martin

1987-08-01

I present global design principles for the implementation of High Energy Physics data analysis code on sequential and parallel processors with mixed shared and local memory. Potential parallelism in the structure of High Energy Physics tasks is identified with granularity varying from a few times 10^8 instructions all the way down to a few times 10^4 instructions. It follows the hierarchical structure of detector and data acquisition systems. To take advantage of this, while preserving the necessary portability of the code, I propose a computational model with purely data-driven concurrency in Single Program Multiple Data (SPMD) mode. The task granularity is defined by varying the granularity of the central data structure manipulated. Concurrent processes coordinate themselves asynchronously using simple lock constructs on parts of the data structure. Load balancing among processes occurs naturally. The scheme makes it possible to map the internal layout of the data structure closely onto the layout of local and shared memory in a parallel architecture. It thus makes it possible to optimize the application with respect to both synchronization and data transport overheads. I present a coarse top-level design for a portable implementation of this scheme on sequential machines, multiprocessor mainframes (e.g. IBM 3090), tightly coupled multiprocessors (e.g. RP-3) and loosely coupled processor arrays (e.g. LCAP, Emulating Processor Farms).
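The coordination pattern in these two records (the same program on every processor, work defined by parts of a central data structure, simple locks on those parts, and load balancing that emerges because an idle process simply claims the next free part) can be sketched with Python threads. This is a hypothetical illustration, not the author's design: `process_spmd`, the fixed-size chunking, and the per-part claim flags are assumptions, and CPython threads demonstrate the coordination rather than real speed-up.

```python
import threading


def process_spmd(records, chunk_size, n_workers, work):
    """Data-driven SPMD sketch: every worker runs the same loop and claims
    parts of the shared data structure under a per-part lock."""
    # Parts of the central data structure; chunk_size sets the task granularity.
    chunks = [range(i, min(i + chunk_size, len(records)))
              for i in range(0, len(records), chunk_size)]
    claimed = [False] * len(chunks)
    locks = [threading.Lock() for _ in chunks]  # one simple lock per part
    results = [None] * len(records)
    processed_by = [0] * n_workers  # records handled by each worker

    def worker(wid):
        for c, idxs in enumerate(chunks):
            with locks[c]:
                if claimed[c]:
                    continue  # another worker already owns this part
                claimed[c] = True
            # Process the claimed part outside the lock; parts are disjoint.
            for i in idxs:
                results[i] = work(records[i])
            processed_by[wid] += len(idxs)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, processed_by


records = list(range(100))
results, processed_by = process_spmd(records, chunk_size=7, n_workers=4,
                                     work=lambda x: x * x)
print(processed_by)
```

Changing `chunk_size` plays the role of the granularity knob in the abstract: larger parts mean fewer lock operations but coarser load balancing, and the split of work across workers is decided at run time by which worker reaches each free part first.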
Japanese large-scale interferometers

CERN Document Server

Kuroda, K; Miyoki, S; Ishizuka, H; Taylor, C T; Yamamoto, K; Miyakawa, O; Fujimoto, M K; Kawamura, S; Takahashi, R; Yamazaki, T; Arai, K; Tatsumi, D; Ueda, A; Fukushima, M; Sato, S; Shintomi, T; Yamamoto, A; Suzuki, T; Saitô, Y; Haruyama, T; Sato, N; Higashi, Y; Uchiyama, T; Tomaru, T; Tsubono, K; Ando, M; Takamori, A; Numata, K; Ueda, K I; Yoneda, H; Nakagawa, K; Musha, M; Mio, N; Moriwaki, S; Somiya, K; Araya, A; Kanda, N; Telada, S; Sasaki, M; Tagoshi, H; Nakamura, T; Tanaka, T; Ohara, K

2002-01-01

The objective of the TAMA 300 interferometer was to develop advanced technologies for kilometre-scale interferometers and to observe gravitational wave events in nearby galaxies. It was designed as a power-recycled Fabry-Perot-Michelson interferometer and was intended as a step towards a final interferometer in Japan. The current, successful status of TAMA is presented. TAMA forms a basis for LCGT (large-scale cryogenic gravitational wave telescope), a 3 km-scale cryogenic interferometer to be built in the Kamioka mine in Japan, implementing cryogenic mirror techniques. The plan of LCGT is schematically described along with its associated R&D.
Large-Scale Brain Network Coupling Predicts Total Sleep Deprivation Effects on Cognitive Capacity.

Directory of Open Access Journals (Sweden)

Yu Lei

Full Text Available Interactions between large-scale brain networks have received the most attention in the study of cognitive dysfunction of the human brain. In this paper, we aimed to test the hypothesis that the coupling strength of large-scale brain networks, referred to as the sleep pressure index (SPI), reflects the pressure for sleep and predicts cognitive performance. Fourteen healthy subjects underwent this within-subject functional magnetic resonance imaging (fMRI) study during rested wakefulness (RW) and after 36 h of total sleep deprivation (TSD). Self-reported scores of sleepiness were higher for TSD than for RW. A subsequent working memory (WM) task showed that WM performance was lower after 36 h of TSD. Moreover, the SPI was developed based on the coupling strength of the salience network (SN) and default mode network (DMN). A significant increase of the SPI was observed after 36 h of TSD, suggesting stronger pressure for sleep. In addition, the SPI was significantly correlated with both the visual analogue scale score of sleepiness and WM performance. These results showed that alterations in SN-DMN coupling might be critical in the cognitive alterations that underlie the lapse after TSD. Further studies may validate the SPI as a potential clinical biomarker to assess the impact of sleep deprivation.

Resource-sharing between internal maintenance and external selection modulates attentional capture by working memory content

Directory of Open Access Journals (Sweden)

Anastasia Kiyonaga

2014-08-01

Full Text Available It is unclear why and under what circumstances working memory (WM) and attention interact. Here, we apply the logic of the time-based resource-sharing (TBRS) model of WM (e.g., Barrouillet, Bernardin, & Camos, 2004) to explore the mixed findings of a separate, but related, literature that studies the guidance of visual attention by WM contents. Specifically, we hypothesize that the linkage between WM representations and visual attention is governed by a time-shared cognitive resource that alternately refreshes internal (WM) and selects external (visual attention) information. If this were the case, WM content should guide visual attention (involuntarily), but only when there is time for it to be refreshed in an internal focus of attention. To provide an initial test of this hypothesis, we examined whether the amount of unoccupied time during a WM delay could impact the magnitude of attentional capture by WM contents. Participants were presented with a series of visual search trials while they maintained a WM cue for a delayed-recognition test. WM cues could coincide with the search target, a distracter, or neither. We varied both the number of searches to be performed and the amount of available time to perform them. Slowing of visual search by a WM-matching distracter, and facilitation by a matching target, were curtailed when the delay was filled with fast-paced (refreshing-preventing) search trials, as was subsequent memory probe accuracy. WM content may, therefore, only capture visual attention when it can be refreshed, suggesting that internal (WM) and external attention demands reciprocally impact one another because they share a limited resource. The TBRS rationale can thus be applied in a novel context to explain why WM contents capture attention, and under what conditions that effect should be observed.
Development of an early memories of warmth and safeness scale and its relationship to psychopathology.

Science.gov (United States)

Richter, A; Gilbert, P; McEwan, K

2009-06-01

Experiences of early childhood have a major impact on physiological, psychological, and social aspects of maturation and functioning. One avenue of work explores the recall and memory of positive or negative rearing experiences and their association with psychopathology measures. However, while many self-report studies have focused on the recall of parental behaviours, this study developed a new measure called the early memories of warmth and safeness scale (EMWSS), which focuses on recall of one's own inner positive feelings, emotions and experiences in childhood. Student participants (N = 180) completed the new scale and a series of self-report scales measuring different types of early recall, psychopathology, types of positive affect, and self-criticism/reassurance. The EMWSS was found to have good psychometric properties and reliability. Recall of parental behaviour and recall of positive emotional memories were highly related, but recall of positive emotional memories was a better predictor of psychopathology, styles of self-criticism/self-reassurance and disposition to experience positive affect than recall of parental behaviour.
The shared and unique values of optical, fluorescence, thermal and microwave satellite data for estimating large-scale crop yields

Science.gov (United States)

Large-scale crop monitoring and yield estimation are important for both scientific research and practical applications. Satellite remote sensing provides an effective means for regional and global cropland monitoring, particularly in data-sparse regions that lack reliable ground observations and rep...
Cooperative Data Sharing: Simple Support for Clusters of SMP Nodes

Science.gov (United States)

DiNucci, David C.; Bailey, David H. (Technical Monitor)

1997-01-01

Libraries like PVM and MPI send typed messages to allow for heterogeneous cluster computing. Lower-level libraries, such as GAM, provide more efficient access to communication by removing the need to copy messages between the interface and user space in some cases. Still lower-level interfaces, such as UNET, get right down to the hardware level to provide maximum performance. However, these are all still interfaces for passing messages from one process to another, and they have limited utility in a shared-memory environment, primarily because message passing is just another term for copying. This drawback is made more pertinent by today's hybrid architectures (e.g. clusters of SMPs), where it is difficult to know beforehand whether two communicating processes will share memory. As a result, even portable language tools (like HPF compilers) must either map all interprocess communication into message passing, with the accompanying performance degradation in shared-memory environments, or check each communication at run time and implement the shared-memory case separately for efficiency. Cooperative Data Sharing (CDS) is a single user-level API which abstracts all communication between processes into the sharing and access coordination of memory regions, in a model which might be described as "distributed shared messages" or "large-grain distributed shared memory". As a result, the user programs to a simple latency-tolerant abstract communication specification which can be mapped efficiently to either a shared-memory or message-passing based run-time system, depending upon the available architecture. Unlike some distributed shared memory interfaces, the user still has complete control over the assignment of data to processors, the forwarding of data to its next likely destination, and the queuing of data until it is needed, so even the relatively high latency present in clusters can be accommodated. CDS does not require special use of an MMU, which
Incremental validity of the MMPI-2-RF over-reporting scales and RBS in assessing the veracity of memory complaints.

Science.gov (United States)

Gervais, Roger O; Ben-Porath, Yossef S; Wygant, Dustin B; Sellbom, Martin

2010-06-01

The Response Bias Scale (RBS) has been found to be a better predictor of over-reported memory complaints than Minnesota Multiphasic Personality Inventory-2 (MMPI-2) F, Back Infrequency (Fb), Infrequency-Psychopathology (Fp), and FBS scales. The MMPI-2-Restructured Form (RF) validity scales were designed to meet or exceed the sensitivity of their MMPI-2 counterparts to symptom over-reporting. This study examined the incremental validity of MMPI-2-RF validity scales and RBS in assessing memory complaints. The MMPI-2-RF over-reporting validity scales were more strongly associated with mean Memory Complaints Inventory scores than their MMPI-2 counterparts (d = 0.22 to 0.49). RBS showed the strongest relationship with memory complaints. Regression analyses demonstrated the incremental validity of the MMPI-2-RF Infrequent Responses, Infrequent Psychopathology Responses, Infrequent Somatic Responses, and FBS-r scales relative to MMPI-2 F, Fp, and FBS in predicting memory complaints. This is consistent with the development objectives of the MMPI-2-RF validity scales as more efficient and sensitive measures of symptom over-reporting.
Distinct and shared cognitive functions mediate event- and time-based prospective memory impairment in normal ageing

Science.gov (United States)

Gonneaud, Julie; Kalpouzos, Grégoria; Bon, Laetitia; Viader, Fausto; Eustache, Francis; Desgranges, Béatrice

2011-01-01

Prospective memory (PM) is the ability to remember to perform an action at a specific point in the future. Regarded as multidimensional, PM involves several cognitive functions that are known to be impaired in normal aging. In the present study, we set out to investigate the cognitive correlates of PM impairment in normal aging. Manipulating cognitive load, we assessed event- and time-based PM, as well as several cognitive functions, including executive functions, working memory and retrospective episodic memory, in healthy subjects covering the entire adult lifespan. We found that normal aging was characterized by PM decline in all conditions and that event-based PM was more sensitive to the effects of aging than time-based PM. Whatever the conditions, PM was linked to inhibition and processing speed. However, while event-based PM was mainly mediated by binding and retrospective memory processes, time-based PM was mainly related to inhibition. The only distinction between high- and low-load PM cognitive correlates lies in an additional, but marginal, correlation between updating and the high-load PM condition. The association of distinct cognitive functions, as well as shared mechanisms, with event- and time-based PM confirms that each type of PM relies on a different set of processes. PMID:21678154
An original approach to data acquisition CHADAC

CERN Document Server

CERN. Geneva

1981-01-01

Many labs try to boost existing data acquisition systems by inserting high-performance intelligent devices at the important nodes of the system's structure. This strategy finds its limits in the system's architecture. The CHADAC project proposes a simple and efficient solution to this problem, using a multiprocessor modular architecture. CHADAC's main features are: parallel data acquisition (CHADAC is fast: it dedicates one processor per branch, and each processor can read and store one 16-bit word in 800 ns); an original structure (each processor can work in its own private memory, in its own shared memory (double access), and in the shared memory of any other processor, and simple, fast communication between processors is also provided by local DMAs); and flexibility (each processor is autonomous and may be used as an independent acquisition system for a branch by connecting local peripherals to it). Fast trigger logic can be added. By virtue of its architecture and performance, CHADAC is designed to provide a g...
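As a rough check on the quoted speed, assuming back-to-back transfers with no other overhead (an assumption; the abstract gives only the per-word time), one 16-bit word every 800 ns corresponds to the per-processor rate below. With one processor per branch, the aggregate rate would scale with the number of branches N only if the branches never contend for shared resources, which the abstract does not state.

```latex
\[
  R_{\text{proc}} = \frac{16\ \text{bit}}{800\ \text{ns}}
                  = 2 \times 10^{7}\ \text{bit/s}
                  = 2.5 \times 10^{6}\ \text{byte/s},
  \qquad
  R_{\text{total}} \approx N \cdot R_{\text{proc}}.
\]
```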
Large scale model testing

International Nuclear Information System (INIS)

Brumovsky, M.; Filip, R.; Polachova, H.; Stepanek, S.

1989-01-01

Fracture mechanics and fatigue calculations for WWER reactor pressure vessels were checked by large-scale model testing performed using the large testing machine ZZ 8000 (with a maximum load of 80 MN) at the SKODA WORKS. The results of testing the material resistance to (non-ductile) fracture are described. The testing included the base materials and welded joints. The rated specimen thickness was 150 mm, with defects of a depth between 15 and 100 mm. Results are also presented for nozzles of 850 mm inner diameter at a scale of 1:3; static, cyclic, and dynamic tests were performed without and with surface defects (15, 30 and 45 mm deep). During cyclic tests, the crack growth rate in the elastic-plastic region was also determined. (author). 6 figs., 2 tabs., 5 refs
Division of attention as a function of the number of steps, visual shifts, and memory load

Science.gov (United States)

Chechile, R. A.; Butler, K.; Gutowski, W.; Palmer, E. A.

1986-01-01

The effects of visual shifts and long-term memory retrieval on divided attention during a monitoring task are considered. A concurrent vigilance task was standardized under all experimental conditions. The results show that subjects can perform nearly perfectly on all of the time-shared tasks if long-term memory retrieval is not required for monitoring. When memory retrieval was required, however, accuracy dropped sharply for all of the time-shared activities. It was concluded that the attentional demand of long-term memory retrieval is appreciable (even for a well-learned motor sequence), and thus memory retrieval substantially reduces subjects' capability to divide their attention. A selected bibliography on the divided attention literature is provided.
Translation and cross-cultural adaptation of the Brazilian Portuguese version of the Emotional Memory Scale.

Science.gov (United States)

Fijtman, Adam; Czepielewski, Letícia Sanguinetti; Souza, Ana Cláudia Mércio Loredo; Felder, Paul; Kauer-Sant'Anna, Marcia; Bücker, Joana

2018-03-01

Background: Emotional memory is an important type of memory that is triggered by positive and negative emotions. It is characterized by enhanced memory for emotional stimuli, usually coupled with decreased memory for neutral preceding events. Emotional memory is strongly associated with amygdala function and could therefore be disrupted in neuropsychiatric disorders. To our knowledge, there is no translated and culturally adapted instrument to assess emotional memory in the Brazilian Portuguese-speaking population. Objective: To report the translation and cross-cultural adaptation of a Brazilian Portuguese version of the Emotional Memory Scale, originally published by Strange et al. in 2003. Methods: The author of the original scale provided 36 lists of 16 words each. Translation was performed by three independent bilingual translators. Healthy subjects rated how semantically related each word was within its list (0 to 10) and the emotional valence of each word (-6 to +6). Lists without negative words were excluded (negative selection), the most positive and the most unrelated words were excluded (positive and semantic selection, respectively), and lists with low semantic relatedness were excluded (semantic assessment). Results: Five lists were excluded during negative selection, four words from each list were excluded in positive and semantic selection, and 11 lists were excluded during semantic assessment. Finally, 20 lists of semantically related words remained; each list had one negative word and 11 neutral words. Conclusion: A scale is now available to evaluate emotional memory in the Brazilian population, although its psychometric properties require further validation.
Why small-scale cannabis growers stay small: five mechanisms that prevent small-scale growers from going large scale.

Science.gov (United States)

Hammersvik, Eirik; Sandberg, Sveinung; Pedersen, Willy

2012-11-01

Over the past 15-20 years, domestic cultivation of cannabis has become established in a number of European countries. New techniques have made such cultivation easier; however, most growers remain small-scale. In this study, we explore the factors that prevent small-scale growers from increasing their production. The study is based on 1 year of ethnographic fieldwork and qualitative interviews with 45 Norwegian cannabis growers, 10 of whom were growing on a large scale and 35 on a small scale. The study identifies five mechanisms that prevent small-scale indoor growers from going large-scale. First, large-scale operations involve a number of people, large sums of money, a high workload and a high risk of detection, and thus demand a higher level of organizational skill than small growing operations. Second, financial assets are needed to start a large 'grow-site'. Housing rent, electricity, equipment and nutrients are expensive. Third, to sell large quantities of cannabis, growers need access to an illegal distribution network and knowledge of how to act according to black market norms and structures. Fourth, large-scale operations require advanced horticultural skills to maximize yield and quality, which demands greater skills and knowledge than small-scale cultivation does. Fifth, small-scale growers are often embedded in the 'cannabis culture', which emphasizes anti-commercialism, anti-violence, and ecological and community values. Hence, starting up large-scale production will imply having to renegotiate or abandon these values. Going from small- to large-scale cannabis production is a demanding task: ideologically, technically, economically and personally. The many obstacles that small-scale growers face, and their lack of interest in and motivation for going large-scale, suggest that the risk of a 'slippery slope' from small-scale to large-scale growing is limited. Possible political implications of the findings are discussed.
Distributed large-scale dimensional metrology new insights

CERN Document Server

Franceschini, Fiorenzo; Maisano, Domenico

2011-01-01

Focuses on the latest insights into, and challenges of, distributed large-scale dimensional metrology. Enables practitioners to study distributed large-scale dimensional metrology independently. Includes specific examples of the development of new system prototypes.
Neighborhood Discriminant Hashing for Large-Scale Image Retrieval.

Science.gov (United States)

Tang, Jinhui; Li, Zechao; Wang, Meng; Zhao, Ruizhen

2015-09-01

With the proliferation of large-scale community-contributed images, hashing-based approximate nearest neighbor search in huge databases has attracted considerable interest in computer vision and multimedia in recent years because of its computational and memory efficiency. In this paper, we propose a novel hashing method, neighborhood discriminant hashing (NDH), to implement approximate similarity search. Unlike previous work, we propose to learn a discriminant hashing function by exploiting local discriminative information, i.e., the labels of a sample can be inherited from the neighbor samples it selects. The hashing function is expected to be orthogonal to avoid redundancy among the learned hashing bits as much as possible, while an information-theoretic regularization based on the maximum entropy principle is jointly exploited. As a consequence, the learned hashing function is compact and nonredundant among bits, while each bit is highly informative. Extensive experiments on four publicly available data sets demonstrate that the proposed NDH method outperforms state-of-the-art hashing techniques.
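The abstract builds on the basic mechanics of hashing-based approximate nearest neighbor search: each item is mapped to a short binary code, and candidates are compared by Hamming distance instead of in the original feature space. The sketch below shows those mechanics with random-hyperplane (sign) hashing, a simple data-independent scheme swapped in for illustration. It is not NDH, which learns its projection from neighborhood label information, and all names are hypothetical.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Random Gaussian hyperplanes; NDH would learn these instead (not shown)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(1 if sum(w * v for w, v in zip(p, vec)) >= 0 else 0 for p in planes)


def hamming(a, b):
    """Number of bit positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest(query_code, db_codes):
    """Linear scan over compact codes; returns the index with the smallest Hamming distance."""
    return min(range(len(db_codes)), key=lambda i: hamming(query_code, db_codes[i]))


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=16, seed=1)
    database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    codes = [encode(v, planes) for v in database]
    q = encode([0.0, 2.0, 0.0], planes)  # a scaled copy of database[1]
    print(nearest(q, codes))
```

The memory saving comes from storing 16 bits per item instead of a float vector; in a learned scheme such as NDH, the projection changes but the encode-then-compare-codes pattern is broadly the same.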
Recall of false memories in individuals scoring high in schizotypy: memory distortions are scale specific.

Science.gov (United States)

Saunders, Jo; Randell, Jordan; Reed, Phil

2012-06-01

Previous research has indicated abnormal semantic activation in individuals scoring higher in schizotypy. In the current experiment, semantic activation was examined using the Deese-Roediger-McDermott paradigm of false memories. Participants were assessed for schizotypy using the Oxford-Liverpool Inventory of Feelings and Experiences (OLIFE). Participants studied lists of semantically related words from which a critical, highly associated word was absent, and then recalled each list. Participants high in Unusual Experiences and Cognitive Disorganization recalled more critical non-presented words, more weakly related studied words, and fewer studied words than participants who scored low on these measures. Previous research using the cognitive-perceptual factor of the Schizotypal Personality Questionnaire found reduced false memories, whereas the Unusual Experiences subscale of the OLIFE was associated with more false memories. Both scales cover similar unusual perceptual experiences, and it is unclear why they led to divergent results. The findings suggest that subtypes of schizotypy are associated with abnormal semantic activation. Copyright © 2011 Elsevier Ltd. All rights reserved.
Temporal Organization of Sound Information in Auditory Memory

Directory of Open Access Journals (Sweden)

Kun Song

2017-06-01

Memory is a constructive and organizational process. Instead of being stored with all the fine details, external information is reorganized and structured at certain spatiotemporal scales. It is well established that time plays a central role in audition by segmenting sound inputs into temporal chunks of appropriate length. However, it remains largely unknown whether critical temporal structures mediate sound representation in auditory memory. To address this issue, we designed an auditory memory-transfer study that combines a previously developed unsupervised white-noise memory paradigm with a sound-reversal manipulation. Specifically, in seven experiments we systematically measured memory transfer from a random white-noise sound to its locally time-reversed version at various temporal scales. We demonstrate a U-shaped memory-transfer pattern with a minimum at a temporal scale of around 200 ms. Furthermore, neither auditory perceptual similarity nor physical similarity as a function of the manipulated temporal scale can account for the memory-transfer results. Our results suggest that sounds are not stored with all their fine spectrotemporal details but are organized and structured in discrete temporal chunks in long-term auditory memory representation.
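The "locally time-reversed" manipulation can be illustrated as cutting the sound into consecutive segments of a given temporal scale and reversing the sample order within each segment. This is a minimal sketch under that reading of the abstract: the study's exact procedure (for example, any smoothing at segment boundaries) is not given here, and the function names and the 44.1 kHz sample rate are hypothetical.

```python
import random


def chunk_samples(scale_ms, sample_rate_hz):
    """Convert a temporal scale in milliseconds to a whole number of samples."""
    return round(scale_ms * sample_rate_hz / 1000)


def local_time_reverse(samples, chunk_len):
    """Reverse the sample order inside each consecutive chunk of chunk_len samples.

    A final partial chunk is reversed as well. Larger chunk_len values leave
    less of the original fine temporal structure intact.
    """
    if chunk_len < 1:
        raise ValueError("chunk_len must be >= 1")
    out = []
    for start in range(0, len(samples), chunk_len):
        out.extend(reversed(samples[start:start + chunk_len]))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(44100)]  # 1 s of white noise at 44.1 kHz
    n = chunk_samples(200, 44100)                         # 200 ms scale -> 8820 samples
    reversed_noise = local_time_reverse(noise, n)
    print(n, len(reversed_noise))
```

Reversal within chunks preserves the long-term power spectrum of white noise while scrambling its temporal order at scales finer than the chunk, which is what makes the chunk length a natural variable to sweep.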
Temporal Organization of Sound Information in Auditory Memory.

Science.gov (United States)

Song, Kun; Luo, Huan

2017-01-01

Resource Allocation Model for Modelling Abstract RTOS on Multiprocessor System-on-Chip

DEFF Research Database (Denmark)

Virk, Kashif Munir; Madsen, Jan

2003-01-01

Resource allocation is an important problem in RTOSs and has been an active area of research. Numerous approaches have been developed, and many different techniques have been combined for a wide range of applications. In this paper, we address the problem of resource allocation in the context...... of modelling an abstract RTOS on multiprocessor SoC platforms. We discuss the implementation details of a simplified basic priority inheritance protocol for our abstract system model in SystemC....
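The abstract mentions a simplified basic priority inheritance protocol implemented in SystemC. The Python sketch below illustrates only the protocol idea, not the authors' SystemC model: when a higher-priority task blocks on a mutex, the current holder temporarily runs at the blocked task's priority, and it drops back to its base priority on release. It assumes that larger numbers mean higher priority and that each task holds at most one mutex at a time; the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so two tasks with equal fields stay distinct
class Task:
    name: str
    base_priority: int                    # assumption: larger number = more urgent
    priority: int = field(init=False)     # effective (possibly inherited) priority

    def __post_init__(self):
        self.priority = self.base_priority


class Mutex:
    def __init__(self):
        self.holder = None
        self.waiters = []

    def lock(self, task):
        """Return True if task acquires the mutex, False if it must block."""
        if self.holder is None:
            self.holder = task
            return True
        self.waiters.append(task)
        # Basic inheritance: the holder runs at least at the blocked task's priority.
        self.holder.priority = max(self.holder.priority, task.priority)
        return False

    def unlock(self):
        """Release the mutex and hand it to the highest-priority waiter, if any."""
        self.holder.priority = self.holder.base_priority  # single-mutex simplification
        if not self.waiters:
            self.holder = None
            return None
        nxt = max(self.waiters, key=lambda t: t.priority)
        self.waiters.remove(nxt)
        self.holder = nxt
        if self.waiters:  # the new holder inherits from anyone still blocked
            nxt.priority = max(nxt.priority, max(t.priority for t in self.waiters))
        return nxt


if __name__ == "__main__":
    low, high = Task("low", 1), Task("high", 10)
    m = Mutex()
    m.lock(low)
    m.lock(high)                   # high blocks; low now runs at priority 10
    print(low.priority)
    print(m.unlock().name, low.priority)
```

Raising the holder's priority keeps a medium-priority task from preempting it while a high-priority task waits, which is the unbounded priority inversion the protocol is meant to prevent. A full protocol would also handle nested locks and transitive inheritance, which this sketch deliberately omits.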
Decoding Synteny Blocks and Large-Scale Duplications in Mammalian and Plant Genomes

Science.gov (United States)

Peng, Qian; Alekseyev, Max A.; Tesler, Glenn; Pevzner, Pavel A.

Existing synteny block reconstruction algorithms use anchors (e.g., orthologous genes) shared by all genomes to construct synteny blocks for multiple genomes. This approach, while efficient for a few genomes, cannot be scaled to construct synteny blocks for the many mammalian genomes currently being sequenced. The problem is that the number of anchors shared among all genomes decreases quickly as the number of genomes increases. Another problem is that many genomes (plant genomes in particular) have undergone extensive duplications, which makes decoding genomic architecture and analyzing rearrangements in plants difficult. Existing synteny block generation algorithms for plants do not address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and the evolutionary history of duplications. We present a new algorithm based on the A-Bruijn graph framework that overcomes these difficulties and provides a unified approach to synteny block reconstruction for multiple genomes and for genomes with large duplications.
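The scaling problem described above, where anchors shared by all genomes become scarce as genomes are added, comes down to repeated set intersection. The toy sketch below uses hypothetical anchor IDs and does not implement the A-Bruijn graph algorithm. It only shows that the number of anchors common to the first k genomes can stay the same or shrink as k grows, but never increase.

```python
def shared_anchor_counts(genomes):
    """Return |anchors shared by genomes[0..k-1]| for k = 1 .. len(genomes).

    genomes: a list of sets of anchor IDs (e.g. orthologous gene identifiers).
    """
    counts = []
    shared = None
    for anchors in genomes:
        # Intersecting with each new genome can only remove anchors, never add them.
        shared = set(anchors) if shared is None else shared & anchors
        counts.append(len(shared))
    return counts


if __name__ == "__main__":
    # Hypothetical anchor sets: each genome has lost a few anchors.
    toy = [{1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 5}, {1, 2, 3, 5}, {2, 3, 5}]
    print(shared_anchor_counts(toy))
```

With dozens of genomes, each losing a small fraction of anchors independently, the intersection can become too small to support block reconstruction, which motivates an approach that does not require anchors shared by every genome.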
Enabling High Performance Large Scale Dense Problems through KBLAS

KAUST Repository

Abdelfattah, Ahmad

2014-05-04

KBLAS (KAUST BLAS) is a small library that provides highly optimized BLAS routines for systems accelerated with GPUs. KBLAS is written entirely in CUDA C and targets NVIDIA GPUs with compute capability 2.0 (Fermi) or higher. The current focus is on level-2 BLAS routines, namely the general matrix-vector multiplication (GEMV) kernel and the symmetric/Hermitian matrix-vector multiplication (SYMV/HEMV) kernel. KBLAS provides these two kernels in all four precisions (s, d, c, and z), with support for multi-GPU systems. Through advanced optimization techniques that target latency hiding and push memory bandwidth to the limit, KBLAS outperforms state-of-the-art kernels by 20-90%. Competitors include CUBLAS-5.5, MAGMABLAS-1.4.0, and CULA R17. The SYMV/HEMV kernel from KBLAS has been adopted by NVIDIA and should appear in CUBLAS-6.0. KBLAS has been used in large-scale simulations of multi-object adaptive optics.
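For reference, the two level-2 kernels KBLAS targets compute y := alpha*A*x + beta*y, where A is general (GEMV) or symmetric/Hermitian (SYMV/HEMV). The plain Python sketch below gives only the reference semantics for real data, not KBLAS's CUDA code, and the function names are illustrative. The SYMV version reads only the lower triangle of A and uses each off-diagonal entry twice, so it touches roughly half the matrix entries. Because these kernels are limited by memory bandwidth, that halving is where the savings come from.

```python
def gemv(alpha, A, x, beta, y):
    """General matrix-vector product: return alpha*A*x + beta*y (A is a list of rows)."""
    return [alpha * sum(a * xj for a, xj in zip(row, x)) + beta * yi
            for row, yi in zip(A, y)]


def symv_lower(alpha, A, x, beta, y):
    """Symmetric matrix-vector product reading only A[i][j] with j <= i.

    Each stored off-diagonal entry contributes to two outputs (row i and row j),
    so the upper triangle is never read.
    """
    n = len(x)
    acc = [0.0] * n
    for i in range(n):
        for j in range(i + 1):
            a = A[i][j]
            acc[i] += a * x[j]
            if i != j:
                acc[j] += a * x[i]  # mirrored contribution of A[j][i] == A[i][j]
    return [alpha * acc[i] + beta * y[i] for i in range(n)]


if __name__ == "__main__":
    full = [[2, 1, 0], [1, 3, 4], [0, 4, 5]]
    lower_only = [[2, 99, 99], [1, 3, 99], [0, 4, 5]]  # upper triangle is garbage
    x, y = [1, 2, 3], [1, 1, 1]
    print(gemv(1, full, x, 0, y))
    print(symv_lower(1, lower_only, x, 0, y))
```

Both calls print the same values (4, 19, 23), even though the second matrix carries garbage in its upper triangle, because `symv_lower` never reads it.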
Parallel Optimization of Polynomials for Large-scale Problems in Stability and Control

Science.gov (United States)

Kamyar, Reza

In this thesis, we focus on some NP-hard problems in control theory. Thanks to converse Lyapunov theory, these problems can often be modeled as optimization over polynomials. To avoid intractability, we establish a trade-off between accuracy and complexity. In particular, we develop a sequence of tractable optimization problems, in the form of Linear Programs (LPs) and/or Semi-Definite Programs (SDPs), whose solutions converge to the exact solution of the NP-hard problem. However, the computational and memory complexity of these LPs and SDPs grows exponentially as the sequence progresses, meaning that improving the accuracy of the solutions requires solving SDPs with tens of thousands of decision variables and constraints. Setting up and solving such problems is a significant challenge. Existing optimization algorithms and software are designed only for desktop computers or small computer clusters, machines that do not have sufficient memory to solve such large SDPs. Moreover, the speed-up of these algorithms does not scale beyond dozens of processors. This is why we seek parallel algorithms for setting up and solving large SDPs on large clusters and/or supercomputers. We propose parallel algorithms for stability analysis of two classes of systems: 1) linear systems with a large number of uncertain parameters; 2) nonlinear systems defined by polynomial vector fields. First, we develop a distributed parallel algorithm that applies Polya's and/or Handelman's theorems to variants of parameter-dependent Lyapunov inequalities with parameters defined over the standard simplex. The result is a sequence of SDPs that possess a block-diagonal structure. We then develop a parallel SDP solver that exploits this structure to map the computation, memory, and communication onto a distributed parallel environment. Numerical tests on a supercomputer demonstrate the ability of the algorithm to...
