heterogeneous multi-processor systems: Topics by WorldWideScience.org

Sample records for heterogeneous multi-processor systems

Recommending the heterogeneous cluster type multi-processor system computing

International Nuclear Information System (INIS)

Iijima, Nobukazu

2010-01-01

Real-time reactor simulator had been developed by reusing the equipment of the Musashi reactor and its performance improvement became indispensable for research tools to increase sampling rate with introduction of arithmetic units using multi-Digital Signal Processor(DSP) system (cluster). In order to realize the heterogeneous cluster type multi-processor system computing, combination of two kinds of Control Processor (CP) s, Cluster Control Processor (CCP) and System Control Processor (SCP), were proposed with Large System Control Processor (LSCP) for hierarchical cluster if needed. Faster computing performance of this system was well evaluated by simulation results for simultaneous execution of plural jobs and also pipeline processing between clusters, which showed the system led to effective use of existing system and enhancement of the cost performance. (T. Tanaka)
Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks

Science.gov (United States)

Kim, Deokho; Park, Karam; Ro, Won W.

2011-01-01

While network coding is well known for its efficiency and usefulness in wireless sensor networks, the excessive costs associated with decoding computation and complexity still hinder its adoption into practical use. On the other hand, high-performance microprocessors with heterogeneous multi-cores would be used as processing nodes of the wireless sensor networks in the near future. To this end, this paper introduces an efficient network coding algorithm developed for the heterogenous multi-core processors. The proposed idea is fully tested on one of the currently available heterogeneous multi-core processors referred to as the Cell Broadband Engine. PMID:22164053
Energy-aware Thread and Data Management in Heterogeneous Multi-core, Multi-memory Systems

Energy Technology Data Exchange (ETDEWEB)

Su, Chun-Yi [Virginia Polytechnic Inst. and State Univ. (Virginia Tech), Blacksburg, VA (United States)

2014-12-16

By 2004, microprocessor design focused on multicore scaling—increasing the number of cores per die in each generation—as the primary strategy for improving performance. These multicore processors typically equip multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multicore, multi-memory systems requires in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; managing resources in automation is still a dark art since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access
An FPGA design flow for reconfigurable network-based multi-processor systems on chip

NARCIS (Netherlands)

Kumar, A.; Hansson, M.A; Huisken, J.; Corporaal, H.

2007-01-01

Multi-processor systems on chip (MPSoC) platforms are becoming increasingly more heterogeneous and are shifting towards a more communication-centric methodology. Networks on chip (NoC) have emerged as the design paradigm for scalable on-chip communication architectures. As the system complexity
Coordinated Energy Management in Heterogeneous Processors

Directory of Open Access Journals (Sweden)

Indrani Paul

2014-01-01

Full Text Available This paper examines energy management in a heterogeneous processor consisting of an integrated CPU–GPU for high-performance computing (HPC applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need for coordinating energy management across distinct core types – a new and less understood problem. We examine the intra-node CPU–GPU frequency sensitivity of HPC applications on tightly coupled CPU–GPU architectures as the first step in understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU–GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay squared (ED2 product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.
Multi-processor data acquisition and monitoring systems for particle physics

International Nuclear Information System (INIS)

White, V.; Burch, B.; Eng, K.; Heinicke, P.; Pyatetsky, M.; Ritchie, D.

1983-01-01

A high speed distributed processing system, using PDP-11 and VAX processors, is being developed at Fermilab. The acquisition of data is done using one or more PDP-11s. Additional processors are connected to provide either data logging or extra data analysis capabilities. Within this framework, functional interchangeability of PDP-11 and VAX processors and of the PDP-11 operating systems, RT-11 and RSX-11M, has been maintained. Inter-processor connections have been implemented in a general way using the 5 megabit DR11-W hardware currently selected for the purpose. Using this approach the authors have been able to make use of several existing data acquisition and analysis packages, such as RT/MULTI, in a multi-processor system
A Heterogeneous Multi-core Architecture with a Hardware Kernel for Control Systems

DEFF Research Database (Denmark)

Li, Gang; Guan, Wei; Sierszecki, Krzysztof

2012-01-01

Rapid industrialisation has resulted in a demand for improved embedded control systems with features such as predictability, high processing performance and low power consumption. Software kernel implementation on a single processor is becoming more difficult to satisfy those constraints. This pa......Rapid industrialisation has resulted in a demand for improved embedded control systems with features such as predictability, high processing performance and low power consumption. Software kernel implementation on a single processor is becoming more difficult to satisfy those constraints......). Second, a heterogeneous multi-core architecture is investigated, focusing on its performance in relation to hard real-time constraints and predictable behavior. Third, the hardware implementation of HARTEX is designated to support the heterogeneous multi-core architecture. This hardware kernel has...... several advantages over a similar kernel implemented in software: higher-speed processing capability, parallel computation, and separation between the kernel itself and the applications being run. A microbenchmark has been used to compare the hardware kernel with the software kernel, and compare...
A survey of Tumult, a real-time multi-processor system

International Nuclear Information System (INIS)

Jansen, P.G.

1986-01-01

Tumult (Twente University MULTi processor system) is the name of an ongoing project aiming at the design and implementation of a modular extendible multiprocessor system. All memory is distributed and processors communicate in parallel via a fast and reliable local switching network instead of a shared bus. A distributed real-time operating system is being designed and implemented, consisting of a multi-tasking subsystem per processor. Processes can communicate via a message passing mechanism. Communication links and processes are dynamically created and disposed by the application. In this article a brief description of the system is given; communication aspects are emphasized. (Auth.)
Multi-Core Processor Memory Contention Benchmark Analysis Case Study

Science.gov (United States)

Simon, Tyler; McGalliard, James

2009-01-01

Multi-core processors dominate current mainframe, server, and high performance computing (HPC) systems. This paper provides synthetic kernel and natural benchmark results from an HPC system at the NASA Goddard Space Flight Center that illustrate the performance impacts of multi-core (dual- and quad-core) vs. single core processor systems. Analysis of processor design, application source code, and synthetic and natural test results all indicate that multi-core processors can suffer from significant memory subsystem contention compared to similar single-core processors.
Efficient Execution of Video Applications on Heterogeneous Multi- and Many-Core Processors

NARCIS (Netherlands)

Pereira de Azevedo Filho, A.

2011-01-01

In this dissertation we present methodologies and evaluations aiming at increasing the efficiency of video coding applications for heterogeneous many-core processors composed of SIMD-only, scratchpad memory based cores. Our contributions are spread in three different fronts: thread-level parallelism
Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

Science.gov (United States)

Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

2017-11-01

Current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphic processors have become as commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, big data created huge demand for data processing activities and such kind of throughput intensive applications inherently contains data level parallelism which is more suited for SIMD architecture based GPU. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Commodity multi-processor systems in the ATLAS level-2 trigger

International Nuclear Information System (INIS)

Abolins, M.; Blair, R.; Bock, R.; Bogaerts, A.; Dawson, J.; Ermoline, Y.; Hauser, R.; Kugel, A.; Lay, R.; Muller, M.; Noffz, K.-H.; Pope, B.; Schlereth, J.; Werner, P.

2000-01-01

Low cost SMP (Symmetric Multi-Processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS the authors consider them as intelligent input buffers (active ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4-processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term program of work. The SMP systems may be considered as an important building block in future data acquisition systems
Commodity multi-processor systems in the ATLAS level-2 trigger

CERN Document Server

Abolins, M; Bock, R; Bogaerts, J A C; Dawson, J; Ermoline, Y; Hauser, R; Kugel, A; Lay, R; Müller, M; Noffz, K H; Pope, B; Schlereth, J L; Werner, P

2000-01-01

Low cost SMP (symmetric multi-processor) systems provide substantial CPU and I/O capacity. These features together with the ease of system integration make them an attractive and cost effective solution for a number of real-time applications in event selection. In ATLAS we consider them as intelligent input buffers (an "active" ROB complex), as event flow supervisors or as powerful processing nodes. Measurements of the performance of one off-the-shelf commercial 4- processor PC with two PCI buses, equipped with commercial FPGA based data source cards (microEnable) and running commercial software are presented and mapped on such applications together with a long-term programme of work. The SMP systems may be considered as an important building block in future data acquisition systems. (9 refs).
Multi-processor network implementations in Multibus II and VME

International Nuclear Information System (INIS)

Briegel, C.

1992-01-01

ACNET (Fermilab Accelerator Controls Network), a proprietary network protocol, is implemented in a multi-processor configuration for both Multibus II and VME. The implementations are contrasted by the bus protocol and software design goals. The Multibus II implementation provides for multiple processors running a duplicate set of tasks on each processor. For a network connected task, messages are distributed by a network round-robin scheduler. Further, messages can be stopped, continued, or re-routed for each task by user-callable commands. The VME implementation provides for multiple processors running one task across all processors. The process can either be fixed to a particular processor or dynamically allocated to an available processor depending on the scheduling algorithm of the multi-processing operating system. (author)
Hardware Synchronization for Embedded Multi-Core Processors

DEFF Research Database (Denmark)

Stoif, Christian; Schoeberl, Martin; Liccardi, Benito

2011-01-01

Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper, establi......Multi-core processors are about to conquer embedded systems — it is not the question of whether they are coming but how the architectures of the microcontrollers should look with respect to the strict requirements in the field. We present the step from one to multiple cores in this paper...
High-Level Design for Ultra-Fast Software Defined Radio Prototyping on Multi-Processors Heterogeneous Platforms

OpenAIRE

Moy , Christophe; Raulet , Mickaël

2010-01-01

International audience; The design of Software Defined Radio (SDR) equipments (terminals, base stations, etc.) is still very challenging. We propose here a design methodology for ultra-fast prototyping on heterogeneous platforms made of GPPs (General Purpose Processors), DSPs (Digital Signal Processors) and FPGAs (Field Programmable Gate Array). Lying on a component-based approach, the methodology mainly aims at automating as much as possible the design from an algorithmic validation to a mul...
Efficient Support for Matrix Computations on Heterogeneous Multi-core and Multi-GPU Architectures

Energy Technology Data Exchange (ETDEWEB)

Dong, Fengguang [Univ. of Tennessee, Knoxville, TN (United States); Tomov, Stanimire [Univ. of Tennessee, Knoxville, TN (United States); Dongarra, Jack [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

2011-06-01

We present a new methodology for utilizing all CPU cores and all GPUs on a heterogeneous multicore and multi-GPU system to support matrix computations e ciently. Our approach is able to achieve the objectives of a high degree of parallelism, minimized synchronization, minimized communication, and load balancing. Our main idea is to treat the heterogeneous system as a distributed-memory machine, and to use a heterogeneous 1-D block cyclic distribution to allocate data to the host system and GPUs to minimize communication. We have designed heterogeneous algorithms with two di erent tile sizes (one for CPU cores and the other for GPUs) to cope with processor heterogeneity. We propose an auto-tuning method to determine the best tile sizes to attain both high performance and load balancing. We have also implemented a new runtime system and applied it to the Cholesky and QR factorizations. Our experiments on a compute node with two Intel Westmere hexa-core CPUs and three Nvidia Fermi GPUs demonstrate good weak scalability, strong scalability, load balance, and e ciency of our approach.
VIRTUS: a multi-processor system in FASTBUS

International Nuclear Information System (INIS)

Ellett, J.; Jackson, R.; Ritter, R.; Schlein, P.; Yaeger, D.; Zweizig, J.

1986-01-01

VIRTUS is a system of parallel MC68000-based processors interconnected by FASTBUS that is used either on-line as an intelligent trigger component or off-line for full event processing. Each processor receives the complete set of data from one event. The host computer, a VAX 11/780, down-line loads all software to the processors, controls and monitors the functioning of all processors, and writes processed data to tape. Instructions, programs, and data are transferred among the processors and the host in the form of fixed format, variable length data blocks. (Auth.)
Safe and Efficient Support for Embeded Multi-Processors in ADA

Science.gov (United States)

Ruiz, Jose F.

2010-08-01

New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.
Interference control by best-effort process duty-cycling in chip multi-processor systems for real-time medical image processing

NARCIS (Netherlands)

Westmijze, M.; Bekooij, Marco Jan Gerrit; Smit, Gerardus Johannes Maria

2013-01-01

Systems with chip multi-processors are currently used for several applications that have real-time requirements. In chip multi-processor architectures, many hardware resources such as parts of the cache hierarchy are shared between cores and by using such resources, applications can significantly

HardwareSoftware Co-design for Heterogeneous Multi-core Platforms The hArtes Toolchain

CERN Document Server

2012-01-01

This book describes the results and outcome of the FP6 project, known as hArtes, which focuses on the development of an integrated tool chain targeting a heterogeneous multi core platform comprising of a general purpose processor (ARM or powerPC), a DSP (the diopsis) and an FPGA. The tool chain takes existing source code and proposes transformations and mappings such that legacy code can easily be ported to a modern, multi-core platform. Benefits of the hArtes approach, described in this book, include: Uses a familiar programming paradigm: hArtes proposes a familiar programming paradigm which is compatible with the widely used programming practice, irrespective of the target platform. Enables users to view multiple cores as a single processor: the hArtes approach abstracts away the heterogeneity as well as the multi-core aspect of the underlying hardware so the developer can view the platform as consisting of a single, general purpose processor. Facilitates easy porting of existing applications: hArtes provid...
Design of Networks-on-Chip for Real-Time Multi-Processor Systems-on-Chip

DEFF Research Database (Denmark)

Sparsø, Jens

2012-01-01

This paper addresses the design of networks-on-chips for use in multi-processor systems-on-chips - the hardware platforms used in embedded systems. These platforms typically have to guarantee real-time properties, and as the network is a shared resource, it has to provide service guarantees...... (bandwidth and/or latency) to different communication flows. The paper reviews some past work in this field and the lessons learned, and the paper discusses ongoing research conducted as part of the project "Time-predictable Multi-Core Architecture for Embedded Systems" (T-CREST), supported by the European...
Computing effective properties of random heterogeneous materials on heterogeneous parallel processors

Science.gov (United States)

Leidi, Tiziano; Scocchi, Giulio; Grossi, Loris; Pusterla, Simone; D'Angelo, Claudio; Thiran, Jean-Philippe; Ortona, Alberto

2012-11-01

In recent decades, finite element (FE) techniques have been extensively used for predicting effective properties of random heterogeneous materials. In the case of very complex microstructures, the choice of numerical methods for the solution of this problem can offer some advantages over classical analytical approaches, and it allows the use of digital images obtained from real material samples (e.g., using computed tomography). On the other hand, having a large number of elements is often necessary for properly describing complex microstructures, ultimately leading to extremely time-consuming computations and high memory requirements. With the final objective of reducing these limitations, we improved an existing freely available FE code for the computation of effective conductivity (electrical and thermal) of microstructure digital models. To allow execution on hardware combining multi-core CPUs and a GPU, we first translated the original algorithm from Fortran to C, and we subdivided it into software components. Then, we enhanced the C version of the algorithm for parallel processing with heterogeneous processors. With the goal of maximizing the obtained performances and limiting resource consumption, we utilized a software architecture based on stream processing, event-driven scheduling, and dynamic load balancing. The parallel processing version of the algorithm has been validated using a simple microstructure consisting of a single sphere located at the centre of a cubic box, yielding consistent results. Finally, the code was used for the calculation of the effective thermal conductivity of a digital model of a real sample (a ceramic foam obtained using X-ray computed tomography). On a computer equipped with dual hexa-core Intel Xeon X5670 processors and an NVIDIA Tesla C2050, the parallel application version features near to linear speed-up progression when using only the CPU cores. It executes more than 20 times faster when additionally using the GPU.
Optimizing survivability of multi-state systems with multi-level protection by multi-processor genetic algorithm

International Nuclear Information System (INIS)

Levitin, Gregory; Dai Yuanshun; Xie Min; Leng Poh, Kim

2003-01-01

In this paper we consider vulnerable systems which can have different states corresponding to different combinations of available elements composing the system. Each state can be characterized by a performance rate, which is the quantitative measure of a system's ability to perform its task. Both the impact of external factors (stress) and internal causes (failures) affect system survivability, which is determined as probability of meeting a given demand. In order to increase the survivability of the system, a multi-level protection is applied to its subsystems. This means that a subsystem and its inner level of protection are in their turn protected by the protection of an outer level. This double-protected subsystem has its outer protection and so forth. In such systems, the protected subsystems can be destroyed only if all of the levels of their protection are destroyed. Each level of protection can be destroyed only if all of the outer levels of protection are destroyed. We formulate the problem of finding the structure of series-parallel multi-state system (including choice of system elements, choice of structure of multi-level protection and choice of protection methods) in order to achieve a desired level of system survivability by the minimal cost. An algorithm based on the universal generating function method is used for determination of the system survivability. A multi-processor version of genetic algorithm is used as optimization tool in order to solve the structure optimization problem. An application example is presented to illustrate the procedure presented in this paper
Design concepts for a virtualizable embedded MPSoC architecture enabling virtualization in embedded multi-processor systems

CERN Document Server

Biedermann, Alexander

2014-01-01

Alexander Biedermann presents a generic hardware-based virtualization approach, which may transform an array of any off-the-shelf embedded processors into a multi-processor system with high execution dynamism. Based on this approach, he highlights concepts for the design of energy aware systems, self-healing systems as well as parallelized systems. For the latter, the novel so-called Agile Processing scheme is introduced by the author, which enables a seamless transition between sequential and parallel execution schemes. The design of such virtualizable systems is further aided by introduction
CQPSO scheduling algorithm for heterogeneous multi-core DAG task model

Science.gov (United States)

Zhai, Wenzheng; Hu, Yue-Li; Ran, Feng

2017-07-01

Efficient task scheduling is critical to achieve high performance in a heterogeneous multi-core computing environment. The paper focuses on the heterogeneous multi-core directed acyclic graph (DAG) task model and proposes a novel task scheduling method based on an improved chaotic quantum-behaved particle swarm optimization (CQPSO) algorithm. A task priority scheduling list was built. A processor with minimum cumulative earliest finish time (EFT) was acted as the object of the first task assignment. The task precedence relationships were satisfied and the total execution time of all tasks was minimized. The experimental results show that the proposed algorithm has the advantage of optimization abilities, simple and feasible, fast convergence, and can be applied to the task scheduling optimization for other heterogeneous and distributed environment.
Debugging in a multi-processor environment

International Nuclear Information System (INIS)

Spann, J.M.

1981-01-01

The Supervisory Control and Diagnostic System (SCDS) for the Mirror Fusion Test Facility (MFTF) consists of nine 32-bit minicomputers arranged in a tightly coupled distributed computer system utilizing a share memory as the data exchange medium. Debugging of more than one program in the multi-processor environment is a difficult process. This paper describes what new tools were developed and how the testing of software is performed in the SCDS for the MFTF project
HEP - A semaphore-synchronized multiprocessor with central control. [Heterogeneous Element Processor

Science.gov (United States)

Gilliland, M. C.; Smith, B. J.; Calvert, W.

1976-01-01

The paper describes the design concept of the Heterogeneous Element Processor (HEP), a system tailored to the special needs of scientific simulation. In order to achieve high-speed computation required by simulation, HEP features a hierarchy of processes executing in parallel on a number of processors, with synchronization being largely accomplished by hardware. A full-empty-reserve scheme of synchronization is realized by zero-one-valued hardware semaphores. A typical system has, besides the control computer and the scheduler, an algebraic module, a memory module, a first-in first-out (FIFO) module, an integrator module, and an I/O module. The architecture of the scheduler and the algebraic module is examined in detail.
Tinuso: A processor architecture for a multi-core hardware simulation platform

DEFF Research Database (Denmark)

Schleuniger, Pascal; Karlsson, Sven

2010-01-01

Multi-core systems have the potential to improve performance, energy and cost properties of embedded systems but also require new design methods and tools to take advantage of the new architectures. Due to the limited accuracy and performance of pure software simulators, we are working on a cycle...... accurate hardware simulation platform. We have developed the Tinuso processor architecture for this platform. Tinuso is a processor architecture optimized for FPGA implementation. The instruction set makes use of predicated instructions and supports C/C++ and assembly language programming. It is designed...... to be easy extendable to maintain the exibility required for the research on multi-core systems. Tinuso contains a co-processor interface to connect to a network interface. This interface allow for communication over an on-chip network. A clock frequency estimation study on a deeply pipelined Tinuso...
Image matrix processor for fast multi-dimensional computations

Science.gov (United States)

Roberson, George P.; Skeate, Michael F.

1996-01-01

An apparatus for multi-dimensional computation which comprises a computation engine, including a plurality of processing modules. The processing modules are configured in parallel and compute respective contributions to a computed multi-dimensional image of respective two dimensional data sets. A high-speed, parallel access storage system is provided which stores the multi-dimensional data sets, and a switching circuit routes the data among the processing modules in the computation engine and the storage system. A data acquisition port receives the two dimensional data sets representing projections through an image, for reconstruction algorithms such as encountered in computerized tomography. The processing modules include a programmable local host, by which they may be configured to execute a plurality of different types of multi-dimensional algorithms. The processing modules thus include an image manipulation processor, which includes a source cache, a target cache, a coefficient table, and control software for executing image transformation routines using data in the source cache and the coefficient table and loading resulting data in the target cache. The local host processor operates to load the source cache with a two dimensional data set, loads the coefficient table, and transfers resulting data out of the target cache to the storage system, or to another destination.
3D Seismic Imaging through Reverse-Time Migration on Homogeneous and Heterogeneous Multi-Core Processors

Directory of Open Access Journals (Sweden)

Mauricio Araya-Polo

2009-01-01

Full Text Available Reverse-Time Migration (RTM is a state-of-the-art technique in seismic acoustic imaging, because of the quality and integrity of the images it provides. Oil and gas companies trust RTM with crucial decisions on multi-million-dollar drilling investments. But RTM requires vastly more computational power than its predecessor techniques, and this has somewhat hindered its practical success. On the other hand, despite multi-core architectures promise to deliver unprecedented computational power, little attention has been devoted to mapping efficiently RTM to multi-cores. In this paper, we present a mapping of the RTM computational kernel to the IBM Cell/B.E. processor that reaches close-to-optimal performance. The kernel proves to be memory-bound and it achieves a 98% utilization of the peak memory bandwidth. Our Cell/B.E. implementation outperforms a traditional processor (PowerPC 970MP in terms of performance (with an 15.0× speedup and energy-efficiency (with a 10.0× increase in the GFlops/W delivered. Also, it is the fastest RTM implementation available to the best of our knowledge. These results increase the practical usability of RTM. Also, the RTM-Cell/B.E. combination proves to be a strong competitor in the seismic arena.
Enhancing Image Processing Performance for PCID in a Heterogeneous Network of Multi-code Processors

Science.gov (United States)

Linderman, R.; Spetka, S.; Fitzgerald, D.; Emeny, S.

The Physically-Constrained Iterative Deconvolution (PCID) image deblurring code is being ported to heterogeneous networks of multi-core systems, including Intel Xeons and IBM Cell Broadband Engines. This paper reports results from experiments using the JAWS supercomputer at MHPCC (60 TFLOPS of dual-dual Xeon nodes linked with Infiniband) and the Cell Cluster at AFRL in Rome, NY. The Cell Cluster has 52 TFLOPS of Playstation 3 (PS3) nodes with IBM Cell Broadband Engine multi-cores and 15 dual-quad Xeon head nodes. The interconnect fabric includes Infiniband, 10 Gigabit Ethernet and 1 Gigabit Ethernet to each of the 336 PS3s. The results compare approaches to parallelizing FFT executions across the Xeons and the Cell's Synergistic Processing Elements (SPEs) for frame-level image processing. The experiments included Intel's Performance Primitives and Math Kernel Library, FFTW3.2, and Carnegie Mellon's SPIRAL. Optimization of FFTs in the PCID code led to a decrease in relative processing time for FFTs. Profiling PCID version 6.2, about one year ago, showed the 13 functions that accounted for the highest percentage of processing were all FFT processing functions. They accounted for over 88% of processing time in one run on Xeons. FFT optimizations led to improvement in the current PCID version 8.0. A recent profile showed that only two of the 19 functions with the highest processing time were FFT processing functions. Timing measurements showed that FFT processing for PCID version 8.0 has been reduced to less than 19% of overall processing time. We are working toward a goal of scaling to 200-400 cores per job (1-2 imagery frames/core). Running a pair of cores on each set of frames reduces latency by implementing parallel FFT processing. Our current results show scaling well out to 100 pairs of cores. These results support the next higher level of parallelism in PCID, where groups of several hundred frames each producing one resolved image are sent to cliques of several
On the effective parallel programming of multi-core processors

NARCIS (Netherlands)

Varbanescu, A.L.

2010-01-01

Multi-core processors are considered now the only feasible alternative to the large single-core processors which have become limited by technological aspects such as power consumption and heat dissipation. However, due to their inherent parallel structure and their diversity, multi-cores are
Exploring Task Mappings on Heterogeneous MPSoCs using a Bias-Elitist Genetic Algorithm

NARCIS (Netherlands)

Quan, W.; Pimentel, A.D.

2014-01-01

Exploration of task mappings plays a crucial role in achieving high performance in heterogeneous multi-processor system-on-chip (MPSoC) platforms. The problem of optimally mapping a set of tasks onto a set of given heterogeneous processors for maximal throughput has been known, in general, to be
Development of a VME multi-processor system for plasma control at the JT-60 Upgrade

International Nuclear Information System (INIS)

Takahashi, M.; Kurihara, K.; Kawamata, Y.; Akasaka, H.; Kimura, T.

1992-01-01

Design and initial operation results are reported of a VME multi-processor system [1] for plasma control at a large fusion device named 'the JT-60 Upgrade' utilizing three 32-bit MC88100 based RISC computers and VME components. Development of the system was stimulated by faster and more accurate computation requirements for the plasma position and current control. The RISC computers operate at 25 MHz along with two cashe memories named MC88200. We newly developed VME bus modules of up/down counter, analog-to-digital converter and clock pulse generator for measuring magnetic field and coil current and for synchronizing the processing in the three RISCs and direct digital controllers (DDCs) of magnet power supplies. We also evaluated that the speed of the data transfer between the VME bus system and the DDCs through CAMAC highways satisfies the above requirements. In the initial operation of the JT-60 upgrade, it has been proved that the VME multi-processor system well controls the plasma position and current with a sampling period of 250 μsec and a delay of 500 μsec. (author)
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

Science.gov (United States)

Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

2008-04-01

This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
Combining on-hardware prototyping and high-level simulation for DSE of multi-ASIP systems

NARCIS (Netherlands)

Meloni, P.; Pomata, S.; Raffo, L.; Piscitelli, R.; Pimentel, A.D.; McAllister, J.; Bhattacharyya, S.

2012-01-01

Modern heterogeneous multi-processor embedded systems very often expose to the designer a large number of degrees of freedom, related to the application partitioning/mapping and to the component- and system-level architecture composition. The number is even larger when the designer targets systems
Application of Advanced Multi-Core Processor Technologies to Oceanographic Research

Science.gov (United States)

2013-09-30

1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Application of Advanced Multi-Core Processor Technologies...STM32 NXP LPC series No Proprietary Microchip PIC32/DSPIC No > 500 mW; < 5 W ARM Cortex TI OMAP TI Sitara Broadcom BCM2835 Varies FPGA...state-of-the-art information processing architectures. OBJECTIVES Next-generation processor architectures (multi-core, multi-threaded) hold the
Evaluation of Selected Resource Allocation and Scheduling Methods in Heterogeneous Many-Core Processors and Graphics Processing Units

Directory of Open Access Journals (Sweden)

Ciznicki Milosz

2014-12-01

Full Text Available Heterogeneous many-core computing resources are increasingly popular among users due to their improved performance over homogeneous systems. Many developers have realized that heterogeneous systems, e.g. a combination of a shared memory multi-core CPU machine with massively parallel Graphics Processing Units (GPUs, can provide significant performance opportunities to a wide range of applications. However, the best overall performance can only be achieved if application tasks are efficiently assigned to different types of processor units in time taking into account their specific resource requirements. Additionally, one should note that available heterogeneous resources have been designed as general purpose units, however, with many built-in features accelerating specific application operations. In other words, the same algorithm or application functionality can be implemented as a different task for CPU or GPU. Nevertheless, from the perspective of various evaluation criteria, e.g. the total execution time or energy consumption, we may observe completely different results. Therefore, as tasks can be scheduled and managed in many alternative ways on both many-core CPUs or GPUs and consequently have a huge impact on the overall computing resources performance, there are needs for new and improved resource management techniques. In this paper we discuss results achieved during experimental performance studies of selected task scheduling methods in heterogeneous computing systems. Additionally, we present a new architecture for resource allocation and task scheduling library which provides a generic application programming interface at the operating system level for improving scheduling polices taking into account a diversity of tasks and heterogeneous computing resources characteristics.
The Fermilab Advanced Computer Program multi-array processor system (ACPMAPS): A site oriented supercomputer for theoretical physics

International Nuclear Information System (INIS)

Nash, T.; Areti, H.; Atac, R.

1988-08-01

The ACP Multi-Array Processor System (ACPMAPS) is a highly cost effective, local memory parallel computer designed for floating point intensive grid based problems. The processing nodes of the system are single board array processors based on the FORTRAN and C programmable Weitek XL chip set. The nodes are connected by a network of very high bandwidth 16 port crossbar switches. The architecture is designed to achieve the highest possible cost effectiveness while maintaining a high level of programmability. The primary application of the machine at Fermilab will be lattice gauge theory. The hardware is supported by a transparent site oriented software system called CANOPY which shields theorist users from the underlying node structure. 4 refs., 2 figs

Quality-Driven Model-Based Design of MultiProcessor Embedded Systems for Highlydemanding Applications

DEFF Research Database (Denmark)

Jozwiak, Lech; Madsen, Jan

2013-01-01

The recent spectacular progress in modern nano-dimension semiconductor technology enabled implementation of a complete complex multi-processor system on a single chip (MPSoC), global networking and mobile wire-less communication, and facilitated a fast progress in these areas. New important...... accessible or distant) objects, installations, machines or devices, or even implanted in human or animal body can serve as examples. However, many of the modern embedded application impose very stringent functional and parametric demands. Moreover, the spectacular advances in microelectronics introduced...
Consensus of heterogeneous multi-agent systems based on sampled data with a small sampling delay

International Nuclear Information System (INIS)

Wang Na; Wu Zhi-Hai; Peng Li

2014-01-01

In this paper, consensus problems of heterogeneous multi-agent systems based on sampled data with a small sampling delay are considered. First, a consensus protocol based on sampled data with a small sampling delay for heterogeneous multi-agent systems is proposed. Then, the algebra graph theory, the matrix method, the stability theory of linear systems, and some other techniques are employed to derive the necessary and sufficient conditions guaranteeing heterogeneous multi-agent systems to asymptotically achieve the stationary consensus. Finally, simulations are performed to demonstrate the correctness of the theoretical results. (interdisciplinary physics and related areas of science and technology)
Benchmarking NWP Kernels on Multi- and Many-core Processors

Science.gov (United States)

Michalakes, J.; Vachharajani, M.

2008-12-01

Increased computing power for weather, climate, and atmospheric science has provided direct benefits for defense, agriculture, the economy, the environment, and public welfare and convenience. Today, very large clusters with many thousands of processors are allowing scientists to move forward with simulations of unprecedented size. But time-critical applications such as real-time forecasting or climate prediction need strong scaling: faster nodes and processors, not more of them. Moreover, the need for good cost- performance has never been greater, both in terms of performance per watt and per dollar. For these reasons, the new generations of multi- and many-core processors being mass produced for commercial IT and "graphical computing" (video games) are being scrutinized for their ability to exploit the abundant fine- grain parallelism in atmospheric models. We present results of our work to date identifying key computational kernels within the dynamics and physics of a large community NWP model, the Weather Research and Forecast (WRF) model. We benchmark and optimize these kernels on several different multi- and many-core processors. The goals are to (1) characterize and model performance of the kernels in terms of computational intensity, data parallelism, memory bandwidth pressure, memory footprint, etc. (2) enumerate and classify effective strategies for coding and optimizing for these new processors, (3) assess difficulties and opportunities for tool or higher-level language support, and (4) establish a continuing set of kernel benchmarks that can be used to measure and compare effectiveness of current and future designs of multi- and many-core processors for weather and climate applications.
NMR-MPar: A Fault-Tolerance Approach for Multi-Core and Many-Core Processors

Directory of Open Access Journals (Sweden)

Vanessa Vargas

2018-03-01

Full Text Available Multi-core and many-core processors are a promising solution to achieve high performance by maintaining a lower power consumption. However, the degree of miniaturization makes them more sensitive to soft-errors. To improve the system reliability, this work proposes a fault-tolerance approach based on redundancy and partitioning principles called N-Modular Redundancy and M-Partitions (NMR-MPar. By combining both principles, this approach allows multi-/many-core processors to perform critical functions in mixed-criticality systems. Benefiting from the capabilities of these devices, NMR-MPar creates different partitions that perform independent functions. For critical functions, it is proposed that N partitions with the same configuration participate of an N-modular redundancy system. In order to validate the approach, a case study is implemented on the KALRAY Multi-Purpose Processing Array (MPPA-256 many-core processor running two parallel benchmark applications. The traveling salesman problem and matrix multiplication applications were selected to test different device’s resources. The effectiveness of NMR-MPar is assessed by software-implemented fault-injection. For evaluation purposes, it is considered that the system is intended to be used in avionics. Results show the improvement of the application reliability by two orders of magnitude when implementing NMR-MPar on the system. Finally, this work opens the possibility to use massive parallelism for dependable applications in embedded systems.
A multi-chip data acquisition system based on a heterogeneous system-on-chip platform

CERN Document Server

Fiergolski, Adrian

2017-01-01

The Control and Readout Inner tracking BOard (CaRIBOu) is a versatile readout system targeting a multitude of detector prototypes. It profits from the heterogeneous platform of the Zynq System-on-Chip (SoC) and integrates in a monolithic device front-end FPGA resources with a back-end software running on a hard-core ARM-based processor. The user-friendly Linux terminal with the pre-installed DAQ software is combined with the efficiency and throughput of a system fully implemented in the FPGA fabric. The paper presents the design of the SoC-based DAQ system and its building blocks. It also shows examples of the achieved functionality for the CLICpix2 readout ASIC.
Climbing Mont Blanc - A Training Site for Energy Efficient Programming on Heterogeneous Multicore Processors

OpenAIRE

Natvig, Lasse; Follan, Torbjørn; Støa, Simen; Magnussen, Sindre; Guirado, Antonio Garcia

2015-01-01

Climbing Mont Blanc (CMB) is an open online judge used for training in energy efficient programming of state-of-the-art heterogeneous multicores. It uses an Odroid-XU3 board from Hardkernel with an Exynos Octa processor and integrated power sensors. This processor is three-way heterogeneous containing 14 different cores of three different types. The board currently accepts C and C++ programs, with support for OpenCL v1.1, OpenMP 4.0 and Pthreads. Programs submitted using the graphical user in...
ARTiS, an Asymmetric Real-Time Scheduler for Linux on Multi-Processor Architectures

OpenAIRE

Piel , Éric; Marquet , Philippe; Soula , Julien; Osuna , Christophe; Dekeyser , Jean-Luc

2005-01-01

The ARTiS system is a real-time extension of the GNU/Linux scheduler dedicated to SMP (Symmetric Multi-Processors) systems. It allows to mix High Performance Computing and real-time. ARTiS exploits the SMP architecture to guarantee the preemption of a processor when the system has to schedule a real-time task. The implementation is available as a modification of the Linux kernel, especially focusing (but not restricted to) IA-64 architecture. The basic idea of ARTiS is to assign a selected se...
A scalable single-chip multi-processor architecture with on-chip RTOS kernel

NARCIS (Netherlands)

Theelen, B.D.; Verschueren, A.C.; Reyes Suarez, V.V.; Stevens, M.P.J.; Nunez, A.

2003-01-01

Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a
Temporal analysis and scheduling of hard real-time radios running on a multi-processor

NARCIS (Netherlands)

Moreira, O.

2012-01-01

On a multi-radio baseband system, multiple independent transceivers must share the resources of a multi-processor, while meeting each its own hard real-time requirements. Not all possible combinations of transceivers are known at compile time, so a solution must be found that either allows for
A Heterogeneous Multi-core Architecture with a Hardware Kernel for Control Systems

DEFF Research Database (Denmark)

Li, Gang; Guan, Wei; Sierszecki, Krzysztof

2012-01-01

Rapid industrialisation has resulted in a demand for improved embedded control systems with features such as predictability, high processing performance and low power consumption. Software kernel implementation on a single processor is becoming more difficult to satisfy those constraints....... This paper presents a multi-core architecture incorporating a hardware kernel on FPGAs, intended for high performance applications in control engineering domain. First, the hardware kernel is investigated on the basis of a component-based real-time kernel HARTEX (Hard Real-Time Executive for Control Systems...
Design of massively parallel hardware multi-processors for highly-demanding embedded applications

NARCIS (Netherlands)

Jozwiak, L.; Jan, Y.

2013-01-01

Many new embedded applications require complex computations to be performed to tight schedules, while at the same time demanding low energy consumption and low cost. For implementation of these highly-demanding applications, highly-optimized application-specific multi-processor system-on-a-chip
FAST: framework for heterogeneous medical image computing and visualization.

Science.gov (United States)

Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank

2015-11-01

Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphic processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the insight toolkit (ITK) and the visualization toolkit (VTK) and show that the presented framework is faster with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.
The communication processor of TUMULT-64

NARCIS (Netherlands)

Smit, Gerardus Johannes Maria; Jansen, P.G.

1988-01-01

Tumult (Twente University MULTi-processor system) is a modular extendible multi-processor system designed and implemented at the Twente University of Technology in co-operation with Oce Nederland B.V. and the Dr. Neher Laboratories (Dutch PTT). Characteristics of the hardware are: MIMD type,
Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors

DEFF Research Database (Denmark)

Liu, Weifeng; Vinter, Brian

2015-01-01

of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over...
Automation of multi-agent control for complex dynamic systems in heterogeneous computational network

Science.gov (United States)

Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan

2017-01-01

The rapid progress of high-performance computing entails new challenges related to solving large scientific problems for various subject domains in a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). The specialists in the field of parallel and distributed computing give the special attention to a scalability of applications for problem solving. An effective management of the scalable application in the heterogeneous distributed computing environment is still a non-trivial issue. Control systems that operate in networks, especially relate to this issue. We propose a new approach to the multi-agent management for the scalable applications in the heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming. We developed a special framework for an automation of the problem solving. Advantages of the proposed approach are demonstrated on the parametric synthesis example of the static linear regulator for complex dynamic systems. Benefits of the scalable application for solving this problem include automation of the multi-agent control for the systems in a parallel mode with various degrees of its detailed elaboration.
CASPER: Embedding Power Estimation and Hardware-Controlled Power Management in a Cycle-Accurate Micro-Architecture Simulation Platform for Many-Core Multi-Threading Heterogeneous Processors

Directory of Open Access Journals (Sweden)

Arun Ravindran

2012-02-01

Full Text Available Despite the promising performance improvement observed in emerging many-core architectures in high performance processors, high power consumption prohibitively affects their use and marketability in the low-energy sectors, such as embedded processors, network processors and application specific instruction processors (ASIPs. While most chip architects design power-efficient processors by finding an optimal power-performance balance in their design, some use sophisticated on-chip autonomous power management units, which dynamically reduce the voltage or frequencies of idle cores and hence extend battery life and reduce operating costs. For large scale designs of many-core processors, a holistic approach integrating both these techniques at different levels of abstraction can potentially achieve maximal power savings. In this paper we present CASPER, a robust instruction trace driven cycle-accurate many-core multi-threading micro-architecture simulation platform where we have incorporated power estimation models of a wide variety of tunable many-core micro-architectural design parameters, thus enabling processor architects to explore a sufficiently large design space and achieve power-efficient designs. Additionally CASPER is designed to accommodate cycle-accurate models of hardware controlled power management units, enabling architects to experiment with and evaluate different autonomous power-saving mechanisms to study the run-time power-performance trade-offs in embedded many-core processors. We have implemented two such techniques in CASPER–Chipwide Dynamic Voltage and Frequency Scaling, and Performance Aware Core-Specific Frequency Scaling, which show average power savings of 35.9% and 26.2% on a baseline 4-core SPARC based architecture respectively. This power saving data accounts for the power consumption of the power management units themselves. The CASPER simulation platform also provides users with complete support of SPARCV9
Multi-processor system for real-time deconvolution and flow estimation in medical ultrasound

DEFF Research Database (Denmark)

Jensen, Jesper Lomborg; Jensen, Jørgen Arendt; Stetson, Paul F.

1996-01-01

of the algorithms. Many of the algorithms can only be properly evaluated in a clinical setting with real-time processing, which generally cannot be done with conventional equipment. This paper therefore presents a multi-processor system capable of performing 1.2 billion floating point operations per second on RF...... filter is used with a second time-reversed recursive estimation step. Here it is necessary to perform about 70 arithmetic operations per RF sample or about 1 billion operations per second for real-time deconvolution. Furthermore, these have to be floating point operations due to the adaptive nature...... interfaced to our previously-developed real-time sampling system that can acquire RF data at a rate of 20 MHz and simultaneously transmit the data at 20 MHz to the processing system via several parallel channels. These two systems can, thus, perform real-time processing of ultrasound data. The advantage...
The LASS hardware processor

International Nuclear Information System (INIS)

Kunz, P.F.

1976-01-01

The problems of data analysis with hardware processors are reviewed and a description is given of a programmable processor. This processor, the 168/E, has been designed for use in the LASS multi-processor system; it has an execution speed comparable to the IBM 370/168 and uses the subset of IBM 370 instructions appropriate to the LASS analysis task. (Auth.)
Automated and Assistive Tools for Accelerated Code migration of Scientific Computing on to Heterogeneous MultiCore Systems

Science.gov (United States)

2017-04-13

AFRL-AFOSR-UK-TR-2017-0029 Automated and Assistive Tools for Accelerated Code migration of Scientific Computing on to Heterogeneous MultiCore Systems ...2012, “ Automated and Assistive Tools for Accelerated Code migration of Scientific Computing on to Heterogeneous MultiCore Systems .” 2. The objective...2012 - 01/25/2015 4. TITLE AND SUBTITLE Automated and Assistive Tools for Accelerated Code migration of Scientific Computing on to Heterogeneous
Parallelising a molecular dynamics algorithm on a multi-processor workstation

Science.gov (United States)

Müller-Plathe, Florian

1990-12-01

The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transfered in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.

Optimal task partition and state-dependent loading in heterogeneous two-element work sharing system

International Nuclear Information System (INIS)

Levitin, Gregory; Xing, Liudong; Ben-Haim, Hanoch; Dai, Yuanshun

2016-01-01

Many real-world systems such as multi-channel data communication, multi-path flow transmission and multi-processor computing systems have work sharing attributes where system elements perform different portions of the same task simultaneously. Motivated by these applications, this paper models a heterogeneous work-sharing system with two non-repairable elements. When one element fails, the other element takes over the uncompleted task of the failed element upon finishing its own part; the load level of the remaining operating element can change at the time of the failure, which further affects its performance, failure behavior and operation cost. Considering these dynamics, mission success probability (MSP), expected mission completion time (EMCT) and expected cost of successful mission (ECSM) are first derived. Further, optimization problems are formulated and solved, which find optimal task partition and element load levels maximizing MSP, minimizing EMCT or minimizing ECSM. Effects of element reliability, performance, operation cost on the optimal solutions are also investigated through examples. Results of this work can facilitate a tradeoff analysis of different mission performance indices for heterogeneous work-sharing systems. - Highlights: • A heterogeneous work-sharing system with two non-repairable elements is considered. • The optimal work distribution and element loading problem is formulated and solved. • Effects of element reliability, performance, operation cost on the optimal solutions are investigated.
Quality-driven model-based design of multi-processor accelerators : an application to LDPC decoders

NARCIS (Netherlands)

Jan, Y.

2012-01-01

The recent spectacular progress in nano-electronic technology has enabled the implementation of very complex multi-processor systems on single chips (MPSoCs). However in parallel, new highly demanding complex embedded applications are emerging, in fields like communication and networking,
C-HEAP : a heterogeneous multi-processor architecture template and scalable and flexible protocol for the design of embedded signal processing systems

NARCIS (Netherlands)

Nieuwland, A.K.; Kang, J.; Gangwal, O.P.; Sethuraman, R.; Busá, N.G.; Goossens, K.G.W.; Peset Llopis, R.; Lippens, P.E.R.

2002-01-01

The key issue in the design of Systems-on-a-Chip (SoC) is to trade-off efficiency against flexibility, and time to market versus cost. Current deep submicron processing technologies enable integration of multiple software programmable processors (e.g., CPUs, DSPs) and dedicated hardware components
Heterogeneous Multi-Robot System for Mapping Environmental Variables of Greenhouses.

Science.gov (United States)

Roldán, Juan Jesús; Garcia-Aunon, Pablo; Garzón, Mario; de León, Jorge; Del Cerro, Jaime; Barrientos, Antonio

2016-07-01

The productivity of greenhouses highly depends on the environmental conditions of crops, such as temperature and humidity. The control and monitoring might need large sensor networks, and as a consequence, mobile sensory systems might be a more suitable solution. This paper describes the application of a heterogeneous robot team to monitor environmental variables of greenhouses. The multi-robot system includes both ground and aerial vehicles, looking to provide flexibility and improve performance. The multi-robot sensory system measures the temperature, humidity, luminosity and carbon dioxide concentration in the ground and at different heights. Nevertheless, these measurements can be complemented with other ones (e.g., the concentration of various gases or images of crops) without a considerable effort. Additionally, this work addresses some relevant challenges of multi-robot sensory systems, such as the mission planning and task allocation, the guidance, navigation and control of robots in greenhouses and the coordination among ground and aerial vehicles. This work has an eminently practical approach, and therefore, the system has been extensively tested both in simulations and field experiments.
Heterogeneous Multi-Robot System for Mapping Environmental Variables of Greenhouses

Directory of Open Access Journals (Sweden)

Juan Jesús Roldán

2016-07-01

Full Text Available The productivity of greenhouses highly depends on the environmental conditions of crops, such as temperature and humidity. The control and monitoring might need large sensor networks, and as a consequence, mobile sensory systems might be a more suitable solution. This paper describes the application of a heterogeneous robot team to monitor environmental variables of greenhouses. The multi-robot system includes both ground and aerial vehicles, looking to provide flexibility and improve performance. The multi-robot sensory system measures the temperature, humidity, luminosity and carbon dioxide concentration in the ground and at different heights. Nevertheless, these measurements can be complemented with other ones (e.g., the concentration of various gases or images of crops without a considerable effort. Additionally, this work addresses some relevant challenges of multi-robot sensory systems, such as the mission planning and task allocation, the guidance, navigation and control of robots in greenhouses and the coordination among ground and aerial vehicles. This work has an eminently practical approach, and therefore, the system has been extensively tested both in simulations and field experiments.
Performance analysis of general purpose and digital signal processor kernels for heterogeneous systems-on-chip

Directory of Open Access Journals (Sweden)

T. von Sydow

2003-01-01

Full Text Available Various reasons like technology progress, flexibility demands, shortened product cycle time and shortened time to market have brought up the possibility and necessity to integrate different architecture blocks on one heterogeneous System-on-Chip (SoC. Architecture blocks like programmable processor cores (DSP- and GPP-kernels, embedded FPGAs as well as dedicated macros will be integral parts of such a SoC. Especially programmable architecture blocks and associated optimization techniques are discussed in this contribution. Design space exploration and thus the choice which architecture blocks should be integrated in a SoC is a challenging task. Crucial to this exploration is the evaluation of the application domain characteristics and the costs caused by individual architecture blocks integrated on a SoC. An ATE-cost function has been applied to examine the performance of the aforementioned programmable architecture blocks. Therefore, representative discrete devices have been analyzed. Furthermore, several architecture dependent optimization steps and their effects on the cost ratios are presented.
Efficient gate set tomography on a multi-qubit superconducting processor

Science.gov (United States)

Nielsen, Erik; Rudinger, Kenneth; Blume-Kohout, Robin; Bestwick, Andrew; Bloom, Benjamin; Block, Maxwell; Caldwell, Shane; Curtis, Michael; Hudson, Alex; Orgiazzi, Jean-Luc; Papageorge, Alexander; Polloreno, Anthony; Reagor, Matt; Rubin, Nicholas; Scheer, Michael; Selvanayagam, Michael; Sete, Eyob; Sinclair, Rodney; Smith, Robert; Vahidpour, Mehrnoosh; Villiers, Marius; Zeng, William; Rigetti, Chad

Quantum information processors with five or more qubits are becoming common. Complete, predictive characterization of such devices e.g. via any form of tomography, including gate set tomography appears impossible because the parameter space is intractably large. Randomized benchmarking scales well, but cannot predict device behavior or diagnose failure modes. We introduce a new type of gate set tomography that uses an efficient ansatz to model physically plausible errors, but scales polynomially with the number of qubits. We will describe the theory behind this multi-qubit tomography and present experimental results from using it to characterize a multi-qubit processor made by Rigetti Quantum Computing. Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidary of Lockheed Martin Corporation, for the US Department of Energy's NNSA under contract DE-AC04-94AL85000.
Sistem Komunikasi Modul Sensor Jamak Berbasiskan Mikrokontroler Menggunakan Serial Rs-485 Mode Multi Processor Communication (Mpc

Directory of Open Access Journals (Sweden)

Suar wibawa

2016-08-01

Full Text Available Multi-sensor communication system uses RS-485 standard communication connecting each microcontroller-based data processing unit to form BUS topology network. The advantages of this communication system are: connectivity (easy to connecting devices on a network, scalability (flexibility to expand the network, more resistant to noise, and easier maintenance. The System is built using Master-Slave communication approach model. This system need to filter every data packet on communication channel because every device that connect in this network can hear every data packet across this network. Multi Processor Communication (MPC model is applied to reduce processor’s burden in inspecting every data packet, so the processor that work in slave side only need to inspect the message for itself without inspecting every data packet across the communication chanel.
Programming heterogeneous MPSoCs tool flows to close the software productivity gap

CERN Document Server

Castrillón Mazo, Jerónimo

2014-01-01

This book provides embedded software developers with techniques for programmingheterogeneous Multi-Processor Systems-on-Chip (MPSoCs), capable of executing multiple applications simultaneously. It describes a set of algorithms and methodologies to narrow the software productivity gap, as well as an in-depth description of the underlying problems and challenges of today’s programming practices. The authors present four different tool flows: A parallelism extraction flow for applications writtenusing the C programming language, a mapping and scheduling flow for parallel applications, a special mapping flow for baseband applications in the context of Software Defined Radio (SDR) and a final flow for analyzing multiple applications at design time. The tool flows are evaluated on Virtual Platforms (VPs), which mimic different characteristics of state-of-the-art heterogeneous MPSoCs. • Provides a novel set of algorithms and methodologies for programming heterogeneous Multi-Processor Systems-on-Chip (MPSoCs)...
A Scalable Multicore Architecture With Heterogeneous Memory Structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs).

Science.gov (United States)

Moradi, Saber; Qiao, Ning; Stefanini, Fabio; Indiveri, Giacomo

2018-02-01

Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here, we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multicore neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.
A longitudinal multi-bunch feedback system using parallel digital signal processors

International Nuclear Information System (INIS)

Sapozhnikov, L.; Fox, J.D.; Olsen, J.J.; Oxoby, G.; Linscott, I.; Drago, A.; Serio, M.

1994-01-01

A programmable longitudinal feedback system based on four AT ampersand T 1610 digital signal processors has been developed as a component of the PEP-II R ampersand D program. This longitudinal quick prototype is a proof of concept for the PEP-II system and implements full-speed bunch-by-bunch signal processing for storage rings with bunch spacings of 4 ns. The design incorporates a phase-detector-based front end that digitizes the oscillation phases of bunches at the 250 MHz crossing rate, four programmable signal processors that compute correction signals, and a 250-MHz hold buffer/kicker driver stage that applies correction signals back on the beam. The design implements a general-purpose, table-driven downsampler that allows the system to be operated at several accelerator facilities. The hardware architecture of the signal processing is described, and the software algorithms used in the feedback signal computation are discussed. The system configuration used for tests at the LBL Advanced Light Source is presented
Automotive Fuel Processor Development and Demonstration with Fuel Cell Systems

Energy Technology Data Exchange (ETDEWEB)

Nuvera Fuel Cells

2005-04-15

The potential for fuel cell systems to improve energy efficiency and reduce emissions over conventional power systems has generated significant interest in fuel cell technologies. While fuel cells are being investigated for use in many applications such as stationary power generation and small portable devices, transportation applications present some unique challenges for fuel cell technology. Due to their lower operating temperature and non-brittle materials, most transportation work is focusing on fuel cells using proton exchange membrane (PEM) technology. Since PEM fuel cells are fueled by hydrogen, major obstacles to their widespread use are the lack of an available hydrogen fueling infrastructure and hydrogen's relatively low energy storage density, which leads to a much lower driving range than conventional vehicles. One potential solution to the hydrogen infrastructure and storage density issues is to convert a conventional fuel such as gasoline into hydrogen onboard the vehicle using a fuel processor. Figure 2 shows that gasoline stores roughly 7 times more energy per volume than pressurized hydrogen gas at 700 bar and 4 times more than liquid hydrogen. If integrated properly, the fuel processor/fuel cell system would also be more efficient than traditional engines and would give a fuel economy benefit while hydrogen storage and distribution issues are being investigated. Widespread implementation of fuel processor/fuel cell systems requires improvements in several aspects of the technology, including size, startup time, transient response time, and cost. In addition, the ability to operate on a number of hydrocarbon fuels that are available through the existing infrastructure is a key enabler for commercializing these systems. In this program, Nuvera Fuel Cells collaborated with the Department of Energy (DOE) to develop efficient, low-emission, multi-fuel processors for transportation applications. Nuvera's focus was on (1) developing fuel
Efficient Programming for Multicore Processor Heterogeneity: OpenMP versus OmpSs

OpenAIRE

Butko , Anastasiia; Bruguier , Florent; Gamatié , Abdoulaye; Sassatelli , Gilles

2017-01-01

International audience; ARM single-ISA heterogeneous multicore processors combine high-performance big cores with power-efficient small cores. They aim at achieving a suitable balance between performance and energy. How- ever, a main challenge is to program such architectures so as to efficiently exploit their features. In this paper, we study the impact on performance and energy trade-offs of single-ISA architecture according to OpenMP 3.0 and the OmpSs programming models. We consider differ...
System, methods and apparatus for program optimization for multi-threaded processor architectures

Science.gov (United States)

Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E

2015-01-06

Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Heterogeneous System Architectures from APUs to discrete GPUs

CERN Multimedia

CERN. Geneva

2013-01-01

We will present the Heterogeneous Systems Architectures that new AMD processors are bringing with the new GCN based GPUs and the new APUs. We will show how together they represent a huge step forward for programming flexibility and performance efficiently for Compute.
Highway traffic simulation on multi-processor computers

Energy Technology Data Exchange (ETDEWEB)

Hanebutte, U.R.; Doss, E.; Tentner, A.M.

1997-04-01

A computer model has been developed to simulate highway traffic for various degrees of automation with a high level of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway traffic system and allows for the use of Intelligent Transportation System (ITS) technologies such as an Automated Intelligent Cruise Control (AICC). The structure of the computer model facilitates the use of parallel computers for the highway traffic simulation, since domain decomposition techniques can be applied in a straight forward fashion. In this model, the highway system (i.e. a network of road links) is divided into multiple regions; each region is controlled by a separate link manager residing on an individual processor. A graphical user interface augments the computer model kv allowing for real-time interactive simulation control and interaction with each individual vehicle and road side infrastructure element on each link. Average speed and traffic volume data is collected at user-specified loop detector locations. Further, as a measure of safety the so- called Time To Collision (TTC) parameter is being recorded.
Simulation of Particulate Flows Multi-Processor Machines with Distributed Memory

Energy Technology Data Exchange (ETDEWEB)

Uhlmann, M.

2004-07-01

We presented a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase used the domain decomposition technique over a Cartesian processor grid. The solution of the Helmholtz problem is approximately factorized an relies upon apparel tri-diagonal solver the Poisson problem is solved by means of a parallel multi-grid technique similar to MUDPACK. for the solid phase we employ a master-slaves technique where one processor handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position over laps the boundary of a sub-domain. the parallel efficiency for some preliminary computations is presented. (Author) 9 refs.
DiFX: A software correlator for very long baseline interferometry using multi-processor computing environments

OpenAIRE

Deller, A. T.; Tingay, S. J.; Bailes, M.; West, C.

2007-01-01

We describe the development of an FX style correlator for Very Long Baseline Interferometry (VLBI), implemented in software and intended to run in multi-processor computing environments, such as large clusters of commodity machines (Beowulf clusters) or computers specifically designed for high performance computing, such as multi-processor shared-memory machines. We outline the scientific and practical benefits for VLBI correlation, these chiefly being due to the inherent flexibility of softw...
Scheduling Driven Partitioning of Heterogeneous Embedded Systems

DEFF Research Database (Denmark)

Pop, Paul; Eles, Petru; Peng, Zebo

1998-01-01

In this paper we present an algorithm for system level hardware/software partitioning of heterogeneous embedded systems. The system is represented as an abstract graph which captures both data-flow and the flow of control. Given an architecture consisting of several processors, ASICs and shared...... busses, our partitioning algorithm finds the partitioning with the smallest hardware cost and is able to predict and guarantee the performance of the system in terms of worst case delay....
SPORT: An Algorithm for Divisible Load Scheduling with Result Collection on Heterogeneous Systems

Science.gov (United States)

Ghatpande, Abhay; Nakazato, Hidenori; Beaumont, Olivier; Watanabe, Hiroshi

Divisible Load Theory (DLT) is an established mathematical framework to study Divisible Load Scheduling (DLS). However, traditional DLT does not address the scheduling of results back to source (i. e., result collection), nor does it comprehensively deal with system heterogeneity. In this paper, the DLSRCHETS (DLS with Result Collection on HET-erogeneous Systems) problem is addressed. The few papers to date that have dealt with DLSRCHETS, proposed simplistic LIFO (Last In, First Out) and FIFO (First In, First Out) type of schedules as solutions to DLSRCHETS. In this paper, a new polynomial time heuristic algorithm, SPORT (System Parameters based Optimized Result Transfer), is proposed as a solution to the DLSRCHETS problem. With the help of simulations, it is proved that the performance of SPORT is significantly better than existing algorithms. The other major contributions of this paper include, for the first time ever, (a) the derivation of the condition to identify the presence of idle time in a FIFO schedule for two processors, (b) the identification of the limiting condition for the optimality of FIFO and LIFO schedules for two processors, and (c) the introduction of the concept of equivalent processor in DLS for heterogeneous systems with result collection.

Parallelization of applications for networks with homogeneous and heterogeneous processors

International Nuclear Information System (INIS)

Colombet, L.

1994-01-01

The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validates the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs
Multi-processor system for real-time flow estimation in medical ultrasound imaging

DEFF Research Database (Denmark)

Stetson, Paul F.; Jensen, Jesper Lomborg; Antonius, Peter

1997-01-01

the processed data. The generous bandwidth of the links makes it easy to balance the computational load among the processors.In order to manage the shared system memory and to make use of the parallel processing capabilities of the system, a real-time multitasking kernel has been developed. The kernel uses...
Processors and systems (picture processing)

Energy Technology Data Exchange (ETDEWEB)

Gemmar, P

1983-01-01

Automatic picture processing requires high performance computers and high transmission capacities in the processor units. The author examines the possibilities of operating processors in parallel in order to accelerate the processing of pictures. He therefore discusses a number of available processors and systems for picture processing and illustrates their capacities for special types of picture processing. He stresses the fact that the amount of storage required for picture processing is exceptionally high. The author concludes that it is as yet difficult to decide whether very large groups of simple processors or highly complex multiprocessor systems will provide the best solution. Both methods will be aided by the development of VLSI. New solutions have already been offered (systolic arrays and 3-d processing structures) but they also are subject to losses caused by inherently parallel algorithms. Greater efforts must be made to produce suitable software for multiprocessor systems. Some possibilities for future picture processing systems are discussed. 33 references.
Composable processor virtualization for embedded systems

NARCIS (Netherlands)

Molnos, A.M.; Milutinovic, A.; She, D.; Goossens, K.G.W.

2010-01-01

Processor virtualization divides a physical processor's time among a set of virual machines, enabling efficient hardware utilization, application security and allowing co-existence of different operating systems on the same processor. Through initially intended for the server domain, virtualization
Synchronous message-based communication for distributed heterogeneous systems

International Nuclear Information System (INIS)

Wilkinson, N.; Dohan, D.

1992-01-01

The use of a synchronous, message-based real-time operating system (Unison) as the basis of transparent interprocess and inter-processor communication over VME-bus is described. The implementation of a synchronous, message-based protocol for network communication between heterogeneous systems is discussed. In particular, the design and implementation of a message-based session layer over a virtual circuit transport layer protocol using UDP/IP is described. Inter-process communication is achieved via a message-based semantic which is portable by virtue of its ease of implementation in other operating system environments. Protocol performance for network communication among heterogeneous architecture is presented, including VMS, Unix, Mach and Unison. (author)
Pruning techniques for multi-objective system-level design space exploration

NARCIS (Netherlands)

Piscitelli, R.

2014-01-01

System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded system architectures. During system-level DSE, system parameters like, e.g., the number and type of processors, the type and size of
Consensus pursuit of heterogeneous multi-agent systems under a directed acyclic graph

Science.gov (United States)

Yan, Jing; Guan, Xin-Ping; Luo, Xiao-Yuan

2011-04-01

This paper is concerned with the cooperative target pursuit problem by multiple agents based on directed acyclic graph. The target appears at a random location and moves only when sensed by the agents, and agents will pursue the target once they detect its existence. Since the ability of each agent may be different, we consider the heterogeneous multi-agent systems. According to the topology of the multi-agent systems, a novel consensus-based control law is proposed, where the target and agents are modeled as a leader and followers, respectively. Based on Mason's rule and signal flow graph analysis, the convergence conditions are provided to show that the agents can catch the target in a finite time. Finally, simulation studies are provided to verify the effectiveness of the proposed approach.
EXPANDA-75: one-dimensional diffusion code for multi-region plate lattice heterogeneous system

International Nuclear Information System (INIS)

Kikuchi, Yasuyuki; Katsuragi, Satoru; Suzuki, Tomoo; Ogitsu, Makoto.

1975-08-01

An advanced treatment has been developed for analyzing a multi-region plate lattice heterogeneous system using the coarse group constants set provided for a homogeneous system. The essential points of this treatment are modification of effective admixture cross sections and improvement of effective elastic removal cross sections. By this treatment the heterogeneity effects for flux distributions and effective cross sections in the unit cell can be reproduced accurately in comparison with the ultra fine group treatment which consumes huge amounts of computing time. Based on the present treatment and using the JAERI-Fast set, a one-dimensional diffusion code, EXPANDA-75, was developed for extensive use for analyses of fast critical experiments. The user's guide is also presented in this report. (auth.)
Distributed Circumnavigation Control with Dynamic Spacings for a Heterogeneous Multi-robot System

OpenAIRE

Yao, Weijia; Luo, Sha; Lu, Huimin; Xiao, Junhao

2018-01-01

Circumnavigation control is useful in real-world applications such as entrapping a hostile target. In this paper, we consider a heterogeneous multi-robot system where robots have different physical properties, such as maximum movement speeds. Instead of equal-spacings, dynamic spacings according to robots' properties, which are termed utilities in this paper, will be more desirable in a scenario such as target entrapment. A distributed circumnavigation control algorithm based on utilities is ...
Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

Directory of Open Access Journals (Sweden)

Ausif Mahmood

1996-01-01

Full Text Available The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
Power-Energy Simulation for Multi-Core Processors in Bench-marking

Directory of Open Access Journals (Sweden)

Mona A. Abou-Of

2017-01-01

Full Text Available At Microarchitectural level, multi-core processor, as a complex System on Chip, has sophisticated on-chip components including cores, shared caches, interconnects and system controllers such as memory and ethernet controllers. At technological level, architects should consider the device types forecast in the International Technology Roadmap for Semiconductors (ITRS. Energy simulation enables architects to study two important metrics simultaneously. Timing is a key element of the CPU performance that imposes constraints on the CPU target clock frequency. Power and the resulting heat impose more severe design constraints, such as core clustering, while semiconductor industry is providing more transistors in the die area in pace with Moore’s law. Energy simulators provide a solution for such serious challenge. Energy is modelled either by combining performance benchmarking tool with a power simulator or by an integrated framework of both performance simulator and power proﬁling system. This article presents and asses trade-oﬀs between diﬀerent architectures using four cores battery-powered mobile systems by running a custom-made and a standard benchmark tools. The experimental results assure the Energy/ Frequency convexity rule over a range of frequency settings on diﬀerent number of enabled cores. The reported results show that increasing the number of cores has a great eﬀect on increasing the power consumption. However, a minimum energy dissipation will occur at a lower frequency which reduces the power consumption. Despite that, increasing the number of cores will also increase the eﬀective cores value which will reﬂect a better processor performance.
Multi spectral scaling data acquisition system

International Nuclear Information System (INIS)

Behere, Anita; Patil, R.D.; Ghodgaonkar, M.D.; Gopalakrishnan, K.R.

1997-01-01

In nuclear spectroscopy applications, it is often desired to acquire data at high rate with high resolution. With the availability of low cost computers, it is possible to make a powerful data acquisition system with minimum hardware and software development, by designing a PC plug-in acquisition board. But in using the PC processor for data acquisition, the PC can not be used as a multitasking node. Keeping this in view, PC plug-in acquisition boards with on-board processor find tremendous applications. Transputer based data acquisition board has been designed which can be configured as a high count rate pulse height MCA or as a Multi Spectral Scaler. Multi Spectral Scaling (MSS) is a new technique, in which multiple spectra are acquired in small time frames and are then analyzed. This paper describes the details of this multi spectral scaling data acquisition system. 2 figs
Heterogeneous computing with OpenCL 2.0

CERN Document Server

Kaeli, David R; Schaa, Dana; Zhang, Dong Ping

2015-01-01

Heterogeneous Computing with OpenCL 2.0 teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs). This fully-revised edition includes the latest enhancements in OpenCL 2.0 including: Shared virtual memory to increase programming flexibility and reduce data transfers that consume resources Dynamic parallelism which reduces processor load and avoids bottlenecks Improved imaging support and integration with OpenGL Designed to work on multiple platfor
Upgrade of the PreProcessor System for the ATLAS Level-1 Calorimeter Trigger

CERN Document Server

Khomich, A

2010-01-01

The ATLAS Level-1 Calorimeter Trigger is a hardware-based pipelined system designed to identify high-pT objects in the ATLAS calorimeters within a fixed latency of 2.5\\,us. It consists of three subsystems: the PreProcessor which conditions and digitizes analogue signals and two digital processors. The majority of the PreProcessor's tasks are performed on a dense Multi-Chip Module(MCM) consisting of FADCs, a time-adjustment and digital processing ASICs, and LVDS serialisers designed and implemented in ten years old technologies. An MCM substitute, based on today's components (dual channel FADCs and FPGA), is being developed to profit from state-of-the-art electronics and to enhance the flexibility of the digital processing. Development and first test results are presented.
Upgrade of the PreProcessor System for the ATLAS LVL1 Calorimeter Trigger

CERN Document Server

Khomich, A; The ATLAS collaboration

2010-01-01

The ATLAS Level-1 Calorimeter Trigger is a hardware-based pipelined system designed to identify high-pT objects in the ATLAS calorimeters within a fixed latency of 2.5us. It consists of three subsystems: the PreProcessor which conditions and digitizes analogue signals and two digital processors. The majority of the PreProcessor's tasks are performed on a dense Multi-Chip Module(MCM) consisting of FADCs, a time-adjustment and digital processing ASICs, and LVDS serializers designed and implemented in ten years old technologies. An MCM substitute, based on today's components (dual channel FADCs and FPGA), is being developed to profit from state-of-the-art electronics and to enhance the flexibility of the digital processing. Development and first test results are presented.
The Heidelberg POLYP - a flexible and fault-tolerant poly-processor

International Nuclear Information System (INIS)

Maenner, R.; Deluigi, B.

1981-01-01

The Heidelberg poly-processor system POLYP is described. It is intended to be used in nuclear physics for reprocessing of experimental data, in high energy physics as second-stage trigger processor, and generally in other applications requiring high-computing power. The POLYP system consists of any number of I/O-processors, processor modules (eventually of different types), global memory segments, and a host processor. All modules (up to several hundred) are connected by a multiple common-data-bus system; all processors, additionally, by a multiple sync bus system for processor/task-scheduling. All hard- and software is designed to be decentralized and free of bottle-necks. Most hardware-faults like single-bit errors in memory or multi-bit errors during transfers are automatically corrected. Defective modules, buses, etc., can be removed with only a graceful degradation of the system-throughput. (orig.)
Neurovision processor for designing intelligent sensors

Science.gov (United States)

Gupta, Madan M.; Knopf, George K.

1992-03-01

A programmable multi-task neuro-vision processor, called the Positive-Negative (PN) neural processor, is proposed as a plausible hardware mechanism for constructing robust multi-task vision sensors. The computational operations performed by the PN neural processor are loosely based on the neural activity fields exhibited by certain nervous tissue layers situated in the brain. The neuro-vision processor can be programmed to generate diverse dynamic behavior that may be used for spatio-temporal stabilization (STS), short-term visual memory (STVM), spatio-temporal filtering (STF) and pulse frequency modulation (PFM). A multi- functional vision sensor that performs a variety of information processing operations on time- varying two-dimensional sensory images can be constructed from a parallel and hierarchical structure of numerous individually programmed PN neural processors.
A Reconfigurable Readout Integrated Circuit for Heterogeneous Display-Based Multi-Sensor Systems

Directory of Open Access Journals (Sweden)

Kyeonghwan Park

2017-04-01

Full Text Available This paper presents a reconfigurable multi-sensor interface and its readout integrated circuit (ROIC for display-based multi-sensor systems, which builds up multi-sensor functions by utilizing touch screen panels. In addition to inherent touch detection, physiological and environmental sensor interfaces are incorporated. The reconfigurable feature is effectively implemented by proposing two basis readout topologies of amplifier-based and oscillator-based circuits. For noise-immune design against various noises from inherent human-touch operations, an alternate-sampling error-correction scheme is proposed and integrated inside the ROIC, achieving a 12-bit resolution of successive approximation register (SAR of analog-to-digital conversion without additional calibrations. A ROIC prototype that includes the whole proposed functions and data converters was fabricated in a 0.18 μm complementary metal oxide semiconductor (CMOS process, and its feasibility was experimentally verified to support multiple heterogeneous sensing functions of touch, electrocardiogram, body impedance, and environmental sensors.
Multi-processor developments in the United States for future high energy physics experiments and accelerators

International Nuclear Information System (INIS)

Gaines, I.

1988-03-01

The use of multi-processors for analysis and high-level triggering in High Energy Physics experiments, pioneered by the early emulator systems, has reached maturity, in particular with the multiple microprocessor systems in use at Fermilab. It is widely acknowledged that such systems will fulfill the major portion of the computing needs of future large experiments. Recent developments at Fermilab's Advanced Computer Program will make such systems even more powerful, cost-effective, and easier to use than they are at present. The next generation of microprocessors, already available, will provide CPU power of about one VAX 780 equivalent/$300, while supporting most VMS FORTRAN extensions and large (>8MB) amounts of memory. Low cost high density mass storage devices (based on video tape cartridge technology) will allow parallel I/O to remove potential I/O bottlenecks in systems of over 1000 VAX equipment processors. New interconnection schemes and system software will allow more flexible topologies and extremely high data bandwidth, especially for on-line systems. This talk will summarize the work at the Advanced Computer Program and the rest of the US in this field. 3 refs., 4 figs
Optimization of multi-phase compressible lattice Boltzmann codes on massively parallel multi-core systems

NARCIS (Netherlands)

Biferale, L.; Mantovani, F.; Pivanti, M.; Pozzati, F.; Sbragaglia, M.; Schifano, S.F.; Toschi, F.; Tripiccione, R.

2011-01-01

We develop a Lattice Boltzmann code for computational fluid-dynamics and optimize it for massively parallel systems based on multi-core processors. Our code describes 2D multi-phase compressible flows. We analyze the performance bottlenecks that we find as we gradually expose a larger fraction of

A lock circuit for a multi-core processor

DEFF Research Database (Denmark)

2015-01-01

An integrated circuit comprising a multiple processor cores and a lock circuit that comprises a queue register with respective bits set or reset via respective, connections dedicated to respective processor cores, whereby the queue register identifies those among the multiple processor cores...... that are enqueued in the queue register. Furthermore, the integrated circuit comprises a current register and a selector circuit configured to select a processor core and identify that processor core by a value in the current register. A selected processor core is a prioritized processor core among the cores...... configured with an integrated circuit; and a silicon die configured with an integrated circuit....
Keystone Business Models for Network Security Processors

OpenAIRE

Arthur Low; Steven Muegge

2013-01-01

Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor...
Application of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the OECD/NRC BWR turbine trip benchmark and its performance on multi-processor computers

International Nuclear Information System (INIS)

Langenbuch, S.; Schmidt, K.D.; Velkov, K.

2003-01-01

The OECD/NRC BWR Turbine Trip (TT) Benchmark is investigated to perform code-to-code comparison of coupled codes including a comparison to measured data which are available from turbine trip experiments at Peach Bottom 2. This Benchmark problem for a BWR over-pressure transient represents a challenging application of coupled codes which integrate 3-dimensional neutron kinetics into thermal-hydraulic system codes for best-estimate simulation of plant transients. This transient represents a typical application of coupled codes which are usually performed on powerful workstations using a single CPU. Nowadays, the availability of multi-CPUs is much easier. Indeed, powerful workstations already provide 4 to 8 CPU, computer centers give access to multi-processor systems with numbers of CPUs in the order of 16 up to several 100. Therefore, the performance of the coupled code Athlet-Quabox/Cubbox on multi-processor systems is studied. Different cases of application lead to changing requirements of the code efficiency, because the amount of computer time spent in different parts of the code is varying. This paper presents main results of the coupled code Athlet-Quabox/Cubbox for the extreme scenarios of the BWR TT Benchmark together with evaluations of the code performance on multi-processor computers. (authors)
Runtime adaptive multi-processor system-on-chip: RAMPSoC

OpenAIRE

Göhringer, D.; Hübner, M.; Schatz, V.; Becker, J.

2008-01-01

Current trends in high performance computing show, that the usage of multiprocessor systems on chip are one approach for the requirements of computing intensive applications. The multiprocessor system on chip (MPSoC) approaches often provide a static and homogeneous infrastructure of networked microprocessor on the chip die. A novel idea in this research area is to introduce the dynamic adaptivity of reconfigurable hardware in order to provide a flexible heterogeneous set of processing elemen...
Simulation of Particulate Flows on Multi-Processor Machines with Distributed Memory

International Nuclear Information System (INIS)

Uhlmann, M.

2004-01-01

We present a method for the parallelization of an immersed boundary algorithm for particulate flows using the MPI standard of communication. The treatment of the fluid phase uses the domain decomposition technique over a Cartesian processor grid. The solution of the Hehnholtz problem is approximately factorized an relies upon apparel tri-diagonal solver; the Poisson problem is solved by means of a parallel multi-grid technique simulator MUDPACK. For the solid phase we employ a master-slaves technique where one process or handles all the particles contained in its Eulerian fluid sub-domain and zero or more neighbor processors collaborate in the computation of particle-related quantities whenever a particle position overlaps the boundary of a sub- do mam.The parallel efficiency for some preliminary computations is presented. (Author) 9 refs
Interleaved Subtask Scheduling on Multi Processor SOC

NARCIS (Netherlands)

Zhe, M.

2006-01-01

The ever-progressing semiconductor processing technique has integrated more and more embedded processors on a single system-on-achip (SoC). With such powerful SoC platforms, and also due to the stringent time-to-market deadlines, many functionalities which used to be implemented in ASICs are
Special processor for in-core control systems

International Nuclear Information System (INIS)

Golovanov, M.N.; Duma, V.R.; Levin, G.L.; Mel'nikov, A.V.; Polikanin, A.V.; Filatov, V.P.

1978-01-01

The BUTs-20 special processor is discussed, designed to control the units of the in-core control equipment which are incorporated into the VECTOR communication channel, and to provide preliminary data processing prior to computer calculations. A set of instructions and flowsheet of the processor, organization of its communication with memories and other units of the system are given. The processor components: a control unit and an arithmetic logical unit are discussed. It is noted that the special processor permits more effective utilization of the computer time
High Fidelity, Numerical Investigation of Cross Talk in a Multi-Qubit Xmon Processor

Science.gov (United States)

Najafi-Yazdi, Alireza; Kelly, Julian; Martinis, John

Unwanted electromagnetic interference between qubits, transmission lines, flux lines and other elements of a superconducting quantum processor poses a challenge in engineering such devices. This problem is exacerbated with scaling up the number of qubits. High fidelity, massively parallel computational toolkits, which can simulate the 3D electromagnetic environment and all features of the device, are instrumental in addressing this challenge. In this work, we numerically investigated the crosstalk between various elements of a multi-qubit quantum processor designed and tested by the Google team. The processor consists of 6 superconducting Xmon qubits with flux lines and gatelines. The device also consists of a Purcell filter for readout. The simulations are carried out with a high fidelity, massively parallel EM solver. We will present our findings regarding the sources of crosstalk in the device, as well as numerical model setup, and a comparison with available experimental data.
Interactive high-resolution isosurface ray casting on multicore processors.

Science.gov (United States)

Wang, Qin; JaJa, Joseph

2008-01-01

We present a new method for the interactive rendering of isosurfaces using ray casting on multi-core processors. This method consists of a combination of an object-order traversal that coarsely identifies possible candidate 3D data blocks for each small set of contiguous pixels, and an isosurface ray casting strategy tailored for the resulting limited-size lists of candidate 3D data blocks. While static screen partitioning is widely used in the literature, our scheme performs dynamic allocation of groups of ray casting tasks to ensure almost equal loads among the different threads running on multi-cores while maintaining spatial locality. We also make careful use of memory management environment commonly present in multi-core processors. We test our system on a two-processor Clovertown platform, each consisting of a Quad-Core 1.86-GHz Intel Xeon Processor, for a number of widely different benchmarks. The detailed experimental results show that our system is efficient and scalable, and achieves high cache performance and excellent load balancing, resulting in an overall performance that is superior to any of the previous algorithms. In fact, we achieve an interactive isosurface rendering on a 1024(2) screen for all the datasets tested up to the maximum size of the main memory of our platform.
Scalable High-Performance Parallel Design for Network Intrusion Detection Systems on Many-Core Processors

OpenAIRE

Jiang, Hayang; Xie, Gaogang; Salamatian, Kavé; Mathy, Laurent

2013-01-01

Network Intrusion Detection Systems (NIDSes) face significant challenges coming from the relentless network link speed growth and increasing complexity of threats. Both hardware accelerated and parallel software-based NIDS solutions, based on commodity multi-core and GPU processors, have been proposed to overcome these challenges. Network Intrusion Detection Systems (NIDSes) face significant challenges coming from the relentless network link speed growth and increasing complexity of threats. ...
A heterogeneous multi-core platform for low power signal processing in systems-on-chip

DEFF Research Database (Denmark)

Paker, Ozgun; Sparsø, Jens; Haandbæk, Niels

2002-01-01

is based on message passing. The mini-cores are designed as parameterized soft macros intended for a synthesis based design flow. A 520.000 transistor 0.25µm CMOS prototype chip containing 6 mini-cores has been fabricated and tested. Its power consumption is only 50% higher than a hardwired ASIC and more......This paper presents a low-power and programmable DSP architecture - a heterogeneous multiprocessor platform consisting of standard CPU/DSP cores, and a set of simple instruction set processors called mini-cores each optimized for a particular class of algorithm (FIR, IIR, LMS, etc.). Communication...
Hardware trigger processor for the MDT system

CERN Document Server

AUTHOR|(SzGeCERN)757787; The ATLAS collaboration; Hazen, Eric; Butler, John; Black, Kevin; Gastler, Daniel Edward; Ntekas, Konstantinos; Taffard, Anyes; Martinez Outschoorn, Verena; Ishino, Masaya; Okumura, Yasuyuki

2017-01-01

We are developing a low-latency hardware trigger processor for the Monitored Drift Tube system in the Muon spectrometer. The processor will fit candidate Muon tracks in the drift tubes in real time, improving significantly the momentum resolution provided by the dedicated trigger chambers. We present a novel pure-FPGA implementation of a Legendre transform segment finder, an associative-memory alternative implementation, an ARM (Zynq) processor-based track fitter, and compact ATCA carrier board architecture. The ATCA architecture is designed to allow a modular, staged approach to deployment of the system and exploration of alternative technologies.
Fuzzy logic based power-efficient real-time multi-core system

CERN Document Server

Ahmed, Jameel; Najam, Shaheryar; Najam, Zohaib

2017-01-01

This book focuses on identifying the performance challenges involved in computer architectures, optimal configuration settings and analysing their impact on the performance of multi-core architectures. Proposing a power and throughput-aware fuzzy-logic-based reconfiguration for Multi-Processor Systems on Chip (MPSoCs) in both simulation and real-time environments, it is divided into two major parts. The first part deals with the simulation-based power and throughput-aware fuzzy logic reconfiguration for multi-core architectures, presenting the results of a detailed analysis on the factors impacting the power consumption and performance of MPSoCs. In turn, the second part highlights the real-time implementation of fuzzy-logic-based power-efficient reconfigurable multi-core architectures for Intel and Leone3 processors. .
Sojourn time tails in processor-sharing systems

NARCIS (Netherlands)

Egorova, R.R.

2009-01-01

The processor-sharing discipline was originally introduced as a modeling abstraction for the design and performance analysis of the processing unit of a computer system. Under the processor-sharing discipline, all active tasks are assumed to be processed simultaneously, receiving an equal share of
Adaptive Synchronization for Heterogeneous Multi-Agent Systems with Switching Topologies

Directory of Open Access Journals (Sweden)

Muhammad Ridho Rosa

2018-02-01

Full Text Available This work provides a multi-agent extension of output-feedback model reference adaptive control (MRAC, designed to synchronize a network of heterogeneous uncertain agents. The implementation of this scheme is based on multi-agent matching conditions. The practical advantage of the proposed MRAC is the possibility of handling the case of the unknown dynamics of the agents only by using the output and the control input of its neighbors. In addition, it is reasonable to consider the case when the communication topology is time-varying. In this work, the time-varying communication leads to a switching control structure that depends on the number of the predecessor of the agents. By using the switching control structure to handle the time-varying topologies, we show that synchronization can be achieved. The multi-agent adaptive switching controller is first analyzed, and numerical simulations based on formation control of simplifier quadcopter dynamics are provided.
Multi-core System Architecture for Safety-critical Control Applications

DEFF Research Database (Denmark)

Li, Gang

and size, and high power consumption. Increasing the frequency of a processor is becoming painful now due to the explosive power consumption. Furthermore, components integrated into a single-core processor have to be certified to the highest SIL, due to that no isolation is provided in a traditional single...... certification cost. Meanwhile, hardware platforms with improved processing power are required to execute the applications of larger size. To tackle the two issues mentioned above, the state of the art approaches are using more Electronic Control Units (ECU) in a federated architecture or increasing......-core processor. A promising alternative to improve processing power and provide isolation is to adopt a multi-core architecture with on-chip isolation. In general, a specific multi-core architecture can facilitate the development and certification of safety-related systems, due to its physical isolation between...
An orthogonal wavelet division multiple-access processor architecture for LTE-advanced wireless/radio-over-fiber systems over heterogeneous networks

Science.gov (United States)

Mahapatra, Chinmaya; Leung, Victor CM; Stouraitis, Thanos

2014-12-01

The increase in internet traffic, number of users, and availability of mobile devices poses a challenge to wireless technologies. In long-term evolution (LTE) advanced system, heterogeneous networks (HetNet) using centralized coordinated multipoint (CoMP) transmitting radio over optical fibers (LTE A-ROF) have provided a feasible way of satisfying user demands. In this paper, an orthogonal wavelet division multiple-access (OWDMA) processor architecture is proposed, which is shown to be better suited to LTE advanced systems as compared to orthogonal frequency division multiple access (OFDMA) as in LTE systems 3GPP rel.8 (3GPP, http://www.3gpp.org/DynaReport/36300.htm). ROF systems are a viable alternative to satisfy large data demands; hence, the performance in ROF systems is also evaluated. To validate the architecture, the circuit is designed and synthesized on a Xilinx vertex-6 field-programmable gate array (FPGA). The synthesis results show that the circuit performs with a clock period as short as 7.036 ns (i.e., a maximum clock frequency of 142.13 MHz) for transform size of 512. A pipelined version of the architecture reduces the power consumption by approximately 89%. We compare our architecture with similar available architectures for resource utilization and timing and provide performance comparison with OFDMA systems for various quality metrics of communication systems. The OWDMA architecture is found to perform better than OFDMA for bit error rate (BER) performance versus signal-to-noise ratio (SNR) in wireless channel as well as ROF media. It also gives higher throughput and mitigates the bad effect of peak-to-average-power ratio (PAPR).
Parallelization of applications for networks with homogeneous and heterogeneous processors; Parallelisation d`applications pour des reseaux de processeurs homogenes ou heterogenes

Energy Technology Data Exchange (ETDEWEB)

Colombet, L

1994-10-07

The aim of this thesis is to study and develop efficient methods for parallelization of scientific applications on parallel computers with distributed memory. The first part presents two libraries of PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) communication tools. They allow implementation of programs on most parallel machines, but also on heterogeneous computer networks. This chapter illustrates the problems faced when trying to evaluate performances of networks with heterogeneous processors. To evaluate such performances, the concepts of speed-up and efficiency have been modified and adapted to account for heterogeneity. The second part deals with a study of parallel application libraries such as ScaLAPACK and with the development of communication masking techniques. The general concept is based on communication anticipation, in particular by pipelining message sending operations. Experimental results on Cray T3D and IBM SP1 machines validates the theoretical studies performed on basic algorithms of the libraries discussed above. Two examples of scientific applications are given: the first is a model of young stars for astrophysics and the other is a model of photon trajectories in the Compton effect. (J.S.). 83 refs., 65 figs., 24 tabs.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System.

Science.gov (United States)

Zhang, Zhen; Ma, Cheng; Zhu, Rong

2017-08-23

Artificial Neural Networks (ANNs), including Deep Neural Networks (DNNs), have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA) architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP). The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO) real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.
A FPGA-Based, Granularity-Variable Neuromorphic Processor and Its Application in a MIMO Real-Time Control System

Directory of Open Access Journals (Sweden)

Zhen Zhang

2017-08-01

Full Text Available Artificial Neural Networks (ANNs, including Deep Neural Networks (DNNs, have become the state-of-the-art methods in machine learning and achieved amazing success in speech recognition, visual object recognition, and many other domains. There are several hardware platforms for developing accelerated implementation of ANN models. Since Field Programmable Gate Array (FPGA architectures are flexible and can provide high performance per watt of power consumption, they have drawn a number of applications from scientists. In this paper, we propose a FPGA-based, granularity-variable neuromorphic processor (FBGVNP. The traits of FBGVNP can be summarized as granularity variability, scalability, integrated computing, and addressing ability: first, the number of neurons is variable rather than constant in one core; second, the multi-core network scale can be extended in various forms; third, the neuron addressing and computing processes are executed simultaneously. These make the processor more flexible and better suited for different applications. Moreover, a neural network-based controller is mapped to FBGVNP and applied in a multi-input, multi-output, (MIMO real-time, temperature-sensing and control system. Experiments validate the effectiveness of the neuromorphic processor. The FBGVNP provides a new scheme for building ANNs, which is flexible, highly energy-efficient, and can be applied in many areas.

AMD's 64-bit Opteron processor

CERN Multimedia

CERN. Geneva

2003-01-01

This talk concentrates on issues that relate to obtaining peak performance from the Opteron processor. Compiler options, memory layout, MPI issues in multi-processor configurations and the use of a NUMA kernel will be covered. A discussion of recent benchmarking projects and results will also be included.BiographiesDavid RichDavid directs AMD's efforts in high performance computing and also in the use of Opteron processors...
Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors

OpenAIRE

Catalán, Sandra; Herrero, José R.; Igual, Francisco D.; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

2015-01-01

Dense linear algebra libraries, such as BLAS and LAPACK, provide a relevant collection of numerical tools for many scientific and engineering applications. While there exist high performance implementations of the BLAS (and LAPACK) functionality for many current multi-threaded architectures,the adaption of these libraries for asymmetric multicore processors (AMPs)is still pending. In this paper we address this challenge by developing an asymmetry-aware implementation of the BLAS, based on the...
A multi-agent conversational system with heterogeneous data sources access

KAUST Repository

Eisman, Eduardo M.; Navarro, Marí a; Castro, Juan Luis

2016-01-01

In many of the problems that can be found nowadays, information is scattered across different heterogeneous data sources. Most of the natural language interfaces just focus on a very specific part of the problem (e.g. an interface to a relational database, or an interface to an ontology). However, from the point of view of users, it does not matter where the information is stored, they just want to get the knowledge in an integrated, transparent, efficient, effective, and pleasant way. To solve this problem, this article proposes a generic multi-agent conversational architecture that follows the divide and conquer philosophy and considers two different types of agents. Expert agents are specialized in accessing different knowledge sources, and decision agents coordinate them to provide a coherent final answer to the user. This architecture has been used to design and implement SmartSeller, a specific system which includes a Virtual Assistant to answer general questions and a Bookseller to query a book database. A deep analysis regarding other relevant systems has demonstrated that our proposal provides several improvements at some key features presented along the paper.
A multi-agent conversational system with heterogeneous data sources access

KAUST Repository

Eisman, Eduardo M.

2016-01-28

In many of the problems that can be found nowadays, information is scattered across different heterogeneous data sources. Most of the natural language interfaces just focus on a very specific part of the problem (e.g. an interface to a relational database, or an interface to an ontology). However, from the point of view of users, it does not matter where the information is stored, they just want to get the knowledge in an integrated, transparent, efficient, effective, and pleasant way. To solve this problem, this article proposes a generic multi-agent conversational architecture that follows the divide and conquer philosophy and considers two different types of agents. Expert agents are specialized in accessing different knowledge sources, and decision agents coordinate them to provide a coherent final answer to the user. This architecture has been used to design and implement SmartSeller, a specific system which includes a Virtual Assistant to answer general questions and a Bookseller to query a book database. A deep analysis regarding other relevant systems has demonstrated that our proposal provides several improvements at some key features presented along the paper.
The research and application of multi-biometric acquisition embedded system

Science.gov (United States)

Deng, Shichao; Liu, Tiegen; Guo, Jingjing; Li, Xiuyan

2009-11-01

The identification technology based on multi-biometric can greatly improve the applicability, reliability and antifalsification. This paper presents a multi-biometric system bases on embedded system, which includes: three capture daughter boards are applied to obtain different biometric: one each for fingerprint, iris and vein of the back of hand; FPGA (Field Programmable Gate Array) is designed as coprocessor, which uses to configure three daughter boards on request and provides data path between DSP (digital signal processor) and daughter boards; DSP is the master processor and its functions include: control the biometric information acquisition, extracts feature as required and responsible for compare the results with the local database or data server through network communication. The advantages of this system were it can acquire three different biometric in real time, extracts complexity feature flexibly in different biometrics' raw data according to different purposes and arithmetic and network interface on the core-board will be the solution of big data scale. Because this embedded system has high stability, reliability, flexibility and fit for different data scale, it can satisfy the demand of multi-biometric recognition.
Multi-threaded ATLAS Simulation on Intel Knights Landing Processors

CERN Document Server

Farrell, Steven; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles

2016-01-01

The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), will be delivered to its users in two phases with the first phase online now and the second phase expected in mid-2016. Cori Phase 2 will be based on the KNL architecture and will contain over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a great use-case for the KNL architecture and supercomputers like Cori. Simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this presentation we will give an overview of the ATLAS simulation application with details on its multi-thr...
An FPGA-based heterogeneous image fusion system design method

Science.gov (United States)

Song, Le; Lin, Yu-chi; Chen, Yan-hua; Zhao, Mei-rong

2011-08-01

Taking the advantages of FPGA's low cost and compact structure, an FPGA-based heterogeneous image fusion platform is established in this study. Altera's Cyclone IV series FPGA is adopted as the core processor of the platform, and the visible light CCD camera and infrared thermal imager are used as the image-capturing device in order to obtain dualchannel heterogeneous video images. Tailor-made image fusion algorithms such as gray-scale weighted averaging, maximum selection and minimum selection methods are analyzed and compared. VHDL language and the synchronous design method are utilized to perform a reliable RTL-level description. Altera's Quartus II 9.0 software is applied to simulate and implement the algorithm modules. The contrast experiments of various fusion algorithms show that, preferably image quality of the heterogeneous image fusion can be obtained on top of the proposed system. The applied range of the different fusion algorithms is also discussed.
OLYMPUS system and development of its pre-processor

International Nuclear Information System (INIS)

Okamoto, Masao; Takeda, Tatsuoki; Tanaka, Masatoshi; Asai, Kiyoshi; Nakano, Koh.

1977-08-01

The OLYMPUS SYSTEM developed by K. V. Roverts et al. was converted and introduced in computer system FACOM 230/75 of the JAERI Computing Center. A pre-processor was also developed for the OLYMPUS SYSTEM. The OLYMPUS SYSTEM is very useful for development, standardization and exchange of programs in thermonuclear fusion research and plasma physics. The pre-processor developed by the present authors is not only essential for the JAERI OLYMPUS SYSTEM, but also useful in manipulation, creation and correction of program files. (auth.)
Very wide register : an asymmetric register file organization for low power embedded processors.

NARCIS (Netherlands)

Raghavan, P.; Lambrechts, A.; Jayapala, M.; Catthoor, F.; Verkest, D.T.M.L.; Corporaal, H.

2007-01-01

In current embedded systems processors, multi-ported register files are one of the most power hungry parts of the processor, even when they are clustered. This paper presents a novel register file architecture, which has single ported cells and asymmetric interfaces to the memory and to the
Digital Signal Processor System for AC Power Drivers

Directory of Open Access Journals (Sweden)

Ovidiu Neamtu

2009-10-01

Full Text Available DSP (Digital Signal Processor is the bestsolution for motor control systems to make possible thedevelopment of advanced motor drive systems. The motorcontrol processor calculates the required motor windingvoltage magnitude and frequency to operate the motor atthe desired speed. A PWM (Pulse Width Modulationcircuit controls the on and off duty cycle of the powerinverter switches to vary the magnitude of the motorvoltages.
Mathematical Methods and Algorithms of Mobile Parallel Computing on the Base of Multi-core Processors

Directory of Open Access Journals (Sweden)

Alexander B. Bakulev

2012-11-01

Full Text Available This article deals with mathematical models and algorithms, providing mobility of sequential programs parallel representation on the high-level language, presents formal model of operation environment processes management, based on the proposed model of programs parallel representation, presenting computation process on the base of multi-core processors.
Digital image processing software system using an array processor

International Nuclear Information System (INIS)

Sherwood, R.J.; Portnoff, M.R.; Journeay, C.H.; Twogood, R.E.

1981-01-01

A versatile array processor-based system for general-purpose image processing was developed. At the heart of this system is an extensive, flexible software package that incorporates the array processor for effective interactive image processing. The software system is described in detail, and its application to a diverse set of applications at LLNL is briefly discussed. 4 figures, 1 table
Dynamic Voltage-Frequency and Workload Joint Scaling Power Management for Energy Harvesting Multi-Core WSN Node SoC

Directory of Open Access Journals (Sweden)

Xiangyu Li

2017-02-01

Full Text Available This paper proposes a scheduling and power management solution for energy harvesting heterogeneous multi-core WSN node SoC such that the system continues to operate perennially and uses the harvested energy efficiently. The solution consists of a heterogeneous multi-core system oriented task scheduling algorithm and a low-complexity dynamic workload scaling and configuration optimization algorithm suitable for light-weight platforms. Moreover, considering the power consumption of most WSN applications have the characteristic of data dependent behavior, we introduce branches handling mechanism into the solution as well. The experimental result shows that the proposed algorithm can operate in real-time on a lightweight embedded processor (MSP430, and that it can make a system do more valuable works and make more than 99.9% use of the power budget.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

Science.gov (United States)

Downie, John D.; Goodman, Joseph W.

1989-10-01

The accuracy requirements of optical processors in adaptive optics systems are determined by estimating the required accuracy in a general optical linear algebra processor (OLAP) that results in a smaller average residual aberration than that achieved with a conventional electronic digital processor with some specific computation speed. Special attention is given to an error analysis of a general OLAP with regard to the residual aberration that is created in an adaptive mirror system by the inaccuracies of the processor, and to the effect of computational speed of an electronic processor on the correction. Results are presented on the ability of an OLAP to compete with a digital processor in various situations.
ARM Processor Based Embedded System for Remote Data Acquisition

OpenAIRE

Raj Kumar Tiwari; Santosh Kumar Agrahari

2014-01-01

The embedded systems are widely used for the data acquisition. The data acquired may be used for monitoring various activity of the system or it can be used to control the parts of the system. Accessing various signals with remote location has greater advantage for multisite operation or unmanned systems. The remote data acquisition used in this paper is based on ARM processor. The Cortex M3 processor used in this system has in-built Ethernet controller which facilitate to acquire the remote ...
A High Performance Multi-Core FPGA Implementation for 2D Pixel Clustering for the ATLAS Fast TracKer (FTK) Processor

CERN Document Server

Sotiropoulou, C-L; The ATLAS collaboration; Beretta, M; Gkaitatzis, S; Kordas, K; Nikolaidis, S; Petridou, C; Volpi, G

2014-01-01

The high performance multi-core 2D pixel clustering FPGA implementation used for the input system of the ATLAS Fast TracKer (FTK) processor is presented. The input system for the FTK processor will receive data from the Pixel and micro-strip detectors read out drivers (RODs) at 760Gbps, the full rate of level 1 triggers. Clustering is required as a method to reduce the high rate of the received data before further processing, as well as to determine the cluster centroid for obtaining obtain the best spatial measurement. Our implementation targets the pixel detectors and uses a 2D-clustering algorithm that takes advantage of a moving window technique to minimize the logic required for cluster identification. The design is fully generic and the cluster detection window size can be adjusted for optimizing the cluster identification process. Τhe implementation can be parallelized by instantiating multiple cores to identify different clusters independently thus exploiting more FPGA resources. This flexibility mak...
Reconfigurable signal processor designs for advanced digital array radar systems

Science.gov (United States)

Suarez, Hernan; Zhang, Yan (Rockee); Yu, Xining

2017-05-01

The new challenges originated from Digital Array Radar (DAR) demands a new generation of reconfigurable backend processor in the system. The new FPGA devices can support much higher speed, more bandwidth and processing capabilities for the need of digital Line Replaceable Unit (LRU). This study focuses on using the latest Altera and Xilinx devices in an adaptive beamforming processor. The field reprogrammable RF devices from Analog Devices are used as analog front end transceivers. Different from other existing Software-Defined Radio transceivers on the market, this processor is designed for distributed adaptive beamforming in a networked environment. The following aspects of the novel radar processor will be presented: (1) A new system-on-chip architecture based on Altera's devices and adaptive processing module, especially for the adaptive beamforming and pulse compression, will be introduced, (2) Successful implementation of generation 2 serial RapidIO data links on FPGA, which supports VITA-49 radio packet format for large distributed DAR processing. (3) Demonstration of the feasibility and capabilities of the processor in a Micro-TCA based, SRIO switching backplane to support multichannel beamforming in real-time. (4) Application of this processor in ongoing radar system development projects, including OU's dual-polarized digital array radar, the planned new cylindrical array radars, and future airborne radars.
Heterogeneous reconfigurable processors for real-time baseband processing from algorithm to architecture

CERN Document Server

Zhang, Chenxin; Öwall, Viktor

2016-01-01

This book focuses on domain-specific heterogeneous reconfigurable architectures, demonstrating for readers a computing platform which is flexible enough to support multiple standards, multiple modes, and multiple algorithms. The content is multi-disciplinary, covering areas of wireless communication, computing architecture, and circuit design. The platform described provides real-time processing capability with reasonable implementation cost, achieving balanced trade-offs among flexibility, performance, and hardware costs. The authors discuss efficient design methods for wireless communication processing platforms, from both an algorithm and architecture design perspective. Coverage also includes computing platforms for different wireless technologies and standards, including MIMO, OFDM, Massive MIMO, DVB, WLAN, LTE/LTE-A, and 5G. •Discusses reconfigurable architectures, including hardware building blocks such as processing elements, memory sub-systems, Network-on-Chip (NoC), and dynamic hardware reconfigur...
ACP/R3000 processors in data acquisition systems

International Nuclear Information System (INIS)

Deppe, J.; Areti, H.; Atac, R.

1989-02-01

We describe ACP/R3000 processor based data acquisition systems for high energy physics. This VME bus compatible processor board, with a computational power equivalent to 15 VAX 11/780s or better, contains 8 Mb of memory for event buffering and has a high speed secondary bus that allows data gathering from front end electronics. 2 refs., 3 figs
Distributed processor systems

International Nuclear Information System (INIS)

Zacharov, B.

1976-01-01

In recent years, there has been a growing tendency in high-energy physics and in other fields to solve computational problems by distributing tasks among the resources of inter-coupled processing devices and associated system elements. This trend has gained further momentum more recently with the increased availability of low-cost processors and with the development of the means of data distribution. In two lectures, the broad question of distributed computing systems is examined and the historical development of such systems reviewed. An attempt is made to examine the reasons for the existence of these systems and to discern the main trends for the future. The components of distributed systems are discussed in some detail and particular emphasis is placed on the importance of standards and conventions in certain key system components. The ideas and principles of distributed systems are discussed in general terms, but these are illustrated by a number of concrete examples drawn from the context of the high-energy physics environment. (Auth.)

Early Student Support for Application of Advanced Multi-Core Processor Technologies to Oceanographic Research

Science.gov (United States)

2016-05-07

REPORT DOCUMENTATION PAGE I . ... ... .. . ,...,.., ............. OMB No. 0704-0188 The public reporting burden for this collection of...Student Support for Appl ication of Advanced Multi- Core Processor N00014-12-1-0298 Technologies to Oceanographic Research Sb. GRANT NUMBER Sc...communications protocols (i.e. UART, I2C, and SPI), through the , ’ . handing off of the data to the server APis. By providing a common set of tools
The Advances, Challenges and Future Possibilities of Millimeter-Wave Chip-to-Chip Interconnections for Multi-Chip Systems

Directory of Open Access Journals (Sweden)

Amlan Ganguly

2018-02-01

Full Text Available With aggressive scaling of device geometries, density of manufacturing faults is expected to increase. Therefore, yield of complex Multi-Processor Systems-on-Chips (MP-SoCs will decrease due to higher probability of manufacturing defects especially, in dies with large area. Therefore, disintegration of large SoCs into smaller chips called chiplets will improve yield and cost of complex platform-based systems. This will also provide functional flexibility, modular scalability as well as the capability to integrate heterogeneous architectures and technologies in a single unit. However, with scaling of the number of chiplets in such a system, the shared resources in the system such as the interconnection fabric and memory modules will become performance bottlenecks. Additionally, the integration of heterogeneous chiplets operating at different frequencies and voltages can be challenging. State-of-the-art inter-chip communication requires power-hungry high-speed I/O circuits and data transfer over long wired traces on substrates. This increases energy consumption and latency while decreasing data bandwidth for chip-to-chip communication. In this paper, we explore the advances and the challenges of interconnecting a multi-chip system with millimeter-wave (mm-wave wireless interconnects from a variety of perspectives spanning multiple aspects of the wireless interconnection design. Our discussion on the recent advances include aspects such as interconnection topology, physical layer, Medium Access Control (MAC and routing protocols. We also present some potential paradigm-shifting applications as well as complementary technologies of wireless inter-chip communications.
On-board landmark navigation and attitude reference parallel processor system

Science.gov (United States)

Gilbert, L. E.; Mahajan, D. T.

1978-01-01

An approach to autonomous navigation and attitude reference for earth observing spacecraft is described along with the landmark identification technique based on a sequential similarity detection algorithm (SSDA). Laboratory experiments undertaken to determine if better than one pixel accuracy in registration can be achieved consistent with onboard processor timing and capacity constraints are included. The SSDA is implemented using a multi-microprocessor system including synchronization logic and chip library. The data is processed in parallel stages, effectively reducing the time to match the small known image within a larger image as seen by the onboard image system. Shared memory is incorporated in the system to help communicate intermediate results among microprocessors. The functions include finding mean values and summation of absolute differences over the image search area. The hardware is a low power, compact unit suitable to onboard application with the flexibility to provide for different parameters depending upon the environment.
Evaluation of the Intel Westmere-EX server processor

CERN Document Server

Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

2011-01-01

One year after the arrival of the Intel Xeon 7500 systems (“Nehalem-EX”), CERN openlab is presenting a set of benchmark results obtained when running on the new Xeon E7-4870 Processors, representing the “Westmere-EX” family. A modern 4-socket, 40-core system is confronted with the previous generation of expandable (“EX”) platforms, represented by a 4-socket, 32-core Intel Xeon X7560 based system – both being “top of the line” systems. Benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Symmetric MultiThreading (SMT), the cache sizes available, the configured memory topology, as well as the power configuration if throughput per watt is to be measured. As in previous activities, we have tried to do a good job of comparing like with like. In a “top of the line” comparison based on the HEPSPEC06 benchmark, the “We...
Modcomp MAX IV System Processors reference guide

Energy Technology Data Exchange (ETDEWEB)

Cummings, J.

1990-10-01

A user almost always faces a big problem when having to learn to use a new computer system. The information necessary to use the system is often scattered throughout many different manuals. The user also faces the problem of extracting the information really needed from each manual. Very few computer vendors supply a single Users Guide or even a manual to help the new user locate the necessary manuals. Modcomp is no exception to this, Modcomp MAX IV requires that the user be familiar with the system file usage which adds to the problem. At General Atomics there is an ever increasing need for new users to learn how to use the Modcomp computers. This paper was written to provide a condensed Users Reference Guide'' for Modcomp computer users. This manual should be of value not only to new users but any users that are not Modcomp computer systems experts. This Users Reference Guide'' is intended to provided the basic information for the use of the various Modcomp System Processors necessary to, create, compile, link-edit, and catalog a program. Only the information necessary to provide the user with a basic understanding of the Systems Processors is included. This document provides enough information for the majority of programmers to use the Modcomp computers without having to refer to any other manuals. A lot of emphasis has been placed on the file description and usage for each of the System Processors. This allows the user to understand how Modcomp MAX IV does things rather than just learning the system commands.
A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion

Science.gov (United States)

Lee, Seungjoon; Kevrekidis, Ioannis G.; Karniadakis, George Em

2017-09-01

Exascale-level simulations require fault-resilient algorithms that are robust against repeated and expected software and/or hardware failures during computations, which may render the simulation results unsatisfactory. If each processor can share some global information about the simulation from a coarse, limited accuracy but relatively costless auxiliary simulator we can effectively fill-in the missing spatial data at the required times by a statistical learning technique - multi-level Gaussian process regression, on the fly; this has been demonstrated in previous work [1]. Based on the previous work, we also employ another (nonlinear) statistical learning technique, Diffusion Maps, that detects computational redundancy in time and hence accelerate the simulation by projective time integration, giving the overall computation a "patch dynamics" flavor. Furthermore, we are now able to perform information fusion with multi-fidelity and heterogeneous data (including stochastic data). Finally, we set the foundations of a new framework in CFD, called patch simulation, that combines information fusion techniques from, in principle, multiple fidelity and resolution simulations (and even experiments) with a new adaptive timestep refinement technique. We present two benchmark problems (the heat equation and the Navier-Stokes equations) to demonstrate the new capability that statistical learning tools can bring to traditional scientific computing algorithms. For each problem, we rely on heterogeneous and multi-fidelity data, either from a coarse simulation of the same equation or from a stochastic, particle-based, more "microscopic" simulation. We consider, as such "auxiliary" models, a Monte Carlo random walk for the heat equation and a dissipative particle dynamics (DPD) model for the Navier-Stokes equations. More broadly, in this paper we demonstrate the symbiotic and synergistic combination of statistical learning, domain decomposition, and scientific computing in
Methanol fuel processor and PEM fuel cell modeling for mobile application

Energy Technology Data Exchange (ETDEWEB)

Chrenko, Daniela [ISAT, University of Burgundy, Rue Mlle Bourgoise, 58000 Nevers (France); Gao, Fei; Blunier, Benjamin; Bouquain, David; Miraoui, Abdellatif [Transport and Systems Laboratory (SeT) - EA 3317/UTBM, Fuel cell Laboratory (FCLAB), University of Technology of Belfort-Montbeliard, Rue Thierry Mieg 90010, Belfort Cedex (France)

2010-07-15

The use of hydrocarbon fed fuel cell systems including a fuel processor can be an entry market for this emerging technology avoiding the problem of hydrogen infrastructure. This article presents a 1 kW low temperature PEM fuel cell system with fuel processor, the system is fueled by a mixture of methanol and water that is converted into hydrogen rich gas using a steam reformer. A complete system model including a fluidic fuel processor model containing evaporation, steam reformer, hydrogen filter, combustion, as well as a multi-domain fuel cell model is introduced. Experiments are performed with an IDATECH FCS1200 trademark fuel cell system. The results of modeling and experimentation show good results, namely with regard to fuel cell current and voltage as well as hydrogen production and pressure. The system is auto sufficient and shows an efficiency of 25.12%. The presented work is a step towards a complete system model, needed to develop a well adapted system control assuring optimized system efficiency. (author)
Green Secure Processors: Towards Power-Efficient Secure Processor Design

Science.gov (United States)

Chhabra, Siddhartha; Solihin, Yan

With the increasing wealth of digital information stored on computer systems today, security issues have become increasingly important. In addition to attacks targeting the software stack of a system, hardware attacks have become equally likely. Researchers have proposed Secure Processor Architectures which utilize hardware mechanisms for memory encryption and integrity verification to protect the confidentiality and integrity of data and computation, even from sophisticated hardware attacks. While there have been many works addressing performance and other system level issues in secure processor design, power issues have largely been ignored. In this paper, we first analyze the sources of power (energy) increase in different secure processor architectures. We then present a power analysis of various secure processor architectures in terms of their increase in power consumption over a base system with no protection and then provide recommendations for designs that offer the best balance between performance and power without compromising security. We extend our study to the embedded domain as well. We also outline the design of a novel hybrid cryptographic engine that can be used to minimize the power consumption for a secure processor. We believe that if secure processors are to be adopted in future systems (general purpose or embedded), it is critically important that power issues are considered in addition to performance and other system level issues. To the best of our knowledge, this is the first work to examine the power implications of providing hardware mechanisms for security.
The processor farm for online triggering and full event reconstruction of the HERA-B experiment at HERA

International Nuclear Information System (INIS)

Gellrich, A.; Dippel, R.; Gensch, U.; Kowallik, R.; Legrand, I.C.; Leich, H.; Sun, F.; Wegner, P.

1996-01-01

The main goal of the HERA-B experiment which start taking data in 1988 is to study CP violation in B decays. This article describes the concept and the planned implementation of a multi-processor system, called processor farm,as the last part of the data acquisition and trigger system of the HERA B experiment. The third level trigger task and a full online event reconstruction will be performed on this processor farm, consisting of more then 100 powerful RISC processors which are based on commercial hardware boards. The controlling will be done by a real-time operating system which provides a software development environment, including FORTRAN and C compilers. (author)
Information flow within relational multi-context systems

DEFF Research Database (Denmark)

Cruz-Filipe, Luís; Gaspar, GrąCa; Nunes, Isabel

2014-01-01

Multi-context systems (MCSs) are an important framework for heterogeneous combinations of systems within the Semantic Web. In this paper, we propose generic constructions to achieve specific forms of interaction in a principled way, and systematize some useful techniques to work with ontologies w...... within an MCS. All these mechanisms are presented in the form of general-purpose design patterns. Their study also suggests new ways in which this framework can be further extended.......Multi-context systems (MCSs) are an important framework for heterogeneous combinations of systems within the Semantic Web. In this paper, we propose generic constructions to achieve specific forms of interaction in a principled way, and systematize some useful techniques to work with ontologies...
UNIBUS processor interface for a FASTBUS data acquisition system

International Nuclear Information System (INIS)

Larwill, M.; Lagerlund, T.D.; Barsotti, E.; Taff, L.M.; Franzen, J.

1981-01-01

Current work on a FASTBUS data acquisition system at Fermilab is described. The system will consist of three pieces of FASTBUS hardware: a UNIBUS processor interface (UPI), a dual-ported bulk memory, and a FASTBUS ''event builder'' (i.e., data acquisition processor). Primary efforts have been on specifying and constructing a UPI. The present specification includes capability for all basic FASTBUS operations, including list processing of consecutive FASTBUS operations. Some possible FASTBUS data acquisition system architectures employing the UPI are discussed along with some detailed specifications of the UPI itself
The micro-processor controlled process radiation monitoring system for reactor safety systems

International Nuclear Information System (INIS)

Mizuno, K.; Noguchi, A.; Kumagami, S.; Gotoh, Y.; Kumahara, T.; Arita, S.

1986-01-01

Digital computers are soon expected to be applied to various real-time safety and safety-related systems in nuclear power plants. Hitachi is now engaged in the development of a micro-processor controlled process radiation monitoring system, which operates on digital processing methods employed with a log ratemeter. A newly defined methodology of design and test procedures is being applied as a means of software program verification for these safety systems. Recently implemented micro-processor technology will help to achieve an advanced man-machine interface and highly reliable performance. (author)
Trends in practical applications of heterogeneous multi-agent systems : the PAAMS collection

CERN Document Server

Rodríguez, Juan; Mathieu, Philippe; Campbell, Andrew; Ortega, Alfonso; Adam, Emmanuel; Navarro, Elena; Ahrndt, Sebastian; Moreno, María; Julián, Vicente

2014-01-01

PAAMS, the International Conference on Practical Applications of Agents and Multi-Agent Systems is an evolution of the International Workshop on Practical Applications of Agents and Multi-Agent Systems. PAAMS is an international yearly tribune to present, to discuss, and to disseminate the latest developments and the most important outcomes related to real-world applications. It provides a unique opportunity to bring multi-disciplinary experts, academics and practitioners together to exchange their experience in the development of Agents and Multi-Agent Systems. This volume presents the papers that have been accepted for the 2014 special sessions: Agents Behaviours and Artificial Markets (ABAM), Agents and Mobile Devices (AM), Bio-Inspired and Multi-Agents Systems: Applications to Languages (BioMAS), Multi-Agent Systems and Ambient Intelligence (MASMAI), Self-Explaining Agents (SEA), Web Mining and Recommender systems (WebMiRes) and Intelligent Educational Systems (SSIES).
On developing B-spline registration algorithms for multi-core processors

International Nuclear Information System (INIS)

Shackleford, J A; Kandasamy, N; Sharp, G C

2010-01-01

Spline-based deformable registration methods are quite popular within the medical-imaging community due to their flexibility and robustness. However, they require a large amount of computing time to obtain adequate results. This paper makes two contributions towards accelerating B-spline-based registration. First, we propose a grid-alignment scheme and associated data structures that greatly reduce the complexity of the registration algorithm. Based on this grid-alignment scheme, we then develop highly data parallel designs for B-spline registration within the stream-processing model, suitable for implementation on multi-core processors such as graphics processing units (GPUs). Particular attention is focused on an optimal method for performing analytic gradient computations in a data parallel fashion. CPU and GPU versions are validated for execution time and registration quality. Performance results on large images show that our GPU algorithm achieves a speedup of 15 times over the single-threaded CPU implementation whereas our multi-core CPU algorithm achieves a speedup of 8 times over the single-threaded implementation. The CPU and GPU versions achieve near-identical registration quality in terms of RMS differences between the generated vector fields.
An enhanced Ada run-time system for real-time embedded processors

Science.gov (United States)

Sims, J. T.

1991-01-01

An enhanced Ada run-time system has been developed to support real-time embedded processor applications. The primary focus of this development effort has been on the tasking system and the memory management facilities of the run-time system. The tasking system has been extended to support efficient and precise periodic task execution as required for control applications. Event-driven task execution providing a means of task-asynchronous control and communication among Ada tasks is supported in this system. Inter-task control is even provided among tasks distributed on separate physical processors. The memory management system has been enhanced to provide object allocation and protected access support for memory shared between disjoint processors, each of which is executing a distinct Ada program.
Adaptive signal processor

Energy Technology Data Exchange (ETDEWEB)

Walz, H.V.

1980-07-01

An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 ..mu..sec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed.
Adaptive signal processor

International Nuclear Information System (INIS)

Walz, H.V.

1980-07-01

An experimental, general purpose adaptive signal processor system has been developed, utilizing a quantized (clipped) version of the Widrow-Hoff least-mean-square adaptive algorithm developed by Moschner. The system accommodates 64 adaptive weight channels with 8-bit resolution for each weight. Internal weight update arithmetic is performed with 16-bit resolution, and the system error signal is measured with 12-bit resolution. An adapt cycle of adjusting all 64 weight channels is accomplished in 8 μsec. Hardware of the signal processor utilizes primarily Schottky-TTL type integrated circuits. A prototype system with 24 weight channels has been constructed and tested. This report presents details of the system design and describes basic experiments performed with the prototype signal processor. Finally some system configurations and applications for this adaptive signal processor are discussed
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

Energy Technology Data Exchange (ETDEWEB)

Barhen, Jacob [ORNL; Kerekes, Ryan A [ORNL; ST Charles, Jesse Lee [ORNL; Buckner, Mark A [ORNL

2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY

International Nuclear Information System (INIS)

Barhen, Jacob; Kerekes, Ryan A.; St Charles, Jesse Lee; Buckner, Mark A.

2008-01-01

High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlation processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core
Parallel Computational Intelligence-Based Multi-Camera Surveillance System

OpenAIRE

Orts-Escolano, Sergio; Garcia-Rodriguez, Jose; Morell, Vicente; Cazorla, Miguel; Azorin-Lopez, Jorge; García-Chamizo, Juan Manuel

2014-01-01

In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphic processor units). It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, analysis and monitoring of the movement. These features allow the construction of a robust representation of the environment and interpret the behavior of mob...

GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

OpenAIRE

H. Wang; H. Wang; H. Wang; H. Wang; H. Chen; H. Chen; Q. Wu; Q. Wu; J. Lin; X. Chen; X. Xie; R. Wang; R. Wang; X. Tang; Z. Wang

2017-01-01

The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (code...
RTEMS SMP and MTAPI for Efficient Multi-Core Space Applications on LEON3/LEON4 Processors

Science.gov (United States)

Cederman, Daniel; Hellstrom, Daniel; Sherrill, Joel; Bloom, Gedare; Patte, Mathieu; Zulianello, Marco

2015-09-01

This paper presents the final result of an European Space Agency (ESA) activity aimed at improving the software support for LEON processors used in SMP configurations. One of the benefits of using a multicore system in a SMP configuration is that in many instances it is possible to better utilize the available processing resources by load balancing between cores. This however comes with the cost of having to synchronize operations between cores, leading to increased complexity. While in an AMP system one can use multiple instances of operating systems that are only uni-processor capable, a SMP system requires the operating system to be written to support multicore systems. In this activity we have improved and extended the SMP support of the RTEMS real-time operating system and ensured that it fully supports the multicore capable LEON processors. The targeted hardware in the activity has been the GR712RC, a dual-core core LEON3FT processor, and the functional prototype of ESA's Next Generation Multiprocessor (NGMP), a quad core LEON4 processor. The final version of the NGMP is now available as a product under the name GR740. An implementation of the Multicore Task Management API (MTAPI) has been developed as part of this activity to aid in the parallelization of applications for RTEMS SMP. It allows for simplified development of parallel applications using the task-based programming model. An existing space application, the Gaia Video Processing Unit, has been ported to RTEMS SMP using the MTAPI implementation to demonstrate the feasibility and usefulness of multicore processors for space payload software. The activity is funded by ESA under contract 4000108560/13/NL/JK. Gedare Bloom is supported in part by NSF CNS-0934725.
Debugging multi-core systems-on-chip

NARCIS (Netherlands)

Vermeulen, B.; Goossens, K.G.W.; Kornaros, G.

2010-01-01

In this chapter, we introduced three fundamental reasons why debugging a multi-processor SoC is intrinsically difficult; (1) limited internal observability, (2) asynchronicity, and (3) non-determinism.
Multi-threaded ATLAS simulation on Intel Knights Landing processors

Science.gov (United States)

Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration

2017-10-01

The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.
Parallel Computational Intelligence-Based Multi-Camera Surveillance System

Directory of Open Access Journals (Sweden)

Sergio Orts-Escolano

2014-04-01

Full Text Available In this work, we present a multi-camera surveillance system based on the use of self-organizing neural networks to represent events on video. The system processes several tasks in parallel using GPUs (graphic processor units. It addresses multiple vision tasks at various levels, such as segmentation, representation or characterization, analysis and monitoring of the movement. These features allow the construction of a robust representation of the environment and interpret the behavior of mobile agents in the scene. It is also necessary to integrate the vision module into a global system that operates in a complex environment by receiving images from multiple acquisition devices at video frequency. Offering relevant information to higher level systems, monitoring and making decisions in real time, it must accomplish a set of requirements, such as: time constraints, high availability, robustness, high processing speed and re-configurability. We have built a system able to represent and analyze the motion in video acquired by a multi-camera network and to process multi-source data in parallel on a multi-GPU architecture.
Accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) model on Intel Xeon Phi processors

OpenAIRE

Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junming; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

2017-01-01

The GNAQPMS model is the global version of the Nested Air Quality Prediction Modelling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present our work of porting and optimizing the GNAQPMS model on the second generation Intel Xeon Phi processor codename “Knights Landing” (KNL). Compared with the first generation Xeon Phi coprocessor, KNL introduced many new hardware features such as a boo...
A Time-Composable Operating System for the Patmos Processor

DEFF Research Database (Denmark)

Ziccardi, Marco; Schoeberl, Martin; Vardanega, Tullio

2015-01-01

-composable operating system, on top of a time-composable processor, facilitates incremental development, which is highly desirable for industry. This paper makes a twofold contribution. First, we present enhancements to the Patmos processor to allow achieving time composability at the operating system level. Second......, we extend an existing time-composable operating system, TiCOS, to make best use of advanced Patmos hardware features in the pursuit of time composability.......In the last couple of decades we have witnessed a steady growth in the complexity and widespread of real-time systems. In order to master the rising complexity in the timing behaviour of those systems, rightful attention has been given to the development of time-predictable computer architectures...
Processor farming method for multi-scale analysis of masonry structures

Science.gov (United States)

Krejčí, Tomáš; Koudelka, Tomáš

2017-07-01

This paper describes a processor farming method for a coupled heat and moisture transport in masonry using a two-level approach. The motivation for the two-level description comes from difficulties connected with masonry structures, where the size of stone blocks is much larger than the size of mortar layers and very fine finite element mesh has to be used. The two-level approach is suitable for parallel computing because nearly all computations can be performed independently with little synchronization. This approach is called processor farming. The master processor is dealing with the macro-scale level - the structure and the slave processors are dealing with a homogenization procedure on the meso-scale level which is represented by an appropriate representative volume element.
Leader-following exponential consensus of fractional order nonlinear multi-agents system with hybrid time-varying delay: A heterogeneous impulsive method

Science.gov (United States)

Wang, Fei; Yang, Yongqing

2017-09-01

In this paper, we study the leader-following exponential consensus of multi-agent system. Each agent in the system is described by nonlinear fractional order differential equation. Both the internal delay and coupling delay are taken into consideration. The heterogeneous impulsive control is used for ensuring the consensus of all agents. Based on Lyapunov function method and matrix analysis, some sufficient conditions for exponential consensus are obtained. Finally, some illustrative examples are given to show the effectiveness of the obtained results.
Programmable applications in a heterogeneous and concurrent environment

International Nuclear Information System (INIS)

Dubois, P.F.

1995-01-01

Equipe Basis (EB) is a new system for programmable applications which is under development at Lawrence Livermore Laboratory. EB is designed to permit user control of teams of interconnecting processes in a heterogeneous environment. Current systems work with programs written in Fortran or C on a single processor. The programs of the future will be in many languages and distributed over many processors. The object-oriented kernel can communicate data and commands between processes that are unaware of each other's inner structure. The programming language, Eiffel, is described. This document consists of extensive viewgraphs
Multithreading for Embedded Reconfigurable Multicore Systems

NARCIS (Netherlands)

Zaykov, P.G.

2014-01-01

In this dissertation, we address the problem of performance efficient multithreading execution on heterogeneous multicore embedded systems. By heterogeneous multicore embedded systems we refer to those, which have real-time requirements and consist of processor tiles with General Purpose Processor
Multithreading for embedded reconfigurable multicore systems

NARCIS (Netherlands)

Zaykov, P.G.

2014-01-01

In this dissertation, we address the problem of performance efficient multithreading execution on heterogeneous multicore embedded systems. By heterogeneous multicore embedded systems we refer to those, which have real-time requirements and consist of processor tiles with General Purpose Processor
Regulated open multi-agent systems (ROMAS) a multi-agent approach for designing normative open systems

CERN Document Server

Garcia, Emilia; Botti, Vicente

2015-01-01

Addressing the open problem of engineering normative open systems using the multi-agent paradigm, normative open systems are explained as systems in which heterogeneous and autonomous entities and institutions coexist in a complex social and legal framework that can evolve to address the different and often conflicting objectives of the many stakeholders involved. Presenting a software engineering approach which covers both the analysis and design of these kinds of systems, and which deals with the open issues in the area, ROMAS (Regulated Open Multi-Agent Systems) defines a specific multi-agent architecture, meta-model, methodology and CASE tool. This CASE tool is based on Model-Driven technology and integrates the graphical design with the formal verification of some properties of these systems by means of model checking techniques. Utilizing tables to enhance reader insights into the most important requirements for designing normative open multi-agent systems, the book also provides a detailed and easy t...
Expert System Constant False Alarm Rate (CFAR) Processor

National Research Council Canada - National Science Library

Wicks, Michael C

2006-01-01

An artificial intelligence system improves radar signal processor performance by increasing target probability of detection and reducing probability of false alarm in a severe radar clutter environment...
Centaure: an heterogeneous parallel architecture for computer vision

International Nuclear Information System (INIS)

Peythieux, Marc

1997-01-01

This dissertation deals with the architecture of parallel computers dedicated to computer vision. In the first chapter, the problem to be solved is presented, as well as the architecture of the Sympati and Symphonie computers, on which this work is based. The second chapter is about the state of the art of computers and integrated processors that can execute computer vision and image processing codes. The third chapter contains a description of the architecture of Centaure. It has an heterogeneous structure: it is composed of a multiprocessor system based on Analog Devices ADSP21060 Sharc digital signal processor, and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer, tightly coupled to a Sharc thanks to a dual ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with a performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark which is typical of industrial vision, are presented and compared. In the first place, these results show that the basic node of Centaure allows a faster execution than Symphonie, and that increasing the size of the tested computer leads to a better speed-up with Centaure than with Symphonie. In the second place, these results validate the choice of running the low level structure of Centaure in a multi- SIMD fashion. (author) [fr
A Modular Environment for Geophysical Inversion and Run-time Autotuning using Heterogeneous Computing Systems

Science.gov (United States)

Myre, Joseph M.

Heterogeneous computing systems have recently come to the forefront of the High-Performance Computing (HPC) community's interest. HPC computer systems that incorporate special purpose accelerators, such as Graphics Processing Units (GPUs), are said to be heterogeneous. Large scale heterogeneous computing systems have consistently ranked highly on the Top500 list since the beginning of the heterogeneous computing trend. By using heterogeneous computing systems that consist of both general purpose processors and special- purpose accelerators, the speed and problem size of many simulations could be dramatically increased. Ultimately this results in enhanced simulation capabilities that allows, in some cases for the first time, the execution of parameter space and uncertainty analyses, model optimizations, and other inverse modeling techniques that are critical for scientific discovery and engineering analysis. However, simplifying the usage and optimization of codes for heterogeneous computing systems remains a challenge. This is particularly true for scientists and engineers for whom understanding HPC architectures and undertaking performance analysis may not be primary research objectives. To enable scientists and engineers to remain focused on their primary research objectives, a modular environment for geophysical inversion and run-time autotuning on heterogeneous computing systems is presented. This environment is composed of three major components: 1) CUSH---a framework for reducing the complexity of programming heterogeneous computer systems, 2) geophysical inversion routines which can be used to characterize physical systems, and 3) run-time autotuning routines designed to determine configurations of heterogeneous computing systems in an attempt to maximize the performance of scientific and engineering codes. Using three case studies, a lattice-Boltzmann method, a non-negative least squares inversion, and a finite-difference fluid flow method, it is shown that
Adaptive Load Balancing of Parallel Applications with Multi-Agent Reinforcement Learning on Heterogeneous Systems

Directory of Open Access Journals (Sweden)

Johan Parent

2004-01-01

Full Text Available We report on the improvements that can be achieved by applying machine learning techniques, in particular reinforcement learning, for the dynamic load balancing of parallel applications. The applications being considered in this paper are coarse grain data intensive applications. Such applications put high pressure on the interconnect of the hardware. Synchronization and load balancing in complex, heterogeneous networks need fast, flexible, adaptive load balancing algorithms. Viewing a parallel application as a one-state coordination game in the framework of multi-agent reinforcement learning, and by using a recently introduced multi-agent exploration technique, we are able to improve upon the classic job farming approach. The improvements are achieved with limited computation and communication overhead.
Optical Array Processor: Laboratory Results

Science.gov (United States)

Casasent, David; Jackson, James; Vaerewyck, Gerard

1987-01-01

A Space Integrating (SI) Optical Linear Algebra Processor (OLAP) is described and laboratory results on its performance in several practical engineering problems are presented. The applications include its use in the solution of a nonlinear matrix equation for optimal control and a parabolic Partial Differential Equation (PDE), the transient diffusion equation with two spatial variables. Frequency-multiplexed, analog and high accuracy non-base-two data encoding are used and discussed. A multi-processor OLAP architecture is described and partitioning and data flow issues are addressed.
Workload Balancing on Heterogeneous Systems: A Case Study of Sparse Grid Interpolation

KAUST Repository

Muraraşu, Alin

2012-01-01

Multi-core parallelism and accelerators are becoming common features of today’s computer systems, as they allow for computational power without sacrificing energy efficiency. Due to heterogeneity, tuning for each type of compute unit and adequate load balancing is essential. This paper proposes static and dynamic solutions for load balancing in the context of an application for visualizing high-dimensional simulation data. The application relies on the sparse grid technique for data compression. Its performance critical part is the interpolation routine used for decompression. Results show that our load balancing scheme allows for an efficient acceleration of interpolation on heterogeneous systems containing multi-core CPUs and GPUs.
Multi-kilowatt modularized spacecraft power processing system development

International Nuclear Information System (INIS)

Andrews, R.E.; Hayden, J.H.; Hedges, R.T.; Rehmann, D.W.

1975-07-01

A review of existing information pertaining to spacecraft power processing systems and equipment was accomplished with a view towards applicability to the modularization of multi-kilowatt power processors. Power requirements for future spacecraft were determined from the NASA mission model-shuttle systems payload data study which provided the limits for modular power equipment capabilities. Three power processing systems were compared to evaluation criteria to select the system best suited for modularity. The shunt regulated direct energy transfer system was selected by this analysis for a conceptual design effort which produced equipment specifications, schematics, envelope drawings, and power module configurations

LDRD final report : a lightweight operating system for multi-core capability class supercomputers.

Energy Technology Data Exchange (ETDEWEB)

Kelly, Suzanne Marie; Hudson, Trammell B. (OS Research); Ferreira, Kurt Brian; Bridges, Patrick G. (University of New Mexico); Pedretti, Kevin Thomas Tauke; Levenhagen, Michael J.; Brightwell, Ronald Brian

2010-09-01

The two primary objectives of this LDRD project were to create a lightweight kernel (LWK) operating system(OS) designed to take maximum advantage of multi-core processors, and to leverage the virtualization capabilities in modern multi-core processors to create a more flexible and adaptable LWK environment. The most significant technical accomplishments of this project were the development of the Kitten lightweight kernel, the co-development of the SMARTMAP intra-node memory mapping technique, and the development and demonstration of a scalable virtualization environment for HPC. Each of these topics is presented in this report by the inclusion of a published or submitted research paper. The results of this project are being leveraged by several ongoing and new research projects.
Multi-threaded ATLAS simulation on Intel Knights Landing processors

CERN Document Server

AUTHOR|(INSPIRE)INSPIRE-00014247; The ATLAS collaboration; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea

2017-01-01

The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with detai...
Wavelength-encoded OCDMA system using opto-VLSI processors.

Science.gov (United States)

Aljada, Muhsen; Alameh, Kamal

2007-07-01

We propose and experimentally demonstrate a 2.5 Gbits/sper user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.
Wavelength-encoded OCDMA system using opto-VLSI processors

Science.gov (United States)

Aljada, Muhsen; Alameh, Kamal

2007-07-01

We propose and experimentally demonstrate a 2.5 Gbits/sper user wavelength-encoded optical code-division multiple-access encoder-decoder structure based on opto-VLSI processing. Each encoder and decoder is constructed using a single 1D opto-very-large-scale-integrated (VLSI) processor in conjunction with a fiber Bragg grating (FBG) array of different Bragg wavelengths. The FBG array spectrally and temporally slices the broadband input pulse into several components and the opto-VLSI processor generates codewords using digital phase holograms. System performance is measured in terms of the autocorrelation and cross-correlation functions as well as the eye diagram.
Launching applications on compute and service processors running under different operating systems in scalable network of processor boards with routers

Science.gov (United States)

Tomkins, James L [Albuquerque, NM; Camp, William J [Albuquerque, NM

2009-03-17

A multiple processor computing apparatus includes a physical interconnect structure that is flexibly configurable to support selective segregation of classified and unclassified users. The physical interconnect structure also permits easy physical scalability of the computing apparatus. The computing apparatus can include an emulator which permits applications from the same job to be launched on processors that use different operating systems.
Summary of multi-core hardware and programming model investigations

Energy Technology Data Exchange (ETDEWEB)

Kelly, Suzanne Marie; Pedretti, Kevin Thomas Tauke; Levenhagen, Michael J.

2008-05-01

This report summarizes our investigations into multi-core processors and programming models for parallel scientific applications. The motivation for this study was to better understand the landscape of multi-core hardware, future trends, and the implications on system software for capability supercomputers. The results of this study are being used as input into the design of a new open-source light-weight kernel operating system being targeted at future capability supercomputers made up of multi-core processors. A goal of this effort is to create an agile system that is able to adapt to and efficiently support whatever multi-core hardware and programming models gain acceptance by the community.
The application of charge-coupled device processors in automatic-control systems

Science.gov (United States)

Mcvey, E. S.; Parrish, E. A., Jr.

1977-01-01

The application of charge-coupled device (CCD) processors to automatic-control systems is suggested. CCD processors are a new form of semiconductor component with the unique ability to process sampled signals on an analog basis. Specific implementations of controllers are suggested for linear time-invariant, time-varying, and nonlinear systems. Typical processing time should be only a few microseconds. This form of technology may become competitive with microprocessors and minicomputers in addition to supplementing them.
Investigation of Large Scale Cortical Models on Clustered Multi-Core Processors

Science.gov (United States)

2013-02-01

Playstation 3 with 6 available SPU cores outperforms the Intel Xeon processor (with 4 cores) by about 1.9 times for the HTM model and by 2.4 times...runtime breakdowns of the HTM and Dean models respectively on the Cell processor (on the Playstation 3) and the Intel Xeon processor ( 4 thread...YOUR FORM TO THE ABOVE ORGANIZATION. 1. REPORT DATE (DD-MM-YYYY) 2. REPORT TYPE 3. DATES COVERED (From - To) 4 . TITLE AND SUBTITLE 5a. CONTRACT NUMBER
Allocation and sequencing in 1-out-of-N heterogeneous cold-standby systems: Multi-objective harmony search with dynamic parameters tuning

International Nuclear Information System (INIS)

Valaei, M.R.; Behnamian, J.

2017-01-01

A redundancy allocation is a famous problem in reliability sciences. A lot of researcher investigated about this problem, but a few of them focus on heterogeneous 1-out-of-N: G cold-standby redundancy in each subsystem. This paper considers a redundancy allocation problem (RAP) and standby element sequencing problem (SESP) for 1-out-of-N: G heterogeneous cold-standby system, simultaneously. Moreover, here, maximizing reliability of cold-standby allocation and minimizing cost of buying and time-independent elements are considered as two conflict objectives. This problem is NP-Hard and consequently, devizing a metaheuristic to solve this problem, especially for large-sized instances, is highly desirable. In this paper, we propose a multi-objective harmony search. Based on Taguchi experimental design, we, also, present a new parameters tuning method to improve the proposed algorithm. - Graphical abstract: Example of solution encoding. - Highlights: • This paper considers a redundancy allocation and standby element sequencing problem. • To solve this problem, a multi-objective harmony search algorithm is proposed. • Dynamic parameters tuning method is applied to improved the algorithm. • With this method, sensitivity of the algorithm to initial parameters is reduced.
Advanced Avionics and Processor Systems for a Flexible Space Exploration Architecture

Science.gov (United States)

Keys, Andrew S.; Adams, James H.; Smith, Leigh M.; Johnson, Michael A.; Cressler, John D.

2010-01-01

The Advanced Avionics and Processor Systems (AAPS) project, formerly known as the Radiation Hardened Electronics for Space Environments (RHESE) project, endeavors to develop advanced avionic and processor technologies anticipated to be used by NASA s currently evolving space exploration architectures. The AAPS project is a part of the Exploration Technology Development Program, which funds an entire suite of technologies that are aimed at enabling NASA s ability to explore beyond low earth orbit. NASA s Marshall Space Flight Center (MSFC) manages the AAPS project. AAPS uses a broad-scoped approach to developing avionic and processor systems. Investment areas include advanced electronic designs and technologies capable of providing environmental hardness, reconfigurable computing techniques, software tools for radiation effects assessment, and radiation environment modeling tools. Near-term emphasis within the multiple AAPS tasks focuses on developing prototype components using semiconductor processes and materials (such as Silicon-Germanium (SiGe)) to enhance a device s tolerance to radiation events and low temperature environments. As the SiGe technology will culminate in a delivered prototype this fiscal year, the project emphasis shifts its focus to developing low-power, high efficiency total processor hardening techniques. In addition to processor development, the project endeavors to demonstrate techniques applicable to reconfigurable computing and partially reconfigurable Field Programmable Gate Arrays (FPGAs). This capability enables avionic architectures the ability to develop FPGA-based, radiation tolerant processor boards that can serve in multiple physical locations throughout the spacecraft and perform multiple functions during the course of the mission. The individual tasks that comprise AAPS are diverse, yet united in the common endeavor to develop electronics capable of operating within the harsh environment of space. Specifically, the AAPS tasks for
NeuroFlow: A General Purpose Spiking Neural Network Simulation Platform using Customizable Processors.

Science.gov (United States)

Cheung, Kit; Schultz, Simon R; Luk, Wayne

2015-01-01

NeuroFlow is a scalable spiking neural network simulation platform for off-the-shelf high performance computing systems using customizable hardware processors such as Field-Programmable Gate Arrays (FPGAs). Unlike multi-core processors and application-specific integrated circuits, the processor architecture of NeuroFlow can be redesigned and reconfigured to suit a particular simulation to deliver optimized performance, such as the degree of parallelism to employ. The compilation process supports using PyNN, a simulator-independent neural network description language, to configure the processor. NeuroFlow supports a number of commonly used current or conductance based neuronal models such as integrate-and-fire and Izhikevich models, and the spike-timing-dependent plasticity (STDP) rule for learning. A 6-FPGA system can simulate a network of up to ~600,000 neurons and can achieve a real-time performance of 400,000 neurons. Using one FPGA, NeuroFlow delivers a speedup of up to 33.6 times the speed of an 8-core processor, or 2.83 times the speed of GPU-based platforms. With high flexibility and throughput, NeuroFlow provides a viable environment for large-scale neural network simulation.
The associative memory system for the FTK processor at ATLAS

CERN Document Server

Magalotti, D; The ATLAS collaboration; Donati, S; Luciano, P; Piendibene, M; Giannetti, P; Lanza, A; Verzellesi, G; Sakellariou, Andreas; Billereau, W; Combe, J M

2014-01-01

In high energy physics experiments, the most interesting processes are very rare and hidden in an extremely large level of background. As the experiment complexity, accelerator backgrounds, and instantaneous luminosity increase, more effective and accurate data selection techniques are needed. The Fast TracKer processor (FTK) is a real time tracking processor designed for the ATLAS trigger upgrade. The FTK core is the Associative Memory system. It provides massive computing power to minimize the processing time of complex tracking algorithms executed online. This paper reports on the results and performance of a new prototype of Associative Memory system.
Suboptimal processor for anomaly detection for system surveillance and diagnosis

Energy Technology Data Exchange (ETDEWEB)

Ciftcioglu, Oe.; Hoogenboom, J.E.; Dam, H. van

1989-06-01

Anomaly detection for nuclear reactor surveillance and diagnosis is described. The residual noise obtained as a result of autoregressive (AR) modelling is essential to obtain high sensitivity for anomaly detection. By means of the method of hypothesis testing a suboptimal anomaly detection processor is devised for system surveillance and diagnosis. Experiments are carried out to investigate the performance of the processor, which is in particular of interest for on-line and real-time applications.
Processor tradeoffs in distributed real-time systems

Science.gov (United States)

Krishna, C. M.; Shin, Kang G.; Bhandari, Inderpal S.

1987-01-01

The problem of the optimization of the design of real-time distributed systems is examined with reference to a class of computer architectures similar to the continuously reconfigurable multiprocessor flight control system structure, CM2FCS. Particular attention is given to the impact of processor replacement and the burn-in time on the probability of dynamic failure and mean cost. The solution is obtained numerically and interpreted in the context of real-time applications.
Analysis of Intel IA-64 Processor Support for Secure Systems

National Research Council Canada - National Science Library

Unalmis, Bugra

2001-01-01

.... Systems could be constructed for which serious security threats would be eliminated. This thesis explores the Intel IA-64 processor's hardware support and its relationship to software for building a secure system...
A discussion of tools and techniques for distributed processor based control systems using CAMAC

International Nuclear Information System (INIS)

Tippie, J.W.; Scandora, A.E.

1985-01-01

This paper describes and analyzes various distributed processor architectures using commercially available CAMAC components. The general orientation is toward distributed control systems using Digital Equipment Corporation LSI11 processors in a CAMAC environment. The paper describes in detail software tools available to simplify the development of applications software and to provide a high-level runtime environment both at the host and the remote processors. Discussion focuses on techniques for downloading of operating systems from a large host and applications tasks written in high-level languages. It also discusses software tools which enable tasks in the remote processors to exchange messages and data with tasks in the host in a simple and elegant way
A Decision-Making Method with Grey Multi-Source Heterogeneous Data and Its Application in Green Supplier Selection

Science.gov (United States)

Dang, Yaoguo; Mao, Wenxin

2018-01-01

In view of the multi-attribute decision-making problem that the attribute values are grey multi-source heterogeneous data, a decision-making method based on kernel and greyness degree is proposed. The definitions of kernel and greyness degree of an extended grey number in a grey multi-source heterogeneous data sequence are given. On this basis, we construct the kernel vector and greyness degree vector of the sequence to whiten the multi-source heterogeneous information, then a grey relational bi-directional projection ranking method is presented. Considering the multi-attribute multi-level decision structure and the causalities between attributes in decision-making problem, the HG-DEMATEL method is proposed to determine the hierarchical attribute weights. A green supplier selection example is provided to demonstrate the rationality and validity of the proposed method. PMID:29510521
A Decision-Making Method with Grey Multi-Source Heterogeneous Data and Its Application in Green Supplier Selection.

Science.gov (United States)

Sun, Huifang; Dang, Yaoguo; Mao, Wenxin

2018-03-03

In view of the multi-attribute decision-making problem that the attribute values are grey multi-source heterogeneous data, a decision-making method based on kernel and greyness degree is proposed. The definitions of kernel and greyness degree of an extended grey number in a grey multi-source heterogeneous data sequence are given. On this basis, we construct the kernel vector and greyness degree vector of the sequence to whiten the multi-source heterogeneous information, then a grey relational bi-directional projection ranking method is presented. Considering the multi-attribute multi-level decision structure and the causalities between attributes in decision-making problem, the HG-DEMATEL method is proposed to determine the hierarchical attribute weights. A green supplier selection example is provided to demonstrate the rationality and validity of the proposed method.
Fast decision algorithms in low-power embedded processors for quality-of-service based connectivity of mobile sensors in heterogeneous wireless sensor networks.

Science.gov (United States)

Jaraíz-Simón, María D; Gómez-Pulido, Juan A; Vega-Rodríguez, Miguel A; Sánchez-Pérez, Juan M

2012-01-01

When a mobile wireless sensor is moving along heterogeneous wireless sensor networks, it can be under the coverage of more than one network many times. In these situations, the Vertical Handoff process can happen, where the mobile sensor decides to change its connection from a network to the best network among the available ones according to their quality of service characteristics. A fitness function is used for the handoff decision, being desirable to minimize it. This is an optimization problem which consists of the adjustment of a set of weights for the quality of service. Solving this problem efficiently is relevant to heterogeneous wireless sensor networks in many advanced applications. Numerous works can be found in the literature dealing with the vertical handoff decision, although they all suffer from the same shortfall: a non-comparable efficiency. Therefore, the aim of this work is twofold: first, to develop a fast decision algorithm that explores the entire space of possible combinations of weights, searching that one that minimizes the fitness function; and second, to design and implement a system on chip architecture based on reconfigurable hardware and embedded processors to achieve several goals necessary for competitive mobile terminals: good performance, low power consumption, low economic cost, and small area integration.
Design and implementation of a high performance network security processor

Science.gov (United States)

Wang, Haixin; Bai, Guoqiang; Chen, Hongyi

2010-03-01

The last few years have seen many significant progresses in the field of application-specific processors. One example is network security processors (NSPs) that perform various cryptographic operations specified by network security protocols and help to offload the computation intensive burdens from network processors (NPs). This article presents a high performance NSP system architecture implementation intended for both internet protocol security (IPSec) and secure socket layer (SSL) protocol acceleration, which are widely employed in virtual private network (VPN) and e-commerce applications. The efficient dual one-way pipelined data transfer skeleton and optimised integration scheme of the heterogenous parallel crypto engine arrays lead to a Gbps rate NSP, which is programmable with domain specific descriptor-based instructions. The descriptor-based control flow fragments large data packets and distributes them to the crypto engine arrays, which fully utilises the parallel computation resources and improves the overall system data throughput. A prototyping platform for this NSP design is implemented with a Xilinx XC3S5000 based FPGA chip set. Results show that the design gives a peak throughput for the IPSec ESP tunnel mode of 2.85 Gbps with over 2100 full SSL handshakes per second at a clock rate of 95 MHz.

Analysis and optimisation of heterogeneous real-time embedded systems

DEFF Research Database (Denmark)

Pop, Paul; Eles, Petru; Peng, Zebo

2005-01-01

. The success of such new design methods depends on the availability of analysis and optimisation techniques. Analysis and optimisation techniques for heterogeneous real-time embedded systems are presented in the paper. The authors address in more detail a particular class of such systems called multi...... of application messages to frames. Optimisation heuristics for frame packing aimed at producing a schedulable system are presented. Extensive experiments and a real-life example show the efficiency of the frame-packing approach....
Analysis and optimisation of heterogeneous real-time embedded systems

DEFF Research Database (Denmark)

Pop, Paul; Eles, Petru; Peng, Zebo

2006-01-01

. The success of such new design methods depends on the availability of analysis and optimisation techniques. Analysis and optimisation techniques for heterogeneous real-time embedded systems are presented in the paper. The authors address in more detail a particular class of such systems called multi...... of application messages to frames. Optimisation heuristics for frame packing aimed at producing a schedulable system are presented. Extensive experiments and a real-life example show the efficiency of the frame-packing approach....
Analysis and Optimization of Heterogeneous Real-Time Embedded Systems

DEFF Research Database (Denmark)

Pop, Paul; Eles, Petru; Peng, Zebo

2005-01-01

. The success of such new design methods depends on the availability of analysis and optimization techniques. In this paper, we present analysis and optimization techniques for heterogeneous real-time embedded systems. We address in more detail a particular class of such systems called multi-clusters, composed...... to frames. Optimization heuristics for frame packing aiming at producing a schedulable system are presented. Extensive experiments and a real-life example show the efficiency of the frame-packing approach....
Thermal Dissipation Efficiency in a Micro-Processor Using Carbon Nanotubes Based Composite

Science.gov (United States)

Thang, Bui Hung; Van Quang, Cao; Nghia, Van Trong; Hong, Phan Ngoc; Van Chuc, Nguyen; Tam, Ngo Thi Thanh; Quang, Le Dinh; Khang, Dao Duc; Khoi, Phan Hong; Minh, Phan Ngoc

2009-09-01

Modern electronic and optoelectronic devices such as μ-processor, light emitting diode, semiconductor laser issued a challenge in the thermal dissipation problem. Finding an effective way for thermal dissipation therefore becomes a very important issue. It is known that carbon nanotubes (CNTs) is one of the most valuable materials with high thermal conductivity (2000 W/m.K compared to thermal conductivity of Ag 419 W/m.K). This suggested an approach in applying the CNTs as an essential component for thermal dissipation media to improve the performance of computer processor and other high power electronic devices. In this work multi walled carbon nanotubes (MWCNTs) based composites were utilized as the thermal dissipation media in a micro processor of a personal computer. The MWCNTs of different concentrations were added into polyaniline, commercial silicon thermal paste and commercial silver thermal paste by mechanical methods. A personal computer with configuration: Intel Pentium IV 3.066 GHz, 512 MB of RAM and Windows XP Service Pack 2 Operating System was employed. The thermal dissipation efficiency of the system was evaluated by directly measure the temperature of the μ-processor during the operation of the computer in different CPU speeds. The measured results showed that the CNTs based composite could reduce the temperature of the u-processor more than 5° C, and the time for increasing the temperature of the μ-processor was three times longer than that when using commercial thermal paste.
Distributed Cooperative Control of Nonlinear and Non-identical Multi-agent Systems

DEFF Research Database (Denmark)

Bidram, Ali; Lewis, Frank; Davoudi, Ali

2013-01-01

This paper exploits input-output feedback linearization technique to implement distributed cooperative control of multi-agent systems with nonlinear and non-identical dynamics. Feedback linearization transforms the synchronization problem for a nonlinear and heterogeneous multi-agent system...... for electric power microgrids. The effectiveness of the proposed control is verified by simulating a microgrid test system....
Multithreading in vector processors

Science.gov (United States)

Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi

2018-01-16

In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
Vivaldi: A Domain-Specific Language for Volume Processing and Visualization on Distributed Heterogeneous Systems.

Science.gov (United States)

Choi, Hyungsuk; Choi, Woohyuk; Quan, Tran Minh; Hildebrand, David G C; Pfister, Hanspeter; Jeong, Won-Ki

2014-12-01

As the size of image data from microscopes and telescopes increases, the need for high-throughput processing and visualization of large volumetric data has become more pressing. At the same time, many-core processors and GPU accelerators are commonplace, making high-performance distributed heterogeneous computing systems affordable. However, effectively utilizing GPU clusters is difficult for novice programmers, and even experienced programmers often fail to fully leverage the computing power of new parallel architectures due to their steep learning curve and programming complexity. In this paper, we propose Vivaldi, a new domain-specific language for volume processing and visualization on distributed heterogeneous computing systems. Vivaldi's Python-like grammar and parallel processing abstractions provide flexible programming tools for non-experts to easily write high-performance parallel computing code. Vivaldi provides commonly used functions and numerical operators for customized visualization and high-throughput image processing applications. We demonstrate the performance and usability of Vivaldi on several examples ranging from volume rendering to image segmentation.
Dataflow formalisation of real-time streaming applications on a composable and predictable multi-processor SOC

NARCIS (Netherlands)

Nelson, A.T.; Goossens, K.G.W.; Akesson, K.B.

2015-01-01

Embedded systems often contain multiple applications, some of which have real-time requirements and whose performance must be guaranteed. To efficiently execute applications, modern embedded systems contain Globally Asynchronous Locally Synchronous (GALS) processors, network on chip, DRAM and SRAM
Probing water motion in heterogenous systems : a multi-parameter NMR approach

NARCIS (Netherlands)

Dusschoten, van D.

1996-01-01

In this Thesis a practical approach is presented to study water mobility in heterogeneous systems by a number of novel NMR sequences. The major part of this Thesis describes how the reliability of diffusion measurements can be improved using some of the novel NMR sequences. The
System-on-Chip Environment: A SpecC-Based Framework for Heterogeneous MPSoC Design

Directory of Open Access Journals (Sweden)

Daniel D. Gajski

2008-07-01

Full Text Available The constantly growing complexity of embedded systems is a challenge that drives the development of novel design automation techniques. C-based system-level design addresses the complexity challenge by raising the level of abstraction and integrating the design processes for the heterogeneous system components. In this article, we present a comprehensive design framework, the system-on-chip environment (SCE which is based on the influential SpecC language and methodology. SCE implements a top-down system design flow based on a specify-explore-refine paradigm with support for heterogeneous target platforms consisting of custom hardware components, embedded software processors, dedicated IP blocks, and complex communication bus architectures. Starting from an abstract specification of the desired system, models at various levels of abstraction are automatically generated through successive step-wise refinement, resulting in a pin-and cycle-accurate system implementation. The seamless integration of automatic model generation, estimation, and verification tools enables rapid design space exploration and efficient MPSoC implementation. Using a large set of industrial-strength examples with a wide range of target architectures, our experimental results demonstrate the effectiveness of our framework and show significant productivity gains in design time.
A Fault Detection Mechanism in a Data-flow Scheduled Multithreaded Processor

NARCIS (Netherlands)

Fu, J.; Yang, Q.; Poss, R.; Jesshope, C.R.; Zhang, C.

2014-01-01

This paper designs and implements the Redundant Multi-Threading (RMT) in a Data-flow scheduled MultiThreaded (DMT) multicore processor, called Data-flow scheduled Redundant Multi-Threading (DRMT). Meanwhile, It presents Asynchronous Output Comparison (AOC) for RMT techniques to avoid fault detection
A Performance-Prediction Model for PIC Applications on Clusters of Symmetric MultiProcessors: Validation with Hierarchical HPF+OpenMP Implementation

Directory of Open Access Journals (Sweden)

Sergio Briguglio

2003-01-01

Full Text Available A performance-prediction model is presented, which describes different hierarchical workload decomposition strategies for particle in cell (PIC codes on Clusters of Symmetric MultiProcessors. The devised workload decomposition is hierarchically structured: a higher-level decomposition among the computational nodes, and a lower-level one among the processors of each computational node. Several decomposition strategies are evaluated by means of the prediction model, with respect to the memory occupancy, the parallelization efficiency and the required programming effort. Such strategies have been implemented by integrating the high-level languages High Performance Fortran (at the inter-node stage and OpenMP (at the intra-node one. The details of these implementations are presented, and the experimental values of parallelization efficiency are compared with the predicted results.
Sensitometric control of roentgen film processors

International Nuclear Information System (INIS)

Forsberg, H.; Karolinska Sjukhuset, Stockholm

1987-01-01

Monitoring of film processors performance is essential since image quality, patient dose and costs are influenced by the performance. A system for sensitometric constancy control of film processors and their associated components is described. Experience with the system for 3 years is given when implemented on 17 film processors. Modern high quality film processors have a stability that makes a test frequency of once a week sufficient to maintain adequate image quality. The test system is so sensitive that corrective actions almost invariably have been taken before any technical problem degraded the image quality to a visible degree. (orig.)
Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

Science.gov (United States)

2015-06-01

CONFABULATION BASED REAL-TIME ANOMALY DETECTION FOR WIDE-AREA SURVEILLANCE USING HETEROGENEOUS HIGH PERFORMANCE COMPUTING ARCHITECTURE SYRACUSE...DETECTION FOR WIDE-AREA SURVEILLANCE USING HETEROGENEOUS HIGH PERFORMANCE COMPUTING ARCHITECTURE 5a. CONTRACT NUMBER FA8750-12-1-0251 5b. GRANT...processors including graphic processor units (GPUs) and Intel Xeon Phi processors. Experimental results showed significant speedups, which can enable
Development of an Advanced Digital Reactor Protection System Using Diverse Dual Processors to Prevent Common-Mode Failure

International Nuclear Information System (INIS)

Shin, Hyun Kook; Nam, Sang Ku; Sohn, Se Do; Chang, Hoon Seon

2003-01-01

The advanced digital reactor protection system (ADRPS) with diverse dual processors has been developed to prevent common-mode failure (CMF). The principle of diversity is applied to both hardware design and software design. For hardware diversity, two different types of CPUs are used for the bistable processor and local coincidence logic (LCL) processor. The Versa Module Eurocard-based single board computers are used for the CPU hardware platforms. The QNX operating system and the VxWorks operating system were selected for software diversity. Functional diversity is also applied to the input and output modules, and to the algorithm in the bistable processors and LCL processors. The characteristics of the newly developed digital protection system are described together with the preventive capability against CMF. Also, system reliability analysis is discussed. The evaluation results show that the ADRPS has a good preventive capability against the CMF and is a highly reliable reactor protection system
Optimization and parallelization of B-spline based orbital evaluations in QMC on multi/many-core shared memory processors

OpenAIRE

Mathuriya, Amrita; Luo, Ye; Benali, Anouar; Shulenburger, Luke; Kim, Jeongnim

2016-01-01

B-spline based orbital representations are widely used in Quantum Monte Carlo (QMC) simulations of solids, historically taking as much as 50% of the total run time. Random accesses to a large four-dimensional array make it challenging to efficiently utilize caches and wide vector units of modern CPUs. We present node-level optimizations of B-spline evaluations on multi/many-core shared memory processors. To increase SIMD efficiency and bandwidth utilization, we first apply data layout transfo...
An optimal multi-channel memory controller for real-time systems

NARCIS (Netherlands)

Gomony, M.D.; Akesson, K.B.; Goossens, K.G.W.

2013-01-01

Optimal utilization of a multi-channel memory, such as Wide IO DRAM, as shared memory in multi-processor platforms depends on the mapping of memory clients to the memory channels, the granularity at which the memory requests are interleaved in each channel, and the bandwidth and memory capacity
Heterogeneous Face Attribute Estimation: A Deep Multi-Task Learning Approach.

Science.gov (United States)

Han, Hu; K Jain, Anil; Shan, Shiguang; Chen, Xilin

2017-08-10

Face attribute estimation has many potential applications in video surveillance, face retrieval, and social media. While a number of methods have been proposed for face attribute estimation, most of them did not explicitly consider the attribute correlation and heterogeneity (e.g., ordinal vs. nominal and holistic vs. local) during feature representation learning. In this paper, we present a Deep Multi-Task Learning (DMTL) approach to jointly estimate multiple heterogeneous attributes from a single face image. In DMTL, we tackle attribute correlation and heterogeneity with convolutional neural networks (CNNs) consisting of shared feature learning for all the attributes, and category-specific feature learning for heterogeneous attributes. We also introduce an unconstrained face database (LFW+), an extension of public-domain LFW, with heterogeneous demographic attributes (age, gender, and race) obtained via crowdsourcing. Experimental results on benchmarks with multiple face attributes (MORPH II, LFW+, CelebA, LFWA, and FotW) show that the proposed approach has superior performance compared to state of the art. Finally, evaluations on a public-domain face database (LAP) with a single attribute show that the proposed approach has excellent generalization ability.
3081/E processor

International Nuclear Information System (INIS)

Kunz, P.F.; Gravina, M.; Oxoby, G.

1984-04-01

The 3081/E project was formed to prepare a much improved IBM mainframe emulator for the future. Its design is based on a large amount of experience in using the 168/E processor to increase available CPU power in both online and offline environments. The processor will be at least equal to the execution speed of a 370/168 and up to 1.5 times faster for heavy floating point code. A single processor will thus be at least four times more powerful than the VAX 11/780, and five processors on a system would equal at least the performance of the IBM 3081K. With its large memory space and simple but flexible high speed interface, the 3081/E is well suited for the online and offline needs of high energy physics in the future
IMPERA: Integrated Mission Planning for Multi-Robot Systems

Directory of Open Access Journals (Sweden)

Daniel Saur

2015-10-01

Full Text Available This paper presents the results of the project IMPERA (Integrated Mission Planning for Distributed Robot Systems. The goal of IMPERA was to realize an extraterrestrial exploration scenario using a heterogeneous multi-robot system. The main challenge was the development of a multi-robot planning and plan execution architecture. The robot team consists of three heterogeneous robots, which have to explore an unknown environment and collect lunar drill samples. The team activities are described using the language ALICA (A Language for Interactive Agents. Furthermore, we use the mission planning system pRoPhEt MAS (Reactive Planning Engine for Multi-Agent Systems to provide an intuitive interface to generate team activities. Therefore, we define the basic skills of our team with ALICA and define the desired goal states by using a logic description. Based on the skills, pRoPhEt MAS creates a valid ALICA plan, which will be executed by the team. The paper describes the basic components for communication, coordinated exploration, perception and object transportation. Finally, we evaluate the planning engine pRoPhEt MAS in the IMPERA scenario. In addition, we present further evaluation of pRoPhEt MAS in more dynamic environments.

GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

Science.gov (United States)

Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

2017-08-01

The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy
A Novel Approach to Develop the Lower Order Model of Multi-Input Multi-Output System

Science.gov (United States)

Rajalakshmy, P.; Dharmalingam, S.; Jayakumar, J.

2017-10-01

A mathematical model is a virtual entity that uses mathematical language to describe the behavior of a system. Mathematical models are used particularly in the natural sciences and engineering disciplines like physics, biology, and electrical engineering as well as in the social sciences like economics, sociology and political science. Physicists, Engineers, Computer scientists, and Economists use mathematical models most extensively. With the advent of high performance processors and advanced mathematical computations, it is possible to develop high performing simulators for complicated Multi Input Multi Ouptut (MIMO) systems like Quadruple tank systems, Aircrafts, Boilers etc. This paper presents the development of the mathematical model of a 500 MW utility boiler which is a highly complex system. A synergistic combination of operational experience, system identification and lower order modeling philosophy has been effectively used to develop a simplified but accurate model of a circulation system of a utility boiler which is a MIMO system. The results obtained are found to be in good agreement with the physics of the process and with the results obtained through design procedure. The model obtained can be directly used for control system studies and to realize hardware simulators for boiler testing and operator training.
APPLICATION OF THE MODELS-3 COMMUNITY MULTI-SCALE AIR QUALITY (CMAQ) MODEL SYSTEM TO SOS/NASHVILLE 1999

Science.gov (United States)

The Models-3 Community Multi-scale Air Quality (CMAQ) model, first released by the USEPA in 1999 (Byun and Ching. 1999), continues to be developed and evaluated. The principal components of the CMAQ system include a comprehensive emission processor known as the Sparse Matrix O...
Multi-Kepler GPU vs. multi-Intel MIC for spin systems simulations

Science.gov (United States)

Bernaschi, M.; Bisson, M.; Salvadore, F.

2014-10-01

We present and compare the performances of two many-core architectures: the Nvidia Kepler and the Intel MIC both in a single system and in cluster configuration for the simulation of spin systems. As a benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the Over-relaxation algorithm. We present data also for a traditional high-end multi-core architecture: the Intel Sandy Bridge. The results show that although on the two Intel architectures it is possible to use basically the same code, the performances of a Intel MIC change dramatically depending on (apparently) minor details. Another issue is that to obtain a reasonable scalability with the Intel Phi coprocessor (Phi is the coprocessor that implements the MIC architecture) in a cluster configuration it is necessary to use the so-called offload mode which reduces the performances of the single system. As to the GPU, the Kepler architecture offers a clear advantage with respect to the previous Fermi architecture maintaining exactly the same source code. Scalability of the multi-GPU implementation remains very good by using the CPU as a communication co-processor of the GPU. All source codes are provided for inspection and for double-checking the results.
A Modular Pipelined Processor for High Resolution Gamma-Ray Spectroscopy

Science.gov (United States)

Veiga, Alejandro; Grunfeld, Christian

2016-02-01

The design of a digital signal processor for gamma-ray applications is presented in which a single ADC input can simultaneously provide temporal and energy characterization of gamma radiation for a wide range of applications. Applying pipelining techniques, the processor is able to manage and synchronize very large volumes of streamed real-time data. Its modular user interface provides a flexible environment for experimental design. The processor can fit in a medium-sized FPGA device operating at ADC sampling frequency, providing an efficient solution for multi-channel applications. Two experiments are presented in order to characterize its temporal and energy resolution.
Multilevel processor-sharing algorithm for M/G/1 systems with priorities

Energy Technology Data Exchange (ETDEWEB)

Yassouridis, A.; Koller, R.

1983-01-01

The well-known multilevel processor-sharing algorithm for M/G/1 systems without priorities is extended to M/G/1 systems with priority classes. The average response time t/sub j/(x) and the average waiting time w/sub j/(x) for a j-class job, which requires a total service of x sec, are analytically calculated. Some figures demonstrate how the priority classes and the total number of different levels affect the behaviour of the functions t/sub j/(x) and w/sub j/(x). In addition, the foreground-background algorithm with priorities, which is not yet covered in the literature, is treated as a special case of the multilevel processor-sharing algorithm. 8 references.
Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications

Science.gov (United States)

Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei

2007-04-01

In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.
ECH system developments including the design of an intelligent fault processor on the DIII-D tokamak

International Nuclear Information System (INIS)

Ponce, D.; Lohr, J.; Tooker, J.F.; O'Neill, R.C.; Moeller, C.P.; Doane, J.L.; Noraky, S.; Dubovenko, K.; Gorelov, Y.A.; Cengher, M.; Penaflor, B.G.; Ellis, R.A.

2011-01-01

A new generation fault processor is in development which is intended to increase fault handling flexibility and reduce the number of incomplete DIII-D shots due to gyrotron faults. The processor, which is based upon a field programmable gate array device, will analyze signals for aberrant operation and ramp down high voltage to try to avoid hard faults. The processor will then attempt to ramp back up to an attainable operating point. The new generation fault processor will be developed during an expansion of the electron cyclotron heating (ECH) areas that will include the installation of a depressed collector gyrotron and associated equipment. Existing systems will also be upgraded. Testing of real-time control of the ECH launcher poloidal drives by the DIII-D plasma control system will be completed. The ECH control system software will be upgraded for increased scalability and to increase operator productivity. Resources permitting, all systems will receive an extra layer of interlocks for the filament and magnet power supplies, added shielding for the tank electronics, programmable filament boost shape for long pulses, and electronics upgrades for the installation of the advanced fault processor.
Evaluation of the Intel Westmere-EP server processor

CERN Document Server

Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

2010-01-01

In this paper we report on a set of benchmark results recently obtained by CERN openlab when comparing the 6-core “Westmere-EP” processor with Intel’s previous generation of the same microarchitecture, the “Nehalem-EP”. The former is produced in a new 32nm process, the latter in 45nm. Both platforms are dual-socket servers. Multiple benchmarks were used to get a good understanding of the performance of the new processor. We used both industry-standard benchmarks, such as SPEC2006, and specific High Energy Physics benchmarks, representing both simulation of physics detectors and data analysis of physics events. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following features: processor frequency, overclocking via Turbo mode, the number of physical cores in use, the use of logical cores via Simultaneous Multi-Threading (SMT), the cache sizes available, the memory configuration installed, as well...
Accelerated Synchrotron X-ray Diffraction Data Analysis on a Heterogeneous High Performance Computing System

Energy Technology Data Exchange (ETDEWEB)

Qin, J; Bauer, M A, E-mail: qin.jinhui@gmail.com, E-mail: bauer@uwo.ca [Computer Science Department, University of Western Ontario, London, ON N6A 5B7 (Canada)

2010-11-01

The analysis of synchrotron X-ray Diffraction (XRD) data has been used by scientists and engineers to understand and predict properties of materials. However, the large volume of XRD image data and the intensive computations involved in the data analysis makes it hard for researchers to quickly reach any conclusions about the images from an experiment when using conventional XRD data analysis software. Synchrotron time is valuable and delays in XRD data analysis can impact decisions about subsequent experiments or about materials that they are investigating. In order to improve the data analysis performance, ideally to achieve near real time data analysis during an XRD experiment, we designed and implemented software for accelerated XRD data analysis. The software has been developed for a heterogeneous high performance computing (HPC) system, comprised of IBM PowerXCell 8i processors and Intel quad-core Xeon processors. This paper describes the software and reports on the improved performance. The results indicate that it is possible for XRD data to be analyzed at the rate it is being produced.
Accelerated Synchrotron X-ray Diffraction Data Analysis on a Heterogeneous High Performance Computing System

International Nuclear Information System (INIS)

Qin, J; Bauer, M A

2010-01-01

The analysis of synchrotron X-ray Diffraction (XRD) data has been used by scientists and engineers to understand and predict properties of materials. However, the large volume of XRD image data and the intensive computations involved in the data analysis makes it hard for researchers to quickly reach any conclusions about the images from an experiment when using conventional XRD data analysis software. Synchrotron time is valuable and delays in XRD data analysis can impact decisions about subsequent experiments or about materials that they are investigating. In order to improve the data analysis performance, ideally to achieve near real time data analysis during an XRD experiment, we designed and implemented software for accelerated XRD data analysis. The software has been developed for a heterogeneous high performance computing (HPC) system, comprised of IBM PowerXCell 8i processors and Intel quad-core Xeon processors. This paper describes the software and reports on the improved performance. The results indicate that it is possible for XRD data to be analyzed at the rate it is being produced.
Accuracy requirements of optical linear algebra processors in adaptive optics imaging systems

Science.gov (United States)

Downie, John D.

1990-01-01

A ground-based adaptive optics imaging telescope system attempts to improve image quality by detecting and correcting for atmospherically induced wavefront aberrations. The required control computations during each cycle will take a finite amount of time. Longer time delays result in larger values of residual wavefront error variance since the atmosphere continues to change during that time. Thus an optical processor may be well-suited for this task. This paper presents a study of the accuracy requirements in a general optical processor that will make it competitive with, or superior to, a conventional digital computer for the adaptive optics application. An optimization of the adaptive optics correction algorithm with respect to an optical processor's degree of accuracy is also briefly discussed.
Stationary average consensus protocol for a class of heterogeneous high-order multi-agent systems with application for aircraft

Science.gov (United States)

Rezaei, Mohammad Hadi; Menhaj, Mohammad Bagher

2018-01-01

This paper investigates the stationary average consensus problem for a class of heterogeneous-order multi-agent systems. The goal is to bring the positions of agents to the average of their initial positions while letting the other states converge to zero. To this end, three different consensus protocols are proposed. First, based on the auxiliary variables information among the agents under switching directed networks and state-feedback control, a protocol is proposed whereby all the agents achieve stationary average consensus. In the second and third protocols, by resorting to only measurements of relative positions of neighbouring agents under fixed balanced directed networks, two control frameworks are presented with two strategies based on state-feedback and output-feedback control. Finally, simulation results are given to illustrate the effectiveness of the proposed protocols.
A High-Speed and Low-Energy-Consumption Processor for SVD-MIMO-OFDM Systems

Directory of Open Access Journals (Sweden)

Hiroki Iwaizumi

2013-01-01

Full Text Available A processor design for singular value decomposition (SVD and compression/decompression of feedback matrices, which are mandatory operations for SVD multiple-input multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM systems, is proposed and evaluated. SVD-MIMO is a transmission method for suppressing multistream interference and improving communication quality by beamforming. An application specific instruction-set processor (ASIP architecture is adopted to achieve flexibility in terms of operations and matrix size. The proposed processor realizes a high-speed/low-power design and real-time processing by the parallelization of floating-point units (FPUs and arithmetic instructions specialized in complex matrix operations.
ArcFVDSL, a DSEL Combined to HARTS, a Runtime System Layer to Implement Efficient Numerical Methods to Solve Diffusive Problems on New Heterogeneous Hardware Architecture

Directory of Open Access Journals (Sweden)

Gratien Jean-Marc

2017-03-01

Full Text Available Nowadays, some frameworks like Arcane and Dune offer a number of advanced tools to deal with the complexity related to parallelism, meshes and linear solvers. However, they do not handle the high level complexity related to discretization methods and physical models. Generative programming and Domain Specific Languages (DSL are key technologies allowing to write code with a high level expressive language and take advantage of the efficiency of generated code with low level services. DSL may be embedded in host languages like Python or C++. Such languages, named in that case Domain Specific Embedded Languages (DSEL, are applied for instance in frameworks like Fenics or Feel++ which are dedicated to the domain of Finite Element (FE methods and Galerkin methods. ArcFVDSL is a DSEL developed on top of the Arcane framework, aiming to implement various lowest order methods (Finite-Volume (FV, Mimetic Finite Difference (MFD, Mixed Hybrid Finite Volume (MHFV, etc. for diffusive problems on general meshes. In this paper, we present various implementations of different complex academic problems. We focus on the capability of the language to allow the description and the resolution of these problems with several lowest-order methods. We illustrate the benefits of such technology combined to runtime system tools like Heterogeneous Abstract RunTime System (HARTS and its ability to handle seamlessly new heterogeneous architectures with multi-core processors enhanced by General Purpose computing on Graphics Processing Units (GP-GPU. We present the performance results of each implementation on different kinds of heterogeneous hardware architecture.
Critical Path Driven Cosynthesis for Heterogeneous Target Architectures

DEFF Research Database (Denmark)

Bjørn-Jørgensen, Peter; Madsen, Jan

1997-01-01

This paper presents a critical path driven algorithm to produce a static schedule of a single-rate system onto a heterogeneous target architecture. Our algorithm is a list based scheduling algorithm which concurrently assigns tasks to processors and allocates nets to interprocessor communication........ Experimental results show that our algorithm is able to find good results, as compared to other methods, in small amount of CPU time....
Deterministic chaos in the processor load

International Nuclear Information System (INIS)

Halbiniak, Zbigniew; Jozwiak, Ireneusz J.

2007-01-01

In this article we present the results of research whose purpose was to identify the phenomenon of deterministic chaos in the processor load. We analysed the time series of the processor load during efficiency tests of database software. Our research was done on a Sparc Alpha processor working on the UNIX Sun Solaris 5.7 operating system. The conducted analyses proved the presence of the deterministic chaos phenomenon in the processor load in this particular case
Keystone Business Models for Network Security Processors

Directory of Open Access Journals (Sweden)

Arthur Low

2013-07-01

Full Text Available Network security processors are critical components of high-performance systems built for cybersecurity. Development of a network security processor requires multi-domain experience in semiconductors and complex software security applications, and multiple iterations of both software and hardware implementations. Limited by the business models in use today, such an arduous task can be undertaken only by large incumbent companies and government organizations. Neither the “fabless semiconductor” models nor the silicon intellectual-property licensing (“IP-licensing” models allow small technology companies to successfully compete. This article describes an alternative approach that produces an ongoing stream of novel network security processors for niche markets through continuous innovation by both large and small companies. This approach, referred to here as the "business ecosystem model for network security processors", includes a flexible and reconfigurable technology platform, a “keystone” business model for the company that maintains the platform architecture, and an extended ecosystem of companies that both contribute and share in the value created by innovation. New opportunities for business model innovation by participating companies are made possible by the ecosystem model. This ecosystem model builds on: i the lessons learned from the experience of the first author as a senior integrated circuit architect for providers of public-key cryptography solutions and as the owner of a semiconductor startup, and ii the latest scholarly research on technology entrepreneurship, business models, platforms, and business ecosystems. This article will be of interest to all technology entrepreneurs, but it will be of particular interest to owners of small companies that provide security solutions and to specialized security professionals seeking to launch their own companies.
Development of a highly reliable CRT processor

International Nuclear Information System (INIS)

Shimizu, Tomoya; Saiki, Akira; Hirai, Kenji; Jota, Masayoshi; Fujii, Mikiya

1996-01-01

Although CRT processors have been employed by the main control board to reduce the operator's workload during monitoring, the control systems are still operated by hardware switches. For further advancement, direct controller operation through a display device is expected. A CRT processor providing direct controller operation must be as reliable as the hardware switches are. The authors are developing a new type of highly reliable CRT processor that enables direct controller operations. In this paper, we discuss the design principles behind a highly reliable CRT processor. The principles are defined by studies of software reliability and of the functional reliability of the monitoring and operation systems. The functional configuration of an advanced CRT processor is also addressed. (author)
A natural-gas fuel processor for a residential fuel cell system

Science.gov (United States)

Adachi, H.; Ahmed, S.; Lee, S. H. D.; Papadias, D.; Ahluwalia, R. K.; Bendert, J. C.; Kanner, S. A.; Yamazaki, Y.

A system model was used to develop an autothermal reforming fuel processor to meet the targets of 80% efficiency (higher heating value) and start-up energy consumption of less than 500 kJ when operated as part of a 1-kWe natural-gas fueled fuel cell system for cogeneration of heat and power. The key catalytic reactors of the fuel processor - namely the autothermal reformer, a two-stage water gas shift reactor and a preferential oxidation reactor - were configured and tested in a breadboard apparatus. Experimental results demonstrated a reformate containing ∼48% hydrogen (on a dry basis and with pure methane as fuel) and less than 5 ppm CO. The effects of steam-to-carbon and part load operations were explored.

How to program 122,400 heterogeneous cores and retain your sanity

International Nuclear Information System (INIS)

Patkin, Scott

2010-01-01

Current technology trends favor hybrid architectures, typically with each node in a cluster containing both general-purpose and specialized 'accelerator' processors. The typical model for programming such systems is host-centric: The general-purpose processor orchestrates the computation, offloading performance-critical work to the accelerator, and data is communicated only among general-purpose processors. In this talk we propose a radically different hybrid-programming approach, which we call the 'reverse-acceleration model'. In this model the accelerators orchestrate the computation, offloading unacceleratable work to the general-purpose processors. Data is communicated among accelerators, not among general-purpose processors. We present the Cell Messaging Layer (CML), an implementation of the reverse-acceleration model for Los Alamos National Laboratory's Roadrunner supercomputer, a complex conglomerate of 122,400 processor cores of various types, multiple memory domains, and multiple network types, all with radically different performance characteristics but which together make Roadrunner the world's second-fastest supercomputer. CML demonstrates a new messaging-layer implementation technique called 'receiver-initiated message passing', which reduces communication latency by up to a third. Our thesis is that the reverse-acceleration model simplifies porting codes to heterogeneous systems and facilitates performance optimization. We present a case study of a legacy neutron-transport code that we modified to use reverse acceleration. Performance results from running this code across the full Roadrunner system indicate a substantial performance improvement over the unaccelerated version of the code.
Multi-sensor explosive detection system

International Nuclear Information System (INIS)

Gozani, T.; Shea, P.M.; Sawa, Z.P.

1992-01-01

This patent describes an explosive detection system. It comprises a source of neutrons; a detector array comprising a plurality of gamma ray detectors, each of the gamma ray detectors providing a detection signal in the event a gamma ray is captured by the detector, and at least one neutron detector, the neutron detector providing a neutron detection signal in the event a neutron is captured by the neutron detector; means for irradiating an object being examined with neutrons from the neutron source and for positioning the detector array relative to the object so that gamma rays emitted from the elements within the object as a result of the neutron irradiation are detected by the gamma ray detectors of the detector array; and parallel distributed processing means responsive to the detection signals of the detector array for discriminating between objects carrying explosives and objects not carrying explosives, the parallel distributed processing means including an artificial neural system (ANS), the ANS having a parallel network of processors, each processor of the parallel network of processors, each processor of the parallel network of processors including means for receiving at least one input signal, and means for generating an output signal as a function of the at least one input signal
Environmental data processor of the adaptive intrusion data system

International Nuclear Information System (INIS)

Rogers, M.S.

1977-06-01

A data acquisition system oriented specifically toward collection and processing of various meteorological and environmental parameters has been designed around a National Semiconductor IMP-16 microprocessor, This system, called the Environmental Data Processor (EDP), was developed specifically for use with the Adaptive Intrusion Data System (AIDS) in a perimeter intrusion alarm evaluation, although its design is sufficiently general to permit use elsewhere. This report describes in general detail the design of the EDP and its interaction with other AIDS components
Collective Machine Learning: Team Learning and Classification in Multi-Agent Systems

Science.gov (United States)

Gifford, Christopher M.

2009-01-01

This dissertation focuses on the collaboration of multiple heterogeneous, intelligent agents (hardware or software) which collaborate to learn a task and are capable of sharing knowledge. The concept of collaborative learning in multi-agent and multi-robot systems is largely under studied, and represents an area where further research is needed to…
Integrated fuel processor development

International Nuclear Information System (INIS)

Ahmed, S.; Pereira, C.; Lee, S. H. D.; Krumpelt, M.

2001-01-01

The Department of Energy's Office of Advanced Automotive Technologies has been supporting the development of fuel-flexible fuel processors at Argonne National Laboratory. These fuel processors will enable fuel cell vehicles to operate on fuels available through the existing infrastructure. The constraints of on-board space and weight require that these fuel processors be designed to be compact and lightweight, while meeting the performance targets for efficiency and gas quality needed for the fuel cell. This paper discusses the performance of a prototype fuel processor that has been designed and fabricated to operate with liquid fuels, such as gasoline, ethanol, methanol, etc. Rated for a capacity of 10 kWe (one-fifth of that needed for a car), the prototype fuel processor integrates the unit operations (vaporization, heat exchange, etc.) and processes (reforming, water-gas shift, preferential oxidation reactions, etc.) necessary to produce the hydrogen-rich gas (reformate) that will fuel the polymer electrolyte fuel cell stacks. The fuel processor work is being complemented by analytical and fundamental research. With the ultimate objective of meeting on-board fuel processor goals, these studies include: modeling fuel cell systems to identify design and operating features; evaluating alternative fuel processing options; and developing appropriate catalysts and materials. Issues and outstanding challenges that need to be overcome in order to develop practical, on-board devices are discussed
GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS on Intel Xeon Phi processors

Directory of Open Access Journals (Sweden)

H. Wang

2017-08-01

Full Text Available The Global Nested Air Quality Prediction Modeling System (GNAQPMS is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS, which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL. Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC, KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1 updating the pure Message Passing Interface (MPI parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2 fully employing the 512 bit wide vector processing units (VPUs on the KNL platform; (3 reducing unnecessary memory access to improve cache efficiency; (4 reducing the thread local storage (TLS in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5 changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined
The bit slice micro-processor 'GESPRO' as a project in the UA2 experiment

CERN Document Server

Becam, C; Delanghe, J; Fest, H M; Lecoq, J; Martin, H; Mencik, M; MerkeI, B; Meyer, J M; Perrin, M; Plothow, H; Rampazzo, J P; Schittly, A

1981-01-01

The bit slice micro-processor GESPRO is a CAMAC module plugged into a standard Elliot system crate via which it communicates as a slave with its host computer. It has full control of CAMAC as a master unit. GESPRO is a 24 bit machine with multi-mode memory addressing capacity of 64K words. The micro-processor structure uses 5 buses including pipe-line registers to mask access time and 16 interrupt levels. The micro-program memory capacity is 2K (RAM) words of 48 bits each. A special hardwired module allows floating point, as well as integer, multiplication of 24*24 bits, result in 48 bits, in about 200 ns. This micro-processor could be used in the UA2 data acquisition chain and trigger system for the following tasks: (a) online data reduction, i.e. to read DURANDAL, process the information resulting in accepting or rejecting the event; (b) readout and analysis of the accepted data; (c) preprocess the data. The UA2 version of GESPRO is under construction, programs and micro-programs are under development. Hard...
Scalable Task Assignment for Heterogeneous Multi-Robot Teams

Directory of Open Access Journals (Sweden)

Paula García

2013-02-01

Full Text Available This work deals with the development of a dynamic task assignment strategy for heterogeneous multi-robot teams in typical real world scenarios. The strategy must be efficiently scalable to support problems of increasing complexity with minimum designer intervention. To this end, we have selected a very simple auction-based strategy, which has been implemented and analysed in a multi-robot cleaning problem that requires strong coordination and dynamic complex subtask organization. We will show that the selection of a simple auction strategy provides a linear computational cost increase with the number of robots that make up the team and allows the solving of highly complex assignment problems in dynamic conditions by means of a hierarchical sub-auction policy. To coordinate and control the team, a layered behaviour-based architecture has been applied that allows the reusing of the auction-based strategy to achieve different coordination levels.
An Alternative Water Processor for Long Duration Space Missions

Science.gov (United States)

Barta, Daniel J.; Pickering, Karen D.; Meyer, Caitlin; Pennsinger, Stuart; Vega, Leticia; Flynn, Michael; Jackson, Andrew; Wheeler, Raymond

2014-01-01

A new wastewater recovery system has been developed that combines novel biological and physicochemical components for recycling wastewater on long duration human space missions. Functionally, this Alternative Water Processor (AWP) would replace the Urine Processing Assembly on the International Space Station and reduce or eliminate the need for the multi-filtration beds of the Water Processing Assembly (WPA). At its center are two unique game changing technologies: 1) a biological water processor (BWP) to mineralize organic forms of carbon and nitrogen and 2) an advanced membrane processor (Forward Osmosis Secondary Treatment) for removal of solids and inorganic ions. The AWP is designed for recycling larger quantities of wastewater from multiple sources expected during future exploration missions, including urine, hygiene (hand wash, shower, oral and shave) and laundry. The BWP utilizes a single-stage membrane-aerated biological reactor for simultaneous nitrification and denitrification. The Forward Osmosis Secondary Treatment (FOST) system uses a combination of forward osmosis (FO) and reverse osmosis (RO), is resistant to biofouling and can easily tolerate wastewaters high in non-volatile organics and solids associated with shower and/or hand washing. The BWP has been operated continuously for over 300 days. After startup, the mature biological system averaged 85% organic carbon removal and 44% nitrogen removal, close to stoichiometric maximum based on available carbon. To date, the FOST has averaged 93% water recovery, with a maximum of 98%. If the wastewater is slighty acidified, ammonia rejection is optimal. This paper will provide a description of the technology and summarize results from ground-based testing using real wastewater
A single chip pulse processor for nuclear spectroscopy

International Nuclear Information System (INIS)

Hilsenrath, F.; Bakke, J.C.; Voss, H.D.

1985-01-01

A high performance digital pulse processor, integrated into a single gate array microcircuit, has been developed for spaceflight applications. The new approach takes advantage of the latest CMOS high speed A/D flash converters and low-power gated logic arrays. The pulse processor measures pulse height, pulse area and the required timing information (e.g. multi detector coincidence and pulse pile-up detection). The pulse processor features high throughput rate (e.g. 0.5 Mhz for 2 usec gausssian pulses) and improved differential linearity (e.g. + or - 0.2 LSB for a + or - 1 LSB A/D). Because of the parallel digital architecture of the device, the interface is microprocessor bus compatible. A satellite flight application of this module is presented for use in the X-ray imager and high energy particle spectrometers of the PEM experiment on the Upper Atmospheric Research Satellite
Input data requirements for special processors in the computation system containing the VENTURE neutronics code

International Nuclear Information System (INIS)

Vondy, D.R.; Fowler, T.B.; Cunningham, G.W.

1976-11-01

This report presents user input data requirements for certain special processors in a nuclear reactor computation system. These processors generally read data in formatted form and generate binary interface data files. Some data processing is done to convert from the user-oriented form to the interface file forms. The VENTURE diffusion theory neutronics code and other computation modules in this system use the interface data files which are generated
Input data requirements for special processors in the computation system containing the VENTURE neutronics code

International Nuclear Information System (INIS)

Vondy, D.R.; Fowler, T.B.; Cunningham, G.W.

1979-07-01

User input data requirements are presented for certain special processors in a nuclear reactor computation system. These processors generally read data in formatted form and generate binary interface data files. Some data processing is done to convert from the user oriented form to the interface file forms. The VENTURE diffusion theory neutronics code and other computation modules in this system use the interface data files which are generated
A high speed multi-tasking, multi-processor telemetry system

Energy Technology Data Exchange (ETDEWEB)

Wu, Kung Chris [Univ. of Texas, El Paso, TX (United States)

1996-12-31

This paper describes a small size, light weight, multitasking, multiprocessor telemetry system capable of collecting 32 channels of differential signals at a sampling rate of 6.25 kHz per channel. The system is designed to collect data from remote wind turbine research sites and transfer the data via wireless communication. A description of operational theory, hardware components, and itemized cost is provided. Synchronization with other data acquisition systems and test data on data transmission rates is also given. 11 refs., 7 figs., 4 tabs.
Toward an understanding of the building blocks: constructing programs for high processor count systems

International Nuclear Information System (INIS)

Reilly, M H

2008-01-01

Technology and industry trends have clearly shown that the future of technical computing lies in exploitation of more processors in larger multiprocessor systems. Exploitation of high processor count architectures demands a more thorough understanding of the underlying system dynamics and an accounting for them in the design of high-performance applications. Currently these dynamics are incompletely described by the widely adopted benchmarks and kernel metrics. Systems are most often characterized to allow comparisons and ranking. Often the characterizations are in the form of a scalar measure of some aspect of system performance that is a 'not to exceed' number: the maximum possible level of performance that could be attained. While such comparisons typically drive both system design and procurement, more useful characterizations can be used to drive application development and design. This paper explores a few of these measures and presents a few simple examples of their application. The first set of metrics addresses individual processor performance, specifically performance related to memory references. The second set of metrics attempts to describe the behavior of the message-passing system under load and across a range of conditions
Development of a software for a multi-processor system aimed at the on-line control of nuclear physics experiments

International Nuclear Information System (INIS)

Poggioli, Jean Renaud

1984-01-01

This research thesis reports the development of a software for an acquisition computer aimed at the on-line control of nuclear physics experiments. An original architecture, based on the assignment of a processor to each fundamental task, enables the implementation of a high performance system. In order to make the user free of programming constraints, the author developed a software for dynamic generation of acquisition and processing codes. These codes are created from a data base which is programmed by the user by using a language close to the physical reality. Procedures of interactive control of the experiment are thus simplified by displaying function menus on the operator terminal. The author evokes possible hardware improvements and possible extensions of the system [fr
Multimode power processor

Science.gov (United States)

O'Sullivan, George A.; O'Sullivan, Joseph A.

1999-01-01

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
A uniform approach for programming distributed heterogeneous computing systems.

Science.gov (United States)

Grasso, Ivan; Pellegrini, Simone; Cosenza, Biagio; Fahringer, Thomas

2014-12-01

Large-scale compute clusters of heterogeneous nodes equipped with multi-core CPUs and GPUs are getting increasingly popular in the scientific community. However, such systems require a combination of different programming paradigms making application development very challenging. In this article we introduce libWater, a library-based extension of the OpenCL programming model that simplifies the development of heterogeneous distributed applications. libWater consists of a simple interface, which is a transparent abstraction of the underlying distributed architecture, offering advanced features such as inter-context and inter-node device synchronization. It provides a runtime system which tracks dependency information enforced by event synchronization to dynamically build a DAG of commands, on which we automatically apply two optimizations: collective communication pattern detection and device-host-device copy removal. We assess libWater's performance in three compute clusters available from the Vienna Scientific Cluster, the Barcelona Supercomputing Center and the University of Innsbruck, demonstrating improved performance and scaling with different test applications and configurations.
Using Multi-Core Systems for Rover Autonomy

Science.gov (United States)

Clement, Brad; Estlin, Tara; Bornstein, Benjamin; Springer, Paul; Anderson, Robert C.

2010-01-01

Task Objectives are: (1) Develop and demonstrate key capabilities for rover long-range science operations using multi-core computing, (a) Adapt three rover technologies to execute on SOA multi-core processor (b) Illustrate performance improvements achieved (c) Demonstrate adapted capabilities with rover hardware, (2) Targeting three high-level autonomy technologies (a) Two for onboard data analysis (b) One for onboard command sequencing/planning, (3) Technologies identified as enabling for future missions, (4)Benefits will be measured along several metrics: (a) Execution time / Power requirements (b) Number of data products processed per unit time (c) Solution quality
Embedded Processor Laboratory

Data.gov (United States)

Federal Laboratory Consortium — The Embedded Processor Laboratory provides the means to design, develop, fabricate, and test embedded computers for missile guidance electronics systems in support...
System Level Design of Reconfigurable Server Farms Using Elliptic Curve Cryptography Processor Engines

Directory of Open Access Journals (Sweden)

Sangook Moon

2014-01-01

Full Text Available As today’s hardware architecture becomes more and more complicated, it is getting harder to modify or improve the microarchitecture of a design in register transfer level (RTL. Consequently, traditional methods we have used to develop a design are not capable of coping with complex designs. In this paper, we suggest a way of designing complex digital logic circuits with a soft and advanced type of SystemVerilog at an electronic system level. We apply the concept of design-and-reuse with a high level of abstraction to implement elliptic curve crypto-processor server farms. With the concept of the superior level of abstraction to the RTL used with the traditional HDL design, we successfully achieved the soft implementation of the crypto-processor server farms as well as robust test bench code with trivial effort in the same simulation environment. Otherwise, it could have required error-prone Verilog simulations for the hardware IPs and other time-consuming jobs such as C/SystemC verification for the software, sacrificing more time and effort. In the design of the elliptic curve cryptography processor engine, we propose a 3X faster GF(2m serial multiplication architecture.

Accuracy Limitations in Optical Linear Algebra Processors

Science.gov (United States)

Batsell, Stephen Gordon

1990-01-01

One of the limiting factors in applying optical linear algebra processors (OLAPs) to real-world problems has been the poor achievable accuracy of these processors. Little previous research has been done on determining noise sources from a systems perspective which would include noise generated in the multiplication and addition operations, noise from spatial variations across arrays, and from crosstalk. In this dissertation, we propose a second-order statistical model for an OLAP which incorporates all these system noise sources. We now apply this knowledge to determining upper and lower bounds on the achievable accuracy. This is accomplished by first translating the standard definition of accuracy used in electronic digital processors to analog optical processors. We then employ our second-order statistical model. Having determined a general accuracy equation, we consider limiting cases such as for ideal and noisy components. From the ideal case, we find the fundamental limitations on improving analog processor accuracy. From the noisy case, we determine the practical limitations based on both device and system noise sources. These bounds allow system trade-offs to be made both in the choice of architecture and in individual components in such a way as to maximize the accuracy of the processor. Finally, by determining the fundamental limitations, we show the system engineer when the accuracy desired can be achieved from hardware or architecture improvements and when it must come from signal pre-processing and/or post-processing techniques.
Optimizing structure of complex technical system by heterogeneous vector criterion in interval form

Science.gov (United States)

Lysenko, A. V.; Kochegarov, I. I.; Yurkov, N. K.; Grishko, A. K.

2018-05-01

The article examines the methods of development and multi-criteria choice of the preferred structural variant of the complex technical system at the early stages of its life cycle in the absence of sufficient knowledge of parameters and variables for optimizing this structure. The suggested methods takes into consideration the various fuzzy input data connected with the heterogeneous quality criteria of the designed system and the parameters set by their variation range. The suggested approach is based on the complex use of methods of interval analysis, fuzzy sets theory, and the decision-making theory. As a result, the method for normalizing heterogeneous quality criteria has been developed on the basis of establishing preference relations in the interval form. The method of building preferential relations in the interval form on the basis of the vector of heterogeneous quality criteria suggest the use of membership functions instead of the coefficients considering the criteria value. The former show the degree of proximity of the realization of the designed system to the efficient or Pareto optimal variants. The study analyzes the example of choosing the optimal variant for the complex system using heterogeneous quality criteria.
Optical Associative Processors For Visual Perception"

Science.gov (United States)

Casasent, David; Telfer, Brian

1988-05-01

We consider various associative processor modifications required to allow these systems to be used for visual perception, scene analysis, and object recognition. For these applications, decisions on the class of the objects present in the input image are required and thus heteroassociative memories are necessary (rather than the autoassociative memories that have been given most attention). We analyze the performance of both associative processors and note that there is considerable difference between heteroassociative and autoassociative memories. We describe associative processors suitable for realizing functions such as: distortion invariance (using linear discriminant function memory synthesis techniques), noise and image processing performance (using autoassociative memories in cascade with with a heteroassociative processor and with a finite number of autoassociative memory iterations employed), shift invariance (achieved through the use of associative processors operating on feature space data), and the analysis of multiple objects in high noise (which is achieved using associative processing of the output from symbolic correlators). We detail and provide initial demonstrations of the use of associative processors operating on iconic, feature space and symbolic data, as well as adaptive associative processors.
Energy Efﬁcient Multi-Core Processing

Directory of Open Access Journals (Sweden)

Charles Leech

2014-06-01

Full Text Available This paper evaluates the present state of the art of energy-efficient embedded processor design techniques and demonstrates, how small, variable-architecture embedded processors may exploit a run-time minimal architectural synthesis technique to achieve greater energy and area efficiency whilst maintaining performance. The picoMIPS architecture is presented, inspired by the MIPS, as an example of a minimal and energy efficient processor. The picoMIPS is a variablearchitecture RISC microprocessor with an application-specific minimised instruction set. Each implementation will contain only the necessary datapath elements in order to maximise area efficiency. Due to the relationship between logic gate count and power consumption, energy efficiency is also maximised in the processor therefore the system is designed to perform a specific task in the most efficient processor-based form. The principles of the picoMIPS processor are illustrated with an example of the discrete cosine transform (DCT and inverse DCT (IDCT algorithms implemented in a multi-core context to demonstrate the concept of minimal architecture synthesis and how it can be used to produce an application specific, energy efficient processor.
A dedicated database system for handling multi-level data in systems biology

OpenAIRE

Pornputtapong, Natapol; Wanichthanarak, Kwanjeera; Nilsson, Avlant; Nookaew, Intawat; Nielsen, Jens

2014-01-01

Background Advances in high-throughput technologies have enabled extensive generation of multi-level omics data. These data are crucial for systems biology research, though they are complex, heterogeneous, highly dynamic, incomplete and distributed among public databases. This leads to difficulties in data accessibility and often results in errors when data are merged and integrated from varied resources. Therefore, integration and management of systems biological data remain very challenging...
Multi-region and single-cell sequencing reveal variable genomic heterogeneity in rectal cancer.

Science.gov (United States)

Liu, Mingshan; Liu, Yang; Di, Jiabo; Su, Zhe; Yang, Hong; Jiang, Beihai; Wang, Zaozao; Zhuang, Meng; Bai, Fan; Su, Xiangqian

2017-11-23

Colorectal cancer is a heterogeneous group of malignancies with complex molecular subtypes. While colon cancer has been widely investigated, studies on rectal cancer are very limited. Here, we performed multi-region whole-exome sequencing and single-cell whole-genome sequencing to examine the genomic intratumor heterogeneity (ITH) of rectal tumors. We sequenced nine tumor regions and 88 single cells from two rectal cancer patients with tumors of the same molecular classification and characterized their mutation profiles and somatic copy number alterations (SCNAs) at the multi-region and the single-cell levels. A variable extent of genomic heterogeneity was observed between the two patients, and the degree of ITH increased when analyzed on the single-cell level. We found that major SCNAs were early events in cancer development and inherited steadily. Single-cell sequencing revealed mutations and SCNAs which were hidden in bulk sequencing. In summary, we studied the ITH of rectal cancer at regional and single-cell resolution and demonstrated that variable heterogeneity existed in two patients. The mutational scenarios and SCNA profiles of two patients with treatment naïve from the same molecular subtype are quite different. Our results suggest each tumor possesses its own architecture, which may result in different diagnosis, prognosis, and drug responses. Remarkable ITH exists in the two patients we have studied, providing a preliminary impression of ITH in rectal cancer.
Parallelizing Compiler Framework and API for Power Reduction and Software Productivity of Real-Time Heterogeneous Multicores

Science.gov (United States)

Hayashi, Akihiro; Wada, Yasutaka; Watanabe, Takeshi; Sekiguchi, Takeshi; Mase, Masayoshi; Shirako, Jun; Kimura, Keiji; Kasahara, Hironori

Heterogeneous multicores have been attracting much attention to attain high performance keeping power consumption low in wide spread of areas. However, heterogeneous multicores force programmers very difficult programming. The long application program development period lowers product competitiveness. In order to overcome such a situation, this paper proposes a compilation framework which bridges a gap between programmers and heterogeneous multicores. In particular, this paper describes the compilation framework based on OSCAR compiler. It realizes coarse grain task parallel processing, data transfer using a DMA controller, power reduction control from user programs with DVFS and clock gating on various heterogeneous multicores from different vendors. This paper also evaluates processing performance and the power reduction by the proposed framework on a newly developed 15 core heterogeneous multicore chip named RP-X integrating 8 general purpose processor cores and 3 types of accelerator cores which was developed by Renesas Electronics, Hitachi, Tokyo Institute of Technology and Waseda University. The framework attains speedups up to 32x for an optical flow program with eight general purpose processor cores and four DRP(Dynamically Reconfigurable Processor) accelerator cores against sequential execution by a single processor core and 80% of power reduction for the real-time AAC encoding.
Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Directory of Open Access Journals (Sweden)

Ananya Muddukrishna

2015-01-01

Full Text Available Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. Distributing data across NUMA nodes and manycore processor caches is necessary to reduce the impact of nonuniform latencies. However, techniques for distributing data are error-prone and fragile and require low-level architectural knowledge. Existing task scheduling policies favor quick load-balancing at the expense of locality and ignore NUMA node/manycore cache access latencies while scheduling. Locality-aware scheduling, in conjunction with or as a replacement for existing scheduling, is necessary to minimize NUMA effects and sustain performance. We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. Our technique relieves the programmer from thinking of NUMA system/manycore processor architecture details by delegating data distribution to the runtime system and uses task data dependence information to guide the scheduling of OpenMP tasks to reduce data stall times. We demonstrate our technique on a four-socket AMD Opteron machine with eight NUMA nodes and on the TILEPro64 processor and identify that data distribution and locality-aware task scheduling improve performance up to 69% for scientific benchmarks compared to default policies and yet provide an architecture-oblivious approach for programmers.
Survey of cochlear implant user satisfaction with the Neptune™ waterproof sound processor

Directory of Open Access Journals (Sweden)

Jeroen J. Briaire

2016-04-01

Full Text Available A multi-center self-assessment survey was conducted to evaluate patient satisfaction with the Advanced Bionics Neptune™ waterproof sound processor used with the AquaMic™ totally submersible microphone. Subjective satisfaction with the different Neptune™ wearing options, comfort, ease of use, sound quality and use of the processor in a range of active and water related situations were assessed for 23 adults and 73 children, using an online and paper based questionnaire. Upgraded subjects compared their previous processor to the Neptune™. The Neptune™ was most popular for use in general sports and in the pool. Subjects were satisfied with the sound quality of the sound processor outside and under water and following submersion. Seventyeight percent of subjects rated waterproofness as being very useful and 83% of the newly implanted subjects selected waterproofness as one of the reasons why they chose the Neptune™ processor. Providing a waterproof sound processor is considered by cochlear implant recipients to be useful and important and is a factor in their processor choice. Subjects reported that they were satisfied with the Neptune™ sound quality, ease of use and different wearing options.
Median and Morphological Specialized Processors for a Real-Time Image Data Processing

Directory of Open Access Journals (Sweden)

Kazimierz Wiatr

2002-01-01

Full Text Available This paper presents the considerations on selecting a multiprocessor MISD architecture for fast implementation of the vision image processing. Using the authorÃ¢Â€Â²s earlier experience with real-time systems, implementing of specialized hardware processors based on the programmable FPGA systems has been proposed in the pipeline architecture. In particular, the following processors are presented: median filter and morphological processor. The structure of a universal reconfigurable processor developed has been proposed as well. Experimental results are presented as delays on LCA level implementation for median filter, morphological processor, convolution processor, look-up-table processor, logic processor and histogram processor. These times compare with delays in general purpose processor and DSP processor.
Extensions to the Parallel Real-Time Artificial Intelligence System (PRAIS) for fault-tolerant heterogeneous cycle-stealing reasoning

Science.gov (United States)

Goldstein, David

1991-01-01

Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
The Multi-energy High precision Data Processor Based on AD7606

Science.gov (United States)

Zhao, Chen; Zhang, Yanchi; Xie, Da

2017-11-01

This paper designs an information collector based on AD7606 to realize the high-precision simultaneous acquisition of multi-source information of multi-energy systems to form the information platform of the energy Internet at Laogang with electricty as its major energy source. Combined with information fusion technologies, this paper analyzes the data to improve the overall energy system scheduling capability and reliability.
Subjective usability of speech, touch and gesture in a heterogeneous multi-display environment

NARCIS (Netherlands)

Jong, A.P.J. de; Tak, S.; Toet, A.; Schultz, S.; Wijbenga, J.P.; Erp, J.B.F. van

2013-01-01

Several interaction techniques have been proposed to enable transfer of information between different displays in heterogeneous multi-display environments. However, it is not clear whether subjective user preference for these different techniques depends on the nature of the displays between which
Implementation of CT and IHT Processors for Invariant Object Recognition System

Directory of Open Access Journals (Sweden)

J. Turan jr.

2004-12-01

Full Text Available This paper presents PDL or ASIC implementation of key modules ofinvariant object recognition system based on the combination of theIncremental Hough transform (IHT, correlation and rapid transform(RT. The invariant object recognition system was represented partiallyin C++ language for general-purpose processor on personal computer andpartially described in VHDL code for implementation in PLD or ASIC.
Next Generation Space Telescope Integrated Science Module Data System

Science.gov (United States)

Schnurr, Richard G.; Greenhouse, Matthew A.; Jurotich, Matthew M.; Whitley, Raymond; Kalinowski, Keith J.; Love, Bruce W.; Travis, Jeffrey W.; Long, Knox S.

1999-01-01

The Data system for the Next Generation Space Telescope (NGST) Integrated Science Module (ISIM) is the primary data interface between the spacecraft, telescope, and science instrument systems. This poster includes block diagrams of the ISIM data system and its components derived during the pre-phase A Yardstick feasibility study. The poster details the hardware and software components used to acquire and process science data for the Yardstick instrument compliment, and depicts the baseline external interfaces to science instruments and other systems. This baseline data system is a fully redundant, high performance computing system. Each redundant computer contains three 150 MHz power PC processors. All processors execute a commercially available real time multi-tasking operating system supporting, preemptive multi-tasking, file management and network interfaces. These six processors in the system are networked together. The spacecraft interface baseline is an extension of the network, which links the six processors. The final selection for Processor busses, processor chips, network interfaces, and high-speed data interfaces will be made during mid 2002.
Applying Hamming Code to Memory System of Safety Grade PLC (POSAFE-Q) Processor Module

Energy Technology Data Exchange (ETDEWEB)

Kim, Taehee; Hwang, Sungjae; Park, Gangmin [POSCO Nuclear Technology, Seoul (Korea, Republic of)

2013-05-15

If some errors such as inverted bits occur in the memory, instructions and data will be corrupted. As a result, the PLC may execute the wrong instructions or refer to the wrong data. Hamming Code can be considered as the solution for mitigating this mis operation. In this paper, we apply hamming Code, then, we inspect whether hamming code is suitable for to the memory system of the processor module. In this paper, we applied hamming code to existing safety grade PLC (POSAFE-Q). Inspection data are collected and they will be referred for improving the PLC in terms of the soundness. In our future work, we will try to improve time delay caused by hamming calculation. It will include CPLD optimization and memory architecture or parts alteration. In addition to these hamming code-based works, we will explore any methodologies such as mirroring for the soundness of safety grade PLC. Hamming code-based works can correct bit errors, but they have limitation in multi bits errors.
Applying Hamming Code to Memory System of Safety Grade PLC (POSAFE-Q) Processor Module

International Nuclear Information System (INIS)

Kim, Taehee; Hwang, Sungjae; Park, Gangmin

2013-01-01

If some errors such as inverted bits occur in the memory, instructions and data will be corrupted. As a result, the PLC may execute the wrong instructions or refer to the wrong data. Hamming Code can be considered as the solution for mitigating this mis operation. In this paper, we apply hamming Code, then, we inspect whether hamming code is suitable for to the memory system of the processor module. In this paper, we applied hamming code to existing safety grade PLC (POSAFE-Q). Inspection data are collected and they will be referred for improving the PLC in terms of the soundness. In our future work, we will try to improve time delay caused by hamming calculation. It will include CPLD optimization and memory architecture or parts alteration. In addition to these hamming code-based works, we will explore any methodologies such as mirroring for the soundness of safety grade PLC. Hamming code-based works can correct bit errors, but they have limitation in multi bits errors
Computer Generated Inputs for NMIS Processor Verification

International Nuclear Information System (INIS)

J. A. Mullens; J. E. Breeding; J. A. McEvers; R. W. Wysor; L. G. Chiang; J. R. Lenarduzzi; J. T. Mihalczo; J. K. Mattingly

2001-01-01

Proper operation of the Nuclear Identification Materials System (NMIS) processor can be verified using computer-generated inputs [BIST (Built-In-Self-Test)] at the digital inputs. Preselected sequences of input pulses to all channels with known correlation functions are compared to the output of the processor. These types of verifications have been utilized in NMIS type correlation processors at the Oak Ridge National Laboratory since 1984. The use of this test confirmed a malfunction in a NMIS processor at the All-Russian Scientific Research Institute of Experimental Physics (VNIIEF) in 1998. The NMIS processor boards were returned to the U.S. for repair and subsequently used in NMIS passive and active measurements with Pu at VNIIEF in 1999
Numerical modeling of macrodispersion in heterogeneous media: a comparison of multi-Gaussian and non-multi-Gaussian models

Science.gov (United States)

Wen, Xian-Huan; Gómez-Hernández, J. Jaime

1998-03-01

The macrodispersion of an inert solute in a 2-D heterogeneous porous media is estimated numerically in a series of fields of varying heterogeneity. Four different random function (RF) models are used to model log-transmissivity (ln T) spatial variability, and for each of these models, ln T variance is varied from 0.1 to 2.0. The four RF models share the same univariate Gaussian histogram and the same isotropic covariance, but differ from one another in terms of the spatial connectivity patterns at extreme transmissivity values. More specifically, model A is a multivariate Gaussian model for which, by definition, extreme values (both high and low) are spatially uncorrelated. The other three models are non-multi-Gaussian: model B with high connectivity of high extreme values, model C with high connectivity of low extreme values, and model D with high connectivities of both high and low extreme values. Residence time distributions (RTDs) and macrodispersivities (longitudinal and transverse) are computed on ln T fields corresponding to the different RF models, for two different flow directions and at several scales. They are compared with each other, as well as with predicted values based on first-order analytical results. Numerically derived RTDs and macrodispersivities for the multi-Gaussian model are in good agreement with analytically derived values using first-order theories for log-transmissivity variance up to 2.0. The results from the non-multi-Gaussian models differ from each other and deviate largely from the multi-Gaussian results even when ln T variance is small. RTDs in non-multi-Gaussian realizations with high connectivity at high extreme values display earlier breakthrough than in multi-Gaussian realizations, whereas later breakthrough and longer tails are observed for RTDs from non-multi-Gaussian realizations with high connectivity at low extreme values. Longitudinal macrodispersivities in the non-multi-Gaussian realizations are, in general, larger than
Design Technology for Heterogeneous Embedded Systems

CERN Document Server

O'Connor, Ian; Piguet, Christian

2012-01-01

Designing technology to address the problem of heterogeneous embedded systems, while remaining compatible with standard “More Moore” flows, i.e. capable of handling simultaneously both silicon complexity and system complexity, represents one of the most important challenges facing the semiconductor industry today. While the micro-electronics industry has built its own specific design methods to focus mainly on the management of complexity through the establishment of abstraction levels, the emergence of device heterogeneity requires new approaches enabling the satisfactory design of physically heterogeneous embedded systems for the widespread deployment of such systems. This book, compiled largely from a set of contributions from participants of past editions of the Winter School on Heterogeneous Embedded Systems Design Technology (FETCH), proposes a broad and holistic overview of design techniques used to tackle the various facets of heterogeneity in terms of technology and opportunities at the physical ...

Advanced control system for the Integral Fast Reactor fuel pin processor

International Nuclear Information System (INIS)

Lau, L.D.; Randall, P.F.; Benedict, R.W.; Levinskas, D.

1993-01-01

A computerized control system has been developed for the remotely-operated fuel pin processor used in the Integral Fast Reactor Program, Fuel Cycle Facility (FCF). The pin processor remotely shears cast EBR- reactor fuel pins to length, inspects them for diameter, straightness, length, and weight, and then inserts acceptable pins into new sodium-loaded stainless-steel fuel element jackets. Two main components comprise the control system: (1) a programmable logic controller (PLC), together with various input/output modules and associated relay ladder-logic associated computer software. The PLC system controls the remote operation of the machine as directed by the OCS, and also monitors the machine operation to make operational data available to the OCS. The OCS allows operator control of the machine, provides nearly real-time viewing of the operational data, allows on-line changes of machine operational parameters, and records the collected data for each acceptable pin on a central data archiving computer. The two main components of the control system provide the operator with various levels of control ranging from manual operation to completely automatic operation by means of a graphic touch screen interface
Array processors based on Gaussian fraction-free method

Energy Technology Data Exchange (ETDEWEB)

Peng, S; Sedukhin, S [Aizu Univ., Aizuwakamatsu, Fukushima (Japan); Sedukhin, I

1998-03-01

The design of algorithmic array processors for solving linear systems of equations using fraction-free Gaussian elimination method is presented. The design is based on a formal approach which constructs a family of planar array processors systematically. These array processors are synthesized and analyzed. It is shown that some array processors are optimal in the framework of linear allocation of computations and in terms of number of processing elements and computing time. (author)
A general-purpose trigger processor system and its application to fast vertex trigger

International Nuclear Information System (INIS)

Hazumi, M.; Banas, E.; Natkaniec, Z.; Ostrowicz, W.

1997-12-01

A general-purpose hardware trigger system has been developed. The system comprises programmable trigger processors and pattern generator/samplers. The hardware design of the system is described. An application as a prototype of the very fast vertex trigger in an asymmetric B-factory at KEK is also explained. (author)
Handoff Rate and Coverage Analysis in Multi-tier Heterogeneous Networks

OpenAIRE

Sadr, Sanam; Adve, Raviraj S.

2015-01-01

This paper analyzes the impact of user mobility in multi-tier heterogeneous networks. We begin by obtaining the handoff rate for a mobile user in an irregular cellular network with the access point locations modeled as a homogeneous Poisson point process. The received signal-to-interference-ratio (SIR) distribution along with a chosen SIR threshold is then used to obtain the probability of coverage. To capture potential connection failures due to mobility, we assume that a fraction of handoff...
Producing chopped firewood with firewood processors

International Nuclear Information System (INIS)

Kaerhae, K.; Jouhiaho, A.

2009-01-01

The TTS Institute's research and development project studied both the productivity of new, chopped firewood processors (cross-cutting and splitting machines) suitable for professional and independent small-scale production, and the costs of the chopped firewood produced. Seven chopped firewood processors were tested in the research, six of which were sawing processors and one shearing processor. The chopping work was carried out using wood feeding racks and a wood lifter. The work was also carried out without any feeding appliances. Altogether 132.5 solid m 3 of wood were chopped in the time studies. The firewood processor used had the most significant impact on chopping work productivity. In addition to the firewood processor, the stem mid-diameter, the length of the raw material, and of the firewood were also found to affect productivity. The wood feeding systems also affected productivity. If there is a feeding rack and hydraulic grapple loader available for use in chopping firewood, then it is worth using the wood feeding rack. A wood lifter is only worth using with the largest stems (over 20 cm mid-diameter) if a feeding rack cannot be used. When producing chopped firewood from small-diameter wood, i.e. with a mid-diameter less than 10 cm, the costs of chopping work were over 10 EUR solid m -3 with sawing firewood processors. The shearing firewood processor with a guillotine blade achieved a cost level of 5 EUR solid m -3 when the mid-diameter of the chopped stem was 10 cm. In addition to the raw material, the cost-efficient chopping work also requires several hundred annual operating hours with a firewood processor, which is difficult for individual firewood entrepreneurs to achieve. The operating hours of firewood processors can be increased to the required level by the joint use of the processors by a number of firewood entrepreneurs. (author)
Processor farming in two-level analysis of historical bridge

Science.gov (United States)

Krejčí, T.; Kruis, J.; Koudelka, T.; Šejnoha, M.

2017-11-01

This contribution presents a processor farming method in connection with a multi-scale analysis. In this method, each macro-scopic integration point or each finite element is connected with a certain meso-scopic problem represented by an appropriate representative volume element (RVE). The solution of a meso-scale problem provides then effective parameters needed on the macro-scale. Such an analysis is suitable for parallel computing because the meso-scale problems can be distributed among many processors. The application of the processor farming method to a real world masonry structure is illustrated by an analysis of Charles bridge in Prague. The three-dimensional numerical model simulates the coupled heat and moisture transfer of one half of arch No. 3. and it is a part of a complex hygro-thermo-mechanical analysis which has been developed to determine the influence of climatic loading on the current state of the bridge.
Interconnecting heterogeneous database management systems

Science.gov (United States)

Gligor, V. D.; Luckenbaugh, G. L.

1984-01-01

It is pointed out that there is still a great need for the development of improved communication between remote, heterogeneous database management systems (DBMS). Problems regarding the effective communication between distributed DBMSs are primarily related to significant differences between local data managers, local data models and representations, and local transaction managers. A system of interconnected DBMSs which exhibit such differences is called a network of distributed, heterogeneous DBMSs. In order to achieve effective interconnection of remote, heterogeneous DBMSs, the users must have uniform, integrated access to the different DBMs. The present investigation is mainly concerned with an analysis of the existing approaches to interconnecting heterogeneous DBMSs, taking into account four experimental DBMS projects.
A low power biomedical signal processor ASIC based on hardware software codesign.

Science.gov (United States)

Nie, Z D; Wang, L; Chen, W G; Zhang, T; Zhang, Y T

2009-01-01

A low power biomedical digital signal processor ASIC based on hardware and software codesign methodology was presented in this paper. The codesign methodology was used to achieve higher system performance and design flexibility. The hardware implementation included a low power 32bit RISC CPU ARM7TDMI, a low power AHB-compatible bus, and a scalable digital co-processor that was optimized for low power Fast Fourier Transform (FFT) calculations. The co-processor could be scaled for 8-point, 16-point and 32-point FFTs, taking approximate 50, 100 and 150 clock circles, respectively. The complete design was intensively simulated using ARM DSM model and was emulated by ARM Versatile platform, before conducted to silicon. The multi-million-gate ASIC was fabricated using SMIC 0.18 microm mixed-signal CMOS 1P6M technology. The die area measures 5,000 microm x 2,350 microm. The power consumption was approximately 3.6 mW at 1.8 V power supply and 1 MHz clock rate. The power consumption for FFT calculations was less than 1.5 % comparing with the conventional embedded software-based solution.
An improved multi-value cellular automata model for heterogeneous bicycle traffic flow

Energy Technology Data Exchange (ETDEWEB)

Jin, Sheng [College of Civil Engineering and Architecture, Zhejiang University, Hangzhou, 310058 China (China); Qu, Xiaobo [Griffith School of Engineering, Griffith University, Gold Coast, 4222 Australia (Australia); Xu, Cheng [Department of Transportation Management Engineering, Zhejiang Police College, Hangzhou, 310053 China (China); College of Transportation, Jilin University, Changchun, 130022 China (China); Ma, Dongfang, E-mail: mdf2004@zju.edu.cn [Ocean College, Zhejiang University, Hangzhou, 310058 China (China); Wang, Dianhai [College of Civil Engineering and Architecture, Zhejiang University, Hangzhou, 310058 China (China)

2015-10-16

This letter develops an improved multi-value cellular automata model for heterogeneous bicycle traffic flow taking the higher maximum speed of electric bicycles into consideration. The update rules of both regular and electric bicycles are improved, with maximum speeds of two and three cells per second respectively. Numerical simulation results for deterministic and stochastic cases are obtained. The fundamental diagrams and multiple states effects under different model parameters are analyzed and discussed. Field observations were made to calibrate the slowdown probabilities. The results imply that the improved extended Burgers cellular automata (IEBCA) model is more consistent with the field observations than previous models and greatly enhances the realism of the bicycle traffic model. - Highlights: • We proposed an improved multi-value CA model with higher maximum speed. • Update rules are introduced for heterogeneous bicycle traffic with maximum speed 2 and 3 cells/s. • Simulation results of the proposed model are consistent with field bicycle data. • Slowdown probabilities of both regular and electric bicycles are calibrated.
An improved multi-value cellular automata model for heterogeneous bicycle traffic flow

International Nuclear Information System (INIS)

Jin, Sheng; Qu, Xiaobo; Xu, Cheng; Ma, Dongfang; Wang, Dianhai

2015-01-01

This letter develops an improved multi-value cellular automata model for heterogeneous bicycle traffic flow taking the higher maximum speed of electric bicycles into consideration. The update rules of both regular and electric bicycles are improved, with maximum speeds of two and three cells per second respectively. Numerical simulation results for deterministic and stochastic cases are obtained. The fundamental diagrams and multiple states effects under different model parameters are analyzed and discussed. Field observations were made to calibrate the slowdown probabilities. The results imply that the improved extended Burgers cellular automata (IEBCA) model is more consistent with the field observations than previous models and greatly enhances the realism of the bicycle traffic model. - Highlights: • We proposed an improved multi-value CA model with higher maximum speed. • Update rules are introduced for heterogeneous bicycle traffic with maximum speed 2 and 3 cells/s. • Simulation results of the proposed model are consistent with field bicycle data. • Slowdown probabilities of both regular and electric bicycles are calibrated
A digital signal processor based rf control system for the TRIUMF ISAC RFQ prototype

International Nuclear Information System (INIS)

Fong, K.; Fang, S.; Laverty, M.

1996-01-01

A stand alone digital signal processor is used to control the RFQ prototype in the TRIUMF ISAC development program. The advantage of a digital control system over the traditional analogue system is that it offers the higher degree of flexibility necessary for a development system. For this application the system is designed to have the outward appearance of an analogue system, and uses dials, knobs, and switches as the operator interface. The digital signal processor is used as a feedback controller during CW rf operation, with the feedback gain parameters continually adjustable. It is also able to perform the same regulation during pulsed operation, with additional feedforward compensation for initial pulse on duration. Using a low cost analogue-to-digital converter with a sample rate of 100 kHz, a regulation bandwidth of 10 kHz is achieved. (author)
A Wearable Healthcare System With a 13.7 μA Noise Tolerant ECG Processor.

Science.gov (United States)

Izumi, Shintaro; Yamashita, Ken; Nakano, Masanao; Kawaguchi, Hiroshi; Kimura, Hiromitsu; Marumoto, Kyoji; Fuchikami, Takaaki; Fujimori, Yoshikazu; Nakajima, Hiroshi; Shiga, Toshikazu; Yoshimoto, Masahiko

2015-10-01

To prevent lifestyle diseases, wearable bio-signal monitoring systems for daily life monitoring have attracted attention. Wearable systems have strict size and weight constraints, which impose significant limitations of the battery capacity and the signal-to-noise ratio of bio-signals. This report describes an electrocardiograph (ECG) processor for use with a wearable healthcare system. It comprises an analog front end, a 12-bit ADC, a robust Instantaneous Heart Rate (IHR) monitor, a 32-bit Cortex-M0 core, and 64 Kbyte Ferroelectric Random Access Memory (FeRAM). The IHR monitor uses a short-term autocorrelation (STAC) algorithm to improve the heart-rate detection accuracy despite its use in noisy conditions. The ECG processor chip consumes 13.7 μA for heart rate logging application.
A digital retina-like low-level vision processor.

Science.gov (United States)

Mertoguno, S; Bourbakis, N G

2003-01-01

This correspondence presents the basic design and the simulation of a low level multilayer vision processor that emulates to some degree the functional behavior of a human retina. This retina-like multilayer processor is the lower part of an autonomous self-organized vision system, called Kydon, that could be used on visually impaired people with a damaged visual cerebral cortex. The Kydon vision system, however, is not presented in this paper. The retina-like processor consists of four major layers, where each of them is an array processor based on hexagonal, autonomous processing elements that perform a certain set of low level vision tasks, such as smoothing and light adaptation, edge detection, segmentation, line recognition and region-graph generation. At each layer, the array processor is a 2D array of k/spl times/m hexagonal identical autonomous cells that simultaneously execute certain low level vision tasks. Thus, the hardware design and the simulation at the transistor level of the processing elements (PEs) of the retina-like processor and its simulated functionality with illustrative examples are provided in this paper.
FY1995 study of design methodology and environment of high-performance processor architectures; 1995 nendo koseino processor architecture sekkeiho to sekkei kankyo no kenkyu

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-03-01

The aim of our project is to develop high-performance processor architectures for both general purpose and application-specific purpose. We also plan to develop basic softwares, such as compliers, and various design aid tools for those architectures. We are particularly interested in performance evaluation at architecture design phase, design optimization, automatic generation of compliers from processor designs, and architecture design methodologies combined with circuit layout. We have investigated both microprocessor architectures and design methodologies / environments for the processors. Our goal is to establish design technologies for high-performance, low-power, low-cost and highly-reliable systems in system-on-silicon era. We have proposed PPRAM architecture for high-performance system using DRAM and logic mixture technology, Softcore processor architecture for special purpose processors in embedded systems, and Power-Pro architecture for low power systems. We also developed design methodologies and design environments for the above architectures as well as a new method for design verification of microprocessors. (NEDO)
Power-Aware DVB-H Mobile TV System on Heterogeneous Multicore Platform

Directory of Open Access Journals (Sweden)

Chao Han-Chieh

2010-01-01

Full Text Available In mobile communication network, the mobile device integrated with TV player is a novel technology that provides TV program services to end users. As TV program is a real-time video service, it has greater technical difficulties to overcome than a traditional video file download or online streaming, especially when TV programs are played on handheld devices. A challenge is how to save power in order to provide users with longer TV program services. To address this issue, this study proposes a mobile TV system on a heterogeneous multicore platform, which utilizes a Digital Video Broadcasting-Handheld (DVB-H wireless network to receive the TV program signal, thus, saving power according to the features of DVB-H TV signal and heterogeneous multi-core.
Simulation of a 250 kW diesel fuel processor/PEM fuel cell system

Science.gov (United States)

Amphlett, J. C.; Mann, R. F.; Peppley, B. A.; Roberge, P. R.; Rodrigues, A.; Salvador, J. P.

Polymer-electrolyte membrane (PEM) fuel cell systems offer a potential power source for utility and mobile applications. Practical fuel cell systems use fuel processors for the production of hydrogen-rich gas. Liquid fuels, such as diesel or other related fuels, are attractive options as feeds to a fuel processor. The generation of hydrogen gas for fuel cells, in most cases, becomes the crucial design issue with respect to weight and volume in these applications. Furthermore, these systems will require a gas clean-up system to insure that the fuel quality meets the demands of the cell anode. The endothermic nature of the reformer will have a significant affect on the overall system efficiency. The gas clean-up system may also significantly effect the overall heat balance. To optimize the performance of this integrated system, therefore, waste heat must be used effectively. Previously, we have concentrated on catalytic methanol-steam reforming. A model of a methanol steam reformer has been previously developed and has been used as the basis for a new, higher temperature model for liquid hydrocarbon fuels. Similarly, our fuel cell evaluation program previously led to the development of a steady-state electrochemical fuel cell model (SSEM). The hydrocarbon fuel processor model and the SSEM have now been incorporated in the development of a process simulation of a 250 kW diesel-fueled reformer/fuel cell system using a process simulator. The performance of this system has been investigated for a variety of operating conditions and a preliminary assessment of thermal integration issues has been carried out. This study demonstrates the application of a process simulation model as a design analysis tool for the development of a 250 kW fuel cell system.
Accuracies Of Optical Processors For Adaptive Optics

Science.gov (United States)

Downie, John D.; Goodman, Joseph W.

1992-01-01

Paper presents analysis of accuracies and requirements concerning accuracies of optical linear-algebra processors (OLAP's) in adaptive-optics imaging systems. Much faster than digital electronic processor and eliminate some residual distortion. Question whether errors introduced by analog processing of OLAP overcome advantage of greater speed. Paper addresses issue by presenting estimate of accuracy required in general OLAP that yields smaller average residual aberration of wave front than digital electronic processor computing at given speed.
The performance of an LSI-11/23 with a SKYMNK-Q array processor as a high speed front end processor

International Nuclear Information System (INIS)

Clark, D.L.

1983-01-01

The NSRL has recently installed a VAX-11/750 based data acquisition system which is networked to two LSI-11/23 satellite processors. Each of the LSI's are connected to CAMAC branch drivers. The LSI's have small array processors installed for use in preprocessing data. The objective is to provide an easy to use high speed processor that will relieve the VAX of some of the real-time data analysis tasks. The basic operation of the array processor and some of the results of performance tests are described
The software design of multi-branch, multi-point remote monitoring system for temperature measurement based on MSP430 and DS18B20

International Nuclear Information System (INIS)

Yu Jun; Yan Yu

2009-01-01

This paper present that the system can acquire the remote temperature measurement data of 40 monitoring points, through the RS-232 serial port and the Intranet. System's hardware is consist of TI's MSP430F149 mixed-signal processor and UA7000A network module. Using digital temperature sensor DS18B20, the structure is simple and easy to expand, the sensors directly send out the temperature data, MSP430F149 has the advantage of ultra-low-power and high degree of integration. Using msp430F149, the multi-branch multi-point temperature measurement system is powerful, simple structure, high reliability, strong anti-interference capability. The client software is user-friendly and easy to use, it is designed in Microsoft Visual C+ +6.0 environment. The monitoring system is able to complete a total of 4 branches of the 40-point temperature measurements in real-time remote monitoring. (authors)
Multibus-based parallel processor for simulation

Science.gov (United States)

Ogrady, E. P.; Wang, C.-H.

1983-01-01

A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.

Achieving semantic interoperability in multi-agent systems: A dialogue-based approach

NARCIS (Netherlands)

Diggelen, J. van

2007-01-01

Software agents sharing the same ontology can exchange their knowledge fluently as their knowledge representations are compatible with respect to the concepts regarded as relevant and with respect to the names given to these concepts. However, in open heterogeneous multi-agent systems, this scenario
A Sharable and Efficient Metadata Model for Heterogeneous Earth Observation Data Retrieval in Multi-Scale Flood Mapping

Directory of Open Access Journals (Sweden)

Nengcheng Chen

2015-07-01

Full Text Available Remote sensing plays an important role in flood mapping and is helping advance flood monitoring and management. Multi-scale flood mapping is necessary for dividing floods into several stages for comprehensive management. However, existing data systems are typically heterogeneous owing to the use of different access protocols and archiving metadata models. In this paper, we proposed a sharable and efficient metadata model (APEOPM for constructing an Earth observation (EO data system to retrieve remote sensing data for flood mapping. The proposed model contains two sub-models, an access protocol model and an enhanced encoding model. The access protocol model helps unify heterogeneous access protocols and can achieve intelligent access via a semantic enhancement method. The enhanced encoding model helps unify a heterogeneous archiving metadata model. Wuhan city, one of the most important cities in the Yangtze River Economic Belt in China, is selected as a study area for testing the retrieval of heterogeneous EO data and flood mapping. The past torrential rain period from 25 March 2015 to 10 April 2015 is chosen as the temporal range in this study. To aid in comprehensive management, mapping is conducted at different spatial and temporal scales. In addition, the efficiency of data retrieval is analyzed, and validation between the flood maps and actual precipitation was conducted. The results show that the flood map coincided with the actual precipitation.
Statistical mechanics of socio-economic systems with heterogeneous agents

International Nuclear Information System (INIS)

De Martino, Andrea; Marsili, Matteo

2006-01-01

We review the statistical mechanics approach to the study of the emerging collective behaviour of systems of heterogeneous interacting agents. The general framework is presented through examples in such contexts as ecosystem dynamics and traffic modelling. We then focus on the analysis of the optimal properties of large random resource-allocation problems and on Minority Games and related models of speculative trading in financial markets, discussing a number of extensions including multi-asset models, majority games and models with asymmetric information. Finally, we summarize the main conclusions and outline the major open problems and limitations of the approach. (topical review)
XOP, a fast versatile processor, as a building block for parallel processing in high energy physics experiments

International Nuclear Information System (INIS)

Baehler, P.; Bosco, N.; Lingjaerde, T.; Ljuslin, C.; Van Praag, A.; Werner, P.

1986-01-01

The XOP processor has been designed for trigger calculation and data compression in high energy physics experiments. Therefore, emphasis has been placed upon fast execution and high input/output rate. The fast execution is achieved by a wide instruction word holding operations which are executed concurrently. Thus, the arithmetic operations, data address calculations, data accessing, condition checking, loop count checking and next instruction evaluation all overlap in time. In conventional micro-processors these operations are performed sequentially. In addition, the instruction set comprises not only the classical computer instructions, but also specialized instructions suitable for trigger calculations, such as bit search, population count, loose compare and vector instructions. In order to achieve a high input/output rate, each XOP ECLine interface board is equipped with an input and an output port which fulfil the LeCroy ECLine specifications. The autonomous input port allows a data rate of 40 Mbytes/sec, while the program controlled output port allows 20 Mbytes/sec. For Fastbus based systems a dual Fastbus master interface is under design which allows to build up a Fastbus multi-processor system. This design is being done in collaboration with LAPP Annecy for the CERN Lep L3 experiment. Their scheme comprises 4-5 XOP processors, each of them with a master interface on a data input segment and a master interface on a data output segment. This paper describes the structure of the XOP processor, the interface capabilities and the software development and debugging tools. (Auth.)
Multi-petascale highly efficient parallel supercomputer

Science.gov (United States)

Asaad, Sameh; Bellofatto, Ralph E.; Blocksome, Michael A.; Blumrich, Matthias A.; Boyle, Peter; Brunheroto, Jose R.; Chen, Dong; Cher, Chen -Yong; Chiu, George L.; Christ, Norman; Coteus, Paul W.; Davis, Kristan D.; Dozsa, Gabor J.; Eichenberger, Alexandre E.; Eisley, Noel A.; Ellavsky, Matthew R.; Evans, Kahn C.; Fleischer, Bruce M.; Fox, Thomas W.; Gara, Alan; Giampapa, Mark E.; Gooding, Thomas M.; Gschwind, Michael K.; Gunnels, John A.; Hall, Shawn A.; Haring, Rudolf A.; Heidelberger, Philip; Inglett, Todd A.; Knudson, Brant L.; Kopcsay, Gerard V.; Kumar, Sameer; Mamidala, Amith R.; Marcella, James A.; Megerian, Mark G.; Miller, Douglas R.; Miller, Samuel J.; Muff, Adam J.; Mundy, Michael B.; O'Brien, John K.; O'Brien, Kathryn M.; Ohmacht, Martin; Parker, Jeffrey J.; Poole, Ruth J.; Ratterman, Joseph D.; Salapura, Valentina; Satterfield, David L.; Senger, Robert M.; Smith, Brian; Steinmacher-Burow, Burkhard; Stockdell, William M.; Stunkel, Craig B.; Sugavanam, Krishnan; Sugawara, Yutaka; Takken, Todd E.; Trager, Barry M.; Van Oosten, James L.; Wait, Charles D.; Walkup, Robert E.; Watson, Alfred T.; Wisniewski, Robert W.; Wu, Peng

2015-07-14

A Multi-Petascale Highly Efficient Parallel Supercomputer of 100 petaOPS-scale computing, at decreased cost, power and footprint, and that allows for a maximum packaging density of processing nodes from an interconnect point of view. The Supercomputer exploits technological advances in VLSI that enables a computing model where many processors can be integrated into a single Application Specific Integrated Circuit (ASIC). Each ASIC computing node comprises a system-on-chip ASIC utilizing four or more processors integrated into one die, with each having full access to all system resources and enabling adaptive partitioning of the processors to functions such as compute or messaging I/O on an application by application basis, and preferably, enable adaptive partitioning of functions in accordance with various algorithmic phases within an application, or if I/O or other processors are underutilized, then can participate in computation or communication nodes are interconnected by a five dimensional torus network with DMA that optimally maximize the throughput of packet communications between nodes and minimize latency.
Development of a Survivable Cloud Multi-Robot Framework for Heterogeneous Environments

Directory of Open Access Journals (Sweden)

Isaac Osunmakinde

2014-10-01

Full Text Available Cloud robotics is a paradigm that allows for robots to offload computationally intensive and data storage requirements into the cloud by providing a secure and customizable environment. The challenge for cloud robotics is the inherent problem of cloud disconnection. A major assumption made in the development of the current cloud robotics frameworks is that the connection between the cloud and the robot is always available. However, for multi-robots working in heterogeneous environments, the connection between the cloud and the robots cannot always be guaranteed. This work serves to assist with the challenge of disconnection in cloud robotics by proposing a survivable cloud multi-robotics (SCMR framework for heterogeneous environments. The SCMR framework leverages the combination of a virtual ad hoc network formed by robot-to-robot communication and a physical cloud infrastructure formed by robot-to-cloud communications. The quality of service (QoS on the SCMR framework was tested and validated by determining the optimal energy utilization and time of response (ToR on drivability analysis with and without cloud connection. The design trade-off, including the result, is between the computation energy for the robot execution and the offloading energy for the cloud execution.
Using Runtime Systems Tools to Implement Efficient Preconditioners for Heterogeneous Architectures

Directory of Open Access Journals (Sweden)

Roussel Adrien

2016-11-01

Full Text Available Solving large sparse linear systems is a time-consuming step in basin modeling or reservoir simulation. The choice of a robust preconditioner strongly impact the performance of the overall simulation. Heterogeneous architectures based on General Purpose computing on Graphic Processing Units (GPGPU or many-core architectures introduce programming challenges which can be managed in a transparent way for developer with the use of runtime systems. Nevertheless, algorithms need to be well suited for these massively parallel architectures. In this paper, we present preconditioning techniques which enable to take advantage of emerging architectures. We also present our task-based implementations through the use of the HARTS (Heterogeneous Abstract RunTime System runtime system, which aims to manage the recent architectures. We focus on two preconditoners. The first is ILU(0 preconditioner implemented on distributing memory systems. The second one is a multi-level domain decomposition method implemented on a shared-memory system. Obtained results are then presented on corresponding architectures, which open the way to discuss on the scalability of such methods according to numerical performances while keeping in mind that the next step is to propose a massively parallel implementations of these techniques.
The power processor of a high temperature superconducting energy storage system

Energy Technology Data Exchange (ETDEWEB)

Ollila, J. [Power Electronics, Tampere University of Technology, Tampere (Finland)

1997-12-31

This report introduces the structure and properties of a power processor unit for a high temperature superconducting magnetic energy storage system which is bused in an UPS demonstration application. The operation is first demonstrated using simulations. The software based operating and control system utilising combined Delta-Sigma and Sliding-Mode control is described shortly. Preliminary test results using a conventional NbTi superconducting energy y storage magnet operating at 4.2 K is shown. (orig.)
Multi-Objective Clustering Optimization for Multi-Channel Cooperative Spectrum Sensing in Heterogeneous Green CRNs

KAUST Repository

Celik, Abdulkadir

2016-06-27

In this paper, we address energy efficient (EE) cooperative spectrum sensing policies for large scale heterogeneous cognitive radio networks (CRNs) which consist of multiple primary channels and large number of secondary users (SUs) with heterogeneous sensing and reporting channel qualities. We approach this issue from macro and micro perspectives. Macro perspective groups SUs into clusters with the objectives: 1) total energy consumption minimization; 2) total throughput maximization; and 3) inter-cluster energy and throughput fairness. We adopt and demonstrate how to solve these using the nondominated sorting genetic algorithm-II. The micro perspective, on the other hand, operates as a sub-procedure on cluster formations decided by the macro perspective. For the micro perspectives, we first propose a procedure to select the cluster head (CH) which yields: 1) the best CH which gives the minimum total multi-hop error rate and 2) the optimal routing paths from SUs to the CH. Exploiting Poisson-Binomial distribution, a novel and generalized K-out-of-N voting rule is developed for heterogeneous CRNs to allow SUs to have different local detection performances. Then, a convex optimization framework is established to minimize the intra-cluster energy cost by jointly obtaining the optimal sensing durations and thresholds of feature detectors for the proposed voting rule. Likewise, instead of a common fixed sample size test, we developed a weighted sample size test for quantized soft decision fusion to obtain a more EE regime under heterogeneity. We have shown that the combination of proposed CH selection and cooperation schemes gives a superior performance in terms of energy efficiency and robustness against reporting error wall.
An interactive parallel processor for data analysis

International Nuclear Information System (INIS)

Mong, J.; Logan, D.; Maples, C.; Rathbun, W.; Weaver, D.

1984-01-01

A parallel array of eight minicomputers has been assembled in an attempt to deal with kiloparameter data events. By exporting computer system functions to a separate processor, the authors have been able to achieve computer amplification linearly proportional to the number of executing processors
A Compute Environment of ABC95 Array Computer Based on Multi-FPGA Chip

Institute of Scientific and Technical Information of China (English)

无

2000-01-01

ABC95 array computer is a multi-function network's computer based on FPGA technology, The multi-function network supports processors conflict-free access data from memory and supports processors access data from processors based on enhanced MESH network.ABC95 instruction's system includes control instructions, scalar instructions, vectors instructions.Mostly net-work instructions are introduced.A programming environment of ABC95 array computer assemble language is designed.A programming environment of ABC95 array computer for VC++ is advanced.It includes load function of ABC95 array computer program and data, store function, run function and so on.Specially, The data type of ABC95 array computer conflict-free access is defined.The results show that these technologies can develop programmer of ABC95 array computer effectively.
An efficient ASIC implementation of 16-channel on-line recursive ICA processor for real-time EEG system.

Science.gov (United States)

Fang, Wai-Chi; Huang, Kuan-Ju; Chou, Chia-Ching; Chang, Jui-Chung; Cauwenberghs, Gert; Jung, Tzyy-Ping

2014-01-01

This is a proposal for an efficient very-large-scale integration (VLSI) design, 16-channel on-line recursive independent component analysis (ORICA) processor ASIC for real-time EEG system, implemented with TSMC 40 nm CMOS technology. ORICA is appropriate to be used in real-time EEG system to separate artifacts because of its highly efficient and real-time process features. The proposed ORICA processor is composed of an ORICA processing unit and a singular value decomposition (SVD) processing unit. Compared with previous work [1], this proposed ORICA processor has enhanced effectiveness and reduced hardware complexity by utilizing a deeper pipeline architecture, shared arithmetic processing unit, and shared registers. The 16-channel random signals which contain 8-channel super-Gaussian and 8-channel sub-Gaussian components are used to analyze the dependence of the source components, and the average correlation coefficient is 0.95452 between the original source signals and extracted ORICA signals. Finally, the proposed ORICA processor ASIC is implemented with TSMC 40 nm CMOS technology, and it consumes 15.72 mW at 100 MHz operating frequency.
Real-time wavefront processors for the next generation of adaptive optics systems: a design and analysis

Science.gov (United States)

Truong, Tuan; Brack, Gary L.; Troy, Mitchell; Trinh, Thang; Shi, Fang; Dekany, Richard G.

2003-02-01

Adaptive optics (AO) systems currently under investigation will require at least two orders of magitude increase in the number of actuators, which in turn translates to effectively a 104 increase in compute latency. Since the performance of an AO system invariably improves as the compute latency decreases, it is important to study how today's computer systems will scale to address this expected increase in actuator utilization. This paper answers this question by characterizing the performance of a single deformable mirror (DM) Shack-Hartmann natural guide star AO system implemented on the present-generation digital signal processor (DSP) TMS320C6701 from Texas Instruments. We derive the compute latency of such a system in terms of a few basic parameters, such as the number of DM actuators, the number of data channels used to read out the camera pixels, the number of DSPs, the available memory bandwidth, as well as the inter-processor communication (IPC) bandwidth and the pixel transfer rate. We show how the results would scale for future systems that utilizes multiple DMs and guide stars. We demonstrate that the principal performance bottleneck of such a system is the available memory bandwidth of the processors and to lesser extent the IPC bandwidth. This paper concludes with suggestions for mitigating this bottleneck.
Development methods for VLSI-processors

International Nuclear Information System (INIS)

Horninger, K.; Sandweg, G.

1982-01-01

The aim of this project, which was originally planed for 3 years, was the development of modern system and circuit concepts, for VLSI-processors having a 32 bit wide data path. The result of this first years work is the concept of a general purpose processor. This processor is not only logically but also physically (on the chip) divided into four functional units: a microprogrammable instruction unit, an execution unit in slice technique, a fully associative cache memory and an I/O unit. For the ALU of the execution unit circuits in PLA and slice techniques have been realized. On the basis of regularity, area consumption and achievable performance the slice technique has been prefered. The designs utilize selftesting circuitry. (orig.) [de
High-speed packet filtering utilizing stream processors

Science.gov (United States)

Hummel, Richard J.; Fulp, Errin W.

2009-04-01

Parallel firewalls offer a scalable architecture for the next generation of high-speed networks. While these parallel systems can be implemented using multiple firewalls, the latest generation of stream processors can provide similar benefits with a significantly reduced latency due to locality. This paper describes how the Cell Broadband Engine (CBE), a popular stream processor, can be used as a high-speed packet filter. Results show the CBE can potentially process packets arriving at a rate of 1 Gbps with a latency less than 82 μ-seconds. Performance depends on how well the packet filtering process is translated to the unique stream processor architecture. For example the method used for transmitting data and control messages among the pseudo-independent processor cores has a significant impact on performance. Experimental results will also show the current limitations of a CBE operating system when used to process packets. Possible solutions to these issues will be discussed.
Heterogeneous Embedded Real-Time Systems Environment

Science.gov (United States)

2003-12-01

AFRL-IF-RS-TR-2003-290 Final Technical Report December 2003 HETEROGENEOUS EMBEDDED REAL - TIME SYSTEMS ENVIRONMENT Integrated...HETEROGENEOUS EMBEDDED REAL - TIME SYSTEMS ENVIRONMENT 6. AUTHOR(S) Cosmo Castellano and James Graham 5. FUNDING NUMBERS C - F30602-97-C-0259
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

CERN Document Server

Abdurachmanov, David; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

2014-01-01

Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

Science.gov (United States)

Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Knight, Robert; Muzaffar, Shahzad

2015-05-01

Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG).
Heterogeneous High Throughput Scientific Computing with APM X-Gene and Intel Xeon Phi

International Nuclear Information System (INIS)

Abdurachmanov, David; Bockelman, Brian; Elmer, Peter; Eulisse, Giulio; Muzaffar, Shahzad; Knight, Robert

2015-01-01

Electrical power requirements will be a constraint on the future growth of Distributed High Throughput Computing (DHTC) as used by High Energy Physics. Performance-per-watt is a critical metric for the evaluation of computer architectures for cost- efficient computing. Additionally, future performance growth will come from heterogeneous, many-core, and high computing density platforms with specialized processors. In this paper, we examine the Intel Xeon Phi Many Integrated Cores (MIC) co-processor and Applied Micro X-Gene ARMv8 64-bit low-power server system-on-a-chip (SoC) solutions for scientific computing applications. We report our experience on software porting, performance and energy efficiency and evaluate the potential for use of such technologies in the context of distributed computing systems such as the Worldwide LHC Computing Grid (WLCG). (paper)
A data base processor semantics specification package

Science.gov (United States)

Fishwick, P. A.

1983-01-01

A Semantics Specification Package (DBPSSP) for the Intel Data Base Processor (DBP) is defined. DBPSSP serves as a collection of cross assembly tools that allow the analyst to assemble request blocks on the host computer for passage to the DBP. The assembly tools discussed in this report may be effectively used in conjunction with a DBP compatible data communications protocol to form a query processor, precompiler, or file management system for the database processor. The source modules representing the components of DBPSSP are fully commented and included.

Online track processor for the CDF upgrade

International Nuclear Information System (INIS)

Thomson, E. J.

2002-01-01

A trigger track processor, called the eXtremely Fast Tracker (XFT), has been designed for the CDF upgrade. This processor identifies high transverse momentum (> 1.5 GeV/c) charged particles in the new central outer tracking chamber for CDF II. The XFT design is highly parallel to handle the input rate of 183 Gbits/s and output rate of 44 Gbits/s. The processor is pipelined and reports the result for a new event every 132 ns. The processor uses three stages: hit classification, segment finding, and segment linking. The pattern recognition algorithms for the three stages are implemented in programmable logic devices (PLDs) which allow in-situ modification of the algorithm at any time. The PLDs reside on three different types of modules. The complete system has been installed and commissioned at CDF II. An overview of the track processor and performance in CDF Run II are presented
Heterogeneous Gpu&Cpu Cluster For High Performance Computing In Cryptography

Directory of Open Access Journals (Sweden)

Michał Marks

2012-01-01

Full Text Available This paper addresses issues associated with distributed computing systems andthe application of mixed GPU&CPU technology to data encryption and decryptionalgorithms. We describe a heterogenous cluster HGCC formed by twotypes of nodes: Intel processor with NVIDIA graphics processing unit and AMDprocessor with AMD graphics processing unit (formerly ATI, and a novel softwareframework that hides the heterogeneity of our cluster and provides toolsfor solving complex scientific and engineering problems. Finally, we present theresults of numerical experiments. The considered case study is concerned withparallel implementations of selected cryptanalysis algorithms. The main goal ofthe paper is to show the wide applicability of the GPU&CPU technology tolarge scale computation and data processing.
Bank switched memory interface for an image processor

International Nuclear Information System (INIS)

Barron, M.; Downward, J.

1980-09-01

A commercially available image processor is interfaced to a PDP-11/45 through an 8K window of memory addresses. When the image processor was not in use it was desired to be able to use the 8K address space as real memory. The standard method of accomplishing this would have been to use UNIBUS switches to switch in either the physical 8K bank of memory or the image processor memory. This method has the disadvantage of being rather expensive. As a simple alternative, a device was built to selectively enable or disable either an 8K bank of memory or the image processor memory. To enable the image processor under program control, GEN is contracted in size, the memory is disabled, a device partition for the image processor is created above GEN, and the image processor memory is enabled. The process is reversed to restore memory to GEN. The hardware to enable/disable the image and computer memories is controlled using spare bits from a DR-11K output register. The image processor and physical memory can be switched in or out on line with no adverse affects on the system's operation
Contaminant Permeation in the Ionomer-Membrane Water Processor (IWP) System

Science.gov (United States)

Kelsey, Laura K.; Finger, Barry W.; Pasadilla, Patrick; Perry, Jay

2016-01-01

The Ionomer-membrane Water Processor (IWP) is a patented membrane-distillation based urine brine water recovery system. The unique properties of the IWP membrane pair limit contaminant permeation from the brine to the recovered water and purge gas. A paper study was conducted to predict volatile trace contaminant permeation in the IWP system. Testing of a large-scale IWP Engineering Development Unit (EDU) with urine brine pretreated with the International Space Station (ISS) pretreatment formulation was then conducted to collect air and water samples for quality analysis. Distillate water quality and purge air GC-MS results are presented and compared to predictions, along with implications for the IWP brine processing system.
Parallel computation for distributed parameter system-from vector processors to Adena computer

Energy Technology Data Exchange (ETDEWEB)

Nogi, T

1983-04-01

Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.
Array processor architecture

Science.gov (United States)

Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)

1983-01-01

A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules and from hence the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occur with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode however, the processors are not locked-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
A new approach to tracer transport analysis: From fracture systems to strongly heterogeneous porous media

International Nuclear Information System (INIS)

Tsang, Chin-Fu.

1989-02-01

Many current development and utilization of groundwater resources include a study of their flow and transport properties. These properties are needed in evaluating possible changes in groundwater quality and potential transport of hazardous solutes through the groundwater system. Investigation of transport properties of fractured rocks is an active area of research. Most of the current approaches to the study of flow and transport in fractured rocks cannot be easily used for analysis of tracer transport field data. A new approach is proposed based on a detailed study of transport through a fracture of variable aperture. This is a two-dimensional strongly heterogeneous permeable system. It is suggested that tracer breakthrough curves can be analyzed based on an aperture or permeability probability distribution function that characterizes the tracer flow through the fracture. The results are extended to a multi-fracture system and can be equally applied to a strongly heterogeneous porous medium. Finally, the need for multi-point or line and areal tracer injection and observation tests is indicated as a way to avoid the sensitive dependence of point measurements on local permeability variability. 30 refs., 15 figs
VLSI Design of a Variable-Length FFT/IFFT Processor for OFDM-Based Communication Systems

Directory of Open Access Journals (Sweden)

Jen-Chih Kuo

2003-12-01

Full Text Available The technique of {orthogonal frequency division multiplexing (OFDM} is famous for its robustness against frequency-selective fading channel. This technique has been widely used in many wired and wireless communication systems. In general, the {fast Fourier transform (FFT} and {inverse FFT (IFFT} operations are used as the modulation/demodulation kernel in the OFDM systems, and the sizes of FFT/IFFT operations are varied in different applications of OFDM systems. In this paper, we design and implement a variable-length prototype FFT/IFFT processor to cover different specifications of OFDM applications. The cached-memory FFT architecture is our suggested VLSI system architecture to design the prototype FFT/IFFT processor for the consideration of low-power consumption. We also implement the twiddle factor butterfly {processing element (PE} based on the {{coordinate} rotation digital computer (CORDIC} algorithm, which avoids the use of conventional multiplication-and-accumulation unit, but evaluates the trigonometric functions using only add-and-shift operations. Finally, we implement a variable-length prototype FFT/IFFT processor with TSMC 0.35Ã¢Â€Â‰ÃŽÂ¼m 1P4M CMOS technology. The simulations results show that the chip can perform (64-2048-point FFT/IFFT operations up to 80Ã¢Â€Â‰MHz operating frequency which can meet the speed requirement of most OFDM standards such as WLAN, ADSL, VDSL (256Ã¢ÂˆÂ¼2K, DAB, and 2K-mode DVB.
CMS readiness for multi-core workload scheduling

Science.gov (United States)

Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.; Aftab Khan, F.; Letts, J.; Mason, D.; Verguilov, V.

2017-10-01

In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per- core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.
CMS Readiness for Multi-Core Workload Scheduling

Energy Technology Data Exchange (ETDEWEB)

Perez-Calero Yzquierdo, A. [Madrid, CIEMAT; Balcas, J. [Caltech; Hernandez, J. [Madrid, CIEMAT; Aftab Khan, F. [NCP, Islamabad; Letts, J. [UC, San Diego; Mason, D. [Fermilab; Verguilov, V. [CLMI, Sofia

2017-11-22

In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per- core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016 is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.
Multiprocessor Real-Time Scheduling with Hierarchical Processor Affinities

OpenAIRE

Bonifaci , Vincenzo; Brandenburg , Björn; D'Angelo , Gianlorenzo; Marchetti-Spaccamela , Alberto

2016-01-01

International audience; Many multiprocessor real-time operating systems offer the possibility to restrict the migrations of any task to a specified subset of processors by setting affinity masks. A notion of " strong arbitrary processor affinity scheduling " (strong APA scheduling) has been proposed; this notion avoids schedulability losses due to overly simple implementations of processor affinities. Due to potential overheads, strong APA has not been implemented so far in a real-time operat...
Multi-Layer Soft Frequency Reuse Scheme for 5G Heterogeneous Cellular Networks

DEFF Research Database (Denmark)

Shakir, Md. Hossain; Tariq, Faisal; Safdar, Ghazanfar

2017-01-01

Heterogeneous network (HetNet) is a promising cell deployment technique where low power access points are deployed overlaid on a macrocell system. It attains high throughput by intelligently reusing spectrum, and brings a trade-off between energy- and spectral-efficiency. An efficient resource...... allocation strategy is required to significantly improve its throughput in a bid to meet the fifth-generation (5G) high data rate requirements. In this correspondence, a new resource allocation scheme for HetNet, called multi-level soft frequency reuse for HetNet (ML-SFR HetNet), is proposed which increases...... the throughput several fold. We derived spectrum and power allocation expression for a generalized HetNet scenario. In addition, analytical expressions for the throughput and area spectral efficiency (ASE) are also developed. The simulations results demonstrates the efficiency of the proposed scheme which...
Probabilistic programmable quantum processors

International Nuclear Information System (INIS)

Buzek, V.; Ziman, M.; Hillery, M.

2004-01-01

We analyze how to improve performance of probabilistic programmable quantum processors. We show how the probability of success of the probabilistic processor can be enhanced by using the processor in loops. In addition, we show that an arbitrary SU(2) transformations of qubits can be encoded in program state of a universal programmable probabilistic quantum processor. The probability of success of this processor can be enhanced by a systematic correction of errors via conditional loops. Finally, we show that all our results can be generalized also for qudits. (Abstract Copyright [2004], Wiley Periodicals, Inc.)
Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

OpenAIRE

Abdul Kareem PARCHUR; Ram Asaray SINGH

2012-01-01

High performance is a critical requirement to all microprocessors manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310). The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many ke...
Heterogeneity and Self-Organization of Complex Systems Through an Application to Financial Market with Multiagent Systems

Science.gov (United States)

Lucas, Iris; Cotsaftis, Michel; Bertelle, Cyrille

2017-12-01

Multiagent systems (MAS) provide a useful tool for exploring the complex dynamics and behavior of financial markets and now MAS approach has been widely implemented and documented in the empirical literature. This paper introduces the implementation of an innovative multi-scale mathematical model for a computational agent-based financial market. The paper develops a method to quantify the degree of self-organization which emerges in the system and shows that the capacity of self-organization is maximized when the agent behaviors are heterogeneous. Numerical results are presented and analyzed, showing how the global market behavior emerges from specific individual behavior interactions.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

Science.gov (United States)

Barr, David R. W.; Dudek, Piotr

2009-12-01

We present a software environment for the efficient simulation of cellular processor arrays (CPAs). This software (APRON) is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
APRON: A Cellular Processor Array Simulation and Hardware Design Tool

Directory of Open Access Journals (Sweden)

David R. W. Barr

2009-01-01

Full Text Available We present a software environment for the efficient simulation of cellular processor arrays (CPAs. This software (APRON is used to explore algorithms that are designed for massively parallel fine-grained processor arrays, topographic multilayer neural networks, vision chips with SIMD processor arrays, and related architectures. The software uses a highly optimised core combined with a flexible compiler to provide the user with tools for the design of new processor array hardware architectures and the emulation of existing devices. We present performance benchmarks for the software processor array implemented on standard commodity microprocessors. APRON can be configured to use additional processing hardware if necessary and can be used as a complete graphical user interface and development environment for new or existing CPA systems, allowing more users to develop algorithms for CPA systems.
Considerations for control system software verification and validation specific to implementations using distributed processor architectures

International Nuclear Information System (INIS)

Munro, J.K. Jr.

1993-01-01

Until recently, digital control systems have been implemented on centralized processing systems to function in one of several ways: (1) as a single processor control system; (2) as a supervisor at the top of a hierarchical network of multiple processors; or (3) in a client-server mode. Each of these architectures uses a very different set of communication protocols. The latter two architectures also belong to the category of distributed control systems. Distributed control systems can have a central focus, as in the cases just cited, or be quite decentralized in a loosely coupled, shared responsibility arrangement. This last architecture is analogous to autonomous hosts on a local area network. Each of the architectures identified above will have a different set of architecture-associated issues to be addressed in the verification and validation activities during software development. This paper summarizes results of efforts to identify, describe, contrast, and compare these issues
A UNIX-based prototype biomedical virtual image processor

International Nuclear Information System (INIS)

Fahy, J.B.; Kim, Y.

1987-01-01

The authors have developed a multiprocess virtual image processor for the IBM PC/AT, in order to maximize image processing software portability for biomedical applications. An interprocess communication scheme, based on two-way metacode exchange, has been developed and verified for this purpose. Application programs call a device-independent image processing library, which transfers commands over a shared data bridge to one or more Autonomous Virtual Image Processors (AVIP). Each AVIP runs as a separate process in the UNIX operating system, and implements the device-independent functions on the image processor to which it corresponds. Application programs can control multiple image processors at a time, change the image processor configuration used at any time, and are completely portable among image processors for which an AVIP has been implemented. Run-time speeds have been found to be acceptable for higher level functions, although rather slow for lower level functions, owing to the overhead associated with sending commands and data over the shared data bridge
Programs for Testing Processor-in-Memory Computing Systems

Science.gov (United States)

Katz, Daniel S.

2006-01-01

The Multithreaded Microbenchmarks for Processor-In-Memory (PIM) Compilers, Simulators, and Hardware are computer programs arranged in a series for use in testing the performances of PIM computing systems, including compilers, simulators, and hardware. The programs at the beginning of the series test basic functionality; the programs at subsequent positions in the series test increasingly complex functionality. The programs are intended to be used while designing a PIM system, and can be used to verify that compilers, simulators, and hardware work correctly. The programs can also be used to enable designers of these system components to examine tradeoffs in implementation. Finally, these programs can be run on non-PIM hardware (either single-threaded or multithreaded) using the POSIX pthreads standard to verify that the benchmarks themselves operate correctly. [POSIX (Portable Operating System Interface for UNIX) is a set of standards that define how programs and operating systems interact with each other. pthreads is a library of pre-emptive thread routines that comply with one of the POSIX standards.

Computational Hydrodynamics: How Portable and Scalable Are Heterogeneous Programming Paradigms?

DEFF Research Database (Denmark)

Pawlak, Wojciech; Glimberg, Stefan Lemvig; Engsig-Karup, Allan Peter

New many-core era applications at the interface of mathematics and computer science adopt modern parallel programming paradigms and expose parallelism through proper algorithms. We present new performance results for a novel massively parallel free surface wave model suitable for advanced......-device system sizes from desktops to large HPC systems such as superclusters and in the cloud utilizing heterogeneous devices like multi-core CPUs, GPUs, and Xeon Phi coprocessors. The numerical efficiency is evaluated on heterogeneous devices like multi-core CPUs, GPUs and Xeon Phi coprocessors to test...
Real-time portable system for fabric defect detection using an ARM processor

Science.gov (United States)

Fernandez-Gallego, J. A.; Yañez-Puentes, J. P.; Ortiz-Jaramillo, B.; Alvarez, J.; Orjuela-Vargas, S. A.; Philips, W.

2012-06-01

Modern textile industry seeks to produce textiles as little defective as possible since the presence of defects can decrease the final price of products from 45% to 65%. Automated visual inspection (AVI) systems, based on image analysis, have become an important alternative for replacing traditional inspections methods that involve human tasks. An AVI system gives the advantage of repeatability when implemented within defined constrains, offering more objective and reliable results for particular tasks than human inspection. Costs of automated inspection systems development can be reduced using modular solutions with embedded systems, in which an important advantage is the low energy consumption. Among the possibilities for developing embedded systems, the ARM processor has been explored for acquisition, monitoring and simple signal processing tasks. In a recent approach we have explored the use of the ARM processor for defects detection by implementing the wavelet transform. However, the computation speed of the preprocessing was not yet sufficient for real time applications. In this approach we significantly improve the preprocessing speed of the algorithm, by optimizing matrix operations, such that it is adequate for a real time application. The system was tested for defect detection using different defect types. The paper is focused in giving a detailed description of the basis of the algorithm implementation, such that other algorithms may use of the ARM operations for fast implementations.
OFFSCALE: A PC input processor for the SCALE code system. The ORIGNATE processor for ORIGEN-S

International Nuclear Information System (INIS)

Bowman, S.M.

1994-11-01

OFFSCALE is a suite of personal computer input processor programs developed at Oak Ridge National Laboratory to provide an easy-to-use interface for modules in the SCALE-4 code system. ORIGNATE is a program in the OFFSCALE suite that serves as a user-friendly interface for the ORIGEN-S isotopic generation and depletion code. It is designed to assist an ORIGEN-S user in preparing an input file for execution of light-water-reactor (LWR) fuel depletion and decay cases. ORIGNATE generates an input file that may be used to execute ORIGEN-S in SCALE-4. ORIGNATE features a pulldown menu system that accesses sophisticated data entry screens. The program allows the user to quickly set up an ORIGEN-S input file and perform error checking. This capability increases productivity and decreases the chance of user error
Response of Moist Convection to Multi-scale Surface Flux Heterogeneity

Science.gov (United States)

Kang, S. L.; Ryu, J. H.

2015-12-01

We investigate response of moist convection to multi-scale feature of the spatial variation of surface sensible heat fluxes (SHF) in the afternoon evolution of the convective boundary layer (CBL), utilizing a mesoscale-domain large eddy simulation (LES) model. The multi-scale surface heterogeneity feature is analytically created as a function of the spectral slope in the wavelength range from a few tens of km to a few hundreds of m in the spectrum of surface SHF on a log-log scale. The response of moist convection to the κ-3 - slope (where κ is wavenumber) surface SHF field is compared with that to the κ-2 - slope surface, which has a relatively weak mesoscale feature, and the homogeneous κ0 - slope surface. Given the surface energy balance with a spatially uniform available energy, the prescribed SHF has a 180° phase lag with the latent heat flux (LHF) in a horizontal domain of (several tens of km)2. Thus, warmer (cooler) surface is relatively dry (moist). For all the cases, the same observation-based sounding is prescribed for the initial condition. For all the κ-3 - slope surface heterogeneity cases, early non-precipitating shallow clouds further develop into precipitating deep thunderstorms. But for all the κ-2 - slope cases, only shallow clouds develop. We compare the vertical profiles of domain-averaged fluxes and variances, and the contribution of the mesoscale and turbulence contributions to the fluxes and variances, between the κ-3 versus κ-2 slope cases. Also the cross-scale processes are investigated.
The bit slice micro-processor 'GESPRO' as a project in the UA2 experiment

International Nuclear Information System (INIS)

Becam, C.; Bernaudin, P.; Delanghe, J.; Mencik, M.; Merkel, B.; Plothow, H.; Fest, H.M.; Lecoq, J.; Martin, H.; Meyer, J.M.

1981-01-01

The bit slice micro-processor GESPRO, as it is proposed for use in the UA 2 data acquisition chain and trigger system, is a CAMAC module plugged into a standard Elliott System crate via which it communicates as a slave with its host computer (ND, DEC). It has full control of CAMAC as a master unit. GESPRO is a 24 bit machine (150 ns effective cycle time) with multi-mode memory addressing capacity of 64 K words. The micro-processor structure uses 5 busses including pipe-line registers to mask access time and 16 interrupt levels. The micro-program memory capacity is 2 K (RAM) words of 48 bits each. A special hardwired module allows floating point (as well as integer) multiplication of 24 x 24 bits, result in 48 bits, in about 200 ns. This micro-processor could be used in the UA2 data acquisition chain and trigger system for the following tasks: a) online data reduction, i.e. to read DURANDAL (fast ADC's = the hardware trigger in the experiment), process the information (effective mass calculation, etc.) resulting in accepting or rejecting the event. b) read out and analysis of the accepted data (collect statistical information). c) preprocess the data (calculation of pointers, address decoding, etc.). The UA2 version of GESPRO is under construction, programs and micro-programs are under development. Hardware and software will be tested with simulated data. First results are expected in about one year from now. (orig.)
A fast continuous magnetic field measurement system based on digital signal processors

International Nuclear Information System (INIS)

Velev, G.V.; Carcagno, R.; DiMarco, J.; Kotelnikov, S.; Lamm, M.; Makulski, A.; Maroussov, V.; Nehring, R.; Nogiec, J.; Orris, D.; Poukhov, O.; Prakoshyn, F.; Schlabach, P.; Tompkins, J.C.

2005-01-01

In order to study dynamic effects in accelerator magnets, such as the decay of the magnetic field during the dwell at injection and the rapid so-called ''snapback'' during the first few seconds of the resumption of the energy ramp, a fast continuous harmonics measurement system was required. A new magnetic field measurement system, based on the use of digital signal processors (DSP) and Analog to Digital (A/D) converters, was developed and prototyped at Fermilab. This system uses Pentek 6102 16 bit A/D converters and the Pentek 4288 DSP board with the SHARC ADSP-2106 family digital signal processor. It was designed to acquire multiple channels of data with a wide dynamic range of input signals, which are typically generated by a rotating coil probe. Data acquisition is performed under a RTOS, whereas processing and visualization are performed under a host computer. Firmware code was developed for the DSP to perform fast continuous readout of the A/D FIFO memory and integration over specified intervals, synchronized to the probe's rotation in the magnetic field. C, C++ and Java code was written to control the data acquisition devices and to process a continuous stream of data. The paper summarizes the characteristics of the system and presents the results of initial tests and measurements
Evaluation of the Intel Nehalem-EX server processor

CERN Document Server

Jarp, S; Leduc, J; Nowak, A; CERN. Geneva. IT Department

2010-01-01

In this paper we report on a set of benchmark results recently obtained by the CERN openlab by comparing the 4-socket, 32-core Intel Xeon X7560 server with the previous generation 4-socket server, based on the Xeon X7460 processor. The Xeon X7560 processor represents a major change in many respects, especially the memory sub-system, so it was important to make multiple comparisons. In most benchmarks the two 4-socket servers were compared. It should be underlined that both servers represent the “top of the line” in terms of frequency. However, in some cases, it was important to compare systems that integrated the latest processor features, such as QPI links, Symmetric multithreading and over-clocking via Turbo mode, and in such situations the X7560 server was compared to a dual socket L5520 based system with an identical frequency of 2.26 GHz. Before summarizing the results we must stress the fact that benchmarking of modern processors is a very complex affair. One has to control (at least) the following ...
Intra- and intersubject comparison of cochlear implant systems using the Esprit and the Tempo+ behind-the-ear speech processor.

Science.gov (United States)

Kompis, Martin; Jenk, Martin; Vischer, Mattheus W; Seifert, Eberhard; Häusler, Rudolf

2002-12-01

A patient with bilateral profound deafness was implanted with a Nucleus CI24M cochlear implant (CI) and used an Esprit behind-the-ear (BTE) speech processor. Thirteen months later, the implant had to be removed because of a cholesteatoma. As the same electrode could not be reinserted, a Medel combi40s CI was implanted in the same ear, and the patient used a Tempo+ BTE processor. After 1 year of use of the Combi40s/Tempo+ system, speech recognition was better and was rated better subjectively than with the CI24M/Esprit system. Speech recognition and subjective ratings were also assessed for two matched groups of nine CI users each, using either an Esprit or a Tempo+ processor. On average, speech recognition scores were higher for the group of Tempo+ users, but the difference was not statistically significant. Users of the Esprit processors rated their device higher in terms of cosmetic appearance and comfort of wearing.
A CNN-Specific Integrated Processor

Directory of Open Access Journals (Sweden)

Suleyman Malki

2009-01-01

Full Text Available Integrated Processors (IP are algorithm-specific cores that either by programming or by configuration can be re-used within many microelectronic systems. This paper looks at Cellular Neural Networks (CNN to become realized as IP. First current digital implementations are reviewed, and the memoryprocessor bandwidth issues are analyzed. Then a generic view is taken on the structure of the network, and a new intra-communication protocol based on rotating wheels is proposed. It is shown that this provides for guaranteed high-performance with a minimal network interface. The resulting node is small and supports multi-level CNN designs, giving the system a 30-fold increase in capacity compared to classical designs. As it facilitates multiple operations on a single image, and single operations on multiple images, with minimal access to the external image memory, balancing the internal and external data transfer requirements optimizes the system operation. In conventional digital CNN designs, the treatment of boundary nodes requires additional logic to handle the CNN value propagation scheme. In the new architecture, only a slight modification of the existing cells is necessary to model the boundary effect. A typical prototype for visual pattern recognition will house 4096 CNN cells with a 2% overhead for making it an IP.
Hardware Realization of an FPGA Processor - Operating System Call Offload and Experiences

DEFF Research Database (Denmark)

Hindborg, Andreas Erik; Karlsson, Sven

2014-01-01

core that can be integrated in many signal and data processing platforms on FPGAs. We also show how we allow the processor to use operating system services. For a set of SPLASH-2 and SPEC2006 benchmarks we show an speedup of up to 64% over a similar Xilinx MicroBlaze implementation while using 27...
Exploiting Domain Knowledge in System-level MPSoC Design Space Exploration

NARCIS (Netherlands)

Thompson, M.; Pimentel, A.D.

2013-01-01

System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded multimedia systems. During system-level DSE, system parameters like, e.g., the number and type of processors, and the mapping of
Java Processor Optimized for RTSJ

Directory of Open Access Journals (Sweden)

Tu Shiliang

2007-01-01

Full Text Available Due to the preeminent work of the real-time specification for Java (RTSJ, Java is increasingly expected to become the leading programming language in real-time systems. To provide a Java platform suitable for real-time applications, a Java processor which can execute Java bytecode is directly proposed in this paper. It provides efficient support in hardware for some mechanisms specified in the RTSJ and offers a simpler programming model through ameliorating the scoped memory of the RTSJ. The worst case execution time (WCET of the bytecodes implemented in this processor is predictable by employing the optimization method proposed in our previous work, in which all the processing interfering predictability is handled before bytecode execution. Further advantage of this method is to make the implementation of the processor simpler and suited to a low-cost FPGA chip.
The UA1 trigger processor

International Nuclear Information System (INIS)

Grayer, G.H.

1981-01-01

Experiment UA1 is a large multi-purpose spectrometer at the CERN proton-antiproton collider, scheduled for late 1981. The principal trigger is formed on the basis of the energy deposition in calorimeters. A trigger decision taken in under 2.4 microseconds can avoid dead time losses due to the bunched nature of the beam. To achieve this we have built fast 8-bit charge to digital converters followed by two identical digital processors tailored to the experiment. The outputs of groups of the 2440 photomultipliers in the calorimeters are summed to form a total of 288 input channels to the ADCs. A look-up table in RAM is used to convert the digitised photomultiplier signals to energy in one processor, combinations of input channels, and also counts the number of clusters with electromagnetic or hadronic energy above pre-determined levels. Up to twelve combinations of these conditions, together with external information, may be combined in coincidence or in veto to form the final trigger. Provision has been made for testing using simulated data in an off-line mode, and sampling real data when on-line. (orig.)
Heterogeneous Active Matter

Science.gov (United States)

Kolb, Thomas; Klotsa, Daphne

Active systems are composed of self-propelled (active) particles that locally convert energy into motion and exhibit emergent collective behaviors, such as fish schooling and bird flocking. Most works so far have focused on monodisperse, one-component active systems. However, real systems are heterogeneous, and consist of several active components. We perform molecular dynamics simulations of multi-component active matter systems and report on their emergent behavior. We discuss the phase diagram of dynamic states as well as parameters where we see mixing versus segregation.
Hierarchical DSE for multi-ASIP platforms

DEFF Research Database (Denmark)

Micconi, Laura; Corvino, Rosilde; Gangadharan, Deepak

2013-01-01

This work proposes a hierarchical Design Space Exploration (DSE) for the design of multi-processor platforms targeted to specific applications with strict timing and area constraints. In particular, it considers platforms integrating multiple Application Specific Instruction Set Processors (ASIPs...
New development for low energy electron beam processor

International Nuclear Information System (INIS)

Takei, Taro; Goto, Hitoshi; Oizumi, Matsutoshi; Hirakawa, Tetsuya; Ochi, Masafumi

2003-01-01

Newly developed low-energy electron beam (EB) processors that have unique designs and configurations compared to conventional ones enable electron-beam treatment of small three-dimensional objects, such as grain-like agricultural products and small plastic parts. As the EB processor can irradiate the products from the whole angles, the uniform EB treatment can be achieved at one time regardless the complex shapes of the product. Here presented are two new EB processors: the first system has cylindrical process zone, which allows three-dimensional objects to be irradiated with one-pass treatment. The second is a tube-type small EB processor, achieving not only its compactor design, but also higher beam extraction efficiency and flexible installation of the irradiation heads. The basic design of each processor and potential applications with them will be presented in this paper. (author)
Computer controlled multi-leaf conformation radiotherapy

Energy Technology Data Exchange (ETDEWEB)

Matsuda, T [Tokyo Metropolitan Komagome Hospital (Japan); Inamura, K

1981-10-01

A conformation radiotherapy system with 5-split collimators of which openings can be controlled symmetrically by computerized techniques during rotational irradiation by a linear accelerator has been developed. Outline of the system performance and its clinical applications are described as follows. 1. Profile of the system: The hardware is composed of three parts, namely, the multi-split collimator, the electronic data processor, and the interface between those two parts. 1) The multi-leaf collimator is composed of 5 pairs (10 leaves) diaphragms. It can be mounted to the X-ray head of a linear accelerator when used, and can be dismounted after its use. 2) The electronic data processor sends control signal to the collimator according to the 5-leaf target volume data which can be stored into a minifloppy disc through the curve digitizer previously. This part is composed of a) dedicated micro processor, b) I/O expansion unit, c) color CRT display with key board, d) dual mini-floppy disc unit, e) curve digitizer and f) digital plotter for recording and verification of resulted accuracy. 2. Performance of the system: 1) Maximum field size: 15 cm x 15 cm at isocenter. 2) Maximum elongation ratio of the target volume: 3 : 1 when the longer diameter is 15 cm. 3) Control accuracy: Within +-3 mm deviation from planned beam focus at isocenter. 3. Clinical application: The method of treatment planning and clinical advantages of this irradiation method are explained by raising clinical experiences such as treating brain tumor and rectal cancer.
Computer controlled multi-leaf conformation radiotherapy

International Nuclear Information System (INIS)

Matsuda, Tadayoshi; Inamura, Kiyonari.

1981-01-01

A conformation radiotherapy system with 5-split collimators of which openings can be controlled symmetrically by computerized techniques during rotational irradiation by a linear accelerator has been developed. Outline of the system performance and its clinical applications are described as follows. 1. Profile of the system: The hardware is composed of three parts, namely, the multi-split collimator, the electronic data processor, and the interface between those two parts. 1) The multi-leaf collimator is composed of 5 pairs (10 leaves) diaphragms. It can be mounted to the X-ray head of a linear accelerator when used, and can be dismounted after its use. 2) The electronic data processor sends control signal to the collimator according to the 5-leaf target volume data which can be stored into a minifloppy disc through the curve digitizer previously. This part is composed of a) dedicated micro processor, b) I/O expansion unit, c) color CRT display with key board, d) dual mini-floppy disc unit, e) curve digitizer and f) digital plotter for recording and verification of resulted accuracy. 2. Performance of the system: 1) Maximum field size: 15 cm x 15 cm at isocenter. 2) Maximum elongation ratio of the target volume: 3 : 1 when the longer diameter is 15 cm. 3) Control accuracy: Within +-3 mm deviation from planned beam focus at isocenter. 3. Clinical application: The method of treatment planning and clinical advantages of this irradiation method are explained by raising clinical experiences such as treating brain tumor and rectal cancer. (author)
Parallel processor for fast event analysis

International Nuclear Information System (INIS)

Hensley, D.C.

1983-01-01

Current maximum data rates from the Spin Spectrometer of approx. 5000 events/s (up to 1.3 MBytes/s) and minimum analysis requiring at least 3000 operations/event require a CPU cycle time near 70 ns. In order to achieve an effective cycle time of 70 ns, a parallel processing device is proposed where up to 4 independent processors will be implemented in parallel. The individual processors are designed around the Am2910 Microsequencer, the AM29116 μP, and the Am29517 Multiplier. Satellite histogramming in a mass memory system will be managed by a commercial 16-bit μP system
Individual heterogeneity generating explosive system network dynamics.

Science.gov (United States)

Manrique, Pedro D; Johnson, Neil F

2018-03-01

Individual heterogeneity is a key characteristic of many real-world systems, from organisms to humans. However, its role in determining the system's collective dynamics is not well understood. Here we study how individual heterogeneity impacts the system network dynamics by comparing linking mechanisms that favor similar or dissimilar individuals. We find that this heterogeneity-based evolution drives an unconventional form of explosive network behavior, and it dictates how a polarized population moves toward consensus. Our model shows good agreement with data from both biological and social science domains. We conclude that individual heterogeneity likely plays a key role in the collective development of real-world networks and communities, and it cannot be ignored.

Individual heterogeneity generating explosive system network dynamics

Science.gov (United States)

Manrique, Pedro D.; Johnson, Neil F.

2018-03-01

Individual heterogeneity is a key characteristic of many real-world systems, from organisms to humans. However, its role in determining the system's collective dynamics is not well understood. Here we study how individual heterogeneity impacts the system network dynamics by comparing linking mechanisms that favor similar or dissimilar individuals. We find that this heterogeneity-based evolution drives an unconventional form of explosive network behavior, and it dictates how a polarized population moves toward consensus. Our model shows good agreement with data from both biological and social science domains. We conclude that individual heterogeneity likely plays a key role in the collective development of real-world networks and communities, and it cannot be ignored.
A Bayesian sequential processor approach to spectroscopic portal system decisions

Energy Technology Data Exchange (ETDEWEB)

Sale, K; Candy, J; Breitfeller, E; Guidry, B; Manatt, D; Gosnell, T; Chambers, D

2007-07-31

The development of faster more reliable techniques to detect radioactive contraband in a portal type scenario is an extremely important problem especially in this era of constant terrorist threats. Towards this goal the development of a model-based, Bayesian sequential data processor for the detection problem is discussed. In the sequential processor each datum (detector energy deposit and pulse arrival time) is used to update the posterior probability distribution over the space of model parameters. The nature of the sequential processor approach is that a detection is produced as soon as it is statistically justified by the data rather than waiting for a fixed counting interval before any analysis is performed. In this paper the Bayesian model-based approach, physics and signal processing models and decision functions are discussed along with the first results of our research.
Interleaving methods for hybrid system-level MPSoC design space exploration

NARCIS (Netherlands)

Piscitelli, R.; Pimentel, A.D.; McAllister, J.; Bhattacharyya, S.

2012-01-01

System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded system architectures. During system-level DSE, system parameters like, e.g., the number and type of processors, the type and size of
Image Matrix Processor for Volumetric Computations Final Report CRADA No. TSB-1148-95

Energy Technology Data Exchange (ETDEWEB)

Roberson, G. Patrick [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Browne, Jolyon [Advanced Research & Applications Corporation, Sunnyvale, CA (United States)

2018-01-22

The development of an Image Matrix Processor (IMP) was proposed that would provide an economical means to perform rapid ray-tracing processes on volume "Giga Voxel" data sets. This was a multi-phased project. The objective of the first phase of the IMP project was to evaluate the practicality of implementing a workstation-based Image Matrix Processor for use in volumetric reconstruction and rendering using hardware simulation techniques. Additionally, ARACOR and LLNL worked together to identify and pursue further funding sources to complete a second phase of this project.
Comparison of Processor Performance of SPECint2006 Benchmarks of some Intel Xeon Processors

Directory of Open Access Journals (Sweden)

Abdul Kareem PARCHUR

2012-08-01

Full Text Available High performance is a critical requirement to all microprocessors manufacturers. The present paper describes the comparison of performance in two main Intel Xeon series processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310. The microarchitecture of these processors is implemented using the basis of a new family of processors from Intel starting with the Pentium 4 processor. These processors can provide a performance boost for many key application areas in modern generation. The scaling of performance in two major series of Intel Xeon processors (Type A: Intel Xeon X5260, X5460, E5450 and L5320 and Type B: Intel Xeon X5140, 5130, 5120 and E5310 has been analyzed using the performance numbers of 12 CPU2006 integer benchmarks, performance numbers that exhibit significant differences in performance. The results and analysis can be used by performance engineers, scientists and developers to better understand the performance scaling in modern generation processors.
Simulation of a parallel processor on a serial processor: The neutron diffusion equation

International Nuclear Information System (INIS)

Honeck, H.C.

1981-01-01

Parallel processors could provide the nuclear industry with very high computing power at a very moderate cost. Will we be able to make effective use of this power. This paper explores the use of a very simple parallel processor for solving the neutron diffusion equation to predict power distributions in a nuclear reactor. We first describe a simple parallel processor and estimate its theoretical performance based on the current hardware technology. Next, we show how the parallel processor could be used to solve the neutron diffusion equation. We then present the results of some simulations of a parallel processor run on a serial processor and measure some of the expected inefficiencies. Finally we extrapolate the results to estimate how actual design codes would perform. We find that the standard numerical methods for solving the neutron diffusion equation are still applicable when used on a parallel processor. However, some simple modifications to these methods will be necessary if we are to achieve the full power of these new computers. (orig.) [de
Embedded processor extensions for image processing

Science.gov (United States)

Thevenin, Mathieu; Paindavoine, Michel; Letellier, Laurent; Heyrman, Barthélémy

2008-04-01

The advent of camera phones marks a new phase in embedded camera sales. By late 2009, the total number of camera phones will exceed that of both conventional and digital cameras shipped since the invention of photography. Use in mobile phones of applications like visiophony, matrix code readers and biometrics requires a high degree of component flexibility that image processors (IPs) have not, to date, been able to provide. For all these reasons, programmable processor solutions have become essential. This paper presents several techniques geared to speeding up image processors. It demonstrates that a gain of twice is possible for the complete image acquisition chain and the enhancement pipeline downstream of the video sensor. Such results confirm the potential of these computing systems for supporting future applications.
Single-chip serial channel enhances multi-processor systems

Energy Technology Data Exchange (ETDEWEB)

Millar, J.

1982-01-01

In this paper multiprocessor systems are described and explained. The impact that VLSI advancements are having on multiprocessor design is pointed out. The TMS 7041 single-chip microcomputer is described briefly, highlighting its multiprocessor communication capability. And finally, a typical multiprocessor system is shown, implementing the TMS 7041.
XL-100S microprogrammable processor

International Nuclear Information System (INIS)

Gorbunov, N.V.; Guzik, Z.; Sutulin, V.A.; Forytski, A.

1983-01-01

The XL-100S microprogrammable processor providing the multiprocessor operation mode in the XL system crate is described. The processor meets the EUR 6500 CAMAC standards, address up to 4 Mbyte memory, and interacts with 7 CAMAC branchas. Eight external requests initiate operations preset by a sequence of microcommands in a memory of the capacity up to 64 kwords of 32-Git. The microprocessor architecture allows one to emulate commands of the majority of mini- or micro-computers, including floating point operations. The XL-100S processor may be used in various branches of experimental physics: for physical experiment apparatus control, fast selection of useful physical events, organization of the of input/output operations, organization of direct assess to memory included, etc. The Am2900 microprocessor set is used as an elementary base. The device is made in the form of a single width CAMAC module
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

Science.gov (United States)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Combining a Multi-Agent System and Communication Middleware for Smart Home Control: A Universal Control Platform Architecture.

Science.gov (United States)

Zheng, Song; Zhang, Qi; Zheng, Rong; Huang, Bi-Qin; Song, Yi-Lin; Chen, Xin-Chu

2017-09-16

In recent years, the smart home field has gained wide attention for its broad application prospects. However, families using smart home systems must usually adopt various heterogeneous smart devices, including sensors and devices, which makes it more difficult to manage and control their home system. How to design a unified control platform to deal with the collaborative control problem of heterogeneous smart devices is one of the greatest challenges in the current smart home field. The main contribution of this paper is to propose a universal smart home control platform architecture (IAPhome) based on a multi-agent system and communication middleware, which shows significant adaptability and advantages in many aspects, including heterogeneous devices connectivity, collaborative control, human-computer interaction and user self-management. The communication middleware is an important foundation to design and implement this architecture which makes it possible to integrate heterogeneous smart devices in a flexible way. A concrete method of applying the multi-agent software technique to solve the integrated control problem of the smart home system is also presented. The proposed platform architecture has been tested in a real smart home environment, and the results indicate that the effectiveness of our approach for solving the collaborative control problem of different smart devices.
Combining a Multi-Agent System and Communication Middleware for Smart Home Control: A Universal Control Platform Architecture

Science.gov (United States)

Zheng, Song; Zhang, Qi; Zheng, Rong; Huang, Bi-Qin; Song, Yi-Lin; Chen, Xin-Chu

2017-01-01

In recent years, the smart home field has gained wide attention for its broad application prospects. However, families using smart home systems must usually adopt various heterogeneous smart devices, including sensors and devices, which makes it more difficult to manage and control their home system. How to design a unified control platform to deal with the collaborative control problem of heterogeneous smart devices is one of the greatest challenges in the current smart home field. The main contribution of this paper is to propose a universal smart home control platform architecture (IAPhome) based on a multi-agent system and communication middleware, which shows significant adaptability and advantages in many aspects, including heterogeneous devices connectivity, collaborative control, human-computer interaction and user self-management. The communication middleware is an important foundation to design and implement this architecture which makes it possible to integrate heterogeneous smart devices in a flexible way. A concrete method of applying the multi-agent software technique to solve the integrated control problem of the smart home system is also presented. The proposed platform architecture has been tested in a real smart home environment, and the results indicate that the effectiveness of our approach for solving the collaborative control problem of different smart devices. PMID:28926957
Combining a Multi-Agent System and Communication Middleware for Smart Home Control: A Universal Control Platform Architecture

Directory of Open Access Journals (Sweden)

Song Zheng

2017-09-01

Full Text Available In recent years, the smart home field has gained wide attention for its broad application prospects. However, families using smart home systems must usually adopt various heterogeneous smart devices, including sensors and devices, which makes it more difficult to manage and control their home system. How to design a unified control platform to deal with the collaborative control problem of heterogeneous smart devices is one of the greatest challenges in the current smart home field. The main contribution of this paper is to propose a universal smart home control platform architecture (IAPhome based on a multi-agent system and communication middleware, which shows significant adaptability and advantages in many aspects, including heterogeneous devices connectivity, collaborative control, human-computer interaction and user self-management. The communication middleware is an important foundation to design and implement this architecture which makes it possible to integrate heterogeneous smart devices in a flexible way. A concrete method of applying the multi-agent software technique to solve the integrated control problem of the smart home system is also presented. The proposed platform architecture has been tested in a real smart home environment, and the results indicate that the effectiveness of our approach for solving the collaborative control problem of different smart devices.
Real time monitoring of electron processors

International Nuclear Information System (INIS)

Nablo, S.V.; Kneeland, D.R.; McLaughlin, W.L.

1995-01-01

A real time radiation monitor (RTRM) has been developed for monitoring the dose rate (current density) of electron beam processors. The system provides continuous monitoring of processor output, electron beam uniformity, and an independent measure of operating voltage or electron energy. In view of the device's ability to replace labor-intensive dosimetry in verification of machine performance on a real-time basis, its application to providing archival performance data for in-line processing is discussed. (author)
Design space pruning through hybrid analysis in system-level design space exploration

NARCIS (Netherlands)

Piscitelli, R.; Pimentel, A.D.

2012-01-01

System-level design space exploration (DSE), which is performed early in the design process, is of eminent importance to the design of complex multi-processor embedded system archi- tectures. During system-level DSE, system parameters like, e.g., the number and type of processors, the type and size
Functional unit for a processor

NARCIS (Netherlands)

Rohani, A.; Kerkhoff, Hans G.

2013-01-01

The invention relates to a functional unit for a processor, such as a Very Large Instruction Word Processor. The invention further relates to a processor comprising at least one such functional unit. The invention further relates to a functional unit and processor capable of mitigating the effect of
A fast continuous magnetic field measurement system based on digital signal processors

Energy Technology Data Exchange (ETDEWEB)

Velev, G.V.; Carcagno, R.; DiMarco, J.; Kotelnikov, S.; Lamm, M.; Makulski, A.; /Fermilab; Maroussov, V.; /Purdue U.; Nehring, R.; Nogiec, J.; Orris, D.; /Fermilab; Poukhov,; Prakoshyn, F.; /Dubna, JINR; Schlabach, P.; Tompkins, J.C.; /Fermilab

2005-09-01

In order to study dynamic effects in accelerator magnets, such as the decay of the magnetic field during the dwell at injection and the rapid so-called ''snapback'' during the first few seconds of the resumption of the energy ramp, a fast continuous harmonics measurement system was required. A new magnetic field measurement system, based on the use of digital signal processors (DSP) and Analog to Digital (A/D) converters, was developed and prototyped at Fermilab. This system uses Pentek 6102 16 bit A/D converters and the Pentek 4288 DSP board with the SHARC ADSP-2106 family digital signal processor. It was designed to acquire multiple channels of data with a wide dynamic range of input signals, which are typically generated by a rotating coil probe. Data acquisition is performed under a RTOS, whereas processing and visualization are performed under a host computer. Firmware code was developed for the DSP to perform fast continuous readout of the A/D FIFO memory and integration over specified intervals, synchronized to the probe's rotation in the magnetic field. C, C++ and Java code was written to control the data acquisition devices and to process a continuous stream of data. The paper summarizes the characteristics of the system and presents the results of initial tests and measurements.
A high-accuracy optical linear algebra processor for finite element applications

Science.gov (United States)

Casasent, D.; Taylor, B. K.

1984-01-01

Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
First level trigger processor for the ZEUS calorimeter

International Nuclear Information System (INIS)

Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.

1990-01-01

This paper discusses the design of the first level trigger processor for the ZEUS calorimeter. This processor accepts data from the 13,000 photomultipliers of the calorimeter which is topologically divided into 16 regions, and after regional preprocessing, performs logical and numerical operations which cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K ECL, Advanced CMOS discrete devices, and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2GB/s, and processed data flows from the processor to the Global First-Level Trigger at a rate of 700MB/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor. 2 refs., 3 figs
First-level trigger processor for the ZEUS calorimeter

International Nuclear Information System (INIS)

Dawson, J.W.; Talaga, R.L.; Burr, G.W.; Laird, R.J.; Smith, W.; Lackey, J.

1990-01-01

The design of the first-level trigger processor for the Zeus calorimeter is discussed. This processor accepts data from the 13,000 photomultipliers of the calorimeter, which is topologically divided into 16 regions, and after regional preprocessing performs logical and numerical operations that cross regional boundaries. Because the crossing period at the HERA collider is 96 ns, it is necessary that first-level trigger decisions be made in pipelined hardware. One microsecond is allowed for the processor to perform the required logical and numerical operations, during which time the data from ten crossings would be resident in the processor while being clocked through the pipelined hardware. The circuitry is implemented in 100K emitter-coupled logic (ECL), advanced CMOS discrete devices and programmable gate arrays, and operates in a VME environment. All tables and registers are written/read from VME, and all diagnostic codes are executed from VME. Preprocessed data flows into the processor at a rate of 5.2 Gbyte/s, and processed data flows from the processor to the global first-level trigger at a rate of 70 Mbyte/s. The system allows for subsets of the logic to be configured by software and for various important variables to be histogrammed as they flow through the processor

Atmel's New Rad-Hard Sparc V8 Processor 200Mhz & Low Power System on Chip

Science.gov (United States)

Ganry, Nicolas; Mantelet, Guy; Parkes, Steve; McClements, Chris

2014-08-01

The AT6981 is a new generation of processor designed for critical spaceflight applications, which combines a high-performance SPARC® V8 radiation hard processor, with enough on-chip memory for many aerospace applications and state-of-the-art SpaceWire networking technology from STAR- Dundee. The AT6981 is implemented in Atmel 90nm rad-hard technology, enabling 200 MHz operating speed for the processor with power consumption levels around 1W. This advanced technology allows strong system integration in a SoC with embedded peripherals like CAN, 1553, Ethernet, DDR and embedded memory with 1Mbytes SRAM. The device is ITAR- free and is developed in France by Atmel Aerospace having more than of 30years space experience. This paper describes this new SoC architecture and technical options considered to insure the best performances, the minimum power consumption and high reliability. This device will be available on the market in H2 2014 for evaluation with first flight models targeted end 2015.
Heterogeneous flow in multi-layer joint networks and its influence on incipient karst generation

Science.gov (United States)

Wang, X.; Jourde, H.

2017-12-01

Various dissolution types (e.g. pipe, stripe and sheet karstic features) have been observed in fractured layered limestones. Yet, due to a large range of structural and hydraulic parameters play a role in the karstification process, the dissolution mechanism, occurring either along fractures or bedding planes, is difficult to quantify. In this study, we use numerical models to investigate the influence of these parameters on the generation of different types of incipient karst. Specifically, we focus on two parameters: the fracture intensity contrast between adjacent layers and the aperture ratio between bedding planes and joints (abed/ajoint). The DFN models were generated using a pseudo-genetic code that considers the stress shadow zone. Flow simulations were performed using a combined finite-volume finite-element simulator under practical boundary conditions. The flow channeling within the fracture networks was characterized by applying a multi-fractal technique. The rock block equivalent permeability (keff) was also calculated to quantify the change in bulk hydraulic properties when changing the selected structural and hydraulic parameters. The flow simulation results show that the abed/ajoint ratio has a first-order control on the heterogeneous distribution of flow in the multi-layer system and on the magnitude of equivalent permeability. When abed/ajoint 0.1, the bedding plane has more control and flow becomes more pervasive and uniform, and the keff is accordingly high. A simple model, accounting for the calculation of the heterogeneous distributions of Damköhler number associated with different aperture ratios, is proposed to predict what type of incipient karst tends to develop under the studied flow conditions.
OFFSCALE: A PC input processor for the SCALE code system. The CSASIN processor for the criticality sequences

International Nuclear Information System (INIS)

Bowman, S.M.

1994-11-01

OFFSCALE is a suite of personal computer input processor programs developed at Oak Ridge National Laboratory to provide an easy-to-use interface for modules in the SCALE-4 code system. CSASIN (formerly known as OFFSCALE) is a program in the OFFSCALE suite that serves as a user-friendly interface for the Criticality Safety Analysis Sequences (CSAS) available in SCALE-4. It is designed to assist a SCALE-4 user in preparing an input file for execution of criticality safety problems. Output from CSASIN generates an input file that may be used to execute the CSAS control module in SCALE-4. CSASIN features a pulldown menu system that accesses sophisticated data entry screens. The program allows the user to quickly set up a CSAS input file and perform data checking. This capability increases productivity and decreases the chance of user error
The ATLAS Level-1 Central Trigger Processor (CTP)

CERN Document Server

Spiwoks, Ralf; Ellis, Nick; Farthouat, P; Gällnö, P; Haller, J; Krasznahorkay, A; Maeno, T; Pauly, T; Pessoa-Lima, H; Resurreccion-Arcas, I; Schuler, G; De Seixas, J M; Torga-Teixeira, R; Wengler, T

2005-01-01

The ATLAS Level-1 Central Trigger Processor (CTP) combines information from calorimeter and muon trigger processors and makes the final Level-1 Accept (L1A) decision on the basis of lists of selection criteria (trigger menus). In addition to the event-selection decision, the CTP also provides trigger summary information to the Level-2 trigger and the data acquisition system. It further provides accumulated and bunch-by-bunch scaler data for monitoring of the trigger, detector and beam conditions. The CTP is presented and results are shown from tests with the calorimeter adn muon trigger processors connected to detectors in a particle beam, as well as from stand-alone full-system tests in the laboratory which were used to validate the CTP.
GPU: the biggest key processor for AI and parallel processing

Science.gov (United States)

Baji, Toru

2017-07-01

Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.
Optimal Control of Heterogeneous Systems with Endogenous Domain of Heterogeneity

International Nuclear Information System (INIS)

Belyakov, Anton O.; Tsachev, Tsvetomir; Veliov, Vladimir M.

2011-01-01

The paper deals with optimal control of heterogeneous systems, that is, families of controlled ODEs parameterized by a parameter running over a domain called domain of heterogeneity. The main novelty in the paper is that the domain of heterogeneity is endogenous: it may depend on the control and on the state of the system. This extension is crucial for several economic applications and turns out to rise interesting mathematical problems. A necessary optimality condition is derived, where one of the adjoint variables satisfies a differential inclusion (instead of equation) and the maximization of the Hamiltonian takes the form of “min-max”. As a consequence, a Pontryagin-type maximum principle is obtained under certain regularity conditions for the optimal control. A formula for the derivative of the objective function with respect to the control from L ∞ is presented together with a sufficient condition for its existence. A stylized economic example is investigated analytically and numerically.
Formal heterogeneous system modeling with SystemC

DEFF Research Database (Denmark)

Niaki, Seyed Hosein Attarzadeh; Jakobsen, Mikkel Koefoed; Sulonen, Tero

2012-01-01

Electronic System Level (ESL) design of embedded systems proposes raising the abstraction level of the design entry to cope with the increasing complexity of such systems. To exploit the benefits of ESL, design languages should allow specification of models which are a) heterogeneous, to describe...
Multi-fuel reformers for fuel cells used in transportation. Phase 1: Multi-fuel reformers

Science.gov (United States)

1994-05-01

DOE has established the goal, through the Fuel Cells in Transportation Program, of fostering the rapid development and commercialization of fuel cells as economic competitors for the internal combustion engine. Central to this goal is a safe feasible means of supplying hydrogen of the required purity to the vehicular fuel cell system. Two basic strategies are being considered: (1) on-board fuel processing whereby alternative fuels such as methanol, ethanol or natural gas stored on the vehicle undergo reformation and subsequent processing to produce hydrogen, and (2) on-board storage of pure hydrogen provided by stationary fuel processing plants. This report analyzes fuel processor technologies, types of fuel and fuel cell options for on-board reformation. As the Phase 1 of a multi-phased program to develop a prototype multi-fuel reformer system for a fuel cell powered vehicle, the objective of this program was to evaluate the feasibility of a multi-fuel reformer concept and to select a reforming technology for further development in the Phase 2 program, with the ultimate goal of integration with a DOE-designated fuel cell and vehicle configuration. The basic reformer processes examined in this study included catalytic steam reforming (SR), non-catalytic partial oxidation (POX) and catalytic partial oxidation (also known as Autothermal Reforming, or ATR). Fuels under consideration in this study included methanol, ethanol, and natural gas. A systematic evaluation of reforming technologies, fuels, and transportation fuel cell applications was conducted for the purpose of selecting a suitable multi-fuel processor for further development and demonstration in a transportation application.
A Framework for Joint Optical-Wireless Resource Management in Multi-RAT, Heterogeneous Mobile Networks

DEFF Research Database (Denmark)

Zakrzewska, Anna; Popovska Avramova, Andrijana; Christiansen, Henrik Lehrmann

2013-01-01

Mobile networks are constantly evolving: new Radio Access Technologies (RATs) are being introduced, and backhaul architectures like Cloud-RAN (C-RAN) and distributed base stations are being proposed. Furthermore, small cells are being deployed to enhance network capacity. The end-users wish...... to be always connected to a high-quality service (high bit rates, low latency), thus causing a very complex network control task from an operator’s point of view. We thus propose a framework allowing joint overall network resource management. This scheme covers different types of network heterogeneity (multi......-RAT, multi-layer, multi-architecture) by introducing a novel, hierarchical approach to network resource management. Self-Organizing Networks (SON) and cognitive network behaviors are covered as well as more traditional mobile network features. The framework is applicable to all phases of network operation...
Code compression for VLIW embedded processors

Science.gov (United States)

Piccinelli, Emiliano; Sannino, Roberto

2004-04-01

The implementation of processors for embedded systems implies various issues: main constraints are cost, power dissipation and die area. On the other side, new terminals perform functions that require more computational flexibility and effort. Long code streams must be loaded into memories, which are expensive and power consuming, to run on DSPs or CPUs. To overcome this issue, the "SlimCode" proprietary algorithm presented in this paper (patent pending technology) can reduce the dimensions of the program memory. It can run offline and work directly on the binary code the compiler generates, by compressing it and creating a new binary file, about 40% smaller than the original one, to be loaded into the program memory of the processor. The decompression unit will be a small ASIC, placed between the Memory Controller and the System bus of the processor, keeping unchanged the internal CPU architecture: this implies that the methodology is completely transparent to the core. We present comparisons versus the state-of-the-art IBM Codepack algorithm, along with its architectural implementation into the ST200 VLIW family core.
Dual-core Itanium Processor

CERN Multimedia

2006-01-01

Intel’s first dual-core Itanium processor, code-named "Montecito" is a major release of Intel's Itanium 2 Processor Family, which implements the Intel Itanium architecture on a dual-core processor with two cores per die (integrated circuit). Itanium 2 is much more powerful than its predecessor. It has lower power consumption and thermal dissipation.
Safety-critical Java on a time-predictable processor

DEFF Research Database (Denmark)

Korsholm, Stephan E.; Schoeberl, Martin; Puffitsch, Wolfgang

2015-01-01

For real-time systems the whole execution stack needs to be time-predictable and analyzable for the worst-case execution time (WCET). This paper presents a time-predictable platform for safety-critical Java. The platform consists of (1) the Patmos processor, which is a time-predictable processor......; (2) a C compiler for Patmos with support for WCET analysis; (3) the HVM, which is a Java-to-C compiler; (4) the HVM-SCJ implementation which supports SCJ Level 0, 1, and 2 (for both single and multicore platforms); and (5) a WCET analysis tool. We show that real-time Java programs translated to C...... and compiled to a Patmos binary can be analyzed by the AbsInt aiT WCET analysis tool. To the best of our knowledge the presented system is the second WCET analyzable real-time Java system; and the first one on top of a RISC processor....
Stepping motor control processor reference manual. Volume I

International Nuclear Information System (INIS)

Holloway, F.W.; VanArsdall, P.J.; Suski, G.J.; Gant, R.G.; Rash, M.

1980-01-01

This manual is intended to serve several purposes. The first goal is to describe the capabilities and operation of the SMC processor package from an operator or user point of view. Secondly, the manual will describe in some detail the basic hardware elements and how they can be used effectively to implement a step motor control system. Practical information on the use, installation and checkout of the hardware set is presented in the following sections along with programming suggestions. Available related system software is described in this manual for reference and as an aid in understanding the system architecture. Section two presents an overview and operations manual of the SMC processor describing its composition and functional capabilities. Section three contains hardware descriptions in some detail for the LLL-designed hardware used in the SMC processor. Basic theory of operation and important features are explained
HTGR core seismic analysis using an array processor

International Nuclear Information System (INIS)

Shatoff, H.; Charman, C.M.

1983-01-01

A Floating Point Systems array processor performs nonlinear dynamic analysis of the high-temperature gas-cooled reactor (HTGR) core with significant time and cost savings. The graphite HTGR core consists of approximately 8000 blocks of various shapes which are subject to motion and impact during a seismic event. Two-dimensional computer programs (CRUNCH2D, MCOCO) can perform explicit step-by-step dynamic analyses of up to 600 blocks for time-history motions. However, use of two-dimensional codes was limited by the large cost and run times required. Three-dimensional analysis of the entire core, or even a large part of it, had been considered totally impractical. Because of the needs of the HTGR core seismic program, a Floating Point Systems array processor was used to enhance computer performance of the two-dimensional core seismic computer programs, MCOCO and CRUNCH2D. This effort began by converting the computational algorithms used in the codes to a form which takes maximum advantage of the parallel and pipeline processors offered by the architecture of the Floating Point Systems array processor. The subsequent conversion of the vectorized FORTRAN coding to the array processor required a significant programming effort to make the system work on the General Atomic (GA) UNIVAC 1100/82 host. These efforts were quite rewarding, however, since the cost of running the codes has been reduced approximately 50-fold and the time threefold. The core seismic analysis with large two-dimensional models has now become routine and extension to three-dimensional analysis is feasible. These codes simulate the one-fifth-scale full-array HTGR core model. This paper compares the analysis with the test results for sine-sweep motion
Accelerating molecular dynamic simulation on the cell processor and Playstation 3.

Science.gov (United States)

Luttmann, Edgar; Ensign, Daniel L; Vaidyanathan, Vishal; Houston, Mike; Rimon, Noam; Øland, Jeppe; Jayachandran, Guha; Friedrichs, Mark; Pande, Vijay S

2009-01-30

Implementation of molecular dynamics (MD) calculations on novel architectures will vastly increase its power to calculate the physical properties of complex systems. Herein, we detail algorithmic advances developed to accelerate MD simulations on the Cell processor, a commodity processor found in PlayStation 3 (PS3). In particular, we discuss issues regarding memory access versus computation and the types of calculations which are best suited for streaming processors such as the Cell, focusing on implicit solvation models. We conclude with a comparison of improved performance on the PS3's Cell processor over more traditional processors. (c) 2008 Wiley Periodicals, Inc.
Architecture of the Multi-Modal Organizational Research and Production Heterogeneous Network (MORPHnet)

Energy Technology Data Exchange (ETDEWEB)

Aiken, R.J.; Carlson, R.A.; Foster, I.T. [and others

1997-01-01

The research and education (R&E) community requires persistent and scaleable network infrastructure to concurrently support production and research applications as well as network research. In the past, the R&E community has relied on supporting parallel network and end-node infrastructures, which can be very expensive and inefficient for network service managers and application programmers. The grand challenge in networking is to provide support for multiple, concurrent, multi-layer views of the network for the applications and the network researchers, and to satisfy the sometimes conflicting requirements of both while ensuring one type of traffic does not adversely affect the other. Internet and telecommunications service providers will also benefit from a multi-modal infrastructure, which can provide smoother transitions to new technologies and allow for testing of these technologies with real user traffic while they are still in the pre-production mode. The authors proposed approach requires the use of as much of the same network and end system infrastructure as possible to reduce the costs needed to support both classes of activities (i.e., production and research). Breaking the infrastructure into segments and objects (e.g., routers, switches, multiplexors, circuits, paths, etc.) gives the capability to dynamically construct and configure the virtual active networks to address these requirements. These capabilities must be supported at the campus, regional, and wide-area network levels to allow for collaboration by geographically dispersed groups. The Multi-Modal Organizational Research and Production Heterogeneous Network (MORPHnet) described in this report is an initial architecture and framework designed to identify and support the capabilities needed for the proposed combined infrastructure and to address related research issues.
Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

OpenAIRE

Catalán, Sandra; Igual, Francisco D.; Mayo, Rafael; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

2015-01-01

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest for low-power high performance computing, this type of architectures is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware ...
Multi-Agent Decision Support Tool to Enable Interoperability among Heterogeneous Energy Systems

Directory of Open Access Journals (Sweden)

Brígida Teixeira

2018-02-01

Full Text Available Worldwide electricity markets are undergoing a major restructuring process. One of the main reasons for the ongoing changes is to enable the adaptation of current market models to the new paradigm that arises from the large-scale integration of distributed generation sources. In order to deal with the unpredictability caused by the intermittent nature of the distributed generation and the large number of variables that contribute to the energy sector balance, it is extremely important to use simulation systems that are capable of dealing with the required complexity. This paper presents the Tools Control Center (TOOCC, a framework that allows the interoperability between heterogeneous energy and power simulation systems through the use of ontologies, allowing the simulation of scenarios with a high degree of complexity, through the cooperation of the individual capacities of each system. A case study based on real data is presented in order to demonstrate the interoperability capabilities of TOOCC. The simulation considers the energy management of a microgrid of a real university campus, from the perspective of the network manager and also of its consumers/producers, in a projection for a typical day of the winter of 2050.
A FAST AND ELITIST BI-OBJECTIVE EVOLUTIONARY ALGORITHM FOR SCHEDULING INDEPENDENT TASKS ON HETEROGENEOUS SYSTEMS

Directory of Open Access Journals (Sweden)

G.Subashini

2010-07-01

Full Text Available To meet the increasing computational demands, geographically distributed resources need to be logically coupled to make them work as a unified resource. In analyzing the performance of such distributed heterogeneous computing systems scheduling a set of tasks to the available set of resources for execution is highly important. Task scheduling being an NP-complete problem, use of metaheuristics is more appropriate in obtaining optimal solutions. Schedules thus obtained can be evaluated using several criteria that may conflict with one another which require multi objective problem formulation. This paper investigates the application of an elitist Nondominated Sorting Genetic Algorithm (NSGA-II, to efficiently schedule a set of independent tasks in a heterogeneous distributed computing system. The objectives considered in this paper include minimizing makespan and average flowtime simultaneously. The implementation of NSGA-II algorithm and Weighted-Sum Genetic Algorithm (WSGA has been tested on benchmark instances for distributed heterogeneous systems. As NSGA-II generates a set of Pareto optimal solutions, to verify the effectiveness of NSGA-II over WSGA a fuzzy based membership value assignment method is employed to choose the best compromise solution from the obtained Pareto solution set.
An Asynchronous Time-Division-Multiplexed Network-on-Chip for Real-Time Systems

DEFF Research Database (Denmark)

Kasapaki, Evangelia

is an important part of the T-CREST paltform and used in a number of configurations. The flexible timing organization of Argo combines asynchronous routers with mesochronous NIs, which are connected to individually clocked cores, supporting a GALS system organization. The mesochronous NIs operate at the same......Multi-processor architectures using networks-on-chip (NOCs) for communication are becoming the standard approach in the development of embedded systems and general purpose platforms. Typically, multi-processor platforms follow a globally asynchronous locally synchronous (GALS) timing organization....... This thesis focuses on the design of Argo, a NOC targeted at hard real-time multi-processor platforms with a GALS timing organization. To support real-time communication, NOCs establish end-to-end connections and provide latency and throughput guarantees for these connections. Argo uses time division...

Energy Efficiency Experiments on Samsung Exynos 5 Heterogeneous Multicore using OmpSs Task Based Programming

OpenAIRE

Holmgren, Rune

2015-01-01

This thesis explore the energy efficiency of task based programming with OpenMP SuperScalar (OmpSs) on the heterogeneous Samsung Exynos 5422 system on a chip. The system features small energy efficient cores, large high performance cores and a GPGPU, and OmpSs tasks were run on all three different processors. Experiments running a genetic algorithm and a Cholesky decomposition were used to gather results. The option of running applications on the energy efficient cores, on the high perfo...
Bulk-memory processor for data acquisition

International Nuclear Information System (INIS)

Nelson, R.O.; McMillan, D.E.; Sunier, J.W.; Meier, M.; Poore, R.V.

1981-01-01

To meet the diverse needs and data rate requirements at the Van de Graaff and Weapons Neutron Research (WNR) facilities, a bulk memory system has been implemented which includes a fast and flexible processor. This bulk memory processor (BMP) utilizes bit slice and microcode techniques and features a 24 bit wide internal architecture allowing direct addressing of up to 16 megawords of memory and histogramming up to 16 million counts per channel without overflow. The BMP is interfaced to the MOSTEK MK 8000 bulk memory system and to the standard MODCOMP computer I/O bus. Coding for the BMP both at the microcode level and with macro instructions is supported. The generalized data acquisition system has been extended to support the BMP in a manner transparent to the user
Investigating the effectiveness of many-core network processors for high performance cyber protection systems. Part I, FY2011.

Energy Technology Data Exchange (ETDEWEB)

Wheeler, Kyle Bruce; Naegle, John Hunt; Wright, Brian J.; Benner, Robert E., Jr.; Shelburg, Jeffrey Scott; Pearson, David Benjamin; Johnson, Joshua Alan; Onunkwo, Uzoma A.; Zage, David John; Patel, Jay S.

2011-09-01

This report documents our first year efforts to address the use of many-core processors for high performance cyber protection. As the demands grow for higher bandwidth (beyond 1 Gbits/sec) on network connections, the need to provide faster and more efficient solution to cyber security grows. Fortunately, in recent years, the development of many-core network processors have seen increased interest. Prior working experiences with many-core processors have led us to investigate its effectiveness for cyber protection tools, with particular emphasis on high performance firewalls. Although advanced algorithms for smarter cyber protection of high-speed network traffic are being developed, these advanced analysis techniques require significantly more computational capabilities than static techniques. Moreover, many locations where cyber protections are deployed have limited power, space and cooling resources. This makes the use of traditionally large computing systems impractical for the front-end systems that process large network streams; hence, the drive for this study which could potentially yield a highly reconfigurable and rapidly scalable solution.
Point and track-finding processors for multiwire chambers

CERN Document Server

Hansroul, M

1973-01-01

The hardware processors described below are designed to be used in conjunction with multi-wire chambers. They have the characteristic of being based on computational methods in contrast to analogue procedures. In a sense, they are hardware implementations of computer programs. But, being specially designed for their purpose, they are free of the restrictions imposed by the architecture of the computer on which the equivalent program is to run. The parallelism inherent in the algorithms can thus be fully exploited. Combined with the use of fast access scratch-pad memories and the non-sequential nature of the control program, the parallelism accounts for the fact that these processors are expected to execute 2-3 orders of magnitude faster than the equivalent Fortran programs on a CDC 7600 or 6600. As a consequence, methods which are simple and straightforward, but which are impractical because they require an exorbitant amount of computer time can on the contrary be very attractive for hardware implementation. ...
Teamwork in Multi-Agent Systems A Formal Approach

CERN Document Server

Dunin-Keplicz, Barbara Maria

2010-01-01

What makes teamwork tick?. Cooperation matters, in daily life and in complex applications. After all, many tasks need more than a single agent to be effectively performed. Therefore, teamwork rules!. Teams are social groups of agents dedicated to the fulfilment of particular persistent tasks. In modern multiagent environments, heterogeneous teams often consist of autonomous software agents, various types of robots and human beings. Teamwork in Multi-agent Systems: A Formal Approach explains teamwork rules in terms of agents' attitudes and their complex interplay. It provides the first comprehe
Graphical user interface for TOUGH/TOUGH2 - development of database, pre-processor, and post-processor

Energy Technology Data Exchange (ETDEWEB)

Sato, Tatsuya; Okabe, Takashi; Osato, Kazumi [Geothermal Energy Research and Development Co., Ltd., Tokyo (Japan)

1995-03-01

One of the advantages of the TOUGH/TOUGH2 (Pruess, 1987 and 1991) is the modeling using {open_quotes}free shape{close_quotes} polygonal blocks. However, the treatment of three-dimensional information, particularly for TOUGH/TOUGH2 is not easy because of the {open_quotes}free shape{close_quotes} polygonal blocks. Therefore, we have developed a database named {open_quotes}GEOBASE{close_quotes} and a pre/post-processor named {open_quotes}GEOGRAPH{close_quotes} for TOUGH/TOUGH2 on engineering work station (EWS). {open_quotes}GEOGRAPH{close_quotes} is based on the ORACLE{sup *1} relational database manager system to access data sets of surface exploration (geology, geophysics, geochemistry, etc.), drilling (well trajectory, geological column, logging, etc.), well testing (production test, injection test, interference test, tracer test, etc.) and production/injection history.{open_quotes}GEOGRAPH{close_quotes} consists of {open_quotes}Pre-processor{close_quotes} that can construct the three-dimensional free shape reservoir modeling by mouse operation on X-window and {open_quotes}Post-processor{close_quotes} that can display several kinds of two/three-dimensional maps and X-Y plots to compile data on {open_quotes}GEOBASE{close_quotes} and result of TOUGH/TOUGH2 calculation. This paper shows concept of the systems and examples of utilization.
A word processor optimized for preparing journal articles and student papers.

Science.gov (United States)

Wolach, A H; McHale, M A

2001-11-01

A new Windows-based word processor for preparing journal articles and student papers is described. In addition to standard features found in word processors, the present word processor provides specific help in preparing manuscripts. Clicking on "Reference Help (APA Form)" in the "File" menu provides a detailed help system for entering the references in a journal article. Clicking on "Examples and Explanations of APA Form" provides a help system with examples of the various sections of a review article, journal article that has one experiment, or journal article that has two or more experiments. The word processor can automatically place the manuscript page header and page number at the top of each page using the form required by APA and Psychonomic Society journals. The "APA Form" submenu of the "Help" menu provides detailed information about how the word processor is optimized for preparing articles and papers.
Development of level 2 processor for the readout of TMC

International Nuclear Information System (INIS)

Arai, Y.; Ikeno, M.; Murata, T.; Sudo, F.; Emura, T.

1995-01-01

We have developed a prototype 8-bit processor for the level 2 data processing for the Time Memory Cell (TMC). The first prototype processor successfully runs with 18 MHz clock. The operation of same clock frequency as TMC (30 MHz) will be easily achieved with simple modifications. Although the processor is very primitive one but shows its powerful performance and flexibility. To realize the compact TMC/L2P (Level 2 Processor) system, it is better to include the microcode memory within the chip. Encoding logic of the microcode must be included to reduce the microcode memory in this case. (J.P.N.)
The symbol coding language for the BUTs processor of in-core reactor control systems

International Nuclear Information System (INIS)

Vorob'ev, D.M.; Golovanov, M.N.; Levin, G.L.; Parfenova, T.K.; Filatov, V.P.

1978-01-01

A symbolic coding language is described; it has been developed for automation of making up programs for in-core control systems. The systems use the ideology of the CAMAC-VECTOR system and include the BUTs-20 processor. The symbolic coding language has been developed as a programming language of the ASSEMBLER type. Operators of instructions and pseudo-instructions, the rules of reading in the text of the source program, and operator record formats are considered
A Trade Study of Two Membrane-Aerated Biological Water Processors

Science.gov (United States)

Allada, Ram; Lange, Kevin; Vega. Leticia; Roberts, Michael S.; Jackson, Andrew; Anderson, Molly; Pickering, Karen

2011-01-01

Biologically based systems are under evaluation as primary water processors for next generation life support systems due to their low power requirements and their inherent regenerative nature. This paper will summarize the results of two recent studies involving membrane aerated biological water processors and present results of a trade study comparing the two systems with regards to waste stream composition, nutrient loading and system design. Results of optimal configurations will be presented.
Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

Science.gov (United States)

Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

1994-01-01

An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
Software verification and validation methodology for advanced digital reactor protection system using diverse dual processors to prevent common mode failure

International Nuclear Information System (INIS)

Son, Ki Chang; Shin, Hyun Kook; Lee, Nam Hoon; Baek, Seung Min; Kim, Hang Bae

2001-01-01

The Advanced Digital Reactor Protection System (ADRPS) with diverse dual processors is being developed by the National Research Lab of KOPEC for ADRPS development. One of the ADRPS goals is to develop digital Plant Protection System (PPS) free of Common Mode Failure (CMF). To prevent CMF, the principle of diversity is applied to both hardware design and software design. For the hardware diversity, two different types of CPUs are used for Bistable Processor and Local Coincidence Logic Processor. The VME based Single Board Computers (SBC) are used for the CPU hardware platforms. The QNX Operating System (OS) and the VxWorks OS are used for software diversity. Rigorous Software Verification and Validation (V and V) is also required to prevent CMF. In this paper, software V and V methodology for the ADRPS is described to enhance the ADRPS software reliability and to assure high quality of the ADRPS software
PPM-based System for Guided Waves Communication Through Corrosion Resistant Multi-wire Cables

Science.gov (United States)

Trane, G.; Mijarez, R.; Guevara, R.; Pascacio, D.

Novel wireless communication channels are a necessity in applications surrounded by harsh environments, for instance down-hole oil reservoirs. Traditional radio frequency (RF) communication schemes are not capable of transmitting signals through metal enclosures surrounded by corrosive gases and liquids. As an alternative to RF, a pulse position modulation (PPM) guided waves communication system has been developed and evaluated using a corrosion resistant 4H18 multi-wire cable, commonly used to descend electronic gauges in down-hole oil applications, as the communication medium. The system consists of a transmitter and a receiver that utilizes a PZT crystal, for electrical/mechanical coupling, attached to each extreme of the multi-wire cable. The modulator is based on a microcontroller, which transmits60 kHz guided wave pulses, and the demodulator is based on a commercial digital signal processor (DSP) module that performs real time DSP algorithms. Experimental results are presented, which were obtained using a 1m corrosion resistant 4H18multi-wire cable, commonly used with downhole electronic gauges in the oil sector. Although there was significant dispersion and multiple mode excitations of the transmitted guided wave energy pulses, the results show that data rates on the order of 500 bits per second are readily available employing PPM and simple communications techniques.
UTLEON3 Exploring Fine-Grain Multi-Threading in FPGAs

CERN Document Server

Daněk, Martin; Kohout, Lukáš; Sýkora, Jaroslav; Bartosinski, Roman

2013-01-01

This book describes a specification, microarchitecture, VHDL implementation and evaluation of a SPARC v8 CPU with fine-grain multi-threading, called micro-threading. The CPU, named UTLEON3, is an alternative platform for exploring CPU multi-threading that is compatible with the industry-standard GRLIB package. The processor microarchitecture was designed to map in an efficient way the data-flow scheme on a classical von Neumann pipelined processing used in common processors, while retaining full binary compatibility with existing legacy programs. Describes and documents a working SPARC v8, with fine-grain multithreading and fast context switch; Provides VHDL sources for the described processor; Describes a latency-tolerant framework for coupling hardware accelerators to microthreaded processor pipelines; Includes programming by example in the micro-threaded assembly language.
Numeric processor and text manipulator for the ''MASTER CONTROL'' data-base-management system

International Nuclear Information System (INIS)

Kuhn, R.W.

1976-01-01

The numeric and text processor of the MASTER CONTROL (MCP) data-base-management system permits the user to define fields and arrays that are functionally dependent on the data retained in a data base. This allows the storage of only the essential and unique information and data, and the calculation of derivable quantities as required. The derived quantity can be expressed as an arithmetic expression, that is, a functional relationship. Functions can be multiply subscripted and can be embedded within other functions at up to 58 levels. They can be stored either semi-permanently in a repertoire of functional relations, or they can be defined interactively from a terminal and used immediately for searching on the derived value. The processor also permits the conversion of literal strings into numbers, and vice versa. In addition, the user can define dictionaries that allow the expansion of keyed sentinels associated with records in the data base into fully descriptive expressions. This option can be used for cost-effective searching and data compaction. The functional definitions are reduced to Polish notation and stored in a disk file from which they are either retrieved on demand and evaluated according to the data of records specified or used in any given MASTER CONTROL command. The language used for the definitions of the numeric processor is essentially FORTRAN; most of the standard functions and over two dozen special functions are thus available. The functional processor provides a powerful technique for the integration of text and data for energy research and for scientific and technological work in general. MASTER CONTROL is operational at the Lawrence Livermore Laboratory (LLL) and at the Los Alamos Scientific Laboratory (LASL). 6 figures, 1 table
Performance and Operational Experience with the Heterogeneous Farm of the ATLAS Trigger and Data Acquisition System

CERN Document Server

Garelli, N; The ATLAS collaboration; Vandelli, W

2011-01-01

The ATLAS trigger and data acquisition (TDAQ) is a distributed, multi trigger level, data-acquisition system, mostly made of off-the-shelf processing units organized in a farm. In its final configuration the system will account more than 2000 nodes, sporting heterogeneous capabilities and network connections, due to the TDAQ program for rolling expansions and upgrades. In this paper we present how we dealt with the farm heterogeneity during the proton-proton collisions of 2010 and 2011: a period characterized by changing working conditions, and constantly increasing LHC instantaneous luminosity. We describe a graphical tool to balance the computing-power and bandwidth sharing across the trigger farms, a data-flow monitoring daemon that provides high-level resource-aware data-flow operational information and the evolution of data-flow communication protocols.
The Molen Polymorphic Media Processor

NARCIS (Netherlands)

Kuzmanov, G.K.

2004-01-01

In this dissertation, we address high performance media processing based on a tightly coupled co-processor architectural paradigm. More specifically, we introduce a reconfigurable media augmentation of a general purpose processor and implement it into a fully operational processor prototype. The
Resource allocation in heterogeneous cloud radio access networks: advances and challenges

KAUST Repository

Dahrouj, Hayssam; Douik, Ahmed S.; Dhifallah, Oussama Najeeb; Al-Naffouri, Tareq Y.; Alouini, Mohamed-Slim

2015-01-01

, becomes a necessity. By connecting all the base stations from different tiers to a central processor (referred to as the cloud) through wire/wireline backhaul links, the heterogeneous cloud radio access network, H-CRAN, provides an open, simple
A Low Power, Parallel Wearable Multi-Sensor System for Human Activity Evaluation.

Science.gov (United States)

Li, Yuecheng; Jia, Wenyan; Yu, Tianjian; Luan, Bo; Mao, Zhi-Hong; Zhang, Hong; Sun, Mingui

2015-04-01

In this paper, the design of a low power heterogeneous wearable multi-sensor system, built with Zynq System-on-Chip (SoC), for human activity evaluation is presented. The powerful data processing capability and flexibility of this SoC represent significant improvements over our previous ARM based system designs. The new system captures and compresses multiple color images and sensor data simultaneously. Several strategies are adopted to minimize power consumption. Our wearable system provides a new tool for the evaluation of human activity, including diet, physical activity and lifestyle.
Multicriterion problem of allocation of resources in the heterogeneous distributed information processing systems

Science.gov (United States)

Antamoshkin, O. A.; Kilochitskaya, T. R.; Ontuzheva, G. A.; Stupina, A. A.; Tynchenko, V. S.

2018-05-01

This study reviews the problem of allocation of resources in the heterogeneous distributed information processing systems, which may be formalized in the form of a multicriterion multi-index problem with the linear constraints of the transport type. The algorithms for solution of this problem suggest a search for the entire set of Pareto-optimal solutions. For some classes of hierarchical systems, it is possible to significantly speed up the procedure of verification of a system of linear algebraic inequalities for consistency due to the reducibility of them to the stream models or the application of other solution schemes (for strongly connected structures) that take into account the specifics of the hierarchies under consideration.

Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments

Directory of Open Access Journals (Sweden)

Sonia Yassa

2013-01-01

Full Text Available We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach.
Multi-Objective Approach for Energy-Aware Workflow Scheduling in Cloud Computing Environments

Science.gov (United States)

Kadima, Hubert; Granado, Bertrand

2013-01-01

We address the problem of scheduling workflow applications on heterogeneous computing systems like cloud computing infrastructures. In general, the cloud workflow scheduling is a complex optimization problem which requires considering different criteria so as to meet a large number of QoS (Quality of Service) requirements. Traditional research in workflow scheduling mainly focuses on the optimization constrained by time or cost without paying attention to energy consumption. The main contribution of this study is to propose a new approach for multi-objective workflow scheduling in clouds, and present the hybrid PSO algorithm to optimize the scheduling performance. Our method is based on the Dynamic Voltage and Frequency Scaling (DVFS) technique to minimize energy consumption. This technique allows processors to operate in different voltage supply levels by sacrificing clock frequencies. This multiple voltage involves a compromise between the quality of schedules and energy. Simulation results on synthetic and real-world scientific applications highlight the robust performance of the proposed approach. PMID:24319361
Improved multi-stage neonatal seizure detection using a heuristic classifier and a data-driven post-processor.

Science.gov (United States)

Ansari, A H; Cherian, P J; Dereymaeker, A; Matic, V; Jansen, K; De Wispelaere, L; Dielman, C; Vervisch, J; Swarte, R M; Govaert, P; Naulaers, G; De Vos, M; Van Huffel, S

2016-09-01

After identifying the most seizure-relevant characteristics by a previously developed heuristic classifier, a data-driven post-processor using a novel set of features is applied to improve the performance. The main characteristics of the outputs of the heuristic algorithm are extracted by five sets of features including synchronization, evolution, retention, segment, and signal features. Then, a support vector machine and a decision making layer remove the falsely detected segments. Four datasets including 71 neonates (1023h, 3493 seizures) recorded in two different university hospitals, are used to train and test the algorithm without removing the dubious seizures. The heuristic method resulted in a false alarm rate of 3.81 per hour and good detection rate of 88% on the entire test databases. The post-processor, effectively reduces the false alarm rate by 34% while the good detection rate decreases by 2%. This post-processing technique improves the performance of the heuristic algorithm. The structure of this post-processor is generic, improves our understanding of the core visually determined EEG features of neonatal seizures and is applicable for other neonatal seizure detectors. The post-processor significantly decreases the false alarm rate at the expense of a small reduction of the good detection rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
New system applying image processor to automatically separate cation exchange resin and anion exchange resin for condensate demineralizer

International Nuclear Information System (INIS)

Adachi, Tsuneyasu; Nagao, Nobuaki; Yoshimori, Yasuhide; Inoue, Takashi; Yoda, Shuji

2014-01-01

In PWR plant, condensate demineralizer is equipped to remove corrosive ion in condensate water. Mixed bed packing cation exchange resin (CER) and anion exchange resin (AER) is generally applied, and these are regenerated after separation to each layer periodically. Since the AER particle is slightly lighter than the CER particle, the AER layer is brought up onto the CER layer by feeding water upward from the bottom of column (backwashing). The separation performance is affected by flow rate and temperature of water for backwashing, so normally operators set the proper condition parameters regarding separation manually every time for regeneration. The authors have developed the new separation system applying CCD camera and image processor. The system is comprised of CCD camera, LED lamp, image processor, controller, flow control valves and background color panel. Blue color of the panel, which is corresponding to the complementary color against both ivory color of AER and brown color of CER, is key to secure the system precision. At first the color image of the CER via the CCD camera is digitized and memorized by the image processor. The color of CER in the field of vision of the camera is scanned by the image processor, and the position where the maximum difference of digitized color index is indicated is judged as the interface. The detected interface is able to make the accordance with the set point by adjusting the flow rate of backwashing. By adopting the blue background panel, it is also possible to draw the AER out of the column since detecting the interface of the CER clearly. The system has provided the reduction of instability factor concerning separation of resin during regeneration process. The system has been adopted in two PWR plants in Japan, it has been demonstrating its stable and precise performance. (author)
The usefulness of multi-well aquifer tests in heterogeneous aquifers

International Nuclear Information System (INIS)

Young, S.C.; Benton, D.J.; Herweijer, J.C.; Sims, P.

1990-01-01

Three large-scale (100 m) and seven small-scale (3-7 m) multi-well aquifer tests were conducted in a heterogeneous aquifer to determine the transmissivity distribution across a one-hectare test site. Two of the large-scale tests had constant but different rates of discharge; the remaining large-scale test had a discharge that was pulsed at regulated intervals. The small-scale tests were conducted at two well clusters 20 m apart. The program WELTEST was written to analyze the data. By using the methods of non-linear least squares regression analysis and Broyden's method to solve for non-linear extrema, WELTEST automatically determines the best values of transmissivity and the storage coefficient. The test results show that order of magnitude differences in the calculated transmissivities at a well location can be realized by varying the discharge rate at the pumping well, the duration of the aquifer test, and/or the location of the pumping well. The calculated storage coefficients for the tests cover a five-order magnitude range. The data show a definite trend for the storage coefficient to increase with the distance between the pumping and the observation wells. This trend is shown to be related to the orientation of high hydraulic conductivity zones between the pumping and the observation wells. A comparison among single-well aquifer tests, geological investigations and multi-well aquifer tests indicate that the multi-well tests are poorly suited for characterizing a transmissivity field. (Author) (11 refs., 14 figs.)
Efficient Use of Distributed Systems for Scientific Applications

Science.gov (United States)

Taylor, Valerie; Chen, Jian; Canfield, Thomas; Richard, Jacques

2000-01-01

Distributed computing has been regarded as the future of high performance computing. Nationwide high speed networks such as vBNS are becoming widely available to interconnect high-speed computers, virtual environments, scientific instruments and large data sets. One of the major issues to be addressed with distributed systems is the development of computational tools that facilitate the efficient execution of parallel applications on such systems. These tools must exploit the heterogeneous resources (networks and compute nodes) in distributed systems. This paper presents a tool, called PART, which addresses this issue for mesh partitioning. PART takes advantage of the following heterogeneous system features: (1) processor speed; (2) number of processors; (3) local network performance; and (4) wide area network performance. Further, different finite element applications under consideration may have different computational complexities, different communication patterns, and different element types, which also must be taken into consideration when partitioning. PART uses parallel simulated annealing to partition the domain, taking into consideration network and processor heterogeneity. The results of using PART for an explicit finite element application executing on two IBM SPs (located at Argonne National Laboratory and the San Diego Supercomputer Center) indicate an increase in efficiency by up to 36% as compared to METIS, a widely used mesh partitioning tool. The input to METIS was modified to take into consideration heterogeneous processor performance; METIS does not take into consideration heterogeneous networks. The execution times for these applications were reduced by up to 30% as compared to METIS. These results are given in Figure 1 for four irregular meshes with number of elements ranging from 30,269 elements for the Barth5 mesh to 11,451 elements for the Barth4 mesh. Future work with PART entails using the tool with an integrated application requiring
LORIS: A web-based data management system for multi-center studies.

Directory of Open Access Journals (Sweden)

Samir eDas

2012-01-01

Full Text Available LORIS (Longitudinal Online Research and Imaging System is a modular and extensible web-based data management system that integrates all aspects of a multi-center study: from heterogeneous data acquisition (imaging, clinical, behavior, genetics to storage, processing and ultimately dissemination. It provides a secure, user-friendly, and streamlined platform to automate the flow of clinical trials and complex multi-center studies. A subject-centric internal organization allows researchers to capture and subsequently extract all information, longitudinal or cross-sectional, from any subset of the study cohort. Extensive error-checking and quality control procedures, security, data management, data querying and administrative functions provide LORIS with a triple capability (i continuous project coordination and monitoring of data acquisition (ii data storage/cleaning/querying, (iii interface with arbitrary external data processing pipelines. LORIS is a complete solution that has been thoroughly tested through the full life cycle of a multi-center longitudinal project# and is now supporting numerous neurodevelopment and neurodegeneration research projects internationally.
Quantitative multi-scale analysis of mineral distributions and fractal pore structures for a heterogeneous Junger Basin shale

International Nuclear Information System (INIS)

Wang, Y.D.; Ren, Y.Q.; Hu, T.; Deng, B.; Xiao, T.Q.; Liu, K.Y.; Yang, Y.S.

2016-01-01

Three dimensional (3D) characterization of shales has recently attracted wide attentions in relation to the growing importance of shale oil and gas. Obtaining a complete 3D compositional distribution of shale has proven to be challenging due to its multi-scale characteristics. A combined multi-energy X-ray micro-CT technique and data-constrained modelling (DCM) approach has been used to quantitatively investigate the multi-scale mineral and porosity distributions of a heterogeneous shale from the Junger Basin, northwestern China by sub-sampling. The 3D sub-resolution structures of minerals and pores in the samples are quantitatively obtained as the partial volume fraction distributions, with colours representing compositions. The shale sub-samples from two areas have different physical structures for minerals and pores, with the dominant minerals being feldspar and dolomite, respectively. Significant heterogeneities have been observed in the analysis. The sub-voxel sized pores form large interconnected clusters with fractal structures. The fractal dimensions of the largest clusters for both sub-samples were quantitatively calculated and found to be 2.34 and 2.86, respectively. The results are relevant in quantitative modelling of gas transport in shale reservoirs
Implementing a low-latency parallel graphic equalizer with heterogeneous computing

NARCIS (Netherlands)

Norilo, Vesa; Verstraelen, Martinus Johannes Wilhelmina; Valimaki, Vesa; Svensson, Peter; Kristiansen, Ulf

2015-01-01

This paper describes the implementation of a recently introduced parallel graphic equalizer (PGE) in a heterogeneous way. The control and audio signal processing parts of the PGE are distributed to a PC and to a signal processor, of WaveCore architecture, respectively. This arrangement is
Application of evolutionary algorithms for multi-objective optimization in VLSI and embedded systems

CERN Document Server

2015-01-01

This book describes how evolutionary algorithms (EA), including genetic algorithms (GA) and particle swarm optimization (PSO) can be utilized for solving multi-objective optimization problems in the area of embedded and VLSI system design. Many complex engineering optimization problems can be modelled as multi-objective formulations. This book provides an introduction to multi-objective optimization using meta-heuristic algorithms, GA and PSO, and how they can be applied to problems like hardware/software partitioning in embedded systems, circuit partitioning in VLSI, design of operational amplifiers in analog VLSI, design space exploration in high-level synthesis, delay fault testing in VLSI testing, and scheduling in heterogeneous distributed systems. It is shown how, in each case, the various aspects of the EA, namely its representation, and operators like crossover, mutation, etc. can be separately formulated to solve these problems. This book is intended for design engineers and researchers in the field ...
A multi-attribute vertical handoff scheme for heterogeneous wireless networks

Directory of Open Access Journals (Sweden)

JI Xiaolong

2014-04-01

Full Text Available In order to meet the user demand for different services as well as to mitigate the Ping-pong effect caused by vertical handoff for wireless network,a multi-attribute vertical handoff scheme for heterogeneous wireless network is proposed.In the algorithm,a fuzzy logic method is used to make pre-decision.The optimal handoff target network is selected by a cost function of network which uses an Analytic Hierarchy Process to calculate the weights of SNR,delay,cost and user preference in different business scenarios.Simulation is performed in the environment which is overlapped by WiMAX and UMTS networks.Results show that the proposed approach can effectively reduce the number of handoff and power consumption in a condition to satisfy the user needs.
Alternative Water Processor Test Development

Science.gov (United States)

Pickering, Karen D.; Mitchell, Julie; Vega, Leticia; Adam, Niklas; Flynn, Michael; Wjee (er. Rau); Lunn, Griffin; Jackson, Andrew

2012-01-01

The Next Generation Life Support Project is developing an Alternative Water Processor (AWP) as a candidate water recovery system for long duration exploration missions. The AWP consists of biological water processor (BWP) integrated with a forward osmosis secondary treatment system (FOST). The basis of the BWP is a membrane aerated biological reactor (MABR), developed in concert with Texas Tech University. Bacteria located within the MABR metabolize organic material in wastewater, converting approximately 90% of the total organic carbon to carbon dioxide. In addition, bacteria convert a portion of the ammonia-nitrogen present in the wastewater to nitrogen gas, through a combination of nitrogen and denitrification. The effluent from the BWP system is low in organic contaminants, but high in total dissolved solids. The FOST system, integrated downstream of the BWP, removes dissolved solids through a combination of concentration-driven forward osmosis and pressure driven reverse osmosis. The integrated system is expected to produce water with a total organic carbon less than 50 mg/l and dissolved solids that meet potable water requirements for spaceflight. This paper describes the test definition, the design of the BWP and FOST subsystems, and plans for integrated testing.
Discussion paper for a highly parallel array processor-based machine

International Nuclear Information System (INIS)

Hagstrom, R.; Bolotin, G.; Dawson, J.

1984-01-01

The architectural plant for a quickly realizable implementation of a highly parallel special-purpose computer system with peak performance in the range of 6 billion floating point operations per second is discussed. The architecture is suitable to Lattice Gauge theoretical computations of fundamental physics interest and may be applicable to a range of other problems which deal with numerically intensive computational problems. The plan is quickly realizable because it employs a maximum of commercially available hardware subsystems and because the architecture is software-transparent to the individual processors, allowing straightforward re-use of whatever commercially available operating-systems and support software that is suitable to run on the commercially-produced processors. A tiny prototype instrument, designed along this architecture has already operated. A few elementary examples of programs which can run efficiently are presented. The large machine which the authors would propose to build would be based upon a highly competent array-processor, the ST-100 Array Processor, and specific design possibilities are discussed. The first step toward realizing this plan practically is to install a single ST-100 to allow algorithm development to proceed while a demonstration unit is built using two of the ST-100 Array Processors
Direction-sensitive smart monitoring of structures using heterogeneous smartphone sensor data and coordinate system transformation

Science.gov (United States)

Ozer, Ekin; Feng, Maria Q.

2017-04-01

Mobile, heterogeneous, and smart sensor networks produce pervasive structural health monitoring (SHM) information. With various embedded sensors, smartphones have emerged to innovate SHM by empowering citizens to serve as sensors. By default, smartphones meet the fundamental smart sensor criteria, thanks to the built-in processor, memory, wireless communication units and mobile operating system. SHM using smartphones, however, faces technical challenges due to citizen-induced uncertainties, undesired sensor-structure integration, and lack of control over the sensing platform. Previously, the authors presented successful applications of smartphone accelerometers for structural vibration measurement and proposed a monitoring framework under citizen-induced spatiotemporal uncertainties. This study aims at extending the capabilities of smartphone-based SHM with a special focus on the lack of control over the sensor (i.e., the phone) positioning by citizens resulting in unknown sensor orientations. Using smartphone gyroscope, accelerometer, and magnetometer; instantaneous sensor orientation can be obtained with respect to gravitational and magnetic north directions. Using these sensor data, mobile operating system frameworks return processed features such as attitude and heading that can be used to correct misaligned sensor signals. For this purpose, a coordinate transformation procedure is proposed and illustrated on a two-story laboratory structural model and real-scale bridges with various sensor positioning examples. The proposed method corrects the sensor signals by tracking their orientations and improves measurement accuracy. Moreover, knowing structure’s coordinate system a priori, even the data from arbitrarily positioned sensors can automatically be transformed to the structural coordinates. In addition, this paper also touches some secondary mobile and heterogeneous data issues including imperfect sampling and geolocation services. The coordinate system
Satellite on-board real-time SAR processor prototype

Science.gov (United States)

Bergeron, Alain; Doucet, Michel; Harnisch, Bernd; Suess, Martin; Marchese, Linda; Bourqui, Pascal; Desnoyers, Nicholas; Legros, Mathieu; Guillot, Ludovic; Mercier, Luc; Châteauneuf, François

2017-11-01

A Compact Real-Time Optronic SAR Processor has been successfully developed and tested up to a Technology Readiness Level of 4 (TRL4), the breadboard validation in a laboratory environment. SAR, or Synthetic Aperture Radar, is an active system allowing day and night imaging independent of the cloud coverage of the planet. The SAR raw data is a set of complex data for range and azimuth, which cannot be compressed. Specifically, for planetary missions and unmanned aerial vehicle (UAV) systems with limited communication data rates this is a clear disadvantage. SAR images are typically processed electronically applying dedicated Fourier transformations. This, however, can also be performed optically in real-time. Originally the first SAR images were optically processed. The optical Fourier processor architecture provides inherent parallel computing capabilities allowing real-time SAR data processing and thus the ability for compression and strongly reduced communication bandwidth requirements for the satellite. SAR signal return data are in general complex data. Both amplitude and phase must be combined optically in the SAR processor for each range and azimuth pixel. Amplitude and phase are generated by dedicated spatial light modulators and superimposed by an optical relay set-up. The spatial light modulators display the full complex raw data information over a two-dimensional format, one for the azimuth and one for the range. Since the entire signal history is displayed at once, the processor operates in parallel yielding real-time performances, i.e. without resulting bottleneck. Processing of both azimuth and range information is performed in a single pass. This paper focuses on the onboard capabilities of the compact optical SAR processor prototype that allows in-orbit processing of SAR images. Examples of processed ENVISAT ASAR images are presented. Various SAR processor parameters such as processing capabilities, image quality (point target analysis), weight and
The Serial Link Processor for the Fast TracKer (FTK) processor at ATLAS

CERN Document Server

Biesuz, Nicolo Vladi; The ATLAS collaboration; Luciano, Pierluigi; Magalotti, Daniel; Rossi, Enrico

2015-01-01

The Associative Memory (AM) system of the Fast Tracker (FTK) processor has been designed to perform pattern matching using the hit information of the ATLAS experiment silicon tracker. The AM is the heart of FTK and is mainly based on the use of ASICs (AM chips) designed to execute pattern matching with a high degree of parallelism. The AM system finds track candidates at low resolution that are seeds for a full resolution track fitting. To solve the very challenging data traffic problems inside FTK, multiple board and chip designs have been performed. The currently proposed solution is named the “Serial Link Processor” and is based on an extremely powerful network of 828 2 Gbit/s serial links for a total in/out bandwidth of 56 Gb/s. This paper reports on the design of the Serial Link Processor consisting of two types of boards, the Local Associative Memory Board (LAMB), a mezzanine where the AM chips are mounted, and the Associative Memory Board (AMB), a 9U VME board which holds and exercises four LAMBs. ...
Architecture of high reliable control systems using complex software

International Nuclear Information System (INIS)

Tallec, M.

1990-01-01

The problems involved by the use of complex softwares in control systems that must insure a very high level of safety are examined. The first part makes a brief description of the prototype of PROSPER system. PROSPER means protection system for nuclear reactor with high performances. It has been installed on a French nuclear power plant at the beginnning of 1987 and has been continually working since that time. This prototype is realized on a multi-processors system. The processors communicate between themselves using interruptions and protected shared memories. On each processor, one or more protection algorithms are implemented. Those algorithms use data coming directly from the plant and, eventually, data computed by the other protection algorithms. Each processor makes its own acquisitions from the process and sends warning messages if some operating anomaly is detected. All algorithms are activated concurrently on an asynchronous way. The results are presented and the safety related problems are detailed. - The second part is about measurements' validation. First, we describe how the sensors' measurements will be used in a protection system. Then, a proposal for a method based on the techniques of artificial intelligence (expert systems and neural networks) is presented. - The last part is about the problems of architectures of systems including hardware and software: the different types of redundancies used till now and a proposition of a multi-processors architecture which uses an operating system that is able to manage several tasks implemented on different processors, which verifies the good operating of each of those tasks and of the related processors and which allows to carry on the operation of the system, even in a degraded manner when a failure has been detected are detailed [fr
Robust mechanobiological behavior emerges in heterogeneous myosin systems

Science.gov (United States)

Egan, Paul F.; Moore, Jeffrey R.; Ehrlicher, Allen J.; Weitz, David A.; Schunn, Christian; Cagan, Jonathan; LeDuc, Philip

2017-09-01

Biological complexity presents challenges for understanding natural phenomenon and engineering new technologies, particularly in systems with molecular heterogeneity. Such complexity is present in myosin motor protein systems, and computational modeling is essential for determining how collective myosin interactions produce emergent system behavior. We develop a computational approach for altering myosin isoform parameters and their collective organization, and support predictions with in vitro experiments of motility assays with α-actinins as molecular force sensors. The computational approach models variations in single myosin molecular structure, system organization, and force stimuli to predict system behavior for filament velocity, energy consumption, and robustness. Robustness is the range of forces where a filament is expected to have continuous velocity and depends on used myosin system energy. Myosin systems are shown to have highly nonlinear behavior across force conditions that may be exploited at a systems level by combining slow and fast myosin isoforms heterogeneously. Results suggest some heterogeneous systems have lower energy use near stall conditions and greater energy consumption when unloaded, therefore promoting robustness. These heterogeneous system capabilities are unique in comparison with homogenous systems and potentially advantageous for high performance bionanotechnologies. Findings open doors at the intersections of mechanics and biology, particularly for understanding and treating myosin-related diseases and developing approaches for motor molecule-based technologies.
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.

Science.gov (United States)

Scharfe, Michael; Pielot, Rainer; Schreiber, Falk

2010-01-11

Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.
Cross-layer Modelling for Heterogeneous MPSoCs

DEFF Research Database (Denmark)

Madsen, Jan

2005-01-01

of different mappings of tasks to processors (software or hardware) including memory usage, and effects of RTOS selection, including scheduling, synchronization and resource allocation policies. In this presentation we focus on the programmer’s view illustrated through a design space exploration of a multi...

Monte Carlo dose calculation using a cell processor based PlayStation 3 system

International Nuclear Information System (INIS)

Chow, James C L; Lam, Phil; Jaffray, David A

2012-01-01

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR G ET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
Monte Carlo dose calculation using a cell processor based PlayStation 3 system

Science.gov (United States)

Chow, James C. L.; Lam, Phil; Jaffray, David A.

2012-02-01

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR_GET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
Monte Carlo dose calculation using a cell processor based PlayStation 3 system

Energy Technology Data Exchange (ETDEWEB)

Chow, James C L; Lam, Phil; Jaffray, David A, E-mail: james.chow@rmp.uhn.on.ca [Department of Radiation Oncology, University of Toronto and Radiation Medicine Program, Princess Margaret Hospital, University Health Network, Toronto, Ontario M5G 2M9 (Canada)

2012-02-09

This study investigates the performance of the EGSnrc computer code coupled with a Cell-based hardware in Monte Carlo simulation of radiation dose in radiotherapy. Performance evaluations of two processor-intensive functions namely, HOWNEAR and RANMAR{sub G}ET in the EGSnrc code were carried out basing on the 20-80 rule (Pareto principle). The execution speeds of the two functions were measured by the profiler gprof specifying the number of executions and total time spent on the functions. A testing architecture designed for Cell processor was implemented in the evaluation using a PlayStation3 (PS3) system. The evaluation results show that the algorithms examined are readily parallelizable on the Cell platform, provided that an architectural change of the EGSnrc was made. However, as the EGSnrc performance was limited by the PowerPC Processing Element in the PS3, PC coupled with graphics processing units or GPCPU may provide a more viable avenue for acceleration.
High performance graphics processors for medical imaging applications

International Nuclear Information System (INIS)

Goldwasser, S.M.; Reynolds, R.A.; Talton, D.A.; Walsh, E.S.

1989-01-01

This paper describes a family of high- performance graphics processors with special hardware for interactive visualization of 3D human anatomy. The basic architecture expands to multiple parallel processors, each processor using pipelined arithmetic and logical units for high-speed rendering of Computed Tomography (CT), Magnetic Resonance (MR) and Positron Emission Tomography (PET) data. User-selectable display alternatives include multiple 2D axial slices, reformatted images in sagittal or coronal planes and shaded 3D views. Special facilities support applications requiring color-coded display of multiple datasets (such as radiation therapy planning), or dynamic replay of time- varying volumetric data (such as cine-CT or gated MR studies of the beating heart). The current implementation is a single processor system which generates reformatted images in true real time (30 frames per second), and shaded 3D views in a few seconds per frame. It accepts full scale medical datasets in their native formats, so that minimal preprocessing delay exists between data acquisition and display
Performance and operational experience with the heterogeneous farm of the ATLAS Trigger and Data Acquisition system.

CERN Document Server

Garelli, N; The ATLAS collaboration; Vandelli, W

2011-01-01

The ATLAS trigger and data acquisition (TDAQ) is a distributed, multi trigger level, data-acquisition system, mostly made of off-the-shelf processing units organized in a farm. In its final configuration the system will account more than 2000 nodes, sporting heterogeneous capabilities and network connectivities, due to the TDAQ program for rolling expansions and upgrades. In this paper we will present how we dealt with the farm heterogeneity during the proton-proton collisions of 2010 and 2011: a period characterized by changing working conditions, and constantly increasing LHC instantaneous luminosity. We will describe a graphical tool to show, control, modify and balance the computing-power and bandwidth sharing across the trigger farms, a data-flow monitoring daemon which provides a high-level resource-aware data-flow operational information, and the evolution of data-flow communication protocols.
Multiple optical code-label processing using multi-wavelength frequency comb generator and multi-port optical spectrum synthesizer.

Science.gov (United States)

Moritsuka, Fumi; Wada, Naoya; Sakamoto, Takahide; Kawanishi, Tetsuya; Komai, Yuki; Anzai, Shimako; Izutsu, Masayuki; Kodate, Kashiko

2007-06-11

In optical packet switching (OPS) and optical code division multiple access (OCDMA) systems, label generation and processing are key technologies. Recently, several label processors have been proposed and demonstrated. However, in order to recognize N different labels, N separate devices are required. Here, we propose and experimentally demonstrate a large-scale, multiple optical code (OC)-label generation and processing technology based on multi-port, a fully tunable optical spectrum synthesizer (OSS) and a multi-wavelength electro-optic frequency comb generator. The OSS can generate 80 different OC-labels simultaneously and can perform 80-parallel matched filtering. We also demonstrated its application to OCDMA.
RISC Processors and High Performance Computing

Science.gov (United States)

Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

1995-01-01

This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.
JPP: A Java Pre-Processor

OpenAIRE

Kiniry, Joseph R.; Cheong, Elaine

1998-01-01

The Java Pre-Processor, or JPP for short, is a parsing pre-processor for the Java programming language. Unlike its namesake (the C/C++ Pre-Processor, cpp), JPP provides functionality above and beyond simple textual substitution. JPP's capabilities include code beautification, code standard conformance checking, class and interface specification and testing, and documentation generation.
Two-dimensional optoelectronic interconnect-processor and its operational bit error rate

Science.gov (United States)

Liu, J. Jiang; Gollsneider, Brian; Chang, Wayne H.; Carhart, Gary W.; Vorontsov, Mikhail A.; Simonis, George J.; Shoop, Barry L.

2004-10-01

Two-dimensional (2-D) multi-channel 8x8 optical interconnect and processor system were designed and developed using complementary metal-oxide-semiconductor (CMOS) driven 850-nm vertical-cavity surface-emitting laser (VCSEL) arrays and the photodetector (PD) arrays with corresponding wavelengths. We performed operation and bit-error-rate (BER) analysis on this free-space integrated 8x8 VCSEL optical interconnects driven by silicon-on-sapphire (SOS) circuits. Pseudo-random bit stream (PRBS) data sequence was used in operation of the interconnects. Eye diagrams were measured from individual channels and analyzed using a digital oscilloscope at data rates from 155 Mb/s to 1.5 Gb/s. Using a statistical model of Gaussian distribution for the random noise in the transmission, we developed a method to compute the BER instantaneously with the digital eye-diagrams. Direct measurements on this interconnects were also taken on a standard BER tester for verification. We found that the results of two methods were in the same order and within 50% accuracy. The integrated interconnects were investigated in an optoelectronic processing architecture of digital halftoning image processor. Error diffusion networks implemented by the inherently parallel nature of photonics promise to provide high quality digital halftoned images.
Clinical heterogeneity in Fabry disease

Directory of Open Access Journals (Sweden)

G. N. Salogub

2015-01-01

Full Text Available Fabry disease is an X-linked, lysosomal storage disease (OMIM: 301500, caused by α-galactosidase A deficiency, resulting in accumulation of its substrates, glycosphingolipids, primarily – globotriaosylceramide, in the lysosomes of multiple cell types with multi-system clinical manifestations, even within the same family, including abnormalities of the central and peripheral nervous system, kidneys, heart, gastrointestinal tract, lungs, organ of vision. Clinical heterogeneity is often the reason of the delayed diagnosis. Nowadays enzyme replacement therapy has proved its efficiency in the treatment of Fabry disease. Including Fabry disease in the differential diagnosis of a large range of disorders is important because of its wide clinical heterogeneity and the possibility of an earlier intervention with a beneficial treatment.
Design of 10Gbps optical encoder/decoder structure for FE-OCDMA system using SOA and opto-VLSI processors.

Science.gov (United States)

Aljada, Muhsen; Hwang, Seow; Alameh, Kamal

2008-01-21

In this paper we propose and experimentally demonstrate a reconfigurable 10Gbps frequency-encoded (1D) encoder/decoder structure for optical code division multiple access (OCDMA). The encoder is constructed using a single semiconductor optical amplifier (SOA) and 1D reflective Opto-VLSI processor. The SOA generates broadband amplified spontaneous emission that is dynamically sliced using digital phase holograms loaded onto the Opto-VLSI processor to generate 1D codewords. The selected wavelengths are injected back into the same SOA for amplifications. The decoder is constructed using single Opto-VLSI processor only. The encoded signal can successfully be retrieved at the decoder side only when the digital phase holograms of the encoder and the decoder are matched. The system performance is measured in terms of the auto-correlation and cross-correlation functions as well as the eye diagram.
TENTACLE Multi-Camera Immersive Surveillance System Phase 2

Science.gov (United States)

2015-04-16

variants available. For a video with 640x480 frame resolution the CPU CIEL*a*b* performs at 11 FPS (Frames Per Second) on a quad core Xeon processor ...running at 2.67 GHz. For the same video the GPU variant of CIEL*a*b* background model processing speed is 44 FPS on the same processor with an Nvidia...The model is also sometimes called HMAX (Hierarchical Model and X). The BIM tries to build a system that emulates object recognition in human cortex
Measuring the effects of heterogeneity on distributed systems

Science.gov (United States)

El-Toweissy, Mohamed; Zeineldine, Osman; Mukkamala, Ravi

1991-01-01

Distributed computer systems in daily use are becoming more and more heterogeneous. Currently, much of the design and analysis studies of such systems assume homogeneity. This assumption of homogeneity has been mainly driven by the resulting simplicity in modeling and analysis. A simulation study is presented which investigated the effects of heterogeneity on scheduling algorithms for hard real time distributed systems. In contrast to previous results which indicate that random scheduling may be as good as a more complex scheduler, this algorithm is shown to be consistently better than a random scheduler. This conclusion is more prevalent at high workloads as well as at high levels of heterogeneity.
Heterogeneous computing with OpenCL

CERN Document Server

2013-01-01

Heterogeneous Computing with OpenCL teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs) such as AMD Fusion technology. Designed to work on multiple platforms and with wide industry support, OpenCL will help you more effectively program for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. The authors explore memory spaces, optimization techniques, graphics interoperability, extensions, and debugging and profiling. Intended to support a parallel programming course, Heterogeneous Computing with OpenCL includes detailed examples throughout, plus additional online exercises and other supporting materials.
Analysis of simulated triples γ-ray data on a 100-processor transputer system

International Nuclear Information System (INIS)

Loennroth, T.; Hattula, J.; Julin, R.; Lampinen, A.; Aspnaes, M.; Back, R.J.R.; Granlund, J.; Waxlax, P.

1993-01-01

We present a study of the administration and analysis of simulated nuclear 3-fold coincidences implemented on a 100-processor transputer system, which has a computational capacity of about 150 Mflops. From a simple model of a level scheme the corresponding 'data' is extracted in the form of three-dimensional addresses with background. The data structure and the analysis rate is studied. Further, a set of real data is used to estimate the rate of sort to be performed. The feasibility of using such a system to analyse large data spaces is discussed. (orig.)
Asymmetrical floating point array processors, their application to exploration and exploitation

Energy Technology Data Exchange (ETDEWEB)

Geriepy, B L

1983-01-01

An asymmetrical floating point array processor is a special-purpose scientific computer which operates under asymmetrical control of a host computer. Although an array processor can receive fixed point input and produce fixed point output, its primary mode of operation is floating point. The first generation of array processors was oriented towards time series information. The next generation of array processors has proved much more versatile and their applicability ranges from petroleum reservoir simulation to speech syntheses. Array processors are becoming commonplace in mining, the primary usage being construction of grids-by usual methods or by kriging. The Australian mining community is among the world's leaders in regard to computer-assisted exploration and exploitation systems. Part of this leadership role must be providing guidance to computer vendors in regard to current and future requirements.
Collective Motion in Behaviorally Heterogeneous Systems

Science.gov (United States)

Copenhagen, Katherine

Collective motion is a widespread phenomenon in nature where individuals actively propel themselves, gather together and move as a group. Some examples of collective motion are bird flocks, fish schools, bacteria swarms, cell clusters, and crowds of people. Many models seek to understand the effects of activity in collective systems including things such as environmental disorder, density, and interaction details primarily at infinite size limits and with uniform populations. In this dissertation I investigate the effects of finite sizes and behavioral heterogeneity as it exists in nature. Behavioral heterogeneity can originate from several different sources. Mixed populations of individuals can have inherently different behaviors such as mutant bacteria, injured fish, or agents that prefer individualistic behavior over coordinated motion. Alternatively, agents may modify their own behavior based on some local environmental dependency, such as local substrate, or density. In cases such as mutant cheaters in bacteria or malfunctioning drones in swarms, mixed populations of behaviorally heterogeneous agents can be modelled as arising in the form of aligning and non-aligning agents. When this kind of heterogeneity is introduced, there is a critical carrying capacity of non-aligners above which the system is unable to form a cohesive ordered group. However, if the cohesion of the group is relaxed to allow for fracture, the system will actively sort out non-aligning agents the system will exist at a critical non-aligner fraction. A similar heterogeneity could result in a mixture of high and low noise individuals. In this case there is also a critical carry capacity beyond which the system is unable to reach an ordered state, however the nature of this transition depends on the model details. Agents which are part of an ordered collective may vary their behavior as the group changes environments such as a flock of birds flying into a cloud. Using a unique model of a
Researching, building a soft-processor and Ethernet interface circuit using EDK

International Nuclear Information System (INIS)

Tuong Thi Thu Huong; Pham Ngoc Tuan; Truong Van Dat, Dang Lanh; Chau Thi Nhu Quynh

2014-01-01

The processor is an indispensable component in the measurement and automatic control systems. This report describes the fabrication of a soft-processor (32-bits, on-chip block RAM 64K, 50M clock, internal and peripheral bus) for receiving, sending and processing of data Ethernet packets. This processor is fabricated using the XPS component from EDK (Xilinx) software toolkit. After that, it is configured on the FPGA named Spartan XC3S500E circuit. A firmware of a processor for controlling the interface between processor and Ethernet port is written in C language and can play a role of a HOST (station) which has its own IP to connect to Ethernet network. Besides, there are some needed parts as follows: an Ethernet interfacing controller chip, a suitable cable providing a speed up to 100 Mbs and an application program running under Window XP environment written in LabView to communicate with soft-processor. (author)
Mr-Moose: An advanced SED-fitting tool for heterogeneous multi-wavelength datasets

Science.gov (United States)

Drouart, G.; Falkendal, T.

2018-04-01

We present the public release of Mr-Moose, a fitting procedure that is able to perform multi-wavelength and multi-object spectral energy distribution (SED) fitting in a Bayesian framework. This procedure is able to handle a large variety of cases, from an isolated source to blended multi-component sources from an heterogeneous dataset (i.e. a range of observation sensitivities and spectral/spatial resolutions). Furthermore, Mr-Moose handles upper-limits during the fitting process in a continuous way allowing models to be gradually less probable as upper limits are approached. The aim is to propose a simple-to-use, yet highly-versatile fitting tool fro handling increasing source complexity when combining multi-wavelength datasets with fully customisable filter/model databases. The complete control of the user is one advantage, which avoids the traditional problems related to the "black box" effect, where parameter or model tunings are impossible and can lead to overfitting and/or over-interpretation of the results. Also, while a basic knowledge of Python and statistics is required, the code aims to be sufficiently user-friendly for non-experts. We demonstrate the procedure on three cases: two artificially-generated datasets and a previous result from the literature. In particular, the most complex case (inspired by a real source, combining Herschel, ALMA and VLA data) in the context of extragalactic SED fitting, makes Mr-Moose a particularly-attractive SED fitting tool when dealing with partially blended sources, without the need for data deconvolution.
First Results of an “Artificial Retina” Processor Prototype

International Nuclear Information System (INIS)

Cenci, Riccardo; Bedeschi, Franco; Marino, Pietro; Morello, Michael J.; Ninci, Daniele; Piucci, Alessio; Punzi, Giovanni; Ristori, Luciano; Spinella, Franco; Stracka, Simone; Tonelli, Diego; Walsh, John

2016-01-01

We report on the performance of a specialized processor capable of reconstructing charged particle tracks in a realistic LHC silicon tracker detector, at the same speed of the readout and with sub-microsecond latency. The processor is based on an innovative pattern-recognition algorithm, called “artificial retina algorithm”, inspired from the vision system of mammals. A prototype of the processor has been designed, simulated, and implemented on Tel62 boards equipped with high-bandwidth Altera Stratix III FPGA devices. The prototype is the first step towards a real-time track reconstruction device aimed at processing complex events of high-luminosity LHC experiments at 40 MHz crossing rate

Processors for wavelet analysis and synthesis: NIFS and TI-C80 MVP

Science.gov (United States)

Brooks, Geoffrey W.

1996-03-01

Two processors are considered for image quadrature mirror filtering (QMF). The neuromorphic infrared focal-plane sensor (NIFS) is an existing prototype analog processor offering high speed spatio-temporal Gaussian filtering, which could be used for the QMF low- pass function, and difference of Gaussian filtering, which could be used for the QMF high- pass function. Although not designed specifically for wavelet analysis, the biologically- inspired system accomplishes the most computationally intensive part of QMF processing. The Texas Instruments (TI) TMS320C80 Multimedia Video Processor (MVP) is a 32-bit RISC master processor with four advanced digital signal processors (DSPs) on a single chip. Algorithm partitioning, memory management and other issues are considered for optimal performance. This paper presents these considerations with simulated results leading to processor implementation of high-speed QMF analysis and synthesis.
Accelerating 3D Elastic Wave Equations on Knights Landing based Intel Xeon Phi processors

Science.gov (United States)

Sourouri, Mohammed; Birger Raknes, Espen

2017-04-01

In advanced imaging methods like reverse-time migration (RTM) and full waveform inversion (FWI) the elastic wave equation (EWE) is numerically solved many times to create the seismic image or the elastic parameter model update. Thus, it is essential to optimize the solution time for solving the EWE as this will have a major impact on the total computational cost in running RTM or FWI. From a computational point of view applications implementing EWEs are associated with two major challenges. The first challenge is the amount of memory-bound computations involved, while the second challenge is the execution of such computations over very large datasets. So far, multi-core processors have not been able to tackle these two challenges, which eventually led to the adoption of accelerators such as Graphics Processing Units (GPUs). Compared to conventional CPUs, GPUs are densely populated with many floating-point units and fast memory, a type of architecture that has proven to map well to many scientific computations. Despite its architectural advantages, full-scale adoption of accelerators has yet to materialize. First, accelerators require a significant programming effort imposed by programming models such as CUDA or OpenCL. Second, accelerators come with a limited amount of memory, which also require explicit data transfers between the CPU and the accelerator over the slow PCI bus. The second generation of the Xeon Phi processor based on the Knights Landing (KNL) architecture, promises the computational capabilities of an accelerator but require the same programming effort as traditional multi-core processors. The high computational performance is realized through many integrated cores (number of cores and tiles and memory varies with the model) organized in tiles that are connected via a 2D mesh based interconnect. In contrary to accelerators, KNL is a self-hosted system, meaning explicit data transfers over the PCI bus are no longer required. However, like most
High performance deformable image registration algorithms for manycore processors

CERN Document Server

Shackleford, James; Sharp, Gregory

2013-01-01

High Performance Deformable Image Registration Algorithms for Manycore Processors develops highly data-parallel image registration algorithms suitable for use on modern multi-core architectures, including graphics processing units (GPUs). Focusing on deformable registration, we show how to develop data-parallel versions of the registration algorithm suitable for execution on the GPU. Image registration is the process of aligning two or more images into a common coordinate frame and is a fundamental step to be able to compare or fuse data obtained from different sensor measurements. E
Compact gasoline fuel processor for passenger vehicle APU

Science.gov (United States)

Severin, Christopher; Pischinger, Stefan; Ogrzewalla, Jürgen

Due to the increasing demand for electrical power in today's passenger vehicles, and with the requirements regarding fuel consumption and environmental sustainability tightening, a fuel cell-based auxiliary power unit (APU) becomes a promising alternative to the conventional generation of electrical energy via internal combustion engine, generator and battery. It is obvious that the on-board stored fuel has to be used for the fuel cell system, thus, gasoline or diesel has to be reformed on board. This makes the auxiliary power unit a complex integrated system of stack, air supply, fuel processor, electrics as well as heat and water management. Aside from proving the technical feasibility of such a system, the development has to address three major barriers:start-up time, costs, and size/weight of the systems. In this paper a packaging concept for an auxiliary power unit is presented. The main emphasis is placed on the fuel processor, as good packaging of this large subsystem has the strongest impact on overall size. The fuel processor system consists of an autothermal reformer in combination with water-gas shift and selective oxidation stages, based on adiabatic reactors with inter-cooling. The configuration was realized in a laboratory set-up and experimentally investigated. The results gained from this confirm a general suitability for mobile applications. A start-up time of 30 min was measured, while a potential reduction to 10 min seems feasible. An overall fuel processor efficiency of about 77% was measured. On the basis of the know-how gained by the experimental investigation of the laboratory set-up a packaging concept was developed. Using state-of-the-art catalyst and heat exchanger technology, the volumes of these components are fixed. However, the overall volume is higher mainly due to mixing zones and flow ducts, which do not contribute to the chemical or thermal function of the system. Thus, the concept developed mainly focuses on minimization of those
Modelling complex systems of heterogeneous agents to better design sustainability transitions policy

NARCIS (Netherlands)

Mercure, J.F.A.; Pollitt, H.; Bassi, A.M.; Viñuales, J.E.; Edwards, N.R.

2016-01-01

This article proposes a fundamental methodological shift in the modelling of policy interventions for sustainability transitions in order to account for complexity (e.g. self-reinforcing mechanisms, such as technology lock-ins, arising from multi-agent interactions) and agent heterogeneity (e.g.
Preliminary design of an advanced programmable digital filter network for large passive acoustic ASW systems. [Parallel processor

Energy Technology Data Exchange (ETDEWEB)

McWilliams, T.; Widdoes, Jr., L. C.; Wood, L.

1976-09-30

The design of an extremely high performance programmable digital filter of novel architecture, the LLL Programmable Digital Filter, is described. The digital filter is a high-performance multiprocessor having general purpose applicability and high programmability; it is extremely cost effective either in a uniprocessor or a multiprocessor configuration. The architecture and instruction set of the individual processor was optimized with regard to the multiple processor configuration. The optimal structure of a parallel processing system was determined for addressing the specific Navy application centering on the advanced digital filtering of passive acoustic ASW data of the type obtained from the SOSUS net. 148 figures. (RWR)
A Versatile Image Processor For Digital Diagnostic Imaging And Its Application In Computed Radiography

Science.gov (United States)

Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.

1986-06-01

In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response is facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling combined up to 144 MBytes/sec. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. For illustrating the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing opera-tions are presented.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

Science.gov (United States)

Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; Masciovecchio, Mario; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

2017-08-01

For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

Directory of Open Access Journals (Sweden)

Cerati Giuseppe

2017-01-01

Full Text Available For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU, ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC, for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Processors and GPUs

Energy Technology Data Exchange (ETDEWEB)

Cerati, Giuseppe [Fermilab; Elmer, Peter [Princeton U.; Krutelyov, Slava [UC, San Diego; Lantz, Steven [Cornell U.; Lefebvre, Matthieu [Princeton U.; Masciovecchio, Mario [UC, San Diego; McDermott, Kevin [Cornell U.; Riley, Daniel [Cornell U., LNS; Tadel, Matevž [UC, San Diego; Wittich, Peter [Cornell U.; Würthwein, Frank [UC, San Diego; Yagil, Avi [UC, San Diego

2017-01-01

For over a decade now, physical and energy constraints have limited clock speed improvements in commodity microprocessors. Instead, chipmakers have been pushed into producing lower-power, multi-core processors such as Graphical Processing Units (GPU), ARM CPUs, and Intel MICs. Broad-based efforts from manufacturers and developers have been devoted to making these processors user-friendly enough to perform general computations. However, extracting performance from a larger number of cores, as well as specialized vector or SIMD units, requires special care in algorithm design and code optimization. One of the most computationally challenging problems in high-energy particle experiments is finding and fitting the charged-particle tracks during event reconstruction. This is expected to become by far the dominant problem at the High-Luminosity Large Hadron Collider (HL-LHC), for example. Today the most common track finding methods are those based on the Kalman filter. Experience with Kalman techniques on real tracking detector systems has shown that they are robust and provide high physics performance. This is why they are currently in use at the LHC, both in the trigger and offine. Previously we reported on the significant parallel speedups that resulted from our investigations to adapt Kalman filters to track fitting and track building on Intel Xeon and Xeon Phi. Here, we discuss our progresses toward the understanding of these processors and the new developments to port the Kalman filter to NVIDIA GPUs.
CellPilot: Seamless communication within Cell BE and heterogeneous clusters

International Nuclear Information System (INIS)

Girard, N; Carter, J; Gardner, W B; Grewal, G

2010-01-01

The Pilot library is targeted to novice scientific programmers within High Performance Computing. The CellPilot library extends the Pilot library to the Cell Broadband Engine processor and heterogeneous clusters. Using Pilot's process and channel abstractions, the CellPilot library can create a process on any of the processor types, both PPEs and SPEs, across the cluster. Communication is achieved by creating a channel between any two processes, and using the write/read channel functions in the participating processes. The CellPilot library uses MPI for the inter-node communication and the Cell SDK within a Cell node. All the architecture specific details of Cell communications are hidden from the user.
Structure and performance of a nuclear medicine organisation and information system (NOISY)

International Nuclear Information System (INIS)

Spitz, J.; Schenk, S.

1989-01-01

On the basis of an INTEL 80386 32-bit processor a multi-user, multi-task system was designed, using UNIX and INFORMIX as basic software. By connecting all existing subsystems (gamma-camera, RIA-processor, personal-computers) and as much as ten intelligent terminals to the central processor, the system is able to handle all the data administration of the department. This includes permanent patient data as well as the results on in-vivo and in-vitro diagnostic procedures, their statistical evaluation and billing procedures. The most important feature of the realized concept is, that there exists only one set of permanent data for each patient, to which incomming informations are attached. By this principle, routine work load is reduced as effectively as errors and mistakes in patient identification. Furthermore diagnostic quality is improved as all the data of a patient are present where and whenever they are needed within the department. Additional progress in efficiency will be gained by connecting the system to other processors of the hospital. Another advantage of the system compared with former concepts is the decentralized data collection and processing in autonomous subsystems (gamm-camera, RIA-processor, personal-computer). In case of defective central processor, routine work can go on and the update of results for data administration is only delayed. The system is useful to meet the growing demands in effective data administration, handling scientific questions and reducing archiving problems. (orig./MG) [de
Digital control card based on digital signal processor

International Nuclear Information System (INIS)

Hou Shigang; Yin Zhiguo; Xia Le

2008-01-01

A digital control card based on digital signal processor was developed. Two Freescale DSP-56303 processors were utilized to achieve 3 channels proportional- integral-differential regulations. The card offers high flexibility for 100 MeV cyclotron RF system development. It was used as feedback controller in low level radio frequency control prototype, with the feedback gain parameters continuously adjustable. By using high precision analog to digital converter with 500 kHz sampling rate, a regulation bandwidth of 20 kHz was achieved. (authors)
High-Dimensional Adaptive Particle Swarm Optimization on Heterogeneous Systems

International Nuclear Information System (INIS)

Wachowiak, M P; Sarlo, B B; Foster, A E Lambe

2014-01-01

Much work has recently been reported in parallel GPU-based particle swarm optimization (PSO). Motivated by the encouraging results of these investigations, while also recognizing the limitations of GPU-based methods for big problems using a large amount of data, this paper explores the efficacy of employing other types of parallel hardware for PSO. Most commodity systems feature a variety of architectures whose high-performance capabilities can be exploited. In this paper, high-dimensional problems and those that employ a large amount of external data are explored within the context of heterogeneous systems. Large problems are decomposed into constituent components, and analyses are undertaken of which components would benefit from multi-core or GPU parallelism. The current study therefore provides another demonstration that ''supercomputing on a budget'' is possible when subtasks of large problems are run on hardware most suited to these tasks. Experimental results show that large speedups can be achieved on high dimensional, data-intensive problems. Cost functions must first be analysed for parallelization opportunities, and assigned hardware based on the particular task
Review of trigger and on-line processors at SLAC

International Nuclear Information System (INIS)

Lankford, A.J.

1984-07-01

The role of trigger and on-line processors in reducing data rates to manageable proportions in e + e - physics experiments is defined not by high physics or background rates, but by the large event sizes of the general-purpose detectors employed. The rate of e + e - annihilation is low, and backgrounds are not high; yet the number of physics processes which can be studied is vast and varied. This paper begins by briefly describing the role of trigger processors in the e + e - context. The usual flow of the trigger decision process is illustrated with selected examples of SLAC trigger processing. The features are mentioned of triggering at the SLC and the trigger processing plans of the two SLC detectors: The Mark II and the SLD. The most common on-line processors at SLAC, the BADC, the SLAC Scanner Processor, the SLAC FASTBUS Controller, and the VAX CAMAC Channel, are discussed. Uses of the 168/E, 3081/E, and FASTBUS VAX processors are mentioned. The manner in which these processors are interfaced and the function they serve on line is described. Finally, the accelerator control system for the SLC is outlined. This paper is a survey in nature, and hence, relies heavily upon references to previous publications for detailed description of work mentioned here. 27 references, 9 figures, 1 table
Scalable Parallelization of Skyline Computation for Multi-core Processors

DEFF Research Database (Denmark)

Chester, Sean; Sidlauskas, Darius; Assent, Ira

2015-01-01

The skyline is an important query operator for multi-criteria decision making. It reduces a dataset to only those points that offer optimal trade-offs of dimensions. In general, it is very expensive to compute. Recently, multi-core CPU algorithms have been proposed to accelerate the computation...... of the skyline. However, they do not sufficiently minimize dominance tests and so are not competitive with state-of-the-art sequential algorithms. In this paper, we introduce a novel multi-core skyline algorithm, Hybrid, which processes points in blocks. It maintains a shared, global skyline among all threads...
How to harness the performance potential of current multi-core processors

International Nuclear Information System (INIS)

Jarp, Sverre; Lazzaro, Alfio; Leduc, Julien; Nowak, Andrzej

2011-01-01

Leakage currents have put a stop to the semiconductor industry's ability to increase processor frequency in order to enhance the performance of new microprocessors. Instead, we observe a slew of changes inside the micro-architecture with an aim of enhancing the performance. Several of these changes, however, do not translate into automatic speed improvements for the software. This paper discusses the increased complexity of modern microprocessors by separating out into dimensions each feature that impacts performance and mentions briefly ways of improving software, in particular that of the High Energy Physics community, to take full advantage.
Compiling the functional data-parallel language SaC for Microgrids of Self-Adaptive Virtual Processors

NARCIS (Netherlands)

Grelck, C.; Herhut, S.; Jesshope, C.; Joslin, C.; Lankamp, M.; Scholz, S.-B.; Shafarenko, A.

2009-01-01

We present preliminary results from compiling the high-level, functional and data-parallel programming language SaC into a novel multi-core design: Microgrids of Self-Adaptive Virtual Processors (SVPs). The side-effect free nature of SaC in conjunction with its data-parallel foundation make it an
Catalyst development and systems analysis of methanol partial oxidation for the fuel processor - fuel cell integration

Energy Technology Data Exchange (ETDEWEB)

Newson, E; Mizsey, P; Hottinger, P; Truong, T B; Roth, F von; Schucan, Th H [Paul Scherrer Inst. (PSI), Villigen (Switzerland)

1999-08-01

Methanol partial oxidation (pox) to produce hydrogen for mobile fuel cell applications has proved initially more successful than hydrocarbon pox. Recent results of catalyst screening and kinetic studies with methanol show that hydrogen production rates have reached 7000 litres/hour/(litre reactor volume) for the dry pox route and 12,000 litres/hour/(litre reactor volume) for wet pox. These rates are equivalent to 21 and 35 kW{sub th}/(litre reactor volume) respectively. The reaction engineering problems remain to be solved for dry pox due to the significant exotherm of the reaction (hot spots of 100-200{sup o}C), but wet pox is essentially isothermal in operation. Analyses of the integrated fuel processor - fuel cell systems show that two routes are available to satisfy the sensitivity of the fuel cell catalysts to carbon monoxide, i.e. a preferential oxidation reactor or a membrane separator. Targets for individual system components are evaluated for the base and best case systems for both routes to reach the combined 40% efficiency required for the integrated fuel processor - fuel cell system. (author) 2 figs., 1 tab., 3 refs.
The Associative Memory system for the FTK processor at ATLAS

CERN Document Server

Cipriani, R; The ATLAS collaboration; Donati, S; Giannetti, P; Lanza, A; Luciano, P; Magalotti, D; Piendibene, M

2013-01-01

Experiments at the LHC hadron collider search for extremely rare processes hidden in much larger background levels. As the experiment complexity, the accelerator backgrounds and instantaneus luminosity increase, increasingly complex and exclusive selections are necessary. We present results and performances of a new prototype of Associative Memory (AM) system, the core of the Fast Tracker processor (FTK). FTK is a real time tracking device for the ATLAS experiment trigger upgrade. The AM system provides massive computing power to minimize the online execution time of complex tracking algorithms. The time consuming pattern recognition problem, generally referred to as the "combinatorial challenge", is beat by the AM technology exploiting parallelism to the maximum level. The Associative Memory compares the event to pre-calculated "expectations" or "patterns" (pattern matching) at once and look for candidate tracks called "roads". The problem is solved by the time data are loaded into the AM devices. We report ...

35-We polymer electrolyte membrane fuel cell system for notebook computer using a compact fuel processor

Science.gov (United States)

Son, In-Hyuk; Shin, Woo-Cheol; Lee, Yong-Kul; Lee, Sung-Chul; Ahn, Jin-Gu; Han, Sang-Il; kweon, Ho-Jin; Kim, Ju-Yong; Kim, Moon-Chan; Park, Jun-Yong

A polymer electrolyte membrane fuel cell (PEMFC) system is developed to power a notebook computer. The system consists of a compact methanol-reforming system with a CO preferential oxidation unit, a 16-cell PEMFC stack, and a control unit for the management of the system with a d.c.-d.c. converter. The compact fuel-processor system (260 cm 3) generates about 1.2 L min -1 of reformate, which corresponds to 35 We, with a low CO concentration (notebook computers.
The hardware implementation of the CERN SPS ultrafast feedback processor demonstrator

CERN Document Server

Dusakto, J E; Fox, J D; Olsen, J; Rivetta, C H; Höfle, W

2013-01-01

An ultrafast 4GSa/s transverse feedback processor has been developed for proof-of-concept studies of feedback control of e-cloud driven and transverse mode coupled intra-bunch instabilities in the CERN SPS. This system consists of a high-speed ADC on the front end and equally fast DAC on the back end. All control and signal processing is implemented in FPGA logic. This system is capable of taking up to 16 sample slices across a single SPS bunch and processing each slice individually within a reconfigurable signal processor. This demonstrator system is a rapidly developed prototype, consisting of both commercial and custom-design components. It can stabilize the motion of a single particle bunch using closed loop feedback. The system can also run open loop as a high-speed arbitrary waveform generator and contains diagnostic features including a special ADC snapshot capture memory. This paper describes the overall system, the feedback processor and focuses on the hardware architecture, design ...
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor

Science.gov (United States)

Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.

2014-03-01

List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging
PEM Fuel Cells with Bio-Ethanol Processor Systems A Multidisciplinary Study of Modelling, Simulation, Fault Diagnosis and Advanced Control

CERN Document Server

Feroldi, Diego; Outbib, Rachid

2012-01-01

An apparently appropriate control scheme for PEM fuel cells may actually lead to an inoperable plant when it is connected to other unit operations in a process with recycle streams and energy integration. PEM Fuel Cells with Bio-Ethanol Processor Systems presents a control system design that provides basic regulation of the hydrogen production process with PEM fuel cells. It then goes on to construct a fault diagnosis system to improve plant safety above this control structure. PEM Fuel Cells with Bio-Ethanol Processor Systems is divided into two parts: the first covers fuel cells and the second discusses plants for hydrogen production from bio-ethanol to feed PEM fuel cells. Both parts give detailed analyses of modeling, simulation, advanced control, and fault diagnosis. They give an extensive, in-depth discussion of the problems that can occur in fuel cell systems and propose a way to control these systems through advanced control algorithms. A significant part of the book is also given over to computer-aid...
Emergency product generation for disaster management using RISAT and DMSAR quick look SAR processors

Science.gov (United States)

Desai, Nilesh; Sharma, Ritesh; Kumar, Saravana; Misra, Tapan; Gujraty, Virendra; Rana, SurinderSingh

2006-12-01

Since last few years, ISRO has embarked upon the development of two complex Synthetic Aperture Radar (SAR) missions, viz. Spaceborne Radar Imaging Satellite (RISAT) and Airborne SAR for Disaster Mangement (DMSAR), as a capacity building measure under country's Disaster Management Support (DMS) Program, for estimating the extent of damage over large areas (~75 Km) and also assess the effectiveness of the relief measures undertaken during natural disasters such as cyclones, epidemics, earthquakes, floods and landslides, forest fires, crop diseases etc. Synthetic Aperture Radar (SAR) has an unique role to play in mapping and monitoring of large areas affected by natural disasters especially floods, owing to its unique capability to see through clouds as well as all-weather imaging capability. The generation of SAR images with quick turn around time is very essential to meet the above DMS objectives. Thus the development of SAR Processors, for these two SAR systems poses considerable challenges and design efforts. Considering the growing user demand and inevitable necessity for a full-fledged high throughput processor, to process SAR data and generate image in real or near-real time, the design and development of a generic SAR Processor has been taken up and evolved, which will meet the SAR processing requirements for both Airborne and Spaceborne SAR systems. This hardware SAR processor is being built, to the extent possible, using only Commercial-Off-The-Shelf (COTS) DSP and other hardware plug-in modules on a Compact PCI (cPCI) platform. Thus, the major thrust has been on working out Multi-processor Digital Signal Processor (DSP) architecture and algorithm development and optimization rather than hardware design and fabrication. For DMSAR, this generic SAR Processor operates as a Quick Look SAR Processor (QLP) on-board the aircraft to produce real time full swath DMSAR images and as a ground based Near-Real Time high precision full swath Processor (NRTP). It will
Optimization of Hierarchically Scheduled Heterogeneous Embedded Systems

DEFF Research Database (Denmark)

Pop, Traian; Pop, Paul; Eles, Petru

2005-01-01

We present an approach to the analysis and optimization of heterogeneous distributed embedded systems. The systems are heterogeneous not only in terms of hardware components, but also in terms of communication protocols and scheduling policies. When several scheduling policies share a resource......, they are organized in a hierarchy. In this paper, we address design problems that are characteristic to such hierarchically scheduled systems: assignment of scheduling policies to tasks, mapping of tasks to hardware components, and the scheduling of the activities. We present algorithms for solving these problems....... Our heuristics are able to find schedulable implementations under limited resources, achieving an efficient utilization of the system. The developed algorithms are evaluated using extensive experiments and a real-life example....
Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets

Directory of Open Access Journals (Sweden)

Pielot Rainer

2010-01-01

Full Text Available Abstract Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE, a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computational intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low cost CBE-based platform offers an efficient option to conventional hardware to solve computational problems in image processing and bioinformatics.
Sojourn times in finite-capacity processor-sharing queues

NARCIS (Netherlands)

Borst, S.C.; Boxma, O.J.; Hegde, N.

2005-01-01

Motivated by the need to develop simple parsimonious models for evaluating the performance of wireless data systems, we consider finite-capacity processor-sharing systems. For such systems, we analyze the sojourn time distribution, which presents a useful measure for the transfer delay of documents
Development of the new trigger processor board for the ATLAS Level-1 endcap muon trigger for Run-3

CERN Document Server

AUTHOR|(INSPIRE)INSPIRE-00525035; The ATLAS collaboration

2017-01-01

The instantaneous luminosity of the LHC will be increased by up to a factor of three with respect to the original design value at Run-3 (starting 2021). The ATLAS Level-1 end-cap muon trigger in LHC Run-3 will identify muons by combining data from the Thin-Gap Chamber detector (TGC) and the New Small Wheel (NSW), which is a new detector and will be able to operate in a high background hit rate at Run-3, to suppress the Level-1 trigger rate. In order to handle data from both TGC and NSW, a new trigger processor board has been developed. The board has a modern FPGA to make use of Multi-Gigabit transceiver technology. The readout system for trigger data has also been designed with TCP/IP instead of a dedicated ASIC. This letter presents the electronics and its firmware of the ATLAS Level-1 end-cap muon trigger processor board for LHC Run-3.
H∞ Consensus for Multiagent Systems with Heterogeneous Time-Varying Delays

Directory of Open Access Journals (Sweden)

Beibei Wang

2013-01-01

Full Text Available We apply the linear matrix inequality method to consensus and H∞ consensus problems of the single integrator multiagent system with heterogeneous delays in directed networks. To overcome the difficulty caused by heterogeneous time-varying delays, we rewrite the multiagent system into a partially reduced-order system and an integral system. As a result, a particular Lyapunov function is constructed to derive sufficient conditions for consensus of multiagent systems with fixed (switched topologies. We also apply this method to the H∞ consensus of multiagent systems with disturbances and heterogeneous delays. Numerical examples are given to illustrate the theoretical results.
Conceptual design for controller software of mechatronic systems

NARCIS (Netherlands)

Broenink, Johannes F.; Hilderink, G.H.; Bakkers, André; Bradshaw, Alan; Counsell, John

1998-01-01

The method and software tool presented here, aims at supporting the development of control software for mechatronic systems. Heterogeneous distributed embedded processors are considered as target hardware. Principles of the method are that the implementation process is a stepwise refinement from
Mapping High-Resolution Soil Moisture over Heterogeneous Cropland Using Multi-Resource Remote Sensing and Ground Observations

Directory of Open Access Journals (Sweden)

Lei Fan

2015-10-01

Full Text Available High spatial resolution soil moisture (SM data are crucial in agricultural applications, river-basin management, and understanding hydrological processes. Merging multi-resource observations is one of the ways to improve the accuracy of high spatial resolution SM data in the heterogeneous cropland. In this paper, the Bayesian Maximum Entropy (BME methodology is implemented to merge the following four types of observed data to obtain the spatial distribution of SM at 100 m scale: soil moisture observed by wireless sensor network (WSN, Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER-derived soil evaporative efficiency (SEE, irrigation statistics, and Polarimetric L-band Multi-beam Radiometer (PLMR-derived SM products (~700 m. From the poor BME predictions obtained by merging only WSN and SEE data, we observed that the SM heterogeneity caused by irrigation and the attenuating sensitivity of the SEE data to SM caused by the canopies result in BME prediction errors. By adding irrigation statistics to the merged datasets, the overall RMSD of the BME predictions during the low-vegetated periods can be successively reduced from 0.052 m3·m−3 to 0.033 m3·m−3. The coefficient of determination (R2 and slope between the predicted and in situ measured SM data increased from 0.32 to 0.64 and from 0.38 to 0.82, respectively, but large estimation errors occurred during the moderately vegetated periods (RMSD = 0.041 m3·m−3, R = 0.43 and the slope = 0.41. Further adding the downscaled SM information from PLMR SM products to the merged datasets, the predictions were satisfactorily accurate with an RMSD of 0.034 m3·m−3, R2 of 0.4 and a slope of 0.69 during moderately vegetated periods. Overall, the results demonstrated that merging multi-resource observations into SM estimations can yield improved accuracy in heterogeneous cropland.
DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

Directory of Open Access Journals (Sweden)

Kaufmann Michael

2004-09-01

Full Text Available Abstract Background Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b For alignments of large genomic sequences, we use a heuristics by splitting up sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions By distributing sub-routines to multiple processors, the running time of DIALIGN can be crucially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
SLAC Scanner Processor applications in the data acquisition system for the upgraded Mark II detector

International Nuclear Information System (INIS)

Barklow, T.; Glanzman, T.; Lankford, A.J.; Riles, K.

1985-09-01

The SLAC Scanner Processor is a general purpose, programmable FASTBUS crate/cable master/slave module. This device plays a central role in the readout, buffering and pre-processing of data from the upgraded Mark II detector's new central drift chamber. In addition to data readout, the SSPs assist in a variety of other services, such as detector calibration, FASTBUS system management, FASTBUS system initialization and verification, and FASTBUS module testing. 9 refs., 1 fig., 2 tabs
Global synchronization of parallel processors using clock pulse width modulation

Science.gov (United States)

Chen, Dong; Ellavsky, Matthew R.; Franke, Ross L.; Gara, Alan; Gooding, Thomas M.; Haring, Rudolf A.; Jeanson, Mark J.; Kopcsay, Gerard V.; Liebsch, Thomas A.; Littrell, Daniel; Ohmacht, Martin; Reed, Don D.; Schenck, Brandon E.; Swetz, Richard A.

2013-04-02

A circuit generates a global clock signal with a pulse width modification to synchronize processors in a parallel computing system. The circuit may include a hardware module and a clock splitter. The hardware module may generate a clock signal and performs a pulse width modification on the clock signal. The pulse width modification changes a pulse width within a clock period in the clock signal. The clock splitter may distribute the pulse width modified clock signal to a plurality of processors in the parallel computing system.
Array processors: an introduction to their architecture, software, and applications in nuclear medicine

International Nuclear Information System (INIS)

King, M.A.; Doherty, P.W.; Rosenberg, R.J.; Cool, S.L.

1983-01-01

Array processors are ''number crunchers'' that dramatically enhance the processing power of nuclear medicine computer systems for applicatons dealing with the repetitive operations involved in digital image processing of large segments of data. The general architecture and the programming of array processors are introduced, along with some applications of array processors to the reconstruction of emission tomographic images, digital image enhancement, and functional image formation
Solar fuels generation and molecular systems: is it homogeneous or heterogeneous catalysis?

Science.gov (United States)

Artero, Vincent; Fontecave, Marc

2013-03-21

Catalysis is a key enabling technology for solar fuel generation. A number of catalytic systems, either molecular/homogeneous or solid/heterogeneous, have been developed during the last few decades for both the reductive and oxidative multi-electron reactions required for fuel production from water or CO(2) as renewable raw materials. While allowing for a fine tuning of the catalytic properties through ligand design, molecular approaches are frequently criticized because of the inherent fragility of the resulting catalysts, when exposed to extreme redox potentials. In a number of cases, it has been clearly established that the true catalytic species is heterogeneous in nature, arising from the transformation of the initial molecular species, which should rather be considered as a pre-catalyst. Whether such a situation is general or not is a matter of debate in the community. In this review, covering water oxidation and reduction catalysts, involving noble and non-noble metal ions, we limit our discussion to the cases in which this issue has been directly and properly addressed as well as those requiring more confirmation. The methodologies proposed for discriminating homogeneous and heterogeneous catalysis are inspired in part by those previously discussed by Finke in the case of homogeneous hydrogenation reaction in organometallic chemistry [J. A. Widegren and R. G. Finke, J. Mol. Catal. A, 2003, 198, 317-341].
Endogenous Price Bubbles in a Multi-Agent System of the Housing Market.

Science.gov (United States)

Kouwenberg, Roy; Zwinkels, Remco C J

2015-01-01

Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news.
Endogenous Price Bubbles in a Multi-Agent System of the Housing Market.

Directory of Open Access Journals (Sweden)

Roy Kouwenberg

Full Text Available Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news.
Endogenous Price Bubbles in a Multi-Agent System of the Housing Market

Science.gov (United States)

2015-01-01

Economic history shows a large number of boom-bust cycles, with the U.S. real estate market as one of the latest examples. Classical economic models have not been able to provide a full explanation for this type of market dynamics. Therefore, we analyze home prices in the U.S. using an alternative approach, a multi-agent complex system. Instead of the classical assumptions of agent rationality and market efficiency, agents in the model are heterogeneous, adaptive, and boundedly rational. We estimate the multi-agent system with historical house prices for the U.S. market. The model fits the data well and a deterministic version of the model can endogenously produce boom-and-bust cycles on the basis of the estimated coefficients. This implies that trading between agents themselves can create major price swings in absence of fundamental news. PMID:26107740

Some links on this page may take you to non-federal websites. Their policies may differ from this site.