performance computer architectures: Topics by WorldWideScience.org

Sample records for performance computer architectures

Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation

Science.gov (United States)

Stocker, John C.; Golomb, Andrew M.

2011-01-01

Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing comes at a cost: compared to traditional hosting, it is less able to provide predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two-part model framework characterizes both the demand, using a probability distribution for each type of service request, and enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.
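The abstract above does not give the model itself, so the following is only a minimal sketch of what a discrete event simulation of this kind can look like. It assumes exponential inter-arrival and service times for a single request type and a fixed pool of identical servers as the only resource constraint; the function name `simulate` and all parameters are hypothetical, not taken from the paper.

```python
import heapq
import random
from collections import deque

ARRIVAL, DEPARTURE = 0, 1  # event kinds; arrivals sort first on time ties


def simulate(num_servers, arrival_rate, service_rate, num_requests, seed=0):
    """Run an event-list simulation of a FIFO queue with a fixed server pool.

    Returns (mean waiting time, number of requests served).
    The exponential distributions are an illustrative assumption.
    """
    rng = random.Random(seed)
    events = []  # min-heap of (time, kind, request_id)
    t = 0.0
    for request_id in range(num_requests):
        t += rng.expovariate(arrival_rate)
        heapq.heappush(events, (t, ARRIVAL, request_id))

    waiting = deque()  # (request_id, arrival_time) in arrival order
    busy = 0
    served = 0
    total_wait = 0.0

    while events:
        now, kind, request_id = heapq.heappop(events)
        if kind == ARRIVAL:
            waiting.append((request_id, now))
        else:
            busy -= 1
            served += 1
        # Start service for queued requests while a server is free.
        while waiting and busy < num_servers:
            queued_id, arrived = waiting.popleft()
            total_wait += now - arrived
            busy += 1
            finish = now + rng.expovariate(service_rate)
            heapq.heappush(events, (finish, DEPARTURE, queued_id))

    return total_wait / num_requests, served
```

Calling `simulate` with different `num_servers` values gives a rough way to compare a fixed (traditional) pool against a larger on-demand one under the same demand assumptions.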
Performance evaluation of scientific programs on advanced architecture computers

International Nuclear Information System (INIS)

Walker, D.W.; Messina, P.; Baille, C.F.

1988-01-01

Recently a number of advanced architecture machines have become commercially available. These new machines promise better cost-performance than traditional computers, and some of them have the potential of competing with current supercomputers, such as the Cray X-MP, in terms of maximum performance. This paper describes an on-going project to evaluate a broad range of advanced architecture computers using a number of complete scientific application programs. The computers to be evaluated include distributed-memory machines such as the NCUBE, INTEL and Caltech/JPL hypercubes, and the MEIKO computing surface; shared-memory, bus architecture machines such as the Sequent Balance and the Alliant; very long instruction word machines such as the Multiflow Trace 7/200 computer; traditional supercomputers such as the Cray X-MP and Cray-2; and SIMD machines such as the Connection Machine. Currently 11 application codes from a number of scientific disciplines have been selected, although it is not intended to run all codes on all machines. Results are presented for two of the codes (QCD and missile tracking), and future work is proposed.
Benchmarking high performance computing architectures with CMS’ skeleton framework

Science.gov (United States)

Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

2017-10-01

In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Threading Building Blocks library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures: machines such as Cori Phase 1 and 2, Theta, and Mira. Because of this we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
Improving Software Performance in the Compute Unified Device Architecture

Directory of Open Access Journals (Sweden)

Alexandru PIRJAN

2010-01-01

This paper analyzes several aspects regarding the improvement of software performance for applications written in the Compute Unified Device Architecture (CUDA). We address an issue of great importance when programming a CUDA application: the Graphics Processing Unit’s (GPU’s) memory management through transpose kernels. We also benchmark and evaluate the performance for progressively optimizing a transposing matrix application in CUDA. One particular interest was to research how well the optimization techniques, applied to software applications written in CUDA, scale to the latest generation of general-purpose graphics processing units (GPGPU), like the Fermi architecture implemented in the GTX480 and the previous architecture implemented in GTX280. Lately, there has been a lot of interest in the literature for this type of optimization analysis, but none of the works so far (to our best knowledge) tried to validate if the optimizations can apply to a GPU from the latest Fermi architecture and how well the Fermi architecture scales to these software performance improving techniques.
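As a rough illustration of the tiling idea behind optimized transpose kernels, the sketch below transposes a matrix one square tile at a time, staging each tile in a small buffer the way a CUDA thread block typically stages it in shared memory. It is plain Python, so it shows only the access pattern and not coalescing or bank conflicts; the function name `transpose_tiled` and the tile size are assumptions for illustration, not the paper's kernels.

```python
def transpose_tiled(matrix, tile=2):
    """Transpose a rectangular list-of-lists matrix tile by tile."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Stage one tile (the role shared memory plays in a GPU kernel).
            block = [
                matrix[r][c0:c0 + tile]
                for r in range(r0, min(r0 + tile, rows))
            ]
            # Write the staged tile back with row and column indices swapped.
            for dr, row in enumerate(block):
                for dc, value in enumerate(row):
                    out[c0 + dc][r0 + dr] = value
    return out
```

On a GPU the point of the staging step is that both the read from the input and the write to the output can then proceed along contiguous addresses.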
A High Performance COTS Based Computer Architecture

Science.gov (United States)

Patte, Mathieu; Grimoldi, Raoul; Trautner, Roland

2014-08-01

Using Commercial Off The Shelf (COTS) electronic components for space applications is a long standing idea. Indeed the difference in processing performance and energy efficiency between radiation hardened components and COTS components is so large that COTS components are very attractive for use in mass and power constrained systems. However using COTS components in space is not straightforward, as one must account for the effects of the space environment on the COTS components' behavior. In the frame of the ESA funded activity called High Performance COTS Based Computer, Airbus Defense and Space and its subcontractor OHB CGS have developed and prototyped a versatile COTS based architecture for high performance processing. The rest of the paper is organized as follows: in a first section we will start by recapitulating the interests and constraints of using COTS components for space applications; then we will briefly describe existing fault mitigation architectures and present our solution for fault mitigation based on a component called the SmartIO; in the last part of the paper we will describe the prototyping activities executed during the HiP CBC project.
Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

Science.gov (United States)

2015-06-01

Syracuse... Contract number: FA8750-12-1-0251... processors including graphic processor units (GPUs) and Intel Xeon Phi processors. Experimental results showed significant speedups, which can enable
Computer architecture fundamentals and principles of computer design

CERN Document Server

Dumas II, Joseph D

2005-01-01

Introduction to Computer Architecture / What is Computer Architecture? / Architecture vs. Implementation / Brief History of Computer Systems / The First Generation / The Second Generation / The Third Generation / The Fourth Generation / Modern Computers - The Fifth Generation / Types of Computer Systems / Single Processor Systems / Parallel Processing Systems / Special Architectures / Quality of Computer Systems / Generality and Applicability / Ease of Use / Expandability / Compatibility / Reliability / Success and Failure of Computer Architectures and Implementations / Quality and the Perception of Quality / Cost Issues / Architectural Openness, Market Timi
Power-efficient computer architectures recent advances

CERN Document Server

Själander, Magnus; Kaxiras, Stefanos

2014-01-01

As Moore's Law and Dennard scaling trends have slowed, the challenges of building high-performance computer architectures while maintaining acceptable power efficiency levels have heightened. Over the past ten years, architecture techniques for power efficiency have shifted from primarily focusing on module-level efficiencies, toward more holistic design styles based on parallelism and heterogeneity. This work highlights and synthesizes recent techniques and trends in power-efficient computer architecture. Table of Contents: Introduction / Voltage and Frequency Management / Heterogeneity and Sp
A High Performance VLSI Computer Architecture For Computer Graphics

Science.gov (United States)

Chin, Chi-Yuan; Lin, Wen-Tai

1988-10-01

A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy modern computer graphics demands, e.g. high resolution, realistic animation, real-time display, etc. All processors share a global memory which is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.
High-level language computer architecture

CERN Document Server

Chu, Yaohan

1975-01-01

High-Level Language Computer Architecture offers a tutorial on high-level language computer architecture, including von Neumann architecture and syntax-oriented architecture as well as direct and indirect execution architecture. Design concepts of Japanese-language data processing systems are discussed, along with the architecture of stack machines and the SYMBOL computer system. The conceptual design of a direct high-level language processor is also described. Comprised of seven chapters, this book first presents a classification of high-level language computer architecture according to the pr
Specialized computer architectures for computational aerodynamics

Science.gov (United States)

Stevenson, D. K.

1978-01-01

In recent years, computational fluid dynamics has made significant progress in modelling aerodynamic phenomena. Currently, one of the major barriers to future development lies in the compute-intensive nature of the numerical formulations and the relatively high cost of performing these computations on commercially available general purpose computers, a cost high with respect to dollar expenditure and/or elapsed time. Today's computing technology will support a program designed to create specialized computing facilities to be dedicated to the important problems of computational aerodynamics. One of the still unresolved questions is the organization of the computing components in such a facility. The characteristics of fluid dynamic problems which will have significant impact on the choice of computer architecture for a specialized facility are reviewed.
Time-Predictable Computer Architecture

Directory of Open Access Journals (Sweden)

Schoeberl Martin

2009-01-01

Today's general-purpose processors are optimized for maximum throughput. Real-time systems need a processor with both a reasonable and a known worst-case execution time (WCET). Features such as pipelines with instruction dependencies, caches, branch prediction, and out-of-order execution complicate WCET analysis and lead to very conservative estimates. In this paper, we evaluate the issues of current architectures with respect to WCET analysis. Then, we propose solutions for a time-predictable computer architecture. The proposed architecture is evaluated with implementation of some features in a Java processor. The resulting processor is a good target for WCET analysis and still performs well in the average case.
Computing architecture for autonomous microgrids

Science.gov (United States)

Goldsmith, Steven Y.

2015-09-29

A computing architecture that facilitates autonomously controlling operations of a microgrid is described herein. A microgrid network includes numerous computing devices that execute intelligent agents, each of which is assigned to a particular entity (load, source, storage device, or switch) in the microgrid. The intelligent agents can execute in accordance with predefined protocols to collectively perform computations that facilitate uninterrupted control of the .
The new landscape of parallel computer architecture

International Nuclear Information System (INIS)

Shalf, John

2007-01-01

The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.
The new landscape of parallel computer architecture

Energy Technology Data Exchange (ETDEWEB)

Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720 (United States)]

2007-07-15

The past few years have seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.
Computers in Academic Architecture Libraries.

Science.gov (United States)

Willis, Alfred; And Others

1992-01-01

Computers are widely used in architectural research and teaching in U.S. schools of architecture. A survey of libraries serving these schools sought information on the emphasis placed on computers by the architectural curriculum, accessibility of computers to library staff, and accessibility of computers to library patrons. Survey results and…
Computer Architecture A Quantitative Approach

CERN Document Server

Hennessy, John L

2007-01-01

The era of seemingly unlimited growth in processor performance is over: single chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors--chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelis
Computing on Knights and Kepler Architectures

International Nuclear Information System (INIS)

Bortolotti, G; Caberletti, M; Ferraro, A; Giacomini, F; Manzali, M; Maron, G; Salomoni, D; Crimi, G; Zanella, M

2014-01-01

A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering, and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and on studying the best programming methodologies for these systems. Here we compare our results from porting a Lattice Boltzmann code to two state-of-the-art accelerators: the NVIDIA K20X and the Intel Xeon Phi. We describe our implementations, analyze results and compare with a baseline architecture adopting Intel Sandy Bridge CPUs.
Performance evaluation for compressible flow calculations on five parallel computers of different architectures

International Nuclear Information System (INIS)

Kimura, Toshiya.

1997-03-01

A two-dimensional explicit Euler solver has been implemented for five MIMD parallel computers of different machine architectures at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These parallel computers are the Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized by several parallelization methods, and a typical compressible flow problem has been calculated for different grid sizes while varying the number of processors. Their effective performances for parallel calculations, such as calculation speed, speed-up ratio and parallel efficiency, have been investigated and evaluated. The communication time among processors has also been measured and evaluated. As a result, the differences in performance and characteristics between vector-parallel and scalar-parallel computers can be pointed out, and the results provide basic data for efficient use of parallel computers and for large scale CFD simulations on parallel computers. (author)
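The speed-up ratio and parallel efficiency mentioned above are conventionally defined as S(p) = T(1) / T(p) and E(p) = S(p) / p, where T(p) is the elapsed time on p processors. The sketch below assumes those standard definitions; the timings in the usage loop are invented for illustration and are not results from the report.

```python
def speedup(t_serial, t_parallel):
    """Speed-up ratio S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, num_procs):
    """Parallel efficiency E(p) = S(p) / p."""
    return speedup(t_serial, t_parallel) / num_procs


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds, keyed by processor count.
    timings = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0}
    for procs, elapsed in timings.items():
        s = speedup(timings[1], elapsed)
        e = parallel_efficiency(timings[1], elapsed, procs)
        print(f"p={procs}: speed-up {s:.2f}, efficiency {e:.2f}")
```

Efficiency falling below 1 as p grows is the usual signature of communication time and serial sections, which is why the report measures communication time separately.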
Computer architecture technology trends

CERN Document Server

1991-01-01

Please note this is a Short Discount publication. This year's edition of Computer Architecture Technology Trends analyses the trends which are taking place in the architecture of computing systems today. Due to the sheer number of different applications to which computers are being applied, there seems no end to the different adoptions which proliferate. There are, however, some underlying trends which appear. Decision makers should be aware of these trends when specifying architectures, particularly for future applications. This report is fully revised and updated and provides insight in

Digital architecture, wearable computers and providing affinity

DEFF Research Database (Denmark)

Guglielmi, Michel; Johannesen, Hanne Louise

2005-01-01

as the setting for the events of experience. Contemporary architecture is a meta-space residing almost any thinkable field, striving to blur boundaries between art, architecture, design and urbanity and break down the distinction between the material and the user or inhabitant. The presentation for this paper...... will, through research, a workshop and participation in a cumulus competition, focus on the exploration of boundaries between digital architecture, performative space and wearable computers. Our design method in general focuses on the interplay between the performing body and the environment – between...
Polymorphous Computing Architecture (PCA) Application Benchmark 1: Three-Dimensional Radar Data Processing

National Research Council Canada - National Science Library

Lebak, J

2001-01-01

The DARPA Polymorphous Computing Architecture (PCA) program is building advanced computer architectures that can reorganize their computation and communication structures to achieve better overall application performance...
Computer architecture a quantitative approach

CERN Document Server

Hennessy, John L

2019-01-01

Computer Architecture: A Quantitative Approach, Sixth Edition has been considered essential reading by instructors, students and practitioners of computer design for over 20 years. The sixth edition of this classic textbook is fully revised with the latest developments in processor and system architecture. It now features examples from the RISC-V (RISC Five) instruction set architecture, a modern RISC instruction set developed and designed to be a free and openly adoptable standard. It also includes a new chapter on domain-specific architectures and an updated chapter on warehouse-scale computing that features the first public information on Google's newest WSC. True to its original mission of demystifying computer architecture, this edition continues the longstanding tradition of focusing on areas where the most exciting computing innovation is happening, while always keeping an emphasis on good engineering design.
Electromagnetic Physics Models for Parallel Computing Architectures

Science.gov (United States)

Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

2016-10-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVIDIA GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
New Developments in Modeling MHD Systems on High Performance Computing Architectures

Science.gov (United States)

Germaschewski, K.; Raeder, J.; Larson, D. J.; Bhattacharjee, A.

2009-04-01

Modeling the wide range of time and length scales present even in fluid models of plasmas like MHD and X-MHD (Extended MHD including two fluid effects like Hall term, electron inertia, electron pressure gradient) is challenging even on state-of-the-art supercomputers. In recent years, HPC capacity has continued to grow exponentially, but at the expense of making the computer systems more and more difficult to program in order to get maximum performance. In this paper, we will present a new approach to managing the complexity caused by the need to write efficient codes: separating the numerical description of the problem, in our case a discretized right hand side (r.h.s.), from the actual implementation of efficiently evaluating it. An automatic code generator is used to describe the r.h.s. in a quasi-symbolic form while leaving the translation into efficient and parallelized code to a computer program itself. We implemented this approach for OpenGGCM (Open General Geospace Circulation Model), a model of the Earth's magnetosphere, which was accelerated by a factor of three on regular x86 architecture and a factor of 25 on the Cell BE architecture (commonly known for its deployment in Sony's PlayStation 3).
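To make the separation of description and implementation concrete, here is a toy sketch under stated assumptions: the r.h.s. is described as a list of (offset, coefficient) stencil terms on a 1-D grid, and a small generator turns that description into Python source. The names `generate_rhs` and `compile_rhs` and the stencil format are hypothetical; this is not the OpenGGCM generator, which targets optimized x86 and Cell BE code.

```python
def generate_rhs(terms, name="rhs"):
    """Return Python source for a function computing sum(coeff * u[i + offset]).

    `terms` is the quasi-symbolic description: a list of (offset, coeff) pairs.
    Boundary points that the stencil cannot reach are left at 0.0.
    """
    expr = " + ".join(f"({coeff!r}) * u[i + ({off})]" for off, coeff in terms)
    reach = max(abs(off) for off, _ in terms)
    return (
        f"def {name}(u):\n"
        f"    out = [0.0] * len(u)\n"
        f"    for i in range({reach}, len(u) - {reach}):\n"
        f"        out[i] = {expr}\n"
        f"    return out\n"
    )


def compile_rhs(terms, name="rhs"):
    """Generate the source, execute it, and return the resulting function."""
    namespace = {}
    exec(generate_rhs(terms, name), namespace)
    return namespace[name]


# Description only: a second-difference stencil u[i-1] - 2 u[i] + u[i+1].
laplacian = [(-1, 1.0), (0, -2.0), (1, 1.0)]
rhs = compile_rhs(laplacian)
```

Because the description never mentions loops or memory layout, a different back end could emit vectorized or accelerator code from the same `laplacian` list.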
Electromagnetic Physics Models for Parallel Computing Architectures

International Nuclear Information System (INIS)

Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

2016-01-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVIDIA GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)
Performative Urban Architecture

DEFF Research Database (Denmark)

Thomsen, Bo Stjerne; Jensen, Ole B.

The paper explores how performative urban architecture can enhance community-making and public domain using socio-technical systems and digital technologies to constitute an urban reality. Digital media developed for the web are now increasingly occupying the urban realm as a tool for navigating...... the physical world e.g. as exemplified by the Google Walk Score and the mobile extension of Google Maps to the iPhone. At the same time the development in pervasive technologies and situated computing extends the built environment with digital feedback systems that are increasingly embedded and deployed...... using sensor technologies opening up for new access considerations in architecture as well as the ability for a local environment to act as real-time sources of information and facilities. Starting from the NoRA pavilion for the 10th International Architecture Biennale in Venice the paper discusses...
High-performance computing on the Intel Xeon Phi how to fully exploit MIC architectures

CERN Document Server

Wang, Endong; Shen, Bo; Zhang, Guangyong; Lu, Xiaowei; Wu, Qing; Wang, Yajuan

2014-01-01

The aim of this book is to explain to high-performance computing (HPC) developers how to utilize the Intel® Xeon Phi™ series products efficiently. To that end, it introduces some computing grammar, programming technology and optimization methods for using many-integrated-core (MIC) platforms and also offers tips and tricks for actual use, based on the authors' first-hand optimization experience. The material is organized in three sections. The first section, "Basics of MIC", introduces the fundamentals of MIC architecture and programming, including the specific Intel MIC programming environment
NET-COMPUTER: Internet Computer Architecture and its Application in E-Commerce

Directory of Open Access Journals (Sweden)

P. O. Umenne

2012-12-01

Research in Intelligent Agents has yielded interesting results, some of which have been translated into commercial ventures. Intelligent Agents are executable software components that represent the user, perform tasks on behalf of the user and, when the task terminates, send the result to the user. Intelligent Agents are best suited for the Internet: a collection of computers connected together in a world-wide computer network. Swarm and HYDRA computer architectures for Agents’ execution were developed at the University of Surrey, UK in the 90s. The objective of the research was to develop a software-based computer architecture on which Agents’ execution could be explored. The combination of Intelligent Agents and HYDRA computer architecture gave rise to a new computer concept: the NET-Computer, in which the computing resources reside on the Internet. The Internet computers form the hardware and software resources, and the user is provided with a simple interface to access the Internet and run user tasks. The Agents autonomously roam the Internet (NET-Computer) executing the tasks. A growing segment of the Internet is E-Commerce for online shopping for products and services. The Internet computing resources provide a marketplace for product suppliers and consumers alike. Consumers are looking for suppliers selling products and services, while suppliers are looking for buyers. Searching the vast amount of information available on the Internet causes a great deal of problems for both consumers and suppliers. Intelligent Agents executing on the NET-Computer can surf through the Internet and select specific information of interest to the user. The simulation results show that Intelligent Agents executing on the HYDRA computer architecture could be applied in E-Commerce.
Digital design and computer architecture

CERN Document Server

Harris, David

2010-01-01

Digital Design and Computer Architecture is designed for courses that combine digital logic design with computer organization/architecture or that teach these subjects as a two-course sequence. Digital Design and Computer Architecture begins with a modern approach by rigorously covering the fundamentals of digital logic design and then introducing Hardware Description Languages (HDLs). Featuring examples of the two most widely-used HDLs, VHDL and Verilog, the first half of the text prepares the reader for what follows in the second: the design of a MIPS Processor. By the end of D
Fundamentals of computer architecture and design

CERN Document Server

Bindal, Ahmet

2017-01-01

This textbook provides semester-length coverage of computer architecture and design, providing a strong foundation for students to understand modern computer system architecture and to apply these insights and principles to future computer designs. It is based on the author’s decades of industrial experience with computer architecture and design, as well as with teaching students focused on pursuing careers in computer engineering. Unlike a number of existing textbooks for this course, this one focuses not only on CPU architecture, but also covers system buses, peripherals and memories in great detail. This book teaches every element in a computing system in two steps. First, it introduces the functionality of each topic (and subtopics) and then goes into “from-scratch design” of a particular digital block from its architectural specifications using timing diagrams. The author describes how the data-path of a certain digital block is generated using timing diagrams, a method which most textbo...
CITAstudio: Computation in Architecture 2015

DEFF Research Database (Denmark)

Nicholas, Paul; Ayres, Phil

2016-01-01

CITAstudio yearbook. CITAstudio: Computation in Architecture is a two-year International Master's Programme at The Royal Danish Academy of Fine Arts, School of Architecture. With a focus on digital design and material fabrication the programme questions how computation is changing our spatial...
Hybrid parallel computing architecture for multiview phase shifting

Science.gov (United States)

Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

2014-11-01

The multiview phase-shifting method shows its powerful capability in achieving high resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can co-operate with the graphic processing unit (GPU) to achieve hybrid parallel computing. The high computation cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained for the performance of the proposed technique using an NVIDIA GT560Ti graphics card rather than sequential C code on a 3.4 GHz Intel Core i7 3770.
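The abstract does not state which phase formula the authors use. As an illustration of why the phase computation step parallelizes well, the sketch below applies the standard N-step phase-shifting estimate to a single pixel, assuming a fringe model I_k = A + B cos(phi + 2*pi*k/N); every pixel is independent, so on a GPU each one could map to its own thread. The function names and the synthetic test pixel are assumptions for illustration.

```python
import math


def wrapped_phase(intensities):
    """Wrapped phase at one pixel from N equally spaced phase-shifted samples.

    Assumes I_k = A + B * cos(phi + 2*pi*k/N) with N >= 3.
    """
    n = len(intensities)
    s = sum(i_k * math.sin(2 * math.pi * k / n)
            for k, i_k in enumerate(intensities))
    c = sum(i_k * math.cos(2 * math.pi * k / n)
            for k, i_k in enumerate(intensities))
    return math.atan2(-s, c)


def synth_pixel(phase, steps=4, offset=0.5, amplitude=0.4):
    """Generate synthetic intensities for one pixel under the assumed model."""
    return [offset + amplitude * math.cos(phase + 2 * math.pi * k / steps)
            for k in range(steps)]
```

The result is wrapped to the interval (-pi, pi]; unwrapping and multiview correspondence are separate steps that this sketch does not cover.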
Layered architecture for quantum computing

OpenAIRE

Jones, N. Cody; Van Meter, Rodney; Fowler, Austin G.; McMahon, Peter L.; Kim, Jungsang; Ladd, Thaddeus D.; Yamamoto, Yoshihisa

2010-01-01

We develop a layered quantum-computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface-code quantum error correction. In doing so, we propose a new quantum-computer architecture based on optical control of quantum dot...
Matrix multiplication operations with data pre-conditioning in a high performance computing architecture

Science.gov (United States)

Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

2013-11-05

Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicating the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
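The sequence of operations described above (vector load, load and splat, multiply add, accumulation of partial products) can be illustrated with a minimal Python sketch. This is not the patented mechanism: the helper names `splat`, `fma`, and `matmul_splat` are hypothetical, the "vector register" is modelled as a plain list whose width equals the number of rows of A, and the data pre-conditioning step is not reproduced.

```python
def splat(value, width):
    """Replicate one scalar into every lane of a (simulated) vector register."""
    return [value] * width


def fma(acc, a, b):
    """Lane-wise multiply-add: acc[i] + a[i] * b[i]."""
    return [c + x * y for c, x, y in zip(acc, a, b)]


def matmul_splat(A, B):
    """Compute C = A * B one output column at a time.

    A is m x k and B is k x n, both given as lists of rows.
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for j in range(n):
        acc = [0.0] * m                          # accumulator register
        for p in range(k):
            a_col = [A[i][p] for i in range(m)]  # vector load of column p of A
            b_rep = splat(B[p][j], m)            # load one element of B and splat it
            acc = fma(acc, a_col, b_rep)         # accumulate the partial product
        for i in range(m):
            C[i][j] = acc[i]
    return C


if __name__ == "__main__":
    print(matmul_splat([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each pass of the inner loop adds one rank-1 contribution to a whole output column, which is why a single splatted scalar can be reused across all lanes of the register.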
High-performance computing using FPGAs

CERN Document Server

Benkrid, Khaled

2013-01-01

This book is concerned with the emerging field of High Performance Reconfigurable Computing (HPRC), which aims to harness the high performance and relatively low power of reconfigurable hardware, in the form of Field Programmable Gate Arrays (FPGAs), in High Performance Computing (HPC) applications. It presents the latest developments in this field from applications, architecture, and tools and methodologies points of view. We hope that this work will form a reference for existing researchers in the field, and entice new researchers and developers to join the HPRC community. The book includes: Thirteen application chapters which present the most important application areas tackled by high performance reconfigurable computers, namely: financial computing, bioinformatics and computational biology, data search and processing, stencil computation e.g. computational fluid dynamics and seismic modeling, cryptanalysis, astronomical N-body simulation, and circuit simulation. Seven architecture chapters which...
Layered Architecture for Quantum Computing

Directory of Open Access Journals (Sweden)

N. Cody Jones

2012-07-01

We develop a layered quantum-computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface-code quantum error correction. In doing so, we propose a new quantum-computer architecture based on optical control of quantum dots. The time scales of physical-hardware operations and logical, error-corrected quantum gates differ by several orders of magnitude. By dividing functionality into layers, we can design and analyze subsystems independently, demonstrating the value of our layered architectural approach. Using this concrete hardware platform, we provide resource analysis for executing fault-tolerant quantum algorithms for integer factoring and quantum simulation, finding that the quantum-dot architecture we study could solve such problems on the time scale of days.
Progress in a novel architecture for high performance processing

Science.gov (United States)

Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin

2018-04-01

High performance processing (HPP) is an innovative architecture that targets high performance computing with excellent power efficiency and computing performance. It is suitable for data-intensive applications such as supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores, the first generation of HPP cores, has been taped out successfully in the Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The innovative architecture shows great energy efficiency over the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvements in architecture. A chip with 32 HPP cores is being developed in the TSMC 16 nm field effect transistor (FFC) technology process and is planned for commercial use. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).
Computer architecture evaluation for structural dynamics computations: Project summary

Science.gov (United States)

Standley, Hilda M.

1989-01-01

The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.
RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization.

Science.gov (United States)

Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan

2017-08-04

This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Considering the characteristics of IoT data processing, similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. Firstly, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) programming model, we propose a Two-level Parallel Optimization Model (TLPOM) which exploits reasonable resource planning and common compiler optimization techniques to obtain the best blocks and threads configuration considering the resource constraints of each node. The key to this part is the dynamic coupling of Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system considering the nodes' diversity, algorithm characteristics, etc. The results show that with TLPOM the performance of the algorithms increased on average by 34.1%, 33.96% and 24.07% for Fermi, Kepler and Maxwell, respectively, and that the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.

Savannah River Site computing architecture

Energy Technology Data Exchange (ETDEWEB)

1991-03-29

A computing architecture is a framework for making decisions about the implementation of computer technology and the supporting infrastructure. Because of the size, diversity, and amount of resources dedicated to computing at the Savannah River Site (SRS), there must be an overall strategic plan that can be followed by the thousands of site personnel who make decisions daily that directly affect the SRS computing environment and impact the site's production and business systems. This plan must address the following requirements: There must be SRS-wide standards for procurement or development of computing systems (hardware and software). The site computing organizations must develop systems that end users find easy to use. Systems must be put in place to support the primary function of site information workers. The developers of computer systems must be given tools that automate and speed up the development of information systems and applications based on computer technology. This document describes a proposal for a site-wide computing architecture that addresses the above requirements. In summary, this architecture is standards-based, data-driven, and workstation-oriented, with larger systems being utilized for the delivery of needed information to users in a client-server relationship.
Architecture and Programming Models for High Performance Intensive Computation

Science.gov (United States)

2016-06-29

commands from the data processing center to the sensors is needed. It has been noted that the ubiquity of mobile communication devices offers the...commands from a Processing Facility by way of mobile Relay Stations. The activity of each component of this model other than the Merge module can be...evaluation of the initial system implementation. Gao also was in charge of the development of Fresh Breeze architecture backend on new many-core computers
NET-COMPUTER: Internet Computer Architecture and its Application in E-Commerce

OpenAIRE

P. O. Umenne; M. O. Odhiambo

2012-01-01

Research in Intelligent Agents has yielded interesting results, some of which have been translated into commercial ventures. Intelligent Agents are executable software components that represent the user, perform tasks on behalf of the user and when the task terminates, the Agents send the result to the user. Intelligent Agents are best suited for the Internet: a collection of computers connected together in a world-wide computer network. Swarm and HYDRA computer architectures for Agents’ ex...
Architectures for single-chip image computing

Science.gov (United States)

Gove, Robert J.

1992-04-01

This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors, those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.
A computer architecture for the implementation of SDL

Energy Technology Data Exchange (ETDEWEB)

Crutcher, L A

1989-01-01

Finite State Machines (FSMs) are a part of well-established automata theory. The FSM model is useful in all stages of system design, from abstract specification to implementation in hardware. The FSM model has been studied as a technique in software design, and the implementation of this type of software considered. The Specification and Description Language (SDL) has been considered in detail as an example of this approach. The complexity of systems designed using SDL warrants their implementation through a programmed computer. A benchmark for the implementation of SDL has been established and the performance of SDL on three particular computer architectures investigated. Performance is judged according to this benchmark and also the ease of implementation, which is related to the confidence of a correct implementation. The implementation on 68000s and transputers is considered as representative of established and state-of-the-art microprocessors respectively. A third architecture that uses a processor that has been proposed specifically for the implementation of SDL is considered as a high-level custom architecture. Analysis and measurements of the benchmark on each architecture indicate that the execution time of SDL decreases by an order of magnitude from the 68000 to the transputer to the custom architecture. The ease of implementation is also greater when the execution time is reduced. A study of some real applications of SDL indicates that the benchmark figures are reflected in user-oriented measures of performance such as data throughput and response time. A high-level architecture such as the one proposed here for SDL can provide benefits in terms of execution time and correctness.
Contemporary high performance computing from petascale toward exascale

CERN Document Server

Vetter, Jeffrey S

2013-01-01

Contemporary High Performance Computing: From Petascale toward Exascale focuses on the ecosystems surrounding the world's leading centers for high performance computing (HPC). It covers many of the important factors involved in each ecosystem: computer architectures, software, applications, facilities, and sponsors. The first part of the book examines significant trends in HPC systems, including computer architectures, applications, performance, and software. It discusses the growth from terascale to petascale computing and the influence of the TOP500 and Green500 lists. The second part of the
Lightgrid-an agile distributed computing architecture for Geant4

International Nuclear Information System (INIS)

Young, Jason; Perry, John O.; Jevremovic, Tatjana

2010-01-01

A light weight grid based computing architecture has been developed to accelerate Geant4 computations on a variety of network architectures. This new software is called LightGrid. LightGrid has a variety of features designed to overcome current limitations on other grid based computing platforms, more specifically, smaller network architectures. By focusing on smaller, local grids, LightGrid is able to simplify the grid computing process with minimal changes to existing Geant4 code. LightGrid allows for integration between Geant4 and MySQL, which both increases flexibility in the grid as well as provides a faster, reliable, and more portable method for accessing results than traditional data storage systems. This unique method of data acquisition allows for more fault tolerant runs as well as instant results from simulations as they occur. The performance increases brought along by using LightGrid allow simulation times to be decreased linearly. LightGrid also allows for pseudo-parallelization with minimal Geant4 code changes.
Computer programming and architecture the VAX

CERN Document Server

Levy, Henry

2014-01-01

Takes a unique systems approach to programming and architecture of the VAX. Using the VAX as a detailed example, the first half of this book offers a complete course in assembly language programming. The second describes higher-level systems issues in computer architecture. Highlights include the VAX assembler and debugger, other modern architectures such as RISCs, multiprocessing and parallel computing, microprogramming, caches and translation buffers, and an appendix on the Berkeley UNIX assembler.
A computer architecture for intelligent machines

Science.gov (United States)

Lefebvre, D. R.; Saridis, G. N.

1992-01-01

The theory of intelligent machines proposes a hierarchical organization for the functions of an autonomous robot based on the principle of increasing precision with decreasing intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed. The authors present a computer architecture that implements the lower two levels of the intelligent machine. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Execution-level controllers for motion and vision systems are briefly addressed, as well as the Petri net transducer software used to implement coordination-level functions. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.
Brain architecture: a design for natural computation.

Science.gov (United States)

Kaiser, Marcus

2007-12-15

Fifty years ago, John von Neumann compared the architecture of the brain with that of the computers he invented and which are still in use today. In those days, the organization of computers was based on concepts of brain organization. Here, we give an update on current results on the global organization of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing and balanced network activation. Finally, we discuss mechanisms of self-organization for such architectures. After all, the organization of the brain might again inspire computer architecture.
Centaure: an heterogeneous parallel architecture for computer vision

International Nuclear Information System (INIS)

Peythieux, Marc

1997-01-01

This dissertation deals with the architecture of parallel computers dedicated to computer vision. In the first chapter, the problem to be solved is presented, as well as the architecture of the Sympati and Symphonie computers, on which this work is based. The second chapter is about the state of the art of computers and integrated processors that can execute computer vision and image processing codes. The third chapter contains a description of the architecture of Centaure. It has a heterogeneous structure: it is composed of a multiprocessor system based on the Analog Devices ADSP21060 Sharc digital signal processor, and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer, tightly coupled to a Sharc thanks to a dual-ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with a performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark which is typical of industrial vision are presented and compared. In the first place, these results show that the basic node of Centaure allows a faster execution than Symphonie, and that increasing the size of the tested computer leads to a better speed-up with Centaure than with Symphonie. In the second place, these results validate the choice of running the low-level structure of Centaure in a multi-SIMD fashion.
A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

Science.gov (United States)

Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

2016-05-01

In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
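The two features highlighted above, dynamic assignment of independent tracks to threads and a vectorizable inner loop, can be sketched in Python as follows. This is an illustration only and not the authors' proxy application: the function names `sweep_track` and `sweep_all` are hypothetical, a single homogeneous material is assumed, and the per-segment update is the standard flat-source attenuation formula, which the abstract does not spell out. Because of Python's global interpreter lock the sketch shows the scheduling pattern rather than a real speedup.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def sweep_track(track, sigma_t, source, psi_in):
    """Attenuate the angular flux along one track.

    track   : list of segment lengths crossed by the track
    sigma_t : total cross section per energy group (assumed uniform in space)
    source  : flat source per energy group
    psi_in  : incoming angular flux per energy group
    """
    psi = list(psi_in)
    for length in track:
        # Inner loop over energy groups: every group is independent,
        # which is what makes this loop a candidate for SIMD execution.
        for g in range(len(psi)):
            att = math.exp(-sigma_t[g] * length)
            psi[g] = psi[g] * att + (source[g] / sigma_t[g]) * (1.0 - att)
    return psi


def sweep_all(tracks, sigma_t, source, psi_in, workers=4):
    """Sweep all tracks; idle workers pull the next track from a shared queue."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda t: sweep_track(t, sigma_t, source, psi_in), tracks))


if __name__ == "__main__":
    tracks = [[1.0, 1.0], [0.5], [0.2, 0.3, 0.4]]
    print(sweep_all(tracks, [1.0, 2.0], [0.0, 0.0], [1.0, 1.0]))
```

Handing out tracks from a shared queue, rather than splitting them statically, is what keeps threads busy when tracks cross different numbers of segments.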
Fast semivariogram computation using FPGA architectures

Science.gov (United States)

Lagadapati, Yamuna; Shirvaikar, Mukul; Dong, Xuanliang

2015-02-01

The semivariogram is a statistical measure of the spatial distribution of data and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real time implementation of the algorithm. The semivariogram is a plot of semivariances for different lag distances between pixels. A semi-variance, γ(h), is defined as the half of the expected squared differences of pixel values between any two data locations with a lag distance of h. Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O(n^2). Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs also tend to operate at relatively modest clock rates measured in a few hundreds of megahertz, but they can perform tens of thousands of calculations per clock cycle while operating in the low range of power. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. The design consists of several modules dedicated to the constituent computational tasks. A modular architecture approach is chosen to allow for replication of processing units. This allows for high throughput due to concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. Anisotropic semivariogram implementation is anticipated to be an extension of the current architecture, ostensibly based on refinements to the current modules. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development kit, which utilizes the Virtex5 FPGA. Medical image data from MRI scans are utilized for the experiments.
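As a software reference for the definition above, the following Python sketch computes an isotropic empirical semivariogram by visiting every pixel pair once, which makes the quadratic cost explicit. It is a plain serial illustration, not the FPGA design from the paper; the function name `semivariogram`, the `max_lag` parameter, and the choice to bin pairs by rounding their Euclidean distance to the nearest integer lag are assumptions made for this example.

```python
import math
from collections import defaultdict


def semivariogram(image, max_lag):
    """Isotropic empirical semivariogram of a 2-D list of pixel values.

    Returns {lag: gamma(lag)}, where gamma(h) is half the mean squared
    difference of pixel values over all pairs whose distance rounds to h.
    """
    pixels = [(r, c, v)
              for r, row in enumerate(image)
              for c, v in enumerate(row)]
    sq_sum = defaultdict(float)   # sum of squared differences per lag
    count = defaultdict(int)      # number of pixel pairs per lag

    # Every unordered pair is examined once: quadratic in the pixel count.
    for i in range(len(pixels)):
        r1, c1, v1 = pixels[i]
        for j in range(i + 1, len(pixels)):
            r2, c2, v2 = pixels[j]
            lag = round(math.hypot(r1 - r2, c1 - c2))
            if 1 <= lag <= max_lag:
                sq_sum[lag] += (v1 - v2) ** 2
                count[lag] += 1

    return {h: sq_sum[h] / (2 * count[h]) for h in sorted(count)}


if __name__ == "__main__":
    print(semivariogram([[1, 3, 5]], max_lag=2))
```

Because each pair contributes independently to its lag bin, the pair loop is the natural place to replicate processing units, which is the property a parallel implementation would exploit.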
ELASTIC CLOUD COMPUTING ARCHITECTURE AND SYSTEM FOR HETEROGENEOUS SPATIOTEMPORAL COMPUTING

Directory of Open Access Journals (Sweden)

X. Shi

2017-10-01

Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same holds when utilizing a cluster of Intel's many-integrated-core (MIC) or Xeon Phi processors, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, a Field Programmable Gate Array (FPGA) may be a better solution for energy efficiency when its computational performance is similar to or better than that of GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.
Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing

Science.gov (United States)

Shi, X.

2017-10-01

Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same holds when utilizing a cluster of Intel's many-integrated-core (MIC) or Xeon Phi processors, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, a Field Programmable Gate Array (FPGA) may be a better solution for energy efficiency when its computational performance is similar to or better than that of GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.
Architecture independent environment for developing engineering software on MIMD computers

Science.gov (United States)

Valimohamed, Karim A.; Lopez, L. A.

1990-01-01

Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management.
Efficient universal computing architectures for decoding neural activity.

Directory of Open Access Journals (Sweden)

Benjamin I Rapoport

The ability to decode neural activity into meaningful control signals for prosthetic devices is critical to the development of clinically useful brain-machine interfaces (BMIs). Such systems require input from tens to hundreds of brain-implanted recording electrodes in order to deliver robust and accurate performance; in serving that primary function they should also minimize power dissipation in order to avoid damaging neural tissue; and they should transmit data wirelessly in order to minimize the risk of infection associated with chronic, transcutaneous implants. Electronic architectures for brain-machine interfaces must therefore minimize size and power consumption, while maximizing the ability to compress data to be transmitted over limited-bandwidth wireless channels. Here we present a system of extremely low computational complexity, designed for real-time decoding of neural signals, and suited for highly scalable implantable systems. Our programmable architecture is an explicit implementation of a universal computing machine emulating the dynamics of a network of integrate-and-fire neurons; it requires no arithmetic operations except for counting, and decodes neural signals using only computationally inexpensive logic operations. The simplicity of this architecture does not compromise its ability to compress raw neural data by factors greater than [Formula: see text]. We describe a set of decoding algorithms based on this computational architecture, one designed to operate within an implanted system, minimizing its power consumption and data transmission bandwidth; and a complementary set of algorithms for learning, programming the decoder, and postprocessing the decoded output, designed to operate in an external, nonimplanted unit. The implementation of the implantable portion is estimated to require fewer than 5000 operations per second. A proof-of-concept, 32-channel field-programmable gate array (FPGA) implementation of this portion
Performances of multiprocessor multidisk architectures for continuous media storage

Science.gov (United States)

Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

1996-03-01

Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located close to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.
Outline of a novel architecture for cortical computation.

Science.gov (United States)

Majumdar, Kaushik

2008-03-01

In this paper a novel architecture for cortical computation has been proposed. This architecture is composed of computing paths consisting of neurons and synapses. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortical computation as well as in memory storage and retrieval, keeping in conformity with the molecular basis of short and long term memory. A new learning scheme for the brain has also been proposed and how it is implemented within the proposed architecture has been explained. A few mathematical results about the architecture have been proposed, some of which are without proof.

Field-programmable custom computing technology architectures, tools, and applications

CERN Document Server

Luk, Wayne; Pocek, Ken

2000-01-01

Field-Programmable Custom Computing Technology: Architectures, Tools, and Applications brings together in one place important contributions and up-to-date research results in this fast-moving area. In seven selected chapters, the book describes the latest advances in architectures, design methods, and applications of field-programmable devices for high-performance reconfigurable systems. The contributors to this work were selected from the leading researchers and practitioners in the field. It will be valuable to anyone working or researching in the field of custom computing technology. It serves as an excellent reference, providing insight into some of the most challenging issues being examined today.
Memristor-based nanoelectronic computing circuits and architectures

CERN Document Server

Vourkas, Ioannis

2016-01-01

This book considers the design and development of nanoelectronic computing circuits, systems and architectures focusing particularly on memristors, which represent one of today’s latest technology breakthroughs in nanoelectronics. The book studies, explores, and addresses the related challenges and proposes solutions for the smooth transition from conventional circuit technologies to emerging computing memristive nanotechnologies. Its content spans from fundamental device modeling to emerging storage system architectures and novel circuit design methodologies, targeting advanced non-conventional analog/digital massively parallel computational structures. Several new results on memristor modeling, memristive interconnections, logic circuit design, memory circuit architectures, computer arithmetic systems, simulation software tools, and applications of memristors in computing are presented. High-density memristive data storage combined with memristive circuit-design paradigms and computational tools applied t...
A memory-array architecture for computer vision

Energy Technology Data Exchange (ETDEWEB)

Balsara, P.T.

1989-01-01

With the fast advances in the area of computer vision and robotics there is a growing need for machines that can understand images at a very high speed. A conventional von Neumann computer is not suited for this purpose because it takes a tremendous amount of time to solve most typical image processing problems. Exploiting the inherent parallelism present in various vision tasks can significantly reduce the processing time. Fortunately, parallelism is increasingly affordable as hardware gets cheaper. Thus it is now imperative to study computer vision in a parallel processing framework. The author should first design a computational structure which is well suited for a wide range of vision tasks and then develop parallel algorithms which can run efficiently on this structure. Recent advances in VLSI technology have led to several proposals for parallel architectures for computer vision. In this thesis he demonstrates that a memory array architecture with efficient local and global communication capabilities can be used for high speed execution of a wide range of computer vision tasks. This architecture, called the Access Constrained Memory Array Architecture (ACMAA), is efficient for VLSI implementation because of its modular structure, simple interconnect and limited global control. Several parallel vision algorithms have been designed for this architecture. The choice of vision problems demonstrates the versatility of ACMAA for a wide range of vision tasks. These algorithms were simulated on a high level ACMAA simulator running on the Intel iPSC/2 hypercube, a parallel architecture. The results of this simulation are compared with those of sequential algorithms running on a single hypercube node. Details of the ACMAA processor architecture are also presented.
Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable

Energy Technology Data Exchange (ETDEWEB)

Schuller, Ivan K. [Univ. of California, San Diego, CA (United States); Stevens, Rick [Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States); Pino, Robinson [Dept. of Energy (DOE) Office of Science, Washington, DC (United States); Pechan, Michael [Dept. of Energy (DOE) Office of Science, Washington, DC (United States)

2015-10-29

Computation in its many forms is the engine that fuels our modern civilization. Modern computation, based on the von Neumann architecture, has allowed, until now, the development of continuous improvements, as predicted by Moore’s law. However, computation using current architectures and materials will inevitably, within the next 10 years, reach a limit because of fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like (“neuromorphic”) computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS based technology? If so, what are the basic research challenges for materials science and computing? The overarching answer that emerged was: The development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully “neuromorphic” computer. To address this challenge, the following issues were considered: the main differences between neuromorphic and conventional computing as related to signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning; the new neuromorphic architectures needed to produce lower energy consumption, potential novel nanostructured materials, and enhanced computation; the device and materials properties needed to implement functions such as hysteresis, stability, and fault tolerance; and comparisons of different implementations (spin torque, memristors, resistive switching, phase change, and optical schemes) for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.
Experimental high energy physics and modern computer architectures

International Nuclear Information System (INIS)

Hoek, J.

1988-06-01

The paper examines how experimental High Energy Physics can use modern computer architectures efficiently. In this connection parallel and vector architectures are investigated, and the types available at the moment for general use are discussed. A separate section briefly describes some architectures that are either a combination of both, or exemplify other architectures. In an appendix some directions in which computing seems to be developing in the USA are mentioned. (author)
Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

Science.gov (United States)

Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

2015-09-01

The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.
A Heterogeneous Quantum Computer Architecture

NARCIS (Netherlands)

Fu, X.; Riesebos, L.; Lao, L.; Garcia Almudever, C.; Sebastiano, F.; Versluis, R.; Charbon, E.; Bertels, K.

2016-01-01

In this paper, we present a high level view of the heterogeneous quantum computer architecture as any future quantum computer will consist of both a classical and quantum computing part. The classical part is needed for error correction as well as for the execution of algorithms that contain both
High performance computing on vector systems

CERN Document Server

Roller, Sabine

2008-01-01

Presents the developments in high-performance computing and simulation on modern supercomputer architectures. This book covers trends in hardware and software development in general and specifically the vector-based systems and heterogeneous architectures. It presents innovative fields like coupled multi-physics or multi-scale simulations.
HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

Science.gov (United States)

Sterling, Thomas; Bergman, Larry

2000-01-01

Computational Aero Sciences and other numeric intensive computation disciplines demand computing throughputs substantially greater than the Teraflops scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that in combination with sufficient resolution and advanced adaptive techniques may force performance requirements towards Petaflops. This will be especially true for compute intensive models such as Navier-Stokes or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithmic techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention
Spatial computing in interactive architecture

NARCIS (Netherlands)

S.O. Dulman (Stefan); M. Krezer; L. Hovestad

2014-01-01

Distributed computing is the theoretical foundation for applications and technologies like interactive architecture, wearable computing, and smart materials. It evolves continuously, following needs rising from scientific developments, novel uses of technology, or simply the curiosity to
Large computer systems and new architectures

International Nuclear Information System (INIS)

Bloch, T.

1978-01-01

The super-computers of today are becoming quite specialized and one can no longer expect to get all the state-of-the-art software and hardware facilities in one package. In order to achieve faster and faster computing it is necessary to experiment with new architectures, and the cost of developing each experimental architecture into a general-purpose computer system is too high when one considers the relatively small market for these computers. The result is that such computers are becoming 'back-ends' either to special systems (BSP, DAP) or to anything (CRAY-1). Architecturally the CRAY-1 is the most attractive today since it guarantees a speed gain of a factor of two over a CDC 7600 thus allowing us to regard any speed up resulting from vectorization as a bonus. It looks, however, as if it will be very difficult to make substantially faster computers using only pipe-lining techniques and that it will be necessary to explore multiple processors working on the same problem. The experience which will be gained with the BSP and the DAP over the next few years will certainly be most valuable in this respect. (Auth.)
Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

KAUST Repository

AbdulJabbar, Mustafa Abdulmajeed; Al Farhan, Mohammed; Yokota, Rio; Keyes, David E.

2017-01-01

Manycore optimizations are essential for achieving performance worthy of anticipated exascale systems. Utilization of manycore chips is inevitable to attain the desired floating point performance of these energy-austere systems. In this work, we revisit ExaFMM, the open source Fast Multipole Method (FMM) library, in light of highly tuned shared-memory parallelization and detailed performance analysis on the new highly parallel Intel manycore architecture, Knights Landing (KNL). We assess scalability and performance gain using task-based parallelism of the FMM tree traversal. We also provide an in-depth analysis of the most computationally intensive part of the traversal kernel (i.e., the particle-to-particle (P2P) kernel), by comparing its performance across KNL and Broadwell architectures. We quantify different configurations that exploit the on-chip 512-bit vector units within different task-based threading paradigms. MPI communication-reducing and NUMA-aware approaches for the FMM’s global tree data exchange are examined with different cluster modes of KNL. By applying several algorithm- and architecture-aware optimizations for FMM, we show that the N-Body kernel on 256 threads of KNL achieves on average 2.8× speedup compared to the non-vectorized version, whereas on 56 threads of Broadwell, it achieves on average 2.9× speedup. In addition, the tree traversal kernel on KNL scales monotonically up to 256 threads with task-based programming models. The MPI-based communication-reducing algorithms show expected improvements of the data locality across the KNL on-chip network.
Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

KAUST Repository

AbdulJabbar, Mustafa Abdulmajeed

2017-07-31

Manycore optimizations are essential for achieving performance worthy of anticipated exascale systems. Utilization of manycore chips is inevitable to attain the desired floating point performance of these energy-austere systems. In this work, we revisit ExaFMM, the open source Fast Multipole Method (FMM) library, in light of highly tuned shared-memory parallelization and detailed performance analysis on the new highly parallel Intel manycore architecture, Knights Landing (KNL). We assess scalability and performance gain using task-based parallelism of the FMM tree traversal. We also provide an in-depth analysis of the most computationally intensive part of the traversal kernel (i.e., the particle-to-particle (P2P) kernel), by comparing its performance across KNL and Broadwell architectures. We quantify different configurations that exploit the on-chip 512-bit vector units within different task-based threading paradigms. MPI communication-reducing and NUMA-aware approaches for the FMM’s global tree data exchange are examined with different cluster modes of KNL. By applying several algorithm- and architecture-aware optimizations for FMM, we show that the N-Body kernel on 256 threads of KNL achieves on average 2.8× speedup compared to the non-vectorized version, whereas on 56 threads of Broadwell, it achieves on average 2.9× speedup. In addition, the tree traversal kernel on KNL scales monotonically up to 256 threads with task-based programming models. The MPI-based communication-reducing algorithms show expected improvements of the data locality across the KNL on-chip network.
Outline of a novel architecture for cortical computation

OpenAIRE

Majumdar, Kaushik

2007-01-01

In this paper a novel architecture for cortical computation has been proposed. This architecture is composed of computing paths consisting of neurons and synapses only. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortica...
Explaining the gap between theoretical peak performance and real performance for supercomputer architectures

International Nuclear Information System (INIS)

Schoenauer, W.; Haefner, H.

1993-01-01

The basic architectures of vector and parallel computers with their properties are presented. Then the memory size and the arithmetic operations in the context of memory bandwidth are discussed. For the exemplary discussion of a single operation, micro-measurements of the vector triad for the IBM 3090 VF and the CRAY Y-MP/8 are presented. They reveal the details of the losses for a single operation. Then we analyze the global performance of a whole supercomputer by identifying reduction factors that bring down the theoretical peak performance to the poor real performance. The responsibilities of the manufacturer and of the user for these losses are discussed. Then the price-performance ratio for different architectures in a snapshot of January 1991 is briefly mentioned. Finally, some remarks on a user-friendly architecture for a supercomputer are made. (orig.)
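The idea of reduction factors in the abstract above can be sketched as a chain of multiplicative losses applied to the theoretical peak. The sketch below also spells out the vector triad, a[i] = b[i] + c[i] * d[i], which performs two floating-point operations per element. The factor names and values are hypothetical examples chosen for illustration; they are not measurements from the paper.

```python
from math import prod


def triad(b, c, d):
    """Vector triad a[i] = b[i] + c[i] * d[i] (two flops per element)."""
    return [bi + ci * di for bi, ci, di in zip(b, c, d)]


def real_performance(peak_mflops, reduction_factors):
    """Scale the theoretical peak by a chain of multiplicative losses.

    reduction_factors maps a (hypothetical) loss name to a value in (0, 1].
    """
    for name, factor in reduction_factors.items():
        if not 0.0 < factor <= 1.0:
            raise ValueError(f"factor {name!r} must lie in (0, 1]")
    return peak_mflops * prod(reduction_factors.values())


if __name__ == "__main__":
    # Hypothetical losses: memory bandwidth and short vector lengths.
    losses = {"memory_bandwidth": 0.5, "vector_length": 0.5}
    print(real_performance(1000.0, losses))
    print(triad([1, 2], [3, 4], [5, 6]))
```

Because the factors multiply, a few moderate losses compound quickly, which is one way to read the gap between peak and real performance that the abstract describes.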
Architecture, systems research and computational sciences

CERN Document Server

2012-01-01

The Winter 2012 (vol. 14 no. 1) issue of the Nexus Network Journal is dedicated to the theme “Architecture, Systems Research and Computational Sciences”. This is an outgrowth of the session by the same name which took place during the eighth international, interdisciplinary conference “Nexus 2010: Relationships between Architecture and Mathematics”, held in Porto, Portugal, in June 2010. Today computer science is an integral part of even strictly historical investigations, such as those concerning the construction of vaults, where the computer is used to survey the existing building, analyse the data and draw the ideal solution. What the papers in this issue make especially evident is that information technology has had an impact at a much deeper level as well: architecture itself can now be considered as a manifestation of information and as a complex system. The issue is completed with other research papers, conference reports and book reviews.
Developing a Distributed Computing Architecture at Arizona State University.

Science.gov (United States)

Armann, Neil; And Others

1994-01-01

Development of Arizona State University's computing architecture, designed to ensure that all new distributed computing pieces will work together, is described. Aspects discussed include the business rationale, the general architectural approach, characteristics and objectives of the architecture, specific services, and impact on the university…
Addressing Cloud Computing in Enterprise Architecture: Issues and Challenges

OpenAIRE

Khan, Khaled; Gangavarapu, Narendra

2009-01-01

This article discusses how the characteristics of cloud computing affect the enterprise architecture in four domains: business, data, application and technology. The ownership and control of architectural components are shifted from organisational perimeters to cloud providers. It argues that although cloud computing promises numerous benefits to enterprises, the shifting control from enterprises to cloud providers on architectural components introduces several architectural challenges. The d...
High-performance computing — an overview

Science.gov (United States)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
CAAD as Computer-Activated Architectural Design

DEFF Research Database (Denmark)

Galle, Per

1998-01-01

In a brief sketch, drawing on a general philosophical conception of human interaction with the world, the architectural design process is analysed in terms of two kinds of human action: interpretation and production. Both of these are seen as establishing a link between mental and material entities....... On this background two alternative roles of computers in computer-aided architectural design (CAAD) are distinguished: a passive and a more active role, where in the latter case, the computer’s capacity for symbol manipulation is utilized to influence design thinking actively. The analysis offered in this paper may...... serve at least two purposes: to provide a conceptual machinery for research and reflection on CAAD, and to clarify the notion of ‘artificial intelligence’ in the light of architectural design....

Programmable architecture for quantum computing

NARCIS (Netherlands)

Chen, J.; Wang, L.; Charbon, E.; Wang, B.

2013-01-01

A programmable architecture called “quantum FPGA (field-programmable gate array)” (QFPGA) is presented for quantum computing, which is a hybrid model combining the advantages of the qubus system and the measurement-based quantum computation. There are two kinds of buses in QFPGA, the local bus and
Monte Carlo simulations on SIMD computer architectures

International Nuclear Information System (INIS)

Burmester, C.P.; Gronsky, R.; Wille, L.T.

1992-01-01

In this paper algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures.
Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

Science.gov (United States)

Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

2018-03-01

A simulation of breaking waves using the Navier-Stokes equations via the moving particle semi-implicit (MPS) method over a closed domain is presented. The results show that parallel computing on a multicore architecture using OpenMP can reduce the computational time to almost half of the serial time. Two computer architectures (AMD and Intel) are compared. The Intel architecture gives better CPU time than the AMD architecture. In efficiency, however, the computer with the AMD architecture is slightly higher than the Intel. For the simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30, respectively. Moreover, for a similar number of particles, AMD obtains an efficiency of 50.09% and Intel up to 49.42%.
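The speedup and efficiency figures quoted in the abstract above follow the usual definitions, speedup S = T_serial / T_parallel and efficiency E = S / p for p threads. The sketch below applies them; the abstract does not state the thread count or the serial times, so the run times and the four-thread count in the usage example are hypothetical. Only the two CPU times (12662.47 and 28282.30) are taken from the abstract.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_threads):
    """Parallel efficiency E = S / p, as a fraction between 0 and 1."""
    return speedup(t_serial, t_parallel) / n_threads


if __name__ == "__main__":
    # Hypothetical run: the parallel code takes half the serial time on 4 threads.
    print(efficiency(t_serial=100.0, t_parallel=50.0, n_threads=4))

    # Ratio of the two CPU times reported in the abstract (AMD / Intel).
    print(round(28282.30 / 12662.47, 2))
```

A run that halves the serial time on four threads would give an efficiency of 0.5, which is in the range the abstract reports, although the actual thread count used in the study is not given here.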
Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

Science.gov (United States)

Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros

2014-06-25

The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within modern Graphics Processing Units (GPUs) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Their utility resulted in an order of magnitude shorter execution times when compared to the single-threaded CPU implementation. GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.
Computer Architecture A Quantitative Approach

CERN Document Server

Hennessy, John L

2011-01-01

The computing world today is in the middle of a revolution: mobile clients and cloud computing have emerged as the dominant paradigms driving programming and hardware innovation today. The Fifth Edition of Computer Architecture focuses on this dramatic shift, exploring the ways in which software and technology in the cloud are accessed by cell phones, tablets, laptops, and other mobile computing devices. Each chapter includes two real-world examples, one mobile and one datacenter, to illustrate this revolutionary change. Updated to cover the mobile computing revolution. Emphasizes the two most im
The performance of a new Geant4 Bertini intra-nuclear cascade model in high throughput computing (HTC) cluster architecture

Energy Technology Data Exchange (ETDEWEB)

Aatos, Heikkinen; Andi, Hektor; Veikko, Karimaki; Tomas, Linden [Helsinki Univ., Institute of Physics (Finland)

2003-07-01

We study the performance of a new Bertini intra-nuclear cascade model implemented in the general detector simulation tool-kit Geant4 with a High Throughput Computing (HTC) cluster architecture. A 60 node Pentium III open-Mosix cluster is used with the Mosix kernel performing automatic process load-balancing across several CPUs. The Mosix cluster consists of several computer classes equipped with Windows NT workstations that automatically boot daily and become nodes of the Mosix cluster. The models included in our study are a Bertini intra-nuclear cascade model with excitons, consisting of a pre-equilibrium model, a nucleus explosion model, a fission model and an evaporation model. The speed and accuracy obtained for these models are presented. (authors)
Using EDUCache Simulator for the Computer Architecture and Organization Course

Directory of Open Access Journals (Sweden)

Sasko Ristov

2013-07-01

The computer architecture and organization course is essential in all computer science and engineering programs, and the most selected and liked elective course for related engineering disciplines. However, the attractiveness brings a new challenge: it requires a lot of effort by the instructor to explain rather complicated concepts to beginners or to those who study related disciplines. The usage of visual simulators can improve both the teaching and learning processes. The overall goal is twofold: 1) to enable a visual environment to explain the basic concepts and 2) to increase the student's willingness and ability to learn the material. A lot of visual simulators have been used for the computer architecture and organization course. However, due to the lack of visual simulators for simulation of the cache memory concepts, we have developed a new visual simulator, the EDUCache simulator. In this paper we show that it can be effectively and efficiently used as a supporting tool in the learning process of modern multi-layer, multi-cache and multi-core multi-processors. EDUCache's features enable an environment for performance evaluation and engineering of software systems, i.e. the students will also understand the importance of computer architecture building parts and, hopefully, will increase their curiosity for hardware courses in general.
An energy efficient and high speed architecture for convolution computing based on binary resistive random access memory

Science.gov (United States)

Liu, Chen; Han, Runze; Zhou, Zheng; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan; Kang, Jinfeng

2018-04-01

In this work we present a novel convolution computing architecture based on metal oxide resistive random access memory (RRAM) to process the image data stored in the RRAM arrays. The proposed image storage architecture achieves better speed-device consumption efficiency than the previous kernel storage architecture. We further improve the architecture for high-accuracy, low-power computing by using binary storage and a series resistor. For a 28 × 28 image and 10 kernels with a size of 3 × 3, compared with the previous kernel storage approach, the newly proposed architecture shows excellent performance, including: 1) almost 100% accuracy within 20% LRS variation and 90% HRS variation; 2) more than 67 times speed boost; 3) 71.4% energy saving.
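To give a sense of the workload size in the example quoted above (a 28 × 28 image and 10 kernels of size 3 × 3), the sketch below counts output positions and multiply-accumulate (MAC) operations for a plain digital convolution. The stride of 1 and the absence of padding are assumptions; the abstract does not state them, and the count says nothing about how the RRAM arrays actually perform the operation.

```python
def conv_output_size(image, kernel, stride=1, padding=0):
    """Side length of the output feature map for a square image and kernel."""
    return (image + 2 * padding - kernel) // stride + 1


def conv_mac_count(image, kernel, n_kernels, stride=1, padding=0):
    """Total multiply-accumulate operations over all kernels."""
    out = conv_output_size(image, kernel, stride, padding)
    return out * out * kernel * kernel * n_kernels


if __name__ == "__main__":
    # Sizes from the abstract; stride and padding are assumed defaults.
    print(conv_output_size(28, 3))      # output side length
    print(conv_mac_count(28, 3, 10))    # total MACs for 10 kernels
```

Under these assumptions each kernel produces a 26 × 26 output map, and the ten kernels together require 60,840 multiply-accumulate operations.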
Brain architecture: A design for natural computation

OpenAIRE

Kaiser, Marcus

2008-01-01

Fifty years ago, John von Neumann compared the architecture of the brain with that of computers that he invented and which is still in use today. In those days, the organisation of computers was based on concepts of brain organisation. Here, we give an update on current results on the global organisation of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing, and ...
Thrifty: An Exascale Architecture for Energy Proportional Computing

Energy Technology Data Exchange (ETDEWEB)

Torrellas, Josep [Univ. of Illinois, Champaign, IL (United States)

2014-12-23

The objective of this project is to design different aspects of a novel exascale architecture called Thrifty. Our goal is to focus on the challenges of power/energy efficiency, performance, and resiliency in exascale systems. The project includes work on computer architecture (Josep Torrellas from University of Illinois), compilation (Daniel Quinlan from Lawrence Livermore National Laboratory), runtime and applications (Laura Carrington from University of California San Diego), and circuits (Wilfred Pinfold from Intel Corporation). In this report, we focus on the progress at the University of Illinois during the last year of the grant (September 1, 2013 to August 31, 2014). We also point to the progress in the other collaborating institutions when needed.
Memristor-Based Synapse Design and Training Scheme for Neuromorphic Computing Architecture

Science.gov (United States)

2012-06-01

... system level built upon the conventional Von Neumann computer architecture [2][3]. Developing the neuromorphic architecture at chip level by ... [report form fields: contract number FA8750-11-2-0046; grant number N/A; program element number 62788F] ... creation of memristor-based neuromorphic computing architecture. Rather than the existing crossbar-based neuron network designs, we focus on memristor ...
Quantum computation architecture using optical tweezers

DEFF Research Database (Denmark)

Weitenberg, Christof; Kuhr, Stefan; Mølmer, Klaus

2011-01-01

We present a complete architecture for scalable quantum computation with ultracold atoms in optical lattices using optical tweezers focused to the size of a lattice spacing. We discuss three different two-qubit gates based on local collisional interactions. The gates between arbitrary qubits...... quantum computing....
A Multi-Time Scale Morphable Software Milieu for Polymorphous Computing Architectures (PCA) - Composable, Scalable Systems

National Research Council Canada - National Science Library

Skjellum, Anthony

2004-01-01

Polymorphous Computing Architectures (PCA) rapidly "morph" (reorganize) software and hardware configurations in order to achieve high performance on computation styles ranging from specialized streaming to general threaded applications...
Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

International Nuclear Information System (INIS)

Collette, Thierry

1992-01-01

Speeding up image processing is mainly obtained using parallel computers; SIMD processors (single instruction stream, multiple data stream) have been developed, and have proven highly efficient for low-level image processing operations. Nevertheless, their performance drops for most intermediate or high level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend the SIMD computer capabilities to allow it to perform more efficiently at the intermediate level of image processing. The study of some representative algorithms of this class points out the limits of this computer. Nevertheless, these limits can be removed by architectural modifications. This leads us to propose SYMPATIX, a new SIMD parallel computer. To validate its new concept, a behavioural model written in VHDL (a hardware description language) has been elaborated. With this model, the performance of the new computer has been estimated by running image processing algorithm simulations. The VHDL modeling approach makes it possible to carry out the top-down electronic design of the system, giving an easy coupling between system architectural modifications and their electronic cost. The obtained results show SYMPATIX to be an efficient computer for low and intermediate level image processing. It can be connected to a high level computer, opening up the development of new computer vision applications. This thesis also presents a top-down design method, based on VHDL, intended for electronic system architects. (author)
High Performance Systolic Array Core Architecture Design for DNA Sequencer

Directory of Open Access Journals (Sweden)

Saiful Nurdin Dayana

2018-01-01

This paper presents a high performance systolic array (SA) core architecture design for a Deoxyribonucleic Acid (DNA) sequencer. The core implements the affine gap penalty score Smith-Waterman (SW) algorithm. This time-consuming local alignment algorithm guarantees optimal alignment between DNA sequences, but it requires quadratic computation time when performed on standard desktop computers. The use of a linear SA decreases the time complexity from quadratic to linear. In addition, with the exponential growth of DNA databases, the SA architecture is used to overcome the timing issue. In this work, the SW algorithm has been captured using Verilog Hardware Description Language (HDL) and simulated using the Xilinx ISIM simulator. The proposed design has been implemented in a Xilinx Virtex-6 Field Programmable Gate Array (FPGA) and achieves a 90% reduction in core area.
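For readers unfamiliar with the algorithm named above, here is a minimal software sketch of Smith-Waterman local alignment scoring with an affine gap penalty, using the standard Gotoh recurrences. It is not the authors' Verilog design; the scoring values are illustrative assumptions, and a gap of length k is charged `gap_open + (k - 1) * gap_extend`.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1,
                          gap_open=-3, gap_extend=-1):
    """Return the best local alignment score of strings a and b."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    # H: best score ending at (i, j); E/F: best score ending in a gap.
    H = [[0] * cols for _ in range(rows)]
    E = [[neg_inf] * cols for _ in range(rows)]  # gap consuming chars of b
    F = [[neg_inf] * cols for _ in range(rows)]  # gap consuming chars of a
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # The floor of 0 is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + score, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

The nested loops visit every cell of the score matrix, which is the quadratic cost mentioned in the abstract. Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal are independent; that property is what lets a linear array of processing elements compute them concurrently.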
An Overview of the Most Important Reference Architectures for Cloud Computing

Directory of Open Access Journals (Sweden)

Razvan Daniel ZOTA

2014-01-01

In this paper we present the main characteristics of the most important reference architectures designed for the cloud computing environment. Specifically, we introduce the architectures proposed by worldwide cloud computing companies such as Cisco, IBM and VMware, and we also look at the National Institute of Standards and Technology (NIST) reference architecture, which is the starting point for all proposed architectures in the field. As one would expect, the provider-dependent reference architectures are written in such a way as to suit the services and products of the company, while NIST’s architecture is a more general model with more comprehensive architectural details, which we highlight in this article. At the end of the article we draw some conclusions regarding the existing reference architectures for cloud computing.
Compact, open-architecture computed radiography system

International Nuclear Information System (INIS)

Huang, H.K.; Lim, A.; Kangarloo, H.; Eldredge, S.; Loloyan, M.; Chuang, K.S.

1990-01-01

Computed radiography (CR) was introduced in 1982, and its basic system design has not changed. Current CR systems have certain limitations: spatial resolution and signal-to-noise ratios are lower than those of screen-film systems, they are complicated and expensive to build, and they have a closed architecture. The authors of this paper designed and implemented a simpler, lower-cost, compact, open-architecture CR system to overcome some of these limitations. The open-architecture system is a manual-load, single-plate reader that can fit on a desktop. Phosphor images are stored on a local disk and can be sent to any other computer through standard interfaces. Any manufacturer's plate can be read with a scanning time of 90 seconds for a 35 x 43-cm plate. The standard pixel size is 174 μm and can be adjusted for higher spatial resolution. The data resolution is 12 bits/pixel over an x-ray exposure range of 0.01-100 mR
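The figures quoted above imply an image size that can be worked out directly. The sketch below does the arithmetic, assuming square pixels, tightly packed 12-bit samples, and no header or compression; none of these storage details are given in the abstract.

```python
def cr_image_size(width_cm, height_cm, pixel_um, bits_per_pixel):
    """Return (columns, rows, bytes) for one scanned plate."""
    um_per_cm = 10_000
    cols = int(width_cm * um_per_cm // pixel_um)   # whole pixels across
    rows = int(height_cm * um_per_cm // pixel_um)  # whole pixels down
    total_bits = cols * rows * bits_per_pixel
    return cols, rows, total_bits / 8


cols, rows, size_bytes = cr_image_size(35, 43, 174, 12)
print(cols, rows, round(size_bytes / 1e6, 2), "MB")
```

For a 35 x 43-cm plate at 174 μm this gives 2011 x 2471 pixels, roughly 7.45 MB per image when packed at 12 bits/pixel. A system that stores each sample in 16 bits would need about a third more space.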
Switching from computer to microcomputer architecture education

Science.gov (United States)

Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore

2010-03-01

In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to microcomputer architecture. The authors present their strategies towards a successful crossing of boundaries between engineering disciplines. This communication aims at providing a different aspect on professional courses that are, nowadays, addressed at the expense of traditional courses.
Geometric Computing for Freeform Architecture

KAUST Repository

Wallner, J.; Pottmann, Helmut

2011-01-01

Geometric computing has recently found a new field of applications, namely the various geometric problems which lie at the heart of rationalization and construction-aware design processes of freeform architecture. We report on our work in this area
Teaching Computer Organization and Architecture Using Simulation and FPGA Applications

OpenAIRE

D. K.M. Al-Aubidy

2007-01-01

This paper presents the design concepts and realization of incorporating micro-operation simulation and FPGA implementation into a teaching tool for computer organization and architecture. This teaching tool helps computer engineering and computer science students to be familiarized practically with computer organization and architecture through the development of their own instruction set, computer programming and interfacing experiments. A two-pass assembler has been designed and implemente...

Scalable quantum computer architecture with coupled donor-quantum dot qubits

Science.gov (United States)

Schenkel, Thomas; Lo, Cheuk Chi; Weis, Christoph; Lyon, Stephen; Tyryshkin, Alexei; Bokor, Jeffrey

2014-08-26

A quantum bit computing architecture includes a plurality of single spin memory donor atoms embedded in a semiconductor layer, a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, wherein a first voltage applied across at least one pair of the aligned quantum dot and donor atom controls a donor-quantum dot coupling. A method of performing quantum computing in a scalable architecture quantum computing apparatus includes arranging a pattern of single spin memory donor atoms in a semiconductor layer, forming a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, applying a first voltage across at least one aligned pair of a quantum dot and donor atom to control a donor-quantum dot coupling, and applying a second voltage between one or more quantum dots to control a Heisenberg exchange J coupling between quantum dots and to cause transport of a single spin polarized electron between quantum dots.
Developing Materials Processing to Performance Modeling Capabilities and the Need for Exascale Computing Architectures (and Beyond)

Energy Technology Data Exchange (ETDEWEB)

Schraad, Mark William [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Physics and Engineering Models; Luscher, Darby Jon [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Advanced Simulation and Computing

2016-09-06

Additive Manufacturing techniques are presenting the Department of Energy and the NNSA Laboratories with new opportunities to consider novel component production and repair processes, and to manufacture materials with tailored response and optimized performance characteristics. Additive Manufacturing technologies already are being applied to primary NNSA mission areas, including Nuclear Weapons. These mission areas are adapting to these new manufacturing methods, because of potential advantages, such as smaller manufacturing footprints, reduced needs for specialized tooling, an ability to embed sensing, novel part repair options, an ability to accommodate complex geometries, and lighter weight materials. To realize the full potential of Additive Manufacturing as a game-changing technology for the NNSA’s national security missions, however, significant progress must be made in several key technical areas. In addition to advances in engineering design, process optimization and automation, and accelerated feedstock design and manufacture, significant progress must be made in modeling and simulation. First and foremost, a more mature understanding of the process-structure-property-performance relationships must be developed. Because Additive Manufacturing processes change the nature of a material’s structure below the engineering scale, new models are required to predict materials response across the spectrum of relevant length scales, from the atomistic to the continuum. New diagnostics will be required to characterize materials response across these scales. And not just models, but advanced algorithms, next-generation codes, and advanced computer architectures will be required to complement the associated modeling activities. Based on preliminary work in each of these areas, a strong argument for the need for Exascale computing architectures can be made, if a legitimate predictive capability is to be developed.
Biomorphic Multi-Agent Architecture for Persistent Computing

Science.gov (United States)

Lodding, Kenneth N.; Brewster, Paul

2009-01-01

A multi-agent software/hardware architecture, inspired by the multicellular nature of living organisms, has been proposed as the basis of design of a robust, reliable, persistent computing system. Just as a multicellular organism can adapt to changing environmental conditions and can survive despite the failure of individual cells, a multi-agent computing system, as envisioned, could adapt to changing hardware, software, and environmental conditions. In particular, the computing system could continue to function (perhaps at a reduced but still reasonable level of performance) if one or more component(s) of the system were to fail. One of the defining characteristics of a multicellular organism is unity of purpose. In biology, the purpose is survival of the organism. The purpose of the proposed multi-agent architecture is to provide a persistent computing environment in harsh conditions in which repair is difficult or impossible. A multi-agent, organism-like computing system would be a single entity built from agents or cells. Each agent or cell would be a discrete hardware processing unit that would include a data processor with local memory, an internal clock, and a suite of communication equipment capable of both local line-of-sight communications and global broadcast communications. Some cells, denoted specialist cells, could contain such additional hardware as sensors and emitters. Each cell would be independent in the sense that there would be no global clock, no global (shared) memory, no pre-assigned cell identifiers, no pre-defined network topology, and no centralized brain or control structure. Like each cell in a living organism, each agent or cell of the computing system would contain a full description of the system encoded as genes, but in this case, the genes would be components of a software genome.
Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

Science.gov (United States)

Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

2017-12-01

As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.
VLSI Architectures for Computing DFT's

Science.gov (United States)

Truong, T. K.; Chang, J. J.; Hsu, I. S.; Reed, I. S.; Pei, D. Y.

1986-01-01

Simplifications result from use of residue Fermat number systems. System of finite arithmetic over residue Fermat number systems enables calculation of discrete Fourier transform (DFT) of series of complex numbers with reduced number of multiplications. Computer architectures based on approach suitable for design of very-large-scale integrated (VLSI) circuits for computing DFT's. General approach not limited to DFT's; applicable to decoding of error-correcting codes and other transform calculations. System readily implemented in VLSI.
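To illustrate why arithmetic modulo a Fermat number can reduce multiplications, the sketch below implements a small Fermat number transform modulo F_4 = 2^16 + 1 and uses it for exact cyclic convolution of integer sequences. It is a simplified illustration, not the residue system for complex data described in the brief; the function names and the restriction to lengths dividing 32 are assumptions of this sketch.

```python
P = 2**16 + 1  # Fermat prime F_4 = 65537


def fnt(x, root):
    """Direct O(n^2) number-theoretic transform of x modulo P."""
    n = len(x)
    return [sum(x[k] * pow(root, j * k, P) for k in range(n)) % P
            for j in range(n)]


def inverse_fnt(X, root):
    """Inverse transform: use the inverse root and scale by 1/n mod P."""
    n = len(X)
    n_inv = pow(n, -1, P)        # modular inverse (Python 3.8+)
    root_inv = pow(root, -1, P)
    return [(n_inv * v) % P for v in fnt(X, root_inv)]


def cyclic_convolution(a, b):
    """Exact cyclic convolution, valid while every result is below P."""
    n = len(a)
    if len(b) != n or 32 % n != 0:
        raise ValueError("lengths must match and divide 32")
    # 2 has multiplicative order 32 modulo P, so this root has order n
    # and every twiddle factor is a power of two.
    root = pow(2, 32 // n, P)
    A, B = fnt(a, root), fnt(b, root)
    return inverse_fnt([(u * v) % P for u, v in zip(A, B)], root)
```

Because the root is a power of two, each multiplication by a twiddle factor can be realised in hardware as a bit shift followed by reduction modulo 2^16 + 1, leaving general multiplications only in the pointwise product step. The sketch uses Python's `pow` for clarity rather than modelling the shifts.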
Toward a Fault Tolerant Architecture for Vital Medical-Based Wearable Computing.

Science.gov (United States)

Abdali-Mohammadi, Fardin; Bajalan, Vahid; Fathi, Abdolhossein

2015-12-01

Advancements in computers and electronic technologies have led to the emergence of a new generation of efficient small intelligent systems. The products of such technologies include smartphones and wearable devices, which have attracted attention for medical applications. These products are used less in critical medical applications because of their resource constraints and failure sensitivity: without safety considerations, small integrated hardware will endanger patients' lives. Therefore, principles are needed for constructing wearable systems in healthcare so that the existing concerns are dealt with. Accordingly, this paper proposes an architecture for constructing wearable systems in critical medical applications. The proposed architecture is a three-tier one, supporting data flow from body sensors to the cloud. The tiers of this architecture are wearable computers, mobile computing, and mobile cloud computing. One of the features of this architecture is its potentially high fault tolerance due to the nature of its components. Moreover, the required protocols are presented to coordinate the components of this architecture. Finally, the reliability of this architecture is assessed by simulating the architecture and its components, and other aspects of the proposed architecture are discussed.
Peer-to-peer architectures for exascale computing : LDRD final report.

Energy Technology Data Exchange (ETDEWEB)

Vorobeychik, Yevgeniy; Mayo, Jackson R.; Minnich, Ronald G.; Armstrong, Robert C.; Rudish, Donald W.

2010-09-01

The goal of this research was to investigate the potential for employing dynamic, decentralized software architectures to achieve reliability in future high-performance computing platforms. These architectures, inspired by peer-to-peer networks such as botnets that already scale to millions of unreliable nodes, hold promise for enabling scientific applications to run usefully on next-generation exascale platforms (approximately 10^18 operations per second). Traditional parallel programming techniques suffer rapid deterioration of performance scaling with growing platform size, as the work of coping with increasingly frequent failures dominates over useful computation. Our studies suggest that new architectures, in which failures are treated as ubiquitous and their effects are considered as simply another controllable source of error in a scientific computation, can remove such obstacles to exascale computing for certain applications. We have developed a simulation framework, as well as a preliminary implementation in a large-scale emulation environment, for exploration of these 'fault-oblivious computing' approaches. High-performance computing (HPC) faces a fundamental problem of increasing total component failure rates due to increasing system sizes, which threaten to degrade system reliability to an unusable level by the time the exascale range is reached (approximately 10^18 operations per second, requiring of order millions of processors). As computer scientists seek a way to scale system software for next-generation exascale machines, it is worth considering peer-to-peer (P2P) architectures that are already capable of supporting 10^6-10^7 unreliable nodes. Exascale platforms will require a different way of looking at systems and software because the machine will likely not be available in its entirety for a meaningful execution time. Realistic estimates of failure rates range from a few times per day to more than once per hour for these
Biomimetic design processes in architecture: morphogenetic and evolutionary computational design

International Nuclear Information System (INIS)

Menges, Achim

2012-01-01

Design computation has profound impact on architectural design methods. This paper explains how computational design enables the development of biomimetic design processes specific to architecture, and how they need to be significantly different from established biomimetic processes in engineering disciplines. The paper first explains the fundamental difference between computer-aided and computational design in architecture, as the understanding of this distinction is of critical importance for the research presented. Thereafter, the conceptual relation and possible transfer of principles from natural morphogenesis to design computation are introduced and the related developments of generative, feature-based, constraint-based, process-based and feedback-based computational design methods are presented. This morphogenetic design research is then related to exploratory evolutionary computation, followed by the presentation of two case studies focusing on the exemplary development of spatial envelope morphologies and urban block morphologies. (paper)
Roadmap to the SRS computing architecture

Energy Technology Data Exchange (ETDEWEB)

Johnson, A.

1994-07-05

This document outlines the major steps that must be taken by the Savannah River Site (SRS) to migrate the SRS information technology (IT) environment to the new architecture described in the Savannah River Site Computing Architecture. This document proposes an IT environment that is "...standards-based, data-driven, and workstation-oriented, with larger systems being utilized for the delivery of needed information to users in a client-server relationship." Achieving this vision will require many substantial changes in the computing applications, systems, and supporting infrastructure at the site. This document consists of a set of roadmaps which provide explanations of the necessary changes for IT at the site and describes the milestones that must be completed to finish the migration.
High Performance Computing in Science and Engineering '02 : Transactions of the High Performance Computing Center

CERN Document Server

Jäger, Willi

2003-01-01

This book presents the state of the art in modeling and simulation on supercomputers. Leading German research groups present their results achieved on high-end systems of the High Performance Computing Center Stuttgart (HLRS) for the year 2002. Reports cover all fields of supercomputing simulation ranging from computational fluid dynamics to computer science. Special emphasis is given to industrially relevant applications. Moreover, by presenting results for both vector systems and micro-processor based systems, the book allows comparison of the performance levels and usability of a variety of supercomputer architectures. It therefore becomes an indispensable guidebook to assess the impact of the Japanese Earth Simulator project on supercomputing in the years to come.
High-performance full adder architecture in quantum-dot cellular automata

Directory of Open Access Journals (Sweden)

Hamid Rashidi

2017-06-01

Quantum-dot cellular automata (QCA) is a new and promising computation paradigm, which can be a viable replacement for the complementary metal–oxide–semiconductor technology at nano-scale level. This technology provides a possible solution for improving the computation in various computational applications. Two QCA full adder architectures are presented and evaluated: a new and efficient 1-bit QCA full adder architecture and a 4-bit QCA ripple carry adder (RCA) architecture. The proposed architectures are simulated using QCADesigner tool version 2.0.1. These architectures are implemented with the coplanar crossover approach. The simulation results show that the proposed 1-bit QCA full adder and 4-bit QCA RCA architectures utilise 33 and 175 QCA cells, respectively. Our simulation results show that the proposed architectures outperform most results so far in the literature.
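QCA logic is usually built from three-input majority gates and inverters, so a full adder can be expressed in those two primitives. The sketch below uses a well-known three-majority-gate formulation and chains four such adders into a ripple carry adder. It is a logical model only; the abstract does not give the authors' gate-level design or cell layout, so this should not be read as their circuit.

```python
def majority(a, b, c):
    """Three-input majority gate, the basic QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def invert(a):
    """Single-bit inverter."""
    return a ^ 1


def full_adder(a, b, cin):
    """1-bit full adder from three majority gates and two inverters."""
    cout = majority(a, b, cin)
    total = majority(invert(cout), cin, majority(a, b, invert(cin)))
    return total, cout


def ripple_carry_add(x, y, width=4):
    """Add two unsigned integers bit by bit; return (sum, carry_out)."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry
```

In the ripple carry arrangement each stage waits for the carry from the stage below it, so the delay grows with the word width; the carry path is therefore the part of such a design that benefits most from a compact layout.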
Neuromorphic Computing, Architectures, Models, and Applications. A Beyond-CMOS Approach to Future Computing, June 29-July 1, 2016, Oak Ridge, TN

Energy Technology Data Exchange (ETDEWEB)

Potok, Thomas [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Schuman, Catherine [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Patton, Robert [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Hylton, Todd [Brain Corporation, San Diego, CA (United States); Li, Hai [Univ. of Pittsburgh, PA (United States); Pino, Robinson [US Dept. of Energy, Washington, DC (United States)

2016-12-31

The White House and Department of Energy have been instrumental in driving the development of a neuromorphic computing program to help the United States continue its lead in basic research into (1) Beyond Exascale—high performance computing beyond Moore’s Law and von Neumann architectures, (2) Scientific Discovery—new paradigms for understanding increasingly large and complex scientific data, and (3) Emerging Architectures—assessing the potential of neuromorphic and quantum architectures. Neuromorphic computing spans a broad range of scientific disciplines from materials science to devices, to computer science, to neuroscience, all of which are required to solve the neuromorphic computing grand challenge. In our workshop we focus on the computer science aspects, specifically from a neuromorphic device through an application. Neuromorphic devices present a very different paradigm to the computer science community from traditional von Neumann architectures, which raises six major questions about building a neuromorphic application from the device level. We used these fundamental questions to organize the workshop program and to direct the workshop panels and discussions. From the white papers, presentations, panels, and discussions, there emerged several recommendations on how to proceed.
Utilizing a multiprocessor architecture - The performance of MIDAS

International Nuclear Information System (INIS)

Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

1983-01-01

The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements
Performative Computation-aided Design Optimization

Directory of Open Access Journals (Sweden)

Ming Tang

2012-12-01

This article discusses a collaborative research and teaching project between the University of Cincinnati, Perkins+Will’s Tech Lab, and the University of North Carolina Greensboro. The primary investigation focuses on the simulation, optimization, and generation of architectural designs using performance-based computational design approaches. The projects examine various design methods, including relationships between building form, performance and the use of proprietary software tools for parametric design.
A heterogeneous hierarchical architecture for real-time computing

Energy Technology Data Exchange (ETDEWEB)

Skroch, D.A.; Fornaro, R.J.

1988-12-01

The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchical Architecture for Real-Time (H^2 ART) and system software for program loading and interprocessor communication.
SCinet Architecture: Featured at the International Conference for High Performance Computing, Networking, Storage and Analysis 2016

Energy Technology Data Exchange (ETDEWEB)

Lyonnais, Marc; Smith, Matt; Mace, Kate P.

2017-02-06

SCinet is the purpose-built network that operates during the International Conference for High Performance Computing, Networking, Storage and Analysis (Super Computing or SC). Created each year for the conference, SCinet brings to life a high-capacity network that supports applications and experiments that are a hallmark of the SC conference. The network links the convention center to research and commercial networks around the world. This resource serves as a platform for exhibitors to demonstrate the advanced computing resources of their home institutions and elsewhere by supporting a wide variety of applications. Volunteers from academia, government and industry work together to design and deliver the SCinet infrastructure. Industry vendors and carriers donate millions of dollars in equipment and services needed to build and support the local and wide area networks. Planning begins more than a year in advance of each SC conference and culminates in a high intensity installation in the days leading up to the conference. The SCinet architecture for SC16 illustrates a dramatic increase in participation from the vendor community, particularly those that focus on network equipment. Software-Defined Networking (SDN) and Data Center Networking (DCN) are present in nearly all aspects of the design.
Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

Science.gov (United States)

Carroll, Chester C.; Youngblood, John N.; Saha, Aindam

1987-01-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel systems to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
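Critical path analysis, on which the allocation above is based, finds the longest chain of dependent tasks in a task graph; that chain bounds the execution time no matter how many processing elements are available. The following is a minimal sketch of that analysis only, not the report's allocation algorithm. The task names and durations in the usage example are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(durations, deps):
    """Return (length, path) of the longest dependency chain.

    durations: task -> execution time
    deps: task -> set of tasks that must finish first
    """
    graph = {task: set(deps.get(task, ())) for task in durations}
    finish = {}       # earliest finish time of each task
    previous = {}     # predecessor that determines each task's start
    for task in TopologicalSorter(graph).static_order():
        start = 0
        previous[task] = None
        for pred in graph[task]:
            if finish[pred] > start:
                start = finish[pred]
                previous[task] = pred
        finish[task] = start + durations[task]
    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = previous[node]
    return length, path[::-1]


# Hypothetical guidance task graph, for illustration only.
durations = {"sense": 2, "filter": 3, "estimate": 4, "guide": 1}
deps = {"filter": {"sense"}, "estimate": {"sense"},
        "guide": {"filter", "estimate"}}
print(critical_path(durations, deps))
```

Tasks on the critical path have no slack, so a scheduler would typically give them priority when assigning work to processing elements, while tasks off the path can be placed wherever a processor is free.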
A learnable parallel processing architecture towards unity of memory and computing.

Science.gov (United States)

Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

2015-08-14

Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named "iMemComp", where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
A learnable parallel processing architecture towards unity of memory and computing

Science.gov (United States)

Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

2015-08-01

Developing energy-efficient parallel information processing systems beyond von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. In order to meet the need of efficient information processing for the data-driven applications such as big data and Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices named “iMemComp”, where memory and logic are unified with single-type devices. Leveraging nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with capabilities of computing in parallel and learning user-defined logic functions for large-scale information processing tasks. Such architecture eliminates the energy-hungry data movement in von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve the speed by 76.8% and the power dissipation by 60.3%, together with a 700 times aggressive reduction in the circuit area.
Enabling high performance computational science through combinatorial algorithms

International Nuclear Information System (INIS)

Boman, Erik G; Bozdag, Doruk; Catalyurek, Umit V; Devine, Karen D; Gebremedhin, Assefaw H; Hovland, Paul D; Pothen, Alex; Strout, Michelle Mills

2007-01-01

The Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute is developing algorithms and software for combinatorial problems that play an enabling role in scientific and engineering computations. Discrete algorithms will be increasingly critical for achieving high performance for irregular problems on petascale architectures. This paper describes recent contributions by researchers at the CSCAPES Institute in the areas of load balancing, parallel graph coloring, performance improvement, and parallel automatic differentiation

Enabling high performance computational science through combinatorial algorithms

Energy Technology Data Exchange (ETDEWEB)

Boman, Erik G [Discrete Algorithms and Math Department, Sandia National Laboratories (United States)]; Bozdag, Doruk [Biomedical Informatics, and Electrical and Computer Engineering, Ohio State University (United States)]; Catalyurek, Umit V [Biomedical Informatics, and Electrical and Computer Engineering, Ohio State University (United States)]; Devine, Karen D [Discrete Algorithms and Math Department, Sandia National Laboratories (United States)]; Gebremedhin, Assefaw H [Computer Science and Center for Computational Science, Old Dominion University (United States)]; Hovland, Paul D [Mathematics and Computer Science Division, Argonne National Laboratory (United States)]; Pothen, Alex [Computer Science and Center for Computational Science, Old Dominion University (United States)]; Strout, Michelle Mills [Computer Science, Colorado State University (United States)]

2007-07-15

The Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute is developing algorithms and software for combinatorial problems that play an enabling role in scientific and engineering computations. Discrete algorithms will be increasingly critical for achieving high performance for irregular problems on petascale architectures. This paper describes recent contributions by researchers at the CSCAPES Institute in the areas of load balancing, parallel graph coloring, performance improvement, and parallel automatic differentiation.
Supporting Undergraduate Computer Architecture Students Using a Visual MIPS64 CPU Simulator

Science.gov (United States)

Patti, D.; Spadaccini, A.; Palesi, M.; Fazzino, F.; Catania, V.

2012-01-01

The topics of computer architecture are always taught using an Assembly dialect as an example. The most commonly used textbooks in this field use the MIPS64 Instruction Set Architecture (ISA) to help students in learning the fundamentals of computer architecture because of its orthogonality and its suitability for real-world applications. This…
A performance analysis of advanced I/O architectures for PC-based network file servers

Science.gov (United States)

Huynh, K. D.; Khoshgoftaar, T. M.

1994-12-01

In the personal computing and workstation environments, more and more I/O adapters are becoming complete functional subsystems that are intelligent enough to handle I/O operations on their own without much intervention from the host processor. The IBM Subsystem Control Block (SCB) architecture has been defined to enhance the potential of these intelligent adapters by defining services and conventions that deliver command information and data to and from the adapters. In recent years, a new storage architecture, the Redundant Array of Independent Disks (RAID), has been quickly gaining acceptance in the world of computing. In this paper, we would like to discuss critical system design issues that are important to the performance of a network file server. We then present a performance analysis of the SCB architecture and disk array technology in typical network file server environments based on personal computers (PCs). One of the key issues investigated in this paper is whether a disk array can outperform a group of disks (of same type, same data capacity, and same cost) operating independently, not in parallel as in a disk array.
Real-time FPGA architectures for computer vision

Science.gov (United States)

Arias-Estrada, Miguel; Torres-Huitzil, Cesar

2000-03-01

This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low-level image processing. The FPGA-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to the image memory and is based on parallel modules with internal pipeline operation in order to improve its performance. The architecture is prototyped in an FPGA, but it can be implemented on a dedicated VLSI to reach higher clock frequencies. Complexity issues, FPGA resource utilization, FPGA limitations, and real-time performance are discussed. Some results are presented and discussed.
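For readers unfamiliar with the operation, the convolution of a mask and an image can be sketched in software as a sliding window of multiply-accumulate steps. The code below is only a plain reference model of that arithmetic, assuming a "valid" output region and integer pixels; it does not model the registers, parallel modules, or pipelining of the FPGA design, and the function name `convolve2d` is hypothetical.

```python
def convolve2d(image, mask):
    """2D convolution over the 'valid' region (mask flipped in both axes)."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(mask), len(mask[0])
    out = []
    for r in range(rows - k_rows + 1):
        out_row = []
        for c in range(cols - k_cols + 1):
            acc = 0
            # Multiply-accumulate over the window anchored at (r, c).
            for i in range(k_rows):
                for j in range(k_cols):
                    acc += image[r + i][c + j] * mask[k_rows - 1 - i][k_cols - 1 - j]
            out_row.append(acc)
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
mask = [[1, 0],
        [0, -1]]
result = convolve2d(image, mask)  # diagonal difference of each 2x2 window
```

Every pixel is read once per overlapping window in this naive form, which is the memory-access cost that a hardware design with local registers would try to reduce.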
Layered Architectures for Quantum Computers and Quantum Repeaters

Science.gov (United States)

Jones, Nathan C.

This chapter examines how to organize quantum computers and repeaters using a systematic framework known as layered architecture, where machine control is organized in layers associated with specialized tasks. The framework is flexible and could be used for analysis and comparison of quantum information systems. To demonstrate the design principles in practice, we develop architectures for quantum computers and quantum repeaters based on optically controlled quantum dots, showing how a myriad of technologies must operate synchronously to achieve fault-tolerance. Optical control makes information processing in this system very fast, scalable to large problem sizes, and extendable to quantum communication.
CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

OpenAIRE

YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

2009-01-01

Graphics processing units (GPUs), originally designed for computer video cards, have emerged as the most powerful chip in a high-performance workstation. In high-performance computation, GPUs deliver much more powerful performance than conventional CPUs by means of parallel processing. In 2007, the birth of Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...
The path toward HEP High Performance Computing

International Nuclear Information System (INIS)

Apostolakis, John; Brun, René; Gheata, Andrei; Wenzel, Sandro; Carminati, Federico

2014-01-01

High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yielded limited results and recent studies have shown that, on modern architectures, it achieves a performance between 10% and 50% of the peak one. Although several successful attempts have been made to port selected codes on GPUs, no major HEP code suite has a 'High Performance' implementation. With LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP cannot any longer neglect the less-than-optimal performance of its code and it has to try making the best usage of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 project. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a high-performance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or with a much larger effort at track level. Apart from the shareable data structures, this typically implies a multiplication factor in terms of memory consumption compared to the single threaded version, together with sub-optimal handling of event processing tails. Besides this, the low level instruction pipelining of modern processors cannot be used efficiently to speed up the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine grain parallel approach. The talk will review the current optimisation activities within the SFT group with a particular emphasis on the development perspectives towards a simulation framework able to profit
Design for scalability in 3D computer graphics architectures

DEFF Research Database (Denmark)

Holten-Lund, Hans Erik

2002-01-01

This thesis describes useful methods and techniques for designing scalable hybrid parallel rendering architectures for 3D computer graphics. Various techniques for utilizing parallelism in a pipelined system are analyzed. During the Ph.D. study a prototype 3D graphics architecture named Hybris has...
MOMCC: Market-Oriented Architecture for Mobile Cloud Computing Based on Service Oriented Architecture

OpenAIRE

Abolfazli, Saeid; Sanaei, Zohreh; Gani, Abdullah; Shiraz, Muhammad

2012-01-01

The vision of augmenting the computing capabilities of mobile devices, especially smartphones, at the least cost is likely becoming reality by leveraging cloud computing. Cloud exploitation by mobile devices breeds a new research domain called Mobile Cloud Computing (MCC). However, issues like portability and interoperability should be addressed for mobile augmentation, which is a non-trivial task using component-based approaches. Service Oriented Architecture (SOA) is a promising design philosop...
How computer science can help in understanding the 3D genome architecture.

Science.gov (United States)

Shavit, Yoli; Merelli, Ivan; Milanesi, Luciano; Lio', Pietro

2016-09-01

Chromosome conformation capture techniques are producing a huge amount of data about the architecture of our genome. These data can provide us with a better understanding of the events that induce critical regulations of the cellular function from small changes in the three-dimensional genome architecture. Generating a unified view of spatial, temporal, genetic and epigenetic properties poses various challenges of data analysis, visualization, integration and mining, as well as of high performance computing and big data management. Here, we describe the critical issues of this new branch of bioinformatics, oriented at the comprehension of the three-dimensional genome architecture, which we call 'Nucleome Bioinformatics', looking beyond the currently available tools and methods, and highlight yet unaddressed challenges and the potential approaches that could be applied for tackling them. Our review provides a map for researchers interested in using computer science for studying 'Nucleome Bioinformatics', to achieve a better understanding of the biological processes that occur inside the nucleus. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Computer aided architectural design : futures 2001

NARCIS (Netherlands)

Vries, de B.; Leeuwen, van J.P.; Achten, H.H.

2001-01-01

CAAD Futures is a bi-annual conference that aims to promote the advancement of computer-aided architectural design in the service of those concerned with the quality of the built environment. The conferences are organized under the auspices of the CAAD Futures Foundation, which has its secretariat
INVESTIGATION OF FLIP-FLOP PERFORMANCE ON DIFFERENT TYPE AND ARCHITECTURE IN SHIFT REGISTER WITH PARALLEL LOAD APPLICATIONS

Directory of Open Access Journals (Sweden)

Dwi Purnomo

2015-08-01

Registers are computer components that have a key role in computer organisation. Every computer contains millions of registers that are implemented with flip-flops. This research focuses on the investigation of flip-flop performance based on type (D, T, S-R, and J-K) and architecture (structural, behavioural, and hybrid). Each type of flip-flop on each architecture is tested in shift registers of different bit widths with parallel load applications. The experiment criteria assessed are power consumption, resources required, memory required, latency, and efficiency. The experiment shows that the D flip-flop and the hybrid architecture gave the best performance in required memory, latency, power consumption, and efficiency. In addition, the experiment results showed that the greater the register number, the less efficient the system would be.
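To make the circuit under test concrete, the following is a minimal behavioural sketch of a shift register with parallel load built from D flip-flops. It is a software model written for illustration only, not the hardware description used in the study; the class names `DFlipFlop` and `ShiftRegister` and the signal names `load`, `parallel_in`, and `serial_in` are assumptions.

```python
class DFlipFlop:
    """Stores one bit; the output q takes the value of d on each clock."""

    def __init__(self):
        self.q = 0

    def clock(self, d):
        self.q = d & 1


class ShiftRegister:
    """Shift register with parallel load, one D flip-flop per bit."""

    def __init__(self, width):
        self.cells = [DFlipFlop() for _ in range(width)]

    def clock(self, load, parallel_in=None, serial_in=0):
        if load:
            if parallel_in is None or len(parallel_in) != len(self.cells):
                raise ValueError("parallel_in must supply one bit per cell")
            next_bits = list(parallel_in)
        else:
            # Each cell takes the previous value of its left-hand neighbour.
            next_bits = [serial_in] + [ff.q for ff in self.cells[:-1]]
        # All next values are computed first, so every flip-flop appears
        # to update on the same clock edge.
        for ff, d in zip(self.cells, next_bits):
            ff.clock(d)
        return [ff.q for ff in self.cells]


reg = ShiftRegister(4)
loaded = reg.clock(load=True, parallel_in=[1, 0, 1, 1])
shifted_once = reg.clock(load=False, serial_in=0)
shifted_twice = reg.clock(load=False, serial_in=0)
```

With `load` high the register captures all four input bits in one clock; with `load` low the stored pattern moves one position per clock and `serial_in` fills the vacated cell.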
Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures

KAUST Repository

Al Farhan, Mohammed Ahmed

2018-04-13

We investigate several state-of-the-practice shared-memory optimization techniques applied to key routines of an unstructured computational aerodynamics application with irregular memory accesses. We illustrate for the Intel KNL processor, as a representative of the processors in contemporary leading supercomputers, identifying and addressing performance challenges without compromising the floating point numerics of the original code. We employ low and high-level architecture-specific code optimizations involving thread and data-level parallelism. Our approach is based upon a multi-level hierarchical distribution of work and data across both the threads and the SIMD units within every hardware core. On a 64-core KNL chip, we achieve nearly 2.9x speedup of the dominant routines relative to the baseline. These exhibit almost linear strong scalability up to 64 threads, and thereafter some improvement with hyperthreading. At substantially fewer Watts, we achieve up to 1.7x speedup relative to the performance of 72 threads of a 36-core Haswell CPU and roughly equivalent performance to 112 threads of a 56-core Skylake scalable processor. These optimizations are expected to be of value for many other unstructured mesh PDE-based scientific applications as multi and many-core architecture evolves.
Universal Quantum Computing with Measurement-Induced Continuous-Variable Gate Sequence in a Loop-Based Architecture.

Science.gov (United States)

Takeda, Shuntaro; Furusawa, Akira

2017-09-22

We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.
On the impact of approximate computation in an analog DeSTIN architecture.

Science.gov (United States)

Young, Steven; Lu, Junjie; Holleman, Jeremy; Arel, Itamar

2014-05-01

Deep machine learning (DML) holds the potential to revolutionize machine learning by automating rich feature extraction, which has become the primary bottleneck of human engineering in pattern recognition systems. However, the heavy computational burden renders DML systems implemented on conventional digital processors impractical for large-scale problems. The highly parallel computations required to implement large-scale deep learning systems are well suited to custom hardware. Analog computation has demonstrated power efficiency advantages of multiple orders of magnitude relative to digital systems while performing nonideal computations. In this paper, we investigate typical error sources introduced by analog computational elements and their impact on system-level performance in DeSTIN--a compositional deep learning architecture. These inaccuracies are evaluated on a pattern classification benchmark, clearly demonstrating the robustness of the underlying algorithm to the errors introduced by analog computational elements. A clear understanding of the impacts of nonideal computations is necessary to fully exploit the efficiency of analog circuits.
The path toward HEP High Performance Computing

CERN Document Server

Apostolakis, John; Carminati, Federico; Gheata, Andrei; Wenzel, Sandro

2014-01-01

High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yielded limited results and recent studies have shown that, on modern architectures, it achieves a performance between 10% and 50% of the peak one. Although several successful attempts have been made to port selected codes on GPUs, no major HEP code suite has a 'High Performance' implementation. With LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP cannot any longer neglect the less-than-optimal performance of its code and it has to try making the best usage of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 project. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on th...
Optimization and mathematical modeling in computer architecture

CERN Document Server

Sankaralingam, Karu; Nowatzki, Tony

2013-01-01

In this book we give an overview of modeling techniques used to describe computer systems to mathematical optimization tools. We give a brief introduction to various classes of mathematical optimization frameworks with special focus on mixed integer linear programming which provides a good balance between solver time and expressiveness. We present four detailed case studies -- instruction set customization, data center resource management, spatial architecture scheduling, and resource allocation in tiled architectures -- showing how MILP can be used and quantifying by how much it outperforms t
ATCA for Machines-- Advanced Telecommunications Computing Architecture

Energy Technology Data Exchange (ETDEWEB)

Larsen, R.S.; /SLAC

2008-04-22

The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R&D including application of HA principles to power electronics systems.
Blackboard architecture and qualitative model in a computer aided assistant designed to define computers for HEP computing

International Nuclear Information System (INIS)

Nodarse, F.F.; Ivanov, V.G.

1991-01-01

Using BLACKBOARD architecture and a qualitative model, an expert system was developed to assist the user in defining the computers method for High Energy Physics computing. The COMEX system requires an IBM AT personal computer or compatible with more than 640 Kb RAM and a hard disk. 5 refs.; 9 figs
An Adaptive Middleware for Improved Computational Performance

DEFF Research Database (Denmark)

Bonnichsen, Lars Frydendal

The performance improvements in computer systems over the past 60 years have been fueled by an exponential increase in energy efficiency. In recent years, the phenomenon known as the end of Dennard's scaling has slowed energy efficiency improvements, but improving computer energy efficiency is more important now than ever. Traditionally, most improvements in computer energy efficiency have come from improvements in lithography (the ability to produce smaller transistors) and computer architecture (the ability to apply those transistors efficiently). Since the end of scaling, we have seen... we are improving computational performance by exploiting modern hardware features, such as dynamic voltage-frequency scaling and transactional memory. Adapting software is an iterative process, requiring that we continually revisit it to meet new requirements or realities; a time consuming process...

Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

Science.gov (United States)

Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

2017-11-01

The current trend in processor manufacturing focuses on multi-core architectures rather than increasing the clock speed for performance improvement. Graphics processors have become commodity hardware for providing fast co-processing in computer systems. Developments in IoT, social networking web applications, and big data have created huge demand for data processing activities, and such throughput-intensive applications inherently contain data-level parallelism, which is better suited to SIMD architecture based GPUs. This paper reviews the architectural aspects of multi/many core processors and graphics processors. Different case studies are taken to compare the performance of throughput computing applications using shared memory programming in OpenMP and CUDA API based programming.
Smart SOA platforms in cloud computing architectures

CERN Document Server

Exposito, Ernesto

2014-01-01

This book is intended to introduce the principles of the Event-Driven and Service-Oriented Architecture (SOA 2.0) and its role in the new interconnected world based on the cloud computing architecture paradigm. In this new context, the concept of "service" is widely applied to the hardware and software resources available in the new generation of the Internet. The authors focus on how current and future SOA technologies provide the basis for the smart management of the service model provided by the Platform as a Service (PaaS) layer.
ATCA for Machines-- Advanced Telecommunications Computing Architecture

International Nuclear Information System (INIS)

Larsen, R

2008-01-01

The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R and D including application of HA principles to power electronics systems
Architectural design for a topological cluster state quantum computer

International Nuclear Information System (INIS)

Devitt, Simon J; Munro, William J; Nemoto, Kae; Fowler, Austin G; Stephens, Ashley M; Greentree, Andrew D; Hollenberg, Lloyd C L

2009-01-01

The development of a large scale quantum computer is a highly sought after goal of fundamental research and consequently a highly non-trivial problem. Scalability in quantum information processing is not just a problem of qubit manufacturing and control but it crucially depends on the ability to adapt advanced techniques in quantum information theory, such as error correction, to the experimental restrictions of assembling qubit arrays into the millions. In this paper, we introduce a feasible architectural design for large scale quantum computation in optical systems. We combine the recent developments in topological cluster state computation with the photonic module, a simple chip-based device that can be used as a fundamental building block for a large-scale computer. The integration of the topological cluster model with this comparatively simple operational element addresses many significant issues in scalable computing and leads to a promising modular architecture with complete integration of active error correction, exhibiting high fault-tolerant thresholds.
Software Systems for High-performance Quantum Computing

Energy Technology Data Exchange (ETDEWEB)

Humble, Travis S [ORNL; Britt, Keith A [ORNL

2016-01-01

Quantum computing promises new opportunities for solving hard computational problems, but harnessing this novelty requires breakthrough concepts in the design, operation, and application of computing systems. We define some of the challenges facing the development of quantum computing systems as well as software-based approaches that can be used to overcome these challenges. Following a brief overview of the state of the art, we present models for the quantum programming and execution models, the development of architectures for hybrid high-performance computing systems, and the realization of software stacks for quantum networking. This leads to a discussion of the role that conventional computing plays in the quantum paradigm and how some of the current challenges for exascale computing overlap with those facing quantum computing.
Quantum Accelerators for High-performance Computing Systems

Energy Technology Data Exchange (ETDEWEB)

Humble, Travis S. [ORNL; Britt, Keith A. [ORNL; Mohiyaddin, Fahd A. [ORNL

2017-11-01

We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenge the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system to manage these unique accelerator resources, the prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.
Performance Evaluation of a Mobile Wireless Computational Grid ...

African Journals Online (AJOL)

This work developed and simulated a mathematical model for a mobile wireless computational Grid architecture using networks of queuing theory. This was in order to evaluate the performance of the load-balancing three tier hierarchical configuration. The throughput and resource utilization metrics were measured and the ...
Heavy Lift Vehicle (HLV) Avionics Flight Computing Architecture Study

Science.gov (United States)

Hodson, Robert F.; Chen, Yuan; Morgan, Dwayne R.; Butler, A. Marc; Sdhuh, Joseph M.; Petelle, Jennifer K.; Gwaltney, David A.; Coe, Lisa D.; Koelbl, Terry G.; Nguyen, Hai D.

2011-01-01

A NASA multi-Center study team was assembled from LaRC, MSFC, KSC, JSC and WFF to examine potential flight computing architectures for a Heavy Lift Vehicle (HLV) to better understand avionics drivers. The study examined Design Reference Missions (DRMs) and vehicle requirements that could impact the vehicle's avionics. The study considered multiple self-checking and voting architectural variants and examined reliability, fault-tolerance, mass, power, and redundancy management impacts. Furthermore, a goal of the study was to develop the skills and tools needed to rapidly assess additional architectures should requirements or assumptions change.
Performance of particle in cell methods on highly concurrent computational architectures

International Nuclear Information System (INIS)

Adams, M.F.; Ethier, S.; Wichmann, N.

2009-01-01

Particle in cell (PIC) methods are effective in computing the Vlasov-Poisson system of equations used in simulations of magnetic fusion plasmas. PIC methods use grid based computations, for solving Poisson's equation or more generally Maxwell's equations, as well as Monte-Carlo type methods to sample the Vlasov equation. The presence of two types of discretizations, deterministic field solves and Monte-Carlo methods for the Vlasov equation, poses challenges in understanding and optimizing performance on today's large scale computers, which require high levels of concurrency. These challenges arise from the need to optimize two very different types of processes and the interactions between them. Modern cache based high-end computers have very deep memory hierarchies and high degrees of concurrency which must be utilized effectively to achieve good performance. The effective use of these machines requires maximizing concurrency by eliminating serial or redundant work and minimizing global communication. A related issue is minimizing the memory traffic between levels of the memory hierarchy because performance is often limited by the bandwidths and latencies of the memory system. This paper discusses some of the performance issues, particularly in regard to parallelism, of PIC methods. The gyrokinetic toroidal code (GTC) is used for these studies and a new radial grid decomposition is presented and evaluated. Scaling of the code is demonstrated on ITER sized plasmas with up to 16K Cray XT3/4 cores.
Performance of particle in cell methods on highly concurrent computational architectures

International Nuclear Information System (INIS)

Adams, M F; Ethier, S; Wichmann, N

2007-01-01

Particle in cell (PIC) methods are effective in computing the Vlasov-Poisson system of equations used in simulations of magnetic fusion plasmas. PIC methods use grid-based computations, for solving Poisson's equation or more generally Maxwell's equations, as well as Monte-Carlo type methods to sample the Vlasov equation. The presence of two types of discretizations, deterministic field solves and Monte-Carlo methods for the Vlasov equation, poses challenges in understanding and optimizing performance on today's large-scale computers, which require high levels of concurrency. These challenges arise from the need to optimize two very different types of processes and the interactions between them. Modern cache-based high-end computers have very deep memory hierarchies and high degrees of concurrency which must be utilized effectively to achieve good performance. The effective use of these machines requires maximizing concurrency by eliminating serial or redundant work and minimizing global communication. A related issue is minimizing the memory traffic between levels of the memory hierarchy because performance is often limited by the bandwidths and latencies of the memory system. This paper discusses some of the performance issues, particularly in regard to parallelism, of PIC methods. The gyrokinetic toroidal code (GTC) is used for these studies and a new radial grid decomposition is presented and evaluated. Scaling of the code is demonstrated on ITER-sized plasmas with up to 16K Cray XT3/4 cores.
A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.

Energy Technology Data Exchange (ETDEWEB)

Vineyard, Craig Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verzi, Stephen Joseph [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2017-09-01

As high performance computing architectures pursue more computational power, there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an open challenge, and in this research we sought to investigate whether neural-inspired approaches can meaningfully help with memory management. In particular, we explored neurogenesis-inspired resource allocation, and were able to show that a neural-inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.
Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

Energy Technology Data Exchange (ETDEWEB)

Arumugam, Kamesh [Old Dominion Univ., Norfolk, VA (United States)

2017-05-01

Efficient parallel implementation of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. This requires exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-flow and irregular memory accesses. Furthermore, these applications are often iterative with dependency between steps, making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged-particle beam dynamics is one such application where the distribution of work and memory access pattern at each time step is irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control-flow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address
Cloud Computing Security in Openstack Architecture: General Overview

Directory of Open Access Journals (Sweden)

Gleb Igorevich Shakulo

2015-10-01

The subject of the article is cloud computing security. The article begins with the author analyzing cloud computing advantages and disadvantages and factors of growth, both positive and negative. Among the latter, security is deemed one of the most prominent. Furthermore, the author takes the architecture of the OpenStack project as an example for study, describing its essential components and their interconnection. In conclusion, the author raises a series of questions as possible areas of further research to resolve security concerns, thus making cloud computing a more secure technology.
Design of Carborane Molecular Architectures via Electronic Structure Computations

International Nuclear Information System (INIS)

Oliva, J.M.; Serrano-Andres, L.; Klein, D.J.; Schleyer, P.V.R.; Mich, J.

2009-01-01

Quantum-mechanical electronic structure computations were employed to explore initial steps towards a comprehensive design of polycarborane architectures through assembly of molecular units. Aspects considered were (i) the striking modification of geometrical parameters through substitution, (ii) endohedral carboranes and proposed ejection mechanisms for energy/ion/atom/energy storage/transport, (iii) the excited state character in single and dimeric molecular units, and (iv) higher architectural constructs. A goal of this work is to find optimal architectures where atom/ion/energy/spin transport within carborane superclusters is feasible in order to modernize and improve future photo energy processes.
Designing fault-tolerant real-time computer systems with diversified bus architecture for nuclear power plants

International Nuclear Information System (INIS)

Behera, Rajendra Prasad; Murali, N.; Satya Murty, S.A.V.

2014-01-01

Fault-tolerant real-time computer (FT-RTC) systems are widely used to perform safe operation of nuclear power plants (NPP) and safe shutdown in the event of any untoward situation. Design requirements for such systems need high reliability, availability, computational ability for measurement via sensors, control action via actuators, data communication and human interface via keyboard or display. All these attributes of FT-RTC systems are required to be implemented using best known methods such as redundant system design using diversified bus architecture to avoid common cause failure, fail-safe design to avoid unsafe failure and diagnostic features to validate system operation. In this context, the system designer must select efficient as well as highly reliable diversified bus architecture in order to realize fault-tolerant system design. This paper presents a comparative study between CompactPCI bus and Versa Module Eurocard (VME) bus architecture for designing FT-RTC systems with switch over logic system (SOLS) for NPP. (author)
A State-Based Modeling Approach for Efficient Performance Evaluation of Embedded System Architectures at Transaction Level

Directory of Open Access Journals (Sweden)

Anthony Barreteau

2012-01-01

Abstract models are necessary to assist system architects in the evaluation process of hardware/software architectures and to cope with the still increasing complexity of embedded systems. Efficient methods are required to create reliable models of system architectures and to allow early performance evaluation and fast exploration of the design space. In this paper, we present a specific transaction level modeling approach for performance evaluation of hardware/software architectures. This approach relies on a generic execution model that requires little modeling effort. The created models are used to evaluate, by simulation, the expected processing and memory resources for various architectures. The proposed execution model relies on a specific computation method defined to improve the simulation speed of transaction level models. The benefits of the proposed approach are highlighted through two case studies. The first case study is a didactic example illustrating the modeling approach. In this example, a simulation speed-up by a factor of 7.62 is achieved by using the proposed computation method. The second case study concerns the analysis of a communication receiver supporting part of the physical layer of the LTE protocol. In this case study, architecture exploration is carried out in order to improve the allocation of processing functions.
Building and measuring a high performance network architecture

Energy Technology Data Exchange (ETDEWEB)

Kramer, William T.C.; Toole, Timothy; Fisher, Chuck; Dugan, Jon; Wheeler, David; Wing, William R; Nickless, William; Goddard, Gregory; Corbato, Steven; Love, E. Paul; Daspit, Paul; Edwards, Hal; Mercer, Linden; Koester, David; Decina, Basil; Dart, Eli; Paul Reisinger, Paul; Kurihara, Riki; Zekauskas, Matthew J; Plesset, Eric; Wulf, Julie; Luce, Douglas; Rogers, James; Duncan, Rex; Mauth, Jeffery

2001-04-20

Once a year, the SC conferences present a unique opportunity to create and build one of the most complex and highest performance networks in the world. At SC2000, large-scale and complex local and wide area networking connections were demonstrated, including large-scale distributed applications running on different architectures. This project was designed to use the unique opportunity presented at SC2000 to create a testbed network environment and then use that network to demonstrate and evaluate high performance computational and communication applications. This testbed was designed to incorporate many interoperable systems and services and was designed for measurement from the very beginning. The end results were key insights into how to use novel, high performance networking technologies and to accumulate measurements that will give insights into the networks of the future.
Geometric Computing for Freeform Architecture

KAUST Repository

Wallner, J.

2011-06-03

Geometric computing has recently found a new field of applications, namely the various geometric problems which lie at the heart of rationalization and construction-aware design processes of freeform architecture. We report on our work in this area, dealing with meshes with planar faces and meshes which allow multilayer constructions (which is related to discrete surfaces and their curvatures), triangle meshes with circle-packing properties (which is related to conformal uniformization), and with the paneling problem. We emphasize the combination of numerical optimization and geometric knowledge.
Emerging opportunities in enterprise integration with open architecture computer numerical controls

Science.gov (United States)

Hudson, Christopher A.

1997-01-01

The shift to open-architecture machine tool computer numerical controls is providing new opportunities for metal working oriented manufacturers to streamline the entire 'art to part' process. Production cycle times, accuracy, consistency, predictability and process reliability are just some of the factors that can be improved, leading to better manufactured product at lower costs. Open architecture controllers are allowing manufacturers to apply general purpose software and hardware tools where previous approaches relied on proprietary and unique hardware and software. This includes DNC, SCADA, CAD, and CAM, where the increasing use of general purpose components is leading to lower cost systems that are also more reliable and robust than the past proprietary approaches. In addition, a number of new opportunities exist, which in the past were likely impractical due to cost or performance constraints.
The Fermilab central computing facility architectural model

International Nuclear Information System (INIS)

Nicholls, J.

1989-01-01

The goal of the current Central Computing Upgrade at Fermilab is to create a computing environment that maximizes total productivity, particularly for high energy physics analysis. The Computing Department and the Next Computer Acquisition Committee decided upon a model which includes five components: an interactive front-end, a Large-Scale Scientific Computer (LSSC, a mainframe computing engine), a microprocessor farm system, a file server, and workstations. With the exception of the file server, all segments of this model are currently in production: a VAX/VMS cluster interactive front-end, an Amdahl VM Computing engine, ACP farms, and (primarily) VMS workstations. This paper will discuss the implementation of the Fermilab Central Computing Facility Architectural Model. Implications for Code Management in such a heterogeneous environment, including issues such as modularity and centrality, will be considered. Special emphasis will be placed on connectivity and communications between the front-end, LSSC, and workstations, as practiced at Fermilab. (orig.)

The Fermilab Central Computing Facility architectural model

International Nuclear Information System (INIS)

Nicholls, J.

1989-05-01

The goal of the current Central Computing Upgrade at Fermilab is to create a computing environment that maximizes total productivity, particularly for high energy physics analysis. The Computing Department and the Next Computer Acquisition Committee decided upon a model which includes five components: an interactive front end, a Large-Scale Scientific Computer (LSSC, a mainframe computing engine), a microprocessor farm system, a file server, and workstations. With the exception of the file server, all segments of this model are currently in production: a VAX/VMS Cluster interactive front end, an Amdahl VM computing engine, ACP farms, and (primarily) VMS workstations. This presentation will discuss the implementation of the Fermilab Central Computing Facility Architectural Model. Implications for Code Management in such a heterogeneous environment, including issues such as modularity and centrality, will be considered. Special emphasis will be placed on connectivity and communications between the front-end, LSSC, and workstations, as practiced at Fermilab. 2 figs
Information management architecture for an integrated computing environment for the Environmental Restoration Program. Environmental Restoration Program, Volume 3, Interim technical architecture

International Nuclear Information System (INIS)

1994-09-01

This third volume of the Information Management Architecture for an Integrated Computing Environment for the Environmental Restoration Program--the Interim Technical Architecture (TA) (referred to throughout the remainder of this document as the ER TA)--represents a key milestone in establishing a coordinated information management environment in which information initiatives can be pursued with the confidence that redundancy and inconsistencies will be held to a minimum. This architecture is intended to be used as a reference by anyone whose responsibilities include the acquisition or development of information technology for use by the ER Program. The interim ER TA provides technical guidance at three levels. At the highest level, the technical architecture provides an overall computing philosophy or direction. At this level, the guidance does not address specific technologies or products but addresses more general concepts, such as the use of open systems, modular architectures, graphical user interfaces, and architecture-based development. At the next level, the technical architecture provides specific information technology recommendations regarding a wide variety of specific technologies. These technologies include computing hardware, operating systems, communications software, database management software, application development software, and personal productivity software, among others. These recommendations range from the adoption of specific industry or Martin Marietta Energy Systems, Inc. (Energy Systems) standards to the specification of individual products. At the third level, the architecture provides guidance regarding implementation strategies for the recommended technologies that can be applied to individual projects and to the ER Program as a whole
Porting plasma physics simulation codes to modern computing architectures using the libmrc framework

Science.gov (United States)

Germaschewski, Kai; Abbott, Stephen

2015-11-01

Available computing power has continued to grow exponentially even after single-core performance saturated in the last decade. The increase has since been driven by more parallelism, both using more cores and having more parallelism in each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source libmrc framework that has been used to modularize and port three plasma physics codes: the extended MHD code MRCv3 with implicit time integration and curvilinear grids; the OpenGGCM global magnetosphere model; and the particle-in-cell code PSC. libmrc consolidates basic functionality needed for simulations based on structured grids (I/O, load balancing, time integrators), and also introduces a parallel object model that makes it possible to maintain multiple implementations of computational kernels, on e.g. conventional processors and GPUs. It handles data layout conversions and enables us to port performance-critical parts of a code to a new architecture step-by-step, while the rest of the code can remain unchanged. We will show examples of the performance gains and some physics applications.
Multicore Challenges and Benefits for High Performance Scientific Computing

Directory of Open Access Journals (Sweden)

Ida M.B. Nielsen

2008-01-01

Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.
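The two-level decomposition behind a hybrid matrix multiply can be sketched in a single process. This is only an emulation under stated assumptions, not the article's algorithm: the outer loop stands in for message-passing ranks that each own a row block of A, the thread pool stands in for threads within one rank, and every function name is hypothetical. In CPython the global interpreter lock prevents these threads from running the arithmetic truly in parallel, so the sketch shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor

def split(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal (start, stop) chunks."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def multiply_rows(a, b, lo, hi):
    """Compute rows lo..hi-1 of A @ B (the work assigned to one thread)."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(lo, hi)]

def rank_multiply(a_block, b, n_threads):
    """Inner level: one emulated rank splits its row block across threads."""
    chunks = split(len(a_block), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(lambda c: multiply_rows(a_block, b, *c), chunks))
    return [row for part in parts for row in part]

def hybrid_matmul(a, b, n_ranks=2, n_threads=2):
    """Outer level: row blocks of A per emulated rank; results are concatenated."""
    result = []
    for lo, hi in split(len(a), n_ranks):     # stands in for message-passing ranks
        result.extend(rank_multiply(a[lo:hi], b, n_threads))  # stands in for a gather
    return result

if __name__ == "__main__":
    print(hybrid_matmul([[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10]]))
```

In a real hybrid code, each rank would hold only its own block of A in its own memory and the final concatenation would be an explicit communication step.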
Computer aid in solar architecture

Energy Technology Data Exchange (ETDEWEB)

Rosendahl, E W

1982-02-01

Architects are discussing to what extent new buildings can be designed to use energy more economically by architectural means. Solar houses in the USA are often taken as a model. As yet it is unclear how such measures will affect heat demand in the central European climate and with domestic building materials being used. A computer simulation program is introduced by which these questions can be answered as early as the planning stage. The program can be run on a common microcomputer system.
Integrating Computing Resources: A Shared Distributed Architecture for Academics and Administrators.

Science.gov (United States)

Beltrametti, Monica; English, Will

1994-01-01

Development and implementation of a shared distributed computing architecture at the University of Alberta (Canada) are described. Aspects discussed include design of the architecture, users' views of the electronic environment, technical and managerial challenges, and the campuswide human infrastructures needed to manage such an integrated…
The Architectural Designs of a Nanoscale Computing Model

Directory of Open Access Journals (Sweden)

Mary M. Eshaghian-Wilner

2004-08-01

A generic nanoscale computing model is presented in this paper. The model consists of a collection of fully interconnected nanoscale computing modules, where each module is a cube of cells made out of quantum dots, spins, or molecules. The cells dynamically switch between two states by quantum interactions among their neighbors in all three dimensions. This paper includes a brief introduction to the field of nanotechnology from a computing point of view and presents a set of preliminary architectural designs for fabricating the nanoscale model studied.
High-Performance Monitoring Architecture for Large-Scale Distributed Systems Using Event Filtering

Science.gov (United States)

Maly, K.

1998-01-01

Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed (LSD) systems. In an LSD environment, a large number of events is generated by the system components during its execution or interaction with external objects (e.g. users or processes). Monitoring such events is necessary for observing the run-time behavior of LSD systems and providing status information required for debugging, tuning and managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application environment, which complicates the management decision process and thereby makes monitoring LSD systems an intricate task. We propose a scalable high-performance monitoring architecture for LSD systems to detect and classify interesting local and global events and disseminate the monitoring information to the corresponding endpoint management applications, such as debugging and reactive control tools, to improve the application performance and reliability. A large volume of events may be generated due to the extensive demands of the monitoring applications and the high interaction of LSD systems. The monitoring architecture employs a high-performance event filtering mechanism to efficiently process the large volume of event traffic generated by LSD systems and minimize the intrusiveness of the monitoring process by reducing the event traffic flow in the system and distributing the monitoring computation. Our architecture also supports dynamic and flexible reconfiguration of the monitoring mechanism via its instrumentation and subscription components. As a case study, we show how our monitoring architecture can be utilized to improve the reliability and the performance of the Interactive Remote Instruction (IRI) system, which is a large-scale distributed system for collaborative distance learning. The filtering mechanism represents an intrinsic component integrated
Micromagnetics on high-performance workstation and mobile computational platforms

Science.gov (United States)

Fu, S.; Chang, R.; Couture, S.; Menarini, M.; Escobar, M. A.; Kuteifan, M.; Lubarda, M.; Gabay, D.; Lomakin, V.

2015-05-01

The feasibility of using high-performance desktop and embedded mobile computational platforms is presented, including multi-core Intel central processing unit, Nvidia desktop graphics processing units, and Nvidia Jetson TK1 Platform. FastMag finite element method-based micromagnetic simulator is used as a testbed, showing high efficiency on all the platforms. Optimization aspects of improving the performance of the mobile systems are discussed. The high performance, low cost, low power consumption, and rapid performance increase of the embedded mobile systems make them a promising candidate for micromagnetic simulations. Such architectures can be used as standalone systems or can be built as low-power computing clusters.
Performance Analysis of GFDL's GCM Line-By-Line Radiative Transfer Model on GPU and MIC Architectures

Science.gov (United States)

Menzel, R.; Paynter, D.; Jones, A. L.

2017-12-01

Due to their relatively low computational cost, radiative transfer models in global climate models (GCMs) run on traditional CPU architectures generally consist of shortwave and longwave parameterizations over a small number of wavelength bands. With the rise of newer GPU and MIC architectures, however, the performance of high resolution line-by-line radiative transfer models may soon approach those of the physical parameterizations currently employed in GCMs. Here we present an analysis of the current performance of a new line-by-line radiative transfer model currently under development at GFDL. Although originally designed to specifically exploit GPU architectures through the use of CUDA, the radiative transfer model has recently been extended to include OpenMP in an effort to also effectively target MIC architectures such as Intel's Xeon Phi. Using input data provided by the upcoming Radiative Forcing Model Intercomparison Project (RFMIP, as part of CMIP 6), we compare model results and performance data for various model configurations and spectral resolutions run on both GPU and Intel Knights Landing architectures to analogous runs of the standard Oxford Reference Forward Model on traditional CPUs.
Network architecture test-beds as platforms for ubiquitous computing.

Science.gov (United States)

Roscoe, Timothy

2008-10-28

Distributed systems research, and in particular ubiquitous computing, has traditionally assumed the Internet as a basic underlying communications substrate. Recently, however, the networking research community has come to question the fundamental design or 'architecture' of the Internet. This has been led by two observations: first, that the Internet as it stands is now almost impossible to evolve to support new functionality; and second, that modern applications of all kinds now use the Internet rather differently, and frequently implement their own 'overlay' networks above it to work around its perceived deficiencies. In this paper, I discuss recent academic projects to allow disruptive change to the Internet architecture, and also outline a radically different view of networking for ubiquitous computing that such proposals might facilitate.
Computer Architecture for Energy Efficient SFQ

Science.gov (United States)

2014-08-27

IBM Corporation (T.J. Watson Research Laboratory) 1101 Kitchawan Road Yorktown Heights, NY 10598 -0000 2 ABSTRACT Number of Papers published in peer...accomplished during this ARO-sponsored project at IBM Research to identify and model an energy efficient SFQ-based computer architecture. The... IBM Windsor Blue (WB), illustrated schematically in Figure 2. The basic building block of WB is a "tile" comprised of a 64-bit arithmetic logic unit
Cloud Computing Security in Openstack Architecture: General Overview

OpenAIRE

Gleb Igorevich Shakulo

2015-01-01

The subject of the article is cloud computing security. The article begins with the author analyzing cloud computing advantages and disadvantages and factors of growth, both positive and negative. Among the latter, security is deemed one of the most prominent. Furthermore, the author takes the architecture of the OpenStack project as an example for study, describing its essential components and their interconnection. In conclusion, the author raises a series of questions as possible areas of further research to resolve security c...
Computational simulation in architectural and environmental acoustics methods and applications of wave-based computation

CERN Document Server

Sakamoto, Shinichi; Otsuru, Toru

2014-01-01

This book reviews a variety of methods for wave-based acoustic simulation and recent applications to architectural and environmental acoustic problems. Following an introduction providing an overview of computational simulation of sound environment, the book is in two parts: four chapters on methods and four chapters on applications. The first part explains the fundamentals and advanced techniques for three popular methods, namely, the finite-difference time-domain method, the finite element method, and the boundary element method, as well as alternative time-domain methods. The second part demonstrates various applications to room acoustics simulation, noise propagation simulation, acoustic property simulation for building components, and auralization. This book is a valuable reference that covers the state of the art in computational simulation for architectural and environmental acoustics.
Virtual Prototyping and Performance Analysis of Two Memory Architectures

Directory of Open Access Journals (Sweden)

Huda S. Muhammad

2009-01-01

The gap between CPU and memory speed has always been a critical concern that motivated researchers to study and analyze the performance of memory hierarchical architectures. In the early stages of the design cycle, performance evaluation methodologies can be used to leverage exploration at the architectural level and assist in making early design tradeoffs. In this paper, we use simulation platforms developed using the VisualSim tool to compare the performance of two memory architectures, namely, the Direct Connect architecture of the Opteron, and the Shared Bus of the Xeon multicore processors. Key variations exist between the two memory architectures and both design approaches provide rich platforms that call for the early use of virtual system prototyping and simulation techniques to assess performance at an early stage in the design cycle.
Performance analysis of IMS based LTE and WIMAX integration architectures

Directory of Open Access Journals (Sweden)

A. Bagubali

2016-12-01

In the current networking field, much research is under way on the integration of different wireless technologies, with the aim of providing uninterrupted connectivity to the user anywhere, with high data rates due to increased demand. However, the number of objects connected by wireless interface, such as smart devices, industrial machines, and smart homes, is dramatically increasing due to the evolution of cloud computing and internet of things technology. This paper begins with the challenges involved in such integrations and then explains the role of different couplings and different architectures. This paper also gives further improvement in the LTE and WiMAX integration architectures to provide seamless vertical handover and flexible quality of service for supporting voice, video, and multimedia services over IP network and mobility management with the help of IMS networks. Various parameters such as handover delay, cost of signalling, and packet loss are evaluated, and the performance of the interworking architecture is analysed from the simulation results. Finally, it concludes that the cross layer scenario is better than the non cross layer scenario.
The visual simulators for architecture and computer organization learning

OpenAIRE

Nikolić Boško; Grbanović Nenad; Đorđević Jovan

2009-01-01

The paper proposes a method for effective distance learning of computer architecture and organization. The proposed method is based on a software system that can be applied in any course in this field. Within this system, students can observe simulations of already created computer systems. The system also supports the creation and simulation of switch systems.
On Computational Fluid Dynamics Tools in Architectural Design

DEFF Research Database (Denmark)

Kirkegaard, Poul Henning; Hougaard, Mads; Stærdahl, Jesper Winther

engineering computational fluid dynamics (CFD) simulation program ANSYS CFX and a CFD based representative program RealFlow are investigated. These two programs represent two types of CFD based tools available for use during phases of an architectural design process. However, as outlined in two case studies...
Nanotube devices based crossbar architecture: toward neuromorphic computing

International Nuclear Information System (INIS)

Zhao, W S; Gamrat, C; Agnus, G; Derycke, V; Filoramo, A; Bourgoin, J-P

2010-01-01

Nanoscale devices such as carbon nanotube and nanowires based transistors, memristors and molecular devices are expected to play an important role in the development of new computing architectures. While their size represents a decisive advantage in terms of integration density, it also raises the critical question of how to efficiently address large numbers of densely integrated nanodevices without the need for complex multi-layer interconnection topologies similar to those used in CMOS technology. Two-terminal programmable devices in crossbar geometry seem particularly attractive, but suffer from severe addressing difficulties due to cross-talk, which implies complex programming procedures. Three-terminal devices can be easily addressed individually, but with limited gain in terms of interconnect integration. We show how optically gated carbon nanotube devices enable efficient individual addressing when arranged in a crossbar geometry with shared gate electrodes. This topology is particularly well suited for parallel programming or learning in the context of neuromorphic computing architectures.
Applications of parallel computer architectures to the real-time simulation of nuclear power systems

International Nuclear Information System (INIS)

Doster, J.M.; Sills, E.D.

1988-01-01

In this paper the authors report on efforts to utilize parallel computer architectures for the thermal-hydraulic simulation of nuclear power systems and current research efforts toward the development of advanced reactor operator aids and control systems based on this new technology. Many aspects of reactor thermal-hydraulic calculations are inherently parallel, and the computationally intensive portions of these calculations can be effectively implemented on modern computers. Timing studies indicate faster-than-real-time, high-fidelity physics models can be developed when the computational algorithms are designed to take advantage of the computer's architecture. These capabilities allow for the development of novel control systems and advanced reactor operator aids. Coupled with an integral real-time data acquisition system, evolving parallel computer architectures can provide operators and control room designers improved control and protection capabilities. Research efforts are currently under way in this area.

Multiprocessor architecture: Synthesis and evaluation

Science.gov (United States)

Standley, Hilda M.

1990-01-01

Multiprocessor computer architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application-specific architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty in analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
CMOL/CMOS hardware architectures and performance/price for Bayesian memory - The building block of intelligent systems

Science.gov (United States)

Zaveri, Mazad Shaheriar

The semiconductor/computer industry has been following Moore's law for several decades and has reaped the benefits in speed and density of the resultant scaling. Transistor density has reached almost one billion per chip, and transistor delays are in picoseconds. However, scaling has slowed down, and the semiconductor industry is now facing several challenges. Hybrid CMOS/nano technologies, such as CMOL, are considered as an interim solution to some of the challenges. Another potential architectural solution includes specialized architectures for applications/models in the intelligent computing domain, one aspect of which includes abstract computational models inspired from the neuro/cognitive sciences. Consequently in this dissertation, we focus on the hardware implementations of Bayesian Memory (BM), which is a (Bayesian) Biologically Inspired Computational Model (BICM). This model is a simplified version of George and Hawkins' model of the visual cortex, which includes an inference framework based on Judea Pearl's belief propagation. We then present a "hardware design space exploration" methodology for implementing and analyzing the (digital and mixed-signal) hardware for the BM. This particular methodology involves: analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom hardware architectures using both traditional CMOS and hybrid nanotechnology - CMOL, and investigating the baseline performance/price of these architectures. The results suggest that CMOL is a promising candidate for implementing a BM. Such implementations can utilize the very high density storage/computation benefits of these new nano-scale technologies much more efficiently; for example, the throughput per 858 mm2 (TPM) obtained for CMOL based architectures is 32 to 40 times better than the TPM for a CMOS based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a PC
The Simulation Intranet Architecture

Energy Technology Data Exchange (ETDEWEB)

Holmes, V.P.; Linebarger, J.M.; Miller, D.J.; Vandewart, R.L.

1998-12-02

The Simulation Intranet (SI) is a term which is being used to describe one element of a multidisciplinary distributed and distance computing initiative known as DisCom2 at Sandia National Laboratory (http et al. 1998). The Simulation Intranet is an architecture for satisfying Sandia's long term goal of providing an end-to-end set of services for high fidelity full physics simulations in a high performance, distributed, and distance computing environment. The Intranet Architecture group was formed to apply current distributed object technologies to this problem. For the hardware architectures and software models involved with the current simulation process, a CORBA-based architecture is best suited to meet Sandia's needs. This paper presents the initial design and implementation of this Intranet based on a three-tier Network Computing Architecture (NCA). The major parts of the architecture include: the Web Client, the Business Objects, and Data Persistence.
Optical interconnection networks for high-performance computing systems

International Nuclear Information System (INIS)

Biberman, Aleksandr; Bergman, Keren

2012-01-01

Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers. (review article)
Proposing Hybrid Architecture to Implement Cloud Computing in Higher Education Institutions Using a Meta-synthesis Appro

Directory of Open Access Journals (Sweden)

Hamid Reza Bazi

2017-12-01

Full Text Available Cloud computing is a new technology that considerably helps Higher Education Institutions (HEIs) to develop and create competitive advantage through inherent characteristics such as flexibility, scalability, accessibility, reliability, fault tolerance and economic efficiency. Due to the numerous advantages of cloud computing, and in order to take advantage of cloud computing infrastructure, the services of universities and HEIs need to migrate to the cloud. However, this transition involves many challenges, one of which is the lack or shortage of an appropriate architecture for migration to the technology. Using a reliable architecture for migration enables managers to mitigate risks in cloud computing technology. Therefore, organizations always search for a suitable cloud computing architecture. In previous studies, these important features have received less attention and have not been achieved in a comprehensive way. The aim of this study is to use a meta-synthesis method for the first time to analyze previously published studies and to suggest an appropriate hybrid cloud migration architecture (IUHEC). We reviewed many papers from relevant journals and conference proceedings. The concepts extracted from these papers are classified into related categories and sub-categories. Then, we developed our proposed hybrid architecture based on these concepts and categories. The proposed architecture was validated by a panel of experts, and Lawshe’s model was used to determine the content validity. Due to its innovative yet user-friendly nature, comprehensiveness, and high security, this architecture can help HEIs migrate effectively to a cloud computing environment.
Architecture and program structures for a special purpose finite element computer

Energy Technology Data Exchange (ETDEWEB)

Norrie, D.H.; Norrie, C.W.

1983-01-01

The development of very large scale integration (VLSI) has made special-purpose computers economically possible. With such a machine, the loss of flexibility compared with a general-purpose computer can be offset by the increased speed which can be obtained by tailoring the architecture to the particular problem or class of problem. The first kind of special-purpose machine has its architecture modelled on the physical structure of the problem and the second kind has its design tailored to the computational algorithm used. The parallel finite element machine (PARFEM) being designed at the University of Calgary for the solution of finite element problems is of the second kind. Its conceptual design is described and progress to date outlined. 14 references.
Component-based software for high-performance scientific computing

Energy Technology Data Exchange (ETDEWEB)

Alexeev, Yuri; Allan, Benjamin A; Armstrong, Robert C; Bernholdt, David E; Dahlgren, Tamara L; Gannon, Dennis; Janssen, Curtis L; Kenny, Joseph P; Krishnan, Manojkumar; Kohl, James A; Kumfert, Gary; McInnes, Lois Curfman; Nieplocha, Jarek; Parker, Steven G; Rasmussen, Craig; Windus, Theresa L

2005-01-01

Recent advances in both computational hardware and multidisciplinary science have given rise to an unprecedented level of complexity in scientific simulation software. This paper describes an ongoing grass roots effort aimed at addressing complexity in high-performance computing through the use of Component-Based Software Engineering (CBSE). Highlights of the benefits and accomplishments of the Common Component Architecture (CCA) Forum and SciDAC ISIC are given, followed by an illustrative example of how the CCA has been applied to drive scientific discovery in quantum chemistry. Thrusts for future research are also described briefly.
Performative Architecture and Urban Spaces

DEFF Research Database (Denmark)

Kiib, Hans

2008-01-01

3 workshops, one exhibition. Three conceptual architectural workshops took place in parallel from August 16th - 22nd 2008. Each workshop carried a specific methodology, and the goal was to come up with conceptual proposals that could be further developed for selected sites in the city of Aalb... This workshop focuses on temporary architecture and urban catalysts. Informal spaces and the interface between the built and the void are foremost in the development of performative urban environments and cultural interaction. ... The workshop model includes an open workshop where a handful of international architects are invited to spend five days with local architects, engineers and scholars contributing to a work of architectural vision and quality. The workshop includes presentations and discussions and development of projects...
Exploring Hardware-Based Primitives to Enhance Parallel Security Monitoring in a Novel Computing Architecture

National Research Council Canada - National Science Library

Mott, Stephen

2007-01-01

.... In doing this, we propose a novel computing architecture, derived from a contemporary shared memory architecture, that facilitates efficient security-related monitoring in real-time, while keeping...
Real-time field programmable gate array architecture for computer vision

Science.gov (United States)

Arias-Estrada, Miguel; Torres-Huitzil, Cesar

2001-01-01

This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low-level image processing. The field programmable gate array (FPGA)-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to the image memory, and it is based on parallel modules with internal pipeline operation in order to improve its performance. The architecture is prototyped in an FPGA, but it can be implemented on dedicated very-large-scale-integrated devices to reach higher clock frequencies. Complexity issues, FPGA resource utilization, FPGA limitations, and real-time performance are discussed. Some results are presented and discussed.
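As a software point of reference for the operation this record accelerates, the following is a minimal sketch of a "valid"-mode 2D convolution of a mask with an image in plain Python. It is not the FPGA architecture described in the abstract; the function name `convolve2d` and the list-of-lists image representation are assumptions made only for illustration.

```python
def convolve2d(image, mask):
    """Valid-mode 2D convolution of `mask` with `image` (both lists of lists).

    Illustrative software reference only; the name and data layout are
    hypothetical and do not come from the paper.
    """
    mask_h, mask_w = len(mask), len(mask[0])
    img_h, img_w = len(image), len(image[0])
    # Flip the mask in both axes so this is a true convolution,
    # not a cross-correlation.
    flipped = [row[::-1] for row in mask[::-1]]
    output = []
    for r in range(img_h - mask_h + 1):
        out_row = []
        for c in range(img_w - mask_w + 1):
            acc = 0
            # Multiply-accumulate over the window under the mask.
            for i in range(mask_h):
                for j in range(mask_w):
                    acc += flipped[i][j] * image[r + i][c + j]
            out_row.append(acc)
        output.append(out_row)
    return output
```

Each output pixel reads every image pixel under the mask, so a naive implementation re-reads the same pixels many times. Buffering those pixels in registers is one way a hardware design can reduce image-memory accesses, which is the goal the abstract states.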
Advanced Architectures for Astrophysical Supercomputing

Science.gov (United States)

Barsdell, B. R.; Barnes, D. G.; Fluke, C. J.

2010-12-01

Astronomers have come to rely on the increasing performance of computers to reduce, analyze, simulate and visualize their data. In this environment, faster computation can mean more science outcomes or the opening up of new parameter spaces for investigation. If we are to avoid major issues when implementing codes on advanced architectures, it is important that we have a solid understanding of our algorithms. A recent addition to the high-performance computing scene that highlights this point is the graphics processing unit (GPU). The hardware originally designed for speeding-up graphics rendering in video games is now achieving speed-ups of O(100×) in general-purpose computation - performance that cannot be ignored. We are using a generalized approach, based on the analysis of astronomy algorithms, to identify the optimal problem-types and techniques for taking advantage of both current GPU hardware and future developments in computing architectures.
Integrated Optical Interconnect Architectures for Embedded Systems

CERN Document Server

Nicolescu, Gabriela

2013-01-01

This book provides a broad overview of current research in optical interconnect technologies and architectures. Introductory chapters on high-performance computing and the associated issues in conventional interconnect architectures, and on the fundamental building blocks for integrated optical interconnect, provide the foundations for the bulk of the book which brings together leading experts in the field of optical interconnect architectures for data communication. Particular emphasis is given to the ways in which the photonic components are assembled into architectures to address the needs of data-intensive on-chip communication, and to the performance evaluation of such architectures for specific applications. Provides state-of-the-art research on the use of optical interconnects in Embedded Systems; Begins with coverage of the basics for high-performance computing and optical interconnect; Includes a variety of on-chip optical communication topologies; Features coverage of system integration and opti...
A Coarse-Grained Reconfigurable Architecture with Compilation for High Performance

Directory of Open Access Journals (Sweden)

Lu Wan

2012-01-01

Full Text Available We propose a fast data relay (FDR) mechanism to enhance existing CGRA (coarse-grained reconfigurable architecture). FDR can not only provide multicycle data transmission concurrently with computations but also convert resource-demanding inter-processing-element global data accesses into local data accesses to avoid communication congestion. We also propose supporting compiler techniques that can efficiently utilize the FDR feature to achieve higher performance for a variety of applications. Our results on FDR-based CGRA are compared with two other works in this field: ADRES and RCP. Experimental results for various multimedia applications show that FDR combined with the new compiler delivers up to 29% and 21% higher performance than ADRES and RCP, respectively.
An Integrated Architecture for On-Board Aircraft Engine Performance Trend Monitoring and Gas Path Fault Diagnostics

Science.gov (United States)

Simon, Donald L.

2010-01-01

Aircraft engine performance trend monitoring and gas path fault diagnostics are closely related technologies that assist operators in managing the health of their gas turbine engine assets. Trend monitoring is the process of monitoring the gradual performance change that an aircraft engine will naturally incur over time due to turbomachinery deterioration, while gas path diagnostics is the process of detecting and isolating the occurrence of any faults impacting engine flow-path performance. Today, performance trend monitoring and gas path fault diagnostic functions are performed by a combination of on-board and off-board strategies. On-board engine control computers contain logic that monitors for anomalous engine operation in real-time. Off-board ground stations are used to conduct fleet-wide engine trend monitoring and fault diagnostics based on data collected from each engine each flight. Continuing advances in avionics are enabling the migration of portions of the ground-based functionality on-board, giving rise to more sophisticated on-board engine health management capabilities. This paper reviews the conventional engine performance trend monitoring and gas path fault diagnostic architecture commonly applied today, and presents a proposed enhanced on-board architecture for future applications. The enhanced architecture gains real-time access to an expanded quantity of engine parameters, and provides advanced on-board model-based estimation capabilities. The benefits of the enhanced architecture include the real-time continuous monitoring of engine health, the early diagnosis of fault conditions, and the estimation of unmeasured engine performance parameters. A future vision to advance the enhanced architecture is also presented and discussed
Analytical Performance Modeling and Validation of Intel’s Xeon Phi Architecture

Energy Technology Data Exchange (ETDEWEB)

Chunduri, Sudheer; Balaprakash, Prasanna; Morozov, Vitali; Vishwanath, Venkatram; Kumaran, Kalyan

2017-01-01

Modeling the performance of scientific applications on emerging hardware plays a central role in achieving extreme-scale computing goals. Analytical models that capture the interaction between applications and hardware characteristics are attractive because even a reasonably accurate model can be useful for performance tuning before the hardware is made available. In this paper, we develop a hardware model for Intel’s second-generation Xeon Phi architecture code-named Knights Landing (KNL) for the SKOPE framework. We validate the KNL hardware model by projecting the performance of mini-benchmarks and application kernels. The results show that our KNL model can project the performance with prediction errors of 10% to 20%. The hardware model also provides informative recommendations for code transformations and tuning.
Quantum perceptron over a field and neural network architecture selection in a quantum computer.

Science.gov (United States)

da Silva, Adenilton José; Ludermir, Teresa Bernarda; de Oliveira, Wilson Rosa

2016-04-01

In this work, we propose a quantum neural network named quantum perceptron over a field (QPF). Quantum computers are not yet a reality and the models and algorithms proposed in this work cannot be simulated in actual (or classical) computers. QPF is a direct generalization of a classical perceptron and solves some drawbacks found in previous models of quantum perceptrons. We also present a learning algorithm named Superposition based Architecture Learning algorithm (SAL) that optimizes the neural network weights and architectures. SAL searches for the best architecture in a finite set of neural network architectures with linear time over the number of patterns in the training set. SAL is the first learning algorithm to determine neural network architectures in polynomial time. This speedup is obtained by the use of quantum parallelism and a non-linear quantum operator. Copyright © 2016 Elsevier Ltd. All rights reserved.
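For orientation, the classical perceptron that QPF is said to generalize can be sketched in a few lines of Python. This is only the textbook classical model with the standard error-correction update; it is not QPF or SAL, and the function names, learning rate and epoch limit are assumptions chosen for illustration.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    total = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Classical perceptron learning rule on (input, target) pairs.

    `epochs` and `lr` are hypothetical defaults, not values from the paper.
    """
    n_inputs = len(samples[0][0])
    weights, bias = [0.0] * n_inputs, 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                # Move the decision boundary toward the misclassified sample.
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # every sample classified correctly
            break
    return weights, bias


# Usage: learn the linearly separable AND function.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights, and_bias = train_perceptron(AND_SAMPLES)
```

This sketch trains the weights of one fixed architecture. The abstract describes SAL as additionally searching over a finite set of architectures.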
Hardware architecture design of image restoration based on time-frequency domain computation

Science.gov (United States)

Wen, Bo; Zhang, Jing; Jiao, Zipeng

2013-10-01

Image restoration algorithms based on time-frequency domain computation (TFDC) are mature and widely applied in engineering. To achieve high-speed implementation of these algorithms, the TFDC hardware architecture is proposed. First, the main module is designed by analyzing the common processing and numerical calculation. Then, to improve commonality, the iteration control module is planned for iterative algorithms. In addition, to reduce the computational cost and memory requirements, the necessary optimizations are suggested for the time-consuming modules, which include the two-dimensional FFT/IFFT and complex-number calculation. Finally, the TFDC hardware architecture is adopted for the hardware design of a real-time image restoration system. The results show that the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm commonality, hardware realizability and high efficiency.
Performance evaluation of enterprise architecture using fuzzy sequence diagram

Directory of Open Access Journals (Sweden)

Mohammad Atasheneh

2014-01-01

Full Text Available Developing an Enterprise Architecture is a complex task, and to control the complexity of the regulatory framework we need to measure the relative performance of one system against other available systems. On the other hand, enterprise architecture cannot be organized without the use of a logical structure. The framework provides a logical structure for classifying architectural output. Among the common architectural frameworks, the C4ISR framework and its product methodology is one of the most popular. In this paper, given the existing uncertainties in system development and information systems, a new version of UML called Fuzzy-UML is proposed for enterprise architecture development based on fuzzy Petri nets. In addition, the performance of the system is evaluated based on a fuzzy sequence diagram.
TEACHING CAD PROGRAMMING TO ARCHITECTURE STUDENTS

Directory of Open Access Journals (Sweden)

Maria Gabriela Caffarena CELANI

2008-11-01

Full Text Available The objective of this paper is to discuss the relevance of including the discipline of computer programming in the architectural curriculum. To do so I start by explaining how computer programming has been applied in other educational contexts with pedagogical success, describing Seymour Papert's principles. After that, I summarize the historical development of CAD and provide three historical examples of educational applications of computer programming in architecture, followed by a contemporary case that I find of particular relevance. Next, I propose a methodology for teaching programming to architects that aims at improving the quality of designs by making their concepts more explicit. This methodology is based on my own experience teaching computer programming to architecture students at undergraduate and graduate levels at the State University of Campinas, Brazil. The paper ends with a discussion about the role of programming nowadays, when most CAD software is user-friendly and does not require any knowledge of programming for improving performance. I conclude that the introduction of programming in the CAD curriculum within a proper conceptual framework may transform the concept of architectural education. Key-words: Computer programming; computer-aided design; architectural education.

Performing three-dimensional neutral particle transport calculations on tera scale computers

International Nuclear Information System (INIS)

Woodward, C.S.; Brown, P.N.; Chang, B.; Dorr, M.R.; Hanebutte, U.R.

1999-01-01

A scalable, parallel code system to perform neutral particle transport calculations in three dimensions is presented. To utilize the hyper-cluster architecture of emerging tera scale computers, the parallel code successfully combines the MPI message passing and paradigms. The code's capabilities are demonstrated by a shielding calculation containing over 14 billion unknowns. This calculation was accomplished on the IBM SP ASCI Blue-Pacific computer located at Lawrence Livermore National Laboratory (LLNL).
From Smart-Eco Building to High-Performance Architecture: Optimization of Energy Consumption in Architecture of Developing Countries

Science.gov (United States)

Mahdavinejad, M.; Bitaab, N.

2017-08-01

The search for high-performance architecture and visions of future architecture have resulted in attempts to achieve energy-efficient architecture and planning in different aspects. Recent trends aimed at meeting the future legacy of architecture are based on the idea of innovative technologies for resource-efficient buildings, performative design, bio-inspired technologies, etc., while there are meaningful differences between the architecture of developed and developing countries. The significance of the issue becomes clear when emerging cities are found to be interested in Dubaization and other related booming development doctrines. This paper analyzes how far developing countries have succeeded in achieving the goals and objectives of smart-eco buildings. Emerging cities of West Asia are selected as case studies. The results show that the concepts of high-performance architecture and smart-eco buildings differ in developing countries in comparison with developed countries. The paper identifies five essential issues for improving the future architecture of developing countries: 1- Integrated Strategies for Energy Efficiency, 2- Contextual Solutions, 3- Embedded and Initial Energy Assessment, 4- Staff and Occupancy Wellbeing, 5- Life-Cycle Monitoring.
Insights into Working Memory from The Perspective of The EPIC Architecture for Modeling Skilled Perceptual-Motor and Cognitive Human Performance

National Research Council Canada - National Science Library

Kieras, David

1998-01-01

Computational modeling of human perceptual-motor and cognitive performance based on a comprehensive detailed information- processing architecture leads to new insights about the components of working memory...
An ATLAS distributed computing architecture for HL-LHC

CERN Document Server

Campana, Simone; The ATLAS collaboration

2017-01-01

The ATLAS collaboration started a process to understand the computing needs for the High Luminosity LHC era. Based on our best understanding of the computing model input parameters for the HL-LHC data taking conditions, results indicate the need for a larger amount of computational and storage resources with respect to the projection of constant yearly budget for computing in 2026. Filling the gap between the projection and the needs will be one of the challenges in preparation for LHC Run-4. While the gains from improvements in offline software will play a crucial role in this process, a different model for data processing, management, access and bookkeeping should also be envisaged to optimise resource usage. In this contribution we will describe a straw man of this model, founded on basic principles such as single event level granularity for data processing and virtual data. We will explain how the current architecture will evolve adiabatically into the future distributed computing system, through the prot...
High performance computer code for molecular dynamics simulations

International Nuclear Information System (INIS)

Levay, I.; Toekesi, K.

2007-01-01

Complete text of publication follows. Molecular Dynamics (MD) simulation is a widely used technique for modeling complicated physical phenomena. Since 2005 we have been developing an MD simulation code for PCs. The code is written in the object-oriented programming language C++. The aim of our work is twofold: a) to develop a fast computer code for the study of the random walk of guest atoms in a Be crystal, b) 3-dimensional (3D) visualization of the particles' motion. In this case we mimic the motion of the guest atoms in the crystal (diffusion-type motion) and the motion of atoms in the crystal lattice (crystal deformation). Nowadays, it is common to use graphics devices for computationally intensive problems. There are several ways to use this extreme processing performance, but never before has it been as easy to program these devices as it is now. The CUDA (Compute Unified Device Architecture) platform introduced by nVidia Corporation in 2007 is very useful for every processor-hungry application. A unified-architecture GPU includes 96-128 or more stream processors, so the raw calculation performance is 576(!) GFLOPS. It is ten times faster than the fastest dual-core CPU [Fig.1]. Our improved MD simulation software uses this new technology, which speeds up our software: the code runs 10 times faster in the critical calculation code segment. Although the GPU is a very powerful tool, it has a strongly parallel structure. This means that we have to create an algorithm that works on several processors without deadlock. Our code currently uses 256 threads and shared and constant on-chip memory instead of global memory, which is 100 times slower than the others. It is possible to implement the total algorithm on the GPU, so we do not need to download and upload the data in every iteration. For maximal throughput, every thread runs the same instructions
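The diffusion-type motion of guest atoms described above can be illustrated with a small random-walk sketch. This is not the authors' C++/CUDA code: it assumes a simple cubic lattice with unit hops (beryllium itself is hexagonal close-packed), and the names `HOPS` and `random_walk_msd` are hypothetical. The inner loop applies the same update to every walker, which is the pattern that would map onto one GPU thread per particle.

```python
import random

# Six nearest-neighbour hops on a simple cubic lattice. This lattice is an
# assumption made for illustration; it is not taken from the abstract.
HOPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def random_walk_msd(n_walkers, n_steps, seed=0):
    """Return the mean squared displacement (in lattice units) after each step."""
    rng = random.Random(seed)
    positions = [[0, 0, 0] for _ in range(n_walkers)]
    msd = []
    for _ in range(n_steps):
        # Every walker executes the same instructions on its own data.
        for pos in positions:
            dx, dy, dz = rng.choice(HOPS)
            pos[0] += dx
            pos[1] += dy
            pos[2] += dz
        total = sum(x * x + y * y + z * z for x, y, z in positions)
        msd.append(total / n_walkers)
    return msd
```

For an unbiased walk of this kind the mean squared displacement is expected to grow roughly linearly with the number of steps, which is the usual signature of diffusion.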
Bringing high-performance computing to the biologist's workbench: approaches, applications, and challenges

International Nuclear Information System (INIS)

Oehmen, C S; Cannon, W R

2008-01-01

Data-intensive and high-performance computing are poised to significantly impact the future of biological research, which is increasingly driven by the prevalence of high-throughput experimental methodologies for genome sequencing, transcriptomics, proteomics, and other areas. Large centers (such as NIH's National Center for Biotechnology Information, The Institute for Genomic Research, and the DOE's Joint Genome Institute) have made extensive use of multiprocessor architectures to deal with some of the challenges of processing, storing and curating exponentially growing genomic and proteomic datasets, thus enabling users to rapidly access a growing public data source, as well as use analysis tools transparently on high-performance computing resources. Applying this computational power to single-investigator analysis, however, often relies on users to provide their own computational resources, forcing them to endure the learning curve of porting, building, and running software on multiprocessor architectures. Solving the next generation of large-scale biology challenges using multiprocessor machines (from small clusters to emerging petascale machines) can most practically be realized if this learning curve can be minimized through a combination of workflow management, data management and resource allocation as well as intuitive interfaces and compatibility with existing common data formats
A COMPUTER APPLICATION FOR THE ARCHITECTURAL PROGRAM DEVELOPMENT IN DESIGN EDUCATION

Directory of Open Access Journals (Sweden)

Daniel de Carvalho Moreira

2012-02-01

The development of the architectural program in the design studio faces several difficulties. The purpose of the program is to describe the conditions under which the building being designed will operate; this requires a lot of information and organization. Due to its complexity, the definition of the architectural program in design courses is often simplified. This article discusses this issue and proposes a computer application (SINFORMA) that gathers information about the building and the theme of the project in order to develop the architectural program based on structures proposed by bibliographic references. SINFORMA is composed of a framework which includes a database and modules which analyze and organize functional requirements, according to the Problem Seeking method and the contemporary values of architecture enumerated by Hershberger. It is discussed how the application can be applied in design education and how it offers students a practical approach and a comprehensive data analysis for the design of the built environment. Keywords: Architectural programming, Architectural design, Education.
Parallel algorithms and architecture for computation of manipulator forward dynamics

Science.gov (United States)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, that is, the O(n), the O(n^2), and the O(n^3) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class of NC and that the time and processor bounds are of O(log_2^2 n) and O(n^4), respectively. However, the fastest stable parallel algorithms achieve a computation time of O(n) and can be derived by parallelization of the O(n^3) serial algorithms. Parallel computation of the O(n^3) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, that is, the Newton-Euler formulation, the computation of the inertia matrix, decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a unique architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. The developed parallel algorithm, compared to the best serial O(n) algorithm, achieves an asymptotic speedup of more than two orders of magnitude in the computation of the forward dynamics.
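Two of the sub-problems named above, decomposition of a symmetric positive definite matrix and the solution of triangular systems, can be sketched serially as follows. This is a plain textbook Cholesky solve, not the parallel algorithm of the paper; the function names, and the reading of the system as "inertia matrix times joint accelerations equals a known right-hand side", are assumptions made for illustration.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T == a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def forward_substitution(low, b):
    """Solve L * y = b for y, with L lower triangular."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    return y


def backward_substitution(low, y):
    """Solve L^T * x = y for x, reusing the lower-triangular factor L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def solve_spd(a, b):
    """Solve a * x = b via one decomposition and two triangular solves."""
    low = cholesky(a)
    return backward_substitution(low, forward_substitution(low, b))
```

The serial decomposition costs O(n^3) operations, which is the step the abstract proposes to spread across a triangular processor array.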
Three-Dimensional Nanobiocomputing Architectures With Neuronal Hypercells

Science.gov (United States)

2007-06-01

Neumann architectures, and CMOS fabrication. Novel solutions of massive parallel distributed computing and processing (pipelined due to systolic... and processing platforms utilizing molecular hardware within an enabling organization and architecture. The design technology is based on utilizing a...Microsystems and Nanotechnologies investigated a novel 3D3 (Hardware Software Nanotechnology) technology to design super-high performance computing
Collaborative Working Architecture for IoT-Based Applications.

Science.gov (United States)

Mora, Higinio; Signes-Pont, María Teresa; Gil, David; Johnsson, Magnus

2018-05-23

The new sensing applications need enhanced computing capabilities to handle the requirements of complex and huge data processing. The Internet of Things (IoT) concept brings processing and communication features to devices. In addition, the Cloud Computing paradigm provides resources and infrastructures for performing the computations and outsourcing the work from the IoT devices. This scenario opens new opportunities for designing advanced IoT-based applications; however, there is still much research to be done to properly gear all the systems for working together. This work proposes a collaborative model and an architecture to take advantage of the available computing resources. The resulting architecture involves a novel network design with different levels which combines sensing and processing capabilities based on the Mobile Cloud Computing (MCC) paradigm. An experiment is included to demonstrate that this approach can be used in diverse real applications. The results show the flexibility of the architecture to perform complex computational tasks of advanced applications.
Service Oriented Architecture for High Level Applications

International Nuclear Information System (INIS)

Chu, P.

2012-01-01

Standalone high level applications often suffer from poor performance and reliability due to lengthy initialization, heavy computation and rapid graphical update. Service-oriented architecture (SOA) aims to separate the initialization and computation from applications and to distribute such work to various service providers. Heavy computation such as beam tracking will be done periodically on a dedicated server and data will be available to client applications at all times. An industry-standard service architecture can help to improve the performance, reliability and maintainability of the service. Robustness will also be improved by reducing the complexity of individual client applications.
CMS on the GRID: Toward a fully distributed computing architecture

International Nuclear Information System (INIS)

Innocente, Vincenzo

2003-01-01

The computing systems required to collect, analyse and store the physics data at LHC would need to be distributed and global in scope. CMS is actively involved in several grid-related projects to develop and deploy a fully distributed computing architecture. We present here recent developments of tools for automating job submission and for serving data to remote analysis stations. Plans for further test and deployment of a production grid are also described
STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens

International Nuclear Information System (INIS)

Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D.; Volz, Kerstin

2017-01-01

Highlights: • We present STEMsalabim, a modern implementation of the multislice algorithm for simulation of STEM images. • Our package is highly parallelizable on high-performance computing clusters, combining shared and distributed memory architectures. • With STEMsalabim, computationally and memory expensive STEM image simulations can be carried out within reasonable time. - Abstract: We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.
STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens

Energy Technology Data Exchange (ETDEWEB)

Oelerich, Jan Oliver, E-mail: jan.oliver.oelerich@physik.uni-marburg.de; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D.; Volz, Kerstin

2017-06-15

Highlights: • We present STEMsalabim, a modern implementation of the multislice algorithm for simulation of STEM images. • Our package is highly parallelizable on high-performance computing clusters, combining shared and distributed memory architectures. • With STEMsalabim, computationally and memory expensive STEM image simulations can be carried out within reasonable time. - Abstract: We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.
Blaze-DEMGPU: Modular high performance DEM framework for the GPU architecture

Directory of Open Access Journals (Sweden)

Nicolin Govender

2016-01-01

Blaze-DEMGPU is a modular GPU-based discrete element method (DEM) framework that supports polyhedral shaped particles. The high level of performance is attributed to the lightweight and Single Instruction Multiple Data (SIMD) nature of the GPU architecture. Blaze-DEMGPU offers suitable algorithms to conduct DEM simulations on the GPU, and these algorithms can be extended and modified. Since a large number of scientific simulations are particle based, many of the algorithms and strategies for GPU implementation present in Blaze-DEMGPU can be applied to other fields. Blaze-DEMGPU will make it easier for new researchers to use high performance GPU computing as well as stimulate wider GPU research efforts by the DEM community.
14th annual Results and Review Workshop on High Performance Computing in Science and Engineering

CERN Document Server

Nagel, Wolfgang E; Resch, Michael M; Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2011; High Performance Computing in Science and Engineering '11

2012-01-01

This book presents the state-of-the-art in simulation on supercomputers. Leading researchers present results achieved on systems of the High Performance Computing Center Stuttgart (HLRS) for the year 2011. The reports cover all fields of computational science and engineering, ranging from CFD to computational physics and chemistry, to computer science, with a special emphasis on industrially relevant applications. Presenting results for both vector systems and microprocessor-based systems, the book allows readers to compare the performance levels and usability of various architectures. As HLRS
The Design of a System Architecture for Mobile Multimedia Computers

NARCIS (Netherlands)

Havinga, Paul J.M.

2000-01-01

This chapter discusses the system architecture of a portable computer, called Mobile Digital Companion, which provides support for handling multimedia applications energy efficiently. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile
A SECURE MESSAGE TRANSMISSION SYSTEM ARCHITECTURE FOR COMPUTER NETWORKS EMPLOYING SMART CARDS

Directory of Open Access Journals (Sweden)

Geylani KARDAŞ

2008-01-01

In this study, we introduce a mobile system architecture which employs smart cards for secure message transmission in computer networks. The use of smart cards provides two security services, authentication and confidentiality, in our design. The security of the system is provided by asymmetric encryption. Hence, smart cards are used to store personal account information as well as the private key of each user for encryption / decryption operations. This offers further security, authentication and mobility to the system architecture. A real implementation of the proposed architecture which utilizes the JavaCard technology is also discussed in this study.
Communication-Oriented Design Space Exploration for Reconfigurable Architectures

Directory of Open Access Journals (Sweden)

Gogniat Guy

2007-01-01

Many academic works in computer engineering focus on reconfigurable architectures and associated tools. Fine-grain architectures, field programmable gate arrays (FPGAs), are the most well-known structures of reconfigurable hardware. Dedicated tools (generic or specific) allow for the exploration of their design space to choose the best architecture characteristics and/or to explore the application characteristics. The aim is to increase the synergy between the application and the architecture in order to get the best performance. However, there is no generic tool to perform such an exploration for coarse-grain or heterogeneous-grain architectures; only a small number of very specific tools are able to explore a limited set of architectures. To address this major gap, in this paper we propose a new design space exploration approach adapted to fine- and coarse-grain granularities. Our approach combines algorithmic and architecture explorations. It relies on an automatic estimation tool which computes the hierarchical distribution of communications and the utilization rate of architectural processing resources for the architecture under exploration. Such an approach supports the rapid definition of efficient reconfigurable architectures dedicated to one or several applications.
A price and performance comparison of three different storage architectures for data in cloud-based systems

Science.gov (United States)

Gallagher, J. H. R.; Jelenak, A.; Potter, N.; Fulker, D. W.; Habermann, T.

2017-12-01

Providing data services based on cloud computing technology that are equivalent to those developed for traditional computing and storage systems is critical for successful migration to cloud-based architectures for data production, scientific analysis and storage. OPeNDAP Web-service capabilities (comprising the Data Access Protocol (DAP) specification plus open-source software for realizing DAP in servers and clients) are among the most widely deployed means for achieving data-as-service functionality in the Earth sciences. OPeNDAP services are especially common in traditional data center environments where servers offer access to datasets stored in (very large) file systems, and a preponderance of the source data for these services is being stored in the Hierarchical Data Format Version 5 (HDF5). Three candidate architectures for serving NASA satellite Earth Science HDF5 data via Hyrax running on Amazon Web Services (AWS) were developed and their performance examined for a set of representative use cases. The performance was based both on runtime and incurred cost. The three architectures differ in how HDF5 files are stored in the Amazon Simple Storage Service (S3) and how the Hyrax server (as an EC2 instance) retrieves their data. The results for both serial and parallel access to HDF5 data in S3 will be presented. While the study focused on HDF5 data, OPeNDAP and the Hyrax data server, the architectures are generic and the analysis can be extrapolated to many different data formats, web APIs, and data servers.

Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs

Science.gov (United States)

Dias, Tiago; Roma, Nuno; Sousa, Leonel

2014-12-01

A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. In contrast to other designs with similar functionality, the presented architecture is based on a scalable, modular and completely configurable processing structure. This flexible structure not only allows the architecture to be easily reconfigured to support different transform kernels, but also permits its resizing to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, not only is it highly suitable for realizing high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only a reduced subset of transforms that are used by a specific video standard. The experimental results that were obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, such results also demonstrate the ability of this processing structure to realize multi-standard transform cores that support all the standards mentioned above and are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.
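As a rough software illustration of a configurable transform kernel, the sketch below computes a separable 2-D transform Y = C · X · C^T for whatever kernel matrix C it is given. The matrix `AVC_4X4` is the widely documented order-4 forward core transform of H.264/AVC, with scaling and quantization omitted; the function names are hypothetical, and nothing here reproduces the hardware structure of the paper. The helper `pixel_rate` only works out the pixel throughput implied by the quoted 8k format.

```python
# Order-4 forward core transform matrix of H.264/AVC (scaling and
# quantization are deliberately left out of this sketch).
AVC_4X4 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def transform_2d(kernel, block):
    """Separable 2-D transform Y = C * X * C^T for a square kernel C."""
    return matmul(matmul(kernel, block), transpose(kernel))


def pixel_rate(width, height, fps):
    """Pixels per second that must be processed for real-time video."""
    return width * height * fps


flat_block = [[1] * 4 for _ in range(4)]
coefficients = transform_2d(AVC_4X4, flat_block)  # only the DC term is non-zero
rate_8k = pixel_rate(7680, 4320, 30)              # 995,328,000 pixels per second
```

Passing a different kernel matrix (for example an order-8 one) to `transform_2d` changes the transform without changing the code, which is the software analogue of reconfiguring the kernel and order.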
Towards the development of run times leveraging virtualization for high performance computing

International Nuclear Information System (INIS)

Diakhate, F.

2010-12-01

In recent years, there has been a growing interest in using virtualization to improve the efficiency of data centers. This success is rooted in virtualization's excellent fault tolerance and isolation properties, in the overall flexibility it brings, and in its ability to exploit multi-core architectures efficiently. These characteristics also make virtualization an ideal candidate to tackle issues found in new compute cluster architectures. However, in spite of recent improvements in virtualization technology, overheads in the execution of parallel applications remain, which prevent its use in the field of high performance computing. In this thesis, we propose a virtual device dedicated to message passing between virtual machines, so as to improve the performance of parallel applications executed in a cluster of virtual machines. We also introduce a set of techniques facilitating the deployment of virtualized parallel applications. These functionalities have been implemented as part of a runtime system which makes it possible to benefit from virtualization's properties in a way that is as transparent as possible to the user while minimizing performance overheads. (author)
An Architecture for Cross-Cloud System Management

Science.gov (United States)

Dodda, Ravi Teja; Smith, Chris; van Moorsel, Aad

The emergence of the cloud computing paradigm promises flexibility and adaptability through on-demand provisioning of compute resources. As the utilization of cloud resources extends beyond a single provider, for business as well as technical reasons, the issue of effectively managing such resources comes to the fore. Different providers expose different interfaces to their compute resources utilizing varied architectures and implementation technologies. This heterogeneity poses a significant system management problem, and can limit the extent to which the benefits of cross-cloud resource utilization can be realized. We address this problem through the definition of an architecture to facilitate the management of compute resources from different cloud providers in a homogeneous manner. This preserves the flexibility and adaptability promised by the cloud computing paradigm, whilst enabling the benefits of cross-cloud resource utilization to be realized. The practical efficacy of the architecture is demonstrated through an implementation utilizing compute resources managed through different interfaces on the Amazon Elastic Compute Cloud (EC2) service. Additionally, we provide empirical results highlighting the performance differential of these different interfaces, and discuss the impact of this performance differential on efficiency and profitability.
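One common way to manage heterogeneous provider interfaces uniformly is an adapter layer behind a single abstract interface. The sketch below shows that idea only; it is not the architecture defined in the paper. Every class and method name (`ComputeProvider`, `InMemoryProvider`, `CrossCloudManager`) is hypothetical, and the in-memory provider is a stand-in that makes no real cloud API calls.

```python
from abc import ABC, abstractmethod


class ComputeProvider(ABC):
    """Hypothetical uniform interface that each provider adapter implements."""

    @abstractmethod
    def start(self, name):
        """Start an instance and return its identifier."""

    @abstractmethod
    def stop(self, instance_id):
        """Stop the instance with the given identifier."""

    @abstractmethod
    def running(self):
        """Return the identifiers of running instances."""


class InMemoryProvider(ComputeProvider):
    """Stand-in adapter; a real one would wrap a provider-specific interface."""

    def __init__(self, prefix):
        self.prefix = prefix
        self._instances = {}
        self._counter = 0

    def start(self, name):
        self._counter += 1
        instance_id = f"{self.prefix}-{self._counter}"
        self._instances[instance_id] = name
        return instance_id

    def stop(self, instance_id):
        self._instances.pop(instance_id, None)

    def running(self):
        return sorted(self._instances)


class CrossCloudManager:
    """Manages several providers through the same three operations."""

    def __init__(self, providers):
        self.providers = dict(providers)

    def start(self, provider_key, name):
        return self.providers[provider_key].start(name)

    def stop(self, provider_key, instance_id):
        self.providers[provider_key].stop(instance_id)

    def running(self):
        return {key: p.running() for key, p in self.providers.items()}
```

With this shape, supporting another interface means writing one more adapter class; the management code that calls `start`, `stop` and `running` stays unchanged.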
Computer aided design of architecture of degradable tissue engineering scaffolds.

Science.gov (United States)

Heljak, M K; Kurzydlowski, K J; Swieszkowski, W

2017-11-01

One important factor affecting the process of tissue regeneration is scaffold stiffness loss, which should be properly balanced with the rate of tissue regeneration. The aim of the research reported here was to develop a computer tool for designing the architecture of biodegradable scaffolds fabricated by melt-dissolution deposition systems (e.g. Fused Deposition Modeling) to provide the required scaffold stiffness at each stage of degradation/regeneration. The original idea presented in the paper is that the stiffness of a tissue engineering scaffold can be controlled during degradation by means of a proper selection of the diameter of the constituent fibers and the distances between them. This idea is based on the size-effect on degradation of aliphatic polyesters. The presented computer tool combines a genetic algorithm and a diffusion-reaction model of polymer hydrolytic degradation. In particular, we show how to design the architecture of scaffolds made of poly(DL-lactide-co-glycolide) with the required Young's modulus change during hydrolytic degradation.
Could running experience on SPMD computers contribute to the architectural choices for future dedicated computers for high energy physics simulation

International Nuclear Information System (INIS)

Jejcic, A.; Maillard, J.; Silva, J.; Auguin, M.; Boeri, F.

1989-01-01

Results obtained on a strongly coupled parallel computer are reported. They concern Monte-Carlo simulation and pattern recognition. Though the calculations were made on an experimental computer of rather low processing power, it is believed that the quoted figures could give useful indications on architectural choices for dedicated computers
Contributing to the design of run-time systems dedicated to high performance computing

International Nuclear Information System (INIS)

Perache, M.

2006-10-01

In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment that allows efficient parallel programs to be designed on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)
Raexplore: Enabling Rapid, Automated Architecture Exploration for Full Applications

Energy Technology Data Exchange (ETDEWEB)

Zhang, Yao [Argonne National Lab. (ANL), Argonne, IL (United States); Balaprakash, Prasanna [Argonne National Lab. (ANL), Argonne, IL (United States); Meng, Jiayuan [Argonne National Lab. (ANL), Argonne, IL (United States); Morozov, Vitali [Argonne National Lab. (ANL), Argonne, IL (United States); Parker, Scott [Argonne National Lab. (ANL), Argonne, IL (United States); Kumaran, Kalyan [Argonne National Lab. (ANL), Argonne, IL (United States)

2014-12-01

We present Raexplore, a performance modeling framework for architecture exploration. Raexplore enables rapid, automated, and systematic search of the architecture design space by combining hardware counter-based performance characterization and analytical performance modeling. We demonstrate Raexplore for two recent manycore processors, the IBM Blue Gene/Q compute chip and the Intel Xeon Phi, targeting a set of scientific applications. Our framework is able to capture complex interactions between architectural components including instruction pipeline, cache, and memory, and to achieve a 3–22% error for same-architecture and cross-architecture performance predictions. Furthermore, we apply our framework to assess the two processors, and discover and evaluate a list of architectural scaling options for future processor designs.
Optimizing Engineering Tools Using Modern Ground Architectures

Science.gov (United States)

2017-12-01

ENGINEERING TOOLS USING MODERN GROUND ARCHITECTURES by Ryan P. McArdle December 2017 Thesis Advisor: Marc Peters Co-Advisor: I.M. Ross...Master’s thesis 4. TITLE AND SUBTITLE OPTIMIZING ENGINEERING TOOLS USING MODERN GROUND ARCHITECTURES 5. FUNDING NUMBERS 6. AUTHOR(S) Ryan P. McArdle 7... engineering tools. First, the effectiveness of MathWorks’ Parallel Computing Toolkit is assessed when performing somewhat basic computations in
Earth Science Computational Architecture for Multi-disciplinary Investigations

Science.gov (United States)

Parker, J. W.; Blom, R.; Gurrola, E.; Katz, D.; Lyzenga, G.; Norton, C.

2005-12-01

Understanding the processes underlying Earth's deformation and mass transport requires a non-traditional, integrated, interdisciplinary approach dependent on multiple space and ground based data sets, modeling, and computational tools. Currently, details of geophysical data acquisition, analysis, and modeling largely limit research to discipline domain experts. Interdisciplinary research requires a new computational architecture that is optimized to perform complex data processing of multiple solid Earth science data types in a user-friendly environment. A web-based computational framework is being developed and integrated with applications for automatic interferometric radar processing, and models for high-resolution deformation & gravity, forward models of viscoelastic mass loading over short wavelengths & complex time histories, forward-inverse codes for characterizing surface loading-response over time scales of days to tens of thousands of years, and inversion of combined space magnetic & gravity fields to constrain deep crustal and mantle properties. This framework combines an adaptation of the QuakeSim distributed services methodology with the Pyre framework for multiphysics development. The system uses a three-tier architecture, with a middle tier server that manages user projects, available resources, and security. This ensures scalability to very large networks of collaborators. Users log into a web page and have a personal project area, persistently maintained between connections, for each application. Upon selection of an application and host from a list of available entities, inputs may be uploaded or constructed from web forms and available data archives, including gravity, GPS and imaging radar data. The user is notified of job completion and directed to results posted via URLs. Interdisciplinary work is supported through easy availability of all applications via common browsers, application tutorials and reference guides, and worked examples with
Could running experience on SPMD computers contribute to the architectural choices for future dedicated computers for high energy physics simulation?

International Nuclear Information System (INIS)

Jejcic, A.; Maillard, J.; Silva, J.; Auguin, M.; Boeri, F.

1989-01-01

Results obtained on a strongly coupled parallel computer are reported. They concern Monte-Carlo simulation and pattern recognition. Though the calculations were made on an experimental computer of rather low processing power, it is believed that the quoted figures could give useful indications on architectural choices for dedicated computers. (orig.)
A hybrid optical switch architecture to integrate IP into optical networks to provide flexible and intelligent bandwidth on demand for cloud computing

Science.gov (United States)

Yang, Wei; Hall, Trevor J.

2013-12-01

The Internet is entering an era of cloud computing to provide more cost effective, eco-friendly and reliable services to consumer and business users. As a consequence, the nature of the Internet traffic has been fundamentally transformed from a pure packet-based pattern to today's predominantly flow-based pattern. Cloud computing has also brought about an unprecedented growth in the Internet traffic. In this paper, a hybrid optical switch architecture is presented to deal with the flow-based Internet traffic, aiming to offer flexible and intelligent bandwidth on demand to improve fiber capacity utilization. The hybrid optical switch is capable of integrating IP into optical networks for cloud-based traffic with predictable performance, for which the delay performance of the electronic module in the hybrid optical switch architecture is evaluated through simulation.
A Project-Based Learning Approach to Programmable Logic Design and Computer Architecture

Science.gov (United States)

Kellett, C. M.

2012-01-01

This paper describes a course in programmable logic design and computer architecture as it is taught at the University of Newcastle, Australia. The course is designed around a major design project and has two supplemental assessment tasks that are also described. The context of the Computer Engineering degree program within which the course is…
Molecular architectures based on π-conjugated block copolymers for global quantum computation

International Nuclear Information System (INIS)

Mujica Martinez, C A; Arce, J C; Reina, J H; Thorwart, M

2009-01-01

We propose a molecular setup for the physical implementation of a barrier global quantum computation scheme based on the electron-doped π-conjugated copolymer architecture of nine blocks PPP-PDA-PPP-PA-(CCH-acene)-PA-PPP-PDA-PPP (where each block is an oligomer). The physical carriers of information are electrons coupled through the Coulomb interaction, and the building block of the computing architecture is composed of three adjacent qubit systems in a quasi-linear arrangement, each of them allowing qubit storage, but with the central qubit exhibiting a third accessible state of electronic energy far away from that of the qubits' transition energy. The third state is reached from one of the computational states by means of an on-resonance coherent laser field, and acts as a barrier mechanism for the direct control of qubit entanglement. Initial estimations of the spontaneous emission decay rates associated with the energy level structure allow us to compute a damping rate of order 10^-7 s, which suggests a not so strong coupling to the environment. Our results offer an all-optical, scalable proposal for global quantum computing based on semiconducting π-conjugated polymers.
Molecular architectures based on pi-conjugated block copolymers for global quantum computation

Energy Technology Data Exchange (ETDEWEB)

Mujica Martinez, C A; Arce, J C [Universidad del Valle, Departamento de Quimica, A. A. 25360, Cali (Colombia); Reina, J H [Universidad del Valle, Departamento de Fisica, A. A. 25360, Cali (Colombia); Thorwart, M, E-mail: camujica@univalle.edu.c, E-mail: j.reina-estupinan@physics.ox.ac.u, E-mail: jularce@univalle.edu.c [Institut fuer Theoretische Physik IV, Heinrich-Heine-Universitaet Duesseldorf, 40225 Duesseldorf (Germany)

2009-05-01

We propose a molecular setup for the physical implementation of a barrier global quantum computation scheme based on the electron-doped pi-conjugated copolymer architecture of nine blocks PPP-PDA-PPP-PA-(CCH-acene)-PA-PPP-PDA-PPP (where each block is an oligomer). The physical carriers of information are electrons coupled through the Coulomb interaction, and the building block of the computing architecture is composed of three adjacent qubit systems in a quasi-linear arrangement, each of them allowing qubit storage, but with the central qubit exhibiting a third accessible state of electronic energy far away from that of the qubits' transition energy. The third state is reached from one of the computational states by means of an on-resonance coherent laser field, and acts as a barrier mechanism for the direct control of qubit entanglement. Initial estimations of the spontaneous emission decay rates associated with the energy level structure allow us to compute a damping rate of order 10^-7 s, which suggests a not so strong coupling to the environment. Our results offer an all-optical, scalable proposal for global quantum computing based on semiconducting pi-conjugated polymers.
Client-server computer architecture saves costs and eliminates bottlenecks

International Nuclear Information System (INIS)

Darukhanavala, P.P.; Davidson, M.C.; Tyler, T.N.; Blaskovich, F.T.; Smith, C.

1992-01-01

This paper reports that workstation, client-server architecture saved costs and eliminated bottlenecks that BP Exploration (Alaska) Inc. experienced with mainframe computer systems. In 1991, BP embarked on an ambitious project to change technical computing for its Prudhoe Bay, Endicott, and Kuparuk operations on Alaska's North Slope. This project promised substantial rewards, but also involved considerable risk. The project plan called for reservoir simulations (which historically had run on a Cray Research Inc. X-MP supercomputer in the company's Houston data center) to be run on small computer workstations. Additionally, large Prudhoe Bay, Endicott, and Kuparuk production and reservoir engineering data bases and related applications also would be moved to workstations, replacing a Digital Equipment Corp. VAX cluster in Anchorage
High-performance reconfigurable hardware architecture for restricted Boltzmann machines.

Science.gov (United States)

Ly, Daniel Le; Chow, Paul

2010-11-01

Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.
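The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below derives the software baseline implied by the reported speedup and the number of full weight-matrix sweeps per second; the notion of a "sweep" (one update of every connection in the 256 × 256 network) is an assumption made here for illustration and is not a term from the abstract.

```python
# Back-of-the-envelope check of the throughput figures quoted in the abstract.
# "Sweep" = one update of every connection; this definition is an assumption.

def implied_baseline_cups(accelerated_cups: float, speedup: float) -> float:
    """Connection-updates-per-second of the reference program implied by a speedup."""
    return accelerated_cups / speedup

fpga_cups = 3.13e9          # reported: 3.13 billion connection-updates per second
speedup = 145               # reported: 145-fold over the optimized C program
connections = 256 * 256     # reported network size: 256 x 256 nodes

baseline_cups = implied_baseline_cups(fpga_cups, speedup)
sweeps_per_second = fpga_cups / connections

print(f"implied software baseline: {baseline_cups / 1e6:.1f} million updates/s")
print(f"full sweeps per second on the FPGA platform: {sweeps_per_second:.0f}")
```

Under these assumptions the C baseline works out to roughly 21.6 million connection updates per second.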
Efficient Architecture for Spike Sorting in Reconfigurable Hardware

Science.gov (United States)

Hwang, Wen-Jyi; Lee, Wei-Hao; Lin, Shiow-Jyu; Lai, Sheng-Ying

2013-01-01

This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both the feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The employment of GHA allows efficient computation of principal components for subsequent clustering operations. The FCM is able to achieve near optimal clustering for spike sorting. Its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computations of different weight vectors share the same circuit to lower the area costs. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented on a field programmable gate array (FPGA). It is embedded in a System-on-Chip (SOC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining a high classification correct rate and high speed computation. PMID:24189331
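For readers unfamiliar with fuzzy C-means, the clustering step mentioned above can be sketched in software. The following is a minimal textbook FCM iteration (membership update followed by centroid update), not the merged single-pass hardware update the paper describes; the function name `fcm_step`, the fuzzifier default `m=2.0`, and the toy data are assumptions for illustration.

```python
import math

def fcm_step(points, centers, m=2.0):
    """One textbook fuzzy C-means iteration: memberships first, then centroids."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # A point lying exactly on a center belongs fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
        memberships.append(row)

    dim = len(points[0])
    new_centers = []
    for i in range(len(centers)):
        # Each centroid is the mean of all points weighted by u^m.
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        new_centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)))
    return memberships, new_centers

# Toy 2-D "features" forming two well-separated groups (hypothetical data).
features = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers = [(1.0, 1.0), (9.0, 9.0)]
for _ in range(10):
    memberships, centers = fcm_step(features, centers)
```

The two-pass form stores the whole membership matrix between passes, which is the storage cost the abstract says the hardware design avoids by merging the updates.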
Efficient Architecture for Spike Sorting in Reconfigurable Hardware

Directory of Open Access Journals (Sweden)

Sheng-Ying Lai

2013-11-01

This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both the feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The employment of GHA allows efficient computation of principal components for subsequent clustering operations. The FCM is able to achieve near optimal clustering for spike sorting. Its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computations of different weight vectors share the same circuit to lower the area costs. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to avoid the large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented on a field programmable gate array (FPGA). It is embedded in a System-on-Chip (SOC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining a high classification correct rate and high speed computation.
METRIC context unit architecture

Energy Technology Data Exchange (ETDEWEB)

Simpson, R.O.

1988-01-01

METRIC is an architecture for a simple but powerful Reduced Instruction Set Computer (RISC). Its speed comes from the simultaneous processing of several instruction streams, with instructions from the various streams being dispatched into METRIC's execution pipeline as they become available for execution. The pipeline is thus kept full, with a mix of instructions for several contexts in execution at the same time. True parallel programming is supported within a single execution unit, the METRIC Context Unit. METRIC's architecture provides for expansion through the addition of multiple Context Units and of specialized Functional Units. The architecture thus spans a range of size and performance from a single-chip microcomputer up through large and powerful multiprocessors. This research concentrates on the specification of the METRIC Context Unit at the architectural level. Performance tradeoffs made during METRIC's design are discussed, and projections of METRIC's performance are made based on simulation studies.
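The idea of keeping a pipeline full by interleaving several instruction streams can be illustrated with a toy simulation. This is a sketch under stated assumptions, not METRIC's actual dispatch logic: one instruction issues per cycle, contexts are served round-robin, and a context may not issue again until `latency` cycles after its previous issue (a crude stand-in for dependences within a stream). The function name `simulate` and all parameters are hypothetical.

```python
from collections import deque

def simulate(streams, latency):
    """Issue at most one instruction per cycle, rotating among ready contexts.

    Returns the list of (cycle, context, instruction) issues and the total
    number of cycles elapsed until every stream has been drained.
    """
    pending = [deque(s) for s in streams]
    ready_at = [0] * len(streams)   # earliest cycle each context may issue
    issued = []
    cycle = 0
    start = 0                       # round-robin starting point
    while any(pending):
        for offset in range(len(streams)):
            ctx = (start + offset) % len(streams)
            if pending[ctx] and ready_at[ctx] <= cycle:
                issued.append((cycle, ctx, pending[ctx].popleft()))
                ready_at[ctx] = cycle + latency
                start = ctx + 1
                break
        cycle += 1
    return issued, cycle

# One context alone leaves most issue slots empty...
single, single_cycles = simulate([["i0", "i1", "i2", "i3"]], latency=4)
# ...while four contexts fill every slot.
multi, multi_cycles = simulate([["i0", "i1", "i2", "i3"]] * 4, latency=4)

print(len(single) / single_cycles)  # 4 issues in 13 cycles
print(len(multi) / multi_cycles)    # 16 issues in 16 cycles
```

With a latency of four cycles, a single stream uses 4 of 13 issue slots, whereas four interleaved streams use all 16 of 16.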
Silicon CMOS architecture for a spin-based quantum computer.

Science.gov (United States)

Veldhorst, M; Eenink, H G J; Yang, C H; Dzurak, A S

2017-12-15

Recent advances in quantum error correction codes for fault-tolerant quantum computing and physical realizations of high-fidelity qubits in multiple platforms give promise for the construction of a quantum computer based on millions of interacting qubits. However, the classical-quantum interface remains a nascent field of exploration. Here, we propose an architecture for a silicon-based quantum computer processor based on complementary metal-oxide-semiconductor (CMOS) technology. We show how a transistor-based control circuit together with charge-storage electrodes can be used to operate a dense and scalable two-dimensional qubit system. The qubits are defined by the spin state of a single electron confined in quantum dots, coupled via exchange interactions, controlled using a microwave cavity, and measured via gate-based dispersive readout. We implement a spin qubit surface code, showing the prospects for universal quantum computation. We discuss the challenges and focus areas that need to be addressed, providing a path for large-scale quantum computing.

Scaling to Nanotechnology Limits with the PIMS Computer Architecture and a new Scaling Rule

Energy Technology Data Exchange (ETDEWEB)

Debenedictis, Erik P. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2015-02-01

We describe a new approach to computing that moves towards the limits of nanotechnology using a newly formulated scaling rule. This is in contrast to the current computer industry scaling away from von Neumann's original computer at the rate of Moore's Law. We extend Moore's Law to 3D, which leads generally to architectures that integrate logic and memory. To keep power dissipation constant through a 2D surface of the 3D structure requires using adiabatic principles. We call our newly proposed architecture Processor In Memory and Storage (PIMS). We propose a new computational model that integrates processing and memory into "tiles" that comprise logic, memory/storage, and communications functions. Since the programming model will be relatively stable as a system scales, programs represented by tiles could be executed in a PIMS system built with today's technology or could become the "schematic diagram" for implementation in an ultimate 3D nanotechnology of the future. We build a systems software approach that offers advantages over and above the technological and architectural advantages. First, the algorithms may be more efficient in the conventional sense of having fewer steps. Second, the algorithms may run with higher power efficiency per operation by being a better match for the adiabatic scaling rule. The performance analysis based on demonstrated ideas in physical science suggests 80,000x improvement in cost per operation for the (arguably) general purpose function of emulating neurons in Deep Learning.
Factoring symmetric indefinite matrices on high-performance architectures

Science.gov (United States)

Jones, Mark T.; Patrick, Merrell L.

1990-01-01

The Bunch-Kaufman algorithm is the method of choice for factoring symmetric indefinite matrices in many applications. However, the Bunch-Kaufman algorithm does not take advantage of high-performance architectures such as the Cray Y-MP. Three new algorithms, based on Bunch-Kaufman factorization, that take advantage of such architectures are described. Results from an implementation of the third algorithm are presented.
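As background, the pivot-selection rule of the standard Bunch-Kaufman partial-pivoting scheme can be sketched as follows. This is the classical textbook rule applied to the leading column of a dense symmetric matrix, not any of the three new algorithms the abstract refers to; the function name `choose_pivot` and the list-of-lists matrix representation are assumptions for illustration.

```python
import math

# Classical threshold that bounds element growth for both pivot sizes.
ALPHA = (1 + math.sqrt(17)) / 8

def choose_pivot(a):
    """Return (pivot_size, r) for the first elimination step on symmetric `a`.

    pivot_size is 1 or 2; r is the index of the row/column to interchange
    with (for size 1) or to pair with row/column 0 (for size 2).
    """
    n = len(a)
    if n == 1:
        return (1, 0)
    # Largest off-diagonal magnitude in column 0, and where it occurs.
    r = max(range(1, n), key=lambda i: abs(a[i][0]))
    lam = abs(a[r][0])
    if lam == 0.0 or abs(a[0][0]) >= ALPHA * lam:
        return (1, 0)                      # diagonal entry is large enough
    # Largest off-diagonal magnitude in column r.
    sigma = max(abs(a[i][r]) for i in range(n) if i != r)
    if abs(a[0][0]) * sigma >= ALPHA * lam * lam:
        return (1, 0)                      # still safe as a 1x1 pivot
    if abs(a[r][r]) >= ALPHA * sigma:
        return (1, r)                      # swap in a_rr as a 1x1 pivot
    return (2, r)                          # use the 2x2 block from rows 0 and r

print(choose_pivot([[4.0, 1.0], [1.0, 3.0]]))   # (1, 0)
print(choose_pivot([[0.1, 1.0], [1.0, 5.0]]))   # (1, 1)
print(choose_pivot([[0.0, 1.0], [1.0, 0.0]]))   # (2, 1)
```

The rule scans at most two columns per step, which keeps pivoting cheap but makes the data access pattern depend on the matrix values.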
ABINIT: Plane-Wave-Based Density-Functional Theory on High Performance Computers

Science.gov (United States)

Torrent, Marc

2014-03-01

For several years, a continuous effort has been made to adapt electronic structure codes based on Density-Functional Theory to future computing architectures. Among these codes, ABINIT is based on a plane-wave description of the wave functions, which allows systems of any kind to be treated. Porting such a code to petascale architectures poses difficulties related to the many-body nature of the DFT equations. To improve the performance of ABINIT - especially for standard LDA/GGA ground-state and response-function calculations - several strategies have been followed: A full multi-level MPI parallelisation scheme has been implemented, exploiting all possible levels and distributing both computation and memory. It allows the number of distributed processes to be increased and could not have been achieved without a strong restructuring of the code. The core algorithm used to solve the eigenproblem ("Locally Optimal Blocked Conjugate Gradient"), a Blocked-Davidson-like algorithm, is based on a distribution of processes combining plane-waves and bands. In addition to the distributed memory parallelization, a full hybrid scheme has been implemented, using standard shared-memory directives (OpenMP/OpenACC) or porting some time-consuming code sections to Graphics Processing Units (GPU). As no simple performance model exists, the complexity of use has been increased; the code efficiency strongly depends on the distribution of processes among the numerous levels. ABINIT is able to predict the performance of several process distributions and automatically choose the most favourable one. On the other hand, a big effort has been made to analyse the performance of the code on petascale architectures, showing which sections of code have to be improved; they are all related to Matrix Algebra (diagonalisation, orthogonalisation). The different strategies employed to improve the code scalability will be described. They are based on an exploration of new diagonalization
A supportive architecture for CFD-based design optimisation

Science.gov (United States)

Li, Ni; Su, Zeya; Bi, Zhuming; Tian, Chao; Ren, Zhiming; Gong, Guanghong

2014-03-01

Multi-disciplinary design optimisation (MDO) is one of the critical methodologies for the implementation of enterprise systems (ES). MDO requiring the analysis of fluid dynamics raises a special challenge due to its extremely intensive computation. The rapid development of computational fluid dynamics (CFD) techniques has caused a rise in their applications in various fields. Especially for the exterior designs of vehicles, CFD has become one of the three main design tools, comparable to analytical approaches and wind tunnel experiments. CFD-based design optimisation is an effective way to achieve the desired performance under the given constraints. However, due to the complexity of CFD, integrating CFD analysis into an intelligent optimisation algorithm is not straightforward. It is a challenge to solve a CFD-based design problem, which usually has high dimensionality and multiple objectives and constraints. It is desirable to have an integrated architecture for CFD-based design optimisation. However, our review of existing work has found that very few researchers have studied assistive tools to facilitate CFD-based design optimisation. In this paper, a multi-layer architecture and a general procedure are proposed to integrate different CFD toolsets with intelligent optimisation algorithms, parallel computing techniques and other techniques for efficient computation. In the proposed architecture, the integration is performed either at the code level or data level to fully utilise the capabilities of different assistive tools. Two intelligent algorithms are developed and embedded with parallel computing. These algorithms, together with the supportive architecture, lay a solid foundation for various applications of CFD-based design optimisation. To illustrate the effectiveness of the proposed architecture and algorithms, the case studies on aerodynamic shape design of a hypersonic cruising vehicle are provided, and the result has shown that the proposed architecture
Design and development of a run-time monitor for multi-core architectures in cloud computing.

Science.gov (United States)

Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

2011-01-01

Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.
Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

Directory of Open Access Journals (Sweden)

Junghoon Lee

2011-03-01

Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM) which is a system software to monitor the application behavior at run-time, analyze the collected information, and optimize cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation as well as underlying hardware through a performance counter optimizing its computing configuration based on the analyzed data.
Reconfigurable FPGA architecture for computer vision applications in Smart Camera Networks

OpenAIRE

Maggiani, Luca; Salvadori, Claudio; Petracca, Matteo; Pagano, Paolo; Saletti, Roberto

2013-01-01

Smart Camera Networks (SCNs) are nowadays an emerging research field which represents the natural evolution of centralized computer vision applications towards fully distributed and pervasive systems. In such a scenario, one of the biggest efforts is in the definition of a flexible and reconfigurable SCN node architecture able to remotely support the possibility of updating the application parameters and changing the running computer vision applications at run-time. In th...
Optimized Architectural Approaches in Hardware and Software Enabling Very High Performance Shared Storage Systems

CERN Multimedia

CERN. Geneva

2004-01-01

There are issues encountered in high performance storage systems that normally lead to compromises in architecture. Compute clusters tend to have compute phases followed by an I/O phase that must move data from the entire cluster in one operation. That data may then be shared by a large number of clients creating unpredictable read and write patterns. In some cases the aggregate performance of a server cluster must exceed 100 GB/s to minimize the time required for the I/O cycle thus maximizing compute availability. Accessing the same content from multiple points in a shared file system leads to the classical problems of data "hot spots" on the disk drive side and access collisions on the data connectivity side. The traditional method for increasing apparent bandwidth usually includes data replication which is costly in both storage and management. Scaling a model that includes replicated data presents additional management challenges as capacity and bandwidth expand asymmetrically while the system is scaled. ...
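The link between aggregate bandwidth and compute availability can be made concrete with simple arithmetic. The sketch below is illustrative only: the 100 TB checkpoint size and the one-hour compute phase are hypothetical numbers chosen for the example, and only the 100 GB/s figure comes from the abstract.

```python
# Illustrative arithmetic: how the I/O phase length affects compute availability.
# Checkpoint size and compute-phase length are hypothetical; 100 GB/s is the
# aggregate figure mentioned in the abstract.

def io_seconds(bytes_moved: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move the data at a sustained aggregate bandwidth."""
    return bytes_moved / bandwidth_bytes_per_s

def compute_availability(compute_s: float, io_s: float) -> float:
    """Fraction of each compute-plus-I/O cycle spent computing."""
    return compute_s / (compute_s + io_s)

TB = 1e12
GB = 1e9
checkpoint = 100 * TB        # hypothetical data moved per I/O phase
compute_phase = 3600.0       # hypothetical compute phase, in seconds

fast_io = io_seconds(checkpoint, 100 * GB)   # 1000 s
slow_io = io_seconds(checkpoint, 10 * GB)    # 10000 s

print(f"100 GB/s: {compute_availability(compute_phase, fast_io):.0%} of time computing")
print(f" 10 GB/s: {compute_availability(compute_phase, slow_io):.0%} of time computing")
```

Under these assumed numbers, a tenfold drop in aggregate bandwidth cuts the share of time spent computing from about 78% to about 26%.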
InfoMall: An Innovative Strategy for High-Performance Computing and Communications Applications Development.

Science.gov (United States)

Mills, Kim; Fox, Geoffrey

1994-01-01

Describes the InfoMall, a program led by the Northeast Parallel Architectures Center (NPAC) at Syracuse University (New York). The InfoMall features a partnership of approximately 24 organizations offering linked programs in High Performance Computing and Communications (HPCC) technology integration, software development, marketing, education and…
Usage of Thin-Client/Server Architecture in Computer Aided Education

Science.gov (United States)

Cimen, Caghan; Kavurucu, Yusuf; Aydin, Halit

2014-01-01

With the advances of technology, thin-client/server architecture has become popular in multi-user/single network environments. Thin-client is a user terminal in which the user can login to a domain and run programs by connecting to a remote server. Recent developments in network and hardware technologies (cloud computing, virtualization, etc.)…
The NILE system architecture: fault-tolerant, wide-area access to computing and data resources

International Nuclear Information System (INIS)

Ricciardi, Aleta; Ogg, Michael; Rothfus, Eric

1996-01-01

NILE is a multi-disciplinary project building a distributed computing environment for HEP. It provides wide-area, fault-tolerant, integrated access to processing and data resources for collaborators of the CLEO experiment, though the goals and principles are applicable to many domains. NILE has three main objectives: a realistic distributed system architecture design, the design of a robust data model, and a Fast-Track implementation providing a prototype design environment which will also be used by CLEO physicists. This paper focuses on the software and wide-area system architecture design and the computing issues involved in making NILE services highly-available. (author)
Computer Assessed Design – A Vehicle of Architectural Communication and a Design Tool

OpenAIRE

Petrovici, Liliana-Mihaela

2012-01-01

In comparison with the limits of the traditional representation tools, the development of the computer graphics constitutes an opportunity to assert architectural values. The differences between communication codes of the architects and public are diminished; the architectural ideas can be represented in a coherent, intelligible and attractive way, so that they get more chances to be materialized according to the thinking of the creator. Concurrently, the graphic software have been improving ...
Architectures, Concepts and Technologies for Service Oriented Computing : proceedings of the 1st International Workshop - ACT4SOC 2007

NARCIS (Netherlands)

van Sinderen, Marten J.; Unknown, [Unknown

2007-01-01

This volume contains the proceedings of the First International Workshop on Architectures, Concepts and Technologies for Service Oriented Computing (ACT4SOC 2007), held on July 22 in Barcelona, Spain, in conjunction with the Second International Conference on Software and Data Technologies (ICSOFT
An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

Energy Technology Data Exchange (ETDEWEB)

Kirk, B.L. [Oak Ridge National Lab., TN (United States); Sartori, E. [OCDE/OECD NEA Data Bank, Issy-les-Moulineaux (France); Viedma, L.G. de [Consejo de Seguridad Nuclear, Madrid (Spain)

1997-06-01

Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management.
An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

International Nuclear Information System (INIS)

Kirk, B.L.; Sartori, E.; Viedma, L.G. de

1997-01-01

Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management
A Methodology for Making Early Comparative Architecture Performance Evaluations

Science.gov (United States)

Doyle, Gerald S.

2010-01-01

Complex and expensive systems' development suffers from a lack of method for making good system-architecture-selection decisions early in the development process. Failure to make a good system-architecture-selection decision increases the risk that a development effort will not meet cost, performance and schedule goals. This research provides a…
A high performance architecture for accelerator controls

International Nuclear Information System (INIS)

Allen, M.; Hunt, S.M.; Lue, H.; Saltmarsh, C.G.; Parker, C.R.C.B.

1991-01-01

The demands placed on the Superconducting Super Collider (SSC) control system due to large distances, high bandwidth and fast response time required for operation will require a fresh approach to the data communications architecture of the accelerator. The prototype design effort aims at providing deterministic communication across the accelerator complex with a response time of < 100 ms and total bandwidth of 2 Gbits/sec. It will offer a consistent interface for a large number of equipment types, from vacuum pumps to beam position monitors, providing appropriate communications performance for each equipment type. It will consist of highly parallel links to all equipment: those with computing resources, non-intelligent direct control interfaces, and data concentrators. This system will give each piece of equipment a dedicated link of fixed bandwidth to the control system. Application programs will have access to all accelerator devices which will be memory mapped into a global virtual addressing scheme. Links to devices in the same geographical area will be multiplexed using commercial Time Division Multiplexing equipment. Low-level access will use reflective memory techniques, eliminating processing overhead and complexity of traditional data communication protocols. The use of commercial standards and equipment will enable a high performance system to be built at low cost
A high performance architecture for accelerator controls

International Nuclear Information System (INIS)

Allen, M.; Hunt, S.M.; Lue, H.; Saltmarsh, C.G.; Parker, C.R.C.B.

1991-03-01

The demands placed on the Superconducting Super Collider (SSC) control system due to large distances, high bandwidth and fast response time required for operation will require a fresh approach to the data communications architecture of the accelerator. The prototype design effort aims at providing deterministic communication across the accelerator complex with a response time of <100 ms and total bandwidth of 2 Gbits/sec. It will offer a consistent interface for a large number of equipment types, from vacuum pumps to beam position monitors, providing appropriate communications performance for each equipment type. It will consist of highly parallel links to all equipment: those with computing resources, non-intelligent direct control interfaces, and data concentrators. This system will give each piece of equipment a dedicated link of fixed bandwidth to the control system. Application programs will have access to all accelerator devices which will be memory mapped into a global virtual addressing scheme. Links to devices in the same geographical area will be multiplexed using commercial Time Division Multiplexing equipment. Low-level access will use reflective memory techniques, eliminating processing overhead and complexity of traditional data communication protocols. The use of commercial standards and equipment will enable a high performance system to be built at low cost. 1 fig
Methodology of modeling and measuring computer architectures for plasma simulations

Science.gov (United States)

Wang, L. P. T.

1977-01-01

A brief introduction to plasma simulation using computers and the difficulties on currently available computers is given. Through the use of an analyzing and measuring methodology - SARA, the control flow and data flow of a particle simulation model REM2-1/2D are exemplified. After recursive refinements the total execution time may be greatly shortened and a fully parallel data flow can be obtained. From this data flow, a matched computer architecture or organization could be configured to achieve the computation bound of an application problem. A sequential type simulation model, an array/pipeline type simulation model, and a fully parallel simulation model of a code REM2-1/2D are proposed and analyzed. This methodology can be applied to other application problems which have implicitly parallel nature.
Computation, architectural design and fabrication logic

DEFF Research Database (Denmark)

Larsen, Niels Martin

2016-01-01

Digital fabrication and digital form generation can change the way different professions interact in relation to the development and construction of architecture. The technologies can provide a more integrated design process and expand the architectural vocabulary. At Aarhus School of Architectur...

Performance Aided Design

DEFF Research Database (Denmark)

Parigi, Dario

2014-01-01

paradigm where the increasing integration of parametric tools and performative analysis is changing the way we learn and design. The term Performance Aided Architectural Design (PAD) is proposed at the Master of Science of Architecture and Design at Aalborg University, with the aim of extending a tectonic...... tradition of architecture with computational tools, preparing the basis for the creation of the figure of a modern master builder, sitting at the boundary of the disciplines of architecture and engineering. Performance Aided Design focuses on the role of performative analysis, embedded tectonics......, and computational methods and tools to trigger creativity and innovative understanding of the relation between form, material and an increasingly wide range of performances in architectural design. The ultimate goal is to pursue a design approach that aims at embracing rather than excluding the complexity implicit...
Simulation of electronic structure Hamiltonians in a superconducting quantum computer architecture

Energy Technology Data Exchange (ETDEWEB)

Kaicher, Michael; Wilhelm, Frank K. [Theoretical Physics, Saarland University, 66123 Saarbruecken (Germany); Love, Peter J. [Department of Physics, Haverford College, Haverford, Pennsylvania 19041 (United States)

2015-07-01

Quantum chemistry has become one of the most promising applications within the field of quantum computation. Simulating the electronic structure Hamiltonian (ESH) in the Bravyi-Kitaev (BK)-Basis to compute the ground state energies of atoms/molecules reduces the number of qubit operations needed to simulate a single fermionic operation to O(log(n)) as compared to O(n) in the Jordan-Wigner-Transformation. In this work we will present the details of the BK-Transformation, show an example of implementation in a superconducting quantum computer architecture and compare it to the most recent quantum chemistry algorithms suggesting a constant overhead.
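As a rough illustration of the O(n) cost quoted above for the Jordan-Wigner transformation, the sketch below writes the lowering operator a_j as Pauli strings, a_j = Z_0 ... Z_{j-1} (X_j + iY_j)/2, and counts how many qubits each string touches. The function names are illustrative and not taken from the paper, the sign and occupation convention (|1> = occupied) is an assumption, and the Bravyi-Kitaev encoding itself is not implemented here.

```python
def jw_lowering_terms(j, n):
    """Pauli-string expansion of the lowering operator a_j on n qubits
    under the Jordan-Wigner mapping (assumed convention: |1> = occupied)."""
    if not 0 <= j < n:
        raise ValueError("mode index out of range")
    prefix = "Z" * j            # parity string on all lower-indexed modes
    suffix = "I" * (n - j - 1)  # identity on all higher-indexed modes
    return [(0.5, prefix + "X" + suffix), (0.5j, prefix + "Y" + suffix)]


def pauli_weight(pauli_string):
    """Number of qubits on which the string acts non-trivially."""
    return sum(ch != "I" for ch in pauli_string)


def worst_case_jw_weight(n):
    """Largest Pauli weight over all modes; grows linearly with n."""
    return max(pauli_weight(p)
               for j in range(n)
               for _, p in jw_lowering_terms(j, n))


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, worst_case_jw_weight(n))
```

The worst-case weight equals n because the operator on the last mode carries a Z on every preceding qubit; the abstract's point is that the Bravyi-Kitaev basis reduces this to O(log(n)).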
A Computational Architecture for Programmable Automation Research

Science.gov (United States)

Taylor, Russell H.; Korein, James U.; Maier, Georg E.; Durfee, Lawrence F.

1987-03-01

This short paper describes recent work at the IBM T. J. Watson Research Center directed at developing a highly flexible computational architecture for research on sensor-based programmable automation. The system described here has been designed with a focus on dynamic configurability, layered user interfaces and incorporation of sensor-based real time operations into new commands. It is these features which distinguish it from earlier work. The system is currently being implemented at IBM for research purposes and internal use and is an outgrowth of programmable automation research which has been ongoing since 1972 [e.g., 1, 2, 3, 4, 5, 6].
FY1995 study of design methodology and environment of high-performance processor architectures; 1995 nendo koseino processor architecture sekkeiho to sekkei kankyo no kenkyu

Energy Technology Data Exchange (ETDEWEB)

NONE

1997-03-01

The aim of our project is to develop high-performance processor architectures for both general-purpose and application-specific use. We also plan to develop basic software, such as compilers, and various design aid tools for those architectures. We are particularly interested in performance evaluation at the architecture design phase, design optimization, automatic generation of compilers from processor designs, and architecture design methodologies combined with circuit layout. We have investigated both microprocessor architectures and design methodologies / environments for the processors. Our goal is to establish design technologies for high-performance, low-power, low-cost and highly-reliable systems in the system-on-silicon era. We have proposed the PPRAM architecture for high-performance systems using DRAM and logic mixture technology, the Softcore processor architecture for special purpose processors in embedded systems, and the Power-Pro architecture for low power systems. We also developed design methodologies and design environments for the above architectures as well as a new method for design verification of microprocessors. (NEDO)
Computer Security Primer: Systems Architecture, Special Ontology and Cloud Virtual Machines

Science.gov (United States)

Waguespack, Leslie J.

2014-01-01

With the increasing proliferation of multitasking and Internet-connected devices, security has reemerged as a fundamental design concern in information systems. The shift of IS curricula toward a largely organizational perspective of security leaves little room for focus on its foundation in systems architecture, the computational underpinnings of…
FPGA hardware acceleration for high performance neutron transport computation based on agent methodology - 318

International Nuclear Information System (INIS)

Shanjie, Xiao; Tatjana, Jevremovic

2010-01-01

The accurate, detailed and 3D neutron transport analysis for Gen-IV reactors is still time-consuming regardless of the advanced computational hardware available in developed countries. This paper introduces a new concept for reducing the computational time while preserving detailed and accurate modeling: a specifically designed FPGA co-processor accelerates the robust AGENT methodology for complex reactor geometries. For the first time this approach is applied to accelerate the neutronics analysis. The AGENT methodology solves the neutron transport equation using the method of characteristics. The AGENT methodology performance was carefully analyzed before the hardware design based on the FPGA co-processor was adopted. The most time-consuming kernel is then transplanted into the FPGA co-processor. The FPGA co-processor is designed with a data flow-driven non-von Neumann architecture and has much higher efficiency than the conventional computer architecture. Details of the FPGA co-processor design are introduced and the design is benchmarked using two different examples. The advanced chip architecture helps the FPGA co-processor obtain a more than 20-fold speedup even though its working frequency is much lower than the CPU frequency. (authors)
Power efficient and high performance VLSI architecture for AES algorithm

Directory of Open Access Journals (Sweden)

K. Kalaiselvi

2015-09-01

The Advanced Encryption Standard (AES) algorithm has been widely deployed in cryptographic applications. This work proposes a low power and high throughput implementation of the AES algorithm using a key expansion approach. We minimize the power consumption and critical path delay using the proposed high performance architecture. It supports both encryption and decryption using 256-bit keys with a throughput of 0.06 Gbps. VHDL is utilized for simulating the design and an FPGA chip has been used for the hardware implementation. Experimental results reveal that the proposed AES architectures offer performance superior to the existing VLSI architectures in terms of power, throughput and critical path delay.
Parallel computing works

Energy Technology Data Exchange (ETDEWEB)

1991-10-23

An account of the Caltech Concurrent Computation Program (C³P), a five year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.
p88110: A Graphical Simulator for Computer Architecture and Organization Courses

Science.gov (United States)

Garcia, M. I.; Rodriguez, S.; Perez, A.; Garcia, A.

2009-01-01

Studying fundamental Computer Architecture and Organization topics requires a significant amount of practical work if students are to acquire a good grasp of the theoretical concepts presented in classroom lectures or textbooks. The use of simulators is commonly adopted in order to reach this objective. However, as most of the available…
Performative Environments

DEFF Research Database (Denmark)

Thomsen, Bo Stjerne

2008-01-01

The paper explores how performative architecture can act as a collective environment localizing urban flows and establishing public domains through the integration of pervasive computing and animation techniques. The NoRA project introduces the concept of 'performative environments,' focusing on ...... of local interactions and network behaviour, building becomes social infrastructure and prompts an understanding of architectural structures as quasi-objects, which can retain both variation and recognisability in changing social constellations.
SAME4HPC: A Promising Approach in Building a Scalable and Mobile Environment for High-Performance Computing

Energy Technology Data Exchange (ETDEWEB)

Karthik, Rajasekar [ORNL

2014-01-01

In this paper, an architecture for building a Scalable And Mobile Environment For High-Performance Computing with spatial capabilities, called SAME4HPC, is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide a high-performance and rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks are among the key open-source and industry-standard practices that have been adopted in this architecture.
ARCHITECTURE OF WEB BASED COMPUTER-AIDED MANUFACTURING SYSTEM

Directory of Open Access Journals (Sweden)

N. E. Filyukov

2014-09-01

The paper deals with the design of a web-based system for Computer-Aided Manufacturing (CAM). Remote applications and databases located in the "private cloud" are proposed as the basis of such a system. The suggested approach comprises a service-oriented architecture, the use of web applications and web services as modules, multi-agent technologies for implementing information exchange between the components of the system, and the use of a PDM system for managing technology projects within the CAM. The proposed architecture involves converting the CAM into a corporate information system that will provide coordinated functioning of subsystems based on a common information space, parallelize collective work on technology projects, and provide effective control of production planning. A system has been developed within this architecture that makes it rather simple to connect technological subsystems to the system and to implement their interaction. The system makes it possible to produce a CAM configuration for a particular company from the set of developed subsystems and databases, specifying appropriate access rights for employees of the company. The proposed approach simplifies maintenance of software and information support for CAM subsystems due to their central location in the data center. The results can be used as a basis for CAM design and testing within the learning process for development and modernization of the system algorithms, and then can be tested in the extended enterprise.
Modeling, analysis and optimization of network-on-chip communication architectures

CERN Document Server

Ogras, Umit Y

2013-01-01

Traditionally, design space exploration for Systems-on-Chip (SoCs) has focused on the computational aspects of the problem at hand. However, as the number of components on a single chip and their performance continue to increase, the communication architecture plays a major role in the area, performance and energy consumption of the overall system. As a result, a shift from computation-based to communication-based design becomes mandatory. Towards this end, network-on-chip (NoC) communication architectures have emerged recently as a promising alternative to classical bus and point-to-point communication architectures. This book explores outstanding research problems related to modeling, analysis and optimization of NoC communication architectures. More precisely, we present novel design methodologies, software tools and FPGA prototypes to aid the design of application-specific NoCs.
Simulation of Si:P spin-based quantum computer architecture

International Nuclear Information System (INIS)

Chang Yiachung; Fang Angbo

2008-01-01

We present a realistic simulation of single and double phosphorus donors in a silicon-based quantum computer design by solving a valley-orbit coupled effective-mass equation describing phosphorus donors in a strained silicon quantum well (QW). Using a generalized unrestricted Hartree-Fock method, we solve the two-electron effective-mass equation with quantum well confinement and realistic gate potentials. The effects of QW width, gate voltages, donor separation, and donor position shift on the lowest singlet and triplet energies and their charge distributions for a neighboring donor pair in the quantum computer (QC) architecture are analyzed. The gate tunability is defined and evaluated for a typical QC design. Estimates are obtained for the duration of the spin half-swap gate operation.
Predictors of Future Performance in Architectural Design Education

Science.gov (United States)

Roberts, A. S.

2007-01-01

The link between academic performance in secondary education and the subsequent performance of students studying architecture at university level is commonly questioned by educators and admissions tutors. This paper investigates the potential for using measures of cognitive style and spatial ability as predictors of future potential in…
Efficient Phase Unwrapping Architecture for Digital Holographic Microscopy

Directory of Open Access Journals (Sweden)

Wen-Jyi Hwang

2011-09-01

This paper presents a novel phase unwrapping architecture for accelerating the computational speed of digital holographic microscopy (DHM). A fast Fourier transform (FFT)-based phase unwrapping algorithm providing a minimum squared error solution is adopted for hardware implementation because of its simplicity and robustness to noise. The proposed architecture is realized in a pipeline fashion to maximize the throughput of the computation. Moreover, the number of hardware multipliers and dividers is minimized to reduce the hardware costs. The proposed architecture is used as custom user logic in a system on programmable chip (SOPC) for physical performance measurement. Experimental results reveal that the proposed architecture is effective for expediting the computational speed while consuming low hardware resources for designing an embedded DHM system.
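To make the idea of FFT-based, minimum-squared-error phase unwrapping concrete, here is a minimal software sketch, assuming NumPy is available. It solves the discrete Poisson equation built from wrapped phase differences on a mirror-extended grid. This is a generic textbook-style formulation, not the hardware algorithm of the paper; the function names `wrap` and `unwrap_ls_fft` are illustrative only.

```python
import numpy as np


def wrap(p):
    """Wrap phase values into the interval [-pi, pi)."""
    return (p + np.pi) % (2 * np.pi) - np.pi


def unwrap_ls_fft(psi):
    """Least-squares phase unwrapping via an FFT Poisson solve.

    psi: 2-D array of wrapped phase. Returns the unwrapped phase up to
    an additive constant (the result has zero mean).
    """
    m, n = psi.shape
    # Even (mirror) extension makes the data periodic, so a plain FFT
    # can be used while the original edges behave as free boundaries.
    ext = np.block([[psi, psi[:, ::-1]],
                    [psi[::-1, :], psi[::-1, ::-1]]])
    # Wrapped forward differences along x (columns) and y (rows).
    dx = wrap(np.roll(ext, -1, axis=1) - ext)
    dy = wrap(np.roll(ext, -1, axis=0) - ext)
    # Divergence of the wrapped gradient: right-hand side of Poisson eq.
    rho = (dx - np.roll(dx, 1, axis=1)) + (dy - np.roll(dy, 1, axis=0))
    # Eigenvalues of the periodic 5-point Laplacian in Fourier space.
    ky = 2 * np.pi * np.fft.fftfreq(2 * m)[:, None]
    kx = 2 * np.pi * np.fft.fftfreq(2 * n)[None, :]
    denom = 2 * np.cos(ky) + 2 * np.cos(kx) - 4
    denom[0, 0] = 1.0           # avoid 0/0 for the undetermined mean
    spec = np.fft.fft2(rho) / denom
    spec[0, 0] = 0.0            # fix the free additive constant
    return np.real(np.fft.ifft2(spec))[:m, :n]


if __name__ == "__main__":
    y, x = np.mgrid[0:32, 0:48]
    phi = 0.02 * ((x - 20) ** 2 + (y - 10) ** 2)   # smooth test phase
    est = unwrap_ls_fft(wrap(phi))
    print(np.max(np.abs((est - est.mean()) - (phi - phi.mean()))))
```

The test phase is chosen so that neighbouring samples differ by less than pi, in which case the wrapped differences equal the true ones and the solve recovers the phase exactly up to a constant. With noisy data the same solve returns the least-squares fit instead.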
Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

Energy Technology Data Exchange (ETDEWEB)

Amadio, G.; et al.

2017-11-22

An intensive R&D and programming effort is required to meet the new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting the latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel through complex geometries, exploiting instruction-level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.
RSAM: An enhanced architecture for achieving web services reliability in mobile cloud computing

Directory of Open Access Journals (Sweden)

Amr S. Abdelfattah

2018-04-01

The evolution of the mobile landscape is coupled with the ubiquitous nature of the internet, with its intermittent wireless connectivity, and of web services. Achieving web service reliability results in low communication overhead and retrieval of the appropriate response. The middleware approach (MA) is commonly used to achieve web service reliability. This paper proposes a Reliable Service Architecture using Middleware (RSAM) that achieves reliable web service consumption. The enhanced architecture focuses on ensuring and tracking the request execution under communication limitations and temporary service unavailability. It considers the main measurement factors, including request size, response size, and consuming time. We conducted experiments to compare the enhanced architecture with the traditional one. In these experiments, we covered several cases to demonstrate the achievement of reliability. Results also show that the request size was constant, the response size was identical to that of the traditional architecture, and the increase in the consuming time was less than 5% of the transaction time for the different response sizes. Keywords: Reliable web service, Middleware architecture, Mobile cloud computing
Initial results on computational performance of Intel Many Integrated Core (MIC) architecture: implementation of the Weather and Research Forecasting (WRF) Purdue-Lin microphysics scheme

Science.gov (United States)

Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

2014-10-01

The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high-performance coprocessor consisting of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PCIe) bus. We discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for Xeon Phi. In particular, getting good performance required utilizing multiple cores and the wide vector operations and making efficient use of memory. The results show that the optimizations improved performance of the original code on Xeon Phi 5110P by a factor of 4.2x. Furthermore, the same optimizations improved performance on an Intel Xeon E5-2603 CPU by a factor of 1.2x compared to the original code.
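The statement that there are no interactions among horizontal grid points means each vertical column can be updated independently, which is what makes the scheme easy to spread across cores. The toy sketch below shows only that dependency structure: the `update_column` rule and the `SATURATION` threshold are made-up placeholders, not the Purdue-Lin physics, and Python threads are used purely to demonstrate independence rather than to obtain a speedup.

```python
from concurrent.futures import ThreadPoolExecutor

SATURATION = 1.0  # hypothetical threshold, for illustration only


def update_column(column):
    """Toy stand-in for a microphysics update on one vertical column.

    column: list of (vapor, cloud_water) pairs, one per vertical level.
    Vapor above the threshold is moved into cloud water.
    """
    out = []
    for vapor, cloud in column:
        excess = max(vapor - SATURATION, 0.0)
        out.append((vapor - excess, cloud + excess))
    return out


def step_serial(grid):
    """Update every column one after another."""
    return [update_column(col) for col in grid]


def step_parallel(grid, workers=4):
    """Update columns concurrently; valid because no column reads
    data belonging to any other column."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update_column, grid))


if __name__ == "__main__":
    grid = [[(0.5 + 0.1 * i + 0.05 * k, 0.0) for k in range(5)]
            for i in range(16)]
    print(step_parallel(grid) == step_serial(grid))
```

Because the per-column work shares no state, the serial and concurrent versions return identical results; on real hardware the same property lets columns be mapped onto cores and vector lanes.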
Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures

Energy Technology Data Exchange (ETDEWEB)

Aktulga, Hasan Metin [Michigan State Univ., East Lansing, MI (United States); Coffman, Paul [Argonne National Lab. (ANL), Argonne, IL (United States); Shan, Tzu-Ray [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Knight, Chris [Argonne National Lab. (ANL), Argonne, IL (United States); Jiang, Wei [Argonne National Lab. (ANL), Argonne, IL (United States)

2015-12-01

Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
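For readers unfamiliar with the metric, weak scaling efficiency compares the run time of a baseline problem with the run time when both the problem size and the resources are scaled up together. The helper below is a minimal sketch of that definition; the timings in the usage example are invented, and only the core and process counts come from the abstract.

```python
def weak_scaling_efficiency(t_base, t_scaled):
    """Efficiency = baseline time / time at scale, with work per
    process held constant. 1.0 means perfect weak scaling."""
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("timings must be positive")
    return t_base / t_scaled


def processes_per_core(processes, cores):
    """Average number of processes placed on each core."""
    return processes / cores


if __name__ == "__main__":
    # Counts quoted in the abstract: 1,048,576 processes on 262,144 cores.
    print(processes_per_core(1_048_576, 262_144))
    # Hypothetical timings chosen to reproduce a 91.5% efficiency.
    print(weak_scaling_efficiency(91.5, 100.0))
```

The quoted counts work out to four processes per core.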

SCEAPI: A unified Restful Web API for High-Performance Computing

Science.gov (United States)

Rongqiang, Cao; Haili, Xiao; Shasha, Lu; Yining, Zhao; Xiaoning, Wang; Xuebin, Chi

2017-10-01

The development of scientific computing is increasingly moving to collaborative web and mobile applications. All these applications need a high-quality programming interface for accessing heterogeneous computing resources consisting of clusters, grid computing or cloud computing. In this paper, we introduce our high-performance computing environment that integrates computing resources from 16 HPC centers across China. Then we present a bundle of web services called SCEAPI and describe how it can be used to access HPC resources with the HTTP or HTTPS protocols. We discuss SCEAPI from several aspects including architecture, implementation and security, and address specific challenges in designing compatible interfaces and protecting sensitive data. We describe the functions of SCEAPI, including authentication, file transfer and job management for creating, submitting and monitoring jobs, and how to use SCEAPI in an easy-to-use way. Finally, we discuss how to exploit more HPC resources quickly for the ATLAS experiment by implementing a custom ARC compute element based on SCEAPI, and our work shows that SCEAPI is an easy-to-use and effective solution for extending opportunistic HPC resources.
Overview of Parallel Platforms for Common High Performance Computing

Directory of Open Access Journals (Sweden)

T. Fryza

2012-04-01

The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, methods exploiting multicore central processing units, such as the Message Passing Interface and OpenMP, are taken into account. The properties of the programming methods are experimentally verified in the application of a fast Fourier transform and a discrete cosine transform, and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU-based computing methods and with the possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and the implementation of new, fast routines is proposed as well.
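As background for the FFT kernel used as a benchmark in such comparisons, the following sketch contrasts a direct O(N^2) discrete Fourier transform with a recursive radix-2 Cooley-Tukey FFT. It is a plain reference implementation for illustration, not one of the optimized routines proposed in the paper, and it assumes the input length is a power of two.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform, O(N^2) operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 decimation-in-time FFT, O(N log N) operations.

    The length of x must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of even-indexed samples
    odd = fft(x[1::2])    # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 3.0]
    err = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
    print(err < 1e-9)
```

The two even/odd sub-transforms are independent of each other, which is one reason the FFT maps naturally onto the multicore and VLIW platforms discussed in the abstract.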
Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures.

Science.gov (United States)

Llanes, Antonio; Muñoz, Andrés; Bueno-Crespo, Andrés; García-Valverde, Teresa; Sánchez, Antonia; Arcas-Túnez, Francisco; Pérez-Sánchez, Horacio; Cecilia, José M

2016-01-01

The protein-folding problem has been extensively studied during the last fifty years. Understanding the dynamics of the global shape of a protein and its influence on biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to predict the three-dimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes it mandatory to search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. In this review we present past and recent trends in protein folding simulations from both the hardware and the software perspective. Of particular interest to us are both the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used for running this kind of Soft Computing technique.
A NEW OS ARCHITECTURE FOR IOT

Directory of Open Access Journals (Sweden)

Jean Y. Astier

2018-03-01

Current computer operating system architectures are not well suited for the coming world of connected objects, known as the Internet of Things (IoT), for multiple reasons: poor communication performance in both point-to-point and broadcast cases, poor operational reliability and network security, and excessive requirements in terms of both processor power and memory size, leading to excessive electrical power consumption. We introduce a new computer operating system architecture well adapted to IoT, from the most modest to the most complex, and more generally able to significantly raise the input/output capacities of any communicating computer. This architecture rests on the principles of the Von Neumann hardware model, and is composed of two types of asymmetric distributed containers, which communicate by message passing. We describe the sub-systems of both of these types of containers, where each sub-system has its own scheduler and a dedicated execution level.
High-performance, scalable optical network-on-chip architectures

Science.gov (United States)

Tan, Xianfang

The rapid advance of technology enables a large number of processing cores to be integrated into a single chip, which is called a Chip Multiprocessor (CMP) or a Multiprocessor System-on-Chip (MPSoC) design. The on-chip interconnection network, which is the communication infrastructure for these processing cores, plays a central role in a many-core system. With the continuously increasing complexity of many-core systems, traditional metallic wired electronic networks-on-chip (NoC) have become a bottleneck because of the unbearable latency in data transmission and extremely high energy consumption on chip. Optical networks-on-chip (ONoC) have been proposed as a promising alternative paradigm to electronic NoC, with the benefits of optical signaling communication such as extremely high bandwidth, negligible latency, and low power consumption. This dissertation focuses on the design of high-performance and scalable ONoC architectures, and the contributions are highlighted as follows: 1. A micro-ring resonator (MRR)-based Generic Wavelength-routed Optical Router (GWOR) is proposed. A method for developing a GWOR of any size is introduced. GWOR is a scalable non-blocking ONoC architecture with simple structure, low cost and high power efficiency compared to existing ONoC designs. 2. To expand the bandwidth and improve the fault tolerance of the GWOR, a redundant GWOR architecture is designed by cascading different types of GWORs into one network. 3. A redundant GWOR built with MRR-based comb switches is proposed. Replacing the general MRRs with comb switches can expand the bandwidth while keeping the topology of the GWOR unchanged. 4. A butterfly fat tree (BFT)-based hybrid optoelectronic NoC (HONoC) architecture is developed in which GWORs are used for global communication and electronic routers are used for local communication. The proposed HONoC uses fewer electronic routers and links than its electronic BFT-based NoC counterpart. It takes the advantages of
Algorithm-structured computer arrays and networks architectures and processes for images, percepts, models, information

CERN Document Server

Uhr, Leonard

1984-01-01

Computer Science and Applied Mathematics: Algorithm-Structured Computer Arrays and Networks: Architectures and Processes for Images, Percepts, Models, Information examines the parallel-array, pipeline, and other network multi-computers. This book describes and explores arrays and networks, those built, being designed, or proposed. The problems of developing higher-level languages for systems and designing algorithm, program, data flow, and computer structure are also discussed. This text likewise describes several sequences of successively more general attempts to combine the power of arrays wi
Air Force Science & Technology Issues & Opportunities Regarding High Performance Embedded Computing

Science.gov (United States)

2009-09-23

price-performance advantage include: large scale simulations of neuromorphic computing models; GOTCHA radar video SAR for wide area persistent... the handcuffs were not for me and that the military had so far got … Neuromorphic example: Robust recognition of occluded text; Gotcha SAR; PCID Image... Architecture: 16 cores / chip; 10 x 10 stacks / board; 50 chips / stack; EDRAM; AFPGA
Computer Architecture Techniques for Power-Efficiency

CERN Document Server

Kaxiras, Stefanos

2008-01-01

In the last few years, power dissipation has become an important design constraint, on par with performance, in the design of new computer systems. Whereas in the past, the primary job of the computer architect was to translate improvements in operating frequency and transistor count into performance, now power efficiency must be taken into account at every step of the design process. While for some time, architects have been successful in delivering 40% to 50% annual improvement in processor performance, costs that were previously brushed aside eventually caught up. The most critical of these
How to build a high-performance compute cluster for the Grid

CERN Document Server

Reinefeld, A

2001-01-01

The success of large-scale multi-national projects like the forthcoming analysis of the LHC particle collision data at CERN relies to a great extent on the ability to efficiently utilize computing and data-storage resources at geographically distributed sites. Currently, much effort is spent on the design of Grid management software (Datagrid, Globus, etc.), while the effective integration of computing nodes has been largely neglected up to now. This is the focus of our work. We present a framework for a high-performance cluster that can be used as a reliable computing node in the Grid. We outline the cluster architecture, the management of distributed data and the seamless integration of the cluster into the Grid environment. (11 refs).
3D-SoftChip: A Novel Architecture for Next-Generation Adaptive Computing Systems

Directory of Open Access Journals (Sweden)

Lee Mike Myung-Ok

2006-01-01

This paper introduces a novel architecture for next-generation adaptive computing systems, which we term 3D-SoftChip. The 3D-SoftChip is a 3-dimensional (3D) vertically integrated adaptive computing system combining state-of-the-art processing and 3D interconnection technology. It comprises the vertical integration of two chips (a configurable array processor and an intelligent configurable switch) through an indium bump interconnection array (IBIA). The configurable array processor (CAP) is an array of heterogeneous processing elements (PEs), while the intelligent configurable switch (ICS) comprises a switch block, a 32-bit dedicated RISC processor for control, on-chip program/data memory, and a data frame buffer, along with a direct memory access (DMA) controller. This paper introduces the novel 3D-SoftChip architecture for real-time communication and multimedia signal processing as a next-generation computing system. The paper further describes the advanced HW/SW codesign and verification methodology, including high-level system modeling of the 3D-SoftChip using SystemC, being used to determine the optimum hardware specification in the early design stage.
(Invited) Wavy Channel TFT Architecture for High Performance Oxide Based Displays

KAUST Repository

Hanna, Amir; Hussain, Aftab M.; Ghoneim, Mohamed T.; Rojas, Jhonathan Prieto; Sevilla, Galo T.; Hussain, Muhammad Mustafa

2015-01-01

We show the effectiveness of a wavy channel architecture in increasing the output current of thin film transistors. This architecture increases the width of the device by adopting a corrugated substrate shape without any further real estate penalty. The performance improvement is attributed not only to the increased transistor width, but also to the enhanced applied electric field in the channel due to the wavy architecture.
(Invited) Wavy Channel TFT Architecture for High Performance Oxide Based Displays

KAUST Repository

Hanna, Amir

2015-05-22

We show the effectiveness of a wavy channel architecture in increasing the output current of thin film transistors. This architecture increases the width of the device by adopting a corrugated substrate shape without any further real estate penalty. The performance improvement is attributed not only to the increased transistor width, but also to the enhanced applied electric field in the channel due to the wavy architecture.
An S_N Algorithm for Modern Architectures

Energy Technology Data Exchange (ETDEWEB)

Baker, Randal Scott [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

2016-08-29

LANL discrete ordinates transport packages are required to perform large, computationally intensive time-dependent calculations on massively parallel architectures, where even a single such calculation may need many months to complete. While KBA methods scale out well to very large numbers of compute nodes, we are limited by practical constraints on the number of such nodes we can actually apply to any given calculation. Instead, we describe a modified KBA algorithm that allows realization of the reductions in solution time offered by both the current, and future, architectural changes within a compute node.
Towards OpenVL: Improving Real-Time Performance of Computer Vision Applications

Science.gov (United States)

Shen, Changsong; Little, James J.; Fels, Sidney

Meeting constraints for real-time performance is a main issue for computer vision, especially for embedded computer vision systems. This chapter presents our progress on our open vision library (OpenVL), a novel software architecture to address efficiency through facilitating hardware acceleration, reusability, and scalability for computer vision systems. A logical image understanding pipeline is introduced to allow parallel processing. We also discuss progress on our middleware, the vision library utility toolkit (VLUT), which enables applications to operate transparently over a heterogeneous collection of hardware implementations. OpenVL works as a state machine, with an event-driven mechanism to provide users with application-level interaction. Various explicit or implicit synchronization and communication methods are supported among distributed processes in the logical pipelines. The intent of OpenVL is to allow users to quickly and easily recover useful information from multiple scenes, in a cross-platform, cross-language manner across various software environments and hardware platforms. To validate the critical underlying concepts of OpenVL, a human tracking system and a local positioning system are implemented and described. The novel architecture separates the specification of algorithmic details from the underlying implementation, allowing for different components to be implemented on an embedded system without recompiling code.
Computing Architecture of the ALICE Detector Control System

CERN Document Server

Augustinus, A; Moreno, A; Kurepin, A N; De Cataldo, G; Pinazza, O; Rosinský, P; Lechman, M; Jirdén, L S

2011-01-01

The ALICE Detector Control System (DCS) is based on a commercial SCADA product, running on a large Windows computer cluster. It communicates with about 1200 network attached devices to assure safe and stable operation of the experiment. In the presentation we focus on the design of the ALICE DCS computer systems. We describe the management of data flow, mechanisms for handling large amounts of data, and information exchange with external systems. One of the key operational requirements is an intuitive, error-proof and robust user interface allowing for simple operation of the experiment. At the same time, typical operator tasks, such as trending or routine checks of the devices, must be decoupled from the automated operation in order to prevent overload of critical parts of the system. All these requirements must be implemented in an environment with strict security requirements. In the presentation we explain how these demands affected the architecture of the ALICE DCS.
Instruction Set Architectures for Quantum Processing Units

OpenAIRE

Britt, Keith A.; Humble, Travis S.

2017-01-01

Progress in quantum computing hardware raises questions about how these devices can be controlled, programmed, and integrated with existing computational workflows. We briefly describe several prominent quantum computational models, their associated quantum processing units (QPUs), and the adoption of these devices as accelerators within high-performance computing systems. Emphasizing the interface to the QPU, we analyze instruction set architectures based on reduced and complex instruction s...
Architectural approach to the energy performance of buildings in a hot-dry climate with special reference to Egypt

Energy Technology Data Exchange (ETDEWEB)

Hamdy, I F

1986-01-01

A thesis is presented on the changing approach to architectural design of buildings in a hot, dry climate in view of the increased recognition of the importance of energy efficiency. The thermal performance of buildings in Egypt is used as an example and the nature of the local climate and human requirements are also studied. Other effects on the thermal performance considered include building form, orientation and surrounding conditions. An evaluative computer model is constructed, and its application allows the energy performance to be predicted as design parameters change.
Exploring Heterogeneous Multicore Architectures for Advanced Embedded Uncertainty Quantification.

Energy Technology Data Exchange (ETDEWEB)

Phipps, Eric T.; Edwards, Harold C.; Hu, Jonathan J.

2014-09-01

We explore rearrangements of classical uncertainty quantification methods with the aim of achieving higher aggregate performance for uncertainty quantification calculations on emerging multicore and manycore architectures. We show that a rearrangement of the stochastic Galerkin method leads to improved performance and scalability on several computational architectures, whereby uncertainty information is propagated at the lowest levels of the simulation code, improving memory access patterns, exposing new dimensions of fine grained parallelism, and reducing communication. We also develop a general framework for implementing such rearrangements for a diverse set of uncertainty quantification algorithms as well as computational simulation codes to which they are applied.
Computer sciences

Science.gov (United States)

Smith, Paul H.

1988-01-01

The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.
The impact of optimize solar radiation received on the levels and energy disposal of levels on architectural design result by using computer simulation

Energy Technology Data Exchange (ETDEWEB)

Rezaei, Davood; Farajzadeh Khosroshahi, Samaneh; Sadegh Falahat, Mohammad [Zanjan University (Iran, Islamic Republic of)], email: d_rezaei@znu.ac.ir, email: ronas_66@yahoo.com, email: Safalahat@yahoo.com

2011-07-01

In order to minimize the energy consumption of a building it is important to achieve optimum solar energy. The aim of this paper is to introduce the use of computer modeling in the early stages of design to optimize solar radiation received and energy disposal in an architectural design. Computer modeling was performed on 2 different projects located in Los Angeles, USA, using ECOTECT software. Changes were made to the designs following analysis of the modeling results and a subsequent analysis was carried out on the optimized designs. Results showed that the computer simulation allows the designer to set the analysis criteria and improve the energy performance of a building before it is constructed; moreover, it can be used for a wide range of optimization levels. This study pointed out that computer simulation should be performed in the design stage to optimize a building's energy performance.

Architecture and pervasive Computing when buildings and design artifacts become popular interfaces

DEFF Research Database (Denmark)

Krogh, Peter Gall; Grønbæk, Kaj

2001-01-01

One of the main areas of architecture is building design, and we will focus on the impact of pervasive computing in this area. The breakthrough of the Internet has triggered a significant increase in what is often called intelligent buildings in recent years. Due to development in pervasive c...
How does Architecture Sound for Different Musical Instrument Performances?

DEFF Research Database (Denmark)

Saher, Konca; Rindel, Jens Holger

2006-01-01

This paper discusses how consideration of sound, in particular that of a specific musical instrument, impacts the design of a room. Properly designed architectural acoustics is fundamental to improving the listening experience of an instrument in rooms in a conservatory. Six discrete instruments (violin, c...... different instruments and the choir experience that could fit into same category of room. For all calculations and the auralizations, a computational model is used: ODEON 7.0....
Evolution of the Milieu Approach for Software Development for the Polymorphous Computing Architecture Program

National Research Council Canada - National Science Library

Dandass, Yoginder

2004-01-01

A key goal of the DARPA Polymorphous Computing Architectures (PCA) program is to develop reactive closed-loop systems that are capable of being dynamically reconfigured in order to respond to changing mission scenarios...
A Survey and Evaluation of Simulators Suitable for Teaching Courses in Computer Architecture and Organization

Science.gov (United States)

Nikolic, B.; Radivojevic, Z.; Djordjevic, J.; Milutinovic, V.

2009-01-01

Courses in Computer Architecture and Organization are regularly included in Computer Engineering curricula. These courses are usually organized in such a way that students obtain not only a purely theoretical experience, but also a practical understanding of the topics lectured. This practical work is usually done in a laboratory using simulators…
State-of-the-art in Heterogeneous Computing

Directory of Open Access Journals (Sweden)

Andre R. Brodtkorb

2010-01-01

Node level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.
Design and study of parallel computing environment of Monte Carlo simulation for particle therapy planning using a public cloud-computing infrastructure

International Nuclear Information System (INIS)

Yokohama, Noriya

2013-01-01

This report was aimed at structuring the design of architectures and studying performance measurement of a parallel computing environment using a Monte Carlo simulation for particle therapy using a high performance computing (HPC) instance within a public cloud-computing infrastructure. Performance measurements showed an approximately 28 times faster speed than seen with single-thread architecture, combined with improved stability. A study of methods of optimizing the system operations also indicated lower cost. (author)
Fault Tolerant Computer Architecture

CERN Document Server

Sorin, Daniel

2009-01-01

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes
Architectural transformations in network services and distributed systems

CERN Document Server

Luntovskyy, Andriy

2017-01-01

With this work we set out to help not only readers but also ourselves, as professionals actively involved in networking, to understand the trends that have developed over the last two decades in distributed systems and networks. Important architectural transformations of distributed systems are examined, and examples of new architectural solutions are discussed. Content: Periodization of service development; Energy efficiency; Architectural transformations in Distributed Systems; Clustering and Parallel Computing, performance models; Cloud Computing, RAICs, Virtualization, SDN; Smart Grid, Internet of Things, Fog Computing; Mobile Communication from LTE to 5G, DIDO, SAT-based systems; Data Security Guaranteeing Distributed Systems. Target Groups: Students in EE and IT of universities and (dual) technical high schools; Graduated engineers as well as teaching staff. About the Authors: Andriy Luntovskyy provides classes on networks, mobile communication, software technology, distributed systems, ...
Towards Energy-Centric Computing and Computer Architecture

CERN Multimedia

CERN. Geneva

2010-01-01

Technology forecasts indicate that device scaling will continue well into the next decade. Unfortunately, it is becoming extremely difficult to harness this increase in the number of transistors into performance due to a number of technological, circuit, architectural, methodological and programming challenges. In this talk, I will argue that the key emerging showstopper is power. Voltage scaling as a means to maintain a constant power envelope with an increase in transistor numbers is hitting diminishing returns. As such, to continue riding Moore's law we need to look for drastic measures to cut power. This is definitely the case for server chips in future datacenters, where abundant server parallelism, redundancy and 3D chip integration are likely to remove programming, reliability and bandwidth hurdles, leaving power as the only true limiter. I will present results backing this argument based on validated models for f...
High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures

KAUST Repository

Ltaief, Hatem

2013-04-01

This article presents a new high-performance bidiagonal reduction (BRD) for homogeneous multicore architectures. This article is an extension of the high-performance tridiagonal reduction implemented by the same authors [Luszczek et al., IPDPS 2011] to the BRD case. The BRD is the first step toward computing the singular value decomposition of a matrix, which is one of the most important algorithms in numerical linear algebra due to its broad impact in computational science. The high performance of the BRD described in this article comes from the combination of four important features: (1) tile algorithms with tile data layout, which provide an efficient data representation in main memory; (2) a two-stage reduction approach that casts most of the computation during the first stage (reduction to band form) into calls to Level 3 BLAS and reduces the memory traffic during the second stage (reduction from band to bidiagonal form) by using high-performance kernels optimized for cache reuse; (3) a data dependence translation layer that maps the general algorithm with column-major data layout into the tile data layout; and (4) a dynamic runtime system that efficiently schedules the newly implemented kernels across the processing units and ensures that the data dependencies are not violated. A detailed analysis is provided to understand the critical impact of the tile size on the total execution time, which also corresponds to the matrix bandwidth size after the reduction of the first stage. The performance results show a significant improvement over currently established alternatives. The new high-performance BRD achieves up to a 30-fold speedup on a 16-core Intel Xeon machine with a 12000×12000 matrix size against the state-of-the-art open source and commercial numerical software packages, namely LAPACK, compiled with optimized and multithreaded BLAS from MKL as well as Intel MKL version 10.2. © 2013 ACM.
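The tile data layout of feature (1) and the column-major-to-tile mapping of feature (3) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `tile_index`, `to_tiles`, and `get`, the tile size `nb`, and the use of plain Python lists are all assumptions made for illustration, and the sketch assumes the tile size divides both matrix dimensions.

```python
def tile_index(i, j, nb):
    """Map a global (row, col) index to (tile_row, tile_col, local_row, local_col)."""
    return i // nb, j // nb, i % nb, j % nb


def to_tiles(colmajor, m, n, nb):
    """Copy an m-by-n column-major matrix into a dict of contiguous nb-by-nb tiles.

    colmajor[i + j * m] holds element (i, j). Each tile is stored column-major
    in its own list, so a kernel working on one tile touches one contiguous
    block. Hypothetical helper; assumes nb divides both m and n.
    """
    if m % nb or n % nb:
        raise ValueError("this sketch assumes nb divides both m and n")
    tiles = {}
    for j in range(n):
        for i in range(m):
            ti, tj, li, lj = tile_index(i, j, nb)
            tile = tiles.setdefault((ti, tj), [0.0] * (nb * nb))
            tile[li + lj * nb] = colmajor[i + j * m]
    return tiles


def get(tiles, i, j, nb):
    """Read global element (i, j) back out of the tile layout."""
    ti, tj, li, lj = tile_index(i, j, nb)
    return tiles[(ti, tj)][li + lj * nb]
```

For a 4-by-4 matrix with `nb = 2`, the four elements of the top-left 2-by-2 block end up adjacent in one list, whereas in column-major storage they are split across two columns. This locality is the property the abstract attributes to the tile layout.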
Transportable GPU (General Processor Units) chip set technology for standard computer architectures

Science.gov (United States)

Fosdick, R. E.; Denison, H. C.

1982-11-01

The USAFR-developed GPU Chip Set has been utilized by Tracor to implement both USAF and Navy Standard 16-Bit Airborne Computer Architectures. Both configurations are currently being delivered into DOD full-scale development programs. Leadless Hermetic Chip Carrier packaging has facilitated implementation of both architectures on single 4 1/2 x 5 substrates. The CMOS and CMOS/SOS implementations of the GPU Chip Set have allowed both CPU implementations to use less than 3 watts of power each. Recent efforts by Tracor for USAF have included the definition of a next-generation GPU Chip Set that will retain the application-proven architecture of the current chip set while offering the added cost advantages of transportability across ISO-CMOS and CMOS/SOS processes and across numerous semiconductor manufacturers using a newly-defined set of common design rules. The Enhanced GPU Chip Set will increase speed by an approximate factor of 3 while significantly reducing chip counts and costs of standard CPU implementations.
Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

Science.gov (United States)

Moon, Hongsik

What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power the same way that increases in clock speed helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit - it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared using benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the
Parallel computation for distributed parameter system-from vector processors to Adena computer

Energy Technology Data Exchange (ETDEWEB)

Nogi, T

1983-04-01

Research on advanced parallel hardware and software architectures for very high-speed computation deserves and needs more support and attention to fulfil its promise. Novel architectures for parallel processing are being made ready. Architectures for parallel processing can be roughly divided into two groups. One is a vector processor in which a single central processing unit involves multiple vector-arithmetic registers. The other is a processor array in which slave processors are connected to a host processor to perform parallel computation. In this review, the concept and data structure of the Adena (alternating-direction edition nexus array) architecture, which is conformable to distributed-parameter simulation algorithms, are described. 5 references.
Single Assignment C: HP^2 programming for heterogeneous concurrent architectures

NARCIS (Netherlands)

Scholz, S.-B.; Herhut, S.; Grelck, C.

2010-01-01

The range of architectures used in high-performance computing is quickly expanding and at the same time lifetimes of platforms are decreasing. This shift threatens the return on investment for tuning applications to specific architectures and platforms, which until now was the prevailing development
High Performance Motion-Planner Architecture for Hardware-In-the-Loop System Based on Position-Based-Admittance-Control

Directory of Open Access Journals (Sweden)

Francesco La Mura

2018-02-01

This article focuses on a Hardware-In-the-Loop application developed from the advanced energy field project LIFES50+. The aim is to replicate, inside a wind gallery test facility, the combined effect of aerodynamic and hydrodynamic loads on a floating wind turbine model for offshore energy production, using a force controlled robotic device, emulating floating substructure’s behaviour. In addition to well known real-time Hardware-In-the-Loop (HIL) issues, the particular application presented has stringent safety requirements for the HIL equipment and difficult-to-predict operating conditions, so that extra computational effort has to be spent running specific safety algorithms and achieving the desired performance. To meet project requirements, a high performance software architecture based on Position-Based-Admittance-Control (PBAC) is presented, combining low level motion interpolation techniques, efficient motion planning based on buffer management and Time-base control, and advanced high level safety algorithms, implemented in a rapid real-time control architecture.
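The general idea behind position-based admittance control can be sketched in one dimension. A measured force drives a virtual mass-spring-damper, and the resulting position is passed as the reference to a position-controlled device. The sketch below is a generic textbook form, not the architecture described in the article: the class name `Admittance1D`, the parameter values, and the semi-implicit Euler integration are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Admittance1D:
    """Virtual dynamics M*a + D*v + K*x = force (all parameters hypothetical)."""
    mass: float       # virtual mass M
    damping: float    # virtual damping D
    stiffness: float  # virtual stiffness K
    dt: float         # control period in seconds
    x: float = 0.0    # commanded position offset
    v: float = 0.0    # commanded velocity

    def step(self, force):
        """Advance the virtual dynamics by one control period.

        Returns the new position reference for the inner position loop.
        """
        a = (force - self.damping * self.v - self.stiffness * self.x) / self.mass
        self.v += a * self.dt        # semi-implicit Euler: update velocity first
        self.x += self.v * self.dt   # then position, using the new velocity
        return self.x


# Usage: a constant 10 N force settles near force / stiffness = 0.1 m.
ctrl = Admittance1D(mass=1.0, damping=20.0, stiffness=100.0, dt=0.001)
for _ in range(5000):
    reference = ctrl.step(10.0)
```

The virtual parameters set how compliant the device appears to the measured load. In a real HIL setup the reference would also pass through the interpolation and safety layers that the abstract mentions.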
Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

International Nuclear Information System (INIS)

Hicks, D.L.

1983-11-01

In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references
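The zone-by-zone parallelization of an explicit scheme rests on one property: each zone's new value depends only on old-time-level values, so all zones can be updated independently. The sketch below shows that property with a toy three-point stencil standing in for a hydrocode step. It is not the HEP implementation; the names `update_zone` and `explicit_step`, the coefficient `c`, and the use of a thread pool are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def update_zone(old, i, c):
    """New value of zone i computed from old-time-level data only.

    Toy explicit stencil (not a real hydrodynamics update); boundary zones
    reuse their own value as the missing neighbour.
    """
    left = old[i - 1] if i > 0 else old[i]
    right = old[i + 1] if i < len(old) - 1 else old[i]
    return old[i] + c * (left - 2.0 * old[i] + right)


def explicit_step(old, c, workers=4):
    """Advance every zone one step, one task per zone.

    Tasks only read `old` and write to separate outputs, so no locking is
    needed; this independence is what makes the scheme easy to parallelize.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: update_zone(old, i, c), range(len(old))))


def serial_step(old, c):
    """Reference serial version for comparison."""
    return [update_zone(old, i, c) for i in range(len(old))]
```

Python threads are used here only to show the task structure. Because of the interpreter lock they would not be expected to speed up this arithmetic, and a real code would map zones to hardware processes or threads as the abstract describes.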
Experimental Demonstration of a Self-organized Architecture for Emerging Grid Computing Applications on OBS Testbed

Science.gov (United States)

Liu, Lei; Hong, Xiaobin; Wu, Jian; Lin, Jintong

As Grid computing continues to gain popularity in the industry and research community, it also attracts more attention at the consumer level. The large number of users and high frequency of job requests in the consumer market make it challenging. Clearly, current Client/Server (C/S)-based architectures will become infeasible for supporting large-scale Grid applications due to their poor scalability and poor fault tolerance. In this paper, based on our previous works [1, 2], a novel self-organized architecture to realize a highly scalable and flexible platform for Grids is proposed. Experimental results show that this architecture is suitable and efficient for consumer-oriented Grids.
Control system architecture: The standard and non-standard models

International Nuclear Information System (INIS)

Thuot, M.E.; Dalesio, L.R.

1993-01-01

Control system architecture development has followed the advances in computer technology through mainframes to minicomputers to micros and workstations. This technology advance and increasingly challenging accelerator data acquisition and automation requirements have driven control system architecture development. In summarizing the progress of control system architecture at the last International Conference on Accelerator and Large Experimental Physics Control Systems (ICALEPCS), B. Kuiper asserted that the system architecture issue was resolved and presented a "standard model". The "standard model" consists of a local area network (Ethernet or FDDI) providing communication between front end microcomputers, connected to the accelerator, and workstations, providing the operator interface and computational support. Although this model represents many present designs, there are exceptions including reflected memory and hierarchical architectures driven by requirements for widely dispersed, large channel count or tightly coupled systems. This paper describes the performance characteristics and features of the "standard model" to determine if the requirements of "non-standard" architectures can be met. Several possible extensions to the "standard model" are suggested including software as well as the hardware architectural feature
Low power design of wireless endoscopy compression/communication architecture

Directory of Open Access Journals (Sweden)

Zitouni Abdelkrim

2018-05-01

A wireless endoscopy capsule is an efficient device for the examination of digestive diseases. Many performance criteria (silicon area, dissipated power, image quality, computational time, etc.) need to be studied in depth. In this paper, our interest is the optimization of these criteria. The proposed methodology is based on exploiting the advantages of the DCT/DWT transforms by combining them into a single architecture. For arithmetic operations, the MCLA technique is used. This architecture also integrates a CABAC entropy coder that supports all binarization schemes. An AMBA/I2C architecture is developed to ensure optimized communication. Comparisons of the proposed architecture with the most popular methods described in related works show efficient results in terms of dissipated power, hardware cost, and computation speed. Keywords: Wireless endoscopy capsule, DCT/DWT image compression, CABAC entropy coder, AMBA/I2C multi-bus architecture
Memory intensive functional architecture for distributed computer control systems

International Nuclear Information System (INIS)

Dimmler, D.G.

1983-10-01

A memory-intensive functional architecture for distributed data-acquisition, monitoring, and control systems with large numbers of nodes has been conceptually developed and applied in several large-scale and some smaller systems. This discussion concentrates on: (1) the basic architecture; (2) recent expansions of the architecture which now become feasible in view of the rapidly developing component technologies in microprocessors and functional large-scale integration circuits; and (3) implementation of some key hardware and software structures and one system implementation which is a system for performing control and data acquisition of a neutron spectrometer at the Brookhaven High Flux Beam Reactor. The spectrometer is equipped with a large-area position-sensitive neutron detector

Performance Assessment Strategies: A computational framework for conceptual design of large roofs

Directory of Open Access Journals (Sweden)

Michela Turrin

2014-01-01

Using engineering performance evaluations to explore design alternatives during the conceptual phase of architectural design helps to understand the relationships between form and performance, and is crucial for developing well-performing final designs. Computer aided conceptual design has the potential to aid the design team in discovering and highlighting these relationships, especially by means of procedural and parametric geometry to support the generation of geometric design, and building performance simulation tools to support performance assessments. However, current tools and methods for computer aided conceptual design in architecture neither explicitly reveal the relationships between performance and geometry of the design nor allow them to be backtracked. They currently support post-engineering, rather than the early design decisions and the design exploration process. Focusing on large roofs, this research aims at developing a computational design approach to support designers in performance driven explorations. The approach is meant to facilitate the multidisciplinary integration and the learning process of the designer, and not to constrain the process in precompiled procedures or in hard engineering formulations, nor to automate it by delegating the design creativity to computational procedures. PAS (Performance Assessment Strategies) as a method is the main output of the research. It consists of a framework including guidelines and an extensible library of procedures for parametric modelling. It is structured in three parts. Pre-PAS provides guidelines for defining a design strategy, leading toward the parameterization process. Model-PAS provides guidelines, procedures and scripts for building the parametric models. Explore-PAS supports the assessment of solutions based on numeric evaluations and performance simulations, until a suitable design solution is identified. PAS has been developed based on action research. Several case studies
The Architecture and Administration of the ATLAS Online Computing System

CERN Document Server

Dobson, M; Ertorer, E; Garitaonandia, H; Leahu, L; Leahu, M; Malciu, I M; Panikashvili, E; Topurov, A; Ünel, G; Computing In High Energy and Nuclear Physics

2006-01-01

The needs of the ATLAS experiment at the upcoming LHC accelerator, CERN, in terms of data transmission rates and processing power require a large cluster of computers (of the order of thousands) administered and exploited in a coherent and optimal manner. Requirements like stability, robustness and fast recovery in case of failure impose a server-client system architecture with servers distributed in a tree-like structure and clients booted from the network. For security reasons, the system should be accessible only through an application gateway and, also to ensure the autonomy of the system, the network services should be provided internally by dedicated machines in synchronization with CERN IT department's central services. The paper describes a small scale implementation of the system architecture that fits the given requirements and constraints. Emphasis will be put on the mechanisms and tools used to net boot the clients via the "Boot With Me" project and to synchronize information within the cluster via t...
Contagious architecture: computation, aesthetics, and space (technologies of lived abstraction)

CERN Document Server

Parisi, Luciana

2013-01-01

In Contagious Architecture, Luciana Parisi offers a philosophical inquiry into the status of the algorithm in architectural and interaction design. Her thesis is that algorithmic computation is not simply an abstract mathematical tool but constitutes a mode of thought in its own right, in that its operation extends into forms of abstraction that lie beyond direct human cognition and control. These include modes of infinity, contingency, and indeterminacy, as well as incomputable quantities underlying the iterative process of algorithmic processing. The main philosophical source for the project is Alfred North Whitehead, whose process philosophy is specifically designed to provide a vocabulary for "modes of thought" exhibiting various degrees of autonomy from human agency even as they are mobilized by it. Because algorithmic processing lies at the heart of the design practices now reshaping our world -- from the physical spaces of our built environment to the networked spaces of digital culture -- the nature o...
Benchmarking hardware architecture candidates for the NFIRAOS real-time controller

Science.gov (United States)

Smith, Malcolm; Kerley, Dan; Herriot, Glen; Véran, Jean-Pierre

2014-07-01

As a part of the trade study for the Narrow Field Infrared Adaptive Optics System, the adaptive optics system for the Thirty Meter Telescope, we investigated the feasibility of performing real-time control computation using a Linux operating system and Intel Xeon E5 CPUs. We also investigated a Xeon Phi based architecture which allows higher levels of parallelism. This paper summarizes both the CPU based real-time controller architecture and the Xeon Phi based RTC. The Intel Xeon E5 CPU solution meets the requirements and performs the computation for one AO cycle in an average of 767 microseconds. The Xeon Phi solution did not meet the 1200 microsecond time requirement and also suffered from unpredictable execution times. More detailed benchmark results are reported for both architectures.
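The distinction the abstract draws between average execution time and predictability can be illustrated with a minimal sketch. The 1200 microsecond requirement is taken from the abstract; the function name `summarize_latency` and the sample timings are hypothetical and are not measurements from the paper.

```python
from statistics import mean

def summarize_latency(samples_us, deadline_us):
    """Return the mean, the worst case, and the fraction of cycles over the deadline."""
    misses = sum(1 for s in samples_us if s > deadline_us)
    return {
        "mean_us": mean(samples_us),
        "max_us": max(samples_us),
        "miss_fraction": misses / len(samples_us),
    }

# Hypothetical per-cycle timings in microseconds (illustrative only, not measured data).
samples = [750.0, 760.0, 770.0, 780.0, 1300.0]
report = summarize_latency(samples, deadline_us=1200.0)
print(report)
```

In this made-up series the mean stays below the deadline while one cycle in five exceeds it, which is why a real-time controller is usually judged on its worst case and jitter as well as on its average.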
A Conceptual Architecture for Adaptive Human-Computer Interface of a PT Operation Platform Based on Context-Awareness

Directory of Open Access Journals (Sweden)

Qing Xue

2014-01-01

We present a conceptual architecture for an adaptive human-computer interface of a PT operation platform based on context-awareness. This architecture will form the basis of design for such an interface. This paper describes the components, key technologies, and working principles of the architecture. The key topics covered are context information modeling and processing, establishing relationships between contexts and interface design knowledge by means of adaptive knowledge reasoning, and implementing the visualization of the adaptive interface with the aid of interface tool technology.
Application of Raptor-M3G to reactor dosimetry problems on massively parallel architectures - 026

International Nuclear Information System (INIS)

Longoni, G.

2010-01-01

The solution of complex 3-D radiation transport problems requires significant resources both in terms of computation time and memory availability. Therefore, parallel algorithms and multi-processor architectures are required to solve large 3-D radiation transport problems efficiently. This paper presents the application of RAPTOR-M3G (Rapid Parallel Transport Of Radiation - Multiple 3D Geometries) to reactor dosimetry problems. RAPTOR-M3G is a newly developed parallel computer code designed to solve the discrete ordinates (SN) equations on multi-processor computer architectures. This paper presents the results for a reactor dosimetry problem using a 3-D model of a commercial 2-loop pressurized water reactor (PWR). The accuracy and performance of RAPTOR-M3G will be analyzed and the numerical results obtained from the calculation will be compared directly to measurements of the neutron field in the reactor cavity air gap. The parallel performance of RAPTOR-M3G on massively parallel architectures, where the number of computing nodes is on the order of hundreds, will be analyzed up to four hundred processors. The performance results will be presented based on two supercomputing architectures: the POPLE supercomputer operated by the Pittsburgh Supercomputing Center and the Westinghouse computer cluster. The Westinghouse computer cluster is equipped with a standard Ethernet network connection and an InfiniBand® interconnect capable of a bandwidth in excess of 20 GBit/sec. Therefore, the impact of the network architecture on RAPTOR-M3G performance will be analyzed as well. (authors)
Performance evaluation of enterprise architecture with a formal fuzzy model (FPN)

Directory of Open Access Journals (Sweden)

Ashkan Marahel

2012-10-01

Preparing an enterprise architecture is a complicated procedure, which uses a framework to regularize structure and a style to direct behavior in order to control complexity. Because behavior takes precedence over structure in architecture, the architecture's performance needs to be evaluated so that one behavior can be judged against others. Enterprise architecture cannot be organized without the benefit of a logical structure. A framework provides a logical structure for classifying architectural output. Among the common architectural frameworks, C4ISR is one of the most appropriate because of the methodology of its production, its level of aggregation capability, and its minor revisions. The C4ISR framework describes the architecture in three views, using documents called products. Because information systems always involve uncertainty, this paper develops the system with a new version of UML called FUZZY-UML, which covers both the structure and the behavior of the system. The proposed model also uses Fuzzy Petri nets to analyze the developed system.
Mapping the Intangible: On Adaptivity and Relational Prototyping in Architectural Design

DEFF Research Database (Denmark)

Bolbroe, Cameline

2016-01-01

In recent years, new computing technologies in architecture have led to the possibility of designing architecture with non-static qualities, which affords the architectural designer a whole new opportunity space to explore. At the same time, this opportunity space challenges both...... to meet the challenges of designing with adaptivity in architecture, I propose a particular method specifically tailored for adaptive architectural design. The method, relational prototyping, is founded on the idea of inhabitation as an act. Relational prototyping adapts techniques from performance...
The Jupyter/IPython architecture: a unified view of computational research, from interactive exploration to communication and publication.

Science.gov (United States)

Ragan-Kelley, M.; Perez, F.; Granger, B.; Kluyver, T.; Ivanov, P.; Frederic, J.; Bussonnier, M.

2014-12-01

IPython has provided terminal-based tools for interactive computing in Python since 2001. The notebook document format and multi-process architecture introduced in 2011 have expanded the applicable scope of IPython into teaching, presenting, and sharing computational work, in addition to interactive exploration. The new architecture also allows users to work in any language, with implementations in Python, R, Julia, Haskell, and several other languages. The language-agnostic parts of IPython have been renamed to Jupyter, to better capture the notion that a cross-language design can encapsulate commonalities present in computational research regardless of the programming language being used. This architecture offers components like the web-based Notebook interface, which supports rich documents that combine code and computational results with text narratives, mathematics, images, video and any media that a modern browser can display. This interface can be used not only in research, but also for publication and education, as notebooks can be converted to a variety of output formats, including HTML and PDF. Recent developments in the Jupyter project include a multi-user environment for hosting notebooks for a class or research group, a live collaboration notebook via Google Docs, and better support for languages other than Python.
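As a rough illustration of the notebook document format mentioned in the abstract, the sketch below builds a small notebook as a JSON document using only the Python standard library. The field names follow the version 4 notebook format as commonly documented and should be checked against the nbformat documentation; the helper names `make_notebook`, `markdown_cell`, and `code_cell` are hypothetical and are not part of any Jupyter API.

```python
import json

def make_notebook(cells):
    """Wrap a list of cells in a minimal version 4 notebook dictionary."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

def markdown_cell(text):
    """A narrative cell holding Markdown text."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}

def code_cell(source):
    """An unexecuted code cell: no execution count and no outputs yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }

nb = make_notebook([markdown_cell("# Demo"), code_cell("print(1 + 1)")])
text = json.dumps(nb, indent=1)  # this string is what would be saved as an .ipynb file
print(text)
```

Because the document is plain JSON that stores code as text, nothing in the file format itself is tied to Python, which is consistent with the language-agnostic design the abstract describes.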
Dynamic logic architecture based on piecewise-linear systems

International Nuclear Information System (INIS)

Peng Haipeng; Liu Fei; Li Lixiang; Yang Yixian; Wang Xue

2010-01-01

This Letter explores piecewise-linear systems to construct a dynamic logic architecture. The proposed schemes can discriminate the two input signals and obtain 16 kinds of logic operations by different combinations of parameters and conditions for determining the output. Each logic cell performs more flexibly, which makes it possible to achieve complex logic operations more simply and to construct a computing architecture with fewer logic cells. We also analyze the various performances of our schemes under different conditions and the characteristics of these schemes.
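The figure of 16 operations matches the number of distinct Boolean functions of two binary inputs, 2^(2^2) = 16. The sketch below enumerates them as truth tables; it assumes that the Letter's 16 operations correspond to this full set, which the abstract suggests but does not state, and it says nothing about how the piecewise-linear system realizes them.

```python
from itertools import product

# The four possible input pairs: (0, 0), (0, 1), (1, 0), (1, 1).
INPUTS = list(product((0, 1), repeat=2))

def all_two_input_functions():
    """Enumerate every truth table that maps two binary inputs to one binary output."""
    return [
        dict(zip(INPUTS, outputs))
        for outputs in product((0, 1), repeat=len(INPUTS))
    ]

tables = all_two_input_functions()
and_table = {(a, b): a & b for a, b in INPUTS}
xor_table = {(a, b): a ^ b for a, b in INPUTS}
print(len(tables), and_table in tables, xor_table in tables)
```

Familiar gates such as AND and XOR appear as two of the enumerated tables, alongside the constant functions and the single-input projections.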
Applications of an architecture design and assessment system (ADAS)

Science.gov (United States)

Gray, F. Gail; Debrunner, Linda S.; White, Tennis S.

1988-01-01

A new Architecture Design and Assessment System (ADAS) tool package is introduced, and a range of possible applications is illustrated. ADAS was used to evaluate the performance of an advanced fault-tolerant computer architecture in a modern flight control application. Bottlenecks were identified and possible solutions suggested. The tool was also used to inject faults into the architecture and evaluate the synchronization algorithm, and improvements are suggested. Finally, ADAS was used as a front end research tool to aid in the design of reconfiguration algorithms in a distributed array architecture.
Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

Energy Technology Data Exchange (ETDEWEB)

Perache, M

2006-10-15

In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)
Architectural design and energy performance; Conception architecturale et performance energetique

Energy Technology Data Exchange (ETDEWEB)

Beaud, Ph. [Agence de l' Environnement et de la Maitrise de l' Energie, (ADEME), 06 - Valbonne (France); Pouget, A. [Bureau Etude Thermique, 75 - Paris (France); Sesolis, B. [TRIBU, 75 - Paris (France)] [and others

2000-07-01

The day was organized in three parts around the energy performance of architecture. The first part dealt with the design of new buildings and private houses; simulation tools for energy optimization and design practice were discussed. The second part was devoted to the new 2000 regulation, with an open discussion on the regulatory costs. The last part forecast developments until 2015, taking into account the French program to combat the greenhouse effect, the limitation of air conditioning consumption, and the definition of a quality label for energy performance. (A.L.B.)
From Archi Torture to Architecture: Undergraduate Students Design and Implement Computers Using the Multimedia Logic Emulator

Science.gov (United States)

Stanley, Timothy D.; Wong, Lap Kei; Prigmore, Daniel; Benson, Justin; Fishler, Nathan; Fife, Leslie; Colton, Don

2007-01-01

Students learn better when they both hear and do. In computer architecture courses "doing" can be difficult in small schools without hardware laboratories hosted by computer engineering, electrical engineering, or similar departments. Software solutions exist. Our success with George Mills' Multimedia Logic (MML) is the focus of this paper. MML…
Heterogeneous reconfigurable processors for real-time baseband processing from algorithm to architecture

CERN Document Server

Zhang, Chenxin; Öwall, Viktor

2016-01-01

This book focuses on domain-specific heterogeneous reconfigurable architectures, demonstrating for readers a computing platform which is flexible enough to support multiple standards, multiple modes, and multiple algorithms. The content is multi-disciplinary, covering areas of wireless communication, computing architecture, and circuit design. The platform described provides real-time processing capability with reasonable implementation cost, achieving balanced trade-offs among flexibility, performance, and hardware costs. The authors discuss efficient design methods for wireless communication processing platforms, from both an algorithm and architecture design perspective. Coverage also includes computing platforms for different wireless technologies and standards, including MIMO, OFDM, Massive MIMO, DVB, WLAN, LTE/LTE-A, and 5G. •Discusses reconfigurable architectures, including hardware building blocks such as processing elements, memory sub-systems, Network-on-Chip (NoC), and dynamic hardware reconfigur...
Irregular Applications: Architectures & Algorithms

Energy Technology Data Exchange (ETDEWEB)

Feo, John T.; Villa, Oreste; Tumeo, Antonino; Secchi, Simone

2012-02-06

Irregular applications are characterized by irregular data structures, control and communication patterns. Novel irregular high performance applications which deal with large data sets and require have recently appeared. Unfortunately, current high performance systems and software infrastructures execute irregular algorithms poorly. Only coordinated efforts by end users, area specialists and computer scientists that consider both the architecture and the software stack may be able to provide solutions to the challenges of modern irregular applications.
Architecture Students' Perceptions of Their Learning Environment and Their Academic Performance

Science.gov (United States)

Oluwatayo, Adedapo Adewunmi; Aderonmu, Peter A.; Aduwo, Egidario B.

2015-01-01

Scholars have agreed that the way in which students perceive their learning environments influences their academic performance. Empirical studies that focus on architecture students, however, have been very scarce. This study attempts to fill that gap. A questionnaire survey of 273 students in a school of architecture in Nigeria…
From variability tolerance to approximate computing in parallel integrated architectures and accelerators

CERN Document Server

Rahimi, Abbas; Gupta, Rajesh K

2017-01-01

This book focuses on computing devices and their design at various levels to combat variability. The authors provide a review of key concepts with particular emphasis on timing errors caused by various variability sources. They discuss methods to predict and prevent, detect and correct, and finally conditions under which such errors can be accepted; they also consider their implications on cost, performance and quality. Coverage includes a comparative evaluation of methods for deployment across various layers of the system from circuits, architecture, to application software. These can be combined in various ways to achieve specific goals related to observability and controllability of the variability effects, providing means to achieve cross layer or hybrid resilience. · Covers challenges and opportunities in identifying microelectronic variability and the resulting errors at various layers in the system abstraction; · Enables readers to assess how various levels of circuit and system design can mitigate t...
Transitioning ISR architecture into the cloud

Science.gov (United States)

Lash, Thomas D.

2012-06-01

Emerging cloud computing platforms offer an ideal opportunity for Intelligence, Surveillance, and Reconnaissance (ISR) intelligence analysis. Cloud computing platforms help overcome challenges and limitations of traditional ISR architectures. Modern ISR architectures can benefit from examining commercial cloud applications, especially as they relate to user experience, usage profiling, and transformational business models. This paper outlines legacy ISR architectures and their limitations, presents an overview of cloud technologies and their applications to the ISR intelligence mission, and presents an idealized ISR architecture implemented with cloud computing.
Scientific Computing Kernels on the Cell Processor

Energy Technology Data Exchange (ETDEWEB)

Williams, Samuel W.; Shalf, John; Oliker, Leonid; Kamil, Shoaib; Husbands, Parry; Yelick, Katherine

2007-04-04

The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the recently-released STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. First, we introduce a performance model for Cell and apply it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs. The difficulty of programming Cell, which requires assembly level intrinsics for the best performance, makes this model useful as an initial step in algorithm design and evaluation. Next, we validate the accuracy of our model by comparing results against published hardware results, as well as our own implementations on a 3.2GHz Cell blade. Additionally, we compare Cell performance to benchmarks run on leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1E) architectures. Our work also explores several different mappings of the kernels and demonstrates a simple and effective programming model for Cell's unique architecture. Finally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
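To give a feel for what a kernel-level performance model can look like, here is a minimal sketch of a generic compute-versus-bandwidth bound. It is not the performance model introduced in the paper; the function name `bound_time_s`, the machine peaks, and the traffic estimates are all hypothetical assumptions chosen for illustration.

```python
def bound_time_s(flops, bytes_moved, peak_flops_per_s, peak_bytes_per_s):
    """Lower bound on kernel time, assuming compute and memory traffic overlap perfectly."""
    compute_s = flops / peak_flops_per_s
    memory_s = bytes_moved / peak_bytes_per_s
    limiter = "compute" if compute_s >= memory_s else "memory"
    return max(compute_s, memory_s), limiter

# Hypothetical machine: 10 GFLOP/s peak and 20 GB/s memory bandwidth (not Cell figures).
PEAK_FLOPS = 10e9
PEAK_BYTES = 20e9

# Dense matrix multiply, n x n doubles: 2*n^3 flops, three matrices moved once each.
n = 1024
dense = bound_time_s(2 * n**3, 3 * n * n * 8, PEAK_FLOPS, PEAK_BYTES)

# Sparse matrix-vector multiply: 2 flops per nonzero, about 12 bytes per nonzero
# (8-byte value plus 4-byte column index), ignoring vector traffic.
nnz = 1_000_000
sparse = bound_time_s(2 * nnz, 12 * nnz, PEAK_FLOPS, PEAK_BYTES)

print(dense, sparse)
```

Under these assumed numbers the dense kernel is limited by arithmetic and the sparse kernel by memory bandwidth, which is the kind of distinction a model of this sort is meant to expose before any code is tuned.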

Missile signal processing common computer architecture for rapid technology upgrade

Science.gov (United States)

Rabinkin, Daniel V.; Rutledge, Edward; Monticciolo, Paul

2004-10-01

Interceptor missiles process IR images to locate an intended target and guide the interceptor towards it. Signal processing requirements have increased as the sensor bandwidth increases and interceptors operate against more sophisticated targets. A typical interceptor signal processing chain is comprised of two parts. Front-end video processing operates on all pixels of the image and performs such operations as non-uniformity correction (NUC), image stabilization, frame integration and detection. Back-end target processing, which tracks and classifies targets detected in the image, performs such algorithms as Kalman tracking, spectral feature extraction and target discrimination. In the past, video processing was implemented using ASIC components or FPGAs because computation requirements exceeded the throughput of general-purpose processors. Target processing was performed using hybrid architectures that included ASICs, DSPs and general-purpose processors. The resulting systems tended to be function-specific, and required custom software development. They were developed using non-integrated toolsets and test equipment was developed along with the processor platform. The lifespan of a system utilizing the signal processing platform often spans decades, while the specialized nature of processor hardware and software makes it difficult and costly to upgrade. As a result, the signal processing systems often run on outdated technology, algorithms are difficult to update, and system effectiveness is impaired by the inability to rapidly respond to new threats. A new design approach is made possible by three developments: Moore's Law-driven improvement in computational throughput; a newly introduced vector computing capability in general purpose processors; and a modern set of open interface software standards. Today's multiprocessor commercial-off-the-shelf (COTS) platforms have sufficient throughput to support interceptor signal processing requirements. This application
Measurements of the LHCb software stack on the ARM architecture

International Nuclear Information System (INIS)

Kartik, S Vijay; Couturier, Ben; Clemencic, Marco; Neufeld, Niko

2014-01-01

The ARM architecture is a power-efficient design that is used in most processors in mobile devices all around the world today, since it provides reasonable compute performance per watt. The current LHCb software stack is designed (and thus expected) to build and run on machines with the x86/x86_64 architecture. This paper outlines the process of measuring the performance of the LHCb software stack on the ARM architecture (specifically, the ARMv7 architecture on Cortex-A9 processors from NVIDIA and on full-fledged ARM servers with chipsets from Calxeda) and makes comparisons with the performance on x86_64 architectures on the Intel Xeon L5520/X5650 and AMD Opteron 6272. The paper emphasises the aspects of performance per core with respect to the power drawn by the compute nodes for the given performance; this ensures a fair real-world comparison with much more 'powerful' Intel/AMD processors. The comparisons of these real workloads in the context of LHCb are also complemented with the standard synthetic benchmarks HEPSPEC and Coremark. The pitfalls and solutions for the non-trivial task of porting the source code to build for the ARMv7 instruction set are presented. The specific changes in the build process needed for ARM-specific portions of the software stack are described, to serve as pointers for further attempts taken up by other groups in this direction. Cases where architecture-specific tweaks at the assembler level (both in ROOT and the LHCb software stack) were needed for a successful compile are detailed; these cases are good indicators of where/how the software stack as well as the build system can be made more portable and multi-arch friendly. The experience gained from the tasks described in this paper is intended to i) assist in making an informed choice about ARM-based server solutions as a feasible low-power alternative to the current compute nodes, and ii) revisit the software design and build system for portability and
FPGA Based Intelligent Co-operative Processor in Memory Architecture

DEFF Research Database (Denmark)

Ahmed, Zaki; Sotudeh, Reza; Hussain, Dil Muhammad Akbar

2011-01-01

In a continuing effort to improve computer system performance, Processor-In-Memory (PIM) architecture has emerged as an alternative solution. PIM architecture incorporates computational units and control logic directly on the memory to provide immediate access to the data. To exploit the potential... benefits of PIM, a concept of Co-operative Intelligent Memory (CIM) was developed by the intelligent system group of University of Hertfordshire, based on the previously developed Co-operative Pseudo Intelligent Memory (CPIM). This paper provides an overview on previous works (CPIM, CIM) and realization......
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

Science.gov (United States)

Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

2012-09-01

, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Workflow Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing) involves establishing a distributed environment, where issues of, e.g., remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
Acceleration of FDTD mode solver by high-performance computing techniques.

Science.gov (United States)

Han, Lin; Xi, Yanping; Huang, Wei-Ping

2010-06-21

A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the benchmark finite-difference (FD) eigenmode solver. By taking advantage of the inherent parallel nature of the FDTD algorithm, the mode solver is implemented on graphics processing units (GPUs) using the compute unified device architecture (CUDA). It is demonstrated that the high-performance computing technique leads to significant acceleration of the FDTD mode solver, with more than 30 times improvement in computational efficiency in comparison with the conventional FDTD mode solver running on the CPU of a standard desktop computer. The computational efficiency of the accelerated FDTD method is of the same order of magnitude as that of the standard finite-difference eigenmode solver, and yet it requires much less memory (e.g., less than 10%). Therefore, the new method may serve as an efficient, accurate and robust tool for mode calculation of optical waveguides even when the conventional eigenvalue mode solvers are no longer applicable due to memory limitation.
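The "inherent parallel nature" of FDTD can be seen in a much simpler setting than the paper's solver. The sketch below is a textbook-style 1D leapfrog update in normalized units, written in plain Python; it is not the authors' 2D compact scheme, it omits the matrix pencil method, and the grid size, step count, Courant number, and Gaussian source are arbitrary assumptions.

```python
import math

def fdtd_1d(n_cells=200, n_steps=150, courant=0.5):
    """Leapfrog update of normalized E and H fields on a 1D staggered grid."""
    ez = [0.0] * n_cells
    hy = [0.0] * (n_cells - 1)
    src = n_cells // 2
    for step in range(n_steps):
        # Each H update reads only the two neighboring E values from the previous
        # half step, so every index i can be processed independently.
        for i in range(n_cells - 1):
            hy[i] += courant * (ez[i + 1] - ez[i])
        # Likewise each E update reads only neighboring H values; the two end
        # cells are never written and act as perfectly conducting walls.
        for i in range(1, n_cells - 1):
            ez[i] += courant * (hy[i] - hy[i - 1])
        # Soft Gaussian source added at the center cell.
        ez[src] += math.exp(-((step - 30) / 10.0) ** 2)
    return ez

field = fdtd_1d()
print(max(abs(v) for v in field))
```

Within each half step no cell depends on another cell's new value, so the two inner loops map naturally onto one GPU thread per grid point, which is the property a CUDA implementation exploits.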
Architectural Drawing - an Animate Field

DEFF Research Database (Denmark)

Hougaard, Anna Katrine

2015-01-01

Architectural drawing is changing because architects today draw with computers. Due to this change digital diagrams employed by computational architectural practices are often emphasized as powerful structures of control and organisation in the design process. But there are also diagrams, which do...... ways of directing behaviour of artefacts and living things without controlling this behaviour completely. I analyse a musical composition by John Cage as an example of a sketch diagram, and then hypothesize that orthogonal, architectural drawing can work in similar ways. Thereby I hope to point out...... important affordance of architectural drawing as a hybrid between the openness of hand-sketching and the rule-based-ness of diagramming, an affordance which might be useful in the migrational zone of current architectural drawing where traditional hand drawing techniques and computer drawing techniques...
The CEBAF [Continuous Electron Beam Accelerator Facility] control system architecture

International Nuclear Information System (INIS)

Bork, R.

1987-01-01

The focus of this paper is on CEBAF's computer control system. This control system will utilize computers in a distributed, networked configuration. The architecture, networking and operating system of the computers, and preliminary performance data are presented. We will also discuss the design of the operator consoles and the interfacing between the computers and CEBAF's instrumentation and operating equipment
Control system architecture: The standard and non-standard models

International Nuclear Information System (INIS)

Thuot, M.E.; Dalesio, L.R.

1993-01-01

Control system architecture development has followed the advances in computer technology through mainframes to minicomputers to micros and workstations. This technology advance and increasingly challenging accelerator data acquisition and automation requirements have driven control system architecture development. In summarizing the progress of control system architecture at the last International Conference on Accelerator and Large Experimental Physics Control Systems (ICALEPCS), B. Kuiper asserted that the system architecture issue was resolved and presented a "standard model". The "standard model" consists of a local area network (Ethernet or FDDI) providing communication between front end microcomputers, connected to the accelerator, and workstations, providing the operator interface and computational support. Although this model represents many present designs, there are exceptions including reflected memory and hierarchical architectures driven by requirements for widely dispersed, large channel count or tightly coupled systems. This paper describes the performance characteristics and features of the "standard model" to determine if the requirements of "non-standard" architectures can be met. Several possible extensions to the "standard model" are suggested including software as well as the hardware architectural features
NETRA: A parallel architecture for integrated vision systems 2: Algorithms and performance evaluation

Science.gov (United States)

Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra

1989-01-01

In part 1 architecture of NETRA is presented. A performance evaluation of NETRA using several common vision algorithms is also presented. Performance of algorithms when they are mapped on one cluster is described. It is shown that SIMD, MIMD, and systolic algorithms can be easily mapped onto processor clusters, and almost linear speedups are possible. For some algorithms, analytical performance results are compared with implementation performance results. It is observed that the analysis is very accurate. Performance analysis of parallel algorithms when mapped across clusters is presented. Mappings across clusters illustrate the importance and use of shared as well as distributed memory in achieving high performance. The parameters for evaluation are derived from the characteristics of the parallel algorithms, and these parameters are used to evaluate the alternative communication strategies in NETRA. Furthermore, the effect of communication interference from other processors in the system on the execution of an algorithm is studied. Using the analysis, performance of many algorithms with different characteristics is presented. It is observed that if communication speeds are matched with the computation speeds, good speedups are possible when algorithms are mapped across clusters.
Selecting an Architecture for a Safety-Critical Distributed Computer System with Power, Weight and Cost Considerations

Science.gov (United States)

Torres-Pomales, Wilfredo

2014-01-01

This report presents an example of the application of multi-criteria decision analysis to the selection of an architecture for a safety-critical distributed computer system. The design problem includes constraints on minimum system availability and integrity, and the decision is based on the optimal balance of power, weight and cost. The analysis process includes the generation of alternative architectures, evaluation of individual decision criteria, and the selection of an alternative based on overall value. In the example presented here, iterative application of the quantitative evaluation process made it possible to deliberately generate an alternative architecture that is superior to all others regardless of the relative importance of cost.
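The selection process described in this record can be sketched roughly as follows. This is a minimal illustration, not the report's method: the `Architecture` fields, the three candidate designs, their numbers and the weights are all hypothetical, the integrity constraint is omitted, and a simple weighted sum of min-max normalized criteria stands in for whatever value model the report uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Architecture:
    name: str
    availability: float  # fraction of time the system is operational
    power_w: float
    weight_kg: float
    cost_usd: float

def value(a, candidates, weights):
    """Weighted sum of normalized criteria; lower raw power/weight/cost scores higher."""
    total = 0.0
    for attr, w in weights.items():
        vals = [getattr(c, attr) for c in candidates]
        lo, hi = min(vals), max(vals)
        score = 1.0 if hi == lo else (hi - getattr(a, attr)) / (hi - lo)
        total += w * score
    return total

def select(candidates, weights, min_availability):
    # Step 1: discard alternatives that violate the availability constraint.
    ok = [c for c in candidates if c.availability >= min_availability]
    # Step 2: rank the remaining alternatives by overall value.
    return max(ok, key=lambda c: value(c, ok, weights))

# Hypothetical alternatives and weights, for illustration only.
candidates = [
    Architecture("dual-bus", 0.9990, 120.0, 14.0, 50_000.0),
    Architecture("triplex", 0.99999, 150.0, 18.0, 80_000.0),
    Architecture("quad-mesh", 0.999995, 140.0, 16.0, 95_000.0),
]
weights = {"power_w": 0.4, "weight_kg": 0.3, "cost_usd": 0.3}
best = select(candidates, weights, min_availability=0.9999)
print(best.name)
```

Re-running `select` with different weight sets is one way to check whether a single alternative stays on top regardless of how much cost matters, which is the kind of robustness the abstract reports.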
Integrating acoustic analysis in the architectural design process using parametric modelling

DEFF Research Database (Denmark)

Peters, Brady

2011-01-01

This paper discusses how parametric modeling techniques can be used to provide architectural designers with a better understanding of the acoustic performance of their designs and provide acoustic engineers with models that can be analyzed using computational acoustic analysis software. Architects......, acoustic performance can inform the geometry and material logic of the design. In this way, the architectural design and the acoustic analysis model become linked....
Design and Training of Limited-Interconnect Architectures

Science.gov (United States)

1991-07-16

and signal processing. Neuromorphic (brain-like) models allow an alternative for achieving real-time operation for such tasks, while having a...compact and robust architecture. Neuromorphic models consist of interconnections of simple computational nodes. In this approach, each node computes a...operational performance. II. Research Objectives The research objectives were: 1. Development of on-chip local training rules specifically designed for
Prospective Architectures for Onboard vs Cloud-Based Decision Making for Unmanned Aerial Systems

Science.gov (United States)

Sankararaman, Shankar; Teubert, Christopher

2017-01-01

This paper investigates prospective architectures for decision-making in unmanned aerial systems. When these unmanned vehicles operate in urban environments, there are several sources of uncertainty that affect their behavior, and decision-making algorithms need to be robust to account for these different sources of uncertainty. It is important to account for several risk-factors that affect the flight of these unmanned systems, and facilitate decision-making by taking into consideration these various risk-factors. In addition, there are several technical challenges related to autonomous flight of unmanned aerial systems; these challenges include sensing, obstacle detection, path planning and navigation, trajectory generation and selection, etc. Many of these activities require significant computational power and in many situations, all of these activities need to be performed in real-time. In order to efficiently integrate these activities, it is important to develop a systematic architecture that can facilitate real-time decision-making. Four prospective architectures are discussed in this paper; on one end of the spectrum, the first architecture considers all activities/computations being performed onboard the vehicle whereas on the other end of the spectrum, the fourth and final architecture considers all activities/computations being performed in the cloud, using a new service known as Prognostics as a Service that is being developed at NASA Ames Research Center. The four different architectures are compared, their advantages and disadvantages are explained and conclusions are presented.
The Sentinel-4 detectors: architecture and performance

Science.gov (United States)

Skegg, Michael P.; Hermsen, Markus; Hohn, Rüdiger; Williges, Christian; Woffinden, Charles; Levillain, Yves; Reulke, Ralf

2017-09-01

The Sentinel-4 instrument is an imaging spectrometer, developed by Airbus under ESA contract in the frame of the joint European Union (EU)/ESA COPERNICUS program. Sentinel-4 will provide accurate measurements of trace gases from geostationary orbit, including key atmospheric constituents such as ozone, nitrogen dioxide, sulfur dioxide, formaldehyde, as well as aerosol and cloud properties. Key to achieving these atmospheric measurements are the two CCD detectors, covering the wavelengths in the ranges 305 nm to 500 nm (UVVIS) and 750 to 775 nm (NIR) respectively. The paper describes the architecture and operation of these two CCD detectors, which have an unusually high full-well capacity and a very specific architecture and read-out sequence to match the requirements of the Sentinel-4 instrument. The key performance aspects and their verification through measurement are presented, with a focus on an unusual, bi-modal dark signal generation rate observed during test.
An FPGA-Based Quantum Computing Emulation Framework Based on Serial-Parallel Architecture

Directory of Open Access Journals (Sweden)

Y. H. Lee

2016-01-01

Hardware emulation of quantum systems can mimic the parallel behaviour of quantum computations more efficiently, thus allowing higher processing speed-up than software simulations. In this paper, an efficient hardware emulation method that employs a serial-parallel hardware architecture targeted for field programmable gate arrays (FPGAs) is proposed. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms. Experimental work shows that, with the proposed emulation architecture, a linear reduction in resource utilization is attained against the pipeline implementations proposed in prior works. The proposed work contributes to the formulation of a proof-of-concept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical large-scale quantum circuits.
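For readers unfamiliar with the second case study, the following is a small software state-vector simulation of Grover's search. It is only a sketch of the algorithm being emulated, not of the paper's serial-parallel FPGA datapath; the function name `grover_search` and the choice of 3 qubits with marked index 5 are illustrative assumptions.

```python
import math

def grover_search(n_qubits, marked):
    """Simulate Grover's search on a real-valued state vector of size 2**n_qubits."""
    size = 2 ** n_qubits
    state = [1.0 / math.sqrt(size)] * size        # uniform superposition
    iterations = int(math.pi / 4 * math.sqrt(size))
    for _ in range(iterations):
        state[marked] = -state[marked]            # oracle: flip the marked amplitude
        mean = sum(state) / size
        state = [2.0 * mean - a for a in state]   # diffusion: inversion about the mean
    return [a * a for a in state]                 # measurement probabilities

probs = grover_search(n_qubits=3, marked=5)
most_likely = max(range(len(probs)), key=probs.__getitem__)
print(most_likely, round(probs[most_likely], 3))
```

Every amplitude update inside one iteration is independent of the others once the mean is known, which is the kind of data parallelism a hardware emulator can exploit; the state vector doubles with each added qubit, which is why resource utilization is a central concern.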
The Potential of the Cell Processor for Scientific Computing

Energy Technology Data Exchange (ETDEWEB)

Williams, Samuel; Shalf, John; Oliker, Leonid; Husbands, Parry; Kamil, Shoaib; Yelick, Katherine

2005-10-14

The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. We are the first to present quantitative Cell performance data on scientific kernels and show direct comparisons against leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1) architectures. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop both analytical models and simulators to predict kernel performance. Our work also explores the complexity of mapping several important scientific algorithms onto the Cell's unique architecture. Additionally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
Human Computer Music Performance

OpenAIRE

Dannenberg, Roger B.

2012-01-01

Human Computer Music Performance (HCMP) is the study of music performance by live human performers and real-time computer-based performers. One goal of HCMP is to create a highly autonomous artificial performer that can fill the role of a human, especially in a popular music setting. This will require advances in automated music listening and understanding, new representations for music, techniques for music synchronization, real-time human-computer communication, music generation, sound synt...
Advanced parallel processing with supercomputer architectures

International Nuclear Information System (INIS)

Hwang, K.

1987-01-01

This paper investigates advanced parallel processing techniques and innovative hardware/software architectures that can be applied to boost the performance of supercomputers. Critical issues on architectural choices, parallel languages, compiling techniques, resource management, concurrency control, programming environment, parallel algorithms, and performance enhancement methods are examined and the best answers are presented. The authors cover advanced processing techniques suitable for supercomputers, high-end mainframes, minisupers, and array processors. The coverage emphasizes vectorization, multitasking, multiprocessing, and distributed computing. In order to achieve these operation modes, parallel languages, smart compilers, synchronization mechanisms, load balancing methods, mapping parallel algorithms, operating system functions, application library, and multidiscipline interactions are investigated to ensure high performance. At the end, they assess the potentials of optical and neural technologies for developing future supercomputers
SNAVA-A real-time multi-FPGA multi-model spiking neural network simulation architecture.

Science.gov (United States)

Sripad, Athul; Sanchez, Giovanny; Zapata, Mireya; Pirrone, Vito; Dorta, Taho; Cambria, Salvatore; Marti, Albert; Krishnamourthy, Karthikeyan; Madrenas, Jordi

2018-01-01

The Spiking Neural Networks (SNN) for Versatile Applications (SNAVA) simulation platform is a scalable and programmable parallel architecture that supports real-time, large-scale, multi-model SNN computation. This parallel architecture is implemented in modern Field-Programmable Gate Array (FPGA) devices to provide high performance execution and flexibility to support large-scale SNN models. Flexibility is defined in terms of programmability, which allows easy synapse and neuron implementation. This has been achieved by using special-purpose Processing Elements (PEs) for computing SNNs, and by analyzing and customizing the instruction set according to the processing needs to achieve maximum performance with minimum resources. The parallel architecture is interfaced with customized Graphical User Interfaces (GUIs) to configure the SNN's connectivity, to compile the neuron-synapse model and to monitor the SNN's activity. Our contribution intends to provide a tool for prototyping SNNs faster than on CPU/GPU architectures but significantly cheaper than fabricating a customized neuromorphic chip. This could be potentially valuable to the computational neuroscience and neuromorphic engineering communities.
Convolutional neural network architectures for predicting DNA–protein binding

Science.gov (United States)

Zeng, Haoyang; Edwards, Matthew D.; Liu, Ge; Gifford, David K.

2016-01-01

Motivation: Convolutional neural networks (CNN) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. Yet inappropriate CNN architectures can yield poorer performance than simpler models. Thus an in-depth understanding of how to match CNN architecture to a given task is needed to fully harness the power of CNNs for computational biology applications. Results: We present a systematic exploration of CNN architectures for predicting DNA sequence binding using a large compendium of transcription factor datasets. We identify the best-performing architectures by varying CNN width, depth and pooling designs. We find that adding convolutional kernels to a network is important for motif-based tasks. We show the benefits of CNNs in learning rich higher-order sequence features, such as secondary motifs and local sequence context, by comparing network performance on multiple modeling tasks ranging in difficulty. We also demonstrate how careful construction of sequence benchmark datasets, using approaches that control potentially confounding effects like positional or motif strength bias, is critical in making fair comparisons between competing methods. We explore how to establish the sufficiency of training data for these learning tasks, and we have created a flexible cloud-based framework that permits the rapid exploration of alternative neural network architectures for problems in computational biology. Availability and Implementation: All the models analyzed are available at http://cnn.csail.mit.edu. Contact: gifford@mit.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307608
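The basic building block explored in this record, a convolutional kernel scanning a one-hot encoded DNA sequence followed by pooling, can be sketched as below. This is a hand-built illustration under stated assumptions: the kernel is fixed by hand to reward a hypothetical "TATA" motif rather than learned, and a single kernel with ReLU and global max pooling stands in for the much larger networks the authors compare.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows, one per base."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d_max(encoded, kernel):
    """Slide one kernel along the sequence, apply ReLU, then global max pooling."""
    width = len(kernel)
    best = 0.0  # ReLU floor: negative activations are clipped to zero
    for start in range(len(encoded) - width + 1):
        act = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, act)
    return best

# Hypothetical kernel: +1 for each base matching "TATA", -1 for each mismatch.
motif = "TATA"
kernel = [[1.0 if b == base else -1.0 for base in BASES] for b in motif]

with_motif = conv1d_max(one_hot("GGCTATAAGC"), kernel)
without_motif = conv1d_max(one_hot("GGCCGGCCGG"), kernel)
print(with_motif, without_motif)
```

Adding more kernels (network width) lets the model score more candidate motifs at once, which is consistent with the abstract's finding that additional convolutional kernels matter for motif-based tasks.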

Contributing to the design of run-time systems dedicated to high performance computing; Contribution a l'elaboration d'environnements de programmation dedies au calcul scientifique hautes performances

Energy Technology Data Exchange (ETDEWEB)

Perache, M

2006-10-15

In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)
A modular architecture for transparent computation in recurrent neural networks.

Science.gov (United States)

Carmantini, Giovanni S; Beim Graben, Peter; Desroches, Mathieu; Rodrigues, Serafim

2017-01-01

Computation is classically studied in terms of automata, formal languages and algorithms; yet, the relation between neural dynamics and symbolic representations and operations is still unclear in traditional eliminative connectionism. Therefore, we suggest a unique perspective on this central issue, to which we would like to refer as transparent connectionism, by proposing accounts of how symbolic computation can be implemented in neural substrates. In this study we first introduce a new model of dynamics on a symbolic space, the versatile shift, showing that it supports the real-time simulation of a range of automata. We then show that the Gödelization of versatile shifts defines nonlinear dynamical automata, dynamical systems evolving on a vectorial space. Finally, we present a mapping between nonlinear dynamical automata and recurrent artificial neural networks. The mapping defines an architecture characterized by its granular modularity, where data, symbolic operations and their control are not only distinguishable in activation space, but also spatially localizable in the network itself, while maintaining a distributed encoding of symbolic representations. The resulting networks simulate automata in real-time and are programmed directly, in the absence of network training. To discuss the unique characteristics of the architecture and their consequences, we present two examples: (i) the design of a Central Pattern Generator from a finite-state locomotive controller, and (ii) the creation of a network simulating a system of interactive automata that supports the parsing of garden-path sentences as investigated in psycholinguistics experiments.
High performance computing environment for multidimensional image analysis.

Science.gov (United States)

Rao, A Ravishankar; Cecchi, Guillermo A; Magnasco, Marcelo

2007-07-10

The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact in microscopy applications. We present a high performance computing (HPC) solution to this problem. This involves decomposing the spatial 3D image into segments that are assigned to unique processors, and matched to the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to the nearest neighbors. When running on a 2 GHz Intel CPU, the task of 3D median filtering on a typical 256 megabyte dataset takes two and a half hours, whereas by using 1024 nodes of Blue Gene, this task can be performed in 18.8 seconds, a 478x speedup. Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large scale experiments with massive datasets.
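The reported 478x figure follows directly from the two timings quoted in the abstract, as the short calculation below shows. The per-node ratio is added only as a rough indication: the baseline is a different processor from a Blue Gene/L node, so it should not be read as a strict parallel efficiency.

```python
serial_seconds = 2.5 * 3600      # two and a half hours on the single 2 GHz CPU
parallel_seconds = 18.8          # reported time on 1024 Blue Gene/L nodes
nodes = 1024

speedup = serial_seconds / parallel_seconds
per_node = speedup / nodes       # caution: baseline CPU differs from a Blue Gene node

print(f"speedup = {speedup:.1f}x, speedup per node = {per_node:.2f}")
```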
A Trusted Computing Architecture of Embedded System Based on Improved TPM

Directory of Open Access Journals (Sweden)

Wang Xiaosheng

2017-01-01

The Trusted Platform Module (TPM) currently used by PCs is not suitable for embedded systems, so the existing TPM needs to be improved. The paper proposes a trusted computing architecture for embedded systems with a new TPM and the cryptographic system developed by China. The improved TPM consists of the Embedded System Trusted Cryptography Module (eTCM) and the Embedded System Trusted Platform Control Module (eTPCM), which together implement the TPM’s autonomous control, active defense, high-speed encryption/decryption and other functions through an internal bus arbitration module and symmetric and asymmetric cryptographic engines, to effectively protect the security of the embedded system. In the improved TPM, a trusted measurement method with a chain model and a star model is used. Finally, the improved TPM is implemented on an FPGA and applied to a trusted PDA for experimental verification. Experiments show that the trusted architecture of the embedded system based on the improved TPM is efficient, reliable and secure.
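The chain model of trusted measurement mentioned above can be illustrated with a hash chain in the style of a TPM register extend operation. This is a sketch under assumptions: SHA-256 from the Python standard library stands in for whichever hash the paper's cryptographic system uses, and the component names are hypothetical. In a star model the root would instead measure each component directly and compare each digest separately.

```python
import hashlib

def extend(register: bytes, component: bytes) -> bytes:
    """Fold one component measurement into the running register (PCR-style extend)."""
    digest = hashlib.sha256(component).digest()
    return hashlib.sha256(register + digest).digest()

def measure_chain(components):
    """Measure components in boot order; the result depends on every link and its position."""
    register = bytes(32)  # the register starts as all zeros
    for blob in components:
        register = extend(register, blob)
    return register

# Hypothetical boot components, for illustration only.
boot = [b"bootloader-image", b"kernel-image", b"application-image"]
golden = measure_chain(boot)
tampered = measure_chain([b"bootloader-image", b"kernel-image-patched", b"application-image"])
print(golden != tampered)
```

Because each step hashes the previous register value, changing or reordering any component changes the final value, so a single comparison against a known-good value covers the whole boot sequence.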
On Architectural Acoustics Design using Computer Simulation

DEFF Research Database (Denmark)

Schmidt, Anne Marie Due; Kirkegaard, Poul Henning

2004-01-01

The acoustical quality of a given building, or space within the building, is highly dependent on the architectural design. Architectural acoustics design has in the past been based on simple design rules. However, with a growing complexity in the architectural acoustic and the emergence of potent...... room acoustic simulation programs it is now possible to subjectively analyze and evaluate acoustic properties prior to the actual construction of a facility. With the right tools applied, the acoustic design can become an integrated part of the architectural design process. The aim of the present paper...... this information is discussed. The conclusion of the paper is that the application of acoustical simulation programs is most beneficial in the last of three phases but that an application of the program to the two first phases would be preferable and possible with an improvement of the interface of the program....
RATS: Reactive Architectures

National Research Council Canada - National Science Library

Christensen, Marc

2004-01-01

This project had two goals: To build an emulation prototype board for a tiled architecture and to demonstrate the utility of a global inter-chip free-space photonic interconnection fabric for polymorphous computer architectures (PCA...
Laboratory infrastructure driven key performance indicator development using the smart grid architecture model

DEFF Research Database (Denmark)

Syed, Mazheruddin H.; Guillo-Sansano, Efren; Blair, Steven M.

2017-01-01

This study presents a methodology for collaboratively designing laboratory experiments and developing key performance indicators for the testing and validation of novel power system control architectures in multiple laboratory environments. The contribution makes use of the smart grid architecture...
From green architecture to architectural green

DEFF Research Database (Denmark)

Earon, Ofri

2011-01-01

that describes the architectural exclusivity of this particular architecture genre. The adjective green expresses architectural qualities differentiating green architecture from non-green architecture. Currently, adding trees and vegetation to the building’s facade is the main architectural characteristics...... they have overshadowed the architectural potential of green architecture. The paper questions how a green space should perform, look like and function. Two examples are chosen to demonstrate thorough integrations between green and space. The examples are public buildings categorized as pavilions. One......The paper investigates the topic of green architecture from an architectural point of view and not an energy point of view. The purpose of the paper is to establish a debate about the architectural language and spatial characteristics of green architecture. In this light, green becomes an adjective...
BLAST in Grid (BiG): A Grid-Enabled Software Architecture and Implementation of Parallel and Sequential BLAST

International Nuclear Information System (INIS)

Aparicio, G.; Blanquer, I.; Hernandez, V.; Segrelles, D.

2007-01-01

The integration of high-performance computing tools is a key issue in biomedical research. Many computer-based applications, such as BLAST, have been migrated to high-performance computers to meet their computing and storage needs. However, the use of clusters and computing farms presents scalability problems. A higher layer of parallelism that splits the task into highly independent long jobs that can be executed in parallel can improve performance while maintaining efficiency. Grid technologies combined with parallel computing resources are an important enabling technology. This work presents a software architecture for executing BLAST in an international Grid infrastructure that guarantees security, scalability and fault tolerance. The software architecture is modular and adaptable to many other high-throughput applications, both inside the field of biocomputing and outside. (Author)
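The "higher layer of parallelism" described in this record amounts to partitioning the query set into independent jobs. A minimal sketch of that partitioning step is shown below; the function name `split_fasta`, the round-robin policy and the sample sequences are assumptions for illustration, and job submission to the Grid is left out entirely.

```python
def split_fasta(text, jobs):
    """Partition FASTA records into at most `jobs` independent chunks, round-robin."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):          # a header line starts a new record
            records.append([line])
        elif records and line.strip():    # sequence lines belong to the last record
            records[-1].append(line)
    chunks = [[] for _ in range(jobs)]
    for i, rec in enumerate(records):
        chunks[i % jobs].append("\n".join(rec))
    return ["\n".join(c) + "\n" for c in chunks if c]

# Hypothetical query set with three short sequences.
queries = ">q1\nACGT\n>q2\nGGCC\n>q3\nTTAA\n"
chunks = split_fasta(queries, jobs=2)
for n, chunk in enumerate(chunks):
    print(f"job {n}:\n{chunk}", end="")
```

Each chunk can then be searched against the database without communicating with the others, and the per-chunk results concatenated afterwards; balancing chunks by total sequence length rather than record count would be a natural refinement.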
Ontology Design for Solving Computationally-Intensive Problems on Heterogeneous Architectures

Directory of Open Access Journals (Sweden)

Hossam M. Faheem

2018-02-01

Viewing a computationally-intensive problem as a self-contained challenge with its own hardware, software and scheduling strategies is an approach that should be investigated. We might suggest assigning heterogeneous hardware architectures to solve a problem, while parallel computing paradigms may play an important role in writing efficient code to solve the problem; moreover, the scheduling strategies may be examined as a possible solution. Depending on the problem complexity, finding the best possible solution using an integrated infrastructure of hardware, software and scheduling strategy can be a complex job. Developing and using ontologies and reasoning techniques play a significant role in reducing the complexity of identifying the components of such integrated infrastructures. Undertaking reasoning and inferencing regarding the domain concepts can help to find the best possible solution through a combination of hardware, software and scheduling strategies. In this paper, we present an ontology and show how we can use it to solve computationally-intensive problems from various domains. As a potential use for the idea, we present examples from the bioinformatics domain. Validation by using problems from the Elastic Optical Network domain has demonstrated the flexibility of the suggested ontology and its suitability for use with any other computationally-intensive problem domain.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

Science.gov (United States)

2017-10-04

to the memory architectures of CPUs and GPUs to obtain good performance and result in good memory performance using cache management. These methods ...Accomplishments: The PI and students has developed new methods for path and ray tracing and their Report Date: 14-Oct-2017 INVESTIGATOR(S): Phone...The efficiency of our method makes it a good candidate for forming hybrid schemes with wave-based models. One possibility is to couple the ray curve
Unstructured Computational Aerodynamics on Many Integrated Core Architecture

KAUST Repository

Al Farhan, Mohammed A.

2016-06-08

Shared memory parallelization of the flux kernel of PETSc-FUN3D, an unstructured tetrahedral mesh Euler flow code previously studied for distributed memory and multi-core shared memory, is evaluated on up to 61 cores per node and up to 4 threads per core. We explore several thread-level optimizations to improve flux kernel performance on the state-of-the-art many integrated core (MIC) Intel processor Xeon Phi “Knights Corner,” with a focus on strong thread scaling. While the linear algebraic kernel is bottlenecked by memory bandwidth for even modest numbers of cores sharing a common memory, the flux kernel, which arises in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing the conservation law residuals, is compute-intensive and is known to exploit effectively contemporary multi-core hardware. We extend study of the performance of the flux kernel to the Xeon Phi in three thread affinity modes, namely scatter, compact, and balanced, in both offload and native mode, with and without various code optimizations to improve alignment and reduce cache coherency penalties. Relative to baseline “out-of-the-box” optimized compilation, code restructuring optimizations provide about 3.8x speedup using the offload mode and about 5x speedup using the native mode. Even with these gains for the flux kernel, with respect to execution time the MIC simply achieves par with optimized compilation on a contemporary multi-core Intel CPU, the 16-core Sandy Bridge E5 2670. Nevertheless, the optimizations employed to reduce the data motion and cache coherency protocol penalties of the MIC are expected to be of value for CFD and many other unstructured applications as many-core architecture evolves. We explore large-scale distributed-shared memory performance on the Cray XC40 supercomputer, to demonstrate that optimizations employed on Phi hybridize to this context, where each of
Unstructured Computational Aerodynamics on Many Integrated Core Architecture

KAUST Repository

Al Farhan, Mohammed A.; Kaushik, Dinesh K.; Keyes, David E.

2016-01-01

Shared memory parallelization of the flux kernel of PETSc-FUN3D, an unstructured tetrahedral mesh Euler flow code previously studied for distributed memory and multi-core shared memory, is evaluated on up to 61 cores per node and up to 4 threads per core. We explore several thread-level optimizations to improve flux kernel performance on the state-of-the-art many integrated core (MIC) Intel processor Xeon Phi “Knights Corner,” with a focus on strong thread scaling. While the linear algebraic kernel is bottlenecked by memory bandwidth for even modest numbers of cores sharing a common memory, the flux kernel, which arises in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing the conservation law residuals, is compute-intensive and is known to exploit effectively contemporary multi-core hardware. We extend study of the performance of the flux kernel to the Xeon Phi in three thread affinity modes, namely scatter, compact, and balanced, in both offload and native mode, with and without various code optimizations to improve alignment and reduce cache coherency penalties. Relative to baseline “out-of-the-box” optimized compilation, code restructuring optimizations provide about 3.8x speedup using the offload mode and about 5x speedup using the native mode. Even with these gains for the flux kernel, with respect to execution time the MIC simply achieves par with optimized compilation on a contemporary multi-core Intel CPU, the 16-core Sandy Bridge E5 2670. Nevertheless, the optimizations employed to reduce the data motion and cache coherency protocol penalties of the MIC are expected to be of value for CFD and many other unstructured applications as many-core architecture evolves. We explore large-scale distributed-shared memory performance on the Cray XC40 supercomputer, to demonstrate that optimizations employed on Phi hybridize to this context, where each of
Selection of an optimal neural network architecture for computer-aided detection of microcalcifications - Comparison of automated optimization techniques

International Nuclear Information System (INIS)

Gurcan, Metin N.; Sahiner, Berkman; Chan Heangping; Hadjiiski, Lubomir; Petrick, Nicholas

2001-01-01

Many computer-aided diagnosis (CAD) systems use neural networks (NNs) for either detection or classification of abnormalities. Currently, most NNs are 'optimized' by manual search in a very limited parameter space. In this work, we evaluated the use of automated optimization methods for selecting an optimal convolution neural network (CNN) architecture. Three automated methods, the steepest descent (SD), the simulated annealing (SA), and the genetic algorithm (GA), were compared. We used as an example the CNN that classifies true and false microcalcifications detected on digitized mammograms by a prescreening algorithm. Four parameters of the CNN architecture were considered for optimization, the numbers of node groups and the filter kernel sizes in the first and second hidden layers, resulting in a search space of 432 possible architectures. The area A_z under the receiver operating characteristic (ROC) curve was used to design a cost function. The SA experiments were conducted with four different annealing schedules. Three different parent selection methods were compared for the GA experiments. An available data set was split into two groups with approximately equal number of samples. By using the two groups alternately for training and testing, two different cost surfaces were evaluated. For the first cost surface, the SD method was trapped in a local minimum 91% (392/432) of the time. The SA using the Boltzmann schedule selected the best architecture after evaluating, on average, 167 architectures. The GA achieved its best performance with linearly scaled roulette-wheel parent selection; however, it evaluated 391 different architectures, on average, to find the best one. The second cost surface contained no local minimum. For this surface, a simple SD algorithm could quickly find the global minimum, but the SA with the very fast reannealing schedule was still the most efficient. The same SA scheme, however, was trapped in a local minimum on the first cost
Cloud/Fog Computing System Architecture and Key Technologies for South-North Water Transfer Project Safety

Directory of Open Access Journals (Sweden)

Yaoling Fan

2018-01-01

In view of the real-time and distributed features of the Internet of Things (IoT) safety system in water conservancy engineering, this study proposed a new safety system architecture for water conservancy engineering based on cloud/fog computing and put forward a method of data reliability detection to address false alarms caused by false abnormal data from the bottom sensors. Designed for the South-North Water Transfer Project (SNWTP), the architecture integrated project safety, water quality safety, and human safety. Using IoT devices, a fog computing layer was constructed between the cloud server and the safety detection devices in water conservancy projects. Technologies such as real-time sensing, intelligent processing, and information interconnection were developed. As a result, accurate forecasting, accurate positioning, and efficient management were implemented as required by safety prevention of the SNWTP, safety protection of water conservancy projects was effectively improved, and intelligent water conservancy engineering was developed.
High Performance Numerical Computing for High Energy Physics: A New Challenge for Big Data Science

International Nuclear Information System (INIS)

Pop, Florin

2014-01-01

Modern physics is based on both theoretical analysis and experimental validation. Complex scenarios like subatomic dimensions, high energy, and lower absolute temperature are frontiers for many theoretical models. Simulation with stable numerical methods represents an excellent instrument for high accuracy analysis, experimental validation, and visualization. High performance computing support makes it possible to run simulations at large scale, in parallel, but the volume of data generated by these experiments creates a new challenge for Big Data Science. This paper presents existing computational methods for high energy physics (HEP) analyzed from two perspectives: numerical methods and high performance computing. The computational methods presented are Monte Carlo methods and simulations of HEP processes, Markovian Monte Carlo, unfolding methods in particle physics, kernel estimation in HEP, and Random Matrix Theory used in the analysis of particle spectra. All of these methods produce data-intensive applications, which introduce new challenges and requirements for ICT systems architecture, programming paradigms, and storage capabilities.
LISA Mission and System architectures and performances

International Nuclear Information System (INIS)

Gath, Peter F; Weise, Dennis; Schulte, Hans-Reiner; Johann, Ulrich

2009-01-01

In the context of the LISA Mission Formulation Study, the LISA System was studied in detail and a new baseline architecture for the whole mission was established. This new baseline is the result of trade-offs at both mission and system level. The paper gives an overview of the different mission scenarios and configurations that were studied in connection with their corresponding advantages and disadvantages as well as performance estimates. Differences in the required technologies and their influence on the overall performance budgets are highlighted for all configurations. For the selected baseline concept, a more detailed description of the configuration is given and open issues in the technologies involved are discussed.
LISA Mission and System architectures and performances

Energy Technology Data Exchange (ETDEWEB)

Gath, Peter F; Weise, Dennis; Schulte, Hans-Reiner; Johann, Ulrich, E-mail: peter.gath@astrium.eads.ne [Astrium GmbH Satellites, 88039 Friedrichshafen (Germany)

2009-03-01

In the context of the LISA Mission Formulation Study, the LISA System was studied in detail and a new baseline architecture for the whole mission was established. This new baseline is the result of trade-offs at both mission and system level. The paper gives an overview of the different mission scenarios and configurations that were studied in connection with their corresponding advantages and disadvantages as well as performance estimates. Differences in the required technologies and their influence on the overall performance budgets are highlighted for all configurations. For the selected baseline concept, a more detailed description of the configuration is given and open issues in the technologies involved are discussed.
Analysis of mobile fronthaul bandwidth and wireless transmission performance in split-PHY processing architecture.

Science.gov (United States)

Miyamoto, Kenji; Kuwano, Shigeru; Terada, Jun; Otaka, Akihiro

2016-01-25

We analyze the mobile fronthaul (MFH) bandwidth and the wireless transmission performance in the split-PHY processing (SPP) architecture, which redefines the functional split of centralized/cloud RAN (C-RAN) while preserving high wireless coordinated multi-point (CoMP) transmission/reception performance. The SPP architecture splits the base stations (BS) functions between wireless channel coding/decoding and wireless modulation/demodulation, and employs its own CoMP joint transmission and reception schemes. Simulation results show that the SPP architecture reduces the MFH bandwidth by up to 97% from conventional C-RAN while matching the wireless bit error rate (BER) performance of conventional C-RAN in uplink joint reception with only 2-dB signal to noise ratio (SNR) penalty.

Integration of highly probabilistic sources into optical quantum architectures: perpetual quantum computation

International Nuclear Information System (INIS)

Devitt, Simon J; Stephens, Ashley M; Munro, William J; Nemoto, Kae

2011-01-01

In this paper, we introduce a design for an optical topological cluster state computer constructed exclusively from a single quantum component. Unlike previous efforts we eliminate the need for on demand, high fidelity photon sources and detectors and replace them with the same device utilized to create photon/photon entanglement. This introduces highly probabilistic elements into the optical architecture while maintaining complete specificity of the structure and operation for a large-scale computer. Photons in this system are continually recycled back into the preparation network, allowing for an arbitrarily deep three-dimensional cluster to be prepared using a comparatively small number of photonic qubits and consequently the elimination of high-frequency, deterministic photon sources.
Reconfigurable computing the theory and practice of FPGA-based computation

CERN Document Server

Hauck, Scott

2010-01-01

Reconfigurable Computing marks a revolutionary and hot topic that bridges the gap between the separate worlds of hardware and software design. The key feature of reconfigurable computing is its groundbreaking ability to perform computations in hardware to increase performance while retaining the flexibility of a software solution. Reconfigurable computers serve as affordable, fast, and accurate tools for developing designs ranging from single chip architectures to multi-chip and embedded systems. Scott Hauck and Andre DeHon have assembled a group of the key experts in the fields of both hardwa
Evaluation of a server-client architecture for accelerator modeling and simulation

International Nuclear Information System (INIS)

Bowling, B.A.; Akers, W.; Shoaee, H.; Watson, W.; Zeijts, J. van; Witherspoon, S.

1997-01-01

Traditional approaches to computational modeling and simulation often utilize a batch method for code execution using file-formatted input/output. This method of code implementation was generally chosen for several factors, including CPU throughput and availability, complexity of the required modeling problem, and presentation of computation results. With the advent of faster computer hardware and the advances in networking and software techniques, other program architectures for accelerator modeling have recently been employed. Jefferson Laboratory has implemented a client/server solution for accelerator beam transport modeling utilizing a query-based I/O. The goal of this code is to provide modeling information for control system applications and to serve as a computation engine for general modeling tasks, such as machine studies. This paper performs a comparison between the batch execution and server/client architectures, focusing on design and implementation issues, performance, and general utility towards accelerator modeling demands
Fault-tolerant architectures for superconducting qubits

International Nuclear Information System (INIS)

DiVincenzo, David P

2009-01-01

In this short review, I draw attention to new developments in the theory of fault tolerance in quantum computation that may give concrete direction to future work in the development of superconducting qubit systems. The basics of quantum error-correction codes, which I will briefly review, have not significantly changed since their introduction 15 years ago. But an interesting picture has emerged of an efficient use of these codes that may put fault-tolerant operation within reach. It is now understood that two-dimensional surface codes, close relatives of the original toric code of Kitaev, can be adapted as shown by Raussendorf and Harrington to effectively perform logical gate operations in a very simple planar architecture, with error thresholds for fault-tolerant operation simulated to be 0.75%. This architecture uses topological ideas in its functioning, but it is not 'topological quantum computation'; there are no non-abelian anyons in sight. I offer some speculations on the crucial pieces of superconducting hardware that could be demonstrated in the next couple of years that would be clear stepping stones towards this surface-code architecture.
Platform Architecture for Decentralized Positioning Systems

Directory of Open Access Journals (Sweden)

Zakaria Kasmi

2017-04-01

A platform architecture for positioning systems is essential for the realization of a flexible localization system, which interacts with other systems and supports various positioning technologies and algorithms. Decentralized processing of a position pushes the application-level knowledge into a mobile station and avoids communication with a central unit such as a server or a base station. In addition, calculating the position on low-cost and resource-constrained devices presents a challenge due to their limited computing power, storage capacity, and power supply. Therefore, we propose a platform architecture that enables the design of a system with reusability of the components, extensibility (e.g., with other positioning technologies), and interoperability. Furthermore, the position is computed on a low-cost device such as a microcontroller, which simultaneously performs additional tasks such as data collecting or preprocessing based on an operating system. The platform architecture is designed, implemented and evaluated on the basis of two positioning systems: a field strength system and a time of arrival-based positioning system.
Distributed computing environments for future space control systems

Science.gov (United States)

Viallefont, Pierre

1993-01-01

The aim of this paper is to present the results of a CNES research project on distributed computing systems. The purpose of this research was to study the impact of the use of new computer technologies in the design and development of future space applications. The first part of this study was a state-of-the-art review of distributed computing systems. One of the interesting ideas arising from this review is the concept of a 'virtual computer' allowing the distributed hardware architecture to be hidden from a software application. The 'virtual computer' can improve system performance by adapting the best architecture (addition of computers) to the software application without having to modify its source code. This concept can also decrease the cost and obsolescence of the hardware architecture. In order to verify the feasibility of the 'virtual computer' concept, a prototype representative of a distributed space application is being developed independently of the hardware architecture.
Computer-Related Task Performance

DEFF Research Database (Denmark)

Longstreet, Phil; Xiao, Xiao; Sarker, Saonee

2016-01-01

The existing information system (IS) literature has acknowledged computer self-efficacy (CSE) as an important factor contributing to enhancements in computer-related task performance. However, the empirical results of CSE on performance have not always been consistent, and increasing an individual's CSE is often a cumbersome process. Thus, we introduce the theoretical concept of self-prophecy (SP) and examine how this social influence strategy can be used to improve computer-related task performance. Two experiments are conducted to examine the influence of SP on task performance. Results show that SP and CSE interact to influence performance. Implications are then discussed in terms of organizations’ ability to increase performance.
FPGA-accelerated simulation of computer systems

CERN Document Server

Angepat, Hari; Chung, Eric S; Hoe, James C

2014-01-01

To date, the most common simulators of computer systems are software-based and run on standard computers. One promising approach to improve simulation performance is to apply hardware, specifically reconfigurable hardware in the form of field programmable gate arrays (FPGAs). This manuscript describes various approaches to using FPGAs to accelerate software-implemented simulation of computer systems and selected simulators that incorporate those techniques. More precisely, we describe a simulation architecture taxonomy that incorporates a simulation architecture specifically designed f
Error Resilience in Current Distributed Video Coding Architectures

Directory of Open Access Journals (Sweden)

Tonoli Claudia

2009-01-01

In distributed video coding, signal prediction is shifted to the decoder side, which places most of the computational complexity burden on the receiver. Moreover, since no prediction loop exists before transmission, an intrinsic robustness to transmission errors has been claimed. This work evaluates and compares the error resilience performance of two distributed video coding architectures. In particular, we have considered a video codec based on the Stanford architecture (DISCOVER codec) and a video codec based on the PRISM architecture. Specifically, an accurate temporal and rate/distortion based evaluation of the effects of the transmission errors for both the considered DVC architectures has been performed and discussed. These approaches have also been compared with H.264/AVC, both with no error protection and with simple FEC error protection. Our evaluations have highlighted in all cases a strong dependence of the behavior of the various codecs on the content of the considered video sequence. In particular, PRISM seems to be particularly well suited for low-motion sequences, whereas DISCOVER provides better performance in the other cases.
Design and simulation of parallel and distributed architectures for images processing

International Nuclear Information System (INIS)

Pirson, Alain

1990-01-01

The exploitation of visual information requires special computers. The diversity of operations and the computing power involved bring about structures founded on the concepts of concurrency and distributed processing. This work identifies a vision computer with an association of dedicated intelligent entities, exchanging messages according to the model of parallelism introduced by the language Occam. It puts forward an architecture of the 'enriched processor network' type. It consists of a classical multiprocessor structure where each node is provided with specific devices. These devices perform processing tasks as well as inter-node dialogues. Such an architecture benefits from the homogeneity of multiprocessor networks and the power of dedicated resources. Its implementation corresponds to that of a distributed structure, tasks being allocated to each computing element. This approach culminates in an original architecture called ATILA. This modular structure is based on a transputer network supplied with vision-dedicated co-processors and powerful communication devices.
CSP: A Multifaceted Hybrid Architecture for Space Computing

Science.gov (United States)

Rudolph, Dylan; Wilson, Christopher; Stewart, Jacob; Gauvin, Patrick; George, Alan; Lam, Herman; Crum, Gary Alex; Wirthlin, Mike; Wilson, Alex; Stoddard, Aaron

2014-01-01

Research on the CHREC Space Processor (CSP) takes a multifaceted hybrid approach to embedded space computing. Working closely with the NASA Goddard SpaceCube team, researchers at the National Science Foundation (NSF) Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida and Brigham Young University are developing hybrid space computers that feature an innovative combination of three technologies: commercial-off-the-shelf (COTS) devices, radiation-hardened (RadHard) devices, and fault-tolerant computing. Modern COTS processors provide the utmost in performance and energy-efficiency but are susceptible to ionizing radiation in space, whereas RadHard processors are virtually immune to this radiation but are more expensive, larger, less energy-efficient, and generations behind in speed and functionality. By featuring COTS devices to perform the critical data processing, supported by simpler RadHard devices that monitor and manage the COTS devices, and augmented with novel uses of fault-tolerant hardware, software, information, and networking within and between COTS devices, the resulting system can maximize performance and reliability while minimizing energy consumption and cost. NASA Goddard has adopted the CSP concept and technology with plans underway to feature flight-ready CSP boards on two upcoming space missions.
High performance matrix inversion based on LU factorization for multicore architectures

KAUST Repository

Dongarra, Jack

2011-01-01

The goal of this paper is to present an efficient implementation of an explicit matrix inversion of general square matrices on multicore computer architecture. The inversion procedure is split into four steps: 1) computing the LU factorization, 2) inverting the upper triangular U factor, 3) solving a linear system, whose solution yields the inverse of the original matrix, and 4) applying backward column pivoting on the inverted matrix. Using a tile data layout, which represents the matrix in the system memory with an optimized cache-aware format, the computation of the four steps is decomposed into computational tasks. A directed acyclic graph is generated on the fly which represents the program data flow. Its nodes represent tasks and its edges the data dependencies between them. Previous implementations of matrix inversion, available in the state-of-the-art numerical libraries, suffer from unnecessary synchronization points, which are non-existent in our implementation in order to fully exploit the parallelism of the underlying hardware. Our algorithmic approach allows us to remove these bottlenecks and to execute the tasks with loose synchronization. A runtime environment system called QUARK is necessary to dynamically schedule our numerical kernels on the available processing units. The reported results from our LU-based matrix inversion implementation significantly outperform the state-of-the-art numerical libraries such as LAPACK (5x), MKL (5x) and ScaLAPACK (2.5x) on a contemporary AMD platform with four sockets and a total of 48 cores for a matrix of size 24000. A power consumption analysis shows that our high performance implementation is also energy efficient and consumes substantially less power than its competitors. © 2011 ACM.
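The four inversion steps named in this abstract can be illustrated with a small sequential sketch. This is not the paper's tiled, task-scheduled implementation and does not use QUARK; it is a plain-Python illustration that assumes partial (row) pivoting, so that P·A = L·U and the inverse is recovered as U⁻¹·L⁻¹·P. The function name `invert_via_lu` and the dense list-of-lists layout are hypothetical choices made for this example.

```python
def invert_via_lu(a):
    """Invert a square matrix (list of lists) following the four steps
    listed in the abstract. Illustrative only: sequential, no tiling."""
    n = len(a)

    # Step 1: LU factorization with partial (row) pivoting, stored in place.
    # After the loop, row i of L*U equals row perm[i] of the input matrix.
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]              # multiplier, stored as L[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]

    # Step 2: invert the upper triangular factor U (upper part of lu).
    uinv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        uinv[j][j] = 1.0 / lu[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(lu[i][k] * uinv[k][j] for k in range(i + 1, j + 1))
            uinv[i][j] = -s / lu[i][i]

    # Step 3: solve X * L = U^{-1} for X, where L is unit lower triangular
    # (strictly lower part of lu). Columns are resolved from right to left.
    x = [row[:] for row in uinv]
    for j in range(n - 1, -1, -1):
        for i in range(n):
            x[i][j] -= sum(x[i][k] * lu[k][j] for k in range(j + 1, n))

    # Step 4: undo the pivoting by permuting the columns of X.
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for r in range(n):
            inv[r][perm[i]] = x[r][i]
    return inv


# Example input whose first pivot is zero, so row exchanges are exercised.
example = [[0.0, 2.0, 1.0], [1.0, 1.0, 0.0], [2.0, 0.0, 3.0]]
example_inverse = invert_via_lu(example)
```

In the sketch each step is a separate loop nest; the abstract's contribution lies in breaking such steps into tile-level tasks and scheduling them dynamically, which this example does not attempt.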
Hybrid Cloud Computing Architecture Optimization by Total Cost of Ownership Criterion

Directory of Open Access Journals (Sweden)

Elena Valeryevna Makarenko

2014-12-01

Achieving the goals of information security is a key factor in the decision to outsource information technology and, in particular, in the decision to migrate organizational data, applications, and other resources to infrastructure based on cloud computing. The key issue in selecting an optimal architecture and subsequently migrating business applications and data to the organization's cloud information environment is the total cost of ownership of the IT infrastructure. This paper focuses on the problem of minimizing the total cost of ownership of the cloud.
Accelerating Astronomy & Astrophysics in the New Era of Parallel Computing: GPUs, Phi and Cloud Computing

Science.gov (United States)

Ford, Eric B.; Dindar, Saleh; Peters, Jorg

2015-08-01

The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network characteristics. I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than an order of magnitude speed-up and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs. I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer
Resistive content addressable memory based in-memory computation architecture

KAUST Repository

Salama, Khaled N.; Zidan, Mohammed A.; Kurdahi, Fadi; Eltawil, Ahmed M.

2016-01-01

Various examples are provided related to resistive content addressable memory (RCAM) based in-memory computation architectures. In one example, a system includes a content addressable memory (CAM) including an array of cells having a memristor based crossbar and an interconnection switch matrix having a gateless memristor array, which is coupled to an output of the CAM. In another example, a method includes comparing activated bit values stored in a key register with corresponding bit values in a row of a CAM, setting a tag bit value to indicate that the activated bit values match the corresponding bit values, and writing masked key bit values to corresponding bit locations in the row of the CAM based on the tag bit value.
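The compare, tag and masked-write method described in this abstract can be modelled in software in a few lines. The sketch below is only a behavioural illustration under stated assumptions: rows are modelled as Python integers, and the names `cam_compare`, `cam_write`, `compare_mask` and `write_mask` are hypothetical and do not come from the record. It says nothing about the memristor crossbar or the switch matrix.

```python
def cam_compare(rows, key, compare_mask):
    """Return one tag bit per row: 1 if the row agrees with the key on
    every activated bit position (bits set in compare_mask), else 0."""
    return [int(((row ^ key) & compare_mask) == 0) for row in rows]


def cam_write(rows, tags, key, write_mask):
    """For rows whose tag bit is set, overwrite the bit positions selected
    by write_mask with the corresponding key bits; leave other rows as is."""
    return [
        (row & ~write_mask) | (key & write_mask) if tag else row
        for row, tag in zip(rows, tags)
    ]


# Hypothetical 4-bit rows: match on the two high bits, then rewrite
# the two low bits of every matching row.
rows = [0b1010, 0b1001, 0b0110]
tags = cam_compare(rows, key=0b1011, compare_mask=0b1100)
updated = cam_write(rows, tags, key=0b1011, write_mask=0b0011)
```

In hardware the comparison is performed on all rows at once; the list comprehension here only mimics that row-parallel behaviour sequentially.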
Resistive content addressable memory based in-memory computation architecture

KAUST Repository

Salama, Khaled N.

2016-12-08

Various examples are provided related to resistive content addressable memory (RCAM) based in-memory computation architectures. In one example, a system includes a content addressable memory (CAM) including an array of cells having a memristor based crossbar and an interconnection switch matrix having a gateless memristor array, which is coupled to an output of the CAM. In another example, a method includes comparing activated bit values stored in a key register with corresponding bit values in a row of a CAM, setting a tag bit value to indicate that the activated bit values match the corresponding bit values, and writing masked key bit values to corresponding bit locations in the row of the CAM based on the tag bit value.
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

Science.gov (United States)

Choudhary, Alok Nidhi

1989-01-01

Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform a high-level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
A Performance/Cost Evaluation for a GPU-Based Drug Discovery Application on Volunteer Computing

Science.gov (United States)

Guerrero, Ginés D.; Imbernón, Baldomero; García, José M.

2014-01-01

Bioinformatics is an interdisciplinary research field that develops tools for the analysis of large biological databases, and, thus, the use of high performance computing (HPC) platforms is mandatory for the generation of useful biological knowledge. The latest generation of graphics processing units (GPUs) has democratized the use of HPC as they push desktop computers to cluster-level performance. Many applications within this field have been developed to leverage these powerful and low-cost architectures. However, these applications still need to scale to larger GPU-based systems to enable remarkable advances in the fields of healthcare, drug discovery, genome research, etc. The inclusion of GPUs in HPC systems exacerbates power and temperature issues, increasing the total cost of ownership (TCO). This paper explores the benefits of volunteer computing to scale bioinformatics applications as an alternative to owning large GPU-based local infrastructures. We use as a benchmark a GPU-based drug discovery application called BINDSURF, whose computational requirements go beyond a single desktop machine. Volunteer computing is presented as a cheap and valid HPC system for those bioinformatics applications that need to process huge amounts of data and where the response time is not a critical factor. PMID:25025055
Staggered Dslash Performance on Intel Xeon Phi Architecture

OpenAIRE

Li, Ruizi; Gottlieb, Steven

2014-01-01

The conjugate gradient (CG) algorithm is among the most essential and time consuming parts of lattice calculations with staggered quarks. We test the performance of CG and dslash, the key step in the CG algorithm, on the Intel Xeon Phi, also known as the Many Integrated Core (MIC) architecture. We try different parallelization strategies using MPI, OpenMP, and the vector processing units (VPUs).
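For readers unfamiliar with the algorithm named in this abstract, a textbook conjugate gradient loop is sketched below. It is not the authors' code and contains no MPI, OpenMP or vectorization. The `matvec` callable is a hypothetical stand-in for the operator application, which in the lattice setting is where dslash would be invoked; here it is assumed only to represent a symmetric positive-definite matrix.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite operator A that is
    available only through matvec(v) -> A v. Textbook formulation."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = matvec(p)     # the expensive step (dslash in the lattice case)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


# Small hypothetical 2x2 system used only to exercise the loop.
solution = conjugate_gradient(
    lambda v: [4.0 * v[0] + v[1], v[0] + 3.0 * v[1]], [1.0, 2.0]
)
```

Each iteration calls `matvec` once, which is why the cost of that single operation dominates the run time of the whole solve.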
Architecture of 32 bit CISC (Complex Instruction Set Computer) microprocessors

International Nuclear Information System (INIS)

Jove, T.M.; Ayguade, E.; Valero, M.

1988-01-01

In this paper we describe the main topics concerning the architecture of the best-known 32-bit CISC microprocessors: the i80386, the MC68000 family, the NS32000 series and the Z80000. We focus on high-level language support, operating system design facilities, memory management, techniques to speed up overall performance, and program debugging facilities. (Author)

Systemic Architecture

DEFF Research Database (Denmark)

Poletto, Marco; Pasquero, Claudia

-up or tactical design, behavioural space and the boundary of the natural and the artificial realms within the city and architecture. A new kind of "real-time world-city" is illustrated in the form of an operational design manual for the assemblage of proto-architectures, the incubation of proto-gardens...... and the coding of proto-interfaces. These prototypes of machinic architecture materialize as synthetic hybrids embedded with biological life (proto-gardens), computational power, behavioural responsiveness (cyber-gardens), spatial articulation (coMachines and fibrous structures), remote sensing (FUNclouds...
Combining Self-Explaining with Computer Architecture Diagrams to Enhance the Learning of Assembly Language Programming

Science.gov (United States)

Hung, Y.-C.

2012-01-01

This paper investigates the impact of combining self explaining (SE) with computer architecture diagrams to help novice students learn assembly language programming. Pre- and post-test scores for the experimental and control groups were compared and subjected to covariance (ANCOVA) statistical analysis. Results indicate that the SE-plus-diagram…
Introduction to massively-parallel computing in high-energy physics

CERN Document Server

AUTHOR|(CDS)2083520

1993-01-01

Ever since computers were first used for scientific and numerical work, there has existed an "arms race" between the technical development of faster computing hardware, and the desires of scientists to solve larger problems in shorter time-scales. However, the vast leaps in processor performance achieved through advances in semi-conductor science have reached a hiatus as the technology comes up against the physical limits of the speed of light and quantum effects. This has led all high performance computer manufacturers to turn towards a parallel architecture for their new machines. In these lectures we will introduce the history and concepts behind parallel computing, and review the various parallel architectures and software environments currently available. We will then introduce programming methodologies that allow efficient exploitation of parallel machines, and present case studies of the parallelization of typical High Energy Physics codes for the two main classes of parallel computing architecture (S...
Sprinting performance on the Woodway Curve 3.0 is related to muscle architecture.

Science.gov (United States)

Mangine, Gerald T; Fukuda, David H; Townsend, Jeremy R; Wells, Adam J; Gonzalez, Adam M; Jajtner, Adam R; Bohner, Jonathan D; LaMonica, Michael; Hoffman, Jay R; Fragala, Maren S; Stout, Jeffrey R

2015-01-01

To determine if unilateral measures of muscle architecture in the rectus femoris (RF) and vastus lateralis (VL) were related to (and predictive of) sprinting speed and unilateral (and bilateral) force (FRC) and power (POW) during a 30 s maximal sprint on the Woodway Curve 3.0 non-motorized treadmill. Twenty-eight healthy, physically active men (n = 14) and women (n = 14) (age = 22.9 ± 2.4 years; body mass = 77.1 ± 16.2 kg; height = 171.6 ± 11.2 cm; body-fat = 19.4 ± 8.1%) completed one familiarization and one 30-s maximal sprint on the TM to obtain maximal sprinting speed, POW and FRC. Muscle thickness (MT), cross-sectional area (CSA) and echo intensity (ECHO) of the RF and VL in the dominant (DOM; determined by unilateral sprinting power) and non-dominant (ND) legs were measured via ultrasound. Pearson correlations indicated several significant (p architecture. Stepwise regression indicated that POW(DOM) was predictive of ipsilateral RF (MT and CSA) and VL (CSA and ECHO), while POW(ND) was predictive of ipsilateral RF (MT and CSA) and VL (CSA); sprinting power/force asymmetry was not predictive of architecture asymmetry. Sprinting time was best predicted by peak power and peak force, though muscle quality (ECHO) and the bilateral percent difference in VL (CSA) were strong architectural predictors. Muscle architecture is related to (and predictive of) TM sprinting performance, while unilateral POW is predictive of ipsilateral architecture. However, the extent to which architecture and other factors (i.e. neuromuscular control and sprinting technique) affect TM performance remains unknown.
Architecture and Initial Development of a Digital Library Platform for Computable Knowledge Objects for Health.

Science.gov (United States)

Flynn, Allen J; Bahulekar, Namita; Boisvert, Peter; Lagoze, Carl; Meng, George; Rampton, James; Friedman, Charles P

2017-01-01

Throughout the world, biomedical knowledge is routinely generated and shared through primary and secondary scientific publications. However, there is too much latency between publication of knowledge and its routine use in practice. To address this latency, what is actionable in scientific publications can be encoded to make it computable. We have created a purpose-built digital library platform to hold, manage, and share actionable, computable knowledge for health called the Knowledge Grid Library. Here we present it with its system architecture.
1995 CERN school of computing. Proceedings

Energy Technology Data Exchange (ETDEWEB)

Vandoni, C E [ed.]

1995-10-25

These proceedings contain a written account of the majority of the lectures given at the 1995 CERN School of Computing. The Scientific Programme was articulated on 8 main themes: Human Computer Interfaces; Collaborative Software Engineering; Information Super Highways; Trends in Computer Architecture/Industry; Parallel Architectures (MPP); Mathematical Computing; Data Acquisition Systems; World-Wide Web for Physics. A number of lectures dealt with general aspects of computing, in particular in the area of Human Computer Interfaces (computer graphics, user interface tools and virtual reality). Applications in HEP of computer graphics (event display) were the subject of two lectures. The main theme of Mathematical Computing covered Mathematica and the usage of statistics packages. The important subject of Data Acquisition Systems was covered by lectures on switching techniques and simulation and modelling tools. A series of lectures dealt with the Information Super Highways and World-Wide Web Technology and its applications to High Energy Physics. Different aspects of Object Oriented Information Engineering Methodology and Object Oriented Programming in HEP were dealt with in detail, also in connection with data acquisition systems. On the theme 'Trends in Computer Architecture and Industry', lectures were given on ATM Switching, and on FORTRAN90 and High Performance FORTRAN. The Computer Parallel Architectures (MPP) lectures dealt with very large scale open systems, the history and future of computer system architecture, the message passing paradigm, and features of PVM and MPI. (orig.).
1995 CERN school of computing. Proceedings

International Nuclear Information System (INIS)

Vandoni, C.E.

1995-01-01

These proceedings contain a written account of the majority of the lectures given at the 1995 CERN School of Computing. The Scientific Programme was articulated on 8 main themes: Human Computer Interfaces; Collaborative Software Engineering; Information Super Highways; Trends in Computer Architecture/Industry; Parallel Architectures (MPP); Mathematical Computing; Data Acquisition Systems; World-Wide Web for Physics. A number of lectures dealt with general aspects of computing, in particular in the area of Human Computer Interfaces (computer graphics, user interface tools and virtual reality). Applications in HEP of computer graphics (event display) were the subject of two lectures. The main theme of Mathematical Computing covered Mathematica and the usage of statistics packages. The important subject of Data Acquisition Systems was covered by lectures on switching techniques and simulation and modelling tools. A series of lectures dealt with the Information Super Highways and World-Wide Web Technology and its applications to High Energy Physics. Different aspects of Object Oriented Information Engineering Methodology and Object Oriented Programming in HEP were dealt with in detail, also in connection with data acquisition systems. On the theme 'Trends in Computer Architecture and Industry', lectures were given on ATM Switching, and on FORTRAN90 and High Performance FORTRAN. The Computer Parallel Architectures (MPP) lectures dealt with very large scale open systems, the history and future of computer system architecture, the message passing paradigm, and features of PVM and MPI. (orig.)
Automated Improvement of Software Architecture Models for Performance and Other Quality Attributes

OpenAIRE

Koziolek, Anne

2013-01-01

Quality attributes, such as performance or reliability, are crucial for the success of a software system and largely influenced by the software architecture. Their quantitative prediction supports systematic, goal-oriented software design and forms a base of an engineering approach to software design. This thesis proposes a method and tool to automatically improve component-based software architecture (CBA) models based on such quantitative quality prediction techniques.
Heterogeneous System Architectures from APUs to discrete GPUs

CERN Multimedia

CERN. Geneva

2013-01-01

We will present the Heterogeneous Systems Architectures that new AMD processors are bringing with the new GCN-based GPUs and the new APUs. We will show how together they represent a huge step forward in programming flexibility and performance efficiency for compute.
FPGA-based architecture for motion recovering in real-time

Science.gov (United States)

Arias-Estrada, Miguel; Maya-Rueda, Selene E.; Torres-Huitzil, Cesar

2002-03-01

A key problem in the computer vision field is the measurement of object motion in a scene. The main goal is to compute an approximation of the 3D motion from the analysis of an image sequence. Once computed, this information can be used as a basis to reach higher level goals in different applications. Motion estimation algorithms pose a significant computational load for sequential processors, limiting their use in practical applications. In this work we propose a hardware architecture for motion estimation in real time based on FPGA technology. The technique used for motion estimation is Optical Flow, chosen for its accuracy and the density of its velocity estimates; however, other techniques are being explored. The architecture is composed of parallel modules working in a pipeline scheme to reach high throughput rates near gigaflops. The modules are organized in a regular structure to provide a high degree of flexibility to cover different applications. Some results will be presented and the real-time performance will be discussed and analyzed. The architecture is prototyped in an FPGA board with a Virtex device interfaced to a digital imager.
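The abstract does not say which optical-flow formulation the FPGA modules implement. As an illustration only, the sketch below assumes a Lucas-Kanade style least-squares estimate over a single window, written as a plain software reference rather than a hardware design. The function name `lucas_kanade_window` and the list-of-lists frame layout are hypothetical choices for this example.

```python
def lucas_kanade_window(frame0, frame1):
    """Estimate one (u, v) displacement for a whole window by least squares.

    frame0 and frame1 are equally sized lists of rows of grey values.
    Assumption: a single Lucas-Kanade window; the source abstract does not
    name the optical-flow variant used in the FPGA design.
    """
    height, width = len(frame0), len(frame0[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the spatial gradients.
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0
            # Frame difference approximates the temporal derivative.
            it = frame1[y][x] - frame0[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    # Solve the 2x2 normal equations [sxx sxy; sxy syy] [u v]^T = -[sxt syt]^T.
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients are degenerate (aperture problem)")
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v
```

The accumulation loop is the part that maps naturally onto parallel, pipelined hardware: each pixel contributes five independent products, and only the final 2x2 solve is sequential.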
The EPOS ICT Architecture

Science.gov (United States)

Jeffery, Keith; Harrison, Matt; Bailo, Daniele

2016-04-01

parallel the ICT team is tracking developments in ICT for relevance to EPOS-IP. In particular, the potential utilisation of e-Is (e-Infrastructures) such as GEANT (network), AARC (security), EGI (GRID computing), EUDAT (data curation), PRACE (High Performance Computing), HELIX-Nebula / Open Science Cloud (Cloud computing) is being assessed. Similarly, relationships to other e-RIs (e-Research Infrastructures) such as ENVRI+, EXCELERATE and other ESFRI (European Strategic Forum for Research Infrastructures) projects are developed to share experience and technology and to promote interoperability. EPOS ICT team members are also involved in VRE4EIC, a project developing a reference architecture and component software services for a Virtual Research Environment to be superimposed on EPOS-ICS. The challenge which is being tackled now is therefore to keep consistency and interoperability among the different modules, initiatives and actors which participate in the process of running the EPOS platform. It implies both a continuous update about IT aspects of the mentioned initiatives and a refinement of the e-architecture designed so far. One major aspect of EPOS-IP is the ICT support for legalistic, financial and governance aspects of the EPOS ERIC to be initiated during EPOS-IP. This implies a sophisticated AAAI (Authentication, authorization, accounting infrastructure) with consistency throughout the software, communications and data stack.
Comparison of Three Smart Camera Architectures for Real-Time Machine Vision System

Directory of Open Access Journals (Sweden)

Abdul Waheed Malik

2013-12-01

This paper presents a machine vision system for real-time computation of distance and angle of a camera from a set of reference points located on a target board. Three different smart camera architectures were explored to compare performance parameters such as power consumption, frame speed and latency. Architecture 1 consists of hardware machine vision modules modeled at Register Transfer (RT) level and a soft-core processor on a single FPGA chip. Architecture 2 is a commercially available software-based smart camera, the Matrox Iris GT. Architecture 3 is a two-chip solution composed of hardware machine vision modules on an FPGA and an external microcontroller. Results from a performance comparison show that Architecture 2 has higher latency and consumes much more power than Architectures 1 and 3. However, Architecture 2 benefits from an easy programming model. The smart camera system with an FPGA and external microcontroller has lower latency and consumes less power than the single FPGA chip with hardware modules and a soft-core processor.
HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

Science.gov (United States)

van Dyk, Danny; Geveler, Markus; Mallach, Sven; Ribbrock, Dirk; Göddeke, Dominik; Gutwenger, Carsten

2009-12-01

We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and additional 3-4 and 4-16 times faster execution on the Cell and a GPU. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development. Program summary. Program title: HONEI. Catalogue identifier: AEDW_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GPLv2. No. of lines in distributed program, including test data, etc.: 216 180. No. of bytes in distributed program, including test data, etc.: 1 270 140. Distribution format: tar.gz. Programming language: C++. Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3. Operating system: Linux. RAM: at least 500 MB free. Classification: 4.8, 4.3, 6.1. External routines: SSE: none; [1] for GPU, [2] for Cell backend. Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the
High performance statistical computing with parallel R: applications to biology and climate modelling

International Nuclear Information System (INIS)

Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth

2006-01-01

Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity comes a new problem: the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines.
Achieving high performance in numerical computations on RISC workstations and parallel systems

Energy Technology Data Exchange (ETDEWEB)

Goedecker, S. [Max-Planck Inst. for Solid State Research, Stuttgart (Germany); Hoisie, A. [Los Alamos National Lab., NM (United States)

1997-08-20

The nominal peak speeds of both serial and parallel computers are rising rapidly. At the same time, however, it is becoming increasingly difficult to extract a significant fraction of this high peak speed from modern computer architectures. In this tutorial the authors give the scientists and engineers involved in numerically demanding calculations and simulations the necessary basic knowledge to write reasonably efficient programs. The basic principles are rather simple and the possible rewards large. Writing a program by taking into account optimization techniques related to the computer architecture can significantly speed up your program, often by factors of 10 to 100. As such, optimizing a program can for instance be a much better solution than buying a faster computer. If a few basic optimization principles are applied during program development, the additional time needed for obtaining an efficient program is practically negligible. In-depth optimization is usually only needed for a few subroutines or kernels and the effort involved is therefore also acceptable.
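One architecture-related effect of this kind is cache behaviour under different loop orders. The sketch below is not taken from the tutorial. It is a toy model that assumes a direct-mapped cache with 64 lines of 64 bytes and an array of 8-byte elements stored row by row, and it counts misses for row-wise and column-wise traversal. The names `count_misses` and `traversal` are hypothetical.

```python
def count_misses(addresses, num_lines=64, line_bytes=64):
    """Count misses in a toy direct-mapped cache for a stream of byte addresses."""
    tags = [None] * num_lines          # which memory block each cache line holds
    misses = 0
    for addr in addresses:
        block = addr // line_bytes     # memory block containing this address
        index = block % num_lines      # the only cache line the block may occupy
        if tags[index] != block:
            tags[index] = block        # load the block, evicting the old one
            misses += 1
    return misses


def traversal(n, row_major, elem_bytes=8):
    """Yield byte addresses touched when visiting an n x n row-major array.

    row_major=True walks along rows (unit stride); row_major=False walks
    down columns (stride of n elements).
    """
    for outer in range(n):
        for inner in range(n):
            i, j = (outer, inner) if row_major else (inner, outer)
            yield (i * n + j) * elem_bytes


row_misses = count_misses(traversal(128, row_major=True))
col_misses = count_misses(traversal(128, row_major=False))
print(row_misses, col_misses)
```

In this model the row-wise walk misses once per 64-byte line, while the column-wise walk misses on every access because consecutive rows keep evicting each other from the same few cache lines. Real caches are set-associative and larger, so the ratio on actual hardware will differ.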
SpaceCubeX: A Framework for Evaluating Hybrid Multi-Core CPU FPGA DSP Architectures

Science.gov (United States)

Schmidt, Andrew G.; Weisz, Gabriel; French, Matthew; Flatley, Thomas; Villalpando, Carlos Y.

2017-01-01

The SpaceCubeX project is motivated by the need for high performance, modular, and scalable on-board processing to help scientists answer critical 21st century questions about global climate change, air quality, ocean health, and ecosystem dynamics, while adding new capabilities such as low-latency data products for extreme event warnings. These goals translate into on-board processing throughput requirements that are on the order of 100 to 1,000 times greater than those of previous Earth Science missions for standard processing, compression, storage, and downlink operations. To study possible future architectures to achieve these performance requirements, the SpaceCubeX project provides an evolvable testbed and framework that enables a focused design space exploration of candidate hybrid CPU/FPGA/DSP processing architectures. The framework includes ArchGen, an architecture generator tool populated with candidate architecture components, performance models, and IP cores, that allows an end user to specify the type, number, and connectivity of a hybrid architecture. The framework requires minimal extensions to integrate new processors, such as the anticipated High Performance Spaceflight Computer (HPSC), reducing time to initiate benchmarking by months. To evaluate the framework, we leverage a wide suite of high performance embedded computing benchmarks and Earth science scenarios to ensure robust architecture characterization. We report on our project's Year 1 efforts and demonstrate the capabilities across four simulation testbed models: a baseline SpaceCube 2.0 system, a dual ARM A9 processor system, a hybrid quad ARM A53 and FPGA system, and a hybrid quad ARM A53 and DSP system.
Algorithmically specialized parallel computers

CERN Document Server

Snyder, Lawrence; Gannon, Dennis B

1985-01-01

Algorithmically Specialized Parallel Computers focuses on the concept and characteristics of an algorithmically specialized computer. This book discusses algorithmically specialized computers, algorithmic specialization using VLSI, and innovative architectures. The architectures and algorithms for digital signal, speech, and image processing and specialized architectures for numerical computations are also elaborated. Other topics include the model for analyzing generalized inter-processor, pipelined architecture for search tree maintenance, and specialized computer organization for raster
A computational architecture for social agents

Energy Technology Data Exchange (ETDEWEB)

Bond, A.H. [California Institute of Technology, Pasadena, CA (United States)

1996-12-31

This article describes a new class of information-processing models for social agents. They are derived from primate brain architecture, the processing in brain regions, the interactions among brain regions, and the social behavior of primates. In another paper, we have reviewed the neuroanatomical connections and functional involvements of cortical regions. We reviewed the evidence for a hierarchical architecture in the primate brain. By examining neuroanatomical evidence for connections among neural areas, we were able to establish anatomical regions and connections. We then examined evidence for specific functional involvements of the different neural areas and found some support for hierarchical functioning, not only for the perception hierarchies but also for the planning and action hierarchy in the frontal lobes.
Polymorphous Computing Architecture (PCA) Kernel-Level Benchmarks

National Research Council Canada - National Science Library

Lebak, J

2004-01-01

.... "Computation" aspects include floating-point and integer performance, as well as the memory hierarchy, while the "communication" aspects include the network, the memory hierarchy, and the I/O capabilities...
''Beauty of Wholeness and Beauty of Partiality.'' New Terms Defining the Concept of Beauty in Architecture in Terms of Sustainability and Computer Aided Design

Science.gov (United States)

Farid, Ayman A.; Zaghloul, Weaam M.; Dewidar, Khaled M.

2014-01-01

The great shift in sustainability and computer aided design in the field of architecture caused a remarkable change in the architecture philosophy. New aspects of beauty and aesthetic values are being introduced, and traditional definitions of beauty cannot fully cover these aspects, which causes a gap between new architecture works criticism and…

Are Nanotube Architectures More Advantageous Than Nanowire Architectures For Field Effect Transistors?

KAUST Repository

Fahad, Hossain M.

2012-06-27

Decade-long research on 1D nanowire field effect transistors (FETs) shows that although a nanowire FET has ultra-low off-state leakage current and a single device uses a very small area, its drive current per device is extremely low. Thus arrays of nanowires must be integrated together to achieve the appreciable amount of current necessary for high performance computation, causing an area penalty and compromised functionality. Here we show that a FET with a nanotube architecture and core-shell gate stacks is capable of achieving the desirable leakage characteristics of the nanowire FET while generating a much larger drive current with area efficiency. The core-shell gate stacks of silicon nanotube FETs tighten the electrostatic control and enable volume inversion mode operation, leading to improved short channel behavior and enhanced performance. Our comparative study is based on semi-classical transport models with quantum confinement effects, and it offers new opportunities for future generation high performance computation.
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

Energy Technology Data Exchange (ETDEWEB)

Deveci, Mehmet [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Trott, Christian Robert [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2017-12-01

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

Energy Technology Data Exchange (ETDEWEB)

Deveci, Mehmet [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Trott, Christian Robert [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Rajamanickam, Sivasankaran [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

2018-01-01

Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
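The two ideas named in the abstract, per-row accumulators and a two-phase structure, can be sketched in a few lines. The code below is a serial illustration under stated assumptions, not the kkSpGEMM implementation: it uses a row-by-row (Gustavson-style) product on CSR matrices, a set accumulator in the symbolic phase and a hash-map accumulator in the numeric phase. All function names are hypothetical.

```python
def to_csr(dense):
    """Convert a dense list-of-rows matrix to CSR: (row_ptr, col_idx, values)."""
    row_ptr, col_idx, values = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def spgemm_symbolic(a, b):
    """Phase 1: compute only the row pointers (nonzero counts) of C = A * B."""
    a_ptr, a_col, _ = a
    b_ptr, b_col, _ = b
    c_ptr = [0]
    for i in range(len(a_ptr) - 1):
        cols = set()                       # set accumulator: structure only
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_col[p]
            cols.update(b_col[b_ptr[k]:b_ptr[k + 1]])
        c_ptr.append(c_ptr[-1] + len(cols))
    return c_ptr


def spgemm_numeric(a, b, c_ptr):
    """Phase 2: fill column indices and values using the precomputed c_ptr."""
    a_ptr, a_col, a_val = a
    b_ptr, b_col, b_val = b
    c_col = [0] * c_ptr[-1]
    c_val = [0] * c_ptr[-1]
    for i in range(len(a_ptr) - 1):
        acc = {}                           # hash-map accumulator: column -> sum
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_col[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[q]
                acc[j] = acc.get(j, 0) + a_val[p] * b_val[q]
        for offset, j in enumerate(sorted(acc)):
            c_col[c_ptr[i] + offset] = j
            c_val[c_ptr[i] + offset] = acc[j]
    return c_ptr, c_col, c_val
```

Splitting the work this way means that when only the numeric values of A or B change between calls, the result of `spgemm_symbolic` can be reused and only `spgemm_numeric` needs to run again, which is the kind of data-structure reuse the abstract argues for.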
Improving the energy performance of historic buildings with architectural and cultural values

DEFF Research Database (Denmark)

Hansen, Ernst Jan de Place

2017-01-01

The thermal performance of solid walls of historic buildings can be improved by external or internal insulation. External insulation is preferred from a technical perspective, but is often disregarded as many such buildings have architectural or cultural values, leaving internal insulation as the only possible solution. As internal insulation is considered a risky way of improving the thermal performance from a moisture perspective, technically feasible solutions are needed. Further, other arguments than energy saving could convince a building owner to carry out internal insulation, e.g. improvement of thermal indoor climate. The paper discusses different motivating factors for improving the thermal performance of solid walls in historic buildings with architectural and cultural values. It is argued that internal insulation, provided that it can be done without resulting in critical moisture...
Computational Biology and High Performance Computing 2000

Energy Technology Data Exchange (ETDEWEB)

Simon, Horst D.; Zorn, Manfred D.; Spengler, Sylvia J.; Shoichet, Brian K.; Stewart, Craig; Dubchak, Inna L.; Arkin, Adam P.

2000-10-19

The pace of extraordinary advances in molecular biology has accelerated in the past decade due in large part to discoveries coming from genome projects on human and model organisms. The advances in the genome project so far, happening well ahead of schedule and under budget, have exceeded any dreams by its protagonists, let alone formal expectations. Biologists expect the next phase of the genome project to be even more startling in terms of dramatic breakthroughs in our understanding of human biology, the biology of health and of disease. Only today can biologists begin to envision the necessary experimental, computational and theoretical steps necessary to exploit genome sequence information for its medical impact, its contribution to biotechnology and economic competitiveness, and its ultimate contribution to environmental quality. High performance computing has become one of the critical enabling technologies, which will help to translate this vision of future advances in biology into reality. Biologists are increasingly becoming aware of the potential of high performance computing. The goal of this tutorial is to introduce the exciting new developments in computational biology and genomics to the high performance computing community.
A Performance/Cost Evaluation for a GPU-Based Drug Discovery Application on Volunteer Computing

Directory of Open Access Journals (Sweden)

Ginés D. Guerrero

2014-01-01

Bioinformatics is an interdisciplinary research field that develops tools for the analysis of large biological databases, and, thus, the use of high performance computing (HPC) platforms is mandatory for the generation of useful biological knowledge. The latest generation of graphics processing units (GPUs) has democratized the use of HPC as they push desktop computers to cluster-level performance. Many applications within this field have been developed to leverage these powerful and low-cost architectures. However, these applications still need to scale to larger GPU-based systems to enable remarkable advances in the fields of healthcare, drug discovery, genome research, etc. The inclusion of GPUs in HPC systems exacerbates power and temperature issues, increasing the total cost of ownership (TCO). This paper explores the benefits of volunteer computing to scale bioinformatics applications as an alternative to owning large GPU-based local infrastructures. We use as a benchmark a GPU-based drug discovery application called BINDSURF, whose computational requirements go beyond a single desktop machine. Volunteer computing is presented as a cheap and valid HPC system for those bioinformatics applications that need to process huge amounts of data and where the response time is not a critical factor.
Quantum Computing for Computer Architects

CERN Document Server

Metodi, Tzvetan

2011-01-01

Quantum computers can (in theory) solve certain problems far faster than a classical computer running any known classical algorithm. While existing technologies for building quantum computers are in their infancy, it is not too early to consider their scalability and reliability in the context of the design of large-scale quantum computers. To architect such systems, one must understand what it takes to design and model a balanced, fault-tolerant quantum computer architecture. The goal of this lecture is to provide architectural abstractions for the design of a quantum computer and to explore
HyperCell: A Bio-inspired Design Framework for Real-time Interactive Architectures

Directory of Open Access Journals (Sweden)

Jia-Rey Chang

2018-01-01

This pioneering research focuses on Biomimetic Interactive Architecture using “Computation”, “Embodiment”, and “Biology” to generate an intimate embodied convergence to propose a novel rule-based design framework for creating organic architectures composed of swarm-based intelligent components. Furthermore, the research boldly claims that Interactive Architecture should emerge as the next truly Organic Architecture. As the world and society are dynamically changing, especially in this digital era, the research dares to challenge the Utilitas, Firmitas, and Venustas of the traditional architectural Weltanschauung, and rejects them by adopting the novel notion that architecture should be dynamic, fluid, and interactive. This project reflects a trajectory from the 1960’s with the advent of the avant-garde architectural design group, Archigram, and its numerous intriguing and pioneering visionary projects. Archigram’s non-standard, mobile, and interactive projects profoundly influenced a new generation of architects to explore the connection between technology and their architectural projects. This research continues this trend of exploring novel design thinking and the framework of Interactive Architecture by discovering the interrelationship amongst three major topics: “Computation”, “Embodiment”, and “Biology”. The project aims to elucidate pioneering research combining these three topics in one discourse: “Bio-inspired digital architectural design”. These three major topics will be introduced in this Summary. “Computation” is any type of calculation that includes both arithmetical and nonarithmetical steps and follows a well-defined model understood and described as, for example, an algorithm. In this research, however, it refers to the use of data storage, parametric design application, and physical computing for developing informed architectural designs. “Form” has always been the most critical focus in
The ATLAS Analysis Architecture

International Nuclear Information System (INIS)

Cranmer, K.S.

2008-01-01

We present an overview of the ATLAS analysis architecture including the relevant aspects of the computing model and the major architectural aspects of the Athena framework. Emphasis will be given to the interplay between the analysis use cases and the technical aspects of the architecture including the design of the event data model, transient-persistent separation, data reduction strategies, analysis tools, and ROOT interoperability
The Performance and Compatibility of Thin Client Computing with Fleet Operations

National Research Council Canada - National Science Library

Landry, Kenneth J

2006-01-01

...) with a thin client/server-based computing (TCSBC) architecture. After becoming nearly extinct in the early 1990s, thin clients are emerging on the forefront of technology with numerous bandwidth improvements and cost reduction benefits...
Polymorphous computing fabric

Science.gov (United States)

Wolinski, Christophe Czeslaw [Los Alamos, NM; Gokhale, Maya B [Los Alamos, NM; McCabe, Kevin Peter [Los Alamos, NM

2011-01-18

Fabric-based computing systems and methods are disclosed. A fabric-based computing system can include a polymorphous computing fabric that can be customized on a per application basis and a host processor in communication with said polymorphous computing fabric. The polymorphous computing fabric includes a cellular architecture that can be highly parameterized to enable a customized synthesis of fabric instances for a variety of enhanced application performances thereof. A global memory concept can also be included that provides the host processor random access to all variables and instructions associated with the polymorphous computing fabric.
Combining Performance and Flexibility for RMS with a Hybrid Architecture

NARCIS (Netherlands)

Dennis Koole; Arjan Groenewegen; Daniël Telgen; Patrick Wit; Leo van Moergestel; Arjan van Zanten; John-Jules Meyer; Ing. Erik Puik; Dick van der Steen; Pascal Muller

2013-01-01

Author supplied: Combining Performance and Flexibility for RMS with a Hybrid Architecture. Daniël Telgen, Leo van Moergestel, Erik Puik, Pascal Muller, Arjan Groenewegen, Dick van der Steen, Dennis Koole, Patrick de Wit, Arjen van Zanten, and John-Jules
High performance in software development

CERN Multimedia

CERN. Geneva; Haapio, Petri; Liukkonen, Juha-Matti

2015-01-01

What are the ingredients of high-performing software? Software development, especially for large high-performance systems, is one of the most complex tasks mankind has ever tried. Technological change leads to huge opportunities but challenges our old ways of working. Processing large data sets, possibly in real time or with other tight computational constraints, requires an efficient solution architecture. Efficiency requirements span from the distributed storage and large-scale organization of computation and data down to the lowest level of processor and data bus behavior. Integrating performance behavior over these levels is especially important when the computation is resource-bounded, as it is in numerics: physical simulation, machine learning, estimation of statistical models, etc. For example, memory locality and utilization of vector processing are essential for harnessing the computing power of modern processor architectures due to the deep memory hierarchies of modern general-purpose computers. As a r...
Migration of vectorized iterative solvers to distributed memory architectures

Energy Technology Data Exchange (ETDEWEB)

Pommerell, C. [AT&T Bell Labs., Murray Hill, NJ (United States)]; Ruehl, R. [CSCS-ETH, Manno (Switzerland)]

1994-12-31

Both necessity and opportunity motivate the use of high-performance computers for iterative linear solvers. Necessity results from the size of the problems being solved; smaller problems are often better handled by direct methods. Opportunity arises from the formulation of the iterative methods in terms of simple linear algebra operations, even if this "natural" parallelism is not easy to exploit in irregularly structured sparse matrices and with good preconditioners. As a result, high-performance implementations of iterative solvers have attracted a lot of interest in recent years. Most efforts are geared to vectorizing or parallelizing the dominating operation (structured or unstructured sparse matrix-vector multiplication), or to increasing locality and parallelism by reformulating the algorithm (reducing global synchronization in inner products or local data exchange in preconditioners). Target architectures for iterative solvers currently include mostly vector supercomputers and architectures with one or few optimized (e.g., super-scalar and/or super-pipelined RISC) processors and hierarchical memory systems. More recently, parallel computers with physically distributed memory and a better price/performance ratio have been offered by vendors as a very interesting alternative to vector supercomputers. However, programming comfort on such distributed memory parallel processors (DMPPs) still lags behind. Here the authors are concerned with iterative solvers and their changing computing environment. In particular, they are considering migration from traditional vector supercomputers to DMPPs. Application requirements force one to use flexible and portable libraries. They want to extend the portability of iterative solvers rather than reimplementing everything for each new machine, or even for each new architecture.
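The sparse matrix-vector multiplication that dominates these solvers can be illustrated with a minimal sketch. It assumes the compressed sparse row (CSR) layout, which the abstract does not name; the function name `csr_matvec` and the 3x3 matrix are hypothetical and chosen only for illustration.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in compressed sparse row form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):  # rows are independent, so this loop parallelizes
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]  # indirect access into x
        y[row] = acc
    return y


# Hypothetical example matrix: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

The indirect access `x[col_idx[k]]` is what makes unstructured matrices hard to vectorize, and on a distributed memory machine it is the access that can require data from another processor.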
Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

Directory of Open Access Journals (Sweden)

Ausif Mahmood

1996-01-01

Full Text Available The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of design and to evaluate its performance. A high level language can provide maximum flexibility in this respect if the constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. The communication and synchronization between hardware processes is handled by splitting the processes into their equivalent subprograms at the entry points. The proper scheduling of these subprograms is coordinated by a timing wheel which provides a time mapping mechanism. Finally, a high level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
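A timing wheel of the kind mentioned above can be sketched as a circular array of buckets, one per time slot. This is only an illustration under stated assumptions: the class `TimingWheel`, its methods, and the `send`/`receive` subprograms are hypothetical names, not the paper's implementation, and each hardware process is assumed to have already been split into callbacks at its entry points.

```python
class TimingWheel:
    """Circular array of buckets; each bucket holds the subprograms due at one tick."""

    def __init__(self, slots):
        self.slots = [[] for _ in range(slots)]
        self.now = 0

    def schedule(self, delay, action):
        """Run `action` after `delay` ticks; the delay must fit in one revolution."""
        if not 0 < delay < len(self.slots):
            raise ValueError("delay must be between 1 and slots - 1")
        self.slots[(self.now + delay) % len(self.slots)].append(action)

    def run(self, ticks):
        """Advance simulated time, executing every subprogram that falls due."""
        for _ in range(ticks):
            self.now += 1
            bucket = self.slots[self.now % len(self.slots)]
            pending = list(bucket)
            bucket.clear()
            for action in pending:
                action(self)  # a subprogram may schedule its continuation


log = []

def send(wheel):
    # First half of a hypothetical process: emit, then resume two ticks later.
    log.append((wheel.now, "send"))
    wheel.schedule(2, receive)

def receive(wheel):
    log.append((wheel.now, "receive"))

wheel = TimingWheel(slots=8)
wheel.schedule(1, send)
wheel.run(5)
```

Each callback stands for one subprogram of a split process, and rescheduling from inside a callback is how the process resumes at its next entry point.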
GW Calculations of Materials on the Intel Xeon-Phi Architecture

Science.gov (United States)

Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek; Biller, Ariel; Chelikowsky, James R.; Louie, Steven G.

Intel Xeon-Phi processors are expected to power a large number of High-Performance Computing (HPC) systems around the United States and the world in the near future. We evaluate the ability of GW and prerequisite Density Functional Theory (DFT) calculations for materials to utilize the Xeon-Phi architecture. We describe the optimization process and performance improvements achieved. We find that the GW method, like other higher level Many-Body methods beyond standard local/semilocal approximations to Kohn-Sham DFT, is particularly well suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-waves, band-pairs and frequencies. Support provided by the SciDAC program, Department of Energy, Office of Science, Advanced Scientific Computing Research and Basic Energy Sciences. Grant Numbers DE-SC0008877 (Austin) and DE-AC02-05CH11231 (LBNL).
All-optical reservoir computing.

Science.gov (United States)

Duport, François; Schneider, Bendix; Smerieri, Anteo; Haelterman, Marc; Massar, Serge

2012-09-24

Reservoir Computing is a novel computing paradigm that uses a nonlinear recurrent dynamical system to carry out information processing. Recent electronic and optoelectronic Reservoir Computers based on an architecture with a single nonlinear node and a delay loop have shown performance on standardized tasks comparable to state-of-the-art digital implementations. Here we report an all-optical implementation of a Reservoir Computer, made of off-the-shelf components for optical telecommunications. It uses the saturation of a semiconductor optical amplifier as nonlinearity. The present work shows that, within the Reservoir Computing paradigm, all-optical computing with state-of-the-art performance is possible.
Engineering and Computing Portal to Solve Environmental Problems

Science.gov (United States)

Gudov, A. M.; Zavozkin, S. Y.; Sotnikov, I. Y.

2018-01-01

This paper describes the architecture and services of the Engineering and Computing Portal, a complex solution that provides access to high-performance computing resources and makes it possible to carry out computational experiments, teach parallel technologies, and solve computing tasks, including those related to technogenic safety.
High Performance Computing in Science and Engineering '15 : Transactions of the High Performance Computing Center

CERN Document Server

Kröner, Dietmar; Resch, Michael

2016-01-01

This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS) in 2015. The reports cover all fields of computational science and engineering ranging from CFD to computational physics and from chemistry to computer science with a special emphasis on industrially relevant applications. Presenting findings of one of Europe’s leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance. The book covers the main methods in high-performance computing. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of color illustrations and tables of results.
High Performance Computing in Science and Engineering '17 : Transactions of the High Performance Computing Center

CERN Document Server

Kröner, Dietmar; Resch, Michael; HLRS 2017

2018-01-01

This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS) in 2017. The reports cover all fields of computational science and engineering ranging from CFD to computational physics and from chemistry to computer science with a special emphasis on industrially relevant applications. Presenting findings of one of Europe’s leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance. The book covers the main methods in high-performance computing. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of color illustrations and tables of results.

Thermal performance measurement and application of a multilayer insulator for emergency architecture

International Nuclear Information System (INIS)

Salvalai, Graziano; Imperadori, Marco; Scaccabarozzi, Diego; Pusceddu, Cristina

2015-01-01

Lightness coupled with a quick assembly method is crucial for emergency architecture in post-disaster areas, where accessibility and action time pose a major barrier to rescuing people. From this perspective, the following work analyses the potential (technological and thermal performance) of a multilayer insulator for a new shelter envelope able to provide superior thermal comfort for the users. The thermal characteristics are derived experimentally by means of a guard ring apparatus under different working temperatures. Tests are performed on the multilayer insulator itself and on a composite structure, made of the multilayer insulator and two air gaps wrapped by a polyester cover, which is the core of a new lightweight emergency architecture. Experimental results show good agreement with literature data, providing a thermal conductivity and transmittance of about 0.04 W/(m °C) and 1.6 W/(m² °C) for the tested multilayer. The composite structure, called Thermo Reflective Multilayer System (TRMS), shows better insulation performance, providing a thermal transmittance of 0.85 W/(m² °C). A thermal model of an emergency tent based on the new insulating structure (TRMS) has been developed and its thermal performance has been compared with that of a traditional UNHCR emergency shelter. The shelter model was simulated (Trnsys v.17 environment) in the winter season considering the climate of Belgrade and using only the casual gains from occupants and solar radiation through opaque walls. Numerical simulations showed that the new insulating composite envelope reduces the required heating load by factors of about two and four with respect to the traditional insulation. The study sets a starting point to develop a lightweight emergency architecture made with a combination of multilayer, air, polyester and vulcanized rubber. - Highlights: • Multilayer insulator tested by means of a guard ring apparatus. • Thermo reflective multilayer system (TRMS) development
Micro-computed tomography assessment of human alveolar bone: bone density and three-dimensional micro-architecture.

Science.gov (United States)

Kim, Yoon Jeong; Henkin, Jeffrey

2015-04-01

Micro-computed tomography (micro-CT) is a valuable means to evaluate and secure information related to bone density and quality in human necropsy samples and small live animals. The aim of this study was to assess the bone density of the alveolar jaw bones in human cadaver, using micro-CT. The correlation between bone density and three-dimensional micro architecture of trabecular bone was evaluated. Thirty-four human cadaver jaw bone specimens were harvested. Each specimen was scanned with micro-CT at resolution of 10.5 μm. The bone volume fraction (BV/TV) and the bone mineral density (BMD) value within a volume of interest were measured. The three-dimensional micro architecture of trabecular bone was assessed. All the parameters in the maxilla and the mandible were subject to comparison. The variables for the bone density and the three-dimensional micro architecture were analyzed for nonparametric correlation using Spearman's rho at the significance level of p architecture parameters were consistently higher in the mandible, up to 3.3 times greater than those in the maxilla. The most linear correlation was observed between BV/TV and BMD, with Spearman's rho = 0.99 (p = .01). Both BV/TV and BMD were highly correlated with all micro architecture parameters with Spearman's rho above 0.74 (p = .01). Two aspects of bone density using micro-CT, the BV/TV and BMD, are highly correlated with three-dimensional micro architecture parameters, which represent the quality of trabecular bone. This noninvasive method may adequately enhance evaluation of the alveolar bone. © 2013 Wiley Periodicals, Inc.
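For readers unfamiliar with the statistic, Spearman's rho is the Pearson correlation of the ranks of two variables. The sketch below uses only the Python standard library; the function names and the four-sample data set are invented for illustration and are not values from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Made-up illustrative measurements (NOT data from the study).
bv_tv = [0.10, 0.20, 0.35, 0.50]
bmd = [210.0, 260.0, 400.0, 390.0]
rho = spearman_rho(bv_tv, bmd)
```

Because only ranks enter the calculation, rho measures how monotonic the relationship is, not how linear it is.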
BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

Science.gov (United States)

Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A

2012-01-01

Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.
High-performance Sonitopia (Sonic Utopia): Hyper intelligent Material-based Architectural Systems for Acoustic Energy Harvesting

Science.gov (United States)

Heidari, F.; Mahdavinejad, M.

2017-08-01

The rate of energy consumption all over the world, based on reliable statistics from international institutions such as the International Energy Agency (IEA), shows a significant increase in energy demand in recent years. Periodically recorded data show a continuously increasing trend in energy consumption, especially in developed countries as well as in recently emerged developing economies such as China and India. Air pollution and water contamination resulting from high consumption of fossil energy resources may be considered a menace to civic ideals such as livability, conviviality and people-oriented cities. On the other hand, automobile dependency, car-oriented design and other noisy activities in urban spaces are considered threats to urban life. Thus contemporary urban design and planning concentrates on rethinking the ecology of sound, reorganizing the soundscape of neighborhoods, and redesigning the sonic order of urban space. It seems that contemporary architecture and planning trends, through soundscape mapping, look for a sonitopia (Sonic + Utopia). This paper proposes some interactive hyper intelligent material-based architectural systems for acoustic energy harvesting. The proposed architectural design system may result in high-performance architecture and planning strategies for future cities. The ultimate aim of the research is to develop a comprehensive system for acoustic energy harvesting which covers the aim of noise reduction as well as being in harmony with architectural design. The research methodology is based on a literature review as well as experimental and quasi-experimental strategies according to the paradigm of designedly ways of doing and knowing. While architectural design has a solution-focused essence in the problem-solving process, the proposed systems had better be hyper intelligent rather than predefined procedures. Therefore, the steps of the inference mechanism of the research include: 1- understanding sonic energy and noise potentials as energy
A Semi-Automated Machine Learning Algorithm for Tree Cover Delineation from 1-m Naip Imagery Using a High Performance Computing Architecture

Science.gov (United States)

Basu, S.; Ganguly, S.; Nemani, R. R.; Mukhopadhyay, S.; Milesi, C.; Votava, P.; Michaelis, A.; Zhang, G.; Cook, B. D.; Saatchi, S. S.; Boyda, E.

2014-12-01

Accurate tree cover delineation is a useful instrument in the derivation of Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) satellite imagery data. Numerous algorithms have been designed to perform tree cover delineation in high to coarse resolution satellite imagery, but most of them do not scale to terabytes of data, typical in these VHR datasets. In this paper, we present an automated probabilistic framework for the segmentation and classification of 1-m VHR data as obtained from the National Agriculture Imagery Program (NAIP) for deriving tree cover estimates for the whole of the Continental United States, using a High Performance Computing Architecture. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field (CRF), which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by incorporating expert knowledge through the relabeling of misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the state of California, which covers a total of 11,095 NAIP tiles and spans a total geographical area of 163,696 sq. miles. Our framework produced correct detection rates of around 85% for fragmented forests and 70% for urban tree cover areas, with false positive rates lower than 3% for both regions. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR high-resolution canopy height model show the effectiveness of our algorithm in generating accurate high-resolution tree cover maps.
A Compute Capable SSD Architecture for Next-Generation Non-volatile Memories

Energy Technology Data Exchange (ETDEWEB)

De, Arup [Univ. of California, San Diego, CA (United States)

2014-01-01

Existing storage technologies (e.g., disks and flash) are failing to cope with the processor and main memory speed and are limiting the overall performance of many large scale I/O or data-intensive applications. Emerging fast byte-addressable non-volatile memory (NVM) technologies, such as phase-change memory (PCM), spin-transfer torque memory (STTM) and memristor are very promising and are approaching DRAM-like performance with lower power consumption and higher density as process technology scales. These new memories are narrowing down the performance gap between the storage and the main memory and are putting forward challenging problems on existing SSD architecture, I/O interface (e.g., SATA, PCIe) and software. This dissertation addresses those challenges and presents a novel SSD architecture called XSSD. XSSD offloads computation in storage to exploit fast NVMs and reduce the redundant data traffic across the I/O bus. XSSD offers a flexible RPC-based programming framework that developers can use for application development on SSD without dealing with the complication of the underlying architecture and communication management. We have built a prototype of XSSD on the BEE3 FPGA prototyping system. We implement various data-intensive applications and achieve speedup and energy efficiency of 1.5-8.9 and 1.7-10.27 respectively. This dissertation also compares XSSD with previous work on intelligent storage and intelligent memory. The existing ecosystem and these new enabling technologies make this system more viable than earlier ones.
Fog Computing and Edge Computing Architectures for Processing Data From Diabetes Devices Connected to the Medical Internet of Things.

Science.gov (United States)

Klonoff, David C

2017-07-01

The Internet of Things (IoT) is generating an immense volume of data. With cloud computing, medical sensor and actuator data can be stored and analyzed remotely by distributed servers. The results can then be delivered via the Internet. The devices in IoT include such wireless diabetes devices as blood glucose monitors, continuous glucose monitors, insulin pens, insulin pumps, and closed-loop systems. The cloud model for data storage and analysis is increasingly unable to process the data avalanche, and processing is being pushed out to the edge of the network closer to where the data-generating devices are. Fog computing and edge computing are two architectures for data handling that can offload data from the cloud, process it near the patient, and transmit information machine-to-machine or machine-to-human in milliseconds or seconds. Sensor data can be processed near the sensing and actuating devices with fog computing (with local nodes) and with edge computing (within the sensing devices). Compared to cloud computing, fog computing and edge computing offer five advantages: (1) greater data transmission speed, (2) less dependence on limited bandwidths, (3) greater privacy and security, (4) greater control over data generated in foreign countries where laws may limit use or permit unwanted governmental access, and (5) lower costs because more sensor-derived data are used locally and less data are transmitted remotely. Connected diabetes devices almost all use fog computing or edge computing because diabetes patients require a very rapid response to sensor input and cannot tolerate delays for cloud computing.
PLM support to architecture based development

DEFF Research Database (Denmark)

Bruun, Hans Peter Lomholt

, organisation, processes, etc. To identify, evaluate, and align aspects of these domains are necessary for developing the optimal layout of product architectures. It is stated in this thesis that architectures describe building principles for products, product families, and product programs, where this project...... and developing architectures can be difficult to manage, update, and maintain during development. The concept of representing product architectures in computer-based product information tools has though been central in this research, and in the creation of results. A standard PLM tool (Windchill PDMLink...... architectures in computer systems. Presented results build on research literature and experiences from industrial partners. Verification of the theory contributions, approaches, models, and tools, have been carried out in industrial projects, with promising results. This thesis describes the means for: (1...
Computational Modeling of Human Multiple-Task Performance

National Research Council Canada - National Science Library

Kieras, David E; Meyer, David

2005-01-01

This is the final report for a project that was a continuation of an earlier, long-term project on the development and validation of the EPIC cognitive architecture for modeling human cognition and performance...
Standalone computer-aided detection compared to radiologists' performance for the detection of mammographic masses

International Nuclear Information System (INIS)

Hupse, Rianne; Samulski, Maurice; Imhof-Tas, Mechli W.; Karssemeijer, Nico; Lobbes, Marc; Boetes, Carla; Heeten, Ard den; Beijerinck, David; Pijnappel, Ruud

2013-01-01

We developed a computer-aided detection (CAD) system aimed at decision support for detection of malignant masses and architectural distortions in mammograms. The effect of this system on radiologists' performance depends strongly on its standalone performance. The purpose of this study was to compare the standalone performance of this CAD system to that of radiologists. In a retrospective study, nine certified screening radiologists and three residents read 200 digital screening mammograms without the use of CAD. Performances of the individual readers and of CAD were computed as the true-positive fraction (TPF) at a false-positive fraction of 0.05 and 0.2. Differences were analysed using an independent one-sample t-test. At a false-positive fraction of 0.05, the performance of CAD (TPF = 0.487) was similar to that of the certified screening radiologists (TPF = 0.518, P = 0.17). At a false-positive fraction of 0.2, CAD performance (TPF = 0.620) was significantly lower than the radiologist performance (TPF = 0.736, P <0.001). Compared to the residents, CAD performance was similar for all false-positive fractions. The sensitivity of CAD at a high specificity was comparable to that of human readers. These results show potential for CAD to be used as an independent reader in breast cancer screening. (orig.)
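The operating points quoted above, a true-positive fraction (TPF) at a fixed false-positive fraction (FPF), can be computed from per-case suspicion scores as in the following sketch. The function name `tpf_at_fpf` and all scores are hypothetical, and the thresholding convention (a case is called positive when its score exceeds the threshold) is an assumption; the study may have interpolated the ROC curve differently.

```python
def tpf_at_fpf(pos_scores, neg_scores, target_fpf):
    """True-positive fraction at the strictest threshold meeting target_fpf.

    A case is called positive when its score is strictly above the threshold.
    """
    # Number of normal cases that may be falsely called positive.
    allowed = int(target_fpf * len(neg_scores) + 1e-9)
    ranked_neg = sorted(neg_scores, reverse=True)
    # Using the (allowed + 1)-th highest normal score as the threshold means
    # at most `allowed` normal cases score strictly above it.
    if allowed < len(ranked_neg):
        threshold = ranked_neg[allowed]
    else:
        threshold = float("-inf")
    return sum(s > threshold for s in pos_scores) / len(pos_scores)


# Invented scores for 5 cancer cases and 10 normal cases (not study data).
cancer = [0.9, 0.8, 0.6, 0.4, 0.3]
normal = [0.7, 0.5, 0.35, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.01]
tpf_strict = tpf_at_fpf(cancer, normal, 0.1)
tpf_loose = tpf_at_fpf(cancer, normal, 0.2)
```

Comparing readers at the same FPF removes differences in how aggressively each reader recalls cases.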
Optimizing Vector-Quantization Processor Architecture for Intelligent Query-Search Applications

Science.gov (United States)

Xu, Huaiyu; Mita, Yoshio; Shibata, Tadashi

2002-04-01

The architecture of a very large scale integration (VLSI) vector-quantization processor (VQP) has been optimized to develop a general-purpose intelligent query-search agent. The agent performs a similarity-based search in a large-volume database. Although similarity-based search processing is computationally very expensive, latency-free searches have become possible due to the highly parallel maximum-likelihood search architecture of the VQP chip. Three architectures of the VQP chip have been studied and their performances are compared. In order to give reasonable searching results according to the different policies, the concept of penalty function has been introduced into the VQP. An E-commerce real-estate agency system has been developed using the VQP chip implemented in a field-programmable gate array (FPGA) and the effectiveness of such an agency system has been demonstrated.
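The search that such a chip performs in parallel hardware amounts to finding the stored vector closest to the query. The serial sketch below illustrates that idea only: the function name, the Manhattan distance default, and the way a penalty function is plugged in per vector element are all assumptions, since the abstract does not specify them.

```python
def nearest_codeword(query, codebook, penalty=lambda q, c: abs(q - c)):
    """Return (index, distance) of the codeword with the smallest total penalty.

    `penalty(q, c)` scores one element pair; the default gives Manhattan distance.
    """
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):  # a VQ chip evaluates these in parallel
        dist = sum(penalty(q, c) for q, c in zip(query, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist


# Hypothetical two-feature database entries, e.g. (price, size) in arbitrary units.
codebook = [(1, 1), (5, 5), (9, 1)]
plain = nearest_codeword((4, 6), codebook)

# Hypothetical policy: exceeding the requested value costs twice as much.
def over_budget(q, c):
    return 2 * (c - q) if c > q else (q - c)

weighted = nearest_codeword((4, 6), codebook, penalty=over_budget)
```

Swapping the penalty function changes which matches count as similar without changing the search loop, which is one plausible reading of how different search policies could share one architecture.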
Kine-Mould : Manufacturing technology for curved architectural elements in concrete

NARCIS (Netherlands)

Schipper, H.R.; Eigenraam, P.; Grünewald, S.; Soru, M.; Nap, P.; Van Overveld, B.; Vermeulen, J.

2015-01-01

The production of architectural elements with complex geometry is challenging for concrete manufacturers. Computer-numerically-controlled (CNC) milled foam moulds have been applied frequently in the last decades, resulting in good aesthetical performance. However, still the costs are high and a
A Bandwidth-Optimized Multi-Core Architecture for Irregular Applications

Energy Technology Data Exchange (ETDEWEB)

Secchi, Simone; Tumeo, Antonino; Villa, Oreste

2012-05-31

This paper presents an architecture template for next-generation high performance computing systems specifically targeted to irregular applications. We start our work by considering that future generation interconnection and memory bandwidth full-system numbers are expected to grow by a factor of 10. In order to keep up with such a communication capacity, while still resorting to fine-grained multithreading as the main way to tolerate unpredictable memory access latencies of irregular applications, we show how overall performance scaling can benefit from the multi-core paradigm. At the same time, we also show how such an architecture template must be coupled with specific techniques in order to optimize bandwidth utilization and achieve the maximum scalability. We propose a technique based on memory references aggregation, together with the related hardware implementation, as one of such optimization techniques. We explore the proposed architecture template by focusing on the Cray XMT architecture and, using a dedicated simulation infrastructure, validate the performance of our template with two typical irregular applications. Our experimental results prove the benefits provided by both the multi-core approach and the bandwidth optimization reference aggregation technique.
An Architecture of IoT Service Delegation and Resource Allocation Based on Collaboration between Fog and Cloud Computing

Directory of Open Access Journals (Sweden)

Aymen Abdullah Alsaffar

2016-01-01

Full Text Available Despite the wide utilization of cloud computing (e.g., services, applications, and resources), some of the services, applications, and smart devices are not able to fully benefit from this attractive cloud computing paradigm due to the following issues: (1) smart devices might be lacking in their capacity (e.g., processing, memory, storage, battery, and resource allocation), (2) they might be lacking in their network resources, and (3) the high network latency to a centralized server in the cloud might not be efficient for delay-sensitive applications, services, and resource allocation requests. Fog computing is a promising paradigm that can extend cloud resources to the edge of the network, solving the abovementioned issues. As a result, in this work, we propose an architecture of IoT service delegation and resource allocation based on collaboration between fog and cloud computing. We provide a new algorithm consisting of decision rules of a linearized decision tree based on three conditions (service size, completion time, and VM capacity) for managing and delegating user requests in order to balance workload. Moreover, we propose an algorithm to allocate resources to meet service level agreement (SLA) and quality of service (QoS) requirements as well as optimizing big data distribution in fog and cloud computing. Our simulation results show that our proposed approach can efficiently balance workload, improve resource allocation efficiency, optimize big data distribution, and show better performance than other existing methods.
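A delegation rule based on service size, completion time, and VM capacity might look like the sketch below. It is a guess at the general shape only: the `Request` fields, the function `delegate`, the order of the checks, and every number are hypothetical and are not taken from the paper's algorithm.

```python
from dataclasses import dataclass


@dataclass
class Request:
    size_mb: float       # service size (hypothetical unit)
    deadline_ms: float   # latest acceptable completion time


def delegate(request, fog_free_capacity_mb, fog_est_completion_ms):
    """Decide where to run a request using three flattened decision-tree checks."""
    # Condition 1 and 3: the service must fit in the fog VMs' free capacity.
    if request.size_mb > fog_free_capacity_mb:
        return "cloud"
    # Condition 2: the fog node must be able to finish before the deadline.
    if fog_est_completion_ms > request.deadline_ms:
        return "cloud"
    return "fog"


small_and_urgent = delegate(Request(size_mb=10, deadline_ms=50), 100, 20)
too_large = delegate(Request(size_mb=500, deadline_ms=50), 100, 20)
fog_too_slow = delegate(Request(size_mb=10, deadline_ms=5), 100, 20)
```

Writing the tree as a flat sequence of early returns keeps each rule independently testable, which is one way to read "linearized decision tree".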
The role of FFM accumulation and skeletal muscle architecture in powerlifting performance.

Science.gov (United States)

Brechue, William F; Abe, Takashi

2002-02-01

The purpose of this study was to determine the distribution and architectural characteristics of skeletal muscle in elite powerlifters, and to investigate their relationship to fat-free mass (FFM) accumulation and powerlifting performance. Twenty elite male powerlifters (including four world and three US national champions) volunteered for this study. FFM, skeletal muscle distribution (muscle thickness at 13 anatomical sites), and isolated muscle thickness and fascicle pennation angle (PAN) of the triceps long-head (TL), vastus lateralis, and gastrocnemius medialis (MG) muscles were measured with B-mode ultrasound. Fascicle length (FAL) was calculated. Best lifting performance in the bench press (BP), squat lift (SQT), and dead lift (DL) was recorded from competition performance. Significant correlations (P FFM and FFM relative to standing height (r = 0.86 to 0.95, P FFM (r = 0.59, P FFM and, therefore, may be limited by the ability to accumulate FFM. Additionally, muscle architecture appears to play an important role in powerlifting performance in that greater fascicle lengths are associated with greater FFM accumulation and powerlifting performance.
A Parallel Implementation of a Smoothed Particle Hydrodynamics Method on Graphics Hardware Using the Compute Unified Device Architecture

International Nuclear Information System (INIS)

Wong Unhong; Wong Honcheng; Tang Zesheng

2010-01-01

Smoothed particle hydrodynamics (SPH), a class of meshfree particle methods (MPMs), has a wide range of applications from micro-scale to macro-scale as well as from discrete systems to continuum systems. Graphics hardware, originally designed for computer graphics, now provides unprecedented computational power for scientific computation. Particle systems require a huge amount of computation in physical simulation. In this paper, an efficient parallel implementation of an SPH method on graphics hardware using the Compute Unified Device Architecture is developed for fluid simulation. Compared with the corresponding CPU implementation, our experimental results show that the new approach allows significant speedups of fluid simulation by handling a huge amount of computation in parallel on graphics hardware.
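To illustrate why SPH suits massively parallel hardware, here is a minimal sketch of the standard SPH density summation, rho_i = sum_j m_j W(|r_i - r_j|, h). It is plain Python rather than CUDA, it uses the common poly6 kernel as an assumption (the abstract does not say which kernel the authors use), and the function names are hypothetical.

```python
import math


def poly6_kernel(r: float, h: float) -> float:
    """3D poly6 smoothing kernel; zero outside the support radius h."""
    if r < 0.0 or r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def densities(positions, masses, h):
    """Density at every particle by brute-force summation over all particles."""
    rho = []
    for p_i in positions:
        # Each pass of this outer loop is independent of the others,
        # so it could be assigned to its own GPU thread.
        total = 0.0
        for p_j, m_j in zip(positions, masses):
            total += m_j * poly6_kernel(math.dist(p_i, p_j), h)
        rho.append(total)
    return rho
```

The outer loop carries no dependency between particles, which is the property a one-thread-per-particle GPU mapping relies on. A production code would also replace the all-pairs inner loop with a neighbour search.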
GPU-computing in econophysics and statistical physics

Science.gov (United States)

Preis, T.

2011-03-01

A recent trend in computer science and related fields is general purpose computing on graphics processing units (GPUs), which can yield impressive performance. With multiple cores connected by high memory bandwidth, today's GPUs offer resources for non-graphics parallel processing. This article provides a brief introduction to the field of GPU computing and includes examples. In particular, computationally expensive analyses employed in the financial market context are coded on a graphics card architecture, which leads to a significant reduction of computing time. In order to demonstrate the wide range of possible applications, a standard model in statistical physics, the Ising model, is ported to a graphics card architecture as well, resulting in large speedup values.
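A common way to expose parallelism in the two-dimensional Ising model is a checkerboard Metropolis update, in which all sites of one colour can be updated at the same time because their neighbours all have the other colour. The sketch below shows that scheme sequentially in Python; it is an assumption that a GPU port would use this decomposition, since the abstract does not describe the article's implementation. The lattice side must be even for the colouring to be consistent under periodic boundaries.

```python
import math
import random


def delta_energy(spins, i, j, coupling=1.0):
    """Energy change caused by flipping spin (i, j) on a periodic square lattice."""
    n = len(spins)
    neighbours = (spins[(i - 1) % n][j] + spins[(i + 1) % n][j]
                  + spins[i][(j - 1) % n] + spins[i][(j + 1) % n])
    return 2.0 * coupling * spins[i][j] * neighbours


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep: update all 'black' sites, then all 'white' sites."""
    n = len(spins)
    for colour in (0, 1):
        for i in range(n):
            for j in range(n):
                if (i + j) % 2 != colour:
                    continue
                d_e = delta_energy(spins, i, j)
                # Accept moves that lower the energy, or with Boltzmann probability.
                if d_e <= 0.0 or rng.random() < math.exp(-beta * d_e):
                    spins[i][j] = -spins[i][j]
```

Within one colour, no updated site reads another updated site, so the two inner loops could be distributed over GPU threads without changing the result of the acceptance tests.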
A parallel VLSI architecture for a digital filter of arbitrary length using Fermat number transforms

Science.gov (United States)

Truong, T. K.; Reed, I. S.; Yeh, C. S.; Shao, H. M.

1982-01-01

A parallel architecture for computation of the linear convolution of two sequences of arbitrary lengths using the Fermat number transform (FNT) is described. In particular a pipeline structure is designed to compute a 128-point FNT. In this FNT, only additions and bit rotations are required. A standard barrel shifter circuit is modified so that it performs the required bit rotation operation. The overlap-save method is generalized for the FNT to compute a linear convolution of arbitrary length. A parallel architecture is developed to realize this type of overlap-save method using one FNT and several inverse FNTs of 128 points. The generalized overlap save method alleviates the usual dynamic range limitation in FNTs of long transform lengths. Its architecture is regular, simple, and expandable, and therefore naturally suitable for VLSI implementation.
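The claim that an FNT needs only additions and bit rotations can be illustrated with a small software sketch. It assumes a 32-point transform modulo the Fermat number F4 = 2^16 + 1 with root 2, not the 128-point transform of the paper, and it uses a plain O(N^2) loop rather than the pipeline structure described above. Because the root is 2, every twiddle factor is a power of two, so each multiplication becomes a left shift followed by a modular reduction.

```python
F4 = (1 << 16) + 1   # Fermat number 2**16 + 1 = 65537 (prime)
N = 32               # 2 has multiplicative order 32 modulo F4


def fnt(x, inverse=False):
    """Naive N-point Fermat number transform with root 2 (or its inverse)."""
    sign = -1 if inverse else 1
    out = []
    for k in range(N):
        acc = 0
        for n in range(N):
            shift = (sign * n * k) % N   # exponents wrap because 2**32 == 1 (mod F4)
            acc += x[n] << shift         # multiply by a power of two: a shift
        out.append(acc % F4)
    if inverse:
        inv_n = pow(N, F4 - 2, F4)       # modular inverse of N (F4 is prime)
        out = [(v * inv_n) % F4 for v in out]
    return out


def linear_convolution(a, b):
    """Linear convolution via zero-padded FNTs; exact while results stay below F4."""
    length = len(a) + len(b) - 1
    assert length <= N, "inputs too long for a single transform"
    fa = fnt(list(a) + [0] * (N - len(a)))
    fb = fnt(list(b) + [0] * (N - len(b)))
    product = [(p * q) % F4 for p, q in zip(fa, fb)]
    return fnt(product, inverse=True)[:length]
```

For instance, `linear_convolution([1, 2, 3], [4, 5])` gives `[4, 13, 22, 15]`. Longer inputs would be split into blocks and recombined, which is the role the overlap-save method plays in the abstract.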
A performance model for the communication in fast multipole methods on high-performance computing platforms

KAUST Repository

Ibeid, Huda

2016-03-04

Exascale systems are predicted to have approximately 1 billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the currently dominant parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. It is therefore of interest to model application performance and to understand what changes need to be made to ensure extrapolated scalability. The fast multipole method (FMM) was originally developed for accelerating N-body problems in astrophysics and molecular dynamics but has recently been extended to a wider range of problems. Its high arithmetic intensity combined with its linear complexity and asynchronous communication patterns make it a promising algorithm for exascale systems. In this paper, we discuss the challenges for FMM on current parallel computers and future exascale architectures, with a focus on internode communication. We focus on the communication part only; the efficiency of the computational kernels is beyond the scope of the present study. We develop a performance model that considers the communication patterns of the FMM and observe a good match between our model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization of internode communication in FMM that validates the model against actual measurements of communication time. The ultimate communication model is predictive in an absolute sense; however, on complex systems, this objective is often out of reach or of a difficulty out of proportion to its benefit when there exists a simpler model that is inexpensive and sufficient to guide coding decisions leading to improved scaling. The current model provides such guidance.
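As a rough illustration of how latency, bandwidth, topology, and multicore effects can enter a communication estimate, the sketch below uses the generic latency-bandwidth ("alpha-beta") model for a single message. It is not the model developed in the paper: the hop term and the bandwidth-sharing term are hypothetical simplifications, and all parameter names are assumptions.

```python
def message_time(n_bytes, latency_s, bandwidth_bytes_per_s,
                 hops=1, per_hop_s=0.0, sharing_cores=1):
    """Estimated time to send one message of n_bytes.

    latency_s      fixed start-up cost per message (alpha)
    per_hop_s      assumed extra delay per network hop (topology term)
    sharing_cores  cores assumed to share one network interface equally
    """
    effective_bandwidth = bandwidth_bytes_per_s / sharing_cores
    return latency_s + hops * per_hop_s + n_bytes / effective_bandwidth
```

With a 1 microsecond latency and 1 GB/s of bandwidth, a 1 MB message is estimated at about 1.001 ms; if two cores share the link, the bandwidth term doubles. Small messages are dominated by the latency term and large ones by the bandwidth term, which is the kind of distinction such a model uses to guide coding decisions.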
Real-time collaboration in activity-based architectures

DEFF Research Database (Denmark)

Bardram, Jakob Eyvind; Christensen, Henrik Bærbak

2004-01-01

With the growing research into mobile and ubiquitous computing, there is a need for addressing how such infrastructures can support collaboration between nomadic users. We present the activity based computing paradigm and outline a proposal for handling collaboration in an activity-based architecture. We argue that activity-based computing establishes a natural and sound conceptual and architectural basis for session management in real-time, synchronous collaboration....

Innovative HPC architectures for the study of planetary plasma environments

Science.gov (United States)

Amaya, Jorge; Wolf, Anna; Lembège, Bertrand; Zitz, Anke; Alvarez, Damian; Lapenta, Giovanni

2016-04-01

DEEP-ER is a European Commission funded project that develops a new type of High Performance Computer architecture. The revolutionary system is currently used by KU Leuven to study the effects of the solar wind on the global environments of the Earth and Mercury. The new architecture combines the versatility of Intel Xeon computing nodes with the power of the upcoming Intel Xeon Phi accelerators. Contrary to classical heterogeneous HPC architectures, where it is customary to find CPUs and accelerators in the same computing nodes, in the DEEP-ER system CPU nodes are grouped together (Cluster) independently from the accelerator nodes (Booster). The system is equipped with a state of the art interconnection network, a highly scalable and fast I/O and a fail recovery resiliency system. The final objective of the project is to introduce a scalable system that can be used to create the next generation of exascale supercomputers. The code iPic3D from KU Leuven is being adapted to this new architecture. This particle-in-cell code can now perform the computation of the electromagnetic fields in the Cluster while the particles are moved in the Booster side. Using fast and scalable Xeon Phi accelerators in the Booster we can introduce many more particles per cell in the simulation than is possible in the current generation of HPC systems, allowing us to calculate fully kinetic plasmas with very low interpolation noise. The system will be used to perform fully kinetic, low noise, 3D simulations of the interaction of the solar wind with the magnetosphere of the Earth and Mercury. Preliminary simulations have been performed in other HPC centers in order to compare the results in different systems. In this presentation we show the complexity of the plasma flow around the planets, including the development of hydrodynamic instabilities at the flanks, the presence of the collision-less shock, the magnetosheath, the magnetopause, reconnection zones, the formation of the
PHENIX On-Line Distributed Computing System Architecture

International Nuclear Information System (INIS)

Desmond, Edmond; Haggerty, John; Kehayias, Hyon Joo; Purschke, Martin L.; Witzig, Chris; Kozlowski, Thomas

1997-01-01

PHENIX is one of the two large experiments at the Relativistic Heavy Ion Collider (RHIC) currently under construction at Brookhaven National Laboratory. The detector consists of 11 sub-detectors that are further subdivided into 29 units ("granules") that can be operated independently, which includes simultaneous data taking with independent data streams and independent triggers. The detector has 250,000 channels and is read out by front end modules, where the data is buffered in a pipeline while awaiting the level trigger decision. Zero suppression and calibration are done after the level accept in custom built data collection modules (DCMs) with DSPs before the data is sent to an event builder (design throughput of 2 Gb/sec) and higher level triggers. The On-line Computing Systems Group (ONCS) has two responsibilities. Firstly, it is responsible for receiving the data from the event builder, routing it through a network of workstations to consumer processes and archiving it at a data rate of 20 MB/sec. Secondly, it is also responsible for the overall configuration, control and operation of the detector and data acquisition chain, which comprises the software integration for several thousand custom built hardware modules. The software must furthermore support the independent operation of the above mentioned granules, which includes the coordination of processes that run in 60-100 VME processors and workstations. ONCS has adapted the Shlaer-Mellor Object Oriented Methodology for the design of the top layer software. CORBA is used as the communication layer between the distributed objects, which are implemented as asynchronous finite state machines. We will give an overview of the PHENIX online system with the main focus on the system architecture, software components and integration tasks of the On-line Computing group ONCS and report on the status of the current prototypes
Design issues for numerical libraries on scalable multicore architectures

International Nuclear Information System (INIS)

Heroux, M A

2008-01-01

Future generations of scalable computers will rely on multicore nodes for a significant portion of overall system performance. At present, most applications and libraries cannot exploit multiple cores beyond running additional MPI processes per node. In this paper we discuss important multicore architecture issues, programming models, algorithm requirements and software design related to effective use of scalable multicore computers. In particular, we focus on important issues for library research and development, making recommendations for how to effectively develop libraries for future scalable computer systems
Operational Numerical Weather Prediction systems based on Linux cluster architectures

International Nuclear Information System (INIS)

Pasqui, M.; Baldi, M.; Gozzini, B.; Maracchi, G.; Giuliani, G.; Montagnani, S.

2005-01-01

Progress in weather forecasting and atmospheric science has always been closely linked to the improvement of computing technology. In order to have more accurate weather forecasts and climate predictions, more powerful computing resources are needed, in addition to more complex and better-performing numerical models. To meet such large computing demands, powerful workstations or massively parallel systems have been used. In the last few years, parallel architectures based on the Linux operating system have been introduced and have become popular, representing real high performance-low cost systems. In this work the Linux cluster experience achieved at the Laboratory for Meteorology and Environmental Analysis (LaMMA-CNR-IBIMET) is described, and tips and performance are analysed
Super-computer architecture

CERN Document Server

Hockney, R W

1977-01-01

This paper examines the design of top-of-the-range, scientific, number-crunching computers. The market for such computers is not as large as that for smaller machines, but on the other hand it is by no means negligible. The present work-horse machines in this category are the CDC 7600 and IBM 360/195, and over fifty of the former machines have been sold. The types of installation that form the market for such machines are not only the major scientific research laboratories in the major countries (such as Los Alamos, CERN and the Rutherford Laboratory) but also major universities or university networks. It is also true that, as with sports cars, innovations made to satisfy the top of the market today often become the standard for the medium-scale computer of tomorrow. Hence there is considerable interest in examining present developments in this area. (0 refs).
Development of a computerized handbook of architectural plans

NARCIS (Netherlands)

Koutamanis, A.

1990-01-01

The dissertation investigates an approach to the development of visual / spatial computer representations for architectural purposes through the development of the computerized handbook of architectural plans (chap), a knowledge-based computer system capable of recognizing the metric properties of
Architecture-Aware Configuration and Scheduling of Matrix Multiplication on Asymmetric Multicore Processors

OpenAIRE

Catalán, Sandra; Igual, Francisco D.; Mayo, Rafael; Rodríguez-Sánchez, Rafael; Quintana-Ortí, Enrique S.

2015-01-01

Asymmetric multicore processors (AMPs) have recently emerged as an appealing technology for severely energy-constrained environments, especially in mobile appliances where heterogeneity in applications is mainstream. In addition, given the growing interest in low-power high performance computing, this type of architecture is also being investigated as a means to improve the throughput-per-Watt of complex scientific applications. In this paper, we design and embed several architecture-aware ...
Computer systems a programmer's perspective

CERN Document Server

Bryant, Randal E

2016-01-01

Computer systems: A Programmer’s Perspective explains the underlying elements common among all computer systems and how they affect general application performance. Written from the programmer’s perspective, this book strives to teach readers how understanding basic elements of computer systems and executing real practice can lead them to create better programs. Spanning across computer science themes such as hardware architecture, the operating system, and systems software, the Third Edition serves as a comprehensive introduction to programming. This book strives to create programmers who understand all elements of computer systems and will be able to engage in any application of the field--from fixing faulty software, to writing more capable programs, to avoiding common flaws. It lays the groundwork for readers to delve into more intensive topics such as computer architecture, embedded systems, and cybersecurity. This book focuses on systems that execute an x86-64 machine code, and recommends th...
Implementation of the Principal Component Analysis onto High-Performance Computer Facilities for Hyperspectral Dimensionality Reduction: Results and Comparisons

Directory of Open Access Journals (Sweden)

Ernestina Martel

2018-06-01

Dimensionality reduction represents a critical preprocessing step in order to increase the efficiency and the performance of many hyperspectral imaging algorithms. However, dimensionality reduction algorithms, such as Principal Component Analysis (PCA), are computationally demanding, which makes it advisable to implement them on high-performance computer architectures for applications under strict latency constraints. This work presents the implementation of the PCA algorithm on two different high-performance devices, namely, an NVIDIA Graphics Processing Unit (GPU) and a Kalray manycore, uncovering a highly valuable set of tips and tricks for taking full advantage of the inherent parallelism of these high-performance computing platforms and, hence, reducing the time that is required to process a given hyperspectral image. Moreover, the results obtained with different hyperspectral images have been compared with those obtained with a recently published field programmable gate array (FPGA)-based implementation of the PCA algorithm, providing, for the first time in the literature, a comprehensive analysis that highlights the pros and cons of each option.
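For orientation, the main stages of PCA (mean removal, covariance computation, extraction of the leading eigenvector, and projection) can be sketched in a few lines of plain Python. This is a hedged illustration only: it finds just the first principal component by power iteration, which is an assumption made to keep the example dependency-free, and it is not the GPU, manycore, or FPGA implementation discussed in the abstract.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) of the first principal component."""
    n, d = len(rows), len(rows[0])
    # Stage 1: remove the mean of every band (column).
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centred = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Stage 2: sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Stage 3: power iteration towards the dominant eigenvector.
    # Assumes the start vector is not orthogonal to that eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Stage 4: project the centred data onto the component.
    scores = [sum(c[k] * v[k] for k in range(d)) for c in centred]
    return v, scores
```

The mean removal, the covariance accumulation, and the final projection are all sums over independent pixels or band pairs, which is where parallel devices can be expected to help most.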
Parallel Programming using OpenCL on Modern Architectures

DEFF Research Database (Denmark)

Nielsen, Allan Svejstrup; Engsig-Karup, Allan Peter; Dammann, Bernd

as they are at graphics. To conclude the presentation of OpenCL as a language for compute, a matrix-matrix multiplication example is devised and optimized for the VLIW4, Tesla and Fermi architectures. The performance is measured as a function of both matrix and work-group size and results are discussed. Where applicable...
Energy and architecture: improvement of energy performance in existing buildings

Energy Technology Data Exchange (ETDEWEB)

Haase, Matthias; Wycmans, Annemie; Solbraa, Anne; Grytli, Eir

2011-07-01

This book aims to give an overview of different aspects of retrofitting existing buildings. The target group is students of architecture and building engineering as well as building professionals. Eight out of ten buildings which we will inhabit in 2050 already exist. This means that a great potential for reducing our carbon footprint lies in the existing building stock. Students from NTNU have used the renovation of a 1950s school building at Linesoeya in Soer-Trondelag as a case to increase their awareness and knowledge about the challenges building professionals need to overcome to unite technical details and high user quality into good environmental performance. The students were invited by the building owners and initiators of LIPA Eco Project to contribute to its development: By retrofitting an existing building to passive house standards and combining this with energy generated on site, LIPA Eco Project aims to provide a hands-on example with regard to energy efficiency, architectural design and craftsmanship for a low carbon society. The overall goal for this project is to raise awareness regarding resource efficiency measures in architecture and particularly in existing building mass.(au)
A Declarative Approach to Architectural Reflection

DEFF Research Database (Denmark)

Ingstrup, Mads; Hansen, Klaus Marius

2005-01-01

Recent research shows runtime architectural reflection is instrumental in, for instance, building adaptive and flexible systems or checking correspondence between design and implementation. Moreover, experience with computational reflection in various branches of computer science shows that the interface through which the meta-information of the running system is accessed, and possibly modified, lies at the heart of designing reflective systems. This paper proposes that such an interface should be like a database: accessed through queries expressed using the concepts with which architecture... which both creates runtime models of specific distributed architectures and allows for evaluation of AQL queries on these models. We illustrate the viability of the approach in two particular applications of such a model: constraint checking relative to an architectural style, and reasoning about certain...
Performance anomaly detection in microservice architectures under continuous change

OpenAIRE

Düllmann, Thomas F.

2017-01-01

The idea of DevOps and agile approaches like Continuous Integration (CI) and microservice architectures are becoming more and more popular as the demand for flexible and scalable solutions is increasing. By raising the degree of automation and distribution, new challenges in terms of application performance monitoring arise because microservices are possibly short-lived and may be replaced within seconds. The fact that microservices are added and removed on a regular basis brings new requireme...
Using Runtime Systems Tools to Implement Efficient Preconditioners for Heterogeneous Architectures

Directory of Open Access Journals (Sweden)

Roussel Adrien

2016-11-01

Solving large sparse linear systems is a time-consuming step in basin modeling or reservoir simulation. The choice of a robust preconditioner strongly impacts the performance of the overall simulation. Heterogeneous architectures based on General Purpose computing on Graphic Processing Units (GPGPU) or many-core architectures introduce programming challenges which can be managed in a way that is transparent to developers through the use of runtime systems. Nevertheless, algorithms need to be well suited to these massively parallel architectures. In this paper, we present preconditioning techniques that make it possible to take advantage of emerging architectures. We also present our task-based implementations using the HARTS (Heterogeneous Abstract RunTime System) runtime system, which aims to manage recent architectures. We focus on two preconditioners. The first is an ILU(0) preconditioner implemented on distributed-memory systems. The second is a multi-level domain decomposition method implemented on a shared-memory system. The results obtained on the corresponding architectures are then presented, which opens the way to a discussion of the scalability of such methods in terms of numerical performance, keeping in mind that the next step is to propose massively parallel implementations of these techniques.
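For readers unfamiliar with ILU(0), the sketch below shows the textbook sequential form of the factorization: Gaussian elimination in which any entry outside the sparsity pattern of the original matrix is simply never created. It stores the matrix densely for readability, which is an assumption made for the example, and it says nothing about the task-based, distributed implementation described in the abstract.

```python
def ilu0(a):
    """Incomplete LU with zero fill-in.

    Returns one matrix holding the unit-lower factor L (below the diagonal)
    and the upper factor U (diagonal and above). Entries that are zero in
    `a` stay zero in the result.
    """
    n = len(a)
    lu = [row[:] for row in a]
    pattern = {(i, j) for i in range(n) for j in range(n) if a[i][j] != 0}
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in pattern:
                continue
            lu[i][k] /= lu[k][k]                      # multiplier, stored in L
            for j in range(k + 1, n):
                if (i, j) in pattern:                 # drop updates outside the pattern
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu
```

For a tridiagonal matrix the exact LU factors create no fill-in, so ILU(0) coincides with the full factorization. For an "arrow" matrix the entries that exact elimination would fill are left at zero, which is what makes the factorization cheap but only approximate.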
Evaluation of existing and proposed computer architectures for future ground-based systems

Science.gov (United States)

Schulbach, C.

1985-01-01

Parallel processing architectures and techniques used in current supercomputers are described and projections are made of future advances. Presently, the von Neumann sequential processing pattern has been accelerated by having separate I/O processors, interleaved memories, wide memories, independent functional units and pipelining. Recent supercomputers have featured single-instruction, multiple data stream architectures, which have different processors for performing various operations (vector or pipeline processors). Multiple-instruction, multiple data stream machines have also been developed. Data flow techniques, wherein program instructions are activated only when data are available, are expected to play a large role in future supercomputers, along with increased parallel processor arrays. The enhanced operational speeds are essential for adequately treating data from future spacecraft remote sensing instruments such as the Thematic Mapper.
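The data flow idea mentioned above, in which an instruction is activated only when its operands are available, can be sketched as a tiny interpreter. This is a conceptual illustration under assumed names (`run_dataflow`, and the node table format are hypothetical), not a description of any machine covered by the report.

```python
def run_dataflow(nodes, inputs):
    """Execute a dataflow graph.

    nodes:  name -> (function, [names of operands])
    inputs: name -> value, for operands available at the start
    Returns (all computed values, the order in which nodes fired).
    """
    values = dict(inputs)
    pending = dict(nodes)
    order = []
    while pending:
        # A node is ready as soon as every one of its operands has a value.
        ready = [name for name, (_, operands) in pending.items()
                 if all(op in values for op in operands)]
        if not ready:
            raise ValueError("deadlock: some operands never become available")
        # Every node in `ready` is independent of the others in this round,
        # so a dataflow machine could fire them all at once.
        for name in ready:
            func, operands = pending.pop(name)
            values[name] = func(*(values[op] for op in operands))
            order.append(name)
    return values, order
```

Execution order is decided by data availability rather than by the order in which the nodes are listed, which is the contrast with the von Neumann sequential pattern drawn in the abstract.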
Blueprint for a microwave trapped ion quantum computer.

Science.gov (United States)

Lekitsch, Bjoern; Weidt, Sebastian; Fowler, Austin G; Mølmer, Klaus; Devitt, Simon J; Wunderlich, Christof; Hensinger, Winfried K

2017-02-01

The availability of a universal quantum computer may have a fundamental impact on a vast number of research fields and on society as a whole. An increasingly large scientific and industrial community is working toward the realization of such a device. An arbitrarily large quantum computer may best be constructed using a modular approach. We present a blueprint for a trapped ion-based scalable quantum computer module, making it possible to create a scalable quantum computer architecture based on long-wavelength radiation quantum gates. The modules control all operations as stand-alone units, are constructed using silicon microfabrication techniques, and are within reach of current technology. To perform the required quantum computations, the modules make use of long-wavelength radiation-based quantum gate technology. To scale this microwave quantum computer architecture to a large size, we present a fully scalable design that makes use of ion transport between different modules, thereby allowing arbitrarily many modules to be connected to construct a large-scale device. A high error-threshold surface error correction code can be implemented in the proposed architecture to execute fault-tolerant operations. With appropriate adjustments, the proposed modules are also suitable for alternative trapped ion quantum computer architectures, such as schemes using photonic interconnects.
Multicore technology architecture, reconfiguration, and modeling

CERN Document Server

Qadri, Muhammad Yasir

2013-01-01

The saturation of design complexity and clock frequencies for single-core processors has resulted in the emergence of multicore architectures as an alternative design paradigm. Nowadays, multicore/multithreaded computing systems are not only a de-facto standard for high-end applications, they are also gaining popularity in the field of embedded computing. The start of the multicore era has altered the concepts relating to almost all of the areas of computer architecture design, including core design, memory management, thread scheduling, application support, inter-processor communication, debu
Analyzing Resiliency of the Smart Grid Communication Architectures

Energy Technology Data Exchange (ETDEWEB)

2016-08-01

Smart grids are susceptible to cyber-attack as a result of new communication, control and computation techniques employed in the grid. In this paper, we characterize and analyze the resiliency of smart grid communication architecture, specifically an RF mesh based architecture, under cyber-attacks. We analyze the resiliency of the communication architecture by studying the performance of high-level smart grid functions, such as metering and demand response, which depend on communication. Disrupting the operation of these functions impacts the operational resiliency of the smart grid. Our analysis shows that an attacker needs to compromise only a small fraction of meters to undermine the communication resiliency of the smart grid. We discuss the implications of our result for critical smart grid functions and for the overall security of the smart grid.
Architectural communication: Intra and extra activity of architecture

Directory of Open Access Journals (Sweden)

Stamatović-Vučković Slavica

2013-01-01

Apart from a brief overview of architectural communication viewed from the standpoint of the theory of information and semiotics, this paper contains two forms of dualistically viewed architectural communication. The duality denotation/connotation ("primary" and "secondary" architectural communication) is one of the semiotic postulates taken from Umberto Eco, who viewed architectural communication as a semiotic phenomenon. In addition, architectural communication can be viewed as an intra and an extra activity of architecture, where the overall activity of the edifice performed through its spatial manifestation may be understood as an act of communication. In that respect, the activity may be perceived as the "behavior of architecture", which corresponds to Lefebvre’s production of space.
Peer-to-peer computing for secure high performance data copying

International Nuclear Information System (INIS)

Hanushevsky, A.; Trunov, A.; Cottrell, L.

2001-01-01

The BaBar Copy Program (bbcp) is an excellent representative of peer-to-peer (P2P) computing. It is also a pioneering application of its type in the P2P arena. Built upon the foundation of its predecessor, Secure Fast Copy (sfcp), bbcp incorporates significant improvements in performance and usability. As with sfcp, bbcp uses ssh for authentication, providing an elegant and simple working model: if you can ssh to a location, you can copy files to or from that location. To fully support this notion, bbcp transparently supports third-party copy operations. The program also incorporates several mechanisms to deal with firewall security, the bane of P2P computing. To achieve high performance in a wide area network, bbcp allows a user to independently specify the number of parallel network streams, the TCP window size, and the file I/O blocking factor. Using these parameters, data is pipelined from source to target to provide a uniform traffic pattern that maximizes router efficiency. For improved recoverability, bbcp also keeps track of copy operations so that an operation can be restarted from the point of failure at a later time, minimizing the amount of network traffic in the event of a copy failure. Here, the authors present the bbcp architecture, its various features, and the reasons for their inclusion
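Why the number of parallel streams and the TCP window size matter on a wide area network can be seen from a standard bound: a single TCP stream cannot move more than one window of data per round trip. The sketch below applies that bound; the function names are hypothetical, it ignores packet loss and congestion control, and it does not model bbcp's actual behaviour or command-line options.

```python
import math


def window_limited_throughput(window_bytes, rtt_s, streams=1):
    """Upper bound in bytes/s for `streams` TCP streams with the given window and RTT."""
    return streams * window_bytes / rtt_s


def streams_needed(target_bytes_per_s, window_bytes, rtt_s):
    """Smallest number of streams whose combined bound reaches the target rate."""
    return math.ceil(target_bytes_per_s * rtt_s / window_bytes)
```

With a 64 KiB window and a 100 ms round trip, one stream is capped at 655,360 bytes/s, so reaching 10 MB/s takes 16 streams at that window size, or a proportionally larger window with fewer streams.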

SaaS architecture and pricing models

OpenAIRE

Laatikainen, Gabriella; Ojala, Arto

2014-01-01

In the new era of computing, SaaS software with different architectural characteristics might be priced in different ways. Even though both pricing and architectural characteristics are responsible for the success of the offering, the relationship between architectural and pricing characteristics has not been studied before. The present study fills this gap by employing multi-case research. The findings accentuate that flexible and well-designed architecture enables different pricing models...
Application of Tessellation in Architectural Geometry Design

Science.gov (United States)

Chang, Wei

2018-06-01

Tessellation plays a significant role in architectural geometry design and has been widely used both throughout the history of architecture and in modern architectural design with the help of computer technology. Tessellation has been found since the birth of civilization. In terms of dimensions, there are two-dimensional tessellations and three-dimensional tessellations; in terms of symmetry, there are periodic tessellations and aperiodic tessellations. Besides, some special types of tessellations such as Voronoi Tessellation and Delaunay Triangles are also included. Both geometry and crystallography, the latter of which is the basic theory of three-dimensional tessellations, need to be studied. Historically, tessellation was applied to skins or decorations in architecture. The development of computer technology enables tessellation to be more powerful, as seen in surface control, surface display and structure design, etc. Therefore, research on the application of tessellation in architectural geometry design is of great necessity in architecture studies.
Digital optical computers at the optoelectronic computing systems center

Science.gov (United States)

Jordan, Harry F.

1991-01-01

The Digital Optical Computing Program within the National Science Foundation Engineering Research Center for Opto-electronic Computing Systems has as its specific goal research on optical computing architectures suitable for use at the highest possible speeds. The program can be targeted toward exploiting the time domain because other programs in the Center are pursuing research on parallel optical systems, exploiting optical interconnection and optical devices and materials. Using a general purpose computing architecture as the focus, we are developing design techniques, tools and architecture for operation at the speed of light limit. Experimental work is being done with the somewhat low speed components currently available but with architectures which will scale up in speed as faster devices are developed. The design algorithms and tools developed for a general purpose, stored program computer are being applied to other systems such as optimally controlled optical communication networks.
Analog readout for optical reservoir computers

OpenAIRE

Smerieri, Anteo; Duport, François; Paquot, Yvan; Schrauwen, Benjamin; Haelterman, Marc; Massar, Serge

2012-01-01

Reservoir computing is a new, powerful and flexible machine learning technique that is easily implemented in hardware. Recently, by using a time-multiplexed architecture, hardware reservoir computers have reached performance comparable to digital implementations. Operating speeds allowing for real time information operation have been reached using optoelectronic systems. At present the main performance bottleneck is the readout layer which uses slow, digital postprocessing. We have designed a...
24th & 25th Joint Workshop on Sustained Simulation Performance

CERN Document Server

Bez, Wolfgang; Focht, Erich; Gienger, Michael; Kobayashi, Hiroaki

2017-01-01

This book presents the state of the art in High Performance Computing on modern supercomputer architectures. It addresses trends in hardware and software development in general, as well as the future of High Performance Computing systems and heterogeneous architectures. The contributions cover a broad range of topics, from improved system management to Computational Fluid Dynamics, High Performance Data Analytics, and novel mathematical approaches for large-scale systems. In addition, they explore innovative fields like coupled multi-physics and multi-scale simulations. All contributions are based on selected papers presented at the 24th Workshop on Sustained Simulation Performance, held at the University of Stuttgart’s High Performance Computing Center in Stuttgart, Germany in December 2016 and the subsequent Workshop on Sustained Simulation Performance, held at the Cyberscience Center, Tohoku University, Japan in March 2017.
An information-theoretic approach to motor action decoding with a reconfigurable parallel architecture.

Science.gov (United States)

Craciun, Stefan; Brockmeier, Austin J; George, Alan D; Lam, Herman; Príncipe, José C

2011-01-01

Methods for decoding movements from neural spike counts using adaptive filters often rely on minimizing the mean-squared error. However, for non-Gaussian distribution of errors, this approach is not optimal for performance. Therefore, rather than using probabilistic modeling, we propose an alternate non-parametric approach. In order to extract more structure from the input signal (neuronal spike counts) we propose using minimum error entropy (MEE), an information-theoretic approach that minimizes the error entropy as part of an iterative cost function. However, the disadvantage of using MEE as the cost function for adaptive filters is the increase in computational complexity. In this paper we present a comparison between the decoding performance of the analytic Wiener filter and a linear filter trained with MEE, which is then mapped to a parallel architecture in reconfigurable hardware tailored to the computational needs of the MEE filter. We observe considerable speedup from the hardware design. The adaptation of filter weights for the multiple-input, multiple-output linear filters, necessary in motor decoding, is a highly parallelizable algorithm. It can be decomposed into many independent computational blocks with a parallel architecture readily mapped to a field-programmable gate array (FPGA) and scales to large numbers of neurons. By pipelining and parallelizing independent computations in the algorithm, the proposed parallel architecture has sublinear increases in execution time with respect to both window size and filter order.
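As an illustration of the MEE cost mentioned above, the following is a minimal sketch for a single-output linear filter. It assumes Renyi's quadratic entropy estimated with a Gaussian Parzen window, a common MEE formulation that the abstract itself does not spell out. The names `information_potential`, `mee_step`, `sigma`, and `lr` are hypothetical, and the nested loops over sample pairs show where the extra computational cost comes from.

```python
import math

def gaussian(x, sigma):
    """Gaussian kernel with standard deviation sigma, evaluated at x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def information_potential(errors, sigma):
    """Parzen estimate V(e) = (1/N^2) * sum_i sum_j G(e_i - e_j).

    Minimising Renyi's quadratic entropy, -log V, is equivalent to maximising V.
    """
    n = len(errors)
    s = math.sqrt(2.0) * sigma  # kernel width for pairwise differences
    return sum(gaussian(ei - ej, s) for ei in errors for ej in errors) / (n * n)

def predict(w, x):
    """Output of a linear filter with weights w for input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def mee_step(w, xs, ds, sigma, lr):
    """One gradient-ascent step on V over a window of (input, target) samples."""
    n = len(xs)
    s = math.sqrt(2.0) * sigma
    errors = [d - predict(w, x) for x, d in zip(xs, ds)]
    grad = [0.0] * len(w)
    for i in range(n):          # O(N^2) pairwise interactions per step
        for j in range(n):
            de = errors[i] - errors[j]
            # dG/d(de) = -(de / s^2) * G(de) and d(de)/dw = -(x_i - x_j)
            coeff = gaussian(de, s) * de / (s * s)
            for k in range(len(w)):
                grad[k] += coeff * (xs[i][k] - xs[j][k])
    return [wk + lr * gk / (n * n) for wk, gk in zip(w, grad)]
```

Each pair (i, j) contributes independently to the gradient, which is the kind of structure that maps onto parallel hardware blocks. Note that this cost depends only on error differences, so in practice a bias term is usually adjusted separately.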
Parametric Approach to Assessing Performance of High-Lift Device Active Flow Control Architectures

Directory of Open Access Journals (Sweden)

Yu Cai

2017-02-01

Active Flow Control is at present an area of considerable research, with multiple potential aircraft applications. While the majority of research has focused on the performance of the actuators themselves, a system-level perspective is necessary to assess the viability of proposed solutions. This paper demonstrates such an approach, in which major system components are sized based on system flow and redundancy considerations, with the impacts linked directly to the mission performance of the aircraft. Considering the case of a large twin-aisle aircraft, four distinct active flow control architectures that facilitate the simplification of the high-lift mechanism are investigated using the demonstrated approach. The analysis indicates a very strong influence of system total mass flow requirement on architecture performance, both for a typical mission and also over the entire payload-range envelope of the aircraft.
High performance integer arithmetic circuit design on FPGA architecture, implementation and design automation

CERN Document Server

Palchaudhuri, Ayan

2016-01-01

This book describes the optimized implementations of several arithmetic datapath, controlpath and pseudorandom sequence generator circuits for realization of high performance arithmetic circuits targeted towards a specific family of the high-end Field Programmable Gate Arrays (FPGAs). It explores regular, modular, cascadable, and bit-sliced architectures of these circuits, by directly instantiating the target FPGA-specific primitives in the HDL. Every proposed architecture is justified with detailed mathematical analyses. Simultaneously, constrained placement of the circuit building blocks is performed, by placing the logically related hardware primitives in close proximity to one another by supplying relevant placement constraints in the Xilinx proprietary “User Constraints File”. The book covers the implementation of a GUI-based CAD tool named FlexiCore integrated with the Xilinx Integrated Software Environment (ISE) for design automation of platform-specific high-performance arithmetic circuits from us...
THE CHALLENGE OF THE PERFORMANCE CONCEPT WITHIN THE SUSTAINABILITY AND COMPUTATIONAL DESIGN FIELD

Directory of Open Access Journals (Sweden)

Marcio Nisenbaum

2017-11-01

This paper discusses the notion of performance and its appropriation within the research fields related to sustainability and computational design, focusing on the design processes of the architectural and urban fields. Recently, terms such as “performance oriented design” or “performance driven architecture”, especially when related to sustainability, have been used by many authors and professionals as an attempt to engender project guidelines based on simulation processes and systematic use of digital tools. In this context, the notion of performance has basically been understood as the way in which an action is fulfilled, agreeing to contemporary discourses of efficiency and optimization – in this circumstance it is considered that a building or urban area “performs” if it fulfills certain objective sustainability evaluation criteria, reduced to mathematical parameters. This paper intends to broaden this understanding by exploring new theoretical interpretations, referring to etymological investigation, historical research, and literature review, based on authors from different areas and on the case study of the solar houses academic competition, Solar Decathlon. This initial analysis is expected to contribute to the emergence of new forms of interpretation of the performance concept, relativizing the notion of the “body” that “performs” in different manners, thus enhancing its appropriation and use within the fields of sustainability and computational design.
Staged Event-Driven Architecture As A Micro-Architecture Of Distributed And Pluginable Crawling Platform

Directory of Open Access Journals (Sweden)

Leszek Siwik

2013-01-01

There are many crawling systems available on the market, but they are rather closed systems dedicated to performing a particular kind and class of tasks with a predefined scope, strategy, etc. In real life, however, there are meaningful groups of users (e.g. marketing, criminal or governmental analysts) requiring not just yet another crawling system dedicated to performing predefined tasks. They need rather an easy-to-use, user-friendly all-in-one studio not only for executing and running internet robots and crawlers, but also for (graphical) (re)defining and (re)composing crawlers according to dynamically changing requirements and use cases. To realize the above-mentioned idea, the Cassiopeia framework has been designed and developed. One has to remember, however, that the enormous size and unimaginable structural complexity of the WWW network are the reasons that, from a technical and architectural point of view, developing effective internet robots, and even more so a framework supporting graphical robot composition, becomes a really challenging task. The crucial aspect in the context of crawling efficiency and scalability is the concurrency model applied. There are two most typical concurrency management models, i.e. classical concurrency based on a pool of threads and processes, and event-driven concurrency. Neither of them is an ideal approach. That is why research on alternative models is still being conducted to propose an efficient and convenient architecture for concurrent and distributed applications. One of the promising models is staged event-driven architecture, which to some extent mixes both of the above-mentioned classical approaches and provides some additional benefits, such as splitting the application into separate stages connected by event queues, which is interesting when requirements about crawler (re)composition are taken into account. The goal of this paper is to present the idea and the PoC implementation of Cassiopeia framework, with the special
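The staged event-driven idea, stages with their own small thread pools connected by event queues, can be sketched as follows. This is an illustrative toy and not the Cassiopeia implementation: the `Stage` class, `run_pipeline`, and the fetch/parse/store handlers are hypothetical, and no real network access takes place.

```python
import queue
import threading

_STOP = object()  # sentinel that tells a worker thread to shut down

class Stage:
    """One stage: an input event queue drained by a small pool of threads."""

    def __init__(self, name, handler, workers=2):
        self.name = name
        self.handler = handler
        self.workers = workers
        self.inbox = queue.Queue()
        self.next_stage = None
        self._threads = []

    def start(self):
        for _ in range(self.workers):
            t = threading.Thread(target=self._run, daemon=True)
            t.start()
            self._threads.append(t)

    def _run(self):
        while True:
            event = self.inbox.get()
            if event is _STOP:
                break
            result = self.handler(event)
            # Forward the result as a new event for the downstream stage.
            if self.next_stage is not None and result is not None:
                self.next_stage.inbox.put(result)

    def stop(self):
        # One sentinel per worker, queued behind all pending events.
        for _ in self._threads:
            self.inbox.put(_STOP)
        for t in self._threads:
            t.join()

def run_pipeline(stages, events):
    """Wire stages in order, feed events to the first, and drain them all."""
    for upstream, downstream in zip(stages, stages[1:]):
        upstream.next_stage = downstream
    for s in stages:
        s.start()
    for e in events:
        stages[0].inbox.put(e)
    # Stopping in order lets each stage drain before the next is stopped.
    for s in stages:
        s.stop()

def demo():
    collected = []
    fetch = Stage("fetch", lambda url: (url, "<html>" + url + "</html>"))
    parse = Stage("parse", lambda page: (page[0], len(page[1])))
    store = Stage("store", lambda item: collected.append(item))
    run_pipeline([fetch, parse, store], ["a", "bb", "ccc"])
    return sorted(collected)
```

Because stages only communicate through queues, a pipeline can be recomposed by changing the list passed to `run_pipeline`, which is the property the abstract highlights for crawler (re)composition.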
Designing Domain-Specific Heterogeneous Architectures from Dataflow Programs

Directory of Open Access Journals (Sweden)

Süleyman Savas

2018-04-01

The last ten years have seen performance and power requirements pushing computer architectures using only a single core towards so-called manycore systems with hundreds of cores on a single chip. To further increase performance and energy efficiency, we are now seeing the development of heterogeneous architectures with specialized and accelerated cores. However, designing these heterogeneous systems is a challenging task due to their inherent complexity. We proposed an approach for designing domain-specific heterogeneous architectures based on instruction augmentation through the integration of hardware accelerators into simple cores. These hardware accelerators were determined based on their common use among applications within a certain domain. The objective was to generate heterogeneous architectures by integrating many of these accelerated cores and connecting them with a network-on-chip. The proposed approach aimed to ease the design of heterogeneous manycore architectures (and, consequently, exploration of the design space) by automating the design steps. To evaluate our approach, we enhanced our software tool chain with a tool that can generate accelerated cores from dataflow programs. This new tool chain was evaluated with the aid of two use cases: radar signal processing and mobile baseband processing. We could achieve an approximately 4× improvement in performance, while executing complete applications on the augmented cores with a small impact (2.5–13%) on area usage. The generated accelerators are competitive, achieving more than 90% of the performance of hand-written implementations.
Use of communication architecture test bed to evaluate data network performance

International Nuclear Information System (INIS)

Clapp, N.E. Jr.; Swail, B.K.; Naser, J.A.

1994-01-01

Local area networks (LANs) are becoming more prevalent in nuclear power plants. Traditionally, LANs were only used as information highways, providing office automation services. LANs are now being used as data highways for applications in plant data acquisition and control systems. A communication architecture test bed, which contains network simulators, is needed to allow network performance studies and to resolve design issues prior to equipment purchase. Two levels of granularity of simulation are needed to provide the dynamic information about network performance. A coarse-grain simulator is used to estimate the dynamic performance of the network due to major resources such as workstations, gateways, and data acquisition systems. A fine-grain simulator allows a greater level of detail about the underlying network protocol and resources to be simulated. The combination of coarse-grain and fine-grain simulation packages provides the network designer with the required tools to thoroughly understand the behavior of the modeled network. This paper describes the development of a communication architecture test bed using commercial network simulation packages. Network simulators allow the resolution of major design issues in software without the expense of purchasing costly hardware components
Accessible high performance computing solutions for near real-time image processing for time critical applications

Science.gov (United States)

Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek

2009-09-01

High Performance Computing (HPC) hardware solutions such as grid computing and General-Purpose computing on Graphics Processing Units (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming commonplace, and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: (1) critical information can be provided faster and (2) more elaborate automated processing can be performed prior to providing the critical information. In our particular case, we test the use of the PANTEX index, which is based on analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are an important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of computing the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs and (2) a CUDA enabled GPU workstation. The reference platform is a dual-CPU quad-core workstation and the PANTEX workflow total computing time is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring various hardware solutions and the related software coding effort are presented.
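A moving-window GLCM texture measure of the kind described above can be sketched in plain Python. The abstract does not give the PANTEX formula, so the sketch assumes that a rotation-invariant value is obtained by taking the minimum GLCM contrast over a few displacement directions. The function names, the `DISPLACEMENTS` set, and the quantisation of pixels to `levels` grey levels are illustrative assumptions.

```python
def glcm(window, dr, dc, levels):
    """Normalised grey-level co-occurrence matrix for one displacement (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(window), len(window[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[window[r][c]][window[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """GLCM contrast: sum over (i, j) of p[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

# Horizontal, vertical, and the two diagonal neighbours (an assumed set).
DISPLACEMENTS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def min_contrast(window, levels):
    """Assumed rotation-invariant measure: minimum contrast over directions."""
    return min(contrast(glcm(window, dr, dc, levels)) for dr, dc in DISPLACEMENTS)

def texture_map(image, size, levels):
    """Slide a size x size window over the image, one value per position."""
    rows, cols = len(image), len(image[0])
    return [
        [
            min_contrast([row[c:c + size] for row in image[r:r + size]], levels)
            for c in range(cols - size + 1)
        ]
        for r in range(rows - size + 1)
    ]
```

Every window position is computed independently of the others, which is why this kind of workload suits both multi-core blades and GPUs. With the minimum taken over directions, a striped pattern scores zero because it has no contrast along the stripes, while texture that varies in all directions scores high.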
Emerging memory technologies design, architecture, and applications

CERN Document Server

2014-01-01

This book explores the design implications of emerging, non-volatile memory (NVM) technologies on future computer memory hierarchy architecture designs. Since NVM technologies combine the speed of SRAM, the density of DRAM, and the non-volatility of Flash memory, they are very attractive as the basis for future universal memories. This book provides a holistic perspective on the topic, covering modeling, design, architecture and applications. The practical information included in this book will enable designers to exploit emerging memory technologies to improve significantly the performance/power/reliability of future, mainstream integrated circuits. • Provides a comprehensive reference on designing modern circuits with emerging, non-volatile memory technologies, such as MRAM and PCRAM; • Explores new design opportunities offered by emerging memory technologies, from a holistic perspective; • Describes topics in technology, modeling, architecture and applications; • Enables circuit designers to ex...
Brain inspired hardware architectures - Can they be used for particle physics?

CERN Multimedia

CERN. Geneva

2016-01-01

After their inception in the 1940s and several decades of moderate success, artificial neural networks have recently demonstrated impressive achievements in analysing big data volumes. Wide and deep network architectures can now be trained using high performance computing systems, graphics card clusters in particular. Despite their successes these state-of-the-art approaches suffer from very long training times and huge energy consumption, in particular during the training phase. The biological brain can perform similar and superior classification tasks in the space and time domains, but at the same time exhibits very low power consumption, rapid unsupervised learning capabilities and fault tolerance. In the talk the differences between classical neural networks and neural circuits in the brain will be presented. Recent hardware implementations of neuromorphic computing systems and their applications will be shown. Finally, some initial ideas to use accelerated neural architectures as trigger processors i...
High Performance Computing (HPC) Challenge (HPCC) Benchmark Suite Development

National Research Council Canada - National Science Library

Dongarra, J. J

2005-01-01

.... The applications of performance modeling are numerous, including evaluation of algorithms, optimization of code implementation, parallel library development, and comparison of system architectures...
The GOES-R Product Generation Architecture

Science.gov (United States)

Dittberner, G. J.; Kalluri, S.; Hansen, D.; Weiner, A.; Tarpley, A.; Marley, S.

2011-12-01

The GOES-R system will substantially improve users' ability to succeed in their work by providing data with significantly enhanced instruments, higher resolution, much shorter relook times, and an increased number and diversity of products. The Product Generation architecture is designed to provide the computer and memory resources necessary to achieve the necessary latency and availability for these products. Over time, new and updated algorithms are expected to be added and old ones removed as science advances and new products are developed. The GOES-R GS architecture is being planned to maintain functionality so that when such changes are implemented, operational product generation will continue without interruption. The primary parts of the PG infrastructure are the Service Based Architecture (SBA) and the Data Fabric (DF). SBA is the middleware that encapsulates and manages science algorithms that generate products. It is divided into three parts: the Executive, which manages and configures the algorithm as a service; the Dispatcher, which provides data to the algorithm; and the Strategy, which determines when the algorithm can execute with the available data. SBA is a distributed architecture, with services connected to each other over a compute grid, and is highly scalable. This plug-and-play architecture allows algorithms to be added, removed, or updated without affecting any other services or software currently running and producing data. Algorithms require product data from other algorithms, so scalable and reliable messaging is necessary. The SBA uses the DF to provide this data communication layer between algorithms. The DF provides an abstract interface over a distributed and persistent multi-layered storage system (e.g., memory based caching above disk-based storage) and an event management system that allows event-driven algorithm services to know when instrument data are available and where they reside. Together, the SBA and the DF provide a
Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture

Energy Technology Data Exchange (ETDEWEB)

Panyala, Ajay; Chavarría-Miranda, Daniel; Manzano, Joseph B.; Tumeo, Antonino; Halappanavar, Mahantesh

2017-06-01

High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-offs that must be understood in the context of the intended application’s behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions: memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT’s OpenTuner auto-tuning framework to explore and recommend energy optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49.6% and 60% relative to a baseline instantiation, and up to 31% and 45.4% relative to manually optimized variants.
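The three-dimensional search space described above can be made concrete with a small sketch. It deliberately does not use the OpenTuner API: a plain exhaustive search is swapped in for OpenTuner's adaptive search, the layout names are hypothetical, and `measure` stands for any user-supplied function that builds and runs a configuration and returns its cost (for example, measured energy).

```python
import itertools

# Assumed example values; only the three dimensions come from the abstract.
SEARCH_SPACE = {
    "layout": ["row_major", "blocked"],                 # hypothetical layout names
    "gcc_flags": ["-O2", "-O3", "-O3 -funroll-loops"],  # GCC optimisation flags
    "omp_schedule": ["static", "dynamic", "guided"],    # OpenMP loop schedules
}

def configurations(space):
    """Yield every combination of parameter values as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def tune(space, measure):
    """Return (best_config, best_cost); measure(config) gives a cost, lower is better."""
    best = None
    for cfg in configurations(space):
        cost = measure(cfg)
        if best is None or cost < best[1]:
            best = (cfg, cost)
    return best
```

Exhaustive search is only practical for small spaces like this one (18 configurations); the number of combinations grows multiplicatively with each added dimension, which is the motivation for adaptive search frameworks.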
Development and Performance of the Modularized, High-performance Computing and Hybrid-architecture Capable GEOS-Chem Chemical Transport Model

Science.gov (United States)

Long, M. S.; Yantosca, R.; Nielsen, J.; Linford, J. C.; Keller, C. A.; Payer Sulprizio, M.; Jacob, D. J.

2014-12-01

The GEOS-Chem global chemical transport model (CTM), used by a large atmospheric chemistry research community, has been reengineered to serve as a platform for a range of computational atmospheric chemistry science foci and applications. Development included modularization for coupling to general circulation and Earth system models (ESMs) and the adoption of co-processor capable atmospheric chemistry solvers. This was done using an Earth System Modeling Framework (ESMF) interface that operates independently of GEOS-Chem scientific code to permit seamless transition from the GEOS-Chem stand-alone serial CTM to deployment as a coupled ESM module. In this manner, the continual stream of updates contributed by the CTM user community is automatically available for broader applications, which remain state-of-science and directly referenceable to the latest version of the standard GEOS-Chem CTM. These developments are now available as part of the standard version of the GEOS-Chem CTM. The system has been implemented as an atmospheric chemistry module within the NASA GEOS-5 ESM. The coupled GEOS-5/GEOS-Chem system was tested for weak and strong scalability and performance with a tropospheric oxidant-aerosol simulation. Results confirm that the GEOS-Chem chemical operator scales efficiently for any number of processes. Although inclusion of atmospheric chemistry in ESMs is computationally expensive, the excellent scalability of the chemical operator means that the relative cost goes down with increasing number of processes, making fine-scale resolution simulations possible.
