WorldWideScience

Sample records for performance computer architectures

  1. A High Performance COTS Based Computer Architecture

    Science.gov (United States)

    Patte, Mathieu; Grimoldi, Raoul; Trautner, Roland

    2014-08-01

    Using Commercial Off The Shelf (COTS) electronic components for space applications is a long-standing idea. Indeed, the difference in processing performance and energy efficiency between radiation-hardened components and COTS components is so large that COTS components are very attractive for use in mass- and power-constrained systems. However, using COTS components in space is not straightforward, as one must account for the effects of the space environment on the behavior of COTS components. In the frame of the ESA-funded activity called High Performance COTS Based Computer, Airbus Defence and Space and its subcontractor OHB CGS have developed and prototyped a versatile COTS-based architecture for high-performance processing. The rest of the paper is organized as follows: the first section recapitulates the interests and constraints of using COTS components for space applications; we then briefly describe existing fault mitigation architectures and present our fault mitigation solution, based on a component called the SmartIO; in the last part of the paper we describe the prototyping activities executed during the HiP CBC project.

  2. Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation

    Science.gov (United States)

    Stocker, John C.; Golomb, Andrew M.

    2011-01-01

    Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, compared with traditional hosting, cloud computing is at a disadvantage in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized, network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflows in traditional and cloud computing architectures. The two-part model framework characterizes both the demand, using a probability distribution for each type of service request, and the enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and our findings on the appropriateness of cloud computing for various applications.
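    The abstract does not specify the model's distributions or resource structure, so the following is only a minimal sketch of the discrete event simulation idea, assuming a single request type with exponential interarrival and service times (an M/M/c queue) served by a fixed pool of servers. The function name `simulate` and its parameters are hypothetical. Events are kept in a time-ordered heap, and each arrival or departure may start service for waiting requests.

```python
import heapq
import itertools
import random
from collections import deque


def simulate(num_servers, arrival_rate, service_rate, num_requests, seed=0):
    """Return the mean queueing delay of `num_requests` requests.

    Hypothetical M/M/c model: exponential interarrival and service times,
    `num_servers` identical servers, first-come-first-served queue.
    """
    rng = random.Random(seed)
    order = itertools.count()  # tie-breaker so the heap never compares event kinds
    events = []  # min-heap of (time, tie_breaker, kind, request_id)

    # Pre-generate all arrival events.
    t = 0.0
    for rid in range(num_requests):
        t += rng.expovariate(arrival_rate)
        heapq.heappush(events, (t, next(order), "arrival", rid))

    free = num_servers
    waiting = deque()
    arrived_at = {}
    total_wait = 0.0

    while events:
        now, _, kind, rid = heapq.heappop(events)
        if kind == "arrival":
            arrived_at[rid] = now
            waiting.append(rid)
        else:  # "departure": the request's server becomes free
            free += 1
        # Start service for waiting requests while servers are free.
        while free > 0 and waiting:
            nxt = waiting.popleft()
            free -= 1
            total_wait += now - arrived_at[nxt]
            finish = now + rng.expovariate(service_rate)
            heapq.heappush(events, (finish, next(order), "departure", nxt))

    return total_wait / num_requests if num_requests else 0.0
```

    Comparing, for example, `simulate(2, 1.5, 1.0, 10_000)` with `simulate(4, 1.5, 1.0, 10_000)` shows how the mean wait responds to provisioning more servers; a realistic model would add per-request-type demand distributions and the resource constraints the abstract describes.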

  3. A High Performance VLSI Computer Architecture For Computer Graphics

    Science.gov (United States)

    Chin, Chi-Yuan; Lin, Wen-Tai

    1988-10-01

    A VLSI computer architecture consisting of multiple processors is presented in this paper to satisfy modern computer graphics demands, e.g. high resolution, realistic animation, and real-time display. All processors share a global memory that is partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcast to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigured to satisfy the specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, object domain and space domain, to fully exploit the data independence of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection can be easily handled. With current high-density interconnection (MI) technology, it is feasible to implement a 64-processor system that achieves 2.5 billion operations per second, a level of performance needed in most advanced graphics applications.

  4. Benchmarking high performance computing architectures with CMS’ skeleton framework

    Science.gov (United States)

    Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

    2017-10-01

    In 2012 CMS evaluated which underlying concurrency technology would be best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems that dominated the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Threading Building Blocks (TBB) library, based on the measured memory and CPU overheads of the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many-core architectures, such as Cori Phase 1 & 2, Theta, and Mira. Because of this, we have revived the 2012 benchmark to test its performance and conclusions on these new architectures. This talk will discuss the results of this exercise.

  5. Performance evaluation of scientific programs on advanced architecture computers

    International Nuclear Information System (INIS)

    Walker, D.W.; Messina, P.; Baille, C.F.

    1988-01-01

    Recently a number of advanced architecture machines have become commercially available. These new machines promise better cost-performance than traditional computers, and some of them have the potential to compete with current supercomputers, such as the Cray X-MP, in terms of maximum performance. This paper describes an ongoing project to evaluate a broad range of advanced architecture computers using a number of complete scientific application programs. The computers to be evaluated include distributed-memory machines such as the NCUBE, INTEL, and Caltech/JPL hypercubes and the MEIKO Computing Surface; shared-memory, bus-architecture machines such as the Sequent Balance and the Alliant; very long instruction word machines such as the Multiflow Trace 7/200 computer; traditional supercomputers such as the Cray X-MP and Cray-2; and SIMD machines such as the Connection Machine. Currently 11 application codes from a number of scientific disciplines have been selected, although it is not intended to run all codes on all machines. Results are presented for two of the codes (QCD and missile tracking), and future work is proposed.

  6. Improving Software Performance in the Compute Unified Device Architecture

    Directory of Open Access Journals (Sweden)

    Alexandru PIRJAN

    2010-01-01

    This paper analyzes several aspects of improving software performance for applications written in the Compute Unified Device Architecture (CUDA). We address an issue of great importance when programming a CUDA application: managing the Graphics Processing Unit's (GPU's) memory through transpose kernels. We also benchmark and evaluate the performance of a matrix-transpose application in CUDA as it is progressively optimized. One particular interest was to research how well the optimization techniques applied to software written in CUDA scale to the latest generation of general-purpose graphics processing units (GPGPUs), such as the Fermi architecture implemented in the GTX480, and to the previous architecture implemented in the GTX280. Lately, there has been a lot of interest in the literature in this type of optimization analysis, but none of the works so far (to the best of our knowledge) has tried to validate whether the optimizations apply to a GPU from the latest Fermi architecture and how well the Fermi architecture scales with these software performance improving techniques.
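    The abstract names transpose kernels without showing one. In CUDA, a common optimization is to have each thread block copy one tile of the matrix into shared memory and write it back transposed, so that both global-memory reads and writes are coalesced. The Python sketch below only mirrors that tile-by-tile access order on a list of lists (it does not model the GPU); the function name and the default tile size of 32 are assumptions for illustration.

```python
def transpose_tiled(matrix, tile=32):
    """Return the transpose of a row-major list-of-lists matrix.

    The loops visit the matrix tile by tile, mirroring how a CUDA transpose
    kernel assigns one thread block per tile and stages it in shared memory.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Copy one tile (the "shared memory" buffer in the CUDA analogy).
            block = [row[c0:c0 + tile] for row in matrix[r0:r0 + tile]]
            # Write the tile back transposed into the output.
            for i, row in enumerate(block):
                for j, value in enumerate(row):
                    out[c0 + j][r0 + i] = value
    return out
```

    Partial tiles at the matrix edges are handled by slicing, which plays the role of the bounds checks a CUDA kernel needs when the matrix size is not a multiple of the tile size.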

  7. Architecture and Programming Models for High Performance Intensive Computation

    Science.gov (United States)

    2016-06-29

    commands from the data processing center to the sensors is needed. It has been noted that the ubiquity of mobile communication devices offers the...commands from a Processing Facility by way of mobile Relay Stations. The activity of each component of this model other than the Merge module can be...evaluation of the initial system implementation. Gao also was in charge of the development of Fresh Breeze architecture backend on new many-core computers

  8. Newmark local time stepping on high-performance computing architectures

    KAUST Repository

    Rietmann, Max

    2016-11-25

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100×). The implementation is capable of running on large CPU and GPU clusters, and we present both synthetic validation examples and large-scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.
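    As a rough illustration of why LTS helps, the sketch below assigns each element a refinement level from a simplified CFL bound, Δt_e = C·h_e/c, assuming (hypothetically) that levels differ by factors of two. It is not the authors' LTS-Newmark scheme, which also handles the coupling between levels in a continuous finite element discretization; the names `lts_levels` and `updates_per_coarse_step`, the Courant number of 0.5, and the element sizes are illustrative assumptions.

```python
def lts_levels(element_sizes, wave_speed, courant=0.5):
    """Assign each element a local time-stepping level (power-of-two assumption).

    Each element's stable step is taken as courant * h / wave_speed (a
    simplified CFL bound). The coarse step is the largest such step; an
    element at level k takes 2**k substeps per coarse step.
    """
    stable = [courant * h / wave_speed for h in element_sizes]
    dt_coarse = max(stable)
    levels = []
    for dt_e in stable:
        k = 0
        # Halve the step until it no longer exceeds this element's bound.
        while dt_coarse / 2**k > dt_e:
            k += 1
        levels.append(k)
    return dt_coarse, levels


def updates_per_coarse_step(levels):
    """Element updates per coarse step: LTS vs. global stepping at the finest level."""
    lts = sum(2**k for k in levels)
    global_steps = len(levels) * 2**max(levels)
    return lts, global_steps
```

    With element sizes 1.0, 0.5, and 0.1 and unit wave speed, the levels come out as 0, 1, and 4, so one coarse step costs 1 + 2 + 16 = 19 element updates, against 3 × 16 = 48 if every element took the finest power-of-two step.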

  9. Newmark local time stepping on high-performance computing architectures

    Energy Technology Data Exchange (ETDEWEB)

    Rietmann, Max, E-mail: max.rietmann@erdw.ethz.ch [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland); Institute of Geophysics, ETH Zurich (Switzerland); Grote, Marcus, E-mail: marcus.grote@unibas.ch [Department of Mathematics and Computer Science, University of Basel (Switzerland); Peter, Daniel, E-mail: daniel.peter@kaust.edu.sa [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland); Institute of Geophysics, ETH Zurich (Switzerland); Schenk, Olaf, E-mail: olaf.schenk@usi.ch [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland)

    2017-04-01

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100×). The implementation is capable of running on large CPU and GPU clusters, and we present both synthetic validation examples and large-scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  10. Newmark local time stepping on high-performance computing architectures

    KAUST Repository

    Rietmann, Max; Grote, Marcus; Peter, Daniel; Schenk, Olaf

    2016-01-01

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100×). The implementation is capable of running on large CPU and GPU clusters, and we present both synthetic validation examples and large-scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  11. Performance evaluation for compressible flow calculations on five parallel computers of different architectures

    International Nuclear Information System (INIS)

    Kimura, Toshiya.

    1997-03-01

    A two-dimensional explicit Euler solver has been implemented on five MIMD parallel computers of different machine architectures at the Center for Promotion of Computational Science and Engineering of the Japan Atomic Energy Research Institute. These parallel computers are the Fujitsu VPP300, NEC SX-4, CRAY T94, IBM SP2, and Hitachi SR2201. The code was parallelized using several parallelization methods, and a typical compressible flow problem was calculated for different grid sizes while varying the number of processors. The effective performance of the parallel calculations, including calculation speed, speed-up ratio, and parallel efficiency, was investigated and evaluated. The communication time among processors was also measured and evaluated. As a result, the differences in performance and characteristics between vector-parallel and scalar-parallel computers are identified, providing basic data for the efficient use of parallel computers and for large-scale CFD simulations on parallel computers. (author)
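    The abstract reports speed-up ratio and parallel efficiency without stating the formulas. Under the conventional definitions, assumed here, the speed-up on p processors is S_p = T_1 / T_p and the efficiency is E_p = S_p / p, where T_1 and T_p are the wall-clock times on one and on p processors. A minimal sketch; the function names are hypothetical and any timings used with them are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speed-up ratio S_p = T_1 / T_p (wall-clock times on 1 and p processors)."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, processors):
    """Parallel efficiency E_p = S_p / p; 1.0 means perfect linear scaling."""
    return speedup(t_serial, t_parallel) / processors
```

    For instance, a run that takes 100 s on one processor and 25 s on eight has S_8 = 4 and E_8 = 0.5, meaning half of the ideal eightfold gain is lost to communication, load imbalance, or serial work.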

  12. High-performance computing on the Intel Xeon Phi: how to fully exploit MIC architectures

    CERN Document Server

    Wang, Endong; Shen, Bo; Zhang, Guangyong; Lu, Xiaowei; Wu, Qing; Wang, Yajuan

    2014-01-01

    The aim of this book is to explain to high-performance computing (HPC) developers how to utilize the Intel® Xeon Phi™ series products efficiently. To that end, it introduces some computing grammar, programming technology and optimization methods for using many-integrated-core (MIC) platforms and also offers tips and tricks for actual use, based on the authors' first-hand optimization experience. The material is organized in three sections. The first section, "Basics of MIC", introduces the fundamentals of MIC architecture and programming, including the specific Intel MIC programming environment

  13. Computer architecture technology trends

    CERN Document Server

    1991-01-01

    This year's edition of Computer Architecture Technology Trends analyses the trends taking place in the architecture of computing systems today. Due to the sheer number of different applications to which computers are being applied, there seems to be no end to the different adaptations that proliferate. There are, however, some underlying trends that appear. Decision makers should be aware of these trends when specifying architectures, particularly for future applications. This report is fully revised and updated and provides insight in

  14. Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

    Science.gov (United States)

    2015-06-01

    SYRACUSE... Contract number FA8750-12-1-0251. ...processors including graphics processing units (GPUs) and Intel Xeon Phi processors. Experimental results showed significant speedups, which can enable

  15. Computing architecture for autonomous microgrids

    Science.gov (United States)

    Goldsmith, Steven Y.

    2015-09-29

    A computing architecture that facilitates autonomously controlling operations of a microgrid is described herein. A microgrid network includes numerous computing devices that execute intelligent agents, each of which is assigned to a particular entity (load, source, storage device, or switch) in the microgrid. The intelligent agents can execute in accordance with predefined protocols to collectively perform computations that facilitate uninterrupted control of the microgrid.

  16. Matrix multiplication operations with data pre-conditioning in a high performance computing architecture

    Science.gov (United States)

    Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

    2013-11-05

    Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.
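    The claimed mechanism amounts to an outer-product formulation of matrix multiplication: each scalar A[i][k] is replicated (splatted) across a vector, multiplied element-wise with row k of B, and the partial product is accumulated into row i of C. The plain-Python sketch below shows only that data flow; Python lists stand in for vector registers, the function name is hypothetical, and the patent's data pre-conditioning and register-level details are not reproduced.

```python
def matmul_splat(a, b):
    """Compute C = A @ B using a splat-and-multiply-add pattern per scalar of A.

    For each scalar a[i][k], the value is replicated into a vector as wide as
    a row of B, multiplied element-wise with row k of B, and the partial
    product is accumulated into row i of C.
    """
    n = len(a)
    m = len(b[0]) if b else 0
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for k, a_ik in enumerate(a[i]):
            splat = [a_ik] * m   # "load and splat" one element of A
            row_b = b[k]         # "vector load" of one row of B
            # Multiply-add: accumulate this partial product into row i of C.
            c[i] = [acc + s * x for acc, s, x in zip(c[i], splat, row_b)]
    return c
```

    On real hardware the splat and the multiply-add are single vector instructions, so each inner iteration updates a whole row segment of C at once instead of one element.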

  17. Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

    KAUST Repository

    AbdulJabbar, Mustafa Abdulmajeed

    2017-07-31

    Manycore optimizations are essential for achieving performance worthy of anticipated exascale systems. Utilization of manycore chips is inevitable to attain the desired floating point performance of these energy-austere systems. In this work, we revisit ExaFMM, the open source Fast Multipole Method (FMM) library, in light of highly tuned shared-memory parallelization and detailed performance analysis on the new highly parallel Intel manycore architecture, Knights Landing (KNL). We assess scalability and performance gain using task-based parallelism of the FMM tree traversal. We also provide an in-depth analysis of the most computationally intensive part of the traversal kernel (i.e., the particle-to-particle (P2P) kernel), by comparing its performance across KNL and Broadwell architectures. We quantify different configurations that exploit the on-chip 512-bit vector units within different task-based threading paradigms. MPI communication-reducing and NUMA-aware approaches for the FMM’s global tree data exchange are examined with different cluster modes of KNL. By applying several algorithm- and architecture-aware optimizations for FMM, we show that the N-Body kernel on 256 threads of KNL achieves on average 2.8× speedup compared to the non-vectorized version, whereas on 56 threads of Broadwell, it achieves on average 2.9× speedup. In addition, the tree traversal kernel on KNL scales monotonically up to 256 threads with task-based programming models. The MPI-based communication-reducing algorithms show expected improvements in data locality across the KNL on-chip network.
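    The P2P kernel mentioned above is a direct pairwise sum over particles in neighboring cells. A minimal, non-vectorized sketch, assuming the Laplace kernel φ(x_i) = Σ_j q_j / |x_i − y_j| (the abstract does not name the kernel, and ExaFMM's tuned, vectorized implementation is not shown), could look like this; the function name is hypothetical.

```python
import math


def p2p_potential(targets, sources, charges):
    """Direct particle-to-particle (P2P) evaluation of the Laplace potential.

    phi(x_i) = sum_j q_j / |x_i - y_j|, skipping coincident points so a
    particle does not interact with itself.
    """
    phi = []
    for x in targets:
        total = 0.0
        for y, q in zip(sources, charges):
            r = math.dist(x, y)  # Euclidean distance (Python 3.8+)
            if r > 0.0:
                total += q / r
        phi.append(total)
    return phi
```

    This O(N·M) double loop is what the 512-bit vector units accelerate: the inner loop over sources is independent across targets, so several targets (or sources) can be processed per vector instruction.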

  18. Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture

    KAUST Repository

    AbdulJabbar, Mustafa Abdulmajeed; Al Farhan, Mohammed; Yokota, Rio; Keyes, David E.

    2017-01-01

    Manycore optimizations are essential for achieving performance worthy of anticipated exascale systems. Utilization of manycore chips is inevitable to attain the desired floating point performance of these energy-austere systems. In this work, we revisit ExaFMM, the open source Fast Multipole Method (FMM) library, in light of highly tuned shared-memory parallelization and detailed performance analysis on the new highly parallel Intel manycore architecture, Knights Landing (KNL). We assess scalability and performance gain using task-based parallelism of the FMM tree traversal. We also provide an in-depth analysis of the most computationally intensive part of the traversal kernel (i.e., the particle-to-particle (P2P) kernel), by comparing its performance across KNL and Broadwell architectures. We quantify different configurations that exploit the on-chip 512-bit vector units within different task-based threading paradigms. MPI communication-reducing and NUMA-aware approaches for the FMM’s global tree data exchange are examined with different cluster modes of KNL. By applying several algorithm- and architecture-aware optimizations for FMM, we show that the N-Body kernel on 256 threads of KNL achieves on average 2.8× speedup compared to the non-vectorized version, whereas on 56 threads of Broadwell, it achieves on average 2.9× speedup. In addition, the tree traversal kernel on KNL scales monotonically up to 256 threads with task-based programming models. The MPI-based communication-reducing algorithms show expected improvements in data locality across the KNL on-chip network.

  19. New Developments in Modeling MHD Systems on High Performance Computing Architectures

    Science.gov (United States)

    Germaschewski, K.; Raeder, J.; Larson, D. J.; Bhattacharjee, A.

    2009-04-01

    Modeling the wide range of time and length scales present even in fluid models of plasmas, such as MHD and X-MHD (Extended MHD, including two-fluid effects like the Hall term, electron inertia, and the electron pressure gradient), is challenging even on state-of-the-art supercomputers. In recent years, HPC capacity has continued to grow exponentially, but at the expense of making computer systems more and more difficult to program for maximum performance. In this paper, we present a new approach to managing the complexity of writing efficient codes: separating the numerical description of the problem, in our case a discretized right-hand side (r.h.s.), from the actual implementation that evaluates it efficiently. The r.h.s. is described in a quasi-symbolic form, and an automatic code generator translates it into efficient, parallelized code. We implemented this approach for OpenGGCM (Open General Geospace Circulation Model), a model of the Earth's magnetosphere, which was accelerated by a factor of three on regular x86 architecture and by a factor of 25 on the Cell BE architecture (commonly known for its deployment in Sony's PlayStation 3).
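    To make the separation between description and implementation concrete, here is a toy analogue in Python: a one-dimensional stencil r.h.s. is described as a mapping from offsets to coefficients, and a small generator emits and compiles the loop that evaluates it. This is only an assumption-laden sketch; the OpenGGCM generator described above targets optimized, parallel code for x86 and Cell BE, and the names `generate_rhs` and `laplacian` are hypothetical.

```python
def generate_rhs(name, stencil):
    """Generate and compile Python source for a 1-D stencil r.h.s.

    `stencil` maps integer offsets to coefficients, e.g.
    {-1: 1.0, 0: -2.0, 1: 1.0} for a second difference. Only interior
    points (where every offset stays in bounds) are updated.
    Returns the compiled function and its generated source text.
    """
    lo = max(0, -min(stencil))  # points skipped at the left boundary
    hi = max(0, max(stencil))   # points skipped at the right boundary
    terms = " + ".join(
        f"({coeff!r}) * u[i + ({off})]" for off, coeff in sorted(stencil.items())
    )
    source = (
        f"def {name}(u):\n"
        f"    out = [0.0] * len(u)\n"
        f"    for i in range({lo}, len(u) - {hi}):\n"
        f"        out[i] = {terms}\n"
        f"    return out\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated implementation
    return namespace[name], source
```

    Calling `generate_rhs("laplacian", {-1: 1.0, 0: -2.0, 1: 1.0})` yields a function whose loop body is fully unrolled over the stencil terms; the numerical description never has to be edited when the generator's output strategy changes.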

  20. Performative Urban Architecture

    DEFF Research Database (Denmark)

    Thomsen, Bo Stjerne; Jensen, Ole B.

    The paper explores how performative urban architecture can enhance community-making and the public domain, using socio-technical systems and digital technologies to constitute an urban reality. Digital media developed for the web are now increasingly occupying the urban realm as a tool for navigating...... the physical world, e.g. as exemplified by the Google Walk Score and the mobile extension of Google Maps to the iPhone. At the same time, developments in pervasive technologies and situated computing extend the built environment with digital feedback systems that are increasingly embedded and deployed...... using sensor technologies, opening up new access considerations in architecture as well as the ability for a local environment to act as a real-time source of information and facilities. Starting from the NoRA pavilion for the 10th International Architecture Biennale in Venice, the paper discusses...

  1. Developing Materials Processing to Performance Modeling Capabilities and the Need for Exascale Computing Architectures (and Beyond)

    Energy Technology Data Exchange (ETDEWEB)

    Schraad, Mark William [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Physics and Engineering Models; Luscher, Darby Jon [Los Alamos National Lab. (LANL), Los Alamos, NM (United States). Advanced Simulation and Computing]

    2016-09-06

    Additive Manufacturing techniques are presenting the Department of Energy and the NNSA Laboratories with new opportunities to consider novel component production and repair processes, and to manufacture materials with tailored response and optimized performance characteristics. Additive Manufacturing technologies are already being applied to primary NNSA mission areas, including Nuclear Weapons. These mission areas are adapting to these new manufacturing methods because of potential advantages such as smaller manufacturing footprints, reduced needs for specialized tooling, an ability to embed sensing, novel part repair options, an ability to accommodate complex geometries, and lighter weight materials. To realize the full potential of Additive Manufacturing as a game-changing technology for the NNSA’s national security missions, however, significant progress must be made in several key technical areas. In addition to advances in engineering design, process optimization and automation, and accelerated feedstock design and manufacture, significant progress must be made in modeling and simulation. First and foremost, a more mature understanding of the process-structure-property-performance relationships must be developed. Because Additive Manufacturing processes change the nature of a material’s structure below the engineering scale, new models are required to predict materials response across the spectrum of relevant length scales, from the atomistic to the continuum. New diagnostics will be required to characterize materials response across these scales. Beyond models, advanced algorithms, next-generation codes, and advanced computer architectures will also be required to complement the associated modeling activities. Based on preliminary work in each of these areas, a strong argument can be made for the need for Exascale computing architectures if a legitimate predictive capability is to be developed.

  2. Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures.

    Science.gov (United States)

    Llanes, Antonio; Muñoz, Andrés; Bueno-Crespo, Andrés; García-Valverde, Teresa; Sánchez, Antonia; Arcas-Túnez, Francisco; Pérez-Sánchez, Horacio; Cecilia, José M

    2016-01-01

    The protein-folding problem has been studied extensively over the last fifty years. Understanding the dynamics of a protein's global shape and its influence on the protein's biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Researchers have developed different computational approaches to predict the three-dimensional arrangement of a protein's atoms from its sequence. However, the computational complexity of this problem makes it necessary to search for new models, novel algorithmic strategies, and hardware platforms that provide solutions in a reasonable time frame. In this review, we present past and recent trends in protein folding simulations from two perspectives: hardware and software. Of particular interest to us are the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used to run this kind of Soft Computing technique.

  3. Computers in Academic Architecture Libraries.

    Science.gov (United States)

    Willis, Alfred; And Others

    1992-01-01

    Computers are widely used in architectural research and teaching in U.S. schools of architecture. A survey of libraries serving these schools sought information on the emphasis placed on computers by the architectural curriculum, accessibility of computers to library staff, and accessibility of computers to library patrons. Survey results and…

  4. Computer Architecture: A Quantitative Approach

    CERN Document Server

    Hennessy, John L

    2007-01-01

    The era of seemingly unlimited growth in processor performance is over: single chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors--chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelis

  5. Specialized computer architectures for computational aerodynamics

    Science.gov (United States)

    Stevenson, D. K.

    1978-01-01

    In recent years, computational fluid dynamics has made significant progress in modelling aerodynamic phenomena. Currently, one of the major barriers to future development lies in the compute-intensive nature of the numerical formulations and the relatively high cost of performing these computations on commercially available general-purpose computers, a cost that is high in dollar expenditure, elapsed time, or both. Today's computing technology will support a program designed to create specialized computing facilities dedicated to the important problems of computational aerodynamics. One of the still unresolved questions is the organization of the computing components in such a facility. The characteristics of fluid dynamic problems that will have a significant impact on the choice of computer architecture for a specialized facility are reviewed.

  6. Time-Predictable Computer Architecture

    Directory of Open Access Journals (Sweden)

    Schoeberl Martin

    2009-01-01

    Today's general-purpose processors are optimized for maximum throughput. Real-time systems need a processor with a worst-case execution time (WCET) that is both reasonable and known. Features such as pipelines with instruction dependencies, caches, branch prediction, and out-of-order execution complicate WCET analysis and lead to very conservative estimates. In this paper, we evaluate the issues of current architectures with respect to WCET analysis. We then propose solutions for a time-predictable computer architecture. The proposed architecture is evaluated by implementing some of its features in a Java processor. The resulting processor is a good target for WCET analysis and still performs well in the average case.
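    Once worst-case cycle counts for individual basic blocks are known (which is exactly what caches, branch prediction, and out-of-order execution make hard), a WCET bound can be obtained as the longest path through the control-flow graph. The sketch below uses a simple longest-path search over an acyclic graph, assuming loops have already been bounded and unrolled; production WCET tools commonly use other techniques such as implicit path enumeration, and the names and costs here are hypothetical.

```python
from functools import lru_cache


def wcet_bound(cfg, cost, entry, exit_node):
    """Upper-bound execution time as the longest path through an acyclic CFG.

    `cfg` maps each basic block to a tuple of successor blocks; `cost` gives
    each block's worst-case cycle count. Loops are assumed already bounded
    and unrolled, so the graph has no cycles.
    """
    @lru_cache(maxsize=None)
    def longest(block):
        if block == exit_node:
            return cost[block]
        successors = cfg.get(block, ())
        if not successors:
            raise ValueError(f"block {block!r} cannot reach the exit")
        # Worst case: this block's cost plus the most expensive continuation.
        return cost[block] + max(longest(s) for s in successors)

    return longest(entry)
```

    For a diamond-shaped graph A → {B, C} → D with costs 2, 10, 3, and 1 cycles, the bound is 2 + 10 + 1 = 13 cycles. A time-predictable processor helps precisely by making those per-block costs tight and easy to derive.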

  7. Power-efficient computer architectures: recent advances

    CERN Document Server

    Själander, Magnus; Kaxiras, Stefanos

    2014-01-01

    As Moore's Law and Dennard scaling trends have slowed, the challenges of building high-performance computer architectures while maintaining acceptable power efficiency levels have heightened. Over the past ten years, architecture techniques for power efficiency have shifted from primarily focusing on module-level efficiencies toward more holistic design styles based on parallelism and heterogeneity. This work highlights and synthesizes recent techniques and trends in power-efficient computer architecture. Table of Contents: Introduction / Voltage and Frequency Management / Heterogeneity and Sp

  8. Spatial computing in interactive architecture

    NARCIS (Netherlands)

    S.O. Dulman (Stefan); M. Krezer; L. Hovestad

    2014-01-01

    Distributed computing is the theoretical foundation for applications and technologies like interactive architecture, wearable computing, and smart materials. It evolves continuously, following needs rising from scientific developments, novel uses of technology, or simply the curiosity to

  9. CITAstudio: Computation in Architecture 2015

    DEFF Research Database (Denmark)

    Nicholas, Paul; Ayres, Phil

    2016-01-01

    CITAstudio yearbook. CITAstudio: Computation in Architecture is a two-year International Master's Programme at The Royal Danish Academy of Fine Arts, School of Architecture. With a focus on digital design and material fabrication, the programme questions how computation is changing our spatial...

  10. Layered architecture for quantum computing

    OpenAIRE

    Jones, N. Cody; Van Meter, Rodney; Fowler, Austin G.; McMahon, Peter L.; Kim, Jungsang; Ladd, Thaddeus D.; Yamamoto, Yoshihisa

    2010-01-01

    We develop a layered quantum-computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface-code quantum error correction. In doing so, we propose a new quantum-computer architecture based on optical control of quantum dot...

  11. Computer architecture: a quantitative approach

    CERN Document Server

    Hennessy, John L

    2019-01-01

    Computer Architecture: A Quantitative Approach, Sixth Edition has been considered essential reading by instructors, students and practitioners of computer design for over 20 years. The sixth edition of this classic textbook is fully revised with the latest developments in processor and system architecture. It now features examples from the RISC-V (RISC Five) instruction set architecture, a modern RISC instruction set developed and designed to be a free and openly adoptable standard. It also includes a new chapter on domain-specific architectures and an updated chapter on warehouse-scale computing that features the first public information on Google's newest WSC. True to its original mission of demystifying computer architecture, this edition continues the longstanding tradition of focusing on areas where the most exciting computing innovation is happening, while always keeping an emphasis on good engineering design.

  12. Performance of particle in cell methods on highly concurrent computational architectures

    International Nuclear Information System (INIS)

    Adams, M.F.; Ethier, S.; Wichmann, N.

    2009-01-01

    Particle-in-cell (PIC) methods are effective for solving the Vlasov-Poisson system of equations used in simulations of magnetic fusion plasmas. PIC methods use grid-based computations for solving Poisson's equation, or more generally Maxwell's equations, as well as Monte Carlo type methods to sample the Vlasov equation. The presence of two types of discretizations, deterministic field solves and Monte Carlo methods for the Vlasov equation, poses challenges in understanding and optimizing performance on today's large-scale computers, which require high levels of concurrency. These challenges arise from the need to optimize two very different types of processes and the interactions between them. Modern cache-based high-end computers have very deep memory hierarchies and high degrees of concurrency, which must be utilized effectively to achieve good performance. The effective use of these machines requires maximizing concurrency by eliminating serial or redundant work and minimizing global communication. A related issue is minimizing the memory traffic between levels of the memory hierarchy, because performance is often limited by the bandwidths and latencies of the memory system. This paper discusses some of the performance issues, particularly in regard to parallelism, of PIC methods. The gyrokinetic toroidal code (GTC) is used for these studies, and a new radial grid decomposition is presented and evaluated. Scaling of the code is demonstrated on ITER-sized plasmas with up to 16K Cray XT3/4 cores.
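
Production PIC codes such as GTC are far more elaborate, but the two coupled phases described above (per-particle work and a grid-based field solve) can be seen in a minimal 1D electrostatic sketch. It assumes a periodic domain, normalized units, cloud-in-cell (linear) weighting and an FFT Poisson solver; none of these choices are taken from the paper. The scatter step (`np.add.at`) is the irregular memory access that makes charge deposition hard to parallelize, while the FFT is the regular grid phase that needs global communication once the grid is distributed.

```python
import numpy as np


def deposit_cic(positions, charges, n_cells, length):
    """Scatter particle charge onto a periodic 1D grid with cloud-in-cell weights."""
    dx = length / n_cells
    rho = np.zeros(n_cells)
    x = positions / dx
    left = np.floor(x).astype(int)
    frac = x - left
    # Each particle splits its charge between its two nearest grid points.
    np.add.at(rho, left % n_cells, charges * (1.0 - frac))
    np.add.at(rho, (left + 1) % n_cells, charges * frac)
    return rho / dx  # charge density


def solve_poisson_periodic(rho, length):
    """Solve -phi'' = rho (normalized units) on a periodic grid with an FFT."""
    n = rho.size
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)  # angular wavenumbers
    rho_hat = np.fft.fft(rho - rho.mean())  # remove the neutralizing background
    phi_hat = np.zeros_like(rho_hat)
    nonzero = k != 0
    phi_hat[nonzero] = rho_hat[nonzero] / k[nonzero] ** 2
    return np.real(np.fft.ifft(phi_hat))


def gather_field(phi, positions, length):
    """Compute E = -dphi/dx on the grid and interpolate it back to the particles."""
    n = phi.size
    dx = length / n
    e_grid = -(np.roll(phi, -1) - np.roll(phi, 1)) / (2.0 * dx)  # central difference
    x = positions / dx
    left = np.floor(x).astype(int)
    frac = x - left
    return e_grid[left % n] * (1.0 - frac) + e_grid[(left + 1) % n] * frac


rng = np.random.default_rng(0)
length, n_cells = 1.0, 32
positions = rng.uniform(0.0, length, size=1000)
charges = np.full(positions.size, 1.0 / positions.size)

rho = deposit_cic(positions, charges, n_cells, length)
phi = solve_poisson_periodic(rho, length)
efield = gather_field(phi, positions, length)
# Deposition conserves total charge: the integral of rho equals the sum of charges.
print(np.isclose(rho.sum() * length / n_cells, charges.sum()))  # True
```

A full time step would then push each particle with `efield`; that step is omitted here to keep the sketch short.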

  13. Performance of particle in cell methods on highly concurrent computational architectures

    International Nuclear Information System (INIS)

    Adams, M F; Ethier, S; Wichmann, N

    2007-01-01

    Particle-in-cell (PIC) methods are effective for solving the Vlasov-Poisson system of equations used in simulations of magnetic fusion plasmas. PIC methods use grid-based computations for solving Poisson's equation, or more generally Maxwell's equations, as well as Monte Carlo type methods to sample the Vlasov equation. The presence of two types of discretizations, deterministic field solves and Monte Carlo methods for the Vlasov equation, poses challenges in understanding and optimizing performance on today's large-scale computers, which require high levels of concurrency. These challenges arise from the need to optimize two very different types of processes and the interactions between them. Modern cache-based high-end computers have very deep memory hierarchies and high degrees of concurrency, which must be utilized effectively to achieve good performance. The effective use of these machines requires maximizing concurrency by eliminating serial or redundant work and minimizing global communication. A related issue is minimizing the memory traffic between levels of the memory hierarchy, because performance is often limited by the bandwidths and latencies of the memory system. This paper discusses some of the performance issues, particularly in regard to parallelism, of PIC methods. The gyrokinetic toroidal code (GTC) is used for these studies, and a new radial grid decomposition is presented and evaluated. Scaling of the code is demonstrated on ITER-sized plasmas with up to 16K Cray XT3/4 cores.

  14. Fault Tolerant Computer Architecture

    CERN Document Server

    Sorin, Daniel

    2009-01-01

    For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes

  15. Digital design and computer architecture

    CERN Document Server

    Harris, David

    2010-01-01

    Digital Design and Computer Architecture is designed for courses that combine digital logic design with computer organization/architecture or that teach these subjects as a two-course sequence. Digital Design and Computer Architecture begins with a modern approach by rigorously covering the fundamentals of digital logic design and then introducing Hardware Description Languages (HDLs). Featuring examples of the two most widely used HDLs, VHDL and Verilog, the first half of the text prepares the reader for what follows in the second: the design of a MIPS processor. By the end of D

  16. The performance of a new Geant4 Bertini intra-nuclear cascade model in high throughput computing (HTC) cluster architecture

    Energy Technology Data Exchange (ETDEWEB)

    Heikkinen, Aatos; Hektor, Andi; Karimaki, Veikko; Linden, Tomas [Helsinki Univ., Institute of Physics (Finland)]

    2003-07-01

    We study the performance of a new Bertini intra-nuclear cascade model implemented in the general detector simulation toolkit Geant4 with a High Throughput Computing (HTC) cluster architecture. A 60-node Pentium III openMosix cluster is used, with the Mosix kernel performing automatic process load-balancing across several CPUs. The Mosix cluster consists of several computer classes equipped with Windows NT workstations that automatically boot daily and become nodes of the Mosix cluster. The models included in our study are a Bertini intra-nuclear cascade model with excitons, consisting of a pre-equilibrium model, a nucleus explosion model, a fission model and an evaporation model. The speed and accuracy obtained for these models are presented.

  17. Layered Architecture for Quantum Computing

    Directory of Open Access Journals (Sweden)

    N. Cody Jones

    2012-07-01

    We develop a layered quantum-computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface-code quantum error correction. In doing so, we propose a new quantum-computer architecture based on optical control of quantum dots. The time scales of physical-hardware operations and logical, error-corrected quantum gates differ by several orders of magnitude. By dividing functionality into layers, we can design and analyze subsystems independently, demonstrating the value of our layered architectural approach. Using this concrete hardware platform, we provide resource analysis for executing fault-tolerant quantum algorithms for integer factoring and quantum simulation, finding that the quantum-dot architecture we study could solve such problems on the time scale of days.

  18. Geometric Computing for Freeform Architecture

    KAUST Repository

    Wallner, J.; Pottmann, Helmut

    2011-01-01

    Geometric computing has recently found a new field of applications, namely the various geometric problems which lie at the heart of rationalization and construction-aware design processes of freeform architecture. We report on our work in this area

  19. A Heterogeneous Quantum Computer Architecture

    NARCIS (Netherlands)

    Fu, X.; Riesebos, L.; Lao, L.; Garcia Almudever, C.; Sebastiano, F.; Versluis, R.; Charbon, E.; Bertels, K.

    2016-01-01

    In this paper, we present a high-level view of the heterogeneous quantum computer architecture, since any future quantum computer will consist of both a classical and a quantum computing part. The classical part is needed for error correction as well as for the execution of algorithms that contain both

  20. Polymorphous Computing Architecture (PCA) Application Benchmark 1: Three-Dimensional Radar Data Processing

    National Research Council Canada - National Science Library

    Lebak, J

    2001-01-01

    The DARPA Polymorphous Computing Architecture (PCA) program is building advanced computer architectures that can reorganize their computation and communication structures to achieve better overall application performance...

  1. Programmable architecture for quantum computing

    NARCIS (Netherlands)

    Chen, J.; Wang, L.; Charbon, E.; Wang, B.

    2013-01-01

    A programmable architecture called “quantum FPGA (field-programmable gate array)” (QFPGA) is presented for quantum computing; it is a hybrid model that combines the advantages of the qubus system and measurement-based quantum computation. There are two kinds of buses in QFPGA, the local bus and

  2. Digital architecture, wearable computers and providing affinity

    DEFF Research Database (Denmark)

    Guglielmi, Michel; Johannesen, Hanne Louise

    2005-01-01

    ...as the setting for the events of experience. Contemporary architecture is a meta-space residing in almost any thinkable field, striving to blur boundaries between art, architecture, design and urbanity and to break down the distinction between the material and the user or inhabitant. The presentation for this paper... will, through research, a workshop and participation in a cumulus competition, focus on the exploration of boundaries between digital architecture, performative space and wearable computers. Our design method in general focuses on the interplay between the performing body and the environment, between...

  3. Savannah River Site computing architecture

    Energy Technology Data Exchange (ETDEWEB)

    1991-03-29

    A computing architecture is a framework for making decisions about the implementation of computer technology and the supporting infrastructure. Because of the size, diversity, and amount of resources dedicated to computing at the Savannah River Site (SRS), there must be an overall strategic plan that can be followed by the thousands of site personnel who make decisions daily that directly affect the SRS computing environment and impact the site's production and business systems. This plan must address the following requirements: There must be SRS-wide standards for procurement or development of computing systems (hardware and software). The site computing organizations must develop systems that end users find easy to use. Systems must be put in place to support the primary function of site information workers. The developers of computer systems must be given tools that automate and speed up the development of information systems and applications based on computer technology. This document describes a proposal for a site-wide computing architecture that addresses the above requirements. In summary, this architecture is standards-based, data-driven, and workstation-oriented, with larger systems being utilized for the delivery of needed information to users in a client-server relationship.

  4. Savannah River Site computing architecture

    Energy Technology Data Exchange (ETDEWEB)

    1991-03-29

    A computing architecture is a framework for making decisions about the implementation of computer technology and the supporting infrastructure. Because of the size, diversity, and amount of resources dedicated to computing at the Savannah River Site (SRS), there must be an overall strategic plan that can be followed by the thousands of site personnel who make decisions daily that directly affect the SRS computing environment and impact the site's production and business systems. This plan must address the following requirements: There must be SRS-wide standards for procurement or development of computing systems (hardware and software). The site computing organizations must develop systems that end users find easy to use. Systems must be put in place to support the primary function of site information workers. The developers of computer systems must be given tools that automate and speed up the development of information systems and applications based on computer technology. This document describes a proposal for a site-wide computing architecture that addresses the above requirements. In summary, this architecture is standards-based, data-driven, and workstation-oriented, with larger systems being utilized for the delivery of needed information to users in a client-server relationship.

  5. Computer Architecture: A Quantitative Approach

    CERN Document Server

    Hennessy, John L

    2011-01-01

    The computing world today is in the middle of a revolution: mobile clients and cloud computing have emerged as the dominant paradigms driving programming and hardware innovation today. The Fifth Edition of Computer Architecture focuses on this dramatic shift, exploring the ways in which software and technology in the cloud are accessed by cell phones, tablets, laptops, and other mobile computing devices. Each chapter includes two real-world examples, one mobile and one datacenter, to illustrate this revolutionary change. Updated to cover the mobile computing revolution. Emphasizes the two most im

  6. Super-computer architecture

    CERN Document Server

    Hockney, R W

    1977-01-01

    This paper examines the design of the top-of-the-range, scientific, number-crunching computers. The market for such computers is not as large as that for smaller machines, but on the other hand it is by no means negligible. The present work-horse machines in this category are the CDC 7600 and IBM 360/195, and over fifty of the former machines have been sold. The types of installation that form the market for such machines are not only the major scientific research laboratories in the major countries, such as Los Alamos, CERN and the Rutherford Laboratory, but also major universities or university networks. It is also true that, as with sports cars, innovations made to satisfy the top of the market today often become the standard for the medium-scale computer of tomorrow. Hence there is considerable interest in examining present developments in this area.

  7. SCinet Architecture: Featured at the International Conference for High Performance Computing, Networking, Storage and Analysis 2016

    Energy Technology Data Exchange (ETDEWEB)

    Lyonnais, Marc; Smith, Matt; Mace, Kate P.

    2017-02-06

    SCinet is the purpose-built network that operates during the International Conference for High Performance Computing, Networking, Storage and Analysis (Supercomputing or SC). Created each year for the conference, SCinet brings to life a high-capacity network that supports applications and experiments that are a hallmark of the SC conference. The network links the convention center to research and commercial networks around the world. This resource serves as a platform for exhibitors to demonstrate the advanced computing resources of their home institutions and elsewhere by supporting a wide variety of applications. Volunteers from academia, government and industry work together to design and deliver the SCinet infrastructure. Industry vendors and carriers donate millions of dollars in equipment and services needed to build and support the local and wide area networks. Planning begins more than a year in advance of each SC conference and culminates in a high-intensity installation in the days leading up to the conference. The SCinet architecture for SC16 illustrates a dramatic increase in participation from the vendor community, particularly those that focus on network equipment. Software-Defined Networking (SDN) and Data Center Networking (DCN) are present in nearly all aspects of the design.

  8. The new landscape of parallel computer architecture

    Energy Technology Data Exchange (ETDEWEB)

    Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, California 94720 (United States)]

    2007-07-15

    The past few years have seen a sea change in computer architecture that will impact every facet of our society, as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  9. The new landscape of parallel computer architecture

    International Nuclear Information System (INIS)

    Shalf, John

    2007-01-01

    The past few years have seen a sea change in computer architecture that will impact every facet of our society, as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  10. VLSI Architectures for Computing DFT's

    Science.gov (United States)

    Truong, T. K.; Chang, J. J.; Hsu, I. S.; Reed, I. S.; Pei, D. Y.

    1986-01-01

    Simplifications result from the use of residue Fermat number systems. A system of finite arithmetic over residue Fermat number systems enables calculation of the discrete Fourier transform (DFT) of a series of complex numbers with a reduced number of multiplications. Computer architectures based on this approach are suitable for the design of very-large-scale integrated (VLSI) circuits for computing DFTs. The general approach is not limited to DFTs; it is also applicable to decoding of error-correcting codes and other transform calculations. The system is readily implemented in VLSI.
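
The record's complex-number residue scheme is not spelled out here, so the sketch below shows a closely related, well-known technique instead: the Fermat number transform, a DFT computed over the finite field of integers modulo the Fermat prime F_4 = 65537, used to compute an exact cyclic convolution. It assumes power-of-two lengths up to 32 and uses a naive O(N^2) transform for clarity; it is an illustration, not the architecture described in the record.

```python
F4 = 2**16 + 1  # Fermat number F_4 = 65537, which is prime


def fnt(values, root, modulus=F4):
    """Naive O(N^2) DFT of `values` over the integers modulo a Fermat prime."""
    n = len(values)
    return [
        sum(v * pow(root, i * k, modulus) for i, v in enumerate(values)) % modulus
        for k in range(n)
    ]


def inverse_fnt(values, root, modulus=F4):
    """Inverse transform: forward transform with root^-1, then scale by n^-1."""
    n = len(values)
    inv_root = pow(root, -1, modulus)
    inv_n = pow(n, -1, modulus)
    return [(x * inv_n) % modulus for x in fnt(values, inv_root, modulus)]


def cyclic_convolution(a, b):
    """Exact cyclic convolution of two equal-length integer sequences (mod F_4)."""
    n = len(a)
    # 2 has multiplicative order 32 modulo F_4 (2**16 = -1, so 2**32 = 1), so
    # 2**(32 // n) is a primitive n-th root of unity for power-of-two n <= 32.
    # In hardware, multiplying by a power of 2 modulo F_4 reduces to shifts and
    # subtractions, which is where the savings in multiplications come from.
    root = pow(2, 32 // n, F4)
    spectrum = [(x * y) % F4 for x, y in zip(fnt(a, root), fnt(b, root))]
    return inverse_fnt(spectrum, root)


# Zero-padding to length 8 makes the cyclic convolution equal the linear one:
# (1 + 2x + 3x^2 + 4x^3)(5 + 6x + 7x^2)
print(cyclic_convolution([1, 2, 3, 4, 0, 0, 0, 0], [5, 6, 7, 0, 0, 0, 0, 0]))
# [5, 16, 34, 52, 45, 28, 0, 0]
```

Results are exact only while the true convolution values stay below 65537; larger inputs need a larger Fermat modulus or residue-number-system splitting.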

  11. High-level language computer architecture

    CERN Document Server

    Chu, Yaohan

    1975-01-01

    High-Level Language Computer Architecture offers a tutorial on high-level language computer architecture, including von Neumann architecture and syntax-oriented architecture as well as direct and indirect execution architecture. Design concepts of Japanese-language data processing systems are discussed, along with the architecture of stack machines and the SYMBOL computer system. The conceptual design of a direct high-level language processor is also described. Comprised of seven chapters, this book first presents a classification of high-level language computer architecture according to the pr

  12. Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture

    Science.gov (United States)

    Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

    2014-02-11

    Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.
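
The lane arithmetic behind the steps above can be modeled in plain Python. In this sketch a 2-lane vector register is a tuple `(re, im)`, and the helper names `vector_load`, `load_and_splat` and `cross_multiply_add` are hypothetical stand-ins for the hardware operations. It also assumes that the splat yields one register of replicated real parts and one of replicated imaginary parts, and that the cross multiply uses a lane swap with a sign flip; the patent's exact lane layout may differ.

```python
def vector_load(z):
    """Load one complex value into a 2-lane register as (re, im)."""
    return (z.real, z.imag)


def load_and_splat(z):
    """Hypothetical complex load and splat: replicate z's real part across one
    register and its imaginary part across another."""
    return (z.real, z.real), (z.imag, z.imag)


def cross_multiply_add(acc, a, b_re_splat, b_im_splat):
    """Return acc + a * b using only lane-wise multiply-adds.

    (a_re + i a_im)(b_re + i b_im) = (a_re b_re - a_im b_im) + i (a_im b_re + a_re b_im)
    The second term uses a lane-swapped copy of a with lane 0 negated.
    """
    a_swapped = (-a[1], a[0])
    return (
        acc[0] + a[0] * b_re_splat[0] + a_swapped[0] * b_im_splat[0],
        acc[1] + a[1] * b_re_splat[1] + a_swapped[1] * b_im_splat[1],
    )


def complex_matmul(A, B):
    """C = A @ B for complex matrices given as nested lists."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0j] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = (0.0, 0.0)  # result register accumulating partial products
            for k in range(m):
                a = vector_load(A[i][k])
                b_re, b_im = load_and_splat(B[k][j])
                acc = cross_multiply_add(acc, a, b_re, b_im)
            C[i][j] = complex(acc[0], acc[1])
    return C


A = [[1 + 2j, 3 - 1j], [0.5j, -2]]
B = [[2 - 1j, 1j], [4, 1 + 1j]]
print(complex_matmul(A, B)[0][0])  # (16-1j)
```

Splatting the second operand lets every lane of the first operand reuse the same loaded value, which is the reason such a scheme avoids per-element shuffles of the second operand.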

  13. Electromagnetic Physics Models for Parallel Computing Architectures

    International Nuclear Information System (INIS)

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVIDIA GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  14. Electromagnetic Physics Models for Parallel Computing Architectures

    Science.gov (United States)

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVIDIA GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  15. Truth in advertising: Reporting performance of computer programs, algorithms and the impact of architecture

    Directory of Open Access Journals (Sweden)

    Scott Hazelhurst

    2010-11-01

    The level of detail and precision that appears in the experimental methodology sections of computer science papers is usually much lower than in the natural sciences. This is partially justified by the different nature of the experiments. The experimental evidence presented here shows that the time taken by the same algorithm varies so significantly across CPUs that, without knowing the exact CPU model, it is difficult to compare results. This is placed in context by analysing a cross-section of experimental results reported in the literature. The reporting of experimental results is sometimes insufficient to allow experiments to be replicated, and in some cases is insufficient to support the claims made for the algorithms. New standards for reporting algorithm results are suggested.
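
In the spirit of the reporting argument above, a timing measurement can carry the platform details needed to interpret it. The sketch below uses only the Python standard library; the function name `timed_report` and the choice of fields are assumptions for illustration, not the standards the paper proposes.

```python
import platform
import statistics
import sys
import time


def timed_report(func, *args, repeats=5):
    """Time func(*args) several times and bundle the timings with the platform
    details a reader would need to interpret or replicate them."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples.append(time.perf_counter() - start)
    return {
        "cpu": platform.processor() or "unknown",  # may be empty on some systems
        "machine": platform.machine(),
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "repeats": repeats,
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


report = timed_report(sorted, list(range(100_000, 0, -1)))
for key, value in report.items():
    print(f"{key}: {value}")
```

`platform.processor()` often does not return the exact CPU model (on Linux it can be empty or just the architecture name), so a careful report would add the model string from the operating system, for example from `/proc/cpuinfo` on Linux.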

  16. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    Science.gov (United States)

    Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

    2016-05-01

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
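
The dynamic assignment of independent tracks to threads can be sketched with a shared work queue: each worker pulls the next track as soon as it finishes the previous one, so uneven track lengths even out without a static partition. This is a minimal Python illustration of the scheduling pattern only; the function name, the stand-in per-track work and the track lengths are hypothetical, and because of CPython's global interpreter lock these threads will not speed up CPU-bound work the way native threads in the paper's proxy applications would.

```python
import queue
import threading


def process_tracks_dynamically(track_lengths, n_workers=4):
    """Hand independent tracks to worker threads one at a time from a shared
    queue, so a thread that finishes early simply takes the next track."""
    tasks = queue.Queue()
    for track_id, length in enumerate(track_lengths):
        tasks.put((track_id, length))

    results = {}
    per_worker = [0] * n_workers  # how many tracks each worker handled
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                track_id, length = tasks.get_nowait()
            except queue.Empty:
                return  # no tracks left
            # Stand-in for sweeping one track: sum over its segments.
            value = sum(range(length))
            with lock:
                results[track_id] = value
                per_worker[worker_id] += 1

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, per_worker


track_lengths = [10, 1000, 5, 200, 50, 7, 3000, 12]
results, per_worker = process_tracks_dynamically(track_lengths)
print(sorted(results) == list(range(len(track_lengths))), sum(per_worker))  # True 8
```

The split recorded in `per_worker` varies from run to run, which is the point of dynamic scheduling: the assignment adapts to how long each track actually takes.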

  17. The Software Architecture for Performing Scientific Computation with the JLAPACK Libraries in ScalaLab

    Directory of Open Access Journals (Sweden)

    Stergios Papadimitriou

    2012-01-01

    Although LAPACK is a powerful library, it is difficult to use. JLAPACK, a Java translation obtained automatically from the Fortran LAPACK sources, retains exactly the same difficult-to-use interface as the LAPACK routines. The MTJ library implements an object-oriented Java interface to JLAPACK that hides many complicated details. ScalaLab exploits the flexibility of the Scala language to present an even friendlier and more convenient interface to the powerful but complicated JLAPACK library. The article describes the interfacing of the low-level JLAPACK routines within the ScalaLab environment. This is done rather easily by exploiting well-suited features of the Scala language. The paper also demonstrates the convenience of using JLAPACK routines for linear algebra operations from within ScalaLab.

  18. Computer architecture fundamentals and principles of computer design

    CERN Document Server

    Dumas II, Joseph D

    2005-01-01

    Introduction to Computer Architecture / What is Computer Architecture? / Architecture vs. Implementation / Brief History of Computer Systems / The First Generation / The Second Generation / The Third Generation / The Fourth Generation / Modern Computers - The Fifth Generation / Types of Computer Systems / Single Processor Systems / Parallel Processing Systems / Special Architectures / Quality of Computer Systems / Generality and Applicability / Ease of Use / Expandability / Compatibility / Reliability / Success and Failure of Computer Architectures and Implementations / Quality and the Perception of Quality / Cost Issues / Architectural Openness, Market Timi

  19. Computing on Knights and Kepler Architectures

    International Nuclear Information System (INIS)

    Bortolotti, G; Caberletti, M; Ferraro, A; Giacomini, F; Manzali, M; Maron, G; Salomoni, D; Crimi, G; Zanella, M

    2014-01-01

    A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and on studying the best programming methodologies for these systems. Here we compare our results in porting a Lattice Boltzmann code to two state-of-the-art accelerators: the NVIDIA K20X and the Intel Xeon Phi. We describe our implementations, analyze the results and compare them with a baseline architecture based on Intel Sandy Bridge CPUs.

  20. Geometric Computing for Freeform Architecture

    KAUST Repository

    Wallner, J.

    2011-06-03

    Geometric computing has recently found a new field of applications, namely the various geometric problems which lie at the heart of rationalization and construction-aware design processes of freeform architecture. We report on our work in this area, dealing with meshes with planar faces and meshes which allow multilayer constructions (which is related to discrete surfaces and their curvatures), triangle meshes with circle-packing properties (which is related to conformal uniformization), and with the paneling problem. We emphasize the combination of numerical optimization and geometric knowledge.

  1. Computer aid in solar architecture

    Energy Technology Data Exchange (ETDEWEB)

    Rosendahl, E W

    1982-02-01

    Architects are discussing to what extent new buildings can be designed to use energy more economically through architectural means. Solar houses in the USA are often taken as a model. As yet, it is unclear how such measures will affect heat demand in the central European climate and with domestic building materials. A computer simulation program is introduced with which these questions can be answered as early as the planning stage. The program can be run on a common microcomputer system.

  2. RGCA: A Reliable GPU Cluster Architecture for Large-Scale Internet of Things Computing Based on Effective Performance-Energy Optimization.

    Science.gov (United States)

    Fang, Yuling; Chen, Qingkui; Xiong, Neal N; Zhao, Deyu; Wang, Jingjuan

    2017-08-04

    This paper aims to develop a low-cost, high-performance and high-reliability computing system to process large-scale data using common data mining algorithms in the Internet of Things (IoT) computing environment. Because IoT data processing has characteristics similar to mainstream high performance computing, we use a GPU (Graphics Processing Unit) cluster to achieve better IoT services. First, we present an energy consumption calculation method (ECCM) based on WSNs. Then, using the CUDA (Compute Unified Device Architecture) programming model, we propose a Two-level Parallel Optimization Model (TLPOM), which exploits reasonable resource planning and common compiler optimization techniques to obtain the best block and thread configuration given the resource constraints of each node. The key to this part is dynamically coupling Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP) to improve the performance of the algorithms without additional energy consumption. Finally, combining the ECCM and the TLPOM, we use the Reliable GPU Cluster Architecture (RGCA) to obtain a high-reliability computing system that takes into account the nodes' diversity, algorithm characteristics, etc. The results show that, with TLPOM, the performance of the algorithms increased significantly, by 34.1%, 33.96% and 24.07% on average for Fermi, Kepler and Maxwell, respectively, and that the RGCA ensures that our IoT computing system provides low-cost and high-reliability services.
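
    TLPOM searches for the best block and thread configuration under each node's resource constraints. As a much simpler stand-in (not the authors' model), the Python sketch below estimates theoretical occupancy for candidate block sizes from per-SM limits on threads, resident blocks, registers and shared memory, and picks the block size with the highest occupancy. The `SmLimits` values in the example are placeholders to be replaced with the figures for the actual GPU, and the sketch ignores the register and shared-memory allocation granularities that real hardware adds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmLimits:
    """Per-streaming-multiprocessor limits (placeholder values; check your GPU)."""
    max_threads: int
    max_blocks: int
    registers: int
    shared_mem_bytes: int


def occupancy(block_size, regs_per_thread, smem_per_block, sm):
    """Fraction of the SM's thread slots filled by resident blocks."""
    by_threads = sm.max_threads // block_size
    by_regs = sm.registers // (regs_per_thread * block_size)
    by_smem = sm.shared_mem_bytes // smem_per_block if smem_per_block else sm.max_blocks
    # The tightest resource decides how many blocks can be resident at once.
    blocks = min(by_threads, sm.max_blocks, by_regs, by_smem)
    return blocks * block_size / sm.max_threads


def best_block_size(regs_per_thread, smem_per_block, sm,
                    candidates=(64, 128, 256, 512, 1024)):
    """Candidate with the highest estimated occupancy; ties go to the smaller block."""
    return max(candidates,
               key=lambda b: (occupancy(b, regs_per_thread, smem_per_block, sm), -b))


if __name__ == "__main__":
    sm = SmLimits(max_threads=2048, max_blocks=16, registers=65536, shared_mem_bytes=49152)
    for b in (64, 128, 256, 512, 1024):
        print(b, occupancy(b, regs_per_thread=32, smem_per_block=8192, sm=sm))
    print("best:", best_block_size(32, 8192, sm))
```

    Occupancy is only a proxy: the paper's point is that TLP must be balanced against ILP, so a configuration with lower occupancy but more independent instructions per thread can still be faster.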

  3. Computer architecture evaluation for structural dynamics computations: Project summary

    Science.gov (United States)

    Standley, Hilda M.

    1989-01-01

    The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.

  4. Brain architecture: a design for natural computation.

    Science.gov (United States)

    Kaiser, Marcus

    2007-12-15

    Fifty years ago, John von Neumann compared the architecture of the brain with that of the computers he invented and which are still in use today. In those days, the organization of computers was based on concepts of brain organization. Here, we give an update on current results on the global organization of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing and balanced network activation. Finally, we discuss mechanisms of self-organization for such architectures. After all, the organization of the brain might again inspire computer architecture.

  5. Computer programming and architecture: the VAX

    CERN Document Server

    Levy, Henry

    2014-01-01

    Takes a unique systems approach to the programming and architecture of the VAX. Using the VAX as a detailed example, the first half of this book offers a complete course in assembly language programming. The second half describes higher-level systems issues in computer architecture. Highlights include the VAX assembler and debugger, other modern architectures such as RISCs, multiprocessing and parallel computing, microprogramming, caches and translation buffers, and an appendix on the Berkeley UNIX assembler.

  6. ELASTIC CLOUD COMPUTING ARCHITECTURE AND SYSTEM FOR HETEROGENEOUS SPATIOTEMPORAL COMPUTING

    Directory of Open Access Journals (Sweden)

    X. Shi

    2017-10-01

    Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are well suited to implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same holds when using a cluster of Intel's many-integrated-core (MIC) or Xeon Phi processors, or Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, given the energy-efficiency requirements of general computation, Field Programmable Gate Arrays (FPGAs) may be a better choice for energy efficiency when their computational performance is similar to or better than that of GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  7. Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing

    Science.gov (United States)

    Shi, X.

    2017-10-01

    Spatiotemporal computation implements a variety of different algorithms. When big data are involved, a desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may behave differently on different computing infrastructures and platforms. Some are well suited to implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful for certain kinds of spatiotemporal computation. The same holds when using a cluster of Intel's many-integrated-core (MIC) or Xeon Phi processors, or Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, given the energy-efficiency requirements of general computation, Field Programmable Gate Arrays (FPGAs) may be a better choice for energy efficiency when their computational performance is similar to or better than that of GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  8. Fast semivariogram computation using FPGA architectures

    Science.gov (United States)

    Lagadapati, Yamuna; Shirvaikar, Mukul; Dong, Xuanliang

    2015-02-01

    The semivariogram is a statistical measure of the spatial distribution of data and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically been applied in the geosciences and remote sensing. Recently, applications in medical imaging have been investigated, creating a need for efficient real-time implementation of the algorithm. The semivariogram is a plot of semivariances for different lag distances between pixels. A semivariance, γ(h), is defined as half the expected squared difference of pixel values between any two data locations separated by a lag distance h. Because every pair of pixels in the image or sub-image being processed must be examined, the base algorithm complexity for an image window with n pixels is O(n²). Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs tend to operate at relatively modest clock rates of a few hundred megahertz, but they can perform tens of thousands of calculations per clock cycle at low power. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. The design consists of several modules dedicated to the constituent computational tasks. A modular architecture approach is chosen to allow replication of processing units, which yields high throughput through concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. An anisotropic semivariogram implementation is anticipated to be an extension of the current architecture, ostensibly based on refinements to the current modules. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development kit, which uses the Virtex-5 FPGA. Medical image data from MRI scans are used for the experiments.
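
    The semivariance definition and the O(n²) pair count can be made concrete with a brute-force software reference. The Python sketch below computes an isotropic empirical semivariogram for a small image by visiting every unordered pixel pair, binning pairs by Euclidean lag distance rounded to the nearest integer, and taking half the mean squared difference per bin. The integer binning is an assumption for illustration; this is the kind of baseline an FPGA design could be checked against, not the authors' VHDL architecture.

```python
import math
from collections import defaultdict


def semivariogram(image, max_lag):
    """Isotropic empirical semivariogram of a 2D grid given as a list of rows.

    Every unordered pixel pair is visited once (O(n^2) for n pixels); pairs are
    binned by Euclidean lag distance rounded to the nearest integer.
    Returns {lag: gamma(lag)} for the lags 1..max_lag that occur in the image.
    """
    pixels = [(r, c, v) for r, row in enumerate(image) for c, v in enumerate(row)]
    sq_sum = defaultdict(float)  # sum of squared differences per lag bin
    count = defaultdict(int)     # number of pixel pairs per lag bin
    for i in range(len(pixels)):
        r1, c1, v1 = pixels[i]
        for j in range(i + 1, len(pixels)):
            r2, c2, v2 = pixels[j]
            lag = round(math.hypot(r1 - r2, c1 - c2))
            if 1 <= lag <= max_lag:
                sq_sum[lag] += (v1 - v2) ** 2
                count[lag] += 1
    # gamma(h) = half the mean squared difference of the pairs at lag h
    return {h: sq_sum[h] / (2 * count[h]) for h in sorted(count)}


if __name__ == "__main__":
    print(semivariogram([[0, 1, 2, 3]], max_lag=3))  # {1: 0.5, 2: 2.0, 3: 4.5}
```

    The inner loop body is independent for each pair, which is what the replicated FPGA processing units exploit: several pairs can be differenced, squared and accumulated into their lag bins in the same clock cycle.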

  9. Architectures for single-chip image computing

    Science.gov (United States)

    Gove, Robert J.

    1992-04-01

    This paper focuses on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new generation of image processors capable of implementing a variety of image, graphics, video, and audio computing functions. We will show that a single-chip heterogeneous MIMD parallel architecture best suits this class of processors, which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.

  10. Brain architecture: A design for natural computation

    OpenAIRE

    Kaiser, Marcus

    2008-01-01

    Fifty years ago, John von Neumann compared the architecture of the brain with that of the computers he invented, which are still in use today. In those days, the organisation of computers was based on concepts of brain organisation. Here, we give an update on current results on the global organisation of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing, and ...

  11. A Multi-Time Scale Morphable Software Milieu for Polymorphous Computing Architectures (PCA) - Composable, Scalable Systems

    National Research Council Canada - National Science Library

    Skjellum, Anthony

    2004-01-01

    Polymorphous Computing Architectures (PCA) rapidly "morph" (reorganize) software and hardware configurations in order to achieve high performance on computation styles ranging from specialized streaming to general threaded applications...

  12. Performative Architecture and Urban Spaces

    DEFF Research Database (Denmark)

    Kiib, Hans

    2008-01-01

    Three workshops, one exhibition. Three conceptual architectural workshops took place in parallel from August 16th to 22nd, 2008. Each workshop carried a specific methodology, and the goal was to come up with conceptual proposals that could be further developed for selected sites in the city of Aalb...... This workshop focuses on temporary architecture and urban catalysts. Informal spaces and the interface between the built and the void are foremost in the development of performative urban environments and cultural interaction. ...... The workshop model includes an open workshop where a handful of international architects are invited to spend five days with local architects, engineers and scholars, contributing to a work of architectural vision and quality. The workshop includes presentations, discussions and the development of projects...

  13. A computer architecture for intelligent machines

    Science.gov (United States)

    Lefebvre, D. R.; Saridis, G. N.

    1992-01-01

    The theory of intelligent machines proposes a hierarchical organization for the functions of an autonomous robot based on the principle of increasing precision with decreasing intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed. The authors present a computer architecture that implements the lower two levels of the intelligent machine. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Execution-level controllers for motion and vision systems are briefly addressed, as well as the Petri net transducer software used to implement coordination-level functions. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.

  14. Security Architecture of Cloud Computing

    OpenAIRE

    V.KRISHNA REDDY; Dr. L.S.S.REDDY

    2011-01-01

    Cloud computing offers services over the Internet with dynamically scalable resources. Cloud computing services provide benefits to users in terms of cost and ease of use. Cloud computing services need to address security during the transmission of sensitive data and critical applications to shared and public cloud environments. Cloud environments are scaling up to meet large data processing and storage needs. Cloud computing environments have various advantages as well as disadvantages o...

  15. Fundamentals of computer architecture and design

    CERN Document Server

    Bindal, Ahmet

    2017-01-01

    This textbook provides semester-length coverage of computer architecture and design, providing a strong foundation for students to understand modern computer system architecture and to apply these insights and principles to future computer designs. It is based on the author’s decades of industrial experience with computer architecture and design, as well as on teaching students who are pursuing careers in computer engineering. Unlike a number of existing textbooks for this course, this one focuses not only on CPU architecture but also covers system buses, peripherals and memories in great detail. This book teaches every element in a computing system in two steps. First, it introduces the functionality of each topic (and subtopics) and then goes into “from-scratch design” of a particular digital block from its architectural specifications using timing diagrams. The author describes how the data-path of a certain digital block is generated using timing diagrams, a method which most textbo...

  16. Quantum computation architecture using optical tweezers

    DEFF Research Database (Denmark)

    Weitenberg, Christof; Kuhr, Stefan; Mølmer, Klaus

    2011-01-01

    We present a complete architecture for scalable quantum computation with ultracold atoms in optical lattices using optical tweezers focused to the size of a lattice spacing. We discuss three different two-qubit gates based on local collisional interactions. The gates between arbitrary qubits...... quantum computing....

  17. Cloud Computing: Architecture and Services

    OpenAIRE

    Ms. Ravneet Kaur

    2018-01-01

    Cloud computing is Internet-based computing, whereby shared resources, software, and information are provided to computers and other devices on demand, like the electricity grid. It is a method for delivering information technology (IT) services where resources are retrieved from the Internet through web-based tools and applications, as opposed to a direct connection to a server. Rather than keeping files on a proprietary hard drive or local storage device, cloud-based storage makes it possib...

  18. Monte Carlo simulations on SIMD computer architectures

    International Nuclear Information System (INIS)

    Burmester, C.P.; Gronsky, R.; Wille, L.T.

    1992-01-01

    In this paper, algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique on single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next-nearest, and long-range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented, including lattice partitioning and the use of processor-array spanning-tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carlo updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures.
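
    One widely used way to partition a lattice for SIMD Monte Carlo is a checkerboard (red/black) decomposition: with nearest-neighbour interactions no two sites of the same colour interact, so all sites of one colour can be updated at once, one per processing element. The Python sketch below shows that decomposition for a 2D nearest-neighbour Ising model with Metropolis updates, assuming an even lattice size and periodic boundaries; the serial loop over one colour stands in for what a SIMD array would do in lockstep. It covers only the nearest-neighbour case (a plain two-colour split is not sufficient for the next-nearest or long-range interactions in the paper), and it is not presented as the authors' MasPar code.

```python
import math
import random


def local_field(spins, r, c):
    """Sum of the four nearest-neighbour spins, with periodic boundaries."""
    n = len(spins)
    return (spins[(r - 1) % n][c] + spins[(r + 1) % n][c]
            + spins[r][(c - 1) % n] + spins[r][(c + 1) % n])


def checkerboard_sweep(spins, beta, rng, coupling=1.0):
    """One Metropolis sweep over an n x n lattice: all 'black' sites, then all 'white'.

    No two sites of the same colour are nearest neighbours, so every update in a
    half-sweep is independent; a SIMD array could perform them in lockstep.
    """
    n = len(spins)
    if n % 2:
        raise ValueError("checkerboard colouring with periodic wrap needs an even size")
    for colour in (0, 1):
        for r in range(n):
            for c in range(n):
                if (r + c) % 2 != colour:
                    continue
                # Energy change of flipping spin s: dE = 2 * J * s * (sum of neighbours)
                d_e = 2.0 * coupling * spins[r][c] * local_field(spins, r, c)
                if d_e <= 0.0 or rng.random() < math.exp(-beta * d_e):
                    spins[r][c] = -spins[r][c]


def magnetization(spins):
    """Mean spin per site."""
    n = len(spins)
    return sum(map(sum, spins)) / (n * n)


if __name__ == "__main__":
    rng = random.Random(1)
    lattice = [[1] * 16 for _ in range(16)]
    for _ in range(20):
        checkerboard_sweep(lattice, beta=1.0, rng=rng)
    print(f"magnetization after 20 sweeps: {magnetization(lattice):.3f}")
```

    Counting accepted or attempted flips per second over such sweeps gives the "Monte Carlo updates per second" figure used for benchmarking; on a processor array the only communication per half-sweep is the exchange of boundary spins between neighbouring sub-lattices.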

  19. Architecture, systems research and computational sciences

    CERN Document Server

    2012-01-01

    The Winter 2012 (vol. 14 no. 1) issue of the Nexus Network Journal is dedicated to the theme “Architecture, Systems Research and Computational Sciences”. This is an outgrowth of the session by the same name which took place during the eighth international, interdisciplinary conference “Nexus 2010: Relationships between Architecture and Mathematics”, held in Porto, Portugal, in June 2010. Today computer science is an integral part of even strictly historical investigations, such as those concerning the construction of vaults, where the computer is used to survey the existing building, analyse the data and draw the ideal solution. What the papers in this issue make especially evident is that information technology has had an impact at a much deeper level as well: architecture itself can now be considered as a manifestation of information and as a complex system. The issue is completed with other research papers, conference reports and book reviews.

  20. Switching from computer to microcomputer architecture education

    Science.gov (United States)

    Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore

    2010-03-01

    Over the last decades, the technological and scientific evolution of the computing discipline has widely affected research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in view of its switch to microcomputer architecture. The authors present their strategies for successfully crossing the boundaries between engineering disciplines. This communication aims to offer a different perspective on professional courses, which are nowadays taught at the expense of traditional courses.

  1. CAAD as Computer-Activated Architectural Design

    DEFF Research Database (Denmark)

    Galle, Per

    1998-01-01

    In a brief sketch, drawing on a general philosophical conception of human interaction with the world, the architectural design process is analysed in terms of two kinds of human action: interpretation and production. Both of these are seen as establishing a link between mental and material entities....... Against this background, two alternative roles of computers in computer-aided architectural design (CAAD) are distinguished: a passive and a more active role, where in the latter case, the computer’s capacity for symbol manipulation is utilized to influence design thinking actively. The analysis offered in this paper may...... serve at least two purposes: to provide a conceptual machinery for research and reflection on CAAD, and to clarify the notion of ‘artificial intelligence’ in the light of architectural design....

  2. The Fermilab central computing facility architectural model

    International Nuclear Information System (INIS)

    Nicholls, J.

    1989-01-01

    The goal of the current Central Computing Upgrade at Fermilab is to create a computing environment that maximizes total productivity, particularly for high energy physics analysis. The Computing Department and the Next Computer Acquisition Committee decided upon a model which includes five components: an interactive front-end, a Large-Scale Scientific Computer (LSSC, a mainframe computing engine), a microprocessor farm system, a file server, and workstations. With the exception of the file server, all segments of this model are currently in production: a VAX/VMS cluster interactive front-end, an Amdahl VM Computing engine, ACP farms, and (primarily) VMS workstations. This paper will discuss the implementation of the Fermilab Central Computing Facility Architectural Model. Implications for Code Management in such a heterogeneous environment, including issues such as modularity and centrality, will be considered. Special emphasis will be placed on connectivity and communications between the front-end, LSSC, and workstations, as practiced at Fermilab. (orig.)

  3. The Fermilab Central Computing Facility architectural model

    International Nuclear Information System (INIS)

    Nicholls, J.

    1989-05-01

    The goal of the current Central Computing Upgrade at Fermilab is to create a computing environment that maximizes total productivity, particularly for high energy physics analysis. The Computing Department and the Next Computer Acquisition Committee decided upon a model which includes five components: an interactive front end, a Large-Scale Scientific Computer (LSSC, a mainframe computing engine), a microprocessor farm system, a file server, and workstations. With the exception of the file server, all segments of this model are currently in production: a VAX/VMS Cluster interactive front end, an Amdahl VM computing engine, ACP farms, and (primarily) VMS workstations. This presentation will discuss the implementation of the Fermilab Central Computing Facility Architectural Model. Implications for Code Management in such a heterogeneous environment, including issues such as modularity and centrality, will be considered. Special emphasis will be placed on connectivity and communications between the front-end, LSSC, and workstations, as practiced at Fermilab. 2 figs

  4. Computer aided architectural design : futures 2001

    NARCIS (Netherlands)

    Vries, de B.; Leeuwen, van J.P.; Achten, H.H.

    2001-01-01

    CAAD Futures is a biennial conference that aims to promote the advancement of computer-aided architectural design in the service of those concerned with the quality of the built environment. The conferences are organized under the auspices of the CAAD Futures Foundation, which has its secretariat

  5. Large computer systems and new architectures

    International Nuclear Information System (INIS)

    Bloch, T.

    1978-01-01

    The supercomputers of today are becoming quite specialized, and one can no longer expect to get all the state-of-the-art software and hardware facilities in one package. In order to achieve faster and faster computing it is necessary to experiment with new architectures, and the cost of developing each experimental architecture into a general-purpose computer system is too high when one considers the relatively small market for these computers. The result is that such computers are becoming 'back-ends' either to special systems (BSP, DAP) or to anything (CRAY-1). Architecturally the CRAY-1 is the most attractive today since it guarantees a speed gain of a factor of two over a CDC 7600, thus allowing us to regard any speed-up resulting from vectorization as a bonus. It looks, however, as if it will be very difficult to make substantially faster computers using only pipelining techniques and that it will be necessary to explore multiple processors working on the same problem. The experience which will be gained with the BSP and the DAP over the next few years will certainly be most valuable in this respect. (Auth.)

  6. Initial results on computational performance of Intel Many Integrated Core (MIC) architecture: implementation of the Weather and Research Forecasting (WRF) Purdue-Lin microphysics scheme

    Science.gov (United States)

    Mielikainen, Jarno; Huang, Bormin; Huang, Allen H.

    2014-10-01

    The Purdue-Lin scheme is a relatively sophisticated microphysics scheme in the Weather Research and Forecasting (WRF) model. The scheme includes six classes of hydrometeors: water vapor, cloud water, rain, cloud ice, snow and graupel. The scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. In this paper, we accelerate the Purdue-Lin scheme using Intel Many Integrated Core Architecture (MIC) hardware. The Intel Xeon Phi is a high-performance coprocessor consisting of up to 61 cores. The Xeon Phi is connected to a CPU via the PCI Express (PCIe) bus. In this paper, we discuss in detail the code optimization issues encountered while tuning the Purdue-Lin microphysics Fortran code for the Xeon Phi. In particular, achieving good performance required utilizing multiple cores, exploiting the wide vector operations, and making efficient use of memory. The results show that the optimizations improved the performance of the original code on the Xeon Phi 5110P by a factor of 4.2. Furthermore, the same optimizations improved performance on the Intel Xeon E5-2603 CPU by a factor of 1.2 compared to the original code.
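
    Because the scheme has no interactions among horizontal grid points, each vertical column can be processed on its own, which is what allows the work to be spread across the Xeon Phi's cores and vector lanes. The Python sketch below illustrates only that decomposition: `adjust_column` is a hypothetical stand-in for per-column microphysics (it is not the Purdue-Lin scheme), and the columns are split into contiguous chunks the way a static thread schedule would assign them to cores. Processing the chunks in reverse order gives the same result as a serial loop, which is the property that makes the parallelization safe.

```python
def adjust_column(column, threshold=1.0):
    """Hypothetical per-column update: clip each level at `threshold` and
    total the removed excess. It reads and writes only this one column."""
    clipped = [min(q, threshold) for q in column]
    excess = sum(max(q - threshold, 0.0) for q in column)
    return clipped, excess


def static_chunks(n_items, n_workers):
    """Contiguous index ranges, one per worker (like a static OpenMP schedule)."""
    base, extra = divmod(n_items, n_workers)
    start = 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        yield range(start, stop)
        start = stop


def run_partitioned(columns, n_workers):
    """Process each worker's chunk separately, deliberately in reverse order."""
    results = [None] * len(columns)
    for chunk in reversed(list(static_chunks(len(columns), n_workers))):
        for i in chunk:
            results[i] = adjust_column(columns[i])
    return results


if __name__ == "__main__":
    cols = [[0.5, 1.5, 2.0], [1.0, 0.2], [3.0], [0.1, 0.9]]
    print(run_partitioned(cols, n_workers=2) == [adjust_column(c) for c in cols])
```

    In the real code the same independence lets each core take a block of columns while the wide vector units process several columns (or levels) per instruction, so the remaining tuning effort goes into memory layout rather than synchronization.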

  7. Optimization and mathematical modeling in computer architecture

    CERN Document Server

    Sankaralingam, Karu; Nowatzki, Tony

    2013-01-01

    In this book we give an overview of modeling techniques used to describe computer systems to mathematical optimization tools. We give a brief introduction to various classes of mathematical optimization frameworks with special focus on mixed integer linear programming which provides a good balance between solver time and expressiveness. We present four detailed case studies -- instruction set customization, data center resource management, spatial architecture scheduling, and resource allocation in tiled architectures -- showing how MILP can be used and quantifying by how much it outperforms t

  8. Smart SOA platforms in cloud computing architectures

    CERN Document Server

    Exposito, Ernesto

    2014-01-01

    This book is intended to introduce the principles of the Event-Driven and Service-Oriented Architecture (SOA 2.0) and its role in the new interconnected world based on the cloud computing architecture paradigm. In this new context, the concept of "service" is widely applied to the hardware and software resources available in the new generation of the Internet. The authors focus on how current and future SOA technologies provide the basis for the smart management of the service model provided by the Platform as a Service (PaaS) layer.

  9. ATCA for Machines-- Advanced Telecommunications Computing Architecture

    Energy Technology Data Exchange (ETDEWEB)

    Larsen, R.S.; /SLAC

    2008-04-22

    The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R&D including application of HA principles to power electronics systems.

  10. ATCA for Machines-- Advanced Telecommunications Computing Architecture

    International Nuclear Information System (INIS)

    Larsen, R

    2008-01-01

    The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R and D including application of HA principles to power electronics systems

  11. Architecture independent environment for developing engineering software on MIMD computers

    Science.gov (United States)

    Valimohamed, Karim A.; Lopez, L. A.

    1990-01-01

    Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than those of serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management.

  12. Addressing Cloud Computing in Enterprise Architecture: Issues and Challenges

    OpenAIRE

    Khan, Khaled; Gangavarapu, Narendra

    2009-01-01

    This article discusses how the characteristics of cloud computing affect the enterprise architecture in four domains: business, data, application and technology. The ownership and control of architectural components are shifted from organisational perimeters to cloud providers. It argues that although cloud computing promises numerous benefits to enterprises, the shift of control over architectural components from enterprises to cloud providers introduces several architectural challenges. The d...

  13. Field-programmable custom computing technology architectures, tools, and applications

    CERN Document Server

    Luk, Wayne; Pocek, Ken

    2000-01-01

    Field-Programmable Custom Computing Technology: Architectures, Tools, and Applications brings together in one place important contributions and up-to-date research results in this fast-moving area. In seven selected chapters, the book describes the latest advances in architectures, design methods, and applications of field-programmable devices for high-performance reconfigurable systems. The contributors to this work were selected from the leading researchers and practitioners in the field. It will be valuable to anyone working or researching in the field of custom computing technology. It serves as an excellent reference, providing insight into some of the most challenging issues being examined today.

  14. Roadmap to the SRS computing architecture

    Energy Technology Data Exchange (ETDEWEB)

    Johnson, A.

    1994-07-05

    This document outlines the major steps that must be taken by the Savannah River Site (SRS) to migrate the SRS information technology (IT) environment to the new architecture described in the Savannah River Site Computing Architecture. This document proposes an IT environment that is “...standards-based, data-driven, and workstation-oriented, with larger systems being utilized for the delivery of needed information to users in a client-server relationship.” Achieving this vision will require many substantial changes in the computing applications, systems, and supporting infrastructure at the site. This document consists of a set of roadmaps which provide explanations of the necessary changes for IT at the site and describes the milestones that must be completed to finish the migration.

  15. Computer Architecture for Energy Efficient SFQ

    Science.gov (United States)

    2014-08-27

    IBM Corporation (T.J. Watson Research Laboratory), 1101 Kitchawan Road, Yorktown Heights, NY 10598. ...accomplished during this ARO-sponsored project at IBM Research to identify and model an energy-efficient SFQ-based computer architecture. The... IBM Windsor Blue (WB), illustrated schematically in Figure 2. The basic building block of WB is a "tile" composed of a 64-bit arithmetic logic unit

  16. Centaure: an heterogeneous parallel architecture for computer vision

    International Nuclear Information System (INIS)

    Peythieux, Marc

    1997-01-01

    This dissertation deals with the architecture of parallel computers dedicated to computer vision. The first chapter presents the problem to be solved, as well as the architecture of the Sympati and Symphonie computers on which this work is based. The second chapter reviews the state of the art in computers and integrated processors that can execute computer vision and image processing codes. The third chapter describes the architecture of Centaure. It has a heterogeneous structure: it is composed of a multiprocessor system based on the Analog Devices ADSP21060 Sharc digital signal processor and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer tightly coupled to a Sharc through a dual-ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with the performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark typical of industrial vision are presented and compared. First, these results show that the basic node of Centaure executes faster than Symphonie, and that increasing the size of the tested computer yields a better speed-up with Centaure than with Symphonie. Second, these results validate the choice of running the low-level structure of Centaure in a multi-SIMD fashion. (author) [in French]

  17. Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable

    Energy Technology Data Exchange (ETDEWEB)

    Schuller, Ivan K. [Univ. of California, San Diego, CA (United States); Stevens, Rick [Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States); Pino, Robinson [Dept. of Energy (DOE) Office of Science, Washington, DC (United States); Pechan, Michael [Dept. of Energy (DOE) Office of Science, Washington, DC (United States)

    2015-10-29

    Computation in its many forms is the engine that fuels our modern civilization. Modern computation, based on the von Neumann architecture, has until now allowed continuous improvements, as predicted by Moore’s law. However, computation using current architectures and materials will inevitably reach a limit within the next 10 years for fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like (“neuromorphic”) computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS-based technology? If so, what are the basic research challenges for materials science and computing? The overarching answer that emerged was that the development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully “neuromorphic” computer. To address this challenge, the following issues were considered: (1) the main differences between neuromorphic and conventional computing as related to signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning; (2) new neuromorphic architectures needed to produce lower energy consumption, potential novel nanostructured materials, and enhanced computation; (3) device and materials properties needed to implement functions such as hysteresis, stability, and fault tolerance; and (4) comparisons of different implementations (spin torque, memristors, resistive switching, phase change, and optical schemes) for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.

  18. Development and Performance of the Modularized, High-performance Computing and Hybrid-architecture Capable GEOS-Chem Chemical Transport Model

    Science.gov (United States)

    Long, M. S.; Yantosca, R.; Nielsen, J.; Linford, J. C.; Keller, C. A.; Payer Sulprizio, M.; Jacob, D. J.

    2014-12-01

    The GEOS-Chem global chemical transport model (CTM), used by a large atmospheric chemistry research community, has been reengineered to serve as a platform for a range of computational atmospheric chemistry science foci and applications. Development included modularization for coupling to general circulation and Earth system models (ESMs) and the adoption of co-processor capable atmospheric chemistry solvers. This was done using an Earth System Modeling Framework (ESMF) interface that operates independently of GEOS-Chem scientific code to permit seamless transition from the GEOS-Chem stand-alone serial CTM to deployment as a coupled ESM module. In this manner, the continual stream of updates contributed by the CTM user community is automatically available for broader applications, which remain state-of-science and directly referenceable to the latest version of the standard GEOS-Chem CTM. These developments are now available as part of the standard version of the GEOS-Chem CTM. The system has been implemented as an atmospheric chemistry module within the NASA GEOS-5 ESM. The coupled GEOS-5/GEOS-Chem system was tested for weak and strong scalability and performance with a tropospheric oxidant-aerosol simulation. Results confirm that the GEOS-Chem chemical operator scales efficiently for any number of processes. Although inclusion of atmospheric chemistry in ESMs is computationally expensive, the excellent scalability of the chemical operator means that the relative cost goes down with increasing number of processes, making fine-scale resolution simulations possible.
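    The weak and strong scalability tests mentioned above are usually summarized with efficiency ratios computed from wall-clock times. The minimal Python sketch below uses the conventional definitions (strong scaling: fixed total problem size; weak scaling: fixed problem size per process); the function names and the timings in the example are hypothetical and are not GEOS-Chem results.

```python
def strong_scaling_efficiency(t_base: float, p_base: int, t_p: float, p: int) -> float:
    """Fixed total problem size: the ideal run time on p processes is t_base * p_base / p."""
    return (t_base * p_base) / (t_p * p)


def weak_scaling_efficiency(t_base: float, t_p: float) -> float:
    """Fixed work per process: ideally the run time stays equal to t_base."""
    return t_base / t_p


# Hypothetical timings: 100 s on 24 processes is the baseline run.
print(f"strong, 96 procs in 30 s: {strong_scaling_efficiency(100.0, 24, 30.0, 96):.3f}")  # 0.833
print(f"weak, 96 procs in 110 s: {weak_scaling_efficiency(100.0, 110.0):.3f}")           # 0.909
```

    An efficiency near 1.0 means adding processes buys proportional benefit; in the strong-scaling case it means the run time falls in proportion to the process count.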

  19. Compact, open-architecture computed radiography system

    International Nuclear Information System (INIS)

    Huang, H.K.; Lim, A.; Kangarloo, H.; Eldredge, S.; Loloyan, M.; Chuang, K.S.

    1990-01-01

    Computed radiography (CR) was introduced in 1982, and its basic system design has not changed since. Current CR systems have certain limitations: their spatial resolution and signal-to-noise ratios are lower than those of screen-film systems, they are complicated and expensive to build, and they have a closed architecture. The authors of this paper designed and implemented a simpler, lower-cost, compact, open-architecture CR system to overcome some of these limitations. The open-architecture system is a manual-load, single-plate reader that fits on a desktop. Phosphor images are stored on a local disk and can be sent to any other computer through standard interfaces. Any manufacturer's plate can be read, with a scanning time of 90 seconds for a 35 x 43-cm plate. The standard pixel size is 174 μm and can be adjusted for higher spatial resolution. The data resolution is 12 bits/pixel over an x-ray exposure range of 0.01-100 mR.
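    The stated figures allow a quick back-of-the-envelope check of image size and readout rate. This minimal Python sketch assumes the whole 35 x 43-cm plate is sampled at the nominal 174-μm pitch with no margins cropped, and compares tightly packed 12-bit storage with one 16-bit word per sample; the storage layout and the helper name `pixels_along` are assumptions, since the abstract does not state how images are stored.

```python
import math

PITCH_UM = 174          # standard pixel size from the abstract
PLATE_MM = (350, 430)   # 35 x 43 cm plate
BITS_PER_PIXEL = 12     # stated data resolution
SCAN_TIME_S = 90        # stated scanning time


def pixels_along(length_mm: int, pitch_um: int) -> int:
    """Whole pixels that fit along one edge of the plate (assumes no cropped margins)."""
    return (length_mm * 1000) // pitch_um


cols = pixels_along(PLATE_MM[0], PITCH_UM)
rows = pixels_along(PLATE_MM[1], PITCH_UM)
n_pixels = cols * rows
packed_bytes = math.ceil(n_pixels * BITS_PER_PIXEL / 8)  # 12-bit samples packed tightly
word_bytes = n_pixels * 2                                # each sample in a 16-bit word (assumed alternative)
pixel_rate = n_pixels / SCAN_TIME_S                      # average pixels read per second

print(f"matrix {cols} x {rows} = {n_pixels} pixels")
print(f"raw image: {packed_bytes} bytes packed, {word_bytes} bytes as 16-bit words")
print(f"readout rate: about {pixel_rate:.0f} pixels/s")
```

    Under these assumptions a plate yields a matrix of 2011 x 2471, roughly 5 million pixels, or about 7.5 MB per image when packed.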

  20. A Semi-Automated Machine Learning Algorithm for Tree Cover Delineation from 1-m NAIP Imagery Using a High Performance Computing Architecture

    Science.gov (United States)

    Basu, S.; Ganguly, S.; Nemani, R. R.; Mukhopadhyay, S.; Milesi, C.; Votava, P.; Michaelis, A.; Zhang, G.; Cook, B. D.; Saatchi, S. S.; Boyda, E.

    2014-12-01

    Accurate tree cover delineation is a useful instrument in the derivation of Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) satellite imagery data. Numerous algorithms have been designed to perform tree cover delineation in high to coarse resolution satellite imagery, but most of them do not scale to terabytes of data, which is typical of these VHR datasets. In this paper, we present an automated probabilistic framework for the segmentation and classification of 1-m VHR data as obtained from the National Agriculture Imagery Program (NAIP) for deriving tree cover estimates for the whole of Continental United States, using a High Performance Computing Architecture. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field (CRF), which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by incorporating expert knowledge through the relabeling of misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the state of California, which covers a total of 11,095 NAIP tiles and spans a total geographical area of 163,696 sq. miles. Our framework produced correct detection rates of around 85% for fragmented forests and 70% for urban tree cover areas, with false positive rates lower than 3% for both regions. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR high-resolution canopy height model show the effectiveness of our algorithm in generating accurate high-resolution tree cover maps.

  1. Developing a Distributed Computing Architecture at Arizona State University.

    Science.gov (United States)

    Armann, Neil; And Others

    1994-01-01

    Development of Arizona State University's computing architecture, designed to ensure that all new distributed computing pieces will work together, is described. Aspects discussed include the business rationale, the general architectural approach, characteristics and objectives of the architecture, specific services, and impact on the university…

  2. A Computational Architecture for Programmable Automation Research

    Science.gov (United States)

    Taylor, Russell H.; Korein, James U.; Maier, Georg E.; Durfee, Lawrence F.

    1987-03-01

    This short paper describes recent work at the IBM T. J. Watson Research Center directed at developing a highly flexible computational architecture for research on sensor-based programmable automation. The system described here has been designed with a focus on dynamic configurability, layered user interfaces and incorporation of sensor-based real time operations into new commands. It is these features which distinguish it from earlier work. The system is currently being implemented at IBM for research purposes and internal use and is an outgrowth of programmable automation research which has been ongoing since 1972 [e.g., 1, 2, 3, 4, 5, 6].

  3. Performances of multiprocessor multidisk architectures for continuous media storage

    Science.gov (United States)

    Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

    1996-03-01

    Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400 Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point-to-point architecture is scalable and able to sustain high throughputs for simultaneous compute-bound and data-bound operations.
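    The bottleneck evaluation described in this abstract can be illustrated with the standard asymptotic bound: sustainable throughput is limited by the resource with the smallest ratio of capacity to per-unit demand. In this minimal Python sketch only the 400 MB/s bus figure comes from the abstract; the per-disk rate, the per-node link rate, the assumption that every delivered byte crosses the shared bus twice, and all function names are hypothetical placeholders, and the paper's own model may differ.

```python
def throughput_bound(resources):
    """resources: name -> (capacity in MB/s, MB passing through it per MB delivered).

    Sustainable throughput is limited by the resource with the smallest
    capacity / demand ratio. Returns (bound in MB/s, name of that resource).
    """
    name, (cap, demand) = min(resources.items(), key=lambda kv: kv[1][0] / kv[1][1])
    return cap / demand, name


def shared_bus(n_nodes, disk_mb_s=10.0, bus_mb_s=400.0, bus_crossings=2.0):
    # Hypothetical: each delivered byte crosses the shared bus twice
    # (disk -> memory, then memory -> network interface).
    return throughput_bound({
        "disks": (n_nodes * disk_mb_s, 1.0),
        "shared bus": (bus_mb_s, bus_crossings),
    })


def point_to_point(n_nodes, disk_mb_s=10.0, link_mb_s=20.0):
    # Each disk node has its own link, so link capacity grows with the node count.
    return throughput_bound({
        "disks": (n_nodes * disk_mb_s, 1.0),
        "links": (n_nodes * link_mb_s, 1.0),
    })


for n in (8, 16, 32, 64):
    print(n, shared_bus(n), point_to_point(n))
```

    With these placeholder numbers the shared-bus design stops scaling once aggregate disk bandwidth exceeds half the bus bandwidth, while the point-to-point design keeps growing with the node count, which is consistent in spirit with the abstract's conclusion.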

  4. A computer architecture for the implementation of SDL

    Energy Technology Data Exchange (ETDEWEB)

    Crutcher, L A

    1989-01-01

    Finite State Machines (FSMs) are a part of well-established automata theory. The FSM model is useful in all stages of system design, from abstract specification to implementation in hardware. The FSM model has been studied as a technique in software design, and the implementation of this type of software has been considered. The Specification and Description Language (SDL) has been considered in detail as an example of this approach. The complexity of systems designed using SDL warrants their implementation through a programmed computer. A benchmark for the implementation of SDL has been established and the performance of SDL on three particular computer architectures investigated. Performance is judged according to this benchmark and also the ease of implementation, which is related to the confidence of a correct implementation. The implementation on 68000s and transputers is considered as representative of established and state-of-the-art microprocessors respectively. A third architecture that uses a processor that has been proposed specifically for the implementation of SDL is considered as a high-level custom architecture. Analysis and measurements of the benchmark on each architecture indicate that the execution time of SDL decreases by an order of magnitude from the 68000 to the transputer to the custom architecture. The ease of implementation is also greater when the execution time is reduced. A study of some real applications of SDL indicates that the benchmark figures are reflected in user-oriented measures of performance such as data throughput and response time. A high-level architecture such as the one proposed here for SDL can provide benefits in terms of execution time and correctness.
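    SDL describes a system as communicating extended finite state machines, each consuming signals from an input queue. The minimal Python sketch below shows a table-driven FSM in that style. The states, signals, and the `run` helper are a hypothetical connection-handling example; SDL features such as save, timers, and process variables are omitted, and this is not the benchmark or implementation used in the paper.

```python
from collections import deque

# Each (state, input signal) pair selects a next state and an output signal.
TRANSITIONS = {
    ("idle", "conn_req"): ("connected", "conn_conf"),
    ("connected", "data"): ("connected", "data_ack"),
    ("connected", "disc_req"): ("idle", "disc_conf"),
}


def run(signals, state="idle"):
    """Consume signals in FIFO order; return (final state, list of output signals)."""
    queue, outputs = deque(signals), []
    while queue:
        signal = queue.popleft()
        key = (state, signal)
        if key not in TRANSITIONS:
            # As in SDL's implicit transition, a signal with no transition in
            # the current state is discarded and the state is unchanged.
            continue
        state, out = TRANSITIONS[key]
        if out is not None:
            outputs.append(out)
    return state, outputs


print(run(["data", "conn_req", "data", "data", "disc_req"]))
```

    The leading "data" signal arrives while the machine is idle and is silently consumed; the remaining signals drive the machine through a connect, two acknowledged data signals, and a disconnect. Keeping the behavior in a table rather than in nested conditionals mirrors how SDL state/input pairs map directly onto transitions.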

  5. Experimental high energy physics and modern computer architectures

    International Nuclear Information System (INIS)

    Hoek, J.

    1988-06-01

    The paper examines how experimental High Energy Physics can use modern computer architectures efficiently. In this connection parallel and vector architectures are investigated, and the types available at the moment for general use are discussed. A separate section briefly describes some architectures that are either a combination of both, or exemplify other architectures. In an appendix some directions in which computing seems to be developing in the USA are mentioned. (author)

  6. NET-COMPUTER: Internet Computer Architecture and its Application in E-Commerce

    OpenAIRE

    P. O. Umenne; M. O. Odhiambo

    2012-01-01

    Research in Intelligent Agents has yielded interesting results, some of which have been translated into commercial ventures. Intelligent Agents are executable software components that represent the user, perform tasks on behalf of the user and when the task terminates, the Agents send the result to the user. Intelligent Agents are best suited for the Internet: a collection of computers connected together in a world-wide computer network. Swarm and HYDRA computer architectures for Agents’ ex...

  7. Hybrid parallel computing architecture for multiview phase shifting

    Science.gov (United States)

    Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun

    2014-11-01

    The multiview phase-shifting method is a powerful means of achieving high-resolution three-dimensional (3-D) shape measurement. Unfortunately, this capability comes at a very high computational cost, so 3-D computations must otherwise be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit (CPU) cooperates with the graphics processing unit (GPU) to achieve hybrid parallel computing. The computationally expensive procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to realize coarse-grained and fine-grained parallel computing simultaneously. Experimental results verify that the developed system can perform real-time 3-D measurement at 50 fps (frames per second) with 260K 3-D points per frame. A speedup of up to 180 times is obtained using an NVIDIA GT560Ti graphics card compared with a sequential C implementation on a 3.4 GHz Intel Core i7 3770.

  8. Systemic Approach to Architectural Performance

    Directory of Open Access Journals (Sweden)

    Marie Davidova

    2017-04-01

    First-hand experiences in several design projects that were based on media richness and collaboration are described in this article. Although complex design processes are usually regarded only as socio-technical systems, they are deeply involved with natural systems. My collaborative research in the field of performance-oriented design combines digital and physical conceptual sketches, simulations and prototyping. GIGA-mapping is applied to organise the data. The design process uses the most suitable tools for the subtasks at hand, and the use of media is mixed according to particular requirements. These tools include digital and physical GIGA-mapping, parametric computer-aided design (CAD), digital simulation of analyses, as well as sampling and 1:1 prototyping. Also discussed in this article are the methodologies used in several design projects to strategize these tools and the developments and trends in the tools employed. The paper argues that digital tools tend to produce similar results through given presets that often do not correspond to real needs. Thus, there is a significant need for mixed methods, including prototyping, in the creative design process. Media mixing and cooperation across disciplines are unavoidable in the holistic approach to contemporary design. This includes the consideration of diverse biotic and abiotic agents. I argue that physical and digital GIGA-mapping is a crucial tool for coping with this complexity. Furthermore, I propose the integration of physical and digital outputs in one GIGA-map and the participation and co-design of biotic and abiotic agents in one rich design research space, resulting in an ever-evolving, time-based research-design process and result.

  9. High performance computing on vector systems

    CERN Document Server

    Roller, Sabine

    2008-01-01

    Presents the developments in high-performance computing and simulation on modern supercomputer architectures. This book covers trends in hardware and software development in general and specifically the vector-based systems and heterogeneous architectures. It presents innovative fields like coupled multi-physics or multi-scale simulations.

  10. Outline of a novel architecture for cortical computation

    OpenAIRE

    Majumdar, Kaushik

    2007-01-01

    In this paper a novel architecture for cortical computation has been proposed. This architecture is composed of computing paths consisting of neurons and synapses only. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortica...

  11. Using EDUCache Simulator for the Computer Architecture and Organization Course

    Directory of Open Access Journals (Sweden)

    Sasko Ristov

    2013-07-01

    The computer architecture and organization course is essential in all computer science and engineering programs, and it is one of the most frequently chosen and popular elective courses in related engineering disciplines. This popularity brings a new challenge: the instructor must invest considerable effort to explain rather complicated concepts to beginners or to students from related disciplines. Visual simulators can improve both teaching and learning. The overall goal is twofold: (1) to provide a visual environment for explaining the basic concepts and (2) to increase students' willingness and ability to learn the material. Many visual simulators have been used in the computer architecture and organization course. However, because of the lack of visual simulators for cache memory concepts, we have developed a new visual simulator, the EDUCache simulator. In this paper we show that it can be used effectively and efficiently as a supporting tool for learning about modern multi-layer, multi-cache, multi-core multiprocessors. EDUCache's features provide an environment for performance evaluation and engineering of software systems; students will thus also understand the importance of computer architecture building blocks and, hopefully, develop greater curiosity about hardware courses in general.
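    The basic concept that cache simulators such as EDUCache visualize, namely how an address maps onto a cache line and when a hit or miss occurs, fits in a few lines. This minimal Python sketch models a single direct-mapped cache; the function name, block size, and line count are hypothetical, and it is not EDUCache's implementation, which also covers multi-level and multi-core configurations.

```python
def simulate_direct_mapped(addresses, block_size=16, num_lines=4):
    """Return (hits, misses) for a direct-mapped cache with num_lines lines."""
    tags = [None] * num_lines            # one stored tag per cache line; None = empty
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size       # memory block that holds this byte
        index = block % num_lines        # the single line this block may occupy
        tag = block // num_lines         # distinguishes blocks that share the line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1                  # cold or conflict miss: fetch and replace
            tags[index] = tag
    return hits, misses


print(simulate_direct_mapped([0, 4, 16, 64, 0]))
```

    Addresses 0 and 64 both map to line 0, so the final access to address 0 misses again: a conflict miss of the kind a visual simulator makes easy to see.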

  12. Lightgrid-an agile distributed computing architecture for Geant4

    International Nuclear Information System (INIS)

    Young, Jason; Perry, John O.; Jevremovic, Tatjana

    2010-01-01

    A light weight grid based computing architecture has been developed to accelerate Geant4 computations on a variety of network architectures. This new software is called LightGrid. LightGrid has a variety of features designed to overcome current limitations on other grid based computing platforms, more specifically, smaller network architectures. By focusing on smaller, local grids, LightGrid is able to simplify the grid computing process with minimal changes to existing Geant4 code. LightGrid allows for integration between Geant4 and MySQL, which both increases flexibility in the grid as well as provides a faster, reliable, and more portable method for accessing results than traditional data storage systems. This unique method of data acquisition allows for more fault tolerant runs as well as instant results from simulations as they occur. The performance increases brought along by using LightGrid allow simulation times to be decreased linearly. LightGrid also allows for pseudo-parallelization with minimal Geant4 code changes.

  13. Efficient universal computing architectures for decoding neural activity.

    Directory of Open Access Journals (Sweden)

    Benjamin I Rapoport

    The ability to decode neural activity into meaningful control signals for prosthetic devices is critical to the development of clinically useful brain-machine interfaces (BMIs). Such systems require input from tens to hundreds of brain-implanted recording electrodes in order to deliver robust and accurate performance; in serving that primary function they should also minimize power dissipation in order to avoid damaging neural tissue; and they should transmit data wirelessly in order to minimize the risk of infection associated with chronic, transcutaneous implants. Electronic architectures for brain-machine interfaces must therefore minimize size and power consumption while maximizing the ability to compress data to be transmitted over limited-bandwidth wireless channels. Here we present a system of extremely low computational complexity, designed for real-time decoding of neural signals and suited for highly scalable implantable systems. Our programmable architecture is an explicit implementation of a universal computing machine emulating the dynamics of a network of integrate-and-fire neurons; it requires no arithmetic operations except for counting, and it decodes neural signals using only computationally inexpensive logic operations. The simplicity of this architecture does not compromise its ability to compress raw neural data by factors greater than [Formula: see text]. We describe a set of decoding algorithms based on this computational architecture, one designed to operate within an implanted system, minimizing its power consumption and data transmission bandwidth; and a complementary set of algorithms for learning, programming the decoder, and postprocessing the decoded output, designed to operate in an external, nonimplanted unit. The implementation of the implantable portion is estimated to require fewer than 5000 operations per second. A proof-of-concept, 32-channel field-programmable gate array (FPGA) implementation of this portion
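    The counting-only integrate-and-fire idea can be sketched as follows: each model neuron increments a counter for every input spike on its assigned channels and fires, then resets, when the counter reaches a threshold, so no arithmetic beyond increments and comparisons is needed. This minimal Python sketch is an illustration under stated assumptions (hypothetical class and function names, channel assignments, thresholds, and output labels; no leak term), not the authors' decoder.

```python
class CountingNeuron:
    """Integrate-and-fire neuron whose only arithmetic is incrementing a counter."""

    def __init__(self, inputs, threshold):
        self.inputs = frozenset(inputs)  # recording channels wired to this neuron (hypothetical)
        self.threshold = threshold
        self.count = 0

    def step(self, active_channels):
        """Process one time bin of spikes; return True if the neuron fires."""
        for ch in active_channels:
            if ch in self.inputs:
                self.count += 1           # integrate by counting input spikes
        if self.count >= self.threshold:  # a comparison, not arithmetic
            self.count = 0                # fire and reset
            return True
        return False


def decode(neurons, spike_bins):
    """For each time bin, return the names of the neurons that fired."""
    return [[name for name, n in neurons.items() if n.step(bin_)] for bin_ in spike_bins]


neurons = {
    "left": CountingNeuron(inputs={0, 1}, threshold=3),
    "right": CountingNeuron(inputs={2, 3}, threshold=2),
}
print(decode(neurons, [{0}, {1, 2}, {0, 1}, {3}, {2, 3}]))
```

    Each time bin is a set of channels that spiked; the decoded output is the sequence of labels whose neurons fired, which is far smaller than the raw spike stream and hints at how such a design can compress data before wireless transmission.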

  14. Human Computer Music Performance

    OpenAIRE

    Dannenberg, Roger B.

    2012-01-01

    Human Computer Music Performance (HCMP) is the study of music performance by live human performers and real-time computer-based performers. One goal of HCMP is to create a highly autonomous artificial performer that can fill the role of a human, especially in a popular music setting. This will require advances in automated music listening and understanding, new representations for music, techniques for music synchronization, real-time human-computer communication, music generation, sound synt...

  15. NET-COMPUTER: Internet Computer Architecture and its Application in E-Commerce

    Directory of Open Access Journals (Sweden)

    P. O. Umenne

    2012-12-01

    Research in Intelligent Agents has yielded interesting results, some of which have been translated into commercial ventures. Intelligent Agents are executable software components that represent the user, perform tasks on behalf of the user and, when the task terminates, send the result to the user. Intelligent Agents are best suited for the Internet: a collection of computers connected together in a world-wide computer network. The Swarm and HYDRA computer architectures for Agents’ execution were developed at the University of Surrey, UK, in the 1990s. The objective of the research was to develop a software-based computer architecture on which Agents’ execution could be explored. The combination of Intelligent Agents and the HYDRA computer architecture gave rise to a new computer concept: the NET-Computer, in which the computing resources reside on the Internet. The Internet computers form the hardware and software resources, and the user is provided with a simple interface to access the Internet and run user tasks. The Agents autonomously roam the Internet (the NET-Computer), executing the tasks. A growing segment of the Internet is E-Commerce for online shopping for products and services. The Internet computing resources provide a marketplace for product suppliers and consumers alike. Consumers are looking for suppliers selling products and services, while suppliers are looking for buyers. Searching the vast amount of information available on the Internet causes many problems for both consumers and suppliers. Intelligent Agents executing on the NET-Computer can surf the Internet and select specific information of interest to the user. The simulation results show that Intelligent Agents executing on the HYDRA computer architecture could be applied in E-Commerce.

  16. Teaching Computer Organization and Architecture Using Simulation and FPGA Applications

    OpenAIRE

    D. K.M. Al-Aubidy

    2007-01-01

    This paper presents the design concepts and realization of incorporating micro-operation simulation and FPGA implementation into a teaching tool for computer organization and architecture. This teaching tool helps computer engineering and computer science students become practically familiar with computer organization and architecture through the development of their own instruction set, computer programming and interfacing experiments. A two-pass assembler has been designed and implemente...

  17. Computer Architecture Techniques for Power-Efficiency

    CERN Document Server

    Kaxiras, Stefanos

    2008-01-01

    In the last few years, power dissipation has become an important design constraint, on par with performance, in the design of new computer systems. Whereas in the past, the primary job of the computer architect was to translate improvements in operating frequency and transistor count into performance, now power efficiency must be taken into account at every step of the design process. While for some time, architects have been successful in delivering 40% to 50% annual improvement in processor performance, costs that were previously brushed aside eventually caught up. The most critical of these

  18. A high performance architecture for accelerator controls

    International Nuclear Information System (INIS)

    Allen, M.; Hunt, S.M.; Lue, H.; Saltmarsh, C.G.; Parker, C.R.C.B.

    1991-01-01

    The demands placed on the Superconducting Super Collider (SSC) control system due to large distances, high bandwidth and fast response time required for operation will require a fresh approach to the data communications architecture of the accelerator. The prototype design effort aims at providing deterministic communication across the accelerator complex with a response time of < 100 ms and total bandwidth of 2 Gbits/sec. It will offer a consistent interface for a large number of equipment types, from vacuum pumps to beam position monitors, providing appropriate communications performance for each equipment type. It will consist of highly parallel links to all equipment: those with computing resources, non-intelligent direct control interfaces, and data concentrators. This system will give each piece of equipment a dedicated link of fixed bandwidth to the control system. Application programs will have access to all accelerator devices which will be memory mapped into a global virtual addressing scheme. Links to devices in the same geographical area will be multiplexed using commercial Time Division Multiplexing equipment. Low-level access will use reflective memory techniques, eliminating processing overhead and complexity of traditional data communication protocols. The use of commercial standards and equipment will enable a high performance system to be built at low cost

  19. A high performance architecture for accelerator controls

    International Nuclear Information System (INIS)

    Allen, M.; Hunt, S.M.; Lue, H.; Saltmarsh, C.G.; Parker, C.R.C.B.

    1991-03-01

    The demands placed on the Superconducting Super Collider (SSC) control system due to large distances, high bandwidth and fast response time required for operation will require a fresh approach to the data communications architecture of the accelerator. The prototype design effort aims at providing deterministic communication across the accelerator complex with a response time of <100 ms and total bandwidth of 2 Gbits/sec. It will offer a consistent interface for a large number of equipment types, from vacuum pumps to beam position monitors, providing appropriate communications performance for each equipment type. It will consist of highly parallel links to all equipment: those with computing resources, non-intelligent direct control interfaces, and data concentrators. This system will give each piece of equipment a dedicated link of fixed bandwidth to the control system. Application programs will have access to all accelerator devices which will be memory mapped into a global virtual addressing scheme. Links to devices in the same geographical area will be multiplexed using commercial Time Division Multiplexing equipment. Low-level access will use reflective memory techniques, eliminating processing overhead and complexity of traditional data communication protocols. The use of commercial standards and equipment will enable a high performance system to be built at low cost.

  20. Computation, architectural design and fabrication logic

    DEFF Research Database (Denmark)

    Larsen, Niels Martin

    2016-01-01

    Digital fabrication and digital form generation can change the way different professions interact in relation to the development and construction of architecture. The technologies can provide a more integrated design process and expand the architectural vocabulary. At Aarhus School of Architectur...

  1. High-performance computing using FPGAs

    CERN Document Server

    Benkrid, Khaled

    2013-01-01

    This book is concerned with the emerging field of High Performance Reconfigurable Computing (HPRC), which aims to harness the high performance and relatively low power of reconfigurable hardware, in the form of Field Programmable Gate Arrays (FPGAs), in High Performance Computing (HPC) applications. It presents the latest developments in this field from the points of view of applications, architecture, and tools and methodologies. We hope that this work will form a reference for existing researchers in the field, and entice new researchers and developers to join the HPRC community. The book includes: Thirteen application chapters which present the most important application areas tackled by high performance reconfigurable computers, namely: financial computing, bioinformatics and computational biology, data search and processing, stencil computation (e.g., computational fluid dynamics and seismic modeling), cryptanalysis, astronomical N-body simulation, and circuit simulation. Seven architecture chapters which...

  2. Outline of a novel architecture for cortical computation.

    Science.gov (United States)

    Majumdar, Kaushik

    2008-03-01

    In this paper, a novel architecture for cortical computation is proposed. This architecture is composed of computing paths consisting of neurons and synapses. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortical computation as well as in memory storage and retrieval, in keeping with the molecular basis of short and long term memory. A new learning scheme for the brain has also been proposed, and how it is implemented within the proposed architecture has been explained. A few mathematical results about the architecture are presented, some of them without proof.

  3. Thrifty: An Exascale Architecture for Energy Proportional Computing

    Energy Technology Data Exchange (ETDEWEB)

    Torrellas, Josep [Univ. of Illinois, Champaign, IL (United States)]

    2014-12-23

    The objective of this project is to design different aspects of a novel exascale architecture called Thrifty. Our goal is to focus on the challenges of power/energy efficiency, performance, and resiliency in exascale systems. The project includes work on computer architecture (Josep Torrellas from University of Illinois), compilation (Daniel Quinlan from Lawrence Livermore National Laboratory), runtime and applications (Laura Carrington from University of California San Diego), and circuits (Wilfred Pinfold from Intel Corporation). In this report, we focus on the progress at the University of Illinois during the last year of the grant (September 1, 2013 to August 31, 2014). We also point to the progress in the other collaborating institutions when needed.

  4. High-performance computing — an overview

    Science.gov (United States)

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  5. On Architectural Acoustics Design using Computer Simulation

    DEFF Research Database (Denmark)

    Schmidt, Anne Marie Due; Kirkegaard, Poul Henning

    2004-01-01

    The acoustical quality of a given building, or space within the building, is highly dependent on the architectural design. Architectural acoustics design has in the past been based on simple design rules. However, with a growing complexity in architectural acoustics and the emergence of potent ... room acoustic simulation programs it is now possible to subjectively analyze and evaluate acoustic properties prior to the actual construction of a facility. With the right tools applied, the acoustic design can become an integrated part of the architectural design process. The aim of the present paper ... this information is discussed. The conclusion of the paper is that the application of acoustical simulation programs is most beneficial in the last of three phases but that an application of the program to the first two phases would be preferable and possible with an improvement of the interface of the program...

  6. Progress in a novel architecture for high performance processing

    Science.gov (United States)

    Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin

    2018-04-01

    The high performance processing (HPP) architecture is an innovative architecture that targets high performance computing with excellent power efficiency and computing performance. It is suitable for data-intensive applications such as supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores, the first generation of HPP cores, has been successfully taped out in the Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The architecture shows much greater energy efficiency than the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvements in architecture. A chip with 32 HPP cores is being developed in the TSMC 16 nm FinFET Compact (FFC) technology process and is planned for commercial use. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).
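    The two headline figures for the 32-core chip can be cross-checked with simple arithmetic. A minimal sketch, assuming the peak throughput and the power-efficiency figure describe the same operating point (the abstract does not say so), divides one by the other to get the implied power draw:

```python
# Figures quoted in the abstract for the planned 32-core HPP chip.
PEAK_TFLOPS = 4.3               # peak performance, teraFLOPS
EFFICIENCY_GFLOPS_PER_W = 89.5  # power efficiency, GFLOPS per watt


def implied_power_watts(peak_tflops: float, gflops_per_watt: float) -> float:
    """Power implied by dividing peak throughput by efficiency.

    Assumes both figures refer to the same operating point, which the
    abstract does not state explicitly.
    """
    return peak_tflops * 1000.0 / gflops_per_watt  # convert TFLOPS to GFLOPS first


if __name__ == "__main__":
    watts = implied_power_watts(PEAK_TFLOPS, EFFICIENCY_GFLOPS_PER_W)
    print(f"implied power: {watts:.1f} W")  # implied power: 48.0 W
```

    If the efficiency figure was measured at a different operating point than the peak, the actual power at peak could differ from this estimate.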

  7. Biomorphic Multi-Agent Architecture for Persistent Computing

    Science.gov (United States)

    Lodding, Kenneth N.; Brewster, Paul

    2009-01-01

    A multi-agent software/hardware architecture, inspired by the multicellular nature of living organisms, has been proposed as the basis of design of a robust, reliable, persistent computing system. Just as a multicellular organism can adapt to changing environmental conditions and can survive despite the failure of individual cells, a multi-agent computing system, as envisioned, could adapt to changing hardware, software, and environmental conditions. In particular, the computing system could continue to function (perhaps at a reduced but still reasonable level of performance) if one or more component(s) of the system were to fail. One of the defining characteristics of a multicellular organism is unity of purpose. In biology, the purpose is survival of the organism. The purpose of the proposed multi-agent architecture is to provide a persistent computing environment in harsh conditions in which repair is difficult or impossible. A multi-agent, organism-like computing system would be a single entity built from agents or cells. Each agent or cell would be a discrete hardware processing unit that would include a data processor with local memory, an internal clock, and a suite of communication equipment capable of both local line-of-sight communications and global broadcast communications. Some cells, denoted specialist cells, could contain such additional hardware as sensors and emitters. Each cell would be independent in the sense that there would be no global clock, no global (shared) memory, no pre-assigned cell identifiers, no pre-defined network topology, and no centralized brain or control structure. Like each cell in a living organism, each agent or cell of the computing system would contain a full description of the system encoded as genes, but in this case, the genes would be components of a software genome.

  8. High Performance Systolic Array Core Architecture Design for DNA Sequencer

    Directory of Open Access Journals (Sweden)

    Saiful Nurdin Dayana

    2018-01-01

    This paper presents a high performance systolic array (SA) core architecture design for a Deoxyribonucleic Acid (DNA) sequencer. The core implements the Smith-Waterman (SW) algorithm with affine gap penalty scoring. This time-consuming local alignment algorithm guarantees optimal alignment between DNA sequences, but it requires quadratic computation time when performed on standard desktop computers. The use of a linear SA decreases the time complexity from quadratic to linear. In addition, with the exponential growth of DNA databases, the SA architecture is used to overcome the timing issue. In this work, the SW algorithm has been captured in Verilog Hardware Description Language (HDL) and simulated using the Xilinx ISIM simulator. The proposed design has been implemented in a Xilinx Virtex-6 Field Programmable Gate Array (FPGA), with a 90% reduction in core area.
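    For reference, the affine-gap Smith-Waterman recurrences (Gotoh's formulation) fit in a short sequential sketch. The scoring parameters are hypothetical defaults, not the values used in the paper. Each cell (i, j) depends only on its left, upper and upper-left neighbours, so all cells on one anti-diagonal are independent. A linear systolic array exploits this by giving each processing element one query character and sweeping the anti-diagonals, which is how the quadratic work finishes in linear time. The sketch computes the same scores one cell at a time:

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of strings a and b with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Scoring values are hypothetical defaults, not taken from the paper.
    """
    n, m = len(a), len(b)
    neg = float("-inf")
    # H: best score of an alignment ending at (i, j)
    # E: best score ending with a gap in `a` (horizontal move)
    # F: best score ending with a gap in `b` (vertical move)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap from H, or extend an existing gap of the same kind.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # Eight matches bridged by one gap of length 2: 16 - 3 - 1 = 12.
    print(smith_waterman_affine("AAAAGGGG", "AAAATTGGGG"))  # 12
```

    With gap_extend set equal to gap_open the same pair scores 10, because the two-character gap is charged twice the opening cost. Affine scoring makes one long gap cheaper than several short ones.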

  9. Biomimetic design processes in architecture: morphogenetic and evolutionary computational design

    International Nuclear Information System (INIS)

    Menges, Achim

    2012-01-01

    Design computation has profound impact on architectural design methods. This paper explains how computational design enables the development of biomimetic design processes specific to architecture, and how they need to be significantly different from established biomimetic processes in engineering disciplines. The paper first explains the fundamental difference between computer-aided and computational design in architecture, as the understanding of this distinction is of critical importance for the research presented. Thereafter, the conceptual relation and possible transfer of principles from natural morphogenesis to design computation are introduced and the related developments of generative, feature-based, constraint-based, process-based and feedback-based computational design methods are presented. This morphogenetic design research is then related to exploratory evolutionary computation, followed by the presentation of two case studies focusing on the exemplary development of spatial envelope morphologies and urban block morphologies. (paper)

  10. Design for scalability in 3D computer graphics architectures

    DEFF Research Database (Denmark)

    Holten-Lund, Hans Erik

    2002-01-01

    This thesis describes useful methods and techniques for designing scalable hybrid parallel rendering architectures for 3D computer graphics. Various techniques for utilizing parallelism in a pipelined system are analyzed. During the Ph.D study a prototype 3D graphics architecture named Hybris has...

  11. A heterogeneous hierarchical architecture for real-time computing

    Energy Technology Data Exchange (ETDEWEB)

    Skroch, D.A.; Fornaro, R.J.

    1988-12-01

    The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchical Architecture for Real-Time (H²ART) and system software for program loading and interprocessor communication.

  12. Memristor-based nanoelectronic computing circuits and architectures

    CERN Document Server

    Vourkas, Ioannis

    2016-01-01

    This book considers the design and development of nanoelectronic computing circuits, systems and architectures focusing particularly on memristors, which represent one of today’s latest technology breakthroughs in nanoelectronics. The book studies, explores, and addresses the related challenges and proposes solutions for the smooth transition from conventional circuit technologies to emerging computing memristive nanotechnologies. Its content spans from fundamental device modeling to emerging storage system architectures and novel circuit design methodologies, targeting advanced non-conventional analog/digital massively parallel computational structures. Several new results on memristor modeling, memristive interconnections, logic circuit design, memory circuit architectures, computer arithmetic systems, simulation software tools, and applications of memristors in computing are presented. High-density memristive data storage combined with memristive circuit-design paradigms and computational tools applied t...

  13. The Sentinel-4 detectors: architecture and performance

    Science.gov (United States)

    Skegg, Michael P.; Hermsen, Markus; Hohn, Rüdiger; Williges, Christian; Woffinden, Charles; Levillain, Yves; Reulke, Ralf

    2017-09-01

    The Sentinel-4 instrument is an imaging spectrometer, developed by Airbus under ESA contract in the frame of the joint European Union (EU)/ESA COPERNICUS program. SENTINEL-4 will provide accurate measurements of trace gases from geostationary orbit, including key atmospheric constituents such as ozone, nitrogen dioxide, sulfur dioxide, formaldehyde, as well as aerosol and cloud properties. Key to achieving these atmospheric measurements are the two CCD detectors, covering the wavelengths in the ranges 305 nm to 500 nm (UVVIS) and 750 nm to 775 nm (NIR) respectively. The paper describes the architecture and operation of these two CCD detectors, which have an unusually high full-well capacity and a very specific architecture and read-out sequence to match the requirements of the Sentinel-4 instrument. The key performance aspects and their verification through measurement are presented, with a focus on an unusual, bi-modal dark signal generation rate observed during test.

  14. LISA Mission and System architectures and performances

    International Nuclear Information System (INIS)

    Gath, Peter F; Weise, Dennis; Schulte, Hans-Reiner; Johann, Ulrich

    2009-01-01

    In the context of the LISA Mission Formulation Study, the LISA System was studied in detail and a new baseline architecture for the whole mission was established. This new baseline is the result of trade-offs at both mission and system level. The paper gives an overview of the different mission scenarios and configurations that were studied in connection with their corresponding advantages and disadvantages as well as performance estimates. Differences in the required technologies and their influence on the overall performance budgets are highlighted for all configurations. For the selected baseline concept, a more detailed description of the configuration is given and open issues in the technologies involved are discussed.

  15. LISA Mission and System architectures and performances

    Energy Technology Data Exchange (ETDEWEB)

    Gath, Peter F; Weise, Dennis; Schulte, Hans-Reiner; Johann, Ulrich, E-mail: peter.gath@astrium.eads.ne [Astrium GmbH Satellites, 88039 Friedrichshafen (Germany)]

    2009-03-01

    In the context of the LISA Mission Formulation Study, the LISA System was studied in detail and a new baseline architecture for the whole mission was established. This new baseline is the result of trade-offs at both mission and system level. The paper gives an overview of the different mission scenarios and configurations that were studied in connection with their corresponding advantages and disadvantages as well as performance estimates. Differences in the required technologies and their influence on the overall performance budgets are highlighted for all configurations. For the selected baseline concept, a more detailed description of the configuration is given and open issues in the technologies involved are discussed.

  16. Optimizations of Unstructured Aerodynamics Computations for Many-core Architectures

    KAUST Repository

    Al Farhan, Mohammed Ahmed

    2018-04-13

    We investigate several state-of-the-practice shared-memory optimization techniques applied to key routines of an unstructured computational aerodynamics application with irregular memory accesses. Using the Intel KNL processor as a representative of the processors in contemporary leading supercomputers, we illustrate how to identify and address performance challenges without compromising the floating-point numerics of the original code. We employ low- and high-level architecture-specific code optimizations involving thread- and data-level parallelism. Our approach is based upon a multi-level hierarchical distribution of work and data across both the threads and the SIMD units within every hardware core. On a 64-core KNL chip, we achieve nearly 2.9x speedup of the dominant routines relative to the baseline. These routines exhibit almost linear strong scalability up to 64 threads, and some further improvement with hyperthreading beyond that. At substantially lower power, we achieve up to 1.7x speedup relative to the performance of 72 threads of a 36-core Haswell CPU and roughly equivalent performance to 112 threads of a 56-core Skylake scalable processor. These optimizations are expected to be of value for many other unstructured mesh PDE-based scientific applications as multi- and many-core architectures evolve.
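    The two-level split of work across threads and SIMD units can be illustrated with index arithmetic alone. This is a minimal sketch, not the authors' code. It assumes a flat list of mesh items (for example edges), divided into contiguous per-thread ranges, with each range cut into blocks the width of the SIMD unit. The function name is hypothetical, and the 64-thread, 8-lane figures in the usage line are illustrative: 8 lanes matches 512-bit vectors of double-precision values.

```python
def hierarchical_partition(num_items, num_threads, simd_width):
    """Split item indices [0, num_items) into per-thread ranges, then SIMD-width blocks.

    Returns one list per thread; each list holds (start, stop) half-open blocks
    no wider than simd_width. Hypothetical helper for illustration only.
    """
    plan = []
    base, extra = divmod(num_items, num_threads)
    start = 0
    for t in range(num_threads):
        # The first `extra` threads take one extra item, so loads differ by at most one.
        stop = start + base + (1 if t < extra else 0)
        # Cut this thread's contiguous range into vector-width blocks.
        blocks = [(s, min(s + simd_width, stop)) for s in range(start, stop, simd_width)]
        plan.append(blocks)
        start = stop
    return plan


if __name__ == "__main__":
    print(hierarchical_partition(10, 3, 4))  # [[(0, 4)], [(4, 7)], [(7, 10)]]
    plan = hierarchical_partition(1000, 64, 8)  # e.g. 64 threads, 8 SIMD lanes
    print(len(plan), plan[0])
```

    Contiguous per-thread ranges keep each thread's data accesses local. The irregular gathers and scatters of an unstructured mesh still need separate handling, such as coloring or reordering, which this sketch does not model.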

  17. Computer system architecture for laboratory automation

    International Nuclear Information System (INIS)

    Penney, B.K.

    1978-01-01

    This paper describes the various approaches that may be taken to provide computing resources for laboratory automation. Three distinct approaches are identified, the single dedicated small computer, shared use of a larger computer, and a distributed approach in which resources are provided by a number of computers, linked together, and working in some cooperative way. The significance of the microprocessor in laboratory automation is discussed, and it is shown that it is not simply a cheap replacement of the minicomputer. (Auth.)

  18. A computational architecture for social agents

    Energy Technology Data Exchange (ETDEWEB)

    Bond, A.H. [California Institute of Technology, Pasadena, CA (United States)

    1996-12-31

    This article describes a new class of information-processing models for social agents. They are derived from primate brain architecture, the processing in brain regions, the interactions among brain regions, and the social behavior of primates. In another paper, we reviewed the neuroanatomical connections and functional involvements of cortical regions. We reviewed the evidence for a hierarchical architecture in the primate brain. By examining neuroanatomical evidence for connections among neural areas, we were able to establish anatomical regions and connections. We then examined evidence for specific functional involvements of the different neural areas and found some support for hierarchical functioning, not only for the perception hierarchies but also for the planning and action hierarchy in the frontal lobes.

  19. On architectural acoustic design using computer simulation

    DEFF Research Database (Denmark)

    Schmidt, Anne Marie Due; Kirkegaard, Poul Henning

    2004-01-01

    properties prior to the actual construction of a building. With the right tools applied, acoustic design can become an integral part of the architectural design process. The aim of this paper is to investigate the field of application that an acoustic simulation programme can have during an architectural ... acoustic design process. The emphasis is put on the first three out of five phases in the working process of the architect, and a case study is carried out in which each phase is represented by typical results, as exemplified with reference to the design of Bagsværd Church by Jørn Utzon. The paper ... discusses the advantages and disadvantages of the programme in each phase compared to the works of architects not using acoustic simulation programmes. The conclusion of the paper is that the application of acoustic simulation programs is most beneficial in the last of three phases but an application...

  20. Utilizing a multiprocessor architecture - The performance of MIDAS

    International Nuclear Information System (INIS)

    Maples, C.; Logan, D.; Meng, J.; Rathbun, W.; Weaver, D.

    1983-01-01

    The MIDAS architecture organizes multiple CPUs into clusters called distributed subsystems. Each subsystem consists of an array of processors controlled by a supervisory CPU. The multiprocessor array is composed of commercial CPUs (with floating point hardware) and specialized processing elements. Interprocessor communication within the array may occur either through switched memory modules or common shared memory. The architecture permits multiple processors to be focused on single problems. A distributed subsystem has been constructed and tested. It currently consists of a supervisor CPU; 16 blocks of independently switchable memory; 9 general purpose, VAX-class CPUs; and 2 specialized pipelined processors to handle I/O. Results on a variety of problems indicate that the subsystem performs 8 to 15 times faster than a standard computer with an identical CPU. The difference in performance represents the effect of differing CPU and I/O requirements

  1. The visual simulators for architecture and computer organization learning

    OpenAIRE

    Nikolić Boško; Grbanović Nenad; Đorđević Jovan

    2009-01-01

    The paper proposes a method for effective distance learning of computer architecture and organization. The proposed method is based on a software system that can be applied in any course in this field. Within this system, students can observe simulations of already created computer systems. The system also supports the creation and simulation of switch systems.

  2. High Performance Computing Multicast

    Science.gov (United States)

    2012-02-01

    A History of the Virtual Synchrony Replication Model,” in Replication: Theory and Practice, Charron-Bost, B., Pedone, F., and Schiper, A. (Eds...Performance Computing IP / IPv4 Internet Protocol (version 4.0) IPMC Internet Protocol MultiCast LAN Local Area Network MCMD Dr. Multicast MPI

  3. Towards Energy-Centric Computing and Computer Architecture

    CERN Multimedia

    CERN. Geneva

    2010-01-01

    Technology forecasts indicate that device scaling will continue well into the next decade. Unfortunately, it is becoming extremely difficult to translate this increase in the number of transistors into performance due to a number of technological, circuit, architectural, methodological and programming challenges. In this talk, I will argue that the key emerging showstopper is power. Voltage scaling as a means to maintain a constant power envelope with an increase in transistor numbers is hitting diminishing returns. As such, to continue riding Moore's law we need to look for drastic measures to cut power. This is definitely the case for server chips in future datacenters, where abundant server parallelism, redundancy and 3D chip integration are likely to remove programming, reliability and bandwidth hurdles, leaving power as the only true limiter. I will present results backing this argument based on validated models for f...
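    The talk's point about voltage scaling can be made concrete with the standard first-order model of CMOS dynamic power. This is a textbook relation, not taken from the talk itself. Here α is the activity factor, N the number of switching transistors, C_0 the capacitance each one switches, V_DD the supply voltage and f the clock frequency. Leakage power is ignored. Holding α, C_0, f and total power fixed while the transistor count grows from N to N' gives:

```latex
P_{\mathrm{dyn}} = \alpha \, N C_0 \, V_{DD}^{2} \, f
\quad\Longrightarrow\quad
\frac{P'_{\mathrm{dyn}}}{P_{\mathrm{dyn}}}
  = \frac{N'}{N}\left(\frac{V'_{DD}}{V_{DD}}\right)^{2} = 1
\quad\Longrightarrow\quad
V'_{DD} = V_{DD}\sqrt{\frac{N}{N'}}
```

    Doubling the transistor count (N' = 2N) therefore requires V'_DD = V_DD/√2, roughly a 29% supply reduction per doubling. V_DD cannot drop far below the transistor threshold voltage without losing speed or reliability, so each further doubling becomes harder to absorb this way. That is one way to read the "diminishing returns" the talk refers to.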

  4. A memory-array architecture for computer vision

    Energy Technology Data Exchange (ETDEWEB)

    Balsara, P.T.

    1989-01-01

    With the fast advances in the area of computer vision and robotics there is a growing need for machines that can understand images at a very high speed. A conventional von Neumann computer is not suited for this purpose because it takes a tremendous amount of time to solve most typical image processing problems. Exploiting the inherent parallelism present in various vision tasks can significantly reduce the processing time. Fortunately, parallelism is increasingly affordable as hardware gets cheaper. Thus it is now imperative to study computer vision in a parallel processing framework. The approach is first to design a computational structure that is well suited for a wide range of vision tasks and then to develop parallel algorithms that run efficiently on this structure. Recent advances in VLSI technology have led to several proposals for parallel architectures for computer vision. In this thesis he demonstrates that a memory array architecture with efficient local and global communication capabilities can be used for high speed execution of a wide range of computer vision tasks. This architecture, called the Access Constrained Memory Array Architecture (ACMAA), is efficient for VLSI implementation because of its modular structure, simple interconnect and limited global control. Several parallel vision algorithms have been designed for this architecture. The choice of vision problems demonstrates the versatility of ACMAA for a wide range of vision tasks. These algorithms were simulated on a high level ACMAA simulator running on the Intel iPSC/2 hypercube, a parallel architecture. The results of this simulation are compared with those of sequential algorithms running on a single hypercube node. Details of the ACMAA processor architecture are also presented.

  5. Optoelectronic Computer Architecture Development for Image Reconstruction

    National Research Council Canada - National Science Library

    Forber, Richard

    1996-01-01

    .... Specifically, we collaborated with UCSD and ERIM on the development of an optically augmented electronic computer for high speed inverse transform calculations to enable real time image reconstruction...

  6. Heavy Lift Vehicle (HLV) Avionics Flight Computing Architecture Study

    Science.gov (United States)

    Hodson, Robert F.; Chen, Yuan; Morgan, Dwayne R.; Butler, A. Marc; Sdhuh, Joseph M.; Petelle, Jennifer K.; Gwaltney, David A.; Coe, Lisa D.; Koelbl, Terry G.; Nguyen, Hai D.

    2011-01-01

    A NASA multi-Center study team was assembled from LaRC, MSFC, KSC, JSC and WFF to examine potential flight computing architectures for a Heavy Lift Vehicle (HLV) to better understand avionics drivers. The study examined Design Reference Missions (DRMs) and vehicle requirements that could impact the vehicle's avionics. The study considered multiple self-checking and voting architectural variants and examined reliability, fault-tolerance, mass, power, and redundancy management impacts. Furthermore, a goal of the study was to develop the skills and tools needed to rapidly assess additional architectures should requirements or assumptions change.
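    For the voting variants the study compared, the usual textbook starting point is triple modular redundancy (TMR), in which three identical channels feed a 2-of-3 majority voter. This is a minimal sketch, not the study's actual reliability model, which the abstract does not give. It assumes independent channel failures with equal reliability r and an optionally imperfect voter, computes R_TMR = 3r^2 - 2r^3, and shows the voting step:

```python
from math import comb


def k_of_n_reliability(r, n, k):
    """Probability that at least k of n independent modules (each reliability r) work."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def tmr_reliability(r, voter_r=1.0):
    """Triple modular redundancy with a 2-of-3 voter: voter_r * (3r^2 - 2r^3)."""
    return voter_r * k_of_n_reliability(r, 3, 2)


def majority_vote(a, b, c):
    """Return the value at least two channels agree on, or None if all three differ."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None  # no majority: a detected but uncorrectable disagreement


if __name__ == "__main__":
    print(round(tmr_reliability(0.9), 3))  # 0.972
    print(majority_vote(7, 7, 3))          # 7
```

    Under these assumptions TMR only beats a single channel when r > 0.5. For example, tmr_reliability(0.4) is 0.352, below the simplex value of 0.4. This is one reason such studies weigh redundancy against mass and power.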

  7. Design of Carborane Molecular Architectures via Electronic Structure Computations

    International Nuclear Information System (INIS)

    Oliva, J.M.; Serrano-Andres, L.; Klein, D.J.; Schleyer, P.V.R.; Mich, J.

    2009-01-01

    Quantum-mechanical electronic structure computations were employed to explore initial steps towards a comprehensive design of polycarborane architectures through assembly of molecular units. Aspects considered were (i) the striking modification of geometrical parameters through substitution, (ii) endohedral carboranes and proposed ejection mechanisms for energy/ion/atom storage/transport, (iii) the excited state character in single and dimeric molecular units, and (iv) higher architectural constructs. A goal of this work is to find optimal architectures where atom/ion/energy/spin transport within carborane superclusters is feasible, in order to modernize and improve future photo energy processes.

  8. Architectural analysis for wirelessly powered computing platforms

    NARCIS (Netherlands)

    Kapoor, A.; Pineda de Gyvez, J.

    2013-01-01

    We present a design framework for wirelessly powered generic computing platforms that takes into account various system parameters in response to a time-varying energy source. These parameters are the charging profile of the energy source, computing speed (fclk), digital supply voltage (VDD), energy

  9. MOMCC: Market-Oriented Architecture for Mobile Cloud Computing Based on Service Oriented Architecture

    OpenAIRE

    Abolfazli, Saeid; Sanaei, Zohreh; Gani, Abdullah; Shiraz, Muhammad

    2012-01-01

    The vision of augmenting the computing capabilities of mobile devices, especially smartphones, at the least cost is likely becoming reality by leveraging cloud computing. Cloud exploitation by mobile devices breeds a new research domain called Mobile Cloud Computing (MCC). However, issues like portability and interoperability should be addressed for mobile augmentation, which is a non-trivial task using component-based approaches. Service Oriented Architecture (SOA) is a promising design philosop...

  10. Explaining the gap between theoretical peak performance and real performance for supercomputer architectures

    International Nuclear Information System (INIS)

    Schoenauer, W.; Haefner, H.

    1993-01-01

    The basic architectures of vector and parallel computers and their properties are presented. Then the memory size and the arithmetic operations in the context of memory bandwidth are discussed. For the exemplary discussion of a single operation, micro-measurements of the vector triad for the IBM 3090 VF and the CRAY Y-MP/8 are presented. They reveal the details of the losses for a single operation. Then we analyze the global performance of a whole supercomputer by identifying reduction factors that bring down the theoretical peak performance to the poor real performance. The responsibilities of the manufacturer and of the user for these losses are discussed. Then the price-performance ratio for different architectures in a snapshot of January 1991 is briefly mentioned. Finally, some remarks on a user-friendly architecture for a supercomputer are made. (orig.)
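    The abstract names the vector triad but does not spell it out. The sketch below assumes the usual form a[i] = b[i] + c[i] * d[i], with two flops, three loads and one store per element. It shows why a memory-bound kernel falls short of peak: its rate is capped by bandwidth divided by bytes moved per flop. The peak, bandwidth and reduction-factor values are hypothetical placeholders, not measurements of the IBM 3090 VF or CRAY Y-MP/8. Treating reduction factors as multiplicative is also an assumption about how such factors combine.

```python
def triad(b, c, d):
    """Vector triad a[i] = b[i] + c[i] * d[i]: 2 flops, 3 loads and 1 store per element."""
    return [bi + ci * di for bi, ci, di in zip(b, c, d)]


def bandwidth_bound_mflops(peak_mflops, bandwidth_mb_s, word_bytes=8):
    """Attainable triad rate when limited by memory bandwidth (simple roofline-style model)."""
    bytes_per_iter = 4 * word_bytes  # 3 loads + 1 store
    flops_per_iter = 2               # one multiply, one add
    return min(peak_mflops, bandwidth_mb_s * flops_per_iter / bytes_per_iter)


def real_performance(peak, reduction_factors):
    """Multiply peak performance by a series of reduction factors in (0, 1]."""
    result = peak
    for factor in reduction_factors:
        result *= factor
    return result


if __name__ == "__main__":
    # Hypothetical machine: 1000 MFLOPS peak, 8000 MB/s memory bandwidth, 8-byte words.
    print(bandwidth_bound_mflops(1000, 8000))  # 500.0
```

    In this model the triad needs 16 bytes of memory traffic per flop. Unless bandwidth keeps pace with the arithmetic units, the triad runs at a fraction of peak however fast the floating-point hardware is.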

  11. Cloud Computing Security in Openstack Architecture: General Overview

    Directory of Open Access Journals (Sweden)

    Gleb Igorevich Shakulo

    2015-10-01

    The subject of this article is cloud computing security. The article begins with the author analyzing the advantages and disadvantages of cloud computing and the factors in its growth, both positive and negative. Among the latter, security is deemed one of the most prominent. Furthermore, the author takes the architecture of the OpenStack project as an example for study, describing its essential components and their interconnection. In conclusion, the author raises a series of questions as possible areas of further research to resolve security concerns, thus making cloud computing a more secure technology.

  12. Real-time FPGA architectures for computer vision

    Science.gov (United States)

    Arias-Estrada, Miguel; Torres-Huitzil, Cesar

    2000-03-01

    This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low-level image processing. The FPGA-based architecture takes advantage of the availability of registers in FPGAs to implement an efficient and compact module to process the convolutions. The architecture is designed to minimize the number of accesses to the image memory and is based on parallel modules with internal pipeline operation in order to improve its performance. The architecture is prototyped in an FPGA, but it can be implemented in dedicated VLSI to reach higher clock frequencies. Complexity issues, FPGA resource utilization, FPGA limitations, and real-time performance are discussed. Some results are presented and discussed.
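    A common way to minimize accesses to image memory is to use line buffers. Each pixel is read from memory once, and the most recent rows are kept on chip so every k x k window can be formed from buffered data. The sketch below models that idea in software and may differ from the paper's module design. It assumes a square mask, "valid" output (no border padding) and correlation form, meaning the mask is not flipped; this makes no difference for symmetric masks. The function name is hypothetical.

```python
from collections import deque


def stream_convolve(image, mask):
    """Valid 2-D convolution (correlation form) that reads each input pixel exactly once.

    Rows are streamed in order; a deque holding the k most recent rows stands in
    for the FPGA line buffers. Returns (output rows, number of image-memory reads).
    """
    k = len(mask)
    height, width = len(image), len(image[0])
    rows = deque(maxlen=k)  # line buffers: the oldest row drops out automatically
    reads = 0
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            row.append(image[y][x])  # the only access to image memory
            reads += 1
        rows.append(row)
        if len(rows) == k:
            # All k rows of the current window band are buffered; slide across columns.
            out_row = []
            for x in range(width - k + 1):
                acc = 0
                for dy in range(k):
                    for dx in range(k):
                        acc += mask[dy][dx] * rows[dy][x + dx]
                out_row.append(acc)
            out.append(out_row)
    return out, reads


if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(stream_convolve(img, identity))  # ([[6, 7], [10, 11]], 16)
```

    A naive implementation would fetch up to k * k pixels from memory per output. Here the read count equals the image size, and the window arithmetic is what the parallel pipelined modules would replicate in hardware.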

  13. Analysis OpenMP performance of AMD and Intel architecture for breaking waves simulation using MPS

    Science.gov (United States)

    Alamsyah, M. N. A.; Utomo, A.; Gunawan, P. H.

    2018-03-01

    A simulation of breaking waves in a closed domain, using the Navier-Stokes equations solved with the moving particle semi-implicit (MPS) method, is presented. The results show that parallel computing on a multicore architecture using the OpenMP platform can reduce the computational time to almost half of the serial time. Two computer architectures (AMD and Intel) are compared. The Intel architecture gives better CPU time than the AMD architecture; however, the AMD architecture gives slightly higher efficiency than the Intel. For the simulation with 1512 particles, the CPU times using Intel and AMD are 12662.47 and 28282.30, respectively. With the same number of particles, the efficiency is 50.09% for AMD and up to 49.42% for Intel.
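    The abstract reports CPU times and efficiencies without restating the definitions. Assuming the usual definitions, speedup S = T_serial / T_parallel and efficiency E = S / p for p threads, the calculation is sketched below. The timings are hypothetical, not the paper's data, and the abstract does not state the thread count used.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Parallel speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, n_threads: int) -> float:
    """Parallel efficiency E = S / p, as a fraction (multiply by 100 for %)."""
    return speedup(t_serial, t_parallel) / n_threads

# Hypothetical timings in seconds, not taken from the paper.
print(speedup(100.0, 50.0), efficiency(100.0, 50.0, 4))  # 2.0 0.5
```

    For example, a run that halves the serial time (S = 2.0) on a hypothetical 4 threads has an efficiency of 0.5, or 50%.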

  14. Performing stencil computations

    Energy Technology Data Exchange (ETDEWEB)

    Donofrio, David

    2018-01-16

    A method and apparatus for performing stencil computations efficiently are disclosed. In one embodiment, a processor receives an offset, and in response, retrieves a value from a memory via a single instruction, where the retrieving comprises: identifying, based on the offset, one of a plurality of registers of the processor; loading an address stored in the identified register; and retrieving from the memory the value at the address.
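    The claimed load path (offset selects a register, the register supplies an address, the address supplies a value) can be modelled in a few lines. This is a hypothetical software model, not the patented hardware. The class `OffsetLoadUnit`, the offset-to-register mapping (offset plus half the register count), and the three-point averaging stencil are all illustrative assumptions.

```python
class OffsetLoadUnit:
    """Hypothetical model of an offset-indexed load.

    An offset selects one of several registers, the address held in that
    register is read, and the value at that address is fetched from memory.
    """

    def __init__(self, memory, n_registers=3):
        self.memory = memory
        self.registers = [0] * n_registers
        self.bias = n_registers // 2  # maps offsets -1, 0, +1 to registers 0, 1, 2

    def set_base(self, base):
        """Point the registers at base-bias, ..., base+bias (assumed layout)."""
        self.registers = [base + k - self.bias for k in range(len(self.registers))]

    def load(self, offset):
        reg_index = offset + self.bias       # identify the register from the offset
        address = self.registers[reg_index]  # load the address stored in it
        return self.memory[address]          # retrieve the value at that address

def smooth(memory):
    """Three-point averaging stencil over the interior points of a 1D array."""
    unit = OffsetLoadUnit(memory)
    out = []
    for i in range(1, len(memory) - 1):
        unit.set_base(i)
        out.append(sum(unit.load(off) for off in (-1, 0, 1)) / 3)
    return out

print(smooth([0, 3, 6, 9]))  # [3.0, 6.0]
```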

  15. Layered Architectures for Quantum Computers and Quantum Repeaters

    Science.gov (United States)

    Jones, Nathan C.

    This chapter examines how to organize quantum computers and repeaters using a systematic framework known as layered architecture, where machine control is organized in layers associated with specialized tasks. The framework is flexible and could be used for analysis and comparison of quantum information systems. To demonstrate the design principles in practice, we develop architectures for quantum computers and quantum repeaters based on optically controlled quantum dots, showing how a myriad of technologies must operate synchronously to achieve fault-tolerance. Optical control makes information processing in this system very fast, scalable to large problem sizes, and extendable to quantum communication.

  16. Architectural design for a topological cluster state quantum computer

    International Nuclear Information System (INIS)

    Devitt, Simon J; Munro, William J; Nemoto, Kae; Fowler, Austin G; Stephens, Ashley M; Greentree, Andrew D; Hollenberg, Lloyd C L

    2009-01-01

    The development of a large scale quantum computer is a highly sought after goal of fundamental research and consequently a highly non-trivial problem. Scalability in quantum information processing is not just a problem of qubit manufacturing and control but it crucially depends on the ability to adapt advanced techniques in quantum information theory, such as error correction, to the experimental restrictions of assembling qubit arrays into the millions. In this paper, we introduce a feasible architectural design for large scale quantum computation in optical systems. We combine the recent developments in topological cluster state computation with the photonic module, a simple chip-based device that can be used as a fundamental building block for a large-scale computer. The integration of the topological cluster model with this comparatively simple operational element addresses many significant issues in scalable computing and leads to a promising modular architecture with complete integration of active error correction, exhibiting high fault-tolerant thresholds.

  17. Experimental comparison of two quantum computing architectures.

    Science.gov (United States)

    Linke, Norbert M; Maslov, Dmitri; Roetteler, Martin; Debnath, Shantanu; Figgatt, Caroline; Landsman, Kevin A; Wright, Kenneth; Monroe, Christopher

    2017-03-28

    We run a selection of algorithms on two state-of-the-art 5-qubit quantum computers that are based on different technology platforms. One is a publicly accessible superconducting transmon device (www.ibm.com/ibm-q) with limited connectivity, and the other is a fully connected trapped-ion system. Even though the two systems have different native quantum interactions, both can be programmed in a way that is blind to the underlying hardware, thus allowing a comparison of identical quantum algorithms between different physical systems. We show that quantum algorithms and circuits that use more connectivity clearly benefit from a better-connected system of qubits. Although the quantum systems here are not yet large enough to eclipse classical computers, this experiment exposes critical factors of scaling quantum computers, such as qubit connectivity and gate expressivity. In addition, the results suggest that codesigning particular quantum applications with the hardware itself will be paramount in successfully using quantum computers in the future.
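    The connectivity effect can be made concrete with a routing estimate. On a device with limited connectivity, a two-qubit gate between qubits that are d hops apart needs SWAPs to bring them together. The sketch below uses the naive estimate of d - 1 SWAPs per gate, routing each gate independently from the initial placement. The coupling maps (a 5-qubit linear chain and a fully connected 5-qubit graph) and the gate list are hypothetical, not the actual layouts of the devices compared.

```python
from collections import deque

def hop_distance(coupling, a, b):
    """Breadth-first-search distance (in couplings) between physical qubits a and b."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        q = queue.popleft()
        if q == b:
            return dist[q]
        for nb in coupling[q]:
            if nb not in dist:
                dist[nb] = dist[q] + 1
                queue.append(nb)
    raise ValueError(f"qubits {a} and {b} are not connected")

def naive_swap_count(coupling, gates):
    """Sum of (distance - 1) SWAPs, routing each two-qubit gate independently."""
    return sum(hop_distance(coupling, a, b) - 1 for a, b in gates)

# Hypothetical 5-qubit coupling maps.
linear = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
full = {i: [j for j in range(5) if j != i] for i in range(5)}
gates = [(0, 4), (1, 3), (0, 1)]

print(naive_swap_count(linear, gates))  # 3 + 1 + 0 = 4
print(naive_swap_count(full, gates))    # 0
```

    Every extra SWAP adds gates and therefore error, which is one way to see why circuits that need more connectivity favour the fully connected system.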

  18. Investigating Architectural Issues in Neuromorphic Computing

    Science.gov (United States)

    2009-06-01

    An example of this is Diffusion Tensor Imaging (DTI), a variant of fMRI, which detects water diffusion. DTI is routinely applied at medical... model computed for a subfield positioned over a section of the silhouette dog’s hind leg. The illustrated angles roughly correspond to orientation

  19. Performative Computation-aided Design Optimization

    Directory of Open Access Journals (Sweden)

    Ming Tang

    2012-12-01

    This article discusses a collaborative research and teaching project between the University of Cincinnati, Perkins+Will’s Tech Lab, and the University of North Carolina Greensboro. The primary investigation focuses on the simulation, optimization, and generation of architectural designs using performance-based computational design approaches. The projects examine various design methods, including relationships between building form, performance and the use of proprietary software tools for parametric design.

  20. Experimental Comparison of Two Quantum Computing Architectures

    Science.gov (United States)

    2017-03-28

    trap experiment on an independent quantum computer of identical size and comparable capability but with a different physical implementation at its core... locked laser. These optical controllers consist of an array of individual addressing beams and a counterpropagating global beam that illuminates... generally programmable. This allows identical quantum tasks or algorithms to be implemented on radically different technologies to inform further

  1. Contemporary high performance computing from petascale toward exascale

    CERN Document Server

    Vetter, Jeffrey S

    2013-01-01

    Contemporary High Performance Computing: From Petascale toward Exascale focuses on the ecosystems surrounding the world's leading centers for high performance computing (HPC). It covers many of the important factors involved in each ecosystem: computer architectures, software, applications, facilities, and sponsors. The first part of the book examines significant trends in HPC systems, including computer architectures, applications, performance, and software. It discusses the growth from terascale to petascale computing and the influence of the TOP500 and Green500 lists. The second part of the

  2. On Computational Fluid Dynamics Tools in Architectural Design

    DEFF Research Database (Denmark)

    Kirkegaard, Poul Henning; Hougaard, Mads; Stærdahl, Jesper Winther

    engineering computational fluid dynamics (CFD) simulation program ANSYS CFX and a CFD based representative program RealFlow are investigated. These two programs represent two types of CFD based tools available for use during phases of an architectural design process. However, as outlined in two case studies...

  3. The Design of a System Architecture for Mobile Multimedia Computers

    NARCIS (Netherlands)

    Havinga, Paul J.M.

    2000-01-01

    This chapter discusses the system architecture of a portable computer, called Mobile Digital Companion, which provides support for handling multimedia applications energy efficiently. Because battery life is limited and battery weight is an important factor for the size and the weight of the Mobile

  4. Cloud Computing Security in Openstack Architecture: General Overview

    OpenAIRE

    Gleb Igorevich Shakulo

    2015-01-01

    The subject of this article is cloud computing security. The article begins with the author's analysis of the advantages and disadvantages of cloud computing and the factors, both positive and negative, affecting its growth. Among the latter, security is deemed one of the most prominent. The author then takes the architecture of the OpenStack project as a case study, describing its essential components and their interconnections. In conclusion, the author raises a series of questions as possible areas of further research to resolve security c...

  5. CMS on the GRID: Toward a fully distributed computing architecture

    International Nuclear Information System (INIS)

    Innocente, Vincenzo

    2003-01-01

    The computing systems required to collect, analyse and store the physics data at the LHC would need to be distributed and global in scope. CMS is actively involved in several grid-related projects to develop and deploy a fully distributed computing architecture. We present here recent developments of tools for automating job submission and for serving data to remote analysis stations. Plans for further testing and deployment of a production grid are also described.

  6. Architectural design and energy performance; Conception architecturale et performance energetique

    Energy Technology Data Exchange (ETDEWEB)

    Beaud, Ph. [Agence de l'Environnement et de la Maitrise de l'Energie (ADEME), 06 - Valbonne (France)]; Pouget, A. [Bureau Etude Thermique, 75 - Paris (France)]; Sesolis, B. [TRIBU, 75 - Paris (France)]; [and others]

    2000-07-01

    This one-day meeting on the energy performance of architecture was organized in three parts. The first dealt with the design of new buildings and private houses; simulation tools for energy optimization and design practice are discussed. The second part was devoted to the new 2000 regulation, with an open discussion on regulatory costs. The last part looked ahead to developments up to 2015, taking into account the French programme to combat the greenhouse effect, the limitation of air-conditioning consumption, and the definition of a quality label for energy performance. (A.L.B.)

  7. Earth Science Computational Architecture for Multi-disciplinary Investigations

    Science.gov (United States)

    Parker, J. W.; Blom, R.; Gurrola, E.; Katz, D.; Lyzenga, G.; Norton, C.

    2005-12-01

    Understanding the processes underlying Earth's deformation and mass transport requires a non-traditional, integrated, interdisciplinary approach dependent on multiple space- and ground-based data sets, modeling, and computational tools. Currently, details of geophysical data acquisition, analysis, and modeling largely limit research to discipline domain experts. Interdisciplinary research requires a new computational architecture that is optimized to perform complex data processing of multiple solid-Earth science data types in a user-friendly environment. A web-based computational framework is being developed and integrated with applications for automatic interferometric radar processing, and models for high-resolution deformation and gravity, forward models of viscoelastic mass loading over short wavelengths and complex time histories, forward-inverse codes for characterizing surface loading-response over time scales of days to tens of thousands of years, and inversion of combined space magnetic and gravity fields to constrain deep crustal and mantle properties. This framework combines an adaptation of the QuakeSim distributed services methodology with the Pyre framework for multiphysics development. The system uses a three-tier architecture, with a middle-tier server that manages user projects, available resources, and security. This ensures scalability to very large networks of collaborators. Users log into a web page and have a personal project area, persistently maintained between connections, for each application. Upon selection of an application and host from a list of available entities, inputs may be uploaded or constructed from web forms and available data archives, including gravity, GPS and imaging radar data. The user is notified of job completion and directed to results posted via URLs.
Interdisciplinary work is supported through easy availability of all applications via common browsers, application tutorials and reference guides, and worked examples with

  8. Performance analysis of IMS based LTE and WIMAX integration architectures

    Directory of Open Access Journals (Sweden)

    A. Bagubali

    2016-12-01

    In the current networking field, much research is under way on the integration of different wireless technologies, with the aim of providing users with uninterrupted connectivity anywhere, at the high data rates that increased demand requires. Moreover, the number of objects connected by wireless interfaces, such as smart devices, industrial machines and smart homes, is increasing dramatically with the evolution of cloud computing and Internet of Things technology. This paper begins with the challenges involved in such integration and then explains the role of different couplings and architectures. It also proposes further improvements to the LTE and WiMAX integration architectures to provide seamless vertical handover and flexible quality of service for voice, video and multimedia services over IP networks, together with mobility management, with the help of IMS networks. Various parameters, such as handover delay, signalling cost and packet loss, are evaluated, and the performance of the interworking architecture is analysed from the simulation results. Finally, the paper concludes that the cross-layer scenario performs better than the non-cross-layer scenario.

  9. Building and measuring a high performance network architecture

    Energy Technology Data Exchange (ETDEWEB)

    Kramer, William T.C.; Toole, Timothy; Fisher, Chuck; Dugan, Jon; Wheeler, David; Wing, William R; Nickless, William; Goddard, Gregory; Corbato, Steven; Love, E. Paul; Daspit, Paul; Edwards, Hal; Mercer, Linden; Koester, David; Decina, Basil; Dart, Eli; Reisinger, Paul; Kurihara, Riki; Zekauskas, Matthew J; Plesset, Eric; Wulf, Julie; Luce, Douglas; Rogers, James; Duncan, Rex; Mauth, Jeffery

    2001-04-20

    Once a year, the SC conferences present a unique opportunity to create and build one of the most complex and highest performance networks in the world. At SC2000, large-scale and complex local and wide area networking connections were demonstrated, including large-scale distributed applications running on different architectures. This project was designed to use the unique opportunity presented at SC2000 to create a testbed network environment and then use that network to demonstrate and evaluate high performance computational and communication applications. This testbed was designed to incorporate many interoperable systems and services and was designed for measurement from the very beginning. The end results were key insights into how to use novel, high performance networking technologies and to accumulate measurements that will give insights into the networks of the future.

  10. An Adaptive Middleware for Improved Computational Performance

    DEFF Research Database (Denmark)

    Bonnichsen, Lars Frydendal

    The performance improvements in computer systems over the past 60 years have been fueled by an exponential increase in energy efficiency. In recent years, the phenomenon known as the end of Dennard scaling has slowed energy-efficiency improvements, but improving computer energy efficiency is more important now than ever. Traditionally, most improvements in computer energy efficiency have come from improvements in lithography (the ability to produce smaller transistors) and computer architecture (the ability to apply those transistors efficiently). Since the end of scaling, we have seen ... we are improving computational performance by exploiting modern hardware features, such as dynamic voltage-frequency scaling and transactional memory. Adapting software is an iterative process, requiring that we continually revisit it to meet new requirements or realities; a time-consuming process ...
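    For readers unfamiliar with why dynamic voltage-frequency scaling saves energy, the standard first-order CMOS dynamic-power relation is a useful reference. It is textbook background, not a result of this thesis. Here α is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency:

```latex
P_{\mathrm{dyn}} = \alpha\, C\, V^{2} f,
\qquad
E_{\mathrm{cycle}} = \frac{P_{\mathrm{dyn}}}{f} = \alpha\, C\, V^{2}
```

    Because a lower f usually permits a lower V, and the energy per cycle scales with V², running slower at reduced voltage can cut energy per operation even though runtime grows.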

  11. The Architectural Designs of a Nanoscale Computing Model

    Directory of Open Access Journals (Sweden)

    Mary M. Eshaghian-Wilner

    2004-08-01

    A generic nanoscale computing model is presented in this paper. The model consists of a collection of fully interconnected nanoscale computing modules, where each module is a cube of cells made out of quantum dots, spins, or molecules. The cells dynamically switch between two states by quantum interactions among their neighbors in all three dimensions. This paper includes a brief introduction to the field of nanotechnology from a computing point of view and presents a set of preliminary architectural designs for fabricating the nanoscale model studied.
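    As a rough classical analogue only, a module's cube of cells can be pictured as a three-dimensional two-state cellular automaton. The update rule below is a hypothetical stand-in: a cell takes the majority state of its six face neighbours, and ties keep the current state. The paper's cells switch through quantum interactions, which this sketch does not model.

```python
import itertools

FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def step(cube):
    """One synchronous update of an n x n x n cube of two-state cells (0 or 1).

    Hypothetical rule: each cell adopts the majority state of its in-bounds
    face neighbours; a tie leaves the cell unchanged. The input is not modified.
    """
    n = len(cube)
    new = [[row[:] for row in plane] for plane in cube]
    for x, y, z in itertools.product(range(n), repeat=3):
        ones = zeros = 0
        for dx, dy, dz in FACE_NEIGHBOURS:
            i, j, k = x + dx, y + dy, z + dz
            if 0 <= i < n and 0 <= j < n and 0 <= k < n:
                if cube[i][j][k]:
                    ones += 1
                else:
                    zeros += 1
        if ones > zeros:
            new[x][y][z] = 1
        elif zeros > ones:
            new[x][y][z] = 0
    return new
```

    For example, an isolated 1 at the centre of an otherwise all-zero 3 x 3 x 3 cube is outvoted and disappears after one step, while an all-ones cube is a fixed point.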

  12. Performance evaluation of microservices architectures using containers

    OpenAIRE

    Amaral, Marcelo; Polo, Jordà; Carrera Pérez, David; Mohomed, Iqbal; Unuvar, Merve; Steinder, Malgorzata

    2015-01-01

    Microservices architecture has started a new trend for application development for a number of reasons: (1) to reduce complexity by using tiny services; (2) to scale, remove and deploy parts of the system easily; (3) to improve flexibility to use different frameworks and tools; (4) to increase the overall scalability; and (5) to improve the resilience of the system. Containers have empowered the usage of microservices architectures by being lightweight, providing fast start-up times, and havi...

  13. Nanotube devices based crossbar architecture: toward neuromorphic computing

    International Nuclear Information System (INIS)

    Zhao, W S; Gamrat, C; Agnus, G; Derycke, V; Filoramo, A; Bourgoin, J-P

    2010-01-01

    Nanoscale devices such as carbon nanotube and nanowires based transistors, memristors and molecular devices are expected to play an important role in the development of new computing architectures. While their size represents a decisive advantage in terms of integration density, it also raises the critical question of how to efficiently address large numbers of densely integrated nanodevices without the need for complex multi-layer interconnection topologies similar to those used in CMOS technology. Two-terminal programmable devices in crossbar geometry seem particularly attractive, but suffer from severe addressing difficulties due to cross-talk, which implies complex programming procedures. Three-terminal devices can be easily addressed individually, but with limited gain in terms of interconnect integration. We show how optically gated carbon nanotube devices enable efficient individual addressing when arranged in a crossbar geometry with shared gate electrodes. This topology is particularly well suited for parallel programming or learning in the context of neuromorphic computing architectures.

  14. Network architecture test-beds as platforms for ubiquitous computing.

    Science.gov (United States)

    Roscoe, Timothy

    2008-10-28

    Distributed systems research, and in particular ubiquitous computing, has traditionally assumed the Internet as a basic underlying communications substrate. Recently, however, the networking research community has come to question the fundamental design or 'architecture' of the Internet. This has been led by two observations: first, that the Internet as it stands is now almost impossible to evolve to support new functionality; and second, that modern applications of all kinds now use the Internet rather differently, and frequently implement their own 'overlay' networks above it to work around its perceived deficiencies. In this paper, I discuss recent academic projects to allow disruptive change to the Internet architecture, and also outline a radically different view of networking for ubiquitous computing that such proposals might facilitate.

  15. Virtual Prototyping and Performance Analysis of Two Memory Architectures

    Directory of Open Access Journals (Sweden)

    Huda S. Muhammad

    2009-01-01

    The gap between CPU and memory speed has always been a critical concern that motivated researchers to study and analyze the performance of memory hierarchical architectures. In the early stages of the design cycle, performance evaluation methodologies can be used to support exploration at the architectural level and assist in making early design tradeoffs. In this paper, we use simulation platforms developed using the VisualSim tool to compare the performance of two memory architectures, namely, the Direct Connect architecture of the Opteron and the Shared Bus of the Xeon multicore processors. Key variations exist between the two memory architectures, and both design approaches provide rich platforms that call for the early use of virtual system prototyping and simulation techniques to assess performance at an early stage in the design cycle.

  16. The path toward HEP High Performance Computing

    International Nuclear Information System (INIS)

    Apostolakis, John; Brun, René; Gheata, Andrei; Wenzel, Sandro; Carminati, Federico

    2014-01-01

    High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yielded limited results, and recent studies have shown that, on modern architectures, it achieves between 10% and 50% of peak performance. Although several successful attempts have been made to port selected codes to GPUs, no major HEP code suite has a 'High Performance' implementation. With the LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP can no longer neglect the less-than-optimal performance of its code and has to try to make the best use of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 projects. The activity of the experiments is shared and coordinated via a Concurrency Forum, where experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a high-performance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or, with a much larger effort, at track level. Apart from the shareable data structures, this typically implies a multiplication factor in memory consumption compared to the single-threaded version, together with sub-optimal handling of event processing tails. Besides this, the low-level instruction pipelining of modern processors cannot be used efficiently to speed up the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine-grain parallel approach. The talk will review the current optimisation activities within the SFT group with a particular emphasis on the development perspectives towards a simulation framework able to profit
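    The idea of scheduling vectors ("baskets") of particles to several computing resources can be sketched as follows. This is a minimal illustration under stated assumptions, not Geant-V code: `make_baskets`, `transport_step` and the straight-line 1D push are hypothetical, and a thread pool stands in for the computing resources.

```python
from concurrent.futures import ThreadPoolExecutor

def make_baskets(particles, basket_size):
    """Split a flat list of particles into fixed-size baskets (vectors)."""
    return [particles[i:i + basket_size] for i in range(0, len(particles), basket_size)]

def transport_step(basket, dt=0.1):
    """Hypothetical kernel: push each (position, velocity) particle by velocity * dt.

    The same operation is applied across a whole basket, which is the data
    layout that lets a compiled implementation use SIMD units.
    """
    return [(x + v * dt, v) for x, v in basket]

def schedule(particles, basket_size, workers):
    """Dispatch baskets to a pool of workers and gather results in order."""
    baskets = make_baskets(particles, basket_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for done in pool.map(transport_step, baskets) for p in done]

print(schedule([(0.0, 10.0), (1.0, 0.0), (2.0, -10.0)], basket_size=2, workers=2))
```

    In CPython the global interpreter lock prevents threads from speeding up pure-Python arithmetic, so the sketch shows the scheduling structure, not a performance gain.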

  17. Architecture of 32 bit CISC (Complex Instruction Set Computer) microprocessors

    International Nuclear Information System (INIS)

    Jove, T.M.; Ayguade, E.; Valero, M.

    1988-01-01

    In this paper we describe the main architectural features of the best-known 32-bit CISC microprocessors: the i80386, the MC68000 family, the NS32000 series and the Z80000. We focus on high-level language support, operating-system design facilities, memory management, techniques to speed up overall performance, and program debugging facilities. (Author)

  18. Parallel algorithms and architecture for computation of manipulator forward dynamics

    Science.gov (United States)

    Fijany, Amir; Bejczy, Antal K.

    1989-01-01

    Parallel computation of manipulator forward dynamics is investigated. Considering three classes of algorithms for the solution of the problem, namely the O(n), the O(n²), and the O(n³) algorithms, parallelism in the problem is analyzed. It is shown that the problem belongs to the class NC and that the time and processor bounds are O(log² n) and O(n⁴), respectively. However, the fastest stable parallel algorithms achieve a computation time of O(n) and can be derived by parallelization of the O(n³) serial algorithms. Parallel computation of the O(n³) algorithms requires the development of parallel algorithms for a set of fundamentally different problems, namely the Newton-Euler formulation, the computation of the inertia matrix, the decomposition of the symmetric, positive definite matrix, and the solution of triangular systems. Parallel algorithms for this set of problems are developed which can be efficiently implemented on a single architecture, a triangular array of n(n+2)/2 processors with a simple nearest-neighbor interconnection. This architecture is particularly suitable for VLSI and WSI implementations. Compared to the best serial O(n) algorithm, the developed parallel algorithm achieves an asymptotic speedup of more than two orders of magnitude in the computation of the forward dynamics.
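    The last two steps of the O(n³) approach, decomposing the symmetric positive definite inertia matrix and solving the resulting triangular systems, can be sketched serially. This assumes the inertia matrix M and the bias forces (which a Newton-Euler pass would supply) are already known, and it uses Cholesky factorisation as one standard choice of decomposition. The function names are hypothetical and nothing here is parallel. The equation solved is M qdd = tau - bias.

```python
import math

def cholesky(M):
    """Return lower-triangular L with M = L L^T (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L

def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(L)
    for i in range(len(L)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def back_substitution_transpose(L, y):
    """Solve L^T x = y using the lower-triangular L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def forward_dynamics(M, tau, bias):
    """Joint accelerations qdd from M qdd = tau - bias."""
    rhs = [t - b for t, b in zip(tau, bias)]
    L = cholesky(M)
    return back_substitution_transpose(L, forward_substitution(L, rhs))

# Hypothetical 2-joint example: expected qdd is approximately [1.25, 1.5].
print(forward_dynamics([[4.0, 2.0], [2.0, 3.0]], tau=[10.0, 8.0], bias=[2.0, 1.0]))
```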

  19. Methodology of modeling and measuring computer architectures for plasma simulations

    Science.gov (United States)

    Wang, L. P. T.

    1977-01-01

    A brief introduction to plasma simulation using computers, and to the difficulties of doing so on currently available computers, is given. Through the use of an analyzing and measuring methodology, SARA, the control flow and data flow of a particle simulation model, REM2-1/2D, are exemplified. After recursive refinements, the total execution time may be greatly shortened and a fully parallel data flow can be obtained. From this data flow, a matched computer architecture or organization could be configured to achieve the computation bound of an application problem. A sequential-type simulation model, an array/pipeline-type simulation model, and a fully parallel simulation model of the code REM2-1/2D are proposed and analyzed. This methodology can be applied to other application problems that have an implicitly parallel nature.

  20. On the impact of approximate computation in an analog DeSTIN architecture.

    Science.gov (United States)

    Young, Steven; Lu, Junjie; Holleman, Jeremy; Arel, Itamar

    2014-05-01

    Deep machine learning (DML) holds the potential to revolutionize machine learning by automating rich feature extraction, which has become the primary bottleneck of human engineering in pattern recognition systems. However, the heavy computational burden renders DML systems implemented on conventional digital processors impractical for large-scale problems. The highly parallel computations required to implement large-scale deep learning systems are well suited to custom hardware. Analog computation has demonstrated power efficiency advantages of multiple orders of magnitude relative to digital systems while performing nonideal computations. In this paper, we investigate typical error sources introduced by analog computational elements and their impact on system-level performance in DeSTIN, a compositional deep learning architecture. These inaccuracies are evaluated on a pattern classification benchmark, clearly demonstrating the robustness of the underlying algorithm to the errors introduced by analog computational elements. A clear understanding of the impacts of nonideal computations is necessary to fully exploit the efficiency of analog circuits.
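    The kind of robustness study described can be sketched with a much simpler model than DeSTIN. In this hypothetical setup, a template-matching classifier computes dot products, and each multiply picks up additive Gaussian error of standard deviation `sigma`, a crude stand-in for analog nonidealities. Robustness is measured as the fraction of decisions that match the noise-free classifier. The templates, samples and noise model are all illustrative assumptions.

```python
import random

def noisy_dot(w, x, sigma, rng):
    """Dot product in which every multiply picks up additive Gaussian error.

    This is a crude, hypothetical stand-in for analog nonidealities.
    """
    return sum(wi * xi + rng.gauss(0.0, sigma) for wi, xi in zip(w, x))

def classify(templates, x, sigma, rng):
    """Index of the template with the highest (possibly noisy) score."""
    scores = [noisy_dot(w, x, sigma, rng) for w in templates]
    return max(range(len(scores)), key=scores.__getitem__)

def agreement(templates, samples, sigma, seed=0):
    """Fraction of samples whose noisy decision matches the noise-free one."""
    rng = random.Random(seed)
    matches = sum(
        classify(templates, x, sigma, rng) == classify(templates, x, 0.0, rng)
        for x in samples
    )
    return matches / len(samples)

templates = [[1.0, 0.0], [0.0, 1.0]]
samples = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
print(agreement(templates, samples, sigma=0.0))  # 1.0
print(agreement(templates, samples, sigma=0.5))
```

    Sweeping `sigma` upward and plotting the agreement gives a simple error-tolerance curve of the kind such a study might report.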

  1. A Methodology for Making Early Comparative Architecture Performance Evaluations

    Science.gov (United States)

    Doyle, Gerald S.

    2010-01-01

    Complex and expensive systems' development suffers from a lack of method for making good system-architecture-selection decisions early in the development process. Failure to make a good system-architecture-selection decision increases the risk that a development effort will not meet cost, performance and schedule goals. This research provides a…

  2. Architecture and VHDL behavioural validation of a parallel processor dedicated to computer vision

    International Nuclear Information System (INIS)

    Collette, Thierry

    1992-01-01

    Image processing is mainly sped up using parallel computers; SIMD (single instruction stream, multiple data stream) processors have been developed and have proven highly efficient for low-level image processing operations. Nevertheless, their performance drops for most intermediate- or high-level operations, mainly when random data reorganisations in processor memories are involved. The aim of this thesis was to extend SIMD computer capabilities so that such machines perform more efficiently at the intermediate level of image processing. The study of some representative algorithms of this class points out the limits of this computer. Nevertheless, these limits can be removed by architectural modifications. This leads us to propose SYMPATIX, a new SIMD parallel computer. To validate its new concept, a behavioural model written in VHDL (a Hardware Description Language) has been elaborated. With this model, the new computer's performance has been estimated by running image processing algorithm simulations. The VHDL modelling approach allows a top-down electronic design of the system, giving an easy coupling between system architectural modifications and their electronic cost. The results show SYMPATIX to be an efficient computer for low- and intermediate-level image processing. It can be connected to a high-level computer, opening up the development of new computer vision applications. This thesis also presents a top-down design method, based on VHDL, intended for electronic system architects. (author) [fr]

  3. Missile signal processing common computer architecture for rapid technology upgrade

    Science.gov (United States)

    Rabinkin, Daniel V.; Rutledge, Edward; Monticciolo, Paul

    2004-10-01

    Interceptor missiles process IR images to locate an intended target and guide the interceptor towards it. Signal processing requirements have increased as the sensor bandwidth increases and interceptors operate against more sophisticated targets. A typical interceptor signal processing chain comprises two parts. Front-end video processing operates on all pixels of the image and performs such operations as non-uniformity correction (NUC), image stabilization, frame integration and detection. Back-end target processing, which tracks and classifies targets detected in the image, performs such algorithms as Kalman tracking, spectral feature extraction and target discrimination. In the past, video processing was implemented using ASIC components or FPGAs because computation requirements exceeded the throughput of general-purpose processors. Target processing was performed using hybrid architectures that included ASICs, DSPs and general-purpose processors. The resulting systems tended to be function-specific and required custom software development. They were developed using non-integrated toolsets, and test equipment was developed along with the processor platform. The lifespan of a system utilizing the signal processing platform often spans decades, while the specialized nature of processor hardware and software makes it difficult and costly to upgrade. As a result, the signal processing systems often run on outdated technology, algorithms are difficult to update, and system effectiveness is impaired by the inability to rapidly respond to new threats. A new design approach is made possible by three developments: Moore's Law-driven improvement in computational throughput; a newly introduced vector computing capability in general-purpose processors; and a modern set of open interface software standards. Today's multiprocessor commercial-off-the-shelf (COTS) platforms have sufficient throughput to support interceptor signal processing requirements.
This application
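
    Non-uniformity correction is the first front-end step named in this abstract. The abstract does not say which NUC method is used; a common textbook choice is a two-point (per-pixel gain and offset) correction calibrated against two uniform reference scenes. The Python sketch below assumes that method, a linear pixel response and no dead pixels; the names two_point_nuc and apply_nuc are hypothetical.

```python
from statistics import mean


def two_point_nuc(cold, hot):
    """Compute per-pixel (gain, offset) tables from two flat-field frames.

    cold, hot: 2-D lists of raw pixel values recorded while the sensor views
    uniform low- and high-temperature references (assumed calibration setup).
    Each pixel is mapped so that both references land on the array-wide mean.
    """
    cold_mean = mean(v for row in cold for v in row)
    hot_mean = mean(v for row in hot for v in row)
    gain, offset = [], []
    for c_row, h_row in zip(cold, hot):
        g_row, o_row = [], []
        for c, h in zip(c_row, h_row):
            g = (hot_mean - cold_mean) / (h - c)  # assumes h != c (no dead pixels)
            g_row.append(g)
            o_row.append(cold_mean - g * c)
        gain.append(g_row)
        offset.append(o_row)
    return gain, offset


def apply_nuc(frame, gain, offset):
    """Apply the linear per-pixel correction y = gain * x + offset."""
    return [[g * x + o for x, g, o in zip(f_row, g_row, o_row)]
            for f_row, g_row, o_row in zip(frame, gain, offset)]


# Illustrative 2 x 2 calibration frames (made-up values).
cold = [[10.0, 12.0], [9.0, 11.0]]
hot = [[30.0, 36.0], [25.0, 33.0]]
gain, offset = two_point_nuc(cold, hot)
print(apply_nuc(cold, gain, offset))  # every entry is (approximately) the cold-frame mean, 10.5
```

    Because the correction is a fixed per-pixel multiply-add applied to every pixel of every frame, it is the kind of regular, data-parallel work that the vector units mentioned above are suited to.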

  4. Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

    Science.gov (United States)

    2017-10-04

    to the memory architectures of CPUs and GPUs to obtain good performance and result in good memory performance using cache management. These methods ... Accomplishments: The PI and students have developed new methods for path and ray tracing and their ... Report Date: 14-Oct-2017 ... The efficiency of our method makes it a good candidate for forming hybrid schemes with wave-based models. One possibility is to couple the ray curve...

  5. Blackboard architecture and qualitative model in a computer aided assistant designed to define computers for HEP computing

    International Nuclear Information System (INIS)

    Nodarse, F.F.; Ivanov, V.G.

    1991-01-01

    Using a BLACKBOARD architecture and a qualitative model, an expert system was developed to assist the user in defining computers for High Energy Physics computing. The COMEX system requires an IBM AT personal computer or compatible with more than 640 KB of RAM and a hard disk. 5 refs.; 9 figs

  6. Memristor-Based Synapse Design and Training Scheme for Neuromorphic Computing Architecture

    Science.gov (United States)

    2012-06-01

    system level built upon the conventional Von Neumann computer architecture [2][3]. Developing the neuromorphic architecture at chip level by ... creation of memristor-based neuromorphic computing architecture. Rather than the existing crossbar-based neural network designs, we focus on memristor... (Contract number FA8750-11-2-0046; program element number 62788F.)

  7. Performance evaluation of enterprise architecture using fuzzy sequence diagram

    Directory of Open Access Journals (Sweden)

    Mohammad Atasheneh

    2014-01-01

    Developing an enterprise architecture is a complex task, and to control the complexity of the regulatory framework we need to measure the relative performance of one system against other available systems. On the other hand, an enterprise architecture cannot be organized without a logical structure. The framework provides a logical structure for classifying architectural output. Among the common architectural frameworks, the C4ISR framework and its product methodology are among the most popular techniques. In this paper, given the existing uncertainties in system development and information systems, a new version of UML called Fuzzy-UML, based on fuzzy Petri nets, is proposed for enterprise architecture development. In addition, the performance of the system is evaluated using a fuzzy sequence diagram.

  8. Contagious architecture: computation, aesthetics, and space (technologies of lived abstraction)

    CERN Document Server

    Parisi, Luciana

    2013-01-01

    In Contagious Architecture, Luciana Parisi offers a philosophical inquiry into the status of the algorithm in architectural and interaction design. Her thesis is that algorithmic computation is not simply an abstract mathematical tool but constitutes a mode of thought in its own right, in that its operation extends into forms of abstraction that lie beyond direct human cognition and control. These include modes of infinity, contingency, and indeterminacy, as well as incomputable quantities underlying the iterative process of algorithmic processing. The main philosophical source for the project is Alfred North Whitehead, whose process philosophy is specifically designed to provide a vocabulary for "modes of thought" exhibiting various degrees of autonomy from human agency even as they are mobilized by it. Because algorithmic processing lies at the heart of the design practices now reshaping our world -- from the physical spaces of our built environment to the networked spaces of digital culture -- the nature o...

  9. The Architecture and Administration of the ATLAS Online Computing System

    CERN Document Server

    Dobson, M; Ertorer, E; Garitaonandia, H; Leahu, L; Leahu, M; Malciu, I M; Panikashvili, E; Topurov, A; Ünel, G; Computing In High Energy and Nuclear Physics

    2006-01-01

    The needs of the ATLAS experiment at CERN's upcoming LHC accelerator, in terms of data transmission rates and processing power, require a large cluster of computers (on the order of thousands) administered and exploited in a coherent and optimal manner. Requirements such as stability, robustness and fast recovery in case of failure impose a server-client system architecture with servers distributed in a tree-like structure and clients booted from the network. For security reasons, the system should be accessible only through an application gateway and, also to ensure the autonomy of the system, the network services should be provided internally by dedicated machines in synchronization with the CERN IT department's central services. The paper describes a small-scale implementation of the system architecture that fits the given requirements and constraints. Emphasis will be put on the mechanisms and tools used to net boot the clients via the "Boot With Me" project and to synchronize information within the cluster via t...

  10. Client-server computer architecture saves costs and eliminates bottlenecks

    International Nuclear Information System (INIS)

    Darukhanavala, P.P.; Davidson, M.C.; Tyler, T.N.; Blaskovich, F.T.; Smith, C.

    1992-01-01

    This paper reports that a workstation-based client-server architecture saved costs and eliminated bottlenecks that BP Exploration (Alaska) Inc. experienced with mainframe computer systems. In 1991, BP embarked on an ambitious project to change technical computing for its Prudhoe Bay, Endicott, and Kuparuk operations on Alaska's North Slope. This project promised substantial rewards, but also involved considerable risk. The project plan called for reservoir simulations (which historically had run on a Cray Research Inc. X-MP supercomputer in the company's Houston data center) to be run on small computer workstations. Additionally, large Prudhoe Bay, Endicott, and Kuparuk production and reservoir engineering databases and related applications would also be moved to workstations, replacing a Digital Equipment Corp. VAX cluster in Anchorage.

  11. An energy efficient and high speed architecture for convolution computing based on binary resistive random access memory

    Science.gov (United States)

    Liu, Chen; Han, Runze; Zhou, Zheng; Huang, Peng; Liu, Lifeng; Liu, Xiaoyan; Kang, Jinfeng

    2018-04-01

    In this work we present a novel convolution computing architecture based on metal oxide resistive random access memory (RRAM) that processes image data stored in the RRAM arrays. The proposed image storage architecture achieves better speed and device-consumption efficiency than the previous kernel storage architecture. We further improve the architecture for high-accuracy, low-power computing by using binary storage and a series resistor. For a 28 × 28 image and 10 kernels of size 3 × 3, compared with the previous kernel storage approach, the newly proposed architecture shows excellent performance, including: 1) almost 100% accuracy within 20% low-resistance-state (LRS) variation and 90% high-resistance-state (HRS) variation; 2) a more than 67-fold speed boost; 3) 71.4% energy saving.
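
    For scale, the workload in this abstract (one 28 × 28 image, ten 3 × 3 kernels) can be counted with a plain software reference convolution. The sketch assumes stride 1 and no padding, which the abstract does not state, and performs the multiply-accumulate (MAC) operations that the RRAM array evaluates in memory; it does not model the RRAM devices themselves. conv2d_valid is a hypothetical helper name.

```python
def conv2d_valid(image, kernel):
    """Stride-1, unpadded ("valid") 2-D convolution in the CNN sense
    (cross-correlation: the kernel is not flipped).

    Returns (output, mac_count), where mac_count is the number of
    multiply-accumulate operations performed.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # output size without padding
    out = [[0.0] * ow for _ in range(oh)]
    macs = 0
    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
                    macs += 1
            out[r][c] = acc
    return out, macs


# Toy 28 x 28 binary image and ten 3 x 3 kernels (illustrative values only).
image = [[(r + c) % 2 for c in range(28)] for r in range(28)]
kernels = [[[1, 0, -1], [1, 0, -1], [1, 0, -1]] for _ in range(10)]
outputs = [conv2d_valid(image, k) for k in kernels]
print(len(outputs[0][0]), len(outputs[0][0][0]))  # 26 26
print(sum(macs for _, macs in outputs))  # 60840
```

    Under these assumptions each kernel needs 26 × 26 × 9 = 6,084 MACs, or 60,840 for all ten kernels; this is the arithmetic a conventional processor must perform serially, which is what in-memory evaluation aims to parallelize.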

  12. Performative Responsive Architecture Powered by Climate

    DEFF Research Database (Denmark)

    Foged, Isak Worre; Pasold, Anke

    2010-01-01

    This paper links the thermonastic behaviour found in flower heads in nature with material research into bimetallic strips. The aim is to advance the discussion of environmentally responsive systems on the basis of thermal properties for advanced environmental studies within the field of architecture in general and in the form of a responsive building skin in particular.

  13. Power efficient and high performance VLSI architecture for AES algorithm

    Directory of Open Access Journals (Sweden)

    K. Kalaiselvi

    2015-09-01

    The Advanced Encryption Standard (AES) algorithm has been widely deployed in cryptographic applications. This work proposes a low-power, high-throughput implementation of the AES algorithm using a key expansion approach. We minimize power consumption and critical path delay using the proposed high-performance architecture. It supports both encryption and decryption using 256-bit keys with a throughput of 0.06 Gbps. The VHDL language is used for simulating the design, and an FPGA chip has been used for the hardware implementation. Experimental results reveal that the proposed AES architectures outperform existing VLSI architectures in terms of power, throughput and critical path delay.

  14. The path toward HEP High Performance Computing

    CERN Document Server

    Apostolakis, John; Carminati, Federico; Gheata, Andrei; Wenzel, Sandro

    2014-01-01

    High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yielded limited results, and recent studies have shown that, on modern architectures, it achieves between 10% and 50% of peak performance. Although several successful attempts have been made to port selected codes to GPUs, no major HEP code suite has a 'High Performance' implementation. With the LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP can no longer neglect the less-than-optimal performance of its code and has to try to make the best use of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 projects. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on th...

  15. High-performance reconfigurable hardware architecture for restricted Boltzmann machines.

    Science.gov (United States)

    Ly, Daniel Le; Chow, Paul

    2010-11-01

    Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.
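
    The abstract does not spell out the computation the hardware engines perform. A common way to train an RBM is contrastive divergence with a single Gibbs step (CD-1); the following pure-Python sketch is written under that assumption and is not a reproduction of the authors' FPGA design. It shows one CD-1 update for a binary RBM and counts one connection update per visible-hidden weight. The function cd1_update and its parameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cd1_update(W, b_vis, b_hid, v0, lr=0.1, rng=random):
    """One contrastive-divergence (CD-1) step for a binary RBM, in place.

    W[i][j] is the weight between visible unit i and hidden unit j.
    Returns the number of connection (weight) updates performed.
    """
    n_vis, n_hid = len(W), len(W[0])
    # Positive phase: hidden probabilities and binary samples given the data.
    h0_prob = [sigmoid(b_hid[j] + sum(v0[i] * W[i][j] for i in range(n_vis)))
               for j in range(n_hid)]
    h0 = [1 if rng.random() < p else 0 for p in h0_prob]
    # Negative phase: sample a reconstruction of the visible layer,
    # then recompute hidden probabilities from it.
    v1_prob = [sigmoid(b_vis[i] + sum(h0[j] * W[i][j] for j in range(n_hid)))
               for i in range(n_vis)]
    v1 = [1 if rng.random() < p else 0 for p in v1_prob]
    h1_prob = [sigmoid(b_hid[j] + sum(v1[i] * W[i][j] for i in range(n_vis)))
               for j in range(n_hid)]
    # Every visible-hidden connection is updated exactly once.
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * h0_prob[j] - v1[i] * h1_prob[j])
    for i in range(n_vis):
        b_vis[i] += lr * (v0[i] - v1[i])
    for j in range(n_hid):
        b_hid[j] += lr * (h0_prob[j] - h1_prob[j])
    return n_vis * n_hid


# Tiny 4 x 3 RBM with a seeded generator so runs are repeatable.
rng = random.Random(0)
W = [[0.0] * 3 for _ in range(4)]
b_vis, b_hid = [0.0] * 4, [0.0] * 3
print(cd1_update(W, b_vis, b_hid, [1, 0, 1, 0], rng=rng))  # 12
```

    Each of the three phases is a matrix-vector product over all n_vis × n_hid weights, which is the parallelism a hardware engine can exploit. If one counts one connection update per weight per CD-1 step (an assumption about how the reported metric is defined), a 256 × 256 RBM performs 65,536 updates per step, so 3.13 billion connection-updates per second would correspond to roughly 48,000 steps per second.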

  16. Supporting Undergraduate Computer Architecture Students Using a Visual MIPS64 CPU Simulator

    Science.gov (United States)

    Patti, D.; Spadaccini, A.; Palesi, M.; Fazzino, F.; Catania, V.

    2012-01-01

    The topics of computer architecture are always taught using an Assembly dialect as an example. The most commonly used textbooks in this field use the MIPS64 Instruction Set Architecture (ISA) to help students in learning the fundamentals of computer architecture because of its orthogonality and its suitability for real-world applications. This…

  17. Computational Biology and High Performance Computing 2000

    Energy Technology Data Exchange (ETDEWEB)

    Simon, Horst D.; Zorn, Manfred D.; Spengler, Sylvia J.; Shoichet, Brian K.; Stewart, Craig; Dubchak, Inna L.; Arkin, Adam P.

    2000-10-19

    The pace of extraordinary advances in molecular biology has accelerated in the past decade due in large part to discoveries coming from genome projects on human and model organisms. The advances in the genome project so far, happening well ahead of schedule and under budget, have exceeded the dreams of its protagonists, let alone formal expectations. Biologists expect the next phase of the genome project to be even more startling in terms of dramatic breakthroughs in our understanding of human biology, the biology of health and of disease. Only today can biologists begin to envision the experimental, computational and theoretical steps necessary to exploit genome sequence information for its medical impact, its contribution to biotechnology and economic competitiveness, and its ultimate contribution to environmental quality. High performance computing has become one of the critical enabling technologies, which will help to translate this vision of future advances in biology into reality. Biologists are increasingly becoming aware of the potential of high performance computing. The goal of this tutorial is to introduce the exciting new developments in computational biology and genomics to the high performance computing community.

  18. Insights into Working Memory from The Perspective of The EPIC Architecture for Modeling Skilled Perceptual-Motor and Cognitive Human Performance

    National Research Council Canada - National Science Library

    Kieras, David

    1998-01-01

    Computational modeling of human perceptual-motor and cognitive performance based on a comprehensive detailed information-processing architecture leads to new insights about the components of working memory...

  19. Unstructured Computational Aerodynamics on Many Integrated Core Architecture

    KAUST Repository

    Al Farhan, Mohammed A.

    2016-06-08

    Shared memory parallelization of the flux kernel of PETSc-FUN3D, an unstructured tetrahedral mesh Euler flow code previously studied for distributed memory and multi-core shared memory, is evaluated on up to 61 cores per node and up to 4 threads per core. We explore several thread-level optimizations to improve flux kernel performance on the state-of-the-art many integrated core (MIC) Intel processor Xeon Phi “Knights Corner,” with a focus on strong thread scaling. While the linear algebraic kernel is bottlenecked by memory bandwidth for even modest numbers of cores sharing a common memory, the flux kernel, which arises in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing the conservation law residuals, is compute-intensive and is known to exploit contemporary multi-core hardware effectively. We extend the study of the performance of the flux kernel to the Xeon Phi in three thread affinity modes, namely scatter, compact, and balanced, in both offload and native modes, with and without various code optimizations to improve alignment and reduce cache coherency penalties. Relative to baseline “out-of-the-box” optimized compilation, code restructuring optimizations provide about 3.8x speedup using the offload mode and about 5x speedup using the native mode. Even with these gains for the flux kernel, with respect to execution time the MIC merely achieves parity with optimized compilation on a contemporary multi-core Intel CPU, the 16-core Sandy Bridge E5 2670. Nevertheless, the optimizations employed to reduce the data motion and cache coherency protocol penalties of the MIC are expected to be of value for CFD and many other unstructured applications as many-core architectures evolve. We explore large-scale distributed-shared memory performance on the Cray XC40 supercomputer to demonstrate that optimizations employed on the Phi hybridize to this context, where each of...

  20. Unstructured Computational Aerodynamics on Many Integrated Core Architecture

    KAUST Repository

    Al Farhan, Mohammed A.; Kaushik, Dinesh K.; Keyes, David E.

    2016-01-01

    Shared memory parallelization of the flux kernel of PETSc-FUN3D, an unstructured tetrahedral mesh Euler flow code previously studied for distributed memory and multi-core shared memory, is evaluated on up to 61 cores per node and up to 4 threads per core. We explore several thread-level optimizations to improve flux kernel performance on the state-of-the-art many integrated core (MIC) Intel processor Xeon Phi “Knights Corner,” with a focus on strong thread scaling. While the linear algebraic kernel is bottlenecked by memory bandwidth for even modest numbers of cores sharing a common memory, the flux kernel, which arises in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian by finite-differencing the conservation law residuals, is compute-intensive and is known to exploit contemporary multi-core hardware effectively. We extend the study of the performance of the flux kernel to the Xeon Phi in three thread affinity modes, namely scatter, compact, and balanced, in both offload and native modes, with and without various code optimizations to improve alignment and reduce cache coherency penalties. Relative to baseline “out-of-the-box” optimized compilation, code restructuring optimizations provide about 3.8x speedup using the offload mode and about 5x speedup using the native mode. Even with these gains for the flux kernel, with respect to execution time the MIC merely achieves parity with optimized compilation on a contemporary multi-core Intel CPU, the 16-core Sandy Bridge E5 2670. Nevertheless, the optimizations employed to reduce the data motion and cache coherency protocol penalties of the MIC are expected to be of value for CFD and many other unstructured applications as many-core architectures evolve. We explore large-scale distributed-shared memory performance on the Cray XC40 supercomputer to demonstrate that optimizations employed on the Phi hybridize to this context, where each of...

  1. ARCHITECTURE OF WEB BASED COMPUTER-AIDED MANUFACTURING SYSTEM

    Directory of Open Access Journals (Sweden)

    N. E. Filyukov

    2014-09-01

    The paper deals with the design of a web-based system for Computer-Aided Manufacturing (CAM). Remote applications and databases located in a "private cloud" are proposed as the basis of such a system. The suggested approach comprises a service-oriented architecture using web applications and web services as modules, multi-agent technologies for implementing information exchange between the components of the system, and a PDM system for managing technology projects within the CAM. The proposed architecture involves converting CAM into a corporate information system that provides coordinated functioning of subsystems based on a common information space, parallelizes collective work on technology projects, and enables effective control of production planning. A system has been developed within this architecture that allows technological subsystems to be connected to the system rather simply and their interaction to be implemented. The system makes it possible to produce a CAM configuration for a particular company from the set of developed subsystems and databases, specifying appropriate access rights for the company's employees. The proposed approach simplifies maintenance of software and information support for CAM subsystems because they are located centrally in the data center. The results can be used as a basis for CAM design and testing within the learning process, for development and modernization of the system algorithms, and can then be tested in the extended enterprise.

  2. Scalable quantum computer architecture with coupled donor-quantum dot qubits

    Science.gov (United States)

    Schenkel, Thomas; Lo, Cheuk Chi; Weis, Christoph; Lyon, Stephen; Tyryshkin, Alexei; Bokor, Jeffrey

    2014-08-26

    A quantum bit computing architecture includes a plurality of single spin memory donor atoms embedded in a semiconductor layer, a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, wherein a first voltage applied across at least one pair of the aligned quantum dot and donor atom controls a donor-quantum dot coupling. A method of performing quantum computing in a scalable architecture quantum computing apparatus includes arranging a pattern of single spin memory donor atoms in a semiconductor layer, forming a plurality of quantum dots arranged with the semiconductor layer and aligned with the donor atoms, applying a first voltage across at least one aligned pair of a quantum dot and donor atom to control a donor-quantum dot coupling, and applying a second voltage between one or more quantum dots to control a Heisenberg exchange J coupling between quantum dots and to cause transport of a single spin polarized electron between quantum dots.

  3. Exploring Hardware-Based Primitives to Enhance Parallel Security Monitoring in a Novel Computing Architecture

    National Research Council Canada - National Science Library

    Mott, Stephen

    2007-01-01

    .... In doing this, we propose a novel computing architecture, derived from a contemporary shared memory architecture, that facilitates efficient security-related monitoring in real-time, while keeping...

  4. Computing Architecture of the ALICE Detector Control System

    CERN Document Server

    Augustinus, A; Moreno, A; Kurepin, A N; De Cataldo, G; Pinazza, O; Rosinský, P; Lechman, M; Jirdén, L S

    2011-01-01

    The ALICE Detector Control System (DCS) is based on a commercial SCADA product, running on a large Windows computer cluster. It communicates with about 1200 network-attached devices to ensure safe and stable operation of the experiment. In the presentation we focus on the design of the ALICE DCS computer systems. We describe the management of data flow, mechanisms for handling the large data amounts and information exchange with external systems. One of the key operational requirements is an intuitive, error-proof and robust user interface allowing for simple operation of the experiment. At the same time, typical operator tasks, such as trending or routine checks of the devices, must be decoupled from the automated operation in order to prevent overload of critical parts of the system. All these requirements must be implemented in an environment with strict security requirements. In the presentation we explain how these demands affected the architecture of the ALICE DCS.

  5. Silicon CMOS architecture for a spin-based quantum computer.

    Science.gov (United States)

    Veldhorst, M; Eenink, H G J; Yang, C H; Dzurak, A S

    2017-12-15

    Recent advances in quantum error correction codes for fault-tolerant quantum computing and physical realizations of high-fidelity qubits in multiple platforms give promise for the construction of a quantum computer based on millions of interacting qubits. However, the classical-quantum interface remains a nascent field of exploration. Here, we propose an architecture for a silicon-based quantum computer processor based on complementary metal-oxide-semiconductor (CMOS) technology. We show how a transistor-based control circuit together with charge-storage electrodes can be used to operate a dense and scalable two-dimensional qubit system. The qubits are defined by the spin state of a single electron confined in quantum dots, coupled via exchange interactions, controlled using a microwave cavity, and measured via gate-based dispersive readout. We implement a spin qubit surface code, showing the prospects for universal quantum computation. We discuss the challenges and focus areas that need to be addressed, providing a path for large-scale quantum computing.

  6. An ATLAS distributed computing architecture for HL-LHC

    CERN Document Server

    Campana, Simone; The ATLAS collaboration

    2017-01-01

    The ATLAS collaboration started a process to understand the computing needs for the High Luminosity LHC era. Based on our best understanding of the computing model input parameters for the HL-LHC data-taking conditions, results indicate the need for a larger amount of computational and storage resources than projected for 2026 under a constant yearly computing budget. Filling the gap between the projection and the needs will be one of the challenges in preparation for LHC Run-4. While the gains from improvements in offline software will play a crucial role in this process, a different model for data processing, management, access and bookkeeping should also be envisaged to optimise resource usage. In this contribution we will describe a straw man of this model, founded on basic principles such as single-event-level granularity for data processing and virtual data. We will explain how the current architecture will evolve adiabatically into the future distributed computing system, through the prot...

  7. Computer aided design of architecture of degradable tissue engineering scaffolds.

    Science.gov (United States)

    Heljak, M K; Kurzydlowski, K J; Swieszkowski, W

    2017-11-01

    One important factor affecting the process of tissue regeneration is scaffold stiffness loss, which should be properly balanced with the rate of tissue regeneration. The aim of the research reported here was to develop a computer tool for designing the architecture of biodegradable scaffolds fabricated by melt-dissolution deposition systems (e.g. Fused Deposition Modeling) to provide the required scaffold stiffness at each stage of degradation/regeneration. The original idea presented in the paper is that the stiffness of a tissue engineering scaffold can be controlled during degradation by means of a proper selection of the diameter of the constituent fibers and the distances between them. This idea is based on the size-effect on degradation of aliphatic polyesters. The presented computer tool combines a genetic algorithm and a diffusion-reaction model of polymer hydrolytic degradation. In particular, we show how to design the architecture of scaffolds made of poly(DL-lactide-co-glycolide) with the required Young's modulus change during hydrolytic degradation.

  8. A modular architecture for transparent computation in recurrent neural networks.

    Science.gov (United States)

    Carmantini, Giovanni S; Beim Graben, Peter; Desroches, Mathieu; Rodrigues, Serafim

    2017-01-01

    Computation is classically studied in terms of automata, formal languages and algorithms; yet, the relation between neural dynamics and symbolic representations and operations is still unclear in traditional eliminative connectionism. Therefore, we suggest a unique perspective on this central issue, to which we would like to refer as transparent connectionism, by proposing accounts of how symbolic computation can be implemented in neural substrates. In this study we first introduce a new model of dynamics on a symbolic space, the versatile shift, showing that it supports the real-time simulation of a range of automata. We then show that the Gödelization of versatile shifts defines nonlinear dynamical automata, dynamical systems evolving on a vectorial space. Finally, we present a mapping between nonlinear dynamical automata and recurrent artificial neural networks. The mapping defines an architecture characterized by its granular modularity, where data, symbolic operations and their control are not only distinguishable in activation space, but also spatially localizable in the network itself, while maintaining a distributed encoding of symbolic representations. The resulting networks simulate automata in real-time and are programmed directly, in the absence of network training. To discuss the unique characteristics of the architecture and their consequences, we present two examples: (i) the design of a Central Pattern Generator from a finite-state locomotive controller, and (ii) the creation of a network simulating a system of interactive automata that supports the parsing of garden-path sentences as investigated in psycholinguistics experiments. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Memory intensive functional architecture for distributed computer control systems

    International Nuclear Information System (INIS)

    Dimmler, D.G.

    1983-10-01

    A memory-intensive functional architecture for distributed data-acquisition, monitoring, and control systems with large numbers of nodes has been conceptually developed and applied in several large-scale and some smaller systems. This discussion concentrates on: (1) the basic architecture; (2) recent expansions of the architecture which have now become feasible in view of the rapidly developing component technologies in microprocessors and functional large-scale integration circuits; and (3) implementation of some key hardware and software structures and one system implementation, a system for performing control and data acquisition of a neutron spectrometer at the Brookhaven High Flux Beam Reactor. The spectrometer is equipped with a large-area position-sensitive neutron detector.

  10. Factoring symmetric indefinite matrices on high-performance architectures

    Science.gov (United States)

    Jones, Mark T.; Patrick, Merrell L.

    1990-01-01

    The Bunch-Kaufman algorithm is the method of choice for factoring symmetric indefinite matrices in many applications. However, the Bunch-Kaufman algorithm does not take advantage of high-performance architectures such as the Cray Y-MP. Three new algorithms, based on Bunch-Kaufman factorization, that take advantage of such architectures are described. Results from an implementation of the third algorithm are presented.

  11. Heterogeneous computing architecture for fast detection of SNP-SNP interactions.

    Science.gov (United States)

    Sluga, Davor; Curk, Tomaz; Zupan, Blaz; Lotric, Uros

    2014-06-25

    The extent of data in a typical genome-wide association study (GWAS) poses considerable computational challenges to software tools for gene-gene interaction discovery. Exhaustive evaluation of all interactions among hundreds of thousands to millions of single nucleotide polymorphisms (SNPs) may require weeks or even months of computation. Massively parallel hardware within a modern Graphics Processing Unit (GPU) and Many Integrated Core (MIC) coprocessors can shorten the run time considerably. While the utility of GPU-based implementations in bioinformatics has been well studied, the MIC architecture has been introduced only recently and may provide a number of comparative advantages that have yet to be explored and tested. We have developed a heterogeneous, GPU- and Intel MIC-accelerated software module for SNP-SNP interaction discovery to replace the previously single-threaded computational core in the interactive web-based data exploration program SNPsyn. We report on differences between these two modern massively parallel architectures and their software environments. Using them yielded execution times an order of magnitude shorter than the single-threaded CPU implementation. The GPU implementation on a single Nvidia Tesla K20 runs twice as fast as that for the MIC architecture-based Xeon Phi P5110 coprocessor, but also requires considerably more programming effort. General-purpose GPUs are a mature platform with large amounts of computing power capable of tackling inherently parallel problems, but can prove demanding for the programmer. On the other hand, the new MIC architecture, albeit lacking in performance, reduces the programming effort and makes up for it with a more general architecture suitable for a wider range of problems.
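
    The "weeks or even months" estimate follows from combinatorics: n SNPs give n(n - 1)/2 unordered pairs, so the work grows quadratically. The sketch below assumes a hypothetical single-threaded rate of 100,000 pair evaluations per second; that rate is an illustration, not a figure from the study, and snp_pairs and exhaustive_days are hypothetical names.

```python
def snp_pairs(n_snps):
    """Number of unordered SNP pairs: C(n, 2) = n * (n - 1) / 2."""
    return n_snps * (n_snps - 1) // 2


def exhaustive_days(n_snps, pairs_per_second):
    """Wall-clock days to test every pair at a given (assumed) evaluation rate."""
    return snp_pairs(n_snps) / pairs_per_second / 86_400


ASSUMED_RATE = 100_000  # hypothetical pairs evaluated per second, single thread
for n in (100_000, 500_000, 1_000_000):
    print(n, snp_pairs(n), round(exhaustive_days(n, ASSUMED_RATE), 1))
```

    At that assumed rate, 500,000 SNPs give 124,999,750,000 pairs (about 14.5 days) and 1,000,000 SNPs give about 58 days; an order-of-magnitude speedup such as the one reported for the GPU and MIC versions would bring the million-SNP case down to roughly six days.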

  12. An Overview of the Most Important Reference Architectures for Cloud Computing

    Directory of Open Access Journals (Sweden)

    Razvan Daniel ZOTA

    2014-01-01

    In this paper we have presented the main characteristics of the most important reference architectures designed for the cloud computing environment. Specifically, we have introduced the architectures proposed by worldwide cloud computing companies such as Cisco, IBM and VMware, and we have also looked at the National Institute of Standards and Technology (NIST) reference architecture, which is the starting point for all proposed architectures in the field. As one would expect, the provider-dependent reference architectures are written in such a way as to suit the services and products of the company, while NIST’s architecture is a more general model with more comprehensive architectural details, which we highlight in this article. At the end of the article we draw some conclusions regarding the existing reference architectures for cloud computing.

  13. Polymorphous Computing Architecture (PCA) Kernel-Level Benchmarks

    National Research Council Canada - National Science Library

    Lebak, J

    2004-01-01

    .... "Computation" aspects include floating-point and integer performance, as well as the memory hierarchy, while the "communication" aspects include the network, the memory hierarchy, and the I/O capabilities...

  14. Computer-Related Task Performance

    DEFF Research Database (Denmark)

    Longstreet, Phil; Xiao, Xiao; Sarker, Saonee

    2016-01-01

    The existing information system (IS) literature has acknowledged computer self-efficacy (CSE) as an important factor contributing to enhancements in computer-related task performance. However, the empirical results of CSE on performance have not always been consistent, and increasing an individual's CSE is often a cumbersome process. Thus, we introduce the theoretical concept of self-prophecy (SP) and examine how this social influence strategy can be used to improve computer-related task performance. Two experiments are conducted to examine the influence of SP on task performance. Results show that SP and CSE interact to influence performance. Implications are then discussed in terms of organizations’ ability to increase performance.

  15. Predictors of Future Performance in Architectural Design Education

    Science.gov (United States)

    Roberts, A. S.

    2007-01-01

    The link between academic performance in secondary education and the subsequent performance of students studying architecture at university level is commonly questioned by educators and admissions tutors. This paper investigates the potential for using measures of cognitive style and spatial ability as predictors of future potential in…

  16. Resistive content addressable memory based in-memory computation architecture

    KAUST Repository

    Salama, Khaled N.; Zidan, Mohammed A.; Kurdahi, Fadi; Eltawil, Ahmed M.

    2016-01-01

    Various examples are provided related to resistive content addressable memory (RCAM) based in-memory computation architectures. In one example, a system includes a content addressable memory (CAM) including an array of cells having a memristor-based crossbar and an interconnection switch matrix having a gateless memristor array, which is coupled to an output of the CAM. In another example, a method includes comparing activated bit values stored in a key register with corresponding bit values in a row of a CAM, setting a tag bit value to indicate that the activated bit values match the corresponding bit values, and writing masked key bit values to corresponding bit locations in the row of the CAM based on the tag bit value.

  17. Resistive content addressable memory based in-memory computation architecture

    KAUST Repository

    Salama, Khaled N.

    2016-12-08

    Various examples are provided related to resistive content addressable memory (RCAM) based in-memory computation architectures. In one example, a system includes a content addressable memory (CAM) including an array of cells having a memristor-based crossbar and an interconnection switch matrix having a gateless memristor array, which is coupled to an output of the CAM. In another example, a method includes comparing activated bit values stored in a key register with corresponding bit values in a row of a CAM, setting a tag bit value to indicate that the activated bit values match the corresponding bit values, and writing masked key bit values to corresponding bit locations in the row of the CAM based on the tag bit value.
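The compare, tag, and masked-write method described in the RCAM records is the basic step of associative (in-memory) processing. The Python sketch below is a minimal software model of that cycle under stated assumptions: bits are plain integers, a mask selects the "activated" bit positions, and the AND truth-table example is hypothetical. It does not model the memristor crossbar, the gateless switch matrix, or the patented circuit itself.

```python
from typing import List

Row = List[int]


def compare(cam: List[Row], key: Row, mask: Row) -> List[int]:
    """Set a row's tag to 1 when every activated key bit (mask == 1)
    equals the bit stored at the same position in that row."""
    tags = []
    for row in cam:
        match = all(row[b] == key[b] for b in range(len(key)) if mask[b])
        tags.append(1 if match else 0)
    return tags


def write(cam: List[Row], key: Row, mask: Row, tags: List[int]) -> None:
    """In every tagged row, overwrite only the masked bit positions with key bits."""
    for row, tag in zip(cam, tags):
        if tag:
            for b in range(len(key)):
                if mask[b]:
                    row[b] = key[b]


# Hypothetical example: each row holds (a, b, out); compute out = a AND b in place.
table = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
tags = compare(table, key=[1, 1, 0], mask=[1, 1, 0])   # search for a = 1, b = 1
write(table, key=[0, 0, 1], mask=[0, 0, 1], tags=tags)  # set out = 1 in matching rows
print(tags)   # [0, 0, 0, 1]
print(table)  # [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
```

In hardware all rows are compared at once, so one compare/write pair updates every matching row in parallel; the Python loop over rows only stands in for that parallelism.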

  18. Computational Strategies for the Architectural Design of Bending Active Structures

    DEFF Research Database (Denmark)

    Tamke, Martin; Nicholas, Paul

    2013-01-01

    Active bending introduces a new level of integration into the design of architectural structures, and opens up new complexities for the architectural design process. In particular, the introduction of material variation reconfigures the design space. Through the precise specification...

  19. Enabling high performance computational science through combinatorial algorithms

    International Nuclear Information System (INIS)

    Boman, Erik G; Bozdag, Doruk; Catalyurek, Umit V; Devine, Karen D; Gebremedhin, Assefaw H; Hovland, Paul D; Pothen, Alex; Strout, Michelle Mills

    2007-01-01

    The Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute is developing algorithms and software for combinatorial problems that play an enabling role in scientific and engineering computations. Discrete algorithms will be increasingly critical for achieving high performance for irregular problems on petascale architectures. This paper describes recent contributions by researchers at the CSCAPES Institute in the areas of load balancing, parallel graph coloring, performance improvement, and parallel automatic differentiation.

  20. Enabling high performance computational science through combinatorial algorithms

    Energy Technology Data Exchange (ETDEWEB)

    Boman, Erik G [Discrete Algorithms and Math Department, Sandia National Laboratories (United States)]; Bozdag, Doruk [Biomedical Informatics, and Electrical and Computer Engineering, Ohio State University (United States)]; Catalyurek, Umit V [Biomedical Informatics, and Electrical and Computer Engineering, Ohio State University (United States)]; Devine, Karen D [Discrete Algorithms and Math Department, Sandia National Laboratories (United States)]; Gebremedhin, Assefaw H [Computer Science and Center for Computational Science, Old Dominion University (United States)]; Hovland, Paul D [Mathematics and Computer Science Division, Argonne National Laboratory (United States)]; Pothen, Alex [Computer Science and Center for Computational Science, Old Dominion University (United States)]; Strout, Michelle Mills [Computer Science, Colorado State University (United States)]

    2007-07-15

    The Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute is developing algorithms and software for combinatorial problems that play an enabling role in scientific and engineering computations. Discrete algorithms will be increasingly critical for achieving high performance for irregular problems on petascale architectures. This paper describes recent contributions by researchers at the CSCAPES Institute in the areas of load balancing, parallel graph coloring, performance improvement, and parallel automatic differentiation.

  1. How does Architecture Sound for Different Musical Instrument Performances?

    DEFF Research Database (Denmark)

    Saher, Konca; Rindel, Jens Holger

    2006-01-01

    This paper discusses how consideration of sound, in particular the sound of a specific musical instrument, impacts the design of a room. Properly designed architectural acoustics is fundamental to improving the listening experience of an instrument in rooms in a conservatory. Six discrete instruments (violin, c...... different instruments and the choir experience that could fit into the same category of room. For all calculations and the auralizations, a computational model is used: ODEON 7.0....

  2. Computational simulation in architectural and environmental acoustics: methods and applications of wave-based computation

    CERN Document Server

    Sakamoto, Shinichi; Otsuru, Toru

    2014-01-01

    This book reviews a variety of methods for wave-based acoustic simulation and recent applications to architectural and environmental acoustic problems. Following an introduction providing an overview of computational simulation of sound environment, the book is in two parts: four chapters on methods and four chapters on applications. The first part explains the fundamentals and advanced techniques for three popular methods, namely, the finite-difference time-domain method, the finite element method, and the boundary element method, as well as alternative time-domain methods. The second part demonstrates various applications to room acoustics simulation, noise propagation simulation, acoustic property simulation for building components, and auralization. This book is a valuable reference that covers the state of the art in computational simulation for architectural and environmental acoustics.  

  3. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    Science.gov (United States)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  4. High Performance Computing in Science and Engineering '02 : Transactions of the High Performance Computing Center

    CERN Document Server

    Jäger, Willi

    2003-01-01

    This book presents the state of the art in modeling and simulation on supercomputers. Leading German research groups present their results achieved on high-end systems of the High Performance Computing Center Stuttgart (HLRS) for the year 2002. Reports cover all fields of supercomputing simulation, ranging from computational fluid dynamics to computer science. Special emphasis is given to industrially relevant applications. Moreover, by presenting results for both vector systems and microprocessor-based systems, the book allows the reader to compare performance levels and usability of a variety of supercomputer architectures. It therefore becomes an indispensable guidebook to assess the impact of the Japanese Earth Simulator project on supercomputing in the years to come.

  5. PLEIADES SYSTEM ARCHITECTURE AND MAIN PERFORMANCES

    Directory of Open Access Journals (Sweden)

    M. A. Gleyzes

    2012-07-01

    France, under the leadership of the French Space Agency (CNES), has set up a cooperative program with Austria, Belgium, Spain and Sweden in order to develop a space Earth observation system called PLEIADES. PLEIADES is a dual system, meaning that it is intended to fulfill a broad range of both civilian and defense users’ needs. This paper reports the status of the satellite after its launch and in-orbit commissioning: the first PLEIADES satellite was launched at the end of 2011, and the second is to be launched about 12 months later. It describes the main mission characteristics and the status of its performance. It explains how the system, satellite and ground segment have been designed to be compliant with dual exploitation between civilian and defense partners. The system is based on a set of newly developed European technologies. To maximize the agility of the satellite, weight and inertia have been reduced by using a compact hexagonal shape for the satellite bus. The optical mission consists of Earth optical observation with 0.7 m nadir resolution for the panchromatic band and 2.8 m nadir resolution for the four multispectral bands. The image swath is about 20 km. PLEIADES delivers high-resolution optical products consisting of a panchromatic image merged with a four-band multispectral image, orthorectified on a Digital Terrain Model (DTM). Thanks to the high satellite agility obtained with control moment gyros as actuators, the optical system also delivers instantaneous stereo images under different stereoscopic conditions, as well as mosaic images acquired along the track, thus enlarging the field of view. The ground segment is composed of a dual ground center located in CNES Toulouse premises, in charge of preparing the dual mission command plan and of the real-time contacts with the satellite through a control center. The dual ground center

  6. Combining Performance and Flexibility for RMS with a Hybrid Architecture

    NARCIS (Netherlands)

    Dennis Koole; Arjan Groenewegen; Daniël Telgen; Patrick Wit; Leo van Moergestel; Arjan van Zanten; John-Jules Meyer; Ing. Erik Puik; Dick van der Steen; Pascal Muller

    2013-01-01

    Author supplied. Combining Performance and Flexibility for RMS with a Hybrid Architecture. Daniël Telgen, Leo van Moergestel, Erik Puik, Pascal Muller, Arjan Groenewegen, Dick van der Steen, Dennis Koole, Patrick de Wit, Arjen van Zanten, and John-Jules

  7. Implicit Unstructured Computational Aerodynamics on Many-Integrated Core Architecture

    KAUST Repository

    Al Farhan, Mohammed A.

    2014-05-04

    This research aims to understand the performance of PETSc-FUN3D, a fully nonlinear implicit unstructured-grid incompressible or compressible Euler code with origins at NASA and the U.S. DOE, on many-integrated core architecture, and how a hybrid programming paradigm (MPI+OpenMP) can exploit Intel Xeon Phi hardware with upwards of 60 cores per node and 4 threads per core. For the current contribution, we focus on strong scaling with many-integrated core hardware. In most implicit PDE-based codes, while the linear algebraic kernel is limited by the bottleneck of memory bandwidth, the flux kernel arising in control volume discretization of the conservation law residuals and the preconditioner for the Jacobian exploit the Phi hardware well.

  8. Performance Evaluation of a Mobile Wireless Computational Grid ...

    African Journals Online (AJOL)

    This work developed and simulated a mathematical model for a mobile wireless computational Grid architecture using queuing network theory. This was done in order to evaluate the performance of the load-balancing three-tier hierarchical configuration. The throughput and resource utilization metrics were measured and the ...

  9. CSP: A Multifaceted Hybrid Architecture for Space Computing

    Science.gov (United States)

    Rudolph, Dylan; Wilson, Christopher; Stewart, Jacob; Gauvin, Patrick; George, Alan; Lam, Herman; Crum, Gary Alex; Wirthlin, Mike; Wilson, Alex; Stoddard, Aaron

    2014-01-01

    Research on the CHREC Space Processor (CSP) takes a multifaceted hybrid approach to embedded space computing. Working closely with the NASA Goddard SpaceCube team, researchers at the National Science Foundation (NSF) Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida and Brigham Young University are developing hybrid space computers that feature an innovative combination of three technologies: commercial-off-the-shelf (COTS) devices, radiation-hardened (RadHard) devices, and fault-tolerant computing. Modern COTS processors provide the utmost in performance and energy-efficiency but are susceptible to ionizing radiation in space, whereas RadHard processors are virtually immune to this radiation but are more expensive, larger, less energy-efficient, and generations behind in speed and functionality. By featuring COTS devices to perform the critical data processing, supported by simpler RadHard devices that monitor and manage the COTS devices, and augmented with novel uses of fault-tolerant hardware, software, information, and networking within and between COTS devices, the resulting system can maximize performance and reliability while minimizing energy consumption and cost. NASA Goddard has adopted the CSP concept and technology with plans underway to feature flight-ready CSP boards on two upcoming space missions.

  10. Characterization of the MCNPX computer code in micro processed architectures

    International Nuclear Information System (INIS)

    Almeida, Helder C.; Dominguez, Dany S.; Orellana, Esbel T.V.; Milian, Felix M.

    2009-01-01

    MCNPX (Monte Carlo N-Particle extended) can be used to simulate the transport of several types of nuclear particles using probabilistic methods. The technique used by MCNPX is to follow the history of each particle from its origin to its extinction, which can be caused by absorption, escape or other reasons. To obtain accurate results in simulations performed with MCNPX, it is necessary to process a large number of histories, which entails a high computational cost. Currently MCNPX can be installed on virtually all available computing platforms; however, there is virtually no information on the performance of the application on each. This paper studies the performance of MCNPX for electrons and photons in the Faux phantom on the two platforms used by most researchers, Windows and Linux. Both platforms were tested on the same computer so that hardware differences would not affect the performance measurements. The performance of MCNPX was measured by the time spent running a simulation, making execution time the main measure of comparison. During the tests, the difference in MCNPX performance between the two platforms was evident. In some cases we obtained a speedup of more than 10% simply by switching platforms, without any specific optimization. This shows the relevance of studying how to optimize this tool on the platform most appropriate for its use. (author)

  11. HTMT-class Latency Tolerant Parallel Architecture for Petaflops Scale Computation

    Science.gov (United States)

    Sterling, Thomas; Bergman, Larry

    2000-01-01

    Computational Aero Sciences and other numerically intensive computation disciplines demand computing throughputs substantially greater than the Teraflops-scale systems only now becoming available. The related fields of fluids, structures, thermal, combustion, and dynamic controls are among the interdisciplinary areas that, in combination with sufficient resolution and advanced adaptive techniques, may force performance requirements towards Petaflops. This will be especially true for compute-intensive models such as Navier-Stokes, or when such system models are only part of a larger design optimization computation involving many design points. Yet recent experience with conventional MPP configurations comprising commodity processing and memory components has shown that larger scale frequently results in higher programming difficulty and lower system efficiency. While important advances in system software and algorithmic techniques have had some impact on efficiency and programmability for certain classes of problems, in general it is unlikely that software alone will resolve the challenges to higher scalability. As in the past, future generations of high-end computers may require a combination of hardware architecture and system software advances to enable efficient operation at a Petaflops level. The NASA-led HTMT project has engaged the talents of a broad interdisciplinary team to develop a new strategy in high-end system architecture to deliver petaflops-scale computing in the 2004/5 timeframe. The Hybrid-Technology, MultiThreaded parallel computer architecture incorporates several advanced technologies in combination with an innovative dynamic adaptive scheduling mechanism to provide unprecedented performance and efficiency within practical constraints of cost, complexity, and power consumption. The emerging superconductor Rapid Single Flux Quantum electronics can operate at 100 GHz (the record is 770 GHz) and one percent of the power required by convention

  12. Hybrid VLSI/QCA Architecture for Computing FFTs

    Science.gov (United States)

    Fijany, Amir; Toomarian, Nikzad; Modarres, Katayoon; Spotnitz, Matthew

    2003-01-01

    A data-processor architecture that would incorporate elements of both conventional very-large-scale integrated (VLSI) circuitry and quantum-dot cellular automata (QCA) has been proposed to enable the highly parallel and systolic computation of fast Fourier transforms (FFTs). The proposed circuit would complement the QCA-based circuits described in several prior NASA Tech Briefs articles, namely Implementing Permutation Matrices by Use of Quantum Dots (NPO-20801), Vol. 25, No. 10 (October 2001), page 42; Compact Interconnection Networks Based on Quantum Dots (NPO-20855), Vol. 27, No. 1 (January 2003), page 32; and Bit-Serial Adder Based on Quantum Dots (NPO-20869), Vol. 27, No. 1 (January 2003), page 35. The cited prior articles described the limitations of VLSI circuitry and the major potential advantage afforded by QCA. To recapitulate: in a VLSI circuit, signal paths that are required not to interact with each other must not cross in the same plane. In contrast, for reasons too complex to describe in the limited space available for this article, suitably designed and operated QCA-based signal paths that are required not to interact with each other can nevertheless be allowed to cross each other in the same plane without adverse effect. In principle, this characteristic could be exploited to design compact, coplanar, simple (relative to VLSI) QCA-based networks to implement complex, advanced interconnection schemes.

  13. PHENIX On-Line Distributed Computing System Architecture

    International Nuclear Information System (INIS)

    Desmond, Edmond; Haggerty, John; Kehayias, Hyon Joo; Purschke, Martin L.; Witzig, Chris; Kozlowski, Thomas

    1997-01-01

    PHENIX is one of the two large experiments at the Relativistic Heavy Ion Collider (RHIC) currently under construction at Brookhaven National Laboratory. The detector consists of 11 sub-detectors, which are further subdivided into 29 units ("granules") that can be operated independently, including simultaneous data taking with independent data streams and independent triggers. The detector has 250,000 channels and is read out by front-end modules, where the data is buffered in a pipeline while awaiting the level-1 trigger decision. Zero suppression and calibration are done after the level-1 accept in custom-built data collection modules (DCMs) with DSPs before the data is sent to an event builder (design throughput of 2 Gb/sec) and higher-level triggers. The On-line Computing Systems Group (ONCS) has two responsibilities. First, it is responsible for receiving the data from the event builder, routing it through a network of workstations to consumer processes and archiving it at a data rate of 20 MB/sec. Second, it is responsible for the overall configuration, control and operation of the detector and data acquisition chain, which comprises the software integration for several thousand custom-built hardware modules. The software must furthermore support the independent operation of the above-mentioned granules, which includes the coordination of processes that run in 60-100 VME processors and workstations. ONCS has adapted the Shlaer-Mellor Object Oriented Methodology for the design of the top-layer software. CORBA is used as the communication layer between the distributed objects, which are implemented as asynchronous finite state machines. We will give an overview of the PHENIX online system, with the main focus on the system architecture, software components and integration tasks of the On-line Computing group ONCS, and report on the status of the current prototypes.

  14. Emerging opportunities in enterprise integration with open architecture computer numerical controls

    Science.gov (United States)

    Hudson, Christopher A.

    1997-01-01

    The shift to open-architecture machine tool computer numerical controls is providing new opportunities for metalworking-oriented manufacturers to streamline the entire 'art to part' process. Production cycle times, accuracy, consistency, predictability and process reliability are just some of the factors that can be improved, leading to better manufactured products at lower costs. Open-architecture controllers are allowing manufacturers to increasingly apply general-purpose software and hardware tools where previous approaches relied on proprietary and unique hardware and software. This includes DNC, SCADA, CAD, and CAM, where the increasing use of general-purpose components is leading to lower-cost systems that are also more reliable and robust than the past proprietary approaches. In addition, a number of new opportunities exist which in the past were likely impractical due to cost or performance constraints.

  15. Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

    Science.gov (United States)

    Carroll, Chester C.; Youngblood, John N.; Saha, Aindam

    1987-01-01

    Improvements and advances in the development of computer architecture now provide innovative technology for recasting traditional sequential solutions into high-performance, low-cost parallel systems that increase system performance. Research conducted in the development of specialized computer architecture for the real-time algorithmic execution of an avionics system guidance and control problem is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, task definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
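The abstract states that task allocation is based on critical path analysis. As a hedged illustration only (the paper's own allocation algorithm is not reproduced), the Python sketch below computes the critical path of a task graph: the longest chain of dependent tasks, which bounds the run time regardless of how many processing elements are available and is a common scheduling priority. The function name, task names, and durations are hypothetical, loosely echoing the guidance-command computation described above.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple


def critical_path(durations: Dict[str, float],
                  edges: List[Tuple[str, str]]) -> Tuple[float, List[str]]:
    """Return (length, tasks) of the longest duration-weighted path in a task DAG.

    An edge (u, v) means task v cannot start until task u has finished.
    """
    succ: Dict[str, List[str]] = defaultdict(list)
    indeg = {t: 0 for t in durations}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: visit tasks in topological order, so every
    # predecessor of v is finished before v itself is processed.
    ready = deque(t for t, d in indeg.items() if d == 0)
    start = {t: 0.0 for t in durations}           # earliest start times
    best_pred: Dict[str, Optional[str]] = {t: None for t in durations}
    finish: Dict[str, float] = {}
    while ready:
        u = ready.popleft()
        finish[u] = start[u] + durations[u]
        for v in succ[u]:
            if finish[u] > start[v]:              # latest-finishing predecessor so far
                start[v] = finish[u]
                best_pred[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(finish) != len(durations):
        raise ValueError("task graph contains a cycle")

    # Walk back from the task that finishes last.
    node: Optional[str] = max(finish, key=finish.get)
    length = finish[node]
    path: List[str] = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return length, path[::-1]


tasks = {"read_sensors": 2, "estimate_target": 5, "time_to_go": 3,
         "guidance_cmd": 4, "output": 1}
deps = [("read_sensors", "estimate_target"), ("read_sensors", "time_to_go"),
        ("estimate_target", "guidance_cmd"), ("time_to_go", "guidance_cmd"),
        ("guidance_cmd", "output")]
length, path = critical_path(tasks, deps)
print(length, path)
# 12.0 ['read_sensors', 'estimate_target', 'guidance_cmd', 'output']
```

Tasks on the critical path (here the target-estimation chain) are the ones an allocator would typically place first, while off-path tasks such as `time_to_go` have slack and can share a processing element.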

  16. Universal Quantum Computing with Measurement-Induced Continuous-Variable Gate Sequence in a Loop-Based Architecture.

    Science.gov (United States)

    Takeda, Shuntaro; Furusawa, Akira

    2017-09-22

    We propose a scalable scheme for optical quantum computing using measurement-induced continuous-variable quantum gates in a loop-based architecture. Here, time-bin-encoded quantum information in a single spatial mode is deterministically processed in a nested loop by an electrically programmable gate sequence. This architecture can process any input state and an arbitrary number of modes with almost minimum resources, and offers a universal gate set for both qubits and continuous variables. Furthermore, quantum computing can be performed fault tolerantly by a known scheme for encoding a qubit in an infinite-dimensional Hilbert space of a single light mode.

  17. How computer science can help in understanding the 3D genome architecture.

    Science.gov (United States)

    Shavit, Yoli; Merelli, Ivan; Milanesi, Luciano; Lio', Pietro

    2016-09-01

    Chromosome conformation capture techniques are producing a huge amount of data about the architecture of our genome. These data can provide us with a better understanding of the events that induce critical regulations of the cellular function from small changes in the three-dimensional genome architecture. Generating a unified view of spatial, temporal, genetic and epigenetic properties poses various challenges of data analysis, visualization, integration and mining, as well as of high performance computing and big data management. Here, we describe the critical issues of this new branch of bioinformatics, oriented at the comprehension of the three-dimensional genome architecture, which we call 'Nucleome Bioinformatics', looking beyond the currently available tools and methods, and highlight yet unaddressed challenges and the potential approaches that could be applied for tackling them. Our review provides a map for researchers interested in using computer science for studying 'Nucleome Bioinformatics', to achieve a better understanding of the biological processes that occur inside the nucleus. © The Author 2015. Published by Oxford University Press.

  18. Architectures, Concepts and Technologies for Service Oriented Computing: proceedings of the 1st International Workshop - ACT4SOC 2007

    NARCIS (Netherlands)

    van Sinderen, Marten J.; Unknown, [Unknown]

    2007-01-01

    This volume contains the proceedings of the First International Workshop on Architectures, Concepts and Technologies for Service Oriented Computing (ACT4SOC 2007), held on July 22 in Barcelona, Spain, in conjunction with the Second International Conference on Software and Data Technologies (ICSOFT

  19. Architecture

    OpenAIRE

    Clear, Nic

    2014-01-01

    When discussing science fiction’s relationship with architecture, the usual practice is to look at the architecture “in” science fiction—in particular, the architecture in SF films (see Kuhn 75-143) since the spaces of literary SF present obvious difficulties as they have to be imagined. In this essay, that relationship will be reversed: I will instead discuss science fiction “in” architecture, mapping out a number of architectural movements and projects that can be viewed explicitly as scien...

  20. Staggered Dslash Performance on Intel Xeon Phi Architecture

    OpenAIRE

    Li, Ruizi; Gottlieb, Steven

    2014-01-01

    The conjugate gradient (CG) algorithm is among the most essential and time consuming parts of lattice calculations with staggered quarks. We test the performance of CG and dslash, the key step in the CG algorithm, on the Intel Xeon Phi, also known as the Many Integrated Core (MIC) architecture. We try different parallelization strategies using MPI, OpenMP, and the vector processing units (VPUs).
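Because the abstract singles out CG and names dslash as its key step, a minimal conjugate gradient sketch may help show where that step sits. Assumptions: plain Python lists; a small real symmetric positive-definite matrix standing in for the lattice operator (for staggered quarks, CG is typically applied to a Hermitian positive-definite normal-equations operator built from dslash); and no MPI, OpenMP, or vector-unit parallelism. The names `conjugate_gradient`, `apply_op`, and `matvec` are hypothetical.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(apply_op: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10, max_iter: int = 1000) -> Vector:
    """Solve A x = b for symmetric positive-definite A, given only A's action.

    apply_op plays the role of the expensive operator application; in a
    staggered-quark solver this is where dslash would be called (here it is
    an ordinary dense matrix-vector product).
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with initial guess x = 0
    p = list(r)          # first search direction
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = apply_op(p)                 # one operator application per iteration
        alpha = rr / dot(p, ap)          # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr               # keeps the new direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def matvec(a: List[Vector], v: Vector) -> Vector:
    return [dot(row, v) for row in a]


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(lambda v: matvec(A, v), b)
print(x)  # approximately [1/11, 7/11]
```

Each iteration performs exactly one operator application plus a few vector updates and dot products, so in a lattice code the cost of CG is dominated by dslash, consistent with the abstract calling it the key step.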

  1. Performance anomaly detection in microservice architectures under continuous change

    OpenAIRE

    Düllmann, Thomas F.

    2017-01-01

    DevOps, agile approaches such as Continuous Integration (CI), and microservice architectures are becoming increasingly popular as the demand for flexible and scalable solutions grows. Raising the degree of automation and distribution creates new challenges for application performance monitoring, because microservices may be short-lived and can be replaced within seconds. The fact that microservices are added and removed on a regular basis brings new requireme...

  2. Improving engineers' performance with computers

    International Nuclear Information System (INIS)

    Purvis, E.E. III

    1984-01-01

    The problem addressed is how to improve the performance of engineers in the design, operation, and maintenance of nuclear power plants. The application of computer science to this problem offers a challenge in maximizing the use of developments outside the nuclear industry and in setting priorities to address the most fruitful areas first. Areas of potential benefit include database management through design, analysis, procurement, construction, operation, maintenance, cost, schedule and interface control and planning, and quality engineering on specifications, inspection, and training.

  3. Micromagnetics on high-performance workstation and mobile computational platforms

    Science.gov (United States)

    Fu, S.; Chang, R.; Couture, S.; Menarini, M.; Escobar, M. A.; Kuteifan, M.; Lubarda, M.; Gabay, D.; Lomakin, V.

    2015-05-01

    The feasibility of using high-performance desktop and embedded mobile computational platforms is presented, including multi-core Intel central processing units, Nvidia desktop graphics processing units, and the Nvidia Jetson TK1 platform. The FastMag micromagnetic simulator, based on the finite element method, is used as a testbed and shows high efficiency on all the platforms. Optimization aspects of improving the performance of the mobile systems are discussed. The high performance, low cost, low power consumption, and rapid performance increase of the embedded mobile systems make them a promising candidate for micromagnetic simulations. Such architectures can be used as standalone systems or can be built as low-power computing clusters.

  4. Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Arumugam, Kamesh [Old Dominion Univ., Norfolk, VA (United States)]

    2017-05-01

    Efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis are challenging to build. They require exploiting the data-parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of them employ irregular algorithms that exhibit data-dependent control flow and irregular memory accesses. Furthermore, these applications are often iterative with dependencies between steps, which makes it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application, where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control flow during a single step of the application, independent of the other steps, under the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is therefore possible to predict this structure in the current step by observing the computation structure of previous steps. 
In this dissertation, we present novel machine learning based optimization techniques to address

  5. High-performance, scalable optical network-on-chip architectures

    Science.gov (United States)

    Tan, Xianfang

    The rapid advance of technology enables a large number of processing cores to be integrated into a single chip, called a Chip Multiprocessor (CMP) or Multiprocessor System-on-Chip (MPSoC) design. The on-chip interconnection network, which is the communication infrastructure for these processing cores, plays a central role in a many-core system. With the continuously increasing complexity of many-core systems, traditional metallic wired electronic networks-on-chip (NoC) have become a bottleneck because of unacceptable data transmission latency and extremely high on-chip energy consumption. Optical networks-on-chip (ONoC) have been proposed as a promising alternative to electronic NoC, with the benefits of optical signaling such as extremely high bandwidth, negligible latency, and low power consumption. This dissertation focuses on the design of high-performance and scalable ONoC architectures, and its contributions are as follows: 1. A micro-ring resonator (MRR)-based Generic Wavelength-routed Optical Router (GWOR) is proposed, together with a method for developing a GWOR of any size. GWOR is a scalable non-blocking ONoC architecture with a simple structure, low cost, and high power efficiency compared to existing ONoC designs. 2. To expand the bandwidth and improve the fault tolerance of the GWOR, a redundant GWOR architecture is designed by cascading different types of GWORs into one network. 3. A redundant GWOR built with MRR-based comb switches is proposed. Replacing the general MRRs with comb switches expands the bandwidth while keeping the GWOR topology unchanged. 4. A butterfly fat tree (BFT)-based hybrid optoelectronic NoC (HONoC) architecture is developed in which GWORs are used for global communication and electronic routers are used for local communication. The proposed HONoC uses fewer electronic routers and links than its electronic BFT-based NoC counterpart. 
It takes the advantages of

  6. Software Systems for High-performance Quantum Computing

    Energy Technology Data Exchange (ETDEWEB)

    Humble, Travis S [ORNL]; Britt, Keith A [ORNL]

    2016-01-01

    Quantum computing promises new opportunities for solving hard computational problems, but harnessing this novelty requires breakthrough concepts in the design, operation, and application of computing systems. We define some of the challenges facing the development of quantum computing systems as well as software-based approaches that can be used to overcome these challenges. Following a brief overview of the state of the art, we present quantum programming and execution models, discuss the development of architectures for hybrid high-performance computing systems, and describe the realization of software stacks for quantum networking. This leads to a discussion of the role that conventional computing plays in the quantum paradigm and how some of the current challenges for exascale computing overlap with those facing quantum computing.

  7. Could running experience on SPMD computers contribute to the architectural choices for future dedicated computers for high energy physics simulation?

    International Nuclear Information System (INIS)

    Jejcic, A.; Maillard, J.; Silva, J.; Auguin, M.; Boeri, F.

    1989-01-01

    Results obtained on a strongly coupled parallel computer are reported. They concern Monte-Carlo simulation and pattern recognition. Though the calculations were made on an experimental computer of rather low processing power, it is believed that the quoted figures could give useful indications on architectural choices for dedicated computers. (orig.)

  9. Porting plasma physics simulation codes to modern computing architectures using the libmrc framework

    Science.gov (United States)

    Germaschewski, Kai; Abbott, Stephen

    2015-11-01

    Available computing power has continued to grow exponentially even after single-core performance saturated in the last decade. The increase has since been driven by more parallelism, both through more cores and through more parallelism within each core, e.g. in GPUs and Intel Xeon Phi. Adapting existing plasma physics codes is challenging, in particular as there is no single programming model that covers current and future architectures. We will introduce the open-source libmrc framework that has been used to modularize and port three plasma physics codes: the extended MHD code MRCv3 with implicit time integration and curvilinear grids; the OpenGGCM global magnetosphere model; and the particle-in-cell code PSC. libmrc consolidates basic functionality needed for simulations based on structured grids (I/O, load balancing, time integrators), and also introduces a parallel object model that makes it possible to maintain multiple implementations of computational kernels, on e.g. conventional processors and GPUs. It handles data layout conversions and enables us to port performance-critical parts of a code to a new architecture step-by-step, while the rest of the code can remain unchanged. We will show examples of the performance gains and some physics applications.

  10. Behavioral Simulation and Performance Evaluation of Multi-Processor Architectures

    Directory of Open Access Journals (Sweden)

    Ausif Mahmood

    1996-01-01

    The development of multi-processor architectures requires extensive behavioral simulations to verify the correctness of the design and to evaluate its performance. A high-level language can provide maximum flexibility in this respect if constructs for handling concurrent processes and a time mapping mechanism are added. This paper describes a novel technique for emulating the hardware processes involved in a parallel architecture such that an object-oriented description of the design is maintained. Communication and synchronization between hardware processes are handled by splitting the processes into their equivalent subprograms at the entry points. The scheduling of these subprograms is coordinated by a timing wheel, which provides a time mapping mechanism. Finally, a high-level language pre-processor is proposed so that the timing wheel and the process emulation details can be made transparent to the user.
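
    A timing wheel is a circular array of time slots in which each pending activity is filed under the slot for its activation time. The following is a minimal hashed timing wheel in Python, offered as an illustration under stated assumptions rather than the paper's implementation: the class name, method names, and slot count are hypothetical, and plain string names stand in for the emulated subprograms.

```python
from collections import deque


class TimingWheel:
    """Minimal hashed timing wheel for a discrete-time simulator (illustrative sketch)."""

    def __init__(self, num_slots=8):
        self.num_slots = num_slots
        self.slots = [deque() for _ in range(num_slots)]
        self.now = 0  # current simulated time, in ticks

    def schedule(self, delay, name):
        """File `name` to run `delay` ticks from now (delay must be at least 1)."""
        if delay < 1:
            raise ValueError("delay must be at least one tick")
        fire_time = self.now + delay
        self.slots[fire_time % self.num_slots].append((fire_time, name))

    def tick(self):
        """Advance the clock one tick and return the names due at the new time."""
        self.now += 1
        index = self.now % self.num_slots
        due, pending = [], deque()
        for fire_time, name in self.slots[index]:
            if fire_time == self.now:
                due.append(name)
            else:  # belongs to a later revolution of the wheel
                pending.append((fire_time, name))
        self.slots[index] = pending
        return due
```

    Activities more than one revolution in the future share a slot with nearer ones, so `tick` compares each entry's stored activation time and leaves later entries in place; this keeps scheduling constant-time at the cost of rescanning crowded slots.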

  11. Integrating Computing Resources: A Shared Distributed Architecture for Academics and Administrators.

    Science.gov (United States)

    Beltrametti, Monica; English, Will

    1994-01-01

    Development and implementation of a shared distributed computing architecture at the University of Alberta (Canada) are described. Aspects discussed include design of the architecture, users' views of the electronic environment, technical and managerial challenges, and the campuswide human infrastructures needed to manage such an integrated…

  12. Toward a Fault Tolerant Architecture for Vital Medical-Based Wearable Computing.

    Science.gov (United States)

    Abdali-Mohammadi, Fardin; Bajalan, Vahid; Fathi, Abdolhossein

    2015-12-01

    Advancements in computers and electronic technologies have led to the emergence of a new generation of efficient small intelligent systems. Products of such technologies include smartphones and wearable devices, which have attracted the attention of medical applications. These products are used less in critical medical applications because of their resource constraints and sensitivity to failure: without safety considerations, small integrated hardware can endanger patients' lives. Therefore, principles are needed for constructing wearable healthcare systems that address these concerns. Accordingly, this paper proposes an architecture for constructing wearable systems in critical medical applications. The proposed architecture has three tiers, supporting data flow from body sensors to the cloud: wearable computers, mobile computing, and mobile cloud computing. One feature of this architecture is its potentially high fault tolerance, owing to the nature of its components. Moreover, the protocols required to coordinate the components of this architecture are presented. Finally, the reliability of this architecture is assessed by simulating the architecture and its components, and other aspects of the proposed architecture are discussed.

  13. MAINS: MULTI-AGENT INTELLIGENT SERVICE ARCHITECTURE FOR CLOUD COMPUTING

    Directory of Open Access Journals (Sweden)

    T. Joshva Devadas

    2014-04-01

    Computing has been transformed into a model of commoditized services, modeled on utility services such as water and electricity. The Internet has been remarkably successful over the past three decades in supporting a multitude of distributed applications and a wide variety of network technologies. However, with the spread of handheld mobile devices and laptops, its popularity has become the biggest impediment to its further growth. Agents are intelligent software systems that work on behalf of others. Agents are incorporated in many innovative applications to improve system performance: an agent uses the knowledge it possesses to react to the system and help improve its performance. Agents are introduced into cloud computing to minimize the response time when a similar request is raised by an end user anywhere in the world. In this paper, we introduce a Multi Agent Intelligent system (MAINS) layer placed ahead of the cloud service models and test it using a sample dataset. The performance of the MAINS layer was analyzed in three aspects, and the outcome of the analysis shows that the MAINS layer provides a flexible model for creating cloud applications and deploying them in a variety of applications.

  14. A performance analysis of advanced I/O architectures for PC-based network file servers

    Science.gov (United States)

    Huynh, K. D.; Khoshgoftaar, T. M.

    1994-12-01

    In personal computing and workstation environments, more and more I/O adapters are becoming complete functional subsystems, intelligent enough to handle I/O operations on their own without much intervention from the host processor. The IBM Subsystem Control Block (SCB) architecture has been defined to enhance the potential of these intelligent adapters by defining services and conventions that deliver command information and data to and from the adapters. In recent years, a new storage architecture, the Redundant Array of Independent Disks (RAID), has been quickly gaining acceptance in the world of computing. In this paper, we discuss critical system design issues that are important to the performance of a network file server. We then present a performance analysis of the SCB architecture and disk array technology in typical network file server environments based on personal computers (PCs). One of the key issues investigated in this paper is whether a disk array can outperform a group of disks (of the same type, data capacity, and cost) operating independently rather than in parallel as in a disk array.
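
    The contrast behind this question can be made concrete with a block-address mapping. The sketch below assumes RAID-0-style block striping (no parity, which the abstract does not discuss) and uses hypothetical function names; it shows that consecutive logical blocks land on different disks in an array but stay on one disk when the disks are used independently.

```python
def striped_location(block, num_disks, stripe_unit=1):
    """Map a logical block to (disk, physical block) under RAID-0-style striping.

    stripe_unit is the number of consecutive blocks placed on one disk before
    moving to the next disk (an assumed parameter, not taken from the paper).
    """
    unit, offset_in_unit = divmod(block, stripe_unit)
    disk = unit % num_disks   # stripe units rotate round-robin across the disks
    row = unit // num_disks   # number of full stripes preceding this unit
    return disk, row * stripe_unit + offset_in_unit


def independent_location(block, blocks_per_disk):
    """Map a logical block when disks are concatenated and accessed independently."""
    return divmod(block, blocks_per_disk)  # (disk, physical block)


# A 4-block sequential read touches four disks when striped, one disk otherwise.
print(sorted({striped_location(b, num_disks=4)[0] for b in range(4)}))                # [0, 1, 2, 3]
print(sorted({independent_location(b, blocks_per_disk=1000)[0] for b in range(4)}))  # [0]
```

    Spreading a sequential request over several spindles is what lets an array serve it in parallel; whether that beats independent disks for a file server's many small, unrelated requests is exactly the workload question the paper investigates.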

  15. Simulating Hydrologic Flow and Reactive Transport with PFLOTRAN and PETSc on Emerging Fine-Grained Parallel Computer Architectures

    Science.gov (United States)

    Mills, R. T.; Rupp, K.; Smith, B. F.; Brown, J.; Knepley, M.; Zhang, H.; Adams, M.; Hammond, G. E.

    2017-12-01

    As the high-performance computing community pushes towards the exascale horizon, power and heat considerations have driven the increasing importance and prevalence of fine-grained parallelism in new computer architectures. High-performance computing centers have become increasingly reliant on GPGPU accelerators and "manycore" processors such as the Intel Xeon Phi line, and 512-bit SIMD registers have even been introduced in the latest generation of Intel's mainstream Xeon server processors. The high degree of fine-grained parallelism and more complicated memory hierarchy considerations of such "manycore" processors present several challenges to existing scientific software. Here, we consider how the massively parallel, open-source hydrologic flow and reactive transport code PFLOTRAN - and the underlying Portable, Extensible Toolkit for Scientific Computation (PETSc) library on which it is built - can best take advantage of such architectures. We will discuss some key features of these novel architectures and our code optimizations and algorithmic developments targeted at them, and present experiences drawn from working with a wide range of PFLOTRAN benchmark problems on these architectures.

  16. Analysis of multigrid methods on massively parallel computers: Architectural implications

    Science.gov (United States)

    Matheson, Lesley R.; Tarjan, Robert E.

    1993-01-01

    We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain-parallel version of the standard V-cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block-structured grids of size 10^6 and 10^9, respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message-passing, 'workstation size' processors executing in SPMD mode. The first model accomplishes interprocessor communication through a multistage permutation network, with a logarithmic communication cost similar to the costs in a variety of different topologies. The second model allows single-stage communication costs only. Both models were designed with information provided by machine developers and use implementation-derived parameters. Given the medium-grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests that an efficient implementation requires either that the machine support efficient transmission of long messages (up to 1000 words) or that the high initiation cost of a communication be significantly reduced through an alternative optimization technique. Furthermore, with variable-length message capability, our analysis suggests that low-diameter multistage networks provide little or no advantage over a simple single-stage communication network.
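
    To make the V-cycle concrete, here is a minimal serial sketch for the 1-D model problem -u'' = f on (0, 1) with zero boundary values. It illustrates the algorithm under stated assumptions and is not the domain-parallel 2-D/3-D version analyzed in the paper: the weighted-Jacobi smoother, full-weighting restriction, linear interpolation, and grid sizes of the form 2^k - 1 are choices made here. In a domain-parallel version, each processor would own a block of the grid and exchange boundary values with its neighbours during smoothing, residual, and grid-transfer steps, and those exchanges are the messages whose cost the models above weigh.

```python
def smooth(u, f, h, sweeps=3, omega=2.0 / 3.0):
    """Weighted-Jacobi sweeps for (2u_i - u_{i-1} - u_{i+1}) / h^2 = f_i."""
    n = len(u)
    for _ in range(sweeps):
        old = u[:]
        for i in range(n):
            left = old[i - 1] if i > 0 else 0.0       # zero Dirichlet boundary
            right = old[i + 1] if i < n - 1 else 0.0
            u[i] = (1 - omega) * old[i] + omega * 0.5 * (left + right + h * h * f[i])
    return u


def residual(u, f, h):
    """r = f - A u for the second-difference operator A."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2 * u[i] - left - right) / (h * h))
    return r


def restrict(r):
    """Full weighting from 2m+1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(m)]


def prolong(e, n_fine):
    """Linear interpolation from m coarse points to n_fine = 2m+1 fine points."""
    fine = [0.0] * n_fine
    for j, v in enumerate(e):
        fine[2 * j + 1] += v        # coarse point j sits on fine point 2j+1
        fine[2 * j] += 0.5 * v
        fine[2 * j + 2] += 0.5 * v
    return fine


def v_cycle(u, f, h):
    """One V-cycle: pre-smooth, coarse-grid correction, post-smooth."""
    if len(u) == 1:                 # coarsest grid: solve 2u/h^2 = f exactly
        return [0.5 * h * h * f[0]]
    u = smooth(u, f, h)
    r_coarse = restrict(residual(u, f, h))
    e_coarse = v_cycle([0.0] * len(r_coarse), r_coarse, 2 * h)
    u = [ui + ei for ui, ei in zip(u, prolong(e_coarse, len(u)))]
    return smooth(u, f, h)
```

    With f = 1 the exact solution is u(x) = x(1 - x)/2, which the second-difference scheme reproduces exactly at the grid points, so repeated V-cycles on a grid of 63 interior points should converge to it up to rounding.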

  17. Evolution of the Milieu Approach for Software Development for the Polymorphous Computing Architecture Program

    National Research Council Canada - National Science Library

    Dandass, Yoginder

    2004-01-01

    A key goal of the DARPA Polymorphous Computing Architectures (PCA) program is to develop reactive closed-loop systems that are capable of being dynamically reconfigured in order to respond to changing mission scenarios...

  18. Energy and architecture: improvement of energy performance in existing buildings

    Energy Technology Data Exchange (ETDEWEB)

    Haase, Matthias; Wycmans, Annemie; Solbraa, Anne; Grytli, Eir

    2011-07-01

    This book aims to give an overview of different aspects of retrofitting existing buildings. The target group is students of architecture and building engineering as well as building professionals. Eight out of ten buildings which we will inhabit in 2050 already exist. This means that a great potential for reducing our carbon footprint lies in the existing building stock. Students from NTNU have used the renovation of a 1950s school building at Linesøya in Sør-Trøndelag as a case to increase their awareness and knowledge about the challenges building professionals need to overcome to unite technical details and high user quality into good environmental performance. The students were invited by the building owners and initiators of LIPA Eco Project to contribute to its development: By retrofitting an existing building to passive house standards and combining this with energy generated on site, LIPA Eco Project aims to provide a hands-on example with regard to energy efficiency, architectural design and craftsmanship for a low carbon society. The overall goal for this project is to raise awareness regarding resource efficiency measures in architecture and particularly in existing building mass. (au)

  19. Computed radiography systems performance evaluation

    International Nuclear Information System (INIS)

    Xavier, Clarice C.; Nersissian, Denise Y.; Furquim, Tania A.C.

    2009-01-01

    The performance of a computed radiography system was evaluated according to AAPM Report No. 93. The evaluation tests proposed by the publication were performed, and the following nonconformities were found: imaging plate (IP) dark noise, which compromises the clinical image acquired using the IP; an uncalibrated exposure indicator, which can cause underexposure of the IP; nonlinearity of the system response, which causes overexposure; a resolution limit below that declared by the manufacturer and uncalibrated erasure thoroughness, impairing the visualization of structures; a Moiré pattern visible in the grid response; and IP throughput over that specified by the manufacturer. These nonconformities indicate that a lack of calibration in digital imaging systems can lead to an increase in dose as a way of compensating for image problems. (author)

  20. Applications of parallel computer architectures to the real-time simulation of nuclear power systems

    International Nuclear Information System (INIS)

    Doster, J.M.; Sills, E.D.

    1988-01-01

    In this paper the authors report on efforts to utilize parallel computer architectures for the thermal-hydraulic simulation of nuclear power systems and on current research toward the development of advanced reactor operator aids and control systems based on this new technology. Many aspects of reactor thermal-hydraulic calculations are inherently parallel, and the computationally intensive portions of these calculations can be effectively implemented on modern computers. Timing studies indicate that faster-than-real-time, high-fidelity physics models can be developed when the computational algorithms are designed to take advantage of the computer's architecture. These capabilities allow for the development of novel control systems and advanced reactor operator aids. Coupled with an integral real-time data acquisition system, evolving parallel computer architectures can provide operators and control room designers with improved control and protection capabilities. Research efforts are currently under way in this area.

  1. A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.

    Energy Technology Data Exchange (ETDEWEB)

    Vineyard, Craig Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]; Verzi, Stephen Joseph [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]

    2017-09-01

    As high-performance computing architectures pursue more computational power, there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an open challenge, and in this research we sought to investigate whether neural-inspired approaches can meaningfully help with memory management. In particular, we explored neurogenesis-inspired resource allocation and were able to show that a neural-inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.

  2. INVESTIGATION OF FLIP-FLOP PERFORMANCE ON DIFFERENT TYPE AND ARCHITECTURE IN SHIFT REGISTER WITH PARALLEL LOAD APPLICATIONS

    Directory of Open Access Journals (Sweden)

    Dwi Purnomo

    2015-08-01

    The register is one of the computer components that play a key role in computer organisation. Every computer contains millions of registers, which are implemented with flip-flops. This research investigates flip-flop performance based on type (D, T, S-R, and J-K) and architecture (structural, behavioural, and hybrid). Each type of flip-flop in each architecture was tested in shift registers of different bit widths with parallel load. The criteria assessed were power consumption, resources required, memory required, latency, and efficiency. The experiments showed that the D flip-flop and the hybrid architecture gave the best performance in required memory, latency, power consumption, and efficiency. In addition, the results showed that the greater the number of register bits, the less efficient the system.
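
    The shift-register-with-parallel-load application can be sketched at the behavioural level as follows. This is a hypothetical Python model using D flip-flops only; it illustrates the load/shift behaviour under test but does not model the power, resource, or latency measurements reported in the study, and the class and parameter names are illustrative.

```python
class ShiftRegister:
    """Behavioural model of an n-bit shift register with parallel load built from D flip-flops."""

    def __init__(self, width):
        self.q = [0] * width  # one D flip-flop output per bit; q[0] is the leftmost bit

    def clock(self, load=False, parallel_in=None, serial_in=0):
        """Apply one clock edge; every flip-flop captures the value on its D input."""
        if load:
            if parallel_in is None or len(parallel_in) != len(self.q):
                raise ValueError("parallel_in must supply one bit per flip-flop")
            d = list(parallel_in)             # input multiplexers select the parallel inputs
        else:
            d = [serial_in] + self.q[:-1]     # input multiplexers select the left neighbour: shift right
        self.q = d                            # all flip-flops update on the same edge
        return list(self.q)
```

    For example, after `clock(load=True, parallel_in=[1, 0, 1, 1])` the register holds `[1, 0, 1, 1]`, and a following plain `clock()` shifts in a 0 to give `[0, 1, 0, 1]`.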

  3. High-performance full adder architecture in quantum-dot cellular automata

    Directory of Open Access Journals (Sweden)

    Hamid Rashidi

    2017-06-01

    Quantum-dot cellular automata (QCA) is a new and promising computation paradigm that could be a viable replacement for complementary metal–oxide–semiconductor (CMOS) technology at the nanoscale. This technology offers a possible way to improve computation in various applications. Two QCA full adder architectures are presented and evaluated: a new and efficient 1-bit QCA full adder architecture and a 4-bit QCA ripple carry adder (RCA) architecture. The proposed architectures are simulated using QCADesigner version 2.0.1 and are implemented with the coplanar crossover approach. The simulation results show that the proposed 1-bit QCA full adder and 4-bit QCA RCA architectures use 33 and 175 QCA cells, respectively, and that they outperform most designs reported so far in the literature.
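
    QCA logic is usually built from three-input majority gates and inverters. The Python sketch below checks one well-known three-majority-gate formulation of a full adder, carry = M(a, b, cin) and sum = M(NOT carry, cin, M(a, b, NOT cin)), and chains four copies into a ripple carry adder. It verifies logic only; it is not necessarily the formulation behind the proposed 33-cell layout, and it says nothing about cell counts, crossovers, or clocking.

```python
def majority(a, b, c):
    """Three-input majority gate, the basic QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a, b, cin):
    """1-bit full adder built from three majority gates and two inverters."""
    cout = majority(a, b, cin)
    s = majority(1 - cout, cin, majority(a, b, 1 - cin))
    return s, cout


def ripple_carry_add(x, y, width=4):
    """Chain `width` full adders, least significant bit first; returns the (width+1)-bit sum."""
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << width)  # final carry becomes the top bit
```

    In a ripple carry adder each stage waits for the carry of the previous one, so its delay grows linearly with the word width.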

  4. Quantum perceptron over a field and neural network architecture selection in a quantum computer.

    Science.gov (United States)

    da Silva, Adenilton José; Ludermir, Teresa Bernarda; de Oliveira, Wilson Rosa

    2016-04-01

    In this work, we propose a quantum neural network named quantum perceptron over a field (QPF). Quantum computers are not yet a reality and the models and algorithms proposed in this work cannot be simulated in actual (or classical) computers. QPF is a direct generalization of a classical perceptron and solves some drawbacks found in previous models of quantum perceptrons. We also present a learning algorithm named Superposition based Architecture Learning algorithm (SAL) that optimizes the neural network weights and architectures. SAL searches for the best architecture in a finite set of neural network architectures with linear time over the number of patterns in the training set. SAL is the first learning algorithm to determine neural network architectures in polynomial time. This speedup is obtained by the use of quantum parallelism and a non-linear quantum operator. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Neuromorphic Computing, Architectures, Models, and Applications. A Beyond-CMOS Approach to Future Computing, June 29-July 1, 2016, Oak Ridge, TN

    Energy Technology Data Exchange (ETDEWEB)

    Potok, Thomas [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)]; Schuman, Catherine [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)]; Patton, Robert [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)]; Hylton, Todd [Brain Corporation, San Diego, CA (United States)]; Li, Hai [Univ. of Pittsburgh, PA (United States)]; Pino, Robinson [US Dept. of Energy, Washington, DC (United States)]

    2016-12-31

    The White House and Department of Energy have been instrumental in driving the development of a neuromorphic computing program to help the United States continue its lead in basic research into (1) Beyond Exascale—high performance computing beyond Moore’s Law and von Neumann architectures, (2) Scientific Discovery—new paradigms for understanding increasingly large and complex scientific data, and (3) Emerging Architectures—assessing the potential of neuromorphic and quantum architectures. Neuromorphic computing spans a broad range of scientific disciplines from materials science to devices, to computer science, to neuroscience, all of which are required to solve the neuromorphic computing grand challenge. In our workshop we focus on the computer science aspects, specifically from a neuromorphic device through an application. Neuromorphic devices present a very different paradigm to the computer science community from traditional von Neumann architectures, which raises six major questions about building a neuromorphic application from the device level. We used these fundamental questions to organize the workshop program and to direct the workshop panels and discussions. From the white papers, presentations, panels, and discussions, there emerged several recommendations on how to proceed.

  6. Multicore Challenges and Benefits for High Performance Scientific Computing

    Directory of Open Access Journals (Sweden)

    Ida M.B. Nielsen

    2008-01-01

    Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.

  7. Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs

    Science.gov (United States)

    Dias, Tiago; Roma, Nuno; Sousa, Leonel

    2014-12-01

    A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. In contrast to other designs with similar functionality, the presented architecture is built on a scalable, modular and fully configurable processing structure. This flexible structure not only allows the architecture to be easily reconfigured to support different transform kernels, but also allows it to be resized to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, it is highly suitable not only for realizing high-performance multi-standard transform cores, but also for highly efficient implementations of specialized processing structures that address only the reduced subset of transforms used by a specific video standard. The experimental results obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, these results demonstrate the ability of this processing structure to realize multi-standard transform cores that support all the standards mentioned above and can process the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.

  8. Peer-to-peer architectures for exascale computing : LDRD final report.

    Energy Technology Data Exchange (ETDEWEB)

    Vorobeychik, Yevgeniy; Mayo, Jackson R.; Minnich, Ronald G.; Armstrong, Robert C.; Rudish, Donald W.

    2010-09-01

    The goal of this research was to investigate the potential for employing dynamic, decentralized software architectures to achieve reliability in future high-performance computing platforms. These architectures, inspired by peer-to-peer networks such as botnets that already scale to millions of unreliable nodes, hold promise for enabling scientific applications to run usefully on next-generation exascale platforms (~10^18 operations per second). Traditional parallel programming techniques suffer rapid deterioration of performance scaling with growing platform size, as the work of coping with increasingly frequent failures dominates over useful computation. Our studies suggest that new architectures, in which failures are treated as ubiquitous and their effects are considered as simply another controllable source of error in a scientific computation, can remove such obstacles to exascale computing for certain applications. We have developed a simulation framework, as well as a preliminary implementation in a large-scale emulation environment, for exploration of these 'fault-oblivious computing' approaches. High-performance computing (HPC) faces a fundamental problem of increasing total component failure rates due to increasing system sizes, which threaten to degrade system reliability to an unusable level by the time the exascale range is reached (~10^18 operations per second, requiring of order millions of processors). As computer scientists seek a way to scale system software for next-generation exascale machines, it is worth considering peer-to-peer (P2P) architectures that are already capable of supporting 10^6-10^7 unreliable nodes. Exascale platforms will require a different way of looking at systems and software because the machine will likely not be available in its entirety for a meaningful execution time. Realistic estimates of failure rates range from a few times per day to more than once per hour for these
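    The idea of treating failures as "another controllable source of error" can be illustrated with an embarrassingly parallel Monte Carlo estimate: results from failed workers are simply dropped rather than recovered, so a failure only shrinks the sample and widens the error bar. This is a minimal sketch under that assumption; the failure model, parameters and function names are hypothetical and not taken from the report's simulation framework.

```python
import random


def worker_estimate(rng, samples):
    """Fraction of random points in the unit square inside the quarter circle."""
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits / samples


def fault_oblivious_pi(n_workers=200, samples=2000, failure_prob=0.3, seed=1):
    """Estimate pi from independent workers. A failed worker's result is
    ignored (no checkpoint, no restart); the estimate is formed from survivors."""
    rng = random.Random(seed)
    survivors = []
    for _ in range(n_workers):
        if rng.random() < failure_prob:  # simulated node failure: no result
            continue
        survivors.append(worker_estimate(rng, samples))
    if not survivors:
        raise RuntimeError("every worker failed")
    return 4.0 * sum(survivors) / len(survivors), len(survivors)
```

    With roughly 30% of workers failing, the estimate stays within a few thousandths of pi; the cost of the failures shows up only as a larger statistical error, not as a stalled computation.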

  9. Scaling to Nanotechnology Limits with the PIMS Computer Architecture and a new Scaling Rule

    Energy Technology Data Exchange (ETDEWEB)

    Debenedictis, Erik P. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)]

    2015-02-01

    We describe a new approach to computing that moves towards the limits of nanotechnology using a newly formulated scaling rule. This is in contrast to the current computer industry scaling away from von Neumann's original computer at the rate of Moore's Law. We extend Moore's Law to 3D, which leads generally to architectures that integrate logic and memory. Keeping power dissipation constant through a 2D surface of the 3D structure requires using adiabatic principles. We call our newly proposed architecture Processor In Memory and Storage (PIMS). We propose a new computational model that integrates processing and memory into "tiles" that comprise logic, memory/storage, and communications functions. Since the programming model will be relatively stable as a system scales, programs represented by tiles could be executed in a PIMS system built with today's technology or could become the "schematic diagram" for implementation in an ultimate 3D nanotechnology of the future. We build a systems software approach that offers advantages over and above the technological and architectural advantages. First, the algorithms may be more efficient in the conventional sense of having fewer steps. Second, the algorithms may run with higher power efficiency per operation by being a better match for the adiabatic scaling rule. The performance analysis based on demonstrated ideas in physical science suggests an 80,000x improvement in cost per operation for the (arguably) general-purpose function of emulating neurons in Deep Learning.

  10. Laboratory infrastructure driven key performance indicator development using the smart grid architecture model

    DEFF Research Database (Denmark)

    Syed, Mazheruddin H.; Guillo-Sansano, Efren; Blair, Steven M.

    2017-01-01

    This study presents a methodology for collaboratively designing laboratory experiments and developing key performance indicators for the testing and validation of novel power system control architectures in multiple laboratory environments. The contribution makes use of the smart grid architecture...

  11. Cloud Computing: A study of cloud architecture and its patterns

    OpenAIRE

    Handa, Mandeep; Sharma, Shriya

    2015-01-01

    Cloud computing is a general term for anything that involves delivering hosted services over the Internet. It is a paradigm shift following the shift from mainframe to client–server in the early 1980s. Cloud computing can be defined as accessing third-party software and services on the web and paying per usage. It provides scalability and virtualized resources over the Internet as a service, offering a cost-effective and scalable solution to customers. Cloud computing has...

  12. From Smart-Eco Building to High-Performance Architecture: Optimization of Energy Consumption in Architecture of Developing Countries

    Science.gov (United States)

    Mahdavinejad, M.; Bitaab, N.

    2017-08-01

    The search for high-performance architecture and visions of future architecture have resulted in attempts toward energy-efficient architecture and planning in different respects. Recent trends aimed at meeting the future legacy of architecture are based on innovative technologies for resource-efficient buildings, performative design, bio-inspired technologies, etc., while there are meaningful differences between the architecture of developed and developing countries. The significance of the issue becomes clear when emerging cities are found to be interested in Dubaization and other related booming development doctrines. This paper analyzes the level of success of developing countries in achieving the goals and objectives of smart-eco buildings. Emerging cities of West Asia are selected as case studies. The results show that the concepts of high-performance architecture and smart-eco buildings differ between developing and developed countries. The paper identifies five essential issues for improving the future architecture of developing countries: 1- Integrated Strategies for Energy Efficiency, 2- Contextual Solutions, 3- Embedded and Initial Energy Assessment, 4- Staff and Occupancy Wellbeing, 5- Life-Cycle Monitoring.

  13. Impact of Cognitive Architectures on Human-Computer Interaction

    Science.gov (United States)

    2014-09-01

    activation, reinforced learning, emotion, semantic memory, episodic memory, and visual imagery.12 In 2010 Rosenbloom created a variant of the Soar ... being added to almost every new version. In 2004 Nuxoll and Laird added episodic memory to the Soar architecture.11 In 2008 Laird presented ... York (NY): Psychology Press; 2014; p. 1–50. 11. Nuxoll A, Laird JE. A cognitive model of episodic memory integrated with a general cognitive

  14. High Performance Spaceflight Computing (HPSC)

    Data.gov (United States)

    National Aeronautics and Space Administration — Space-based computing has not kept up with the needs of current and future NASA missions. We are developing a next-generation flight computing system that addresses...

  15. Do Performance-Based Codes Support Universal Design in Architecture?

    DEFF Research Database (Denmark)

    Grangaard, Sidse; Frandsen, Anne Kathrine

    2016-01-01

    – Universal Design (UD). The empirical material consists of input from six workshops to which all 700 Danish architectural firms were invited, as well as eight group interviews. The analysis shows that the current prescriptive requirements are criticized for being too homogeneous and possibilities [...] for differentiation and zoning are required. Therefore, a majority of professionals are interested in a performance-based model because they think that such a model will support ‘accessibility zoning’, achieving flexibility because of different levels of accessibility in a building due to its performance. The common [...] of educational objectives is suggested as a tool for such a boost. The research project has been financed by the Danish Transport and Construction Agency....

  16. High-Performance Monitoring Architecture for Large-Scale Distributed Systems Using Event Filtering

    Science.gov (United States)

    Maly, K.

    1998-01-01

    Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed (LSD) systems. In an LSD environment, a large number of events is generated by the system components during their execution or interaction with external objects (e.g. users or processes). Monitoring such events is necessary for observing the run-time behavior of LSD systems and providing status information required for debugging, tuning and managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the application's environment, which complicates management decisions and thereby makes monitoring LSD systems an intricate task. We propose a scalable high-performance monitoring architecture for LSD systems to detect and classify interesting local and global events and disseminate the monitoring information to the corresponding end-point management applications, such as debugging and reactive control tools, to improve application performance and reliability. A large volume of events may be generated due to the extensive demands of the monitoring applications and the high interaction of LSD systems. The monitoring architecture employs a high-performance event filtering mechanism to efficiently process the large volume of event traffic generated by LSD systems and minimize the intrusiveness of the monitoring process by reducing the event traffic flow in the system and distributing the monitoring computation. Our architecture also supports dynamic and flexible reconfiguration of the monitoring mechanism via its instrumentation and subscription components. As a case study, we show how our monitoring architecture can be utilized to improve the reliability and the performance of the Interactive Remote Instruction (IRI) system, which is a large-scale distributed system for collaborative distance learning. The filtering mechanism represents an intrinsic component integrated
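    Subscription-based event filtering, as described above, can be sketched in a few lines: each management application registers a predicate, and an event is forwarded only if at least one predicate matches, so uninteresting events are dropped close to their source. The `Event` fields, class names and counters below are hypothetical and only illustrate the filtering idea, not the paper's architecture.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    source: str
    kind: str
    value: float


@dataclass
class EventFilter:
    """Forward an event only to subscribers whose predicate matches it."""
    subscriptions: list = field(default_factory=list)
    forwarded: int = 0
    dropped: int = 0

    def subscribe(self, predicate: Callable[[Event], bool],
                  deliver: Callable[[Event], None]) -> None:
        self.subscriptions.append((predicate, deliver))

    def publish(self, event: Event) -> None:
        matched = False
        for predicate, deliver in self.subscriptions:
            if predicate(event):
                deliver(event)
                matched = True
        if matched:
            self.forwarded += 1
        else:
            self.dropped += 1  # never leaves the local node


# Usage: a debugging tool interested only in errors and slow responses
inbox = []
monitor = EventFilter()
monitor.subscribe(lambda e: e.kind == "error", inbox.append)
monitor.subscribe(lambda e: e.kind == "latency" and e.value > 100.0, inbox.append)
for ev in [Event("node1", "latency", 12.0), Event("node2", "latency", 250.0),
           Event("node3", "error", 1.0), Event("node1", "heartbeat", 0.0)]:
    monitor.publish(ev)
```

    Here two of the four events reach the tool and two are filtered out locally. Reconfiguring the monitor at run time amounts to adding or removing subscriptions.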

  17. Architecture for high performance stereoscopic game rendering on Android

    Science.gov (United States)

    Flack, Julien; Sanderson, Hugh; Shetty, Sampath

    2014-03-01

    Stereoscopic gaming is a popular source of content for consumer 3D display systems. There has been a significant shift in the gaming industry towards casual games for mobile devices running on the Android™ Operating System and driven by ARM™ and other low-power processors. Such systems are now being integrated directly into the next generation of 3D TVs, potentially removing the requirement for an external games console. Although native stereo support has been integrated into some high-profile titles on established platforms like Windows PC and PS3, there is a lack of GPU-independent 3D support for the emerging Android platform. We describe a framework for enabling stereoscopic 3D gaming on Android for applications on mobile devices, set-top boxes and TVs. A core component of the architecture is a 3D game driver, which is integrated into the Android OpenGL™ ES graphics stack to convert existing 2D graphics applications into stereoscopic 3D in real time. The architecture includes a method of analyzing 2D games and using rule-based Artificial Intelligence (AI) to position separate objects in 3D space. We describe an innovative stereo 3D rendering technique to separate the views in the depth domain and render directly into the display buffer. The advantages of the stereo renderer are demonstrated by characterizing its performance in comparison to more traditional render techniques, including depth-based image rendering, both in terms of frame rates and impact on battery consumption.
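    Once each 2D object has been assigned a depth, the left- and right-eye images differ only by a horizontal shift. The sketch below uses the common parallel-camera parallax relation p = i·(z − c)/z (interaxial separation i, convergence distance c, object depth z), assuming depths are already known; the default values, units and function names are hypothetical, and conversion to screen pixels is omitted. It is not the paper's depth-domain renderer.

```python
def screen_parallax(depth, interaxial=0.065, convergence=2.0):
    """Horizontal parallax for an object at `depth` (same units as the other
    arguments). Zero at the convergence plane, positive (behind the screen)
    for farther objects, negative (in front) for nearer ones, and approaching
    `interaxial` as depth grows."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return interaxial * (depth - convergence) / depth


def stereo_positions(x, depth, **kwargs):
    """Split a sprite's x position into left-eye and right-eye positions."""
    p = screen_parallax(depth, **kwargs)
    return x - p / 2.0, x + p / 2.0
```

    An object at the convergence distance renders at the same position for both eyes; an object twice as far away is shifted by half the interaxial distance in total.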

  18. Quantum Accelerators for High-performance Computing Systems

    Energy Technology Data Exchange (ETDEWEB)

    Humble, Travis S. [ORNL]; Britt, Keith A. [ORNL]; Mohiyaddin, Fahd A. [ORNL]

    2017-11-01

    We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenge the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system to manage these unique accelerator resources, the prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.
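    The offload pattern, in which a host program hands a small quantum kernel to an accelerator and receives measurement statistics back, can be sketched with a classical state-vector simulator standing in for the quantum processing unit. The gate encoding (`("h", q)`, `("cx", c, t)`) and the function names are hypothetical; this is not the API of the framework described in the abstract.

```python
import math


def apply_h(state, qubit, n):
    """Hadamard on `qubit` of an n-qubit state vector (list of complex)."""
    s = 1 / math.sqrt(2)
    new = state[:]
    for i in range(2 ** n):
        if not (i >> qubit) & 1:  # visit each |..0..>, |..1..> pair once
            j = i | (1 << qubit)
            a, b = state[i], state[j]
            new[i], new[j] = s * (a + b), s * (a - b)
    return new


def apply_cnot(state, control, target, n):
    """Flip `target` on every basis state whose `control` bit is 1."""
    new = state[:]
    for i in range(2 ** n):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new[i], new[j] = state[j], state[i]
    return new


def execute(kernel, n_qubits):
    """Simulated accelerator: run a gate list from |0...0> and return the
    probability of each computational basis state."""
    state = [0j] * (2 ** n_qubits)
    state[0] = 1 + 0j
    for gate in kernel:
        if gate[0] == "h":
            state = apply_h(state, gate[1], n_qubits)
        elif gate[0] == "cx":
            state = apply_cnot(state, gate[1], gate[2], n_qubits)
        else:
            raise ValueError(f"unsupported gate {gate[0]!r}")
    return [abs(a) ** 2 for a in state]


# Host side: build a kernel (Bell-state preparation) and offload it
bell_probs = execute([("h", 0), ("cx", 0, 1)], n_qubits=2)
```

    The host receives probabilities of about 0.5 for |00⟩ and |11⟩ and zero elsewhere; on real hardware it would instead receive sampled measurement counts.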

  19. A Survey and Evaluation of Simulators Suitable for Teaching Courses in Computer Architecture and Organization

    Science.gov (United States)

    Nikolic, B.; Radivojevic, Z.; Djordjevic, J.; Milutinovic, V.

    2009-01-01

    Courses in Computer Architecture and Organization are regularly included in Computer Engineering curricula. These courses are usually organized in such a way that students obtain not only a purely theoretical experience, but also a practical understanding of the topics lectured. This practical work is usually done in a laboratory using simulators…

  20. From Archi Torture to Architecture: Undergraduate Students Design and Implement Computers Using the Multimedia Logic Emulator

    Science.gov (United States)

    Stanley, Timothy D.; Wong, Lap Kei; Prigmore, Daniel; Benson, Justin; Fishler, Nathan; Fife, Leslie; Colton, Don

    2007-01-01

    Students learn better when they both hear and do. In computer architecture courses "doing" can be difficult in small schools without hardware laboratories hosted by computer engineering, electrical engineering, or similar departments. Software solutions exist. Our success with George Mills' Multimedia Logic (MML) is the focus of this paper. MML…

  1. A Project-Based Learning Approach to Programmable Logic Design and Computer Architecture

    Science.gov (United States)

    Kellett, C. M.

    2012-01-01

    This paper describes a course in programmable logic design and computer architecture as it is taught at the University of Newcastle, Australia. The course is designed around a major design project and has two supplemental assessment tasks that are also described. The context of the Computer Engineering degree program within which the course is…

  2. A cerebellar neuroprosthetic system: computational architecture and in vivo experiments

    Directory of Open Access Journals (Sweden)

    Ivan Herreros Alonso

    2014-05-01

    Emulating the input-output functions performed by a brain structure opens the possibility for developing neuro-prosthetic systems that replace damaged neuronal circuits. Here, we demonstrate the feasibility of this approach by replacing the cerebellar circuit responsible for the acquisition and extinction of motor memories. Specifically, we show that a rat can undergo acquisition, retention and extinction of the eye-blink reflex even though the biological circuit responsible for this task has been chemically inactivated via anesthesia. This is achieved by first developing a computational model of the cerebellar microcircuit involved in the acquisition of conditioned reflexes and training it with synthetic data generated based on physiological recordings. Secondly, the cerebellar model is interfaced with the brain of an anesthetized rat, connecting the model's inputs and outputs to afferent and efferent cerebellar structures. As a result, we show that the anesthetized rat, equipped with our neuro-prosthetic system, can be classically conditioned to the acquisition of an eye-blink response. However, non-stationarities in the recorded biological signals limit the performance of the cerebellar model. Thus, we introduce an updated cerebellar model and validate it with physiological recordings showing that learning becomes stable and reliable. The resulting system represents an important step towards replacing lost functions of the central nervous system via neuro-prosthetics, obtained by integrating a synthetic circuit with the afferent and efferent pathways of a damaged brain region. These results also embody an early example of science-based medicine, where on the one hand the neuro-prosthetic system directly validates a theory of cerebellar learning that informed the design of the system, and on the other it takes a step towards the development of neuro-prostheses that could recover lost learning functions in animals and, in the longer term

  3. A Cerebellar Neuroprosthetic System: Computational Architecture and in vivo Test

    International Nuclear Information System (INIS)

    Herreros, Ivan; Giovannucci, Andrea; Taub, Aryeh H.; Hogri, Roni; Magal, Ari; Bamford, Sim; Prueckl, Robert; Verschure, Paul F. M. J.

    2014-01-01

    Emulating the input–output functions performed by a brain structure opens the possibility for developing neuroprosthetic systems that replace damaged neuronal circuits. Here, we demonstrate the feasibility of this approach by replacing the cerebellar circuit responsible for the acquisition and extinction of motor memories. Specifically, we show that a rat can undergo acquisition, retention, and extinction of the eye-blink reflex even though the biological circuit responsible for this task has been chemically inactivated via anesthesia. This is achieved by first developing a computational model of the cerebellar microcircuit involved in the acquisition of conditioned reflexes and training it with synthetic data generated based on physiological recordings. Secondly, the cerebellar model is interfaced with the brain of an anesthetized rat, connecting the model’s inputs and outputs to afferent and efferent cerebellar structures. As a result, we show that the anesthetized rat, equipped with our neuroprosthetic system, can be classically conditioned to the acquisition of an eye-blink response. However, non-stationarities in the recorded biological signals limit the performance of the cerebellar model. Thus, we introduce an updated cerebellar model and validate it with physiological recordings showing that learning becomes stable and reliable. The resulting system represents an important step toward replacing lost functions of the central nervous system via neuroprosthetics, obtained by integrating a synthetic circuit with the afferent and efferent pathways of a damaged brain region. These results also embody an early example of science-based medicine, where on the one hand the neuroprosthetic system directly validates a theory of cerebellar learning that informed the design of the system, and on the other it takes a step toward the development of neuro-prostheses that could recover lost learning functions in animals and, in the longer term, humans.
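    The acquisition and extinction phases of eye-blink conditioning can be illustrated with the textbook Rescorla–Wagner update V ← V + α(λ − V), where λ = 1 on tone-plus-air-puff trials and λ = 0 on tone-alone trials. This is a deliberately simpler, swapped-in associative-learning rule, not the cerebellar microcircuit model used in the paper; the learning rate, threshold and trial counts are hypothetical.

```python
def simulate_conditioning(n_acq=50, n_ext=50, alpha=0.1, threshold=0.5):
    """Track the associative strength V of the conditioned stimulus (tone).
    Acquisition trials pair the tone with the unconditioned stimulus (lambda=1);
    extinction trials present the tone alone (lambda=0). A conditioned
    eye-blink is counted whenever V exceeds `threshold`."""
    v = 0.0
    history = []
    for trial in range(n_acq + n_ext):
        us_present = trial < n_acq
        target = 1.0 if us_present else 0.0
        v += alpha * (target - v)
        history.append((trial, us_present, v, v > threshold))
    return history


trials = simulate_conditioning()
```

    After 50 paired trials V is about 0.995 and a conditioned response is produced; after 50 tone-alone trials it decays to about 0.005 and the response is extinguished.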

  4. A Cerebellar Neuroprosthetic System: Computational Architecture and in vivo Test

    Energy Technology Data Exchange (ETDEWEB)

    Herreros, Ivan; Giovannucci, Andrea [Synthetic Perceptive, Emotive and Cognitive Systems group (SPECS), Universitat Pompeu Fabra, Barcelona (Spain); Taub, Aryeh H.; Hogri, Roni; Magal, Ari [Psychobiology Research Unit, Tel Aviv University, Tel Aviv (Israel); Bamford, Sim [Physics Laboratory, Istituto Superiore di Sanità, Rome (Italy); Prueckl, Robert [Guger Technologies OG, Graz (Austria); Verschure, Paul F. M. J., E-mail: paul.verschure@upf.edu [Synthetic Perceptive, Emotive and Cognitive Systems group (SPECS), Universitat Pompeu Fabra, Barcelona (Spain); Institució Catalana de Recerca i Estudis Avançats, Barcelona (Spain)

    2014-05-21

    Emulating the input–output functions performed by a brain structure opens the possibility for developing neuroprosthetic systems that replace damaged neuronal circuits. Here, we demonstrate the feasibility of this approach by replacing the cerebellar circuit responsible for the acquisition and extinction of motor memories. Specifically, we show that a rat can undergo acquisition, retention, and extinction of the eye-blink reflex even though the biological circuit responsible for this task has been chemically inactivated via anesthesia. This is achieved by first developing a computational model of the cerebellar microcircuit involved in the acquisition of conditioned reflexes and training it with synthetic data generated based on physiological recordings. Secondly, the cerebellar model is interfaced with the brain of an anesthetized rat, connecting the model’s inputs and outputs to afferent and efferent cerebellar structures. As a result, we show that the anesthetized rat, equipped with our neuroprosthetic system, can be classically conditioned to the acquisition of an eye-blink response. However, non-stationarities in the recorded biological signals limit the performance of the cerebellar model. Thus, we introduce an updated cerebellar model and validate it with physiological recordings showing that learning becomes stable and reliable. The resulting system represents an important step toward replacing lost functions of the central nervous system via neuroprosthetics, obtained by integrating a synthetic circuit with the afferent and efferent pathways of a damaged brain region. These results also embody an early example of science-based medicine, where on the one hand the neuroprosthetic system directly validates a theory of cerebellar learning that informed the design of the system, and on the other it takes a step toward the development of neuro-prostheses that could recover lost learning functions in animals and, in the longer term, humans.

  5. Cloud Computing Databases: Latest Trends and Architectural Concepts

    OpenAIRE

    Tarandeep Singh; Parvinder S. Sandhu

    2011-01-01

    Economic factors are leading to the rise of infrastructures that provide software and computing facilities as a service, known as cloud services or cloud computing. Cloud services can provide efficiencies for application providers, both by limiting up-front capital expenses and by reducing the cost of ownership over time. Such services are made available in a data center, using shared commodity hardware for computation and storage. There is a varied set of cloud services...

  6. High performance computing in Windows Azure cloud

    OpenAIRE

    Ambruš, Dejan

    2013-01-01

    High performance, security, availability, scalability, flexibility and lower maintenance costs have contributed substantially to the growing popularity of cloud computing in all spheres of life, especially in business. In fact, cloud computing offers even more than this. Using virtual computing clusters, a runtime environment for high-performance computing can also be efficiently implemented in a cloud. There are many advantages but also some disadvantages of cloud computing, some ...

  7. A learnable parallel processing architecture towards unity of memory and computing.

    Science.gov (United States)

    Li, H; Gao, B; Chen, Z; Zhao, Y; Huang, P; Ye, H; Liu, L; Liu, X; Kang, J

    2015-08-14

    Developing energy-efficient parallel information processing systems beyond the von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. To meet the need for efficient information processing in data-driven applications such as big data and the Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices, named "iMemComp", in which memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped "iMemComp" with the capability to compute in parallel and to learn user-defined logic functions for large-scale information processing tasks. Such an architecture eliminates the energy-hungry data movement of von Neumann computers. Compared with contemporary silicon technology, adder circuits based on "iMemComp" can improve speed by 76.8% and reduce power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.
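    One way to picture "learning user-defined logic functions" in a nonvolatile array is a lookup table stored directly in resistance states: each row corresponds to one input combination, each column to one output bit, learning means programming cells from example input/output pairs, and evaluation reads the addressed row so that all output bits appear at once. This is a hypothetical software model of that idea, shown here with a full adder; it is not the iMemComp circuit, whose device-level logic scheme is not described in the abstract.

```python
from itertools import product

LRS, HRS = 1, 0  # low-resistance ("on") and high-resistance ("off") cell states


class CrossbarLUT:
    """Hypothetical model: 2**n_inputs rows x n_outputs columns of RS cells."""

    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs
        self.cells = [[HRS] * n_outputs for _ in range(2 ** n_inputs)]

    def _row(self, bits):
        return int("".join(str(b) for b in bits), 2)

    def learn(self, examples):
        """Program the cells from (inputs, outputs) example pairs."""
        for inputs, outputs in examples:
            self.cells[self._row(inputs)] = [LRS if o else HRS for o in outputs]

    def evaluate(self, inputs):
        """Read one row; every output column is sensed in parallel."""
        return tuple(1 if c == LRS else 0 for c in self.cells[self._row(inputs)])


# Learn a 1-bit full adder: outputs are (carry, sum)
full_adder = CrossbarLUT(n_inputs=3, n_outputs=2)
full_adder.learn(((a, b, c), ((a + b + c) >> 1, (a + b + c) & 1))
                 for a, b, c in product((0, 1), repeat=3))
```

    For instance, `full_adder.evaluate((1, 1, 1))` returns `(1, 1)`. Reprogramming the cells with different examples yields a different logic function without changing the array.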

  8. A learnable parallel processing architecture towards unity of memory and computing

    Science.gov (United States)

    Li, H.; Gao, B.; Chen, Z.; Zhao, Y.; Huang, P.; Ye, H.; Liu, L.; Liu, X.; Kang, J.

    2015-08-01

    Developing energy-efficient parallel information processing systems beyond the von Neumann architecture is a long-standing goal of modern information technologies. The widely used von Neumann computer architecture separates memory and computing units, which leads to energy-hungry data movement when computers work. To meet the need for efficient information processing in data-driven applications such as big data and the Internet of Things, an energy-efficient processing architecture beyond von Neumann is critical for the information society. Here we show a non-von Neumann architecture built of resistive switching (RS) devices, named “iMemComp”, in which memory and logic are unified with single-type devices. Leveraging the nonvolatile nature and structural parallelism of crossbar RS arrays, we have equipped “iMemComp” with the capability to compute in parallel and to learn user-defined logic functions for large-scale information processing tasks. Such an architecture eliminates the energy-hungry data movement of von Neumann computers. Compared with contemporary silicon technology, adder circuits based on “iMemComp” can improve speed by 76.8% and reduce power dissipation by 60.3%, together with an aggressive 700-fold reduction in circuit area.

  9. Computer Assessed Design – A Vehicle of Architectural Communication and a Design Tool

    OpenAIRE

    Petrovici, Liliana-Mihaela

    2012-01-01

    In comparison with the limits of the traditional representation tools, the development of the computer graphics constitutes an opportunity to assert architectural values. The differences between communication codes of the architects and public are diminished; the architectural ideas can be represented in a coherent, intelligible and attractive way, so that they get more chances to be materialized according to the thinking of the creator. Concurrently, the graphic software have been improving ...

  10. A Secure Message Transmission System Architecture for Computer Networks Employing Smart Cards

    Directory of Open Access Journals (Sweden)

    Geylani KARDAŞ

    2008-01-01

    In this study, we introduce a mobile system architecture that employs smart cards for secure message transmission in computer networks. In our design, the use of smart cards provides two security services: authentication and confidentiality. The security of the system is provided by asymmetric encryption; hence, smart cards are used to store personal account information as well as each user's private key for encryption/decryption operations. This offers further security, authentication and mobility to the system architecture. A real implementation of the proposed architecture that utilizes JavaCard technology is also discussed in this study.
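    The key-custody idea, where the private key is stored on the card and all private-key operations run on the card, can be sketched with textbook RSA. This is a toy under stated assumptions: the tiny primes, the `SmartCard` class and the function names are hypothetical, textbook RSA without padding is insecure, and nothing here reflects the JavaCard API. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
class SmartCard:
    """Toy card: the private exponent exists only inside this object and is
    used only by methods that would execute on the card itself."""

    def __init__(self, p, q, e=17):
        n = p * q
        phi = (p - 1) * (q - 1)
        self.public_key = (n, e)      # may be exported freely
        self._d = pow(e, -1, phi)     # private exponent, never exported

    def decrypt(self, ciphertext):
        n, _ = self.public_key
        return pow(ciphertext, self._d, n)


def encrypt(message, public_key):
    """Sender side: only the card holder's public key is needed."""
    n, e = public_key
    if not 0 <= message < n:
        raise ValueError("message must be an integer in [0, n)")
    return pow(message, e, n)


card = SmartCard(61, 53)                 # n = 3233, e = 17
ciphertext = encrypt(65, card.public_key)
plaintext = card.decrypt(ciphertext)
```

    With these textbook parameters the message 65 encrypts to 2790 and the card decrypts it back to 65. A real card would also sign challenges for authentication, using the same private key without ever exposing it.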

  11. From variability tolerance to approximate computing in parallel integrated architectures and accelerators

    CERN Document Server

    Rahimi, Abbas; Gupta, Rajesh K

    2017-01-01

    This book focuses on computing devices and their design at various levels to combat variability. The authors provide a review of key concepts with particular emphasis on timing errors caused by various variability sources. They discuss methods to predict and prevent such errors, to detect and correct them, and finally the conditions under which such errors can be accepted; they also consider their implications for cost, performance and quality. Coverage includes a comparative evaluation of methods for deployment across various layers of the system, from circuits and architecture to application software. These can be combined in various ways to achieve specific goals related to observability and controllability of the variability effects, providing means to achieve cross-layer or hybrid resilience. · Covers challenges and opportunities in identifying microelectronic variability and the resulting errors at various layers in the system abstraction; · Enables readers to assess how various levels of circuit and system design can mitigate t...

  12. Analysis of Different Blade Architectures on small VAWT Performance

    Science.gov (United States)

    Battisti, L.; Brighenti, A.; Benini, E.; Raciti Castelli, M.

    2016-09-01

    The present paper aims at describing and comparing different small Vertical Axis Wind Turbine (VAWT) architectures, in terms of performance and loads. These characteristics can be highlighted by resorting to the Blade Element-Momentum (BE-M) model, commonly adopted for rotor pre-design and controller assessment. After validating the model with experimental data, the paper focuses on the analysis of VAWT loads depending on some relevant rotor features: blade number (2 and 3), airfoil camber line (comparing symmetrical and asymmetrical profiles) and blade inclination (straight versus helical blade). The effect of such characteristics on both power and thrusts (in the streamwise direction and in the crosswise one) as a function of both the blades azimuthal position and their Tip Speed Ratio (TSR) are presented and widely discussed.
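    BE-M results for small VAWTs are usually reported against two non-dimensional quantities: the tip speed ratio λ = ωR/V∞ and the power coefficient Cp = P / (½ρAV∞³), where for a straight-bladed rotor the swept area is A = 2R·H. The helpers below only compute these quantities and are not a BE-M solver; the numbers in the usage line are hypothetical.

```python
def tip_speed_ratio(omega, radius, wind_speed):
    """lambda = omega * R / V_inf (omega in rad/s, R in m, V_inf in m/s)."""
    return omega * radius / wind_speed


def swept_area(radius, height):
    """Swept area of a straight-bladed (H-type) VAWT: diameter times height."""
    return 2.0 * radius * height


def power_coefficient(power, rho, area, wind_speed):
    """Cp = P / (0.5 * rho * A * V_inf**3), P in W, rho in kg/m^3, A in m^2."""
    return power / (0.5 * rho * area * wind_speed ** 3)


# Hypothetical rotor: R = 1 m, H = 2 m, 20 rad/s in an 8 m/s wind, 400 W output
lam = tip_speed_ratio(20.0, 1.0, 8.0)                               # 2.5
cp = power_coefficient(400.0, 1.225, swept_area(1.0, 2.0), 8.0)     # about 0.32
```

    Because Cp scales with V∞³, the same rotor's power curve is normally compared across designs as Cp(λ) rather than as raw power.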

  13. Gate errors in solid-state quantum-computer architectures

    International Nuclear Information System (INIS)

    Hu Xuedong; Das Sarma, S.

    2002-01-01

    We theoretically consider possible errors in solid-state quantum computation due to the interplay of the complex solid-state environment and gate imperfections. In particular, we study two examples of gate operations at opposite ends of the gate-speed spectrum: an adiabatic gate operation in electron-spin-based quantum dot quantum computation and a sudden gate operation in Cooper-pair-box superconducting quantum computation. We evaluate quantitatively the nonadiabatic operation of a two-qubit gate in a two-electron double quantum dot. We also analyze the nonsudden pulse gate in a Cooper-pair-box-based quantum-computer model. In both cases our numerical results show strong influences of the higher excited states of the system on the gate operation, clearly demonstrating the importance of a detailed understanding of the relevant Hilbert-space structure for quantum-computer operations.

  14. Component-based software for high-performance scientific computing

    Energy Technology Data Exchange (ETDEWEB)

    Alexeev, Yuri; Allan, Benjamin A; Armstrong, Robert C; Bernholdt, David E; Dahlgren, Tamara L; Gannon, Dennis; Janssen, Curtis L; Kenny, Joseph P; Krishnan, Manojkumar; Kohl, James A; Kumfert, Gary; McInnes, Lois Curfman; Nieplocha, Jarek; Parker, Steven G; Rasmussen, Craig; Windus, Theresa L

    2005-01-01

    Recent advances in both computational hardware and multidisciplinary science have given rise to an unprecedented level of complexity in scientific simulation software. This paper describes an ongoing grassroots effort aimed at addressing complexity in high-performance computing through the use of Component-Based Software Engineering (CBSE). Highlights of the benefits and accomplishments of the Common Component Architecture (CCA) Forum and SciDAC ISIC are given, followed by an illustrative example of how the CCA has been applied to drive scientific discovery in quantum chemistry. Thrusts for future research are also described briefly.

  15. Component-based software for high-performance scientific computing

    International Nuclear Information System (INIS)

    Alexeev, Yuri; Allan, Benjamin A; Armstrong, Robert C; Bernholdt, David E; Dahlgren, Tamara L; Gannon, Dennis; Janssen, Curtis L; Kenny, Joseph P; Krishnan, Manojkumar; Kohl, James A; Kumfert, Gary; McInnes, Lois Curfman; Nieplocha, Jarek; Parker, Steven G; Rasmussen, Craig; Windus, Theresa L

    2005-01-01

    Recent advances in both computational hardware and multidisciplinary science have given rise to an unprecedented level of complexity in scientific simulation software. This paper describes an ongoing grassroots effort aimed at addressing complexity in high-performance computing through the use of Component-Based Software Engineering (CBSE). Highlights of the benefits and accomplishments of the Common Component Architecture (CCA) Forum and SciDAC ISIC are given, followed by an illustrative example of how the CCA has been applied to drive scientific discovery in quantum chemistry. Thrusts for future research are also described briefly.

  16. Real-time field programmable gate array architecture for computer vision

    Science.gov (United States)

    Arias-Estrada, Miguel; Torres-Huitzil, Cesar

    2001-01-01

    This paper presents an architecture for real-time generic convolution of a mask and an image. The architecture is intended for fast low-level image processing. The field programmable gate array (FPGA)-based architecture takes advantage of the registers available in FPGAs to implement an efficient and compact module for processing the convolutions. The architecture is designed to minimize the number of accesses to the image memory and is based on parallel modules with internal pipelined operation to improve performance. The architecture is prototyped in an FPGA, but it can be implemented on dedicated very-large-scale-integration (VLSI) devices to reach higher clock frequencies. Complexity issues, FPGA resource utilization, FPGA limitations, and real-time performance are discussed, and some results are presented.
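
One common way to minimize image-memory accesses in such designs is the line-buffer pattern: each pixel is fetched exactly once, the previous k-1 rows are held in on-chip buffers, and a k x k window slides along as pixels stream in. The abstract does not say whether the paper's modules are organized exactly this way, so the Python sketch below only models that generic data flow in software. Its assumptions: a square k x k mask, valid-mode output with no border handling, and the correlation form sum(mask[i][j] * window[i][j]), which equals convolution for symmetric or pre-flipped masks. `stream_convolve` is a hypothetical name.

```python
from collections import deque


def stream_convolve(pixels, width, mask):
    """Valid-mode 2-D filtering of a raster-order pixel stream with a k x k mask.

    Every pixel is consumed exactly once; only the last k-1 complete rows
    (the "line buffers") plus the row currently arriving are kept in memory.
    Computes sum(mask[i][j] * window[i][j]) (correlation form).
    """
    k = len(mask)
    line_buffers = deque(maxlen=k - 1)  # the oldest row is dropped automatically
    current, row_out, result = [], [], []
    for p in pixels:
        current.append(p)
        x = len(current) - 1  # column of the pixel that just arrived
        if len(line_buffers) == k - 1 and x >= k - 1:
            # The k x k window ending at (current row, column x) is now complete.
            rows = list(line_buffers) + [current]
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += mask[i][j] * rows[i][x - k + 1 + j]
            row_out.append(acc)
        if len(current) == width:  # end of an image row
            line_buffers.append(current)
            current = []
            if row_out:
                result.append(row_out)
                row_out = []
    return result


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
stream = [v for row in image for v in row]
print(stream_convolve(stream, width=4, mask=box))  # [[54, 63]]
```

In hardware the inner double loop would be unrolled into k*k parallel multiply-accumulate units fed from window registers; the software loop only mirrors the arithmetic.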

  17. A Distributed Agent Architecture for a Computer Virus Immune System

    National Research Council Canada - National Science Library

    Harmer, Paul

    2000-01-01

    ... Information protection and information assurance are vital components required for achieving superiority in the Infosphere, but these goals are threatened by the exponential birth rate of new computer viruses...

  18. Single instruction computer architecture and its application in image processing

    Science.gov (United States)

    Laplante, Phillip A.

    1992-03-01

    A single processing computer system using only half-adder circuits is described. In addition, it is shown that only a single hard-wired instruction is needed in the control unit to obtain a complete instruction set for this general-purpose computer. Such a system has several advantages. First, it is intrinsically a RISC machine, in fact the 'ultimate RISC' machine. Second, because only a single type of logic element is employed, the entire computer system can easily be realized on a single, highly integrated chip. Finally, due to the homogeneous nature of the computer's logic elements, the computer could be implemented as an optical or chemical machine. This in turn suggests possible paradigms for neural computing and artificial intelligence. After showing how a full-adder, min, max and other operations can be implemented using the half-adder, we use an array of such full-adders to implement the dilation operation for two black-and-white images. Next, we implement the erosion operation of two black-and-white images using a relative complement function and the properties of erosion and dilation. This approach was inspired by papers by van der Poel, in which a single instruction is used to furnish a complete set of general-purpose instructions, and by Böhm-Jacopini, where it is shown that any problem can be solved using a Turing machine with one entry and one exit.
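
The claim that a full-adder, min and max can be obtained from half-adders alone is easy to check in a few lines. The Python sketch below is an illustration, not the paper's circuit: it covers only these arithmetic building blocks and does not reproduce the single instruction or the dilation/erosion arrays, and all function names are hypothetical. The key observation (ours, not quoted from the paper) is that the two carries inside a full-adder are never both 1, so their OR equals their XOR, which is exactly the sum output of a third half-adder.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """The only primitive: returns (sum, carry) for two bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Full-adder built from three half-adders and nothing else."""
    s1, c1 = half_adder(a, b)
    total, c2 = half_adder(s1, carry_in)
    # c1 and c2 are never both 1 (c1 = 1 forces s1 = 0, hence c2 = 0),
    # so OR(c1, c2) equals XOR(c1, c2): the sum output of a third half-adder.
    carry_out, _ = half_adder(c1, c2)
    return total, carry_out


def bit_min(a: int, b: int) -> int:
    """min of two bits = AND = the half-adder carry."""
    return half_adder(a, b)[1]


def bit_max(a: int, b: int) -> int:
    """max of two bits = OR, obtained from two chained half-adders."""
    s, c = half_adder(a, b)
    return half_adder(s, c)[0]


def ripple_add(x: int, y: int, bits: int = 8) -> tuple[int, int]:
    """Add two unsigned integers with a chain of half-adder-only full-adders."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # (sum modulo 2**bits, final carry-out)


print(ripple_add(200, 100))  # (44, 1): 300 overflows 8 bits
```

Applied pixel-wise, `bit_min` and `bit_max` are the AND/OR operations that binary morphology is built on.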

  19. Biologically-Inspired Control Architecture for Musical Performance Robots

    Directory of Open Access Journals (Sweden)

    Jorge Solis

    2014-10-01

    At Waseda University, since 1990, the authors have been developing anthropomorphic musical performance robots as a means of understanding human control, introducing novel ways of interaction between musical partners and robots, and proposing applications for humanoid robots. In this paper, the design of a biologically-inspired control architecture for both an anthropomorphic flutist robot and a saxophone-playing robot is described. For the flutist robot, the authors have focused on implementing an auditory feedback system to improve the robot's calibration procedure so that it plays all the notes correctly during a performance. In particular, the proposed auditory feedback system is composed of three main modules: an Expressive Music Generator, a Feed Forward Air Pressure Control System and a Pitch Evaluation System. For the saxophone-playing robot, a pressure-pitch controller (based on feedback error learning) was proposed and implemented to improve the sound produced by the robot during a musical performance. In both cases, a set of experiments is described to verify the improvements achieved with the biologically-inspired control approaches.

  20. The NILE system architecture: fault-tolerant, wide-area access to computing and data resources

    International Nuclear Information System (INIS)

    Ricciardi, Aleta; Ogg, Michael; Rothfus, Eric

    1996-01-01

    NILE is a multi-disciplinary project building a distributed computing environment for HEP. It provides wide-area, fault-tolerant, integrated access to processing and data resources for collaborators of the CLEO experiment, though the goals and principles are applicable to many domains. NILE has three main objectives: a realistic distributed system architecture design, the design of a robust data model, and a Fast-Track implementation providing a prototype design environment which will also be used by CLEO physicists. This paper focuses on the software and wide-area system architecture design and the computing issues involved in making NILE services highly-available. (author)

  1. Laboratory Works Designed for Developing Student Motivation in Computer Architecture

    Directory of Open Access Journals (Sweden)

    Petre Ogrutan

    2017-02-01

    In light of the current difficulties in maintaining students' interest and stimulating their motivation to learn, the authors have developed a range of new laboratory exercises intended for first-year Computer Science students as well as for engineering students who have completed at least one course on computers. The educational goal of the proposed laboratory exercises is to enhance students' motivation and creative thinking by organizing a relaxed yet competitive learning environment. The authors have developed a device with LEDs and switches that connects to a computer. Using assembly language, commands can be issued to flash several LEDs and read the states of the switches. The effectiveness of this idea was confirmed by a statistical study.

  2. CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

    OpenAIRE

    YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

    2009-01-01

    Graphics processing units (GPUs), originally designed for computer video cards, have emerged as the most powerful chips in a high-performance workstation. In terms of high-performance computation, GPUs deliver much more powerful performance than conventional CPUs by means of parallel processing. In 2007, the introduction of the Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...

  3. Connection machine: a computer architecture based on cellular automata

    Energy Technology Data Exchange (ETDEWEB)

    Hillis, W D

    1984-01-01

    This paper describes the connection machine, a programmable computer based on cellular automata. The essential idea behind the connection machine is that a regular locally-connected cellular array can be made to behave as if the processing cells are connected into any desired topology. When the topology of the machine is chosen to match the topology of the application program, the result is a fast, powerful computing engine. The connection machine was originally designed to implement knowledge retrieval operations in artificial intelligence programs, but the hardware and the programming techniques are apparently applicable to a much larger class of problems. A machine with 100000 processing cells is currently being constructed. 27 references.

  4. Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

    Energy Technology Data Exchange (ETDEWEB)

    Amadio, G.; et al.

    2017-11-22

    An intensive R&D and programming effort is required to meet the new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of existing HEP detector simulation software and the ideal achievable performance, exploiting the latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel through complex geometries, exploiting instruction-level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), and leveraging both multi-core and many-core opportunities. We present preliminary verification results for the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. To exploit the potential of vectorization and accelerators and to make the physics models effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests to verify the vectorized models by checking their consistency with the corresponding Geant4 models, and to validate them against experimental data.

  5. Analytical Performance Modeling and Validation of Intel’s Xeon Phi Architecture

    Energy Technology Data Exchange (ETDEWEB)

    Chunduri, Sudheer; Balaprakash, Prasanna; Morozov, Vitali; Vishwanath, Venkatram; Kumaran, Kalyan

    2017-01-01

    Modeling the performance of scientific applications on emerging hardware plays a central role in achieving extreme-scale computing goals. Analytical models that capture the interaction between applications and hardware characteristics are attractive because even a reasonably accurate model can be useful for performance tuning before the hardware is made available. In this paper, we develop a hardware model for Intel’s second-generation Xeon Phi architecture code-named Knights Landing (KNL) for the SKOPE framework. We validate the KNL hardware model by projecting the performance of mini-benchmarks and application kernels. The results show that our KNL model can project the performance with prediction errors of 10% to 20%. The hardware model also provides informative recommendations for code transformations and tuning.

  6. Control bandwidth improvements in GRAVITY fringe tracker by switching to a synchronous real time computer architecture

    Science.gov (United States)

    Abuter, Roberto; Dembet, Roderick; Lacour, Sylvestre; di Lieto, Nicola; Woillez, Julien; Eisenhauer, Frank; Fedou, Pierre; Phan Duc, Than

    2016-08-01

    The new VLTI (Very Large Telescope Interferometer) instrument GRAVITY is equipped with a fringe tracker able to stabilize the K-band fringes on six baselines at the same time. It has been designed to achieve, under average seeing conditions, a residual OPD (Optical Path Difference) lower than 300 nm for objects brighter than K = 10. The control loop implementing the tracking is a four-stage real-time computer system comprising: a sensor, where the detector pixels are read in and the OPD and GD (Group Delay) are calculated; a controller, which receives the computed sensor quantities and produces commands for the piezo actuators; a concentrator, which combines the OPD commands with the real-time tip/tilt corrections and offloads them to the piezo actuators; and finally a Kalman parameter estimator. This last stage monitors current measurements over a window of a few seconds and estimates new values for the main Kalman control loop parameters. The hardware and software implementation of this design runs asynchronously, and the four computers exchange data via the Reflective Memory Network. To improve the performance of the GRAVITY fringe-tracking control loop, a deviation from the standard asynchronous communication mechanism has been proposed and implemented. This new scheme operates the four independent real-time computers in the tracking loop synchronously, using the Reflective Memory Interrupts as the coordination signal. This synchronous mechanism reduced the total pure delay of the loop from 3.5 ms to 2.0 ms, which translates into better stabilization of the fringes because the bandwidth of the system is substantially improved. This paper explains in detail the real-time architecture of the fringe tracker in both its asynchronous and synchronous implementations. The achieved improvements on reducing the delay via this mechanism will be...
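
A pure time delay tau leaves the loop gain unchanged but adds a phase lag of 360 * f * tau degrees at frequency f, which is why shortening the loop delay raises the achievable control bandwidth. The snippet below is a back-of-the-envelope Python sketch of that relation applied to the two delays quoted above. The 45-degree phase budget is an arbitrary illustrative assumption, not a figure from the paper, and the function names are hypothetical.

```python
def delay_phase_lag_deg(freq_hz: float, delay_s: float) -> float:
    """Phase lag (degrees) introduced by a pure time delay at a given frequency."""
    return 360.0 * freq_hz * delay_s


def freq_at_phase_lag(phase_budget_deg: float, delay_s: float) -> float:
    """Frequency (Hz) at which a pure delay alone consumes the given phase budget."""
    return phase_budget_deg / (360.0 * delay_s)


BUDGET_DEG = 45.0  # hypothetical phase budget, chosen only for illustration
for label, tau in (("asynchronous", 3.5e-3), ("synchronous", 2.0e-3)):
    print(f"{label:12s} delay={tau * 1e3:.1f} ms -> "
          f"{freq_at_phase_lag(BUDGET_DEG, tau):.1f} Hz before {BUDGET_DEG:.0f} deg of lag")
```

For any fixed phase budget the usable frequency scales as 1/tau, so going from 3.5 ms to 2.0 ms raises it by a factor of 1.75 (about 35.7 Hz to 62.5 Hz for a 45-degree budget).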

  7. A State-Based Modeling Approach for Efficient Performance Evaluation of Embedded System Architectures at Transaction Level

    Directory of Open Access Journals (Sweden)

    Anthony Barreteau

    2012-01-01

    Abstract models are necessary to assist system architects in evaluating hardware/software architectures and to cope with the ever-increasing complexity of embedded systems. Efficient methods are required to create reliable models of system architectures and to allow early performance evaluation and fast exploration of the design space. In this paper, we present a specific transaction-level modeling approach for performance evaluation of hardware/software architectures. This approach relies on a generic execution model that requires little modeling effort. The resulting models are used to evaluate, by simulation, the expected processing and memory resources for various architectures. The proposed execution model relies on a specific computation method defined to improve the simulation speed of transaction-level models. The benefits of the proposed approach are highlighted through two case studies. The first case study is a didactic example illustrating the modeling approach; in this example, a simulation speed-up by a factor of 7.62 is achieved using the proposed computation method. The second case study concerns the analysis of a communication receiver supporting part of the physical layer of the LTE protocol. In this case study, architecture exploration is conducted to improve the allocation of processing functions.

  8. Optical interconnection networks for high-performance computing systems

    International Nuclear Information System (INIS)

    Biberman, Aleksandr; Bergman, Keren

    2012-01-01

    Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in the computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered ever-growing computational parallelism in chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining this growth in parallelism introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate the feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers. (review article)

  9. Information management architecture for an integrated computing environment for the Environmental Restoration Program. Environmental Restoration Program, Volume 3, Interim technical architecture

    International Nuclear Information System (INIS)

    1994-09-01

    This third volume of the Information Management Architecture for an Integrated Computing Environment for the Environmental Restoration Program, the Interim Technical Architecture (TA), referred to throughout the remainder of this document as the ER TA, represents a key milestone in establishing a coordinated information management environment in which information initiatives can be pursued with the confidence that redundancy and inconsistencies will be held to a minimum. This architecture is intended to be used as a reference by anyone whose responsibilities include the acquisition or development of information technology for use by the ER Program. The interim ER TA provides technical guidance at three levels. At the highest level, the technical architecture provides an overall computing philosophy or direction. At this level, the guidance does not address specific technologies or products but addresses more general concepts, such as the use of open systems, modular architectures, graphical user interfaces, and architecture-based development. At the next level, the technical architecture provides specific information technology recommendations regarding a wide variety of specific technologies. These technologies include computing hardware, operating systems, communications software, database management software, application development software, and personal productivity software, among others. These recommendations range from the adoption of specific industry or Martin Marietta Energy Systems, Inc. (Energy Systems) standards to the specification of individual products. At the third level, the architecture provides guidance regarding implementation strategies for the recommended technologies that can be applied to individual projects and to the ER Program as a whole.

  10. Hardware architecture design of image restoration based on time-frequency domain computation

    Science.gov (United States)

    Wen, Bo; Zhang, Jing; Jiao, Zipeng

    2013-10-01

    Image restoration algorithms based on time-frequency domain computation (TFDC) are mature and widely applied in engineering. To enable high-speed implementation of these algorithms, the TFDC hardware architecture is proposed. First, the main module is designed by analyzing the common processing and numerical calculations. Then, to improve generality, an iteration control module is planned for iterative algorithms. In addition, to reduce the computational cost and memory requirements, the necessary optimizations are suggested for the time-consuming modules, which include the two-dimensional FFT/IFFT and complex-number calculation. Finally, the TFDC hardware architecture is adopted for the hardware design of a real-time image restoration system. The results show that the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm generality, hardware realizability and high efficiency.

  11. Network survivability performance (computer diskette)

    Science.gov (United States)

    1993-11-01

    File characteristics: Data file; 1 file. Physical description: 1 computer diskette; 3 1/2 in.; high density; 2.0MB. System requirements: Mac; Word. This technical report has been developed to address the survivability of telecommunications networks including services. It responds to the need for a common understanding of, and assessment techniques for network survivability, availability, integrity, and reliability. It provides a basis for designing and operating telecommunication networks to user expectations for network survivability.

  12. Algorithm-structured computer arrays and networks architectures and processes for images, percepts, models, information

    CERN Document Server

    Uhr, Leonard

    1984-01-01

    Computer Science and Applied Mathematics: Algorithm-Structured Computer Arrays and Networks: Architectures and Processes for Images, Percepts, Models, Information examines the parallel-array, pipeline, and other network multi-computers. This book describes and explores arrays and networks, those built, being designed, or proposed. The problems of developing higher-level languages for systems and designing algorithm, program, data flow, and computer structure are also discussed. This text likewise describes several sequences of successively more general attempts to combine the power of arrays wi...

  13. p88110: A Graphical Simulator for Computer Architecture and Organization Courses

    Science.gov (United States)

    Garcia, M. I.; Rodriguez, S.; Perez, A.; Garcia, A.

    2009-01-01

    Studying fundamental Computer Architecture and Organization topics requires a significant amount of practical work if students are to acquire a good grasp of the theoretical concepts presented in classroom lectures or textbooks. The use of simulators is commonly adopted in order to reach this objective. However, as most of the available…

  14. Computer Security Primer: Systems Architecture, Special Ontology and Cloud Virtual Machines

    Science.gov (United States)

    Waguespack, Leslie J.

    2014-01-01

    With the increasing proliferation of multitasking and Internet-connected devices, security has reemerged as a fundamental design concern in information systems. The shift of IS curricula toward a largely organizational perspective of security leaves little room for focus on its foundation in systems architecture, the computational underpinnings of…

  15. Usage of Thin-Client/Server Architecture in Computer Aided Education

    Science.gov (United States)

    Cimen, Caghan; Kavurucu, Yusuf; Aydin, Halit

    2014-01-01

    With the advances of technology, thin-client/server architecture has become popular in multi-user/single network environments. A thin client is a user terminal from which the user can log in to a domain and run programs by connecting to a remote server. Recent developments in network and hardware technologies (cloud computing, virtualization, etc.)…

  16. Architecture and pervasive Computing when buildings and design artifacts become popular interfaces

    DEFF Research Database (Denmark)

    Krogh, Peter Gall; Grønbæk, Kaj

    2001-01-01

    One of the main areas of architecture is building design, and we will focus on the impact of pervasive computing in this area. The breakthrough of the Internet has triggered a significant increase in what are often called intelligent buildings in recent years. Due to development in pervasive c...

  17. Combining Self-Explaining with Computer Architecture Diagrams to Enhance the Learning of Assembly Language Programming

    Science.gov (United States)

    Hung, Y.-C.

    2012-01-01

    This paper investigates the impact of combining self-explaining (SE) with computer architecture diagrams to help novice students learn assembly language programming. Pre- and post-test scores for the experimental and control groups were compared and subjected to covariance (ANCOVA) statistical analysis. Results indicate that the SE-plus-diagram…

  18. Architecture and Initial Development of a Digital Library Platform for Computable Knowledge Objects for Health.

    Science.gov (United States)

    Flynn, Allen J; Bahulekar, Namita; Boisvert, Peter; Lagoze, Carl; Meng, George; Rampton, James; Friedman, Charles P

    2017-01-01

    Throughout the world, biomedical knowledge is routinely generated and shared through primary and secondary scientific publications. However, there is too much latency between publication of knowledge and its routine use in practice. To address this latency, what is actionable in scientific publications can be encoded to make it computable. We have created a purpose-built digital library platform to hold, manage, and share actionable, computable knowledge for health called the Knowledge Grid Library. Here we present it with its system architecture.

  19. Reconfigurable FPGA architecture for computer vision applications in Smart Camera Networks

    OpenAIRE

    Maggiani, Luca; Salvadori, Claudio; Petracca, Matteo; Pagano, Paolo; Saletti, Roberto

    2013-01-01

    Smart Camera Networks (SCNs) are an emerging research field representing the natural evolution of centralized computer vision applications toward fully distributed and pervasive systems. In such a scenario, one of the biggest efforts lies in defining a flexible and reconfigurable SCN node architecture able to support remote updating of the application parameters and changing of the running computer vision applications at run-time. In th...

  20. Performance assessment of distributed communication architectures in smart grid.

    OpenAIRE

    Jiang, Jing; Sun, Hongjian

    2016-01-01

    The huge number of smart meters and increasingly frequent data readings have become a big challenge for data acquisition and processing in smart grid advanced metering infrastructure systems. This requires a distributed communication architecture in which multiple distributed meter data management systems (MDMSs) are deployed and meter data are processed locally. In this paper, we present the network model for supporting this distributed communication architecture and propos...

  1. Computer Generation of Fourier Transform Libraries for Distributed Memory Architectures

    Science.gov (United States)

    2010-12-01

    tractions used in quantum chemistry. It too performs algebraic transformations to minimize the operations count, and then optimizes code based on... existing parallel DFT algorithms, including their strengths and weaknesses. Four-step FFT. The four-step algorithm [Hegland, 1994; Norton and Silberger, 1987... Sadayappan, and Alexander Sibiryakov. Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models. Proc. of
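
The excerpt above only names the four-step algorithm. As a hedged sketch (Python with the standard `cmath` module, not the code generator described in the record), the four steps factor a length-N DFT with N = N1 * N2 into strided sub-DFTs, a twiddle-factor multiplication, a second set of sub-DFTs and a transpose. In a distributed-memory setting the sub-DFTs can run locally while the transpose becomes the all-to-all communication step. The naive `dft` helper is an assumption standing in for whatever optimized kernel a real library would use.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, used here for the small sub-transforms and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def four_step_fft(x, n1, n2):
    """Length-(n1*n2) DFT via the four-step decomposition.

    Input index n = n2*i + c (i < n1, c < n2); output index k = k1 + n1*k2.
    """
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 independent length-n1 DFTs over the strided subsequences x[c::n2].
    cols = [dft(x[c::n2]) for c in range(n2)]  # cols[c][k1]
    # Step 2: multiply by twiddle factors exp(-2*pi*i * c*k1 / n), arranged as rows[k1][c].
    rows = [[cols[c][k1] * cmath.exp(-2j * cmath.pi * c * k1 / n) for c in range(n2)]
            for k1 in range(n1)]
    # Step 3: n1 independent length-n2 DFTs.
    rows = [dft(r) for r in rows]  # rows[k1][k2]
    # Step 4: transpose into natural output order X[k1 + n1*k2].
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out


signal = [complex(i % 5, (3 * i) % 4) for i in range(12)]
err = max(abs(a - b) for a, b in zip(four_step_fft(signal, 3, 4), dft(signal)))
print(f"max deviation from direct DFT: {err:.2e}")
```

Any factorization of N works; the choice of N1 and N2 trades the sizes of the two sub-DFT batches against the shape of the transpose.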

  2. Performance Analysis of GFDL's GCM Line-By-Line Radiative Transfer Model on GPU and MIC Architectures

    Science.gov (United States)

    Menzel, R.; Paynter, D.; Jones, A. L.

    2017-12-01

    Due to their relatively low computational cost, radiative transfer models in global climate models (GCMs) run on traditional CPU architectures generally consist of shortwave and longwave parameterizations over a small number of wavelength bands. With the rise of newer GPU and MIC architectures, however, the performance of high resolution line-by-line radiative transfer models may soon approach those of the physical parameterizations currently employed in GCMs. Here we present an analysis of the current performance of a new line-by-line radiative transfer model currently under development at GFDL. Although originally designed to specifically exploit GPU architectures through the use of CUDA, the radiative transfer model has recently been extended to include OpenMP in an effort to also effectively target MIC architectures such as Intel's Xeon Phi. Using input data provided by the upcoming Radiative Forcing Model Intercomparison Project (RFMIP, as part of CMIP 6), we compare model results and performance data for various model configurations and spectral resolutions run on both GPU and Intel Knights Landing architectures to analogous runs of the standard Oxford Reference Forward Model on traditional CPUs.

  3. Overview of Parallel Platforms for Common High Performance Computing

    Directory of Open Access Journals (Sweden)

    T. Fryza

    2012-04-01

    The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, methods exploiting multicore central processing units, such as the Message Passing Interface and OpenMP, are taken into account. The properties of these programming methods are verified experimentally in an application of the fast Fourier transform and the discrete cosine transform, and they are compared with the capabilities of MATLAB's built-in functions and of Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementations were compared with CPU-based computing methods and with the capabilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. An optimal combination of computing methods in the signal processing domain and the implementation of new, fast routines are proposed as well.

  4. Apolux : an innovative computer code for daylight design and analysis in architecture and urbanism

    Energy Technology Data Exchange (ETDEWEB)

    Claro, A.; Pereira, F.O.R.; Ledo, R.Z. [Santa Catarina Federal Univ., Florianopolis, SC (Brazil)]

    2005-07-01

    The main capabilities of a new computer program for calculating and analyzing daylighting in architectural spaces were discussed. Apolux 1.0 was designed to use three-dimensional files generated in graphic editors in the Drawing Exchange Format (DXF) and was developed to integrate an architect's design characteristics. An example of its use in the development of a design was presented. The program offers fast and flexible manipulation of video card models under different visualization conditions. The algorithm for the physics of light is based on the radiosity method, representing the surfaces as finite elements divided into small triangular units of area, each of which is paired with every other. The form factors of each triangle relative to all others are determined in the primary calculation. Visible directions of the sky are also included according to the modular units of a subdivided globe. Following these primary calculations, different and successive daylighting solutions can be determined under different sky conditions. The program can also change the properties of the materials to quickly recalculate the solutions. The program has been applied to an office building in Florianopolis, Brazil. The four design stages included an initial discussion with the architects about the conceptual possibilities; a comparative study of 2 architectural designs with different conceptual elements for daylighting exploitation, comparing the internal daylighting levels and distribution of the 2 options exposed to the same external conditions; a study of solar shading devices for specific facades; and simulations to test the performance of different designs. The program has proven to be very flexible, with reliable results. It can incorporate real sky conditions through the input of the Spherical model of real sky luminance values. 3 refs., 14 figs.
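
The workflow described above (form factors computed once in a primary calculation, then fast re-solves for new sky conditions or materials) maps onto the classic radiosity system B_i = E_i + rho_i * sum_j F_ij * B_j. The Python sketch below solves it by Gauss-Seidel iteration. It is a minimal illustration under stated assumptions, not Apolux's solver: "emission" stands for the direct sky contribution reaching each patch, patches are flat (F_ii = 0), and the function name and two-patch example are hypothetical.

```python
def solve_radiosity(emission, reflectance, form_factors, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel solution of B_i = E_i + rho_i * sum_j F_ij * B_j.

    Assumes flat patches (F_ii = 0), row sums of F at most 1 and reflectances
    below 1, which makes the system diagonally dominant so the iteration converges.
    """
    n = len(emission)
    b = list(emission)  # start from the directly received (unreflected) values
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            gathered = sum(form_factors[i][j] * b[j] for j in range(n))
            new_b = emission[i] + reflectance[i] * gathered
            max_change = max(max_change, abs(new_b - b[i]))
            b[i] = new_b  # use the update immediately (Gauss-Seidel, not Jacobi)
        if max_change < tol:
            return b
    raise RuntimeError("radiosity iteration did not converge")


F = [[0.0, 0.5], [0.5, 0.0]]  # hypothetical two-patch form-factor matrix, computed once
print(solve_radiosity([1.0, 0.0], [0.5, 0.5], F))  # original materials
print(solve_radiosity([1.0, 0.0], [0.8, 0.2], F))  # same F, new materials
```

Because `F` depends only on geometry, changing the sky input or a material only changes `emission` or `reflectance`, so each new daylighting solution costs a few cheap iterations rather than a new form-factor calculation.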

  5. Architecture and program structures for a special purpose finite element computer

    Energy Technology Data Exchange (ETDEWEB)

    Norrie, D.H.; Norrie, C.W.

    1983-01-01

    The development of very large scale integration (VLSI) has made special-purpose computers economically possible. With such a machine, the loss of flexibility compared with a general-purpose computer can be offset by the increased speed which can be obtained by tailoring the architecture to the particular problem or class of problem. The first kind of special-purpose machine has its architecture modelled on the physical structure of the problem and the second kind has its design tailored to the computational algorithm used. The parallel finite element machine (PARFEM) being designed at the University of Calgary for the solution of finite element problems is of the second kind. Its conceptual design is described and progress to date outlined. 14 references.

  6. Blaze-DEMGPU: Modular high performance DEM framework for the GPU architecture

    Directory of Open Access Journals (Sweden)

    Nicolin Govender

    2016-01-01

    Blaze-DEMGPU is a modular GPU-based discrete element method (DEM) framework that supports polyhedral particles. Its high performance is attributed to the lightweight, Single Instruction Multiple Data (SIMD) nature of the GPU architecture. Blaze-DEMGPU offers suitable algorithms for conducting DEM simulations on the GPU, and these algorithms can be extended and modified. Since many scientific simulations are particle based, many of the algorithms and GPU implementation strategies in Blaze-DEMGPU can be applied to other fields. Blaze-DEMGPU will make it easier for new researchers to use high performance GPU computing and will stimulate wider GPU research efforts in the DEM community.

  7. A Coarse-Grained Reconfigurable Architecture with Compilation for High Performance

    Directory of Open Access Journals (Sweden)

    Lu Wan

    2012-01-01

    We propose a fast data relay (FDR) mechanism to enhance existing CGRAs (coarse-grained reconfigurable architectures). FDR not only provides multicycle data transmission concurrently with computation but also converts resource-demanding global data accesses between processing elements into local data accesses, avoiding communication congestion. We also propose supporting compiler techniques that efficiently exploit the FDR feature to achieve higher performance for a variety of applications. We compare our results on the FDR-based CGRA with two other works in this field: ADRES and RCP. Experimental results for various multimedia applications show that FDR combined with the new compiler delivers up to 29% and 21% higher performance than ADRES and RCP, respectively.

  8. US QCD computational performance studies with PERI

    International Nuclear Information System (INIS)

    Zhang, Y; Fowler, R; Huck, K; Malony, A; Porterfield, A; Reed, D; Shende, S; Taylor, V; Wu, X

    2007-01-01

    We report on some of the interactions between two SciDAC projects: the National Computational Infrastructure for Lattice Gauge Theory (USQCD) and the Performance Engineering Research Institute (PERI). Many modern scientific programs consistently report the need for faster computational resources to maintain global competitiveness. However, as the size and complexity of emerging high end computing (HEC) systems continue to rise, achieving good performance on such systems is becoming ever more challenging. To take full advantage of these resources, it is crucial to understand the characteristics of the relevant scientific applications and of the systems on which they run. Using tools developed under PERI and by other performance measurement researchers, we studied the performance of two applications, MILC and Chroma, on several high performance computing systems at DOE laboratories. In the case of Chroma, we discuss how the use of C++ and modern software engineering and programming methods is driving the evolution of performance tools.

  9. Framework for Architecture Trade Study Using MBSE and Performance Simulation

    Science.gov (United States)

    Ryan, Jessica; Sarkani, Shahram; Mazzuchi, Thomas

    2012-01-01

    Increasing complexity in modern systems, as well as cost and schedule constraints, requires a new paradigm of system engineering to fulfill stakeholder needs. Challenges facing efficient trade studies include poor tool interoperability, lack of simulation coordination (design parameters), and requirements flowdown. A recent trend toward Model Based System Engineering (MBSE) includes flexible architecture definition, program documentation, requirements traceability, and system engineering reuse. As a new domain, MBSE still lacks governing standards and commonly accepted frameworks. This paper proposes a framework for efficient architecture definition using MBSE in conjunction with domain-specific simulation to evaluate trade studies. A general framework is provided, followed by a specific example that includes a method for designing a trade study, defining candidate architectures, planning simulations to fulfill requirements, and finally a weighted decision analysis to optimize system objectives.
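
    The final step, a weighted decision analysis, can be sketched as a weighted-sum decision matrix. The criteria, weights, scores, and architecture names below are hypothetical; in a framework like the one proposed, the scores would presumably come from the domain-specific simulations.

```python
def weighted_decision(weights, candidates):
    """Return (best_architecture, scores) for a weighted-sum decision matrix.

    weights    : {criterion: importance weight}
    candidates : {architecture: {criterion: score}}
    """
    total_weight = sum(weights.values())
    scores = {
        name: sum(weights[c] * ratings[c] for c in weights) / total_weight
        for name, ratings in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical criteria and simulation-derived scores on a 0-10 scale.
weights = {"cost": 0.5, "performance": 0.3, "risk": 0.2}
candidates = {
    "Architecture A": {"cost": 7, "performance": 9, "risk": 6},
    "Architecture B": {"cost": 9, "performance": 6, "risk": 8},
}
best, scores = weighted_decision(weights, candidates)
```

    Dividing by the total weight keeps scores on the original scale even if the weights do not sum to one; here Architecture B wins (7.9 versus 7.4) despite its lower performance score, because cost carries the most weight.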

  10. Simulation of electronic structure Hamiltonians in a superconducting quantum computer architecture

    Energy Technology Data Exchange (ETDEWEB)

    Kaicher, Michael; Wilhelm, Frank K. [Theoretical Physics, Saarland University, 66123 Saarbruecken (Germany)]; Love, Peter J. [Department of Physics, Haverford College, Haverford, Pennsylvania 19041 (United States)]

    2015-07-01

    Quantum chemistry has become one of the most promising applications within the field of quantum computation. Simulating the electronic structure Hamiltonian (ESH) in the Bravyi-Kitaev (BK) basis to compute the ground state energies of atoms and molecules reduces the number of qubit operations needed to simulate a single fermionic operation to O(log(n)), compared with O(n) for the Jordan-Wigner transformation. In this work we present the details of the BK transformation, show an example implementation in a superconducting quantum computer architecture, and compare it to the most recent quantum chemistry algorithms, suggesting a constant overhead.
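
    The O(log(n)) versus O(n) scaling can be checked by counting Pauli weights. The sketch below does not reproduce the authors' BK-basis implementation; it uses a Fenwick-tree (binary indexed tree) encoding, which shares the logarithmic scaling of the Bravyi-Kitaev transformation. In it, one Majorana operator of mode j acts with X on the update set U(j) and on qubit j, and with Z on the parity set P(j). Modes are 1-indexed, and the mode count n = 16 and all function names are assumptions for illustration.

```python
def lowbit(i):
    """Lowest set bit of i (the Fenwick-tree step size)."""
    return i & -i


def update_set(j, n):
    """Qubits other than j whose stored partial sums include mode j (1-indexed)."""
    nodes, i = [], j + lowbit(j)
    while i <= n:
        nodes.append(i)
        i += lowbit(i)
    return nodes


def parity_set(j):
    """Qubits whose stored partial sums add up to the parity of modes 1..j-1."""
    nodes, i = [], j - 1
    while i > 0:
        nodes.append(i)
        i -= lowbit(i)
    return nodes


def jordan_wigner_weight(j):
    """Z on modes 1..j-1 plus X (or Y) on mode j: grows linearly with j."""
    return j


def fenwick_weight(j, n):
    """X on U(j) and qubit j, Z on P(j); the sets are disjoint, so the sizes add."""
    return len(update_set(j, n)) + 1 + len(parity_set(j))


n = 16
for j in (1, 8, 16):
    print(j, jordan_wigner_weight(j), fenwick_weight(j, n))
```

    For 16 modes, the Jordan-Wigner weight of the last mode is 16, while no Fenwick-encoded operator of this form acts on more than 5 qubits (log2(16) + 1).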

  11. (Invited) Wavy Channel TFT Architecture for High Performance Oxide Based Displays

    KAUST Repository

    Hanna, Amir

    2015-05-22

    We show the effectiveness of the wavy channel architecture for increasing the output current of thin film transistors. This architecture increases the width of the device by adopting a corrugated substrate, without any additional real estate penalty. The performance improvement is attributed not only to the increased transistor width but also to an enhanced applied electric field in the channel due to the wavy architecture.

  12. (Invited) Wavy Channel TFT Architecture for High Performance Oxide Based Displays

    KAUST Repository

    Hanna, Amir; Hussain, Aftab M.; Ghoneim, Mohamed T.; Rojas, Jhonathan Prieto; Sevilla, Galo T.; Hussain, Muhammad Mustafa

    2015-01-01

    We show the effectiveness of the wavy channel architecture for increasing the output current of thin film transistors. This architecture increases the width of the device by adopting a corrugated substrate, without any additional real estate penalty. The performance improvement is attributed not only to the increased transistor width but also to an enhanced applied electric field in the channel due to the wavy architecture.
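
    Both records attribute part of the gain to the larger effective width. A back-of-the-envelope sketch using the textbook square-law saturation current shows how that width term alone scales the current at a fixed footprint. The rectangular-fin geometry, dimensions, and bias values are hypothetical, and the enhanced-electric-field contribution mentioned in the abstract is not modeled.

```python
def saturation_current(mu, c_ox, width, length, v_gs, v_t):
    """Square-law saturation current I_D = (mu * C_ox / 2) * (W / L) * (V_GS - V_T)**2."""
    return 0.5 * mu * c_ox * (width / length) * (v_gs - v_t) ** 2


def wavy_width(planar_width, n_fins, fin_height):
    """Effective width: planar width plus two sidewalls of height h per fin (hypothetical geometry)."""
    return planar_width + 2 * n_fins * fin_height


# Hypothetical device: 20 um footprint width, 4 fins of 0.5 um height, 10 um channel length.
w_planar = 20.0
w_wavy = wavy_width(w_planar, n_fins=4, fin_height=0.5)  # 24 um of channel in the same footprint
bias = dict(mu=10.0, c_ox=1e-7, length=10.0, v_gs=10.0, v_t=2.0)
gain = saturation_current(width=w_wavy, **bias) / saturation_current(width=w_planar, **bias)
```

    Since every other factor cancels in the ratio, the width term alone predicts a 1.2x current gain here; any measured gain beyond that would be consistent with the field-enhancement effect the authors describe.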

  13. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    Energy Technology Data Exchange (ETDEWEB)

    Kirk, B.L. [Oak Ridge National Lab., TN (United States)]; Sartori, E. [OCDE/OECD NEA Data Bank, Issy-les-Moulineaux (France)]; Viedma, L.G. de [Consejo de Seguridad Nuclear, Madrid (Spain)]

    1997-06-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields: deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management.

  14. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    International Nuclear Information System (INIS)

    Kirk, B.L.; Sartori, E.; Viedma, L.G. de

    1997-01-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields: deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management.

  15. Experimental Demonstration of a Self-organized Architecture for Emerging Grid Computing Applications on OBS Testbed

    Science.gov (United States)

    Liu, Lei; Hong, Xiaobin; Wu, Jian; Lin, Jintong

    As Grid computing continues to gain popularity in industry and the research community, it is also attracting more attention at the consumer level. The large number of users and the high frequency of job requests in the consumer market make this challenging. Current Client/Server (C/S)-based architectures will become infeasible for supporting large-scale Grid applications because of their poor scalability and poor fault tolerance. In this paper, based on our previous works [1, 2], we propose a novel self-organized architecture that realizes a highly scalable and flexible platform for Grids. Experimental results show that this architecture is suitable and efficient for consumer-oriented Grids.

  16. Designing fault-tolerant real-time computer systems with diversified bus architecture for nuclear power plants

    International Nuclear Information System (INIS)

    Behera, Rajendra Prasad; Murali, N.; Satya Murty, S.A.V.

    2014-01-01

    Fault-tolerant real-time computer (FT-RTC) systems are widely used for the safe operation of nuclear power plants (NPP) and for safe shutdown in the event of any untoward situation. Such systems must provide high reliability and availability, computational capability for measurement via sensors, control action via actuators, data communication, and a human interface via keyboard or display. All these attributes of FT-RTC systems must be implemented using the best known methods, such as redundant system design using a diversified bus architecture to avoid common cause failure, fail-safe design to avoid unsafe failure, and diagnostic features to validate system operation. In this context, the system designer must select an efficient as well as highly reliable diversified bus architecture in order to realize a fault-tolerant system design. This paper presents a comparative study between the CompactPCI bus and Versa Module Eurocard (VME) bus architectures for designing FT-RTC systems with a switch over logic system (SOLS) for NPP. (author)
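
    Why bus diversity matters for redundancy can be illustrated with a standard beta-factor approximation, in which a fraction beta of each channel's failures are common-cause failures that disable both redundant channels at once. This is a hypothetical sketch, not the paper's analysis: the MTBF, MTTR, and beta values are invented, and representing diversification as a lower beta is an assumption.

```python
def availability(mtbf_h, mttr_h):
    """Steady-state availability of one repairable channel."""
    return mtbf_h / (mtbf_h + mttr_h)


def duplex_unavailability(u, beta):
    """Rare-event approximation for two redundant channels under a beta-factor model.

    u    : unavailability of a single channel
    beta : fraction of failures that are common-cause (hit both channels at once)
    """
    independent = (1 - beta) * u   # must coincide in both channels to fail the system
    common_cause = beta * u        # fails the system on its own
    return independent ** 2 + common_cause


u = 1 - availability(mtbf_h=10_000, mttr_h=10)   # about 1e-3 (hypothetical figures)
identical_buses = duplex_unavailability(u, beta=0.10)
diversified_buses = duplex_unavailability(u, beta=0.01)
```

    With these numbers the common-cause term dominates: duplication alone would give roughly 1e-6, but beta = 0.1 raises system unavailability to about 1e-4, and cutting beta to 0.01 brings it down by almost an order of magnitude.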

  17. CMOL/CMOS hardware architectures and performance/price for Bayesian memory - The building block of intelligent systems

    Science.gov (United States)

    Zaveri, Mazad Shaheriar

    The semiconductor/computer industry has followed Moore's law for several decades and has reaped the benefits in speed and density of the resulting scaling. Transistor density has reached almost one billion per chip, and transistor delays are in picoseconds. However, scaling has slowed down, and the semiconductor industry now faces several challenges. Hybrid CMOS/nano technologies, such as CMOL, are considered an interim solution to some of these challenges. Another potential architectural solution is specialized architectures for applications/models in the intelligent computing domain, one aspect of which includes abstract computational models inspired by the neuro/cognitive sciences. Consequently, in this dissertation, we focus on hardware implementations of Bayesian Memory (BM), which is a (Bayesian) Biologically Inspired Computational Model (BICM). This model is a simplified version of George and Hawkins' model of the visual cortex, which includes an inference framework based on Judea Pearl's belief propagation. We then present a "hardware design space exploration" methodology for implementing and analyzing the (digital and mixed-signal) hardware for the BM. This methodology involves analyzing the computational/operational cost and the related micro-architecture, exploring candidate hardware components, proposing various custom hardware architectures using both traditional CMOS and hybrid nanotechnology (CMOL), and investigating the baseline performance/price of these architectures. The results suggest that CMOL is a promising candidate for implementing a BM. Such implementations can exploit the very high density storage/computation benefits of these new nanoscale technologies much more efficiently; for example, the throughput per 858 mm² (TPM) obtained for CMOL-based architectures is 32 to 40 times better than the TPM for a CMOS-based multiprocessor/multi-FPGA system, and almost 2000 times better than the TPM for a PC.
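
    Pearl's belief propagation, on which the BM inference framework is based, computes marginals by passing messages between neighboring nodes. The sketch below runs exact sum-product message passing on a small chain and checks the result against brute-force enumeration. The chain, its potentials, and the function names are hypothetical, and the sketch models neither the BM hierarchy nor its hardware.

```python
from itertools import product


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def chain_marginals(unary, pairwise):
    """Exact sum-product belief propagation on a chain x0 - x1 - ... - x(n-1).

    unary[i][a]       : local evidence for node i in state a
    pairwise[i][a][b] : compatibility of node i in state a with node i+1 in state b
    """
    n, k = len(unary), len(unary[0])
    from_left = [[1.0] * k for _ in range(n)]   # message into node i from node i-1
    from_right = [[1.0] * k for _ in range(n)]  # message into node i from node i+1
    for i in range(1, n):
        from_left[i] = normalize([
            sum(from_left[i - 1][a] * unary[i - 1][a] * pairwise[i - 1][a][b] for a in range(k))
            for b in range(k)
        ])
    for i in range(n - 2, -1, -1):
        from_right[i] = normalize([
            sum(pairwise[i][a][b] * unary[i + 1][b] * from_right[i + 1][b] for b in range(k))
            for a in range(k)
        ])
    # Belief at each node: incoming messages times local evidence.
    return [normalize([from_left[i][a] * unary[i][a] * from_right[i][a] for a in range(k)])
            for i in range(n)]


def brute_force_marginals(unary, pairwise):
    """Reference answer: sum the joint distribution over every configuration."""
    n, k = len(unary), len(unary[0])
    marginals = [[0.0] * k for _ in range(n)]
    for states in product(range(k), repeat=n):
        p = 1.0
        for i, s in enumerate(states):
            p *= unary[i][s]
        for i in range(n - 1):
            p *= pairwise[i][states[i]][states[i + 1]]
        for i, s in enumerate(states):
            marginals[i][s] += p
    return [normalize(m) for m in marginals]


# Hypothetical binary chain: neighbours prefer to agree; the ends carry opposite evidence.
agree = [[0.9, 0.1], [0.1, 0.9]]
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
pairwise = [agree, agree]
bp = chain_marginals(unary, pairwise)
exact = brute_force_marginals(unary, pairwise)
```

    On a tree-structured graph such as this chain, message passing is exact and costs time linear in the number of nodes, whereas enumeration grows exponentially; that gap is what makes belief propagation attractive to implement in hardware.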

  18. A COMPUTER APPLICATION FOR THE ARCHITECTURAL PROGRAM DEVELOPMENT IN DESIGN EDUCATION

    Directory of Open Access Journals (Sweden)

    Daniel de Carvalho Moreira

    2012-02-01

    The development of the architectural program in the design studio faces several difficulties. The purpose of the program is to describe the conditions under which the building being designed will operate; this requires a great deal of information and organization. Due to its complexity, the definition of the architectural program is often simplified in design courses. This article discusses this issue and proposes a computer application (SINFORMA) that gathers information about the building and the theme of the project in order to develop the architectural program based on structures proposed in the literature. SINFORMA consists of a framework that includes a database and modules that analyze and organize functional requirements according to the Problem Seeking method and the contemporary values of architecture enumerated by Hershberger. The article discusses how the application can be used in design education and how it offers students a practical approach and a comprehensive data analysis for the design of the built environment. Keywords: Architectural programming, Architectural design, Education.

  19. RSAM: An enhanced architecture for achieving web services reliability in mobile cloud computing

    Directory of Open Access Journals (Sweden)

    Amr S. Abdelfattah

    2018-04-01

    The evolution of the mobile landscape is coupled with the ubiquitous nature of the internet, with its intermittent wireless connectivity, and with web services. Reliable web service consumption should combine low communication overhead with retrieval of the appropriate response. The middleware approach (MA) is widely adopted to achieve web service reliability. This paper proposes a Reliable Service Architecture using Middleware (RSAM) that achieves reliable web service consumption. The enhanced architecture focuses on ensuring and tracking request execution under communication limitations and temporary service unavailability. It considers the main measurement factors, including request size, response size, and consumption time. We conducted experiments comparing the enhanced architecture with the traditional one, covering several cases to demonstrate that reliability is achieved. The results show that the request size was constant, the response size was identical to that of the traditional architecture, and the increase in consumption time was less than 5% of the transaction time across the different response sizes. Keywords: Reliable web service, Middleware architecture, Mobile cloud computing

  20. Debugging a high performance computing program

    Science.gov (United States)

    Gooding, Thomas M.

    2013-08-20

    Methods, apparatus, and computer program products are disclosed for debugging a high performance computing program by gathering lists of addresses of calling instructions for a plurality of threads of execution of the program, assigning the threads to groups in dependence upon the addresses, and displaying the groups to identify defective threads.
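
    The grouping step can be sketched in a few lines. This is a hypothetical illustration of the idea in the abstract, not the patented implementation: each thread's list of calling-instruction addresses becomes a dictionary key, and groups are listed smallest first on the assumption that a defective thread tends to stand out as a small group.

```python
from collections import defaultdict


def group_threads(call_stacks):
    """Group thread IDs that share an identical list of calling-instruction addresses.

    call_stacks : {thread_id: sequence of calling addresses, outermost first}
    Returns (addresses, thread_ids) pairs, smallest groups first.
    """
    groups = defaultdict(list)
    for thread_id, addresses in call_stacks.items():
        groups[tuple(addresses)].append(thread_id)
    return sorted(groups.items(), key=lambda item: len(item[1]))


# Hypothetical snapshot: five threads wait at the same barrier, one is somewhere else.
barrier = (0x400A10, 0x400B20, 0x400C30)
stacks = {0: barrier, 1: barrier, 2: barrier, 3: (0x400A10, 0x400D40), 4: barrier, 5: barrier}
for addresses, threads in group_threads(stacks):
    print([hex(a) for a in addresses], threads)
```

    On a machine with many thousands of threads, displaying a handful of groups instead of every stack is what makes the outlier visible; here thread 3 is reported first, on its own.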

  1. CPU SIM: A Computer Simulator for Use in an Introductory Computer Organization-Architecture Class.

    Science.gov (United States)

    Skrein, Dale

    1994-01-01

    CPU SIM, an interactive low-level computer simulation package that runs on the Macintosh computer, is described. The program is designed for instructional use in the first or second year of undergraduate computer science, to teach various features of typical computer organization through hands-on exercises. (MSE)

  2. High performance computer code for molecular dynamics simulations

    International Nuclear Information System (INIS)

    Levay, I.; Toekesi, K.

    2007-01-01

    Complete text of publication follows. Molecular Dynamics (MD) simulation is a widely used technique for modeling complicated physical phenomena. Since 2005 we have been developing an MD simulation code for PCs. The code is written in the object-oriented C++ programming language. The aim of our work is twofold: a) to develop a fast computer code for the study of the random walk of guest atoms in a Be crystal, and b) three-dimensional (3D) visualization of the particles' motion. In this case we mimic the motion of the guest atoms in the crystal (diffusion-type motion) and the motion of atoms in the crystal lattice (crystal deformation). Nowadays it is common to use graphics devices for computationally intensive problems. There are several ways to exploit this extreme processing performance, and programming these devices has never been as easy as it is now. The Compute Unified Device Architecture (CUDA), introduced by nVidia Corporation in 2007, is very useful for any processor-hungry application. A unified-architecture GPU includes 96 to 128 or more stream processors, so its raw calculation performance reaches 576(!) GFLOPS, ten times that of the fastest dual-core CPU [Fig.1]. Our improved MD simulation software uses this new technology, and the critical section of the code runs 10 times faster. Although the GPU is a very powerful tool, it has a highly parallel structure, so we have to create an algorithm that runs on several processors without deadlock. Our code currently uses 256 threads and on-chip shared and constant memory instead of global memory, which is 100 times slower. The whole algorithm can be implemented on the GPU, so we do not need to transfer the data to and from the device in every iteration. For maximal throughput, every thread runs the same instructions.
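
    The 576 GFLOPS figure is a theoretical peak, obtained by multiplying the number of stream processors by the shader clock and by the floating-point operations each processor can issue per cycle. The sketch assumes a 1.5 GHz shader clock and 3 flops per cycle (a multiply-add plus a multiply). These values are assumptions chosen because they reproduce the quoted figure for 128 stream processors; they are not taken from the abstract.

```python
def peak_gflops(stream_processors, shader_clock_ghz, flops_per_cycle):
    """Theoretical peak = processors x clock (GHz) x floating-point ops per cycle."""
    return stream_processors * shader_clock_ghz * flops_per_cycle


# Assumed: 1.5 GHz shader clock and 3 flops per cycle per stream processor.
print(peak_gflops(128, 1.5, 3))  # 576.0
print(peak_gflops(96, 1.5, 3))   # 432.0
```

    A peak figure like this is an upper bound. The measured 10x speedup reported for the critical code segment depends on memory placement (shared and constant rather than global memory) at least as much as on raw arithmetic rate.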

  3. Optimized Architectural Approaches in Hardware and Software Enabling Very High Performance Shared Storage Systems

    CERN Multimedia

    CERN. Geneva

    2004-01-01

    There are issues encountered in high performance storage systems that normally lead to compromises in architecture. Compute clusters tend to have compute phases followed by an I/O phase that must move data from the entire cluster in one operation. That data may then be shared by a large number of clients creating unpredictable read and write patterns. In some cases the aggregate performance of a server cluster must exceed 100 GB/s to minimize the time required for the I/O cycle thus maximizing compute availability. Accessing the same content from multiple points in a shared file system leads to the classical problems of data "hot spots" on the disk drive side and access collisions on the data connectivity side. The traditional method for increasing apparent bandwidth usually includes data replication which is costly in both storage and management. Scaling a model that includes replicated data presents additional management challenges as capacity and bandwidth expand asymmetrically while the system is scaled. ...

  4. Embedded High Performance Scalable Computing Systems

    National Research Council Canada - National Science Library

    Ngo, David

    2003-01-01

    The Embedded High Performance Scalable Computing Systems (EHPSCS) program is a cooperative agreement between Sanders, a Lockheed Martin Company, and DARPA that ran for three years, from April 1995 to April 1998...

  5. High Performance Computing (HPC) Challenge (HPCC) Benchmark Suite Development

    National Research Council Canada - National Science Library

    Dongarra, J. J

    2005-01-01

    .... The applications of performance modeling are numerous, including evaluation of algorithms, optimization of code implementation, parallel library development, and comparison of system architectures...

  6. Architecture Students' Perceptions of Their Learning Environment and Their Academic Performance

    Science.gov (United States)

    Oluwatayo, Adedapo Adewunmi; Aderonmu, Peter A.; Aduwo, Egidario B.

    2015-01-01

    Scholars agree that the way students perceive their learning environments influences their academic performance. Empirical studies focusing on architecture students, however, have been very scarce. This study attempts to fill that gap. A questionnaire survey of 273 students in a school of architecture in Nigeria…

  7. Cloud Computing for Complex Performance Codes.

    Energy Technology Data Exchange (ETDEWEB)

    Appel, Gordon John [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Hadgu, Teklu [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Klein, Brandon Thorin [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Miner, John Gifford [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-02-01

    This report describes the use of cloud computing services for running complex public domain performance assessment problems. The work consisted of two phases. Phase 1 demonstrated that complex codes could run on several differently configured servers and compute trivial small-scale problems in a commercial cloud infrastructure. Phase 2 focused on proving that non-trivial large-scale problems could be computed in the commercial cloud environment. The cloud computing effort was successfully applied using codes of interest to the geohydrology and nuclear waste disposal modeling community.

  8. NETRA: A parallel architecture for integrated vision systems 2: Algorithms and performance evaluation

    Science.gov (United States)

    Choudhary, Alok N.; Patel, Janak H.; Ahuja, Narendra

    1989-01-01

    Part 1 presented the architecture of NETRA; this part presents a performance evaluation of NETRA using several common vision algorithms. The performance of algorithms mapped onto a single cluster is described. It is shown that SIMD, MIMD, and systolic algorithms can be easily mapped onto processor clusters and that almost linear speedups are possible. For some algorithms, analytical performance results are compared with implementation performance results, and the analysis is observed to be very accurate. Performance analysis of parallel algorithms mapped across clusters is also presented. Mappings across clusters illustrate the importance and use of both shared and distributed memory in achieving high performance. The parameters for evaluation are derived from the characteristics of the parallel algorithms, and these parameters are used to evaluate the alternative communication strategies in NETRA. Furthermore, the effect of communication interference from other processors in the system on the execution of an algorithm is studied. Using the analysis, the performance of many algorithms with different characteristics is presented. It is observed that if communication speeds are matched with computation speeds, good speedups are possible when algorithms are mapped across clusters.
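
    The closing observation can be made concrete with a deliberately simple speedup model. It is a hypothetical illustration, not the analysis used for NETRA: the computation divides evenly across p processors, and mapping across clusters adds a communication time equal to the data exchanged divided by the interconnect bandwidth. All numbers are invented.

```python
def speedup(work_ops, compute_rate, processors, comm_bytes, bandwidth):
    """Speedup T1 / Tp for a perfectly divisible workload plus a fixed communication cost.

    work_ops     : total operations in the algorithm
    compute_rate : operations per second on one processor
    comm_bytes   : data exchanged when the algorithm is mapped across clusters
    bandwidth    : bytes per second of the interconnect
    """
    t1 = work_ops / compute_rate
    tp = work_ops / (processors * compute_rate) + comm_bytes / bandwidth
    return t1 / tp


# Hypothetical: 1e9 operations, 1e8 ops/s per processor, 16 processors, 1 MB exchanged.
matched = speedup(1e9, 1e8, 16, 1e6, 1e9)     # interconnect keeps up with computation
mismatched = speedup(1e9, 1e8, 16, 1e6, 1e7)  # interconnect 100x slower
```

    With the fast interconnect the speedup is close to the ideal 16 (about 15.97); with the slow one it drops to about 13.8, because the fixed communication time becomes a large fraction of the shrinking per-processor compute time.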

  9. High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures

    KAUST Repository

    Ltaief, Hatem

    2013-04-01

    This article presents a new high-performance bidiagonal reduction (BRD) for homogeneous multicore architectures. This article is an extension of the high-performance tridiagonal reduction implemented by the same authors [Luszczek et al., IPDPS 2011] to the BRD case. The BRD is the first step toward computing the singular value decomposition of a matrix, which is one of the most important algorithms in numerical linear algebra due to its broad impact in computational science. The high performance of the BRD described in this article comes from the combination of four important features: (1) tile algorithms with tile data layout, which provide an efficient data representation in main memory; (2) a two-stage reduction approach that casts most of the computation during the first stage (reduction to band form) into calls to Level 3 BLAS and reduces the memory traffic during the second stage (reduction from band to bidiagonal form) by using high-performance kernels optimized for cache reuse; (3) a data dependence translation layer that maps the general algorithm with column-major data layout into the tile data layout; and (4) a dynamic runtime system that efficiently schedules the newly implemented kernels across the processing units and ensures that the data dependencies are not violated. A detailed analysis is provided to understand the critical impact of the tile size on the total execution time, which also corresponds to the matrix bandwidth size after the reduction of the first stage. The performance results show a significant improvement over currently established alternatives. The new high-performance BRD achieves up to a 30-fold speedup on a 16-core Intel Xeon machine with a 12000×12000 matrix size against the state-of-the-art open source and commercial numerical software packages, namely LAPACK, compiled with optimized and multithreaded BLAS from MKL, as well as Intel MKL version 10.2. © 2013 ACM.
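
    Feature (1), the tile data layout, can be illustrated with a short sketch. It copies a column-major matrix into a layout in which each nb x nb tile is contiguous in memory, so a kernel working on one tile touches a single contiguous block. This is a simplified illustration, not the authors' translation layer; the function names and the requirement that the dimensions be multiples of nb are assumptions.

```python
def to_tile_layout(a, m, n, nb):
    """Copy a column-major m x n matrix (a[i + j*m] == A[i][j]) into tile layout.

    Each nb x nb tile is stored contiguously (column-major inside the tile),
    and tiles are ordered column by column. Assumes m and n are multiples of nb.
    """
    if m % nb or n % nb:
        raise ValueError("this sketch assumes m and n are multiples of nb")
    out = []
    for tj in range(n // nb):          # tile column
        for ti in range(m // nb):      # tile row
            for j in range(tj * nb, (tj + 1) * nb):
                for i in range(ti * nb, (ti + 1) * nb):
                    out.append(a[i + j * m])
    return out


def tile_offset(i, j, m, nb):
    """Position of element A[i][j] in the tile layout produced above."""
    tile = (j // nb) * (m // nb) + (i // nb)
    return tile * nb * nb + (j % nb) * nb + (i % nb)


m, n, nb = 4, 6, 2
a = [float(k) for k in range(m * n)]   # column-major storage
tiled = to_tile_layout(a, m, n, nb)
```

    In column-major storage the nb columns of one tile lie m elements apart; after the copy, the same tile occupies nb*nb consecutive entries. This contiguity is what lets tile kernels reuse cache, and the tile size nb is the parameter whose impact on execution time the article analyzes.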

  10. The research of contamination regularities of historical buildings and architectural monuments by methods of computer modeling

    Directory of Open Access Journals (Sweden)

    Kuzmichev Andrey A.

    2017-01-01

    Due to rapid urbanization and industrial development, the external appearance of buildings and architectural monuments in the urban environment requires special attention from the standpoint of visual ecology. Dust deposition from polluted atmospheric air is one of the key causes of facade degradation. Modern computer modeling methods make it possible to evaluate the impact of polluted atmospheric air on building facades in order to preserve them.

  11. An Integrated Architecture for On-Board Aircraft Engine Performance Trend Monitoring and Gas Path Fault Diagnostics

    Science.gov (United States)

    Simon, Donald L.

    2010-01-01

    Aircraft engine performance trend monitoring and gas path fault diagnostics are closely related technologies that assist operators in managing the health of their gas turbine engine assets. Trend monitoring is the process of monitoring the gradual performance change that an aircraft engine naturally incurs over time due to turbomachinery deterioration, while gas path diagnostics is the process of detecting and isolating the occurrence of any faults impacting engine flow-path performance. Today, performance trend monitoring and gas path fault diagnostic functions are performed by a combination of on-board and off-board strategies. On-board engine control computers contain logic that monitors for anomalous engine operation in real time. Off-board ground stations are used to conduct fleet-wide engine trend monitoring and fault diagnostics based on data collected from each engine on every flight. Continuing advances in avionics are enabling the migration of portions of the ground-based functionality on-board, giving rise to more sophisticated on-board engine health management capabilities. This paper reviews the conventional engine performance trend monitoring and gas path fault diagnostic architecture commonly applied today and presents a proposed enhanced on-board architecture for future applications. The enhanced architecture gains real-time access to an expanded set of engine parameters and provides advanced on-board model-based estimation capabilities. The benefits of the enhanced architecture include real-time continuous monitoring of engine health, early diagnosis of fault conditions, and estimation of unmeasured engine performance parameters. A future vision to advance the enhanced architecture is also presented and discussed.
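
    The distinction between gradual trend monitoring and abrupt fault detection can be illustrated with an exponentially weighted moving average. This is a hypothetical sketch, not the model-based estimation architecture proposed in the paper; the smoothing factor, threshold, and data are invented for illustration.

```python
def trend_monitor(deltas, alpha=0.2, threshold=3.0):
    """Track a slowly drifting performance delta and flag abrupt shifts.

    deltas    : one measured-minus-baseline value per flight (e.g. a temperature margin)
    alpha     : smoothing factor of the exponentially weighted moving average
    threshold : jump away from the current trend that is treated as a possible fault
    Returns the final smoothed trend and the flight indices that exceeded the threshold.
    """
    trend = deltas[0]
    alerts = []
    for flight, value in enumerate(deltas[1:], start=1):
        if abs(value - trend) > threshold:
            alerts.append(flight)   # abrupt change: candidate gas path fault
        trend = alpha * value + (1 - alpha) * trend  # gradual deterioration is absorbed
    return trend, alerts


# Hypothetical data: gradual deterioration, then a step change at flight 5.
deltas = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2]
trend, alerts = trend_monitor(deltas)
```

    The slow drift over flights 1 to 4 is absorbed into the trend, while the step at flight 5 is flagged. Because the trend keeps adapting, flights 6 and 7 are also flagged until the average catches up; a fielded system would typically latch the first alert.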

  12. Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters

    Science.gov (United States)

    Younge, Andrew J.

    2016-01-01

    With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…

  13. Design and development of a run-time monitor for multi-core architectures in cloud computing.

    Science.gov (United States)

    Kang, Mikyung; Kang, Dong-In; Crago, Stephen P; Park, Gyung-Leen; Lee, Junghoon

    2011-01-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications, as well as infrastructure, as services over the Internet. A cloud is a type of parallel and distributed system consisting of a collection of interconnected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. Large-scale distributed applications on a cloud require adaptive service-based software that can monitor system status changes, analyze the monitored information, and adapt its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), a piece of system software that monitors application behavior at run time, analyzes the collected information, and optimizes cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation and the underlying hardware through performance counters, optimizing its computing configuration based on the analyzed data.

  14. Design and Development of a Run-Time Monitor for Multi-Core Architectures in Cloud Computing

    Directory of Open Access Journals (Sweden)

    Junghoon Lee

    2011-03-01

    Cloud computing is a new information technology trend that moves computing and data away from desktops and portable PCs into large data centers. The basic principle of cloud computing is to deliver applications as services over the Internet as well as infrastructure. A cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualized computers that are dynamically provisioned and presented as one or more unified computing resources. The large-scale distributed applications on a cloud require adaptive service-based software, which has the capability of monitoring system status changes, analyzing the monitored information, and adapting its service configuration while considering tradeoffs among multiple QoS features simultaneously. In this paper, we design and develop a Run-Time Monitor (RTM), which is system software that monitors application behavior at run-time, analyzes the collected information, and optimizes cloud computing resources for multi-core architectures. RTM monitors application software through library instrumentation and the underlying hardware through performance counters, and optimizes its computing configuration based on the analyzed data.

  15. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    Science.gov (United States)

    Moon, Hongsik

    What is the impact of multicore and associated advanced technologies on computational software for science? Most researchers and students have multicore laptops or desktops for their research, and they need computing power to run computational software packages. Computing power was initially derived from Central Processing Unit (CPU) clock speed. That changed when increases in clock speed became constrained by power requirements. Chip manufacturers turned to multicore CPU architectures and associated technological advancements to create the CPUs for the future. Most software applications benefited from the increased computing power in the same way that increases in clock speed had helped applications run faster. However, for Computational ElectroMagnetics (CEM) software developers, this change was not an obvious benefit; it appeared to be a detriment. Developers were challenged to find a way to correctly utilize the advancements in hardware so that their codes could benefit. The solution was parallelization, and this dissertation details the investigation to address these challenges. Prior to multicore CPUs, advanced computer technologies were compared by their performance on benchmark software, and the metric was FLoating-point Operations Per Second (FLOPS), which indicates system performance for scientific applications that make heavy use of floating-point calculations. Is FLOPS an effective metric for parallelized CEM simulation tools on new multicore systems? Parallel CEM software needs to be benchmarked not only by FLOPS but also by the performance of other parameters related to the type and utilization of the hardware, such as CPU, Random Access Memory (RAM), hard disk, network, etc. The codes need to be optimized for more than just FLOPS, and new parameters must be included in benchmarking. In this dissertation, the parallel CEM software named High Order Basis Based Integral Equation Solver (HOBBIES) is introduced. This code was developed to address the needs of the

  16. HONEI: A collection of libraries for numerical computations targeting multiple processor architectures

    Science.gov (United States)

    van Dyk, Danny; Geveler, Markus; Mallach, Sven; Ribbrock, Dirk; Göddeke, Dominik; Gutwenger, Carsten

    2009-12-01

    We present HONEI, an open-source collection of libraries offering a hardware-oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and an additional 3-4 and 4-16 times faster execution on the Cell and a GPU, respectively. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.

    Program summary
    Program title: HONEI
    Catalogue identifier: AEDW_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDW_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GPLv2
    No. of lines in distributed program, including test data, etc.: 216 180
    No. of bytes in distributed program, including test data, etc.: 1 270 140
    Distribution format: tar.gz
    Programming language: C++
    Computer: x86, x86_64, NVIDIA CUDA GPUs, Cell blades and PlayStation 3
    Operating system: Linux
    RAM: at least 500 MB free
    Classification: 4.8, 4.3, 6.1
    External routines: SSE: none; [1] for GPU, [2] for Cell backend
    Nature of problem: Computational science in general and numerical simulation in particular have reached a turning point. The revolution developers are facing is not primarily driven by a change in (problem-specific) methodology, but rather by the fundamental paradigm shift of the

  17. A Conceptual Architecture for Adaptive Human-Computer Interface of a PT Operation Platform Based on Context-Awareness

    Directory of Open Access Journals (Sweden)

    Qing Xue

    2014-01-01

    We present a conceptual architecture for an adaptive human-computer interface of a PT operation platform based on context-awareness. This architecture will form the basis of design for such an interface. This paper describes the components, key technologies, and working principles of the architecture. The key topics covered are context information modeling and processing, establishing relationships between contexts and interface design knowledge through adaptive knowledge reasoning, and implementing the visualization of the adaptive interface with the aid of interface tool technology.

  18. Proposing Hybrid Architecture to Implement Cloud Computing in Higher Education Institutions Using a Meta-synthesis Appro

    Directory of Open Access Journals (Sweden)

    hamid reza bazi

    2017-12-01

    Cloud computing is a new technology that considerably helps Higher Education Institutions (HEIs) to develop and create competitive advantage with inherent characteristics such as flexibility, scalability, accessibility, reliability, fault tolerance and economic efficiency. Due to the numerous advantages of cloud computing, and in order to take advantage of cloud computing infrastructure, services of universities and HEIs need to migrate to the cloud. However, this transition involves many challenges, one of which is the lack or shortage of an appropriate architecture for migrating to the technology. Using a reliable migration architecture helps managers mitigate the risks of cloud computing technology. Therefore, organizations continually search for a suitable cloud computing architecture. In previous studies, these important features have received less attention and have not been addressed comprehensively. The aim of this study is to use a meta-synthesis method, for the first time, to analyze previously published studies and to suggest an appropriate hybrid cloud migration architecture (IUHEC). We reviewed many papers from relevant journals and conference proceedings. The concepts extracted from these papers are classified into related categories and sub-categories. Then, we developed our proposed hybrid architecture based on these concepts and categories. The proposed architecture was validated by a panel of experts, and Lawshe’s model was used to determine the content validity. Due to its innovative yet user-friendly nature, comprehensiveness, and high security, this architecture can help HEIs migrate effectively to a cloud computing environment.

  19. Molecular architectures based on π-conjugated block copolymers for global quantum computation

    International Nuclear Information System (INIS)

    Mujica Martinez, C A; Arce, J C; Reina, J H; Thorwart, M

    2009-01-01

    We propose a molecular setup for the physical implementation of a barrier global quantum computation scheme based on the electron-doped π-conjugated copolymer architecture of nine blocks PPP-PDA-PPP-PA-(CCH-acene)-PA-PPP-PDA-PPP (where each block is an oligomer). The physical carriers of information are electrons coupled through the Coulomb interaction, and the building block of the computing architecture is composed of three adjacent qubit systems in a quasi-linear arrangement, each of them allowing qubit storage, but with the central qubit exhibiting a third accessible state of electronic energy far away from that of the qubits' transition energy. The third state is reached from one of the computational states by means of an on-resonance coherent laser field, and acts as a barrier mechanism for the direct control of qubit entanglement. Initial estimations of the spontaneous emission decay rates associated with the energy level structure allow us to compute a damping rate of order 10^-7 s, which suggests a relatively weak coupling to the environment. Our results offer an all-optical, scalable proposal for global quantum computing based on semiconducting π-conjugated polymers.

  20. Molecular architectures based on pi-conjugated block copolymers for global quantum computation

    Energy Technology Data Exchange (ETDEWEB)

    Mujica Martinez, C A; Arce, J C [Universidad del Valle, Departamento de QuImica, A. A. 25360, Cali (Colombia); Reina, J H [Universidad del Valle, Departamento de Fisica, A. A. 25360, Cali (Colombia); Thorwart, M, E-mail: camujica@univalle.edu.c, E-mail: j.reina-estupinan@physics.ox.ac.u, E-mail: jularce@univalle.edu.c [Institut fuer Theoretische Physik IV, Heinrich-Heine-Universitaet Duesseldorf, 40225 Duesseldorf (Germany)]

    2009-05-01

    We propose a molecular setup for the physical implementation of a barrier global quantum computation scheme based on the electron-doped pi-conjugated copolymer architecture of nine blocks PPP-PDA-PPP-PA-(CCH-acene)-PA-PPP-PDA-PPP (where each block is an oligomer). The physical carriers of information are electrons coupled through the Coulomb interaction, and the building block of the computing architecture is composed of three adjacent qubit systems in a quasi-linear arrangement, each of them allowing qubit storage, but with the central qubit exhibiting a third accessible state of electronic energy far away from that of the qubits' transition energy. The third state is reached from one of the computational states by means of an on-resonance coherent laser field, and acts as a barrier mechanism for the direct control of qubit entanglement. Initial estimations of the spontaneous emission decay rates associated with the energy level structure allow us to compute a damping rate of order 10^-7 s, which suggests a relatively weak coupling to the environment. Our results offer an all-optical, scalable proposal for global quantum computing based on semiconducting pi-conjugated polymers.

  1. Architectural approach to the energy performance of buildings in a hot-dry climate with special reference to Egypt

    Energy Technology Data Exchange (ETDEWEB)

    Hamdy, I F

    1986-01-01

    A thesis is presented on the changing approach to architectural design of buildings in a hot, dry climate in view of the increased recognition of the importance of energy efficiency. The thermal performance of buildings in Egypt is used as an example, and the nature of the local climate and human requirements are also studied. Other effects on the thermal performance considered include building form, orientation and surrounding conditions. An evaluative computer model is constructed, and its applications allow prediction of the energy performance as design parameters change.

  2. Function Follows Performance in Evolutionary Computational Processing

    DEFF Research Database (Denmark)

    Pasold, Anke; Foged, Isak Worre

    2011-01-01

    As the title ‘Function Follows Performance in Evolutionary Computational Processing’ suggests, this paper explores the potential of employing multiple design and evaluation criteria within one processing model in order to account for a number of performative parameters desired within varied...

  3. Fog Computing and Edge Computing Architectures for Processing Data From Diabetes Devices Connected to the Medical Internet of Things.

    Science.gov (United States)

    Klonoff, David C

    2017-07-01

    The Internet of Things (IoT) is generating an immense volume of data. With cloud computing, medical sensor and actuator data can be stored and analyzed remotely by distributed servers. The results can then be delivered via the Internet. The devices in the IoT include wireless diabetes devices such as blood glucose monitors, continuous glucose monitors, insulin pens, insulin pumps, and closed-loop systems. The cloud model for data storage and analysis is increasingly unable to process the data avalanche, and processing is being pushed out to the edge of the network, closer to where the data-generating devices are. Fog computing and edge computing are two architectures for data handling that can offload data from the cloud, process it near the patient, and transmit information machine-to-machine or machine-to-human in milliseconds or seconds. Sensor data can be processed near the sensing and actuating devices with fog computing (with local nodes) and with edge computing (within the sensing devices). Compared to cloud computing, fog computing and edge computing offer five advantages: (1) greater data transmission speed, (2) less dependence on limited bandwidths, (3) greater privacy and security, (4) greater control over data generated in foreign countries where laws may limit use or permit unwanted governmental access, and (5) lower costs because more sensor-derived data are used locally and less data are transmitted remotely. Almost all connected diabetes devices use fog computing or edge computing because diabetes patients require a very rapid response to sensor input and cannot tolerate delays for cloud computing.

  4. Effects of classrooms’ architecture on academic performance in view of telic versus paratelic motivation: a review

    NARCIS (Netherlands)

    Lewinski, P.

    2015-01-01

    This mini literature review analyzes research papers from many countries that directly or indirectly test how classrooms’ architecture influences academic performance. These papers evaluate and explain specific characteristics of classrooms, with an emphasis on how they affect learning processes and

  5. Molecular computing towards a novel computing architecture for complex problem solving

    CERN Document Server

    Chang, Weng-Long

    2014-01-01

    This textbook introduces a concise approach to the design of molecular algorithms for students or researchers who are interested in dealing with complex problems. Through numerous examples and exercises, you will understand the main differences between molecular circuits and traditional digital circuits in handling the same problem, and you will also learn how to design a molecular algorithm to solve a problem from start to finish. The book starts with an introduction to computational aspects of digital computers and molecular computing, data representation of molecular computing, molecular operations of molecular computing and number representation of molecular computing, and provides many molecular algorithms to construct the parity generator and the parity checker of error-detection codes on digital communication, to encode integers of different formats, single precision and double precision of floating-point numbers, to implement addition and subtraction of unsigned integers, to construct logic operations...

  6. AHPCRC - Army High Performance Computing Research Center

    Science.gov (United States)

    2010-01-01

    computing. Of particular interest is the ability of a distributed jamming network (DJN) to jam signals in all or part of a sensor or communications net...and reasoning, assistive technologies. FRIEDRICH (FRITZ) PRINZ Finmeccanica Professor of Engineering, Robert Bosch Chair, Department of Engineering...High Performance Computing Research Center www.ahpcrc.org BARBARA BRYAN AHPCRC Research and Outreach Manager, HPTi (650) 604-3732 bbryan@hpti.com Ms

  7. DURIP: High Performance Computing in Biomathematics Applications

    Science.gov (United States)

    2017-05-10

    Mathematics and Statistics (AMS) at the University of California, Santa Cruz (UCSC) to conduct research and research-related education in areas of...Computing in Biomathematics Applications Report Title The goal of this award was to enhance the capabilities of the Department of Applied Mathematics and...DURIP: High Performance Computing in Biomathematics Applications The goal of this award was to enhance the capabilities of the Department of Applied

  8. Selection of an optimal neural network architecture for computer-aided detection of microcalcifications - Comparison of automated optimization techniques

    International Nuclear Information System (INIS)

    Gurcan, Metin N.; Sahiner, Berkman; Chan Heangping; Hadjiiski, Lubomir; Petrick, Nicholas

    2001-01-01

    Many computer-aided diagnosis (CAD) systems use neural networks (NNs) for either detection or classification of abnormalities. Currently, most NNs are 'optimized' by manual search in a very limited parameter space. In this work, we evaluated the use of automated optimization methods for selecting an optimal convolution neural network (CNN) architecture. Three automated methods, the steepest descent (SD), the simulated annealing (SA), and the genetic algorithm (GA), were compared. We used as an example the CNN that classifies true and false microcalcifications detected on digitized mammograms by a prescreening algorithm. Four parameters of the CNN architecture were considered for optimization, the numbers of node groups and the filter kernel sizes in the first and second hidden layers, resulting in a search space of 432 possible architectures. The area A_z under the receiver operating characteristic (ROC) curve was used to design a cost function. The SA experiments were conducted with four different annealing schedules. Three different parent selection methods were compared for the GA experiments. An available data set was split into two groups with an approximately equal number of samples. By using the two groups alternately for training and testing, two different cost surfaces were evaluated. For the first cost surface, the SD method was trapped in a local minimum 91% (392/432) of the time. The SA using the Boltzmann schedule selected the best architecture after evaluating, on average, 167 architectures. The GA achieved its best performance with linearly scaled roulette-wheel parent selection; however, it evaluated 391 different architectures, on average, to find the best one. The second cost surface contained no local minimum. For this surface, a simple SD algorithm could quickly find the global minimum, but the SA with the very fast reannealing schedule was still the most efficient. The same SA scheme, however, was trapped in a local minimum on the first cost

  9. Developing a New Framework for Integration and Teaching of Computer Aided Architectural Design (CAAD) in Nigerian Schools of Architecture

    Science.gov (United States)

    Uwakonye, Obioha; Alagbe, Oluwole; Oluwatayo, Adedapo; Alagbe, Taiye; Alalade, Gbenga

    2015-01-01

    As a result of globalization of digital technology, intellectual discourse on what constitutes the basic body of architectural knowledge to be imparted to future professionals has been on the increase. This digital revolution has brought to the fore the need to review the already overloaded architectural education curriculum of Nigerian schools of…

  10. Computer technique for evaluating collimator performance

    International Nuclear Information System (INIS)

    Rollo, F.D.

    1975-01-01

    A computer program has been developed to theoretically evaluate the overall performance of collimators used with radioisotope scanners and γ cameras. The first step of the program involves the determination of the line spread function (LSF) and geometrical efficiency from the fundamental parameters of the collimator being evaluated. The working equations can be applied to any plane of interest. The resulting LSF is applied to subroutine computer programs which compute the corresponding modulation transfer function and contrast efficiency function. The latter function is then combined with appropriate geometrical efficiency data to determine the performance index function. The overall computer program allows one to predict, from the physical parameters of the collimator alone, how well the collimator will reproduce various sized spherical voids of activity in the image plane. The collimator performance program can be used to compare the performance of various collimator types, to study the effects of source depth on collimator performance, and to assist in the design of collimators. The theory of the collimator performance equation is discussed, a comparison between the experimental and theoretical LSF values is made, and examples of the application of the technique are presented.
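The abstract describes deriving a modulation transfer function (MTF) from the computed LSF but does not give the program itself. A minimal sketch, assuming the standard relation that the MTF is the modulus of the LSF's Fourier transform normalized to 1 at zero frequency, could look like the following. The function name `mtf_from_lsf` and the sample values are hypothetical, and the contrast efficiency and performance index steps are not reproduced.

```python
import cmath
import math


def mtf_from_lsf(lsf, dx, frequencies):
    """Discrete MTF estimate: |Fourier transform of the LSF| normalized at f = 0.

    lsf: samples of the line spread function at spacing dx (e.g. mm).
    frequencies: spatial frequencies to evaluate, in cycles per unit of dx.
    """
    area = sum(lsf) * dx  # value of the transform at zero frequency
    if area == 0:
        raise ValueError("LSF must have non-zero area")
    result = []
    for f in frequencies:
        # Riemann-sum approximation of the continuous Fourier integral.
        transform = sum(
            value * cmath.exp(-2j * math.pi * f * i * dx) for i, value in enumerate(lsf)
        ) * dx
        # Taking the modulus makes the result independent of where the LSF is centred.
        result.append(abs(transform) / area)
    return result


if __name__ == "__main__":
    dx = 0.1  # mm, hypothetical sampling interval
    xs = [(i - 50) * dx for i in range(101)]
    lsf = [math.exp(-x * x / 2.0) for x in xs]  # hypothetical Gaussian LSF
    print(mtf_from_lsf(lsf, dx, [0.0, 0.1, 0.2]))
```

A narrower LSF gives an MTF that falls off more slowly with frequency, which is the property a collimator comparison of this kind relies on.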

  11. Development of a Computer Architecture to Support the Optical Plume Anomaly Detection (OPAD) System

    Science.gov (United States)

    Katsinis, Constantine

    1996-01-01

    to execute the software in a modern single-processor workstation, and therefore real-time operation is currently not possible. A different number of iterations may be required to perform spectral data fitting per spectral sample. Yet, the OPAD system must be designed to maintain real-time performance in all cases. Although faster single-processor workstations are available for execution of the fitting and SPECTRA software, this option is unattractive due to the excessive cost associated with very fast workstations and also due to the fact that such hardware is not easily expandable to accommodate future versions of the software which may require more processing power. Initial research has already demonstrated that the OPAD software can take advantage of a parallel computer architecture to achieve the necessary speedup. Current work has improved the software by converting it into a form which is easily parallelizable. Timing experiments have been performed to establish the computational complexity and execution speed of major components of the software. This work provides the foundation of future work which will create a fully parallel version of the software executing in a shared-memory multiprocessor system.

  12. An FPGA-Based Quantum Computing Emulation Framework Based on Serial-Parallel Architecture

    Directory of Open Access Journals (Sweden)

    Y. H. Lee

    2016-01-01

    Hardware emulation of quantum systems can mimic the parallel behaviour of quantum computations more efficiently, thus allowing higher processing speed-up than software simulations. In this paper, an efficient hardware emulation method that employs a serial-parallel hardware architecture targeted for field programmable gate arrays (FPGAs) is proposed. Quantum Fourier transform and Grover’s search are chosen as case studies in this work since they are the core of many useful quantum algorithms. Experimental work shows that, with the proposed emulation architecture, a linear reduction in resource utilization is attained against the pipeline implementations proposed in prior works. The proposed work contributes to the formulation of a proof-of-concept baseline FPGA emulation framework with optimization on datapath designs that can be extended to emulate practical large-scale quantum circuits.
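To check an emulator's output, a plain software reference is useful. A minimal sketch, assuming the common convention in which the quantum Fourier transform on an n-qubit state vector is a normalized discrete Fourier transform with kernel exp(+2πijk/N), is shown below. The `qft` function is hypothetical, runs in O(N²) time, and says nothing about the paper's serial-parallel FPGA datapath.

```python
import cmath
import math


def qft(state):
    """Apply the quantum Fourier transform to a state vector of length N = 2**n.

    Output amplitude k is (1/sqrt(N)) * sum_j state[j] * exp(2*pi*i*j*k/N).
    This is a direct software reference, not a gate-level or FPGA emulation.
    """
    n_states = len(state)
    if n_states == 0 or n_states & (n_states - 1):
        raise ValueError("state length must be a power of two")
    norm = 1 / math.sqrt(n_states)
    return [
        norm * sum(amp * cmath.exp(2j * math.pi * j * k / n_states)
                   for j, amp in enumerate(state))
        for k in range(n_states)
    ]


if __name__ == "__main__":
    # Two qubits in basis state |01> (index 1): amplitudes 0.5 * i**k.
    print(qft([0, 1, 0, 0]))
```

Because the transform is unitary, the squared amplitudes of the output always sum to 1, which gives a cheap sanity check on any emulated result.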

  13. 14th annual Results and Review Workshop on High Performance Computing in Science and Engineering

    CERN Document Server

    Nagel, Wolfgang E; Resch, Michael M; Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2011; High Performance Computing in Science and Engineering '11

    2012-01-01

    This book presents the state-of-the-art in simulation on supercomputers. Leading researchers present results achieved on systems of the High Performance Computing Center Stuttgart (HLRS) for the year 2011. The reports cover all fields of computational science and engineering, ranging from CFD to computational physics and chemistry, to computer science, with a special emphasis on industrially relevant applications. Presenting results for both vector systems and microprocessor-based systems, the book allows readers to compare the performance levels and usability of various architectures. As HLRS

  14. Misleading Performance Claims in Parallel Computations

    Energy Technology Data Exchange (ETDEWEB)

    Bailey, David H.

    2009-05-29

    In a previous humorous note entitled 'Twelve Ways to Fool the Masses,' I outlined twelve common ways in which performance figures for technical computer systems can be distorted. In this paper and accompanying conference talk, I give a reprise of these twelve 'methods' and give some actual examples that have appeared in peer-reviewed literature in years past. I then propose guidelines for reporting performance, the adoption of which would raise the level of professionalism and reduce the level of confusion, not only in the world of device simulation but also in the larger arena of technical computing.

  15. Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Aktulga, Hasan Metin [Michigan State Univ., East Lansing, MI (United States); Coffman, Paul [Argonne National Lab. (ANL), Argonne, IL (United States); Shan, Tzu-Ray [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Knight, Chris [Argonne National Lab. (ANL), Argonne, IL (United States); Jiang, Wei [Argonne National Lab. (ANL), Argonne, IL (United States)

    2015-12-01

    Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on a state-of-the-art multi-core architecture Mira, an IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.
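The 91.5% figure is a weak scaling efficiency, where the work per core is held fixed as cores are added. The abstract does not state the reference core count or exactly how time was measured, so the following is the conventional definition rather than the authors' formula. Here T(P) is the time-to-solution on P cores, N(P) is the particle count, and P_0 is an assumed reference core count.

```latex
E_{\mathrm{weak}}(P) = \frac{T(P_0)}{T(P)}
\quad\text{with}\quad \frac{N(P)}{P} = \frac{N(P_0)}{P_0},
\qquad
E_{\mathrm{weak}} = 0.915 \;\Rightarrow\; T(P) = \frac{T(P_0)}{0.915} \approx 1.093\,T(P_0)
```

Under this definition, the largest runs take roughly 9% longer per step than the reference run. The abstract's counts also imply 1,048,576 / 262,144 = 4 processes per core.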

  16. FY1995 study of design methodology and environment of high-performance processor architectures; 1995 nendo koseino processor architecture sekkeiho to sekkei kankyo no kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    The aim of our project is to develop high-performance processor architectures for both general purpose and application-specific purpose. We also plan to develop basic software, such as compilers, and various design aid tools for those architectures. We are particularly interested in performance evaluation at architecture design phase, design optimization, automatic generation of compilers from processor designs, and architecture design methodologies combined with circuit layout. We have investigated both microprocessor architectures and design methodologies / environments for the processors. Our goal is to establish design technologies for high-performance, low-power, low-cost and highly-reliable systems in system-on-silicon era. We have proposed PPRAM architecture for high-performance system using DRAM and logic mixture technology, Softcore processor architecture for special purpose processors in embedded systems, and Power-Pro architecture for low power systems. We also developed design methodologies and design environments for the above architectures as well as a new method for design verification of microprocessors. (NEDO)

  17. Architecture and performance of the new CESR control system

    International Nuclear Information System (INIS)

    Strohman, C.R.; Peck, S.B.

    1989-01-01

    The new control system for the Cornell Electron Storage Ring (CESR) is based on a multi-port memory which can be accessed by many computers. The computers are either VAXes, which run user programs, or Xbus Processors, which move data to and from the hardware devices which are being monitored or controlled. The control system database is in the multi-port memory, and contains all of the data needed to communicate with various pieces of hardware. 1 fig

  18. High-performance computing for airborne applications

    International Nuclear Information System (INIS)

    Quinn, Heather M.; Manuzatto, Andrea; Fairbanks, Tom; Dallmann, Nicholas; Desgeorges, Rose

    2010-01-01

    Recently, there have been attempts to move common satellite tasks to unmanned aerial vehicles (UAVs). UAVs are significantly cheaper to buy than satellites and easier to deploy on an as-needed basis. The more benign radiation environment also allows for an aggressive adoption of state-of-the-art commercial computational devices, which increases the amount of data that can be collected. There are a number of commercial computing devices currently available that are well-suited to high-performance computing. These devices range from specialized computational devices, such as field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), to traditional computing platforms, such as microprocessors. Even though the radiation environment is relatively benign, these devices could be susceptible to single-event effects. In this paper, we will present radiation data for high-performance computing devices in an accelerated neutron environment. These devices include a multi-core digital signal processor, two field-programmable gate arrays, and a microprocessor. From these results, we found that all of these devices are suitable for many airplane environments without reliability problems.

  19. InfoMall: An Innovative Strategy for High-Performance Computing and Communications Applications Development.

    Science.gov (United States)

    Mills, Kim; Fox, Geoffrey

    1994-01-01

    Describes the InfoMall, a program led by the Northeast Parallel Architectures Center (NPAC) at Syracuse University (New York). The InfoMall features a partnership of approximately 24 organizations offering linked programs in High Performance Computing and Communications (HPCC) technology integration, software development, marketing, education and…

  20. Parallel processing algorithms for hydrocodes on a computer with MIMD architecture (DENELCOR's HEP)

    International Nuclear Information System (INIS)

    Hicks, D.L.

    1983-11-01

    In real time simulation/prediction of complex systems such as water-cooled nuclear reactors, if reactor operators had fast simulator/predictors to check the consequences of their operations before implementing them, events such as the incident at Three Mile Island might be avoided. However, existing simulator/predictors such as RELAP run slower than real time on serial computers. It appears that the only way to overcome the barrier to higher computing rates is to use computers with architectures that allow concurrent computations or parallel processing. The computer architecture with the greatest degree of parallelism is labeled Multiple Instruction Stream, Multiple Data Stream (MIMD). An example of a machine of this type is the HEP computer by DENELCOR. It appears that hydrocodes are very well suited for parallelization on the HEP. It is a straightforward exercise to parallelize explicit, one-dimensional Lagrangean hydrocodes in a zone-by-zone parallelization. Similarly, implicit schemes can be parallelized in a zone-by-zone fashion via an a priori, symbolic inversion of the tridiagonal matrix that arises in an implicit scheme. These techniques are extended to Eulerian hydrocodes by using Harlow's rezone technique. The extension from single-phase Eulerian to two-phase Eulerian is straightforward. This step-by-step extension leads to hydrocodes with zone-by-zone parallelization that are capable of two-phase flow simulation. Extensions to two and three spatial dimensions can be achieved by operator splitting. It appears that a zone-by-zone parallelization is the best way to utilize the capabilities of an MIMD machine. 40 references
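    The zone-by-zone parallelization of an explicit one-dimensional Lagrangian hydrocode can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes a staggered mesh (velocities on nodes, pressure and energy in zones), an ideal-gas equation of state with a hypothetical γ = 1.4, rigid walls, and no artificial viscosity. Within each phase, every node or zone update reads only values from the previous phase, so each one could run on its own instruction stream. A thread pool stands in for the HEP's parallel processes. (Because of CPython's global interpreter lock, it shows the decomposition but not a real speedup.)

```python
from concurrent.futures import ThreadPoolExecutor

GAMMA = 1.4  # assumed ideal-gas ratio of specific heats (illustrative)


def initial_state(rho, p, dx=1.0):
    """Staggered 1D Lagrangian mesh: len(rho) zones bounded by len(rho)+1 nodes."""
    n = len(rho)
    x = [i * dx for i in range(n + 1)]
    u = [0.0] * (n + 1)
    zone_mass = [r * dx for r in rho]
    # specific internal energy from the ideal-gas EOS p = (gamma - 1) * rho * e
    e = [pj / ((GAMMA - 1.0) * rj) for pj, rj in zip(p, rho)]
    # each interior node carries half the mass of its two neighbouring zones
    node_mass = [0.0] + [0.5 * (zone_mass[i - 1] + zone_mass[i]) for i in range(1, n)] + [0.0]
    return x, u, e, list(p), zone_mass, node_mass


def node_velocity(i, u, p, node_mass, dt):
    # Node i lies between zone i-1 and zone i; both end walls are rigid (assumed).
    if i == 0 or i == len(u) - 1:
        return 0.0
    return u[i] - dt * (p[i] - p[i - 1]) / node_mass[i]


def zone_update(j, u_new, x_new, e, p, zone_mass, dt):
    # Zone j reads only its own old state and its two bounding nodes.
    e_new = e[j] - dt * p[j] * (u_new[j + 1] - u_new[j]) / zone_mass[j]  # PdV work
    rho_new = zone_mass[j] / (x_new[j + 1] - x_new[j])
    return e_new, (GAMMA - 1.0) * rho_new * e_new


def step(state, dt, pool):
    x, u, e, p, zone_mass, node_mass = state
    # Phase 1: node updates are mutually independent.
    u_new = list(pool.map(lambda i: node_velocity(i, u, p, node_mass, dt), range(len(u))))
    x_new = [xi + dt * ui for xi, ui in zip(x, u_new)]
    # Phase 2: zone updates are mutually independent (zone-by-zone parallelism).
    zones = list(pool.map(lambda j: zone_update(j, u_new, x_new, e, p, zone_mass, dt),
                          range(len(e))))
    e_new = [z[0] for z in zones]
    p_new = [z[1] for z in zones]
    return x_new, u_new, e_new, p_new, zone_mass, node_mass


if __name__ == "__main__":
    # Shock-tube-like setup: high pressure on the left, low pressure on the right.
    state = initial_state(rho=[1.0, 1.0, 0.125, 0.125], p=[1.0, 1.0, 0.1, 0.1])
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(10):
            state = step(state, dt=0.01, pool=pool)
    print(state[1])  # node velocities after 10 steps
```

    An implicit scheme would add a tridiagonal solve per step, which the abstract handles by a priori symbolic inversion; that part is not shown here.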

  1. High performance parallel computers for science

    International Nuclear Information System (INIS)

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost-effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors, each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor has been developed for theoretical physics. Each $4000 node is a FORTRAN- or C-programmable, pipelined, 20 Mflops (peak), 10 MByte single-board computer. These are plugged into a 16-port crossbar switch crate, which handles both inter- and intra-crate communication. The crates are connected in a hypercube. Site-oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256-node, 5 GFlop system is under construction.
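    The quoted system size follows directly from the per-node figures in the abstract. The arithmetic below is a back-of-the-envelope check of peak rate and node cost. The cost line covers node boards only, as an assumption; crates, switches, and hosts are excluded.

```latex
256 \times 20\ \mathrm{Mflops} = 5120\ \mathrm{Mflops} \approx 5.1\ \mathrm{GFlops\ (peak)},
\qquad
256 \times \$4000 = \$1{,}024{,}000\ \text{(nodes only)}
```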

  2. An Architecture of IoT Service Delegation and Resource Allocation Based on Collaboration between Fog and Cloud Computing

    Directory of Open Access Journals (Sweden)

    Aymen Abdullah Alsaffar

    2016-01-01

    Despite the wide utilization of cloud computing (e.g., services, applications, and resources), some services, applications, and smart devices are not able to fully benefit from this attractive paradigm due to the following issues: (1) smart devices might lack capacity (e.g., processing, memory, storage, battery, and resource allocation); (2) they might lack network resources; and (3) the high network latency to a centralized cloud server might not be efficient for delay-sensitive applications, services, and resource allocation requests. Fog computing is a promising paradigm that can extend cloud resources to the edge of the network, solving the abovementioned issues. As a result, in this work, we propose an architecture of IoT service delegation and resource allocation based on collaboration between fog and cloud computing. We provide a new algorithm, consisting of the decision rules of a linearized decision tree based on three conditions (service size, completion time, and VM capacity), for managing and delegating user requests in order to balance workload. Moreover, we propose an algorithm to allocate resources to meet service level agreements (SLA) and quality of service (QoS), as well as to optimize big data distribution in fog and cloud computing. Our simulation results show that our proposed approach can efficiently balance workload, improve resource allocation efficiency, optimize big data distribution, and perform better than other existing methods.
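    A linearized decision tree is an ordered list of if-rules in which the first matching rule decides. The sketch below applies that idea to the three conditions named in the abstract (service size, completion time, and VM capacity). The rule order, the thresholds `MAX_FOG_SIZE` and `DELAY_SENSITIVE_MS`, and the capacity bookkeeping are all hypothetical, chosen only for illustration. The abstract does not give the paper's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Request:
    service_size: float     # size of the requested service (unit assumed, e.g. MB)
    completion_time: float  # required completion time in ms


@dataclass
class FogNode:
    vm_capacity: float      # remaining fog VM capacity, same unit as service_size


# Hypothetical thresholds; not taken from the paper.
MAX_FOG_SIZE = 50.0
DELAY_SENSITIVE_MS = 100.0


def delegate(req: Request, fog: FogNode) -> str:
    """Linearized decision tree: rules are tested in order, first match wins."""
    if req.service_size > MAX_FOG_SIZE:            # rule 1: service size
        return "cloud"
    if req.service_size > fog.vm_capacity:         # rule 2: VM capacity
        return "cloud"
    if req.completion_time <= DELAY_SENSITIVE_MS:  # rule 3: completion time
        return "fog"
    return "cloud"  # latency-tolerant work keeps fog capacity free


def dispatch(requests, fog: FogNode):
    """Place each request and reserve fog capacity for the ones kept at the edge."""
    placements = []
    for req in requests:
        target = delegate(req, fog)
        if target == "fog":
            fog.vm_capacity -= req.service_size
        placements.append(target)
    return placements
```

    For example, with 30 units of fog capacity, a small urgent request stays at the fog. A second identical request then exceeds the remaining capacity and is delegated to the cloud.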

  3. Performance evaluation of throughput computing workloads using multi-core processors and graphics processors

    Science.gov (United States)

    Dave, Gaurav P.; Sureshkumar, N.; Blessy Trencia Lincy, S. S.

    2017-11-01

    The current trend in processor manufacturing focuses on multi-core architectures rather than increasing clock speed to improve performance. Graphics processors have become commodity hardware that provides fast co-processing in computer systems. Developments in IoT, social-networking web applications, and big data have created huge demand for data processing, and such throughput-intensive applications inherently contain data-level parallelism, which is better suited to SIMD-based GPU architectures. This paper reviews the architectural aspects of multi-/many-core processors and graphics processors. Different case studies are used to compare the performance of throughput computing applications using shared-memory programming with OpenMP and CUDA API-based programming.
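    Data-level parallelism means applying the same operation independently to every element, which lets the work be split into chunks with no communication between them. The sketch below shows this with SAXPY (y := a·x + y), a standard throughput kernel chosen here as an assumed example rather than one of the paper's case studies. Each chunk corresponds to what an OpenMP thread or a block of CUDA threads would process. In CPython, the thread pool shows the decomposition, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(a, x, y, start, stop):
    # The same operation on every element, with no dependence between elements.
    return [a * x[i] + y[i] for i in range(start, stop)]


def saxpy_parallel(a, x, y, workers=4):
    """Split the index range into contiguous chunks and process them concurrently."""
    n = len(x)
    if n == 0:
        return []
    chunk = -(-n // workers)  # ceiling division so every element is covered
    bounds = [(s, min(s + chunk, n)) for s in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: saxpy_chunk(a, x, y, b[0], b[1]), bounds))
    return [v for part in parts for v in part]
```

    Because no chunk reads another chunk's output, the result is identical to a sequential loop regardless of how many workers are used.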

  4. High performance matrix inversion based on LU factorization for multicore architectures

    KAUST Repository

    Dongarra, Jack

    2011-01-01

    The goal of this paper is to present an efficient implementation of explicit matrix inversion of general square matrices on multicore computer architectures. The inversion procedure is split into four steps: 1) computing the LU factorization, 2) inverting the upper triangular U factor, 3) solving a linear system whose solution yields the inverse of the original matrix, and 4) applying backward column pivoting on the inverted matrix. Using a tile data layout, which represents the matrix in system memory in an optimized cache-aware format, the computation of the four steps is decomposed into computational tasks. A directed acyclic graph representing the program data flow is generated on the fly; its nodes represent tasks and its edges the data dependencies between them. Previous implementations of matrix inversion, available in state-of-the-art numerical libraries, suffer from unnecessary synchronization points, which are absent from our implementation in order to fully exploit the parallelism of the underlying hardware. Our algorithmic approach removes these bottlenecks and executes the tasks with loose synchronization. A runtime environment system called QUARK is required to dynamically schedule our numerical kernels on the available processing units. The reported results from our LU-based matrix inversion implementation significantly outperform state-of-the-art numerical libraries such as LAPACK (5x), MKL (5x), and ScaLAPACK (2.5x) on a contemporary AMD platform with four sockets and a total of 48 cores for a matrix of size 24000. A power consumption analysis shows that our high-performance implementation is also energy efficient and consumes substantially less power than its competitors. © 2011 ACM.
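    The four mathematical steps listed above can be sketched sequentially in plain Python. This is a minimal sketch under stated assumptions. It uses row partial pivoting, so that P·A = L·U, and follows from A⁻¹ = U⁻¹·L⁻¹·P. It has no tiles, no DAG, and no QUARK scheduling, so it shows only what each task computes, not the paper's parallel implementation.

```python
def lu_factor(a):
    """Step 1: LU with partial pivoting, stored in one matrix, so that P A = L U."""
    n = len(a)
    lu = [list(map(float, row)) for row in a]
    perm = list(range(n))  # row i of P A is row perm[i] of A
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[piv] = lu[piv], lu[k]
        perm[k], perm[piv] = perm[piv], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier kept in the strictly lower (L) part
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def invert_upper(lu):
    """Step 2: W = U^-1 by back substitution, one column at a time."""
    n = len(lu)
    w = [[0.0] * n for _ in range(n)]
    for j in range(n):
        w[j][j] = 1.0 / lu[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(lu[i][k] * w[k][j] for k in range(i + 1, j + 1))
            w[i][j] = -s / lu[i][i]
    return w


def solve_with_lower(lu, w):
    """Step 3: solve X L = W for X (L unit lower triangular), so X = U^-1 L^-1."""
    n = len(lu)
    x = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for j in range(n - 1, -1, -1):
            x[r][j] = w[r][j] - sum(x[r][k] * lu[k][j] for k in range(j + 1, n))
    return x


def invert(a):
    lu, perm = lu_factor(a)
    x = solve_with_lower(lu, invert_upper(lu))
    # Step 4: A^-1 = X P, i.e. apply the pivoting to the columns of X.
    n = len(a)
    inv = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for i in range(n):
            inv[r][perm[i]] = x[r][i]
    return inv
```

    In a tiled version, each of these loops would be broken into per-tile kernels whose dependencies form the DAG described in the abstract.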

  5. 3D-SoftChip: A Novel Architecture for Next-Generation Adaptive Computing Systems

    Directory of Open Access Journals (Sweden)

    Lee Mike Myung-Ok

    2006-01-01

    This paper introduces a novel architecture for next-generation adaptive computing systems, which we term 3D-SoftChip. The 3D-SoftChip is a three-dimensional (3D) vertically integrated adaptive computing system combining state-of-the-art processing and 3D interconnection technology. It comprises the vertical integration of two chips (a configurable array processor and an intelligent configurable switch) through an indium bump interconnection array (IBIA). The configurable array processor (CAP) is an array of heterogeneous processing elements (PEs), while the intelligent configurable switch (ICS) comprises a switch block, a 32-bit dedicated RISC processor for control, on-chip program/data memory, a data frame buffer, and a direct memory access (DMA) controller. This paper introduces the novel 3D-SoftChip architecture for real-time communication and multimedia signal processing as a next-generation computing system. The paper further describes the advanced HW/SW codesign and verification methodology, including high-level system modeling of the 3D-SoftChip using SystemC, used to determine the optimum hardware specification in the early design stage.

  6. Hybrid Cloud Computing Architecture Optimization by Total Cost of Ownership Criterion

    Directory of Open Access Journals (Sweden)

    Elena Valeryevna Makarenko

    2014-12-01

    Achieving the goals of information security is a key factor in the decision to outsource information technology and, in particular, in deciding to migrate organizational data, applications, and other resources to cloud-based infrastructure. The key issue in selecting the optimal architecture and subsequently migrating business applications and data to the organization's cloud information environment is the total cost of ownership of the IT infrastructure. This paper focuses on the problem of minimizing the total cost of ownership of a cloud.

  7. Efficient reconfigurable hardware architecture for accurately computing success probability and data complexity of linear attacks

    DEFF Research Database (Denmark)

    Bogdanov, Andrey; Kavun, Elif Bilge; Tischhauser, Elmar

    2012-01-01

    An accurate estimation of the success probability and data complexity of linear cryptanalysis is a fundamental question in symmetric cryptography. In this paper, we propose an efficient reconfigurable hardware architecture to compute the success probability and data complexity of Matsui's Algorithm...... block lengths ensures that any empirical observations are not due to differences in statistical behavior for artificially small block lengths. Rather surprisingly, we observed in previous experiments a significant deviation between the theory and practice for Matsui's Algorithm 2 for larger block sizes...
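    The "theory" that such experiments are compared against is usually a normal-approximation estimate. One commonly used form is Selçuk's formula for Matsui's Algorithm 2: P_S = Φ(2√N·|ε| − Φ⁻¹(1 − 2^(−a−1))). Here N is the number of known plaintext/ciphertext pairs, ε is the bias of the linear approximation, a is the advantage in bits, and Φ is the standard normal CDF. The sketch below evaluates that estimate and its inverse for data complexity. It illustrates the theoretical side only. It is not the paper's hardware architecture, and we do not assume it is the exact model the authors used.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution: cdf = Phi, inv_cdf = Phi^-1


def success_probability(n_pairs, bias, advantage_bits):
    """Normal-approximation estimate of P_S for Matsui's Algorithm 2.

    Valid for moderate advantage_bits; for a >= 52 the term 1 - 2^(-a-1)
    rounds to 1.0 in double precision and inv_cdf would fail.
    """
    threshold = _STD.inv_cdf(1.0 - 2.0 ** (-advantage_bits - 1))
    return _STD.cdf(2.0 * math.sqrt(n_pairs) * abs(bias) - threshold)


def data_complexity(target_ps, bias, advantage_bits):
    """Invert the estimate: pairs N needed to reach success probability target_ps."""
    z = _STD.inv_cdf(target_ps) + _STD.inv_cdf(1.0 - 2.0 ** (-advantage_bits - 1))
    return (z / (2.0 * abs(bias))) ** 2
```

    With zero bias, the estimate falls back to 2^(−a−1), the chance of ranking the right key high enough purely by luck. Deviations between such estimates and measured success rates for larger block sizes are what the abstract reports.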

  8. Concept of a computer network architecture for complete automation of nuclear power plants

    International Nuclear Information System (INIS)

    Edwards, R.M.; Ray, A.

    1990-01-01

    The state of the art in automation of nuclear power plants has been largely limited to computerized data acquisition, monitoring, display, and recording of process signals. Complete automation of nuclear power plants, which would include plant operations, control, and management, fault diagnosis, and system reconfiguration with efficient and reliable man/machine interactions, has been projected as a realistic goal. This paper presents the concept of a computer network architecture that would use a high-speed optical data highway to integrate diverse, interacting, and spatially distributed functions that are essential for a fully automated nuclear power plant.

  9. Computation studies into architecture and energy transfer properties of photosynthetic units from filamentous anoxygenic phototrophs

    Energy Technology Data Exchange (ETDEWEB)

    Linnanto, Juha Matti [Institute of Physics, University of Tartu, Riia 142, 51014 Tartu (Estonia); Freiberg, Arvi [Institute of Physics, University of Tartu, Riia 142, 51014 Tartu, Estonia and Institute of Molecular and Cell Biology, University of Tartu, Riia 23, 51010 Tartu (Estonia)

    2014-10-06

    We have used different computational methods to study the structural architecture, light-harvesting, and energy transfer properties of the photosynthetic unit of filamentous anoxygenic phototrophs. Due to the huge number of atoms in the photosynthetic unit, a combination of atomistic and coarse-grained methods was used for electronic structure calculations. The calculations reveal that the light energy absorbed by the peripheral chlorosome antenna complex is transferred efficiently via the baseplate and the core B808–866 antenna complexes to the reaction center complex, in general agreement with the present understanding of this complex system.

  10. Automated Improvement of Software Architecture Models for Performance and Other Quality Attributes

    OpenAIRE

    Koziolek, Anne

    2013-01-01

    Quality attributes, such as performance or reliability, are crucial for the success of a software system and are largely influenced by the software architecture. Their quantitative prediction supports systematic, goal-oriented software design and forms the basis of an engineering approach to software design. This thesis proposes a method and tool to automatically improve component-based software architecture (CBA) models based on such quantitative quality prediction techniques.

  11. High-performance computing in seismology

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1996-09-01

    The scientific, technical, and economic importance of the issues discussed here presents a clear agenda for future research in computational seismology. In this way these problems will drive advances in high-performance computing in the field of seismology. There is a broad community that will benefit from this work, including the petroleum industry, research geophysicists, engineers concerned with seismic hazard mitigation, and governments charged with enforcing a comprehensive test ban treaty. These advances may also lead to new applications for seismological research. The recent application of high-resolution seismic imaging of the shallow subsurface for the environmental remediation industry is an example of this activity. This report makes the following recommendations: (1) focused efforts to develop validated documented software for seismological computations should be supported, with special emphasis on scalable algorithms for parallel processors; (2) the education of seismologists in high-performance computing technologies and methodologies should be improved; (3) collaborations between seismologists and computational scientists and engineers should be increased; (4) the infrastructure for archiving, disseminating, and processing large volumes of seismological data should be improved.

  12. High performance computing in linear control

    International Nuclear Information System (INIS)

    Datta, B.N.

    1993-01-01

    Remarkable progress has been made in both the theory and applications of all important areas of control. The theory is rich and very sophisticated. Some beautiful applications of control theory are presently being made in aerospace, biomedical engineering, industrial engineering, robotics, economics, power systems, etc. Unfortunately, the same assessment of progress does not hold in general for computations in control theory; control theory is lagging behind other areas of science and engineering in this respect. Nowadays there is a revolution going on in the world of high-performance scientific computing. Many powerful computers with vector and parallel processing have been built and have become available in recent years. These supercomputers offer very high computational speed. Highly efficient software, based on powerful algorithms, has been developed for these advanced computers and has also contributed to increased performance. While workers in many areas of science and engineering have taken great advantage of these hardware and software developments, control scientists and engineers, unfortunately, have not been able to take much advantage of them.

  13. The Activity-Based Computing Project - A Software Architecture for Pervasive Computing Final Report

    DEFF Research Database (Denmark)

    Bardram, Jakob Eyvind

    . Special attention should be drawn to publication [25], which gives an overview of the ABC project to the IEEE Pervasive Computing community; the ACM CHI 2006 [19] paper that documents the implementation of the ABC technology; and the ACM ToCHI paper [12], which is the main publication of the project......, documenting all of the project’s four objectives. All of these publication venues are top-tier journals and conferences within computer science. From a business perspective, the project had the objective of incorporating relevant parts of the ABC technology into the products of Medical Insight, which has been...... done. Moreover, partly based on the research done in the ABC project, the company Cetrea A/S has been founded, which incorporates ABC concepts and technologies in its products. The concepts of activity-based computing have also been researched in cooperation with IBM Research, and the ABC project has...

  14. Transportable GPU (General Processor Units) chip set technology for standard computer architectures

    Science.gov (United States)

    Fosdick, R. E.; Denison, H. C.

    1982-11-01

    The USAFR-developed GPU Chip Set has been utilized by Tracor to implement both USAF and Navy Standard 16-Bit Airborne Computer Architectures. Both configurations are currently being delivered into DOD full-scale development programs. Leadless Hermetic Chip Carrier packaging has facilitated implementation of both architectures on single 4 1/2 x 5 substrates. The CMOS and CMOS/SOS implementations of the GPU Chip Set have allowed both CPU implementations to use less than 3 watts of power each. Recent efforts by Tracor for USAF have included the definition of a next-generation GPU Chip Set that will retain the application-proven architecture of the current chip set while offering the added cost advantages of transportability across ISO-CMOS and CMOS/SOS processes and across numerous semiconductor manufacturers using a newly-defined set of common design rules. The Enhanced GPU Chip Set will increase speed by an approximate factor of 3 while significantly reducing chip counts and costs of standard CPU implementations.

  15. Computational Architecture of a Robot Coach for Physical Exercises in Kinesthetic Rehabilitation

    OpenAIRE

    Nguyen , Sao Mai; Tanguy , Philippe; Rémy-Néris , Olivier

    2016-01-01

    The rising number of elderly people incurs growing concern about healthcare, and in particular rehabilitation healthcare. Assistive technology, and assistive robotics in particular, may help to improve this process. We develop a robot coach capable of demonstrating rehabilitation exercises to patients, watching a patient carry out the exercises, and giving him feedback so as to improve his performance and encourage him. We propose a general software architecture for our robot...

  16. HPCToolkit: performance tools for scientific computing

    Energy Technology Data Exchange (ETDEWEB)

    Tallent, N; Mellor-Crummey, J; Adhianto, L; Fagan, M; Krentel, M [Department of Computer Science, Rice University, Houston, TX 77005 (United States)

    2008-07-15

    As part of the U.S. Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program, science teams are tackling problems that require simulation and modeling on petascale computers. As part of activities associated with the SciDAC Center for Scalable Application Development Software (CScADS) and the Performance Engineering Research Institute (PERI), Rice University is building software tools for performance analysis of scientific applications on the leadership-class platforms. In this poster abstract, we briefly describe the HPCToolkit performance tools and how they can be used to pinpoint bottlenecks in SPMD and multi-threaded parallel codes. We demonstrate HPCToolkit's utility by applying it to two SciDAC applications: the S3D code for simulation of turbulent combustion and the MFDn code for ab initio calculations of microscopic structure of nuclei.

  17. HPCToolkit: performance tools for scientific computing

    International Nuclear Information System (INIS)

    Tallent, N; Mellor-Crummey, J; Adhianto, L; Fagan, M; Krentel, M

    2008-01-01

    As part of the U.S. Department of Energy's Scientific Discovery through Advanced Computing (SciDAC) program, science teams are tackling problems that require simulation and modeling on petascale computers. As part of activities associated with the SciDAC Center for Scalable Application Development Software (CScADS) and the Performance Engineering Research Institute (PERI), Rice University is building software tools for performance analysis of scientific applications on the leadership-class platforms. In this poster abstract, we briefly describe the HPCToolkit performance tools and how they can be used to pinpoint bottlenecks in SPMD and multi-threaded parallel codes. We demonstrate HPCToolkit's utility by applying it to two SciDAC applications: the S3D code for simulation of turbulent combustion and the MFDn code for ab initio calculations of microscopic structure of nuclei

  18. Performance evaluation of enterprise architecture with a formal fuzzy model (FPN)

    Directory of Open Access Journals (Sweden)

    Ashkan Marahel

    2012-10-01

    Preparing an enterprise architecture is a complicated procedure that uses a framework as the structural regularity and a style as the behavior director for controlling complexity. Because behavior takes precedence over structure in architecture, the architecture's performance needs to be evaluated in order to better diagnose one behavior relative to others. An enterprise architecture cannot be organized without the benefit of a logical structure, and a framework provides a logical structure for classifying architectural output. Among the common architectural frameworks, C4ISR is one of the most appropriate because of its production methodology, its level of aggregation capability, and minor revisions. The C4ISR framework describes the architecture in three views, using documents called products. Because there is always uncertainty in developing information systems, this paper uses a new version of UML called Fuzzy-UML, which includes both the structure and the behavior of the system. The proposed model also uses fuzzy Petri nets to analyze the developed system.
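    To make the fuzzy Petri net (FPN) step concrete, the sketch below uses one widely cited reasoning rule. A transition fires when the minimum truth value of its input places reaches the transition's threshold λ. Its output place then receives that minimum multiplied by the certainty factor μ. When several rules feed the same place, the larger value is kept. This formulation is an assumption made for illustration. The abstract does not say which FPN variant the paper uses, and the place names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FuzzyTransition:
    inputs: Tuple[str, ...]  # input places (antecedent propositions)
    output: str              # output place (consequent proposition)
    certainty: float         # certainty factor mu in [0, 1]
    threshold: float         # firing threshold lambda in [0, 1]


def propagate(marking: Dict[str, float], transitions, max_rounds=100):
    """Fire enabled transitions repeatedly until no token value increases."""
    marking = dict(marking)
    for _ in range(max_rounds):
        changed = False
        for t in transitions:
            if not all(place in marking for place in t.inputs):
                continue
            strength = min(marking[place] for place in t.inputs)  # fuzzy AND
            if strength < t.threshold:
                continue  # transition not enabled
            value = strength * t.certainty
            if value > marking.get(t.output, 0.0):  # fuzzy OR across rules
                marking[t.output] = value
                changed = True
        if not changed:
            break
    return marking
```

    For example, with a = 0.9, b = 0.7, and a rule (a AND b → c, μ = 0.8, λ = 0.5), c receives 0.56. A follow-on rule with λ = 0.6 would therefore not fire.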

  19. Performing three-dimensional neutral particle transport calculations on tera scale computers

    International Nuclear Information System (INIS)

    Woodward, C.S.; Brown, P.N.; Chang, B.; Dorr, M.R.; Hanebutte, U.R.

    1999-01-01

    A scalable, parallel code system to perform neutral particle transport calculations in three dimensions is presented. To utilize the hyper-cluster architecture of emerging terascale computers, the parallel code successfully combines the MPI message passing and paradigms. The code's capabilities are demonstrated by a shielding calculation containing over 14 billion unknowns. This calculation was accomplished on the IBM SP ASCI-Blue-Pacific computer located at Lawrence Livermore National Laboratory (LLNL).

  20. Analytical performance modeling for computer systems

    CERN Document Server

    Tay, Y C

    2013-01-01

    This book is an introduction to analytical performance modeling for computer systems, i.e., writing equations to describe their performance behavior. It is accessible to readers who have taken college-level courses in calculus and probability, networking, and operating systems. This is not a training manual for becoming an expert performance analyst. Rather, the objective is to help the reader construct simple models for analyzing and understanding the systems that they are interested in. Describing a complicated system abstractly with mathematical equations requires a careful choice of assumpti
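    As an example of "writing equations to describe performance behavior", the classic single-server M/M/1 queue gives closed-form steady-state metrics. It assumes Poisson arrivals at rate λ and exponential service at rate μ, with λ < μ. This is a standard textbook model chosen for illustration, not an excerpt from the book.

```python
def mm1_metrics(arrival_rate, service_rate):
    """Steady-state M/M/1 queue: Poisson arrivals, exponential service, one server."""
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("steady state requires 0 <= arrival_rate < service_rate")
    utilization = arrival_rate / service_rate            # rho = lambda / mu
    response_time = 1.0 / (service_rate - arrival_rate)  # mean time in system R
    mean_in_system = arrival_rate * response_time        # Little's law: N = lambda * R
    return utilization, response_time, mean_in_system
```

    For instance, at 8 requests/s against a capacity of 10 requests/s, utilization is 0.8, mean response time is 0.5 s, and on average 4 requests are in the system. This agrees with the closed form N = ρ/(1 − ρ).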

  1. Computer performance optimization systems, applications, processes

    CERN Document Server

    Osterhage, Wolfgang W

    2013-01-01

    Computing power performance was important at times when hardware was still expensive, because hardware had to be put to the best use. Later on this criterion was no longer critical, since hardware had become inexpensive. Meanwhile, however, people have realized that performance again plays a significant role, because of the major drain on system resources involved in developing complex applications. This book distinguishes between three levels of performance optimization: the system level, application level and business processes level. On each, optimizations can be achieved and cost-cutting p

  2. High Performance Computing in Science and Engineering '15 : Transactions of the High Performance Computing Center

    CERN Document Server

    Kröner, Dietmar; Resch, Michael

    2016-01-01

    This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS) in 2015. The reports cover all fields of computational science and engineering ranging from CFD to computational physics and from chemistry to computer science with a special emphasis on industrially relevant applications. Presenting findings of one of Europe’s leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance. The book covers the main methods in high-performance computing. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of color illustrations and tables of results.

  3. High Performance Computing in Science and Engineering '17 : Transactions of the High Performance Computing Center

    CERN Document Server

    Kröner, Dietmar; Resch, Michael; HLRS 2017

    2018-01-01

    This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS) in 2017. The reports cover all fields of computational science and engineering ranging from CFD to computational physics and from chemistry to computer science with a special emphasis on industrially relevant applications. Presenting findings of one of Europe’s leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance. The book covers the main methods in high-performance computing. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of color illustrations and tables of results.

  4. The impact of optimize solar radiation received on the levels and energy disposal of levels on architectural design result by using computer simulation

    Energy Technology Data Exchange (ETDEWEB)

    Rezaei, Davood; Farajzadeh Khosroshahi, Samaneh; Sadegh Falahat, Mohammad [Zanjan University (Iran, Islamic Republic of)], email: d_rezaei@znu.ac.ir, email: ronas_66@yahoo.com, email: Safalahat@yahoo.com

    2011-07-01

    In order to minimize the energy consumption of a building, it is important to optimize the solar energy it receives. The aim of this paper is to introduce the use of computer modeling in the early stages of design to optimize the solar radiation received and energy disposal in an architectural design. Computer modeling was performed on two different projects located in Los Angeles, USA, using ECOTECT software. Changes were made to the designs following analysis of the modeling results, and a subsequent analysis was carried out on the optimized designs. Results showed that computer simulation allows the designer to set the analysis criteria and improve the energy performance of a building before it is constructed; moreover, it can be used for a wide range of optimization levels. This study pointed out that computer simulation should be performed in the design stage to optimize a building's energy performance.

  5. Performance comparison of optical interference cancellation system architectures.

    Science.gov (United States)

    Lu, Maddie; Chang, Matt; Deng, Yanhua; Prucnal, Paul R

    2013-04-10

    The performance of three optics-based interference cancellation systems is compared and contrasted, both with each other and with traditional electronic techniques for interference cancellation. The comparison is based on a set of common performance metrics that we have developed for this purpose. It is shown that a thorough evaluation of our optical approaches must take into account the traditional notions of depth of cancellation and dynamic range, along with notions of link loss and uniformity of cancellation. Our evaluation shows that our use of optical components affords performance that surpasses traditional electronic approaches, and that the optimal choice of an optical interference canceller requires taking into account the performance metrics discussed in this paper.
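    As a generic illustration of the depth-of-cancellation metric, the sketch below uses a simple two-path model of our own choosing. The paper's own metric definitions are not given in the abstract. The model assumes the canceller subtracts a replica of the interferer with relative gain g and phase error φ, so the residual interference power fraction is |1 − g·e^(jφ)|². The depth of cancellation is that fraction expressed in dB.

```python
import cmath
import math


def cancellation_depth_db(gain_ratio, phase_error_rad):
    """Depth of cancellation for a replica with relative gain g and phase error phi."""
    residual = abs(1.0 - gain_ratio * cmath.exp(1j * phase_error_rad)) ** 2
    if residual == 0.0:
        return math.inf  # perfect amplitude and phase match: unbounded depth
    return -10.0 * math.log10(residual)
```

    Under this model, a 10% amplitude mismatch alone limits cancellation to about 20 dB, and a 0.01 rad phase error alone limits it to about 40 dB. This is why amplitude and phase matching across the band (uniformity of cancellation) matter as much as peak depth.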

  6. A price and performance comparison of three different storage architectures for data in cloud-based systems

    Science.gov (United States)

    Gallagher, J. H. R.; Jelenak, A.; Potter, N.; Fulker, D. W.; Habermann, T.

    2017-12-01

    Providing data services based on cloud computing technology that are equivalent to those developed for traditional computing and storage systems is critical for successful migration to cloud-based architectures for data production, scientific analysis, and storage. OPeNDAP Web-service capabilities (comprising the Data Access Protocol (DAP) specification plus open-source software for realizing DAP in servers and clients) are among the most widely deployed means of achieving data-as-a-service functionality in the Earth sciences. OPeNDAP services are especially common in traditional data center environments where servers offer access to datasets stored in (very large) file systems, and a preponderance of the source data for these services is stored in Hierarchical Data Format Version 5 (HDF5). Three candidate architectures for serving NASA satellite Earth Science HDF5 data via Hyrax running on Amazon Web Services (AWS) were developed, and their performance was examined for a set of representative use cases. Performance was assessed in terms of both runtime and incurred cost. The three architectures differ in how HDF5 files are stored in the Amazon Simple Storage Service (S3) and how the Hyrax server (running as an EC2 instance) retrieves their data. Results for both serial and parallel access to HDF5 data in S3 will be presented. While the study focused on HDF5 data, OPeNDAP, and the Hyrax data server, the architectures are generic, and the analysis can be extrapolated to many different data formats, web APIs, and data servers.
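    One way such storage architectures can differ is whether the server downloads whole HDF5 files from S3 or fetches only the byte ranges that hold the requested chunks. S3 GET requests accept an HTTP Range header. The sketch below assumes, hypothetically, that a chunk index of (offset, length) pairs is already available; how it would be obtained is not described in the abstract. It merges the requested chunks into inclusive `bytes=first-last` range values. The optional gap parameter trades a few wasted bytes for fewer requests, which affects both runtime and per-request cost.

```python
def coalesce_ranges(chunks, max_gap=0):
    """Merge (offset, length) chunk locations into HTTP Range header values.

    Chunks that touch, or are separated by at most max_gap bytes, are fetched together.
    """
    spans = sorted((off, off + length - 1) for off, length in chunks if length > 0)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + 1 + max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend the previous span
        else:
            merged.append([start, end])
    # HTTP byte ranges are inclusive on both ends: "bytes=first-last"
    return [f"bytes={s}-{e}" for s, e in merged]
```

    Each returned string would be sent as the Range header of its own GET request, so these requests could also be issued in parallel.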

  7. Simulation of Si:P spin-based quantum computer architecture

    International Nuclear Information System (INIS)

    Chang Yiachung; Fang Angbo

    2008-01-01

    We present realistic simulations of single and double phosphorus donors in a silicon-based quantum computer design, obtained by solving a valley-orbit coupled effective-mass equation describing phosphorus donors in a strained silicon quantum well (QW). Using a generalized unrestricted Hartree-Fock method, we solve the two-electron effective-mass equation with quantum well confinement and realistic gate potentials. The effects of QW width, gate voltages, donor separation, and donor position shift on the lowest singlet and triplet energies and their charge distributions for a neighboring donor pair in the quantum computer (QC) architecture are analyzed. The gate tunability is defined and evaluated for a typical QC design. Estimates are obtained for the duration of the spin half-swap gate operation.

  8. Integration of highly probabilistic sources into optical quantum architectures: perpetual quantum computation

    International Nuclear Information System (INIS)

    Devitt, Simon J; Stephens, Ashley M; Munro, William J; Nemoto, Kae

    2011-01-01

    In this paper, we introduce a design for an optical topological cluster state computer constructed exclusively from a single quantum component. Unlike previous efforts we eliminate the need for on demand, high fidelity photon sources and detectors and replace them with the same device utilized to create photon/photon entanglement. This introduces highly probabilistic elements into the optical architecture while maintaining complete specificity of the structure and operation for a large-scale computer. Photons in this system are continually recycled back into the preparation network, allowing for an arbitrarily deep three-dimensional cluster to be prepared using a comparatively small number of photonic qubits and consequently the elimination of high-frequency, deterministic photon sources.

  9. A methodology for performing computer security reviews

    International Nuclear Information System (INIS)

    Hunteman, W.J.

    1991-01-01

    DOE Order 5637.1, "Classified Computer Security," requires regular reviews of the computer security activities for an ADP system and for a site. Based on experiences gained in the Los Alamos computer security program through interactions with DOE facilities, we have developed a methodology to aid a site or security officer in performing a comprehensive computer security review. The methodology is designed to aid a reviewer in defining goals of the review (e.g., preparation for inspection), determining security requirements based on DOE policies, determining threats/vulnerabilities based on DOE and local threat guidance, and identifying critical system components to be reviewed. Application of the methodology will result in review procedures and checklists oriented to the review goals, the target system, and DOE policy requirements. The review methodology can be used to prepare for an audit or inspection and as a periodic self-check tool to determine the status of the computer security program for a site or specific ADP system.

  10. A methodology for performing computer security reviews

    International Nuclear Information System (INIS)

    Hunteman, W.J.

    1991-01-01

    This paper reports on DOE Order 5637.1, Classified Computer Security, which requires regular reviews of the computer security activities for an ADP system and for a site. Based on experiences gained in the Los Alamos computer security program through interactions with DOE facilities, the authors have developed a methodology to aid a site or security officer in performing a comprehensive computer security review. The methodology is designed to aid a reviewer in defining goals of the review (e.g., preparation for inspection), determining security requirements based on DOE policies, determining threats/vulnerabilities based on DOE and local threat guidance, and identifying critical system components to be reviewed. Application of the methodology will result in review procedures and checklists oriented to the review goals, the target system, and DOE policy requirements. The review methodology can be used to prepare for an audit or inspection and as a periodic self-check tool to determine the status of the computer security program for a site or specific ADP system.

  11. HIGH PERFORMANCE PHOTOGRAMMETRIC PROCESSING ON COMPUTER CLUSTERS

    Directory of Open Access Journals (Sweden)

    V. N. Adrov

    2012-07-01

    Most CPU-intensive tasks in photogrammetric processing can be done in parallel. The algorithms take independent pieces of data as input and produce independent pieces of data as output. This independence comes from the nature of such algorithms, since images, stereopairs, or small image blocks can be processed independently. Many photogrammetric algorithms are fully automatic and do not require human intervention. Photogrammetric workstations can perform tie point measurements, DTM calculations, orthophoto construction, mosaicking and many other service operations in parallel using distributed computation. Distributed computation saves time, reducing calculations that take several days to several hours. Modern trends in computer technology show an increase in the number of CPU cores in workstations and in local network speeds and, as a result, a drop in the price of supercomputers or computer clusters that can contain hundreds or even thousands of computing nodes. Common distributed processing in digital photogrammetric workstations (DPW) is usually targeted at interactive work with a limited number of CPU cores and is not optimized for centralized administration. The bottleneck of common distributed computing in photogrammetry can be limited LAN throughput and storage performance, since huge amounts of large raster images must be processed.
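    Because the argument above rests on image blocks being processed independently, a minimal Python sketch of the split, process, and reassemble pattern may help. It assumes a 2-D image held as a list of rows and uses a hypothetical block-local operation (`process_block`, a simple pixel inversion) in place of any real photogrammetric step; all names are illustrative and not taken from the paper. A thread pool keeps the example portable; for CPU-bound pure-Python work, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Image = List[List[int]]
Block = Tuple[int, int, Image]  # (row offset, column offset, pixel rows)


def split_into_blocks(image: Image, block_size: int) -> List[Block]:
    """Cut the image into non-overlapping blocks of at most block_size x block_size."""
    blocks: List[Block] = []
    for r in range(0, len(image), block_size):
        for c in range(0, len(image[0]), block_size):
            tile = [row[c:c + block_size] for row in image[r:r + block_size]]
            blocks.append((r, c, tile))
    return blocks


def process_block(block: Block) -> Block:
    """Hypothetical block-local step: invert 8-bit pixel values."""
    r, c, tile = block
    return r, c, [[255 - px for px in row] for row in tile]


def process_image_parallel(image: Image, block_size: int = 2, workers: int = 4) -> Image:
    """Process independent blocks concurrently, then write each result back in place."""
    result: Image = [[0] * len(image[0]) for _ in range(len(image))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Blocks share no data, so the pool may process them in any order.
        for r, c, tile in pool.map(process_block, split_into_blocks(image, block_size)):
            for i, row in enumerate(tile):
                result[r + i][c:c + len(row)] = row
    return result


if __name__ == "__main__":
    img = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
    print(process_image_parallel(img))  # [[255, 245, 235], [225, 215, 205], [195, 185, 175]]
```

    Operations that need neighbouring pixels (local filters, image matching) would require overlapping blocks, which this sketch deliberately omits.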

  12. Evaluation of existing and proposed computer architectures for future ground-based systems

    Science.gov (United States)

    Schulbach, C.

    1985-01-01

    Parallel processing architectures and techniques used in current supercomputers are described, and projections are made of future advances. Presently, the von Neumann sequential processing pattern has been accelerated by separate I/O processors, interleaved memories, wide memories, independent functional units and pipelining. Recent supercomputers have featured single-instruction, multiple-data-stream architectures, which have different processors for performing various operations (vector or pipeline processors). Multiple-instruction, multiple-data-stream machines have also been developed. Data flow techniques, wherein program instructions are activated only when data are available, are expected to play a large role in future supercomputers, along with larger parallel processor arrays. The enhanced operational speeds are essential for adequately treating data from future spacecraft remote sensing instruments such as the Thematic Mapper.
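    The data flow firing rule described above can be illustrated with a toy interpreter: a node executes only when all of its operands exist, and nodes that become ready together form a "wave" with no mutual data dependence, so they could in principle run in parallel. This is a hypothetical Python sketch of the concept only; it does not model any machine discussed in the report, and `run_dataflow` and the graph format are invented for illustration.

```python
import operator
from typing import Callable, Dict, List, Tuple

# node name -> (operation, names of the nodes whose results it consumes)
Graph = Dict[str, Tuple[Callable[..., float], List[str]]]


def run_dataflow(graph: Graph, inputs: Dict[str, float]) -> Tuple[Dict[str, float], List[List[str]]]:
    """Fire each node only when all of its operands are available.

    Returns the computed values and the sequence of waves; nodes in the same
    wave do not depend on one another and could execute in parallel.
    """
    values: Dict[str, float] = dict(inputs)
    pending = set(graph) - set(values)
    waves: List[List[str]] = []
    while pending:
        # A node is ready once every operand it consumes has been produced.
        ready = sorted(n for n in pending if all(src in values for src in graph[n][1]))
        if not ready:
            raise ValueError("no node can fire: an operand is missing or the graph has a cycle")
        for node in ready:
            op, srcs = graph[node]
            values[node] = op(*(values[s] for s in srcs))
        pending -= set(ready)
        waves.append(ready)
    return values, waves


if __name__ == "__main__":
    # (a + b) * (c - d): "sum" and "diff" can fire together; "prod" must wait for both.
    g: Graph = {
        "sum": (operator.add, ["a", "b"]),
        "diff": (operator.sub, ["c", "d"]),
        "prod": (operator.mul, ["sum", "diff"]),
    }
    vals, order = run_dataflow(g, {"a": 2, "b": 3, "c": 7, "d": 4})
    print(vals["prod"], order)  # 15 [['diff', 'sum'], ['prod']]
```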

  13. Monitoring SLAC High Performance UNIX Computing Systems

    International Nuclear Information System (INIS)

    Lettsome, Annette K.

    2005-01-01

    Knowledge of the effectiveness and efficiency of computers is important when working with high performance systems. Monitoring such systems is advantageous in order to foresee possible problems or system failures. Ganglia is a software system designed for high performance computing systems to retrieve specific monitoring information. An alternative storage facility for Ganglia's collected data is needed, since its default storage system, the round-robin database (RRD), struggles with data integrity. The creation of a script-driven MySQL database solves this dilemma. This paper describes the process of creating and implementing the MySQL database for use by Ganglia. Comparisons between data storage by both databases are made using gnuplot and Ganglia's real-time graphical user interface.

  14. A hybrid optical switch architecture to integrate IP into optical networks to provide flexible and intelligent bandwidth on demand for cloud computing

    Science.gov (United States)

    Yang, Wei; Hall, Trevor J.

    2013-12-01

    The Internet is entering an era of cloud computing to provide more cost effective, eco-friendly and reliable services to consumer and business users. As a consequence, the nature of Internet traffic has been fundamentally transformed from a purely packet-based pattern to today's predominantly flow-based pattern. Cloud computing has also brought about unprecedented growth in Internet traffic. In this paper, a hybrid optical switch architecture is presented to deal with flow-based Internet traffic, aiming to offer flexible and intelligent bandwidth on demand to improve fiber capacity utilization. The hybrid optical switch is capable of integrating IP into optical networks for cloud-based traffic with predictable performance, for which the delay performance of the electronic module in the hybrid optical switch architecture is evaluated through simulation.

  15. Cloud Computing for Maintenance Performance Improvement

    OpenAIRE

    Kour, Ravdeep; Karim, Ramin; Parida, Aditya

    2013-01-01

    Cloud computing is an emerging research area. It can be used to achieve effective and efficient information logistics. This paper uses cloud-based technology to establish information logistics for a railway system, which requires information based on data from different data sources (e.g. railway maintenance, railway operation, and railway business data). In order to improve the performance of the maintenance process, relevant data from various sources need to be acquired, f...

  16. High Performance Computing Operations Review Report

    Energy Technology Data Exchange (ETDEWEB)

    Cupps, Kimberly C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)]

    2013-12-19

    The High Performance Computing Operations Review (HPCOR) meeting—requested by the ASC and ASCR program headquarters at DOE—was held November 5 and 6, 2013, at the Marriott Hotel in San Francisco, CA. The purpose of the review was to discuss the processes and practices for HPC integration and its related software and facilities. Experiences and lessons learned from the most recent systems deployed were covered in order to benefit the deployment of new systems.

  17. Discussing performance management architecture in public service broadcasting

    DEFF Research Database (Denmark)

    Tambo, Torben; Gabel, Ole Dahl

    2014-01-01

    (DR) as case. Design/methodology/approach: Qualitative, case-based, inspired by information systems research using ontologies of organisational performance governance frameworks. Findings: A closer connection between corporate activities, metrics and the technologies defining and underpinning...... and a disproportional “market share” against commercial actors. This is both interesting to research but also issues limits on the conclusions given the uniqueness. Practical implications: Ambiguity, bad connectedness, and lack of consensus of measurement of organisational performance can tentatively have a negative...

  18. Analysis of mobile fronthaul bandwidth and wireless transmission performance in split-PHY processing architecture.

    Science.gov (United States)

    Miyamoto, Kenji; Kuwano, Shigeru; Terada, Jun; Otaka, Akihiro

    2016-01-25

    We analyze the mobile fronthaul (MFH) bandwidth and the wireless transmission performance in the split-PHY processing (SPP) architecture, which redefines the functional split of centralized/cloud RAN (C-RAN) while preserving high wireless coordinated multi-point (CoMP) transmission/reception performance. The SPP architecture splits the base station (BS) functions between wireless channel coding/decoding and wireless modulation/demodulation, and employs its own CoMP joint transmission and reception schemes. Simulation results show that the SPP architecture reduces the MFH bandwidth by up to 97% from conventional C-RAN while matching the wireless bit error rate (BER) performance of conventional C-RAN in uplink joint reception with only a 2-dB signal-to-noise ratio (SNR) penalty.

  19. Computational Modeling of Human Multiple-Task Performance

    National Research Council Canada - National Science Library

    Kieras, David E; Meyer, David

    2005-01-01

    This is the final report for a project that was a continuation of an earlier, long-term project on the development and validation of the EPIC cognitive architecture for modeling human cognition and performance...

  20. NASA Human Health and Performance Information Architecture Panel

    Science.gov (United States)

    Johnson-Throop, Kathy; Kadwa, Binafer; VanBaalen, Mary

    2014-01-01

    The Human Health and Performance (HH&P) Directorate at NASA's Johnson Space Center has a mission to enable optimization of human health and performance throughout all phases of spaceflight. All HH&P functions are ultimately aimed at achieving this mission. Our activities enable mission success, optimizing human health and productivity in space before, during, and after the actual spaceflight experience of our crews, and include support for ground-based functions. Many of our spaceflight innovations also provide solutions for terrestrial challenges, thereby enhancing life on Earth.

  1. Selecting an Architecture for a Safety-Critical Distributed Computer System with Power, Weight and Cost Considerations

    Science.gov (United States)

    Torres-Pomales, Wilfredo

    2014-01-01

    This report presents an example of the application of multi-criteria decision analysis to the selection of an architecture for a safety-critical distributed computer system. The design problem includes constraints on minimum system availability and integrity, and the decision is based on the optimal balance of power, weight and cost. The analysis process includes the generation of alternative architectures, evaluation of individual decision criteria, and the selection of an alternative based on overall value. In the example presented here, iterative application of the quantitative evaluation process made it possible to deliberately generate an alternative architecture that is superior to all others regardless of the relative importance of cost.
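    One way to read the closing observation, an alternative that is superior "regardless of the relative importance of cost", is in terms of Pareto dominance: if one alternative is no worse on every criterion and strictly better on at least one, it never scores worse under any nonnegative weighted sum. The Python sketch below assumes a simple weighted-sum aggregation over normalised, lower-is-better scores; the report does not state that it uses this exact scheme, and the alternatives and numbers are hypothetical.

```python
from typing import Dict, List

# Criterion scores per alternative; lower is better for every criterion
# (e.g. normalised power, weight and cost).
Scores = Dict[str, float]


def dominates(a: Scores, b: Scores, criteria: List[str]) -> bool:
    """Pareto dominance for minimisation: a is no worse than b everywhere, better somewhere."""
    return all(a[k] <= b[k] for k in criteria) and any(a[k] < b[k] for k in criteria)


def weighted_score(alt: Scores, weights: Dict[str, float]) -> float:
    """Weighted-sum aggregate over the weighted criteria; lower is better."""
    return sum(w * alt[k] for k, w in weights.items())


def best_alternative(alts: Dict[str, Scores], weights: Dict[str, float]) -> str:
    """Name of the alternative with the lowest weighted-sum score."""
    return min(alts, key=lambda name: weighted_score(alts[name], weights))


# Hypothetical normalised scores, for illustration only (not data from the report).
ALTERNATIVES: Dict[str, Scores] = {
    "A": {"power": 0.6, "weight": 0.5, "cost": 0.4},
    "B": {"power": 0.4, "weight": 0.7, "cost": 0.5},
    "C": {"power": 0.3, "weight": 0.4, "cost": 0.4},
}
CRITERIA = ["power", "weight", "cost"]

if __name__ == "__main__":
    # "C" dominates both "A" and "B", so it wins however heavily cost is weighted.
    for cost_weight in (0.1, 1.0, 10.0):
        w = {"power": 1.0, "weight": 1.0, "cost": cost_weight}
        print(cost_weight, best_alternative(ALTERNATIVES, w))  # "C" every time
```

    Dominance is the stronger, weight-free property; the weighted sum is only one of several aggregation rules a multi-criteria analysis might use.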

  2. Ontology Design for Solving Computationally-Intensive Problems on Heterogeneous Architectures

    Directory of Open Access Journals (Sweden)

    Hossam M. Faheem

    2018-02-01

    Viewing a computationally-intensive problem as a self-contained challenge with its own hardware, software and scheduling strategies is an approach that should be investigated. We might suggest assigning heterogeneous hardware architectures to solve a problem, while parallel computing paradigms may play an important role in writing efficient code to solve it; moreover, scheduling strategies may be examined as a possible solution. Depending on the problem complexity, finding the best possible solution using an integrated infrastructure of hardware, software and scheduling strategy can be a complex job. Developing and using ontologies and reasoning techniques plays a significant role in reducing the complexity of identifying the components of such integrated infrastructures. Reasoning and inferencing over the domain concepts can help find the best possible solution through a combination of hardware, software and scheduling strategies. In this paper, we present an ontology and show how it can be used to solve computationally-intensive problems from various domains. As a potential use of the idea, we present examples from the bioinformatics domain. Validation using problems from the Elastic Optical Network domain has demonstrated the flexibility of the suggested ontology and its suitability for use with any other computationally-intensive problem domain.

  3. Scalable optical packet switch architecture for low latency and high load computer communication networks

    NARCIS (Netherlands)

    Calabretta, N.; Di Lucente, S.; Nazarathy, Y.; Raz, O.; Dorren, H.J.S.

    2011-01-01

    High-performance computers and data centers require PetaFlop/s processing speed and petabyte storage capacity, with thousands of low-latency short-link interconnections between computer nodes. Switch matrices that operate transparently in the optical domain are a potential way to efficiently

  4. NINJA: Java for High Performance Numerical Computing

    Directory of Open Access Journals (Sweden)

    José E. Moreira

    2002-01-01

    When Java was first introduced, there was a perception that its many benefits came at a significant performance cost. In the particularly performance-sensitive field of numerical computing, initial measurements indicated a hundred-fold performance disadvantage of Java relative to more established languages such as Fortran and C. Although much progress has been made, and Java now can be competitive with C/C++ in many important situations, significant performance challenges remain. Existing Java virtual machines are not yet capable of performing the advanced loop transformations and automatic parallelization that are now common in state-of-the-art Fortran compilers. Java also has difficulties in implementing complex arithmetic efficiently. These performance deficiencies can be attacked with a combination of class libraries (packages, in Java) that implement truly multidimensional arrays and complex numbers, and new compiler techniques that exploit the properties of these class libraries to enable other, more conventional, optimizations. Two compiler techniques, versioning and semantic expansion, can be leveraged to allow fully automatic optimization and parallelization of Java code. Our measurements with the NINJA prototype Java environment show that Java can be competitive in performance with highly optimized and tuned Fortran code.

  5. RISC Processors and High Performance Computing

    Science.gov (United States)

    Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.

  6. Sprinting performance on the Woodway Curve 3.0 is related to muscle architecture.

    Science.gov (United States)

    Mangine, Gerald T; Fukuda, David H; Townsend, Jeremy R; Wells, Adam J; Gonzalez, Adam M; Jajtner, Adam R; Bohner, Jonathan D; LaMonica, Michael; Hoffman, Jay R; Fragala, Maren S; Stout, Jeffrey R

    2015-01-01

    To determine if unilateral measures of muscle architecture in the rectus femoris (RF) and vastus lateralis (VL) were related to (and predictive of) sprinting speed and unilateral (and bilateral) force (FRC) and power (POW) during a 30 s maximal sprint on the Woodway Curve 3.0 non-motorized treadmill (TM). Twenty-eight healthy, physically active men (n = 14) and women (n = 14) (age = 22.9 ± 2.4 years; body mass = 77.1 ± 16.2 kg; height = 171.6 ± 11.2 cm; body fat = 19.4 ± 8.1%) completed one familiarization and one 30-s maximal sprint on the TM to obtain maximal sprinting speed, POW and FRC. Muscle thickness (MT), cross-sectional area (CSA) and echo intensity (ECHO) of the RF and VL in the dominant (DOM; determined by unilateral sprinting power) and non-dominant (ND) legs were measured via ultrasound. Pearson correlations indicated several significant (p architecture. Stepwise regression indicated that POW(DOM) was predictive of ipsilateral RF (MT and CSA) and VL (CSA and ECHO), while POW(ND) was predictive of ipsilateral RF (MT and CSA) and VL (CSA); sprinting power/force asymmetry was not predictive of architecture asymmetry. Sprinting time was best predicted by peak power and peak force, though muscle quality (ECHO) and the bilateral percent difference in VL (CSA) were strong architectural predictors. Muscle architecture is related to (and predictive of) TM sprinting performance, while unilateral POW is predictive of ipsilateral architecture. However, the extent to which architecture and other factors (i.e. neuromuscular control and sprinting technique) affect TM performance remains unknown.

  7. Designing block copolymer architectures for targeted membrane performance

    KAUST Repository

    Dorin, Rachel Mika

    2014-01-01

    Using a combination of block copolymer self-assembly and non-solvent induced phase separation, isoporous ultrafiltration membranes were fabricated from four poly(isoprene-b-styrene-b-4-vinylpyridine) triblock terpolymers with similar block volume fractions but varying in total molar mass from 43 kg/mol to 115 kg/mol to systematically study the effect of polymer size on membrane structure. Small-angle X-ray scattering was used to probe terpolymer solution structure in the dope. All four triblocks displayed solution scattering patterns consistent with a body-centered cubic morphology. After membrane formation, structures were characterized using a combination of scanning electron microscopy and filtration performance tests. Membrane pore densities that ranged from 4.53 × 10¹⁴ to 1.48 × 10¹⁵ pores/m² were observed, which are the highest pore densities yet reported for membranes using self-assembly and non-solvent induced phase separation. Hydraulic permeabilities ranging from 24 to 850 L m⁻² h⁻¹ bar⁻¹ and pore diameters ranging from 7 to 36 nm were determined from permeation and rejection experiments. Both the hydraulic permeability and pore size increased with increasing molar mass of the parent terpolymer. The combination of polymer characterization and membrane transport tests described here demonstrates the ability to rationally design macromolecular structures to target specific performance characteristics in block copolymer derived ultrafiltration membranes. © 2013 Elsevier Ltd. All rights reserved.

  8. Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

    OpenAIRE

    Liu, Xu; Chen, Langshi; Firoz, Jesun S.; Qiu, Judy; Jiang, Lei

    2017-01-01

    Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique for analyzing connected data. In this paper, we empirically evaluate various computing platforms, including an Intel Xeon E5 CPU, an NVIDIA GeForce GTX 1070 GPU and a Xeon Phi 7210 processor codenamed Knights Landing (KNL), in the domain of parallel graph processing. We show that the KNL gains encouraging per...

  9. Computer fan performance enhancement via acoustic perturbations

    Energy Technology Data Exchange (ETDEWEB)

    Greenblatt, David, E-mail: davidg@technion.ac.il [Faculty of Mechanical Engineering, Technion - Israel Institute of Technology, Haifa (Israel); Avraham, Tzahi; Golan, Maayan [Faculty of Mechanical Engineering, Technion - Israel Institute of Technology, Haifa (Israel)

    2012-04-15

    Highlights: ► Computer fan effectiveness was increased by introducing acoustic perturbations. ► Acoustic perturbations controlled blade boundary layer separation. ► Optimum frequencies corresponded with airfoil studies. ► Exploitation of flow instabilities was responsible for performance improvements. ► Peak pressure and peak flowrate were increased by 40% and 15% respectively. - Abstract: A novel technique for increasing computer fan effectiveness, based on introducing acoustic perturbations onto the fan blades to control boundary layer separation, was assessed. Experiments were conducted in a specially designed facility that simultaneously allowed characterization of fan performance and introduction of the perturbations. A parametric study was conducted to determine the optimum control parameters, namely those that deliver the largest increase in fan pressure for a given flowrate. The optimum reduced frequencies corresponded with those identified on stationary airfoils and it was thus concluded that the exploitation of Kelvin–Helmholtz instabilities, commonly observed on airfoils, was responsible for the fan blade performance improvements. The optimum control inputs, such as acoustic frequency and sound pressure level, showed some variation with different fan flowrates. With the near-optimum control conditions identified, the full operational envelope of the fan, when subjected to acoustic perturbations, was assessed. The peak pressure and peak flowrate were increased by up to 40% and 15% respectively. The peak fan efficiency increased with acoustic perturbations but the overall system efficiency was reduced when the speaker input power was accounted for.

  10. Computer fan performance enhancement via acoustic perturbations

    International Nuclear Information System (INIS)

    Greenblatt, David; Avraham, Tzahi; Golan, Maayan

    2012-01-01

    Highlights: ► Computer fan effectiveness was increased by introducing acoustic perturbations. ► Acoustic perturbations controlled blade boundary layer separation. ► Optimum frequencies corresponded with airfoil studies. ► Exploitation of flow instabilities was responsible for performance improvements. ► Peak pressure and peak flowrate were increased by 40% and 15% respectively. - Abstract: A novel technique for increasing computer fan effectiveness, based on introducing acoustic perturbations onto the fan blades to control boundary layer separation, was assessed. Experiments were conducted in a specially designed facility that simultaneously allowed characterization of fan performance and introduction of the perturbations. A parametric study was conducted to determine the optimum control parameters, namely those that deliver the largest increase in fan pressure for a given flowrate. The optimum reduced frequencies corresponded with those identified on stationary airfoils and it was thus concluded that the exploitation of Kelvin–Helmholtz instabilities, commonly observed on airfoils, was responsible for the fan blade performance improvements. The optimum control inputs, such as acoustic frequency and sound pressure level, showed some variation with different fan flowrates. With the near-optimum control conditions identified, the full operational envelope of the fan, when subjected to acoustic perturbations, was assessed. The peak pressure and peak flowrate were increased by up to 40% and 15% respectively. The peak fan efficiency increased with acoustic perturbations but the overall system efficiency was reduced when the speaker input power was accounted for.

  11. Precision Agriculture Design Method Using a Distributed Computing Architecture on Internet of Things Context

    Directory of Open Access Journals (Sweden)

    Francisco Javier Ferrández-Pastor

    2018-05-01

    The Internet of Things (IoT) has opened productive ways to cultivate soil with the use of low-cost hardware (sensors/actuators) and communication (Internet) technologies. Remote equipment and crop monitoring, predictive analytics, weather forecasting for crops, and smart logistics and warehousing are some examples of these new opportunities. Nevertheless, farmers are agriculture experts but usually do not have experience with IoT applications. Users of IoT applications must participate in their design, improving integration and use. In this work, different industrial agricultural facilities are analysed with farmers and growers to design new functionalities based on the deployment of IoT paradigms. A user-centred design model is used to obtain knowledge and experience in the process of introducing technology into agricultural applications. Internet of Things paradigms are used as resources to facilitate decision making. IoT architecture, operating rules and smart processes are implemented using a distributed model based on edge and fog computing paradigms. A communication architecture is proposed using these technologies. The aim is to help farmers develop smart systems in both current and new facilities. Different decision trees to automate the installation, designed by the farmer, can be easily deployed using the method proposed in this document.

  12. A Trusted Computing Architecture of Embedded System Based on Improved TPM

    Directory of Open Access Journals (Sweden)

    Wang Xiaosheng

    2017-01-01

    The Trusted Platform Module (TPM) currently used by PCs is not suitable for embedded systems, so the existing TPM needs to be improved. The paper proposes a trusted computing architecture for embedded systems built on a new TPM and the cryptographic system developed by China. The improved TPM consists of the Embedded System Trusted Cryptography Module (eTCM) and the Embedded System Trusted Platform Control Module (eTPCM), which together implement the TPM's autonomous control, active defense, high-speed encryption/decryption and other functions through its internal bus arbitration module and symmetric and asymmetric cryptographic engines, effectively protecting the security of the embedded system. In our improved TPM, a trusted measurement method using both a chain model and a star-type model is adopted. Finally, the improved TPM is implemented on an FPGA and applied to a trusted PDA for experimental verification. Experiments show that the trusted architecture of the embedded system based on the improved TPM is efficient, reliable and secure.

  13. Every Second Counts: Integrating Edge Computing and Service Oriented Architecture for Automatic Emergency Management

    Directory of Open Access Journals (Sweden)

    Lei Chen

    2018-01-01

    Emergency management has long been recognized as a social challenge due to the criticality of response time. In emergency situations such as severe traffic accidents, minimizing the response time, which requires close collaboration between all stakeholders involved and distributed intelligence support, increases the survival chances of the injured. However, the current response system is far from efficient, despite the rapid development of information and communication technologies. This paper presents an automated collaboration framework for emergency management that coordinates all stakeholders within the emergency response system and fully automates the rescue process. Applying the concept of multi-access edge computing architecture, as well as choreography of the service-oriented architecture, the system allows seamless coordination between multiple organizations in a distributed way through standard web services. A service choreography is designed to globally model the emergency management process from the time an accident occurs until the rescue is finished. The choreography can be synthesized to generate a detailed specification of peer-to-peer interaction logic, and the specification can then be enacted and deployed on cloud infrastructures.

  14. Precision Agriculture Design Method Using a Distributed Computing Architecture on Internet of Things Context.

    Science.gov (United States)

    Ferrández-Pastor, Francisco Javier; García-Chamizo, Juan Manuel; Nieto-Hidalgo, Mario; Mora-Martínez, José

    2018-05-28

    The Internet of Things (IoT) has opened productive ways to cultivate soil with the use of low-cost hardware (sensors/actuators) and communication (Internet) technologies. Remote equipment and crop monitoring, predictive analytics, weather forecasting for crops, and smart logistics and warehousing are some examples of these new opportunities. Nevertheless, farmers are agriculture experts but usually do not have experience with IoT applications. Users of IoT applications must participate in their design, improving integration and use. In this work, different industrial agricultural facilities are analysed with farmers and growers to design new functionalities based on the deployment of IoT paradigms. A user-centred design model is used to obtain knowledge and experience in the process of introducing technology into agricultural applications. Internet of Things paradigms are used as resources to facilitate decision making. IoT architecture, operating rules and smart processes are implemented using a distributed model based on edge and fog computing paradigms. A communication architecture is proposed using these technologies. The aim is to help farmers develop smart systems in both current and new facilities. Different decision trees to automate the installation, designed by the farmer, can be easily deployed using the method proposed in this document.

  15. High performance computations using dynamical nucleation theory

    International Nuclear Information System (INIS)

    Windus, T L; Crosby, L D; Kathmann, S M

    2008-01-01

    Chemists continue to explore the use of very large computations to perform simulations that describe the molecular-level physics of critical challenges in science. In this paper, we describe the Dynamical Nucleation Theory Monte Carlo (DNTMC) model, a model for determining molecular-scale nucleation rate constants, and its parallel capabilities. The potential bottlenecks and the challenges of running on future petascale or larger resources are delineated. A 'master-slave' solution is proposed to scale to the petascale and will be developed in the NWChem software. In addition, mathematical and data analysis challenges are described.
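    The master-worker (master-slave) pattern proposed above can be sketched generically: a master hands out independent Monte Carlo tasks, each with its own random seed, and aggregates the results. In the Python sketch below, the task is a toy pi estimate standing in for a DNTMC sampling step; it is not the NWChem implementation. Threads are used only for simplicity, and a real petascale run would use separate processes, for example MPI ranks.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def worker_task(seed, samples):
    """Toy Monte Carlo task: count random points inside the unit quarter-circle."""
    rng = random.Random(seed)  # independent stream per task, so results are reproducible
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def master(n_tasks=8, samples_per_task=20_000, n_workers=4):
    """Distribute independent tasks to a worker pool and combine their results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(worker_task, seed, samples_per_task) for seed in range(n_tasks)]
        total_hits = sum(f.result() for f in futures)
    return 4.0 * total_hits / (n_tasks * samples_per_task)
```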

  16. A Development Architecture for Serious Games Using BCI (Brain Computer Interface) Sensors

    Directory of Open Access Journals (Sweden)

    Kyhyun Um

    2012-11-01

    Games that use brainwaves via brain–computer interface (BCI) devices to improve brain functions are known as BCI serious games. Due to the difficulty of developing BCI serious games, various BCI engines and authoring tools are required, and these reduce the development time and cost. However, it is desirable to reduce the amount of technical knowledge of brain functions and BCI devices needed by game developers. Moreover, a systematic BCI serious game development process is required. In this paper, we present a methodology for the development of BCI serious games. We describe an architecture, authoring tools, and development process of the proposed methodology, and apply it to a game development approach for patients with mild cognitive impairment as an example. This application demonstrates that BCI serious games can be developed on the basis of expert-verified theories.

  17. A development architecture for serious games using BCI (brain computer interface) sensors.

    Science.gov (United States)

    Sung, Yunsick; Cho, Kyungeun; Um, Kyhyun

    2012-11-12

    Games that use brainwaves via brain-computer interface (BCI) devices to improve brain functions are known as BCI serious games. Due to the difficulty of developing BCI serious games, various BCI engines and authoring tools are required, and these reduce the development time and cost. However, it is desirable to reduce the amount of technical knowledge of brain functions and BCI devices needed by game developers. Moreover, a systematic BCI serious game development process is required. In this paper, we present a methodology for the development of BCI serious games. We describe an architecture, authoring tools, and development process of the proposed methodology, and apply it to a game development approach for patients with mild cognitive impairment as an example. This application demonstrates that BCI serious games can be developed on the basis of expert-verified theories.

  18. A Development Architecture for Serious Games Using BCI (Brain Computer Interface) Sensors

    Science.gov (United States)

    Sung, Yunsick; Cho, Kyungeun; Um, Kyhyun

    2012-01-01

    Games that use brainwaves via brain–computer interface (BCI) devices to improve brain functions are known as BCI serious games. Due to the difficulty of developing BCI serious games, various BCI engines and authoring tools are required, and these reduce the development time and cost. However, it is desirable to reduce the amount of technical knowledge of brain functions and BCI devices needed by game developers. Moreover, a systematic BCI serious game development process is required. In this paper, we present a methodology for the development of BCI serious games. We describe an architecture, authoring tools, and development process of the proposed methodology, and apply it to a game development approach for patients with mild cognitive impairment as an example. This application demonstrates that BCI serious games can be developed on the basis of expert-verified theories. PMID:23202227

  19. Analysis of parallel computing performance of the code MCNP

    International Nuclear Information System (INIS)

    Wang Lei; Wang Kan; Yu Ganglin

    2006-01-01

    Parallel computing can effectively reduce the running time of the code MCNP. Using the MPI message-passing software, MCNP5 can run in parallel on a PC cluster with the Windows operating system. The parallel computing performance of MCNP is influenced by factors such as the type, the complexity level and the parameter configuration of the computing problem. This paper analyzes the parallel computing performance of MCNP with respect to these factors and gives measures to improve it. (authors)
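    The abstract gives no figures, but a common first model for how the serial part of a run limits the gain from adding cluster nodes is Amdahl's law, S(N) = 1 / ((1 - p) + p/N), where p is the parallelizable fraction. The Python helpers below treat p as an assumed input, not a measured MCNP value. Real MCNP scaling also depends on the communication and problem-configuration factors the paper analyzes.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Ideal speedup on n_procs when only `parallel_fraction` of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0 or n_procs < 1:
        raise ValueError("need 0 <= parallel_fraction <= 1 and n_procs >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_procs)


def parallel_efficiency(parallel_fraction, n_procs):
    """Speedup divided by the number of processes (1.0 means perfect scaling)."""
    return amdahl_speedup(parallel_fraction, n_procs) / n_procs
```

    For example, with p = 0.9, ten processes give a speedup of about 5.26 and an efficiency of about 53%.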

  20. A Compute Capable SSD Architecture for Next-Generation Non-volatile Memories

    Energy Technology Data Exchange (ETDEWEB)

    De, Arup [Univ. of California, San Diego, CA (United States)]

    2014-01-01

    Existing storage technologies (e.g., disks and flash) are failing to keep up with processor and main memory speeds and are limiting the overall performance of many large-scale I/O- or data-intensive applications. Emerging fast byte-addressable non-volatile memory (NVM) technologies, such as phase-change memory (PCM), spin-transfer torque memory (STTM) and memristors, are very promising and are approaching DRAM-like performance with lower power consumption and higher density as process technology scales. These new memories are narrowing the performance gap between storage and main memory and are posing challenging problems for existing SSD architectures, I/O interfaces (e.g., SATA, PCIe) and software. This dissertation addresses those challenges and presents a novel SSD architecture called XSSD. XSSD offloads computation to storage to exploit fast NVMs and reduce redundant data traffic across the I/O bus. XSSD offers a flexible RPC-based programming framework that developers can use for application development on the SSD without dealing with the complications of the underlying architecture and communication management. We have built a prototype of XSSD on the BEE3 FPGA prototyping system. We implement various data-intensive applications and achieve speedups of 1.5x to 8.9x and energy-efficiency improvements of 1.7x to 10.27x. This dissertation also compares XSSD with previous work on intelligent storage and intelligent memory. The existing ecosystem and these new enabling technologies make this system more viable than earlier ones.
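    The benefit claimed for computing in storage is less redundant traffic across the I/O bus. The toy Python model below makes that byte accounting concrete. The `StorageDevice` class, its methods and the 64-byte record size are all hypothetical and are not the XSSD RPC framework. The host either pulls every record and filters locally, or ships the predicate to the device and receives only the matches.

```python
from dataclasses import dataclass
from typing import Callable, List

RECORD_BYTES = 64  # assumed fixed record size used for the byte accounting


@dataclass
class StorageDevice:
    """Hypothetical device that counts how many bytes cross the I/O bus."""
    records: List[int]
    bytes_over_bus: int = 0

    def read_all(self) -> List[int]:
        # Conventional path: every record is transferred to the host.
        self.bytes_over_bus += RECORD_BYTES * len(self.records)
        return list(self.records)

    def offload_filter(self, pred: Callable[[int], bool]) -> List[int]:
        # Offload path: the predicate runs on the device; only matches are transferred.
        matches = [r for r in self.records if pred(r)]
        self.bytes_over_bus += RECORD_BYTES * len(matches)
        return matches
```

    With a selective predicate, the offload path moves only a small fraction of the bytes; with a predicate that matches everything, the two paths transfer the same amount. This is why in-storage processing helps most on filtering and aggregation workloads.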

  1. Putting all that (HEP-) data to work - a REAL implementation of an unlimited computing and storage architecture

    International Nuclear Information System (INIS)

    Ernst, Michael

    1996-01-01

    Since computing in HEP left the mainframe path, many institutions have demonstrated a successful migration to workstation-based computing, especially for applications requiring a high CPU-to-I/O ratio. However, the difficulties and the complexity start beyond just providing CPU cycles. Critical applications, requiring either sequential access to large amounts of data or access to many small sets out of a multi-10-terabyte data repository, need technical approaches we have not had so far. Though we felt that we were hardly able to follow technology evolving in the various fields, we recently had to realize that even politics overtook technical evolution, at least in the areas mentioned above. The USA is making peace with Russia. DEC is talking to IBM, SGI communicating with HP. All these things have come true. Unfortunately, the Cold War lasted 50 years, and, in a relative sense, we were afraid that 50 years seemed to be how long any self-respecting high performance computer (or set of workstations) had to wait for data from its server; fortunately, we are now seeing similar progress toward friendliness, harmony and balance in the formerly problematic (computing) areas. Buzzwords mentioned many thousands of times in talks describing today's and future requirements, including Functionality, Reliability, Scalability, Modularity and Portability, are no longer just phrases, wishes and dreams. At DESY, we are in the process of demonstrating an architecture that takes those five issues equally into consideration, including Heterogeneous Computing Platforms with ultimate file system approaches, Heterogeneous Mass Storage Devices and an Open Distributed Hierarchical Mass Storage Management System. This contribution will provide an overview of how far we have got and what the next steps will be. (author)

  2. Evaluation of high-performance computing software

    Energy Technology Data Exchange (ETDEWEB)

    Browne, S.; Dongarra, J. [Univ. of Tennessee, Knoxville, TN (United States)]; Rowan, T. [Oak Ridge National Lab., TN (United States)]

    1996-12-31

    The absence of unbiased and up-to-date comparative evaluations of high-performance computing software complicates a user's search for the appropriate software package. The National HPCC Software Exchange (NHSE) is attacking this problem using an approach that includes independent evaluations of software, incorporation of author and user feedback into the evaluations, and Web access to the evaluations. We are applying this approach to the Parallel Tools Library (PTLIB), a new software repository for parallel systems software and tools, and to HPC-Netlib, a high performance branch of the Netlib mathematical software repository. Updating the evaluations with feedback and making them available via the Web helps ensure accuracy and timeliness, and using independent reviewers produces unbiased comparative evaluations that are difficult to find elsewhere.

  3. "Beauty of Wholeness and Beauty of Partiality." New Terms Defining the Concept of Beauty in Architecture in Terms of Sustainability and Computer Aided Design

    Science.gov (United States)

    Farid, Ayman A.; Zaghloul, Weaam M.; Dewidar, Khaled M.

    2014-01-01

    The great shift in sustainability and computer aided design in the field of architecture has caused a remarkable change in architectural philosophy: new aspects of beauty and aesthetic values are being introduced, and traditional definitions of beauty cannot fully cover these aspects, which causes a gap between new architecture works criticism and…

  4. High performance computing environment for multidimensional image analysis.

    Science.gov (United States)

    Rao, A Ravishankar; Cecchi, Guillermo A; Magnasco, Marcelo

    2007-07-10

    The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact on microscopy applications. We present a high performance computing (HPC) solution to this problem. It involves decomposing the 3D image into spatial segments that are assigned to unique processors and mapped onto the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to nearest neighbors. On a 2 GHz Intel CPU, 3D median filtering of a typical 256-megabyte dataset takes two and a half hours, whereas on 1024 nodes of Blue Gene/L the same task takes 18.8 seconds, a 478x speedup. Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large-scale experiments with massive datasets.
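    The decomposition described above splits the volume into spatial blocks, gives each block to one processor, and exchanges data only with nearest neighbours. It can be illustrated in NumPy (1.20 or later, for `sliding_window_view`). In this sketch each "processor" is just a loop iteration. The filter is a (2r+1)^3 cube median with edge replication at the volume boundary, and the r-voxel halo around each block stands in for the data a node would receive from its neighbours. The block counts are assumptions, and the paper's actual Blue Gene/L code is not shown in the abstract.

```python
import itertools

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter_3d(vol, r):
    """Reference: cube median filter of radius r over the whole volume at once."""
    padded = np.pad(vol, r, mode="edge")  # replicate voxels at the volume boundary
    windows = sliding_window_view(padded, (2 * r + 1,) * 3)
    return np.median(windows, axis=(-3, -2, -1))


def decomposed_median_filter_3d(vol, r, blocks=(2, 2, 2)):
    """Same filter, computed block by block as independent 'processors' would."""
    padded = np.pad(vol, r, mode="edge")
    out = np.empty(vol.shape, dtype=float)
    # Split each axis into near-equal index ranges, one range per block.
    cuts = [np.linspace(0, n, b + 1).astype(int) for n, b in zip(vol.shape, blocks)]
    ranges = [list(zip(c[:-1], c[1:])) for c in cuts]
    for (z0, z1), (y0, y1), (x0, x1) in itertools.product(*ranges):
        # Block plus an r-voxel halo on every side: the only extra data a node
        # needs, coming from its nearest neighbours (or boundary replication).
        local = padded[z0:z1 + 2 * r, y0:y1 + 2 * r, x0:x1 + 2 * r]
        windows = sliding_window_view(local, (2 * r + 1,) * 3)
        out[z0:z1, y0:y1, x0:x1] = np.median(windows, axis=(-3, -2, -1))
    return out
```

    Because each block only reads its own voxels plus a thin halo, the blocks are independent once the halos are filled. This is what lets the work map onto nearest-neighbour links of a torus network.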

  5. Evaluating the performance of the particle finite element method in parallel architectures

    Science.gov (United States)

    Gimenez, Juan M.; Nigro, Norberto M.; Idelsohn, Sergio R.

    2014-05-01

    This paper presents a high performance implementation of the particle-mesh based method called particle finite element method two (PFEM-2). It consists of a material-derivative-based formulation of the equations with a hybrid spatial discretization that uses an Eulerian mesh and Lagrangian particles. The main aim of PFEM-2 is to solve transport equations as fast as possible while keeping some level of accuracy. The method was found to be competitive with classical Eulerian alternatives for these targets, even in their range of optimal application. To evaluate the method on large simulations, the use of parallel environments is imperative. Parallel strategies for the Finite Element Method have been widely studied, and many libraries can be used to solve the Eulerian stages of PFEM-2. However, Lagrangian stages, such as streamline integration, must be developed considering the parallel strategy selected. The main drawback of PFEM-2 is the large amount of memory needed, which limits its application to large problems when only one computer is used. Therefore, a distributed-memory implementation is urgently needed. Unlike a shared-memory approach, domain decomposition isolates memory automatically, thus avoiding race conditions; however, new issues appear due to the distribution of data over the processes. Thus, a domain decomposition strategy for both particles and mesh is adopted, which minimizes the communication between processes. Finally, performance analyses on multicore and multinode architectures are presented. The Courant-Friedrichs-Lewy number used influences the efficiency of the parallelization and, in some cases, a weighted partitioning can be used to improve the speed-up. However, the total CPU time for the cases presented is lower than that obtained using classical Eulerian strategies.
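    The abstract notes that a weighted partitioning can improve the speed-up. One simple form is sketched here, assuming a 1D ordering of cells with a per-cell cost (for instance, the number of particles each cell holds). It cuts that ordering into contiguous ranges of roughly equal total cost using prefix sums. This is a generic technique, not necessarily the partitioner used in the paper, which decomposes a 3D mesh.

```python
import numpy as np


def weighted_partition(weights, nparts):
    """Cut cells 0..n-1 into `nparts` contiguous ranges of roughly equal total weight.

    `weights` is a per-cell cost (e.g. an assumed particle count per cell).
    Returns a list of half-open (start, stop) index ranges.
    """
    w = np.asarray(weights, dtype=float)
    cum = np.cumsum(w)                                  # prefix sums of the cost
    targets = cum[-1] * np.arange(1, nparts) / nparts   # ideal cumulative cost at each cut
    # The first cell whose prefix sum reaches each target closes that part.
    cuts = np.searchsorted(cum, targets, side="left") + 1
    bounds = np.concatenate(([0], cuts, [len(w)]))
    return [(int(a), int(b)) for a, b in zip(bounds[:-1], bounds[1:])]
```

    With weights [4, 1, 1, 1, 1] and two parts, a split by cell count into (0, 2) and (2, 5) gives costs 5 and 3. The weighted split returns (0, 1) and (1, 5), with costs 4 and 4.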

  6. L-Band Digital Aeronautical Communications System Engineering - Concepts of Use, Systems Performance, Requirements, and Architectures

    Science.gov (United States)

    Zelkin, Natalie; Henriksen, Stephen

    2010-01-01

    This NASA Contractor Report summarizes and documents the work performed to develop concepts of use (ConUse) and high-level system requirements and architecture for the proposed L-band (960 to 1164 MHz) terrestrial en route communications system. This work was completed as a follow-on to the technology assessment conducted by NASA Glenn Research Center and ITT for the Future Communications Study (FCS). ITT assessed air-to-ground (A/G) communications concepts of use and operations presented in relevant NAS-level, international, and NAS-system-level documents to derive the appropriate ConUse relevant to potential A/G communications applications and services for domestic continental airspace. ITT also leveraged prior concepts of use developed during the earlier phases of the FCS. A middle-out functional architecture was adopted by merging the functional system requirements identified in the bottom-up assessment of existing requirements with those derived as a result of the top-down analysis of ConUse and higher level functional requirements. Initial end-to-end system performance requirements were derived to define system capabilities based on the functional requirements and on NAS-SR-1000 and the Operational Performance Assessment conducted as part of the COCR. A high-level notional architecture of the L-DACS supporting A/G communication was derived from the functional architecture and requirements.

  7. Highly Parallel Computing Architectures by using Arrays of Quantum-dot Cellular Automata (QCA): Opportunities, Challenges, and Recent Results

    Science.gov (United States)

    Fijany, Amir; Toomarian, Benny N.

    2000-01-01

    There has been significant improvement in the performance of VLSI devices, in terms of size, power consumption, and speed, in recent years, and this trend may continue for the near future. However, it is well known that major obstacles, namely the physical limits of feature-size reduction and the ever-increasing cost of foundries, would prevent the long-term continuation of this trend. This has motivated the exploration of fundamentally new technologies that do not depend on the conventional feature-size approach. Such technologies are expected to enable scaling to continue to the ultimate level, i.e., molecular and atomistic size. Quantum computing, quantum-dot-based computing, DNA-based computing, biologically inspired computing, etc., are examples of such new technologies. In particular, quantum-dot-based computing using Quantum-dot Cellular Automata (QCA) has recently been intensely investigated as a promising new technology capable of offering significant improvement over conventional VLSI in terms of reduction of feature size (and hence increase in integration level), reduction of power consumption, and increase of switching speed. Quantum-dot-based computing and memory in general, and QCA specifically, are intriguing to NASA due to their high packing density (10^11 to 10^12 per cm²), low power consumption (no transfer of current) and potentially higher radiation tolerance. Under the Revolutionary Computing Technology (RTC) Program at the NASA/JPL Center for Integrated Space Microelectronics (CISM), we have been investigating the potential applications of QCA for the space program. To this end, exploiting the intrinsic features of QCA, we have designed novel QCA-based circuits for coplanar (i.e., single-layer) and compact implementation of a class of data permutation matrices, a class of interconnection networks, and a bit-serial processor. Building upon these circuits, we have developed novel algorithms and QCA...
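    QCA logic is usually built from three-input majority gates plus inverters, with AND and OR obtained by fixing one majority input to 0 or 1. The Python sketch below simulates, at the bit level only, a bit-serial adder made from a majority-gate full adder. It uses carry = M(a, b, c) and sum = M(NOT carry, c, M(a, b, NOT c)), one known majority-logic formulation. It does not model QCA cells, clocking zones, or the specific bit-serial processor designed at JPL.

```python
def majority(a, b, c):
    """Three-input majority gate, the basic QCA logic primitive."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a, b, cin):
    """Full adder built only from majority gates and inverters (1 - x)."""
    cout = majority(a, b, cin)
    s = majority(1 - cout, cin, majority(a, b, 1 - cin))
    return s, cout


def bit_serial_add(x, y, width):
    """Add two unsigned integers one bit per step, LSB first, feeding the carry back."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result | (carry << width)  # final carry becomes the extra top bit
```

    A bit-serial design reuses a single full adder over `width` clock steps. This trades latency for a very small circuit, which suits a dense, coplanar layout.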

  8. Surveillance and Datalink Communication Performance Analysis for Distributed Separation Assurance System Architectures

    Science.gov (United States)

    Chung, William W.; Linse, Dennis J.; Alaverdi, Omeed; Ifarraguerri, Carlos; Seifert, Scott C.; Salvano, Dan; Calender, Dale

    2012-01-01

    This study investigates the effects of two technical enablers of the Federal Aviation Administration's Next Generation Air Transportation System (NextGen), Automatic Dependent Surveillance - Broadcast (ADS-B) and digital datalink communication, on overall separation assurance performance under two separation assurance (SA) system architectures: ground-based SA and airborne SA. Datalink performance, such as the probability of successful reception of both surveillance and communication messages, and surveillance accuracy are examined in various operational conditions. Required SA performance is evaluated as a function of subsystem performance, using availability, continuity, and integrity metrics to establish overall required separation assurance performance under normal and off-nominal conditions.
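    The abstract evaluates required SA performance as a function of subsystem availability, continuity, and integrity but does not state the combining rules. A common first approximation is assumed here, valid only for independent failures. It multiplies the availabilities of subsystems that must all work and uses 1 - prod(1 - A_i) for redundant alternatives. The subsystem values in the usage note are hypothetical.

```python
import math


def series_availability(avail):
    """All subsystems must be up: A = prod(A_i) (independent failures assumed)."""
    return math.prod(avail)


def redundant_availability(avail):
    """At least one redundant unit must be up: A = 1 - prod(1 - A_i)."""
    return 1.0 - math.prod(1.0 - a for a in avail)
```

    For instance, two redundant datalinks with availability 0.99 each, in series with a surveillance chain of availability 0.999, give `series_availability([redundant_availability([0.99, 0.99]), 0.999])` of about 0.9989.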

  9. Building highly available control system applications with Advanced Telecom Computing Architecture and open standards

    International Nuclear Information System (INIS)

    Kazakov, Artem; Furukawa, Kazuro

    2010-01-01

    Requirements for modern and future control systems for large projects like the International Linear Collider demand high availability for control system components. Recently, the telecom industry came up with an open hardware specification, the Advanced Telecom Computing Architecture (ATCA). This specification is aimed at better reliability, availability and serviceability. Since its first market appearance in 2004, the ATCA platform has shown tremendous growth and has proved to be stable and well represented by a number of vendors. ATCA is an industry standard for highly available systems. In addition, the Service Availability Forum (SAF), a consortium of leading communications and computing companies, describes the interaction between hardware and software. SAF defines a set of specifications, such as the Hardware Platform Interface and the Application Interface Specification. SAF specifications provide an extensive description of highly available systems, services and their interfaces. Originally aimed at telecom applications, these specifications can be used for accelerator controls software as well. This study describes the benefits of using these specifications and their possible adoption in accelerator control systems. It is demonstrated how the EPICS Redundant IOC was extended using the Hardware Platform Interface specification, which made it possible to utilize the benefits of the ATCA platform.
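    ATCA and the SAF specifications aim to keep a service running when a component fails. The generic active/standby pattern behind redundant controllers such as a redundant IOC can be sketched in Python as a heartbeat timeout that promotes the standby node. This is not the EPICS Redundant IOC or the SAF Hardware Platform Interface API. The class, its fields and the timeout value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RedundantPair:
    """Hypothetical active/standby pair supervised by heartbeats."""
    timeout: float            # seconds without a heartbeat before failover
    active: str = "primary"
    last_heartbeat: float = 0.0

    def heartbeat(self, now: float) -> None:
        # Called whenever the active node reports that it is healthy.
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        # Promote the standby if the active node has been silent too long.
        if now - self.last_heartbeat > self.timeout:
            self.active = "backup" if self.active == "primary" else "primary"
            self.last_heartbeat = now  # give the newly active node a full window
        return self.active
```

    In a real system, the health signal would come from platform management (for example, hardware sensors exposed through HPI) rather than a single timer. The decision logic has the same shape, though.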

  10. Performance evaluation of a computed radiography system

    Energy Technology Data Exchange (ETDEWEB)

    Roussilhe, J.; Fallet, E. [Carestream Health France, 71 - Chalon/Saone (France)]; Mango, St.A. [Carestream Health, Inc. Rochester, New York (United States)]

    2007-07-01

    Computed radiography (CR) standards have been formalized and published in Europe and in the US. In those standards, the CR system classification is defined by the minimum normalized signal-to-noise ratio (SNRN) and the maximum basic spatial resolution (SRb). Both the signal-to-noise ratio (SNR) and the contrast sensitivity of a CR system depend on the dose (exposure time and conditions) at the detector. Because of its wide dynamic range, the same storage phosphor imaging plate can qualify for all six CR system classes. The exposure characteristics from 30 to 450 kV, the contrast sensitivity, and the spatial resolution of the KODAK INDUSTREX CR Digital System have been thoroughly evaluated. This paper presents some of the factors that determine the system's spatial resolution performance. (authors)
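    In the CR classification standards, the measured SNR is normalized to a reference basic spatial resolution so that systems with different resolutions can be compared. As we understand the European CR standard (EN 14784-1), the normalization is SNR_N = SNR_measured × 88.6 µm / SR_b. Check the constant and formula against the standard text before relying on them. A minimal Python helper under that assumption:

```python
REFERENCE_SRB_UM = 88.6  # reference basic spatial resolution in micrometres (assumed from the standard)


def normalized_snr(measured_snr, srb_um):
    """Normalize a measured SNR to the reference basic spatial resolution.

    srb_um: the system's basic spatial resolution SRb, in micrometres.
    A coarser system (larger SRb) gets its SNR scaled down proportionally.
    """
    if srb_um <= 0:
        raise ValueError("SRb must be positive")
    return measured_snr * REFERENCE_SRB_UM / srb_um
```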

  11. T and D-Bench--Innovative Combined Support for Education and Research in Computer Architecture and Embedded Systems

    Science.gov (United States)

    Soares, S. N.; Wagner, F. R.

    2011-01-01

    Teaching and Design Workbench (T&D-Bench) is a framework aimed at education and research in the areas of computer architecture and embedded systems. It includes a set of features not found in other educational environments. This set of features is the result of an original combination of design requirements for T&D-Bench: that the…

  12. ANL/Star project: a new architecture for large scale theoretical physics computations

    International Nuclear Information System (INIS)

    Rushton, A.M.

    1985-01-01

    The project reported consists of two phases, each of which has goals of substantial physics content on its own. In Phase I, we have selected Star Technologies' ST-100 as the array processor for the prototype coupled system and have installed one on a VAX-11/750 host. Our goals with this system are to institute a substantial program in computational physics at Argonne based on the power provided by this system and thereby to gain experience with both the hardware and software architecture of the ST-100. In Phase II, we propose to build a prototype consisting of two coupled array processors with shared memory to prove that this design can achieve high speed and efficiency in a readily extensible and cost-effective manner. This will implement all of the hardware and software modifications necessary to extend this design to as many as 64 (or more) nodes. In our design, we seek to minimize the changes made in the standard system hardware and software; this drastically reduces the effort required by our group to implement such a design and enables us to more readily incorporate the companies' upgrades to the array processor. It should be emphasized that our design is intended as a special-purpose system for theoretical calculations; however, it can be efficiently applied to a surprisingly broad class of problems. I shall discuss first the architecture of the ST-100 and then the physics program currently being implemented on a single system. Finally, the proposed design of the coupled system is presented.

  13. ANL/Star project: a new architecture for large scale theoretical physics computations

    Energy Technology Data Exchange (ETDEWEB)

    Rushton, A.M.

    1985-01-01

    The project reported consists of two phases, each of which has goals of substantial physics content on its own. In Phase I, we have selected Star Technologies' ST-100 as the array processor for the prototype coupled system and have installed one on a VAX-11/750 host. Our goals with this system are to institute a substantial program in computational physics at Argonne based on the power provided by this system and thereby to gain experience with both the hardware and software architecture of the ST-100. In Phase II, we propose to build a prototype consisting of two coupled array processors with shared memory to prove that this design can achieve high speed and efficiency in a readily extensible and cost-effective manner. This will implement all of the hardware and software modifications necessary to extend this design to as many as 64 (or more) nodes. In our design, we seek to minimize the changes made in the standard system hardware and software; this drastically reduces the effort required by our group to implement such a design and enables us to more readily incorporate the companies' upgrades to the array processor. It should be emphasized that our design is intended as a special-purpose system for theoretical calculations; however, it can be efficiently applied to a surprisingly broad class of problems. I shall discuss first the architecture of the ST-100 and then the physics program currently being implemented on a single system. Finally, the proposed design of the coupled system is presented.

  14. Improving the energy performance of historic buildings with architectural and cultural values

    DEFF Research Database (Denmark)

    Hansen, Ernst Jan de Place

    2017-01-01

    The thermal performance of solid walls of historic buildings can be improved by external or internal insulation. External insulation is preferred from a technical perspective, but is often disregarded because many such buildings have architectural or cultural values, leaving internal insulation as the only possible solution. As internal insulation is considered a risky way of improving the thermal performance from a moisture perspective, technically feasible solutions are needed. Further, arguments other than energy saving, e.g. improvement of the thermal indoor climate, could convince a building owner to carry out internal insulation. The paper discusses different motivating factors for improving the thermal performance of solid walls in historic buildings with architectural and cultural values. It is argued that internal insulation, provided that it can be done without resulting in critical moisture...

  15. Parametric Approach to Assessing Performance of High-Lift Device Active Flow Control Architectures

    Directory of Open Access Journals (Sweden)

    Yu Cai

    2017-02-01

    Active Flow Control is at present an area of considerable research, with multiple potential aircraft applications. While the majority of research has focused on the performance of the actuators themselves, a system-level perspective is necessary to assess the viability of proposed solutions. This paper demonstrates such an approach, in which major system components are sized based on system flow and redundancy considerations, with the impacts linked directly to the mission performance of the aircraft. Considering the case of a large twin-aisle aircraft, four distinct active flow control architectures that facilitate the simplification of the high-lift mechanism are investigated using the demonstrated approach. The analysis indicates a very strong influence of system total mass flow requirement on architecture performance, both for a typical mission and also over the entire payload-range envelope of the aircraft.

  16. High performance integer arithmetic circuit design on FPGA architecture, implementation and design automation

    CERN Document Server

    Palchaudhuri, Ayan

    2016-01-01

    This book describes the optimized implementations of several arithmetic datapath, controlpath and pseudorandom sequence generator circuits for realization of high performance arithmetic circuits targeted towards a specific family of the high-end Field Programmable Gate Arrays (FPGAs). It explores regular, modular, cascadable, and bit-sliced architectures of these circuits, by directly instantiating the target FPGA-specific primitives in the HDL. Every proposed architecture is justified with detailed mathematical analyses. Simultaneously, constrained placement of the circuit building blocks is performed, by placing the logically related hardware primitives in close proximity to one another by supplying relevant placement constraints in the Xilinx proprietary “User Constraints File”. The book covers the implementation of a GUI-based CAD tool named FlexiCore integrated with the Xilinx Integrated Software Environment (ISE) for design automation of platform-specific high-performance arithmetic circuits from us...
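    Among the circuits the book covers are pseudorandom sequence generators. Their classic building block is a linear feedback shift register (LFSR). The bit-level Python model below is a generic 4-bit Fibonacci LFSR, not the book's FPGA-primitive implementation. Feeding back the XOR of the two most significant bits gives the recurrence s_k = s_{k-3} XOR s_{k-4}. Its characteristic polynomial, x^4 + x + 1, is primitive, so every nonzero seed cycles through all 15 nonzero states.

```python
def lfsr4_states(seed, count):
    """Return `count` successive states of a 4-bit Fibonacci LFSR."""
    state = seed & 0xF
    if state == 0:
        raise ValueError("the all-zero state is a lock-up state")
    states = []
    for _ in range(count):
        states.append(state)
        feedback = ((state >> 3) ^ (state >> 2)) & 1  # XOR of the two MSBs
        state = ((state << 1) | feedback) & 0xF       # shift left, insert feedback at LSB
    return states
```

    In hardware, this is four flip-flops and a single XOR. On an FPGA, such a register maps naturally onto a shift-register primitive plus one lookup table.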

  17. Air Force Science & Technology Issues & Opportunities Regarding High Performance Embedded Computing

    Science.gov (United States)

    2009-09-23

    ...price-performance advantage include: large-scale simulations of neuromorphic computing models; GOTCHA radar video SAR for wide-area persistent... the handcuffs were not for me and that the military had so far got... Neuromorphic example: robust recognition of occluded text; Gotcha SAR; PCID Image... Architecture: 16 cores/chip; 10 x 10 stacks/board; 50 chips/stack; EDRAM, AFPGA

  18. High-Performance Computing Paradigm and Infrastructure

    CERN Document Server

    Yang, Laurence T

    2006-01-01

    With Hyper-Threading in Intel processors, HyperTransport links in next-generation AMD processors, multi-core silicon in today's high-end microprocessors from IBM, and emerging grid computing, parallel and distributed computers have moved into the mainstream.

  19. Thermal performance measurement and application of a multilayer insulator for emergency architecture

    International Nuclear Information System (INIS)

    Salvalai, Graziano; Imperadori, Marco; Scaccabarozzi, Diego; Pusceddu, Cristina

    2015-01-01

    Lightness coupled with a quick assembly method is crucial for emergency architecture in post-disaster areas, where accessibility and action time are major barriers to rescuing people. With this in mind, the following work analyses the potential (technological and thermal performance) of a multilayer insulator for a new shelter envelope able to provide superior thermal comfort for the users. The thermal characteristics are derived experimentally by means of a guard ring apparatus under different working temperatures. Tests are performed on the multilayer insulator itself and on a composite structure, made of the multilayer insulator and two air gaps wrapped by a polyester cover, which is the core of a new lightweight emergency architecture. Experimental results show good agreement with literature data, giving a thermal conductivity of about 0.04 W/(m·°C) and a thermal transmittance of about 1.6 W/(m²·°C) for the tested multilayer. The composite structure, called the Thermo Reflective Multilayer System (TRMS), shows better insulation performance, with a thermal transmittance of 0.85 W/(m²·°C). A thermal model of an emergency tent based on the new insulating structure (TRMS) has been developed, and its thermal performance has been compared with that of a traditional UNHCR emergency shelter. The shelter model was simulated (TRNSYS 17 environment) for the winter season, considering the climate of Belgrade and using only the casual gains from occupants and solar radiation through the opaque walls. Numerical simulations showed that the new insulating composite envelope reduces the required heating load by factors of about two and four with respect to the traditional insulation. The study sets a starting point for developing a lightweight emergency architecture made with a combination of multilayer, air, polyester and vulcanized rubber. - Highlights: • Multilayer insulator tested by means of a guard ring apparatus. • Thermo reflective multilayer system (TRMS) development
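    The transmittance values quoted above come from summing thermal resistances in series. A minimal Python sketch assumes plane layers with known thickness d and conductivity λ. It also assumes the ISO 6946 default surface resistances for horizontal heat flow, R_si = 0.13 and R_se = 0.04 m²·K/W. Under these assumptions, U = 1 / (R_si + Σ d/λ + R_se), and the steady heat loss is Q = U·A·ΔT. Air gaps would enter as extra resistance terms. The values in the test are illustrative, not the TRMS measurements.

```python
def u_value(layers, r_si=0.13, r_se=0.04):
    """Thermal transmittance in W/(m²·K) of plane layers in series.

    layers: list of (thickness_m, conductivity_W_per_mK) tuples.
    r_si, r_se: inside/outside surface resistances in m²·K/W.
    """
    r_total = r_si + r_se + sum(d / k for d, k in layers)
    return 1.0 / r_total


def steady_heat_loss(u, area_m2, delta_t):
    """Steady-state heat flow in W through an envelope element: Q = U * A * dT."""
    return u * area_m2 * delta_t
```

    Because Q scales linearly with U, halving the envelope transmittance halves the steady conduction loss for the same area and temperature difference. This is the main lever the composite envelope pulls on.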

  20. A Study Effects Architectural Marketing Capabilities on Performance Marketing unit Based on: Morgan et al case: Past Industry in Tehran

    OpenAIRE

    Mohammad Reza Dalvi; Robabe Seifi

    2014-01-01

    Over time, combinations of knowledge and skills develop into architectural marketing capabilities. These capabilities have been identified as one of the important ways firms can achieve a competitive advantage. The following research tests the effects of architectural marketing capabilities on the performance of the marketing unit. Based on a survey, a structural equation model was developed to test our hypotheses. The study develops a structural model linking arch...

  1. High Performance Computing in Science and Engineering '99 : Transactions of the High Performance Computing Center

    CERN Document Server

    Jäger, Willi

    2000-01-01

    The book contains reports about the most significant projects from science and engineering of the Federal High Performance Computing Center Stuttgart (HLRS). They were carefully selected in a peer-review process and are showcases of an innovative combination of state-of-the-art modeling, novel algorithms and the use of leading-edge parallel computer technology. The projects of HLRS are using supercomputer systems operated jointly by university and industry and therefore a special emphasis has been put on the industrial relevance of results and methods.

  2. High Performance Computing in Science and Engineering '98 : Transactions of the High Performance Computing Center

    CERN Document Server

    Jäger, Willi

    1999-01-01

    The book contains reports about the most significant projects from science and industry that are using the supercomputers of the Federal High Performance Computing Center Stuttgart (HLRS). These projects are from different scientific disciplines, with a focus on engineering, physics and chemistry. They were carefully selected in a peer-review process and are showcases for an innovative combination of state-of-the-art physical modeling, novel algorithms and the use of leading-edge parallel computer technology. As HLRS is in close cooperation with industrial companies, special emphasis has been put on the industrial relevance of results and methods.

  3. High Performance Motion-Planner Architecture for Hardware-In-the-Loop System Based on Position-Based-Admittance-Control

    Directory of Open Access Journals (Sweden)

    Francesco La Mura

    2018-02-01

    This article focuses on a Hardware-In-the-Loop application developed within the advanced energy field project LIFES50+. The aim is to replicate, inside a wind gallery test facility, the combined effect of aerodynamic and hydrodynamic loads on a floating wind turbine model for offshore energy production, using a force-controlled robotic device that emulates the behaviour of the floating substructure. In addition to the well-known issues of real-time Hardware-In-the-Loop (HIL) systems, this particular application has stringent safety requirements for the HIL equipment and operating conditions that are difficult to predict, so extra computational effort must be spent running specific safety algorithms while still achieving the desired performance. To meet the project requirements, a high-performance software architecture based on Position-Based-Admittance-Control (PBAC) is presented, combining low-level motion interpolation techniques, efficient motion planning based on buffer management and time-base control, and advanced high-level safety algorithms, implemented in a rapid real-time control architecture.

  4. Lightweight Provenance Service for High-Performance Computing

    Energy Technology Data Exchange (ETDEWEB)

    Dai, Dong; Chen, Yong; Carns, Philip; Jenkins, John; Ross, Robert

    2017-09-09

    Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirements of providing provenance service in situ. The need to remain lightweight and to be always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrument mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.

  5. The Spin Torque Lego - from spin torque nano-devices to advanced computing architectures

    Science.gov (United States)

    Grollier, Julie

    2013-03-01

    Spin transfer torque (STT), predicted in 1996 and first observed around 2000, brought spintronic devices into the realm of active elements. A whole class of new devices has emerged, based on the combined effects of STT for writing and Giant Magneto-Resistance or Tunnel Magneto-Resistance for reading. The second generation of MRAMs, based on spin-torque writing (the STT-RAM), is under industrial development and should be on the market in three years. But spin torque devices are not limited to binary memories. We will briefly present how the spin torque effect also enables non-linear nano-oscillators, spin-wave emitters, controlled stochastic devices and microwave nano-detectors. What is extremely interesting is that all these functionalities can be obtained using the same materials, the exact same stack, simply by changing the device geometry and its bias conditions. These different devices can therefore be seen as Lego bricks, each with its own functionality. During this talk, I will show how spin torque can be engineered to build new bricks, such as the Spintronic Memristor, an artificial magnetic nano-synapse. I will then give hints on how to assemble these bricks to build novel types of computing architectures, with a special focus on neuromorphic circuits. Financial support by the European Research Council Starting Grant NanoBrain (ERC 2010 Stg 259068) is acknowledged.

  6. METRIC context unit architecture

    Energy Technology Data Exchange (ETDEWEB)

    Simpson, R.O.

    1988-01-01

    METRIC is an architecture for a simple but powerful Reduced Instruction Set Computer (RISC). Its speed comes from the simultaneous processing of several instruction streams, with instructions from the various streams being dispatched into METRIC's execution pipeline as they become available for execution. The pipeline is thus kept full, with a mix of instructions for several contexts in execution at the same time. True parallel programming is supported within a single execution unit, the METRIC Context Unit. METRIC's architecture provides for expansion through the addition of multiple Context Units and of specialized Functional Units. The architecture thus spans a range of size and performance from a single-chip microcomputer up through large and powerful multiprocessors. This research concentrates on the specification of the METRIC Context Unit at the architectural level. Performance tradeoffs made during METRIC's design are discussed, and projections of METRIC's performance are made based on simulation studies.
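    The central idea, keeping one pipeline full by interleaving instructions from several streams, can be illustrated with a toy cycle-level model. This is a minimal sketch under assumed rules, not METRIC's actual dispatch logic: one instruction issues per cycle, each stream is strictly in-order and must wait out its previous instruction's latency before issuing again, and ready streams are picked round-robin. `Context` and `simulate` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Context:
    """One instruction stream; each entry is an instruction latency in cycles."""
    name: str
    latencies: deque
    ready_at: int = 0  # cycle at which this stream may issue its next instruction


def simulate(contexts, max_cycles=10_000):
    """Issue at most one instruction per cycle, choosing round-robin among
    contexts whose previous instruction has completed.
    Returns (cycles_elapsed, instructions_issued)."""
    issued, cycle, rr = 0, 0, 0  # rr is the round-robin pointer
    n = len(contexts)
    while any(c.latencies for c in contexts) and cycle < max_cycles:
        for k in range(n):
            ctx = contexts[(rr + k) % n]
            if ctx.latencies and ctx.ready_at <= cycle:
                lat = ctx.latencies.popleft()
                ctx.ready_at = cycle + lat  # stream stalls until its result is ready
                issued += 1
                rr = (rr + k + 1) % n  # next cycle starts after this context
                break
        cycle += 1
    return cycle, issued


if __name__ == "__main__":
    one = [Context("A", deque([4] * 8))]
    four = [Context(n, deque([4] * 8)) for n in "ABCD"]
    print(simulate(one), simulate(four))
```

    With one stream of eight 4-cycle instructions the model issues 8 instructions in 29 cycles; with four such streams it issues 32 instructions in 32 cycles, one per cycle, which is the pipeline-filling effect the abstract describes.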

  7. Open Computer Forensic Architecture a Way to Process Terabytes of Forensic Disk Images

    Science.gov (United States)

    Vermaas, Oscar; Simons, Joep; Meijer, Rob

    This chapter describes the Open Computer Forensics Architecture (OCFA), an automated system that dissects complex file types, extracts metadata from files and ultimately creates indexes on forensic images of seized computers. It consists of a set of collaborating processes, called modules. Each module is specialized in processing a certain file type. When a module receives a so-called 'evidence' (the information extracted so far about a file, together with the actual data), it either adds new information about the file or uses the file to derive a new 'evidence'. All evidence, original and derived, is sent to a router after being processed by a particular module. The router decides which module should process the evidence next, based on the metadata associated with the evidence. The OCFA system can thus recursively process images until the embedded files, if any, have been extracted from every compound file, all information that the system can derive has been derived, and all extracted text has been indexed. Compound files include, but are not limited to, archives and zip files, disk images, text documents of various formats and, for example, mailboxes. The output of an OCFA run is a repository full of derived files, a database containing all extracted information about the files, and an index that can be used for searching. This is presented in a web interface. Moreover, processed data is easily fed to third-party software for further analysis or for use in data mining or text mining tools. The main advantages of the OCFA system include scalability: it is able to process large amounts of data.
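    The module/router flow described above can be sketched in a few lines of Python. This is a toy model, not OCFA's code or API: the `Evidence` record, the two modules, the signature-based type detection and the in-memory inverted index are all hypothetical stand-ins. It does show the key property: derived evidence goes back through the router, so nested archives are processed recursively.

```python
import io
import zipfile
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Evidence:
    name: str
    data: bytes
    meta: dict = field(default_factory=dict)


def detect_type(ev):
    """Hypothetical type detector: ZIP local-header magic, else treat as text."""
    return "zip" if ev.data[:4] == b"PK\x03\x04" else "text"


def zip_module(ev, index):
    """Extract every archive member as new, derived evidence."""
    derived = []
    with zipfile.ZipFile(io.BytesIO(ev.data)) as zf:
        for member in zf.namelist():
            derived.append(Evidence(f"{ev.name}/{member}", zf.read(member),
                                    {"parent": ev.name}))
    ev.meta["members"] = len(derived)
    return derived


def text_module(ev, index):
    """Record a word count and add each word to the inverted index."""
    words = ev.data.decode("utf-8", errors="replace").lower().split()
    ev.meta["words"] = len(words)
    for w in words:
        index.setdefault(w, set()).add(ev.name)
    return []  # plain text yields no derived evidence


MODULES = {"zip": zip_module, "text": text_module}


def run(root):
    """Router loop: send each evidence to the module named by its metadata,
    re-queue anything derived, and stop when nothing new appears."""
    queue, index, done = deque([root]), {}, []
    while queue:
        ev = queue.popleft()
        ev.meta.setdefault("type", detect_type(ev))
        queue.extend(MODULES[ev.meta["type"]](ev, index))
        done.append(ev)
    return done, index
```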

  8. The coupling of fluids, dynamics, and controls on advanced architecture computers

    Science.gov (United States)

    Atwood, Christopher

    1995-01-01

    This grant provided for the demonstration of coupled controls, body dynamics, and fluids computations in a workstation cluster environment, and for an investigation of the impact of peer-to-peer communication on flow solver performance and robustness. The findings of these investigations were documented in the conference articles. The attached publication, 'Towards Distributed Fluids/Controls Simulations', documents the solution and scaling of the coupled Navier-Stokes, Euler rigid-body dynamics, and state feedback control equations for a two-dimensional canard-wing. The poor scaling shown was due to serialized grid connectivity computation and Ethernet bandwidth limits. The scaling of a peer-to-peer communication flow code on an IBM SP-2 was also shown. The scaling of the code on the switched-fabric-linked nodes was good, with a 2.4 percent loss due to communication of intergrid boundary point information. The code performance on 30 worker nodes was 1.7 (mu)s/point/iteration, a factor of three better than a Cray C-90 head. The attached paper, 'Nonlinear Fluid Computations in a Distributed Environment', documents the effect of several computational-rate-enhancing methods on convergence. For the cases shown, the highest throughput was achieved using boundary updates at each step, with the manager process performing communication tasks only. Constrained domain decomposition of the implicit fluid equations did not degrade the convergence rate or final solution. The scaling of a coupled body/fluid dynamics problem on an Ethernet-linked cluster was also shown.

  9. Contributing to the design of run-time systems dedicated to high performance computing

    International Nuclear Information System (INIS)

    Perache, M.

    2006-10-01

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy, which complicates the design of efficient parallel applications. This thesis proposes a programming environment for designing efficient parallel programs on clusters of multiprocessors. It features a programming model centered on collective communications and synchronizations, and provides load-balancing facilities. The programming interface, named MPC, provides high-level paradigms that are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  10. High Performance Numerical Computing for High Energy Physics: A New Challenge for Big Data Science

    International Nuclear Information System (INIS)

    Pop, Florin

    2014-01-01

    Modern physics is based on both theoretical analysis and experimental validation. Regimes such as subatomic dimensions, high energies, and low absolute temperatures are frontiers for many theoretical models. Simulation with stable numerical methods is an excellent instrument for high-accuracy analysis, experimental validation, and visualization. High performance computing makes it possible to run simulations at large scale and in parallel, but the volume of data generated by these experiments creates a new challenge for Big Data Science. This paper presents existing computational methods for high energy physics (HEP), analyzed from two perspectives: numerical methods and high performance computing. The computational methods presented are Monte Carlo methods and simulations of HEP processes, Markovian Monte Carlo, unfolding methods in particle physics, kernel estimation in HEP, and Random Matrix Theory as used in the analysis of particle spectra. All of these methods produce data-intensive applications, which introduce new challenges and requirements for ICT systems architecture, programming paradigms, and storage capabilities.

  11. A Parallel Implementation of a Smoothed Particle Hydrodynamics Method on Graphics Hardware Using the Compute Unified Device Architecture

    International Nuclear Information System (INIS)

    Wong Unhong; Wong Honcheng; Tang Zesheng

    2010-01-01

    Smoothed particle hydrodynamics (SPH), a class of meshfree particle methods (MPMs), has a wide range of applications from micro-scale to macro-scale and from discrete systems to continuum systems. Graphics hardware, originally designed for computer graphics, now provides unprecedented computational power for scientific computation. Particle systems require a huge amount of computation in physical simulation. In this paper, an efficient parallel implementation of an SPH method on graphics hardware using the Compute Unified Device Architecture is developed for fluid simulation. Compared with the corresponding CPU implementation, our experimental results show that the new approach achieves significant speedups of fluid simulation by handling huge amounts of computation in parallel on graphics hardware.
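    The core per-particle step of an SPH fluid solver is a kernel-weighted sum over neighbours, for example the density estimate ρ_i = Σ_j m_j W(|x_i − x_j|, h). A minimal CPU sketch of that step follows. The 3-D poly6 kernel and the brute-force all-pairs loop are assumptions for illustration (the abstract does not name the kernel or neighbour-search scheme); practical GPU codes usually bin particles into a spatial grid so each particle visits only nearby cells.

```python
import math


def poly6(r, h):
    """3-D poly6 smoothing kernel (Mueller et al. 2003); zero beyond radius h."""
    if r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h**9) * (h * h - r * r) ** 3


def densities(positions, masses, h):
    """rho_i = sum_j m_j * W(|x_i - x_j|, h), including the self term j == i.

    Each rho_i depends only on read-only inputs, so the outer loop is the
    part a GPU implementation would map to one thread per particle."""
    rho = []
    for xi in positions:
        s = 0.0
        for xj, mj in zip(positions, masses):
            s += mj * poly6(math.dist(xi, xj), h)
        rho.append(s)
    return rho
```

    For two unit-mass particles 0.5 apart with h = 1, each density is W(0) + W(0.5) = (315 / 64π)(1 + 0.75³); a particle farther than h away contributes nothing.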

  12. Non-Planar Nanotube and Wavy Architecture Based Ultra-High Performance Field Effect Transistors

    KAUST Repository

    Hanna, Amir

    2016-11-01

    This dissertation presents a unique concept for a device architecture named the nanotube (NT) architecture, which is capable of higher drive current than the Gate-All-Around Nanowire architecture when applied to heterostructure Tunnel Field Effect Transistors. Through the use of inner/outer core-shell gates, the heterostructure NT TFET leverages a physically larger tunneling area, thus achieving higher drive current (ION) and saving chip real estate by eliminating the arraying requirement. We discuss the physics of p-type (Silicon/Indium Arsenide) and n-type (Silicon/Germanium heterostructure) TFETs. Numerical TCAD simulations have shown that NT TFETs have 5× and 1.6× higher normalized ION than GAA NW TFETs for p-type and n-type TFETs, respectively. This is due to the larger tunneling junction cross-sectional area and lower Shockley-Read-Hall recombination, while achieving sub-60 mV/dec performance over more than 5 orders of magnitude of drain current, thus enabling Vdd to be scaled down to 0.5 V. This dissertation also introduces a novel thin-film transistor architecture named the Wavy Channel (WC) architecture, which extends device width by integrating vertical fin-like substrate corrugations, giving up to 50% larger device width without occupying extra chip area. The novel architecture shows 2× higher output drive current per unit chip area than the conventional planar architecture. The current increase is attributed to both the extra device width and a 50% enhancement in field effect mobility due to electrostatic gating effects. Digital circuits are fabricated to demonstrate the potential of integrating WC TFT based circuits. WC inverters have shown 2× the peak-to-peak output voltage for the same input, and ~2× the operation frequency of the planar inverters for the same peak-to-peak output voltage. WC NAND circuits have shown 2× higher peak-to-peak output voltage, and 3× lower high-to-low propagation

  13. A high level language for a high performance computer

    Science.gov (United States)

    Perrott, R. H.

    1978-01-01

    The proposed computational aerodynamic facility will join the ranks of the supercomputers due to its architecture and increased execution speed. At present, the languages used to program these supercomputers have been modifications of programming languages which were designed many years ago for sequential machines. A new programming language should be developed based on the techniques which have proved valuable for sequential programming languages and incorporating the algorithmic techniques required for these supercomputers. The design objectives for such a language are outlined.

  14. A performance model for the communication in fast multipole methods on high-performance computing platforms

    KAUST Repository

    Ibeid, Huda

    2016-03-04

    Exascale systems are predicted to have approximately 1 billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the currently dominant parallel programming model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. It is therefore of interest to model application performance and to understand what changes need to be made to ensure extrapolated scalability. The fast multipole method (FMM) was originally developed for accelerating N-body problems in astrophysics and molecular dynamics but has recently been extended to a wider range of problems. Its high arithmetic intensity, combined with its linear complexity and asynchronous communication patterns, makes it a promising algorithm for exascale systems. In this paper, we discuss the challenges for FMM on current parallel computers and future exascale architectures, with a focus on internode communication. We focus on the communication part only; the efficiency of the computational kernels is beyond the scope of the present study. We develop a performance model that considers the communication patterns of the FMM and observe a good match between our model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization of internode communication in FMM that validates the model against actual measurements of communication time. The ultimate communication model would be predictive in an absolute sense; however, on complex systems, this objective is often out of reach, or its difficulty is out of proportion to its benefit when a simpler model exists that is inexpensive and sufficient to guide coding decisions leading to improved scaling. The current model provides such guidance.
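    A common starting point for such a communication model is the latency-bandwidth (Hockney) form, in which every message costs a fixed latency α and every byte costs 1/β. The sketch below illustrates only that baseline. It is not the paper's model, which additionally accounts for network topology and multicore penalties, and the parameter values in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class Network:
    alpha: float  # per-message latency, seconds
    beta: float   # bandwidth, bytes per second


def comm_time(net, n_messages, n_bytes):
    """Hockney-style estimate: T = n_messages * alpha + n_bytes / beta."""
    return n_messages * net.alpha + n_bytes / net.beta


def total_time(net, phases):
    """Sum the estimate over communication phases, e.g. one per FMM tree
    level or exchange step; `phases` is a list of (n_messages, n_bytes)."""
    return sum(comm_time(net, m, b) for m, b in phases)


def crossover_bytes(net):
    """Message size at which latency and transfer time are equal
    (alpha * beta); smaller messages are latency-dominated."""
    return net.alpha * net.beta


if __name__ == "__main__":
    net = Network(alpha=1e-6, beta=1e9)  # hypothetical: 1 us latency, 1 GB/s
    print(comm_time(net, 10, 1_000_000), crossover_bytes(net))
```

    With these illustrative parameters, ten messages carrying 1 MB in total take about 1.01 ms, almost all of it transfer time, while messages below roughly 1 kB would be latency-bound. Distinguishing those regimes is the kind of coding guidance the abstract refers to.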

  15. DOE research in utilization of high-performance computers

    International Nuclear Information System (INIS)

    Buzbee, B.L.; Worlton, W.J.; Michael, G.; Rodrigue, G.

    1980-12-01

    Department of Energy (DOE) and other Government research laboratories depend on high-performance computer systems to accomplish their programmatic goals. As the most powerful computer systems become available, these laboratories acquire them so that advances can be made in their disciplines. These advances often result from added sophistication in numerical models whose execution is made possible by high-performance computer systems. However, high-performance computer systems have become increasingly complex, and it has consequently become increasingly difficult to realize their potential performance. The result is a need for research on issues related to the utilization of these systems. This report gives a brief description of high-performance computers, and then addresses the use of and future needs for high-performance computers within DOE, the growing complexity of applications within DOE, and areas of high-performance computer systems warranting research. 1 figure

  16. The Jupyter/IPython architecture: a unified view of computational research, from interactive exploration to communication and publication.

    Science.gov (United States)

    Ragan-Kelley, M.; Perez, F.; Granger, B.; Kluyver, T.; Ivanov, P.; Frederic, J.; Bussonnier, M.

    2014-12-01

    IPython has provided terminal-based tools for interactive computing in Python since 2001. The notebook document format and multi-process architecture introduced in 2011 have expanded the applicable scope of IPython into teaching, presenting, and sharing computational work, in addition to interactive exploration. The new architecture also allows users to work in any language, with implementations in Python, R, Julia, Haskell, and several other languages. The language agnostic parts of IPython have been renamed to Jupyter, to better capture the notion that a cross-language design can encapsulate commonalities present in computational research regardless of the programming language being used. This architecture offers components like the web-based Notebook interface, that supports rich documents that combine code and computational results with text narratives, mathematics, images, video and any media that a modern browser can display. This interface can be used not only in research, but also for publication and education, as notebooks can be converted to a variety of output formats, including HTML and PDF. Recent developments in the Jupyter project include a multi-user environment for hosting notebooks for a class or research group, a live collaboration notebook via Google Docs, and better support for languages other than Python.

  17. Analysis of Critical Characteristics for Safety Graded Personnel Computers in the KNICS Architecture

    International Nuclear Information System (INIS)

    Lee, Hyun Chul; Lee, Dong Young

    2009-01-01

    Critical characteristics analysis of a safety-related item identifies the characteristics that must be verified before the original item can be replaced with a dedicated item. A dedicated item that meets the critical characteristics can be expected to perform its intended safety function in place of the specified item. The KNICS project developed two safety systems: IDiPS RPS (Reactor Protection System) and IDiPS ESF-CCS (Engineered Safety Features-Component Control System). Both IDiPS safety systems are equipped with personnel computers, so-called COMs (Cabinet Operator Modules), in their cabinets. The personnel computers (COMs) are responsible for safety system monitoring, testing, and maintenance. Even though the two safety systems are safety-critical, their personnel computers (COMs) are not graded as safety items. Because regulatory requirements are expected to be strengthened, and the functions of the personnel computers may be enhanced to include safety-related functions and safety functions, it may become necessary to raise the personnel computers to a higher grade, the safety grade. To upgrade a non-safety system such as the COMs to a safety system, its safety functions and requirements, i.e. its critical characteristics, must be identified and verified. This paper describes the process of identifying the critical characteristics and the results of the analysis.

  18. Management of microbial community composition, architecture and performance in autotrophic nitrogen removing bioreactors through aeration regimes

    DEFF Research Database (Denmark)

    Mutlu, A. Gizem

    to describe aggregation and architectural evolution in nitritation/anammox reactors, incorporating the possible influences of intermediates formed with intermittent aeration. Community analysis revealed an abundant fraction of heterotrophic types despite the absence of organic carbon in the feed. The aerobic...... and anaerobic ammonia oxidizing guilds were dominated by fast-growing Nitrosomonas spp. and Ca. Brocadia spp., while the nitrite oxidizing guild was dominated by high affinity Nitrospira spp. Emission of nitrous oxide (N2O) was evaluated from both reactors under dynamic aeration regimes. Contrary to the widely...... impacts could be isolated, increasing process understanding. It was demonstrated that aeration strategy can be used as a powerful tool to manipulate the microbial community composition, its architecture and reactor performance. We suggest operation via intermittent aeration with short aerated periods...

  19. Performance of 3-D architecture silicon sensors after intense proton irradiation

    CERN Document Server

    Parker, S I

    2001-01-01

    Silicon detectors with a three-dimensional architecture, in which the n- and p-electrodes penetrate through the entire substrate, have been successfully fabricated. The electrodes can be separated from each other by distances that are less than the substrate thickness, allowing short collection paths, low depletion voltages, and large current signals from rapid charge collection. While no special hardening steps were taken in this initial fabrication run, these features of three dimensional architectures produce an intrinsic resistance to the effects of radiation damage. Some performance measurements are given for detectors that are fully depleted and working after exposures to proton beams with doses equivalent to that from slightly more than ten years at the B-layer radius (50 mm) in the planned Atlas detector at the Large Hadron Collider at CERN. (41 refs).

  20. An open source/real-time atomic force microscope architecture to perform customizable force spectroscopy experiments.

    Science.gov (United States)

    Materassi, Donatello; Baschieri, Paolo; Tiribilli, Bruno; Zuccheri, Giampaolo; Samorì, Bruno

    2009-08-01

    We describe the realization of an atomic force microscope architecture designed to perform customizable experiments in a flexible and automatic way. Novel technological contributions are given by the software implementation platform (RTAI-LINUX), which is free and open source, and, from a functional point of view, by the implementation of hard real-time control algorithms. Some other technical solutions, such as a new way to estimate the optical lever constant, are described as well. The adoption of this architecture provides many degrees of freedom in the device behavior and, furthermore, allows one to obtain a flexible experimental instrument at a relatively low cost. In particular, we show how such a system has been employed to obtain measurements in sophisticated single-molecule force spectroscopy experiments [Fernandez and Li, Science 303, 1674 (2004)]. Experimental results on proteins already studied using the same methodologies are provided in order to show the reliability of the measurement system.

  1. Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture

    Energy Technology Data Exchange (ETDEWEB)

    Panyala, Ajay; Chavarría-Miranda, Daniel; Manzano, Joseph B.; Tumeo, Antonino; Halappanavar, Mahantesh

    2017-06-01

    High performance, parallel applications with irregular data accesses are becoming a critical workload class for modern systems. In particular, the execution of such workloads on emerging many-core systems is expected to be a significant component of applications in data mining, machine learning, scientific computing and graph analytics. However, power and energy constraints limit the capabilities of individual cores, memory hierarchy and on-chip interconnect of such systems, thus leading to architectural and software trade-offs that must be understood in the context of the intended application’s behavior. Irregular applications are notoriously hard to optimize given their data-dependent access patterns, lack of structured locality and complex data structures and code patterns. We have ported two irregular applications, graph community detection using the Louvain method (Grappolo) and high-performance conjugate gradient (HPCCG), to the Tilera many-core system and have conducted a detailed study of platform-independent and platform-specific optimizations that improve their performance as well as reduce their overall energy consumption. To conduct this study, we employ an auto-tuning based approach that explores the optimization design space along three dimensions: memory layout schemes, GCC compiler flag choices and OpenMP loop scheduling options. We leverage MIT’s OpenTuner auto-tuning framework to explore and recommend energy optimal choices for different combinations of parameters. We then conduct an in-depth architectural characterization to understand the memory behavior of the selected workloads. Finally, we perform a correlation study to demonstrate the interplay between the hardware behavior and application characteristics. Using auto-tuning, we demonstrate whole-node energy savings and performance improvements of up to 49.6% and 60% relative to a baseline instantiation, and up to 31% and 45.4% relative to manually optimized variants.
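    The three-dimensional design space described above can be made concrete with a brute-force search sketch. This is not how the study's tool works: OpenTuner searches with an ensemble of techniques rather than enumerating every point. The option lists and the `measure` callback are hypothetical placeholders; in practice `measure` would rebuild and run the benchmark while reading a node energy counter.

```python
import itertools

# Hypothetical option lists for the three tuning dimensions.
LAYOUTS = ["aos", "soa"]                          # memory layout schemes
FLAGS = ["-O2", "-O3", "-O3 -funroll-loops"]      # GCC flag choices
SCHEDULES = ["static", "dynamic,64", "guided"]    # OpenMP schedule clauses


def exhaustive_tune(measure, space):
    """Evaluate every point of the Cartesian product of `space` and return
    (best_config, (energy, runtime)), minimising energy first and breaking
    ties on runtime; on an exact tie, the first config found is kept.
    `measure(config)` must return (energy_joules, runtime_seconds)."""
    best_cfg, best = None, None
    for cfg in itertools.product(*space):
        score = measure(cfg)
        if best is None or score < best:
            best_cfg, best = cfg, score
    return best_cfg, best
```

    Exhaustive enumeration is only feasible here because the toy space has 2 × 3 × 3 = 18 points; a real flag space grows combinatorially, which is why search-based tuners are used.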

  2. Stability and performance of propulsion control systems with distributed control architectures and failures

    Science.gov (United States)

    Belapurkar, Rohit K.

    Future aircraft engine control systems will be based on a distributed architecture, in which the sensors and actuators will be connected to the Full Authority Digital Engine Control (FADEC) through an engine area network. A distributed engine control architecture will allow the implementation of advanced, active control techniques along with weight reduction, improved performance and lower life cycle cost. The performance of a distributed engine control system depends predominantly on the performance of the communication network. Because of the serial data transmission policy, network-induced time delays and sampling jitter are introduced between the sensor/actuator nodes and the distributed FADEC. Communication network faults and transient node failures may result in data dropouts, which may not only degrade control system performance but may even destabilize the engine control system. Three different architectures for a turbine engine control system based on a distributed framework are presented. A partially distributed control system for a turbo-shaft engine is designed based on the ARINC 825 communication protocol. Stability conditions and a control design methodology are developed for the proposed partially distributed turbo-shaft engine control system to guarantee the desired performance in the presence of network-induced time delay and random data loss due to transient sensor/actuator failures. A fault-tolerant control design methodology is proposed that benefits from the availability of additional system bandwidth and from the broadcast feature of the data network. It is shown that a reconfigurable fault-tolerant control design can help to reduce the performance degradation in the presence of node failures. A T-700 turbo-shaft engine model is used to validate the proposed control methodology based on both single-input and multiple-input multiple-output control design techniques.
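    The claim that data dropouts can destabilize a loop can be seen even in a one-state toy model. This is an illustrative sketch only, not the T-700 model or the paper's stability conditions: the scalar plant, the static gain, the periodic loss pattern and the hold-last-value policy are all assumptions.

```python
def simulate(a, b, k_gain, x0, delivered, steps):
    """Scalar loop x[k+1] = a*x[k] + b*u[k] with u[k] = -k_gain * x_hat[k].

    delivered[k % len(delivered)] is False when the sensor packet at step k
    is lost; the controller then holds the last value it received."""
    x, x_hat = x0, x0
    traj = [x]
    for k in range(steps):
        if delivered[k % len(delivered)]:
            x_hat = x  # fresh measurement arrived over the network
        u = -k_gain * x_hat
        x = a * x + b * u
        traj.append(x)
    return traj


if __name__ == "__main__":
    print(simulate(1.0, 1.0, 1.5, 1.0, [True], 6))         # no losses
    print(simulate(1.0, 1.0, 1.5, 1.0, [True, False], 6))  # every other packet lost
```

    With gain 1.5 and no losses, the state shrinks by a factor of 0.5 in magnitude each step. If every other packet is lost, the same controller acts on stale data and the state doubles in magnitude every two steps, so the loop is destabilized.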

  3. Performance of Air Pollution Models on Massively Parallel Computers

    DEFF Research Database (Denmark)

    Brown, John; Hansen, Per Christian; Wasniewski, Jerzy

    1996-01-01

    To compare the performance and use of three massively parallel SIMD computers, we implemented a large air pollution model on each of them. Using a realistic large-scale model, we gained detailed insight into the performance of the three computers when used to solve large-scale scientific problems...

  4. High Performance Computing Facility Operational Assessment 2015: Oak Ridge Leadership Computing Facility

    Energy Technology Data Exchange (ETDEWEB)

    Barker, Ashley D.; Bernholdt, David E.; Bland, Arthur S.; Gary, Jeff D.; Hack, James J.; McNally, Stephen T.; Rogers, James H.; Smith, Brian E.; Straatsma, T. P.; Sukumar, Sreenivas Rangan; Thach, Kevin G.; Tichenor, Suzy; Vazhkudai, Sudharshan S.; Wells, Jack C. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility]

    2016-03-01

    Oak Ridge National Laboratory’s (ORNL’s) Leadership Computing Facility (OLCF) continues to surpass its operational target goals: supporting users; delivering fast, reliable systems; creating innovative solutions for high-performance computing (HPC) needs; and managing the risk, safety, and security aspects of operating one of the most powerful computers in the world. The results can be seen in the cutting-edge science delivered by users and the praise from the research community. Calendar year (CY) 2015 was filled with outstanding operational results and accomplishments: a very high user rating for overall satisfaction that ties the highest-ever mark, set in CY 2014; the greatest number of core-hours delivered to research projects; the largest percentage of capability usage since the OLCF began tracking the metric in 2009; and success in delivering on the allocation of 60, 30, and 10% of core hours offered for the INCITE (Innovative and Novel Computational Impact on Theory and Experiment), ALCC (Advanced Scientific Computing Research Leadership Computing Challenge), and Director’s Discretionary programs, respectively. These accomplishments, coupled with the extremely high utilization rate, represent the fulfillment of the promise of Titan: maximum use by maximum-size simulations. The impact of all of these successes and more is reflected in the accomplishments of OLCF users, with publications this year in notable journals including Nature, Nature Materials, Nature Chemistry, Nature Physics, Nature Climate Change, ACS Nano, Journal of the American Chemical Society, and Physical Review Letters, as well as many others. The achievements included in the 2015 OLCF Operational Assessment Report reflect first-ever or largest simulations in their communities; for example, Titan enabled engineers in Los Angeles and the surrounding region to design and begin building improved critical infrastructure by making possible the highest-resolution CyberShake map for Southern

  5. Building a High Performance Computing Infrastructure for Novosibirsk Scientific Center

    International Nuclear Information System (INIS)

    Adakin, A; Chubarov, D; Nikultsev, V; Belov, S; Kaplin, V; Sukharev, A; Zaytsev, A; Kalyuzhny, V; Kuchin, N; Lomakin, S

    2011-01-01

    Novosibirsk Scientific Center (NSC), also known worldwide as Akademgorodok, is one of the largest Russian scientific centers, hosting Novosibirsk State University (NSU) and more than 35 research organizations of the Siberian Branch of the Russian Academy of Sciences, including the Budker Institute of Nuclear Physics (BINP), the Institute of Computational Technologies (ICT), and the Institute of Computational Mathematics and Mathematical Geophysics (ICM and MG). Because each institute has specific requirements for the architecture of the computing farms used in its research field, NSC institutes currently host several computing facilities, each optimized for a particular set of tasks; the largest are the NSU Supercomputer Center, the Siberian Supercomputer Center (ICM and MG), and the BINP Grid Computing Facility. A dedicated optical network with an initial bandwidth of 10 Gbps was recently built to connect these three facilities, making it possible to share computing resources among the research communities of the participating institutes and thus providing a common platform for building the computing infrastructure for various scientific projects. The computing infrastructure is unified through extensive use of virtualization technologies based on the Xen and KVM platforms. The solution was tested thoroughly within the computing environment of the KEDR detector experiment, which is being carried out at BINP, and is expected to be applied to the use cases of other HEP experiments in the near future.

  6. The ongoing investigation of high performance parallel computing in HEP

    CERN Document Server

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized, and a list of R&D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated in light of the computing-power requirements of the LHC experiments and current trends in the computer industry. Four main themes are discussed: possibilities for finer-grained parallelism; fine-grained communication mechanisms; usable parallel programming environments; and different programming models and architectures using standard commercial products. Parallel computing technology is potentially of interest for offline applications and vital for real-time applications at the LHC. A substantial investment in application development and in the evaluation of state-of-the-art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  7. X-Ray Computed Tomography Reveals the Response of Root System Architecture to Soil Texture

    Science.gov (United States)

    Rogers, Eric D.; Monaenkova, Daria; Mijar, Medhavinee; Goldman, Daniel I.

    2016-01-01

    Root system architecture (RSA) impacts plant fitness and crop yield by facilitating efficient nutrient and water uptake from the soil. A better understanding of the effects of soil on RSA could improve crop productivity by matching roots to their soil environment. We used x-ray computed tomography to perform a detailed three-dimensional quantification of changes in rice (Oryza sativa) RSA in response to the physical properties of a granular substrate. We characterized the RSA of eight rice cultivars in five different growth substrates and determined that RSA is the result of interactions between genotype and growth environment. We identified cultivar-specific changes in RSA in response to changing growth substrate texture. The cultivar Azucena exhibited low RSA plasticity in all growth substrates, whereas cultivar Bala root depth was a function of soil hardness. Our imaging techniques provide a framework to study RSA in different growth environments, the results of which can be used to improve root traits with agronomic potential. PMID:27208237

  8. An AmI-Based Software Architecture Enabling Evolutionary Computation in Blended Commerce: The Shopping Plan Application

    Directory of Open Access Journals (Sweden)

    Giuseppe D’Aniello

    2015-01-01

    This work describes an approach to synergistically exploiting ambient intelligence technologies, mobile devices, and evolutionary computation in order to support blended commerce or ubiquitous commerce scenarios. The work proposes a software architecture consisting of three main components: linked data for e-commerce, cloud-based services, and mobile apps. The three components implement a scenario in which a shopping mall is presented as an intelligent environment where customers use the NFC capabilities of their smartphones to handle e-coupons produced, suggested, and consumed by that environment. The main function of the intelligent environment is to help customers define shopping plans that minimize the overall shopping cost by looking for the best prices, discounts, and coupons. The paper proposes a genetic algorithm to find suboptimal solutions to the shopping plan problem in a highly dynamic context, where the final cost of a product for an individual customer depends on that customer's previous purchases. In particular, the work provides details on the Shopping Plan software prototype and some experimental results showing the overall performance of the genetic algorithm.
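    A genetic algorithm for a history-dependent shopping plan can be sketched as follows. This is not the paper's algorithm or data model: the price table, the coupon rule (a discount once a store has already sold `threshold` items to the customer), and every function name are hypothetical. A plan assigns each item to a store; selection is by tournament, with one-point crossover, per-gene mutation, and elitism.

```python
import itertools
import random


def plan_cost(plan, prices, threshold=2, discount=0.2):
    """Total cost of buying item i at store plan[i], in order.

    Hypothetical coupon rule: once `threshold` items have already been bought
    at a store, later items there get `discount` off, so each item's cost
    depends on the customer's previous purchases.
    """
    bought = {}
    total = 0.0
    for item, store in enumerate(plan):
        price = prices[item][store]
        if bought.get(store, 0) >= threshold:
            price *= 1.0 - discount
        total += price
        bought[store] = bought.get(store, 0) + 1
    return total


def brute_force(prices):
    """Exhaustive search; only feasible for tiny instances, useful as a check."""
    n_items, n_stores = len(prices), len(prices[0])
    best = min(itertools.product(range(n_stores), repeat=n_items),
               key=lambda p: plan_cost(p, prices))
    return list(best), plan_cost(best, prices)


def genetic_plan(prices, pop_size=30, generations=100, mutation_rate=0.1, seed=1):
    rng = random.Random(seed)
    n_items, n_stores = len(prices), len(prices[0])
    # Seed with the greedy plan (cheapest store per item) plus random plans.
    greedy = [min(range(n_stores), key=lambda s: prices[i][s]) for i in range(n_items)]
    population = [greedy] + [[rng.randrange(n_stores) for _ in range(n_items)]
                             for _ in range(pop_size - 1)]

    def tournament():
        a, b = rng.sample(population, 2)
        return a if plan_cost(a, prices) <= plan_cost(b, prices) else b

    for _ in range(generations):
        best = min(population, key=lambda p: plan_cost(p, prices))
        children = [best[:]]  # elitism: the best plan survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_items) if n_items > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            for i in range(n_items):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_stores)
            children.append(child)
        population = children
    best = min(population, key=lambda p: plan_cost(p, prices))
    return best, plan_cost(best, prices)
```

    Seeding with the greedy plan and keeping the elite means the returned plan is never worse than buying each item at its individually cheapest store; because of the history-dependent coupon, it can be better.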

  9. Peregrine System | High-Performance Computing | NREL

    Science.gov (United States)

    classes of nodes that users access: Login Nodes Peregrine has four login nodes, each of which has Intel E5 /scratch file systems, the /mss file system is mounted on all login nodes. Compute Nodes Peregrine has 2592

  10. Cloud/Fog Computing System Architecture and Key Technologies for South-North Water Transfer Project Safety

    Directory of Open Access Journals (Sweden)

    Yaoling Fan

    2018-01-01

    In view of the real-time and distributed features of Internet of Things (IoT) safety systems in water conservancy engineering, this study proposed a new safety system architecture for water conservancy engineering based on cloud/fog computing and put forward a data reliability detection method to address false alarms caused by false abnormal data from the bottom-layer sensors. Designed for the South-North Water Transfer Project (SNWTP), the architecture integrated project safety, water quality safety, and human safety. Using IoT devices, a fog computing layer was constructed between the cloud server and the safety detection devices in water conservancy projects. Technologies such as real-time sensing, intelligent processing, and information interconnection were developed. As a result, accurate forecasting, accurate positioning, and efficient management were implemented as required for safety prevention in the SNWTP, the safety protection of water conservancy projects was effectively improved, and intelligent water conservancy engineering was developed.

  11. STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens

    International Nuclear Information System (INIS)

    Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D.; Volz, Kerstin

    2017-01-01

    Highlights: • We present STEMsalabim, a modern implementation of the multislice algorithm for simulation of STEM images. • Our package is highly parallelizable on high-performance computing clusters, combining shared and distributed memory architectures. • With STEMsalabim, computationally and memory expensive STEM image simulations can be carried out within reasonable time. - Abstract: We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.

  12. STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens

    Energy Technology Data Exchange (ETDEWEB)

    Oelerich, Jan Oliver, E-mail: jan.oliver.oelerich@physik.uni-marburg.de; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D.; Volz, Kerstin

    2017-06-15

    Highlights: • We present STEMsalabim, a modern implementation of the multislice algorithm for simulation of STEM images. • Our package is highly parallelizable on high-performance computing clusters, combining shared and distributed memory architectures. • With STEMsalabim, computationally and memory expensive STEM image simulations can be carried out within reasonable time. - Abstract: We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.

  13. Performance Evaluation of 14 Neural Network Architectures Used for Predicting Heat Transfer Characteristics of Engine Oils

    Science.gov (United States)

    Al-Ajmi, R. M.; Abou-Ziyan, H. Z.; Mahmoud, M. A.

    2012-01-01

    This paper reports the results of a comprehensive study aimed at identifying the best neural network architecture and parameters for predicting the subcooled boiling characteristics of engine oils. A total of 57 different neural networks (NNs), derived from 14 different NN architectures, were evaluated for four different prediction cases. The NNs were trained on experimental datasets obtained for five engine oils of different chemical compositions. The performance of each NN was evaluated using a rigorous statistical analysis as well as careful examination of the smoothness of the predicted boiling curves. One of the 57 NNs evaluated correctly predicted the boiling curves for all cases considered, either for individual oils or for all oils taken together. It was found that the pattern selection and weight update techniques strongly affect the performance of the NNs. It was also revealed that descriptive statistical analysis, such as R², mean error, standard deviation, and T and slope tests, is a necessary but not sufficient condition for evaluating NN performance. The performance criteria should also include inspection of the smoothness of the predicted curves, either visually or by plotting the slopes of these curves.
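    The point that a good R² does not guarantee a smooth predicted curve can be checked numerically. The sketch below computes R² = 1 - SS_res/SS_tot and then counts sign reversals in the finite-difference slopes of the predicted curve. The slope-reversal count is an illustrative stand-in for "plotting the slopes", not the paper's exact criterion, and it assumes the curve segment being checked is expected to be monotone.

```python
def r_squared(observed, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def slopes(x, y):
    """Finite-difference slopes dy/dx between consecutive curve points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


def slope_sign_changes(x, y):
    """Count slope sign reversals; on a segment expected to be monotone,
    each reversal flags a non-physical wiggle in the predicted curve."""
    s = slopes(x, y)
    return sum(1 for a, b in zip(s, s[1:]) if a * b < 0)
```

    For example, with hypothetical data x = [0, 1, 2, 3, 4], observed = [0, 1, 2, 3, 4], and predicted = [0, 1.2, 1.1, 3.1, 4.0], R² is about 0.91, yet the predicted curve reverses slope twice.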

  14. The role of FFM accumulation and skeletal muscle architecture in powerlifting performance.

    Science.gov (United States)

    Brechue, William F; Abe, Takashi

    2002-02-01

    The purpose of this study was to determine the distribution and architectural characteristics of skeletal muscle in elite powerlifters, and to investigate their relationship to fat-free mass (FFM) accumulation and powerlifting performance. Twenty elite male powerlifters (including four world and three US national champions) volunteered for this study. FFM, skeletal muscle distribution (muscle thickness at 13 anatomical sites), and isolated muscle thickness and fascicle pennation angle (PAN) of the triceps long-head (TL), vastus lateralis, and gastrocnemius medialis (MG) muscles were measured with B-mode ultrasound. Fascicle length (FAL) was calculated. Best lifting performance in the bench press (BP), squat lift (SQT), and dead lift (DL) was recorded from competition performance. Significant correlations (P FFM and FFM relative to standing height (r = 0.86 to 0.95, P FFM (r = 0.59, P FFM and, therefore, may be limited by the ability to accumulate FFM. Additionally, muscle architecture appears to play an important role in powerlifting performance in that greater fascicle lengths are associated with greater FFM accumulation and powerlifting performance.
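    The abstract states only that fascicle length was calculated. A common ultrasound estimate, assumed here rather than taken from the abstract, treats the fascicle as a straight line spanning the muscle thickness between the aponeuroses, giving FL = MT / sin(PA). A minimal sketch:

```python
import math


def fascicle_length(muscle_thickness, pennation_angle_deg):
    """Estimate fascicle length as FL = MT / sin(PA).

    Assumes a straight fascicle spanning the full muscle thickness between
    the superficial and deep aponeuroses (an assumed geometric model).
    The result has the same unit as muscle_thickness.
    """
    return muscle_thickness / math.sin(math.radians(pennation_angle_deg))
```

    With hypothetical values, a muscle thickness of 2.0 cm and a pennation angle of 30° give an estimated fascicle length of about 4.0 cm; at a fixed thickness, a smaller pennation angle implies a longer fascicle.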

  15. FPGA hardware acceleration for high performance neutron transport computation based on agent methodology - 318

    International Nuclear Information System (INIS)

    Shanjie, Xiao; Tatjana, Jevremovic

    2010-01-01

    Accurate, detailed 3D neutron transport analysis for Gen-IV reactors is still time-consuming, even with the advanced computational hardware available in developed countries. This paper introduces a new approach to reducing computational time while preserving detailed and accurate modeling: a specifically designed FPGA co-processor accelerates the robust AGENT methodology for complex reactor geometries. This is the first time this approach has been applied to accelerate neutronics analysis. The AGENT methodology solves the neutron transport equation using the method of characteristics. The performance of the AGENT methodology was carefully analyzed before the hardware design based on the FPGA co-processor was adopted. The most time-consuming kernel is then ported to the FPGA co-processor. The FPGA co-processor is designed with a data-flow-driven, non-von Neumann architecture and is much more efficient than a conventional computer architecture. Details of the FPGA co-processor design are introduced, and the design is benchmarked using two different examples. The advanced chip architecture enables the FPGA co-processor to achieve a speedup of more than 20 times, even though its operating frequency is much lower than the CPU frequency. (authors)
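    In the method of characteristics, the transport equation reduces along each ray to dψ/ds + Σt ψ = Q. With a flat source on a segment of optical length τ = Σt·L, the exact update is ψ_out = ψ_in·e^(-τ) + (Q/Σt)(1 - e^(-τ)). The sketch below shows this generic per-segment sweep, the kind of repetitive exponential-attenuation loop that typically dominates MOC run time; it is not the AGENT code or its FPGA kernel, and it assumes Σt > 0 on every segment.

```python
import math


def moc_segment(psi_in, sigma_t, q, length):
    """Flat-source MOC update across one ray segment.

    Solves d(psi)/ds + sigma_t * psi = q exactly over `length`,
    assuming sigma_t > 0 and a constant source q on the segment.
    """
    tau = sigma_t * length  # optical thickness of the segment
    one_minus_exp = -math.expm1(-tau)  # 1 - exp(-tau), accurate for small tau
    return psi_in * math.exp(-tau) + (q / sigma_t) * one_minus_exp


def sweep_ray(psi_in, segments):
    """Sweep one characteristic through (sigma_t, q, length) segments.

    Returns the outgoing angular flux after each segment.
    """
    psi = psi_in
    out = []
    for sigma_t, q, length in segments:
        psi = moc_segment(psi, sigma_t, q, length)
        out.append(psi)
    return out
```

    Each segment update depends only on the incoming flux and local data, so many rays can be swept independently, which is one reason MOC sweeps map naturally onto pipelined, data-flow hardware.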

  16. Improving the Performance of CPU Architectures by Reducing the Operating System Overhead (Extended Version)

    Directory of Open Access Journals (Sweden)

    Zagan Ionel

    2016-07-01

    Hard real-time tasks running on predictable CPU architectures must be executed in isolation in order to provide timing-analyzable execution for real-time systems. The major problems for real-time operating systems are caused by excessive jitter, introduced mainly through task switching. This jitter can compromise deadline requirements and, consequently, the predictability of hard real-time tasks. New requirements also arise for real-time operating systems used in mixed-criticality systems, where the execution of hard real-time applications requires timing predictability. The present article discusses several solutions for improving the performance of CPU architectures and ultimately overcoming operating system overhead. This paper focuses on an innovative CPU implementation named nMPRA-MT, designed for small real-time applications. This implementation replicates and remaps the program counter, general-purpose registers, and pipeline registers, enabling multiple threads to share a single pipeline. To increase predictability, the proposed architecture partially removes pipeline hazards at the expense of a larger execution latency per instruction.

  17. Wavy channel thin film transistor architecture for area efficient, high performance and low power displays

    KAUST Repository

    Hanna, Amir

    2013-12-23

    We demonstrate a new thin film transistor (TFT) architecture, termed the wavy channel (WC) architecture, that expands the device width using continuous fin features. This architecture expands the transistor width in a direction perpendicular to the substrate, so it consumes no extra chip area and is therefore area efficient. Using atomic layer deposition based zinc oxide (ZnO) as the channel material, the devices show that a 13% increase in device width results in up to a 2.5× increase in the 'ON' current of the WCTFT compared with planar devices consuming the same chip area. The WCTFT devices also maintain an 'OFF' current similar to that of planar devices, ~100 pA, and thus do not trade power consumption for performance, as usually happens with wider devices. This work offers an interesting opportunity to use WCTFTs as backplane circuitry for large-area, high-resolution display applications. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  18. SAME4HPC: A Promising Approach in Building a Scalable and Mobile Environment for High-Performance Computing

    Energy Technology Data Exchange (ETDEWEB)

    Karthik, Rajasekar [ORNL]

    2014-01-01

    In this paper, an architecture with spatial capabilities for building a Scalable And Mobile Environment For High-Performance Computing, called SAME4HPC, is described; it uses cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there are a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both types of devices to provide high performance and a rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks are some of the key open-source and industry-standard practices that have been adopted in this architecture.

  19. Performing an allreduce operation on a plurality of compute nodes of a parallel computer

    Science.gov (United States)

    Faraj, Ahmad [Rochester, MN]

    2012-04-17

    Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
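    The two-phase scheme described above (a global allreduce within each logical ring, followed by a local allreduce among the cores of each node) can be simulated sequentially. This is a sketch under stated assumptions, not the patented implementation or an MPI API: each ring contains exactly one core per node (ring j links core j of every node), the reduction operator is summation, and message passing is modeled as list indexing. The function names are hypothetical.

```python
def ring_allreduce_sum(values):
    """Sum-allreduce over one logical ring, simulated step by step.

    values[r] is ring member r's contribution. In each of n - 1 steps every
    member forwards the value it last received to its right neighbour and
    adds what it receives from its left neighbour to its running total.
    """
    n = len(values)
    totals = list(values)     # running sum held by each member
    in_flight = list(values)  # value each member forwards this step
    for _ in range(n - 1):
        received = [in_flight[(r - 1) % n] for r in range(n)]
        totals = [t + v for t, v in zip(totals, received)]
        in_flight = received
    return totals


def allreduce_on_rings(contrib):
    """contrib[node][core] is one processing core's contribution.

    Phase 1: global allreduce within each logical ring (core j on every node).
    Phase 2: local allreduce over the ring results held by each node's cores.
    Returns the final result held by every core, indexed [node][core].
    """
    n_nodes, n_cores = len(contrib), len(contrib[0])
    ring_result = [[0] * n_cores for _ in range(n_nodes)]
    for core in range(n_cores):
        reduced = ring_allreduce_sum([contrib[node][core] for node in range(n_nodes)])
        for node in range(n_nodes):
            ring_result[node][core] = reduced[node]
    return [[sum(ring_result[node])] * n_cores for node in range(n_nodes)]
```

    Because every ring carries one core from each node, the rings can proceed concurrently on separate cores, and the final on-node step needs only local communication.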

  20. Micro-computed tomography assessment of human alveolar bone: bone density and three-dimensional micro-architecture.

    Science.gov (United States)

    Kim, Yoon Jeong; Henkin, Jeffrey

    2015-04-01

    Micro-computed tomography (micro-CT) is a valuable means to evaluate and secure information related to bone density and quality in human necropsy samples and small live animals. The aim of this study was to assess the bone density of the alveolar jaw bones in human cadavers, using micro-CT. The correlation between bone density and three-dimensional micro architecture of trabecular bone was evaluated. Thirty-four human cadaver jaw bone specimens were harvested. Each specimen was scanned with micro-CT at a resolution of 10.5 μm. The bone volume fraction (BV/TV) and the bone mineral density (BMD) value within a volume of interest were measured. The three-dimensional micro architecture of trabecular bone was assessed. All the parameters in the maxilla and the mandible were subject to comparison. The variables for the bone density and the three-dimensional micro architecture were analyzed for nonparametric correlation using Spearman's rho at the significance level of p architecture parameters were consistently higher in the mandible, up to 3.3 times greater than those in the maxilla. The most linear correlation was observed between BV/TV and BMD, with Spearman's rho = 0.99 (p = .01). Both BV/TV and BMD were highly correlated with all micro architecture parameters with Spearman's rho above 0.74 (p = .01). Two aspects of bone density using micro-CT, the BV/TV and BMD, are highly correlated with three-dimensional micro architecture parameters, which represent the quality of trabecular bone. This noninvasive method may adequately enhance evaluation of the alveolar bone. © 2013 Wiley Periodicals, Inc.
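    Spearman's rho, the nonparametric correlation used above, is the Pearson correlation of the rank-transformed data, with tied values given the average of their ranks. A minimal sketch follows; it computes rho only (no p-value), and any variable names in a usage (for example BV/TV and BMD arrays) would be hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

    Because only ranks matter, any strictly increasing relationship (for example y = x²) gives rho = 1 even though it is not linear.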