WorldWideScience

Sample records for high-performance computing architectures

  1. A High Performance COTS Based Computer Architecture

    Patte, Mathieu; Grimoldi, Raoul; Trautner, Roland

    2014-08-01

    Using Commercial Off The Shelf (COTS) electronic components for space applications is a long standing idea. Indeed the difference in processing performance and energy efficiency between radiation hardened components and COTS components is so important that COTS components are very attractive for use in mass and power constrained systems. However using COTS components in space is not straightforward as one must account with the effects of the space environment on the COTS components behavior. In the frame of the ESA funded activity called High Performance COTS Based Computer, Airbus Defense and Space and its subcontractor OHB CGS have developed and prototyped a versatile COTS based architecture for high performance processing. The rest of the paper is organized as follows: in a first section we will start by recapitulating the interests and constraints of using COTS components for space applications; then we will briefly describe existing fault mitigation architectures and present our solution for fault mitigation based on a component called the SmartIO; in the last part of the paper we will describe the prototyping activities executed during the HiP CBC project.

  2. Benchmarking high performance computing architectures with CMS’ skeleton framework

    Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.

    2017-10-01

    In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.

  3. A High Performance VLSI Computer Architecture For Computer Graphics

    Chin, Chi-Yuan; Lin, Wen-Tai

    1988-10-01

    A VLSI computer architecture, consisting of multiple processors, is presented in this paper to satisfy the modern computer graphics demands, e.g. high resolution, realistic animation, real-time display etc.. All processors share a global memory which are partitioned into multiple banks. Through a crossbar network, data from one memory bank can be broadcasted to many processors. Processors are physically interconnected through a hyper-crossbar network (a crossbar-like network). By programming the network, the topology of communication links among processors can be reconfigurated to satisfy specific dataflows of different applications. Each processor consists of a controller, arithmetic operators, local memory, a local crossbar network, and I/O ports to communicate with other processors, memory banks, and a system controller. Operations in each processor are characterized into two modes, i.e. object domain and space domain, to fully utilize the data-independency characteristics of graphics processing. Special graphics features such as 3D-to-2D conversion, shadow generation, texturing, and reflection, can be easily handled. With the current high density interconnection (MI) technology, it is feasible to implement a 64-processor system to achieve 2.5 billion operations per second, a performance needed in most advanced graphics applications.

  4. Architecture and Programming Models for High Performance Intensive Computation

    2016-06-29

    commands from the data processing center to the sensors is needed. It has been noted that the ubiquity of mobile communication devices offers the...commands from a Processing Facility by way of mobile Relay Stations. The activity of each component of this model other than the Merge module can be...evaluation of the initial system implementation. Gao also was in charge of the development of Fresh Breeze architecture backend on new many-core computers

  5. High-performance computing on the Intel Xeon Phi how to fully exploit MIC architectures

    Wang, Endong; Shen, Bo; Zhang, Guangyong; Lu, Xiaowei; Wu, Qing; Wang, Yajuan

    2014-01-01

    The aim of this book is to explain to high-performance computing (HPC) developers how to utilize the Intel® Xeon Phi™ series products efficiently. To that end, it introduces some computing grammar, programming technology and optimization methods for using many-integrated-core (MIC) platforms and also offers tips and tricks for actual use, based on the authors' first-hand optimization experience.The material is organized in three sections. The first section, "Basics of MIC", introduces the fundamentals of MIC architecture and programming, including the specific Intel MIC programming environment

  6. Confabulation Based Real-time Anomaly Detection for Wide-area Surveillance Using Heterogeneous High Performance Computing Architecture

    2015-06-01

    CONFABULATION BASED REAL-TIME ANOMALY DETECTION FOR WIDE-AREA SURVEILLANCE USING HETEROGENEOUS HIGH PERFORMANCE COMPUTING ARCHITECTURE SYRACUSE...DETECTION FOR WIDE-AREA SURVEILLANCE USING HETEROGENEOUS HIGH PERFORMANCE COMPUTING ARCHITECTURE 5a. CONTRACT NUMBER FA8750-12-1-0251 5b. GRANT...processors including graphic processor units (GPUs) and Intel Xeon Phi processors. Experimental results showed significant speedups, which can enable

  7. Matrix multiplication operations with data pre-conditioning in a high performance computing architecture

    Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

    2013-11-05

    Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and replicating the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation.

  8. Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures.

    Llanes, Antonio; Muñoz, Andrés; Bueno-Crespo, Andrés; García-Valverde, Teresa; Sánchez, Antonia; Arcas-Túnez, Francisco; Pérez-Sánchez, Horacio; Cecilia, José M

    2016-01-01

    The protein-folding problem has been extensively studied during the last fifty years. The understanding of the dynamics of global shape of a protein and the influence on its biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to foresee the threedimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. We present in this revision work the past and last tendencies regarding protein folding simulations from both perspectives; hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem as well as which hardware platforms have been used for running this kind of Soft Computing techniques.

  9. New Developments in Modeling MHD Systems on High Performance Computing Architectures

    Germaschewski, K.; Raeder, J.; Larson, D. J.; Bhattacharjee, A.

    2009-04-01

    Modeling the wide range of time and length scales present even in fluid models of plasmas like MHD and X-MHD (Extended MHD including two fluid effects like Hall term, electron inertia, electron pressure gradient) is challenging even on state-of-the-art supercomputers. In the last years, HPC capacity has continued to grow exponentially, but at the expense of making the computer systems more and more difficult to program in order to get maximum performance. In this paper, we will present a new approach to managing the complexity caused by the need to write efficient codes: Separating the numerical description of the problem, in our case a discretized right hand side (r.h.s.), from the actual implementation of efficiently evaluating it. An automatic code generator is used to describe the r.h.s. in a quasi-symbolic form while leaving the translation into efficient and parallelized code to a computer program itself. We implemented this approach for OpenGGCM (Open General Geospace Circulation Model), a model of the Earth's magnetosphere, which was accelerated by a factor of three on regular x86 architecture and a factor of 25 on the Cell BE architecture (commonly known for its deployment in Sony's PlayStation 3).

  10. SCinet Architecture: Featured at the International Conference for High Performance Computing,Networking, Storage and Analysis 2016

    Lyonnais, Marc; Smith, Matt; Mace, Kate P.

    2017-02-06

    SCinet is the purpose-built network that operates during the International Conference for High Performance Computing,Networking, Storage and Analysis (Super Computing or SC). Created each year for the conference, SCinet brings to life a high-capacity network that supports applications and experiments that are a hallmark of the SC conference. The network links the convention center to research and commercial networks around the world. This resource serves as a platform for exhibitors to demonstrate the advanced computing resources of their home institutions and elsewhere by supporting a wide variety of applications. Volunteers from academia, government and industry work together to design and deliver the SCinet infrastructure. Industry vendors and carriers donate millions of dollars in equipment and services needed to build and support the local and wide area networks. Planning begins more than a year in advance of each SC conference and culminates in a high intensity installation in the days leading up to the conference. The SCinet architecture for SC16 illustrates a dramatic increase in participation from the vendor community, particularly those that focus on network equipment. Software-Defined Networking (SDN) and Data Center Networking (DCN) are present in nearly all aspects of the design.

  11. High-performance computing using FPGAs

    Benkrid, Khaled

    2013-01-01

    This book is concerned with the emerging field of High Performance Reconfigurable Computing (HPRC), which aims to harness the high performance and relative low power of reconfigurable hardware–in the form Field Programmable Gate Arrays (FPGAs)–in High Performance Computing (HPC) applications. It presents the latest developments in this field from applications, architecture, and tools and methodologies points of view. We hope that this work will form a reference for existing researchers in the field, and entice new researchers and developers to join the HPRC community.  The book includes:  Thirteen application chapters which present the most important application areas tackled by high performance reconfigurable computers, namely: financial computing, bioinformatics and computational biology, data search and processing, stencil computation e.g. computational fluid dynamics and seismic modeling, cryptanalysis, astronomical N-body simulation, and circuit simulation.     Seven architecture chapters which...

  12. High performance computing on vector systems

    Roller, Sabine

    2008-01-01

    Presents the developments in high-performance computing and simulation on modern supercomputer architectures. This book covers trends in hardware and software development in general and specifically the vector-based systems and heterogeneous architectures. It presents innovative fields like coupled multi-physics or multi-scale simulations.

  13. High Performance Computing Multicast

    2012-02-01

    A History of the Virtual Synchrony Replication Model,” in Replication: Theory and Practice, Charron-Bost, B., Pedone, F., and Schiper, A. (Eds...Performance Computing IP / IPv4 Internet Protocol (version 4.0) IPMC Internet Protocol MultiCast LAN Local Area Network MCMD Dr. Multicast MPI

  14. High-performance computing — an overview

    Marksteiner, Peter

    1996-08-01

    An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

  15. A high performance architecture for accelerator controls

    Allen, M.; Hunt, S.M; Lue, H.; Saltmarsh, C.G.; Parker, C.R.C.B.

    1991-01-01

    The demands placed on the Superconducting Super Collider (SSC) control system due to large distances, high bandwidth and fast response time required for operation will require a fresh approach to the data communications architecture of the accelerator. The prototype design effort aims at providing deterministic communication across the accelerator complex with a response time of < 100 ms and total bandwidth of 2 Gbits/sec. It will offer a consistent interface for a large number of equipment types, from vacuum pumps to beam position monitors, providing appropriate communications performance for each equipment type. It will consist of highly parallel links to all equipment: those with computing resources, non-intelligent direct control interfaces, and data concentrators. This system will give each piece of equipment a dedicated link of fixed bandwidth to the control system. Application programs will have access to all accelerator devices which will be memory mapped into a global virtual addressing scheme. Links to devices in the same geographical area will be multiplexed using commercial Time Division Multiplexing equipment. Low-level access will use reflective memory techniques, eliminating processing overhead and complexity of traditional data communication protocols. The use of commercial standards and equipment will enable a high performance system to be built at low cost

  16. A high performance architecture for accelerator controls

    Allen, M.; Hunt, S.M.; Lue, H.; Saltmarsh, C.G.; Parker, C.R.C.B.

    1991-03-01

    The demands placed on the Superconducting Super Collider (SSC) control system due to large distances, high bandwidth and fast response time required for operation will require a fresh approach to the data communications architecture of the accelerator. The prototype design effort aims at providing deterministic communication across the accelerator complex with a response time of <100 ms and total bandwidth of 2 Gbits/sec. It will offer a consistent interface for a large number of equipment types, from vacuum pumps to beam position monitors, providing appropriate communications performance for each equipment type. It will consist of highly parallel links to all equipments: those with computing resources, non-intelligent direct control interfaces, and data concentrators. This system will give each piece of equipment a dedicated link of fixed bandwidth to the control system. Application programs will have access to all accelerator devices which will be memory mapped into a global virtual addressing scheme. Links to devices in the same geographical area will be multiplexed using commercial Time Division Multiplexing equipment. Low-level access will use reflective memory techniques, eliminating processing overhead and complexity of traditional data communication protocols. The use of commercial standards and equipment will enable a high performance system to be built at low cost. 1 fig

  17. The path toward HEP High Performance Computing

    Apostolakis, John; Brun, René; Gheata, Andrei; Wenzel, Sandro; Carminati, Federico

    2014-01-01

    High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yield limited results and recent studies have shown that, on modern architectures, it achieves a performance between 10% and 50% of the peak one. Although several successful attempts have been made to port selected codes on GPUs, no major HEP code suite has a 'High Performance' implementation. With LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP cannot any longer neglect the less-than-optimal performance of its code and it has to try making the best usage of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 project. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on the development of a highperformance prototype for particle transport. Achieving a good concurrency level on the emerging parallel architectures without a complete redesign of the framework can only be done by parallelizing at event level, or with a much larger effort at track level. Apart the shareable data structures, this typically implies a multiplication factor in terms of memory consumption compared to the single threaded version, together with sub-optimal handling of event processing tails. Besides this, the low level instruction pipelining of modern processors cannot be used efficiently to speedup the program. We have implemented a framework that allows scheduling vectors of particles to an arbitrary number of computing resources in a fine grain parallel approach. The talk will review the current optimisation activities within the SFT group with a particular emphasis on the development perspectives towards a simulation framework able to profit

  18. The path toward HEP High Performance Computing

    Apostolakis, John; Carminati, Federico; Gheata, Andrei; Wenzel, Sandro

    2014-01-01

    High Energy Physics code has been known for making poor use of high performance computing architectures. Efforts in optimising HEP code on vector and RISC architectures have yield limited results and recent studies have shown that, on modern architectures, it achieves a performance between 10% and 50% of the peak one. Although several successful attempts have been made to port selected codes on GPUs, no major HEP code suite has a 'High Performance' implementation. With LHC undergoing a major upgrade and a number of challenging experiments on the drawing board, HEP cannot any longer neglect the less-than-optimal performance of its code and it has to try making the best usage of the hardware. This activity is one of the foci of the SFT group at CERN, which hosts, among others, the Root and Geant4 project. The activity of the experiments is shared and coordinated via a Concurrency Forum, where the experience in optimising HEP code is presented and discussed. Another activity is the Geant-V project, centred on th...

  19. Computational Biology and High Performance Computing 2000

    Simon, Horst D.; Zorn, Manfred D.; Spengler, Sylvia J.; Shoichet, Brian K.; Stewart, Craig; Dubchak, Inna L.; Arkin, Adam P.

    2000-10-19

    The pace of extraordinary advances in molecular biology has accelerated in the past decade due in large part to discoveries coming from genome projects on human and model organisms. The advances in the genome project so far, happening well ahead of schedule and under budget, have exceeded any dreams by its protagonists, let alone formal expectations. Biologists expect the next phase of the genome project to be even more startling in terms of dramatic breakthroughs in our understanding of human biology, the biology of health and of disease. Only today can biologists begin to envision the necessary experimental, computational and theoretical steps necessary to exploit genome sequence information for its medical impact, its contribution to biotechnology and economic competitiveness, and its ultimate contribution to environmental quality. High performance computing has become one of the critical enabling technologies, which will help to translate this vision of future advances in biology into reality. Biologists are increasingly becoming aware of the potential of high performance computing. The goal of this tutorial is to introduce the exciting new developments in computational biology and genomics to the high performance computing community.

  20. High Performance Spaceflight Computing (HPSC)

    National Aeronautics and Space Administration — Space-based computing has not kept up with the needs of current and future NASA missions. We are developing a next-generation flight computing system that addresses...

  1. Progress in a novel architecture for high performance processing

    Zhang, Zhiwei; Liu, Meng; Liu, Zijun; Du, Xueliang; Xie, Shaolin; Ma, Hong; Ding, Guangxin; Ren, Weili; Zhou, Fabiao; Sun, Wenqin; Wang, Huijuan; Wang, Donglin

    2018-04-01

    The high performance processing (HPP) is an innovative architecture which targets on high performance computing with excellent power efficiency and computing performance. It is suitable for data intensive applications like supercomputing, machine learning and wireless communication. An example chip with four application-specific integrated circuit (ASIC) cores which is the first generation of HPP cores has been taped out successfully under Taiwan Semiconductor Manufacturing Company (TSMC) 40 nm low power process. The innovative architecture shows great energy efficiency over the traditional central processing unit (CPU) and general-purpose computing on graphics processing units (GPGPU). Compared with MaPU, HPP has made great improvement in architecture. The chip with 32 HPP cores is being developed under TSMC 16 nm field effect transistor (FFC) technology process and is planed to use commercially. The peak performance of this chip can reach 4.3 teraFLOPS (TFLOPS) and its power efficiency is up to 89.5 gigaFLOPS per watt (GFLOPS/W).

  2. Contemporary high performance computing from petascale toward exascale

    Vetter, Jeffrey S

    2013-01-01

    Contemporary High Performance Computing: From Petascale toward Exascale focuses on the ecosystems surrounding the world's leading centers for high performance computing (HPC). It covers many of the important factors involved in each ecosystem: computer architectures, software, applications, facilities, and sponsors. The first part of the book examines significant trends in HPC systems, including computer architectures, applications, performance, and software. It discusses the growth from terascale to petascale computing and the influence of the TOP500 and Green500 lists. The second part of the

  3. High Performance Systolic Array Core Architecture Design for DNA Sequencer

    Saiful Nurdin Dayana

    2018-01-01

    Full Text Available This paper presents a high performance systolic array (SA core architecture design for Deoxyribonucleic Acid (DNA sequencer. The core implements the affine gap penalty score Smith-Waterman (SW algorithm. This time-consuming local alignment algorithm guarantees optimal alignment between DNA sequences, but it requires quadratic computation time when performed on standard desktop computers. The use of linear SA decreases the time complexity from quadratic to linear. In addition, with the exponential growth of DNA databases, the SA architecture is used to overcome the timing issue. In this work, the SW algorithm has been captured using Verilog Hardware Description Language (HDL and simulated using Xilinx ISIM simulator. The proposed design has been implemented in Xilinx Virtex -6 Field Programmable Gate Array (FPGA and improved in the core area by 90% reduction.

  4. High performance computing in Windows Azure cloud

    Ambruš, Dejan

    2013-01-01

    High performance, security, availability, scalability, flexibility and lower costs of maintenance have essentially contributed to the growing popularity of cloud computing in all spheres of life, especially in business. In fact cloud computing offers even more than this. With usage of virtual computing clusters a runtime environment for high performance computing can be efficiently implemented also in a cloud. There are many advantages but also some disadvantages of cloud computing, some ...

  5. A Semi-Automated Machine Learning Algorithm for Tree Cover Delineation from 1-m Naip Imagery Using a High Performance Computing Architecture

    Basu, S.; Ganguly, S.; Nemani, R. R.; Mukhopadhyay, S.; Milesi, C.; Votava, P.; Michaelis, A.; Zhang, G.; Cook, B. D.; Saatchi, S. S.; Boyda, E.

    2014-12-01

    Accurate tree cover delineation is a useful instrument in the derivation of Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) satellite imagery data. Numerous algorithms have been designed to perform tree cover delineation in high to coarse resolution satellite imagery, but most of them do not scale to terabytes of data, typical in these VHR datasets. In this paper, we present an automated probabilistic framework for the segmentation and classification of 1-m VHR data as obtained from the National Agriculture Imagery Program (NAIP) for deriving tree cover estimates for the whole of Continental United States, using a High Performance Computing Architecture. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field (CRF), which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by incorporating expert knowledge through the relabeling of misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the state of California, which covers a total of 11,095 NAIP tiles and spans a total geographical area of 163,696 sq. miles. Our framework produced correct detection rates of around 85% for fragmented forests and 70% for urban tree cover areas, with false positive rates lower than 3% for both regions. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR high-resolution canopy height model shows the effectiveness of our algorithm in generating accurate high-resolution tree cover maps.

  6. Building and measuring a high performance network architecture

    Kramer, William T.C.; Toole, Timothy; Fisher, Chuck; Dugan, Jon; Wheeler, David; Wing, William R; Nickless, William; Goddard, Gregory; Corbato, Steven; Love, E. Paul; Daspit, Paul; Edwards, Hal; Mercer, Linden; Koester, David; Decina, Basil; Dart, Eli; Paul Reisinger, Paul; Kurihara, Riki; Zekauskas, Matthew J; Plesset, Eric; Wulf, Julie; Luce, Douglas; Rogers, James; Duncan, Rex; Mauth, Jeffery

    2001-04-20

    Once a year, the SC conferences present a unique opportunity to create and build one of the most complex and highest performance networks in the world. At SC2000, large-scale and complex local and wide area networking connections were demonstrated, including large-scale distributed applications running on different architectures. This project was designed to use the unique opportunity presented at SC2000 to create a testbed network environment and then use that network to demonstrate and evaluate high performance computational and communication applications. This testbed was designed to incorporate many interoperable systems and services and was designed for measurement from the very beginning. The end results were key insights into how to use novel, high performance networking technologies and to accumulate measurements that will give insights into the networks of the future.

  7. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

    2016-05-01

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.

  8. High Performance Computing in Science and Engineering '02 : Transactions of the High Performance Computing Center

    Jäger, Willi

    2003-01-01

    This book presents the state-of-the-art in modeling and simulation on supercomputers. Leading German research groups present their results achieved on high-end systems of the High Performance Computing Center Stuttgart (HLRS) for the year 2002. Reports cover all fields of supercomputing simulation ranging from computational fluid dynamics to computer science. Special emphasis is given to industrially relevant applications. Moreover, by presenting results for both vector sytems and micro-processor based systems the book allows to compare performance levels and usability of a variety of supercomputer architectures. It therefore becomes an indispensable guidebook to assess the impact of the Japanese Earth Simulator project on supercomputing in the years to come.

  9. Embedded High Performance Scalable Computing Systems

    Ngo, David

    2003-01-01

    The Embedded High Performance Scalable Computing Systems (EHPSCS) program is a cooperative agreement between Sanders, A Lockheed Martin Company and DARPA that ran for three years, from Apr 1995 - Apr 1998...

  10. Debugging a high performance computing program

    Gooding, Thomas M.

    2013-08-20

    Methods, apparatus, and computer program products are disclosed for debugging a high performance computing program by gathering lists of addresses of calling instructions for a plurality of threads of execution of the program, assigning the threads to groups in dependence upon the addresses, and displaying the groups to identify defective threads.

  11. Enabling high performance computational science through combinatorial algorithms

    Boman, Erik G; Bozdag, Doruk; Catalyurek, Umit V; Devine, Karen D; Gebremedhin, Assefaw H; Hovland, Paul D; Pothen, Alex; Strout, Michelle Mills

    2007-01-01

    The Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute is developing algorithms and software for combinatorial problems that play an enabling role in scientific and engineering computations. Discrete algorithms will be increasingly critical for achieving high performance for irregular problems on petascale architectures. This paper describes recent contributions by researchers at the CSCAPES Institute in the areas of load balancing, parallel graph coloring, performance improvement, and parallel automatic differentiation

  12. Enabling high performance computational science through combinatorial algorithms

    Boman, Erik G [Discrete Algorithms and Math Department, Sandia National Laboratories (United States); Bozdag, Doruk [Biomedical Informatics, and Electrical and Computer Engineering, Ohio State University (United States); Catalyurek, Umit V [Biomedical Informatics, and Electrical and Computer Engineering, Ohio State University (United States); Devine, Karen D [Discrete Algorithms and Math Department, Sandia National Laboratories (United States); Gebremedhin, Assefaw H [Computer Science and Center for Computational Science, Old Dominion University (United States); Hovland, Paul D [Mathematics and Computer Science Division, Argonne National Laboratory (United States); Pothen, Alex [Computer Science and Center for Computational Science, Old Dominion University (United States); Strout, Michelle Mills [Computer Science, Colorado State University (United States)

    2007-07-15

    The Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute is developing algorithms and software for combinatorial problems that play an enabling role in scientific and engineering computations. Discrete algorithms will be increasingly critical for achieving high performance for irregular problems on petascale architectures. This paper describes recent contributions by researchers at the CSCAPES Institute in the areas of load balancing, parallel graph coloring, performance improvement, and parallel automatic differentiation.

  13. Factoring symmetric indefinite matrices on high-performance architectures

    Jones, Mark T.; Patrick, Merrell L.

    1990-01-01

    The Bunch-Kaufman algorithm is the method of choice for factoring symmetric indefinite matrices in many applications. However, the Bunch-Kaufman algorithm does not take advantage of high-performance architectures such as the Cray Y-MP. Three new algorithms, based on Bunch-Kaufman factorization, that take advantage of such architectures are described. Results from an implementation of the third algorithm are presented.

  14. AHPCRC - Army High Performance Computing Research Center

    2010-01-01

    computing. Of particular interest is the ability of a distrib- uted jamming network (DJN) to jam signals in all or part of a sensor or communications net...and reasoning, assistive technologies. FRIEDRICH (FRITZ) PRINZ Finmeccanica Professor of Engineering, Robert Bosch Chair, Department of Engineering...High Performance Computing Research Center www.ahpcrc.org BARBARA BRYAN AHPCRC Research and Outreach Manager, HPTi (650) 604-3732 bbryan@hpti.com Ms

  15. DURIP: High Performance Computing in Biomathematics Applications

    2017-05-10

    Mathematics and Statistics (AMS) at the University of California, Santa Cruz (UCSC) to conduct research and research-related education in areas of...Computing in Biomathematics Applications Report Title The goal of this award was to enhance the capabilities of the Department of Applied Mathematics and...DURIP: High Performance Computing in Biomathematics Applications The goal of this award was to enhance the capabilities of the Department of Applied

  16. Micromagnetics on high-performance workstation and mobile computational platforms

    Fu, S.; Chang, R.; Couture, S.; Menarini, M.; Escobar, M. A.; Kuteifan, M.; Lubarda, M.; Gabay, D.; Lomakin, V.

    2015-05-01

    The feasibility of using high-performance desktop and embedded mobile computational platforms is presented, including multi-core Intel central processing unit, Nvidia desktop graphics processing units, and Nvidia Jetson TK1 Platform. FastMag finite element method-based micromagnetic simulator is used as a testbed, showing high efficiency on all the platforms. Optimization aspects of improving the performance of the mobile systems are discussed. The high performance, low cost, low power consumption, and rapid performance increase of the embedded mobile systems make them a promising candidate for micromagnetic simulations. Such architectures can be used as standalone systems or can be built as low-power computing clusters.

  17. Power efficient and high performance VLSI architecture for AES algorithm

    K. Kalaiselvi

    2015-09-01

    Full Text Available Advanced encryption standard (AES algorithm has been widely deployed in cryptographic applications. This work proposes a low power and high throughput implementation of AES algorithm using key expansion approach. We minimize the power consumption and critical path delay using the proposed high performance architecture. It supports both encryption and decryption using 256-bit keys with a throughput of 0.06 Gbps. The VHDL language is utilized for simulating the design and an FPGA chip has been used for the hardware implementations. Experimental results reveal that the proposed AES architectures offer superior performance than the existing VLSI architectures in terms of power, throughput and critical path delay.

  18. High-performance reconfigurable hardware architecture for restricted Boltzmann machines.

    Ly, Daniel Le; Chow, Paul

    2010-11-01

    Despite the popularity and success of neural networks in research, the number of resulting commercial or industrial applications has been limited. A primary cause for this lack of adoption is that neural networks are usually implemented as software running on general-purpose processors. Hence, a hardware implementation that can exploit the inherent parallelism in neural networks is desired. This paper investigates how the restricted Boltzmann machine (RBM), which is a popular type of neural network, can be mapped to a high-performance hardware architecture on field-programmable gate array (FPGA) platforms. The proposed modular framework is designed to reduce the time complexity of the computations through heavily customized hardware engines. A method to partition large RBMs into smaller congruent components is also presented, allowing the distribution of one RBM across multiple FPGA resources. The framework is tested on a platform of four Xilinx Virtex II-Pro XC2VP70 FPGAs running at 100 MHz through a variety of different configurations. The maximum performance was obtained by instantiating an RBM of 256 × 256 nodes distributed across four FPGAs, which resulted in a computational speed of 3.13 billion connection-updates-per-second and a speedup of 145-fold over an optimized C program running on a 2.8-GHz Intel processor.

  19. Monitoring SLAC High Performance UNIX Computing Systems

    Lettsome, Annette K.

    2005-01-01

    Knowledge of the effectiveness and efficiency of computers is important when working with high performance systems. The monitoring of such systems is advantageous in order to foresee possible misfortunes or system failures. Ganglia is a software system designed for high performance computing systems to retrieve specific monitoring information. An alternative storage facility for Ganglia's collected data is needed since its default storage system, the round-robin database (RRD), struggles with data integrity. The creation of a script-driven MySQL database solves this dilemma. This paper describes the process took in the creation and implementation of the MySQL database for use by Ganglia. Comparisons between data storage by both databases are made using gnuplot and Ganglia's real-time graphical user interface

  20. High-performance computing for airborne applications

    Quinn, Heather M.; Manuzatto, Andrea; Fairbanks, Tom; Dallmann, Nicholas; Desgeorges, Rose

    2010-01-01

    Recently, there has been attempts to move common satellite tasks to unmanned aerial vehicles (UAVs). UAVs are significantly cheaper to buy than satellites and easier to deploy on an as-needed basis. The more benign radiation environment also allows for an aggressive adoption of state-of-the-art commercial computational devices, which increases the amount of data that can be collected. There are a number of commercial computing devices currently available that are well-suited to high-performance computing. These devices range from specialized computational devices, such as field-programmable gate arrays (FPGAs) and digital signal processors (DSPs), to traditional computing platforms, such as microprocessors. Even though the radiation environment is relatively benign, these devices could be susceptible to single-event effects. In this paper, we will present radiation data for high-performance computing devices in a accelerated neutron environment. These devices include a multi-core digital signal processor, two field-programmable gate arrays, and a microprocessor. From these results, we found that all of these devices are suitable for many airplane environments without reliability problems.

  1. High Performance Computing Operations Review Report

    Cupps, Kimberly C. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2013-12-19

    The High Performance Computing Operations Review (HPCOR) meeting—requested by the ASC and ASCR program headquarters at DOE—was held November 5 and 6, 2013, at the Marriott Hotel in San Francisco, CA. The purpose of the review was to discuss the processes and practices for HPC integration and its related software and facilities. Experiences and lessons learned from the most recent systems deployed were covered in order to benefit the deployment of new systems.

  2. High-performance computing in seismology

    NONE

    1996-09-01

    The scientific, technical, and economic importance of the issues discussed here presents a clear agenda for future research in computational seismology. In this way these problems will drive advances in high-performance computing in the field of seismology. There is a broad community that will benefit from this work, including the petroleum industry, research geophysicists, engineers concerned with seismic hazard mitigation, and governments charged with enforcing a comprehensive test ban treaty. These advances may also lead to new applications for seismological research. The recent application of high-resolution seismic imaging of the shallow subsurface for the environmental remediation industry is an example of this activity. This report makes the following recommendations: (1) focused efforts to develop validated documented software for seismological computations should be supported, with special emphasis on scalable algorithms for parallel processors; (2) the education of seismologists in high-performance computing technologies and methodologies should be improved; (3) collaborations between seismologists and computational scientists and engineers should be increased; (4) the infrastructure for archiving, disseminating, and processing large volumes of seismological data should be improved.

  3. Evaluation of high-performance computing software

    Browne, S.; Dongarra, J. [Univ. of Tennessee, Knoxville, TN (United States); Rowan, T. [Oak Ridge National Lab., TN (United States)

    1996-12-31

    The absence of unbiased and up to date comparative evaluations of high-performance computing software complicates a user`s search for the appropriate software package. The National HPCC Software Exchange (NHSE) is attacking this problem using an approach that includes independent evaluations of software, incorporation of author and user feedback into the evaluations, and Web access to the evaluations. We are applying this approach to the Parallel Tools Library (PTLIB), a new software repository for parallel systems software and tools, and HPC-Netlib, a high performance branch of the Netlib mathematical software repository. Updating the evaluations with feed-back and making it available via the Web helps ensure accuracy and timeliness, and using independent reviewers produces unbiased comparative evaluations difficult to find elsewhere.

  4. Software Systems for High-performance Quantum Computing

    Humble, Travis S [ORNL; Britt, Keith A [ORNL

    2016-01-01

    Quantum computing promises new opportunities for solving hard computational problems, but harnessing this novelty requires breakthrough concepts in the design, operation, and application of computing systems. We define some of the challenges facing the development of quantum computing systems as well as software-based approaches that can be used to overcome these challenges. Following a brief overview of the state of the art, we present models for the quantum programming and execution models, the development of architectures for hybrid high-performance computing systems, and the realization of software stacks for quantum networking. This leads to a discussion of the role that conventional computing plays in the quantum paradigm and how some of the current challenges for exascale computing overlap with those facing quantum computing.

  5. High performance parallel computers for science

    Nash, T.; Areti, H.; Atac, R.; Biel, J.; Cook, A.; Deppe, J.; Edel, M.; Fischler, M.; Gaines, I.; Hance, R.

    1989-01-01

    This paper reports that Fermilab's Advanced Computer Program (ACP) has been developing cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 Mflops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction

  6. Quantum Accelerators for High-performance Computing Systems

    Humble, Travis S. [ORNL; Britt, Keith A. [ORNL; Mohiyaddin, Fahd A. [ORNL

    2017-11-01

    We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenges the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantum-accelerator framework that uses specialized kernels to offload select workloads while integrating with existing computing infrastructure. We elaborate on the role of the host operating system to manage these unique accelerator resources, the prospects for deploying quantum modules, and the requirements placed on the language hierarchy connecting these different system components. We draw on recent advances in the modeling and simulation of quantum computing systems with the development of architectures for hybrid high-performance computing systems and the realization of software stacks for controlling quantum devices. Finally, we present simulation results that describe the expected system-level behavior of high-performance computing systems composed from compute nodes with quantum processing units. We describe performance for these hybrid systems in terms of time-to-solution, accuracy, and energy consumption, and we use simple application examples to estimate the performance advantage of quantum acceleration.

  7. High performance computing in linear control

    Datta, B.N.

    1993-01-01

    Remarkable progress has been made in both theory and applications of all important areas of control. The theory is rich and very sophisticated. Some beautiful applications of control theory are presently being made in aerospace, biomedical engineering, industrial engineering, robotics, economics, power systems, etc. Unfortunately, the same assessment of progress does not hold in general for computations in control theory. Control Theory is lagging behind other areas of science and engineering in this respect. Nowadays there is a revolution going on in the world of high performance scientific computing. Many powerful computers with vector and parallel processing have been built and have been available in recent years. These supercomputers offer very high speed in computations. Highly efficient software, based on powerful algorithms, has been developed to use on these advanced computers, and has also contributed to increased performance. While workers in many areas of science and engineering have taken great advantage of these hardware and software developments, control scientists and engineers, unfortunately, have not been able to take much advantage of these developments

  8. Optical interconnection networks for high-performance computing systems

    Biberman, Aleksandr; Bergman, Keren

    2012-01-01

    Enabled by silicon photonic technology, optical interconnection networks have the potential to be a key disruptive technology in computing and communication industries. The enduring pursuit of performance gains in computing, combined with stringent power constraints, has fostered the ever-growing computational parallelism associated with chip multiprocessors, memory systems, high-performance computing systems and data centers. Sustaining these parallelism growths introduces unique challenges for on- and off-chip communications, shifting the focus toward novel and fundamentally different communication approaches. Chip-scale photonic interconnection networks, enabled by high-performance silicon photonic devices, offer unprecedented bandwidth scalability with reduced power consumption. We demonstrate that the silicon photonic platforms have already produced all the high-performance photonic devices required to realize these types of networks. Through extensive empirical characterization in much of our work, we demonstrate such feasibility of waveguides, modulators, switches and photodetectors. We also demonstrate systems that simultaneously combine many functionalities to achieve more complex building blocks. We propose novel silicon photonic devices, subsystems, network topologies and architectures to enable unprecedented performance of these photonic interconnection networks. Furthermore, the advantages of photonic interconnection networks extend far beyond the chip, offering advanced communication environments for memory systems, high-performance computing systems, and data centers. (review article)

  9. Component-based software for high-performance scientific computing

    Alexeev, Yuri; Allan, Benjamin A; Armstrong, Robert C; Bernholdt, David E; Dahlgren, Tamara L; Gannon, Dennis; Janssen, Curtis L; Kenny, Joseph P; Krishnan, Manojkumar; Kohl, James A; Kumfert, Gary; McInnes, Lois Curfman; Nieplocha, Jarek; Parker, Steven G; Rasmussen, Craig; Windus, Theresa L

    2005-01-01

    Recent advances in both computational hardware and multidisciplinary science have given rise to an unprecedented level of complexity in scientific simulation software. This paper describes an ongoing grass roots effort aimed at addressing complexity in high-performance computing through the use of Component-Based Software Engineering (CBSE). Highlights of the benefits and accomplishments of the Common Component Architecture (CCA) Forum and SciDAC ISIC are given, followed by an illustrative example of how the CCA has been applied to drive scientific discovery in quantum chemistry. Thrusts for future research are also described briefly.

  10. Component-based software for high-performance scientific computing

    Alexeev, Yuri; Allan, Benjamin A; Armstrong, Robert C; Bernholdt, David E; Dahlgren, Tamara L; Gannon, Dennis; Janssen, Curtis L; Kenny, Joseph P; Krishnan, Manojkumar; Kohl, James A; Kumfert, Gary; McInnes, Lois Curfman; Nieplocha, Jarek; Parker, Steven G; Rasmussen, Craig; Windus, Theresa L

    2005-01-01

    Recent advances in both computational hardware and multidisciplinary science have given rise to an unprecedented level of complexity in scientific simulation software. This paper describes an ongoing grass roots effort aimed at addressing complexity in high-performance computing through the use of Component-Based Software Engineering (CBSE). Highlights of the benefits and accomplishments of the Common Component Architecture (CCA) Forum and SciDAC ISIC are given, followed by an illustrative example of how the CCA has been applied to drive scientific discovery in quantum chemistry. Thrusts for future research are also described briefly

  11. High Performance Computing in Science and Engineering '15 : Transactions of the High Performance Computing Center

    Kröner, Dietmar; Resch, Michael

    2016-01-01

    This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS) in 2015. The reports cover all fields of computational science and engineering ranging from CFD to computational physics and from chemistry to computer science with a special emphasis on industrially relevant applications. Presenting findings of one of Europe’s leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance. The book covers the main methods in high-performance computing. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of color illustrations and tables of results.

  12. High Performance Computing in Science and Engineering '17 : Transactions of the High Performance Computing Center

    Kröner, Dietmar; Resch, Michael; HLRS 2017

    2018-01-01

    This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS) in 2017. The reports cover all fields of computational science and engineering ranging from CFD to computational physics and from chemistry to computer science with a special emphasis on industrially relevant applications. Presenting findings of one of Europe’s leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance.The book covers the main methods in high-performance computing. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of color illustrations and tables of results.

  13. Development and Performance of the Modularized, High-performance Computing and Hybrid-architecture Capable GEOS-Chem Chemical Transport Model

    Long, M. S.; Yantosca, R.; Nielsen, J.; Linford, J. C.; Keller, C. A.; Payer Sulprizio, M.; Jacob, D. J.

    2014-12-01

    The GEOS-Chem global chemical transport model (CTM), used by a large atmospheric chemistry research community, has been reengineered to serve as a platform for a range of computational atmospheric chemistry science foci and applications. Development included modularization for coupling to general circulation and Earth system models (ESMs) and the adoption of co-processor capable atmospheric chemistry solvers. This was done using an Earth System Modeling Framework (ESMF) interface that operates independently of GEOS-Chem scientific code to permit seamless transition from the GEOS-Chem stand-alone serial CTM to deployment as a coupled ESM module. In this manner, the continual stream of updates contributed by the CTM user community is automatically available for broader applications, which remain state-of-science and directly referenceable to the latest version of the standard GEOS-Chem CTM. These developments are now available as part of the standard version of the GEOS-Chem CTM. The system has been implemented as an atmospheric chemistry module within the NASA GEOS-5 ESM. The coupled GEOS-5/GEOS-Chem system was tested for weak and strong scalability and performance with a tropospheric oxidant-aerosol simulation. Results confirm that the GEOS-Chem chemical operator scales efficiently for any number of processes. Although inclusion of atmospheric chemistry in ESMs is computationally expensive, the excellent scalability of the chemical operator means that the relative cost goes down with increasing number of processes, making fine-scale resolution simulations possible.

  14. High-performance, scalable optical network-on-chip architectures

    Tan, Xianfang

    The rapid advance of technology enables a large number of processing cores to be integrated into a single chip which is called a Chip Multiprocessor (CMP) or a Multiprocessor System-on-Chip (MPSoC) design. The on-chip interconnection network, which is the communication infrastructure for these processing cores, plays a central role in a many-core system. With the continuously increasing complexity of many-core systems, traditional metallic wired electronic networks-on-chip (NoC) became a bottleneck because of the unbearable latency in data transmission and extremely high energy consumption on chip. Optical networks-on-chip (ONoC) has been proposed as a promising alternative paradigm for electronic NoC with the benefits of optical signaling communication such as extremely high bandwidth, negligible latency, and low power consumption. This dissertation focus on the design of high-performance and scalable ONoC architectures and the contributions are highlighted as follow: 1. A micro-ring resonator (MRR)-based Generic Wavelength-routed Optical Router (GWOR) is proposed. A method for developing any sized GWOR is introduced. GWOR is a scalable non-blocking ONoC architecture with simple structure, low cost and high power efficiency compared to existing ONoC designs. 2. To expand the bandwidth and improve the fault tolerance of the GWOR, a redundant GWOR architecture is designed by cascading different type of GWORs into one network. 3. The redundant GWOR built with MRR-based comb switches is proposed. Comb switches can expand the bandwidth while keep the topology of GWOR unchanged by replacing the general MRRs with comb switches. 4. A butterfly fat tree (BFT)-based hybrid optoelectronic NoC (HONoC) architecture is developed in which GWORs are used for global communication and electronic routers are used for local communication. The proposed HONoC uses less numbers of electronic routers and links than its counterpart of electronic BFT-based NoC. It takes the advantages of

  15. Overview of Parallel Platforms for Common High Performance Computing

    T. Fryza

    2012-04-01

    Full Text Available The paper deals with various parallel platforms used for high performance computing in the signal processing domain. More precisely, the methods exploiting the multicores central processing units such as message passing interface and OpenMP are taken into account. The properties of the programming methods are experimentally proved in the application of a fast Fourier transform and a discrete cosine transform and they are compared with the possibilities of MATLAB's built-in functions and Texas Instruments digital signal processors with very long instruction word architectures. New FFT and DCT implementations were proposed and tested. The implementation phase was compared with CPU based computing methods and with possibilities of the Texas Instruments digital signal processing library on C6747 floating-point DSPs. The optimal combination of computing methods in the signal processing domain and new, fast routines' implementation is proposed as well.

  16. High Performance Computing in Science and Engineering '99 : Transactions of the High Performance Computing Center

    Jäger, Willi

    2000-01-01

    The book contains reports about the most significant projects from science and engineering of the Federal High Performance Computing Center Stuttgart (HLRS). They were carefully selected in a peer-review process and are showcases of an innovative combination of state-of-the-art modeling, novel algorithms and the use of leading-edge parallel computer technology. The projects of HLRS are using supercomputer systems operated jointly by university and industry and therefore a special emphasis has been put on the industrial relevance of results and methods.

  17. High Performance Computing in Science and Engineering '98 : Transactions of the High Performance Computing Center

    Jäger, Willi

    1999-01-01

    The book contains reports about the most significant projects from science and industry that are using the supercomputers of the Federal High Performance Computing Center Stuttgart (HLRS). These projects are from different scientific disciplines, with a focus on engineering, physics and chemistry. They were carefully selected in a peer-review process and are showcases for an innovative combination of state-of-the-art physical modeling, novel algorithms and the use of leading-edge parallel computer technology. As HLRS is in close cooperation with industrial companies, special emphasis has been put on the industrial relevance of results and methods.

  18. High-Performance Computing Paradigm and Infrastructure

    Yang, Laurence T

    2006-01-01

    With hyperthreading in Intel processors, hypertransport links in next generation AMD processors, multi-core silicon in today's high-end microprocessors from IBM and emerging grid computing, parallel and distributed computers have moved into the mainstream

  19. DOE research in utilization of high-performance computers

    Buzbee, B.L.; Worlton, W.J.; Michael, G.; Rodrigue, G.

    1980-12-01

    Department of Energy (DOE) and other Government research laboratories depend on high-performance computer systems to accomplish their programatic goals. As the most powerful computer systems become available, they are acquired by these laboratories so that advances can be made in their disciplines. These advances are often the result of added sophistication to numerical models whose execution is made possible by high-performance computer systems. However, high-performance computer systems have become increasingly complex; consequently, it has become increasingly difficult to realize their potential performance. The result is a need for research on issues related to the utilization of these systems. This report gives a brief description of high-performance computers, and then addresses the use of and future needs for high-performance computers within DOE, the growing complexity of applications within DOE, and areas of high-performance computer systems warranting research. 1 figure

  20. Peregrine System | High-Performance Computing | NREL

    classes of nodes that users access: Login Nodes Peregrine has four login nodes, each of which has Intel E5 /scratch file systems, the /mss file system is mounted on all login nodes. Compute Nodes Peregrine has 2592

  1. HIGH PERFORMANCE PHOTOGRAMMETRIC PROCESSING ON COMPUTER CLUSTERS

    V. N. Adrov

    2012-07-01

    Full Text Available Most cpu consuming tasks in photogrammetric processing can be done in parallel. The algorithms take independent bits as input and produce independent bits as output. The independence of bits comes from the nature of such algorithms since images, stereopairs or small image blocks parts can be processed independently. Many photogrammetric algorithms are fully automatic and do not require human interference. Photogrammetric workstations can perform tie points measurements, DTM calculations, orthophoto construction, mosaicing and many other service operations in parallel using distributed calculations. Distributed calculations save time reducing several days calculations to several hours calculations. Modern trends in computer technology show the increase of cpu cores in workstations, speed increase in local networks, and as a result dropping the price of the supercomputers or computer clusters that can contain hundreds or even thousands of computing nodes. Common distributed processing in DPW is usually targeted for interactive work with a limited number of cpu cores and is not optimized for centralized administration. The bottleneck of common distributed computing in photogrammetry can be in the limited lan throughput and storage performance, since the processing of huge amounts of large raster images is needed.

  2. CUDA/GPU Technology : Parallel Programming For High Performance Scientific Computing

    YUHENDRA; KUZE, Hiroaki; JOSAPHAT, Tetuko Sri Sumantyo

    2009-01-01

    [ABSTRACT]Graphics processing units (GP Us) originally designed for computer video cards have emerged as the most powerful chip in a high-performance workstation. In the high performance computation capabilities, graphic processing units (GPU) lead to much more powerful performance than conventional CPUs by means of parallel processing. In 2007, the birth of Compute Unified Device Architecture (CUDA) and CUDA-enabled GPUs by NVIDIA Corporation brought a revolution in the general purpose GPU a...

  3. Computer architecture technology trends

    1991-01-01

    Please note this is a Short Discount publication. This year's edition of Computer Architecture Technology Trends analyses the trends which are taking place in the architecture of computing systems today. Due to the sheer number of different applications to which computers are being applied, there seems no end to the different adoptions which proliferate. There are, however, some underlying trends which appear. Decision makers should be aware of these trends when specifying architectures, particularly for future applications. This report is fully revised and updated and provides insight in

  4. Inclusive vision for high performance computing at the CSIR

    Gazendam, A

    2006-02-01

    Full Text Available and computationally intensive applications. A number of different technologies and standards were identified as core to the open and distributed high-performance infrastructure envisaged...

  5. High performance computations using dynamical nucleation theory

    Windus, T L; Crosby, L D; Kathmann, S M

    2008-01-01

    Chemists continue to explore the use of very large computations to perform simulations that describe the molecular level physics of critical challenges in science. In this paper, we describe the Dynamical Nucleation Theory Monte Carlo (DNTMC) model - a model for determining molecular scale nucleation rate constants - and its parallel capabilities. The potential for bottlenecks and the challenges to running on future petascale or larger resources are delineated. A 'master-slave' solution is proposed to scale to the petascale and will be developed in the NWChem software. In addition, mathematical and data analysis challenges are described

  6. RISC Processors and High Performance Computing

    Bailey, David H.; Saini, Subhash; Craw, James M. (Technical Monitor)

    1995-01-01

    This tutorial will discuss the top five RISC microprocessors and the parallel systems in which they are used. It will provide a unique cross-machine comparison not available elsewhere. The effective performance of these processors will be compared by citing standard benchmarks in the context of real applications. The latest NAS Parallel Benchmarks, both absolute performance and performance per dollar, will be listed. The next generation of the NPB will be described. The tutorial will conclude with a discussion of future directions in the field. Technology Transfer Considerations: All of these computer systems are commercially available internationally. Information about these processors is available in the public domain, mostly from the vendors themselves. The NAS Parallel Benchmarks and their results have been previously approved numerous times for public release, beginning back in 1991.

  7. High Performance Computing in Science and Engineering '08 : Transactions of the High Performance Computing Center

    Kröner, Dietmar; Resch, Michael

    2009-01-01

    The discussions and plans on all scienti?c, advisory, and political levels to realize an even larger “European Supercomputer” in Germany, where the hardware costs alone will be hundreds of millions Euro – much more than in the past – are getting closer to realization. As part of the strategy, the three national supercomputing centres HLRS (Stuttgart), NIC/JSC (Julic ¨ h) and LRZ (Munich) have formed the Gauss Centre for Supercomputing (GCS) as a new virtual organization enabled by an agreement between the Federal Ministry of Education and Research (BMBF) and the state ministries for research of Baden-Wurttem ¨ berg, Bayern, and Nordrhein-Westfalen. Already today, the GCS provides the most powerful high-performance computing - frastructure in Europe. Through GCS, HLRS participates in the European project PRACE (Partnership for Advances Computing in Europe) and - tends its reach to all European member countries. These activities aligns well with the activities of HLRS in the European HPC infrastructur...

  8. Power-efficient computer architectures recent advances

    Själander, Magnus; Kaxiras, Stefanos

    2014-01-01

    As Moore's Law and Dennard scaling trends have slowed, the challenges of building high-performance computer architectures while maintaining acceptable power efficiency levels have heightened. Over the past ten years, architecture techniques for power efficiency have shifted from primarily focusing on module-level efficiencies, toward more holistic design styles based on parallelism and heterogeneity. This work highlights and synthesizes recent techniques and trends in power-efficient computer architecture.Table of Contents: Introduction / Voltage and Frequency Management / Heterogeneity and Sp

  9. NINJA: Java for High Performance Numerical Computing

    José E. Moreira

    2002-01-01

    Full Text Available When Java was first introduced, there was a perception that its many benefits came at a significant performance cost. In the particularly performance-sensitive field of numerical computing, initial measurements indicated a hundred-fold performance disadvantage between Java and more established languages such as Fortran and C. Although much progress has been made, and Java now can be competitive with C/C++ in many important situations, significant performance challenges remain. Existing Java virtual machines are not yet capable of performing the advanced loop transformations and automatic parallelization that are now common in state-of-the-art Fortran compilers. Java also has difficulties in implementing complex arithmetic efficiently. These performance deficiencies can be attacked with a combination of class libraries (packages, in Java that implement truly multidimensional arrays and complex numbers, and new compiler techniques that exploit the properties of these class libraries to enable other, more conventional, optimizations. Two compiler techniques, versioning and semantic expansion, can be leveraged to allow fully automatic optimization and parallelization of Java code. Our measurements with the NINJA prototype Java environment show that Java can be competitive in performance with highly optimized and tuned Fortran code.

  10. A Case Study on Neural Inspired Dynamic Memory Management Strategies for High Performance Computing.

    Vineyard, Craig Michael [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Verzi, Stephen Joseph [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-09-01

    As high performance computing architectures pursue more computational power there is a need for increased memory capacity and bandwidth as well. A multi-level memory (MLM) architecture addresses this need by combining multiple memory types with different characteristics as varying levels of the same architecture. How to efficiently utilize this memory infrastructure is an unknown challenge, and in this research we sought to investigate whether neural inspired approaches can meaningfully help with memory management. In particular we explored neurogenesis inspired re- source allocation, and were able to show a neural inspired mixed controller policy can beneficially impact how MLM architectures utilize memory.

  11. Architectural Principles and Experimentation of Distributed High Performance Virtual Clusters

    Younge, Andrew J.

    2016-01-01

    With the advent of virtualization and Infrastructure-as-a-Service (IaaS), the broader scientific computing community is considering the use of clouds for their scientific computing needs. This is due to the relative scalability, ease of use, advanced user environment customization abilities, and the many novel computing paradigms available for…

  12. Enabling High-Performance Computing as a Service

    AbdelBaky, Moustafa; Parashar, Manish; Kim, Hyunjoo; Jordan, Kirk E.; Sachdeva, Vipin; Sexton, James; Jamjoom, Hani; Shae, Zon-Yin; Pencheva, Gergina; Tavakoli, Reza; Wheeler, Mary F.

    2012-01-01

    With the right software infrastructure, clouds can provide scientists with as a service access to high-performance computing resources. An award-winning prototype framework transforms the Blue Gene/P system into an elastic cloud to run a

  13. High performance computing environment for multidimensional image analysis.

    Rao, A Ravishankar; Cecchi, Guillermo A; Magnasco, Marcelo

    2007-07-10

    The processing of images acquired through microscopy is a challenging task due to the large size of datasets (several gigabytes) and the fast turnaround time required. If the throughput of the image processing stage is significantly increased, it can have a major impact in microscopy applications. We present a high performance computing (HPC) solution to this problem. This involves decomposing the spatial 3D image into segments that are assigned to unique processors, and matched to the 3D torus architecture of the IBM Blue Gene/L machine. Communication between segments is restricted to the nearest neighbors. When running on a 2 Ghz Intel CPU, the task of 3D median filtering on a typical 256 megabyte dataset takes two and a half hours, whereas by using 1024 nodes of Blue Gene, this task can be performed in 18.8 seconds, a 478x speedup. Our parallel solution dramatically improves the performance of image processing, feature extraction and 3D reconstruction tasks. This increased throughput permits biologists to conduct unprecedented large scale experiments with massive datasets.

  14. Lightweight Provenance Service for High-Performance Computing

    Dai, Dong; Chen, Yong; Carns, Philip; Jenkins, John; Ross, Robert

    2017-09-09

    Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirements of providing provenance service in situ. The need to remain lightweight and to be always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrument mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.

  15. High Performance Computing Modernization Program Kerberos Throughput Test Report

    2017-10-26

    Naval Research Laboratory Washington, DC 20375-5320 NRL/MR/5524--17-9751 High Performance Computing Modernization Program Kerberos Throughput Test ...NUMBER 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 2. REPORT TYPE1. REPORT DATE (DD-MM-YYYY) 4. TITLE AND SUBTITLE 6. AUTHOR(S) 8. PERFORMING...PAGE 18. NUMBER OF PAGES 17. LIMITATION OF ABSTRACT High Performance Computing Modernization Program Kerberos Throughput Test Report Daniel G. Gdula* and

  16. High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures

    Ltaief, Hatem

    2013-04-01

    This article presents a new high-performance bidiagonal reduction (BRD) for homogeneous multicore architectures. This article is an extension of the high-performance tridiagonal reduction implemented by the same authors [Luszczek et al., IPDPS 2011] to the BRD case. The BRD is the first step toward computing the singular value decomposition of a matrix, which is one of the most important algorithms in numerical linear algebra due to its broad impact in computational science. The high performance of the BRD described in this article comes from the combination of four important features: (1) tile algorithms with tile data layout, which provide an efficient data representation in main memory; (2) a two-stage reduction approach that allows to cast most of the computation during the first stage (reduction to band form) into calls to Level 3 BLAS and reduces the memory traffic during the second stage (reduction from band to bidiagonal form) by using high-performance kernels optimized for cache reuse; (3) a data dependence translation layer that maps the general algorithm with column-major data layout into the tile data layout; and (4) a dynamic runtime system that efficiently schedules the newly implemented kernels across the processing units and ensures that the data dependencies are not violated. A detailed analysis is provided to understand the critical impact of the tile size on the total execution time, which also corresponds to the matrix bandwidth size after the reduction of the first stage. The performance results show a significant improvement over currently established alternatives. The new high-performance BRD achieves up to a 30-fold speedup on a 16-core Intel Xeon machine with a 12000×12000 matrix size against the state-of-the-art open source and commercial numerical software packages, namely LAPACK, compiled with optimized and multithreaded BLAS from MKL as well as Intel MKL version 10.2. © 2013 ACM.

  17. Architecture for high performance stereoscopic game rendering on Android

    Flack, Julien; Sanderson, Hugh; Shetty, Sampath

    2014-03-01

    Stereoscopic gaming is a popular source of content for consumer 3D display systems. There has been a significant shift in the gaming industry towards casual games for mobile devices running on the Android™ Operating System and driven by ARM™ and other low power processors. Such systems are now being integrated directly into the next generation of 3D TVs potentially removing the requirement for an external games console. Although native stereo support has been integrated into some high profile titles on established platforms like Windows PC and PS3 there is a lack of GPU independent 3D support for the emerging Android platform. We describe a framework for enabling stereoscopic 3D gaming on Android for applications on mobile devices, set top boxes and TVs. A core component of the architecture is a 3D game driver, which is integrated into the Android OpenGL™ ES graphics stack to convert existing 2D graphics applications into stereoscopic 3D in real-time. The architecture includes a method of analyzing 2D games and using rule based Artificial Intelligence (AI) to position separate objects in 3D space. We describe an innovative stereo 3D rendering technique to separate the views in the depth domain and render directly into the display buffer. The advantages of the stereo renderer are demonstrated by characterizing the performance in comparison to more traditional render techniques, including depth based image rendering, both in terms of frame rates and impact on battery consumption.

  18. High-Performance Monitoring Architecture for Large-Scale Distributed Systems Using Event Filtering

    Maly, K.

    1998-01-01

    Monitoring is an essential process to observe and improve the reliability and the performance of large-scale distributed (LSD) systems. In an LSD environment, a large number of events is generated by the system components during its execution or interaction with external objects (e.g. users or processes). Monitoring such events is necessary for observing the run-time behavior of LSD systems and providing status information required for debugging, tuning and managing such applications. However, correlated events are generated concurrently and could be distributed in various locations in the applications environment which complicates the management decisions process and thereby makes monitoring LSD systems an intricate task. We propose a scalable high-performance monitoring architecture for LSD systems to detect and classify interesting local and global events and disseminate the monitoring information to the corresponding end- points management applications such as debugging and reactive control tools to improve the application performance and reliability. A large volume of events may be generated due to the extensive demands of the monitoring applications and the high interaction of LSD systems. The monitoring architecture employs a high-performance event filtering mechanism to efficiently process the large volume of event traffic generated by LSD systems and minimize the intrusiveness of the monitoring process by reducing the event traffic flow in the system and distributing the monitoring computation. Our architecture also supports dynamic and flexible reconfiguration of the monitoring mechanism via its Instrumentation and subscription components. As a case study, we show how our monitoring architecture can be utilized to improve the reliability and the performance of the Interactive Remote Instruction (IRI) system which is a large-scale distributed system for collaborative distance learning. The filtering mechanism represents an Intrinsic component integrated

  19. A high performance scientific cloud computing environment for materials simulations

    Jorissen, Kevin; Vila, Fernando D.; Rehr, John J.

    2011-01-01

    We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including...

  20. Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs

    Dias, Tiago; Roma, Nuno; Sousa, Leonel

    2014-12-01

    A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. Contrasting to other designs with similar functionality, the presented architecture is supported on a scalable, modular and completely configurable processing structure. This flexible structure not only allows to easily reconfigure the architecture to support different transform kernels, but it also permits its resizing to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, not only is it highly suitable to realize high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only a reduced subset of transforms that are used by a specific video standard. The experimental results that were obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, such results also demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all the standards mentioned above and that are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.

  1. Computers in Academic Architecture Libraries.

    Willis, Alfred; And Others

    1992-01-01

    Computers are widely used in architectural research and teaching in U.S. schools of architecture. A survey of libraries serving these schools sought information on the emphasis placed on computers by the architectural curriculum, accessibility of computers to library staff, and accessibility of computers to library patrons. Survey results and…

  2. InfoMall: An Innovative Strategy for High-Performance Computing and Communications Applications Development.

    Mills, Kim; Fox, Geoffrey

    1994-01-01

    Describes the InfoMall, a program led by the Northeast Parallel Architectures Center (NPAC) at Syracuse University (New York). The InfoMall features a partnership of approximately 24 organizations offering linked programs in High Performance Computing and Communications (HPCC) technology integration, software development, marketing, education and…

  3. Quantum Accelerators for High-Performance Computing Systems

    Britt, Keith A.; Mohiyaddin, Fahd A.; Humble, Travis S.

    2017-01-01

    We define some of the programming and system-level challenges facing the application of quantum processing to high-performance computing. Alongside barriers to physical integration, prominent differences in the execution of quantum and conventional programs challenges the intersection of these computational models. Following a brief overview of the state of the art, we discuss recent advances in programming and execution models for hybrid quantum-classical computing. We discuss a novel quantu...

  4. High Performance Networks From Supercomputing to Cloud Computing

    Abts, Dennis

    2011-01-01

    Datacenter networks provide the communication substrate for large parallel computer systems that form the ecosystem for high performance computing (HPC) systems and modern Internet applications. The design of new datacenter networks is motivated by an array of applications ranging from communication intensive climatology, complex material simulations and molecular dynamics to such Internet applications as Web search, language translation, collaborative Internet applications, streaming video and voice-over-IP. For both Supercomputing and Cloud Computing the network enables distributed applicati

  5. High Performance Computing Software Applications for Space Situational Awareness

    Giuliano, C.; Schumacher, P.; Matson, C.; Chun, F.; Duncan, B.; Borelli, K.; Desonia, R.; Gusciora, G.; Roe, K.

    The High Performance Computing Software Applications Institute for Space Situational Awareness (HSAI-SSA) has completed its first full year of applications development. The emphasis of our work in this first year was in improving space surveillance sensor models and image enhancement software. These applications are the Space Surveillance Network Analysis Model (SSNAM), the Air Force Space Fence simulation (SimFence), and physically constrained iterative de-convolution (PCID) image enhancement software tool. Specifically, we have demonstrated order of magnitude speed-up in those codes running on the latest Cray XD-1 Linux supercomputer (Hoku) at the Maui High Performance Computing Center. The software applications improvements that HSAI-SSA has made, has had significant impact to the warfighter and has fundamentally changed the role of high performance computing in SSA.

  6. High performance computing and communications: FY 1997 implementation plan

    NONE

    1996-12-01

    The High Performance Computing and Communications (HPCC) Program was formally authorized by passage, with bipartisan support, of the High-Performance Computing Act of 1991, signed on December 9, 1991. The original Program, in which eight Federal agencies participated, has now grown to twelve agencies. This Plan provides a detailed description of the agencies` FY 1996 HPCC accomplishments and FY 1997 HPCC plans. Section 3 of this Plan provides an overview of the HPCC Program. Section 4 contains more detailed definitions of the Program Component Areas, with an emphasis on the overall directions and milestones planned for each PCA. Appendix A provides a detailed look at HPCC Program activities within each agency.

  7. Visualization and Data Analysis for High-Performance Computing

    Sewell, Christopher Meyer [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2016-09-27

    This is a set of slides from a guest lecture for a class at the University of Texas, El Paso on visualization and data analysis for high-performance computing. The topics covered are the following: trends in high-performance computing; scientific visualization, such as OpenGL, ray tracing and volume rendering, VTK, and ParaView; data science at scale, such as in-situ visualization, image databases, distributed memory parallelism, shared memory parallelism, VTK-m, "big data", and then an analysis example.

  8. High Performance Computing in Science and Engineering '14

    Kröner, Dietmar; Resch, Michael

    2015-01-01

    This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS). The reports cover all fields of computational science and engineering ranging from CFD to computational physics and from chemistry to computer science with a special emphasis on industrially relevant applications. Presenting findings of one of Europe’s leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance. The book covers the main methods in high-performance computing. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and   engineers. The book comes with a wealth of color illustrations and tables of results.  

  9. High-performance scientific computing in the cloud

    Jorissen, Kevin; Vila, Fernando; Rehr, John

    2011-03-01

    Cloud computing has the potential to open up high-performance computational science to a much broader class of researchers, owing to its ability to provide on-demand, virtualized computational resources. However, before such approaches can become commonplace, user-friendly tools must be developed that hide the unfamiliar cloud environment and streamline the management of cloud resources for many scientific applications. We have recently shown that high-performance cloud computing is feasible for parallelized x-ray spectroscopy calculations. We now present benchmark results for a wider selection of scientific applications focusing on electronic structure and spectroscopic simulation software in condensed matter physics. These applications are driven by an improved portable interface that can manage virtual clusters and run various applications in the cloud. We also describe a next generation of cluster tools, aimed at improved performance and a more robust cluster deployment. Supported by NSF grant OCI-1048052.

  10. GPU-based high-performance computing for radiation therapy

    Jia, Xun; Jiang, Steve B; Ziegenhein, Peter

    2014-01-01

    Recent developments in radiotherapy therapy demand high computation powers to solve challenging problems in a timely fashion in a clinical environment. The graphics processing unit (GPU), as an emerging high-performance computing platform, has been introduced to radiotherapy. It is particularly attractive due to its high computational power, small size, and low cost for facility deployment and maintenance. Over the past few years, GPU-based high-performance computing in radiotherapy has experienced rapid developments. A tremendous amount of study has been conducted, in which large acceleration factors compared with the conventional CPU platform have been observed. In this paper, we will first give a brief introduction to the GPU hardware structure and programming model. We will then review the current applications of GPU in major imaging-related and therapy-related problems encountered in radiotherapy. A comparison of GPU with other platforms will also be presented. (topical review)

  11. Enabling High-Performance Computing as a Service

    AbdelBaky, Moustafa

    2012-10-01

    With the right software infrastructure, clouds can provide scientists with as a service access to high-performance computing resources. An award-winning prototype framework transforms the Blue Gene/P system into an elastic cloud to run a representative HPC application. © 2012 IEEE.

  12. Computer science of the high performance; Informatica del alto rendimiento

    Moraleda, A.

    2008-07-01

    The high performance computing is taking shape as a powerful accelerator of the process of innovation, to drastically reduce the waiting times for access to the results and the findings in a growing number of processes and activities as complex and important as medicine, genetics, pharmacology, environment, natural resources management or the simulation of complex processes in a wide variety of industries. (Author)

  13. Contemporary high performance computing from petascale toward exascale

    Vetter, Jeffrey S

    2015-01-01

    A continuation of Contemporary High Performance Computing: From Petascale toward Exascale, this second volume continues the discussion of HPC flagship systems, major application workloads, facilities, and sponsors. The book includes of figures and pictures that capture the state of existing systems: pictures of buildings, systems in production, floorplans, and many block diagrams and charts to illustrate system design and performance.

  14. High Performance Computing in Science and Engineering '16 : Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2016

    Kröner, Dietmar; Resch, Michael

    2016-01-01

    This book presents the state-of-the-art in supercomputer simulation. It includes the latest findings from leading researchers using systems from the High Performance Computing Center Stuttgart (HLRS) in 2016. The reports cover all fields of computational science and engineering ranging from CFD to computational physics and from chemistry to computer science with a special emphasis on industrially relevant applications. Presenting findings of one of Europe’s leading systems, this volume covers a wide variety of applications that deliver a high level of sustained performance. The book covers the main methods in high-performance computing. Its outstanding results in achieving the best performance for production codes are of particular interest for both scientists and engineers. The book comes with a wealth of color illustrations and tables of results.

  15. Blaze-DEMGPU: Modular high performance DEM framework for the GPU architecture

    Nicolin Govender

    2016-01-01

    Full Text Available Blaze-DEMGPU is a modular GPU based discrete element method (DEM framework that supports polyhedral shaped particles. The high level performance is attributed to the light weight and Single Instruction Multiple Data (SIMD that the GPU architecture offers. Blaze-DEMGPU offers suitable algorithms to conduct DEM simulations on the GPU and these algorithms can be extended and modified. Since a large number of scientific simulations are particle based, many of the algorithms and strategies for GPU implementation present in Blaze-DEMGPU can be applied to other fields. Blaze-DEMGPU will make it easier for new researchers to use high performance GPU computing as well as stimulate wider GPU research efforts by the DEM community.

  16. A high performance scientific cloud computing environment for materials simulations

    Jorissen, K.; Vila, F. D.; Rehr, J. J.

    2012-09-01

    We describe the development of a scientific cloud computing (SCC) platform that offers high performance computation capability. The platform consists of a scientific virtual machine prototype containing a UNIX operating system and several materials science codes, together with essential interface tools (an SCC toolset) that offers functionality comparable to local compute clusters. In particular, our SCC toolset provides automatic creation of virtual clusters for parallel computing, including tools for execution and monitoring performance, as well as efficient I/O utilities that enable seamless connections to and from the cloud. Our SCC platform is optimized for the Amazon Elastic Compute Cloud (EC2). We present benchmarks for prototypical scientific applications and demonstrate performance comparable to local compute clusters. To facilitate code execution and provide user-friendly access, we have also integrated cloud computing capability in a JAVA-based GUI. Our SCC platform may be an alternative to traditional HPC resources for materials science or quantum chemistry applications.

  17. High performance computing and communications: FY 1996 implementation plan

    NONE

    1995-05-16

    The High Performance Computing and Communications (HPCC) Program was formally authorized by passage of the High Performance Computing Act of 1991, signed on December 9, 1991. Twelve federal agencies, in collaboration with scientists and managers from US industry, universities, and research laboratories, have developed the Program to meet the challenges of advancing computing and associated communications technologies and practices. This plan provides a detailed description of the agencies` HPCC implementation plans for FY 1995 and FY 1996. This Implementation Plan contains three additional sections. Section 3 provides an overview of the HPCC Program definition and organization. Section 4 contains a breakdown of the five major components of the HPCC Program, with an emphasis on the overall directions and milestones planned for each one. Section 5 provides a detailed look at HPCC Program activities within each agency.

  18. 14th annual Results and Review Workshop on High Performance Computing in Science and Engineering

    Nagel, Wolfgang E; Resch, Michael M; Transactions of the High Performance Computing Center, Stuttgart (HLRS) 2011; High Performance Computing in Science and Engineering '11

    2012-01-01

    This book presents the state-of-the-art in simulation on supercomputers. Leading researchers present results achieved on systems of the High Performance Computing Center Stuttgart (HLRS) for the year 2011. The reports cover all fields of computational science and engineering, ranging from CFD to computational physics and chemistry, to computer science, with a special emphasis on industrially relevant applications. Presenting results for both vector systems and microprocessor-based systems, the book allows readers to compare the performance levels and usability of various architectures. As HLRS

  19. 3rd International Conference on High Performance Scientific Computing

    Kostina, Ekaterina; Phu, Hoang; Rannacher, Rolf

    2008-01-01

    This proceedings volume contains a selection of papers presented at the Third International Conference on High Performance Scientific Computing held at the Hanoi Institute of Mathematics, Vietnamese Academy of Science and Technology (VAST), March 6-10, 2006. The conference has been organized by the Hanoi Institute of Mathematics, Interdisciplinary Center for Scientific Computing (IWR), Heidelberg, and its International PhD Program ``Complex Processes: Modeling, Simulation and Optimization'', and Ho Chi Minh City University of Technology. The contributions cover the broad interdisciplinary spectrum of scientific computing and present recent advances in theory, development of methods, and applications in practice. Subjects covered are mathematical modelling, numerical simulation, methods for optimization and control, parallel computing, software development, applications of scientific computing in physics, chemistry, biology and mechanics, environmental and hydrology problems, transport, logistics and site loca...

  20. 5th International Conference on High Performance Scientific Computing

    Hoang, Xuan; Rannacher, Rolf; Schlöder, Johannes

    2014-01-01

    This proceedings volume gathers a selection of papers presented at the Fifth International Conference on High Performance Scientific Computing, which took place in Hanoi on March 5-9, 2012. The conference was organized by the Institute of Mathematics of the Vietnam Academy of Science and Technology (VAST), the Interdisciplinary Center for Scientific Computing (IWR) of Heidelberg University, Ho Chi Minh City University of Technology, and the Vietnam Institute for Advanced Study in Mathematics. The contributions cover the broad interdisciplinary spectrum of scientific computing and present recent advances in theory, development of methods, and practical applications. Subjects covered include mathematical modeling; numerical simulation; methods for optimization and control; parallel computing; software development; and applications of scientific computing in physics, mechanics and biomechanics, material science, hydrology, chemistry, biology, biotechnology, medicine, sports, psychology, transport, logistics, com...

  1. 6th International Conference on High Performance Scientific Computing

    Phu, Hoang; Rannacher, Rolf; Schlöder, Johannes

    2017-01-01

    This proceedings volume highlights a selection of papers presented at the Sixth International Conference on High Performance Scientific Computing, which took place in Hanoi, Vietnam on March 16-20, 2015. The conference was jointly organized by the Heidelberg Institute of Theoretical Studies (HITS), the Institute of Mathematics of the Vietnam Academy of Science and Technology (VAST), the Interdisciplinary Center for Scientific Computing (IWR) at Heidelberg University, and the Vietnam Institute for Advanced Study in Mathematics, Ministry of Education The contributions cover a broad, interdisciplinary spectrum of scientific computing and showcase recent advances in theory, methods, and practical applications. Subjects covered numerical simulation, methods for optimization and control, parallel computing, and software development, as well as the applications of scientific computing in physics, mechanics, biomechanics and robotics, material science, hydrology, biotechnology, medicine, transport, scheduling, and in...

  2. Unravelling the structure of matter on high-performance computers

    Kieu, T.D.; McKellar, B.H.J.

    1992-11-01

    The various phenomena and the different forms of matter in nature are believed to be the manifestation of only a handful set of fundamental building blocks-the elementary particles-which interact through the four fundamental forces. In the study of the structure of matter at this level one has to consider forces which are not sufficiently weak to be treated as small perturbations to the system, an example of which is the strong force that binds the nucleons together. High-performance computers, both vector and parallel machines, have facilitated the necessary non-perturbative treatments. The principles and the techniques of computer simulations applied to Quantum Chromodynamics are explained examples include the strong interactions, the calculation of the mass of nucleons and their decay rates. Some commercial and special-purpose high-performance machines for such calculations are also mentioned. 3 refs., 2 tabs

  3. A high level language for a high performance computer

    Perrott, R. H.

    1978-01-01

    The proposed computational aerodynamic facility will join the ranks of the supercomputers due to its architecture and increased execution speed. At present, the languages used to program these supercomputers have been modifications of programming languages which were designed many years ago for sequential machines. A new programming language should be developed based on the techniques which have proved valuable for sequential programming languages and incorporating the algorithmic techniques required for these supercomputers. The design objectives for such a language are outlined.

  4. Multi-Language Programming Environments for High Performance Java Computing

    Vladimir Getov; Paul Gray; Sava Mintchev; Vaidy Sunderam

    1999-01-01

    Recent developments in processor capabilities, software tools, programming languages and programming paradigms have brought about new approaches to high performance computing. A steadfast component of this dynamic evolution has been the scientific community’s reliance on established scientific packages. As a consequence, programmers of high‐performance applications are reluctant to embrace evolving languages such as Java. This paper describes the Java‐to‐C Interface (JCI) tool which provides ...

  5. High-performance full adder architecture in quantum-dot cellular automata

    Hamid Rashidi

    2017-06-01

    Full Text Available Quantum-dot cellular automata (QCA is a new and promising computation paradigm, which can be a viable replacement for the complementary metal–oxide–semiconductor technology at nano-scale level. This technology provides a possible solution for improving the computation in various computational applications. Two QCA full adder architectures are presented and evaluated: a new and efficient 1-bit QCA full adder architecture and a 4-bit QCA ripple carry adder (RCA architecture. The proposed architectures are simulated using QCADesigner tool version 2.0.1. These architectures are implemented with the coplanar crossover approach. The simulation results show that the proposed 1-bit QCA full adder and 4-bit QCA RCA architectures utilise 33 and 175 QCA cells, respectively. Our simulation results show that the proposed architectures outperform most results so far in the literature.

  6. High-Performance Java Codes for Computational Fluid Dynamics

    Riley, Christopher; Chatterjee, Siddhartha; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The computational science community is reluctant to write large-scale computationally -intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of Computational Fluid Dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance.

  7. Nuclear forces and high-performance computing: The perfect match

    Luu, T; Walker-Loud, A

    2009-01-01

    High-performance computing is now enabling the calculation of certain hadronic interaction parameters directly from Quantum Chromodynamics, the quantum field theory that governs the behavior of quarks and gluons and is ultimately responsible for the nuclear strong force. In this paper we briefly describe the state of the field and show how other aspects of hadronic interactions will be ascertained in the near future. We give estimates of computational requirements needed to obtain these goals, and outline a procedure for incorporating these results into the broader nuclear physics community.

  8. High-performance computing in accelerating structure design and analysis

    Li Zenghai; Folwell, Nathan; Ge Lixin; Guetz, Adam; Ivanov, Valentin; Kowalski, Marc; Lee, Lie-Quan; Ng, Cho-Kuen; Schussman, Greg; Stingelin, Lukas; Uplenchwar, Ravindra; Wolf, Michael; Xiao, Liling; Ko, Kwok

    2006-01-01

    Future high-energy accelerators such as the Next Linear Collider (NLC) will accelerate multi-bunch beams of high current and low emittance to obtain high luminosity, which put stringent requirements on the accelerating structures for efficiency and beam stability. While numerical modeling has been quite standard in accelerator R and D, designing the NLC accelerating structure required a new simulation capability because of the geometric complexity and level of accuracy involved. Under the US DOE Advanced Computing initiatives (first the Grand Challenge and now SciDAC), SLAC has developed a suite of electromagnetic codes based on unstructured grids and utilizing high-performance computing to provide an advanced tool for modeling structures at accuracies and scales previously not possible. This paper will discuss the code development and computational science research (e.g. domain decomposition, scalable eigensolvers, adaptive mesh refinement) that have enabled the large-scale simulations needed for meeting the computational challenges posed by the NLC as well as projects such as the PEP-II and RIA. Numerical results will be presented to show how high-performance computing has made a qualitative improvement in accelerator structure modeling for these accelerators, either at the component level (single cell optimization), or on the scale of an entire structure (beam heating and long-range wakefields)

  9. Optimized Architectural Approaches in Hardware and Software Enabling Very High Performance Shared Storage Systems

    CERN. Geneva

    2004-01-01

    There are issues encountered in high performance storage systems that normally lead to compromises in architecture. Compute clusters tend to have compute phases followed by an I/O phase that must move data from the entire cluster in one operation. That data may then be shared by a large number of clients creating unpredictable read and write patterns. In some cases the aggregate performance of a server cluster must exceed 100 GB/s to minimize the time required for the I/O cycle thus maximizing compute availability. Accessing the same content from multiple points in a shared file system leads to the classical problems of data "hot spots" on the disk drive side and access collisions on the data connectivity side. The traditional method for increasing apparent bandwidth usually includes data replication which is costly in both storage and management. Scaling a model that includes replicated data presents additional management challenges as capacity and bandwidth expand asymmetrically while the system is scaled. ...

  10. Building a High Performance Computing Infrastructure for Novosibirsk Scientific Center

    Adakin, A; Chubarov, D; Nikultsev, V; Belov, S; Kaplin, V; Sukharev, A; Zaytsev, A; Kalyuzhny, V; Kuchin, N; Lomakin, S

    2011-01-01

    Novosibirsk Scientific Center (NSC), also known worldwide as Akademgorodok, is one of the largest Russian scientific centers hosting Novosibirsk State University (NSU) and more than 35 research organizations of the Siberian Branch of Russian Academy of Sciences including Budker Institute of Nuclear Physics (BINP), Institute of Computational Technologies (ICT), and Institute of Computational Mathematics and Mathematical Geophysics (ICM and MG). Since each institute has specific requirements on the architecture of the computing farms involved in its research field, currently we've got several computing facilities hosted by NSC institutes, each optimized for the particular set of tasks, of which the largest are the NSU Supercomputer Center, Siberian Supercomputer Center (ICM and MG), and a Grid Computing Facility of BINP. Recently a dedicated optical network with the initial bandwidth of 10 Gbps connecting these three facilities was built in order to make it possible to share the computing resources among the research communities of participating institutes, thus providing a common platform for building the computing infrastructure for various scientific projects. Unification of the computing infrastructure is achieved by extensive use of virtualization technologies based on XEN and KVM platforms. The solution implemented was tested thoroughly within the computing environment of KEDR detector experiment which is being carried out at BINP, and foreseen to be applied to the use cases of other HEP experiments in the upcoming future.

  11. The ongoing investigation of high performance parallel computing in HEP

    Peach, Kenneth J; Böck, R K; Dobinson, Robert W; Hansroul, M; Norton, Alan Robert; Willers, Ian Malcolm; Baud, J P; Carminati, F; Gagliardi, F; McIntosh, E; Metcalf, M; Robertson, L; CERN. Geneva. Detector Research and Development Committee

    1993-01-01

    Past and current exploitation of parallel computing in High Energy Physics is summarized and a list of R & D projects in this area is presented. The applicability of new parallel hardware and software to physics problems is investigated, in the light of the requirements for computing power of LHC experiments and the current trends in the computer industry. Four main themes are discussed (possibilities for a finer grain of parallelism; fine-grain communication mechanism; usable parallel programming environment; different programming models and architectures, using standard commercial products). Parallel computing technology is potentially of interest for offline and vital for real time applications in LHC. A substantial investment in applications development and evaluation of state of the art hardware and software products is needed. A solid development environment is required at an early stage, before mainline LHC program development begins.

  12. Multicore Challenges and Benefits for High Performance Scientific Computing

    Ida M.B. Nielsen

    2008-01-01

    Full Text Available Until recently, performance gains in processors were achieved largely by improvements in clock speeds and instruction level parallelism. Thus, applications could obtain performance increases with relatively minor changes by upgrading to the latest generation of computing hardware. Currently, however, processor performance improvements are realized by using multicore technology and hardware support for multiple threads within each core, and taking full advantage of this technology to improve the performance of applications requires exposure of extreme levels of software parallelism. We will here discuss the architecture of parallel computers constructed from many multicore chips as well as techniques for managing the complexity of programming such computers, including the hybrid message-passing/multi-threading programming model. We will illustrate these ideas with a hybrid distributed memory matrix multiply and a quantum chemistry algorithm for energy computation using Møller–Plesset perturbation theory.

  13. High performance computing in science and engineering '09: transactions of the High Performance Computing Center, Stuttgart (HLRS) 2009

    Nagel, Wolfgang E; Kröner, Dietmar; Resch, Michael

    2010-01-01

    ...), NIC/JSC (J¨ u lich), and LRZ (Munich). As part of that strategic initiative, in May 2009 already NIC/JSC has installed the first phase of the GCS HPC Tier-0 resources, an IBM Blue Gene/P with roughly 300.000 Cores, this time in J¨ u lich, With that, the GCS provides the most powerful high-performance computing infrastructure in Europe alread...

  14. High Performance Numerical Computing for High Energy Physics: A New Challenge for Big Data Science

    Pop, Florin

    2014-01-01

    Modern physics is based on both theoretical analysis and experimental validation. Complex scenarios like subatomic dimensions, high energy, and lower absolute temperature are frontiers for many theoretical models. Simulation with stable numerical methods represents an excellent instrument for high accuracy analysis, experimental validation, and visualization. High performance computing support offers possibility to make simulations at large scale, in parallel, but the volume of data generated by these experiments creates a new challenge for Big Data Science. This paper presents existing computational methods for high energy physics (HEP) analyzed from two perspectives: numerical methods and high performance computing. The computational methods presented are Monte Carlo methods and simulations of HEP processes, Markovian Monte Carlo, unfolding methods in particle physics, kernel estimation in HEP, and Random Matrix Theory used in analysis of particles spectrum. All of these methods produce data-intensive applications, which introduce new challenges and requirements for ICT systems architecture, programming paradigms, and storage capabilities.

  15. A Multi-Time Scale Morphable Software Milieu for Polymorphous Computing Architectures (PCA) - Composable, Scalable Systems

    Skjellum, Anthony

    2004-01-01

    Polymorphous Computing Architectures (PCA) rapidly "morph" (reorganize) software and hardware configurations in order to achieve high performance on computation styles ranging from specialized streaming to general threaded applications...

  16. High performance computing and communications: FY 1995 implementation plan

    NONE

    1994-04-01

    The High Performance Computing and Communications (HPCC) Program was formally established following passage of the High Performance Computing Act of 1991 signed on December 9, 1991. Ten federal agencies in collaboration with scientists and managers from US industry, universities, and laboratories have developed the HPCC Program to meet the challenges of advancing computing and associated communications technologies and practices. This plan provides a detailed description of the agencies` HPCC implementation plans for FY 1994 and FY 1995. This Implementation Plan contains three additional sections. Section 3 provides an overview of the HPCC Program definition and organization. Section 4 contains a breakdown of the five major components of the HPCC Program, with an emphasis on the overall directions and milestones planned for each one. Section 5 provides a detailed look at HPCC Program activities within each agency. Although the Department of Education is an official HPCC agency, its current funding and reporting of crosscut activities goes through the Committee on Education and Health Resources, not the HPCC Program. For this reason the Implementation Plan covers nine HPCC agencies.

  17. From Smart-Eco Building to High-Performance Architecture: Optimization of Energy Consumption in Architecture of Developing Countries

    Mahdavinejad, M.; Bitaab, N.

    2017-08-01

    Search for high-performance architecture and dreams of future architecture resulted in attempts towards meeting energy efficient architecture and planning in different aspects. Recent trends as a mean to meet future legacy in architecture are based on the idea of innovative technologies for resource efficient buildings, performative design, bio-inspired technologies etc. while there are meaningful differences between architecture of developed and developing countries. Significance of issue might be understood when the emerging cities are found interested in Dubaization and other related booming development doctrines. This paper is to analyze the level of developing countries’ success to achieve smart-eco buildings’ goals and objectives. Emerging cities of West of Asia are selected as case studies of the paper. The results of the paper show that the concept of high-performance architecture and smart-eco buildings are different in developing countries in comparison with developed countries. The paper is to mention five essential issues in order to improve future architecture of developing countries: 1- Integrated Strategies for Energy Efficiency, 2- Contextual Solutions, 3- Embedded and Initial Energy Assessment, 4- Staff and Occupancy Wellbeing, 5- Life-Cycle Monitoring.

  18. Spatial computing in interactive architecture

    S.O. Dulman (Stefan); M. Krezer; L. Hovestad

    2014-01-01

    htmlabstractDistributed computing is the theoretical foundation for applications and technologies like interactive architecture, wearable computing, and smart materials. It evolves continuously, following needs rising from scientific developments, novel uses of technology, or simply the curiosity to

  19. CITAstudio: Computation in Architecture 2015

    Nicholas, Paul; Ayres, Phil

    2016-01-01

    CITAstudio yearbook. CITAstudio: Computation in Architecture is a two year International Master's Programme at The Royal Danish Academy of Fine Arts, School of Architecture. With a focus on digital design and material fabrication the programme questions how computation is changing our spatial...

  20. High performance computer code for molecular dynamics simulations

    Levay, I.; Toekesi, K.

    2007-01-01

    Complete text of publication follows. Molecular Dynamics (MD) simulation is a widely used technique for modeling complicated physical phenomena. Since 2005 we are developing a MD simulations code for PC computers. The computer code is written in C++ object oriented programming language. The aim of our work is twofold: a) to develop a fast computer code for the study of random walk of guest atoms in Be crystal, b) 3 dimensional (3D) visualization of the particles motion. In this case we mimic the motion of the guest atoms in the crystal (diffusion-type motion), and the motion of atoms in the crystallattice (crystal deformation). Nowadays, it is common to use Graphics Devices in intensive computational problems. There are several ways to use this extreme processing performance, but never before was so easy to programming these devices as now. The CUDA (Compute Unified Device) Architecture introduced by nVidia Corporation in 2007 is a very useful for every processor hungry application. A Unified-architecture GPU include 96-128, or more stream processors, so the raw calculation performance is 576(!) GFLOPS. It is ten times faster, than the fastest dual Core CPU [Fig.1]. Our improved MD simulation software uses this new technology, which speed up our software and the code run 10 times faster in the critical calculation code segment. Although the GPU is a very powerful tool, it has a strongly paralleled structure. It means, that we have to create an algorithm, which works on several processors without deadlock. Our code currently uses 256 threads, shared and constant on-chip memory, instead of global memory, which is 100 times slower than others. It is possible to implement the total algorithm on GPU, therefore we do not need to download and upload the data in every iteration. On behalf of maximal throughput, every thread run with the same instructions

  1. Layered architecture for quantum computing

    Jones, N. Cody; Van Meter, Rodney; Fowler, Austin G.; McMahon, Peter L.; Kim, Jungsang; Ladd, Thaddeus D.; Yamamoto, Yoshihisa

    2010-01-01

    We develop a layered quantum-computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface-code quantum error correction. In doing so, we propose a new quantum-computer architecture based on optical control of quantum dot...

  2. Computer architecture a quantitative approach

    Hennessy, John L

    2019-01-01

    Computer Architecture: A Quantitative Approach, Sixth Edition has been considered essential reading by instructors, students and practitioners of computer design for over 20 years. The sixth edition of this classic textbook is fully revised with the latest developments in processor and system architecture. It now features examples from the RISC-V (RISC Five) instruction set architecture, a modern RISC instruction set developed and designed to be a free and openly adoptable standard. It also includes a new chapter on domain-specific architectures and an updated chapter on warehouse-scale computing that features the first public information on Google's newest WSC. True to its original mission of demystifying computer architecture, this edition continues the longstanding tradition of focusing on areas where the most exciting computing innovation is happening, while always keeping an emphasis on good engineering design.

  3. High performance matrix inversion based on LU factorization for multicore architectures

    Dongarra, Jack

    2011-01-01

    The goal of this paper is to present an efficient implementation of an explicit matrix inversion of general square matrices on multicore computer architecture. The inversion procedure is split into four steps: 1) computing the LU factorization, 2) inverting the upper triangular U factor, 3) solving a linear system, whose solution yields inverse of the original matrix and 4) applying backward column pivoting on the inverted matrix. Using a tile data layout, which represents the matrix in the system memory with an optimized cache-aware format, the computation of the four steps is decomposed into computational tasks. A directed acyclic graph is generated on the fly which represents the program data flow. Its nodes represent tasks and edges the data dependencies between them. Previous implementations of matrix inversions, available in the state-of-the-art numerical libraries, are suffer from unnecessary synchronization points, which are non-existent in our implementation in order to fully exploit the parallelism of the underlying hardware. Our algorithmic approach allows to remove these bottlenecks and to execute the tasks with loose synchronization. A runtime environment system called QUARK is necessary to dynamically schedule our numerical kernels on the available processing units. The reported results from our LU-based matrix inversion implementation significantly outperform the state-of-the-art numerical libraries such as LAPACK (5x), MKL (5x) and ScaLAPACK (2.5x) on a contemporary AMD platform with four sockets and the total of 48 cores for a matrix of size 24000. A power consumption analysis shows that our high performance implementation is also energy efficient and substantially consumes less power than its competitors. © 2011 ACM.

  4. High Performance Computing - Power Application Programming Interface Specification.

    Laros, James H.,; Kelly, Suzanne M.; Pedretti, Kevin; Grant, Ryan; Olivier, Stephen Lecler; Levenhagen, Michael J.; DeBonis, David

    2014-08-01

    Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.

  5. Simple, parallel, high-performance virtual machines for extreme computations

    Chokoufe Nejad, Bijan; Ohl, Thorsten; Reuter, Jurgen

    2014-11-01

    We introduce a high-performance virtual machine (VM) written in a numerically fast language like Fortran or C to evaluate very large expressions. We discuss the general concept of how to perform computations in terms of a VM and present specifically a VM that is able to compute tree-level cross sections for any number of external legs, given the corresponding byte code from the optimal matrix element generator, O'Mega. Furthermore, this approach allows to formulate the parallel computation of a single phase space point in a simple and obvious way. We analyze hereby the scaling behaviour with multiple threads as well as the benefits and drawbacks that are introduced with this method. Our implementation of a VM can run faster than the corresponding native, compiled code for certain processes and compilers, especially for very high multiplicities, and has in general runtimes in the same order of magnitude. By avoiding the tedious compile and link steps, which may fail for source code files of gigabyte sizes, new processes or complex higher order corrections that are currently out of reach could be evaluated with a VM given enough computing power.

  6. What Physicists Should Know About High Performance Computing - Circa 2002

    Frederick, Donald

    2002-08-01

    High Performance Computing (HPC) is a dynamic, cross-disciplinary field that traditionally has involved applied mathematicians, computer scientists, and others primarily from the various disciplines that have been major users of HPC resources - physics, chemistry, engineering, with increasing use by those in the life sciences. There is a technological dynamic that is powered by economic as well as by technical innovations and developments. This talk will discuss practical ideas to be considered when developing numerical applications for research purposes. Even with the rapid pace of development in the field, the author believes that these concepts will not become obsolete for a while, and will be of use to scientists who either are considering, or who have already started down the HPC path. These principles will be applied in particular to current parallel HPC systems, but there will also be references of value to desktop users. The talk will cover such topics as: computing hardware basics, single-cpu optimization, compilers, timing, numerical libraries, debugging and profiling tools and the emergence of Computational Grids.

  7. NCI's High Performance Computing (HPC) and High Performance Data (HPD) Computing Platform for Environmental and Earth System Data Science

    Evans, Ben; Allen, Chris; Antony, Joseph; Bastrakova, Irina; Gohar, Kashif; Porter, David; Pugh, Tim; Santana, Fabiana; Smillie, Jon; Trenham, Claire; Wang, Jingbo; Wyborn, Lesley

    2015-04-01

    The National Computational Infrastructure (NCI) has established a powerful and flexible in-situ petascale computational environment to enable both high performance computing and Data-intensive Science across a wide spectrum of national environmental and earth science data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress so far to harmonise the underlying data collections for future interdisciplinary research across these large volume data collections. NCI has established 10+ PBytes of major national and international data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the major Australian national-scale scientific collections), leading research communities, and collaborating overseas organisations. New infrastructures created at NCI mean the data collections are now accessible within an integrated High Performance Computing and Data (HPC-HPD) environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large-scale high-bandwidth Lustre filesystems. The hardware was designed at inception to ensure that it would allow the layered software environment to flexibly accommodate the advancement of future data science. New approaches to software technology and data models have also had to be developed to enable access to these large and exponentially

  8. Digital design and computer architecture

    Harris, David

    2010-01-01

    Digital Design and Computer Architecture is designed for courses that combine digital logic design with computer organization/architecture or that teach these subjects as a two-course sequence. Digital Design and Computer Architecture begins with a modern approach by rigorously covering the fundamentals of digital logic design and then introducing Hardware Description Languages (HDLs). Featuring examples of the two most widely-used HDLs, VHDL and Verilog, the first half of the text prepares the reader for what follows in the second: the design of a MIPS Processor. By the end of D

  9. Trends in high-performance computing for engineering calculations.

    Giles, M B; Reguly, I

    2014-08-13

    High-performance computing has evolved remarkably over the past 20 years, and that progress is likely to continue. However, in recent years, this progress has been achieved through greatly increased hardware complexity with the rise of multicore and manycore processors, and this is affecting the ability of application developers to achieve the full potential of these systems. This article outlines the key developments on the hardware side, both in the recent past and in the near future, with a focus on two key issues: energy efficiency and the cost of moving data. It then discusses the much slower evolution of system software, and the implications of all of this for application developers. © 2014 The Author(s) Published by the Royal Society. All rights reserved.

  10. Power/energy use cases for high performance computing

    Laros, James H. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Kelly, Suzanne M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Hammond, Steven [National Renewable Energy Lab. (NREL), Golden, CO (United States); Elmore, Ryan [National Renewable Energy Lab. (NREL), Golden, CO (United States); Munch, Kristin [National Renewable Energy Lab. (NREL), Golden, CO (United States)

    2013-12-01

    Power and Energy have been identified as a first order challenge for future extreme scale high performance computing (HPC) systems. In practice the breakthroughs will need to be provided by the hardware vendors. But to make the best use of the solutions in an HPC environment, it will likely require periodic tuning by facility operators and software components. This document describes the actions and interactions needed to maximize power resources. It strives to cover the entire operational space in which an HPC system occupies. The descriptions are presented as formal use cases, as documented in the Unified Modeling Language Specification [1]. The document is intended to provide a common understanding to the HPC community of the necessary management and control capabilities. Assuming a common understanding can be achieved, the next step will be to develop a set of Application Programing Interfaces (APIs) to which hardware vendors and software developers could utilize to steer power consumption.

  11. Scalability of DL_POLY on High Performance Computing Platform

    Mabule Samuel Mabakane

    2017-12-01

    Full Text Available This paper presents a case study on the scalability of several versions of the molecular dynamics code (DL_POLY performed on South Africa‘s Centre for High Performance Computing e1350 IBM Linux cluster, Sun system and Lengau supercomputers. Within this study different problem sizes were designed and the same chosen systems were employed in order to test the performance of DL_POLY using weak and strong scalability. It was found that the speed-up results for the small systems were better than large systems on both Ethernet and Infiniband network. However, simulations of large systems in DL_POLY performed well using Infiniband network on Lengau cluster as compared to e1350 and Sun supercomputer.

  12. Parameters that affect parallel processing for computational electromagnetic simulation codes on high performance computing clusters

    Moon, Hongsik

    changing computer hardware platforms in order to provide fast, accurate and efficient solutions to large, complex electromagnetic problems. The research in this dissertation proves that the performance of parallel code is intimately related to the configuration of the computer hardware and can be maximized for different hardware platforms. To benchmark and optimize the performance of parallel CEM software, a variety of large, complex projects are created and executed on a variety of computer platforms. The computer platforms used in this research are detailed in this dissertation. The projects run as benchmarks are also described in detail and results are presented. The parameters that affect parallel CEM software on High Performance Computing Clusters (HPCC) are investigated. This research demonstrates methods to maximize the performance of parallel CEM software code.

  13. SAME4HPC: A Promising Approach in Building a Scalable and Mobile Environment for High-Performance Computing

    Karthik, Rajasekar [ORNL

    2014-01-01

    In this paper, an architecture for building Scalable And Mobile Environment For High-Performance Computing with spatial capabilities called SAME4HPC is described using cutting-edge technologies and standards such as Node.js, HTML5, ECMAScript 6, and PostgreSQL 9.4. Mobile devices are increasingly becoming powerful enough to run high-performance apps. At the same time, there exist a significant number of low-end and older devices that rely heavily on the server or the cloud infrastructure to do the heavy lifting. Our architecture aims to support both of these types of devices to provide high-performance and rich user experience. A cloud infrastructure consisting of OpenStack with Ubuntu, GeoServer, and high-performance JavaScript frameworks are some of the key open-source and industry standard practices that has been adopted in this architecture.

  14. Computing on Knights and Kepler Architectures

    Bortolotti, G; Caberletti, M; Ferraro, A; Giacomini, F; Manzali, M; Maron, G; Salomoni, D; Crimi, G; Zanella, M

    2014-01-01

    A recent trend in scientific computing is the increasingly important role of co-processors, originally built to accelerate graphics rendering, and now used for general high-performance computing. The INFN Computing On Knights and Kepler Architectures (COKA) project focuses on assessing the suitability of co-processor boards for scientific computing in a wide range of physics applications, and on studying the best programming methodologies for these systems. Here we present in a comparative way our results in porting a Lattice Boltzmann code on two state-of-the-art accelerators: the NVIDIA K20X, and the Intel Xeon-Phi. We describe our implementations, analyze results and compare with a baseline architecture adopting Intel Sandy Bridge CPUs.

  15. High performance integer arithmetic circuit design on FPGA architecture, implementation and design automation

    Palchaudhuri, Ayan

    2016-01-01

    This book describes the optimized implementations of several arithmetic datapath, controlpath and pseudorandom sequence generator circuits for realization of high performance arithmetic circuits targeted towards a specific family of the high-end Field Programmable Gate Arrays (FPGAs). It explores regular, modular, cascadable, and bit-sliced architectures of these circuits, by directly instantiating the target FPGA-specific primitives in the HDL. Every proposed architecture is justified with detailed mathematical analyses. Simultaneously, constrained placement of the circuit building blocks is performed, by placing the logically related hardware primitives in close proximity to one another by supplying relevant placement constraints in the Xilinx proprietary “User Constraints File”. The book covers the implementation of a GUI-based CAD tool named FlexiCore integrated with the Xilinx Integrated Software Environment (ISE) for design automation of platform-specific high-performance arithmetic circuits from us...

  16. Layered Architecture for Quantum Computing

    N. Cody Jones

    2012-07-01

    Full Text Available We develop a layered quantum-computer architecture, which is a systematic framework for tackling the individual challenges of developing a quantum computer while constructing a cohesive device design. We discuss many of the prominent techniques for implementing circuit-model quantum computing and introduce several new methods, with an emphasis on employing surface-code quantum error correction. In doing so, we propose a new quantum-computer architecture based on optical control of quantum dots. The time scales of physical-hardware operations and logical, error-corrected quantum gates differ by several orders of magnitude. By dividing functionality into layers, we can design and analyze subsystems independently, demonstrating the value of our layered architectural approach. Using this concrete hardware platform, we provide resource analysis for executing fault-tolerant quantum algorithms for integer factoring and quantum simulation, finding that the quantum-dot architecture we study could solve such problems on the time scale of days.

  17. Geometric Computing for Freeform Architecture

    Wallner, J.; Pottmann, Helmut

    2011-01-01

    Geometric computing has recently found a new field of applications, namely the various geometric problems which lie at the heart of rationalization and construction-aware design processes of freeform architecture. We report on our work in this area

  18. Computing architecture for autonomous microgrids

    Goldsmith, Steven Y.

    2015-09-29

    A computing architecture that facilitates autonomously controlling operations of a microgrid is described herein. A microgrid network includes numerous computing devices that execute intelligent agents, each of which is assigned to a particular entity (load, source, storage device, or switch) in the microgrid. The intelligent agents can execute in accordance with predefined protocols to collectively perform computations that facilitate uninterrupted control of the .

  19. A Heterogeneous Quantum Computer Architecture

    Fu, X.; Riesebos, L.; Lao, L.; Garcia Almudever, C.; Sebastiano, F.; Versluis, R.; Charbon, E.; Bertels, K.

    2016-01-01

    In this paper, we present a high level view of the heterogeneous quantum computer architecture as any future quantum computer will consist of both a classical and quantum computing part. The classical part is needed for error correction as well as for the execution of algorithms that contain both

  20. Bringing high-performance computing to the biologist's workbench: approaches, applications, and challenges

    Oehmen, C S; Cannon, W R

    2008-01-01

    Data-intensive and high-performance computing are poised to significantly impact the future of biological research which is increasingly driven by the prevalence of high-throughput experimental methodologies for genome sequencing, transcriptomics, proteomics, and other areas. Large centers such as NIH's National Center for Biotechnology Information, The Institute for Genomic Research, and the DOE's Joint Genome Institute) have made extensive use of multiprocessor architectures to deal with some of the challenges of processing, storing and curating exponentially growing genomic and proteomic datasets, thus enabling users to rapidly access a growing public data source, as well as use analysis tools transparently on high-performance computing resources. Applying this computational power to single-investigator analysis, however, often relies on users to provide their own computational resources, forcing them to endure the learning curve of porting, building, and running software on multiprocessor architectures. Solving the next generation of large-scale biology challenges using multiprocessor machines-from small clusters to emerging petascale machines-can most practically be realized if this learning curve can be minimized through a combination of workflow management, data management and resource allocation as well as intuitive interfaces and compatibility with existing common data formats

  1. Research Activity in Computational Physics utilizing High Performance Computing: Co-authorship Network Analysis

    Ahn, Sul-Ah; Jung, Youngim

    2016-10-01

    The research activities of the computational physicists utilizing high performance computing are analyzed by bibliometirc approaches. This study aims at providing the computational physicists utilizing high-performance computing and policy planners with useful bibliometric results for an assessment of research activities. In order to achieve this purpose, we carried out a co-authorship network analysis of journal articles to assess the research activities of researchers for high-performance computational physics as a case study. For this study, we used journal articles of the Scopus database from Elsevier covering the time period of 2004-2013. We extracted the author rank in the physics field utilizing high-performance computing by the number of papers published during ten years from 2004. Finally, we drew the co-authorship network for 45 top-authors and their coauthors, and described some features of the co-authorship network in relation to the author rank. Suggestions for further studies are discussed.

  2. Computational Environments and Analysis methods available on the NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform

    Evans, B. J. K.; Foster, C.; Minchin, S. A.; Pugh, T.; Lewis, A.; Wyborn, L. A.; Evans, B. J.; Uhlherr, A.

    2014-12-01

    The National Computational Infrastructure (NCI) has established a powerful in-situ computational environment to enable both high performance computing and data-intensive science across a wide spectrum of national environmental data collections - in particular climate, observational data and geoscientific assets. This paper examines 1) the computational environments that supports the modelling and data processing pipelines, 2) the analysis environments and methods to support data analysis, and 3) the progress in addressing harmonisation of the underlying data collections for future transdisciplinary research that enable accurate climate projections. NCI makes available 10+ PB major data collections from both the government and research sectors based on six themes: 1) weather, climate, and earth system science model simulations, 2) marine and earth observations, 3) geosciences, 4) terrestrial ecosystems, 5) water and hydrology, and 6) astronomy, social and biosciences. Collectively they span the lithosphere, crust, biosphere, hydrosphere, troposphere, and stratosphere. The data is largely sourced from NCI's partners (which include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. The data is accessible within an integrated HPC-HPD environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large scale and high-bandwidth Lustre filesystems. This computational environment supports a catalogue of integrated reusable software and workflows from earth system and ecosystem modelling, weather research, satellite and other observed data processing and analysis. To enable transdisciplinary research on this scale, data needs to be harmonised so that researchers can readily apply techniques and software across the corpus of data available and not be constrained to work within artificial disciplinary boundaries. Future challenges will

  3. Implementing an Affordable High-Performance Computing for Teaching-Oriented Computer Science Curriculum

    Abuzaghleh, Omar; Goldschmidt, Kathleen; Elleithy, Yasser; Lee, Jeongkyu

    2013-01-01

    With the advances in computing power, high-performance computing (HPC) platforms have had an impact on not only scientific research in advanced organizations but also computer science curriculum in the educational community. For example, multicore programming and parallel systems are highly desired courses in the computer science major. However,…

  4. A Heterogeneous High-Performance System for Computational and Computer Science

    2016-11-15

    expand the research infrastructure at the institution but also to enhance the high -performance computing training provided to both undergraduate and... cloud computing, supercomputing, and the availability of cheap memory and storage led to enormous amounts of data to be sifted through in forensic... High -Performance Computing (HPC) tools that can be integrated with existing curricula and support our research to modernize and dramatically advance

  5. Real-time Tsunami Inundation Prediction Using High Performance Computers

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2014-12-01

    Recently off-shore tsunami observation stations based on cabled ocean bottom pressure gauges are actively being deployed especially in Japan. These cabled systems are designed to provide real-time tsunami data before tsunamis reach coastlines for disaster mitigation purposes. To receive real benefits of these observations, real-time analysis techniques to make an effective use of these data are necessary. A representative study was made by Tsushima et al. (2009) that proposed a method to provide instant tsunami source prediction based on achieving tsunami waveform data. As time passes, the prediction is improved by using updated waveform data. After a tsunami source is predicted, tsunami waveforms are synthesized from pre-computed tsunami Green functions of linear long wave equations. Tsushima et al. (2014) updated the method by combining the tsunami waveform inversion with an instant inversion of coseismic crustal deformation and improved the prediction accuracy and speed in the early stages. For disaster mitigation purposes, real-time predictions of tsunami inundation are also important. In this study, we discuss the possibility of real-time tsunami inundation predictions, which require faster-than-real-time tsunami inundation simulation in addition to instant tsunami source analysis. Although the computational amount is large to solve non-linear shallow water equations for inundation predictions, it has become executable through the recent developments of high performance computing technologies. We conducted parallel computations of tsunami inundation and achieved 6.0 TFLOPS by using 19,000 CPU cores. We employed a leap-frog finite difference method with nested staggered grids of which resolution range from 405 m to 5 m. The resolution ratio of each nested domain was 1/3. Total number of grid points were 13 million, and the time step was 0.1 seconds. Tsunami sources of 2011 Tohoku-oki earthquake were tested. The inundation prediction up to 2 hours after the

  6. High performance computing in power and energy systems

    Khaitan, Siddhartha Kumar [Iowa State Univ., Ames, IA (United States); Gupta, Anshul (eds.) [IBM Watson Research Center, Yorktown Heights, NY (United States)

    2013-07-01

    The twin challenge of meeting global energy demands in the face of growing economies and populations and restricting greenhouse gas emissions is one of the most daunting ones that humanity has ever faced. Smart electrical generation and distribution infrastructure will play a crucial role in meeting these challenges. We would need to develop capabilities to handle large volumes of data generated by the power system components like PMUs, DFRs and other data acquisition devices as well as by the capacity to process these data at high resolution via multi-scale and multi-period simulations, cascading and security analysis, interaction between hybrid systems (electric, transport, gas, oil, coal, etc.) and so on, to get meaningful information in real time to ensure a secure, reliable and stable power system grid. Advanced research on development and implementation of market-ready leading-edge high-speed enabling technologies and algorithms for solving real-time, dynamic, resource-critical problems will be required for dynamic security analysis targeted towards successful implementation of Smart Grid initiatives. This books aims to bring together some of the latest research developments as well as thoughts on the future research directions of the high performance computing applications in electric power systems planning, operations, security, markets, and grid integration of alternate sources of energy, etc.

  7. The Future of Software Engineering for High Performance Computing

    Pope, G [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2015-07-16

    DOE ASCR requested that from May through mid-July 2015 a study group identify issues and recommend solutions from a software engineering perspective transitioning into the next generation of High Performance Computing. The approach used was to ask some of the DOE complex experts who will be responsible for doing this work to contribute to the study group. The technique used was to solicit elevator speeches: a short and concise write up done as if the author was a speaker with only a few minutes to convince a decision maker of their top issues. Pages 2-18 contain the original texts of the contributed elevator speeches and end notes identifying the 20 contributors. The study group also ranked the importance of each topic, and those scores are displayed with each topic heading. A perfect score (and highest priority) is three, two is medium priority, and one is lowest priority. The highest scoring topic areas were software engineering and testing resources; the lowest scoring area was compliance to DOE standards. The following two paragraphs are an elevator speech summarizing the contributed elevator speeches. Each sentence or phrase in the summary is hyperlinked to its source via a numeral embedded in the text. A risk one liner has also been added to each topic to allow future risk tracking and mitigation.

  8. A checkpoint compression study for high-performance computing systems

    Ibtesham, Dewan [Univ. of New Mexico, Albuquerque, NM (United States). Dept. of Computer Science; Ferreira, Kurt B. [Sandia National Laboratories (SNL-NM), Albuquerque, NM (United States). Scalable System Software Dept.; Arnold, Dorian [Univ. of New Mexico, Albuquerque, NM (United States). Dept. of Computer Science

    2015-02-17

    As high-performance computing systems continue to increase in size and complexity, higher failure rates and increased overheads for checkpoint/restart (CR) protocols have raised concerns about the practical viability of CR protocols for future systems. Previously, compression has proven to be a viable approach for reducing checkpoint data volumes and, thereby, reducing CR protocol overhead leading to improved application performance. In this article, we further explore compression-based CR optimization by exploring its baseline performance and scaling properties, evaluating whether improved compression algorithms might lead to even better application performance and comparing checkpoint compression against and alongside other software- and hardware-based optimizations. Our results highlights are: (1) compression is a very viable CR optimization; (2) generic, text-based compression algorithms appear to perform near optimally for checkpoint data compression and faster compression algorithms will not lead to better application performance; (3) compression-based optimizations fare well against and alongside other software-based optimizations; and (4) while hardware-based optimizations outperform software-based ones, they are not as cost effective.

  9. Enabling Efficient Climate Science Workflows in High Performance Computing Environments

    Krishnan, H.; Byna, S.; Wehner, M. F.; Gu, J.; O'Brien, T. A.; Loring, B.; Stone, D. A.; Collins, W.; Prabhat, M.; Liu, Y.; Johnson, J. N.; Paciorek, C. J.

    2015-12-01

    A typical climate science workflow often involves a combination of acquisition of data, modeling, simulation, analysis, visualization, publishing, and storage of results. Each of these tasks provide a myriad of challenges when running on a high performance computing environment such as Hopper or Edison at NERSC. Hurdles such as data transfer and management, job scheduling, parallel analysis routines, and publication require a lot of forethought and planning to ensure that proper quality control mechanisms are in place. These steps require effectively utilizing a combination of well tested and newly developed functionality to move data, perform analysis, apply statistical routines, and finally, serve results and tools to the greater scientific community. As part of the CAlibrated and Systematic Characterization, Attribution and Detection of Extremes (CASCADE) project we highlight a stack of tools our team utilizes and has developed to ensure that large scale simulation and analysis work are commonplace and provide operations that assist in everything from generation/procurement of data (HTAR/Globus) to automating publication of results to portals like the Earth Systems Grid Federation (ESGF), all while executing everything in between in a scalable environment in a task parallel way (MPI). We highlight the use and benefit of these tools by showing several climate science analysis use cases they have been applied to.

  10. Multi-Language Programming Environments for High Performance Java Computing

    Vladimir Getov

    1999-01-01

    Full Text Available Recent developments in processor capabilities, software tools, programming languages and programming paradigms have brought about new approaches to high performance computing. A steadfast component of this dynamic evolution has been the scientific community’s reliance on established scientific packages. As a consequence, programmers of high‐performance applications are reluctant to embrace evolving languages such as Java. This paper describes the Java‐to‐C Interface (JCI tool which provides application programmers wishing to use Java with immediate accessibility to existing scientific packages. The JCI tool also facilitates rapid development and reuse of existing code. These benefits are provided at minimal cost to the programmer. While beneficial to the programmer, the additional advantages of mixed‐language programming in terms of application performance and portability are addressed in detail within the context of this paper. In addition, we discuss how the JCI tool is complementing other ongoing projects such as IBM’s High‐Performance Compiler for Java (HPCJ and IceT’s metacomputing environment.

  11. High Performance Computing (HPC) Challenge (HPCC) Benchmark Suite Development

    Dongarra, J. J

    2005-01-01

    .... The applications of performance modeling are numerous, including evaluation of algorithms, optimization of code implementation, parallel library development, and comparison of system architectures...

  12. Programmable architecture for quantum computing

    Chen, J.; Wang, L.; Charbon, E.; Wang, B.

    2013-01-01

    A programmable architecture called “quantum FPGA (field-programmable gate array)” (QFPGA) is presented for quantum computing, which is a hybrid model combining the advantages of the qubus system and the measurement-based quantum computation. There are two kinds of buses in QFPGA, the local bus and

  13. High performance statistical computing with parallel R: applications to biology and climate modelling

    Samatova, Nagiza F; Branstetter, Marcia; Ganguly, Auroop R; Hettich, Robert; Khan, Shiraj; Kora, Guruprasad; Li, Jiangtian; Ma, Xiaosong; Pan, Chongle; Shoshani, Arie; Yoginath, Srikanth

    2006-01-01

    Ultrascale computing and high-throughput experimental technologies have enabled the production of scientific data about complex natural phenomena. With this opportunity, comes a new problem - the massive quantities of data so produced. Answers to fundamental questions about the nature of those phenomena remain largely hidden in the produced data. The goal of this work is to provide a scalable high performance statistical data analysis framework to help scientists perform interactive analyses of these raw data to extract knowledge. Towards this goal we have been developing an open source parallel statistical analysis package, called Parallel R, that lets scientists employ a wide range of statistical analysis routines on high performance shared and distributed memory architectures without having to deal with the intricacies of parallelizing these routines

  14. How to build a high-performance compute cluster for the Grid

    Reinefeld, A

    2001-01-01

    The success of large-scale multi-national projects like the forthcoming analysis of the LHC particle collision data at CERN relies to a great extent on the ability to efficiently utilize computing and data-storage resources at geographically distributed sites. Currently, much effort is spent on the design of Grid management software (Datagrid, Globus, etc.), while the effective integration of computing nodes has been largely neglected up to now. This is the focus of our work. We present a framework for a high- performance cluster that can be used as a reliable computing node in the Grid. We outline the cluster architecture, the management of distributed data and the seamless integration of the cluster into the Grid environment. (11 refs).

  15. FY 1995 Blue Book: High Performance Computing and Communications: Technology for the National Information Infrastructure

    Networking and Information Technology Research and Development, Executive Office of the President — The Federal High Performance Computing and Communications HPCC Program was created to accelerate the development of future generations of high performance computers...

  16. Savannah River Site computing architecture

    1991-03-29

    A computing architecture is a framework for making decisions about the implementation of computer technology and the supporting infrastructure. Because of the size, diversity, and amount of resources dedicated to computing at the Savannah River Site (SRS), there must be an overall strategic plan that can be followed by the thousands of site personnel who make decisions daily that directly affect the SRS computing environment and impact the site's production and business systems. This plan must address the following requirements: There must be SRS-wide standards for procurement or development of computing systems (hardware and software). The site computing organizations must develop systems that end users find easy to use. Systems must be put in place to support the primary function of site information workers. The developers of computer systems must be given tools that automate and speed up the development of information systems and applications based on computer technology. This document describes a proposal for a site-wide computing architecture that addresses the above requirements. In summary, this architecture is standards-based data-driven, and workstation-oriented with larger systems being utilized for the delivery of needed information to users in a client-server relationship.

  17. Savannah River Site computing architecture

    1991-03-29

    A computing architecture is a framework for making decisions about the implementation of computer technology and the supporting infrastructure. Because of the size, diversity, and amount of resources dedicated to computing at the Savannah River Site (SRS), there must be an overall strategic plan that can be followed by the thousands of site personnel who make decisions daily that directly affect the SRS computing environment and impact the site`s production and business systems. This plan must address the following requirements: There must be SRS-wide standards for procurement or development of computing systems (hardware and software). The site computing organizations must develop systems that end users find easy to use. Systems must be put in place to support the primary function of site information workers. The developers of computer systems must be given tools that automate and speed up the development of information systems and applications based on computer technology. This document describes a proposal for a site-wide computing architecture that addresses the above requirements. In summary, this architecture is standards-based data-driven, and workstation-oriented with larger systems being utilized for the delivery of needed information to users in a client-server relationship.

  18. Computer Architecture A Quantitative Approach

    Hennessy, John L

    2011-01-01

    The computing world today is in the middle of a revolution: mobile clients and cloud computing have emerged as the dominant paradigms driving programming and hardware innovation today. The Fifth Edition of Computer Architecture focuses on this dramatic shift, exploring the ways in which software and technology in the cloud are accessed by cell phones, tablets, laptops, and other mobile computing devices. Each chapter includes two real-world examples, one mobile and one datacenter, to illustrate this revolutionary change.Updated to cover the mobile computing revolutionEmphasizes the two most im

  19. Super-computer architecture

    Hockney, R W

    1977-01-01

    This paper examines the design of the top-of-the-range, scientific, number-crunching computers. The market for such computers is not as large as that for smaller machines, but on the other hand it is by no means negligible. The present work-horse machines in this category are the CDC 7600 and IBM 360/195, and over fifty of the former machines have been sold. The types of installation that form the market for such machines are not only the major scientific research laboratories in the major countries-such as Los Alamos, CERN, Rutherford laboratory-but also major universities or university networks. It is also true that, as with sports cars, innovations made to satisfy the top of the market today often become the standard for the medium-scale computer of tomorrow. Hence there is considerable interest in examining present developments in this area. (0 refs).

  20. Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing

    Anwar S. Shatil

    2015-01-01

    Full Text Available With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1 inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2 highlight their main advantages; 3 discuss when it may (and may not be advisable to use them; 4 review some of their potential problems and barriers to access; and finally 5 give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc., a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications.

  1. Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing.

    Shatil, Anwar S; Younas, Sohail; Pourreza, Hossein; Figley, Chase R

    2015-01-01

    With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications.

  2. Heads in the Cloud: A Primer on Neuroimaging Applications of High Performance Computing

    Shatil, Anwar S.; Younas, Sohail; Pourreza, Hossein; Figley, Chase R.

    2015-01-01

    With larger data sets and more sophisticated analyses, it is becoming increasingly common for neuroimaging researchers to push (or exceed) the limitations of standalone computer workstations. Nonetheless, although high-performance computing platforms such as clusters, grids and clouds are already in routine use by a small handful of neuroimaging researchers to increase their storage and/or computational power, the adoption of such resources by the broader neuroimaging community remains relatively uncommon. Therefore, the goal of the current manuscript is to: 1) inform prospective users about the similarities and differences between computing clusters, grids and clouds; 2) highlight their main advantages; 3) discuss when it may (and may not) be advisable to use them; 4) review some of their potential problems and barriers to access; and finally 5) give a few practical suggestions for how interested new users can start analyzing their neuroimaging data using cloud resources. Although the aim of cloud computing is to hide most of the complexity of the infrastructure management from end-users, we recognize that this can still be an intimidating area for cognitive neuroscientists, psychologists, neurologists, radiologists, and other neuroimaging researchers lacking a strong computational background. Therefore, with this in mind, we have aimed to provide a basic introduction to cloud computing in general (including some of the basic terminology, computer architectures, infrastructure and service models, etc.), a practical overview of the benefits and drawbacks, and a specific focus on how cloud resources can be used for various neuroimaging applications. PMID:27279746

  3. Specialized computer architectures for computational aerodynamics

    Stevenson, D. K.

    1978-01-01

    In recent years, computational fluid dynamics has made significant progress in modelling aerodynamic phenomena. Currently, one of the major barriers to future development lies in the compute-intensive nature of the numerical formulations and the relative high cost of performing these computations on commercially available general purpose computers, a cost high with respect to dollar expenditure and/or elapsed time. Today's computing technology will support a program designed to create specialized computing facilities to be dedicated to the important problems of computational aerodynamics. One of the still unresolved questions is the organization of the computing components in such a facility. The characteristics of fluid dynamic problems which will have significant impact on the choice of computer architecture for a specialized facility are reviewed.

  4. COMPUTERS: Teraflops for Europe; EEC Working Group on High Performance Computing

    Anon.

    1991-03-15

    In little more than a decade, simulation on high performance computers has become an essential tool for theoretical physics, capable of solving a vast range of crucial problems inaccessible to conventional analytic mathematics. In many ways, computer simulation has become the calculus for interacting many-body systems, a key to the study of transitions from isolated to collective behaviour.

  5. COMPUTERS: Teraflops for Europe; EEC Working Group on High Performance Computing

    Anon.

    1991-01-01

    In little more than a decade, simulation on high performance computers has become an essential tool for theoretical physics, capable of solving a vast range of crucial problems inaccessible to conventional analytic mathematics. In many ways, computer simulation has become the calculus for interacting many-body systems, a key to the study of transitions from isolated to collective behaviour

  6. High performance computing network for cloud environment using simulators

    Singh, N. Ajith; Hemalatha, M.

    2012-01-01

    Cloud computing is the next generation computing. Adopting the cloud computing is like signing up new form of a website. The GUI which controls the cloud computing make is directly control the hardware resource and your application. The difficulty part in cloud computing is to deploy in real environment. Its' difficult to know the exact cost and it's requirement until and unless we buy the service not only that whether it will support the existing application which is available on traditional...

  7. High Performance Computing Facility Operational Assessment 2015: Oak Ridge Leadership Computing Facility

    Barker, Ashley D. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Bernholdt, David E. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Bland, Arthur S. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Gary, Jeff D. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Hack, James J. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; McNally, Stephen T. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Rogers, James H. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Smith, Brian E. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Straatsma, T. P. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Sukumar, Sreenivas Rangan [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Thach, Kevin G. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Tichenor, Suzy [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Vazhkudai, Sudharshan S. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility; Wells, Jack C. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States). Oak Ridge Leadership Computing Facility

    2016-03-01

    Oak Ridge National Laboratory’s (ORNL’s) Leadership Computing Facility (OLCF) continues to surpass its operational target goals: supporting users; delivering fast, reliable systems; creating innovative solutions for high-performance computing (HPC) needs; and managing risks, safety, and security aspects associated with operating one of the most powerful computers in the world. The results can be seen in the cutting-edge science delivered by users and the praise from the research community. Calendar year (CY) 2015 was filled with outstanding operational results and accomplishments: a very high rating from users on overall satisfaction that ties the highest-ever mark set in CY 2014; the greatest number of core-hours delivered to research projects; the largest percentage of capability usage since the OLCF began tracking the metric in 2009; and success in delivering on the allocation of 60, 30, and 10% of core hours offered for the INCITE (Innovative and Novel Computational Impact on Theory and Experiment), ALCC (Advanced Scientific Computing Research Leadership Computing Challenge), and Director’s Discretionary programs, respectively. These accomplishments, coupled with the extremely high utilization rate, represent the fulfillment of the promise of Titan: maximum use by maximum-size simulations. The impact of all of these successes and more is reflected in the accomplishments of OLCF users, with publications this year in notable journals Nature, Nature Materials, Nature Chemistry, Nature Physics, Nature Climate Change, ACS Nano, Journal of the American Chemical Society, and Physical Review Letters, as well as many others. The achievements included in the 2015 OLCF Operational Assessment Report reflect first-ever or largest simulations in their communities; for example Titan enabled engineers in Los Angeles and the surrounding region to design and begin building improved critical infrastructure by enabling the highest-resolution Cybershake map for Southern

  8. Fault Tolerant Computer Architecture

    Sorin, Daniel

    2009-01-01

    For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes

  9. VLSI Architectures for Computing DFT's

    Truong, T. K.; Chang, J. J.; Hsu, I. S.; Reed, I. S.; Pei, D. Y.

    1986-01-01

    Simplifications result from use of residue Fermat number systems. System of finite arithmetic over residue Fermat number systems enables calculation of discrete Fourier transform (DFT) of series of complex numbers with reduced number of multiplications. Computer architectures based on approach suitable for design of very-large-scale integrated (VLSI) circuits for computing DFT's. General approach not limited to DFT's; Applicable to decoding of error-correcting codes and other transform calculations. System readily implemented in VLSI.

  10. SCEAPI: A unified Restful Web API for High-Performance Computing

    Rongqiang, Cao; Haili, Xiao; Shasha, Lu; Yining, Zhao; Xiaoning, Wang; Xuebin, Chi

    2017-10-01

    The development of scientific computing is increasingly moving to collaborative web and mobile applications. All these applications need high-quality programming interface for accessing heterogeneous computing resources consisting of clusters, grid computing or cloud computing. In this paper, we introduce our high-performance computing environment that integrates computing resources from 16 HPC centers across China. Then we present a bundle of web services called SCEAPI and describe how it can be used to access HPC resources with HTTP or HTTPs protocols. We discuss SCEAPI from several aspects including architecture, implementation and security, and address specific challenges in designing compatible interfaces and protecting sensitive data. We describe the functions of SCEAPI including authentication, file transfer and job management for creating, submitting and monitoring, and how to use SCEAPI in an easy-to-use way. Finally, we discuss how to exploit more HPC resources quickly for the ATLAS experiment by implementing the custom ARC compute element based on SCEAPI, and our work shows that SCEAPI is an easy-to-use and effective solution to extend opportunistic HPC resources.

  11. Newmark local time stepping on high-performance computing architectures

    Rietmann, Max

    2016-11-25

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100×). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  12. Newmark local time stepping on high-performance computing architectures

    Rietmann, Max, E-mail: max.rietmann@erdw.ethz.ch [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland); Institute of Geophysics, ETH Zurich (Switzerland); Grote, Marcus, E-mail: marcus.grote@unibas.ch [Department of Mathematics and Computer Science, University of Basel (Switzerland); Peter, Daniel, E-mail: daniel.peter@kaust.edu.sa [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland); Institute of Geophysics, ETH Zurich (Switzerland); Schenk, Olaf, E-mail: olaf.schenk@usi.ch [Institute for Computational Science, Università della Svizzera italiana, Lugano (Switzerland)

    2017-04-01

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100x). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  13. Newmark local time stepping on high-performance computing architectures

    Rietmann, Max; Grote, Marcus; Peter, Daniel; Schenk, Olaf

    2016-01-01

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100×). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  14. A Review of Lightweight Thread Approaches for High Performance Computing

    Castello, Adrian; Pena, Antonio J.; Seo, Sangmin; Mayo, Rafael; Balaji, Pavan; Quintana-Orti, Enrique S.

    2016-09-12

    High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonlyfound patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.

  15. Computer Architecture A Quantitative Approach

    Hennessy, John L

    2007-01-01

    The era of seemingly unlimited growth in processor performance is over: single chip architectures can no longer overcome the performance limitations imposed by the power they consume and the heat they generate. Today, Intel and other semiconductor firms are abandoning the single fast processor model in favor of multi-core microprocessors--chips that combine two or more processors in a single package. In the fourth edition of Computer Architecture, the authors focus on this historic shift, increasing their coverage of multiprocessors and exploring the most effective ways of achieving parallelis

  16. High-level language computer architecture

    Chu, Yaohan

    1975-01-01

    High-Level Language Computer Architecture offers a tutorial on high-level language computer architecture, including von Neumann architecture and syntax-oriented architecture as well as direct and indirect execution architecture. Design concepts of Japanese-language data processing systems are discussed, along with the architecture of stack machines and the SYMBOL computer system. The conceptual design of a direct high-level language processor is also described.Comprised of seven chapters, this book first presents a classification of high-level language computer architecture according to the pr

  17. Time-Predictable Computer Architecture

    Schoeberl Martin

    2009-01-01

    Full Text Available Today's general-purpose processors are optimized for maximum throughput. Real-time systems need a processor with both a reasonable and a known worst-case execution time (WCET. Features such as pipelines with instruction dependencies, caches, branch prediction, and out-of-order execution complicate WCET analysis and lead to very conservative estimates. In this paper, we evaluate the issues of current architectures with respect to WCET analysis. Then, we propose solutions for a time-predictable computer architecture. The proposed architecture is evaluated with implementation of some features in a Java processor. The resulting processor is a good target for WCET analysis and still performs well in the average case.

  18. Frontier: High Performance Database Access Using Standard Web Components in a Scalable Multi-Tier Architecture

    Kosyakov, S.; Kowalkowski, J.; Litvintsev, D.; Lueking, L.; Paterno, M.; White, S.P.; Autio, Lauri; Blumenfeld, B.; Maksimovic, P.; Mathis, M.

    2004-01-01

    A high performance system has been assembled using standard web components to deliver database information to a large number of broadly distributed clients. The CDF Experiment at Fermilab is establishing processing centers around the world imposing a high demand on their database repository. For delivering read-only data, such as calibrations, trigger information, and run conditions data, we have abstracted the interface that clients use to retrieve data objects. A middle tier is deployed that translates client requests into database specific queries and returns the data to the client as XML datagrams. The database connection management, request translation, and data encoding are accomplished in servlets running under Tomcat. Squid Proxy caching layers are deployed near the Tomcat servers, as well as close to the clients, to significantly reduce the load on the database and provide a scalable deployment model. Details the system's construction and use are presented, including its architecture, design, interfaces, administration, performance measurements, and deployment plan

  19. High Performance Flexible Pseudocapacitor based on Nano-architectured Spinel Nickel Cobaltite Anchored Multiwall Carbon Nanotubes

    Shakir, Imran

    2014-01-01

    Highlights: • Two-step fabrication method for nano-architectured spinel nickel cobaltite (NiCo 2 O 4 ) anchored MWCNTs composite. • High performance flexible energy-storage devices. • The NiCo 2 O 4 anchored MWCNTs Exhibits 2032 Fg −1 capacitance which is 1.62 times greater than pristine NiCo 2 O 4 at 1 Ag −1 . - Abstract: We demonstrate a facile two-step fabrication method for nano-architectured spinel nickel cobaltite (NiCo 2 O 4 ) anchored multiwall carbon nanotubes (MWCNTs) based electrodes for high performance flexible energy-storage devices. As electrode materials for flexible supercapacitors, the NiCo 2 O 4 anchored MWCNTs exhibits a high specific capacitance of 2032 Fg −1 , which is nearly 1.62 times greater than pristine NiCo 2 O 4 nanoflakes at 1 Ag −1 . The synthesized NiCo 2 O 4 anchored MWCNTs composite shows excellent rate performance (83.96% capacity retention at 30 Ag −1 ) and stability with coulombic efficiency over 96% after 5,000 cycles when being fully charged/discharged at 1 Ag −1 . Furthermore, NiCo 2 O 4 anchored MWCNTs achieve a maximum energy density of 48.32 Whkg −1 at a power density of 480 Wkg −1 which is 60% higher than pristine NiCo 2 O 4 electrode and significantly outperformed electrode materials based on NiCo 2 O 4 which are currently used in the state-of-the-art supercapacitors throughout the literature. This superior rate performance and high-capacity value offered by NiCo 2 O 4 anchored MWCNTs is mainly due to enhanced electronic and ionic conductivity, which provides a short diffusion path for ions and an easy access of electrolyte flow to nickel cobaltite redox centers besides the high conductivity of MWCNTs

  20. Department of Energy research in utilization of high-performance computers

    Buzbee, B.L.; Worlton, W.J.; Michael, G.; Rodrigue, G.

    1980-08-01

    Department of Energy (DOE) and other Government research laboratories depend on high-performance computer systems to accomplish their programmatic goals. As the most powerful computer systems become available, they are acquired by these laboratories so that advances can be made in their disciplines. These advances are often the result of added sophistication to numerical models, the execution of which is made possible by high-performance computer systems. However, high-performance computer systems have become increasingly complex, and consequently it has become increasingly difficult to realize their potential performance. The result is a need for research on issues related to the utilization of these systems. This report gives a brief description of high-performance computers, and then addresses the use of and future needs for high-performance computers within DOE, the growing complexity of applications within DOE, and areas of high-performance computer systems warranting research. 1 figure

  1. Innovative architecture design for high performance organic and hybrid multi-junction solar cells

    Li, Ning; Spyropoulos, George D.; Brabec, Christoph J.

    2017-08-01

    The multi-junction concept is especially attractive for the photovoltaic (PV) research community owing to its potential to overcome the Schockley-Queisser limit of single-junction solar cells. Tremendous research interests are now focused on the development of high-performance absorbers and novel device architectures for emerging PV technologies, such as organic and perovskite PVs. It has been predicted that the multi-junction concept is able to boost the organic and perovskite PV technologies approaching the 20% and 30% benchmarks, respectively, showing a bright future of commercialization of the emerging PV technologies. In this contribution, we will demonstrate innovative architecture design for solution-processed, highly functional organic and hybrid multi-junction solar cells. A simple but elegant approach to fabricating organic and hybrid multi-junction solar cells will be introduced. By laminating single organic/hybrid solar cells together through an intermediate layer, the manufacturing cost and complexity of large-scale multi-junction solar cells can be significantly reduced. This smart approach to balancing the photocurrents as well as open circuit voltages in multi-junction solar cells will be demonstrated and discussed in detail.

  2. Distributed metadata in a high performance computing environment

    Bent, John M.; Faibish, Sorin; Zhang, Zhenhua; Liu, Xuezhao; Tang, Haiying

    2017-07-11

    A computer-executable method, system, and computer program product for managing meta-data in a distributed storage system, wherein the distributed storage system includes one or more burst buffers enabled to operate with a distributed key-value store, the co computer-executable method, system, and computer program product comprising receiving a request for meta-data associated with a block of data stored in a first burst buffer of the one or more burst buffers in the distributed storage system, wherein the meta data is associated with a key-value, determining which of the one or more burst buffers stores the requested metadata, and upon determination that a first burst buffer of the one or more burst buffers stores the requested metadata, locating the key-value in a portion of the distributed key-value store accessible from the first burst buffer.

  3. Software Applications on the Peregrine System | High-Performance Computing

    Algebraic Modeling System (GAMS) Statistics and analysis High-level modeling system for mathematical reactivity. Gurobi Optimizer Statistics and analysis Solver for mathematical programming LAMMPS Chemistry and , reactivities, and vibrational, electronic and NMR spectra. R Statistical Computing Environment Statistics and

  4. Benchmark Numerical Toolkits for High Performance Computing, Phase I

    National Aeronautics and Space Administration — Computational codes in physics and engineering often use implicit solution algorithms that require linear algebra tools such as Ax=b solvers, eigenvalue,...

  5. Acceleration of FDTD mode solver by high-performance computing techniques.

    Han, Lin; Xi, Yanping; Huang, Wei-Ping

    2010-06-21

    A two-dimensional (2D) compact finite-difference time-domain (FDTD) mode solver is developed based on wave equation formalism in combination with the matrix pencil method (MPM). The method is validated for calculation of both real guided and complex leaky modes of typical optical waveguides against the bench-mark finite-difference (FD) eigen mode solver. By taking advantage of the inherent parallel nature of the FDTD algorithm, the mode solver is implemented on graphics processing units (GPUs) using the compute unified device architecture (CUDA). It is demonstrated that the high-performance computing technique leads to significant acceleration of the FDTD mode solver with more than 30 times improvement in computational efficiency in comparison with the conventional FDTD mode solver running on CPU of a standard desktop computer. The computational efficiency of the accelerated FDTD method is in the same order of magnitude of the standard finite-difference eigen mode solver and yet require much less memory (e.g., less than 10%). Therefore, the new method may serve as an efficient, accurate and robust tool for mode calculation of optical waveguides even when the conventional eigen value mode solvers are no longer applicable due to memory limitation.

  6. Secure Enclaves: An Isolation-centric Approach for Creating Secure High Performance Computing Environments

    Aderholdt, Ferrol [Tennessee Technological Univ., Cookeville, TN (United States); Caldwell, Blake A. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Hicks, Susan Elaine [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Koch, Scott M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Naughton, III, Thomas J. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Pelfrey, Daniel S. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Pogge, James R [Tennessee Technological Univ., Cookeville, TN (United States); Scott, Stephen L [Tennessee Technological Univ., Cookeville, TN (United States); Shipman, Galen M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Sorrillo, Lawrence [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2017-01-01

    High performance computing environments are often used for a wide variety of workloads ranging from simulation, data transformation and analysis, and complex workflows to name just a few. These systems may process data at various security levels but in so doing are often enclaved at the highest security posture. This approach places significant restrictions on the users of the system even when processing data at a lower security level and exposes data at higher levels of confidentiality to a much broader population than otherwise necessary. The traditional approach of isolation, while effective in establishing security enclaves poses significant challenges for the use of shared infrastructure in HPC environments. This report details current state-of-the-art in virtualization, reconfigurable network enclaving via Software Defined Networking (SDN), and storage architectures and bridging techniques for creating secure enclaves in HPC environments.

  7. Towards the development of run times leveraging virtualization for high performance computing

    Diakhate, F.

    2010-12-01

    In recent years, there has been a growing interest in using virtualization to improve the efficiency of data centers. This success is rooted in virtualization's excellent fault tolerance and isolation properties, in the overall flexibility it brings, and in its ability to exploit multi-core architectures efficiently. These characteristics also make virtualization an ideal candidate to tackle issues found in new compute cluster architectures. However, in spite of recent improvements in virtualization technology, overheads in the execution of parallel applications remain, which prevent its use in the field of high performance computing. In this thesis, we propose a virtual device dedicated to message passing between virtual machines, so as to improve the performance of parallel applications executed in a cluster of virtual machines. We also introduce a set of techniques facilitating the deployment of virtualized parallel applications. These functionalities have been implemented as part of a runtime system which allows to benefit from virtualization's properties in a way that is as transparent as possible to the user while minimizing performance overheads. (author)

  8. Scalability of DL_POLY on High Performance Computing Platform

    Mabakane, Mabule S

    2017-12-01

    Full Text Available stream_source_info Mabakanea_19979_2017.pdf.txt stream_content_type text/plain stream_size 33716 Content-Encoding UTF-8 stream_name Mabakanea_19979_2017.pdf.txt Content-Type text/plain; charset=UTF-8 SACJ 29(3) December... when using many processors within the compute nodes of the supercomputer. The type of the processors of compute nodes and their memory also play an important role in the overall performance of the parallel application running on a supercomputer. DL...

  9. Computer architecture fundamentals and principles of computer design

    Dumas II, Joseph D

    2005-01-01

    Introduction to Computer ArchitectureWhat is Computer Architecture?Architecture vs. ImplementationBrief History of Computer SystemsThe First GenerationThe Second GenerationThe Third GenerationThe Fourth GenerationModern Computers - The Fifth GenerationTypes of Computer SystemsSingle Processor SystemsParallel Processing SystemsSpecial ArchitecturesQuality of Computer SystemsGenerality and ApplicabilityEase of UseExpandabilityCompatibilityReliabilitySuccess and Failure of Computer Architectures and ImplementationsQuality and the Perception of QualityCost IssuesArchitectural Openness, Market Timi

  10. Running Batch Jobs on Peregrine | High-Performance Computing | NREL

    and run your application. Users typically create or edit job scripts using a text editor such as vi Using Resource Feature to Request Different Node Types Peregrine has several types of compute nodes , which differ in the amount of memory and number of processor cores. The majority of the nodes have 24

  11. Running Interactive Jobs on Peregrine | High-Performance Computing | NREL

    shell prompt, which allows users to execute commands and scripts as they would on the login nodes. Login performed on the compute nodes rather than on login nodes. This page provides instructions and examples of , start GUIs etc. and the commands will execute on that node instead of on the login node. The -V option

  12. Simulating elastic light scattering using high performance computing methods

    Hoekstra, A.G.; Sloot, P.M.A.; Verbraeck, A.; Kerckhoffs, E.J.H.

    1993-01-01

    The Coupled Dipole method, as originally formulated byPurcell and Pennypacker, is a very powerful method tosimulate the Elastic Light Scattering from arbitraryparticles. This method, which is a particle simulationmodel for Computational Electromagnetics, has one majordrawback: if the size of the

  13. High performance stream computing for particle beam transport simulations

    Appleby, R; Bailey, D; Higham, J; Salt, M

    2008-01-01

    Understanding modern particle accelerators requires simulating charged particle transport through the machine elements. These simulations can be very time consuming due to the large number of particles and the need to consider many turns of a circular machine. Stream computing offers an attractive way to dramatically improve the performance of such simulations by calculating the simultaneous transport of many particles using dedicated hardware. Modern Graphics Processing Units (GPUs) are powerful and affordable stream computing devices. The results of simulations of particle transport through the booster-to-storage-ring transfer line of the DIAMOND synchrotron light source using an NVidia GeForce 7900 GPU are compared to the standard transport code MAD. It is found that particle transport calculations are suitable for stream processing and large performance increases are possible. The accuracy and potential speed gains are compared and the prospects for future work in the area are discussed

  14. WinHPC System Configuration | High-Performance Computing | NREL

    ), login node (WinHPC02) and worker/compute nodes. The head node acts as the file, DNS, and license server . The login node is where the users connect to access the cluster. Node 03 has dual Intel Xeon E5530 2008 R2 HPC Edition. The login node, WinHPC02, is where users login to access the system. This is where

  15. High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away

    Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.

    2012-09-01

    , and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.

  16. Using High Performance Computing to Support Water Resource Planning

    Groves, David G. [RAND Corporation, Santa Monica, CA (United States); Lembert, Robert J. [RAND Corporation, Santa Monica, CA (United States); May, Deborah W. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Leek, James R. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States); Syme, James [RAND Corporation, Santa Monica, CA (United States)

    2015-10-22

    In recent years, decision support modeling has embraced deliberation-withanalysis— an iterative process in which decisionmakers come together with experts to evaluate a complex problem and alternative solutions in a scientifically rigorous and transparent manner. Simulation modeling supports decisionmaking throughout this process; visualizations enable decisionmakers to assess how proposed strategies stand up over time in uncertain conditions. But running these simulation models over standard computers can be slow. This, in turn, can slow the entire decisionmaking process, interrupting valuable interaction between decisionmakers and analytics.

  17. Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency

    Ltaief, Hatem

    2011-08-31

    This paper presents the power profile of two high performance dense linear algebra libraries i.e., LAPACK and PLASMA. The former is based on block algorithms that use the fork-join paradigm to achieve parallel performance. The latter uses fine-grained task parallelism that recasts the computation to operate on submatrices called tiles. In this way tile algorithms are formed. We show results from the power profiling of the most common routines, which permits us to clearly identify the different phases of the computations. This allows us to isolate the bottlenecks in terms of energy efficiency. Our results show that PLASMA surpasses LAPACK not only in terms of performance but also in terms of energy efficiency. © 2011 Springer-Verlag.

  18. Mixed-Language High-Performance Computing for Plasma Simulations

    Quanming Lu

    2003-01-01

    Full Text Available Java is receiving increasing attention as the most popular platform for distributed computing. However, programmers are still reluctant to embrace Java as a tool for writing scientific and engineering applications due to its still noticeable performance drawbacks compared with other programming languages such as Fortran or C. In this paper, we present a hybrid Java/Fortran implementation of a parallel particle-in-cell (PIC algorithm for plasma simulations. In our approach, the time-consuming components of this application are designed and implemented as Fortran subroutines, while less calculation-intensive components usually involved in building the user interface are written in Java. The two types of software modules have been glued together using the Java native interface (JNI. Our mixed-language PIC code was tested and its performance compared with pure Java and Fortran versions of the same algorithm on a Sun E6500 SMP system and a Linux cluster of Pentium~III machines.

  19. Electromagnetic Modeling of Human Body Using High Performance Computing

    Ng, Cho-Kuen; Beall, Mark; Ge, Lixin; Kim, Sanghoek; Klaas, Ottmar; Poon, Ada

    Realistic simulation of electromagnetic wave propagation in the actual human body can expedite the investigation of the phenomenon of harvesting implanted devices using wireless powering coupled from external sources. The parallel electromagnetics code suite ACE3P developed at SLAC National Accelerator Laboratory is based on the finite element method for high fidelity accelerator simulation, which can be enhanced to model electromagnetic wave propagation in the human body. Starting with a CAD model of a human phantom that is characterized by a number of tissues, a finite element mesh representing the complex geometries of the individual tissues is built for simulation. Employing an optimal power source with a specific pattern of field distribution, the propagation and focusing of electromagnetic waves in the phantom has been demonstrated. Substantial speedup of the simulation is achieved by using multiple compute cores on supercomputers.

  20. Accessible high performance computing solutions for near real-time image processing for time critical applications

    Bielski, Conrad; Lemoine, Guido; Syryczynski, Jacek

    2009-09-01

    High Performance Computing (HPC) hardware solutions such as grid computing and General Processing on a Graphics Processing Unit (GPGPU) are now accessible to users with general computing needs. Grid computing infrastructures in the form of computing clusters or blades are becoming common place and GPGPU solutions that leverage the processing power of the video card are quickly being integrated into personal workstations. Our interest in these HPC technologies stems from the need to produce near real-time maps from a combination of pre- and post-event satellite imagery in support of post-disaster management. Faster processing provides a twofold gain in this situation: 1. critical information can be provided faster and 2. more elaborate automated processing can be performed prior to providing the critical information. In our particular case, we test the use of the PANTEX index which is based on analysis of image textural measures extracted using anisotropic, rotation-invariant GLCM statistics. The use of this index, applied in a moving window, has been shown to successfully identify built-up areas in remotely sensed imagery. Built-up index image masks are important input to the structuring of damage assessment interpretation because they help optimise the workload. The performance of computing the PANTEX workflow is compared on two different HPC hardware architectures: (1) a blade server with 4 blades, each having dual quad-core CPUs and (2) a CUDA enabled GPU workstation. The reference platform is a dual CPU-quad core workstation and the PANTEX workflow total computing time is measured. Furthermore, as part of a qualitative evaluation, the differences in setting up and configuring various hardware solutions and the related software coding effort is presented.

  1. STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens

    Oelerich, Jan Oliver; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D.; Volz, Kerstin

    2017-01-01

    Highlights: • We present STEMsalabim, a modern implementation of the multislice algorithm for simulation of STEM images. • Our package is highly parallelizable on high-performance computing clusters, combining shared and distributed memory architectures. • With STEMsalabim, computationally and memory expensive STEM image simulations can be carried out within reasonable time. - Abstract: We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.

  2. STEMsalabim: A high-performance computing cluster friendly code for scanning transmission electron microscopy image simulations of thin specimens

    Oelerich, Jan Oliver, E-mail: jan.oliver.oelerich@physik.uni-marburg.de; Duschek, Lennart; Belz, Jürgen; Beyer, Andreas; Baranovskii, Sergei D.; Volz, Kerstin

    2017-06-15

    Highlights: • We present STEMsalabim, a modern implementation of the multislice algorithm for simulation of STEM images. • Our package is highly parallelizable on high-performance computing clusters, combining shared and distributed memory architectures. • With STEMsalabim, computationally and memory expensive STEM image simulations can be carried out within reasonable time. - Abstract: We present a new multislice code for the computer simulation of scanning transmission electron microscope (STEM) images based on the frozen lattice approximation. Unlike existing software packages, the code is optimized to perform well on highly parallelized computing clusters, combining distributed and shared memory architectures. This enables efficient calculation of large lateral scanning areas of the specimen within the frozen lattice approximation and fine-grained sweeps of parameter space.

  3. FY1995 study of design methodology and environment of high-performance processor architectures; 1995 nendo koseino processor architecture sekkeiho to sekkei kankyo no kenkyu

    NONE

    1997-03-01

    The aim of our project is to develop high-performance processor architectures for both general purpose and application-specific purpose. We also plan to develop basic softwares, such as compliers, and various design aid tools for those architectures. We are particularly interested in performance evaluation at architecture design phase, design optimization, automatic generation of compliers from processor designs, and architecture design methodologies combined with circuit layout. We have investigated both microprocessor architectures and design methodologies / environments for the processors. Our goal is to establish design technologies for high-performance, low-power, low-cost and highly-reliable systems in system-on-silicon era. We have proposed PPRAM architecture for high-performance system using DRAM and logic mixture technology, Softcore processor architecture for special purpose processors in embedded systems, and Power-Pro architecture for low power systems. We also developed design methodologies and design environments for the above architectures as well as a new method for design verification of microprocessors. (NEDO)

  4. High Performance Motion-Planner Architecture for Hardware-In-the-Loop System Based on Position-Based-Admittance-Control

    Francesco La Mura

    2018-02-01

    Full Text Available This article focuses on a Hardware-In-the-Loop application developed from the advanced energy field project LIFES50+. The aim is to replicate, inside a wind gallery test facility, the combined effect of aerodynamic and hydrodynamic loads on a floating wind turbine model for offshore energy production, using a force controlled robotic device, emulating floating substructure’s behaviour. In addition to well known real-time Hardware-In-the-Loop (HIL issues, the particular application presented has stringent safety requirements of the HIL equipment and difficult to predict operating conditions, so that extra computational efforts have to be spent running specific safety algorithms and achieving desired performance. To meet project requirements, a high performance software architecture based on Position-Based-Admittance-Control (PBAC is presented, combining low level motion interpolation techniques, efficient motion planning, based on buffer management and Time-base control, and advanced high level safety algorithms, implemented in a rapid real-time control architecture.

  5. Big Data and High-Performance Computing in Global Seismology

    Bozdag, Ebru; Lefebvre, Matthieu; Lei, Wenjie; Peter, Daniel; Smith, James; Komatitsch, Dimitri; Tromp, Jeroen

    2014-05-01

    Much of our knowledge of Earth's interior is based on seismic observations and measurements. Adjoint methods provide an efficient way of incorporating 3D full wave propagation in iterative seismic inversions to enhance tomographic images and thus our understanding of processes taking place inside the Earth. Our aim is to take adjoint tomography, which has been successfully applied to regional and continental scale problems, further to image the entire planet. This is one of the extreme imaging challenges in seismology, mainly due to the intense computational requirements and vast amount of high-quality seismic data that can potentially be assimilated. We have started low-resolution inversions (T > 30 s and T > 60 s for body and surface waves, respectively) with a limited data set (253 carefully selected earthquakes and seismic data from permanent and temporary networks) on Oak Ridge National Laboratory's Cray XK7 "Titan" system. Recent improvements in our 3D global wave propagation solvers, such as a GPU version of the SPECFEM3D_GLOBE package, will enable us perform higher-resolution (T > 9 s) and longer duration (~180 m) simulations to take the advantage of high-frequency body waves and major-arc surface waves, thereby improving imbalanced ray coverage as a result of the uneven global distribution of sources and receivers. Our ultimate goal is to use all earthquakes in the global CMT catalogue within the magnitude range of our interest and data from all available seismic networks. To take the full advantage of computational resources, we need a solid framework to manage big data sets during numerical simulations, pre-processing (i.e., data requests and quality checks, processing data, window selection, etc.) and post-processing (i.e., pre-conditioning and smoothing kernels, etc.). We address the bottlenecks in our global seismic workflow, which are mainly coming from heavy I/O traffic during simulations and the pre- and post-processing stages, by defining new data

  6. Geometric Computing for Freeform Architecture

    Wallner, J.

    2011-06-03

    Geometric computing has recently found a new field of applications, namely the various geometric problems which lie at the heart of rationalization and construction-aware design processes of freeform architecture. We report on our work in this area, dealing with meshes with planar faces and meshes which allow multilayer constructions (which is related to discrete surfaces and their curvatures), triangles meshes with circle-packing properties (which is related to conformal uniformization), and with the paneling problem. We emphasize the combination of numerical optimization and geometric knowledge.

  7. Computer aid in solar architecture

    Rosendahl, E W

    1982-02-01

    Among architects the question is being discussed in how far new buildings can be designed in a way to make more economical use of energy by architectural means. Solar houses in the USA are often taken as a model. As yet it is unclear how such measures will affect heat demand in the central European climate and with domestic building materials being used. A computer simulation program is introduced by which these questions can be answered as early as in the stage of planning. The program can be run on a common microcomputersystem.

  8. Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

    Amooie, M. A.; Moortgat, J.

    2017-12-01

    We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.

  9. FY 1992 Blue Book: Grand Challenges: High Performance Computing and Communications

    Networking and Information Technology Research and Development, Executive Office of the President — High performance computing and computer communications networks are becoming increasingly important to scientific advancement, economic competition, and national...

  10. FY 1993 Blue Book: Grand Challenges 1993: High Performance Computing and Communications

    Networking and Information Technology Research and Development, Executive Office of the President — High performance computing and computer communications networks are becoming increasingly important to scientific advancement, economic competition, and national...

  11. The contribution of high-performance computing and modelling for industrial development

    Sithole, Happy

    2017-10-01

    Full Text Available Performance Computing and Modelling for Industrial Development Dr Happy Sithole and Dr Onno Ubbink 2 Strategic context • High-performance computing (HPC) combined with machine Learning and artificial intelligence present opportunities to non...

  12. BEAGLE: an application programming interface and high-performance computing library for statistical phylogenetics.

    Ayres, Daniel L; Darling, Aaron; Zwickl, Derrick J; Beerli, Peter; Holder, Mark T; Lewis, Paul O; Huelsenbeck, John P; Ronquist, Fredrik; Swofford, David L; Cummings, Michael P; Rambaut, Andrew; Suchard, Marc A

    2012-01-01

    Phylogenetic inference is fundamental to our understanding of most aspects of the origin and evolution of life, and in recent years, there has been a concentration of interest in statistical approaches such as Bayesian inference and maximum likelihood estimation. Yet, for large data sets and realistic or interesting models of evolution, these approaches remain computationally demanding. High-throughput sequencing can yield data for thousands of taxa, but scaling to such problems using serial computing often necessitates the use of nonstatistical or approximate approaches. The recent emergence of graphics processing units (GPUs) provides an opportunity to leverage their excellent floating-point computational performance to accelerate statistical phylogenetic inference. A specialized library for phylogenetic calculation would allow existing software packages to make more effective use of available computer hardware, including GPUs. Adoption of a common library would also make it easier for other emerging computing architectures, such as field programmable gate arrays, to be used in the future. We present BEAGLE, an application programming interface (API) and library for high-performance statistical phylogenetic inference. The API provides a uniform interface for performing phylogenetic likelihood calculations on a variety of compute hardware platforms. The library includes a set of efficient implementations and can currently exploit hardware including GPUs using NVIDIA CUDA, central processing units (CPUs) with Streaming SIMD Extensions and related processor supplementary instruction sets, and multicore CPUs via OpenMP. To demonstrate the advantages of a common API, we have incorporated the library into several popular phylogenetic software packages. The BEAGLE library is free open source software licensed under the Lesser GPL and available from http://beagle-lib.googlecode.com. An example client program is available as public domain software.

  13. A Coarse-Grained Reconfigurable Architecture with Compilation for High Performance

    Lu Wan

    2012-01-01

    Full Text Available We propose a fast data relay (FDR mechanism to enhance existing CGRA (coarse-grained reconfigurable architecture. FDR can not only provide multicycle data transmission in concurrent with computations but also convert resource-demanding inter-processing-element global data accesses into local data accesses to avoid communication congestion. We also propose the supporting compiler techniques that can efficiently utilize the FDR feature to achieve higher performance for a variety of applications. Our results on FDR-based CGRA are compared with two other works in this field: ADRES and RCP. Experimental results for various multimedia applications show that FDR combined with the new compiler deliver up to 29% and 21% higher performance than ADRES and RCP, respectively.

  14. High performance parallel computing of flows in complex geometries: I. Methods

    Gourdain, N; Gicquel, L; Montagnac, M; Vermorel, O; Staffelbach, G; Garcia, M; Boussuge, J-F; Gazaix, M; Poinsot, T

    2009-01-01

    Efficient numerical tools coupled with high-performance computers, have become a key element of the design process in the fields of energy supply and transportation. However flow phenomena that occur in complex systems such as gas turbines and aircrafts are still not understood mainly because of the models that are needed. In fact, most computational fluid dynamics (CFD) predictions as found today in industry focus on a reduced or simplified version of the real system (such as a periodic sector) and are usually solved with a steady-state assumption. This paper shows how to overcome such barriers and how such a new challenge can be addressed by developing flow solvers running on high-end computing platforms, using thousands of computing cores. Parallel strategies used by modern flow solvers are discussed with particular emphases on mesh-partitioning, load balancing and communication. Two examples are used to illustrate these concepts: a multi-block structured code and an unstructured code. Parallel computing strategies used with both flow solvers are detailed and compared. This comparison indicates that mesh-partitioning and load balancing are more straightforward with unstructured grids than with multi-block structured meshes. However, the mesh-partitioning stage can be challenging for unstructured grids, mainly due to memory limitations of the newly developed massively parallel architectures. Finally, detailed investigations show that the impact of mesh-partitioning on the numerical CFD solutions, due to rounding errors and block splitting, may be of importance and should be accurately addressed before qualifying massively parallel CFD tools for a routine industrial use.

  15. Non-Planar Nanotube and Wavy Architecture Based Ultra-High Performance Field Effect Transistors

    Hanna, Amir

    2016-01-01

    This dissertation also introduces a novel thin-film-transistors architecture that is named the Wavy Channel (WC) architecture, which allows for extending device width by integrating vertical fin-like substrate corrugations giving

  16. Peer-to-peer computing for secure high performance data copying

    Hanushevsky, A.; Trunov, A.; Cottrell, L.

    2001-01-01

    The BaBar Copy Program (bbcp) is an excellent representative of peer-to-peer (P2P) computing. It is also a pioneering application of its type in the P2P arena. Built upon the foundation of its predecessor, Secure Fast Copy (sfcp), bbcp incorporates significant improvements performance and usability. As with sfcp, bbcp uses ssh for authentication; providing an elegant and simple working model--if you can ssh to a location, you can copy files to or from that location. To fully support this notion, bbcp transparently supports 3rd party copy operations. The program also incorporates several mechanism to deal with firewall security; the bane of P2P computing. To achieve high performance in a wide area network, bbcp allows a user to independently specify, the number of parallel network streams, tcp window size, and the file I/O blocking factor. Using these parameters, data is pipelined from source to target to provide a uniform traffic pattern that maximizes router efficiency. For improved recoverability, bbcp also keeps track of copy operations so that an operation can be restarted from the point of failure at a later time; minimizing the amount of network traffic in the event of a copy failure. Here, the authors present the bbcp architecture, it's various features, and the reasons for their inclusion

  17. Linear Subpixel Learning Algorithm for Land Cover Classification from WELD using High Performance Computing

    Ganguly, S.; Kumar, U.; Nemani, R. R.; Kalia, S.; Michaelis, A.

    2017-12-01

    In this work, we use a Fully Constrained Least Squares Subpixel Learning Algorithm to unmix global WELD (Web Enabled Landsat Data) to obtain fractions or abundances of substrate (S), vegetation (V) and dark objects (D) classes. Because of the sheer nature of data and compute needs, we leveraged the NASA Earth Exchange (NEX) high performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into 4 classes namely, forest, farmland, water and urban areas (with NPP-VIIRS - national polar orbiting partnership visible infrared imaging radiometer suite nighttime lights data) over California, USA using Random Forest classifier. Validation of these land cover maps with NLCD (National Land Cover Database) 2011 products and NAFD (North American Forest Dynamics) static forest cover maps showed that an overall classification accuracy of over 91% was achieved, which is a 6% improvement in unmixing based classification relative to per-pixel based classification. As such, abundance maps continue to offer an useful alternative to high-spatial resolution data derived classification maps for forest inventory analysis, multi-class mapping for eco-climatic models and applications, fast multi-temporal trend analysis and for societal and policy-relevant applications needed at the watershed scale.

  18. Development of high performance scientific components for interoperability of computing packages

    Gulabani, Teena Pratap [Iowa State Univ., Ames, IA (United States)

    2008-01-01

    Three major high performance quantum chemistry computational packages, NWChem, GAMESS and MPQC have been developed by different research efforts following different design patterns. The goal is to achieve interoperability among these packages by overcoming the challenges caused by the different communication patterns and software design of each of these packages. A chemistry algorithm is hard to develop as well as being a time consuming process; integration of large quantum chemistry packages will allow resource sharing and thus avoid reinvention of the wheel. Creating connections between these incompatible packages is the major motivation of the proposed work. This interoperability is achieved by bringing the benefits of Component Based Software Engineering through a plug-and-play component framework called Common Component Architecture (CCA). In this thesis, I present a strategy and process used for interfacing two widely used and important computational chemistry methodologies: Quantum Mechanics and Molecular Mechanics. To show the feasibility of the proposed approach the Tuning and Analysis Utility (TAU) has been coupled with NWChem code and its CCA components. Results show that the overhead is negligible when compared to the ease and potential of organizing and coping with large-scale software applications.

  19. Peer-to-Peer Computing for Secure High Performance Data Copying

    2002-01-01

    The BaBar Copy Program (bbcp) is an excellent representative of peer-to-peer (P2P) computing. It is also a pioneering application of its type in the P2P arena. Built upon the foundation of its predecessor, Secure Fast Copy (sfcp), bbcp incorporates significant improvements performance and usability. As with sfcp, bbcp uses ssh for authentication; providing an elegant and simple working model -- if you can ssh to a location, you can copy files to or from that location. To fully support this notion, bbcp transparently supports 3rd party copy operations. The program also incorporates several mechanism to deal with firewall security; the bane of P2P computing. To achieve high performance in a wide area network, bbcp allows a user to independently specify, the number of parallel network streams, tcp window size, and the file I/O blocking factor. Using these parameters, data is pipelined from source to target to provide a uniform traffic pattern that maximizes router efficiency. For improved recoverability, bbcp also keeps track of copy operations so that an operation can be restarted from the point of failure at a later time; minimizing the amount of network traffic in the event of a copy failure. Here, we preset the bbcp architecture, it's various features, and the reasons for their inclusion

  20. Architectural and compiler techniques for energy reduction in high-performance microprocessors

    Bellas, Nikolaos

    1999-11-01

    The microprocessor industry has started viewing power, along with area and performance, as a decisive design factor in today's microprocessors. The increasing cost of packaging and cooling systems poses stringent requirements on the maximum allowable power dissipation. Most of the research in recent years has focused on the circuit, gate, and register-transfer (RT) levels of the design. In this research, we focus on the software running on a microprocessor and we view the program as a power consumer. Our work concentrates on the role of the compiler in the construction of "power-efficient" code, and especially its interaction with the hardware so that unnecessary processor activity is saved. We propose techniques that use extra hardware features and compiler-driven code transformations that specifically target activity reduction in certain parts of the CPU which are known to be large power and energy consumers. Design for low power/energy at this level of abstraction entails larger energy gains than in the lower stages of the design hierarchy in which the design team has already made the most important design commitments. The role of the compiler in generating code which exploits the processor organization is also fundamental in energy minimization. Hence, we propose a hardware/software co-design paradigm, and we show what code transformations are necessary by the compiler so that "wasted" power in a modern microprocessor can be trimmed. More specifically, we propose a technique that uses an additional mini cache located between the instruction cache (I-Cache) and the CPU core; the mini cache buffers instructions that are nested within loops and are continuously fetched from the I-Cache. This mechanism can create very substantial energy savings, since the I-Cache unit is one of the main power consumers in most of today's high-performance microprocessors. Results are reported for the SPEC95 benchmarks in the R-4400 processor which implements the MIPS2 instruction

  1. High performance parallel computers for science: New developments at the Fermilab advanced computer program

    Nash, T.; Areti, H.; Atac, R.

    1988-08-01

    Fermilab's Advanced Computer Program (ACP) has been developing highly cost effective, yet practical, parallel computers for high energy physics since 1984. The ACP's latest developments are proceeding in two directions. A Second Generation ACP Multiprocessor System for experiments will include $3500 RISC processors each with performance over 15 VAX MIPS. To support such high performance, the new system allows parallel I/O, parallel interprocess communication, and parallel host processes. The ACP Multi-Array Processor, has been developed for theoretical physics. Each $4000 node is a FORTRAN or C programmable pipelined 20 MFlops (peak), 10 MByte single board computer. These are plugged into a 16 port crossbar switch crate which handles both inter and intra crate communication. The crates are connected in a hypercube. Site oriented applications like lattice gauge theory are supported by system software called CANOPY, which makes the hardware virtually transparent to users. A 256 node, 5 GFlop, system is under construction. 10 refs., 7 figs

  2. Department of Energy Mathematical, Information, and Computational Sciences Division: High Performance Computing and Communications Program

    NONE

    1996-11-01

    This document is intended to serve two purposes. Its first purpose is that of a program status report of the considerable progress that the Department of Energy (DOE) has made since 1993, the time of the last such report (DOE/ER-0536, The DOE Program in HPCC), toward achieving the goals of the High Performance Computing and Communications (HPCC) Program. The second purpose is that of a summary report of the many research programs administered by the Mathematical, Information, and Computational Sciences (MICS) Division of the Office of Energy Research under the auspices of the HPCC Program and to provide, wherever relevant, easy access to pertinent information about MICS-Division activities via universal resource locators (URLs) on the World Wide Web (WWW).

  3. Department of Energy: MICS (Mathematical Information, and Computational Sciences Division). High performance computing and communications program

    NONE

    1996-06-01

    This document is intended to serve two purposes. Its first purpose is that of a program status report of the considerable progress that the Department of Energy (DOE) has made since 1993, the time of the last such report (DOE/ER-0536, {open_quotes}The DOE Program in HPCC{close_quotes}), toward achieving the goals of the High Performance Computing and Communications (HPCC) Program. The second purpose is that of a summary report of the many research programs administered by the Mathematical, Information, and Computational Sciences (MICS) Division of the Office of Energy Research under the auspices of the HPCC Program and to provide, wherever relevant, easy access to pertinent information about MICS-Division activities via universal resource locators (URLs) on the World Wide Web (WWW). The information pointed to by the URL is updated frequently, and the interested reader is urged to access the WWW for the latest information.

  4. Field-programmable custom computing technology architectures, tools, and applications

    Luk, Wayne; Pocek, Ken

    2000-01-01

    Field-Programmable Custom Computing Technology: Architectures, Tools, and Applications brings together in one place important contributions and up-to-date research results in this fast-moving area. In seven selected chapters, the book describes the latest advances in architectures, design methods, and applications of field-programmable devices for high-performance reconfigurable systems. The contributors to this work were selected from the leading researchers and practitioners in the field. It will be valuable to anyone working or researching in the field of custom computing technology. It serves as an excellent reference, providing insight into some of the most challenging issues being examined today.

  5. (Invited) Wavy Channel TFT Architecture for High Performance Oxide Based Displays

    Hanna, Amir

    2015-05-22

    We show the effectiveness of wavy channel architecture for thin film transistor application for increased output current. This specific architecture allows increased width of the device by adopting a corrugated shape of the substrate without any further real estate penalty. The performance improvement is attributed not only to the increased transistor width, but also to enhanced applied electric field in the channel due to the wavy architecture.

  6. (Invited) Wavy Channel TFT Architecture for High Performance Oxide Based Displays

    Hanna, Amir; Hussain, Aftab M.; Hussain, Aftab M.; Ghoneim, Mohamed T.; Rojas, Jhonathan Prieto; Sevilla, Galo T.; Hussain, Muhammad Mustafa

    2015-01-01

    We show the effectiveness of wavy channel architecture for thin film transistor application for increased output current. This specific architecture allows increased width of the device by adopting a corrugated shape of the substrate without any further real estate penalty. The performance improvement is attributed not only to the increased transistor width, but also to enhanced applied electric field in the channel due to the wavy architecture.

  7. The Convergence of High Performance Computing and Large Scale Data Analytics

    Duffy, D.; Bowen, M. K.; Thompson, J. H.; Yang, C. P.; Hu, F.; Wills, B.

    2015-12-01

    As the combinations of remote sensing observations and model outputs have grown, scientists are increasingly burdened with both the necessity and complexity of large-scale data analysis. Scientists are increasingly applying traditional high performance computing (HPC) solutions to solve their "Big Data" problems. While this approach has the benefit of limiting data movement, the HPC system is not optimized to run analytics, which can create problems that permeate throughout the HPC environment. To solve these issues and to alleviate some of the strain on the HPC environment, the NASA Center for Climate Simulation (NCCS) has created the Advanced Data Analytics Platform (ADAPT), which combines both HPC and cloud technologies to create an agile system designed for analytics. Large, commonly used data sets are stored in this system in a write once/read many file system, such as Landsat, MODIS, MERRA, and NGA. High performance virtual machines are deployed and scaled according to the individual scientist's requirements specifically for data analysis. On the software side, the NCCS and GMU are working with emerging commercial technologies and applying them to structured, binary scientific data in order to expose the data in new ways. Native NetCDF data is being stored within a Hadoop Distributed File System (HDFS) enabling storage-proximal processing through MapReduce while continuing to provide accessibility of the data to traditional applications. Once the data is stored within HDFS, an additional indexing scheme is built on top of the data and placed into a relational database. This spatiotemporal index enables extremely fast mappings of queries to data locations to dramatically speed up analytics. These are some of the first steps toward a single unified platform that optimizes for both HPC and large-scale data analysis, and this presentation will elucidate the resulting and necessary exascale architectures required for future systems.

  8. Holey Nanocarbon Architectures for High-Performance Lithium-Air Batteries

    National Aeronautics and Space Administration — The objective of this proposal is to develop 3-dimensional hierarchical mesoporous nanocarbon architecture using primarily our unique holey nanocarbon platforms...

  9. Scientific Grand Challenges: Forefront Questions in Nuclear Science and the Role of High Performance Computing

    Khaleel, Mohammad A.

    2009-01-01

    This report is an account of the deliberations and conclusions of the workshop on 'Forefront Questions in Nuclear Science and the Role of High Performance Computing' held January 26-28, 2009, co-sponsored by the U.S. Department of Energy (DOE) Office of Nuclear Physics (ONP) and the DOE Office of Advanced Scientific Computing (ASCR). Representatives from the national and international nuclear physics communities, as well as from the high performance computing community, participated. The purpose of this workshop was to (1) identify forefront scientific challenges in nuclear physics and then determine which-if any-of these could be aided by high performance computing at the extreme scale; (2) establish how and why new high performance computing capabilities could address issues at the frontiers of nuclear science; (3) provide nuclear physicists the opportunity to influence the development of high performance computing; and (4) provide the nuclear physics community with plans for development of future high performance computing capability by DOE ASCR.

  10. Scientific Grand Challenges: Forefront Questions in Nuclear Science and the Role of High Performance Computing

    Khaleel, Mohammad A.

    2009-10-01

    This report is an account of the deliberations and conclusions of the workshop on "Forefront Questions in Nuclear Science and the Role of High Performance Computing" held January 26-28, 2009, co-sponsored by the U.S. Department of Energy (DOE) Office of Nuclear Physics (ONP) and the DOE Office of Advanced Scientific Computing (ASCR). Representatives from the national and international nuclear physics communities, as well as from the high performance computing community, participated. The purpose of this workshop was to 1) identify forefront scientific challenges in nuclear physics and then determine which-if any-of these could be aided by high performance computing at the extreme scale; 2) establish how and why new high performance computing capabilities could address issues at the frontiers of nuclear science; 3) provide nuclear physicists the opportunity to influence the development of high performance computing; and 4) provide the nuclear physics community with plans for development of future high performance computing capability by DOE ASCR.

  11. A performance model for the communication in fast multipole methods on high-performance computing platforms

    Ibeid, Huda

    2016-03-04

    Exascale systems are predicted to have approximately 1 billion cores, assuming gigahertz cores. Limitations on affordable network topologies for distributed memory systems of such massive scale bring new challenges to the currently dominant parallel programing model. Currently, there are many efforts to evaluate the hardware and software bottlenecks of exascale designs. It is therefore of interest to model application performance and to understand what changes need to be made to ensure extrapolated scalability. The fast multipole method (FMM) was originally developed for accelerating N-body problems in astrophysics and molecular dynamics but has recently been extended to a wider range of problems. Its high arithmetic intensity combined with its linear complexity and asynchronous communication patterns make it a promising algorithm for exascale systems. In this paper, we discuss the challenges for FMM on current parallel computers and future exascale architectures, with a focus on internode communication. We focus on the communication part only; the efficiency of the computational kernels are beyond the scope of the present study. We develop a performance model that considers the communication patterns of the FMM and observe a good match between our model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization of internode communication in FMM that validates the model against actual measurements of communication time. The ultimate communication model is predictive in an absolute sense; however, on complex systems, this objective is often out of reach or of a difficulty out of proportion to its benefit when there exists a simpler model that is inexpensive and sufficient to guide coding decisions leading to improved scaling. The current model provides such guidance.

  12. A performance model for the communication in fast multipole methods on high-performance computing platforms

    Ibeid, Huda; Yokota, Rio; Keyes, David E.

    2016-01-01

    model and the actual communication time on four high-performance computing (HPC) systems, when latency, bandwidth, network topology, and multicore penalties are all taken into account. To our knowledge, this is the first formal characterization

  13. Spatial Processing of Urban Acoustic Wave Fields from High-Performance Computations

    Ketcham, Stephen A; Wilson, D. K; Cudney, Harley H; Parker, Michael W

    2007-01-01

    .... The objective of this work is to develop spatial processing techniques for acoustic wave propagation data from three-dimensional high-performance computations to quantify scattering due to urban...

  14. FY 1996 Blue Book: High Performance Computing and Communications: Foundations for America`s Information Future

    Networking and Information Technology Research and Development, Executive Office of the President — The Federal High Performance Computing and Communications HPCC Program will celebrate its fifth anniversary in October 1996 with an impressive array of...

  15. Optical high-performance computing: introduction to the JOSA A and Applied Optics feature.

    Caulfield, H John; Dolev, Shlomi; Green, William M J

    2009-08-01

    The feature issues in both Applied Optics and the Journal of the Optical Society of America A focus on topics of immediate relevance to the community working in the area of optical high-performance computing.

  16. FY 1997 Blue Book: High Performance Computing and Communications: Advancing the Frontiers of Information Technology

    Networking and Information Technology Research and Development, Executive Office of the President — The Federal High Performance Computing and Communications HPCC Program will celebrate its fifth anniversary in October 1996 with an impressive array of...

  17. Can We Build a Truly High Performance Computer Which is Flexible and Transparent?

    Rojas, Jhonathan Prieto; Sevilla, Galo T.; Hussain, Muhammad Mustafa

    2013-01-01

    cost advantage. In that context, low-cost mono-crystalline bulk silicon (100) based high performance transistors are considered as the heart of today's computers. One limitation is silicon's rigidity and brittleness. Here we show a generic batch process

  18. Comprehensive Simulation Lifecycle Management for High Performance Computing Modeling and Simulation, Phase I

    National Aeronautics and Space Administration — There are significant logistical barriers to entry-level high performance computing (HPC) modeling and simulation (M IllinoisRocstar) sets up the infrastructure for...

  19. Analysis and modeling of social influence in high performance computing workloads

    Zheng, Shuai; Shae, Zon Yin; Zhang, Xiangliang; Jamjoom, Hani T.; Fong, Liana

    2011-01-01

    Social influence among users (e.g., collaboration on a project) creates bursty behavior in the underlying high performance computing (HPC) workloads. Using representative HPC and cluster workload logs, this paper identifies, analyzes, and quantifies

  20. Export Controls: Implementation of the 1998 Legislative Mandate for High Performance Computers

    1999-01-01

    We found that most of the 938 proposed exports of high performance computers to civilian end users in countries of concern from February 3, 1998, when procedures implementing the 1998 authorization...

  1. High-Performance Computer Modeling of the Cosmos-Iridium Collision

    Olivier, S; Cook, K; Fasenfest, B; Jefferson, D; Jiang, M; Leek, J; Levatin, J; Nikolaev, S; Pertica, A; Phillion, D; Springer, K; De Vries, W

    2009-08-28

    This paper describes the application of a new, integrated modeling and simulation framework, encompassing the space situational awareness (SSA) enterprise, to the recent Cosmos-Iridium collision. This framework is based on a flexible, scalable architecture to enable efficient simulation of the current SSA enterprise, and to accommodate future advancements in SSA systems. In particular, the code is designed to take advantage of massively parallel, high-performance computer systems available, for example, at Lawrence Livermore National Laboratory. We will describe the application of this framework to the recent collision of the Cosmos and Iridium satellites, including (1) detailed hydrodynamic modeling of the satellite collision and resulting debris generation, (2) orbital propagation of the simulated debris and analysis of the increased risk to other satellites (3) calculation of the radar and optical signatures of the simulated debris and modeling of debris detection with space surveillance radar and optical systems (4) determination of simulated debris orbits from modeled space surveillance observations and analysis of the resulting orbital accuracy, (5) comparison of these modeling and simulation results with Space Surveillance Network observations. We will also discuss the use of this integrated modeling and simulation framework to analyze the risks and consequences of future satellite collisions and to assess strategies for mitigating or avoiding future incidents, including the addition of new sensor systems, used in conjunction with the Space Surveillance Network, for improving space situational awareness.

  2. Brain architecture: a design for natural computation.

    Kaiser, Marcus

    2007-12-15

    Fifty years ago, John von Neumann compared the architecture of the brain with that of the computers he invented and which are still in use today. In those days, the organization of computers was based on concepts of brain organization. Here, we give an update on current results on the global organization of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing and balanced network activation. Finally, we discuss mechanisms of self-organization for such architectures. After all, the organization of the brain might again inspire computer architecture.

  3. Computer programming and architecture the VAX

    Levy, Henry

    2014-01-01

    Takes a unique systems approach to programming and architecture of the VAXUsing the VAX as a detailed example, the first half of this book offers a complete course in assembly language programming. The second describes higher-level systems issues in computer architecture. Highlights include the VAX assembler and debugger, other modern architectures such as RISCs, multiprocessing and parallel computing, microprogramming, caches and translation buffers, and an appendix on the Berkeley UNIX assembler.

  4. Transforming the existing building stock to high performed energy efficient and experienced architecture

    Vestergaard, Inge

    architectural heritage to energy efficiency and from architectural quality to sustainability. The first, second and third renovations are discussed from financial and sustainable view points. The role of housing related to the public energy supply system and the relation between the levels of renovation......The project Sustainable Renovation examines the challenge of the current and future architectural renovation of Danish suburbs which were designed in the period from 1945 to 1973. The research project takes its starting point in the perspectives of energy optimization and the fact that the building...

  5. Resilient and Robust High Performance Computing Platforms for Scientific Computing Integrity

    Jin, Yier [Univ. of Central Florida, Orlando, FL (United States)

    2017-07-14

    As technology advances, computer systems are subject to increasingly sophisticated cyber-attacks that compromise both their security and integrity. High performance computing platforms used in commercial and scientific applications involving sensitive, or even classified data, are frequently targeted by powerful adversaries. This situation is made worse by a lack of fundamental security solutions that both perform efficiently and are effective at preventing threats. Current security solutions fail to address the threat landscape and ensure the integrity of sensitive data. As challenges rise, both private and public sectors will require robust technologies to protect its computing infrastructure. The research outcomes from this project try to address all these challenges. For example, we present LAZARUS, a novel technique to harden kernel Address Space Layout Randomization (KASLR) against paging-based side-channel attacks. In particular, our scheme allows for fine-grained protection of the virtual memory mappings that implement the randomization. We demonstrate the effectiveness of our approach by hardening a recent Linux kernel with LAZARUS, mitigating all of the previously presented side-channel attacks on KASLR. Our extensive evaluation shows that LAZARUS incurs only 0.943% overhead for standard benchmarks, and is therefore highly practical. We also introduced HA2lloc, a hardware-assisted allocator that is capable of leveraging an extended memory management unit to detect memory errors in the heap. We also perform testing using HA2lloc in a simulation environment and find that the approach is capable of preventing common memory vulnerabilities.

  6. High Performance Computing Facility Operational Assessment, FY 2010 Oak Ridge Leadership Computing Facility

    Bland, Arthur S Buddy [ORNL; Hack, James J [ORNL; Baker, Ann E [ORNL; Barker, Ashley D [ORNL; Boudwin, Kathlyn J. [ORNL; Kendall, Ricky A [ORNL; Messer, Bronson [ORNL; Rogers, James H [ORNL; Shipman, Galen M [ORNL; White, Julia C [ORNL

    2010-08-01

    Oak Ridge National Laboratory's (ORNL's) Cray XT5 supercomputer, Jaguar, kicked off the era of petascale scientific computing in 2008 with applications that sustained more than a thousand trillion floating point calculations per second - or 1 petaflop. Jaguar continues to grow even more powerful as it helps researchers broaden the boundaries of knowledge in virtually every domain of computational science, including weather and climate, nuclear energy, geosciences, combustion, bioenergy, fusion, and materials science. Their insights promise to broaden our knowledge in areas that are vitally important to the Department of Energy (DOE) and the nation as a whole, particularly energy assurance and climate change. The science of the 21st century, however, will demand further revolutions in computing, supercomputers capable of a million trillion calculations a second - 1 exaflop - and beyond. These systems will allow investigators to continue attacking global challenges through modeling and simulation and to unravel longstanding scientific questions. Creating such systems will also require new approaches to daunting challenges. High-performance systems of the future will need to be codesigned for scientific and engineering applications with best-in-class communications networks and data-management infrastructures and teams of skilled researchers able to take full advantage of these new resources. The Oak Ridge Leadership Computing Facility (OLCF) provides the nation's most powerful open resource for capability computing, with a sustainable path that will maintain and extend national leadership for DOE's Office of Science (SC). The OLCF has engaged a world-class team to support petascale science and to take a dramatic step forward, fielding new capabilities for high-end science. This report highlights the successful delivery and operation of a petascale system and shows how the OLCF fosters application development teams, developing cutting-edge tools

  7. Electrospun fibers for high performance anodes in microbial fuel cells. Optimizing materials and architecture

    Chen, Shuiliang

    2010-04-15

    the results of above, the porosity and the pore size in the fiber mat are utmost important for the performance of anode in MFCs. With concept of curve or helix in fibers can lead to higher porosity in the fiber mat, a novel 3D porous architecture, nanospring, was designed for high performance anode structure in future MFC. Polymeric nanospring was prepared by bicomponent electrospinning. The reasons for the formation of polymeric nanosprings were investigated by coaxial electrospinning of bicomponent rigid i.e. Nomex {sup registered} or polysulfonamide (PSA) (rigid) and flexible polymers i.e. thermoplastic polyurethane (TPU) (flexible). The results indicated that the nanospring formation is attributed to longitudinal compressive forces which are resulted from the different shrinkages of the rigid and flexible two polymer components and a good electrical conductivity of one of the polymer solutions in coaxial electrospinning system. The modified electrospinning i.e. off-centered electrospinning and side-by-side electrospinning are much more effective than the coaxial electrospinning for generating polymer spring or helical structures, because of the higher longitudinal compressive forces which derived from the lopsided elastic forces. The aligned nanofiber mat with high percent of nanospring shows higher elongation and higher storage modulus below transition glass temperature (T{sub g}) compared to that with straight fibers. The nanospring or helical shape preserves much void-space in the mat. It would be a potential architecture for highly efficient anode in future MFCs. (orig.)

  8. High-performance computing on GPUs for resistivity logging of oil and gas wells

    Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.

    2017-10-01

    We developed and implemented into software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm used the NVIDIA CUDA technology and computing libraries are made, allowing us to perform decomposition of SLAE and find its solution on central processor unit (CPU) and graphics processor unit (GPU). The calculation time is analyzed depending on the matrix size and number of its non-zero elements. We estimated the computing speed on CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.

  9. Implementation of the Principal Component Analysis onto High-Performance Computer Facilities for Hyperspectral Dimensionality Reduction: Results and Comparisons

    Ernestina Martel

    2018-06-01

    Full Text Available Dimensionality reduction represents a critical preprocessing step in order to increase the efficiency and the performance of many hyperspectral imaging algorithms. However, dimensionality reduction algorithms, such as the Principal Component Analysis (PCA, suffer from their computationally demanding nature, becoming advisable for their implementation onto high-performance computer architectures for applications under strict latency constraints. This work presents the implementation of the PCA algorithm onto two different high-performance devices, namely, an NVIDIA Graphics Processing Unit (GPU and a Kalray manycore, uncovering a highly valuable set of tips and tricks in order to take full advantage of the inherent parallelism of these high-performance computing platforms, and hence, reducing the time that is required to process a given hyperspectral image. Moreover, the achieved results obtained with different hyperspectral images have been compared with the ones that were obtained with a field programmable gate array (FPGA-based implementation of the PCA algorithm that has been recently published, providing, for the first time in the literature, a comprehensive analysis in order to highlight the pros and cons of each option.

  10. Architectures for single-chip image computing

    Gove, Robert J.

    1992-04-01

    This paper will focus on the architectures of VLSI programmable processing components for image computing applications. TI, the maker of industry-leading RISC, DSP, and graphics components, has developed an architecture for a new-generation of image processors capable of implementing a plurality of image, graphics, video, and audio computing functions. We will show that the use of a single-chip heterogeneous MIMD parallel architecture best suits this class of processors--those which will dominate the desktop multimedia, document imaging, computer graphics, and visualization systems of this decade.

  11. High-performance floating-point image computing workstation for medical applications

    Mills, Karl S.; Wong, Gilman K.; Kim, Yongmin

    1990-07-01

    The medical imaging field relies increasingly on imaging and graphics techniques in diverse applications with needs similar to (or more stringent than) those of the military, industrial and scientific communities. However, most image processing and graphics systems available for use in medical imaging today are either expensive, specialized, or in most cases both. High performance imaging and graphics workstations which can provide real-time results for a number of applications, while maintaining affordability and flexibility, can facilitate the application of digital image computing techniques in many different areas. This paper describes the hardware and software architecture of a medium-cost floating-point image processing and display subsystem for the NeXT computer, and its applications as a medical imaging workstation. Medical imaging applications of the workstation include use in a Picture Archiving and Communications System (PACS), in multimodal image processing and 3-D graphics workstation for a broad range of imaging modalities, and as an electronic alternator utilizing its multiple monitor display capability and large and fast frame buffer. The subsystem provides a 2048 x 2048 x 32-bit frame buffer (16 Mbytes of image storage) and supports both 8-bit gray scale and 32-bit true color images. When used to display 8-bit gray scale images, up to four different 256-color palettes may be used for each of four 2K x 2K x 8-bit image frames. Three of these image frames can be used simultaneously to provide pixel selectable region of interest display. A 1280 x 1024 pixel screen with 1: 1 aspect ratio can be windowed into the frame buffer for display of any portion of the processed image or images. In addition, the system provides hardware support for integer zoom and an 82-color cursor. This subsystem is implemented on an add-in board occupying a single slot in the NeXT computer. Up to three boards may be added to the NeXT for multiple display capability (e

  12. Transforming the existing building stock to high performed energy efficient and experienced architecture

    Vestergaard, Inge

    The project Sustainable Renovation examines the challenge of the current and future architectural renovation of Danish suburbs which were designed in the period from 1945 to 1973. The research project takes its starting point in the perspectives of energy optimization and the fact that the building...

  13. Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency

    Ltaief, Hatem; Luszczek, Piotr R.; Dongarra, Jack

    2011-01-01

    This paper presents the power profile of two high performance dense linear algebra libraries i.e., LAPACK and PLASMA. The former is based on block algorithms that use the fork-join paradigm to achieve parallel performance. The latter uses fine

  14. Non-Planar Nanotube and Wavy Architecture Based Ultra-High Performance Field Effect Transistors

    Hanna, Amir

    2016-11-01

    This dissertation presents a unique concept for a device architecture named the nanotube (NT) architecture, which is capable of higher drive current compared to the Gate-All-Around Nanowire architecture when applied to heterostructure Tunnel Field Effect Transistors. Through the use of inner/outer core-shell gates, heterostructure NT TFET leverages physically larger tunneling area thus achieving higher driver current (ION) and saving real estates by eliminating arraying requirement. We discuss the physics of p-type (Silicon/Indium Arsenide) and n-type (Silicon/Germanium hetero-structure) based TFETs. Numerical TCAD simulations have shown that NT TFETs have 5x and 1.6 x higher normalized ION when compared to GAA NW TFET for p and n-type TFETs, respectively. This is due to the availability of larger tunneling junction cross sectional area, and lower Shockley-Reed-Hall recombination, while achieving sub 60 mV/dec performance for more than 5 orders of magnitude of drain current, thus enabling scaling down of Vdd to 0.5 V. This dissertation also introduces a novel thin-film-transistors architecture that is named the Wavy Channel (WC) architecture, which allows for extending device width by integrating vertical fin-like substrate corrugations giving rise to up to 50% larger device width, without occupying extra chip area. The novel architecture shows 2x higher output drive current per unit chip area when compared to conventional planar architecture. The current increase is attributed to both the extra device width and 50% enhancement in field effect mobility due to electrostatic gating effects. Digital circuits are fabricated to demonstrate the potential of integrating WC TFT based circuits. WC inverters have shown 2× the peak-to-peak output voltage for the same input, and ~2× the operation frequency of the planar inverters for the same peak-to-peak output voltage. WC NAND circuits have shown 2× higher peak-to-peak output voltage, and 3× lower high-to-low propagation

  15. Wavy channel thin film transistor architecture for area efficient, high performance and low power displays

    Hanna, Amir

    2013-12-23

    We demonstrate a new thin film transistor (TFT) architecture that allows expansion of the device width using continuous fin features - termed as wavy channel (WC) architecture. This architecture allows expansion of transistor width in a direction perpendicular to the substrate, thus not consuming extra chip area, achieving area efficiency. The devices have shown for a 13% increase in the device width resulting in a maximum 2.5× increase in \\'ON\\' current value of the WCTFT, when compared to planar devices consuming the same chip area, while using atomic layer deposition based zinc oxide (ZnO) as the channel material. The WCTFT devices also maintain similar \\'OFF\\' current value, ~100 pA, when compared to planar devices, thus not compromising on power consumption for performance which usually happens with larger width devices. This work offers an interesting opportunity to use WCTFTs as backplane circuitry for large-area high-resolution display applications. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  16. High performance computing in science and engineering Garching/Munich 2016

    Wagner, Siegfried; Bode, Arndt; Bruechle, Helmut; Brehm, Matthias (eds.)

    2016-11-01

    Computer simulations are the well-established third pillar of natural sciences along with theory and experimentation. Particularly high performance computing is growing fast and constantly demands more and more powerful machines. To keep pace with this development, in spring 2015, the Leibniz Supercomputing Centre installed the high performance computing system SuperMUC Phase 2, only three years after the inauguration of its sibling SuperMUC Phase 1. Thereby, the compute capabilities were more than doubled. This book covers the time-frame June 2014 until June 2016. Readers will find many examples of outstanding research in the more than 130 projects that are covered in this book, with each one of these projects using at least 4 million core-hours on SuperMUC. The largest scientific communities using SuperMUC in the last two years were computational fluid dynamics simulations, chemistry and material sciences, astrophysics, and life sciences.

  17. High-performance Sonitopia (Sonic Utopia): Hyper intelligent Material-based Architectural Systems for Acoustic Energy Harvesting

    Heidari, F.; Mahdavinejad, M.

    2017-08-01

    The rate of energy consumption in all over the world, based on reliable statistics of international institutions such as the International Energy Agency (IEA) shows significant increase in energy demand in recent years. Periodical recorded data shows a continuous increasing trend in energy consumption especially in developed countries as well as recently emerged developing economies such as China and India. While air pollution and water contamination as results of high consumption of fossil energy resources might be consider as menace to civic ideals such as livability, conviviality and people-oriented cities. In other hand, automobile dependency, cars oriented design and other noisy activities in urban spaces consider as threats to urban life. Thus contemporary urban design and planning concentrates on rethinking about ecology of sound, reorganizing the soundscape of neighborhoods, redesigning the sonic order of urban space. It seems that contemporary architecture and planning trends through soundscape mapping look for sonitopia (Sonic + Utopia) This paper is to propose some interactive hyper intelligent material-based architectural systems for acoustic energy harvesting. The proposed architectural design system may be result in high-performance architecture and planning strategies for future cities. The ultimate aim of research is to develop a comprehensive system for acoustic energy harvesting which cover the aim of noise reduction as well as being in harmony with architectural design. The research methodology is based on a literature review as well as experimental and quasi-experimental strategies according the paradigm of designedly ways of doing and knowing. While architectural design has solution-focused essence in problem-solving process, the proposed systems had better be hyper intelligent rather than predefined procedures. Therefore, the steps of the inference mechanism of the research include: 1- understanding sonic energy and noise potentials as energy

  18. Brain architecture: A design for natural computation

    Kaiser, Marcus

    2008-01-01

    Fifty years ago, John von Neumann compared the architecture of the brain with that of computers that he invented and which is still in use today. In those days, the organisation of computers was based on concepts of brain organisation. Here, we give an update on current results on the global organisation of neural systems. For neural systems, we outline how the spatial and topological architecture of neuronal and cortical networks facilitates robustness against failures, fast processing, and ...

  19. A computer architecture for intelligent machines

    Lefebvre, D. R.; Saridis, G. N.

    1992-01-01

    The theory of intelligent machines proposes a hierarchical organization for the functions of an autonomous robot based on the principle of increasing precision with decreasing intelligence. An analytic formulation of this theory using information-theoretic measures of uncertainty for each level of the intelligent machine has been developed. The authors present a computer architecture that implements the lower two levels of the intelligent machine. The architecture supports an event-driven programming paradigm that is independent of the underlying computer architecture and operating system. Execution-level controllers for motion and vision systems are briefly addressed, as well as the Petri net transducer software used to implement coordination-level functions. A case study illustrates how this computer architecture integrates real-time and higher-level control of manipulator and vision systems.

  20. High performance 3D neutron transport on peta scale and hybrid architectures within APOLLO3 code

    Jamelot, E.; Dubois, J.; Lautard, J-J.; Calvin, C.; Baudron, A-M.

    2011-01-01

    APOLLO3 code is a common project of CEA, AREVA and EDF for the development of a new generation system for core physics analysis. We present here the parallelization of two deterministic transport solvers of APOLLO3: MINOS, a simplified 3D transport solver on structured Cartesian and hexagonal grids, and MINARET, a transport solver based on triangular meshes on 2D and prismatic ones in 3D. We used two different techniques to accelerate MINOS: a domain decomposition method, combined with an accelerated algorithm using GPU. The domain decomposition is based on the Schwarz iterative algorithm, with Robin boundary conditions to exchange information. The Robin parameters influence the convergence and we detail how we optimized the choice of these parameters. MINARET parallelization is based on angular directions calculation using explicit message passing. Fine grain parallelization is also available for each angular direction using shared memory multithreaded acceleration. Many performance results are presented on massively parallel architectures using more than 103 cores and on hybrid architectures using some tens of GPUs. This work contributes to the HPC development in reactor physics at the CEA Nuclear Energy Division. (author)

  1. SEMICONDUCTOR INTEGRATED CIRCUITS: A high performance 90 nm CMOS SAR ADC with hybrid architecture

    Xingyuan, Tong; Jianming, Chen; Zhangming, Zhu; Yintang, Yang

    2010-01-01

    A 10-bit 2.5 MS/s SAR A/D converter is presented. In the circuit design, an R-C hybrid architecture D/A converter, pseudo-differential comparison architecture and low power voltage level shifters are utilized. Design challenges and considerations are also discussed. In the layout design, each unit resistor is sided by dummies for good matching performance, and the capacitors are routed with a common-central symmetry method to reduce the nonlin-earity error. This proposed converter is implemented based on 90 nm CMOS logic process. With a 3.3 V analog supply and a 1.0 V digital supply, the differential and integral nonlinearity are measured to be less than 0.36 LSB and 0.69 LSB respectively. With an input frequency of 1.2 MHz at 2.5 MS/s sampling rate, the SFDR and ENOB are measured to be 72.86 dB and 9.43 bits respectively, and the power dissipation is measured to be 6.62 mW including the output drivers. This SAR A/D converter occupies an area of 238 × 214 μm2. The design results of this converter show that it is suitable for multi-supply embedded SoC applications.

  2. A high performance 90 nm CMOS SAR ADC with hybrid architecture

    Tong Xingyuan; Zhu Zhangming; Yang Yintang; Chen Jianming

    2010-01-01

    A 10-bit 2.5 MS/s SAR A/D converter is presented. In the circuit design, an R-C hybrid architecture D/A converter, pseudo-differential comparison architecture and low power voltage level shifters are utilized. Design challenges and considerations are also discussed. In the layout design, each unit resistor is sided by dummies for good matching performance, and the capacitors are routed with a common-central symmetry method to reduce the nonlin-earity error. This proposed converter is implemented based on 90 nm CMOS logic process. With a 3.3 V analog supply and a 1.0 V digital supply, the differential and integral nonlinearity are measured to be less than 0.36 LSB and 0.69 LSB respectively. With an input frequency of 1.2 MHz at 2.5 MS/s sampling rate, the SFDR and ENOB are measured to be 72.86 dB and 9.43 bits respectively, and the power dissipation is measured to be 6.62 mW including the output drivers. This SAR A/D converter occupies an area of 238 x 214 μm 2 . The design results of this converter show that it is suitable for multi-supply embedded SoC applications. (semiconductor integrated circuits)

  3. Nest-like LiFePO4/C architectures for high performance lithium ion batteries

    Deng Honggui; Jin Shuangling; Zhan Liang; Qiao Wenming; Ling Licheng

    2012-01-01

    Highlights: ► Nest-like LiFePO 4 /C architectures (nest-like LPCs) were synthesized by solvothermal method. ► The microstructures of nest-like LPCs are very stable constructed by many nanosheets. ► The unique structures offer nest-like LPC electrode with high rate performance. ► The reversible capacity of nest-like LPCs electrode is as high as 120 mAh g −1 at 10 C. - Abstract: A novel kind of microsized nest-like LiFePO 4 /C architectures was synthesized by solvothermal method using inexpensive and stable Fe 3+ salt as iron source and ethylene glycol as mediate. A layer of carbon could be coated directly on the surface of LiFePO 4 crystals and the nest-like unique structures offer the cathode materials with high reversible capacity, excellent cycling stability and high rate performance. The reversible capacity can maintain 159 mAh g −1 at 0.1 C and 120 mAh g −1 at 10 C.

  4. Cactus and Visapult: An ultra-high performance grid-distributedvisualization architecture using connectionless protocols

    Bethel, E. Wes; Shalf, John

    2002-08-31

    This past decade has seen rapid growth in the size,resolution, and complexity of Grand Challenge simulation codes. Thistrend is accompanied by a trend towards multinational, multidisciplinaryteams who carry out this research in distributed teams, and thecorresponding growth of Grid infrastructure to support these widelydistributed Virtual Organizations. As the number and diversity ofdistributed teams grow, the need for visualization tools to analyze anddisplay multi-terabyte, remote data becomes more pronounced and moreurgent. One such tool that has been successfully used to address thisproblem is Visapult. Visapult is a parallel visualization tool thatemploys Grid-distributed components, latency tolerant visualization andgraphics algorithms, along with high performance network I/O in order toachieve effective remote analysis of massive datasets. In this paper wediscuss improvements to network bandwidth utilization and responsivenessof the Visapult application that result from using connectionlessprotocols to move data payload between the distributed Visapultcomponents and a Grid-enabled, high performance physics simulation usedto study gravitational waveforms of colliding black holes: The Cactuscode. These improvements have boosted Visapult's network efficiency to88-96 percent of the maximum theoretical available bandwidth onmulti-gigabit Wide Area Networks, and greatly enhanced interactivity.Such improvements are critically important for future development ofeffective interactive Grid applications.

  5. Business Models of High Performance Computing Centres in Higher Education in Europe

    Eurich, Markus; Calleja, Paul; Boutellier, Roman

    2013-01-01

    High performance computing (HPC) service centres are a vital part of the academic infrastructure of higher education organisations. However, despite their importance for research and the necessary high capital expenditures, business research on HPC service centres is mostly missing. From a business perspective, it is important to find an answer to…

  6. High-Performance Computing in Neuroscience for Data-Driven Discovery, Integration, and Dissemination

    Bouchard, Kristofer E.

    2016-01-01

    A lack of coherent plans to analyze, manage, and understand data threatens the various opportunities offered by new neuro-technologies. High-performance computing will allow exploratory analysis of massive datasets stored in standardized formats, hosted in open repositories, and integrated with simulations.

  7. Requirements for high performance computing for lattice QCD. Report of the ECFA working panel

    Jegerlehner, F.; Kenway, R.D.; Martinelli, G.; Michael, C.; Pene, O.; Petersson, B.; Petronzio, R.; Sachrajda, C.T.; Schilling, K.

    2000-01-01

    This report, prepared at the request of the European Committee for Future Accelerators (ECFA), contains an assessment of the High Performance Computing resources which will be required in coming years by European physicists working in Lattice Field Theory and a review of the scientific opportunities which these resources would open. (orig.)

  8. The high performance cluster computing system for BES offline data analysis

    Sun Yongzhao; Xu Dong; Zhang Shaoqiang; Yang Ting

    2004-01-01

    A high performance cluster computing system (EPCfarm) is introduced, which used for BES offline data analysis. The setup and the characteristics of the hardware and software of EPCfarm are described. The PBS, a queue management package, and the performance of EPCfarm is presented also. (authors)

  9. Security Architecture of Cloud Computing

    V.KRISHNA REDDY; Dr. L.S.S.REDDY

    2011-01-01

    The Cloud Computing offers service over internet with dynamically scalable resources. Cloud Computing services provides benefits to the users in terms of cost and ease of use. Cloud Computing services need to address the security during the transmission of sensitive data and critical applications to shared and public cloud environments. The cloud environments are scaling large for data processing and storage needs. Cloud computing environment have various advantages as well as disadvantages o...

  10. A Strategy for Architecture Design of Crystalline Perovskite Light-Emitting Diodes with High Performance.

    Shi, Yifei; Wu, Wen; Dong, Hua; Li, Guangru; Xi, Kai; Divitini, Giorgio; Ran, Chenxin; Yuan, Fang; Zhang, Min; Jiao, Bo; Hou, Xun; Wu, Zhaoxin

    2018-06-01

    All present designs of perovskite light-emitting diodes (PeLEDs) stem from polymer light-emitting diodes (PLEDs) or perovskite solar cells. The optimal structure of PeLEDs can be predicted to differ from PLEDs due to the different fluorescence dynamics and crystallization between perovskite and polymer. Herein, a new design strategy and conception is introduced, "insulator-perovskite-insulator" (IPI) architecture tailored to PeLEDs. As examples of FAPbBr 3 and MAPbBr 3 , it is experimentally shown that the IPI structure effectively induces charge carriers into perovskite crystals, blocks leakage currents via pinholes in the perovskite film, and avoids exciton quenching simultaneously. Consequently, as for FAPbBr 3 , a 30-fold enhancement in the current efficiency of IPI-structured PeLEDs compared to a control device with poly(3,4ethylenedioxythiophene):poly(styrene sulfonate) as hole-injection layer is achieved-from 0.64 to 20.3 cd A -1 -while the external quantum efficiency is increased from 0.174% to 5.53%. As the example of CsPbBr 3 , compared with the control device, both current efficiency and lifetime of IPI-structured PeLEDs are improved from 1.42 and 4 h to 9.86 cd A -1 and 96 h. This IPI architecture represents a novel strategy for the design of light-emitting didoes based on various perovskites with high efficiencies and stabilities. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. Infrastructure for Multiphysics Software Integration in High Performance Computing-Aided Science and Engineering

    Campbell, Michael T. [Illinois Rocstar LLC, Champaign, IL (United States); Safdari, Masoud [Illinois Rocstar LLC, Champaign, IL (United States); Kress, Jessica E. [Illinois Rocstar LLC, Champaign, IL (United States); Anderson, Michael J. [Illinois Rocstar LLC, Champaign, IL (United States); Horvath, Samantha [Illinois Rocstar LLC, Champaign, IL (United States); Brandyberry, Mark D. [Illinois Rocstar LLC, Champaign, IL (United States); Kim, Woohyun [Illinois Rocstar LLC, Champaign, IL (United States); Sarwal, Neil [Illinois Rocstar LLC, Champaign, IL (United States); Weisberg, Brian [Illinois Rocstar LLC, Champaign, IL (United States)

    2016-10-15

    The project described in this report constructed and exercised an innovative multiphysics coupling toolkit called the Illinois Rocstar MultiPhysics Application Coupling Toolkit (IMPACT). IMPACT is an open source, flexible, natively parallel infrastructure for coupling multiple uniphysics simulation codes into multiphysics computational systems. IMPACT works with codes written in several high-performance-computing (HPC) programming languages, and is designed from the beginning for HPC multiphysics code development. It is designed to be minimally invasive to the individual physics codes being integrated, and has few requirements on those physics codes for integration. The goal of IMPACT is to provide the support needed to enable coupling existing tools together in unique and innovative ways to produce powerful new multiphysics technologies without extensive modification and rewrite of the physics packages being integrated. There are three major outcomes from this project: 1) construction, testing, application, and open-source release of the IMPACT infrastructure, 2) production of example open-source multiphysics tools using IMPACT, and 3) identification and engagement of interested organizations in the tools and applications resulting from the project. This last outcome represents the incipient development of a user community and application echosystem being built using IMPACT. Multiphysics coupling standardization can only come from organizations working together to define needs and processes that span the space of necessary multiphysics outcomes, which Illinois Rocstar plans to continue driving toward. The IMPACT system, including source code, documentation, and test problems are all now available through the public gitHUB.org system to anyone interested in multiphysics code coupling. Many of the basic documents explaining use and architecture of IMPACT are also attached as appendices to this document. Online HTML documentation is available through the gitHUB site

  12. Fundamentals of computer architecture and design

    Bindal, Ahmet

    2017-01-01

    This textbook provides semester-length coverage of computer architecture and design, providing a strong foundation for students to understand modern computer system architecture and to apply these insights and principles to future computer designs.  It is based on the author’s decades of industrial experience with computer architecture and design, as well as with teaching students focused on pursuing careers in computer engineering.  Unlike a number of existing textbooks for this course, this one focuses not only on CPU architecture, but also covers in great detail in system buses, peripherals and memories.This book teaches every element in a computing system in two steps.  First, it introduces the functionality of each topic (and subtopics) and then goes into “from-scratch design” of a particular digital block from its architectural specifications using timing diagrams.  The author describes how the data-path of a certain digital block is generated using timin g diagrams, a method which most textbo...

  13. Quantum computation architecture using optical tweezers

    Weitenberg, Christof; Kuhr, Stefan; Mølmer, Klaus

    2011-01-01

    We present a complete architecture for scalable quantum computation with ultracold atoms in optical lattices using optical tweezers focused to the size of a lattice spacing. We discuss three different two-qubit gates based on local collisional interactions. The gates between arbitrary qubits...... quantum computing....

  14. High-performance computing for structural mechanics and earthquake/tsunami engineering

    Hori, Muneo; Ohsaki, Makoto

    2016-01-01

    Huge earthquakes and tsunamis have caused serious damage to important structures such as civil infrastructure elements, buildings and power plants around the globe.  To quantitatively evaluate such damage processes and to design effective prevention and mitigation measures, the latest high-performance computational mechanics technologies, which include telascale to petascale computers, can offer powerful tools. The phenomena covered in this book include seismic wave propagation in the crust and soil, seismic response of infrastructure elements such as tunnels considering soil-structure interactions, seismic response of high-rise buildings, seismic response of nuclear power plants, tsunami run-up over coastal towns and tsunami inundation considering fluid-structure interactions. The book provides all necessary information for addressing these phenomena, ranging from the fundamentals of high-performance computing for finite element methods, key algorithms of accurate dynamic structural analysis, fluid flows ...

  15. 8th International Workshop on Parallel Tools for High Performance Computing

    Gracia, José; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

    2015-01-01

    Numerical simulation and modelling using High Performance Computing has evolved into an established technique in academic and industrial research. At the same time, the High Performance Computing infrastructure is becoming ever more complex. For instance, most of the current top systems around the world use thousands of nodes in which classical CPUs are combined with accelerator cards in order to enhance their compute power and energy efficiency. This complexity can only be mastered with adequate development and optimization tools. Key topics addressed by these tools include parallelization on heterogeneous systems, performance optimization for CPUs and accelerators, debugging of increasingly complex scientific applications, and optimization of energy usage in the spirit of green IT. This book represents the proceedings of the 8th International Parallel Tools Workshop, held October 1-2, 2014 in Stuttgart, Germany – which is a forum to discuss the latest advancements in the parallel tools.

  16. High performance computing and communications: Advancing the frontiers of information technology

    NONE

    1997-12-31

    This report, which supplements the President`s Fiscal Year 1997 Budget, describes the interagency High Performance Computing and Communications (HPCC) Program. The HPCC Program will celebrate its fifth anniversary in October 1996 with an impressive array of accomplishments to its credit. Over its five-year history, the HPCC Program has focused on developing high performance computing and communications technologies that can be applied to computation-intensive applications. Major highlights for FY 1996: (1) High performance computing systems enable practical solutions to complex problems with accuracies not possible five years ago; (2) HPCC-funded research in very large scale networking techniques has been instrumental in the evolution of the Internet, which continues exponential growth in size, speed, and availability of information; (3) The combination of hardware capability measured in gigaflop/s, networking technology measured in gigabit/s, and new computational science techniques for modeling phenomena has demonstrated that very large scale accurate scientific calculations can be executed across heterogeneous parallel processing systems located thousands of miles apart; (4) Federal investments in HPCC software R and D support researchers who pioneered the development of parallel languages and compilers, high performance mathematical, engineering, and scientific libraries, and software tools--technologies that allow scientists to use powerful parallel systems to focus on Federal agency mission applications; and (5) HPCC support for virtual environments has enabled the development of immersive technologies, where researchers can explore and manipulate multi-dimensional scientific and engineering problems. Educational programs fostered by the HPCC Program have brought into classrooms new science and engineering curricula designed to teach computational science. This document contains a small sample of the significant HPCC Program accomplishments in FY 1996.

  17. Cloud Computing: Architecture and Services

    Ms. Ravneet Kaur

    2018-01-01

    Cloud computing is Internet-based computing, whereby shared resources, software, and information are provided to computers and other devices on demand, like the electricity grid. It is a method for delivering information technology (IT) services where resources are retrieved from the Internet through web-based tools and applications, as opposed to a direct connection to a server. Rather than keeping files on a proprietary hard drive or local storage device, cloud-based storage makes it possib...

  18. Monte Carlo simulations on SIMD computer architectures

    Burmester, C.P.; Gronsky, R.; Wille, L.T.

    1992-01-01

    In this paper algorithmic considerations regarding the implementation of various materials science applications of the Monte Carlo technique to single instruction multiple data (SIMD) computer architectures are presented. In particular, implementation of the Ising model with nearest, next nearest, and long range screened Coulomb interactions on the SIMD architecture MasPar MP-1 (DEC mpp-12000) series of massively parallel computers is demonstrated. Methods of code development which optimize processor array use and minimize inter-processor communication are presented including lattice partitioning and the use of processor array spanning tree structures for data reduction. Both geometric and algorithmic parallel approaches are utilized. Benchmarks in terms of Monte Carl updates per second for the MasPar architecture are presented and compared to values reported in the literature from comparable studies on other architectures

  19. High-performance field emission device utilizing vertically aligned carbon nanotubes-based pillar architectures

    Gupta, Bipin Kumar; Kedawat, Garima; Gangwar, Amit Kumar; Nagpal, Kanika; Kashyap, Pradeep Kumar; Srivastava, Shubhda; Singh, Satbir; Kumar, Pawan; Suryawanshi, Sachin R.; Seo, Deok Min; Tripathi, Prashant; More, Mahendra A.; Srivastava, O. N.; Hahm, Myung Gwan; Late, Dattatray J.

    2018-01-01

    The vertical aligned carbon nanotubes (CNTs)-based pillar architectures were created on laminated silicon oxide/silicon (SiO2/Si) wafer substrate at 775 °C by using water-assisted chemical vapor deposition under low pressure process condition. The lamination was carried out by aluminum (Al, 10.0 nm thickness) as a barrier layer and iron (Fe, 1.5 nm thickness) as a catalyst precursor layer sequentially on a silicon wafer substrate. Scanning electron microscope (SEM) images show that synthesized CNTs are vertically aligned and uniformly distributed with a high density. The CNTs have approximately 2-30 walls with an inner diameter of 3-8 nm. Raman spectrum analysis shows G-band at 1580 cm-1 and D-band at 1340 cm-1. The G-band is higher than D-band, which indicates that CNTs are highly graphitized. The field emission analysis of the CNTs revealed high field emission current density (4mA/cm2 at 1.2V/μm), low turn-on field (0.6 V/μm) and field enhancement factor (6917) with better stability and longer lifetime. Emitter morphology resulting in improved promising field emission performances, which is a crucial factor for the fabrication of pillared shaped vertical aligned CNTs bundles as practical electron sources.

  20. High-performance field emission device utilizing vertically aligned carbon nanotubes-based pillar architectures

    Bipin Kumar Gupta

    2018-01-01

    Full Text Available The vertical aligned carbon nanotubes (CNTs-based pillar architectures were created on laminated silicon oxide/silicon (SiO2/Si wafer substrate at 775 °C by using water-assisted chemical vapor deposition under low pressure process condition. The lamination was carried out by aluminum (Al, 10.0 nm thickness as a barrier layer and iron (Fe, 1.5 nm thickness as a catalyst precursor layer sequentially on a silicon wafer substrate. Scanning electron microscope (SEM images show that synthesized CNTs are vertically aligned and uniformly distributed with a high density. The CNTs have approximately 2–30 walls with an inner diameter of 3–8 nm. Raman spectrum analysis shows G-band at 1580 cm−1 and D-band at 1340 cm−1. The G-band is higher than D-band, which indicates that CNTs are highly graphitized. The field emission analysis of the CNTs revealed high field emission current density (4mA/cm2 at 1.2V/μm, low turn-on field (0.6 V/μm and field enhancement factor (6917 with better stability and longer lifetime. Emitter morphology resulting in improved promising field emission performances, which is a crucial factor for the fabrication of pillared shaped vertical aligned CNTs bundles as practical electron sources.

  1. High performance computing system in the framework of the Higgs boson studies

    Belyaev, Nikita; The ATLAS collaboration

    2017-01-01

    The Higgs boson physics is one of the most important and promising fields of study in modern High Energy Physics. To perform precision measurements of the Higgs boson properties, the use of fast and efficient instruments of Monte Carlo event simulation is required. Due to the increasing amount of data and to the growing complexity of the simulation software tools, the computing resources currently available for Monte Carlo simulation on the LHC GRID are not sufficient. One of the possibilities to address this shortfall of computing resources is the usage of institutes computer clusters, commercial computing resources and supercomputers. In this paper, a brief description of the Higgs boson physics, the Monte-Carlo generation and event simulation techniques are presented. A description of modern high performance computing systems and tests of their performance are also discussed. These studies have been performed on the Worldwide LHC Computing Grid and Kurchatov Institute Data Processing Center, including Tier...

  2. Digital architecture, wearable computers and providing affinity

    Guglielmi, Michel; Johannesen, Hanne Louise

    2005-01-01

    as the setting for the events of experience. Contemporary architecture is a meta-space residing almost any thinkable field, striving to blur boundaries between art, architecture, design and urbanity and break down the distinction between the material and the user or inhabitant. The presentation for this paper...... will, through research, a workshop and participation in a cumulus competition, focus on the exploration of boundaries between digital architecture, performative space and wearable computers. Our design method in general focuses on the interplay between the performing body and the environment – between...

  3. Topic 14+16: High-performance and scientific applications and extreme-scale computing (Introduction)

    Downes, Turlough P.

    2013-01-01

    As our understanding of the world around us increases it becomes more challenging to make use of what we already know, and to increase our understanding still further. Computational modeling and simulation have become critical tools in addressing this challenge. The requirements of high-resolution, accurate modeling have outstripped the ability of desktop computers and even small clusters to provide the necessary compute power. Many applications in the scientific and engineering domains now need very large amounts of compute time, while other applications, particularly in the life sciences, frequently have large data I/O requirements. There is thus a growing need for a range of high performance applications which can utilize parallel compute systems effectively, which have efficient data handling strategies and which have the capacity to utilise current and future systems. The High Performance and Scientific Applications topic aims to highlight recent progress in the use of advanced computing and algorithms to address the varied, complex and increasing challenges of modern research throughout both the "hard" and "soft" sciences. This necessitates being able to use large numbers of compute nodes, many of which are equipped with accelerators, and to deal with difficult I/O requirements. © 2013 Springer-Verlag.

  4. Thinking processes used by high-performing students in a computer programming task

    Marietjie Havenga

    2011-07-01

    Full Text Available Computer programmers must be able to understand programming source code and write programs that execute complex tasks to solve real-world problems. This article is a trans- disciplinary study at the intersection of computer programming, education and psychology. It outlines the role of mental processes in the process of programming and indicates how successful thinking processes can support computer science students in writing correct and well-defined programs. A mixed methods approach was used to better understand the thinking activities and programming processes of participating students. Data collection involved both computer programs and students’ reflective thinking processes recorded in their journals. This enabled analysis of psychological dimensions of participants’ thinking processes and their problem-solving activities as they considered a programming problem. Findings indicate that the cognitive, reflective and psychological processes used by high-performing programmers contributed to their success in solving a complex programming problem. Based on the thinking processes of high performers, we propose a model of integrated thinking processes, which can support computer programming students. Keywords: Computer programming, education, mixed methods research, thinking processes.  Disciplines: Computer programming, education, psychology

  5. Architecture, systems research and computational sciences

    2012-01-01

    The Winter 2012 (vol. 14 no. 1) issue of the Nexus Network Journal is dedicated to the theme “Architecture, Systems Research and Computational Sciences”. This is an outgrowth of the session by the same name which took place during the eighth international, interdisciplinary conference “Nexus 2010: Relationships between Architecture and Mathematics, held in Porto, Portugal, in June 2010. Today computer science is an integral part of even strictly historical investigations, such as those concerning the construction of vaults, where the computer is used to survey the existing building, analyse the data and draw the ideal solution. What the papers in this issue make especially evident is that information technology has had an impact at a much deeper level as well: architecture itself can now be considered as a manifestation of information and as a complex system. The issue is completed with other research papers, conference reports and book reviews.

  6. Contributing to the design of run-time systems dedicated to high performance computing

    Perache, M.

    2006-10-01

    In the field of intensive scientific computing, the quest for performance has to face the increasing complexity of parallel architectures. Nowadays, these machines exhibit a deep memory hierarchy which complicates the design of efficient parallel applications. This thesis proposes a programming environment allowing to design efficient parallel programs on top of clusters of multi-processors. It features a programming model centered around collective communications and synchronizations, and provides load balancing facilities. The programming interface, named MPC, provides high level paradigms which are optimized according to the underlying architecture. The environment is fully functional and used within the CEA/DAM (TERANOVA) computing center. The evaluations presented in this document confirm the relevance of our approach. (author)

  7. 10th International Workshop on Parallel Tools for High Performance Computing

    Gracia, José; Hilbrich, Tobias; Knüpfer, Andreas; Resch, Michael; Nagel, Wolfgang

    2017-01-01

    This book presents the proceedings of the 10th International Parallel Tools Workshop, held October 4-5, 2016 in Stuttgart, Germany – a forum to discuss the latest advances in parallel tools. High-performance computing plays an increasingly important role for numerical simulation and modelling in academic and industrial research. At the same time, using large-scale parallel systems efficiently is becoming more difficult. A number of tools addressing parallel program development and analysis have emerged from the high-performance computing community over the last decade, and what may have started as collection of small helper script has now matured to production-grade frameworks. Powerful user interfaces and an extensive body of documentation allow easy usage by non-specialists.

  8. International Conference on Modern Mathematical Methods and High Performance Computing in Science and Technology

    Srivastava, HM; Venturino, Ezio; Resch, Michael; Gupta, Vijay

    2016-01-01

    The book discusses important results in modern mathematical models and high performance computing, such as applied operations research, simulation of operations, statistical modeling and applications, invisibility regions and regular meta-materials, unmanned vehicles, modern radar techniques/SAR imaging, satellite remote sensing, coding, and robotic systems. Furthermore, it is valuable as a reference work and as a basis for further study and research. All contributing authors are respected academicians, scientists and researchers from around the globe. All the papers were presented at the international conference on Modern Mathematical Methods and High Performance Computing in Science & Technology (M3HPCST 2015), held at Raj Kumar Goel Institute of Technology, Ghaziabad, India, from 27–29 December 2015, and peer-reviewed by international experts. The conference provided an exceptional platform for leading researchers, academicians, developers, engineers and technocrats from a broad range of disciplines ...

  9. Air Force Science & Technology Issues & Opportunities Regarding High Performance Embedded Computing

    2009-09-23

    price-performance advantage include: large scale simulations of neuromorphic computing models GOTCHA radar video SAR for wide area persistent...the handcuffs were not for me and that the military had so far got … Neuromorphic example: Robust recognition of occluded text Gotcha SAR PCID Image...Architecture 16 cores / chip 10 x 10 stacks / board50 chips / stack EDRAM AFPGA EDRAM AFPGA EDRAM AFPGA EDRAM AFPGA EDRAM AFPGA EDRAM AFPGA EDRAM AFPGA EDRAM

  10. Solving Problems in Various Domains by Hybrid Models of High Performance Computations

    Yurii Rogozhin

    2014-03-01

    Full Text Available This work presents a hybrid model of high performance computations. The model is based on membrane system (P~system where some membranes may contain quantum device that is triggered by the data entering the membrane. This model is supposed to take advantages of both biomolecular and quantum paradigms and to overcome some of their inherent limitations. The proposed approach is demonstrated through two selected problems: SAT, and image retrieving.

  11. Switching from computer to microcomputer architecture education

    Bolanakis, Dimosthenis E.; Kotsis, Konstantinos T.; Laopoulos, Theodore

    2010-03-01

    In the last decades, the technological and scientific evolution of the computing discipline has been widely affecting research in software engineering education, which nowadays advocates more enlightened and liberal ideas. This article reviews cross-disciplinary research on a computer architecture class in consideration of its switching to microcomputer architecture. The authors present their strategies towards a successful crossing of boundaries between engineering disciplines. This communication aims at providing a different aspect on professional courses that are, nowadays, addressed at the expense of traditional courses.

  12. The new landscape of parallel computer architecture

    Shalf, John [NERSC Division, Lawrence Berkeley National Laboratory 1 Cyclotron Road, Berkeley California, 94720 (United States)

    2007-07-15

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models.

  13. The new landscape of parallel computer architecture

    Shalf, John

    2007-01-01

    The past few years has seen a sea change in computer architecture that will impact every facet of our society as every electronic device from cell phone to supercomputer will need to confront parallelism of unprecedented scale. Whereas the conventional multicore approach (2, 4, and even 8 cores) adopted by the computing industry will eventually hit a performance plateau, the highest performance per watt and per chip area is achieved using manycore technology (hundreds or even thousands of cores). However, fully unleashing the potential of the manycore approach to ensure future advances in sustained computational performance will require fundamental advances in computer architecture and programming models that are nothing short of reinventing computing. In this paper we examine the reasons behind the movement to exponentially increasing parallelism, and its ramifications for system design, applications and programming models

  14. FPGA hardware acceleration for high performance neutron transport computation based on agent methodology - 318

    Shanjie, Xiao; Tatjana, Jevremovic

    2010-01-01

    The accurate, detailed and 3D neutron transport analysis for Gen-IV reactors is still time-consuming regardless of advanced computational hardware available in developed countries. This paper introduces a new concept in addressing the computational time while persevering the detailed and accurate modeling; a specifically designed FPGA co-processor accelerates robust AGENT methodology for complex reactor geometries. For the first time this approach is applied to accelerate the neutronics analysis. The AGENT methodology solves neutron transport equation using the method of characteristics. The AGENT methodology performance was carefully analyzed before the hardware design based on the FPGA co-processor was adopted. The most time-consuming kernel part is then transplanted into the FPGA co-processor. The FPGA co-processor is designed with data flow-driven non von-Neumann architecture and has much higher efficiency than the conventional computer architecture. Details of the FPGA co-processor design are introduced and the design is benchmarked using two different examples. The advanced chip architecture helps the FPGA co-processor obtaining more than 20 times speed up with its working frequency much lower than the CPU frequency. (authors)

  15. High Performance Computing - Power Application Programming Interface Specification Version 2.0.

    Laros, James H. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Grant, Ryan [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Levenhagen, Michael J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Olivier, Stephen Lecler [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Pedretti, Kevin [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Ward, H. Lee [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Younge, Andrew J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-03-01

    Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.

  16. A high-performance model for shallow-water simulations in distributed and heterogeneous architectures

    Conde, Daniel; Canelas, Ricardo B.; Ferreira, Rui M. L.

    2017-04-01

    One of the most common challenges in hydrodynamic modelling is the trade off one must make between highly resolved simulations and the time required for their computation. In the particular case of urban floods, modelers are often forced to simplify the complex geometries of the problem, or to implicitly include some of its hydrodynamic effects, due to the typically very large spatial scales involved and limited computational resources. At CEris - Instituto Superior Técnico, Universidade de Lisboa - the STAV-2D shallow-water model, particularly suited for strong transient flows in complex and dynamic geometries, has been under development for the past recent years (Canelas et al., 2013 & Conde et al., 2013). The model is based on an explicit, first-order 2DH finite-volume discretization scheme for unstructured triangular meshes, in which a flux-splitting technique is paired with a reviewed Roe-Riemann solver, yielding a model applicable to discontinuous flows over time-evolving geometries. STAV-2D features solid transport in both Euleran and Lagrangian forms, with the first aiming at describing the transport of fine natural sediments and the latter aimed at large individual debris. The model has been validated with theoretical solutions and laboratory experiments (Canelas et al., 2013 & Conde et al., 2015). This work presents our most recent effort in STAV-2D: the re-design of the code in a modern Object-Oriented parallel framework for heterogeneous computations in CPUs and GPUs. The programming language of choice for this re-design was C++, due to its wide support of established and emerging parallel programming interfaces. The current implementation of STAV-2D provides two different levels of parallel granularity: inter-node and intra-node. Inter-node parallelism is achieved by distributing a simulation across a set of worker nodes, with communication between nodes being explicitly managed through MPI. At this level, the main difficulty is associated with the

  17. Electromagnetic Physics Models for Parallel Computing Architectures

    Amadio, G; Bianchini, C; Iope, R; Ananya, A; Apostolakis, J; Aurora, A; Bandieramonte, M; Brun, R; Carminati, F; Gheata, A; Gheata, M; Goulas, I; Nikitina, T; Bhattacharyya, A; Mohanty, A; Canal, P; Elvira, D; Jun, S Y; Lima, G; Duhem, L

    2016-01-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well. (paper)

  18. Electromagnetic Physics Models for Parallel Computing Architectures

    Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

    2016-10-01

    The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.

  19. Cloud object store for archive storage of high performance computing data using decoupling middleware

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2015-06-30

    Cloud object storage is enabled for archived data, such as checkpoints and results, of high performance computing applications using a middleware process. A plurality of archived files, such as checkpoint files and results, generated by a plurality of processes in a parallel computing system are stored by obtaining the plurality of archived files from the parallel computing system; converting the plurality of archived files to objects using a log structured file system middleware process; and providing the objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.

  20. Cloud object store for checkpoints of high performance computing applications using decoupling middleware

    Bent, John M.; Faibish, Sorin; Grider, Gary

    2016-04-19

    Cloud object storage is enabled for checkpoints of high performance computing applications using a middleware process. A plurality of files, such as checkpoint files, generated by a plurality of processes in a parallel computing system are stored by obtaining said plurality of files from said parallel computing system; converting said plurality of files to objects using a log structured file system middleware process; and providing said objects for storage in a cloud object storage system. The plurality of processes may run, for example, on a plurality of compute nodes. The log structured file system middleware process may be embodied, for example, as a Parallel Log-Structured File System (PLFS). The log structured file system middleware process optionally executes on a burst buffer node.

  1. RAPPORT: running scientific high-performance computing applications on the cloud.

    Cohen, Jeremy; Filippis, Ioannis; Woodbridge, Mark; Bauer, Daniela; Hong, Neil Chue; Jackson, Mike; Butcher, Sarah; Colling, David; Darlington, John; Fuchs, Brian; Harvey, Matt

    2013-01-28

    Cloud computing infrastructure is now widely used in many domains, but one area where there has been more limited adoption is research computing, in particular for running scientific high-performance computing (HPC) software. The Robust Application Porting for HPC in the Cloud (RAPPORT) project took advantage of existing links between computing researchers and application scientists in the fields of bioinformatics, high-energy physics (HEP) and digital humanities, to investigate running a set of scientific HPC applications from these domains on cloud infrastructure. In this paper, we focus on the bioinformatics and HEP domains, describing the applications and target cloud platforms. We conclude that, while there are many factors that need consideration, there is no fundamental impediment to the use of cloud infrastructure for running many types of HPC applications and, in some cases, there is potential for researchers to benefit significantly from the flexibility offered by cloud platforms.

  2. CAAD as Computer-Activated Architectural Design

    Galle, Per

    1998-01-01

    In a brief sketch, drawing on a general philosophical conception of human interaction with the world, the architectural design process is analysed in terms of two kinds of human action: interpretation and production. Both of these are seen as establishing a link between mental and material entities....... On this background two alternative roles of computers in computer-aided architectural design (CAAD) are distinguished: a passive and a more active role, where in the latter case, the computer’s capacity for symbol manipulation is utilized to influence design thinking actively. The analysis offered in this paper may...... serve at least two purposes: to provide a conceptual machinery for research and reflection on CAAD, and to clarify the notion of ‘artificial intelligence’ in the light of architectural design....

  3. The Fermilab central computing facility architectural model

    Nicholls, J.

    1989-01-01

    The goal of the current Central Computing Upgrade at Fermilab is to create a computing environment that maximizes total productivity, particularly for high energy physics analysis. The Computing Department and the Next Computer Acquisition Committee decided upon a model which includes five components: an interactive front-end, a Large-Scale Scientific Computer (LSSC, a mainframe computing engine), a microprocessor farm system, a file server, and workstations. With the exception of the file server, all segments of this model are currently in production: a VAX/VMS cluster interactive front-end, an Amdahl VM Computing engine, ACP farms, and (primarily) VMS workstations. This paper will discuss the implementation of the Fermilab Central Computing Facility Architectural Model. Implications for Code Management in such a heterogeneous environment, including issues such as modularity and centrality, will be considered. Special emphasis will be placed on connectivity and communications between the front-end, LSSC, and workstations, as practiced at Fermilab. (orig.)

  4. The Fermilab Central Computing Facility architectural model

    Nicholls, J.

    1989-05-01

    The goal of the current Central Computing Upgrade at Fermilab is to create a computing environment that maximizes total productivity, particularly for high energy physics analysis. The Computing Department and the Next Computer Acquisition Committee decided upon a model which includes five components: an interactive front end, a Large-Scale Scientific Computer (LSSC, a mainframe computing engine), a microprocessor farm system, a file server, and workstations. With the exception of the file server, all segments of this model are currently in production: a VAX/VMS Cluster interactive front end, an Amdahl VM computing engine, ACP farms, and (primarily) VMS workstations. This presentation will discuss the implementation of the Fermilab Central Computing Facility Architectural Model. Implications for Code Management in such a heterogeneous environment, including issues such as modularity and centrality, will be considered. Special emphasis will be placed on connectivity and communications between the front-end, LSSC, and workstations, as practiced at Fermilab. 2 figs

  5. On the impact of quantum computing technology on future developments in high-performance scientific computing

    Möller, Matthias; Vuik, Cornelis

    2017-01-01

    Quantum computing technologies have become a hot topic in academia and industry receiving much attention and financial support from all sides. Building a quantum computer that can be used practically is in itself an outstanding challenge that has become the ‘new race to the moon’. Next to researchers and vendors of future computing technologies, national authorities are showing strong interest in maturing this technology due to its known potential to break many of today’s encryption technique...

  6. Computational Fluid Dynamics (CFD) Computations With Zonal Navier-Stokes Flow Solver (ZNSFLOW) Common High Performance Computing Scalable Software Initiative (CHSSI) Software

    Edge, Harris

    1999-01-01

    ...), computational fluid dynamics (CFD) 6 project. Under the project, a proven zonal Navier-Stokes solver was rewritten for scalable parallel performance on both shared memory and distributed memory high performance computers...

  7. Computer aided architectural design : futures 2001

    Vries, de B.; Leeuwen, van J.P.; Achten, H.H.

    2001-01-01

    CAAD Futures is a bi-annual conference that aims to promote the advancement of computer-aided architectural design in the service of those concerned with the quality of the built environment. The conferences are organized under the auspices of the CAAD Futures Foundation, which has its secretariat

  8. Can We Build a Truly High Performance Computer Which is Flexible and Transparent?

    Rojas, Jhonathan Prieto

    2013-09-10

    State-of-the art computers need high performance transistors, which consume ultra-low power resulting in longer battery lifetime. Billions of transistors are integrated neatly using matured silicon fabrication process to maintain the performance per cost advantage. In that context, low-cost mono-crystalline bulk silicon (100) based high performance transistors are considered as the heart of today\\'s computers. One limitation is silicon\\'s rigidity and brittleness. Here we show a generic batch process to convert high performance silicon electronics into flexible and semi-transparent one while retaining its performance, process compatibility, integration density and cost. We demonstrate high-k/metal gate stack based p-type metal oxide semiconductor field effect transistors on 4 inch silicon fabric released from bulk silicon (100) wafers with sub-threshold swing of 80 mV dec(-1) and on/off ratio of near 10(4) within 10% device uniformity with a minimum bending radius of 5 mm and an average transmittance of similar to 7% in the visible spectrum.

  9. Achieving high performance in numerical computations on RISC workstations and parallel systems

    Goedecker, S. [Max-Planck Inst. for Solid State Research, Stuttgart (Germany); Hoisie, A. [Los Alamos National Lab., NM (United States)

    1997-08-20

    The nominal peak speeds of both serial and parallel computers is raising rapidly. At the same time however it is becoming increasingly difficult to get out a significant fraction of this high peak speed from modern computer architectures. In this tutorial the authors give the scientists and engineers involved in numerically demanding calculations and simulations the necessary basic knowledge to write reasonably efficient programs. The basic principles are rather simple and the possible rewards large. Writing a program by taking into account optimization techniques related to the computer architecture can significantly speedup your program, often by factors of 10--100. As such, optimizing a program can for instance be a much better solution than buying a faster computer. If a few basic optimization principles are applied during program development, the additional time needed for obtaining an efficient program is practically negligible. In-depth optimization is usually only needed for a few subroutines or kernels and the effort involved is therefore also acceptable.

  10. On the impact of quantum computing technology on future developments in high-performance scientific computing

    Möller, M.; Vuik, C.

    2017-01-01

    Quantum computing technologies have become a hot topic in academia and industry receiving much attention and financial support from all sides. Building a quantum computer that can be used practically is in itself an outstanding challenge that has become the ‘new race to the moon’. Next to

  11. A comprehensive approach to decipher biological computation to achieve next generation high-performance exascale computing.

    James, Conrad D.; Schiess, Adrian B.; Howell, Jamie; Baca, Michael J.; Partridge, L. Donald; Finnegan, Patrick Sean; Wolfley, Steven L.; Dagel, Daryl James; Spahn, Olga Blum; Harper, Jason C.; Pohl, Kenneth Roy; Mickel, Patrick R.; Lohn, Andrew; Marinella, Matthew

    2013-10-01

    The human brain (volume=1200cm3) consumes 20W and is capable of performing > 10^16 operations/s. Current supercomputer technology has reached 1015 operations/s, yet it requires 1500m^3 and 3MW, giving the brain a 10^12 advantage in operations/s/W/cm^3. Thus, to reach exascale computation, two achievements are required: 1) improved understanding of computation in biological tissue, and 2) a paradigm shift towards neuromorphic computing where hardware circuits mimic properties of neural tissue. To address 1), we will interrogate corticostriatal networks in mouse brain tissue slices, specifically with regard to their frequency filtering capabilities as a function of input stimulus. To address 2), we will instantiate biological computing characteristics such as multi-bit storage into hardware devices with future computational and memory applications. Resistive memory devices will be modeled, designed, and fabricated in the MESA facility in consultation with our internal and external collaborators.

  12. Large computer systems and new architectures

    Bloch, T.

    1978-01-01

    The super-computers of today are becoming quite specialized and one can no longer expect to get all the state-of-the-art software and hardware facilities in one package. In order to achieve faster and faster computing it is necessary to experiment with new architectures, and the cost of developing each experimental architecture into a general-purpose computer system is too high when one considers the relatively small market for these computers. The result is that such computers are becoming 'back-ends' either to special systems (BSP, DAP) or to anything (CRAY-1). Architecturally the CRAY-1 is the most attractive today since it guarantees a speed gain of a factor of two over a CDC 7600 thus allowing us to regard any speed up resulting from vectorization as a bonus. It looks, however, as if it will be very difficult to make substantially faster computers using only pipe-lining techniques and that it will be necessary to explore multiple processors working on the same problem. The experience which will be gained with the BSP and the DAP over the next few years will certainly be most valuable in this respect. (Auth.)

  13. DOE High Performance Computing Operational Review (HPCOR): Enabling Data-Driven Scientific Discovery at HPC Facilities

    Gerber, Richard; Allcock, William; Beggio, Chris; Campbell, Stuart; Cherry, Andrew; Cholia, Shreyas; Dart, Eli; England, Clay; Fahey, Tim; Foertter, Fernanda; Goldstone, Robin; Hick, Jason; Karelitz, David; Kelly, Kaki; Monroe, Laura; Prabhat,; Skinner, David; White, Julia

    2014-10-17

    U.S. Department of Energy (DOE) High Performance Computing (HPC) facilities are on the verge of a paradigm shift in the way they deliver systems and services to science and engineering teams. Research projects are producing a wide variety of data at unprecedented scale and level of complexity, with community-specific services that are part of the data collection and analysis workflow. On June 18-19, 2014 representatives from six DOE HPC centers met in Oakland, CA at the DOE High Performance Operational Review (HPCOR) to discuss how they can best provide facilities and services to enable large-scale data-driven scientific discovery at the DOE national laboratories. The report contains findings from that review.

  14. Integrated State Estimation and Contingency Analysis Software Implementation using High Performance Computing Techniques

    Chen, Yousu; Glaesemann, Kurt R.; Rice, Mark J.; Huang, Zhenyu

    2015-12-31

    Power system simulation tools are traditionally developed in sequential mode and codes are optimized for single core computing only. However, the increasing complexity in the power grid models requires more intensive computation. The traditional simulation tools will soon not be able to meet the grid operation requirements. Therefore, power system simulation tools need to evolve accordingly to provide faster and better results for grid operations. This paper presents an integrated state estimation and contingency analysis software implementation using high performance computing techniques. The software is able to solve large size state estimation problems within one second and achieve a near-linear speedup of 9,800 with 10,000 cores for contingency analysis application. The performance evaluation is presented to show its effectiveness.

  15. Optimization and mathematical modeling in computer architecture

    Sankaralingam, Karu; Nowatzki, Tony

    2013-01-01

    In this book we give an overview of modeling techniques used to describe computer systems to mathematical optimization tools. We give a brief introduction to various classes of mathematical optimization frameworks with special focus on mixed integer linear programming which provides a good balance between solver time and expressiveness. We present four detailed case studies -- instruction set customization, data center resource management, spatial architecture scheduling, and resource allocation in tiled architectures -- showing how MILP can be used and quantifying by how much it outperforms t

  16. Smart SOA platforms in cloud computing architectures

    Exposito , Ernesto

    2014-01-01

    This book is intended to introduce the principles of the Event-Driven and Service-Oriented Architecture (SOA 2.0) and its role in the new interconnected world based on the cloud computing architecture paradigm. In this new context, the concept of "service" is widely applied to the hardware and software resources available in the new generation of the Internet. The authors focus on how current and future SOA technologies provide the basis for the smart management of the service model provided by the Platform as a Service (PaaS) layer.

  17. ATCA for Machines-- Advanced Telecommunications Computing Architecture

    Larsen, R.S.; /SLAC

    2008-04-22

    The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R&D including application of HA principles to power electronics systems.

  18. ATCA for Machines-- Advanced Telecommunications Computing Architecture

    Larsen, R

    2008-01-01

    The Advanced Telecommunications Computing Architecture is a new industry open standard for electronics instrument modules and shelves being evaluated for the International Linear Collider (ILC). It is the first industrial standard designed for High Availability (HA). ILC availability simulations have shown clearly that the capabilities of ATCA are needed in order to achieve acceptable integrated luminosity. The ATCA architecture looks attractive for beam instruments and detector applications as well. This paper provides an overview of ongoing R and D including application of HA principles to power electronics systems

  19. HIGH-PERFORMANCE COMPUTING FOR THE STUDY OF EARTH AND ENVIRONMENTAL SCIENCE MATERIALS USING SYNCHROTRON X-RAY COMPUTED MICROTOMOGRAPHY

    FENG, H.; JONES, K.W.; MCGUIGAN, M.; SMITH, G.J.; SPILETIC, J.

    2001-01-01

    Synchrotron x-ray computed microtomography (CMT) is a non-destructive method for examination of rock, soil, and other types of samples studied in the earth and environmental sciences. The high x-ray intensities of the synchrotron source make possible the acquisition of tomographic volumes at a high rate that requires the application of high-performance computing techniques for data reconstruction to produce the three-dimensional volumes, for their visualization, and for data analysis. These problems are exacerbated by the need to share information between collaborators at widely separated locations over both local and tide-area networks. A summary of the CMT technique and examples of applications are given here together with a discussion of the applications of high-performance computing methods to improve the experimental techniques and analysis of the data

  20. HIGH-PERFORMANCE COMPUTING FOR THE STUDY OF EARTH AND ENVIRONMENTAL SCIENCE MATERIALS USING SYNCHROTRON X-RAY COMPUTED MICROTOMOGRAPHY.

    FENG,H.; JONES,K.W.; MCGUIGAN,M.; SMITH,G.J.; SPILETIC,J.

    2001-10-12

    Synchrotron x-ray computed microtomography (CMT) is a non-destructive method for examination of rock, soil, and other types of samples studied in the earth and environmental sciences. The high x-ray intensities of the synchrotron source make possible the acquisition of tomographic volumes at a high rate that requires the application of high-performance computing techniques for data reconstruction to produce the three-dimensional volumes, for their visualization, and for data analysis. These problems are exacerbated by the need to share information between collaborators at widely separated locations over both local and tide-area networks. A summary of the CMT technique and examples of applications are given here together with a discussion of the applications of high-performance computing methods to improve the experimental techniques and analysis of the data.

  1. High performance simulation for the Silva project using the tera computer

    Bergeaud, V.; La Hargue, J.P.; Mougery, F. [CS Communication and Systemes, 92 - Clamart (France); Boulet, M.; Scheurer, B. [CEA Bruyeres-le-Chatel, 91 - Bruyeres-le-Chatel (France); Le Fur, J.F.; Comte, M.; Benisti, D.; Lamare, J. de; Petit, A. [CEA Saclay, 91 - Gif sur Yvette (France)

    2003-07-01

    In the context of the SILVA Project (Atomic Vapor Laser Isotope Separation), numerical simulation of the plant scale propagation of laser beams through uranium vapour was a great challenge. The PRODIGE code has been developed to achieve this goal. Here we focus on the task of achieving high performance simulation on the TERA computer. We describe the main issues for optimizing the parallelization of the PRODIGE code on TERA. Thus, we discuss advantages and drawbacks of the implemented diagonal parallelization scheme. As a consequence, it has been found fruitful to fit out the code in three aspects: memory allocation, MPI communications and interconnection network bandwidth usage. We stress out the interest of MPI/IO in this context and the benefit obtained for production computations on TERA. Finally, we shall illustrate our developments. We indicate some performance measurements reflecting the good parallelization properties of PRODIGE on the TERA computer. The code is currently used for demonstrating the feasibility of the laser propagation at a plant enrichment level and for preparing the 2003 Menphis experiment. We conclude by emphasizing the contribution of high performance TERA simulation to the project. (authors)

  2. High performance simulation for the Silva project using the tera computer

    Bergeaud, V.; La Hargue, J.P.; Mougery, F.; Boulet, M.; Scheurer, B.; Le Fur, J.F.; Comte, M.; Benisti, D.; Lamare, J. de; Petit, A.

    2003-01-01

    In the context of the SILVA Project (Atomic Vapor Laser Isotope Separation), numerical simulation of the plant scale propagation of laser beams through uranium vapour was a great challenge. The PRODIGE code has been developed to achieve this goal. Here we focus on the task of achieving high performance simulation on the TERA computer. We describe the main issues for optimizing the parallelization of the PRODIGE code on TERA. Thus, we discuss advantages and drawbacks of the implemented diagonal parallelization scheme. As a consequence, it has been found fruitful to fit out the code in three aspects: memory allocation, MPI communications and interconnection network bandwidth usage. We stress out the interest of MPI/IO in this context and the benefit obtained for production computations on TERA. Finally, we shall illustrate our developments. We indicate some performance measurements reflecting the good parallelization properties of PRODIGE on the TERA computer. The code is currently used for demonstrating the feasibility of the laser propagation at a plant enrichment level and for preparing the 2003 Menphis experiment. We conclude by emphasizing the contribution of high performance TERA simulation to the project. (authors)

  3. DEISA2: supporting and developing a European high-performance computing ecosystem

    Lederer, H

    2008-01-01

    The DEISA Consortium has deployed and operated the Distributed European Infrastructure for Supercomputing Applications. Through the EU FP7 DEISA2 project (funded for three years as of May 2008), the consortium is continuing to support and enhance the distributed high-performance computing infrastructure and its activities and services relevant for applications enabling, operation, and technologies, as these are indispensable for the effective support of computational sciences for high-performance computing (HPC). The service-provisioning model will be extended from one that supports single projects to one supporting virtual European communities. Collaborative activities will also be carried out with new European and other international initiatives. Of strategic importance is cooperation with the PRACE project, which is preparing for the installation of a limited number of leadership-class Tier-0 supercomputers in Europe. The key role and aim of DEISA will be to deliver a turnkey operational solution for a persistent European HPC ecosystem that will integrate national Tier-1 centers and the new Tier-0 centers

  4. High Performance Computing and Storage Requirements for Nuclear Physics: Target 2017

    Gerber, Richard [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Wasserman, Harvey [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2014-04-30

    In April 2014, NERSC, ASCR, and the DOE Office of Nuclear Physics (NP) held a review to characterize high performance computing (HPC) and storage requirements for NP research through 2017. This review is the 12th in a series of reviews held by NERSC and Office of Science program offices that began in 2009. It is the second for NP, and the final in the second round of reviews that covered the six Office of Science program offices. This report is the result of that review

  5. Interactive Data Exploration for High-Performance Fluid Flow Computations through Porous Media

    Perovic, Nevena

    2014-09-01

    © 2014 IEEE. Huge data advent in high-performance computing (HPC) applications such as fluid flow simulations usually hinders the interactive processing and exploration of simulation results. Such an interactive data exploration not only allows scientiest to \\'play\\' with their data but also to visualise huge (distributed) data sets in both an efficient and easy way. Therefore, we propose an HPC data exploration service based on a sliding window concept, that enables researches to access remote data (available on a supercomputer or cluster) during simulation runtime without exceeding any bandwidth limitations between the HPC back-end and the user front-end.

  6. Commercialization issues and funding opportunities for high-performance optoelectronic computing modules

    Hessenbruch, John M.; Guilfoyle, Peter S.

    1997-01-01

    Low power, optoelectronic integrated circuits are being developed for high speed switching and data processing applications. These high performance optoelectronic computing modules consist of three primary components: vertical cavity surface emitting lasers, diffractive optical interconnect elements, and detector/amplifier/laser driver arrays. Following the design and fabrication of an HPOC module prototype, selected commercial funding sources will be evaluated to support a product development stage. These include the formation of a strategic alliance with one or more microprocessor or telecommunications vendors, and/or equity investment from one or more venture capital firms.

  7. Failure detection in high-performance clusters and computers using chaotic map computations

    Rao, Nageswara S.

    2015-09-01

    A programmable media includes a processing unit capable of independent operation in a machine that is capable of executing 10.sup.18 floating point operations per second. The processing unit is in communication with a memory element and an interconnect that couples computing nodes. The programmable media includes a logical unit configured to execute arithmetic functions, comparative functions, and/or logical functions. The processing unit is configured to detect computing component failures, memory element failures and/or interconnect failures by executing programming threads that generate one or more chaotic map trajectories. The central processing unit or graphical processing unit is configured to detect a computing component failure, memory element failure and/or an interconnect failure through an automated comparison of signal trajectories generated by the chaotic maps.

  8. Addressing Cloud Computing in Enterprise Architecture: Issues and Challenges

    Khan, Khaled; Gangavarapu, Narendra

    2009-01-01

    This article discusses how the characteristics of cloud computing affect the enterprise architecture in four domains: business, data, application and technology. The ownership and control of architectural components are shifted from organisational perimeters to cloud providers. It argues that although cloud computing promises numerous benefits to enterprises, the shifting control from enterprises to cloud providers on architectural components introduces several architectural challenges. The d...

  9. Selecting a Benchmark Suite to Profile High-Performance Computing (HPC) Machines

    2014-11-01

    architectures. Machines now contain central processing units (CPUs), graphics processing units (GPUs), and many integrated core ( MIC ) architecture all...evaluate the feasibility and applicability of a new architecture just released to the market . Researchers are often unsure how available resources will...architectures. Having a suite of programs running on different architectures, such as GPUs, MICs , and CPUs, adds complexity and technical challenges

  10. Challenges and opportunities of modeling plasma–surface interactions in tungsten using high-performance computing

    Wirth, Brian D., E-mail: bdwirth@utk.edu [Department of Nuclear Engineering, University of Tennessee, Knoxville, TN 37996 (United States); Nuclear Science and Engineering Directorate, Oak Ridge National Laboratory, Oak Ridge, TN (United States); Hammond, K.D. [Department of Nuclear Engineering, University of Tennessee, Knoxville, TN 37996 (United States); Krasheninnikov, S.I. [University of California, San Diego, La Jolla, CA (United States); Maroudas, D. [University of Massachusetts, Amherst, Amherst, MA 01003 (United States)

    2015-08-15

    The performance of plasma facing components (PFCs) is critical for ITER and future magnetic fusion reactors. The ITER divertor will be tungsten, which is the primary candidate material for future reactors. Recent experiments involving tungsten exposure to low-energy helium plasmas reveal significant surface modification, including the growth of nanometer-scale tendrils of “fuzz” and formation of nanometer-sized bubbles in the near-surface region. The large span of spatial and temporal scales governing plasma surface interactions are among the challenges to modeling divertor performance. Fortunately, recent innovations in computational modeling, increasingly powerful high-performance computers, and improved experimental characterization tools provide a path toward self-consistent, experimentally validated models of PFC and divertor performance. Recent advances in understanding tungsten–helium interactions are reviewed, including such processes as helium clustering, which serve as nuclei for gas bubbles; and trap mutation, dislocation loop punching and bubble bursting; which together initiate surface morphological modification.

  11. High Performance Computing - Power Application Programming Interface Specification Version 1.4

    Laros III, James H. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); DeBonis, David [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Grant, Ryan [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Kelly, Suzanne M. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Levenhagen, Michael J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Olivier, Stephen Lecler [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Pedretti, Kevin [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2016-10-01

    Measuring and controlling the power and energy consumption of high performance computing systems by various components in the software stack is an active research area [13, 3, 5, 10, 4, 21, 19, 16, 7, 17, 20, 18, 11, 1, 6, 14, 12]. Implementations in lower level software layers are beginning to emerge in some production systems, which is very welcome. To be most effective, a portable interface to measurement and control features would significantly facilitate participation by all levels of the software stack. We present a proposal for a standard power Application Programming Interface (API) that endeavors to cover the entire software space, from generic hardware interfaces to the input from the computer facility manager.

  12. Use of high performance computing to examine the effectiveness of aquifer remediation

    Tompson, A.F.B.; Ashby, S.F.; Falgout, R.D.; Smith, S.G.; Fogwell, T.W.; Loosmore, G.A.

    1994-06-01

    Large-scale simulation of fluid flow and chemical migration is being used to study the effectiveness of pump-and-treat restoration of a contaminated, saturated aquifer. A three-element approach focusing on geostatistical representations of heterogeneous aquifers, high-performance computing strategies for simulating flow, migration, and reaction processes in large three-dimensional systems, and highly-resolved simulations of flow and chemical migration in porous formations will be discussed. Results from a preliminary application of this approach to examine pumping behavior at a real, heterogeneous field site will be presented. Future activities will emphasize parallel computations in larger, dynamic, and nonlinear (two-phase) flow problems as well as improved interpretive methods for defining detailed material property distributions

  13. Polymer waveguides for electro-optical integration in data centers and high-performance computers.

    Dangel, Roger; Hofrichter, Jens; Horst, Folkert; Jubin, Daniel; La Porta, Antonio; Meier, Norbert; Soganci, Ibrahim Murat; Weiss, Jonas; Offrein, Bert Jan

    2015-02-23

    To satisfy the intra- and inter-system bandwidth requirements of future data centers and high-performance computers, low-cost low-power high-throughput optical interconnects will become a key enabling technology. To tightly integrate optics with the computing hardware, particularly in the context of CMOS-compatible silicon photonics, optical printed circuit boards using polymer waveguides are considered as a formidable platform. IBM Research has already demonstrated the essential silicon photonics and interconnection building blocks. A remaining challenge is electro-optical packaging, i.e., the connection of the silicon photonics chips with the system. In this paper, we present a new single-mode polymer waveguide technology and a scalable method for building the optical interface between silicon photonics chips and single-mode polymer waveguides.

  14. Analysis and modeling of social influence in high performance computing workloads

    Zheng, Shuai

    2011-01-01

    Social influence among users (e.g., collaboration on a project) creates bursty behavior in the underlying high performance computing (HPC) workloads. Using representative HPC and cluster workload logs, this paper identifies, analyzes, and quantifies the level of social influence across HPC users. We show the existence of a social graph that is characterized by a pattern of dominant users and followers. This pattern also follows a power-law distribution, which is consistent with those observed in mainstream social networks. Given its potential impact on HPC workloads prediction and scheduling, we propose a fast-converging, computationally-efficient online learning algorithm for identifying social groups. Extensive evaluation shows that our online algorithm can (1) quickly identify the social relationships by using a small portion of incoming jobs and (2) can efficiently track group evolution over time. © 2011 Springer-Verlag.

  15. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing

    Cordes Ben

    2009-01-01

    Full Text Available High-performance reconfigurable computing (HPRC is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.

  16. Parallel Backprojection: A Case Study in High-Performance Reconfigurable Computing

    2009-03-01

    Full Text Available High-performance reconfigurable computing (HPRC is a novel approach to provide large-scale computing power to modern scientific applications. Using both general-purpose processors and FPGAs allows application designers to exploit fine-grained and coarse-grained parallelism, achieving high degrees of speedup. One scientific application that benefits from this technique is backprojection, an image formation algorithm that can be used as part of a synthetic aperture radar (SAR processing system. We present an implementation of backprojection for SAR on an HPRC system. Using simulated data taken at a variety of ranges, our implementation runs over 200 times faster than a similar software program, with an overall application speedup better than 50x. The backprojection application is easily parallelizable, achieving near-linear speedup when run on multiple nodes of a clustered HPRC system. The results presented can be applied to other systems and other algorithms with similar characteristics.

  17. Challenges and opportunities of modeling plasma–surface interactions in tungsten using high-performance computing

    Wirth, Brian D.; Hammond, K.D.; Krasheninnikov, S.I.; Maroudas, D.

    2015-01-01

    The performance of plasma facing components (PFCs) is critical for ITER and future magnetic fusion reactors. The ITER divertor will be tungsten, which is the primary candidate material for future reactors. Recent experiments involving tungsten exposure to low-energy helium plasmas reveal significant surface modification, including the growth of nanometer-scale tendrils of “fuzz” and formation of nanometer-sized bubbles in the near-surface region. The large span of spatial and temporal scales governing plasma surface interactions are among the challenges to modeling divertor performance. Fortunately, recent innovations in computational modeling, increasingly powerful high-performance computers, and improved experimental characterization tools provide a path toward self-consistent, experimentally validated models of PFC and divertor performance. Recent advances in understanding tungsten–helium interactions are reviewed, including such processes as helium clustering, which serve as nuclei for gas bubbles; and trap mutation, dislocation loop punching and bubble bursting; which together initiate surface morphological modification

  18. 9th International Workshop on Parallel Tools for High Performance Computing

    Hilbrich, Tobias; Niethammer, Christoph; Gracia, José; Nagel, Wolfgang; Resch, Michael

    2016-01-01

    High Performance Computing (HPC) remains a driver that offers huge potentials and benefits for science and society. However, a profound understanding of the computational matters and specialized software is needed to arrive at effective and efficient simulations. Dedicated software tools are important parts of the HPC software landscape, and support application developers. Even though a tool is by definition not a part of an application, but rather a supplemental piece of software, it can make a fundamental difference during the development of an application. Such tools aid application developers in the context of debugging, performance analysis, and code optimization, and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 9th International Parallel Tools Workshop held in Dresden, Germany, September 2-3, 2015, which offered an established forum for discussing the latest advances in paral...

  19. Using high performance interconnects in a distributed computing and mass storage environment

    Ernst, M.

    1994-01-01

    Detector Collaborations of the HERA Experiments typically involve more than 500 physicists from a few dozen institutes. These physicists require access to large amounts of data in a fully transparent manner. Important issues include Distributed Mass Storage Management Systems in a Distributed and Heterogeneous Computing Environment. At the very center of a distributed system, including tens of CPUs and network attached mass storage peripherals are the communication links. Today scientists are witnessing an integration of computing and communication technology with the open-quote network close-quote becoming the computer. This contribution reports on a centrally operated computing facility for the HERA Experiments at DESY, including Symmetric Multiprocessor Machines (84 Processors), presently more than 400 GByte of magnetic disk and 40 TB of automoted tape storage, tied together by a HIPPI open-quote network close-quote. Focussing on the High Performance Interconnect technology, details will be provided about the HIPPI based open-quote Backplane close-quote configured around a 20 Gigabit/s Multi Media Router and the performance and efficiency of the related computer interfaces

  20. Human and Robotic Space Mission Use Cases for High-Performance Spaceflight Computing

    Some, Raphael; Doyle, Richard; Bergman, Larry; Whitaker, William; Powell, Wesley; Johnson, Michael; Goforth, Montgomery; Lowry, Michael

    2013-01-01

    Spaceflight computing is a key resource in NASA space missions and a core determining factor of spacecraft capability, with ripple effects throughout the spacecraft, end-to-end system, and mission. Onboard computing can be aptly viewed as a "technology multiplier" in that advances provide direct dramatic improvements in flight functions and capabilities across the NASA mission classes, and enable new flight capabilities and mission scenarios, increasing science and exploration return. Space-qualified computing technology, however, has not advanced significantly in well over ten years and the current state of the practice fails to meet the near- to mid-term needs of NASA missions. Recognizing this gap, the NASA Game Changing Development Program (GCDP), under the auspices of the NASA Space Technology Mission Directorate, commissioned a study on space-based computing needs, looking out 15-20 years. The study resulted in a recommendation to pursue high-performance spaceflight computing (HPSC) for next-generation missions, and a decision to partner with the Air Force Research Lab (AFRL) in this development.

  1. The Centre of High-Performance Scientific Computing, Geoverbund, ABC/J - Geosciences enabled by HPSC

    Kollet, Stefan; Görgen, Klaus; Vereecken, Harry; Gasper, Fabian; Hendricks-Franssen, Harrie-Jan; Keune, Jessica; Kulkarni, Ketan; Kurtz, Wolfgang; Sharples, Wendy; Shrestha, Prabhakar; Simmer, Clemens; Sulis, Mauro; Vanderborght, Jan

    2016-04-01

    The Centre of High-Performance Scientific Computing (HPSC TerrSys) was founded 2011 to establish a centre of competence in high-performance scientific computing in terrestrial systems and the geosciences enabling fundamental and applied geoscientific research in the Geoverbund ABC/J (geoscientfic research alliance of the Universities of Aachen, Cologne, Bonn and the Research Centre Jülich, Germany). The specific goals of HPSC TerrSys are to achieve relevance at the national and international level in (i) the development and application of HPSC technologies in the geoscientific community; (ii) student education; (iii) HPSC services and support also to the wider geoscientific community; and in (iv) the industry and public sectors via e.g., useful applications and data products. A key feature of HPSC TerrSys is the Simulation Laboratory Terrestrial Systems, which is located at the Jülich Supercomputing Centre (JSC) and provides extensive capabilities with respect to porting, profiling, tuning and performance monitoring of geoscientific software in JSC's supercomputing environment. We will present a summary of success stories of HPSC applications including integrated terrestrial model development, parallel profiling and its application from watersheds to the continent; massively parallel data assimilation using physics-based models and ensemble methods; quasi-operational terrestrial water and energy monitoring; and convection permitting climate simulations over Europe. The success stories stress the need for a formalized education of students in the application of HPSC technologies in future.

  2. Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces

    Brown, W. Michael; Wang, Peng; Plimpton, Steven J.; Tharrington, Arnold N.

    2011-01-01

    The use of accelerators such as general-purpose graphics processing units (GPGPUs) have become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines - (1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, (2) minimizing the amount of code that must be ported for efficient acceleration, (3) utilizing the available processing power from both many-core CPUs and accelerators, and (4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.

  3. Roadmap to the SRS computing architecture

    Johnson, A.

    1994-07-05

    This document outlines the major steps that must be taken by the Savannah River Site (SRS) to migrate the SRS information technology (IT) environment to the new architecture described in the Savannah River Site Computing Architecture. This document proposes an IT environment that is {open_quotes}...standards-based, data-driven, and workstation-oriented, with larger systems being utilized for the delivery of needed information to users in a client-server relationship.{close_quotes} Achieving this vision will require many substantial changes in the computing applications, systems, and supporting infrastructure at the site. This document consists of a set of roadmaps which provide explanations of the necessary changes for IT at the site and describes the milestones that must be completed to finish the migration.

  4. High Performance Computing Facility Operational Assessment, CY 2011 Oak Ridge Leadership Computing Facility

    Baker, Ann E [ORNL; Barker, Ashley D [ORNL; Bland, Arthur S Buddy [ORNL; Boudwin, Kathlyn J. [ORNL; Hack, James J [ORNL; Kendall, Ricky A [ORNL; Messer, Bronson [ORNL; Rogers, James H [ORNL; Shipman, Galen M [ORNL; Wells, Jack C [ORNL; White, Julia C [ORNL; Hudson, Douglas L [ORNL

    2012-02-01

    Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) continues to deliver the most powerful resources in the U.S. for open science. At 2.33 petaflops peak performance, the Cray XT Jaguar delivered more than 1.4 billion core hours in calendar year (CY) 2011 to researchers around the world for computational simulations relevant to national and energy security; advancing the frontiers of knowledge in physical sciences and areas of biological, medical, environmental, and computer sciences; and providing world-class research facilities for the nation's science enterprise. Users reported more than 670 publications this year arising from their use of OLCF resources. Of these we report the 300 in this review that are consistent with guidance provided. Scientific achievements by OLCF users cut across all range scales from atomic to molecular to large-scale structures. At the atomic scale, researchers discovered that the anomalously long half-life of Carbon-14 can be explained by calculating, for the first time, the very complex three-body interactions between all the neutrons and protons in the nucleus. At the molecular scale, researchers combined experimental results from LBL's light source and simulations on Jaguar to discover how DNA replication continues past a damaged site so a mutation can be repaired later. Other researchers combined experimental results from ORNL's Spallation Neutron Source and simulations on Jaguar to reveal the molecular structure of ligno-cellulosic material used in bioethanol production. This year, Jaguar has been used to do billion-cell CFD calculations to develop shock wave compression turbo machinery as a means to meet DOE goals for reducing carbon sequestration costs. General Electric used Jaguar to calculate the unsteady flow through turbo machinery to learn what efficiencies the traditional steady flow assumption is hiding from designers. Even a 1% improvement in turbine design can save the nation

  5. Computer Architecture for Energy Efficient SFQ

    2014-08-27

    IBM Corporation (T.J. Watson Research Laboratory) 1101 Kitchawan Road Yorktown Heights, NY 10598 -0000 2 ABSTRACT Number of Papers published in peer...accomplished during this ARO-sponsored project at IBM Research to identify and model an energy efficient SFQ-based computer architecture. The... IBM Windsor Blue (WB), illustrated schematically in Figure 2. The basic building block of WB is a "tile" comprised of a 64-bit arithmetic logic unit

  6. High Performance Computing Facility Operational Assessment, FY 2011 Oak Ridge Leadership Computing Facility

    Baker, Ann E [ORNL; Bland, Arthur S Buddy [ORNL; Hack, James J [ORNL; Barker, Ashley D [ORNL; Boudwin, Kathlyn J. [ORNL; Kendall, Ricky A [ORNL; Messer, Bronson [ORNL; Rogers, James H [ORNL; Shipman, Galen M [ORNL; Wells, Jack C [ORNL; White, Julia C [ORNL

    2011-08-01

    Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) continues to deliver the most powerful resources in the U.S. for open science. At 2.33 petaflops peak performance, the Cray XT Jaguar delivered more than 1.5 billion core hours in calendar year (CY) 2010 to researchers around the world for computational simulations relevant to national and energy security; advancing the frontiers of knowledge in physical sciences and areas of biological, medical, environmental, and computer sciences; and providing world-class research facilities for the nation's science enterprise. Scientific achievements by OLCF users range from collaboration with university experimentalists to produce a working supercapacitor that uses atom-thick sheets of carbon materials to finely determining the resolution requirements for simulations of coal gasifiers and their components, thus laying the foundation for development of commercial-scale gasifiers. OLCF users are pushing the boundaries with software applications sustaining more than one petaflop of performance in the quest to illuminate the fundamental nature of electronic devices. Other teams of researchers are working to resolve predictive capabilities of climate models, to refine and validate genome sequencing, and to explore the most fundamental materials in nature - quarks and gluons - and their unique properties. Details of these scientific endeavors - not possible without access to leadership-class computing resources - are detailed in Section 4 of this report and in the INCITE in Review. Effective operations of the OLCF play a key role in the scientific missions and accomplishments of its users. This Operational Assessment Report (OAR) will delineate the policies, procedures, and innovations implemented by the OLCF to continue delivering a petaflop-scale resource for cutting-edge research. The 2010 operational assessment of the OLCF yielded recommendations that have been addressed (Reference Section 1) and

  7. Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

    Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.

    2015-03-01

    We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.

  8. Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

    Hadjidoukas, P.E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.

    2015-01-01

    We present Π4U, 1 an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow

  9. Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

    Hadjidoukas, P.E.; Angelikopoulos, P. [Computational Science and Engineering Laboratory, ETH Zürich, CH-8092 (Switzerland); Papadimitriou, C. [Department of Mechanical Engineering, University of Thessaly, GR-38334 Volos (Greece); Koumoutsakos, P., E-mail: petros@ethz.ch [Computational Science and Engineering Laboratory, ETH Zürich, CH-8092 (Switzerland)

    2015-03-01

    We present Π4U,{sup 1} an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.

  10. Application of High Performance Computing to Earthquake Hazard and Disaster Estimation in Urban Area

    Muneo Hori

    2018-02-01

    Full Text Available Integrated earthquake simulation (IES is a seamless simulation of analyzing all processes of earthquake hazard and disaster. There are two difficulties in carrying out IES, namely, the requirement of large-scale computation and the requirement of numerous analysis models for structures in an urban area, and they are solved by taking advantage of high performance computing (HPC and by developing a system of automated model construction. HPC is a key element in developing IES, as it needs to analyze wave propagation and amplification processes in an underground structure; a model of high fidelity for the underground structure exceeds a degree-of-freedom larger than 100 billion. Examples of IES for Tokyo Metropolis are presented; the numerical computation is made by using K computer, the supercomputer of Japan. The estimation of earthquake hazard and disaster for a given earthquake scenario is made by the ground motion simulation and the urban area seismic response simulation, respectively, for the target area of 10,000 m × 10,000 m.

  11. A Framework for Debugging Geoscience Projects in a High Performance Computing Environment

    Baxter, C.; Matott, L.

    2012-12-01

    High performance computing (HPC) infrastructure has become ubiquitous in today's world with the emergence of commercial cloud computing and academic supercomputing centers. Teams of geoscientists, hydrologists and engineers can take advantage of this infrastructure to undertake large research projects - for example, linking one or more site-specific environmental models with soft computing algorithms, such as heuristic global search procedures, to perform parameter estimation and predictive uncertainty analysis, and/or design least-cost remediation systems. However, the size, complexity and distributed nature of these projects can make identifying failures in the associated numerical experiments using conventional ad-hoc approaches both time- consuming and ineffective. To address these problems a multi-tiered debugging framework has been developed. The framework allows for quickly isolating and remedying a number of potential experimental failures, including: failures in the HPC scheduler; bugs in the soft computing code; bugs in the modeling code; and permissions and access control errors. The utility of the framework is demonstrated via application to a series of over 200,000 numerical experiments involving a suite of 5 heuristic global search algorithms and 15 mathematical test functions serving as cheap analogues for the simulation-based optimization of pump-and-treat subsurface remediation systems.

  12. Enabling the ATLAS Experiment at the LHC for High Performance Computing

    AUTHOR|(CDS)2091107; Ereditato, Antonio

    In this thesis, I studied the feasibility of running computer data analysis programs from the Worldwide LHC Computing Grid, in particular large-scale simulations of the ATLAS experiment at the CERN LHC, on current general purpose High Performance Computing (HPC) systems. An approach for integrating HPC systems into the Grid is proposed, which has been implemented and tested on the „Todi” HPC machine at the Swiss National Supercomputing Centre (CSCS). Over the course of the test, more than 500000 CPU-hours of processing time have been provided to ATLAS, which is roughly equivalent to the combined computing power of the two ATLAS clusters at the University of Bern. This showed that current HPC systems can be used to efficiently run large-scale simulations of the ATLAS detector and of the detected physics processes. As a first conclusion of my work, one can argue that, in perspective, running large-scale tasks on a few large machines might be more cost-effective than running on relatively small dedicated com...

  13. High-performance computational fluid dynamics: a custom-code approach

    Fannon, James; Náraigh, Lennon Ó; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain

    2016-01-01

    We introduce a modified and simplified version of the pre-existing fully parallelized three-dimensional Navier–Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFDs) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFDs, while also providing insight for those interested in more general aspects of high-performance computing. (paper)

  14. High-performance computational fluid dynamics: a custom-code approach

    Fannon, James; Loiseau, Jean-Christophe; Valluri, Prashant; Bethune, Iain; Náraigh, Lennon Ó.

    2016-07-01

    We introduce a modified and simplified version of the pre-existing fully parallelized three-dimensional Navier-Stokes flow solver known as TPLS. We demonstrate how the simplified version can be used as a pedagogical tool for the study of computational fluid dynamics (CFDs) and parallel computing. TPLS is at its heart a two-phase flow solver, and uses calls to a range of external libraries to accelerate its performance. However, in the present context we narrow the focus of the study to basic hydrodynamics and parallel computing techniques, and the code is therefore simplified and modified to simulate pressure-driven single-phase flow in a channel, using only relatively simple Fortran 90 code with MPI parallelization, but no calls to any other external libraries. The modified code is analysed in order to both validate its accuracy and investigate its scalability up to 1000 CPU cores. Simulations are performed for several benchmark cases in pressure-driven channel flow, including a turbulent simulation, wherein the turbulence is incorporated via the large-eddy simulation technique. The work may be of use to advanced undergraduate and graduate students as an introductory study in CFDs, while also providing insight for those interested in more general aspects of high-performance computing.

  15. High performance parallel computing of flows in complex geometries: II. Applications

    Gourdain, N; Gicquel, L; Staffelbach, G; Vermorel, O; Duchaine, F; Boussuge, J-F; Poinsot, T

    2009-01-01

    Present regulations in terms of pollutant emissions, noise and economical constraints, require new approaches and designs in the fields of energy supply and transportation. It is now well established that the next breakthrough will come from a better understanding of unsteady flow effects and by considering the entire system and not only isolated components. However, these aspects are still not well taken into account by the numerical approaches or understood whatever the design stage considered. The main challenge is essentially due to the computational requirements inferred by such complex systems if it is to be simulated by use of supercomputers. This paper shows how new challenges can be addressed by using parallel computing platforms for distinct elements of a more complex systems as encountered in aeronautical applications. Based on numerical simulations performed with modern aerodynamic and reactive flow solvers, this work underlines the interest of high-performance computing for solving flow in complex industrial configurations such as aircrafts, combustion chambers and turbomachines. Performance indicators related to parallel computing efficiency are presented, showing that establishing fair criterions is a difficult task for complex industrial applications. Examples of numerical simulations performed in industrial systems are also described with a particular interest for the computational time and the potential design improvements obtained with high-fidelity and multi-physics computing methods. These simulations use either unsteady Reynolds-averaged Navier-Stokes methods or large eddy simulation and deal with turbulent unsteady flows, such as coupled flow phenomena (thermo-acoustic instabilities, buffet, etc). Some examples of the difficulties with grid generation and data analysis are also presented when dealing with these complex industrial applications.

  16. ABINIT: Plane-Wave-Based Density-Functional Theory on High Performance Computers

    Torrent, Marc

    2014-03-01

    For several years, a continuous effort has been produced to adapt electronic structure codes based on Density-Functional Theory to the future computing architectures. Among these codes, ABINIT is based on a plane-wave description of the wave functions which allows to treat systems of any kind. Porting such a code on petascale architectures pose difficulties related to the many-body nature of the DFT equations. To improve the performances of ABINIT - especially for what concerns standard LDA/GGA ground-state and response-function calculations - several strategies have been followed: A full multi-level parallelisation MPI scheme has been implemented, exploiting all possible levels and distributing both computation and memory. It allows to increase the number of distributed processes and could not be achieved without a strong restructuring of the code. The core algorithm used to solve the eigen problem (``Locally Optimal Blocked Congugate Gradient''), a Blocked-Davidson-like algorithm, is based on a distribution of processes combining plane-waves and bands. In addition to the distributed memory parallelization, a full hybrid scheme has been implemented, using standard shared-memory directives (openMP/openACC) or porting some comsuming code sections to Graphics Processing Units (GPU). As no simple performance model exists, the complexity of use has been increased; the code efficiency strongly depends on the distribution of processes among the numerous levels. ABINIT is able to predict the performances of several process distributions and automatically choose the most favourable one. On the other hand, a big effort has been carried out to analyse the performances of the code on petascale architectures, showing which sections of codes have to be improved; they all are related to Matrix Algebra (diagonalisation, orthogonalisation). The different strategies employed to improve the code scalability will be described. They are based on an exploration of new diagonalization

  17. Using the Eclipse Parallel Tools Platform to Assist Earth Science Model Development and Optimization on High Performance Computers

    Alameda, J. C.

    2011-12-01

    Development and optimization of computational science models, particularly on high performance computers, and with the advent of ubiquitous multicore processor systems, practically on every system, has been accomplished with basic software tools, typically, command-line based compilers, debuggers, performance tools that have not changed substantially from the days of serial and early vector computers. However, model complexity, including the complexity added by modern message passing libraries such as MPI, and the need for hybrid code models (such as openMP and MPI) to be able to take full advantage of high performance computers with an increasing core count per shared memory node, has made development and optimization of such codes an increasingly arduous task. Additional architectural developments, such as many-core processors, only complicate the situation further. In this paper, we describe how our NSF-funded project, "SI2-SSI: A Productive and Accessible Development Workbench for HPC Applications Using the Eclipse Parallel Tools Platform" (WHPC) seeks to improve the Eclipse Parallel Tools Platform, an environment designed to support scientific code development targeted at a diverse set of high performance computing systems. Our WHPC project to improve Eclipse PTP takes an application-centric view to improve PTP. We are using a set of scientific applications, each with a variety of challenges, and using PTP to drive further improvements to both the scientific application, as well as to understand shortcomings in Eclipse PTP from an application developer perspective, to drive our list of improvements we seek to make. We are also partnering with performance tool providers, to drive higher quality performance tool integration. We have partnered with the Cactus group at Louisiana State University to improve Eclipse's ability to work with computational frameworks and extremely complex build systems, as well as to develop educational materials to incorporate into

  18. Low-cost, high-performance and efficiency computational photometer design

    Siewert, Sam B.; Shihadeh, Jeries; Myers, Randall; Khandhar, Jay; Ivanov, Vitaly

    2014-05-01

    Researchers at the University of Alaska Anchorage and University of Colorado Boulder have built a low cost high performance and efficiency drop-in-place Computational Photometer (CP) to test in field applications ranging from port security and safety monitoring to environmental compliance monitoring and surveying. The CP integrates off-the-shelf visible spectrum cameras with near to long wavelength infrared detectors and high resolution digital snapshots in a single device. The proof of concept combines three or more detectors into a single multichannel imaging system that can time correlate read-out, capture, and image process all of the channels concurrently with high performance and energy efficiency. The dual-channel continuous read-out is combined with a third high definition digital snapshot capability and has been designed using an FPGA (Field Programmable Gate Array) to capture, decimate, down-convert, re-encode, and transform images from two standard definition CCD (Charge Coupled Device) cameras at 30Hz. The continuous stereo vision can be time correlated to megapixel high definition snapshots. This proof of concept has been fabricated as a fourlayer PCB (Printed Circuit Board) suitable for use in education and research for low cost high efficiency field monitoring applications that need multispectral and three dimensional imaging capabilities. Initial testing is in progress and includes field testing in ports, potential test flights in un-manned aerial systems, and future planned missions to image harsh environments in the arctic including volcanic plumes, ice formation, and arctic marine life.

  19. A C++11 implementation of arbitrary-rank tensors for high-performance computing

    Aragón, Alejandro M.

    2014-11-01

    This article discusses an efficient implementation of tensors of arbitrary rank by using some of the idioms introduced by the recently published C++ ISO Standard (C++11). With the aims at providing a basic building block for high-performance computing, a single Array class template is carefully crafted, from which vectors, matrices, and even higher-order tensors can be created. An expression template facility is also built around the array class template to provide convenient mathematical syntax. As a result, by using templates, an extra high-level layer is added to the C++ language when dealing with algebraic objects and their operations, without compromising performance. The implementation is tested running on both CPU and GPU.

  20. Additive Manufacturing and High-Performance Computing: a Disruptive Latent Technology

    Goodwin, Bruce

    2015-03-01

    This presentation will discuss the relationship between recent advances in Additive Manufacturing (AM) technology, High-Performance Computing (HPC) simulation and design capabilities, and related advances in Uncertainty Quantification (UQ), and then examines their impacts upon national and international security. The presentation surveys how AM accelerates the fabrication process, while HPC combined with UQ provides a fast track for the engineering design cycle. The combination of AM and HPC/UQ almost eliminates the engineering design and prototype iterative cycle, thereby dramatically reducing cost of production and time-to-market. These methods thereby present significant benefits for US national interests, both civilian and military, in an age of austerity. Finally, considering cyber security issues and the advent of the ``cloud,'' these disruptive, currently latent technologies may well enable proliferation and so challenge both nuclear and non-nuclear aspects of international security.

  1. 7th International Workshop on Parallel Tools for High Performance Computing

    Gracia, José; Nagel, Wolfgang; Resch, Michael

    2014-01-01

    Current advances in High Performance Computing (HPC) increasingly impact efficient software development workflows. Programmers for HPC applications need to consider trends such as increased core counts, multiple levels of parallelism, reduced memory per core, and I/O system challenges in order to derive well performing and highly scalable codes. At the same time, the increasing complexity adds further sources of program defects. While novel programming paradigms and advanced system libraries provide solutions for some of these challenges, appropriate supporting tools are indispensable. Such tools aid application developers in debugging, performance analysis, or code optimization and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 7th International Parallel Tools Workshop, held in Dresden, Germany, September 3-4, 2013.  

  2. Exploring Infiniband Hardware Virtualization in OpenNebula towards Efficient High-Performance Computing

    Pais Pitta de Lacerda Ruivo, Tiago [IIT, Chicago; Bernabeu Altayo, Gerard [Fermilab; Garzoglio, Gabriele [Fermilab; Timm, Steven [Fermilab; Kim, Hyun-Woo [Fermilab; Noh, Seo-Young [KISTI, Daejeon; Raicu, Ioan [IIT, Chicago

    2014-11-11

    has been widely accepted that software virtualization has a big negative impact on high-performance computing (HPC) application performance. This work explores the potential use of Infiniband hardware virtualization in an OpenNebula cloud towards the efficient support of MPI-based workloads. We have implemented, deployed, and tested an Infiniband network on the FermiCloud private Infrastructure-as-a-Service (IaaS) cloud. To avoid software virtualization towards minimizing the virtualization overhead, we employed a technique called Single Root Input/Output Virtualization (SRIOV). Our solution spanned modifications to the Linux’s Hypervisor as well as the OpenNebula manager. We evaluated the performance of the hardware virtualization on up to 56 virtual machines connected by up to 8 DDR Infiniband network links, with micro-benchmarks (latency and bandwidth) as well as w a MPI-intensive application (the HPL Linpack benchmark).

  3. Compact, open-architecture computed radiography system

    Huang, H.K.; Lim, A.; Kangarloo, H.; Eldredge, S.; Loloyan, M.; Chuang, K.S.

    1990-01-01

    Computed radiography (CR) was introduced in 1982, and its basic system design has not changed. Current CR systems have certain limitations: spatial resolution and signal-to-noise ratios are lower than those of screen-film systems, they are complicated and expensive to build, and they have a closed architecture. The authors of this paper designed and implemented a simpler, lower-cost, compact, open-architecture CR system to overcome some of these limitations. The open-architecture system is a manual-load-single-plate reader that can fit on a desk top. Phosphor images are stored in a local disk and can be sent to any other computer through standard interfaces. Any manufacturer's plate can be read with a scanning time of 90 second for a 35 x 43-cm plate. The standard pixel size is 174 μm and can be adjusted for higher spatial resolution. The data resolution is 12 bits/pixel over an x-ray exposure range of 0.01-100 mR

  4. Cloud CPFP: a shotgun proteomics data analysis pipeline using cloud and high performance computing.

    Trudgian, David C; Mirzaei, Hamid

    2012-12-07

    We have extended the functionality of the Central Proteomics Facilities Pipeline (CPFP) to allow use of remote cloud and high performance computing (HPC) resources for shotgun proteomics data processing. CPFP has been modified to include modular local and remote scheduling for data processing jobs. The pipeline can now be run on a single PC or server, a local cluster, a remote HPC cluster, and/or the Amazon Web Services (AWS) cloud. We provide public images that allow easy deployment of CPFP in its entirety in the AWS cloud. This significantly reduces the effort necessary to use the software, and allows proteomics laboratories to pay for compute time ad hoc, rather than obtaining and maintaining expensive local server clusters. Alternatively the Amazon cloud can be used to increase the throughput of a local installation of CPFP as necessary. We demonstrate that cloud CPFP allows users to process data at higher speed than local installations but with similar cost and lower staff requirements. In addition to the computational improvements, the web interface to CPFP is simplified, and other functionalities are enhanced. The software is under active development at two leading institutions and continues to be released under an open-source license at http://cpfp.sourceforge.net.

  5. High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations

    Pieper, Andreas [Ernst-Moritz-Arndt-Universität Greifswald (Germany); Kreutzer, Moritz [Friedrich-Alexander-Universität Erlangen-Nürnberg (Germany); Alvermann, Andreas, E-mail: alvermann@physik.uni-greifswald.de [Ernst-Moritz-Arndt-Universität Greifswald (Germany); Galgon, Martin [Bergische Universität Wuppertal (Germany); Fehske, Holger [Ernst-Moritz-Arndt-Universität Greifswald (Germany); Hager, Georg [Friedrich-Alexander-Universität Erlangen-Nürnberg (Germany); Lang, Bruno [Bergische Universität Wuppertal (Germany); Wellein, Gerhard [Friedrich-Alexander-Universität Erlangen-Nürnberg (Germany)

    2016-11-15

    We study Chebyshev filter diagonalization as a tool for the computation of many interior eigenvalues of very large sparse symmetric matrices. In this technique the subspace projection onto the target space of wanted eigenvectors is approximated with filter polynomials obtained from Chebyshev expansions of window functions. After the discussion of the conceptual foundations of Chebyshev filter diagonalization we analyze the impact of the choice of the damping kernel, search space size, and filter polynomial degree on the computational accuracy and effort, before we describe the necessary steps towards a parallel high-performance implementation. Because Chebyshev filter diagonalization avoids the need for matrix inversion it can deal with matrices and problem sizes that are presently not accessible with rational function methods based on direct or iterative linear solvers. To demonstrate the potential of Chebyshev filter diagonalization for large-scale problems of this kind we include as an example the computation of the 10{sup 2} innermost eigenpairs of a topological insulator matrix with dimension 10{sup 9} derived from quantum physics applications.

  6. Analysis and Modeling of Social In uence in High Performance Computing Workloads

    Zheng, Shuai

    2011-06-01

    High Performance Computing (HPC) is becoming a common tool in many research areas. Social influence (e.g., project collaboration) among increasing users of HPC systems creates bursty behavior in underlying workloads. This bursty behavior is increasingly common with the advent of grid computing and cloud computing. Mining the user bursty behavior is important for HPC workloads prediction and scheduling, which has direct impact on overall HPC computing performance. A representative work in this area is the Mixed User Group Model (MUGM), which clusters users according to the resource demand features of their submissions, such as duration time and parallelism. However, MUGM has some difficulties when implemented in real-world system. First, representing user behaviors by the features of their resource demand is usually difficult. Second, these features are not always available. Third, measuring the similarities among users is not a well-defined problem. In this work, we propose a Social Influence Model (SIM) to identify, analyze, and quantify the level of social influence across HPC users. The advantage of the SIM model is that it finds HPC communities by analyzing user job submission time, thereby avoiding the difficulties of MUGM. An offline algorithm and a fast-converging, computationally-efficient online learning algorithm for identifying social groups are proposed. Both offline and online algorithms are applied on several HPC and grid workloads, including Grid 5000, EGEE 2005 and 2007, and KAUST Supercomputing Lab (KSL) BGP data. From the experimental results, we show the existence of a social graph, which is characterized by a pattern of dominant users and followers. In order to evaluate the effectiveness of identified user groups, we show the pattern discovered by the offline algorithm follows a power-law distribution, which is consistent with those observed in mainstream social networks. We finally conclude the thesis and discuss future directions of our work.

  7. Fast semivariogram computation using FPGA architectures

    Lagadapati, Yamuna; Shirvaikar, Mukul; Dong, Xuanliang

    2015-02-01

    The semivariogram is a statistical measure of the spatial distribution of data and is based on Markov Random Fields (MRFs). Semivariogram analysis is a computationally intensive algorithm that has typically seen applications in the geosciences and remote sensing areas. Recently, applications in the area of medical imaging have been investigated, resulting in the need for efficient real time implementation of the algorithm. The semivariogram is a plot of semivariances for different lag distances between pixels. A semi-variance, γ(h), is defined as the half of the expected squared differences of pixel values between any two data locations with a lag distance of h. Due to the need to examine each pair of pixels in the image or sub-image being processed, the base algorithm complexity for an image window with n pixels is O(n2). Field Programmable Gate Arrays (FPGAs) are an attractive solution for such demanding applications due to their parallel processing capability. FPGAs also tend to operate at relatively modest clock rates measured in a few hundreds of megahertz, but they can perform tens of thousands of calculations per clock cycle while operating in the low range of power. This paper presents a technique for the fast computation of the semivariogram using two custom FPGA architectures. The design consists of several modules dedicated to the constituent computational tasks. A modular architecture approach is chosen to allow for replication of processing units. This allows for high throughput due to concurrent processing of pixel pairs. The current implementation is focused on isotropic semivariogram computations only. Anisotropic semivariogram implementation is anticipated to be an extension of the current architecture, ostensibly based on refinements to the current modules. The algorithm is benchmarked using VHDL on a Xilinx XUPV5-LX110T development Kit, which utilizes the Virtex5 FPGA. Medical image data from MRI scans are utilized for the experiments

  8. High performance in software development

    CERN. Geneva; Haapio, Petri; Liukkonen, Juha-Matti

    2015-01-01

    What are the ingredients of high-performing software? Software development, especially for large high-performance systems, is one the most complex tasks mankind has ever tried. Technological change leads to huge opportunities but challenges our old ways of working. Processing large data sets, possibly in real time or with other tight computational constraints, requires an efficient solution architecture. Efficiency requirements span from the distributed storage and large-scale organization of computation and data onto the lowest level of processor and data bus behavior. Integrating performance behavior over these levels is especially important when the computation is resource-bounded, as it is in numerics: physical simulation, machine learning, estimation of statistical models, etc. For example, memory locality and utilization of vector processing are essential for harnessing the computing power of modern processor architectures due to the deep memory hierarchies of modern general-purpose computers. As a r...

  9. Performance management of high performance computing for medical image processing in Amazon Web Services

    Bao, Shunxing; Damon, Stephen M.; Landman, Bennett A.; Gokhale, Aniruddha

    2016-03-01

    Adopting high performance cloud computing for medical image processing is a popular trend given the pressing needs of large studies. Amazon Web Services (AWS) provide reliable, on-demand, and inexpensive cloud computing services. Our research objective is to implement an affordable, scalable and easy-to-use AWS framework for the Java Image Science Toolkit (JIST). JIST is a plugin for Medical- Image Processing, Analysis, and Visualization (MIPAV) that provides a graphical pipeline implementation allowing users to quickly test and develop pipelines. JIST is DRMAA-compliant allowing it to run on portable batch system grids. However, as new processing methods are implemented and developed, memory may often be a bottleneck for not only lab computers, but also possibly some local grids. Integrating JIST with the AWS cloud alleviates these possible restrictions and does not require users to have deep knowledge of programming in Java. Workflow definition/management and cloud configurations are two key challenges in this research. Using a simple unified control panel, users have the ability to set the numbers of nodes and select from a variety of pre-configured AWS EC2 nodes with different numbers of processors and memory storage. Intuitively, we configured Amazon S3 storage to be mounted by pay-for- use Amazon EC2 instances. Hence, S3 storage is recognized as a shared cloud resource. The Amazon EC2 instances provide pre-installs of all necessary packages to run JIST. This work presents an implementation that facilitates the integration of JIST with AWS. We describe the theoretical cost/benefit formulae to decide between local serial execution versus cloud computing and apply this analysis to an empirical diffusion tensor imaging pipeline.

  10. Performance Management of High Performance Computing for Medical Image Processing in Amazon Web Services.

    Bao, Shunxing; Damon, Stephen M; Landman, Bennett A; Gokhale, Aniruddha

    2016-02-27

    Adopting high performance cloud computing for medical image processing is a popular trend given the pressing needs of large studies. Amazon Web Services (AWS) provide reliable, on-demand, and inexpensive cloud computing services. Our research objective is to implement an affordable, scalable and easy-to-use AWS framework for the Java Image Science Toolkit (JIST). JIST is a plugin for Medical-Image Processing, Analysis, and Visualization (MIPAV) that provides a graphical pipeline implementation allowing users to quickly test and develop pipelines. JIST is DRMAA-compliant allowing it to run on portable batch system grids. However, as new processing methods are implemented and developed, memory may often be a bottleneck for not only lab computers, but also possibly some local grids. Integrating JIST with the AWS cloud alleviates these possible restrictions and does not require users to have deep knowledge of programming in Java. Workflow definition/management and cloud configurations are two key challenges in this research. Using a simple unified control panel, users have the ability to set the numbers of nodes and select from a variety of pre-configured AWS EC2 nodes with different numbers of processors and memory storage. Intuitively, we configured Amazon S3 storage to be mounted by pay-for-use Amazon EC2 instances. Hence, S3 storage is recognized as a shared cloud resource. The Amazon EC2 instances provide pre-installs of all necessary packages to run JIST. This work presents an implementation that facilitates the integration of JIST with AWS. We describe the theoretical cost/benefit formulae to decide between local serial execution versus cloud computing and apply this analysis to an empirical diffusion tensor imaging pipeline.

  11. Towards Portable Large-Scale Image Processing with High-Performance Computing.

    Huo, Yuankai; Blaber, Justin; Damon, Stephen M; Boyd, Brian D; Bao, Shunxing; Parvathaneni, Prasanna; Noguera, Camilo Bermudez; Chaganti, Shikha; Nath, Vishwesh; Greer, Jasmine M; Lyu, Ilwoo; French, William R; Newton, Allen T; Rogers, Baxter P; Landman, Bennett A

    2018-05-03

    High-throughput, large-scale medical image computing demands tight integration of high-performance computing (HPC) infrastructure for data storage, job distribution, and image processing. The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has constructed a large-scale image storage and processing infrastructure that is composed of (1) a large-scale image database using the eXtensible Neuroimaging Archive Toolkit (XNAT), (2) a content-aware job scheduling platform using the Distributed Automation for XNAT pipeline automation tool (DAX), and (3) a wide variety of encapsulated image processing pipelines called "spiders." The VUIIS CCI medical image data storage and processing infrastructure have housed and processed nearly half-million medical image volumes with Vanderbilt Advanced Computing Center for Research and Education (ACCRE), which is the HPC facility at the Vanderbilt University. The initial deployment was natively deployed (i.e., direct installations on a bare-metal server) within the ACCRE hardware and software environments, which lead to issues of portability and sustainability. First, it could be laborious to deploy the entire VUIIS CCI medical image data storage and processing infrastructure to another HPC center with varying hardware infrastructure, library availability, and software permission policies. Second, the spiders were not developed in an isolated manner, which has led to software dependency issues during system upgrades or remote software installation. To address such issues, herein, we describe recent innovations using containerization techniques with XNAT/DAX which are used to isolate the VUIIS CCI medical image data storage and processing infrastructure from the underlying hardware and software environments. The newly presented XNAT/DAX solution has the following new features: (1) multi-level portability from system level to the application level, (2) flexible and dynamic software

  12. Exploring Subpixel Learning Algorithms for Estimating Global Land Cover Fractions from Satellite Data Using High Performance Computing

    Uttam Kumar

    2017-10-01

    Full Text Available Land cover (LC refers to the physical and biological cover present over the Earth’s surface in terms of the natural environment such as vegetation, water, bare soil, etc. Most LC features occur at finer spatial scales compared to the resolution of primary remote sensing satellites. Therefore, observed data are a mixture of spectral signatures of two or more LC features resulting in mixed pixels. One solution to the mixed pixel problem is the use of subpixel learning algorithms to disintegrate the pixel spectrum into its constituent spectra. Despite the popularity and existing research conducted on the topic, the most appropriate approach is still under debate. As an attempt to address this question, we compared the performance of several subpixel learning algorithms based on least squares, sparse regression, signal–subspace and geometrical methods. Analysis of the results obtained through computer-simulated and Landsat data indicated that fully constrained least squares (FCLS outperformed the other techniques. Further, FCLS was used to unmix global Web-Enabled Landsat Data to obtain abundances of substrate (S, vegetation (V and dark object (D classes. Due to the sheer nature of data and computational needs, we leveraged the NASA Earth Exchange (NEX high-performance computing architecture to optimize and scale our algorithm for large-scale processing. Subsequently, the S-V-D abundance maps were characterized into four classes, namely forest, farmland, water and urban areas (in conjunction with nighttime lights data over California, USA using a random forest classifier. Validation of these LC maps with the National Land Cover Database 2011 products and North American Forest Dynamics static forest map shows a 6% improvement in unmixing-based classification relative to per-pixel classification. As such, abundance maps continue to offer a useful alternative to high-spatial-resolution classified maps for forest inventory analysis, multi

  13. Improving the Eco-Efficiency of High Performance Computing Clusters Using EECluster

    Alberto Cocaña-Fernández

    2016-03-01

    Full Text Available As data and supercomputing centres increase their performance to improve service quality and target more ambitious challenges every day, their carbon footprint also continues to grow, and has already reached the magnitude of the aviation industry. Also, high power consumptions are building up to a remarkable bottleneck for the expansion of these infrastructures in economic terms due to the unavailability of sufficient energy sources. A substantial part of the problem is caused by current energy consumptions of High Performance Computing (HPC clusters. To alleviate this situation, we present in this work EECluster, a tool that integrates with multiple open-source Resource Management Systems to significantly reduce the carbon footprint of clusters by improving their energy efficiency. EECluster implements a dynamic power management mechanism based on Computational Intelligence techniques by learning a set of rules through multi-criteria evolutionary algorithms. This approach enables cluster operators to find the optimal balance between a reduction in the cluster energy consumptions, service quality, and number of reconfigurations. Experimental studies using both synthetic and actual workloads from a real world cluster support the adoption of this tool to reduce the carbon footprint of HPC clusters.

  14. Accelerated Synchrotron X-ray Diffraction Data Analysis on a Heterogeneous High Performance Computing System

    Qin, J; Bauer, M A, E-mail: qin.jinhui@gmail.com, E-mail: bauer@uwo.ca [Computer Science Department, University of Western Ontario, London, ON N6A 5B7 (Canada)

    2010-11-01

    The analysis of synchrotron X-ray Diffraction (XRD) data has been used by scientists and engineers to understand and predict properties of materials. However, the large volume of XRD image data and the intensive computations involved in the data analysis makes it hard for researchers to quickly reach any conclusions about the images from an experiment when using conventional XRD data analysis software. Synchrotron time is valuable and delays in XRD data analysis can impact decisions about subsequent experiments or about materials that they are investigating. In order to improve the data analysis performance, ideally to achieve near real time data analysis during an XRD experiment, we designed and implemented software for accelerated XRD data analysis. The software has been developed for a heterogeneous high performance computing (HPC) system, comprised of IBM PowerXCell 8i processors and Intel quad-core Xeon processors. This paper describes the software and reports on the improved performance. The results indicate that it is possible for XRD data to be analyzed at the rate it is being produced.

  15. Accelerated Synchrotron X-ray Diffraction Data Analysis on a Heterogeneous High Performance Computing System

    Qin, J; Bauer, M A

    2010-01-01

    The analysis of synchrotron X-ray Diffraction (XRD) data has been used by scientists and engineers to understand and predict properties of materials. However, the large volume of XRD image data and the intensive computations involved in the data analysis makes it hard for researchers to quickly reach any conclusions about the images from an experiment when using conventional XRD data analysis software. Synchrotron time is valuable and delays in XRD data analysis can impact decisions about subsequent experiments or about materials that they are investigating. In order to improve the data analysis performance, ideally to achieve near real time data analysis during an XRD experiment, we designed and implemented software for accelerated XRD data analysis. The software has been developed for a heterogeneous high performance computing (HPC) system, comprised of IBM PowerXCell 8i processors and Intel quad-core Xeon processors. This paper describes the software and reports on the improved performance. The results indicate that it is possible for XRD data to be analyzed at the rate it is being produced.

  16. Scalable domain decomposition solvers for stochastic PDEs in high performance computing

    Desai, Ajit; Pettit, Chris; Poirel, Dominique; Sarkar, Abhijit

    2017-01-01

    Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.

  17. High performance optical encryption based on computational ghost imaging with QR code and compressive sensing technique

    Zhao, Shengmei; Wang, Le; Liang, Wenqiang; Cheng, Weiwen; Gong, Longyan

    2015-10-01

    In this paper, we propose a high performance optical encryption (OE) scheme based on computational ghost imaging (GI) with QR code and compressive sensing (CS) technique, named QR-CGI-OE scheme. N random phase screens, generated by Alice, is a secret key and be shared with its authorized user, Bob. The information is first encoded by Alice with QR code, and the QR-coded image is then encrypted with the aid of computational ghost imaging optical system. Here, measurement results from the GI optical system's bucket detector are the encrypted information and be transmitted to Bob. With the key, Bob decrypts the encrypted information to obtain the QR-coded image with GI and CS techniques, and further recovers the information by QR decoding. The experimental and numerical simulated results show that the authorized users can recover completely the original image, whereas the eavesdroppers can not acquire any information about the image even the eavesdropping ratio (ER) is up to 60% at the given measurement times. For the proposed scheme, the number of bits sent from Alice to Bob are reduced considerably and the robustness is enhanced significantly. Meantime, the measurement times in GI system is reduced and the quality of the reconstructed QR-coded image is improved.

  18. Applying Machine Learning and High Performance Computing to Water Quality Assessment and Prediction

    Ruijian Zhang

    2017-12-01

    Full Text Available Water quality assessment and prediction is a more and more important issue. Traditional ways either take lots of time or they can only do assessments. In this research, by applying machine learning algorithm to a long period time of water attributes’ data; we can generate a decision tree so that it can predict the future day’s water quality in an easy and efficient way. The idea is to combine the traditional ways and the computer algorithms together. Using machine learning algorithms, the assessment of water quality will be far more efficient, and by generating the decision tree, the prediction will be quite accurate. The drawback of the machine learning modeling is that the execution takes quite long time, especially when we employ a better accuracy but more time-consuming algorithm in clustering. Therefore, we applied the high performance computing (HPC System to deal with this problem. Up to now, the pilot experiments have achieved very promising preliminary results. The visualized water quality assessment and prediction obtained from this project would be published in an interactive website so that the public and the environmental managers could use the information for their decision making.

  19. Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation/Completion of Episodic Information.

    Aimone, James Bradley [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Betty, Rita [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2015-03-01

    Using High Performance Computing to Examine the Processes of Neurogenesis Underlying Pattern Separation/Completion of Episodic Information - Sandia researchers developed novel methods and metrics for studying the computational function of neurogenesis, thus generating substantial impact to the neuroscience and neural computing communities. This work could benefit applications in machine learning and other analysis activities.

  20. Developing a Distributed Computing Architecture at Arizona State University.

    Armann, Neil; And Others

    1994-01-01

    Development of Arizona State University's computing architecture, designed to ensure that all new distributed computing pieces will work together, is described. Aspects discussed include the business rationale, the general architectural approach, characteristics and objectives of the architecture, specific services, and impact on the university…

  1. A Computational Architecture for Programmable Automation Research

    Taylor, Russell H.; Korein, James U.; Maier, Georg E.; Durfee, Lawrence F.

    1987-03-01

    This short paper describes recent work at the IBM T. J. Watson Research Center directed at developing a highly flexible computational architecture for research on sensor-based programmable automation. The system described here has been designed with a focus on dynamic configurability, layered user inter-faces and incorporation of sensor-based real time operations into new commands. It is these features which distinguish it from earlier work. The system is cur-rently being implemented at IBM for research purposes and internal use and is an outgrowth of programmable automation research which has been ongoing since 1972 [e.g., 1, 2, 3, 4, 5, 6] .

  2. ELASTIC CLOUD COMPUTING ARCHITECTURE AND SYSTEM FOR HETEROGENEOUS SPATIOTEMPORAL COMPUTING

    X. Shi

    2017-10-01

    Full Text Available Spatiotemporal computation implements a variety of different algorithms. When big data are involved, desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may have different behavior on different computing infrastructure and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs, while GPUs may not be useful on certain kind of spatiotemporal computation. This is the same situation in utilizing a cluster of Intel's many-integrated-core (MIC or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, Field Programmable Gate Array (FPGA may be a better solution for better energy efficiency when the performance of computation could be similar or better than GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  3. Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing

    Shi, X.

    2017-10-01

    Spatiotemporal computation implements a variety of different algorithms. When big data are involved, desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may have different behavior on different computing infrastructure and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful on certain kind of spatiotemporal computation. This is the same situation in utilizing a cluster of Intel's many-integrated-core (MIC) or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, Field Programmable Gate Array (FPGA) may be a better solution for better energy efficiency when the performance of computation could be similar or better than GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.

  4. An Interactive, Web-based High Performance Modeling Environment for Computational Epidemiology.

    Deodhar, Suruchi; Bisset, Keith R; Chen, Jiangzhuo; Ma, Yifei; Marathe, Madhav V

    2014-07-01

    We present an integrated interactive modeling environment to support public health epidemiology. The environment combines a high resolution individual-based model with a user-friendly web-based interface that allows analysts to access the models and the analytics back-end remotely from a desktop or a mobile device. The environment is based on a loosely-coupled service-oriented-architecture that allows analysts to explore various counter factual scenarios. As the modeling tools for public health epidemiology are getting more sophisticated, it is becoming increasingly hard for non-computational scientists to effectively use the systems that incorporate such models. Thus an important design consideration for an integrated modeling environment is to improve ease of use such that experimental simulations can be driven by the users. This is achieved by designing intuitive and user-friendly interfaces that allow users to design and analyze a computational experiment and steer the experiment based on the state of the system. A key feature of a system that supports this design goal is the ability to start, stop, pause and roll-back the disease propagation and intervention application process interactively. An analyst can access the state of the system at any point in time and formulate dynamic interventions based on additional information obtained through state assessment. In addition, the environment provides automated services for experiment set-up and management, thus reducing the overall time for conducting end-to-end experimental studies. We illustrate the applicability of the system by describing computational experiments based on realistic pandemic planning scenarios. The experiments are designed to demonstrate the system's capability and enhanced user productivity.

  5. Strengthening LLNL Missions through Laboratory Directed Research and Development in High Performance Computing

    Willis, D. K. [Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States)

    2016-12-01

    High performance computing (HPC) has been a defining strength of Lawrence Livermore National Laboratory (LLNL) since its founding. Livermore scientists have designed and used some of the world’s most powerful computers to drive breakthroughs in nearly every mission area. Today, the Laboratory is recognized as a world leader in the application of HPC to complex science, technology, and engineering challenges. Most importantly, HPC has been integral to the National Nuclear Security Administration’s (NNSA’s) Stockpile Stewardship Program—designed to ensure the safety, security, and reliability of our nuclear deterrent without nuclear testing. A critical factor behind Lawrence Livermore’s preeminence in HPC is the ongoing investments made by the Laboratory Directed Research and Development (LDRD) Program in cutting-edge concepts to enable efficient utilization of these powerful machines. Congress established the LDRD Program in 1991 to maintain the technical vitality of the Department of Energy (DOE) national laboratories. Since then, LDRD has been, and continues to be, an essential tool for exploring anticipated needs that lie beyond the planning horizon of our programs and for attracting the next generation of talented visionaries. Through LDRD, Livermore researchers can examine future challenges, propose and explore innovative solutions, and deliver creative approaches to support our missions. The present scientific and technical strengths of the Laboratory are, in large part, a product of past LDRD investments in HPC. Here, we provide seven examples of LDRD projects from the past decade that have played a critical role in building LLNL’s HPC, computer science, mathematics, and data science research capabilities, and describe how they have impacted LLNL’s mission.

  6. Direct numerical simulation of reactor two-phase flows enabled by high-performance computing

    Fang, Jun; Cambareri, Joseph J.; Brown, Cameron S.; Feng, Jinyong; Gouws, Andre; Li, Mengnan; Bolotnov, Igor A.

    2018-04-01

    Nuclear reactor two-phase flows remain a great engineering challenge, where the high-resolution two-phase flow database which can inform practical model development is still sparse due to the extreme reactor operation conditions and measurement difficulties. Owing to the rapid growth of computing power, the direct numerical simulation (DNS) is enjoying a renewed interest in investigating the related flow problems. A combination between DNS and an interface tracking method can provide a unique opportunity to study two-phase flows based on first principles calculations. More importantly, state-of-the-art high-performance computing (HPC) facilities are helping unlock this great potential. This paper reviews the recent research progress of two-phase flow DNS related to reactor applications. The progress in large-scale bubbly flow DNS has been focused not only on the sheer size of those simulations in terms of resolved Reynolds number, but also on the associated advanced modeling and analysis techniques. Specifically, the current areas of active research include modeling of sub-cooled boiling, bubble coalescence, as well as the advanced post-processing toolkit for bubbly flow simulations in reactor geometries. A novel bubble tracking method has been developed to track the evolution of bubbles in two-phase bubbly flow. Also, spectral analysis of DNS database in different geometries has been performed to investigate the modulation of the energy spectrum slope due to bubble-induced turbulence. In addition, the single-and two-phase analysis results are presented for turbulent flows within the pressurized water reactor (PWR) core geometries. The related simulations are possible to carry out only with the world leading HPC platforms. These simulations are allowing more complex turbulence model development and validation for use in 3D multiphase computational fluid dynamics (M-CFD) codes.

  7. Tackling some of the most intricate geophysical challenges via high-performance computing

    Khosronejad, A.

    2016-12-01

    Recently, world has been witnessing significant enhancements in computing power of supercomputers. Computer clusters in conjunction with the advanced mathematical algorithms has set the stage for developing and applying powerful numerical tools to tackle some of the most intricate geophysical challenges that today`s engineers face. One such challenge is to understand how turbulent flows, in real-world settings, interact with (a) rigid and/or mobile complex bed bathymetry of waterways and sea-beds in the coastal areas; (b) objects with complex geometry that are fully or partially immersed; and (c) free-surface of waterways and water surface waves in the coastal area. This understanding is especially important because the turbulent flows in real-world environments are often bounded by geometrically complex boundaries, which dynamically deform and give rise to multi-scale and multi-physics transport phenomena, and characterized by multi-lateral interactions among various phases (e.g. air/water/sediment phases). Herein, I present some of the multi-scale and multi-physics geophysical fluid mechanics processes that I have attempted to study using an in-house high-performance computational model, the so-called VFS-Geophysics. More specifically, I will present the simulation results of turbulence/sediment/solute/turbine interactions in real-world settings. Parts of the simulations I present are performed to gain scientific insights into the processes such as sand wave formation (A. Khosronejad, and F. Sotiropoulos, (2014), Numerical simulation of sand waves in a turbulent open channel flow, Journal of Fluid Mechanics, 753:150-216), while others are carried out to predict the effects of climate change and large flood events on societal infrastructures ( A. Khosronejad, et al., (2016), Large eddy simulation of turbulence and solute transport in a forested headwater stream, Journal of Geophysical Research:, doi: 10.1002/2014JF003423).

  8. Computer architecture evaluation for structural dynamics computations: Project summary

    Standley, Hilda M.

    1989-01-01

    The intent of the proposed effort is the examination of the impact of the elements of parallel architectures on the performance realized in a parallel computation. To this end, three major projects are developed: a language for the expression of high level parallelism, a statistical technique for the synthesis of multicomputer interconnection networks based upon performance prediction, and a queueing model for the analysis of shared memory hierarchies.

  9. Experimental high energy physics and modern computer architectures

    Hoek, J.

    1988-06-01

    The paper examines how experimental High Energy Physics can use modern computer architectures efficiently. In this connection parallel and vector architectures are investigated, and the types available at the moment for general use are discussed. A separate section briefly describes some architectures that are either a combination of both, or exemplify other architectures. In an appendix some directions in which computing seems to be developing in the USA are mentioned. (author)

  10. A Lightweight, High-performance I/O Management Package for Data-intensive Computing

    Wang, Jun

    2011-06-22

    Our group has been working with ANL collaborators on the topic bridging the gap between parallel file system and local file system during the course of this project period. We visited Argonne National Lab -- Dr. Robert Ross's group for one week in the past summer 2007. We looked over our current project progress and planned the activities for the incoming years 2008-09. The PI met Dr. Robert Ross several times such as HEC FSIO workshop 08, SC08 and SC10. We explored the opportunities to develop a production system by leveraging our current prototype to (SOGP+PVFS) a new PVFS version. We delivered SOGP+PVFS codes to ANL PVFS2 group in 2008.We also talked about exploring a potential project on developing new parallel programming models and runtime systems for data-intensive scalable computing (DISC). The methodology is to evolve MPI towards DISC by incorporating some functions of Google MapReduce parallel programming model. More recently, we are together exploring how to leverage existing works to perform (1) coordination/aggregation of local I/O operations prior to movement over the WAN, (2) efficient bulk data movement over the WAN, (3) latency hiding techniques for latency-intensive operations. Since 2009, we start applying Hadoop/MapReduce to some HEC applications with LANL scientists John Bent and Salman Habib. Another on-going work is to improve checkpoint performance at I/O forwarding Layer for the Road Runner super computer with James Nuetz and Gary Gridder at LANL. Two senior undergraduates from our research group did summer internships about high-performance file and storage system projects in LANL since 2008 for consecutive three years. Both of them are now pursuing Ph.D. degree in our group and will be 4th year in the PhD program in Fall 2011 and go to LANL to advance two above-mentioned works during this winter break. Since 2009, we have been collaborating with several computer scientists (Gary Grider, John bent, Parks Fields, James Nunez, Hsing

  11. SiGe epitaxial memory for neuromorphic computing with reproducible high performance based on engineered dislocations

    Choi, Shinhyun; Tan, Scott H.; Li, Zefan; Kim, Yunjo; Choi, Chanyeol; Chen, Pai-Yu; Yeon, Hanwool; Yu, Shimeng; Kim, Jeehwan

    2018-01-01

    Although several types of architecture combining memory cells and transistors have been used to demonstrate artificial synaptic arrays, they usually present limited scalability and high power consumption. Transistor-free analog switching devices may overcome these limitations, yet the typical switching process they rely on—formation of filaments in an amorphous medium—is not easily controlled and hence hampers the spatial and temporal reproducibility of the performance. Here, we demonstrate analog resistive switching devices that possess desired characteristics for neuromorphic computing networks with minimal performance variations using a single-crystalline SiGe layer epitaxially grown on Si as a switching medium. Such epitaxial random access memories utilize threading dislocations in SiGe to confine metal filaments in a defined, one-dimensional channel. This confinement results in drastically enhanced switching uniformity and long retention/high endurance with a high analog on/off ratio. Simulations using the MNIST handwritten recognition data set prove that epitaxial random access memories can operate with an online learning accuracy of 95.1%.

  12. National cyber defense high performance computing and analysis : concepts, planning and roadmap.

    Hamlet, Jason R.; Keliiaa, Curtis M.

    2010-09-01

    There is a national cyber dilemma that threatens the very fabric of government, commercial and private use operations worldwide. Much is written about 'what' the problem is, and though the basis for this paper is an assessment of the problem space, we target the 'how' solution space of the wide-area national information infrastructure through the advancement of science, technology, evaluation and analysis with actionable results intended to produce a more secure national information infrastructure and a comprehensive national cyber defense capability. This cybersecurity High Performance Computing (HPC) analysis concepts, planning and roadmap activity was conducted as an assessment of cybersecurity analysis as a fertile area of research and investment for high value cybersecurity wide-area solutions. This report and a related SAND2010-4765 Assessment of Current Cybersecurity Practices in the Public Domain: Cyber Indications and Warnings Domain report are intended to provoke discussion throughout a broad audience about developing a cohesive HPC centric solution to wide-area cybersecurity problems.

  13. Analysis of Application Power and Schedule Composition in a High Performance Computing Environment

    Elmore, Ryan [National Renewable Energy Lab. (NREL), Golden, CO (United States); Gruchalla, Kenny [National Renewable Energy Lab. (NREL), Golden, CO (United States); Phillips, Caleb [National Renewable Energy Lab. (NREL), Golden, CO (United States); Purkayastha, Avi [National Renewable Energy Lab. (NREL), Golden, CO (United States); Wunder, Nick [National Renewable Energy Lab. (NREL), Golden, CO (United States)

    2016-01-05

    As the capacity of high performance computing (HPC) systems continues to grow, small changes in energy management have the potential to produce significant energy savings. In this paper, we employ an extensive informatics system for aggregating and analyzing real-time performance and power use data to evaluate energy footprints of jobs running in an HPC data center. We look at the effects of algorithmic choices for a given job on the resulting energy footprints, and analyze application-specific power consumption, and summarize average power use in the aggregate. All of these views reveal meaningful power variance between classes of applications as well as chosen methods for a given job. Using these data, we discuss energy-aware cost-saving strategies based on reordering the HPC job schedule. Using historical job and power data, we present a hypothetical job schedule reordering that: (1) reduces the facility's peak power draw and (2) manages power in conjunction with a large-scale photovoltaic array. Lastly, we leverage this data to understand the practical limits on predicting key power use metrics at the time of submission.

  14. Performance Analysis of Ivshmem for High-Performance Computing in Virtual Machines

    Ivanovic, Pavle; Richter, Harald

    2018-01-01

    High-Performance computing (HPC) is rarely accomplished via virtual machines (VMs). In this paper, we present a remake of ivshmem which can change this. Ivshmem was a shared memory (SHM) between virtual machines on the same server, with SHM-access synchronization included, until about 5 years ago when newer versions of Linux and its virtualization library libvirt evolved. We restored that SHM-access synchronization feature because it is indispensable for HPC and made ivshmem runnable with contemporary versions of Linux, libvirt, KVM, QEMU and especially MPICH, which is an implementation of MPI - the standard HPC communication library. Additionally, MPICH was transparently modified by us to get ivshmem included, resulting in a three to ten times performance improvement compared to TCP/IP. Furthermore, we have transparently replaced MPI_PUT, a single-side MPICH communication mechanism, by an own MPI_PUT wrapper. As a result, our ivshmem even surpasses non-virtualized SHM data transfers for block lengths greater than 512 KBytes, showing the benefits of virtualization. All improvements were possible without using SR-IOV.

  15. A parallel calibration utility for WRF-Hydro on high performance computers

    Wang, J.; Wang, C.; Kotamarthi, V. R.

    2017-12-01

    A successful modeling of complex hydrological processes comprises establishing an integrated hydrological model which simulates the hydrological processes in each water regime, calibrates and validates the model performance based on observation data, and estimates the uncertainties from different sources especially those associated with parameters. Such a model system requires large computing resources and often have to be run on High Performance Computers (HPC). The recently developed WRF-Hydro modeling system provides a significant advancement in the capability to simulate regional water cycles more completely. The WRF-Hydro model has a large range of parameters such as those in the input table files — GENPARM.TBL, SOILPARM.TBL and CHANPARM.TBL — and several distributed scaling factors such as OVROUGHRTFAC. These parameters affect the behavior and outputs of the model and thus may need to be calibrated against the observations in order to obtain a good modeling performance. Having a parameter calibration tool specifically for automate calibration and uncertainty estimates of WRF-Hydro model can provide significant convenience for the modeling community. In this study, we developed a customized tool using the parallel version of the model-independent parameter estimation and uncertainty analysis tool, PEST, to enabled it to run on HPC with PBS and SLURM workload manager and job scheduler. We also developed a series of PEST input file templates that are specifically for WRF-Hydro model calibration and uncertainty analysis. Here we will present a flood case study occurred in April 2013 over Midwest. The sensitivity and uncertainties are analyzed using the customized PEST tool we developed.

  16. Outline of a novel architecture for cortical computation

    Majumdar, Kaushik

    2007-01-01

    In this paper a novel architecture for cortical computation has been proposed. This architecture is composed of computing paths consisting of neurons and synapses only. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortica...

  17. DAFS Storage for High Performance Computing Using MPI-I/O: Design and Experience

    Velusamy, Vijay

    2004-01-01

    .... DAFS is a well-defined highperformance protocol for file access across a network as well as set of APIs, uDAFS, for user level OS-bypassing application programming, designed to take advantage of RDMA-based transports, such as Virtual Interface Architecture (VIA), InfiniBand Architecture1, and iWARP.

  18. Development of a Computational Steering Framework for High Performance Computing Environments on Blue Gene/P Systems

    Danani, Bob K.

    2012-07-01

    Computational steering has revolutionized the traditional workflow in high performance computing (HPC) applications. The standard workflow that consists of preparation of an application’s input, running of a simulation, and visualization of simulation results in a post-processing step is now transformed into a real-time interactive workflow that significantly reduces development and testing time. Computational steering provides the capability to direct or re-direct the progress of a simulation application at run-time. It allows modification of application-defined control parameters at run-time using various user-steering applications. In this project, we propose a computational steering framework for HPC environments that provides an innovative solution and easy-to-use platform, which allows users to connect and interact with running application(s) in real-time. This framework uses RealityGrid as the underlying steering library and adds several enhancements to the library to enable steering support for Blue Gene systems. Included in the scope of this project is the development of a scalable and efficient steering relay server that supports many-to-many connectivity between multiple steered applications and multiple steering clients. Steered applications can range from intermediate simulation and physical modeling applications to complex computational fluid dynamics (CFD) applications or advanced visualization applications. The Blue Gene supercomputer presents special challenges for remote access because the compute nodes reside on private networks. This thesis presents an implemented solution and demonstrates it on representative applications. Thorough implementation details and application enablement steps are also presented in this thesis to encourage direct usage of this framework.

  19. High performance computing, supercomputing, náročné počítání

    Okrouhlík, Miloslav

    2003-01-01

    Roč. 10, č. 5 (2003), s. 429-438 ISSN 1210-2717 R&D Projects: GA ČR GA101/02/0072 Institutional research plan: CEZ:AV0Z2076919 Keywords : high performance computing * vector and parallel computers * programing tools for parellelization Subject RIV: BI - Acoustics

  20. Leveraging High Performance Computing for Managing Large and Evolving Data Collections

    Ritu Arora

    2014-10-01

    Full Text Available The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle curators organize newly generated data while cleaning and integrating legacy data when it exists, and deciding what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, there are practical challenges in doing so if the collection is very large and heterogeneous, or is accessed by several researchers contemporaneously. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well-informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow to leverage open science High Performance Computing (HPC resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time for accomplishing critical data management tasks, and enable a dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding usage of the open source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.

  1. High performance communication by people with paralysis using an intracortical brain-computer interface

    Pandarinath, Chethan; Nuyujukian, Paul; Blabe, Christine H; Sorice, Brittany L; Saab, Jad; Willett, Francis R; Hochberg, Leigh R

    2017-01-01

    Brain-computer interfaces (BCIs) have the potential to restore communication for people with tetraplegia and anarthria by translating neural activity into control signals for assistive communication devices. While previous pre-clinical and clinical studies have demonstrated promising proofs-of-concept (Serruya et al., 2002; Simeral et al., 2011; Bacher et al., 2015; Nuyujukian et al., 2015; Aflalo et al., 2015; Gilja et al., 2015; Jarosiewicz et al., 2015; Wolpaw et al., 1998; Hwang et al., 2012; Spüler et al., 2012; Leuthardt et al., 2004; Taylor et al., 2002; Schalk et al., 2008; Moran, 2010; Brunner et al., 2011; Wang et al., 2013; Townsend and Platsko, 2016; Vansteensel et al., 2016; Nuyujukian et al., 2016; Carmena et al., 2003; Musallam et al., 2004; Santhanam et al., 2006; Hochberg et al., 2006; Ganguly et al., 2011; O’Doherty et al., 2011; Gilja et al., 2012), the performance of human clinical BCI systems is not yet high enough to support widespread adoption by people with physical limitations of speech. Here we report a high-performance intracortical BCI (iBCI) for communication, which was tested by three clinical trial participants with paralysis. The system leveraged advances in decoder design developed in prior pre-clinical and clinical studies (Gilja et al., 2015; Kao et al., 2016; Gilja et al., 2012). For all three participants, performance exceeded previous iBCIs (Bacher et al., 2015; Jarosiewicz et al., 2015) as measured by typing rate (by a factor of 1.4–4.2) and information throughput (by a factor of 2.2–4.0). This high level of performance demonstrates the potential utility of iBCIs as powerful assistive communication devices for people with limited motor function. Clinical Trial No: NCT00912041 DOI: http://dx.doi.org/10.7554/eLife.18554.001 PMID:28220753

  2. LIAR -- A computer program for the modeling and simulation of high performance linacs

    Assmann, R.; Adolphsen, C.; Bane, K.; Emma, P.; Raubenheimer, T.; Siemann, R.; Thompson, K.; Zimmermann, F.

    1997-04-01

    The computer program LIAR (LInear Accelerator Research Code) is a numerical modeling and simulation tool for high performance linacs. Amongst others, it addresses the needs of state-of-the-art linear colliders where low emittance, high-intensity beams must be accelerated to energies in the 0.05-1 TeV range. LIAR is designed to be used for a variety of different projects. LIAR allows the study of single- and multi-particle beam dynamics in linear accelerators. It calculates emittance dilutions due to wakefield deflections, linear and non-linear dispersion and chromatic effects in the presence of multiple accelerator imperfections. Both single-bunch and multi-bunch beams can be simulated. Several basic and advanced optimization schemes are implemented. Present limitations arise from the incomplete treatment of bending magnets and sextupoles. A major objective of the LIAR project is to provide an open programming platform for the accelerator physics community. Due to its design, LIAR allows straight-forward access to its internal FORTRAN data structures. The program can easily be extended and its interactive command language ensures maximum ease of use. Presently, versions of LIAR are compiled for UNIX and MS Windows operating systems. An interface for the graphical visualization of results is provided. Scientific graphs can be saved in the PS and EPS file formats. In addition a Mathematica interface has been developed. LIAR now contains more than 40,000 lines of source code in more than 130 subroutines. This report describes the theoretical basis of the program, provides a reference for existing features and explains how to add further commands. The LIAR home page and the ONLINE version of this manual can be accessed under: http://www.slac.stanford.edu/grp/arb/rwa/liar.htm

  3. High performance computation of landscape genomic models including local indicators of spatial association.

    Stucki, S; Orozco-terWengel, P; Forester, B R; Duruz, S; Colli, L; Masembe, C; Negrini, R; Landguth, E; Jones, M R; Bruford, M W; Taberlet, P; Joost, S

    2017-09-01

    With the increasing availability of both molecular and topo-climatic data, the main challenges facing landscape genomics - that is the combination of landscape ecology with population genomics - include processing large numbers of models and distinguishing between selection and demographic processes (e.g. population structure). Several methods address the latter, either by estimating a null model of population history or by simultaneously inferring environmental and demographic effects. Here we present samβada, an approach designed to study signatures of local adaptation, with special emphasis on high performance computing of large-scale genetic and environmental data sets. samβada identifies candidate loci using genotype-environment associations while also incorporating multivariate analyses to assess the effect of many environmental predictor variables. This enables the inclusion of explanatory variables representing population structure into the models to lower the occurrences of spurious genotype-environment associations. In addition, samβada calculates local indicators of spatial association for candidate loci to provide information on whether similar genotypes tend to cluster in space, which constitutes a useful indication of the possible kinship between individuals. To test the usefulness of this approach, we carried out a simulation study and analysed a data set from Ugandan cattle to detect signatures of local adaptation with samβada, bayenv, lfmm and an F ST outlier method (FDIST approach in arlequin) and compare their results. samβada - an open source software for Windows, Linux and Mac OS X available at http://lasig.epfl.ch/sambada - outperforms other approaches and better suits whole-genome sequence data processing. © 2016 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  4. Teaching Computer Organization and Architecture Using Simulation and FPGA Applications

    D. K.M. Al-Aubidy

    2007-01-01

    This paper presents the design concepts and realization of incorporating micro-operation simulation and FPGA implementation into a teaching tool for computer organization and architecture. This teaching tool helps computer engineering and computer science students to be familiarized practically with computer organization and architecture through the development of their own instruction set, computer programming and interfacing experiments. A two-pass assembler has been designed and implemente...

  5. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery Using a Probabilistic Learning Framework

    Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna

    2015-01-01

    Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  6. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery using a Probabilistic Learning Framework

    Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.

    2015-12-01

    Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  7. Computation, architectural design and fabrication logic

    Larsen, Niels Martin

    2016-01-01

    Digital fabrication and digital form generation can change the way different professions interact in relation to the development and construction of architecture. The technologies can provide a more integrated design process and expand the architectural vocabulary. At Aarhus School of Architectur...

  8. Multilayered architecture of graphene nanosheets and MnO2 nanowires as an electrode material for high-performance supercapacitors

    Wu, Mao-Sung; Lin, Chih-Jui; Ho, Chia-Ling

    2012-01-01

    Highlights: ► Multilayered architecture of the graphene/MnO 2 electrode is fabricated. ► The composite provides horizontal and vertical channels for electrolyte access. ► Graphene (GN) layer provides fast electron conduction in the composite. ► MnO 2 nanowire layer on the GN layer suppresses the oxygen evolution reaction. ► Capacitance behavior is enhanced by the multilayered architecture of GN/MnO 2 . - Abstract: Multilayered graphene/MnO 2 nanocomposite electrode prepared by anodic electrodeposition and electrophoresis exhibited superior capacitive behavior compared to the bare MnO 2 and graphene electrodes. The multilayered architecture provided both the horizontal and vertical channels for electrolyte access during fast charging and discharging. The graphene layer turned out to play an important role in enhancing the electron conduction in the multilayered architecture. Therefore, the improved electrochemical behavior might result from the significantly improved ion transport and electron conduction in the multilayered architecture of the graphene/MnO 2 composite electrode. Furthermore, the MnO 2 nanowire layer coated on the graphene layer could significantly suppress the oxygen evolution reaction, broadening the potential window of water stability.

  9. Polymorphous Computing Architecture (PCA) Application Benchmark 1: Three-Dimensional Radar Data Processing

    Lebak, J

    2001-01-01

    The DARPA Polymorphous Computing Architecture (PCA) program is building advanced computer architectures that can reorganize their computation and communication structures to achieve better overall application performance...

  10. High performance architecture design for large scale fibre-optic sensor arrays using distributed EDFAs and hybrid TDM/DWDM

    Liao, Yi; Austin, Ed; Nash, Philip J.; Kingsley, Stuart A.; Richardson, David J.

    2013-09-01

    A distributed amplified dense wavelength division multiplexing (DWDM) array architecture is presented for interferometric fibre-optic sensor array systems. This architecture employs a distributed erbium-doped fibre amplifier (EDFA) scheme to decrease the array insertion loss, and employs time division multiplexing (TDM) at each wavelength to increase the number of sensors that can be supported. The first experimental demonstration of this system is reported including results which show the potential for multiplexing and interrogating up to 4096 sensors using a single telemetry fibre pair with good system performance. The number can be increased to 8192 by using dual pump sources.

  11. Memory controllers for high-performance and real-time MPSoCs : requirements, architectures, and future trends

    Akesson, K.B.; Huang, Po-Chun; Clermidy, F.; Dutoit, D.; Goossens, K.G.W.; Chang, Yuan-Hao; Kuo, Tei-Wei; Vivet, P.; Wingard, D.

    2011-01-01

    Designing memory controllers for complex real-time and high-performance multi-processor systems-on-chip is challenging, since sufficient capacity and (real-time) performance must be provided in a reliable manner at low cost and with low power consumption. This special session contains four

  12. Outline of a novel architecture for cortical computation.

    Majumdar, Kaushik

    2008-03-01

    In this paper a novel architecture for cortical computation has been proposed. This architecture is composed of computing paths consisting of neurons and synapses. These paths have been decomposed into lateral, longitudinal and vertical components. Cortical computation has then been decomposed into lateral computation (LaC), longitudinal computation (LoC) and vertical computation (VeC). It has been shown that various loop structures in the cortical circuit play important roles in cortical computation as well as in memory storage and retrieval, keeping in conformity with the molecular basis of short and long term memory. A new learning scheme for the brain has also been proposed and how it is implemented within the proposed architecture has been explained. A few mathematical results about the architecture have been proposed, some of which are without proof.

  13. High performance computing and quantum trajectory method in CPU and GPU systems

    Wiśniewska, Joanna; Sawerwain, Marek; Leoński, Wiesław

    2015-01-01

    Nowadays, a dynamic progress in computational techniques allows for development of various methods, which offer significant speed-up of computations, especially those related to the problems of quantum optics and quantum computing. In this work, we propose computational solutions which re-implement the quantum trajectory method (QTM) algorithm in modern parallel computation environments in which multi-core CPUs and modern many-core GPUs can be used. In consequence, new computational routines are developed in more effective way than those applied in other commonly used packages, such as Quantum Optics Toolbox (QOT) for Matlab or QuTIP for Python

  14. WinSCP for Windows File Transfers | High-Performance Computing | NREL

    WinSCP for Windows File Transfers WinSCP for Windows File Transfers WinSCP for can used to securely transfer files between your local computer running Microsoft Windows and a remote computer running Linux

  15. The computer program LIAR for the simulation and modeling of high performance linacs

    Assmann, R.; Adolphsen, C.; Bane, K.; Emma, P.; Raubenheimer, T.O.; Siemann, R.; Thompson, K.; Zimmermann, F.

    1997-07-01

    High performance linear accelerators are the central components of the proposed next generation of linear colliders. They must provide acceleration of up to 750 GeV per beam while maintaining small normalized emittances. Standard simulation programs, mainly developed for storage rings, did not meet the specific requirements for high performance linacs with high bunch charges and strong wakefields. The authors present the program. LIAR (LInear Accelerator Research code) that includes single and multi-bunch wakefield effects, a 6D coupled beam description, specific optimization algorithms and other advanced features. LIAR has been applied to and checked against the existing Stanford Linear Collider (SLC), the linacs of the proposed Next Linear Collider (NLC) and the proposed Linac Coherent Light Source (LCLS) at SLAC. Its modular structure allows easy extension for different purposes. The program is available for UNIX workstations and Windows PC's

  16. Energy Efficiency Evaluation and Benchmarking of AFRL’s Condor High Performance Computer

    2011-08-01

    PlayStation 3 nodes executing the HPL benchmark. When idle, the two PS3s consume 188.49 W on average. At peak HPL performance, the nodes draw an average of...AUG 2011 2. REPORT TYPE CONFERENCE PAPER (Post Print) 3. DATES COVERED (From - To) JAN 2011 – JUN 2011 4 . TITLE AND SUBTITLE ENERGY EFFICIENCY...the High Performance LINPACK (HPL) benchmark while also measuring the energy consumed to achieve such performance. Supercomputers are ranked by

  17. A high-performance data acquisition system for computer-based multichannel analyzer

    Zhou Xinzhi; Bai Rongsheng; Wen Liangbi; Huang Yanwen

    1996-01-01

    A high-performance data acquisition system applied in the multichannel analyzer is designed with single-chip microcomputer system. The paper proposes the principle and the method of realizing the simultaneous data acquisition, the data pre-processing, and the fast bidirectional data transfer by means of direct memory access based on dual-port RAM as well. The measurement for dead or live time of ADC system can also be implemented efficiently by using it

  18. Performance Analysis of Cloud Computing Architectures Using Discrete Event Simulation

    Stocker, John C.; Golomb, Andrew M.

    2011-01-01

    Cloud computing offers the economic benefit of on-demand resource allocation to meet changing enterprise computing needs. However, the flexibility of cloud computing is disadvantaged when compared to traditional hosting in providing predictable application and service performance. Cloud computing relies on resource scheduling in a virtualized network-centric server environment, which makes static performance analysis infeasible. We developed a discrete event simulation model to evaluate the overall effectiveness of organizations in executing their workflow in traditional and cloud computing architectures. The two part model framework characterizes both the demand using a probability distribution for each type of service request as well as enterprise computing resource constraints. Our simulations provide quantitative analysis to design and provision computing architectures that maximize overall mission effectiveness. We share our analysis of key resource constraints in cloud computing architectures and findings on the appropriateness of cloud computing in various applications.

  19. Mesoporous ZnO-NiO architectures for use in a high-performance nonenzymatic glucose sensor

    Liu, Yuanying; Wei, Chengzhen; Hao, Mingming; Zheng, Shasha; Pang, Huan; Zheng, Mingbo

    2014-01-01

    Mesoporous ZnO-NiO architectures were prepared by thermal annealing of zinc-nickel hydroxycarbonate composites. The resulting architectures are shown to be assembled by many mesoporous nanosheets, and this results in a large surface area and a strong synergy between the ZnO and NiO nanoparticles. The material obtained by annealing at 400 °C was used as an electrode that responds to glucose over a wide concentration range (from 0.5 μM to 6.4 mM), with a detection limit as low as 0.5 μM, fast response time (<3 s), and good sensitivity (120.5 μA · mM −1  · cm −2 ). (author)

  20. Achieving High Performance With TCP Over 40 GbE on NUMA Architectures for CMS Data Acquisition

    Bawej, Tomasz; et al.

    2014-01-01

    TCP and the socket abstraction have barely changed over the last two decades, but at the network layer there has been a giant leap from a few megabits to 100 gigabits in bandwidth. At the same time, CPU architectures have evolved into the multicore era and applications are expected to make full use of all available resources. Applications in the data acquisition domain based on the standard socket library running in a Non-Uniform Memory Access (NUMA) architecture are unable to reach full efficiency and scalability without the software being adequately aware about the IRQ (Interrupt Request), CPU and memory affinities. During the first long shutdown of LHC, the CMS DAQ system is going to be upgraded for operation from 2015 onwards and a new software component has been designed and developed in the CMS online framework for transferring data with sockets. This software attempts to wrap the low-level socket library to ease higher-level programming with an API based on an asynchronous event driven model similar to the DAT uDAPL API. It is an event-based application with NUMA optimizations, that allows for a high throughput of data across a large distributed system. This paper describes the architecture, the technologies involved and the performance measurements of the software in the context of the CMS distributed event building.

  1. The Benefits and Complexities of Operating Geographic Information Systems (GIS) in a High Performance Computing (HPC) Environment

    Shute, J.; Carriere, L.; Duffy, D.; Hoy, E.; Peters, J.; Shen, Y.; Kirschbaum, D.

    2017-12-01

    The NASA Center for Climate Simulation (NCCS) at the Goddard Space Flight Center is building and maintaining an Enterprise GIS capability for its stakeholders, to include NASA scientists, industry partners, and the public. This platform is powered by three GIS subsystems operating in a highly-available, virtualized environment: 1) the Spatial Analytics Platform is the primary NCCS GIS and provides users discoverability of the vast DigitalGlobe/NGA raster assets within the NCCS environment; 2) the Disaster Mapping Platform provides mapping and analytics services to NASA's Disaster Response Group; and 3) the internal (Advanced Data Analytics Platform/ADAPT) enterprise GIS provides users with the full suite of Esri and open source GIS software applications and services. All systems benefit from NCCS's cutting edge infrastructure, to include an InfiniBand network for high speed data transfers; a mixed/heterogeneous environment featuring seamless sharing of information between Linux and Windows subsystems; and in-depth system monitoring and warning systems. Due to its co-location with the NCCS Discover High Performance Computing (HPC) environment and the Advanced Data Analytics Platform (ADAPT), the GIS platform has direct access to several large NCCS datasets including DigitalGlobe/NGA, Landsat, MERRA, and MERRA2. Additionally, the NCCS ArcGIS Desktop Windows virtual machines utilize existing NetCDF and OPeNDAP assets for visualization, modelling, and analysis - thus eliminating the need for data duplication. With the advent of this platform, Earth scientists have full access to vast data repositories and the industry-leading tools required for successful management and analysis of these multi-petabyte, global datasets. The full system architecture and integration with scientific datasets will be presented. Additionally, key applications and scientific analyses will be explained, to include the NASA Global Landslide Catalog (GLC) Reporter crowdsourcing application, the

  2. Unified, Cross-Platform, Open-Source Library Package for High-Performance Computing

    Kozacik, Stephen [EM Photonics, Inc., Newark, DE (United States)

    2017-05-15

    Compute power is continually increasing, but this increased performance is largely found in sophisticated computing devices and supercomputer resources that are difficult to use, resulting in under-utilization. We developed a unified set of programming tools that will allow users to take full advantage of the new technology by allowing them to work at a level abstracted away from the platform specifics, encouraging the use of modern computing systems, including government-funded supercomputer facilities.

  3. FLASHRAD: A 3D Rad Hard Memory Module For High Performance Space Computers, Phase I

    National Aeronautics and Space Administration — The computing capabilities of onboard spacecraft are a major limiting factor for accomplishing many classes of future missions. Although technology development...

  4. Very High-Performance Embedded Computing Will Allow Ambitious Space Science Investigation

    Pignol, Michel

    2005-01-01

    .... developed on radiation tolerant technologies. Unfortunately, the microprocessors today available on such technologies have the computing throughput which was available about 10 years ago on the commercial market...

  5. High Performance Computing and Storage Requirements for Biological and Environmental Research Target 2017

    Gerber, Richard [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC); Wasserman, Harvey [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). National Energy Research Scientific Computing Center (NERSC)

    2013-05-01

    The National Energy Research Scientific Computing Center (NERSC) is the primary computing center for the DOE Office of Science, serving approximately 4,500 users working on some 650 projects that involve nearly 600 codes in a wide variety of scientific disciplines. In addition to large-­scale computing and storage resources NERSC provides support and expertise that help scientists make efficient use of its systems. The latest review revealed several key requirements, in addition to achieving its goal of characterizing BER computing and storage needs.

  6. Run-Time Dynamically-Adaptable FPGA-Based Architecture for High-Performance Autonomous Distributed Systems

    Valverde Alcalá, Juan

    2016-01-01

    Esta tesis doctoral se enmarca dentro del campo de los sistemas embebidos reconfigurables, redes de sensores inalámbricas para aplicaciones de altas prestaciones, y computación distribuida. El documento se centra en el estudio de alternativas de procesamiento para sistemas embebidos autónomos distribuidos de altas prestaciones (por sus siglas en inglés, High-Performance Autonomous Distributed Systems (HPADS)), así como su evolución hacia el procesamiento de alta resolución. El estudio se ha ...

  7. Complex matrix multiplication operations with data pre-conditioning in a high performance computing architecture

    Eichenberger, Alexandre E; Gschwind, Michael K; Gunnels, John A

    2014-02-11

    Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register.

  8. VLab: A Science Gateway for Distributed First Principles Calculations in Heterogeneous High Performance Computing Systems

    da Silveira, Pedro Rodrigo Castro

    2014-01-01

    This thesis describes the development and deployment of a cyberinfrastructure for distributed high-throughput computations of materials properties at high pressures and/or temperatures--the Virtual Laboratory for Earth and Planetary Materials--VLab. VLab was developed to leverage the aggregated computational power of grid systems to solve…

  9. On Architectural Acoustics Design using Computer Simulation

    Schmidt, Anne Marie Due; Kirkegaard, Poul Henning

    2004-01-01

    The acoustical quality of a given building, or space within the building, is highly dependent on the architectural design. Architectural acoustics design has in the past been based on simple design rules. However, with a growing complexity in the architectural acoustic and the emergence of potent...... room acoustic simulation programs it is now possible to subjectively analyze and evaluate acoustic properties prior to the actual construction of a facility. With the right tools applied, the acoustic design can become an integrated part of the architectural design process. The aim of the present paper...... this information is discussed. The conclusion of the paper is that the application of acoustical simulation programs is most beneficial in the last of three phases but that an application of the program to the two first phases would be preferable and possible with an improvement of the interface of the program....

  10. High performance homes

    Beim, Anne; Vibæk, Kasper Sánchez

    2014-01-01

    Can prefabrication contribute to the development of high performance homes? To answer this question, this chapter defines high performance in more broadly inclusive terms, acknowledging the technical, architectural, social and economic conditions under which energy consumption and production occur....... Consideration of all these factors is a precondition for a truly integrated practice and as this chapter demonstrates, innovative project delivery methods founded on the manufacturing of prefabricated buildings contribute to the production of high performance homes that are cost effective to construct, energy...

  11. Biomimetic design processes in architecture: morphogenetic and evolutionary computational design

    Menges, Achim

    2012-01-01

    Design computation has profound impact on architectural design methods. This paper explains how computational design enables the development of biomimetic design processes specific to architecture, and how they need to be significantly different from established biomimetic processes in engineering disciplines. The paper first explains the fundamental difference between computer-aided and computational design in architecture, as the understanding of this distinction is of critical importance for the research presented. Thereafter, the conceptual relation and possible transfer of principles from natural morphogenesis to design computation are introduced and the related developments of generative, feature-based, constraint-based, process-based and feedback-based computational design methods are presented. This morphogenetic design research is then related to exploratory evolutionary computation, followed by the presentation of two case studies focusing on the exemplary development of spatial envelope morphologies and urban block morphologies. (paper)

  12. Design for scalability in 3D computer graphics architectures

    Holten-Lund, Hans Erik

    2002-01-01

    This thesis describes useful methods and techniques for designing scalable hybrid parallel rendering architectures for 3D computer graphics. Various techniques for utilizing parallelism in a pipelines system are analyzed. During the Ph.D study a prototype 3D graphics architecture named Hybris has...

  13. FY 2000 Blue Book: High Performance Computing and Communications: Information Technology Frontiers for a New Millennium

    Networking and Information Technology Research and Development, Executive Office of the President — As we near the dawn of a new millennium, advances made possible by computing, information, and communications research and development R and D ? once barely...

  14. FY 1994 Blue Book: High Performance Computing and Communications: Toward a National Information Infrastructure

    Networking and Information Technology Research and Development, Executive Office of the President — government and industry that advanced computer and telecommunications technologies could provide huge benefits throughout the research community and the entire U.S....

  15. High performance computing system in the framework of the Higgs boson studies

    Belyaev, Nikita; The ATLAS collaboration; Velikhov, Vasily; Konoplich, Rostislav

    2017-01-01

    The Higgs boson physics is one of the most important and promising fields of study in the modern high energy physics. It is important to notice, that GRID computing resources become strictly limited due to increasing amount of statistics, required for physics analyses and unprecedented LHC performance. One of the possibilities to address the shortfall of computing resources is the usage of computer institutes' clusters, commercial computing resources and supercomputers. To perform precision measurements of the Higgs boson properties in these realities, it is also highly required to have effective instruments to simulate kinematic distributions of signal events. In this talk we give a brief description of the modern distribution reconstruction method called Morphing and perform few efficiency tests to demonstrate its potential. These studies have been performed on the WLCG and Kurchatov Institute’s Data Processing Center, including Tier-1 GRID site and supercomputer as well. We also analyze the CPU efficienc...

  16. Topic 14+16: High-performance and scientific applications and extreme-scale computing (Introduction)

    Downes, Turlough P.; Roller, Sabine P.; Seitsonen, Ari Paavo; Valcke, Sophie; Keyes, David E.; Sawley, Marie Christine; Schulthess, Thomas C.; Shalf, John M.

    2013-01-01

    and algorithms to address the varied, complex and increasing challenges of modern research throughout both the "hard" and "soft" sciences. This necessitates being able to use large numbers of compute nodes, many of which are equipped with accelerators

  17. PRODEEDINGS OF RIKEN BNL RESEARCH CENTER WORKSHOP : HIGH PERFORMANCE COMPUTING WITH QCDOC AND BLUEGENE.

    CHRIST,N.; DAVENPORT,J.; DENG,Y.; GARA,A.; GLIMM,J.; MAWHINNEY,R.; MCFADDEN,E.; PESKIN,A.; PULLEYBLANK,W.

    2003-03-11

    Staff of Brookhaven National Laboratory, Columbia University, IBM and the RIKEN BNL Research Center organized a one-day workshop held on February 28, 2003 at Brookhaven to promote the following goals: (1) To explore areas other than QCD applications where the QCDOC and BlueGene/L machines can be applied to good advantage, (2) To identify areas where collaboration among the sponsoring institutions can be fruitful, and (3) To expose scientists to the emerging software architecture. This workshop grew out of an informal visit last fall by BNL staff to the IBM Thomas J. Watson Research Center that resulted in a continuing dialog among participants on issues common to these two related supercomputers. The workshop was divided into three sessions, addressing the hardware and software status of each system, prospective applications, and future directions.

  18. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    David K Brown

    Full Text Available Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS, a workflow management system and web interface for high performance computing (HPC. JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.

  19. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing.

    Brown, David K; Penkler, David L; Musyoka, Thommas M; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS.

  20. JMS: An Open Source Workflow Management System and Web-Based Cluster Front-End for High Performance Computing

    Brown, David K.; Penkler, David L.; Musyoka, Thommas M.; Bishop, Özlem Tastan

    2015-01-01

    Complex computational pipelines are becoming a staple of modern scientific research. Often these pipelines are resource intensive and require days of computing time. In such cases, it makes sense to run them over high performance computing (HPC) clusters where they can take advantage of the aggregated resources of many powerful computers. In addition to this, researchers often want to integrate their workflows into their own web servers. In these cases, software is needed to manage the submission of jobs from the web interface to the cluster and then return the results once the job has finished executing. We have developed the Job Management System (JMS), a workflow management system and web interface for high performance computing (HPC). JMS provides users with a user-friendly web interface for creating complex workflows with multiple stages. It integrates this workflow functionality with the resource manager, a tool that is used to control and manage batch jobs on HPC clusters. As such, JMS combines workflow management functionality with cluster administration functionality. In addition, JMS provides developer tools including a code editor and the ability to version tools and scripts. JMS can be used by researchers from any field to build and run complex computational pipelines and provides functionality to include these pipelines in external interfaces. JMS is currently being used to house a number of bioinformatics pipelines at the Research Unit in Bioinformatics (RUBi) at Rhodes University. JMS is an open-source project and is freely available at https://github.com/RUBi-ZA/JMS. PMID:26280450

  1. High-Performance Control of Paralleled Three-Phase Inverters for Residential Microgrid Architectures Based on Online Uninterruptable Power Systems

    Zhang, Chi; Guerrero, Josep M.; Vasquez, Juan Carlos

    2015-01-01

    In this paper, a control strategy for the parallel operation of three-phase inverters forming an online uninterruptible power system (UPS) is presented. The UPS system consists of a cluster of paralleled inverters with LC filters directly connected to an AC critical bus and an AC/DC forming a DC...... bus. The proposed control scheme is performed on two layers: (i) a local layer that contains a “reactive power vs phase” in order to synchronize the phase angle of each inverter and a virtual resistance loop that guarantees equal power sharing among inverters; (ii) a central controller that guarantees...... synchronization with an external real/fictitious utility, and critical bus voltage restoration. Constant transient and steady-state frequency, active, reactive and harmonic power sharing, and global phase-locked loop resynchronization capability are achieved. Detailed system topology and control architecture...

  2. High performance image acquisition and processing architecture for fast plant system controllers based on FPGA and GPU

    Nieto, J.; Sanz, D.; Guillén, P.; Esquembri, S.; Arcas, G. de; Ruiz, M.; Vega, J.; Castro, R.

    2016-01-01

    Highlights: • To test an image acquisition and processing system for Camera Link devices based in a FPGA, compliant with ITER fast controllers. • To move data acquired from the set NI1483-NIPXIe7966R directly to a NVIDIA GPU using NVIDIA GPUDirect RDMA technology. • To obtain a methodology to include GPUs processing in ITER Fast Plant Controllers, using EPICS integration through Nominal Device Support (NDS). - Abstract: The two dominant technologies that are being used in real time image processing are Field Programmable Gate Array (FPGA) and Graphical Processor Unit (GPU) due to their algorithm parallelization capabilities. But not much work has been done to standardize how these technologies can be integrated in data acquisition systems, where control and supervisory requirements are in place, such as ITER (International Thermonuclear Experimental Reactor). This work proposes an architecture, and a development methodology, to develop image acquisition and processing systems based on FPGAs and GPUs compliant with ITER fast controller solutions. A use case based on a Camera Link device connected to an FPGA DAQ device (National Instruments FlexRIO technology), and a NVIDIA Tesla GPU series card has been developed and tested. The architecture proposed has been designed to optimize system performance by minimizing data transfer operations and CPU intervention thanks to the use of NVIDIA GPUDirect RDMA and DMA technologies. This allows moving the data directly between the different hardware elements (FPGA DAQ-GPU-CPU) avoiding CPU intervention and therefore the use of intermediate CPU memory buffers. A special effort has been put to provide a development methodology that, maintaining the highest possible abstraction from the low level implementation details, allows obtaining solutions that conform to CODAC Core System standards by providing EPICS and Nominal Device Support.

  3. High performance image acquisition and processing architecture for fast plant system controllers based on FPGA and GPU

    Nieto, J., E-mail: jnieto@sec.upm.es [Grupo de Investigación en Instrumentación y Acústica Aplicada, Universidad Politécnica de Madrid, Crta. Valencia Km-7, Madrid 28031 (Spain); Sanz, D.; Guillén, P.; Esquembri, S.; Arcas, G. de; Ruiz, M. [Grupo de Investigación en Instrumentación y Acústica Aplicada, Universidad Politécnica de Madrid, Crta. Valencia Km-7, Madrid 28031 (Spain); Vega, J.; Castro, R. [Asociación EURATOM/CIEMAT para Fusión, Madrid (Spain)

    2016-11-15

    Highlights: • To test an image acquisition and processing system for Camera Link devices based in a FPGA, compliant with ITER fast controllers. • To move data acquired from the set NI1483-NIPXIe7966R directly to a NVIDIA GPU using NVIDIA GPUDirect RDMA technology. • To obtain a methodology to include GPUs processing in ITER Fast Plant Controllers, using EPICS integration through Nominal Device Support (NDS). - Abstract: The two dominant technologies that are being used in real time image processing are Field Programmable Gate Array (FPGA) and Graphical Processor Unit (GPU) due to their algorithm parallelization capabilities. But not much work has been done to standardize how these technologies can be integrated in data acquisition systems, where control and supervisory requirements are in place, such as ITER (International Thermonuclear Experimental Reactor). This work proposes an architecture, and a development methodology, to develop image acquisition and processing systems based on FPGAs and GPUs compliant with ITER fast controller solutions. A use case based on a Camera Link device connected to an FPGA DAQ device (National Instruments FlexRIO technology), and a NVIDIA Tesla GPU series card has been developed and tested. The architecture proposed has been designed to optimize system performance by minimizing data transfer operations and CPU intervention thanks to the use of NVIDIA GPUDirect RDMA and DMA technologies. This allows moving the data directly between the different hardware elements (FPGA DAQ-GPU-CPU) avoiding CPU intervention and therefore the use of intermediate CPU memory buffers. A special effort has been put to provide a development methodology that, maintaining the highest possible abstraction from the low level implementation details, allows obtaining solutions that conform to CODAC Core System standards by providing EPICS and Nominal Device Support.

  4. Money for Research, Not for Energy Bills: Finding Energy and Cost Savings in High Performance Computer Facility Designs

    Drewmark Communications; Sartor, Dale; Wilson, Mark

    2010-07-01

    High-performance computing facilities in the United States consume an enormous amount of electricity, cutting into research budgets and challenging public- and private-sector efforts to reduce energy consumption and meet environmental goals. However, these facilities can greatly reduce their energy demand through energy-efficient design of the facility itself. Using a case study of a facility under design, this article discusses strategies and technologies that can be used to help achieve energy reductions.

  5. High-Bandwidth Tactical-Network Data Analysis in a High-Performance-Computing (HPC) Environment: Packet-Level Analysis

    2015-09-01

    individual fragments using the hash-based method. In general, fragments 6 appear in order and relatively close to each other in the file. A fragment...data product derived from the data model is shown in Fig. 5, a Google Earth12 Keyhole Markup Language (KML) file. This product includes aggregate...System BLOb binary large object FPGA field-programmable gate array HPC high-performance computing IP Internet Protocol KML Keyhole Markup Language

  6. A heterogeneous hierarchical architecture for real-time computing

    Skroch, D.A.; Fornaro, R.J.

    1988-12-01

    The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchial Architecture for Real-Time (H{sup 2} ART) and system software for program loading and interprocessor communication.

  7. Implementation of the Two-Point Angular Correlation Function on a High-Performance Reconfigurable Computer

    Volodymyr V. Kindratenko

    2009-01-01

    Full Text Available We present a parallel implementation of an algorithm for calculating the two-point angular correlation function as applied in the field of computational cosmology. The algorithm has been specifically developed for a reconfigurable computer. Our implementation utilizes a microprocessor and two reconfigurable processors on a dual-MAP SRC-6 system. The two reconfigurable processors are used as two application-specific co-processors. Two independent computational kernels are simultaneously executed on the reconfigurable processors while data pre-fetching from disk and initial data pre-processing are executed on the microprocessor. The overall end-to-end algorithm execution speedup achieved by this implementation is over 90× as compared to a sequential implementation of the algorithm executed on a single 2.8 GHz Intel Xeon microprocessor.

  8. Computer-Aided Chemical Product Design Framework: Design of High Performance and Environmentally Friendly Refrigerants

    Cignitti, Stefano; Zhang, Lei; Gani, Rafiqul

    properties and needs should carefully be selected for a given heat pump cycle to ensure that an optimum refrigerant is found? How can cycle performance and environmental criteria be integrated at the product design stage and not in post-design analysis? Computer-aided product design methods enable...... the possibility of designing novel molecules, mixtures and blends, such as refrigerants through a systematic framework (Cignitti et al., 2015; Yunus et al., 2014). In this presentation a computer-aided framework is presented for chemical product design through mathematical optimization. Here, molecules, mixtures...... and blends, are systematically designed through a decomposition based solution method. Given a problem definition, computer-aided molecular design (CAMD) problem is defined, which is formulated into a mixed integer nonlinear program (MINLP). The decomposed solution method then sequentially divides the MINLP...

  9. A Protocol for Provably Secure Authentication of a Tiny Entity to a High Performance Computing One

    Siniša Tomović

    2016-01-01

    Full Text Available The problem of developing authentication protocols dedicated to a specific scenario where an entity with limited computational capabilities should prove the identity to a computationally powerful Verifier is addressed. An authentication protocol suitable for the considered scenario which jointly employs the learning parity with noise (LPN problem and a paradigm of random selection is proposed. It is shown that the proposed protocol is secure against active attacking scenarios and so called GRS man-in-the-middle (MIM attacking scenarios. In comparison with the related previously reported authentication protocols the proposed one provides reduction of the implementation complexity and at least the same level of the cryptographic security.

  10. High-performance secure multi-party computation for data mining applications

    Bogdanov, Dan; Niitsoo, Margus; Toft, Tomas

    2012-01-01

    Secure multi-party computation (MPC) is a technique well suited for privacy-preserving data mining. Even with the recent progress in two-party computation techniques such as fully homomorphic encryption, general MPC remains relevant as it has shown promising performance metrics in real...... operations such as multiplication and comparison. Secondly, the confidential processing of financial data requires the use of more complex primitives, including a secure division operation. This paper describes new protocols in the Sharemind model for secure multiplication, share conversion, equality, bit...

  11. Memristor-based nanoelectronic computing circuits and architectures

    Vourkas, Ioannis

    2016-01-01

    This book considers the design and development of nanoelectronic computing circuits, systems and architectures focusing particularly on memristors, which represent one of today’s latest technology breakthroughs in nanoelectronics. The book studies, explores, and addresses the related challenges and proposes solutions for the smooth transition from conventional circuit technologies to emerging computing memristive nanotechnologies. Its content spans from fundamental device modeling to emerging storage system architectures and novel circuit design methodologies, targeting advanced non-conventional analog/digital massively parallel computational structures. Several new results on memristor modeling, memristive interconnections, logic circuit design, memory circuit architectures, computer arithmetic systems, simulation software tools, and applications of memristors in computing are presented. High-density memristive data storage combined with memristive circuit-design paradigms and computational tools applied t...

  12. The Use of High Performance Computing (HPC) to Strengthen the Development of Army Systems

    2011-11-01

    changes in what the warfighter wants – in the middle of an acquisition cycle such changes create havoc in terms of delays, recycling of the research...A little bit later the first personal computers (PCs) came on the market, mostly as curiosities . The operating systems were either ms-dos or cp/m

  13. A High Performance Computing Framework for Physics-based Modeling and Simulation of Military Ground Vehicles

    2011-03-25

    cluster. The co-processing idea is the enabler of the heterogeneous computing concept advertised recently as the paradigm capable of delivering exascale ...Petascale to Exascale : Extending Intel’s HPC Commitment: http://download.intel.com/pressroom/archive/reference/ISC_2010_Skaugen_keynote.pdf in

  14. High Performance Parallel Processing Project: Industrial computing initiative. Progress reports for fiscal year 1995

    Koniges, A.

    1996-02-09

    This project is a package of 11 individual CRADA`s plus hardware. This innovative project established a three-year multi-party collaboration that is significantly accelerating the availability of commercial massively parallel processing computing software technology to U.S. government, academic, and industrial end-users. This report contains individual presentations from nine principal investigators along with overall program information.

  15. Department of Defense High Performance Computing Modernization Program. 2008 Annual Report

    2009-04-01

    Environment, Doug Post and the CREATE Team (K. Hill, D. van Veldhuizen , G. Zelinski, AFRL; S. Arevalo, T. Blacker, D. Fisher, P. Genalis, A. Harris, M...RF Antenna Group, David van Veldhuizen Computational Research and Engineering Acquisition Tools and Environments (CREATE), David Fisher and

  16. A high performance computing framework for physics-based modeling and simulation of military ground vehicles

    Negrut, Dan; Lamb, David; Gorsich, David

    2011-06-01

    This paper describes a software infrastructure made up of tools and libraries designed to assist developers in implementing computational dynamics applications running on heterogeneous and distributed computing environments. Together, these tools and libraries compose a so called Heterogeneous Computing Template (HCT). The heterogeneous and distributed computing hardware infrastructure is assumed herein to be made up of a combination of CPUs and Graphics Processing Units (GPUs). The computational dynamics applications targeted to execute on such a hardware topology include many-body dynamics, smoothed-particle hydrodynamics (SPH) fluid simulation, and fluid-solid interaction analysis. The underlying theme of the solution approach embraced by HCT is that of partitioning the domain of interest into a number of subdomains that are each managed by a separate core/accelerator (CPU/GPU) pair. Five components at the core of HCT enable the envisioned distributed computing approach to large-scale dynamical system simulation: (a) the ability to partition the problem according to the one-to-one mapping; i.e., spatial subdivision, discussed above (pre-processing); (b) a protocol for passing data between any two co-processors; (c) algorithms for element proximity computation; and (d) the ability to carry out post-processing in a distributed fashion. In this contribution the components (a) and (b) of the HCT are demonstrated via the example of the Discrete Element Method (DEM) for rigid body dynamics with friction and contact. The collision detection task required in frictional-contact dynamics (task (c) above), is shown to benefit on the GPU of a two order of magnitude gain in efficiency when compared to traditional sequential implementations. Note: Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not imply its endorsement, recommendation, or favoring by the United States Army. The views and

  17. High-performance parallel computing in the classroom using the public goods game as an example

    Perc, Matjaž

    2017-07-01

    The use of computers in statistical physics is common because the sheer number of equations that describe the behaviour of an entire system particle by particle often makes it impossible to solve them exactly. Monte Carlo methods form a particularly important class of numerical methods for solving problems in statistical physics. Although these methods are simple in principle, their proper use requires a good command of statistical mechanics, as well as considerable computational resources. The aim of this paper is to demonstrate how the usage of widely accessible graphics cards on personal computers can elevate the computing power in Monte Carlo simulations by orders of magnitude, thus allowing live classroom demonstration of phenomena that would otherwise be out of reach. As an example, we use the public goods game on a square lattice where two strategies compete for common resources in a social dilemma situation. We show that the second-order phase transition to an absorbing phase in the system belongs to the directed percolation universality class, and we compare the time needed to arrive at this result by means of the main processor and by means of a suitable graphics card. Parallel computing on graphics processing units has been developed actively during the last decade, to the point where today the learning curve for entry is anything but steep for those familiar with programming. The subject is thus ripe for inclusion in graduate and advanced undergraduate curricula, and we hope that this paper will facilitate this process in the realm of physics education. To that end, we provide a documented source code for an easy reproduction of presented results and for further development of Monte Carlo simulations of similar systems.

  18. Open source acceleration of wave optics simulations on energy efficient high-performance computing platforms

    Beck, Jeffrey; Bos, Jeremy P.

    2017-05-01

    We compare several modifications to the open-source wave optics package, WavePy, intended to improve execution time. Specifically, we compare the relative performance of the Intel MKL, a CPU based OpenCV distribution, and GPU-based version. Performance is compared between distributions both on the same compute platform and between a fully-featured computing workstation and the NVIDIA Jetson TX1 platform. Comparisons are drawn in terms of both execution time and power consumption. We have found that substituting the Fast Fourier Transform operation from OpenCV provides a marked improvement on all platforms. In addition, we show that embedded platforms offer some possibility for extensive improvement in terms of efficiency compared to a fully featured workstation.

  19. Heterogeneous Gpu&Cpu Cluster For High Performance Computing In Cryptography

    Michał Marks

    2012-01-01

    Full Text Available This paper addresses issues associated with distributed computing systems andthe application of mixed GPU&CPU technology to data encryption and decryptionalgorithms. We describe a heterogenous cluster HGCC formed by twotypes of nodes: Intel processor with NVIDIA graphics processing unit and AMDprocessor with AMD graphics processing unit (formerly ATI, and a novel softwareframework that hides the heterogeneity of our cluster and provides toolsfor solving complex scientific and engineering problems. Finally, we present theresults of numerical experiments. The considered case study is concerned withparallel implementations of selected cryptanalysis algorithms. The main goal ofthe paper is to show the wide applicability of the GPU&CPU technology tolarge scale computation and data processing.

  20. LIAR: A COMPUTER PROGRAM FOR THE SIMULATION AND MODELING OF HIGH PERFORMANCE LINACS

    Adolphsen, Chris

    2003-01-01

    The computer program LIAR (''LInear Accelerator Research code'') is a numerical simulation and tracking program for linear colliders. The LIAR project was started at SLAC in August 1995 in order to provide a computing and simulation tool that specifically addresses the needs of high energy linear colliders. LIAR is designed to be used for a variety of different linear accelerators. It has been applied for and checked against the existing Stanford Linear Collider (SLC) as well as the linacs of the proposed Next Linear Collider (NLC) and the proposed Linac Coherent Light Source (LCLS). The program includes wakefield effects, a 4D coupled beam description, specific optimization algorithms and other advanced features. We describe the most important concepts and highlights of the program. After having presented the LIAR program at the LINAC96 and the PAC97 conferences, we do now introduce it to the European particle accelerator community

  1. Quo vadis: Hydrologic inverse analyses using high-performance computing and a D-Wave quantum annealer

    O'Malley, D.; Vesselinov, V. V.

    2017-12-01

    Classical microprocessors have had a dramatic impact on hydrology for decades, due largely to the exponential growth in computing power predicted by Moore's law. However, this growth is not expected to continue indefinitely and has already begun to slow. Quantum computing is an emerging alternative to classical microprocessors. Here, we demonstrated cutting edge inverse model analyses utilizing some of the best available resources in both worlds: high-performance classical computing and a D-Wave quantum annealer. The classical high-performance computing resources are utilized to build an advanced numerical model that assimilates data from O(10^5) observations, including water levels, drawdowns, and contaminant concentrations. The developed model accurately reproduces the hydrologic conditions at a Los Alamos National Laboratory contamination site, and can be leveraged to inform decision-making about site remediation. We demonstrate the use of a D-Wave 2X quantum annealer to solve hydrologic inverse problems. This work can be seen as an early step in quantum-computational hydrology. We compare and contrast our results with an early inverse approach in classical-computational hydrology that is comparable to the approach we use with quantum annealing. Our results show that quantum annealing can be useful for identifying regions of high and low permeability within an aquifer. While the problems we consider are small-scale compared to the problems that can be solved with modern classical computers, they are large compared to the problems that could be solved with early classical CPUs. Further, the binary nature of the high/low permeability problem makes it well-suited to quantum annealing, but challenging for classical computers.

  2. AHPCRC (Army High Performance Computing Research Center) Bulletin. Volume 1, Issue 2

    2011-01-01

    area and the researchers working on these projects. Also inside: news from the AHPCRC consortium partners at Morgan State University and the NASA ...Computing Research Center is provided by the supercomputing and research facilities at Stanford University and at the NASA Ames Research Center at...atomic and molecular level, he said. He noted that “every general would like to have” a Star Trek -like holodeck, where holographic avatars could

  3. Tracking the NGS revolution: managing life science research on shared high-performance computing clusters.

    Dahlö, Martin; Scofield, Douglas G; Schaal, Wesley; Spjuth, Ola

    2018-05-01

    Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.

  4. High performance graphics processor based computed tomography reconstruction algorithms for nuclear and other large scale applications.

    Jimenez, Edward S. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Orr, Laurel J. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States); Thompson, Kyle R. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2013-09-01

    The goal of this work is to develop a fast computed tomography (CT) reconstruction algorithm based on graphics processing units (GPU) that achieves significant improvement over traditional central processing unit (CPU) based implementations. The main challenge in developing a CT algorithm that is capable of handling very large datasets is parallelizing the algorithm in such a way that data transfer does not hinder performance of the reconstruction algorithm. General Purpose Graphics Processing (GPGPU) is a new technology that the Science and Technology (S&T) community is starting to adopt in many fields where CPU-based computing is the norm. GPGPU programming requires a new approach to algorithm development that utilizes massively multi-threaded environments. Multi-threaded algorithms in general are difficult to optimize since performance bottlenecks occur that are non-existent in single-threaded algorithms such as memory latencies. If an efficient GPU-based CT reconstruction algorithm can be developed; computational times could be improved by a factor of 20. Additionally, cost benefits will be realized as commodity graphics hardware could potentially replace expensive supercomputers and high-end workstations. This project will take advantage of the CUDA programming environment and attempt to parallelize the task in such a way that multiple slices of the reconstruction volume are computed simultaneously. This work will also take advantage of the GPU memory by utilizing asynchronous memory transfers, GPU texture memory, and (when possible) pinned host memory so that the memory transfer bottleneck inherent to GPGPU is amortized. Additionally, this work will take advantage of GPU-specific hardware (i.e. fast texture memory, pixel-pipelines, hardware interpolators, and varying memory hierarchy) that will allow for additional performance improvements.

  5. Tracking the NGS revolution: managing life science research on shared high-performance computing clusters

    2018-01-01

    Abstract Background Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases. PMID:29659792

  6. High-Throughput Computing on High-Performance Platforms: A Case Study

    Oleynik, D [University of Texas at Arlington; Panitkin, S [Brookhaven National Laboratory (BNL); Matteo, Turilli [Rutgers University; Angius, Alessio [Rutgers University; Oral, H Sarp [ORNL; De, K [University of Texas at Arlington; Klimentov, A [Brookhaven National Laboratory (BNL); Wells, Jack C. [ORNL; Jha, S [Rutgers University

    2017-10-01

    The computing systems used by LHC experiments has historically consisted of the federation of hundreds to thousands of distributed resources, ranging from small to mid-size resource. In spite of the impressive scale of the existing distributed computing solutions, the federation of small to mid-size resources will be insufficient to meet projected future demands. This paper is a case study of how the ATLAS experiment has embraced Titan -- a DOE leadership facility in conjunction with traditional distributed high- throughput computing to reach sustained production scales of approximately 52M core-hours a years. The three main contributions of this paper are: (i) a critical evaluation of design and operational considerations to support the sustained, scalable and production usage of Titan; (ii) a preliminary characterization of a next generation executor for PanDA to support new workloads and advanced execution modes; and (iii) early lessons for how current and future experimental and observational systems can be integrated with production supercomputers and other platforms in a general and extensible manner.

  7. High-performance flat data center network architecture based on scalable and flow-controlled optical switching system

    Calabretta, Nicola; Miao, Wang; Dorren, Harm

    2016-03-01

    Traffic in data centers networks (DCNs) is steadily growing to support various applications and virtualization technologies. Multi-tenancy enabling efficient resource utilization is considered as a key requirement for the next generation DCs resulting from the growing demands for services and applications. Virtualization mechanisms and technologies can leverage statistical multiplexing and fast switch reconfiguration to further extend the DC efficiency and agility. We present a novel high performance flat DCN employing bufferless and distributed fast (sub-microsecond) optical switches with wavelength, space, and time switching operation. The fast optical switches can enhance the performance of the DCNs by providing large-capacity switching capability and efficiently sharing the data plane resources by exploiting statistical multiplexing. Benefiting from the Software-Defined Networking (SDN) control of the optical switches, virtual DCNs can be flexibly created and reconfigured by the DCN provider. Numerical and experimental investigations of the DCN based on the fast optical switches show the successful setup of virtual network slices for intra-data center interconnections. Experimental results to assess the DCN performance in terms of latency and packet loss show less than 10^-5 packet loss and 640ns end-to-end latency with 0.4 load and 16- packet size buffer. Numerical investigation on the performance of the systems when the port number of the optical switch is scaled to 32x32 system indicate that more than 1000 ToRs each with Terabit/s interface can be interconnected providing a Petabit/s capacity. The roadmap to photonic integration of large port optical switches will be also presented.

  8. HPGMG 1.0: A Benchmark for Ranking High Performance Computing Systems

    Adams, Mark; Brown, Jed; Shalf, John; Straalen, Brian Van; Strohmaier, Erich; Williams, Sam

    2014-05-05

    This document provides an overview of the benchmark ? HPGMG ? for ranking large scale general purpose computers for use on the Top500 list [8]. We provide a rationale for the need for a replacement for the current metric HPL, some background of the Top500 list and the challenges of developing such a metric; we discuss our design philosophy and methodology, and an overview of the specification of the benchmark. The primary documentation with maintained details on the specification can be found at hpgmg.org and the Wiki and benchmark code itself can be found in the repository https://bitbucket.org/hpgmg/hpgmg.

  9. Applying Machine Learning and High Performance Computing to Water Quality Assessment and Prediction

    Ruijian Zhang; Deren Li

    2017-01-01

    Water quality assessment and prediction is a more and more important issue. Traditional ways either take lots of time or they can only do assessments. In this research, by applying machine learning algorithm to a long period time of water attributes’ data; we can generate a decision tree so that it can predict the future day’s water quality in an easy and efficient way. The idea is to combine the traditional ways and the computer algorithms together. Using machine learning algorithms, the ass...

  10. ArrayBridge: Interweaving declarative array processing with high-performance computing

    Xing, Haoyuan [The Ohio State Univ., Columbus, OH (United States); Floratos, Sofoklis [The Ohio State Univ., Columbus, OH (United States); Blanas, Spyros [The Ohio State Univ., Columbus, OH (United States); Byna, Suren [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Prabhat, Prabhat [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Wu, Kesheng [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Brown, Paul [Paradigm4, Inc., Waltham, MA (United States)

    2017-05-04

    Scientists are increasingly turning to datacenter-scale computers to produce and analyze massive arrays. Despite decades of database research that extols the virtues of declarative query processing, scientists still write, debug and parallelize imperative HPC kernels even for the most mundane queries. This impedance mismatch has been partly attributed to the cumbersome data loading process; in response, the database community has proposed in situ mechanisms to access data in scientific file formats. Scientists, however, desire more than a passive access method that reads arrays from files. This paper describes ArrayBridge, a bi-directional array view mechanism for scientific file formats, that aims to make declarative array manipulations interoperable with imperative file-centric analyses. Our prototype implementation of ArrayBridge uses HDF5 as the underlying array storage library and seamlessly integrates into the SciDB open-source array database system. In addition to fast querying over external array objects, ArrayBridge produces arrays in the HDF5 file format just as easily as it can read from it. ArrayBridge also supports time travel queries from imperative kernels through the unmodified HDF5 API, and automatically deduplicates between array versions for space efficiency. Our extensive performance evaluation in NERSC, a large-scale scientific computing facility, shows that ArrayBridge exhibits statistically indistinguishable performance and I/O scalability to the native SciDB storage engine.

  11. Statistical physics of fracture: scientific discovery through high-performance computing

    Kumar, Phani; Nukala, V V; Simunovic, Srdan; Mills, Richard T

    2006-01-01

    The paper presents the state-of-the-art algorithmic developments for simulating the fracture of disordered quasi-brittle materials using discrete lattice systems. Large scale simulations are often required to obtain accurate scaling laws; however, due to computational complexity, the simulations using the traditional algorithms were limited to small system sizes. We have developed two algorithms: a multiple sparse Cholesky downdating scheme for simulating 2D random fuse model systems, and a block-circulant preconditioner for simulating 2D random fuse model systems. Using these algorithms, we were able to simulate fracture of largest ever lattice system sizes (L = 1024 in 2D, and L = 64 in 3D) with extensive statistical sampling. Our recent simulations on 1024 processors of Cray-XT3 and IBM Blue-Gene/L have further enabled us to explore fracture of 3D lattice systems of size L = 200, which is a significant computational achievement. These largest ever numerical simulations have enhanced our understanding of physics of fracture; in particular, we analyze damage localization and its deviation from percolation behavior, scaling laws for damage density, universality of fracture strength distribution, size effect on the mean fracture strength, and finally the scaling of crack surface roughness

  12. An Introduction to High Performance Fortran

    John Merlin

    1995-01-01

    Full Text Available High Performance Fortran (HPF is an informal standard for extensions to Fortran 90 to assist its implementation on parallel architectures, particularly for data-parallel computation. Among other things, it includes directives for specifying data distribution across multiple memories, and concurrent execution features. This article provides a tutorial introduction to the main features of HPF.

  13. Computer system architecture for laboratory automation

    Penney, B.K.

    1978-01-01

    This paper describes the various approaches that may be taken to provide computing resources for laboratory automation. Three distinct approaches are identified, the single dedicated small computer, shared use of a larger computer, and a distributed approach in which resources are provided by a number of computers, linked together, and working in some cooperative way. The significance of the microprocessor in laboratory automation is discussed, and it is shown that it is not simply a cheap replacement of the minicomputer. (Auth.)

  14. A computational architecture for social agents

    Bond, A.H. [California Institute of Technology, Pasadena, CA (United States)

    1996-12-31

    This article describes a new class of information-processing models for social agents. They axe derived from primate brain architecture, the processing in brain regions, the interactions among brain regions, and the social behavior of primates. In another paper, we have reviewed the neuroanatomical connections and functional involvements of cortical regions. We reviewed the evidence for a hierarchical architecture in the primate brain. By examining neuroanatomical evidence for connections among neural areas, we were able to establish anatomical regions and connections. We then examined evidence for specific functional involvements of the different neural axeas and found some support for hierarchical functioning, not only for the perception hierarchies but also for the planning and action hierarchy in the frontal lobes.

  15. On architectural acoustic design using computer simulation

    Schmidt, Anne Marie Due; Kirkegaard, Poul Henning

    2004-01-01

    properties prior to the actual construction of a building. With the right tools applied, acoustic design can become an integral part of the architectural design process. The aim of this paper is to investigate the field of application that an acoustic simulation programme can have during an architectural...... acoustic design process. The emphasis is put on the first three out of five phases in the working process of the architect and a case study is carried out in which each phase is represented by typical results ? as exemplified with reference to the design of Bagsværd Church by Jørn Utzon. The paper...... discusses the advantages and disadvantages of the programme in each phase compared to the works of architects not using acoustic simulation programmes. The conclusion of the paper is that the application of acoustic simulation programs is most beneficial in the last of three phases but an application...

  16. Operational mesoscale atmospheric dispersion prediction using high performance parallel computing cluster for emergency response

    Srinivas, C.V.; Venkatesan, R.; Muralidharan, N.V.; Das, Someshwar; Dass, Hari; Eswara Kumar, P.

    2005-08-01

    An operational atmospheric dispersion prediction system is implemented on a cluster super computer for 'Online Emergency Response' for Kalpakkam nuclear site. The numerical system constitutes a parallel version of a nested grid meso-scale meteorological model MM5 coupled to a random walk particle dispersion model FLEXPART. The system provides 48 hour forecast of the local weather and radioactive plume dispersion due to hypothetical air borne releases in a range of 100 km around the site. The parallel code was implemented on different cluster configurations like distributed and shared memory systems. Results of MM5 run time performance for 1-day prediction are reported on all the machines available for testing. A reduction of 5 times in runtime is achieved using 9 dual Xeon nodes (18 physical/36 logical processors) compared to a single node sequential run. Based on the above run time results a cluster computer facility with 9-node Dual Xeon is commissioned at IGCAR for model operation. The run time of a triple nested domain MM5 is about 4 h for 24 h forecast. The system has been operated continuously for a few months and results were ported on the IMSc home page. Initial and periodic boundary condition data for MM5 are provided by NCMRWF, New Delhi. An alternative source is found to be NCEP, USA. These two sources provide the input data to the operational models at different spatial and temporal resolutions and using different assimilation methods. A comparative study on the results of forecast is presented using these two data sources for present operational use. Slight improvement is noticed in rainfall, winds, geopotential heights and the vertical atmospheric structure while using NCEP data probably because of its high spatial and temporal resolution. (author)

  17. Technologies and tools for high-performance distributed computing. Final report

    Karonis, Nicholas T.

    2000-05-01

    In this project we studied the practical use of the MPI message-passing interface in advanced distributed computing environments. We built on the existing software infrastructure provided by the Globus Toolkit{trademark}, the MPICH portable implementation of MPI, and the MPICH-G integration of MPICH with Globus. As a result of this project we have replaced MPICH-G with its successor MPICH-G2, which is also an integration of MPICH with Globus. MPICH-G2 delivers significant improvements in message passing performance when compared to its predecessor MPICH-G and was based on superior software design principles resulting in a software base that was much easier to make the functional extensions and improvements we did. Using Globus services we replaced the default implementation of MPI's collective operations in MPICH-G2 with more efficient multilevel topology-aware collective operations which, in turn, led to the development of a new timing methodology for broadcasts [8]. MPICH-G2 was extended to include client/server functionality from the MPI-2 standard [23] to facilitate remote visualization applications and, through the use of MPI idioms, MPICH-G2 provided application-level control of quality-of-service parameters as well as application-level discovery of underlying Grid-topology information. Finally, MPICH-G2 was successfully used in a number of applications including an award-winning record-setting computation in numerical relativity. In the sections that follow we describe in detail the accomplishments of this project, we present experimental results quantifying the performance improvements, and conclude with a discussion of our applications experiences. This project resulted in a significant increase in the utility of MPICH-G2.

  18. Exploiting Data Intensive Applications on High Performance Computers to Unlock Australia's Landsat Archive

    Purss, Matthew; Lewis, Adam; Edberg, Roger; Ip, Alex; Sixsmith, Joshua; Frankish, Glenn; Chan, Tai; Evans, Ben; Hurst, Lachlan

    2013-04-01

    Australia's Earth Observation Program has downlinked and archived satellite data acquired under the NASA Landsat mission for the Australian Government since the establishment of the Australian Landsat Station in 1979. Geoscience Australia maintains this archive and produces image products to aid the delivery of government policy objectives. Due to the labor intensive nature of processing of this data there have been few national-scale datasets created to date. To compile any Earth Observation product the historical approach has been to select the required subset of data and process "scene by scene" on an as-needed basis. As data volumes have increased over time, and the demand for the processed data has also grown, it has become increasingly difficult to rapidly produce these products and achieve satisfactory policy outcomes using these historic processing methods. The result is that we have been "drowning in a sea of uncalibrated data" and scientists, policy makers and the public have not been able to realize the full potential of the Australian Landsat Archive and its value is therefore significantly diminished. To overcome this critical issue, the Australian Space Research Program has funded the "Unlocking the Landsat Archive" (ULA) Project from April 2011 to June 2013 to improve the access and utilization of Australia's archive of Landsat data. The ULA Project is a public-private consortium led by Lockheed Martin Australia (LMA) and involving Geoscience Australia (GA), the Victorian Partnership for Advanced Computing (VPAC), the National Computational Infrastructure (NCI) at the Australian National University (ANU) and the Cooperative Research Centre for Spatial Information (CRC-SI). The outputs from the ULA project will become a fundamental component of Australia's eResearch infrastructure, with the Australian Landsat Archive hosted on the NCI and made openly available under a creative commons license. NCI provides access to researchers through significant HPC

  19. Simulation and high performance computing-Building a predictive capability for fusion

    Strand, P.I.; Coelho, R.; Coster, D.; Eriksson, L.-G.; Imbeaux, F.; Guillerminet, Bernard

    2010-01-01

    The Integrated Tokamak Modelling Task Force (ITM-TF) is developing an infrastructure where the validation needs, as being formulated in terms of multi-device data access and detailed physics comparisons aiming for inclusion of synthetic diagnostics in the simulation chain, are key components. As the activity and the modelling tools are aimed for general use, although focused on ITER plasmas, a device independent approach to data transport and a standardized approach to data management (data structures, naming, and access) is being developed in order to allow cross-validation between different fusion devices using a single toolset. Extensive work has already gone into, and is continuing to go into, the development of standardized descriptions of the data (Consistent Physical Objects). The longer term aim is a complete simulation platform which is expected to last and be extended in different ways for the coming 30 years. The technical underpinning is therefore of vital importance. In particular the platform needs to be extensible and open-ended to be able to take full advantage of not only today's most advanced technologies but also be able to marshal future developments. As a full level comprehensive prediction of ITER physics rapidly becomes expensive in terms of computing resources, the simulation framework needs to be able to use both grid and HPC computing facilities. Hence data access and code coupling technologies are required to be available for a heterogeneous, possibly distributed, environment. The developments in this area are pursued in a separate project-EUFORIA (EU Fusion for ITER Applications) which is providing about 15 professional person year (ppy) per annum from 14 different institutes. The range and size of the activity is not only technically challenging but is providing some unique management challenges in that a large and geographically distributed team (a truly pan-European set of researchers) need to be coordinated on a fairly detailed

  20. A high performance, low power computational platform for complex sensing operations in smart cities

    Jiang, Jiming; Claudel, Christian

    2017-01-01

    This paper presents a new wireless platform designed for an integrated traffic/flash flood monitoring system. The sensor platform is built around a 32-bit ARM Cortex M4 microcontroller and a 2.4GHz 802.15.4802.15.4 ISM compliant radio module. It can be interfaced with fixed traffic sensors, or receive data from vehicle transponders. This platform is specifically designed for solar-powered, low bandwidth, high computational performance wireless sensor network applications. A self-recovering unit is designed to increase reliability and allow periodic hard resets, an essential requirement for sensor networks. A radio monitoring circuitry is proposed to monitor incoming and outgoing transmissions, simplifying software debugging. We illustrate the performance of this wireless sensor platform on complex problems arising in smart cities, such as traffic flow monitoring, machine-learning-based flash flood monitoring or Kalman-filter based vehicle trajectory estimation. All design files have been uploaded and shared in an open science framework, and can be accessed from [1]. The hardware design is under CERN Open Hardware License v1.2.

  1. A high performance, low power computational platform for complex sensing operations in smart cities

    Jiang, Jiming

    2017-02-02

    This paper presents a new wireless platform designed for an integrated traffic/flash flood monitoring system. The sensor platform is built around a 32-bit ARM Cortex M4 microcontroller and a 2.4GHz 802.15.4802.15.4 ISM compliant radio module. It can be interfaced with fixed traffic sensors, or receive data from vehicle transponders. This platform is specifically designed for solar-powered, low bandwidth, high computational performance wireless sensor network applications. A self-recovering unit is designed to increase reliability and allow periodic hard resets, an essential requirement for sensor networks. A radio monitoring circuitry is proposed to monitor incoming and outgoing transmissions, simplifying software debugging. We illustrate the performance of this wireless sensor platform on complex problems arising in smart cities, such as traffic flow monitoring, machine-learning-based flash flood monitoring or Kalman-filter based vehicle trajectory estimation. All design files have been uploaded and shared in an open science framework, and can be accessed from [1]. The hardware design is under CERN Open Hardware License v1.2.

  2. The visual simulators for architecture and computer organization learning

    Nikolić Boško; Grbanović Nenad; Đorđević Jovan

    2009-01-01

    The paper proposes a method of an effective distance learning of architecture and computer organization. The proposed method is based on a software system that is possible to be applied in any course in this field. Within this system students are enabled to observe simulation of already created computer systems. The system provides creation and simulation of switch systems, too.

  3. A memory-array architecture for computer vision

    Balsara, P.T.

    1989-01-01

    With the fast advances in the area of computer vision and robotics there is a growing need for machines that can understand images at a very high speed. A conventional von Neumann computer is not suited for this purpose because it takes a tremendous amount of time to solve most typical image processing problems. Exploiting the inherent parallelism present in various vision tasks can significantly reduce the processing time. Fortunately, parallelism is increasingly affordable as hardware gets cheaper. Thus it is now imperative to study computer vision in a parallel processing framework. The author should first design a computational structure which is well suited for a wide range of vision tasks and then develop parallel algorithms which can run efficiently on this structure. Recent advances in VLSI technology have led to several proposals for parallel architectures for computer vision. In this thesis he demonstrates that a memory array architecture with efficient local and global communication capabilities can be used for high speed execution of a wide range of computer vision tasks. This architecture, called the Access Constrained Memory Array Architecture (ACMAA), is efficient for VLSI implementation because of its modular structure, simple interconnect and limited global control. Several parallel vision algorithms have been designed for this architecture. The choice of vision problems demonstrates the versatility of ACMAA for a wide range of vision tasks. These algorithms were simulated on a high level ACMAA simulator running on the Intel iPSC/2 hypercube, a parallel architecture. The results of this simulation are compared with those of sequential algorithms running on a single hypercube node. Details of the ACMAA processor architecture are also presented.

  4. Optoelectronic Computer Architecture Development for Image Reconstruction

    Forber, Richard

    1996-01-01

    .... Specifically, we collaborated with UCSD and ERIM on the development of an optically augmented electronic computer for high speed inverse transform calculations to enable real time image reconstruction...

  5. Architecture independent environment for developing engineering software on MIMD computers

    Valimohamed, Karim A.; Lopez, L. A.

    1990-01-01

    Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management.

  6. Heavy Lift Vehicle (HLV) Avionics Flight Computing Architecture Study

    Hodson, Robert F.; Chen, Yuan; Morgan, Dwayne R.; Butler, A. Marc; Sdhuh, Joseph M.; Petelle, Jennifer K.; Gwaltney, David A.; Coe, Lisa D.; Koelbl, Terry G.; Nguyen, Hai D.

    2011-01-01

    A NASA multi-Center study team was assembled from LaRC, MSFC, KSC, JSC and WFF to examine potential flight computing architectures for a Heavy Lift Vehicle (HLV) to better understand avionics drivers. The study examined Design Reference Missions (DRMs) and vehicle requirements that could impact the vehicles avionics. The study considered multiple self-checking and voting architectural variants and examined reliability, fault-tolerance, mass, power, and redundancy management impacts. Furthermore, a goal of the study was to develop the skills and tools needed to rapidly assess additional architectures should requirements or assumptions change.

  7. Design of Carborane Molecular Architectures via Electronic Structure Computations

    Oliva, J.M.; Serrano-Andres, L.; Klein, D.J.; Schleyer, P.V.R.; Mich, J.

    2009-01-01

    Quantum-mechanical electronic structure computations were employed to explore initial steps towards a comprehensive design of poly carborane architectures through assembly of molecular units. Aspects considered were (i) the striking modification of geometrical parameters through substitution, (ii) endohedral carboranes and proposed ejection mechanisms for energy/ion/atom/energy storage/transport, (iii) the excited state character in single and dimeric molecular units, and (iv) higher architectural constructs. A goal of this work is to find optimal architectures where atom/ion/energy/spin transport within carborane superclusters is feasible in order to modernize and improve future photo energy processes.

  8. Coordinated Fault-Tolerance for High-Performance Computing Final Project Report

    Panda, Dhabaleswar Kumar [The Ohio State University; Beckman, Pete

    2011-07-28

    existing publish-subscribe tools. We enhanced the intrinsic fault tolerance capabilities representative implementations of a variety of key HPC software subsystems and integrated them with the FTB. Targeting software subsystems included: MPI communication libraries, checkpoint/restart libraries, resource managers and job schedulers, and system monitoring tools. Leveraging the aforementioned infrastructure, as well as developing and utilizing additional tools, we have examined issues associated with expanded, end-to-end fault response from both system and application viewpoints. From the standpoint of system operations, we have investigated log and root cause analysis, anomaly detection and fault prediction, and generalized notification mechanisms. Our applications work has included libraries for fault-tolerance linear algebra, application frameworks for coupled multiphysics applications, and external frameworks to support the monitoring and response for general applications. Our final goal was to engage the high-end computing community to increase awareness of tools and issues around coordinated end-to-end fault management.

  9. Architectural analysis for wirelessly powered computing platforms

    Kapoor, A.; Pineda de Gyvez, J.

    2013-01-01

    We present a design framework for wirelessly powered generic computing platforms that takes into account various system parameters in response to a time-varying energy source. These parameters are the charging profile of the energy source, computing speed (fclk), digital supply voltage (VDD), energy

  10. MOMCC: Market-Oriented Architecture for Mobile Cloud Computing Based on Service Oriented Architecture

    Abolfazli, Saeid; Sanaei, Zohreh; Gani, Abdullah; Shiraz, Muhammad

    2012-01-01

    The vision of augmenting computing capabilities of mobile devices, especially smartphones with least cost is likely transforming to reality leveraging cloud computing. Cloud exploitation by mobile devices breeds a new research domain called Mobile Cloud Computing (MCC). However, issues like portability and interoperability should be addressed for mobile augmentation which is a non-trivial task using component-based approaches. Service Oriented Architecture (SOA) is a promising design philosop...

  11. Development of three-dimensional neoclassical transport simulation code with high performance Fortran on a vector-parallel computer

    Satake, Shinsuke; Okamoto, Masao; Nakajima, Noriyoshi; Takamaru, Hisanori

    2005-11-01

    A neoclassical transport simulation code (FORTEC-3D) applicable to three-dimensional configurations has been developed using High Performance Fortran (HPF). Adoption of computing techniques for parallelization and a hybrid simulation model to the δf Monte-Carlo method transport simulation, including non-local transport effects in three-dimensional configurations, makes it possible to simulate the dynamism of global, non-local transport phenomena with a self-consistent radial electric field within a reasonable computation time. In this paper, development of the transport code using HPF is reported. Optimization techniques in order to achieve both high vectorization and parallelization efficiency, adoption of a parallel random number generator, and also benchmark results, are shown. (author)

  12. Cloud Computing Security in Openstack Architecture: General Overview

    Gleb Igorevich Shakulo

    2015-10-01

    Full Text Available The subject of article is cloud computing security. Article begins with author analyzing cloud computing advantages and disadvantages, factors of growth, both positive and negative. Among latter, security is deemed one of the most prominent. Furthermore, author takes architecture of OpenStack project as an example for study: describes its essential components and their interconnection. As conclusion, author raises series of questions as possible areas of further research to resolve security concerns, thus making cloud computing more secure technology.

  13. Building Honeycomb-Like Hollow Microsphere Architecture in a Bubble Template Reaction for High-Performance Lithium-Rich Layered Oxide Cathode Materials.

    Chen, Zhaoyong; Yan, Xiaoyan; Xu, Ming; Cao, Kaifeng; Zhu, Huali; Li, Lingjun; Duan, Junfei

    2017-09-13

    In the family of high-performance cathode materials for lithium-ion batteries, lithium-rich layered oxides come out in front because of a high reversible capacity exceeding 250 mAh g -1 . However, the long-term energy retention and high energy densities for lithium-rich layered oxide cathode materials require a stable structure with large surface areas. Here we propose a "bubble template" reaction to build "honeycomb-like" hollow microsphere architecture for a Li 1.2 Mn 0.52 Ni 0.2 Co 0.08 O 2 cathode material. Our material is designed with ca. 8-μm-sized secondary particles with hollow and highly exposed porous structures that promise a large flexible volume to achieve superior structure stability and high rate capability. Our preliminary electrochemical experiments show a high capacity of 287 mAh g -1 at 0.1 C and a capacity retention of 96% after 100 cycles at 1.0 C. Furthermore, the rate capability is superior without any other modifications, reaching 197 mAh g -1 at 3.0 C with a capacity retention of 94% after 100 cycles. This approach may shed light on a new material engineering for high-performance cathode materials.

  14. The NCI High Performance Computing (HPC) and High Performance Data (HPD) Platform to Support the Analysis of Petascale Environmental Data Collections

    Evans, B. J. K.; Pugh, T.; Wyborn, L. A.; Porter, D.; Allen, C.; Smillie, J.; Antony, J.; Trenham, C.; Evans, B. J.; Beckett, D.; Erwin, T.; King, E.; Hodge, J.; Woodcock, R.; Fraser, R.; Lescinsky, D. T.

    2014-12-01

    The National Computational Infrastructure (NCI) has co-located a priority set of national data assets within a HPC research platform. This powerful in-situ computational platform has been created to help serve and analyse the massive amounts of data across the spectrum of environmental collections - in particular the climate, observational data and geoscientific domains. This paper examines the infrastructure, innovation and opportunity for this significant research platform. NCI currently manages nationally significant data collections (10+ PB) categorised as 1) earth system sciences, climate and weather model data assets and products, 2) earth and marine observations and products, 3) geosciences, 4) terrestrial ecosystem, 5) water management and hydrology, and 6) astronomy, social science and biosciences. The data is largely sourced from the NCI partners (who include the custodians of many of the national scientific records), major research communities, and collaborating overseas organisations. By co-locating these large valuable data assets, new opportunities have arisen by harmonising the data collections, making a powerful transdisciplinary research platformThe data is accessible within an integrated HPC-HPD environment - a 1.2 PFlop supercomputer (Raijin), a HPC class 3000 core OpenStack cloud system and several highly connected large scale and high-bandwidth Lustre filesystems. New scientific software, cloud-scale techniques, server-side visualisation and data services have been harnessed and integrated into the platform, so that analysis is performed seamlessly across the traditional boundaries of the underlying data domains. Characterisation of the techniques along with performance profiling ensures scalability of each software component, all of which can either be enhanced or replaced through future improvements. A Development-to-Operations (DevOps) framework has also been implemented to manage the scale of the software complexity alone. This ensures that

  15. Layered Architectures for Quantum Computers and Quantum Repeaters

    Jones, Nathan C.

    This chapter examines how to organize quantum computers and repeaters using a systematic framework known as layered architecture, where machine control is organized in layers associated with specialized tasks. The framework is flexible and could be used for analysis and comparison of quantum information systems. To demonstrate the design principles in practice, we develop architectures for quantum computers and quantum repeaters based on optically controlled quantum dots, showing how a myriad of technologies must operate synchronously to achieve fault-tolerance. Optical control makes information processing in this system very fast, scalable to large problem sizes, and extendable to quantum communication.

  16. Architectural design for a topological cluster state quantum computer

    Devitt, Simon J; Munro, William J; Nemoto, Kae; Fowler, Austin G; Stephens, Ashley M; Greentree, Andrew D; Hollenberg, Lloyd C L

    2009-01-01

    The development of a large scale quantum computer is a highly sought after goal of fundamental research and consequently a highly non-trivial problem. Scalability in quantum information processing is not just a problem of qubit manufacturing and control but it crucially depends on the ability to adapt advanced techniques in quantum information theory, such as error correction, to the experimental restrictions of assembling qubit arrays into the millions. In this paper, we introduce a feasible architectural design for large scale quantum computation in optical systems. We combine the recent developments in topological cluster state computation with the photonic module, a simple chip-based device that can be used as a fundamental building block for a large-scale computer. The integration of the topological cluster model with this comparatively simple operational element addresses many significant issues in scalable computing and leads to a promising modular architecture with complete integration of active error correction, exhibiting high fault-tolerant thresholds.

  17. Computer Architecture Techniques for Power-Efficiency

    Kaxiras, Stefanos

    2008-01-01

    In the last few years, power dissipation has become an important design constraint, on par with performance, in the design of new computer systems. Whereas in the past, the primary job of the computer architect was to translate improvements in operating frequency and transistor count into performance, now power efficiency must be taken into account at every step of the design process. While for some time, architects have been successful in delivering 40% to 50% annual improvement in processor performance, costs that were previously brushed aside eventually caught up. The most critical of these

  18. Centaure: an heterogeneous parallel architecture for computer vision

    Peythieux, Marc

    1997-01-01

    This dissertation deals with the architecture of parallel computers dedicated to computer vision. In the first chapter, the problem to be solved is presented, as well as the architecture of the Sympati and Symphonie computers, on which this work is based. The second chapter is about the state of the art of computers and integrated processors that can execute computer vision and image processing codes. The third chapter contains a description of the architecture of Centaure. It has an heterogeneous structure: it is composed of a multiprocessor system based on Analog Devices ADSP21060 Sharc digital signal processor, and of a set of Symphonie computers working in a multi-SIMD fashion. Centaure also has a modular structure. Its basic node is composed of one Symphonie computer, tightly coupled to a Sharc thanks to a dual ported memory. The nodes of Centaure are linked together by the Sharc communication links. The last chapter deals with a performance validation of Centaure. The execution times on Symphonie and on Centaure of a benchmark which is typical of industrial vision, are presented and compared. In the first place, these results show that the basic node of Centaure allows a faster execution than Symphonie, and that increasing the size of the tested computer leads to a better speed-up with Centaure than with Symphonie. In the second place, these results validate the choice of running the low level structure of Centaure in a multi- SIMD fashion. (author) [fr

  19. Neuromorphic Computing – From Materials Research to Systems Architecture Roundtable

    Schuller, Ivan K. [Univ. of California, San Diego, CA (United States); Stevens, Rick [Argonne National Lab. (ANL), Argonne, IL (United States); Univ. of Chicago, IL (United States); Pino, Robinson [Dept. of Energy (DOE) Office of Science, Washington, DC (United States); Pechan, Michael [Dept. of Energy (DOE) Office of Science, Washington, DC (United States)

    2015-10-29

    Computation in its many forms is the engine that fuels our modern civilization. Modern computation—based on the von Neumann architecture—has allowed, until now, the development of continuous improvements, as predicted by Moore’s law. However, computation using current architectures and materials will inevitably—within the next 10 years—reach a limit because of fundamental scientific reasons. DOE convened a roundtable of experts in neuromorphic computing systems, materials science, and computer science in Washington on October 29-30, 2015 to address the following basic questions: Can brain-like (“neuromorphic”) computing devices based on new material concepts and systems be developed to dramatically outperform conventional CMOS based technology? If so, what are the basic research challenges for materials sicence and computing? The overarching answer that emerged was: The development of novel functional materials and devices incorporated into unique architectures will allow a revolutionary technological leap toward the implementation of a fully “neuromorphic” computer. To address this challenge, the following issues were considered: The main differences between neuromorphic and conventional computing as related to: signaling models, timing/clock, non-volatile memory, architecture, fault tolerance, integrated memory and compute, noise tolerance, analog vs. digital, and in situ learning New neuromorphic architectures needed to: produce lower energy consumption, potential novel nanostructured materials, and enhanced computation Device and materials properties needed to implement functions such as: hysteresis, stability, and fault tolerance Comparisons of different implementations: spin torque, memristors, resistive switching, phase change, and optical schemes for enhanced breakthroughs in performance, cost, fault tolerance, and/or manufacturability.

  20. Development of a Computational Steering Framework for High Performance Computing Environments on Blue Gene/P Systems

    Danani, Bob K.

    2012-01-01

    of simulation results in a post-processing step is now transformed into a real-time interactive workflow that significantly reduces development and testing time. Computational steering provides the capability to direct or re-direct the progress of a simulation