WorldWideScience

Sample records for hardware accelerated scalable

  1. Hardware Accelerated Sequence Alignment with Traceback

    Directory of Open Access Journals (Sweden)

    Scott Lloyd

    2009-01-01

    in a timely manner. Known methods to accelerate alignment on reconfigurable hardware only address sequence comparison, limit the sequence length, or exhibit memory and I/O bottlenecks. A space-efficient, global sequence alignment algorithm and architecture is presented that accelerates the forward scan and traceback in hardware without memory and I/O limitations. With 256 processing elements in FPGA technology, a performance gain over 300 times that of a desktop computer is demonstrated on sequence lengths of 16000. For greater performance, the architecture is scalable to more processing elements.
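
    For reference, the computation being accelerated is a global alignment recurrence followed by a traceback. Below is a minimal software sketch of that computation in plain Python (a textbook Needleman-Wunsch formulation with illustrative scoring values, not the space-efficient algorithm or the 256-element FPGA architecture of the paper):

        # Minimal Needleman-Wunsch global alignment with traceback.
        # Scoring values are illustrative; this is the plain O(n*m)-memory version.
        def global_align(a, b, match=2, mismatch=-1, gap=-2):
            n, m = len(a), len(b)
            score = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(1, n + 1):
                score[i][0] = i * gap
            for j in range(1, m + 1):
                score[0][j] = j * gap
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    score[i][j] = max(score[i - 1][j - 1] + s,   # diagonal move
                                      score[i - 1][j] + gap,      # gap in b
                                      score[i][j - 1] + gap)      # gap in a
            # Traceback from the bottom-right corner to recover the alignment.
            ai, bi, i, j = [], [], n, m
            while i > 0 or j > 0:
                s = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
                if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
                    ai.append(a[i - 1]); bi.append(b[j - 1]); i -= 1; j -= 1
                elif i > 0 and score[i][j] == score[i - 1][j] + gap:
                    ai.append(a[i - 1]); bi.append('-'); i -= 1
                else:
                    ai.append('-'); bi.append(b[j - 1]); j -= 1
            return score[n][m], ''.join(reversed(ai)), ''.join(reversed(bi))

        print(global_align("GATTACA", "GCATGCU"))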

  2. A Framework for Hardware-Accelerated Services Using Partially Reconfigurable SoCs

    Directory of Open Access Journals (Sweden)

    MACHIDON, O. M.

    2016-05-01

    The current trend towards "Everything as a Service" fosters a new approach to reconfigurable hardware resources. This innovative, service-oriented approach has the potential of bringing a series of benefits to both the reconfigurable and distributed computing fields by favoring hardware-based acceleration of web services and increasing service performance. This paper proposes a framework for accelerating web services by offloading the compute-intensive tasks to reconfigurable System-on-Chip (SoC) devices as integrated IP (Intellectual Property) cores. The framework provides scalable, dynamic management of the tasks and hardware processing cores, based on dynamic partial reconfiguration of the SoC. We have enhanced the security of the entire system by making use of the built-in detection features of the hardware device and also by implementing active counter-measures that protect the sensitive data.

  3. Programming time-multiplexed reconfigurable hardware using a scalable neuromorphic compiler.

    Science.gov (United States)

    Minkovich, Kirill; Srinivasa, Narayan; Cruz-Albrecht, Jose M; Cho, Youngkwan; Nogin, Aleksey

    2012-06-01

    Scalability and connectivity are two key challenges in designing neuromorphic hardware that can match biological levels. In this paper, we describe a neuromorphic system architecture design that addresses an approach to meet these challenges using traditional complementary metal-oxide-semiconductor (CMOS) hardware. A key requirement in realizing such neural architectures in hardware is the ability to automatically configure the hardware to emulate any neural architecture or model. The focus for this paper is to describe the details of such a programmable front-end. This programmable front-end is composed of a neuromorphic compiler and a digital memory, and is designed based on the concept of synaptic time-multiplexing (STM). The neuromorphic compiler automatically translates any given neural architecture to hardware switch states and these states are stored in digital memory to enable desired neural architectures. STM enables our proposed architecture to address scalability and connectivity using traditional CMOS hardware. We describe the details of the proposed design and the programmable front-end, and provide examples to illustrate its capabilities. We also provide perspectives for future extensions and potential applications.

  4. Scalable fast multipole accelerated vortex methods

    KAUST Repository

    Hu, Qi

    2014-05-01

    The fast multipole method (FMM) is often used to accelerate the calculation of particle interactions in particle-based methods for simulating incompressible flows. To evaluate the most time-consuming kernels - the Biot-Savart equation and the stretching term of the vorticity equation - we mathematically reformulated them so that only two Laplace scalar potentials are used instead of six, which automatically ensures a divergence-free far-field computation. Based on this formulation, we developed a new FMM-based vortex method on heterogeneous architectures, which distributes the work between multicore CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm uses new data structures which can dynamically manage inter-node communication and load balance efficiently, with only a small parallel construction overhead. This algorithm can scale to large-sized clusters, showing both strong and weak scalability. Careful error and timing trade-off analyses are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching calculation for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s.
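
    The pairwise kernel that the FMM accelerates here is the Biot-Savart velocity induced by vortex particles. A brute-force O(N^2) NumPy sketch of that kernel is shown below for orientation (sign and regularization conventions are assumptions, and the paper's reformulation and FMM tree are not reflected):

        import numpy as np

        def biot_savart_direct(x, alpha, eps=1e-3):
            """Velocity induced at positions x (N,3) by vortex particles with
            vector strengths alpha (N,3), via the regularized direct sum
            u(x_i) = -1/(4*pi) * sum_j (x_i - x_j) x alpha_j / (|x_i - x_j|^2 + eps^2)^(3/2).
            The FMM replaces exactly this kind of O(N^2) summation."""
            r = x[:, None, :] - x[None, :, :]              # (N, N, 3) separations
            d2 = np.sum(r * r, axis=-1) + eps**2           # regularized squared distances
            k = -1.0 / (4.0 * np.pi * d2**1.5)             # scalar kernel per pair
            return np.einsum('ij,ijk->ik', k, np.cross(r, alpha[None, :, :]))

        rng = np.random.default_rng(0)
        x = rng.random((1000, 3))
        alpha = rng.standard_normal((1000, 3)) * 1e-3
        print(biot_savart_direct(x, alpha).shape)          # (1000, 3) velocities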

  5. Implementation of Hardware Accelerators on Zynq

    DEFF Research Database (Denmark)

    Toft, Jakob Kenn

    In recent years it has become obvious that the performance of general purpose processors is having trouble meeting the requirements of today's high performance computing applications. This is partly due to the relatively high power consumption, compared to the performance, of general purpose processors, which has made hardware accelerators an essential part of several datacentres and the world's fastest supercomputers. In this work, two different hardware accelerators were implemented on a Xilinx Zynq SoC platform mounted on the ZedBoard platform. The two accelerators are based on two different ... of the ARM Cortex-A9 processor featured on the Zynq SoC, with regard to execution time, power dissipation and energy consumption. The implementation of the hardware accelerators was successful. Use of the Monte Carlo processor resulted in a significant increase in performance. The Telco hardware accelerator ...

  6. Evaluating the scalability of HEP software and multi-core hardware

    CERN Document Server

    Jarp, S; Leduc, J; Nowak, A

    2011-01-01

    As researchers have reached the practical limits of processor performance improvements by frequency scaling, it is clear that the future of computing lies in the effective utilization of parallel and multi-core architectures. Since this significant change in computing is well underway, it is vital for HEP programmers to understand the scalability of their software on modern hardware and the opportunities for potential improvements. This work aims to quantify the benefit of new mainstream architectures to the HEP community through practical benchmarking on recent hardware solutions, including the usage of parallelized HEP applications.

  7. Hardware Accelerators Targeting a Novel Group Based Packet Classification Algorithm

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2013-01-01

    Packet classification is a ubiquitous and key building block for many critical network devices. However, it remains one of the main bottlenecks in designing fast network devices. In this paper, we propose a novel Group Based Search packet classification Algorithm (GBSA) that is scalable, fast, and efficient. GBSA consumes an average of 0.4 megabytes of memory for a 10 k rule set. The worst-case classification time per packet is 2 microseconds, and the preprocessing speed is 3 M rules/second on a Xeon processor operating at 3.4 GHz. When compared with other state-of-the-art classification techniques, the results showed that GBSA outperforms the competition with respect to speed, memory usage, and processing time. Moreover, GBSA is amenable to implementation in hardware. Three different hardware implementations are also presented in this paper, including an Application Specific Instruction Set Processor (ASIP) implementation and two pure Register-Transfer Level (RTL) implementations based on Impulse-C and Handel-C flows, respectively. Speedups achieved with these hardware accelerators ranged from 9x to 18x compared with a pure software implementation running on a Xeon processor.
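
    To illustrate the general idea of group-based classification (partition the rule set into groups so a packet is matched only against its own group), here is a toy Python sketch; it is not the published GBSA heuristic, and the rule fields and grouping key are illustrative assumptions:

        # Toy group-based packet classification: rules are partitioned by protocol,
        # so a packet is matched only against its own group, not the whole rule set.
        # This is NOT the published GBSA heuristic, just the general idea.
        from collections import defaultdict

        RULES = [
            # (rule_id, proto, dst_port_range, action)
            (1, 'tcp', (80, 80),   'allow'),
            (2, 'tcp', (443, 443), 'allow'),
            (3, 'udp', (53, 53),   'allow'),
            (4, 'tcp', (0, 1023),  'deny'),
        ]

        groups = defaultdict(list)
        for rule in RULES:
            groups[rule[1]].append(rule)          # group rules by protocol

        def classify(proto, dst_port):
            for rule_id, _, (lo, hi), action in groups.get(proto, []):
                if lo <= dst_port <= hi:
                    return rule_id, action        # first matching rule in the group
            return None, 'default-deny'

        print(classify('tcp', 443))   # -> (2, 'allow')
        print(classify('udp', 123))   # -> (None, 'default-deny')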

  8. Hardware Accelerated Simulated Radiography

    International Nuclear Information System (INIS)

    Laney, D; Callahan, S; Max, N; Silva, C; Langer, S; Frank, R

    2005-01-01

    We present the application of hardware-accelerated volume rendering algorithms to the simulation of radiographs as an aid to scientists designing experiments, validating simulation codes, and understanding experimental data. The techniques presented take advantage of 32-bit floating point texture capabilities to obtain validated solutions to the radiative transport equation for X-rays. An unsorted hexahedron projection algorithm is presented for curvilinear hexahedra that produces simulated radiographs in the absorption-only regime. A sorted tetrahedral projection algorithm is presented that simulates radiographs of emissive materials. We apply the tetrahedral projection algorithm to the simulation of experimental diagnostics for inertial confinement fusion experiments on a laser at the University of Rochester. We show that the hardware-accelerated solution is faster than the current technique used by scientists.

  9. Hardware-Accelerated Simulated Radiography

    International Nuclear Information System (INIS)

    Laney, D; Callahan, S; Max, N; Silva, C; Langer, S.; Frank, R

    2005-01-01

    We present the application of hardware-accelerated volume rendering algorithms to the simulation of radiographs as an aid to scientists designing experiments, validating simulation codes, and understanding experimental data. The techniques presented take advantage of 32-bit floating point texture capabilities to obtain solutions to the radiative transport equation for X-rays. The hardware-accelerated solutions are accurate enough to enable scientists to explore the experimental design space with greater efficiency than the methods currently in use. An unsorted hexahedron projection algorithm is presented for curvilinear hexahedral meshes that produces simulated radiographs in the absorption-only regime. A sorted tetrahedral projection algorithm is presented that simulates radiographs of emissive materials. We apply the tetrahedral projection algorithm to the simulation of experimental diagnostics for inertial confinement fusion experiments on a laser at the University of Rochester.
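
    In the absorption-only regime, each detector pixel records the beam intensity attenuated along its ray according to the Beer-Lambert law. A minimal CPU sketch of that line integral on a regular grid is given below (illustrative only; it is not the hexahedron projection algorithm or the GPU implementation of the paper):

        import numpy as np

        def absorption_radiograph(mu, dz, i0=1.0):
            """Simulate a parallel-beam, absorption-only radiograph.
            mu: attenuation coefficients on a regular grid, shape (nx, ny, nz);
            dz: path length per voxel along the beam (z) direction.
            Each detector pixel (x, y) records I = I0 * exp(-sum_z mu * dz)."""
            optical_depth = mu.sum(axis=2) * dz          # line integral along z
            return i0 * np.exp(-optical_depth)

        mu = np.zeros((64, 64, 64))
        mu[24:40, 24:40, 24:40] = 0.05                   # an absorbing cube
        image = absorption_radiograph(mu, dz=1.0)
        print(image.min(), image.max())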

  10. Flexible hardware design for RSA and Elliptic Curve Cryptosystems

    NARCIS (Netherlands)

    Batina, L.; Bruin - Muurling, G.; Örs, S.B.; Okamoto, T.

    2004-01-01

    This paper presents a scalable hardware implementation of both commonly used public key cryptosystems, RSA and Elliptic Curve Cryptosystem (ECC) on the same platform. The introduced hardware accelerator features a design which can be varied from very small (less than 20 Kgates) targeting wireless

  11. Design of hardware accelerators for demanding applications.

    NARCIS (Netherlands)

    Jozwiak, L.; Jan, Y.

    2010-01-01

    This paper focuses on mastering the architecture development of hardware accelerators. It presents the results of our analysis of the main issues that have to be addressed when designing accelerators for modern demanding applications, when using as an example the accelerator design for LDPC decoding

  12. A Hardware Framework for on-Chip FPGA Acceleration

    DEFF Research Database (Denmark)

    Lomuscio, Andrea; Cardarilli, Gian Carlo; Nannarelli, Alberto

    2016-01-01

    In this work, we present a new framework to dynamically load hardware accelerators on reconfigurable platforms (FPGAs). Given a library of application-specific processors, we load the required processor into the FPGA on-the-fly and transfer the execution from the CPU to the FPGA-based accelerator. Results show that significant speed-up can be obtained by the proposed acceleration framework on system-on-chips where reconfigurable fabric is placed next to the CPUs. The speed-up is due to both the intrinsic acceleration in the application-specific processors and the increased parallelism.

  13. Hardware availability calculations and results of the IFMIF accelerator facility

    International Nuclear Information System (INIS)

    Bargalló, Enric; Arroyo, Jose Manuel; Abal, Javier; Beauvais, Pierre-Yves; Gobin, Raphael; Orsini, Fabienne; Weber, Moisés; Podadera, Ivan; Grespan, Francesco; Fagotti, Enrico; De Blas, Alfredo; Dies, Javier; Tapia, Carlos; Mollá, Joaquín; Ibarra, Ángel

    2014-01-01

    Highlights: • IFMIF accelerator facility hardware availability analyses methodology is described. • Results of the individual hardware availability analyses are shown for the reference design. • Accelerator design improvements are proposed for each system. • Availability results are evaluated and compared with the requirements.

    Abstract: Hardware availability calculations have been performed individually for each system of the deuteron accelerators of the International Fusion Materials Irradiation Facility (IFMIF). The principal goal of these analyses is to estimate the availability of the systems, compare it with the challenging IFMIF requirements, and find new paths to improve availability performance. Major unavailability contributors are highlighted and possible design changes are proposed in order to achieve the hardware availability requirements established for each system. In this paper, such possible improvements are implemented in fault tree models and the availability results are evaluated. The parallel activity on the design and construction of the linear IFMIF prototype accelerator (LIPAc) provides detailed design information for the RAMI (reliability, availability, maintainability and inspectability) analyses and allows identifying the improvements that the final accelerator could have. Because of the R&D nature of the LIPAc, RAMI improvements could be among the major differences between the prototype and the IFMIF accelerator design.

  14. Hardware availability calculations and results of the IFMIF accelerator facility

    Energy Technology Data Exchange (ETDEWEB)

    Bargalló, Enric, E-mail: enric.bargallo-font@upc.edu [Fusion Energy Engineering Laboratory (FEEL), Technical University of Catalonia (UPC), Barcelona (Spain); Arroyo, Jose Manuel [Laboratorio Nacional de Fusión por Confinamiento Magnético – CIEMAT, Madrid (Spain); Abal, Javier [Fusion Energy Engineering Laboratory (FEEL), Technical University of Catalonia (UPC), Barcelona (Spain); Beauvais, Pierre-Yves; Gobin, Raphael; Orsini, Fabienne [Commissariat à l’Energie Atomique, Saclay (France); Weber, Moisés; Podadera, Ivan [Laboratorio Nacional de Fusión por Confinamiento Magnético – CIEMAT, Madrid (Spain); Grespan, Francesco; Fagotti, Enrico [Istituto Nazionale di Fisica Nucleare, Legnaro (Italy); De Blas, Alfredo; Dies, Javier; Tapia, Carlos [Fusion Energy Engineering Laboratory (FEEL), Technical University of Catalonia (UPC), Barcelona (Spain); Mollá, Joaquín; Ibarra, Ángel [Laboratorio Nacional de Fusión por Confinamiento Magnético – CIEMAT, Madrid (Spain)

    2014-10-15

    Highlights: • IFMIF accelerator facility hardware availability analyses methodology is described. • Results of the individual hardware availability analyses are shown for the reference design. • Accelerator design improvements are proposed for each system. • Availability results are evaluated and compared with the requirements.

    Abstract: Hardware availability calculations have been performed individually for each system of the deuteron accelerators of the International Fusion Materials Irradiation Facility (IFMIF). The principal goal of these analyses is to estimate the availability of the systems, compare it with the challenging IFMIF requirements, and find new paths to improve availability performance. Major unavailability contributors are highlighted and possible design changes are proposed in order to achieve the hardware availability requirements established for each system. In this paper, such possible improvements are implemented in fault tree models and the availability results are evaluated. The parallel activity on the design and construction of the linear IFMIF prototype accelerator (LIPAc) provides detailed design information for the RAMI (reliability, availability, maintainability and inspectability) analyses and allows identifying the improvements that the final accelerator could have. Because of the R&D nature of the LIPAc, RAMI improvements could be among the major differences between the prototype and the IFMIF accelerator design.

  15. Acceleration of Meshfree Radial Point Interpolation Method on Graphics Hardware

    International Nuclear Information System (INIS)

    Nakata, Susumu

    2008-01-01

    This article describes a parallel computational technique to accelerate the radial point interpolation method (RPIM)-based meshfree method using graphics hardware. RPIM is one of the meshfree partial differential equation solvers that do not require a mesh structure for the analysis targets. In this paper, a technique for accelerating RPIM using graphics hardware is presented. In the method, the computation process is divided into small processes suitable for processing on the parallel architecture of the graphics hardware in a single-instruction multiple-data (SIMD) manner.

  16. Scalable devices

    KAUST Repository

    Krüger, Jens J.

    2014-01-01

    In computer science in general, and in the field of high performance computing and supercomputing in particular, the term scalable plays an important role. It indicates that a piece of hardware, a concept, an algorithm, or an entire system scales with the size of the problem, i.e., it can not only be used in a very specific setting but is applicable to a wide range of problems, from small scenarios to possibly very large settings. In this spirit, there exist a number of established areas of research on scalability. There are works on scalable algorithms and scalable architectures, but what are scalable devices? In the context of this chapter, we are interested in a whole range of display devices, ranging from small-scale hardware such as tablet computers, pads, and smart-phones up to large tiled display walls. What interests us most is not so much the hardware setup but rather the visualization algorithms behind these display systems, which scale from your average smart phone up to the largest gigapixel display walls.

  17. A Framework for Dynamically-Loaded Hardware Library (HLL) in FPGA Acceleration

    DEFF Research Database (Denmark)

    Cardarilli, Gian Carlo; Di Carlo, Leonardo; Nannarelli, Alberto

    2016-01-01

    Hardware acceleration is often used to address the need for speed and computing power in embedded systems. FPGAs have always represented a good solution for HW acceleration and, recently, new SoC platforms have extended the flexibility of FPGAs by combining on a single chip both high-performance CPUs and FPGA fabric. The aim of this work is the implementation of hardware accelerators for these new SoCs. The innovative feature of these accelerators is the on-the-fly reconfiguration of the hardware to dynamically adapt the accelerator’s functionalities to the current CPU workload. The realization of the accelerators also preliminarily requires the profiling of both the SW (ARM CPU + NEON units) and HW (FPGA) performance, an evaluation of the partial reconfiguration times, and the development of an application-specific IP-core library. This paper focuses on the profiling aspect of both the SW and HW ...

  18. Accelerator Technology: Injection and Extraction Related Hardware: Kickers and Septa

    CERN Document Server

    Barnes, M J; Mertens, V

    2013-01-01

    This document is part of Subvolume C 'Accelerators and Colliders' of Volume 21 'Elementary Particles' of Landolt-Börnstein - Group I 'Elementary Particles, Nuclei and Atoms'. It contains the Section '8.7 Injection and Extraction Related Hardware: Kickers and Septa' of the Chapter '8 Accelerator Technology', with the content: 8.7 Injection and Extraction Related Hardware: Kickers and Septa; 8.7.1 Fast Pulsed Systems (Kickers); 8.7.2 Electrostatic and Magnetic Septa.

  19. The NIDS Cluster: Scalable, Stateful Network Intrusion Detection on Commodity Hardware

    Energy Technology Data Exchange (ETDEWEB)

    Tierney, Brian L; Vallentin, Matthias; Sommer, Robin; Lee, Jason; Leres, Craig; Paxson, Vern; Tierney, Brian

    2007-09-19

    In this work we present a NIDS cluster as a scalable solution for realizing high-performance, stateful network intrusion detection on commodity hardware. The design addresses three challenges: (i) distributing traffic evenly across an extensible set of analysis nodes in a fashion that minimizes the communication required for coordination; (ii) adapting the NIDS's operation to support coordinating its low-level analysis rather than just aggregating alerts; and (iii) validating that the cluster produces sound results. Prototypes of our NIDS cluster now operate at the Lawrence Berkeley National Laboratory and the University of California at Berkeley. In both environments the clusters greatly enhance the power of network security monitoring.
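
    A central ingredient of such a cluster is a frontend that distributes traffic so all packets of a flow reach the same analysis node, keeping per-flow state local and coordination low. A hedged sketch of a symmetric flow-hash dispatcher follows (node names and hashing details are illustrative, not the actual frontend):

        import hashlib

        NODES = ['node0', 'node1', 'node2', 'node3']

        def dispatch(src_ip, src_port, dst_ip, dst_port, proto='tcp'):
            # Order the endpoints so both directions of a connection hash identically,
            # keeping all per-flow state on one analysis node.
            a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
            key = f'{proto}|{a[0]}:{a[1]}|{b[0]}:{b[1]}'.encode()
            h = int.from_bytes(hashlib.md5(key).digest()[:4], 'big')
            return NODES[h % len(NODES)]

        print(dispatch('10.0.0.1', 41000, '192.168.1.5', 80))
        print(dispatch('192.168.1.5', 80, '10.0.0.1', 41000))  # same node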

  20. Interfacing Hardware Accelerators to a Time-Division Multiplexing Network-on-Chip

    DEFF Research Database (Denmark)

    Pezzarossa, Luca; Sørensen, Rasmus Bo; Schoeberl, Martin

    2015-01-01

    This paper addresses the integration of stateless hardware accelerators into time-predictable multi-core platforms based on time-division multiplexing networks-on-chip. Stateless hardware accelerators, like floating-point units, are typically attached as co-processors to individual processors in ...... implementation. The design evaluation is carried out using the open source T-CREST multi-core platform implemented on an Altera Cyclone IV FPGA. The size of the proposed design, including a floating-point accelerator, is about two-thirds of a processor....

  1. Hardware Acceleration of Adaptive Neural Algorithms.

    Energy Technology Data Exchange (ETDEWEB)

    James, Conrad D. [Sandia National Lab. (SNL-NM), Albuquerque, NM (United States)

    2017-11-01

    As traditional numerical computing has faced challenges, researchers have turned towards alternative computing approaches to reduce power-per-computation metrics and improve algorithm performance. Here, we describe an approach towards non-conventional computing that strengthens the connection between machine learning and neuroscience concepts. The Hardware Acceleration of Adaptive Neural Algorithms (HAANA) project has developed neural machine learning algorithms and hardware for applications in image processing and cybersecurity. While machine learning methods are effective at extracting relevant features from many types of data, the effectiveness of these algorithms degrades when subjected to real-world conditions. Our team has generated novel neural-inspired approaches to improve the resiliency and adaptability of machine learning algorithms. In addition, we have also designed and fabricated hardware architectures and microelectronic devices specifically tuned towards the training and inference operations of neural-inspired algorithms. Finally, our multi-scale simulation framework allows us to assess the impact of microelectronic device properties on algorithm performance.

  2. Hardware Implementation of Lossless Adaptive and Scalable Hyperspectral Data Compression for Space

    Science.gov (United States)

    Aranki, Nazeeh; Keymeulen, Didier; Bakhshi, Alireza; Klimesh, Matthew

    2009-01-01

    On-board lossless hyperspectral data compression reduces data volume in order to meet NASA and DoD limited downlink capabilities. The technique also improves signature extraction, object recognition and feature classification capabilities by providing exact reconstructed data on constrained downlink resources. At JPL, a novel, adaptive and predictive technique for lossless compression of hyperspectral data was recently developed. This technique uses an adaptive filtering method and achieves a combination of low complexity and compression effectiveness that far exceeds state-of-the-art techniques currently in use. The JPL-developed 'Fast Lossless' algorithm requires no training data or other specific information about the nature of the spectral bands for a fixed instrument dynamic range. It is of low computational complexity and thus well-suited for implementation in hardware. A modified form of the algorithm that is better suited for data from pushbroom instruments is generally appropriate for flight implementation. A scalable field programmable gate array (FPGA) hardware implementation was developed. The FPGA implementation achieves a throughput performance of 58 Msamples/sec, which can be increased to over 100 Msamples/sec in a parallel implementation that uses twice the hardware resources. This paper describes the hardware implementation of the 'Modified Fast Lossless' compression algorithm on an FPGA. The FPGA implementation targets the current state-of-the-art FPGAs (Xilinx Virtex IV and V families) and compresses one sample every clock cycle to provide a fast and practical real-time solution for space applications.
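
    To give a flavor of predictive lossless compression of hyperspectral cubes, the toy sketch below predicts each band from the previous band with one adaptively updated weight and emits the residuals; it is only illustrative and is not the JPL 'Fast Lossless' algorithm (which uses a more elaborate adaptive filter), and entropy coding of the residuals is omitted:

        import numpy as np

        # Toy band-to-band adaptive predictor: NOT the 'Fast Lossless' algorithm,
        # only an illustration of adaptive prediction followed by residual output.
        def predict_residuals(cube, mu=2.0**-12):
            cube = cube.astype(np.int64)
            bands, rows, cols = cube.shape
            residuals = np.empty_like(cube)
            residuals[0] = cube[0]                      # first band kept verbatim
            for b in range(1, bands):
                w = 1.0                                 # adaptive weight for this band
                prev = cube[b - 1].ravel()
                cur = cube[b].ravel()
                res = np.empty_like(cur)
                for i in range(cur.size):
                    pred = int(round(w * prev[i]))
                    err = int(cur[i]) - pred
                    res[i] = err
                    w += mu * np.sign(err) * np.sign(prev[i])   # sign-LMS weight update
                residuals[b] = res.reshape(rows, cols)
            return residuals                            # entropy coding omitted

        base = np.random.randint(0, 4096, size=(32, 32))
        cube = np.stack([base + np.random.randint(-8, 9, size=base.shape) for _ in range(4)])
        res = predict_residuals(cube)
        print(np.abs(res[1:]).mean(), np.abs(cube[1:]).mean())   # residuals are far smaller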

  3. Evaluation of accelerated iterative x-ray CT image reconstruction using floating point graphics hardware

    International Nuclear Information System (INIS)

    Kole, J S; Beekman, F J

    2006-01-01

    Statistical reconstruction methods offer possibilities to improve image quality as compared with analytical methods, but current reconstruction times prohibit routine application in clinical and micro-CT. In particular, for cone-beam x-ray CT, the use of graphics hardware has been proposed to accelerate the forward and back-projection operations, in order to reduce reconstruction times. In the past, wide application of this texture-mapping hardware approach was hampered owing to limited intrinsic accuracy. Recently, however, floating point precision has become available in the latest generation of commodity graphics cards. In this paper, we utilize this feature to construct a graphics hardware accelerated version of the ordered subset convex reconstruction algorithm. The aims of this paper are (i) to study the impact of using graphics hardware acceleration for statistical reconstruction on the reconstructed image accuracy and (ii) to measure the speed increase one can obtain by using graphics hardware acceleration. We compare the unaccelerated algorithm with the graphics hardware accelerated version, and for the latter we consider two different interpolation techniques. A simulation study of a micro-CT scanner with a mathematical phantom shows that, at almost preserved reconstructed image accuracy, speed-ups of a factor 40 to 222 can be achieved compared with the unaccelerated algorithm, depending on the phantom and detector sizes. Reconstruction from physical phantom data reconfirms the usability of the accelerated algorithm for practical cases.

  4. Transform coding for hardware-accelerated volume rendering.

    Science.gov (United States)

    Fout, Nathaniel; Ma, Kwan-Liu

    2007-01-01

    Hardware-accelerated volume rendering using the GPU is now the standard approach for real-time volume rendering, although limited graphics memory can present a problem when rendering large volume data sets. Volumetric compression in which the decompression is coupled to rendering has been shown to be an effective solution to this problem; however, most existing techniques were developed in the context of software volume rendering, and all but the simplest approaches are prohibitive in a real-time hardware-accelerated volume rendering context. In this paper we present a novel block-based transform coding scheme designed specifically with real-time volume rendering in mind, such that the decompression is fast without sacrificing compression quality. This is made possible by consolidating the inverse transform with dequantization in such a way as to allow most of the reprojection to be precomputed. Furthermore, we take advantage of the freedom afforded by off-line compression in order to optimize the encoding as much as possible while hiding this complexity from the decoder. In this context we develop a new block classification scheme which allows us to preserve perceptually important features in the compression. The result of this work is an asymmetric transform coding scheme that allows very large volumes to be compressed and then decompressed in real-time while rendering on the GPU.

  5. 3D IBFV : Hardware-Accelerated 3D Flow Visualization

    NARCIS (Netherlands)

    Telea, Alexandru; Wijk, Jarke J. van

    2003-01-01

    We present a hardware-accelerated method for visualizing 3D flow fields. The method is based on insertion, advection, and decay of dye. To this aim, we extend the texture-based IBFV technique for 2D flow visualization in two main directions. First, we decompose the 3D flow visualization problem in a

  6. 3D IBFV : hardware-accelerated 3D flow visualization

    NARCIS (Netherlands)

    Telea, A.C.; Wijk, van J.J.

    2003-01-01

    We present a hardware-accelerated method for visualizing 3D flow fields. The method is based on insertion, advection, and decay of dye. To this aim, we extend the texture-based IBFV technique presented by van Wijk (2001) for 2D flow visualization in two main directions. First, we decompose the 3D

  7. Spectral-element Seismic Wave Propagation on CUDA/OpenCL Hardware Accelerators

    Science.gov (United States)

    Peter, D. B.; Videau, B.; Pouget, K.; Komatitsch, D.

    2015-12-01

    Seismic wave propagation codes are essential tools to investigate a variety of wave phenomena in the Earth. Furthermore, they can now be used for seismic full-waveform inversions in regional- and global-scale adjoint tomography. Although these seismic wave propagation solvers are crucial ingredients for improving the resolution of tomographic images, to answer important questions about the nature of Earth's internal processes and subsurface structure, their practical application is often limited due to high computational costs. They thus need high-performance computing (HPC) facilities to improve the current state of knowledge. At present, numerous large HPC systems embed many-core architectures such as graphics processing units (GPUs) to enhance numerical performance. Such hardware accelerators can be programmed using either the CUDA programming environment or the OpenCL language standard. CUDA software development targets NVIDIA graphics cards while OpenCL was adopted by additional hardware accelerators, e.g., AMD graphics cards, ARM-based processors, and Intel Xeon Phi coprocessors. For seismic wave propagation simulations using the open-source spectral-element code package SPECFEM3D_GLOBE, we incorporated an automatic source-to-source code generation tool (BOAST) which allows us to use meta-programming of all computational kernels for forward and adjoint runs. Using our BOAST kernels, we generate optimized source code for both CUDA and OpenCL languages within the source code package. Thus, seismic wave simulations are now able to fully utilize CUDA and OpenCL hardware accelerators. We show benchmarks of forward seismic wave propagation simulations using SPECFEM3D_GLOBE on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.

  8. Automatic Optimization of Hardware Accelerators for Image Processing

    OpenAIRE

    Reiche, Oliver; Häublein, Konrad; Reichenbach, Marc; Hannig, Frank; Teich, Jürgen; Fey, Dietmar

    2015-01-01

    In the domain of image processing, often real-time constraints are required. In particular, in safety-critical applications, such as X-ray computed tomography in medical imaging or advanced driver assistance systems in the automotive domain, timing is of utmost importance. A common approach to maintain real-time capabilities of compute-intensive applications is to offload those computations to dedicated accelerator hardware, such as Field Programmable Gate Arrays (FPGAs). Programming such arc...

  9. Scalable devices

    KAUST Repository

    Krüger, Jens J.; Hadwiger, Markus

    2014-01-01

    In computer science in general and in particular the field of high performance computing and supercomputing the term scalable plays an important role. It indicates that a piece of hardware, a concept, an algorithm, or an entire system scales

  10. Establishing a novel modeling tool: a python-based interface for a neuromorphic hardware system.

    Science.gov (United States)

    Brüderle, Daniel; Müller, Eric; Davison, Andrew; Muller, Eilif; Schemmel, Johannes; Meier, Karlheinz

    2009-01-01

    Neuromorphic hardware systems provide new possibilities for the neuroscience modeling community. Due to the intrinsic parallelism of the micro-electronic emulation of neural computation, such models are highly scalable without a loss of speed. However, the communities of software simulator users and neuromorphic engineering in neuroscience are rather disjoint. We present a software concept that provides the possibility to establish such hardware devices as valuable modeling tools. It is based on the integration of the hardware interface into a simulator-independent language which allows for unified experiment descriptions that can be run on various simulation platforms without modification, implying experiment portability and a huge simplification of the quantitative comparison of hardware and simulator results. We introduce an accelerated neuromorphic hardware device and describe the implementation of the proposed concept for this system. An example setup and results acquired by utilizing both the hardware system and a software simulator are demonstrated.
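
    The simulator-independent experiment description alluded to here can be pictured with a script in the spirit of the PyNN API, where switching between a software simulator and the hardware backend amounts to changing the import; the backend module, neuron model and parameters below are illustrative assumptions:

        # Unified experiment description in the spirit of the PyNN API: the same
        # script can drive a software simulator or the hardware backend by changing
        # only the backend import (module names here are illustrative assumptions).
        import pyNN.nest as sim        # e.g. swap for the hardware backend module

        sim.setup(timestep=0.1)

        stimulus = sim.Population(10, sim.SpikeSourcePoisson(rate=20.0))
        neurons = sim.Population(100, sim.IF_cond_exp())
        sim.Projection(stimulus, neurons,
                       sim.FixedProbabilityConnector(0.2),
                       sim.StaticSynapse(weight=0.01, delay=1.0))

        neurons.record('spikes')
        sim.run(1000.0)                # simulate / emulate one second of activity

        spikes = neurons.get_data().segments[0].spiketrains
        sim.end()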

  11. Hardware dependencies of GPU-accelerated beamformer performances for microwave breast cancer detection

    Directory of Open Access Journals (Sweden)

    Salomon Christoph J.

    2016-09-01

    UWB microwave imaging has proven to be a promising technique for early-stage breast cancer detection. The extensive image reconstruction time can be accelerated by parallelizing the execution of the underlying beamforming algorithms. However, the efficiency of the parallelization will most likely depend on the degree of parallelism of the imaging algorithm and of the utilized hardware. This paper investigates the dependencies of two different beamforming algorithms on multiple hardware specifications of several graphics boards. The parallel implementation is realized using NVIDIA’s CUDA. Three conclusions are drawn about the behavior of the parallel implementation and how to efficiently use the accessible hardware.
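
    The beamformers in question build an energy map by delaying and summing the recorded channel signals for every image point. A plain NumPy delay-and-sum sketch is shown below for orientation (monostatic geometry, constants and sampling are illustrative assumptions; the paper's CUDA parallelization is not reflected):

        import numpy as np

        def delay_and_sum(signals, fs, antennas, voxels, c=2e8):
            """signals: (n_channels, n_samples) recorded time traces,
            fs: sampling rate [Hz], antennas: (n_channels, 3) positions [m],
            voxels: (n_voxels, 3) image points [m], c: propagation speed [m/s].
            Returns one energy value per voxel (monostatic delay-and-sum)."""
            n_ch, n_samp = signals.shape
            energy = np.zeros(len(voxels))
            for v, p in enumerate(voxels):
                dist = np.linalg.norm(antennas - p, axis=1)        # antenna-to-voxel distance
                idx = np.clip((2 * dist / c * fs).astype(int), 0, n_samp - 1)
                energy[v] = np.sum(signals[np.arange(n_ch), idx]) ** 2
            return energy

        fs = 10e9
        signals = np.random.randn(16, 2048)
        antennas = np.random.rand(16, 3) * 0.1
        voxels = np.random.rand(500, 3) * 0.1
        print(delay_and_sum(signals, fs, antennas, voxels).shape)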

  12. Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms.

    Science.gov (United States)

    Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

    2018-01-01

    We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).

  13. Open Hardware for CERN's accelerator control systems

    International Nuclear Information System (INIS)

    Bij, E van der; Serrano, J; Wlostowski, T; Cattin, M; Gousiou, E; Sanchez, P Alvarez; Boccardi, A; Voumard, N; Penacoba, G

    2012-01-01

    The accelerator control systems at CERN will be upgraded and many electronics modules such as analog and digital I/O, level converters and repeaters, serial links and timing modules are being redesigned. The new developments are based on the FPGA Mezzanine Card, PCI Express and VME64x standards while the Wishbone specification is used as a system on a chip bus. To attract partners, the projects are developed in an 'Open' fashion. Within this Open Hardware project new ways of working with industry are being evaluated and it has been proven that industry can be involved at all stages, from design to production and support.

  14. Establishing a novel modeling tool: a python-based interface for a neuromorphic hardware system

    Directory of Open Access Journals (Sweden)

    Daniel Brüderle

    2009-06-01

    Neuromorphic hardware systems provide new possibilities for the neuroscience modeling community. Due to the intrinsic parallelism of the micro-electronic emulation of neural computation, such models are highly scalable without a loss of speed. However, the communities of software simulator users and neuromorphic engineering in neuroscience are rather disjoint. We present a software concept that provides the possibility to establish such hardware devices as valuable modeling tools. It is based on the integration of the hardware interface into a simulator-independent language which allows for unified experiment descriptions that can be run on various simulation platforms without modification, implying experiment portability and a huge simplification of the quantitative comparison of hardware and simulator results. We introduce an accelerated neuromorphic hardware device and describe the implementation of the proposed concept for this system. An example setup and results acquired by utilizing both the hardware system and a software simulator are demonstrated.

  15. Hardware Design Considerations for Edge-Accelerated Stereo Correspondence Algorithms

    Directory of Open Access Journals (Sweden)

    Christos Ttofis

    2012-01-01

    Stereo correspondence is a popular algorithm for the extraction of depth information from a pair of rectified 2D images. Hence, it has been used in many computer vision applications that require knowledge about depth. However, stereo correspondence is a computationally intensive algorithm and requires high-end hardware resources in order to achieve real-time processing speed in embedded computer vision systems. This paper presents an overview of the use of edge information as a means to accelerate hardware implementations of stereo correspondence algorithms. The presented approach restricts the stereo correspondence algorithm only to the edges of the input images rather than to all image points, thus resulting in a considerable reduction of the search space. The paper highlights the benefits of the edge-directed approach by applying it to two stereo correspondence algorithms: an SAD-based fixed-support algorithm and a more complex adaptive support weight algorithm. Furthermore, we present design considerations about the implementation of these algorithms on reconfigurable hardware and also discuss issues related to the memory structures needed, the amount of parallelism that can be exploited, the organization of the processing blocks, and so forth. The two architectures (fixed-support based versus adaptive-support weight based) are compared in terms of processing speed, disparity map accuracy, and hardware overheads, when both are implemented on a Virtex-5 FPGA platform.
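
    The edge-directed restriction can be pictured with the following Python sketch, which computes SAD-based disparities only at edge pixels of the left image (window size, disparity range and the crude edge detector are illustrative assumptions, not the hardware architectures of the paper):

        import numpy as np

        def edge_directed_sad(left, right, max_disp=32, win=3, edge_thresh=20):
            """Compute SAD-based disparities only at edge pixels of the left image."""
            h, w = left.shape
            # Crude horizontal-gradient edge detector selects the pixels to process.
            grad = np.abs(np.diff(left.astype(np.int32), axis=1, prepend=left[:, :1]))
            edges = grad > edge_thresh
            disparity = np.zeros((h, w), dtype=np.int32)
            r = win // 2
            for y in range(r, h - r):
                for x in range(max_disp + r, w - r):
                    if not edges[y, x]:
                        continue                       # skip non-edge pixels entirely
                    patch = left[y - r:y + r + 1, x - r:x + r + 1].astype(np.int32)
                    costs = [np.abs(patch - right[y - r:y + r + 1,
                                                  x - d - r:x - d + r + 1].astype(np.int32)).sum()
                             for d in range(max_disp)]
                    disparity[y, x] = int(np.argmin(costs))
            return disparity

        left = np.random.randint(0, 256, (64, 96)).astype(np.uint8)
        right = np.roll(left, -4, axis=1)              # toy pair with a constant shift
        print(np.unique(edge_directed_sad(left, right)))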

  16. An Introduction to Parallelism, Concurrency and Acceleration (1/2)

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Concurrency and parallelism are firm elements of any modern computing infrastructure, made even more prominent by the emergence of accelerators. These lectures offer an introduction to these important concepts. We will begin with a brief refresher of recent hardware offerings to modern-day programmers. We will then open the main discussion with an overview of the laws and practical aspects of scalability. Key parallelism data structures, patterns and algorithms will be shown. The main threats to scalability and mitigation strategies will be discussed in the context of real-life optimization problems.
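
    One of the scalability laws such an introduction typically covers is Amdahl's law, which bounds the speedup of a program by its serial fraction; a quick numeric illustration:

        def amdahl_speedup(parallel_fraction, n_workers):
            """Amdahl's law: speedup = 1 / ((1 - p) + p / n)."""
            return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)

        # Even with 95% of the work parallelized, 1024 workers give less than 20x.
        for n in (2, 8, 64, 1024):
            print(n, round(amdahl_speedup(0.95, n), 2))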

  17. FPGA Hardware Acceleration of a Phylogenetic Tree Reconstruction with Maximum Parsimony Algorithm

    OpenAIRE

    BLOCK, Henry; MARUYAMA, Tsutomu

    2017-01-01

    In this paper, we present an FPGA hardware implementation for a phylogenetic tree reconstruction with a maximum parsimony algorithm. We base our approach on a particular stochastic local search algorithm that uses the Progressive Neighborhood and the Indirect Calculation of Tree Lengths method. This method is widely used for the acceleration of the phylogenetic tree reconstruction algorithm in software. In our implementation, we define a tree structure and accelerate the search by parallel an...

  18. Optimizing memory-bound SYMV kernel on GPU hardware accelerators

    KAUST Repository

    Abdelfattah, Ahmad

    2013-01-01

    Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improve productivity, while effectively exploiting the underlying hardware. We present an optimized numerical kernel for computing the symmetric matrix-vector product on nVidia Fermi GPUs. Due to its inherent memory-bound nature, this kernel is very critical in the tridiagonalization of a symmetric dense matrix, which is a preprocessing step to calculate the eigenpairs. Using a novel design to address the irregular memory accesses by hiding latency and increasing bandwidth, our preliminary asymptotic results show 3.5x and 2.5x speedups over the similar CUBLAS 4.0 kernel, and 7-8% and 30% improvements over the Matrix Algebra on GPU and Multicore Architectures (MAGMA) library, in single and double precision arithmetic, respectively. © 2013 Springer-Verlag.
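
    For orientation, SYMV computes y = alpha*A*x + beta*y for a symmetric matrix A of which only one triangle is stored; a plain NumPy statement of that operation is given below (it shows only the mathematics, not the latency-hiding memory access scheme of the CUDA kernel):

        import numpy as np

        def symv_lower(a_lower, x, alpha=1.0, beta=0.0, y=None):
            """Symmetric matrix-vector product using only the lower triangle of A:
            y = alpha * A @ x + beta * y, with A = L + L^T - diag(L)."""
            n = a_lower.shape[0]
            y = np.zeros(n) if y is None else y.copy()
            a_full = a_lower + a_lower.T - np.diag(np.diag(a_lower))   # reconstruct A
            return alpha * (a_full @ x) + beta * y

        a = np.tril(np.random.rand(4, 4))
        x = np.random.rand(4)
        print(symv_lower(a, x))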

  19. GASPRNG: GPU accelerated scalable parallel random number generator library

    Science.gov (United States)

    Gao, Shuang; Peterson, Gregory D.

    2013-04-01

    Graphics processors represent a promising technology for accelerating computational science applications. Many computational science applications require fast and scalable random number generation with good statistical properties, so they use the Scalable Parallel Random Number Generators library (SPRNG). We present the GPU Accelerated SPRNG library (GASPRNG) to accelerate SPRNG in GPU-based high performance computing systems. GASPRNG includes code for a host CPU and CUDA code for execution on NVIDIA graphics processing units (GPUs), along with a programming interface to support various usage models for pseudorandom numbers and computational science applications executing on the CPU, GPU, or both. This paper describes the implementation approach used to produce high performance and also describes how to use the programming interface. The programming interface allows a user to use GASPRNG the same way as SPRNG on traditional serial or parallel computers, as well as to develop tightly coupled programs executing primarily on the GPU. We also describe how to install GASPRNG and use it. To help illustrate linking with GASPRNG, various demonstration codes are included for the different usage models. GASPRNG on a single GPU shows up to 280x speedup over SPRNG on a single CPU core and is able to scale for larger systems in the same manner as SPRNG. Because GASPRNG generates identical streams of pseudorandom numbers as SPRNG, users can be confident about the quality of GASPRNG for scalable computational science applications.

    Program summary: Catalogue identifier: AEOI_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEOI_v1_0.html. Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland. Licensing provisions: UTK license. No. of lines in distributed program, including test data, etc.: 167900. No. of bytes in distributed program, including test data, etc.: 1422058. Distribution format: tar.gz. Programming language: C and CUDA. Computer: Any PC or ...

  20. Forward and adjoint spectral-element simulations of seismic wave propagation using hardware accelerators

    Science.gov (United States)

    Peter, Daniel; Videau, Brice; Pouget, Kevin; Komatitsch, Dimitri

    2015-04-01

    Improving the resolution of tomographic images is crucial to answer important questions on the nature of Earth's subsurface structure and internal processes. Seismic tomography is the most prominent approach, where seismic signals from ground-motion records are used to infer physical properties of internal structures such as compressional- and shear-wave speeds, anisotropy and attenuation. Recent advances in regional- and global-scale seismic inversions move towards full-waveform inversions, which require accurate simulations of seismic wave propagation in complex 3D media, providing access to the full 3D seismic wavefields. However, these numerical simulations are computationally very expensive and need high-performance computing (HPC) facilities for further improving the current state of knowledge. During recent years, many-core architectures such as graphics processing units (GPUs) have been added to available large HPC systems. Such GPU-accelerated computing, together with advances in multi-core central processing units (CPUs), can greatly accelerate scientific applications. There are mainly two possible choices of language support for GPU cards: the CUDA programming environment and the OpenCL language standard. CUDA software development targets NVIDIA graphics cards while OpenCL was adopted mainly by AMD graphics cards. In order to employ such hardware accelerators for seismic wave propagation simulations, we incorporated a code generation tool, BOAST, into the existing spectral-element code package SPECFEM3D_GLOBE. This allows us to use meta-programming of computational kernels and generate optimized source code for both CUDA and OpenCL languages, running simulations on either CUDA or OpenCL hardware accelerators. We show here applications of forward and adjoint seismic wave propagation on CUDA/OpenCL GPUs, validating results and comparing performances for different simulations and hardware usages.

  1. A compact linear accelerator based on a scalable microelectromechanical-system RF-structure

    Science.gov (United States)

    Persaud, A.; Ji, Q.; Feinberg, E.; Seidl, P. A.; Waldron, W. L.; Schenkel, T.; Lal, A.; Vinayakumar, K. B.; Ardanuc, S.; Hammer, D. A.

    2017-06-01

    A new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number of parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further reducing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.
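
    As a rough orientation on how such a stacked architecture scales, the energy gain grows with the number of RF gaps traversed while the transportable current grows with the number of parallel beamlets; the back-of-the-envelope sketch below uses purely illustrative numbers, not measured parameters from the paper:

        # Rough scaling of a stacked MEQALAC-style accelerator (illustrative numbers only).
        q = 1                    # charge state of the ion
        v_gap = 5e3              # effective accelerating voltage per RF gap [V]
        n_gaps = 100             # number of stacked acceleration gaps (wafers)
        n_beamlets = 120         # parallel beamlets through each wafer
        i_beamlet = 0.1e-3       # current per beamlet [A]

        energy_ev = q * v_gap * n_gaps          # final kinetic energy gain [eV]
        total_current = n_beamlets * i_beamlet  # total transported current [A]
        print(f"{energy_ev / 1e3:.0f} keV per ion, {total_current * 1e3:.0f} mA total")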

  2. A compact linear accelerator based on a scalable microelectromechanical-system RF-structure.

    Science.gov (United States)

    Persaud, A; Ji, Q; Feinberg, E; Seidl, P A; Waldron, W L; Schenkel, T; Lal, A; Vinayakumar, K B; Ardanuc, S; Hammer, D A

    2017-06-01

    A new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number of parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further reducing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.

  3. A Hardware Accelerator for Fault Simulation Utilizing a Reconfigurable Array Architecture

    Directory of Open Access Journals (Sweden)

    Sungho Kang

    1996-01-01

    In order to reduce cost and to achieve high speed, a new hardware accelerator for fault simulation has been designed. The architecture of the new accelerator is based on a reconfigurable mesh-type processing element (PE) array. Circuit elements at the same topological level are simulated concurrently, as in a pipelined process. A new parallel simulation algorithm expands all of the gates to two-input gates in order to limit the number of faults to two at each gate, so that the faults can be distributed uniformly throughout the PE array. The PE array reconfiguration operation provides a simulation speed advantage by maximizing the use of each PE cell.

  4. Hardware Acceleration on Cloud Services: The use of Restricted Boltzmann Machines on Handwritten Digits Recognition

    Directory of Open Access Journals (Sweden)

    Eleni Bougioukou

    2018-02-01

    Cloud computing allows users and enterprises to process their data on high performance servers, thus reducing the need for advanced hardware at the client side. Although local processing is viable in many cases, collecting data from multiple clients and processing them in a server gives the best possible performance in terms of processing rate. In this work, the implementation of a high performance cloud computing engine for recognizing handwritten digits is presented. The engine exploits the benefits of the cloud and uses a powerful hardware accelerator in order to classify the images received concurrently from multiple clients. The accelerator implements a number of neural networks operating in parallel, resulting in a processing rate of more than 10 MImages/sec.

  5. Hardware accelerator design for tracking in smart camera

    Science.gov (United States)

    Singh, Sanjay; Dunga, Srinivasa Murali; Saini, Ravi; Mandal, A. S.; Shekhar, Chandra; Vohra, Anil

    2011-10-01

    Smart cameras are important components in video analysis. For video analysis, a smart camera needs to detect interesting moving objects, track such objects from frame to frame, and perform analysis of the object tracks in real time. Therefore, the use of real-time tracking is prominent in smart cameras. A software implementation of a tracking algorithm on a general purpose processor (like a PowerPC) achieves only a low frame rate, far from real-time requirements. This paper presents a SIMD-based hardware accelerator designed for real-time tracking of objects in a scene. The system is designed and simulated using VHDL and implemented on a Xilinx XUP Virtex-II Pro FPGA. The resulting frame rate is 30 frames per second for 250x200-resolution video in gray scale.

  6. A scalable healthcare information system based on a service-oriented architecture.

    Science.gov (United States)

    Yang, Tzu-Hsiang; Sun, Yeali S; Lai, Feipei

    2011-06-01

    Many existing healthcare information systems are composed of a number of heterogeneous systems and face the important issue of system scalability. This paper first describes the comprehensive healthcare information systems used at National Taiwan University Hospital (NTUH) and then presents a service-oriented architecture (SOA)-based healthcare information system (HIS) built on the HL7 service standard. The proposed architecture focuses on system scalability, in terms of both hardware and software. Moreover, we describe how scalability is implemented in rightsizing, service groups, databases, and hardware scalability. Although SOA-based systems sometimes display poor performance, a performance evaluation of our SOA-based HIS shows that the average response times for the outpatient, inpatient, and emergency HL7Central systems are 0.035, 0.04, and 0.036 s, respectively. The outpatient, inpatient, and emergency WebUI average response times are 0.79, 1.25, and 0.82 s. The scalability of the rightsizing project and our evaluation results show that the proposed SOA-based HIS can provide system scalability and sustainability in a highly demanding healthcare information system.

  7. Open Hardware For CERN's Accelerator Control Systems

    CERN Document Server

    van der Bij, E; Ayass, M; Boccardi, A; Cattin, M; Gil Soriano, C; Gousiou, E; Iglesias Gonsálvez, S; Penacoba Fernandez, G; Serrano, J; Voumard, N; Wlostowski, T

    2011-01-01

    The accelerator control systems at CERN will be renovated and many electronics modules will be redesigned as the modules they will replace cannot be bought anymore or use obsolete components. The modules used in the control systems are diverse: analog and digital I/O, level converters and repeaters, serial links and timing modules. Overall around 120 modules are supported that are used in systems such as beam instrumentation, cryogenics and power converters. Only a small percentage of the currently used modules are commercially available, while most of them had been specifically designed at CERN. The new developments are based on VITA and PCI-SIG standards such as FMC (FPGA Mezzanine Card), PCI Express and VME64x using transition modules. As system-on-chip interconnect, the public domain Wishbone specification is used. For the renovation, it is considered imperative to have for each board access to the full hardware design and its firmware so that problems could quickly be resolved by CERN engineers or its ...

  8. How to create successful Open Hardware projects - About White Rabbits and open fields

    CERN Document Server

    van der Bij, E; Lewis, J; Stana, T; Wlostowski, T; Gousiou, E; Serrano, J; Arruat, M; Lipinski, M M; Daniluk, G; Voumard, N; Cattin, M

    2013-01-01

    CERN's accelerator control group has embraced "Open Hardware" (OH) to facilitate peer review, avoid vendor lock-in and make support tasks scalable. A web-based tool for easing collaborative work was set up and the CERN OH Licence was created. New ADC, TDC, fine delay and carrier cards based on VITA and PCI-SIG standards were designed and drivers for Linux were written. Often industry was paid for developments, while quality and documentation was controlled by CERN. An innovative timing network was also developed with the OH paradigm. Industry now sells and supports these designs that find their way into new fields.

  9. Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data

    Directory of Open Access Journals (Sweden)

    Wanrong Huang

    2017-01-01

    Full Text Available Internet applications such as network search, electronic commerce, and modern medical applications produce and process massive data, and considerable data parallelism exists in their computation processes. Breadth-first search (BFS), a fundamental traversal algorithm, underlies many graph processing applications and metrics as graphs grow in scale. A variety of scientific programming methods have been proposed to accelerate and parallelize BFS, because its inherently irregular memory access patterns cause poor temporal and spatial locality; new parallel hardware, however, can improve on these methods further. To address small-world graph problems, we propose a scalable and novel field-programmable gate array-based heterogeneous multicore system for scientific programming. Each core is multithreaded for streaming processing, and the InfiniBand communication network is adopted for scalability. We design a binary-search-based address-mapping scheme to unify all processor addresses. Within the limits permitted by the Graph500 benchmark, and after testing a 1D parallel hybrid BFS algorithm, our 8-core, 8-thread-per-core system achieved superior performance and efficiency compared with prior work at the same degree of parallelism. Our system is efficient not as a special-purpose acceleration unit but as a processor platform for graph search applications.
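
    A minimal, software-only sketch of the level-synchronous (frontier-based) BFS that such systems parallelize; the graph layout, function names and the 1D-partitioning details of the FPGA system above are not reproduced here.

    ```python
    from collections import defaultdict

    def bfs_levels(edges, source):
        """Level-synchronous BFS: expand one frontier per iteration.
        edges: iterable of (u, v) pairs for an undirected graph."""
        adj = defaultdict(list)
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)

        level = {source: 0}
        frontier = [source]
        depth = 0
        while frontier:                      # one synchronous step per BFS level
            depth += 1
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    if v not in level:       # first visit fixes the BFS level
                        level[v] = depth
                        next_frontier.append(v)
            frontier = next_frontier
        return level

    # Example on a small graph
    print(bfs_levels([(0, 1), (1, 2), (2, 3), (0, 3), (3, 4)], source=0))
    ```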

  10. How to create successful Open Hardware projects — About White Rabbits and open fields

    International Nuclear Information System (INIS)

    Bij, E van der; Arruat, M; Cattin, M; Daniluk, G; Cobas, J D Gonzalez; Gousiou, E; Lewis, J; Lipinski, M M; Serrano, J; Stana, T; Voumard, N; Wlostowski, T

    2013-01-01

    CERN's accelerator control group has embraced "Open Hardware" (OH) to facilitate peer review, avoid vendor lock-in and make support tasks scalable. A web-based tool for easing collaborative work was set up and the CERN OH Licence was created. New ADC, TDC, fine delay and carrier cards based on VITA and PCI-SIG standards were designed and drivers for Linux were written. Often industry was paid for developments, while quality and documentation were controlled by CERN. An innovative timing network was also developed with the OH paradigm. Industry now sells and supports these designs that find their way into new fields.

  11. Scalable fast multipole accelerated vortex methods

    KAUST Repository

    Hu, Qi; Gumerov, Nail A.; Yokota, Rio; Barba, Lorena A.; Duraiswami, Ramani

    2014-01-01

    -node communication and load balance efficiently, with only a small parallel construction overhead. This algorithm can scale to large-sized clusters showing both strong and weak scalability. Careful error and timing trade-off analysis are also performed for the cutoff

  12. A Scalable Approach for Hardware Semiformal Verification

    OpenAIRE

    Grimm, Tomas; Lettnin, Djones; Hübner, Michael

    2018-01-01

    The current verification flow of complex systems uses different engines synergistically: virtual prototyping, formal verification, simulation, emulation and FPGA prototyping. However, none is able to verify a complete architecture. Furthermore, hybrid approaches aiming at complete verification use techniques that lower the overall complexity by increasing the abstraction level. This work focuses on the verification of complex systems at the RT level to handle the hardware peculiarities. Our r...

  13. Hardware accelerator design for change detection in smart camera

    Science.gov (United States)

    Singh, Sanjay; Dunga, Srinivasa Murali; Saini, Ravi; Mandal, A. S.; Shekhar, Chandra; Chaudhury, Santanu; Vohra, Anil

    2011-10-01

    Smart cameras are important components in human-computer interaction. In any remote surveillance scenario, smart cameras have to take intelligent decisions to select frames with significant changes, so as to minimize communication and processing overhead. Among the many algorithms for change detection, a clustering-based scheme was proposed for smart camera systems. However, on the general-purpose processors available on FPGAs (such as the PowerPC), this algorithm achieves only a low frame rate, far from real-time requirements. This paper proposes a hardware accelerator capable of detecting changes in a scene in real time using the clustering-based change detection scheme. The system is designed and simulated in VHDL and implemented on a Xilinx XUP Virtex-II Pro FPGA board. The resulting frame rate is 30 frames per second for QVGA resolution in gray scale.
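
    As an illustration only, a per-pixel clustering background model in the spirit of the scheme described above (the exact algorithm, thresholds and parameters of the cited work are not reproduced): each pixel keeps a small set of intensity cluster centroids, and a frame is flagged as changed when many pixels fall outside all of their clusters.

    ```python
    import numpy as np

    class ClusterChangeDetector:
        """Per-pixel intensity clusters; illustrative, not the cited design."""
        def __init__(self, shape, max_clusters=3, match_thr=15, frame_thr=0.05):
            self.centroids = np.zeros(shape + (max_clusters,), dtype=np.float32)
            self.counts = np.zeros(shape + (max_clusters,), dtype=np.int32)
            self.match_thr = match_thr
            self.frame_thr = frame_thr

        def update(self, gray):
            gray = gray.astype(np.float32)
            diff = np.abs(self.centroids - gray[..., None])   # distance to each cluster
            best = diff.argmin(axis=-1)
            idx = best[..., None]
            matched = np.take_along_axis(diff, idx, -1)[..., 0] < self.match_thr
            # running-mean update of the matched cluster, replace the nearest one otherwise
            cnt = np.take_along_axis(self.counts, idx, -1)[..., 0]
            cen = np.take_along_axis(self.centroids, idx, -1)[..., 0]
            new_cen = np.where(matched, (cen * cnt + gray) / (cnt + 1), gray)
            np.put_along_axis(self.centroids, idx, new_cen[..., None], -1)
            np.put_along_axis(self.counts, idx, np.where(matched, cnt + 1, 1)[..., None], -1)
            changed_fraction = 1.0 - matched.mean()
            return changed_fraction > self.frame_thr           # "significant change" frame
    ```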

  14. Scalable fast multipole methods for vortex element methods

    KAUST Repository

    Hu, Qi

    2012-11-01

    We use a particle-based method to simulate incompressible flows, where the Fast Multipole Method (FMM) is used to accelerate the calculation of particle interactions. The most time-consuming kernels, the Biot-Savart equation and the stretching term of the vorticity equation, are mathematically reformulated so that only two Laplace scalar potentials are used instead of six, while automatically ensuring divergence-free far-field computation. Based on this formulation, and on our previous work for a scalar heterogeneous FMM algorithm, we develop a new FMM-based vortex method capable of simulating general flows including turbulence on heterogeneous architectures, which distributes the work between multi-core CPUs and GPUs to best utilize the hardware resources and achieve excellent scalability. The algorithm also uses new data structures which can dynamically manage inter-node communication and load balance efficiently but with only a small parallel construction overhead. This algorithm can scale to large-sized clusters showing both strong and weak scalability. Careful error and timing trade-off analysis are also performed for the cutoff functions induced by the vortex particle method. Our implementation can perform one time step of the velocity+stretching for one billion particles on 32 nodes in 55.9 seconds, which yields 49.12 Tflop/s. © 2012 IEEE.
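
    For reference, the direct O(N·M) evaluation of a regularized Biot-Savart kernel that an FMM accelerates might look like the following sketch (sign and smoothing conventions vary between vortex codes; this is not the authors' implementation).

    ```python
    import numpy as np

    def biot_savart_direct(targets, sources, strengths, eps=1e-3):
        """Velocity induced at `targets` by vortex particles at `sources`
        with vector strengths `strengths`; direct summation, O(N*M)."""
        vel = np.zeros_like(targets)
        for xj, aj in zip(sources, strengths):
            r = targets - xj                                # (N, 3) separation vectors
            r3 = (np.sum(r * r, axis=1) + eps**2) ** 1.5    # regularized |r|^3
            vel += np.cross(aj, r) / (4.0 * np.pi * r3[:, None])
        return vel
    ```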

  15. FPGA hardware acceleration for high performance neutron transport computation based on agent methodology - 318

    International Nuclear Information System (INIS)

    Shanjie, Xiao; Tatjana, Jevremovic

    2010-01-01

    The accurate, detailed and 3D neutron transport analysis for Gen-IV reactors is still time-consuming regardless of the advanced computational hardware available in developed countries. This paper introduces a new concept in addressing the computational time while preserving the detailed and accurate modeling: a specifically designed FPGA co-processor accelerates the robust AGENT methodology for complex reactor geometries. For the first time this approach is applied to accelerate the neutronics analysis. The AGENT methodology solves the neutron transport equation using the method of characteristics. The AGENT methodology performance was carefully analyzed before the hardware design based on the FPGA co-processor was adopted. The most time-consuming kernel part is then transplanted into the FPGA co-processor. The FPGA co-processor is designed with a data-flow-driven non-von-Neumann architecture and has much higher efficiency than the conventional computer architecture. Details of the FPGA co-processor design are introduced and the design is benchmarked using two different examples. The advanced chip architecture helps the FPGA co-processor obtain a speed-up of more than 20 times with its working frequency much lower than the CPU frequency. (authors)

  16. Memory Based Machine Intelligence Techniques in VLSI hardware

    OpenAIRE

    James, Alex Pappachen

    2012-01-01

    We briefly introduce the memory based approaches to emulate machine intelligence in VLSI hardware, describing the challenges and advantages. Implementation of artificial intelligence techniques in VLSI hardware is a practical and difficult problem. Deep architectures, hierarchical temporal memories and memory networks are some of the contemporary approaches in this area of research. The techniques attempt to emulate low level intelligence tasks and aim at providing scalable solutions to high ...

  17. A Hardware-Accelerated Quantum Monte Carlo framework (HAQMC) for N-body systems

    Science.gov (United States)

    Gothandaraman, Akila; Peterson, Gregory D.; Warren, G. Lee; Hinde, Robert J.; Harrison, Robert J.

    2009-12-01

    Interest in the study of structural and energetic properties of highly quantum clusters, such as inert gas clusters, has motivated the development of a hardware-accelerated framework for Quantum Monte Carlo simulations. In the Quantum Monte Carlo method, the properties of a system of atoms, such as the ground-state energies, are averaged over a number of iterations. Our framework is aimed at accelerating the computations in each iteration of the QMC application by offloading the calculation of properties, namely energy and trial wave function, onto reconfigurable hardware. This gives a user the capability to run simulations for a large number of iterations, thereby reducing the statistical uncertainty in the properties, and for larger clusters. This framework is designed to run on the Cray XD1 high performance reconfigurable computing platform, which exploits the coarse-grained parallelism of the processor along with the fine-grained parallelism of the reconfigurable computing devices available in the form of field-programmable gate arrays. In this paper, we illustrate the functioning of the framework, which can be used to calculate the energies for a model cluster of helium atoms. In addition, we present the capabilities of the framework that allow the user to vary the chemical identities of the simulated atoms. Program summary. Program title: Hardware Accelerated Quantum Monte Carlo (HAQMC). Catalogue identifier: AEEP_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEP_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 691 537. No. of bytes in distributed program, including test data, etc.: 5 031 226. Distribution format: tar.gz. Programming language: C/C++ for the QMC application, VHDL and Xilinx 8.1 ISE/EDK tools for FPGA design and development. Computer: Cray XD

  18. Trainable hardware for dynamical computing using error backpropagation through physical media.

    Science.gov (United States)

    Hermans, Michiel; Burm, Michaël; Van Vaerenbergh, Thomas; Dambre, Joni; Bienstman, Peter

    2015-03-24

    Neural networks are currently implemented on digital Von Neumann machines, which do not fully leverage their intrinsic parallelism. We demonstrate how to use a novel class of reconfigurable dynamical systems for analogue information processing, mitigating this problem. Our generic hardware platform for dynamic, analogue computing consists of a reciprocal linear dynamical system with nonlinear feedback. Thanks to reciprocity, a ubiquitous property of many physical phenomena like the propagation of light and sound, the error backpropagation, a crucial step for tuning such systems towards a specific task, can happen in hardware. This can potentially speed up the optimization process significantly, offering important benefits for the scalability of neuro-inspired hardware. In this paper, we show, using one experimentally validated and one conceptual example, that such systems may provide a straightforward mechanism for constructing highly scalable, fully dynamical analogue computers.

  19. Scalable Resolution Display Walls

    KAUST Repository

    Leigh, Jason; Johnson, Andrew; Renambot, Luc; Peterka, Tom; Jeong, Byungil; Sandin, Daniel J.; Talandis, Jonas; Jagodic, Ratko; Nam, Sungwon; Hur, Hyejung; Sun, Yiwen

    2013-01-01

    This article will describe the progress since 2000 on research and development in 2-D and 3-D scalable resolution display walls that are built from tiling individual lower resolution flat panel displays. The article will describe approaches and trends in display hardware construction, middleware architecture, and user-interaction design. The article will also highlight examples of use cases and the benefits the technology has brought to their respective disciplines. © 1963-2012 IEEE.

  20. Another way of doing RSA cryptography in hardware

    NARCIS (Netherlands)

    Batina, L.; Bruin - Muurling, G.; Honary, B.

    2001-01-01

    In this paper we describe an efficient and secure hardware implementation of the RSA cryptosystem. Modular exponentiation is based on Montgomery’s method without any modular reduction achieving the optimal bound. The presented systolic array architecture is scalable in several parameters which makes
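
    To make the reference to Montgomery's method concrete, here is a plain-software sketch of Montgomery multiplication and modular exponentiation (the systolic-array scheduling and the paper's specific optimizations are not reflected; variable names are illustrative).

    ```python
    def montgomery_exp(base, exponent, n):
        """Compute base**exponent mod n (n odd) using Montgomery multiplication."""
        r_bits = n.bit_length()
        r = 1 << r_bits
        n_prime = (-pow(n, -1, r)) % r          # n * n_prime ≡ -1 (mod r), Python 3.8+

        def mont_mul(a, b):                     # returns a*b*r^{-1} mod n
            t = a * b
            m = ((t & (r - 1)) * n_prime) & (r - 1)
            u = (t + m * n) >> r_bits
            return u - n if u >= n else u

        result = r % n                          # 1 in Montgomery form
        x = (base * r) % n                      # base in Montgomery form
        for bit in bin(exponent)[2:]:           # left-to-right square-and-multiply
            result = mont_mul(result, result)
            if bit == "1":
                result = mont_mul(result, x)
        return mont_mul(result, 1)              # convert back out of Montgomery form

    assert montgomery_exp(7, 65537, 3233) == pow(7, 65537, 3233)
    ```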

  1. Combining hardware and simulation for datacenter scaling studies

    DEFF Research Database (Denmark)

    Ruepp, Sarah Renée; Pilimon, Artur; Thrane, Jakob

    2017-01-01

    and simulation to illustrate the scalability and performance of datacenter networks. We simulate a Datacenter network and interconnect it with real world traffic generation hardware. Analysis of the introduced packet conversion and virtual queueing delays shows that the conversion efficiency is at the order...

  2. Final Scientific/Technical Report for "Enabling Exascale Hardware and Software Design through Scalable System Virtualization"

    Energy Technology Data Exchange (ETDEWEB)

    Dinda, Peter August [Northwestern Univ., Evanston, IL (United States)

    2015-03-17

    This report describes the activities, findings, and products of the Northwestern University component of the "Enabling Exascale Hardware and Software Design through Scalable System Virtualization" project. The purpose of this project has been to extend the state of the art of systems software for high-end computing (HEC) platforms, and to use systems software to better enable the evaluation of potential future HEC platforms, for example exascale platforms. Such platforms, and their systems software, have the goal of providing scientific computation at new scales, thus enabling new research in the physical sciences and engineering. Over time, the innovations in systems software for such platforms also become applicable to more widely used computing clusters, data centers, and clouds. This was a five-institution project, centered on the Palacios virtual machine monitor (VMM) systems software, a project begun at Northwestern, and originally developed in a previous collaboration between Northwestern University and the University of New Mexico. In this project, Northwestern (including via our subcontract to the University of Pittsburgh) contributed to the continued development of Palacios, along with other team members. We took the leadership role in (1) continued extension of support for emerging Intel and AMD hardware, (2) integration and performance enhancement of overlay networking, (3) connectivity with architectural simulation, (4) binary translation, and (5) support for modern Non-Uniform Memory Access (NUMA) hosts and guests. We also took a supporting role in support for specialized hardware for I/O virtualization, profiling, configurability, and integration with configuration tools. The efforts we led (1-5) were largely successful and executed as expected, with code and papers resulting from them. The project demonstrated the feasibility of a virtualization layer for HEC computing, similar to such layers for cloud or datacenter computing. For effort (3

  3. An Introduction to Parallelism, Concurrency and Acceleration (1/2)

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    Concurrency and parallelism are firm elements of any modern computing infrastructure, made even more prominent by the emergence of accelerators. These lectures offer an introduction to these important concepts. We will begin with a brief refresher of recent hardware offerings to modern-day programmers. We will then open the main discussion with an overview of the laws and practical aspects of scalability. Key parallelism data structures, patterns and algorithms will be shown. The main threats to scalability and mitigation strategies will be discussed in the context of real-life optimization problems. Lecturer's short bio: Andrzej Nowak has 10 years of experience in computing technologies, primarily from CERN openlab and Intel. At CERN, he managed a research lab collaborating with Intel and was part of the openlab Chief Technology Office. Andrzej also worked closely and initiated projects with the private sector (e.g. HP and Google), as well as international research institutes, such as EPFL. Current...
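
    The "laws of scalability" mentioned above can be illustrated with Amdahl's law; a quick sketch (the lecture itself may of course use different examples):

    ```python
    def amdahl_speedup(parallel_fraction, workers):
        """Upper bound on speedup when only `parallel_fraction` of the work scales."""
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)

    # Even with 95% parallel work, 64 workers give at most ~15.4x speedup.
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 1))
    ```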

  4. Hardware and software status of QCDOC

    International Nuclear Information System (INIS)

    Boyle, P.A.; Chen, D.; Christ, N.H.; Clark, M.; Cohen, S.D.; Cristian, C.; Dong, Z.; Gara, A.; Joo, B.; Jung, C.; Kim, C.; Levkova, L.; Liao, X.; Liu, G.; Mawhinney, R.D.; Ohta, S.; Petrov, K.; Wettig, T.; Yamaguchi, A.

    2004-01-01

    QCDOC is a massively parallel supercomputer whose processing nodes are based on an application-specific integrated circuit (ASIC). This ASIC was custom-designed so that crucial lattice QCD kernels achieve an overall sustained performance of 50% on machines with several tens of thousands of nodes. This strong scalability, together with low power consumption and a price/performance ratio of $1 per sustained MFlops, enables QCDOC to attack the most demanding lattice QCD problems. The first ASICs became available in June of 2003, and the testing performed so far has shown all systems functioning according to specification. We review the hardware and software status of QCDOC and present performance figures obtained in real hardware as well as in simulation.

  5. Enhancing Scalability of Sparse Direct Methods

    International Nuclear Information System (INIS)

    Li, Xiaoye S.; Demmel, James; Grigori, Laura; Gu, Ming; Xia, Jianlin; Jardin, Steve; Sovinec, Carl; Lee, Lie-Quan

    2007-01-01

    TOPS is providing high-performance, scalable sparse direct solvers, which have had significant impacts on the SciDAC applications, including fusion simulation (CEMM), accelerator modeling (COMPASS), as well as many other mission-critical applications in DOE and elsewhere. Our recent developments have been focusing on new techniques to overcome scalability bottleneck of direct methods, in both time and memory. These include parallelizing symbolic analysis phase and developing linear-complexity sparse factorization methods. The new techniques will make sparse direct methods more widely usable in large 3D simulations on highly-parallel petascale computers
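
    As a small, hedged illustration of what a sparse direct solve looks like from the application side (using SciPy's SuperLU interface, not the TOPS solvers themselves):

    ```python
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import splu

    n = 1000
    # 1D Poisson (tridiagonal) test matrix in compressed sparse column format
    A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
    lu = splu(A)                 # symbolic analysis + numeric LU factorization
    x = lu.solve(np.ones(n))     # triangular solves reuse the factorization
    print(np.linalg.norm(A @ x - np.ones(n)))
    ```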

  6. Dynamically-Loaded Hardware Libraries (HLL) Technology for Audio Applications

    DEFF Research Database (Denmark)

    Esposito, A.; Lomuscio, A.; Nunzio, L. Di

    2016-01-01

    In this work, we apply hardware acceleration to embedded systems running audio applications. We present a new framework, Dynamically-Loaded Hardware Libraries or HLL, to dynamically load hardware libraries on reconfigurable platforms (FPGAs). Provided a library of application-specific processors, we load on-the-fly the specific processor in the FPGA, and we transfer the execution from the CPU to the FPGA-based accelerator. The proposed architecture provides excellent flexibility with respect to the different audio applications implemented, high quality audio, and an energy efficient solution.

  7. Instrumentation Of The CERN Accelerator Logging Service: Ensuring Performance, Scalability, Maintenance And Diagnostics

    CERN Document Server

    Roderick, C; Dinis Teixeira, D

    2011-01-01

    The CERN accelerator Logging Service currently holds more than 90 terabytes of data online, and processes approximately 450 gigabytes per day, via hundreds of data loading processes and data extraction requests. This service is mission-critical for day-to-day operations, especially with respect to the tracking of live data from the LHC beam and equipment. In order to effectively manage any service, the service provider’s goals should include knowing how the underlying systems are being used, in terms of: “Who is doing what, from where, using which applications and methods, and how long each action takes”. Armed with such information, it is then possible to: analyse and tune system performance over time; plan for scalability ahead of time; assess the impact of maintenance operations and infrastructure upgrades; diagnose past, on-going, or re-occurring problems. The Logging Service is based on Oracle DBMS and Application Servers, and Java technology, and is comprised of several layered and multi-tiered s...

  8. Compact FPGA hardware architecture for public key encryption in embedded devices.

    Science.gov (United States)

    Rodríguez-Flores, Luis; Morales-Sandoval, Miguel; Cumplido, René; Feregrino-Uribe, Claudia; Algredo-Badillo, Ignacio

    2018-01-01

    Security is a crucial requirement in the envisioned applications of the Internet of Things (IoT), where most of the underlying computing platforms are embedded systems with reduced computing capabilities and energy constraints. In this paper we present the design and evaluation of a scalable low-area FPGA hardware architecture that serves as a building block to accelerate the costly operations of exponentiation and multiplication in [Formula: see text], commonly required in security protocols relying on public key encryption, such as key agreement, authentication and digital signature. The proposed design can process operands of different size using the same datapath, which exhibits a significant reduction in area without loss of efficiency if compared to representative state of the art designs. For example, our design uses 96% less standard logic than a similar design optimized for performance, and 46% fewer resources than another design optimized for area. Even using fewer area resources, our design still performs better than its embedded software counterparts (190x and 697x).

  9. A High Performance QDWH-SVD Solver using Hardware Accelerators

    KAUST Repository

    Sukkari, Dalal E.; Ltaief, Hatem; Keyes, David E.

    2015-01-01

    few digits of accuracy, compared to the full double precision floating point arithmetic. We further leverage the single GPU QDWH-SVD implementation by introducing the first multi-GPU SVD solver to study the scalability of the QDWH-SVD framework.

  10. A Software and Hardware IPTV Architecture for Scalable DVB Distribution

    Directory of Open Access Journals (Sweden)

    Georg Acher

    2009-01-01

    Full Text Available Many standards and even more proprietary technologies deal with IP-based television (IPTV). But none of them can transparently map popular public broadcast services such as DVB or ATSC to IPTV with acceptable effort. In this paper we explain why we believe that such a mapping using a lightweight framework is an important step towards all-IP multimedia. We then present the NetCeiver architecture: it is based on well-known standards such as IPv6, and it allows zero configuration. The use of multicast streaming makes NetCeiver highly scalable. We also describe a low cost FPGA implementation of the proposed NetCeiver architecture, which can concurrently stream services from up to six full transponders.
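
    A minimal, generic example of the kind of IPv6 multicast reception such a system relies on (the group address, port and payload framing are hypothetical; NetCeiver's actual announcement and streaming protocol is not shown):

    ```python
    import socket
    import struct

    GROUP = "ff15::1234"      # hypothetical site-local multicast group
    PORT = 12345              # hypothetical port

    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    # join the multicast group on the default interface (index 0)
    mreq = socket.inet_pton(socket.AF_INET6, GROUP) + struct.pack("@I", 0)
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP, mreq)

    while True:
        datagram, sender = sock.recvfrom(1500)
        # each datagram would carry one or more 188-byte MPEG-TS packets (assumption)
        print(len(datagram), "bytes from", sender[0])
    ```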

  11. Palacios and Kitten : high performance operating systems for scalable virtualized and native supercomputing.

    Energy Technology Data Exchange (ETDEWEB)

    Widener, Patrick (University of New Mexico); Jaconette, Steven (Northwestern University); Bridges, Patrick G. (University of New Mexico); Xia, Lei (Northwestern University); Dinda, Peter (Northwestern University); Cui, Zheng.; Lange, John (Northwestern University); Hudson, Trammell B.; Levenhagen, Michael J.; Pedretti, Kevin Thomas Tauke; Brightwell, Ronald Brian

    2009-09-01

    Palacios and Kitten are new open source tools that enable applications, whether ported or not, to achieve scalable high performance on large machines. They provide a thin layer over the hardware to support both full-featured virtualized environments and native code bases. Kitten is an OS under development at Sandia that implements a lightweight kernel architecture to provide predictable behavior and increased flexibility on large machines, while also providing Linux binary compatibility. Palacios is a VMM that is under development at Northwestern University and the University of New Mexico. Palacios, which can be embedded into Kitten and other OSes, supports existing, unmodified applications and operating systems by using virtualization that leverages hardware technologies. We describe the design and implementation of both Kitten and Palacios. Our benchmarks show that they provide near native, scalable performance. Palacios and Kitten provide an incremental path to using supercomputer resources that is not performance-compromised.

  12. Floating-point-based hardware accelerator of a beam phase-magnitude detector and filter for a beam phase control system in a heavy-ion synchrotron application

    International Nuclear Information System (INIS)

    Samman, F.A.; Pongyupinpanich Surapong; Spies, C.; Glesner, M.

    2012-01-01

    A hardware implementation of an adaptive phase and magnitude detector and filter of a beam-phase control system in a heavy ion synchrotron application is presented in this paper. The main components of the hardware are adaptive LMS (Least-Mean-Square) filters and phase and magnitude detectors. The phase detectors are implemented using a CORDIC (Coordinate Rotation Digital Computer) algorithm based on 32-bit binary floating-point arithmetic data formats. The floating-point-based hardware is designed to improve the precision of past hardware implementations that were based on fixed-point arithmetic. The hardware of the detector and the adaptive LMS filter has been implemented on a programmable logic device (FPGA) for hardware acceleration purposes. The ideal Matlab/Simulink model of the hardware and the VHDL model of the adaptive LMS filter and the phase and magnitude detector are compared. The comparison result shows that the output signals of the floating-point based adaptive FIR filter as well as the phase and magnitude detector agree with the expected output signals of the ideal Matlab/Simulink model. (authors)
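
    For orientation, a plain NumPy sketch of the LMS update that such an adaptive FIR filter performs (floating-point, sample-by-sample; word lengths, the CORDIC detector and the paper's fixed hardware structure are not modeled here):

    ```python
    import numpy as np

    def lms_filter(x, d, num_taps=8, mu=0.01):
        """Adapt FIR weights w so that w·x approximates the desired signal d."""
        w = np.zeros(num_taps)
        y = np.zeros_like(d)
        e = np.zeros_like(d)
        buf = np.zeros(num_taps)                 # most recent input samples, newest first
        for n in range(len(x)):
            buf = np.roll(buf, 1)
            buf[0] = x[n]
            y[n] = w @ buf                       # filter output
            e[n] = d[n] - y[n]                   # error against the reference
            w += 2.0 * mu * e[n] * buf           # LMS weight update
        return y, e, w
    ```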

  13. An FPGA-Based Quench Detection and Protection System for Superconducting Accelerator Magnets

    CERN Document Server

    Carcagno, Ruben H; Lamm, Michael J; Makulski, Andrzej; Nehring, Roger; Orris, Darryl; Pishchalnikov, Yu M; Tartaglia, M

    2005-01-01

    A new quench detection and protection system for superconducting accelerator magnets was developed at Fermilab's Magnet Test Facility (MTF). This system is based on a Field-Programmable Gate Array (FPGA) module, and it is made of mostly commercially available, integrated hardware and software components. It provides most of the functionality of our existing VME-based quench detection and protection system, but in addition the new system is easily scalable to protect multiple magnets powered independently and has a more powerful user interface and analysis tools. First applications of the new system will be for testing corrector coil packages. In this paper we describe the new system and present results of testing LHC Interaction Region Quadrupole (IRQ) correctors.

  14. An FPGA-based quench detection and protection system for superconducting accelerator magnets

    International Nuclear Information System (INIS)

    Carcagno, R.H.; Feher, S.; Lamm, M.; Makulski, A.; Nehring, R.; Orris, D.F.; Pischalnikov, Y.; Tartaglia, M.; Fermilab

    2005-01-01

    A new quench detection and protection system for superconducting accelerator magnets was developed for Fermilab's Magnet Test Facility (MTF). This system is based on a Field-Programmable Gate Array (FPGA) module, and it is made of mostly commercially available, integrated hardware and software components. It provides all the functions of our existing VME-based quench detection and protection system, but in addition the new system is easily scalable to protect multiple magnets powered independently and offers a more powerful user interface and analysis tools. The new system has been used successfully for testing LHC Interaction Region Quadrupole correctors and High Field Magnet HFDM04. In this paper we describe the system and present results.

  15. An FPGA-based quench detection and protection system for superconducting accelerator magnets

    Energy Technology Data Exchange (ETDEWEB)

    Carcagno, R.H.; Feher, S.; Lamm, M.; Makulski, A.; Nehring, R.; Orris, D.F.; Pischalnikov, Y.; Tartaglia, M.; /Fermilab

    2005-05-01

    A new quench detection and protection system for superconducting accelerator magnets was developed for Fermilab's Magnet Test Facility (MTF). This system is based on a Field-Programmable Gate Array (FPGA) module, and it is made of mostly commercially available, integrated hardware and software components. It provides all the functions of our existing VME-based quench detection and protection system, but in addition the new system is easily scalable to protect multiple magnets powered independently and offers a more powerful user interface and analysis tools. The new system has been used successfully for testing LHC Interaction Region Quadrupole correctors and High Field Magnet HFDM04. In this paper we describe the system and present results.

  16. FPGA Acceleration by Dynamically-Loaded Hardware Libraries

    DEFF Research Database (Denmark)

    Lomuscio, Andrea; Nannarelli, Alberto; Re, Marco

    -the-fly the specific processor in the FPGA, and we transfer the execution from the CPU to the FPGA-based accelerator. Results show that significant speed-up and energy efficiency can be obtained by HLL acceleration on system-on-chips where reconfigurable fabric is placed next to the CPUs.

  17. The VMTG Hardware Description

    CERN Document Server

    Puccio, B

    1998-01-01

    The document describes the hardware features of the CERN Master Timing Generator. This board is the common platform for the transmission of the General Machine Timing required by the CERN accelerators. In addition, the paper shows the various jumper options to customise the card, which is compliant with the VMEbus standard.

  18. Advanced technologies for scalable ATLAS conditions database access on the grid

    CERN Document Server

    Basset, R; Dimitrov, G; Girone, M; Hawkings, R; Nevski, P; Valassi, A; Vaniachine, A; Viegas, F; Walker, R; Wong, A

    2010-01-01

    During massive data reprocessing operations an ATLAS Conditions Database application must support concurrent access from numerous ATLAS data processing jobs running on the Grid. By simulating realistic work-flow, ATLAS database scalability tests provided feedback for Conditions Db software optimization and allowed precise determination of required distributed database resources. In distributed data processing one must take into account the chaotic nature of Grid computing characterized by peak loads, which can be much higher than average access rates. To validate database performance at peak loads, we tested database scalability at very high concurrent jobs rates. This has been achieved through coordinated database stress tests performed in series of ATLAS reprocessing exercises at the Tier-1 sites. The goal of database stress tests is to detect scalability limits of the hardware deployed at the Tier-1 sites, so that the server overload conditions can be safely avoided in a production environment. Our analysi...

  19. Design and implementation of embedded hardware accelerator for diagnosing HDL-CODE in assertion-based verification environment

    Directory of Open Access Journals (Sweden)

    C. U. Ngene

    2013-08-01

    Full Text Available The use of assertions for monitoring the designer’s intention in a hardware description language (HDL) model is gaining popularity, as it helps the designer to observe internal errors at the output ports of the device under verification. During verification, assertions are synthesised and the generated data are represented in tabular form. The amount of data generated can be enormous, depending on the size of the code and the number of modules that constitute it. Furthermore, manually inspecting these data to diagnose the module with a functional violation is a time-consuming process that negatively affects the overall product development time. To locate the module with a functional violation within an acceptable diagnostic time, the data processing and analysis procedure must be accelerated. In this paper a multi-array processor (hardware accelerator) was designed and implemented in a Virtex6 field-programmable gate array (FPGA); it can be integrated into the verification environment. The design was captured in very high speed integrated circuit HDL (VHDL), synthesised with the Xilinx design suite ISE 13.1 and simulated with Xilinx ISIM. The multi-array processor (MAP) executes three logical operations (AND, OR, XOR) and a one’s compaction operation on arrays of data in parallel. An improvement in processing and analysis time was recorded compared to the manual procedure after the multi-array processor was integrated into the verification environment. It was also found that the multi-array processor, developed as an Intellectual Property (IP) core, can also be used in applications where output responses and a golden model represented in the form of matrices are compared for searching, recognition and decision-making.
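
    In software terms, the operations performed by the MAP on packed assertion-result words could be mimicked as follows (the word layout and the popcount-style interpretation of "one's compaction" are assumptions for illustration):

    ```python
    import numpy as np

    def map_operations(a, b):
        """Bitwise AND/OR/XOR over arrays of packed result words, plus one's compaction."""
        assert a.dtype == b.dtype == np.uint32 and a.shape == b.shape
        return {
            "and": a & b,
            "or": a | b,
            "xor": a ^ b,
            # one's compaction interpreted here as counting set bits in the XOR (mismatches)
            "ones": int(np.unpackbits((a ^ b).view(np.uint8)).sum()),
        }

    a = np.array([0b1010, 0b1111], dtype=np.uint32)
    b = np.array([0b0110, 0b1111], dtype=np.uint32)
    print(map_operations(a, b))
    ```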

  20. DISP: Optimizations towards Scalable MPI Startup

    Energy Technology Data Exchange (ETDEWEB)

    Fu, Huansong [Florida State University, Tallahassee; Pophale, Swaroop S [ORNL; Gorentla Venkata, Manjunath [ORNL; Yu, Weikuan [Florida State University, Tallahassee

    2016-01-01

    Despite the popularity of MPI for high performance computing, the startup of MPI programs faces a scalability challenge as both the execution time and memory consumption increase drastically at scale. We have examined this problem using the collective modules of Cheetah and Tuned in Open MPI as representative implementations. Previous improvements for collectives have focused on algorithmic advances and hardware off-load. In this paper, we examine the startup cost of the collective module within a communicator and explore various techniques to improve its efficiency and scalability. Accordingly, we have developed a new scalable startup scheme with three internal techniques, namely Delayed Initialization, Module Sharing and Prediction-based Topology Setup (DISP). Our DISP scheme greatly benefits the collective initialization of the Cheetah module. At the same time, it helps boost the performance of non-collective initialization in the Tuned module. We evaluate the performance of our implementation on Titan supercomputer at ORNL with up to 4096 processes. The results show that our delayed initialization can speed up the startup of Tuned and Cheetah by an average of 32.0% and 29.2%, respectively, our module sharing can reduce the memory consumption of Tuned and Cheetah by up to 24.1% and 83.5%, respectively, and our prediction-based topology setup can speed up the startup of Cheetah by up to 80%.

  1. Hardware Algorithms For Tile-Based Real-Time Rendering

    NARCIS (Netherlands)

    Crisu, D.

    2012-01-01

    In this dissertation, we present the GRAphics AcceLerator (GRAAL) framework for developing embedded tile-based rasterization hardware for mobile devices, meant to accelerate real-time 3-D graphics (OpenGL compliant) applications. The goal of the framework is a low-cost, low-power, high-performance

  2. CODA: A scalable, distributed data acquisition system

    International Nuclear Information System (INIS)

    Watson, W.A. III; Chen, J.; Heyes, G.; Jastrzembski, E.; Quarrie, D.

    1994-01-01

    A new data acquisition system has been designed for physics experiments scheduled to run at CEBAF starting in the summer of 1994. This system runs on Unix workstations connected via ethernet, FDDI, or other network hardware to multiple intelligent front end crates -- VME, CAMAC or FASTBUS. CAMAC crates may either contain intelligent processors, or may be interfaced to VME. The system is modular and scalable, from a single front end crate and one workstation linked by ethernet, to as many as 32 clusters of front end crates ultimately connected via a high speed network to a set of analysis workstations. The system includes an extensible, device independent slow controls package with drivers for CAMAC, VME, and high voltage crates, as well as a link to CEBAF accelerator controls. All distributed processes are managed by standard remote procedure calls propagating change-of-state requests, or reading and writing program variables. Custom components may be easily integrated. The system is portable to any front end processor running the VxWorks real-time kernel, and to most workstations supplying a few standard facilities such as rsh and X-windows, and Motif and socket libraries. Sample implementations exist for 2 Unix workstation families connected via ethernet or FDDI to VME (with interfaces to FASTBUS or CAMAC), and via ethernet to FASTBUS or CAMAC

  3. Apple-CORE: Microgrids of SVP cores: flexible, general-purpose, fine-grained hardware concurrency management

    NARCIS (Netherlands)

    Poss, R.; Lankamp, M.; Yang, Q.; Fu, J.; van Tol, M.W.; Jesshope, C.; Nair, S.

    2012-01-01

    To harness the potential of CMPs for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency control across multiple cores. Its SVP interface

  4. Final Report: Enabling Exascale Hardware and Software Design through Scalable System Virtualization

    Energy Technology Data Exchange (ETDEWEB)

    Bridges, Patrick G.

    2015-02-01

    In this grant, we enhanced the Palacios virtual machine monitor to increase its scalability and suitability for addressing exascale system software design issues. This included a wide range of research on core Palacios features, large-scale system emulation, fault injection, performance monitoring, and VMM extensibility. This research resulted in a large number of high-impact publications in well-known venues, the support of a number of students, and the graduation of two Ph.D. students and one M.S. student. In addition, our enhanced version of the Palacios virtual machine monitor has been adopted as a core element of the Hobbes operating system under active DOE-funded research and development.

  5. Hardware Support for Dynamic Languages

    DEFF Research Database (Denmark)

    Schleuniger, Pascal; Karlsson, Sven; Probst, Christian W.

    2011-01-01

    In recent years, dynamic programming languages have enjoyed increasing popularity. For example, JavaScript has become one of the most popular programming languages on the web. As the complexity of web applications is growing, compute-intensive workloads are increasingly handed off to the client side. While a lot of effort is put in increasing the performance of web browsers, we aim for multicore systems with dedicated cores to effectively support dynamic languages. We have designed Tinuso, a highly flexible core for experimentation that is optimized for high performance when implemented on FPGA. We composed a scalable multicore configuration where we study how hardware support for software speculation can be used to increase the performance of dynamic languages.

  6. Research of Virtual Accelerator Control System

    Institute of Scientific and Technical Information of China (English)

    DongJinmei; YuanYoujin; ZhengJianhua

    2003-01-01

    A Virtual Accelerator is a computer process which simulates the behavior of beam in an accelerator and responds to the accelerator control program under development in the same way as an actual accelerator. To realize a Virtual Accelerator, the control system should provide the same program interface to the top-layer Application Control Program, so that the 'Real Accelerator' and the 'Virtual Accelerator' can use the same GUI. The control system should therefore have a layer that hides hardware details, with the Application Control Program accessing control devices through logical names rather than coded hardware addresses. Without this layer, it is difficult to develop application programs which can access both 'Virtual' and 'Real' Accelerators using the same program interfaces. For this reason, we create a CSR Runtime Database which allows application programs to access hardware devices and data on a simulation process in a unified way. A device is represented as a collection of records in the CSR Runtime Database. A control program on a host computer can access devices in the system only through the names of record fields, called channels.

  7. Scalability Modeling for Optimal Provisioning of Data Centers in Telenor: A better balance between under- and over-provisioning

    OpenAIRE

    Rygg, Knut Helge

    2012-01-01

    The scalability of an information system describes the relationship between system capacity and system size. This report studies the scalability of Microsoft Lync Server 2010 in order to provide guidelines for provisioning hardware resources. Optimal provisioning is required to reduce both deployment and operational costs, while keeping an acceptable service quality. All Lync servers in the test setup are virtualized using VMware ESXi 5.0 and the system runs on a Cisco Unified Computing System...

  8. No-hardware-signature cybersecurity-crypto-module: a resilient cyber defense agent

    Science.gov (United States)

    Zaghloul, A. R. M.; Zaghloul, Y. A.

    2014-06-01

    We present an optical cybersecurity-crypto-module as a resilient cyber defense agent. It has no hardware signature since it is bitstream reconfigurable, where single hardware architecture functions as any selected device of all possible ones of the same number of inputs. For a two-input digital device, a 4-digit bitstream of 0s and 1s determines which device, of a total of 16 devices, the hardware performs as. Accordingly, the hardware itself is not physically reconfigured, but its performance is. Such a defense agent allows the attack to take place, rendering it harmless. On the other hand, if the system is already infected with malware sending out information, the defense agent allows the information to go out, rendering it meaningless. The hardware architecture is immune to side attacks since such an attack would reveal information on the attack itself and not on the hardware. This cyber defense agent can be used to secure a point-to-point, point-to-multipoint, a whole network, and/or a single entity in the cyberspace. Therefore, ensuring trust between cyber resources. It can provide secure communication in an insecure network. We provide the hardware design and explain how it works. Scalability of the design is briefly discussed. (Protected by United States Patents No.: US 8,004,734; US 8,325,404; and other National Patents worldwide.)
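
    The 4-bit configuration idea for a two-input device can be sketched directly: the bitstream is simply the truth table, and the same structure then behaves as any of the 16 possible gates (a software analogy only, not the patented optical design):

    ```python
    def make_two_input_device(bitstream):
        """bitstream: 4 output bits for inputs (a, b) = (0,0), (0,1), (1,0), (1,1)."""
        assert len(bitstream) == 4 and all(bit in (0, 1) for bit in bitstream)
        def device(a, b):
            return bitstream[(a << 1) | b]
        return device

    AND = make_two_input_device([0, 0, 0, 1])
    XOR = make_two_input_device([0, 1, 1, 0])
    NAND = make_two_input_device([1, 1, 1, 0])
    print(AND(1, 1), XOR(1, 0), NAND(1, 1))   # -> 1 1 0
    ```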

  9. Energy Efficient FPGA based Hardware Accelerators for Financial Applications

    DEFF Research Database (Denmark)

    Kenn Toft, Jakob; Nannarelli, Alberto

    2014-01-01

    Field Programmable Gate Array (FPGA)-based accelerators are very suitable to implement application-specific processors using uncommon operations or number systems. In this work, we design FPGA-based accelerators for two financial computations with different characteristics and we compare the accelerator performance and energy consumption to a software execution of the application. The experimental results show that significant speed-up and energy savings can be obtained for large data sets by using the accelerator, at the expense of a longer development time.

  10. A hardware acceleration based on high-level synthesis approach for glucose-insulin analysis

    Science.gov (United States)

    Daud, Nur Atikah Mohd; Mahmud, Farhanahani; Jabbar, Muhamad Hairol

    2017-01-01

    In this paper, the research focuses on Type 1 Diabetes Mellitus (T1DM). Since this disease requires close attention to the blood glucose concentration, managed with the help of insulin injections, it is important to have a tool that is able to predict the glucose level when a certain amount of carbohydrate is consumed at meal time. Therefore, the Hovorka model, which targets T1DM, is chosen in this research. C++ is chosen as the high-level language to construct the mathematical model of the Hovorka model. This code is then converted into an intellectual property (IP) core, also known as a hardware accelerator, using a high-level synthesis (HLS) approach, which improves the design and performance of the glucose-insulin analysis tool, as explained further in this paper. This is the first step in this research before implementing the design on a system-on-chip (SoC) to achieve a high-performance system for the glucose-insulin analysis tool.

  11. The TOTEM DAQ based on the Scalable Readout System (SRS)

    Science.gov (United States)

    Quinto, Michele; Cafagna, Francesco S.; Fiergolski, Adrian; Radicioni, Emilio

    2018-02-01

    The TOTEM (TOTal cross section, Elastic scattering and diffraction dissociation Measurement at the LHC) experiment at the LHC has been designed to measure the total proton-proton cross-section and study elastic and diffractive scattering at LHC energies. In order to cope with the increased machine luminosity and the higher statistics required by the extension of the TOTEM physics program, approved for the LHC's Run Two phase, the previous VME-based data acquisition system has been replaced with a new one based on the Scalable Readout System. The system features an aggregated data throughput of 2 GB/s towards the online storage system. This makes it possible to sustain a maximum trigger rate of ~24 kHz, to be compared with the 1 kHz rate of the previous system. The trigger rate is further improved by implementing zero-suppression and second-level hardware algorithms in the Scalable Readout System. The new system fulfils the requirements for increased efficiency, providing higher bandwidth and increasing the purity of the recorded data. Moreover, full compatibility has been guaranteed with the legacy front-end hardware, as well as with the DAQ interface of the CMS experiment and with the LHC's Timing, Trigger and Control distribution system. In this contribution we describe in detail the architecture of the full system and its performance measured during the commissioning phase at the LHC Interaction Point.

  12. Hardware support for CSP on a Java chip multiprocessor

    DEFF Research Database (Denmark)

    Gruian, Flavius; Schoeberl, Martin

    2013-01-01

    Due to memory bandwidth limitations, chip multiprocessors (CMPs) adopting the convenient shared memory model for their main memory architecture scale poorly. On-chip core-to-core communication is a solution to this problem that can lead to further performance increases for a number of multithreaded applications. Programmatically, the Communicating Sequential Processes (CSP) paradigm provides a sound computational model for such an architecture with message-based communication. In this paper we explore hardware support for CSP in the context of an embedded Java CMP. The hardware support for CSP consists of on-chip communication channels, implemented by a ring-based network-on-chip (NoC), to reduce the memory bandwidth pressure on the shared memory. The presented solution is scalable and also specific to our limited resources and real-time predictability requirements. CMP architectures of three to eight processors were

  13. Accelerating the Non-equispaced Fast Fourier Transform on Commodity Graphics Hardware

    DEFF Research Database (Denmark)

    Sørensen, Thomas Sangild; Schaeffter, Tobias; Noe, Karsten Østergaard

    2008-01-01

    We present a fast parallel algorithm to compute the Non-equispaced fast Fourier transform on commodity graphics hardware (the GPU). We focus particularly on a novel implementation of the convolution step in the transform, which was previously its most time consuming part. We describe the performa...

  14. Implementation of the Timepix ASIC in the Scalable Readout System

    Energy Technology Data Exchange (ETDEWEB)

    Lupberger, M., E-mail: lupberger@physik.uni-bonn.de; Desch, K.; Kaminski, J.

    2016-09-11

    We report on the development of electronics hardware, FPGA firmware and software to provide a flexible multi-chip readout of the Timepix ASIC within the framework of the Scalable Readout System (SRS). The system features FPGA-based zero-suppression and the possibility to read out up to 4×8 chips with a single Front End Concentrator (FEC). By operating several FECs in parallel, in principle an arbitrary number of chips can be read out, exploiting the scaling features of SRS. Specifically, we tested the system with a setup consisting of 160 Timepix ASICs, operated as GridPix devices in a large TPC field cage in a 1 T magnetic field at a DESY test beam facility providing an electron beam of up to 6 GeV. We discuss the design choices, the dedicated hardware components, the FPGA firmware as well as the performance of the system in the test beam.

  15. Center for Programming Models for Scalable Parallel Computing - Towards Enhancing OpenMP for Manycore and Heterogeneous Nodes

    Energy Technology Data Exchange (ETDEWEB)

    Barbara Chapman

    2012-02-01

    OpenMP was not well recognized at the beginning of the project, around year 2003, because of its limited use in DoE production applications and the immature hardware support for an efficient implementation. Yet in recent years, it has been gradually adopted both in HPC applications, mostly in the form of MPI+OpenMP hybrid code, and in mid-scale desktop applications for scientific and experimental studies. We have observed this trend and worked diligently to improve our OpenMP compiler and runtimes, as well as to work with the OpenMP standard organization to make sure OpenMP evolves in a direction close to DoE missions. In the Center for Programming Models for Scalable Parallel Computing project, the HPCTools team at the University of Houston (UH), directed by Dr. Barbara Chapman, has been working with project partners, external collaborators and hardware vendors to increase the scalability and applicability of OpenMP for multi-core (and future manycore) platforms and for distributed memory systems by exploring different programming models, language extensions, compiler optimizations, as well as runtime library support.

  16. Binary Associative Memories as a Benchmark for Spiking Neuromorphic Hardware

    Directory of Open Access Journals (Sweden)

    Andreas Stöckel

    2017-08-01

    Full Text Available Large-scale neuromorphic hardware platforms, specialized computer systems for energy efficient simulation of spiking neural networks, are being developed around the world, for example as part of the European Human Brain Project (HBP. Due to conceptual differences, a universal performance analysis of these systems in terms of runtime, accuracy and energy efficiency is non-trivial, yet indispensable for further hard- and software development. In this paper we describe a scalable benchmark based on a spiking neural network implementation of the binary neural associative memory. We treat neuromorphic hardware and software simulators as black-boxes and execute exactly the same network description across all devices. Experiments on the HBP platforms under varying configurations of the associative memory show that the presented method allows to test the quality of the neuron model implementation, and to explain significant deviations from the expected reference output.
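
    A compact non-spiking reference for the underlying model, the binary (Willshaw-type) associative memory that the benchmark realizes with spiking neurons (the parameters here are arbitrary; the HBP benchmark itself maps this onto neuron and synapse models):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_out, k, pairs = 64, 64, 4, 20

    def sparse_pattern(n):
        """Binary pattern with exactly k active units."""
        p = np.zeros(n, dtype=np.uint8)
        p[rng.choice(n, size=k, replace=False)] = 1
        return p

    inputs = [sparse_pattern(n_in) for _ in range(pairs)]
    outputs = [sparse_pattern(n_out) for _ in range(pairs)]

    # Hebbian storage: the binary weight matrix is the OR of the outer products
    W = np.zeros((n_out, n_in), dtype=np.uint8)
    for x, y in zip(inputs, outputs):
        W |= np.outer(y, x)

    def recall(x):
        # threshold the dendritic sums at the number of active input units
        return (W @ x >= x.sum()).astype(np.uint8)

    errors = sum(int(np.any(recall(x) != y)) for x, y in zip(inputs, outputs))
    print("patterns with recall errors:", errors)
    ```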

  17. Implementation of the Lattice Boltzmann Method on Heterogeneous Hardware and Platforms using OpenCL

    Directory of Open Access Journals (Sweden)

    TEKIC, P. M.

    2012-02-01

    Full Text Available The Lattice Boltzmann method (LBM) has become an alternative method for computational fluid dynamics with a wide range of applications. Besides its numerical stability and accuracy, one of the major advantages of LBM is its relatively easy parallelization; hence, it is especially well suited to many-core hardware such as graphics processing units (GPUs). The majority of work concerning LBM implementation on GPUs has used the CUDA programming model, supported exclusively by NVIDIA. Recently, the open standard for parallel programming of heterogeneous systems (OpenCL) has been introduced. The OpenCL standard is maturing and is supported on processors from most vendors. In this paper, we make use of the OpenCL framework for lattice Boltzmann method simulation, using hardware accelerators: an AMD ATI Radeon GPU, an AMD dual-core CPU and NVIDIA GeForce GPUs. The application has been developed using a combination of the Java and OpenCL programming languages, with Java bindings for OpenCL. This approach offers the benefits of hardware and operating system independence, as well as a speed-up of the lattice Boltzmann algorithm. It has been shown that the developed lattice Boltzmann source code can be executed without modification on all of the hardware accelerators used. Performance results have been presented and compared for the hardware accelerators that have been utilized.

  18. Hardware-software face detection system based on multi-block local binary patterns

    Science.gov (United States)

    Acasandrei, Laurentiu; Barriga, Angel

    2015-03-01

    Face detection is an important aspect of biometrics, video surveillance and human-computer interaction. Due to the complexity of the detection algorithms, any face detection system requires a huge amount of computational and memory resources. In this communication an accelerated implementation of the MB-LBP face detection algorithm targeting low-frequency, low-memory and low-power embedded systems is presented. The resulting implementation is time-deterministic and uses a customizable AMBA IP hardware accelerator. The IP implements the kernel operations of the MB-LBP algorithm and can be used as a universal accelerator for MB-LBP based applications. The IP employs 8 parallel MB-LBP feature evaluator cores, uses a deterministic bandwidth, has a low area profile, and its power consumption is ~95 mW on a Virtex5 XC5VLX50T. The resulting acceleration gain is between 5 and 8 times, while the hardware MB-LBP feature evaluation gain is between 69 and 139 times.
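
    For context, the MB-LBP feature itself compares the mean intensity of eight neighbouring blocks against the central block; a small NumPy sketch (block size and layout are illustrative, and the cascade/classifier stages of the detector are omitted):

    ```python
    import numpy as np

    def mb_lbp_code(img, x, y, bw, bh):
        """8-bit MB-LBP code for the 3x3 block grid whose top-left corner is (x, y)."""
        means = np.empty((3, 3))
        for i in range(3):
            for j in range(3):
                block = img[y + i * bh: y + (i + 1) * bh, x + j * bw: x + (j + 1) * bw]
                means[i, j] = block.mean()
        center = means[1, 1]
        # clockwise neighbour order starting at the top-left block
        order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
        code = 0
        for bit, (i, j) in enumerate(order):
            if means[i, j] >= center:
                code |= 1 << bit
        return code

    img = np.random.default_rng(1).integers(0, 256, size=(24, 24)).astype(np.float32)
    print(mb_lbp_code(img, 0, 0, bw=4, bh=4))
    ```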

  19. Scalable Multicasting over Next-Generation Internet Design, Analysis and Applications

    CERN Document Server

    Tian, Xiaohua

    2013-01-01

    Next-generation Internet providers face high expectations, as contemporary users worldwide expect high-quality multimedia functionality in a landscape of ever-expanding network applications. This volume explores the critical research issue of turning today’s greatly enhanced hardware capacity to good use in designing a scalable multicast  protocol for supporting large-scale multimedia services. Linking new hardware to improved performance in the Internet’s next incarnation is a research hot-spot in the computer communications field.   The methodical presentation deals with the key questions in turn: from the mechanics of multicast protocols to current state-of-the-art designs, and from methods of theoretical analysis of these protocols to applying them in the ns2 network simulator, known for being hard to extend. The authors’ years of research in the field inform this thorough treatment, which covers details such as applying AOM (application-oriented multicast) protocol to IPTV provision and resolving...

  20. Integration of an intelligent systems behavior simulator and a scalable soldier-machine interface

    Science.gov (United States)

    Johnson, Tony; Manteuffel, Chris; Brewster, Benjamin; Tierney, Terry

    2007-04-01

    As the Army's Future Combat Systems (FCS) introduce emerging technologies and new force structures to the battlefield, soldiers will increasingly face new challenges in workload management. The next generation warfighter will be responsible for effectively managing robotic assets in addition to performing other missions. Studies of future battlefield operational scenarios involving the use of automation, including the specification of existing and proposed technologies, will provide significant insight into potential problem areas regarding soldier workload. The US Army Tank Automotive Research, Development, and Engineering Center (TARDEC) is currently executing an Army technology objective program to analyze and evaluate the effect of automated technologies and their associated control devices with respect to soldier workload. The Human-Robotic Interface (HRI) Intelligent Systems Behavior Simulator (ISBS) is a human performance measurement simulation system that allows modelers to develop constructive simulations of military scenarios with various deployments of interface technologies in order to evaluate operator effectiveness. One such interface is TARDEC's Scalable Soldier-Machine Interface (SMI). The scalable SMI provides a configurable machine interface application that is capable of adapting to several hardware platforms by recognizing the physical space limitations of the display device. This paper describes the integration of the ISBS and Scalable SMI applications, which will ultimately benefit both systems. The ISBS will be able to use the Scalable SMI to visualize the behaviors of virtual soldiers performing HRI tasks, such as route planning, and the scalable SMI will benefit from stimuli provided by the ISBS simulation environment. The paper describes the background of each system and details of the system integration approach.

  1. Hardware-accelerated autostereogram rendering for interactive 3D visualization

    Science.gov (United States)

    Petz, Christoph; Goldluecke, Bastian; Magnor, Marcus

    2003-05-01

    Single Image Random Dot Stereograms (SIRDS) are an attractive way of depicting three-dimensional objects using conventional display technology. Once trained in decoupling the eyes' convergence and focusing, autostereograms of this kind are able to convey the three-dimensional impression of a scene. We present in this work an algorithm that generates SIRDS at interactive frame rates on a conventional PC. The presented system allows rotating a 3D geometry model and observing the object from arbitrary positions in real-time. Subjective tests show that the perception of a moving or rotating 3D scene presents no problem: The gaze remains focused onto the object. In contrast to conventional SIRDS algorithms, we render multiple pixels in a single step using a texture-based approach, exploiting the parallel-processing architecture of modern graphics hardware. A vertex program determines the parallax for each vertex of the geometry model, and the graphics hardware's texture unit is used to render the dot pattern. No data has to be transferred between main memory and the graphics card for generating the autostereograms, leaving CPU capacity available for other tasks. Frame rates of 25 fps are attained at a resolution of 1024x512 pixels on a standard PC using a consumer-grade nVidia GeForce4 graphics card, demonstrating the real-time capability of the system.
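
    For readers unfamiliar with how a random-dot autostereogram is formed, the CPU sketch below shows the simplest variant of the construction the record renders on the GPU: each row is seeded with random dots, and every pixel is constrained to equal the pixel a depth-dependent separation to its left, which is what creates the stereoscopic percept. The separation constants are arbitrary and hidden-surface handling is omitted, so this is a simplified reference rather than the paper's texture-based GPU method.

```python
# Simplified single-image random-dot stereogram (SIRDS) from a depth map.
# Nearer surfaces get a smaller left-right separation; constants are arbitrary.
import numpy as np

def sirds(depth, max_sep=90, depth_range=30):
    """depth in [0, 1], 1 = nearest. Returns a binary dot image of the same size."""
    h, w = depth.shape
    img = np.zeros((h, w), dtype=np.uint8)
    rng = np.random.default_rng(0)
    for y in range(h):
        for x in range(w):
            sep = int(max_sep - depth[y, x] * depth_range)
            if x >= sep:
                img[y, x] = img[y, x - sep]      # repeat the linked pixel
            else:
                img[y, x] = rng.integers(0, 2)   # seed the row with random dots
    return img * 255

# Hypothetical depth map: a raised square in the middle of the image.
depth = np.zeros((256, 512))
depth[96:160, 200:312] = 1.0
dots = sirds(depth)
```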

  2. Space Situational Awareness Data Processing Scalability Utilizing Google Cloud Services

    Science.gov (United States)

    Greenly, D.; Duncan, M.; Wysack, J.; Flores, F.

    Space Situational Awareness (SSA) is a fundamental and critical component of current space operations. The term SSA encompasses the awareness, understanding and predictability of all objects in space. As the population of orbital space objects and debris increases, the number of collision avoidance maneuvers grows and prompts the need for accurate and timely process measures. The SSA mission continually evolves toward near real-time assessment and analysis, demanding higher processing capabilities. By conventional methods, meeting these demands requires the integration of new hardware to keep pace with the growing complexity of maneuver planning algorithms. SpaceNav has implemented a highly scalable architecture that will track satellites and debris by utilizing powerful virtual machines on the Google Cloud Platform. SpaceNav algorithms for processing CDMs (conjunction data messages) outpace conventional means. A robust processing environment for tracking data, collision avoidance maneuvers and various other aspects of SSA can be created and deleted on demand. Migrating SpaceNav tools and algorithms into the Google Cloud Platform will be discussed, along with the trials and tribulations involved. Information will be shared on how and why certain cloud products were used, as well as the integration techniques that were implemented. Key items to be presented are: 1. Scientific algorithms and SpaceNav tools integrated into a scalable architecture: (a) maneuver planning; (b) parallel processing; (c) Monte Carlo simulations; (d) optimization algorithms; (e) SW application development/integration into the Google Cloud Platform. 2. Compute Engine processing: (a) Application Engine automated processing; (b) performance testing and performance scalability; (c) Cloud MySQL databases and database scalability; (d) cloud data storage; (e) redundancy and availability.

  3. Content-Aware Scalability-Type Selection for Rate Adaptation of Scalable Video

    Directory of Open Access Journals (Sweden)

    Tekalp A Murat

    2007-01-01

    Full Text Available Scalable video coders provide different scaling options, such as temporal, spatial, and SNR scalabilities, where rate reduction by discarding enhancement layers of a particular scalability type results in different kinds and/or levels of visual distortion depending on the content and bitrate. This dependency between scalability type, video content, and bitrate is not well investigated in the literature. To this end, we first propose an objective function that quantifies flatness, blockiness, blurriness, and temporal jerkiness artifacts caused by rate reduction via spatial-size, frame-rate, and quantization-parameter scaling. Next, the weights of this objective function are determined for different content (shot) types and different bitrates using a training procedure with subjective evaluation. Finally, a method is proposed for choosing, for each temporal segment, the scaling type that results in minimum visual distortion according to this objective function, given the content type of the segment. Two subjective tests have been performed to validate the proposed procedure for content-aware selection of the best scalability type on soccer videos. Soccer videos scaled from 600 kbps to 100 kbps by the proposed content-aware selection of scalability type have been found visually superior to those scaled using a single scalability option over the whole sequence.

  4. Exploiting multi-scale parallelism for large scale numerical modelling of laser wakefield accelerators

    International Nuclear Information System (INIS)

    Fonseca, R A; Vieira, J; Silva, L O; Fiuza, F; Davidson, A; Tsung, F S; Mori, W B

    2013-01-01

    A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-Class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance over ~2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios. (paper)

  5. Hardware Middleware for Person Tracking on Embedded Distributed Smart Cameras

    Directory of Open Access Journals (Sweden)

    Ali Akbar Zarezadeh

    2012-01-01

    Full Text Available Tracking individuals is a prominent application in domains such as surveillance and smart environments. This paper presents the development of a multiple-camera setup with a joint view that observes moving persons in a site. It focuses on a geometry-based approach to establish correspondence among the different views. The computationally expensive parts of the tracker are hardware accelerated via a novel system-on-chip (SoC) design. In conjunction with this vision application, a hardware object request broker (ORB) middleware is presented as the underlying communication system. The hardware ORB provides a hardware/software architecture to achieve real-time intercommunication among multiple smart cameras. Via a probing mechanism, a performance analysis is performed to measure network latencies, that is, the time to traverse the TCP/IP stack, for both the software and hardware ORB approaches on the same smart camera platform. The empirical results show that using the proposed hardware ORB as client and server in separate smart camera nodes reduces network latency by up to a factor of 100 compared to the software ORB.

  6. VALU, AVX and GPU acceleration techniques for parallel FDTD methods

    CERN Document Server

    Yu, Wenhua

    2013-01-01

    This book introduces a general hardware acceleration technique that can significantly speed up FDTD simulations and their applications to engineering problems without requiring any additional hardware devices. This acceleration of complex problems can save both time and money, and once learned, these techniques can be used repeatedly.
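
    As context for the vector-unit (VALU/AVX) and GPU acceleration the book covers, the sketch below shows the inner loop of a one-dimensional FDTD update in vectorized NumPy form; it is a generic textbook Yee update with arbitrary constants, not code from the book.

```python
# Textbook 1D FDTD (Yee) update loop in vectorized form; the array-wide updates
# are the operations that VALU/AVX and GPU acceleration target.
import numpy as np

n_cells, n_steps = 400, 1000
ez = np.zeros(n_cells)          # electric field
hy = np.zeros(n_cells)          # magnetic field
imp0 = 377.0                    # free-space impedance (normalized units)

for t in range(n_steps):
    hy[:-1] += (ez[1:] - ez[:-1]) / imp0        # update H from the curl of E
    ez[1:]  += (hy[1:] - hy[:-1]) * imp0        # update E from the curl of H
    ez[50]  += np.exp(-((t - 30.0) / 10.0)**2)  # additive Gaussian source at cell 50
```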

  7. Hardware Genetic Algorithm Optimization by Critical Path Analysis using a Custom VLSI Architecture

    Directory of Open Access Journals (Sweden)

    Farouk Smith

    2015-07-01

    Full Text Available This paper proposes a Virtual Field-Programmable Gate Array (V-FPGA) architecture that allows direct access to its configuration bits to facilitate hardware evolution, thereby allowing any combinational or sequential digital circuit to be realized. By using the V-FPGA, this paper investigates two possible ways of making evolutionary hardware systems more scalable: by optimizing the system's genetic algorithm (GA), and by decomposing the solution circuit into smaller, evolvable sub-circuits. GA optimization is done by omitting a canonical GA's crossover operator (i.e., by using a 1+λ algorithm), applying evolution constraints, and optimizing the fitness function. A noteworthy contribution of this research is the in-depth analysis of the phenotypes' critical paths (CPs). Through analyzing the CPs, it has been shown that a great amount of insight can be gained into a phenotype's fitness. We found that as the number of columns in the Cartesian Genetic Programming array increases, the likelihood of an external output being placed in a column decreases. Furthermore, the number of used LEs per column also decreases substantially with each added column. Finally, we demonstrated the evolution of a state-decomposed control circuit. It was shown that the evolution of each state's sub-circuit is possible, suggesting that modular evolution can be a successful tool when dealing with scalability.
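
    The GA simplification the paper adopts, dropping crossover in favour of a 1+λ scheme, reduces to the loop sketched below: keep a single parent genome, create λ mutated offspring each generation, and replace the parent whenever an offspring is at least as fit. The bit-string genome and toy fitness function are placeholders for the V-FPGA configuration bits and circuit fitness used in the paper.

```python
# Minimal (1+lambda) evolutionary strategy, the crossover-free GA variant
# referenced in the abstract. Genome and fitness are toy placeholders.
import random

GENOME_LEN = 64          # stands in for V-FPGA configuration bits
LAMBDA = 4               # offspring per generation
MUTATION_RATE = 0.02

def fitness(genome):
    return sum(genome)   # placeholder: count of set bits

def mutate(genome):
    return [bit ^ (random.random() < MUTATION_RATE) for bit in genome]

parent = [random.randint(0, 1) for _ in range(GENOME_LEN)]
parent_fit = fitness(parent)

for generation in range(200):
    offspring = [mutate(parent) for _ in range(LAMBDA)]
    best = max(offspring, key=fitness)
    if fitness(best) >= parent_fit:          # accept ties to allow neutral drift
        parent, parent_fit = best, fitness(best)

print(parent_fit)
```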

  8. A Prediction Packetizing Scheme for Reducing Channel Traffic in Transaction-Level Hardware/Software Co-Emulation

    OpenAIRE

    Lee , Jae-Gon; Chung , Moo-Kyoung; Ahn , Ki-Yong; Lee , Sang-Heon; Kyung , Chong-Min

    2005-01-01

    Submitted on behalf of EDAA (http://www.edaa.com/); International audience; This paper presents a scheme for efficient channel usage between simulator and accelerator where the accelerator models some RTL sub-blocks in the accelerator-based hardware/software co-simulation while the simulator runs transaction-level model of the remaining part of the whole chip being verified. With conventional simulation accelerator, evaluations of simulator and accelerator alternate at every valid simulation ...

  9. Acquisition of reliable vacuum hardware for large accelerator systems

    International Nuclear Information System (INIS)

    Welch, K.M.

    1996-01-01

    Credible and effective communications prove to be the major challenge in the acquisition of reliable vacuum hardware. Technical competence is necessary but not sufficient. We must effectively communicate with management, sponsoring agencies, project organizations, service groups, staff and with vendors. Most of Deming's 14 quality assurance tenets relate to creating an enlightened environment of good communications. All projects progress along six distinct, closely coupled, dynamic phases; all six phases are in a state of perpetual change. These phases and their elements are discussed, with emphasis given to the acquisition phase and its related vocabulary. (author)

  10. Accelerating cardiac bidomain simulations using graphics processing units.

    Science.gov (United States)

    Neic, A; Liebmann, M; Hoetzl, E; Mitchell, L; Vigmond, E J; Haase, G; Plank, G

    2012-08-01

    Anatomically realistic and biophysically detailed multiscale computer models of the heart are playing an increasingly important role in advancing our understanding of integrated cardiac function in health and disease. Such detailed simulations, however, are computationally very demanding, which is a limiting factor for a wider adoption of in-silico modeling. While current trends in high-performance computing (HPC) hardware promise to alleviate this problem, exploiting the potential of such architectures remains challenging since strongly scalable algorithms are required to reduce execution times. Alternatively, acceleration technologies such as graphics processing units (GPUs) are being considered. While the potential of GPUs has been demonstrated in various applications, benefits in the context of bidomain simulations, where large sparse linear systems have to be solved in parallel with advanced numerical techniques, are less clear. In this study, the feasibility of multi-GPU bidomain simulations is demonstrated by running strong scalability benchmarks using a state-of-the-art model of rabbit ventricles. The model is spatially discretized using the finite element method (FEM) on fully unstructured grids. The GPU code is directly derived from a large pre-existing code, the Cardiac Arrhythmia Research Package (CARP), with very minor perturbation of the code base. Overall, bidomain simulations were sped up by a factor of 11.8 to 16.3 in benchmarks running on 6-20 GPUs compared to the same number of CPU cores. To match the fastest GPU simulation, which engaged 20 GPUs, 476 CPU cores were required on a national supercomputing facility.
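
    The bidomain time step reduces to repeatedly solving large sparse linear systems, and the benchmark above measures how well that solve scales across GPUs. As a hedged illustration of the core kernel involved, the snippet below runs a plain conjugate gradient iteration on a small sparse symmetric positive-definite system with SciPy; it shows the sparse matrix-vector products and vector reductions that dominate such solvers, not the preconditioned solver used in CARP.

```python
# Plain conjugate gradient on a sparse SPD system: the kind of kernel
# (SpMV plus dot products) that dominates bidomain solves.
import numpy as np
import scipy.sparse as sp

n = 1000
# Hypothetical 1D Laplacian-like SPD matrix standing in for the FEM system.
A = sp.diags([-1, 2.1, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p                      # sparse matrix-vector product
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))   # residual norm of the solve
```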

  11. A Scalable Policy and SNMP Based Network Management Framework

    Institute of Scientific and Technical Information of China (English)

    LIU Su-ping; DING Yong-sheng

    2009-01-01

    Traditional SNMP-based network management cannot deal with the task of managing large-scale distributed networks, while policy-based management is one of the effective solutions for network and distributed systems management. However, cross-vendor hardware compatibility is one of the limitations of policy-based management. Devices in current networks mostly support SNMP rather than the Common Open Policy Service (COPS) protocol. By analyzing traditional network management and policy-based network management, a scalable network management framework is proposed. It combines the Internet Engineering Task Force (IETF) framework for policy-based management with SNMP-based network management. By interpreting and translating policy decisions into SNMP messages, policies can be executed on traditional SNMP-based devices.

  12. Travel Software using GPU Hardware

    CERN Document Server

    Szalwinski, Chris M; Dimov, Veliko Atanasov; CERN. Geneva. ATS Department

    2015-01-01

    Travel is the main multi-particle tracking code being used at CERN for beam dynamics calculations through hadron and ion linear accelerators. It uses two routines for the calculation of space-charge forces, namely rings of charges and point-to-point. This report presents studies to improve the performance of Travel using GPU hardware. The studies showed that the performance of Travel with point-to-point simulations of space-charge effects can be sped up at least 72 times using current GPU hardware. Simple recompilation of the source code using an Intel compiler can improve performance at least 4 times without GPU support. The limited memory of the GPU is the bottleneck. Two algorithms were investigated on this point: repeated computation and tiling. The repeated-computation algorithm is simpler and is the currently recommended solution. The tiling algorithm was more complicated and degraded performance. Both build and test instructions for the parallelized version of the software are inclu...

  13. Modern control techniques for accelerators

    International Nuclear Information System (INIS)

    Goodwin, R.W.; Shea, M.F.

    1984-01-01

    Beginning in the mid to late sixties, most new accelerators were designed to include computer based control systems. Although each installation differed in detail, the technology of the sixties and early to mid seventies dictated an architecture that was essentially the same for the control systems of that era. A mini-computer was connected to the hardware and to a console. Two developments have changed the architecture of modern systems: the microprocessor and local area networks. This paper discusses these two developments and demonstrates their impact on control system design and implementation by way of describing a possible architecture for any size of accelerator. Both hardware and software aspects are included

  14. FPGA Implementation of Decimal Processors for Hardware Acceleration

    DEFF Research Database (Denmark)

    Borup, Nicolas; Dindorp, Jonas; Nannarelli, Alberto

    2011-01-01

    Applications in non-conventional number systems can benefit from accelerators implemented on reconfigurable platforms, such as Field Programmable Gate-Arrays (FPGAs). In this paper, we show that applications requiring decimal operations, such as the ones necessary in accounting or financial trans...... execution on the CPU of the hosting computer....

  15. The CEBAF Element Database and Related Operational Software

    Energy Technology Data Exchange (ETDEWEB)

    Larrieu, Theodore [Thomas Jefferson National Accelerator Facility, Newport News, VA (United States); Slominski, Christopher [Thomas Jefferson National Accelerator Facility, Newport News, VA (United States); Keesee, Marie [Thomas Jefferson National Accelerator Facility, Newport News, VA (United States); Turner, Dennison [Thomas Jefferson National Accelerator Facility, Newport News, VA (United States); Joyce, Michele [Thomas Jefferson National Accelerator Facility, Newport News, VA (United States)

    2015-09-01

    The newly commissioned 12GeV CEBAF accelerator relies on a flexible, scalable and comprehensive database to define the accelerator. This database delivers the configuration for CEBAF operational tools, including hardware checkout, the downloadable optics model, control screens, and much more. The presentation will describe the flexible design of the CEBAF Element Database (CED), its features and assorted use case examples.

  16. Performance-scalable volumetric data classification for online industrial inspection

    Science.gov (United States)

    Abraham, Aby J.; Sadki, Mustapha; Lea, R. M.

    2002-03-01

    Non-intrusive inspection and non-destructive testing of manufactured objects with complex internal structures typically requires the enhancement, analysis and visualization of high-resolution volumetric data. Given the increasing availability of fast 3D scanning technology (e.g. cone-beam CT), enabling on-line detection and accurate discrimination of components or sub-structures, the inherent complexity of classification algorithms inevitably leads to throughput bottlenecks. Indeed, whereas typical inspection throughput requirements range from 1 to 1000 volumes per hour, depending on density and resolution, current computational capability is one to two orders-of-magnitude less. Accordingly, speeding up classification algorithms requires both reduction of algorithm complexity and acceleration of computer performance. A shape-based classification algorithm, offering algorithm complexity reduction, by using ellipses as generic descriptors of solids-of-revolution, and supporting performance-scalability, by exploiting the inherent parallelism of volumetric data, is presented. A two-stage variant of the classical Hough transform is used for ellipse detection and correlation of the detected ellipses facilitates position-, scale- and orientation-invariant component classification. Performance-scalability is achieved cost-effectively by accelerating a PC host with one or more COTS (Commercial-Off-The-Shelf) PCI multiprocessor cards. Experimental results are reported to demonstrate the feasibility and cost-effectiveness of the data-parallel classification algorithm for on-line industrial inspection applications.

  17. Decentralized control of a scalable photovoltaic (PV)-battery hybrid power system

    International Nuclear Information System (INIS)

    Kim, Myungchin; Bae, Sungwoo

    2017-01-01

    Highlights: • This paper introduces the design and control of a PV-battery hybrid power system. • Reliable and scalable operation of hybrid power systems is achieved. • System and power control are performed without a centralized controller. • Reliability and scalability characteristics are studied in a quantitative manner. • The system control performance is verified using realistic solar irradiation data. - Abstract: This paper presents the design and control of a sustainable standalone photovoltaic (PV)-battery hybrid power system (HPS). The research aims to develop an approach that contributes to increased level of reliability and scalability for an HPS. To achieve such objectives, a PV-battery HPS with a passively connected battery was studied. A quantitative hardware reliability analysis was performed to assess the effect of energy storage configuration to the overall system reliability. Instead of requiring the feedback control information of load power through a centralized supervisory controller, the power flow in the proposed HPS is managed by a decentralized control approach that takes advantage of the system architecture. Reliable system operation of an HPS is achieved through the proposed control approach by not requiring a separate supervisory controller. Furthermore, performance degradation of energy storage can be prevented by selecting the controller gains such that the charge rate does not exceed operational requirements. The performance of the proposed system architecture with the control strategy was verified by simulation results using realistic irradiance data and a battery model in which its temperature effect was considered. With an objective to support scalable operation, details on how the proposed design could be applied were also studied so that the HPS could satisfy potential system growth requirements. Such scalability was verified by simulating various cases that involve connection and disconnection of sources and loads. The

  18. Hardware for Accelerating N-Modular Redundant Systems for High-Reliability Computing

    Science.gov (United States)

    Dobbs, Carl, Sr.

    2012-01-01

    A hardware unit has been designed that reduces the cost, in terms of performance and power consumption, of implementing N-modular redundancy (NMR) in a multiprocessor device. The innovation monitors transactions to memory and calculates a form of sumcheck on-the-fly, thereby relieving the processors of calculating the sumcheck in software.
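
    To make the idea concrete, the sketch below shows a software analogue of what such a unit does: each redundant module's stream of memory writes is folded into a running checksum, and the checksums are majority-voted to flag a disagreeing module. The checksum and voting scheme here are simple placeholders, not the actual sumcheck computed by the hardware.

```python
# Software analogue of NMR sumcheck voting: fold each module's memory-write
# stream into a running checksum and majority-vote the results.
from collections import Counter

def stream_checksum(writes):
    """Fold a sequence of (address, value) writes into a simple running checksum.
    Placeholder for the hardware's on-the-fly sumcheck."""
    acc = 0
    for addr, value in writes:
        acc = (acc * 31 + addr * 7 + value) & 0xFFFFFFFF
    return acc

def vote(checksums):
    """Return (majority_checksum, indices_of_disagreeing_modules)."""
    majority, _ = Counter(checksums).most_common(1)[0]
    return majority, [i for i, c in enumerate(checksums) if c != majority]

# Three redundant modules; module 2 writes a corrupted value.
good = [(0x1000, 42), (0x1004, 7), (0x1008, 99)]
bad  = [(0x1000, 42), (0x1004, 7), (0x1008, 98)]
checksums = [stream_checksum(good), stream_checksum(good), stream_checksum(bad)]
print(vote(checksums))   # -> (checksum_of_good_stream, [2])
```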

  19. Hardware controls for the STAR experiment at RHIC

    International Nuclear Information System (INIS)

    Reichhold, D.; Bieser, F.; Bordua, M.; Cherney, M.; Chrin, J.; Dunlop, J.C.; Ferguson, M.I.; Ghazikhanian, V.; Gross, J.; Harper, G.; Howe, M.; Jacobson, S.; Klein, S.R.; Kravtsov, P.; Lewis, S.; Lin, J.; Lionberger, C.; LoCurto, G.; McParland, C.; McShane, T.; Meier, J.; Sakrejda, I.; Sandler, Z.; Schambach, J.; Shi, Y.; Willson, R.; Yamamoto, E.; Zhang, W.

    2003-01-01

    The STAR detector sits in a high radiation area when operating normally; therefore it was necessary to develop a robust system to remotely control all hardware. The STAR hardware controls system monitors and controls approximately 14,000 parameters in the STAR detector. Voltages, currents, temperatures, and other parameters are monitored. Effort has been minimized by the adoption of experiment-wide standards and the use of pre-packaged software tools. The system is based on the Experimental Physics and Industrial Control System (EPICS). VME processors communicate with subsystem-based sensors over a variety of field busses, with High-level Data Link Control (HDLC) being the most prevalent. Other features of the system include interfaces to accelerator and magnet control systems, a web-based archiver, and C++-based communication between STAR online, run control and hardware controls and their associated databases. The system has been designed for easy expansion as new detector elements are installed in STAR.

  20. Accelerating epistasis analysis in human genetics with consumer graphics hardware

    Directory of Open Access Journals (Sweden)

    Cancare Fabio

    2009-07-01

    Full Text Available Abstract. Background: Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high-order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers, Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price-to-performance ratio of available solutions. Findings: We found that using MDR on GPUs consistently increased performance per machine over both a feature-rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore, this GPU system provides extremely cost effective

  1. Accelerating epistasis analysis in human genetics with consumer graphics hardware.

    Science.gov (United States)

    Sinnott-Armstrong, Nicholas A; Greene, Casey S; Cancare, Fabio; Moore, Jason H

    2009-07-24

    Human geneticists are now capable of measuring more than one million DNA sequence variations from across the human genome. The new challenge is to develop computationally feasible methods capable of analyzing these data for associations with common human disease, particularly in the context of epistasis. Epistasis describes the situation where multiple genes interact in a complex non-linear manner to determine an individual's disease risk and is thought to be ubiquitous for common diseases. Multifactor Dimensionality Reduction (MDR) is an algorithm capable of detecting epistasis. An exhaustive analysis with MDR is often computationally expensive, particularly for high order interactions. This challenge has previously been met with parallel computation and expensive hardware. The option we examine here exploits commodity hardware designed for computer graphics. In modern computers Graphics Processing Units (GPUs) have more memory bandwidth and computational capability than Central Processing Units (CPUs) and are well suited to this problem. Advances in the video game industry have led to an economy of scale creating a situation where these powerful components are readily available at very low cost. Here we implement and evaluate the performance of the MDR algorithm on GPUs. Of primary interest are the time required for an epistasis analysis and the price to performance ratio of available solutions. We found that using MDR on GPUs consistently increased performance per machine over both a feature rich Java software package and a C++ cluster implementation. The performance of a GPU workstation running a GPU implementation reduces computation time by a factor of 160 compared to an 8-core workstation running the Java implementation on CPUs. This GPU workstation performs similarly to 150 cores running an optimized C++ implementation on a Beowulf cluster. Furthermore this GPU system provides extremely cost effective performance while leaving the CPU available for other
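
    The MDR kernel that this record (and the preceding duplicate entry) ports to GPUs is simple to state: for a candidate pair of SNPs, pool cases and controls into the 3x3 table of genotype combinations, label each cell high-risk when its case/control ratio meets the overall ratio, and score the resulting binary classifier. A hedged CPU reference of that kernel is sketched below; the genotype coding (0/1/2) and balanced-accuracy scoring follow the usual MDR description, and the toy data are fabricated for illustration.

```python
# CPU reference of the MDR kernel for one SNP pair: pool genotypes into a 3x3
# table, mark high-risk cells by case/control ratio, score with balanced accuracy.
import numpy as np

def mdr_pair_score(geno_a, geno_b, is_case):
    """geno_a, geno_b: arrays of genotypes coded 0/1/2; is_case: boolean array."""
    cases = np.zeros((3, 3))
    controls = np.zeros((3, 3))
    for a, b, case in zip(geno_a, geno_b, is_case):
        (cases if case else controls)[a, b] += 1
    overall_ratio = is_case.sum() / max((~is_case).sum(), 1)
    # A cell is high-risk when its case/control ratio meets the overall ratio.
    high_risk = cases >= overall_ratio * controls
    predicted = high_risk[geno_a, geno_b]
    sensitivity = np.sum(predicted & is_case) / max(is_case.sum(), 1)
    specificity = np.sum(~predicted & ~is_case) / max((~is_case).sum(), 1)
    return 0.5 * (sensitivity + specificity)     # balanced accuracy

rng = np.random.default_rng(1)
geno_a = rng.integers(0, 3, 500)
geno_b = rng.integers(0, 3, 500)
is_case = rng.random(500) < 0.5
print(mdr_pair_score(geno_a, geno_b, is_case))
```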

  2. Accelerating artificial intelligence with reconfigurable computing

    Science.gov (United States)

    Cieszewski, Radoslaw

    Reconfigurable computing is emerging as an important area of research in computer architectures and software systems. Many algorithms can be greatly accelerated by placing the computationally intense portions of an algorithm into reconfigurable hardware. Reconfigurable computing combines many benefits of both software and ASIC implementations. Like software, the mapped circuit is flexible and can be changed over the lifetime of the system. Similar to an ASIC, reconfigurable systems provide a method to map circuits into hardware. Reconfigurable systems therefore have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. Artificial intelligence is one such field, with many different algorithms that can be accelerated. This paper presents example hardware implementations of Artificial Neural Networks, Genetic Algorithms and Expert Systems.

  3. A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data.

    Directory of Open Access Journals (Sweden)

    Giovanni Delussu

    Full Text Available This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR's formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called "Constant Load" and "Constant Number of Records", with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.

  4. A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data

    Science.gov (United States)

    Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi

    2016-01-01

    This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR’s formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called “Constant Load” and “Constant Number of Records”, with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes. PMID:27936191

  5. A Scalable Data Access Layer to Manage Structured Heterogeneous Biomedical Data.

    Science.gov (United States)

    Delussu, Giovanni; Lianas, Luca; Frexia, Francesca; Zanetti, Gianluigi

    2016-01-01

    This work presents a scalable data access layer, called PyEHR, designed to support the implementation of data management systems for secondary use of structured heterogeneous biomedical and clinical data. PyEHR adopts the openEHR's formalisms to guarantee the decoupling of data descriptions from implementation details and exploits structure indexing to accelerate searches. Data persistence is guaranteed by a driver layer with a common driver interface. Interfaces for two NoSQL Database Management Systems are already implemented: MongoDB and Elasticsearch. We evaluated the scalability of PyEHR experimentally through two types of tests, called "Constant Load" and "Constant Number of Records", with queries of increasing complexity on synthetic datasets of ten million records each, containing very complex openEHR archetype structures, distributed on up to ten computing nodes.

  6. A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis

    Energy Technology Data Exchange (ETDEWEB)

    Jack Dongarra; Shirley Moore; Bart Miller, Jeffrey Hollingsworth; Tracy Rafferty

    2005-03-15

    The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scale long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front-end interfaces provide tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK

  7. Reconfigurable Hardware for Compressing Hyperspectral Image Data

    Science.gov (United States)

    Aranki, Nazeeh; Namkung, Jeffrey; Villapando, Carlos; Kiely, Aaron; Klimesh, Matthew; Xie, Hua

    2010-01-01

    High-speed, low-power, reconfigurable electronic hardware has been developed to implement ICER-3D, an algorithm for compressing hyperspectral-image data. The algorithm and parts thereof have been the topics of several NASA Tech Briefs articles, including Context Modeler for Wavelet Compression of Hyperspectral Images (NPO-43239) and ICER-3D Hyperspectral Image Compression Software (NPO-43238), which appear elsewhere in this issue of NASA Tech Briefs. As described in more detail in those articles, the algorithm includes three main subalgorithms: one for computing wavelet transforms, one for context modeling, and one for entropy encoding. For the purpose of designing the hardware, these subalgorithms are treated as modules to be implemented efficiently in field-programmable gate arrays (FPGAs). The design takes advantage of industry-standard, commercially available FPGAs. The implementation targets the Xilinx Virtex-II Pro architecture, which has embedded PowerPC processor cores with flexible on-chip bus architecture. It incorporates an efficient parallel and pipelined architecture to compress the three-dimensional image data. The design provides for internal buffering to minimize intensive input/output operations while making efficient use of off-chip memory. The design is scalable in that the subalgorithms are implemented as independent hardware modules that can be combined in parallel to increase throughput. The on-chip processor manages the overall operation of the compression system, including execution of the top-level control functions as well as scheduling, initiating, and monitoring processes. The design prototype has been demonstrated to be capable of compressing hyperspectral data at a rate of 4.5 megasamples per second at a conservative clock frequency of 50 MHz, with a potential for substantially greater throughput at a higher clock frequency. The power consumption of the prototype is less than 6.5 W. The reconfigurability (by means of reprogramming) of

  8. Modern control techniques for accelerators

    International Nuclear Information System (INIS)

    Goodwin, R.W.; Shea, M.F.

    1984-05-01

    Beginning in the mid to late sixties, most new accelerators were designed to include computer based control systems. Although each installation differed in detail, the technology of the sixties and early to mid seventies dictated an architecture that was essentially the same for the control systems of that era. A mini-computer was connected to the hardware and to a console. Two developments have changed the architecture of modern systems: (a) the microprocessor and (b) local area networks. This paper discusses these two developments and demonstrates their impact on control system design and implementation by way of describing a possible architecture for any size of accelerator. Both hardware and software aspects are included

  9. Compact accelerator for medical therapy

    Science.gov (United States)

    Caporaso, George J.; Chen, Yu-Jiuan; Hawkins, Steven A.; Sampayan, Stephen E.; Paul, Arthur C.

    2010-05-04

    A compact accelerator system having an integrated particle generator-linear accelerator with a compact, small-scale construction capable of producing an energetic (~70-250 MeV) proton beam or other nuclei and transporting the beam directly to a medical therapy patient without the need for bending magnets or other hardware often required for remote beam transport. The integrated particle generator-accelerator is actuable as a unitary body on a support structure to enable scanning of a particle beam by direct actuation of the particle generator-accelerator.

  10. Computational scalability of large size image dissemination

    Science.gov (United States)

    Kooper, Rob; Bajcsy, Peter

    2011-01-01

    We have investigated the computational scalability of image pyramid building needed for dissemination of very large image data. The sources of large images include high-resolution microscopes and telescopes, remote sensing and airborne imaging, and high-resolution scanners. The term 'large' is understood from a user perspective, meaning either larger than a display or larger than the memory/disk available to hold the image data. The application drivers for our work are digitization projects such as the Lincoln Papers project (each image scan is about 100-150 MB, or about 5000x8000 pixels, with the total number of scans around 200,000) and the UIUC library scanning project for historical maps from the 17th and 18th century (a smaller number of larger images). The goal of our work is to understand the computational scalability of web-based dissemination using image pyramids for these large image scans, as well as the preservation aspects of the data. We report our computational benchmarks for (a) building image pyramids to be disseminated using the Microsoft Seadragon library, (b) a computation execution approach using hyper-threading to generate image pyramids and to utilize the underlying hardware, and (c) an image pyramid preservation approach using various hard drive configurations of Redundant Array of Independent Disks (RAID) drives for input/output operations. The benchmarks are obtained with a map (334.61 MB, JPEG format, 17591x15014 pixels). The discussion combines the speed and preservation objectives.
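
    The pyramid-building step benchmarked above amounts to repeatedly halving the image until it fits a display tile; a minimal NumPy version is sketched below. It uses plain 2x2 box averaging and a hypothetical target size, whereas the benchmarks in the record are built around the Microsoft Seadragon tiling format.

```python
# Minimal image-pyramid construction by repeated 2x2 box averaging.
# The Seadragon-specific tiling of the record is omitted; target size is arbitrary.
import numpy as np

def halve(img):
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2   # trim odd edges
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def build_pyramid(img, min_side=256):
    levels = [img]
    while min(levels[-1].shape[:2]) > min_side:
        levels.append(halve(levels[-1]))
    return levels

scan = np.random.rand(8000, 5000)          # stand-in for a 5000x8000 pixel scan
pyramid = build_pyramid(scan)
print([lvl.shape for lvl in pyramid])
```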

  11. PCI hardware support in LIA-2 control system

    International Nuclear Information System (INIS)

    Bolkhovityanov, D.; Cheblakov, P.

    2012-01-01

    The control system of the LIA-2 accelerator is built on cPCI crates with x86-compatible processor boards running Linux. Slow electronics are connected via CAN bus, while fast electronics (4 MHz and 200 MHz fast ADCs and 200 MHz timers) are implemented as cPCI/PMC modules. Several ways to drive PCI control electronics in Linux were examined. Finally, a user-space driver approach was chosen. These drivers communicate with the hardware via a small kernel module, which provides access to PCI BARs and to interrupt handling. This module was named USPCI (User-Space PCI access). This approach dramatically simplifies the creation of drivers, as opposed to kernel drivers, and provides high reliability (because only a tiny and thoroughly debugged piece of code runs in the kernel). The LIA-2 accelerator was successfully commissioned, and the solution chosen has proven adequate and very easy to use. Besides, USPCI turned out to be a handy tool for examining and debugging PCI devices directly from the command line. In this paper, the available approaches to working with PCI control hardware in Linux are considered, and the USPCI architecture is described. (authors)
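
    Independently of the USPCI module itself, Linux already exposes memory BARs through sysfs, and the general flavour of user-space register access can be sketched as below: mmap the BAR resource file and read registers at fixed offsets. The device address and register offset are hypothetical, the code must run with sufficient privileges against a memory (not I/O) BAR, and interrupt handling, which in the record goes through the small kernel module, is not covered by this route.

```python
# Generic user-space access to a PCI memory BAR via sysfs (not the USPCI module
# itself): mmap resource0 of a device and read a 32-bit register.
# Device address and register offset are hypothetical; requires root privileges.
import mmap
import struct

BDF = "0000:03:00.0"                 # hypothetical bus:device.function
REG_OFFSET = 0x10                    # hypothetical register offset within BAR0

with open(f"/sys/bus/pci/devices/{BDF}/resource0", "r+b") as f:
    bar = mmap.mmap(f.fileno(), 0)   # map the whole BAR
    value, = struct.unpack_from("<I", bar, REG_OFFSET)   # 32-bit little-endian read
    print(f"reg[0x{REG_OFFSET:x}] = 0x{value:08x}")
    bar.close()
```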

  12. Fast DRR splat rendering using common consumer graphics hardware

    International Nuclear Information System (INIS)

    Spoerk, Jakob; Bergmann, Helmar; Wanschitz, Felix; Dong, Shuo; Birkfellner, Wolfgang

    2007-01-01

    Digitally rendered radiographs (DRR) are a vital part of various medical image processing applications such as 2D/3D registration for patient pose determination in image-guided radiotherapy procedures. This paper presents a technique to accelerate DRR creation by using conventional graphics hardware for the rendering process. DRR computation itself is done by an efficient volume rendering method named wobbled splatting. For programming the graphics hardware, NVIDIA's C for Graphics (Cg) is used. The description of an algorithm used for rendering DRRs on the graphics hardware is presented, together with a benchmark comparing this technique to a CPU-based wobbled splatting program. Results show a reduction of rendering time by about 70%-90% depending on the amount of data. For instance, rendering a volume of 2x10^6 voxels is feasible at an update rate of 38 Hz compared to 6 Hz on a common Intel-based PC using the graphics processing unit (GPU) of a conventional graphics adapter. In addition, wobbled splatting using graphics hardware for DRR computation provides higher resolution DRRs with comparable image quality due to special processing characteristics of the GPU. We conclude that DRR generation on common graphics hardware using the freely available Cg environment is a major step toward 2D/3D registration in clinical routine.

  13. Many-core graph analytics using accelerated sparse linear algebra routines

    Science.gov (United States)

    Kozacik, Stephen; Paolini, Aaron L.; Fox, Paul; Kelmelis, Eric

    2016-05-01

    Graph analytics is a key component in identifying emerging trends and threats in many real-world applications. Large-scale graph analytics frameworks provide a convenient and highly scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. Another model of graph computation has emerged that promises improved performance and scalability by using abstract linear algebra operations as the basis for graph analysis, as laid out by the GraphBLAS standard. By using sparse linear algebra as the basis, existing highly efficient algorithms can be adapted to perform computations on the graph. This approach, however, is often less intuitive to graph analytics experts, who are accustomed to vertex-centric APIs such as Giraph, GraphX, and Tinkerpop. We are developing an implementation of the high-level operations supported by these APIs in terms of linear algebra operations. This implementation is backed by many-core implementations of the fundamental GraphBLAS operations required, and offers the advantages of both the intuitive programming model of a vertex-centric API and the performance of a sparse linear algebra implementation. This technology can reduce the number of nodes required, as well as the run-time, for a graph analysis problem, enabling customers to perform more complex analysis with less hardware at lower cost. All of this can be accomplished without requiring the customer to make any changes to their analytics code, thanks to compatibility with existing graph APIs.
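
    The linear-algebra view of graph traversal that GraphBLAS formalizes can be illustrated in a few lines: a breadth-first search level expansion is just a sparse matrix-vector product of the transposed adjacency matrix with the current frontier, masked by the set of already-visited vertices. The sketch below uses SciPy's sparse matrices and a tiny hypothetical graph as a stand-in for the many-core GraphBLAS backend described in the record.

```python
# Breadth-first search expressed as repeated sparse matrix-vector products,
# the GraphBLAS-style formulation described above (SciPy stands in for the
# many-core backend).
import numpy as np
import scipy.sparse as sp

# Hypothetical directed graph: edge (i, j) means i -> j.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
n = 5
rows, cols = zip(*edges)
A = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(n, n))

def bfs_levels(A, source):
    frontier = np.zeros(A.shape[0], dtype=bool)
    frontier[source] = True
    visited = frontier.copy()
    levels = {source: 0}
    level = 0
    while frontier.any():
        level += 1
        reached = (A.T @ frontier.astype(float)) > 0   # SpMV: vertices reachable from the frontier
        frontier = reached & ~visited                   # mask out already-visited vertices
        visited |= frontier
        for v in np.nonzero(frontier)[0]:
            levels[int(v)] = level
    return levels

print(bfs_levels(A, 0))   # -> {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```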

  14. A Scalable Parallel PWTD-Accelerated SIE Solver for Analyzing Transient Scattering from Electrically Large Objects

    KAUST Repository

    Liu, Yang

    2015-12-17

    A scalable parallel plane-wave time-domain (PWTD) algorithm for efficient and accurate analysis of transient scattering from electrically large objects is presented. The algorithm produces scalable communication patterns on very large numbers of processors by leveraging two mechanisms: (i) a hierarchical parallelization strategy to evenly distribute the computation and memory loads at all levels of the PWTD tree among processors, and (ii) a novel asynchronous communication scheme to reduce the cost and memory requirement of the communications between the processors. The efficiency and accuracy of the algorithm are demonstrated through its applications to the analysis of transient scattering from a perfect electrically conducting (PEC) sphere with a diameter of 70 wavelengths and a PEC square plate with a dimension of 160 wavelengths. Furthermore, the proposed algorithm is used to analyze transient fields scattered from realistic airplane and helicopter models under high frequency excitation.

  15. Hardware-in-the-loop vehicle system including dynamic fuel cell model

    Energy Technology Data Exchange (ETDEWEB)

    Lemes, Z.; Lenhart, T.; Braun, M.; Maencher, H. [MAGNUM Automatisierungstechnik GmbH, Darmstadt (Germany)

    2005-07-01

    In order to reduce costs and accelerate the development of fuel cells and systems the usage of hardware-in-the-loop (HIL) testing and dynamic modelling opens new possibilities. The dynamic model of a proton exchange membrane fuel cell (PEMFC) together with a vehicle model is used to carry out a comprehensive system investigation, which allows designing and optimising the behaviour of the components and the entire fuel cell system. The set-up of a HIL system enables real time interaction between the selected hardware and the model. (orig.)

  16. Fast volume reconstruction in positron emission tomography: Implementation of four algorithms on a high-performance scalable parallel platform

    International Nuclear Information System (INIS)

    Egger, M.L.; Scheurer, A.H.; Joseph, C.

    1996-01-01

    The issue of long reconstruction times in PET has been addressed from several points of view, resulting in an affordable dedicated system capable of handling routine 3D reconstruction in a few minutes per frame: on the hardware side using fast processors and a parallel architecture, and on the software side, using efficient implementations of computationally less intensive algorithms. Execution times obtained for the PRT-1 data set on a parallel system of five hybrid nodes, each combining an Alpha processor for computation and a transputer for communication, are the following (256 sinograms of 96 views by 128 radial samples): Ramp algorithm 56 s, Favor 81 s and reprojection algorithm of Kinahan and Rogers 187 s. The implementation of fast rebinning algorithms has shown our hardware platform to become communications-limited; they execute faster on a conventional single-processor Alpha workstation: single-slice rebinning 7 s, Fourier rebinning 22 s, 2D filtered backprojection 5 s. The scalability of the system has been demonstrated, and a saturation effect at network sizes above ten nodes has become visible; new T9000-based products lifting most of the constraints on network topology and link throughput are expected to result in improved parallel efficiency and scalability properties

  17. A Message-Passing Hardware/Software Cosimulation Environment for Reconfigurable Computing Systems

    Directory of Open Access Journals (Sweden)

    Manuel Saldaña

    2009-01-01

    Full Text Available High-performance reconfigurable computers (HPRCs) provide a mix of standard processors and FPGAs to collectively accelerate applications. This introduces new design challenges, such as the need for portable programming models across HPRCs and system-level verification tools. To address the need for cosimulating a complete heterogeneous application using both software and hardware in an HPRC, we have created a tool called the Message-passing Simulation Framework (MSF). We have used it to simulate and develop an interface enabling an MPI-based approach to exchange data between X86 processors and hardware engines inside FPGAs. The MSF can also be used as an application development tool that enables multiple FPGAs in simulation to exchange messages amongst themselves and with X86 processors. As an example, we simulate a LINPACK benchmark hardware core using an Intel-FSB-Xilinx-FPGA platform to quickly prototype the hardware, to test the communications, and to verify the benchmark results.
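
    The MPI-style data exchange that the MSF simulates between X86 processors and FPGA hardware engines boils down to matched send/receive pairs between ranks; the mpi4py fragment below shows that pattern with one rank playing the software side and another standing in for the hardware engine. The rank roles, tags and payloads are illustrative assumptions and are unrelated to the MSF's actual interface.

```python
# Minimal MPI message exchange between two ranks: rank 0 stands in for the
# software side, rank 1 for a hardware engine. Run with: mpirun -n 2 python demo.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    payload = {"op": "dgemm_block", "data": list(range(8))}   # hypothetical work item
    comm.send(payload, dest=1, tag=11)
    result = comm.recv(source=1, tag=22)
    print("software side got:", result)
elif rank == 1:
    work = comm.recv(source=0, tag=11)
    result = {"op": work["op"], "data": [2 * x for x in work["data"]]}  # pretend compute
    comm.send(result, dest=0, tag=22)
```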

  18. Accelerating and benchmarking operating system functions in a “soft” system

    Directory of Open Access Journals (Sweden)

    Péter Molnár

    2015-06-01

    Full Text Available Today's computing technology provokes serious debate about whether operating system functions are implemented in the best possible way. Suggestions range from accelerating only certain functions, through providing complete real-time operating systems as coprocessors, to using hardware- and software-implemented threads simultaneously in the operating system. The performance gain in such systems depends on many factors, so its quantification is not a simple task at all. In addition to the subtleties of operating systems, the hardware accelerators in modern processors may considerably affect the results of such measurements. Reconfigurable systems offer a platform on which even end users can carry out reliable and accurate measurements. The paper presents a hardware acceleration idea for speeding up a simple OS service, its verification setup and the measurement results.

  19. Modern computer networks and distributed intelligence in accelerator controls

    International Nuclear Information System (INIS)

    Briegel, C.

    1991-01-01

    Appropriate hardware and software network protocols are surveyed for accelerator control environments. Accelerator controls network topologies are discussed with respect to the following criteria: vertical versus horizontal and distributed versus centralized. Decision-making considerations are provided for accelerator network architecture specification. Current trends and implementations at Fermilab are discussed

  20. GateKeeper: a new hardware architecture for accelerating pre-alignment in DNA short read mapping.

    Science.gov (United States)

    Alser, Mohammed; Hassan, Hasan; Xin, Hongyi; Ergin, Oguz; Mutlu, Onur; Alkan, Can

    2017-11-01

    High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments -called short reads- that cause significant computational burden. To analyze the entire genome, each of the billions of short reads must be mapped to a reference genome based on the similarity between a read and 'candidate' locations in that reference genome. The similarity measurement, called alignment, formulated as an approximate string matching problem, is the computational bottleneck because: (i) it is implemented using quadratic-time dynamic programming algorithms and (ii) the majority of candidate locations in the reference genome do not align with a given read due to high dissimilarity. Calculating the alignment of such incorrect candidate locations consumes an overwhelming majority of a modern read mapper's execution time. Therefore, it is crucial to develop a fast and effective filter that can detect incorrect candidate locations and eliminate them before invoking computationally costly alignment algorithms. We propose GateKeeper, a new hardware accelerator that functions as a pre-alignment step that quickly filters out most incorrect candidate locations. GateKeeper is the first design to accelerate pre-alignment using Field-Programmable Gate Arrays (FPGAs), which can perform pre-alignment much faster than software. When implemented on a single FPGA chip, GateKeeper maintains high accuracy (on average >96%) while providing, on average, 90-fold and 130-fold speedup over the state-of-the-art software pre-alignment techniques, Adjacency Filter and Shifted Hamming Distance (SHD), respectively. The addition of GateKeeper as a pre-alignment step can reduce the verification time of the mrFAST mapper by a factor of 10. https://github.com/BilkentCompGen/GateKeeper. mohammedalser@bilkent.edu.tr or onur.mutlu@inf.ethz.ch or calkan@cs.bilkent.edu.tr. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press
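
    The role of a pre-alignment filter like GateKeeper can be conveyed with a much simpler software analogue: count base mismatches between the read and each candidate reference window and discard candidates whose mismatch count already exceeds the allowed edit threshold, so that the quadratic-time aligner only sees the survivors. The sketch below is that naive Hamming-distance filter with toy data; it is not GateKeeper's FPGA logic or the SHD bit-vector technique it builds on, and unlike those it does not account for insertions or deletions.

```python
# Naive pre-alignment filter: reject candidate locations whose Hamming distance
# to the read already exceeds the edit threshold, so the expensive aligner runs
# only on surviving candidates. Purely a software analogue of the idea above.
def hamming(read, window):
    return sum(a != b for a, b in zip(read, window))

def prefilter(read, reference, candidates, max_edits):
    """candidates: start positions in the reference; returns positions to align."""
    survivors = []
    for pos in candidates:
        window = reference[pos:pos + len(read)]
        if len(window) == len(read) and hamming(read, window) <= max_edits:
            survivors.append(pos)
    return survivors

reference = "ACGTACGTTTGACCAGTACGTACGGT"      # toy reference
read = "ACGTACGG"
candidates = [0, 4, 18]                       # hypothetical seed hits
print(prefilter(read, reference, candidates, max_edits=2))   # -> [0]
```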

  1. Hardware-accelerated Point Generation and Rendering of Point-based Impostors

    DEFF Research Database (Denmark)

    Bærentzen, Jakob Andreas

    2005-01-01

    This paper presents a novel scheme for generating points from triangle models. The method is fast and lends itself well to implementation using graphics hardware. The triangle to point conversion is done by rendering the models, and the rendering may be performed procedurally or by a black box API. I describe the technique in detail and discuss how the generated point sets can easily be used as impostors for the original triangle models used to create the points. Since the points reside solely in GPU memory, these impostors are fairly efficient. Source code is available online.

  2. Accelerated Adaptive MGS Phase Retrieval

    Science.gov (United States)

    Lam, Raymond K.; Ohara, Catherine M.; Green, Joseph J.; Bikkannavar, Siddarayappa A.; Basinger, Scott A.; Redding, David C.; Shi, Fang

    2011-01-01

    The Modified Gerchberg-Saxton (MGS) algorithm is an image-based wavefront-sensing method that can turn any science instrument focal plane into a wavefront sensor. MGS characterizes optical systems by estimating the wavefront errors in the exit pupil using only intensity images of a star or other point source of light. This innovative implementation of MGS significantly accelerates the MGS phase retrieval algorithm by using stream-processing hardware on conventional graphics cards. Stream processing is a relatively new, yet powerful, paradigm to allow parallel processing of certain applications that apply single instructions to multiple data (SIMD). These stream processors are designed specifically to support large-scale parallel computing on a single graphics chip. Computationally intensive algorithms, such as the Fast Fourier Transform (FFT), are particularly well suited for this computing environment. This high-speed version of MGS exploits commercially available hardware to accomplish the same objective in a fraction of the original time. The exploit involves performing matrix calculations on nVidia graphics cards. The graphics processing unit (GPU) is hardware that is specialized for computationally intensive, highly parallel computation. From the software perspective, a parallel programming model called CUDA is used to transparently scale multicore parallelism in hardware. This technology gives computationally intensive applications access to the processing power of the nVidia GPUs through a C/C++ programming interface. The AAMGS (Accelerated Adaptive MGS) software takes advantage of these advanced technologies to accelerate the optical phase error characterization. With a single PC that contains four nVidia GTX-280 graphics cards, the new implementation can process four images simultaneously to produce a JWST (James Webb Space Telescope) wavefront measurement 60 times faster than the previous code.
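
    The dominant cost in this kind of image-based wavefront sensing is repeated two-dimensional FFTs, which is exactly the workload that maps well onto GPUs. The NumPy sketch below shows a single Gerchberg-Saxton-style iteration as a stand-in for the more elaborate MGS update; the array sizes and the update rule are illustrative assumptions, not the AAMGS code.

        import numpy as np

        def gs_iteration(pupil_field, measured_intensity):
            """Propagate pupil -> focal plane, impose the measured amplitude,
            and propagate back; the two FFTs dominate the cost, which is why
            stream-processing hardware gives such large speedups."""
            focal_field = np.fft.fft2(pupil_field)
            constrained = np.sqrt(measured_intensity) * np.exp(1j * np.angle(focal_field))
            return np.fft.ifft2(constrained)

        rng = np.random.default_rng(0)
        pupil = np.exp(1j * rng.uniform(-0.1, 0.1, (256, 256)))   # nearly flat wavefront
        intensity = np.abs(np.fft.fft2(pupil)) ** 2               # simulated point-source image
        updated = gs_iteration(pupil, intensity)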

  3. Advanced technologies for scalable ATLAS conditions database access on the grid

    International Nuclear Information System (INIS)

    Basset, R; Canali, L; Girone, M; Hawkings, R; Valassi, A; Viegas, F; Dimitrov, G; Nevski, P; Vaniachine, A; Walker, R; Wong, A

    2010-01-01

    During massive data reprocessing operations an ATLAS Conditions Database application must support concurrent access from numerous ATLAS data processing jobs running on the Grid. By simulating realistic workflows, ATLAS database scalability tests provided feedback for Conditions DB software optimization and allowed precise determination of required distributed database resources. In distributed data processing one must take into account the chaotic nature of Grid computing characterized by peak loads, which can be much higher than average access rates. To validate database performance at peak loads, we tested database scalability at very high concurrent job rates. This has been achieved through coordinated database stress tests performed in a series of ATLAS reprocessing exercises at the Tier-1 sites. The goal of database stress tests is to detect scalability limits of the hardware deployed at the Tier-1 sites, so that the server overload conditions can be safely avoided in a production environment. Our analysis of server performance under stress tests indicates that Conditions DB data access is limited by the disk I/O throughput. An unacceptable side-effect of the disk I/O saturation is a degradation of the WLCG 3D Services that update Conditions DB data at all ten ATLAS Tier-1 sites using the technology of Oracle Streams. To avoid such bottlenecks we prototyped and tested a novel approach for database peak load avoidance in Grid computing. Our approach is based upon the proven idea of pilot job submission on the Grid: instead of the actual query, an ATLAS utility library sends to the database server a pilot query first.

  4. Acquisition of reliable vacuum hardware for large accelerator systems

    International Nuclear Information System (INIS)

    Welch, K.M.

    1995-01-01

    Credible and effective communications prove to be the major challenge in the acquisition of reliable vacuum hardware. Technical competence is necessary but not sufficient. The authors must effectively communicate with management, sponsoring agencies, project organizations, service groups, staff and with vendors. Most of Deming's 14 quality assurance tenets relate to creating an enlightened environment of good communications. All projects progress along six distinct, closely coupled, dynamic phases. All six phases are in a state of perpetual change. These phases and their elements are discussed, with emphasis given to the acquisition phase and its related vocabulary. Large projects require great clarity and rigor as poor communications can be costly. For rigor to be cost effective, it can't be pedantic. Clarity thrives best in a low-risk, team environment.

  5. Comparison Of Hybrid Sorting Algorithms Implemented On Different Parallel Hardware Platforms

    Directory of Open Access Journals (Sweden)

    Dominik Zurek

    2013-01-01

    Full Text Available Sorting is a common problem in computer science. There are many well-known sorting algorithms created for sequential execution on a single processor. Recently, hardware platforms have made it possible to create widely parallel algorithms: standard processors consist of multiple cores, and hardware accelerators like GPUs are widely available. Graphics cards, with their parallel architecture, offer new possibilities for speeding up many algorithms. In this paper we describe the results of implementing a few different sorting algorithms on GPU cards and multicore processors. Then a hybrid algorithm is presented, which consists of parts executed on both platforms, the standard CPU and the GPU.
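
    The hybrid structure described above (sort independent partitions in parallel, then merge the results) can be sketched in plain Python; here worker processes stand in for GPU-sorted partitions and the final merge runs on the host CPU. The sketch mirrors only the split/merge organization, not the specific algorithms compared in the paper.

        import heapq
        import random
        from multiprocessing import Pool

        def sort_chunk(chunk):
            # Stand-in for a partition sorted on an accelerator.
            return sorted(chunk)

        def hybrid_sort(data, n_workers=4):
            size = max(1, len(data) // n_workers)
            chunks = [data[i:i + size] for i in range(0, len(data), size)]
            with Pool(n_workers) as pool:
                sorted_chunks = pool.map(sort_chunk, chunks)
            # Final k-way merge on the host.
            return list(heapq.merge(*sorted_chunks))

        if __name__ == "__main__":
            values = [random.random() for _ in range(100_000)]
            assert hybrid_sort(values) == sorted(values)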

  6. 500 kV mercury accelerator

    International Nuclear Information System (INIS)

    Brodowski, J.; Maschke, A.W.; Mobley, R.M.; Keane, J.T.; Meier, E.

    1979-01-01

    The objective of building a low-cost pre-accelerator for a low-energy heavy-ion particle accelerator was realized by using standard, readily available materials and hardware. Some savings were obtained in the construction of the dome by avoiding welding, expensive metal spinnings and unnecessary corona rings. Larger monetary economies were realized by a unique approach to building the high-voltage column utilizing a glass tube.

  7. Architecture design of reconfigurable accelerators for demanding applications.

    NARCIS (Netherlands)

    Jozwiak, L.; Jan, Y.

    2010-01-01

    This paper focuses on mastering the architecture development of reconfigurable hardware accelerators for highly demanding applications. It presents the results of our analysis of the main issues that have to be addressed when designing accelerators for demanding applications, when using as an

  8. Scalable Nanomanufacturing—A Review

    Directory of Open Access Journals (Sweden)

    Khershed Cooper

    2017-01-01

    Full Text Available This article describes the field of scalable nanomanufacturing, its importance and need, its research activities and achievements. The National Science Foundation is taking a leading role in fostering basic research in scalable nanomanufacturing (SNM. From this effort several novel nanomanufacturing approaches have been proposed, studied and demonstrated, including scalable nanopatterning. This paper will discuss SNM research areas in materials, processes and applications, scale-up methods with project examples, and manufacturing challenges that need to be addressed to move nanotechnology discoveries closer to the marketplace.

  9. Acceleration of polarized protons in the IHEP accelerator complex

    International Nuclear Information System (INIS)

    Anferov, V.A.; Ado, Yu.M.; Shoumkin, D.

    1995-01-01

    The paper considers the possibility of accelerating a polarized beam in the IHEP accelerator complex (including the first stage of the UNK). The scheme for preserving beam polarization is described for all acceleration stages up to 400 GeV beam energy. Polarization and intensity of the polarized proton beam are estimated. The suggested scheme includes using two Siberian snakes in opposite straight sections of the UNK-1, where each snake consists of five dipole magnets. In the U-70 it is suggested to use one helical Siberian snake, which is turned on adiabatically at 10 GeV, and four pulsed quadrupoles. To incorporate the snake into the accelerator lattice it is proposed to modify one superperiod. This would make a 13 m long straight section. Spin depolarization in the Booster is avoided by decreasing the extraction energy to 0.9 GeV. Then no additional hardware is required in the Booster.

  10. Hardware Acceleration of SQL-Queries Processing in MDM-Systems Based on MISD Solution

    Directory of Open Access Journals (Sweden)

    V. E. Podol'skii

    2015-01-01

    Full Text Available In this article we examine the possibility of hardware support for functions of a mobile device management platform (MDM-platform) using a Multiple Instructions and Single Data stream computer system, developed within the framework of the project in Bauman Moscow State Technical University. At the universities the MDM-platform is used to provide various mobile services for the faculty, students and administration to facilitate the learning process: a mobile schedule, document sharing, text messages, and other interactive activities. Most of these services are provided by the extensive use of data stored in MDM-platform databases. When accessing the databases, SQL queries are commonly used. These queries comprise operators of the SQL language that are based on mathematical set theory. Hardware support for operations on sets is implemented in the Multiple Instructions and Single Data stream computer system (MISD system). This allows performance improvement of algorithms and operations on sets. Thus, the hardware support for the processing of SQL queries in the MISD system allows us to benefit from the implementation of SQL queries in the MISD paradigm. The scientific novelty of the work lies in the fact that it is the first time a set of algorithms for basic SQL statements has been presented in a format supported by the MISD system. In addition, for the first time the operators INNER JOIN, LEFT JOIN and LEFT OUTER JOIN have been implemented for the MISD system and tested for it (testing was done for the FPGA Xilinx Virtex-II Pro XC2VP30 implementation of the MISD system). The practical significance of the work lies in the fact that the results of the study will be used in the project "Development of the Russian analogue of the system software for centralized management of personal devices and platforms in enterprise networks" of the St. Petersburg Polytechnic University (with the financial support of the state represented by the Ministry of Education and Science of the Russian
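
    Because the record describes SQL operators in terms of set theory, a plain-Python sketch of INNER JOIN and LEFT OUTER JOIN may help make that view concrete; it is purely illustrative and says nothing about how the MISD hardware actually evaluates these operators.

        def inner_join(left, right, key):
            """Rows whose key value appears in both tables."""
            index = {}
            for row in right:
                index.setdefault(row[key], []).append(row)
            return [{**l, **r} for l in left for r in index.get(l[key], [])]

        def left_outer_join(left, right, key):
            """All left rows; unmatched rows keep only their own columns
            (the right-hand columns would be NULL in SQL)."""
            index = {}
            for row in right:
                index.setdefault(row[key], []).append(row)
            out = []
            for l in left:
                matches = index.get(l[key], [None])
                out.extend({**l, **r} if r else dict(l) for r in matches)
            return out

        students = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
        grades = [{"id": 1, "grade": "A"}]
        print(inner_join(students, grades, "id"))       # one joined row
        print(left_outer_join(students, grades, "id"))  # two rows, Bob unmatched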

  11. Fast and Reliable Mouse Picking Using Graphics Hardware

    Directory of Open Access Journals (Sweden)

    Hanli Zhao

    2009-01-01

    Full Text Available Mouse picking is the most commonly used intuitive operation to interact with 3D scenes in a variety of 3D graphics applications. High performance for such an operation is necessary in order to provide users with fast responses. This paper proposes a fast and reliable mouse picking algorithm using graphics hardware for 3D triangular scenes. Our approach uses a multi-layer rendering algorithm to perform the picking operation in linear time complexity. The object-space based ray-triangle intersection test is implemented in a highly parallelized geometry shader. After applying the hardware-supported occlusion queries, only a small number of objects (or sub-objects) are rendered in subsequent layers, which accelerates the picking efficiency. Experimental results demonstrate the high performance of our novel approach. Due to its simplicity, our algorithm can be easily integrated into existing real-time rendering systems.
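
    For reference, the object-space ray/triangle test mentioned above is usually some variant of the Moller-Trumbore algorithm; the NumPy version below is a CPU sketch of that test, not the paper's geometry-shader implementation.

        import numpy as np

        def ray_hits_triangle(origin, direction, v0, v1, v2, eps=1e-9):
            """Moller-Trumbore intersection: returns the distance t along the
            picking ray, or None if the ray misses the triangle."""
            e1, e2 = v1 - v0, v2 - v0
            p = np.cross(direction, e2)
            det = np.dot(e1, p)
            if abs(det) < eps:                 # ray parallel to the triangle plane
                return None
            inv = 1.0 / det
            s = origin - v0
            u = np.dot(s, p) * inv
            if u < 0.0 or u > 1.0:
                return None
            q = np.cross(s, e1)
            v = np.dot(direction, q) * inv
            if v < 0.0 or u + v > 1.0:
                return None
            t = np.dot(e2, q) * inv
            return t if t > eps else None

        # A mouse click is converted to a ray; the closest positive t wins.
        hit = ray_hits_triangle(np.zeros(3), np.array([0.0, 0.0, -1.0]),
                                np.array([-1.0, -1.0, -5.0]),
                                np.array([1.0, -1.0, -5.0]),
                                np.array([0.0, 1.0, -5.0]))
        print(hit)  # approximately 5.0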

  12. Novel flat datacenter network architecture based on scalable and flow-controlled optical switch system.

    Science.gov (United States)

    Miao, Wang; Luo, Jun; Di Lucente, Stefano; Dorren, Harm; Calabretta, Nicola

    2014-02-10

    We propose and demonstrate an optical flat datacenter network based on a scalable optical switch system with optical flow control. The modular structure with distributed control results in port-count-independent optical switch reconfiguration time. An RF tone in-band labeling technique allowing parallel processing of the label bits ensures low-latency operation regardless of the switch port count. Hardware flow control is conducted at the optical level by re-using the label wavelength without occupying extra bandwidth, space, or network resources, which further improves the latency performance within a simple structure. Dynamic switching including multicasting operation is validated for a 4 x 4 system. Error-free operation of 40 Gb/s data packets has been achieved with only 1 dB penalty. The system could handle an input load up to 0.5, providing a packet loss lower than 10^-5 and an average latency of less than 500 ns when a buffer size of 16 packets is employed. Investigation of scalability also indicates that the proposed system could potentially scale up to a large port count with limited power penalty.

  13. Hardware Resource Allocation for Hardware/Software Partitioning in the LYCOS System

    DEFF Research Database (Denmark)

    Grode, Jesper Nicolai Riis; Knudsen, Peter Voigt; Madsen, Jan

    1998-01-01

    This paper presents a novel hardware resource allocation technique for hardware/software partitioning. It allocates hardware resources to the hardware data-path using information such as data-dependencies between operations in the application, and profiling information. The algorithm is useful as a designer's/design tool's aid to generate good hardware allocations for use in hardware/software partitioning. The algorithm has been implemented in a tool under the LYCOS system. The results show that the allocations produced by the algorithm come close to the best allocations obtained by exhaustive search.

  14. Optimized hardware framework of MLP with random hidden layers for classification applications

    Science.gov (United States)

    Zyarah, Abdullah M.; Ramesh, Abhishek; Merkel, Cory; Kudithipudi, Dhireesha

    2016-05-01

    Multilayer Perceptron Networks with random hidden layers are very efficient at automatic feature extraction and offer significant performance improvements in the training process. They essentially employ a large collection of fixed, random features, and are expedient for form-factor-constrained embedded platforms. In this work, a reconfigurable and scalable architecture is proposed for MLPs with random hidden layers with a customized building block based on the CORDIC algorithm. The proposed architecture also exploits fixed-point operations for area efficiency. The design is validated for classification on two different datasets. An accuracy of ~90% on the MNIST dataset and 75% for gender classification on the LFW dataset was observed. The hardware achieves a 299× speed-up over the corresponding software realization.
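
    The algorithmic idea (a fixed random hidden layer with only the output weights trained) can be written in a few lines of NumPy; the paper's hardware evaluates the hidden nonlinearity with CORDIC building blocks and fixed-point arithmetic, whereas this illustrative sketch simply uses tanh and a least-squares fit.

        import numpy as np

        rng = np.random.default_rng(0)

        def random_hidden_features(X, W, b):
            return np.tanh(X @ W + b)

        # Toy two-class problem.
        X = rng.normal(size=(200, 8))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

        n_hidden = 64
        W = rng.normal(size=(8, n_hidden))            # fixed random weights, never trained
        b = rng.normal(size=n_hidden)

        H = random_hidden_features(X, W, b)
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # train only the output layer
        pred = (H @ beta > 0.5).astype(float)
        print("training accuracy:", (pred == y).mean())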

  15. Feasibility and advantages of commercial process I/O systems for accelerator control

    International Nuclear Information System (INIS)

    Belshe, R.A.; Elischer, V.P.; Jacobson, V.

    1975-03-01

    Control systems for large particle accelerators must be able to handle analog and digital signals and timing coordination for devices which are spread over a large physical area. Many signals must be converted and transmitted to and from a central control area during each accelerator cycle. Digital transmission is often used to combat common mode and RF interference. Most accelerators in use today have met these requirements with custom process I/O hardware, data transmission systems, and computer interfaces. In-house development of hardware and software has been a very costly and time-consuming process, but due to the lack of available commercial equipment, there was often no other alternative. Today, a large portion of these development costs can be avoided. Small control computers are now available off the shelf which have extensive process control I/O hardware and software capability. Computer control should be designed into accelerator systems from the beginning, using operating systems available from the manufacturer. With most of the systems programming done, the designers can begin immediately on the applications software. (U.S.)

  16. Interfacing to accelerator instrumentation

    International Nuclear Information System (INIS)

    Shea, T.J.

    1995-01-01

    As the sensory system for an accelerator, the beam instrumentation provides a tremendous amount of diagnostic information. Access to this information can vary from periodic spot checks by operators to high bandwidth data acquisition during studies. In this paper, example applications will illustrate the requirements on interfaces between the control system and the instrumentation hardware. A survey of the major accelerator facilities will identify the most popular interface standards. The impact of developments such as isochronous protocols and embedded digital signal processing will also be discussed

  17. Hardware for soft computing and soft computing for hardware

    CERN Document Server

    Nedjah, Nadia

    2014-01-01

    Single and Multi-Objective Evolutionary Computation (MOEA), Genetic Algorithms (GAs), Artificial Neural Networks (ANNs), Fuzzy Controllers (FCs), Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) are becoming omnipresent in almost every intelligent system design. Unfortunately, the application of the majority of these techniques is complex and so requires a huge computational effort to yield useful and practical results. Therefore, dedicated hardware for evolutionary, neural and fuzzy computation is a key issue for designers. With the spread of reconfigurable hardware such as FPGAs, digital as well as analog hardware implementations of such computation become cost-effective. The idea behind this book is to offer a variety of hardware designs for soft computing techniques that can be embedded in any final product. Also, to introduce the successful application of soft computing techniques to solve many hard problems encountered during the design of embedded hardware. Reconfigurable em...

  18. Tomographic image reconstruction and rendering with texture-mapping hardware

    International Nuclear Information System (INIS)

    Azevedo, S.G.; Cabral, B.K.; Foran, J.

    1994-07-01

    The image reconstruction problem, also known as the inverse Radon transform, for x-ray computed tomography (CT) is found in numerous applications in medicine and industry. The most common algorithm used in these cases is filtered backprojection (FBP), which, while a simple procedure, is time-consuming for large images on any type of computational engine. Specially-designed, dedicated parallel processors are commonly used in medical CT scanners, whose results are then passed to a graphics workstation for rendering and analysis. However, a fast direct FBP algorithm can be implemented on modern texture-mapping hardware in current high-end workstation platforms. This is done by casting the FBP algorithm as an image warping operation with summing. Texture-mapping hardware, such as that on the Silicon Graphics Reality Engine (TM), shows around 600 times speedup of backprojection over a CPU-based implementation (a 100 MHz R4400 in this case). This technique has the further advantages of flexibility and rapid programming. In addition, the same hardware can be used for both image reconstruction and for volumetric rendering. The techniques can also be used to accelerate iterative reconstruction algorithms. The hardware architecture also allows more complex operations than straight-ray backprojection if they are required, including fan-beam, cone-beam, and curved ray paths, with little or no speed penalties.
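
    The "backprojection as image warping" view can be sketched on the CPU with NumPy/SciPy: each projection is smeared across the image plane and then rotated (the warp) before being summed, which is the per-projection operation the texture-mapping hardware performs. Sinogram filtering and all scanner geometry are omitted; sizes and angles are illustrative.

        import numpy as np
        from scipy.ndimage import rotate

        def backproject(sinogram, angles_deg):
            n_angles, n_det = sinogram.shape
            image = np.zeros((n_det, n_det))
            for i, angle in enumerate(angles_deg):
                # Smear one (ideally filtered) projection across the plane...
                smear = np.tile(sinogram[i], (n_det, 1))
                # ...then "texture map" it: rotate to the projection angle and sum.
                image += rotate(smear, angle, reshape=False, order=1)
            return image * np.pi / (2 * n_angles)

        sinogram = np.random.rand(36, 64)              # 36 projections, 64 detector bins
        reconstruction = backproject(sinogram, np.arange(0, 180, 5))
        print(reconstruction.shape)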

  19. Hardware Acceleration of SQL-Queries Processing in MDM-Systems Based on MISDSolution

    OpenAIRE

    V. E. Podol'skii; A. V. Samochadin; S. S. Koloskov

    2015-01-01

    In this article we examine the possibility of hardware support for functions of mobile device management platform (MDM-platform) using a Multiple Instructions and Single Data stream computer system, developed within the framework of the project in Bauman Moscow State Technical University. At the universities the MDM-platform is used to provide various mobile services for the faculty, students and administration to facilitate the learning process: a mobile schedule, document sharing, text mess...

  20. FPGA-accelerated simulation of computer systems

    CERN Document Server

    Angepat, Hari; Chung, Eric S; Hoe, James C; Chung, Eric S

    2014-01-01

    To date, the most common simulators of computer systems are software-based, running on standard computers. One promising approach to improve simulation performance is to apply hardware, specifically reconfigurable hardware in the form of field programmable gate arrays (FPGAs). This manuscript describes various approaches of using FPGAs to accelerate software-implemented simulation of computer systems and selected simulators that incorporate those techniques. More precisely, we describe a simulation architecture taxonomy that incorporates a simulation architecture specifically designed f

  1. CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.

    Science.gov (United States)

    Jiang, Hanyu; Ganesan, Narayan

    2016-02-27

    The HMMER software suite is widely used for analysis of homologous protein and nucleotide sequences with high sensitivity. The latest version of hmmsearch in HMMER 3.x utilizes a heuristic pipeline which consists of the MSV/SSV (Multiple/Single ungapped Segment Viterbi) stage, the P7Viterbi stage and the Forward scoring stage to accelerate homology detection. Since the latest version is highly optimized for performance on modern multi-core CPUs with SSE capabilities, only a few acceleration attempts report speedup. However, the most compute-intensive tasks within the pipeline (viz., the MSV/SSV and P7Viterbi stages) still stand to benefit from the computational capabilities of massively parallel processors. A Multi-Tiered Parallel Framework (CUDAMPF) implemented on CUDA-enabled GPUs, presented here, offers finer-grained parallelism for the MSV/SSV and Viterbi algorithms. We couple the SIMT (Single Instruction Multiple Threads) mechanism with SIMD (Single Instruction Multiple Data) video instructions and warp-synchronism to achieve high-throughput processing and eliminate thread idling. We also propose a hardware-aware optimal allocation scheme of scarce resources like on-chip memory and caches in order to boost performance and scalability of CUDAMPF. In addition, runtime compilation via NVRTC available with CUDA 7.0 is incorporated into the presented framework; it not only helps unroll the innermost loop to yield up to 2- to 3-fold speedup over static compilation but also enables dynamic loading and switching of kernels depending on the query model size, in order to achieve optimal performance. CUDAMPF is designed as a hardware-aware parallel framework for accelerating computational hotspots within the hmmsearch pipeline as well as other sequence alignment applications. It achieves significant speedup by exploiting hierarchical parallelism on a single GPU and takes full advantage of limited resources based on their own performance features. In addition to exceeding performance of other
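
    To make concrete which dynamic programming the P7Viterbi stage accelerates, here is a generic log-space Viterbi recurrence in NumPy; HMMER's profile HMMs have a much richer state topology (match, insert and delete states per model position), so this is only a structural sketch.

        import numpy as np

        def viterbi(log_init, log_trans, log_emit, observations):
            n_states = log_init.shape[0]
            T = len(observations)
            score = log_init + log_emit[:, observations[0]]
            back = np.zeros((T, n_states), dtype=int)
            for t in range(1, T):
                cand = score[:, None] + log_trans              # every previous-state choice
                back[t] = np.argmax(cand, axis=0)
                score = cand[back[t], np.arange(n_states)] + log_emit[:, observations[t]]
            path = [int(np.argmax(score))]
            for t in range(T - 1, 0, -1):                      # backtrace
                path.append(int(back[t, path[-1]]))
            return path[::-1], float(np.max(score))

        # Tiny two-state example.
        log_init = np.log([0.6, 0.4])
        log_trans = np.log([[0.7, 0.3], [0.4, 0.6]])
        log_emit = np.log([[0.9, 0.1], [0.2, 0.8]])
        print(viterbi(log_init, log_trans, log_emit, [0, 0, 1, 1]))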

  2. Automatic generation of application specific FPGA multicore accelerators

    DEFF Research Database (Denmark)

    Hindborg, Andreas Erik; Schleuniger, Pascal; Jensen, Nicklas Bo

    2014-01-01

    High performance computing systems make increasing use of hardware accelerators to improve performance and power properties. For large high-performance FPGAs to be successfully integrated in such computing systems, methods to raise the abstraction level of FPGA programming are required ... to identify optimal performance-energy trade-off points for a multicore-based FPGA accelerator.

  3. Numeric Analysis for Relationship-Aware Scalable Streaming Scheme

    Directory of Open Access Journals (Sweden)

    Heung Ki Lee

    2014-01-01

    Full Text Available Frequent packet loss of media data is a critical problem that degrades the quality of streaming services over mobile networks. Packet loss invalidates frames containing lost packets and other related frames at the same time. Indirect loss caused by losing packets decreases the quality of streaming. A scalable streaming service can decrease the amount of dropped multimedia resulting from a single packet loss. Content providers typically divide one large media stream into several layers through a scalable streaming service and then provide each scalable layer to the user depending on the mobile network. Also, a scalable streaming service makes it possible to decode partial multimedia data depending on the relationship between frames and layers. Therefore, a scalable streaming service provides a way to decrease the wasted multimedia data when one packet is lost. However, the hierarchical structure between frames and layers of scalable streams determines the service quality of the scalable streaming service. Even if whole packets of layers are transmitted successfully, they cannot be decoded as a result of the absence of reference frames and layers. Therefore, the complicated relationship between frames and layers in a scalable stream increases the volume of abandoned layers. For providing a high-quality scalable streaming service, we choose a proper relationship between scalable layers as well as the amount of transmitted multimedia data depending on the network situation. We prove that a simple scalable scheme outperforms a complicated scheme in an error-prone network. We suggest an adaptive set-top box (AdaptiveSTB) to lower the dependency between scalable layers in a scalable stream. Also, we provide a numerical model to obtain the indirect loss of multimedia data and apply it to various multimedia streams. Our AdaptiveSTB enhances the quality of a scalable streaming service by removing indirect loss.
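
    The indirect-loss effect can be illustrated with a toy dependency check: a layer is decodable only if every layer it references also arrived, so a deep dependency chain wastes more of the received data than a flat one. The layer names, dependencies and recursive check below are invented for illustration.

        # Toy model of indirect loss in a layered (scalable) stream.
        dependencies = {
            "base":     [],
            "enh1":     ["base"],
            "enh2":     ["enh1"],        # deep chain: fragile
            "enh2_alt": ["base"],        # flat alternative: more robust
        }

        def decodable(layer, received, deps):
            return layer in received and all(decodable(d, received, deps) for d in deps[layer])

        received = {"base", "enh2", "enh2_alt"}       # "enh1" was lost
        for layer in dependencies:
            status = "decodable" if decodable(layer, received, dependencies) else "dropped"
            print(layer, "->", status)
        # enh2 is dropped even though its packets arrived (indirect loss);
        # enh2_alt survives because its dependency chain is shallower.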

  4. Scalable coherent interface

    International Nuclear Information System (INIS)

    Alnaes, K.; Kristiansen, E.H.; Gustavson, D.B.; James, D.V.

    1990-01-01

    The Scalable Coherent Interface (IEEE P1596) is establishing an interface standard for very high performance multiprocessors, supporting a cache-coherent-memory model scalable to systems with up to 64K nodes. This Scalable Coherent Interface (SCI) will supply a peak bandwidth per node of 1 GigaByte/second. The SCI standard should facilitate assembly of processor, memory, I/O and bus bridge cards from multiple vendors into massively parallel systems with throughput far above what is possible today. The SCI standard encompasses two levels of interface, a physical level and a logical level. The physical level specifies electrical, mechanical and thermal characteristics of connectors and cards that meet the standard. The logical level describes the address space, data transfer protocols, cache coherence mechanisms, synchronization primitives and error recovery. In this paper we address logical level issues such as packet formats, packet transmission, transaction handshake, flow control, and cache coherence. 11 refs., 10 figs

  5. A low power biomedical signal processor ASIC based on hardware software codesign.

    Science.gov (United States)

    Nie, Z D; Wang, L; Chen, W G; Zhang, T; Zhang, Y T

    2009-01-01

    A low power biomedical digital signal processor ASIC based on a hardware and software codesign methodology is presented in this paper. The codesign methodology was used to achieve higher system performance and design flexibility. The hardware implementation included a low power 32-bit RISC CPU ARM7TDMI, a low power AHB-compatible bus, and a scalable digital co-processor that was optimized for low power Fast Fourier Transform (FFT) calculations. The co-processor could be scaled for 8-point, 16-point and 32-point FFTs, taking approximately 50, 100 and 150 clock cycles, respectively. The complete design was intensively simulated using the ARM DSM model and emulated on the ARM Versatile platform before being committed to silicon. The multi-million-gate ASIC was fabricated using SMIC 0.18 microm mixed-signal CMOS 1P6M technology. The die area measures 5,000 microm x 2,350 microm. The power consumption was approximately 3.6 mW at a 1.8 V power supply and 1 MHz clock rate. The power consumption for FFT calculations was less than 1.5% of that of the conventional embedded software-based solution.
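
    The scalable co-processor exploits the fact that the same radix-2 butterfly structure covers 8-, 16- and 32-point transforms; the NumPy sketch below spells out that iterative structure as a floating-point software reference, not the fixed-point hardware datapath.

        import numpy as np

        def fft_radix2(x):
            """Iterative radix-2 decimation-in-time FFT (power-of-two lengths)."""
            x = np.array(x, dtype=complex)
            n = len(x)
            assert n and (n & (n - 1)) == 0, "length must be a power of two"
            # Bit-reversal permutation.
            j = 0
            for i in range(1, n):
                bit = n >> 1
                while j & bit:
                    j ^= bit
                    bit >>= 1
                j |= bit
                if i < j:
                    x[i], x[j] = x[j], x[i]
            # Butterfly stages: size = 2, 4, 8, ..., n.
            size = 2
            while size <= n:
                w = np.exp(-2j * np.pi / size)
                for start in range(0, n, size):
                    factor = 1.0
                    for k in range(size // 2):
                        a = x[start + k]
                        b = x[start + k + size // 2] * factor
                        x[start + k] = a + b
                        x[start + k + size // 2] = a - b
                        factor *= w
                size *= 2
            return x

        signal = np.random.rand(32)
        assert np.allclose(fft_radix2(signal), np.fft.fft(signal))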

  6. Internet-based hardware/software co-design framework for embedded 3D graphics applications

    Directory of Open Access Journals (Sweden)

    Wong Weng-Fai

    2011-01-01

    Full Text Available Advances in technology are making it possible to run three-dimensional (3D) graphics applications on embedded and handheld devices. In this article, we propose a hardware/software co-design environment for 3D graphics application development that includes the 3D graphics software, OpenGL ES application programming interface (API), device driver, and 3D graphics hardware simulators. We developed a 3D graphics system-on-a-chip (SoC) accelerator using transaction-level modeling (TLM). This gives software designers early access to the hardware even before it is ready. On the other hand, hardware designers also stand to gain from the more complex test benches made available in the software for verification. A unique aspect of our framework is that it allows hardware and software designers from geographically dispersed areas to cooperate and work on the same framework. Designs can be entered and executed from anywhere in the world without full access to the entire framework, which may include proprietary components. This results in controlled and secure transparency and reproducibility, granting leveled access to users of various roles.

  7. Scalable photoreactor for hydrogen production

    KAUST Repository

    Takanabe, Kazuhiro; Shinagawa, Tatsuya

    2017-01-01

    Provided herein are scalable photoreactors that can include a membrane-free water- splitting electrolyzer and systems that can include a plurality of membrane-free water- splitting electrolyzers. Also provided herein are methods of using the scalable photoreactors provided herein.

  8. Scalable photoreactor for hydrogen production

    KAUST Repository

    Takanabe, Kazuhiro

    2017-04-06

    Provided herein are scalable photoreactors that can include a membrane-free water- splitting electrolyzer and systems that can include a plurality of membrane-free water- splitting electrolyzers. Also provided herein are methods of using the scalable photoreactors provided herein.

  9. Accelerator Control and Global Networks State of the Art

    CERN Document Server

    Gurd, D P

    2004-01-01

    As accelerators increase in size and complexity, demands upon their control systems increase correspondingly. Machine complexity is reflected in the complexity of control system hardware and software, and careful configuration management is essential. Model-based procedures and fast feedback based upon even faster beam instrumentation are often required. Managing machine protection systems with tens of thousands of inputs is another significant challenge. Increased use of commodity hardware and software introduces new issues of security and control. Large new facilities will increasingly be built by national (e.g. SNS) or international (e.g. a linear collider) collaborations. Building an integrated control system for an accelerator whose development is geographically widespread presents particular problems, not all of them technical. Recent discussions of a “Global Accelerator Network” include the possibility of multiple remote control rooms and no more night shifts. Based upon current experien...

  10. Resource-aware complexity scalability for mobile MPEG encoding

    NARCIS (Netherlands)

    Mietens, S.O.; With, de P.H.N.; Hentschel, C.; Panchanatan, S.; Vasudev, B.

    2004-01-01

    Complexity scalability attempts to scale the required resources of an algorithm with the chosen quality settings, in order to broaden the application range. In this paper, we present complexity-scalable MPEG encoding of which the core processing modules are modified for scalability. Scalability is

  11. A versatile scalable PET processing system

    International Nuclear Information System (INIS)

    Dong, H.; Weisenberger, A.; McKisson, J.; Wenze, Xi; Cuevas, C.; Wilson, J.; Zukerman, L.

    2011-01-01

    Positron Emission Tomography (PET) historically has major clinical and preclinical applications in oncology, neurology, and cardiovascular diseases. Recently, in a new direction, an application-specific PET system is being developed at Thomas Jefferson National Accelerator Facility (Jefferson Lab) in collaboration with Duke University, University of Maryland at Baltimore (UMAB), and West Virginia University (WVU) targeted for plant eco-physiology research. The new plant imaging PET system is versatile and scalable such that it could adapt to several plant imaging needs - imaging many important plant organs including leaves, roots, and stems. The mechanical arrangement of the detectors is designed to accommodate the unpredictable and random distribution in space of the plant organs without requiring the plant be disturbed. Prototyping such a system requires a new data acquisition system (DAQ) and data processing system which are adaptable to the requirements of these unique and versatile detectors.

  12. Scalable Light Module for Low-Cost, High-Efficiency Light- Emitting Diode Luminaires

    Energy Technology Data Exchange (ETDEWEB)

    Tarsa, Eric [Cree, Inc., Goleta, CA (United States)

    2015-08-31

    During this two-year program Cree developed a scalable, modular optical architecture for low-cost, high-efficacy light emitting diode (LED) luminaires. Stated simply, the goal of this architecture was to efficiently and cost-effectively convey light from LEDs (point sources) to broad luminaire surfaces (area sources). By simultaneously developing warm-white LED components and low-cost, scalable optical elements, a high system optical efficiency resulted. To meet program goals, Cree evaluated novel approaches to improve LED component efficacy at high color quality while not sacrificing LED optical efficiency relative to conventional packages. Meanwhile, efficiently coupling light from LEDs into modular optical elements, followed by optimally distributing and extracting this light, were challenges that were addressed via novel optical design coupled with frequent experimental evaluations. Minimizing luminaire bill of materials and assembly costs were two guiding principles for all design work, in the effort to achieve luminaires with significantly lower normalized cost ($/klm) than existing LED fixtures. Chief project accomplishments included the achievement of >150 lm/W warm-white LEDs having primary optics compatible with low-cost modular optical elements. In addition, a prototype Light Module optical efficiency of over 90% was measured, demonstrating the potential of this scalable architecture for ultra-high-efficacy LED luminaires. Since the project ended, Cree has continued to evaluate optical element fabrication and assembly methods in an effort to rapidly transfer this scalable, cost-effective technology to Cree production development groups. The Light Module concept is likely to make a strong contribution to the development of new cost-effective, high-efficacy luminaires, thereby accelerating widespread adoption of energy-saving SSL in the U.S.

  13. Introduction to Hardware Security

    Directory of Open Access Journals (Sweden)

    Yier Jin

    2015-10-01

    Full Text Available Hardware security has become a hot topic recently with more and more researchers from related research domains joining this area. However, the understanding of hardware security is often conflated with cybersecurity and cryptography, especially cryptographic hardware. For the same reason, the research scope of hardware security has never been clearly defined. To help researchers who have recently joined this area better understand the challenges and tasks within the hardware security domain and to help both academia and industry investigate countermeasures and solutions to solve hardware security problems, we will introduce the key concepts of hardware security as well as its relations to related research topics in this survey paper. Emerging hardware security topics will also be clearly depicted, and through them the future trends will be elaborated, making this survey paper a good reference for the continuing research efforts in this area.

  14. Argonne Wakefield Accelerator update '92

    International Nuclear Information System (INIS)

    Rosing, M.; Balka, L.; Chojnacki, E.; Gai, W.; Ho, C.; Konecny, R.; Power, J.; Schoessow, P.; Simpson, J.

    1992-01-01

    The construction of the Argonne Wakefield Accelerator (AWA) is under way. The majority of the hardware is about to be delivered or is installed. Radiation safety systems are in the review process, and the laser system is operational. Bunch production should begin in December 1992. 4 refs., 5 figs

  15. FY1995 evolvable hardware chip; 1995 nendo shinkasuru hardware chip

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    This project aims at the development of 'Evolvable Hardware' (EHW) which can adapt its hardware structure to the environment to attain better hardware performance, under the control of genetic algorithms. EHW is a key technology to explore the new application area requiring real-time performance and on-line adaptation. 1. Development of EHW-LSI for function level hardware evolution, which includes 15 DSPs in one chip. 2. Application of the EHW to practical industrial applications such as data compression, ATM control, digital mobile communication. 3. Two patents: (1) the architecture and the processing method for programmable EHW-LSI. (2) The method of data compression for loss-less data, using EHW. 4. The first international conference for evolvable hardware was held by the authors: Intl. Conf. on Evolvable Systems (ICES96). It was determined at ICES96 that ICES will be held every two years between Japan and Europe. Thus, the new society has been established by the authors. (NEDO)

  17. Exploring Hardware Support For Scaling Irregular Applications on Multi-node Multi-core Architectures

    Energy Technology Data Exchange (ETDEWEB)

    Secchi, Simone; Ceriani, Marco; Tumeo, Antonino; Villa, Oreste; Palermo, Gianluca; Raffo, Luigi

    2013-06-05

    With the recent emergence of large-scale knowledge discovery, data mining and social network analysis, irregular applications have gained renewed interest. Classic cache-based high-performance architectures do not provide optimal performance with such kinds of workloads, mainly due to the very low spatial and temporal locality of the irregular control and memory access patterns. In this paper, we present a multi-node, multi-core, fine-grained multi-threaded shared-memory system architecture specifically designed for the execution of large-scale irregular applications, and built on top of three pillars that we believe are fundamental to support these workloads. First, we offer transparent hardware support for Partitioned Global Address Space (PGAS) to provide a large globally-shared address space with no software library overhead. Second, we employ multi-threaded multi-core processing nodes to achieve the necessary latency tolerance required by accessing global memory, which potentially resides in a remote node. Finally, we devise hardware support for inter-thread synchronization on the whole global address space. We first model the performance by using an analytical model that takes into account the main architecture and application characteristics. We describe the hardware design of the proposed custom architectural building blocks that provide support for the above-mentioned three pillars. Finally, we present a limited-scale evaluation of the system on a multi-board FPGA prototype with typical irregular kernels and benchmarks. The experimental evaluation demonstrates the architecture performance scalability for different configurations of the whole system.

  18. Accelerating Climate Simulations Through Hybrid Computing

    Science.gov (United States)

    Zhou, Shujia; Sinno, Scott; Cruz, Carlos; Purcell, Mark

    2009-01-01

    Unconventional multi-core processors (e.g., IBM Cell B/E and NVIDIA GPU) have emerged as accelerators in climate simulation. However, climate models typically run on parallel computers with conventional processors (e.g., Intel and AMD) using MPI. Connecting accelerators to this architecture efficiently and easily becomes a critical issue. When using MPI for connection, we identified two challenges: (1) identical MPI implementation is required in both systems, and (2) existing MPI code must be modified to accommodate the accelerators. In response, we have extended and deployed IBM Dynamic Application Virtualization (DAV) in a hybrid computing prototype system (one blade with two Intel quad-core processors, two IBM QS22 Cell blades, connected with Infiniband), allowing for seamlessly offloading compute-intensive functions to remote, heterogeneous accelerators in a scalable, load-balanced manner. Currently, a climate solar radiation model running with multiple MPI processes has been offloaded to multiple Cell blades with approximately 10% network overhead.

  19. A Modular Framework for Modeling Hardware Elements in Distributed Engine Control Systems

    Science.gov (United States)

    Zinnecker, Alicia M.; Culley, Dennis E.; Aretskin-Hariton, Eliot D.

    2015-01-01

    Progress toward the implementation of distributed engine control in an aerospace application may be accelerated through the development of a hardware-in-the-loop (HIL) system for testing new control architectures and hardware outside of a physical test cell environment. One component required in an HIL simulation system is a high-fidelity model of the control platform: sensors, actuators, and the control law. The control system developed for the Commercial Modular Aero-Propulsion System Simulation 40k (C-MAPSS40k) provides a verifiable baseline for development of a model for simulating a distributed control architecture. This distributed controller model will contain enhanced hardware models, capturing the dynamics of the transducer and the effects of data processing, and a model of the controller network. A multilevel framework is presented that establishes three sets of interfaces in the control platform: communication with the engine (through sensors and actuators), communication between hardware and controller (over a network), and the physical connections within individual pieces of hardware. This introduces modularity at each level of the model, encouraging collaboration in the development and testing of various control schemes or hardware designs. At the hardware level, this modularity is leveraged through the creation of a Simulink® library containing blocks for constructing smart transducer models complying with the IEEE 1451 specification. These hardware models were incorporated in a distributed version of the baseline C-MAPSS40k controller and simulations were run to compare the performance of the two models. The overall tracking ability differed only due to quantization effects in the feedback measurements in the distributed controller. Additionally, it was found that the added complexity of the smart transducer models did not prevent real-time operation of the distributed controller model, a requirement of an HIL system.

  20. Scalable Frequent Subgraph Mining

    KAUST Repository

    Abdelhamid, Ehab

    2017-06-19

    A graph is a data structure that contains a set of nodes and a set of edges connecting these nodes. Nodes represent objects while edges model relationships among these objects. Graphs are used in various domains due to their ability to model complex relations among several objects. Given an input graph, the Frequent Subgraph Mining (FSM) task finds all subgraphs with frequencies exceeding a given threshold. FSM is crucial for graph analysis, and it is an essential building block in a variety of applications, such as graph clustering and indexing. FSM is computationally expensive, and its existing solutions are extremely slow. Consequently, these solutions are incapable of mining modern large graphs. This slowness is caused by the underlying approaches of these solutions which require finding and storing an excessive amount of subgraph matches. This dissertation proposes a scalable solution for FSM that avoids the limitations of previous work. This solution is composed of four components. The first component is a single-threaded technique which, for each candidate subgraph, needs to find only a minimal number of matches. The second component is a scalable parallel FSM technique that utilizes a novel two-phase approach. The first phase quickly builds an approximate search space, which is then used by the second phase to optimize and balance the workload of the FSM task. The third component focuses on accelerating frequency evaluation, which is a critical step in FSM. To do so, a machine learning model is employed to predict the type of each graph node, and accordingly, an optimized method is selected to evaluate that node. The fourth component focuses on mining dynamic graphs, such as social networks. To this end, an incremental index is maintained during the dynamic updates. Only this index is processed and updated for the majority of graph updates. Consequently, search space is significantly pruned and efficiency is improved. The empirical evaluation shows that the

  1. Adaptive format conversion for scalable video coding

    Science.gov (United States)

    Wan, Wade K.; Lim, Jae S.

    2001-12-01

    The enhancement layer in many scalable coding algorithms is composed of residual coding information. There is another type of information that can be transmitted instead of (or in addition to) residual coding. Since the encoder has access to the original sequence, it can utilize adaptive format conversion (AFC) to generate the enhancement layer and transmit the different format conversion methods as enhancement data. This paper investigates the use of adaptive format conversion information as enhancement data in scalable video coding. Experimental results are shown for a wide range of base layer qualities and enhancement bitrates to determine when AFC can improve video scalability. Since the parameters needed for AFC are small compared to residual coding, AFC can provide video scalability at low enhancement layer bitrates that are not possible with residual coding. In addition, AFC can also be used in addition to residual coding to improve video scalability at higher enhancement layer bitrates. Adaptive format conversion has not been studied in detail, but many scalable applications may benefit from it. An example of an application that AFC is well-suited for is the migration path for digital television where AFC can provide immediate video scalability as well as assist future migrations.
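
    A toy version of the idea: for each block, the encoder (which has the original available) tries a small set of format-conversion methods, here simple upsamplers, and transmits only the index of the best one, which costs far fewer bits than coding a residual. Both methods and the block sizes are illustrative assumptions, not the conversion methods studied in the paper.

        import numpy as np

        def upsample_repeat(block):
            return np.repeat(np.repeat(block, 2, axis=0), 2, axis=1).astype(float)

        def upsample_smooth(block):
            big = upsample_repeat(block)
            return (big + np.roll(big, 1, axis=0) + np.roll(big, 1, axis=1)
                    + np.roll(big, (1, 1), (0, 1))) / 4.0

        METHODS = [upsample_repeat, upsample_smooth]

        def choose_method(original_block, downsampled_block):
            errors = [np.mean((m(downsampled_block) - original_block) ** 2) for m in METHODS]
            return int(np.argmin(errors))      # a few bits per block instead of a residual

        original = np.random.rand(8, 8)
        down = original[::2, ::2]
        print("chosen method:", choose_method(original, down))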

  2. A simple route to scalable fabrication of perfectly ordered ZnO nanorod arrays

    International Nuclear Information System (INIS)

    Liu, D F; Xiang, Y J; Liao, Q; Zhang, J P; Wu, X C; Zhang, Z X; Liu, L F; Ma, W J; Shen, J; Zhou, W Y; Xie, S S

    2007-01-01

    ZnO nanorod arrays with perfect order and uniformity were prepared using a simple, low-cost, commonly available and scalable nanosphere lithography for patterning gold catalyst particles and a successive bottom-up growth technique in a tube furnace chemical vapor deposition system. Each rod in the arrays had perfect surface facets, sharp edges and uniform size. For all of the rods, the side facets shared the same orientation. This bottom-up assembly method may accelerate the use of ZnO nanorods in real device applications

  3. Hardware Commissioning of the LHC Quality Assurance, follow-up and storing of the test results

    CERN Document Server

    Barbero, E

    2005-01-01

    During the commissioning of the LHC technical systems [1] (the so-called Hardware Commissioning) a large number of test sequences and procedures will be applied to the different systems and components of the accelerator. All the information related to the coordination of the Hardware Commissioning will be structured and managed towards the final objective of integrating all the data produced in the Manufacturing and Test Folders (MTF) [2] at both equipment level (i.e. individual system tests) and commissioning level (i.e. Hardware Commissioning). The MTF for Hardware Commissioning will be mainly used to archive the results of the tests (i.e. status, parameters and waveforms) which will be used later as reference during the operation with beam. Also it is an indispensable tool for monitoring the progress of the different tests and ensuring the proper follow-up of the procedures described in the engineering specifications; in this way, the Quality Assurance process will be completed. This paper describes the spe...

  4. The BNL Accelerator Test Facility control system

    International Nuclear Information System (INIS)

    Malone, R.; Bottke, I.; Fernow, R.; Ben-Zvi, I.

    1993-01-01

    Described is the VAX/CAMAC-based control system for Brookhaven National Laboratory's Accelerator Test Facility, a laser/linac research complex. Details of hardware and software configurations are presented along with experiences of using Vsystem, a commercial control system package

  5. Development of a scalable suspension culture for cardiac differentiation from human pluripotent stem cells

    Directory of Open Access Journals (Sweden)

    Vincent C. Chen

    2015-09-01

    Full Text Available To meet the need of a large quantity of hPSC-derived cardiomyocytes (CM) for pre-clinical and clinical studies, a robust and scalable differentiation system for CM production is essential. With a human pluripotent stem cell (hPSC) aggregate suspension culture system we established previously, we developed a matrix-free, scalable, and GMP-compliant process for directing hPSC differentiation to CM in suspension culture by modulating Wnt pathways with small molecules. By optimizing critical process parameters including cell aggregate size, small molecule concentrations, induction timing, and agitation rate, we were able to consistently differentiate hPSCs to >90% CM purity with an average yield of 1.5 to 2 × 10⁹ CM/L at scales up to 1 L spinner flasks. CM generated from the suspension culture displayed typical genetic, morphological, and electrophysiological cardiac cell characteristics. This suspension culture system allows seamless transition from hPSC expansion to CM differentiation in a continuous suspension culture. It not only provides a cost and labor effective scalable process for large scale CM production, but also provides a bioreactor prototype for automation of cell manufacturing, which will accelerate the advance of hPSC research towards therapeutic applications.

  6. A High Performance QDWH-SVD Solver using Hardware Accelerators

    KAUST Repository

    Sukkari, Dalal E.

    2015-04-08

    This paper describes a new high performance implementation of the QR-based Dynamically Weighted Halley Singular Value Decomposition (QDWH-SVD) solver on multicore architecture enhanced with multiple GPUs. The standard QDWH-SVD algorithm was introduced by Nakatsukasa and Higham (SIAM SISC, 2013) and combines three successive computational stages: (1) the polar decomposition calculation of the original matrix using the QDWH algorithm, (2) the symmetric eigendecomposition of the resulting polar factor to obtain the singular values and the right singular vectors and (3) the matrix-matrix multiplication to get the associated left singular vectors. A comprehensive test suite highlights the numerical robustness of the QDWH-SVD solver. Although it performs up to two times more flops when computing all singular vectors compared to the standard SVD solver algorithm, our new high performance implementation on single GPU results in up to 3.8x improvements for asymptotic matrix sizes, compared to the equivalent routines from existing state-of-the-art open-source and commercial libraries. However, when only singular values are needed, QDWH-SVD is penalized by performing up to 14 times more flops. The singular value only implementation of QDWH-SVD on single GPU can still run up to 18% faster than the best existing equivalent routines. Integrating mixed precision techniques in the solver can additionally provide up to 40% improvement at the price of losing few digits of accuracy, compared to the full double precision floating point arithmetic. We further leverage the single GPU QDWH-SVD implementation by introducing the first multi-GPU SVD solver to study the scalability of the QDWH-SVD framework.
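
    The three stages can be reproduced at small scale with SciPy: a polar decomposition, a symmetric eigendecomposition of the polar factor, and one matrix-matrix product for the left singular vectors. Note that scipy.linalg.polar uses its own algorithm rather than the QDWH iteration, so this is only a structural sketch of the QDWH-SVD pipeline.

        import numpy as np
        from scipy.linalg import polar, eigh

        def svd_via_polar(A):
            U_p, H = polar(A)          # stage 1: A = U_p @ H, with H symmetric PSD
            s, V = eigh(H)             # stage 2: H = V @ diag(s) @ V.T
            s, V = s[::-1], V[:, ::-1] # sort singular values in descending order
            U = U_p @ V                # stage 3: left singular vectors
            return U, s, V

        A = np.random.rand(6, 6)
        U, s, V = svd_via_polar(A)
        assert np.allclose(U @ np.diag(s) @ V.T, A)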

  7. Chromium Renderserver: Scalable and Open Source Remote RenderingInfrastructure

    Energy Technology Data Exchange (ETDEWEB)

    Paul, Brian; Ahern, Sean; Bethel, E. Wes; Brugger, Eric; Cook, Rich; Daniel, Jamison; Lewis, Ken; Owen, Jens; Southard, Dale

    2007-12-01

    Chromium Renderserver (CRRS) is software infrastructure that provides the ability for one or more users to run and view image output from unmodified, interactive OpenGL and X11 applications on a remote, parallel computational platform equipped with graphics hardware accelerators via industry-standard Layer 7 network protocols and client viewers. The new contributions of this work include a solution to the problem of synchronizing X11 and OpenGL command streams, remote delivery of parallel hardware-accelerated rendering, and a performance analysis of several different optimizations that are generally applicable to a variety of rendering architectures. CRRS is fully operational, Open Source software.

  8. Scalable Density-Based Subspace Clustering

    DEFF Research Database (Denmark)

    Müller, Emmanuel; Assent, Ira; Günnemann, Stephan

    2011-01-01

    For knowledge discovery in high dimensional databases, subspace clustering detects clusters in arbitrary subspace projections. Scalability is a crucial issue, as the number of possible projections is exponential in the number of dimensions. We propose a scalable density-based subspace clustering method that steers mining to a few selected subspace clusters. Our novel steering technique reduces subspace processing by identifying and clustering promising subspaces and their combinations directly. Thereby, it narrows down the search space while maintaining accuracy. Thorough experiments on real and synthetic databases show that steering is efficient and scalable, with high quality results. For future work, our steering paradigm for density-based subspace clustering opens research potential for speeding up other subspace clustering approaches as well.

  9. Harnessing the crowd to accelerate molecular medicine research.

    Science.gov (United States)

    Smith, Robert J; Merchant, Raina M

    2015-07-01

    Crowdsourcing presents a novel approach to solving complex problems within molecular medicine. By leveraging the expertise of fellow scientists across the globe, broadcasting to and engaging the public for idea generation, harnessing a scalable workforce for quick data management, and fundraising for research endeavors, crowdsourcing creates novel opportunities for accelerating scientific progress. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Design of Power Efficient FPGA based Hardware Accelerators for Financial Applications

    DEFF Research Database (Denmark)

    Hegner, Jonas Stenbæk; Sindholt, Joakim; Nannarelli, Alberto

    2012-01-01

    Using Field Programmable Gate Arrays (FPGAs) to accelerate financial derivative calculations is becoming very common. In this work, we implement an FPGA-based specific processor for European option pricing using Monte Carlo simulations, and we compare its performance and power dissipation...
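
    For context, the computational kernel being accelerated is a plain Monte Carlo estimator for a European call under geometric Brownian motion; the sketch below is a generic CPU reference with arbitrary parameter values, not the FPGA design from the paper.

      import numpy as np

      def mc_european_call(S0, K, r, sigma, T, n_paths, seed=0):
          rng = np.random.default_rng(seed)
          z = rng.standard_normal(n_paths)
          # Terminal asset price under risk-neutral geometric Brownian motion.
          ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * z)
          payoff = np.maximum(ST - K, 0.0)
          return np.exp(-r * T) * payoff.mean()   # discounted expected payoff

      print(mc_european_call(S0=100.0, K=105.0, r=0.02, sigma=0.2, T=1.0, n_paths=1_000_000))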

  11. Energy-aware SQL query acceleration through FPGA-based dynamic partial reconfiguration

    NARCIS (Netherlands)

    Becher, Andreas; Bauer, Florian; Ziener, Daniel; Teich, Jürgen

    2014-01-01

    In this paper, we propose an approach for energy-aware FPGA-based query acceleration for databases on embedded devices. After the analysis of an incoming query, a query-specific hardware accelerator is generated on-the-fly and loaded on the FPGA for subsequent query execution using partial dynamic

  12. Construction of a smart medication dispenser with high degree of scalability and remote manageability.

    Science.gov (United States)

    Pak, JuGeon; Park, KeeHyun

    2012-01-01

    We propose a smart medication dispenser having a high degree of scalability and remote manageability. We construct the dispenser to have extensible hardware architecture for achieving scalability, and we install an agent program in it for achieving remote manageability. The dispenser operates as follows: when the real-time clock reaches the predetermined medication time and the user presses the dispense button at that time, the predetermined medication is dispensed from the medication dispensing tray (MDT). In the proposed dispenser, the medication for each patient is stored in an MDT. One smart medication dispenser contains mainly one MDT; however, the dispenser can be extended to include more MDTs in order to support multiple users using one dispenser. For remote management, the proposed dispenser transmits the medication status and the system configurations to the monitoring server. In the case of a specific event such as a shortage of medication, memory overload, software error, or non-adherence, the event is transmitted immediately. All these operations are performed automatically without the intervention of patients, through the agent program installed in the dispenser. Results of implementation and verification show that the proposed dispenser operates normally and performs the management operations from the medication monitoring server suitably.
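
    A rough sketch of the dispense-decision and event-reporting logic described above is given below; the function names, the 30-minute window and the event list are illustrative assumptions, not the dispenser's actual firmware.

      from datetime import datetime, timedelta

      def should_dispense(now, scheduled_time, button_pressed, window=timedelta(minutes=30)):
          # Dispense only when the clock has reached the scheduled medication time
          # and the user presses the dispense button within the allowed window.
          return button_pressed and scheduled_time <= now <= scheduled_time + window

      def collect_events(mdt_level, min_level, dose_taken):
          # Events such as a medication shortage or non-adherence are reported immediately.
          events = []
          if mdt_level < min_level:
              events.append("medication shortage")
          if not dose_taken:
              events.append("non-adherence")
          return events

      print(should_dispense(datetime(2012, 1, 1, 8, 5), datetime(2012, 1, 1, 8, 0), True))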

  13. Construction of a Smart Medication Dispenser with High Degree of Scalability and Remote Manageability

    Directory of Open Access Journals (Sweden)

    JuGeon Pak

    2012-01-01

    Full Text Available We propose a smart medication dispenser having a high degree of scalability and remote manageability. We construct the dispenser to have extensible hardware architecture for achieving scalability, and we install an agent program in it for achieving remote manageability. The dispenser operates as follows: when the real-time clock reaches the predetermined medication time and the user presses the dispense button at that time, the predetermined medication is dispensed from the medication dispensing tray (MDT). In the proposed dispenser, the medication for each patient is stored in an MDT. One smart medication dispenser contains mainly one MDT; however, the dispenser can be extended to include more MDTs in order to support multiple users using one dispenser. For remote management, the proposed dispenser transmits the medication status and the system configurations to the monitoring server. In the case of a specific event such as a shortage of medication, memory overload, software error, or non-adherence, the event is transmitted immediately. All these operations are performed automatically without the intervention of patients, through the agent program installed in the dispenser. Results of implementation and verification show that the proposed dispenser operates normally and performs the management operations from the medication monitoring server suitably.

  14. Hardware description languages

    Science.gov (United States)

    Tucker, Jerry H.

    1994-01-01

    Hardware description languages are special purpose programming languages. They are primarily used to specify the behavior of digital systems and are rapidly replacing traditional digital system design techniques. This is because they allow the designer to concentrate on how the system should operate rather than on implementation details. Hardware description languages allow a digital system to be described with a wide range of abstraction, and they support top down design techniques. A key feature of any hardware description language environment is its ability to simulate the modeled system. The two most important hardware description languages are Verilog and VHDL. Verilog has been the dominant language for the design of application specific integrated circuits (ASIC's). However, VHDL is rapidly gaining in popularity.

  15. Development of a distributed control system for the JAERI tandem accelerator facility

    International Nuclear Information System (INIS)

    Hanashima, Susumu

    2005-01-01

    In the JAERI tandem accelerator facility, we are building accelerator complex aiming generation and acceleration of radio nuclear beam. Several accelerators, ion sources and a charge breeder are installed in the facility. We are developing a distributed control system enabling smooth operation of the facility. We report basic concepts of the control system in this article. We also describe about a control hardware using plastic optical fiber, which is developed for the control system. (author)

  16. Overview of Fermi National Accelerator Lab Control System

    International Nuclear Information System (INIS)

    Lucas, P.W.

    1990-01-01

    Various facets of the control of the Fermilab accelerators, in particular the Tevatron, are presented. Since Fermilab contains a superconducting machine and a sophisticated injection complex, much of the controls functionality will of necessity be the same at the SSC. The various functions required at a large laboratory are discussed; these include computer-based fire and security alarms and a cable television system, as well as computer networks connected to accelerator hardware components. A description is given of that hardware, of which much is Camac but with considerable computer backplane bus equipment also present. A large fraction of the controls hardware has access to high precision real-time clocks. Our various networks are introduced, with the physical layer being a combination of copper and more modern optic cables, with the primary intercomputer link being Token Ring. A description of the computers is presented - basically these consist of operators' consoles, host VAXs, and link driving front ends. The software effort is detailed, with emphasis on consoles and microprocessors where the majority of effort has been placed. Future plans for the system are presented briefly. 3 refs., 2 figs., 2 tabs

  17. Accelerator control using RSX-11M and CAMAC

    International Nuclear Information System (INIS)

    Kulaga, J.E.

    1978-01-01

    This paper describes a computer-control system for a superconducting linear accelerator currently under development at Argonne National Laboratory. RSX-11M V3.1 running on a PDP 11/34 is used with CAMAC hardware to fully control 22 active beam-line elements and monitor critical accelerator conditions such as temperature, vacuum, and beam characteristics. This paper contrasts the use of an RSX compatible CAMAC driver for most CAMAC I/O operations and the use of the Connect-to-Interrupt Vector directive for fast ADC operation. The usage of table-driven software to achieve hardware configuration independence is discussed, along with the design considerations of the software interface between a human operator and a computer-control system featuring multi-function computer-readable control knobs and computer-writable displays which make up the operator's control console

  18. Accelerating Science Driven System Design With RAMP

    Energy Technology Data Exchange (ETDEWEB)

    Wawrzynek, John [Univ. of California, Berkeley, CA (United States)

    2015-05-01

    Researchers from UC Berkeley, in collaboration with the Lawrence Berkeley National Lab, are engaged in developing an Infrastructure for Synthesis with Integrated Simulation (ISIS). The ISIS Project was a cooperative effort for “application-driven hardware design” that engages application scientists in the early parts of the hardware design process for future generation supercomputing systems. This project served to foster development of computing systems that are better tuned to the application requirements of demanding scientific applications and result in more cost-effective and efficient HPC system designs. In order to overcome long conventional design-cycle times, we leveraged reconfigurable devices to aid in the design of high-efficiency systems, including conventional multi- and many-core systems. The resulting system emulation/prototyping environment, in conjunction with the appropriate intermediate abstractions, provided both a convenient user programming experience and retained flexibility, and thus efficiency, of a reconfigurable platform. We initially targeted the Berkeley RAMP system (Research Accelerator for Multiple Processors) as that hardware emulation environment to facilitate and ultimately accelerate the iterative process of science-driven system design. Our goal was to develop and demonstrate a design methodology for domain-optimized computer system architectures. The tangible outcome is a methodology and tools for rapid prototyping and design-space exploration, leading to highly optimized and efficient HPC systems.

  19. Scalable algorithms for contact problems

    CERN Document Server

    Dostál, Zdeněk; Sadowská, Marie; Vondrák, Vít

    2016-01-01

    This book presents a comprehensive and self-contained treatment of the authors’ newly developed scalable algorithms for the solutions of multibody contact problems of linear elasticity. The brand new feature of these algorithms is theoretically supported numerical scalability and parallel scalability demonstrated on problems discretized by billions of degrees of freedom. The theory supports solving multibody frictionless contact problems, contact problems with possibly orthotropic Tresca’s friction, and transient contact problems. It covers BEM discretization, jumping coefficients, floating bodies, mortar non-penetration conditions, etc. The exposition is divided into four parts, the first of which reviews appropriate facets of linear algebra, optimization, and analysis. The most important algorithms and optimality results are presented in the third part of the volume. The presentation is complete, including continuous formulation, discretization, decomposition, optimality results, and numerical experimen...

  20. Delayless acceleration measurement method for motion control applications

    Energy Technology Data Exchange (ETDEWEB)

    Vaeliviita, S.; Ovaska, S.J. [Helsinki University of Technology, Otaniemi (Finland). Institute of Intelligent Power Electronics

    1997-12-31

    Delayless and accurate sensing of angular acceleration can improve the performance of motion control in motor drives. Acceleration control is, however, seldom implemented in practical drive systems due to prohibitively high costs or unsatisfactory results of most acceleration measurement methods. In this paper we propose an efficient and accurate acceleration measurement method based on direct differentiation of the corresponding velocity signal. Polynomial predictive filtering is used to smooth the resulting noisy signal without delay. This type of prediction is justified by noticing that a low-degree polynomial can usually be fitted into the primary acceleration curve. No additional hardware is required to implement the procedure if the velocity signal is already available. The performance of the acceleration measurement method is evaluated by applying it to a demanding motion control application. (orig.) 12 refs.
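
    The idea can be sketched as follows: fit a low-degree polynomial to the most recent velocity samples and evaluate its derivative at the newest sample, so the acceleration estimate carries no filter delay. Window length, polynomial degree and the test signal below are arbitrary choices, not the authors' filter design.

      import numpy as np

      def predictive_acceleration(velocity_window, dt, degree=2):
          t = np.arange(len(velocity_window)) * dt
          coeffs = np.polyfit(t, velocity_window, degree)   # smooth the noisy velocity samples
          dcoeffs = np.polyder(coeffs)                      # derivative of the fitted polynomial
          return np.polyval(dcoeffs, t[-1])                 # acceleration at the newest sample

      dt = 1e-3
      t = np.arange(0, 0.2, dt)
      rng = np.random.default_rng(1)
      velocity = np.sin(2 * np.pi * 5 * t) + 0.01 * rng.standard_normal(t.size)
      acc = [predictive_acceleration(velocity[i - 20:i], dt) for i in range(20, t.size)]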

  1. Accelerated Degradation for Hardware in the Loop Simulation of Fuel Cell-Gas Turbine Hybrid System

    DEFF Research Database (Denmark)

    Abreu-Sepulveda, Maria A.; Harun, Nor Farida; Hackett, Gregory

    2015-01-01

    The U.S. Department of Energy (DOE)-National Energy Technology Laboratory (NETL) in Morgantown, WV has developed the hybrid performance (HyPer) project in which a solid oxide fuel cell (SOFC) one-dimensional (1D), real-time operating model is coupled to a gas turbine hardware system by utilizing...

  2. Development of a modular and scalable sensor system for the gathering of position and orientation of moved objects

    International Nuclear Information System (INIS)

    Klingbeil, L.

    2006-02-01

    A modular and scalable sensor system for the estimation of position and orientation of moving objects has been developed and characterized. A sensor unit, which is mounted to the moving object, consists of acceleration -, angular rate - and magnetic field sensors for every spatial axis. Customized Kalman filter algorithms provide a robust and low latency reconstruction of the sensor's orientation. Additionally an ultrasound transducer network is used to measure the distance of a sensor unit with respect to several reference points in the room. This allows reconstruction of the absolute position using trilateration methods. The system is scalable with respect to the number of sensor units and the covered tracking volume. It is suitable for various applications for example the analysis of body movements or head tracking in augmented or virtual reality environments. (orig.)
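
    The absolute-position part can be illustrated with a minimal least-squares trilateration: given known reference (beacon) positions and measured ultrasound ranges, the sensor position follows from a small linear system. The beacon layout and noise level below are made up for the example.

      import numpy as np

      def trilaterate(beacons, distances):
          # Linearize ||x - p_i||^2 = d_i^2 by subtracting the equation for beacon 0.
          p0, d0 = beacons[0], distances[0]
          A = 2.0 * (beacons[1:] - p0)
          b = (d0**2 - distances[1:]**2
               + np.sum(beacons[1:]**2, axis=1) - np.sum(p0**2))
          pos, *_ = np.linalg.lstsq(A, b, rcond=None)
          return pos

      beacons = np.array([[0.0, 0.0, 2.5], [4.0, 0.0, 2.5], [0.0, 3.0, 2.5], [4.0, 3.0, 2.5]])
      true_pos = np.array([1.2, 1.8, 1.0])
      rng = np.random.default_rng(2)
      ranges = np.linalg.norm(beacons - true_pos, axis=1) + 0.005 * rng.standard_normal(4)
      print(trilaterate(beacons, ranges))   # close to true_pos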

  3. Computing requirements for S.S.C. accelerator design and studies

    International Nuclear Information System (INIS)

    Dragt, A.; Talman, R.; Siemann, R.; Dell, G.F.; Leemann, B.; Leemann, C.; Nauenberg, U.; Peggs, S.; Douglas, D.

    1984-01-01

    We estimate the computational hardware resources that will be required for accelerator physics studies during the design of the Superconducting SuperCollider. It is found that both Class IV and Class VI facilities (1) will be necessary. We describe a user environment for these facilities that is desirable within the context of accelerator studies. An acquisition scenario for these facilities is presented

  4. The control computer for the Chalk River electron test accelerator

    International Nuclear Information System (INIS)

    McMichael, G.E.; Fraser, J.S.; McKeown, J.

    1978-02-01

    A versatile control and data acquisition system has been developed for a modest-sized linear accelerator using mainly process I/O hardware and software. This report describes the evolution of the present system since 1972, the modifications needed to satisfy the changing requirements of the various accelerator physics experiments and the limitations of such a system in process control. (author)

  5. Fast and Scalable Computation of the Forward and Inverse Discrete Periodic Radon Transform.

    Science.gov (United States)

    Carranza, Cesar; Llamocca, Daniel; Pattichis, Marios

    2016-01-01

    The discrete periodic Radon transform (DPRT) has extensively been used in applications that involve image reconstructions from projections. Beyond classic applications, the DPRT can also be used to compute fast convolutions that avoid the use of floating-point arithmetic associated with the use of the fast Fourier transform. Unfortunately, the use of the DPRT has been limited by the need to compute a large number of additions and the need for a large number of memory accesses. This paper introduces a fast and scalable approach for computing the forward and inverse DPRT that is based on the use of: a parallel array of fixed-point adder trees; circular shift registers to remove the need for accessing external memory components when selecting the input data for the adder trees; an image block-based approach to DPRT computation that can fit the proposed architecture to available resources; and fast transpositions that are computed in one or a few clock cycles that do not depend on the size of the input image. As a result, for an N × N image (N prime), the proposed approach can compute up to N² additions per clock cycle. Compared with previous approaches, the scalable approach provides the fastest known implementations for different amounts of computational resources. For example, for a 251 × 251 image, with approximately 25% fewer flip-flops than required for a systolic implementation, the scalable DPRT is computed 36 times faster. For the fastest case, we introduce optimized architectures that can compute the DPRT and its inverse in just 2N + ⌈log₂ N⌉ + 1 and 2N + 3⌈log₂ N⌉ + B + 2 clock cycles, respectively, where B is the number of bits used to represent each input pixel. On the other hand, the scalable DPRT approach requires more 1-bit additions than the systolic implementation, providing a tradeoff between speed and additional 1-bit additions. All of the proposed DPRT architectures were implemented in VHSIC Hardware Description Language.
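
    As a point of reference for what the hardware computes, a plain software formulation of the forward DPRT for a prime-sized image is sketched below; the indexing follows one common convention and is not the paper's architecture.

      import numpy as np

      def dprt(f):
          # f is an N x N image with N prime; the transform has N + 1 projections of length N.
          N = f.shape[0]
          R = np.zeros((N + 1, N))
          j = np.arange(N)
          for m in range(N):                    # the N "wrapped" directions
              for d in range(N):
                  R[m, d] = f[(d + m * j) % N, j].sum()
          R[N, :] = f.sum(axis=0)               # the remaining projection (column sums)
          return R

      f = np.random.default_rng(3).integers(0, 256, size=(7, 7)).astype(float)
      R = dprt(f)
      print(np.allclose(R.sum(axis=1), f.sum()))   # every projection preserves the total mass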

  6. Ice-sheet modelling accelerated by graphics cards

    Science.gov (United States)

    Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek

    2014-11-01

    Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.

  7. Exploiting first-class arrays in Fortran for accelerator programming

    International Nuclear Information System (INIS)

    Rasmussen, Craig E.; Weseloh, Wayne N.; Robey, Robert W.; Sottile, Matthew J.; Quinlan, Daniel; Overbey, Jeffrey

    2010-01-01

    Emerging architectures for high performance computing often are well suited to a data parallel programming model. This paper presents a simple programming methodology based on existing languages and compiler tools that allows programmers to take advantage of these systems. We will work with the array features of Fortran 90 to show how this infrequently exploited, standardized language feature is easily transformed to lower level accelerator code. Our transformations are based on a mapping from Fortran 90 to C++ code with OpenCL extensions. The sheer complexity of programming for clusters of many or multi-core processors with tens of millions of threads of execution makes the simplicity of the data parallel model attractive. Furthermore, the increasing complexity of today's applications (especially when convolved with the increasing complexity of the hardware) and the need for portability across hardware architectures make a higher-level and simpler programming model like data parallel attractive. The goal of this work has been to exploit source-to-source transformations that allow programmers to develop and maintain programs at a high level of abstraction, without coding to a specific hardware architecture. Furthermore, these transformations allow multiple hardware architectures to be targeted without changing the high-level source. It also removes the necessity for application programmers to understand details of the accelerator architecture or to know OpenCL.

  8. Quick setup of test unit for accelerator control system

    International Nuclear Information System (INIS)

    Fu, W.; D'Ottavio, T.; Gassner, D.; Nemesure, S.; Morris, J.

    2011-01-01

    Testing a single hardware unit of an accelerator control system often requires the setup of a program with a graphical user interface. Developing a dedicated application for a specific hardware unit test could be time consuming, and the application may become obsolete after the unit tests. This paper documents a methodology for quick design and setup of an interface focused on performing unit tests of accelerator equipment with minimum programming work. The method has three components. The first is a generic accelerator device object (ADO) manager which can be used to set up, store, and log testing control parameters for any unit testing system. The second involves the design of a TAPE (Tool for Automated Procedure Execution) sequence file that specifies and implements all the testing and control logic. The third is the design of a PET (parameter editing tool) page that provides the unit tester with all the necessary control parameters required for testing. This approach has been used for testing the horizontal plane of the Stochastic Cooling Motion Control System at RHIC.

  9. New Complexity Scalable MPEG Encoding Techniques for Mobile Applications

    Directory of Open Access Journals (Sweden)

    Stephan Mietens

    2004-03-01

    Full Text Available Complexity scalability offers the advantage of one-time design of video applications for a large product family, including mobile devices, without the need to redesign the applications on the algorithmic level to meet the requirements of the different products. In this paper, we present complexity scalable MPEG encoding having core modules with modifications for scalability. The interdependencies of the scalable modules and the system performance are evaluated. Experimental results show scalability giving a smooth change in complexity and corresponding video quality. Scalability is basically achieved by varying the number of computed DCT coefficients and the number of evaluated motion vectors, while the other modules are designed such that they scale with these parameters. In the experiments using the “Stefan” sequence, the elapsed execution time of the scalable encoder, reflecting the computational complexity, can be gradually reduced to roughly 50% of its original execution time. The video quality scales between 20 dB and 48 dB PSNR with unity quantizer setting, and between 21.5 dB and 38.5 dB PSNR for different sequences targeting 1500 kbps. The implemented encoder and the scalability techniques can be successfully applied in mobile systems based on MPEG video compression.
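
    As a toy illustration of the scaling principle (not the encoder implementation described above), the sketch below keeps only the first k zig-zag DCT coefficients of an 8x8 block and shows how reconstruction quality changes with k; a real scalable encoder would avoid computing the discarded coefficients in the first place.

      import numpy as np
      from scipy.fftpack import dct, idct

      def dct2(block):
          return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

      def idct2(coeffs):
          return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

      def zigzag(n=8):
          # JPEG-style zig-zag ordering of the (i, j) coefficient indices.
          return sorted(((i, j) for i in range(n) for j in range(n)),
                        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))

      def scalable_dct(block, k):
          full = dct2(block)
          kept = np.zeros_like(full)
          for i, j in zigzag()[:k]:             # keep only the k lowest-frequency coefficients
              kept[i, j] = full[i, j]
          return kept

      block = np.random.default_rng(4).uniform(0, 255, (8, 8))
      for k in (4, 16, 64):
          err = np.abs(block - idct2(scalable_dct(block, k))).mean()
          print(k, round(float(err), 2))        # error shrinks as more coefficients are kept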

  10. Performance evaluation of OpenFOAM on many-core architectures

    International Nuclear Information System (INIS)

    Brzobohatý, Tomáš; Říha, Lubomír; Karásek, Tomáš; Kozubek, Tomáš

    2015-01-01

    In this article, the application of Open Source Field Operation and Manipulation (OpenFOAM) C++ libraries to solving engineering problems on many-core architectures is presented. The objective of this article is to present the scalability of OpenFOAM on parallel platforms when solving real engineering problems of fluid dynamics. Scalability tests of OpenFOAM are performed using various hardware and different implementations of the standard PCG and PBiCG Krylov iterative methods. Speed-ups of various implementations of the linear solvers using GPU and MIC accelerators are presented in this paper. Numerical experiments on 3D lid-driven cavity flow for several cases with various numbers of cells are presented.
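
    For readers unfamiliar with the solvers named above, a generic Jacobi-preconditioned conjugate gradient (PCG) iteration is sketched below; this is the textbook algorithm, not OpenFOAM's implementation, and the test system is an arbitrary symmetric positive definite matrix.

      import numpy as np

      def pcg(A, b, tol=1e-8, max_iter=1000):
          x = np.zeros_like(b)
          r = b - A @ x
          Minv = 1.0 / np.diag(A)               # Jacobi (diagonal) preconditioner
          z = Minv * r
          p = z.copy()
          rz = r @ z
          for _ in range(max_iter):
              Ap = A @ p
              alpha = rz / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              if np.linalg.norm(r) < tol:
                  break
              z = Minv * r
              rz_new = r @ z
              p = z + (rz_new / rz) * p
              rz = rz_new
          return x

      rng = np.random.default_rng(5)
      n = 50
      B = rng.standard_normal((n, n))
      A = B @ B.T + n * np.eye(n)               # symmetric positive definite test matrix
      b = rng.standard_normal(n)
      print(np.linalg.norm(A @ pcg(A, b) - b))  # residual close to zero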

  11. Performance evaluation of OpenFOAM on many-core architectures

    Energy Technology Data Exchange (ETDEWEB)

    Brzobohatý, Tomáš; Říha, Lubomír; Karásek, Tomáš, E-mail: tomas.karasek@vsb.cz; Kozubek, Tomáš [IT4Innovations National Supercomputing Center, VŠB-Technical University of Ostrava (Czech Republic)

    2015-03-10

    In this article, the application of Open Source Field Operation and Manipulation (OpenFOAM) C++ libraries to solving engineering problems on many-core architectures is presented. The objective of this article is to present the scalability of OpenFOAM on parallel platforms when solving real engineering problems of fluid dynamics. Scalability tests of OpenFOAM are performed using various hardware and different implementations of the standard PCG and PBiCG Krylov iterative methods. Speed-ups of various implementations of the linear solvers using GPU and MIC accelerators are presented in this paper. Numerical experiments on 3D lid-driven cavity flow for several cases with various numbers of cells are presented.

  12. A Scalable Data Taking System at a Test Beam for LHC

    CERN Multimedia

    2002-01-01

    RD-13: A Scalable Data Taking System at a Test Beam for LHC. We have installed a test beam read-out facility for the simultaneous test of LHC detectors, trigger and read-out electronics, together with the development of the supporting architecture in a multiprocessor environment. The aim of the project is to build a system which incorporates all the functionality of a complete read-out chain. Emphasis is put on a highly modular design, such that new hardware and software developments can be conveniently introduced. Exploiting this modularity, the set-up will evolve driven by progress in technologies and new software developments. One of the main thrusts of the project is modelling and integration of different read-out architectures to provide a valuable training ground for new techniques. To address these aspects in a realistic manner, we collaborate with detector R&D projects in order to test higher level trigger systems, event building and high rate data transfers, once the techniques involve...

  13. Steps Towards Scalable and Modularized Flight Software for Unmanned Aircraft Systems

    Directory of Open Access Journals (Sweden)

    Johann C. Dauer

    2014-05-01

    Full Text Available Unmanned aircraft (UA) applications impose a variety of computing tasks on the on-board computer system. From a research perspective, it is often more convenient to evaluate algorithms on bigger aircraft, as they are capable of lifting heavier loads and thus more powerful computational units. On the other hand, smaller systems are often less expensive, and their operation is less restricted in many countries. This paper thus presents a conceptual design for flight software that can be evaluated on UA of a convenient size. The integration effort required to transfer the algorithm to differently sized UA is significantly reduced. This scalability is achieved by using exchangeable payload modules and a flexible process distribution on different processing units. The presented approach is discussed using the example of the flight software of a 14 kg unmanned helicopter and of a 1.5 kg equivalent. The proof of concept is shown by means of flight performance in a hardware-in-the-loop simulation.

  14. Case study: Accelerated schedule for MULTI LIMS installation

    International Nuclear Information System (INIS)

    Ibsen, T.G.

    1994-05-01

    This presentation focuses on the steps taken by the Westinghouse Hanford Company to meet an accelerated schedule for configuration and implementation of the MULTI LIMS in a multiple laboratory environment. The Westinghouse Hanford Company purchased the MULTI LIMS Laboratory Information Management System in August, 1993. Hardware delivery began in October, 1993. Less than four months later, the initial configuration was released for use in two Westinghouse Hanford Company laboratories. Several major obstacles were overcome during implementation. These include information gathering for base table loading, user training, acceptance of the new system by users of a legacy system, and hardware configuration issues. In summary, steps needed to be taken to meet the accelerated implementation schedule of the MULTI LIMS at the Hanford Site. The obstacles faced were overcome through the in-depth knowledge and help of the vendor and the dedication and drive of the technical staff

  15. Foundations of hardware IP protection

    CERN Document Server

    Torres, Lionel

    2017-01-01

    This book provides a comprehensive and up-to-date guide to the design of security-hardened, hardware intellectual property (IP). Readers will learn how IP can be threatened, as well as protected, by using means such as hardware obfuscation/camouflaging, watermarking, fingerprinting (PUF), functional locking, remote activation, hidden transmission of data, hardware Trojan detection, protection against hardware Trojan, use of secure element, ultra-lightweight cryptography, and digital rights management. This book serves as a single-source reference to design space exploration of hardware security and IP protection. · Provides readers with a comprehensive overview of hardware intellectual property (IP) security, describing threat models and presenting means of protection, from integrated circuit layout to digital rights management of IP; · Enables readers to transpose techniques fundamental to digital rights management (DRM) to the realm of hardware IP security; · Introduce designers to the concept of salutar...

  16. The continuous electron beam accelerator facility

    International Nuclear Information System (INIS)

    Grunder, H.A.

    1989-01-01

    Tunnel construction and accelerator component development, assembly, and testing are under way at the Continuous Electron Beam Accelerator Facility. CEBAF's 4-GeV, 200-μA superconducting recirculating accelerator will provide cw beam to simultaneous experiments in three end stations for studies of the nuclear many-body system, its quark substructure, and the strong and electroweak interactions governing this form of matter. Prototype accelerating cavities, assembled in cryostats and tested on site, continue to exceed performance specifications. An on-site liquid helium capability supports cryostat development and cavity testing. Major elements of the accelerator instrumentation and control hardware and software are in use in cryogenics, rf, and injector tests. Prototype rf systems have been operated and prototype klystrons have been ordered. The initial, 100-keV, room-temperature region of the 45-MeV injector is operational and meets specifications. CEBAF's end stations have been conceptually designed; experimental equipment conceptual designs will be completed in 1989. 2 refs., 5 figs., 2 tabs

  17. Principle of accelerator mass spectrometry

    International Nuclear Information System (INIS)

    Matsuzaki, Hiroyuki

    2007-01-01

    The principle of accelerator mass spectrometry (AMS) is described, mainly with regard to its technical aspects: the hardware construction of AMS, the measurement of isotope ratios, the sensitivity of measurement (measuring limit), measuring accuracy, and the application of data. The content may be summarized as follows: a rare isotope (often a long-lived radioactive isotope) can be detected through various uses of the ion energy obtained by accelerating the ions; the measurable quantity is the ratio of the rare isotope to the abundant isotopes; and a measured isotope ratio carries uncertainty with respect to the true value. These facts must be kept in mind when applying AMS data to research. (M.H.)

  18. Personal computer control system for small size tandem accelerator

    Energy Technology Data Exchange (ETDEWEB)

    Takayama, Hiroshi; Kawano, Kazuhiro; Shinozaki, Masataka [Nissin - High Voltage Co. Ltd., Kyoto (Japan)

    1996-12-01

    Because an analysis apparatus based on a tandem accelerator has many control parameters, the number of control parts on the control panel becomes so large that the panel grows complex and its operability suffers. To remedy these faults, the design and development of a control system using a personal computer were undertaken for a control panel previously built mainly from conventional hardware parts. Its main characteristics are as follows: (1) the control panel becomes simpler and more compact, because using a personal computer as the man-machine interface reduces the hardware on the panel surface to the minimum required; (2) control becomes faster, because the accelerator system is divided into blocks, a local station of the sequencer network is installed at each block, and sequence control is closed within each block; (3) expandability is greater, because a new beamline can be added with little change to the present hardware, simply by inserting a sequencer local station into the network and updating the computer's configuration; and (4) the control system is cheaper, because using a personal computer lowers the investment and simplifies programming. (G.K.)

  19. Open hardware for open science

    CERN Multimedia

    CERN Bulletin

    2011-01-01

    Inspired by the open source software movement, the Open Hardware Repository was created to enable hardware developers to share the results of their R&D activities. The recently published CERN Open Hardware Licence offers the legal framework to support this knowledge and technology exchange.   Two years ago, a group of electronics designers led by Javier Serrano, a CERN engineer, working in experimental physics laboratories created the Open Hardware Repository (OHR). This project was initiated in order to facilitate the exchange of hardware designs across the community in line with the ideals of “open science”. The main objectives include avoiding duplication of effort by sharing results across different teams that might be working on the same need. “For hardware developers, the advantages of open hardware are numerous. For example, it is a great learning tool for technologies some developers would not otherwise master, and it avoids unnecessary work if someone ha...

  20. Evolution of control systems for accelerators

    International Nuclear Information System (INIS)

    Crowley-Milling, M.C.

    1983-01-01

    The author reviews the development of control systems for accelerators. After an historical survey and a general introduction, the hardware and software of such systems are described. As an example, the control system of the CERN SPS is considered. Finally, an outlook is given on future developments, with special regard to the LEP storage ring. (HSI)

  1. Superconductivity and future accelerators

    International Nuclear Information System (INIS)

    Danby, G.T.; Jackson, J.W.

    1963-01-01

    For 50 years particle accelerators employing accelerating cavities and deflecting magnets have been developed at a prodigious rate. New accelerator concepts and hardware ensembles have yielded great improvements in performance and GeV/$. The great idea for collective acceleration resulting from intense auxiliary charged-particle beams or laser light may or may not be just around the corner. In its absence, superconductivity (SC) applied both to rf cavities and to magnets opened up the potential for very large accelerators without excessive energy consumption and with other economies, even with the cw operation desirable for colliding beams. HEP has aggressively pioneered this new technology: the Fermilab single-ring 1 TeV accelerator - 2 TeV collider is near the testing stage. Brookhaven National Laboratory's high-luminosity two-ring 800 GeV pp CBA collider is well into construction. Other types of superconducting projects are in the planning stage with much background R and D accomplished. The next generation of hadron colliders under discussion involves perhaps a 20 TeV ring (or rings) with 40 TeV CM energy. This is a very large machine: even if the highest practical field B approx. 10 T is used, the radius is 10x that of the Fermilab accelerator. An extreme effort to get maximum GeV/$ may be crucial even for serious consideration of funding.

  2. IPbus: A flexible Ethernet-based control system for xTCA hardware

    CERN Document Server

    Williams, Thomas Stephen

    2014-01-01

    The ATCA and uTCA standards include industry-standard data pathway technologies such as Gigabit Ethernet which can be used for control communication, but no specific hardware control protocol is defined. The IPbus suite of software and firmware implements a reliable high-performance control link for particle physics electronics, and has successfully replaced VME control in several large projects. In this paper, we outline the IPbus system architecture, and describe recent developments in the reliability, scalability and performance of IPbus systems, carried out in preparation for deployment of uTCA-based CMS upgrades before the LHC 2015 run. We also discuss plans for future development of the IPbus suite. SUMMARY: IPbus will be used for controlling the uTCA electronics in the CMS HCAL, TCDS, Pixel and Level-1 trigger upgrades. IPbus control has already been extensively used in the work of these upgrade projects so far, and final uTCA systems will be deployed in the experiment starting from Autumn 2014. IPbus is...

  3. Open Hardware Business Models

    OpenAIRE

    Edy Ferreira

    2008-01-01

    In the September issue of the Open Source Business Resource, Patrick McNamara, president of the Open Hardware Foundation, gave a comprehensive introduction to the concept of open hardware, including some insights about the potential benefits for both companies and users. In this article, we present the topic from a different perspective, providing a classification of market offers from companies that are making money with open hardware.

  4. Real-Time Fabric Defect Detection Using Accelerated Small-Scale Over-Completed Dictionary of Sparse Coding

    Directory of Open Access Journals (Sweden)

    Tianpeng Feng

    2016-01-01

    Full Text Available An automatic fabric defect detection system via computer vision is used to replace manual inspection. In this paper, we propose a hardware-accelerated algorithm based on a small-scale over-completed dictionary (SSOCD) obtained via the sparse coding (SC) method, which is realized on a parallel hardware platform (TMS320C6678). In order to reduce computation, the projections of image patches onto the trained SSOCD are taken as features; the proposed features are more robust and exhibit obvious advantages in detection results and computational cost. Furthermore, we introduce the detection ratio and false ratio in order to measure the performance and reliability of the hardware-accelerated algorithm. The experiments show that the proposed algorithm can run with high parallel efficiency and that the detection speed meets the real-time requirements of industrial inspection.
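
    A much-simplified sketch of the underlying idea is given below: patches are sparsely encoded against an over-complete dictionary with orthogonal matching pursuit, and a patch whose residual remains large is flagged as a potential defect. The random dictionary and the sparsity level are placeholders, not the trained SSOCD from the paper.

      import numpy as np

      def omp(D, y, k):
          # Orthogonal matching pursuit: D has unit-norm columns, y is the signal, k the sparsity.
          residual, idx, coef = y.copy(), [], None
          for _ in range(k):
              idx.append(int(np.argmax(np.abs(D.T @ residual))))
              coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
              residual = y - D[:, idx] @ coef
          return idx, coef, residual

      rng = np.random.default_rng(6)
      dim, n_atoms = 64, 256                    # 8x8 patches, over-complete dictionary
      D = rng.standard_normal((dim, n_atoms))
      D /= np.linalg.norm(D, axis=0)
      patch = rng.standard_normal(dim)
      _, _, r = omp(D, patch, k=8)
      defect_score = float(np.linalg.norm(r))   # large residual -> patch poorly explained
      print(defect_score)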

  5. Scalable High-Performance Parallel Design for Network Intrusion Detection Systems on Many-Core Processors

    OpenAIRE

    Jiang, Hayang; Xie, Gaogang; Salamatian, Kavé; Mathy, Laurent

    2013-01-01

    Network Intrusion Detection Systems (NIDSes) face significant challenges coming from the relentless network link speed growth and increasing complexity of threats. Both hardware accelerated and parallel software-based NIDS solutions, based on commodity multi-core and GPU processors, have been proposed to overcome these challenges.

  6. Concurrent heterogeneous neural model simulation on real-time neuromimetic hardware.

    Science.gov (United States)

    Rast, Alexander; Galluppi, Francesco; Davies, Sergio; Plana, Luis; Patterson, Cameron; Sharp, Thomas; Lester, David; Furber, Steve

    2011-11-01

    Dedicated hardware is becoming increasingly essential to simulate emerging very-large-scale neural models. Equally, however, it needs to be able to support multiple models of the neural dynamics, possibly operating simultaneously within the same system. This may be necessary either to simulate large models with heterogeneous neural types, or to simplify simulation and analysis of detailed, complex models in a large simulation by isolating the new model to a small subpopulation of a larger overall network. The SpiNNaker neuromimetic chip is a dedicated neural processor able to support such heterogeneous simulations. Implementing these models on-chip uses an integrated library-based tool chain incorporating the emerging PyNN interface that allows a modeller to input a high-level description and use an automated process to generate an on-chip simulation. Simulations using both LIF and Izhikevich models demonstrate the ability of the SpiNNaker system to generate and simulate heterogeneous networks on-chip, while illustrating, through the network-scale effects of wavefront synchronisation and burst gating, methods that can provide effective behavioural abstractions for large-scale hardware modelling. SpiNNaker's asynchronous virtual architecture permits greater scope for model exploration, with scalable levels of functional and temporal abstraction, than conventional (or neuromorphic) computing platforms. The complete system illustrates a potential path to understanding the neural model of computation, by building (and breaking) neural models at various scales, connecting the blocks, then comparing them against the biology: computational cognitive neuroscience. Copyright © 2011 Elsevier Ltd. All rights reserved.
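
    As a concrete reference for one of the neuron models mentioned, a minimal leaky integrate-and-fire (LIF) update loop is sketched below with generic textbook parameters; it is not SpiNNaker or PyNN code.

      import numpy as np

      def simulate_lif(I, dt=1e-4, tau=0.02, v_rest=-0.065, v_reset=-0.065,
                       v_thresh=-0.050, R=1e7):
          v, spikes, trace = v_rest, [], np.empty(I.size)
          for t, i_in in enumerate(I):
              # Membrane potential decays toward rest while integrating the input current.
              v += dt / tau * (v_rest - v + R * i_in)
              if v >= v_thresh:                 # threshold crossing emits a spike
                  spikes.append(t * dt)
                  v = v_reset
              trace[t] = v
          return np.array(spikes), trace

      I = np.full(5000, 2.0e-9)                 # constant 2 nA input for 0.5 s of simulated time
      spike_times, vm = simulate_lif(I)
      print(len(spike_times), "spikes")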

  7. Workshop Engages PCs in Accelerator Controls

    International Nuclear Information System (INIS)

    Matthew Bickley

    2006-01-01

    To discuss the rapidly growing and changing use of personal computers (PCs) in accelerator control systems, 80 accelerator controls specialists from 26 institutions in North America, Europe and Asia attended the 6. International Workshop on Personal Computers and Particle Accelerator Controls, PCaPAC2006, held October 24-27 at Jefferson Lab in Newport News, Virginia. PCs have become increasingly applicable to the control of accelerators as their computing capacities have increased exponentially over the last 10 years. Capabilities that once required the power available only from expensive, small-market systems offered by DEC, Sun or IBM can now be obtained with commodity hardware offered by many vendors. The price/performance ratio presented by any standard PC makes a compelling case for using PC hardware in accelerator controls wherever possible. The PCaPAC meeting underscored the importance of collaborative control system development. Several talks focused on additions to three such systems, TINE, TANGO and EPICS. The diverse contributions to these toolkits, both in content and source, demonstrate the power of leveraged software development across a number of facilities. TINE originated in DESY's desire to give users a unified software bus above disparate underlying platforms. TINE discussions at PCaPAC centered on the toolkit's interface layers, including address redirection and integration with other control systems. TANGO has been a collaborative effort from its inception. Based on CORBA, this open-source controls toolkit is a registered project in the source forge system. The workshop TANGO presentation discussed contributions from four TANGO institutions, and mentioned a broad range of new tools, from user interface applications to code generators and database integration software. EPICS, which was started at LANL in the 1980s, includes contributions from dozens of institutions around the world. EPICS-related PCaPAC discussions included virtual machines at

  8. Open Hardware Business Models

    Directory of Open Access Journals (Sweden)

    Edy Ferreira

    2008-04-01

    Full Text Available In the September issue of the Open Source Business Resource, Patrick McNamara, president of the Open Hardware Foundation, gave a comprehensive introduction to the concept of open hardware, including some insights about the potential benefits for both companies and users. In this article, we present the topic from a different perspective, providing a classification of market offers from companies that are making money with open hardware.

  9. HISTRAP [Heavy Ion Storage Ring for Atomic Physics] prototype hardware studies

    International Nuclear Information System (INIS)

    Olsen, D.K.; Atkins, W.H.; Dowling, D.T.; Johnson, J.W.; Lord, R.S.; McConnell, J.W.; Milner, W.T.; Mosko, S.W.; Tatum, B.A.

    1989-01-01

    HISTRAP, Heavy Ion Storage Ring for Atomic Physics, is a proposed 2.67-Tm synchrotron/cooler/storage ring optimized for advanced atomic physics research which will be injected with ions from either the HHIRF 25-MV tandem accelerator or a dedicated ECR source and RFQ linac. Over the last two years, hardware prototypes have been developed for difficult and long lead-time components. A vacuum test stand, the rf cavity, and a prototype dipole magnet have been designed, constructed, and tested. 7 refs., 8 figs., 2 tabs

  10. Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

    Science.gov (United States)

    Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

    2015-01-01

    Technological advances of Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings, lead to an ever increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
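
    The kind of per-channel work being parallelised can be sketched as below: each channel is band-pass filtered and thresholded for spikes, with a process pool distributing channels across CPU cores. The sampling rate, filter band and threshold rule are common choices, not those of the tool described above.

      import numpy as np
      from multiprocessing import Pool
      from scipy.signal import butter, filtfilt

      FS = 20000.0                              # assumed sampling rate in Hz
      B, A = butter(3, [300 / (FS / 2), 3000 / (FS / 2)], btype="band")

      def detect_spikes(channel):
          filtered = filtfilt(B, A, channel)
          thr = -4.5 * np.median(np.abs(filtered)) / 0.6745   # robust noise-based threshold
          return np.flatnonzero((filtered[1:] < thr) & (filtered[:-1] >= thr))

      if __name__ == "__main__":
          data = np.random.default_rng(7).standard_normal((64, 200000))   # 64 channels of noise
          with Pool() as pool:
              spike_indices = pool.map(detect_spikes, list(data))
          print(sum(len(s) for s in spike_indices), "threshold crossings")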

  11. Sindbad: a multi-purpose and scalable X-ray simulation tool for NDE and medical imaging

    Energy Technology Data Exchange (ETDEWEB)

    Guillemaud, R.; Tabary, J.; Hugonnard, P.; Mathy, F.; Koenig, A.; Gliere, A

    2003-07-01

    Within a unified framework, Sindbad is a multipurpose X-ray simulation software package which provides a scalable approach to computation and very efficient results by combining analytical and Monte Carlo simulations. The software has been validated experimentally. It is also easy to use, with a strong emphasis on a user-friendly GUI, simple object description (CAD or volume) and visualization tools. The next developments will focus on accelerating the Monte Carlo simulation for scatter-fraction computation and on adding new detector types. (N.C.)

  12. The VINEYARD project: Versatile Integrated Accelerator-based Heterogeneous Data Centers

    OpenAIRE

    Kachris, Christoforos; Soudris, Dimitrios; Gaydadjiev, Georgi; Nguyen, Huy-Nam

    2016-01-01

    Emerging applications like cloud computing and big data analytics have created the need for powerful data centers hosting hundreds of thousands of servers. Currently, these data centers are based on general purpose processors that provide high flexibility but lack the energy efficiency of customized accelerators. VINEYARD aims to develop novel servers based on programmable hardware accelerators. Furthermore, VINEYARD will develop an integrated framework for allowing end-users to seamlessly utilize...

  13. CMFD and GPU acceleration on method of characteristics for hexagonal cores

    International Nuclear Information System (INIS)

    Han, Yu; Jiang, Xiaofeng; Wang, Dezhong

    2014-01-01

    Highlights: • A merged hex-mesh CMFD method solved via tri-diagonal matrix inversion. • Alternative hardware acceleration of using inexpensive GPU. • A hex-core benchmark with solution to confirm two acceleration methods. - Abstract: Coarse Mesh Finite Difference (CMFD) has been widely adopted as an effective way to accelerate the source iteration of transport calculation. However in a core with hexagonal assemblies there are non-hexagonal meshes around the edges of assemblies, causing a problem for CMFD if the CMFD equations are still to be solved via tri-diagonal matrix inversion by simply scanning the whole core meshes in different directions. To solve this problem, we propose an unequal mesh CMFD formulation that combines the non-hexagonal cells on the boundary of neighboring assemblies into non-regular hexagonal cells. We also investigated the alternative hardware acceleration of using graphics processing units (GPU) with graphics card in a personal computer. The tool CUDA is employed, which is a parallel computing platform and programming model invented by the company NVIDIA for harnessing the power of GPU. To investigate and implement these two acceleration methods, a 2-D hexagonal core transport code using the method of characteristics (MOC) is developed. A hexagonal mini-core benchmark problem is established to confirm the accuracy of the MOC code and to assess the effectiveness of CMFD and GPU parallel acceleration. For this benchmark problem, the CMFD acceleration increases the speed 16 times while the GPU acceleration speeds it up 25 times. When used simultaneously, they provide a speed gain of 292 times
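
    The tri-diagonal inversion referred to above is, in its generic form, the Thomas algorithm; a reference implementation is sketched below purely for illustration and is not the code used in the paper.

      import numpy as np

      def thomas(a, b, c, d):
          # Solve a tri-diagonal system with sub-diagonal a, diagonal b, super-diagonal c
          # and right-hand side d (a[0] and c[-1] are unused).
          n = len(d)
          cp, dp = np.empty(n), np.empty(n)
          cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
          for i in range(1, n):                 # forward elimination
              m = b[i] - a[i] * cp[i - 1]
              cp[i] = c[i] / m
              dp[i] = (d[i] - a[i] * dp[i - 1]) / m
          x = np.empty(n)
          x[-1] = dp[-1]
          for i in range(n - 2, -1, -1):        # back substitution
              x[i] = dp[i] - cp[i] * x[i + 1]
          return x

      n = 100
      a, b, c, d = -np.ones(n), 2.0 * np.ones(n), -np.ones(n), np.ones(n)
      x = thomas(a, b, c, d)
      A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
      print(np.allclose(A @ x, d))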

  14. CMFD and GPU acceleration on method of characteristics for hexagonal cores

    Energy Technology Data Exchange (ETDEWEB)

    Han, Yu, E-mail: hanyu1203@gmail.com [School of Nuclear Science and Engineering, Shanghai Jiaotong University, Shanghai 200240 (China); Jiang, Xiaofeng [Shanghai NuStar Nuclear Power Technology Co., Ltd., No. 81 South Qinzhou Road, XuJiaHui District, Shanghai 200000 (China); Wang, Dezhong [School of Nuclear Science and Engineering, Shanghai Jiaotong University, Shanghai 200240 (China)

    2014-12-15

    Highlights: • A merged hex-mesh CMFD method solved via tri-diagonal matrix inversion. • Alternative hardware acceleration of using inexpensive GPU. • A hex-core benchmark with solution to confirm two acceleration methods. - Abstract: Coarse Mesh Finite Difference (CMFD) has been widely adopted as an effective way to accelerate the source iteration of transport calculation. However in a core with hexagonal assemblies there are non-hexagonal meshes around the edges of assemblies, causing a problem for CMFD if the CMFD equations are still to be solved via tri-diagonal matrix inversion by simply scanning the whole core meshes in different directions. To solve this problem, we propose an unequal mesh CMFD formulation that combines the non-hexagonal cells on the boundary of neighboring assemblies into non-regular hexagonal cells. We also investigated the alternative hardware acceleration of using graphics processing units (GPU) with graphics card in a personal computer. The tool CUDA is employed, which is a parallel computing platform and programming model invented by the company NVIDIA for harnessing the power of GPU. To investigate and implement these two acceleration methods, a 2-D hexagonal core transport code using the method of characteristics (MOC) is developed. A hexagonal mini-core benchmark problem is established to confirm the accuracy of the MOC code and to assess the effectiveness of CMFD and GPU parallel acceleration. For this benchmark problem, the CMFD acceleration increases the speed 16 times while the GPU acceleration speeds it up 25 times. When used simultaneously, they provide a speed gain of 292 times.

  15. Computer automation of an accelerator mass spectrometry system

    International Nuclear Information System (INIS)

    Gressett, J.D.; Maxson, D.L.; Matteson, S.; McDaniel, F.D.; Duggan, J.L.; Mackey, H.J.; North Texas State Univ., Denton, TX; Anthony, J.M.

    1989-01-01

    The determination of trace impurities in electronic materials using accelerator mass spectrometry (AMS) requires efficient automation of the beam transport and mass discrimination hardware. The ability to choose between a variety of charge states, isotopes and injected molecules is necessary to provide survey capabilities similar to that available on conventional mass spectrometers. This paper will discuss automation hardware and software for flexible, high-sensitivity trace analysis of electronic materials, e.g. Si, GaAs and HgCdTe. Details regarding settling times will be presented, along with proof-of-principle experimental data. Potential and present applications will also be discussed. (orig.)

  16. Database characterisation of HEP applications

    International Nuclear Information System (INIS)

    Piorkowski, Mariusz; Grancher, Eric; Topurov, Anton

    2012-01-01

    Oracle-based database applications underpin many key aspects of operations for both the LHC accelerator and the LHC experiments. In addition to the overall performance, the predictability of the response is a key requirement to ensure smooth operations, and delivering predictability requires understanding the applications from the ground up. Fortunately, database management systems provide several tools to check, measure, analyse and gather useful information. We present our experience characterising the performance of several typical HEP database applications, characterisations that were used to deliver improved predictability and scalability as well as to optimise the hardware platform choice as we migrated to new hardware and Oracle 11g.

  17. Open Hardware at CERN

    CERN Multimedia

    CERN Knowledge Transfer Group

    2015-01-01

    CERN is actively making its knowledge and technology available for the benefit of society and does so through a variety of different mechanisms. Open hardware has in recent years established itself as a very effective way for CERN to make electronics designs and in particular printed circuit board layouts, accessible to anyone, while also facilitating collaboration and design re-use. It is creating an impact on many levels, from companies producing and selling products based on hardware designed at CERN, to new projects being released under the CERN Open Hardware Licence. Today the open hardware community includes large research institutes, universities, individual enthusiasts and companies. Many of the companies are actively involved in the entire process from design to production, delivering services and consultancy and even making their own products available under open licences.

  18. A hardware overview of the RHIC LLRF platform

    International Nuclear Information System (INIS)

    Hayes, T.; Smith, K.S.

    2011-01-01

    The RHIC Low Level RF (LLRF) platform is a flexible, modular system designed around a carrier board with six XMC daughter sites. The carrier board features a Xilinx FPGA with an embedded, hard core Power PC that is remotely reconfigurable. It serves as a front end computer (FEC) that interfaces with the RHIC control system. The carrier provides high speed serial data paths to each daughter site and between daughter sites as well as four generic external fiber optic links. It also distributes low noise clocks and serial data links to all daughter sites and monitors temperature, voltage and current. To date, two XMC cards have been designed: a four channel high speed ADC and a four channel high speed DAC. The new LLRF hardware was used to replace the old RHIC LLRF system for the 2009 run. For the 2010 run, the RHIC RF system operation was dramatically changed with the introduction of accelerating both beams in a new, common cavity instead of each ring having independent cavities. The flexibility of the new system was beneficial in allowing the low level system to be adapted to support this new configuration. This hardware was also used in 2009 to provide LLRF for the newly commissioned Electron Beam Ion Source.

  19. Modelling of control system architecture for next-generation accelerators

    International Nuclear Information System (INIS)

    Liu, Shi-Yao; Kurokawa, Shin-ichi

    1990-01-01

    Functional, hardware and software system architectures define the fundamental structure of control systems. Modelling is a protocol of system architecture used in system design. This paper reviews the various modelling approaches adopted in the past ten years and suggests a new modelling approach for next-generation accelerators. (author)

  20. A computer control system for the PNC high power cw electron linac. Concept and hardware

    Energy Technology Data Exchange (ETDEWEB)

    Emoto, T.; Hirano, K.; Takei, Hayanori; Nomura, Masahiro; Tani, S. [Power Reactor and Nuclear Fuel Development Corp., Oarai, Ibaraki (Japan). Oarai Engineering Center; Kato, Y.; Ishikawa, Y.

    1998-06-01

    Design and construction of a high power cw (Continuous Wave) electron linac for studying the feasibility of nuclear waste transmutation was started in 1989 at PNC. The PNC accelerator (10 MeV, 20 mA average current, 4 ms pulse width, 50 Hz repetition) is a dedicated machine for developing the high-current acceleration technology that future applications will need. The computer control system is responsible for accelerator control and for supporting experiments during high power operation. The features of the system are simultaneous measurement of accelerator status and modularity of software and hardware, so that modifications or expansions are easily implemented. A high speed network (SCRAMNet, approximately 15 MB/s), Ethernet, and front end processors (Digital Signal Processors) were employed for high speed data taking and control. The system was designed around standard modules and a software-implemented man-machine interface. Thanks to the graphical user interface and object-oriented programming, programming and maintenance in the software development environment require little effort. (author)

  1. The Concept of Business Model Scalability

    DEFF Research Database (Denmark)

    Lund, Morten; Nielsen, Christian

    2018-01-01

    Purpose: The purpose of the article is to define what scalable business models are. Central to the contemporary understanding of business models is the value proposition towards the customer and the hypotheses generated about delivering value to the customer, which become a good foundation for a long-term profitable business. However, the main message of this article is that while providing a good value proposition may help the firm ‘get by’, the really successful businesses of today are those able to reach the sweet-spot of business model scalability. Design/Methodology/Approach: The article is based on a five-year longitudinal action research project of over 90 companies that participated in the International Center for Innovation project aimed at building 10 global network-based business models. Findings: This article introduces and discusses the term scalability from a company-level perspective.

  2. Oracle database performance and scalability a quantitative approach

    CERN Document Server

    Liu, Henry H

    2011-01-01

    A data-driven, fact-based, quantitative text on Oracle performance and scalability With database concepts and theories clearly explained in Oracle's context, readers quickly learn how to fully leverage Oracle's performance and scalability capabilities at every stage of designing and developing an Oracle-based enterprise application. The book is based on the author's more than ten years of experience working with Oracle, and is filled with dependable, tested, and proven performance optimization techniques. Oracle Database Performance and Scalability is divided into four parts that enable reader

  3. PKI Scalability Issues

    OpenAIRE

    Slagell, Adam J; Bonilla, Rafael

    2004-01-01

    This report surveys different PKI technologies such as PKIX and SPKI and the issues of PKI that affect scalability. Much focus is spent on certificate revocation methodologies and status verification systems such as CRLs, Delta-CRLs, CRS, Certificate Revocation Trees, Windowed Certificate Revocation, OCSP, SCVP and DVCS.

  4. Reconfigurable ATCA hardware for plasma control and data acquisition

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, B.B., E-mail: bernardo@ipfn.ist.utl.p [Associacao EURATOM/IST Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, Av. Rovisco Pais, 1049-001 Lisboa (Portugal); Batista, A.J.N.; Correia, M.; Neto, A.; Fernandes, H.; Goncalves, B.; Sousa, J. [Associacao EURATOM/IST Instituto de Plasmas e Fusao Nuclear, Instituto Superior Tecnico, Av. Rovisco Pais, 1049-001 Lisboa (Portugal)

    2010-07-15

    The IST/EURATOM Association is developing a new generation of control and data acquisition hardware for fusion experiments based on the ATCA architecture. This emerging open standard offers a significantly higher data throughput over a reliable High Availability (HA) mechanical and electrical platform. One of these ATCA boards has 32 galvanically isolated ADC channels (18 bit), each mounted on a swappable plug-in card, 8 DAC channels (16 bit), 8 digital I/O channels, and embeds a high performance XILINX Virtex 4 family field programmable gate array (FPGA). The specific modular and configurable hardware design enables adaptable utilization of the board in dissimilar applications. The first configuration, specially developed for tokamak plasma Vertical Stabilization, consists of a Multiple-Input-Multiple-Output (MIMO) controller that is capable of feedback loops faster than 1 ms using a multitude of input signals fed from different boards communicating through the Aurora point-to-point protocol. Massive parallel algorithms can be implemented on the FPGA either with programmed digital logic, using a hardware description language (HDL), or within its internal silicon PowerPC running a full-fledged real-time operating system. The second board configuration is dedicated to transient recording of the entire 32 channels at 2 MSamples/s to the on-board 512 MB DDR2 memory. Signal data retrieval is accelerated by a DMA-driven PCI Express x1 interface to the ATCA system controller, providing an overall throughput in excess of 100 MB/s. This paper illustrates these developments and discusses possible configurations for foreseen applications.
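
    A quick plausibility check of the transient-recorder figures quoted in this record (32 channels at 2 MSamples/s into 512 MB of DDR2, drained at roughly 100 MB/s over PCI Express) can be done in a few lines; the 4-byte-per-sample packing is an assumption, since the record does not say how the 18-bit words are stored.

```python
# Back-of-the-envelope data rates for the ATCA transient-recorder configuration.
# Assumption: each 18-bit ADC sample occupies a 4-byte word in memory.
channels = 32
sample_rate = 2e6            # samples per second per channel
bytes_per_sample = 4         # assumed 32-bit packing of 18-bit samples
buffer_bytes = 512 * 2**20   # on-board 512 MB DDR2

fill_rate = channels * sample_rate * bytes_per_sample   # bytes/s written to memory
record_length = buffer_bytes / fill_rate                # seconds of signal the buffer holds
readout_time = buffer_bytes / (100 * 2**20)             # seconds to drain at ~100 MB/s

print(f"fill rate          : {fill_rate / 2**20:6.0f} MB/s")
print(f"record length      : {record_length:6.2f} s")
print(f"readout at 100 MB/s: {readout_time:6.2f} s")
```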

  5. A portable accelerator control toolkit

    Energy Technology Data Exchange (ETDEWEB)

    Watson, W.A. III

    1997-06-01

    In recent years, the expense of creating good control software has led to a number of collaborative efforts among laboratories to share this cost. The EPICS collaboration is a particularly successful example of this trend. More recently another collaborative effort has addressed the need for sophisticated high level software, including model driven accelerator controls. This work builds upon the CDEV (Common DEVice) software framework, which provides a generic abstraction of a control system, and maps that abstraction onto a number of site-specific control systems including EPICS, the SLAC control system, CERN/PS and others. In principle, it is now possible to create portable accelerator control applications which have no knowledge of the underlying and site-specific control system. Applications based on CDEV now provide a growing suite of tools for accelerator operations, including general purpose displays, an on-line accelerator model, beamline steering, machine status displays incorporating both hardware and model information (such as beam positions overlaid with beta functions) and more. A survey of CDEV compatible portable applications will be presented, as well as plans for future development.

  6. A portable accelerator control toolkit

    International Nuclear Information System (INIS)

    Watson, W.A. III.

    1997-01-01

    In recent years, the expense of creating good control software has led to a number of collaborative efforts among laboratories to share this cost. The EPICS collaboration is a particularly successful example of this trend. More recently another collaborative effort has addressed the need for sophisticated high level software, including model driven accelerator controls. This work builds upon the CDEV (Common DEVice) software framework, which provides a generic abstraction of a control system, and maps that abstraction onto a number of site-specific control systems including EPICS, the SLAC control system, CERN/PS and others. In principle, it is now possible to create portable accelerator control applications which have no knowledge of the underlying and site-specific control system. Applications based on CDEV now provide a growing suite of tools for accelerator operations, including general purpose displays, an on-line accelerator model, beamline steering, machine status displays incorporating both hardware and model information (such as beam positions overlaid with beta functions) and more. A survey of CDEV compatible portable applications will be presented, as well as plans for future development
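
    The two records above describe CDEV's central idea: applications are written against a generic device abstraction, and site-specific control systems are mapped in behind it. The following minimal Python sketch illustrates that pattern only in spirit; the class and method names are invented for illustration and are not the actual CDEV C++ API.

```python
# Hypothetical device-abstraction layer in the spirit of CDEV: applications talk to
# a generic Device interface, and site-specific back ends (EPICS, SLAC, CERN/PS, ...)
# plug in behind it. All names here are illustrative only.
class ControlBackend:
    """Interface each site-specific control system would implement."""
    def read(self, device: str, attribute: str): raise NotImplementedError
    def write(self, device: str, attribute: str, value): raise NotImplementedError

class FakeEpicsBackend(ControlBackend):
    def read(self, device, attribute):
        # A real back end would resolve the name to a process variable and do a get;
        # here we simply return a placeholder value.
        return 0.0
    def write(self, device, attribute, value):
        pass

class Device:
    """Portable handle used by applications; unaware of the underlying system."""
    def __init__(self, name: str, backend: ControlBackend):
        self.name, self.backend = name, backend
    def get(self, attribute="VAL"):
        return self.backend.read(self.name, attribute)
    def set(self, attribute, value):
        self.backend.write(self.name, attribute, value)

# Usage: the same steering application could run at any site by swapping the backend.
bpm = Device("ring:bpm01", FakeEpicsBackend())
print(bpm.get("position"))
```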

  7. XACC - eXtreme-scale Accelerator Programming Framework

    Energy Technology Data Exchange (ETDEWEB)

    2016-11-18

    Hybrid programming models for beyond-CMOS technologies will prove critical for integrating new computing technologies alongside our existing infrastructure. Unfortunately the software infrastructure required to enable this is lacking or not available. XACC is a programming framework for extreme-scale, post-exascale accelerator architectures that integrates alongside existing conventional applications. It is a pluggable framework for programming languages developed for next-gen computing hardware architectures like quantum and neuromorphic computing. It lets computational scientists efficiently off-load classically intractable work to attached accelerators through user-friendly Kernel definitions. XACC makes post-exascale hybrid programming approachable for domain computational scientists.

  8. On Scalability and Replicability of Smart Grid Projects—A Case Study

    Directory of Open Access Journals (Sweden)

    Lukas Sigrist

    2016-03-01

    Full Text Available This paper studies the scalability and replicability of smart grid projects. Currently, most smart grid projects are still in the R&D or demonstration phases. The full roll-out of the tested solutions requires a suitable degree of scalability and replicability to prevent project demonstrators from remaining local experimental exercises. Scalability and replicability are the preliminary requisites to perform scaling-up and replication successfully; therefore, scalability and replicability enable, or at least reduce barriers to, the growth and reuse of the results of project demonstrators. The paper proposes factors that influence and condition a project’s scalability and replicability. These factors involve technical, economic, regulatory and stakeholder acceptance related aspects, and they describe requirements for scalability and replicability. In order to assess and evaluate the identified scalability and replicability factors, data has been collected from European and national smart grid projects by means of a survey, reflecting the projects’ view and results. The evaluation of the factors allows quantifying the status quo of on-going projects with respect to scalability and replicability, i.e., it provides feedback on the extent to which projects take these factors into account and on whether the projects’ results and solutions are actually scalable and replicable.

  9. From Newton to Einstein - N-body dynamics in galactic nuclei and SPH using new special hardware and astrogrid-D

    International Nuclear Information System (INIS)

    Spurzem, R; Berczik, P; Berentzen, I; Merritt, D; Nakasato, N; Adorf, H M; Bruesemeister, T; Schwekendiek, P; Steinacker, J; Wambsganss, J; Martinez, G Marcus; Lienhart, G; Kugel, A; Maenner, R; Burkert, A; Naab, T; Vasquez, H; Wetzstein, M

    2007-01-01

    The dynamics of galactic nuclei containing multiple supermassive black holes is modelled including relativistic dynamics. It is shown that for certain initial conditions there is no stalling problem for the relativistic coalescence of supermassive black hole binaries. This astrophysical application and another one using a smoothed particle hydrodynamics code are our first use cases on a new computer architecture using GRAPE and new MPRACE accelerator cards based on reconfigurable chips, developed in the GRACE project. We briefly discuss our science applications and first benchmarks obtained with the new hardware. Our present architecture still relies on the GRAPE special purpose hardware (not reconfigurable), but next generations will focus on new architectural approaches including custom network and computing architectures. The new hardware is embedded into national and international grid infrastructures

  10. Scalable-to-lossless transform domain distributed video coding

    DEFF Research Database (Denmark)

    Huang, Xin; Ukhanova, Ann; Veselov, Anton

    2010-01-01

    Distributed video coding (DVC) is a novel approach providing new features such as low complexity encoding by mainly exploiting the source statistics at the decoder, based on the availability of decoder side information. In this paper, scalable-to-lossless DVC is presented based on extending a lossy Tran...... codec provides frame-by-frame encoding. Comparing lossless coding efficiency, the proposed scalable-to-lossless TDWZ video codec can save up to 5%-13% of the bits compared to JPEG LS and H.264 Intra frame lossless coding, and does so as a scalable-to-lossless coding.

  11. Hardware protection through obfuscation

    CERN Document Server

    Bhunia, Swarup; Tehranipoor, Mark

    2017-01-01

    This book introduces readers to various threats faced during design and fabrication by today’s integrated circuits (ICs) and systems. The authors discuss key issues, including illegal manufacturing of ICs or “IC Overproduction,” insertion of malicious circuits, referred as “Hardware Trojans”, which cause in-field chip/system malfunction, and reverse engineering and piracy of hardware intellectual property (IP). The authors provide a timely discussion of these threats, along with techniques for IC protection based on hardware obfuscation, which makes reverse-engineering an IC design infeasible for adversaries and untrusted parties with any reasonable amount of resources. This exhaustive study includes a review of the hardware obfuscation methods developed at each level of abstraction (RTL, gate, and layout) for conventional IC manufacturing, new forms of obfuscation for emerging integration strategies (split manufacturing, 2.5D ICs, and 3D ICs), and on-chip infrastructure needed for secure exchange o...

  12. The FAST (FRC Acceleration Space Thruster) Experiment

    Science.gov (United States)

    Martin, Adam; Eskridge, R.; Lee, M.; Richeson, J.; Smith, J.; Thio, Y. C. F.; Slough, J.; Rodgers, Stephen L. (Technical Monitor)

    2001-01-01

    The Field Reverse Configuration (FRC) is a magnetized plasmoid that has been developed for use in magnetic confinement fusion. Several of its properties suggest that it may also be useful as a thruster for in-space propulsion. The FRC is a compact toroid that has only poloidal field, and is characterized by a high plasma beta = P/(B²/2μ₀), the ratio of plasma pressure to magnetic field pressure, so that it makes efficient use of magnetic field to confine a plasma. In an FRC thruster, plasmoids would be repetitively formed and accelerated to high velocity; velocities of ~250 km/s (Isp = 25,000 s) have already been achieved in fusion experiments. The FRC is inductively formed and accelerated, and so is not subject to the problem of electrode erosion. As the plasmoid may be accelerated over an extended length, it can in principle be made very efficient. And the achievable jet powers should be scalable to the MW range. A 10 kW thruster experiment - FAST (FRC Acceleration Space Thruster) has just started at the Marshall Space Flight Center. The design of FAST and the status of construction and operation will be presented.
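
    Two numbers in this record can be checked directly from the definitions it quotes: plasma beta as the ratio of plasma to magnetic pressure, and the statement that 250 km/s corresponds to an Isp of about 25,000 s (via Isp = v/g0). The field and pressure values in the sketch are assumptions chosen only to demonstrate the arithmetic, not figures from the experiment.

```python
import math

MU0 = 4 * math.pi * 1e-7   # vacuum permeability, H/m
G0 = 9.80665               # standard gravity, m/s^2

def plasma_beta(pressure_pa: float, b_tesla: float) -> float:
    """beta = p / (B^2 / 2*mu0), ratio of plasma to magnetic pressure."""
    return pressure_pa / (b_tesla**2 / (2 * MU0))

def specific_impulse(exhaust_velocity_m_s: float) -> float:
    """Isp = v / g0, in seconds."""
    return exhaust_velocity_m_s / G0

# Illustrative values (assumed, not from the record): 0.1 T field, 4 kPa plasma pressure.
print(f"beta ~ {plasma_beta(4e3, 0.1):.2f}")
# The record's 250 km/s plasmoid velocity:
print(f"Isp ~ {specific_impulse(250e3):,.0f} s")   # ~25,500 s, consistent with ~25,000 s
```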

  13. High-performance computing in accelerating structure design and analysis

    International Nuclear Information System (INIS)

    Li Zenghai; Folwell, Nathan; Ge Lixin; Guetz, Adam; Ivanov, Valentin; Kowalski, Marc; Lee, Lie-Quan; Ng, Cho-Kuen; Schussman, Greg; Stingelin, Lukas; Uplenchwar, Ravindra; Wolf, Michael; Xiao, Liling; Ko, Kwok

    2006-01-01

    Future high-energy accelerators such as the Next Linear Collider (NLC) will accelerate multi-bunch beams of high current and low emittance to obtain high luminosity, which put stringent requirements on the accelerating structures for efficiency and beam stability. While numerical modeling has been quite standard in accelerator R and D, designing the NLC accelerating structure required a new simulation capability because of the geometric complexity and level of accuracy involved. Under the US DOE Advanced Computing initiatives (first the Grand Challenge and now SciDAC), SLAC has developed a suite of electromagnetic codes based on unstructured grids and utilizing high-performance computing to provide an advanced tool for modeling structures at accuracies and scales previously not possible. This paper will discuss the code development and computational science research (e.g. domain decomposition, scalable eigensolvers, adaptive mesh refinement) that have enabled the large-scale simulations needed for meeting the computational challenges posed by the NLC as well as projects such as the PEP-II and RIA. Numerical results will be presented to show how high-performance computing has made a qualitative improvement in accelerator structure modeling for these accelerators, either at the component level (single cell optimization), or on the scale of an entire structure (beam heating and long-range wakefields)

  14. Mechanical engineering and design criteria for the Magnetically Insulated Transmission Experiment Accelerator

    International Nuclear Information System (INIS)

    Staller, G.E.; Hamilton, I.D.; Aker, M.F.; Fifer, H.G.

    1978-02-01

    A single-unit electron beam accelerator was designed, fabricated, and assembled in Sandia's Technical Area V to conduct magnetically insulated transmission experiments. Results of these experiments will be utilized in the future design of larger, more complex accelerators. This design makes optimum use of existing facilities and equipment. When designing new components, possible future applications were considered as well as compatibility with existing facilities and hardware

  15. An extended systematic mapping study about the scalability of i* Models

    Directory of Open Access Journals (Sweden)

    Paulo Lima

    2016-12-01

    Full Text Available i* models have been used for requirements specification in many domains, such as healthcare, telecommunication, and air traffic control. Managing the scalability and the complexity of such models is an important challenge in Requirements Engineering (RE). Scalability is also one of the most intractable issues in the design of visual notations in general: a well-known problem with visual representations is that they do not scale well. This issue has led us to investigate scalability in i* models and its variants by means of a systematic mapping study. This paper is an extended version of a previous paper on the scalability of i*, including papers indicated by specialists. Moreover, we also discuss the challenges and open issues regarding scalability of i* models and its variants. A total of 126 papers were analyzed in order to understand: how the RE community perceives scalability; and which proposals have considered this topic. We found that scalability issues are indeed perceived as relevant and that further work is still required, even though many potential solutions have already been proposed. This study can be a starting point for researchers aiming to further advance the treatment of scalability in i* models.

  16. Performance and scalability of the back-end sub-system in the ATLAS DAQ/EF prototype

    CERN Document Server

    Alexandrov, I N; Badescu, E; Burckhart, Doris; Caprini, M; Cohen, L; Duval, P Y; Hart, R; Jones, R; Kazarov, A; Kolos, S; Kotov, V; Laugier, D; Mapelli, Livio P; Moneta, L; Qian, Z; Radu, A A; Ribeiro, C A; Roumiantsev, V; Ryabov, Yu; Schweiger, D; Soloviev, I V

    2000-01-01

    The DAQ group of the future ATLAS experiment has developed a prototype system based on the trigger/DAQ architecture described in the ATLAS Technical Proposal to support studies of the full system functionality, architecture as well as available hardware and software technologies. One sub-system of this prototype is the back-end, which encompasses the software needed to configure, control and monitor the DAQ, but excludes the processing and transportation of physics data. The back-end consists of a number of components including run control, configuration databases and message reporting system. The software has been developed using standard, external software technologies such as OO databases and CORBA. It has been ported to several C++ compilers and operating systems including Solaris, Linux, WNT and LynxOS. This paper gives an overview of the back-end software, its performance, scalability and current status. (17 refs).

  17. Scalable Transactions for Web Applications in the Cloud

    NARCIS (Netherlands)

    Zhou, W.; Pierre, G.E.O.; Chi, C.-H.

    2009-01-01

    Cloud Computing platforms provide scalability and high availability properties for web applications but they sacrifice data consistency at the same time. However, many applications cannot afford any data inconsistency. We present a scalable transaction manager for NoSQL cloud database services to

  18. Expert System analysis of non-fuel assembly hardware and spent fuel disassembly hardware: Its generation and recommended disposal

    International Nuclear Information System (INIS)

    Williamson, D.A.

    1991-01-01

    Almost all of the effort being expended on radioactive waste disposal in the United States is being focused on the disposal of spent Nuclear Fuel, with little consideration for other areas that will have to be disposed of in the same facilities. One area of radioactive waste that has not been addressed adequately, because it is considered a secondary part of the waste issue, is the disposal of the various Non-Fuel Bearing Components of the reactor core. These hardware components fall somewhat arbitrarily into two categories: Non-Fuel Assembly (NFA) hardware and Spent Fuel Disassembly (SFD) hardware. This work provides a detailed examination of the generation and disposal of NFA hardware and SFD hardware by the nuclear utilities of the United States as it relates to the Civilian Radioactive Waste Management Program. All available sources of data on NFA and SFD hardware are analyzed, with particular emphasis given to the Characteristics Data Base developed by Oak Ridge National Laboratory and the characterization work performed by Pacific Northwest Laboratories and Rochester Gas & Electric. An Expert System developed as a portion of this work is used to assist in the prediction of the quantities of NFA hardware and SFD hardware that will be generated by the United States' utilities. Finally, the hardware waste management practices of the United Kingdom, France, Germany, Sweden, and Japan are studied for possible application to the disposal of domestic hardware wastes. As a result of this work, a general classification scheme for NFA and SFD hardware was developed. Only NFA and SFD hardware constructed of zircaloy and experiencing a burnup of less than 70,000 MWD/MTIHM, and PWR control rods constructed of stainless steel, are considered Low-Level Waste. All other hardware is classified as Greater-Than-Class-C waste
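
    The classification rule summarized at the end of this abstract translates almost directly into a decision function. The sketch below encodes only that stated rule; the argument names and units are assumptions, and the actual expert system described in the work is considerably richer.

```python
# Minimal sketch of the LLW / GTCC classification rule described in the abstract:
# zircaloy NFA/SFD hardware below 70,000 MWD/MTIHM burnup, and stainless-steel PWR
# control rods, are Low-Level Waste; everything else is Greater-Than-Class-C.
BURNUP_LIMIT = 70_000  # MWD/MTIHM

def classify_hardware(material: str, burnup: float,
                      component: str = "", reactor: str = "") -> str:
    material = material.lower()
    if material == "zircaloy" and burnup < BURNUP_LIMIT:
        return "Low-Level Waste"
    if material == "stainless steel" and component == "control rod" and reactor == "PWR":
        return "Low-Level Waste"
    return "Greater-Than-Class-C"

print(classify_hardware("zircaloy", 45_000))                                # Low-Level Waste
print(classify_hardware("stainless steel", 30_000, "control rod", "PWR"))  # Low-Level Waste
print(classify_hardware("inconel", 30_000))                                # Greater-Than-Class-C
```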

  19. Requirements for Scalable Access Control and Security Management Architectures

    National Research Council Canada - National Science Library

    Keromytis, Angelos D; Smith, Jonathan M

    2005-01-01

    Maximizing local autonomy has led to a scalable Internet. Scalability and the capacity for distributed control have unfortunately not extended well to resource access control policies and mechanisms...

  20. Microprocessor controller for phasing the accelerator

    International Nuclear Information System (INIS)

    Howry, S.K.; Wilmunder, A.R.

    1977-03-01

    A microprocessor controller is being developed to perform automatic phasing of the SLAC accelerator. It will replace the existing relay/analog boxes which are ten years old. The new system is all solid state except for the stepping motors that drive the phase shifters. A description is given of the components of the system, the control algorithm, microprocessor hardware and software design and development, and interaction with SLAC's computer control system

  1. Scalable cloud without dedicated storage

    Science.gov (United States)

    Batkovich, D. V.; Kompaniets, M. V.; Zarochentsev, A. K.

    2015-05-01

    We present a prototype of a scalable computing cloud. It is intended to be deployed on the basis of a cluster without the separate dedicated storage. The dedicated storage is replaced by the distributed software storage. In addition, all cluster nodes are used both as computing nodes and as storage nodes. This solution increases utilization of the cluster resources as well as improves fault tolerance and performance of the distributed storage. Another advantage of this solution is high scalability with a relatively low initial and maintenance cost. The solution is built on the basis of the open source components like OpenStack, CEPH, etc.

  2. Scalable Multi-Platform Distribution of Spatial 3d Contents

    Science.gov (United States)

    Klimke, J.; Hagedorn, B.; Döllner, J.

    2013-09-01

    Virtual 3D city models provide powerful user interfaces for communication of 2D and 3D geoinformation. Providing high quality visualization of massive 3D geoinformation in a scalable, fast, and cost efficient manner is still a challenging task. Especially for mobile and web-based system environments, software and hardware configurations of target systems differ significantly. This makes it hard to provide fast, visually appealing renderings of 3D data throughout a variety of platforms and devices. Current mobile or web-based solutions for 3D visualization usually require raw 3D scene data such as triangle meshes together with textures delivered from server to client, which makes them strongly limited in terms of the size and complexity of the models they can handle. In this paper, we introduce a new approach for provisioning of massive, virtual 3D city models on different platforms, namely web browsers, smartphones or tablets, by means of an interactive map assembled from artificial oblique image tiles. The key concept is to synthesize such images of a virtual 3D city model by a 3D rendering service in a preprocessing step. This service encapsulates model handling and 3D rendering techniques for high quality visualization of massive 3D models. By generating image tiles using this service, the 3D rendering process is shifted from the client side, which provides major advantages: (a) the complexity of the 3D city model data is decoupled from data transfer complexity; (b) the implementation of client applications is simplified significantly, as 3D rendering is encapsulated on the server side; (c) 3D city models can be easily deployed for and used by a large number of concurrent users, leading to a high degree of scalability of the overall approach. All core 3D rendering techniques are performed on a dedicated 3D rendering server, and thin-client applications can be compactly implemented for various devices and platforms.

  3. The Fermilab Accelerator control system

    Science.gov (United States)

    Bogert, Dixon

    1986-06-01

    With the advent of the Tevatron, considerable upgrades have been made to the controls of all the Fermilab Accelerators. The current system is based on making as large an amount of data as possible available to many operators or end-users. Specifically there are about 100 000 separate readings, settings, and status and control registers in the various machines, all of which can be accessed by seventeen consoles, some in the Main Control Room and others distributed throughout the complex. A "Host" computer network of approximately eighteen PDP-11/34's, seven PDP-11/44's, and three VAX-11/785's supports a distributed data acquisition system including Lockheed MAC-16's left from the original Main Ring and Booster instrumentation and upwards of 1000 Z80, Z8002, and M68000 microprocessors in dozens of configurations. Interaction of the various parts of the system is via a central data base stored on the disk of one of the VAXes. The primary computer-hardware communication is via CAMAC for the new Tevatron and Antiproton Source; certain subsystems, among them vacuum, refrigeration, and quench protection, reside in the distributed microprocessors and communicate via GAS, an in-house protocol. An important hardware feature is an accurate clock system making a large number of encoded "events" in the accelerator supercycle available for both hardware modules and computers. System software features include the ability to save the current state of the machine or any subsystem and later restore it or compare it with the state at another time, a general logging facility to keep track of specific variables over long periods of time, detection of "exception conditions" and the posting of alarms, and a central filesharing capability in which files on VAX disks are available for access by any of the "Host" processors.

  4. The Fermilab accelerator control system

    International Nuclear Information System (INIS)

    Bogert, D.

    1986-01-01

    With the advent of the Tevatron, considerable upgrades have been made to the controls of all the Fermilab Accelerators. The current system is based on making as large an amount of data as possible available to many operators or end-users. Specifically there are about 100000 separate readings, settings, and status and control registers in the various machines, all of which can be accessed by seventeen consoles, some in the Main Control Room and others distributed throughout the complex. A ''Host'' computer network of approximately eighteen PDP-11/34's, seven PDP-11/44's, and three VAX-11/785's supports a distributed data acquisition system including Lockheed MAC-16's left from the original Main Ring and Booster instrumentation and upwards of 1000 Z80, Z8002, and M68000 microprocessors in dozens of configurations. Interaction of the various parts of the system is via a central data base stored on the disk of one of the VAXes. The primary computer-hardware communication is via CAMAC for the new Tevatron and Antiproton Source; certain subsystems, among them vacuum, refrigeration and quench protection, reside in the distributed microprocessors and communicate via GAS, an in-house protocol. An important hardware feature is an accurate clock system making a large number of encoded ''events'' in the accelerator supercycle available for both hardware modules and computers. System software features include the ability to save the current state of the machine or any subsystem and later restore it or compare it with the state at another time, a general logging facility to keep track of specific variables over long periods of time, detection of 'exception conditions' and the posting of alarms, and a central filesharing capability in which files on VAX disks are available for access by any of the ''Host'' processors. (orig.)
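
    The save/restore/compare capability described in these two records is easy to illustrate with a toy sketch. The channel names, snapshot file format and tolerance handling below are assumptions for illustration, not details of the actual Fermilab implementation.

```python
import json
import time

def save_state(read_channel, channels, path):
    """Snapshot current readings/settings of the given channels to a file."""
    snapshot = {"taken_at": time.time(),
                "values": {ch: read_channel(ch) for ch in channels}}
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot

def compare_states(old, new, tolerance=1e-6):
    """Report channels whose values differ between two snapshots."""
    diffs = {}
    for ch, old_val in old["values"].items():
        new_val = new["values"].get(ch)
        if new_val is None or abs(new_val - old_val) > tolerance:
            diffs[ch] = (old_val, new_val)
    return diffs

# Usage with a fake read function standing in for the data-acquisition layer.
fake_readings = {"M:QUAD01": 102.5, "M:BEND03": 57.1}
read = fake_readings.get
before = save_state(read, fake_readings, "before.json")
fake_readings["M:QUAD01"] = 103.0          # someone retunes a magnet setting
after = save_state(read, fake_readings, "after.json")
print(compare_states(before, after))       # {'M:QUAD01': (102.5, 103.0)}
```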

  5. A systematic FPGA acceleration design for applications based on convolutional neural networks

    Science.gov (United States)

    Dong, Hao; Jiang, Li; Li, Tianjian; Liang, Xiaoyao

    2018-04-01

    Most FPGA accelerators for convolutional neural networks are designed to optimize the inner accelerator and ignore the optimization of the data path between the inner accelerator and the outer system. This can lead to poor performance in applications like real-time video object detection. We propose a new systematic FPGA acceleration design to solve this problem. This design takes the data path between the inner accelerator and the outer system into consideration and optimizes it using techniques such as hardware format transformation and frame compression. It also applies fixed-point arithmetic and a new pipelining technique to optimize the inner accelerator. Together these optimizations make the final system's performance very good, reaching about 10 times the performance of the original system.
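
    One of the inner-accelerator optimizations mentioned above is fixed-point arithmetic. The sketch below shows the basic quantization step from floating point to a signed fixed-point format; the Q8.8 format and rounding mode are assumptions, not choices taken from the paper.

```python
import numpy as np

def to_fixed_point(x: np.ndarray, frac_bits: int = 8, total_bits: int = 16) -> np.ndarray:
    """Quantize floats to signed fixed-point with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

def from_fixed_point(q: np.ndarray, frac_bits: int = 8) -> np.ndarray:
    """Convert fixed-point integers back to floats for error checking."""
    return q.astype(np.float64) / (1 << frac_bits)

weights = np.array([0.731, -1.204, 0.0082])
q = to_fixed_point(weights)                 # e.g. Q8.8 representation
print(q, from_fixed_point(q))               # quantized integers and their rounded values
print("max error:", np.max(np.abs(weights - from_fixed_point(q))))
```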

  6. Hardware Support for Embedded Java

    DEFF Research Database (Denmark)

    Schoeberl, Martin

    2012-01-01

    The general Java runtime environment is resource hungry and unfriendly for real-time systems. To reduce the resource consumption of Java in embedded systems, direct hardware support of the language is a valuable option. Furthermore, an implementation of the Java virtual machine in hardware enables...... worst-case execution time analysis of Java programs. This chapter gives an overview of current approaches to hardware support for embedded and real-time Java....

  7. HARDWARE TROJAN IDENTIFICATION AND DETECTION

    OpenAIRE

    Samer Moein; Fayez Gebali; T. Aaron Gulliver; Abdulrahman Alkandari

    2017-01-01

    The majority of techniques developed to detect hardware trojans are based on specific attributes. Further, the ad hoc approaches employed to design methods for trojan detection are largely ineffective. Hardware trojans have a number of attributes which can be used to systematically develop detection techniques. Based on this concept, a detailed examination of current trojan detection techniques and the characteristics of existing hardware trojans is presented. This is used to dev...

  8. Hunting for hardware changes in data centres

    International Nuclear Information System (INIS)

    Coelho dos Santos, M; Steers, I; Szebenyi, I; Xafi, A; Barring, O; Bonfillou, E

    2012-01-01

    With many servers and server parts, the environment of warehouse-sized data centres is increasingly complex. Server life-cycle management and hardware failures are responsible for frequent changes that need to be managed. To manage these changes better, a project codenamed “hardware hound”, focusing on hardware failure trending and hardware inventory, has been started at CERN. By creating and using a hardware oriented data set - the inventory - with detailed information on servers and their parts, as well as tracking changes to this inventory, the project aims at, for example, being able to discover trends in hardware failure rates.

  9. Operating experience with the Fermilab 500-GeV accelerator

    International Nuclear Information System (INIS)

    Urban, G.S.; Gannon, J.C.

    1977-01-01

    The Fermilab accelerator has been operating for more than four years. It has been improved so that it is now capable of operating at an energy of 500 GeV and an intensity in excess of 2.0 x 10^13 protons per pulse. The accelerator is manned on a 24 hour a day basis by an operating team of five persons. This is possible, in part, because almost all of the hardware systems have status monitoring and control through an advanced computer control system. A discussion is given of the operation of the accelerator with emphasis on the person-to-machine interface, operator training techniques used at Fermilab, and the keeping of records and reliability information

  10. Open-source hardware for medical devices.

    Science.gov (United States)

    Niezen, Gerrit; Eslambolchilar, Parisa; Thimbleby, Harold

    2016-04-01

    Open-source hardware is hardware whose design is made publicly available so anyone can study, modify, distribute, make and sell the design or the hardware based on that design. Some open-source hardware projects can potentially be used as active medical devices. The open-source approach offers a unique combination of advantages, including reducing costs and faster innovation. This article compares 10 open-source healthcare projects in terms of how easy it is to obtain the required components and build the device.

  11. Re-configurable ATCA Hardware for Plasma Control and Data Acquisition

    Energy Technology Data Exchange (ETDEWEB)

    Carvalho, B.; Batista, A.; Correia, M.; Fernandes, H.; Sousa, J. [Instituto de Plasmas e Fusao Nuclear - Instituto Superior Tecnico, Lisbon (Portugal)

    2009-07-01

    The IST/EURATOM Association is developing a new generation of control and data acquisition hardware for fusion experiments based on the ATCA architecture. This emerging open standard offers a significantly higher data throughput over a reliable High Availability (HA) mechanical and electrical platform. One of these ATCA boards has 32 galvanically isolated ADC channels (18 bit), each mounted on an exchangeable plug-in card, 8 DAC channels (16 bit), 8 digital I/O channels and embeds a high performance XILINX Virtex 4 family field programmable gate array (FPGA). The specific modular hardware design enables adaptable utilization of the board in dissimilar applications. The first configuration, specially developed for tokamak plasma Vertical Stabilization, consists of a Multiple-Input-Multiple-Output (MIMO) controller that is capable of feedback loops faster than 1 ms, using a multitude of input signals fed from different boards communicating through the Aurora point-to-point protocol. Massive parallel algorithms can be implemented inside the FPGA either with programmed digital logic, using a hardware description language (HDL), or inside the two included silicon PowerPCs running a full-fledged real-time operating system. The second board configuration is dedicated to transient recording of the entire 32 channels at 2 MSamples/s to the built-in 512 MB DDR2 memory. Signal data retrieval is accelerated by a DMA-driven PCI Express x1 interface to the ATCA system controller providing an overall throughput in excess of 250 MB/s. This paper illustrates these developments and discusses possible configurations for foreseen applications. (authors)

  12. Modular Universal Scalable Ion-trap Quantum Computer

    Science.gov (United States)

    2016-06-02

    Final report covering the period 1 August 2010 to 31 January 2016. The main goal of the original MUSIQC proposal was to construct and demonstrate a modular and universally-expandable ion-trap quantum computer. Keywords: ion trap quantum computation, scalable modular architectures.

  13. Scalable and Media Aware Adaptive Video Streaming over Wireless Networks

    Directory of Open Access Journals (Sweden)

    Béatrice Pesquet-Popescu

    2008-07-01

    Full Text Available This paper proposes an advanced video streaming system based on scalable video coding in order to optimize resource utilization in wireless networks with retransmission mechanisms at the radio protocol level. The key component of this system is a packet scheduling algorithm which operates on the different substreams of a main scalable video stream and which is implemented in a so-called media aware network element. The type of transport channel considered is a dedicated channel subject to parameter (bitrate, loss rate) variations over the long run. Moreover, we propose a combined scalability approach in which common temporal and SNR scalability features can be used jointly with a partitioning of the image into regions of interest. Simulation results show that our approach provides substantial quality gain compared to classical packet transmission methods, and they demonstrate how ROI coding combined with SNR scalability further improves the visual quality.
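
    The substream-aware scheduling idea described in this record can be sketched as a priority-ordered packet selection under a bit budget: base-layer packets first, then ROI enhancement, then background enhancement. The layer names, priorities and packet sizes below are assumptions for illustration, not the paper's algorithm.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    frame: int
    layer: str       # "base", "roi_enh", "bg_enh"
    size_bits: int

# Lower number = more important; the base layer must survive, ROI enhancement next.
PRIORITY = {"base": 0, "roi_enh": 1, "bg_enh": 2}

def schedule(packets, budget_bits):
    """Select packets for one scheduling window under a bit budget."""
    sent, used = [], 0
    for p in sorted(packets, key=lambda p: (PRIORITY[p.layer], p.frame)):
        if used + p.size_bits <= budget_bits:
            sent.append(p)
            used += p.size_bits
    return sent

window = [Packet(1, "base", 8000), Packet(1, "roi_enh", 6000), Packet(1, "bg_enh", 9000),
          Packet(2, "base", 8000), Packet(2, "roi_enh", 6000), Packet(2, "bg_enh", 9000)]
# With a shrunken channel budget, the background enhancement packets are dropped first.
for p in schedule(window, budget_bits=30000):
    print(p)
```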

  14. Design issues for numerical libraries on scalable multicore architectures

    International Nuclear Information System (INIS)

    Heroux, M A

    2008-01-01

    Future generations of scalable computers will rely on multicore nodes for a significant portion of overall system performance. At present, most applications and libraries cannot exploit multiple cores beyond running additional MPI processes per node. In this paper we discuss important multicore architecture issues, programming models, algorithm requirements and software design related to effective use of scalable multicore computers. In particular, we focus on important issues for library research and development, making recommendations for how to effectively develop libraries for future scalable computer systems

  15. An evaluation of Skylab habitability hardware

    Science.gov (United States)

    Stokes, J.

    1974-01-01

    For effective mission performance, participants in space missions lasting 30-60 days or longer must be provided with hardware to accommodate their personal needs. Such habitability hardware was provided on Skylab. Equipment defined as habitability hardware was that equipment composing the food system, water system, sleep system, waste management system, personal hygiene system, trash management system, and entertainment equipment. Equipment not specifically defined as habitability hardware but which served that function were the Wardroom window, the exercise equipment, and the intercom system, which was occasionally used for private communications. All Skylab habitability hardware generally functioned as intended for the three missions, and most items could be considered as adequate concepts for future flights of similar duration. Specific components were criticized for their shortcomings.

  16. A High-Speed Design of Montgomery Multiplier

    Science.gov (United States)

    Fan, Yibo; Ikenaga, Takeshi; Goto, Satoshi

    With the increase of key length used in public cryptographic algorithms such as RSA and ECC, the speed of Montgomery multiplication becomes a bottleneck. This paper proposes a high speed design of Montgomery multiplier. Firstly, a modified scalable high-radix Montgomery algorithm is proposed to reduce the critical path. Secondly, a high-radix clock-saving dataflow is proposed to support high-radix operation and one clock cycle delay in the dataflow. Finally, a hardware-reused architecture is proposed to reduce the hardware cost, and a parallel radix-16 design of the data path is proposed to accelerate the speed. Using the HHNEC 0.25 μm standard cell library, the implementation results show that the total cost of the Montgomery multiplier is 130 KGates, the clock frequency is 180 MHz and the throughput of 1024-bit RSA encryption is 352 kbps. This design is suitable for use in high speed RSA or ECC encryption/decryption. As a scalable design, it supports any key-length encryption/decryption up to the size of the on-chip memory.
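
    For readers unfamiliar with the operation being accelerated, here is a plain software reference of Montgomery multiplication (textbook REDC) in Python; it shows the arithmetic the hardware implements, not the paper's high-radix, hardware-reused data path.

```python
def montgomery_multiply(a: int, b: int, n: int, r_bits: int) -> int:
    """Compute a*b*R^-1 mod n with R = 2**r_bits and n odd (textbook REDC)."""
    R = 1 << r_bits
    assert n % 2 == 1 and a < n and b < n
    # n' such that n*n' == -1 (mod R); pow(n, -1, R) needs Python 3.8+.
    n_prime = (-pow(n, -1, R)) % R
    t = a * b
    m = (t * n_prime) % R
    u = (t + m * n) >> r_bits        # exact division by R, since t + m*n == 0 (mod R)
    return u - n if u >= n else u

def to_mont(x: int, n: int, r_bits: int) -> int:
    """Map x into the Montgomery domain: x*R mod n."""
    return (x << r_bits) % n

# Example: multiply 7 and 11 modulo 101 via the Montgomery domain.
n, r_bits = 101, 8
a_m, b_m = to_mont(7, n, r_bits), to_mont(11, n, r_bits)
prod_m = montgomery_multiply(a_m, b_m, n, r_bits)
print(montgomery_multiply(prod_m, 1, n, r_bits))   # converts back: prints 77 = 7*11 mod 101
```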

  17. Hardware processor for tracking particles in an alternating-gradient synchrotron

    International Nuclear Information System (INIS)

    Johnson, M.; Avilez, C.

    1987-01-01

    We discuss the design and performance of special-purpose processors for tracking particles through an alternating-gradient synchrotron. We present block diagram designs for two hardware processors. Both processors use algorithms based on the 'kick' approximation, i.e., transport matrices are used for dipoles and quadrupoles, and the thin-lens approximation is used for all higher multipoles. The faster processor makes extensive use of memory look-up tables for evaluating functions. For the case of magnets with multipoles up to pole 30 and using one kick per magnet, this processor can track 19 particles through an accelerator at a rate that is only 220 times slower than the time it takes real particles to travel around the machine. For a model consisting of only thin lenses, it is only 150 times slower than real particles. An additional factor of 2 can be obtained with chips now becoming available. The number of magnets in the accelerator is limited only by the amount of memory available for storing magnet parameters. (author) 20 refs., 7 figs., 2 tabs
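
    The 'kick' approximation described above alternates linear transport matrices with thin-lens kicks for the higher multipoles. The one-dimensional toy sketch below illustrates that scheme with made-up element strengths; it is not the processor's actual algorithm or lattice.

```python
import numpy as np

def drift(L):
    """2x2 transport matrix for a field-free drift of length L, acting on (x, x')."""
    return np.array([[1.0, L], [0.0, 1.0]])

def thin_quad(f):
    """Thin-lens quadrupole of focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def sextupole_kick(state, k2L):
    """Thin sextupole kick: x' changes by an amount proportional to x^2 (1D toy model)."""
    x, xp = state
    return np.array([x, xp - 0.5 * k2L * x**2])

def track(state, turns=1000):
    """Iterate a toy cell: drift - focusing quad - drift, then a thin sextupole kick."""
    cell = [drift(1.0), thin_quad(2.0), drift(1.0)]
    for _ in range(turns):
        for element in cell:
            state = element @ state
        state = sextupole_kick(state, k2L=0.05)
    return state

print(track(np.array([1e-3, 0.0])))   # particle stays bounded in this stable toy lattice
```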

  18. Hardware/Software Co-Design of a Traffic Sign Recognition System Using Zynq FPGAs

    Directory of Open Access Journals (Sweden)

    Yan Han

    2015-12-01

    Full Text Available Traffic sign recognition (TSR), taken as an important component of an intelligent vehicle system, has been an emerging research topic in recent years. In this paper, a traffic sign detection system based on color segmentation, speeded-up robust features (SURF) detection and the k-nearest neighbor classifier is introduced. The proposed system benefits from the SURF detection algorithm, which achieves invariance to rotated, skewed and occluded signs. In addition to the accuracy and robustness issues, a TSR system should target a real-time implementation on an embedded system. Therefore, a hardware/software co-design architecture for a Zynq-7000 FPGA is presented as a major objective of this work. The sign detection operations are accelerated by programmable hardware logic that searches the potential candidates for sign classification. Sign recognition and classification uses a feature extraction and matching algorithm, which is implemented as a software component that runs on the embedded ARM CPU.
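
    The first pipeline stage, color segmentation, is simple to prototype in software before mapping it to programmable logic. The sketch below isolates red sign candidates with OpenCV HSV thresholding; the threshold values are assumptions that would need tuning, and this is not the paper's hardware implementation.

```python
import cv2
import numpy as np

def red_sign_candidates(bgr_image: np.ndarray) -> np.ndarray:
    """Return a binary mask of pixels falling in two red hue bands (HSV space)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # Red wraps around hue 0, so combine a low band and a high band (assumed thresholds).
    lower = cv2.inRange(hsv, (0, 80, 60), (10, 255, 255))
    upper = cv2.inRange(hsv, (170, 80, 60), (180, 255, 255))
    mask = cv2.bitwise_or(lower, upper)
    # Clean the mask a little before looking for sign-sized connected regions.
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

if __name__ == "__main__":
    frame = cv2.imread("frame.jpg")          # any test image containing a red sign
    if frame is not None:
        cv2.imwrite("candidates.png", red_sign_candidates(frame))
```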

  19. Is Hardware Removal Recommended after Ankle Fracture Repair?

    Directory of Open Access Journals (Sweden)

    Hong-Geun Jung

    2016-01-01

    Full Text Available The indications and clinical necessity for routine hardware removal after treating ankle or distal tibia fracture with open reduction and internal fixation are disputed even when hardware-related pain is insignificant. Thus, we determined the clinical effects of routine hardware removal irrespective of the degree of hardware-related pain, especially from the perspective of patients’ daily activities. This study was conducted on 80 consecutive cases (78 patients) treated by surgery and hardware removal after bony union. There were 56 ankle and 24 distal tibia fractures. Hardware-related pain, ankle joint stiffness, discomfort on ambulation, and patient satisfaction were evaluated before and at least 6 months after hardware removal. The pain score before hardware removal was 3.4 (range 0 to 6) and decreased to 1.3 (range 0 to 6) after removal. 58 (72.5%) patients experienced improved ankle stiffness, 65 (81.3%) reported less discomfort while walking on uneven ground, and 63 (80.8%) patients were satisfied with hardware removal. These results suggest that routine hardware removal after ankle or distal tibia fracture could ameliorate hardware-related pain and improve daily activities and patient satisfaction even when the hardware-related pain is minimal.

  20. Door Hardware and Installations; Carpentry: 901894.

    Science.gov (United States)

    Dade County Public Schools, Miami, FL.

    The curriculum guide outlines a course designed to provide instruction in the selection, preparation, and installation of hardware for door assemblies. The course is divided into five blocks of instruction (introduction to doors and hardware, door hardware, exterior doors and jambs, interior doors and jambs, and a quinmester post-test) totaling…

  1. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    Science.gov (United States)

    Dey, T.; Rodrigue, P.

    2015-07-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computationally intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel, and the 10^3 to 10^4 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector, and in a second step the sum of the radiological path, taking into account attenuation, is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware specific intrinsic instructions (so-called `intrinsics') to allow manually-optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The examination

  2. Evaluation of the Intel Xeon Phi Co-processor to accelerate the sensitivity map calculation for PET imaging

    International Nuclear Information System (INIS)

    Dey, T.; Rodrigue, P.

    2015-01-01

    We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computationally intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel, and the 10^3 to 10^4 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector, and in a second step the sum of the radiological path, taking into account attenuation, is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware specific intrinsic instructions (so-called 'intrinsics') to allow manually-optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the Host implementations. The
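
    The performance figures reported in these two records can be restated as parallel efficiencies with a short calculation; interpreting the scalability factor as a parallel efficiency is our reading, and the derived second-card efficiency below is not a number given in the abstract.

```python
# Numbers quoted in the abstract: scalability factor 0.90 on the 60-core Xeon Phi,
# 1.43x speedup over the dual 8-core Xeon host, 2.77x with a second Phi card.
phi_cores = 60
scalability = 0.90            # interpreted here as parallel efficiency on one Xeon Phi
speedup_one_card = 1.43       # one Phi vs. the two-socket host
speedup_two_cards = 2.77      # two Phi cards vs. the same host

# Efficiency of adding the second card relative to ideal doubling (our interpretation).
second_card_efficiency = speedup_two_cards / (2 * speedup_one_card)

print(f"effective parallel cores on one Phi: {scalability * phi_cores:.0f} of {phi_cores}")
print(f"scaling efficiency of the 2nd card : {second_card_efficiency:.2f}")
```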

  3. Research on GPU acceleration for Monte Carlo criticality calculation

    International Nuclear Information System (INIS)

    Xu, Q.; Yu, G.; Wang, K.

    2013-01-01

    The Monte Carlo (MC) neutron transport method can be naturally parallelized by multi-core architectures due to the independence of particles during the simulation. The GPU+CPU heterogeneous parallel mode has become an increasingly popular way of parallelism in the field of scientific supercomputing. Thus, this work focuses on the GPU acceleration method for the Monte Carlo criticality simulation, as well as the computational efficiency that GPUs can bring. The 'neutron transport step' is introduced to increase the GPU thread occupancy. In order to test the sensitivity to the MC code's complexity, a 1D one-group code and a 3D multi-group general purpose code are respectively ported to GPUs, and the acceleration effects are compared. The results of the numerical experiments show a considerable acceleration effect from the 'neutron transport step' strategy. However, the performance comparison between the 1D code and the 3D code indicates the poor scalability of MC codes on GPUs. (authors)

  4. The proof-of-concept experiment for the spiral line induction accelerator

    Energy Technology Data Exchange (ETDEWEB)

    Putnam, S D; Bailey, V L; Smith, J; Lidestri, J; Thomas, H; Lackner, H; Nishimoto, H [Pulse Sciences, Inc., San Leandro, CA (United States)

    1997-12-31

    A proof-of-concept experiment (POCE) for the Spiral Line Induction Accelerator (SLIA) is underway at Pulse Sciences, Inc. to demonstrate a new compact high current (≥ a few kiloamperes) recirculating induction accelerator for high power (≥ 100 kW) commercial processing and other applications. Hardware has been fabricated to generate 9.5 MeV electron beams at 2 and 10 kA by recirculating the beam for two passes through each of two 1.5 MeV accelerating units. Initial experiments have demonstrated acceleration of 2 and 10 kA beams to 5.5 MeV by transport around a complete turn with two passes through a single accelerating unit, and work is currently in progress to complete the full POCE. Experimental results to date are reported. (author). 5 figs., 14 refs.

  5. KOLAM: a cross-platform architecture for scalable visualization and tracking in wide-area imagery

    Science.gov (United States)

    Fraser, Joshua; Haridas, Anoop; Seetharaman, Guna; Rao, Raghuveer M.; Palaniappan, Kannappan

    2013-05-01

    KOLAM is an open, cross-platform, interoperable, scalable and extensible framework supporting a novel multi-scale spatiotemporal dual-cache data structure for big data visualization and visual analytics. This paper focuses on the use of KOLAM for target tracking in high-resolution, high-throughput wide-format video, also known as wide-area motion imagery (WAMI). It was originally developed for the interactive visualization of extremely large geospatial imagery of high spatial and spectral resolution. KOLAM is platform, operating system and (graphics) hardware independent, and supports embedded datasets scalable from hundreds of gigabytes to feasibly petabytes in size on clusters, workstations, desktops and mobile computers. In addition to rapid roam, zoom and hyper-jump spatial operations, a large number of simultaneously viewable embedded pyramid layers (also referred to as multiscale or sparse imagery), interactive colormap and histogram enhancement, spherical projection and terrain maps are supported. The KOLAM software architecture was extended to support airborne wide-area motion imagery by organizing spatiotemporal tiles in very large format video frames using a temporal cache of tiled pyramid cached data structures. The current version supports WAMI animation, fast intelligent inspection, trajectory visualization and target tracking (digital tagging); the latter by interfacing with external automatic tracking software. One of the critical needs for working with WAMI is a supervised tracking and visualization tool that allows analysts to digitally tag multiple targets, quickly review and correct tracking results and apply geospatial visual analytic tools on the generated trajectories. One-click manual tracking combined with multiple automated tracking algorithms are available to assist the analyst and increase human effectiveness.
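
    KOLAM's central idea of addressing huge imagery by (time, level, column, row) tiles backed by a bounded cache can be sketched compactly. The tile size, addressing scheme and cache policy below are assumptions for illustration and not KOLAM's actual dual-cache data structure.

```python
from functools import lru_cache

TILE = 256  # tile edge length in pixels (assumed)

def tile_key(t: int, level: int, x: float, y: float):
    """Map a frame index, pyramid level and pixel position to a tile address."""
    scale = 1 << level                     # level 0 = full resolution
    return (t, level, int(x / scale) // TILE, int(y / scale) // TILE)

@lru_cache(maxsize=4096)                   # bounded in-memory tile cache
def load_tile(t, level, col, row):
    # Placeholder for reading one tile from disk or a tile server.
    return f"tile(t={t}, level={level}, col={col}, row={row})"

# Roaming and zooming simply touch nearby keys; repeated visits hit the cache.
print(load_tile(*tile_key(t=120, level=2, x=81920, y=40960)))
print(load_tile.cache_info())
```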

  6. Scuba: scalable kernel-based gene prioritization.

    Science.gov (United States)

    Zampieri, Guido; Tran, Dinh Van; Donini, Michele; Navarin, Nicolò; Aiolli, Fabio; Sperduti, Alessandro; Valle, Giorgio

    2018-01-25

    The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large scale predictions are required. Importantly, it is able to efficiently deal both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.
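
    The kernel-combination step at the heart of multiple kernel learning methods such as Scuba can be illustrated with a toy example: similarity matrices from heterogeneous sources are blended and candidates are ranked by similarity to known disease genes. The fixed weights below are an assumption; Scuba learns them by optimizing the margin distribution.

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Weighted sum of kernel (similarity) matrices from heterogeneous data sources."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

def prioritize(K, seed_idx):
    """Score every gene by its average combined-kernel similarity to known disease genes."""
    scores = K[:, seed_idx].mean(axis=1)
    return np.argsort(-scores)             # candidate indices, best first

rng = np.random.default_rng(0)
genes = 6
K1 = rng.random((genes, genes))
K1 = (K1 + K1.T) / 2                       # e.g. expression-based similarity
K2 = rng.random((genes, genes))
K2 = (K2 + K2.T) / 2                       # e.g. interaction-network similarity

K = combine_kernels([K1, K2], weights=[0.7, 0.3])   # hand-picked weights (assumption)
print(prioritize(K, seed_idx=[0, 2]))               # ranking relative to two seed genes
```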

  7. Building scalable apps with Redis and Node.js

    CERN Document Server

    Johanan, Joshua

    2014-01-01

    If the phrase scalability sounds alien to you, then this is an ideal book for you. You will not need much Node.js experience as each framework is demonstrated in a way that requires no previous knowledge of the framework. You will be building scalable Node.js applications in no time! Knowledge of JavaScript is required.

  8. Scalable shared-memory multiprocessing

    CERN Document Server

    Lenoski, Daniel E

    1995-01-01

    Dr. Lenoski and Dr. Weber have experience with leading-edge research and practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved with commercializing scalable shared-memory technology.

  9. From Open Source Software to Open Source Hardware

    OpenAIRE

    Viseur , Robert

    2012-01-01

    Part 2: Lightning Talks; International audience; The open source software principles are progressively giving rise to new initiatives for culture (free culture), data (open data) and hardware (open hardware). Open hardware is experiencing significant growth, but its business models and legal aspects are not well known. This paper is dedicated to the economics of open hardware. We define the open hardware concept and determine the intellectual property tools we can apply to open hardware, with a str...

  10. Future directions in controlling the LAMPF-PSR accelerator complex at Los Alamos National Laboratory

    International Nuclear Information System (INIS)

    Stuewe, R.; Schaller, S.; Bjorklund, E.

    1992-01-01

    Four interrelated projects are underway whose purpose is to migrate the LAMPF-PSR Accelerator Complex control systems to a system with a common set of hardware and software components. Project goals address problems in performance, maintenance and growth potential. Front-end hardware, operator interface hardware and software, computer systems, network systems and data system software are being simultaneously upgraded as part of these efforts. The efforts are being coordinated to provide for a smooth and timely migration to a client-server model-based data acquisition and control system. An increased use of distributed intelligence at both the front-end and the operator interface is a key element of the projects. (author)

  11. KLYNAC: Compact linear accelerator with integrated power supply

    Energy Technology Data Exchange (ETDEWEB)

    Malyzhenkov, Alexander [Los Alamos National Lab. (LANL), Los Alamos, NM (United States)

    2017-05-16

    Accelerators and accelerator-based light sources have a wide range of applications in science, engineering technology and medicine. Today the scientific community is working towards improving the quality of the accelerated beam and its parameters while trying to develop technology for reducing accelerator size. This work describes the design of a compact linear accelerator (linac) prototype, the resonant Klynac device, which combines a linear accelerator with its power supply, a klystron. The intended purpose of a Klynac device is to provide a compact and inexpensive alternative to a conventional 1 to 6 MeV accelerator, which typically requires a separate RF source, the accelerator itself and all the associated hardware. Because the Klynac is a single structure, it has the potential to be much less sensitive to temperature variations than a system with a separate klystron and linac. We start by introducing a simplified theoretical model for a Klynac device. We then demonstrate how a prototype is designed step by step using particle-in-cell simulation studies for mono-resonant and bi-resonant structures. Finally, we discuss design options from a stability point of view, the required input power, and the behavior of competing modes for the device as actually built.

  12. Requirements and solutions for accelerator control systems

    International Nuclear Information System (INIS)

    Anicic, D.; Blumer, T.; Jirousek, I.; Lutz, H.; Mezger, A.

    2001-01-01

    Throughout the life cycle of control systems, we are faced with the question of what fabulous new piece of hardware or software should be used and how to integrate this into a viable system. Accelerators cover a wide range, from simple cyclotrons for isotope production to cascades of cyclotrons for variable energy and multiple particles; this precludes a standard answer for all cases. The system requirements according to the purpose and nature of the accelerator are analyzed, and we try to extract some guidelines for implementation, development and maintenance of the appropriate control systems. We then try to analyze present trends in a selection of fields like operating systems, commercial systems, software sharing, field busses, etc.

  13. Co-designed accelerator for homomorphic encryption applications

    Directory of Open Access Journals (Sweden)

    Asma Mkhinini

    2018-02-01

    Full Text Available Fully Homomorphic Encryption (FHE) is considered a key cryptographic tool in building a secure cloud computing environment since it allows computing arbitrary functions directly on encrypted data. However, existing FHE implementations remain impractical due to very high time and resource costs. These costs are essentially due to the computationally intensive modular polynomial multiplication. In this paper, we present a software/hardware co-designed modular polynomial multiplier in order to accelerate homomorphic schemes. The hardware part is implemented through a High-Level Synthesis (HLS) flow. Experimental results show competitive latencies when compared with hand-made designs, while maintaining large advantages on resources. Moreover, we show that our high-level description can be easily configured with different parameters and very large sizes in negligible time, generating new designs for numerous applications.
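    The operation being accelerated, multiplication of polynomials with coefficients modulo q in a ring of the form Z_q[X]/(X^n + 1) as used by ring-based homomorphic schemes, can be illustrated with a naive schoolbook version. This is an assumption-laden sketch for intuition only: real FHE parameters use n in the thousands and very large q, and practical multipliers (including HLS designs like the one above) rely on NTT-based algorithms rather than the O(n^2) loop shown here.

```python
def polymul_mod(a, b, n, q):
    """Schoolbook product of polynomials a and b (coefficient lists of length n)
    in Z_q[X]/(X^n + 1).  Illustrative only: O(n^2) instead of the NTT-based
    O(n log n) multiplication used in practice."""
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                res[k] = (res[k] + ai * bj) % q
            else:
                # X^n is congruent to -1, so the term wraps around with a sign flip
                res[k - n] = (res[k - n] - ai * bj) % q
    return res

# Tiny toy parameters
print(polymul_mod([1, 2, 0, 1], [3, 0, 1, 0], n=4, q=17))   # -> [3, 5, 1, 5]
```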

  14. JPEG2000-Compatible Scalable Scheme for Wavelet-Based Video Coding

    Directory of Open Access Journals (Sweden)

    Thomas André

    2007-03-01

    Full Text Available We present a simple yet efficient scalable scheme for wavelet-based video coders, able to provide on-demand spatial, temporal, and SNR scalability, and fully compatible with the still-image coding standard JPEG2000. Whereas hybrid video coders must undergo significant changes in order to support scalability, our coder only requires a specific wavelet filter for temporal analysis, as well as an adapted bit allocation procedure based on models of rate-distortion curves. Our study shows that scalably encoded sequences have the same or almost the same quality as nonscalably encoded ones, without a significant increase in complexity. Full compatibility with Motion JPEG2000, which tends to be a serious candidate for the compression of high-definition video sequences, is ensured.

  15. JPEG2000-Compatible Scalable Scheme for Wavelet-Based Video Coding

    Directory of Open Access Journals (Sweden)

    André Thomas

    2007-01-01

    Full Text Available We present a simple yet efficient scalable scheme for wavelet-based video coders, able to provide on-demand spatial, temporal, and SNR scalability, and fully compatible with the still-image coding standard JPEG2000. Whereas hybrid video coders must undergo significant changes in order to support scalability, our coder only requires a specific wavelet filter for temporal analysis, as well as an adapted bit allocation procedure based on models of rate-distortion curves. Our study shows that scalably encoded sequences have the same or almost the same quality as nonscalably encoded ones, without a significant increase in complexity. Full compatibility with Motion JPEG2000, which tends to be a serious candidate for the compression of high-definition video sequences, is ensured.

  16. Development of Power System for Medium Energy Accelerator

    International Nuclear Information System (INIS)

    Kwon, Hyeok Jung; Kim, Dae Il; Kim, Han Sung; Seol, Kyung Tae; Jang, Ji Ho; Cho, Yong Sub; Hong, In Seok; Kim, Kyung Ryul

    2008-05-01

    The main goals of the study are to develop a power supply system for the 100 MeV proton accelerator and to operate the 20 MeV accelerator installed at the KAERI site. The 100 MeV proton accelerator uses RF cavities to accelerate beams and requires an RF amplifier, a klystron. Operating the klystron requires a high-power pulsed power supply, and this supply must be of high quality because its reliability has a critical impact on the overall reliability of the accelerator system. Development of the high-power pulsed power system and the related technology is therefore essential for the 100 MeV accelerator. The 20 MeV accelerator system has been developed and installed at the KAERI site; it will serve as an injector for the 100 MeV accelerator and supply a 20 MeV beam to users. A study of the 20 MeV accelerator characteristics should be performed to operate the machine efficiently, and the machine can also be used as a test bench for developing 100 MeV accelerator components. The results of this work therefore yield not only the high-voltage power supply hardware, but also the related technology for high-quality high-voltage power systems and trained manpower. The test results from the 20 MeV accelerator provide a basis for efficient operation of the 100 MeV accelerator; these are the ultimate objectives of the study.

  17. PCIU: Hardware Implementations of an Efficient Packet Classification Algorithm with an Incremental Update Capability

    Directory of Open Access Journals (Sweden)

    O. Ahmed

    2011-01-01

    Full Text Available Packet classification plays a crucial role for a number of network services such as policy-based routing, firewalls, and traffic billing, to name a few. However, classification can be a bottleneck in the above-mentioned applications if not implemented properly and efficiently. In this paper, we propose PCIU, a novel classification algorithm, which improves upon previously published work. PCIU provides lower preprocessing time, lower memory consumption, ease of incremental rule update, and reasonable classification time compared to state-of-the-art algorithms. The proposed algorithm was evaluated and compared to RFC and HiCut using several benchmarks. Results obtained indicate that PCIU outperforms these algorithms in terms of speed, memory usage, incremental update capability, and preprocessing time. The algorithm, furthermore, was improved and made more accessible for a variety of applications through implementation in hardware. Two such implementations are detailed and discussed in this paper. The results indicate that a hardware/software codesign approach yields a PCIU solution that is slower, but easier to optimize and improve within time constraints. A hardware accelerator based on an ESL approach using Handel-C, on the other hand, resulted in a 31x speed-up over a pure software implementation running on a state-of-the-art Xeon processor.

  18. Run-Time Scalable Hardware for Reconfigurable Systems

    OpenAIRE

    Otero Marnotes, Andres

    2014-01-01

    The optimization of parameters such as power consumption, the number of logic resources used or the memory footprint has always been one of the main concerns when designing embedded systems. This is because they are systems endowed with a limited amount of resources, which have traditionally been used for a specific purpose that remains unchanged throughout the system's useful life. However, the use of embedded sys...

  19. ZEUS hardware control system

    Science.gov (United States)

    Loveless, R.; Erhard, P.; Ficenec, J.; Gather, K.; Heath, G.; Iacovacci, M.; Kehres, J.; Mobayyen, M.; Notz, D.; Orr, R.; Orr, R.; Sephton, A.; Stroili, R.; Tokushuku, K.; Vogel, W.; Whitmore, J.; Wiggers, L.

    1989-12-01

    The ZEUS collaboration is building a system to monitor, control and document the hardware of the ZEUS detector. This system is based on a network of VAX computers and microprocessors connected via ethernet. The database for the hardware values will be ADAMO tables; the ethernet connection will be DECNET, TCP/IP, or RPC. Most of the documentation will also be kept in ADAMO tables for easy access by users.

  20. ZEUS hardware control system

    International Nuclear Information System (INIS)

    Loveless, R.; Erhard, P.; Ficenec, J.; Gather, K.; Heath, G.; Iacovacci, M.; Kehres, J.; Mobayyen, M.; Notz, D.; Orr, R.; Sephton, A.; Stroili, R.; Tokushuku, K.; Vogel, W.; Whitmore, J.; Wiggers, L.

    1989-01-01

    The ZEUS collaboration is building a system to monitor, control and document the hardware of the ZEUS detector. This system is based on a network of VAX computers and microprocessors connected via ethernet. The database for the hardware values will be ADAMO tables; the ethernet connection will be DECNET, TCP/IP, or RPC. Most of the documentation will also be kept in ADAMO tables for easy access by users. (orig.)

  1. A new approach to modeling linear accelerator systems

    International Nuclear Information System (INIS)

    Gillespie, G.H.; Hill, B.W.; Jameson, R.A.

    1994-01-01

    A novel computer code is being developed to generate system level designs of radiofrequency ion accelerators with specific applications to machines of interest to Accelerator Driven Transmutation Technologies (ADTT). The goal of the Accelerator System Model (ASM) code is to create a modeling and analysis tool that is easy to use, automates many of the initial design calculations, supports trade studies used in assessing alternative designs, and yet is flexible enough to incorporate new technology concepts as they emerge. Hardware engineering parameters and beam dynamics are to be modeled at comparable levels of fidelity. Existing scaling models of accelerator subsystems were used to produce a prototype of ASM (version 1.0) working within the Shell for Particle Accelerator Related Code (SPARC) graphical user interface. A small user group has been testing and evaluating the prototype for about a year. Several enhancements and improvements are now being developed. The current version of ASM is described and examples of the modeling and analysis capabilities are illustrated. The results of an example study, for an accelerator concept typical of ADTT applications, are presented and sample displays from the computer interface are shown.

  2. Declarative and Scalable Selection for Map Visualizations

    DEFF Research Database (Denmark)

    Kefaloukos, Pimin Konstantin Balic

    and is itself a source and cause of prolific data creation. This calls for scalable map processing techniques that can handle the data volume and which play well with the predominant data models on the Web. (4) Maps are now consumed around the clock by a global audience. While historical maps were singleuser... ...-defined constraints as well as custom objectives. The purpose of the language is to derive a target multi-scale database from a source database according to holistic specifications. (b) The Glossy SQL compiler allows Glossy SQL to be scalably executed in a spatial analytics system, such as a spatial relational... ..., there are indications that the method is scalable for databases that contain millions of records, especially if the target language of the compiler is substituted by a cluster-ready variant of SQL. While several realistic use cases for maps have been implemented in CVL, additional non-geographic data visualization uses...

  3. Scalable robotic biofabrication of tissue spheroids

    International Nuclear Information System (INIS)

    Mehesz, A Nagy; Hajdu, Z; Visconti, R P; Markwald, R R; Mironov, V; Brown, J; Beaver, W; Da Silva, J V L

    2011-01-01

    Development of methods for scalable biofabrication of uniformly sized tissue spheroids is essential for tissue spheroid-based bioprinting of large size tissue and organ constructs. The most recent scalable technique for tissue spheroid fabrication employs a micromolded recessed template prepared in a non-adhesive hydrogel, wherein the cells loaded into the template self-assemble into tissue spheroids due to gravitational force. In this study, we present an improved version of this technique. A new mold was designed to enable generation of 61 microrecessions in each well of a 96-well plate. The microrecessions were seeded with cells using an EpMotion 5070 automated pipetting machine. After 48 h of incubation, tissue spheroids formed at the bottom of each microrecession. To assess the quality of constructs generated using this technology, 600 tissue spheroids made by this method were compared with 600 spheroids generated by the conventional hanging drop method. These analyses showed that tissue spheroids fabricated by the micromolded method are more uniform in diameter. Thus, use of micromolded recessions in a non-adhesive hydrogel, combined with automated cell seeding, is a reliable method for scalable robotic fabrication of uniform-sized tissue spheroids.

  4. Scalable robotic biofabrication of tissue spheroids

    Energy Technology Data Exchange (ETDEWEB)

    Mehesz, A Nagy; Hajdu, Z; Visconti, R P; Markwald, R R; Mironov, V [Advanced Tissue Biofabrication Center, Department of Regenerative Medicine and Cell Biology, Medical University of South Carolina, Charleston, SC (United States); Brown, J [Department of Mechanical Engineering, Clemson University, Clemson, SC (United States); Beaver, W [York Technical College, Rock Hill, SC (United States); Da Silva, J V L, E-mail: mironovv@musc.edu [Renato Archer Information Technology Center-CTI, Campinas (Brazil)

    2011-06-15

    Development of methods for scalable biofabrication of uniformly sized tissue spheroids is essential for tissue spheroid-based bioprinting of large size tissue and organ constructs. The most recent scalable technique for tissue spheroid fabrication employs a micromolded recessed template prepared in a non-adhesive hydrogel, wherein the cells loaded into the template self-assemble into tissue spheroids due to gravitational force. In this study, we present an improved version of this technique. A new mold was designed to enable generation of 61 microrecessions in each well of a 96-well plate. The microrecessions were seeded with cells using an EpMotion 5070 automated pipetting machine. After 48 h of incubation, tissue spheroids formed at the bottom of each microrecession. To assess the quality of constructs generated using this technology, 600 tissue spheroids made by this method were compared with 600 spheroids generated by the conventional hanging drop method. These analyses showed that tissue spheroids fabricated by the micromolded method are more uniform in diameter. Thus, use of micromolded recessions in a non-adhesive hydrogel, combined with automated cell seeding, is a reliable method for scalable robotic fabrication of uniform-sized tissue spheroids.

  5. Engineering research and development for the Elise Heavy Ion Induction Accelerator

    International Nuclear Information System (INIS)

    Reginato, L.; Peters, C.

    1995-08-01

    The Fusion Energy Research engineering team has been conducting Research and Development Associated with the Construction (RDAC) of the Elise accelerator since the approval of Key Decision One (KD1, the start of construction). The engineering design effort has worked in close cooperation with the physics design staff to achieve all parameters of the Elise accelerator. The design included the 2 MV injector, matching section, combiner, induction cells, electric/magnetic quadrupoles, alignment system and controls. All major designs and some hardware testing will be discussed.

  6. Engineering research and development for the Elise heavy ion induction accelerator

    International Nuclear Information System (INIS)

    Reginato, L.; Peters, C.

    1996-01-01

    The fusion energy research engineering team has been conducting research and development associated with the construction of the Elise accelerator since the approval of key decision 1 (this is the start of construction). The engineering design effort has worked in close cooperation with the physics design staff to achieve all parameters of the Elise accelerator. The design included the 2 MV injector, matching section, combiner, induction cells, electric-magnetic quadrupoles, alignment system and controls. All major designs and some hardware testing will be discussed. (orig.)

  7. Architectures and Applications for Scalable Quantum Information Systems

    Science.gov (United States)

    2007-01-01

    Final Technical Report AFRL-IF-RS-TR-2007-12, January 2007. Grant FA8750-01-2-0521. Title and subtitle: Architectures and Applications for Scalable Quantum Information Systems.

  8. Extending JPEG-LS for low-complexity scalable video coding

    DEFF Research Database (Denmark)

    Ukhanova, Anna; Sergeev, Anton; Forchhammer, Søren

    2011-01-01

    JPEG-LS, the well-known international standard for lossless and near-lossless image compression, was originally designed for non-scalable applications. In this paper we propose a scalable modification of JPEG-LS and compare it with the leading image and video coding standards JPEG2000 and H.264/SVC...

  9. Application of the UKP-2-1 accelerator of heavy ions in the field of nuclear and radiation physics. Chapter 2

    International Nuclear Information System (INIS)

    2003-01-01

    The UKP-2-1 accelerator is intended for research in solid state physics, low energy nuclear physics, nuclear microanalysis, materials modification and other fields. The accelerator includes two autonomous beam transport channels joined by a common accelerating potential. One channel is intended for accelerating hydrogen and inert gas ions obtained from a duoplasmatron. The second includes a source with cesium dispersion and is intended for accelerating heavy ions. On the basis of the accelerator, a set of analytical methods such as PIXE, RBS and NRA has been developed, allowing study of the elemental composition of samples, the depth distribution of elements, and the analysis of thin-film thickness. The accelerator is used intensively in the field of inertial nuclear fusion and in studies of the Coulomb energy losses of fast protons in plasma targets. Experience with the accelerator in various environmental studies has also been gained; in particular, deuterium determination in water samples by the nuclear reaction method and the study of plutonium and uranium distribution in 'hot' particles by the proton-induced X-ray method have been developed. Beginning in 1999, a new research activity was developed on the accelerator, adapting nuclear physical analysis methods on charged particle beams to the study of biological objects. At present the accelerator hardware is not inferior to that of the world's best laboratories.

  10. NDAS Hardware Translation Layer Development

    Science.gov (United States)

    Nazaretian, Ryan N.; Holladay, Wendy T.

    2011-01-01

    The NASA Data Acquisition System (NDAS) project aims to replace all DAS software for NASA's Rocket Testing Facilities. There must be a software-hardware translation layer so the software can properly talk to the hardware. Since the hardware at each test stand varies, drivers for each stand have to be made. These drivers will act more like plugins for the software. If the software is being used at E3, then the software should point to the E3 driver package. If the software is being used at B2, then the software should point to the B2 driver package. The driver packages should also be filled with hardware drivers that are universal to the DAS system. For example, since A1, A2, and B2 all use the Preston 8300AU signal conditioners, the driver for those three stands should be the same and updated collectively.

  11. Hardware standardization for embedded systems

    International Nuclear Information System (INIS)

    Sharma, M.K.; Kalra, Mohit; Patil, M.B.; Mohanty, Ashutos; Ganesh, G.; Biswas, B.B.

    2010-01-01

    Reactor Control Division (RCnD) has been one of the main designers of safety and safety related systems for power reactors. These systems have been built using in-house developed hardware. Since the present set of hardware was designed long ago, a need was felt to design a new family of hardware boards. A Working Group on Electronics Hardware Standardization (WG-EHS) was formed with an objective to develop a family of boards, which is general purpose enough to meet the requirements of the system designers/end users. RCnD undertook the responsibility of design, fabrication and testing of boards for embedded systems. VME and a proprietary I/O bus were selected as the two system buses. The boards have been designed based on present day technology and components. The intelligence of these boards has been implemented on FPGA/CPLD using VHDL. This paper outlines the various boards that have been developed with a brief description. (author)

  12. Hardware for dynamic quantum computing.

    Science.gov (United States)

    Ryan, Colm A; Johnson, Blake R; Ristè, Diego; Donovan, Brian; Ohki, Thomas A

    2017-10-01

    We describe the hardware, gateware, and software developed at Raytheon BBN Technologies for dynamic quantum information processing experiments on superconducting qubits. In dynamic experiments, real-time qubit state information is fed back or fed forward within a fraction of the qubits' coherence time to dynamically change the implemented sequence. The hardware presented here covers both control and readout of superconducting qubits. For readout, we created a custom signal processing gateware and software stack on commercial hardware to convert pulses in a heterodyne receiver into qubit state assignments with minimal latency, alongside data taking capability. For control, we developed custom hardware with gateware and software for pulse sequencing and steering information distribution that is capable of arbitrary control flow in a fraction of superconducting qubit coherence times. Both readout and control platforms make extensive use of field programmable gate arrays to enable tailored qubit control systems in a reconfigurable fabric suitable for iterative development.

  13. Broadband accelerator control network

    International Nuclear Information System (INIS)

    Skelly, J.; Clifford, T.; Frankel, R.

    1983-01-01

    A broadband data communications network has been implemented at BNL for control of the Alternating Gradient Synchrotron (AGS) proton accelerator, using commercial CATV hardware, dual coaxial cables as the communications medium, and spanning 2.0 km. A 4 MHz bandwidth Digital Control channel using the CSMA-CA protocol is provided for digital data transmission, with 8 access nodes available over the length of the RELWAY. Each node consists of an rf modem and a microprocessor-based store-and-forward message handler which interfaces the RELWAY to a branch line implemented in GPIB. A gateway to the RELWAY control channel for the (preexisting) AGS Computerized Accelerator Operating system has been constructed using an LSI-11/23 microprocessor as a device in a GPIB branch line. A multilayer communications protocol has been defined for the Digital Control Channel, based on the ISO Open Systems Interconnect layered model, and a RELWAY Device Language has been defined as the required universal language for device control on this channel.

  14. A robust and scalable neuromorphic communication system by combining synaptic time multiplexing and MIMO-OFDM.

    Science.gov (United States)

    Srinivasa, Narayan; Zhang, Deying; Grigorian, Beayna

    2014-03-01

    This paper describes a novel architecture for enabling robust and efficient neuromorphic communication. The architecture combines two concepts: 1) synaptic time multiplexing (STM) that trades space for speed of processing to create an intragroup communication approach that is firing rate independent and offers more flexibility in connectivity than cross-bar architectures and 2) a wired multiple input multiple output (MIMO) communication with orthogonal frequency division multiplexing (OFDM) techniques to enable a robust and efficient intergroup communication for neuromorphic systems. The MIMO-OFDM concept for the proposed architecture was analyzed by simulating large-scale spiking neural network architecture. Analysis shows that the neuromorphic system with MIMO-OFDM exhibits robust and efficient communication while operating in real time with a high bit rate. Through combining STM with MIMO-OFDM techniques, the resulting system offers a flexible and scalable connectivity as well as a power and area efficient solution for the implementation of very large-scale spiking neural architectures in hardware.

  15. Hardware device binding and mutual authentication

    Science.gov (United States)

    Hamlet, Jason R; Pierson, Lyndon G

    2014-03-04

    Detection and deterrence of device tampering and subversion by substitution may be achieved by including a cryptographic unit within a computing device for binding multiple hardware devices and mutually authenticating the devices. The cryptographic unit includes a physically unclonable function ("PUF") circuit disposed in or on the hardware device, which generates a binding PUF value. The cryptographic unit uses the binding PUF value during an enrollment phase and subsequent authentication phases. During a subsequent authentication phase, the cryptographic unit uses the binding PUF values of the multiple hardware devices to generate a challenge to send to the other device, and to verify a challenge received from the other device to mutually authenticate the hardware devices.

  16. Microprocessor-based accelerating power level detector

    Energy Technology Data Exchange (ETDEWEB)

    Nagpal, M.; Zarecki, W.; Albrecht, J.C.

    1994-01-01

    An accelerating power level detector was built using state-of-the-art microprocessor technology at Powertech Labs Inc. The detector will monitor the real power flowing in two 300 kV transmission lines out of Kemano Hydroelectric Generating Station and will detect any sudden loss of load due to a fault on either line under certain pre-selected power flow conditions. This paper discusses the criteria of operation for the detector and its implementation details, including digital processing, hardware, and software.

  17. Secure coupling of hardware components

    NARCIS (Netherlands)

    Hoepman, J.H.; Joosten, H.J.M.; Knobbe, J.W.

    2011-01-01

    A method and a system for securing communication between at least a first and a second hardware component of a mobile device are described. The method includes establishing a first shared secret between the first and the second hardware components during an initialization of the mobile device and,

  18. Hardware-Efficient On-line Learning through Pipelined Truncated-Error Backpropagation in Binary-State Networks

    Directory of Open Access Journals (Sweden)

    Hesham Mostafa

    2017-09-01

    Full Text Available Artificial neural networks (ANNs) trained using backpropagation are powerful learning architectures that have achieved state-of-the-art performance in various benchmarks. Significant effort has been devoted to developing custom silicon devices to accelerate inference in ANNs. Accelerating the training phase, however, has attracted relatively little attention. In this paper, we describe a hardware-efficient on-line learning technique for feedforward multi-layer ANNs that is based on pipelined backpropagation. Learning is performed in parallel with inference in the forward pass, removing the need for an explicit backward pass and requiring no extra weight lookup. By using binary state variables in the feedforward network and ternary errors in truncated-error backpropagation, the need for any multiplications in the forward and backward passes is removed, and memory requirements for the pipelining are drastically reduced. Further reduction in addition operations owing to the sparsity in the forward neural and backpropagating error signal paths contributes to highly efficient hardware implementation. For proof-of-concept validation, we demonstrate on-line learning of MNIST handwritten digit classification on a Spartan 6 FPGA interfacing with an external 1Gb DDR2 DRAM, that shows small degradation in test error performance compared to an equivalently sized binary ANN trained off-line using standard back-propagation and exact errors. Our results highlight an attractive synergy between pipelined backpropagation and binary-state networks in substantially reducing computation and memory requirements, making pipelined on-line learning practical in deep networks.

  19. Hardware-Efficient On-line Learning through Pipelined Truncated-Error Backpropagation in Binary-State Networks.

    Science.gov (United States)

    Mostafa, Hesham; Pedroni, Bruno; Sheik, Sadique; Cauwenberghs, Gert

    2017-01-01

    Artificial neural networks (ANNs) trained using backpropagation are powerful learning architectures that have achieved state-of-the-art performance in various benchmarks. Significant effort has been devoted to developing custom silicon devices to accelerate inference in ANNs. Accelerating the training phase, however, has attracted relatively little attention. In this paper, we describe a hardware-efficient on-line learning technique for feedforward multi-layer ANNs that is based on pipelined backpropagation. Learning is performed in parallel with inference in the forward pass, removing the need for an explicit backward pass and requiring no extra weight lookup. By using binary state variables in the feedforward network and ternary errors in truncated-error backpropagation, the need for any multiplications in the forward and backward passes is removed, and memory requirements for the pipelining are drastically reduced. Further reduction in addition operations owing to the sparsity in the forward neural and backpropagating error signal paths contributes to highly efficient hardware implementation. For proof-of-concept validation, we demonstrate on-line learning of MNIST handwritten digit classification on a Spartan 6 FPGA interfacing with an external 1Gb DDR2 DRAM, that shows small degradation in test error performance compared to an equivalently sized binary ANN trained off-line using standard back-propagation and exact errors. Our results highlight an attractive synergy between pipelined backpropagation and binary-state networks in substantially reducing computation and memory requirements, making pipelined on-line learning practical in deep networks.
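    The two quantizations this work relies on, binary neuron states in the forward pass and ternary (sign-with-dead-zone) errors in the backward pass, can be sketched as simple element-wise functions. The threshold value and rounding rules below are illustrative assumptions, not the paper's exact formulation; they only show why the forward and backward passes need no multiplications.

```python
import numpy as np

def binarize_states(pre_activation):
    """Binary forward-pass states: 1 if the pre-activation is positive, else 0."""
    return (pre_activation > 0).astype(np.float32)

def ternarize_errors(error, threshold=0.05):
    """Truncate backpropagated errors to {-1, 0, +1}.

    Small errors are dropped entirely, so propagating errors backwards and
    updating weights requires only sign-controlled additions.
    """
    t = np.zeros_like(error)
    t[error > threshold] = 1.0
    t[error < -threshold] = -1.0
    return t

# Hypothetical usage
print(binarize_states(np.array([-0.3, 0.2, 1.5])))      # [0. 1. 1.]
print(ternarize_errors(np.array([0.2, -0.01, -0.4])))   # [ 1.  0. -1.]
```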

  20. Traffic and Quality Characterization of the H.264/AVC Scalable Video Coding Extension

    Directory of Open Access Journals (Sweden)

    Geert Van der Auwera

    2008-01-01

    Full Text Available The recent scalable video coding (SVC) extension to the H.264/AVC video coding standard has unprecedented compression efficiency while supporting a wide range of scalability modes, including temporal, spatial, and quality (SNR) scalability, as well as combined spatiotemporal SNR scalability. The traffic characteristics, especially the bit rate variabilities, of the individual layer streams critically affect their network transport. We study the SVC traffic statistics, including the bit rate distortion and bit rate variability distortion, with long CIF resolution video sequences and compare them with the corresponding MPEG-4 Part 2 traffic statistics. We consider (i) temporal scalability with three temporal layers, (ii) spatial scalability with a QCIF base layer and a CIF enhancement layer, as well as (iii) quality scalability modes FGS and MGS. We find that the significant improvement in RD efficiency of SVC is accompanied by substantially higher traffic variabilities as compared to the equivalent MPEG-4 Part 2 streams. We find that separately analyzing the traffic of temporal-scalability only encodings gives reasonable estimates of the traffic statistics of the temporal layers embedded in combined spatiotemporal encodings and in the base layer of combined FGS-temporal encodings. Overall, we find that SVC achieves significantly higher compression ratios than MPEG-4 Part 2, but produces unprecedented levels of traffic variability, thus presenting new challenges for the network transport of scalable video.
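    Bit rate variability of the kind discussed here is commonly summarized by the coefficient of variation of the encoded frame sizes. The sketch below is only a basic illustration of that computation; the frame sizes, frame rate and function name are hypothetical, and this is not the paper's variability-distortion metric.

```python
import numpy as np

def bitrate_statistics(frame_sizes_bytes, fps=30.0):
    """Mean bit rate (bit/s) and coefficient of variation (CoV) of frame sizes.

    A higher CoV indicates burstier traffic, which is harder to transport.
    """
    sizes = np.asarray(frame_sizes_bytes, dtype=float)
    mean_bitrate = sizes.mean() * 8.0 * fps        # bytes/frame -> bits/second
    cov = sizes.std() / sizes.mean()               # dimensionless variability
    return mean_bitrate, cov

# Hypothetical frame-size trace (bytes) for a short group of pictures
print(bitrate_statistics([12000, 3000, 2500, 3200, 11000, 2800]))
```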

  1. IKONET: distributed accelerator and experiment control

    International Nuclear Information System (INIS)

    Koldewijn, P.

    1986-01-01

    IKONET is a network consisting of some 35 computers used to control the 500 MeV Medium Energy Amsterdam electron accelerator (MEA) and its various experiments. The control system is distributed over a whole variety of machines, which are combined in a transparent, centrally oriented network. The local hardware is switched and tuned via CAMAC by a series of mini-computers with a real-time multitask operating system. Larger systems provide central intelligence for the higher-level control layers. An image of the complete accelerator settings is maintained by central database administrators. Different operator facilities handle touchpanels, multi-purpose knobs and graphical displays. The network provides remote login facilities and file servers. On the basis of the present layout, an overview is given of future developments for subsystems of the network. (Auth.)

  2. Scalable, full-colour and controllable chromotropic plasmonic printing

    OpenAIRE

    Xue, Jiancai; Zhou, Zhang-Kai; Wei, Zhiqiang; Su, Rongbin; Lai, Juan; Li, Juntao; Li, Chao; Zhang, Tengwei; Wang, Xue-Hua

    2015-01-01

    Plasmonic colour printing has drawn wide attention as a promising candidate for the next-generation colour-printing technology. However, an efficient approach to realize full colour and scalable fabrication is still lacking, which prevents plasmonic colour printing from practical applications. Here we present a scalable and full-colour plasmonic printing approach by combining conjugate twin-phase modulation with a plasmonic broadband absorber. More importantly, our approach also demonstrates ...

  3. Future directions in controlling the LAMPF-PSR Accelerator Complex at Los Alamos National Laboratory

    International Nuclear Information System (INIS)

    Stuewe, R.; Schaller, S.; Bjorklund, E.; Burns, M.; Callaway, T.; Carr, G.; Cohen, S.; Kubicek, D.; Harrington, M.; Poore, R.; Schultz, D.

    1991-01-01

    Four interrelated projects are underway whose purpose is to migrate the LAMPF-PSR Accelerator Complex control systems to a system with a common set of hardware and software components. Project goals address problems in performance, maintenance and growth potential. Front-end hardware, operator interface hardware and software, computer systems, network systems and data system software are being simultaneously upgraded as part of these efforts. The efforts are being coordinated to provide for a smooth and timely migration to a client-server model-based data acquisition and control system. An increased use of distributed intelligence at both the front-end and the operator interface is a key element of the projects. 2 refs., 2 figs

  4. Temporal scalability comparison of the H.264/SVC and distributed video codec

    DEFF Research Database (Denmark)

    Huang, Xin; Ukhanova, Ann; Belyaev, Evgeny

    2009-01-01

    The problem of scalable video streaming for multimedia is a current topic of interest. There exist many methods for scalable video coding. This paper is focused on the scalable extension of H.264/AVC (H.264/SVC) and distributed video coding (DVC). The paper presents an efficiency comparison of SV...

  5. Scalable and near-optimal design space exploration for embedded systems

    CERN Document Server

    Kritikakou, Angeliki; Goutis, Costas

    2014-01-01

    This book describes scalable and near-optimal, processor-level design space exploration (DSE) methodologies.  The authors present design methodologies for data storage and processing in real-time, cost-sensitive data-dominated embedded systems.  Readers will be enabled to reduce time-to-market, while satisfying system requirements for performance, area, and energy consumption, thereby minimizing the overall cost of the final design.   • Describes design space exploration (DSE) methodologies for data storage and processing in embedded systems, which achieve near-optimal solutions with scalable exploration time; • Presents a set of principles and the processes which support the development of the proposed scalable and near-optimal methodologies; • Enables readers to apply scalable and near-optimal methodologies to the intra-signal in-place optimization step for both regular and irregular memory accesses.

  6. Real-Time Simulation and Hardware-in-the-Loop Testbed for Distribution Synchrophasor Applications

    Directory of Open Access Journals (Sweden)

    Matthias Stifter

    2018-04-01

    Full Text Available With the advent of Distribution Phasor Measurement Units (D-PMUs) and Micro-Synchrophasors (Micro-PMUs), situational awareness in power distribution systems is going to the next level using time-synchronization. However, designing, analyzing, and testing such accurate measurement devices are still challenging. Due to the lack of available knowledge and sufficient history for synchrophasor applications at the power distribution level, realistic simulation and validation environments are essential for D-PMU development and deployment. This paper presents a vendor-agnostic PMU real-time simulation and hardware-in-the-loop (PMU-RTS-HIL) testbed, which helps in the validation and study of multiple PMUs. The network of real and virtual PMUs was built in a fully time-synchronized environment for PMU application validation. The proposed testbed also includes an emulated communication network (CNS) layer to replicate the bandwidth, packet loss and collision conditions inherent to PMU data streams. Experimental results demonstrate the flexibility and scalability of the developed PMU-RTS-HIL testbed by producing large amounts of measurements under typical normal and abnormal distribution grid operation conditions.

  7. Software performance and scalability a quantitative approach

    CERN Document Server

    Liu, Henry H

    2009-01-01

    Praise from the Reviewers: "The practicality of the subject in a real-world situation distinguishes this book from others available on the market."—Professor Behrouz Far, University of Calgary. "This book could replace the computer organization texts now in use that every CS and CpE student must take. . . . It is much needed, well written, and thoughtful."—Professor Larry Bernstein, Stevens Institute of Technology. A distinctive, educational text on software performance and scalability. This is the first book to take a quantitative approach to the subject of software performance and scalability

  8. A Scalable proxy cache for Grid Data Access

    International Nuclear Information System (INIS)

    Cristian Cirstea, Traian; Just Keijser, Jan; Arthur Koeroo, Oscar; Starink, Ronald; Alan Templon, Jeffrey

    2012-01-01

    We describe a prototype grid proxy cache system developed at Nikhef, motivated by a desire to construct the first building block of a future https-based Content Delivery Network for grid infrastructures. Two goals drove the project: firstly to provide a “native view” of the grid for desktop-type users, and secondly to improve performance for physics-analysis type use cases, where multiple passes are made over the same set of data (residing on the grid). We further constrained the design by requiring that the system should be made of standard components wherever possible. The prototype that emerged from this exercise is a horizontally-scalable, cooperating system of web server / cache nodes, fronted by a customized webDAV server. The webDAV server is custom only in the sense that it supports http redirects (providing horizontal scaling) and that the authentication module has, as back end, a proxy delegation chain that can be used by the cache nodes to retrieve files from the grid. The prototype was deployed at Nikhef and tested at a scale of several terabytes of data and approximately one hundred fast cores of computing. Both small and large files were tested, in a number of scenarios, and with various numbers of cache nodes, in order to understand the scaling properties of the system. For properly-dimensioned cache-node hardware, the system showed speedup of several integer factors for the analysis-type use cases. These results and others are presented and discussed.

  9. High intensity proton accelerator controls network upgrade

    International Nuclear Information System (INIS)

    Krempaska, R.; Bertrand, A.; Lendzian, F.; Lutz, H.

    2012-01-01

    The High Intensity Proton Accelerator (HIPA) control system network is spread over a vast area at PSI and had grown historically in an unorganized way. The miscellaneous network hardware infrastructure and the lack of documentation and of an overview of components could no longer guarantee the reliability of the control system and of facility operation. Therefore a new network was needed, based on a modern network topology and standard PSI hardware, with monitoring, detailed documentation and a complete overview. The number of active components has been reduced from 25 to 9 Cisco Catalyst 24- or 48-port switches. These are of the same type as other PSI switches, so keeping a replacement emergency stock is no longer an issue. We present how we successfully achieved this goal and the advantages of a clean and well documented network infrastructure. (authors)

  10. High intensity proton acceleration at the Brookhaven AGS -- An update

    International Nuclear Information System (INIS)

    Ahrens, L.; Alessi, J.; Blaskiewicz, M.

    1997-01-01

    The AGS accelerator complex is in its third year of 60+ x 10^12 protons (teraproton = Tp) per cycle operation. The hardware making up the complex as configured in 1997 is briefly mentioned. The present level of accelerator performance is discussed. This includes beam transfer efficiencies at each step in the acceleration process, i.e. losses, which are a serious issue at this intensity level. Progress made in understanding beam behavior at Linac-to-Booster (LtB) injection, at the Booster-to-AGS (BtA) transfer, as well as across the 450 ms AGS accumulation porch is presented. The state of transition crossing, with the gamma-tr jump, is described. Coherent effects, including those driven by space charge, are important at all of these steps.

  11. Generic algorithms for high performance scalable geocomputing

    Science.gov (United States)

    de Jong, Kor; Schmitz, Oliver; Karssenberg, Derek

    2016-04-01

    During the last decade, the characteristics of computing hardware have changed a lot. For example, instead of a single general purpose CPU core, personal computers nowadays contain multiple cores per CPU and often general purpose accelerators, like GPUs. Additionally, compute nodes are often grouped together to form clusters or a supercomputer, providing enormous amounts of compute power. For existing earth simulation models to be able to use modern hardware platforms, their compute intensive parts must be rewritten. This can be a major undertaking and may involve many technical challenges. Compute tasks must be distributed over CPU cores, offloaded to hardware accelerators, or distributed to different compute nodes. And ideally, all of this should be done in such a way that the compute task scales well with the hardware resources. This presents two challenges: 1) how to make good use of all the compute resources and 2) how to make these compute resources available for developers of simulation models, who may not (want to) have the required technical background for distributing compute tasks. The first challenge requires the use of specialized technology (e.g.: threads, OpenMP, MPI, OpenCL, CUDA). The second challenge requires the abstraction of the logic handling the distribution of compute tasks from the model-specific logic, hiding the technical details from the model developer. To assist the model developer, we are developing a C++ software library (called Fern) containing algorithms that can use all CPU cores available in a single compute node (distributing tasks over multiple compute nodes will be done at a later stage). The algorithms are grid-based (finite difference) and include local and spatial operations such as convolution filters. The algorithms handle distribution of the compute tasks to CPU cores internally. In the resulting model, the low-level details of how this is done are separated from the model-specific logic representing the modeled system.
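    Fern itself is a C++ library, but the idea of hiding task distribution behind a grid-algorithm call can be sketched in Python. Everything below is an illustrative assumption rather than Fern's API: the caller asks for a 3x3 mean (convolution-like) filter, and the helper splits the raster into row blocks that a worker pool processes, so the model code never deals with threads directly.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def _mean3x3_block(padded, r0, r1):
    """3x3 mean filter for output rows r0..r1-1, given an edge-padded raster."""
    out = np.empty((r1 - r0, padded.shape[1] - 2))
    for i, r in enumerate(range(r0, r1)):
        window = padded[r:r + 3]                   # three raster rows
        out[i] = (window[:, :-2] + window[:, 1:-1] + window[:, 2:]).sum(axis=0) / 9.0
    return out

def mean_filter_parallel(raster, workers=4):
    """Apply a 3x3 mean filter, distributing row blocks over a worker pool."""
    padded = np.pad(raster, 1, mode='edge')
    bounds = np.linspace(0, raster.shape[0], workers + 1, dtype=int)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda b: _mean3x3_block(padded, b[0], b[1]),
                          zip(bounds[:-1], bounds[1:]))
        return np.vstack(list(blocks))

print(mean_filter_parallel(np.arange(36, dtype=float).reshape(6, 6)))
```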

  12. A SOPC-BASED Evaluation of AES for 2.4 GHz Wireless Network

    Science.gov (United States)

    Ken, Cai; Xiaoying, Liang

    In modern systems, data security is needed more than ever before, and many cryptographic algorithms are utilized for security services. Wireless Sensor Networks (WSN) are an example of such technologies. In this paper an innovative SOPC-based approach to evaluating security services in WSN is proposed that addresses the issues of scalability, flexible performance, and silicon efficiency for the hardware acceleration of the encryption system. The design includes a Nios II processor together with custom-designed modules for the Advanced Encryption Standard (AES), which has become the default choice for security services in numerous applications. The objective of this mechanism is to present an efficient hardware realization of AES in a hardware description language (Verilog HDL) and to expand its usability for various applications. Compared to a traditional custom processor design, the mechanism provides a very broad range of cost/performance points.

  13. The Impact of Flight Hardware Scavenging on Space Logistics

    Science.gov (United States)

    Oeftering, Richard C.

    2011-01-01

    For a given fixed launch vehicle capacity the logistics payload delivered to the moon may be only roughly 20 percent of the payload delivered to the International Space Station (ISS). This is compounded by the much lower flight frequency to the moon and thus low availability of spares for maintenance. This implies that lunar hardware is much more scarce and more costly per kilogram than ISS and thus there is much more incentive to preserve hardware. The Constellation Lunar Surface System (LSS) program is considering ways of utilizing hardware scavenged from vehicles including the Altair lunar lander. In general, the hardware will have only had a matter of hours of operation yet there may be years of operational life remaining. By scavenging this hardware the program, in effect, is treating vehicle hardware as part of the payload. Flight hardware may provide logistics spares for system maintenance and reduce the overall logistics footprint. This hardware has a wide array of potential applications including expanding the power infrastructure, and exploiting in-situ resources. Scavenging can also be seen as a way of recovering the value of, literally, billions of dollars worth of hardware that would normally be discarded. Scavenging flight hardware adds operational complexity and steps must be taken to augment the crew s capability with robotics, capabilities embedded in flight hardware itself, and external processes. New embedded technologies are needed to make hardware more serviceable and scavengable. Process technologies are needed to extract hardware, evaluate hardware, reconfigure or repair hardware, and reintegrate it into new applications. This paper also illustrates how scavenging can be used to drive down the cost of the overall program by exploiting the intrinsic value of otherwise discarded flight hardware.

  14. Dealing with post-accelerated electrons in the ITER SINGAP accelerator

    International Nuclear Information System (INIS)

    Esch, H. de; Hemsworth, R.S.

    2006-01-01

    Electrons formed by stripping of the negative deuterium beam can be accelerated up to 960 keV in the 1 MeV SINGAP 40 A negative ion accelerator proposed by Europe for the ITER neutral beam injectors. SINGAP accelerates 1280 pre-accelerated 40 keV deuterium beamlets to 1 MeV in a single 350 mm wide gap. At the expected gas pressure of 0.03 Pa inside the accelerator, 2.7 MW of electrons are calculated to leave the accelerator and strike various beamline components, especially the neutraliser. The accelerators of the ITER injectors are designed to produce 4 "column" beams which pass through the 4 vertical channels of the neutraliser. Unperturbed, the accelerated electrons create small, high power density (3.3 kW/cm^2) spots on the leading edges of the neutraliser channels, which is far in excess of their power handling capability. The hot spots arise from the overlapping of beamlets due to the bending induced by the far field of the magnetic filter in the ion source. The proposed solution bends the electrons further downwards, redistributing the power over the neutraliser floor, a vertical electron dump perpendicular to the beam axis located below the neutraliser entrance, and the neutraliser entrance. The bending is to be effected by a magnetic field transverse to the beam direction at the exit of the post-acceleration grid. This field is created by vertical columns of permanent magnets either side of each column beam. After passing between the magnet columns, the electron beams reach the electron dump with a maximum power density of 2.1 kW/cm^2. The peak power density on the neutraliser entrance is 1.35 kW/cm^2 and on the neutraliser floor 0.82 kW/cm^2. Electron backscattering would reduce all the numbers by 20%. To further reduce the average power density seen by the beamline components it is proposed to sweep the electron beam in an oscillatory fashion. It is suggested that a failsafe, inexpensive way is to use a power supply with a ripple of ± 10% to

  15. Quality Scalability Compression on Single-Loop Solution in HEVC

    Directory of Open Access Journals (Sweden)

    Mengmeng Zhang

    2014-01-01

    Full Text Available This paper proposes a quality scalable extension design for the upcoming high efficiency video coding (HEVC) standard. In the proposed design, the single-loop decoder solution is extended into the proposed scalable scenario. A novel interlayer intra/interprediction is added to reduce the number of bits required for representation by exploiting the correlation between coding layers. The experimental results indicate that an average Bjøntegaard delta rate decrease of 20.50% can be gained compared with simulcast encoding. The proposed technique achieved a 47.98% Bjøntegaard delta rate reduction compared with the scalable video coding extension of H.264/AVC. Consequently, significant rate savings confirm that the proposed method achieves better performance.

  16. From experiment to design -- Fault characterization and detection in parallel computer systems using computational accelerators

    Science.gov (United States)

    Yim, Keun Soo

    This dissertation summarizes experimental validation and co-design studies conducted to optimize the fault detection capabilities and overheads in hybrid computer systems (e.g., using CPUs and Graphics Processing Units, or GPUs), and consequently to improve the scalability of parallel computer systems using computational accelerators. The experimental validation studies were conducted to help us understand the failure characteristics of CPU-GPU hybrid computer systems under various types of hardware faults. The main characterization targets were faults that are difficult to detect and/or recover from, e.g., faults that cause long latency failures (Ch. 3), faults in dynamically allocated resources (Ch. 4), faults in GPUs (Ch. 5), faults in MPI programs (Ch. 6), and microarchitecture-level faults with specific timing features (Ch. 7). The co-design studies were based on the characterization results. One of the co-designed systems has a set of source-to-source translators that customize and strategically place error detectors in the source code of target GPU programs (Ch. 5). Another co-designed system uses an extension card to learn the normal behavioral and semantic execution patterns of message-passing processes executing on CPUs, and to detect abnormal behaviors of those parallel processes (Ch. 6). The third co-designed system is a co-processor that has a set of new instructions in order to support software-implemented fault detection techniques (Ch. 7). The work described in this dissertation gains more importance because heterogeneous processors have become an essential component of state-of-the-art supercomputers. GPUs were used in three of the five fastest supercomputers that were operating in 2011. Our work included comprehensive fault characterization studies in CPU-GPU hybrid computers. In CPUs, we monitored the target systems for a long period of time after injecting faults (a temporally comprehensive experiment), and injected faults into various types of

  17. SOL: A Library for Scalable Online Learning Algorithms

    OpenAIRE

    Wu, Yue; Hoi, Steven C. H.; Liu, Chenghao; Lu, Jing; Sahoo, Doyen; Yu, Nenghai

    2016-01-01

    SOL is an open-source library for scalable online learning algorithms, and is particularly suitable for learning with high-dimensional data. The library provides a family of regular and sparse online learning algorithms for large-scale binary and multi-class classification tasks with high efficiency, scalability, portability, and extensibility. SOL is implemented in C++ and provided with a collection of easy-to-use command-line tools, Python wrappers and library calls for users and develope...
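
    The abstract does not show the library's interface, so the following is only a generic sketch of the kind of sparse online-learning step such a library provides (hypothetical function name, not SOL's API): one update touches only the nonzero features of an example and soft-thresholds the weights to keep the model sparse.

        import math

        def sparse_online_update(w, x, y, eta=0.1, lam=1e-4):
            # One online step for logistic loss: w is a dict of weights, x a dict of
            # nonzero features, y in {-1, +1}. Only the touched features are updated.
            margin = y * sum(w.get(i, 0.0) * v for i, v in x.items())
            g = -y / (1.0 + math.exp(margin))            # d(loss)/d(margin) for logistic loss
            for i, v in x.items():
                wi = w.get(i, 0.0) - eta * g * v         # gradient step on this feature
                w[i] = math.copysign(max(abs(wi) - eta * lam, 0.0), wi)  # L1 soft-threshold
            return w

        w = {}
        w = sparse_online_update(w, {3: 1.0, 17: 0.5}, y=1)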

  18. Constructing Hardware in a Scala Embedded Language

    Energy Technology Data Exchange (ETDEWEB)

    2014-08-21

    Chisel is a new open-source hardware construction language developed at UC Berkeley that supports advanced hardware design using highly parameterized generators and layered domain-specific hardware languages. Chisel is embedded in the Scala programming language, which raises the level of hardware design abstraction by providing concepts including object orientation, functional programming, parameterized types, and type inference. From the same source, Chisel can generate a high-speed C++-based cycle-accurate software simulator, or low-level Verilog designed to pass on to standard ASIC or FPGA tools for synthesis and place and route.

  19. GPU-Accelerated Real-Time Surveillance De-Weathering

    OpenAIRE

    Pettersson, Niklas

    2013-01-01

    A fully automatic de-weathering system to increase the visibility/stability in surveillance applications during bad weather has been developed. Rain, snow and haze during daylight are handled in real time with acceleration from CUDA-implemented algorithms. Video from fixed cameras is processed on a PC with no need for special hardware other than an NVIDIA GPU. The system does not use any background model and does not require any precalibration. Increase in contrast is obtained in all h...

  20. A scalable distributed RRT for motion planning

    KAUST Repository

    Jacobs, Sam Ade

    2013-05-01

    Rapidly-exploring Random Tree (RRT), like other sampling-based motion planning methods, has been very successful in solving motion planning problems. Even so, sampling-based planners cannot solve all problems of interest efficiently, so attention is increasingly turning to parallelizing them. However, one challenge in parallelizing RRT is the global computation and communication overhead of nearest neighbor search, a key operation in RRTs. This is a critical issue as it limits the scalability of previous algorithms. We present two parallel algorithms to address this problem. The first algorithm extends existing work by introducing a parameter that adjusts how much local computation is done before a global update. The second algorithm radially subdivides the configuration space into regions, constructs a portion of the tree in each region in parallel, and connects the subtrees, removing cycles if they exist. By subdividing the space, we increase computation locality, enabling a scalable result. We show that our approaches are scalable. We present results demonstrating almost linear scaling to hundreds of processors on a Linux cluster and a Cray XE6 machine. © 2013 IEEE.
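
    A minimal sketch of the radial subdivision idea described above (illustrative only; planar configurations and hypothetical names are assumed, whereas the actual planner works in higher-dimensional C-spaces and also handles region overlap and subtree connection): each configuration is mapped to an angular wedge around a chosen center, and each parallel worker grows its subtree only from samples in its own wedge.

        import math

        def radial_region(q, center, num_regions):
            # Map a 2-D configuration q to one of num_regions angular wedges
            # around center; each wedge is handled by one parallel worker.
            angle = math.atan2(q[1] - center[1], q[0] - center[0]) % (2 * math.pi)
            return int(angle / (2 * math.pi / num_regions))

        samples = [(1.0, 2.0), (-3.0, 0.5), (0.2, -4.0)]
        owners = [radial_region(q, center=(0.0, 0.0), num_regions=8) for q in samples]
        # Subtrees grown in each wedge are later connected across wedge
        # boundaries and any resulting cycles are removed.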

  1. A scalable distributed RRT for motion planning

    KAUST Repository

    Jacobs, Sam Ade; Stradford, Nicholas; Rodriguez, Cesar; Thomas, Shawna; Amato, Nancy M.

    2013-01-01

    Rapidly-exploring Random Tree (RRT), like other sampling-based motion planning methods, has been very successful in solving motion planning problems. Even so, sampling-based planners cannot solve all problems of interest efficiently, so attention is increasingly turning to parallelizing them. However, one challenge in parallelizing RRT is the global computation and communication overhead of nearest neighbor search, a key operation in RRTs. This is a critical issue as it limits the scalability of previous algorithms. We present two parallel algorithms to address this problem. The first algorithm extends existing work by introducing a parameter that adjusts how much local computation is done before a global update. The second algorithm radially subdivides the configuration space into regions, constructs a portion of the tree in each region in parallel, and connects the subtrees, removing cycles if they exist. By subdividing the space, we increase computation locality, enabling a scalable result. We show that our approaches are scalable. We present results demonstrating almost linear scaling to hundreds of processors on a Linux cluster and a Cray XE6 machine. © 2013 IEEE.

  2. A new tool for accelerator system modeling and analysis

    International Nuclear Information System (INIS)

    Gillespie, G.H.; Hill, B.W.; Jameson, R.A.

    1994-01-01

    A novel computer code is being developed to generate system level designs of radiofrequency ion accelerators. The goal of the Accelerator System Model (ASM) code is to create a modeling and analysis tool that is easy to use, automates many of the initial design calculations, supports trade studies used in assessing alternate designs and yet is flexible enough to incorporate new technology concepts as they emerge. Hardware engineering parameters and beam dynamics are modeled at comparable levels of fidelity. Existing scaling models of accelerator subsystems were used to produce a prototype of ASM (version 1.0) working within the Shell for Particle Accelerator Related Codes (SPARC) graphical user interface. A small user group has been testing and evaluating the prototype for about a year. Several enhancements and improvements are now being developed. The current version (1.1) of ASM is briefly described and an example of the modeling and analysis capabilities is illustrated.

  3. Hardware Objects for Java

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Thalinger, Christian; Korsholm, Stephan

    2008-01-01

    Java, as a safe and platform independent language, avoids access to low-level I/O devices or direct memory access. In standard Java, low-level I/O is not a concern; it is handled by the operating system. However, in the embedded domain resources are scarce and a Java virtual machine (JVM) without an underlying middleware is an attractive architecture. When running the JVM on bare metal, we need access to I/O devices from Java; therefore we investigate a safe and efficient mechanism to represent I/O devices as first class Java objects, where device registers are represented by object fields. Access to those registers is safe as Java’s type system regulates it. The access is also fast as it is directly performed by the bytecodes getfield and putfield. Hardware objects thus provide an object-oriented abstraction of low-level hardware devices. As a proof of concept, we have implemented hardware objects

  4. Upgrade of the TOTEM DAQ using the Scalable Readout System (SRS)

    International Nuclear Information System (INIS)

    Quinto, M; Cafagna, F; Fiergolski, A; Radicioni, E

    2013-01-01

    The main goals of the TOTEM Experiment at the LHC are the measurements of the elastic and total p-p cross sections and the studies of the diffractive dissociation processes. At LHC, collisions are produced at a rate of 40 MHz, imposing strong requirements for the Data Acquisition Systems (DAQ) in terms of trigger rate and data throughput. The TOTEM DAQ adopts a modular approach that, in standalone mode, is based on the VME bus system. The VME based Front End Driver (FED) modules host mezzanines that receive data through optical fibres directly from the detectors. After data checks and formatting are applied in the mezzanine, data is retransmitted to the VME interface and to another mezzanine card plugged in the FED module. The VME bus maximum bandwidth limits the maximum first level trigger (L1A) to 1 kHz rate. In order to get rid of the VME bottleneck and improve scalability and the overall capabilities of the DAQ, a new system was designed and constructed based on the Scalable Readout System (SRS), developed in the framework of the RD51 Collaboration. The project aims to increase the efficiency of the current readout system by providing higher bandwidth and increasing data filtering, implementing a second-level trigger event selection based on hardware pattern recognition algorithms. This goal is to be achieved while preserving maximum backward compatibility with the LHC Timing, Trigger and Control (TTC) system as well as with the CMS DAQ. The obtained results and the perspectives of the project are reported. In particular, we describe the system architecture and the new Opto-FEC adapter card developed to connect the SRS with the FED mezzanine modules. A first test bench was built and validated during the last TOTEM data taking period (February 2013). Readout of a set of 3 TOTEM Roman Pot silicon detectors was carried out to verify performance in the real LHC environment. In addition, the test allowed a check of data consistency and quality

  5. Hardware Development Process for Human Research Facility Applications

    Science.gov (United States)

    Bauer, Liz

    2000-01-01

    The simple goal of the Human Research Facility (HRF) is to conduct human research experiments on the International Space Station (ISS) astronauts during long-duration missions. This is accomplished by providing integration and operation of the necessary hardware and software capabilities. A typical hardware development flow consists of five stages: functional inputs and requirements definition, market research, design life cycle through hardware delivery, crew training, and mission support. The purpose of this presentation is to guide the audience through the early hardware development process: requirement definition through selecting a development path. Specific HRF equipment is used to illustrate the hardware development paths. The source of hardware requirements is the science community and HRF program. The HRF Science Working Group, consisting of scientists from various medical disciplines, defined a basic set of equipment with functional requirements. This established the performance requirements of the hardware. HRF program requirements focus on making the hardware safe and operational in a space environment. This includes structural, thermal, human factors, and material requirements. Science and HRF program requirements are defined in a hardware requirements document which includes verification methods. Once the hardware is fabricated, requirements are verified by inspection, test, analysis, or demonstration. All data is compiled and reviewed to certify the hardware for flight. Obviously, the basis for all hardware development activities is requirement definition. Full and complete requirement definition is ideal prior to initiating the hardware development. However, this is generally not the case, but the hardware team typically has functional inputs as a guide. The first step is for engineers to conduct market research based on the functional inputs provided by scientists. Commercially available products are evaluated against the science requirements as

  6. GPU-Accelerated Text Mining

    International Nuclear Information System (INIS)

    Cui, X.; Mueller, F.; Zhang, Y.; Potok, Thomas E.

    2009-01-01

    Hardware accelerator devices hold novel promise for improving performance in many problem domains, but it is not clear which accelerators are suitable for which domains. While there is no room in general-purpose processor design to significantly increase the processor frequency, developers are instead resorting to multi-core chips duplicating conventional computing capabilities on a single die. Yet, accelerators offer more radical designs with a much higher level of parallelism and novel programming environments. The present work assesses the viability of text mining on CUDA. Text mining is one of the key concepts that has become prominent as an effective means to index the Internet, but its applications range beyond this scope and extend to providing document similarity metrics, the subject of this work. We have developed and optimized text search algorithms for GPUs to exploit their potential for massive data processing. We discuss the algorithmic challenges of parallelization for text search problems on GPUs and demonstrate the potential of these devices in experiments by reporting significant speedups. Our study may be one of the first to assess more complex text search problems for suitability for GPU devices, and it may also be one of the first to exploit and report on atomic instruction usage that has recently become available in NVIDIA devices.

  7. CloudTPS: Scalable Transactions for Web Applications in the Cloud

    NARCIS (Netherlands)

    Zhou, W.; Pierre, G.E.O.; Chi, C.-H.

    2010-01-01

    NoSQL Cloud data services provide scalability and high availability properties for web applications but at the same time they sacrifice data consistency. However, many applications cannot afford any data inconsistency. CloudTPS is a scalable transaction manager to allow cloud database services to

  8. Advanced visualization technology for terascale particle accelerator simulations

    International Nuclear Information System (INIS)

    Ma, K-L; Schussman, G.; Wilson, B.; Ko, K.; Qiang, J.; Ryne, R.

    2002-01-01

    This paper presents two new hardware-assisted rendering techniques developed for interactive visualization of the terascale data generated from numerical modeling of next generation accelerator designs. The first technique, based on a hybrid rendering approach, makes possible interactive exploration of large-scale particle data from particle beam dynamics modeling. The second technique, based on a compact texture-enhanced representation, exploits the advanced features of commodity graphics cards to achieve perceptually effective visualization of the very dense and complex electromagnetic fields produced from the modeling of reflection and transmission properties of open structures in an accelerator design. Because of the collaborative nature of the overall accelerator modeling project, the visualization technology developed is for both desktop and remote visualization settings. We have tested the techniques using both time-varying particle data sets containing up to one billion particles per time step and electromagnetic field data sets with millions of mesh elements.

  9. Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

    Science.gov (United States)

    Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N

    2009-06-01

    One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of currently available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site (http://proteomics.mcw.edu/vipdac).

  10. Induction accelerator development for heavy ion fusion

    International Nuclear Information System (INIS)

    Reginato, L.L.

    1993-05-01

    For approximately a decade, the Heavy Ion Fusion Accelerator Research (HIFAR) group at LBL has been exploring the use of induction accelerators with multiple beams as the driver for inertial fusion targets. Scaled experiments have investigated the transport of space charge dominated beams (SBTE), and the current amplification and transverse emittance control in induction linacs (MBE-4) with very encouraging results. In order to study many of the beam manipulations required by a driver and to further develop economically competitive technology, a proposal has been made in partnership with LLNL to build a 10 MeV accelerator and to conduct a series of experiments collectively called the Induction Linac System Experiments (ILSE). The major components critical to the ILSE accelerator are currently under development. We have constructed a full scale induction module and we have tested a number of amorphous magnetic materials developed by Allied Signal to establish an overall optimal design. The electric and magnetic quadrupoles critical to the transport and focusing of heavy ion beams are also under development. The hardware is intended to be economically competitive for a driver without sacrificing any of the physics or performance requirements. This paper will concentrate on the recent developments and tests of the major components required by the ILSE accelerator

  11. Induction accelerator development for heavy ion fusion

    International Nuclear Information System (INIS)

    Reginato, L.L.

    1993-05-01

    For approximately a decade, the Heavy Ion Fusion Accelerator Research (HIFAR) group at LBL has been exploring the use of induction accelerators with multiple beams as the driver for inertial fusion targets. Scaled experiments have investigated the transport of space charge dominated beams (SBTE), and the current amplification and transverse emittance control in induction linacs (MBE-4) with very encouraging results. In order to study many of the beam manipulations required by a driver and to further develop economically competitive technology, a proposal has been made in partnership with LLNL to build a 10 MeV accelerator and to conduct a series of experiments collectively called the Induction Linac System Experiments (ILSE). The major components critical to the ILSE accelerator are currently under development. We have constructed a full scale induction module and we have tested a number of amorphous magnetic materials developed by Allied Signal to establish an overall optimal design. The electric and magnetic quadrupoles critical to the transport and focusing of heavy ion beams are also under development. The hardware is intended to be economically competitive for a driver without sacrificing any of the physics or performance requirements. This paper will concentrate on the recent developments and tests of the major components required by the ILSE accelerator

  12. Production, Characterization, and Acceleration of Optical Microbunches

    Energy Technology Data Exchange (ETDEWEB)

    Sears, Christopher M.S. [Stanford Univ., CA (United States)

    2008-06-20

    Optical microbunches with a spacing of 800 nm have been produced for laser acceleration research. The microbunches are produced using an inverse Free-Electron-Laser (IFEL) followed by a dispersive chicane. The microbunched electron beam is characterized by coherent optical transition radiation (COTR) with good agreement to the analytic theory for bunch formation. In a second experiment the bunches are accelerated in a second stage to achieve for the first time direct net acceleration of electrons traveling in a vacuum with visible light. This dissertation presents the theory of microbunch formation and characterization of the microbunches. It also presents the design of the experimental hardware from magnetostatic and particle tracking simulations, to fabrication and measurement of the undulator and chicane magnets. Finally, the dissertation discusses three experiments aimed at demonstrating the IFEL interaction, microbunch production, and the net acceleration of the microbunched beam. At the close of the dissertation, a separate but related research effort on the tight focusing of electrons for coupling into optical scale, Photonic Bandgap, structures is presented. This includes the design and fabrication of a strong focusing permanent magnet quadrupole triplet and an outline of an initial experiment using the triplet to observe wakefields generated by an electron beam passing through an optical scale accelerator.

  13. From Digital Disruption to Business Model Scalability

    DEFF Research Database (Denmark)

    Nielsen, Christian; Lund, Morten; Thomsen, Peter Poulsen

    2017-01-01

    This article discusses the terms disruption, digital disruption, business models and business model scalability. It illustrates how managers should be using these terms for the benefit of their business by developing business models capable of achieving exponentially increasing returns to scale...... will seldom lead to business model scalability capable of competing with digital disruption(s)....... as a response to digital disruption. A series of case studies illustrate that besides frequent existing messages in the business literature relating to the importance of creating agile businesses, both in growing and declining economies, as well as hard to copy value propositions or value propositions that take...

  14. Development of a simple, low cost, indirect ion beam fluence measurement system for ion implanters, accelerators

    Science.gov (United States)

    Suresh, K.; Balaji, S.; Saravanan, K.; Navas, J.; David, C.; Panigrahi, B. K.

    2018-02-01

    We developed a simple, low-cost, user-friendly automated indirect ion beam fluence measurement system for ion irradiation and analysis experiments requiring indirect beam fluence measurements unperturbed by sample conditions like low temperature, high temperature, sample biasing as well as in regular ion implantation experiments in the ion implanters and electrostatic accelerators with continuous beam. The system, which uses simple, low cost, off-the-shelf components/systems and two distinct layers of in-house built software, not only eliminates the need for costly data acquisition systems but also overcomes difficulties in using proprietary software. The hardware of the system is centered around a personal computer, a PIC16F887 based embedded system, a Faraday cup drive cum monitor circuit, a pair of Faraday Cups and a beam current integrator, and the in-house developed software includes C based microcontroller firmware and LabVIEW based virtual instrument automation software. The automatic fluence measurement involves two important phases: a current sampling phase lasting 20-30 seconds, during which the ion beam current is continuously measured by intercepting the ion beam and the averaged beam current value is computed. A subsequent charge computation phase lasting 700-900 seconds is then executed, during which the ion beam irradiates the samples and the incremental fluence received by the sample is estimated using the latest averaged beam current value from the ion beam current sampling phase. The cycle of current sampling-charge computation is repeated until the required fluence is reached. Besides simplicity and cost-effectiveness, other important advantages of the developed system include easy reconfiguration of the system to suit customisation of experiments, scalability, easy debugging and maintenance of the hardware/software, and the ability to work as a standalone system. The system was tested with different sets of samples and ion fluences and the results were verified using
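
    A minimal sketch of the two-phase sampling/irradiation cycle described above, with hypothetical names (read_faraday_cup_current() stands in for the PIC16F887/LabVIEW read-out, which is not detailed in the abstract); fluence is credited from the most recent averaged current:

        ELEMENTARY_CHARGE = 1.602e-19   # C

        def read_faraday_cup_current():
            # Placeholder for the real beam-current read-out (one reading per second).
            return 1.0e-6               # A, simulated 1 uA beam

        def accumulate_fluence(target_fluence, beam_area_cm2, charge_state=1,
                               sample_s=25, irradiate_s=800):
            fluence = 0.0               # ions / cm^2
            while fluence < target_fluence:
                # Phase 1: intercept the beam for ~20-30 s and average the current.
                readings = [read_faraday_cup_current() for _ in range(sample_s)]
                i_avg = sum(readings) / len(readings)
                # Phase 2: irradiate the sample for ~700-900 s, crediting fluence
                # from the latest averaged current.
                ions = (i_avg * irradiate_s) / (charge_state * ELEMENTARY_CHARGE)
                fluence += ions / beam_area_cm2
            return fluence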

  15. VEG-01: Veggie Hardware Verification Testing

    Science.gov (United States)

    Massa, Gioia; Newsham, Gary; Hummerick, Mary; Morrow, Robert; Wheeler, Raymond

    2013-01-01

    The Veggie plant/vegetable production system is scheduled to fly on ISS at the end of 2013. Since much of the technology associated with Veggie has not been previously tested in microgravity, a hardware validation flight was initiated. This test will allow data to be collected about Veggie hardware functionality on ISS, allow crew interactions to be vetted for future improvements, validate the ability of the hardware to grow and sustain plants, and collect data that will be helpful to future Veggie investigators as they develop their payloads. Additionally, food safety data on the lettuce plants grown will be collected to help support the development of a pathway for the crew to safely consume produce grown on orbit. Significant background research has been performed on the Veggie plant growth system, with early tests focusing on the development of the rooting pillow concept, and the selection of fertilizer, rooting medium and plant species. More recent testing has been conducted to integrate the pillow concept into the Veggie hardware and to ensure that adequate water is provided throughout the growth cycle. Seed sanitation protocols have been established for flight, and hardware sanitation between experiments has been studied. Methods for shipping and storage of rooting pillows and the development of crew procedures and crew training videos for plant activities on-orbit have been established. Science verification testing was conducted and lettuce plants were successfully grown in prototype Veggie hardware, microbial samples were taken, plants were harvested, frozen, stored and later analyzed for microbial growth, nutrients, and ATP levels. An additional verification test, prior to the final payload verification testing, is desired to demonstrate similar growth in the flight hardware and also to test a second set of pillows containing zinnia seeds. Issues with root mat water supply are being resolved, with final testing and flight scheduled for later in 2013.

  16. Scalable Packet Classification with Hash Tables

    Science.gov (United States)

    Wang, Pi-Chung

    In the last decade, the technique of packet classification has been widely deployed in various network devices, including routers, firewalls and network intrusion detection systems. In this work, we improve the performance of packet classification by using multiple hash tables. The existing hash-based algorithms have superior scalability with respect to the required space; however, their search performance may not be comparable to other algorithms. To improve the search performance, we propose a tuple reordering algorithm to minimize the number of accessed hash tables with the aid of bitmaps. We also use pre-computation to ensure the accuracy of our search procedure. Performance evaluation based on both real and synthetic filter databases shows that our scheme is effective and scalable and the pre-computation cost is moderate.
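
    A minimal sketch of the multiple-hash-table (tuple space) idea underlying such schemes, assuming IPv4 source/destination prefixes only and hypothetical helper names; the paper's tuple reordering, bitmaps, pre-computation and priority resolution are omitted:

        import ipaddress

        def add_filter(tables, src_net, dst_net, action):
            # Each hash table is keyed by a (src_prefix_len, dst_prefix_len) tuple.
            src, dst = ipaddress.ip_network(src_net), ipaddress.ip_network(dst_net)
            key = (src.prefixlen, dst.prefixlen)
            tables.setdefault(key, {})[(int(src.network_address), int(dst.network_address))] = action

        def classify(tables, src_ip, dst_ip):
            # Probe one hash table per tuple; tuple reordering/bitmaps would prune this loop.
            s, d = int(ipaddress.ip_address(src_ip)), int(ipaddress.ip_address(dst_ip))
            for (sl, dl), table in tables.items():
                smask = ((1 << sl) - 1) << (32 - sl) if sl else 0
                dmask = ((1 << dl) - 1) << (32 - dl) if dl else 0
                action = table.get((s & smask, d & dmask))
                if action is not None:
                    return action
            return "default"

        tables = {}
        add_filter(tables, "10.0.0.0/8", "192.168.1.0/24", "permit")
        print(classify(tables, "10.1.2.3", "192.168.1.77"))   # -> permit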

  17. A scalable method for parallelizing sampling-based motion planning algorithms

    KAUST Repository

    Jacobs, Sam Ade; Manavi, Kasra; Burgos, Juan; Denny, Jory; Thomas, Shawna; Amato, Nancy M.

    2012-01-01

    This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.

  18. A scalable method for parallelizing sampling-based motion planning algorithms

    KAUST Repository

    Jacobs, Sam Ade

    2012-05-01

    This paper describes a scalable method for parallelizing sampling-based motion planning algorithms. It subdivides configuration space (C-space) into (possibly overlapping) regions and independently, in parallel, uses standard (sequential) sampling-based planners to construct roadmaps in each region. Next, in parallel, regional roadmaps in adjacent regions are connected to form a global roadmap. By subdividing the space and restricting the locality of connection attempts, we reduce the work and inter-processor communication associated with nearest neighbor calculation, a critical bottleneck for scalability in existing parallel motion planning methods. We show that our method is general enough to handle a variety of planning schemes, including the widely used Probabilistic Roadmap (PRM) and Rapidly-exploring Random Trees (RRT) algorithms. We compare our approach to two other existing parallel algorithms and demonstrate that our approach achieves better and more scalable performance. Our approach achieves almost linear scalability on a 2400 core LINUX cluster and on a 153,216 core Cray XE6 petascale machine. © 2012 IEEE.

  19. Accelerating Climate and Weather Simulations through Hybrid Computing

    Science.gov (United States)

    Zhou, Shujia; Cruz, Carlos; Duffy, Daniel; Tucker, Robert; Purcell, Mark

    2011-01-01

    Unconventional multi- and many-core processors (e.g. IBM Cell B.E. and NVIDIA GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.

  20. Efficient Enhancement for Spatial Scalable Video Coding Transmission

    Directory of Open Access Journals (Sweden)

    Mayada Khairy

    2017-01-01

    Scalable Video Coding (SVC) is an international standard technique for video compression. It is an extension of H.264 Advanced Video Coding (AVC). In the encoding of video streams by SVC, it is suitable to employ the macroblock (MB) mode because it affords superior coding efficiency. However, the exhaustive mode decision technique that is usually used for SVC increases the computational complexity, resulting in a longer encoding time (ET). Many other algorithms were proposed to solve this problem, at the cost of increased transmission time (TT) across the network. To minimize the ET and TT, this paper introduces four efficient algorithms based on spatial scalability. The algorithms utilize the mode-distribution correlation between the base layer (BL) and enhancement layers (ELs) and interpolation between the EL frames. The proposed algorithms are of two categories. Those of the first category are based on interlayer residual SVC spatial scalability. They employ two methods, namely, interlayer interpolation (ILIP) and the interlayer base mode (ILBM) method, and enable ET and TT savings of up to 69.3% and 83.6%, respectively. The algorithms of the second category are based on full-search SVC spatial scalability. They utilize two methods, namely, full interpolation (FIP) and the full-base mode (FBM) method, and enable ET and TT savings of up to 55.3% and 76.6%, respectively.

  1. Computer hardware fault administration

    Science.gov (United States)

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-09-14

    Computer hardware fault administration carried out in a parallel computer, where the parallel computer includes a plurality of compute nodes. The compute nodes are coupled for data communications by at least two independent data communications networks, where each data communications network includes data communications links connected to the compute nodes. Typical embodiments carry out hardware fault administration by identifying a location of a defective link in the first data communications network of the parallel computer and routing communications data around the defective link through the second data communications network of the parallel computer.

  2. An adaptive cryptographic accelerator for network storage security on dynamically reconfigurable platform

    Science.gov (United States)

    Tang, Li; Liu, Jing-Ning; Feng, Dan; Tong, Wei

    2008-12-01

    Existing security solutions in network storage environments perform poorly because cryptographic operations (encryption and decryption) implemented in software can dramatically reduce system performance. In this paper we propose a cryptographic hardware accelerator on a dynamically reconfigurable platform for the security of high performance network storage systems. We employ a dynamically reconfigurable platform based on an FPGA to implement a PowerPC-based embedded system, which executes cryptographic algorithms. To reduce the reconfiguration latency, we apply prefetch scheduling. Moreover, the processing elements could be dynamically configured to support different cryptographic algorithms according to the request received by the accelerator. In the experiment, we have implemented AES (Rijndael) and 3DES cryptographic algorithms in the reconfigurable accelerator. Our proposed reconfigurable cryptographic accelerator could dramatically increase the performance compared with traditional software-based network storage systems.

  3. Accelerator technology program. Progress report, January-June 1981

    International Nuclear Information System (INIS)

    Knapp, E.A.; Jameson, R.A.

    1982-05-01

    This report covers the activities of Los Alamos National Laboratory's Accelerator Technology Division during the first 6 months of calendar 1981. We discuss the Division's major projects, which reflect a variety of applications and sponsors. The varied technologies concerned with the Proton Storage Ring are continuing and are discussed in detail. For the racetrack microtron (RTM) project, the major effort has been the design and construction of the demonstration RTM. Our development of the radio-frequency quadrupole (RFQ) linear accelerator continues to stimulate interest for many possible applications. Frequent contacts from other laboratories have revealed a wide acceptance of the RFQ principle in solving low-velocity acceleration problems. In recent work on heavy ion fusion we have developed ideas for funneling beams from RFQ linacs; the funneling process is explained. To test as many aspects as possible of a fully integrated low-energy portion of a Pion Generator for Medical Irradiation (PIGMI) accelerator, a prototype accelerator was designed to take advantage of several pieces of existing accelerator hardware. The important principles to be tested in this prototype accelerator are detailed. Our prototype gyrocon has been extensively tested and modified; we discuss results from our investigations. Our work with the Fusion Materials Irradiation Test Facility is reviewed in this report

  4. Embedded High Performance Scalable Computing Systems

    National Research Council Canada - National Science Library

    Ngo, David

    2003-01-01

    The Embedded High Performance Scalable Computing Systems (EHPSCS) program is a cooperative agreement between Sanders, A Lockheed Martin Company and DARPA that ran for three years, from Apr 1995 - Apr 1998...

  5. Investigation on Reliability and Scalability of an FBG-Based Hierarchical AOFSN

    Directory of Open Access Journals (Sweden)

    Li-Mei Peng

    2010-03-01

    The reliability and scalability of large-scale optical fiber sensor networks (AOFSN) are considered in this paper. The AOFSN network consists of three-level hierarchical sensor network architectures. The first two levels consist of active interrogation and remote nodes (RNs) and the third level, called the sensor subnet (SSN), consists of passive Fiber Bragg Gratings (FBGs) and a few switches. The switch architectures in the RN and various SSNs to improve the reliability and scalability of AOFSN are studied. Two SSNs with a regular topology are proposed to support simple routing and scalability in AOFSN: square-based sensor cells (SSC) and pentagon-based sensor cells (PSC). The reliability and scalability are evaluated in terms of the available sensing coverage in the case of one or multiple link failures.

  6. APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

    International Nuclear Information System (INIS)

    Ammendola, R; Salamon, A; Salina, G; Biagioni, A; Prezza, O; Cicero, F Lo; Lonardo, A; Paolucci, P S; Rossetti, D; Tosoratto, L; Vicini, P; Simula, F

    2011-01-01

    We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for a RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective, tens-of-thousands-scalable cluster network architecture. Some test results and characterization of data transmission of a complete testbench, based on a commercial development card mounting an Altera ® FPGA, are provided.

  7. APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

    Energy Technology Data Exchange (ETDEWEB)

    Ammendola, R; Salamon, A; Salina, G [INFN Tor Vergata, Roma (Italy); Biagioni, A; Prezza, O; Cicero, F Lo; Lonardo, A; Paolucci, P S; Rossetti, D; Tosoratto, L; Vicini, P [INFN Roma, Roma (Italy); Simula, F [Sapienza Universita di Roma, Roma (Italy)

    2011-12-23

    We describe herein the APElink+ board, a PCIe interconnect adapter featuring the latest advances in wire speed and interface technology plus hardware support for a RDMA programming model and experimental acceleration of GPU networking; this design allows us to build a low latency, high bandwidth PC cluster, the APEnet+ network, the new generation of our cost-effective, tens-of-thousands-scalable cluster network architecture. Some test results and characterization of data transmission of a complete testbench, based on a commercial development card mounting an Altera® FPGA, are provided.

  8. Proposal for a Simplified Management of Accelerator Settings in the Injector Complex Based on Automatic Setting Propagation

    CERN Document Server

    Damerau, Heiko

    2018-01-01

    With the increasing number of beams with different characteristics, the number of control settings to pilot the accelerator hardware has grown proportionally. In addition, new hardware often comes with more possibilities to configure its behaviour, also requiring more parameters to be set. Both factors have led to a significant growth of the number of set values to control the accelerators in the injector complex. To keep track of this myriad of settings, an automatic setting propagation mechanism is suggested. It allows cycles to be grouped into families which partly share the same settings. This is particularly efficient for cycles where most of the settings must be identical, which is the case for many beams, e.g., in the PS.
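
    As a rough illustration of the propagation idea (hypothetical classes, not the CERN controls API): a cycle inherits every setting of its family and stores only local overrides, so changing a shared value once updates all member cycles.

        class CycleFamily:
            # A cycle (or family of cycles) that inherits settings from its parent
            # and keeps only local overrides.
            def __init__(self, name, parent=None):
                self.name, self.parent, self.settings = name, parent, {}

            def set(self, parameter, value):
                self.settings[parameter] = value          # local override

            def get(self, parameter):
                if parameter in self.settings:
                    return self.settings[parameter]
                if self.parent is not None:
                    return self.parent.get(parameter)     # propagated from the family
                raise KeyError(parameter)

        lhc_family = CycleFamily("LHC-type beams")
        lhc_family.set("rf_voltage_kV", 200)
        lhc25ns = CycleFamily("LHC25ns", parent=lhc_family)   # shares rf_voltage_kV
        lhc25ns.set("bunch_intensity", 1.3e11)                # cycle-specific value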

  9. Magnetic field alignment for a 20 MeV linear induction accelerator

    International Nuclear Information System (INIS)

    Zhang Wenwei; Pan Haifeng; Li Hong; Liu Yunlong; Zhang Linwen

    2002-01-01

    The 'Dragon-1' accelerator is now being constructed at CAEP. It will produce high-current pulsed electron beams. The main components of the accelerator include 72 induction accelerating cells and 18 connection cells with ports for beam diagnostic hardware and vacuum pumps. In order to acquire high quality beams, several problems have to be addressed, such as reducing the emittance and controlling the growth of corkscrew motion. The alignment of the focusing magnetic field is of greatest concern. A laser tracker has been used for mechanical alignment, magnetic alignment is performed using the pulsed-wire technique, and the natural tilt errors are corrected by a pair of steering coils located inside the cell

  10. Scalable Coverage Maintenance for Dense Wireless Sensor Networks

    Directory of Open Access Journals (Sweden)

    Jun Lu

    2007-06-01

    Owing to numerous potential applications, wireless sensor networks have been attracting significant research effort recently. The critical challenge that wireless sensor networks often face is to sustain long-term operation on limited battery energy. Coverage maintenance schemes can effectively prolong network lifetime by selecting and employing a subset of sensors in the network to provide sufficient sensing coverage over a target region. We envision future wireless sensor networks composed of a vast number of miniaturized sensors in exceedingly high density. Therefore, the key issue of coverage maintenance for future sensor networks is the scalability to sensor deployment density. In this paper, we propose a novel coverage maintenance scheme, scalable coverage maintenance (SCOM), which is scalable to sensor deployment density in terms of communication overhead (i.e., number of transmitted and received beacons) and computational complexity (i.e., time and space complexity). In addition, SCOM achieves high energy efficiency and load balancing over different sensors. We have validated our claims through both analysis and simulations.
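
    The eligibility test at the heart of such coverage-maintenance schemes can be sketched as follows (a simplified Monte-Carlo check with hypothetical names; SCOM's beacon exchange and density-independent bookkeeping are not reproduced here): a node may turn its sensor off if every point of its sensing disk is already covered by an active neighbor.

        import math, random

        def covered_by_neighbors(node, active_neighbors, r_sense, trials=200):
            # Sample random points in the node's sensing disk and check that each
            # one lies within the sensing range of some active neighbor.
            for _ in range(trials):
                a = random.uniform(0.0, 2.0 * math.pi)
                r = r_sense * math.sqrt(random.random())      # uniform over the disk
                p = (node[0] + r * math.cos(a), node[1] + r * math.sin(a))
                if not any(math.dist(p, n) <= r_sense for n in active_neighbors):
                    return False                              # some spot would be uncovered
            return True                                       # node may sleep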

  11. FPGA Mezzanine Cards for CERN’s Accelerator Control System

    CERN Document Server

    Alvarez, P R; Lewis, J; Serrano, J; Wlostowski, T

    2009-01-01

    Field Programmable Gate Arrays (FPGAs) have become a key player in modern real time control systems. They offer determinism, simple design, high performance and versatility. A typical hardware architecture consists of an FPGA interfaced with a control bus and a variable number of digital IOs, ADCs and DACs depending on the application. Until recently the low-cost hardware paradigm has been using mezzanines containing a front end interface plus custom logic (typically an FPGA) and a local bus that interfaces the mezzanine to a carrier. As FPGAs grow in size and shrink in price, hardware reuse, testability and bus access speed could be improved if the user logic is moved to the carrier. The new FPGA Mezzanine Card (FMC) Vita 57 standard is a good example of this new paradigm. In this paper we present a standard kit of FPGA carriers and IO mezzanines for accelerator control. Carriers form factors will be VME, PCI and PCIe. The carriers will feature White Rabbit support for accurate synchronization of distributed...

  12. A broadband accelerator control network

    International Nuclear Information System (INIS)

    Skelly, J.; Clifford, T.; Frankel, R.

    1983-01-01

    A broadband data communications network has been implemented at BNL for control of the Alternating Gradient Synchrotron (AGS) proton accelerator, using commercial CATV hardware, dual coaxial cables as the communications medium, and spanning 2.0 km. A 4 MHz bandwidth Digital Control Channel using CSMA-CA protocol is provided for digital data transmission, with 8 access nodes available over the length of the RELWAY. Each node consists of an rf modem and a microprocessor-based store-and-forward message handler which interfaces the RELWAY to a branch line implemented in GPIB. A gateway to the RELWAY control channel for the (preexisting) AGS Computerized Accelerator Operating System has been constructed using an LSI-11/23 microprocessor as a device in a GPIB branch line. A multilayer communications protocol has been defined for the Digital Control Channel, based on the ISO Open Systems Interconnect layered model, and a RELWAY Device Language defined as the required universal language for device control on this channel

  13. Blind Cooperative Routing for Scalable and Energy-Efficient Internet of Things

    KAUST Repository

    Bader, Ahmed; Alouini, Mohamed-Slim

    2016-01-01

    Multihop networking is promoted in this paper for energy-efficient and highly-scalable Internet of Things (IoT). Recognizing concerns related to the scalability of classical multihop routing and medium access techniques, the use of blind cooperation

  14. Approach to compact terawatt CO2 laser system for particle acceleration

    International Nuclear Information System (INIS)

    Pogorelsky, I.V.; Kimura, W.D.; Fisher, C.H.; Kannari, F.

    1994-01-01

    A compact table-top 20-GW 50-ps CO2 laser system is in operation for strong-field physics studies at the ATF. We propose scaling up of the picosecond CO2 laser to a terawatt peak power level to meet the requirements of advanced laser accelerators. Computer modeling shows that a relatively compact single-beam picosecond CO2 laser system with a high-pressure x-ray picosecond amplifier of a 10-cm aperture is potentially scalable to the ∼1-TW peak power level.

  15. Towards a Scalable, Biomimetic, Antibacterial Coating

    Science.gov (United States)

    Dickson, Mary Nora

    Corneal afflictions are the second leading cause of blindness worldwide. When a corneal transplant is unavailable or contraindicated, an artificial cornea device is the only chance to save sight. Bacterial or fungal biofilm build-up on artificial cornea devices can lead to serious complications including the need for systemic antibiotic treatment and even explantation. As a result, much emphasis has been placed on anti-adhesion chemical coatings and antibiotic-leaching coatings. These methods are not long-lasting, and microorganisms can eventually circumvent these measures. Thus, I have developed a surface topographical antimicrobial coating. Various surface structures including rough surfaces, superhydrophobic surfaces, and the natural surfaces of insects' wings and sharks' skin are promising anti-biofilm candidates; however, none meet the criteria necessary for implementation on the surface of an artificial cornea device. In this thesis I: 1) developed scalable fabrication protocols for a library of biomimetic nanostructure polymer surfaces, 2) assessed the potential of these poly(methyl methacrylate) nanopillars to kill or prevent formation of biofilm by E. coli bacteria and species of Pseudomonas and Staphylococcus bacteria and improved upon a proposed mechanism for the rupture of Gram-negative bacterial cell walls, 3) developed a scalable, commercially viable method for producing antibacterial nanopillars on a curved, PMMA artificial cornea device and 4) developed scalable fabrication protocols for implantation of antibacterial nanopatterned surfaces on the surfaces of thermoplastic polyurethane materials, commonly used in catheter tubings. This project constitutes a first step towards fabrication of the first entirely PMMA artificial cornea device. The major finding of this work is that by precisely controlling the topography of a polymer surface at the nano-scale, we can kill adherent bacteria and prevent biofilm formation of certain pathogenic bacteria

  16. Rate control scheme for consistent video quality in scalable video codec.

    Science.gov (United States)

    Seo, Chan-Won; Han, Jong-Ki; Nguyen, Truong Q

    2011-08-01

    Multimedia data delivered to mobile devices over wireless channels or the Internet are complicated by bandwidth fluctuation and the variety of mobile devices. Scalable video coding has been developed as an extension of H.264/AVC to solve this problem. Since scalable video codec provides various scalabilities to adapt the bitstream for the channel conditions and terminal types, scalable codec is one of the useful codecs for wired or wireless multimedia communication systems, such as IPTV and streaming services. In such scalable multimedia communication systems, video quality fluctuation degrades the visual perception significantly. It is important to efficiently use the target bits in order to maintain a consistent video quality or achieve a small distortion variation throughout the whole video sequence. The scheme proposed in this paper provides a useful function to control video quality in applications supporting scalability, whereas conventional schemes have been proposed to control video quality in the H.264 and MPEG-4 systems. The proposed algorithm decides the quantization parameter of the enhancement layer to maintain a consistent video quality throughout the entire sequence. The video quality of the enhancement layer is controlled based on a closed-form formula which utilizes the residual data and quantization error of the base layer. The simulation results show that the proposed algorithm controls the frame quality of the enhancement layer in a simple operation, where the parameter decision algorithm is applied to each frame.
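
    The paper's closed-form rule is not reproduced in the abstract, so the following is only a generic illustration of quality-consistent QP selection, assuming the usual H.264/HEVC relation between QP and quantisation step and a uniform-quantiser distortion model:

        import math

        def qp_for_target_mse(target_mse):
            # Qstep roughly doubles every 6 QP (Qstep ~ 2^((QP-4)/6)) and a uniform
            # quantiser gives MSE ~ Qstep^2 / 12, so invert both to hit a target MSE.
            qstep = math.sqrt(12.0 * target_mse)
            qp = 4 + 6 * math.log2(qstep)
            return max(0, min(51, round(qp)))

        # Keep enhancement-layer quality roughly constant from frame to frame.
        qp_per_frame = [qp_for_target_mse(30.0) for _ in range(5)]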

  17. Non-fuel bearing hardware melting technology

    International Nuclear Information System (INIS)

    Newman, D.F.

    1993-01-01

    Battelle has developed a portable hardware melter concept that would allow spent fuel rod consolidation operations at commercial nuclear power plants to provide significantly more storage space for other spent fuel assemblies in existing pool racks at lower cost. Using low pressure compaction, the non-fuel bearing hardware (NFBH) left over from the removal of spent fuel rods from the stainless steel end fittings and the Zircaloy guide tubes and grid spacers still occupies 1/3 to 2/5 of the volume of the consolidated fuel rod assemblies. Melting the non-fuel bearing hardware reduces its volume by a factor 4 from that achievable with low-pressure compaction. This paper describes: (1) the configuration and design features of Battelle's hardware melter system that permit its portability, (2) the system's throughput capacity, (3) the bases for capital and operating estimates, and (4) the status of NFBH melter demonstration to reduce technical risks for implementation of the concept. Since all NFBH handling and processing operations would be conducted at the reactor site, costs for shipping radioactive hardware to and from a stationary processing facility for volume reduction are avoided. Initial licensing, testing, and installation in the field would follow the successful pattern achieved with rod consolidation technology

  18. A Practical Introduction to HardwareSoftware Codesign

    CERN Document Server

    Schaumont, Patrick R

    2013-01-01

    This textbook provides an introduction to embedded systems design, with emphasis on integration of custom hardware components with software. The key problem addressed in the book is the following: how can an embedded systems designer strike a balance between flexibility and efficiency? The book describes how combining hardware design with software design leads to a solution to this important computer engineering problem. The book covers four topics in hardware/software codesign: fundamentals, the design space of custom architectures, the hardware/software interface and application examples. The book comes with an associated design environment that helps the reader to perform experiments in hardware/software codesign. Each chapter also includes exercises and further reading suggestions. Improvements in this second edition include labs and examples using modern FPGA environments from Xilinx and Altera, which make the material applicable to a greater number of courses where these tools are already in use.  Mo...

  19. Comparative Modal Analysis of Sieve Hardware Designs

    Science.gov (United States)

    Thompson, Nathaniel

    2012-01-01

    The CMTB Thwacker hardware operates as a testbed analogue for the Flight Thwacker and Sieve components of CHIMRA, a device on the Curiosity Rover. The sieve separates particles with a diameter smaller than 150 microns for delivery to onboard science instruments. The sieving behavior of the testbed hardware should be similar to the Flight hardware for the results to be meaningful. The elastodynamic behavior of both sieves was studied analytically using the Rayleigh Ritz method in conjunction with classical plate theory. Finite element models were used to determine the mode shapes of both designs, and comparisons between the natural frequencies and mode shapes were made. The analysis predicts that the performance of the CMTB Thwacker will closely resemble the performance of the Flight Thwacker within the expected steady state operating regime. Excitations of the testbed hardware that will mimic the flight hardware were recommended, as were those that will improve the efficiency of the sieving process.
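
    For reference, the classical (Kirchhoff) plate-theory modes behind such a Rayleigh-Ritz comparison take, for the idealized case of a simply supported rectangular plate (the actual Thwacker/sieve boundary conditions will differ), the familiar form

        \omega_{mn} = \pi^{2}\left[\left(\frac{m}{a}\right)^{2} + \left(\frac{n}{b}\right)^{2}\right]\sqrt{\frac{D}{\rho h}}, \qquad D = \frac{E h^{3}}{12\,(1-\nu^{2})}

    where a and b are the plate dimensions, h the thickness, E Young's modulus, ν Poisson's ratio and ρ the density; comparing the lowest ω_mn of the testbed and flight plates is one way to quantify how similar their elastodynamic behavior should be.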

  20. Heavy Ion Fusion Accelerator Research (HIFAR) half-year report, October 1, 1985-March 31, 1986

    International Nuclear Information System (INIS)

    1986-05-01

    The HIFAR program addresses the generation of high-power, high-brightness beams of heavy ions, the understanding of the scaling laws in this novel physics regime, and the validation of new accelerator strategies, to cut costs. Key elements to be addressed include: (1) beam quality limits set by transverse and longitudinal beam physics; (2) development of induction accelerating modules, and multiple beam hardware, at affordable costs; (3) acceleration of multiple beams with current amplification - both new features in a linac - without significant dilution of the optical quality of the beams; (4) final bunching, transport, and accurate focusing on a small target

  1. Remote hardware-reconfigurable robotic camera

    Science.gov (United States)

    Arias-Estrada, Miguel; Torres-Huitzil, Cesar; Maya-Rueda, Selene E.

    2001-10-01

    In this work, a camera with integrated image processing capabilities is discussed. The camera is based on an imager coupled to an FPGA device (Field Programmable Gate Array) which contains an architecture for real-time computer vision low-level processing. The architecture can be reprogrammed remotely for application specific purposes. The system is intended for rapid modification and adaptation for inspection and recognition applications, with the flexibility of hardware and software reprogrammability. FPGA reconfiguration allows the same ease of upgrade in hardware as a software upgrade process. The camera is composed of a digital imager coupled to an FPGA device, two memory banks, and a microcontroller. The microcontroller is used for communication tasks and FPGA programming. The system implements a software architecture to handle multiple FPGA architectures in the device, and the possibility to download a software/hardware object from the host computer into its internal context memory. System advantages are: small size, low power consumption, and a library of hardware/software functionalities that can be exchanged during run time. The system has been validated with an edge detection and a motion processing architecture, which will be presented in the paper. Applications targeted are in robotics, mobile robotics, and vision based quality control.

  2. A compact, repetitive accelerator for military and industrial applications

    International Nuclear Information System (INIS)

    Zutavern, F.J.; O'Malley, M.W.; Ruebush, M.H.; Rinehart, L.F.; Loubriel, G.M.; Babcock, S.R.; Denison, G.J.

    1998-04-01

    A compact, short pulse, repetitive accelerator has many useful military and commercial applications in biological counter proliferation, materials processing, radiography, and sterilization (medical instruments, waste, and food). The goal of this project was to develop and demonstrate a small, 700 kV accelerator, which can produce 7 kA particle beams with pulse lengths of 10--30 ns at rates up to 50 Hz. At reduced power levels, longer pulses or higher repetition rates (up to 10 kHz) could be achieved. Two switching technologies were tested: (1) spark gaps, which have been used to build low repetition rate accelerators for many years; and (2) high gain photoconductive semiconductor switches (PCSS), a new solid state switching technology. This plan was economical, because it used existing hardware for the accelerator, and the PCSS material and fabrication for one module was relatively inexpensive. It was research oriented, because it provided a test bed to examine the utility of other emerging switching technologies, such as magnetic switches. At full power, the accelerator will produce 700 kV and 7 kA with either the spark gap or PCSS pulser

  3. Architecture exploration of FPGA based accelerators for bioinformatics applications

    CERN Document Server

    Varma, B Sharat Chandra; Balakrishnan, M

    2016-01-01

    This book presents an evaluation methodology to design future FPGA fabrics incorporating hard embedded blocks (HEBs) to accelerate applications. This methodology will be useful for selection of blocks to be embedded into the fabric and for evaluating the performance gain that can be achieved by such an embedding. The authors illustrate the use of their methodology by studying the impact of HEBs on two important bioinformatics applications: protein docking and genome assembly. The book also explains how the respective HEBs are designed and how hardware implementation of the application is done using these HEBs. It shows that significant speedups can be achieved over pure software implementations by using such FPGA-based accelerators. The methodology presented in this book may also be used for designing HEBs for accelerating software implementations in other domains besides bioinformatics. This book will prove useful to students, researchers, and practicing engineers alike.

  4. Heavy Ion Fusion Accelerator Research (HIFAR) year-end report, April 1, 1990--September 30, 1990

    International Nuclear Information System (INIS)

    1990-12-01

    The basic objective of the Heavy Ion Fusion Accelerator Research (HIFAR) program is to assess the suitability of heavy ion accelerators as igniters for Inertial Confinement Fusion (ICF). A specific accelerator technology, induction acceleration, is being studied at the Lawrence Berkeley Laboratory and at the Lawrence Livermore National Laboratory. The HIFAR program addresses the generation of high-power, high-brightness beams of heavy ions, the understanding of the scaling laws in this novel physics regime, and the validation of new accelerator strategies to cut costs. Key elements to be addressed include: (1) beam quality limits set by transverse and longitudinal beam physics; (2) development of induction accelerating modules, and multiple-beam hardware, at affordable costs; (3) acceleration of multiple beams with current amplification without significant dilution of the optical quality of the beams; (4) final bunching, transport, and accurate focusing on a small target

  5. Transmission delays in hardware clock synchronization

    Science.gov (United States)

    Shin, Kang G.; Ramanathan, P.

    1988-01-01

    Various methods, both with software and hardware, have been proposed to synchronize a set of physical clocks in a system. Software methods are very flexible and economical but suffer an excessive time overhead, whereas hardware methods require no time overhead but are unable to handle transmission delays in clock signals. The effects of nonzero transmission delays in synchronization have been studied extensively in the communication area in the absence of malicious or Byzantine faults. The authors show that it is easy to incorporate the ideas from the communication area into the existing hardware clock synchronization algorithms to take into account the presence of both malicious faults and nonzero transmission delays.
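
    The basic bookkeeping behind delay-aware synchronization can be illustrated with a single round-trip offset estimate; the sketch below follows a Cristian/NTP-style symmetric exchange with an explicit link-delay term and is meant only as an illustration of compensating for nonzero transmission delays, not as the hardware algorithm (or the Byzantine-fault handling) discussed in the paper.

    ```python
    def estimate_offset(t_send, t_server, t_recv, link_delay=0.0):
        """Estimate the offset of a remote clock from one round-trip exchange.

        t_send    : local time when the request left
        t_server  : remote timestamp placed in the reply
        t_recv    : local time when the reply arrived
        link_delay: known one-way transmission delay that must not be ignored;
                    hardware schemes have to compensate for it explicitly.
        """
        round_trip = t_recv - t_send
        one_way = max(round_trip / 2.0, link_delay)   # assume symmetric paths
        return t_server + one_way - t_recv

    # Example: remote clock is ~5 ms ahead, one-way delay ~2 ms; the estimate recovers ~0.005.
    print(estimate_offset(t_send=0.000, t_server=0.007, t_recv=0.004, link_delay=0.002))
    ```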

  6. Computer hardware description languages - A tutorial

    Science.gov (United States)

    Shiva, S. G.

    1979-01-01

    The paper introduces hardware description languages (HDL) as useful tools for hardware design and documentation. The capabilities and limitations of HDLs are discussed along with the guidelines needed in selecting an appropriate HDL. The directions for future work are provided and attention is given to the implementation of HDLs in microcomputers.

  7. Accelerator controls at CERN: Some converging trends

    International Nuclear Information System (INIS)

    Kuiper, B.

    1990-01-01

    CERN's growing services to the high-energy physics community using frozen resources has led to the implementation of 'Technical Boards', mandated to assist the management by making recommendations for rationalizations in various technological domains. The Board on Process Control and Electronics for Accelerators, TEBOCO, has emphasized four main lines which might yield economy in resources. First, a common architecture for accelerator controls has been agreed between the three accelerator divisions. Second, a common hardware/software kit has been defined, from which the large majority of future process interfacing may be composed. A support service for this kit is an essential part of the plan. Third, high-level protocols have been developed for standardizing access to process devices. They derive from agreed standard models of the devices and involve a standard control message. This should ease application development and mobility of equipment. Fourth, a common software engineering methodology and a commercial package of application development tools have been adopted. Some rationalization in the field of the man-machine interface and in matters of synchronization is also under way. (orig.)

  8. Accelerator controls at CERN: Some converging trends

    Science.gov (United States)

    Kuiper, B.

    1990-08-01

    CERN's growing services to the high-energy physics community using frozen resources has led to the implementation of "Technical Boards", mandated to assist the management by making recommendations for rationalizations in various technological domains. The Board on Process Control and Electronics for Accelerators, TEBOCO, has emphasized four main lines which might yield economy in resources. First, a common architecture for accelerator controls has been agreed between the three accelerator divisions. Second, a common hardware/software kit has been defined, from which the large majority of future process interfacing may be composed. A support service for this kit is an essential part of the plan. Third, high-level protocols have been developed for standardizing access to process devices. They derive from agreed standard models of the devices and involve a standard control message. This should ease application development and mobility of equipment. Fourth, a common software engineering methodology and a commercial package of application development tools have been adopted. Some rationalization in the field of the man-machine interface and in matters of synchronization is also under way.

  9. Using the FLUKA Monte Carlo Code to Simulate the Interactions of Ionizing Radiation with Matter to Assist and Aid Our Understanding of Ground Based Accelerator Testing, Space Hardware Design, and Secondary Space Radiation Environments

    Science.gov (United States)

    Reddell, Brandon

    2015-01-01

    Designing hardware to operate in the space radiation environment is a very difficult and costly activity. Ground-based particle accelerators can be used to test for exposure to the radiation environment, one species at a time; however, the actual space environment cannot be duplicated because of the range of energies and the isotropic nature of space radiation. The FLUKA Monte Carlo code is an integrated physics package based at CERN that has been under development for the last 40+ years and includes the most up-to-date fundamental physics theory and particle physics data. This work presents an overview of FLUKA and how it has been used in conjunction with ground-based radiation testing for NASA and to improve our understanding of secondary particle environments resulting from the interaction of space radiation with matter.

  10. Support for NUMA hardware in HelenOS

    OpenAIRE

    Horký, Vojtěch

    2011-01-01

    The goal of this master thesis is to extend HelenOS operating system with the support for ccNUMA hardware. The text of the thesis contains a brief introduction to ccNUMA hardware, an overview of NUMA features and relevant features of HelenOS (memory management, scheduling, etc.). The thesis analyses various design decisions of the implementation of NUMA support -- introducing the hardware topology into the kernel data structures, propagating this information to user space, thread affinity to ...

  11. Demo: Distributed Real-Time Generative 3D Hand Tracking using Edge GPGPU Acceleration

    DEFF Research Database (Denmark)

    Qammaz, Ammar; Kosta, Sokol; Kyriazis, Nikolaos

    2018-01-01

    This work demonstrates a real-time 3D hand tracking application that runs via computation offloading. The proposed framework enables the application to run on low-end mobile devices such as laptops and tablets, despite the fact that they lack the sufficient hardware to perform the required computations locally. The network connection takes the place of a GPGPU accelerator and sharing resources with a larger workstation becomes the acceleration mechanism. The unique properties of a generative optimizer are examined and constitute a challenging use-case, since the requirement for real-time ...

  12. Sterilization of space hardware.

    Science.gov (United States)

    Pflug, I. J.

    1971-01-01

    Discussion of various techniques of sterilization of space flight hardware using either destructive heating or the action of chemicals. Factors considered in the dry-heat destruction of microorganisms include the effects of microbial water content, temperature, the physicochemical properties of the microorganism and adjacent support, and nature of the surrounding gas atmosphere. Dry-heat destruction rates of microorganisms on the surface, between mated surface areas, or buried in the solid material of space vehicle hardware are reviewed, along with alternative dry-heat sterilization cycles, thermodynamic considerations, and considerations of final sterilization-process design. Sterilization chemicals discussed include ethylene oxide, formaldehyde, methyl bromide, dimethyl sulfoxide, peracetic acid, and beta-propiolactone.

  13. Software for Managing Inventory of Flight Hardware

    Science.gov (United States)

    Salisbury, John; Savage, Scott; Thomas, Shirman

    2003-01-01

    The Flight Hardware Support Request System (FHSRS) is a computer program that relieves engineers at Marshall Space Flight Center (MSFC) of most of the non-engineering administrative burden of managing an inventory of flight hardware. The FHSRS can also be adapted to perform similar functions for other organizations. The FHSRS affords a combination of capabilities, including those formerly provided by three separate programs in purchasing, inventorying, and inspecting hardware. The FHSRS provides a Web-based interface with a server computer that supports a relational database of inventory; electronic routing of requests and approvals; and electronic documentation from initial request through implementation of quality criteria, acquisition, receipt, inspection, storage, and final issue of flight materials and components. The database lists both hardware acquired for current projects and residual hardware from previous projects. The increased visibility of residual flight components provided by the FHSRS has dramatically improved the re-utilization of materials in lieu of new procurements, resulting in a cost savings of over $1.7 million. The FHSRS includes subprograms for manipulating the data in the database, informing of the status of a request or an item of hardware, and searching the database on any physical or other technical characteristic of a component or material. The software structure forces normalization of the data to facilitate inquiries and searches for which users have entered mixed or inconsistent values.

  14. Targeting multiple heterogeneous hardware platforms with OpenCL

    Science.gov (United States)

    Fox, Paul A.; Kozacik, Stephen T.; Humphrey, John R.; Paolini, Aaron; Kuller, Aryeh; Kelmelis, Eric J.

    2014-06-01

    The OpenCL API allows for the abstract expression of parallel, heterogeneous computing, but hardware implementations have substantial implementation differences. The abstractions provided by the OpenCL API are often insufficiently high-level to conceal differences in hardware architecture. Additionally, implementations often do not take advantage of potential performance gains from certain features due to hardware limitations and other factors. These factors make it challenging to produce code that is portable in practice, resulting in much OpenCL code being duplicated for each hardware platform being targeted. This duplication of effort offsets the principal advantage of OpenCL: portability. The use of certain coding practices can mitigate this problem, allowing a common code base to be adapted to perform well across a wide range of hardware platforms. To this end, we explore some general practices for producing performant code that are effective across platforms. Additionally, we explore some ways of modularizing code to enable optional optimizations that take advantage of hardware-specific characteristics. The minimum requirement for portability implies avoiding the use of OpenCL features that are optional, not widely implemented, poorly implemented, or missing in major implementations. Exposing multiple levels of parallelism allows hardware to take advantage of the types of parallelism it supports, from the task level down to explicit vector operations. Static optimizations and branch elimination in device code help the platform compiler to effectively optimize programs. Modularization of some code is important to allow operations to be chosen for performance on target hardware. Optional subroutines exploiting explicit memory locality allow for different memory hierarchies to be exploited for maximum performance. The C preprocessor and JIT compilation using the OpenCL runtime can be used to enable some of these techniques, as well as to factor in hardware
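
    One concrete way to modularize optional, hardware-specific optimizations is to keep the alternatives behind C preprocessor guards in the kernel source and to pick the -D defines at run time, just before the OpenCL JIT build step. The sketch below only assembles the source string and the build options; the device characteristics, option names, and the USE_VECTOR4/TILE macros are illustrative, and the actual program build call is left to the host code.

    ```python
    KERNEL_SRC = r"""
    __kernel void saxpy(const float a,
                        __global const float *x,
                        __global float *y)
    {
        int i = get_global_id(0);
    #ifdef USE_VECTOR4
        /* Optional path: explicit vector operations for hardware that prefers them.
           Only every fourth work-item does the (vectorized) work in this illustrative variant. */
        if ((i & 3) == 0) {
            float4 xv = vload4(i >> 2, x);
            float4 yv = vload4(i >> 2, y);
            vstore4(a * xv + yv, i >> 2, y);
        }
    #else
        y[i] = a * x[i] + y[i];
    #endif
    }
    """

    def build_options(device_info):
        """Map (illustrative) device characteristics to -D defines for the JIT compiler."""
        opts = []
        if device_info.get("preferred_vector_width_float", 1) >= 4:
            opts.append("-DUSE_VECTOR4")
        if device_info.get("has_local_memory", False):
            opts.append("-DTILE=%d" % device_info.get("tile", 16))
        return opts

    # The option list would be handed to the OpenCL build step (clBuildProgram, or
    # program.build(options=...) in a Python binding); here we only show the selection logic.
    print(build_options({"preferred_vector_width_float": 4, "has_local_memory": True}))
    ```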

  15. Scalable Integrated Region-Based Image Retrieval Using IRM and Statistical Clustering.

    Science.gov (United States)

    Wang, James Z.; Du, Yanping

    Statistical clustering is critical in designing scalable image retrieval systems. This paper presents a scalable algorithm for indexing and retrieving images based on region segmentation. The method uses statistical clustering on region features and IRM (Integrated Region Matching), a measure developed to evaluate overall similarity between images…

  16. Scalability Dilemma and Statistic Multiplexed Computing — A Theory and Experiment

    Directory of Open Access Journals (Sweden)

    Justin Yuan Shi

    2017-08-01

    Full Text Available For the last three decades, end-to-end computing paradigms, such as MPI (Message Passing Interface), RPC (Remote Procedure Call) and RMI (Remote Method Invocation), have been the de facto paradigms for distributed and parallel programming. Despite these successes, applications built using these paradigms suffer because the probability of a crash grows in proportion to the application's size. Checkpoint/restore and backup/recovery are the only means to save otherwise lost critical information. The scalability dilemma is such a practical challenge that the probability of data loss increases as the application scales in size. The theoretical significance of this practical challenge is that it undermines the fundamental structure of the scientific discovery process and mission critical services in production today. In 1997, the direct use of the end-to-end reference model in distributed programming was recognized as a fallacy. The scalability dilemma was predicted. However, this voice was overrun by the passage of time. Today, the rapidly growing digitized data demands solving the increasingly critical scalability challenges. Computing architecture scalability, although loosely defined, is now front and center of large-scale computing efforts. Constrained only by the economic law of diminishing returns, this paper proposes a narrow definition of a Scalable Computing Service (SCS). Three scalability tests are also proposed in order to distinguish service architecture flaws from poor application programming. Scalable data-intensive services require additional treatment. Thus, the data storage is assumed reliable in this paper. A single-sided Statistic Multiplexed Computing (SMC) paradigm is proposed. A UVR (Unidirectional Virtual Ring) SMC architecture is examined under SCS tests. SMC was designed to circumvent the well-known impossibility of end-to-end paradigms. It relies on the proven statistic multiplexing principle to deliver reliable service

  17. Applications of microprocessors in upgrading of accelerator controls

    International Nuclear Information System (INIS)

    Mallory, K.B.

    1977-03-01

    Experience at SLAC demonstrates that the criteria for selection and use of microprocessors in modifying an existing control system may differ from the criteria that apply during installation of the control system of a new accelerator. Considerations such as cost of individual projects, progressive installation without disruption of operations and training of on-board personnel can outweigh "obvious" goals such as standardization of hardware, uniformity of software, or even a rigid specification of link protocols with the main computer system

  18. How accelerator operations does business at Jefferson Lab

    International Nuclear Information System (INIS)

    Green, David W. Jr.

    2004-01-01

    The accelerator is staffed 24 hours a day by the MCC Operations Group. Shift rotations are for seven days on shift, followed by seven days off shift, of which three days are spent on off-shift activities. Personnel spend 70% of their time on shift and 30% off shift. The off-shift time is utilized for meetings, training and individual projects. Individual projects can consist of hardware or software development, training, documentation development or other areas of interest, depending on the individual. (author)

  19. Approaching maximal performance of longitudinal beam compression in induction accelerator drivers

    International Nuclear Information System (INIS)

    Mark, J.W.K.; Ho, D.D.M.; Brandon, S.T.; Chang, C.L.; Drobot, A.T.; Faltens, A.; Lee, E.P.; Krafft, G.A.

    1986-01-01

    Longitudinal beam compression is an integral part of the US induction accelerator development effort for heavy ion fusion. Producing maximal performance for key accelerator components is an essential element of the effort to reduce driver costs. We outline here initial studies directed towards defining the limits of final beam compression including considerations such as: maximal available compression, effects of longitudinal dispersion and beam emittance, combining pulse-shaping with beam compression to reduce the total number of beam manipulations, etc. The use of higher ion charge state Z greater than or equal to 3 is likely to test the limits of the previously envisaged beam compression and final focus hardware. A more conservative approach is to use additional beamlets in final compression and focus. On the other end of the spectrum of choices, alternate approaches might consider new final focus with greater tolerances for systematic momentum and current variations. Development of such final focus concepts would also allow more compact (and hopefully cheaper) hardware packages where the previously separate processes of beam compression, pulse-shaping and final focus occur as partially combined and nearly concurrent beam manipulations

  20. The graphics software of the Saclay linear accelerator control system

    International Nuclear Information System (INIS)

    Gournay, J.F.

    1987-06-01

    The control system of the Saclay Linear Accelerator is based upon modern technology hardware. In the graphics software, pictures are created in exactly the same manner for all the graphic devices supported by the system. The information used to draw a picture is stored in an array called a graphic segment. Three output primitives are used to add graphic material to a segment. Three coordinate systems are defined

  1. Improvement of hardware basic testing : Identification and development of a scripted automation tool that will support hardware basic testing

    OpenAIRE

    Rask, Ulf; Mannestig, Pontus

    2002-01-01

    Amid the ever-increasing pace of development, circuits and hardware are no exception. Hardware designs grow and circuits get more complex at the same time as market pressure lowers the expected time-to-market. In this rush, verification methods often lag behind. Hardware manufacturers must be aware of the importance of total verification if they want to avoid quality flaws and broken deadlines, which in the long run will lead to delayed time-to-market, bad publicity and a decreasing market sha...

  2. An accelerator controls network designed for reliability and flexibility

    International Nuclear Information System (INIS)

    McDowell, W. P.; Sidorowicz, K. V.

    1997-01-01

    The APS accelerator control system is a typical modern system based on the standard control system model, which consists of operator interfaces to a network and computer-controlled interfaces to hardware. The network provides a generalized communication path between the host computers, operator workstations, input/output crates, and other hardware that comprise the control system. The network is an integral part of all modern control systems and network performance will determine many characteristics of a control system. This paper describes the methods used to provide redundancy for various network system components as well as methods used to provide comprehensive monitoring of this network. The effect of archiving tens of thousands of data points on a regular basis and the effect on the controls network will be discussed. Metrics are provided on the performance of the system under various conditions

  3. Architecture and development of the CDF hardware event builder

    International Nuclear Information System (INIS)

    Shaw, T.M.; Booth, A.W.; Bowden, M.

    1989-01-01

    A hardware Event Builder (EVB) has been developed for use at the Collider Detector at Fermilab (CDF) experiment at the Fermi National Accelerator Laboratory. The Event Builder presently consists of five FASTBUS modules and has the task of reading out the front-end scanners, reformatting the data into the YBOS bank structure, and transmitting the data to a Level 3 (L3) trigger system, which is composed of multiple VME processing nodes. The Event Builder receives its instructions from a VAX-based Buffer Manager (BFM) program via a Unibus Processor Interface (UPI). The Buffer Manager instructs the Event Builder to read out one of the four CDF front-end buffers. The Event Builder then informs the Buffer Manager when the event has been formatted and is then instructed to push it up to the L3 trigger system. Once in the L3 system, a decision is made as to whether to write the event to tape

  4. Evaluation of 3D printed anatomically scalable transfemoral prosthetic knee.

    Science.gov (United States)

    Ramakrishnan, Tyagi; Schlafly, Millicent; Reed, Kyle B

    2017-07-01

    This case study compares a transfemoral amputee's gait while using the existing Ossur Total Knee 2000 and our novel 3D printed anatomically scalable transfemoral prosthetic knee. The anatomically scalable transfemoral prosthetic knee is 3D printed out of a carbon-fiber and nylon composite that has a gear-mesh coupling with a hard-stop weight-actuated locking mechanism aided by a cross-linked four-bar spring mechanism. This design can be scaled using anatomical dimensions of a human femur and tibia to have a unique fit for each user. The transfemoral amputee who was tested is high functioning and walked on the Computer Assisted Rehabilitation Environment (CAREN) at a self-selected pace. The motion capture and force data that was collected showed that there were distinct differences in the gait dynamics. The data was used to perform the Combined Gait Asymmetry Metric (CGAM), where the scores revealed that the overall asymmetry of the gait on the Ossur Total Knee was more asymmetric than the anatomically scalable transfemoral prosthetic knee. The anatomically scalable transfemoral prosthetic knee had higher peak knee flexion that caused a large step time asymmetry. This made walking on the anatomically scalable transfemoral prosthetic knee more strenuous due to the compensatory movements in adapting to the different dynamics. This can be overcome by tuning the cross-linked spring mechanism to emulate the dynamics of the subject better. The subject stated that the knee would be good for daily use and has the potential to be adapted as a running knee.

  5. Scalable optical switches for computing applications

    NARCIS (Netherlands)

    White, I.H.; Aw, E.T.; Williams, K.A.; Wang, Haibo; Wonfor, A.; Penty, R.V.

    2009-01-01

    A scalable photonic interconnection network architecture is proposed whereby a Clos network is populated with broadcast-and-select stages. This enables the efficient exploitation of an emerging class of photonic integrated switch fabric. A low distortion space switch technology based on recently

  6. On the scalability of LISP and advanced overlaid services

    OpenAIRE

    Coras, Florin

    2015-01-01

    In just four decades the Internet has gone from a lab experiment to a worldwide, business-critical infrastructure that caters to the communication needs of almost half of the Earth's population. With these figures on its side, arguing against the Internet's scalability would seem rather unwise. However, the Internet's organic growth is far from finished and, as billions of new devices are expected to be connected in the not so distant future, scalability, or lack thereof, is commonly believed ...

  7. Heavy Ion Fusion Accelerator Research (HIFAR) year-end report, April 1--September 30, 1988

    International Nuclear Information System (INIS)

    1988-12-01

    The basic objective of the Heavy Ion Fusion Accelerator Research (HIFAR) program is to assess the suitability of heavy ion accelerators as igniters for Inertial Confinement Fusion (ICF). A specific accelerator technology, the induction linac, has been studied at the Lawrence Berkeley Laboratory and has reached the point at which its viability for ICF applications can be assessed over the next few years. The HIFAR program addresses the generation of high power, high-brightness beams of heavy ions, the understanding of the scaling laws in this novel physics regime, and the validation of new accelerator strategies, to cut costs. Key elements to be addressed include: beam quality limits set by transverse and longitudinal beam physics; development of induction accelerating modules, and multiple-beam hardware, at affordable costs; acceleration of multiple beams with current amplification --both new features in a linac -- without significant dilution of the optical quality of the beams; final bunching, transport, and accurate focusing on a small target

  8. Scalable Algorithms for Adaptive Statistical Designs

    Directory of Open Access Journals (Sweden)

    Robert Oehmke

    2000-01-01

    Full Text Available We present a scalable, high-performance solution to multidimensional recurrences that arise in adaptive statistical designs. Adaptive designs are an important class of learning algorithms for a stochastic environment, and we focus on the problem of optimally assigning patients to treatments in clinical trials. While adaptive designs have significant ethical and cost advantages, they are rarely utilized because of the complexity of optimizing and analyzing them. Computational challenges include massive memory requirements, few calculations per memory access, and multiply-nested loops with dynamic indices. We analyze the effects of various parallelization options, and while standard approaches do not work well, with effort an efficient, highly scalable program can be developed. This allows us to solve problems thousands of times more complex than those solved previously, which helps make adaptive designs practical. Further, our work applies to many other problems involving neighbor recurrences, such as generalized string matching.

  9. Scalable fabrication of perovskite solar cells

    Energy Technology Data Exchange (ETDEWEB)

    Li, Zhen; Klein, Talysa R.; Kim, Dong Hoe; Yang, Mengjin; Berry, Joseph J.; van Hest, Maikel F. A. M.; Zhu, Kai

    2018-03-27

    Perovskite materials use earth-abundant elements, have low formation energies for deposition and are compatible with roll-to-roll and other high-volume manufacturing techniques. These features make perovskite solar cells (PSCs) suitable for terawatt-scale energy production with low production costs and low capital expenditure. Demonstrations of performance comparable to that of other thin-film photovoltaics (PVs) and improvements in laboratory-scale cell stability have recently made scale up of this PV technology an intense area of research focus. Here, we review recent progress and challenges in scaling up PSCs and related efforts to enable the terawatt-scale manufacturing and deployment of this PV technology. We discuss common device and module architectures, scalable deposition methods and progress in the scalable deposition of perovskite and charge-transport layers. We also provide an overview of device and module stability, module-level characterization techniques and techno-economic analyses of perovskite PV modules.

  10. GPU accelerated manifold correction method for spinning compact binaries

    Science.gov (United States)

    Ran, Chong-xi; Liu, Song; Zhong, Shuang-ying

    2018-04-01

    The graphics processing unit (GPU) acceleration of the manifold correction algorithm based on the compute unified device architecture (CUDA) technology is designed to simulate the dynamic evolution of the Post-Newtonian (PN) Hamiltonian formulation of spinning compact binaries. The feasibility and the efficiency of parallel computation on the GPU have been confirmed by various numerical experiments. The numerical comparisons show that the accuracy of the manifold correction method executed on the GPU agrees well with that of the codes executed on the central processing unit (CPU) alone. The acceleration achieved when the codes are implemented on the GPU can be increased enormously through the use of shared memory and register optimization techniques without additional hardware costs; the speedup is nearly 13 times that of the codes executed on the CPU for a phase-space scan (including 314 × 314 orbits). In addition, the GPU-accelerated manifold correction method is used to numerically study how dynamics are affected by the spin-induced quadrupole-monopole interaction for a black hole binary system.
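
    Manifold correction, in general, pulls a numerically integrated trajectory back onto a surface defined by a conserved quantity after each step. The sketch below applies the simplest variant, rescaling the momentum onto the energy manifold of a Kepler-like toy Hamiltonian, purely to illustrate the idea; it does not reproduce the paper's post-Newtonian, spin-dependent formulation or its CUDA implementation.

    ```python
    import numpy as np

    def energy(q, p, mu=1.0):
        """Hamiltonian of a Kepler-like toy system, H = p^2/2 - mu/|q| (illustrative)."""
        return 0.5 * np.dot(p, p) - mu / np.linalg.norm(q)

    def leapfrog_step(q, p, dt, mu=1.0):
        """One kick-drift-kick integration step."""
        a = -mu * q / np.linalg.norm(q) ** 3
        p = p + 0.5 * dt * a
        q = q + dt * p
        a = -mu * q / np.linalg.norm(q) ** 3
        p = p + 0.5 * dt * a
        return q, p

    def manifold_correct(q, p, E0, mu=1.0):
        """Rescale the momentum so the trajectory returns to the energy manifold H = E0."""
        kinetic_target = E0 + mu / np.linalg.norm(q)
        if kinetic_target <= 0:
            return p                       # a bound orbit should keep kinetic energy positive
        scale = np.sqrt(2.0 * kinetic_target / np.dot(p, p))
        return scale * p

    q, p = np.array([1.0, 0.0]), np.array([0.0, 1.1])
    E0 = energy(q, p)
    for _ in range(1000):
        q, p = leapfrog_step(q, p, dt=0.01)
        p = manifold_correct(q, p, E0)     # drift in H is removed after every step
    print("energy error after correction:", energy(q, p) - E0)
    ```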

  11. Static Scheduling of Periodic Hardware Tasks with Precedence and Deadline Constraints on Reconfigurable Hardware Devices

    Directory of Open Access Journals (Sweden)

    Ikbel Belaid

    2011-01-01

    Full Text Available Task graph scheduling for reconfigurable hardware devices can be defined as finding a schedule for a set of periodic tasks with precedence, dependence, and deadline constraints as well as their optimal allocations on the available heterogeneous hardware resources. This paper proposes a new methodology comprising three main stages. Using these three main stages, dynamic partial reconfiguration and mixed integer programming, pipelined scheduling and efficient placement are achieved and enable parallel computing of the task graph on the reconfigurable devices by optimizing placement/scheduling quality. Experiments on an application of heterogeneous hardware tasks demonstrate an improvement of resource utilization of 12.45% of the available reconfigurable resources corresponding to a resource gain of 17.3% compared to a static design. The configuration overhead is reduced to 2% of the total running time. Due to pipelined scheduling, the task graph spanning is minimized by 4% compared to sequential execution of the graph.

  12. Standard Modular Hydropower Technology Acceleration Workshop: Summary Report

    Energy Technology Data Exchange (ETDEWEB)

    Smith, Brennan T. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); DeNeale, Scott T. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Witt, Adam M. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Mobley, Miles H. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States); Fernandez, Alisha R. [Oak Ridge National Lab. (ORNL), Oak Ridge, TN (United States)

    2017-08-01

    In support of the Department of Energy (DOE) funded Standard Modular Hydropower (SMH) Technology Acceleration project, Oak Ridge National Laboratory (ORNL) staff convened with five small hydropower technology entrepreneurs on June 14 and 15, 2017 to discuss gaps, challenges, and opportunities for small modular hydropower development. The workshop was designed to walk through SMH concepts, discuss the SMH research vision, assess how each participant’s technology aligns with SMH concepts and research, and identify future pathways for mutually beneficial collaboration that leverages ORNL expertise and entrepreneurial industry experience. The goal coming out of the workshop is to advance standardized, scalable, modular hydropower technologies and development approaches with sustained and open dialogue among diverse stakeholder groups.

  13. A Framework for Scalable TSV Assignment and Selection in Three-Dimensional Networks-on-Chips

    Directory of Open Access Journals (Sweden)

    Amir Charif

    2017-01-01

    Full Text Available 3D integration can greatly benefit future many-cores by enabling low-latency three-dimensional Network-on-Chip (3D-NoC) topologies. However, due to high cost, low yield, and frequent failures of Through-Silicon Vias (TSVs), 3D-NoCs are most likely to include only a few vertical connections, resulting in incomplete topologies that pose new challenges in terms of deadlock-free routing and TSV assignment. The routers of such networks require a way to locate the nodes that have vertical connections, commonly known as elevators, and select one of them in order to be able to reach other layers when necessary. In this paper, several alternative TSV selection strategies requiring a constant amount of configurable bits per router are introduced. Each proposed solution consists of a configuration algorithm, which provides each router with the necessary information to locate the elevators, and a routing algorithm, which uses this information at runtime to route packets to an elevator. Our algorithms are compared by simulation to highlight the advantages and disadvantages of each solution under various scenarios, and hardware synthesis results demonstrate the scalability of the proposed approach and its suitability for cost-oriented designs.
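
    As a point of reference for the selection strategies discussed above, a router that already knows the elevator coordinates can fall back on picking the one at minimal Manhattan distance and XY-routing towards it. The sketch below shows that baseline only; the paper's contribution of encoding the decision in a constant number of configurable bits per router, and the associated deadlock-freedom guarantees, are not represented.

    ```python
    def nearest_elevator(router, elevators):
        """Pick the elevator (node with a TSV) closest in Manhattan distance on the router's layer.

        router   : (x, y) coordinates of the current router
        elevators: iterable of (x, y) coordinates of nodes with vertical links
        """
        rx, ry = router
        return min(elevators, key=lambda e: abs(e[0] - rx) + abs(e[1] - ry))

    def route_to_other_layer(router, elevators):
        """XY-route to the chosen elevator, then take the vertical link (deadlock handling omitted)."""
        ex, ey = nearest_elevator(router, elevators)
        hops = []
        x, y = router
        while x != ex:
            x += 1 if ex > x else -1
            hops.append((x, y))
        while y != ey:
            y += 1 if ey > y else -1
            hops.append((x, y))
        return hops

    # 4x4 layer with two elevators; a packet at (3, 0) needs to change layers.
    print(route_to_other_layer((3, 0), elevators=[(0, 0), (2, 3)]))
    ```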

  14. Model-Based Evaluation Of System Scalability: Bandwidth Analysis For Smartphone-Based Biosensing Applications

    DEFF Research Database (Denmark)

    Patou, François; Madsen, Jan; Dimaki, Maria

    2016-01-01

    Scalability is a design principle often valued for the engineering of complex systems. Scalability is the ability of a system to change the current value of one of its specification parameters. Although targeted frameworks are available for the evaluation of scalability for specific digital systems...... re-engineering of 5 independent system modules, from the replacement of a wireless Bluetooth interface, to the revision of the ADC sample-and-hold operation could help increase system bandwidth....

  15. Scalability of Sustainable Business Models in Hybrid Organizations

    Directory of Open Access Journals (Sweden)

    Adam Jabłoński

    2016-02-01

    Full Text Available The dynamics of change in modern business create new mechanisms for company management to determine their pursuit and the achievement of their high performance. This performance maintained over a long period of time becomes a source of ensuring business continuity by companies. An ontological being enabling the adoption of such assumptions is such a business model that has the ability to generate results in every possible market situation and, moreover, it has the features of permanent adaptability. A feature that describes the adaptability of the business model is its scalability. Being a factor ensuring more work and more efficient work with an increasing number of components, scalability can be applied to the concept of business models as the company’s ability to maintain similar or higher performance through it. Ensuring the company’s performance in the long term helps to build the so-called sustainable business model that often balances the objectives of stakeholders and shareholders, and that is created by the implemented principles of value-based management and corporate social responsibility. This perception of business paves the way for building hybrid organizations that integrate business activities with pro-social ones. The combination of an approach typical of hybrid organizations in designing and implementing sustainable business models pursuant to the scalability criterion seems interesting from the cognitive point of view. Today, hybrid organizations are great spaces for building effective and efficient mechanisms for dialogue between business and society. This requires the appropriate business model. The purpose of the paper is to present the conceptualization and operationalization of scalability of sustainable business models that determine the performance of a hybrid organization in the network environment. The paper presents the original concept of applying scalability in sustainable business models with detailed

  16. Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

    Directory of Open Access Journals (Sweden)

    Burgess Shane C

    2008-04-01

    Full Text Available Abstract Background This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation.
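
    For orientation, the software baseline from which the bit-split design starts is the classic Aho-Corasick automaton; a compact construction and scan are sketched below. The FPGA version then splits this FSM into per-bit sub-FSMs over a 5-bit alphabet, which is not shown here, and the example patterns are invented toy peptides.

    ```python
    from collections import deque

    def build_aho_corasick(patterns):
        """Build goto/fail/output tables for the Aho-Corasick string set matcher."""
        goto, fail, out = [{}], [0], [set()]
        for pat in patterns:                       # build the keyword trie
            state = 0
            for ch in pat:
                nxt = goto[state].get(ch)
                if nxt is None:
                    goto.append({}); fail.append(0); out.append(set())
                    nxt = len(goto) - 1
                    goto[state][ch] = nxt
                state = nxt
            out[state].add(pat)
        queue = deque(goto[0].values())            # BFS to fill failure links
        while queue:
            s = queue.popleft()
            for ch, t in goto[s].items():
                queue.append(t)
                f = fail[s]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[t] = goto[f].get(ch, 0) if goto[f].get(ch, 0) != t else 0
                out[t] |= out[fail[t]]
        return goto, fail, out

    def search(text, tables):
        """Scan the text once, reporting (end_position, pattern) for every match."""
        goto, fail, out = tables
        state, hits = 0, []
        for i, ch in enumerate(text):
            while state and ch not in goto[state]:
                state = fail[state]
            state = goto[state].get(ch, 0)
            hits.extend((i, p) for p in out[state])
        return hits

    # Matching a few toy peptide fragments against a toy translated sequence.
    tables = build_aho_corasick(["MKT", "KTA", "TAY"])
    print(search("GMKTAYQ", tables))
    ```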

  17. COMPUTER HARDWARE MARKING

    CERN Multimedia

    Groupe de protection des biens

    2000-01-01

    As part of the campaign to protect CERN property and for insurance reasons, all computer hardware belonging to the Organization must be marked with the words 'PROPRIETE CERN'.IT Division has recently introduced a new marking system that is both economical and easy to use. From now on all desktop hardware (PCs, Macintoshes, printers) issued by IT Division with a value equal to or exceeding 500 CHF will be marked using this new system.For equipment that is already installed but not yet marked, including UNIX workstations and X terminals, IT Division's Desktop Support Service offers the following services free of charge:Equipment-marking wherever the Service is called out to perform other work (please submit all work requests to the IT Helpdesk on 78888 or helpdesk@cern.ch; for unavoidable operational reasons, the Desktop Support Service will only respond to marking requests when these coincide with requests for other work such as repairs, system upgrades, etc.);Training of personnel designated by Division Leade...

  18. Fast & scalable pattern transfer via block copolymer nanolithography

    DEFF Research Database (Denmark)

    Li, Tao; Wang, Zhongli; Schulte, Lars

    2015-01-01

    A fully scalable and efficient pattern transfer process based on block copolymer (BCP) self-assembling directly on various substrates is demonstrated. PS-rich and PDMS-rich poly(styrene-b-dimethylsiloxane) (PS-b-PDMS) copolymers are used to give monolayer sphere morphology after spin-casting ... on long range lateral order, including fabrication of substrates for catalysis, solar cells, sensors, ultrafiltration membranes and templating of semiconductors or metals.

  19. GOSH! A roadmap for open-source science hardware

    CERN Multimedia

    Stefania Pandolfi

    2016-01-01

    The goal of the Gathering for Open Science Hardware (GOSH! 2016), held from 2 to 5 March 2016 at IdeaSquare, was to lay the foundations of the open-source hardware for science movement.   The participants in the GOSH! 2016 meeting gathered in IdeaSquare. (Image: GOSH Community) “Despite advances in technology, many scientific innovations are held back because of a lack of affordable and customisable hardware,” says François Grey, a professor at the University of Geneva and coordinator of Citizen Cyberlab – a partnership between CERN, the UN Institute for Training and Research and the University of Geneva – which co-organised the GOSH! 2016 workshop. “This scarcity of accessible science hardware is particularly obstructive for citizen science groups and humanitarian organisations that don’t have the same economic means as a well-funded institution.” Instead, open sourcing science hardware co...

  20. Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications

    Science.gov (United States)

    Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei

    2007-04-01

    In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.
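
    The computation each processing element performs is, at its core, a sum-of-absolute-differences (SAD) block match. The reference-level sketch below shows the full-search version that a PE-ring architecture would parallelize; the frame sizes, block size, and search range are illustrative.

    ```python
    import numpy as np

    def sad(block_a, block_b):
        """Sum of absolute differences between two equally sized blocks."""
        return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

    def full_search(cur, ref, bx, by, block=8, search=4):
        """Find the motion vector for the block at (bx, by) by exhaustive search.

        cur, ref : current and reference frames (2D uint8 arrays)
        search   : +/- displacement range in pixels
        """
        target = cur[by:by + block, bx:bx + block]
        best, best_cost = (0, 0), None
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                x, y = bx + dx, by + dy
                if x < 0 or y < 0 or x + block > ref.shape[1] or y + block > ref.shape[0]:
                    continue
                cost = sad(target, ref[y:y + block, x:x + block])
                if best_cost is None or cost < best_cost:
                    best_cost, best = cost, (dx, dy)
        return best, best_cost

    # Toy frames: the reference content shifts by (2, 1) pixels in the current frame.
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
    cur = np.roll(ref, shift=(1, 2), axis=(0, 1))
    print(full_search(cur, ref, bx=8, by=8))
    ```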

  1. NeuroPigPen: A Scalable Toolkit for Processing Electrophysiological Signal Data in Neuroscience Applications Using Apache Pig.

    Science.gov (United States)

    Sahoo, Satya S; Wei, Annan; Valdez, Joshua; Wang, Li; Zonjy, Bilal; Tatsuoka, Curtis; Loparo, Kenneth A; Lhatoo, Samden D

    2016-01-01

    The recent advances in neurological imaging and sensing technologies have led to rapid increase in the volume, rate of data generation, and variety of neuroscience data. This "neuroscience Big data" represents a significant opportunity for the biomedical research community to design experiments using data with greater timescale, large number of attributes, and statistically significant data size. The results from these new data-driven research techniques can advance our understanding of complex neurological disorders, help model long-term effects of brain injuries, and provide new insights into dynamics of brain networks. However, many existing neuroinformatics data processing and analysis tools were not built to manage large volume of data, which makes it difficult for researchers to effectively leverage this available data to advance their research. We introduce a new toolkit called NeuroPigPen that was developed using Apache Hadoop and Pig data flow language to address the challenges posed by large-scale electrophysiological signal data. NeuroPigPen is a modular toolkit that can process large volumes of electrophysiological signal data, such as Electroencephalogram (EEG), Electrocardiogram (ECG), and blood oxygen levels (SpO2), using a new distributed storage model called Cloudwave Signal Format (CSF) that supports easy partitioning and storage of signal data on commodity hardware. NeuroPigPen was developed with three design principles: (a) Scalability-the ability to efficiently process increasing volumes of data; (b) Adaptability-the toolkit can be deployed across different computing configurations; and (c) Ease of programming-the toolkit can be easily used to compose multi-step data processing pipelines using high-level programming constructs. The NeuroPigPen toolkit was evaluated using 750 GB of electrophysiological signal data over a variety of Hadoop cluster configurations ranging from 3 to 30 Data nodes. The evaluation results demonstrate that the toolkit

  2. Toward a Scalable and Sustainable Intervention for Complementary Food Safety.

    Science.gov (United States)

    Rahman, Musarrat J; Nizame, Fosiul A; Nuruzzaman, Mohammad; Akand, Farhana; Islam, Mohammad Aminul; Parvez, Sarker Masud; Stewart, Christine P; Unicomb, Leanne; Luby, Stephen P; Winch, Peter J

    2016-06-01

    Contaminated complementary foods are associated with diarrhea and malnutrition among children aged 6 to 24 months. However, existing complementary food safety intervention models are likely not scalable and sustainable. This study aimed to understand current behaviors, the motivations for these behaviors, and the potential barriers to behavior change, and to identify one or two simple actions that can address one or a few food contamination pathways and have the potential to be sustainably delivered to a larger population. Data were collected from 2 rural sites in Bangladesh through semistructured observations (12), video observations (12), in-depth interviews (18), and focus group discussions (3). Although mothers report preparing dedicated foods for children, observations show that these are not separate from family foods. Children are regularly fed store-bought foods that are perceived to be bad for children. Mothers explained that long storage durations, summer temperatures, flies, animals, uncovered food, and unclean utensils are threats to food safety. Covering foods, storing foods on elevated surfaces, and reheating foods before consumption are methods believed to keep food safe. Locally made cabinet-like hardware is perceived to be an acceptable solution to address reported food safety threats. Conventional approaches that include teaching food safety and highlighting benefits such as reduced contamination may be a disincentive for rural mothers who need solutions for their physical environment. We propose extending existing beneficial behaviors by addressing local preferences of taste and convenience. © The Author(s) 2016.

  3. Predictive Performance Tuning of OpenACC Accelerated Applications

    KAUST Repository

    Siddiqui, Shahzeb

    2014-05-04

    Graphics Processing Units (GPUs) are gradually becoming mainstream in supercomputing as their capabilities to significantly accelerate a large spectrum of scientific applications have been clearly identified and proven. Moreover, with the introduction of high level programming models such as OpenACC [1] and OpenMP 4.0 [2], these devices are becoming more accessible and practical to use by a larger scientific community. However, performance optimization of OpenACC accelerated applications usually requires an in-depth knowledge of the hardware and software specifications. We suggest a prediction-based performance tuning mechanism [3] to quickly tune OpenACC parameters for a given application to dynamically adapt to the execution environment on a given system. This approach is applied to a finite difference kernel to tune the OpenACC gang and vector clauses for mapping the compute kernels into the underlying accelerator architecture. Our experiments show a significant performance improvement against the default compiler parameters and a faster tuning by an order of magnitude compared to the brute force search tuning.
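
    The tuning loop can be thought of as timing a few sampled (gang, vector) configurations, ranking the rest with a cheap surrogate, and verifying only the predicted best candidates. The sketch below illustrates that predict-then-refine idea with a stand-in timing function; it does not reproduce the paper's prediction model, its parameter space, or any OpenACC runtime calls.

    ```python
    import itertools, random

    GANGS   = [64, 128, 256, 512, 1024]
    VECTORS = [32, 64, 128, 256]

    def run_kernel(gang, vector):
        """Stand-in for timing the OpenACC kernel once with num_gangs/vector_length set.
        Real code would launch the compiled kernel and measure wall-clock time."""
        return abs(gang - 256) / 256 + abs(vector - 128) / 128 + random.random() * 0.05

    def predictive_tune(sample_budget=6):
        """Time a small random sample, rank all configurations with a cheap surrogate
        (nearest sampled neighbour), then verify only the predicted best few."""
        space = list(itertools.product(GANGS, VECTORS))
        sampled = {cfg: run_kernel(*cfg) for cfg in random.sample(space, sample_budget)}

        def predicted(cfg):
            g, v = cfg
            nearest = min(sampled, key=lambda s: abs(s[0] - g) + abs(s[1] - v))
            return sampled[nearest]

        candidates = sorted(space, key=predicted)[:3]       # short list from the surrogate
        timed = {cfg: run_kernel(*cfg) for cfg in candidates}
        return min(timed, key=timed.get)

    random.seed(1)
    print("chosen (gang, vector):", predictive_tune())
    ```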

  4. Accounting Fundamentals and the Variation of Stock Price: Factoring in the Investment Scalability

    Directory of Open Access Journals (Sweden)

    Sumiyana Sumiyana

    2010-05-01

    Full Text Available This study develops a new return model with respect to accounting fundamentals. The new return model is based on Chen and Zhang (2007). This study takes into account the investment scalability information. Specifically, this study splits the scale of a firm's operations into short-run and long-run investment scalabilities. We document that five accounting fundamentals explain the variation of annual stock return. The factors comprising book value, earnings yield, short-run and long-run investment scalabilities, and growth opportunities associate positively with stock price. The remaining factor, the pure interest rate, is negatively related to annual stock return. This study finds that introducing short-run and long-run investment scalabilities into the model could improve the degree of association. In other words, they have value relevance. Finally, this study suggests that basic trading strategies will improve if investors revert to the accounting fundamentals. Keywords: accounting fundamentals; book value; earnings yield; growth opportunities; short-run and long-run investment scalabilities; trading strategy; value relevance

  5. Scalable and balanced dynamic hybrid data assimilation

    Science.gov (United States)

    Kauranne, Tuomo; Amour, Idrissa; Gunia, Martin; Kallio, Kari; Lepistö, Ahti; Koponen, Sampsa

    2017-04-01

    Scalability of complex weather forecasting suites is dependent on the technical tools available for implementing highly parallel computational kernels, but to an equally large extent also on the dependence patterns between various components of the suite, such as observation processing, data assimilation and the forecast model. Scalability is a particular challenge for 4D variational assimilation methods that necessarily couple the forecast model into the assimilation process and subject this combination to an inherently serial quasi-Newton minimization process. Ensemble based assimilation methods are naturally more parallel, but large models force ensemble sizes to be small and that results in poor assimilation accuracy, somewhat akin to shooting with a shotgun in a million-dimensional space. The Variational Ensemble Kalman Filter (VEnKF) is an ensemble method that can attain the accuracy of 4D variational data assimilation with a small ensemble size. It achieves this by processing a Gaussian approximation of the current error covariance distribution, instead of a set of ensemble members, analogously to the Extended Kalman Filter EKF. Ensemble members are re-sampled every time a new set of observations is processed from a new approximation of that Gaussian distribution which makes VEnKF a dynamic assimilation method. After this a smoothing step is applied that turns VEnKF into a dynamic Variational Ensemble Kalman Smoother VEnKS. In this smoothing step, the same process is iterated with frequent re-sampling of the ensemble but now using past iterations as surrogate observations until the end result is a smooth and balanced model trajectory. In principle, VEnKF could suffer from similar scalability issues as 4D-Var. However, this can be avoided by isolating the forecast model completely from the minimization process by implementing the latter as a wrapper code whose only link to the model is calling for many parallel and totally independent model runs, all of them
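
    The step that distinguishes VEnKF from a fixed-ensemble filter is that members are re-sampled from the current Gaussian approximation every cycle, and the forecasts of those members are completely independent of one another. The fragment below sketches one such cycle for a toy linear model, with a Kalman-style analysis standing in for the variational minimization and the smoothing step omitted; all matrices and sizes are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(42)

    def venkf_like_cycle(mean, cov, M, H, R, Q, y, n_ens=20):
        """One forecast/analysis cycle that keeps a Gaussian (mean, cov) instead of a fixed ensemble.

        M, Q : linear model operator and model-error covariance
        H, R : observation operator and observation-error covariance
        y    : observation vector
        """
        # Re-sample a fresh ensemble from the current Gaussian approximation.
        ens = rng.multivariate_normal(mean, cov, size=n_ens)
        # Forecast every member independently (the embarrassingly parallel part).
        ens = ens @ M.T + rng.multivariate_normal(np.zeros(len(mean)), Q, size=n_ens)
        fmean = ens.mean(axis=0)
        fcov = np.cov(ens, rowvar=False) + 1e-8 * np.eye(len(mean))
        # Kalman-style analysis of the Gaussian (stands in for the variational minimization).
        S = H @ fcov @ H.T + R
        K = fcov @ H.T @ np.linalg.inv(S)
        amean = fmean + K @ (y - H @ fmean)
        acov = (np.eye(len(mean)) - K @ H) @ fcov
        return amean, acov

    mean, cov = np.zeros(2), np.eye(2)
    M = np.array([[1.0, 0.1], [0.0, 1.0]])
    H = np.array([[1.0, 0.0]])
    R, Q = np.array([[0.05]]), 0.01 * np.eye(2)
    mean, cov = venkf_like_cycle(mean, cov, M, H, R, Q, y=np.array([0.8]))
    print(mean)
    ```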

  6. CWDD accelerator at Argonne: Status and future opportunities

    International Nuclear Information System (INIS)

    McMichael, G.; Carwardine, J.; Den Hartog, P.; Sagalovsky, L.; Yule, T.; Clarkson, I.; Papsco, R.; Pile, G.

    1994-01-01

    The Continuous Wave Deuterium Demonstrator (CWDD) accelerator, a cryogenically-cooled (26K) linac, was designed to accelerate 80 mA cw of D to 7.5 MeV. CWDD was being built to demonstrate the launching of a beam with characteristics suitable for a space-based neutral particle beam (NPB). A considerable amount of hardware was constructed and installed in the Argonne-based facility, and major performance milestones were achieved before program funding ended in October 1993. Existing assets have been turned over to Argonne for continuation under other sponsors. These include a fully functional 200 kV cw D injector and high power (1 MW) cw rf amplifier, a cw RFQ that has been tuned, leak checked and aligned, and a partially completed ramped-gradient DTL. Project status and achievements are reviewed and proposals for future use of the equipment are discussed

  7. Heavy Ion Fusion Accelerator Research (HIFAR) half-year report, October 1, 1988--March 31, 1989

    International Nuclear Information System (INIS)

    1989-06-01

    The basic objective of the Heavy Ion Fusion Accelerator Research (HIFAR) program is to assess the suitability of heavy ion accelerators as igniters for Inertial Confinement Fusion (ICF). A specific accelerator technology, the induction linac, has been studied at the Lawrence Berkeley Laboratory and has reached the point at which its viability for ICF applications can be assessed over the next few years. The HIFAR program addresses the generation of high-power, high-brightness beams of heavy ions, the understanding of the scaling laws in this novel physics regime, and the validation of new accelerator strategies, to cut costs. Key elements to be addressed include: beam quality limits set by transverse and longitudinal beam physics; development of induction accelerating modules, and multiple-beam hardware, at affordable costs; acceleration of multiple beams with current amplification --both new features in a linac -- without significant dilution of the optical quality of the beams; and final bunching, transport, and accurate focusing on a small target

  8. Heavy Ion Fusion Accelerator Research (HIFAR) year-end report, October 1, 1987--March 31, 1988

    International Nuclear Information System (INIS)

    1988-06-01

    The basic objective of the Heavy Ion Fusion Accelerator Research (HIFAR) program is to assess the suitability of heavy ion accelerators as igniters for Inertial Confinement Fusion (ICF). A specific accelerator technology, the induction linac, has been studied at Lawrence Berkeley Laboratory and has reached the point at which its viability for ICF applications can be assessed over the next few years. The HIFAR program addresses the generation of high-power, high-brightness beams of heavy ions, the understanding of the scaling laws in this novel physics regime, and the validation of new accelerator strategies, to cut costs. Key elements to be addressed include: beam quality limits set by transverse and longitudinal beam physics; development of induction accelerating modules, and multiple-beam hardware, at affordable costs; acceleration of multiple beams with current amplification -- both new features in a linac -- without significant dilution of the optical quality of beams; and final bunching, transport, and accurate focusing on a small target

  9. Progress Report 2008: A Scalable and Extensible Earth System Model for Climate Change Science

    Energy Technology Data Exchange (ETDEWEB)

    Drake, John B [ORNL; Worley, Patrick H [ORNL; Hoffman, Forrest M [ORNL; Jones, Phil [Los Alamos National Laboratory (LANL)

    2009-01-01

    This project employs multi-disciplinary teams to accelerate development of the Community Climate System Model (CCSM), based at the National Center for Atmospheric Research (NCAR). A consortium of eight Department of Energy (DOE) National Laboratories collaborates with NCAR and the NASA Global Modeling and Assimilation Office (GMAO). The laboratories are Argonne (ANL), Brookhaven (BNL), Los Alamos (LANL), Lawrence Berkeley (LBNL), Lawrence Livermore (LLNL), Oak Ridge (ORNL), Pacific Northwest (PNNL) and Sandia (SNL). The work plan focuses on scalability for petascale computation and extensibility to a more comprehensive earth system model. Our stated goal is to support the DOE mission in climate change research by helping "... to determine the range of possible climate changes over the 21st century and beyond through simulations using a more accurate climate system model that includes the full range of human and natural climate feedbacks with increased realism and spatial resolution."

  10. On the scalability of uncoordinated multiple access for the Internet of Things

    KAUST Repository

    Chisci, Giovanni

    2017-11-16

    The Internet of things (IoT) will entail a massive number of wireless connections with sporadic traffic patterns. To support IoT traffic, several technologies are evolving to provide low power wide area (LPWA) wireless communications. However, LPWA networks rely on variations of uncoordinated spectrum access, either for data transmissions or scheduling requests, thus imposing a scalability problem on the IoT. This paper presents a novel spatiotemporal model to study the scalability of ALOHA medium access. In particular, the developed mathematical model relies on stochastic geometry and queueing theory to account for the spatial and temporal attributes of the IoT. To this end, the scalability of ALOHA is characterized by the percentile of IoT devices that can be served while keeping their queues stable. The results highlight the scalability problem of ALOHA and quantify the extent to which ALOHA can scale in terms of the number of devices, traffic requirements, and transmission rates.
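
    The queue-stability criterion above can be illustrated with a deliberately simplified, spatially unaware simulation. The sketch below (plain Python, with made-up parameter values) counts the fraction of devices whose queues stay bounded under slotted-ALOHA access; the paper's actual model additionally captures path loss, interference, and device density through stochastic geometry.

        import random

        def aloha_stable_fraction(n_devices=100, arrival_prob=0.002,
                                  tx_prob=0.02, n_slots=20_000, seed=1):
            random.seed(seed)
            queues = [0] * n_devices
            for _ in range(n_slots):
                # Sporadic traffic: each device receives a packet with a small probability.
                for i in range(n_devices):
                    if random.random() < arrival_prob:
                        queues[i] += 1
                # Uncoordinated access: backlogged devices transmit independently.
                transmitters = [i for i in range(n_devices)
                                if queues[i] > 0 and random.random() < tx_prob]
                # A slot succeeds only if exactly one device transmitted.
                if len(transmitters) == 1:
                    queues[transmitters[0]] -= 1
            # Call a device "stable" if its queue stayed small over the run.
            return sum(q < 10 for q in queues) / n_devices

        print(f"fraction of stable devices: {aloha_stable_fraction():.2f}")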

  11. Reliable software for unreliable hardware a cross layer perspective

    CERN Document Server

    Rehman, Semeen; Henkel, Jörg

    2016-01-01

    This book describes novel software concepts to increase reliability under user-defined constraints. The authors' approach bridges, for the first time, the reliability gap between hardware and software. Readers will learn how to achieve increased soft-error resilience on unreliable hardware, while exploiting the inherent error-masking characteristics and error-mitigation potential (for errors stemming from soft errors, aging, and process variations) at different software layers. · Provides a comprehensive overview of reliability modeling and optimization techniques at different hardware and software levels; · Describes novel optimization techniques for software cross-layer reliability, targeting unreliable hardware.

  12. Hardware device to physical structure binding and authentication

    Science.gov (United States)

    Hamlet, Jason R.; Stein, David J.; Bauer, Todd M.

    2013-08-20

    Detection and deterrence of device tampering and subversion may be achieved by including a cryptographic fingerprint unit within a hardware device for authenticating a binding of the hardware device and a physical structure. The cryptographic fingerprint unit includes an internal physically unclonable function ("PUF") circuit disposed in or on the hardware device, which generates an internal PUF value. Binding logic is coupled to receive the internal PUF value, as well as an external PUF value associated with the physical structure, and generates a binding PUF value, which represents the binding of the hardware device and the physical structure. The cryptographic fingerprint unit also includes a cryptographic unit that uses the binding PUF value to allow a challenger to authenticate the binding.
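
    As a rough software illustration of the mechanism summarized above (the patent itself specifies dedicated hardware), the sketch below combines a hypothetical internal and external PUF value with XOR and answers a challenge with HMAC-SHA256; the real binding logic and cryptographic unit are not described in detail here and may differ.

        import hashlib
        import hmac
        import secrets

        def binding_puf(internal_puf: bytes, external_puf: bytes) -> bytes:
            # Stand-in for the binding logic: combine device and structure PUF values.
            return bytes(a ^ b for a, b in zip(internal_puf, external_puf))

        def respond(binding_value: bytes, challenge: bytes) -> bytes:
            # Stand-in for the cryptographic unit: prove knowledge of the binding value.
            return hmac.new(binding_value, challenge, hashlib.sha256).digest()

        internal = secrets.token_bytes(32)   # placeholder for the internal PUF circuit output
        external = secrets.token_bytes(32)   # placeholder for the structure-derived PUF value
        bound = binding_puf(internal, external)

        challenge = secrets.token_bytes(16)
        response = respond(bound, challenge)
        # A challenger holding the expected binding value verifies the response.
        assert hmac.compare_digest(response, respond(bound, challenge))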

  13. Adaptive and accelerated tracking-learning-detection

    Science.gov (United States)

    Guo, Pengyu; Li, Xin; Ding, Shaowen; Tian, Zunhua; Zhang, Xiaohu

    2013-08-01

    An improved online long-term visual tracking algorithm, named adaptive and accelerated TLD (AA-TLD), is introduced in this paper; it builds on Tracking-Learning-Detection (TLD), a novel tracking framework. The improvement focuses on two aspects. The first is adaptation: the algorithm does not depend on pre-defined scanning grids, because it generates the scale space online. The second is efficiency: it uses algorithm-level acceleration -- scale prediction with an auto-regressive moving average (ARMA) model that learns the object motion to narrow the detector's search range, and a fixed number of positive and negative samples that ensures a constant retrieval time -- as well as CPU and GPU parallelism for hardware acceleration. In addition, to obtain better results, some of TLD's details are redesigned: detection results are integrated with a weight that combines the normalized correlation coefficient and the scale size, and distance-metric thresholds are adjusted online. A comparative experiment on success rate, center location error, and execution time, carried out on partial TLD datasets and Shenzhou IX return capsule image sequences, shows a performance and efficiency upgrade over state-of-the-art TLD. The algorithm can be used in the field of video surveillance to meet the need for real-time video tracking.

  14. Raspberry Pi hardware projects 1

    CERN Document Server

    Robinson, Andrew

    2013-01-01

    Learn how to take full advantage of all of Raspberry Pi's amazing features and functions-and have a blast doing it! Congratulations on becoming a proud owner of a Raspberry Pi, the credit-card-sized computer! If you're ready to dive in and start finding out what this amazing little gizmo is really capable of, this ebook is for you. Taken from the forthcoming Raspberry Pi Projects, Raspberry Pi Hardware Projects 1 contains three cool hardware projects that let you have fun with the Raspberry Pi while developing your Raspberry Pi skills. The authors - PiFace inventor, Andrew Robinson and Rasp

  15. A Hardware Abstraction Layer in Java

    DEFF Research Database (Denmark)

    Schoeberl, Martin; Korsholm, Stephan; Kalibera, Tomas

    2011-01-01

    Embedded systems use specialized hardware devices to interact with their environment, and since they have to be dependable, it is attractive to use a modern, type-safe programming language like Java to develop programs for them. Standard Java, as a platform-independent language, delegates access to devices, direct memory access, and interrupt handling to some underlying operating system or kernel, but in the embedded systems domain resources are scarce and a Java Virtual Machine (JVM) without an underlying middleware is an attractive architecture. The contribution of this article is a proposal for Java packages with hardware objects and interrupt handlers that interface to such a JVM. We provide implementations of the proposal directly in hardware, as extensions of standard interpreters, and finally with an operating system middleware. The latter solution is mainly seen as a migration path...

  16. The distributed control system of Shanghai mini-cyclotron accelerator mass spectrometer (SMCAMS)

    International Nuclear Information System (INIS)

    Shao Yuhe

    2001-01-01

    This paper introduces the composition, structure, hardware and software design, and functions of the distributed control system of the Shanghai Mini-cyclotron Accelerator Mass Spectrometer (SMCAMS), as well as the method of communication between the host computer and the ADAM modules. Practical details, such as controlling devices sitting at high voltage through the ADAM-4541 (RS-485 to Fiber Optic Converter) and optical fiber, are also introduced

  17. Hardware/Software Codesign in a Compact Ion Mobility Spectrometer Sensor System for Subsurface Contaminant Detection

    Directory of Open Access Journals (Sweden)

    Gribb MollyM

    2008-01-01

    Full Text Available Abstract A field-programmable gate array (FPGA) based data acquisition and control system was designed in a hardware/software codesign environment using an embedded Xilinx MicroBlaze soft-core processor for use with a subsurface ion mobility spectrometer (IMS) system, designed for detection of gaseous volatile organic compounds (VOCs). An FPGA is used to accelerate the digital signal processing algorithms and provide accurate timing and control. An embedded soft-core processor is used to ease development by implementing non-time-critical portions of the design in software. The design was successfully implemented using a low-cost, off-the-shelf Xilinx Spartan-III FPGA and supporting digital and analog electronics.

  18. Superlinearly scalable noise robustness of redundant coupled dynamical systems.

    Science.gov (United States)

    Kohar, Vivek; Kia, Behnam; Lindner, John F; Ditto, William L

    2016-03-01

    We illustrate through theory and numerical simulations that redundant coupled dynamical systems can be extremely robust against local noise in comparison to uncoupled dynamical systems evolving in the same noisy environment. Previous studies have shown that the noise robustness of redundant coupled dynamical systems is linearly scalable and deviations due to noise can be minimized by increasing the number of coupled units. Here, we demonstrate that the noise robustness can actually be scaled superlinearly if some conditions are met and very high noise robustness can be realized with very few coupled units. We discuss these conditions and show that this superlinear scalability depends on the nonlinearity of the individual dynamical units. The phenomenon is demonstrated in discrete as well as continuous dynamical systems. This superlinear scalability not only provides us an opportunity to exploit the nonlinearity of physical systems without being bogged down by noise but may also help us in understanding the functional role of coupled redundancy found in many biological systems. Moreover, engineers can exploit superlinear noise suppression by starting a coupled system near (not necessarily at) the appropriate initial condition.

  19. Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering

    KAUST Repository

    AbdulJabbar, Mustafa Abdulmajeed

    2018-03-27

    Algorithmic and architecture-oriented optimizations are essential for achieving performance worthy of anticipated energy-austere exascale systems. In this paper, we present an extreme scale FMM-accelerated boundary integral equation solver for wave scattering, which uses FMM as a matrix-vector multiplication inside the GMRES iterative method. Our FMM Helmholtz kernels treat nontrivial singular and near-field integration points. We implement highly optimized kernels for both shared and distributed memory, targeting emerging Intel extreme performance HPC architectures. We extract the potential thread- and data-level parallelism of the key Helmholtz kernels of FMM. Our application code is well optimized to exploit the AVX-512 SIMD units of Intel Skylake and Knights Landing architectures. We provide different performance models for tuning the task-based tree traversal implementation of FMM, and develop optimal architecture-specific and algorithm aware partitioning, load balancing, and communication reducing mechanisms to scale up to 6,144 compute nodes of a Cray XC40 with 196,608 hardware cores. With shared memory optimizations, we achieve roughly 77% of peak single precision floating point performance of a 56-core Skylake processor, and on average 60% of peak single precision floating point performance of a 72-core KNL. These numbers represent nearly 5.4x and 10x speedup on Skylake and KNL, respectively, compared to the baseline scalar code. With distributed memory optimizations, on the other hand, we report near-optimal efficiency in the weak scalability study with respect to both the logarithmic communication complexity as well as the theoretical scaling complexity of FMM. In addition, we exhibit up to 85% efficiency in strong scaling. We compute in excess of 2 billion DoF on the full-scale of the Cray XC40 supercomputer.
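
    The structure of the solver -- GMRES with the FMM standing in for the matrix-vector product -- can be sketched in a few lines. The following toy example (Python with NumPy/SciPy, using a small dense surrogate operator) only illustrates the matrix-free coupling; in the actual code the matvec callback is the optimized FMM evaluation of the Helmholtz boundary-integral operator.

        import numpy as np
        from scipy.sparse.linalg import LinearOperator, gmres

        n = 500
        rng = np.random.default_rng(0)
        # Dense surrogate for the boundary-integral operator (diagonally dominant so
        # GMRES converges); a production FMM code never forms this matrix explicitly.
        A_dense = np.eye(n) + 0.01 * rng.standard_normal((n, n))

        def matvec(x):
            # In the real solver this callback is the O(N log N) FMM evaluation.
            return A_dense @ x

        A = LinearOperator((n, n), matvec=matvec)
        b = rng.standard_normal(n)
        x, info = gmres(A, b, atol=1e-10)
        print("gmres info:", info, "residual:", np.linalg.norm(A_dense @ x - b))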

  20. Accelerator diagnosis and control by Neural Nets

    International Nuclear Information System (INIS)

    Spencer, J.E.

    1989-01-01

    Neural Nets (NN) have been described as a solution looking for a problem. In the last conference, Artificial Intelligence (AI) was considered in the accelerator context. While good for local surveillance and control, its use for large complex systems (LCS) was much more restricted. By contrast, NN provide a good metaphor for LCS. It can be argued that they are logically equivalent to multi-loop feedback/forward control of faulty systems, and therefore provide an ideal adaptive control system. Thus, where AI may be good for maintaining a 'golden orbit,' NN should be good for obtaining it via a quantitative approach to 'look and adjust' methods like operator tweaking which use pattern recognition to deal with hardware and software limitations, inaccuracies or errors as well as imprecise knowledge or understanding of effects like annealing and hysteresis. Further, insights from NN allow one to define feasibility conditions for LCS in terms of design constraints and tolerances. Hardware and software implications are discussed and several LCS of current interest are compared and contrasted. 15 refs., 5 figs

  1. Accelerator diagnosis and control by Neural Nets

    International Nuclear Information System (INIS)

    Spencer, J.E.

    1989-01-01

    Neural Nets (NN) have been described as a solution looking for a problem. In the last conference, Artificial Intelligence (AI) was considered in the accelerator context. While good for local surveillance and control, its use for large complex systems (LCS) was much more restricted. By contrast, NN provide a good metaphor for LCS. It can be argued that they are logically equivalent to multi-loop feedback/forward control of faulty systems and therefore provide an ideal adaptive control system. Thus, where AI may be good for maintaining a 'golden orbit,' NN should be good for obtaining it via a quantitative approach to 'look and adjust' methods like operator tweaking which use pattern recognition to deal with hardware and software limitations, inaccuracies or errors as well as imprecise knowledge or understanding of effects like annealing and hysteresis. Further, insights from NN allow one to define feasibility conditions for LCS in terms of design constraints and tolerances. Hardware and software implications are discussed and several LCS of current interest are compared and contrasted. 15 refs., 5 figs

  2. Development of a modular and scalable sensor system for the gathering of position and orientation of moved objects; Entwicklung eines modularen und skalierbaren Sensorsystems zur Erfassung von Position und Orientierung bewegter Objekte

    Energy Technology Data Exchange (ETDEWEB)

    Klingbeil, L.

    2006-02-15

    A modular and scalable sensor system for the estimation of position and orientation of moving objects has been developed and characterized. A sensor unit, which is mounted to the moving object, consists of acceleration, angular rate, and magnetic field sensors for every spatial axis. Customized Kalman filter algorithms provide a robust, low-latency reconstruction of the sensor's orientation. Additionally, an ultrasound transducer network is used to measure the distance of a sensor unit with respect to several reference points in the room, which allows reconstruction of the absolute position using trilateration methods. The system is scalable with respect to the number of sensor units and the covered tracking volume. It is suitable for various applications, for example the analysis of body movements or head tracking in augmented or virtual reality environments. (orig.)
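
    For the absolute-position step, the abstract mentions trilateration against several reference points. A minimal sketch of that idea, assuming known anchor coordinates and noise-free ultrasound ranges, is shown below; the deployed system combines this with the Kalman-filtered inertial data.

        import numpy as np

        def trilaterate(anchors, distances):
            # Linearize the sphere equations around the first anchor and solve by
            # least squares; with >= 4 non-coplanar anchors the 3-D position is unique.
            p0, d0 = anchors[0], distances[0]
            rel = anchors[1:] - p0
            A = 2.0 * rel
            b = d0**2 - distances[1:]**2 + np.sum(rel**2, axis=1)
            offset, *_ = np.linalg.lstsq(A, b, rcond=None)
            return p0 + offset

        anchors = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 3.0]])
        true_pos = np.array([1.0, 2.0, 1.5])
        ranges = np.linalg.norm(anchors - true_pos, axis=1)   # ideal ultrasound ranges
        print(trilaterate(anchors, ranges))                   # ~[1.  2.  1.5]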

  3. Designing Secure Systems on Reconfigurable Hardware

    OpenAIRE

    Huffmire, Ted; Brotherton, Brett; Callegari, Nick; Valamehr, Jonathan; White, Jeff; Kastner, Ryan; Sherwood, Ted

    2008-01-01

    The extremely high cost of custom ASIC fabrication makes FPGAs an attractive alternative for deployment of custom hardware. Embedded systems based on reconfigurable hardware integrate many functions onto a single device. Since embedded designers often have no choice but to use soft IP cores obtained from third parties, the cores operate at different trust levels, resulting in mixed trust designs. The goal of this project is to evaluate recently proposed security primitives for reconfigurab...

  4. Continuity-Aware Scheduling Algorithm for Scalable Video Streaming

    Directory of Open Access Journals (Sweden)

    Atinat Palawan

    2016-05-01

    Full Text Available The consumer demand for retrieving and delivering visual content through consumer electronic devices has increased rapidly in recent years. The quality of video in packet networks is susceptible to certain traffic characteristics: average bandwidth availability, loss, delay and delay variation (jitter). This paper presents a scheduling algorithm that modifies the stream of scalable video to combat jitter. The algorithm provides unequal look-ahead by safeguarding the base layer of the scalable video, without the need for overhead. The results of the experiments show that our scheduling algorithm reduces the number of frames with a violated deadline and significantly improves the continuity of the video stream without compromising the average Y Peak Signal-to-Noise Ratio (PSNR).

  5. Scalable Partitioning Algorithms for FPGAs With Heterogeneous Resources

    National Research Council Canada - National Science Library

    Selvakkumaran, Navaratnasothie; Ranjan, Abhishek; Raje, Salil; Karypis, George

    2004-01-01

    As FPGA densities increase, partitioning-based FPGA placement approaches are becoming increasingly important as they can be used to provide high-quality and computationally scalable placement solutions...

  6. Scalable domain decomposition solvers for stochastic PDEs in high performance computing

    International Nuclear Information System (INIS)

    Desai, Ajit; Pettit, Chris; Poirel, Dominique; Sarkar, Abhijit

    2017-01-01

    Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.

  7. Scalable Video Coding with Interlayer Signal Decorrelation Techniques

    Directory of Open Access Journals (Sweden)

    Yang Wenxian

    2007-01-01

    Full Text Available Scalability is one of the essential requirements in the compression of visual data for present-day multimedia communications and storage. The basic building block for providing the spatial scalability in the scalable video coding (SVC) standard is the well-known Laplacian pyramid (LP). An LP achieves the multiscale representation of the video as a base-layer signal at lower resolution together with several enhancement-layer signals at successive higher resolutions. In this paper, we propose to improve the coding performance of the enhancement layers through efficient interlayer decorrelation techniques. We first show that, with nonbiorthogonal upsampling and downsampling filters, the base layer and the enhancement layers are correlated. We investigate two structures to reduce this correlation. The first structure updates the base-layer signal by subtracting from it the low-frequency component of the enhancement layer signal. The second structure modifies the prediction in order that the low-frequency component in the new enhancement layer is diminished. The second structure is integrated in the JSVM 4.0 codec with suitable modifications in the prediction modes. Experimental results with some standard test sequences demonstrate coding gains up to 1 dB for I pictures and up to 0.7 dB for both I and P pictures.
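
    A one-dimensional, one-level toy version of the first proposed structure is sketched below, with crude averaging and sample-and-hold filters standing in for the codec's actual filters; it only illustrates how subtracting the low-frequency part of the enhancement layer from the base layer still permits exact reconstruction.

        import numpy as np

        def downsample(x):            # average pairs: crude low-pass plus decimation
            return 0.5 * (x[0::2] + x[1::2])

        def upsample(x):              # sample-and-hold interpolation back to full rate
            return np.repeat(x, 2)

        def lp_analysis_updated(x):
            base = downsample(x)                  # base layer (lower resolution)
            enhancement = x - upsample(base)      # enhancement layer (detail residual)
            # Structure 1: subtract the low-frequency part of the enhancement layer
            # from the base layer to reduce the interlayer correlation.
            base_updated = base - downsample(enhancement)
            return base_updated, enhancement

        x = np.sin(np.linspace(0.0, 6.0, 32))
        base_u, enh = lp_analysis_updated(x)
        # Synthesis mirrors the update: undo it before adding the detail back.
        recon = upsample(base_u + downsample(enh)) + enh
        print("max reconstruction error:", np.max(np.abs(recon - x)))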

  8. Scalable Atomistic Simulation Algorithms for Materials Research

    Directory of Open Access Journals (Sweden)

    Aiichiro Nakano

    2002-01-01

    Full Text Available A suite of scalable atomistic simulation programs has been developed for materials research based on space-time multiresolution algorithms. Design and analysis of parallel algorithms are presented for molecular dynamics (MD) simulations and quantum-mechanical (QM) calculations based on the density functional theory. Performance tests have been carried out on 1,088-processor Cray T3E and 1,280-processor IBM SP3 computers. The linear-scaling algorithms have enabled 6.44-billion-atom MD and 111,000-atom QM calculations on 1,024 SP3 processors with parallel efficiency well over 90%. The production-quality programs also feature wavelet-based computational-space decomposition for adaptive load balancing, spacefilling-curve-based adaptive data compression with user-defined error bound for scalable I/O, and octree-based fast visibility culling for immersive and interactive visualization of massive simulation data.

  9. Hardware descriptions of the I and C systems for NPP

    International Nuclear Information System (INIS)

    Lee, Cheol Kwon; Oh, In Suk; Park, Joo Hyun; Kim, Dong Hoon; Han, Jae Bok; Shin, Jae Whal; Kim, Young Bak

    2003-09-01

    The hardware specifications for the I and C systems of the SNPP (Standard Nuclear Power Plant) are reviewed in order to acquire the hardware requirements and specifications for KNICS (Korea Nuclear Instrumentation and Control System). In the study, we investigated hardware requirements, hardware configuration, hardware specifications, man-machine hardware requirements, interface requirements with other systems, and data communication requirements that are applicable to the SNPP. We reviewed these aspects for the control systems, protection systems, monitoring systems, information systems, and process instrumentation systems. Through the study, we described the requirements and specifications of digital systems focusing on the microprocessor and the communication interface, and repeated this for analog systems focusing on the manufacturing companies. It is expected that the experience acquired from this research will provide vital input for the development of the KNICS.

  10. A Fast GPU-accelerated Mixed-precision Strategy for Fully NonlinearWater Wave Computations

    DEFF Research Database (Denmark)

    Glimberg, Stefan Lemvig; Engsig-Karup, Allan Peter; Madsen, Morten G.

    2011-01-01

    We present performance results of a mixed-precision strategy developed to improve a recently developed massively parallel GPU-accelerated tool for fast and scalable simulation of unsteady fully nonlinear free surface water waves over uneven depths (Engsig-Karup et al. 2011). The underlying wave ...-preconditioned defect correction method. The improved strategy improves the performance by exploiting architectural features of modern GPUs for mixed-precision computations and is tested in a recently developed generic library for fast prototyping of PDE solvers. The new wave tool is applicable to solve and analyze...
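
    The mixed-precision idea can be illustrated with a generic defect-correction (iterative-refinement) loop: the inexpensive approximate solve runs in single precision while residuals and corrections are kept in double precision. The sketch below is a CPU-only analogy with a random test matrix, not the GPU wave solver itself.

        import numpy as np

        rng = np.random.default_rng(0)
        n = 200
        A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
        b = rng.standard_normal(n)

        A32 = A.astype(np.float32)                        # low-precision operator
        x = np.zeros(n)                                   # double-precision iterate
        for _ in range(10):
            r = b - A @ x                                 # residual in float64
            # Defect correction: approximate solve of A e = r in float32.
            e = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
            x += e
            if np.linalg.norm(r) < 1e-12 * np.linalg.norm(b):
                break

        print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))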

  11. Evaluation of a server-client architecture for accelerator modeling and simulation

    International Nuclear Information System (INIS)

    Bowling, B.A.; Akers, W.; Shoaee, H.; Watson, W.; Zeijts, J. van; Witherspoon, S.

    1997-01-01

    Traditional approaches to computational modeling and simulation often utilize a batch method for code execution using file-formatted input/output. This method of code implementation was generally chosen for several factors, including CPU throughput and availability, complexity of the required modeling problem, and presentation of computation results. With the advent of faster computer hardware and the advances in networking and software techniques, other program architectures for accelerator modeling have recently been employed. Jefferson Laboratory has implemented a client/server solution for accelerator beam transport modeling utilizing a query-based I/O. The goal of this code is to provide modeling information for control system applications and to serve as a computation engine for general modeling tasks, such as machine studies. This paper performs a comparison between the batch execution and server/client architectures, focusing on design and implementation issues, performance, and general utility towards accelerator modeling demands

  12. Scalable DeNoise-and-Forward in Bidirectional Relay Networks

    DEFF Research Database (Denmark)

    Sørensen, Jesper Hemming; Krigslund, Rasmus; Popovski, Petar

    2010-01-01

    In this paper a scalable relaying scheme is proposed based on an existing concept called DeNoise-and-Forward, DNF. We call it Scalable DNF, S-DNF, and it targets the scenario with multiple communication flows through a single common relay. The idea of the scheme is to combine packets at the relay...... in order to save transmissions. To ensure decodability at the end-nodes, a priori information about the content of the combined packets must be available. This is gathered during the initial transmissions to the relay. The trade-off between decodability and number of necessary transmissions is analysed...
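
    The packet-combining idea can be shown with a two-flow toy example in which the relay broadcasts an XOR of the two packets (a simple stand-in for the DNF denoising map) and each end node removes its own packet as the a priori information.

        def xor_bytes(a: bytes, b: bytes) -> bytes:
            return bytes(x ^ y for x, y in zip(a, b))

        packet_a = b"hello-from-node-A"
        packet_b = b"reply-from-node-B"   # equal length; padding handles the general case

        # The relay broadcasts one combined packet instead of forwarding two packets.
        combined = xor_bytes(packet_a, packet_b)

        # Each end node uses its own packet as the a priori information for decoding.
        recovered_at_a = xor_bytes(combined, packet_a)   # node A recovers B's packet
        recovered_at_b = xor_bytes(combined, packet_b)   # node B recovers A's packet
        assert recovered_at_a == packet_b and recovered_at_b == packet_a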

  13. Towards the petascale in electromagnetic modeling of plasma-based accelerators for high-energy physics

    International Nuclear Information System (INIS)

    Bruhwiler, D L; Antonsen, T; Cary, J R; Cooley, J; Decyk, V K; Esarey, E; Geddes, C G R; Huang, C; Hakim, A; Katsouleas, T; Messmer, P; Mori, W B; Tsung, F S; Vieira, J; Zhou, M

    2006-01-01

    Plasma-based lepton acceleration concepts are a key element of the long-term R and D portfolio for the U.S. Office of High Energy Physics. There are many such concepts, but we consider only the laser (LWFA) and plasma (PWFA) wakefield accelerators. We present a summary of electromagnetic particle-in-cell (PIC) simulations for recent LWFA and PWFA experiments. These simulations, including both time explicit algorithms and reduced models, have effectively used terascale computing resources to support and guide experiments in this rapidly developing field. We briefly discuss the challenges and opportunities posed by the near-term availability of petascale computing hardware

  14. A Highly Parallel and Scalable Motion Estimation Algorithm with GPU for HEVC

    Directory of Open Access Journals (Sweden)

    Yun-gang Xue

    2017-01-01

    Full Text Available We propose a highly parallel and scalable motion estimation algorithm, named multilevel resolution motion estimation (MLRME) for short, combining the advantages of local full search and downsampling. By subsampling a video frame, a large amount of computation is saved, while the local full-search method can exploit massive parallelism and make full use of powerful modern many-core accelerators, such as GPUs and the Intel Xeon Phi. We implemented the proposed MLRME in HM12.0, and the experimental results showed that the encoding quality of the MLRME method is close to that of the fast motion estimation in HEVC, declining by less than 1.5%. We also implemented MLRME with CUDA, which obtained a 30-60x speed-up compared to the serial algorithm on a single CPU. Specifically, the parallel implementation of MLRME on a GTX 460 GPU can meet the real-time coding requirement of about 25 fps for the 2560×1600 video format, while for 832×480 the performance is more than 100 fps.
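
    A two-level, single-block toy version of the idea is sketched below: a full search on a 2x-subsampled frame proposes a coarse motion vector, which a small local full search then refines at full resolution. The block size, search radii, and test image are illustrative only.

        import numpy as np

        def sad(a, b):
            return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

        def full_search(ref, block, center_y, center_x, radius):
            size = block.shape[0]
            best, best_off = None, (0, 0)
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    y, x = center_y + dy, center_x + dx
                    if 0 <= y and 0 <= x and y + size <= ref.shape[0] and x + size <= ref.shape[1]:
                        cost = sad(ref[y:y + size, x:x + size], block)
                        if best is None or cost < best:
                            best, best_off = cost, (dy, dx)
            return best_off

        def multilevel_me(ref, cur, top, left, size=16, coarse_radius=8, fine_radius=2):
            # Level 1: full search on 2x-subsampled frames (4x fewer pixels per SAD).
            coarse_block = cur[::2, ::2][top // 2:top // 2 + size // 2,
                                         left // 2:left // 2 + size // 2]
            cdy, cdx = full_search(ref[::2, ::2], coarse_block, top // 2, left // 2, coarse_radius)
            # Level 0: small local search around the scaled-up coarse vector.
            block = cur[top:top + size, left:left + size]
            dy, dx = full_search(ref, block, top + 2 * cdy, left + 2 * cdx, fine_radius)
            return 2 * cdy + dy, 2 * cdx + dx

        y, x = np.mgrid[0:64, 0:64]
        ref = (128 + 60 * np.sin(x / 5.0) * np.cos(y / 7.0)).astype(np.uint8)
        cur = np.roll(ref, shift=(3, -5), axis=(0, 1))   # content moves by (3, -5)
        print(multilevel_me(ref, cur, top=24, left=24))  # matching offset: about (-3, 5)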

  15. A scalable approach to modeling groundwater flow on massively parallel computers

    International Nuclear Information System (INIS)

    Ashby, S.F.; Falgout, R.D.; Tompson, A.F.B.

    1995-12-01

    We describe a fully scalable approach to the simulation of groundwater flow on a hierarchy of computing platforms, ranging from workstations to massively parallel computers. Specifically, we advocate the use of scalable conceptual models in which the subsurface model is defined independently of the computational grid on which the simulation takes place. We also describe a scalable multigrid algorithm for computing the groundwater flow velocities. We are thus able to leverage both the engineer's time spent developing the conceptual model and the computing resources used in the numerical simulation. We have successfully employed this approach at the LLNL site, where we have run simulations ranging in size from just a few thousand spatial zones (on workstations) to more than eight million spatial zones (on the CRAY T3D) -- all using the same conceptual model

  16. Constraint Solver Techniques for Implementing Precise and Scalable Static Program Analysis

    DEFF Research Database (Denmark)

    Zhang, Ye

    solver using unification we could make a program analysis easier to design and implement, much more scalable, and still as precise as expected. We present an inclusion constraint language with explicit equality constructs for specifying program analysis problems, and a parameterized framework ... developers to build reliable software systems more quickly and with fewer bugs or security defects. While designing and implementing a program analysis remains hard work, making it both scalable and precise is even more challenging. In this dissertation, we show that with a general inclusion constraint ... data flow analyses for the C language, we demonstrate that a large number of equivalences could be detected by off-line analyses, and they could then be used by a constraint solver to significantly improve the scalability of an analysis without sacrificing any precision.

  17. Software-Controlled Dynamically Swappable Hardware Design in Partially Reconfigurable Systems

    Directory of Open Access Journals (Sweden)

    Huang Chun-Hsian

    2008-01-01

    Full Text Available Abstract We propose two basic wrapper designs and an enhanced wrapper design for arbitrary digital hardware circuit designs such that they can be enhanced with the capability for dynamic swapping controlled by software. A hardware design with either of the proposed wrappers can thus be swapped out of the partially reconfigurable logic at runtime in some intermediate state of computation and then swapped in when required to continue from that state. The context data is saved to a buffer in the wrapper at interruptible states, and then the wrapper takes care of saving the hardware context to communication memory through a peripheral bus, and later restoring the hardware context after the design is swapped in. The overheads of the hardware standardization and the wrapper in terms of additional reconfigurable logic resources and the time for context switching are small and generally acceptable. With the capability for dynamic swapping, high priority hardware tasks can interrupt low-priority tasks in real-time embedded systems so that the utilization of hardware space per unit time is increased.

  18. Fast parallel tandem mass spectral library searching using GPU hardware acceleration.

    Science.gov (United States)

    Baumgardner, Lydia Ashleigh; Shanmugam, Avinash Kumar; Lam, Henry; Eng, Jimmy K; Martin, Daniel B

    2011-06-03

    Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.
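
    The rate-limiting comparison described above is essentially one large matrix-vector product. The sketch below shows a CPU version of that scoring step with NumPy on synthetic, pre-binned spectra; FastPaSS performs the same arithmetic across thousands of GPU threads via CUDA.

        import numpy as np

        rng = np.random.default_rng(0)
        n_library, n_bins = 10_000, 2_000
        library = rng.random((n_library, n_bins)).astype(np.float32)   # binned library spectra
        query = library[1234] + 0.05 * rng.random(n_bins).astype(np.float32)  # noisy copy

        # Normalize rows and the query so the score is a cosine similarity in [0, 1].
        library_norm = library / np.linalg.norm(library, axis=1, keepdims=True)
        query_norm = query / np.linalg.norm(query)

        # One matrix-vector product scores the query against the whole library; on a
        # GPU the same arithmetic is distributed across many threads.
        scores = library_norm @ query_norm
        best = int(np.argmax(scores))
        print(best, float(scores[best]))   # expect index 1234 with a score near 1.0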

  19. Algorithmic psychometrics and the scalable subject.

    Science.gov (United States)

    Stark, Luke

    2018-04-01

    Recent public controversies, ranging from the 2014 Facebook 'emotional contagion' study to psychographic data profiling by Cambridge Analytica in the 2016 American presidential election, Brexit referendum and elsewhere, signal watershed moments in which the intersecting trajectories of psychology and computer science have become matters of public concern. The entangled history of these two fields grounds the application of applied psychological techniques to digital technologies, and an investment in applying calculability to human subjectivity. Today, a quantifiable psychological subject position has been translated, via 'big data' sets and algorithmic analysis, into a model subject amenable to classification through digital media platforms. I term this position the 'scalable subject', arguing it has been shaped and made legible by algorithmic psychometrics - a broad set of affordances in digital platforms shaped by psychology and the behavioral sciences. In describing the contours of this 'scalable subject', this paper highlights the urgent need for renewed attention from STS scholars on the psy sciences, and on a computational politics attentive to psychology, emotional expression, and sociality via digital media.

  20. Scalable Simulation of Electromagnetic Hybrid Codes

    International Nuclear Information System (INIS)

    Perumalla, Kalyan S.; Fujimoto, Richard; Karimabadi, Dr. Homa

    2006-01-01

    New discrete-event formulations of physics simulation models are emerging that can outperform models based on traditional time-stepped techniques. Detailed simulation of the Earth's magnetosphere, for example, requires execution of sub-models that are at widely differing timescales. In contrast to time-stepped simulation which requires tightly coupled updates to entire system state at regular time intervals, the new discrete event simulation (DES) approaches help evolve the states of sub-models on relatively independent timescales. However, parallel execution of DES-based models raises challenges with respect to their scalability and performance. One of the key challenges is to improve the computation granularity to offset synchronization and communication overheads within and across processors. Our previous work was limited in scalability and runtime performance due to the parallelization challenges. Here we report on optimizations we performed on DES-based plasma simulation models to improve parallel performance. The net result is the capability to simulate hybrid particle-in-cell (PIC) models with over 2 billion ion particles using 512 processors on supercomputing platforms

  1. Generation of Embedded Hardware/Software from SystemC

    Directory of Open Access Journals (Sweden)

    Dominique Houzet

    2006-08-01

    Full Text Available Designers increasingly rely on reusing intellectual property (IP) and on raising the level of abstraction to respect system-on-chip (SoC) market characteristics. However, most hardware and embedded software codes are recoded manually from system level. This recoding step often results in new coding errors that must be identified and debugged. Thus, shorter time-to-market requires automation of the system synthesis from high-level specifications. In this paper, we propose a design flow intended to reduce the SoC design cost. This design flow unifies hardware and software using a single high-level language. It integrates hardware/software (HW/SW) generation tools and an automatic interface synthesis through a custom library of adapters. We have validated our interface synthesis approach on a hardware producer/consumer case study and on the design of a given software radiocommunication application.

  2. Generation of Embedded Hardware/Software from SystemC

    Directory of Open Access Journals (Sweden)

    Ouadjaout Salim

    2006-01-01

    Full Text Available Designers increasingly rely on reusing intellectual property (IP) and on raising the level of abstraction to respect system-on-chip (SoC) market characteristics. However, most hardware and embedded software codes are recoded manually from system level. This recoding step often results in new coding errors that must be identified and debugged. Thus, shorter time-to-market requires automation of the system synthesis from high-level specifications. In this paper, we propose a design flow intended to reduce the SoC design cost. This design flow unifies hardware and software using a single high-level language. It integrates hardware/software (HW/SW) generation tools and an automatic interface synthesis through a custom library of adapters. We have validated our interface synthesis approach on a hardware producer/consumer case study and on the design of a given software radiocommunication application.

  3. Hardware design of the median filter based on window structure and Batcher's odd-even sort network

    Directory of Open Access Journals (Sweden)

    SUN Kaimin

    2013-06-01

    Full Text Available Area and speed are two important factors to consider when designing a median filter with digital circuits. Area considerations require using as few logic resources as possible, while speed considerations require the system to work at higher clock frequencies, with as few clock cycles as possible, to complete frame filtering or real-time filtering. This paper gives a new design of a median filter whose hardware structure is a 3×3 window with two buffers. The filter function module is based on Batcher's odd-even sort network. The structural design is implemented in an FPGA, verified with the ModelSim software, and achieves video image filtering. The experimental analysis shows that this new median filter structure effectively reduces logic resource usage (merely 741 Logic Elements) and accelerates the pixel processing speed up to 27 MHz. The filter achieves real-time processing of video images at 30 frames/s. This design is not only practical, but also provides a reference for hardware structure design in digital image processing.
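
    The data-independent compare/exchange structure that makes such a filter hardware-friendly can be imitated in software. The sketch below computes the median of a 3×3 window with a fixed odd-even transposition network (simpler, but less efficient, than the Batcher odd-even sort network used in the actual design).

        import numpy as np

        def compare_exchange(v, i, j):
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]

        def median9(window):
            v = list(window)                      # the 9 pixel values of a 3x3 window
            for phase in range(9):                # fixed, data-independent schedule
                for i in range(phase % 2, 8, 2):
                    compare_exchange(v, i, i + 1)
            return v[4]                           # middle element after sorting

        def median_filter(img):
            out = img.copy()
            for y in range(1, img.shape[0] - 1):
                for x in range(1, img.shape[1] - 1):
                    out[y, x] = median9(img[y - 1:y + 2, x - 1:x + 2].ravel())
            return out

        img = np.array([[10, 10, 10, 10],
                        [10, 255, 10, 10],        # single "salt" pixel
                        [10, 10, 10, 10],
                        [10, 10, 10, 10]], dtype=np.uint8)
        print(median_filter(img)[1, 1])           # 255 is replaced by the window median, 10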

  4. Cooperative communications hardware, channel and PHY

    CERN Document Server

    Dohler, Mischa

    2010-01-01

    Facilitating Cooperation for Wireless Systems Cooperative Communications: Hardware, Channel & PHY focuses on issues pertaining to the PHY layer of wireless communication networks, offering a rigorous taxonomy of this dispersed field, along with a range of application scenarios for cooperative and distributed schemes, demonstrating how these techniques can be employed. The authors discuss hardware, complexity and power consumption issues, which are vital for understanding what can be realized at the PHY layer, showing how wireless channel models differ from more traditional

  5. IDD Archival Hardware Architecture and Workflow

    Energy Technology Data Exchange (ETDEWEB)

    Mendonsa, D; Nekoogar, F; Martz, H

    2008-10-09

    This document describes the functionality of every component in the DHS/IDD archival and storage hardware system shown in Fig. 1. It describes the step-by-step process by which image data is received at LLNL, processed, and made available to authorized personnel and collaborators. Throughout this document, references are made to one of two figures: Fig. 1, describing the elements of the architecture, and Fig. 2, describing the workflow and how the project utilizes the available hardware.

  6. Scalable, full-colour and controllable chromotropic plasmonic printing

    Science.gov (United States)

    Xue, Jiancai; Zhou, Zhang-Kai; Wei, Zhiqiang; Su, Rongbin; Lai, Juan; Li, Juntao; Li, Chao; Zhang, Tengwei; Wang, Xue-Hua

    2015-01-01

    Plasmonic colour printing has drawn wide attention as a promising candidate for the next-generation colour-printing technology. However, an efficient approach to realize full colour and scalable fabrication is still lacking, which prevents plasmonic colour printing from practical applications. Here we present a scalable and full-colour plasmonic printing approach by combining conjugate twin-phase modulation with a plasmonic broadband absorber. More importantly, our approach also demonstrates controllable chromotropic capability, that is, the ability of reversible colour transformations. This chromotropic capability affords enormous potentials in building functionalized prints for anticounterfeiting, special label, and high-density data encryption storage. With such excellent performances in functional colour applications, this colour-printing approach could pave the way for plasmonic colour printing in real-world commercial utilization. PMID:26567803

  7. Aspects of system modelling in Hardware/Software partitioning

    DEFF Research Database (Denmark)

    Knudsen, Peter Voigt; Madsen, Jan

    1996-01-01

    This paper addresses fundamental aspects of system modelling and partitioning algorithms in the area of Hardware/Software Codesign. Three basic system models for partitioning are presented and the consequences of partitioning according to each of these are analyzed. The analysis shows the importance of making a clear distinction between the model used for partitioning and the model used for evaluation. It also illustrates the importance of having a realistic hardware model such that hardware sharing can be taken into account. Finally, the importance of integrating scheduling and allocation...

  8. Scalable Domain Decomposed Monte Carlo Particle Transport

    Energy Technology Data Exchange (ETDEWEB)

    O' Brien, Matthew Joseph [Univ. of California, Davis, CA (United States)

    2013-12-05

    In this dissertation, we present the parallel algorithms necessary to run domain decomposed Monte Carlo particle transport on large numbers of processors (millions of processors). Previous algorithms were not scalable, and the parallel overhead became more computationally costly than the numerical simulation.

  9. 15th International Conference on Accelerator and Large Experimental Physics Control Systems

    CERN Document Server

    2015-01-01

    ICALEPCS is a biennial series of conferences that is intended to: * Provide a forum for the interchange of ideas and information between control system specialists working on large experimental physics facilities around the world (accelerators, particle detectors, fusion reactors, telescopes, etc.); * Create an archival literature of developments and progress in this rapidly changing discipline; * Promote, where practical, standardization in both hardware and software; * Promote collaboration between laboratories, institutes and industry.

  10. Scalable System Design for Covert MIMO Communications

    Science.gov (United States)

    2014-06-01

    In applications such as satellite radio and Wireless Local Area Network (WLAN) communications, OFDM has been utilized for its multi-path resistance. OFDM relies on ... The ability to develop hardware specific to the application provides faster computation times, making FPGA development a very powerful tool.

  11. Brain inspired hardware architectures - Can they be used for particle physics ?

    CERN Multimedia

    CERN. Geneva

    2016-01-01

    After their inception in the 1940s and several decades of moderate success, artificial neural networks have recently demonstrated impressive achievements in analysing big data volumes. Wide and deep network architectures can now be trained using high performance computing systems, graphics card clusters in particular. Despite their successes these state-of-the-art approaches suffer from very long training times and huge energy consumption, in particular during the training phase. The biological brain can perform similar and superior classification tasks in the space and time domains, but at the same time exhibits very low power consumption, rapid unsupervised learning capabilities and fault tolerance. In the talk the differences between classical neural networks and neural circuits in the brain will be presented. Recent hardware implementations of neuromorphic computing systems and their applications will be shown. Finally, some initial ideas to use accelerated neural architectures as trigger processors i...

  12. Parallel DC3 Algorithm for Suffix Array Construction on Many-Core Accelerators

    KAUST Repository

    Liao, Gang

    2015-05-01

    In bioinformatics applications, suffix arrays are widely used for DNA sequence alignment in the initial exact-match phase of heuristic algorithms. With the exponential growth and availability of data, using many-core accelerators, like GPUs, to optimize existing algorithms is very common. We present a new implementation of suffix array construction on the GPU. As a result, suffix array construction on the GPU achieves around a 10x speedup on standard large data sets, which contain more than 100 million characters. The approach is simple, fast, and scalable, and can easily be extended to multi-core processors and even heterogeneous architectures. © 2015 IEEE.
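
    The sketch below shows, in plain single-threaded Python, what a suffix array is and how the exact-match phase uses it; it deliberately uses a naive O(n^2 log n) construction rather than the DC3/skew algorithm parallelized in the paper.

        def build_suffix_array(text):
            # Naive construction: sort suffix start positions lexicographically.
            return sorted(range(len(text)), key=lambda i: text[i:])

        def find_occurrences(text, sa, pattern):
            # Binary search for the first suffix whose prefix is >= the pattern.
            lo, hi = 0, len(sa)
            while lo < hi:
                mid = (lo + hi) // 2
                if text[sa[mid]:sa[mid] + len(pattern)] < pattern:
                    lo = mid + 1
                else:
                    hi = mid
            hits = []
            while lo < len(sa) and text[sa[lo]:sa[lo] + len(pattern)] == pattern:
                hits.append(sa[lo])
                lo += 1
            return sorted(hits)

        dna = "ACGTACGTGACG"
        sa = build_suffix_array(dna)
        print(find_occurrences(dna, sa, "ACG"))   # exact-match seed positions: [0, 4, 9]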

  13. Parallel DC3 Algorithm for Suffix Array Construction on Many-Core Accelerators

    KAUST Repository

    Liao, Gang; Ma, Longfei; Zang, Guangming; Tang, Lin

    2015-01-01

    In bioinformatics applications, suffix arrays are widely used for DNA sequence alignment in the initial exact-match phase of heuristic algorithms. With the exponential growth and availability of data, using many-core accelerators, like GPUs, to optimize existing algorithms is very common. We present a new implementation of suffix array construction on the GPU. As a result, suffix array construction on the GPU achieves around a 10x speedup on standard large data sets, which contain more than 100 million characters. The approach is simple, fast, and scalable, and can easily be extended to multi-core processors and even heterogeneous architectures. © 2015 IEEE.

  14. Customizable software architectures in the accelerator control system environment

    CERN Document Server

    Mejuev, I; Kadokura, E

    2001-01-01

    Tailoring is the further evolution of an application after deployment in order to adapt it to requirements that were not accounted for in the original design. End-user customization has been extensively researched in applied computer science from HCI and software engineering perspectives. Customization allows coping with flexibility requirements, decreasing maintenance and development costs of software products. In general, dynamic or diverse software requirements constitute the need for implementing end-user customization in computer systems. In accelerator physics research the factor of dynamic requirements is especially important, due to frequent software and hardware modifications resulting in correspondingly high upgrade and maintenance costs. We introduce the results of a feasibility study on implementing end-user tailorability in the software for an accelerator control system, considering the design and implementation of a distributed monitoring application for the 12 GeV KEK Proton Synchrotron as an example. T...

  15. Scalable Algorithms for Clustering Large Geospatiotemporal Data Sets on Manycore Architectures

    Science.gov (United States)

    Mills, R. T.; Hoffman, F. M.; Kumar, J.; Sreepathi, S.; Sripathi, V.

    2016-12-01

    The increasing availability of high-resolution geospatiotemporal data sets from sources such as observatory networks, remote sensing platforms, and computational Earth system models has opened new possibilities for knowledge discovery using data sets fused from disparate sources. Traditional algorithms and computing platforms are impractical for the analysis and synthesis of data sets of this size; however, new algorithmic approaches that can effectively utilize the complex memory hierarchies and the extremely high levels of available parallelism in state-of-the-art high-performance computing platforms can enable such analysis. We describe a massively parallel implementation of accelerated k-means clustering and some optimizations to boost computational intensity and utilization of wide SIMD lanes on state-of-the-art multi- and manycore processors, including the second-generation Intel Xeon Phi ("Knights Landing") processor based on the Intel Many Integrated Core (MIC) architecture, which includes several new features, including an on-package high-bandwidth memory. We also analyze the code in the context of a few practical applications to the analysis of climatic and remotely-sensed vegetation phenology data sets, and speculate on some of the new applications that such scalable analysis methods may enable.
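
    The computational core being vectorized in the paper is the distance evaluation inside each k-means iteration. The NumPy sketch below shows that structure on synthetic 2-D points; the actual implementation distributes the same arithmetic over MPI ranks, threads, and wide SIMD lanes.

        import numpy as np

        def kmeans(points, k, n_iter=20, seed=0):
            rng = np.random.default_rng(seed)
            centers = points[rng.choice(len(points), size=k, replace=False)].copy()
            for _ in range(n_iter):
                # Squared distance of every point to every center: an (n, k) array.
                # This is the arithmetic-dense step that benefits from SIMD and threads.
                d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
                labels = d2.argmin(axis=1)
                # Recompute each centroid as the mean of its assigned points.
                for j in range(k):
                    members = points[labels == j]
                    if len(members):
                        centers[j] = members.mean(axis=0)
            return centers, labels

        rng = np.random.default_rng(1)
        pts = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in ((0, 0), (3, 3), (0, 4))])
        centers, labels = kmeans(pts, k=3)
        print(np.round(centers, 2))   # roughly recovers the three cluster centers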

  16. Hardware/software virtualization for the reconfigurable multicore platform.

    NARCIS (Netherlands)

    Ferger, M.; Al Kadi, M.; Hübner, M.; Koedam, M.L.P.J.; Sinha, S.S.; Goossens, K.G.W.; Marchesan Almeida, Gabriel; Rodrigo Azambuja, J.; Becker, Juergen

    2012-01-01

    This paper presents the Flex Tiles approach for the virtualization of hardware and software for a reconfigurable multicore architecture. The approach enables the virtualization of a dynamic tile-based hardware architecture consisting of processing tiles connected via a network-on-chip and a

  17. The final technical report of the CRADA, 'Medical Accelerator Technology'

    International Nuclear Information System (INIS)

    Chu, W.T.; Rawls, J.M.

    2000-01-01

    Under this CRADA, Berkeley Lab and the industry partner, General Atomics (GA), have cooperatively developed hadron therapy technologies for commercialization. Specifically, Berkeley Lab and GA jointly developed beam transport systems to bring the extracted protons from the accelerator to the treatment rooms, rotating gantries to aim the treatment beams precisely into patients from any angle, and patient positioners to align the patient accurately relative to the treatment beams. We have also jointly developed a patient treatment delivery system that controls the radiation doses in the patient, and hardware to improve the accelerator performances, including a radio-frequency ion source and its low-energy beam transport (LEBT) system. This project facilitated the commercialization of the DOE-developed technologies in hadron therapy by the private sector in order to improve the quality of life of the nation

  18. An Integration Testing Facility for the CERN Accelerator Controls System

    CERN Document Server

    Stapley, N; Bau, J C; Deghaye, S; Dehavay, C; Sliwinski, W; Sobczak, M

    2009-01-01

    A major effort has been invested in the design, development, and deployment of the LHC Control System. This large control system is made up of a set of core components and dependencies, which although tested individually, are often not able to be tested together on a system capable of representing the complete control system environment, including hardware. Furthermore, this control system is being adapted and applied to CERN's whole accelerator complex, and in particular for the forthcoming renovation of the PS accelerators. To ensure quality is maintained as the system evolves, and to improve defect prevention, the Controls Group launched a project to provide a dedicated facility for continuous, automated, integration testing of its core components to incorporate into its production process. We describe the project, initial lessons from its application, its status, and future directions.

  19. Using scalable vector graphics to evolve art

    NARCIS (Netherlands)

    den Heijer, E.; Eiben, A. E.

    2016-01-01

    In this paper, we describe our investigations of the use of scalable vector graphics as a genotype representation in evolutionary art. We describe the technical aspects of using SVG in evolutionary art, and explain our custom, SVG-specific operators for initialisation, mutation and crossover. We perform

  20. Better than $1/Mflops sustained: a scalable PC-based parallel computer for lattice QCD

    International Nuclear Information System (INIS)

    Fodor, Z.; Papp, G.

    2002-02-01

    We study the feasibility of a PC-based parallel computer for medium to large scale lattice QCD simulations. Our cluster, built at the Eoetvoes Univ., Inst. Theor. Phys., consists of 137 Intel P4-1.7 GHz nodes with 512 MB RDRAM. The 32-bit, single precision sustained performance for dynamical QCD without communication is 1510 Mflops/node with Wilson and 970 Mflops/node with staggered fermions. This gives a total performance of 208 Gflops for Wilson and 133 Gflops for staggered QCD, respectively (for 64-bit applications the performance is approximately halved). The novel feature of our system is its communication architecture. In order to have a scalable, cost-effective machine we use Gigabit Ethernet cards for nearest-neighbor communications in a two-dimensional mesh. This type of communication is cost effective (only 30% of the hardware costs is spent on the communication). According to our benchmark measurements this type of communication results in around a 40% communication time fraction for lattices up to 48^3 x 96 in full QCD simulations. The price/sustained-performance ratio for full QCD is better than $1/Mflops for Wilson (and around $1.5/Mflops for staggered) quarks for practically any lattice size which can fit in our parallel computer. (orig.)

  1. Design of an expert system application for diagnosing computer hardware damage using the forward chaining method [Perancangan Aplikasi Sistem Pakar Diagnosa Kerusakan Hardware Komputer Metode Forward Chaining]

    Directory of Open Access Journals (Sweden)

    Ali Akbar Rismayadi

    2016-09-01

    Full Text Available Abstract Damage to computer hardware is not a major disaster, because not all hardware faults are beyond repair. Nearly all computer users, whether individuals or institutions, experience various kinds of damage to the hardware they own, and that damage can be caused by many factors; in most cases the user does not know what caused the hardware to fail. It is therefore useful to build an application that helps users diagnose computer hardware damage, so that anyone can identify the type of hardware fault on their own computer. The expert system for diagnosing computer hardware damage was developed using the forward chaining method, applying a descriptive analysis of damage data obtained from several experts and from the literature to reach a diagnostic conclusion. The waterfall model was used for system development, from the analysis stage through to software support. The application was built with the Eclipse ADT programming tools and SQLite as its database. This expert system for diagnosing computer hardware damage is expected to serve as a tool that helps users find the causes of hardware damage independently, without the help of a computer technician.
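
    The forward-chaining inference itself is simple to sketch: rules fire whenever all of their antecedent facts are known, until no new facts can be derived. The rules below are hypothetical examples, not the knowledge base described in the paper.

        # Hypothetical symptom/fault rules (illustrative only).
        RULES = [
            ({"no_power", "fan_not_spinning"}, "psu_fault"),
            ({"beeps_on_boot", "no_display"}, "ram_fault"),
            ({"psu_fault"}, "replace_power_supply"),
            ({"ram_fault"}, "reseat_or_replace_ram"),
        ]

        def forward_chain(facts):
            facts = set(facts)
            changed = True
            while changed:
                changed = False
                for antecedents, conclusion in RULES:
                    if antecedents <= facts and conclusion not in facts:
                        facts.add(conclusion)        # the rule fires, adding a new fact
                        changed = True
            return facts

        observed = {"no_power", "fan_not_spinning"}
        print(forward_chain(observed))   # includes 'psu_fault' and 'replace_power_supply'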

  2. Flight Hardware Virtualization for On-Board Science Data Processing

    Data.gov (United States)

    National Aeronautics and Space Administration — Utilize Hardware Virtualization technology to benefit on-board science data processing by investigating new real time embedded Hardware Virtualization solutions and...

  3. Speed challenge: a case for hardware implementation in soft-computing

    Science.gov (United States)

    Daud, T.; Stoica, A.; Duong, T.; Keymeulen, D.; Zebulum, R.; Thomas, T.; Thakoor, A.

    2000-01-01

    For over a decade, JPL has been actively involved in soft computing research on theory, architecture, applications, and electronics hardware. The driving force in all our research activities, in addition to the potential enabling technology promise, has been creation of a niche that imparts orders of magnitude speed advantage by implementation in parallel processing hardware with algorithms made especially suitable for hardware implementation. We review our work on neural networks, fuzzy logic, and evolvable hardware with selected application examples requiring real time response capabilities.

  4. Computer hardware for radiologists: Part I

    International Nuclear Information System (INIS)

    Indrajit, IK; Alam, A

    2010-01-01

    Computers are an integral part of modern radiology practice. They are used in different radiology modalities to acquire, process, and postprocess imaging data. They have had a dramatic influence on contemporary radiology practice. Their impact has extended further with the emergence of Digital Imaging and Communications in Medicine (DICOM), Picture Archiving and Communication System (PACS), Radiology information system (RIS) technology, and Teleradiology. A basic overview of computer hardware relevant to radiology practice is presented here. The key hardware components in a computer are the motherboard, central processing unit (CPU), the chipset, the random access memory (RAM), the memory modules, bus, storage drives, and ports. The personal computer (PC) has a rectangular case that contains important components called hardware, many of which are integrated circuits (ICs). The fiberglass motherboard is the main printed circuit board and has a variety of important hardware mounted on it, which are connected by electrical pathways called “buses”. The CPU is the largest IC on the motherboard and contains millions of transistors. Its principal function is to execute “programs”. A Pentium® 4 CPU has transistors that execute a billion instructions per second. The chipset is completely different from the CPU in design and function; it controls data and interaction of buses between the motherboard and the CPU. Memory (RAM) is fundamentally semiconductor chips storing data and instructions for access by a CPU. RAM is classified by storage capacity, access speed, data rate, and configuration.

  5. Computer hardware for radiologists: Part I

    Directory of Open Access Journals (Sweden)

    Indrajit I

    2010-01-01

    Full Text Available Computers are an integral part of modern radiology practice. They are used in different radiology modalities to acquire, process, and postprocess imaging data. They have had a dramatic influence on contemporary radiology practice. Their impact has extended further with the emergence of Digital Imaging and Communications in Medicine (DICOM), Picture Archiving and Communication System (PACS), Radiology information system (RIS) technology, and Teleradiology. A basic overview of computer hardware relevant to radiology practice is presented here. The key hardware components in a computer are the motherboard, central processing unit (CPU), the chipset, the random access memory (RAM), the memory modules, bus, storage drives, and ports. The personal computer (PC) has a rectangular case that contains important components called hardware, many of which are integrated circuits (ICs). The fiberglass motherboard is the main printed circuit board and has a variety of important hardware mounted on it, which are connected by electrical pathways called "buses". The CPU is the largest IC on the motherboard and contains millions of transistors. Its principal function is to execute "programs". A Pentium® 4 CPU has transistors that execute a billion instructions per second. The chipset is completely different from the CPU in design and function; it controls data and interaction of buses between the motherboard and the CPU. Memory (RAM) is fundamentally semiconductor chips storing data and instructions for access by a CPU. RAM is classified by storage capacity, access speed, data rate, and configuration.

  6. Accelerator Technology and High Energy Physics Experiments, WILGA 2012; EuCARD Sessions

    CERN Document Server

    Romaniuk, R S

    2012-01-01

    Wilga Sessions on HEP experiments, astroparticle physics and accelerator technology were organized under the umbrella of the EU FP7 Project EuCARD – European Coordination for Accelerator Research and Development. The paper is the second part (out of five) of the research survey of the WILGA Symposium work, May 2012 Edition, concerned with accelerator technology and high energy physics experiments. It presents a digest of chosen technical work results shown by young researchers from different technical universities from this country during the XXXth Jubilee SPIE-IEEE Wilga 2012, May Edition, symposium on Photonics and Web Engineering. Topical tracks of the symposium embraced, among others, nanomaterials and nanotechnologies for photonics, sensory and nonlinear optical fibers, object oriented design of hardware, photonic metrology, optoelectronics and photonics applications, photonics-electronics co-design, optoelectronic and electronic systems for astronomy and high energy physics experiments, JET and pi-of-the ...

  7. Hardware malware

    CERN Document Server

    Krieg, Christian

    2013-01-01

    In our digital world, integrated circuits are present in nearly every moment of our daily life. Even when using the coffee machine in the morning, or driving our car to work, we interact with integrated circuits. The increasing spread of information technology in virtually all areas of life in the industrialized world offers a broad range of attack vectors. So far, mainly software-based attacks have been considered and investigated, while hardware-based attacks have attracted comparatively little interest. The design and production process of integrated circuits is mostly decentralized due to

  8. Scalable Open Source Smart Grid Simulator (SGSim)

    DEFF Research Database (Denmark)

    Ebeid, Emad Samuel Malki; Jacobsen, Rune Hylsberg; Stefanni, Francesco

    2017-01-01

    This paper presents an open source smart grid simulator (SGSim). The simulator is based on open source SystemC Network Simulation Library (SCNSL) and aims to model scalable smart grid applications. SGSim has been tested under different smart grid scenarios that contain hundreds of thousands of households...

  9. Scientific visualization uncertainty, multifield, biomedical, and scalable visualization

    CERN Document Server

    Chen, Min; Johnson, Christopher; Kaufman, Arie; Hagen, Hans

    2014-01-01

    Based on the seminar that took place in Dagstuhl, Germany in June 2011, this contributed volume studies the four important topics within the scientific visualization field: uncertainty visualization, multifield visualization, biomedical visualization and scalable visualization. • Uncertainty visualization deals with uncertain data from simulations or sampled data, uncertainty due to the mathematical processes operating on the data, and uncertainty in the visual representation, • Multifield visualization addresses the need to depict multiple data at individual locations and the combination of multiple datasets, • Biomedical is a vast field with select subtopics addressed from scanning methodologies to structural applications to biological applications, • Scalability in scientific visualization is critical as data grows and computational devices range from hand-held mobile devices to exascale computational platforms. Scientific Visualization will be useful to practitioners of scientific visualization, ...

  10. Scalable quantum memory in the ultrastrong coupling regime.

    Science.gov (United States)

    Kyaw, T H; Felicetti, S; Romero, G; Solano, E; Kwek, L-C

    2015-03-02

    Circuit quantum electrodynamics, consisting of superconducting artificial atoms coupled to on-chip resonators, represents a prime candidate for implementing a scalable quantum computing architecture because of its good tunability and controllability. Furthermore, recent advances have pushed the technology towards the ultrastrong coupling regime of light-matter interaction, where the qubit-resonator coupling strength reaches a considerable fraction of the resonator frequency. Here, we propose a qubit-resonator system operating in that regime as a quantum memory device, and study the storage and retrieval of quantum information in and from the Z2 parity-protected quantum memory, within experimentally feasible schemes. We are also convinced that our proposal might pave the way to realizing a scalable quantum random-access memory, owing to its fast storage and readout performance.

  11. Hardware-in-the-Loop Testing

    Data.gov (United States)

    Federal Laboratory Consortium — RTC has a suite of Hardware-in-the-Loop facilities that include three operational facilities that provide performance assessment and production acceptance testing of...

  12. Control system for the NBS microtron accelerator

    International Nuclear Information System (INIS)

    Martin, E.R.; Trout, R.E.; Wilson, B.L.; Ayres, R.L.; Yoder, N.R.

    1985-01-01

    As various subsystems of the National Bureau of Standards/Los Alamos racetrack microtron accelerator are being brought on-line, we are gaining experience with some of the innovations implemented in the control system. Foremost among these are the joystick-based operator controls, the hierarchical distribution of control system intelligence, and the independent secondary stations, permitting sectional stand-alone operation. The result of the distributed database philosophy and parallel data links has been very fast data updates, permitting joystick interaction with system elements. The software development was greatly simplified by using the hardware arbitration of several parallel processors in the Multibus system to split the software tasks into independent modules.

  13. Scalable optical quantum computer

    International Nuclear Information System (INIS)

    Manykin, E A; Mel'nichenko, E V

    2014-01-01

    A way of designing a scalable optical quantum computer based on the photon echo effect is proposed. Individual rare earth ions Pr³⁺, regularly located in the lattice of the orthosilicate (Y₂SiO₅) crystal, are suggested to be used as optical qubits. Operations with qubits are performed using coherent and incoherent laser pulses. The operation protocol includes both the method of measurement-based quantum computations and the technique of optical computations. Modern hybrid photon echo protocols, which provide a sufficient quantum efficiency when reading recorded states, are considered as most promising for quantum computations and communications. (quantum computer)

  14. Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

    International Nuclear Information System (INIS)

    Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.

    1999-01-01

    In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm, along with dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure-based code. This also helps to encapsulate the details of communication syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Other important features of this code include the use of symplectic integration with linear maps of external focusing elements and the use of z as the independent variable, which is typical in accelerators. The code was successfully applied to simulate beam transport through three superconducting sections in the APT linac design.
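
    To make the "linear maps with z as the independent variable" concrete, the sketch below pushes a particle bunch through a drift-quadrupole-drift cell using textbook 4×4 transfer matrices; in a full particle-in-cell code the space-charge kick computed on the grid would be applied between such map steps. The matrices and numbers here are generic illustrations under that assumption, not code from the package described above.

```python
import numpy as np

# Transverse phase-space coordinates per particle: (x, x', y, y'),
# advanced in z by linear transfer matrices of external focusing elements.

def drift(L):
    """4x4 transfer matrix of a field-free drift of length L (textbook form)."""
    M = np.eye(4)
    M[0, 1] = L
    M[2, 3] = L
    return M

def thin_quad(f):
    """Thin quadrupole of focal length f: focusing in x, defocusing in y."""
    M = np.eye(4)
    M[1, 0] = -1.0 / f
    M[3, 2] = +1.0 / f
    return M

rng = np.random.default_rng(0)
particles = rng.normal(scale=1e-3, size=(10000, 4))   # a Gaussian bunch (assumed)

# Push the bunch through a simple drift-quad-drift cell; a PIC code would
# interleave the space-charge field solve and kick between these map steps.
for M in (drift(0.5), thin_quad(2.0), drift(0.5)):
    particles = particles @ M.T

print("rms x after the cell:", particles[:, 0].std())
```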

  15. SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems

    Energy Technology Data Exchange (ETDEWEB)

    Li, Xiaoye S.; Demmel, James W.

    2002-03-27

    In this paper, we present the main algorithmic features in the software package SuperLU_DIST, a distributed-memory sparse direct solver for large sets of linear equations. We give in detail our parallelization strategies, with a focus on scalability issues, and demonstrate the parallel performance and scalability on current machines. The solver is based on sparse Gaussian elimination, with an innovative static pivoting strategy proposed earlier by the authors. The main advantage of static pivoting over classical partial pivoting is that it permits a priori determination of the data structures and communication pattern for sparse Gaussian elimination, which makes it more scalable on distributed-memory machines. Based on this a priori knowledge, we designed highly parallel and scalable algorithms for both the LU decomposition and the triangular solve, and we show that they are suitable for large-scale distributed-memory machines.
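
    The workflow the record describes, factoring a sparse unsymmetric matrix once and then reusing the triangular factors for repeated solves, can be tried on a single node through SciPy's binding to the sequential SuperLU (not the distributed SuperLU_DIST itself). The matrix below is an arbitrary convection-diffusion-style test case chosen only for illustration.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative only: SciPy wraps the *sequential* SuperLU, not SuperLU_DIST,
# but the factor-once / solve-many pattern is the same.

n = 1000
main  = 2.0 * np.ones(n)          # unsymmetric tridiagonal test matrix (assumed example)
lower = -1.3 * np.ones(n - 1)
upper = -0.7 * np.ones(n - 1)
A = sp.diags([lower, main, upper], offsets=[-1, 0, 1], format="csc")

lu = spla.splu(A)                 # sparse LU factorization
for k in range(3):                # reuse the factors for several right-hand sides
    b = np.random.default_rng(k).random(n)
    x = lu.solve(b)               # two triangular solves
    print(k, "residual:", np.linalg.norm(A @ x - b))
```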

  16. Scalable architecture for a room temperature solid-state quantum information processor.

    Science.gov (United States)

    Yao, N Y; Jiang, L; Gorshkov, A V; Maurer, P C; Giedke, G; Cirac, J I; Lukin, M D

    2012-04-24

    The realization of a scalable quantum information processor has emerged over the past decade as one of the central challenges at the interface of fundamental science and engineering. Here we propose and analyse an architecture for a scalable, solid-state quantum information processor capable of operating at room temperature. Our approach is based on recent experimental advances involving nitrogen-vacancy colour centres in diamond. In particular, we demonstrate that the multiple challenges associated with operation at ambient temperature, individual addressing at the nanoscale, strong qubit coupling, robustness against disorder and low decoherence rates can be simultaneously achieved under realistic, experimentally relevant conditions. The architecture uses a novel approach to quantum information transfer and includes a hierarchy of control at successive length scales. Moreover, it alleviates the stringent constraints currently limiting the realization of scalable quantum processors and will provide fundamental insights into the physics of non-equilibrium many-body quantum systems.

  17. Scalable force directed graph layout algorithms using fast multipole methods

    KAUST Repository

    Yunis, Enas Abdulrahman

    2012-06-01

    We present an extension to ExaFMM, a Fast Multipole Method library, as a generalized approach for fast and scalable execution of the Force-Directed Graph Layout algorithm. The Force-Directed Graph Layout algorithm is a physics-based approach to graph layout that treats the vertices V as repelling charged particles with the edges E connecting them acting as springs. Traditionally, the amount of work required in applying the Force-Directed Graph Layout algorithm is O(|V|² + |E|) using direct calculations and O(|V| log |V| + |E|) using truncation, filtering, and/or multi-level techniques. Correct application of the Fast Multipole Method allows us to maintain a lower complexity of O(|V| + |E|) while regaining most of the precision lost in other techniques. Solving layout problems for truly large graphs with millions of vertices still requires a scalable algorithm and implementation. We have been able to leverage the scalability and architectural adaptability of the ExaFMM library to create a Force-Directed Graph Layout implementation that runs efficiently on distributed multicore and multi-GPU architectures. © 2012 IEEE.
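
    For contrast with the FMM-accelerated version, the following sketch spells out the direct O(|V|² + |E|) iteration: an all-pairs Coulomb-like repulsion plus a spring term over the edges. The force constants and step size are arbitrary assumptions chosen only to make the toy example run, not parameters from the paper.

```python
import numpy as np

def layout(num_v, edges, iters=200, k_rep=0.01, k_spring=0.05, step=0.05):
    """Direct force-directed layout: O(|V|^2) repulsion + O(|E|) spring attraction."""
    pos = np.random.default_rng(1).random((num_v, 2))
    for _ in range(iters):
        diff = pos[:, None, :] - pos[None, :, :]          # pairwise displacement vectors
        dist2 = (diff ** 2).sum(-1) + 1e-9
        np.fill_diagonal(dist2, np.inf)                    # no self-repulsion
        repulse = k_rep * (diff / dist2[..., None]).sum(axis=1)   # the O(|V|^2) term
        attract = np.zeros_like(pos)
        for i, j in edges:                                 # the O(|E|) spring term
            d = pos[j] - pos[i]
            attract[i] += k_spring * d
            attract[j] -= k_spring * d
        pos += step * (repulse + attract)
    return pos

print(layout(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))         # a 4-cycle settles into a square-ish shape
```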

  18. Scalability of voltage-controlled filamentary and nanometallic resistance memory devices.

    Science.gov (United States)

    Lu, Yang; Lee, Jong Ho; Chen, I-Wei

    2017-08-31

    Much effort has been devoted to device and materials engineering to realize nanoscale resistance random access memory (RRAM) for practical applications, but a rational physical basis that can be relied on to design scalable devices spanning many length scales is still lacking. In particular, there is no clear criterion for switching control in those RRAM devices in which resistance changes are limited to localized nanoscale filaments that experience concentrated heat, electric current, and field. Here, we demonstrate voltage-controlled resistance switching, always at a constant characteristic critical voltage, for macro- and nanodevices in both filamentary RRAM and nanometallic RRAM; the latter switches uniformly and does not require a forming process. As a result, area scalability can be achieved under a device-area-proportional current compliance for the low resistance state of the filamentary RRAM, and for both the low and high resistance states of the nanometallic RRAM. This finding will help design area-scalable RRAM at the nanoscale. It also establishes an analogy between RRAM and synapses, in which signal transmission is also voltage-controlled.
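
    The practical content of "device-area-proportional current compliance" is simple arithmetic: if switching is governed by a constant critical voltage and a roughly constant current density, the compliance current needed to program a device shrinks linearly with its area. The current density and device sizes below are illustrative assumptions, not values from the study.

```python
# Scale an assumed programming current density to devices of different sizes,
# illustrating why a fixed critical voltage plus area-proportional compliance
# keeps the scheme area-scalable.  J and the edge lengths are assumptions.
J = 1e5                                       # assumed compliance current density, A/cm^2
for edge_nm in (10, 100, 1000, 10000):        # device edge length in nm
    area_cm2 = (edge_nm * 1e-7) ** 2
    print(f"{edge_nm:>6} nm device -> compliance ~ {J * area_cm2:.2e} A")
```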

  19. Accelerating Research Innovation by Adopting the Lean Startup Paradigm

    Directory of Open Access Journals (Sweden)

    Kaisa Still

    2017-05-01

    Full Text Available Converting scientific expertise into marketable products and services is playing an increasingly important role in the launching of new ventures, the growth of existing firms, and the creation of new jobs. In this article, we explore how the lean startup paradigm, which validates the market for a product with a business model that can sustain subsequent scaling, has led to a new process model to accelerate innovation. We then apply this paradigm to the context of research at universities and other research organizations. The article is based on the assumption that the organizational context matters, and it shows how a deeper understanding of the research context could enable an acceleration of the innovation process. We complement theoretical examples with a case example from VTT Technical Research Institute of Finland. Our findings show that many of the concepts from the early-acceleration phases, including the lean startup paradigm, can also be relevant in innovation discussions within the research context. However, the phase of value-proposition discovery is less adequately addressed, and that of growth discovery, with its emphasis on building a scalable, sustainable business, does not seem to be addressed by the innovation approaches presented in the research context. Hence, the entrepreneurial activities in the research context differ from those in startups and internal startups in established organizations.

  20. Learning Machines Implemented on Non-Deterministic Hardware

    OpenAIRE

    Gupta, Suyog; Sindhwani, Vikas; Gopalakrishnan, Kailash

    2014-01-01

    This paper highlights new opportunities for designing large-scale machine learning systems as a consequence of blurring traditional boundaries that have allowed algorithm designers and application-level practitioners to stay -- for the most part -- oblivious to the details of the underlying hardware-level implementations. The hardware/software co-design methodology advocated here hinges on the deployment of compute-intensive machine learning kernels onto compute platforms that trade-off deter...